The iRfcb package is an open-source R package designed
to streamline the analysis of Imaging FlowCytobot (IFCB) data, with a
focus on supporting marine ecological research and monitoring. By
integrating R and Python functionalities, the package facilitates
efficient handling and sharing of IFCB image data, extraction of key
metadata, and preparation of outputs for further taxonomic, ecological,
or spatial analyses.
This tutorial serves as an introduction to the core functionalities
of iRfcb, providing step-by-step instructions for data
preprocessing, taxonomic analysis, and SHARK-compliant data export. For
additional guides—such as quality control of IFCB data, data sharing,
and integration with MATLAB—please refer to the other tutorials
available on the project’s webpage.
You can install the package from CRAN using:
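The standard CRAN installation call is sketched below, assuming the package is published on CRAN under the name iRfcb:

```r
# Install the released version of iRfcb from CRAN
install.packages("iRfcb")
```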
Load the iRfcb and dplyr libraries:
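A minimal sketch of the loading step, assuming both packages are already installed:

```r
# Attach iRfcb and dplyr for the examples in this tutorial
library(iRfcb)
library(dplyr)
```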
To get started, download sample data from the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024) with the following function:
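The call below is a sketch of this download step. It assumes the helper is ifcb_download_test_data() and that it accepts a dest_dir argument; the folder name "data" is chosen here only so that it matches the paths used in the later examples. Check the function's help page for the exact arguments.

```r
# Download and extract the sample data into a local "data" folder
# (function and argument names are assumptions; see ?ifcb_download_test_data)
ifcb_download_test_data(dest_dir = "data")
```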
This section demonstrates a selection of general data extraction
tools available in iRfcb.
Extract timestamps from sample names or filenames:
# Example sample names
filenames <- list.files("data/data/2023/D20230314", recursive = TRUE)
# Print filenames
print(filenames)
# Convert filenames to timestamps
timestamps <- ifcb_convert_filenames(filenames)
# Print result
print(timestamps)

If the filename includes ROI numbers (e.g., in an extracted
.png image), a separate column, roi, will be
added to the output.
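To make the naming convention concrete, here is a rough base-R sketch of how a timestamp and a ROI number can be read out of IFCB-style names. The helper parse_ifcb_names() is hypothetical and is not the package's implementation. It assumes names of the form DYYYYMMDDTHHMMSS_IFCBnnn, optionally followed by an underscore and a ROI number, and it assumes the time stamps are in UTC.

```r
# Hypothetical helper: parse IFCB-style names with base R only
parse_ifcb_names <- function(x) {
  # Drop directories and the file extension, if any
  stem <- sub("\\.[^.]*$", "", basename(x))

  # Date and time part, e.g. "20230314T001205"
  stamp <- sub("^D([0-9]{8}T[0-9]{6})_.*$", "\\1", stem)
  timestamp <- as.POSIXct(stamp, format = "%Y%m%dT%H%M%S", tz = "UTC")

  # Instrument identifier, e.g. "IFCB134"
  ifcb_number <- sub("^D[0-9]{8}T[0-9]{6}_(IFCB[0-9]+).*$", "\\1", stem)

  # ROI number is only present for names that end in "_<digits>"
  has_roi <- grepl("_IFCB[0-9]+_[0-9]+$", stem)
  roi <- rep(NA_integer_, length(stem))
  roi[has_roi] <- as.integer(sub("^.*_([0-9]+)$", "\\1", stem[has_roi]))

  data.frame(
    sample = sub("^(D[0-9]{8}T[0-9]{6}_IFCB[0-9]+).*$", "\\1", stem),
    timestamp = timestamp,
    ifcb_number = ifcb_number,
    roi = roi
  )
}

parsed <- parse_ifcb_names(c(
  "D20230314T001205_IFCB134.roi",
  "D20230314T001205_IFCB134_00002.png"
))
print(parsed)
```

The second name carries a ROI number, so only that row gets a non-missing roi value.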
The analyzed volume of a sample can be calculated using data from
.hdr and .adc files.
Get the runtime from a .hdr file:
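As a rough illustration of the arithmetic behind these two steps, the base-R sketch below reads a run time and an inhibit time from header-style text and converts them to an analyzed volume. The header excerpt, the field names runTime and inhibitTime, and the 0.25 mL per minute flow rate are assumptions made for this example; the package functions may use additional information from the .adc file, so treat this as a simplified approximation rather than the package's method.

```r
# Hypothetical excerpt of an IFCB header; real .hdr files contain many more fields
hdr_lines <- c(
  "sampleNumber: 1",
  "runTime: 1200.5",
  "inhibitTime: 150.25"
)

# Split "key: value" lines into a named character vector
keys <- sub(":.*$", "", hdr_lines)
values <- trimws(sub("^[^:]*:", "", hdr_lines))
hdr <- setNames(values, keys)

run_time <- as.numeric(hdr[["runTime"]])          # seconds
inhibit_time <- as.numeric(hdr[["inhibitTime"]])  # seconds

# Assumed sample flow rate in mL per minute
flow_rate <- 0.25

# Time during which the instrument was actually able to image particles
look_time_min <- (run_time - inhibit_time) / 60

# Analyzed volume in mL
volume_ml <- flow_rate * look_time_min
print(volume_ml)
```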
Morphological features can be computed directly from raw IFCB data in
R using the WHOI ifcb-features
Python package, without running the MATLAB ifcb-analysis
toolbox. The ifcb_extract_features() function computes the
“slim” feature set (version 4) and binary blob masks for each bin,
writing a <bin>_features_v4.csv table and a
<bin>_blobs_v4.zip archive to separate,
user-specified folders.
This requires the optional ifcb-features package, which
is installed alongside its dependencies (pyifcb,
phasepack, scikit-image,
scikit-learn) by passing features = TRUE to
ifcb_py_install():
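A minimal sketch of that call, using only the argument named above and leaving all other settings of ifcb_py_install() at their defaults:

```r
# Set up the Python environment, including the optional ifcb-features package
ifcb_py_install(features = TRUE)
```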
Extract features and blobs from all bins found in a data folder.
Existing outputs are skipped unless overwrite = TRUE, so
the call can be re-run to resume an interrupted extraction. Set
parallel = TRUE to distribute bins across worker
processes:
# Extract features and blobs into separate folders
results <- ifcb_extract_features(
data_folder = "data/data/2023",
features_folder = "data/features/2023",
blobs_folder = "data/blobs/2023",
parallel = TRUE # Process bins in parallel (default FALSE)
)
# A tibble summarizing the status of each bin is returned invisibly
print(results)

The resulting .csv files can then be read back in with
ifcb_read_features(), as shown below.
Read all feature files (.csv) from a folder:
# Read feature files from a folder
features <- ifcb_read_features("data/features/2023/",
verbose = FALSE) # Do not print progress bar
# Print output from the first sample in the list
print(features[[1]])
# Read only multiblob feature files
multiblob_features <- ifcb_read_features("data/features/2023",
multiblob = TRUE,
verbose = FALSE)
# Print output from the first sample in the list
print(multiblob_features[[1]])

IFCB images stored in .roi files can be extracted as
.png files using the iRfcb package, as
demonstrated below.
Extract all images from a sample using the
ifcb_extract_pngs() function. You can specify the
out_folder, but by default, images will be saved in a
subdirectory within the same directory as the ROI file. The
gamma can be adjusted to enhance image contrast, and an
optional scale bar can be added by specifying
scale_bar_um.
# All ROIs in sample
ifcb_extract_pngs(
"data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
gamma = 1, # Default gamma value
scale_bar_um = 5 # Add a 5 micrometer scale bar
)

Extract specific ROIs:
# Only ROI number 2 and 5
ifcb_extract_pngs("data/data/2023/D20230314/D20230314T003836_IFCB134.roi",
ROInumbers = c(2, 5))

To extract annotated images or classified results from MATLAB files,
please see the vignette("image-export-tutorial") and
vignette("matlab-tutorial") tutorials.
IFCB images can be classified directly in R using a CNN model served
by a Gradio application. By
default, the classification functions use an instance hosted on the
SciLifeLab Serve platform
(https://ifcb.serve.scilifelab.se). A free example Space is
also available on Hugging Face
(https://irfcb-classify.hf.space); it has limited resources
and is intended for testing and demonstration purposes. For large-scale
or production classification, we recommend deploying your own instance
of the IFCB
Classification App with your own model and passing its URL via the
gradio_url argument.
Use ifcb_classify_models() to list the CNN models
available on the Gradio server:
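A minimal sketch of the call, assuming the default server is reachable and that no arguments are required:

```r
# List the CNN models offered by the default Gradio server
models <- ifcb_classify_models()
print(models)
```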
ifcb_classify_sample() extracts images from a
.roi file internally and returns predictions without
requiring a separate extraction step:
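A sketch of such a call is shown below. It assumes that the path to the .roi file is the first argument and relies on the defaults for everything else; the sample path is taken from the other examples in this tutorial.

```r
# Classify every image in one sample directly from its .roi file
# (assumes the .roi path is the first argument)
results <- ifcb_classify_sample(
  "data/data/2023/D20230314/D20230314T001205_IFCB134.roi"
)

# Print result
print(results)
```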
If images have already been extracted, pass a vector of PNG file
paths to ifcb_classify_images():
# List extracted PNG files
png_files <- list.files(
"data/data/2023/D20230314/D20230314T001205_IFCB134",
pattern = "\\.png$",
full.names = TRUE
)
# Classify images
results <- ifcb_classify_images(png_files, verbose = FALSE)
# Print result
print(results)

Both functions return a data frame with file_name,
class_name, class_name_auto,
score, and model_name columns, and query the
Gradio API at https://ifcb.serve.scilifelab.se by default.
Per-class F2 optimal thresholds are always applied:
class_name contains the threshold-applied classification
(labeled "unclassified" when below threshold), while
class_name_auto contains the winning class without any
threshold. The top_n argument controls how many top
predictions are returned per image, and model_name
specifies which CNN model to use (default:
"SMHI NIVA SYKE SAMS SZN ResNet 50 V6").
ifcb_save_classification() classifies all images in a
.roi file and saves the full score matrix. Three output
formats are supported via the format argument:
# HDF5 (default) - IFCB Dashboard v3 format (requires hdf5r package)
ifcb_save_classification(
"data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
output_folder = "output"
)
# Creates: output/D20230314T001205_IFCB134_class.h5
# MAT - IFCB Dashboard v1 format
ifcb_save_classification(
"data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
output_folder = "output",
format = "mat"
)
# Creates: output/D20230314T001205_IFCB134_class_v1.mat
# CSV - ClassiPyR-compatible format
ifcb_save_classification(
"data/data/2023/D20230314/D20230314T001205_IFCB134.roi",
output_folder = "output",
format = "csv"
)
# Creates: output/D20230314T001205_IFCB134.csv

The output file contains output_scores (N x C matrix),
class_labels, roi_numbers, per-class
thresholds, and
class_labels_above_threshold.
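The relationship between a score matrix, per-class thresholds, and the thresholded labels can be illustrated with a small base-R sketch. The scores, class names, and threshold values below are invented for illustration, and the comparison rule (a score at or above the threshold of the winning class counts as classified) is an assumption rather than a description of the server's exact logic.

```r
# Hypothetical scores for three images (rows) and three classes (columns)
scores <- matrix(
  c(0.90, 0.05, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.30, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("img_1.png", "img_2.png", "img_3.png"),
    c("Skeletonema", "Chaetoceros", "Dinophysis")
  )
)

# Hypothetical per-class thresholds
thresholds <- c(Skeletonema = 0.50, Chaetoceros = 0.50, Dinophysis = 0.70)

# Winning class per image, without any threshold
winner_idx <- max.col(scores, ties.method = "first")
class_name_auto <- colnames(scores)[winner_idx]
score <- scores[cbind(seq_len(nrow(scores)), winner_idx)]

# Apply the threshold of the winning class (assumed rule: score >= threshold)
winner_threshold <- unname(thresholds[class_name_auto])
class_name <- ifelse(score >= winner_threshold, class_name_auto, "unclassified")

predictions <- data.frame(
  file_name = rownames(scores),
  class_name = class_name,
  class_name_auto = class_name_auto,
  score = score
)
print(predictions)
```

In this toy example only the first image clears its threshold; the other two keep their winning class in class_name_auto but are labeled "unclassified" in class_name.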
Maintaining up-to-date taxonomic data is essential for ensuring
accurate species names and classifications, which directly impact
calculations like carbon concentrations in iRfcb.
Up-to-date taxonomy also ensures data harmonization by preventing issues like misspellings, outdated synonyms, or inconsistent classifications. This consistency is crucial for integrating and comparing datasets across studies, regions, and time periods, improving the reliability of scientific outcomes.
Taxonomic names can be matched against the World Register of Marine Species
(WoRMS), ensuring accuracy and consistency. The iRfcb
package includes a built-in function for taxon matching via the WoRMS
API, featuring a retry mechanism to handle server errors, making it
particularly useful for automated data pipelines. For additional tools
and functionality, the R package worrms
provides a comprehensive suite of options for interacting with the WoRMS
database.
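The idea of a retry mechanism can be sketched generically in base R. The with_retry() helper and the simulated flaky_lookup() request below are hypothetical and only illustrate the pattern; they do not reproduce the package's implementation and do not contact WoRMS.

```r
# Hypothetical helper: call fn() until it succeeds or the attempts run out
with_retry <- function(fn, max_retries = 3, sleep_time = 1) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_retries) {
      Sys.sleep(sleep_time)  # wait before trying again
    }
  }
  stop("All ", max_retries, " attempts failed: ", conditionMessage(result))
}

# Simulated request that fails twice with a server error, then succeeds
calls <- 0
flaky_lookup <- function() {
  calls <<- calls + 1
  if (calls < 3) {
    stop("HTTP 503: service unavailable")
  }
  "Skeletonema marinoi"
}

matched <- with_retry(flaky_lookup, max_retries = 5, sleep_time = 0)
print(matched)
```

In an automated pipeline a longer sleep_time is usually preferable, so that a temporarily overloaded server has time to recover between attempts.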
The ifcb_is_diatom() function takes a list of taxon names, cleans them,
retrieves their corresponding classification records from WoRMS, and checks
whether they belong to the specified diatom class. Only the first word
(the genus name) of each taxon is used for the classification. The function
can be useful for converting biovolumes to carbon according to
Menden-Deuer and Lessard (2000). See vol2C_diatom(),
vol2C_lgdiatom(), and vol2C_nondiatom() for
the carbon conversions; ifcb_extract_biovolumes() and
ifcb_summarize_biovolumes() use these internally and let
you choose the diatom relationship via the diatom_equation
argument.
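The volume-to-carbon step can be sketched in base R as a log-log relationship of the form log10(C) = log10(a) + b * log10(V), with C in pg carbon per cell and V in cubic micrometers. The coefficients below are the values commonly cited from Menden-Deuer and Lessard (2000) for non-diatom protists, diatoms, and large diatoms; they are reproduced here as an assumption and should be checked against the paper and the help pages of the vol2C functions. The volume_to_carbon() helper is hypothetical and is not part of iRfcb.

```r
# Assumed coefficients for log10(C) = log_a + b * log10(V)
mdl_coefficients <- data.frame(
  group = c("nondiatom", "diatom", "large_diatom"),
  log_a = c(-0.665, -0.541, -0.933),
  b     = c(0.939, 0.811, 0.881)
)

# Hypothetical helper: convert biovolume (um^3) to carbon (pg per cell)
volume_to_carbon <- function(volume_um3, group) {
  idx <- match(group, mdl_coefficients$group)
  if (anyNA(idx)) {
    stop("Unknown group; use one of: ",
         paste(mdl_coefficients$group, collapse = ", "))
  }
  10^(mdl_coefficients$log_a[idx] + mdl_coefficients$b[idx] * log10(volume_um3))
}

# Example: the same cell volume gives different carbon depending on the group
carbon_pg <- volume_to_carbon(
  volume_um3 = c(1000, 1000, 10000),
  group = c("nondiatom", "diatom", "large_diatom")
)
print(carbon_pg)
```

This is why the diatom check matters: a cell wrongly flagged as a diatom (or the reverse) ends up with a different carbon estimate for the same biovolume.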
# Read class2use file and select six taxa (entries 10 to 15)
class2use <- ifcb_get_mat_variable("data/config/class2use.mat")[10:15]
# Create a data frame with class names and the result from `ifcb_is_diatom`
class_list <- data.frame(class2use,
                         is_diatom = ifcb_is_diatom(class2use, verbose = FALSE))
# Print the result for the six selected taxa
print(class_list)

The default class for diatoms is defined as Bacillariophyceae, but
may be adjusted using the diatom_class argument.
This function takes a list of taxon names and matches them against the SMHI Trophic Type list used in SHARK.