Title: | Tools for Managing Imaging FlowCytobot (IFCB) Data |
---|---|
Description: | A comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot ('iRfcb') supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB 'ifcb-analysis' tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning–classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research. |
Authors: | Anders Torstensson [aut, cre] (Swedish Meteorological and Hydrological
Institute, <https://orcid.org/0000-0002-8283-656X>),
Kendra Hayashi [ctb] |
Maintainer: | Anders Torstensson <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4.3.9000 |
Built: | 2025-03-16 10:19:56 UTC |
Source: | https://github.com/europeanifcbgroup/irfcb |
This function generates a MANIFEST.txt file that lists all files in the specified paths, along with their sizes. It recursively includes files from directories and skips paths that do not exist. The manifest excludes the manifest file itself if present in the list.
create_package_manifest(paths, manifest_path = "MANIFEST.txt", temp_dir)
create_package_manifest(paths, manifest_path = "MANIFEST.txt", temp_dir)
paths |
A character vector of paths to files and/or directories to include in the manifest. |
manifest_path |
A character string specifying the path to the manifest file. Default is "MANIFEST.txt". |
temp_dir |
A character string specifying the temporary directory to be removed from the file paths. |
This function does not return any value. It creates a MANIFEST.txt
file at the specified location,
which contains a list of all files (including their sizes) in the provided paths.
The file paths are relative to the specified temp_dir
, and the manifest excludes the manifest file itself if present.
This function adjusts the classifications in manual annotation files based on a class2use file.
It loads a specified class2use file and applies the adjustments to all relevant files in the
specified manual folder. Optionally, it can also perform compression on the output files.
This is the R equivalent function of start_mc_adjust_classes_user_training
from the
ifcb-analysis repository
(Sosik and Olson 2007).
ifcb_adjust_classes(class2use_file, manual_folder, do_compression = TRUE)
ifcb_adjust_classes(class2use_file, manual_folder, do_compression = TRUE)
class2use_file |
A character string representing the full path to the class2use file (should be a .mat file). |
manual_folder |
A character string representing the path to the folder containing manual annotation files. The function will look for files starting with 'D' in this folder. |
do_compression |
A logical value indicating whether to apply compression to the output files. Defaults to TRUE. |
This function requires a python interpreter to be installed.
The required python packages can be installed in a virtual environment using ifcb_py_install
.
None
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
ifcb_create_class2use
https://github.com/hsosik/ifcb-analysis
## Not run: ifcb_adjust_classes("C:/training/config/class2use", "C:/training/manual/2014/") ## End(Not run)
## Not run: ifcb_adjust_classes("C:/training/config/class2use", "C:/training/manual/2014/") ## End(Not run)
This function creates or updates manual .mat
classlist files with a user specified class in batch,
based on input vector of IFCB image names.
These .mat
can be used with the code in the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_annotate_batch( png_images, class, manual_folder, adc_folder, class2use_file, manual_output = NULL, manual_recursive = FALSE, unclassified_id = 1, do_compression = TRUE )
ifcb_annotate_batch( png_images, class, manual_folder, adc_folder, class2use_file, manual_output = NULL, manual_recursive = FALSE, unclassified_id = 1, do_compression = TRUE )
png_images |
A character vector containing the names of the PNG images to be annotated in the format DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ.png, where XXX represent the IFCB number and ZZZZZ the roi number. |
class |
A character string or integer specifying the class name or class2use index to annotate the images with. If a string is provided, it is matched against the available classes in |
manual_folder |
A character string specifying the path to the folder containing the manual |
adc_folder |
A character string specifying the path to the base folder containing the raw data, organized in subfolders by year (YYYY) and date (DYYYYMMDD). Each subfolder contains ADC files, which are used to determine the number of regions of interest (ROIs) for each sample when creating new manual |
class2use_file |
A character string specifying the path to the |
manual_output |
A character string specifying the path to the folder where updated or newly created |
manual_recursive |
A logical value indicating whether to search recursively within |
unclassified_id |
An integer specifying the class ID to use for unclassified regions of interest (ROIs) when creating new manual |
do_compression |
A logical value indicating whether to compress the .mat file. Default is TRUE. |
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
If an image belongs to a sample that already has a corresponding manual .mat
file,
the function updates the class IDs for the specified regions of interest (ROIs) in that file.
If no manual file exists for the sample, the function creates a new one based on the sample's ADC data,
assigning unclassified IDs to all ROIs initially, then applying the specified class to the relevant ROIs.
The class parameter can be provided as either a string (class name) or an integer (class index).
If a string is provided, the function will attempt to match it to one of the available
classes in class2use_file
. If no match is found, an error is thrown.
The function assumes that the ADC files are organized in subfolders by year (YYYY) and date (DYYYYMMDD) within adc_folder
.
The function does not return a value. It creates or updates .mat
files in the manual_folder
to
reflect the specified annotations.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_correct_annotation
, ifcb_create_empty_manual_file
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Annotate two png images with class "Nodularia_spumigena" and update or create manual files ifcb_annotate_batch( png_images = c("D20230812T162908_IFCB134_01399.png", "D20230714T102127_IFCB134_00069.png"), class = "Nodularia_spumigena", manual_folder = "path/to/manual", adc_folder = "path/to/adc", class2use_file = "path/to/class2use.mat" ) ## End(Not run)
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Annotate two png images with class "Nodularia_spumigena" and update or create manual files ifcb_annotate_batch( png_images = c("D20230812T162908_IFCB134_01399.png", "D20230714T102127_IFCB134_00069.png"), class = "Nodularia_spumigena", manual_folder = "path/to/manual", adc_folder = "path/to/adc", class2use_file = "path/to/class2use.mat" ) ## End(Not run)
This function converts IFCB filenames to a data frame with separate columns for the sample name, full timestamp, year, month, day, time, and IFCB number. ROI numbers are included if available.
ifcb_convert_filenames(filenames)
ifcb_convert_filenames(filenames)
filenames |
A character vector of IFCB filenames in the format "DYYYYMMDDTHHMMSS_IFCBxxx". |
A tibble with columns:
sample
: The extracted sample name.
full_timestamp
: The full timestamp in "YYYY-MM-DD HH:MM:SS" format.
year
: The year as an integer.
month
: The month as an integer.
day
: The day as an integer.
time
: The extracted time in "HH:MM:SS" format.
ifcb_number
: The IFCB instrument number.
filenames <- c("D20230314T001205_IFCB134", "D20230615T123045_IFCB135") timestamps <- ifcb_convert_filenames(filenames) print(timestamps)
filenames <- c("D20230314T001205_IFCB134", "D20230615T123045_IFCB135") timestamps <- ifcb_convert_filenames(filenames) print(timestamps)
This function corrects annotations in MATLAB classlist files located in a specified manual folder,
generated by the code in the ifcb-analysis
repository (Sosik and Olson 2007).
It replaces the class ID of specified regions of interest (ROIs) in the classlist files based on
a correction file or a character vector.
ifcb_correct_annotation( manual_folder, out_folder, correction = NULL, correct_classid, do_compression = TRUE, correction_file = deprecated() )
ifcb_correct_annotation( manual_folder, out_folder, correction = NULL, correct_classid, do_compression = TRUE, correction_file = deprecated() )
manual_folder |
A character string specifying the path to the folder containing the original MAT classlist files to be updated. |
out_folder |
A character string specifying the path to the folder where updated MAT classlist files will be saved. |
correction |
Either a character string specifying the path to the correction file, or a character vector containing image filenames to be corrected.
If a file is provided, it should have a column named |
correct_classid |
An integer specifying the class ID to use for corrections. |
do_compression |
A logical value indicating whether to compress the .mat file. Default is TRUE. |
correction_file |
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
The correction file is expected to contain at least one column: image_filename
, which includes the filenames of the images (with or without additional trailing information).
The function processes each file, corrects the annotations, and saves the updated files in the output folder.
If a character vector is provided as correction
, it will be used directly as a list of filenames for correction.
The correction
is typically generated using a Shiny app that provides an interactive interface for browsing and managing
IFCB (Imaging FlowCytobot) image galleries. This Shiny app can be initialized using the function ifcb_run_image_gallery()
.
This function does not return any value; it updates the classlist files in the specified output directory.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
https://github.com/hsosik/ifcb-analysis
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Correct class ID in .mat classlist files using a correction file ifcb_correct_annotation("input/manual", "output/manual", "corrections.txt", 99) # Correct class ID in .mat classlist files using a character vector of filenames ifcb_correct_annotation("input/manual", "output/manual", c("D20230917T153755_IFCB134_01724.png", "D20230917T110059_IFCB134_00380.png"), 99) ## End(Not run)
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Correct class ID in .mat classlist files using a correction file ifcb_correct_annotation("input/manual", "output/manual", "corrections.txt", 99) # Correct class ID in .mat classlist files using a character vector of filenames ifcb_correct_annotation("input/manual", "output/manual", c("D20230917T153755_IFCB134_01724.png", "D20230917T110059_IFCB134_00380.png"), 99) ## End(Not run)
This function processes .mat files, generated by the code in the ifcb-analysis
repository (Sosik and Olson 2007),
to count and summarize the annotations for each class based on the class2use information provided in a file.
ifcb_count_mat_annotations( manual_files, class2use_file, skip_class = NULL, sum_level = "class", mat_recursive = FALSE, use_python = FALSE )
ifcb_count_mat_annotations( manual_files, class2use_file, skip_class = NULL, sum_level = "class", mat_recursive = FALSE, use_python = FALSE )
manual_files |
A character string specifying the path to the .mat files or a folder containing .mat files. |
class2use_file |
A character string specifying the path to the file containing the class2use variable. |
skip_class |
A numeric vector of class IDs or a character vector of class names to be excluded from the count. Default is NULL. |
sum_level |
A character string specifying the level of summarization. Options: "sample", "roi" or "class" (default). |
mat_recursive |
Logical. If TRUE, the function will search for MATLAB files recursively when |
use_python |
Logical. If |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
A data frame with the total count of images per class, roi or per sample.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
## Not run: # Count annotations excluding specific class IDs result <- ifcb_count_mat_annotations("path/to/manual_folder", "path/to/class2use_file", skip_class = c(99, 100)) print(result) # Count annotations excluding a specific class name result <- ifcb_count_mat_annotations("path/to/manual_folder", "path/to/class2use_file", skip_class = "unclassified") print(result) ## End(Not run)
## Not run: # Count annotations excluding specific class IDs result <- ifcb_count_mat_annotations("path/to/manual_folder", "path/to/class2use_file", skip_class = c(99, 100)) print(result) # Count annotations excluding a specific class name result <- ifcb_count_mat_annotations("path/to/manual_folder", "path/to/class2use_file", skip_class = "unclassified") print(result) ## End(Not run)
This function creates a .mat
file containing a character vector of class names.
A class2use file can be used for manual annotation using the code in the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_create_class2use(classes, filename, do_compression = TRUE)
ifcb_create_class2use(classes, filename, do_compression = TRUE)
classes |
A character vector of class names to be saved in the |
filename |
A string specifying the output file path (with |
do_compression |
A logical value indicating whether to compress the |
This function requires a python interpreter to be installed.
The required python packages can be installed in a virtual environment using ifcb_py_install
.
No return value. This function is called for its side effect of creating a .mat
file.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
ifcb_adjust_classes
https://github.com/hsosik/ifcb-analysis
## Not run: # Example usage: classes <- c("unclassified", "Dinobryon_spp", "Helicostomella_spp") ifcb_create_class2use(classes, "class2use_output.mat", do_compression = TRUE) ## End(Not run)
## Not run: # Example usage: classes <- c("unclassified", "Dinobryon_spp", "Helicostomella_spp") ifcb_create_class2use(classes, "class2use_output.mat", do_compression = TRUE) ## End(Not run)
Generates a MAT file for IFCB data with an empty manual classification structure using a specified number of ROIs, class names, and saves it to a specified output file. This function utilizes a Python script for creating the structure.
ifcb_create_empty_manual_file( roi_length, class2use, output_file, unclassified_id = 1, do_compression = TRUE )
ifcb_create_empty_manual_file( roi_length, class2use, output_file, unclassified_id = 1, do_compression = TRUE )
roi_length |
Integer. The number of rows in the class list (number of ROIs). |
class2use |
Character vector. The names of the classes to include in the |
output_file |
Character. The path where the output MAT file will be saved. |
unclassified_id |
Integer. The value to use in the second column of the class list. Default is 1. |
do_compression |
A logical value indicating whether to compress the .mat file. Default is TRUE. |
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
No return value. This function is called for its side effects.
The created MAT file is saved at the specified output_file
location.
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Create a MAT file with 100 ROIs, using a vector of class names, and save it to "output.mat" ifcb_create_empty_manual_file(roi_length = 100, class2use = c("unclassified", "Aphanizomenon_spp"), output_file = "output.mat") # Create a MAT file with a different unclassified_id ifcb_create_empty_manual_file(roi_length = 100, class2use = c("unclassified", "Aphanizomenon_spp"), output_file = "output.mat", unclassified_id = 999) ## End(Not run)
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Create a MAT file with 100 ROIs, using a vector of class names, and save it to "output.mat" ifcb_create_empty_manual_file(roi_length = 100, class2use = c("unclassified", "Aphanizomenon_spp"), output_file = "output.mat") # Create a MAT file with a different unclassified_id ifcb_create_empty_manual_file(roi_length = 100, class2use = c("unclassified", "Aphanizomenon_spp"), output_file = "output.mat", unclassified_id = 999) ## End(Not run)
This function generates a MANIFEST.txt file listing all files in a specified folder and its subfolders, along with their sizes in bytes. The function can optionally exclude an existing MANIFEST.txt file from the generated list. A manifest may be useful when archiving images in data repositories.
ifcb_create_manifest( folder_path, manifest_path = file.path(folder_path, "MANIFEST.txt"), exclude_manifest = TRUE )
ifcb_create_manifest( folder_path, manifest_path = file.path(folder_path, "MANIFEST.txt"), exclude_manifest = TRUE )
folder_path |
A character string specifying the path to the folder whose files are to be listed. |
manifest_path |
A character string specifying the path and name of the MANIFEST.txt file to be created. Defaults to "folder_path/MANIFEST.txt". |
exclude_manifest |
A logical value indicating whether to exclude an existing MANIFEST.txt file from the list. Defaults to TRUE. |
No return value, called for side effects. Creates a MANIFEST.txt file at the specified location.
## Not run: # Create a MANIFEST.txt file for the current directory ifcb_create_manifest(".") # Create a MANIFEST.txt file for a specific directory, excluding an existing MANIFEST.txt file ifcb_create_manifest("path/to/directory") # Create a MANIFEST.txt file and save it to a specific path ifcb_create_manifest("path/to/directory", manifest_path = "path/to/manifest/MANIFEST.txt") # Create a MANIFEST.txt file without excluding an existing MANIFEST.txt file ifcb_create_manifest("path/to/directory", exclude_manifest = FALSE) ## End(Not run)
## Not run: # Create a MANIFEST.txt file for the current directory ifcb_create_manifest(".") # Create a MANIFEST.txt file for a specific directory, excluding an existing MANIFEST.txt file ifcb_create_manifest("path/to/directory") # Create a MANIFEST.txt file and save it to a specific path ifcb_create_manifest("path/to/directory", manifest_path = "path/to/manifest/MANIFEST.txt") # Create a MANIFEST.txt file without excluding an existing MANIFEST.txt file ifcb_create_manifest("path/to/directory", exclude_manifest = FALSE) ## End(Not run)
This function downloads a zip archive containing MATLAB files from the iRfcb
dataset available in the SMHI IFCB Plankton Image Reference Library (Torstensson et al. 2024),
unzips them into the specified folder and extracts png images. These data can be used, for instance,
for testing iRfcb and for creating the tutorial vignette
using vignette("tutorial", package = "iRfcb")
ifcb_download_test_data( dest_dir, figshare_article = "48158716", max_retries = 5, sleep_time = 10, verbose = TRUE )
ifcb_download_test_data( dest_dir, figshare_article = "48158716", max_retries = 5, sleep_time = 10, verbose = TRUE )
dest_dir |
The destination directory where the files will be unzipped. |
figshare_article |
The file article number at the SciLifeLab Figshare data repository. By default, the iRfcb test dataset (48158716) from Torstensson et al. (2024) is used. |
max_retries |
The maximum number of retry attempts in case of download failure. Default is 5. |
sleep_time |
The sleep time between download attempts, in seconds. Default is 10. |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
No return value. This function is called for its side effect of downloading, extracting, and organizing IFCB test data.
Torstensson, Anders; Skjevik, Ann-Turi; Mohlin, Malin; Karlberg, Maria; Karlson, Bengt (2024). SMHI IFCB Plankton Image Reference Library. Version 3. SciLifeLab. Dataset. doi:10.17044/scilifelab.25883455.v3
## Not run: # Download and unzip IFCB test data into the "data" directory ifcb_download_test_data("data") ## End(Not run)
## Not run: # Download and unzip IFCB test data into the "data" directory ifcb_download_test_data("data") ## End(Not run)
This function extracts labeled images from IFCB (Imaging FlowCytobot) data,
annotated using the MATLAB code from the ifcb-analysis
repository (Sosik and Olson 2007).
It reads manually classified data, maps class indices to class names, and extracts
the corresponding Region of Interest (ROI) images, saving them to the specified directory.
ifcb_extract_annotated_images( manual_folder, class2use_file, roi_folders, out_folder, skip_class = NA, verbose = TRUE, manual_recursive = FALSE, roi_recursive = TRUE, overwrite = FALSE, use_python = FALSE, roi_folder = deprecated() )
ifcb_extract_annotated_images( manual_folder, class2use_file, roi_folders, out_folder, skip_class = NA, verbose = TRUE, manual_recursive = FALSE, roi_recursive = TRUE, overwrite = FALSE, use_python = FALSE, roi_folder = deprecated() )
manual_folder |
A character string specifying the path to the directory containing the manually classified .mat files. |
class2use_file |
A character string specifying the path to the file containing class names. |
roi_folders |
A character vector specifying one or more directories containing the ROI files. |
out_folder |
A character string specifying the output directory where the extracted images will be saved. |
skip_class |
A numeric vector of class IDs or a character vector of class names to be excluded from the count. Default is NULL. |
verbose |
A logical value indicating whether to print progress messages. Default is TRUE. |
manual_recursive |
Logical. If TRUE, the function will search for MATLAB files recursively within the |
roi_recursive |
Logical. If TRUE, the function will search for data files recursively within the |
overwrite |
A logical value indicating whether to overwrite existing PNG files. Default is FALSE. |
use_python |
Logical. If |
roi_folder |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be fasterthan the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
None. The function saves the extracted PNG images to the specified output directory.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_extract_pngs
ifcb_extract_classified_images
https://github.com/hsosik/ifcb-analysis
## Not run: ifcb_extract_annotated_images( manual_folder = "path/to/manual_folder", class2use_file = "path/to/class2use_file.mat", roi_folders = "path/to/roi_folder", out_folder = "path/to/out_folder", skip_class = 1 ) ## End(Not run)
## Not run: ifcb_extract_annotated_images( manual_folder = "path/to/manual_folder", class2use_file = "path/to/class2use_file.mat", roi_folders = "path/to/roi_folder", out_folder = "path/to/out_folder", skip_class = 1 ) ## End(Not run)
This function reads biovolume data from feature files generated by the ifcb-analysis
repository (Sosik and Olson 2007)
and matches them with corresponding classification results or manual annotations. It calculates biovolume in cubic micrometers and
determines if each class is a diatom based on the World Register of Marine Species (WoRMS). Carbon content
is computed for each region of interest (ROI) using conversion functions from Menden-Deuer and Lessard (2000),
depending on whether the class is identified as a diatom.
ifcb_extract_biovolumes( feature_files, mat_folder = NULL, custom_images = NULL, custom_classes = NULL, class2use_file = NULL, micron_factor = 1/3.4, diatom_class = "Bacillariophyceae", marine_only = FALSE, threshold = "opt", multiblob = FALSE, feature_recursive = TRUE, mat_recursive = TRUE, use_python = FALSE, verbose = TRUE )
ifcb_extract_biovolumes( feature_files, mat_folder = NULL, custom_images = NULL, custom_classes = NULL, class2use_file = NULL, micron_factor = 1/3.4, diatom_class = "Bacillariophyceae", marine_only = FALSE, threshold = "opt", multiblob = FALSE, feature_recursive = TRUE, mat_recursive = TRUE, use_python = FALSE, verbose = TRUE )
feature_files |
A path to a folder containing feature files or a character vector of file paths. |
mat_folder |
(Optional) Path to the folder containing class or manual annotation files. |
custom_images |
(Optional) A character vector of image filenames in the format DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ.png,
where "XXX" represents the IFCB number and "ZZZZZ" represents the ROI number.
These filenames should match the |
custom_classes |
(Optional) A character vector of corresponding class labels for |
class2use_file |
A character string specifying the path to the file containing the |
micron_factor |
Conversion factor for biovolume to cubic micrometers. Default is |
diatom_class |
A character vector specifying diatom class names in WoRMS. Default: |
marine_only |
Logical. If |
threshold |
Threshold for selecting classification information ( |
multiblob |
Logical. If |
feature_recursive |
Logical. If |
mat_recursive |
Logical. If |
use_python |
Logical. If |
verbose |
Logical. If |
Classification Data Handling:
If mat_folder
is provided, the function reads class annotations from MATLAB .mat
files.
If custom_images
and custom_classes
are supplied, they override MATLAB classification data (e.g. data from a CNN model).
If both mat_folder
and custom_images/custom_classes
are given, mat_folder
takes precedence.
MAT File Processing:
If use_python = TRUE
, the function reads .mat
files using ifcb_read_mat()
(requires Python + SciPy
).
Otherwise, it falls back to R.matlab::readMat()
.
A data frame containing:
sample
: The sample name.
classifier
: The classifier used (if applicable).
roi_number
: The region of interest (ROI) number.
class
: The identified taxonomic class.
biovolume_um3
: Computed biovolume in cubic micrometers.
carbon_pg
: Estimated carbon content in picograms.
Menden-Deuer Susanne, Lessard Evelyn J., (2000), Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton, Limnology and Oceanography, 3, doi: 10.4319/lo.2000.45.3.0569.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_read_features
ifcb_is_diatom
https://www.marinespecies.org/
## Not run: # Using MATLAB results: feature_files <- "data/features" mat_folder <- "data/classified" biovolume_df <- ifcb_extract_biovolumes(feature_files, mat_folder) print(biovolume_df) # Using custom classification result: class = c("Mesodinium_rubrum", "Mesodinium_rubrum") image <- c("D20220522T003051_IFCB134_00002", "D20220522T003051_IFCB134_00003") biovolume_df_custom <- ifcb_extract_biovolumes(feature_files, custom_images = image, custom_classes = class) print(biovolume_df_custom) ## End(Not run)
## Not run: # Using MATLAB results: feature_files <- "data/features" mat_folder <- "data/classified" biovolume_df <- ifcb_extract_biovolumes(feature_files, mat_folder) print(biovolume_df) # Using custom classification result: class = c("Mesodinium_rubrum", "Mesodinium_rubrum") image <- c("D20220522T003051_IFCB134_00002", "D20220522T003051_IFCB134_00003") biovolume_df_custom <- ifcb_extract_biovolumes(feature_files, custom_images = image, custom_classes = class) print(biovolume_df_custom) ## End(Not run)
This function reads a MATLAB classified sample file (.mat) generated
by the start_classify_batch_user_training
function from the ifcb-analysis
repository (Sosik and Olson 2007),
extracts specified taxa images from the corresponding ROI files,
and saves each image in a specified directory.
ifcb_extract_classified_images( sample, classified_folder, roi_folder, out_folder, taxa = "All", threshold = "opt", overwrite = FALSE, use_python = FALSE, verbose = TRUE )
ifcb_extract_classified_images( sample, classified_folder, roi_folder, out_folder, taxa = "All", threshold = "opt", overwrite = FALSE, use_python = FALSE, verbose = TRUE )
sample |
A character string specifying the sample name. |
classified_folder |
A character string specifying the directory containing the classified files. |
roi_folder |
A character string specifying the directory containing the ROI files. |
out_folder |
A character string specifying the directory to save the extracted images. |
taxa |
A character string specifying the taxa to extract. Default is "All". |
threshold |
A character string specifying the threshold to use ("none", "opt", "adhoc"). Default is "opt". |
overwrite |
A logical value indicating whether to overwrite existing PNG files. Default is FALSE. |
use_python |
Logical. If |
verbose |
A logical value indicating whether to print progress messages. Default is TRUE. |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
No return value, called for side effects. Extracts and saves taxa images to a directory.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_extract_pngs
ifcb_extract_annotated_images
https://github.com/hsosik/ifcb-analysis
## Not run: # Define the parameters sample <- "D20230311T092911_IFCB135" classified_folder <- "path/to/classified_folder" roi_folder <- "path/to/roi_folder" out_folder <- "path/to/outputdir" taxa <- "All" # or specify a particular taxa threshold <- "opt" # or specify another threshold # Extract taxa images from the classified sample ifcb_extract_classified_images(sample, classified_folder, roi_folder, out_folder, taxa, threshold) ## End(Not run)
## Not run: # Define the parameters sample <- "D20230311T092911_IFCB135" classified_folder <- "path/to/classified_folder" roi_folder <- "path/to/roi_folder" out_folder <- "path/to/outputdir" taxa <- "All" # or specify a particular taxa threshold <- "opt" # or specify another threshold # Extract taxa images from the classified sample ifcb_extract_classified_images(sample, classified_folder, roi_folder, out_folder, taxa, threshold) ## End(Not run)
This function reads an IFCB (.roi) file and its corresponding .adc file, extracts regions of interest (ROIs), and saves each ROI as a PNG image in a specified directory. Optionally, you can specify ROI numbers to extract, useful for specific ROIs from manual or automatic classification results.
ifcb_extract_pngs( roi_file, out_folder = dirname(roi_file), ROInumbers = NULL, taxaname = NULL, gamma = 1, verbose = TRUE, overwrite = FALSE )
ifcb_extract_pngs( roi_file, out_folder = dirname(roi_file), ROInumbers = NULL, taxaname = NULL, gamma = 1, verbose = TRUE, overwrite = FALSE )
roi_file |
A character string specifying the path to the .roi file. |
out_folder |
A character string specifying the directory where the PNG images will be saved. Defaults to the directory of the ROI file. |
ROInumbers |
An optional numeric vector specifying the ROI numbers to extract. If NULL, all ROIs with valid dimensions are extracted. |
taxaname |
An optional character string specifying the taxa name for organizing images into subdirectories. Defaults to NULL. |
gamma |
A numeric value for gamma correction applied to the image. Default is 1 (no correction). Values <1 increase contrast in dark regions, while values >1 decrease contrast. |
verbose |
A logical value indicating whether to print progress messages. Default is TRUE. |
overwrite |
A logical value indicating whether to overwrite existing PNG files. Default is FALSE. |
This function is called for its side effects: it writes PNG images to a directory.
ifcb_extract_classified_images
for extracting ROIs from automatic classification.
ifcb_extract_annotated_images
for extracting ROIs from manual annotation.
## Not run: # Convert ROI file to PNG images ifcb_extract_pngs("path/to/your_roi_file.roi") # Extract specific ROI numbers from ROI file ifcb_extract_pngs("path/to/your_roi_file.roi", "output_directory", ROInumbers = c(1, 2, 3)) ## End(Not run)
## Not run: # Convert ROI file to PNG images ifcb_extract_pngs("path/to/your_roi_file.roi") # Extract specific ROI numbers from ROI file ifcb_extract_pngs("path/to/your_roi_file.roi", "output_directory", ROInumbers = c(1, 2, 3)) ## End(Not run)
This function reads an example EcoTaxa metadata file included in the iRfcb
package.
ifcb_get_ecotaxa_example(example = "ifcb")
ifcb_get_ecotaxa_example(example = "ifcb")
example |
A character string specifying which example EcoTaxa metadata file to load. Options are:
|
This function loads different types of EcoTaxa metadata examples based on the user's need. The examples include a minimal template for manual data entry, as well as fully featured datasets with or without classified objects. The default is an IFCB-specific example, originating from https://github.com/VirginieSonnet/IFCBdatabaseToEcotaxa. The example headers can used when submitting data from Imaging FlowCytobot (IFCB) instruments to EcoTaxa at https://ecotaxa.obs-vlfr.fr/.
A data frame containing EcoTaxa example metadata.
ecotaxa_example <- ifcb_get_ecotaxa_example() # Print the first five columns dplyr::tibble(ecotaxa_example)
ecotaxa_example <- ifcb_get_ecotaxa_example() # Print the first five columns dplyr::tibble(ecotaxa_example)
This internal SMHI function reads .txt
files from a specified folder containing Ferrybox data,
filters them based on a specified ship name (default is "SveaFB" for R/V Svea), and extracts
data (including GPS coordinates) for timestamps (rounded to the nearest minute) falling within the date ranges defined in the file names.
ifcb_get_ferrybox_data( timestamps, ferrybox_folder, parameters = c("8002", "8003"), ship = "SveaFB", latitude_param = "8002", longitude_param = "8003" )
ifcb_get_ferrybox_data( timestamps, ferrybox_folder, parameters = c("8002", "8003"), ship = "SveaFB", latitude_param = "8002", longitude_param = "8003" )
timestamps |
A vector of POSIXct timestamps for which GPS coordinates and associated parameter data are to be retrieved. |
ferrybox_folder |
A string representing the path to the folder containing Ferrybox |
parameters |
A character vector specifying the parameters to extract from the Ferrybox data. Defaults to |
ship |
A string representing the name of the ship to filter Ferrybox files. The default is "SveaFB". |
latitude_param |
A string specifying the header name for the latitude column in the Ferrybox data. Default is "8002". |
longitude_param |
A string specifying the header name for the longitude column in the Ferrybox data. Default is "8003". |
The function extracts data from files whose names match the specified ship and fall within the date ranges defined in the file names. The columns corresponding to latitude_param
and longitude_param
will be renamed to gpsLatitude
and gpsLongitude
, respectively, if they are present in the parameters
argument.
The function also handles cases where the exact timestamp is missing by attempting to interpolate the data using floor and ceiling rounding methods. The final output will ensure that all specified parameters are numeric.
A data frame containing the input timestamps and corresponding data for the specified parameters. Columns include 'timestamp', 'gpsLatitude', 'gpsLongitude' (if applicable), and the specified parameters.
## Not run: ferrybox_folder <- "/path/to/ferrybox/data" timestamps <- as.POSIXct(c("2016-08-10 10:47:34 UTC", "2016-08-10 11:12:21 UTC", "2016-08-10 11:35:59 UTC")) result <- ifcb_get_ferrybox_data(timestamps, ferrybox_folder) print(result) ## End(Not run)
## Not run: ferrybox_folder <- "/path/to/ferrybox/data" timestamps <- as.POSIXct(c("2016-08-10 10:47:34 UTC", "2016-08-10 11:12:21 UTC", "2016-08-10 11:35:59 UTC")) result <- ifcb_get_ferrybox_data(timestamps, ferrybox_folder) print(result) ## End(Not run)
This function reads a .mat file generated the ifcb-analysis
repository (Sosik and Olson 2007) and retrieves the
names of all variables stored within it.
ifcb_get_mat_names(mat_file, use_python = FALSE)
ifcb_get_mat_names(mat_file, use_python = FALSE)
mat_file |
A character string specifying the path to the .mat file. |
use_python |
Logical. If |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
A character vector of variable names.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_get_mat_variable
https://github.com/hsosik/ifcb-analysis
## Not run: # Get variable names from a MAT file variables <- ifcb_get_mat_names("path/to/file.mat") print(variables) ## End(Not run)
## Not run: # Get variable names from a MAT file variables <- ifcb_get_mat_names("path/to/file.mat") print(variables) ## End(Not run)
This function reads a specified variable from a .mat file generated by the ifcb-analysis
repository (Sosik and Olson 2007).
It can be used, for example, to extract lists of classes from the file.
ifcb_get_mat_variable( mat_file, variable_name = "class2use", use_python = FALSE )
ifcb_get_mat_variable( mat_file, variable_name = "class2use", use_python = FALSE )
mat_file |
A character string specifying the path to the .mat file containing the class information. |
variable_name |
A character string specifying the variable name in the .mat file that contains the class information.
The default is "class2use". Other examples include "class2use_manual" from a manual file, or "class2use_auto"
for a class list used for automatic assignment. You can find available variable names using the function |
use_python |
Logical. If |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
A character vector of class names.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_get_mat_names
https://github.com/hsosik/ifcb-analysis
## Not run: # Get class names from a class2use file classes <- ifcb_get_mat_variable("path/to/class2use.mat", "class2use") print(classes) # Get class names from a classifier file class2useTB <- ifcb_get_mat_variable("path/to/classified/sample.mat", "class2useTB") print(class2useTB) ## End(Not run)
## Not run: # Get class names from a class2use file classes <- ifcb_get_mat_variable("path/to/class2use.mat", "class2use") print(classes) # Get class names from a classifier file class2useTB <- ifcb_get_mat_variable("path/to/classified/sample.mat", "class2useTB") print(class2useTB) ## End(Not run)
This function imports an IFCB header file (either from a local path or URL),
extracts specific target values such as runtime and inhibittime,
and returns them in a structured format (in seconds). This is
the R equivalent function of IFCBxxx_readhdr
from the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_get_runtime(hdr_file)
ifcb_get_runtime(hdr_file)
hdr_file |
A character string specifying the full path to the .hdr file or URL. |
A list (hdr) containing runtime, inhibittime, and runType (if available) extracted from the header file.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
https://github.com/hsosik/ifcb-analysis
## Not run: # Example: Read and extract information from an IFCB header file hdr_info <- ifcb_get_runtime("path/to/IFCB_hdr_file.hdr") print(hdr_info) ## End(Not run)
## Not run: # Example: Read and extract information from an IFCB header file hdr_info <- ifcb_get_runtime("path/to/IFCB_hdr_file.hdr") print(hdr_info) ## End(Not run)
This function reads SHARK column names from a specified tab-separated values (TSV) file included in the package. These columns are used for submitting IFCB data to https://shark.smhi.se/.
ifcb_get_shark_colnames(minimal = FALSE)
ifcb_get_shark_colnames(minimal = FALSE)
minimal |
A logical value indicating whether to load only the minimal set of column names required for data submission to SHARK. Default is FALSE. |
For a detailed example of a data submission, see ifcb_get_shark_example
.
An empty data frame containing the SHARK column names.
shark_colnames <- ifcb_get_shark_colnames() print(shark_colnames) shark_colnames_minimal <- ifcb_get_shark_colnames(minimal = TRUE) print(shark_colnames_minimal)
shark_colnames <- ifcb_get_shark_colnames() print(shark_colnames) shark_colnames_minimal <- ifcb_get_shark_colnames(minimal = TRUE) print(shark_colnames_minimal)
This function reads a SHARK submission example from a file included in the package. This format is used for submitting IFCB data to https://shark.smhi.se/.
ifcb_get_shark_example()
ifcb_get_shark_example()
A data frame containing example data following the SHARK submission format.
shark_example <- ifcb_get_shark_example() # Print example as tibble dplyr::tibble(shark_example)
shark_example <- ifcb_get_shark_example() # Print example as tibble dplyr::tibble(shark_example)
This function matches a specified list of taxa with a summarized list of trophic types
for various plankton taxa from Northern Europe (data sourced from SMHI Trophic Type
).
ifcb_get_trophic_type(taxa_list = NULL, print_complete_list = FALSE)
ifcb_get_trophic_type(taxa_list = NULL, print_complete_list = FALSE)
taxa_list |
A character vector of scientific names for which trophic types are to be retrieved. |
print_complete_list |
Logical, if TRUE, prints the complete list of summarized trophic types. |
If there are multiple trophic types for a scientific name (i.e. AU and HT size classes), the summarized trophic type is "NS".
A character vector of trophic types corresponding to the scientific names in taxa_list
,
or a data frame containing all taxa and trophic types available in the SMHI Trophic Type
list.
The available trophic types are autotrophic (AU), heterotrophic (HT), mixotrophic (MX) or not specified (NS).
# Example usage: taxa_list <- c("Acanthoceras zachariasii", "Nodularia spumigena", "Acanthoica quattrospina", "Noctiluca", "Gymnodiniales") ifcb_get_trophic_type(taxa_list)
# Example usage: taxa_list <- c("Acanthoceras zachariasii", "Nodularia spumigena", "Acanthoica quattrospina", "Noctiluca", "Gymnodiniales") ifcb_get_trophic_type(taxa_list)
This function takes a list of taxa names, cleans them, retrieves their corresponding classification records from the World Register of Marine Species (WoRMS), and checks if they belong to the specified diatom class. The function only uses the first name (genus name) of each taxa for classification.
ifcb_is_diatom( taxa_list, diatom_class = "Bacillariophyceae", max_retries = 3, sleep_time = 10, marine_only = FALSE, fuzzy = deprecated(), verbose = TRUE )
ifcb_is_diatom( taxa_list, diatom_class = "Bacillariophyceae", max_retries = 3, sleep_time = 10, marine_only = FALSE, fuzzy = deprecated(), verbose = TRUE )
taxa_list |
A character vector containing the list of taxa names. |
diatom_class |
A character string or vector specifying the class name(s) to be identified as diatoms, according to WoRMS. Default is "Bacillariophyceae". |
max_retries |
An integer specifying the maximum number of attempts to retrieve WoRMS records in case of an error. Default is 3. |
sleep_time |
A numeric value indicating the number of seconds to wait between retry attempts. Default is 10 seconds. |
marine_only |
Logical. If TRUE, restricts the search to marine taxa only. Default is FALSE. |
fuzzy |
|
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
A logical vector indicating whether each cleaned taxa name belongs to the specified diatom class.
https://www.marinespecies.org/
## Not run: taxa_list <- c("Nitzschia_sp", "Chaetoceros_sp", "Dinophysis_norvegica", "Thalassiosira_sp") ifcb_is_diatom(taxa_list) ## End(Not run)
## Not run: taxa_list <- c("Nitzschia_sp", "Chaetoceros_sp", "Dinophysis_norvegica", "Thalassiosira_sp") ifcb_is_diatom(taxa_list) ## End(Not run)
This function checks if vectors of latitude and longitude points are within a user-supplied sea basin.
The Baltic Sea basins are included as a pre-packaged shapefile in the iRfcb
package.
ifcb_is_in_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
ifcb_is_in_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
latitudes |
A numeric vector of latitude points. |
longitudes |
A numeric vector of longitude points. |
plot |
A boolean indicating whether to plot the points and the sea basin. Default is FALSE. |
shape_file |
The absolute path to a custom polygon shapefile in WGS84 (EPSG:4326) that represents the specific sea basin.
Default is a land-buffered shapefile of the Baltic Sea basins, included in the |
This function reads a pre-packaged shapefile of the Baltic Sea Basin from the iRfcb
package by default, or a user-supplied
shapefile if provided. It sets the CRS, transforms the CRS to WGS84 (EPSG:4326) if necessary, and checks if the given points
fall within the specified sea basin. Optionally, it plots the points and the sea basin polygons together.
A logical vector indicating whether each point is within the specified sea basin, or a plot with the points and basins if plot = TRUE
.
# Define example latitude and longitude vectors latitudes <- c(55.337, 54.729, 56.311, 57.975) longitudes <- c(12.674, 14.643, 12.237, 10.637) # Check if the points are in the Baltic Sea Basin points_in_the_baltic <- ifcb_is_in_basin(latitudes, longitudes) print(points_in_the_baltic) # Plot the points and the basin ifcb_is_in_basin(latitudes, longitudes, plot = TRUE)
# Define example latitude and longitude vectors latitudes <- c(55.337, 54.729, 56.311, 57.975) longitudes <- c(12.674, 14.643, 12.237, 10.637) # Check if the points are in the Baltic Sea Basin points_in_the_baltic <- ifcb_is_in_basin(latitudes, longitudes) print(points_in_the_baltic) # Plot the points and the basin ifcb_is_in_basin(latitudes, longitudes, plot = TRUE)
Determines whether given positions are near land based on a coastline shape file.
The Natural Earth 1:10m land vectors are included as default shapefile in iRfcb
.
ifcb_is_near_land( latitudes, longitudes, distance = 500, shape = NULL, crs = 4326, utm_zone = 33, remove_small_islands = TRUE, small_island_threshold = 2e+06 )
ifcb_is_near_land( latitudes, longitudes, distance = 500, shape = NULL, crs = 4326, utm_zone = 33, remove_small_islands = TRUE, small_island_threshold = 2e+06 )
latitudes |
Numeric vector of latitudes for positions. |
longitudes |
Numeric vector of longitudes for positions. |
distance |
Buffer distance in meters around the coastline. Default is 500 m. |
shape |
Optional path to a shapefile containing coastline data. If provided, the function will use this shapefile instead of the default Natural Earth 1:10m land vectors. Using a more detailed shapefile allows for a smaller buffer distance. For detailed European coastlines, download polygons from the EEA at https://www.eea.europa.eu/data-and-maps/data/eea-coastline-for-analysis-2/gis-data/eea-coastline-polygon. For more detailed world maps, download from Natural Earth at https://www.naturalearthdata.com/downloads/10m-physical-vectors/. |
crs |
Coordinate reference system (CRS) to use for positions and output. Default is EPSG code 4326 (WGS84). |
utm_zone |
UTM zone for buffering the coastline. Default is 33 (between 12°E and 18°E, northern hemisphere). |
remove_small_islands |
Logical indicating whether to remove small islands from the coastline if a custom shapefile is provided. Default is TRUE. |
small_island_threshold |
Area threshold in square meters below which islands will be considered small and removed, if remove_small_islands is set to TRUE. Default is 2 sqkm. |
This function calculates a buffered area around the coastline and checks if given positions (specified by longitudes and latitudes) are within this buffer or intersect with land.
Logical vector indicating whether each position is near land.
# Define coordinates latitudes <- c(62.500353, 58.964498, 57.638725, 56.575338) longitudes <- c(17.845993, 20.394418, 18.284523, 16.227174) # Call the function near_land <- ifcb_is_near_land(latitudes, longitudes, distance = 300, crs = 4326) # Print the result print(near_land)
# Define coordinates latitudes <- c(62.500353, 58.964498, 57.638725, 56.575338) longitudes <- c(17.845993, 20.394418, 18.284523, 16.227174) # Call the function near_land <- ifcb_is_near_land(latitudes, longitudes, distance = 300, crs = 4326) # Print the result print(near_land)
This function attempts to retrieve WoRMS records using the provided taxa names. It retries the operation if an error occurs, up to a specified number of attempts.
ifcb_match_taxa_names( taxa_names, best_match_only = TRUE, max_retries = 3, sleep_time = 10, marine_only = FALSE, return_list = FALSE, verbose = TRUE, fuzzy = deprecated() )
ifcb_match_taxa_names( taxa_names, best_match_only = TRUE, max_retries = 3, sleep_time = 10, marine_only = FALSE, return_list = FALSE, verbose = TRUE, fuzzy = deprecated() )
taxa_names |
A character vector of taxa names to retrieve records for. |
best_match_only |
A logical value indicating whether to automatically select the first match and return a single match. Default is TRUE. |
max_retries |
An integer specifying the maximum number of attempts to retrieve records. |
sleep_time |
A numeric value indicating the number of seconds to wait between retry attempts. |
marine_only |
Logical. If TRUE, restricts the search to marine taxa only. Default is FALSE. |
return_list |
A logical value indicating whether to to return the output as a list. Default is FALSE, where the result is returned as a dataframe. |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
fuzzy |
A data frame (or list if return_list is TRUE) of WoRMS records or NULL if the retrieval fails after the maximum number of attempts.
## Not run: # Example: Retrieve WoRMS records for a list of taxa names taxa <- c("Calanus finmarchicus", "Thalassiosira pseudonana", "Phaeodactylum tricornutum") records <- ifcb_match_taxa_names(taxa_names = taxa, max_retries = 3, sleep_time = 5, marine_only = TRUE, verbose = TRUE) print(records) ## End(Not run)
## Not run: # Example: Retrieve WoRMS records for a list of taxa names taxa <- c("Calanus finmarchicus", "Thalassiosira pseudonana", "Phaeodactylum tricornutum") records <- ifcb_match_taxa_names(taxa_names = taxa, max_retries = 3, sleep_time = 5, marine_only = TRUE, verbose = TRUE) print(records) ## End(Not run)
This function merges two sets of manual classification data by combining
and aligning class labels from a base set and an additional set of classifications.
The merged .mat
data can be used with the code in the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_merge_manual( class2use_file_base, class2use_file_additions, class2use_file_output = NULL, manual_folder_base, manual_folder_additions, manual_folder_output, do_compression = TRUE, temp_index_offset = 50000, skip_class = NULL, quiet = FALSE )
ifcb_merge_manual( class2use_file_base, class2use_file_additions, class2use_file_output = NULL, manual_folder_base, manual_folder_additions, manual_folder_output, do_compression = TRUE, temp_index_offset = 50000, skip_class = NULL, quiet = FALSE )
class2use_file_base |
Character. Path to the |
class2use_file_additions |
Character. Path to the |
class2use_file_output |
Character. Path where the merged |
manual_folder_base |
Character. Path to the folder containing the base set of manual classification |
manual_folder_additions |
Character. Path to the folder containing the additions set of manual classification |
manual_folder_output |
Character. Path to the output folder where the merged classification files will be stored. |
do_compression |
A logical value indicating whether to compress the |
temp_index_offset |
Numeric. A large integer used to generate temporary indices during the merge process. Default is 50000. |
skip_class |
Character. A vector of class names to skip from the |
quiet |
Logical. If |
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
The base set consists of the original classifications that are used as a reference for the merging process. The additions set contains the additional classifications that need to be merged with the base set. When merging, unique class names from the additions set that are not present in the base set are appended.
The function works by aligning the class labels from the additions set with those in the base set,
handling conflicts by using a temporary index system. It copies .mat
files from both the base and
additions folders into the output folder, while adjusting indices and and class names for the additions.
Note that the maximum limit for uint16
is 65,535, so ensure that temp_index_offset
remains below this value.
No return value. Outputs the combined class2use
file in the same folder as class2use_file_base
is located or at a user-specified location,
and merged .mat
files into the output folder.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
https://github.com/hsosik/ifcb-analysis
## Not run: ifcb_merge_manual("path/to/class2use_base.mat", "path/to/class2use_additions.mat", "path/to/class2use_combined.mat", "path/to/manual/base_folder", "path/to/manual/additions_folder", "path/to/manual/output_folder", do_compression = TRUE, temp_index_offset = 50000, quiet = FALSE) ## End(Not run)
## Not run: ifcb_merge_manual("path/to/class2use_base.mat", "path/to/class2use_additions.mat", "path/to/class2use_combined.mat", "path/to/manual/base_folder", "path/to/manual/additions_folder", "path/to/manual/output_folder", do_compression = TRUE, temp_index_offset = 50000, quiet = FALSE) ## End(Not run)
This function generates and saves data about a dataset's Particle Size Distribution (PSD) from Imaging FlowCytobot (IFCB) feature and hdr files, which can be used for data quality assurance and quality control.
ifcb_psd( feature_folder, hdr_folder, save_data = FALSE, output_file = NULL, plot_folder = NULL, use_marker = FALSE, start_fit = 10, r_sqr = 0.5, beads = NULL, bubbles = NULL, incomplete = NULL, missing_cells = NULL, biomass = NULL, bloom = NULL, humidity = NULL, micron_factor = 1/3.4 )
ifcb_psd( feature_folder, hdr_folder, save_data = FALSE, output_file = NULL, plot_folder = NULL, use_marker = FALSE, start_fit = 10, r_sqr = 0.5, beads = NULL, bubbles = NULL, incomplete = NULL, missing_cells = NULL, biomass = NULL, bloom = NULL, humidity = NULL, micron_factor = 1/3.4 )
feature_folder |
The absolute path to a directory containing all of the v2 feature files for the dataset. |
hdr_folder |
The absolute path to a directory containing all of the hdr files for the dataset. |
save_data |
A boolean indicating whether to save data to CSV files. Default is FALSE. |
output_file |
A string with the base file name for the .csv to use (including path). Set to NULL to not save data (default). |
plot_folder |
The folder where graph images for each file will be saved. Set to NULL to not save graphs (default). |
use_marker |
A boolean indicating whether to show markers on the plot. Default is FALSE. |
start_fit |
An integer indicating the start fit value for the plot. Default is 10. |
r_sqr |
The lower limit of acceptable R^2 values (any curves below it will be flagged). Default is 0.5. |
beads |
The maximum multiplier for the curve fit. Any files with higher curve fit multipliers will be flagged as bead runs. If this argument is included, files with "runBeads" marked as TRUE in the header file will also be flagged as a bead run. Optional. |
bubbles |
The minimum difference between the starting ESD and the ESD with the most targets. Any files with a difference higher than this threshold will be flagged as mostly bubbles. Optional. |
incomplete |
A tuple with the minimum volume of cells (in c/L) and the minimum mL analyzed for a complete run. Any files with values below these thresholds will be flagged as incomplete. Optional. |
missing_cells |
The minimum image count to trigger count ratio. Any files with lower ratios will be flagged as missing cells. Optional. |
biomass |
The minimum number of targets in the most populated ESD bin for any given run. Any files with fewer targets will be flagged as having low biomass. Optional. |
bloom |
The minimum difference between the starting ESD and the ESD with the most targets. Any files with a difference less than this threshold will be flagged as a bloom. Will likely be lower than the bubbles threshold. Optional. |
humidity |
The maximum percent humidity. Any files with higher values will be flagged as high humidity. Optional. |
micron_factor |
The conversion factor to microns. Default is 1/3.4. |
The PSD function originates from the PSD
python repository (Hayashi et al. in prep), which can be found at https://github.com/kudelalab/PSD.
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
The function requires v2 features generated by the ifcb-analysis
MATLAB package (Sosik and Olson 2007) found at https://github.com/hsosik/ifcb-analysis.
A list with data, fits, and flags DataFrames if save_data
is FALSE; otherwise, NULL.
Hayashi, K., Walton, J., Lie, A., Smith, J. and Kudela M. Using particle size distribution (PSD) to automate imaging flow cytobot (IFCB) data quality in coastal California, USA. In prep. Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
https://github.com/kudelalab/PSD https://github.com/hsosik/ifcb-analysis
## Not run: # Initialize the python session if not already set up ifcb_py_install() ifcb_psd( feature_folder = 'path/to/features', hdr_folder = 'path/to/hdr_data', save_data = TRUE, output_file = 'psd/svea_2021', plot_folder = 'psd/plots', use_marker = FALSE, start_fit = 13, r_sqr = 0.5, beads = 10 ** 9, bubbles = 150, incomplete = c(1500, 3), missing_cells = 0.7, biomass = 1000, bloom = 5, humidity = NULL, micron_factor = 1/3.0 ) ## End(Not run)
## Not run: # Initialize the python session if not already set up ifcb_py_install() ifcb_psd( feature_folder = 'path/to/features', hdr_folder = 'path/to/hdr_data', save_data = TRUE, output_file = 'psd/svea_2021', plot_folder = 'psd/plots', use_marker = FALSE, start_fit = 13, r_sqr = 0.5, beads = 10 ** 9, bubbles = 150, incomplete = c(1500, 3), missing_cells = 0.7, biomass = 1000, bloom = 5, humidity = NULL, micron_factor = 1/3.0 ) ## End(Not run)
This function generates a plot for a given sample from Particle Size Distribution (PSD) data and fits from Imaging FlowCytobot (IFCB).
The PSD data and fits can be generated by ifcb_psd
(Hayashi et al. in prep).
ifcb_psd_plot(sample_name, data, fits, start_fit)
ifcb_psd_plot(sample_name, data, fits, start_fit)
sample_name |
The name of the sample to plot in DYYYYMMDDTHHMMSS. |
data |
A data frame containing the PSD data (data output from |
fits |
A data frame containing the fit parameters for the power curve (fits output from |
start_fit |
The x-value threshold below which data should be excluded from the plot and fit. |
A ggplot object representing the PSD plot for the sample.
Hayashi, K., Walton, J., Lie, A., Smith, J. and Kudela M. Using particle size distribution (PSD) to automate imaging flow cytobot (IFCB) data quality in coastal California, USA. In prep.
ifcb_psd
https://github.com/kudelalab/PSD
## Not run: # Analyze PSD psd <- ifcb_psd(feature_folder = 'path/to/features', hdr_folder = 'path/to/hdr_data', save_data = TRUE, output_file = 'psd/svea_2021', plot_folder = NULL, use_marker = FALSE, start_fit = 13, r_sqr = 0.5, beads = 10 ** 9, bubbles = 150, incomplete = c(1500, 3), missing_cells = 0.7, biomass = 1000, bloom = 5, humidity = NULL) # Plot PSD of the first sample plot <- ifcb_psd_plot(sample_name = "D20230316T101514", data = psd$data, fits = psd$fits, start_fit = 10) # Inspect plot print(plot) ## End(Not run)
## Not run: # Analyze PSD psd <- ifcb_psd(feature_folder = 'path/to/features', hdr_folder = 'path/to/hdr_data', save_data = TRUE, output_file = 'psd/svea_2021', plot_folder = NULL, use_marker = FALSE, start_fit = 13, r_sqr = 0.5, beads = 10 ** 9, bubbles = 150, incomplete = c(1500, 3), missing_cells = 0.7, biomass = 1000, bloom = 5, humidity = NULL) # Plot PSD of the first sample plot <- ifcb_psd_plot(sample_name = "D20230316T101514", data = psd$data, fits = psd$fits, start_fit = 10) # Inspect plot print(plot) ## End(Not run)
This function sets up the Python environment for iRfcb
. By default, it creates and activates a Python virtual environment (venv
) named "iRfcb" and installs the required Python packages from the "requirements.txt" file.
Alternatively, users can opt to use the system Python instead of creating a virtual environment by setting use_venv = FALSE
.
ifcb_py_install( envname = ".virtualenvs/iRfcb", use_venv = TRUE, packages = NULL )
ifcb_py_install( envname = ".virtualenvs/iRfcb", use_venv = TRUE, packages = NULL )
envname |
A character string specifying the name of the virtual environment to create. Default is ".virtualenvs/iRfcb". |
use_venv |
Logical. If |
packages |
A character vector of additional Python packages to install. If NULL (default), only the packages from "requirements.txt" are installed. |
No return value. This function is called for its side effect of configuring the Python environment.
## Not run: # Install the iRfcb Python environment using a virtual environment (default) ifcb_py_install() # Install the iRfcb Python environment with additional packages ifcb_py_install(packages = c("numpy", "plotly")) # Use system Python instead of a virtual environment ifcb_py_install(use_venv = FALSE) ## End(Not run)
## Not run: # Install the iRfcb Python environment using a virtual environment (default) ifcb_py_install() # Install the iRfcb Python environment with additional packages ifcb_py_install(packages = c("numpy", "plotly")) # Use system Python instead of a virtual environment ifcb_py_install(use_venv = FALSE) ## End(Not run)
This function reads feature files from a given folder or a specified set of file paths, optionally filtering them based on whether they are multiblob or single blob files.
ifcb_read_features(feature_files = NULL, multiblob = FALSE, verbose = TRUE)
ifcb_read_features(feature_files = NULL, multiblob = FALSE, verbose = TRUE)
feature_files |
A path to a folder containing feature files or a character vector of file paths. |
multiblob |
Logical indicating whether to filter for multiblob files (default: FALSE). |
verbose |
Logical. Whether to display progress information. Default is TRUE. |
A named list of data frames, where each element corresponds to a feature file read from feature_files
.
The list is named with the base names of the feature files.
## Not run: # Read feature files from a folder features <- ifcb_read_features("path/to/feature_folder") # Read only multiblob feature files multiblob_features <- ifcb_read_features("path/to/feature_folder", multiblob = TRUE) # Read feature files from a list of file paths features <- ifcb_read_features(c("path/to/file1.csv", "path/to/file2.csv")) ## End(Not run)
## Not run: # Read feature files from a folder features <- ifcb_read_features("path/to/feature_folder") # Read only multiblob feature files multiblob_features <- ifcb_read_features("path/to/feature_folder", multiblob = TRUE) # Read feature files from a list of file paths features <- ifcb_read_features(c("path/to/file1.csv", "path/to/file2.csv")) ## End(Not run)
This function reads all IFCB instrument settings information files (.hdr) from a specified directory.
ifcb_read_hdr_data( hdr_files, gps_only = FALSE, verbose = TRUE, hdr_folder = deprecated() )
ifcb_read_hdr_data( hdr_files, gps_only = FALSE, verbose = TRUE, hdr_folder = deprecated() )
hdr_files |
A character string specifying the path to feature files or a folder path. |
gps_only |
A logical value indicating whether to include only GPS information (latitude and longitude). Default is FALSE. |
verbose |
A logical value indicating whether to print progress messages. Default is TRUE. |
hdr_folder |
Use |
A data frame with sample names, GPS latitude, GPS longitude, and optionally timestamps.
## Not run: # Extract all HDR data hdr_data <- ifcb_read_hdr_data("path/to/data") print(hdr_data) # Extract only GPS data gps_data <- ifcb_read_hdr_data("path/to/data", gps_only = TRUE) print(gps_data) ## End(Not run)
## Not run: # Extract all HDR data hdr_data <- ifcb_read_hdr_data("path/to/data") print(hdr_data) # Extract only GPS data gps_data <- ifcb_read_hdr_data("path/to/data", gps_only = TRUE) print(gps_data) ## End(Not run)
This function reads a MATLAB .mat file using a Python function via reticulate
.
ifcb_read_mat(file_path)
ifcb_read_mat(file_path)
file_path |
A character string representing the full path to the .mat file. |
This function requires a Python interpreter with SciPy
installed.
This function requires a python interpreter to be installed.
The required python packages can be installed in a virtual environment using ifcb_py_install
.
A list containing the MATLAB variables.
## Not run: data <- read_mat_file_r("C:/data/sample.mat") ## End(Not run)
## Not run: data <- read_mat_file_r("C:/data/sample.mat") ## End(Not run)
This function reads a MATLAB .mat file containing aggregated and classified IFCB (Imaging FlowCytobot)
data generated by the countcells_allTBnew_user_training
function from the ifcb-analysis
repository (Sosik and Olson 2007),
or a list of classified data generated by ifcb_summarize_class_counts
.
It returns a data frame with species counts and optionally biovolume information based on specified thresholds.
ifcb_read_summary( summary, hdr_directory = NULL, biovolume = FALSE, threshold = "opt", use_python = FALSE )
ifcb_read_summary( summary, hdr_directory = NULL, biovolume = FALSE, threshold = "opt", use_python = FALSE )
summary |
A character string specifying the path to the .mat summary file or a list generated by |
hdr_directory |
A character string specifying the path to the directory containing header (.hdr) files. Default is NULL. |
biovolume |
A logical indicating whether the file contains biovolume data. Default is FALSE. |
threshold |
A character string specifying the threshold type for counts and biovolume. Options are "opt" (default), "adhoc", and "none". |
use_python |
Logical. If |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
A data frame containing the summary information including file list, volume analyzed, species counts, optionally biovolume, and other metadata.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
https://github.com/hsosik/ifcb-analysis
## Not run: summary_data <- ifcb_read_summary("path/to/summary_file.mat", biovolume = TRUE, threshold = "opt") print(summary_data) ## End(Not run)
## Not run: summary_data <- ifcb_read_summary("path/to/summary_file.mat", biovolume = TRUE, threshold = "opt") print(summary_data) ## End(Not run)
This function replaces a target class ID with a new ID in MATLAB classlist files,
generated by the code in the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_replace_mat_values( manual_folder, out_folder, target_id, new_id, column_index = 1, do_compression = TRUE )
ifcb_replace_mat_values( manual_folder, out_folder, target_id, new_id, column_index = 1, do_compression = TRUE )
manual_folder |
A character string specifying the path to the folder containing MAT classlist files to be updated. |
out_folder |
A character string specifying the path to the folder where updated MAT classlist files will be saved. |
target_id |
The target class ID to be replaced. |
new_id |
The new class ID to replace the target ID. |
column_index |
An integer value specifying which classlist column to edit. Default is 1 (manual). |
do_compression |
A logical value indicating whether to compress the .mat file. Default is TRUE. |
This function requires a python interpreter to be installed. The required python packages can be installed in a virtual environment using ifcb_py_install
.
This function does not return any value; it updates the classlist files in the specified directory.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
ifcb_py_install
https://github.com/hsosik/ifcb-analysis
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Replace class ID 99 with 1 in .mat classlist files ifcb_replace_mat_values("output/manual", "output/manual", 99, 1, column_index = 1) ## End(Not run)
## Not run: # Initialize a python session if not already set up ifcb_py_install() # Replace class ID 99 with 1 in .mat classlist files ifcb_replace_mat_values("output/manual", "output/manual", 99, 1, column_index = 1) ## End(Not run)
Launches a Shiny application that provides an interactive interface for browsing and managing IFCB (Imaging FlowCytobot) image galleries.
ifcb_run_image_gallery()
ifcb_run_image_gallery()
Users can specify a folder containing .png images, navigate through the images, select and unselect images, and download a list of selected images. This feature is particularly useful for quality control of annotated images. A downloaded list of images from the app can also be uploaded to filter and view only the selected images.
No return value. This function launches a Shiny application for interactive image browsing and management.
## Not run: # Run the IFCB image gallery Shiny app ifcb_run_image_gallery() ## End(Not run)
## Not run: # Run the IFCB image gallery Shiny app ifcb_run_image_gallery() ## End(Not run)
This function calculates aggregated biovolumes and carbon content from Imaging FlowCytobot (IFCB) samples based on biovolume information from feature files. Images are grouped into classes either based on MATLAB classification, manually annotated files, or a user-supplied list of images and their corresponding class labels (e.g. from a CNN model).
ifcb_summarize_biovolumes( feature_folder, mat_folder = NULL, class2use_file = NULL, hdr_folder = NULL, custom_images = NULL, custom_classes = NULL, micron_factor = 1/3.4, diatom_class = "Bacillariophyceae", marine_only = FALSE, threshold = "opt", feature_recursive = TRUE, mat_recursive = TRUE, hdr_recursive = TRUE, use_python = FALSE, verbose = TRUE )
ifcb_summarize_biovolumes( feature_folder, mat_folder = NULL, class2use_file = NULL, hdr_folder = NULL, custom_images = NULL, custom_classes = NULL, micron_factor = 1/3.4, diatom_class = "Bacillariophyceae", marine_only = FALSE, threshold = "opt", feature_recursive = TRUE, mat_recursive = TRUE, hdr_recursive = TRUE, use_python = FALSE, verbose = TRUE )
feature_folder |
Path to the folder containing feature files (e.g., CSV format). |
mat_folder |
(Optional) Path to the folder containing MATLAB classification or manual annotation files. |
class2use_file |
(Optional) A character string specifying the path to the file containing the class2use variable (default NULL). Only needed when summarizing manual MATLAB results. |
hdr_folder |
(Optional) Path to the folder containing HDR files. Needed for calculating cell, biovolume and carbon concentration per liter. |
custom_images |
(Optional) A character vector of image filenames in the format DYYYYMMDDTHHMMSS_IFCBXXX_ZZZZZ,
where "XXX" represents the IFCB number and "ZZZZZ" represents the ROI number.
These filenames should match the |
custom_classes |
(Optional) A character vector of corresponding class labels for |
micron_factor |
Conversion factor from microns per pixel (default: 1/3.4). |
diatom_class |
A string vector of diatom class names in the World Register of Marine Species (WoRMS). Default is "Bacillariophyceae". |
marine_only |
Logical. If TRUE, restricts the WoRMS search to marine taxa only. Default is FALSE. |
threshold |
Threshold for classification (default: "opt"). |
feature_recursive |
Logical. If TRUE, the function will search for feature files recursively within the |
mat_recursive |
Logical. If TRUE, the function will search for MATLAB files recursively within the |
hdr_recursive |
Logical. If TRUE, the function will search for HDR files recursively within the |
use_python |
Logical. If |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
This function performs the following steps:
Extracts biovolumes and carbon content from feature and classification results using ifcb_extract_biovolumes
.
Optionally incorporates volume data from HDR files to calculate volume analyzed per sample.
Computes biovolume and carbon content per liter of sample analyzed.
The MATLAB classification or manual annotation files are generated by the ifcb-analysis
repository
(Sosik and Olson 2007). Users can optionally provide a custom classification by supplying a vector of image filenames
(custom_images
) along with corresponding class labels (custom_classes
). This allows summarization
of biovolume and carbon content without requiring MATLAB classification or manual annotation files
(e.g. results from a CNN model).
Biovolumes are converted to carbon according to Menden-Deuer and Lessard 2000 for individual regions of interest (ROI), applying different conversion factors to diatoms and non-diatom protists. If provided, the function also incorporates sample volume data from HDR files to compute biovolume and carbon content per liter of sample.
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
A data frame summarizing aggregated biovolume and carbon content per class per sample. Columns include 'sample', 'classifier', 'class', 'biovolume_mm3', 'carbon_ug', 'ml_analyzed', 'biovolume_mm3_per_liter', and 'carbon_ug_per_liter'.
Menden-Deuer Susanne, Lessard Evelyn J., (2000), Carbon to volume relationships for dinoflagellates, diatoms, and other protist plankton, Limnology and Oceanography, 3, doi: 10.4319/lo.2000.45.3.0569.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
## Not run: # Example usage: ifcb_summarize_biovolumes("path/to/features", "path/to/mat", hdr_folder = "path/to/hdr") # Using custom classification result: images <- c("D20220522T003051_IFCB134_00002", "D20220522T003051_IFCB134_00003") classes = c("Mesodinium_rubrum", "Mesodinium_rubrum") ifcb_summarize_biovolumes(feature_folder = "path/to/features", hdr_folder = "path/to/hdr", custom_images = images, custom_classes = classes) ## End(Not run)
## Not run: # Example usage: ifcb_summarize_biovolumes("path/to/features", "path/to/mat", hdr_folder = "path/to/hdr") # Using custom classification result: images <- c("D20220522T003051_IFCB134_00002", "D20220522T003051_IFCB134_00003") classes = c("Mesodinium_rubrum", "Mesodinium_rubrum") ifcb_summarize_biovolumes(feature_folder = "path/to/features", hdr_folder = "path/to/hdr", custom_images = images, custom_classes = classes) ## End(Not run)
This function summarizes class results for a series of classifier output files and returns a summary data list.
ifcb_summarize_class_counts( classpath_generic, hdr_folder, year_range, use_python = FALSE )
ifcb_summarize_class_counts( classpath_generic, hdr_folder, year_range, use_python = FALSE )
classpath_generic |
Character string specifying the location of the classifier output files. The path should include 'xxxx' in place of the 4-digit year (e.g., 'classxxxx_v1/'). |
hdr_folder |
Character string specifying the directory where the data (hdr files) are located. This can be a URL for web services or a full path for local files. |
year_range |
Numeric vector specifying the range of years (e.g., 2013:2014) to process. |
use_python |
Logical. If |
If use_python = TRUE
, the function tries to read the .mat
file using ifcb_read_mat()
, which relies on SciPy
.
This approach may be faster than the default approach using R.matlab::readMat()
, especially for large .mat
files.
To enable this functionality, ensure Python is properly configured with the required dependencies.
You can initialize the Python environment and install necessary packages using ifcb_py_install()
.
If use_python = FALSE
or if SciPy
is not available, the function falls back to using R.matlab::readMat()
.
A list containing the following elements:
class2useTB |
Classes used in the TreeBagger classifier. |
classcountTB |
Counts of each class considering each target placed in the winning class. |
classcountTB_above_optthresh |
Counts of each class considering only classifications above the optimal threshold for maximum accuracy. |
ml_analyzedTB |
Volume analyzed for each file. |
mdateTB |
Dates associated with each file. |
filelistTB |
List of files processed. |
classpath_generic |
The generic classpath provided as input. |
classcountTB_above_adhocthresh (optional) |
Counts of each class considering only classifications above the adhoc threshold. |
adhocthresh (optional) |
The adhoc threshold used for classification. |
## Not run: ifcb_summarize_class_counts('C:/work/IFCB/user_training_test_data/class/classxxxx_v1/', 'C:/work/IFCB/user_training_test_data/data/', 2014) ## End(Not run)
## Not run: ifcb_summarize_class_counts('C:/work/IFCB/user_training_test_data/class/classxxxx_v1/', 'C:/work/IFCB/user_training_test_data/data/', 2014) ## End(Not run)
This function summarizes the number of images per class for each sample and timestamps,
and optionally retrieves GPS positions, and IFCB information using ifcb_read_hdr_data
and ifcb_convert_filenames
functions.
ifcb_summarize_png_counts( png_folder, hdr_folder = NULL, sum_level = "sample", verbose = TRUE )
ifcb_summarize_png_counts( png_folder, hdr_folder = NULL, sum_level = "sample", verbose = TRUE )
png_folder |
A character string specifying the path to the main directory containing subfolders (classes) with .png images. |
hdr_folder |
A character string specifying the path to the directory containing the .hdr files. Default is NULL. |
sum_level |
A character string specifying the level of summarization. Options: "sample" (default) or "class". |
verbose |
A logical indicating whether to print progress messages. Default is TRUE. |
If sum_level is "sample", returns a data frame with columns: sample, ifcb_number, class_name, n_images, gpsLatitude, gpsLongitude, timestamp, year, month, day, time, roi_numbers. If sum_level is "class", returns a data frame with columns: class_name, n_images.
ifcb_read_hdr_data
ifcb_convert_filenames
## Not run: # Example usage: # Assuming the following directory structure: # path/to/png_folder/ # |- class1/ # | |- sample1_00001.png # | |- sample1_00002.png # | |- sample2_00001.png # |- class2/ # | |- sample1_00003.png # | |- sample3_00001.png png_folder <- "path/to/png_folder" hdr_folder <- "path/to/hdr_folder" # This folder should contain corresponding .hdr files # Summarize by sample summary_sample <- ifcb_summarize_png_counts(png_folder, hdr_folder, sum_level = "sample", verbose = TRUE) print(summary_sample) # Summarize by class summary_class <- ifcb_summarize_png_counts(png_folder, hdr_folder, sum_level = "class", verbose = TRUE) print(summary_class) ## End(Not run)
## Not run: # Example usage: # Assuming the following directory structure: # path/to/png_folder/ # |- class1/ # | |- sample1_00001.png # | |- sample1_00002.png # | |- sample2_00001.png # |- class2/ # | |- sample1_00003.png # | |- sample3_00001.png png_folder <- "path/to/png_folder" hdr_folder <- "path/to/hdr_folder" # This folder should contain corresponding .hdr files # Summarize by sample summary_sample <- ifcb_summarize_png_counts(png_folder, hdr_folder, sum_level = "sample", verbose = TRUE) print(summary_sample) # Summarize by class summary_class <- ifcb_summarize_png_counts(png_folder, hdr_folder, sum_level = "class", verbose = TRUE) print(summary_class) ## End(Not run)
This function processes IFCB data by reading images, matching them to the corresponding header and feature files, and joining them into a single dataframe. This function may be useful when preparing metadata files for an Ecotaxa submission.
ifcb_summarize_png_metadata( png_folder, feature_folder = NULL, hdr_folder = NULL )
ifcb_summarize_png_metadata( png_folder, feature_folder = NULL, hdr_folder = NULL )
png_folder |
Character. The file path to the folder containing the PNG images. |
feature_folder |
Character. The file path to the folder containing the feature files (optional). |
hdr_folder |
Character. The file path to the folder containing the header files (optional). |
A dataframe that joins image data, header data, and feature data based on the sample and roi number.
## Not run: png_folder <- "path/to/pngs" feature_folder <- "path/to/features" hdr_folder <- "path/to/hdr_data" result_df <- ifcb_summarize_png_metadata(png_folder, feature_folder, hdr_folder) ## End(Not run)
## Not run: png_folder <- "path/to/pngs" feature_folder <- "path/to/features" hdr_folder <- "path/to/hdr_data" result_df <- ifcb_summarize_png_metadata(png_folder, feature_folder, hdr_folder) ## End(Not run)
This function reads an IFCB header file to extract sample run time and inhibittime,
and returns the associated estimate of sample volume analyzed (in milliliters).
The function assumes a standard IFCB configuration with a sample syringe operating
at 0.25 mL per minute. For IFCB instruments after 007 and higher (except 008). This is
the R equivalent function of IFCB_volume_analyzed
from the ifcb-analysis
repository (Sosik and Olson 2007).
ifcb_volume_analyzed(hdr_file, hdrOnly_flag = FALSE, flowrate = 0.25)
ifcb_volume_analyzed(hdr_file, hdrOnly_flag = FALSE, flowrate = 0.25)
hdr_file |
A character vector specifying the path(s) to one or more .hdr files or URLs. |
hdrOnly_flag |
An optional flag indicating whether to skip ADC file estimation (default is FALSE). |
flowrate |
Milliliters per minute for syringe pump (default is 0.25). |
A numeric vector containing the estimated sample volume analyzed for each header file.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
https://github.com/hsosik/ifcb-analysis
## Not run: # Example: Estimate volume analyzed from an IFCB header file hdr_file <- "path/to/IFCB_hdr_file.hdr" ml_analyzed <- ifcb_volume_analyzed(hdr_file) print(ml_analyzed) ## End(Not run)
## Not run: # Example: Estimate volume analyzed from an IFCB header file hdr_file <- "path/to/IFCB_hdr_file.hdr" ml_analyzed <- ifcb_volume_analyzed(hdr_file) print(ml_analyzed) ## End(Not run)
This function reads an IFCB ADC file to extract sample run time and inhibittime,
and returns the associated estimate of sample volume analyzed (in milliliters).
The function assumes a standard IFCB configuration with a sample syringe operating
at 0.25 mL per minute. For IFCB instruments after 007 and higher (except 008). This is
the R equivalent function of IFCB_volume_analyzed_fromADC
from the ifcb-analysis repository
(Sosik and Olson 2007).
ifcb_volume_analyzed_from_adc(adc_file)
ifcb_volume_analyzed_from_adc(adc_file)
adc_file |
A character vector specifying the path(s) to one or more .adc files or URLs. |
A list containing:
ml_analyzed: A numeric vector of estimated sample volume analyzed for each ADC file.
inhibittime: A numeric vector of inhibittime values extracted from ADC files.
runtime: A numeric vector of runtime values extracted from ADC files.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216.
https://github.com/hsosik/ifcb-analysis
## Not run: # Example: Estimate volume analyzed from an IFCB ADC file adc_file <- "path/to/IFCB_adc_file.csv" adc_info <- ifcb_volume_analyzed_from_adc(adc_file) print(adc_info$ml_analyzed) ## End(Not run)
## Not run: # Example: Estimate volume analyzed from an IFCB ADC file adc_file <- "path/to/IFCB_adc_file.csv" adc_info <- ifcb_volume_analyzed_from_adc(adc_file) print(adc_info$ml_analyzed) ## End(Not run)
This function identifies which sub-basin a set of latitude and longitude points belong to, using a user-specified or default shapefile.
The default shapefile includes the Baltic Sea, Kattegat, and Skagerrak basins and is included in the iRfcb
package.
ifcb_which_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
ifcb_which_basin(latitudes, longitudes, plot = FALSE, shape_file = NULL)
latitudes |
A numeric vector of latitude points. |
longitudes |
A numeric vector of longitude points. |
plot |
A boolean indicating whether to plot the points along with the sea basins. Default is FALSE. |
shape_file |
The absolute path to a custom polygon shapefile in WGS84 (EPSG:4326) that represents the sea basin.
Defaults to the Baltic Sea, Kattegat, and Skagerrak basins included in the |
This function reads a pre-packaged shapefile of the Baltic Sea, Kattegat, and Skagerrak basins from the iRfcb
package by default, or a user-supplied
shapefile if provided. The shapefiles originate from SHARK (https://shark.smhi.se/). It sets the CRS, transforms the CRS to WGS84 (EPSG:4326) if necessary, and checks if the given points
fall within the specified sea basin. Optionally, it plots the points and the sea basin polygons together.
A vector indicating the basin each point belongs to, or a ggplot object if plot = TRUE
.
# Define example latitude and longitude vectors latitudes <- c(55.337, 54.729, 56.311, 57.975) longitudes <- c(12.674, 14.643, 12.237, 10.637) # Check in which Baltic sea basin the points are in points_in_the_baltic <- ifcb_which_basin(latitudes, longitudes) print(points_in_the_baltic) # Plot the points and the basins ifcb_which_basin(latitudes, longitudes, plot = TRUE)
# Define example latitude and longitude vectors latitudes <- c(55.337, 54.729, 56.311, 57.975) longitudes <- c(12.674, 14.643, 12.237, 10.637) # Check in which Baltic sea basin the points are in points_in_the_baltic <- ifcb_which_basin(latitudes, longitudes) print(points_in_the_baltic) # Plot the points and the basins ifcb_which_basin(latitudes, longitudes, plot = TRUE)
This function creates a zip archive containing specified files and directories for manually
annotated IFCB images, organized into a structured format suitable for distribution or storage.
The MATLAB files are generated by the ifcb-analysis
repository (Sosik and Olson 2007).
The zip archive can be used to submit IFCB data to repositories like in the SMHI IFCB Plankton Image Reference Library (Torstensson et al., 2024).
ifcb_zip_matlab( manual_folder, features_folder, class2use_file, zip_filename, data_folder = NULL, readme_file = NULL, matlab_readme_file = NULL, email_address = "", version = "", print_progress = TRUE, feature_recursive = TRUE, manual_recursive = FALSE, data_recursive = TRUE, quiet = FALSE )
ifcb_zip_matlab( manual_folder, features_folder, class2use_file, zip_filename, data_folder = NULL, readme_file = NULL, matlab_readme_file = NULL, email_address = "", version = "", print_progress = TRUE, feature_recursive = TRUE, manual_recursive = FALSE, data_recursive = TRUE, quiet = FALSE )
manual_folder |
The directory containing .mat files to be included in the zip archive. |
features_folder |
The directory containing .csv files, including subfolders, to be included in the zip archive. |
class2use_file |
The path to the file (class2use_file) that will be renamed and included in the 'config' directory of the zip archive. |
zip_filename |
The filename for the zip archive to be created. |
data_folder |
Optionally, the directory containing additional data files (.roi, .adc, .hdr) to be included in the zip archive. |
readme_file |
Optionally, the path to a README file that will be updated with metadata and included in the zip archive. |
matlab_readme_file |
Optionally, the path to a MATLAB README file whose content will be appended to the end of the README file in the zip archive. |
email_address |
The email address to be included in the README file for contact information. |
version |
Optionally, the version number to be included in the README file. |
print_progress |
A logical value indicating whether to print progress bar. Default is TRUE. |
feature_recursive |
Logical. If TRUE, the function will search for feature files recursively within the |
manual_recursive |
Logical. If TRUE, the function will search for MATLAB files recursively within the |
data_recursive |
Logical. If TRUE, the function will search for data files recursively within the |
quiet |
Logical. If TRUE, suppresses messages about the progress and completion of the zip process. Default is FALSE. |
This function performs the following operations:
Lists .mat
files from manual_folder
.
Lists .csv
files from features_folder
(including subfolders).
Lists .roi
, .adc
, .hdr
files from data_folder
if provided.
Copies listed files to temporary directories (manual_dir
, features_dir
, data_dir
, config_dir
).
Renames and copies class2use_file
to config_dir
as class2use.mat
.
Updates readme_file
with metadata (if provided) and appends PNG image statistics and MATLAB README content.
Creates a manifest file (MANIFEST.txt
) listing all files in the zip archive.
Creates a zip archive (zip_filename
) containing all copied and updated files.
Cleans up temporary directories after creating the zip archive.
No return value. This function creates a zip archive containing the specified files and directories.
Sosik, H. M. and Olson, R. J. (2007), Automated taxonomic classification of phytoplankton sampled with imaging-in-flow cytometry. Limnol. Oceanogr: Methods 5, 204–216. Torstensson, Anders; Skjevik, Ann-Turi; Mohlin, Malin; Karlberg, Maria; Karlson, Bengt (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. doi:10.17044/scilifelab.25883455
ifcb_zip_pngs
https://github.com/hsosik/ifcb-analysis
## Not run: ifcb_zip_matlab("path/to/manual_files", "path/to/feature_files", "path/to/class2use.mat", "output_zip_archive.zip", data_folder = "path/to/data_files", readme_file = system.file("exdata/README-template.md", package = "iRfcb"), matlab_readme_file = system.file("inst/exdata/MATLAB-template.md", package = "iRfcb"), email_address = "[email protected]", version = "1.0") ## End(Not run)
## Not run: ifcb_zip_matlab("path/to/manual_files", "path/to/feature_files", "path/to/class2use.mat", "output_zip_archive.zip", data_folder = "path/to/data_files", readme_file = system.file("exdata/README-template.md", package = "iRfcb"), matlab_readme_file = system.file("inst/exdata/MATLAB-template.md", package = "iRfcb"), email_address = "[email protected]", version = "1.0") ## End(Not run)
This function zips directories containing .png files and optionally includes README and MANIFEST files. It can also split the resulting zip file into smaller parts if it exceeds a specified size. The zip archive can be used to submit IFCB data to repositories like in the SMHI IFCB Plankton Image Reference Library (Torstensson et al., 2024).
ifcb_zip_pngs( png_folder, zip_filename, readme_file = NULL, email_address = "", version = "", print_progress = TRUE, include_txt = FALSE, split_zip = FALSE, max_size = 500, quiet = FALSE )
ifcb_zip_pngs( png_folder, zip_filename, readme_file = NULL, email_address = "", version = "", print_progress = TRUE, include_txt = FALSE, split_zip = FALSE, max_size = 500, quiet = FALSE )
png_folder |
The directory containing subdirectories with .png files. |
zip_filename |
The name of the zip file to create. |
readme_file |
Optional path to a README file for inclusion in the zip package. |
email_address |
Optional email address to include in the README file. |
version |
Optional version information to include in the README file. |
print_progress |
A logical value indicating whether to print progress bar. Default is TRUE. |
include_txt |
A logical value indicating whether to include text (.txt, .tsv and .csv) files located in the subdirectories. Default is FALSE. |
split_zip |
A logical value indicating whether to split the zip file into smaller parts if its size exceeds |
max_size |
The maximum size (in MB) for the zip file before it gets split. Only used if |
quiet |
Logical. If TRUE, suppresses messages about the progress and completion of the zip process. Default is FALSE. |
This function does not return any value; it creates a zip archive and optionally splits it into smaller files if specified.
Torstensson, Anders; Skjevik, Ann-Turi; Mohlin, Malin; Karlberg, Maria; Karlson, Bengt (2024). SMHI IFCB Plankton Image Reference Library. SciLifeLab. Dataset. doi:10.17044/scilifelab.25883455
## Not run: # Zip all subdirectories in the 'images' folder with a README file ifcb_zip_pngs("path/to/images", "images.zip", readme_file = system.file("exdata/README-template.md", package = "iRfcb"), email_address = "[email protected]", version = "1.0") # Zip all subdirectories in the 'images' folder without a README file ifcb_zip_pngs("path/to/images", "images.zip") ## End(Not run)
## Not run: # Zip all subdirectories in the 'images' folder with a README file ifcb_zip_pngs("path/to/images", "images.zip", readme_file = system.file("exdata/README-template.md", package = "iRfcb"), email_address = "[email protected]", version = "1.0") # Zip all subdirectories in the 'images' folder without a README file ifcb_zip_pngs("path/to/images", "images.zip") ## End(Not run)
This function reads an HDR file and extracts relevant lines containing parameters and their values.
read_hdr_file(file)
read_hdr_file(file)
file |
A character string specifying the path to the HDR file. |
A data frame with columns: parameter, value, and file.
This helper function takes an existing zip file, extracts its contents, and splits it into smaller zip files without splitting subfolders.
split_large_zip(zip_file, max_size = 500, quiet = FALSE)
split_large_zip(zip_file, max_size = 500, quiet = FALSE)
zip_file |
The path to the large zip file. |
max_size |
The maximum size (in MB) for each split zip file. Default is 500 MB. |
quiet |
Logical. If TRUE, suppresses messages about the progress and completion of the zip process. Default is FALSE. |
This function does not return any value; it creates multiple smaller zip files.
## Not run: # Split an existing zip file into parts of up to 500 MB split_large_zip("large_file.zip", max_size = 500) ## End(Not run)
## Not run: # Split an existing zip file into parts of up to 500 MB split_large_zip("large_file.zip", max_size = 500) ## End(Not run)
This function reads a TreeBagger classifier result file (.mat format) and summarizes the number of targets in each class based on the classification scores and thresholds.
summarize_TBclass(classfile, adhocthresh = NULL)
summarize_TBclass(classfile, adhocthresh = NULL)
classfile |
Character string specifying the path to the TreeBagger classifier result file (.mat format). |
adhocthresh |
Numeric vector specifying the adhoc thresholds for each class. If NULL (default), no adhoc thresholding is applied. If a single numeric value is provided, it is applied to all classes. |
A list containing three elements:
classcount |
Numeric vector of counts for each class based on the winning class assignment. |
classcount_above_optthresh |
Numeric vector of counts for each class above the optimal threshold for maximum accuracy. |
classcount_above_adhocthresh |
Numeric vector of counts for each class above the specified adhoc thresholds (if provided). |
This function converts biovolume in microns^3 to carbon in picograms for large diatoms (> 2000 micron^3) according to Menden-Deuer and Lessard 2000. The formula used is: log pgC cell^-1 = log a + b * log V (um^3), with log a = -0.933 and b = 0.881 for diatoms > 3000 um^3.
vol2C_lgdiatom(volume)
vol2C_lgdiatom(volume)
volume |
A numeric vector of biovolume measurements in microns^3. |
A numeric vector of carbon measurements in picograms.
# Volumes in microns^3 volume <- c(5000, 10000, 20000) # Convert biovolume to carbon for large diatoms vol2C_lgdiatom(volume)
# Volumes in microns^3 volume <- c(5000, 10000, 20000) # Convert biovolume to carbon for large diatoms vol2C_lgdiatom(volume)
This function converts biovolume in microns^3 to carbon in picograms for protists besides large diatoms (> 3000 micron^3) according to Menden-Deuer and Lessard 2000. The formula used is: log pgC cell^-1 = log a + b * log V (um^3), with log a = -0.665 and b = 0.939.
vol2C_nondiatom(volume)
vol2C_nondiatom(volume)
volume |
A numeric vector of biovolume measurements in microns^3. |
A numeric vector of carbon measurements in picograms.
# Volumes in microns^3 volume <- c(5000, 10000, 20000) # Convert biovolume to carbon for non-diatom protists vol2C_nondiatom(volume)
# Volumes in microns^3 volume <- c(5000, 10000, 20000) # Convert biovolume to carbon for non-diatom protists vol2C_nondiatom(volume)