Output file format options - KewBridge/specimens2illustrations GitHub Wiki
"The Herbarium at the Royal Botanic Gardens Kew houses approximately seven million plant specimens, collected from all around the world. Specimens are either pressed and dried or preserved in spirit. Kew is committed to making this important collection more accessible to botanists and others, wherever they may be, for use in their own projects: particularly in biodiversity, conservation, sustainable development and systematics. To this end Kew is building an electronic Herbarium Catalogue containing images of the specimens and information taken from their collection labels. Specimens represented in Kew’s digital collections have been collected over a period spanning three centuries, with examples dating back to the beginning of the 18th century. These include over 300,000 putative type and historically important specimens collected by plant hunters, explorers and scientists of great renown including Charles Darwin, Joseph Dalton Hooker and Nathaniel Wallich, to name just a few."
Royal Botanic Gardens, Kew - Herbarium Specimens
The specimens2illustrations dataset contains a digitised collection of specimens from the Herbarium at the Royal Botanic Gardens, Kew, in the form of images.
"The COCO (Common Objects in Context) format is a standard format for storing and sharing annotations for images and videos. It was developed for the COCO image and video recognition challenge, which is a large-scale benchmark for object detection and image segmentation.
In the COCO format, annotations are stored in a JSON file, which contains information about the image or video, including the file path, size, and a list of annotated objects. Each object is represented by a bounding box, which specifies the location and size of the object in the image, as well as a label indicating the class of the object.
The JSON file includes the following fields:
“info”: This field contains metadata about the dataset, such as the version, description, and contributor information.
“licenses”: This field contains information about the licenses associated with the images and videos in the dataset.
“images”: This field contains a list of dictionaries, each representing an image in the dataset. Each dictionary includes the following fields:
- “id”: A unique identifier for the image.
- “width”: The width of the image in pixels.
- “height”: The height of the image in pixels.
- “file_name”: The file name of the image.
- “license”: The license associated with the image.
- “date_captured”: The date and time the image was captured (optional).
- “coco_url”: A URL to the image on the COCO website (optional).
- “flickr_url”: A URL to the image on Flickr (optional).
“annotations”: This field contains a list of dictionaries, each representing an annotated object in the dataset. Each dictionary includes the following fields:
- “id”: A unique identifier for the annotation.
- “image_id”: The identifier of the image containing the annotated object.
- “category_id”: The identifier of the category (i.e., class) of the annotated object.
- “bbox”: A list of four numbers representing the bounding box of the annotated object in the format [x, y, width, height], where (x, y) is the top-left corner of the bounding box.
- “area”: The area of the bounding box in square pixels.
- “iscrowd”: A Boolean value indicating whether the annotated object is part of a crowd (optional).
- “segmentation”: A list of lists of points representing the outline of the object (optional).
- “keypoints”: A list of keypoints (i.e., important points or features) on the object, along with their visibility and position (optional).
- “num_keypoints”: The number of keypoints in the “keypoints” field (optional).
- “attributes”: A dictionary of attributes for the annotated object (optional)."
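The fields described above can be assembled in a few lines of Python. This is a minimal sketch rather than a reference implementation: the helper names (`make_coco_skeleton`, `add_image`, `add_annotation`), the file name `plate_0001.jpg` and the category name `figure` are all hypothetical. It also includes a top-level `categories` list, which the quoted description does not mention but which standard COCO detection files use to define what each `category_id` means.

```python
import json


def make_coco_skeleton(description):
    """Return an empty COCO-style dictionary with the top-level fields."""
    return {
        "info": {"description": description, "version": "1.0"},
        "licenses": [],
        "images": [],
        "annotations": [],
        "categories": [],  # not in the quoted text; standard in COCO detection files
    }


def add_image(coco, file_name, width, height):
    """Append an image record and return its id (ids start at 1 here)."""
    image_id = len(coco["images"]) + 1
    coco["images"].append(
        {"id": image_id, "file_name": file_name, "width": width, "height": height}
    )
    return image_id


def add_annotation(coco, image_id, category_id, bbox):
    """Append a bounding-box annotation; bbox is [x, y, width, height] in pixels."""
    x, y, w, h = bbox
    ann_id = len(coco["annotations"]) + 1
    coco["annotations"].append(
        {
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            "bbox": [x, y, w, h],
            "area": w * h,  # area of the box in square pixels
            "iscrowd": 0,
        }
    )
    return ann_id


# Hypothetical example: one illustration plate with one annotated figure.
coco = make_coco_skeleton("hypothetical specimen illustrations")
coco["categories"].append({"id": 1, "name": "figure"})
img_id = add_image(coco, "plate_0001.jpg", 1200, 1600)
add_annotation(coco, img_id, 1, [100, 150, 400, 300])
coco_json = json.dumps(coco, indent=2)
```

Writing `coco_json` to a file would give a single annotations file for the dataset.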
Now let's have a look at the COCO file format:
When creating a new object detection dataset, the COCO file format is an effective choice because of its simplicity and widespread use.
"At a high level, the COCO format defines exactly how your annotations (bounding boxes, object classes, etc) and image metadata (like height, width, image sources, etc) are stored on disk.
Files on disk: The folder structure of a COCO dataset looks like this:
``<dataset_dir>/
    data/
        <filename0>.<ext>
        <filename1>.<ext>
        ...
    labels.json``"
How to work with object detection datasets in COCO format
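Given the folder layout quoted above, a loader only needs to read `labels.json` and join each `file_name` with the `data/` directory. The sketch below assumes exactly that layout. The function name `list_image_paths` and the demo file `plate_0001.jpg` are hypothetical, and the demo builds a throwaway directory so that it runs without any real data.

```python
import json
import tempfile
from pathlib import Path


def list_image_paths(dataset_dir):
    """Map each image id in labels.json to its expected path under data/."""
    dataset_dir = Path(dataset_dir)
    with open(dataset_dir / "labels.json", encoding="utf-8") as f:
        coco = json.load(f)
    return {img["id"]: dataset_dir / "data" / img["file_name"] for img in coco["images"]}


# Demo on a temporary directory that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "data" / "plate_0001.jpg").touch()  # empty placeholder file
    labels = {
        "images": [
            {"id": 1, "file_name": "plate_0001.jpg", "width": 1200, "height": 1600}
        ],
        "annotations": [],
        "categories": [],
    }
    (root / "labels.json").write_text(json.dumps(labels), encoding="utf-8")

    paths = list_image_paths(root)
    all_exist = all(p.exists() for p in paths.values())
    first_name = paths[1].name
```

Checking that every resolved path exists is a cheap way to catch a mismatch between `labels.json` and the files in `data/` before training.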
Strengths:
- COCO is a proven standard for storing and sharing annotated image datasets, with support for category labels, segmentation masks and bounding box annotations.
- COCO can handle multi-figure images by annotating each individual object separately.
- COCO offers flexibility in creating custom attributes and can be adapted for text labeling.
- COCO has a strong community and a variety of tools and libraries for working with COCO-formatted datasets.
Challenges:
- The COCO file format was originally designed for object detection annotations, so it might not fully capture the structure and complexity of text labels.
- The COCO file format might require adaptation or customisation to accommodate text data.
"YOLO: In YOLO labeling format, a .txt file with the same name is created for each image file in the same directory. Each .txt file contains the annotations for the corresponding image file, that is object class, object coordinates, height and width."
Image Data Labelling and Annotation — Everything you need to know
"Both image files in the images folder and its relative text file in the labels folder must have the same filename.
For example:
images
→0001.jpg
labels
→ 0001.txt
Each text file must fulfill all the properties of the YOLO format text file which are the following:
- The first element of each row is a class id, then bounding box properties (x, y, width, height).
- Bounding box properties must be normalized (0–1).
- (x, y) should be the mid-points of a box."
Converting a custom dataset from COCO format to YOLO format
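The rules quoted above imply a simple conversion from a COCO box (top-left corner, pixel units) to a YOLO row (box centre, normalised to 0–1). The following is a minimal sketch of that arithmetic. The function names `coco_bbox_to_yolo` and `yolo_line` are hypothetical, and the mapping from COCO `category_id` to a YOLO class id is assumed to be handled by the caller.

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x, y, width, height] pixel box to normalised YOLO values.

    COCO stores the top-left corner; YOLO stores the box centre,
    with every value divided by the image width or height.
    """
    x, y, w, h = bbox
    x_center = (x + w / 2) / img_width
    y_center = (y + h / 2) / img_height
    return (x_center, y_center, w / img_width, h / img_height)


def yolo_line(class_id, bbox, img_width, img_height):
    """Format one row of a YOLO label file: class id followed by four floats."""
    values = coco_bbox_to_yolo(bbox, img_width, img_height)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A 400x300 box at (100, 150) in a 1200x1600 image, class id 0.
line = yolo_line(0, [100, 150, 400, 300], 1200, 1600)
# line == "0 0.250000 0.187500 0.333333 0.187500"
```

Each image would get one such row per annotated object, written to the `.txt` file that shares the image's filename.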
Strengths:
- YOLO's format is designed for object detection, so each object is annotated with its class.
- YOLO can handle multi-figure images by annotating each object separately.
- YOLO's annotation format is relatively simple.
Challenge:
- While YOLO's format is simpler, it may not directly support features like text labels without modification.
The specimen illustration dataset consists of species names, descriptions and multi-figure illustration images. Considering the strengths and challenges of both the COCO and YOLO file formats, COCO could be the better choice for the following reasons:
- Rich annotations: COCO allows for detailed annotations including bounding boxes, segmentation masks, and captions. The caption field could be used to store the textual descriptions associated with the images.
- COCO's annotations can be extended or customised to include additional attributes such as text labels.
- "The COCO-Text dataset is a dataset for text detection and recognition consisting of non-text images, legible images and illegible text images."
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
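To make the reasons above more concrete, here is a minimal sketch of how textual information might sit alongside bounding boxes in COCO-style records. Several things here are assumptions: the helper names, the `label_text` key inside `attributes`, and the placeholder caption are all hypothetical, and keeping caption records and box records in one list is a simplification, since the standard COCO releases store captions and detections in separate files.

```python
def caption_annotation(ann_id, image_id, caption):
    """Caption-style record: free text attached to a whole image."""
    return {"id": ann_id, "image_id": image_id, "caption": caption}


def figure_annotation(ann_id, image_id, category_id, bbox, label_text):
    """Detection-style record for one figure, with its text label as a custom attribute."""
    x, y, w, h = bbox
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],
        "area": w * h,
        "iscrowd": 0,
        "attributes": {"label_text": label_text},  # hypothetical custom key
    }


def text_for_image(annotations, image_id):
    """Collect the captions and figure labels recorded for one image."""
    captions = [
        a["caption"]
        for a in annotations
        if a["image_id"] == image_id and "caption" in a
    ]
    labels = [
        a["attributes"]["label_text"]
        for a in annotations
        if a["image_id"] == image_id and "attributes" in a
    ]
    return captions, labels


# One multi-figure plate: a description plus two labelled figures.
annotations = [
    caption_annotation(1, 1, "Placeholder species description."),
    figure_annotation(2, 1, 1, [100, 150, 400, 300], "A"),
    figure_annotation(3, 1, 1, [600, 150, 400, 300], "B"),
]
captions, labels = text_for_image(annotations, 1)
```

Tools that read COCO files typically ignore keys they do not recognise, but that behaviour should be checked for whichever library is used.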