Subgroups - Imageomics/Image-Datapalooza-2023 GitHub Wiki

Please document your subgroup(s) here, following the template.

Subgroup 1: Museum Mollusk Segmentation

Short description of the rationale and objective for your subgroup. You can include links to supplementary documentation that already exists.

  • Goal: Partition images of specimen-filled drawers and extract text from each specimen's catalog card; link the catalog data to the specimen's metadata
  • Members:
    • Nathaniel Shoobs
    • Hsun-Yi Hsieh
    • Rumali Perera
    • Sriram Vijendran
    • Amy Cogswell

Document and/or link outcomes or results here when accomplished.

Subgroup 2: Kenyan Animal Behavior from Drone Video

Use existing annotated drone footage and telemetry data to optimize drone missions and improve the quality of the data collected. KABR Dataset

  • Goals:
  1. Stitch together annotated mini-scenes with corresponding telemetry data
  2. Examine the relationship between altitude and behavior
  3. Determine the best altitude, speed, and angle for collecting behavior data and individual ID photos.
  4. Examine how different groups (by species, sex, or age) cluster together, as measured by centroid size.
  5. Calculate distance between animals and drone
  6. TBD
  • Members: Jenna, Otto, Douglas, Matt, Arnab, Andrew, Kelly, Chuck, Maksim, Namrata
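For goal 5, the animal-to-drone distance can be computed from the telemetry. The sketch below is a minimal, hypothetical helper (the function name, argument names, and units are assumptions, not part of the KABR telemetry schema): it combines the haversine ground distance with the drone's altitude, treating the animal as being at ground level.

```python
import math

def drone_to_animal_distance_m(drone_lat, drone_lon, drone_alt_m,
                               animal_lat, animal_lon):
    """Straight-line distance (m) from drone to a ground-level animal.

    Hypothetical helper for illustration; assumes altitude is measured
    above the animal's ground level.
    """
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(drone_lat), math.radians(animal_lat)
    dphi = math.radians(animal_lat - drone_lat)
    dlam = math.radians(animal_lon - drone_lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    ground_m = 2 * r * math.asin(math.sqrt(a))  # haversine ground distance
    return math.hypot(ground_m, drone_alt_m)    # add the altitude component
```

A flat-terrain assumption like this is likely adequate at typical survey altitudes; terrain elevation data could refine it later.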

Document and/or link outcomes or results here when accomplished.

Subgroup 3: Marmoset Hybrid Classification

Hybrid marmosets are being released into areas where they damage native marmoset populations in the local ecosystem. Identifying these hybrids is vital to protecting locally endangered marmoset species. We aim to train state-of-the-art image classification models to distinguish hybrid marmosets from their parent species. The project involves image annotation, color standardization, model training, and mobile app creation.

  • Goal: To train a classification model to distinguish hybrid marmosets from parent species.
  • Members:
    • Joanna Malukiewicz
    • David Carlyn
    • Elizabeth Campolongo
    • Sydney K. Decker

Brainstorming Document

GitHub

Subgroup 4: Short title of your subgroup here

Short description of the rationale and objective for your subgroup. You can include links to supplementary documentation that already exists.

  • Goal: Enumerate and describe the goals of your subgroup
  • Members: Enumerate the members of your subgroup (full or part-time)

Document and/or link outcomes or results here when accomplished.

Subgroup 5: Land cover data exploration for iNaturalist image locations

Extend iNaturalist occurrence records with additional data about the surrounding location.

Development includes creating an add-on that can work independently of the existing Andromeda data exploration application. The add-on will dynamically merge species occurrence locations from iNaturalist with land cover data from CropScape.

  • Goal:

    • Dynamic data exploration providing surrounding context for species occurrences
    • Inclusion of two spatial scales: local (~1/2 mile tiles) and broad (~2 mile tiles)
    • Initial inclusion of the 2022 time period, with the possibility of adding additional years
    • Identify informatics challenges with integrating existing infrastructure and data formats
    • Develop a data dictionary
    • If time allows, evaluate additional data sources to integrate (e.g., RGB and near-infrared bands)
  • Members:

    • Leanna House
    • Ikenna Onyekwelu
    • John Bradley
    • Preetika Kaur
    • Amber York
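The two spatial scales can be expressed as bounding boxes around each occurrence point. The sketch below is a hypothetical helper (name and signature are assumptions) that computes an approximate lat/lon bounding box for a square tile of a given edge length; an equirectangular approximation is adequate at these tile sizes.

```python
import math

def tile_bounds(lat, lon, tile_miles):
    """Approximate (south, west, north, east) bounds of a square tile
    centred on a point, for the proposed ~0.5 and ~2 mile scales.

    Illustrative sketch: uses an equirectangular approximation, which
    is fine for tiles this small away from the poles.
    """
    half_m = tile_miles * 1609.344 / 2          # half the tile edge, in metres
    dlat = half_m / 111320.0                    # metres per degree of latitude
    dlon = half_m / (111320.0 * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)
```

Such a box could then be passed to a land-cover service to summarize the crop and land cover classes surrounding each occurrence.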

Product links:

Subgroup 6: Simulating Raw Evolutionary Data in Blender

A number of deep learning models have been developed to automate evolutionary character construction from images. However, many confounding factors can shape the traits expressed in a clade, and it can be difficult to interpret whether our algorithms are appropriately untangling these effects and, if they are, how we can use them to address biological questions. We are therefore using the 3D modeling software Blender to simulate evolution on 3D models under known processes, render images of the results, and train a neural network on them to probe whether it can truly disentangle the processes of interest from noise and confounders.

  • Goal: Simulate a set of images under a known process that can be used to explore scientific interpretability with machine learning models
  • Members:
    • Caleb Charpentier
    • Mason Linscott
    • Anna Lewkowicz

Project files, segmented images, and a link to the complete 6,432-image dataset: https://github.com/mason-linscott/Snailpalazooa2023/tree/main.

Subgroup 7: Data Dashboard

The Data Dashboard (GitHub) was designed for use with animals (particularly those with subspecies or potential hybrids, e.g., butterflies). To make it a more generalized tool for other domain scientists, the feature requirements (documented here) should be generalized or extended. There are also ways the current functionality could be expanded. By using the dashboard to analyze different datasets, these potential improvements or shortcomings can be identified and addressed.

  • Goal: Add more traits or generalize to expand usage of the Data Dashboard.
  • Members:
    • Elizabeth Campolongo
    • Sydney K. Decker
    • Joanna Malukiewicz
    • David Carlyn
    • Dom Jebbia

Outcomes/Results:

  • The Telemetry Dashboard Prototype is a newly created application (built particularly for analysis of the KABR telemetry data, though it only requires lat and lon columns). It was forked from the original dashboard.
  • Identified an issue in which improperly formatted latitude or longitude values break the map (documented here).
  • Added geographic features to the map using the ArcGIS World Imagery basemap. This update is live on the dev-dashboard.

Subgroup 8: Building a CLIP-training dataset from species description figures

The Plazi project has accumulated a large repository of figures and figure captions from species description publications. However, a figure typically consists of multiple subfigures, and its caption consists of multiple subcaptions, each describing a different subfigure. In principle, these high-quality biodiversity specimen images with their associated textual descriptions (of the species, and often of the occurrence location and/or notable traits shown in each subfigure) could be very valuable for training CLIP models for biology. For this to be effective, however, we need pairs consisting of one subfigure image and the text description of that subfigure alone.
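One piece of the pipeline is splitting a multi-part caption into per-subfigure subcaptions. The sketch below assumes a hypothetical caption format in which subcaptions are introduced by labels like "A." or "B)"; real Plazi captions vary, so the pattern and function name are assumptions for illustration only.

```python
import re

def split_subcaptions(caption):
    """Split a multi-part figure caption into (label, text) pairs.

    Assumes subcaptions are introduced by "A.", "B)", etc. (labels A-H);
    real captions vary, so this pattern is an illustrative assumption.
    """
    # Split just before each label without consuming it (lookahead).
    chunks = re.split(r'(?=\b[A-H][.)]\s)', caption)
    pairs = []
    for chunk in chunks:
        m = re.match(r'([A-H])[.)]\s+(.*)', chunk.strip())
        if m:  # skip any leading text before the first label
            pairs.append((m.group(1), m.group(2).strip()))
    return pairs
```

A heuristic like this would still need review for captions that use other label styles, or where species abbreviations (e.g., a genus initial followed by a period) collide with the label pattern.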

  • Goal: Enumerate and describe the goals of your subgroup
  • Members: Jim Balhoff (remote), Hilmar Lapp, Nicky Nicolson

Document and/or link outcomes or results here when accomplished.