Subgroups
Please document your subgroup(s) here, following the template.
Subgroup 1: Museum Mollusk Segmentation
- Goal: Partition images of specimen-filled drawers and extract text from each specimen's catalog card; link the catalog data to the specimen's metadata (see the OCR sketch below for one possible starting point)
- Members:
- Nathaniel Shoobs
- Hsun-Yi Hsieh
- Rumali Perera
- Sriram Vijendran
- Amy Cogswell
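A minimal sketch of the catalog-card text-extraction step, assuming the cards have already been cropped out of the drawer images and that Tesseract (via `pytesseract`) is the OCR engine; the directory layout and function names are illustrative, not the subgroup's pipeline:

```python
# Sketch: OCR a cropped catalog-card image and keep the raw text alongside a
# specimen identifier. Assumes cards are already cropped from the drawer image;
# pytesseract/Pillow are one possible toolchain, not the group's chosen one.
from pathlib import Path

import pytesseract
from PIL import Image


def extract_card_text(card_path: Path) -> str:
    """Run OCR on a single cropped catalog-card image."""
    return pytesseract.image_to_string(Image.open(card_path))


def build_card_index(card_dir: Path) -> dict[str, str]:
    """Map each card image filename (stand-in for a specimen ID) to its OCR text."""
    return {p.stem: extract_card_text(p) for p in sorted(card_dir.glob("*.jpg"))}


if __name__ == "__main__":
    index = build_card_index(Path("cropped_cards"))  # hypothetical directory
    for specimen_id, text in index.items():
        print(specimen_id, text[:80])
```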
Document and/or link outcomes or results here when accomplished.
Subgroup 2: Kenyan Animal Behavior from Drone Video
Use existing annotated drone footage and telemetry data to optimize drone missions and improve the quality of the data collected (KABR Dataset).
- Goals:
- Stitch together annotated mini-scenes with corresponding telemetry data
- Examine the relationship between altitude and behavior
- Determine the best altitude, speed, and angle for collecting behavior data and individual ID photos
- Examine how different groups (species, sex, age) cluster together, measured by centroid size
- Calculate the distance between animals and the drone (see the distance sketch below)
- TBD
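A minimal sketch of the animal-to-drone distance calculation, assuming the telemetry provides drone latitude, longitude, and altitude above ground, and that the animal's geolocated position is at ground level; the haversine approach and function names are illustrative, not the subgroup's implementation:

```python
# Sketch: straight-line distance from drone to an animal on the ground.
# Assumes drone telemetry gives lat/lon (degrees) and altitude above ground (m),
# and that the animal's position is at ground level. Names are illustrative.
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle ground distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drone_to_animal_m(drone_lat, drone_lon, drone_alt_m, animal_lat, animal_lon) -> float:
    """Combine ground distance and drone altitude into a 3D slant distance."""
    ground = haversine_m(drone_lat, drone_lon, animal_lat, animal_lon)
    return math.hypot(ground, drone_alt_m)


# Example: drone 40 m up, animal roughly 50 m away on the ground.
print(drone_to_animal_m(0.3556, 36.9000, 40.0, 0.3560, 36.9002))
```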
- Members: Jenna, Otto, Douglas, Matt, Arnab, Andrew, Kelly, Chuck, Maksim, Namrata
Document and/or link outcomes or results here when accomplished.
Subgroup 3: Marmoset Hybrid Classification
Hybrid marmosets are being released into areas where they damage native marmoset populations in the local ecosystem. Identifying these hybrids is vital to protecting the locally endangered marmoset species. We aim to train state-of-the-art image classification models to distinguish hybrid marmosets from the parent marmoset species. The project involves image annotation, color standardization, model training, and mobile app creation.
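A minimal sketch of the model-training step, assuming a folder-per-class image layout (e.g., one folder per hybrid or parent-species class) and a pretrained torchvision ResNet as a stand-in backbone; the paths and hyperparameters are placeholders, not the subgroup's actual setup:

```python
# Sketch: fine-tune a pretrained ResNet to separate hybrid from parent species.
# Folder layout, backbone, and hyperparameters are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("marmoset_images/train", transform=transform)  # hypothetical path
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # e.g. hybrid vs. parent species

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: last batch loss {loss.item():.3f}")
```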
- Goal: To train a classification model to distinguish hybrid marmosets from parent species.
- Members:
- Joanna Malukiewicz
- David Carlyn
- Elizabeth Campolongo
- Sydney K. Decker
Subgroup 4: Short title of your subgroup here
Short description of the rationale and objective for your subgroup. You can include links to supplementary documentation that already exists.
- Goal: Enumerate and describe the goals of your subgroup
- Members: Enumerate the members of your subgroup (full or part-time)
Document and/or link outcomes or results here when accomplished.
Subgroup 5: Land cover data exploration for iNaturalist image locations
Extend information from iNaturalist with additional data about the surrounding location.
Development includes creating an add-on that can work independently of the existing Andromeda data exploration application. This add-on will dynamically merge iNaturalist species occurrence locations with land cover data from CropScape.
- Goals:
- Dynamic data exploration to provide surrounding context for species occurrences
- Inclusion of two spatial scales: local (~1/2 mile tiles) and broad (~2 mile tiles); see the sketch below for deriving these tiles from a point location
- Initial inclusion of the 2022 timescale, with the possibility of including additional years
- Identify informatics challenges with integrating existing infrastructure and data formats.
- Develop a data dictionary
- If time allows, evaluate additional data sources to integrate (e.g., RGB and near-infrared bands)
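A minimal sketch of deriving the two tile extents from an occurrence point, assuming square tiles centered on the iNaturalist latitude/longitude and a simple flat-earth approximation; the tile sizes follow the goals above, but the function name and approach are illustrative, not the LatLonCover implementation:

```python
# Sketch: bounding boxes for the local (~1/2 mile) and broad (~2 mile) tiles
# around an occurrence point, using a flat-earth approximation that is adequate
# at these scales. Names and approach are illustrative only.
import math

MILES_PER_DEG_LAT = 69.0  # approximate


def tile_bounds(lat: float, lon: float, side_miles: float) -> dict[str, float]:
    """Square tile of the given side length (miles) centered on (lat, lon)."""
    half_lat = (side_miles / 2) / MILES_PER_DEG_LAT
    # Longitude degrees shrink with latitude.
    half_lon = (side_miles / 2) / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return {
        "min_lat": lat - half_lat, "max_lat": lat + half_lat,
        "min_lon": lon - half_lon, "max_lon": lon + half_lon,
    }


# Example: one iNaturalist occurrence, both spatial scales.
occurrence = (40.0, -83.0)
local_tile = tile_bounds(*occurrence, side_miles=0.5)   # local context
broad_tile = tile_bounds(*occurrence, side_miles=2.0)   # broad context
print(local_tile, broad_tile)
```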
- Members:
- Leanna House
- Ikenna Onyekwelu
- John Bradley
- Preetika Kaur
- Amber York
Product links:
- GitHub repo: https://github.com/Imageomics/LatLonCover
- Hugging Face app: https://huggingface.co/spaces/imageomics/LatLonCover
- iNaturalist group for observations: https://www.inaturalist.org/observations/export?projects=image-datapolooza-2023
Subgroup 6: Simulating Raw Evolutionary Data in Blender
A number of deep learning models have been developed to automate evolutionary character construction from images. However, many confounding factors can shape the traits expressed in a clade, and it can be difficult to tell whether our algorithms appropriately untangle these effects and, if they do, how we can use them to address biological questions. Because of this, we are using the 3D modeling software Blender to simulate evolution on 3D models under known processes; we can then render images of the models and train a neural network on them to probe whether it can truly disentangle the processes of interest from noise and confounders.
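A minimal sketch of the "known process" side, assuming traits evolve by Brownian motion along independent lineages and that each simulated trait value would later drive a shape parameter of the Blender model; the process, rate, and trait mapping are illustrative assumptions, not the subgroup's actual simulation:

```python
# Sketch: simulate trait evolution under Brownian motion for a set of lineages.
# Each row of the output could later drive a shape key / modifier value on a
# Blender model before rendering. Rates and dimensions are placeholders.
import numpy as np

rng = np.random.default_rng(seed=0)


def simulate_brownian_traits(n_lineages: int, n_steps: int, n_traits: int, rate: float) -> np.ndarray:
    """Return final trait values, shape (n_lineages, n_traits), after a Brownian walk."""
    steps = rng.normal(0.0, np.sqrt(rate), size=(n_lineages, n_steps, n_traits))
    return steps.sum(axis=1)  # cumulative displacement from the ancestral state (0)


traits = simulate_brownian_traits(n_lineages=50, n_steps=100, n_traits=3, rate=0.01)
# e.g. map trait 0 to shell elongation, trait 1 to aperture width, trait 2 to coiling,
# then render one image per lineage in Blender.
print(traits[:5])
```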
- Goal: Simulate a set of images under a known process that can be used to explore scientific interpretability with machine learning models
- Members:
- Caleb Charpentier
- Mason Linscott
- Anna Lewkowicz
Project files, segmented images, and a link to the complete 6,432-image dataset: https://github.com/mason-linscott/Snailpalazooa2023/tree/main
Subgroup 7: Data Dashboard
The Data Dashboard (GitHub) was designed for use with animals (particularly those with subspecies or potential hybrids, e.g., butterflies). To become a more generalized tool for other domain scientists, the feature requirements (documented here) should be generalized or extended. There are also ways the current functionality could be expanded. Using the dashboard to analyze different datasets helps identify these potential improvements or shortcomings so they can be addressed.
- Goal: Add more traits or generalize to expand usage of the Data Dashboard.
- Members:
- Elizabeth Campolongo
- Sydney K. Decker
- Joanna Malukiewicz
- David Carlyn
- Dom Jebbia
Outcomes/Results:
- Telemetry Dashboard Prototype is a newly created application (particularly for analysis of the KABR telemetry data; however, it only requires `lat` and `lon` columns). This was forked from the original dashboard.
- Identification of an issue where improperly formatted latitude or longitude values break the map (documented here); see the coordinate-validation sketch below.
- Added geographic features to the map using the ArcGIS World Imagery basemap. This update is live on the dev-dashboard.
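A minimal sketch of one way to catch improperly formatted latitude/longitude values before they reach the map, assuming the data is loaded into a pandas DataFrame with `lat` and `lon` columns; the column names match the prototype's requirement above, but the validation logic is illustrative, not the dashboard's code:

```python
# Sketch: flag rows whose lat/lon values are non-numeric or out of range so the
# map layer never receives them. Column names follow the prototype; the checks
# themselves are illustrative.
import pandas as pd


def split_valid_coords(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (valid, invalid) rows based on parseable, in-range lat/lon."""
    lat = pd.to_numeric(df["lat"], errors="coerce")
    lon = pd.to_numeric(df["lon"], errors="coerce")
    ok = lat.between(-90, 90) & lon.between(-180, 180)
    valid = df.loc[ok].assign(lat=lat[ok], lon=lon[ok])
    return valid, df.loc[~ok]


if __name__ == "__main__":
    telemetry = pd.DataFrame({"lat": ["0.356", "91.2", "bad"], "lon": ["36.9", "36.9", "36.9"]})
    good, bad = split_valid_coords(telemetry)
    print(f"{len(bad)} rows dropped before mapping")
```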
Subgroup 8: CLIP-training-suitable dataset from species description figure images
The Plazi project has accumulated a large repository of figures and figure captions from species description publications. However, figure captions typically consist of multiple subcaptions, each of which describes a different subfigure, and the figures themselves are images composed of multiple subfigures. In theory, these high-quality biodiversity specimen images with associated textual descriptions (of species, and often of occurrence location and/or notable traits shown in the (sub)figure) could be very valuable for training CLIP models for biology, but for this to be effective we need pairs consisting of one subfigure image and a text description that describes only that subfigure.
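A minimal sketch of the caption-splitting step, assuming subcaptions are introduced by letter labels such as "A." or "B," at the start of each segment (a common figure-caption convention, but an assumption here); the regex and pairing logic are illustrative, not an agreed pipeline:

```python
# Sketch: split a multi-part figure caption into (subfigure label, subcaption)
# pairs that could later be matched to cropped subfigure images. The label
# pattern is an assumption about caption style, not a Plazi specification.
import re

# Matches labels such as "A.", "B,", or "A-C" at the start of a caption segment.
LABEL_RE = re.compile(r"\b([A-H](?:-[A-H])?)[.,]\s+")


def split_caption(caption: str) -> list[tuple[str, str]]:
    """Return (label, subcaption) pairs from a combined figure caption."""
    parts = LABEL_RE.split(caption)
    # parts = [preamble, label1, text1, label2, text2, ...]
    return [(parts[i], parts[i + 1].strip().rstrip(";,")) for i in range(1, len(parts) - 1, 2)]


caption = ("FIGURE 3. Conus example. A. Shell in dorsal view; B. Shell in ventral view; "
           "C. Protoconch detail.")
for label, text in split_caption(caption):
    print(label, "->", text)
```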
- Goal: Split Plazi figures and their captions into (subfigure image, subcaption) pairs suitable for CLIP model training
- Members: Jim Balhoff (remote), Hilmar Lapp, Nicky Nicolson
Document and/or link outcomes or results here when accomplished.