Components of Dashboard - prekijpatel/MetaMiner GitHub Wiki

The components of the meta-mined dashboard are outlined below from top to bottom, along with any important limitations or tips. This should help in understanding the layout and making the most out of the dashboard.


LED Label 🔢 – Total Assemblies

At the very top of the dashboard, you’ll see a glowing LED-style label showing the total number of assemblies/genomes currently available. As you apply filters using other components of the dashboard, this number updates dynamically to reflect the current subset of data you're working with.

LED


Radio Buttons – Atypical & Suppressed Assemblies

Next, you’ll find two radio buttons that let you choose whether atypical assemblies and suppressed assemblies are included in or excluded from the dataset.

When you toggle these options, a short message will appear letting you know how many assemblies were included or excluded based on your selection.

Radio Buttons


Submission Year Slider 🕒

This is a simple range slider that lets you select assemblies based on their year of submission. Use it to focus your analysis on specific timeframes. We recommend using this slider together with the Submission Over the Years line chart to investigate trends.


The Choropleth Map 🌍

The choropleth map gives you a geographic perspective on your dataset—showing the distribution of isolates across the world or within individual countries.

Global View

By default, the dashboard opens in Global View, where you’ll see the number of isolates per country visualized on a logarithmic color scale (to handle big differences in counts 😅). Hover over a country to view its exact isolate count.
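To illustrate why a logarithmic scale helps here, the following is a minimal Python sketch, not MetaMiner's actual code. It assumes base-10 logarithms and uses made-up counts; the function name `log_color_value` is hypothetical.

```python
import math

def log_color_value(count):
    """Map an isolate count to a log10 value for the color scale.

    Countries with zero isolates return None so they can be left uncolored.
    """
    if count <= 0:
        return None
    return math.log10(count)

# Hypothetical per-country isolate counts (not real MetaMiner data).
counts = {"USA": 25000, "IND": 480, "BRA": 12, "ISL": 0}
color_values = {iso: log_color_value(n) for iso, n in counts.items()}

# Hover text keeps the exact count even though the color is log-scaled.
hover_text = {iso: f"{iso}: {n} isolates" for iso, n in counts.items()}
```

With these values, a roughly 2,000-fold gap between the largest and smallest nonzero counts becomes a difference of about 3.3 on the color axis, so countries with few isolates remain distinguishable.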

Country-Level View

Select a specific country (e.g., India or Brazil) from the dropdown menu to zoom in and view regional distribution—like states or provinces.

Just below the map, there is also a small message bar displaying:

  • The total isolates (after all filters are applied!)
  • The number of isolates excluded from the map because their geographical data is missing

Side Note: The Plotly package used for this visualization does not natively support country-level views. In our dashboard, this functionality is enabled through custom GeoJSON files, which were generated from the GADM geopackage of detailed administrative boundaries. We manually curated these GeoJSONs so that every region at the ADM1 level (such as states or provinces) carries the correct ISO code. Files that lacked a proper geographical structure were rebuilt using GeoJSONs from https://simplemaps.com/. This ensures accurate linking between the normalized data and the geographic plots. These GeoJSONs may also be useful for other applications; they are included in the supporting materials.
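The linking step described in this note can be sketched roughly as follows. This is an illustration under assumptions, not the curation code used for MetaMiner: the property name `iso_code`, both helper functions, and the tiny hand-made FeatureCollection are hypothetical.

```python
def features_missing_iso(geojson, key="iso_code"):
    """Return names of features whose properties lack the ISO code field."""
    return [
        f["properties"].get("name", "<unnamed>")
        for f in geojson["features"]
        if not f["properties"].get(key)
    ]

def counts_by_feature(geojson, counts, key="iso_code"):
    """Attach an isolate count (0 if absent) to every region that has an ISO code."""
    return {
        f["properties"][key]: counts.get(f["properties"][key], 0)
        for f in geojson["features"]
        if f["properties"].get(key)
    }

# Hand-made example; geometry is omitted (None) for brevity.
india_adm1 = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Gujarat", "iso_code": "IN-GJ"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Kerala", "iso_code": "IN-KL"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Unknown region"}, "geometry": None},
    ],
}

# Hypothetical isolate counts keyed by ISO 3166-2 code.
isolates = {"IN-GJ": 42, "IN-MH": 7}

needs_curation = features_missing_iso(india_adm1)
region_counts = counts_by_feature(india_adm1, isolates)
```

Here `needs_curation` lists the regions that would still need an ISO code before they can be plotted, and `IN-MH` is silently dropped because no feature carries that code, which is the kind of mismatch manual curation is meant to prevent. With Plotly Express, a curated file like this would typically be passed to `plotly.express.choropleth` through its `geojson`, `locations`, and `featureidkey` arguments (for example `featureidkey="properties.iso_code"`).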

Current Limitations:

  • You can only select one country at a time.
  • It's not yet possible to select individual states or provinces. (coming soon! 😄)

choropleth


Assembly-Level 📊

This chart gives you a breakdown of the assembly levels present in your dataset. Genomes in NCBI can be submitted at different assembly levels, such as:

  • Complete Genome
  • Chromosome
  • Scaffold
  • Contig

These levels reflect the continuity and completeness of the genome assembly. The chart shows the count of genomes available for each of these categories.

At the top of the bar chart, there is a row of checkboxes. These can be used to select specific assembly levels, depending on your need.

assembly_level


Annotation Providers 📦

Assemblies submitted to NCBI are usually annotated by either:

  • GenBank
  • RefSeq
  • or a custom annotation provided by the submitter.

In most cases, both GenBank and RefSeq annotations are available for the same assembly. This introduces redundancy in the metadata and can confuse downstream processing. To deal with this, MetaMiner automatically removes the GenBank version from the refined dataset whenever both are present, retaining only the RefSeq annotation. This ensures cleaner and more consistent metadata. (Don't worry: the original GenBank entries are still saved in the raw dataset in case they are needed.)
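As a rough illustration of this deduplication rule, here is a minimal Python sketch. It is not MetaMiner's implementation. It assumes that a GenBank/RefSeq pair can be recognized by the shared numeric part of the accession (`GCA_` versus `GCF_` prefix, version suffix ignored); the function name and the placeholder accessions are made up.

```python
def assembly_key(accession):
    """Numeric part of an accession, without prefix or version (assumed pairing key)."""
    return accession[4:].split(".")[0]

def drop_redundant_genbank(records):
    """Drop GenBank (GCA_) records that have a RefSeq (GCF_) counterpart."""
    refseq_keys = {
        assembly_key(r["accession"])
        for r in records
        if r["accession"].startswith("GCF_")
    }
    return [
        r for r in records
        if not (
            r["accession"].startswith("GCA_")
            and assembly_key(r["accession"]) in refseq_keys
        )
    ]

# Placeholder accessions for illustration only.
raw = [
    {"accession": "GCA_000000001.1"},
    {"accession": "GCF_000000001.1"},
    {"accession": "GCA_000000002.1"},  # GenBank only, so it is kept
]
refined = drop_redundant_genbank(raw)
```

The `raw` list is left untouched, mirroring the idea that the original GenBank entries remain available in the raw dataset.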

Here as well, the checkboxes above the chart can be used to choose the needed annotation sources.

annotation_from


📈 Submission Over the Years—Line Chart

This line chart visualizes the number of genomes submitted over the years. It’s especially useful when paired with the year-range slider, allowing you to filter the dataset by specific submission years and observe trends over time—like spikes in submissions during outbreak events.

submission_years


Sequencing Technologies Over the Years

This chart shows how different sequencing technologies (or their combinations) have been used over time to generate genome assemblies. It also highlights how normalization of the sequencing technology field helped bring consistency—merging varied naming formats into unified categories—making the trends easier to interpret.

  • Each circle represents genome assemblies submitted in a given year that were sequenced using a specific technology.
  • The position of the circle shows the year, and the size reflects the log-transformed count of assemblies.

Sequencing Technologies

Note: Bubble sizes are on a log scale, so even small differences in size can mean big differences in actual counts. Please interpret carefully!
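To make this concrete, here is a small worked example in Python. The scaling used by the dashboard is not documented here, so the base-10 logarithm and the pixel parameters `base` and `step` are assumptions chosen purely for illustration.

```python
import math

def bubble_size(count, base=6.0, step=8.0):
    """Bubble diameter in pixels: grows by `step` for every 10x increase in count."""
    return base + step * math.log10(count)

def count_ratio(size_a, size_b, step=8.0):
    """How many times more assemblies bubble A represents than bubble B."""
    return 10 ** ((size_a - size_b) / step)

small = bubble_size(50)    # hypothetical year with 50 assemblies
large = bubble_size(5000)  # hypothetical year with 5000 assemblies
ratio = count_ratio(large, small)
```

Under these assumed parameters, the larger bubble is only 16 pixels wider than the smaller one, yet it stands for about 100 times as many assemblies.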


📊 Coverage Depth – Bar Chart

This bar chart shows the distribution of coverage depth among the selected assemblies. The values are grouped (binned) for clarity, and you can use the slider above the chart to filter assemblies by coverage range. The selected range is also shown just below the slider for quick reference.

Side Note: During development, we noticed that some assemblies reported unusually high, impractical coverage depths (e.g., 50,000× or even 4.9 million× 😬). These may result from data entry errors or from a misunderstanding of what the field is meant to contain. Naturally, such extreme values distort the visualization.

So, to keep things streamlined:

  • We grouped all values above 5000× into a single bin labeled >5000×, which you can choose to include or exclude using a checkbox above the chart.
  • Assemblies without any reported coverage depth are also accounted for. These are included by default, but there's a separate checkbox to exclude them as well. A small note on the side of the chart tells you how many assemblies are missing coverage data, so you can make informed filtering decisions.
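The binning scheme above could be sketched like this. Only the 5000× cap and the separate handling of missing values come from the description; the bin width of 500× and the label format are assumptions made for this example.

```python
from collections import Counter

def coverage_bin(depth, width=500, cap=5000):
    """Assign a coverage depth to a labeled bin, with overflow and missing bins."""
    if depth is None:
        return "missing"
    if depth > cap:
        return f">{cap}×"
    lower = int(depth // width) * width
    if lower >= cap:  # depth equals the cap exactly: keep it in the last regular bin
        lower = cap - width
    return f"{lower}-{lower + width}×"

# Hypothetical coverage values, including extreme and missing entries.
depths = [30, 120.5, 4999, 5000, 50000, 4_900_000, None]
histogram = Counter(coverage_bin(d) for d in depths)
```

Collapsing everything above the cap into one bin keeps a handful of extreme entries from stretching the x-axis, while the "missing" bin makes the number of assemblies without coverage data explicit.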

Coverage Depth


ANI Scatter Plot

This scatter plot visualizes Average Nucleotide Identity (ANI) values between assemblies and their closest references. ANI % identity is on one axis and ANI % coverage is on the other.

Each point on the plot represents an assembly, and the color and shape of the points correspond to the sequencing technology used—helping you spot patterns or biases related to sequencing platforms.

Above the plot, there are also:

  • Two sliders to filter assemblies based on specific ANI identity and ANI coverage ranges. The selected range is displayed right below for quick feedback.
  • A checkbox to include or exclude assemblies lacking ANI data.
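The combined effect of the two sliders and the checkbox can be expressed as a single predicate. The following Python sketch is an assumption about how such a filter might behave, with hypothetical field names `ani_identity` and `ani_coverage` and inclusive range bounds.

```python
def passes_ani_filter(record, identity_range, coverage_range, include_missing=True):
    """Return True if a record survives the ANI sliders and the missing-data checkbox."""
    ani_id = record.get("ani_identity")
    ani_cov = record.get("ani_coverage")
    if ani_id is None or ani_cov is None:
        # Records without ANI data are governed only by the checkbox.
        return include_missing
    return (
        identity_range[0] <= ani_id <= identity_range[1]
        and coverage_range[0] <= ani_cov <= coverage_range[1]
    )

# Hypothetical records for illustration.
records = [
    {"id": "a", "ani_identity": 99.2, "ani_coverage": 95.0},
    {"id": "b", "ani_identity": 92.0, "ani_coverage": 97.0},
    {"id": "c", "ani_identity": None, "ani_coverage": None},
]

with_missing = [r["id"] for r in records if passes_ani_filter(r, (98, 100), (90, 100))]
without_missing = [
    r["id"] for r in records
    if passes_ani_filter(r, (98, 100), (90, 100), include_missing=False)
]
```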

ANI ID Coverage


🧱 Scaffold N50 vs L50 Scatter Plot

This scatter plot helps you explore assembly quality by plotting Scaffold N50 on the x-axis and Scaffold L50 on the y-axis. These two metrics are commonly used to assess how well an assembly has been stitched together—higher N50 and lower L50 usually indicate better assembly continuity.
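For readers unfamiliar with these metrics, the standard definitions can be written as a short Python function. N50 is the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembly length; L50 is the number of scaffolds needed to get there. The function name and the example lengths are illustrative only.

```python
def n50_l50(lengths):
    """Compute (N50, L50) from a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, rank
    raise ValueError("lengths must not be empty")

# Total length is 290, so half is 145; the two longest scaffolds (80 + 70) reach it.
n50, l50 = n50_l50([80, 70, 50, 40, 30, 20])
```

In this example N50 is 70 and L50 is 2. A fully closed genome with a single scaffold would have an L50 of 1 and an N50 equal to its full length.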

Each dot in the plot represents an assembly, and like before, the color and shape of the points reflect the sequencing technology used. This makes it easier to observe trends in assembly quality across different platforms.

You can use this plot and the ANI Scatter Plot to spot clusters of high-quality assemblies and navigate through filtering accordingly.

L50 N50


Annotation Summary Graphs

The annotation summary charts give a quick visual summary of four gene counts: CDS count, total gene count, pseudogene count, and non-CDS gene count.

Each of these is shown as a histogram to reveal the overall pattern across the selected assemblies. However, histograms can hide outliers, especially when working with thousands of genomes. For this reason, a violin plot right above each histogram shows the data distribution more intuitively, including the rare and possibly interesting extremes.

As with the other components:

  • Use the range sliders above the plots to filter values.
  • Use the checkbox to exclude assemblies that lack the respective annotation data.

Annotation summary

This should help in gauging the general trends in gene content and in catching extreme values that may signal errors or unique biology.


Bioproject and Biosample Filters

Next, there are two simple text filters:

  • Bioproject filter
  • Biosample filter

Just type in one or more keywords (separated by commas), and the dashboard will search for relevant entries. The filtered list of isolates will appear right below.

For example, if you're looking for projects that include MDR (multi-drug resistant) isolates, simply type MDR into the Bioproject filter. The projects that match your search will then be selected.
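A comma-separated keyword search of this kind might work along the following lines. The exact matching rules of the dashboard are not specified here, so this Python sketch assumes case-insensitive substring matching where any one keyword is enough for a match; the function names and project titles are hypothetical.

```python
def parse_keywords(text):
    """Split a comma-separated query into clean, lowercase keywords."""
    return [k.strip().lower() for k in text.split(",") if k.strip()]

def matches_any(field, keywords):
    """True if the field contains at least one keyword (case-insensitive)."""
    field = field.lower()
    return any(k in field for k in keywords)

# Hypothetical Bioproject titles for illustration.
projects = [
    "Surveillance of MDR isolates in hospitals",
    "Environmental sampling of river water",
    "Clinical isolates from bloodstream infections",
]

keywords = parse_keywords(" MDR, , clinical ")
hits = [p for p in projects if matches_any(p, keywords)]
```

Empty entries produced by stray commas are ignored, so a query such as `MDR, , clinical` behaves the same as `MDR, clinical`.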

Text Filters

These filters are particularly helpful when narrowing down large datasets to focus on specific projects or sample types.


Treemap for Normalized Isolation Sources

The final visualization in the dashboard is a treemap, designed to help you explore the hierarchical structure of normalized isolation source data.

Each rectangle in the treemap represents a category or sub-category (e.g., Animal > Livestock > Cattle), and the size reflects the number of assemblies in that group. You can interactively drill in and out 🔎 of these levels by clicking on the rectangles, making it easier to understand the structure and distribution of the data.

This component is integrated with four multi-select dropdown menus located nearby. Once you identify the categories you're interested in from the treemap, you can select those specific terms in the dropdowns to filter the assemblies accordingly.
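The rectangle sizes in a treemap like this come from counting assemblies at every level of the hierarchy. The following Python sketch shows one way to compute those counts; apart from "Animal > Livestock > Cattle", the category names are invented, and the representation of each isolation source as a tuple is an assumption.

```python
from collections import Counter

def treemap_counts(paths):
    """Count assemblies for every category and sub-category prefix."""
    counts = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

# One tuple per assembly: its normalized isolation source hierarchy (hypothetical data).
sources = [
    ("Animal", "Livestock", "Cattle"),
    ("Animal", "Livestock", "Poultry"),
    ("Animal", "Wildlife"),
    ("Environment", "Water"),
]
sizes = treemap_counts(sources)
```

Each parent's count equals the sum of its children, which is why clicking into a rectangle reveals sub-rectangles that fill it exactly.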

Tree Map