Backend ‐ Plugin Metadata - chanzuckerberg/napari-hub GitHub Wiki

Summary

This page details how we fetch plugin metadata from our data sources (PyPI, GitHub, and npe2 plugin manifests). The backend aggregates this metadata, and we display it on the plugin detail page, e.g. https://www.napari-hub.org/plugins/{plugin}.

See below for details on the types of metadata, the specific fields we pull, and how often the data fetch runs.

Check out the napari hub tech diagram for the high-level architecture of our system: https://lucid.app/lucidchart/d32995a2-42d6-4ccd-84fc-9c5a097304de/view

PyPI Metadata

pypi.py contains the logic to fetch metadata for a given plugin and version through the PyPI API.

For more information about PyPI, see https://pypi.org/.

Below are the fields populated from PyPI for each plugin and version:
name
summary
description
description_content_type
authors
license
python_version
operating_system
release_date
version
first_released
development_status
requirements
project_site
documentation
support
report_issues
twitter
code_repository
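As an illustration of what this step involves, here is a minimal sketch that assumes pypi.py reads PyPI's per-release JSON endpoint (https://pypi.org/pypi/<name>/<version>/json). It maps only a subset of the fields above, and the helper names and the project_urls labels ("Documentation", "Bug Tracker", "Source Code") are hypothetical; the real mapping in pypi.py may differ.

```python
import json
import urllib.request

# PyPI's per-release JSON endpoint
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/{version}/json"


def fetch_pypi_json(name: str, version: str) -> dict:
    """Download the raw JSON document PyPI serves for one release."""
    url = PYPI_JSON_URL.format(name=name, version=version)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def _classifiers_with_prefix(classifiers: list, prefix: str) -> list:
    """Return the trove classifiers that start with the given prefix."""
    return [c for c in classifiers if c.startswith(prefix)]


def to_hub_fields(raw: dict) -> dict:
    """Map a subset of PyPI's JSON response onto napari hub field names.

    The project_urls labels below are hypothetical: plugin authors choose them freely.
    """
    info = raw.get("info") or {}
    classifiers = info.get("classifiers") or []
    project_urls = info.get("project_urls") or {}
    # Each distribution file (sdist, wheel) has its own upload time; use the earliest
    upload_times = [
        f["upload_time_iso_8601"]
        for f in (raw.get("urls") or [])
        if "upload_time_iso_8601" in f
    ]
    return {
        "name": info.get("name"),
        "summary": info.get("summary"),
        "description": info.get("description"),
        "description_content_type": info.get("description_content_type"),
        "license": info.get("license"),
        "python_version": info.get("requires_python"),
        "version": info.get("version"),
        "release_date": min(upload_times) if upload_times else None,
        "requirements": info.get("requires_dist") or [],
        # Both fields are carried as trove classifiers on PyPI
        "development_status": _classifiers_with_prefix(classifiers, "Development Status"),
        "operating_system": _classifiers_with_prefix(classifiers, "Operating System"),
        "documentation": project_urls.get("Documentation"),
        "report_issues": project_urls.get("Bug Tracker"),
        "code_repository": project_urls.get("Source Code"),
    }
```

Splitting the network call from the pure mapping keeps the mapping easy to test against saved PyPI responses.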

GitHub Metadata

github.py contains the logic to fetch metadata through the GitHub API.

The following fields are included in GitHub metadata:

citation
license
authors, visibility, conda, and category are fetched from .napari-hub/config.yml.
description is fetched from .napari-hub/DESCRIPTION.md.
project_urls will be deprecated soon.

We only look in .napari-hub if the metadata is not already available from PyPI. We additionally check .napari/config.yml and .napari/DESCRIPTION.md.

The .napari directory is also going to be deprecated.
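The precedence rules above can be sketched as a merge over already-parsed files. This sketch assumes that PyPI values always win, that .napari-hub is preferred over the deprecated .napari directory as a whole file (not field by field), and that the YAML has already been parsed into dicts; the function name and signature are hypothetical, not taken from github.py.

```python
from typing import Optional

# Fields the page lists as coming from config.yml
CONFIG_FIELDS = ("authors", "visibility", "conda", "category")


def merge_github_metadata(
    pypi_meta: dict,
    hub_config: Optional[dict],
    legacy_config: Optional[dict],
    hub_description: Optional[str],
    legacy_description: Optional[str],
) -> dict:
    """Fill gaps in PyPI metadata from .napari-hub, falling back to .napari (hypothetical)."""
    merged = dict(pypi_meta)  # copy so the caller's dict is not mutated
    # Assumption: use .napari-hub/config.yml if present, otherwise .napari/config.yml
    config = hub_config if hub_config is not None else (legacy_config or {})
    for field in CONFIG_FIELDS:
        # Only fill fields that PyPI did not already provide
        if not merged.get(field) and field in config:
            merged[field] = config[field]
    description = hub_description if hub_description is not None else legacy_description
    if not merged.get("description") and description:
        merged["description"] = description
    return merged
```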

npe2-enabled Metadata & Manifest Files

Since the release of npe2 (napari's new plugin engine), each plugin is distributed with a plugin manifest file that provides rich metadata about the plugin's functionality, including the types of contributions it provides (e.g. widget, file reader, file writer, theme). The npe2 library also provides utilities for generating manifest files for plugins implemented against the original plugin engine. By ingesting this metadata on the napari hub, we can support a richer filtering and browsing experience for users trying to find the right plugin for their application.

Architecture

Because discovering plugin manifests requires fetching and inspecting Python package distributions, the discovery process must run independently of the data fetch workflow. A separate lambda (the plugins lambda) discovers the manifests, and the data fetching process invokes it when a new plugin version is released. The lambda finds the plugin's manifest file using npe2's own fetching mechanism, then writes the manifest to the DynamoDB plugin-metadata table as a record of type=DISTRIBUTION.
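A minimal sketch of the hand-off from the data fetch process to the plugins lambda, using boto3's Lambda client. The asynchronous invocation (InvocationType="Event"), the function name, and the payload keys are assumptions for illustration; the client can be injected so the sketch runs without AWS credentials.

```python
import json
from typing import Any, Optional


def invoke_plugins_lambda(
    plugin: str, version: str, function_name: str, client: Optional[Any] = None
) -> dict:
    """Ask the plugins lambda to discover the manifest for one plugin version.

    function_name and the payload keys are hypothetical placeholders.
    """
    if client is None:
        import boto3  # imported lazily so a stub client can be used in tests

        client = boto3.client("lambda")
    payload = {"plugin": plugin, "version": version}
    # "Event" queues the call asynchronously, so the data fetch workflow
    # does not wait for manifest discovery to finish.
    return client.invoke(
        FunctionName=function_name,
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
```

Fire-and-forget fits the split described above: the fetch workflow only needs the DISTRIBUTION record to appear eventually, not immediately.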

Metadata fields

The npe2 manifest specification is updated regularly (sometimes frequently). Refer to the docs on napari.org for a full listing of the manifest and contribution specifications.

A subset of this metadata is used for the napari hub:

display_name is the manifest field of the same name.
plugin_types is derived from the types of contributions the plugin declares. Currently we look for readers, writers, widgets, themes, and sample_data contributions.
reader_file_extensions is the set of all filename_patterns declared in readers contributions.
writer_file_extensions is the set of all filename_extensions declared in writers contributions.
writer_save_layers is the set of all layer_types declared in writers contributions.
npe2 is True when the manifest's npe1_shim field is False, and vice versa.
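The derivations above can be sketched over a manifest already loaded as a plain dict (e.g. parsed from the manifest's YAML/JSON). This is an illustration, not the hub's code: the function name is hypothetical, plugin_types here reuses the contribution key names, and the sets are returned as sorted lists so the result is JSON-serializable.

```python
CONTRIBUTION_KINDS = ("readers", "writers", "widgets", "themes", "sample_data")


def hub_fields_from_manifest(manifest: dict) -> dict:
    """Derive the napari hub fields listed above from an npe2 manifest dict (hypothetical helper)."""
    contributions = manifest.get("contributions") or {}
    readers = contributions.get("readers") or []
    writers = contributions.get("writers") or []
    return {
        "display_name": manifest.get("display_name"),
        # A plugin "has" a type if it declares at least one contribution of that kind
        "plugin_types": [kind for kind in CONTRIBUTION_KINDS if contributions.get(kind)],
        "reader_file_extensions": sorted(
            {pattern for r in readers for pattern in r.get("filename_patterns", [])}
        ),
        "writer_file_extensions": sorted(
            {ext for w in writers for ext in w.get("filename_extensions", [])}
        ),
        "writer_save_layers": sorted(
            {layer for w in writers for layer in w.get("layer_types", [])}
        ),
        # Shimmed npe1 plugins are not counted as npe2
        "npe2": not manifest.get("npe1_shim", False),
    }
```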

Fetching New Plugin Data in data-workflows (Cron Job)

Architecture spec for the plugin data pipeline using DynamoDB:

(Diagram: Plugin data architecture)

A CloudWatch event rule triggers updates to the data stored in DynamoDB. The rule runs every 5 minutes in production, every hour in staging, and once a day in dev environments. For more information, check out https://us-west-2.console.aws.amazon.com/cloudwatch/home#rules.

For the data workflow, the rule publishes the following JSON message to the SQS queue:

{"type": "plugin"}
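A minimal sketch of the consumer side, assuming the data-workflows lambda receives the standard SQS event shape (a "Records" list whose "body" fields hold the published JSON) and dispatches on the "type" field. The SCHEDULES mapping restates the cadence above as CloudWatch rate expressions; its environment keys, the handler name, and the dispatch table are hypothetical.

```python
import json
from typing import Callable, Dict, List

# The cadence described above, written as CloudWatch schedule (rate) expressions
SCHEDULES = {
    "prod": "rate(5 minutes)",
    "staging": "rate(1 hour)",
    "dev": "rate(1 day)",
}


def handle_sqs_event(event: dict, handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the handler matching each message's "type"; return the types processed."""
    processed = []
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        message_type = message.get("type")
        handler = handlers.get(message_type)
        if handler is None:
            continue  # this sketch silently skips unknown message types
        handler()
        processed.append(message_type)
    return processed
```

In practice the "plugin" entry would point at the fetch-and-update workflow described under Workflow below.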

Dynamo Data Storage

The DynamoDB tables are named with the environment name as a prefix.

Plugin Table

The aggregated plugin data is stored in the plugin table. The table also has global secondary indices for the latest and excluded plugins.

Plugin Metadata Table

Metadata from PyPI, GitHub, and the npe2 manifest is stored here for each version of each plugin.

Plugin Blocked Table

Blocked plugins are listed here. The data in this table is filled manually.
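To make the naming convention concrete, a small sketch of resolving per-environment table names. The "-" separator and the base names (taken from the section headings and the plugin-metadata name used elsewhere on this page) are assumptions; check the deployed tables for the exact names.

```python
from typing import Dict

# Hypothetical base names; only the environment-name prefix is documented
TABLE_BASE_NAMES = ("plugin", "plugin-metadata", "plugin-blocked")


def table_names(env: str) -> Dict[str, str]:
    """Map each base table name to its environment-prefixed name."""
    return {base: f"{env}-{base}" for base in TABLE_BASE_NAMES}
```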

Workflow

The generation of the records for plugins is a two-part process. The first step is to fetch all the required plugin metadata and write it to the plugin metadata table. The second step is to create/update the aggregate record for all the plugins that have had updates to their metadata.

Fetching and Updating plugin metadata

Fetch the latest plugins from DynamoDB: Get all plugins currently marked as latest in the plugin table. This is the set of latest plugin versions as our system currently knows them.

Fetch the plugin list from PyPI: Request from PyPI the latest versions of all plugins classified with the Framework :: napari classifier. This produces the list of all the latest plugins.

Identify newly added plugins: By filtering out the plugins already marked as latest in our system, we identify plugins that are new or have released a new version. For each of these, fetch metadata from the sources below, and write a record of type=PYPI with is_latest=True to the plugin-metadata table.

Fetch PyPI metadata: Get metadata such as the release information and code repository from PyPI.

Fetch GitHub metadata: If a valid GitHub code repository link exists for the plugin, fetch information from the README and the config files.

Fetch manifest metadata: Invoke the plugins lambda to capture the data from the plugin's manifest, as described above.

Identify stale plugins: A plugin currently marked as latest is stale if it is missing from the latest plugins list fetched from PyPI, or if its version does not match the version in that list. For stale plugins, we update their PYPI record by removing the is_latest field.
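The new-versus-stale comparison above reduces to a diff between two name-to-version maps. A minimal sketch, assuming both lists have already been loaded as dicts; the function name is hypothetical.

```python
from typing import Dict, Tuple


def diff_latest(
    dynamo_latest: Dict[str, str], pypi_latest: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (new, stale) maps of plugin name -> version."""
    # New: listed on PyPI but not yet recorded as latest at that exact version
    new = {
        name: version
        for name, version in pypi_latest.items()
        if dynamo_latest.get(name) != version
    }
    # Stale: recorded as latest but absent from PyPI or at a different version
    stale = {
        name: version
        for name, version in dynamo_latest.items()
        if pypi_latest.get(name) != version
    }
    return new, stale
```

A version bump shows up in both maps: the new version needs metadata fetched and a PYPI record with is_latest=True, while the old version's PYPI record loses its is_latest field.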

Writing to Plugin Metadata:

The metadata is written to the plugin metadata table. The different types of records are as follows:

The PYPI record is used to identify if a specific version of the plugin is the latest version.
The METADATA record contains the metadata from PyPI, aggregated with GitHub metadata when a valid code_repository URL exists.
The DISTRIBUTION record contains the data from the manifest files.

Aggregating Plugin Data

Updates to the plugin metadata table are tracked using DynamoDB Streams.
For each plugin version whose metadata records have been updated, we recompute the plugin aggregation.
In addition to the rich metadata, the aggregation determines the plugin's visibility and whether this is the latest version.
The aggregation is written to the plugin table.
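A minimal sketch of this stream-driven step. It uses the standard DynamoDB Streams record envelope (Records[].dynamodb.Keys with typed values such as {"S": "..."}), but the key attribute names "name" and "version", the record precedence (DISTRIBUTION fields overriding METADATA), and the default visibility of "public" are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def changed_plugin_versions(event: dict) -> Set[Tuple[str, str]]:
    """Collect the (plugin, version) pairs touched by a batch of stream records."""
    changed = set()
    for record in event.get("Records", []):
        keys = record["dynamodb"]["Keys"]
        # Hypothetical key attributes; DynamoDB marks string values with "S"
        changed.add((keys["name"]["S"], keys["version"]["S"]))
    return changed


def aggregate_plugin(records: Dict[str, dict]) -> dict:
    """Combine one version's PYPI, METADATA and DISTRIBUTION records into one item."""
    aggregate: dict = {}
    aggregate.update(records.get("METADATA", {}))
    aggregate.update(records.get("DISTRIBUTION", {}))  # assumed to take precedence
    # The PYPI record carries is_latest only while this version is the latest
    aggregate["is_latest"] = bool(records.get("PYPI", {}).get("is_latest"))
    aggregate.setdefault("visibility", "public")  # assumed default
    return aggregate
```

Using a set means a version touched by several records in one batch (e.g. its METADATA and DISTRIBUTION records) is re-aggregated only once.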

Plugin API

/plugins/<plugin>

Returns the result of querying the latest_plugins index on the plugin table for the plugin name. Results are filtered so that only plugins with public or hidden visibility are returned. If no result is found, the API returns a 404 HTTP status response.

/plugins/<plugin>/versions/<version>

Returns the result of querying the plugin table for the plugin name and version. Results are filtered so that only plugins with public or hidden visibility are returned. If no result is found, the API returns a 404 HTTP status response.
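Both endpoints share the same filter-then-404 logic. A framework-free sketch of it: the query callables are hypothetical stand-ins for the DynamoDB queries described above, and each handler returns a (body, status) pair instead of a real HTTP response.

```python
from typing import Callable, List, Tuple

VISIBLE = {"public", "hidden"}


def _first_visible(items: List[dict]) -> Tuple[dict, int]:
    """Return the first public/hidden item with 200, or an empty body with 404."""
    visible = [item for item in items if item.get("visibility") in VISIBLE]
    if not visible:
        return {}, 404
    return visible[0], 200


def get_plugin(name: str, query_latest: Callable[[str], List[dict]]) -> Tuple[dict, int]:
    """/plugins/<plugin>: look the plugin up via the latest_plugins index."""
    return _first_visible(query_latest(name))


def get_plugin_version(
    name: str, version: str, query_version: Callable[[str, str], List[dict]]
) -> Tuple[dict, int]:
    """/plugins/<plugin>/versions/<version>: look up one specific version."""
    return _first_visible(query_version(name, version))
```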

Disaster Recovery

Running the regular data-workflow for the plugin will backfill any missing data.

Troubleshooting

If an issue occurs during data-workflow execution, look through the Lambda execution logs in CloudWatch to identify the problem.