Backend ‐ Plugin Metadata - chanzuckerberg/napari-hub GitHub Wiki
This page details how we fetch metadata for plugins from various data sources. After aggregating all the metadata we need for plugins in the backend codebase, we display this information on the plugin detail page, e.g. https://www.napari-hub.org/plugins/{plugin}.
See below for more details on the types of metadata, the specific fields we pull, and how often this runs.
Check out the napari hub tech diagram for high-level architecture of our system: https://lucid.app/lucidchart/d32995a2-42d6-4ccd-84fc-9c5a097304de/view
pypi.py contains the logic to get metadata through the PyPI API for a given plugin and version.
For more information about PyPI Metadata, check out https://pypi.org/.
# Below are fields returned by PyPI for each plugin and version
name
summary
description
description_content_type
authors
license
python_version
operating_system
release_date
version
first_released
development_status
requirements
project_site
documentation
support
report_issues
twitter
code_repository
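As a rough illustration of how some of these fields could be derived, the sketch below maps a response from PyPI's JSON API (https://pypi.org/pypi/{name}/{version}/json) onto a few of the names listed above. The function name parse_pypi_info and the "Source Code" URL label are assumptions made for this example; the actual logic lives in pypi.py and may differ.

```python
from typing import Any, Dict, List


def parse_pypi_info(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a PyPI JSON API payload onto a few of the fields listed above.

    Hypothetical helper for illustration, not the code in pypi.py.
    """
    info = payload.get("info", {})
    classifiers: List[str] = info.get("classifiers") or []
    project_urls: Dict[str, str] = info.get("project_urls") or {}

    def classifier_values(prefix: str) -> List[str]:
        # Classifiers look like "Development Status :: 4 - Beta".
        return [c for c in classifiers if c.startswith(prefix)]

    return {
        "name": info.get("name"),
        "summary": info.get("summary"),
        "description": info.get("description"),
        "description_content_type": info.get("description_content_type"),
        "license": info.get("license"),
        "python_version": info.get("requires_python"),
        "version": info.get("version"),
        "requirements": info.get("requires_dist") or [],
        "development_status": classifier_values("Development Status"),
        "operating_system": classifier_values("Operating System"),
        # Assumption: project_urls labels are free-form on PyPI, so the
        # "Source Code" key is only one of several labels a plugin may use.
        "code_repository": project_urls.get("Source Code"),
    }
```

The function takes an already-decoded payload so it can be exercised without network access; fetching the payload itself is left to the caller.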
github.py contains the logic to get metadata through the GitHub API.
The following are fields included in GitHub Metadata:
- citation
- license
- authors, visibility, conda, category are fetched from .napari-hub/config.yml.
- description is fetched from .napari-hub/DESCRIPTION.md
- project_urls will be deprecated soon.
We only look in .napari-hub if we do not already have the metadata from PyPI; we additionally check .napari/config.yml and .napari/DESCRIPTION.md.
.napari is also going to be deprecated.
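The fallback described above can be sketched as a lookup that walks the sources in priority order. This is a minimal sketch: the function name resolve_field is hypothetical, and the ordering of .napari-hub before .napari is an assumption, since the text does not state which of the two wins.

```python
from typing import Any, Dict, Optional


def resolve_field(
    field: str,
    pypi: Dict[str, Any],
    hub_config: Optional[Dict[str, Any]] = None,
    legacy_config: Optional[Dict[str, Any]] = None,
) -> Any:
    """Return the first non-empty value for `field` across the sources.

    pypi          : metadata already fetched from PyPI
    hub_config    : parsed .napari-hub/config.yml (assumed to be a dict)
    legacy_config : parsed .napari/config.yml (assumed to be a dict)
    """
    for source in (pypi, hub_config or {}, legacy_config or {}):
        value = source.get(field)
        if value:  # None, "" and [] all count as missing
            return value
    return None
```

For example, resolve_field("authors", {"authors": []}, {"authors": [{"name": "A"}]}) skips the empty PyPI value and returns the entry from the config file.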
Since the release of npe2 (napari's new plugin engine), a plugin manifest file distributed with each plugin provides rich metadata about the functionality of the plugin, including what types of contributions it provides (e.g. widget, file reader, file writer, theme). The npe2 library also provides utilities for generating manifest files for plugins implementing the original plugin engine. By ingesting this metadata on the napari hub, we can support a richer filtering & browsing experience for users trying to find the right plugin for their application.
Because discovering plugin manifests requires fetching and inspecting Python package distributions, the discovery process must happen independently of the data fetch workflow. A separate lambda (the plugins lambda) is executed to discover manifests. This lambda is invoked by the data fetching process when a new plugin version is released. It discovers the manifest file for the plugin using npe2's own fetching mechanism, and then writes the manifest to the Dynamo plugin-metadata table as a record of type=DISTRIBUTION.
The npe2 manifest specification is being regularly (and sometimes frequently) updated. Refer to the docs on napari.org for a full listing of the manifest and contributions specifications.
A subset of this metadata is used for the napari hub:
- display_name is simply the manifest field of the same name
- plugin_types is retrieved from the different types of contributions declared by the plugin. Currently we look for readers, writers, widgets, themes and sample_data contributions.
- reader_file_extensions are a set of all filename_patterns declared in readers contributions
- writer_file_extensions are a set of all filename_extensions declared in writers contributions
- writer_save_layers are a set of all layer_types declared in writers contributions
- npe2 is True when the manifest's npe1_shim field is False, and vice versa
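The derivation of these fields can be sketched over a manifest that has already been loaded into a plain dictionary. The function name hub_fields_from_manifest and the singular labels used for plugin_types are assumptions for illustration; the manifest keys (display_name, npe1_shim, contributions, filename_patterns, filename_extensions, layer_types) are the ones named above.

```python
from typing import Any, Dict, List

# Assumed mapping from manifest contribution keys to plugin_types labels.
CONTRIBUTION_TYPES = {
    "readers": "reader",
    "writers": "writer",
    "widgets": "widget",
    "themes": "theme",
    "sample_data": "sample_data",
}


def _collect(items: List[Dict[str, Any]], key: str) -> List[str]:
    """Gather the sorted, de-duplicated values of `key` across contributions."""
    return sorted({value for item in items for value in item.get(key) or []})


def hub_fields_from_manifest(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Derive the hub's manifest-based fields from a manifest dictionary."""
    contributions = manifest.get("contributions") or {}
    readers = contributions.get("readers") or []
    writers = contributions.get("writers") or []
    return {
        "display_name": manifest.get("display_name"),
        # A type is listed only if the plugin declares at least one such contribution.
        "plugin_types": [
            label
            for key, label in CONTRIBUTION_TYPES.items()
            if contributions.get(key)
        ],
        "reader_file_extensions": _collect(readers, "filename_patterns"),
        "writer_file_extensions": _collect(writers, "filename_extensions"),
        "writer_save_layers": _collect(writers, "layer_types"),
        # npe2 is the negation of npe1_shim; a missing flag is treated as False.
        "npe2": not manifest.get("npe1_shim", False),
    }
```

Returning sorted lists rather than sets keeps the result JSON-serializable, which is a design choice of this sketch and not something the text specifies.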
Architecture spec for the plugin using Dynamo:

A CloudWatch event rule triggers the update to the data stored in DynamoDB. The schedule is set to once every 5 minutes for production, once every hour for staging, and once every day for dev environments. For more information, check out https://us-west-2.console.aws.amazon.com/cloudwatch/home#rules.
For the data workflow, the rule publishes the following JSON message to the SQS queue:
{"type": "plugin"}
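A consumer of this message might dispatch on the type field roughly as follows. This is only a sketch: handle_message, HANDLERS, and run_plugin_workflow are hypothetical names, and the real handler receives the message wrapped in an SQS event and not as a bare string.

```python
import json
from typing import Callable, Dict


def run_plugin_workflow() -> str:
    # Placeholder for the real fetch-and-aggregate logic.
    return "plugin workflow executed"


# Hypothetical registry mapping message types to workflows.
HANDLERS: Dict[str, Callable[[], str]] = {"plugin": run_plugin_workflow}


def handle_message(body: str) -> str:
    """Parse one message body and run the workflow matching its type."""
    message = json.loads(body)
    handler = HANDLERS.get(message.get("type"))
    if handler is None:
        raise ValueError(f"Unsupported workflow type: {message.get('type')!r}")
    return handler()
```

Calling handle_message('{"type": "plugin"}') runs the plugin workflow; any other type raises a ValueError so that unexpected messages are not silently dropped.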
- The dynamo tables are named with the environment name as the prefix.
The aggregated plugin data is stored in the plugin table. The table also has global secondary indices for the latest and excluded plugins.
The data from PyPI, GitHub, and the manifest are all stored for the various versions of the plugins.
Blocked plugins are listed here. The data in this table is filled manually.
The generation of the records for plugins is a two-part process. The first step is to fetch all the required plugin metadata and write it to the plugin metadata table. The second step is to create/update the aggregate record for all the plugins that have had updates to their metadata.
Fetch the latest plugins from Dynamo: Query the plugin table for all plugins currently marked as latest. This gives the set of latest plugin versions known to our system.
Fetching plugin list from PyPI: We make requests to PyPI to fetch the latest versions of plugins that are classified under the napari framework. This produces a list of all the latest plugins.
Identify newly added plugins: By filtering out the plugins already marked as latest in our system, we identify new plugins that have been added. For the newly added plugins, fetch metadata from various sources. Also, write a record to plugin-metadata of type=PYPI with is_latest=True.
Fetching PyPI metadata: Get metadata such as the release information, code repository, etc. from PyPI.
Fetching GitHub Metadata: If a valid GitHub code repository link exists for the plugin, fetch information from the README and the config files.
Fetching Manifest Metadata: Invoke the plugins lambda to capture the data from the plugin's manifest as specified above.
Identify stale plugins: A plugin is stale if it is missing from the latest plugins list fetched from PyPI, or if its version doesn't match the version in that list. For those plugins, we update their PYPI record by removing the is_latest field.
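The new and stale checks in the steps above amount to comparing two name-to-version mappings. A minimal sketch, assuming both lists have been loaded into dictionaries and that plugins are compared as (name, version) pairs; diff_plugins is a hypothetical name:

```python
from typing import Dict, Tuple


def diff_plugins(
    dynamo_latest: Dict[str, str], pypi_latest: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (new, stale) mappings of plugin name -> version.

    new   : on PyPI, but not marked as latest in Dynamo at that version
    stale : marked as latest in Dynamo, but absent from PyPI or superseded
    """
    new = {
        name: version
        for name, version in pypi_latest.items()
        if dynamo_latest.get(name) != version
    }
    stale = {
        name: version
        for name, version in dynamo_latest.items()
        if pypi_latest.get(name) != version
    }
    return new, stale
```

Under this reading, a plugin that publishes a new release shows up in both results: the new version needs its metadata fetched, and the old version needs its is_latest field removed.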
The metadata is written to the plugin metadata table. The different types of records are as follows:
- The PYPI record is used to identify if a specific version of the plugin is the latest version.
- The METADATA record contains the metadata aggregated from PyPI and GitHub if a valid code_repository URL exists.
- The DISTRIBUTION record contains the data from the manifest files.
- The updates to plugin metadata are tracked using Dynamo streams.
- For any plugin version that has had any of its metadata records updated, we recompute the plugin aggregation.
- In addition to the rich metadata, the aggregation also identifies the visibility of a plugin and whether it is the latest version.
- The aggregation is written to the plugin table.
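The aggregation step could look roughly like the sketch below, which merges the three record types for one plugin version. Several details are assumptions and not taken from the codebase: the function name aggregate_plugin, the precedence of METADATA over DISTRIBUTION on key collisions, the default visibility of "public", and the "blocked" visibility value for plugins listed in the blocked table.

```python
from typing import Any, Dict, FrozenSet, Optional


def aggregate_plugin(
    name: str,
    version: str,
    records: Dict[str, Optional[Dict[str, Any]]],
    blocked_names: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Build the aggregate record for one plugin version.

    `records` maps a record type (PYPI, METADATA, DISTRIBUTION) to its data.
    """
    aggregate: Dict[str, Any] = {}
    aggregate.update(records.get("DISTRIBUTION") or {})
    # Assumption: METADATA values win over DISTRIBUTION on key collisions.
    aggregate.update(records.get("METADATA") or {})
    aggregate["name"] = name
    aggregate["version"] = version

    if name in blocked_names:
        # Assumption: blocked plugins get a dedicated visibility value.
        aggregate["visibility"] = "blocked"
    else:
        aggregate["visibility"] = aggregate.get("visibility") or "public"

    # The PYPI record carries is_latest only for the latest version.
    aggregate["is_latest"] = bool((records.get("PYPI") or {}).get("is_latest"))
    return aggregate
```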
Returns the result of a query on the latest_plugins index of the plugin table for the plugin name. It filters to ensure the plugin visibility is either public or hidden. If no result is found, the API returns a 404 HTTP status response.
Returns the result of a query on the plugin table for the plugin name and version. It filters to ensure the plugin visibility is either public or hidden. If no result is found, the API returns a 404 HTTP status response.
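Both lookups share the same filter-then-404 behavior, which can be sketched over an in-memory list of rows standing in for the plugin table. The function name get_plugin, the row shape, and the (status, body) return value are assumptions for illustration; the real endpoints query Dynamo.

```python
from typing import Any, Dict, List, Optional, Tuple

# Visibility values that the API is allowed to return, as described above.
VISIBLE = {"public", "hidden"}


def get_plugin(
    rows: List[Dict[str, Any]], name: str, version: Optional[str] = None
) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body) for a plugin lookup.

    With no version, the row marked is_latest is returned; otherwise the
    row with the matching version. Anything else yields a 404.
    """
    for row in rows:
        if row.get("name") != name or row.get("visibility") not in VISIBLE:
            continue
        if version is None and row.get("is_latest"):
            return 200, row
        if version is not None and row.get("version") == version:
            return 200, row
    return 404, {}
```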
Running the regular data-workflow for the plugin will backfill any missing data.
If any issue occurs during data-workflow execution, look through Lambda’s execution logs to identify the problem.