Category Files - osmlab/name-suggestion-index GitHub Wiki

Organization

The data/ folder contains many files: one file per category.

Category files are organized in a tree/key/value path. Each category file contains all the items that share an OpenStreetMap key/value tag.

  • tree - The highest level of organization - each tree contains categories that follow a similar approach to naming and linking to Wikidata.
  • key - An OpenStreetMap tag key (e.g. "amenity")
  • value - An OpenStreetMap tag value (e.g. "fast_food")

The name-suggestion-index currently supports these trees:

  • brands - Branded businesses like restaurants, banks, fuel stations, shops identified by brand/brand:wikidata tags
  • operators - Organizations like post offices, police departments, hospitals identified by operator/operator:wikidata tags
  • flags - Flagpoles hoisting common kinds of flags (national, regional, religious, advertising) identified by flag:wikidata tag
  • transit - Transit networks (bus, rail, ferry, etc.) and related infrastructure identified by network/network:wikidata tags

For example:

  • data/
    • brands/amenity/fast_food.json
    • brands/shop/supermarket.json
    • operators/amenity/post_office.json
    • flags/man_made/flagpole.json
    • transit/route/bus.json
    • and so on…
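
As a rough illustration of the tree/key/value layout, the Python sketch below splits a category file path into its three parts. The names `parse_category_path` and `CategoryPath` are made up for this example and are not part of the NSI codebase; the set of trees is copied from the list above.

```python
from pathlib import PurePosixPath
from typing import NamedTuple

# Trees listed in this page; hard-coded here for illustration only.
TREES = {"brands", "operators", "flags", "transit"}


class CategoryPath(NamedTuple):
    tree: str
    key: str
    value: str


def parse_category_path(file_path: str) -> CategoryPath:
    """Split e.g. 'data/brands/amenity/fast_food.json' into tree/key/value."""
    parts = PurePosixPath(file_path).with_suffix("").parts
    if parts and parts[0] == "data":
        parts = parts[1:]  # drop the leading data/ folder if present
    if len(parts) != 3:
        raise ValueError(f"expected tree/key/value, got {file_path!r}")
    tree, key, value = parts
    if tree not in TREES:
        raise ValueError(f"unknown tree {tree!r}")
    return CategoryPath(tree, key, value)


print(parse_category_path("data/brands/amenity/fast_food.json"))
```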

File contents

Each category file contains:

  • properties - Object containing category-wide properties
  • items - Array containing the items in the category

For example brands/amenity/fast_food.json (comments added for clarity):

"properties": {                           // CATEGORY PROPERTIES:
  "path": "brands/amenity/fast_food"      // "path" - the tree/key/value path for this category
  …
},
"items": [                                // An array of items belonging to this category
  …
  {                                         // ITEM PROPERTIES:
    "displayName": "McDonald's",            // "displayName" - Name to display in summary screens and lists
    "id": "mcdonalds-658eea",               // "id" - a unique identifier added and generated automatically
    "locationSet": {"include": ["001"]},    // "locationSet" - defines where this brand is valid ("001" = worldwide)
    "tags": {                               // "tags" - OpenStreetMap tags that every McDonald's should have
      "amenity": "fast_food",               //   The OpenStreetMap tag for a "fast food" restaurant
      "brand": "McDonald's",                //   `brand` - Brand name in the local language (English)
      "brand:wikidata": "Q38076",           //   `brand:wikidata` - Universal Wikidata identifier
      "cuisine": "burger",                  //   `cuisine` - What kind of fast food is served here
      "name": "McDonald's"                  //   `name` - Display name, also in the local language (English)
    }
  },
  …
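
To show how the two top-level members fit together, here is a minimal Python sketch that reads a category file and lists its items. It assumes the file is a single JSON object (the excerpt above omits the outer braces) and that real files carry no comments. `SAMPLE` and `summarize_category` are hypothetical names used only for this example.

```python
import json

# A trimmed-down stand-in for brands/amenity/fast_food.json.
# Assumption: the real file is one JSON object with "properties" and "items".
SAMPLE = """
{
  "properties": {"path": "brands/amenity/fast_food"},
  "items": [
    {
      "displayName": "McDonald's",
      "id": "mcdonalds-658eea",
      "locationSet": {"include": ["001"]},
      "tags": {
        "amenity": "fast_food",
        "brand": "McDonald's",
        "brand:wikidata": "Q38076",
        "cuisine": "burger",
        "name": "McDonald's"
      }
    }
  ]
}
"""


def summarize_category(text):
    """Return the category path and a (displayName, brand:wikidata) row per item."""
    data = json.loads(text)
    path = data["properties"]["path"]
    rows = [
        (item["displayName"], item["tags"].get("brand:wikidata"))
        for item in data["items"]
    ]
    return path, rows


print(summarize_category(SAMPLE))
```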

There may also be items for McDonald's in other languages! For example, this is how McDonald's should be mapped in Japan:

  …
  {                                         // ITEM PROPERTIES:
    "displayName": "マクドナルド",            // "displayName" - Name to display in summary screens and lists
    "id": "マクドナルド-3e7699",              // "id" - a unique identifier added and generated automatically
    "locationSet": { "include": ["jp"] },   // "locationSet" - defines where this brand is valid ("jp" = Japan)
    "tags": {
      "amenity": "fast_food",
      "brand": "マクドナルド",                // `brand` - Brand name in the local language (Japanese)
      "brand:en": "McDonald's",             // `brand:en` - For non-English brands, tag the English version too
      "brand:ja": "マクドナルド",             // `brand:ja` - Add at least one `brand:xx` tag that matches `brand`
      "brand:wikidata": "Q38076",           // `brand:wikidata` - Same universal Wikidata identifier
      "cuisine": "burger",
      "name": "マクドナルド",                 // `name` - Display name, also in the local language (Japanese)
      "name:en": "McDonald's",              // `name:en` - For non-English names, tag the English version too
      "name:ja": "マクドナルド"               // `name:ja` - Add at least one `name:xx` tag that matches `name`
    }
  },
  …
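
The "at least one `brand:xx` tag that matches `brand`" rule can be checked mechanically. The sketch below is one possible way to do so in Python; it is not the project's own validation code. The helper name `has_matching_language_tag` and the simplified language-suffix pattern are assumptions made for illustration.

```python
import re

# Simplified pattern for a language suffix such as "ja", "en" or "zh-Hant".
# Assumption: this is good enough to tell "brand:ja" apart from "brand:wikidata".
LANG_SUFFIX = re.compile(r"^[a-z]{2,3}(-[A-Za-z]+)?$")


def has_matching_language_tag(tags, key):
    """True if some `key:xx` language tag repeats the value of `key`."""
    value = tags.get(key)
    if value is None:
        return False
    prefix = key + ":"
    for tag_key, tag_value in tags.items():
        if not tag_key.startswith(prefix):
            continue
        suffix = tag_key[len(prefix):]
        if LANG_SUFFIX.match(suffix) and tag_value == value:
            return True
    return False


JA_TAGS = {
    "amenity": "fast_food",
    "brand": "マクドナルド",
    "brand:en": "McDonald's",
    "brand:ja": "マクドナルド",
    "brand:wikidata": "Q38076",
    "name": "マクドナルド",
    "name:en": "McDonald's",
    "name:ja": "マクドナルド",
}

print(has_matching_language_tag(JA_TAGS, "brand"))
print(has_matching_language_tag(JA_TAGS, "name"))
```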

Identical names, different entities

Sometimes multiple brands, operators, or transit networks in the same category file use the same name - but this is okay!

Make sure each entry has a distinct locationSet and different values for displayName, typically via parenthetical disambiguation. The build script will generate unique identifiers for each entry based on the locationSet and the brand/operator/network value. Values used for disambiguation should be concise, but also distinct enough so that everyone can tell the entities apart. For example:

  …
  {
    "displayName": "Price Chopper (Kansas City)",
    "id": "pricechopper-8741a7",
    "locationSet": {
      "include": [
        "us-ks.geojson",
        "us-mo.geojson"
      ]
    },
    "tags": {
      "brand": "Price Chopper",
      "brand:wikidata": "Q7242572",
      "name": "Price Chopper",
      "shop": "supermarket"
    }
  },
  {
    "displayName": "Price Chopper (New England)",
    "id": "pricechopper-7d1b36",
    "locationSet": {
      "include": [
        "us-ct.geojson",
        "us-ma.geojson",
        "us-nh.geojson",
        "us-ny.geojson",
        "us-pa.geojson",
        "us-vt.geojson"
      ]
    },
    "tags": {
      "brand": "Price Chopper",
      "brand:wikidata": "Q7242574",
      "name": "Price Chopper",
      "shop": "supermarket"
    }
  },
  …
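
A quick way to spot entries that share a name but were not disambiguated is to group items by their brand/operator/network value and compare the displayName values. The Python sketch below does only that; it does not reproduce how the build script derives ids, and `undisambiguated_names` is a hypothetical helper name.

```python
from collections import defaultdict


def undisambiguated_names(items, tag="brand"):
    """Return values of `tag` shared by items whose displayName values collide."""
    display_names = defaultdict(list)
    for item in items:
        value = item["tags"].get(tag)
        if value is not None:
            display_names[value].append(item["displayName"])
    return sorted(
        value
        for value, names in display_names.items()
        if len(names) != len(set(names))  # at least two items share a displayName
    )


GOOD = [
    {"displayName": "Price Chopper (Kansas City)", "tags": {"brand": "Price Chopper"}},
    {"displayName": "Price Chopper (New England)", "tags": {"brand": "Price Chopper"}},
]
BAD = [
    {"displayName": "Price Chopper", "tags": {"brand": "Price Chopper"}},
    {"displayName": "Price Chopper", "tags": {"brand": "Price Chopper"}},
]

print(undisambiguated_names(GOOD))
print(undisambiguated_names(BAD))
```

For operators or transit networks, the same check could be run with `tag="operator"` or `tag="network"`.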

See Item Property Reference for additional guidance.

Adding new categories

Copied from #6053; credit to @bhousel

"Top Level" tags

NSI can support any category that is defined by a "top level" tag. What does this mean?

There is a concept in OSM of "top level" / "defining" / "physical" tags which is not well documented, but basically - it's any tag pair that can stand alone by itself and define what a thing is.

For example amenity=post_office is a "top level" tag. surface=dirt is not, it's just an "attribute" tag that must go alongside something else. Another example: you cannot just draw a rectangle and tag it with the "attribute" tag sport=soccer - you also need to add the "physical" tag leisure=pitch.

The list of "top level" tags we support is pretty much anything under the id-tagging-schema presets folder: https://github.com/openstreetmap/id-tagging-schema/tree/main/data/presets
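
The distinction between "top level" and "attribute" tags can be sketched as a simple membership test. In the Python example below, `TOP_LEVEL_TAGS` is a tiny hand-picked set built only from the examples in the text above; a real check would need the full list from the id-tagging-schema presets, which this sketch does not attempt to load.

```python
# Illustrative only: "top level" tag pairs taken from the examples above.
TOP_LEVEL_TAGS = {
    ("amenity", "post_office"),
    ("leisure", "pitch"),
}


def has_top_level_tag(tags, top_level=TOP_LEVEL_TAGS):
    """True if at least one key=value pair can define the feature on its own."""
    return any((key, value) in top_level for key, value in tags.items())


# sport=soccer alone is only an attribute; adding leisure=pitch makes it valid.
print(has_top_level_tag({"sport": "soccer"}))
print(has_top_level_tag({"sport": "soccer", "leisure": "pitch"}))
```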

Adding a category to NSI

For guidance, we can look at the OSM documentation for the category we want to add. As an example, we'll use advertising: https://wiki.openstreetmap.org/wiki/Key:advertising

Next, we look in the id-tagging-schema project to see if it has an entry: https://github.com/openstreetmap/id-tagging-schema/tree/main/data/presets/advertising

Since advertising is there, it looks OK for use as a "top level" tag, like advertising=billboard or advertising=column.

We can make files in the NSI for these as categories, like:

  • data/operators/advertising/billboard.json
  • data/operators/advertising/column.json
  • etc.

And our build script will generate presets that will work in RapiD, JOSM, etc.
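
For illustration, an empty category file for one of these paths could be generated as in the Python sketch below. It assumes that a `properties` object with a `path` and an empty `items` array are enough to start with; real category files may carry further properties not shown here, and `new_category_file` is a hypothetical helper, not an NSI script.

```python
import json


def new_category_file(tree, key, value):
    """Return (file name, JSON text) for an empty category file."""
    path = f"{tree}/{key}/{value}"
    content = {
        "properties": {"path": path},  # the tree/key/value path for this category
        "items": [],                   # no items yet
    }
    return f"data/{path}.json", json.dumps(content, indent=2) + "\n"


file_name, text = new_category_file("operators", "advertising", "billboard")
print(file_name)
print(text)
```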

Collecting common "names"

NSI can try to fill these files with common operators collected from the OSM planet, unless the category has the skipCollection: true property set.
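
The effect of skipCollection can be pictured as a filter over category files. The Python sketch below assumes the flag lives inside each file's `properties` object and defaults to false when absent; both points are assumptions, and `categories_to_collect` is a name invented for this example.

```python
def categories_to_collect(categories):
    """Return the paths of categories that do not opt out of name collection."""
    return [
        category["properties"]["path"]
        for category in categories
        # Assumption: a missing flag means the category takes part in collection.
        if not category["properties"].get("skipCollection", False)
    ]


CATEGORIES = [
    {"properties": {"path": "operators/advertising/billboard"}, "items": []},
    {"properties": {"path": "operators/advertising/column", "skipCollection": True}, "items": []},
]

print(categories_to_collect(CATEGORIES))
```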

The scripts that do this "name" collection are kind of hacky, and live here: https://github.com/ideditor/nsi-collector

The NSI Collector project may not yet be collecting any items tagged with the desired key; in this case, no common items will get automatically added to category files under that key, such as advertising/billboard.json. But if we did want to start collecting items (such as operators tagged on advertising stuff), we could add the desired key (like 'advertising') to this line: https://github.com/ideditor/nsi-collector/blob/0594a6049c45e3bb91a96d6d29b443b164aff360/collect_osm.js#L30

These keys have folders of category files in the NSI, but are intentionally omitted from the collection script:

| Tree | Key | Explanation |
|------|-----|-------------|
| brands | advertising | advertising=* is collected under the operators tree |
| brands | emergency | emergency=* is collected under the operators tree |
| brands | highway | omitted since a collection would be a list of the most common highway names worldwide, and thus of no use to the NSI |
| brands | landuse | omitted since a collection would be a list of the most common residential/commercial/etc. area names worldwide, and thus of little use to the NSI |
| brands | man_made | man_made=* is collected under the operators tree |
| brands | waterway | omitted since a collection would be a list of the most common waterway names worldwide, and thus of no use to the NSI |
| operators | barrier | only barrier=toll_booth is of interest to the NSI; all other values for the key are of little use |
| operators | highway | only a handful of highway values are of interest to the NSI when considering operators; all other values for the key are of little use |
| operators | internet_access | only internet_access=wlan is of interest to the NSI; all other values for the key are of little use |
| operators | landuse | most landuse operators operate a small number of areas, meaning a collection of landuse operators is unlikely to return operators of interest to the NSI |
| operators | natural | only natural=water is of interest to the NSI; all other values for the key are of no use |
| operators | pipeline | category files in this folder are propagated by templates |
| operators | shop | shop=* is collected under the brands tree |
| operators | tourism | tourism=* is collected under the brands tree |
| transit | highway | category files in this folder are propagated by templates |
| transit | public_transport | category files in this folder are propagated by templates |
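
The omissions listed above can also be expressed as a small lookup, which may be handy when checking why a folder receives no collected names. The Python sketch below copies the tree/key pairs from this page; `OMITTED_FROM_COLLECTION` and `is_omitted` are illustrative names and do not mirror the collector's actual configuration.

```python
# Tree -> keys that have category folders but are left out of name collection.
# Copied from the list on this page; not read from the collector itself.
OMITTED_FROM_COLLECTION = {
    "brands": {"advertising", "emergency", "highway", "landuse", "man_made", "waterway"},
    "operators": {
        "barrier", "highway", "internet_access", "landuse",
        "natural", "pipeline", "shop", "tourism",
    },
    "transit": {"highway", "public_transport"},
}


def is_omitted(tree, key):
    """True if tree/key is intentionally skipped by the collection script."""
    return key in OMITTED_FROM_COLLECTION.get(tree, set())


print(is_omitted("brands", "highway"))
print(is_omitted("brands", "amenity"))
```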

Further reading