location location id_criteria - freebase-schema/freebase GitHub Wiki
-
name_match: There is a name_match if the names of the locations, or their aliases, match on both sources. Location names may match only approximately, may be localized, or may be abbreviated on one of the sources (for example, SC vs South Carolina, or SoCal vs Southern California).
-
geographic_region_compatibility: The locations listed on both sources have geographic_region_compatibility if the external topic's type and the Freebase topic's semantic type appear together in columns 1 and 2 of a row in the table below, and the Compatibility column of that row says YES.
External Source Type | Freebase Semantic Type | Compatibility |
---|---|---|
sublocality (smaller than a city) | /location/neighborhood | YES |
city, town, village or township | /location/neighborhood | YES |
city, town, village or township | /location/citytown | YES |
county | /location/citytown | NO (but there are exceptions, e.g. /en/san_francisco is both a city and a county) |
county | /location/neighborhood | NO |
county | /location/administrative_division | YES |
state, region, or province | /location/citytown | NOT compatible in the US, but compatible in many other countries (e.g. Berlin, Shanghai) |
state, region, or province | /location/neighborhood | NO |
state, region, or province | /location/administrative_division | YES |
-
geolocation_match: There is a geolocation_match if the latitude and longitude from property /location/location/geolocation match the latitude and longitude of the location on the other source. In particular, if the distance between the two geocodes (in km) is around 5% or less of the area of the entity being compared (in km²), then it is a geolocation_match. (The distance is compared against the area of the entity because entity areas are readily available, while lengths and widths of entities are difficult to find in information sources.)
-
contains_match: There is a contains_match if the Freebase location contains a smaller location listed in Freebase property /location/location/contains, and it matches with the smaller location that the location on the other source contains.
-
contained_by_match: There is a contained_by_match if the Freebase location is contained by a bigger location listed in Freebase property /location/location/containedby, and it matches the bigger location that contains the location on the other source. The containing location should be fine-grained enough to be discriminating, for example not as large as Earth or the Northern Hemisphere.
-
USBG_name_match: USBG is a name given to a location by the United States Board on Geographic Names. There is a USBG_name_match if the USBG name in Freebase property /location/location/usbg_name matches with the USBG name on the other source.
-
GNIS_ID_match: A GNIS ID identifies a feature cataloged by the United States Geographic Names Information System. There is a GNIS_ID_match if the GNIS ID for the location listed in Freebase property /location/location/gnis_feature_id is the same on both sources.
-
GEOnet_Feature_ID_match: A GEOnet Feature ID is a unique feature ID used by the GEOnet Names Server (GNS) for features outside of the United States. There is a GEOnet_Feature_ID_match if the GEOnet Feature ID listed in Freebase property /location/location/gns_ufi matches the GEOnet Feature ID on the other source.
-
street_address_match: There is a street_address_match if the complete address of the location (including street name, city, state, and country) listed in Freebase property /location/location/street_address matches with the street address on the other source.
-
events_match: There is an events_match between the sources if one or more events that happened at this location, listed in Freebase property /location/location/events, match the events on the other source. The events should be distinctive enough, such as famous conferences or treaties, wars or battles, natural calamities, big sports competitions, or awards ceremonies.
-
people_born_here_match: There is a people_born_here_match between the sources if one or more famous people born in this location listed in the Freebase topic blurb match in both the sources.
As a general rule of thumb, locations should not be merged if one of the topics is about an abstract entity such as a business_operation, /business/organization, or /education/university, while the other topic references the location of that entity (Example 9, Example 10, Example 11).
Here are some typical patterns for determining identity of two locations:
- name_match and geographic_region_compatibility and geolocation_match (Example 1, Example 2). (Example 8 for counterexample)
- name_match and geographic_region_compatibility and contains_match (Example 3)
- name_match and geographic_region_compatibility and contained_by_match (Example 4)
- name_match and street_address_match (Example 5)
- name_match and geographic_region_compatibility and events_match (Example 6)
- name_match and USBG_name_match
- GNIS_ID_match
- GEOnet_Feature_ID_match
- name_match and people_born_here_match (Example 7)
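
The patterns above can be read as sets of criteria that must all hold at once. A minimal sketch of that reading, assuming the individual criteria have already been evaluated elsewhere; the names `PATTERNS`, `matching_patterns`, and `is_same_location` are hypothetical and not part of Freebase:

```python
# Each pattern is the set of criteria that must all be satisfied.
# The order follows the list above, so pattern numbers are 1-based indices.
PATTERNS = [
    {"name_match", "geographic_region_compatibility", "geolocation_match"},
    {"name_match", "geographic_region_compatibility", "contains_match"},
    {"name_match", "geographic_region_compatibility", "contained_by_match"},
    {"name_match", "street_address_match"},
    {"name_match", "geographic_region_compatibility", "events_match"},
    {"name_match", "USBG_name_match"},
    {"GNIS_ID_match"},
    {"GEOnet_Feature_ID_match"},
    {"name_match", "people_born_here_match"},
]


def matching_patterns(satisfied):
    """Return the 1-based numbers of all patterns fully covered by `satisfied`."""
    satisfied = set(satisfied)
    return [i for i, pattern in enumerate(PATTERNS, start=1) if pattern <= satisfied]


def is_same_location(satisfied):
    """True if at least one identity pattern is fully satisfied."""
    return bool(matching_patterns(satisfied))
```

This sketch does not encode the rule of thumb about abstract entities versus their locations; that check would still have to veto a merge separately.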

Pattern 1 Example 1: Here the difference in geocodes is equivalent to 2.328 km, which is very small compared to the 600.6 km² area of San Francisco and is less than 5% of that area. Hence it is a match.
Pattern 1 Example 2: Here the same geocodes as in Example 1 are used for Tenderloin and Telegraph Hill, which are neighborhoods in San Francisco. The difference in geocodes, equivalent to 2.328 km, is much larger (more than 100%) than the neighborhood size of 0.56 km² for Telegraph Hill. Hence it is not a match, since we are comparing the neighborhoods.
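
The arithmetic in Examples 1 and 2 can be sketched as follows. The page does not say how the distance between geocodes was computed, so the haversine formula and the mean Earth radius used here are assumptions, and the function names are hypothetical. The threshold follows the criterion as written: distance in km divided by area in km², compared against 5%.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geolocation_match(distance_km, area_km2, threshold=0.05):
    """Apply the rule as stated: distance (km) is at most 5% of the area (km^2)."""
    return distance_km / area_km2 <= threshold


# Example 1: San Francisco, 2.328 km against 600.6 km^2 (ratio about 0.4%).
san_francisco = geolocation_match(2.328, 600.6)
# Example 2: Telegraph Hill, 2.328 km against 0.56 km^2 (ratio over 400%).
telegraph_hill = geolocation_match(2.328, 0.56)
```

With these inputs `san_francisco` is `True` and `telegraph_hill` is `False`, in line with the two examples.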
Pattern 2 Example 3: Here the names of both locations match and both of them contain KV5. Hence it is a match.
Pattern 3 Example 4: In this example, the name of the location on the left-hand side (LHS) matches an alias on the right-hand side (RHS). The value of the contained_by property also matches on both sources. Hence it is a match.
Pattern 4 Example 5: Here the names of the locations and their complete street addresses (including street name, city, and country) match on both sources. Hence it is a match.
Pattern 5 Example 6: Here the names of the locations match, and an event that occurred at this location also matches between the sources. Hence it is a match.
Pattern 9 Example 7: Here the locations listed on both sides have a name match and are also listed as the birthplace of Yuri Gagarin (the first Soviet cosmonaut). Hence it is a match.
Pattern 1 Example 8: These topics have approximately matching names and a geolocation match (an acceptable geocode difference equivalent to 3.813 km), but because of their geographic region incompatibility (province vs /location/citytown), it is not a match.
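
The geographic_region_compatibility table that decides Example 8 can be encoded as a lookup keyed on the pair of types. This is only a sketch: the short external type labels (`sublocality`, `city`, `county`, `state`) and the `DEPENDS` and `UNLISTED` values are hypothetical shorthand introduced here, standing in for the table's row descriptions and its conditional entry.

```python
# Keys: (external source type, Freebase semantic type).
# "DEPENDS" marks the row that is incompatible in the US but compatible
# in many other countries (e.g. Berlin, Shanghai).
COMPATIBILITY = {
    ("sublocality", "/location/neighborhood"): "YES",
    ("city", "/location/neighborhood"): "YES",
    ("city", "/location/citytown"): "YES",
    ("county", "/location/citytown"): "NO",  # exceptions exist, e.g. /en/san_francisco
    ("county", "/location/neighborhood"): "NO",
    ("county", "/location/administrative_division"): "YES",
    ("state", "/location/citytown"): "DEPENDS",
    ("state", "/location/neighborhood"): "NO",
    ("state", "/location/administrative_division"): "YES",
}


def region_compatibility(external_type, freebase_type):
    """Look up a type pair; pairs not in the table are reported as UNLISTED."""
    return COMPATIBILITY.get((external_type, freebase_type), "UNLISTED")
```

A `DEPENDS` or `UNLISTED` result is a signal for human review rather than an automatic decision, since the table itself notes exceptions.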
Example 9: Queensland Museum South Bank on LHS is a specific location of the Queensland Museum, which is an entity. Hence these two topics are not a match.

Example 10: The RHS source has the Ars Electronica organization, which is an arts and technology museum organization, while the LHS has the Ars Electronica Center, which is one of the organization's museum locations. Hence these two topics are not a match.

Example 11: Stanford University is an educational institution entity, while Stanford University, main campus is the grounds or property on which Stanford University is located. Hence these two topics are not a match.