Reverse Geo tagging - wywfalcon/twitter-healthcare-analysis GitHub Wiki
Reverse Geo-tagging
Many tweets specify a location or coordinates, but that does not mean we know where each tweet is from, because the user can enter whatever they want. One way to find out is reverse geo-tagging through coordinates. With this information, we can determine details such as zip code and city.
How it works
Usage
Searching for a city with the `coordinates` or `location` field in a tweet
Search the cities file city1000.txt for the city of each tweet and save the result to the destination folder.
coordinates
$ city_from_coordinates [tweetsJsonFile] [citiesFile] [destinationDirectory]
location
$ city_from_location [tweetsJsonFile] [citiesFile] [destinationDirectory]
Both at once
$ clean_city_file [tweetsJsonFile] [citiesFile] [destDir]
Explanation
City from coordinates
- It reads the `coordinates` field from the city file and uses each city's longitude as a reference for looking it up later
- It goes through each tweet and uses the coordinates, if present, to find the closest city
- The results are then recorded in `coordinate_match.csv`
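The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names (`build_index`, `closest_city`, `tweet_lat_lon`), the `(name, lat, lon)` city tuples, and the 2-degree longitude window are all assumptions made for the example. It also assumes the tweet's `coordinates` field is a GeoJSON point in `[longitude, latitude]` order. The longitude window is a heuristic that narrows the candidates; it can miss the true nearest city at high latitudes or across the antimeridian.

```python
import bisect
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def build_index(cities):
    """Sort hypothetical (name, lat, lon) tuples by longitude for bisection."""
    ordered = sorted(cities, key=lambda c: c[2])
    return ordered, [c[2] for c in ordered]


def closest_city(index, lat, lon, window_deg=2.0):
    """Return the name of the nearest city, searching a longitude window first."""
    ordered, lons = index
    lo = bisect.bisect_left(lons, lon - window_deg)
    hi = bisect.bisect_right(lons, lon + window_deg)
    # Fall back to every city if nothing lies inside the window.
    candidates = ordered[lo:hi] or ordered
    return min(candidates, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]


def tweet_lat_lon(tweet):
    """Extract (lat, lon) from a tweet dict, or None if no coordinates are set."""
    coords = tweet.get("coordinates")
    if not coords:
        return None
    lon, lat = coords["coordinates"]  # assumed GeoJSON order: [lon, lat]
    return lat, lon


# Illustrative data only; a real run would load the cities file instead.
cities = [
    ("Boston", 42.3601, -71.0589),
    ("New York", 40.7128, -74.0060),
    ("London", 51.5074, -0.1278),
]
index = build_index(cities)
tweet = {"coordinates": {"type": "Point", "coordinates": [-73.99, 40.73]}}
position = tweet_lat_lon(tweet)
match = closest_city(index, *position) if position else None
```

Sorting by longitude once lets each tweet be matched with a binary search plus a short scan, rather than a distance computation against every city in the file.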
City from location
- It reads the `location` field from the city file and loads each word of the city name into an auto-complete list
- It uses the first 4 characters of each word in the `location` field to search for cities in the list
- It matches each word of the `location` field against the auto-complete list to find relevant cities to compare
- The top candidate cities are then compared using fuzzy word comparison to determine the internationally accepted name
- The city names are then recorded in `strong_match.txt`
and `