Reverse Geo tagging - wywfalcon/twitter-healthcare-analysis GitHub Wiki

Reverse Geo-tagging

Many tweets carry a specified location or coordinates, but that does not mean we know where each tweet is from, because the user can enter whatever they want in the location field. One way to find out is reverse geo-tagging through the coordinates. With this, we can determine information such as the zip code and city.

How it works

Usage

Finding the city from the coordinates or location field of a tweet

Look up the city for each tweet in the cities file city1000.txt and save the results to the destination folder

Coordinates

$ city_from_coordinates [tweetsJsonFile] [citiesFile] [destinationDirectory]

Location

$ city_from_location [tweetsJsonFile] [citiesFile] [destinationDirectory]

Both at once

$ clean_city_file [tweetsJsonFile] [citiesFile] [destDir]

Explanation

City from coordinates

  1. It reads the coordinates field from the city file and uses the longitude as a reference key for finding each city later
  2. It goes through each tweet and uses its coordinates, if present, to find the closest city
  3. The results are then recorded in coordinate_match.csv
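
The steps above could be sketched roughly as follows in Python. This is only an illustration, not the tool's actual code: the function names, the `(name, lat, lon)` tuple layout, and the `window_deg` search window are all assumptions made for the example. It keeps the cities sorted by longitude, narrows the search to cities within a longitude window of the tweet, and picks the one with the smallest great-circle distance.

```python
import bisect
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_index(cities):
    """Sort hypothetical (name, lat, lon) tuples by longitude.

    Returns the sorted list together with a parallel list of longitudes
    that bisect can search.
    """
    ordered = sorted(cities, key=lambda c: c[2])
    return ordered, [c[2] for c in ordered]


def closest_city(index, lat, lon, window_deg=2.0):
    """Return the name of the nearest city within +/- window_deg of longitude.

    Assumption: the window is an illustrative shortcut. It ignores
    wrap-around at +/-180 degrees and returns None if no city falls inside it.
    """
    ordered, lons = index
    lo = bisect.bisect_left(lons, lon - window_deg)
    hi = bisect.bisect_right(lons, lon + window_deg)
    candidates = ordered[lo:hi]
    if not candidates:
        return None
    return min(candidates, key=lambda c: haversine_km(lat, lon, c[1], c[2]))[0]


def city_for_tweet(index, tweet):
    """Look up the closest city for a tweet dict, or None without coordinates."""
    point = tweet.get("coordinates")
    if not point:
        return None
    lon, lat = point["coordinates"]  # GeoJSON order: longitude first
    return closest_city(index, lat, lon)
```

Sorting by longitude once means each tweet only has to be compared against a small slice of the city list rather than every city. A real run would then write each tweet and its matched city as a row of coordinate_match.csv.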

City from location

  1. It reads the location field from the city file and loads each word of every city name into an auto-complete list
  2. It uses the first 4 characters of each word in the tweet's location field to search the list for cities
  3. It matches each word of the location field against the auto-complete list to find relevant candidate cities
  4. The top candidates are then compared using fuzzy word comparison to determine the internationally accepted name
  5. The city names are then recorded in strong_match.txt and `