Streaming with Tweepy - sahajss/knowledge_base GitHub Wiki
What is Tweepy
Tweepy is an easy-to-use Python library for accessing the Twitter API. This page covers Tweepy's support for the Streaming API, which delivers a sample of tweets in real time as they are posted.
Streaming
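import tweepy
from time import sleep

myStreamListener = StreamListener()
myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
try:
    # Bounding box: west longitude, south latitude, east longitude, north latitude
    myStream.filter(locations=[-124.77, 24.52, -66.95, 49.38])
except Exception:
    # The stream stopped; wait ten minutes before restarting
    print("Waiting...")
    sleep(600)
    main()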
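The first line instantiates the StreamListener class, which is defined below. The second line creates the stream, authenticating with api.auth. The call to filter inside the try block opens the connection and delivers only tweets located within a bounding box around the contiguous United States, given as west longitude, south latitude, east longitude, north latitude. If the stream stops with an error, the except block pauses the program for ten minutes and then calls main(), which restarts the program and switches the API key.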
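The wait-and-restart behaviour described above can also be written as a loop instead of a recursive call to main(). The following is a minimal sketch, not part of Tweepy: run_with_retries, start_stream, max_attempts and the injectable sleep argument are hypothetical names chosen for illustration, and the ten-minute wait is taken from the example above.

```python
import time


def run_with_retries(start_stream, wait_seconds=600, max_attempts=3, sleep=time.sleep):
    """Call start_stream() until it returns normally.

    Returns the number of attempts that were needed. If the last allowed
    attempt also fails, the exception is re-raised to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start_stream()
            return attempt
        except Exception as exc:
            if attempt == max_attempts:
                raise
            # Same idea as the except block above: report, then pause
            print("Stream stopped (%s). Waiting %d seconds..." % (exc, wait_seconds))
            sleep(wait_seconds)
```

Assuming the myStream object from the example, it could be used as run_with_retries(lambda: myStream.filter(locations=[-124.77, 24.52, -66.95, 49.38])). Passing sleep as an argument makes the loop easy to try out without actually waiting.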
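numTweets = 0  # running count of tweets received

class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        global numTweets
        try:
            numTweets += 1
            # '\r' moves the cursor back to the line start so the counter updates in place
            print("Number of tweets collected: " + str(numTweets), end='\r')
        except Exception as e:
            print('Encountered Exception Tweet:', e)
        # Returning True keeps the stream connected
        return True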
The class subclasses tweepy.StreamListener. Tweepy calls on_status for every new status that Twitter streams. The body is wrapped in a try-except block because a tweet may contain characters that the program cannot print or store.
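One way to reduce printing failures of this kind is to escape any character the output encoding cannot represent before printing. This is a small sketch using only the standard library; the helper name printable and the choice of ASCII as the target encoding are assumptions made for illustration, not something the original listener does.

```python
def printable(text, encoding="ascii"):
    """Return text with characters that `encoding` cannot represent
    replaced by backslash escapes, so printing it cannot fail."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


# A tweet-like string with an accented letter and an emoji
print(printable("caf\u00e9 \U0001F600"))
```

Inside on_status, the same helper could be applied to the tweet text before it is printed or written to a file.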
Parsing the data
# status._json is the tweet as a plain dictionary, parsed from Twitter's JSON
data = status._json
mydict = {}
mydict['text'] = data['text']        # text of the tweet
mydict['time'] = data['created_at']  # creation time, as a string
The Status object that Tweepy passes to on_status has many nested attributes and is hard to read directly, so Tweepy also exposes the original JSON through status._json. With a JSON tree viewer, it is easy to locate the attributes of the tweet that you need. The example above gets the text of a tweet and the time it was created.
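To try the parsing step without a live stream, the same lookups can be run on a hand-written dictionary. The sketch below is an illustration only: extract_fields and the sample values are made up, and converting created_at to a datetime is an optional extra that assumes the timestamp layout shown in the sample.

```python
from datetime import datetime


def extract_fields(data):
    """Pick the text and creation time out of a tweet dictionary."""
    return {
        "text": data["text"],
        # Assumed layout, e.g. "Wed Oct 10 20:19:24 +0000 2018"
        "time": datetime.strptime(data["created_at"], "%a %b %d %H:%M:%S %z %Y"),
    }


# Hand-written stand-in for status._json (values are invented)
sample = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "Hello, world",
    "user": {"screen_name": "example_user"},
}

fields = extract_fields(sample)
print(fields["text"], fields["time"].isoformat())
```

Nested attributes are reached the same way, for example sample["user"]["screen_name"].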
Error handling
import json
from tweepy.models import Status

class StreamListener(tweepy.StreamListener):
    # ... on_status as defined above ...

    def on_data(self, data):
        # data is the raw JSON string for one stream message
        if 'in_reply_to_status_id' in data:
            status = Status.parse(self.api, json.loads(data))
            if self.on_status(status) is False:
                return False
        elif 'delete' in data:
            delete = json.loads(data)['delete']['status']
            if self.on_delete(delete['id'], delete['user_id']) is False:
                return False
        elif 'limit' in data:
            if self.on_limit(json.loads(data)['limit']['track']) is False:
                return False

    def on_delete(self, status_id, user_id):
        return

    def on_limit(self, track):
        return

    def on_error(self, status_code):
        # Returning False disconnects the stream
        return False

    def on_timeout(self):
        return
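Add these error-handling methods to your StreamListener class. on_data receives the raw data from the streaming connection and passes it to the matching handler. on_delete is called when a status that was streamed is deleted. on_limit is called when Twitter sends a limit notice, which means the filter matched more tweets than the stream is allowed to deliver; track is the number of tweets that were not delivered. on_error is called when the connection returns an error status code, and returning False from it disconnects the stream. on_timeout is called when the streaming connection times out.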
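The dispatch logic in on_data can be explored without a network connection. The sketch below is not Tweepy's implementation: classify_message is a hypothetical helper that parses the message first and checks its top-level keys, rather than searching the raw string as the code above does. The sample messages are invented.

```python
import json


def classify_message(raw):
    """Return 'status', 'delete', 'limit', or 'other' for one raw stream message."""
    message = json.loads(raw)
    if "delete" in message:
        return "delete"
    if "limit" in message:
        return "limit"
    if "in_reply_to_status_id" in message:
        return "status"
    return "other"


# Invented examples of the three message kinds handled by on_data
tweet = json.dumps({"text": "please delete this", "in_reply_to_status_id": None})
deletion = json.dumps({"delete": {"status": {"id": 1, "user_id": 2}}})
limit_notice = json.dumps({"limit": {"track": 42}})

for raw in (tweet, deletion, limit_notice):
    print(classify_message(raw))
```

Checking parsed keys means a tweet whose text happens to contain the word "delete" or "limit" is still classified by its structure.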