Streaming with Tweepy - sahajss/knowledge_base GitHub Wiki

What is Tweepy

Tweepy is an easy-to-use Python library for accessing the Twitter API. This page covers Tweepy's support for the Twitter Streaming API, which delivers a sample of tweets in real time as they are posted.

Streaming

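    # Assumes: import tweepy; from time import sleep; api is an authenticated tweepy.API
    myStreamListener = StreamListener()
    myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)
    try:
        # Bounding box (west lon, south lat, east lon, north lat) around the contiguous United States
        myStream.filter(locations=[-124.77, 24.52, -66.95, 49.38])
    except Exception:
        # The stream stopped: wait ten minutes, then restart from main()
        print("Waiting...")
        sleep(600)
        main()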

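The first line instantiates the StreamListener class defined below. The second line creates the stream from the API's authentication handler and the listener. The call to filter() inside the try block opens the connection and requests only tweets geotagged inside a bounding box around the contiguous United States. If the stream stops with an exception, the except block pauses the program for ten minutes and then calls main() again to restart with a different API key.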
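The restart logic described above can also be written as a loop instead of having main() call itself, which avoids growing the call stack on every reconnect. This is only a minimal sketch: `run_with_retry`, `start_stream`, `wait_seconds`, and `max_attempts` are hypothetical names, not part of Tweepy, and `start_stream` is assumed to be a function that wraps the `myStream.filter(...)` call.

```python
import time


def run_with_retry(start_stream, wait_seconds=600, max_attempts=3, sleep=time.sleep):
    """Call start_stream() until it returns normally or max_attempts is reached.

    start_stream is a hypothetical zero-argument callable, for example a
    function that wraps myStream.filter(...). Returns the attempt number
    on which the call finished without raising.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start_stream()
            return attempt  # the stream ended without an exception
        except Exception as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            print("Stream stopped (%s); waiting %d seconds..." % (exc, wait_seconds))
            sleep(wait_seconds)
```

Passing `sleep` as a parameter is a design choice that makes the waiting step easy to replace, for example with a function that also rotates the API key before the next attempt.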

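class StreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # Called once for each new status delivered by the stream
        try:
            global numTweets  # module-level counter, initialised to 0 elsewhere
            numTweets += 1
            print("Number of tweets collected: " + str(numTweets), end='\r')
        except Exception as e:
            print('Encountered Exception Tweet:', e)
        # Returning True keeps the stream connected
        return True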

The first line defines the class as a subclass of tweepy.StreamListener. Tweepy calls on_status for every new status that Twitter streams. I use a try-except block because a tweet may contain characters that the program cannot print or store.
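The counter does not have to be a global; it can live on the listener instance instead. The sketch below is a stand-in only: `CountingListener` and `num_tweets` are hypothetical names, and the class is deliberately a plain Python class so that it runs without Tweepy installed. In the real program it would subclass tweepy.StreamListener as shown above and call the parent constructor from `__init__`.

```python
class CountingListener:
    """Hypothetical stand-in that mirrors the on_status method shown above."""

    def __init__(self):
        self.num_tweets = 0  # replaces the global numTweets

    def on_status(self, status):
        try:
            self.num_tweets += 1
            print("Number of tweets collected: " + str(self.num_tweets), end="\r")
        except Exception as e:
            print("Encountered Exception Tweet:", e)
        return True  # keep the stream open
```

Keeping the count on the instance means two listeners can run side by side without sharing state.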

Parsing the data

# status._json is already a dict of the raw tweet, so no dumps/loads round trip is needed
data = status._json
mydict = {}
mydict['text'] = data['text']
mydict['time'] = data['created_at']

Because the Status object that Tweepy passes to on_status has many attributes and is hard to read when printed, Tweepy also keeps the raw JSON of the tweet in status._json. Viewing that JSON in a tree viewer makes it easy to locate the attributes of the tweet that you need. The example above extracts the text of a tweet and the time it was created.
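The created_at value arrives as a string, so it can help to convert it to a datetime before storing it. The sketch below assumes the timestamp uses the format shown in the sample; `sample` is a hypothetical, heavily shortened payload, and `extract_fields` is an illustrative helper rather than anything Tweepy provides.

```python
from datetime import datetime

# Hypothetical sample shaped like status._json; real payloads have many more keys.
sample = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "Hello from the stream",
}


def extract_fields(data):
    """Return the tweet text and its creation time as a timezone-aware datetime."""
    created = datetime.strptime(data["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {"text": data["text"], "time": created}


mydict = extract_fields(sample)
print(mydict["time"].isoformat())  # prints 2018-10-10T20:19:24+00:00
```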

Error handling

# Requires: import json; from tweepy.models import Status
def on_data(self, data):
    if 'in_reply_to_status_id' in data:
        status = Status.parse(self.api, json.loads(data))
        if self.on_status(status) is False:
            return False
    elif 'delete' in data:
        delete = json.loads(data)['delete']['status']
        if self.on_delete(delete['id'], delete['user_id']) is False:
            return False
    elif 'limit' in data:
        if self.on_limit(json.loads(data)['limit']['track']) is False:
            return False

def on_delete(self, status_id, user_id):
    return

def on_limit(self, track):
    return

def on_error(self, status_code):
    return False

def on_timeout(self):
    return

Add these methods to your StreamListener class. on_data receives the raw data from the streaming connection and dispatches it to the other handlers. on_delete is called when a deletion notice arrives for a status. on_limit is called when a limitation notice arrives; its track value is the number of matching tweets that were not delivered because of rate limiting. on_error is called when the connection returns a non-200 status code; returning False stops the stream. on_timeout is called when the streaming connection times out.
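The on_data method above decides the message type by searching the raw string for a keyword. An alternative is to parse the JSON once and look at its top-level keys, so that a word appearing inside the tweet text cannot affect the result. This is a minimal sketch under that assumption; `classify_message` is a hypothetical helper, not a Tweepy function, and the labels it returns are illustrative.

```python
import json


def classify_message(raw):
    """Return 'status', 'delete', 'limit', or 'other' for one raw stream message."""
    message = json.loads(raw)  # parse once, then inspect top-level keys only
    if "in_reply_to_status_id" in message:
        return "status"
    if "delete" in message:
        return "delete"
    if "limit" in message:
        return "limit"
    return "other"
```

An on_data implementation could call a helper like this first and then hand the parsed message to on_status, on_delete, or on_limit as appropriate.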