Iteration 1: Requirements (Deliverable) - agarcia169/4306-Donaldson-Project GitHub Wiki

Iteration 1

1. Team Info:

  • Joel King
  • Adebolanle Balogun
  • Alex Garcia

2. Vision Statement:

A program that grabs Tweets from the official Twitter accounts of companies that do business with Donaldson, and analyzes them for mentions of powertrain alternatives to internal combustion engines. The program can provide that data in aggregate or by company, possibly in a manner indicating chronology and/or positive/negative tone.

3. Feature List:

  • Show historical mentions of powertrain tech by company

  • See overall global mentions of powertrain tech

  • Show mentions of selected powertrains

  • See all powertrain mentions by percentage
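The "mentions by percentage" feature could be computed roughly as follows. This is a minimal sketch, not the project's implementation: the function name `mention_percentages` and the sample powertrain labels are hypothetical, and it assumes each mention has already been reduced to a single category label.

```python
from collections import Counter


def mention_percentages(mentions):
    """Return each powertrain's share of all mentions, as a percentage."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        # No mentions at all: nothing to report, and avoids dividing by zero.
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical labels; the real category list is not defined yet.
sample = ["battery electric", "hydrogen fuel cell", "battery electric", "hybrid"]
shares = mention_percentages(sample)
```

The same function would serve the per-company view if it is passed only the mentions belonging to one company.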

4. UML Use Case Diagram:

Use Case Diagram v2

5. UI Sketches:

6. Key use-cases:


Adding a company:

The user asks to add a company to the database. The program requests both a company name (required, not null) and a Twitter handle (required, not null), entered via the keyboard or read from a file so that multiple companies can be added at once. For each handle, the program verifies that it resolves to a valid account and that the retrieved account ID is not already in the database. It then presents information from that account (the bio/description, perhaps) and asks the user to confirm that this is the correct account. If confirmed, the company is added to the database for later use. No further analysis occurs at this time.
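The flow above might be sketched like this. Everything here is an assumption made for illustration: the `name,handle` file layout, the function names, and the use of a plain dict in place of the database. The `lookup` callable stands in for whatever Tweepy user lookup the project ends up using, and `confirm` stands in for the confirmation prompt, so the sketch runs without network access.

```python
import csv
import io


def read_company_rows(text):
    """Parse 'name,handle' lines, rejecting rows with a missing name or handle."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if len(record) != 2:
            raise ValueError(f"expected 'name,handle', got {record!r}")
        name, handle = (field.strip() for field in record)
        handle = handle.lstrip("@")
        if not name or not handle:
            raise ValueError(f"name and handle are both required: {record!r}")
        rows.append((name, handle))
    return rows


def add_company(db, name, handle, lookup, confirm):
    """Try to add one company; return a short status string.

    db      -- dict keyed by account ID (hypothetical stand-in for the database)
    lookup  -- callable: handle -> {'id': ..., 'description': ...} or None
    confirm -- callable: profile dict -> bool (the user's yes/no answer)
    """
    profile = lookup(handle)
    if profile is None:
        return "invalid handle"
    if profile["id"] in db:
        return "already in database"
    if not confirm(profile):
        return "not confirmed"
    db[profile["id"]] = {"name": name, "handle": handle}
    return "added"


# Usage with a fake lookup table instead of a live API call.
fake_accounts = {"exampleco": {"id": "1001", "description": "Example Co official"}}
db = {}
results = [
    add_company(db, name, handle, fake_accounts.get, lambda profile: True)
    for name, handle in read_company_rows("Example Co,@exampleco\nGhost Inc,nosuchhandle\n")
]
```

Keying the stored record on the account ID rather than the handle follows the use case's "not already in the database under the retrieved ID" check.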


Pulling down a company's tweets for analysis:

A menu lists the valid companies. The user selects one company from that menu, or perhaps several, either through the keyboard or via a file read into the program. The user asks the program to query for new Tweets, possibly by count, possibly by date range, or by other criteria. The program asks for confirmation and warns how the request will affect the Tweet caps; both the quarter-hourly and the monthly caps apply. If the user confirms the action, the Twitter API is queried.

Note: this pulls down all the data needed for analysis and stores it in the database, so no further requests that count against the monthly cap should be required (to be confirmed).
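The confirmation step could build its warnings along these lines. This is only a sketch: the `CapStatus` fields, the function name, and all numbers are hypothetical, and the actual cap values would have to come from the Twitter API terms for the account in use.

```python
from dataclasses import dataclass


@dataclass
class CapStatus:
    monthly_remaining: int  # Tweets left under the monthly cap (assumed to be tracked locally)
    window_remaining: int   # requests left in the current 15-minute window


def cap_warnings(requested_tweets, tweets_per_request, status):
    """Return warning strings to show the user before they confirm a pull."""
    warnings = []
    # Ceiling division: a partly filled page still costs one request.
    requests_needed = -(-requested_tweets // tweets_per_request)
    if requested_tweets > status.monthly_remaining:
        warnings.append(
            f"Request for {requested_tweets} Tweets exceeds the "
            f"{status.monthly_remaining} left under the monthly cap."
        )
    if requests_needed > status.window_remaining:
        warnings.append(
            f"Needs {requests_needed} requests but only "
            f"{status.window_remaining} remain in this 15-minute window."
        )
    return warnings
```

An empty list means the request fits within both caps, so the program can ask for a plain confirmation.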


Display data:

The program dumps a CSV file containing powertrain mentions, labeled by time or by other methods of categorization. This may be a simple CSV for use in whatever program Donaldson deems fit, which avoids locking them into whatever poor UI we would likely settle on.
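A CSV dump of this kind needs little more than the standard `csv` module. In the sketch below, the column names (`company`, `tweeted_at`, `powertrain`) and the sample rows are hypothetical, and timestamps are assumed to be ISO 8601 strings so that sorting them as text also sorts them chronologically.

```python
import csv
import io

# Hypothetical column set; the real export may carry more fields.
FIELDS = ["company", "tweeted_at", "powertrain"]


def write_mentions_csv(mentions, out):
    """Write mention rows to a file-like object, oldest first."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in sorted(mentions, key=lambda m: m["tweeted_at"]):
        writer.writerow(row)


# Usage: write to an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_mentions_csv(
    [
        {"company": "Example Co", "tweeted_at": "2021-03-02T09:00:00", "powertrain": "hybrid"},
        {"company": "Example Co", "tweeted_at": "2021-01-15T14:30:00", "powertrain": "battery electric"},
    ],
    buffer,
)
```

When writing to a real file, the `csv` documentation recommends opening it with `newline=""` so that line endings are not doubled on Windows.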

7. Architecture:

  • Python
  • Tweepy: a Python library for accessing the Twitter API.
  • NLTK: a Python library for tokenizing sentences (and doing sentiment analysis?)
  • MySQL: database storage to keep track of companies and their associations with the various Tweets mentioning powertrain technologies.
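One possible shape for the database is sketched below. Note the assumptions: the table and column names are hypothetical, and the sketch uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server. The actual MySQL DDL would differ in details such as column types.

```python
import sqlite3

# Hypothetical schema: companies, their Tweets, and per-Tweet powertrain mentions.
SCHEMA = """
CREATE TABLE company (
    twitter_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL,
    handle     TEXT NOT NULL
);
CREATE TABLE tweet (
    tweet_id   TEXT PRIMARY KEY,
    twitter_id TEXT NOT NULL REFERENCES company(twitter_id),
    posted_at  TEXT NOT NULL,
    body       TEXT NOT NULL
);
CREATE TABLE mention (
    tweet_id   TEXT NOT NULL REFERENCES tweet(tweet_id),
    powertrain TEXT NOT NULL,
    PRIMARY KEY (tweet_id, powertrain)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data, for illustration only.
conn.execute("INSERT INTO company VALUES (?, ?, ?)", ("1001", "Example Co", "exampleco"))
conn.execute(
    "INSERT INTO tweet VALUES (?, ?, ?, ?)",
    ("t1", "1001", "2021-01-15T14:30:00", "Our new battery electric drivetrain ships soon."),
)
conn.execute("INSERT INTO mention VALUES (?, ?)", ("t1", "battery electric"))

# Mentions per company and powertrain, as the by-company feature would need.
rows = conn.execute(
    """
    SELECT c.name, m.powertrain, COUNT(*)
    FROM mention AS m
    JOIN tweet AS t ON t.tweet_id = m.tweet_id
    JOIN company AS c ON c.twitter_id = t.twitter_id
    GROUP BY c.name, m.powertrain
    """
).fetchall()
```

Storing the Tweet text and timestamp locally matches the note in the use cases that analysis should not require further API requests once the data has been pulled.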