[GSoC 2013] Nilaksh Das' proposal for new insights - nilakshdas/ThinkUp GitHub Wiki

This is a proposal for developing some rich new insights for the ThinkUp system. Just as ThinkUp's insights disrupt the traditional practice of serving raw analytics as is, these insights would let users break away from interpreting their social engagement purely by the number of comments, likes, shares and retweets their content generates.


The Emotional Quotient insight

On April 2nd, 2013, Anil wrote that ThinkUp is "shooting for emotion, not automation." This insight was inspired by that thought.

[Figure: Emotional Quotient insight]

An essential driving factor of social media communication is the ability to share one's emotional state of mind about various events. However, the race to collect retweets, comments and likes often overshadows the actual purpose of sharing opinions socially. The emotional content of a user's posts can be analysed using natural language processing techniques and can provide a compelling insight into the user's emotional quotient over a period of time. ThinkUp can capture this emotional content, which is stored as raw text, and serve it as a visual insight, perhaps even in the form of a radar graph.

The emotional state of a person can be classified into six basic emotions (anger, surprise, disgust, sadness, happiness and fear), as proposed by Paul Ekman. Each word in a tweet can be categorized syntactically and assigned a different weight for each kind of emotion, and a weighted average of the emotional content of its words can then be evaluated to determine the overall emotional feel of the message. A readily available 'Lexicon' based on WordNet contains more than 5000 English words along with their emotional weights, calculated using natural language processing heuristics. Each word in the list has the following structure:

[Figure: Lexicon structure]

Hence, the body of a tweet can be readily scanned for potential emotional content by comparing each word against the list of terms. After the emotional parameters contained in the text have been evaluated, the information can be served as an insight so that the user can determine the quality of responses a post has generated in the context of emotions. More creative visualizations based on colors associated with these 6 emotions can also be imagined.
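As a minimal sketch of the word-by-word scan described above, the Python below averages per-emotion weights over the words of a post that appear in the lexicon. The `LEXICON` entries and their values are made up for illustration (the real WordNet-based lexicon's format is not reproduced here), the tokenizer is deliberately naive, and the average is an unweighted mean over matched words; weighting words by their syntactic category, as suggested above, is left out.

```python
from collections import defaultdict

# The six basic emotions proposed by Paul Ekman.
EMOTIONS = ("anger", "surprise", "disgust", "sadness", "happiness", "fear")

# Hypothetical excerpt of a WordNet-based emotion lexicon: each word maps to a
# weight per emotion. The real lexicon's entries and values will differ.
LEXICON = {
    "furious": {"anger": 0.9, "disgust": 0.3},
    "wow": {"surprise": 0.8, "happiness": 0.4},
    "lost": {"sadness": 0.6, "fear": 0.2},
    "great": {"happiness": 0.8},
}


def tokenize(text):
    """Naive tokenizer: split on whitespace, strip punctuation, lowercase."""
    return [word.strip(".,!?;:\"'()#@").lower() for word in text.split()]


def emotion_profile(text):
    """Mean emotion weights over the words of `text` found in the lexicon."""
    totals = defaultdict(float)
    matched = 0
    for word in tokenize(text):
        weights = LEXICON.get(word)
        if weights is None:
            continue  # no known emotional weight for this word
        matched += 1
        for emotion in EMOTIONS:
            totals[emotion] += weights.get(emotion, 0.0)
    if matched == 0:
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: totals[emotion] / matched for emotion in EMOTIONS}
```

The six resulting values could be plotted directly as the axes of a radar graph.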


The Interaction Graph insight

This insight has been inspired by mentionmapp.

[Figure: Interaction Graph insight]

A social network is all about connections. A user joins a social network to connect and interact with others. However, a user does not necessarily interact with everyone he is connected with. Hence, a visualization depicting the extent of the user's interactions with other people and topics would be a useful insight, helping the user identify his most engaging connections.

Enter the Interaction Graph insight. The Interaction Graph can be created with d3js force-directed graphs. After the mentions and topics in the user's posts over a week have been extracted, an associative array can count how often each mention or topic occurs. The graph can then be generated with each unique entry in the array as a node and the user at the center of the graph. The width of each edge would correspond to how often the user mentions that person or topic. Since the graph is generated with d3js, it can be made highly interactive, just like the example on the d3js example page.
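A sketch of the counting step, under the assumption that mentions are `@name` tokens and topics are `#hashtag` tokens in the post text. It emits `nodes`/`links` records of the general shape that d3js force-directed examples consume; the `interaction_graph` helper and its field names are hypothetical, and the front end would still need to map `value` to edge width and pin the user's node at the center.

```python
import re
from collections import Counter

# @mentions stand for people, #hashtags for topics.
MENTION_OR_TOPIC = re.compile(r"[@#]\w+")


def interaction_graph(username, posts):
    """Build node/link data for a force-directed graph centered on the user.

    posts: texts of the user's posts from one week (hypothetical input).
    Each distinct @mention or #hashtag becomes a node linked to the user;
    the link's "value" is the mention count, to be mapped to edge width.
    """
    center = "@" + username.lower()
    counts = Counter()
    for text in posts:
        for tag in MENTION_OR_TOPIC.findall(text):
            counts[tag.lower()] += 1  # the associative array of frequencies
    counts.pop(center, None)  # ignore self-mentions
    nodes = [{"id": center}]
    links = []
    for tag, freq in counts.most_common():
        nodes.append({"id": tag})
        links.append({"source": center, "target": tag, "value": freq})
    return {"nodes": nodes, "links": links}
```

Serializing the result with `json.dumps` would give the front end a file it can load directly.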


The Outreach Punchcard insight

The Outreach Punchcard insight is a mashup of TweetWhen and GitHub Punchcard.

[Figure: Outreach Punchcard insight]

A very important aspect of social networking for some users is how well they are able to reach their audience. Social visibility is a prerequisite for social engagement, so boosting engagement requires optimizing visibility, and one of the most important factors that determines visibility is timing. The best strategy to enhance social visibility is therefore to post at the opportune time, when most of the user's followers are online rather than busy working or sleeping. This makes it more likely that the post actually reaches its audience instead of getting lost in the timeline. But how can this opportune moment be determined?

This dilemma about when to post can be resolved with the Outreach Punchcard insight. ThinkUp can collect the responses (such as retweets and replies) that the user's content generates, record when each response occurred, and divide the collected data into buckets by day of the week and hour of the day. Once this array of buckets is ready, containing the number of retweets and replies for each hour of each day, the data can be plotted with d3js as a punchcard-style scatter plot.
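The bucketing step might look like the following Python sketch, assuming the response timestamps have already been fetched and converted to a single time zone. `punchcard`, `to_points` and `best_slot` are hypothetical helpers: the first builds the day-by-hour grid, the second flattens it into records a d3js scatter plot could bind to, and the third picks the busiest slot.

```python
from datetime import datetime
from typing import Iterable, List

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def punchcard(response_times: Iterable[datetime]) -> List[List[int]]:
    """Count responses in a 7 x 24 grid: grid[weekday][hour].

    response_times: timestamps of retweets/replies to the user's posts,
    assumed to be already converted to the time zone being analysed.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts in response_times:
        grid[ts.weekday()][ts.hour] += 1  # weekday(): Monday == 0
    return grid


def to_points(grid):
    """Flatten non-empty buckets into records for a d3js scatter plot."""
    return [
        {"day": DAYS[d], "hour": h, "count": grid[d][h]}
        for d in range(7)
        for h in range(24)
        if grid[d][h]
    ]


def best_slot(grid):
    """(day, hour) bucket that has received the most responses."""
    d, h = max(
        ((d, h) for d in range(7) for h in range(24)),
        key=lambda slot: grid[slot[0]][slot[1]],
    )
    return DAYS[d], h
```

In the chart, `count` would typically drive the circle size, so the largest circles mark the best times to post.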


The Virality Index insight

[Figure: Virality Index insight]

The Virality Index helps the user determine how his YouTube content is currently performing. One cannot judge the virality of a video just by looking at its number of views: there is no direct way to tell whether the view count spiked suddenly because the video became popular or whether the views accumulated over a long period of time. Hence, a time-decay factor needs to be applied to the view count to determine how viral the content is.

The Virality Index is simply the number of views divided by the number of weeks since the video was uploaded. This relation may seem trivial at first, but the Google Charts Gauge representation conveys more information than is immediately apparent.

The maximum value of the gauge represents the highest virality the video has ever reached, so the current virality is always shown relative to that peak. The green/red color near the needle indicates a rise/fall in virality since last week. Together, these give the user an overview of the video's current popularity compared with when it was most popular.
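A sketch of the gauge's three inputs, assuming per-week view counts since upload are available and that the index is recomputed at the end of each whole week. `virality_history` and `gauge_reading` are hypothetical helpers; the result would still have to be passed to the Google Charts Gauge, whose options are not shown here.

```python
from itertools import accumulate


def virality_history(weekly_views):
    """Virality Index at the end of each week since upload.

    weekly_views[i] = views gained in week i + 1 (hypothetical input).
    The index at week k is the total views so far divided by k weeks.
    """
    return [total / week for week, total in enumerate(accumulate(weekly_views), start=1)]


def gauge_reading(weekly_views):
    """Current index, gauge maximum, and direction of change since last week."""
    history = virality_history(weekly_views)
    if not history:
        raise ValueError("need at least one week of views")
    current = history[-1]
    previous = history[-2] if len(history) > 1 else current
    return {
        "value": current,
        "max": max(history),  # peak virality the video has ever reached
        "trend": "up" if current > previous else "down" if current < previous else "flat",
    }
```

For weekly views of 100, 500 and 150, the index runs 100, 300, 250, so the needle sits at 250 on a gauge topping out at 300, with a "down" (red) trend.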


The Popularity Timeline insight

[Figure: Popularity Timeline insight]

This insight offers another way of correlating the number of views with a period of time. The week with the highest number of views represents peak popularity, say 100 units, and every other week is scaled relative to it in proportion to its number of views. The YouTube Analytics API can be used to retrieve per-week view counts for the user's video, and the Google Charts Annotated Timeline can be used to represent this data. A timeline of these metrics can reveal the period when a video was trending the most.
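The normalization can be sketched in a few lines of Python. The `(week_label, views)` input shape and the `popularity_timeline` name are assumptions; the output is a list of rows that could be fed to a timeline chart.

```python
def popularity_timeline(weekly_views):
    """Scale per-week views so that the busiest week scores 100.

    weekly_views: list of (week_label, views) pairs (hypothetical shape).
    """
    peak = max((views for _, views in weekly_views), default=0)
    if peak == 0:
        # No views at all: every week scores 0 rather than dividing by zero.
        return [(label, 0.0) for label, _ in weekly_views]
    return [(label, 100.0 * views / peak) for label, views in weekly_views]
```

For weeks with 200, 800 and 400 views, this yields 25, 100 and 50 units.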


The Engagement Index insight

[Figure: Engagement Index insight]

A high view count on YouTube does not necessarily denote a positive response from the audience. A viewer on YouTube can engage with content in many ways, and the Engagement Index insight represents a cumulative measure of all these actions. This insight helps determine the actual overall response the content has generated by taking into consideration the extent of engagement that has occurred per view.

The Engagement Index gives each action a weight that depends on how engaging the action is and how much time it takes to accomplish. The weights were first assigned on a 250-point scale and then scaled down so that the positive weights add up to 100. They could be as follows:

  • 6 points for each view
  • 8 points for each like
  • 12 points for each comment
  • 20 points for each favourite added
  • 24 points for each share
  • 30 points for each subscriber gained
  • -4 points for each dislike
  • -20 points for each favourite removed
  • -30 points for each subscriber lost

Finally, the Engagement Index is calculated by taking the weighted sum of all the actions in a week and dividing it by the total number of views in that week. For example:

  • a video that gets only views and no other actions would score only 6 engagement points.
  • a video that gets 10 views and 3 likes would score (10 × 6 + 3 × 8) / 10 = 8.4 engagement points.
  • a video that gets 10 views, 3 likes and 11 comments would score (10 × 6 + 3 × 8 + 11 × 12) / 10 = 21.6 engagement points.
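The calculation can be sketched as follows, using the weights listed above. The action key names and the `engagement_index` helper are hypothetical and would need to be mapped from whatever the YouTube Analytics API actually returns; a week with no views is treated as scoring 0, which the proposal does not specify.

```python
# Weight per action, from the list above (positive weights sum to 100).
WEIGHTS = {
    "views": 6,
    "likes": 8,
    "comments": 12,
    "favorites_added": 20,
    "shares": 24,
    "subscribers_gained": 30,
    "dislikes": -4,
    "favorites_removed": -20,
    "subscribers_lost": -30,
}


def engagement_index(week):
    """Weighted sum of a week's actions divided by that week's views.

    week: dict mapping the action names above to counts; missing actions
    count as zero. Key names are hypothetical, not official metric ids.
    """
    views = week.get("views", 0)
    if views == 0:
        return 0.0  # undefined without views; treated here as no engagement
    total = sum(weight * week.get(action, 0) for action, weight in WEIGHTS.items())
    return total / views
```

With these weights, `engagement_index({"views": 10, "likes": 3, "comments": 11})` reproduces the 21.6 points from the last example above.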

Note that a video can score more than 100 engagement points, because actions such as shares and comments are not limited by the number of views, but that would be a hyper-engaging video! Interestingly, because of its sheer number of views, Gangnam Style by Psy has only 6.07 engagement points overall!

All the data needed to calculate this index can be pulled from the new YouTube Analytics API. A heat map built with the d3js calendar view would provide a beautiful visualization of the per-week engagement points and let the user see when the video was hottest!


The Filter Crazy insight

[Figure: Filter Crazy insight]

Instagram sees massive use of all of its filters every day, every minute. ThinkUp can aggregate the user's Instagram activity over a month and determine which filters the user applies most. In this visualization, the size of each bubble represents the number of times a filter has been used. This fun insight can be created easily with d3js bubble charts. Maybe this can even evolve into Instagram horoscopes!
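Counting filters for the bubble chart could look like this, assuming each post from the month is available as a dict with a `"filter"` key (a hypothetical shape; the real data would need to be mapped to it). The output nests one child per filter with a `value`, the kind of hierarchy that d3js bubble (pack) layouts typically take.

```python
from collections import Counter


def filter_bubbles(media_items):
    """Count filter usage across a month of Instagram posts.

    media_items: list of dicts with a "filter" key (hypothetical shape).
    Posts without a filter value are skipped.
    Returns hierarchy data: one child per filter, "value" = number of uses.
    """
    counts = Counter(item["filter"] for item in media_items if item.get("filter"))
    return {
        "name": "filters",
        "children": [{"name": name, "value": n} for name, n in counts.most_common()],
    }
```

Because children are ordered by `most_common()`, the user's favourite filter comes first, which could also be surfaced as a one-line caption next to the chart.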