Keyword analysis - robhogg/twive GitHub Wiki

I'm planning to add some intelligence here (e.g. stemming, and experimenting with Pearson distance to group tweets and tweeters). At the moment, though, it's just a simple keyword cloud of the 100 most frequent words, after excluding a list of stop-words.
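As a rough illustration of the current approach (not the actual twive code), the counting step might look something like the sketch below. The function name `top_keywords`, the tokenizing regex and the short `STOP_WORDS` subset are all assumptions made for the example; the real stop-word list is the one given further down this page.

```python
import re
from collections import Counter

# Illustrative subset only; the full stop-word list is on this page.
STOP_WORDS = {"about", "all", "also", "amp", "and", "the", "this", "with", "you", "your"}

# Assumed tokenizer: runs of letters, allowing internal apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_keywords(tweets, stop_words=STOP_WORDS, limit=100):
    """Return the `limit` most frequent non-stop-words as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        for word in WORD_RE.findall(text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = ["The cat and the hat", "Cat about town"]
    for word, count in top_keywords(sample, limit=3):
        print(word, count)
```

The `(word, count)` pairs could then be fed to whatever renders the cloud, with the count driving the font size.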

The stop-word list is currently rather ad hoc, and could do with some refinement. It might be worth trying a pre-prepared list, such as one of the ones here, though those looked a little broad.

List as of 9 Feb 2013:

about from much today
all get my too
also going no very
amp got not via
and had only was
any has our what
anyone have out when
are here over where
but how que who
can I'm should why
could I've sobre will
day into some with
does it's than would
doing its that yes
don't just the you
even last there your
everyone many they  
for more this  

amp is a slightly odd keyword, I know. The regex that extracts keywords needs a bit of attention, as it identifies the alphabetic characters in HTML entities as words.
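One possible fix, sketched here under the assumption that the tweet text arrives with HTML entities still encoded, would be to decode the entities before running the regex, so that the entity for an ampersand becomes a plain `&` and never reaches the word matcher. The function name `extract_words` and the regex are hypothetical; `html.unescape` is from the Python standard library.

```python
import html
import re

# Assumed tokenizer: runs of letters, allowing internal apostrophes (e.g. "don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def extract_words(text):
    """Decode HTML entities, then pull out lower-cased words."""
    # Decoding first turns "&amp;" into "&", which the regex does not match,
    # so "amp" no longer shows up as a keyword.
    decoded = html.unescape(text)
    return WORD_RE.findall(decoded.lower())


if __name__ == "__main__":
    print(extract_words("Fish &amp; chips"))
```

With something like this in place, amp could probably be dropped from the stop-word list.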
