Keyword analysis - robhogg/twive GitHub Wiki

I'm planning to add some intelligence here (e.g. stemming, and experimenting with Pearson distance to group tweets and tweeters). At the moment, though, it's just a simple keyword cloud of the 100 most frequent words, after excluding a list of stop-words.
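To group tweeters this way, Pearson distance is usually taken as 1 minus the Pearson correlation between two word-count vectors. A minimal sketch, assuming each tweeter is represented as a list of counts over a shared vocabulary. The function name and example vectors are hypothetical and not part of twive.

```python
from math import sqrt

def pearson_distance(v1, v2):
    """Return 1 - Pearson correlation of two equal-length count vectors.

    0.0 means identical usage patterns (perfect correlation); 2.0 means
    perfectly anti-correlated usage.
    """
    n = len(v1)
    if n == 0 or n != len(v2):
        raise ValueError("vectors must be non-empty and the same length")
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(y * y for y in v2)
    p_sum = sum(x * y for x, y in zip(v1, v2))
    # Covariance and variance terms, each scaled by n
    num = p_sum - sum1 * sum2 / n
    den = sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    if den == 0:
        # Correlation is undefined for a constant vector; treat as unrelated
        return 1.0
    return 1.0 - num / den

# Hypothetical counts over the vocabulary ["python", "data", "football", "match"]
tweeter_a = [5, 3, 0, 1]
tweeter_b = [4, 2, 0, 0]
tweeter_c = [0, 0, 6, 4]
print(pearson_distance(tweeter_a, tweeter_b))  # small: similar interests
print(pearson_distance(tweeter_a, tweeter_c))  # large: different interests
```

Pearson correlation ignores differences in overall volume, so a prolific tweeter and a quiet one with the same mix of words still come out close together. That is probably what you want when grouping by topic.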

The stop-word list is currently rather ad hoc and could do with some refinement. It might be worth trying a pre-prepared list, such as one of the ones here, though they looked a little broad.

List as of 9 Feb 2013:

about	from	much	today
all	get	my	too
also	going	no	very
amp	got	not	via
and	had	only	was
any	has	our	what
anyone	have	out	when
are	here	over	where
but	how	que	who
can	I'm	should	why
could	I've	sobre	will
day	into	some	with
does	it's	than	would
doing	its	that	yes
don't	just	the	you
even	last	there	your
everyone	many	they
for	more	this
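A minimal sketch of the current "100 most frequent, minus stop-words" step, using the list above. The tokeniser regex, the lower-casing of both tweets and stop-words, and the names STOP_WORDS, TOKEN_RE and top_keywords are all assumptions for illustration, not twive's actual code.

```python
import re
from collections import Counter

# Stop-word list as of 9 Feb 2013, folded to lower case so it can be compared
# against lower-cased tokens (the case-folding is an assumption).
STOP_WORDS = frozenset(w.lower() for w in """
    about all also amp and any anyone are but can could day does doing
    don't even everyone for from get going got had has have here how
    I'm I've into it's its just last many more much my no not only our
    out over que should sobre some than that the there they this today
    too very via was what when where who why will with would yes you your
""".split())

# Hypothetical tokeniser: runs of letters, optionally joined by apostrophes,
# so "don't" and "i'm" stay as single tokens.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def top_keywords(tweets, n=100):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter()
    for text in tweets:
        for token in TOKEN_RE.findall(text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)

tweets = ["Python and data", "python is great", "I'm doing data science with Python"]
print(top_keywords(tweets, n=2))
```

Keeping the list in a frozenset makes each membership check constant-time, so swapping in a larger pre-prepared list costs little.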

amp is a slightly odd keyword, I know. The regex that extracts keywords needs a bit of attention, as it treats the alphabetic characters in HTML entities (e.g. the "amp" in &amp;) as words.
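One possible fix is to decode HTML entities before tokenising, so that &amp; becomes a plain "&", which the word regex then skips. A minimal sketch, assuming a letters-only word regex like the hypothetical one below. Python's standard html.unescape does the decoding, and the function names are illustrative only.

```python
import html
import re

# Hypothetical word regex: any run of letters (with optional apostrophes)
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def naive_tokens(text):
    """Tokenise without decoding entities: '&amp;' yields the word 'amp'."""
    return WORD_RE.findall(text.lower())

def tokens(text):
    """Decode HTML entities (e.g. '&amp;' -> '&') before tokenising."""
    return WORD_RE.findall(html.unescape(text).lower())

print(naive_tokens("Fish &amp; chips"))  # ['fish', 'amp', 'chips']
print(tokens("Fish &amp; chips"))        # ['fish', 'chips']
```

Decoding also handles numeric entities such as &#39;, which a simple "strip &name;" regex would miss. Once the regex is fixed this way, "amp" could be dropped from the stop-word list.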
