Anti Spam Tools - internetarchive/openlibrary GitHub Wiki
This document is quickly becoming deprecated. Please see the DDOS section of our Disaster Recovery & First Responder's Guide instead.
The following was originally published by Giovanni Damiola @gdamdam via http://gio.blog.archive.org/2016/03/10/ol-anti-spam-tools. Gio writes:
I’ve added the common words found in the recent spam to the spam words. I’ve blacklisted mail.com, as almost all of the spam was coming from that domain; this may stop some genuine people from registering and making edits. I’ve also blocked a lot of accounts and reverted their edits.
Other approaches:
On ol-db1 investigate volume and patterns:
select * from store where key like 'account/%/verify' order by id desc limit 50;
Check nginx access logs for common vectors on ol-www1:
sudo cat /var/log/nginx/access.log | grep "/people"
sudo cat /var/log/nginx/access.log | grep "/account/create"
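The two greps above show raw matching lines; to see volume per client it can help to count them instead. The following is a minimal sketch, not part of the original tooling: `count_hits` is a hypothetical helper name, and it assumes the access log uses nginx's default "combined" layout, where the (anonymized) client is the first whitespace-separated field and the request path is the seventh.

```shell
# count_hits LOGFILE PATH_PREFIX   (hypothetical helper, not in olsystem)
# Prints "count client" pairs, busiest client first, for requests whose
# path starts with PATH_PREFIX.
# Assumption: field 1 is the client and field 7 is the request path,
# as in nginx's default "combined" log format.
count_hits() {
    awk -v prefix="$2" '
        index($7, prefix) == 1 { hits[$1]++ }          # path starts with prefix
        END { for (client in hits) print hits[client], client }
    ' "$1" | sort -rn
}
```

For example, `count_hits /var/log/nginx/access.log /account/create | head` would list the clients creating the most accounts. Reading the log typically requires root, so the function would need to run in a root shell or against a readable copy of the log.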
Sam's magic sauce:
netstat -n | /home/samuel/work/reveal-abuse/mktable
sudo cat /var/log/nginx/access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail -n 10 | /home/samuel/work/reveal-abuse/reveal | /home/samuel/work/reveal-abuse/shownames
First, ssh over to ol-www0 (which is the entry point for all traffic) and determine who the bad actor(s) are. Because we anonymize IPs, you'll first have to populate a map of anonymous IPs to IPs we can actually block:
ssh -A ol-www1
netstat -n | /home/samuel/work/reveal-abuse/mktable # XXX this should probably be added to `olsystem`, see: https://github.com/internetarchive/olsystem/issues/45
Then run:
sudo tail -n 5000 /var/log/nginx/access.log | cut -d ' ' -f 1 | sort | uniq -c | sort -n | tail -n 10 | /home/samuel/work/reveal-abuse/reveal | /home/samuel/work/reveal-abuse/shownames
Or...
sudo tail -n 250000 /1/var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr | head -n 30
At this point (see nginx.conf), you can add the IPs to /olsystem/etc/nginx/deny.conf, or add classes of IPs or user-agents to /etc/nginx/sites-available/openlibrary.conf, e.g.:
if ($http_user_agent ~* (Slurp|Yahoo|libwww-perl|Java)) {
    return 403;
}
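Before adding a user-agent to a pattern like the one above, it may be worth checking which agents actually dominate the log. This is a rough sketch under the assumption that the log uses nginx's "combined" format, in which the user-agent is the third double-quoted string on each line; `top_agents` is a hypothetical helper name, not an existing tool.

```shell
# top_agents LOGFILE [N]   (hypothetical helper)
# Prints the N most frequent user-agent strings (default 10) with counts.
# Assumption: "combined" log format, so splitting each line on double
# quotes puts the user-agent in field 6.
top_agents() {
    awk -F'"' 'NF >= 6 { print $6 }' "$1" \
        | sort | uniq -c | sort -rn | head -n "${2:-10}"
}
```

Running `top_agents /var/log/nginx/access.log 20` (as a user who can read the log) would show the twenty busiest agents. User-agent strings are trivially spoofed, so a match here is a hint rather than proof of abuse.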
Or, you can block on a per-IP basis in /opt/openlibrary/olsystem/etc/nginx/deny.conf.
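As an illustration of per-IP blocking, the sketch below turns a list of addresses into nginx `deny` directives (`deny address;` is the syntax of nginx's access module). The helper name `make_deny_rules` is hypothetical, and whether deny.conf holds exactly this form of entry is an assumption; check the existing file before appending to it.

```shell
# make_deny_rules   (hypothetical helper)
# Reads one IP address or CIDR range per line on stdin, ignoring blank
# lines and "#" comments, and prints one nginx "deny" directive per
# unique entry.
make_deny_rules() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' \
        | sort -u \
        | sed 's/.*/deny &;/'
}
```

For instance, `printf '203.0.113.7\n198.51.100.0/24\n' | make_deny_rules` prints two `deny` lines that could be reviewed and then added to deny.conf. Validating the configuration with `nginx -t` before reloading is a sensible precaution.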