Robot Database - access-watch/logstash-filter-accesswatch GitHub Wiki

The robot database is a JSON file containing a collection of Robot objects.

Format

Robot Object

Attribute   Type    Description
id          number  A unique identifier for this robot.
name        string  The name of the robot to be displayed to the user.
url         string  The link to the robot's page on the Access Watch database.
reputation  string  The reputation of the robot (see below).
ips         array   A list of known IPs for this robot.
uas         array   A list of known User-Agent strings for this robot.
cidrs       array   A list of known CIDRs for this robot.

The reputation attribute accepts 4 possible values:

  • nice: perfect, the robot is verified, it follows internet best practices, you can trust it
  • ok: all right, the robot is known, but it's not easy to verify its identity, or it is not following internet best practices
  • suspicious: warning, nothing really harmful, but the robot is not disclosing its identity or has questionable activity
  • bad: danger, the robot is involved in harmful activity: scans, spam or attacks

IP addresses are represented as integers of arbitrary length. IPv4 addresses are first transformed into IPv4-mapped IPv6 addresses and then converted to integers. For example, to transform an IPAddr to an integer in Ruby, you would write: i = ip.ipv4? ? ip.ipv4_mapped.to_i : ip.to_i.
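The conversion can be wrapped in a small helper. This is a minimal sketch using Ruby's standard `ipaddr` library; the method name `ip_to_integer` is chosen here for illustration and is not taken from the plugin.

```ruby
require "ipaddr"

# Convert an IP address string to the integer form described above.
# IPv4 addresses are mapped to IPv6 (::ffff:a.b.c.d) before conversion.
# NOTE: ip_to_integer is a hypothetical helper name, not part of the plugin.
def ip_to_integer(address)
  ip = IPAddr.new(address)
  ip.ipv4? ? ip.ipv4_mapped.to_i : ip.to_i
end

puts ip_to_integer("5.188.211.0") # prints 281470778004224
puts ip_to_integer("::1")         # prints 1
```

Because Ruby integers have arbitrary precision, full 128-bit IPv6 addresses need no special handling.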

User-Agent strings are represented by their hexadecimal MD5 hash. To test for equality, first compute the hexadecimal MD5 hash of the User-Agent string you want to look up. For example, in Ruby: Digest::MD5.hexdigest(ua).
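As a sketch of how such a lookup might work, the following hashes an incoming User-Agent and checks it against a robot's `uas` list. The helper names and the sample robot are hypothetical, and the robot is assumed to be a Hash parsed from the JSON file with string keys.

```ruby
require "digest/md5"

# Hash a raw User-Agent string the way the database stores it.
# NOTE: ua_hash and robot_has_ua? are hypothetical helper names.
def ua_hash(user_agent)
  Digest::MD5.hexdigest(user_agent)
end

# True if the robot (assumed to be a Hash with string keys) lists this User-Agent.
def robot_has_ua?(robot, user_agent)
  robot.fetch("uas", []).include?(ua_hash(user_agent))
end

# Made-up robot entry, for illustration only.
robot = { "id" => 1, "name" => "ExampleBot", "uas" => [ua_hash("ExampleBot/1.0")] }

puts robot_has_ua?(robot, "ExampleBot/1.0") # prints true
puts robot_has_ua?(robot, "OtherBot/2.0")   # prints false
```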

CIDRs are represented as an array of 2 elements. The first element is the first IP address of the range, as an integer (using the same conversion as for IP addresses), and the second element is the number of addresses in the range. For example, 5.188.211.0/24 is represented as [281470778004224, 256].
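With this representation, testing whether an address falls inside a range is a pair of integer comparisons. The sketch below builds the two-element form from CIDR notation and checks membership; all helper names are hypothetical and the code is illustrative rather than the plugin's own.

```ruby
require "ipaddr"

# Integer form of an address; IPv4 is mapped to IPv6 first.
# NOTE: all helper names in this sketch are hypothetical.
def ip_to_integer(address)
  ip = IPAddr.new(address)
  ip.ipv4? ? ip.ipv4_mapped.to_i : ip.to_i
end

# Build the [first_address, address_count] pair from CIDR notation.
def cidr_entry(cidr)
  range = IPAddr.new(cidr).to_range
  first = ip_to_integer(range.begin.to_s)
  count = range.end.to_i - range.begin.to_i + 1
  [first, count]
end

# True if the integer address lies in the half-open interval [first, first + count).
def cidr_include?(entry, ip_int)
  first, count = entry
  ip_int >= first && ip_int < first + count
end

entry = cidr_entry("5.188.211.0/24")
p entry                                                # prints [281470778004224, 256]
puts cidr_include?(entry, ip_to_integer("5.188.211.42")) # prints true
puts cidr_include?(entry, ip_to_integer("5.188.212.0"))  # prints false
```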

How to use the Database

For a complete description of the robot detection algorithm, see the Ruby source code of the Logstash plugin.

The idea behind the algorithm is to take the intersection of lists of robot candidates.

  • ip-candidates = list of robots whose known IPs include the request's IP
  • cidr-candidates = list of robots with a known CIDR that contains the request's IP
  • ua-candidates = list of robots whose known User-Agents include the request's User-Agent

The final list of candidates is given by: (ip-candidates ⋃ cidr-candidates) ⋂ ua-candidates.
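The formula above can be sketched with plain array operations. This is a simplified linear scan, not the plugin's actual implementation: `robot_candidates` is a hypothetical name, `robots` is assumed to be an Array of Hashes parsed from the JSON file, and the sample data is made up. The IP is expected as an integer and the User-Agent as its hexadecimal MD5 hash, as described in the Format section.

```ruby
# Return the robots matching (ip-candidates ∪ cidr-candidates) ∩ ua-candidates.
# NOTE: hypothetical helper; assumes robots is an Array of Hashes with string keys.
def robot_candidates(robots, ip_int, ua_md5)
  ip_candidates = robots.select { |r| r.fetch("ips", []).include?(ip_int) }

  cidr_candidates = robots.select do |r|
    r.fetch("cidrs", []).any? do |first, count|
      ip_int >= first && ip_int < first + count
    end
  end

  ua_candidates = robots.select { |r| r.fetch("uas", []).include?(ua_md5) }

  # Array#| is set union and Array#& is set intersection.
  (ip_candidates | cidr_candidates) & ua_candidates
end

# Made-up sample data with small integers in place of real IPs and hashes.
robots = [
  { "id" => 1, "ips" => [10], "cidrs" => [],         "uas" => ["a"] },
  { "id" => 2, "ips" => [],   "cidrs" => [[0, 100]], "uas" => ["b"] },
  { "id" => 3, "ips" => [],   "cidrs" => [],         "uas" => ["a"] }
]

p robot_candidates(robots, 10, "a").map { |r| r["id"] } # prints [1]
```

Robot 3 shares the User-Agent but matches neither the IP nor a CIDR, so it is excluded: a User-Agent match alone is not enough.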