referer parser enrichment - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki
HOME > SNOWPLOW SETUP GUIDE > Step 3: Setting up Enrich > Configurable enrichments > referer-parser enrichment
JSON Schema iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0 Compatibility 0.9.6+ Data provider Snowplow referer-parser
The referer parser enrichment uses the Snowplow referer-parser to extract attribution data from referer URLs. You can provide a list of internal subdomains which will be treated as "internal" rather than unknown.
Note that we always use the original HTTP misspelling of 'referer' (and thus 'referal') in this project - never 'referrer'.
{
"schema": "iglu:com.snowplowanalytics.snowplow/referer_parser/jsonschema/1-0-0",
"data": {
"name": "referer_parser",
"vendor": "com.snowplowanalytics.snowplow",
"enabled": true,
"parameters": {
"internalDomains": [
"subdomain1.mysite.com",
"subdomain2.mysite.com"
]
}
}
}
In this example, if an event's referer URL is either "subdomain1.mysite.com" or "subdomain2.mysite.com" it will be counted as internal.
The input value for the enrichment comes from refr
parameter which is mapped to page_referrer
field in atomic.events
table. Also the value representing the domain name of the server the current page served from is taken from url
parameter. This value ends up in the page_urlhost
field of atomic.events
table. It is extracted during common enrichment process by means of ConversionUtils.explodeUri
function.
The referer parser enrichment uses the Snowplow referer-parser library to extract attribution data from referer URLs. There are 4 mediums that we support
-
unknown
for when we know the source, but not the medium -
email
for webmail providers -
social
for social media services -
search
for search engines
The referer-parser
identifies whether a URL is a known referer or not by checking it against the referers.yml
file.
Additionally the current page domain name (value for page_urlhost
) is used to determine if this is an internal referer by comparing it with the domain names extracted from internalDomains
parameter of referer_parser.json
.
Below is the summary of the fields in atomic.events
table driven by the result of this enrichment (no dedicated table).
Field | Purpose |
---|---|
refr_medium |
Type of referer (ex. 'search', 'internal') |
refr_source |
Name of referer if recognised |
refr_term |
Keywords if source is a search engine |