1. Download aggregated, anonymized data from GA - NCBI-Codeathons/Use-UMLS-and-Python-to-classify-website-visitor-queries-into-measurable-categories GitHub Wiki
How to export your source data
The scripts assume Google Analytics (GA) with search logging already configured. They can be adapted for other tools; GA offers additional export options, and Google Search Console has its own advantages, including an API. The steps below are a fast start for people who are not yet analyzing search. This method AVOIDS the collection of personally identifiable information.
Data from google.com search results, where the searcher ended up landing on our site
- Set date parameters (Consider 1 month)
- Go to Acquisition > Search Console > Queries
- Select Export > Unsampled Report as SearchConsoleNew (for multiple exports I add month, etc.)
- Copy the result to data/raw folder
Example
Search query | Clicks | Impressions | CTR | Average position |
---|---|---|---|---|
hippocratic oath | 8,672(7.45%) | 113,901(1.82%) | 7.61% | 3.9 |
Data dictionary
Column | Description |
---|---|
Search query | The actual search query that triggered impressions |
Clicks | The number of clicks on your website URLs, from a Google search results page |
Impressions | The number of times ANY URL on your site appeared in search results viewed by a user |
CTR | Click-through rate. Clicks / Impressions * 100 |
Average position | The average ranking of your website URLs for the query or queries. |
(Documentation from Google.com, but information about paid search has been removed.)
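The example row can be turned into plain numbers with a few lines of Python. This is a minimal sketch, not the project's own parsing code: it assumes the exported cells look like the example above (a count followed by a percentage share in parentheses), and the helper names `parse_count` and `click_through_rate` are hypothetical.

```python
import re


def parse_count(cell: str) -> int:
    """Return the leading integer from a cell such as '8,672(7.45%)'.

    Assumption: the count comes first and may contain thousands separators;
    anything after it (the parenthesized share) is ignored.
    """
    match = re.match(r"\s*([\d,]+)", cell)
    if match is None:
        raise ValueError(f"no leading count in {cell!r}")
    return int(match.group(1).replace(",", ""))


def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR as a percentage, per the data dictionary: Clicks / Impressions * 100."""
    if impressions == 0:
        return 0.0
    return clicks / impressions * 100


# Values taken from the example row for "hippocratic oath".
clicks = parse_count("8,672(7.45%)")
impressions = parse_count("113,901(1.82%)")
ctr = round(click_through_rate(clicks, impressions), 2)
print(clicks, impressions, ctr)
```

The computed CTR for the example row agrees with the 7.61% shown in the table, which is a quick way to check that an export was parsed as intended.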
Data from internal site search
With the same date parameters as above,
- Go to Behavior > Site Search > Search Terms
- Select Export > Unsampled Report as SiteSearchNew (for multiple exports I add month, etc.)
- Copy the result to data/raw folder
Example
Search Term | Total Unique Searches | Results Pageviews/Search | % Search Exits | % Search Refinements | Time after Search | Avg. Search Depth |
---|---|---|---|---|---|---|
diabetes | 999(1.01%) | 2.79 | 67.07% | 0.50% | 00:06:20 | 3.33 |
Data dictionary
Column | Description |
---|---|
Search Term | The term the visitor entered in your site's internal search |
Total Unique Searches | Number of times people searched your site. Duplicate searches within a single visit are excluded |
Results Pageviews/Search | Average number of times visitors viewed a search results page after performing a search |
% Search Exits | Percentage of searches after which the visitor exited your site following a result from an internal search |
% Search Refinements | Percentage of searches that resulted in a refinement (transition) between internal search keywords within a session. For example, if the sequence of keywords is "shoes", "shoes", "pants", "pants", one refinement occurs, because "shoes" is refined to "pants" once. |
Time after Search | Amount of time visitors spent on your site after getting results for the search term |
Avg. Search Depth | Number of pages visitors viewed after getting results for the search term |
Descriptions are from GA.
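The percentage and duration columns arrive as text, so they usually need converting before analysis. The sketch below assumes the formats shown in the example row (`67.07%` and `HH:MM:SS`); the helper names `parse_percent` and `parse_duration` are hypothetical and are not taken from the project's scripts.

```python
def parse_percent(cell: str) -> float:
    """Convert a cell such as '67.07%' to the float 67.07."""
    return float(cell.strip().rstrip("%"))


def parse_duration(cell: str) -> int:
    """Convert an 'HH:MM:SS' cell such as '00:06:20' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in cell.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values taken from the example row for "diabetes".
search_exits = parse_percent("67.07%")
refinements = parse_percent("0.50%")
time_after_search = parse_duration("00:06:20")
print(search_exits, refinements, time_after_search)
```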
But I only want to analyze one log type...
If you choose to analyze only one of your log types, put a blank version of the other in place: a file that contains only the column names. That way the script will not error out.
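A header-only placeholder can be generated rather than built by hand. The following is a sketch under stated assumptions: it writes CSV, and it uses the column names from the example tables above. The file format and exact headers the scripts expect may differ, so adjust both to match your real export. The function name `write_blank_log` is hypothetical.

```python
import csv
from pathlib import Path

# Column names copied from the example tables on this page (assumed to match
# what the scripts read; check them against a real export).
SEARCH_CONSOLE_COLUMNS = [
    "Search query", "Clicks", "Impressions", "CTR", "Average position",
]
SITE_SEARCH_COLUMNS = [
    "Search Term", "Total Unique Searches", "Results Pageviews/Search",
    "% Search Exits", "% Search Refinements", "Time after Search",
    "Avg. Search Depth",
]


def write_blank_log(path, columns):
    """Write a file containing only the header row and return its path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(columns)
    return path


# Example call (the .csv extension is an assumption):
# write_blank_log("data/raw/SiteSearchNew.csv", SITE_SEARCH_COLUMNS)
```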