1. Download aggregated, anonymized data from GA - NCBI-Codeathons/Use-UMLS-and-Python-to-classify-website-visitor-queries-into-measurable-categories GitHub Wiki

How to export your source data

The scripts assume Google Analytics (GA) with search logging already configured. They can be adapted for other tools, and GA has additional export options; Google Search Console also has advantages, including an API. The steps below are a fast start for people not currently analyzing search. This method avoids collecting personally identifiable information.

Data from google.com search results, where the searcher ended up landing on our site

  1. Set the date range (consider 1 month)
  2. Go to Acquisition > Search Console > Queries
  3. Select Export > Unsampled Report and name it SearchConsoleNew (for multiple exports, I add the month, etc.)
  4. Copy the result to the data/raw folder

Example

| Search query | Clicks | Impressions | CTR | Average position |
| --- | --- | --- | --- | --- |
| hippocratic oath | 8,672 (7.45%) | 113,901 (1.82%) | 7.61% | 3.9 |
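
Cells such as `8,672(7.45%)` combine a count with a parenthesized percentage, so they need splitting before any arithmetic. The sketch below assumes the export keeps the same cell format as the example row, and that the percentage is the row's share of the report total; neither is confirmed by the scripts in this repo. The function name `parse_count_with_share` is hypothetical. It needs Python 3.9 or later for the `tuple[int, float]` annotation.

```python
import re

# Matches values such as "8,672(7.45%)": a count with thousands separators,
# followed by a percentage in parentheses (assumed to be the share of total).
_COUNT_WITH_SHARE = re.compile(r"^\s*([\d,]+)\s*\(\s*([\d.]+)%\s*\)\s*$")


def parse_count_with_share(cell: str) -> tuple[int, float]:
    """Split a cell like '8,672(7.45%)' into (8672, 7.45)."""
    match = _COUNT_WITH_SHARE.match(cell)
    if match is None:
        raise ValueError(f"Unexpected cell format: {cell!r}")
    count = int(match.group(1).replace(",", ""))  # drop thousands separators
    share_percent = float(match.group(2))
    return count, share_percent


clicks, clicks_share = parse_count_with_share("8,672(7.45%)")
impressions, impressions_share = parse_count_with_share("113,901(1.82%)")
print(clicks, impressions)  # 8672 113901
```

Raising on an unexpected format, rather than returning a default, makes it obvious when an export uses a different layout than the example.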

Data dictionary

| Column | Description |
| --- | --- |
| Search query | The actual search query that triggered impressions |
| Clicks | The number of clicks on your website URLs, from a Google search results page |
| Impressions | The number of times ANY URL on your site appeared in search results viewed by a user |
| CTR | Click-through rate. Clicks / Impressions * 100 |
| Average position | The average ranking of your website URLs for the query or queries |

(Documentation from Google.com, but information about paid search has been removed.)
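
As a quick check of the CTR definition in the data dictionary, the formula can be applied to the example row (8,672 clicks, 113,901 impressions). This is a minimal sketch; the function name `click_through_rate` is hypothetical, and returning 0 when there are no impressions is an assumption made here to avoid division by zero.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage: clicks / impressions * 100."""
    if impressions == 0:
        # Assumption: treat a query with no impressions as 0% CTR.
        return 0.0
    return clicks / impressions * 100


ctr = click_through_rate(8672, 113901)
print(f"{ctr:.2f}%")  # 7.61%
```

The result matches the CTR column in the example row.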

Data from internal site search

With the same date parameters as above,

  1. Go to Behavior > Site Search > Search Terms
  2. Select Export > Unsampled Report and name it SiteSearchNew (for multiple exports, I add the month, etc.)
  3. Copy the result to the data/raw folder

Example

| Search Term | Total Unique Searches | Results Pageviews/Search | % Search Exits | % Search Refinements | Time after Search | Avg. Search Depth |
| --- | --- | --- | --- | --- | --- | --- |
| diabetes | 999 (1.01%) | 2.79 | 67.07% | 0.50% | 00:06:20 | 3.33 |
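
The duration and percentage cells in the example row are strings, so they usually need converting before they can be sorted or averaged. A minimal sketch, assuming the `HH:MM:SS` and `NN.NN%` formats shown in the example row; the helper names are hypothetical.

```python
def duration_to_seconds(value: str) -> int:
    """Convert an 'HH:MM:SS' string such as '00:06:20' to whole seconds."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def percent_to_fraction(value: str) -> float:
    """Convert a string such as '67.07%' to a fraction between 0 and 1."""
    return float(value.strip().rstrip("%")) / 100


print(duration_to_seconds("00:06:20"))  # 380
```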

Data dictionary

| Column | Description |
| --- | --- |
| Search Term | The term a visitor entered in your site's internal search |
| Total Unique Searches | Number of times people searched your site. Duplicate searches within a single visit are excluded |
| Results Pageviews/Search | Average number of times visitors viewed a search results page after performing a search |
| % Search Exits | Percentage of searches after which the visitor exited your site |
| % Search Refinements | Percentage of searches followed by a refinement. A refinement is a transition between internal search keywords within a session. For example, if the sequence of keywords is "shoes", "shoes", "pants", "pants", there is 1 refinement, because the change from "shoes" to "pants" occurs once |
| Time after Search | Amount of time visitors spent on your site after getting results for the search term |
| Avg. Search Depth | Number of pages visitors viewed after getting results for the search term |

Descriptions are from GA.
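
The refinement example in the data dictionary ("shoes", "shoes", "pants", "pants" giving 1) can be reproduced by counting changes between consecutive keywords. This sketch only mirrors that example; how GA computes the metric internally (for instance, any keyword normalization) is not covered by this page, and the function name `count_refinements` is hypothetical.

```python
from typing import Iterable, Optional


def count_refinements(keywords: Iterable[str]) -> int:
    """Count transitions between different consecutive search keywords."""
    refinements = 0
    previous: Optional[str] = None
    for keyword in keywords:
        # A refinement is a change from the previous keyword to a new one;
        # repeating the same keyword does not count.
        if previous is not None and keyword != previous:
            refinements += 1
        previous = keyword
    return refinements


print(count_refinements(["shoes", "shoes", "pants", "pants"]))  # 1
```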

But I only want to analyze one log type...

If you choose to analyze only one of your log types, create a blank stand-in for the one you don't have: a file that contains only the column names. That way the script will not error out.
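
A header-only file like this can be written by hand or generated. The sketch below is one way to do it, under stated assumptions: the column names are copied from the example tables on this page, and the CSV format is a guess, so check what the scripts actually read before relying on it. The function name `write_blank_export` and the constant names are hypothetical.

```python
import csv
from pathlib import Path

# Column names taken from the example tables on this page.
SEARCH_CONSOLE_COLUMNS = [
    "Search query", "Clicks", "Impressions", "CTR", "Average position",
]
SITE_SEARCH_COLUMNS = [
    "Search Term", "Total Unique Searches", "Results Pageviews/Search",
    "% Search Exits", "% Search Refinements", "Time after Search",
    "Avg. Search Depth",
]


def write_blank_export(path, columns):
    """Write a file that contains only a header row, with no data rows."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/raw if missing
    with path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow(columns)
    return path
```

For example, if you have no site search log, `write_blank_export("data/raw/SiteSearchNew.csv", SITE_SEARCH_COLUMNS)` would create the stand-in; the `.csv` extension here is an assumption.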