Validating Data provided by Google - PromInc/organic-search-analytics GitHub Wiki

It is fair to want to ensure that the Organic Search Analytics tool is pulling data accurately from Google. But to ensure that the data is accurate, we need to first understand the the data that is provided from Google. Sadly, not all of the data they provide is the same.

Standard Practice

Below are a lot of instructions on different methods to validate data. The best standard practice to start is to perform the following:

  1. Google API Explorer
  2. Organic Search Analytics Reporting Tool

This will compare the data in your local database to the data provided by Google. If there is a discrepancy here, then it may be worth digging deeper to see where the issue lies - that's what the other methods are here for, further debugging.

Google Analytics

If you don't know, this is not a good valid source to find any data to compare to that which is reported in Organic Search Analytics.

Google Search Console (Webmaster Tools)

Traditionally, we have been trained to use and trust Google Search Console (formerly Google Webmaster Tools) because that was all that Google provided in terms of search ranking and traffic data. That said, they have always made it clear that not all of the actual data is reported in Google Search Console. For this reason, this isn't the best source of data either - let me explain however.

In May of 2015 Google released Search Analytics - the updated and enhanced Search Analytics brought some new features a some more accurate data and some confusion for most users.

At the top of the page, Google displays totals for clicks, impressions, CTR, and position. I'll refer to this section as Totals here forward. Below that are rows of data - up to 1,00 rows. I'll refer to this section as Data here forward.

Many people assume that all of the data listed in the Data equals the Totals. This is NOT true however, and this is by design as stated by Google in their help documentation - see the section labeled Totals in this article. This point is easy to prove by simply downloading a large report with the Download button found at the bottom of the page and opening the document in Excel. In any empty cell enter one of the following formulas to quickly total the respective column:

  • Click: =sum(B:B)
  • Impressions: =sum(C:C)
  • CTR: =sum(D:D)
  • Position: =sum(E:E)

Google API Explorer

In [August 2015 Google released their most accurate tool yet[(http://promincproductions.com/blog/google-api-access-google-search-analytics-from-google-search-console/) - the webmasters.searchanalytics.query method for Google API Explorer. This tool allows a user to make a much larger request for Search Analytics data - up to 5,000 rows. This is similar to the Data rows provided in Google Search Console discussed above, but more of it! With this tool, a webmaster can get their hands on as much data as Google will provide, and thus this is the most accurate measurement tool.

That said, it isn't provided in a neat/pretty easy to read format, and for many webmasters is a bit daunting to access. That's actually one of the sole motivations behind creating Organic Search Analytics - provide an easy user friendly method for capturing and reading this data.

If you want to validate the data provided in Organic Search Analytics to any source, this is it since this is where the data is coming from! But to get the data into a format that is readable takes a few steps.

Tabulate Totals from Google API Explorer

  • Navigate to this URL: https://developers.google.com/apis-explorer/#search/webmasters.searchanalytics.query/m/webmasters/v3/webmasters.searchanalytics.query?siteUrl=www.example.com&_h=1&resource=%257B%250A++%2522startDate%2522%253A+%25222015-12-30%2522%252C%250A++%2522endDate%2522%253A+%25222015-12-30%2522%252C%250A++%2522searchType%2522%253A+%2522web%2522%252C%250A++%2522rowLimit%2522%253A+5000%252C%250A++%2522dimensions%2522%253A+%250A++%255B%2522query%2522%252C%2522device%2522%250A++%255D%250A%257D&
  • Replace www.example.com with the domain name you are authorized to view and wish to use for validation.
  • Set the date range you wish to use for comparison to Organic Search Analytics. I suggest using a single date, thus the start and end date should be the same date.
  • Click the Authorize and Execute button to perform the query. It should ask you to authenticate and then take a few seconds to several minutes to process.
  • Right-click on the text that says 200 OK just below the Response heading and choose Inspect Element (or something similar depending on browser)
  • The developer tools window opens with one line highlight. Right-click on the line JUST ABOVE the highlighted line - it should be <div clas="NYYWNC-e-n"> and choose copy.
  • Open Notepad++
  • Choose File -> New
  • Choose Edit -> Paste
  • Choose Search -> Replace
  • Select Regular expression in the bottom left corner of the Replace window
  • Perform the following replaces, using the Replace All for each step. NOTE: For large sets of data it'll take the program a several seconds for each step.
    • Create new lines:
      • Find what: <div>
      • Replace with: <div>\r\n
    • Create more new lines:
      • Find what: </div>
      • Replace with: </div>\r\n
    • Remove non-click and non-impressions lines:
      • Find what: ^((?!"clicks": </span><span class="gwt-InlineLabel NYYWNC-a-t">|"impressions": </span><span class="gwt-InlineLabel NYYWNC-a-t">).)*$
      • Replace with: leave this field blank
    • Create more new lines:
      • Find what: <span class="gwt-InlineLabel NYYWNC-a-r"> Note: three spaces at the end of the line
      • Replace with: leave this field blank
    • Strip out extra HTML
      • Find what: </span><span class="gwt-InlineLabel NYYWNC-a-t"> Note: single space at beginning of line
      • Replace with: leave this field blank
    • Strip out even more extra HTML
      • Find what: </span><span class="gwt-InlineLabel">,</span></div>
      • Replace with: leave this field blank
  • Choose Edit -> Line Operations -> Remove Empty Lines
  • Choose Edit -> Select All
  • Choose Edit -> Copy
  • Open Microsoft Excel
  • Select cell A1 in an empty workbook and press Ctrl+V to paste the data (this should put text in column A and numbers in column B)
  • Select cell C1 and enter the following formula: =sumif(A:A,"impressions",B:B)
  • Select cell D1 and enter the following formula: =sumif(A:A,"clicks",B:B)

Cell C1 is the total count of impressions for that data set, and cell D1 is the total count of clicks for that data set.

Organic Search Analytics

There are several ways you can access your data within Organic Search Analytics to make this comparison.

Reporting Tool

NoteRequires version 2.3.2

  • Open your installation of Organic Search Analytics in a web browser and choose Reports
  • Select the Domain to the same Domain used in the Google API Explorer sample
  • Set Search Type to WEB
  • Select Specific Dates
  • Set the Date Start and Date End to the same date used in the Google API Explorer sample
  • Set Group By to Query
  • Click Generate Report Looking at the totals table you will find the numbers that accurately total up all of the rows for that report query.

Database

Using a tool such as PHP MyAdmin or MySQL Workbench where you can perform a direct query against your database, run the following query:

SELECT count(query) as 'queries', sum(impressions) as 'impressions', sum(clicks) as 'clicks', avg(avg_position) as 'avg_position' FROM search_analytics WHERE domain = 'http://www.example.com/' AND search_type = 'web' AND date = '2015-12-30'

NOTE: You'll need to change the domain name to one that matches the domain your are comparing.

The totals that return here are the totals for their respective columns.

Raw API Pull

The above two methods were looking at the data that had already been captured to your database. This method however, will look at the data pulled from the Google API using the Organic Search Analytics tool and display the result in your browser as a JSON object. The JSON object will need a bit of manipulation before it'll be in an easy/readable format that you can tabulate totals from. To do that follow these steps:

  • Visit the following URL. A large amount of text will be returned from this request. NOTE: You'll need to change the URL path to your installation of Organic Search Analytics as well as the domain you are using for this comparison.
    • http://localhost/organic-search-analytics/data-capture-run.php?type=googleSearchAnalytics&domain=www.example.com&date=2015-12-30&mode=return&search_type=web
    • Note Requires version 2.3.2
  • Press Ctrl + A to select all of the text
  • Press Ctrl + C to copy the text
  • Open Notepad++
  • Choose File -> New
  • Choose Edit -> Paste
  • Choose Search -> Replace
  • Select Regular expression in the bottom left corner of the Replace window
  • Perform the following replaces, using the Replace All for each step. NOTE: For large sets of data it'll take the program a several seconds for each step.
    • Create new lines:
      • Find what: \}\s\[(\"clicks\")\]
      • Replace with: \}\r\n\[\1\]
    • Create new lines:
      • Find what: float\(([0-9]*)\)\s?
      • Replace with: float\(\1\)\r\n
    • Remove non-click and non-impressions lines:
      • Find what: ^((?!"clicks|impressions").)*$
      • Replace with: leave this field blank
  • Select Extended in the bottom left corner of the Replace window
  • Perform the following replaces, using the Replace All for each step. NOTE: For large sets of data it'll take the program a several seconds for each step.
    • Clean up opening brackets
      • Find what: [
      • Replace with: leave this field blank
    • Clean up trailing parenthesis
      • Find what: )
      • Replace with: leave this field blank
    • Clean up extra characters and format for Excel
      • Find what: ]=> float(
      • Replace with: :
  • Choose Edit -> Line Operations -> Remove Empty Lines
  • Choose Edit -> Select All
  • Choose Edit -> Copy
  • Open Microsoft Excel
  • Select cell A1 in an empty workbook and press Ctrl+V to paste the data (this should put text in column A and numbers in column B)
  • Select cell C1 and enter the following formula: =sumif(A:A,"impressions",B:B)
  • Select cell D1 and enter the following formula: =sumif(A:A,"clicks",B:B)

Cell C1 is the total count of impressions for that data set, and cell D1 is the total count of clicks for that data set.

⚠️ **GitHub.com Fallback** ⚠️