Google Parsing - malparty/keywords GitHub Wiki

Solution exploration

After some basic trials with HttpClient, I quickly ran into HTTP 429 (Too Many Requests) errors. To be able to parse more keywords, I explored several approaches:

  • Put the thread to sleep for 3 s between requests.

This made parsing too slow to be acceptable.

  • Adjust sleep duration dynamically (success = faster / failure = slower).

I found that once a 429 Too Many Requests error had been triggered, even waiting several minutes did not allow parsing to resume. Restarting the app was faster.
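The dynamic adjustment idea can be sketched as follows. This is only an illustration in Python, not the application's code: the class name `AdaptiveDelay`, its parameters, and all numeric values are assumptions chosen for the example. The delay shrinks a little after each successful request and grows sharply after a 429.

```python
class AdaptiveDelay:
    """Hypothetical helper: tracks how long to pause between requests."""

    def __init__(self, initial=3.0, minimum=0.5, maximum=60.0,
                 speedup=0.9, backoff=2.0):
        # All defaults are illustrative assumptions, not measured values.
        self.delay = initial
        self.minimum = minimum
        self.maximum = maximum
        self.speedup = speedup   # factor applied after a success (< 1)
        self.backoff = backoff   # factor applied after a 429 (> 1)

    def record(self, status_code):
        """Update the delay from the last response status and return it."""
        if status_code == 429:
            # Failure: slow down, but never wait longer than `maximum`.
            self.delay = min(self.maximum, self.delay * self.backoff)
        else:
            # Success: speed up, but never go below `minimum`.
            self.delay = max(self.minimum, self.delay * self.speedup)
        return self.delay
```

A caller would pass the value returned by `record` to `time.sleep` before the next request. As noted above, this did not help in practice, because once a 429 had been received, longer waits did not unblock the client.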

  • Parse Google Search through a proxy service.

The proxy service then returned the 429 error instead of Google, so this did not solve the problem.

  • Dispose of the HttpClient whenever a 429 error is received. This is how the current application works around the problem.

This makes it possible to parse 600+ keywords at a fast pace on a single thread, which is acceptable for the first version of the product.
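The dispose-on-429 pattern can be sketched roughly as below. This is a Python illustration under stated assumptions, not the application's actual code: `fetch_all`, `make_client`, and the client's `get` and `close` methods are hypothetical names standing in for whatever HTTP client the application uses.

```python
def fetch_all(keywords, make_client, max_retries=3):
    """Fetch each keyword, replacing the client whenever a 429 is received.

    `make_client` is a hypothetical factory returning an object with
    `get(keyword) -> (status, body)` and `close()`.
    """
    client = make_client()
    results = {}
    try:
        for keyword in keywords:
            for _ in range(max_retries + 1):
                status, body = client.get(keyword)
                if status == 429:
                    # Throttled: discard this client and retry with a new one.
                    client.close()
                    client = make_client()
                    continue
                results[keyword] = (status, body)
                break
            # If every attempt returned 429, the keyword is left out of results.
    finally:
        client.close()
    return results
```

The loop stays single-threaded, matching the setup described above, and `max_retries` bounds how many fresh clients are tried for one keyword.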

Not a perfect solution

Later in the project, I realized that Google does not show AdWords ads in responses to HttpClient requests.

Google knows my app is not a human.

You can find more explorations in the Improvements page of this wiki.