How To Use Last Response

This simple code snippet shows how Wgit::Crawler#last_response can be used to access the most recent HTTP Wgit::Response as well as the underlying HTTP adapter response object.

require 'wgit'
require 'wgit/core_ext'

url     = 'https://vlang.io'.to_url
crawler = Wgit::Crawler.new

crawler.last_response # => nil

doc  = crawler.crawl(url)
resp = crawler.last_response

resp.class                          # => Wgit::Response
resp.adapter_response.class         # => Typhoeus::Response
resp.adapter_response.request.class # => Typhoeus::Request

resp.status         # => 200
resp.headers        # => { date: "Thu, 24 Oct 2019 01:05:36 GMT", ... }
resp.body           # => "<html>...</html>"
resp.total_time     # => 0.34611
resp.redirect_count # => 1
resp.redirections   # => { "http://vlang.io" => "https://vlang.io" }
resp.ip_address     # => "2606:4700:3033::6815:162d"

resp.headers[:content_type] # => "text/html"

# etc.

Since crawler.last_response.adapter_response is a Typhoeus::Response object, the full capability of that class is available should we need it. In most cases, however, the Wgit::Response object is more than enough.

But what if we're crawling an entire site, not just a single page?

The solution is to use crawl_site's block. Inside it we can access last_response and either process it as each page gets crawled, or add it to an Array to be processed after the crawl of the entire site has completed.

For example:

crawler.crawl_site(url) do
  resp = crawler.last_response
  # `all_resps << resp` or `handle_resp(resp)` etc.
end
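
If the responses are collected into an Array, they can be summarised once the crawl has finished. The following is only a sketch of that idea: `summarise_responses` is a hypothetical helper (not part of Wgit), and `StubResponse` is a stand-in for Wgit::Response so the snippet runs without network access. It relies only on the `status` and `total_time` readers shown earlier, and it assumes Ruby 2.7+ for `Enumerable#tally`.

```ruby
# Stand-in for Wgit::Response so this sketch runs without network access.
# It exposes only the #status and #total_time readers shown earlier.
StubResponse = Struct.new(:status, :total_time)

# Hypothetical helper (not part of Wgit): summarise collected responses.
def summarise_responses(resps)
  times = resps.map(&:total_time)
  {
    count:      resps.size,                 # number of responses collected
    by_status:  resps.map(&:status).tally,  # e.g. how many 200s, 404s
    total_time: times.sum.round(3),         # combined request time
    slowest:    times.max                   # longest single request
  }
end

# In a real crawl the Array would be filled inside the block:
#   all_resps = []
#   crawler.crawl_site(url) { all_resps << crawler.last_response }
all_resps = [
  StubResponse.new(200, 0.35),
  StubResponse.new(200, 0.12),
  StubResponse.new(404, 0.08)
]

summary = summarise_responses(all_resps)
puts summary
```

Because the helper only calls `status` and `total_time`, it should accept real Wgit::Response objects in place of the stubs.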