How To Use Last Response - michaeltelford/wgit GitHub Wiki
This simple code snippet shows how Wgit::Crawler#last_response can be used to access the most recent HTTP Wgit::Response, as well as the underlying HTTP adapter response object.
require 'wgit'
require 'wgit/core_ext'
url = 'https://vlang.io'.to_url
crawler = Wgit::Crawler.new
crawler.last_response # => nil
doc = crawler.crawl(url)
resp = crawler.last_response
resp.class # => Wgit::Response
resp.adapter_response.class # => Typhoeus::Response
resp.adapter_response.request.class # => Typhoeus::Request
resp.status # => 200
resp.headers # => { date: "Thu, 24 Oct 2019 01:05:36 GMT", ... }
resp.body # => "<html>...</html>"
resp.total_time # => 0.34611
resp.redirect_count # => 1
resp.redirections # => { "http://vlang.io" => "https://vlang.io" }
resp.ip_address # => "2606:4700:3033::6815:162d"
resp.headers[:content_type] # => "text/html"
# etc.
Since crawler.last_response.adapter_response is a Typhoeus::Response object, we have the full capability of that class should we need it. However, in most cases the Wgit::Response object is more than enough.
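As a rough illustration of reaching through to the adapter, the sketch below reads a few values that Typhoeus::Response exposes. `adapter_details` is a hypothetical helper written for this page, not part of Wgit, and the three fields it reads are only an example selection. The crawl itself is left as a comment so the helper can be tried without network access.

```ruby
# Hypothetical helper (not part of Wgit): pulls a few Typhoeus-specific
# details out of a Wgit::Response via its adapter_response.
def adapter_details(resp)
  typhoeus = resp.adapter_response

  {
    effective_url: typhoeus.effective_url, # URL libcurl ended up requesting
    return_code: typhoeus.return_code,     # libcurl result symbol, e.g. :ok
    timed_out: typhoeus.timed_out?         # true if the request timed out
  }
end

# Usage sketch (requires the wgit gem and network access):
#
#   crawler.crawl(url)
#   adapter_details(crawler.last_response)
```

Anything not covered by Wgit::Response can be read the same way, which keeps the adapter-specific code in one place should the underlying HTTP library ever change.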
But what if we're crawling an entire site, not just a single page?
The solution is to use crawl_site's block: inside it we can access last_response and process it as each page gets crawled, or we can add it to an Array to be processed after the crawl of the entire site has completed.
For example:
crawler.crawl_site(url) do
  resp = crawler.last_response
  # `all_resps << resp` or `handle_resp(resp)` etc.
end
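A minimal sketch of the "collect now, process later" approach might look like the following. `summarise_responses` is a hypothetical helper, not part of Wgit; it relies only on the status and total_time readers shown earlier. The crawl is left as a comment so the helper can be tried without network access.

```ruby
# Hypothetical helper (not part of Wgit): summarises an Array of collected
# responses. Each element only needs to respond to #status and #total_time.
def summarise_responses(resps)
  {
    count: resps.size,                               # pages crawled
    ok: resps.count { |r| r.status == 200 },         # pages that returned 200
    total_time: resps.sum { |r| r.total_time.to_f }  # combined request time
  }
end

# Usage sketch (requires the wgit gem and network access):
#
#   all_resps = []
#   crawler.crawl_site(url) { all_resps << crawler.last_response }
#   summarise_responses(all_resps)
```

Collecting first and summarising afterwards keeps the crawl block small; processing inside the block instead suits cases where holding every response in memory is undesirable.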