How To Parse A URL - michaeltelford/wgit GitHub Wiki

Wgit::Url is a subclass of String which builds on URI and (more so) Addressable::URI to allow for sophisticated URL parsing. This article demonstrates how you can use Wgit to parse URLs in your application.

It all starts with a Wgit::Url object. You can initialize your own, or access a Document's URL via Wgit::Document#url. For example:

require 'wgit'

url = Wgit::Url.new 'https://www.vlang.io/search.html?q=compiler#speed'
url.is_a? String # => true

doc = Wgit::Document.new url

doc.url        # => "https://www.vlang.io/search.html?q=compiler#speed"
doc.url.class  # => Wgit::Url

Once you have a Wgit::Url object reference, you can use the following methods to extract the different components of the URL (this example uses a URL that also contains credentials and a port):

url.to_s         # => "https://user:pass@www.vlang.io:443/search.html?q=compiler#speed"

url.to_scheme    # => "https"
url.to_user      # => "user"
url.to_password  # => "pass"
url.to_origin    # => "https://www.vlang.io:443"
url.to_base      # => "https://www.vlang.io"
url.to_host      # => "www.vlang.io"
url.to_domain    # => "vlang.io"
url.to_brand     # => "vlang"
url.to_port      # => "443"
url.to_endpoint  # => "/search.html"
url.to_path      # => "search.html"
url.to_extension # => "html"
url.to_query     # => "q=compiler"
url.to_fragment  # => "speed"

Note: The above methods are all aliased and can be called without the to_* prefix. The unprefixed names match many of the methods used by URI and Addressable::URI, meaning you can generally use Wgit::Url as a drop-in replacement.
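To see what that overlap looks like in practice, here is a small comparison sketch using only Ruby's standard-library URI, not Wgit itself. The variable name `uri` is illustrative. Note that the standard library returns the port as an Integer and keeps the leading slash on the path, so the return values are not identical to the Wgit output above.

```ruby
require 'uri'

# Standard-library URI only; Wgit is not needed to run this sketch.
uri = URI.parse('https://user:pass@www.vlang.io:443/search.html?q=compiler#speed')

uri.scheme   # => "https"
uri.user     # => "user"
uri.password # => "pass"
uri.host     # => "www.vlang.io"
uri.port     # => 443 (an Integer)
uri.path     # => "/search.html"
uri.query    # => "q=compiler"
uri.fragment # => "speed"
```

If your code depends on an exact return type (for example an Integer port), check the value you get back before swapping one class for the other.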

There are methods which can be used to remove components of the URL:

url.to_s          # => "https://www.vlang.io/search.html?q=compiler#speed"

url.omit_origin   # => "search.html?q=compiler#speed"
url.omit_fragment # => "https://www.vlang.io/search.html?q=compiler"
# etc...

All of the methods seen so far return a new Wgit::Url instance, meaning you can easily chain them together as needed:

url.to_s                      # => "https://www.vlang.io/search.html?q=compiler#speed"

url.omit_origin.omit_fragment # => "search.html?q=compiler"
# etc...

Also, since Wgit::Url is a subclass of String, you have access to String methods such as gsub, strip and replace.
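The sketch below illustrates the general idea with a hypothetical String subclass named `SketchUrl`. It is a stand-in written for this example, not Wgit's implementation, and its `omit_fragment` method only mimics the behaviour shown earlier. It shows inherited String methods working on the object, and re-wrapping the result so that the URL-specific methods stay available.

```ruby
# Hypothetical stand-in for a String-backed URL class (not Wgit's code).
class SketchUrl < String
  # Return a new SketchUrl without the "#fragment" part.
  def omit_fragment
    self.class.new(sub(/#.*\z/, ''))
  end
end

url = SketchUrl.new(' https://www.vlang.io/search.html?q=compiler#speed ')

# Inherited String methods work directly on the subclass instance.
cleaned = url.strip
swapped = cleaned.gsub('compiler', 'parser')

# Re-wrap the result to be sure the URL-specific methods are available again.
rewrapped = SketchUrl.new(swapped)
without_fragment = rewrapped.omit_fragment
```

Depending on your Ruby version, core String methods called on a subclass instance may return a plain String rather than the subclass, so re-wrapping the result (as with `SketchUrl.new` here) is a safe habit. This is an observation about core Ruby; check how Wgit::Url itself behaves before relying on it.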

If you require access to the underlying URI objects, you can use the following methods:

url.to_uri.class             # => URI::HTTPS
url.to_addressable_uri.class # => Addressable::URI
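As a rough illustration of why you might want the underlying object, the sketch below uses only the standard-library URI class (the kind of object to_uri is shown returning above) to decode a query string. It builds the object with URI.parse rather than through Wgit so that it runs without the gem; the variable names are illustrative.

```ruby
require 'uri'

# Built with the standard library directly, so Wgit is not required here.
uri = URI.parse('https://www.vlang.io/search.html?q=compiler#speed')

uri.class        # => URI::HTTPS
uri.default_port # => 443

# Decode the query string into key/value pairs.
params = URI.decode_www_form(uri.query).to_h
params['q']      # => "compiler"
```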

The Wgit::Url class offers some advanced functionality not found in the underlying URI classes. For example:

require 'wgit'
require 'wgit/core_ext' # Gives us the `String#to_url` method.

url = 'https://www.example.com/contact'.to_url
url.class # => Wgit::Url

url.relative?                                # => false
url.relative?(domain: 'https://example.com') # => true (with relation to domain)

url + 'us' + '#top'                # => "https://www.example.com/contact/us#top" of type Wgit::Url

doc = Wgit::Document.new url
doc.url                            # => "https://www.example.com/contact"
'/about'.to_url.make_absolute(doc) # => "https://www.example.com/about"

For more information on what's possible with the Wgit::Url class, see the docs.