How To Parse A URL
Wgit::Url is a subclass of String which builds on URI and (more so) Addressable::URI to allow for sophisticated URL parsing. This article demonstrates how you can use Wgit to parse URLs in your application.
It all starts with a Wgit::Url object. You can initialize your own, or access a Document's URL via Wgit::Document#url. For example:
require 'wgit'
url = Wgit::Url.new 'https://www.vlang.io/search.html?q=compiler#speed'
url.is_a? String # => true
doc = Wgit::Document.new url
doc.url # => "https://www.vlang.io/search.html?q=compiler#speed"
doc.url.class # => Wgit::Url
Once you have a Wgit::Url object, you can use the following methods to extract the different components of a URL:
url.to_s # => "https://user:pass@www.vlang.io:443/search.html?q=compiler#speed"
url.to_scheme # => "https"
url.to_user # => "user"
url.to_password # => "pass"
url.to_origin # => "https://www.vlang.io:443"
url.to_base # => "https://www.vlang.io"
url.to_host # => "www.vlang.io"
url.to_domain # => "vlang.io"
url.to_brand # => "vlang"
url.to_port # => "443"
url.to_endpoint # => "/search.html"
url.to_path # => "search.html"
url.to_extension # => "html"
url.to_query # => "q=compiler"
url.to_fragment # => "speed"
Note: The above methods are all aliased and can be called without the to_* prefix (for example, url.scheme). This mirrors many of the methods offered by URI and Addressable::URI, meaning you can generally use Wgit::Url as a drop-in replacement.
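For comparison, here is a minimal sketch of the equivalent component accessors in Ruby's standard library URI, which uses the un-prefixed names. The helper `stdlib_components` is hypothetical and not part of Wgit. Note that stdlib URI returns the port as an Integer and keeps the leading slash on the path, so the values are not always identical to those listed above.

```ruby
require 'uri'

# Hypothetical helper (not part of Wgit): collects the components that the
# standard library's URI exposes for a URL string.
def stdlib_components(url_string)
  uri = URI.parse(url_string)
  {
    scheme:   uri.scheme,
    user:     uri.user,
    password: uri.password,
    host:     uri.host,
    port:     uri.port,     # an Integer in stdlib URI
    path:     uri.path,     # keeps the leading slash
    query:    uri.query,
    fragment: uri.fragment
  }
end

parts = stdlib_components(
  'https://user:pass@www.vlang.io:443/search.html?q=compiler#speed'
)
puts parts[:host]     # prints www.vlang.io
puts parts[:fragment] # prints speed
```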
There are methods which can be used to remove components of the URL:
url.to_s # => "https://www.vlang.io/search.html?q=compiler#speed"
url.omit_origin # => "search.html?q=compiler#speed"
url.omit_fragment # => "https://www.vlang.io/search.html?q=compiler"
# etc...
All of the methods seen so far return a new Wgit::Url instance, meaning you can easily chain them together as needed:
url.to_s # => "https://www.vlang.io/search.html?q=compiler#speed"
url.omit_origin.omit_fragment # => "search.html?q=compiler"
# etc...
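To illustrate why returning a new instance makes chaining possible, here is a rough sketch of the pattern using only the standard library. The class `SketchUrl` and its method bodies are assumptions made for illustration; they are not Wgit's actual implementation.

```ruby
require 'uri'

# Hypothetical stand-in for the chaining pattern; NOT Wgit's implementation.
class SketchUrl < String
  # Returns a new SketchUrl without the "#fragment" part.
  def omit_fragment
    uri = URI.parse(to_s)
    uri.fragment = nil
    self.class.new(uri.to_s)
  end

  # Returns a new SketchUrl holding only the path, query and fragment.
  def omit_origin
    uri = URI.parse(to_s)
    rest = uri.path.sub(%r{\A/}, '') # drop the leading slash
    rest += "?#{uri.query}" if uri.query
    rest += "##{uri.fragment}" if uri.fragment
    self.class.new(rest)
  end
end

url = SketchUrl.new('https://www.vlang.io/search.html?q=compiler#speed')
puts url.omit_origin.omit_fragment # prints search.html?q=compiler
```

Because each method wraps its result in the same class rather than returning a plain String, the next call in the chain is always available.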
Also, since Wgit::Url is a subclass of String, you have access to methods such as gsub, strip and replace.
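The following sketch shows how any String subclass inherits these methods. `TinyUrl` is a hypothetical stand-in used so the example runs without Wgit. One caveat worth knowing: on Ruby 3.0 and later, inherited String methods such as gsub and strip return plain String instances rather than instances of the subclass, so you may need to wrap the result again if you want the subclass back.

```ruby
# Hypothetical stand-in for a String subclass; not part of Wgit.
class TinyUrl < String; end

url = TinyUrl.new('  https://www.vlang.io/search.html  ')

stripped = url.strip                         # surrounding whitespace removed
swapped  = stripped.gsub('vlang', 'example') # substring substitution

# replace mutates the receiver in place, so url keeps its TinyUrl class.
url.replace('https://www.vlang.io/')

puts stripped # prints https://www.vlang.io/search.html
puts swapped  # prints https://www.example.io/search.html
puts url      # prints https://www.vlang.io/
```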
If you require access to the underlying URI objects, you can use the following methods:
url.to_uri.class # => URI::HTTPS
url.to_addressable_uri.class # => Addressable::URI
The Wgit::Url class offers some advanced functionality not found in the underlying URI classes. For example:
require 'wgit'
require 'wgit/core_ext' # Gives us the `String#to_url` method.
url = 'https://www.example.com/contact'.to_url
url.class # => Wgit::Url
url.relative? # => false
url.relative?(domain: 'https://example.com') # => true (with relation to domain)
url + 'us' + '#top' # => "https://www.example.com/contact/us#top" of type Wgit::Url
doc = Wgit::Document.new url
doc.url # => "https://www.example.com/contact"
'/about'.to_url.make_absolute(doc) # => "https://www.example.com/about"
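For context, the standard library can already tell relative from absolute URLs and resolve a relative link against a base, as sketched below. The helper `resolve_link` is hypothetical. Note that `URI.join` follows RFC 3986 reference resolution, which replaces the last path segment of the base, so it behaves differently from the concatenation shown above.

```ruby
require 'uri'

# Hypothetical helper: resolves a link against a base URL using stdlib URI.
def resolve_link(base, link)
  URI.join(base, link).to_s
end

puts URI.parse('/about').relative?                          # prints true
puts URI.parse('https://www.example.com/contact').relative? # prints false

puts resolve_link('https://www.example.com/contact', '/about')
# prints https://www.example.com/about

# RFC 3986 resolution replaces the last segment ("contact") of the base:
puts resolve_link('https://www.example.com/contact', 'us')
# prints https://www.example.com/us
```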
For more information on what's possible with the Wgit::Url class, see the docs.