Utilities - Zadigo/kryptone GitHub Wiki

This page describes the scraping utilities that Kryptone provides.

URL

The URL class wraps a plain URL string and exposes attributes and helper methods that facilitate web scraping.

Is path

Checks if the url is a path, e.g. starts with /

Is valid

Checks if the url is valid e.g. starts with http or https

Has fragment

Checks if the url has a fragment e.g. http://example.com#fashion
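
As a rough illustration of the three checks above, the sketch below reproduces them with the standard library alone. The function names are hypothetical and this is not Kryptone's implementation, only an approximation of the behaviour described on this page.

```python
from urllib.parse import urlparse


def is_path(url: str) -> bool:
    # Relative paths such as /products/1 start with a slash
    return url.startswith('/')


def is_valid(url: str) -> bool:
    # "Valid" is taken here to mean an http or https scheme (assumption)
    return urlparse(url).scheme in ('http', 'https')


def has_fragment(url: str) -> bool:
    # The fragment is whatever follows '#'
    return urlparse(url).fragment != ''


print(is_path('/products/1'))
print(is_valid('http://example.com'))
print(has_fragment('http://example.com#fashion'))
```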

Is file

Checks if the url points to a file

As path

Returns the url as pathlib.Path

Get extension

Return the extension part of the url e.g. .jpeg of http://example.com/image.jpeg
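
The path and extension helpers can be approximated with `urllib.parse` and `pathlib`. This is a minimal sketch under assumptions: the function names are hypothetical, and `PurePosixPath` is used instead of `pathlib.Path` so that the result does not depend on the operating system.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def as_path(url: str) -> PurePosixPath:
    # Keep only the path component; scheme, host and query are dropped
    return PurePosixPath(urlparse(url).path)


def get_extension(url: str) -> str:
    # suffix is the final extension including the dot, or '' if there is none
    return as_path(url).suffix


print(as_path('http://example.com/image.jpeg'))
print(get_extension('http://example.com/image.jpeg'))
```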

Url stem

Return the stem of the url e.g. image.jpeg of http://example.com/image.jpeg

Is secured

Checks if the url uses https

Create

Creates a new url instance

Is same domain

Checks if two url instances use the same domain
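
One plausible way to compare domains is to compare the host names of both urls. The sketch below works on plain strings with a hypothetical `is_same_domain` function; how Kryptone treats subdomains or ports is not stated here, so ignoring the port is an assumption.

```python
from urllib.parse import urlparse


def is_same_domain(first: str, second: str) -> bool:
    # hostname is lowercased by urlparse and excludes the port
    return urlparse(first).hostname == urlparse(second).hostname


print(is_same_domain('http://example.com/1', 'https://example.com/2'))
print(is_same_domain('http://example.com/1', 'http://other.com/1'))
```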

Get status

Sends a request to the url (using requests.Request) and returns the status

Compare

Compares two url instances e.g. URL('http://example.com').compare(URL('http://example.com/1'))

Capture

Captures a specific section of a url

url = URL('http://example.com/1')
url.capture(r'\d+')
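
For readers unfamiliar with regex capture, the same idea can be sketched with `re.search` on a plain string. The standalone `capture` function below is hypothetical; whether Kryptone returns a match object or the matched text is not stated on this page, so the return type is an assumption.

```python
import re
from typing import Optional


def capture(url: str, pattern: str) -> Optional[re.Match]:
    # Search anywhere in the url; returns None when nothing matches
    return re.search(pattern, url)


match = capture('http://example.com/1', r'\d+')
print(match.group(0) if match else None)
```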

Test url

Test if a section of the whole url passes the given test

url = URL('http://example.com/1')
url.test_url(r'\d+')

> True

Test path

Test if a section of the url path alone passes the given test

url = URL('http://example.com/1')
url.test_path(r'\d+')

> True
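
The difference between testing the whole url and testing the path alone shows up when the pattern only matches in the host name. The sketch below uses two hypothetical helpers, `matches_url` and `matches_path`, built on the standard library; it illustrates the distinction and is not Kryptone's code.

```python
import re
from urllib.parse import urlparse


def matches_url(url: str, pattern: str) -> bool:
    # Search the complete url string, host name included
    return re.search(pattern, url) is not None


def matches_path(url: str, pattern: str) -> bool:
    # Search only the path component, e.g. /products
    return re.search(pattern, urlparse(url).path) is not None


url = 'http://shop24.example.com/products'
print(matches_url(url, r'\d+'))
print(matches_path(url, r'\d+'))
```

Here the digits appear only in the host name, so the whole-url test succeeds while the path-only test fails.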

Decompose path

Decompose the url path

url = URL('http://example.com/products/1')
url.decompose()

> ['products', '1']

Certain elements can also be excluded:

url = URL('http://example.com/products/1')
url.decompose(exclude=['products'])

> ['1']
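
The decomposition shown above can be approximated by splitting the path on slashes and filtering out unwanted segments. The standalone `decompose` function below is a hypothetical sketch that works on plain strings; it only mirrors the outputs shown on this page.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def decompose(url: str, exclude: Optional[Iterable[str]] = None) -> List[str]:
    excluded = set(exclude or [])
    # Drop empty segments (leading or trailing slashes) and excluded ones
    segments = urlparse(url).path.split('/')
    return [s for s in segments if s and s not in excluded]


print(decompose('http://example.com/products/1'))
print(decompose('http://example.com/products/1', exclude=['products']))
```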