# Getting started
## Dependencies
- **Skrape{it}** (see the usage sketch after this list)
  - A Kotlin-based testing/scraping/parsing library providing the ability to analyze and extract data from HTML (server- & client-side rendered).
  - It places particular emphasis on ease of use and a high level of readability by providing an intuitive DSL.
  - First and foremost it aims to be a testing library, but it can also be used to scrape websites in a convenient fashion.
- **Skrape{it} HTTP Fetcher » 1.1.5**
  - Used in combination with Skrape{it} to fetch and scrape server-side rendered (SSR) web pages.
- **Selenium** (see the sketch after this list)
  - Selenium provides support for the automation of web browsers.
  - It provides extensions to emulate user interaction with browsers, a distribution server for scaling browser allocation, and infrastructure for implementations of the W3C WebDriver specification.
  - Used to fetch and scrape dynamic (client-side rendered) web pages.
- **The MongoDB Synchronous Driver** (see the example after this list)
  - Used for its ObjectId implementation.
- **Gson** (see the example after this list)
  - Gson is a Java library that can be used to convert Java objects into their JSON representation.
  - It can also be used to convert a JSON string to an equivalent Java object.
- A geocoding API
  - Used to transform addresses into coordinates.
  - Requires your own API key.
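
The sketch below illustrates how Skrape{it} and its HTTP fetcher work together: fetch a server-side rendered page, then walk the parsed DOM with the DSL. The URL and the `h1` selector are placeholders, not the scraper's actual targets.

```kotlin
import it.skrape.core.htmlDocument
import it.skrape.fetcher.HttpFetcher
import it.skrape.fetcher.response
import it.skrape.fetcher.skrape
import it.skrape.selects.html5.h1

fun main() {
    // Fetch a server-side rendered page and extract the first <h1> text.
    val headline = skrape(HttpFetcher) {
        request {
            url = "https://example.com" // placeholder URL
        }
        response {
            htmlDocument {
                h1 {
                    findFirst { text }
                }
            }
        }
    }
    println(headline)
}
```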
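For pages that build their content with JavaScript, a plain HTTP fetch is not enough, so Selenium drives a real browser instead. A minimal sketch, assuming a locally installed Chrome (recent Selenium versions resolve a matching driver automatically):

```kotlin
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeOptions

fun main() {
    val options = ChromeOptions()
    options.addArguments("--headless=new") // no visible browser window
    val driver = ChromeDriver(options)
    try {
        driver.get("https://example.com") // placeholder URL
        // pageSource holds the DOM *after* JavaScript has run,
        // which is why Selenium can scrape dynamic pages.
        println(driver.pageSource.length)
    } finally {
        driver.quit() // always release the browser process
    }
}
```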
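The MongoDB driver is included only for its `ObjectId` type, which generates unique 12-byte identifiers:

```kotlin
import org.bson.types.ObjectId

fun main() {
    // 12 bytes: 4-byte timestamp, 5-byte random value, 3-byte counter.
    val id = ObjectId()
    println(id.toHexString()) // 24-character hex string
}
```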
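With Gson, the round trip between objects and JSON is a one-liner in each direction. A small sketch; the `Team` class is illustrative, not the project's actual data model:

```kotlin
import com.google.gson.Gson

// Illustrative type only, not the project's actual model.
data class Team(val name: String, val stadium: String)

fun main() {
    val gson = Gson()
    val json = gson.toJson(Team("Example FC", "Example Arena"))
    println(json) // {"name":"Example FC","stadium":"Example Arena"}

    val team = gson.fromJson(json, Team::class.java)
    println(team.name) // Example FC
}
```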
Follow the steps below to set up and run the project on your local machine.
## Tools

- An IDE with Kotlin support (e.g. IntelliJ IDEA)
## Prerequisites

Before you begin, ensure you have the following installed on your system:

- Git
- A JDK (Kotlin and the Gradle wrapper run on the JVM)
- A web browser that Selenium can drive (needed when scraping dynamic pages)
## Steps
### 1. Clone the Repository

First, clone the repository from GitHub to your local machine using the following command:

```bash
git clone https://github.com/StackUnderflowProject/Scraper.git
```
### 2. Navigate to the Project Directory

Change into the project directory:

```bash
cd Scraper
```
### 3. Build the Project

Use Gradle to build the project. Run the following command in the project directory:

```bash
./gradlew build
```
### 4. Run the Project

Once the build is successful, you can run the project using Gradle. For example, to scrape all football data from the PLT for the 2024 season:

```bash
./gradlew run --args='PLT 2024'
```

This outputs four .json files (teams, matches, stadiums, standings) containing the requested data.