Architecture - itsManeka/amazing-scraper GitHub Wiki
Architecture
The library follows Clean Architecture principles: the domain has no external dependencies, ports define interfaces, and infrastructure adapters implement them. All dependencies are injected through constructors.
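A minimal sketch of the port and constructor-injection pattern described above. The port names HttpClient and Logger come from this page, but their method signatures are assumptions, and FetchTitle, FakeHttpClient, and MemoryLogger are hypothetical names used only for illustration.

```typescript
// Ports: the application layer depends only on these interfaces.
// The method shapes are assumed for this sketch, not taken from the library.
interface HttpClient {
  get(url: string): Promise<string>;
}

interface Logger {
  info(message: string): void;
}

// Hypothetical use case: it receives its dependencies through the constructor
// and never imports a concrete adapter.
class FetchTitle {
  private readonly http: HttpClient;
  private readonly logger: Logger;

  constructor(http: HttpClient, logger: Logger) {
    this.http = http;
    this.logger = logger;
  }

  async execute(url: string): Promise<string> {
    this.logger.info(`GET ${url}`);
    const html = await this.http.get(url);
    const match = /<title>(.*?)<\/title>/i.exec(html);
    return match?.[1]?.trim() ?? "";
  }
}

// In-memory adapters standing in for infrastructure implementations.
class FakeHttpClient implements HttpClient {
  async get(_url: string): Promise<string> {
    return "<html><title> Example </title></html>";
  }
}

class MemoryLogger implements Logger {
  readonly lines: string[] = [];

  info(message: string): void {
    this.lines.push(message);
  }
}
```

Because the use case sees only the interfaces, a test can pass in-memory adapters like these while production code passes network-backed ones.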
Module Structure
src/
  domain/
    entities/      Product, CouponInfo, CouponResult, CouponMetadata, FetchPreSalesResult
    errors/        ScraperError
  application/
    ports/         HttpClient, HtmlParser, Logger, RetryPolicy, UserAgentProvider
    use-cases/     FetchProduct, ExtractCouponProducts, FetchPreSales
  infrastructure/
    http/          AxiosHttpClient (axios + tough-cookie), RotatingUserAgentProvider
    parsers/       CheerioHtmlParser (cheerio)
    logger/        ConsoleLogger
    retry/         ExponentialBackoffRetry
  index.ts         Public API and factory (createScraper)
Layers
Domain
Pure entities and error types with no external dependencies. Defines the core data structures (Product, CouponInfo, CouponResult, CouponMetadata, FetchPreSalesResult) and error codes (ScraperError).
Application
Use cases that orchestrate business logic and port interfaces that define contracts for infrastructure adapters:
- FetchProduct: fetches a single product page and extracts structured data
- ExtractCouponProducts: paginates through the coupon promotion API to collect all participating products
- FetchPreSales: paginates through HQ & Manga pre-sale search pages to collect ASINs
- Ports: HttpClient, HtmlParser, Logger, RetryPolicy, UserAgentProvider
Infrastructure
Concrete implementations of the port interfaces:
- AxiosHttpClient — HTTP client with cookie jar support (axios + tough-cookie)
- CheerioHtmlParser — HTML parsing and data extraction (cheerio)
- ConsoleLogger — default logger implementation
- ExponentialBackoffRetry — retry policy with exponential backoff
- RotatingUserAgentProvider — rotates browser User-Agent strings
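To illustrate what a retry policy with exponential backoff typically does, here is a small sketch. It is not the library's ExponentialBackoffRetry: the RetryOptions shape, the function names, and the doubling-with-cap formula are assumptions chosen for the example.

```typescript
// Hypothetical options; the library's real configuration may differ.
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before the next try: base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, options: RetryOptions): number {
  return Math.min(options.maxDelayMs, options.baseDelayMs * Math.pow(2, attempt));
}

// Runs the task until it succeeds or maxAttempts is exhausted.
// The sleep function is injected so callers (and tests) control waiting.
async function withRetry<T>(
  task: () => Promise<T>,
  options: RetryOptions,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < options.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < options.maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, options));
      }
    }
  }
  throw lastError;
}
```

With baseDelayMs of 500 and maxDelayMs of 4000, the waits between attempts grow as 500, 1000, 2000, 4000, and then stay at 4000.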
Data Flow
fetchProduct
- GET /dp/{ASIN} with browser-like headers
- Parse HTML for title, price, stock status, coupon link, and more
- Return ProductPage (includes couponInfo when a coupon is detected)
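The first step of this flow can be sketched as a small request builder. Only the /dp/{ASIN} path comes from this page. The function name, the baseUrl parameter, the ASIN format check (10 uppercase alphanumeric characters), and the specific header set are assumptions for illustration.

```typescript
// Hypothetical request description; the real HttpClient port may differ.
interface ProductRequest {
  url: string;
  headers: Record<string, string>;
}

function buildProductRequest(
  baseUrl: string,
  asin: string,
  userAgent: string,
): ProductRequest {
  // Assumed validation: ASINs are treated as 10 uppercase alphanumeric characters.
  if (!/^[A-Z0-9]{10}$/.test(asin)) {
    throw new Error(`Invalid ASIN: ${asin}`);
  }
  // Strip trailing slashes so the path is joined exactly once.
  const origin = baseUrl.replace(/\/+$/, "");
  return {
    url: `${origin}/dp/${asin}`,
    // An illustrative "browser-like" header set, not the library's actual list.
    headers: {
      "User-Agent": userAgent,
      "Accept": "text/html,application/xhtml+xml",
      "Accept-Language": "en-US,en;q=0.9",
    },
  };
}
```

Keeping URL and header construction in a pure function like this makes the request shape easy to test without touching the network.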
extractCouponProducts
- GET coupon page, extract anti-CSRF token and metadata
- POST to /promotion/psp/productInfoList with pagination
- Deduplicate ASINs and guard against infinite loops
- Return CouponResult with all products and metadata
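The pagination, deduplication, and loop guard can be sketched roughly as follows. The CouponPage shape, the nextSortId field, and the function names are hypothetical. This page says only that the guard compares sortId values, so the "stop when the cursor does not advance" rule below is one plausible reading rather than the library's confirmed behavior.

```typescript
// Hypothetical page shape returned by the injected fetcher.
interface CouponPage {
  asins: string[];
  nextSortId: string | null;
}

type FetchCouponPage = (sortId: string | null) => Promise<CouponPage>;

interface CouponLimits {
  maxProducts: number;
  maxPages: number;
}

async function collectCouponAsins(
  fetchPage: FetchCouponPage,
  limits: CouponLimits,
): Promise<string[]> {
  const seen = new Set<string>(); // deduplicates ASINs across pages
  let sortId: string | null = null;

  for (let page = 0; page < limits.maxPages; page++) {
    const result = await fetchPage(sortId);
    for (const asin of result.asins) {
      if (seen.size >= limits.maxProducts) {
        return Array.from(seen);
      }
      seen.add(asin);
    }
    // Loop guard: stop when there is no cursor or it did not advance.
    if (result.nextSortId === null || result.nextSortId === sortId) {
      break;
    }
    sortId = result.nextSortId;
  }
  return Array.from(seen);
}
```

Injecting the page fetcher keeps the loop logic independent of the POST request details, so the stop conditions can be exercised with canned pages.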
fetchPreSales
- Build search URL for HQ & Manga pre-sales category
- Extract data-asin from search result elements
- Paginate with random delays between requests
- Stop on: page limit, empty results, stop-ASIN sentinel, or no next page
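The four stop conditions listed above can be sketched as a single loop. All names here (SearchPage, StopReason, collectPreSales, the options fields) are hypothetical, and the choice to exclude the stop-ASIN itself from the result is an assumption. The page fetcher and the delay are injected, so the sketch makes no network calls.

```typescript
type StopReason = "page-limit" | "empty-results" | "stop-asin" | "no-next-page";

// Hypothetical result of parsing one search page.
interface SearchPage {
  asins: string[];
  hasNextPage: boolean;
}

interface PreSalesOptions {
  maxPages: number;
  stopAsin?: string;
}

async function collectPreSales(
  fetchPage: (pageNumber: number) => Promise<SearchPage>,
  options: PreSalesOptions,
  delay: () => Promise<void>,
): Promise<{ asins: string[]; stopReason: StopReason }> {
  const asins: string[] = [];
  for (let page = 1; page <= options.maxPages; page++) {
    // Wait between requests, but not before the first one.
    if (page > 1) await delay();
    const result = await fetchPage(page);
    if (result.asins.length === 0) {
      return { asins, stopReason: "empty-results" };
    }
    for (const asin of result.asins) {
      // Assumed semantics: the sentinel ends collection and is not included.
      if (asin === options.stopAsin) {
        return { asins, stopReason: "stop-asin" };
      }
      asins.push(asin);
    }
    if (!result.hasNextPage) {
      return { asins, stopReason: "no-next-page" };
    }
  }
  return { asins, stopReason: "page-limit" };
}
```

Returning the stop reason alongside the ASINs lets a caller distinguish a deliberate stop from a page limit that may have truncated the results.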
Built-in Protections
- Random delay between requests (configurable)
- CAPTCHA detection (3 body markers)
- 403 retry with backoff on initial page
- Session refresh on 403 during pagination
- Infinite pagination loop guard via sortId comparison
- ASIN deduplication across pages
- Configurable maxProducts (1000) and maxPages (500) limits
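As an illustration of the configurable random delay, the sketch below picks a wait time uniformly between two bounds. The DelayRange shape and the randomDelayMs name are hypothetical; this page does not document how the library exposes its delay settings.

```typescript
// Hypothetical configuration: inclusive bounds in milliseconds.
interface DelayRange {
  minMs: number;
  maxMs: number;
}

// Returns an integer in [minMs, maxMs]. The random source is injectable so the
// result is deterministic in tests; it defaults to Math.random.
function randomDelayMs(
  range: DelayRange,
  random: () => number = Math.random,
): number {
  const span = range.maxMs - range.minMs + 1;
  return Math.floor(range.minMs + random() * span);
}
```

A caller would wait for the returned number of milliseconds before issuing the next request, so consecutive requests are not evenly spaced.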