httpolaroid_spec - thesavant42/retrorecon GitHub Wiki
(formerly known as site2zip)
As a security researcher, developer, or analyst,
I want to take a single URL and create a full-fidelity capture of how a modern browser would load that page,
so that I can preserve the rendered DOM, assets, scripts, screenshots, and navigation behavior
into a downloadable .zip bundle for offline review or archival, without performing a full recursive crawl.
This process is referred to as taking a "snap".
- Accept a single URL from the user
- Support custom headers (optional), e.g.:
  - User-Agent override
  - Cookies or auth tokens
- Detect and follow any HTTP or JavaScript redirects, saving the redirect chain
- Launch a headless browser session (Chromium or equivalent)
- Behave like a modern browser:
  - Execute JavaScript
  - Load iframes, scripts, images, fonts, and stylesheets
  - Trigger dynamic AJAX/XHR fetches
- Wait for network idle, or use a configurable timeout to determine when the page is "done loading"
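The loading requirements above can be sketched roughly as follows. This is a minimal illustration, assuming Playwright's Python sync API as the engine (the spec only names Playwright as a likely choice). The names `snap_page`, `redirect_chain`, `timeout_ms`, and `out_dir` are hypothetical and not part of the existing module. Only HTTP redirects appear in the chain here; JavaScript-driven redirects are separate navigations and would need extra handling.

```python
def redirect_chain(request):
    """Follow Request.redirected_from links back to the first hop."""
    chain = []
    while request is not None:
        chain.append(request.url)
        request = request.redirected_from
    chain.reverse()  # oldest hop first, final URL last
    return chain


def snap_page(url, headers=None, timeout_ms=15000, out_dir="."):
    """Load `url` in headless Chromium; save screenshot and HAR, return the DOM."""
    # Imported lazily so redirect_chain() works without Playwright installed.
    from playwright.sync_api import sync_playwright
    from playwright.sync_api import TimeoutError as PlaywrightTimeout

    headers = dict(headers or {})
    user_agent = headers.pop("User-Agent", None)  # optional User-Agent override
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent=user_agent,
            extra_http_headers=headers or None,  # e.g. Cookie or Authorization
            record_har_path=f"{out_dir}/harlog.json",
        )
        page = context.new_page()
        try:
            response = page.goto(url, wait_until="networkidle", timeout=timeout_ms)
        except PlaywrightTimeout:
            response = None  # network never went idle; keep whatever loaded
        html = page.content()  # DOM after JavaScript has run
        page.screenshot(path=f"{out_dir}/screenshot.png", full_page=True)
        final_url = page.url
        chain = redirect_chain(response.request) if response else [final_url]
        context.close()  # the HAR file is written when the context closes
        browser.close()
    return {"html": html, "final_url": final_url, "redirects": chain}
```

Treating a timeout as "capture what is there" rather than as a failure is one possible reading of the configurable-timeout requirement. A stricter tool might report the timeout in its metadata instead.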
On successful load:
- Save final DOM as index.html (post-JS rendering)
- Capture a full-page screenshot
- Save a HAR log or equivalent request/response trace
- Detect and save any navigation or redirection to special pages like logout.htm (e.g., as redirect-capture.html)
- Save all loaded assets:
  - Images
  - JavaScript
  - CSS
  - Fonts
  - Favicon
  - XHR/Fetch JSON responses (optional, if possible to detect cleanly)
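One way to sort the saved assets into per-type folders is to look at the response's Content-Type and fall back to the file extension. The sketch below is only an illustration. The function names `asset_folder` and `asset_path` are hypothetical, and the `assets/other` catch-all (for example, for JSON responses) is an assumption, not something the spec defines.

```python
import mimetypes
import posixpath
from urllib.parse import urlparse

FONT_TYPES = {"application/font-woff", "application/vnd.ms-fontobject"}


def asset_folder(url, content_type=""):
    """Pick an archive folder from the Content-Type, else from the extension."""
    mime = content_type.split(";")[0].strip().lower()  # drop "; charset=..."
    if not mime:
        mime = mimetypes.guess_type(urlparse(url).path)[0] or ""
    if "javascript" in mime or mime == "application/ecmascript":
        return "assets/js"
    if mime == "text/css":
        return "assets/css"
    if mime.startswith("image/"):  # also covers favicons
        return "assets/images"
    if mime.startswith("font/") or mime in FONT_TYPES:
        return "assets/fonts"
    return "assets/other"  # assumed catch-all, e.g. XHR/Fetch JSON


def asset_path(url, content_type=""):
    """Build the path inside the bundle; the query string is ignored."""
    name = posixpath.basename(urlparse(url).path) or "index"
    return f"{asset_folder(url, content_type)}/{name}"
```

Two assets with the same file name would collide under this scheme, so a real implementation would probably need to deduplicate names, for instance by adding a short hash of the URL.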
- Bundle all saved assets and metadata into a single .zip file
- Folder structure inside the zip should be logical:
/HTTPolaroid_snap/
├── index.html
├── screenshot.png
├── redirect-capture.html (if applicable)
├── harlog.json
├── assets/
│   ├── js/
│   ├── css/
│   ├── images/
│   └── fonts/
└── meta.json (includes final URL, status, timestamp)
- Zip should be streamed to the client or saved to disk depending on usage context (CLI vs Flask)
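The bundling step could look something like the sketch below, which uses only the standard library. The function name `build_zip` and its parameters are hypothetical. Because `zipfile.ZipFile` accepts either a file path or a file-like object, the same function can write to disk for CLI use or into an in-memory buffer that a Flask view then sends to the client.

```python
import io
import json
import zipfile
from datetime import datetime, timezone

ROOT = "HTTPolaroid_snap"


def build_zip(target, html, screenshot, har, assets, final_url, status):
    """Write one snap to `target` (a path or a writable binary file object).

    `assets` maps paths relative to the bundle root (e.g. "assets/css/site.css")
    to the raw bytes of each saved asset.
    """
    meta = {
        "final_url": final_url,
        "status": status,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{ROOT}/index.html", html)
        zf.writestr(f"{ROOT}/screenshot.png", screenshot)
        zf.writestr(f"{ROOT}/harlog.json", har)
        for rel_path, body in assets.items():
            zf.writestr(f"{ROOT}/{rel_path}", body)
        zf.writestr(f"{ROOT}/meta.json", json.dumps(meta, indent=2))
    return target


# CLI-style usage would pass a file name; web usage would pass a buffer.
buf = build_zip(io.BytesIO(), "<html></html>", b"", "{}", {},
                "https://example.com/", 200)
```

Building the archive in memory is simple but holds the whole bundle in RAM. For large pages, writing to a temporary file and sending that may be the safer choice.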
- ✅ Accepts a URL
- ✅ Uses a browser-like engine to load content (likely Puppeteer or Playwright)
- ✅ Saves HTML and screenshot
- ✅ Handles basic redirection and asset inclusion
- ✅ Creates a downloadable .zip file
Feature | Current | Recommendation |
---|---|---|
Headless browser | ✅ | Retain; ensure Playwright is used for modern behavior |
Full network trace / HAR | ❌ / partial | Add HAR export or equivalent request log for forensics |
Asset folder structure | | Structure into /assets/js, /assets/css, etc. |
Screenshot capture | ✅ | Retain; offer full-page by default |
Redirect detection (logout) | | Explicitly detect and label redirect captures |
Custom headers/cookies | ❌ | Add support for auth headers or cookie strings |
Final metadata (JSON summary) | ❌ | Include meta.json with request info, timestamps, final URL |
CLI + Flask mode support | | Expand Flask usage to allow web UI trigger |
(Out of scope unless requested)
- Screenshot diffs between multiple snaps
- JS console error capture
- CSP or security header audit
- Interactive .html viewer for exploring the bundle
HTTPolaroid creates a high-fidelity snapshot ("snap") of a web page as loaded in a full browser context. It focuses on:
- Emulating full browser behavior (JS, assets, redirects)
- Capturing all visible and background-loaded content
- Packaging everything for forensic or research-grade analysis
It is not a crawler or archiver. It's a browser-grade single-request capture tool that integrates seamlessly into RetroRecon as the site2zip module, now rebranded as HTTPolaroid.