8. Contributing Data to Improve Scrapers - Pipepito/acestream-scraper GitHub Wiki
The Acestream Scraper relies on specialized code to extract channel information from various websites. As websites frequently change their structure, our scrapers sometimes need to be updated or enhanced. Your help in providing relevant information can significantly improve the application without requiring you to share potentially sensitive URLs.
To improve our scrapers, we need to understand:
- Website Structure: How Acestream links are presented on the page
- Data Patterns: How channel information is organized
- Error Patterns: What fails when scraping doesn't work
- HTML Structures: Key elements containing channel information
Instead of sharing full URLs (which may contain tracking parameters or sensitive information), please gather the following:
Provide a brief description of the website's structure, particularly focusing on:
- How Acestream links are displayed
- Whether links are in tables, lists, or other containers
- Whether content is loaded dynamically or present in the initial HTML
Example:
The channels are displayed in a grid layout with 3 columns. Each channel has a name,
a thumbnail image, and an Acestream link that appears when hovering over the thumbnail.
Share relevant HTML snippets with sensitive or identifying information removed.
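Identifying attribute values can be masked automatically before you paste a snippet. The following is a minimal sketch, not part of acestream-scraper. The function name `redactHtml` is hypothetical, and it assumes identifying data sits in `data-*` attributes, as in the example that follows. Channel names, view counts and other visible text still need to be replaced by hand.

```javascript
// Hypothetical helper (not part of acestream-scraper): masks the values of
// data-* attributes so user IDs and tracking tokens are not shared.
function redactHtml(html) {
  // Matches data-name="..." or data-name='...' and keeps only the attribute name
  return html.replace(/(\sdata-[\w-]+)=("[^"]*"|'[^']*')/g, '$1="[REMOVED]"');
}

const original = '<div class="channel-item" data-user="user123" data-tracking="abc123">';
console.log(redactHtml(original));
// <div class="channel-item" data-user="[REMOVED]" data-tracking="[REMOVED]">
```

Acestream links are left untouched, because the stream ID is the part the scraper needs.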
For example, instead of:
```html
<div class="channel-item" data-user="user123" data-tracking="abc123">
  <h3>Sports Channel HD</h3>
  <a href="acestream://1a2b3c4d5e6f7g8h9i0j">Stream Link</a>
  <span class="views">Viewers: 1,234</span>
</div>
```

Share:

```html
<div class="channel-item" data-user="[REMOVED]" data-tracking="[REMOVED]">
  <h3>[CHANNEL NAME]</h3>
  <a href="acestream://1a2b3c4d5e6f7g8h9i0j">Stream Link</a>
  <span class="views">Viewers: [NUMBER]</span>
</div>
```

Use your browser's developer tools (F12) to examine how content is structured:
- Right-click on a channel link and select "Inspect"
- Look for patterns in how the Acestream links are embedded
- Check if links are:
  - Directly in the HTML
  - Generated by JavaScript
  - Loaded from a separate file or API
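To tell the first two cases apart quickly, you can compare the page source the server sends with the DOM after scripts have run. This is a minimal sketch under that assumption: `classifyLinkSource` is a hypothetical name, and the pattern only counts `acestream://` followed by letters and digits.

```javascript
// Hypothetical helper: counts acestream:// links in the page source the server
// sent and in the DOM after scripts ran, and reports where the links come from.
function classifyLinkSource(initialHtml, renderedHtml) {
  const countLinks = (html) => (html.match(/acestream:\/\/[0-9A-Za-z]+/g) || []).length;
  const inSource = countLinks(initialHtml);
  const inDom = countLinks(renderedHtml);
  if (inSource === 0 && inDom === 0) return 'none';
  if (inSource === 0) return 'javascript'; // links only exist after scripts run
  if (inDom > inSource) return 'mixed';     // some links are added by scripts
  return 'html';                            // links are already in the page source
}

// Browser console usage (most desktop browser consoles accept top-level await):
// const initialHtml = await (await fetch(location.href)).text();
// console.log(classifyLinkSource(initialHtml, document.documentElement.outerHTML));
```

A `javascript` result does not say whether the data came from an inline script or from a separate request; the Network tab steps further down this page help with that. If the links sit inside an inline `<script>`, both counts include them and the result may read `mixed`, so check where the matches actually appear.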
If the site uses JavaScript to display channels, note:
- Variable names containing channel data
- How data is structured (arrays, objects)
- Any patterns in how data is transformed before display
Example:
The site loads channel data from a JavaScript variable called 'linksData'
which contains an array of objects with 'name' and 'url' properties.
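Once you have found such a variable, a small recursive search can report exactly where the links sit inside it, which is the structural detail the scraper needs. This is a sketch, not the scraper's code: `findAcestreamPaths` is a hypothetical name, and the sample data mirrors the `linksData` shape mentioned on this page. In the console you would call it on one variable at a time, e.g. `findAcestreamPaths(linksData, 'linksData')`; walking `window` itself is slow and noisy.

```javascript
// Hypothetical helper: walks a JavaScript value (objects, arrays, strings) and
// returns the path of every string that contains an acestream:// link.
function findAcestreamPaths(value, path = '$', seen = new Set(), out = []) {
  if (typeof value === 'string') {
    if (value.includes('acestream://')) out.push({ path, value });
  } else if (value !== null && typeof value === 'object' && !seen.has(value)) {
    seen.add(value); // avoids infinite loops on circular references
    for (const key of Object.keys(value)) {
      let child;
      try {
        child = value[key];
      } catch {
        continue; // some browser properties throw when read
      }
      findAcestreamPaths(child, `${path}.${key}`, seen, out);
    }
  }
  return out;
}

// Sample data in the same shape as linksData
const sample = { links: [{ name: 'Channel', url: 'acestream://1a2b3c4d5e6f7g8h9i0j' }] };
console.log(findAcestreamPaths(sample, 'sample'));
// one match, at path "sample.links.0.url"
```

A path such as `linksData.links.0.url` tells us both the container type and the property name, without revealing which site it came from.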
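To find structured data, run these commands in your browser's console and share any results containing Acestream links. They only list properties of `window`; variables declared at the top level of a script with `const` or `let` are not included, so if you spot such a variable in the page source, type its name into the console instead:

```javascript
// Look for global string variables containing "acestream"
Object.keys(window).filter(k => typeof window[k] === 'string' && window[k].includes('acestream')).forEach(k => console.log(k, window[k]));

// Look for global variables whose names suggest channel data (case-insensitive, so "tvChannels" or "aceLinks" also match)
Object.keys(window).filter(k => /channel|stream|link/i.test(k)).forEach(k => console.log(k, window[k]));
```

If you notice the site makes API calls to load channel data: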
- Open the Network tab in developer tools
- Refresh the page and look for XHR or Fetch requests
- Find requests that return channel data
- Share the request URL pattern (without domain) and response structure
Example:
The site loads data from "/api/channels.json" which returns an array of objects with
structure: {"name": "Channel Name", "acestream_id": "1a2b3c4d5e6f7g8h9i0j"}
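To share a response structure without real channel names or IDs, you can reduce the parsed JSON to its key names and value types. This is a minimal sketch: `describeShape` is a hypothetical helper, it describes only the first element of each array (assuming arrays are uniform), and `/api/channels.json` in the comments is the example path above, not a real endpoint.

```javascript
// Hypothetical helper: replaces every value in parsed JSON with its type name,
// so the structure can be shared without channel names or stream IDs.
function describeShape(value) {
  if (Array.isArray(value)) {
    // Only the first element is described; API arrays are usually uniform
    return value.length > 0 ? [describeShape(value[0])] : [];
  }
  if (value !== null && typeof value === 'object') {
    const shape = {};
    for (const [key, child] of Object.entries(value)) {
      shape[key] = describeShape(child);
    }
    return shape;
  }
  return value === null ? 'null' : typeof value;
}

const sampleResponse = [{ name: 'Channel Name', acestream_id: '1a2b3c4d5e6f7g8h9i0j' }];
console.log(JSON.stringify(describeShape(sampleResponse)));
// [{"name":"string","acestream_id":"string"}]

// Browser console usage (replace the example path with the one you found):
// const data = await (await fetch('/api/channels.json')).json();
// console.log(JSON.stringify(describeShape(data), null, 2));
```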
The scraper currently handles these common patterns:
- Direct Acestream Links: `acestream://1a2b3c4d5e6f7g8h9i0j` in the HTML
- JavaScript Data Objects: `const linksData = {links: [{name: "Channel", url: "acestream://..."}]}`
- M3U Files: Links to external M3U playlists containing Acestream channel information
- ZeroNet-specific Formats: Including `channel-item` classes and special iframe structures
- listaplana.txt Content: Special format commonly found on some sports streaming sites
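For the M3U case, it can help to check whether a playlist actually contains Acestream entries before reporting it. The sketch below is not the scraper's own parser: `parseAcestreamM3u` is a hypothetical name, it only recognizes lines that start with `acestream://`, and it takes the channel name from the text after the first comma of each `#EXTINF` line, which breaks if attribute values themselves contain commas.

```javascript
// Minimal sketch, not the scraper's own parser: pairs each #EXTINF title with
// the acestream:// URL on the next non-comment line of an M3U playlist.
function parseAcestreamM3u(text) {
  const entries = [];
  let pendingName = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith('#EXTINF')) {
      // The display name follows the first comma, e.g. "#EXTINF:-1,Channel"
      const comma = line.indexOf(',');
      pendingName = comma >= 0 ? line.slice(comma + 1).trim() : null;
    } else if (line !== '' && !line.startsWith('#')) {
      if (line.startsWith('acestream://')) {
        entries.push({ name: pendingName, url: line });
      }
      pendingName = null; // an #EXTINF line describes only the URL after it
    }
  }
  return entries;
}

const playlist = [
  '#EXTM3U',
  '#EXTINF:-1,Sports Channel HD',
  'acestream://1a2b3c4d5e6f7g8h9i0j',
].join('\n');
console.log(parseAcestreamM3u(playlist).length); // 1
```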
When reporting, please use this format:
```
URL TYPE: [Regular HTTP or ZeroNet]
SITE PATTERN: [Brief description without naming the site]
HTML STRUCTURE:
[HTML snippet with sensitive info removed]
JAVASCRIPT DATA:
[JavaScript data pattern or variable names]
DEVELOPER CONSOLE FINDINGS:
[Any results from console commands]
ERROR OBSERVED:
[Description of what doesn't work when scraped]
```
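If you prepare reports regularly, a small script can fill in this template and flag any full web addresses left in it. This is a hypothetical helper, not something the project provides; the field names and the `[NOT PROVIDED]` placeholder are assumptions.

```javascript
// Hypothetical helper: fills in the report template and lists any full
// http(s) URLs left in the text, since reports should not contain them.
function buildScraperReport(fields) {
  // Empty or missing sections get an explicit placeholder
  const value = (v) => (typeof v === 'string' && v.trim()) || '[NOT PROVIDED]';
  const text = [
    `URL TYPE: ${value(fields.urlType)}`,
    `SITE PATTERN: ${value(fields.sitePattern)}`,
    'HTML STRUCTURE:',
    value(fields.htmlStructure),
    'JAVASCRIPT DATA:',
    value(fields.javascriptData),
    'DEVELOPER CONSOLE FINDINGS:',
    value(fields.consoleFindings),
    'ERROR OBSERVED:',
    value(fields.errorObserved),
  ].join('\n');
  const fullUrls = text.match(/https?:\/\/[^\s"'<>]+/g) || [];
  return { text, fullUrls };
}

const report = buildScraperReport({
  urlType: 'Regular HTTP',
  sitePattern: 'Channel grid with links revealed on hover',
  htmlStructure: '<a href="acestream://[ID]">Stream Link</a>',
});
console.log(report.text);
if (report.fullUrls.length > 0) console.warn('Remove these URLs first:', report.fullUrls);
```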
The information you provide will be used to:
- Identify patterns across similar websites
- Develop or enhance scrapers for those patterns
- Test and validate scraper improvements
We don't store any personally identifiable information or full URLs.
You can submit this information:
- As a GitHub issue (preferred): https://github.com/Pipepito/acestream-scraper/issues
- Through a GitHub discussion: https://github.com/Pipepito/acestream-scraper/discussions
When submitting, please use the title format: "Scraper Data: [Site Pattern]" (without naming the specific site).
Your contributions help make Acestream Scraper more effective for everyone. By providing structured data instead of direct URLs, you help us improve the application while maintaining security.