8. Contributing Data to Improve Scrapers - Pipepito/acestream-scraper GitHub Wiki

Overview

The Acestream Scraper relies on specialized code to extract channel information from various websites. As websites frequently change their structure, our scrapers sometimes need to be updated or enhanced. Your help in providing relevant information can significantly improve the application without requiring you to share potentially sensitive URLs.

How You Can Help

What We Need

To improve our scrapers, we need to understand:

  1. Website Structure: How Acestream links are presented on the page
  2. Data Patterns: How channel information is organized
  3. Error Patterns: What goes wrong when scraping fails
  4. HTML Structures: Key elements containing channel information

Collecting Information Safely

Instead of sharing full URLs (which may contain tracking parameters or sensitive information), please gather the following:

1. Page Structure Information

Provide a brief description of the website's structure, particularly focusing on:

  • How Acestream links are displayed
  • Whether links are in tables, lists, or other containers
  • Whether content is loaded dynamically or present in the initial HTML

Example:

The channels are displayed in a grid layout with 3 columns. Each channel has a name, 
a thumbnail image, and an Acestream link that appears when hovering over the thumbnail.

2. HTML Snippets

Share relevant HTML snippets with sensitive or identifying information removed.

For example, instead of:

<div class="channel-item" data-user="user123" data-tracking="abc123">
  <h3>Sports Channel HD</h3>
  <a href="acestream://1a2b3c4d5e6f7g8h9i0j">Stream Link</a>
  <span class="views">Viewers: 1,234</span>
</div>

Share:

<div class="channel-item" data-user="[REMOVED]" data-tracking="[REMOVED]">
  <h3>[CHANNEL NAME]</h3>
  <a href="acestream://1a2b3c4d5e6f7g8h9i0j">Stream Link</a>
  <span class="views">Viewers: [NUMBER]</span>
</div>
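
The redaction shown above can also be done mechanically. What follows is a minimal sketch, not part of the project: the function name `redactSnippet` and the choice of what to blank out are assumptions made for illustration. It blanks every `data-*` attribute value, replaces heading text, and replaces numbers in visible text, while leaving class names and the `acestream://` link untouched.

```javascript
// Hypothetical helper: redact identifying details from a small HTML snippet
// before sharing it. Regex-based, so it only suits short, simple snippets.
function redactSnippet(html) {
  return html
    // Blank the value of every data-* attribute (user IDs, tracking tokens).
    .replace(/(\sdata-[\w-]+=)"[^"]*"/g, '$1"[REMOVED]"')
    // Replace heading text, which usually holds the channel name.
    .replace(/(<h[1-6][^>]*>)[^<]*(<\/h[1-6]>)/g, '$1[CHANNEL NAME]$2')
    // Replace the first number (such as a viewer count) in each text node.
    .replace(/>([^<]*?)\d[\d,.]*([^<]*)</g, '>$1[NUMBER]$2<');
}
```

Always read the result before posting it; a pattern-based pass like this can miss identifying text that appears in unexpected places.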

3. Page Source Analysis

Use your browser's developer tools (F12) to examine how content is structured:

  1. Right-click on a channel link and select "Inspect"
  2. Look for patterns in how the Acestream links are embedded
  3. Check if links are:
    • Directly in the HTML
    • Generated by JavaScript
    • Loaded from a separate file or API
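
One quick way to tell the first case from the other two is to search the raw page source (the text shown by "View Source", not the rendered page) for `acestream://` links. The sketch below is illustrative only; `findAcestreamIds` is a hypothetical name, and the ID pattern is deliberately loose (any run of letters and digits) rather than a statement of the exact ID format.

```javascript
// Hypothetical helper: list the distinct acestream:// IDs found in raw page source.
function findAcestreamIds(source) {
  // Loose pattern: "acestream://" followed by letters and digits.
  const pattern = /acestream:\/\/([0-9a-zA-Z]+)/g;
  const ids = new Set(); // a Set drops repeated IDs
  for (const match of source.matchAll(pattern)) {
    ids.add(match[1]);
  }
  return [...ids];
}
```

If this returns an empty list for the raw source while links are visible on the rendered page, the links are probably generated by JavaScript or loaded from a separate file or API.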

4. JavaScript Structure

If the site uses JavaScript to display channels, note:

  • Variable names containing channel data
  • How data is structured (arrays, objects)
  • Any patterns in how data is transformed before display

Example:

The site loads channel data from a JavaScript variable called 'linksData' 
which contains an array of objects with 'name' and 'url' properties.
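
To describe how the data is structured without copying its contents, you can reduce a value to its shape. This is a minimal sketch under stated assumptions: `describeShape` is a hypothetical helper, it expects plain JSON-like data with no circular references, and it assumes all elements of an array look like the first one.

```javascript
// Hypothetical helper: summarize the shape of a value without exposing its contents.
// Intended for plain JSON-like data; circular structures are not handled.
function describeShape(value) {
  if (Array.isArray(value)) {
    // Describe only the first element and assume the rest look the same.
    return value.length === 0 ? [] : [describeShape(value[0])];
  }
  if (value !== null && typeof value === 'object') {
    const shape = {};
    for (const key of Object.keys(value)) {
      shape[key] = describeShape(value[key]);
    }
    return shape;
  }
  // Primitives are replaced by the name of their type.
  return value === null ? 'null' : typeof value;
}
```

In the browser console, `console.log(JSON.stringify(describeShape(linksData), null, 2))` would print the structure of the `linksData` variable from the example above, with every value replaced by its type name.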

5. Console Output for Pattern Discovery

To find structured data, run these commands in your browser's console and share any results containing Acestream links:

// Look for string variables containing "acestream".
// Note: only globals that are window properties are listed (var or window.x = ...);
// top-level const/let variables do not appear in Object.keys(window).
Object.keys(window).filter(k => typeof window[k] === 'string' && window[k].includes('acestream')).forEach(k => console.log(k, window[k]));

// Look for variables whose names suggest channel data (case-insensitive match)
Object.keys(window).filter(k => /channel|stream|link/i.test(k)).forEach(k => console.log(k, window[k]));
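
The one-line commands above only inspect top-level variables. When links are nested inside objects or arrays, a recursive search can report where they live. The following is a sketch, not a project tool: `findAcestreamPaths`, its parameters, and the default depth limit are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: find nested string values that mention "acestream".
// Returns property paths such as "window.linksData.links[0].url".
function findAcestreamPaths(root, rootName = 'window', maxDepth = 4) {
  const found = [];
  const seen = new WeakSet(); // guards against circular references

  function visit(value, path, depth) {
    if (typeof value === 'string') {
      if (value.includes('acestream')) found.push(path);
      return;
    }
    if (value === null || typeof value !== 'object') return;
    if (depth > maxDepth || seen.has(value)) return;
    seen.add(value);
    for (const key of Object.keys(value)) {
      let child;
      try {
        child = value[key]; // some browser properties throw on access
      } catch (err) {
        continue;
      }
      const childPath = Array.isArray(value) ? `${path}[${key}]` : `${path}.${key}`;
      visit(child, childPath, depth + 1);
    }
  }

  visit(root, rootName, 0);
  return found;
}
```

Calling `findAcestreamPaths(window)` in the console returns only the paths, so you can report where the data lives without pasting the values themselves.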

6. Request/Response Data

If you notice the site makes API calls to load channel data:

  1. Open the Network tab in developer tools
  2. Refresh the page and look for XHR or Fetch requests
  3. Find requests that return channel data
  4. Share the request URL pattern (without domain) and response structure

Example:

The site loads data from "/api/channels.json" which returns an array of objects with
structure: {"name": "Channel Name", "acestream_id": "1a2b3c4d5e6f7g8h9i0j"}
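
Stripping the domain and any parameter values by hand is error-prone, so a small helper can do it. This is a minimal sketch: `toUrlPattern` is a hypothetical name, `[VALUE]` is a placeholder chosen here, and the function expects an absolute URL copied from the Network tab. It relies only on the standard `URL` class available in browsers and Node.js.

```javascript
// Hypothetical helper: reduce a full request URL to a shareable pattern.
// Drops the scheme and domain, and keeps query parameter names but not values.
function toUrlPattern(fullUrl) {
  const url = new URL(fullUrl); // throws if fullUrl is not an absolute URL
  const params = [...url.searchParams.keys()].map((name) => `${name}=[VALUE]`);
  return params.length > 0 ? `${url.pathname}?${params.join('&')}` : url.pathname;
}
```

For instance, a request to `https://example.com/api/channels?token=abc` (a made-up address) would be reported as `/api/channels?token=[VALUE]`.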

Common Site Patterns We Support

The scraper currently handles these common patterns:

  1. Direct Acestream Links: acestream://1a2b3c4d5e6f7g8h9i0j in the HTML
  2. JavaScript Data Objects: const linksData = {links: [{name: "Channel", url: "acestream://..."}]}
  3. M3U Files: Links to external M3U playlists containing Acestream channel information
  4. ZeroNet-specific Formats: Including channel-item classes and special iframe structures
  5. listaplana.txt Content: Special format commonly found on some sports streaming sites
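
To make the M3U pattern concrete, here is a simplified sketch of reading Acestream entries from extended M3U text. It is not the project's parser. It assumes each `#EXTINF:` line is followed by its URL line, takes the channel name as everything after the first comma, and ignores entries whose URL does not start with `acestream://`; `parseM3u` is a hypothetical name.

```javascript
// Hypothetical sketch: pull Acestream entries out of extended M3U playlist text.
// Simplified: attribute values that contain commas are not handled.
function parseM3u(text) {
  const entries = [];
  let pendingName = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.startsWith('#EXTINF:')) {
      // The display name is taken as everything after the first comma.
      const comma = line.indexOf(',');
      pendingName = comma === -1 ? '' : line.slice(comma + 1).trim();
    } else if (line !== '' && !line.startsWith('#')) {
      // A URL line: keep it only if it is an acestream:// link.
      if (line.startsWith('acestream://')) {
        entries.push({ name: pendingName || '', url: line });
      }
      pendingName = null; // the name applies to one URL line only
    }
  }
  return entries;
}
```

If a playlist you encounter does not fit these assumptions, a redacted sample of a few lines is exactly the kind of information worth including in a report.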

Sample Report Format

When reporting, please use this format:

URL TYPE: [Regular HTTP or ZeroNet]
SITE PATTERN: [Brief description without naming the site]
HTML STRUCTURE:
[HTML snippet with sensitive info removed]

JAVASCRIPT DATA:
[JavaScript data pattern or variable names]

DEVELOPER CONSOLE FINDINGS:
[Any results from console commands]

ERROR OBSERVED:
[Description of what doesn't work when scraped]
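
If you collect your findings in the browser console, the report can be assembled there as well. The sketch below is only a convenience: `formatReport`, the property names on its input object, and the `[none]` placeholder for empty sections are assumptions made for illustration, not something the project requires.

```javascript
// Hypothetical helper: assemble a report in the format shown above.
// Property names on the input object are illustrative only.
function formatReport({ urlType, sitePattern, html, jsData, consoleFindings, error }) {
  // Empty or missing sections are marked explicitly instead of left blank.
  const orNone = (text) => (text && text.trim() !== '' ? text.trim() : '[none]');
  return [
    `URL TYPE: ${orNone(urlType)}`,
    `SITE PATTERN: ${orNone(sitePattern)}`,
    'HTML STRUCTURE:',
    orNone(html),
    '',
    'JAVASCRIPT DATA:',
    orNone(jsData),
    '',
    'DEVELOPER CONSOLE FINDINGS:',
    orNone(consoleFindings),
    '',
    'ERROR OBSERVED:',
    orNone(error),
  ].join('\n');
}
```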

What Happens With Your Data

The information you provide will be used to:

  1. Identify patterns across similar websites
  2. Develop or enhance scrapers for those patterns
  3. Test and validate scraper improvements

We don't store any personally identifiable information or full URLs.

How to Submit

You can submit this information:

When submitting, please use the title format: "Scraper Data: [Site Pattern]" (without naming the specific site).

Thank You!

Your contributions help make Acestream Scraper more effective for everyone. By providing structured data instead of direct URLs, you help us improve the application while keeping potentially sensitive URLs private.
