Security Concern? Please read this! - TheKingTermux/myanimelist-nekopoi-scrapper GitHub Wiki
MyAnimeList and Nekopoi Scrapper Wiki
Script Version: 13 *(the version number tracks the currently developed release)*
Code Summary and Architecture
This is a pure Python project (~893 lines for the CLI, 576 for the GUI, and 1,417 for localization) that collects public anime data from two websites and saves it locally. There are no compiled binaries, no dependencies beyond four well-known PyPI packages (see requirements.txt below), and no continuously running processes. Version 13 introduces optional proxy support and retry automation to avoid anti-scraping blocks, a Tkinter GUI, and a localization system covering 9 languages, but all operations remain local and transparent.
1. Import Section (Lines 1-10)
```python
import requests
from bs4 import BeautifulSoup
import re
import os
from datetime import datetime
import time
import random
import logging
import threading
import sys
```
Security Analysis:
- requests: Standard HTTP library for making web requests. No backdoors in the official package. Version 13 uses this with standard headers and optional proxy support.
- BeautifulSoup: Pure Python HTML parser. Only processes local HTML strings, no network code.
- re, os, datetime, time, random, logging, threading, sys: All built-in Python modules. No external dependencies that could be compromised.
- No suspicious imports: none of the indicators common to malware, such as `socket` (for unauthorized connections), `subprocess` (for running system commands), `smtplib` (for sending email), `ftplib` (for uploading files), or keylogging modules.
2. Global Variables and Configuration (Lines 12-36)
```python
logging.basicConfig(level=logging.INFO, format='%(message)s')
loading_active = False
DANGER_GENRES = {"Adult", "Boys Love", "Yaoi", "Crossdressing", "Ecchi", "Girls Love", "Yuri", "Hentai", "Erotica"}
EPS_REGEX = re.compile(r'(\d+)(?:\s*eps)?')
DURATION_REGEX = re.compile(r'(\d+)\s*min')
```
Security Analysis:
- logging: Configured for console output only. No file logging or external transmission.
- loading_active: Simple boolean flag for the user-interface animation. No persistent state.
- DANGER_GENRES: Hard-coded string set used for content filtering. No executable code.
- Regex patterns: Compiled once at startup for performance. Pure text processing, no code execution.
- No dangerous changes: Version 13 keeps this configuration unchanged, with no suspicious new globals.
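To see that these compiled patterns are pure text processing, here is a small sketch of how they would be applied; the helper names (`extract_episode_count`, `extract_duration_minutes`) are illustrative, not the script's own functions:

```python
import re

# The two patterns defined in section 2 of the script.
EPS_REGEX = re.compile(r'(\d+)(?:\s*eps)?')
DURATION_REGEX = re.compile(r'(\d+)\s*min')

def extract_episode_count(text):
    """Pull the leading number out of strings like '12 eps'."""
    match = EPS_REGEX.search(text)
    return int(match.group(1)) if match else None

def extract_duration_minutes(text):
    """Pull the minute count out of strings like '24 min. per ep.'."""
    match = DURATION_REGEX.search(text)
    return int(match.group(1)) if match else None

print(extract_episode_count("12 eps"))              # 12
print(extract_duration_minutes("24 min. per ep."))  # 24
print(extract_episode_count("Unknown"))             # None
```

Nothing here evaluates or executes the matched text; a non-matching string simply yields `None`.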
3. Loading Animation Function (Lines 18-28)
```python
def loading_animation(message="🔄 Processing..."):
    spinner = ['|', '/', '-', '\\']
    i = 0
    while loading_active:
        sys.stdout.write(f'\r{message} {spinner[i % len(spinner)]}')
        sys.stdout.flush()
        time.sleep(0.15)
        i += 1
    sys.stdout.write(f'\r{message} completed ✓\n')
    sys.stdout.flush()
```
Security Analysis:
- Pure console UI function. No file access, network calls, or system modifications.
- Uses the global `loading_active` flag to control the loop; `time.sleep()` is used only for animation timing, not for anti-detection.
- No changes: identical to previous versions, safe.
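The daemon-thread pattern this flag supports (mentioned again in section 6) can be sketched as follows; the driver code at the bottom is illustrative, not the script's exact flow:

```python
import sys
import threading
import time

loading_active = False

def loading_animation(message="Processing..."):
    spinner = ['|', '/', '-', '\\']
    i = 0
    while loading_active:  # re-reads the module-level flag each iteration
        sys.stdout.write(f'\r{message} {spinner[i % len(spinner)]}')
        sys.stdout.flush()
        time.sleep(0.15)
        i += 1
    sys.stdout.write(f'\r{message} done\n')

# Start the spinner on a daemon thread, simulate work, then stop it.
loading_active = True
worker = threading.Thread(target=loading_animation, daemon=True)
worker.start()
time.sleep(0.5)          # stand-in for the real scraping work
loading_active = False   # loop condition goes false; the thread exits
worker.join(timeout=1)
```

Because the thread is a daemon, it can never outlive the program even if the flag were left set, which is the "dies when the function ends" property the analysis relies on.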
4. Utility Functions (Lines 38-80)
This includes functions for month translation and member count parsing.
translate_month() (Lines 38-54):
- Pure string manipulation using regex replacement.
- Translates month names from English to English (identity function for English version).
- No external calls or data storage.
parse_date_flexible() (Lines 56-69):
- Parses date strings with `datetime.strptime()`.
- Returns `None` on failure; no error transmission.
parse_member_count() (Lines 71-79):
- Converts text like '10K' to integers using basic arithmetic.
- Uses `re.sub()` for text cleaning. Pure calculation.
Security Analysis: All are stateless functions that process strings locally, with no side effects. Version 13 makes no changes.
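As an illustration of why a member-count parser is harmless, here is a minimal sketch of the described behaviour; the exact cleaning rules of the real `parse_member_count()` may differ:

```python
import re

def parse_member_count(text):
    """Convert strings like '10K', '1.2M', or '8,541' to integers.
    Sketch of the behaviour described above, not the script's exact code."""
    cleaned = text.strip().upper().replace(',', '')
    match = re.match(r'([\d.]+)([KM]?)', cleaned)
    if not match:
        return 0
    value = float(match.group(1))
    multiplier = {'K': 1_000, 'M': 1_000_000}.get(match.group(2), 1)
    return int(value * multiplier)

print(parse_member_count('10K'))    # 10000
print(parse_member_count('1.2M'))   # 1200000
print(parse_member_count('8,541'))  # 8541
```

Everything reduces to a regex match and one multiplication: there is no path from the input string to code execution.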
5. Data Extraction Function (Lines 81-142)
get_anime_data(entry):
- Receives a BeautifulSoup HTML element.
- Extracts anime data using CSS selectors (e.g., `entry.select_one('div.title > div > h2 > a')`).
- Returns a dictionary of strings and integers.
Security Analysis:
- Pure HTML processing with BeautifulSoup.
- No JavaScript execution, no file downloads.
- All data comes from provided HTML elements (passed from scraping functions).
- Error handling only logs locally.
- No dangerous changes: Version 13 maintains safe data extraction.
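The selector-based extraction can be demonstrated on a toy fragment; the HTML below is a simplified stand-in for a real MAL entry, and the dictionary keys are illustrative:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for one seasonal entry (real markup is richer).
html = """
<div class="entry">
  <div class="title"><div><h2><a href="/anime/1">Cowboy Bebop</a></h2></div></div>
</div>
"""

entry = BeautifulSoup(html, "html.parser").select_one("div.entry")
title_tag = entry.select_one("div.title > div > h2 > a")
data = {"title": title_tag.get_text(strip=True), "url": title_tag["href"]}
print(data)  # {'title': 'Cowboy Bebop', 'url': '/anime/1'}
```

BeautifulSoup never executes scripts embedded in the page; `select_one` only walks the parsed tree, so hostile HTML can at worst produce missing fields.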
6. Nekopoi Scraping Function (Lines 144-297)
scrape_nekopoi(max_retries=3, use_proxy=False, proxy_list=None):
- Makes up to 3 HTTP GET attempts to https://nekopoi.care/jadwal-new-hentai/.
- Sends standard User-Agent and Accept-Language headers.
- Supports optional proxies to avoid blocks.
- Processes the HTML with BeautifulSoup.
- Returns a processed data dictionary.
Security Analysis:
- Limited HTTP requests: Only to target URL. No redirects to harmful sites.
- Timeout: 15 seconds prevents hanging.
- Data tracking: `data_usage += len(response.content)` only counts bytes locally.
- Threading: uses a daemon thread for the animation; it dies when the function ends.
- Optional proxies: If enabled, uses user-provided proxy list, but doesn't store or transmit proxy credentials.
- Retry with backoff: Exponential backoff to avoid detection, but only local retries.
- No data transmission: All processing local.
Version 13 Security Analysis: the proxy and retry additions improve resilience but remain safe; proxies are used only for standard HTTP routing, and no credentials are stored.
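The retry-with-backoff pattern described here can be sketched generically; `fetch_with_retry` is an illustrative helper, not the script's actual function, and the commented `requests.get` call shows the assumed shape of the request with the headers and timeout the wiki describes:

```python
import random
import time

def fetch_with_retry(fetch, max_retries=3, base_delay=1.0):
    """Retry a callable with exponential backoff (sketch, not the real code).
    `fetch` is any callable that raises an exception on failure."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the error locally
            # Backoff grows 1s, 2s, 4s, ... with a little jitter.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Assumed usage with requests (not executed here):
# response = fetch_with_retry(lambda: requests.get(
#     "https://nekopoi.care/jadwal-new-hentai/",
#     headers={"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"},
#     proxies={"https": proxy_url} if use_proxy else None,
#     timeout=15,
# ))
```

Note that the only "state" the pattern keeps is a local loop counter: failures are retried or re-raised, never reported anywhere.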
7. MAL Scraping Function (Lines 325-438)
scrape_mal_seasonal(url, max_retries=3, use_proxy=False, proxy_list=None):
- Similar to Nekopoi function, but for MyAnimeList season pages.
- Makes up to 3 GET attempts with same headers.
- Supports optional proxies.
- Checks for CAPTCHA in URL (error handling only).
- Parses different anime categories.
Security Analysis:
- Same safe pattern as Nekopoi.
- URL built from user input (year/season) but validated.
- No arbitrary code execution from collected data.
- Proxy and retry additions: Same as Nekopoi, safe and local.
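The CAPTCHA check mentioned above is plain error handling; a heuristic sketch (the real script's exact condition is an assumption) could look like this, using simulated response objects so no network is involved:

```python
from types import SimpleNamespace

def looks_blocked(response):
    """Treat an HTTP 403 or a redirect to a captcha URL as a block.
    Illustrative heuristic only; the script may test differently."""
    return response.status_code == 403 or "captcha" in response.url.lower()

# Simulated responses for the illustration:
ok = SimpleNamespace(status_code=200,
                     url="https://myanimelist.net/anime/season/2024/spring")
blocked = SimpleNamespace(status_code=200,
                          url="https://myanimelist.net/captcha")
print(looks_blocked(ok), looks_blocked(blocked))  # False True
```

The key point for the security review: detecting a block only aborts or retries the request; no evasion beyond waiting and retrying takes place.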
8. Status Print Function (Lines 299-323)
print_status(scraping_start_time=None, continuous=False):
- Prints scraping time, data usage, and current time.
- Uses global variables for tracking.
Security Analysis:
- Console output only. No network access or sensitive file reading.
- New in version 13: Added for better time tracking, but remains safe.
9. File Saving Function (Lines 440-693)
save_to_file():
- Receives a data dictionary and writes it to a text file.
- Uses `open()` with UTF-8 encoding.
- Creates directories with `os.makedirs()` if needed.
Security Analysis:
- Only writes to local files in the `./AnimeList/` directory.
- No sensitive file reading.
- The template header is a hard-coded string with placeholders.
- No file uploads or external transmissions.
- No dangerous changes: Version 13 keeps saving strictly local.
10. Header Display Function (Lines 694-702)
tampilkan_header():
- Displays the program header; it still reports version 12 (this should read 13 in the updated release).
Security Analysis:
- Pure console output. No dangerous operations.
11. Main Function (Lines 704-890)
main():
- Manages user input with validation loops.
- Calls scraping functions.
- Saves data and prints status.
Input Validation:
- Year: Must be number, within reasonable range.
- Season: Options 1-4.
- Member threshold: Number with K/M suffixes.
- File name: Optional, defaults to safe pattern.
Security Analysis:
- All inputs are strings used for URL construction or file naming.
- No `eval()`, `exec()`, or command injection; `os.system('cls')` only clears the console (Windows-specific).
- Data usage is saved to a local `data_usage.txt`.
- No dangerous changes: Version 13 adds stricter validation but remains safe.
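To make the "no injection surface" claim concrete, here is a sketch of validated input flowing into a fixed URL template; the helper names and the accepted year range are assumptions, though the URL shape matches MyAnimeList's public seasonal pages:

```python
def validate_year(text, lo=1950, hi=2030):
    """Accept only digit strings within a plausible range (range assumed)."""
    if not text.isdigit():
        raise ValueError("Year must be a number")
    year = int(text)
    if not lo <= year <= hi:
        raise ValueError(f"Year must be between {lo} and {hi}")
    return year

SEASONS = {"1": "winter", "2": "spring", "3": "summer", "4": "fall"}

def build_season_url(year_text, season_choice):
    # Validated values only ever land inside a fixed template, so user
    # input cannot inject commands or arbitrary URLs.
    year = validate_year(year_text)
    season = SEASONS[season_choice]  # KeyError on anything but 1-4
    return f"https://myanimelist.net/anime/season/{year}/{season}"

print(build_season_url("2024", "2"))
# https://myanimelist.net/anime/season/2024/spring
```

Because the strings are checked before use and never passed to a shell or to `eval()`, the worst a malformed input can produce is a local `ValueError`.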
12. Entry Point (Lines 892-893)
```python
if __name__ == "__main__":
    main()
```
Security Analysis: Standard Python entry point. No automatic execution or background processes.
13. GUI Scraper (gui_scraper.py - Lines 1-541)
GUI Imports (Lines 1-28):
```python
import tkinter as tk
from tkinter import ttk, filedialog, messagebox, scrolledtext
import threading
import os
import sys
import json
import csv
import pandas as pd
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
from datetime import datetime
import requests
from bs4 import BeautifulSoup
import re
import time
import random
import logging

# Import existing functions from the main script
from MyAnimeList_and_Nekopoi_Scrapper import (
    scrape_mal_seasonal, scrape_nekopoi, save_to_file, translate_month,
    parse_date_flexible, parse_member_count, get_anime_data, loading_animation,
    print_status, tampilkan_header, DANGER_GENRES, EPS_REGEX, DURATION_REGEX
)

# Import localization
from localization import i18n
```
Security Analysis:
- tkinter: Standard Python GUI library. No backdoors, only local UI.
- pandas, reportlab: Used for data export (CSV, PDF). Local data processing, no transmission.
- Imports from main script: Uses previously analyzed safe functions.
- Localization import: Local translation system, no external calls.
AnimeScraperGUI Class (Lines 30-576):
- Creates Tkinter GUI interface.
- Manages user inputs, scraping, saving, and filtering.
- Uses threading for non-blocking operations.
Security Analysis:
- All GUI operations local. No network access except through safe scraping functions.
- File saving uses the standard `filedialog`, writing only to user-selected paths.
- No code execution from user inputs.
- Threading for UI responsiveness, not dangerous operations.
- No backdoors: GUI is just a wrapper for safe CLI functions.
14. Localization System (localization.py - Lines 1-1417)
Localization Class (Lines 4-1417):
- Manages translations for 9 languages.
- Loads translation strings from code.
- Provides methods to change language and get translated text.
Security Analysis:
- All translation data statically encoded in code.
- No external file reading or network calls.
- Only string and dictionary manipulation.
- No risks: Localization system purely for UI, doesn't affect scraping logic.
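The "statically encoded" scheme amounts to nested dictionary lookups; this minimal sketch shows the assumed shape (class name, method names, and the sample strings are all illustrative, not copied from localization.py):

```python
class Localization:
    """Translations live as literals in the source; lookups never touch
    the filesystem or the network."""

    TRANSLATIONS = {
        "en": {"start": "Start scraping", "done": "Done"},
        "id": {"start": "Mulai scraping", "done": "Selesai"},
    }

    def __init__(self, lang="en"):
        self.lang = lang

    def set_language(self, lang):
        if lang in self.TRANSLATIONS:  # unknown codes are ignored
            self.lang = lang

    def t(self, key):
        # Fall back to English, then to the key itself.
        return self.TRANSLATIONS.get(self.lang, {}).get(
            key, self.TRANSLATIONS["en"].get(key, key))

i18n = Localization()
i18n.set_language("id")
print(i18n.t("start"))  # Mulai scraping
```

Since every translation is a string literal compiled into the module, the language setting cannot influence scraping behaviour at all.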
Data Flow Security
User Input (validated strings) → URL Construction → HTTP GET (only 2 requests, with optional retry/proxy) → HTML Parsing → Local Dictionary → Text/JSON/CSV/PDF File Output
- No external data transmission: All data stays local.
- No user data collection: the only inputs requested are scraping parameters.
- No persistent storage: Files created on demand, no databases or registries.
- Optional proxies: If used, only for request routing, no credential storage.
- GUI and Localization: Additional local UI, doesn't change core data flow.
Common Malware Patterns - Why None Exist
- Keyloggers: No keyboard/mouse hooks, no input monitoring.
- Data Theft: No SMTP, FTP, or HTTP POST requests.
- Remote Access: No socket connections, no listening ports.
- File Stealing: Only reads its own `data_usage.txt` and writes to user-specified files.
- System Modification: No registry edits, no service installations, no startup entries.
- Code Obfuscation: All code readable, no base64, no minification.
- Anti-Analysis: No VM detection, no debugger checks. Retry and proxy only avoid blocks, not hide malicious activity.
Dependency Verification
requirements.txt contains:
```
requests>=2.25.0
beautifulsoup4>=4.9.0
pandas>=1.3.0
reportlab>=3.6.0
```
These are widely used, publicly audited packages with no known backdoors. Version 13 adds pandas and reportlab for the export features but remains safe.
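A quick local check of the installed versions against these minimums can be done with the standard library alone (the `REQUIREMENTS` mapping below simply mirrors requirements.txt):

```python
from importlib.metadata import version, PackageNotFoundError

# Mirrors requirements.txt above; minimum versions copied from it.
REQUIREMENTS = {
    "requests": "2.25.0",
    "beautifulsoup4": "4.9.0",
    "pandas": "1.3.0",
    "reportlab": "3.6.0",
}

for pkg, minimum in REQUIREMENTS.items():
    try:
        print(f"{pkg}: installed {version(pkg)} (requires >= {minimum})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```

Combined with `pip hash` or a lock file, this lets a cautious user confirm they are running exactly the audited package versions.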
GitHub Transparency
Code available publicly at https://github.com/TheKingTermux/myanimelist-nekopoi-scrapper. Anyone can:
- Review source code
- Check commit history
- Verify no hidden commits
- Run in sandbox
Final Assurance
This script does exactly what it claims: it collects public anime data and saves it locally. There are no backdoors, no data collection beyond what is displayed, and no remote-control mechanisms. The code is transparent, uses safe libraries, and follows standard Python practices. The proxy, retry, GUI, and multi-language additions in version 13 only enhance functionality without compromising security.
If still doubtful, I suggest:
- Reviewing the GitHub repository
- Running in virtual environment
- Checking network traffic (only 2 GET requests, optional proxy)
- Checking output files (only text/JSON/CSV/PDF data)