TelegramCrawler 2 - Kishan1750/OSINT GitHub Wiki
Introduction
The Telegram Crawler is a Python script that uses the Telegram API to list your Telegram channels and groups and download their media attachments, filtered by file extension. It is a practical tool for browsing, downloading, and archiving files shared in the chats you belong to.
Setup and Requirements
To use the Telegram Crawler, install the following dependencies:
- Telethon: The library that provides access to the Telegram API; the script uses it to interact with chats and retrieve media.
- tqdm: A progress bar library, used to display real-time progress while fetching attachments.
- tabulate: A library for generating formatted tables, used to present chat lists, extensions, and attachment details.
Install them with pip:
pip install telethon tqdm tabulate
Before running the Telegram Crawler, you also need a Telegram API ID and API Hash, which you can create by logging in at https://my.telegram.org and opening "API development tools". The API ID identifies your Telegram application, and the API Hash acts as its authentication key. The script asks for these credentials on its first run and stores them in an api_credentials.txt file for future use.
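The credential handling described above could look roughly like the following. This is a minimal sketch, not the project's code: the name get_api_credentials matches the function listed under Code Explanation, but its body, the helpers load_api_credentials and save_api_credentials, the ask parameter, and the file layout (API ID on the first line, API Hash on the second) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical file layout: line 1 = API ID, line 2 = API Hash.
CREDENTIALS_FILE = Path("api_credentials.txt")


def load_api_credentials(path: Path = CREDENTIALS_FILE) -> Optional[Tuple[int, str]]:
    """Return (api_id, api_hash) from the file, or None if it is missing or incomplete."""
    if not path.is_file():
        return None
    lines = [line.strip() for line in path.read_text(encoding="utf-8").splitlines()]
    lines = [line for line in lines if line]
    if len(lines) < 2 or not lines[0].isdigit():
        return None
    return int(lines[0]), lines[1]


def save_api_credentials(api_id: int, api_hash: str, path: Path = CREDENTIALS_FILE) -> None:
    """Write the credentials so later runs can skip the prompt."""
    path.write_text(f"{api_id}\n{api_hash}\n", encoding="utf-8")


def get_api_credentials(
    path: Path = CREDENTIALS_FILE,
    ask: Callable[[str], str] = input,
) -> Tuple[int, str]:
    """Load stored credentials, or prompt for them once and store them."""
    stored = load_api_credentials(path)
    if stored is not None:
        return stored
    api_id = int(ask("Enter your Telegram API ID: ").strip())
    api_hash = ask("Enter your Telegram API Hash: ").strip()
    save_api_credentials(api_id, api_hash, path)
    return api_id, api_hash
```

The returned pair matches the positional arguments of Telethon's TelegramClient(session, api_id, api_hash), which is presumably where initialize_client() passes them.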
Code
Usage
Follow these steps to use the Telegram Crawler:
1. Clone the repository and navigate to the project directory.
2. Run the script from your terminal or command prompt:
   python TeleCrawler.py
3. On the first run, the script prompts you for your Telegram API ID and API Hash and saves them to api_credentials.txt, so you do not need to re-enter them on later runs.
4. The script lists your Telegram chats, both channels and groups, along with the total number of messages in each.
5. Enter the index number of the channel or group you want to crawl. The script then lists the file extensions found in that chat and the number of attachments for each extension.
6. Enter the index number of the extension you want to download. The script lists the matching attachments, showing each file's name, its associated message (if any), and its size in MB.
7. Choose whether to proceed with the download or cancel it.
8. If you proceed, the script creates a directory named after the chat title, replacing invalid characters with underscores, and downloads the files with the selected extension into it.
9. Optionally, generate text files for the downloaded files. This is useful for text-based documents, as it archives their contents in a readable format.
10. The script exits once the download is complete.
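The directory naming described in the steps above can be sketched as follows. Which characters the script treats as invalid is not documented here, so this sketch assumes the characters Windows forbids in file names (<>:"/\|?*) plus ASCII control characters; the function names sanitize_dir_name and prepare_download_dir and the "untitled_chat" fallback are hypothetical.

```python
import re
from pathlib import Path

# Assumed set of "invalid" characters: those Windows forbids in file names
# (<>:"/\|?*) plus ASCII control characters. The real script may differ.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_dir_name(chat_title: str) -> str:
    """Replace each invalid character in a chat title with an underscore."""
    cleaned = INVALID_CHARS.sub("_", chat_title)
    # Hypothetical fallback so an empty title still yields a usable name.
    return cleaned or "untitled_chat"


def prepare_download_dir(chat_title: str, base: Path = Path(".")) -> Path:
    """Create (if needed) and return the directory for a chat's downloads."""
    target = base / sanitize_dir_name(chat_title)
    target.mkdir(parents=True, exist_ok=True)
    return target
```

A directory created this way could then be passed as the file argument of Telethon's client.download_media(message, file=...), which accepts an output path or directory.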
Code Explanation
The Telegram Crawler is built from the following functions:
- get_api_credentials(): Prompts the user for their Telegram API ID and API Hash and stores them in the api_credentials.txt file.
- initialize_client(): Initializes a TelegramClient with the provided API credentials and starts the session.
- fetch_extensions(messages): Extracts the file extensions present in the messages and returns a dictionary mapping each extension to its number of attachments.
- display_extensions_table(extensions): Displays a table of the available extensions and their counts.
- download_media(group, cl, name, file_ext): Downloads the media files with the specified extension from the selected chat.
- generate_txt_files(messages, extension_index): Generates text files for the downloaded media files with the selected extension.
- get_user_choice(prompt, min_value, max_value): Prompts the user for an index and checks that it lies between min_value and max_value.
- get_total_messages(channel): Retrieves the total number of messages in a channel or group.
- fetch_attachments_details(messages, extension_index): Extracts the details of each attachment with the selected extension: filename, associated message, and size.
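To illustrate what fetch_extensions and fetch_attachments_details might do, here is a sketch that works on Telethon message objects, where message.file is None for messages without a photo or document and otherwise exposes ext, name, and size (in bytes), and message.message holds the message text. Unlike the listed signature, this sketch takes the extension string directly instead of extension_index, and the AttachmentDetail dataclass and the "(unnamed)" placeholder are hypothetical. Because it only reads those attributes, it can be exercised with stand-in objects.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class AttachmentDetail:
    filename: str
    message: str
    size_mb: float


def fetch_extensions(messages: Iterable) -> Dict[str, int]:
    """Count attachments per file extension, e.g. {'.pdf': 3, '.xlsx': 2}."""
    counts: Counter = Counter()
    for msg in messages:
        # Telethon sets msg.file to None for messages without a file.
        if msg.file is not None and msg.file.ext:
            counts[msg.file.ext.lower()] += 1
    return dict(counts)


def fetch_attachments_details(messages: Iterable, extension: str) -> List[AttachmentDetail]:
    """Collect filename, message text, and size in MB for one extension."""
    details = []
    for msg in messages:
        f = msg.file
        if f is None or not f.ext or f.ext.lower() != extension.lower():
            continue
        details.append(
            AttachmentDetail(
                filename=f.name or "(unnamed)",
                message=msg.message or "",
                size_mb=round((f.size or 0) / (1024 * 1024), 2),
            )
        )
    return details
```

With Telethon, the message list could come from client.get_messages(entity, limit=None) or client.iter_messages(entity), and the resulting rows could be printed with tabulate, which the script already uses for its tables.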
Screenshots
The following screenshots illustrate the Telegram Crawler in action:
Screenshot 1: API Credentials Input
User entering their Telegram API ID and API Hash.
Screenshot 2: Chat Selection
List of available chats with their total message counts.
Screenshot 3: Extension Selection
Available file extensions in the selected chat.
Screenshot 4: Download Confirmation
Confirmation prompt for downloading the selected attachments.
Downloaded .xlsx files
Generated text files
Conclusion
The Telegram Crawler offers a straightforward way to browse and download media from Telegram channels and groups. By following the steps in this wiki, you can list a chat's attachments by file extension, download them into a dedicated directory, and optionally generate text files for further analysis or archiving.