Generating inputs - cshunor02/sponge-attack GitHub Wiki
Hunor's lists
> hello.txt
To generate a file that contains the word Hello followed by a space, repeated $n$ times on separate lines (one million times in my case), we have to run the following code:
The file hello.py:
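# Repeat the line "Hello " one million times
prompt_input = "Hello \n" * 1000000
# Write the result to inputs/hello.txt (the inputs folder must already exist)
with open('inputs/hello.txt', 'w', encoding='utf-8') as file:
    file.write(prompt_input)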
To do so, Python 3 (for example, version 3.13.1) needs to be installed. In a terminal, run the following command:
python hello.py
If the script ran successfully, the inputs folder will contain a hello.txt file, which can be uploaded to the given LLM.
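The script above assumes that the inputs folder already exists and builds the whole string in memory before writing it. As a minimal sketch of an alternative (the function name `write_repeated` and its parameters are illustrative, not part of the repository), the same file can be written in chunks after creating the folder:

```python
import os


def write_repeated(path, line="Hello \n", count=1_000_000, chunk=10_000):
    """Write `line` to `path` `count` times, at most `chunk` lines per write call."""
    # Create the target folder if it is missing.
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    written = 0
    with open(path, "w", encoding="utf-8") as f:
        while written < count:
            n = min(chunk, count - written)
            f.write(line * n)
            written += n
    return written
```

Calling `write_repeated("inputs/hello.txt")` should produce the same content as hello.py, and a larger `count` can be used without holding the full text in memory.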
> strong_sponge.txt
A similar file is strong_sponge.txt, which contains a mix of different words (not just “Hello”). Combining common English words with random punctuation is intended to prevent compression optimizations and to allow more complex questions to be asked of the LLM.
The file strong_sponge.py:
import random
words = ["Hello", "hello", "World", "Attack", "Random", "Example", "Input", "Large", "File", "Words", "LLM Overload", "Team", ".", "?", "!", ":", ";"]
lines = []
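# Build one million lines of ten randomly chosen words each
for _ in range(1000000):
    line = ' '.join(random.choices(words, k=10))
    lines.append(line)
# Write all lines to inputs/strong_sponge.txt, one per line
with open('inputs/strong_sponge.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(lines))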
The random module is part of the Python standard library, so it does not need to be installed separately.
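To see why mixing words matters, one can compare how well each kind of text compresses. The following is a rough sketch using `zlib` from the standard library on small samples. The helper name `compression_ratio`, the sample sizes, and the fixed seed are assumptions made for illustration, and general-purpose compression is only a proxy for whatever optimizations an LLM pipeline might apply.

```python
import random
import zlib

words = ["Hello", "hello", "World", "Attack", "Random", "Example", "Input",
         "Large", "File", "Words", "LLM Overload", "Team", ".", "?", "!", ":", ";"]


def compression_ratio(text):
    """Return compressed size divided by original size (lower = more compressible)."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


rng = random.Random(0)  # fixed seed so the sample is reproducible (assumption)
repeated = "Hello \n" * 10_000
mixed = "\n".join(" ".join(rng.choices(words, k=10)) for _ in range(1_000))

print(f"repeated: {compression_ratio(repeated):.4f}")
print(f"mixed:    {compression_ratio(mixed):.4f}")
```

The repeated text should compress to a far smaller fraction of its original size than the mixed text.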
> stealth.txt
To generate stealth.txt, which is a large list of books, we have to run several scripts first.
The first script that needs to be run is booktitles.py. It collects $n$ bestsellers from the Books API and writes their titles and authors to a file.
Because the API limits usage (you cannot request more than 100 titles per minute), it takes some time to build a list large enough to overload a model. Because of this limitation, the books collected so far (the first 960) are split across several files.
To combine these JSONs, you have to run makelist.py as well. It collects all files in the /inputs folder whose names start with books and end with .txt, so it is not limited to four files: if there are more, all of them are combined.
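The source of makelist.py is not shown here, so the following is only a sketch of the file-collection step described above. It assumes that each matching file is plain text that can be concatenated as-is and that the result is written to stealth.txt in the same folder. The function name `combine_book_files` and its parameters are hypothetical.

```python
import glob
import os


def combine_book_files(folder="inputs", output="stealth.txt"):
    """Concatenate every books*.txt file in `folder` into `folder/output`."""
    # Match files that start with "books" and end with ".txt"; sort for a stable order.
    paths = sorted(glob.glob(os.path.join(folder, "books*.txt")))

    out_path = os.path.join(folder, output)
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as f:
                content = f.read()
            out.write(content)
            # Keep a line break between files if one is missing.
            if content and not content.endswith("\n"):
                out.write("\n")
    return paths
```

Because the pattern is a glob rather than a fixed list, any number of matching files is picked up. Note that the sort is lexicographic, so a file named books10.txt would come before books2.txt.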
python booktitles.py
python makelist.py
In line 33, you need to add your own API key from the New York Times Books API.
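Because of the request limit described above, a collection script has to pause between calls. The following sketch shows one way to space calls evenly. It does not contact the real API: `fetch_page`, `max_per_minute`, and the injectable `sleep` and `clock` functions are hypothetical names, and the actual limit should be taken from the API's own documentation.

```python
import time


def fetch_throttled(fetch_page, pages, max_per_minute,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_page(page) for each page, at most max_per_minute calls per minute."""
    min_interval = 60.0 / max_per_minute  # seconds between consecutive calls
    results = []
    last_call = None
    for page in pages:
        if last_call is not None:
            # Wait out whatever remains of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fetch_page(page))
    return results
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting; in normal use the defaults from the `time` module apply.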
Benchmark Input
To evaluate the efficiency and impact of different sponge attack strategies, we introduce a standardized Benchmark Input. This input is designed to generate predictable, measurable responses from LLMs, enabling consistent comparisons across attack types and model configurations. For more details, see the full BenchmarkGenerator.