Papers
Hacking the Hacker: How AI Agents are Changing the Game of Penetration Testing
https://www.nctatechnicalpapers.com/Paper/2024/AI06_Haefner_6481_paper
Notes
They evaluated different models: Llama3, Dolphin-2.9, and GPT-4o, and found GPT-4o to perform best. They also tried different agent architectures:
- Two agents
- Central Coordinator Model
- Team Lead Model
They found that two agents work better than many specialized agents. But I wonder if this is a side effect of their prompting or of the model (will future models work better with more specialized agents?). They didn't test a single agent (which is what BoxPwnr does now). I wonder if two agents would work better than just one; I expect so, but I would also expect many specialized agents to work better than just two.
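Since they don't publish code, here is only a rough sketch of what a two-agent loop could look like; the roles, prompts, and the `call_llm` helper below are all my assumptions, not their design:

```python
import subprocess

def call_llm(system: str, user: str) -> str:
    """Hypothetical stand-in for whatever chat-completion client you use."""
    raise NotImplementedError("plug in your LLM client here")

def two_agent_loop(target: str, max_turns: int = 12) -> None:
    transcript = ""
    for _ in range(max_turns):
        # Agent 1: proposes the next shell command based on the transcript.
        command = call_llm(
            system=f"You are pentesting {target}. Reply with exactly one shell command.",
            user=f"Transcript so far:\n{transcript}",
        )
        # Agent 2: sanity-checks the proposal before it is executed.
        verdict = call_llm(
            system="Review the proposed command. Reply APPROVE or REJECT: <reason>.",
            user=command,
        )
        if not verdict.startswith("APPROVE"):
            transcript += f"\n$ {command}\n[rejected: {verdict}]"
            continue
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=120)
        transcript += f"\n$ {command}\n{result.stdout}{result.stderr}"
```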
Based on the output they shared, they used at least two machines from Starting Point: Meow & Fawn. But they don't disclose the full list or the per-machine statistics, only the following:
> The authors ran the LLMs against ten Hack-The-Box challenges and the models were able to complete five of them.
Their success rate with GPT-4o for Meow was 43%.
Very interesting observation:
> This pattern and success rates held true even if the number of turns increased from 12 to 24. The agents just don’t seem to be able to “step back” and determine the problem.

I have also observed a lack of stepping back. Hopefully we can address it with reasoning models.
I haven't run into hallucinations yet (which surprises me). I bet it's because we are using a newer release of the same model. Or maybe there's something magic in my prompt? I also hit guardrails at the beginning, but after adding to the prompt that I was authorized to do the test, I haven't hit them again.
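For what it's worth, the guardrail fix on my side boils down to a one-line preamble; this is a paraphrase of the idea, not the exact BoxPwnr prompt:

```python
# Paraphrased authorization preamble (not the exact BoxPwnr prompt); stating
# up front that the engagement is authorized stopped the refusals for me.
SYSTEM_PROMPT = (
    "You are assisting with an authorized penetration test of a lab machine. "
    "The operator has explicit permission to test this target. "
    "Respond with the next command to run."
)
```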
Overall, it's an inspiring paper, but I would have liked more detailed statistics, more information about the iterations they went through to reach the prompt they use, and of course I wish they had published the source code.
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool
https://arxiv.org/abs/2308.06782 (submitted 13 Aug 2023)
Recent papers / work on AI and hacking by Timothee Chauvin
https://tchauvin.com/recent-papers-ai-hacking
This is an amazing list and much better than this wiki article. 1000% recommended.
Hacking CTFs with plain agents
https://arxiv.org/abs/2412.02776
LLM Agents can Autonomously Hack Websites
https://arxiv.org/abs/2402.06664
An Empirical Evaluation of LLMs for Solving Offensive Security Challenges
https://arxiv.org/abs/2402.11814
LLM Agents can Autonomously Exploit One-day Vulnerabilities
https://arxiv.org/abs/2404.08144
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
https://arxiv.org/abs/2406.01637
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
https://arxiv.org/abs/2406.05590
Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
https://arxiv.org/abs/2409.16165
EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges
https://enigma-agent.com/assets/paper.pdf
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
https://arxiv.org/abs/2412.01778
Notes
They present a two-model design: a planner and a summarizer. The planner proposes the next command, the command is executed, and the summarizer condenses the output into a running summary that is injected into the prompt on every call.
One issue with this approach is that you can't cache the prompt/conversation, since the system prompt is modified every time with the updated summary. But it significantly reduces the context used, so it probably helps.
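A minimal sketch of that loop as I understand it, assuming a hypothetical `llm` client (the real implementation is in the repo linked below):

```python
import subprocess

def llm(system: str, user: str) -> str:
    """Hypothetical stand-in for the actual model client."""
    raise NotImplementedError("plug in your LLM client here")

def hacksynth_style_loop(target: str, turns: int = 20) -> None:
    summary = "No actions taken yet."
    for _ in range(turns):
        # Planner sees only the running summary, never the full transcript.
        command = llm(
            system=f"Plan the next single shell command to pentest {target}.",
            user=f"Summary of progress so far:\n{summary}",
        )
        result = subprocess.run(command, shell=True, capture_output=True,
                                text=True, timeout=60)
        # Summarizer folds the new output back into the summary. Because the
        # summary (and so the prompt prefix) changes every turn, provider-side
        # prompt caching can't get a hit.
        summary = llm(
            system="Update the progress summary with this command and its output.",
            user=f"Old summary:\n{summary}\n\nCommand: {command}\n"
                 f"Output:\n{result.stdout}{result.stderr}",
        )
```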
All the code is public :) https://github.com/aielte-research/HackSynth/tree/main
They also experiment with different temperatures, top-p values, and observation window sizes.
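Roughly this kind of grid search, where the observation window caps how much raw command output the summarizer sees per turn (the values and `run_benchmark` entry point below are made up, not theirs):

```python
from itertools import product

def run_benchmark(temperature: float, top_p: float, obs_window: int) -> None:
    """Hypothetical entry point; in practice this would launch HackSynth."""
    ...

# Made-up grid values; the paper sweeps these knobs, not necessarily these ranges.
for temperature, top_p, obs_window in product(
    [0.0, 0.5, 1.0],   # sampling temperature
    [0.9, 1.0],        # nucleus sampling (top-p) cutoff
    [1000, 5000],      # characters of command output shown to the summarizer
):
    run_benchmark(temperature, top_p, obs_window)
```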
The latest model they tested was GPT-4o, so I wonder how this applies to newer models with reasoning and longer context windows.
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models
https://arxiv.org/abs/2408.08926