Proposal #2 ‐ AI Prompt Injection - charlottecroce/capstone-ccc410 GitHub Wiki
Problem Statement
Large Language Models (LLMs) are vulnerable to prompt injection and jailbreaking attacks where an actor may force the LLM to perform behavior that is unexpected and outside of its intended function. A successful attack may lead to an AI’s system code being revealed, sensitive personal information being exfiltrated, manipulation of the LLM’s output to the user, and actions performed on other systems which the model may have API access to. This issue is of particular concern given the rapid adoption of AI tools by both organizations and individuals. HAI Stanford reports that AI-related incidents are rising rapidly along with adoption, but standardized risk assessment still remains rare with few existing tools on the market, further exacerbating the severity of the threat due to the relatively poor security controls currently available to mitigate such attacks. According to a recent report published by IBM Institute for Business Value, business executives have found that adopting generative AI tools into their organizations creates a 96% likelihood that they will experience a security breach in the next 3 years. Despite this, organizations continue to adopt AI tools rapidly, with an increase from 55% in 2023 to 78% in 2024, as reported by HAI Stanford in 2025.
Proposed Solution
This capstone aims to educate organizations and individuals on the methodology and risks associated with the quickly emerging threat of AI prompt injection. We aim to create a report that will provide in depth information on the statistics of AI adoption rates, how LLMs and other AI models process input, how current attacks work and their potential consequences, and provide recommendations for best security practices. We also hope to develop a mock example application to demonstrate prompt injection, the attack’s consequences, and how certain security controls can affect outcomes. The results of these deliverables will be distilled and published on a website to provide further education to organizations and individuals alike.
Scope
-
Report
- Analysis of AI adoption rates and implications for security risks
- Description of how LLMs are programmed and process user input
- Explanation of how prompt injection and jailbreaking work
- Risks and examples of historical attacks
- Determine efficacy of mitigation, security controls and provide recommendations
- Document development and function of example AI model
-
Mock LLM Application
- Designed to provide real-world example of how different attacks work
- Demonstrate the consequences associated with different attacks (e.g. data exfiltration or revealing of core instruction set)
- Provide transparency and reliability in demonstration which may not be available with ChatBots that are publicly available
- Alternative to demonstration and testing on third party organization’s LLMs
-
Educational Website
- Educate wide audience of people including end users, organizations, etc
- Condense and simplify findings from report to make it more easily digestible for a wider range of technical expertise
- Demonstrate mock LLM application to give real world examples of more abstract security threats and attack methodologies
References
Our sources used are available here