2_Prompt Attacks - Anony231/LLMSecuirty GitHub Wiki
What is a Prompt?
A prompt is a short piece of text that is given to a LLM as input, and it can be used to control the output of a model in variety of ways. Prompt design is a process of creating a prompt that will generate the desired output from an LLM.
What are Prompt Attacks?
This is technique to manipulate the Generative AI system by adding the harmful or conflicting instructions, causing unintended or malicious actions. With the prompt attacks, AI vulnerabilities can be exploited by manipulating input structure, intent or context.
What are not Prompt Attacks?
Inputs that are direct queries, lack of conflicting instructions, are not prompt attacks. Even if the content raises ethical concerns, this is not a prompt attack.
Key to understand Prompt Attacks:
This is a bit tricky part to understand a prompt attack. Focus on three factors to identify if it a prompt attack.
- Intent
- Instructions
- Context