How GitHub Copilot Generates and Delivers Code Suggestions - accentient/github-copilot-devs GitHub Wiki
GitHub Copilot works seamlessly across platforms, whether you're in a code editor, on the command line, in the mobile app, or chatting on GitHub.com. Behind the scenes, it follows a well-structured pipeline made up of user input, context-enrichment tools, prompt construction, and the large language model (LLM) at its core.
1. Gathering Context and Building a Prompt
Great answers require great context, and GitHub Copilot starts by gathering as much relevant information as possible from your current session.
- From the Code Editor: Copilot pulls context from the file you're working on, related files nearby, repository URLs, and file paths.
- From Copilot Chat: It considers highlighted code, previous questions, and responses in your conversation.
- Custom Control: You can fine-tune what Copilot uses by updating your content exclusion settings.
For example, if you're editing a file with business logic, having the unit test file open as well helps Copilot understand how the business logic should behave.
Using Your Code to Provide Suggestions
GitHub Copilot generates code suggestions by temporarily transferring elements of your code editor's context, such as file content and related metadata, to GitHub's servers for processing. This data, encrypted in transit and at rest, is used to create and return suggestions via a model hosted on Microsoft Azure. In real-time use within the code editor, the prompt and suggestions are discarded after processing, while prompts used outside the editor are stored for up to 28 days. The process ensures security while enabling accurate and context-aware coding assistance.
Data Encryption and Protection
GitHub Copilot transmits data to GitHub’s Azure tenant to generate suggestions, including both contextual data about the code and file being edited (“prompts”) and data about the user’s actions (“user engagement data”). The transmitted data is encrypted both in transit and at rest: Copilot-related data is encrypted in transit using transport layer security (TLS), and any data retained at rest is encrypted using Microsoft Azure’s data encryption (FIPS Publication 140-2 standards).
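Encryption in transit via TLS is the same mechanism any HTTPS client uses. The sketch below shows, using Python's standard `ssl` module, the kind of client-side TLS context such a connection negotiates; it is a generic illustration, not GitHub's actual client code.

```python
import ssl

# Build a client-side TLS context like the one any HTTPS call negotiates.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions

# With default settings, the server's certificate is required and its
# hostname is verified before any prompt data would be sent.
print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate validation on
print(context.check_hostname)                    # hostname verification on
```

Encryption at rest, by contrast, is handled server-side by Azure's storage encryption and is invisible to the client.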
2. Validating the Prompt for Safety and Relevance
Once Copilot builds a prompt, it’s sent securely to a proxy server hosted on GitHub-owned Azure infrastructure. Before reaching the LLM, the prompt goes through a series of checks:
- Toxic Language: Filtering out harmful or inappropriate content.
- Relevance: Ensuring the question aligns with coding and software development.
- Security Checks: Blocking attempts to manipulate Copilot into revealing restricted information.
If the prompt passes these checks, it’s sent to the LLM for processing. Prompts are routed based on regional server capacity, so traffic might not always stay within a specific region.
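The three checks above can be sketched as a simple gate in front of the LLM. The check functions below are naive placeholders standing in for GitHub's proprietary filters; the keyword lists and the `validate_prompt` name are assumptions for illustration only.

```python
BLOCKLIST = {"<offensive-term>"}  # placeholder; real toxicity filters are model-based

def is_toxic(prompt: str) -> bool:
    """Toxic-language check (placeholder keyword match)."""
    return any(word in prompt.lower() for word in BLOCKLIST)

def is_on_topic(prompt: str) -> bool:
    """Relevance check: does the prompt relate to coding? (crude heuristic)"""
    coding_terms = {"function", "class", "bug", "code", "api", "test"}
    return any(term in prompt.lower() for term in coding_terms)

def is_injection_attempt(prompt: str) -> bool:
    """Security check for attempts to manipulate the model (naive heuristic)."""
    return "ignore previous instructions" in prompt.lower()

def validate_prompt(prompt: str) -> bool:
    """Run all checks in order; only passing prompts reach the LLM."""
    return (not is_toxic(prompt)
            and is_on_topic(prompt)
            and not is_injection_attempt(prompt))
```

A prompt like "How do I test this function?" passes the gate, while one beginning "Ignore previous instructions..." is rejected before the LLM ever sees it.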
Duplication Detection Filter
This filter detects and suppresses suggestions that contain code segments over a certain length that match public code on GitHub. An enterprise administrator can enable the filter for all organizations within the enterprise, or defer control to individual organizations. With the filter enabled, Copilot checks code suggestions for matches or near-matches against public code on GitHub of 65 lexemes or more (on average, about 150 characters). If there is a match, the suggestion is not shown to the user. In addition to off-topic, harmful, and offensive output filters, GitHub Copilot also scans outputs for vulnerable code.
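A toy version of the duplication check, assuming a simple tokenizer and an in-memory "public corpus", might look like this. The tokenizer, the corpus list, and the sliding-window comparison are all stand-ins for GitHub's actual matching service; only the 65-lexeme threshold comes from the text above.

```python
import re

def lexemes(code: str) -> list:
    """Crude tokenizer: identifiers, numbers, and punctuation each count as a lexeme."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def matches_public_code(suggestion: str, public_corpus: list, threshold: int = 65) -> bool:
    """Suppress the suggestion if any run of `threshold` lexemes appears
    verbatim in a public snippet (a toy stand-in for the real service)."""
    tokens = lexemes(suggestion)
    if len(tokens) < threshold:
        return False  # short suggestions are never filtered
    windows = {" ".join(tokens[i:i + threshold])
               for i in range(len(tokens) - threshold + 1)}
    for snippet in public_corpus:
        public_tokens = " ".join(lexemes(snippet))
        if any(window in public_tokens for window in windows):
            return True
    return False
```

Counting lexemes rather than characters makes the check robust to formatting differences, which is why the threshold is stated in lexemes with characters only as an average.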
3. The LLM Generates a Response
At the heart of GitHub Copilot is an LLM hosted on GitHub-owned Azure servers. This AI model, developed by OpenAI, has been trained on a vast dataset of public text and source code, including public GitHub repositories.
How Copilot Handles Your Data:
- In the Code Editor: Prompts aren’t stored after generating suggestions. Individual users can also opt out of sharing prompts for fine-tuning purposes.
- Outside the Editor (CLI, Mobile, or GitHub.com Chat): Conversations and session history persist for continuity across sessions but aren’t used for training the LLM.
4. Post-Processing the Response
Before Copilot sends a suggestion back to you, it goes through another round of validation on the proxy server:
- Toxic Language and Relevance: The response is checked again for safety and alignment with your request.
- Code Quality: Suggestions are scanned for common bugs and security risks, like SQL injection or cross-site scripting.
- Sensitive Information: Copilot removes unique identifiers, such as email addresses, IPs, and hard-coded credentials.
- Public Code Matching (Optional): If enabled by an admin, responses are compared against public code on GitHub to prevent matches.
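The sensitive-information step can be illustrated with a small redaction pass. The regular expressions and placeholder tokens below are assumptions for illustration; GitHub's actual scrubbing rules are not public.

```python
import re

# Illustrative patterns for the identifier types named above.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
CREDENTIAL = re.compile(r"(?i)(api[_-]?key|token|password)\s*=\s*['\"][^'\"]+['\"]")

def redact(suggestion: str) -> str:
    """Replace unique identifiers in a suggestion with neutral placeholders."""
    suggestion = EMAIL.sub("<email>", suggestion)
    suggestion = IPV4.sub("<ip>", suggestion)
    suggestion = CREDENTIAL.sub(r"\1 = '<redacted>'", suggestion)
    return suggestion
```

For example, a suggestion containing a hard-coded address and key comes back with `<email>`, `<ip>`, and `<redacted>` placeholders instead of the original values.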
How Public Code Matching Works:
- Suggestions over ~150 characters are stripped of whitespace and compared against public GitHub code.
- If a match is found and filtering is enabled, the suggestion is discarded, and Copilot generates a new one.
This filter is most effective when Copilot has minimal context — like when starting a brand-new file or project.
5. Delivering Suggestions Back to You
Once a suggestion clears all checks, it’s sent back to you:
- In code editors, you might see multiple suggestions and can accept, reject, or refine them.
- In some IDEs (e.g., VS Code, Visual Studio, JetBrains), you can even accept suggestions word-by-word for finer control.
Suggestions Are Generated Using Probabilistic Determination
- GitHub Copilot’s AI models are trained on public code but do not store or copy code; suggestions are probabilistic predictions.
- For code suggestions, Copilot analyzes the code around your cursor and other open files to provide context.
- In the code editor, chat suggestions are based on your prompt, active document, code selection, and workspace details like frameworks and dependencies.
- On GitHub.com, chat suggestions combine your prompt with previous prompts, open GitHub pages, and additional context like codebase or search results.
Retaining Your Prompts
The GitHub Copilot extension in the code editor does not retain your prompts after providing suggestions unless you’re a Copilot Pro or Free subscriber who has opted in. Content from your editor is temporarily transferred to GitHub’s servers to assess context and provide suggestions, then deleted shortly after. GitHub Copilot Business and Enterprise data is never used to train the model.
And with that, the cycle starts over again — Copilot uses your interactions to build better prompts and refine future suggestions.