Understanding .pre‐commit‐config.yaml file - inzamamshajahan/github-actions-learn4 GitHub Wiki
This file is crucial for maintaining code quality and consistency before code changes are even committed to the repository.
Overall Purpose:
This file, `.pre-commit-config.yaml`, configures the `pre-commit` framework. `pre-commit` is a tool that manages and executes Git hooks. Hooks are scripts that run automatically at certain points in the Git lifecycle, such as before a commit (`pre-commit`) or before a push (`pre-push`).
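To make the commit/push distinction concrete, here is a minimal sketch of how a hook can be bound to a particular Git stage. Neither `default_install_hook_types` nor `stages` appears in this project's configuration, and running Mypy only at push time is purely an assumption for the example. The `pre-commit` and `pre-push` stage names assume pre-commit 3.2 or newer; older releases spell them `commit` and `push`.

```yaml
# Hypothetical variant, not this project's configuration.
# Install both hook types when a developer runs `pre-commit install`.
default_install_hook_types: [pre-commit, pre-push]

repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: 'v1.7.1'
    hooks:
      - id: mypy
        # Run this slower check only before a push, not on every commit.
        stages: [pre-push]
```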
The primary goal of using `pre-commit` with this configuration is to automate code quality checks locally on the developer's machine before code is committed. This helps to:
- Enforce Code Style and Formatting: Ensure all committed code adheres to consistent style guidelines (e.g., whitespace, line endings, import order, quote style).
- Catch Simple Errors Early: Identify syntax errors, unused variables, or potential bugs before they reach the main repository or CI/CD pipeline.
- Maintain Readability: Keep the codebase clean and easier for everyone to read and understand.
- Reduce CI Load: By catching issues locally, it can reduce the number of failed CI builds, saving time and resources.
- Improve Code Quality Incrementally: As developers commit, the code is automatically checked and often fixed, leading to a progressively cleaner codebase.
The configuration is written in YAML, a human-readable data serialization language.
Structure of the File:
The file consists of a top-level `repos` key, which is a list of repositories. Each item in this list specifies a repository containing pre-commit hooks and defines which hooks from that repository to use.
1. Repository: https://github.com/pre-commit/pre-commit-hooks
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v4.5.0
  hooks:
    - id: trailing-whitespace
    - id: end-of-file-fixer
    - id: check-yaml
    - id: check-added-large-files
- `repo: https://github.com/pre-commit/pre-commit-hooks`:
  - What: Specifies the URL of the repository that provides the hooks. This is the official collection of general-purpose hooks maintained by the `pre-commit` project.
  - Why: These are foundational, widely useful hooks for many types of projects.
- `rev: v4.5.0`:
  - What: Pins the version of the `pre-commit-hooks` repository to tag `v4.5.0`.
  - Why: Using a specific tag (a released version) ensures that the behavior of these hooks is consistent and doesn't change unexpectedly if the upstream repository is updated. This is crucial for reproducible builds and local checks.
  - Alternatives: A specific commit SHA (a stricter pin, but less readable) or a branch name like `main` (not recommended for stability, as it could pull in breaking changes). Tagged releases are the best practice.
- `hooks:`: A list of hook definitions from this repository.
  - `- id: trailing-whitespace`:
    - Purpose: Automatically removes any trailing whitespace from lines in modified files.
    - Why: Trailing whitespace is invisible, creates noise in diffs, and is generally considered untidy.
    - Args/Alternatives: This hook has options to, for example, preserve Markdown hard line breaks (two trailing spaces) in `.md` files, where trailing whitespace is significant (`args: [--markdown-linebreak-ext=md]`). For this project, the default (fixing all files) is used.
  - `- id: end-of-file-fixer`:
    - Purpose: Ensures that every non-empty file ends with a single newline character. It will add a newline if missing or remove extra newlines at the end.
    - Why: This is a POSIX convention and helps prevent issues with some tools, version control diffs (especially if lines are added to the end of files), and file concatenation.
  - `- id: check-yaml`:
    - Purpose: Checks the syntax of YAML files (e.g., this `.pre-commit-config.yaml` itself, GitHub Actions workflow files like `ci-cd.yml`).
    - Why: Catches YAML syntax errors early, preventing issues with tools or platforms that consume these files.
  - `- id: check-added-large-files`:
    - Purpose: Prevents accidentally committing large files to the Git repository. By default, it flags added files larger than 500 kB.
    - Why: Large binary files can significantly bloat the Git repository, making cloning, fetching, and other operations slow. Such files are often better managed using Git LFS (Large File Storage) or stored in dedicated artifact repositories.
    - Args/Alternatives: Can be configured with `args: ['--maxkb=<size>']` to set a different size threshold.
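The optional arguments mentioned for `trailing-whitespace` and `check-added-large-files` could be applied as in the sketch below. This is an illustrative variant only: the project uses the defaults, and the 1024 kB threshold is an assumed value chosen for the example.

```yaml
# Illustrative variant of the first repository entry; the project itself uses the defaults.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
        # Keep Markdown hard line breaks (two trailing spaces) in .md files.
        args: [--markdown-linebreak-ext=md]
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
        # Raise the threshold from the default 500 kB to 1024 kB (assumed value).
        args: ['--maxkb=1024']
```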
2. Repository: https://github.com/astral-sh/ruff-pre-commit
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: 'v0.11.10'
  hooks:
    - id: ruff
      args: [--fix, --exit-non-zero-on-fix]
    - id: ruff-format
- `repo: https://github.com/astral-sh/ruff-pre-commit`:
  - What: Specifies the repository providing pre-commit hooks for Ruff. Ruff is an extremely fast Python linter and code formatter, written in Rust.
  - Why Ruff? It's designed to be a high-performance replacement for many individual Python linting and formatting tools like Flake8, isort, pydocstyle, pyupgrade, and more, often with significantly faster execution times.
- `rev: 'v0.11.10'`:
  - What: This pins the `ruff-pre-commit` hook configuration. For `ruff-pre-commit`, the `rev` tag corresponds to the version of the `ruff` tool that the hook will use, so this setup uses `ruff` version `0.11.10`.
  - Why pin this? Ensures that the version of Ruff used by pre-commit locally is consistent.
  - Note on Versioning & Alignment: It's important that the Ruff version used locally by pre-commit is compatible with, or ideally the same as, the Ruff version used in your CI pipeline (which is installed via `pip install -e .[dev]` based on `pyproject.toml`). Your CI logs show `ruff-0.11.10` being installed from `pyproject.toml`, so this `rev` ensures alignment. Keeping tool versions consistent between local hooks and CI is key to avoiding "it works on my machine but fails in CI" (or vice-versa).
  - Updating: Ruff releases frequently, so `v0.11.10` will fall behind over time. Regularly updating `ruff` (in `pyproject.toml`) and this `rev` together to newer stable versions is good practice to get the latest features, bug fixes, and performance improvements.
- `hooks:`:
  - `- id: ruff`:
    - Purpose: Runs the Ruff linter.
    - `args: [--fix, --exit-non-zero-on-fix]`:
      - `--fix`: Instructs Ruff to automatically attempt to fix any fixable linting violations it finds.
      - `--exit-non-zero-on-fix`: If Ruff applies any fixes, the hook will exit with a non-zero status. This causes `pre-commit` to abort the commit process, signaling that files were changed. The developer then needs to `git add` the modified files and attempt the commit again. This ensures that automated changes are reviewed and explicitly included in the commit.
      - Why this combination? It provides the convenience of auto-fixing while ensuring the developer is aware of and approves the changes made by the tool.
  - `- id: ruff-format`:
    - Purpose: Runs the Ruff code formatter. This ensures the code style (line length, quotes, indentation, etc.) is consistent across the project, based on the configuration in `pyproject.toml` (for example, `line-length` under `[tool.ruff]` and formatter options under `[tool.ruff.format]`).
    - Behavior: Like the linting hook with `--fix`, if `ruff-format` modifies files, the commit will typically be aborted by pre-commit, requiring the developer to stage the changes and re-commit. This enforces that all committed code is correctly formatted.
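For comparison, a leaner variant of the Ruff entry is sketched below. It is not the project's configuration. It relies on the fact that pre-commit itself marks a hook as failed when the hook modifies files, so the commit still stops after an auto-fix even without `--exit-non-zero-on-fix`. The ordering, with the linter placed before the formatter, follows the recommendation in Ruff's documentation for setups that use `--fix`.

```yaml
# Illustrative variant, not this project's configuration.
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: 'v0.11.10'
    hooks:
      # Lint first: fixes applied here may leave code that still needs formatting.
      - id: ruff
        args: [--fix]
      # Format second, so the final result is always formatter-clean.
      - id: ruff-format
```

Keeping `--exit-non-zero-on-fix`, as the project does, makes the failure explicit at the Ruff level as well, which can be useful when the same command is run outside pre-commit.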
3. Repository: https://github.com/pre-commit/mirrors-mypy
- repo: https://github.com/pre-commit/mirrors-mypy
  rev: 'v1.7.1'
  hooks:
    - id: mypy
      args: [--config-file=pyproject.toml]
      additional_dependencies: ['pandas-stubs', 'types-PyYAML', 'pytest']
- `repo: https://github.com/pre-commit/mirrors-mypy`:
  - What: Specifies the repository that provides a `pre-commit` compatible way to run Mypy. Mypy is a static type checker for Python. This `mirrors-mypy` repository makes it easy to integrate Mypy releases into the `pre-commit` framework.
- `rev: 'v1.7.1'`:
  - What: Pins the hook to use a version corresponding to Mypy `v1.7.1`.
  - Why: Ensures a consistent version of Mypy is used for local type checking.
- `hooks:`:
  - `- id: mypy`:
    - Purpose: Executes Mypy to perform static type checking on your Python code based on type hints.
    - Why Mypy? It helps catch a class of errors (type mismatches, incorrect API usage based on types) before runtime, which can make code more robust, easier to understand, and simpler to refactor.
    - `args: [--config-file=pyproject.toml]`:
      - This tells Mypy to load its configuration from the `[tool.mypy]` section of your `pyproject.toml` file. This is the standard way to centralize Mypy's settings (like Python version, plugins, warning flags, paths).
    - `additional_dependencies: ['pandas-stubs', 'types-PyYAML', 'pytest']`:
      - Purpose: Mypy needs type information (often in the form of "stub" files, `.pyi`) for libraries that don't include inline type hints or don't distribute their own stubs.
      - `pandas-stubs`: Provides type stubs for the `pandas` library.
      - `types-PyYAML`: Provides type stubs for the `PyYAML` library. (Note: `PyYAML` is not a direct runtime dependency of `src/main.py`. It may be a transitive dependency of one of your development tools, or the entry may be there for robustness in case YAML processing happens elsewhere in the project.)
      - `pytest`: Installs `pytest` itself into the hook environment. `pytest` ships its own inline type information rather than separate stubs, so this lets Mypy resolve `import pytest` when your test code (`tests/test_main.py`) uses type hints for Pytest-specific features (e.g., fixture types).
      - Why `additional_dependencies` here? `pre-commit` runs each hook in an isolated environment. These dependencies are installed into that hook's specific environment so Mypy can find and use them during its analysis. They don't necessarily need to be in your main project's `dependencies` or `dev-dependencies` in `pyproject.toml` (though `pandas-stubs` and `types-PyYAML` are good candidates for `dev-dependencies` if you want your IDE to also benefit from them).
      - Impact: Without these, Mypy might report missing library stubs or be unable to check code using these libraries effectively.
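Because the hook environment is isolated, any library whose types Mypy should see has to be listed explicitly. The sketch below shows how one more stub package might be appended. `types-requests` is a hypothetical addition used only for illustration; nothing in this project indicates that `requests` is imported.

```yaml
# Sketch only: 'types-requests' is a hypothetical addition, not used by this project.
repos:
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: 'v1.7.1'
    hooks:
      - id: mypy
        args: [--config-file=pyproject.toml]
        additional_dependencies:
          - pandas-stubs
          - types-PyYAML
          - pytest
          - types-requests  # hypothetical: needed only if the code imported requests
```

The block-style list is equivalent to the inline `['...', '...']` form used in the project and tends to produce cleaner diffs as the list grows.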
General Rationale & Alternatives:
- Why Pre-commit? It standardizes the use of hooks across different developers and languages, manages the installation of hook environments, and makes it easy to configure and share a common set of checks.
- Alternatives to Specific Hooks:
  - Formatting/Linting: Before Ruff gained popularity, common choices were `black` for formatting, `flake8` for linting, and `isort` for import sorting, often as separate hooks. Ruff aims to consolidate many of these.
  - Type Checking: Mypy is the most established, but other type checkers like `Pyright` (used by VS Code's Pylance) or `Pytype` exist.
- Configuration Location: While tool-specific configurations are rightly in `pyproject.toml` (`[tool.ruff]`, `[tool.mypy]`), the hook definitions must be in `.pre-commit-config.yaml`.
- Updating Hooks: It's good practice to periodically update the `rev` for each repository to get the latest bug fixes and features from the hooks. This can often be done with `pre-commit autoupdate`.
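For contrast with the Ruff setup, the separate-tool approach mentioned above might look roughly like the following. This is a sketch of the alternative, not something this project uses. The `rev` values are release tags chosen for illustration and are likely outdated, so treat them as placeholders to be refreshed with `pre-commit autoupdate`.

```yaml
# Sketch of the pre-Ruff alternative: three tools, three hook environments.
# The rev values are illustrative and may be outdated.
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black          # formatting
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort          # import sorting
        # Keep isort's output compatible with black's formatting.
        args: ['--profile', 'black']
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8         # linting
```

Each repository gets its own isolated environment and its own `rev` to maintain, which is part of what consolidating on Ruff avoids.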
In summary, this `.pre-commit-config.yaml` sets up a robust local development workflow by:
- Cleaning up basic file issues (whitespace, end-of-file).
- Preventing accidental commits of large files or broken YAML.
- Leveraging Ruff for extremely fast Python linting, auto-fixing, and formatting.
- Using Mypy for static type checking, with necessary stubs for key libraries.
This configuration significantly contributes to code quality and developer productivity by automating these checks.