# 20 questions about a git repo
Question answering over codebases: a core set of 20 questions, followed by a categorized checklist of 100.
## Q01: What problem does this project solve, and who is the intended audience?
## Q02: How do I install and run it locally (including prerequisites and dependencies)?
## Q03: What's the minimal "Hello World" or quick-start example?
## Q04: Which platforms, languages, and versions are supported (OS, runtimes, frameworks)?
## Q05: How stable is the project (release cadence, last commit, roadmap, breaking changes policy)?
## Q06: How do I configure it and deploy to staging/production (env vars, secrets, containers)?
## Q07: What are known limitations, security considerations, and data/privacy implications?
## Q08: How do I contribute (issues workflow, coding standards, tests, CI, code of conduct)?
## Q09: Where do I get help (docs, examples, discussions, Slack/Discord, support policy)?
## Q10: What license applies and can I use this commercially or in closed-source contexts?
## Q11: What are the core algorithms used, and what are their time/space complexity and trade-offs?
## Q12: What data structures and invariants does the system rely on, and where are they documented or asserted?
## Q13: What is the error handling strategy (recoverable vs fatal errors), and how are errors propagated and retried?
## Q14: What concurrency model is used (threads, async I/O, message passing), and which synchronization primitives are employed?
## Q15: Where are known performance hotspots, and what profiling/benchmark evidence exists with tuning knobs?
## Q16: How is resource management handled (memory ownership/lifetimes, file/socket handles) especially on failure paths?
## Q17: What is the API stability and versioning policy (SemVer, deprecations), and how are migrations communicated?
## Q18: What is the testing strategy (unit/integration/e2e/property/fuzz), coverage levels, and how are flaky tests managed?
## Q19: What observability exists (logging levels, structured logs, metrics/traces), and how is debugging enabled in production?
## Q20: What is the security model (authN/authZ, input validation, sandboxing) and supply-chain hardening practices?
**Prompt:** Create a categorized list of 100 questions to ask about a git repository, such that the answers reveal comprehensive, deep, and valuable information about it.
Below is a categorized, **exactly 100-question** checklist you can use to interrogate a Git repository and surface deep, valuable information. The numbering runs across categories for easy tracking.
---
## A. Purpose & Scope
1. What problem does this repository solve, and for whom?
2. What are the explicit non‑goals or out‑of‑scope areas documented anywhere in the repo?
3. What are the primary success metrics (technical and business) this code is meant to move?
4. What assumptions does the design make about its runtime environment or inputs?
5. What constraints (SLAs, budgets, hardware limits, deadlines) shaped the current solution?
## B. Stakeholders & Domain Context
6. Who are the key stakeholders/owners, and how are they identified in CODEOWNERS or docs?
7. What domain concepts (glossary/ubiquitous language) are defined, and where are they documented?
8. Which external systems or teams depend on this repo’s outputs or APIs?
9. What upstream inputs or contracts does this repo rely on (events, APIs, data feeds)?
10. Which regulatory or industry contexts (e.g., HIPAA, PCI, GDPR) influence design decisions?
## C. Architecture & Design Decisions
11. What is the current high‑level architecture (context/container/components), and where is the diagram?
12. Which ADRs (Architecture Decision Records) exist, and what trade‑offs did they capture?
13. Where are the system boundaries and decomposition principles explained (modules/services/packages)?
14. Which core patterns (CQRS, hexagonal, DDD, event‑driven, layered) are intentionally used?
15. What are the most contentious or reversible design choices we should revisit first?
## D. Code Organization & Conventions
16. What is the directory layout, and how does it map to responsibilities or layers? (See the sketch after this list.)
17. What style guides or linters are enforced (.editorconfig, ESLint, Black, clang‑format), and how?
18. How are cross‑cutting concerns (errors, logging, config, auth) organized and reused?
19. Where are code generation or scaffolding templates defined, if any?
20. Is this a mono‑repo or multi‑repo architecture, and how are shared modules versioned?
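
Many of these questions can be answered faster with a small script than by browsing the tree. A minimal sketch in Python, assuming it runs from the root of a local clone and that the convention-file list is merely illustrative: it surveys the top-level layout for Q16 and the committed lint/style configs for Q17.

```python
from pathlib import Path

repo = Path(".")

# Q16: top-level layout with a rough file count per directory.
for entry in sorted(repo.iterdir()):
    if entry.is_dir() and entry.name != ".git":
        n_files = sum(1 for p in entry.rglob("*") if p.is_file())
        print(f"{entry.name}/  ({n_files} files)")

# Q17: which style/linter configs are actually committed?
# Illustrative list only; extend for the ecosystems in use.
CONVENTION_FILES = [
    ".editorconfig", ".eslintrc.json", ".pre-commit-config.yaml",
    "pyproject.toml", ".clang-format", ".flake8", "setup.cfg",
]
print("\nConvention files present:")
for name in CONVENTION_FILES:
    if (repo / name).exists():
        print(f"  {name}")
```

Reading the output against the documented layer boundaries quickly shows where the mapping has drifted.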
## E. Languages, Frameworks & Dependencies
21. Which languages and framework versions are supported and pinned?
22. Where are runtime and build dependencies declared, and are lockfiles committed? (See the sketch after this list.)
23. What is the policy for dependency updates (renovate/dependabot cadence, pinning strategy)?
24. Do we vendor dependencies or use Git submodules/subtrees anywhere, and why?
25. Is there an SBOM or dependency graph generated, and where is it stored?
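
Q22 is directly scriptable: pair each dependency manifest with the lockfile(s) its ecosystem expects and flag the gaps. A rough sketch under the same assumptions (Python, run from a local clone); the manifest-to-lockfile table covers only a few common ecosystems.

```python
from pathlib import Path

# Manifest -> expected lockfile(s), for a few common ecosystems (Q22).
LOCKFILE_FOR = {
    "package.json": ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"],
    "pyproject.toml": ["poetry.lock", "uv.lock"],
    "Cargo.toml": ["Cargo.lock"],
    "go.mod": ["go.sum"],
    "Gemfile": ["Gemfile.lock"],
}

repo = Path(".")
for manifest, lockfiles in LOCKFILE_FOR.items():
    if (repo / manifest).exists():
        found = [lf for lf in lockfiles if (repo / lf).exists()]
        status = f"locked by {', '.join(found)}" if found else "NO lockfile committed"
        print(f"{manifest}: {status}")

# Q24: vendored dependencies often show up as .gitmodules or a vendor/ dir.
for hint in (".gitmodules", "vendor"):
    if (repo / hint).exists():
        print(f"vendoring hint: {hint} present")
```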
## F. Build System & Tooling
26. What build tools (Make, Gradle, Maven, npm, Poetry, Bazel, CMake) does the repo standardize on?
27. How is a clean, reproducible build achieved (hermetic builds, containerized toolchains, caching)?
28. What are the canonical developer tasks (e.g., `make test`, `make lint`) and where are they documented? (See the sketch after this list.)
29. Are pre‑commit hooks configured (formatting, linting, secret scan) and enforced in CI?
30. How are binary artifacts built, versioned, and published (registries, checksums, provenance)?
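
For Q28, the canonical tasks are often recoverable from the build files themselves. A sketch that lists Makefile targets with a simple regex; it assumes a top-level `Makefile` and will miss targets coming from includes, pattern rules, or generated code.

```python
import re
from pathlib import Path

makefile = Path("Makefile")
if makefile.exists():
    # A target line looks like "name: prerequisites"; the first-character
    # class skips special targets (.PHONY etc.) and recipe lines (tab-indented),
    # and the lookahead skips ":=" variable assignments.
    target_re = re.compile(r"^([A-Za-z][\w.-]*)\s*:(?!=)", re.MULTILINE)
    targets = sorted(set(target_re.findall(makefile.read_text())))
    print("Candidate developer tasks (Q28):")
    for t in targets:
        print(f"  make {t}")
else:
    print("No top-level Makefile; check package.json scripts, justfile, etc.")
```

The same idea applies to `package.json` `scripts` blocks or a `justfile`.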
## G. Git History & Commit Practices
31. What commit message conventions are used (Conventional Commits, emojis, prefixes)? (See the sketch after this list.)
32. Are commits signed (GPG/Sigstore), and are unsigned commits blocked?
33. What is the policy on rebasing, squashing, and force‑pushes to protected branches?
34. How are large files handled (Git LFS) and is there any history bloat to address?
35. Are tags annotated and traceable to releases/changelogs?
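
Q31 and Q32 can be sampled straight from history rather than taken on faith. A sketch using `git log` placeholders (`%G?` is the signature status, `%s` the subject line); it assumes `git` on PATH, a local clone, and Conventional Commits as the convention being tested.

```python
import re
import subprocess

# Sample the last 200 commits: "%G?" reports signature status
# (e.g. G = good signature, N = none), "%s" is the subject line.
out = subprocess.run(
    ["git", "log", "-200", "--pretty=%G?|%s"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

conventional = re.compile(
    r"^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\(.+\))?!?: "
)
signed = sum(1 for line in out if not line.startswith("N|"))
matching = sum(1 for line in out if conventional.match(line.split("|", 1)[1]))

print(f"{len(out)} commits sampled")
print(f"  signed:              {signed}")
print(f"  conventional-commit: {matching}")
```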
## H. Branching Model & Release Workflow
36. What branching strategy is used (trunk‑based, GitFlow, release branches), and why?
37. What protection rules exist on default and release branches (required reviews, status checks)?
38. How are release candidates prepared and promoted (staging branches, release trains)?
39. What versioning scheme is used (SemVer/calendar) and how are tags named? (See the sketch after this list.)
40. How are hotfixes managed against past releases without diverging from main?
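
Q39, and Q35 from the previous category, can be checked against the actual tags. A sketch, again assuming `git` on PATH; `git for-each-ref` reports `tag` for annotated tags and `commit` for lightweight ones.

```python
import re
import subprocess

# List every tag with the type of the object the ref points at:
# "tag" means annotated, "commit" means lightweight (Q35).
out = subprocess.run(
    ["git", "for-each-ref",
     "--format=%(objecttype) %(refname:short)", "refs/tags"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# SemVer with optional leading "v", pre-release, and build metadata (Q39).
semver = re.compile(r"^v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")
for line in out:
    objtype, name = line.split(" ", 1)
    kind = "annotated" if objtype == "tag" else "lightweight"
    style = "semver" if semver.match(name) else "non-semver"
    print(f"{name}: {kind}, {style}")
```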
## I. Code Review & Collaboration Norms
41. What does a “ready for review” PR look like (template, size limits, evidence)?
42. Who must review which areas (CODEOWNERS), and what are approval thresholds? (See the sketch after this list.)
43. What automated checks gate merges (tests, coverage, SAST, license scan)?
44. How are PR discussions resolved and decisions captured for posterity?
45. What is the policy for handling stale PRs and long‑running feature branches?
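
For Q42, CODEOWNERS is machine-readable. A sketch that prints the pattern-to-owner routing; it assumes the GitHub convention of looking in `.github/`, the repo root, or `docs/`.

```python
from pathlib import Path

# GitHub looks for CODEOWNERS in these locations.
candidates = [Path(".github/CODEOWNERS"), Path("CODEOWNERS"), Path("docs/CODEOWNERS")]
codeowners = next((p for p in candidates if p.exists()), None)

if codeowners is None:
    print("No CODEOWNERS file: review routing is informal (Q42).")
else:
    for line in codeowners.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        pattern, *owners = line.split()
        print(f"{pattern} -> {', '.join(owners)}")
```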
## J. Testing Strategy & Quality Gates
46. What is the test pyramid composition (unit, integration, E2E, contract, property tests)?
47. What coverage thresholds exist, and are they enforced in CI?
48. How are flaky tests detected, quarantined, and deflaked?
49. How are test data and fixtures managed (seed data, synthetic vs. real anonymized)?
50. Are non‑functional tests (security, performance, accessibility, i18n) part of the pipeline?
## K. CI/CD Pipeline & Automation
51. Which CI system(s) are used (GitHub Actions, GitLab CI, CircleCI, Jenkins), and where are workflows defined? (See the sketch after this list.)
52. What are the pipeline stages (lint → build → test → package → deploy), and what caches/artifacts flow between them?
53. How are secrets provided to CI (OIDC, vault, masked vars), and how is secret sprawl prevented?
54. Where are build artifacts and container images stored (registries, retention policy)?
55. What deployment automation exists (push‑button, chat‑ops, GitOps), and what manual gates remain?
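
Q51 and Q52 mostly reduce to reading the workflow files. A sketch for the GitHub Actions case only; it assumes PyYAML is installed (`pip install pyyaml`), and other CI systems would need their own file paths (`.gitlab-ci.yml`, `Jenkinsfile`, and so on).

```python
from pathlib import Path
import yaml  # pip install pyyaml

for wf in sorted(Path(".github/workflows").glob("*.y*ml")):
    doc = yaml.safe_load(wf.read_text()) or {}
    # YAML 1.1 parses the bare key "on" as boolean True, so check both.
    triggers = doc.get("on", doc.get(True, {}))
    jobs = list(doc.get("jobs", {}))
    print(f"{wf.name}")
    print(f"  triggers: {triggers if isinstance(triggers, str) else list(triggers)}")
    print(f"  jobs:     {jobs}")
```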
## L. Configuration, Secrets & Environments
56. Where do configuration files live, and how is precedence managed (env vars, files, flags)?
57. How are secrets managed (Vault, KMS, SOPS, sealed secrets) and rotated?
58. How is environment parity maintained across dev/stage/prod (config as code, feature flags)?
59. What is the strategy for local development environments (devcontainers, Docker Compose, kind/minikube)?
60. How are runtime configuration changes audited and rolled back?
## M. Security, Privacy & Compliance
61. What is the threat model and trust boundaries, and where is it documented?
62. How are authn/authz implemented (protocols, roles, scopes), and tested?
63. What security scanning runs (SAST, SCA, IaC scan, secrets scan), and how are findings triaged? (See the sketch after this list.)
64. How is sensitive data handled (PII classification, encryption at rest/in transit, key management)?
65. What compliance evidence is generated (audit logs, attestations, SBOM, provenance) and retained?
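
Serious scanning belongs to dedicated tools (gitleaks, trufflehog, and the scanner classes Q63 names), but a first pass can be scripted. A deliberately naive sketch using `git grep` over tracked files; the three patterns are illustrative and will produce both false positives and false negatives.

```python
import subprocess

# A few high-signal patterns: AWS access key IDs, private key headers,
# and generic hard-coded password assignments. Illustrative only.
patterns = [
    r"AKIA[0-9A-Z]{16}",
    r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----",
    r"password\s*=\s*['\"][^'\"]+['\"]",
]

for pat in patterns:
    # git grep searches tracked files only; -n adds line numbers,
    # -I skips binaries, -E enables extended regex. Exit code 1 just
    # means "no match", so check=True is deliberately omitted.
    result = subprocess.run(
        ["git", "grep", "-nIE", pat],
        capture_output=True, text=True,
    )
    if result.stdout:
        print(f"pattern {pat!r}:")
        print(result.stdout.rstrip())
```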
## N. Performance, Scalability & Capacity
66. What SLIs/SLOs exist (latency, error rate, saturation), and where are they tracked?
67. What load/perf testing is performed (tools, scenarios, test data realism)?
68. What caching strategies are used (in‑process, distributed, CDN) and what are invalidation rules?
69. How does the system handle concurrency, backpressure, and timeouts?
70. Where are known hotspots or bottlenecks, and what profiling tools/benchmarks document them?
## O. Data Model, Storage & Migrations
71. Which data stores are used (versions, HA/replication), and why were they chosen?
72. How are schema changes managed (migrations framework, backward compatibility, zero‑downtime)?
73. What is the data retention, archival, and deletion policy (legal holds, GDPR erasure)?
74. How are backups, restores, and disaster recovery tested and documented?
75. How are reference/seed data and fixtures versioned alongside code?
## P. Observability: Logging, Metrics & Tracing
76. What structured logging format and levels are used, and where are logs shipped?
77. Which key metrics are emitted, and which dashboards/alerts consume them?
78. Is distributed tracing enabled (propagation format, sampling, spans), and what’s the trace coverage?
79. What alerting rules and paging thresholds exist, and how is alert noise managed?
80. Where are runbooks and incident postmortems stored, and how are lessons learned captured?
## Q. Runtime, Deployment & Infrastructure
81. How is the app packaged (containers, serverless, binaries), and what base images/runtimes are used?
82. Where is IaC kept (Terraform, CloudFormation, Pulumi, Helm, Kustomize), and how is drift detected?
83. What is the environment topology (regions, VPCs, clusters), and how is multi‑region handled?
84. What deployment strategies are used (canary, blue/green, rolling), and how are rollbacks executed?
85. Which managed services or cloud dependencies exist, and what are their SLAs/quotas?
## R. APIs, Integrations & Contracts
86. What public or internal APIs does the repo expose, and how are they documented (OpenAPI/GraphQL)?
87. How is API versioning and deprecation handled to preserve backward compatibility?
88. Are contract tests in place for both provided and consumed APIs, and where do they run?
89. How are failures and retries handled for third‑party integrations (idempotency, circuit breakers)?
90. Where are SDKs/clients generated and published, and how are breaking changes communicated?
## S. Documentation, Onboarding & Knowledge
91. How complete is the README (purpose, quickstart, troubleshooting), and what’s missing?
92. What onboarding guide exists for new contributors (setup script, prerequisites, first issue)?
93. Where are architectural/sequence diagrams stored and kept in sync with code changes?
94. What contribution guidelines (CONTRIBUTING.md), PR/issue templates, and style guides exist?
95. What is the bus factor, and how is knowledge spread (docs sprints, ADRs, internal talks)?
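
Q95's bus factor has a crude lower-bound estimate in the commit history: how few authors account for the bulk of recent changes? A sketch, assuming `git` on PATH; author email is only a rough identity proxy (it double-counts anyone who changed addresses).

```python
from collections import Counter
import subprocess

# Author emails over the last year of commits.
emails = subprocess.run(
    ["git", "log", "--since=1 year ago", "--format=%ae"],
    capture_output=True, text=True, check=True,
).stdout.split()

counts = Counter(emails)
total = sum(counts.values())
print(f"{total} commits by {len(counts)} authors in the last year")

# Smallest set of authors covering half the commits: a crude bus factor.
covered, bus_factor = 0, 0
for author, n in counts.most_common():
    covered += n
    bus_factor += 1
    if covered >= total / 2:
        break
print(f"~{bus_factor} author(s) produced half of all commits")
```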
## T. Roadmap, Maintenance, Risk & Governance
96. What is the current roadmap and how is prioritization tracked (issues, milestones, projects)?
97. How are bugs, features, and tech debt labeled and triaged (SLA, severity, ownership)?
98. What is the deprecation/removal policy for features, flags, and endpoints?
99. What is the repository’s license, and how is third‑party license compliance verified? (See the sketch after this list.)
100. Who owns long‑term stewardship (maintainers, escalation paths), and what is the succession plan?
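
The first half of Q99 is also scriptable: locate the license file and sniff its identity from distinctive phrases. A heuristic sketch only; real compliance checks should use SPDX-aware tooling such as `licensee` or `scancode-toolkit`.

```python
from pathlib import Path

# Distinctive phrases for a few common licenses (heuristic, not SPDX matching).
SIGNATURES = {
    "MIT": "Permission is hereby granted, free of charge",
    "Apache-2.0": "Apache License",
    "GPL (version unspecified)": "GNU GENERAL PUBLIC LICENSE",
    "BSD": "Redistribution and use in source and binary forms",
}

# Catches LICENSE, LICENCE, LICENSE.md, LICENSE-MIT, etc.
license_file = next(
    (p for p in Path(".").iterdir()
     if p.is_file() and p.name.upper().startswith("LICEN")),
    None,
)

if license_file is None:
    print("No LICENSE file found: usage terms are undefined (Q99).")
else:
    text = license_file.read_text(errors="replace")
    match = next(
        (name for name, phrase in SIGNATURES.items() if phrase in text),
        "unrecognized",
    )
    print(f"{license_file.name}: looks like {match}")
```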
---
**How to use this list:** Pick the most relevant categories for your context, then ask these questions during repo discovery, code reviews, architecture sessions, or when drafting onboarding and runbooks.