GitHub Integration - softwarewrighter/overall GitHub Wiki
This page documents how overall integrates with GitHub through the gh CLI tool.
- Architecture
- Authentication
- GitHub Module Structure
- Key Operations
- Data Transformations
- Error Handling
- Rate Limiting
## Architecture

The GitHub integration is built entirely around the official GitHub CLI (`gh`) rather than making direct REST API calls.
graph TB
subgraph "overall Application"
App[Application Code]
GHMod[github/ Module]
end
subgraph "GitHub CLI"
GHCLI[gh binary]
Auth[Stored Credentials]
end
subgraph "GitHub"
API[GitHub REST API]
GraphQL[GitHub GraphQL API]
end
App -->|Call functions| GHMod
GHMod -->|Execute commands| GHCLI
GHCLI -->|Read| Auth
GHCLI -->|HTTPS| API
GHCLI -->|HTTPS| GraphQL
Advantages:

- **Authentication Management**: `gh` handles the OAuth flow and stores credentials securely
- **API Versioning**: `gh` maintains compatibility with GitHub API changes
- **Rate Limiting**: `gh` respects rate limits and provides retry logic
- **JSON Output**: `gh` provides structured JSON output for easy parsing
- **Feature Parity**: Access to all GitHub features as they're added to `gh`

Tradeoffs:

- Requires `gh` to be installed and in `PATH`
- External process overhead (minimal impact)
- Less control over exact HTTP requests
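Because the whole integration assumes `gh` is on `PATH`, the application can probe for it at startup and fail fast with a helpful message. A minimal sketch (the `is_binary_available` helper is an illustration, not part of the overall codebase):

```rust
use std::process::{Command, Stdio};

/// Returns true if `<bin> --version` can be spawned and exits successfully.
/// A false result means the binary is missing from PATH or is broken.
fn is_binary_available(bin: &str) -> bool {
    Command::new(bin)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

fn main() {
    if !is_binary_available("gh") {
        eprintln!("gh CLI not found; install it and run `gh auth login`");
    }
}
```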
## Authentication

Authentication is handled entirely by the `gh` CLI.
sequenceDiagram
participant User
participant Terminal
participant GH as gh CLI
participant Browser
participant GitHub
User->>Terminal: gh auth login
GH->>User: Choose auth method?
User->>GH: Browser
GH->>Browser: Open GitHub OAuth page
Browser->>GitHub: Request authorization
GitHub->>User: Show permission approval
User->>GitHub: Approve
GitHub->>GH: OAuth token
GH->>GH: Store token securely (OS keychain)
GH->>Terminal: Authentication complete
gh stores credentials in OS-specific secure storage:
| OS | Storage Location |
|---|---|
| macOS | Keychain |
| Linux | Secret Service API / gnome-keyring |
| Windows | Windows Credential Manager |
The application checks authentication status before operations:
gh auth status
# Output:
# ✓ Logged in to github.com as username
# ✓ Token: *******************

## GitHub Module Structure

overall-cli/src/github/
├── mod.rs # Public API and exports
├── commands.rs # gh CLI execution functions
└── models.rs # GitHub-specific data structures
classDiagram
class GitHubModule {
+list_repos(owner: &str) Result~Vec~Repository~~
+fetch_branches(owner: &str, repo: &str) Result~Vec~Branch~~
+fetch_pull_requests(owner: &str, repo: &str) Result~Vec~PullRequest~~
+fetch_commits(owner: &str, repo: &str, branch: &str) Result~Vec~Commit~~
+create_pull_request(request: &PrRequest) Result~PullRequest~
}
class Commands {
-execute_gh_command(args: &[&str]) Result~String~
-parse_json~T~(output: &str) Result~T~
}
class Models {
+GhRepo
+GhBranch
+GhPullRequest
+GhCommit
}
GitHubModule --> Commands
GitHubModule --> Models
Commands ..> Models : parses into
## Key Operations

### list_repos

Fetches the top 50 most recently pushed repositories for a user or organization.
Command:
gh repo list <owner> \
--limit 50 \
--json name,owner,language,description,pushedAt,createdAt,updatedAt,isFork

Implementation:
pub fn list_repos(owner: &str) -> Result<Vec<Repository>> {
let output = Command::new("gh")
.args(&[
"repo", "list", owner,
"--limit", "50",
"--json", "name,owner,language,description,pushedAt,createdAt,updatedAt,isFork",
])
.output()?;
let gh_repos: Vec<GhRepo> = serde_json::from_slice(&output.stdout)?;
// Transform to internal Repository model
let repos = gh_repos.into_iter()
.map(|gh_repo| Repository {
id: format!("{}/{}", gh_repo.owner.login, gh_repo.name),
owner: gh_repo.owner.login,
name: gh_repo.name,
// ... other fields
})
.collect();
Ok(repos)
}

Response Structure:
[
{
"name": "overall",
"owner": {
"login": "softwarewrighter"
},
"language": "Rust",
"description": "GitHub Repository Manager",
"pushedAt": "2025-11-17T12:00:00Z",
"createdAt": "2025-01-01T00:00:00Z",
"updatedAt": "2025-11-17T11:55:00Z",
"isFork": false
}
]

### fetch_branches

Gets all branches for a repository with ahead/behind counts.
Command:
gh api repos/<owner>/<repo>/branches \
--jq '.[] | {name, commit: .commit.sha}'

Then for each branch:
gh api repos/<owner>/<repo>/compare/<base>...<head> \
--jq '{ahead: .ahead_by, behind: .behind_by}'

Implementation Flow:
sequenceDiagram
participant App
participant GH as gh CLI
participant API as GitHub API
App->>GH: gh api repos/owner/repo/branches
GH->>API: GET /repos/owner/repo/branches
API-->>GH: Branch list with SHAs
GH-->>App: JSON array
loop For each branch
App->>GH: gh api repos/owner/repo/compare/main...branch
GH->>API: GET /repos/owner/repo/compare/main...branch
API-->>GH: Comparison data
GH-->>App: {ahead_by, behind_by, status}
end
App->>App: Build Branch models
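Each loop iteration hits a compare endpoint built from the base and head branch names. A trivial sketch of that path construction (helper name is hypothetical):

```rust
/// Builds the REST path passed to `gh api` to compare a head branch
/// against the base branch (typically the repository default).
fn compare_path(owner: &str, repo: &str, base: &str, head: &str) -> String {
    format!("repos/{owner}/{repo}/compare/{base}...{head}")
}

fn main() {
    let path = compare_path("softwarewrighter", "overall", "main", "feature-branch");
    println!("{path}"); // prints repos/softwarewrighter/overall/compare/main...feature-branch
}
```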
Comparison Response:
{
"status": "ahead",
"ahead_by": 5,
"behind_by": 0,
"total_commits": 5,
"commits": [ ... ]
}

### fetch_pull_requests

Lists all pull requests for a repository.
Command:
gh pr list \
--repo <owner>/<repo> \
--json number,title,state,headRefName,baseRefName,createdAt,updatedAt \
--limit 100

Response:
[
{
"number": 42,
"title": "Add new feature",
"state": "OPEN",
"headRefName": "feature-branch",
"baseRefName": "main",
"createdAt": "2025-11-15T10:00:00Z",
"updatedAt": "2025-11-17T09:30:00Z"
}
]

### fetch_commits

Gets commit history for a specific branch.
Command:
gh api repos/<owner>/<repo>/commits \
--field sha=<branch> \
--field per_page=50 \
--jq '.[] | {sha, message: .commit.message, author: .commit.author, committer: .commit.committer}'

Response:
[
{
"sha": "abc123def456",
"message": "Add feature X",
"author": {
"name": "John Doe",
"email": "[email protected]",
"date": "2025-11-17T10:00:00Z"
},
"committer": {
"name": "John Doe",
"email": "[email protected]",
"date": "2025-11-17T10:00:00Z"
}
}
]

### create_pull_request

Creates a new pull request from a branch.
Command:
gh pr create \
--repo <owner>/<repo> \
--head <branch> \
--base <base-branch> \
--title "PR Title" \
--body "PR Description"Implementation:
pub fn create_pull_request(request: &PrRequest) -> Result<PullRequest> {
let body_file = tempfile::NamedTempFile::new()?;
std::fs::write(&body_file, &request.body)?;
let output = Command::new("gh")
.args(&[
"pr", "create",
"--repo", &format!("{}/{}", request.owner, request.repo),
"--head", &request.head,
"--base", &request.base,
"--title", &request.title,
"--body-file", body_file.path().to_str().unwrap(),
])
.output()?;
// Parse URL from output
let url = String::from_utf8(output.stdout)?.trim().to_string();
// Extract PR number from URL
let pr_number = extract_pr_number(&url)?;
Ok(PullRequest {
number: pr_number,
state: "OPEN".to_string(),
title: request.title.clone(),
url,
// ...
})
}

## Data Transformations

graph LR
subgraph "GitHub API Response"
GHRepo[GhRepo JSON from gh CLI]
GHBranch[GhBranch]
GHPR[GhPullRequest]
end
subgraph "Transformation Layer"
Parse[Parse JSON]
Map[Map Fields]
Enrich[Enrich Data]
end
subgraph "Internal Models"
Repo[Repository]
Branch[Branch]
PR[PullRequest]
end
GHRepo --> Parse
GHBranch --> Parse
GHPR --> Parse
Parse --> Map
Map --> Enrich
Enrich --> Repo
Enrich --> Branch
Enrich --> PR
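The `extract_pr_number` helper referenced in `create_pull_request` above is not shown in the module listing. One plausible sketch parses the trailing segment of the PR URL that `gh pr create` prints (this is an illustration, not the actual implementation, and it uses plain `String` errors instead of the codebase's error type):

```rust
/// Extracts the PR number from a URL like
/// "https://github.com/owner/repo/pull/42".
/// Sketch only: the real helper may differ.
fn extract_pr_number(url: &str) -> Result<u64, String> {
    url.trim_end_matches('/')
        .rsplit('/')
        .next()
        .ok_or_else(|| format!("empty URL: {url}"))?
        .parse::<u64>()
        .map_err(|e| format!("no PR number in {url}: {e}"))
}

fn main() {
    let n = extract_pr_number("https://github.com/softwarewrighter/overall/pull/42").unwrap();
    println!("{n}"); // prints 42
}
```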
| GitHub Field | Internal Field | Transformation |
|---|---|---|
| `name` | `name` | Direct |
| `owner.login` | `owner` | Extract from object |
| `pushedAt` | `pushed_at` | Parse ISO 8601 |
| `isFork` | `is_fork` | Boolean (0/1 in SQLite) |
| - | `id` | Computed: `{owner}/{name}` |
| - | `priority` | Calculated by analysis module |
| GitHub Field | Internal Field | Transformation |
|---|---|---|
| `name` | `name` | Direct |
| `commit.sha` | `sha` | Extract from commit object |
| Comparison API | `ahead_by` | From compare endpoint |
| Comparison API | `behind_by` | From compare endpoint |
| - | `status` | Calculated from ahead/behind/PRs |
graph TD
Start[New Branch] --> HasPR{Has Open PR?}
HasPR -->|Yes| PROpen[Status: PROpen]
HasPR -->|No| CheckAhead{Ahead > 0?}
CheckAhead -->|Yes| CheckBehind{Behind > 0?}
CheckAhead -->|No| CheckMerged{In default branch?}
CheckBehind -->|Yes| NeedsSync[Status: NeedsSync]
CheckBehind -->|No| ReadyPR[Status: ReadyForPR]
CheckMerged -->|Yes| Merged[Status: Merged]
CheckMerged -->|No| Unknown[Status: Unknown]
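This decision tree can be encoded directly. A sketch (the enum is restated so the example is self-contained, and the PR/merged checks are passed in as precomputed booleans, which may not match the real implementation):

```rust
#[derive(Debug, PartialEq)]
enum BranchStatus {
    Unknown,    // No comparison data
    NeedsSync,  // Behind base branch
    ReadyForPR, // Ahead, no PR
    PROpen,     // Has open PR
    Merged,     // Already merged
}

/// Mirrors the diagram: an open PR wins, then ahead/behind counts,
/// then a merged check for branches with no unique commits.
fn classify(has_open_pr: bool, ahead: u32, behind: u32, in_default: bool) -> BranchStatus {
    if has_open_pr {
        BranchStatus::PROpen
    } else if ahead > 0 {
        if behind > 0 { BranchStatus::NeedsSync } else { BranchStatus::ReadyForPR }
    } else if in_default {
        BranchStatus::Merged
    } else {
        BranchStatus::Unknown
    }
}

fn main() {
    assert_eq!(classify(false, 5, 0, false), BranchStatus::ReadyForPR);
    assert_eq!(classify(true, 5, 2, false), BranchStatus::PROpen);
}
```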
Status Enum:
pub enum BranchStatus {
Unknown, // No comparison data
NeedsSync, // Behind base branch
ReadyForPR, // Ahead, no PR
PROpen, // Has open PR
Merged, // Already merged
}

## Error Handling

graph TB
Error[GitHub Error] --> Auth[Authentication Error]
Error --> RateLimit[Rate Limit Error]
Error --> NotFound[Not Found Error]
Error --> Network[Network Error]
Error --> Parse[JSON Parse Error]
Error --> Other[Other API Error]
Auth --> Retry1[Prompt re-auth]
RateLimit --> Retry2[Wait + Retry]
NotFound --> Handle1[Mark as deleted]
Network --> Retry3[Retry with backoff]
Parse --> Handle2[Log and skip]
Other --> Handle3[Log and propagate]
pub fn execute_gh_command(args: &[&str]) -> Result<String> {
let output = Command::new("gh")
.args(args)
.output()
.context("Failed to execute gh command")?;
if !output.status.success() {
let stderr = String::from_utf8_lossy(&output.stderr);
// Parse specific error types
if stderr.contains("authentication") {
return Err(anyhow!("GitHub authentication required. Run: gh auth login"));
}
if stderr.contains("rate limit") {
// Extract retry-after if available
return Err(anyhow!("GitHub API rate limit exceeded"));
}
if stderr.contains("404") || stderr.contains("Not Found") {
return Err(anyhow!("Repository or resource not found"));
}
return Err(anyhow!("gh command failed: {}", stderr));
}
Ok(String::from_utf8(output.stdout)?)
}

pub fn fetch_with_retry<T, F>(operation: F, max_retries: u32) -> Result<T>
where
F: Fn() -> Result<T>,
{
let mut attempts = 0;
let mut delay = Duration::from_secs(2);
loop {
match operation() {
Ok(result) => return Ok(result),
Err(e) if attempts < max_retries => {
if e.to_string().contains("rate limit") {
tracing::warn!("Rate limited, waiting {:?} before retry", delay);
std::thread::sleep(delay);
delay *= 2; // Exponential backoff
attempts += 1;
} else {
return Err(e); // Don't retry non-rate-limit errors
}
}
Err(e) => return Err(e),
}
}
}

## Rate Limiting

| Authentication | Rate Limit | Reset Period |
|---|---|---|
| Authenticated (gh) | 5000 req/hr | 1 hour |
| Unauthenticated | 60 req/hr | 1 hour |
- Aggressive Caching: Store all data in SQLite, minimize API calls
- Batch Operations: Combine related requests when possible
- Incremental Updates: Only fetch changed data (future enhancement)
- Priority Queue: Fetch most important repos first
- Rate Limit Monitoring: Track remaining requests
pub struct RateLimitInfo {
pub limit: u32,
pub remaining: u32,
pub reset_at: DateTime<Utc>,
}
pub fn check_rate_limit() -> Result<RateLimitInfo> {
let output = Command::new("gh")
.args(&["api", "rate_limit"])
.output()?;
let data: serde_json::Value = serde_json::from_slice(&output.stdout)?;
Ok(RateLimitInfo {
limit: data["resources"]["core"]["limit"].as_u64().unwrap() as u32,
remaining: data["resources"]["core"]["remaining"].as_u64().unwrap() as u32,
reset_at: DateTime::from_timestamp(
data["resources"]["core"]["reset"].as_i64().unwrap(),
0,
).unwrap(),
})
}

graph TB
Start[API Request] --> Cache{In Cache & Fresh?}
Cache -->|Yes| Return[Return Cached]
Cache -->|No| Check{Rate Limit OK?}
Check -->|Yes| Fetch[Execute Request]
Check -->|No| Wait[Wait for Reset]
Fetch --> Store[Store in Cache]
Store --> Return
Wait --> Check
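The cache-freshness gate in this flow reduces to a timestamp comparison. A sketch using plain epoch seconds (the TTL value is illustrative; the actual freshness window is a design choice, not documented here):

```rust
/// A cached entry is fresh if it was fetched within `ttl_secs` of `now`.
/// The flow above only hits the GitHub API when this returns false.
fn is_fresh(fetched_at_epoch: u64, now_epoch: u64, ttl_secs: u64) -> bool {
    now_epoch.saturating_sub(fetched_at_epoch) <= ttl_secs
}

fn main() {
    let ttl = 15 * 60; // e.g. 15 minutes (illustrative)
    assert!(is_fresh(1_000_000, 1_000_000 + 600, ttl)); // 10 minutes old: fresh
    assert!(!is_fresh(1_000_000, 1_000_000 + 3_600, ttl)); // 1 hour old: stale
}
```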
For testing without actual GitHub API calls:
#[cfg(test)]
pub struct MockGitHubClient {
pub repos: HashMap<String, Vec<Repository>>,
pub branches: HashMap<String, Vec<Branch>>,
pub prs: HashMap<String, Vec<PullRequest>>,
}
impl MockGitHubClient {
pub fn new() -> Self {
Self {
repos: HashMap::new(),
branches: HashMap::new(),
prs: HashMap::new(),
}
}
pub fn with_repo(mut self, owner: &str, repo: Repository) -> Self {
self.repos
.entry(owner.to_string())
.or_insert_with(Vec::new)
.push(repo);
self
}
}

For real GitHub API testing (requires authentication):
#[test]
#[ignore] // Run with: cargo test -- --ignored
fn test_real_github_api() {
let repos = list_repos("softwarewrighter").unwrap();
assert!(!repos.is_empty());
let first_repo = &repos[0];
let branches = fetch_branches(&first_repo.owner, &first_repo.name).unwrap();
assert!(!branches.is_empty());
}

- **Always Check Authentication First**: Fail early if `gh` is not authenticated
- **Use Structured Output**: Always use the `--json` flag for parseable output
- **Handle Pagination**: Use `--limit` appropriately; default to 50-100
- **Log Commands**: Use `tracing` to log all `gh` commands for debugging
- **Timeout Long Operations**: Set reasonable timeouts for API calls
- **Validate Input**: Check owner/repo names before passing to `gh`
- **Parse Errors Carefully**: Extract meaningful error messages from stderr
- **Cache Aggressively**: Store results in SQLite to minimize API calls
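The input-validation practice can be as simple as a character allowlist applied before interpolating owner/repo names into a `gh` invocation. A conservative sketch (GitHub's actual naming rules are slightly more permissive; the helper name is hypothetical):

```rust
/// Conservative pre-check before passing a name to `gh`: non-empty,
/// no leading '-' (could be misread as a flag), and only ASCII
/// alphanumerics plus '-', '_', '.'.
fn is_safe_name(name: &str) -> bool {
    !name.is_empty()
        && !name.starts_with('-')
        && name
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
}

fn main() {
    assert!(is_safe_name("overall"));
    assert!(!is_safe_name("--flag-injection"));
}
```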
- Data Flow - Sequence diagrams showing GitHub API workflows
- Architecture Overview - High-level system design
- Storage Layer - How GitHub data is persisted