Advanced Git Topics - bounswe/bounswe2025group2 GitHub Wiki

git

Managing Feature Branches

Why Use Feature Branches?

Feature branches allow developers to work on new features without affecting the main codebase. This keeps the main branch stable and enables collaboration without conflicts.

Creating a Feature Branch

git checkout -b feature-branch

Creates and switches to a new feature branch.

Pushing the Feature Branch

git push origin feature-branch

Uploads the new branch to the remote repository.

Working on the Feature Branch

After creating the branch, make changes, stage, and commit them as usual:

git add .
git commit -m "Implemented new feature"

Keeping the Feature Branch Updated

To sync your feature branch with the main branch:

git checkout main
git pull origin main
git checkout feature-branch
git merge main

Merging main in regularly (for example, daily on a long-running feature) keeps the branch up to date and means conflicts surface early and in small pieces, rather than all at once when the feature is complete.

Merging the Feature Branch

Once the feature is complete, merge it into the main branch:

git checkout main
git merge feature-branch

Then delete the feature branch:

git branch -d feature-branch

IMPORTANT: We should always use pull requests instead of directly merging. The concept of pull requests is explained below.

Handling Merge Conflicts

If there are conflicts during merging, Git will prompt you to resolve them manually. Open the affected files, fix conflicts, then:

git add <file>
git commit -m "Resolved merge conflicts"

Once the conflicting sections are resolved and the conflict markers removed, the merge can be committed and pushed as usual.
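The steps above can be tried safely in a throwaway repository. The following is a minimal sketch, not part of the original workflow: the file name settings.txt, its contents, and the user identity are all made up for illustration.

```shell
set -e
# Throwaway repository (all names below are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "color = blue" > settings.txt
git add settings.txt && git commit -q -m "Add settings"

# Change the same line on two branches so that they conflict
git checkout -q -b feature-branch
echo "color = green" > settings.txt
git commit -q -am "Use green"
git checkout -q main
echo "color = red" > settings.txt
git commit -q -am "Use red"

# The merge stops with a non-zero exit status and leaves conflict markers
git merge feature-branch >/dev/null 2>&1 || echo "merge stopped on a conflict"
conflicted=$(git diff --name-only --diff-filter=U)   # files that are still unmerged
cat settings.txt                                      # shows the <<<<<<<, =======, >>>>>>> markers

# Resolve by writing the final content, then stage and commit
echo "color = green" > settings.txt
git add settings.txt
git commit -q -m "Resolved merge conflicts"
parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # the commit itself plus its two parents
```

The resulting commit has two parents, which is how Git records that the conflict was resolved as part of a merge.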

Pull Requests

What is a Pull Request?

A pull request (PR) is a feature of platforms like GitHub, GitLab, and Bitbucket that allows developers to propose changes to a repository. It provides a way to review, discuss, and approve code before merging it into the main branch.

Creating a Pull Request

  1. Push your feature branch to the remote repository:
    git push origin feature-branch
  2. Navigate to your repository on GitHub (or another Git hosting service).
  3. Click on "Compare & pull request."
  4. Add a title and description explaining your changes.
  5. Select the base branch (e.g., main) and the compare branch (e.g., feature-branch).
  6. Click "Create pull request."

Reviewing and Merging a Pull Request

  1. Team members review the pull request and provide feedback.
  2. If necessary, make changes and push them to the same feature branch.
  3. Once approved, merge the pull request using the "Merge" button on the platform or:
    git checkout main
    git merge feature-branch
  4. Delete the feature branch after merging:
    git branch -d feature-branch

Why Use Pull Requests?

  • Code Review: Ensures code quality and adherence to project standards.
  • Collaboration: Encourages discussions and knowledge sharing.
  • Version Control: Provides a clear history of changes and discussions.
  • Error Prevention: Helps catch bugs before merging into the main branch.

Undoing Changes in Git

Git provides several tools to undo changes, depending on the situation. The most commonly used commands are git reset, git checkout, and git revert. Each serves a specific purpose, so it’s important to understand their differences.


1. Git Reset

git reset is used to undo changes by moving the HEAD pointer to a specific commit. It can modify the working directory, staging area, and commit history, depending on the mode used.

Modes of git reset:

  • --soft: Moves the HEAD pointer to a specific commit but leaves the staging area and working directory unchanged. This is useful for redoing commits.

    git reset --soft <commit-hash>
  • --mixed (default): Moves the HEAD pointer and resets the staging area to match the specified commit, but leaves the working directory unchanged. This is useful for unstaging changes.

    git reset --mixed <commit-hash>
  • --hard: Moves the HEAD pointer, resets the staging area, and updates the working directory to match the specified commit. This is destructive and will permanently discard uncommitted changes.

    git reset --hard <commit-hash>

Common Use Cases:

  • Undo the last commit (keep changes in the working directory):
    git reset --soft HEAD~1
  • Unstage changes (keep changes in the working directory):
    git reset HEAD <file>
  • Discard all changes and reset to a specific commit:
    git reset --hard <commit-hash>
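To see the difference between the three modes, the sketch below applies each of them to the same commit in a throwaway repository and inspects the result with git status --porcelain. The file name notes.txt and the user identity are illustrative assumptions.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "one" > notes.txt && git add notes.txt && git commit -q -m "First"
echo "two" >> notes.txt && git commit -q -am "Second"
second=$(git rev-parse HEAD)

# --soft: HEAD moves back, the change from "Second" stays staged
git reset -q --soft HEAD~1
soft_status=$(git status --porcelain)     # "M  notes.txt" (staged)

# --mixed: HEAD moves back, the change is kept but unstaged
git reset -q --hard "$second"             # return to the starting point
git reset -q --mixed HEAD~1
mixed_status=$(git status --porcelain)    # " M notes.txt" (modified, not staged)

# --hard: HEAD moves back and the change is removed from the working directory
git reset -q --hard "$second"
git reset -q --hard HEAD~1
hard_status=$(git status --porcelain)     # empty

printf '%s\n%s\n%s\n' "soft:  [$soft_status]" "mixed: [$mixed_status]" "hard:  [$hard_status]"
```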

2. Git Checkout

git checkout is used to switch branches or restore files from a specific commit. It does not modify the commit history but updates the working directory.

Switching Branches:

git checkout <branch-name>

Restoring Files:

  • Restore a file from the last commit (discard changes in the working directory):
    git checkout -- <file>
  • Restore a file from a specific commit:
    git checkout <commit-hash> -- <file>

Detached HEAD State:

If you check out a specific commit (instead of a branch), you enter a detached HEAD state. This is useful for exploring the repository's history. Commits made in this state do not belong to any branch, so create a branch before switching away if you want to keep them.

git checkout <commit-hash>
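As a rough illustration, the sketch below enters a detached HEAD state in a throwaway repository, detects it with git symbolic-ref, and then attaches a branch so that a commit made while detached stays reachable. The branch name experiment and the file app.txt are hypothetical.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > app.txt && git add app.txt && git commit -q -m "First"
echo "v2" > app.txt && git commit -q -am "Second"

# Check out the older commit directly: HEAD now points at a commit, not a branch
git checkout -q HEAD~1
if git symbolic-ref -q HEAD >/dev/null; then state="on a branch"; else state="detached"; fi

# A commit made here belongs to no branch until one is created for it
echo "experiment" >> app.txt && git commit -q -am "Experiment"
git checkout -q -b experiment             # hypothetical branch name
branch=$(git symbolic-ref --short HEAD)
echo "$state, now on $branch"
```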

3. Git Revert

git revert is used to undo changes by creating a new commit that reverses the effects of a previous commit. This is a safe way to undo changes because it does not rewrite history.

Reverting a Commit:

git revert <commit-hash>

This creates a new commit that undoes the changes introduced by the specified commit.

Reverting a Merge Commit:

If you need to revert a merge commit, you must specify the parent branch using the -m flag:

git revert -m 1 <merge-commit-hash>

Here, -m 1 selects the first parent as the mainline, that is, the branch that was checked out when the merge was made (usually the main branch).

Common Use Cases:

  • Undo a specific commit without rewriting history:
    git revert <commit-hash>
  • Undo the last commit:
    git revert HEAD
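The sketch below shows, in a throwaway repository, that a revert adds a commit instead of removing one. The file name and commit messages are illustrative; --no-edit simply accepts the default revert message without opening an editor.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "a" > file.txt && git add file.txt && git commit -q -m "Add a"
echo "b" >> file.txt && git commit -q -am "Add b"
before=$(git rev-list --count HEAD)       # 2 commits so far

# Undo "Add b" with a new commit; the earlier commits stay in the history
git revert --no-edit HEAD >/dev/null
after=$(git rev-list --count HEAD)        # 3 commits: nothing was removed
subject=$(git log -1 --pretty=format:%s)
content=$(cat file.txt)
echo "$before -> $after commits, latest: $subject"
```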

Comparison of git reset, git checkout, and git revert

| Command | Purpose | Effect on History | Safe for Shared Repositories? |
| --- | --- | --- | --- |
| git reset | Moves the HEAD pointer to a specific commit (can discard changes). | Rewrites history | No (use with caution). |
| git checkout | Switches branches or restores files from a commit. | Does not modify history | Yes. |
| git revert | Creates a new commit that undoes the changes of a previous commit. | Preserves history | Yes. |

When to Use Each Command:

  • Use git reset to undo local changes or rewrite commit history (only for private branches).
  • Use git checkout to switch branches or restore files without modifying history.
  • Use git revert to undo changes in a shared repository without rewriting history.

Example Workflow

  1. Undo the last commit (keep changes):

    git reset --soft HEAD~1
  2. Discard changes in a file:

    git checkout -- <file>
  3. Revert a specific commit:

    git revert <commit-hash>
  4. Reset to a previous commit (discard all changes):

    git reset --hard <commit-hash>

Conclusion

Understanding git reset, git checkout, and git revert is essential for managing changes and navigating Git history effectively. Each command has a specific use case, so choose the one that best fits your needs:

  • Use git reset for local changes and history rewriting.
  • Use git checkout for switching branches or restoring files.
  • Use git revert for undoing changes in shared repositories.

Advanced Git Log

The git log command is one of the most powerful tools in Git for exploring the commit history of a repository. While the basic git log command provides a simple list of commits, Git offers a variety of options to customize and filter the output for more advanced use cases. Below, we’ll explore some of the most useful advanced features of git log.


1. Basic Git Log

The simplest form of git log displays a list of commits in reverse chronological order (most recent commits first):

git log

Output:

commit abc1234 (HEAD -> main)
Author: John Doe <john.doe@example.com>
Date:   Mon Oct 9 12:34:56 2023 +0000

    Added new feature X

commit def5678
Author: Jane Smith <jane.smith@example.com>
Date:   Sun Oct 8 10:11:12 2023 +0000

    Fixed bug in feature Y

2. Customizing Git Log Output

Git log can be customized to display specific information in a more readable format.

Show a Compact One-Line Log:

git log --oneline

Output:

abc1234 (HEAD -> main) Added new feature X
def5678 Fixed bug in feature Y

Show a Graph of Branches and Merges:

git log --graph --oneline --all

Output:

* abc1234 (HEAD -> main) Added new feature X
* def5678 Fixed bug in feature Y
| * 123456 (feature-branch) Implemented feature Z
|/
* 789abc Initial commit

Show Full Diffs for Each Commit:

git log -p

This displays the full diff (changes) for each commit.

Show Stats for Each Commit:

git log --stat

Output:

commit abc1234 (HEAD -> main)
Author: John Doe <john.doe@example.com>
Date:   Mon Oct 9 12:34:56 2023 +0000

    Added new feature X

 file1.txt | 5 ++++-
 file2.txt | 3 ++-
 2 files changed, 6 insertions(+), 2 deletions(-)

3. Filtering Git Log

Git log can be filtered to show only specific commits based on various criteria.

Filter by Author:

git log --author="John Doe"

This shows only commits made by the specified author.

Filter by Date:

git log --since="2023-10-01" --until="2023-10-31"

This shows commits made between October 1, 2023, and October 31, 2023.

Filter by Commit Message:

git log --grep="bug fix"

This shows commits with messages containing the phrase "bug fix".

Filter by File:

git log -- <file-path>

This shows commits that modified the specified file.


4. Limiting Git Log Output

You can limit the number of commits displayed by Git log.

Show the Last N Commits:

git log -n 5

This shows the last 5 commits.

Show Commits Since a Specific Commit:

git log abc1234..

This shows all commits reachable from HEAD that were made after the commit with hash abc1234 (abc1234 itself is not included).


5. Formatting Git Log Output

Git log allows you to customize the output format using the --pretty option.

Custom Format:

git log --pretty=format:"%h - %an, %ar : %s"

Output:

abc1234 - John Doe, 2 hours ago : Added new feature X
def5678 - Jane Smith, 1 day ago : Fixed bug in feature Y

Common Format Placeholders:

  • %h: Abbreviated commit hash
  • %an: Author name
  • %ar: Author date, relative
  • %s: Commit subject (the first line of the commit message)
  • %ad: Author date (formatted)
  • %cn: Committer name
  • %cd: Committer date

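JSON-like Format (placeholder values are not escaped, so a message containing double quotes produces invalid JSON):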

git log --pretty=format:'{"commit": "%h", "author": "%an", "date": "%ad", "message": "%s"}'

Output:

{"commit": "abc1234", "author": "John Doe", "date": "Mon Oct 9 12:34:56 2023 +0000", "message": "Added new feature X"}

6. Combining Filters and Options

You can combine multiple options to create powerful Git log queries.

Example: Show the Last 3 Commits by a Specific Author in a Compact Format:

git log --author="John Doe" --oneline -n 3

Output:

abc1234 Added new feature X
789abc Refactored code
456def Updated documentation

Example: Show a Graph of Merges with Stats:

git log --graph --stat

7. Searching for Changes in Git Log

You can search for commits that introduced or removed specific content.

Search for Commits That Added or Removed a String:

git log -S "TODO"

This shows commits that added or removed the string "TODO".

Search for Commits That Changed a Specific Line:

git log -L 10,20:file.txt

This shows commits that modified lines 10 to 20 in file.txt.


8. Viewing Git Log for a Specific Branch

You can view the commit history for a specific branch:

git log <branch-name>

Compare Branches:

git log main..feature-branch

This shows commits in feature-branch that are not in main.
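The two-dot range is directional. The sketch below builds two diverging branches in a throwaway repository (empty commits with illustrative messages) and lists what each side has that the other lacks.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature-branch
git commit -q --allow-empty -m "Implemented feature Z"
git checkout -q main
git commit -q --allow-empty -m "Fixed bug in feature Y"

# Commits on feature-branch that main does not have
only_feature=$(git log --pretty=format:%s main..feature-branch)
# Commits on main that feature-branch does not have
only_main=$(git log --pretty=format:%s feature-branch..main)
echo "feature only: $only_feature"
echo "main only:    $only_main"
```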


9. Using Aliases for Common Git Log Commands

To save time, you can create Git aliases for frequently used log commands.

Example Alias for a Compact Log with a Graph:

git config --global alias.lg "log --oneline --graph --decorate"

Now you can use:

git lg

Conclusion

The git log command is incredibly versatile and can be tailored to suit almost any need when exploring commit history. By combining filters, formatting options, and customizations, you can extract meaningful insights from your repository's history. Whether you’re debugging, auditing, or simply reviewing changes, mastering git log will make you a more efficient Git user.

Here’s a quick summary of the most useful options:

  • --oneline: Compact one-line output.
  • --graph: Visualize branch and merge history.
  • --author: Filter by author.
  • --since/--until: Filter by date.
  • --grep: Filter by commit message.
  • --stat: Show file statistics.
  • -p: Show full diffs.
  • --pretty: Customize output format.

Git Hooks

Git hooks are scripts that Git executes before or after events such as commits, pushes, and merges. They allow you to automate tasks, enforce policies, and integrate with other tools. Hooks are stored in the .git/hooks directory of your repository and are local to that repository. Each hook is a script that can be written in any scripting language (e.g., Bash, Python, Ruby).


Types of Git Hooks

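Git hooks are divided into two categories: client-side and server-side. Client-side hooks run on your local machine, while server-side hooks run on the server that hosts the remote repository (for example, a self-hosted Git server). Hosted services such as GitHub.com generally do not let you install custom server-side hooks and offer webhooks or CI pipelines instead.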

Common Client-Side Hooks:

  1. pre-commit: Runs before a commit is created. Useful for linting, formatting, or running tests.
  2. commit-msg: Runs after the commit message is created but before the commit is finalized. Useful for enforcing commit message conventions.
  3. post-commit: Runs after a commit is completed. Useful for notifications or logging.
  4. pre-push: Runs before a push to a remote repository. Useful for running tests or checks before sharing changes.

Common Server-Side Hooks:

  1. pre-receive: Runs before changes are accepted into the remote repository. Useful for enforcing policies.
  2. update: Runs after pre-receive, once for each ref (branch or tag) being updated, before the change is applied. Useful for branch-specific checks.
  3. post-receive: Runs after changes are accepted into the remote repository. Useful for notifications or deployment.

How to Use Git Hooks

1. Locating Hooks

Git hooks are stored in the .git/hooks directory of your repository. Each hook is a script with a predefined name (e.g., pre-commit, commit-msg).

Example:

.git/hooks/
├── pre-commit.sample
├── commit-msg.sample
├── pre-push.sample
└── ...

2. Enabling a Hook

To enable a hook, remove the .sample extension from the script and make it executable.

Example: Enable the pre-commit hook:

mv .git/hooks/pre-commit.sample .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

3. Writing a Custom Hook

You can write your own hook script in any scripting language. For example, a pre-commit hook in Bash:

#!/bin/bash
# .git/hooks/pre-commit

# Run linting
if ! npm run lint; then
  echo "Linting failed. Commit aborted."
  exit 1
fi

# Run tests
if ! npm test; then
  echo "Tests failed. Commit aborted."
  exit 1
fi

Make the script executable:

chmod +x .git/hooks/pre-commit

4. Sharing Hooks

Since hooks are local to the repository, they are not automatically shared with others. To share hooks, you can:

  • Store them in a directory (e.g., hooks/) in your repository.
  • Use a tool like Husky (for Node.js projects) to manage hooks.
  • Add a script to copy hooks from a shared directory to .git/hooks.

Example: Copy hooks from a hooks/ directory:

#!/bin/bash
# copy-hooks.sh

cp hooks/* .git/hooks/
chmod +x .git/hooks/*
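An alternative to copying files, available since Git 2.9, is the core.hooksPath setting, which tells Git to run hooks from a tracked directory directly. The sketch below is a minimal demonstration in a throwaway repository; the hook's rule (refuse to commit while a file named BLOCK exists) is a made-up placeholder for a real check.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

# A tracked hooks/ directory with a hypothetical pre-commit check
mkdir hooks
cat > hooks/pre-commit <<'EOF'
#!/bin/sh
# Placeholder rule: block commits while a file named BLOCK exists
if [ -e BLOCK ]; then
  echo "BLOCK file present. Commit aborted." >&2
  exit 1
fi
EOF
chmod +x hooks/pre-commit

# Point Git at the tracked directory instead of .git/hooks
git config core.hooksPath hooks

touch BLOCK
if git commit -q --allow-empty -m "Blocked" 2>/dev/null; then blocked="no"; else blocked="yes"; fi
rm BLOCK
git commit -q --allow-empty -m "Allowed"
count=$(git rev-list --count HEAD)
echo "blocked=$blocked commits=$count"
```

Each team member still has to run the git config command once per clone, since repository configuration is not shared either.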

Example Use Cases for Git Hooks

1. Enforce Commit Message Format

Use the commit-msg hook to enforce a specific commit message format (e.g., requiring a JIRA ticket number).

#!/bin/bash
# .git/hooks/commit-msg

# $1 is the path to the file holding the proposed commit message
commit_msg=$(cat "$1")

if ! [[ $commit_msg =~ ^[A-Z]+-[0-9]+: ]]; then
  echo "Commit message must start with a JIRA ticket number (e.g., PROJ-123:)."
  exit 1
fi
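Before installing such a hook, it can help to test the pattern on its own. The sketch below is a POSIX sh variant of the same check, written as a hypothetical helper function is_valid_message that uses grep instead of the Bash-only [[ ... =~ ... ]] test; the sample messages are made up.

```shell
# Succeeds when the first line starts with a ticket key such as "PROJ-123:"
is_valid_message() {
  printf '%s\n' "$1" | head -n 1 | grep -Eq '^[A-Z]+-[0-9]+:'
}

if is_valid_message "PROJ-123: Add login form"; then first="accepted"; else first="rejected"; fi
if is_valid_message "add login form"; then second="accepted"; else second="rejected"; fi
echo "$first $second"
```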

2. Run Tests Before Pushing

Use the pre-push hook to run tests before pushing changes to the remote repository.

#!/bin/bash
# .git/hooks/pre-push

if ! npm test; then
  echo "Tests failed. Push aborted."
  exit 1
fi

3. Automate Code Formatting

Use the pre-commit hook to automatically format code before committing.

#!/bin/bash
# .git/hooks/pre-commit

# Format code using Prettier
npx prettier --write .

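# Re-stage tracked files only, so untracked files are not swept into the commit.
# Note: this also stages any other unstaged edits to tracked files.
git add -u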

4. Notify Team After a Push

Use the post-receive hook, installed on the server that hosts the repository, to send a notification to your team after a push.

#!/bin/bash
# hooks/post-receive (in the server-side repository)

# Send a notification
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"New changes pushed to the repository!"}' \
  https://hooks.slack.com/services/your/slack/webhook/url

Best Practices for Git Hooks

  1. Keep Hooks Lightweight: Hooks should run quickly and not block developers for too long.
  2. Document Hooks: Add comments to your hook scripts to explain their purpose.
  3. Test Hooks: Test your hooks thoroughly to avoid unexpected behavior.
  4. Use Tools for Sharing: Use tools like Husky or a shared hooks directory to make hooks available to all team members.
  5. Avoid Overusing Hooks: Only use hooks for tasks that are critical to your workflow.

Conclusion

Git hooks are a powerful way to automate tasks, enforce policies, and integrate with other tools in your development workflow. By leveraging hooks, you can ensure consistency, improve code quality, and streamline your processes. Whether you’re running tests, formatting code, or enforcing commit conventions, Git hooks can help you achieve your goals efficiently.

Here’s a quick summary of the most common hooks:

  • pre-commit: Run checks before committing.
  • commit-msg: Enforce commit message conventions.
  • pre-push: Run checks before pushing.
  • post-receive: Trigger actions after changes are accepted.

Git Refs and the Reflog

In Git, refs and the reflog are powerful tools for tracking and managing changes in your repository. They help you navigate through the history of your branches, tags, and commits, and recover lost work. Let’s dive into what refs and the reflog are and how to use them effectively.


1. Git Refs (References)

Refs are pointers to specific commits in your Git history. They are used to identify branches, tags, and other important points in your repository. Refs are stored as files in the .git/refs directory, or in the .git/packed-refs file once Git has packed them.

Types of Refs:

  1. Branches: Pointers to the latest commit in a branch (e.g., refs/heads/main).
  2. Tags: Pointers to specific commits, often used for releases (e.g., refs/tags/v1.0.0).
  3. Remote Refs: Pointers to the state of branches in remote repositories (e.g., refs/remotes/origin/main).

Viewing Refs:

You can view all refs in your repository using:

git show-ref

Output (hashes abbreviated here; git show-ref prints the full hashes):

abc1234 refs/heads/main
def5678 refs/remotes/origin/main
123456 refs/tags/v1.0.0

Creating and Deleting Refs:

  • Create a new branch (ref):
    git branch new-feature
  • Delete a branch (ref):
    git branch -d new-feature

Updating Refs:

Refs are automatically updated when you commit, merge, or checkout branches. For example, when you create a new commit, the branch ref (e.g., refs/heads/main) is updated to point to the new commit.


2. Git Reflog (Reference Log)

The reflog is a safety net that records all changes to refs (e.g., branches, HEAD) in your local repository. It allows you to recover lost commits, branches, or other changes that are no longer part of the commit history.

How the Reflog Works:

  • Every time a ref is updated (e.g., by committing, merging, or checking out a branch), Git logs the change in the reflog.
  • The reflog is local to your repository and is not shared with others.
  • Entries in the reflog expire after a certain period (default: 90 days, or 30 days for entries that are no longer reachable from the current tip of the ref).

Viewing the Reflog:

To view the reflog, use:

git reflog

Output:

abc1234 (HEAD -> main) HEAD@{0}: commit: Added new feature X
def5678 HEAD@{1}: checkout: moving from feature-branch to main
123456 HEAD@{2}: commit: Fixed bug in feature Y

Each entry includes:

  • A commit hash (e.g., abc1234).
  • A ref pointer (e.g., HEAD@{0}).
  • An action description (e.g., commit: Added new feature X).

Using the Reflog to Recover Lost Work:

If you accidentally delete a branch or reset your branch to an older commit, you can use the reflog to find the lost commit and restore it.

Example: Restore a deleted branch:

  1. Find the commit hash in the reflog:

    git reflog

    Output:

    abc1234 HEAD@{0}: commit: Added new feature X
    def5678 HEAD@{1}: checkout: moving from feature-branch to main
    123456 HEAD@{2}: commit: Fixed bug in feature Y
    
  2. Create a new branch pointing to the lost commit. Here that is 123456, the last commit made on feature-branch before switching to main:

    git branch recovered-branch 123456

Reflog for Specific Refs:

You can view the reflog for a specific ref (e.g., a branch):

git reflog show main

3. Practical Use Cases for Refs and Reflog

Use Case 1: Recovering a Deleted Branch

  1. Check the reflog to find the commit hash of the deleted branch:
    git reflog
  2. Restore the branch:
    git branch recovered-branch <commit-hash>

Use Case 2: Undoing a Hard Reset

If you accidentally reset your branch to an older commit using git reset --hard, you can use the reflog to find the latest commit and reset back to it:

  1. Check the reflog:
    git reflog
  2. Reset to the desired commit (HEAD@{1} is the state just before the reset, provided the reset was the most recent operation):
    git reset --hard HEAD@{1}
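The recovery can be rehearsed in a throwaway repository. In this sketch (file name and identity are illustrative), a commit is dropped with a hard reset and then restored from the reflog entry that precedes the reset.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > data.txt && git add data.txt && git commit -q -m "First"
echo "v2" > data.txt && git commit -q -am "Second"
lost=$(git rev-parse HEAD)

# The "accident": the branch no longer contains "Second"
git reset -q --hard HEAD~1

# HEAD@{0} is the reset itself, HEAD@{1} the state just before it
git reset -q --hard "HEAD@{1}"
recovered=$(git rev-parse HEAD)
echo "recovered $(git log -1 --pretty=format:%s)"
```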

Use Case 3: Finding Lost Commits

If you lose commits due to a rebase or merge, you can use the reflog to locate them:

  1. Check the reflog:
    git reflog
  2. Create a new branch or cherry-pick the lost commits:
    git cherry-pick <commit-hash>

4. Reflog Expiration

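Reflog entries expire after a certain period (default: 90 days; entries unreachable from the current tip expire after 30 days, controlled separately by gc.reflogExpireUnreachable). You can configure the expiration time using: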

git config gc.reflogExpire <time>

Example: Set reflog expiration to 180 days:

git config gc.reflogExpire 180.days

5. Refs and Reflog in Remote Repositories

  • Refs: Remote refs (e.g., refs/remotes/origin/main) track the state of branches in remote repositories. They are updated when you fetch or pull from the remote.
  • Reflog: The reflog is local and does not exist on remote repositories. However, you can use the reflog to recover changes that were pushed to the remote but later lost locally.

Conclusion

Refs and the reflog are essential tools for navigating and managing your Git history. Here’s a quick summary of their uses:

  • Refs: Pointers to commits (e.g., branches, tags). Use them to track the state of your repository.
  • Reflog: A log of all changes to refs. Use it to recover lost commits, branches, or changes.

Git Rebase vs. Merge

The git rebase and git merge commands are both used to integrate changes from one branch into another, but they do so in different ways. Understanding the differences between them is crucial for maintaining a clean and efficient Git workflow.


1. Git Merge

git merge combines the changes from one branch into another by creating a merge commit. It preserves the history of both branches, showing a clear divergence and convergence of the branches.

How Merge Works:

  • Merging integrates changes from the source branch into the target branch.
  • It creates a new merge commit that has two parent commits (one from each branch).
  • The commit history remains intact, showing the exact point where the branches were merged.

Example:

git checkout main
git merge feature-branch

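This merges feature-branch into main. If both branches have new commits, Git creates a merge commit; if main has not moved since the branch was created, Git simply fast-forwards main and no merge commit is created (use --no-ff to force one).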

Pros of Merge:

  • Preserves the complete history of both branches.
  • Easy to understand and use.
  • Non-destructive (does not rewrite history).

Cons of Merge:

  • Can create a cluttered commit history with many merge commits.
  • May make it harder to follow the history of individual changes.

2. Git Rebase

git rebase integrates changes by replaying the commits from one branch onto another. Instead of creating a merge commit, it moves or "replays" the commits from the source branch onto the tip of the target branch.

How Rebase Works:

  • Rebasing takes the commits from the source branch and applies them one by one onto the target branch.
  • It creates new commits with new commit hashes, effectively rewriting history.
  • The result is a linear commit history without merge commits.

Example:

git checkout feature-branch
git rebase main

This replays the commits from feature-branch onto main.

Pros of Rebase:

  • Creates a clean, linear commit history.
  • Easier to follow the history of individual changes.
  • Avoids unnecessary merge commits.

Cons of Rebase:

  • Rewrites history, which can cause issues if used improperly.
  • Requires resolving conflicts for each commit being replayed.
  • Not suitable for shared branches (e.g., main or develop).

3. Key Differences Between Rebase and Merge

| Feature | Merge | Rebase |
| --- | --- | --- |
| History | Preserves history with merge commits. | Creates a linear history by replaying commits. |
| Commit Hashes | Retains original commit hashes. | Creates new commit hashes. |
| Conflict Resolution | Resolves conflicts once at the end. | Resolves conflicts for each commit. |
| Use Case | Suitable for shared branches. | Suitable for feature branches. |
| Safety | Non-destructive (safe for shared branches). | Rewrites history (use with caution). |

4. When to Use Merge vs. Rebase

Use Merge When:

  • You want to preserve the complete history of both branches.
  • You are working on a shared branch (e.g., main or develop).
  • You want a simple and safe way to integrate changes.

Use Rebase When:

  • You want a clean, linear commit history.
  • You are working on a feature branch and want to integrate the latest changes from main.
  • You want to avoid unnecessary merge commits.

5. Example Workflows

Merge Workflow:

  1. Check out the target branch:
    git checkout main
  2. Merge the feature branch:
    git merge feature-branch

Rebase Workflow:

  1. Check out the feature branch:
    git checkout feature-branch
  2. Rebase onto the target branch:
    git rebase main
  3. Resolve conflicts (if any) and continue the rebase:
    git rebase --continue
  4. Fast-forward merge into the target branch:
    git checkout main
    git merge feature-branch
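To compare the two workflows side by side, the sketch below starts from the same diverged history in a throwaway repository and integrates it both ways on scratch branches. The branch names merged and rebased are hypothetical and exist only so the two results can be inspected together.

```shell
set -e
# Throwaway repository (names are illustrative assumptions)
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "Initial commit"
git checkout -q -b feature-branch
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q main
echo "main" > main.txt && git add main.txt && git commit -q -m "Main work"

# Option 1: merge on a scratch branch, which records a two-parent merge commit
git checkout -q -b merged main
git merge -q --no-edit feature-branch
merge_commits=$(git rev-list --merges --count merged)

# Option 2: rebase a copy of the feature branch onto main, giving linear history
git checkout -q -b rebased feature-branch
git rebase -q main
rebase_merges=$(git rev-list --merges --count rebased)
rebase_total=$(git rev-list --count rebased)

echo "merge commits after merge:  $merge_commits"
echo "merge commits after rebase: $rebase_merges (of $rebase_total commits)"
```

The rebased branch ends at a different commit hash than feature-branch even though it contains the same change, which is why rebasing commits that others already have causes trouble.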

6. Best Practices for Rebase and Merge

For Merge:

  • Use git merge for integrating changes into shared branches.
  • Avoid excessive merge commits by squashing or rebasing feature branches before merging.

For Rebase:

  • Use git rebase for feature branches to keep the history clean.
  • Never rebase commits that have been pushed to a shared repository.
  • Resolve conflicts carefully during rebase to avoid losing changes.

Conclusion

Both git merge and git rebase are essential tools for integrating changes in Git, but they serve different purposes:

  • Merge: Preserves history and is safe for shared branches.
  • Rebase: Creates a clean, linear history and is ideal for feature branches.

Git Submodules and Subtrees

When working on large projects or managing multiple repositories, you may need to include external repositories within your main repository. Git provides two mechanisms for this: submodules and subtrees. Both allow you to include external repositories, but they work differently and have distinct use cases. Let’s explore both in detail.


1. Git Submodules

Git submodules allow you to include an external repository as a subdirectory within your main repository. The submodule is a separate repository with its own history, and it is linked to a specific commit in the external repository.

Key Characteristics of Submodules:

  • The submodule is a separate repository.
  • The main repository tracks a specific commit in the submodule.
  • Submodules are useful when you want to include an external project but keep its history separate.

Adding a Submodule

To add a submodule, use the git submodule add command:

git submodule add <repository-url> <path>

Example:

git submodule add https://github.com/user/repo.git external/repo

This adds the external repository as a submodule in the external/repo directory.

Cloning a Repository with Submodules

When cloning a repository that contains submodules, you need to initialize and update the submodules:

git clone <repository-url>
cd <repository>
git submodule init
git submodule update

Alternatively, you can clone the repository and initialize submodules in one step:

git clone --recurse-submodules <repository-url>

Updating a Submodule

To update a submodule to the latest commit:

  1. Navigate to the submodule directory:
    cd external/repo
  2. Pull the latest changes:
    git pull origin main
  3. Return to the main repository and commit the updated submodule:
    cd ../..
    git add external/repo
    git commit -m "Updated submodule to latest commit"

Removing a Submodule

To remove a submodule:

  1. Unregister the submodule (this removes its entry from .git/config):
    git submodule deinit -f external/repo
  2. Remove the submodule directory and its entry in .gitmodules, then delete the cached copy of its repository:
    git rm -f external/repo
    rm -rf .git/modules/external/repo
  3. Commit the changes:
    git commit -m "Removed submodule external/repo"

2. Git Subtrees

Git subtrees allow you to merge an external repository into a subdirectory of your main repository. Unlike submodules, subtrees integrate the external repository’s history into your main repository.

Key Characteristics of Subtrees:

  • The external repository’s history is merged into the main repository.
  • Subtrees are easier to manage than submodules because they don’t require separate initialization or updates.
  • Subtrees are useful when you want to include an external project and treat it as part of your repository.

Adding a Subtree

To add a subtree, use the git subtree add command:

git subtree add --prefix=<path> <repository-url> <branch> --squash

Example:

git subtree add --prefix=external/repo https://github.com/user/repo.git main --squash

This adds the external repository as a subtree in the external/repo directory and squashes its history into a single commit.

Pulling Changes from a Subtree

To pull the latest changes from the external repository:

git subtree pull --prefix=external/repo https://github.com/user/repo.git main --squash

Pushing Changes to a Subtree

To push changes from your repository to the external repository:

git subtree push --prefix=external/repo https://github.com/user/repo.git main

Removing a Subtree

To remove a subtree, delete the directory with git rm (so the deletion is staged) and commit the change:

git rm -r external/repo
git commit -m "Removed subtree external/repo"

3. Submodules vs. Subtrees: When to Use Which

| Feature | Submodules | Subtrees |
| --- | --- | --- |
| History | Separate history for submodule. | Integrated history in main repo. |
| Ease of Use | Requires initialization and updates. | Easier to manage. |
| Dependency Management | Tracks specific commits. | Tracks entire history. |
| Use Case | External projects with separate history. | External projects treated as part of the main repo. |

Use Submodules When:

  • You want to keep the external repository’s history separate.
  • You need to track specific commits in the external repository.
  • The external repository is large and you want to avoid bloating your main repository.

Use Subtrees When:

  • You want to integrate the external repository’s history into your main repository.
  • You want to treat the external project as part of your repository.
  • You prefer simpler management without separate initialization or updates.

4. Best Practices for Submodules and Subtrees

For Submodules:

  • Always initialize and update submodules after cloning.
  • Document submodule usage in your repository’s README.
  • Avoid frequent updates to submodules to prevent conflicts.

For Subtrees:

  • Use --squash to simplify history when adding or pulling subtrees.
  • Document subtree usage in your repository’s README.
  • Regularly pull changes from the external repository to stay up-to-date.

5. Example Workflows

Submodule Workflow:

  1. Add a submodule:
    git submodule add https://github.com/user/repo.git external/repo
  2. Clone the repository and initialize submodules:
    git clone --recurse-submodules <repository-url>
  3. Update the submodule:
    cd external/repo
    git pull origin main
    cd ../..
    git add external/repo
    git commit -m "Updated submodule"

Subtree Workflow:

  1. Add a subtree:
    git subtree add --prefix=external/repo https://github.com/user/repo.git main --squash
  2. Pull changes from the subtree:
    git subtree pull --prefix=external/repo https://github.com/user/repo.git main --squash
  3. Push changes to the subtree:
    git subtree push --prefix=external/repo https://github.com/user/repo.git main

Conclusion

Git submodules and subtrees are powerful tools for managing external repositories within your project. Here’s a quick summary:

  • Submodules: Use when you want to keep the external repository’s history separate and track specific commits.
  • Subtrees: Use when you want to integrate the external repository’s history into your main repository and treat it as part of your project.

Git LFS, GC, Prune, and Bash Integration

In this section, we’ll cover Git Large File Storage (LFS), garbage collection (git gc), pruning (git prune), and how to integrate Git with Bash scripting for automation. These tools and techniques are essential for managing large repositories, optimizing performance, and automating repetitive tasks.


1. Git Large File Storage (LFS)

Git LFS is an extension for Git that handles large files more efficiently. Instead of storing large files directly in the repository, Git LFS stores pointers to the files and uploads the actual files to a separate storage server.
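
As a rough illustration of what such a pointer looks like, the sketch below builds one by hand for a small stand-in file. This is an approximation for reading purposes only: real pointers are generated by git lfs itself, and the file name, its contents, and the sha256_of helper are assumptions introduced here.

```shell
#!/bin/bash
set -euo pipefail

# Portable SHA-256 helper (hypothetical name): sha256sum on Linux, shasum on macOS.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

work=$(mktemp -d)
asset="$work/design.psd"
printf 'pretend this is a large binary file\n' > "$asset"

# Build the three-line pointer text that stands in for the real file content.
oid=$(sha256_of "$asset")
size=$(wc -c < "$asset" | tr -d ' ')
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")

echo "$pointer"
```

The repository only ever stores these few lines per file version; the content identified by the oid lives on the LFS server.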

Why Use Git LFS?

  • Git is not optimized for large files (e.g., binaries, videos, datasets).
  • Large files can bloat the repository and slow down operations like cloning and fetching.
  • Git LFS keeps your repository lightweight by offloading large files to external storage.

Installing Git LFS

  1. Download and install Git LFS from git-lfs.com.
  2. Set up Git LFS for your user account (this only needs to be run once per machine):
    git lfs install

Tracking Large Files

To track specific file types with Git LFS:

git lfs track "*.psd"
git lfs track "*.mp4"

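This creates (or updates) a .gitattributes file that lists the patterns managed by Git LFS. Commit .gitattributes so that everyone who clones the repository uses the same tracking rules.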
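
To see how Git interprets such an entry, the following sketch writes the line that git lfs track "*.psd" produces into a throwaway repository and asks Git which filter applies to two file names. The line is written by hand so the sketch also runs without Git LFS installed; design.psd and notes.txt are hypothetical names and the files do not need to exist.

```shell
#!/bin/bash
set -euo pipefail

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# The line that `git lfs track "*.psd"` adds to .gitattributes.
printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter attribute applies to each path.
psd_filter=$(git check-attr filter -- design.psd)
txt_filter=$(git check-attr filter -- notes.txt)

echo "$psd_filter"   # design.psd: filter: lfs
echo "$txt_filter"   # notes.txt: filter: unspecified
```

Running git check-attr like this is a quick way to confirm that a pattern matches the files you intended before committing them.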

Adding and Committing Large Files

Once files are tracked, add and commit them as usual:

git add file.psd
git commit -m "Added large design file"
git push origin main

Cloning a Repository with Git LFS

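When you clone a repository that uses Git LFS, and Git LFS is installed on your machine, the large files are downloaded automatically (without Git LFS installed, you get only the small pointer files):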

git clone <repository-url>

Managing Git LFS

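  • List the files currently stored in Git LFS:
    git lfs ls-files
  • Stop tracking a pattern (this only removes the pattern from .gitattributes; files already committed through LFS stay in LFS in the existing history):
    git lfs untrack "*.psd"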

2. Git Garbage Collection (git gc)

Git garbage collection (git gc) is a maintenance command that cleans up unnecessary files and optimizes your repository. It packs loose objects into compressed pack files and removes unreachable objects (e.g., orphaned commits) once they are older than a grace period (two weeks by default).

Running Git Garbage Collection

git gc

This command:

  • Removes unreachable objects.
  • Compresses file revisions.
  • Optimizes the repository for better performance.
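
The effect can be observed with git count-objects. The sketch below, run in a throwaway repository with a demo-only identity and a hypothetical notes.txt file, creates a few commits and compares the number of loose objects before and after git gc.

```shell
#!/bin/bash
set -euo pipefail

# Demo-only identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Each commit creates new loose objects (a blob, a tree, and a commit).
for i in 1 2 3; do
  echo "revision $i" > notes.txt
  git add notes.txt
  git commit -q -m "Revision $i"
done

loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc --quiet
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose objects before gc: $loose_before"
echo "loose objects after gc:  $loose_after (packed: $packed_after)"
```

Because every object in this demo is reachable, git gc moves all of them into a pack file and leaves no loose objects behind.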

Automating Git Garbage Collection

Git runs git gc --auto after some commands once the repository has accumulated enough loose objects. You can configure this behavior:

  • Set the number of loose objects that triggers automatic garbage collection (the default is 6700):
    git config gc.auto 1000
  • Disable automatic garbage collection:
    git config gc.auto 0

3. Git Prune (git prune)

git prune removes unreachable loose objects (e.g., orphaned commits) from the repository. It is normally run for you as part of git gc, so you rarely need to call it directly.

Running Git Prune

git prune

This removes loose objects that are no longer reachable from any branch, tag, or reflog entry.
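
A minimal sketch of this behavior, assuming a throwaway repository and a demo-only identity: it writes a blob that nothing references, previews the prune with --dry-run, and then removes the object. The --expire=now option is spelled out so the sketch does not depend on the object's age.

```shell
#!/bin/bash
set -euo pipefail

# Demo-only identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "Initial commit"

# Store a blob that no commit, tree, or ref points to.
orphan=$(echo "never committed" | git hash-object -w --stdin)

# Preview what would be removed, then actually prune.
would_remove=$(git prune --dry-run --expire=now)
git prune --expire=now

if git cat-file -e "$orphan" 2>/dev/null; then
  state="present"
else
  state="pruned"
fi
echo "would remove: $would_remove"
echo "orphan blob is now: $state"
```

The dry run is worth keeping in real use, since pruned objects cannot be recovered afterwards.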

Pruning Remote References

To prune stale remote-tracking branches:

git remote prune origin

Combining Prune with Fetch

You can prune stale references while fetching updates:

git fetch --prune
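
The following sketch shows the effect with two throwaway local repositories standing in for the remote and your clone. The directory names, the branch name feature, and the demo identity are assumptions for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Demo-only identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# "origin" has an extra branch at the time of cloning.
git init -q "$work/origin"
git -C "$work/origin" commit -q --allow-empty -m "Initial commit"
git -C "$work/origin" branch feature
git clone -q "$work/origin" "$work/clone"

# The branch is later deleted on the remote side.
git -C "$work/origin" branch -q -D feature

# The clone still remembers origin/feature until it fetches with --prune.
before=$(git -C "$work/clone" branch -r)
git -C "$work/clone" fetch -q --prune
after=$(git -C "$work/clone" branch -r)

echo "before:"; echo "$before"
echo "after:";  echo "$after"
```

Only the remote-tracking reference is removed; any local branch you created from it is left untouched.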

4. Bash Integration with Git

Bash scripting can automate repetitive Git tasks, such as committing, pushing, or cleaning up branches. Below are some examples of how to integrate Git with Bash.

Example 1: Automating Commits

Create a Bash script to automate committing changes:

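#!/bin/bash
# auto-commit.sh: stage, commit, and push all changes in the current directory
set -euo pipefail

# Add all changes
git add .

# Stop early if nothing is staged, so that git commit does not fail
if git diff --cached --quiet; then
  echo "Nothing to commit."
  exit 0
fi

# Commit with a timestamp
timestamp=$(date +"%Y-%m-%d %H:%M:%S")
git commit -m "Auto-commit at $timestamp"

# Push to remote
git push origin main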

Run the script:

bash auto-commit.sh

Example 2: Cleaning Up Merged Branches

Create a Bash script to delete merged branches:

#!/bin/bash
# cleanup-branches.sh

# Fetch latest changes
git fetch --prune

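# Delete local branches already merged into main,
# skipping main itself and the currently checked-out branch
current=$(git branch --show-current)
git branch --merged main --format='%(refname:short)' |
  grep -vxF -e "main" -e "$current" |
  while read -r branch; do
    git branch -d "$branch"
  done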

Run the script:

bash cleanup-branches.sh
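
Before deleting anything, it can help to preview which branches Git considers merged. The sketch below sets up a throwaway repository with one merged and one unmerged branch and lists the deletion candidates without removing them. The branch names feature-done and feature-wip and the demo identity are assumptions for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Demo-only identity so commits work on a machine without Git configured.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "Initial commit"

# feature-done points at a commit already contained in main.
git branch feature-done

# feature-wip has a commit that main does not contain.
git checkout -q -b feature-wip
git commit -q --allow-empty -m "Work in progress"
git checkout -q main

# List merged branches by exact name, excluding main itself.
merged=$(git branch --merged main --format='%(refname:short)' | grep -vx 'main' || true)
echo "Branches safe to delete: $merged"
```

Matching the exact branch name with grep -x avoids accidentally skipping branches whose names merely contain "main".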

Example 3: Bulk Cloning Repositories

Create a Bash script to clone multiple repositories:

#!/bin/bash
# bulk-clone.sh

repos=(
  "https://github.com/user/repo1.git"
  "https://github.com/user/repo2.git"
  "https://github.com/user/repo3.git"
)

for repo in "${repos[@]}"; do
  git clone "$repo"
done

Run the script:

bash bulk-clone.sh

Example 4: Git Status Notifier

Create a Bash script to notify you of uncommitted changes:

#!/bin/bash
# git-status-notifier.sh

if [[ $(git status --porcelain) ]]; then
  echo "There are uncommitted changes!"
else
  echo "Working directory is clean."
fi

Run the script:

bash git-status-notifier.sh

5. Best Practices for Git LFS, GC, Prune, and Bash Integration

For Git LFS:

  • Use Git LFS for large binary files. GitHub, for example, blocks files larger than 100 MB from being pushed to a regular Git repository.
  • Regularly review which files are stored in LFS using git lfs ls-files, and confirm that the large files you expect to be there are listed.
  • Document Git LFS usage in your repository’s README.

For Git GC and Prune:

  • Run git gc periodically to optimize your repository.
  • Use git fetch --prune to clean up stale remote references.
  • Avoid running git prune manually unless necessary.

For Bash Scripting:

  • Use descriptive names for scripts.
  • Add comments to explain the purpose of each script.
  • Test scripts thoroughly before using them in production.
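
One way to make scripts safer to test is to have them refuse to run outside a Git working tree. The sketch below shows a small guard that could sit at the top of a script; require_git_repo is a hypothetical helper name, and the temporary directory only exists to demonstrate both outcomes.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: succeeds only if the given directory is inside a Git working tree.
require_git_repo() {
  git -C "${1:-.}" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

demo_dir=$(mktemp -d)

# Before git init, the directory is not a repository.
if require_git_repo "$demo_dir"; then outside="repo"; else outside="not a repo"; fi

# After git init, the same check succeeds.
git init -q "$demo_dir"
if require_git_repo "$demo_dir"; then inside="repo"; else inside="not a repo"; fi

echo "before init: $outside, after init: $inside"
```

In a real script, the guard would typically be followed by an error message and exit 1 when the check fails, so that commands such as git add or git push never run in the wrong directory.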

Conclusion

By combining Git LFS, garbage collection, pruning, and Bash scripting, you can manage large repositories, optimize performance, and automate repetitive tasks. Here’s a quick summary:

  • Git LFS: Handles large files efficiently by storing them externally.
  • Git GC: Cleans up and optimizes your repository.
  • Git Prune: git prune removes unreachable objects, while git fetch --prune and git remote prune remove stale remote-tracking references.
  • Bash Scripting: Automates Git workflows for efficiency.