Extension Fixing - Xentraxx/GooglePhotosTakeoutHelper GitHub Wiki

Extension Fixing

Overview

The Extension Fixing feature in Google Photos Takeout Helper (GPTH) identifies and corrects files whose extension doesn't match their actual content. This is a common issue in Google Photos exports, where files may have been compressed, converted, or incorrectly named on Google's servers or during the export process.

Common Problems Addressed

Why Extension Fixing is Needed

Google Photos Takeout often contains files with mismatched extensions for several reasons:

  • Compression side effects: Images recompressed to JPEG but keeping their original extensions (e.g., .png, .heic)
  • Export inconsistencies: HEIF files exported with .jpeg extensions
  • Web downloads: Images downloaded from the web with generic or incorrect extensions
  • Processing chains: Files processed through photo editing tools that change the format but not the extension
  • Legacy compatibility: AVI files renamed to .mp4 during export for compatibility reasons

ExifTool Compatibility Requirements

Extension fixing is essential for proper ExifTool operation, which is the core metadata processing engine used by GPTH. ExifTool relies heavily on file extensions to determine:

  • File format expectations: ExifTool uses the extension to select appropriate parsers and processing methods
  • Metadata writing capabilities: Different formats support different metadata standards (EXIF, XMP, IPTC)
  • Safety protocols: ExifTool applies format-specific safety checks based on the expected file type

When file extensions don't match content, ExifTool can:

  • Fail to write metadata: Refuse to process files with mismatched extensions for safety
  • Apply wrong processing: Use incorrect parsers leading to corrupted metadata
  • Generate errors: Report extension/MIME type conflicts that halt processing

For example, a JPEG file with a .png extension may cause ExifTool to report messages like the following (exact wording depends on the ExifTool version):

# ExifTool error when extensions don't match content
Error: File format error - PNG signature not found
# or
Warning: [minor] Possibly incorrect file extension

GPTH's extension fixing ensures ExifTool can reliably:

  • Write EXIF timestamps to correct photo dates
  • Preserve existing metadata during processing
  • Apply appropriate format-specific optimizations
  • Avoid corruption from mismatched processing expectations

Real-World Examples

❌ Problematic files:
- actual-jpeg-content.png    (JPEG content with PNG extension)
- compressed-image.heic      (JPEG content with HEIC extension)
- video-file.mp4             (AVI content with MP4 extension)

✅ After extension fixing:
- actual-jpeg-content.png.jpg
- compressed-image.heic.jpg  
- video-file.mp4.avi

How Extension Fixing Works

Detection Process

  1. Header Analysis: Reads the first 128 bytes of each file to determine its actual MIME type
  2. Extension Comparison: Compares the detected MIME type with the MIME type implied by the file extension
  3. Safety Checks: Applies safety filters to avoid corrupting specific file types
  4. Atomic Renaming: Renames the media file and its associated JSON metadata file as one operation, rolling back if either rename fails
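The header analysis and extension comparison above can be sketched in Python. This is a minimal illustration, not GPTH's actual code: the magic-number checks cover only a handful of formats, and the `EXT_TO_MIME` table and the `sniff_mime` and `has_mismatch` helpers are hypothetical names.

```python
from pathlib import Path
from typing import Optional

HEADER_SIZE = 128  # bytes read from the start of each file, as described above

# Hypothetical extension -> MIME table (a small subset, for illustration only).
EXT_TO_MIME = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".heic": "image/heic",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
    ".mp4": "video/mp4",
    ".avi": "video/x-msvideo",
}


def sniff_mime(header: bytes) -> Optional[str]:
    """Guess the MIME type from well-known magic numbers at the file start."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "image/tiff"  # also matches TIFF-based RAW files (.CR2, .NEF, ...)
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "video/x-msvideo"
    if header[4:8] == b"ftyp":  # ISO base media file (HEIC, MP4, ...)
        if header[8:12] in (b"heic", b"heix"):
            return "image/heic"
        return "video/mp4"  # simplification: other brands treated as MP4
    return None  # unknown content: make no decision


def has_mismatch(path: Path) -> bool:
    """True if the content-based and extension-based MIME types disagree."""
    with path.open("rb") as f:
        actual = sniff_mime(f.read(HEADER_SIZE))
    expected = EXT_TO_MIME.get(path.suffix.lower())
    # Only report a mismatch when both sides are known.
    return actual is not None and expected is not None and actual != expected
```

With this sketch, `actual-jpeg-content.png` is flagged because its header starts with the JPEG marker `FF D8 FF` while `.png` maps to `image/png`. Unknown formats are left alone rather than guessed at.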

Safety Features

The extension fixing process includes several safety mechanisms:

  • TIFF-based Protection: Skips TIFF-based files (RAW formats like .CR2, .NEF, .ARW, .DNG), because header analysis often misidentifies them as plain TIFF images
  • Conservative Mode: Optional mode that additionally skips files whose content is actually JPEG, for maximum safety
  • Atomic Operations: Either both media and JSON files are renamed successfully, or the operation is rolled back
  • Collision Detection: Validates that target files don't already exist before renaming
  • Content Preservation: Only changes file extensions while preserving file content
  • Metadata Association: Automatically maintains JSON metadata file associations

Configuration Modes

Available Modes

The extension fixing feature supports four different modes via the --fix-extensions flag:

1. none - No Extension Fixing

gpth input_folder output_folder --fix-extensions none
  • Behavior: No extension fixing performed
  • Risk: Files may keep incorrect extensions, which can make them unopenable in some applications and cause ExifTool errors
  • Use Case: When file extensions are already correct or user wants to handle manually
  • Performance: Fastest (skips extension analysis)

2. standard - Standard Mode (Default)

gpth input_folder output_folder --fix-extensions standard
# or simply (default):
gpth input_folder output_folder
  • Behavior: Fixes extensions but protects TIFF-based files
  • Protected Types: .CR2, .NEF, .ARW, .DNG, .TIFF, .TIF
  • Fixed Types: Common formats such as JPEG, PNG, MP4, and AVI
  • Safety: Prevents corruption of complex RAW file formats
  • Use Case: Recommended for most users with mixed file types

3. conservative - Conservative Mode

gpth input_folder output_folder --fix-extensions conservative
  • Behavior: Fixes extensions but protects both TIFF-based and JPEG files
  • Protected Types: All TIFF formats plus .JPG, .JPEG
  • Fixed Types: Only non-image formats and clearly safe image formats
  • Safety: Maximum protection against metadata loss in JPEG files
  • Use Case: When preserving original JPEG metadata is critical

4. solo - Solo Mode

gpth input_folder output_folder --fix-extensions solo
  • Behavior: Performs extension fixing, then exits immediately
  • Use Case: Running extension fixing as a separate step, so the renamed files can be inspected before full processing
  • Output: Reports on extension changes without performing other processing
  • Safety: Allows users to verify extension changes before full processing
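The per-mode protection rules can be expressed as a small filter. This sketch assumes protection is decided from either the file extension or the detected MIME type, and that solo mode applies the standard rules; the modes above do not pin down either point, and `is_protected` is a hypothetical name.

```python
from pathlib import Path

# Protected extensions as listed for standard and conservative modes (lowercase).
TIFF_BASED_EXTS = {".cr2", ".nef", ".arw", ".dng", ".tif", ".tiff"}
JPEG_EXTS = {".jpg", ".jpeg"}
MODES = {"none", "standard", "conservative", "solo"}


def is_protected(path: Path, actual_mime: str, mode: str) -> bool:
    """Return True if the file must keep its current extension in this mode."""
    if mode not in MODES:
        raise ValueError(f"unknown --fix-extensions mode: {mode!r}")
    if mode == "none":
        return True  # no extension fixing at all
    ext = path.suffix.lower()
    # Standard, conservative and (assumed) solo all protect TIFF-based files.
    if ext in TIFF_BASED_EXTS or actual_mime == "image/tiff":
        return True
    # Conservative additionally protects JPEG files.
    if mode == "conservative" and (ext in JPEG_EXTS or actual_mime == "image/jpeg"):
        return True
    return False
```

For example, `is_protected(Path("IMG_0001.CR2"), "image/tiff", "standard")` returns `True`, while `is_protected(Path("actual-jpeg-content.png"), "image/jpeg", "standard")` returns `False`, so that file would be renamed. In conservative mode the second call returns `True` and the file is left untouched.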

Interactive Mode

When running GPTH in interactive mode (without command-line arguments), users are prompted to choose extension fixing options:

Do you want to fix these mismatched extensions?

[1] (default) - Standard: Fix extensions but skip TIFF-based files
[2] - Conservative: Fix extensions but skip both TIFF-based and JPEG files  
[3] - Solo: Fix extensions then exit immediately
[4] - None: Don't fix extensions

Enter your choice (1-4):
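Each menu entry corresponds to one `--fix-extensions` value. The sketch below shows that correspondence, assuming an empty answer selects the default option; `menu_choice_to_args` is a hypothetical helper, not part of GPTH.

```python
from typing import List

# Menu numbers from the prompt mapped to --fix-extensions values.
MENU = {"1": "standard", "2": "conservative", "3": "solo", "4": "none"}
DEFAULT_CHOICE = "1"  # "[1] (default)" in the prompt


def menu_choice_to_args(raw: str) -> List[str]:
    """Translate an interactive answer into the equivalent CLI arguments."""
    choice = raw.strip() or DEFAULT_CHOICE  # assumed: empty input picks the default
    if choice not in MENU:
        raise ValueError(f"expected a choice from 1-4, got {raw!r}")
    return ["--fix-extensions", MENU[choice]]
```

So answering `2` is equivalent to passing `--fix-extensions conservative` on the command line.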

Technical Implementation

File Processing Algorithm

// Simplified processing flow
1. Read file header (first 128 bytes)
2. Determine actual MIME type from content
3. Get expected MIME type from file extension
4. Apply safety filters (TIFF, JPEG if conservative)
5. If mismatch detected:
   a. Find associated JSON metadata file
   b. Determine correct extension for MIME type
   c. Check target files don't exist
   d. Perform atomic rename of both files
   e. Verify cleanup of original files
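Steps c and d of the flow above can be sketched as an all-or-nothing rename. This minimal Python version assumes the JSON sidecar is named `<media name>.json` and is renamed to `<new media name>.json`; real Takeout exports use other sidecar names too, and `fix_extension` is a hypothetical helper. It checks for collisions explicitly because `Path.rename` silently replaces an existing file on POSIX systems.

```python
from pathlib import Path


def fix_extension(media: Path, new_ext: str) -> Path:
    """Append new_ext (e.g. ".jpg") to the media file and its JSON sidecar.

    Either both files are renamed, or the media rename is rolled back.
    """
    json_file = media.with_name(media.name + ".json")  # assumed sidecar naming
    new_media = media.with_name(media.name + new_ext)  # photo.png -> photo.png.jpg
    new_json = new_media.with_name(new_media.name + ".json")

    # Collision detection: never overwrite an existing target.
    # (A sketch: the check-then-rename gap is not protected against races.)
    if new_media.exists() or (json_file.exists() and new_json.exists()):
        raise FileExistsError(f"target file already exists: {new_media.name}")

    media.rename(new_media)
    if json_file.exists():
        try:
            json_file.rename(new_json)
        except OSError:
            new_media.rename(media)  # roll back the media rename
            raise
    # If no sidecar exists, only the media file is renamed.
    return new_media
```

Appending the extension (rather than replacing it) keeps the original name visible, matching the `photo.png.jpg` naming shown earlier.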

ExifTool Integration

Extension fixing is implemented as Step 1 in GPTH's processing pipeline, running before any ExifTool operations to ensure compatibility:

Processing Pipeline Order:

  1. Extension Fixing ← Corrects file extensions first
  2. Date Extraction (may use ExifTool)
  3. EXIF Writing (requires ExifTool)
  4. File Moving and Organization
  5. Additional metadata operations

ExifTool Error Prevention: The extension fixing step specifically prevents common ExifTool errors:

# Without extension fixing - ExifTool may fail:
$ exiftool -overwrite_original -DateTimeOriginal="2023:01:15 14:30:00" photo.png
Error: File format error - JPEG signature found, but file has PNG extension

# After extension fixing - ExifTool works correctly:
$ exiftool -overwrite_original -DateTimeOriginal="2023:01:15 14:30:00" photo.png.jpg
1 image files updated

When EXIF Writing Detects Issues: If extension fixing is disabled and EXIF writing encounters mismatches, GPTH will suggest enabling extension fixing:

Error: Extension/MIME type mismatch detected for file photo.png
Suggestion: Consider using --fix-extensions to correct file extensions before EXIF processing

Command Line Examples

Basic Usage

# Use default standard mode
gpth "/path/to/takeout" "/path/to/output"

# Explicit standard mode
gpth "/path/to/takeout" "/path/to/output" --fix-extensions standard

# Conservative mode for sensitive collections
gpth "/path/to/takeout" "/path/to/output" --fix-extensions conservative

# Fix extensions only, then exit (inspect the results before full processing)
gpth "/path/to/takeout" "/path/to/output" --fix-extensions solo

Combined with Other Features

# Full processing with extension fixing
gpth "/path/to/takeout" "/path/to/output" \
  --fix-extensions standard \
  --write-exif true \
  --update-creation-time \
  --verbose

# Conservative processing for valuable originals
gpth "/path/to/takeout" "/path/to/output" \
  --fix-extensions conservative \
  --write-exif false \
  --skip-extras true

Error Handling and Troubleshooting

Common Issues

Target File Already Exists

Warning: Skipped fixing extension because target file already exists: photo.png.jpg

Solution: Check for existing files with corrected extensions and resolve manually if needed.

Permission Errors

Error: Failed to process file photo.png: Permission denied

Solution: Ensure GPTH has write permissions to the input directory.

JSON File Not Found

Warning: Unable to find matching JSON for file: photo.png

Impact: The media file will still be renamed, but its JSON association may be lost.
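A lookup that falls back gracefully might look like the sketch below. The two candidate names (`photo.png.json` and `photo.png.supplemental-metadata.json`) are assumptions for illustration; GPTH's actual matching rules are not described here, and `find_json` is a hypothetical helper. Returning `None` lets the caller rename the media file on its own, which is the behavior described above.

```python
from pathlib import Path
from typing import Optional


def find_json(media: Path) -> Optional[Path]:
    """Return the first existing sidecar candidate, or None with a warning."""
    # Assumed candidate names only; real exports have more variants.
    candidates = [
        media.with_name(media.name + ".json"),
        media.with_name(media.name + ".supplemental-metadata.json"),
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    print(f"Warning: Unable to find matching JSON for file: {media.name}")
    return None
```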

Rollback Mechanism

If extension fixing fails partway through:

Error: Extension fixing failed, attempting rollback: JSON file does not exist after rename
Info: Rolled back media file rename: /path/to/original/photo.png

The system automatically attempts to restore the original state.

Best Practices

When to Use Each Mode

Use standard mode when:

  • Processing general Google Photos exports
  • You have mixed file types including some RAW photos
  • You want good protection with reasonable correction coverage

Use conservative mode when:

  • Working with valuable original photos
  • You have many JPEG files with critical metadata
  • Maximum safety is more important than correction coverage

Use solo mode when:

  • You want to review the extension changes before running full processing
  • Diagnosing extension issues before full processing
  • Running extension fixing separately from other operations

Use none mode when:

  • File extensions are already correct
  • You want to handle extension issues manually
  • Maximum processing speed is required