Extension Fixing - Xentraxx/GooglePhotosTakeoutHelper GitHub Wiki
Extension Fixing
Overview
The Extension Fixing feature in Google Photos Takeout Helper (GPTH) identifies and corrects files whose extension doesn't match the actual file content. This is a common issue in Google Photos exports, where files may have been compressed, converted, or incorrectly named on Google's servers or during the export process.
Common Problems Addressed
Why Extension Fixing is Needed
Google Photos Takeout often contains files with mismatched extensions due to various reasons:
- Compression artifacts: Images compressed to JPEG format but retaining original extensions (e.g., .png, .heic)
- Export inconsistencies: HEIF files exported with .jpeg extensions
- Web downloads: Images downloaded from the web with generic or incorrect extensions
- Processing chains: Files processed through various photo editing tools that change format but not extension
- Legacy compatibility: Files renamed from AVI to .mp4 during export for compatibility reasons
ExifTool Compatibility Requirements
Extension fixing is essential for reliable operation of ExifTool, the core metadata processing engine used by GPTH. ExifTool relies heavily on file extensions to determine:
- File format expectations: ExifTool uses the extension to select appropriate parsers and processing methods
- Metadata writing capabilities: Different formats support different metadata standards (EXIF, XMP, IPTC)
- Safety protocols: ExifTool applies format-specific safety checks based on the expected file type
When file extensions don't match content, ExifTool can:
- Fail to write metadata: Refuse to process files with mismatched extensions for safety
- Apply wrong processing: Use incorrect parsers leading to corrupted metadata
- Generate errors: Report extension/MIME type conflicts that halt processing
For example, a JPEG file with a .png extension may cause ExifTool to report:
# ExifTool error when extensions don't match content
Error: File format error - PNG signature not found
# or
Warning: [minor] Possibly incorrect file extension
GPTH's extension fixing ensures ExifTool can reliably:
- Write EXIF timestamps to correct photo dates
- Preserve existing metadata during processing
- Apply appropriate format-specific optimizations
- Avoid corruption from mismatched processing expectations
Real-World Examples
❌ Problematic files:
- actual-jpeg-content.png (JPEG content with PNG extension)
- compressed-image.heic (JPEG content with HEIC extension)
- video-file.mp4 (AVI content with MP4 extension)
✅ After extension fixing:
- actual-jpeg-content.png.jpg
- compressed-image.heic.jpg
- video-file.mp4.avi
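As the examples show, the corrected extension is appended to the existing file name rather than replacing the old extension. A minimal Python sketch of that naming convention follows; the function name `fixed_name` is hypothetical and is not part of GPTH, which is not written in Python.

```python
from pathlib import Path

def fixed_name(path: str, correct_ext: str) -> str:
    """Append the corrected extension, keeping the original name intact."""
    p = Path(path)
    # with_name swaps only the final path component, so directories are preserved.
    return str(p.with_name(p.name + "." + correct_ext.lstrip(".")))

print(fixed_name("actual-jpeg-content.png", "jpg"))  # actual-jpeg-content.png.jpg
```

Keeping the original name as a prefix makes it easy to see what a file was called before the fix.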
How Extension Fixing Works
Detection Process
- Header Analysis: Reads the first 128 bytes of each file to determine actual MIME type
- Extension Comparison: Compares actual MIME type with extension-based MIME type detection
- Safety Checks: Applies safety filters to prevent corruption of specific file types
- Atomic Renaming: Renames the media file and its associated JSON metadata file together, as a single all-or-nothing operation
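The first two steps can be illustrated with a small Python sketch. It is not GPTH's implementation: the signature checks and the `EXT_TO_MIME` table are deliberately tiny, hypothetical stand-ins for a full MIME database, and the names `sniff_mime`, `has_mismatch`, and `read_header` are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension table for illustration; a real tool covers far more types.
EXT_TO_MIME = {
    ".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
    ".heic": "image/heic", ".mp4": "video/mp4", ".avi": "video/x-msvideo",
}

def sniff_mime(header: bytes) -> Optional[str]:
    """Guess the MIME type from well-known magic bytes at the start of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "video/x-msvideo"
    if header[4:8] == b"ftyp":
        # ISO base media container; the brand follows. Treating every
        # non-HEIC brand as MP4 is a simplification for this sketch.
        return "image/heic" if header[8:12] in (b"heic", b"heix") else "video/mp4"
    return None  # unknown signature

def has_mismatch(path: str, header: bytes) -> bool:
    """True when the sniffed type is known and disagrees with the extension."""
    actual = sniff_mime(header[:128])
    expected = EXT_TO_MIME.get(Path(path).suffix.lower())
    return actual is not None and actual != expected

def read_header(path: str, size: int = 128) -> bytes:
    """Read only the first `size` bytes, as in the Header Analysis step."""
    with open(path, "rb") as f:
        return f.read(size)

jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
print(has_mismatch("photo.png", jpeg_header))  # True
print(has_mismatch("photo.jpg", jpeg_header))  # False
```

Reading only a short header keeps the check cheap even for large video files, since the rest of the file is never loaded.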
Safety Features
The extension fixing process includes several safety mechanisms:
- TIFF-based Protection: Skips TIFF-based files (RAW formats like .CR2, .NEF, .ARW, .DNG), because these formats are built on the TIFF container and content detection often misidentifies them
- Conservative Mode: Optional mode that also skips actual JPEG files for maximum safety
- Atomic Operations: Either both media and JSON files are renamed successfully, or the operation is rolled back
- Collision Detection: Validates that target files don't already exist before renaming
- Content Preservation: Only changes file extensions while preserving file content
- Metadata Association: Automatically maintains JSON metadata file associations
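The two skip rules (TIFF-based protection and the optional conservative handling of JPEG files) could be expressed as a single predicate. The Python sketch below is an illustration under assumptions, not GPTH's code: it assumes TIFF protection applies both to the listed extensions and to content detected as `image/tiff`, and that conservative mode skips files whose detected content is JPEG. The name `should_skip` is hypothetical.

```python
from pathlib import Path

# Extensions treated as TIFF-based in this sketch (assumed list).
TIFF_BASED_EXTS = {".cr2", ".nef", ".arw", ".dng", ".tiff", ".tif"}

def should_skip(path: str, actual_mime: str, mode: str) -> bool:
    """Return True when the file must be left untouched under the given mode."""
    if mode == "none":
        return True  # extension fixing disabled entirely
    ext = Path(path).suffix.lower()
    if ext in TIFF_BASED_EXTS or actual_mime == "image/tiff":
        return True  # RAW and TIFF files are never renamed
    if mode == "conservative" and actual_mime == "image/jpeg":
        return True  # conservative mode also leaves JPEG content alone
    return False

print(should_skip("IMG_0001.CR2", "image/tiff", "standard"))    # True
print(should_skip("photo.png", "image/jpeg", "standard"))       # False
print(should_skip("photo.png", "image/jpeg", "conservative"))   # True
```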
Configuration Modes
Available Modes
The extension fixing feature supports four different modes via the --fix-extensions flag:
none - No Extension Fixing
gpth input_folder output_folder --fix-extensions none
- Behavior: No extension fixing performed
- Risk: Files may have incorrect extensions making them unopenable
- Use Case: When file extensions are already correct or user wants to handle manually
- Performance: Fastest (skips extension analysis)
standard - Standard Mode (Default)
gpth input_folder output_folder --fix-extensions standard
# or simply (default):
gpth input_folder output_folder
- Behavior: Fixes extensions but protects TIFF-based files
- Protected Types: .CR2, .NEF, .ARW, .DNG, .TIFF, .TIF
- Fixed Types: Common formats like JPEG, PNG, MP4, AVI, etc.
- Safety: Prevents corruption of complex RAW file formats
- Use Case: Recommended for most users with mixed file types
conservative - Conservative Mode
gpth input_folder output_folder --fix-extensions conservative
- Behavior: Fixes extensions but protects both TIFF-based and JPEG files
- Protected Types: All TIFF formats plus .JPG, .JPEG
- Fixed Types: Only non-image formats and clearly safe image formats
- Safety: Maximum protection against metadata loss in JPEG files
- Use Case: When preserving original JPEG metadata is critical
solo - Solo Mode
gpth input_folder output_folder --fix-extensions solo
- Behavior: Performs extension fixing then immediately exits
- Use Case: Run extension fixing on its own to see which extensions get changed, without any other processing
- Output: Reports on extension changes without performing other processing
- Safety: Allows users to verify extension changes before full processing
Interactive Mode
When running GPTH in interactive mode (without command-line arguments), users are prompted to choose extension fixing options:
Do you want to fix these mismatched extensions?
[1] (default) - Standard: Fix extensions but skip TIFF-based files
[2] - Conservative: Fix extensions but skip both TIFF-based and JPEG files
[3] - Solo: Fix extensions then exit immediately
[4] - None: Don't fix extensions
Enter your choice (1-4):
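The prompt's numbering and default can be modelled as a small lookup. The Python sketch below is illustrative only: `MODES` and `parse_choice` are hypothetical names, and GPTH's actual prompt handling may differ, for example in how it treats invalid input.

```python
from typing import Optional

# Menu numbers mapped to --fix-extensions values, following the prompt text.
MODES = {"1": "standard", "2": "conservative", "3": "solo", "4": "none"}

def parse_choice(raw: str) -> Optional[str]:
    """Map the typed answer to a mode; empty input selects the default [1]."""
    answer = raw.strip() or "1"
    return MODES.get(answer)  # None signals an invalid answer

print(parse_choice(""))   # standard
print(parse_choice("3"))  # solo
```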
Technical Implementation
File Processing Algorithm
// Simplified processing flow
1. Read file header (first 128 bytes)
2. Determine actual MIME type from content
3. Get expected MIME type from file extension
4. Apply safety filters (TIFF, JPEG if conservative)
5. If mismatch detected:
   a. Find associated JSON metadata file
   b. Determine correct extension for MIME type
   c. Check target files don't exist
   d. Perform atomic rename of both files
   e. Verify cleanup of original files
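Steps 5a to 5c can be sketched as a planning function that produces the list of renames without touching the file system. This is a hedged Python illustration, not GPTH's code: the `MIME_TO_EXT` table, the function name `plan_rename`, and the assumption that the JSON sidecar is named `<media file name>.json` are all hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical MIME-to-extension table; the real mapping may differ.
MIME_TO_EXT = {
    "image/jpeg": ".jpg", "image/png": ".png",
    "video/mp4": ".mp4", "video/x-msvideo": ".avi",
}

def plan_rename(media: Path, actual_mime: str) -> Optional[List[Tuple[Path, Path]]]:
    """Build (source, target) pairs, or return None if the fix must be skipped."""
    new_ext = MIME_TO_EXT.get(actual_mime)  # step 5b
    if new_ext is None:
        return None  # unknown type: no safe extension to apply
    pairs = [(media, media.with_name(media.name + new_ext))]
    sidecar = media.with_name(media.name + ".json")  # step 5a, assumed naming
    if sidecar.exists():
        pairs.append((sidecar, media.with_name(media.name + new_ext + ".json")))
    if any(target.exists() for _, target in pairs):  # step 5c
        return None  # collision: a target file already exists
    return pairs
```

Separating planning from execution means the collision check covers both files before either one is renamed.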
ExifTool Integration
Extension fixing is implemented as Step 1 in GPTH's processing pipeline, running before any ExifTool operations to ensure compatibility:
Processing Pipeline Order:
- Extension Fixing ← Corrects file extensions first
- Date Extraction (may use ExifTool)
- EXIF Writing (requires ExifTool)
- File Moving and Organization
- Additional metadata operations
ExifTool Error Prevention: The extension fixing step specifically prevents common ExifTool errors:
# Without extension fixing - ExifTool may fail:
$ exiftool -overwrite_original -DateTimeOriginal="2023:01:15 14:30:00" photo.png
Error: Not a valid PNG (looks more like a JPEG) - photo.png
# After extension fixing - ExifTool works correctly:
$ exiftool -overwrite_original -DateTimeOriginal="2023:01:15 14:30:00" photo.png.jpg
1 image files updated
When EXIF Writing Detects Issues: If extension fixing is disabled and EXIF writing encounters mismatches, GPTH will suggest enabling extension fixing:
Error: Extension/MIME type mismatch detected for file photo.png
Suggestion: Consider using --fix-extensions to correct file extensions before EXIF processing
Command Line Examples
Basic Usage
# Use default standard mode
gpth "/path/to/takeout" "/path/to/output"
# Explicit standard mode
gpth "/path/to/takeout" "/path/to/output" --fix-extensions standard
# Conservative mode for sensitive collections
gpth "/path/to/takeout" "/path/to/output" --fix-extensions conservative
# Solo mode: fix extensions only, then exit
gpth "/path/to/takeout" "/path/to/output" --fix-extensions solo
Combined with Other Features
# Full processing with extension fixing
gpth "/path/to/takeout" "/path/to/output" \
--fix-extensions standard \
--write-exif true \
--update-creation-time \
--verbose
# Conservative processing for valuable originals
gpth "/path/to/takeout" "/path/to/output" \
--fix-extensions conservative \
--write-exif false \
--skip-extras true
Error Handling and Troubleshooting
Common Issues
Target File Already Exists
Warning: Skipped fixing extension because target file already exists: photo.png.jpg
Solution: Check for existing files with corrected extensions and resolve manually if needed.
Permission Errors
Error: Failed to process file photo.png: Permission denied
Solution: Ensure GPTH has write permissions to the input directory.
JSON File Not Found
Warning: Unable to find matching JSON for file: photo.png
Impact: Media file will still be renamed, but JSON association may be lost.
Rollback Mechanism
If extension fixing fails partway through:
Error: Extension fixing failed, attempting rollback: JSON file does not exist after rename
Info: Rolled back media file rename: /path/to/original/photo.png
The system automatically attempts to restore the original state.
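A rename-with-rollback routine along these lines can be sketched in Python. This is an illustration under assumptions rather than GPTH's implementation: the function name `rename_all_or_nothing` is hypothetical, and the rollback is best-effort, since two separate renames cannot be made truly atomic at the file-system level.

```python
import os
from pathlib import Path
from typing import List, Tuple

def rename_all_or_nothing(pairs: List[Tuple[Path, Path]]) -> bool:
    """Rename every (source, target) pair, undoing completed renames on failure."""
    done: List[Tuple[Path, Path]] = []
    try:
        for src, dst in pairs:
            if dst.exists():
                raise FileExistsError(dst)  # never overwrite an existing file
            os.rename(src, dst)
            done.append((src, dst))
    except OSError as err:
        # Undo in reverse order so the directory returns to its original state.
        for src, dst in reversed(done):
            os.rename(dst, src)
        print(f"Rolled back after error: {err}")
        return False
    return True
```

Called with the media file and its JSON file as two pairs, the function either renames both or leaves both under their original names.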
Best Practices
When to Use Each Mode
Use standard mode when:
- Processing general Google Photos exports
- You have mixed file types including some RAW photos
- You want good protection with reasonable correction coverage
Use conservative mode when:
- Working with valuable original photos
- You have many JPEG files with critical metadata
- Maximum safety is more important than correction coverage
Use solo mode when:
- You want to review the extension changes on their own before running the full pipeline
- Diagnosing extension issues before full processing
- Running extension fixing separately from other operations
Use none mode when:
- File extensions are already correct
- You want to handle extension issues manually
- Maximum processing speed is required