Date Extraction Methods - Xentraxx/GooglePhotosTakeoutHelper GitHub Wiki

Date Extraction Methods

Google Photos Takeout Helper uses multiple date extraction methods to determine the correct creation date for each photo and video. These methods are applied in priority order, ensuring the most accurate date is used for organizing your media chronologically.

Overview

When Google Photos exports your data, timestamps often become corrupted or inconsistent. GPTH solves this by trying multiple extraction approaches, from most to least reliable, until it finds a valid date for each media file.

Priority Order Summary

| Priority | Method | Accuracy | Source | Configuration |
|----------|--------|----------|--------|---------------|
| 1 (Highest) | JSON Metadata | Highest | Google Photos metadata | Always enabled |
| 2 | EXIF Data | High | Camera/device metadata | Always enabled |
| 3 | Filename Patterns | Medium | Filename date patterns | Optional (--guess-from-name) |
| 4 | JSON Tryhard | Lower | Aggressive JSON matching | Always enabled |
| 5 (Lowest) | Folder Year | Lowest | Parent folder year | Always enabled |
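
The priority order above amounts to a simple fallback chain: each method is tried in turn and the first non-null date wins. The following is a minimal sketch of that idea, not GPTH's actual code. The `DateExtractor` type and the `extractDate` function are hypothetical names introduced for illustration, and the extractors in `main` are stand-ins.

```dart
/// Signature shared by every extractor in this sketch (hypothetical type).
typedef DateExtractor = Future<DateTime?> Function(String path);

/// Runs the extractors in priority order and returns the first non-null date.
Future<DateTime?> extractDate(
    String path, List<DateExtractor> extractors) async {
  for (final extractor in extractors) {
    final date = await extractor(path);
    if (date != null) return date;
  }
  return null; // would be counted as "no date found"
}

Future<void> main() async {
  // Stand-in extractors: the first two find nothing, the third succeeds.
  final extractors = <DateExtractor>[
    (path) async => null,
    (path) async => null,
    (path) async => DateTime.utc(2023, 6, 15, 14, 30, 22),
  ];
  final date = await extractDate('IMG_20230615_143022.jpg', extractors);
  print(date); // 2023-06-15 14:30:22.000Z
}
```

Because the loop stops at the first hit, lower-priority methods only run for files that the more reliable methods could not date.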

Detailed Method Descriptions

1. JSON Metadata Extraction (Priority 1)

How it works: Extracts timestamps from Google Photos' .json metadata files that accompany each media file.

What JSON Files Contain

{
  "title": "IMG_20230615_143022.jpg",
  "description": "",
  "imageViews": "1",
  "creationTime": {
    "timestamp": "1686839422",
    "formatted": "15.06.2023, 14:30:22 UTC"
  },
  "photoTakenTime": {
    "timestamp": "1686839422",
    "formatted": "15.06.2023, 14:30:22 UTC"
  },
  "geoData": {
    "latitude": 52.5200,
    "longitude": 13.4050,
    "altitude": 34.0
  }
}

Technical Details

  • Source Field: photoTakenTime.timestamp (Unix timestamp)
  • Format: Seconds since Unix epoch (converted to milliseconds)
  • Accuracy: ±0 seconds (exact original timestamp)
  • Timezone: UTC (a Unix timestamp carries no timezone offset of its own)
  • Coverage: Available for most Google Photos exports
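
As an illustration of the fields listed above, the sketch below reads `photoTakenTime.timestamp` from a JSON string and converts the seconds value to a UTC `DateTime`. It is a simplified example under stated assumptions rather than GPTH's implementation: the function name `photoTakenTime` is hypothetical, and the sample timestamp is the Unix time for 2023-06-15 14:30:22 UTC.

```dart
import 'dart:convert';

/// Reads `photoTakenTime.timestamp` from a Takeout JSON string.
/// Returns null when the JSON is invalid or the field is missing.
DateTime? photoTakenTime(String jsonText) {
  Object? decoded;
  try {
    decoded = jsonDecode(jsonText);
  } on FormatException {
    return null; // corrupted or truncated JSON
  }
  if (decoded is! Map<String, dynamic>) return null;
  final taken = decoded['photoTakenTime'];
  if (taken is! Map<String, dynamic>) return null;

  // The timestamp is usually a string of digits, but accept an int too.
  final raw = taken['timestamp'];
  final seconds = raw is int ? raw : int.tryParse('$raw');
  if (seconds == null) return null;

  // The value is in seconds; DateTime expects milliseconds.
  return DateTime.fromMillisecondsSinceEpoch(seconds * 1000, isUtc: true);
}

void main() {
  const sample = '{"photoTakenTime": {"timestamp": "1686839422"}}';
  print(photoTakenTime(sample)); // 2023-06-15 14:30:22.000Z
  print(photoTakenTime('{"title": "no timestamp"}')); // null
}
```

Returning null instead of throwing lets a caller move on to the next extraction method when the JSON is unusable.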

Advantages

  • Most Accurate: Preserves exact original Google Photos timestamps
  • Complete Metadata: Often includes GPS coordinates and other details
  • Timezone Safe: The UTC timestamp is unambiguous regardless of device timezone settings
  • Unmodified: Not affected by file copying or editing

Limitations

  • File Association Required: JSON file must be found and matched to media file
  • Export Dependent: Only available in Google Takeout exports
  • Naming Sensitivity: Requires correct filename matching between media and JSON

When This Method Fails

  • JSON file is missing or corrupted
  • Filename doesn't match between media file and JSON
  • JSON structure is invalid or missing required fields
  • File encoding issues (rare Unicode problems)

2. EXIF Data Extraction (Priority 2)

How it works: Reads embedded metadata directly from photo and video files using native libraries and ExifTool.

EXIF Tags Used (in priority order)

Native Extraction (exif_reader library):

  1. EXIF DateTimeOriginal - Original photo creation time
  2. EXIF DateTime - General datetime from image metadata

ExifTool Extraction (fallback/video files):

  1. DateTimeOriginal - Original photo/video creation time
  2. MediaCreateDate - Video creation date (for video files)
  3. CreationDate - General creation date

Supported File Formats

Native Support (Fast - exif_reader library):

  • JPEG, TIFF, HEIC, PNG, WebP
  • JPEG XL (JXL)
  • Sony ARW, Canon CR2, Canon CR3, Canon CRW
  • Nikon NEF, NRW, Panasonic RW2, Fuji RAF
  • Adobe DNG, generic RAW formats
  • TIFF-FX, Portable Anymap

ExifTool Support (Comprehensive - when installed):

  • All above formats plus:
  • Video formats: MP4, MOV, AVI (requires ExifTool)
  • Proprietary camera formats
  • Legacy and specialized formats
  • Enhanced metadata extraction for supported formats

Technical Implementation

// Actual GPTH extraction logic
DateTime? result;

// For video files, use ExifTool directly
if (mimeType?.startsWith('video/') == true) {
  if (exifToolInstalled) {
    result = await exifToolExtractor(file);
  }
  return result;
}

// Try native extraction first (faster) for supported formats
if (supportedNativeExifMimeTypes.contains(mimeType)) {
  result = await nativeExifExtractor(file);
  if (result != null) {
    return result;
  }
  // Log warning and fall back to ExifTool
}

// Fallback to ExifTool for unsupported formats or native failures
if (exifToolInstalled) {
  result = await exifToolExtractor(file);
}

return result;

Advantages

  • High Accuracy: Direct from camera/device timestamps
  • Dual-Layer Support: Native extraction for speed, ExifTool for completeness
  • Wide Format Support: Works with photos and videos (when ExifTool installed)
  • Performance Optimized: Native extraction prioritized for common formats
  • Reliable: Not affected by file transfers or Google processing

Limitations

  • Editing Software Impact: May be modified by photo editing applications
  • Camera Clock Issues: Affected by incorrect camera time settings
  • Format Limitations: Some formats don't support EXIF data
  • Video Dependency: Video file support requires ExifTool installation
  • Timezone Complexity: May lack timezone information

Date Validation

GPTH performs several validation checks on EXIF dates:

  • Invalid Date Patterns: Rejects dates like 0000:00:00 or 0000-00-00
  • Edge Case Handling: Special handling for the 2036-01-01 timestamp edge case
  • Historical Limits: Treats dates before 1900 as suspect
  • Format Normalization: Standardizes date separators before parsing
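
The checks above can be sketched roughly as follows. This is an illustrative approximation, not GPTH's validation code: `parseExifDate` is a hypothetical name, the 2036-01-01 edge case is left out, and rejecting pre-1900 dates outright is an assumption of this sketch (the list above only says such dates are questioned).

```dart
/// Parses an EXIF-style date such as "2023:06:15 14:30:22".
/// Returns null for zeroed, malformed, or pre-1900 values.
DateTime? parseExifDate(String raw) {
  final value = raw.trim();

  // Reject placeholder dates such as 0000:00:00 or 0000-00-00.
  if (value.startsWith('0000')) return null;

  // Accept ':', '/', '.' or '-' between the date fields.
  final match = RegExp(r'^(\d{4})[:/.\-](\d{2})[:/.\-](\d{2})(.*)$')
      .firstMatch(value);
  if (match == null) return null;

  // Normalize to "YYYY-MM-DD<rest>", which DateTime.tryParse understands.
  final normalized = '${match[1]}-${match[2]}-${match[3]}${match[4]}';
  final parsed = DateTime.tryParse(normalized);

  // Assumption of this sketch: dates before 1900 are treated as invalid.
  if (parsed == null || parsed.year < 1900) return null;
  return parsed;
}

void main() {
  print(parseExifDate('2023:06:15 14:30:22')); // 2023-06-15 14:30:22.000
  print(parseExifDate('0000:00:00 00:00:00')); // null
  print(parseExifDate('1850:01:01 12:00:00')); // null
}
```

EXIF dates usually carry no timezone, so the parsed value is interpreted as local time here.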

3. Filename Pattern Extraction (Priority 3)

How it works: Analyzes filenames for embedded date patterns using regular expressions.

Supported Patterns

Standard Camera Patterns:

IMG_20230615_143022.jpg     → 2023-06-15 14:30:22
MVIMG_20190215_193501.MP4   → 2019-02-15 19:35:01
Screenshot_20190919-053857.jpg → 2019-09-19 05:38:57
signal-2020-10-26-163832.jpg  → 2020-10-26 16:38:32

Timestamp Patterns:

20230615143022.jpg          → 2023-06-15 14:30:22
2023_01_30_11_49_15.mp4     → 2023-01-30 11:49:15
2019-04-16-11-19-37.jpg     → 2019-04-16 11:19:37

Android/iOS Patterns:

Screenshot_2019-04-16-11-19-37-232_com.google.a.jpg → 2019-04-16 11:19:37
BURST20190216172030.jpg                             → 2019-02-16 17:20:30
00004XTR_00004_BURST20190216172030.jpg              → 2019-02-16 17:20:30

Configuration Control

# Enable filename guessing (default)
gpth --input source --output dest --guess-from-name

# Disable filename guessing
gpth --input source --output dest --no-guess-from-name

Pattern Recognition Details

The extraction uses carefully crafted regular expressions:

// Example pattern for IMG_YYYYMMDD_HHMMSS format
RegExp(r'(?<date>(20|19|18)\d{2}(01|02|03|04|05|06|07|08|09|10|11|12)[0-3]\d_\d{6})')

Pattern Validation:

  • Year Range: 1800-2099
  • Month Validation: 01-12 only
  • Day Range: 01-31 (basic validation; the example pattern above only restricts the first digit to 0-3)
  • Time Format: 24-hour format with seconds
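
To show how such a pattern turns into a date, here is a small sketch that applies the example regular expression above and then builds a `DateTime` from the matched digits. The function name `dateFromFilename` is hypothetical, and the round-trip check used to reject impossible dates is a choice made for this sketch, not necessarily what GPTH does.

```dart
// The example IMG_YYYYMMDD_HHMMSS pattern from above.
final datePattern = RegExp(
    r'(?<date>(20|19|18)\d{2}(01|02|03|04|05|06|07|08|09|10|11|12)[0-3]\d_\d{6})');

/// Extracts a local DateTime from names like "IMG_20230615_143022.jpg".
DateTime? dateFromFilename(String filename) {
  final match = datePattern.firstMatch(filename);
  if (match == null) return null;
  final s = match.namedGroup('date')!; // e.g. "20230615_143022"

  int field(int start, int end) => int.parse(s.substring(start, end));
  final year = field(0, 4), month = field(4, 6), day = field(6, 8);
  final hour = field(9, 11), minute = field(11, 13), second = field(13, 15);

  // DateTime rolls over out-of-range fields (June 31 becomes July 1),
  // so compare the result with the input to reject impossible values.
  final check = DateTime.utc(year, month, day, hour, minute, second);
  final valid = check.month == month &&
      check.day == day &&
      check.hour == hour &&
      check.minute == minute &&
      check.second == second;
  return valid ? DateTime(year, month, day, hour, minute, second) : null;
}

void main() {
  print(dateFromFilename('IMG_20230615_143022.jpg')); // 2023-06-15 14:30:22.000
  print(dateFromFilename('IMG_20230631_143022.jpg')); // null (no June 31)
  print(dateFromFilename('holiday.jpg')); // null
}
```

The extra check matters because the regular expression alone would accept day values such as 00 or 39.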

Advantages

  • No External Files: Works when JSON/EXIF data is missing
  • Camera Consistency: Many devices use predictable naming
  • Screenshot Support: Excellent for screenshots with timestamps
  • Burst Photo Support: Handles camera burst sequences

Limitations

  • Renaming Breaks It: Manual filename changes lose date info
  • Limited Accuracy: Usually only accurate to the second
  • Pattern Dependency: Only works with recognized naming conventions
  • False Positives: May extract incorrect dates from unrelated numbers

When To Enable/Disable

Enable When:

  • Processing screenshots or screen recordings
  • Dealing with camera files with broken EXIF
  • Working with renamed files that preserved date patterns
  • Processing social media downloads

Disable When:

  • Files have been extensively renamed
  • Processing professionally edited photos
  • Working with archives where filenames are unreliable
  • Prioritizing speed over coverage

4. JSON Tryhard Extraction (Priority 4)

How it works: Uses aggressive pattern matching to find JSON files when standard matching fails.

Advanced Matching Strategies

Filename Variations:

Original:     IMG_1234.jpg
JSON Tried:   IMG_1234.jpg.json
              IMG_1234.json
              IMG_1234-edited.jpg.json
              IMG_1234(1).jpg.json

Truncated Name Handling:

Truncated:    Very_Long_Filename_That_Gets_Cut_Off_By_File.jpg
JSON Found:   Very_Long_Filename_That_Gets_Cut_Off_By_Filesystem.jpg.json

Extra Format Removal:

Processed:    Photo-edited-effects.jpg
Cleaned:      Photo.jpg
JSON Found:   Photo.jpg.json

Cleaning Patterns

GPTH removes common editing suffixes:

  • -edited, -effects, -COLLAGE, -ANIMATION
  • (1), (2), etc. (duplicate numbering)
  • Partial suffixes from truncated names

Technical Implementation

// Simplified tryhard logic.
// isLikelyTruncated, findBestMatch, and availableJsonFiles are placeholders
// for helpers that are not shown here.
String cleanFilename(String name) {
  // Remove common editing suffixes
  name = name.replaceAll(RegExp(r'-(edited|effects|COLLAGE|ANIMATION)'), '');

  // Remove duplicate numbering such as (1), (2)
  name = name.replaceAll(RegExp(r'\(\d+\)'), '');

  // Handle truncated names
  if (isLikelyTruncated(name)) {
    name = findBestMatch(name, availableJsonFiles);
  }

  return name;
}
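
The simplified logic above leaves `findBestMatch` undefined. One hypothetical way to fill it in, following the truncated-name example earlier in this section, is prefix matching: accept a JSON file only if its media name starts with the truncated name and no other JSON file does. This is an assumption made for illustration, and the `stem` helper is also invented here. Unlike the call above, this version returns null when nothing matches unambiguously.

```dart
/// Strips the final extension, e.g. "a.jpg" -> "a".
String stem(String name) {
  final dot = name.lastIndexOf('.');
  return dot <= 0 ? name : name.substring(0, dot);
}

/// Returns the single JSON file whose media name starts with the (possibly
/// truncated) stem of [mediaName], or null when the match is not unique.
String? findBestMatch(String mediaName, List<String> jsonFiles) {
  final prefix = stem(mediaName);
  final candidates = jsonFiles.where((json) {
    if (!json.endsWith('.json')) return false;
    // Drop ".json", then the media extension if there is one.
    final jsonStem = stem(stem(json));
    return jsonStem.startsWith(prefix);
  }).toList();
  return candidates.length == 1 ? candidates.single : null;
}

void main() {
  final jsonFiles = [
    'Very_Long_Filename_That_Gets_Cut_Off_By_Filesystem.jpg.json',
    'Other_Photo.jpg.json',
  ];
  print(findBestMatch(
      'Very_Long_Filename_That_Gets_Cut_Off_By_File.jpg', jsonFiles));
  // Very_Long_Filename_That_Gets_Cut_Off_By_Filesystem.jpg.json
}
```

Requiring a unique candidate trades some coverage for fewer false matches, which is the main risk of this method.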

Advantages

  • Handles Edge Cases: Works when standard matching fails
  • Filename Tolerance: Robust against common naming variations
  • Editing Workflow Support: Finds JSON for edited photos
  • Batch Processing: Good for large collections with inconsistent naming

Limitations

  • Lower Accuracy: More prone to false matches
  • Performance Impact: Slower due to extensive searching
  • Ambiguity: May match wrong JSON file in edge cases
  • Complex Logic: More likely to have bugs or unexpected behavior

5. Folder Year Extraction (Priority 5)

How it works: Extracts year information from parent folder names when no other date source is available.

Recognized Folder Patterns

Direct Year Patterns:

2023/                       → January 1, 2023
Photos from 2022/           → January 1, 2022
Takeout/Google Photos/2021/ → January 1, 2021

Year-Month Patterns:

2023-01/                    → January 1, 2023
2023_12/                    → January 1, 2023
Photos 2022-06/             → January 1, 2022

Album Year Patterns:

Vacation 2023/              → January 1, 2023
Birthday Party 2022/        → January 1, 2022
Wedding Photos 2021/        → January 1, 2021

Year Validation

  • Range Check: Years between 1900 and current year + 1
  • Reasonable Values: Rejects obviously wrong years
  • Context Awareness: Considers folder hierarchy for validation

Assignment Logic

When folder year is used:

  • Date: January 1st of extracted year
  • Time: 00:00:00 (midnight)
  • Timezone: Local system timezone
  • Accuracy: Lowest (marked for potential manual review)
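
The validation and assignment rules above can be illustrated with a short sketch. It is a simplified approximation rather than GPTH's code: `dateFromFolder` is a hypothetical name, the optional `now` parameter exists only to make the example testable, and picking the first plausible four-digit number in the folder name is an assumption.

```dart
/// Returns January 1st, 00:00:00 local time, of the first plausible year
/// found in [folderName], or null when there is none.
DateTime? dateFromFolder(String folderName, {DateTime? now}) {
  final maxYear = (now ?? DateTime.now()).year + 1;

  // Exactly four digits, not part of a longer run of digits.
  final fourDigits = RegExp(r'(?<!\d)(\d{4})(?!\d)');
  for (final match in fourDigits.allMatches(folderName)) {
    final year = int.parse(match[1]!);
    if (year >= 1900 && year <= maxYear) {
      // Month, day, and time default to January 1st, midnight.
      return DateTime(year);
    }
  }
  return null;
}

void main() {
  print(dateFromFolder('Photos from 2022')); // 2022-01-01 00:00:00.000
  print(dateFromFolder('2023-01')); // 2023-01-01 00:00:00.000
  print(dateFromFolder('Vacation')); // null
}
```

Note that a year-month folder such as `2023-01` still yields January 1st, since only the year is used.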

Advantages

  • Universal Fallback: Works when all other methods fail
  • Organizational Hint: Provides approximate timeframe
  • Better Than Nothing: Enables basic chronological sorting
  • Folder Structure Leveraging: Uses existing organization

Limitations

  • Very Low Accuracy: Only provides year-level precision
  • Arbitrary Date: Always assigns January 1st
  • Pattern Dependent: Only works with recognizable folder naming
  • No Sub-Year Precision: Cannot determine month, day, or time

Configuration and Control

Command Line Options

# Control filename extraction
gpth --guess-from-name        # Enable filename pattern extraction (default)
gpth --no-guess-from-name     # Disable filename pattern extraction

# Verbose output shows detailed extraction information
gpth --verbose                # See which method was used for each file

Interactive Mode Configuration

When running GPTH interactively, you can:

  1. Choose whether to enable filename guessing
  2. See real-time extraction statistics
  3. Monitor which methods are being used

Processing Statistics

GPTH provides detailed statistics about extraction methods:

DateTime extraction method statistics:
  JSON metadata: 8,542 files (71.2%)
  EXIF data: 2,187 files (18.2%)
  Filename patterns: 891 files (7.4%)
  JSON tryhard: 234 files (2.0%)
  Folder year: 134 files (1.1%)
  No date found: 12 files (0.1%)

Understanding date extraction methods helps you choose the right configuration for your specific photo collection and ensures the most accurate chronological organization of your memories.
