Date Extraction Methods - Xentraxx/GooglePhotosTakeoutHelper GitHub Wiki
Google Photos Takeout Helper uses multiple date extraction methods to determine the correct creation date for each photo and video. These methods are applied in priority order, ensuring the most accurate date is used for organizing your media chronologically.
When Google Photos exports your data, timestamps often become corrupted or inconsistent. GPTH solves this by trying multiple extraction approaches, from most to least reliable, until it finds a valid date for each media file.
Priority | Method | Accuracy | Source | Configuration |
---|---|---|---|---|
1 (Highest) | JSON Metadata | Highest | Google Photos metadata | Always enabled |
2 | EXIF Data | High | Camera/device metadata | Always enabled |
3 | Filename Patterns | Medium | Filename date patterns | Optional, enabled by default (--guess-from-name) |
4 | JSON Tryhard | Lower | Aggressive JSON matching | Always enabled |
5 (Lowest) | Folder Year | Lowest | Parent folder year | Always enabled |
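The priority order in the table amounts to a first-match chain: each method is tried in turn, and the first one that yields a date wins. The following is a minimal Dart sketch of that idea, not GPTH's actual code. It assumes each method can be modeled as a synchronous function from a file path to a nullable DateTime; the names DateExtractor and firstValidDate are hypothetical, and the real extractors may well be asynchronous.

```dart
/// Hypothetical signature for one extraction method (an assumption for
/// illustration; it is not taken from the GPTH source).
typedef DateExtractor = DateTime? Function(String path);

/// Tries each extractor in priority order and returns the first date found.
/// Returns null when every extractor comes back empty.
DateTime? firstValidDate(
  String path,
  List<DateExtractor> extractorsInPriorityOrder,
) {
  for (final extract in extractorsInPriorityOrder) {
    final date = extract(path);
    if (date != null) return date; // later, less reliable methods are skipped
  }
  return null;
}
```

Passing the extractors in the order JSON, EXIF, filename, JSON tryhard, folder year would mirror the table; a lower-priority method only runs when every method above it returned null.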
Method 1: JSON Metadata
How it works: Extracts timestamps from the .json metadata files that Google Photos exports alongside each media file.
{
  "title": "IMG_20230615_143022.jpg",
  "description": "",
  "imageViews": "1",
  "creationTime": {
    "timestamp": "1686839422",
    "formatted": "15.06.2023, 14:30:22 UTC"
  },
  "photoTakenTime": {
    "timestamp": "1686839422",
    "formatted": "15.06.2023, 14:30:22 UTC"
  },
  "geoData": {
    "latitude": 52.5200,
    "longitude": 13.4050,
    "altitude": 34.0
  }
}
- Source Field: `photoTakenTime.timestamp` (Unix timestamp)
- Format: Seconds since Unix epoch (converted to milliseconds)
- Accuracy: ±0 seconds (exact original timestamp)
- Timezone: Usually UTC, with timezone info when available
- Coverage: Available for most Google Photos exports
Advantages:
- Most Accurate: Preserves exact original Google Photos timestamps
- Complete Metadata: Often includes GPS coordinates and other details
- Timezone Aware: Maintains original timezone information
- Unmodified: Not affected by file copying or editing
Limitations:
- File Association Required: JSON file must be found and matched to media file
- Export Dependent: Only available in Google Takeout exports
- Naming Sensitivity: Requires correct filename matching between media and JSON
Common failure cases:
- JSON file is missing or corrupted
- Filename doesn't match between media file and JSON
- JSON structure is invalid or missing required fields
- File encoding issues (rare Unicode problems)
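Reading the timestamp as described above can be sketched in a few lines of Dart. This is an illustration under stated assumptions, not GPTH's implementation: photoTakenTimeFromJson is a hypothetical helper that takes the JSON text directly, and it returns null for the failure cases listed above (invalid JSON, missing fields) instead of throwing.

```dart
import 'dart:convert';

/// Hypothetical helper (not part of GPTH): returns the capture time stored
/// in a Takeout sidecar JSON string, or null if it cannot be read.
DateTime? photoTakenTimeFromJson(String jsonText) {
  Object? decoded;
  try {
    decoded = jsonDecode(jsonText);
  } on FormatException {
    return null; // corrupted or truncated JSON
  }
  if (decoded is! Map<String, dynamic>) return null;

  final taken = decoded['photoTakenTime'];
  if (taken is! Map<String, dynamic>) return null;

  // The timestamp is stored as a string of seconds since the Unix epoch.
  final raw = taken['timestamp'];
  final seconds = raw is int ? raw : int.tryParse('$raw');
  if (seconds == null) return null;

  // DateTime works in milliseconds, hence the conversion.
  return DateTime.fromMillisecondsSinceEpoch(seconds * 1000, isUtc: true);
}
```

The result is kept in UTC here; converting to local time with toLocal() is left to the caller.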
Method 2: EXIF Data
How it works: Reads embedded metadata directly from photo and video files using native libraries and ExifTool.
Native Extraction (exif_reader library):
- EXIF DateTimeOriginal - Original photo creation time
- EXIF DateTime - General datetime from image metadata
ExifTool Extraction (fallback/video files):
- DateTimeOriginal - Original photo/video creation time
- MediaCreateDate - Video creation date (for video files)
- CreationDate - General creation date
Native Support (Fast - exif_reader library):
- JPEG, TIFF, HEIC, PNG, WebP
- JPEG XL (JXL)
- Sony ARW, Canon CR2, Canon CR3, Canon CRW
- Nikon NEF, NRW, Panasonic RW2, Fuji RAF
- Adobe DNG, generic RAW formats
- TIFF-FX, Portable Anymap
ExifTool Support (Comprehensive - when installed):
- All above formats plus:
- Video formats: MP4, MOV, AVI (requires ExifTool)
- Proprietary camera formats
- Legacy and specialized formats
- Enhanced metadata extraction for supported formats
// Actual GPTH extraction logic
DateTime? result;
// For video files, use ExifTool directly
if (mimeType?.startsWith('video/') == true) {
  if (exifToolInstalled) {
    result = await exifToolExtractor(file);
  }
  return result;
}
// Try native extraction first (faster) for supported formats
if (supportedNativeExifMimeTypes.contains(mimeType)) {
  result = await nativeExifExtractor(file);
  if (result != null) {
    return result;
  }
  // Log warning and fall back to ExifTool
}
// Fallback to ExifTool for unsupported formats or native failures
if (exifToolInstalled) {
  result = await exifToolExtractor(file);
}
return result;
Advantages:
- High Accuracy: Direct from camera/device timestamps
- Dual-Layer Support: Native extraction for speed, ExifTool for completeness
- Wide Format Support: Works with photos and videos (when ExifTool installed)
- Performance Optimized: Native extraction prioritized for common formats
- Reliable: Not affected by file transfers or Google processing
Limitations:
- Editing Software Impact: May be modified by photo editing applications
- Camera Clock Issues: Affected by incorrect camera time settings
- Format Limitations: Some formats don't support EXIF data
- Video Dependency: Video file support requires ExifTool installation
- Timezone Complexity: May lack timezone information
GPTH performs several validation checks on EXIF dates:
- Invalid Date Patterns: Rejects dates like `0000:00:00` or `0000-00-00`
- Edge Case Handling: Special handling for the 2036-01-01 timestamp edge case
- Historical Limits: Questions dates before 1900
- Format Normalization: Standardizes date separators before parsing
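Some of these checks can be sketched as follows. This is a simplified Dart illustration under stated assumptions, not GPTH's code: parseExifDate is a hypothetical name, dates before 1900 are rejected outright (the text above only says they are questioned), and the 2036-01-01 edge case is not handled.

```dart
/// Hypothetical sketch: parses an EXIF-style date such as
/// "2023:06:15 14:30:22", accepting several separator styles.
/// Returns null for placeholder or implausible values.
DateTime? parseExifDate(String raw) {
  final match = RegExp(
    r'^(\d{4})[:\-./](\d{2})[:\-./](\d{2})[ T](\d{2}):(\d{2}):(\d{2})',
  ).firstMatch(raw.trim());
  if (match == null) return null;

  final parts = [for (var i = 1; i <= 6; i++) int.parse(match.group(i)!)];
  final year = parts[0], month = parts[1], day = parts[2];
  final hour = parts[3], minute = parts[4], second = parts[5];

  // Rejects 0000:00:00-style placeholders and (by assumption) pre-1900 dates.
  if (year < 1900 || month < 1 || month > 12 || day < 1 || day > 31) {
    return null;
  }
  if (hour > 23 || minute > 59 || second > 59) return null;

  // EXIF dates usually carry no timezone, so this is interpreted as local
  // time. A day that overflows the month (e.g. June 31) rolls over silently.
  return DateTime(year, month, day, hour, minute, second);
}
```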
Method 3: Filename Patterns
How it works: Analyzes filenames for embedded date patterns using regular expressions.
Standard Camera Patterns:
IMG_20230615_143022.jpg → 2023-06-15 14:30:22
MVIMG_20190215_193501.MP4 → 2019-02-15 19:35:01
Screenshot_20190919-053857.jpg → 2019-09-19 05:38:57
signal-2020-10-26-163832.jpg → 2020-10-26 16:38:32
Timestamp Patterns:
20230615143022.jpg → 2023-06-15 14:30:22
2023_01_30_11_49_15.mp4 → 2023-01-30 11:49:15
2019-04-16-11-19-37.jpg → 2019-04-16 11:19:37
Android/iOS Patterns:
Screenshot_2019-04-16-11-19-37-232_com.google.a.jpg
BURST20190216172030.jpg
00004XTR_00004_BURST20190216172030.jpg
# Enable filename guessing (default)
gpth --input source --output dest --guess-from-name
# Disable filename guessing
gpth --input source --output dest --no-guess-from-name
The extraction uses carefully crafted regular expressions:
// Example pattern for IMG_YYYYMMDD_HHMMSS format
RegExp(r'(?<date>(20|19|18)\d{2}(01|02|03|04|05|06|07|08|09|10|11|12)[0-3]\d_\d{6})')
Pattern Validation:
- Year Range: 1800-2099
- Month Validation: 01-12 only
- Day Range: 01-31 (basic validation)
- Time Format: 24-hour format with seconds
Advantages:
- No External Files: Works when JSON/EXIF data is missing
- Camera Consistency: Many devices use predictable naming
- Screenshot Support: Excellent for screenshots with timestamps
- Burst Photo Support: Handles camera burst sequences
Limitations:
- Renaming Breaks It: Manual filename changes lose date info
- Limited Accuracy: Usually only accurate to the second
- Pattern Dependency: Only works with recognized naming conventions
- False Positives: May extract incorrect dates from unrelated numbers
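To make the matching and validation steps concrete, here is a minimal Dart sketch that applies the IMG_YYYYMMDD_HHMMSS example pattern shown above and then validates the captured fields. dateFromFilename is a hypothetical helper covering only that one pattern; GPTH's real pattern list and validation logic may differ.

```dart
/// Hypothetical helper: extracts a date from names such as
/// "IMG_20230615_143022.jpg". Returns null if nothing plausible is found.
DateTime? dateFromFilename(String filename) {
  final match = RegExp(
    r'(?<date>(20|19|18)\d{2}(01|02|03|04|05|06|07|08|09|10|11|12)[0-3]\d_\d{6})',
  ).firstMatch(filename);
  if (match == null) return null;

  // "20230615_143022" -> "20230615143022", then slice fixed-width fields.
  final digits = match.namedGroup('date')!.replaceAll('_', '');
  int part(int start, int end) => int.parse(digits.substring(start, end));

  final year = part(0, 4), month = part(4, 6), day = part(6, 8);
  final hour = part(8, 10), minute = part(10, 12), second = part(12, 14);

  // The regex alone still lets through days 00 and 32-39 and any six-digit
  // "time", so the ranges are checked explicitly.
  if (day < 1 || day > 31 || hour > 23 || minute > 59 || second > 59) {
    return null;
  }
  return DateTime(year, month, day, hour, minute, second);
}
```

The explicit range checks are one way to reduce the false positives mentioned above, since an unrelated run of digits rarely forms a valid date and time.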
Enable When:
- Processing screenshots or screen recordings
- Dealing with camera files with broken EXIF
- Working with renamed files that preserved date patterns
- Processing social media downloads
Disable When:
- Files have been extensively renamed
- Processing professionally edited photos
- Working with archives where filenames are unreliable
- Prioritizing speed over coverage
Method 4: JSON Tryhard
How it works: Uses aggressive pattern matching to find JSON files when standard matching fails.
Filename Variations:
Original:   IMG_1234.jpg
JSON Tried: IMG_1234.jpg.json
            IMG_1234.json
            IMG_1234-edited.jpg.json
            IMG_1234(1).jpg.json
Truncated Name Handling:
Truncated: Very_Long_Filename_That_Gets_Cut_Off_By_File.jpg
JSON Found: Very_Long_Filename_That_Gets_Cut_Off_By_Filesystem.jpg.json
Extra Format Removal:
Processed: Photo-edited-effects.jpg
Cleaned: Photo.jpg
JSON Found: Photo.jpg.json
GPTH removes common editing suffixes:
- `-edited`, `-effects`, `-COLLAGE`, `-ANIMATION`
- `(1)`, `(2)`, etc. (duplicate numbering)
- Partial suffixes from truncated names
// Simplified tryhard logic
// (isLikelyTruncated, findBestMatch and availableJsonFiles are not shown here)
String cleanFilename(String name) {
  // Remove common editing suffixes
  name = name.replaceAll(RegExp(r'-edited|-effects'), '');
  // Remove duplicate numbers such as (1) or (2)
  name = name.replaceAll(RegExp(r'\(\d+\)'), '');
  // Handle truncated names
  if (isLikelyTruncated(name)) {
    name = findBestMatch(name, availableJsonFiles);
  }
  return name;
}
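A self-contained variant of the same idea is sketched below. It is an illustration only: candidateJsonNames and matchTruncatedName are hypothetical helpers, the suffix list is taken from the bullets above, and the truncation rule (a unique prefix match on the media file's stem) is an assumption rather than GPTH's documented behavior.

```dart
/// Hypothetical sketch: lists the JSON names worth trying for a media file,
/// from the exact name to progressively cleaned-up variants.
List<String> candidateJsonNames(String mediaName) {
  final dot = mediaName.lastIndexOf('.');
  final stem = dot > 0 ? mediaName.substring(0, dot) : mediaName;
  final ext = dot > 0 ? mediaName.substring(dot) : '';

  // Strip editing suffixes and duplicate counters from the stem.
  final cleanedStem = stem
      .replaceAll(RegExp(r'-(edited|effects|COLLAGE|ANIMATION)'), '')
      .replaceAll(RegExp(r'\(\d+\)'), '');

  // A set literal keeps insertion order and drops repeated candidates.
  final candidates = <String>{
    '$mediaName.json', // exact match
    '$stem.json', // extension dropped
    '$cleanedStem$ext.json', // suffixes removed
  };
  return candidates.toList();
}

/// Hypothetical sketch for truncated names: returns the only JSON name that
/// starts with the media file's stem, or null if there is no match or the
/// match is ambiguous.
String? matchTruncatedName(String mediaName, Iterable<String> jsonNames) {
  final dot = mediaName.lastIndexOf('.');
  final stem = dot > 0 ? mediaName.substring(0, dot) : mediaName;
  final hits = jsonNames
      .where((name) => name.endsWith('.json') && name.startsWith(stem))
      .toList();
  return hits.length == 1 ? hits.single : null;
}
```

Refusing to pick a winner when several JSON files share the prefix trades coverage for fewer wrong matches, which is the main risk of this method.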
Advantages:
- Handles Edge Cases: Works when standard matching fails
- Filename Tolerance: Robust against common naming variations
- Editing Workflow Support: Finds JSON for edited photos
- Batch Processing: Good for large collections with inconsistent naming
Limitations:
- Lower Accuracy: More prone to false matches
- Performance Impact: Slower due to extensive searching
- Ambiguity: May match wrong JSON file in edge cases
- Complex Logic: More likely to have bugs or unexpected behavior
Method 5: Folder Year
How it works: Extracts year information from parent folder names when no other date source is available.
Direct Year Patterns:
2023/ → January 1, 2023
Photos from 2022/ → January 1, 2022
Takeout/Google Photos/2021/ → January 1, 2021
Year-Month Patterns:
2023-01/ → January 1, 2023
2023_12/ → January 1, 2023
Photos 2022-06/ → January 1, 2022
Album Year Patterns:
Vacation 2023/ → January 1, 2023
Birthday Party 2022/ → January 1, 2022
Wedding Photos 2021/ → January 1, 2021
Year validation:
- Range Check: Years between 1900 and current year + 1
- Reasonable Values: Rejects obviously wrong years
- Context Awareness: Considers folder hierarchy for validation
When folder year is used:
- Date: January 1st of extracted year
- Time: 00:00:00 (midnight)
- Timezone: Local system timezone
- Accuracy: Lowest (marked for potential manual review)
Advantages:
- Universal Fallback: Works when all other methods fail
- Organizational Hint: Provides approximate timeframe
- Better Than Nothing: Enables basic chronological sorting
- Folder Structure Leveraging: Uses existing organization
Limitations:
- Very Low Accuracy: Only provides year-level precision
- Arbitrary Date: Always assigns January 1st
- Pattern Dependent: Only works with recognizable folder naming
- No Sub-Year Precision: Cannot determine month, day, or time
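The folder year fallback described above can be sketched in Dart as follows. dateFromFolderName is a hypothetical helper, and the rule of taking the first standalone four-digit number in the folder name is an assumption; the optional now parameter exists only to make the range check testable.

```dart
/// Hypothetical sketch: derives a fallback date from a folder name such as
/// "Photos from 2022" or "2023-01". Returns null if no plausible year exists.
DateTime? dateFromFolderName(String folderName, {DateTime? now}) {
  final currentYear = (now ?? DateTime.now()).year;

  // A four-digit number that is not part of a longer run of digits.
  final match = RegExp(r'(?<!\d)(\d{4})(?!\d)').firstMatch(folderName);
  if (match == null) return null;

  final year = int.parse(match.group(1)!);
  // Range check: 1900 up to next year.
  if (year < 1900 || year > currentYear + 1) return null;

  // Only the year is known, so this is January 1st at midnight, local time.
  return DateTime(year);
}
```

Any month in the folder name (as in 2023_12) is ignored, consistent with the January 1st results in the examples above.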
# Control filename extraction
gpth --guess-from-name # Enable filename pattern extraction (default)
gpth --no-guess-from-name # Disable filename pattern extraction
# Verbose output shows detailed extraction information
gpth --verbose # See which method was used for each file
When running GPTH interactively, you can:
- Choose whether to enable filename guessing
- See real-time extraction statistics
- Monitor which methods are being used
GPTH provides detailed statistics about extraction methods:
DateTime extraction method statistics:
JSON metadata: 8,542 files (71.2%)
EXIF data: 2,187 files (18.2%)
Filename patterns: 891 files (7.4%)
JSON tryhard: 234 files (2.0%)
Folder year: 134 files (1.1%)
No date found: 12 files (0.1%)
Understanding date extraction methods helps you choose the right configuration for your specific photo collection and ensures the most accurate chronological organization of your memories.