Data Quality - banisterious/obsidian-charted-roots GitHub Wiki
Data Quality
Charted Roots includes comprehensive data quality tools to help you maintain accurate, consistent genealogical data. These tools detect issues, preview changes, and apply fixes across your vault.
Table of Contents
- Overview
- Post-Import Cleanup Workflow
- Control Center Data Quality Tab
- Batch Operations
- Place Data Quality
- GEDCOM Import Quality Preview
- Best Practices
Overview
Data quality tools are available in two contexts:
- During GEDCOM import - Preview issues before any files are created
- Post-import - Analyze and fix issues in existing person notes
Post-Import Cleanup Workflow
After importing a GEDCOM file (especially a "messy" one with data quality issues), use the Cleanup Wizard to guide you through the recommended sequence of fixes.
Using the Cleanup Wizard (Recommended)
Location: Control Center → Data Quality tab → "Cleanup Wizard"
The Cleanup Wizard provides a guided, step-by-step workflow that walks you through all data quality operations in the correct order. It's the recommended approach for post-import cleanup.
Wizard Features:
- 14 sequential steps covering cleanup, normalization, and migration operations
- Progress tracking with completed/pending status for each step
- Session persistence — your progress is saved, so you can close the wizard and resume later
- Preview before apply — each operation shows what will change before making modifications
- Skip/reset options — skip steps that don't apply or reset to re-run a step
- Mixed execution modes — review-only, batch (applied automatically), and interactive (you approve each change)
Wizard Steps:
| Step | Operation | Mode | Description |
|---|---|---|---|
| 1 | Quality Report | Review | Surface current data quality issues so you can decide which later steps apply |
| 2 | Bidirectional Relationships | Batch | Ensure parent-child and spouse links are reciprocated |
| 3 | Date Formats | Batch | Normalize dates to ISO 8601 (YYYY-MM-DD) |
| 4 | Gender Values | Batch | Normalize gender/sex values to GEDCOM-standard M/F/X/U |
| 5 | Orphan References | Batch | Clear dangling father_id / mother_id / etc. links to missing notes |
| 6 | Source Migration | Batch | Convert indexed source properties (source, source_2) to a sources: [] array |
| 7 | Place Variants | Interactive | Standardize spelling variations of place names |
| 8 | Bulk Geocode | Interactive | Look up and add coordinates to places for map display |
| 9 | Place Hierarchy | Interactive | Build contained_by containment chains between places |
| 10 | Flatten Properties | Batch | Convert nested frontmatter to flat properties |
| 11 | Event Person Migration | Batch | Convert legacy person field on event notes to the persons array |
| 12 | Evidence Tracking Migration | Batch | Convert nested sourced_facts object to flat {field}_sources properties |
| 13 | Life Events Migration | Batch | Convert inline events: arrays on person notes to separate cr_type: event files |
| 14 | Normalize Children Property | Batch | Rename legacy child field to children on person notes |
Tips:
- Run the Quality Report (step 1) first to understand the scope of issues — it's review-only and won't change anything
- Steps can be skipped if they don't apply to your data
- The wizard remembers which steps you've completed across sessions
- Interactive steps (7, 8, 9) pause for your approval on each candidate; batch steps apply changes to all matching records at once
Migration steps are rerun-safe. The migration operations (steps 11, 12, 13, 14) are designed to be idempotent. The Life Events Migration in particular scans for existing event notes and reuses any whose (persons, event_type, date) tuple matches the inline event being migrated — so re-running on a person whose events were already converted won't create duplicates. The post-migration notice reports (reused N existing events) alongside the created count when any reuse happened.
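The reuse rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the plugin's implementation: the function names, the dictionary shapes, and the choice to sort the persons list before comparing are all hypothetical.

```python
def event_key(persons, event_type, date):
    """Build the (persons, event_type, date) tuple used to match events.

    Sorting persons is an assumption so that list order does not matter.
    """
    return (tuple(sorted(persons)), event_type, date)


def migrate_inline_events(person_id, inline_events, existing_events):
    """Convert inline events to event records, reusing any that already exist.

    Returns (created, reused). New records are appended to existing_events.
    """
    index = {
        event_key(e["persons"], e["event_type"], e["date"])
        for e in existing_events
    }
    created = reused = 0
    for ev in inline_events:
        key = event_key([person_id], ev["event_type"], ev["date"])
        if key in index:
            # An event note with the same tuple already exists: reuse it.
            reused += 1
            continue
        existing_events.append(
            {"persons": [person_id], "event_type": ev["event_type"], "date": ev["date"]}
        )
        index.add(key)
        created += 1
    return created, reused
```

Running the function twice on the same input creates the event once and reports it as reused the second time, which is the behavior that makes a re-run safe.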
Manual Cleanup Steps
If you prefer to run individual operations manually, or need more control over specific steps, you can access each operation directly from the Control Center tabs.
Step 1: Review the Quality Report
Location: Control Center → Data Quality tab → "Run analysis"
Start here to understand the scope of issues. The report shows:
- Overall quality score (0-100)
- Issues grouped by severity (errors, warnings, info)
- Completeness metrics
This gives you the big picture before diving into fixes.
Step 2: Fix Bidirectional Relationships
Location: Control Center → People tab → Batch operations → "Fix bidirectional relationship inconsistencies"
Run this early — it ensures the family graph is internally consistent. If a child lists a parent, the parent should list the child. This step is essential for tree generation and navigation to work correctly.
When to run this operation:
- After manual edits where you added a parent/spouse/child to one person but forgot the reciprocal link
- After bulk operations that might have created one-sided relationships
- As a periodic sanity check during active data entry sessions (every 10-20 edits)
- Before major exports or canvas generation
When it's unnecessary:
- After GEDCOM imports — the importer creates bidirectional relationships automatically
- After using the plugin's built-in relationship editing UI — those create both sides automatically
Step 3: Normalize Date Formats
Location: Control Center → Data Quality tab → "Normalize date formats"
Converts varied date formats (15 Mar 1920, Mar 15, 1920, etc.) to the standard YYYY-MM-DD format. Standardized dates enable proper sorting, filtering, and age calculations.
Step 4: Normalize Sex Values
Location: Control Center → Data Quality tab → "Normalize sex values"
Converts male, female, man, woman, etc. to GEDCOM-standard canonical values (M, F, X, U). Consistent sex values are required for parent role validation (father vs. mother) and GEDCOM export compatibility.
Step 5: Clear Orphan References
Location: Control Center → Data Quality tab → "Clear orphan references"
Removes father_id and mother_id values that point to non-existent people. This cleans up dangling references that can cause errors in tree generation.
Step 6: Migrate Source Arrays
Location: Control Center → Data Quality tab → "Migrate source arrays"
Converts indexed source properties (source, source_2, source_3) to a single sources YAML array. This aligns with the modern array-based property format used throughout Charted Roots.
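As a rough illustration of the conversion, the sketch below folds indexed properties into one list. It assumes frontmatter is already parsed into a dictionary; the function name, the ordering by index, and the de-duplication against an existing sources list are assumptions, not confirmed plugin behavior.

```python
import re


def migrate_sources(frontmatter):
    """Return a copy with source, source_2, ... folded into a sources list."""
    indexed = []
    rest = {}
    for key, value in frontmatter.items():
        match = re.fullmatch(r"source(?:_(\d+))?", key)
        if match:
            # Plain "source" is treated as index 1 (assumption).
            indexed.append((int(match.group(1) or 1), value))
        else:
            rest[key] = value
    if not indexed:
        return rest
    merged = list(rest.get("sources", []))
    for _, value in sorted(indexed, key=lambda pair: pair[0]):
        if value not in merged:
            merged.append(value)
    rest["sources"] = merged
    return rest
```

For example, a note with source: "[[A]]" and source_2: "[[B]]" would end up with sources: ["[[A]]", "[[B]]"] and no indexed keys.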
Step 7: Standardize Place Names
Location: Control Center → Places tab → "Standardize variants"
Unifies spelling variations ("USA" vs "United States of America", state abbreviations). Consistent place names enable proper grouping and hierarchy building.
Step 8: Geocode Places
Location: Control Center → Places tab → "Bulk geocode"
Looks up coordinates for place notes that don't have them. Required for map visualizations. Note: Rate-limited to 1 request/second.
Step 9: Enrich Place Hierarchy
Location: Control Center → Places tab → "Enrich place hierarchy"
Uses the geocoding API to fill in contained_by relationships (city → county → state → country), building complete place containment chains.
Optional: Flatten Nested Properties
Location: Control Center → Data Quality tab → "Flatten nested properties"
If your GEDCOM import created nested frontmatter (e.g., coordinates: { lat: ..., long: ... }), this converts them to flat properties (coordinates_lat, coordinates_long). Flat properties work better with Obsidian's property editor.
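The flattening rule (join the parent key and child key with an underscore) can be sketched as follows. The function name is hypothetical, and recursing into deeper nesting levels is an assumption; the source only shows one level.

```python
def flatten_properties(frontmatter, prefix=""):
    """Flatten nested dictionaries into parent_child keys."""
    flat = {}
    for key, value in frontmatter.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so deeper levels also get joined with underscores.
            flat.update(flatten_properties(value, name))
        else:
            flat[name] = value
    return flat
```

With this sketch, coordinates: { lat: 1.5, long: 2.5 } becomes coordinates_lat: 1.5 and coordinates_long: 2.5, and already-flat properties pass through unchanged.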
Control Center Data Quality Tab
Access the Data Quality tab from Control Center to analyze and fix issues in your existing data.
Quality Report
The Quality Report analyzes all person notes and generates:
- Quality Score (0-100) - Overall data quality rating
- Issues by Severity - Errors, warnings, and informational items
- Issues by Category - Date, relationship, data format, references, etc.
- Completeness Metrics - Percentage of notes with birth dates, parents, etc.
Issue Categories
Date Inconsistencies
| Issue | Severity | Description |
|---|---|---|
| Death before birth | Error | Death date is earlier than birth date |
| Future birth/death | Error | Date is in the future |
| Unreasonable age | Warning | Lifespan exceeds 120 years |
| Born before parent | Error | Child's birth predates parent's birth |
| Parent too young | Warning | Parent was under 12 at child's birth |
| Parent too old | Warning | Father over 80 or mother over 55 at birth |
| Born after parent death | Error | Birth after mother's death (or >1 year after father's) |
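A few of the checks in this table can be expressed directly as date comparisons. The sketch below is illustrative only: the function names are hypothetical, ages are measured in whole completed years (a simplification), and only a subset of the rules is covered.

```python
from datetime import date


def years_between(start, end):
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def check_dates(birth, death=None, parent_birth=None, today=None):
    """Return (severity, issue) pairs for one person's dates."""
    today = today or date.today()
    issues = []
    if birth > today or (death is not None and death > today):
        issues.append(("error", "Future birth/death"))
    if death is not None:
        if death < birth:
            issues.append(("error", "Death before birth"))
        elif years_between(birth, death) > 120:
            issues.append(("warning", "Unreasonable age"))
    if parent_birth is not None:
        if birth < parent_birth:
            issues.append(("error", "Born before parent"))
        elif years_between(parent_birth, birth) < 12:
            issues.append(("warning", "Parent too young"))
    return issues
```

Passing today explicitly keeps the future-date check deterministic, which is convenient when testing.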
Relationship Inconsistencies
| Issue | Severity | Description |
|---|---|---|
| Gender/role mismatch | Warning | Female listed as father, or male as mother |
| Self-reference | Error | Person listed as their own parent/spouse |
| Circular relationship | Error | A is parent of B, B is parent of A |
| Duplicate spouse | Warning | Same person listed multiple times as spouse |
Missing Data
| Issue | Severity | Description |
|---|---|---|
| No parents | Info | Neither father nor mother defined |
| One parent only | Info | Only one parent defined |
| No birth date | Info | Birth date not recorded |
| No gender | Info | Gender not specified |
Orphan References
| Issue | Severity | Description |
|---|---|---|
| Orphan parent ref | Warning | Father/mother ID points to non-existent person |
| Orphan spouse ref | Warning | Spouse ID points to non-existent person |
| Orphan child ref | Warning | Child ID points to non-existent person |
Data Format Issues
| Issue | Severity | Description |
|---|---|---|
| Non-standard date | Info | Date not in YYYY-MM-DD or YYYY format |
| Invalid gender value | Warning | Gender value not recognized |
| Nested property | Warning | Frontmatter contains nested objects |
| Legacy type property | Info | Uses type instead of cr_type |
Batch Operations
Fix Bidirectional Relationships
Ensures all parent-child and spouse relationships are properly reciprocated.
What it fixes:
- Child lists parent, but parent doesn't list child in children_id
- Parent lists child, but child doesn't have father_id/mother_id set
- Person lists spouse, but spouse doesn't reciprocate
Preview mode: Shows all inconsistencies before applying fixes
Conflict handling: When two people both claim the same child as their own, the tool flags this for manual resolution rather than automatically overwriting.
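The parent-child part of this check, including the conflict rule, might look like the sketch below. It is an assumption-laden illustration: people is a hypothetical dictionary keyed by cr_id, the parent role is inferred from a sex field, spouse links are omitted for brevity, and the function only reports proposed fixes rather than applying them.

```python
def find_relationship_fixes(people):
    """Return (fixes, conflicts) for one-sided parent-child links.

    fixes:     (target_id, field, value_to_add)
    conflicts: (child_id, field, current_parent, claiming_parent)
    """
    fixes, conflicts = [], []
    for cr_id, person in people.items():
        # Child lists parent, but parent doesn't list child.
        for role in ("father_id", "mother_id"):
            parent_id = person.get(role)
            if parent_id in people and cr_id not in people[parent_id].get("children_id", []):
                fixes.append((parent_id, "children_id", cr_id))
        # Parent lists child, but child lacks the matching parent field.
        role = {"M": "father_id", "F": "mother_id"}.get(person.get("sex"))
        for child_id in person.get("children_id", []):
            if role is None or child_id not in people:
                continue
            current = people[child_id].get(role)
            if current is None:
                fixes.append((child_id, role, cr_id))
            elif current != cr_id:
                # Someone else already holds this role: flag, never overwrite.
                conflicts.append((child_id, role, current, cr_id))
    return fixes, conflicts
```

Separating fixes from conflicts mirrors the preview-then-apply flow: safe additions can be applied in bulk, while competing parent claims are left for manual review.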
Normalize Date Formats
Converts various date formats to the standard YYYY-MM-DD format.
Formats recognized:
- 15 Mar 1920 → 1920-03-15
- Mar 15, 1920 → 1920-03-15
- 15/03/1920 → 1920-03-15
- about 1920 → 1920
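A minimal sketch of these conversions using Python's standard library is shown below. The function name is hypothetical, slash-separated dates are read day-first (as in the 15/03/1920 example), and month abbreviations are parsed with %b, which assumes an English locale. Unrecognized input returns None so a caller could leave it untouched.

```python
import re
from datetime import datetime

# Formats taken from the examples above; the real tool may accept more.
FORMATS = ("%d %b %Y", "%b %d, %Y", "%d/%m/%Y")


def normalize_date(raw):
    """Return YYYY-MM-DD (or YYYY), or None if the format is not recognized."""
    text = raw.strip()
    if re.fullmatch(r"\d{4}(-\d{2}-\d{2})?", text):
        return text  # already in standard form
    approx = re.fullmatch(r"about\s+(\d{4})", text, re.IGNORECASE)
    if approx:
        return approx.group(1)  # keep only the year
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

Note that reducing "about 1920" to "1920" drops the approximation qualifier, exactly as in the mapping listed above.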
Normalize Sex Values
Converts sex values to GEDCOM-standard canonical forms using built-in synonyms and user-configured aliases.
Canonical values: M, F, X, U (GEDCOM standard)
Built-in mappings:
- male, man, boy → M
- female, woman, girl → F
- nonbinary, non-binary, nb, intersex → X
- unknown, ? → U
Customization: Configure additional mappings in Preferences → Value Aliases
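The lookup can be pictured as a synonym table merged with user aliases. In the sketch below the table contents come from the built-in mappings listed above; the function name, the case-insensitive matching, and the rule that user aliases override built-ins are assumptions.

```python
BUILT_IN = {
    "male": "M", "man": "M", "boy": "M",
    "female": "F", "woman": "F", "girl": "F",
    "nonbinary": "X", "non-binary": "X", "nb": "X", "intersex": "X",
    "unknown": "U", "?": "U",
}
CANONICAL = {"M", "F", "X", "U"}


def normalize_sex(value, aliases=None):
    """Return the canonical value, or None if the input is not recognized."""
    text = value.strip()
    if text.upper() in CANONICAL:
        return text.upper()
    # User aliases are merged last so they take precedence (assumption).
    table = {**BUILT_IN, **{k.lower(): v for k, v in (aliases or {}).items()}}
    return table.get(text.lower())
```

For instance, normalize_sex("hombre", {"hombre": "M"}) resolves through the user alias, while a value with no mapping returns None and would be left for review.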
Normalization Modes
Control how normalization behaves via Preferences → Data Quality → Sex value normalization:
| Mode | Behavior |
|---|---|
| Standard | Normalize all sex values to the GEDCOM canonical forms M/F/X/U (default) |
| Schema-aware | Skip notes covered by schemas that define custom sex enum values |
| Disabled | Never normalize (preview shows what would change) |
Schema-aware mode is designed for worldbuilders who define custom sex values (e.g., "hermaphrodite", "neuter") in a schema. When enabled, the normalization operation checks if each person note has an applicable schema with a custom sex enum definition. Notes with such schemas are skipped, preserving custom values.
Example: A person note in a "Sci-Fi Universe" with a schema defining sex: ["male", "female", "neuter", "hermaphrodite"] will be skipped, while genealogy notes in the main tree continue to normalize to the GEDCOM canonical values.
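The three modes reduce to a small per-note decision. The sketch below assumes a hypothetical schema shape (a dictionary whose sex key holds the custom enum) and lowercase mode identifiers; neither is taken from the plugin.

```python
def should_normalize(mode, schema=None):
    """Decide whether a note's sex value should be normalized."""
    if mode == "disabled":
        return False
    if mode == "schema-aware" and schema and schema.get("sex"):
        # The applicable schema defines its own sex enum: preserve custom values.
        return False
    return True
```

Under these assumptions, the "Sci-Fi Universe" note from the example is skipped in schema-aware mode but would still be normalized in standard mode.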
Clear Orphan References
Removes father_id and mother_id values that point to non-existent cr_id values.
Use case: After deleting person notes, clear dangling references
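In outline, the operation is a membership test against the set of known cr_id values. The sketch below assumes a hypothetical in-memory dictionary of person records keyed by cr_id and returns the cleared references so they could be shown in a preview.

```python
def clear_orphan_references(people, fields=("father_id", "mother_id")):
    """Remove parent IDs that match no known cr_id; return what was cleared."""
    cleared = []
    for cr_id, person in people.items():
        for field in fields:
            target = person.get(field)
            if target is not None and target not in people:
                cleared.append((cr_id, field, target))
                del person[field]
    return cleared
```

A second run on the same data finds nothing left to clear, so the operation is safe to repeat.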
Migrate Legacy Type Property
Migrates from the legacy type property to the namespaced cr_type property.
When to use: If upgrading from an older version that used type: person instead of cr_type: person
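The rename itself is simple; the sketch below shows one way to express it on parsed frontmatter. The function name is hypothetical, and leaving the note unchanged when cr_type is already present is an assumption about how such a case would be handled.

```python
def migrate_type(frontmatter):
    """Return a copy with the legacy type key renamed to cr_type."""
    fm = dict(frontmatter)
    if "type" in fm and "cr_type" not in fm:
        fm["cr_type"] = fm.pop("type")
    return fm
```

A note with type: person becomes cr_type: person, and a note that already uses cr_type passes through untouched.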
Place Data Quality
The Places tab in Control Center includes several data quality tools for place names.
Standardize Place Names
Unifies spelling variations of place names across your vault.
Example: "New York City", "NYC", "New York, NY" → standardized to your chosen form
Standardize Place Variants
Normalizes common abbreviations and alternate forms:
Countries:
- "United States of America", "United States", "US" → "USA"
- "United Kingdom", "Great Britain" → "UK"
US States:
- "California", "Cal." → "CA"
- "New York" → "NY"
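Because a name like "New York" can be a city or a state, variant replacement has to consider position within the place string. The sketch below is one hypothetical approach, not the plugin's algorithm: it treats the last comma-separated component as the country and, for USA places, the one before it as the state. The tables contain only the examples listed above.

```python
COUNTRY_VARIANTS = {
    "united states of america": "USA",
    "united states": "USA",
    "us": "USA",
    "united kingdom": "UK",
    "great britain": "UK",
}
US_STATE_VARIANTS = {"california": "CA", "cal.": "CA", "new york": "NY"}


def standardize_place(name):
    """Standardize the country and, for USA places, the state component."""
    parts = [part.strip() for part in name.split(",")]
    parts[-1] = COUNTRY_VARIANTS.get(parts[-1].lower(), parts[-1])
    if len(parts) >= 2 and parts[-1] == "USA":
        # Only the second-to-last component is treated as a state.
        parts[-2] = US_STATE_VARIANTS.get(parts[-2].lower(), parts[-2])
    return ", ".join(parts)
```

With this positional rule, "New York, New York, US" becomes "New York, NY, USA": the city name is preserved and only the state component is abbreviated.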
Merge Duplicate Places
Combines separate place notes that represent the same location.
Detection methods:
- Case-insensitive matching on full_name
- Title + parent combination matching
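Both detection methods amount to grouping place notes by a key and reporting groups with more than one member. The sketch below assumes hypothetical record fields (cr_id, title, parent) alongside full_name, and also compares titles case-insensitively, which the source does not specify.

```python
from collections import defaultdict


def find_duplicate_places(places):
    """Return lists of cr_ids that appear to describe the same place."""
    by_full_name = defaultdict(list)
    by_title_parent = defaultdict(list)
    for place in places:
        if place.get("full_name"):
            by_full_name[place["full_name"].casefold()].append(place["cr_id"])
        key = (place["title"].casefold(), place.get("parent"))
        by_title_parent[key].append(place["cr_id"])
    groups = [ids for ids in by_full_name.values() if len(ids) > 1]
    # Add title + parent groups that were not already found by full_name.
    groups += [
        ids for ids in by_title_parent.values()
        if len(ids) > 1 and ids not in groups
    ]
    return groups
```

Including the parent in the second key keeps same-named places in different regions (for example, Boston in Massachusetts and Boston in Lincolnshire) from being reported as duplicates of each other.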
Standardize Place Types
Converts generic place types to specific settlement types. This tool flags places with ambiguous types that typically need human review.
Types flagged for review:
- locality - Generic term often assigned during import
- municipality - Administrative term that varies by country
- hamlet - May be better classified as village
- settlement - Generic term needing specificity
Standard types to convert to:
- city - Large urban area, typically >10,000 population
- town - Medium settlement, typically 1,000–10,000 population
- village - Small rural settlement, typically <1,000 population
Types NOT flagged:
- township - Recognized as a valid US administrative division (Midwest/Northeast civil townships)
- All other built-in types (country, state, county, district, etc.)
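The flagging rule can be sketched as a set membership test. The optional population-based suggestion below is purely an assumption for illustration, built from the "typical" thresholds listed above; the tool itself leaves the final choice to human review, and the function names are hypothetical.

```python
FLAGGED_TYPES = {"locality", "municipality", "hamlet", "settlement"}


def suggest_type(population):
    """Map a population to a standard type using the rough thresholds above."""
    if population > 10_000:
        return "city"
    if population >= 1_000:
        return "town"
    return "village"


def review_place_type(place_type, population=None):
    """Return None for unflagged types, else (flagged type, suggestion or None)."""
    if place_type not in FLAGGED_TYPES:
        return None
    suggestion = suggest_type(population) if population is not None else None
    return (place_type, suggestion)
```

Types such as township fall outside the flagged set and are returned as None, meaning no review is needed.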
GEDCOM Import Quality Preview
When importing a GEDCOM file, Charted Roots analyzes the data before creating any files.
What's Detected
- Date issues - Death before birth, future dates, impossible ages
- Relationship issues - Gender/role mismatches, circular references
- Reference issues - Pointers to non-existent records
- Data completeness - Missing names, unknown sex, no dates
- Place variants - Inconsistent place name formats
Place Variant Standardization
During import preview, you can choose canonical forms for place names:
- Review detected variants (e.g., "USA" vs "United States of America")
- Select your preferred form for each variant group
- Changes apply to both file names and frontmatter values
This ensures consistency from the start, avoiding post-import cleanup.
Preview Actions
- Proceed with import - Apply your choices and create files
- Cancel - Abort without creating any files
Best Practices
Regular Maintenance
- Run the Quality Report periodically (monthly or after major imports)
- Address errors first, then warnings
- Informational items can often be ignored (missing data may be unavailable)
Before Sharing Data
- Run bidirectional relationship fix to ensure consistency
- Check for orphan references
- Standardize date formats for interoperability
After GEDCOM Import
- Review the quality preview before importing
- Standardize place variants during import
- Run the Places tab tools to complete place hierarchy
- Resolve any parent claim conflicts before running other batch operations
- Skip "Fix bidirectional relationships" — GEDCOM data already contains complete bidirectional relationships
Important: The GEDCOM importer preserves complete relationship data from the source file. Running the bidirectional fix immediately after import is unnecessary and could cause issues if there are unresolved parent claim conflicts (e.g., step-parents flagged as conflicting with biological parents).
Data Entry Guidelines
- Use YYYY-MM-DD format for dates (or YYYY for year-only)
- Use cr_type instead of type for note types
- Keep frontmatter flat (avoid nested objects)
- Enter gender as male, female, or nonbinary