.json vs .jsonl - chunhualiao/public-docs GitHub Wiki

Key Differences

.json (Standard JSON)

  • One top-level JSON value (usually an object or array) per file
  • The file must be valid JSON as a whole
  • Standard parsers (e.g. Python's json.load) read and parse the entire file at once

Example:

{
  "users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"}
  ]
}
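A minimal sketch of the "valid as a whole" property, using Python's standard json module: the full document parses, but cutting it off partway makes the entire parse fail, so none of the records can be recovered. The document string mirrors the example above; the truncated copy is a hypothetical damaged file.

```python
import json

# The example document above, as a single string.
document = '''{
  "users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"}
  ]
}'''

data = json.loads(document)  # parses the whole document in one call
names = [user["name"] for user in data["users"]]
print(names)  # ['Alice', 'Bob', 'Charlie']

# Simulate a damaged file: cut it off just before the third record.
truncated = document[: document.index('{"id": 3')]
try:
    json.loads(truncated)
    recovered = True
except json.JSONDecodeError:
    recovered = False  # one error invalidates the entire document
print(recovered)  # False
```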

.jsonl (JSON Lines)

  • One JSON value per line (almost always an object)
  • Each line is independently valid JSON
  • Also called: newline-delimited JSON (ndjson), line-delimited JSON

Example:

{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
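Producing JSON Lines with the standard library comes down to serializing each record with json.dumps and ending it with a newline. json.dumps escapes a newline inside a string value as the two characters \n, so a record never spills onto a second line. The records below are the example data above plus a hypothetical multi-line "note" field added only to show that escaping.

```python
import json

records = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob", "note": "line one\nline two"},  # hypothetical field
    {"id": 3, "name": "Charlie"},
]

# One JSON value per line; the final newline terminates the last record.
jsonl_text = "".join(json.dumps(r) + "\n" for r in records)
print(jsonl_text, end="")

# Each line parses on its own, without looking at its neighbors.
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
print(parsed == records)  # True
```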

Advantages of .jsonl

  1. Stream processing - Read and process one line at a time, so memory use depends on the largest record rather than the file size
  2. Append-friendly - Add new records by appending lines, without reading or rewriting the existing file
  3. Fault tolerant - A corrupt line can be skipped without losing the rest of the file
  4. Parallelizable - Split the file at line boundaries and process the chunks independently
  5. Log-friendly - Well suited to append-only logs
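A minimal sketch of points 2 and 3, using a temporary file: a new record is added in "a" (append) mode without reading what is already there, and the reader skips a line it cannot parse instead of failing on the whole file. The function name read_jsonl_tolerant and the corrupt line are illustrative assumptions, not part of any standard API.

```python
import json
import os
import tempfile

def read_jsonl_tolerant(path):
    """Return (records, bad_line_numbers), skipping lines that are not valid JSON."""
    records, bad = [], []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore empty lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad.append(lineno)  # note the failure and keep going
    return records, bad

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")

# Initial write, including one deliberately truncated record on line 2.
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps({"id": 1, "name": "Alice"}) + "\n")
    f.write('{"id": 2, "name": "Bo\n')

# Appending adds a line at the end; existing content is never read or rewritten.
with open(path, "a", encoding="utf-8") as f:
    f.write(json.dumps({"id": 3, "name": "Charlie"}) + "\n")

records, bad = read_jsonl_tolerant(path)
print([r["name"] for r in records])  # ['Alice', 'Charlie']
print(bad)  # [2]
```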

When to Use Each

Use .json when:

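  • Data is a single document with nested structure (e.g. a configuration file or API response)
  • The data must be read as one unit (e.g. objects that reference each other)
  • The file fits comfortably in memory
  • A consumer (API, config loader, other tool) expects standard JSON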

Use .jsonl when:

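  • Processing datasets too large to load into memory at once
  • Streaming data (logs, events)
  • Records arrive or are processed incrementally
  • Working with data tools that read JSON Lines natively (e.g. pandas.read_json with lines=True, Apache Spark's spark.read.json)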
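When a dataset arrives as a single .json file holding an array, converting it to .jsonl is a one-off pass: parse it once, then write one record per line. A minimal sketch, assuming the input has the {"users": [...]} shape of the earlier example; the function name json_array_to_jsonl, the file names, and the "users" key are assumptions to adapt.

```python
import json
import os
import tempfile

def json_array_to_jsonl(src, dst, key="users"):
    """Write each element of data[key] from a .json file as one line of a .jsonl file."""
    with open(src, encoding="utf-8") as f:
        data = json.load(f)  # the source still has to be parsed in full once
    count = 0
    with open(dst, "w", encoding="utf-8") as out:
        for record in data[key]:
            out.write(json.dumps(record) + "\n")
            count += 1
    return count

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "data.json")
dst = os.path.join(tmp, "data.jsonl")
with open(src, "w", encoding="utf-8") as f:
    json.dump({"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}, f)

n = json_array_to_jsonl(src, dst)
with open(dst, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(n)  # 2
print(lines[0])  # {"id": 1, "name": "Alice"}
```

After conversion, later runs can stream the .jsonl file line by line instead of loading the whole array.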

Python Processing Examples

Reading .json:

import json

with open('data.json', encoding='utf-8') as f:
    data = json.load(f)  # parse the entire file into memory at once

Reading .jsonl:

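import json

with open('data.jsonl', encoding='utf-8') as f:
    for line in f:                     # reads one line at a time
        if line.strip():               # skip empty lines
            record = json.loads(line)  # parse just this line
            process(record)            # process() is a placeholder for your own handler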