.json vs .jsonl
Key Differences
.json (Standard JSON)
- One JSON object/array per file
- Must be valid JSON as a whole
- Usually parsed as a whole: standard parsers such as Python's json.load read the entire file into memory before returning anything
Example:
{
  "users": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Charlie"}
  ]
}
.jsonl (JSON Lines)
- One JSON object per line
- Each line is independently valid JSON
- Also called: newline-delimited JSON (ndjson), line-delimited JSON
Example:
{"id": 1, "name": "Alice"}
{"id": 2, "name": "Bob"}
{"id": 3, "name": "Charlie"}
Advantages of .jsonl
- Stream processing - Read and process one line at a time, so memory use depends on the largest record rather than the file size
- Append-friendly - Add new records by appending lines, without rewriting or reparsing the existing file
- Fault tolerant - A corrupt line can be skipped without losing the other records
- Parallelizable - Easy to split at line boundaries and process chunks independently
- Log-friendly - A natural fit for append-only logs
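The append-friendly and fault-tolerant points above can be illustrated with a minimal sketch. The helper names `append_record` and `read_valid_records` and the file name `events.jsonl` are hypothetical, chosen only for this example; the truncated second line simulates an interrupted write.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Append one record as a single JSON line; existing lines are untouched."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_valid_records(path):
    """Return (records, bad_line_numbers), skipping lines that fail to parse."""
    records, bad_lines = [], []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                bad_lines.append(lineno)  # remember where the damage is
    return records, bad_lines


# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_record(path, {"id": 1, "name": "Alice"})
with open(path, "a", encoding="utf-8") as f:
    f.write('{"id": 2, "name": \n')  # simulate a truncated write
append_record(path, {"id": 3, "name": "Charlie"})

records, bad_lines = read_valid_records(path)
print(records)    # the two intact records
print(bad_lines)  # line numbers that could not be parsed
```

Whether to skip, log, or abort on a bad line is a design choice; this sketch records the line number so the damage can be inspected later.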
When to Use Each
Use .json when:
- Data is a single document with complex nested structure rather than a sequence of similar records
- Related objects need to stay together under one top-level structure
- File size is manageable
- Standard interchange format required
Use .jsonl when:
- Processing large datasets
- Streaming data (logs, events)
- Need incremental processing
- Working with big data tools (many support jsonl natively)
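If a dataset outgrows one format, converting between the two is usually straightforward when the .json file wraps an array of records. The following is a rough sketch, assuming the records sit under a "users" key as in the earlier example; the function names `json_array_to_jsonl` and `jsonl_to_list` are hypothetical.

```python
import json


def json_array_to_jsonl(items):
    """Serialize each item of a list as one line of JSON Lines text."""
    return "".join(json.dumps(item) + "\n" for item in items)


def jsonl_to_list(text):
    """Parse JSON Lines text back into a list, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# A small standard JSON document, shaped like the example earlier on this page.
document = json.loads(
    '{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}'
)

jsonl_text = json_array_to_jsonl(document["users"])
print(jsonl_text, end="")

# Round trip: wrap the parsed lines in the original top-level structure.
restored = {"users": jsonl_to_list(jsonl_text)}
```

Note that `json.dumps` emits no newlines by default, which is what keeps each record on a single line.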
Python Processing Examples
Reading .json:
import json

with open('data.json') as f:
    data = json.load(f)  # Parse the entire file into one Python object
Reading .jsonl:
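import json

with open('data.jsonl') as f:
    for line in f:
        if line.strip():  # Skip blank lines, which are not valid JSON
            record = json.loads(line)  # Parse one line at a time
            process(record)  # process() stands in for your own handler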
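The line-by-line loop above extends naturally to batches, which is one way to hand chunks to separate workers. This is a minimal sketch, not a tuned implementation: `iter_batches` is a hypothetical helper, and an in-memory `io.StringIO` stands in for an open .jsonl file.

```python
import io
import json
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield lists of up to batch_size parsed records from JSONL lines."""
    # Lazy generator: lines are parsed only as each batch is requested.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(records, batch_size))
        if not batch:
            return
        yield batch


# An in-memory stand-in for an open .jsonl file.
sample = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
batches = list(iter_batches(sample, batch_size=2))
print(batches)
```

Each batch could then be passed to a worker pool; only one batch needs to be held in memory at a time if the batches are consumed as they are produced.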