File and Data Serialization - CameronAuler/python-devops GitHub Wiki
Data serialization allows storing and transferring structured data in formats like CSV, JSON, XML, and Pickle. This is essential for saving, sharing, and loading data in different applications.
CSV (Comma-Separated Values) is a tabular format where values are separated by commas. It is used in spreadsheets, databases, and data exchanges.
Use case: Extracting data from spreadsheets, logs, and databases.
import csv
with open("data.csv", mode="r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each row is a list of values
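Every value that `csv.reader` yields is a string, so numeric columns need explicit conversion. The following is a minimal sketch, assuming a hypothetical two-column file with a header row; an in-memory string stands in for `data.csv`.

```python
import csv
import io

# Hypothetical file contents; io.StringIO stands in for open("data.csv", newline="").
sample = io.StringIO("Name,Age\nAlice,25\nBob,30\n", newline="")

reader = csv.reader(sample)
header = next(reader)  # Consume the header row so it is not treated as data

people = []
for name, age in reader:
    people.append((name, int(age)))  # csv yields strings; convert Age explicitly

print(header)
print(people)
```

Calling `next(reader)` once before the loop is a common way to skip a header line when plain `csv.reader` is used.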
Use case: Saving structured data for easy storage and sharing.
import csv
data = [["Name", "Age"], ["Alice", 25], ["Bob", 30]]
with open("output.csv", mode="w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)  # Write multiple rows
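With its default settings, `csv.writer` quotes any field that contains the delimiter, so values with embedded commas survive a round trip. This sketch illustrates that with a made-up "Note" column and an in-memory buffer in place of `output.csv`.

```python
import csv
import io

# In-memory buffer standing in for open("output.csv", "w", newline="").
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Name", "Note"])
writer.writerow(["Alice", "likes tea, not coffee"])  # Field contains a comma

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original field intact.
rows = list(csv.reader(io.StringIO(csv_text, newline="")))
print(rows)
```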
Use cases: Handling CSV files with named columns.
import csv
with open("data.csv", mode="r", newline="") as file:
    reader = csv.DictReader(file)  # Reads each row into a dictionary keyed by column name
    for row in reader:
        print(row["Name"], row["Age"])
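The writing counterpart of `csv.DictReader` is `csv.DictWriter`, which takes the column order from a `fieldnames` list. Below is a small sketch under the assumption that the records are dictionaries with `Name` and `Age` keys; the buffer again replaces a real file.

```python
import csv
import io

# Hypothetical records; the keys mirror the column names used in this section.
records = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
]

# In-memory buffer standing in for open("output.csv", "w", newline="").
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["Name", "Age"])
writer.writeheader()       # Writes the header row from fieldnames
writer.writerows(records)  # Writes one line per dictionary

csv_text = buffer.getvalue()
print(csv_text)
```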
JavaScript Object Notation (JSON) is a human-readable format used for APIs and data storage. It supports nested structures (dictionaries & lists).
Use cases: Processing API responses and structured configuration files.
import json
with open("data.json", "r") as file:
    data = json.load(file)  # Load JSON data into a Python dictionary
    print(data)
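Both `json.load` and `json.loads` raise `json.JSONDecodeError` when the input is not valid JSON. One possible way to handle that is sketched below, using a deliberately malformed string as a stand-in for a broken file.

```python
import json

# Hypothetical malformed input: the trailing comma is not valid JSON.
bad_text = '{"name": "Alice", "age": 25,}'

try:
    data = json.loads(bad_text)
except json.JSONDecodeError as err:
    # err.msg, err.lineno and err.colno describe where parsing failed
    print(f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
    data = None
```

Falling back to `None` is only one option; a real application might re-raise the error or load a default configuration instead.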
Use cases: Saving structured data for configuration, APIs, and inter-service communication.
import json
data = {"name": "Alice", "age": 25}
with open("output.json", "w") as file:
    json.dump(data, file, indent=4)  # Save JSON with indentation
Use cases: Converting between JSON strings and Python objects for storage, APIs, and web development.
import json
json_str = '{"name": "Bob", "age": 30}'
python_dict = json.loads(json_str) # Convert JSON string to Python dictionary
print(python_dict["name"])
python_to_json = json.dumps(python_dict, indent=4) # Convert Python dictionary to JSON string
print(python_to_json)
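`json.dumps` only handles basic types (dictionaries, lists, strings, numbers, booleans, and `None`) and raises `TypeError` for anything else. The `default` parameter accepts a fallback function for other types. In this sketch the `encode_extra` helper and the `joined` field are illustrative names, not part of the original examples.

```python
import json
from datetime import date


def encode_extra(value):
    """Fallback used by json.dumps for types it cannot serialize itself."""
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


record = {"name": "Bob", "joined": date(2024, 1, 15)}  # Hypothetical record
encoded = json.dumps(record, default=encode_extra)
print(encoded)

decoded = json.loads(encoded)  # The date comes back as a plain string
print(decoded)
```

The conversion is one-way: after loading, `joined` is a string, and turning it back into a `date` is left to the caller.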
XML (Extensible Markup Language) stores hierarchical data using tags. It is commonly used for web services, configurations, and structured data storage.
Use case: Parsing data from XML-based web services and configurations.
import xml.etree.ElementTree as ET
xml_data = """<data>
<person>
<name>Alice</name>
<age>25</age>
</person>
</data>"""
root = ET.fromstring(xml_data) # Parse XML string
for person in root.findall("person"):
    name = person.find("name").text
    age = person.find("age").text
    print(name, age)
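`find()` returns `None` when a child element is missing, so `person.find("age").text` would raise `AttributeError` on incomplete records. A more defensive variant is sketched here with `findtext()` and `get()`; the `id` attribute and the second record without an `<age>` element are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the id attribute and the missing <age> are illustrative.
xml_data = """<data>
    <person id="1"><name>Alice</name><age>25</age></person>
    <person id="2"><name>Bob</name></person>
</data>"""

root = ET.fromstring(xml_data)
people = []
for person in root.findall("person"):
    person_id = int(person.get("id"))  # Attribute values are strings
    name = person.findtext("name")
    age_text = person.findtext("age")  # None when <age> is absent
    age = int(age_text) if age_text is not None else None
    people.append((person_id, name, age))

print(people)
```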
Use case: Saving structured hierarchical data in XML format.
import xml.etree.ElementTree as ET
root = ET.Element("data") # Root element
person = ET.SubElement(root, "person")
ET.SubElement(person, "name").text = "Alice"
ET.SubElement(person, "age").text = "25"
tree = ET.ElementTree(root)
tree.write("output.xml")
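With default arguments, `tree.write()` emits the whole document on one line and without an XML declaration. The sketch below shows one way to get indented output with a declaration; it assumes Python 3.9 or later for `ET.indent`, and uses an in-memory buffer in place of `output.xml`.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("data")
person = ET.SubElement(root, "person")
ET.SubElement(person, "name").text = "Alice"
ET.SubElement(person, "age").text = "25"

ET.indent(root)  # Adds newlines and indentation in place (Python 3.9+)

# io.BytesIO stands in for a file path such as "output.xml".
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_text = buffer.getvalue().decode("utf-8")
print(xml_text)
```

Passing a file name instead of the buffer to `write()` would produce the same content on disk.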
The `pickle` module serializes Python objects into a binary format. It allows for saving complex objects like dictionaries, lists, or custom classes.
Use case: Saving Python objects for later use.
import pickle
data = {"name": "Alice", "age": 25}
with open("data.pkl", "wb") as file:  # "wb" mode for writing in binary
    pickle.dump(data, file)
Use case: Restoring objects from a saved state.
import pickle
with open("data.pkl", "rb") as file:  # "rb" mode for reading binary
    data = pickle.load(file)
    print(data)  # Output: {'name': 'Alice', 'age': 25}
Use case: Storing Python class instances for later use.
import pickle
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

person = Person("Alice", 25)

# Save object
with open("person.pkl", "wb") as file:
    pickle.dump(person, file)

# Load object
with open("person.pkl", "rb") as file:
    loaded_person = pickle.load(file)
    print(loaded_person.name, loaded_person.age)  # Alice 25
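Pickle can also work entirely in memory through `pickle.dumps` and `pickle.loads`, and it preserves Python-specific types such as sets and tuples. The record below is a hypothetical example chosen to show this. Because unpickling can execute arbitrary code, `pickle.load` and `pickle.loads` should only be used on data from a trusted source.

```python
import pickle

# Hypothetical record using types JSON has no direct equivalent for.
data = {"name": "Alice", "roles": {"admin", "dev"}, "point": (1, 2)}

blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)  # bytes, not text
restored = pickle.loads(blob)  # Only unpickle data from a trusted source

print(type(blob).__name__)
print(restored == data)
```

`pickle.HIGHEST_PROTOCOL` selects the newest format the running interpreter supports, which older Python versions may not be able to read.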
Format | Best Use Case |
---|---|
CSV | Tabular data (spreadsheets, reports, logs) |
JSON | API communication, config files, lightweight storage |
XML | Hierarchical data (configurations, web services) |
Pickle | Saving Python objects (dictionaries, classes) |