09 03 Working with relational databases in Python - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki
- tables are linked
- Relational Database Management Systems
- PostgreSQL
- MySQL
- SQLite
Creating a database engine in Python
- SQLite database
- Fast and simple
- SQLAlchemy
- Works with many Relational Database Management Systems
'sqlite:///Northwind.sqlite'
is called the connection string to the SQLite databaseNorthwind.sqlite
# Import necessary module
from sqlalchemy import create_engine
# Create engine: engine
engine = create_engine('sqlite:///Northwind.sqlite')
- get table names :
table_names = engine.table_names()
: alist
of table names
Querying relational databases in Python
Basic SQL query
SELECT * FROM Table_Name
Workflow of SQL querying
- Import packages and functions :
pandas
&sqlalchemy``create_engine
- Create the database engine
engine = create_engine('sqlite:///xxx.sqlite')
- Connect to the engine :
con = engine.connect()
- Query the database :
rs = con.execute('SELECT * FROM Table')
- Save query results to a DataFrame:
df = pd.DataFrame(rs.fetchall())
- Set column names:
df.columns = rs.keys()
- Close the connection :
con.close()
Using the context manager
- after create the database engine
- Select specified columns from a table;
fetchmany(size=)
: Select a specified number of rows;WHERE
: filterORDER BY
: ordering
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
rs = con.execute('SELECT LastName, Title FROM Employee')
df = pd.DataFrame(rs.fetchmany(size=3))
df.columns = rs.keys()
Querying relational databases directly with pandas
df = pd.read_sql_query("SELECT * FROM Table", engine)
- A more complex querying:
df = pd.read_sql_query('SELECT * FROM Employee WHERE EmployeeId >=6 ORDER BY BirthDate', engine)
Advanced querying: exploiting table relationships
JOINing tables (INNER JOIN)
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
rs = con.execute('SELECT Title, Name FROM Album INNER JOIN Artist on Album.ArtistID = Artist.ArtistID')
df = pd.DataFrame(rs.fetchall())
df.columns =['Title','Name']
also
df = pd.read_sql_query('SELECT * FROM PlaylistTrack INNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId WHERE Milliseconds < 250000', engine)