09 03 Working with relational databases in Python - HannaAA17/Data-Scientist-With-Python-datacamp GitHub Wiki

  • tables are linked
  • Relational Database Management Systems
    • PostgreSQL
    • MySQL
    • SQLite

Creating a database engine in Python

  • SQLite database
    • Fast and simple
  • SQLAlchemy
    • Works with many Relational Database Management Systems
  • 'sqlite:///Northwind.sqlite' is called the connection string to the SQLite database Northwind.sqlite
# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Northwind.sqlite')
  • get table names : table_names = engine.table_names() : a list of table names

Querying relational databases in Python

Basic SQL query

SELECT * FROM Table_Name

Workflow of SQL querying

  • Import packages and functions : pandas & sqlalchemy``create_engine
  • Create the database engine engine = create_engine('sqlite:///xxx.sqlite')
  • Connect to the engine : con = engine.connect()
  • Query the database : rs = con.execute('SELECT * FROM Table')
  • Save query results to a DataFrame: df = pd.DataFrame(rs.fetchall())
  • Set column names: df.columns = rs.keys()
  • Close the connection : con.close()

Using the context manager

  • after create the database engine
  • Select specified columns from a table;
  • fetchmany(size=): Select a specified number of rows;
  • WHERE: filter
  • ORDER BY: ordering
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('SELECT LastName, Title FROM Employee')
    df = pd.DataFrame(rs.fetchmany(size=3))
    df.columns = rs.keys()

Querying relational databases directly with pandas

  • df = pd.read_sql_query("SELECT * FROM Table", engine)
  • A more complex querying:
    • df = pd.read_sql_query('SELECT * FROM Employee WHERE EmployeeId >=6 ORDER BY BirthDate', engine)

Advanced querying: exploiting table relationships

JOINing tables (INNER JOIN)

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute('SELECT Title, Name FROM Album INNER JOIN Artist on Album.ArtistID = Artist.ArtistID')
    df = pd.DataFrame(rs.fetchall())
    df.columns =['Title','Name']

also

df = pd.read_sql_query('SELECT * FROM PlaylistTrack INNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId WHERE Milliseconds < 250000', engine)