python sqlalchemy

https://www.sqlalchemy.org/

from sqlalchemy import create_engine, text
import pandas as pd

cnxn_string = ("postgresql+psycopg2://{username}:{pswd}"
"@{host}:{port}/{database}")
print(cnxn_string)

engine = create_engine(cnxn_string.format(
username="your_username",
pswd="your_password",
host="your_host",
port=5432,
database="your_database_name"))

# SQLAlchemy 2.0 removed Engine.execute(); run ad hoc SQL through a Connection
with engine.connect() as conn:
    print(conn.execute(text("SELECT * FROM customers LIMIT 2;")).fetchall())

# use the SQLAlchemy Engine object to read data into a Pandas DataFrame
customers_data = pd.read_sql_table('customers', engine)
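read_sql_table also takes optional arguments; for example, schema picks the database schema and columns limits which columns are fetched. A minimal sketch (the 'public' schema and the column list are assumptions about this database):

customers_subset = pd.read_sql_table('customers', engine,
                                     schema='public',
                                     columns=['city', 'gender'])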

At minimum, the Pandas read_sql_table function takes two parameters: the name of a table and the connectable (in this case, the SQLAlchemy Engine). Alternatively, we can use the read_sql_query function, which takes a query string instead of a table name.

query = """
SELECT city,
COUNT(1) AS number_of_customers,
COUNT(NULLIF(gender, 'M')) AS female,
COUNT(NULLIF(gender, 'F')) AS male
FROM customers
WHERE city IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
"""

top_cities_data = pd.read_sql_query(query, engine)
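When the query depends on runtime values, read_sql_query can bind them safely through its params argument instead of formatting them into the string. A minimal sketch (the placeholder style is the pyformat style psycopg2 expects; the city value is an arbitrary example):

customers_in_city = pd.read_sql_query(
    "SELECT * FROM customers WHERE city = %(city)s",
    engine,
    params={"city": "Dallas"})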

ax = top_cities_data.plot.bar('city', y=['female', 'male'],
                              title='Number of Customers by Gender and City')
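plot.bar returns a matplotlib Axes object. In a plain Python script (as opposed to a notebook), the chart is only rendered after an explicit call to matplotlib:

import matplotlib.pyplot as plt
plt.show()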

Writing Data to the Database Using Python

A Pandas DataFrame can be written back to the database using the to_sql(...) method, which requires two parameters: the name of the table to write to and the connectable (again, the SQLAlchemy Engine). Best of all, to_sql(...) also creates the target table for us by inferring column types from the DataFrame's dtypes.

top_cities_data.to_sql('top_cities_data', engine,
                       index=False, if_exists='replace')

In addition to the two required parameters, we passed two optional ones. The index parameter specifies whether the DataFrame index should also become a column in the database table (False means it is not included). The if_exists parameter controls what happens when a table with that name already exists in the database; here we want to drop that table and replace it with the new data, so we use the 'replace' option. In general, exercise caution with 'replace', as it can silently destroy existing data.
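to_sql can also override the inferred column types through its optional dtype parameter, and if_exists='append' adds rows to an existing table instead of dropping it. A minimal sketch (the SQLAlchemy types chosen here are illustrative assumptions, not what pandas would infer):

from sqlalchemy.types import Integer, String

top_cities_data.to_sql('top_cities_data', engine,
                       index=False, if_exists='append',
                       dtype={'city': String(100),
                              'number_of_customers': Integer(),
                              'female': Integer(),
                              'male': Integer()})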

Using \copy with Python

Pandas to_sql(...) accepts a callable through its optional method parameter. The function below (the insertion-method example from the pandas documentation) writes the rows to an in-memory CSV buffer and streams it through PostgreSQL's COPY ... FROM STDIN, the client-side mechanism behind psql's \copy.

import csv
from io import StringIO

def psql_insert_copy(table, conn, keys, data_iter):
  # pandas hands us the SQLAlchemy Table, a Connection, the column names,
  # and an iterator over the row tuples
  # get the raw DBAPI (psycopg2) connection, which can provide a cursor
  dbapi_conn = conn.connection
  with dbapi_conn.cursor() as cur:
    # write all rows into an in-memory CSV buffer
    s_buf = StringIO()
    writer = csv.writer(s_buf)
    writer.writerows(data_iter)
    s_buf.seek(0)
    columns = ', '.join('"{}"'.format(k) for k in keys)
    if table.schema:
      table_name = '{}.{}'.format(table.schema, table.name)
    else:
      table_name = table.name
    # stream the buffer to the server in a single COPY statement
    sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(table_name, columns)
    cur.copy_expert(sql=sql, file=s_buf)

top_cities_data.to_sql('top_cities_data', engine, index=False,
                       if_exists='replace', method=psql_insert_copy)
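Because COPY streams all rows in one statement instead of issuing one INSERT per row, this method is typically much faster than the default for large DataFrames; the trade-off is that it is PostgreSQL-specific.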

Loading data using INSERT

import psycopg2

# Establish a connection to the PostgreSQL database
conn = psycopg2.connect(database="your_database_name",
                        user="your_username", password="your_password",
                        host="your_host", port="your_port")
# Create a cursor object
cur = conn.cursor()
insert_query_vehicle = '''INSERT INTO chicago_dmv.Vehicle
(CRASH_UNIT_ID,
CRASH_ID,
CRASH_DATE,
VEHICLE_ID,
VEHICLE_MAKE,
VEHICLE_MODEL,
VEHICLE_YEAR,
VEHICLE_TYPE)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s);'''

# df is assumed to be a pandas DataFrame already loaded with the crash records
for index, row in df.iterrows():
  # collect this row's vehicle values in column order
  values_vehicle = (row['CRASH_UNIT_ID'],
                    row['CRASH_ID'],
                    row['CRASH_DATE'],
                    row['VEHICLE_ID'],
                    row['VEHICLE_MAKE'],
                    row['VEHICLE_MODEL'],
                    row['VEHICLE_YEAR'],
                    row['VEHICLE_TYPE'])
  # Insert this row into the Vehicle table
  cur.execute(insert_query_vehicle, values_vehicle)

# Commit the changes to the database
conn.commit()
# Close the cursor and database connection
cur.close()
conn.close()
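Calling execute() once per row is slow for large DataFrames, since every statement is a separate round trip. psycopg2's extras module can batch the same insert with execute_values; a minimal sketch that replaces the iterrows() loop above (it assumes the same df and must run while conn and cur are still open):

from psycopg2.extras import execute_values

batch_insert_vehicle = '''INSERT INTO chicago_dmv.Vehicle
(CRASH_UNIT_ID, CRASH_ID, CRASH_DATE, VEHICLE_ID,
VEHICLE_MAKE, VEHICLE_MODEL, VEHICLE_YEAR, VEHICLE_TYPE)
VALUES %s;'''

# collect every row as a plain tuple, in column order
values = list(df[['CRASH_UNIT_ID', 'CRASH_ID', 'CRASH_DATE',
                  'VEHICLE_ID', 'VEHICLE_MAKE', 'VEHICLE_MODEL',
                  'VEHICLE_YEAR', 'VEHICLE_TYPE']]
              .itertuples(index=False, name=None))

# execute_values expands the single %s into batched VALUES lists
execute_values(cur, batch_insert_vehicle, values)
conn.commit()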