SQL - taoualiw/My-Knowledge-Base GitHub Wiki

SQL

Importance of Data normalization :

  1. Are the tables storing logical groupings of the data?
  2. Can I make changes in a single location, rather than in many tables for the same information?
  3. Can I access and manipulate data quickly and efficiently? In a normalized database, the relationships among the tables match the relationships that are really there among the data. Checkout Kent's paper "A Simple Guide to Five Normal Forms in Relational Database Theory". Normalization Rules :
  • Every row has the same number of columns
  • there is a unique key, and everything in a row says something about the key
  • Facts that don't relate to the key belong in different table
  • Tables shouldn't imply relationships that do not exist

PK - The PK here stands for primary key. A primary key exists in every table, and it is a column that has a unique value for every row. FK - A foreign key is when we see a primary key in another table. JOIN applies between an FK and an PK

An SQL query returns a table.

- SQL Types:

The exact list of types differs from one database to another. Some of them are:

  • Text and string types:
    • text - a string of any length . Values are written in 'single quotes'
    • char(n) β€” a string of exactly n characters.
    • varchar(n) β€” a string of up to n characters.
  • Numeric types:
    • integer - like int in python but with different limits
    • serial - is an integer with special property of getting default values
    • real β€” a floating-point value, like Python float. Accurate up to six decimal places.
    • double precision β€” a higher-precision floating-point value. Accurate up to 15 decimal places.
    • decimal β€” an exact decimal value.
  • Date and time types
    • date β€” a calendar date; including year, month, and day. Values are written lile '2017-05-12' (single quotes)
    • time β€” a time of day.
    • timestamp β€” a date and time together.

- Listing tables and columns

One thing terrible in SQL is that there is no standard way of listing tables columns. Each database system supports doing that but they all do it in different ways :

  • PostgreSQl : \dt and \d table-name
  • MySQL: show tables and describe table-name
  • Sqlite : .tables and .schema

and in most of them you can't do this from your code itself only from the database console or from a special administrative software

- Select clauses:

select * from posts \watch

--> display and refresh

SELECT name FROM table_name LIMIT 10 OFFSET 7;

--> returns 10 rows in one page starting from the 7th row

... ORDER BY species DESC;

--> sorts result rows in reverse order

SELECT name,count(*) AS num FROM animals Group By name;

---> Group by defines which column to use when aggregating (count, sum, avg)

Subqueries:

select avg(bigscore) from (select max(score) as bigscore from mooseball group by team) as maxes;

--> postgreSQL requires an alias for subquery (as maxes) but sqlite no. Why do that in a database, I know how to sort a list in python ?:

  • Speed / Memory is much better handled in databases

- Adding rows:

INSERT INTO table_name ( column1, column2, ... ) VALUES ( val1, val2, ... );

- Inner Joins

select animals.name from animals join diet on diet.species = animals.species where diet.food='fish';

- Self Inner Joins

To find pairs of entries that have something in common

-- This query is intended to find pairs of roommates who live in the same building and same room

select a.id, b.id, a.building, a.room
       from residences as a, residences as b
 where a.building = b.building
   and a.room = b.room
   and a.id < b.id --to avoid dubplicates
 order by a.building, a.room;

- Outer (Left/Right) Joins

A regular (inner) join returns only those rows where the two tables have entries matching the join condition. A left join returns all those rows, plus the rows where the left table has an entry but the right table doesn’t. And a right join does the same but for the right table.

select products.name, products.sku, count(sales.sku) as num
  from products left join sales
    on products.sku = sales.sku
  group by products.sku;

- After Aggragating

  • you can't use where after a group by, WHERE runs always before aggregations. You should use HAVING instead.
select name, count(*) as num from sales having num > 5;
select food, count(animals.name) as num
       from diet join animals
       on diet.species = animals.species
       group by food
       having num = 1

More Joins

select ordernames.name, count(*) as num
  from (animals join taxonomy
                on animals.species = taxonomy.name)
                as ani_tax
        join ordernames
             on ani_tax.t_order = ordernames.t_order
  group by ordernames.name
  order by num desc

- Create tables

create table tablename(column1 type [constraints],column2 type [constraints]....[row constraints]);
create table students(id serial primary key, name text, birthdate date);
create table grades(student_id integer references students(id), course text, grade text);

--> defines relation with first data base. Here student_id is a foreign key, it uniquely identifies a row in a reference table (students).

create table postal(postal_code text, country text, name text, primary key (postal_code,country));

Use .sql file

Given an existing database_name:

psql -f file.sql database_name

- Create views

A view is a select query stored in the database in a way that lets you use it like a table. In SQLite you can not update/delete rows in a view, but in postgreSQL it is allowed on simple views (without joins, aggreagations..).

create view topfive as
select species, count(*) as num
  from animals
  group by species
  order by num desc
  limit 5;

Use SQL from Python

Behind web servers, we are running Python that connects to a database (SQLite,Postgres..) using DB-API calls.

References :

⚠️ **GitHub.com Fallback** ⚠️