SQL - taoualiw/My-Knowledge-Base GitHub Wiki
Importance of Data normalization :
- Are the tables storing logical groupings of the data?
- Can I make changes in a single location, rather than in many tables for the same information?
- Can I access and manipulate data quickly and efficiently? In a normalized database, the relationships among the tables match the relationships that are really there among the data. Checkout Kent's paper "A Simple Guide to Five Normal Forms in Relational Database Theory". Normalization Rules :
- Every row has the same number of columns
- there is a unique key, and everything in a row says something about the key
- Facts that don't relate to the key belong in different table
- Tables shouldn't imply relationships that do not exist
PK - The PK here stands for primary key. A primary key exists in every table, and it is a column that has a unique value for every row. FK - A foreign key is when we see a primary key in another table. JOIN applies between an FK and an PK
An SQL query returns a table.
The exact list of types differs from one database to another. Some of them are:
-
Text and string types:
- text - a string of any length . Values are written in 'single quotes'
- char(n) β a string of exactly n characters.
- varchar(n) β a string of up to n characters.
-
Numeric types:
- integer - like int in python but with different limits
- serial - is an integer with special property of getting default values
- real β a floating-point value, like Python float. Accurate up to six decimal places.
- double precision β a higher-precision floating-point value. Accurate up to 15 decimal places.
- decimal β an exact decimal value.
-
Date and time types
- date β a calendar date; including year, month, and day. Values are written lile '2017-05-12' (single quotes)
- time β a time of day.
- timestamp β a date and time together.
One thing terrible in SQL is that there is no standard way of listing tables columns. Each database system supports doing that but they all do it in different ways :
- PostgreSQl : \dt and \d table-name
- MySQL: show tables and describe table-name
- Sqlite : .tables and .schema
and in most of them you can't do this from your code itself only from the database console or from a special administrative software
select * from posts \watch
--> display and refresh
SELECT name FROM table_name LIMIT 10 OFFSET 7;
--> returns 10 rows in one page starting from the 7th row
... ORDER BY species DESC;
--> sorts result rows in reverse order
SELECT name,count(*) AS num FROM animals Group By name;
---> Group by defines which column to use when aggregating (count, sum, avg)
Subqueries:
select avg(bigscore) from (select max(score) as bigscore from mooseball group by team) as maxes;
--> postgreSQL requires an alias for subquery (as maxes) but sqlite no. Why do that in a database, I know how to sort a list in python ?:
- Speed / Memory is much better handled in databases
INSERT INTO table_name ( column1, column2, ... ) VALUES ( val1, val2, ... );
select animals.name from animals join diet on diet.species = animals.species where diet.food='fish';
To find pairs of entries that have something in common
-- This query is intended to find pairs of roommates who live in the same building and same room
select a.id, b.id, a.building, a.room
from residences as a, residences as b
where a.building = b.building
and a.room = b.room
and a.id < b.id --to avoid dubplicates
order by a.building, a.room;
A regular (inner) join returns only those rows where the two tables have entries matching the join condition. A left join returns all those rows, plus the rows where the left table has an entry but the right table doesnβt. And a right join does the same but for the right table.
select products.name, products.sku, count(sales.sku) as num
from products left join sales
on products.sku = sales.sku
group by products.sku;
- you can't use where after a group by,
WHERE
runs always before aggregations. You should useHAVING
instead.
select name, count(*) as num from sales having num > 5;
select food, count(animals.name) as num
from diet join animals
on diet.species = animals.species
group by food
having num = 1
More Joins
select ordernames.name, count(*) as num
from (animals join taxonomy
on animals.species = taxonomy.name)
as ani_tax
join ordernames
on ani_tax.t_order = ordernames.t_order
group by ordernames.name
order by num desc
create table tablename(column1 type [constraints],column2 type [constraints]....[row constraints]);
create table students(id serial primary key, name text, birthdate date);
create table grades(student_id integer references students(id), course text, grade text);
--> defines relation with first data base. Here student_id is a foreign key, it uniquely identifies a row in a reference table (students).
create table postal(postal_code text, country text, name text, primary key (postal_code,country));
Given an existing database_name:
psql -f file.sql database_name
A view is a select query stored in the database in a way that lets you use it like a table. In SQLite you can not update/delete rows in a view, but in postgreSQL it is allowed on simple views (without joins, aggreagations..).
create view topfive as
select species, count(*) as num
from animals
group by species
order by num desc
limit 5;
Behind web servers, we are running Python that connects to a database (SQLite,Postgres..) using DB-API calls.