SQL Query Examples - saitomics/metatryp GitHub Wiki
Using the Database
The main database is a SQLite standalone database file proteomics.db.sqlite
.
The redundancy database is a SQLite standalone database file redundancy.db.sqlite
.
If sqlite installed, then these can be queried directly via sqlite:
sqlite3 redundancy.db.sqlite
or sqlite3 proteomics.db.sqlite
Use of the sqlite3 client is out of the scope of this document. Run the command 'man sqlite3' for more information about the sqlite3 client. (type 'q' to leave the man page)
For general SQL tutorials, try this site: http://sqlzoo.net/
If you plan to use the database, it is recommended that you understand the 'Database Schema' section. Below are Example Queries:
Identify all taxa in the redundancy table database
SELECT
taxon.id
FROM
taxon
Count unique peptides in a taxon digest
SELECT
count(*)
FROM
taxon_digest_peptide
JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
JOIN taxon ON taxon_digest.taxon_id = taxon.id
JOIN digest on taxon_digest.digest_id = digest.id
WHERE
taxon.id = 'syn7805'
AND digest.id = 1
List Peptides for a given taxon digest
SELECT
taxon.id, peptide.sequence
FROM
taxon_digest_peptide
JOIN peptide ON taxon_digest_peptide.peptide_id = peptide.id
JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
JOIN taxon ON taxon_digest.taxon_id = taxon.id
JOIN digest on taxon_digest.digest_id = digest.id
WHERE
taxon.id = 'syn7805'
AND digest.id = 1
List taxa whose digests produce a specific peptide sequence
SELECT
peptide.sequence, taxon.id
FROM
taxon_digest_peptide
JOIN peptide ON taxon_digest_peptide.peptide_id = peptide.id
JOIN taxon_digest ON taxon_digest_peptide.taxon_digest_id = taxon_digest.id
JOIN taxon ON taxon_digest.taxon_id = taxon.id
JOIN digest on taxon_digest.digest_id = digest.id
WHERE
peptide.sequence = 'ATLQP'
AND digest.id = 1
ORDER BY
taxon.id
Count number of proteins that occur multiple times in a taxon proteome
SELECT
COUNT(*) AS multiply_occuring_proteins
FROM (
SELECT
COUNT(protein.id) AS num_instances
FROM
taxon_protein
JOIN taxon ON taxon_protein.taxon_id = taxon.id
JOIN protein ON taxon_protein.protein_id = protein.id
WHERE
taxon.id = 'syn7805'
GROUP BY
protein.id
HAVING
num_instances > 1
) AS subquery