Home - PRIDE-Archive/pride-converter-2 GitHub Wiki
#Introduction
PRIDE Converter 2 is a complete rewrite of the previous version (PRIDE Converter).
##How to build the tool
The main directory for the converter tool is: https://github.com/PRIDE-Toolsuite/pride-converter-2/tree/master/pride-converter
This is the directory that should be packaged. To do so, first check out the code, then change to that directory in a terminal and run the command: mvn package
Each subdirectory of the repository's root directory is itself a library. These libraries should be released and deployed to the new EBI Nexus, as listed in the POM files.
If a dependency is not present in Nexus, the converter tool cannot be built. A missing dependency is an oversight, and the library needs to be released properly. In the meantime, such a library has to be installed manually on your local machine with Maven. For example, for the converter-common library dependency under https://github.com/PRIDE-Toolsuite/pride-converter-2/tree/master/converter-common first check out the code, then change to that directory in a terminal and run the command: mvn install
This would need to be repeated for all missing dependencies.
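
The build steps above can be sketched as a small helper script. This is only an illustration, not part of PRIDE Converter 2: the function names `build_plan` and `run_plan` are hypothetical, and the sketch assumes that `mvn` is on your PATH and that the repository has already been checked out locally.

```python
import subprocess
from pathlib import Path


def build_plan(repo_root, missing_modules):
    """Return (directory, command) pairs for a local build.

    Each missing library is installed first, then the converter tool
    itself is packaged from the pride-converter directory.
    """
    root = Path(repo_root)
    plan = [(root / module, ["mvn", "install"]) for module in missing_modules]
    plan.append((root / "pride-converter", ["mvn", "package"]))
    return plan


def run_plan(plan):
    """Run each Maven command in its directory, stopping at the first failure."""
    for directory, command in plan:
        subprocess.run(command, cwd=directory, check=True)


# Example (assumes the checkout lives in ./pride-converter-2 and that only
# converter-common is missing from Nexus):
# run_plan(build_plan("pride-converter-2", ["converter-common"]))
```

Listing the missing modules explicitly keeps the manual workaround visible, so it can be dropped once the libraries are released to Nexus.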
#Supported Formats
Currently, PRIDE Converter 2 supports the following MS data formats:
Format | Type |
---|---|
Mascot .dat | Identifications + Spectra |
X!Tandem | Identifications + (processed spectra). Can process additional peak list files for spectra |
mzIdentML | Identifications + spectra in additional file |
OMSSA | Identifications + Spectra |
dta | Spectra |
mgf | Spectra |
ms2 | Spectra |
mzML | Spectra |
mzXML | Spectra |
pkl | Spectra |
Crux-txt | Identifications + Spectra |
SpectraST-xls | Identifications + Spectra |
OMSSA-csv | Identifications + Spectra |
#Supported additional data
Additional data is generally extracted from mzTab files. A "skeleton" mzTab file can be generated by running PRIDE Converter 2 with the mode set to "mztab". This generates a basic mzTab file based on the search engine's input file. Quantitative or gel-based data can then be added to this file as described below.
##Special mzTab fields
Some fields and values can be supplied using defined optional columns. The simpler fields are summarised in the following table:
Column Header | Level | Description |
---|---|---|
opt_empai | Protein | The emPAI for the given protein. This value is mapped to the cvParam PRIDE:0000363 "emPAI value". |
opt_tic | Protein, Peptide | The Total Ion Count (TIC) for the given protein or peptide. This value is mapped to the cvParam PRIDE:0000364 "TIC value". |
##Quantitative data
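
As an illustration of how such an optional column could be added to a skeleton mzTab file, here is a minimal sketch. It relies on mzTab being tab-separated, with the protein section using the line prefixes PRH (header) and PRT (rows). The function name `add_optional_column`, the shortened example header, the accessions and the "null" placeholder for missing values are assumptions made for this example; a real skeleton file contains more columns, and the placeholder should follow the mzTab version you are using.

```python
def add_optional_column(lines, header_prefix, row_prefix, column, values, missing="null"):
    """Append one optional column to a single mzTab section.

    `values` maps the first field of a row (the accession in the protein
    section) to the value to write. Rows without an entry receive `missing`,
    which is an assumed placeholder. Lines of other sections are unchanged.
    """
    result = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == header_prefix:
            fields.append(column)
        elif fields[0] == row_prefix:
            fields.append(str(values.get(fields[1], missing)))
        result.append("\t".join(fields))
    return result


# Simplified protein section; accessions and values are made up.
skeleton = [
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein A",
    "PRT\tQ67890\tExample protein B",
]
updated = add_optional_column(skeleton, "PRH", "PRT", "opt_empai", {"P12345": 0.42})
```

The same function could be used for opt_tic in the peptide section by passing that section's header and row prefixes, keyed on the first column of each row.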
Quantitative data should be reported as defined by the mzTab format specification. This information is then automatically parsed by PRIDE Converter. Detailed information about what kind of data needs to be present can be found in QuantitativeMzTabFiles.
##Gel-based data
PRIDE Converter is able to store basic gel-specific data in PRIDE XML files. For detailed information on how the data must be prepared, please see GelBasedData.
#Known Issues
- Duplicate protein entries: The Mascot DAO can report indistinguishable accessions for a protein identification. If a protein's accession is not found in the provided mzTab file but one of its indistinguishable accessions is, the protein's accession is replaced with that indistinguishable accession. In rare cases the PRIDE XML file might already contain an entry with that accession. This results in two protein entries with the same accession but different peptides in the PRIDE XML file.
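
One way to check a generated file for this issue is to count how often each accession occurs. The following is a minimal sketch and not a feature of PRIDE Converter 2. It assumes that protein entries in PRIDE XML are `GelFreeIdentification` or `TwoDimensionalIdentification` elements with a direct `Accession` child and no XML namespace; the function name `duplicate_accessions` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed element names for protein entries in PRIDE XML.
IDENTIFICATION_TAGS = ("GelFreeIdentification", "TwoDimensionalIdentification")


def duplicate_accessions(xml_source):
    """Return the sorted accessions that occur in more than one protein entry.

    `xml_source` is a file name or a binary file object. The file is parsed
    incrementally so that large PRIDE XML files need not fit in memory.
    """
    counts = Counter()
    for _, element in ET.iterparse(xml_source, events=("end",)):
        if element.tag in IDENTIFICATION_TAGS:
            accession = element.findtext("Accession")
            if accession:
                counts[accession.strip()] += 1
            element.clear()  # release the processed entry
    return sorted(acc for acc, n in counts.items() if n > 1)
```

Calling `duplicate_accessions("result.xml")`, where the file name is a placeholder, returns an empty list when every protein entry has a unique accession.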