2. Juan A. Vizcaíno
juan@ebi.ac.uk
PSI meeting
Ghent, 18 April 2016
Proteogenomics related formats
• Two formats are currently under development: proBed and
proBAM.
• Same overall objective: to map identified peptides to
genome coordinates.
• Different level of detail:
• proBed is tab-delimited and simpler, based on the original BED
format. Lower level of detail.
• proBAM is based on the original SAM/BAM formats, widely
used in genomics. Much higher level of detail.
proBed files
12 mandatory columns, 12 optional columns
• Tab-delimited format that extends the original BED format
(developed by UCSC).
• The 12 original BED fields are unchanged.
• 12 additional fields extend the format for reporting PSMs.
• It can be converted to the binary indexed version (bigBed)
with the available BED tools.
• It needs to remain compatible with those BED tools.
• bigBed is a format supported by genome browsers to
provide external annotations as ‘TrackHubs’.
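The column layout described above can be sketched in Python. This is a minimal illustration, not a reference parser: the 12 core names are the standard BED12 field names, while the additional proBed columns are kept as an unnamed list because their names are not given here. The function names, `EXAMPLE_LINE` and its values (`pep1`, `extra1`, `extra2`) are made up for the example.

```python
from typing import NamedTuple

# Standard BED12 column names, in order.
BED12_FIELDS = (
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount",
    "blockSizes", "chromStarts",
)


class BedRecord(NamedTuple):
    core: dict   # the 12 BED columns, with numeric columns converted
    extra: list  # any further columns (e.g. the proBed extensions), as raw strings


def parse_bed12_plus(line: str) -> BedRecord:
    """Split one tab-delimited line into 12 BED columns plus extras."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(cols)}")
    core = dict(zip(BED12_FIELDS, cols[:12]))
    for key in ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"):
        core[key] = int(core[key])
    for key in ("blockSizes", "chromStarts"):
        # BED allows a trailing comma in these lists, so skip empty items.
        core[key] = [int(x) for x in core[key].split(",") if x]
    return BedRecord(core, cols[12:])


def genomic_blocks(core: dict) -> list:
    """Return (start, end) genome intervals for each block.

    BED coordinates are 0-based and half-open; block starts are
    offsets relative to chromStart.
    """
    start = core["chromStart"]
    return [
        (start + offset, start + offset + size)
        for offset, size in zip(core["chromStarts"], core["blockSizes"])
    ]


# Made-up example: a peptide spanning two exons, plus two placeholder extra columns.
EXAMPLE_LINE = "\t".join([
    "chr1", "1000", "1130", "pep1", "900", "+", "1000", "1130", "0",
    "2", "15,15", "0,115", "extra1", "extra2",
])

record = parse_bed12_plus(EXAMPLE_LINE)
blocks = genomic_blocks(record.core)
```

A peptide that spans a splice junction is represented by more than one block, which is why the block columns matter for mapping peptides to genome coordinates.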
proBed file format specification
• Status: format is “final”.
• Final example files to be generated.
• It is not yet clear whether it is worth going
through the PSI standardisation process.
proBAM format (X. Wang, G. Menschaert)
• Supported use cases:
• Includes the exact genomic positions from which a peptide identification
originates
• Includes very detailed information related to proteomics data
• Serves as a well-defined interface between PSM identification and
downstream analyses
• The Sequence Alignment/Map format (SAM/BAM)
• BAM: a highly integrated and compact structure to describe genomic
alignments.
– Flexible in style
– A well-defined interface between alignment and downstream analyses
– Compact in size
– Efficient in random access or integration
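As a rough illustration of the SAM text layout that proBAM builds on, the sketch below splits one SAM line into its 11 mandatory fields and its optional `TAG:TYPE:VALUE` fields, then converts the 1-based SAM position into a 0-based, half-open BED-style interval. It is a minimal sketch under stated assumptions: the function names and the example line are made up, `NH` is a standard SAM tag, and `XX` is only a placeholder, not an actual proBAM tag.

```python
import re

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = (
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
)

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume positions on the reference.
REF_CONSUMING = set("MDN=X")


def parse_sam_line(line: str) -> dict:
    """Split one SAM alignment line into mandatory fields and optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        rec[key] = int(rec[key])
    tags = {}
    for item in cols[11:]:
        tag, typ, value = item.split(":", 2)
        if typ == "i":
            value = int(value)
        elif typ == "f":
            value = float(value)
        tags[tag] = value  # other types are kept as strings
    rec["tags"] = tags
    return rec


def reference_length(cigar: str) -> int:
    """Number of reference bases covered by the alignment."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def to_bed_interval(rec: dict) -> tuple:
    """Convert 1-based SAM POS + CIGAR into a 0-based, half-open interval."""
    start = rec["POS"] - 1
    return start, start + reference_length(rec["CIGAR"])


# Made-up example: a spliced alignment (15 bases, 100-base gap, 15 bases).
EXAMPLE_SAM = "\t".join([
    "pep1", "0", "chr1", "1001", "255", "15M100N15M",
    "*", "0", "0", "*", "*", "NH:i:1", "XX:Z:placeholder",
])

sam_record = parse_sam_line(EXAMPLE_SAM)
interval = to_bed_interval(sam_record)
```

The optional tag columns are what make the format flexible: additional proteomics detail can be attached to each alignment without changing the 11 mandatory columns.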
proBAM file specification
• Status: close to finishing the first phase (development by a
small group). To be circulated to the PSI mailing list.
• The format is much more detailed than proBed.
• First software tools to produce the format exist.
• It is going to be submitted to the PSI standardisation process.
Acknowledgements
proBed
Andy Jones
Tobias Ternent
Fawaz Ghali
proBAM
X. Wang
G. Menschaert
E. Deutsch
BBSRC PROCESS grant
BBSRC ProteoGenomics grant
Ensembl team at EMBL-EBI:
A. Yates, A. Vullo & P. Flicek