PIR & MINT
-MONA DUBEY
PROTEIN INFORMATION RESOURCE (PIR)
INTRODUCTION
● The Protein Information Resource (PIR) is an integrated public bioinformatics
resource to support genomic, proteomic and systems biology research and
scientific studies.
● The PIR database evolved from the original Protein Sequence Database of the
National Biomedical Research Foundation (NBRF), developed over a 20-year period
by the late Margaret O. Dayhoff and published as the ‘Atlas of Protein Sequence
and Structure’. PIR-International is a collaboration established in 1988 between
the NBRF, the Munich Information Center for Protein Sequences (MIPS), and the
Japan International Protein Information Database (JIPID) to collect and publish
what is now the oldest database of biomolecular sequence, source, bibliographic
and feature information.
Missions of PIR
1. To create and maintain the Protein Sequence Database as a comprehensive,
non-redundant, well verified collection, organized according to biological
principles, including structural, functional and evolutionary relationships.
2. To provide a research tool that supports the study of protein sequences, their
structural and functional properties, and their biological origins.
3. To freely distribute the database to the public by the most accessible means
including the PIR Web site (see Table 1) and CD-ROM.
4. To collaborate with other databases in organizing and coordinating the
presentation of biomolecular structural information.
FEATURES OF THE PROTEIN SEQUENCE
DATABASE
1. Non-redundancy: The database is non-redundant; identical and highly similar
sequences from the same species are merged into a single entry. In merged
entries, each separately reported sequence is represented in a manner that
clearly shows any differences with the canonical sequence shown in the entry
and that allows the reported sequence to be reconstructed on the PIR Web site.
2. Classification: PIR sequences are classified by sequence similarity into
superfamilies, families and homology domains. Alignments of these
families are available. Full-scale family classification assists database organization,
improves database integrity and supports database searches by gene families.
3. Standardized annotation: The PIR Database is a value-added database in which
entries are annotated to include important features not found in the original
submission. Full citations are given, including article titles. Genetic information is
provided, including map position, intron positions and start codon (if different from
AUG). Feature annotations and other terminology have been standardized, and
restricted vocabularies are enforced to provide greater accuracy and consistency.
4. Cross references: To optimize information retrieval, PIR entries are cross-
referenced to major molecular and reference databases, including Medline,
GenBank, EMBL, DDBJ, Protein Data Bank, Human Genome Database and others.
Hypertext-links to the cross-referenced database entry are available on the PIR
Web site.
5. Comprehensiveness: The Protein Sequence Database, supplemented with other
PIR-maintained databases, comprises the most comprehensive collection of non-
redundant protein sequences available.
6. Public domain with regular releases: The database is freely available to the public
and has been updated and released four times per year since 1988.
Weekly interim updates of the database are available for searching and browsing on
the PIR Web site. All sequence data are available to the public as soon as they are
available to the PIR staff.
7. Information retrieval: The database serves as a major information resource to
support biological research. Retrieval and knowledge discovery are facilitated by a
variety of search options including various database fields (such as superfamily,
authors, features and keywords) and direct database sequence similarity
searches. Family classification and multiple sequence alignments, coupled with
extensive hypertext-links, make it possible to rapidly find and retrieve information
on related sequences in PIR and other molecular databases.
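As a rough illustration of the non-redundancy rule in feature 1, the sketch below merges identical sequences reported for the same species into one entry and keeps the source of every report. The record fields, the sample data and the restriction to exact matches are assumptions made for illustration; PIR also merges highly similar sequences, which this sketch does not attempt.

```python
from collections import defaultdict

def merge_identical(reports):
    """Group reports by (species, sequence); one merged entry per group."""
    merged = defaultdict(list)
    for report in reports:
        key = (report["species"], report["sequence"])
        merged[key].append(report["source"])
    return [
        {"species": species, "sequence": sequence, "sources": sources}
        for (species, sequence), sources in merged.items()
    ]

# Hypothetical reports, not real PIR records.
reports = [
    {"species": "Homo sapiens", "sequence": "MKTAYIAK", "source": "report A"},
    {"species": "Homo sapiens", "sequence": "MKTAYIAK", "source": "report B"},
    {"species": "Mus musculus", "sequence": "MKTAYIAK", "source": "report C"},
]
entries = merge_identical(reports)
```

The two human reports collapse into one entry that lists both sources, while the mouse report stays separate because merging is restricted to the same species.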
Molecular Interaction database (MINT)
INTRODUCTION
MINT is a database designed to store data on functional
interactions between proteins. Beyond cataloguing binary
complexes, MINT was conceived to store other types of
functional interactions, including enzymatic modifications of
one of the partners. Release 1.0 of MINT focuses on
experimentally verified protein-protein interactions. Both
direct and indirect relationships are considered.
Furthermore, MINT aims at being exhaustive in the description
of the interaction and, whenever available, information about
kinetic and binding constants and about the domains
participating in the interaction is included in the entry.
● MINT consists of entries extracted from the
scientific literature by expert curators
assisted by `MINT Assistant', a software tool
that targets abstracts containing
interaction information and presents them
to the curator in a user-friendly format.
RESOURCES
● The interaction data can be easily extracted and viewed
graphically through `MINT Viewer'. Presently MINT
contains 4568 interactions, 782 of which are indirect or
genetic interactions.
● MINT is a relational database designed to collect and
integrate experimental protein interaction data in a
single database, accessible via a user-friendly web
interface written in PHP, an HTML-embedded scripting language
(originally `Personal Home Page'). The MINT core is stored in
an SQL server (PostgreSQL). The entity relationship model
underlying the database structure is shown in
simplified form in the figure on a later slide.
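A minimal sketch of what a relational layout for interaction data could look like is given below. It is not the actual MINT schema: the table and column names are hypothetical, and Python's built-in SQLite module stands in for the PostgreSQL server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    accession TEXT PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE interaction (
    id        INTEGER PRIMARY KEY,
    partner_a TEXT NOT NULL REFERENCES protein(accession),
    partner_b TEXT NOT NULL REFERENCES protein(accession),
    kind      TEXT NOT NULL,      -- e.g. 'binds', 'activates'
    is_direct INTEGER NOT NULL    -- 1 = direct, 0 = indirect
);
""")
conn.executemany(
    "INSERT INTO protein VALUES (?, ?)",
    [("ACC1", "protein one"), ("ACC2", "protein two")],
)
conn.execute(
    "INSERT INTO interaction (partner_a, partner_b, kind, is_direct) "
    "VALUES (?, ?, ?, ?)",
    ("ACC1", "ACC2", "binds", 1),
)

# Join the interaction back to the names of both partners.
row = conn.execute("""
    SELECT a.name, i.kind, b.name
    FROM interaction AS i
    JOIN protein AS a ON a.accession = i.partner_a
    JOIN protein AS b ON b.accession = i.partner_b
""").fetchone()
```

Keeping proteins and interactions in separate tables means each protein is stored once, however many interactions it takes part in.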
Data submission and MINT Assistant
● MINT entries are curated by expert biologists who carefully screen the
interaction information published in peer-reviewed journals.
● Each entry contains `core' information consisting of the
SWISS-PROT/TrEMBL (Translation of EMBL nucleotide sequence
database) accession numbers of the two proteins and the specification of the
functional interaction (binds, activates, phosphorylates, ...). Most of the entries in
the database currently refer to a PubMed identification (PMID) number.
Unpublished observations, however, can also be added to the database.
● Furthermore, the curator can enter information about the domains that are
demonstrated to be involved in the interaction, the binding and/or kinetic
constants and the experimental method(s) utilized to characterize the
interaction.
● The software scans titles and abstracts extracted from the scientific literature,
counting words that are frequent in papers describing protein-protein
interactions, essentially as already described. When a protein name is identified,
the program also registers the gene name, protein accession number and any
other information that is required to complete a protein-protein interaction
entry in MINT.
● The software output consists of several HTML pages that can be viewed in an
internet browser. The front page displays the abstract titles, ranked according
to the likelihood that they contain information about protein interactions, as assessed
by a statistical algorithm. By clicking a title, the MINT curator gains access to a
new HTML page that provides the information needed to complete the entry.
When an entry is completed, the information is stored in temporary tables
where the data are automatically double-checked and then entered in the
MINT database tables.
● The MIPS (Munich Information Center for Protein Sequences) yeast physical
and genetic interactions tables have also been incorporated into MINT.
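The word-counting step used to rank abstracts can be sketched roughly as follows. The cue-word list, the record fields and the plain count used as a score are all assumptions made for illustration; the slides do not specify the vocabulary or the statistical algorithm that MINT Assistant actually uses.

```python
import re
from collections import Counter

# Hypothetical cue words; the real tool's vocabulary is not given in the slides.
CUE_WORDS = {"interacts", "interaction", "binds", "binding",
             "complex", "phosphorylates"}

def cue_score(text):
    """Count how many word tokens of `text` are cue words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return sum(counts[word] for word in CUE_WORDS)

def rank_abstracts(abstracts):
    """Return titles sorted from highest to lowest cue-word score."""
    scored = [
        (cue_score(a["title"] + " " + a["abstract"]), a["title"])
        for a in abstracts
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]

# Made-up abstracts for demonstration only.
abstracts = [
    {"title": "Crystal growth in microgravity",
     "abstract": "We describe crystal growth."},
    {"title": "Kinase X binds adaptor Y",
     "abstract": "The interaction requires the SH3 domain; X phosphorylates Y."},
]
ranking = rank_abstracts(abstracts)
```

A curator would then work through `ranking` from the top, so the abstracts most likely to describe an interaction are read first.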
Searching, browsing and visualizing a
protein network
● Searches can be performed via protein name, accession number or keywords.
The search returns a list of entries containing the query names or keywords
(only if present in the KEYWORDS line of the SWISS-PROT entries).
● Clicking a protein ID displays all the interactions described in MINT
that have the selected protein as one of the partners. Each
interaction ID in the output page is hyperlinked to a MINT entry.
● To produce the output, the information about a specific interaction is
retrieved from one of the MINT tables and composed in frames:
1. The first frame contains information about the interacting proteins.
2. The second shows the features of the interaction itself and the corresponding
experimental procedure.
3. Finally, a third frame graphically displays the network of the
interacting proteins as produced by `MINT Viewer',
as shown in the image on the next slide.
● This tool is based on a Java applet derived from Sun's `Graph' applet
(http://java.sun.com) and adapted for use in this database. Proteins are
represented by ovals whose size is proportional to molecular weight. Protein
interactions are represented by lines (edges) connecting the proteins (nodes).
Both nodes and edges are interactive and the action of clicking results in the
display of additional information about the partner proteins and their
interaction or in the expansion of the displayed network.
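The node-and-edge view, and the way clicking a node expands the displayed network, can be approximated with an adjacency map. This is only a sketch of the idea in Python, not the applet's code; the function names and protein identifiers are hypothetical.

```python
from collections import defaultdict

def build_network(interactions):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network

def expand(network, shown, clicked):
    """Return the displayed node set after clicking one node to expand it."""
    return set(shown) | {clicked} | network[clicked]

# Hypothetical protein identifiers.
network = build_network([("P1", "P2"), ("P2", "P3"), ("P3", "P4")])
shown = expand(network, set(), "P1")   # P1 and its direct partner P2
shown = expand(network, shown, "P2")   # clicking P2 pulls in P3
```

Each click adds only the direct partners of the clicked protein, so the view grows one neighbourhood at a time instead of drawing the whole database at once.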
Current status of MINT
● As of 1 November 2001, the MINT database contained interaction information
about 3556 proteins from 64 different organisms. These proteins participate in
3786 pairwise interactions, 3 multimeric complexes, and 782 `indirect'
interactions.
● While 76% of the interactions rely on a single experimental procedure, mostly yeast
two-hybrid, as many as 206 interactions are supported by three independent
approaches.
● More than 700 articles have been processed manually by curators and 569
entries describe interactions between proteins of mammalian organisms.
● It is likely that, as the number of interactions in MINT increases, the smaller
clusters of interacting proteins will merge into a single network.
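The expectation that small clusters merge as interactions are added can be illustrated by counting connected components of the interaction network. The sketch below uses a small union-find structure; the protein identifiers and interactions are made up for the example.

```python
def count_clusters(proteins, interactions):
    """Count connected components with a small union-find."""
    parent = {p: p for p in proteins}

    def find(p):
        # Walk up to the root, shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in interactions:
        parent[find(a)] = find(b)  # merge the two clusters
    return len({find(p) for p in proteins})

# Hypothetical proteins and interactions.
proteins = ["P1", "P2", "P3", "P4", "P5"]
before = count_clusters(proteins, [("P1", "P2"), ("P3", "P4")])
after = count_clusters(
    proteins,
    [("P1", "P2"), ("P3", "P4"), ("P2", "P3"), ("P4", "P5")],
)
```

With two interactions the five proteins form three clusters; adding two more interactions that bridge the clusters leaves a single connected network.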
THANK YOU
