Kusarinoko: developing the public next generation sequencing data search interface that works.

Tazro Ohta
Database Center for Life Science
Research Organization of Information and Systems
Today's topics

Problems for NGS data archives
managing large-scale data

Kusarinoko project: a better way to search and browse
fixing and adding metadata

Inside the Sequence Read Archive
what SRA statistics reveal about it
Problems for NGS data archives
Cost of storing large-scale sequence data

Storing large-scale NGS data causes many problems:
data transfer, storage, backup...

Metadata management is one big problem for public NGS databases.
metadata: the description of the sequencing data, e.g. sample, sequencer platform, application.

Fixing metadata is a lifeline for a public NGS database.
Public NGS database, Sequence Read Archive

[Figure: a lab or research institute submits sequence data with metadata (Data ID: 000001; organism: mouse; cell: nervous cell; sequencer: 454; date: 2011 12 08) to DRA. DRA, SRA and ENA exchange and share the data within INSDC (International Nucleotide Sequence Database Collaboration).]
It is not easy to find the data you want

Over 55,000 submissions and over 350,000 sequencing runs,
and the number and size of the data are still increasing.

Metadata is provided separately and is not always described completely:
submission / study / experiment / sample / run

Fixing metadata and adding extra information is NEEDED.
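Because the five metadata objects are delivered separately, answering a simple question such as "which organism was this run sequenced from?" means following references from one object to the next. The sketch below assumes a simplified model in which a run references one experiment, and an experiment references one study and one sample; the class names, fields and accessions are hypothetical and much thinner than the real SRA schema.

```python
from dataclasses import dataclass

@dataclass
class Study:
    accession: str
    title: str

@dataclass
class Sample:
    accession: str
    organism: str

@dataclass
class Experiment:
    accession: str
    study_ref: str   # accession of the study this experiment belongs to
    sample_ref: str  # accession of the sample that was sequenced
    platform: str

@dataclass
class Run:
    accession: str
    experiment_ref: str  # accession of the experiment that produced this run

def describe_run(run, experiments, studies, samples):
    """Follow references from a run to its experiment, study and sample.

    A missing or wrong reference raises KeyError, which is one way
    incomplete metadata shows up in practice.
    """
    exp = experiments[run.experiment_ref]
    return {
        "run": run.accession,
        "platform": exp.platform,
        "study": studies[exp.study_ref].title,
        "organism": samples[exp.sample_ref].organism,
    }

# Hypothetical accessions, for illustration only.
studies = {"SRP000001": Study("SRP000001", "Mouse nervous cell transcriptome")}
samples = {"SRS000001": Sample("SRS000001", "Mus musculus")}
experiments = {"SRX000001": Experiment("SRX000001", "SRP000001", "SRS000001", "454")}
run = Run("SRR000001", "SRX000001")

print(describe_run(run, experiments, studies, samples))
```

Keeping each object keyed by its accession makes the lookups cheap and makes broken links easy to detect.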
Kusarinoko project: a better way to search and browse
Aim of Kusarinoko project

Cutting the cost of using public SRA data:
search, browse, download, check

Giving more resources to support using the data:
is the data really sound?
Integrate metadata, add extra information

[Figure: Kusarinoko integrates the SRA metadata (Submission.xml, Study.xml, Experiment.xml, Sample.xml, Run.xml) with PubMed IDs obtained from sra.dbcls.jp and with sequence quality calculated from the sequence data by FastQC.]
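The integration step can be pictured as a join on the run accession. This is a minimal sketch, not the project's code: it parses a simplified fragment modeled on the SRA Run XML layout with the standard library, then attaches PubMed IDs and a FastQC-derived summary from two lookup tables. The tables, accessions and the `mean_quality` field are hypothetical stand-ins for data from sra.dbcls.jp and FastQC. Runs without a PubMed ID are dropped, in line with the published-data-only coverage.

```python
import xml.etree.ElementTree as ET

# Simplified Run.xml fragment; real SRA files carry many more elements.
RUN_XML = """
<RUN_SET>
  <RUN accession="SRR000001"><EXPERIMENT_REF accession="SRX000001"/></RUN>
  <RUN accession="SRR000002"><EXPERIMENT_REF accession="SRX000002"/></RUN>
</RUN_SET>
"""

def parse_runs(xml_text):
    """Map each run accession to the experiment accession it references."""
    root = ET.fromstring(xml_text.strip())
    return {
        run.get("accession"): run.find("EXPERIMENT_REF").get("accession")
        for run in root.iter("RUN")
    }

def integrate(runs, pubmed_ids, fastqc):
    """Merge run metadata with publication and quality information.

    Runs without any PubMed ID are skipped.
    """
    records = []
    for run_acc, exp_acc in sorted(runs.items()):
        pmids = pubmed_ids.get(run_acc, [])
        if not pmids:
            continue
        records.append({
            "run": run_acc,
            "experiment": exp_acc,
            "pubmed_ids": pmids,
            "quality": fastqc.get(run_acc),  # None if no quality result yet
        })
    return records

# Hypothetical lookup tables standing in for sra.dbcls.jp and FastQC output.
pubmed_ids = {"SRR000001": ["12345678"]}
fastqc = {"SRR000001": {"mean_quality": 31.2}}

for rec in integrate(parse_runs(RUN_XML), pubmed_ids, fastqc):
    print(rec)
```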
Limitation and features

Covers only data that has at least one published article:
if a paper is not yet published, Kusarinoko cannot find the data. Publication info comes from sra.dbcls.jp.

Quality checking is still a beta version:
it is still being validated while we try to offer better information; this will take more time.
http://g86.dbcls.jp/kusarinoko (or search for "kusarinoko")
Inside the Sequence Read Archive
Statistics for stepping into SRA

Statistics of SRA by publication and sequence quality

ONLY PUBLIC NGS DATA IN SRA THAT HAS A PUBLICATION

Detailed statistics will be available online at the project website soon.
Platform trend statistics

[Chart: number of submissions per year, 2007~2011, colored by platform. Blue: Roche, Yellow: Illumina, Green: AB, Pink: Helicos, Red: PacBio]
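A chart like this is a tally of submissions per (year, platform) pair. The sketch below assumes each submission record carries a year and a platform name; the records are invented for illustration and are not SRA figures.

```python
from collections import Counter

def platform_trend(records, first_year=2007, last_year=2011):
    """Count submissions per (year, platform) within the given year range."""
    counts = Counter(
        (year, platform)
        for _accession, year, platform in records
        if first_year <= year <= last_year
    )
    return dict(sorted(counts.items()))

# Hypothetical (submission accession, year, platform) records.
submissions = [
    ("SUB1", 2008, "Roche"),
    ("SUB2", 2008, "Illumina"),
    ("SUB3", 2010, "Illumina"),
    ("SUB4", 2010, "Illumina"),
    ("SUB5", 2012, "PacBio"),  # outside 2007~2011, so not counted
]

for (year, platform), n in platform_trend(submissions).items():
    print(year, platform, n)
```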
Journal statistics

[Chart: number of PubMed IDs per journal, colored by library type. Blue: genomic, Red: transcriptomic, Brown: metagenomic, Yellow: synthetic, Purple: viral RNA, Green: non-genomic]

Total: 97 journals, (unidentified) 587, total # of PMIDs:
Minimum read length vs. average quality value

Quick quality calculation: total average quality (Phred)

[Chart colored by platform. Blue: Roche, Yellow: Illumina, Green: AB, Pink: Helicos, Red: PacBio]

Same result for maximum read length.
Total # of items (runs): 16,006 (continuing)
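The per-run numbers behind this plot can be approximated directly from FASTQ. This is a minimal sketch, assuming plain four-line FASTQ records with Phred+33 quality encoding (older Illumina data may use an offset of 64). It averages Phred scores over all bases, which is not necessarily how the project's "quick quality calc" or FastQC computes its figures.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]

def run_quality_summary(fastq_lines, offset=33):
    """Return mean base quality and min/max read length for one run.

    Expects unwrapped four-line FASTQ records:
    header, sequence, '+' separator, quality string.
    """
    total_q = 0
    total_bases = 0
    lengths = []
    for i in range(0, len(fastq_lines), 4):
        seq = fastq_lines[i + 1].strip()
        scores = phred_scores(fastq_lines[i + 3].strip(), offset)
        total_q += sum(scores)
        total_bases += len(scores)
        lengths.append(len(seq))
    return {
        "mean_quality": total_q / total_bases if total_bases else 0.0,
        "min_read_length": min(lengths, default=0),
        "max_read_length": max(lengths, default=0),
    }

reads = [
    "@read1", "ACGT", "+", "IIII",  # 'I' decodes to Phred 40
    "@read2", "ACG", "+", "###",    # '#' decodes to Phred 2
]
print(run_quality_summary(reads))
```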
Total number of reads vs. N content

Total N content rate:
no correlation with the number of reads or with library preparation methods

Total # of items (runs): 16,006 (continuing)
Total number of reads vs. duplication rate

Total sequence duplication: same as the previous statistic;
the number of reads does not seem to affect duplication.

Total # of items (runs): 16,006 (continuing)
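A simple way to express duplication is the fraction of reads whose exact sequence already appeared earlier in the same run. This is a simplified sketch for illustration only: FastQC estimates duplication levels with its own sampling-based procedure, so its numbers will differ from this exact count over every read.

```python
def duplication_rate(sequences):
    """Fraction of reads that exactly repeat a sequence seen earlier in the run."""
    seen = set()
    duplicates = 0
    for seq in sequences:
        if seq in seen:
            duplicates += 1
        else:
            seen.add(seq)
    return duplicates / len(sequences) if sequences else 0.0

print(duplication_rate(["AAA", "AAA", "CCC", "AAA"]))  # 0.5
```

Equivalently, the rate is 1 minus the number of distinct sequences divided by the total number of reads.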
Conclusion
Conclusion: for making use of public resources

Developed a service to help search and browse SRA data:
publication information and quality check results support the metadata.

Statistics revealed the inside of SRA and gave some insights:
they showed NGS trends, and that some items do not have sufficient quality even when they have a published article.

Detailed results and more at the poster presentation: 2P-0132 (today)
Thank You


