The missing data issue and the data resurrection miracle [ElCierne ] December 10, 2010

What is the missing data issue?
- Critical run files are missing or corrupt after the run folder was transferred from the HiSeq storage to the cluster storage.
Consequences
- config.xml might need to be corrected.
- Missing *.bcl and *.stats files can be recreated.
- Missing *.filter or *.pos.txt files cause the loss of a tile.

What causes the missing data issue?
- Files are not transferred correctly
  - Millisecond hang-ups of the network, which are not recognized by Windows
- RTA did not generate the files in the first place
  - HiSeq computer overload
  - Mismanagement of parallel threads (two processes accessing the same file)

Why is it an issue?
- The usual workflow crashes: bclConverter does not proceed if there are missing files.

Solutions to recoverable missing data issues
1. Copy *.stats from the same tile of a different cycle.
   PRO: fast.
   CON: fudge, trusts RTA, requires a separate workflow for missing *.bcl files.
2. Recalculate *.stats from *.dif, *.filter and *.bcl (Sanger).
   PRO: accurate and fast.
   CON: requires a separate workflow for missing *.bcl files, trusts RTA.
3. Calculate *.qseq from *.cif for the missing tile (QBI).
   PRO: handles missing *.stats and *.bcl.
   CON: slow, trusts RTA.
4. Calculate *.qseq from *.cif for all tiles.
   PRO: handles missing *.stats and *.bcl; recalculates everything, so no potentially corrupt RTA bcl/stats files are used.
   CON: slow (days).
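
The first option (copying *.stats from the same tile of a different cycle) could be sketched roughly as below. This is an illustration only, not the script used at Sanger or QBI: the function name borrow_stats, the max_distance parameter and the directory layout (L001/C1.1/s_1_1101.stats) are assumptions, and the slide itself calls this approach a fudge.

```python
import shutil
from pathlib import Path


def borrow_stats(basecalls_dir, lane, tile, cycle, max_distance=5):
    """Copy the tile's .stats file from the nearest cycle that has one.

    Returns the donor cycle number, or None if no donor is found
    within max_distance cycles. The layout below is an assumption.
    """
    lane_dir = Path(basecalls_dir) / f"L{lane:03d}"
    name = f"s_{lane}_{tile}.stats"
    target = lane_dir / f"C{cycle}.1" / name
    for distance in range(1, max_distance + 1):
        # Try the earlier cycle first, then the later one.
        for donor_cycle in (cycle - distance, cycle + distance):
            if donor_cycle < 1:
                continue
            donor = lane_dir / f"C{donor_cycle}.1" / name
            if donor.is_file() and donor.stat().st_size > 0:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copyfile(donor, target)
                return donor_cycle
    return None
```

The copied statistics describe a different cycle, so any downstream value derived from them is approximate; logging the returned donor cycle keeps that substitution traceable.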

New workflow with OLB
- Identify missing files, calculate qseq for them and merge with the qseqs from the normal workflow to proceed.
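
The "identify missing files" step might look something like the following sketch. It assumes a per-lane, per-cycle layout such as L001/C1.1/s_1_1101.bcl and treats zero-byte files as corrupt; both are assumptions rather than anything stated on the slides, and the function names are hypothetical.

```python
from pathlib import Path


def find_missing_files(basecalls_dir, lanes, cycles, tiles, suffixes=(".bcl", ".stats")):
    """Return expected per-tile, per-cycle files that are absent or empty."""
    missing = []
    for lane in lanes:
        for cycle in cycles:
            # Assumed layout: L001/C1.1/s_1_1101.bcl (adjust to your run folder).
            cycle_dir = Path(basecalls_dir) / f"L{lane:03d}" / f"C{cycle}.1"
            for tile in tiles:
                for suffix in suffixes:
                    path = cycle_dir / f"s_{lane}_{tile}{suffix}"
                    # A zero-byte file is counted as corrupt (a simplification).
                    if not path.is_file() or path.stat().st_size == 0:
                        missing.append(path)
    return missing


def affected_tiles(missing):
    """Collapse missing paths to the sorted (lane, tile) pairs that need OLB."""
    pairs = set()
    for path in missing:
        _, lane, tile = path.stem.split("_")
        pairs.add((int(lane), int(tile)))
    return sorted(pairs)
```

The list from affected_tiles is what would be handed to OLB, while the remaining tiles go through the normal conversion.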

Details: if *.stats or *.bcl was missing
1. Start the offline base caller (OLB) for the missing tiles.
2. Comment out the missing tile in config.xml and start bclConverter to convert the intact tiles (or use setupBclToQseq + bcl2qseq directly with --ignore-missing-bcl or --ignore-missing-stats).
3. Merge the *.qseq files generated by OLB and bclConverter in one directory (BaseCalls_<date>_<user>).
4. Start GERALD to convert to fastq (_sequence.txt).
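
The merge step could be sketched as below. The function name merge_qseq and the default file pattern "*_qseq.txt" are assumptions; the sketch stops on a name clash, since the same tile coming from both OLB and bclConverter would suggest the tile was not excluded from one of the two runs.

```python
import shutil
from pathlib import Path


def merge_qseq(source_dirs, merged_dir, pattern="*_qseq.txt"):
    """Copy qseq files from several directories into one, refusing name clashes."""
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in source_dirs:
        for qseq in sorted(Path(source).glob(pattern)):
            destination = merged / qseq.name
            if destination.exists():
                # Do not silently pick one version of the same tile.
                raise FileExistsError(f"{qseq.name} is present in more than one source")
            shutil.copy2(qseq, destination)
            copied.append(destination.name)
    return copied
```

The merged directory would then play the role of BaseCalls_<date>_<user> as input for GERALD.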

Solution requires .cifs to be saved
- Intensity files (*.cif) are not stored by default.
- Remember to tick the "save intensity" box when starting a run.
- Or make it the default: in c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config add
  <add key="DeleteIntensityFiles" value="0" />
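
For several instruments, the config change could be scripted along these lines. The sketch assumes RTA.exe.config follows the usual .NET layout, with <add key="..." value="..." /> entries inside an <appSettings> element under the root; the slides do not say where the key belongs, so check the real file first. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def keep_intensity_files(config_text):
    """Return the config XML with DeleteIntensityFiles set to "0".

    Assumes the key belongs under <appSettings> (typical for .NET
    config files, but not confirmed for RTA.exe.config).
    """
    root = ET.fromstring(config_text)
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == "DeleteIntensityFiles":
            entry.set("value", "0")
            break
    else:
        # Key not present yet: append it.
        ET.SubElement(settings, "add", {"key": "DeleteIntensityFiles", "value": "0"})
    return ET.tostring(root, encoding="unicode")
```

ElementTree drops comments and the XML declaration when it parses, so keep a backup of the original file and compare the result before replacing it.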

Acknowledgement
Thanks to
- Dr. Steven Leonard, Informatics Division, The Sanger Institute.
- Eugene, Illumina tech support.

Employee wellbeing at the workplace.pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 

The missing data issue for HiSeq runs

  • 1. The missing data issue and the data resurrection miracle [ElCierne ] December 10, 2010
  • 2. What is the missing data issue? Critical run files are missing or corrupt after the run folder was transferred from the HiSeq storage to the cluster storage. Consequences: config.xml might need to be corrected; missing *.bcl and *.stats files can be recreated; a missing *.filter or *.pos.txt file causes the loss of a tile.
  • 3. What causes the missing data issue? Either the files are not transferred correctly (millisecond hang-ups of the network that are not recognized by Windows), or RTA did not generate the files in the first place (HiSeq computer overload; mismanagement of parallel threads, with two processes accessing the same file).
  • 4. Why is it an issue? The usual workflow crashes: bclConverter does not proceed if files are missing.
  • 5. Solutions to recoverable missing data issues:
      1. Copy *.stats from the same tile of a different cycle. PRO: fast. CON: fudge, trusts RTA, requires a separate workflow for missing *.bcl files.
      2. Recalculate *.stats from *.dif, *.filter and *.bcl (Sanger). PRO: accurate and fast. CON: requires a separate workflow for missing *.bcl files, trusts RTA.
      3. Calculate *.qseq from *.cif for the missing tile (QBI). PRO: handles missing *.stats and *.bcl. CON: slow, trusts RTA.
      4. Calculate *.qseq from *.cif for all tiles. PRO: handles missing *.stats and *.bcl, and recalculates everything, so no potentially corrupt RTA bcl/stats files are used. CON: slow (days).
  • 6. New workflow with OLB: identify the missing files, calculate qseq files for them, and merge these with the qseq files from the normal workflow to proceed.
  • 7. Details, if a *.stats or *.bcl file was missing:
      ◦ Start the offline base caller (OLB) for the missing tiles.
      ◦ Comment out the missing tile in config.xml and start bclConverter to convert the intact tiles (or use setupBclToQseq + bcl2qseq directly with --ignore-missing-bcl or --ignore-missing-stats).
      ◦ Merge the *.qseq files generated by OLB and bclConverter in one directory (BaseCalls_<date>_<user>).
      ◦ Start GERALD to convert to fastq (_sequence.txt).
  • 8. The solution requires the *.cif files to be saved. Intensity files (*.cif) are not stored by default. Remember to tick the save intensity box when starting a run, or make it the default: in c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config add <add key="DeleteIntensityFiles" value="0" />
  • 9. Acknowledgement: Thanks to Dr. Steven Leonard (Informatics Division, The Sanger Institute) and Eugene (Illumina tech support).
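
The "identify missing files" step of the OLB workflow (slides 6 and 7) could be sketched roughly as below. This is only an illustration, not the author's script: the directory layout (`L<lane>/C<cycle>.1/s_<lane>_<tile>.<ext>` under the BaseCalls folder) and the helper names `find_missing` and `tiles_needing_olb` are assumptions and should be checked against the actual run folder.

```python
from pathlib import Path


def find_missing(basecalls, lanes, cycles, tiles, exts=("bcl", "stats")):
    """Return sorted (lane, tile, cycle, ext) tuples for absent or empty files.

    Assumed layout (hypothetical, verify against your run folder):
    <basecalls>/L001/C1.1/s_1_1101.bcl
    """
    missing = []
    for lane in lanes:
        for cycle in cycles:
            cycle_dir = Path(basecalls) / f"L{lane:03d}" / f"C{cycle}.1"
            for tile in tiles:
                for ext in exts:
                    candidate = cycle_dir / f"s_{lane}_{tile}.{ext}"
                    # A zero-byte file is treated like a missing one.
                    if not candidate.is_file() or candidate.stat().st_size == 0:
                        missing.append((lane, tile, cycle, ext))
    return sorted(missing)


def tiles_needing_olb(missing):
    """Collapse the per-cycle report to the (lane, tile) pairs to hand to OLB."""
    return sorted({(lane, tile) for lane, tile, _cycle, _ext in missing})
```

The resulting (lane, tile) list is what would be commented out in config.xml for the bclConverter pass and recalculated from the *.cif files with OLB.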

Editor's notes

  1. http://new.taringa.net/posts/info/7202836/Sherlock-Holmes.html
  2. ILLUMINA: The bclToQseq converter needs the *.stats files only to pass forward the cluster position information and the intensity averages. The former stays unchanged from one cycle to the next within the same tile, and the latter is only used for building IVC plots. So, the effect of replacing one file with a copy from another cycle will be an IVC plot that's not 100% accurate at the given tile/cycle. Since you would normally be interested in averages across all tiles, the effect of this is really minimal. Still, this is just a workaround and certainly not a long-term solution.
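
The copy workaround described in note 2 might look like the following minimal sketch. It assumes the same hypothetical layout (`L<lane>/C<cycle>.1/s_<lane>_<tile>.stats`); the function name `patch_missing_stats` is invented for illustration, and, as the note says, the result is a stopgap that leaves the IVC plot slightly inaccurate for the patched tile/cycle.

```python
import shutil
from pathlib import Path


def patch_missing_stats(basecalls, lane, tile, cycle, all_cycles):
    """Copy the tile's *.stats file from the nearest intact cycle.

    Returns the donor cycle number, or None if the target file is already
    present and non-empty. The directory layout is an assumption.
    """
    lane_dir = Path(basecalls) / f"L{lane:03d}"
    name = f"s_{lane}_{tile}.stats"
    target = lane_dir / f"C{cycle}.1" / name
    if target.is_file() and target.stat().st_size > 0:
        return None  # nothing to patch

    # Try donor cycles in order of distance from the damaged cycle.
    for donor_cycle in sorted(all_cycles, key=lambda c: abs(c - cycle)):
        if donor_cycle == cycle:
            continue
        donor = lane_dir / f"C{donor_cycle}.1" / name
        if donor.is_file() and donor.stat().st_size > 0:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(donor, target)
            return donor_cycle
    raise FileNotFoundError(f"no intact {name} found in any other cycle")
```

Keeping a log of which tile/cycle pairs were patched makes it possible to discount those points when reading the IVC plots later.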