Critical run files can be missing or corrupt after the run folder has been transferred from the HiSeq storage to the cluster storage. This presentation discusses the issue and suggests four workarounds.
The missing data issue for HiSeq runs
1. The missing data issue and the data resurrection miracle [ElCierne ] December 10, 2010
2. What is the missing data issue?
Critical run files are missing or corrupt after the run folder was transferred from the HiSeq storage to the cluster storage.
Consequences:
- Config.xml might need to be corrected
- Missing *.bcl and *.stats files can be recreated
- A missing *.filter or *.pos.txt file causes the loss of a tile
3. What causes the missing data issue?
- Files are not transferred correctly
  - Millisecond hang-ups of the network, which are not recognized by Windows
- RTA did not generate the files in the first place
  - HiSeq computer overload
  - Mismanagement of parallel threads (two processes accessing the same file)
4. Why is it an issue?
The usual workflow crashes: bclConverter does not proceed if files are missing.
5. Solutions to recoverable missing data issues
(1) Copy *.stats from the same tile of a different cycle
    PRO: fast
    CON: fudge, trusts RTA, requires a separate workflow for missing *.bcl files
(2) Recalculate *.stats from *.dif, *.filter and *.bcl (Sanger)
    PRO: accurate and fast
    CON: requires a separate workflow for missing *.bcl files, trusts RTA
(3) Calculate *.qseq from *.cif for the missing tile (QBI)
    PRO: handles missing *.stats and *.bcl
    CON: slow, trusts RTA
(4) Calculate *.qseq from *.cif for all tiles
    PRO: handles missing *.stats and *.bcl, recalculates everything, so no potentially corrupt RTA bcl/stats files are used
    CON: slow (days)
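Option (1) can be sketched in a few lines of Python. This is only an illustration, not the script used by the authors: the directory layout (`L00<lane>/C<cycle>.1/s_<lane>_<tile>.stats`), the helper names `stats_path` and `borrow_stats`, and the `max_distance` limit are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def stats_path(basecalls, lane, cycle, tile):
    # Assumed layout: <BaseCalls>/L00<lane>/C<cycle>.1/s_<lane>_<tile>.stats
    return Path(basecalls) / f"L{lane:03d}" / f"C{cycle}.1" / f"s_{lane}_{tile}.stats"


def is_usable(path):
    # Treat absent or zero-byte files as missing (a heuristic, not an RTA rule).
    return path.is_file() and path.stat().st_size > 0


def borrow_stats(basecalls, lane, cycle, tile, max_distance=5):
    """Fill in a missing .stats file with a copy from the nearest cycle.

    Returns the cycle whose file is now in place: the cycle itself if the
    file was already present, the donor cycle if a copy was made, or None
    if no donor was found within max_distance cycles.
    """
    target = stats_path(basecalls, lane, cycle, tile)
    if is_usable(target):
        return cycle
    for distance in range(1, max_distance + 1):
        for donor_cycle in (cycle - distance, cycle + distance):
            if donor_cycle < 1:
                continue
            donor = stats_path(basecalls, lane, donor_cycle, tile)
            if is_usable(donor):
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(donor, target)
                return donor_cycle
    return None
```

Logging the returned donor cycle for every copy is advisable, so that the affected tile/cycle combinations can be identified later when reading the IVC plots.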
6. New workflow with OLB
Identify the missing files, calculate qseq files for them, and merge these with the qseq files from the normal workflow to proceed.
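The "identify the missing files" step could look roughly like the following Python sketch. It assumes one `s_<lane>_<tile>.bcl` and one `s_<lane>_<tile>.stats` file per tile in each `L00<lane>/C<cycle>.1` directory, and it treats zero-byte files as corrupt; both are assumptions for illustration, and the function names are hypothetical.

```python
from pathlib import Path


def find_missing(basecalls, lanes, cycles, tiles, suffixes=(".bcl", ".stats")):
    """Return (lane, cycle, tile, suffix) for every absent or empty file."""
    missing = []
    for lane in lanes:
        for cycle in cycles:
            # Assumed layout: <BaseCalls>/L00<lane>/C<cycle>.1/
            cycle_dir = Path(basecalls) / f"L{lane:03d}" / f"C{cycle}.1"
            for tile in tiles:
                for suffix in suffixes:
                    path = cycle_dir / f"s_{lane}_{tile}{suffix}"
                    if not path.is_file() or path.stat().st_size == 0:
                        missing.append((lane, cycle, tile, suffix))
    return missing


def affected_tiles(missing):
    """Collapse the per-file report to the (lane, tile) pairs that need OLB."""
    return sorted({(lane, tile) for lane, _cycle, tile, _suffix in missing})
```

The list from `affected_tiles` is what would be handed to OLB, and the same tiles would be excluded from the normal bclConverter run.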
7. Details: if *.stats or *.bcl was missing
- Start the offline base caller (OLB) for the missing tiles
- Comment out the missing tile in config.xml and start bclConverter to convert the intact tiles (or use setupBclToQseq + bcl2qseq directly with --ignore-missing-bcl or --ignore-missing-stats)
- Merge the *.qseq files generated by OLB and bclConverter in one directory (BaseCalls_<date>_<user>)
- Start GERALD to convert to fastq (_sequence.txt)
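The merge step amounts to collecting qseq files from both sources in one directory without letting one silently overwrite the other. A minimal Python sketch, assuming the qseq files match the pattern `*_qseq.txt` and that the first source listed should win on a name clash (both are assumptions; `merge_qseq` is a hypothetical helper):

```python
import shutil
from pathlib import Path


def merge_qseq(source_dirs, merged_dir, pattern="*_qseq.txt"):
    """Copy qseq files from several directories into merged_dir.

    A file name already present in merged_dir is never overwritten; it is
    reported in the second return value so the clash can be inspected.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    copied, skipped = [], []
    for source in source_dirs:
        for qseq in sorted(Path(source).glob(pattern)):
            destination = merged / qseq.name
            if destination.exists():
                skipped.append(qseq)
            else:
                shutil.copy2(qseq, destination)
                copied.append(destination)
    return copied, skipped
```

If the missing tile was commented out for bclConverter, the two sources should not overlap, so a non-empty `skipped` list is a sign that something went wrong upstream.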
8. Solution requires .cifs to be saved
- Intensity files (*.cif) are not stored by default
- Remember to tick the "save intensity" box when starting a run
- Or make it the default: in c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config add <add key="DeleteIntensityFiles" value="0" />
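Before a run, it may be worth checking that the setting is really in place. The Python sketch below reads the config file and looks for the `DeleteIntensityFiles` key; it assumes the file is well-formed XML with `<add key="..." value="..."/>` entries, as in the line quoted above, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def get_setting(config_path, key):
    """Return the value of the first <add key=... value=...> entry, or None."""
    root = ET.parse(config_path).getroot()
    for element in root.iter("add"):
        if element.get("key") == key:
            return element.get("value")
    return None


def intensities_are_kept(config_path):
    # True only if DeleteIntensityFiles is explicitly set to "0".
    return get_setting(config_path, "DeleteIntensityFiles") == "0"
```

A call such as `intensities_are_kept("c:/illumina/HiSeqControlSoftware/RTA/RTA.exe.config")` returns `False` both when the key is absent and when it has any value other than `"0"`.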
9. Acknowledgement
Thanks to Dr. Steven Leonard, Informatics Division, The Sanger Institute, and to Eugene, Illumina tech support.
ILLUMINA: The bclToQseq converter only needs the *.stats files to pass forward the cluster position information and the intensity averages. The former stays unchanged from one cycle to the next within the same tile, and the latter is only used for building IVC plots. So the effect of replacing one file with a copy from another cycle will be an IVC plot that is not 100% accurate at the given tile/cycle. Since you would normally be interested in averages across all tiles, the effect is minimal. Still, this is just a workaround and certainly not a long-term solution.