
Open stack summit-2015-dp

  1. If you build it, will they come? Using OpenStack Swift to build a large-scale active archive in a Scientific Computing environment. Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center; Joe Arnold, Chief Product Officer & President, SwiftStack
  2. Challenge: Need an archive to offload expensive storage ❖ Low-cost storage ❖ High throughput: load large genome files into the HPC cluster ❖ Faster and lower cost than S3 & no proprietary lock-in.
  3. About Fred Hutch ❖ Cancer & HIV research ❖ 3 Nobel Laureates ❖ $430M budget / 85% NIH funding ❖ 2,700 employees ❖ Conservative use of information technology
  4. IT at Fred Hutch ❖ Multiple data centers with >1,000 kW capacity ❖ 100 staff in Center IT plus divisional IT ❖ Team of 3 sysadmins to support storage ❖ IT funded by indirects (F&A) ❖ Storage chargebacks started Nov 2014 ❖ 1.03 PUE, natural air cooling (photo: inside the Fred Hutch data center)
  5. About SwiftStack ❖ Object storage software ❖ Built with OpenStack Swift ❖ SwiftStack is a leading contributor and holds the Project Technical Lead role ❖ Software-defined storage platform for object storage (architecture diagram: an out-of-band, software-defined SwiftStack Controller managing Swift storage clusters across multiple data centers; access via the Swift API and NFS/CIFS; runtime agents for load balancing, monitoring, utilization and device inventory; authentication & authorization; device & node management; user dashboard)
  6. SwiftStack Resources https://swiftstack.com/books/
  7. Researchers are concerned about … ❖ Significant storage costs: $40/TiB/month chargebacks (first 5 TB is free) and declining grant funding ❖ “If you charge us, please give us some cheap storage for old and big files” ❖ (Mis)perception of storage value (“I can buy a hard drive at Best Buy”) (photo caption: Not what you want: unsecured and unprotected external USB storage)
  8. Finance is concerned about … ❖ Cost predictability and scale ❖ Data growth drives storage costs of up to $1M per year ❖ Genomics data grows at 40%/year and chargebacks don’t cover all costs ❖ Expensive forklift upgrades every few years ❖ The public cloud (e.g. Amazon S3) has set a new, transparent cost benchmark.
  9. How much does it cost? ❖ Only small changes vs. 2014 ❖ Kryder’s law obsolete at <15%/year? ❖ Swift is now down to Glacier cost (hardware down to $3/TB/month) ❖ No price reductions in the cloud ❖ 4TB (~$120) and 6TB (~$250) drives cost the same ❖ Do you want a fault domain of 144TB or 216TB in your storage servers? ❖ Don’t save on CPU: erasure coding is coming! (chart, $/TB/month: NAS 40, Amazon S3 28, Google 26, SwiftStack 11; AWS EFS is $300/TB/month)
  10. Economy File in production in 2014 ❖ Chargebacks drove the Hutch to embrace more economical storage ❖ Selected Swift object storage managed by SwiftStack ❖ Go-live in 2014, strong interest and expansion in 2015 ❖ Researchers do not want to pay the price of standard enterprise storage
  11. Chargebacks spike Swift utilization! ❖ Started storage chargebacks on Nov 1st ❖ Triggered strong growth in October ❖ Users sought to avoid the high cost of enterprise NAS and put as much as possible into lower-cost Swift ❖ Underestimated the success of Swift ❖ Needed to pause migrations to buy more hardware ❖ Can migrate 30+ TB per day today
  12. Standard Hardware ❖ Supermicro via Silicon Mechanics ❖ 2.1PB raw capacity; ~700TB usable ❖ No RAID controllers; no storage lost to RAID ❖ Seagate SATA drives (desktop class) ❖ 2 x 120GB Intel S3700 SSDs for OS + metadata ❖ 10GBase-T connectivity ❖ 2 Intel Xeon E5 CPUs ❖ 64GB RAM
  13. Management of OpenStack Swift using SwiftStack ❖ Out-of-band management controller ❖ SwiftStack provides control & visibility ❖ Monitoring and stats at cluster, node, and drive levels ❖ Authentication & authorization ❖ Capacity & utilization management via quotas and rate limits ❖ Alerting & diagnostics
  14. SwiftStack Automation ❖ Deployment automation ❖ Lets us roll out Swift nodes in 10 minutes ❖ Upgrading Swift across clusters with 1 click ❖ 0.25 FTE to manage the cluster
  15. Supporting Scientific Computing Workloads: HPC Use Cases & Tools
  16. HPC Requirements ❖ High aggregate throughput ❖ The current network architecture is the bottleneck ❖ Many parallel streams are used to max out throughput ❖ Ideal for an HPC cluster architecture
  17. Not a Filesystem There is no traditional file-system hierarchy; we just have containers, which can hold millions of objects (a.k.a. files). Huh, no sub-directories? But how the heck can I upload my uber-complex bioinformatics file system with 11 levels of folders to Swift?
  18. Filesystem Mapping with Swift We simulate the hierarchical structure by simply putting forward slashes (/) in the object name (or file name) ❖ So, how do you actually copy a folder? ❖ However, the Swift client is frequently used, well supported, maintained and really fast!! $ swift upload --changed --segment-size=2G --use-slo --object-name="pseudo/folder" "container" "/my/local/folder" Really? Can’t we get this a little easier?
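      Under the hood this “hierarchy” is only a naming convention: listing or fetching a pseudo-folder simply means filtering object names by prefix. A minimal sketch with the standard python-swiftclient CLI; the container and folder names are placeholders, not taken from the slides:
      # list only the objects whose names start with the pseudo-folder prefix
      $ swift list container --prefix "pseudo/folder/"
      # download just that "folder" using the same prefix filter
      $ swift download container --prefix "pseudo/folder/"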
  19. Introducing Swift Commander ❖ Swift Commander, a simple shell wrapper around the Swift client, curl and a few other tools, makes working with Swift very easy ❖ Sub-commands such as swc ls, swc cd, swc rm, swc more give you a feel quite similar to a Unix file system ❖ Actively maintained and available at https://github.com/FredHutch/Swift-commander/ Much easier… Some additional examples: $ swc upload /my/posix/folder /my/Swift/folder $ swc compare /my/posix/folder /my/Swift/folder $ swc download /my/Swift/folder /my/scratch/fs
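      As a hedged illustration of how those sub-commands combine, a hypothetical session might look like the lines below; the paths are placeholders and the exact argument forms have not been verified against the tool:
      $ swc cd /archive/lab-data          # enter a pseudo-folder, much like cd in a shell
      $ swc ls                            # list objects and pseudo-folders at this level
      $ swc more results/summary.txt      # page through an object without downloading it first
      $ swc rm old-runs/run-2013.tar.gz   # remove a single object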
  20. Swift Commander + Metadata ❖ Didn’t someone say that object storage systems were great at using metadata? ❖ Yes, and you can just add a few key:value pairs as upload arguments: $ swc upload /my/posix/folder /my/Swift/folder project:grant-xyz collaborators:jill,joe,jim cancer:breast ❖ Query the metadata via swc, or use an external search engine such as Elasticsearch: $ swc meta /my/Swift/folder Meta Cancer: breast Meta Collaborators: jill,joe,jim Meta Project: grant-xyz
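      In Swift itself, custom metadata of this kind is carried in X-Object-Meta-* (or X-Container-Meta-*) headers, so equivalent key:value pairs can also be set and read with the plain Swift client. A small sketch with an illustrative object name, assuming object-level metadata:
      # attach metadata to an existing object with the standard swift client
      $ swift post -m project:grant-xyz -m cancer:breast container "my/Swift/folder/genome.bam"
      # read it back; custom keys show up as "Meta ..." lines in the output
      $ swift stat container "my/Swift/folder/genome.bam"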
  21. Integrating with HPC ❖ Integrating Swift in HPC workflows is not really hard ❖ Example: running samtools using persistent scratch space (files are deleted if not accessed for 30 days) if ! [[ -f /fh/scratch/delete30/pi/raw/genome.bam ]]; then swc download /Swiftfolder/genome.bam /fh/scratch/delete30/pi/raw/genome.bam; fi samtools view -F 0xD04 -c /fh/scratch/delete30/pi/raw/genome.bam > otherfile A complex 50-line HPC submission script prepping a GATK workflow requires just 3 more lines!!
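      Put together as a Slurm batch script, the “3 more lines” pattern looks roughly like this; the job options, paths and samtools invocation are illustrative rather than copied from the real GATK script:
      #!/bin/bash
      #SBATCH --job-name=samtools-count
      #SBATCH --cpus-per-task=4
      BAM=/fh/scratch/delete30/pi/raw/genome.bam
      # the 3 extra lines: pull the input from Swift only if scratch does not already have it
      if ! [[ -f "$BAM" ]]; then
          swc download /Swiftfolder/genome.bam "$BAM"
      fi
      # the rest of the workflow runs against fast scratch exactly as before
      samtools view -F 0xD04 -c "$BAM" > otherfile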
  22. Other HPC Integrations ❖ Use the HPC system to download lots of bam files in parallel ❖ 30 cluster jobs run in parallel on 30 1-gigabit nodes (which is my HPC limit) ❖ My scratch file system says it loads data at 1.4 GB/s ❖ This means that each bam file is downloaded at 47 MB/s on average, and downloading this dataset of 1.2 TB takes 14 min $ swc ls /Ext/seq_20150112/ > bamfiles.txt $ while read FILE; do sbatch -N1 -c4 --wrap="swc download /Ext/seq_20150112/$FILE ."; done < bamfiles.txt $ squeue -u petersen JOBID PARTITION NAME USER ST TIME NODES NODELIST 17249368 campus sbatch petersen R 15:15 1 gizmof120 17249371 campus sbatch petersen R 15:15 1 gizmof123 17249378 campus sbatch petersen R 15:15 1 gizmof130 $ fhgfs-ctl --userstats --names --interval=5 --nodetype=storage ====== 10 s ====== Sum: 13803 [sum] 13803 [ops-wr] 1380.300 [MiB-wr/s] petersen 13803 [sum] 13803 [ops-wr] 1380.300 [MiB-wr/s]
  23. Swift Commander + Small Files So, we could tar up this entire directory structure… but then we would have one giant tar ball. Solution: tar up sub-directories, but create a tar ball for each level, e.g. for /folder1/folder2/folder3, restoring folder2 and below just needs folder2.tar.gz + folder3.tar.gz $ swc arch /my/posix/folder /my/Swift/folder $ swc unarch /my/Swift/folder /my/scratch/fs It’s easy, it’s fast: ❖ Archiving uses multiple processes; we measured up to 400 MB/s from one Linux box ❖ Each process uses pigz multithreaded gzip compression (example: compressing a 1GB DNA string down to 272MB takes 111 sec with gzip, 5 seconds with pigz) ❖ Restore can use standard gzip ❖ Available at https://github.com/FredHutch/Swift-commander/blob/master/bin/swbundler.py
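      The per-level bundling idea behind swbundler.py can be approximated with plain tar + pigz; this is a rough illustration of the concept, not the actual swbundler.py logic, and the folder names follow the example above:
      # one tar ball per directory level, each holding only that directory's own files
      $ cd /folder1
      $ find folder2 -maxdepth 1 -type f | tar -cf - -T - | pigz > folder2.tar.gz
      $ find folder2/folder3 -maxdepth 1 -type f | tar -cf - -T - | pigz > folder3.tar.gz
      # restoring folder2 and below only needs the tar balls at and under that level,
      # and standard gzip is enough on the restore side
      $ gunzip -c folder2.tar.gz | tar -xf -
      $ gunzip -c folder3.tar.gz | tar -xf -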
  24. Desktop Clients & Collaboration ❖ Reality: every archive requires access via GUI tools ❖ Requirements: ❖ Easy to use ❖ Must not create any proprietary data structures in Swift that cannot be read by other tools (screenshot: Cyberduck desktop client running on Windows)
  25. Desktop Clients & Collaboration ❖ Another example: ExpanDrive and Storage Made Easy ❖ Work with Windows and Mac ❖ Integrate with the Mac Finder and mount as a drive in Windows
  26. rclone: mass copy, backup, data migration ❖ rclone is a multithreaded data copy / mirror tool ❖ Consistent performance on Linux, Mac and Windows ❖ E.g. keep a mirror of a Synology workgroup NAS (QNAP has a built-in Swift mirror option) ❖ Data remains accessible via swc and desktop clients ❖ Mirror protected by Swift undelete (currently 60 days retention)
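      With a Swift remote set up once via rclone config (the remote and container names below are placeholders), keeping the workgroup NAS mirrored is a single command that can run from cron:
      # one-way mirror of the NAS share into a Swift container; add --dry-run first to preview
      $ rclone sync /mnt/synology/workgroup swift-archive:nas-mirror/workgroup --transfers=8
      # spot-check the mirror from any machine with the same remote configured
      $ rclone ls swift-archive:nas-mirror/workgroup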
  27. Galaxy: Scientific Workflow Management ❖ Galaxy, the web-based high-throughput computing platform at the Hutch, uses Swift as primary storage in production today ❖ SwiftStack patches contributed to the Galaxy Project ❖ Swift allows us to delegate “root” access to bioinformaticians ❖ Integrated with the Slurm HPC scheduler: automatically assigns the default PI account for each user
  28. Summary Discovery is driven by technologies that generate larger and larger datasets ❖ Object storage is ideal for: ❖ Ever-growing data volumes ❖ The high throughput required for HPC ❖ Faster and lower cost than S3 & no proprietary lock-in
  29. Thank you! Dirk Petersen, Scientific Computing Director, Fred Hutchinson Cancer Research Center; Joe Arnold, Chief Product Officer & President, SwiftStack
