SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Lessons Learned From a Year's Worth of Benchmarks of Large Data Clouds Yunhong Gu and Robert L Grossman Laboratory for Advanced Computing University of Illinois at Chicago November 16, 2009
Part 1.  Overview of Sector http://sector.sourceforge.net
Sector Overview ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Google’s Large Data Cloud Storage Services Data Services Compute Services Google’s Stack Applications Google File System (GFS) Google’s MapReduce Google’s BigTable
Hadoop’s Large Data Cloud Storage Services Compute Services Hadoop’s Stack Applications Hadoop Distributed File System (HDFS) Hadoop’s MapReduce Data Services
Sector’s Large Data Cloud Storage Services Compute Services Sector’s Stack Applications Sector’s Distributed File System (SDFS) Sphere’s UDFs Routing &  Transport Services UDP-based Data Transport Protocol (UDT) Data Services
Open Cloud Testbed (2009) ,[object Object],[object Object],[object Object],[object Object],MREN CENIC Dragon ,[object Object],[object Object],[object Object],[object Object],C-Wave
What I Learned ,[object Object],[object Object],[object Object]
Cloud Interoperability ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Open Virtualization  Format (OVF ) Open Cloud Computing Interface (OCCI) SNIA Cloud Data Management Interface (CDMI) Large Data Cloud Interoperability Framework OCC Large Data Cloud Interoperability Framework
Raywulf Cluster ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part 2.  Sector Architecture Data intensive computing that scales to a data center.
Sector Architecture
Sector DFS vs Hadoop DFS ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Sphere UDF ,[object Object],[object Object],[object Object],for (int i = 0; i < total_segment_num; ++ i) UDF(segment[i]); for (int i = 0; i < total_segment_num; ++ i) UDF(segment_A[i], segment_B[i]);
Sphere UDF vs. MapReduce Sphere UDF MapReduce Record Offset Index Parser / Input Reader UDF Map Bucket Partition - Compare UDF Reduce - Output Writer
Comparing Sector and Hadoop Hadoop Sector Storage Cloud Block-based file system File-based Programming Model MapReduce UDF & MapReduce Protocol TCP UDP-based protocol (UDT) Replication At write At write or period. Security Not yet HIPAA capable Language Java C++
Balancing Data Flow
Terasort Benchmark Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks. 1 Rack 2 Racks 3 Racks 4 Racks Nodes 32 64 96 128 Cores 128 256 384 512 Hadoop 85m 49s 37m 0s 25m 14s 17m 45s Sector 28m 25s 15m 20s 10m 19s 7m 56s Speed up 3.0 2.4 2.4 2.2
MalStone (OCC Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack.  Data consisted of 20 nodes with 500 million 100-byte records / node. MalStone A MalStone B Hadoop 455m 13s 840m 50s Hadoop streaming with Python 87m 29s 142m 32s Sector/Sphere 33m 40s 43m 44s Speed up (Sector v Hadoop) 13.5x 19.2x
Lessons Learned ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
For More Information ,[object Object],[object Object]

Weitere ähnliche Inhalte

Was ist angesagt?

Slide 1
Slide 1Slide 1
Slide 1
butest
 

Was ist angesagt? (20)

OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
Bioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9pBioclouds CAMDA (Robert Grossman) 09-v9p
Bioclouds CAMDA (Robert Grossman) 09-v9p
 
Project Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster ReliefProject Matsu: Elastic Clouds for Disaster Relief
Project Matsu: Elastic Clouds for Disaster Relief
 
PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)PIC Tier-1 (LHCP Conference / Barcelona)
PIC Tier-1 (LHCP Conference / Barcelona)
 
What Are Science Clouds?
What Are Science Clouds?What Are Science Clouds?
What Are Science Clouds?
 
Coding the Continuum
Coding the ContinuumCoding the Continuum
Coding the Continuum
 
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
Introduction to Big Data and Science Clouds (Chapter 1, SC 11 Tutorial)
 
Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)Open Science Data Cloud (June 21, 2010)
Open Science Data Cloud (June 21, 2010)
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting LiStanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
Stanford/SLAC Cryo-EM Computing and Storage, Yee-Ting Li
 
NERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie BardNERSC, AI and the Superfacility, Debbie Bard
NERSC, AI and the Superfacility, Debbie Bard
 
Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011
 
OpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences DataOpenTopography - Scalable Services for Geosciences Data
OpenTopography - Scalable Services for Geosciences Data
 
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery DataThe Matsu Project - Open Source Software for Processing Satellite Imagery Data
The Matsu Project - Open Source Software for Processing Satellite Imagery Data
 
Earth Science Platform
Earth Science PlatformEarth Science Platform
Earth Science Platform
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Slide 1
Slide 1Slide 1
Slide 1
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Big Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental ScienceBig Data, Big Computing, AI, and Environmental Science
Big Data, Big Computing, AI, and Environmental Science
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 

Ähnlich wie Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert Grossman) 09-v3p

sector-sphere
sector-spheresector-sphere
sector-sphere
xlight
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
lilyco
 
spectrum Storage Whitepaper
spectrum Storage Whitepaperspectrum Storage Whitepaper
spectrum Storage Whitepaper
Carina Kordan
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Andrey Vykhodtsev
 

Ähnlich wie Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert Grossman) 09-v3p (20)

Sector - Presentation at Cloud Computing & Its Applications 2009
Sector - Presentation at Cloud Computing & Its Applications 2009Sector - Presentation at Cloud Computing & Its Applications 2009
Sector - Presentation at Cloud Computing & Its Applications 2009
 
My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)My Other Computer is a Data Center (2010 v21)
My Other Computer is a Data Center (2010 v21)
 
Sector Sphere 2009
Sector Sphere 2009Sector Sphere 2009
Sector Sphere 2009
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
Sector Cloudcom Tutorial
Sector Cloudcom TutorialSector Cloudcom Tutorial
Sector Cloudcom Tutorial
 
HPCC Systems Engineering Summit Presentation - Improving Thor Data Loading us...
HPCC Systems Engineering Summit Presentation - Improving Thor Data Loading us...HPCC Systems Engineering Summit Presentation - Improving Thor Data Loading us...
HPCC Systems Engineering Summit Presentation - Improving Thor Data Loading us...
 
BWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 PresentationBWC Supercomputing 2008 Presentation
BWC Supercomputing 2008 Presentation
 
spectrum Storage Whitepaper
spectrum Storage Whitepaperspectrum Storage Whitepaper
spectrum Storage Whitepaper
 
Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)Open Cloud Consortium: An Update (04-23-10, v9)
Open Cloud Consortium: An Update (04-23-10, v9)
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
Hadoop World 2011: Hadoop Network and Compute Architecture Considerations - J...
 
Windows Azure: Lessons From The Field
Windows Azure: Lessons From The FieldWindows Azure: Lessons From The Field
Windows Azure: Lessons From The Field
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Sector Vs Hadoop
Sector Vs HadoopSector Vs Hadoop
Sector Vs Hadoop
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
PyModESt: A Python Framework for Staging of Geo-referenced Data on the Coll...
 

Mehr von Robert Grossman

Mehr von Robert Grossman (20)

Some Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your CompanySome Frameworks for Improving Analytic Operations at Your Company
Some Frameworks for Improving Analytic Operations at Your Company
 
Some Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data PlatformsSome Proposed Principles for Interoperating Cloud Based Data Platforms
Some Proposed Principles for Interoperating Cloud Based Data Platforms
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed DeployedCrossing the Analytics Chasm and Getting the Models You Developed Deployed
Crossing the Analytics Chasm and Getting the Models You Developed Deployed
 
A Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical ResearchA Data Biosphere for Biomedical Research
A Data Biosphere for Biomedical Research
 
What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?What is Data Commons and How Can Your Organization Build One?
What is Data Commons and How Can Your Organization Build One?
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh...
 
AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016AnalyticOps - Chicago PAW 2016
AnalyticOps - Chicago PAW 2016
 
Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
Biomedical Clusters, Clouds and Commons - DePaul Colloquium Oct 24, 2014
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World Talk
 
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
Bionimbus: Towards One Million Genomes (XLDB 2012 Lecture)
 

Kürzlich hochgeladen

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Kürzlich hochgeladen (20)

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert Grossman) 09-v3p

  • 1. Lessons Learned From a Year's Worth of Benchmarks of Large Data Clouds Yunhong Gu and Robert L Grossman Laboratory for Advanced Computing University of Illinois at Chicago November 16, 2009
  • 2. Part 1. Overview of Sector http://sector.sourceforge.net
  • 3.
  • 4. Google’s Large Data Cloud Storage Services Data Services Compute Services Google’s Stack Applications Google File System (GFS) Google’s MapReduce Google’s BigTable
  • 5. Hadoop’s Large Data Cloud Storage Services Compute Services Hadoop’s Stack Applications Hadoop Distributed File System (HDFS) Hadoop’s MapReduce Data Services
  • 6. Sector’s Large Data Cloud Storage Services Compute Services Sector’s Stack Applications Sector’s Distributed File System (SDFS) Sphere’s UDFs Routing & Transport Services UDP-based Data Transport Protocol (UDT) Data Services
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. Part 2. Sector Architecture Data intensive computing that scales to a data center.
  • 13.
  • 14.
  • 15. Sphere UDF vs. MapReduce Sphere UDF MapReduce Record Offset Index Parser / Input Reader UDF Map Bucket Partition - Compare UDF Reduce - Output Writer
  • 16. Comparing Sector and Hadoop Hadoop Sector Storage Cloud Block-based file system File-based Programming Model MapReduce UDF & MapReduce Protocol TCP UDP-based protocol (UDT) Replication At write At write or period. Security Not yet HIPAA capable Language Java C++
  • 18. Terasort Benchmark Sector/Sphere 1.24a, Hadoop 0.20.1 with no replication on Phase 2 of Open Cloud Testbed with co-located racks. 1 Rack 2 Racks 3 Racks 4 Racks Nodes 32 64 96 128 Cores 128 256 384 512 Hadoop 85m 49s 37m 0s 25m 14s 17m 45s Sector 28m 25s 15m 20s 10m 19s 7m 56s Speed up 3.0 2.4 2.4 2.2
  • 19. MalStone (OCC Benchmark) Sector/Sphere 1.20, Hadoop 0.18.3 with no replication on Phase 1 of Open Cloud Testbed in a single rack. Data consisted of 20 nodes with 500 million 100-byte records / node. MalStone A MalStone B Hadoop 455m 13s 840m 50s Hadoop streaming with Python 87m 29s 142m 32s Sector/Sphere 33m 40s 43m 44s Speed up (Sector v Hadoop) 13.5x 19.2x
  • 20.
  • 21.