Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief. Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine. Open Cloud Consortium (www.opencloudconsortium.org). June 21, 2010
Project Matsu Goals. Provide persistent data resources and elastic computing to assist in disasters: make imagery available for disaster relief workers; provide elastic computing for large-scale image processing; perform change detection on image sets that cover the same location at different times. Provide a resource for testing standards and for interoperability studies of large data clouds.
Part 1: Open Cloud Consortium
501(c)(3) not-for-profit corporation. Supports the development of standards, interoperability frameworks, and reference implementations. Manages testbeds: Open Cloud Testbed and Intercloud Testbed. Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. Develops benchmarks. www.opencloudconsortium.org
OCC Members. Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo. Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago. Government agencies: NASA. Open source projects: Sector Project.
Operates Clouds. 500 nodes, 3000 cores, 1.5+ PB, four data centers, 10 Gbps. Target to refresh 1/3 each year.
Open Science Data Cloud
Intercloud Testbed
Project Matsu: Cloud-based Disaster Relief Services
Focus of OCC Large Data Cloud Working Group. Layered stack, with applications (Apps) running on top of each layer: table-based data services and relational-like data services; cloud compute services (MapReduce, UDF, and other programming frameworks); cloud storage services. Developing APIs for this framework.
Tools and Standards. Apache Hadoop/MapReduce; Sector/Sphere large data cloud; Open Geospatial Consortium Web Map Service (WMS). OCC tools are open source (matsu-project): http://code.google.com/p/matsu-project/
Part 2: Technical Approach. Hadoop: lead Andrew Levine. Hadoop with Python Streams: lead Collin Bennett. Sector/Sphere: lead Yunhong Gu.
Implementation 1: Hadoop & MapReduce (Andrew Levine)
Image Processing in the Cloud - Mapper. Step 1: Input to Mapper. Mapper input key: bounding box (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5). Mapper input value: image + timestamp. Step 2: Processing in Mapper. The mapper resizes and/or cuts up the original image into pieces to output bounding boxes. Step 3: Mapper Output. For each piece, mapper output key: bounding box; mapper output value: image + timestamp.
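The mapper step described on this slide can be sketched in a few lines of Python. This is an illustration only, not the project's Hadoop code: the names `split_bbox` and `map_image`, the n x n grid, the nested-list image representation, and the requirement that the image dimensions divide evenly by n are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Image = List[List[int]]                   # rows of pixel values, row 0 is the north edge


def split_bbox(bbox: BBox, n: int) -> List[BBox]:
    """Split a bounding box into an n x n grid, listed north to south, west to east."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / n
    dy = (maxy - miny) / n
    boxes = []
    for row in range(n):
        for col in range(n):
            boxes.append((minx + col * dx, maxy - (row + 1) * dy,
                          minx + (col + 1) * dx, maxy - row * dy))
    return boxes


def map_image(bbox: BBox, image: Image, timestamp: str,
              n: int) -> Iterator[Tuple[BBox, Tuple[Image, str]]]:
    """Cut one image into n x n tiles and emit (bounding box, (tile, timestamp))."""
    height, width = len(image), len(image[0])
    # Assumption for this sketch: height and width are multiples of n.
    tile_h, tile_w = height // n, width // n
    for index, tile_bbox in enumerate(split_bbox(bbox, n)):
        row, col = divmod(index, n)
        tile = [r[col * tile_w:(col + 1) * tile_w]
                for r in image[row * tile_h:(row + 1) * tile_h]]
        yield tile_bbox, (tile, timestamp)
```

Because every tile is keyed by its own bounding box, tiles from different acquisition dates that cover the same area end up under the same key, which is what the reducer relies on.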
Image Processing in the Cloud - Reducer. Step 1: Input to Reducer. Reducer key input: bounding box (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375). Reducer value input: … … Step 2: Process difference in Reducer. Assemble images based on timestamps and compare; the result is a delta of the two images. Step 3: Reducer Output. All images go to different map layers, as sets of images for display in WMS: Timestamp 1 set, Timestamp 2 set, delta set.
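A minimal Python sketch of the reducer step follows. It is not the project's implementation: the function name `reduce_bbox`, the output layer names, the use of sortable timestamp strings (for example ISO 8601 dates), and the choice of a per-pixel absolute difference as the "delta" are assumptions; the slide does not say how the delta is computed.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of single-band pixel values


def reduce_bbox(bbox: Tuple[float, float, float, float],
                values: List[Tuple[Image, str]]) -> Dict[str, Image]:
    """Order the two images for one bounding box by timestamp and compute a delta."""
    # Assumption: timestamps sort correctly as strings (e.g. "2010-01-12").
    ordered = sorted(values, key=lambda value: value[1])
    if len(ordered) != 2:
        raise ValueError("this sketch expects exactly two images per bounding box")
    (older, _), (newer, _) = ordered
    # Assumption: the delta is the per-pixel absolute difference.
    delta = [[abs(b - a) for a, b in zip(row_old, row_new)]
             for row_old, row_new in zip(older, newer)]
    # One entry per output map layer.
    return {"timestamp_1": older, "timestamp_2": newer, "delta": delta}
```

Returning all three layers mirrors the slide's output, where the two timestamped sets and the delta set are each published as a separate map layer.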
Implementation 2: Hadoop & Python Streams (Collin Bennett)
Preprocessing Step
Each line contains the image's byte array transformed to pixels (raw bytes don't seem to work well with the one-line-at-a-time Hadoop streaming paradigm). Line format: geolocation timestamp | tuple size ; image width ; image height ; comma-separated list of pixels. The fields shown in red on the original slide are metadata needed to process the image in the reducer.
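The line format above can be illustrated with a small encoder and parser. This is a sketch under stated assumptions, not the project's preprocessing script: the function names are hypothetical, a tab is assumed between the geolocation key and the rest of the line (tab is Hadoop streaming's default key/value separator), and "tuple size" is assumed to mean the number of values per pixel, such as 3 for RGB.

```python
from typing import List, Sequence, Tuple


def encode_line(geolocation: str, timestamp: str, tuple_size: int,
                width: int, height: int,
                pixels: Sequence[Sequence[int]]) -> str:
    """Serialize one image as a single text line."""
    flat = ",".join(str(value) for pixel in pixels for value in pixel)
    # Assumption: a tab separates the key (geolocation) from the value.
    return f"{geolocation}\t{timestamp}|{tuple_size};{width};{height};{flat}"


def parse_line(line: str) -> Tuple[str, str, Tuple[int, int, int], List[Tuple[int, ...]]]:
    """Recover the key, timestamp, metadata, and pixel tuples from one line."""
    geolocation, rest = line.rstrip("\n").split("\t", 1)
    timestamp, payload = rest.split("|", 1)
    size_s, width_s, height_s, flat = payload.split(";", 3)
    tuple_size, width, height = int(size_s), int(width_s), int(height_s)
    values = [int(v) for v in flat.split(",")]
    if len(values) != tuple_size * width * height:
        raise ValueError("pixel count does not match the metadata")
    pixels = [tuple(values[i:i + tuple_size])
              for i in range(0, len(values), tuple_size)]
    return geolocation, timestamp.strip(), (tuple_size, width, height), pixels
```

The metadata fields let the reducer rebuild the image from the flat pixel list without any side channel, which is why they travel on every line.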
Map and Shuffle
All of the work for mapping was done in the pre-process step

Project Matsu: Elastic Clouds for Disaster Relief

  • 21. Map / Shuffle key is the geolocation
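Since the geolocation is the shuffle key, the map step can be an identity function and the reducer sees all lines for one location together. The sketch below emulates that locally in Python; it assumes tab-separated key and value (Hadoop streaming's default), and the function names are hypothetical rather than taken from the project.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def identity_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Pass each preprocessed line through unchanged."""
    for line in lines:
        yield line


def split_key(line: str) -> Tuple[str, str]:
    """Split a line into (key, value) at the first tab."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, value


def group_by_key(sorted_lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Group consecutive lines that share a key, as a streaming reducer would."""
    pairs = (split_key(line) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, [value for _, value in group]


if __name__ == "__main__":
    lines = ["boxA\tt2|...", "boxB\tt1|...", "boxA\tt1|..."]
    # Sorting whole lines stands in for Hadoop's shuffle, which sorts by key.
    for key, values in group_by_key(sorted(identity_mapper(lines))):
        print(key, values)
```

In a real streaming job the framework performs the sort, and the reducer script only needs the grouping logic applied to standard input.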
  • 23. Sector Distributed File System. Sector aggregates hard disk storage across commodity computers, with a single namespace, file-system-level reliability (using replication), and high availability. Sector does not split files: a single image is never split, so when it is being processed the application does not need to read data from other nodes over the network. Optionally, a directory can also be kept together on a single node.
  • 24. Sphere UDF. Sphere allows a User Defined Function to be applied to each file (whether it is a single image or multiple images). Existing applications can be wrapped up in a Sphere UDF. In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs: ./stream -i haiti -c ossim_foo -o results
  • 25. For More Information: info@opencloudconsortium.org, www.opencloudconsortium.org