HIPI: Computer Vision at Large Scale, by Chris Sweeny and Liu Liu
Intro to MapReduce: SIMD at scale; Mapper / Reducer
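To make the Mapper / Reducer split concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps. It is plain Java rather than Hadoop API code, and every name in it (`MapReduceSketch`, `run`, `countByExtension`) is hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// In-memory illustration of the map -> shuffle -> reduce flow.
// Not Hadoop API code; all names here are hypothetical.
final class MapReduceSketch {

    // Maps every input record to (key, value) pairs, groups the values by key,
    // then reduces each group to a single value.
    static <I, K extends Comparable<K>, V> Map<K, V> run(
            List<I> inputs,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, V> reducer) {
        // Map and shuffle: collect all values emitted under the same key.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> pair : mapper.apply(input)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
            }
        }
        // Reduce: one call per key.
        Map<K, V> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            result.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return result;
    }

    // Example job: count files per (lower-cased) extension.
    static Map<String, Integer> countByExtension(List<String> fileNames) {
        return MapReduceSketch.<String, String, Integer>run(
                fileNames,
                name -> List.of(Map.entry(extensionOf(name), 1)),
                (ext, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    // Returns the text after the last dot, or "" when there is no dot.
    static String extensionOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase();
    }
}
```

The mapper sees one record at a time and the reducer sees one key at a time, which is what lets a real cluster spread both phases across many nodes.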
MapReduce, Main Takeaway: data centric, data centric, data centric!
Hadoop, a Java Impl: an open-source implementation of MapReduce, developed largely at Yahoo! The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.
Computer Vision at Scale: the “computational vision”. The sheer size of the datasets: PCA of Natural Images (1992): 15 images, 4096 patches; High-perf Face Detection (2007): 75,000 samples; IM2GPS (2008): 6,472,304 images.
HIPI Workflow
HIPI Image Bundle Setup. Moral of the story: many small files kill performance in a distributed file system.
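One way to picture the remedy is to pack many small encoded images into a single blob and keep an index of offsets alongside it. The sketch below shows that idea only; the layout and the names (`ImageBundleSketch`, `add`, `get`) are assumptions for illustration and do not describe HIPI's actual bundle format.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bundle layout: one contiguous data blob plus an offset index.
// This is NOT HIPI's actual on-disk format, only an illustration of the idea.
final class ImageBundleSketch {
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    private final List<Integer> offsets = new ArrayList<>(); // start offset of each image

    // Appends one encoded image to the blob and returns its index in the bundle.
    int add(byte[] encodedImage) {
        offsets.add(data.size());
        data.write(encodedImage, 0, encodedImage.length);
        return offsets.size() - 1;
    }

    // Number of images stored so far.
    int count() {
        return offsets.size();
    }

    // Returns a copy of the encoded bytes of the image at the given index.
    // Copying the whole blob on each call is wasteful but keeps the sketch short.
    byte[] get(int index) {
        byte[] all = data.toByteArray();
        int start = offsets.get(index);
        int end = (index + 1 < offsets.size()) ? offsets.get(index + 1) : all.length;
        return Arrays.copyOfRange(all, start, end);
    }
}
```

The file system then tracks one large file instead of thousands of tiny ones, while the index still allows each image to be addressed individually.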
Redo PCA in Natural Images at Scale: the first 15 principal components with 15 images (Hancock, 1992).
Redo PCA in Natural Images at Scale, comparison: Hancock, 1992; HIPI, 100; HIPI, 1,000; HIPI, 10,000; HIPI, 100,000.
Optimize HIPI Performance: culling, because decompression is costly. Decompress only when needed. A boolean cull(ImageHeader header) method enables conditional decompression.
Culling, to inspect specific camera effects: Canon PowerShot S500, at 2592x1944.
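A cull predicate for the camera example above might look roughly like the following. Only the signature `boolean cull(ImageHeader header)` comes from the slides; the `ImageHeader` class here is a hypothetical stand-in with assumed field names, and the convention that `true` means "skip this image" is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the header type; the field names are assumptions.
final class ImageHeader {
    final String cameraModel;
    final int width;
    final int height;

    ImageHeader(String cameraModel, int width, int height) {
        this.cameraModel = cameraModel;
        this.width = width;
        this.height = height;
    }
}

final class CullSketch {
    // Assumed convention: returns true when the image should be skipped,
    // so its pixel data never has to be decompressed.
    static boolean cull(ImageHeader header) {
        boolean wanted = "Canon PowerShot S500".equals(header.cameraModel)
                && header.width == 2592
                && header.height == 1944;
        return !wanted;
    }

    // Keeps only the headers that survive culling; a real job would
    // decompress and process just these images.
    static List<ImageHeader> keep(List<ImageHeader> headers) {
        List<ImageHeader> kept = new ArrayList<>();
        for (ImageHeader header : headers) {
            if (!cull(header)) {
                kept.add(header);
            }
        }
        return kept;
    }
}
```

Because the decision uses header fields alone, the expensive decode step runs only for images that pass the filter.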
HIPI, a glance at performance figures: an empty job (only decompressing and looping over images), 5 runs, minimum time reported, in seconds, lower is better.
HIPI, a glance at performance figures: Im2gray job (converting images to grayscale), 5 runs, minimum time reported, in seconds, lower is better.
HIPI, a glance at performance figures: covariance job (computing the covariance matrix of patches, 100 patches per image), 1~3 runs*, minimum time reported, in seconds, lower is better.
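A covariance matrix fits the map/reduce pattern because it can be assembled from sums that merge across machines. The sketch below accumulates a count, a sum vector, and a sum of outer products per partition, then merges them; it is an illustration under assumptions, not the job measured here. The class name `CovarianceSketch` and its methods are hypothetical, and the single-pass formula it uses can lose precision when values are large relative to their spread.

```java
// Mergeable partial statistics for a covariance matrix.
// Hypothetical names; illustrates the idea, not the measured job.
final class CovarianceSketch {
    final int dim;
    long count;
    final double[] sum;     // running sum of x
    final double[][] outer; // running sum of x * x^T

    CovarianceSketch(int dim) {
        this.dim = dim;
        this.sum = new double[dim];
        this.outer = new double[dim][dim];
    }

    // Map side: fold one patch (a vector of length dim) into the partial sums.
    void add(double[] patch) {
        count++;
        for (int i = 0; i < dim; i++) {
            sum[i] += patch[i];
            for (int j = 0; j < dim; j++) {
                outer[i][j] += patch[i] * patch[j];
            }
        }
    }

    // Reduce side: combine partial sums computed on another partition.
    void merge(CovarianceSketch other) {
        count += other.count;
        for (int i = 0; i < dim; i++) {
            sum[i] += other.sum[i];
            for (int j = 0; j < dim; j++) {
                outer[i][j] += other.outer[i][j];
            }
        }
    }

    // Population covariance: E[x x^T] - E[x] E[x]^T.
    double[][] covariance() {
        if (count == 0) {
            throw new IllegalStateException("no patches added");
        }
        double[][] cov = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                cov[i][j] = outer[i][j] / count - (sum[i] / count) * (sum[j] / count);
            }
        }
        return cov;
    }
}
```

Each mapper would call `add` once per patch, and the reducer would `merge` the partial results before calling `covariance`; the principal components are then the eigenvectors of that matrix.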
HIPI, a glance at performance figures: culling job (decompressing all images vs. decompressing only the images we care about), 1~3 runs, minimum time reported, in seconds, lower is better.
Conclusion: Everything gets better at large scale. HIPI provides an image-centric interface that performs on par with or better than the leading alternative. The cull method provides significant improvement and convenience. HIPI offers noticeable improvements!
Future work: Release HIPI as an open-source project. Work on deeper integration with Hadoop. Make HIPI workloads more configurable. Make workloads more balanced.
