4. Hadoop, a Java Implementation. An implementation of MapReduce that originated at Yahoo! The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.
5. Computer Vision at Scale. The “computational vision”. The sheer size of the datasets: PCA of Natural Images (1992): 15 images, 4096 patches; High-perf Face Detection (2007): 75,000 samples; IM2GPS (2008): 6,472,304 images.
7. HIPI Image Bundle Setup. Moral of the story: many small files kill performance in a distributed file system.
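The idea behind bundling can be sketched in a few lines: instead of storing every image as its own small file, many images are packed into one large container that is read sequentially. The sketch below is only an illustration under stated assumptions. `SimpleBundle`, `pack`, and `unpack` are hypothetical names, and the length-prefixed layout is not HIPI's actual bundle format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT HIPI's real bundle format: packs many small
// blobs (e.g. encoded images) into one length-prefixed byte stream.
final class SimpleBundle {

    // Layout: 4-byte blob count, then for each blob a 4-byte length
    // followed by the raw bytes.
    static byte[] pack(List<byte[]> blobs) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(blobs.size());
            for (byte[] blob : blobs) {
                out.writeInt(blob.length);
                out.write(blob);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the blobs back in the order they were written.
    static List<byte[]> unpack(byte[] bundle) {
        List<byte[]> blobs = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bundle))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] blob = new byte[in.readInt()];
                in.readFully(blob);
                blobs.add(blob);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return blobs;
    }

    public static void main(String[] args) {
        // Two tiny placeholder "images" stand in for real encoded files.
        List<byte[]> images =
            Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5});
        byte[] bundle = pack(images);
        System.out.println("bundle bytes: " + bundle.length);
        System.out.println("images recovered: " + unpack(bundle).size());
    }
}
```

A single large container lets the file system track one file and lets a task stream through many images with one open call, which is the point the slide makes about small files.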
8. Redo PCA of Natural Images at Scale. The first 15 principal components with 15 images (Hancock, 1992):
9. Redo PCA of Natural Images at Scale. Comparison: Hancock, 1992; HIPI, 100; HIPI, 1,000; HIPI, 10,000; HIPI, 100,000.
10. Optimize HIPI Performance. Culling, because decompression is costly. Decompress only when needed. A boolean cull(ImageHeader header) method allows conditional decompression.
11. Culling, to inspect specific camera effects: Canon PowerShot S500, at 2592x1944.
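A cull predicate for the camera example might look roughly like the sketch below. It is an illustration only: `HeaderSketch` is a hypothetical stand-in for HIPI's `ImageHeader` (whose real fields and accessors are not reproduced here), the camera model string is an assumed metadata value, and the convention that returning `true` means "skip this image" is also an assumption.

```java
// Hypothetical stand-in for an image header; HIPI's real ImageHeader
// class and its accessors are NOT reproduced here.
final class HeaderSketch {
    final String cameraModel;
    final int width;
    final int height;

    HeaderSketch(String cameraModel, int width, int height) {
        this.cameraModel = cameraModel;
        this.width = width;
        this.height = height;
    }
}

final class CullSketch {

    // Assumed convention: returning true means "cull", i.e. skip the
    // image so that its pixel data is never decompressed.
    static boolean cull(HeaderSketch header) {
        // Calling equalsIgnoreCase on the constant also handles a
        // missing (null) camera model, which then counts as unwanted.
        boolean wanted =
            "Canon PowerShot S500".equalsIgnoreCase(header.cameraModel)
                && header.width == 2592
                && header.height == 1944;
        return !wanted;
    }

    public static void main(String[] args) {
        HeaderSketch match =
            new HeaderSketch("Canon PowerShot S500", 2592, 1944);
        HeaderSketch other = new HeaderSketch("Some Other Camera", 640, 480);
        System.out.println("cull matching image: " + cull(match));
        System.out.println("cull other image: " + cull(other));
    }
}
```

Because the decision uses only header fields, it can be made before any pixel data is decoded, which is where the saving comes from.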
12. HIPI, a Glance at Performance Figures. An empty job (only decompressing and looping over images); 5 runs, minimum time reported, in seconds, lower is better:
13. HIPI, a Glance at Performance Figures. Im2gray job (converting images to grayscale); 5 runs, minimum time reported, in seconds, lower is better:
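For reference, the per-pixel work in a grayscale job is small. The sketch below assumes packed 0xRRGGBB pixels and the common Rec. 601 luma weights (0.299, 0.587, 0.114); the slides do not say which weights the Im2gray job used, and `GraySketch` is a hypothetical name.

```java
// Hypothetical sketch of grayscale conversion; the weights are the
// common Rec. 601 luma coefficients, assumed here, not taken from HIPI.
final class GraySketch {

    // Converts one packed 0xRRGGBB pixel to a gray level in 0..255,
    // using integer arithmetic with rounding to the nearest level.
    static int toGray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b + 500) / 1000;
    }

    // Converts a whole image stored as an array of packed pixels.
    static int[] toGrayImage(int[] pixels) {
        int[] gray = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            gray[i] = toGray(pixels[i]);
        }
        return gray;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFFFFFF, 0x000000, 0xFF0000};
        for (int level : toGrayImage(pixels)) {
            System.out.println(level);
        }
    }
}
```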
14. HIPI, a Glance at Performance Figures. Covariance job (computing the covariance matrix of patches, 100 patches per image); 1 to 3 runs*, minimum time reported, in seconds, lower is better:
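A covariance computation fits MapReduce well because partial sums can be accumulated independently and merged afterwards. The sketch below shows one possible way to do this, assuming each patch is flattened to a `double[]`; `CovarianceAccumulator` is a hypothetical class and not the code used for the job measured here. The one-pass formula is simple but can lose precision on large or poorly centered data.

```java
// Hypothetical sketch: accumulates the sums needed for a sample
// covariance matrix so that partial results can be merged.
final class CovarianceAccumulator {
    private final int dim;
    private long count;
    private final double[] sum;      // sum of x_i over all patches
    private final double[][] outer;  // sum of x_i * x_j over all patches

    CovarianceAccumulator(int dim) {
        this.dim = dim;
        this.sum = new double[dim];
        this.outer = new double[dim][dim];
    }

    // Adds one flattened patch (the kind of work a map task would do).
    void add(double[] patch) {
        if (patch.length != dim) {
            throw new IllegalArgumentException("unexpected patch length");
        }
        count++;
        for (int i = 0; i < dim; i++) {
            sum[i] += patch[i];
            for (int j = 0; j < dim; j++) {
                outer[i][j] += patch[i] * patch[j];
            }
        }
    }

    // Folds in another partial result (the kind of work a reduce task
    // would do).
    void merge(CovarianceAccumulator other) {
        if (other.dim != dim) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        count += other.count;
        for (int i = 0; i < dim; i++) {
            sum[i] += other.sum[i];
            for (int j = 0; j < dim; j++) {
                outer[i][j] += other.outer[i][j];
            }
        }
    }

    // Sample covariance: (sum(x x^T) - n * mean * mean^T) / (n - 1).
    double[][] covariance() {
        if (count < 2) {
            throw new IllegalStateException("need at least two patches");
        }
        double[][] cov = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                cov[i][j] =
                    (outer[i][j] - sum[i] * sum[j] / count) / (count - 1);
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        CovarianceAccumulator a = new CovarianceAccumulator(2);
        CovarianceAccumulator b = new CovarianceAccumulator(2);
        a.add(new double[] {1, 2});
        b.add(new double[] {3, 6});
        a.merge(b);
        System.out.println(java.util.Arrays.deepToString(a.covariance()));
    }
}
```

The eigenvectors of the resulting matrix are the principal components, so the same accumulator would also feed a PCA step.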
15. HIPI, a Glance at Performance Figures. Culling job (decompressing all images vs. decompressing only the images we care about); 1 to 3 runs, minimum time reported, in seconds, lower is better:
16. Conclusion. Everything gets better at large scale. HIPI provides an image-centric interface that performs on par with or better than the leading alternative. The cull method provides significant improvement and convenience. HIPI offers noticeable improvements!
17. Future work. Release HIPI as an open-source project. Work on deeper integration with Hadoop. Make the HIPI workload more configurable. Make the workload more balanced.