4. What is Fuzzy Matching?
These images are very similar, but obviously not the same. To find image #2 given image #1, some sort of fuzzy matching technique needs to be used.
Pipeline: start with some multimedia (image/voice/audio/video/etc.); images 1 and 2 each pass through Feature Extraction & Normalization to create a vector or matrix of doubles; a Distance Function* compares the two vectors, yielding 31.46.
*Euclidean Distance in this example
Images from Flickr; licensed under Creative Commons:
http://www.flickr.com/photos/mdpettitt/455527136/sizes/l/in/photostream/
http://www.flickr.com/photos/mdpettitt/455539917/sizes/l/in/photostream/
7. Biometrics – A Fuzzy Matching Problem
Same person? One sample was lifted from a crime scene; the other comes from a law enforcement database.
8. Biometrics – Example
Pipeline: the query sample and a record from the biometrics database each pass through Feature Extraction & Normalization to create a vector or matrix of doubles; a Distance Function* compares the two vectors, yielding 2.41.
*Euclidean Distance in this example
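As a minimal sketch of the distance step in the pipeline above, assuming feature extraction has already produced two equal-length vectors of doubles (the class and method names here are illustrative, not FuzzyTable's API):

```java
/** Illustrative Euclidean distance between two feature vectors (hypothetical helper, not FuzzyTable's API). */
public final class EuclideanDistance {

    /** Returns the L2 distance between two equal-length feature vectors. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Feature vectors must have the same length");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Two toy feature vectors standing in for extracted, normalized biometric features.
        double[] query = {0.12, 0.80, 0.33, 0.51};
        double[] candidate = {0.10, 0.75, 0.40, 0.49};
        System.out.println("distance = " + distance(query, candidate));
    }
}
```

A smaller distance means a closer match, which is why 2.41 in the biometric example indicates a much closer pair than 31.46 in the image example.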
39. Caching Performance
[Chart: average response time (ns) vs. number of threads polling the Master Server]
Major discrepancy; it grows with load.
41. Contact Information – Cloud Computing Team
Twitter: @jason_trost, @ekohlwey, @jesse_yates, @mikeridley
Jesse Yates, Consultant: Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701, (301)617-3523, [email_address]
Michael Ridley, Associate: Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701, (301)543-4611, [email_address]
Jason Trost, Associate: Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701, (301)543-4400, [email_address]
Edmund Kohlwey, Senior Consultant: Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701, (301)821-8000, [email_address]
Robert Gordon, Associate: Booz Allen Hamilton Inc., 134 National Business Parkway, Annapolis Junction, Maryland 20701, (301)821-8000, [email_address]
Fuzzytable is a distributed, real-time database for biometrics and multimedia
What is fuzzy matching? How does it relate to Hadoop and big data? Our solution and how it works. Performance testing and results. And finally, take questions.
Pause
An operation that determines how similar two objects are to each other. There are lots of distance measures. The MPEG-7 standard defines image descriptors such as color histograms, edge histograms, the most frequent color, and how colors are distributed.
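As an illustration of one such descriptor, here is a minimal color-histogram extractor using the standard Java imaging API; it is a simplified stand-in, not the MPEG-7 reference descriptor or FuzzyTable's extractor:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

/** Illustrative color-histogram feature extractor (hypothetical, not the MPEG-7 or FuzzyTable implementation). */
public class ColorHistogram {

    /** Builds a normalized histogram with `bins` buckets per RGB channel (3 * bins values total). */
    public static double[] extract(BufferedImage img, int bins) {
        double[] hist = new double[3 * bins];
        int width = img.getWidth(), height = img.getHeight();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                hist[r * bins / 256]++;
                hist[bins + g * bins / 256]++;
                hist[2 * bins + b * bins / 256]++;
            }
        }
        double pixels = (double) width * height;
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= pixels; // normalize so images of different sizes are comparable
        }
        return hist;
    }

    public static void main(String[] args) throws Exception {
        double[] features = extract(ImageIO.read(new File(args[0])), 8);
        System.out.println("feature vector length = " + features.length);
    }
}
```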
Shazam – music search service. Google Goggles – image search service from Google. Face.com – automatic image tagging for Facebook.
We’re a strategy + technology consulting firm. Our biggest client is the US government, and the government has a lot of fuzzy data.
An example from the security market: evidence from a crime scene probably won’t perfectly match the record in your database.
It turns out you can perform the same type of analysis on biometric data
Fuzzy data is growing in the private sector. A few data points: assuming an image is around 300 KB, Facebook will have about 8 exabytes of images.
Fuzzy data is also growing in the public sector. Governments are applying biometric databases everywhere: social services, border security, visas, criminal investigations.
These databases are big. They must be fast and must support complex online operations. The first figure is an estimate of raw data storage; the second is an estimate of metadata and template storage.
HDFS opens the door to storing more and more raw images, at higher and higher resolutions. MapReduce makes it easy to test and deploy new algorithms against all the data at scale. MapReduce can be used for batch searching where latency doesn’t matter (a batch-search sketch follows below), but what about low-latency searching?
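A rough sketch of what batch fuzzy searching might look like as a MapReduce mapper; the input record format, configuration keys, and threshold are assumptions made for illustration, not FuzzyTable's actual job:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Illustrative batch fuzzy-search mapper (a sketch under stated assumptions, not FuzzyTable's job).
 * Input lines are assumed to look like: recordId<TAB>f1,f2,f3,...
 * The driver is assumed to put the query vector and match threshold in the job configuration.
 */
public class BatchSearchMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {

    private double[] query;
    private double threshold;

    @Override
    protected void setup(Context context) {
        Configuration conf = context.getConfiguration();
        String[] parts = conf.get("fuzzy.query.vector").split(",");
        query = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            query[i] = Double.parseDouble(parts[i]);
        }
        threshold = Double.parseDouble(conf.get("fuzzy.match.threshold", "10.0"));
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        String recordId = fields[0];
        String[] features = fields[1].split(",");
        double sumOfSquares = 0.0;
        for (int i = 0; i < query.length; i++) {
            double diff = query[i] - Double.parseDouble(features[i]);
            sumOfSquares += diff * diff;
        }
        double distance = Math.sqrt(sumOfSquares);
        if (distance <= threshold) {
            // Emit only records that fall within the distance threshold of the query.
            context.write(new Text(recordId), new DoubleWritable(distance));
        }
    }
}
```

A job like this scans every stored record, which is fine for offline, latency-insensitive searches but motivates the low-latency component described next.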
Things that differentiate this solution: it scales linearly, offers real-time retrieval, is highly parallel, and is cheap.
Overall architecture. Built on Hadoop core components. DON’T break down beyond the two top-level components.
Bulk processing organizes the data and constrains the search space. Real-time retrieval queries the database and presents the response.
Clustering produces bins and bin metadata (a bin-assignment sketch follows below). Records are stored in HDFS. MapReduce is used for bulk processing tasks.
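The notes do not spell out how records end up in bins, so the following is only one plausible approach: assign each record to the bin whose cluster centroid is nearest to its feature vector. The class name and centroid representation are invented for illustration.

```java
/**
 * Illustrative bin assignment by nearest cluster centroid (one plausible approach;
 * the actual clustering used to build bins is not specified in these notes).
 */
public class BinAssigner {

    private final double[][] centroids; // one centroid (feature vector) per bin

    public BinAssigner(double[][] centroids) {
        this.centroids = centroids;
    }

    /** Returns the index of the bin whose centroid is closest to the record's feature vector. */
    public int assign(double[] features) {
        int bestBin = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int bin = 0; bin < centroids.length; bin++) {
            double sumOfSquares = 0.0;
            for (int i = 0; i < features.length; i++) {
                double diff = features[i] - centroids[bin][i];
                sumOfSquares += diff * diff;
            }
            if (sumOfSquares < bestDistance) {
                bestDistance = sumOfSquares;
                bestBin = bin;
            }
        }
        return bestBin;
    }
}
```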
Entire pipeline: shows the complexity of the whole procedure. The blue boxes in the bulk processing area are all implemented in MapReduce.
Use HDFS structure to express the database organization. Focus on simplicity in implementation: chunks are limited to the block size, which makes determining data locality easy (see the locality sketch below). Rely on HDFS load balancing to distribute data, preserving data-local execution.
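A minimal sketch of how the HDFS client API could be used to discover which hosts store each bin chunk; the /fuzzytable/bins path and one-directory-per-bin layout are assumptions for illustration, not the actual on-disk layout:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Illustrative lookup of which datanodes host each bin chunk (hypothetical layout under /fuzzytable/bins). */
public class BinLocations {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Assumed layout: one directory per bin, one or more chunk files per bin,
        // each chunk no larger than an HDFS block.
        for (FileStatus bin : fs.listStatus(new Path("/fuzzytable/bins"))) {
            for (FileStatus chunk : fs.listStatus(bin.getPath())) {
                BlockLocation[] blocks = fs.getFileBlockLocations(chunk, 0, chunk.getLen());
                for (BlockLocation block : blocks) {
                    System.out.printf("bin=%s chunk=%s hosts=%s%n",
                            bin.getPath().getName(), chunk.getPath().getName(),
                            String.join(",", block.getHosts()));
                }
            }
        }
    }
}
```

Because each chunk is capped at the block size, every chunk maps to a single block's set of replica hosts, so a search can be scheduled on a node that already holds the data.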
Draw audience attention to the arrows. The low-latency component consists of three main parts: the Client submits queries for keys and gets back {Key, Value} pairs; the Master Server serves metadata about which Data Servers host which bins; and the Data Servers actually perform the fuzzy matching searches. (A sketch of the full query flow follows the step-by-step walkthrough below.)
First, “query record” is submitted to master
Master determines which bins contain similar records
Master determines which servers host the relevant bins
Master returns bin/server metadata
Client queries servers which host relevant data (in this case, data in the red bin)
Data servers search their chunks
Data servers return results in real time. NEXT: Optimizations
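Putting the walkthrough together, a hypothetical client-side flow might look like the sketch below; the MasterClient and DataServerClient interfaces and every method name are invented for illustration and are not FuzzyTable's actual API:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side query flow mirroring the walkthrough above (all names invented for illustration). */
public class FuzzyQueryFlow {

    /** Metadata returned by the master: a bin id plus the data servers hosting its chunks. */
    record BinLocation(String binId, List<String> dataServers) {}

    /** A single match: the stored record's key and its distance from the query. */
    record Match(String recordKey, double distance) {}

    interface MasterClient {
        // Master maps the query record to similar bins and to the servers hosting them.
        List<BinLocation> locateBins(double[] queryFeatures);
    }

    interface DataServerClient {
        // Each data server scans its local chunks of the requested bin.
        List<Match> search(String dataServer, String binId, double[] queryFeatures, double threshold);
    }

    /** Runs one query end to end: ask the master, then fan out to the relevant data servers. */
    static List<Match> query(MasterClient master, DataServerClient servers,
                             double[] queryFeatures, double threshold) {
        List<Match> results = new ArrayList<>();
        for (BinLocation bin : master.locateBins(queryFeatures)) {
            for (String server : bin.dataServers()) {
                // Data servers return partial results, which the client merges in real time.
                results.addAll(servers.search(server, bin.binId(), queryFeatures, threshold));
            }
        }
        return results;
    }
}
```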
Optimizations: metadata caching (the database structure is expressed in HDFS, and looking it up repeatedly is a bottleneck); replication and speculative execution; data locality. A sketch of the caching idea follows below.
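A minimal sketch of the metadata-caching idea, assuming the master keeps a time-limited in-memory copy of the bin-to-server mapping loaded from HDFS; the class, TTL, and loader interface are assumptions for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative time-based cache of bin -> data-server metadata (hypothetical, not FuzzyTable's implementation). */
public class BinMetadataCache {

    /** Loads the authoritative mapping, e.g. by listing the bin directories in HDFS. */
    interface MetadataLoader {
        Map<String, List<String>> loadBinToServers();
    }

    private final MetadataLoader loader;
    private final long ttlMillis;
    private volatile Map<String, List<String>> cached = new ConcurrentHashMap<>();
    private volatile long loadedAt = 0L;

    public BinMetadataCache(MetadataLoader loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the servers hosting a bin, reloading from HDFS only when the cached copy has expired. */
    public List<String> serversFor(String binId) {
        long now = System.currentTimeMillis();
        if (now - loadedAt > ttlMillis) {
            synchronized (this) {
                if (now - loadedAt > ttlMillis) { // double-checked so only one thread reloads
                    cached = new ConcurrentHashMap<>(loader.loadBinToServers());
                    loadedAt = now;
                }
            }
        }
        return cached.get(binId);
    }
}
```

The point of a cache like this is simply to avoid re-listing HDFS on every request, which matters as the number of threads polling the master grows.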
EC2 was used for performance testing, with 1 TB of input data. We ran a series of tests over the low-latency component.
This shows results Pause before next slide
Application performance scales linearly up to a point; I/O inefficiencies limit scalability beyond that.
More evidence of namenode issues
Very short query times are achievable
Summary: the application scales well. Querying 1 TB of images in 500 ms is possible. Simple I/O optimizations can make this system faster and more robust.
This is a difficult problem. We presented a scalable solution, which provides a look at innovative real-time applications for the Hadoop ecosystem.
This is everyone who worked on the project
Special thanks to Lalit, a former team member, and to Brandyn White, a UMD computer vision researcher.