Machine Learning and Multimedia
     Information Retrieval*
     Integrated Knowledge Solutions
            iksinc@yahoo.com



        * Based on a talk at ICMLA Conference
Outline
             •   Introduction
             •   Bridging the Semantic Gap
             •   Events in Videos
             •   Use of Tagging in MIR
             •   Killer Apps of MIR
             •   Take Home Message



12/12/2010                     ICMLA Talk    2
Too Much Information

                                                             Which is more frustrating?




Being stuck in traffic on the way to or from work




Not being able to find information you
urgently need
                                                                 According to a survey by
                                                                          Xerox
Not a New Problem
                        Nalanda was one of the first universities in the
                        world. It was founded in the 5th century AD at a
                        site the Buddha is reported to have visited during
                        his lifetime. At its peak, in the 7th century AD,
                        Nalanda held some 10,000 students when it was
                        visited by the Chinese scholar Xuanzang.




               The Royal Library of Alexandria, in Egypt, seems to have been
               the largest and most significant great library of the ancient
               world. It functioned as a major center of scholarship from its
               construction in the third century B.C. until the Roman
               conquest of Egypt in 30 B.C.

However, Earlier
 Data Producers




                                     Data Consumers

But Nowadays




Some Relevant Numbers
                           Photobucket has 6.2 billion photos and Flickr
                           has over 2 billion.



                Facebook has over 10 Billion photos and over
                400 million active users.




Phenomenon
                       • 24 hours of video are
                         uploaded to YouTube
                         every minute
                       • YouTube streams 2
                         billion videos every
                         day




So how do we get help in finding the desired
                     multimedia information?




                                                   MIR




So What is MIR?

• Also known as CBIR (Content-based Image Retrieval) and
  CBVIR (Content-based Visual Information Retrieval)
• Deals with systems that manage and facilitate searching
  for multimedia documents such as images, videos, audio
  clips, and slides, based on their content




History of MIR
• Conference on Database Techniques for Pictorial Applications,
  1979 (Florence, Italy)
• NSF Workshop on Visual Information Management Systems,
  1992 (Redwood, CA)
• QBIC (Query By Image Content), 1993 (SPIE’s Conf on Storage
  and Retrieval for Image and Video Databases); also the first ACM
  Multimedia Conference
• Shift to semantic similarity from signal similarity, 1999
• Community tagging, photo and video sharing sites, 2002



A Typical MIR System
    [Block diagram]
    Query -> Feature Extraction -> Indexing & Matching -> Retrieved Results
    Media Collection -> Feature Extraction -> Features -> Indexing & Matching
    Relevance Feedback (from Retrieved Results back to the query)
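
The pipeline in the diagram can be sketched as query-by-example retrieval. This is only a minimal illustration, not the system from the talk: the gray-level histogram feature, the L1 distance, and the names `extract_features`, `build_index`, and `retrieve` are assumptions chosen for brevity, and images are represented as flat lists of gray values in 0..255. The relevance feedback loop is left out.

```python
from typing import Dict, List, Tuple

Image = List[int]  # assumption: an image is a flat list of gray values (0..255)


def extract_features(image: Image, bins: int = 4) -> List[float]:
    """Normalized gray-level histogram (a stand-in for real features)."""
    hist = [0.0] * bins
    for value in image:
        hist[min(value * bins // 256, bins - 1)] += 1.0
    total = float(len(image)) or 1.0
    return [count / total for count in hist]


def build_index(collection: Dict[str, Image]) -> Dict[str, List[float]]:
    """Offline step: extract and store features for the media collection."""
    return {name: extract_features(img) for name, img in collection.items()}


def l1_distance(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(query: Image, index: Dict[str, List[float]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Online step: extract query features and rank the collection by distance."""
    q = extract_features(query)
    ranked = sorted(((name, l1_distance(q, feats)) for name, feats in index.items()),
                    key=lambda pair: pair[1])
    return ranked[:k]


# Toy media collection (hypothetical data).
collection = {
    "night_sky": [10, 20, 30, 15, 5, 25],
    "snow": [250, 240, 255, 245, 230, 235],
    "dusk": [40, 70, 90, 60, 30, 80],
}
index = build_index(collection)
results = retrieve([12, 18, 35, 22, 8, 60], index, k=2)
print(results)
```

Because matching happens purely on signal-level features, a dark query image ranks other dark images first regardless of what they depict.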




Semantic Gap


                                                                                Early systems retrieved
                                                                                documents that were visually
                                                                                similar (similar at the signal
                                                                                level) but did not necessarily
                                                                                show the same semantic concept.




http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/

                                                 Smeulders, Worring, Santini, Gupta, and Jain, “Content-Based Image
                                                 Retrieval at the End of the Early Years,” IEEE Transactions on
                                                 Pattern Analysis and Machine Intelligence, December 2000

Semantic Gap
Users also like to query using descriptive
words rather than query images or other
multimedia objects. This requires MIR
systems to correlate low-level features
with high-level concepts.




Visually dissimilar
images representing
the same concept.




How to Bridge the Semantic Gap?
Exploit context:
• Text surrounding images
• Associated sound track and closed captions in videos
• Query history

Use machine learning to:
• Build image category classifiers to perform semantic filtering of the
  results
• Build specific detectors for objects to associate concepts with images
• Build object models using low-level features

Exploiting Context: An Example




Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,”
Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001


Example of Using Surrounding Text




Context via Surrounding Text




Context Via Surrounding Text: One
               More Example




Better Context with More Text




Improving Context via More Words per
               Query




Issues Unique to ML for MIR
• Simultaneous presence of
  multiple concepts
• How to extract/isolate
  concept-specific features?
  Segment or do not
  segment?
• Imbalance between
  positive and negative
  examples
• Extremely large number
  of concepts for a general
  purpose MIR

                                       Romance, couple, beach, sundown
                                       From: s163.photobucket.com

A Template Relating Concepts with Pictures
       Concepts   Image Tokens        Images




Feature Extraction Issues
  Whole image based features.
  Easy to use but not very
  effective



                                                    Region based features. Both
                                                    regular region structure and
                                                    segmented regions are popular



  Salient objects based features.
  Connected regions
  corresponding to dominant
  visual properties of objects in an
  image
Scale Invariant Feature Transform
               (SIFT) Descriptors
  SIFT descriptors and their variants are
  currently the most popular features
  in use. Each image generates
  thousands of features (keypoint
  descriptors), with each feature
  typically consisting of 128 values

                                                      http://www.vlfeat.org/


                                                      D. G. Lowe, “Distinctive image
                                                      features from scale-invariant
                                                      keypoints,” IJCV, 2004.
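
To make the use of keypoint descriptors concrete, here is a small sketch of nearest-neighbor matching with the distance-ratio test described in Lowe's paper. The descriptors below are toy 4-value vectors rather than real 128-value SIFT descriptors, and the function name `match_descriptors` and the toy data are illustrative assumptions; the 0.8 threshold follows the value suggested in the paper.

```python
import math
from typing import List, Tuple

Descriptor = List[float]


def euclidean(a: Descriptor, b: Descriptor) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query: List[Descriptor], train: List[Descriptor],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (query_idx, train_idx) pairs that pass the ratio test."""
    matches = []
    for qi, q in enumerate(query):
        # Sort candidate descriptors by distance to the query descriptor.
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbors
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if the best neighbor is clearly closer
        # than the second-best neighbor.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


train = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
query = [[0.9, 0.1, 0.0, 0.0],   # close to train[0] only: accepted
         [0.5, 0.5, 0.0, 0.0]]   # equally close to train[0] and train[1]: rejected
matches = match_descriptors(query, train)
print(matches)
```

With thousands of descriptors per image, a brute-force scan like this becomes slow, so practical systems usually rely on approximate nearest-neighbor indexes.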




Feature Discovery



                                         The basic idea is to discover
                                         the features that are best
                                         suited to a given
                                         collection




         Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on
         Automated Shape Feature Discovery,” ISMSE 2004

Image Category Classifiers (ICC)
• Trained using both supervised and
  unsupervised learning methods (SVM,
  DT, AdaBoost, VQ, etc.)
• Early work was limited to a few tens of
  categories; however, some of the current
  systems can work with thousands of
  categories/concepts

VQ Based Image Category Classifier


             [Block diagram]
             Test Image -> {Fire Codebook, Sky Codebook, Water Codebook} -> Best Codebook Label

                                              Mustafa & Sethi (2004)
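
One way to read the diagram: each category has its own codebook, and a test image receives the label of the codebook that quantizes its features with the lowest average distortion. The sketch below assumes exactly that selection rule, which may differ from the cited work. The codebooks are hand-written toy values rather than learned ones, and the names `distortion` and `classify` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distortion(features: List[Vector], codebook: List[Vector]) -> float:
    """Average squared distance from each feature to its nearest codeword."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in features) / len(features)


def classify(features: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Label the image with the class whose codebook quantizes it best."""
    return min(codebooks, key=lambda label: distortion(features, codebooks[label]))


# Hypothetical codebooks of mean block colors (R, G, B); in practice these
# would be learned from training images of each category.
codebooks = {
    "fire": [[230.0, 80.0, 20.0], [250.0, 160.0, 40.0]],
    "sky": [[120.0, 180.0, 240.0], [200.0, 220.0, 250.0]],
    "water": [[20.0, 80.0, 140.0], [40.0, 120.0, 160.0]],
}
# Block features extracted from a (toy) test image.
test_image = [[240.0, 100.0, 30.0], [245.0, 150.0, 45.0], [225.0, 90.0, 25.0]]
label = classify(test_image, codebooks)
print(label)
```

Adding a new category under this scheme only requires one more codebook; the existing ones stay untouched.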


Object Detectors



                        PASCAL Visual Object Classes Challenge




LabelMe Project
             Web-based annotation tool to segment and label image
             regions. Labeled objects in images are used as training images
             to build object detectors.




             http://labelme.csail.mit.edu/
Image Category Classifiers Examples




     IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings,
     activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM
     alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos
     per day. Several demos of IMARS are available (see IMARS demos)
Image Classification via Probabilistic
                 Modeling




Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic
concept and learns a probabilistic model for each concept. (b) The system represents
each image by a vector of posterior concept probabilities.

             From Pixels to Semantic Spaces: Advances in Content-Based Image
             Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
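
The representation in part (b) of the caption follows from Bayes' rule: the posterior of a concept is proportional to the likelihood of the image features under that concept's model times the concept prior. The sketch below is only an illustration under strong assumptions that are not from the cited article: a single scalar feature per image region, one Gaussian per concept with made-up parameters, independent regions, and equal priors. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-concept models: (mean, std) of a single scalar feature.
concept_models: Dict[str, Tuple[float, float]] = {
    "sky": (0.8, 0.1),
    "grass": (0.4, 0.1),
    "night": (0.1, 0.1),
}


def log_gaussian(x: float, mean: float, std: float) -> float:
    return (-0.5 * math.log(2.0 * math.pi * std * std)
            - (x - mean) ** 2 / (2.0 * std * std))


def concept_posteriors(features: List[float]) -> Dict[str, float]:
    """Posterior P(concept | image), assuming independent features and equal priors."""
    log_scores = {c: sum(log_gaussian(x, m, s) for x in features)
                  for c, (m, s) in concept_models.items()}
    # Normalize in log space (log-sum-exp) for numerical stability.
    top = max(log_scores.values())
    unnorm = {c: math.exp(v - top) for c, v in log_scores.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


posteriors = concept_posteriors([0.75, 0.82, 0.78])
# The image is then represented by its vector of posterior concept probabilities.
semantic_vector = [posteriors[c] for c in sorted(posteriors)]
print(semantic_vector)
```

Two images can then be compared in this semantic space even when their low-level features differ.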


Retrieving Events in Videos
• An event in MIR implies an interesting
  spatiotemporal instance
• Considerable work in the MIR community on
  events because of the popularity of sports videos
• Also tremendous interest in detecting and
  recognizing events with potential homeland
  security applications


Event Retrieval Examples: Supervised
               Approach




                  Mustafa & Sethi AVSS Conference 2005

Unsupervised Learning for Event Retrieval




Mustafa & Sethi, ICTAI 2007
Unsupervised Learning Based Event
                Retrieval




    Mustafa & Sethi, ICTAI 2007


Retrieval By Cross-Modal Associations




  - Using a query from one modality (e.g. audio) to
    retrieve content in a different modality (e.g. video)
  - Operating directly on low-level features

  Approaches:
  - Latent semantic indexing (LSI)
  - Cross-modal factor analysis (CFA)
  - Canonical correlation analysis (CCA)

  Li, Dimitrova, Li and Sethi (ACM MM 03)
Talking Face Example
    [Block diagram]
    Query -> Feature Extraction -> Cross-Modal Association -> Retrieval Results
    Collection of Image Sequences -> Feature Extraction -> Cross-Modal Association




  M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME,
  2003


Tagging in MIR




             All time most popular tags at Flickr

About Tags
•   User centered
•   Imprecise and often overly personalized
•   Tag distribution follows a power law
•   Most users use very few distinct tags, while a small group of users work
    with an extremely large set of tags
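
A quick way to check for power-law behavior is to fit a straight line to log(frequency) against log(rank); a roughly constant negative slope is consistent with a power law. The sketch below uses synthetic tag counts, not Flickr data, and the function name `rank_frequency_slope` is an illustrative assumption.

```python
import math
from collections import Counter
from typing import List


def rank_frequency_slope(tags: List[str]) -> float:
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(tags).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic data: the tag at rank r is used about 1200 / r times.
tags: List[str] = []
for rank in range(1, 21):
    tags.extend(["tag%d" % rank] * (1200 // rank))

slope = rank_frequency_slope(tags)
print(round(slope, 2))  # close to -1 for this synthetic data
```

A log-log line fit is only a rough diagnostic; it does not by itself establish that real tag data follow a power law.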




How are Tags Being Used in MIR?



             Relating tags in different languages through visual features


                   Aurnhammer, Hanappe and Steels Proc. WWW2006




Tag Suggester




             Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)



Collaborative Tags
• Also known as folksonomy, social tagging, and social
  classification
• Great for content characterization
• The tag size represents the number of times the tag has
  been applied to the same item by different users. It roughly
  indicates the level of agreement/confidence in a tag.
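
The tag-size idea can be sketched as follows: count how many distinct users applied each tag to an item, then scale the counts to font sizes. The linear scaling, the point-size range, and the name `tag_cloud_sizes` are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tag_cloud_sizes(assignments: List[Tuple[str, str]],
                    min_pt: int = 10, max_pt: int = 30) -> Dict[str, int]:
    """Map each tag to a font size that grows with the number of
    distinct users who applied it."""
    # set() drops repeated (user, tag) pairs so each user counts once per tag.
    users_per_tag = Counter(tag for _, tag in set(assignments))
    low, high = min(users_per_tag.values()), max(users_per_tag.values())
    span = (high - low) or 1  # avoid division by zero when all counts are equal
    return {tag: min_pt + (count - low) * (max_pt - min_pt) // span
            for tag, count in users_per_tag.items()}


# (user, tag) pairs applied to one item (hypothetical data).
assignments = [("ann", "beach"), ("bob", "beach"), ("cy", "beach"),
               ("ann", "sunset"), ("bob", "sunset"),
               ("cy", "vacation2010"), ("cy", "vacation2010")]
sizes = tag_cloud_sizes(assignments)
print(sizes)
```

Counting distinct users rather than raw applications keeps one enthusiastic tagger from inflating the apparent agreement.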




Decision Tree Based Tagger
 • Uses social tags in binary/weighted mode
 • Generates/suggests multiple tags through a single decision
   tree classifier

First, the label vectors associated
with the training vectors are
clustered into two initial groups

Next, an SVM is used on the training
vectors to yield the split that best
matches the clustering result

An impurity-based measure is
used to iteratively adjust the split,
if needed

      Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
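
The node-splitting steps above can be sketched roughly as follows. This is not the authors' implementation: the SVM is swapped for a simple axis-aligned threshold (a decision stump) to keep the example dependency-free, Gini impurity stands in for the unspecified impurity measure, and the iterative adjustment of the split is omitted. All names and data are hypothetical.

```python
from typing import List, Tuple


def sq_dist(a, b) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(labels: List[List[int]], iters: int = 10) -> List[int]:
    """Step 1: cluster the binary label vectors into two groups."""
    # Seed with the first vector and the vector farthest from it.
    c0 = [float(v) for v in labels[0]]
    far = max(labels, key=lambda v: sq_dist(v, labels[0]))
    c1 = [float(v) for v in far]
    groups = [0] * len(labels)
    for _ in range(iters):
        groups = [0 if sq_dist(v, c0) <= sq_dist(v, c1) else 1 for v in labels]
        for gid, center in ((0, c0), (1, c1)):
            members = [v for v, g in zip(labels, groups) if g == gid]
            if members:
                center[:] = [sum(col) / len(members) for col in zip(*members)]
    return groups


def best_stump(features: List[List[float]], groups: List[int]) -> Tuple[int, float, int]:
    """Step 2 stand-in: the threshold split that best reproduces the grouping.
    Returns (feature dimension, threshold, number of agreeing samples)."""
    best = (0, 0.0, -1)
    for dim in range(len(features[0])):
        for t in sorted({f[dim] for f in features}):
            pred = [0 if f[dim] <= t else 1 for f in features]
            agree = sum(p == g for p, g in zip(pred, groups))
            agree = max(agree, len(groups) - agree)  # group ids are arbitrary
            if agree > best[2]:
                best = (dim, t, agree)
    return best


def gini(groups: List[int]) -> float:
    if not groups:
        return 0.0
    p = sum(groups) / len(groups)
    return 2.0 * p * (1.0 - p)


def split_impurity(features, groups, dim: int, threshold: float) -> float:
    """Step 3 ingredient: size-weighted Gini impurity of the two sides."""
    left = [g for f, g in zip(features, groups) if f[dim] <= threshold]
    right = [g for f, g in zip(features, groups) if f[dim] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(groups)


# Toy training set: 2-D feature vectors with 3-tag binary label vectors.
features = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7], [0.9, 0.2], [0.8, 0.1], [0.85, 0.3]]
labels = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]

groups = two_means(labels)
dim, threshold, agree = best_stump(features, groups)
impurity = split_impurity(features, groups, dim, threshold)
print(groups, dim, threshold, agree, impurity)
```

Applying the same steps recursively to each side of the split would grow a tree whose leaves each suggest a set of tags.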

Current Status of MIR
• Extensive interest, as evident from conferences,
  journals, and special issues
• Most in the MM community are happy with the progress
• Gap between published results and results from
  publicly available systems on the web
    (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)
• Lack of application focus
• Plenty of scope for machine learning to help
  improve the performance of MIR systems
• Killer applications are beginning to emerge
MIR Application Examples




  Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain,
  Jung-Eun Lee, and Rong Jin)



Biological and Medical Data Retrieval




             http://www.cs.washington.edu/research/VACE/Multimedia/




Killer Apps?




http://www.iqengines.com/applications.php




http://www.iqengines.com/applications.php




http://www.thingd.com




                             Bloomberg Businessweek, Nov 29, 2010
Take Home Message
• MIR is emerging in the commercial domain.
  A lot more activity is expected in the near future
• The MIR community is obsessed with a general
  purpose retrieval engine; a folly pursued by the
  computer vision community for a long time
• ML is playing a vital role in MIR
• Approaches combining social search and
  visual search techniques are expected to gain
  prominence
Acknowledgement
• This presentation is based on the work of
  numerous researchers from the MIR/ML/CVPR
  community. I have tried to give
  credit/references wherever possible. Any
  omission is unintentional and I apologize for
  that.
• I also want to thank my present and past
  students and collaborators.

Questions?





 
Research, the Cloud, and the IRB
Research, the Cloud, and the IRBResearch, the Cloud, and the IRB
Research, the Cloud, and the IRB
 

Mehr von Si Krishan

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroSi Krishan
 
Bringing AI to Business Intelligence
Bringing AI to Business IntelligenceBringing AI to Business Intelligence
Bringing AI to Business IntelligenceSi Krishan
 
Intro to Excel Basics: Part II
Intro to Excel Basics: Part IIIntro to Excel Basics: Part II
Intro to Excel Basics: Part IISi Krishan
 
Intro to Excel Basics: Part I
Intro to Excel Basics: Part IIntro to Excel Basics: Part I
Intro to Excel Basics: Part ISi Krishan
 
How to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchHow to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchSi Krishan
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data MiningSi Krishan
 
Social media in banking
Social media in bankingSocial media in banking
Social media in bankingSi Krishan
 

Mehr von Si Krishan (8)

Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
Bringing AI to Business Intelligence
Bringing AI to Business IntelligenceBringing AI to Business Intelligence
Bringing AI to Business Intelligence
 
Ml intro
Ml introMl intro
Ml intro
 
Intro to Excel Basics: Part II
Intro to Excel Basics: Part IIIntro to Excel Basics: Part II
Intro to Excel Basics: Part II
 
Intro to Excel Basics: Part I
Intro to Excel Basics: Part IIntro to Excel Basics: Part I
Intro to Excel Basics: Part I
 
How to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful ResearchHow to Do Research: Seven Steps to Successful Research
How to Do Research: Seven Steps to Successful Research
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Social media in banking
Social media in bankingSocial media in banking
Social media in banking
 

Kürzlich hochgeladen

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 

Kürzlich hochgeladen (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 

Machine learning and multimedia information retrieval

  • 1. Machine Learning and Multimedia Information Retrieval* Integrated Knowledge Solutions iksinc@yahoo.com * Based on a talk at ICMLA Conference
  • 2. Outline • Introduction • Bridging the Semantic Gap • Events in Videos • Use of Tagging in MIR • Killer Apps of MIR • Take Home Message 12/12/2010 ICMLA Talk 2
  • 3. Too Much Information. Which is more frustrating: being stuck in traffic on the way to or from work, or not being able to find information you urgently need? (According to a survey by Xerox)
  • 4. Not a New Problem. Nalanda University was one of the first universities in the world, founded in the 5th Century BC, and reported to have been visited by the Buddha during his lifetime. At its peak, in the 7th century AD, Nalanda held some 10,000 students when it was visited by the Chinese scholar Xuanzang. The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant great library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 48 B.C.
  • 5. However, Earlier [Figure: data producers and data consumers]
  • 6. But Nowadays
  • 7. Some Relevant Numbers. Photobucket has 6.2 billion photos and Flickr has over 2 billion. Facebook has over 10 billion photos and over 400 million active users.
  • 8. Phenomenon • 24 hours of video are uploaded to YouTube every minute • YouTube streams 2 billion videos every day
  • 9. So how do we get help in finding the desired multimedia information? MIR
  • 10. So What is MIR? • Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval) • Deals with systems that manage and facilitate content-based searching of multimedia documents such as images, videos, audio clips, and slides
  • 11. History of MIR • Conference on Data Base Techniques for Pictorial Applications, 1979 (Florence, Italy) • NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA) • QBIC (Query By Image Content), 1993 (SPIE’s Conference on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference • Shift from signal similarity to semantic similarity, 1999 • Community tagging, photo and video sharing sites, 2002
  • 12. A Typical MIR System [Block diagram: Query -> Feature Extraction -> Indexing & Matching -> Retrieved Results; Media Collection -> Feature Extraction -> Features -> Indexing & Matching; Relevance Feedback from the results back to the query]
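
The blocks named on slide 12 can be sketched in a few lines of code. This is a minimal illustration rather than the system shown in the talk: the intensity-histogram feature, the L1 distance, the averaging form of relevance feedback, and the toy 8-pixel "images" are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter

def extract_features(pixels, bins=4, levels=256):
    """Feature extraction: normalized intensity histogram of 0-255 gray values."""
    counts = Counter(p * bins // levels for p in pixels)
    return [counts.get(b, 0) / len(pixels) for b in range(bins)]

def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def build_index(collection):
    """Indexing: map each media id to its feature vector."""
    return {name: extract_features(px) for name, px in collection.items()}

def retrieve(query_features, index, k=2):
    """Matching: return the k indexed items closest to the query features."""
    ranked = sorted(index, key=lambda name: l1_distance(query_features, index[name]))
    return ranked[:k]

def refine_query(query_features, relevant, index, alpha=0.5):
    """Relevance feedback: move the query toward the mean of items marked relevant."""
    mean = [sum(index[n][i] for n in relevant) / len(relevant)
            for i in range(len(query_features))]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query_features, mean)]

# Hypothetical media collection: three tiny grayscale "images".
collection = {
    "dark":   [10, 20, 30, 15, 5, 40, 25, 35],
    "bright": [250, 240, 230, 245, 235, 225, 255, 220],
    "mixed":  [10, 250, 20, 240, 30, 230, 40, 220],
}
index = build_index(collection)
query = extract_features([12, 18, 33, 22, 8, 44, 28, 31])
results = retrieve(query, index)                 # ranked ids, nearest first
refined = refine_query(query, ["mixed"], index)  # user marked "mixed" as relevant
```

A real system would replace the histogram with richer descriptors and the linear scan with an index structure; the division of labor between the blocks stays the same.
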
  • 13. Semantic Gap. Early systems retrieved documents that were visually similar (similar at the signal level) but did not necessarily show the same semantic concept. http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/ Arnold Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
  • 14. Semantic Gap. Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts. [Figure: visually dissimilar images representing the same concept]
  • 15. How to Bridge the Semantic Gap? Exploit context: • Text surrounding images • Associated sound track and closed captions in videos • Query history. Use machine learning to: • Build image category classifiers to perform semantic filtering of the results • Build specific object detectors to associate concepts with images • Build object models using low-level features
  • 16. Exploiting Context: An Example. Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
  • 17. Example of Using Surrounding Text
  • 18. Context via Surrounding Text
  • 19. Context via Surrounding Text: One More Example
  • 20. Better Context with More Text
  • 21. Improving Context via More Words per Query
  • 22. Issues Unique to ML for MIR • Simultaneous presence of multiple concepts • How to extract/isolate concept-specific features? Segment or do not segment? • Imbalance between positive and negative examples • Extremely large number of concepts for a general-purpose MIR system [Example image tagged: Romance, couple, beach, sundown. From: s163.photobucket.com]
  • 23. A Template Relating Concepts with Pictures [Figure: Concepts, Image Tokens, Images]
  • 24. Feature Extraction Issues • Whole-image features: easy to use but not very effective • Region-based features: both regular region structures and segmented regions are popular • Salient-object features: connected regions corresponding to dominant visual properties of objects in an image
  • 25. Scale Invariant Feature Transform (SIFT) Descriptors. SIFT descriptors and their variants are currently the most popular features in use. Each image generates thousands of features (keypoint descriptors), each typically consisting of 128 values. http://www.vlfeat.org/ D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
  • 26. Feature Discovery. The basic idea is to discover the features best suited to a given collection. Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
  • 27. Image Category Classifiers (ICC) • Trained using both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ, etc.) • Early work was limited to a few tens of categories; however, some current systems can work with thousands of categories/concepts
  • 28. VQ Based Image Category Classifier [Block diagram: Test Image -> Fire Codebook, Sky Codebook, Water Codebook -> Best Codebook -> Label] Mustafa & Sethi (2004)
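
The idea on slide 28 (one codebook per category, label taken from the best-matching codebook) can be sketched as follows. This is an illustration under stated assumptions, not the method of Mustafa & Sethi: plain k-means is used to train the codebooks, "best" is taken to mean lowest average quantization distortion, and the RGB block features are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, k=2, iters=10):
    """Plain k-means (Lloyd's algorithm); the first k vectors seed the codewords."""
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def classify(vectors, codebooks):
    """Label = category whose codebook encodes the test vectors with least distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))

# Hypothetical training features: mean RGB values of image blocks per category.
training = {
    "fire":  [(220, 80, 20), (240, 120, 30), (200, 60, 10), (250, 150, 40)],
    "sky":   [(100, 160, 240), (120, 180, 250), (90, 150, 230), (140, 200, 255)],
    "water": [(20, 80, 120), (30, 100, 140), (10, 60, 100), (40, 110, 150)],
}
codebooks = {label: train_codebook(vecs) for label, vecs in training.items()}
label = classify([(230, 100, 25), (210, 70, 15)], codebooks)
```

Because each category is trained independently, a new category can be added by training one more codebook without touching the others.
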
  • 29. Object Detectors. PASCAL Visual Object Classes Challenge
  • 30. Project. Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors. http://labelme.csail.mit.edu/
  • 31. Image Category Classifiers Examples. IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available (see IMARS demos)
  • 32. Image Classification via Probabilistic Modeling. Semantic labeling: (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities. Nuno Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval,” IEEE Computer, July 2007
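
To make the "vector of posterior concept probabilities" on slide 32 concrete, here is a small sketch based on Bayes' rule. It is a simplification under loud assumptions: each concept is modeled by a single 1-D Gaussian over a hypothetical scalar "brightness" feature with equal priors, which is far simpler than the models in the cited article.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior_vector(x, models, priors):
    """Bayes' rule: P(c | x) is proportional to p(x | c) * P(c), normalized over concepts."""
    joint = {c: gaussian_pdf(x, mean, std) * priors[c] for c, (mean, std) in models.items()}
    total = sum(joint.values())
    return {c: value / total for c, value in joint.items()}

# Hypothetical per-concept models: (mean, std) of a brightness feature in [0, 1].
models = {"sky": (0.8, 0.1), "forest": (0.4, 0.1), "night": (0.1, 0.1)}
priors = {concept: 1 / len(models) for concept in models}

semantic = posterior_vector(0.75, models, priors)  # the image's semantic representation
label = max(semantic, key=semantic.get)            # minimum-probability-of-error label
```

Picking the concept with the largest posterior is the decision rule that minimizes the probability of error, and the full `semantic` dictionary is what lets images be compared in concept space rather than pixel space.
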
  • 33. Retrieving Events in Videos • An event in MIR implies an interesting spatiotemporal instance • Considerable work in the MIR community on events because of the popularity of sports videos • Also tremendous interest in detecting and recognizing events with potential homeland security applications
  • 34. Event Retrieval Examples: Supervised Approach. Mustafa & Sethi, AVSS Conference 2005
  • 35. Unsupervised Learning for Event Retrieval. Mustafa & Sethi, ICTAI 2007
  • 36. Unsupervised Learning Based Event Retrieval. Mustafa & Sethi, ICTAI 2007
  • 37. Retrieval By Cross-Modal Associations • Using a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video) • Operates directly on low-level features. Approaches: • Latent semantic indexing (LSI) • Cross-modal factor analysis (CFA) • Canonical correlation analysis (CCA). Li, Dimitrova, Li and Sethi (ACM MM 03)
  • 38. Talking Face Example [Block diagram: Query -> Feature Extraction; Collection of Image Sequences -> Feature Extraction; both feed Cross-Modal Association -> Retrieval Results] M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME, 2003
  • 39. Tagging in MIR. All-time most popular tags at Flickr
  • 40. About Tags • User centered • Imprecise and often overly personalized • Tag distribution follows a power law • Most users use very few distinct tags, while a small group of users works with an extremely large set of tags
  • 41. How are Tags Being Used in MIR? Relating tags in different languages through visual features. Aurnhammer, Hanappe and Steels, Proc. WWW2006
  • 42. Tag Suggester. Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
  • 43. Collaborative Tags • Also known as folksonomy, social tagging, and social classification • Great for content characterization • The tag size represents the number of times the tag has been applied to the same item by different users, and so indicates the level of agreement or confidence in a tag
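
The tag-size rule on slide 43 amounts to counting distinct users per tag and scaling that count to a display size. The sketch below shows one way to do it; the (user, tag) input format, the lowercase normalization, and the linear mapping to point sizes are assumptions made for illustration.

```python
from collections import defaultdict

def tag_weights(assignments):
    """Count distinct users per tag for one item; assignments are (user, tag) pairs."""
    users_by_tag = defaultdict(set)
    for user, tag in assignments:
        users_by_tag[tag.strip().lower()].add(user)
    return {tag: len(users) for tag, users in users_by_tag.items()}

def font_sizes(weights, min_pt=10, max_pt=30):
    """Linearly scale user counts to point sizes for a tag cloud."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {tag: min_pt + (w - lo) * (max_pt - min_pt) // span
            for tag, w in weights.items()}

# Hypothetical tagging activity for a single photo.
assignments = [
    ("ann", "Beach"), ("bob", "beach"), ("cat", "beach"),
    ("ann", "sunset"), ("bob", "sunset"),
    ("ann", "vacation2009"),
    ("ann", "beach"),  # a repeat by the same user does not add agreement
]
weights = tag_weights(assignments)
sizes = font_sizes(weights)
```

Counting users rather than raw assignments keeps a single enthusiastic tagger from inflating a tag, which matches the slide's reading of size as agreement among different users.
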
  • 44. Decision Tree Based Tagger • Uses social tags in binary/weighted mode • Generates/suggests multiple tags through a single decision tree classifier. First, the label vectors associated with the training vectors are clustered into two initial groups. Next, an SVM is applied to the training vectors to yield the split that best matches the clustering result. An impurity-based measure is then used to iteratively adjust the split, if needed. Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
  • 47. Current Status of MIR • Extensive interest, as evident from conferences, journals, and special issues • Most in the MM community are happy with the progress • Gap between published results and results from publicly available systems on the web (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm) • Lack of application focus • Plenty of scope for machine learning to help improve MIR system performance • Killer applications are beginning to emerge
  • 48. MIR Application Examples. Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)
  • 49. Biological and Medical Data Retrieval. http://www.cs.washington.edu/research/VACE/Multimedia/
  • 50. Killer Apps?
  • 53. http://www.thingd.com Bloomberg Businessweek, Nov 29, 2010
  • 55. Take Home Message • MIR is emerging in the commercial domain; a lot more activity is expected in the near future • The MIR community is obsessed with a general-purpose retrieval engine, a folly pursued by the computer vision community for a long time • ML is playing a vital role in MIR • Approaches combining social search and visual search techniques are expected to gain prominence
  • 56. Acknowledgement • This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional, and I apologize for it. • I also want to thank my present and past students and collaborators.
  • 57. Questions?