Advances and Challenges in
Visual Information Search and Retrieval

Oge Marques

Florida Atlantic University
Boca Raton, FL - USA


VIII Workshop de Visão Computacional (WVC) 2012
May 27–30, 2012
Goiânia, GO - Brazil
Take-home message	



Visual Information Retrieval (VIR) is a fascinating
research field with many open challenges and
opportunities that have the potential to change
the way we organize, annotate, and retrieve visual
data (images and videos).
Disclaimer #1	

•  Visual Information Retrieval (VIR) is a highly
   interdisciplinary field, but …	


[Diagram: Visual Information Retrieval at the intersection of related fields]

•  Image and Video Processing
•  (Multimedia) Database Systems
•  Information Retrieval
•  Machine Learning
•  Computer Vision
•  Data Mining
•  Visual data modeling and representation
•  Human Visual Perception
  
Disclaimer #2	

•  There are many things that I believe…	





•  … but cannot prove	




Background and Motivation	

       “What is it that we’re trying to do 
                        and 
             why is it so difficult?”	


–  Taking pictures and storing, sharing, and publishing
   them has never been so easy and inexpensive. 	

–  If only we could say the same about finding the images
   we want and retrieving them… 	





Background and Motivation	

The “big mismatch”	

Easy and cheap:
•  Take pictures
•  Store pictures
•  Publish pictures
•  Share pictures

Expensive and difficult:
•  Organize pictures
•  Annotate pictures
•  Find pictures
•  Retrieve pictures
Background and Motivation	

•  Q: What do you do when you need to find an image
   (on the Web)?

•  A1: Google (image search), of course!




Background and Motivation	

Google image search results for “sydney opera house”




                                      Source: Google Image Search (http://images.google.com/)	

Background and Motivation	

Google image search results for “opera”




                                      Source: Google Image Search (http://images.google.com/)	


Background and Motivation	

•  Q: What do you do when you need to find an
   image (on the Web)?

•  A2: Other (so-called specialized) image search
   engines
     •  http://images.search.yahoo.com/
     •  http://pictures.ask.com
     •  http://www.bing.com/images




Yahoo!	





Ask	





Bing	





Background and Motivation	

•  Q: What do you do when you need to find an
   image (on the Web)?	


•  A3: Search directly on large photo repositories:	

  –  Flickr	

  –  Webshots	

  –  Shutterstock	





Background and Motivation	

Flickr image search results for “opera”	





Background and Motivation	

Webshots image search results for “opera”	





Background and Motivation	

Shutterstock image search results for “opera”	





Background and Motivation	

                 	

                 	

                 	

Are you happy with the results so far?	

                 	





Background and Motivation	

•  Back to our original (two-part) question:	

  –  What is it that we’re trying to do?	


  –  We're trying to create automated solutions to the problem of
     finding and retrieving visual information, from (large,
     unstructured) repositories, in a way that satisfies search
     criteria specified by users, relying (primarily) on the visual
     contents of the media.
Background and Motivation	

•  Why is it so difficult?	


•  There are many challenges, among them:	

   –  The elusive notion of similarity 	

   –  The semantic gap	

   –  Large datasets and broad domains	

   –  Combination of visual and textual information	

   –  The users (and how to make them happy)	




Outline	

•  Part I – Concepts, challenges, and state of the art	


•  Part II – Medical image retrieval	


•  Part III – Mobile visual search	


•  Part IV – Where is image search headed? 	




Part I	


Concepts, challenges, and state of
            the art
The elusive notion of similarity	

•  Are these two images similar?	





Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
The elusive notion of similarity	

•  Are these two images similar?	





Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
The elusive notion of similarity	

•  Is the second or the third image more similar to
   the first?	





Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
The elusive notion of similarity	

•  Which image fits better to the first two: the third
   or the fourth?	





Source: Eidenberger, H., Introduction: Visual Information Retrieval, “Habilitation thesis”, Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf
The semantic gap	

•  The semantic gap is the lack of coincidence
   between the information that one can extract
   from the visual data and the interpretation that
   the same data have for a user in a given situation.	

      •  “The pivotal point in content-based retrieval is that the user
         seeks semantic similarity, but the database can only provide
         similarity by data processing. This is what we called the
         semantic gap.” [Smeulders et al., 2000]	
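What “similarity by data processing” looks like in practice can be sketched with a toy example (not from the talk): a global color histogram is a typical low-level feature, and two images are “similar” to the system exactly to the degree that these feature vectors overlap, regardless of what the images mean.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized 3-D RGB histogram: a simple global color feature."""
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return (hist / hist.sum()).ravel()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; identical color distributions give 1.0."""
    return float(np.minimum(h1, h2).sum())

# Random arrays standing in for real image pixels (toy data).
rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(32, 32, 3))
img_b = rng.integers(0, 256, size=(32, 32, 3))

self_sim = histogram_intersection(color_histogram(img_a), color_histogram(img_a))
cross_sim = histogram_intersection(color_histogram(img_a), color_histogram(img_b))
```

The gap is visible here: two semantically unrelated photos with similar color palettes can score higher than two photos of the same object under different lighting.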





Alipr	





Alipr	





Alipr	





Alipr	





Google similarity search	





Google similarity search	





Google sort by subject	





http://www.google.com/landing/imagesorting/ 	

Google image swirl	





http://image-swirl.googlelabs.com/ 	

How I see it…	

•  The semantic gap problem has not been solved (and
   maybe will never be…)	


•  What are the alternatives?	

   –  Treat visual similarity and semantic relatedness differently	

      •  Examples: Alipr, Google (or Bing) similarity search, etc.	

   –  Improve both (text-based and visual) search methods
      independently 	

   –  Combine visual and textual information in a meaningful
      way	

   –  Engage the user 	

      •  Collaborative filtering, crowdsourcing, games.	


•  But, wait… There
   are other gaps!	


     –  Just when you
        thought the
        semantic gap was
        your only
        problem…	




Source: [Deserno, Antani, and Long, 2009]
Large datasets and broad domains	

•  Large datasets bring additional challenges in all
   aspects of the system:	

  –  Storage requirements: images, metadata, and “visual
     signatures”	

  –  Computational cost of indexing, searching, retrieving,
     and displaying images	

  –  Network and latency issues 	

	





Large datasets and broad domains	





       Source: Smeulders et al., “Content-based image retrieval at the end of
       the early years”, IEEE Transactions on PAMI, Vol 22, Issue 12, Dec 2000	

Challenge: users’ needs and intentions	

•  Users and developers have quite different views	

•  Cultural and contextual information should be
   taken into account	

•  User intentions are hard to infer	

  –  Privacy issues	

  –  Users themselves don’t always know what they want	

  –  Who misses the MS Office paper clip?	





Challenge: users’ needs and intentions	

•  The user’s
   perspective	

   –  What do they
      want? 	

   –  Where do
      they want to
      search?	

   –  In what form
      do they
      express their   Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas,
      query? 	

      Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008. 	

                      	



                                                                                       Oge	
  Marques	
  
Challenge: users’ needs and intentions	

•  The image retrieval system should be mindful of:

   –  How users wish the results to be presented

   –  Where users desire to search

   –  The nature of user input/interaction.

Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008.
Challenge: users’ needs and intentions	

•  Each application has
   different users (with
   different intent, needs,
   background, cultural bias,
   etc.) and different visual
   assets. 	



Challenge: growing up (as a field)	

•  It’s been 10 years since the “end of the early years”	





   –  Are the challenges from 2000 still relevant?	

   –  Are the directions and guidelines from 2000 still
      appropriate?	

   –  Have we grown up (at all)?	

   –  Let’s revisit the ‘Concluding Remarks’ from that paper…	


Revisiting [Smeulders et al. 2000]	


What they said:

•  Driving forces

   –  “[…] content-based image retrieval (CBIR) will continue to grow
      in every direction: new audiences, new purposes, new styles of
      use, new modes of interaction, larger data sets, and new
      methods to solve the problems.”

How I see it:

•  Yes, we have seen many new audiences, new purposes, new styles of
   use, and new modes of interaction emerge.

•  Each of these usually requires new methods to solve the problems
   that they bring.

•  However, not too many researchers see them as a driving force (as
   they should).
Revisiting [Smeulders et al. 2000]	


What they said:

•  Heritage of computer vision

   –  “An important obstacle to overcome […] is to realize that image
      retrieval does not entail solving the general image
      understanding problem.”

How I see it:

•  I’m afraid I have bad news…

   –  Computer vision hasn’t made so much progress during the past
      10 years.

   –  Some classical problems (including image understanding) remain
      unresolved.

   –  Similarly, CBIR from a pure computer vision perspective didn’t
      work too well either.
Revisiting [Smeulders et al. 2000]	


What they said:

•  Influence on computer vision

   –  “[…] CBIR offers a different look at traditional computer
      vision problems: large data sets, no reliance on strong
      segmentation, and revitalized interest in color image
      processing and invariance.”

How I see it:

•  The adoption of large data sets became standard practice in
   computer vision.

•  No reliance on strong segmentation (still unresolved) led to new
   areas of research, e.g., automatic ROI extraction and RBIR.

•  Color image processing and color descriptors became incredibly
   popular, useful, and (to some degree) effective.

•  Invariance is still a huge problem.

   –  But it’s cheaper than ever to have multiple views.
Revisiting [Smeulders et al. 2000]	


What they said:

•  Similarity and learning

   –  “We make a pledge for the importance of human-based similarity
      rather than general similarity. Also, the connection between
      image semantics, image data, and query context will have to be
      made clearer in the future.”

   –  “[…] in order to bring semantics to the user, learning is
      inevitable.”

How I see it:

•  The authors were pointing in the right direction (human in the
   loop, role of context, benefits from learning, …)

•  However:

   –  Similarity is a tough problem to crack and model.

      •  Even the understanding of how humans judge image similarity
         is very limited.

   –  Machine learning is almost inevitable…

      •  … but sometimes it can be abused.
Revisiting [Smeulders et al. 2000]	


What they said:

•  Interaction

   –  Better visualization options, more control to the user,
      ability to provide feedback […]

How I see it:

•  Significant progress on visualization interfaces and devices.

•  Relevance feedback: still a very tricky tradeoff (effort vs.
   perceived benefit), but more popular than ever (rating, thumbs
   up/down, etc.)
Revisiting [Smeulders et al. 2000]	


What they said:

•  Need for databases

   –  “The connection between CBIR and database research is likely
      to increase in the future. […] problems like the definition of
      suitable query languages, efficient search in high dimensional
      feature space, search in the presence of changing similarity
      measures are largely unsolved […]”

How I see it:

•  Very little progress

   –  Image search and retrieval has benefited much more from
      document information retrieval than from database research.
Revisiting [Smeulders et al. 2000]	


What they said:

•  The problem of evaluation

   –  CBIR could use a reference standard against which new
      algorithms could be evaluated (similar to TREC in the field of
      text retrieval).

   –  “A comprehensive and publicly available collection of images,
      sorted by class and retrieval purposes, together with a
      protocol to standardize experimental practices, will be
      instrumental in the next phase of CBIR.”

How I see it:

•  Significant progress on benchmarks, standardized datasets, etc.

   –  ImageCLEF

   –  Pascal VOC Challenge

   –  MSRA dataset

   –  SIMPLIcity dataset

   –  UCID dataset and ground truth (GT)

   –  Accio / SIVAL dataset and GT

   –  Caltech 101, Caltech 256

   –  LabelMe
Revisiting [Smeulders et al. 2000]	


What they said:

•  Semantic gap and other sources

   –  “A critical point in the advancement of CBIR is the semantic
      gap, where the meaning of an image is rarely self-evident.
      […] One way to resolve the semantic gap comes from sources
      outside the image by integrating other sources of information
      about the image in the query.”

How I see it:

•  The semantic gap problem has not been solved (and maybe never
   will be…)

•  But the idea about using other sources was right on the spot!

   –  Geographical context

   –  Social networks

   –  Tags
Part II	


Medical Image Retrieval
Medical image retrieval	

•  Challenges	

   –  We’re entering a new country… 	

      •  How much can we bring?	

      •  Do we speak the language?	

      •  Do we know their culture?	

      •  Do they understand us and where we come from?	

•  Opportunities	

   –  They use images (extensively)	

   –  They have expert knowledge	

   –  Domains are narrow (almost by definition)	

   –  Fewer clients, but potentially more $$	


Medical image retrieval	

•  Selected challenges:	

   –  Different terminology	

   –  Standards	

   –  Modality dependencies	



•  Other challenges:	

   –  Equipment dependencies	

   –  Privacy issues	

   –  Proprietary data 	


Different terminology	

•  Be prepared for:	

   –  New acronyms	

      •  CBMIR (Content-Based Medical Image Retrieval)	

      •  PACS (Picture Archiving and Communication System)	

      •  DICOM (Digital Imaging and Communications in Medicine)	

      •  Hospital Information Systems (HIS)	

      •  Radiological Information Systems (RIS)	

   –  New phrases	

      •  Imaging informatics	

   –  Lots of technical medical terms 	


Standards	

•  DICOM (http://medical.nema.org/)	

   –  Global IT standard, created in 1993, used in virtually all
      hospitals worldwide. 	

   –  Designed to ensure the interoperability of different
      systems and manage related workflow.	

   –  Will be required by all EHR systems that include imaging
      information as an integral part of the patient record. 	

   –  750+ technical and medical experts participate in 20+
      active DICOM working groups.	

   –  Standard is updated 4-5 times per year.	

   –  Many available tools! (see http://www.idoimaging.com/)	


Medical image modalities	

•  The IRMA code [Lehmann et al., 2003]	

   –  4 axes with 3 to 4 positions each, in {0,…,9,a,…,z}, where 0
      denotes “unspecified” and marks the end of a path along an
      axis.

       •  Technical code (T) describes the imaging modality	

       •  Directional code (D) models body orientations	

       •  Anatomical code (A) refers to the body region examined	

       •  Biological code (B) describes the biological system
          examined. 	





Medical image modalities	

•  The IRMA code [Lehmann et al., 2003]	

   –  The entire code results in a character string of 14
      characters (IRMA: TTTT – DDD – AAA – BBB).	




 Example: “x-ray, projection radiography,
 analog, high energy – sagittal, left lateral
 decubitus, inspiration – chest, lung –
 respiratory system, lung”




                         Source: [Lehmann et al., 2003]
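The 4-axis structure above is easy to work with programmatically. Here is a small hypothetical helper (not part of IRMA itself) that splits a code string of the form TTTT-DDD-AAA-BBB into its axes; the example code value is made up but well-formed:

```python
# Axis names follow the IRMA scheme: Technical, Directional,
# Anatomical, Biological.
AXES = ("technical", "directional", "anatomical", "biological")

def parse_irma(code):
    """Split an IRMA code 'TTTT-DDD-AAA-BBB' into a dict of its four axes."""
    parts = code.split("-")
    if len(parts) != 4 or [len(p) for p in parts] != [4, 3, 3, 3]:
        raise ValueError("expected TTTT-DDD-AAA-BBB, got %r" % code)
    return dict(zip(AXES, parts))

axes = parse_irma("1121-127-720-500")   # a made-up but well-formed code
```

A position of "0" on any axis would simply mean "unspecified" from that point onward, per the scheme described above.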
Medical image modalities	

•  The IRMA code
   [Lehmann et al.,
   2003]	


     –  The companion
        tool…	





Source: [Lehmann et al., 2004]



CBMIR vs. text-based MIR	

•  Most current retrieval systems in clinical use rely on
   text keywords such as DICOM header information to
   perform retrieval.	

•  CBIR has been widely researched in a variety of
   domains and provides an intuitive and expressive
   method for querying visual data using features, e.g.
   color, shape, and texture.	

•  However, current CBIR systems:	

   –  are not easily integrated into the healthcare environment; 	

   –  have not been widely evaluated using a large dataset; and 	

   –  lack the ability to perform relevance feedback to refine
      retrieval results.	


                     Source: [Hsu et al., 2009]
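The relevance-feedback step mentioned above is often implemented with a Rocchio-style update borrowed from text retrieval: the feature-space query moves toward images the user marked relevant and away from those marked non-relevant. This is a generic sketch (the parameter values are conventional defaults, not from the cited system):

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """One relevance-feedback iteration on a feature-vector query."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)    # pull toward liked results
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)  # push away from disliked ones
    return q

query = np.array([0.5, 0.5])
relevant = np.array([[1.0, 0.0]])        # user liked images strong in feature 0
nonrelevant = np.array([[0.0, 1.0]])     # and disliked images strong in feature 1
refined = rocchio(query, relevant, nonrelevant)
```

The refined query then re-ranks the database, which is exactly the refinement loop the bullet list says current CBIR systems lack.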
Who are the main players?	

•  USA	

   –  NIH (National Institutes of Health)	

      •  NIBIB - National Institute of Biomedical Imaging and
         Bioengineering	

      •  NCI - National Cancer Institute	

      •  NLM – National Library of Medicine	

   –  Several universities and hospitals	

•  Europe	

   –  Aachen University (Germany)	

   –  Geneva University (Switzerland)	

•  Big companies (Siemens, GE, etc.)	

Medical image retrieval systems: examples	

•  IRMA (Image Retrieval in Medical Applications)	


   –  Aachen University (Germany)	

      •  http://ganymed.imib.rwth-aachen.de/irma/	



   –  3 online demos:	

      •  IRMA Query demo: allows the evaluation of CBIR on several
         databases.	

      •  IRMA Extended Query Refinement demo: CBIR from the IRMA
         database (a subset of 10,000 images). 	

      •  Spine Pathology and Image Retrieval Systems (SPIRS) designed by the
         NLM/NIH (USA): holds information of ~17,000 spine x-rays.	


Medical image retrieval systems: examples	

•  MedGIFT (GNU Image Finding Tool)	


  –  Geneva University (Switzerland)	

     •  http://www.sim.hcuge.ch/medgift/	



  –  Large effort, including projects such as:	

     •  Talisman (lung image retrieval) 	

     •  Case-based fracture image retrieval system	

     •  Onco-Media: medical image retrieval + grid computing	

     •  ImageCLEF: evaluation and validation	

     •  medSearch	

Medical image retrieval systems: examples	

•  WebMIRS	


  –  NIH / NLM (USA)	

     •  http://archive.nlm.nih.gov/proj/webmirs/index.php 	



  –  Query by text + navigation by categories	


  –  Uses datasets and related x-ray images from the
     National Health and Nutrition Examination Survey
     (NHANES)	


Medical image retrieval systems: examples	

•  SPIRS (Spine Pathology & Image Retrieval System):
   Web-based image retrieval system for large
   biomedical databases	

  –  NIH / UCLA (USA)	

  –  Representative case study on highly specialized CBMIR	





Source: [Hsu et al., 2009]
Medical image retrieval systems: examples	

•  National Biomedical Imaging Archive (NBIA)	


  –  NCI / NIH (USA)	

     •  https://imaging.nci.nih.gov/ 	



  –  Search based on metadata (DICOM fields)	

  –  3 search options:	

     •  Simple	

     •  Advanced	

     •  Dynamic	
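Metadata search over DICOM fields can be sketched in a few lines. The field names below are real DICOM attribute keywords, but the records are invented stand-ins for parsed headers, and `search` is a hypothetical helper, not the NBIA API:

```python
# Invented records standing in for parsed DICOM headers.
records = [
    {"Modality": "CR", "BodyPartExamined": "CHEST", "PatientSex": "M"},
    {"Modality": "MR", "BodyPartExamined": "BRAIN", "PatientSex": "F"},
    {"Modality": "CR", "BodyPartExamined": "SPINE", "PatientSex": "F"},
]

def search(records, **criteria):
    """Return records whose DICOM fields match every keyword criterion exactly."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

chest_xrays = search(records, Modality="CR", BodyPartExamined="CHEST")
```

The "simple" search option corresponds to one criterion; "advanced" searches combine several, as in the example query.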


Medical image retrieval systems: examples	

•  ARSS Goldminer 	


  –  American Roentgen Ray Society (USA)	

     •  http://goldminer.arrs.org/ 	



  –  Query by text	

  –  Results can be filtered by:	

     •  Modality	

     •  Age	

     •  Sex	


Evaluation: ImageCLEF Medical Image Retrieval 	

•  ImageCLEF Medical Image 
   Retrieval 	

     •  http://www.imageclef.org/2011/medical 	

  –  Dataset: 77,000+ images from articles published in
     medical journals including text of the captions and link
     to the html of the full text articles. 	

  –  3 types of tasks:	

     •  Modality Classification: given an image, return its modality	

     •  Ad-hoc retrieval: classic medical retrieval task, with 3
        “flavors”: textual, mixed and semantic queries 	

     •  Case-based retrieval: retrieve cases including images that
        might best suit the provided case description. 	


Medical Image Retrieval: promising directions	

•  Better user interfaces (responsive, highly interactive,
   and capable of supporting relevance feedback)	

•  New applications of CBMIR, including:	

   –  Teaching	

   –  Research 	

   –  Diagnosis 	

   –  PACS and Electronic Patient Records	

•  CBMIR evaluation using medical experts	

•  Integration of local and global features	

•  New visual descriptors	

Medical Image Retrieval: promising directions	

•  New devices	





Part III	


Mobile visual search
Mobile visual search: driving factors	

  •  Age of mobile computing	





http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/
Mobile visual search: driving factors	

  •  Why do I need a camera? I have a smartphone… (22 Dec 2011)

http://www.cellular-news.com/story/52382.php
Mobile visual search: driving factors	

  •  Powerful devices	





1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset

http://www.apple.com/iphone/specs.html
http://www.gsmarena.com/apple_iphone_4s-4212.php
Mobile visual search: driving factors	

  •  Powerful devices	





http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf
http://www.nokia.com/fr-fr/produits/mobiles/808/
Mobile visual search: driving factors	

Social networks and mobile devices (May 2011)

http://jess3.com/geosocial-universe-2/
Mobile visual search: driving factors	

  •  Social networks and mobile devices	

           –  Motivated users: image taking and image sharing are
              huge!	





           	



http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html
Mobile visual search: driving factors	

  •  Instagram: 	

           –  50 million registered users (35 M in last four
              months)	

           –  7 employees	

           –  A (growing) ecosystem based on it!	
                    •    Search 	

                    •    Send postcards	

                    •    Manage your photos	

                    •    Build a poster	

                    •    etc.	

           –  Sold to Facebook (for $ 1 Billion !) 
              earlier this year	

  	

hIp://thenextweb.com/apps/2011/12/07/instagram-­‐hits-­‐15m-­‐users-­‐and-­‐has-­‐2-­‐people-­‐working-­‐on-­‐an-­‐android-­‐app-­‐right-­‐now/	
  	
  
hIp://www.nuwomb.com/instagram/	
  	
  	
                                                                                                                 Oge	
  Marques	
  
Mobile visual search: driving factors

•  Legitimate (or not quite…) needs and use cases

http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles
https://twitter.com/#!/courtanee/status/14704916575
  
Mobile visual search: driving factors

•  A natural use case for CBIR with QBE (at last!)
   –  The example is right in front of the user!

[Figure: snapshot of an outdoor mobile visual search system in use; the system augments the viewfinder with information about the objects it recognizes in the image taken with a camera phone.]

Girod et al. IEEE Multimedia 2011
  
MVS: technical challenges

•  How to ensure low latency (and interactive queries) under constraints such as:
   –  Network bandwidth
   –  Computational power
   –  Battery consumption
•  How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc.
•  How to handle broad and narrow domains
  
MVS: pipeline for image retrieval

Girod et al. IEEE Multimedia 2011
  
3 scenarios

Girod et al. IEEE Multimedia 2011
  
MVS: descriptor extraction

•  Interest point detection
•  Feature descriptor computation

Girod et al. IEEE Multimedia 2011
  
Interest point detection

•  Numerous interest-point detectors have been proposed in the literature:
   –  Harris corners (Harris and Stephens 1988)
   –  Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004)
   –  Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002)
   –  Hessian affine (Mikolajczyk et al. 2005)
   –  Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006)
   –  Hessian blobs (Bay, Tuytelaars, and Van Gool 2006)
•  These detectors offer different tradeoffs between repeatability and computational complexity.
•  See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework.

Girod et al. IEEE Signal Processing Magazine 2011
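To make the first entry in the list concrete, here is a minimal NumPy sketch of the Harris corner response (Harris and Stephens 1988). It is an illustration only, not the implementation used in the systems cited above; the box-filter window and the sensitivity constant k = 0.04 are conventional but illustrative choices.

```python
import numpy as np

def box_filter(a, win):
    """Sum each pixel's win x win neighborhood (edge-padded)."""
    pad = win // 2
    p = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    for dy in range(win):
        for dx in range(win):
            out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(img, k=0.04, win=3):
    """Simplified Harris corner response.

    img: 2-D grayscale array. Returns an array of the same shape;
    large positive values indicate corner-like points.
    """
    Iy, Ix = np.gradient(img.astype(float))       # image gradients
    # Entries of the structure tensor, summed over a local window
    # (a box filter here; Gaussian weighting is more common).
    Sxx = box_filter(Ix * Ix, win)
    Syy = box_filter(Iy * Iy, win)
    Sxy = box_filter(Ix * Iy, win)
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace

# Demo: a white square on a black background.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
# The strongest response lies near one of the square's four corners.
```

Along a straight edge only one gradient direction is active, so the determinant of the structure tensor stays near zero; only where both directions are active (a corner) does the response become large.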
  
Feature descriptor computation

•  After interest-point detection, we compute a visual word descriptor on a normalized patch.
•  Ideally, descriptors should be:
   –  robust to small distortions in scale, orientation, and lighting conditions;
   –  discriminative, i.e., characteristic of an image or a small set of images;
   –  compact, due to typical mobile computing constraints.

Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation

•  Examples of feature descriptors in the literature:
   –  SIFT (Lowe 1999)
   –  Speeded-Up Robust Features (SURF) (Bay et al. 2008)
   –  Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005)
   –  Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010)
•  See (Winder, Hua, and Brown CVPR 2007, 2009) and (Mikolajczyk and Schmid PAMI 2005) for comparative performance evaluations of different descriptors.

Girod et al. IEEE Signal Processing Magazine 2011
  
Feature descriptor computation

•  What about compactness?
   –  Option 1: compress off-the-shelf descriptors.
      •  Result: poor rate-constrained image-retrieval performance.
   –  Option 2: design a descriptor with compression in mind.
      •  Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010)

Girod et al. IEEE Signal Processing Magazine 2011
  
CHoG: Compressed Histogram of Gradients

[Figure: CHoG descriptor pipeline. A patch around an interest point is converted to gradients (dx, dy); spatial binning divides the patch into cells; a gradient distribution is computed for each bin; histogram compression then produces the compact CHoG descriptor bitstream.]

Source: Bernd Girod, Mobile Visual Search; Chandrasekhar et al. CVPR 2009, 2010
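The stages in the figure can be sketched in NumPy. This is a toy illustration of the gradient-histogram idea only, not the actual CHoG coder; the 2×2 spatial grid, 8 orientation bins, and the crude scalar quantization standing in for histogram compression are all illustrative choices.

```python
import numpy as np

def toy_chog(patch, grid=2, bins=8, levels=4):
    """Toy gradient-histogram descriptor in the spirit of CHoG.

    patch: square 2-D float array (a normalized patch around an
    interest point). Returns a coarsely quantized histogram vector.
    """
    dy, dx = np.gradient(patch.astype(float))     # gradients dx, dy
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx)                      # orientation in [-pi, pi]
    n = patch.shape[0] // grid
    hists = []
    for gy in range(grid):                        # spatial binning: grid x grid cells
        for gx in range(grid):
            sl = (slice(gy * n, (gy + 1) * n), slice(gx * n, (gx + 1) * n))
            # Gradient distribution for this bin, weighted by magnitude.
            h, _ = np.histogram(ang[sl], bins=bins, range=(-np.pi, np.pi),
                                weights=mag[sl])
            hists.append(h)
    d = np.concatenate(hists).astype(float)
    d /= d.sum() + 1e-9                           # normalize to a distribution
    # Stand-in for histogram compression: coarse scalar quantization.
    return np.round(d * (levels - 1)).astype(int)

patch = np.random.default_rng(0).random((16, 16))
d = toy_chog(patch)
# d has grid*grid*bins = 32 entries, each in {0, ..., levels-1}.
```

The real CHoG design quantizes each cell's gradient distribution with carefully designed histogram codes rather than uniform scalar quantization, which is what makes it an order of magnitude smaller than uncompressed SIFT at comparable matching accuracy.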
  
CHoG: Compressed Histogram of Gradients

•  Performance evaluation
   –  Recall vs. bit rate

[Figure: comparison of three schemes (send feature CHoG, send image JPEG, send feature SIFT) in terms of classification accuracy vs. query size. CHoG descriptor data is an order of magnitude smaller than JPEG images or uncompressed SIFT descriptors.]

Girod et al. IEEE Multimedia 2011
MVS: feature indexing and matching

•  Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image.
   –  The short list may contain false positives as long as the correct match is included.
   –  Slower pairwise comparisons can then be performed on just the short list of candidates rather than on the entire database.
•  Example technique: Vocabulary Tree (VT)-based retrieval

Girod et al. IEEE Multimedia 2011
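The shortlist idea can be sketched with a flat visual vocabulary and an inverted index (a simplified stand-in for a vocabulary tree; the toy vocabulary, random descriptors, and simple voting score are illustrative, not part of the cited systems):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

# Toy setup: a "vocabulary" of visual words (cluster centers) and
# a database of images, each represented as a bag of local descriptors.
vocab = rng.random((50, 8))                       # 50 words, 8-D descriptors
database = {i: rng.random((20, 8)) for i in range(100)}

def quantize(desc):
    """Map each descriptor to its nearest visual word; return the word set."""
    d2 = ((desc[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return np.unique(d2.argmin(1))

# Inverted index: word ID -> list of image IDs containing that word.
inverted = defaultdict(list)
for img_id, desc in database.items():
    for w in quantize(desc):
        inverted[w].append(img_id)

def shortlist(query_desc, k=5):
    """Score images by the number of visual words shared with the query."""
    votes = defaultdict(int)
    for w in quantize(query_desc):
        for img_id in inverted[w]:
            votes[img_id] += 1
    return sorted(votes, key=votes.get, reverse=True)[:k]

# A query built from image 7's own descriptors should retrieve image 7.
top = shortlist(database[7])
```

Only the images on the returned short list then undergo the expensive pairwise matching and geometric verification; a vocabulary tree replaces the flat nearest-word search with a hierarchical one so that quantization scales to millions of words.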
  
MVS: geometric verification

•  Goal: use the location information of features in the query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images.

Girod et al. IEEE Multimedia 2011
  
MVS: geometric verification

•  Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of the correspondences.
•  Techniques:
   –  The geometric transform between the query and the database image is usually estimated using robust regression techniques such as:
      •  Random sample consensus (RANSAC) (Fischler and Bolles 1981)
      •  Hough transform (Lowe 2004)
   –  The transformation is often represented by an affine mapping or a homography.
•  Note: GV is computationally expensive, which is why it is only applied to a subset of images selected during the feature-matching stage.

[Figure: in the GV step, feature descriptors are matched pairwise, and feature correspondences that cannot be explained by a plausible change in viewing position are discarded.]

Girod et al. IEEE Multimedia 2011
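A minimal RANSAC sketch for this step, estimating an affine model from point correspondences contaminated with outliers (the inlier threshold, iteration count, and synthetic data are illustrative; production systems add degeneracy checks and early termination):

```python
import numpy as np

def ransac_affine(src, dst, iters=200, thresh=3.0, seed=0):
    """Estimate a 2-D affine map dst ~ src @ A.T + t with RANSAC.

    src, dst: (N, 2) arrays of corresponding feature locations.
    Returns (A, t, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)     # 3 points fix an affine map
        P = np.column_stack([src[idx], np.ones(3)])
        try:
            M = np.linalg.solve(P, dst[idx])      # solve [x y 1] @ M = [x' y']
        except np.linalg.LinAlgError:
            continue                              # degenerate (collinear) sample
        pred = np.column_stack([src, np.ones(n)]) @ M
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Refit on all inliers of the best hypothesis.
    P = np.column_stack([src[best], np.ones(best.sum())])
    M, *_ = np.linalg.lstsq(P, dst[best], rcond=None)
    return M[:2].T, M[2], best

# Synthetic check: rotate and translate 30 points, corrupt 10 with outliers.
rng = np.random.default_rng(42)
src = rng.random((40, 2)) * 100
ang = np.deg2rad(15)
A_true = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
dst = src @ A_true.T + np.array([5.0, -3.0])
dst[30:] += rng.random((10, 2)) * 200 + 50        # gross outliers
A, t, inliers = ransac_affine(src, dst)
```

Correspondences flagged as outliers are discarded, and the number of surviving inliers serves as the geometric matching score used to rerank or accept the candidate image.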
Datasets for MVS research

•  Stanford Mobile Visual Search Data Set
   (http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/)
   –  Key characteristics:
      •  rigid objects
      •  widely varying lighting conditions
      •  perspective distortion
      •  foreground and background clutter
      •  realistic ground-truth reference data
      •  query data collected from heterogeneous low- and high-end camera phones

Chandrasekhar et al. ACM MMSys 2011
  
SMVS Data Set: categories and examples

•  DVD covers

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html
  
SMVS Data Set: categories and examples

•  CD covers

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html
  
SMVS Data Set: categories and examples

•  Museum paintings

http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html
  
Other MVS data sets

ISO/IEC JTC1/SC29/WG11/N12202 - July 2011, Torino, IT
  
MPEG Compact Descriptors for Visual Search (CDVS)

•  Objective
   –  Define a standard that enables efficient implementation of visual search functionality on mobile devices
•  Scope
   –  bitstream of descriptors
   –  parts of the descriptor extraction process (e.g., key-point detection) needed to ensure interoperability
•  Additional info:
   –  https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs
   –  http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (ad hoc groups)

Bober, Cordara, and Reznik (2010)
  
MPEG CDVS

•  Summarized timeline (Table 1: development of the MPEG standard for visual search)

   When            Milestone                             Comments
   March 2011      Call for Proposals is published       Registration deadline: 11 July 2011;
                                                         proposals due: 21 November 2011
   December 2011   Evaluation of proposals               None
   February 2012   1st Working Draft                     First specification and test software model that
                                                         can be used for subsequent improvements.
   July 2012       Committee Draft                       Essentially complete and stabilized specification.
   January 2013    Draft International Standard          Complete specification. Only minor editorial
                                                         changes are allowed after DIS.
   July 2013       Final Draft International Standard    Finalized specification, submitted for approval
                                                         and publication as an International Standard.

Girod et al. IEEE Multimedia 2011
Examples

•  Google Goggles
•  SnapTell
•  oMoby (and the IQ Engines API)
•  pixlinQ
•  Moodstocks
  
Examples of commercial MVS apps

•  Google Goggles
   –  Android and iPhone
   –  Narrow-domain search and retrieval

http://www.google.com/mobile/goggles
  
SnapTell

•  One of the earliest (ca. 2008) MVS apps for iPhone
   –  Eventually acquired by Amazon (A9)
•  Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”)

http://www.snaptell.com/technology/index.htm
  
oMoby (and the IQ Engines API)

•  iPhone app

http://omoby.com/pages/screenshots.php
  
oMoby (and the IQ Engines API)

•  The IQ Engines API: “vision as a service”

http://www.iqengines.com/applications.php
  
pixlinQ

•  A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.”
   –  Powered by image recognition from LTU Technologies

http://www.pixlinq.com/home
  
pixlinQ

•  Example app (La Redoute)

http://www.youtube.com/watch?v=qUZCFtc42Q4
  
Moodstocks: overview

•  Offline image recognition thanks to smart synchronization of image signatures

http://www.youtube.com/watch?v=tsxe23b12eU
  
Moodstocks: technology

•  Unique features:
   –  offline image recognition thanks to smart synchronization of image signatures
   –  QR code decoding
   –  EAN-8/13 decoding
   –  online image recognition as a fallback for very large image databases
   –  simultaneous image recognition and barcode decoding
   –  seamless logging of scans in the background
•  Cross-platform (iOS/Android) client-side SDK and HTTP API available: https://github.com/Moodstocks
•  The JPEG encoder used within their SDK is also publicly available: https://github.com/Moodstocks/jpec
  
Moodstocks

•  Many successful apps for different platforms

http://www.moodstocks.com/gallery/
  
MVS: concluding thoughts

•  Mobile visual search (MVS) is coming of age.
•  This is not a fad, and it can only grow.
•  Still a good research topic:
   –  Many relevant technical challenges
   –  MPEG standardization efforts have just started
•  Countless creative commercial possibilities
  
Part IV	


Where is image search headed?
Where is image search headed?

•  Advice for [young] researchers
   –  In this last part, I have compiled bits and pieces of advice that I believe might help researchers who are entering the field.
   –  They focus on the research avenues that I personally consider the most promising.
  
Advice for [young] researchers

•  LOOK
•  THINK
•  UNDERSTAND
•  CREATE
  
Advice for [young] researchers

•  LOOK…
   –  at yourself (how do you search for images and videos?)
   –  around (related areas and how they have grown)
   –  at Google (and other major players)
  
Advice for [young] researchers

•  THINK…
   –  mobile devices
   –  new devices and services
   –  social networks
   –  games
  
Advice for [young] researchers

•  UNDERSTAND…
   –  human intentions and emotions
   –  the context of the search
   –  users’ preferences and needs
  
Advice for [young] researchers

•  CREATE…
   –  better interfaces
   –  better user experiences
   –  new business opportunities (added value)
  
Concluding thoughts

•  I believe (but cannot prove…) that successful VIR solutions will:
   –  combine content-based image retrieval (CBIR) with metadata (high-level semantic-based image retrieval)
   –  only be truly successful in narrow domains
   –  include the user in the loop
      •  relevance feedback (RF)
      •  collaborative efforts (tagging, rating, annotating)
   –  provide friendly, intuitive interfaces
   –  incorporate results and insights from cognitive science, particularly human visual attention, perception, and memory
  
Concluding thoughts

•  “Image search and retrieval” is not a problem, but rather a collection of related problems that look like one.

•  There is a great need for good solutions to specific problems.

•  10 years after “the end of the early years”, research in visual information retrieval still has many open problems, challenges, and opportunities.

Learn more about it	

•  http://savvash.blogspot.com/ 	





Thanks!	

•  Questions?	





•  For additional information: omarques@fau.edu	


Advances and Challenges in Visual Information Search and Retrieval (WVC 2012 - Goiania-GO, Brazil)

  • 1. Advances and Challenges in Visual Information Search and Retrieval Oge Marques Florida Atlantic University Boca Raton, FL - USA VIII  Workshop  de  Visão  Computacional  (WVC)  2012   May  27  –  30,  2012   Goiania,  GO  -­‐  Brazil  
  • 2. Take-home message Visual Information Retrieval (VIR) is a fascinating research field with many open challenges and opportunities which have the potential to impact the way we organize, annotate, and retrieve visual data (images and videos). Oge  Marques  
  • 3. Disclaimer #1 •  Visual Information Retrieval (VIR) is a highly interdisciplinary field, but … Image and (Multimedia) Information Video Database Retrieval Processing Systems Visual Machine Computer Learning Information Vision Retrieval Visual data Human Visual Data Mining modeling and Perception representation Oge  Marques  
  • 4. Disclaimer #2 •  There are many things that I believe… •  … but cannot prove Oge  Marques  
  • 5. Background and Motivation “What is it that we’re trying to do and why is it so difficult?” –  Taking pictures and storing, sharing, and publishing them has never been so easy and inexpensive. –  If only we could say the same about finding the images we want and retrieving them… Oge  Marques  
  • 6. Background and Motivation The “big mismatch” easy • Take pictures • Store pictures • Publish pictures expensive • Share pictures cheap • Organize pictures • Annotate pictures • Find pictures • Retrieve pictures difficult Oge  Marques  
  • 7. Background and Motivation •  Q: What do you do when you need to find an image (on the Web)? •  A1: Google (image search), of course! Oge  Marques  
  • 8. Background and Motivation Google image search results for “sydney opera house” Source: Google Image Search (http://images.google.com/) Oge  Marques  
  • 9. Background and Motivation Google image search results for “opera” Source: Google Image Search (http://images.google.com/) Oge  Marques  
  • 10. Background and Motivation •  Q: What do you do when you need to find an image (on the Web)? •  A2: Other (so-called specialized) image search engines •  http://images.search.yahoo.com/ •  http://pictures.ask.com •  http://www.bing.com/images Oge  Marques  
  • 11. Yahoo! Oge  Marques  
  • 12. Ask Oge  Marques  
  • 13. Bing Oge  Marques  
  • 14. Background and Motivation •  Q: What do you do when you need to find an image (on the Web)? •  A3: Search directly on large photo repositories: –  Flickr –  Webshots –  Shutterstock Oge  Marques  
  • 15. Background and Motivation Flickr image search results for “opera” Oge  Marques  
  • 16. Background and Motivation Webshots image search results for “opera” Oge  Marques  
  • 17. Background and Motivation Shutterstock image search results for “opera” Oge  Marques  
  • 18. Background and Motivation Are you happy with the results so far? Oge  Marques  
  • 19. Background and Motivation •  Back to our original (two-part) question: –  What is it that we’re trying to do? –  We're trying to create automated solutions to the problem of finding and retrieving visual information, from (large, unstructured) repositories, in a way that satisfies search criteria specified by users, relying (primarily) on the visual contents of the media. Oge  Marques  
  • 20. Background and Motivation •  Why is it so difficult? •  There are many challenges, among them: –  The elusive notion of similarity –  The semantic gap –  Large datasets and broad domains –  Combination of visual and textual information –  The users (and how to make them happy) Oge  Marques  
  • 21. Outline •  Part I – Concepts, challenges, and state of the art •  Part II – Medical image retrieval •  Part III – Mobile visual search •  Part IV – Where is image search headed? Oge  Marques  
  • 22. Part I Concepts, challenges, and state of the art
  • 23. The elusive notion of similarity •  Are these two images similar? Source: Eidenberger, H., Introduction:Visual Information Retrieval, “Habilitation thesis”,Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf Oge  Marques  
  • 24. The elusive notion of similarity •  Are these two images similar? Source: Eidenberger, H., Introduction:Visual Information Retrieval, “Habilitation thesis”,Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf Oge  Marques  
  • 25. The elusive notion of similarity •  Is the second or the third image more similar to the first? Source: Eidenberger, H., Introduction:Visual Information Retrieval, “Habilitation thesis”,Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf Oge  Marques  
  • 26. The elusive notion of similarity •  Which image fits better to the first two: the third or the fourth? Source: Eidenberger, H., Introduction:Visual Information Retrieval, “Habilitation thesis”,Vienna University of Technology, 2004. Available at http://www.ims.tuwien.ac.at/~hme/papers/habil-full.pdf Oge  Marques  
  • 27. The semantic gap •  The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. •  “The pivotal point in content-based retrieval is that the user seeks semantic similarity, but the database can only provide similarity by data processing. This is what we called the semantic gap.” [Smeulders et al., 2000] Oge  Marques  
  • 28. Alipr Oge  Marques  
  • 29. Alipr Oge  Marques  
  • 30. Alipr Oge  Marques  
  • 31. Alipr Oge  Marques  
  • 32. Google similarity search Oge  Marques  
  • 33. Google similarity search Oge  Marques  
  • 34. Google sort by subject http://www.google.com/landing/imagesorting/ Oge  Marques  
  • 36. How I see it… •  The semantic gap problem has not been solved (and maybe will never be…) •  What are the alternatives? –  Treat visual similarity and semantic relatedness differently •  Examples: Alipr, Google (or Bing) similarity search, etc. –  Improve both (text-based and visual) search methods independently –  Combine visual and textual information in a meaningful way –  Engage the user •  Collaborative filtering, crowdsourcing, games. Oge  Marques  
  • 37. •  But, wait… There are other gaps! –  Just when you thought the semantic gap was your only problem… Source: [Deserno, Antani, and Long, 2009] Oge  Marques  
  • 38. Large datasets and broad domains •  Large datasets bring additional challenges in all aspects of the system: –  Storage requirements: images, metadata, and “visual signatures” –  Computational cost of indexing, searching, retrieving, and displaying images –  Network and latency issues Oge  Marques  
  • 39. Large datasets and broad domains Source: Smeulders et al., “Content-based image retrieval at the end of the early years”, IEEE Transactions on PAMI, Vol 22, Issue 12, Dec 2000 Oge  Marques  
  • 40. Challenge: users’ needs and intentions •  Users and developers have quite different views •  Cultural and contextual information should be taken into account •  User intentions are hard to infer –  Privacy issues –  Users themselves don’t always know what they want –  Who misses the MS Office paper clip? Oge  Marques  
  • 41. Challenge: users’ needs and intentions •  The user’s perspective –  What do they want? –  Where do they want to search? –  In what form do they express their Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, query? Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008. Oge  Marques  
  • 42. Challenge: users’ needs and intentions •  The image retrieval system should be able to be mindful of: –  How users wish the results to be presented –  Where users desire to search –  The nature of user input/ Source: R. Datta, D. Joshi, J. Li, and J. Z. Wang, “Image Retrieval: Ideas, Influences, and Trends of the New Age”, ACM Computing Surveys, April 2008. interaction. Oge  Marques  
  • 43. Challenge: users’ needs and intentions •  Each application has different users (with different intent, needs, background, cultural bias, etc.) and different visual assets. ??? Oge  Marques  
  • 44. Challenge: growing up (as a field) •  It’s been 10 years since the “end of the early years” –  Are the challenges from 2000 still relevant? –  Are the directions and guidelines from 2000 still appropriate? –  Have we grown up (at all)? –  Let’s revisit the ‘Concluding Remarks’ from that paper… Oge  Marques  
  • 45. Revisiting [Smeulders et al. 2000] What they said How I see it •  Driving forces •  Yes, we have seen many new audiences, new purposes, new –  “[…] content-based image styles of use, and new modes retrieval (CBIR) will continue of interaction emerge. to grow in every direction: new audiences, new purposes, •  Each of these usually requires new styles of use, new modes new methods to solve the of interaction, larger data sets, problems that they bring. and new methods to solve the problems.” •  However, not too many researchers see them as a driving force (as they should). Oge  Marques  
  • 46. Revisiting [Smeulders et al. 2000] What they said How I see it •  Heritage of computer vision •  I’m afraid I have bad news… –  Computer vision hasn’t made –  “An important obstacle to so much progress during the overcome […] is to realize past 10 years. that image retrieval does not entail solving the general –  Some classical problems image understanding (including image understanding) problem.” remain unresolved. –  Similarly, CBIR from a pure computer vision perspective didn’t work too well either. Oge  Marques  
  • 47. Revisiting [Smeulders et al. 2000] What they said How I see it •  Influence on computer •  The adoption of large data sets became standard practice in vision computer vision. –  “[…] CBIR offers a different •  No reliance on strong look at traditional computer segmentation (still unresolved) led to new areas of research, e.g., vision problems: large data automatic ROI extraction and RBIR. sets, no reliance on strong •  Color image processing and color segmentation, and revitalized descriptors became incredibly interest in color image popular, useful, and (to some processing and invariance.” degree) effective. •  Invariance still a huge problem –  But it’s cheaper than ever to have multiple views. Oge  Marques  
  • 48. Revisiting [Smeulders et al. 2000] What they said How I see it •  Similarity and learning •  The authors were pointing in the right direction (human in the –  “We make a pledge for the loop, role of context, benefits importance of human- based from learning,…) similarity rather than general similarity. Also, the connection •  However: between image semantics, –  Similarity is a tough problem to crack and model. image data, and query context •  Even the understanding of how will have to be made clearer humans judge image similarity is very limited. in the future.” –  Machine learning is almost –  “[…] in order to bring inevitable… •  … but sometimes it can be semantics to the user, learning abused. is inevitable.” Oge  Marques  
  • 49. Revisiting [Smeulders et al. 2000] What they said How I see it •  Interaction •  Significant progress on –  Better visualization options, visualization interfaces and more control to the user, devices. ability to provide feedback […] •  Relevance Feedback: still a very tricky tradeoff (effort vs. perceived benefit), but more popular than ever (rating, thumbs up/down, etc.) Oge  Marques  
  • 50. Revisiting [Smeulders et al. 2000] What they said How I see it •  Need for databases •  Very little progress –  “The connection between CBIR and database research is –  Image search and retrieval has likely to increase in the benefited much more from future. […] problems like the document information definition of suitable query retrieval than from database languages, efficient search in research. high dimensional feature space, search in the presence of changing similarity measures are largely unsolved […]” Oge  Marques  
  • 51. Revisiting [Smeulders et al. 2000] What they said How I see it •  The problem of evaluation •  Significant progress on –  CBIR could use a reference benchmarks, standardized standard against which new datasets, etc. algorithms could be evaluated (similar to TREC in the field of –  ImageCLEF text recognition). –  Pascal VOC Challenge –  “A comprehensive and publicly –  MSRA dataset available collection of images, –  Simplicity dataset sorted by class and retrieval –  UCID dataset and ground truth purposes, together with a (GT) protocol to standardize –  Accio / SIVAL dataset and GT experimental practices, will be –  Caltech 101, Caltech 256 instrumental in the next phase –  LabelMe of CBIR.” Oge  Marques  
  • 52. Revisiting [Smeulders et al. 2000] What they said How I see it •  Semantic gap and other •  The semantic gap problem sources has not been solved (and –  “A critical point in the maybe will never be…) advancement of CBIR is the semantic gap, where the meaning of an image is rarely •  But the idea about using self-evident. […] One way to other sources was right on resolve the semantic gap the spot! comes from sources outside –  Geographical context the image by integrating other sources of information about the –  Social networks image in the query.” –  Tags Oge  Marques  
  • 54. Medical image retrieval •  Challenges –  We’re entering a new country… •  How much can we bring? •  Do we speak the language? •  Do we know their culture? •  Do they understand us and where we come from? •  Opportunities –  They use images (extensively) –  They have expert knowledge –  Domains are narrow (almost by definition) –  Fewer clients, but potentially more $$ Oge  Marques  
  • 55. Medical image retrieval •  Selected challenges: –  Different terminology –  Standards –  Modality dependencies •  Other challenges: –  Equipment dependencies –  Privacy issues –  Proprietary data Oge  Marques  
  • 56. Different terminology •  Be prepared for: –  New acronyms •  CBMIR (Content-Based Medical Image Retrieval) •  PACS (Picture Archiving and Communication System) •  DICOM (Digital Imaging and COmmunication in Medicine) •  Hospital Information Systems (HIS) •  Radiological Information Systems (RIS) –  New phrases •  Imaging informatics –  Lots of technical medical terms Oge  Marques  
  • 57. Standards •  DICOM (http://medical.nema.org/) –  Global IT standard, created in 1993, used in virtually all hospitals worldwide. –  Designed to ensure the interoperability of different systems and manage related workflow. –  Will be required by all EHR systems that include imaging information as an integral part of the patient record. –  750+ technical and medical experts participate in 20+ active DICOM working groups. –  Standard is updated 4-5 times per year. –  Many available tools! (see http://www.idoimaging.com/) Oge  Marques  
  • 58. Medical image modalities •  The IRMA code [Lehmann et al., 2003] –  4 axes with 3 to 4 positions, each in {0,...9,a,...,z}, where 0 denotes unspecified to determine the end of a path along an axis. •  Technical code (T) describes the imaging modality •  Directional code (D) models body orientations •  Anatomical code (A) refers to the body region examined •  Biological code (B) describes the biological system examined. Oge  Marques  
• 59. Medical image modalities •  The IRMA code [Lehmann et al., 2003] –  The entire code results in a string of 13 characters (IRMA: TTTT – DDD – AAA – BBB). Example: "x-ray, projection radiography, analog, high energy – sagittal, left lateral decubitus, inspiration – chest, lung – respiratory system, lung" Source: [Lehmann et al., 2003] Oge Marques
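Because the axes have fixed widths, an IRMA code is easy to handle programmatically. A minimal sketch (the helper name and the example code value are hypothetical, not taken from the IRMA software):

```python
# Hypothetical helper for splitting an IRMA code string into its four axes
# (Technical, Directional, Anatomical, Biological), per Lehmann et al. 2003.

IRMA_AXES = ("T", "D", "A", "B")   # axis labels
IRMA_LENGTHS = (4, 3, 3, 3)        # positions per axis: TTTT-DDD-AAA-BBB

def parse_irma(code):
    """Split a code like '1121-127-720-500' into {'T': '1121', ...}."""
    parts = code.split("-")
    if len(parts) != len(IRMA_AXES):
        raise ValueError("expected four axes: TTTT-DDD-AAA-BBB")
    for part, expected in zip(parts, IRMA_LENGTHS):
        if len(part) != expected:
            raise ValueError("axis '%s' should have %d positions" % (part, expected))
    return dict(zip(IRMA_AXES, parts))
```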
  • 60. Medical image modalities •  The IRMA code [Lehmann et al., 2003] –  The companion tool… Source: [Lehmann et al., 2004] Oge  Marques  
  • 61. CBMIR vs. text-based MIR •  Most current retrieval systems in clinical use rely on text keywords such as DICOM header information to perform retrieval. •  CBIR has been widely researched in a variety of domains and provides an intuitive and expressive method for querying visual data using features, e.g. color, shape, and texture. •  However, current CBIR systems: –  are not easily integrated into the healthcare environment; –  have not been widely evaluated using a large dataset; and –  lack the ability to perform relevance feedback to refine retrieval results. Source: [Hsu et al., 2009] Oge  Marques  
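Of the features listed above, color is the easiest to make concrete. Below is a minimal, self-contained sketch of color-based matching (quantized RGB histograms compared with histogram intersection, a classic CBIR similarity); all function names are illustrative and no particular system's API is implied:

```python
# Minimal sketch of color-based CBIR: quantized RGB histograms compared
# with histogram intersection. Images are lists of (r, g, b) tuples.

def color_histogram(pixels, bins_per_channel=4):
    """Quantize each channel into `bins_per_channel` bins and count pixels."""
    hist = [0] * (bins_per_channel ** 3)
    step = 256 // bins_per_channel
    for r, g, b in pixels:
        idx = (r // step) * bins_per_channel ** 2 + (g // step) * bins_per_channel + (b // step)
        hist[idx] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]   # normalize so images of any size compare

def intersection(h1, h2):
    """Histogram intersection similarity in [0, 1]; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

A retrieval loop would simply rank database images by `intersection` with the query histogram.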
• 62. Who are the main players? •  USA –  NIH (National Institutes of Health) •  NIBIB – National Institute of Biomedical Imaging and Bioengineering •  NCI – National Cancer Institute •  NLM – National Library of Medicine –  Several universities and hospitals •  Europe –  Aachen University (Germany) –  Geneva University (Switzerland) •  Big companies (Siemens, GE, etc.) Oge Marques
  • 63. Medical image retrieval systems: examples •  IRMA (Image Retrieval in Medical Applications) –  Aachen University (Germany) •  http://ganymed.imib.rwth-aachen.de/irma/ –  3 online demos: •  IRMA Query demo: allows the evaluation of CBIR on several databases. •  IRMA Extended Query Refinement demo: CBIR from the IRMA database (a subset of 10,000 images). •  Spine Pathology and Image Retrieval Systems (SPIRS) designed by the NLM/NIH (USA): holds information of ~17,000 spine x-rays. Oge  Marques  
  • 64. Medical image retrieval systems: examples •  MedGIFT (GNU Image Finding Tool) –  Geneva University (Switzerland) •  http://www.sim.hcuge.ch/medgift/ –  Large effort, including projects such as: •  Talisman (lung image retrieval) •  Case-based fracture image retrieval system •  Onco-Media: medical image retrieval + grid computing •  ImageCLEF: evaluation and validation •  medSearch Oge  Marques  
  • 65. Medical image retrieval systems: examples •  WebMIRS –  NIH / NLM (USA) •  http://archive.nlm.nih.gov/proj/webmirs/index.php –  Query by text + navigation by categories –  Uses datasets and related x-ray images from the National Health and Nutrition Examination Survey (NHANES) Oge  Marques  
  • 66. Medical image retrieval systems: examples •  SPIRS (Spine Pathology Image Retrieval System): Web-based image retrieval system for large biomedical databases –  NIH / UCLA (USA) –  Representative case study on highly specialized CBMIR Source: [Hsu et al., 2009] Oge  Marques  
  • 67. Medical image retrieval systems: examples •  National Biomedical Imaging Archive (NBIA) –  NCI / NIH (USA) •  https://imaging.nci.nih.gov/ –  Search based on metadata (DICOM fields) –  3 search options: •  Simple •  Advanced •  Dynamic Oge  Marques  
• 68. Medical image retrieval systems: examples •  ARRS GoldMiner –  American Roentgen Ray Society (USA) •  http://goldminer.arrs.org/ –  Query by text –  Results can be filtered by: •  Modality •  Age •  Sex Oge Marques
  • 69. Evaluation: ImageCLEF Medical Image Retrieval •  ImageCLEF Medical Image Retrieval •  http://www.imageclef.org/2011/medical –  Dataset: 77,000+ images from articles published in medical journals including text of the captions and link to the html of the full text articles. –  3 types of tasks: •  Modality Classification: given an image, return its modality •  Ad-hoc retrieval: classic medical retrieval task, with 3 “flavors”: textual, mixed and semantic queries •  Case-based retrieval: retrieve cases including images that might best suit the provided case description. Oge  Marques  
  • 70. Medical Image Retrieval: promising directions •  Better user interfaces (responsive, highly interactive, and capable of supporting relevance feedback) •  New applications of CBMIR, including: –  Teaching –  Research –  Diagnosis –  PACS and Electronic Patient Records •  CBMIR evaluation using medical experts •  Integration of local and global features •  New visual descriptors Oge  Marques  
  • 71. Medical Image Retrieval: promising directions •  New devices Oge  Marques  
• 73. Mobile visual search: driving factors •  Age of mobile computing http://60secondmarketer.com/blog/2011/10/18/more-mobile-phones-than-toothbrushes/ Oge Marques
• 74. Mobile visual search: driving factors •  Why do I need a camera? I have a smartphone… (22 Dec 2011) http://www.cellular-news.com/story/52382.php Oge Marques
• 75. Mobile visual search: driving factors •  Powerful devices 1 GHz ARM Cortex-A9 processor, PowerVR SGX543MP2, Apple A5 chipset http://www.apple.com/iphone/specs.html http://www.gsmarena.com/apple_iphone_4s-4212.php Oge Marques
• 76. Mobile visual search: driving factors •  Powerful devices http://europe.nokia.com/PRODUCT_METADATA_0/Products/Phones/8000-series/808/Nokia808PureView_Whitepaper.pdf http://www.nokia.com/fr-fr/produits/mobiles/808/ Oge Marques
• 77. Mobile visual search: driving factors •  Social networks and mobile devices (May 2011) http://jess3.com/geosocial-universe-2/ Oge Marques
• 78. Mobile visual search: driving factors •  Social networks and mobile devices –  Motivated users: image taking and image sharing are huge! http://www.onlinemarketing-trends.com/2011/03/facebook-photo-statistics-and-insights.html Oge Marques
• 79. Mobile visual search: driving factors •  Instagram: –  50 million registered users (35 M in the last four months) –  7 employees –  A growing ecosystem based on it! •  Search •  Send postcards •  Manage your photos •  Build a poster •  etc. –  Sold to Facebook (for $1 billion!) earlier this year http://thenextweb.com/apps/2011/12/07/instagram-hits-15m-users-and-has-2-people-working-on-an-android-app-right-now/ http://www.nuwomb.com/instagram/ Oge Marques
• 80. Mobile visual search: driving factors •  Legitimate (or not quite…) needs and use cases http://www.slideshare.net/dtunkelang/search-by-sight-google-goggles https://twitter.com/#!/courtanee/status/14704916575 Oge Marques
• 81. Mobile visual search: driving factors •  A natural use case for CBIR with QBE (at last!) –  The example is right in front of the user! [Figures from the source article: a snapshot of an outdoor mobile visual search system that augments the viewfinder with information about objects recognized in the camera image, and a pipeline for image retrieval from a query image] Girod et al. IEEE Multimedia 2011 Oge Marques
  • 82. MVS: technical challenges •  How to ensure low latency (and interactive queries) under constraints such as: –  Network bandwidth –  Computational power –  Battery consumption •  How to achieve robust visual recognition in spite of low-resolution cameras, varying lighting conditions, etc. •  How to handle broad and narrow domains Oge  Marques  
• 83. MVS: Pipeline for image retrieval Girod et al. IEEE Multimedia 2011 Oge Marques
• 84. 3 scenarios Girod et al. IEEE Multimedia 2011 Oge Marques
• 85. MVS: descriptor extraction •  Interest point detection •  Feature descriptor computation Girod et al. IEEE Multimedia 2011 Oge Marques
  • 86. Interest point detection •  Numerous interest-point detectors have been proposed in the literature: –  Harris Corners (Harris and Stephens 1988) –  Scale-Invariant Feature Transform (SIFT) Difference-of-Gaussian (DoG) (Lowe 2004) –  Maximally Stable Extremal Regions (MSERs) (Matas et al. 2002) –  Hessian affine (Mikolajczyk et al. 2005) –  Features from Accelerated Segment Test (FAST) (Rosten and Drummond 2006) –  Hessian blobs (Bay, Tuytelaars and Van Gool 2006) •  Different tradeoffs in repeatability and complexity •  See (Mikolajczyk and Schmid 2005) for a comparative performance evaluation of local descriptors in a common framework. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
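To make the notion of an interest-point response concrete, here is a pure-Python sketch of the Harris corner measure (Harris and Stephens 1988) from the list above; it is a toy illustration on a list-of-lists grayscale image, not a production detector (no scale space, no border handling):

```python
# Toy Harris corner response at a single interior pixel: accumulate gradient
# products over a small window, then compute det(M) - k * trace(M)^2.
# Corners score high, edges score negative, flat regions score near zero.

def harris_response(img, x, y, win=1, k=0.04):
    """Harris response at (x, y); assumes (x, y) is away from the border."""
    sxx = sxy = syy = 0.0
    for j in range(y - win, y + win + 1):
        for i in range(x - win, x + win + 1):
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0   # central difference in x
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0   # central difference in y
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace
```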
  • 87. Feature descriptor computation •  After interest-point detection, we compute a visual word descriptor on a normalized patch. •  Ideally, descriptors should be: –  robust to small distortions in scale, orientation, and lighting conditions; –  discriminative, i.e., characteristic of an image or a small set of images; –  compact, due to typical mobile computing constraints. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
  • 88. Feature descriptor computation •  Examples of feature descriptors in the literature: –  SIFT (Lowe 1999) –  Speeded Up Robust Feature (SURF) interest-point detector (Bay et al. 2008) –  Gradient Location and Orientation Histogram (GLOH) (Mikolajczyk and Schmid 2005) –  Compressed Histogram of Gradients (CHoG) (Chandrasekhar et al. 2009, 2010) •  See (Winder, (Hua,) and Brown CVPR 2007, 2009) and (Mikolajczyk and Schmid PAMI 2005) for comparative performance evaluation of different descriptors. Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
  • 89. Feature descriptor computation •  What about compactness? –  Option 1: Compress off-the-shelf descriptors. •  Result: poor rate-constrained image-retrieval performance. –  Option 2: Design a descriptor with compression in mind. –  Example: CHoG (Compressed Histogram of Gradients) (Chandrasekhar et al. 2009, 2010) Girod  et  al.  IEEE  Signal  Processing  Magazine  2011   Oge  Marques  
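As a rough illustration of designing with compression in mind, the sketch below computes a small gradient histogram over a patch and then coarsely quantizes the normalized bins to a few levels; this is a drastic simplification for illustration only, not the actual CHoG algorithm of Chandrasekhar et al.:

```python
# Simplified CHoG-style idea: summarize a patch by a histogram of dominant
# gradient directions, then quantize each normalized bin to `levels` levels
# (e.g., 2 bits per bin instead of a float) -- the compression-aware step.

def gradient_histogram(patch, levels=4):
    """patch: 2D list of grayscale values. Returns a quantized 4-bin
    histogram of dominant gradient direction (right/left/down/up)."""
    bins = [0.0] * 4
    for y in range(len(patch) - 1):
        for x in range(len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x]
            dy = patch[y + 1][x] - patch[y][x]
            if abs(dx) >= abs(dy):
                bins[0 if dx >= 0 else 1] += abs(dx)
            else:
                bins[2 if dy >= 0 else 3] += abs(dy)
    total = sum(bins) or 1.0
    return [min(int(b / total * levels), levels - 1) for b in bins]
```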
• 90. CHoG: Compressed Histogram of Gradients [Figure: pipeline — patch → gradients (dx, dy) → spatial binning → gradient distributions for each bin → histogram compression → CHoG descriptor bitstream] Bernd Girod: Mobile Visual Search; Chandrasekhar et al. CVPR 09, 10 Oge Marques
• 91. CHoG: Compressed Histogram of Gradients •  Performance evaluation –  Recall vs. bit rate [Figure: classification accuracy (%) vs. query size (Kbytes) for send feature (CHoG), send image (JPEG), and send feature (SIFT); CHoG descriptor data is an order of magnitude smaller than JPEG images or uncompressed SIFT descriptors] Girod et al. IEEE Multimedia 2011 Oge Marques
• 92. MVS: feature indexing and matching •  Goal: produce a data structure that can quickly return a short list of the database candidates most likely to match the query image. –  The short list may contain false positives as long as the correct match is included. –  Slower pairwise comparisons can be subsequently performed on just the short list of candidates rather than the entire database. •  Example of a technique: Vocabulary Tree (VT)-Based Retrieval Girod et al. IEEE Multimedia 2011 Oge Marques
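The short-listing idea can be sketched with a plain inverted index (visual word → images containing it); real systems quantize descriptors with a vocabulary tree and use TF-IDF weighting, so the names and scoring below are purely illustrative:

```python
# Sketch of BoW short-listing: each database image is a bag of visual-word
# IDs; the inverted index maps word -> image IDs, and query words vote for
# candidate images. False positives are fine -- GV prunes them later.

from collections import defaultdict

def build_index(db):
    """db: dict image_id -> iterable of visual-word IDs."""
    index = defaultdict(set)
    for image_id, words in db.items():
        for w in words:
            index[w].add(image_id)
    return index

def shortlist(index, query_words, k=2):
    """Return the k images sharing the most visual words with the query."""
    votes = defaultdict(int)
    for w in query_words:
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return sorted(votes, key=votes.get, reverse=True)[:k]
```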
• 93. MVS: geometric verification •  Goal: use location information of features in query and database images to confirm that the feature matches are consistent with a change in viewpoint between the two images. Girod et al. IEEE Multimedia 2011 Oge Marques
• 94. MVS: geometric verification •  Method: perform pairwise matching of feature descriptors and evaluate the geometric consistency of correspondences. •  Techniques: –  The geometric transform between the query and database image is usually estimated using robust regression techniques such as: •  Random sample consensus (RANSAC) (Fischler and Bolles 1981) •  Hough transform (Lowe 2004) –  The transformation is often represented by an affine mapping or a homography. •  Note: GV is computationally expensive, which is why it is only used for a subset of images selected during the feature-matching stage. [Figure: in the GV step, feature descriptors are matched pairwise to find feature correspondences that are consistent with a geometric model] Girod et al. IEEE Multimedia 2011 Oge Marques
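A toy sketch of RANSAC-based geometric verification, modeling the transform as a 2D similarity encoded with complex numbers (a simplification chosen for brevity; as the slide notes, real systems typically fit an affine mapping or a homography):

```python
# RANSAC (Fischler & Bolles 1981) for geometric verification: repeatedly fit
# a similarity transform q = a*p + b (a, b complex: rotation+scale+shift)
# from 2 random correspondences and keep the largest inlier set.

import random

def fit_similarity(p1, p2, q1, q2):
    """Solve q = a*p + b from two point correspondences (complex points)."""
    a = (q1 - q2) / (p1 - p2)
    return a, q1 - a * p1

def ransac_verify(matches, iters=200, tol=1.0, seed=0):
    """matches: list of (p, q) complex pairs. Returns the largest set of
    correspondences consistent with a single similarity transform."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        if p1 == p2:
            continue  # degenerate sample
        a, b = fit_similarity(p1, p2, q1, q2)
        inliers = [(p, q) for p, q in matches if abs(a * p + b - q) < tol]
        if len(inliers) > len(best):
            best = inliers
    return best
```

A match passes verification if the inlier set is large enough; outlier correspondences (visually similar but spatially inconsistent features) are discarded.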
  • 95. Datasets for MVS research •  Stanford Mobile Visual Search Data Set (http://web.cs.wpi.edu/~claypool/mmsys-dataset/2011/stanford/) –  Key characteristics: •  rigid objects •  widely varying lighting conditions •  perspective distortion •  foreground and background clutter •  realistic ground-truth reference data •  query data collected from heterogeneous low and high-end camera phones. Chandrasekhar  et  al.  ACM  MMSys  2011   Oge  Marques  
• 96. SMVS Data Set: categories and examples •  DVD covers http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/dvd_covers.html Oge Marques
• 97. SMVS Data Set: categories and examples •  CD covers http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/cd_covers.html Oge Marques
• 98. SMVS Data Set: categories and examples •  Museum paintings http://web.cs.wpi.edu/~claypool/mmsys-2011-dataset/stanford/mvs_images/museum_paintings.html Oge Marques
• 99. Other MVS data sets ISO/IEC JTC1/SC29/WG11/N12202 – July 2011, Torino, IT Oge Marques
• 100. MPEG Compact Descriptors for Visual Search (CDVS) •  Objective –  Define a standard that enables efficient implementation of visual search functionality on mobile devices •  Scope –  bitstream of descriptors –  parts of the descriptor extraction process (e.g. key-point detection) needed to ensure interoperability •  Additional info: –  https://mailhost.tnt.uni-hannover.de/mailman/listinfo/cdvs –  http://mpeg.chiariglione.org/meetings/geneva11-1/geneva_ahg.htm (Ad hoc groups) Bober, Cordara, and Reznik (2010) Oge Marques
• 101. MPEG CDVS •  Summarized timeline (Table 1. Timeline for development of the MPEG standard for visual search):
–  March 2011: Call for Proposals is published (registration deadline: 11 July 2011; proposals due: 21 November 2011)
–  December 2011: Evaluation of proposals
–  February 2012: 1st Working Draft (first specification and test software model that can be used for subsequent improvements)
–  July 2012: Committee Draft (essentially complete and stabilized specification)
–  January 2013: Draft International Standard (complete specification; only minor editorial changes are allowed after DIS)
–  July 2013: Final Draft International Standard (finalized specification, submitted for approval and publication as International Standard) Girod et al. IEEE Multimedia 2011 Oge Marques
  • 102. Examples •  Google Goggles •  SnapTell •  oMoby (and the IQ Engines API) •  pixlinQ •  Moodstocks Oge  Marques  
• 103. Examples of commercial MVS apps •  Google Goggles –  Android and iPhone –  Narrow-domain search and retrieval http://www.google.com/mobile/goggles Oge Marques
• 104. SnapTell •  One of the earliest (ca. 2008) MVS apps for iPhone –  Eventually acquired by Amazon (A9) •  Proprietary technique (“highly accurate and robust algorithm for image matching: Accumulated Signed Gradient (ASG)”). http://www.snaptell.com/technology/index.htm Oge Marques
• 105. oMoby (and the IQ Engines API) –  iPhone app http://omoby.com/pages/screenshots.php Oge Marques
• 106. oMoby (and the IQ Engines API) •  The IQ Engines API: “vision as a service” http://www.iqengines.com/applications.php Oge Marques
• 107. pixlinQ •  A “mobile visual search solution that enables you to link users to digital content whenever they take a mobile picture of your printed materials.” –  Powered by image recognition from LTU technologies http://www.pixlinq.com/home Oge Marques
• 108. pixlinQ •  Example app (La Redoute) http://www.youtube.com/watch?v=qUZCFtc42Q4 Oge Marques
• 109. Moodstocks: overview •  Offline image recognition thanks to a smart image signatures synchronization http://www.youtube.com/watch?v=tsxe23b12eU Oge Marques
  • 110. Moodstocks: technology •  Unique features: –  offline image recognition thanks to a smart image signatures synchronization, –  QR Code decoding, –  EAN 8/13 decoding, –  online image recognition as a fallback for very large image databases, –  simultaneous run of image recognition and barcode decoding, –  seamless scans logging in the background. •  Cross-platform (iOS / Android) client-side SDK and HTTP API available: https://github.com/Moodstocks •  JPEG encoder used within their SDK also publicly available: https://github.com/Moodstocks/jpec Oge  Marques  
• 111. Moodstocks •  Many successful apps for different platforms http://www.moodstocks.com/gallery/ Oge Marques
  • 112. MVS: concluding thoughts •  Mobile Visual Search (MVS) is coming of age. •  This is not a fad and it can only grow. •  Still a good research topic –  Many relevant technical challenges –  MPEG efforts have just started •  Infinite creative commercial possibilities Oge  Marques  
  • 113. Part IV Where is image search headed?
• 114. Where is image search headed? •  Advice for [young] researchers –  In this last part, I have compiled bits and pieces of advice that I believe might help researchers who are entering the field. –  They focus on the research avenues that I personally consider the most promising. Oge Marques
  • 115. Advice for [young] researchers • LOOK • THINK • UNDERSTAND • CREATE Oge  Marques  
  • 116. Advice for [young] researchers • LOOK… –  at yourself (how do you search for images and videos?) –  around (related areas and how they have grown) –  at Google (and other major players) Oge  Marques  
  • 117. Advice for [young] researchers • THINK… –  mobile devices –  new devices and services –  social networks –  games Oge  Marques  
  • 118. Advice for [young] researchers • UNDERSTAND… –  human intentions and emotions –  the context of the search –  user’s preferences and needs Oge  Marques  
  • 119. Advice for [young] researchers • CREATE… –  better interfaces –  better user experience –  new business opportunities (added value) Oge  Marques  
  • 120. Concluding thoughts –  I believe (but cannot prove…) that successful VIR solutions will: •  combine content-based image retrieval (CBIR) with metadata (high-level semantic-based image retrieval) •  only be truly successful in narrow domains •  include the user in the loop –  Relevance Feedback (RF) –  Collaborative efforts (tagging, rating, annotating) •  provide friendly, intuitive interfaces •  incorporate results and insights from cognitive science, particularly human visual attention, perception, and memory Oge  Marques  
  • 125. Concluding thoughts •  “Image search and retrieval” is not a problem, but rather a collection of related problems that look like one. •  There is a great need for good solutions to specific problems. •  10 years after “the end of the early years”, research in visual information retrieval still has many open problems, challenges, and opportunities. Oge  Marques  
  • 126. Learn more about it •  http://savvash.blogspot.com/ Oge  Marques  
  • 127. Thanks! •  Questions? •  For additional information: omarques@fau.edu Oge  Marques