Context Adaptation in Image Search

                       arjen@acm.org
Context Adaptation
GOAL:

Present different photos to a sports
journalist who queries for Beckham than
to a glossy magazine editor issuing the
same query
IPTC Categories
• ACE (arts, culture, entertainment)
• CLJ (crime, law & justice)
• DIS (disasters & accidents)
• EBF (economy, business & finance)
• EDU (education)
• ENV (environment)
• HTH (health)
• HUM (human interest)
• LAB (labour, work)
• LIF (lifestyle & leisure)
• POL (politics)
• REL (religion)
• SCI (science & technology)
• SOI (social issues)
• SPO (sports)
• WAR (unrest, conflicts, war)
• WEA (weather)
What Context?
• Collection context
  – One “main” IPTC category per image
    • 96,351 out of 97,760 images in 100k Belga
      Collection
    • Note: noisy data, in spite of it being edited
      content!
      E.g., we found lifestyle Beckham images annotated
      as SPO, and even typos in IPTC category
      assignment!
• User context
  – Classified 813 users into IPTC categories to
    represent their main interest (based on Belga
    input about the user’s organizations)
Filter on IPTC?
 //image[@IPTC eq SPO][about(.,Beckham)]
• Bad for recall:
  – Not all images have been assigned IPTC
    categories
• Bad for precision:
  – Noisy assignment of IPTC categories to
    images
    • At least 4 of the top 10 SPO Beckham results do
      not show Beckham taking part in sporting activities
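The two failure modes of hard filtering can be illustrated with a minimal sketch. The `Image` record, its field names, and the three example captions are hypothetical and not taken from the Belga collection; the substring match merely stands in for the `about(.,Beckham)` clause.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Image:
    image_id: str
    caption: str
    iptc: Optional[str]  # None when the archivist assigned no category


def filter_on_iptc(images: List[Image], category: str, term: str) -> List[Image]:
    """Hard filter: keep images labelled `category` whose caption mentions `term`."""
    needle = term.lower()
    return [im for im in images
            if im.iptc == category and needle in im.caption.lower()]


# Hypothetical toy data illustrating both problems.
images = [
    Image("a", "Beckham scores a free kick", "SPO"),
    Image("b", "Beckham at a fashion show", "SPO"),     # noisy label: hurts precision
    Image("c", "Beckham trains with the squad", None),  # no label: hurts recall
]

hit_ids = [im.image_id for im in filter_on_iptc(images, "SPO", "Beckham")]
print(hit_ids)
```

Image "b" passes the filter although it is a lifestyle photo, and image "c" is dropped although it shows sport, which is the motivation for re-ranking instead of filtering.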
Retrieval Model
• Re-rank results based on cluster
  membership
   $$\lambda\,\rho_d(q) + (1-\lambda) \sum_{c \in \mathit{Clusters}} \rho_c(q)\,\rho_c(d)$$
   where ρ_d(q) = P(Q|D), ρ_c(q) = P(Q|c), and ρ_c(d) = P(D|c)


  – Modify scores based on document’s context
    Oren Kurland and Lillian Lee.
    ACM Transactions on Information Systems (TOIS), 27(3), 2009.

• Novelty in Vitalas:
  – Modify scores based on user’s context
    • Cluster formation based on user clicks
    • Cluster selection based on user context
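The interpolation on this slide can be written as a small function. This is a sketch only: the function name, argument names, and probability values are made up for illustration, and how P(Q|D), P(Q|c) and P(D|c) are estimated is not shown.

```python
from typing import Iterable, Tuple


def rerank_score(lam: float,
                 p_q_given_d: float,
                 selected_clusters: Iterable[Tuple[float, float]]) -> float:
    """Interpolate the document score with the cluster-based score.

    selected_clusters holds one (P(Q|c), P(D|c)) pair per selected cluster.
    """
    cluster_part = sum(p_q_c * p_d_c for p_q_c, p_d_c in selected_clusters)
    return lam * p_q_given_d + (1.0 - lam) * cluster_part


# Hypothetical probabilities for one image and two selected clusters.
clusters = [(0.5, 0.4), (0.1, 0.2)]
blended = rerank_score(0.1, 0.3, clusters)       # mostly cluster evidence
cluster_only = rerank_score(0.0, 0.3, clusters)  # initial text score ignored
text_only = rerank_score(1.0, 0.3, clusters)     # no adaptation
print(blended, cluster_only, text_only)
```

With λ=0.0 the first term vanishes, so the ranking depends on cluster membership alone; with λ=1.0 the original text ranking is returned unchanged.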
Retrieval Model
• Cluster formation:
  – IPTC-image categories; forms disjoint clusters
  – IPTC-user categories of users who clicked the
    image; gives overlapping clusters
• Cluster selection:
  – {d∈c}: cluster contains document
  – {u∈c}: cluster/@category corresponds to
    user's interests
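A minimal sketch of the two cluster formation schemes and the two selection criteria follows. All identifiers and the toy click log are hypothetical, and the slide does not say whether the two selection criteria are combined, so they are kept as separate helpers here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def image_clusters(image_category: Dict[str, Optional[str]]) -> Dict[str, Set[str]]:
    """Disjoint clusters: each image joins the cluster of its one main IPTC category."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for image_id, category in image_category.items():
        if category is not None:
            clusters[category].add(image_id)
    return dict(clusters)


def user_clusters(clicks: Iterable[Tuple[str, str]],
                  user_category: Dict[str, str]) -> Dict[str, Set[str]]:
    """Overlapping clusters: an image joins the category of every user who clicked it."""
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for user_id, image_id in clicks:
        category = user_category.get(user_id)
        if category is not None:
            clusters[category].add(image_id)
    return dict(clusters)


def clusters_containing(clusters: Dict[str, Set[str]], image_id: str) -> List[str]:
    """Selection criterion d in c: clusters that contain the document."""
    return sorted(cat for cat, members in clusters.items() if image_id in members)


def clusters_for_user(clusters: Dict[str, Set[str]], interest: str) -> List[str]:
    """Selection criterion u in c: clusters whose category matches the user's interest."""
    return [cat for cat in clusters if cat == interest]


# Hypothetical toy data: i3 has no IPTC category but was clicked by a sports user.
image_category = {"i1": "SPO", "i2": "LIF", "i3": None}
user_category = {"u1": "SPO", "u2": "LIF"}
clicks = [("u1", "i1"), ("u2", "i1"), ("u1", "i3")]

by_image = image_clusters(image_category)
by_user = user_clusters(clicks, user_category)
print(by_image, by_user)
```

In this toy data, image i1 ends up in both the SPO and LIF user clusters, and the unlabelled image i3 still lands in the SPO user cluster.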
Results on Click Prediction
NDCG   D        image    image    image    image    user     user     user     user
                0.0      0.1      0.4      0.7      0.0      0.1      0.4      0.7
ACE    0.1724   0.1423   0.1741   0.1721   0.1721   0.2070   0.1978   0.1767   0.1747
EBF    0.5527   0.4744   0.5460   0.5497   0.5504   0.4882   0.5519   0.5509   0.5509
EDU    0.0145   0.0163   0.0145   0.0145   0.0145   0.0165   0.0167   0.0155   0.0146
HTH    0.1308   0.1347   0.1308   0.1308   0.1308   0.6342   0.3712   0.1934   0.1414
HUM    0.1849   0.1612   0.1798   0.1772   0.1849   0.2109   0.2043   0.1776   0.1760
LAB    0.1331   0.1543   0.1331   0.1331   0.1331   0.2164   0.2339   0.1817   0.1380
LIF    0.1245   0.0888   0.1234   0.1233   0.1232   0.1894   0.1555   0.1121   0.1253
POL    0.0723   0.0586   0.0704   0.0717   0.0721   0.1054   0.0990   0.0916   0.0769
SOI    0.2880   0.1806   0.2883   0.2880   0.2880   0.2964   0.2970   0.2968   0.3008
SPO    0.1811   0.1801   0.1809   0.1806   0.1807   0.2151   0.2005   0.1839   0.1820

Related literature on evaluation methodology: Carterette and Jones, NIPS
2007, and Carterette, Allan, and Sitaraman, SIGIR 2006.
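For readers unfamiliar with the metric, here is one common way to compute nDCG from clicks. It is a sketch under stated assumptions: binary gains (clicked or not), a log2 rank discount, and an ideal ordering taken over the ranked list itself. The slides do not state which nDCG variant was used, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking: List[str], clicked: Set[str]) -> float:
    """nDCG of a ranking, treating clicked images as relevant (gain 1)."""
    gains = [1.0 if image_id in clicked else 0.0 for image_id in ranking]
    ideal_dcg = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical example: the single clicked image sits at rank 3 or at rank 1.
clicked_last = ndcg(["a", "b", "c"], {"c"})
clicked_first = ndcg(["a", "b", "c"], {"a"})
no_clicks = ndcg(["a", "b", "c"], set())
print(clicked_last, clicked_first, no_clicks)
```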
No Adaptation
    “Greece”
SPO Adaptation
“Greece, collection-based clusters, λ=0.1”
SPO Adaptation
“Greece, collection-based clusters, λ=0.0”
SPO Adaptation
“Greece, user-based clusters, λ=0.1”
SPO Adaptation
“Greece, user-based clusters, λ=0.0”
SPO Observations
• Re-ranking pushes the sports-related
  images to the top
  – No more images about the fires
  – When λ=0.0 the initial retrieval score is not
    taken into account (initial text ranking
    ignored)
• Minimal differences between collection-
  based and user-based cluster formation
  – Archivists consider as sports-related those
    images that users with sports-related
    interests click on
POL Adaptation
“Greece, collection-based clusters, λ=0.1”
POL Adaptation
“Greece, collection-based clusters, λ=0.0”
POL Adaptation
“Greece, user-based clusters, λ=0.1”
POL Adaptation
“Greece, user-based clusters, λ=0.0”
POL Observations
• Re-ranking for a politics context shows a
  difference in interpretation between the
  archivist and the user group
  – Archivists focussed on the actual political
    rallies etc.
  – Users focussed on the forest fires
ACE Adaptation
“Greece, collection-based clusters, λ=0.1”
ACE Adaptation
“Greece, collection-based clusters, λ=0.0”
ACE Observations
• Re-ranking for arts, culture and
  entertainment requires λ=0.0, to ignore
  the initial ranking and let the right images
  shine
No Adaptation
   “Beckham”
SPO Adaptation
“Beckham, collection-based clusters, λ=0.1”
SPO Adaptation
“Beckham, collection-based clusters, λ=0.0”
HUM Adaptation
“Beckham, collection-based clusters, λ=0.1”
Conclusions thus far
• Adaptation also retrieves images without an
  assigned IPTC category, by considering
  clusters formed by the images clicked by
  users with the same interests
• Alternative cluster formation approaches
  can be investigated; e.g., using visual
  features
• Method easily adapted for personalised
  and/or collaborative search
Potential for Personalization
• Which queries have the potential to
  benefit from context adaptation
  (personalisation)?
• The ones for which different users click on
  different results
  – Can be studied looking at nDCG of one user
    assuming another user’s clicks are ideal
     Jaime Teevan, Susan T. Dumais and Eric Horvitz. Potential for
     Personalization. ACM Transactions on Computer-Human Interaction (ToCHI)
     special issue on Data Mining for Understanding User Needs, 17(1), March
     2010.

• Novel in Vitalas: compare IPTC-defined
  user groups (instead of individual users)
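One possible reading of the group-level comparison is sketched below: rank images by one group's click counts and score that ranking with another group's click counts as gains. The function names, the use of click counts as graded gains, and the tie-breaking by image id are assumptions for illustration, not the procedure used in Vitalas.

```python
import math
from collections import Counter
from typing import List, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def cross_group_ndcg(clicks_a: List[str], clicks_b: List[str]) -> float:
    """nDCG of the ranking induced by group B's clicks, judged by group A's clicks."""
    gains_a = Counter(clicks_a)   # missing images count as gain 0
    counts_b = Counter(clicks_b)
    images = sorted(set(gains_a) | set(counts_b))
    # Order images by B's click counts, most clicked first; ties broken by id.
    ranking_b = sorted(images, key=lambda im: (-counts_b[im], im))
    achieved = dcg([gains_a[im] for im in ranking_b])
    ideal = dcg(sorted((gains_a[im] for im in images), reverse=True))
    return achieved / ideal if ideal > 0 else 0.0


# Hypothetical click logs for one query: image ids clicked by each group.
agree = cross_group_ndcg(["x", "x", "y"], ["x", "x", "y"])
disagree = cross_group_ndcg(["x", "x"], ["y", "y"])
print(agree, disagree)
```

When the groups click the same images the value is 1.0 (low potential); the more their clicks diverge, the lower the value and the higher the potential for adaptation.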
P4P in Belga 100K
P4P in Belga 100K
• nDCG high: low potential
  – Dean (0.8067)
  – King Albert II (0.7810)
• nDCG low: high potential
  – Greece (0.3910)
No Adaptation
  “King Albert II”
EBF Adaptation
   “King Albert II”
POL Adaptation
   “King Albert II”
No Adaptation
    “Dean”
ACE Adaptation
“Dean, user-based clusters”
ACE Adaptation
“Dean, collection-based clusters”
Dean: Temporal Effect
• Log files: “Dean” = “Hurricane Dean”
• Still, query is quite ambiguous:
  – James Dean
  – Agyness Dean (a model)
  – a (university) dean
  – Dean Delannoit
  – Howard Dean
  – Dean Martin
• Context adaptation for “Dean” requires
  archivist
Future Work
• Address various normalization issues
  – In context adaptation (due to NLLR
    approximation)
  – In “potential for personalization”/adaptation
• Explore temporal dimension
  – Combinations of collection and user context?
• Explore cross-media cluster-based
  retrieval
  – Use visual features in cluster formation
See also
“CWI” Vitalas demonstrations:
 http://www.ins.cwi.nl/projects/M4/vitalas/

Collection context instead of user context:
 http://www.ins.cwi.nl/projects/M4/vitalas/context_adaptation.html

Detectors trained by query log:
 http://olympus.ee.auth.gr/diou/civr2009/
