1. Machine Learning and Information Retrieval Zoubin Ghahramani Department of Engineering University of Cambridge
2.
3.
4.
5.
6.
7. Universe of items being searched… Imagine a universe of items: The items could be: images, music, documents, websites, publications, proteins, news stories, customer profiles, products, medical records, … or any other type of item one might want to query. … .
18. Image Retrieval Results for Query: “sunset” These are the top 9 images returned. Our system finds images of sunsets using only the color and texture features of these unlabelled images.
19. Results for Query: “sign” These are the top 9 images returned. It finds images of signs using only the color and texture features of these unlabelled images.
26. Example Labelled Images for “sunset” These are 9 random images that were labelled “sunset” in the labelled training data. Notice that these images are quite variable, and the labelling subjective and somewhat noisy. Our retrieval system does very well and is quite robust to ambiguous categories and poor labelling.
Hinweis der Redaktion
Hi, I’ll be presenting our work on Bayesian information retrieval. This is joint work with Katherine Heller. Let me first describe what I mean by information retrieval…
We could call this problem ‘clustering on demand’. Assume a universe of objects (D) which could be images, web pages, documents, movies, proteins, or any other object we wish to form queries on. For example here we have a potentially very large universe of objects, which includes a yellow train, a blue car, a London bus, a tomato, and a computational neuroscientist. (pause) We want an algorithm which does the following:
Given a query, a very small set (Dc) of the objects in our universe, for example here we have a red car and a blue car. We want our algorithm to return all the objects which belong to the concept, exemplified by the query. So for the query red car, blue car, we want it to return all the objects which are cars. While for the query red car, tomato, we want it to return all the red objects.
Our approach to this problem is to rank each object in our universe D by how well it would fit into a set which includes the query, D_C, or how relevant it is to the query. So for the query red car blue car we can rank all of the objects from best fit to worst fit. We use a Bayesian model-based probabilistic relevance criterion to do this and we limit the output to the top few items.
We’ve created an image ret system which searches large collections of unlab images.