8. Click-count Ranking
The MSR dataset provides real-world user query log data.
From this log, we generated our own search table, ranked by click count.
“Max click count rule”
Log data: 1,000,000 records (only 1/20)
This ensures that the candidate pictures are the most popular ones.
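A minimal sketch of how a click-count ranking table could be built. The slides do not describe the log format, so the (query, image id, click count) row layout, the function name build_ranking_table and the top_k parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def build_ranking_table(log_rows, top_k=3):
    """Sum clicks per (query, image) and keep the most-clicked images per query.

    log_rows: iterable of (query, image_id, click_count) tuples (assumed layout).
    """
    clicks = defaultdict(lambda: defaultdict(int))
    for query, image_id, click_count in log_rows:
        # Normalize case and whitespace so "Hot Dog" and "hot dog" share a row.
        key = " ".join(query.lower().split())
        clicks[key][image_id] += click_count

    table = {}
    for query, per_image in clicks.items():
        # "Max click count rule": highest click count first, image id as tie-break.
        ranked = sorted(per_image.items(), key=lambda kv: (-kv[1], kv[0]))
        table[query] = [image_id for image_id, _ in ranked[:top_k]]
    return table


# Tiny made-up log, only to show the shape of the data.
sample_log = [
    ("hot dog", "img_a", 5),
    ("hot dog", "img_b", 12),
    ("Hot Dog", "img_a", 9),
    ("ice cream", "img_c", 3),
]
ranking = build_ranking_table(sample_log, top_k=2)
print(ranking["hot dog"])
```

With the sample rows, img_a collects 14 clicks and img_b 12, so img_a is ranked first for "hot dog".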
13. Preparation and Work
Off-line:
Process the user query log with NLTK
Build the ranking table (1,000,000)
Store images (base64) in the database (800,000)
On-line:
Process the query input with NLTK
Expand the query with WordNet and Wikipedia
Large-scale database query processing
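The off-line and on-line steps can be sketched roughly as follows. This is not the authors' implementation: the slides do not name the database or the schema, so SQLite, the images table, and the SYNONYMS map (a hypothetical stand-in for WordNet and Wikipedia expansion) are assumptions chosen to keep the example self-contained.

```python
import base64
import sqlite3

# Hypothetical expansion map; the real system would consult WordNet/Wikipedia.
SYNONYMS = {
    "cell phone": ["mobile phone"],
    "picture frame": ["photo frame"],
}


def build_database(images):
    """Off-line: store each image as base64 text, keyed by its query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (query TEXT, image_b64 TEXT)")
    rows = [(q, base64.b64encode(raw).decode("ascii")) for q, raw in images]
    conn.executemany("INSERT INTO images VALUES (?, ?)", rows)
    # An index on the query column keeps lookups fast as the table grows.
    conn.execute("CREATE INDEX idx_images_query ON images (query)")
    return conn


def expand_query(query):
    """On-line: normalize the input and append any known related terms."""
    normalized = " ".join(query.lower().split())
    return [normalized] + SYNONYMS.get(normalized, [])


def search(conn, query):
    """On-line: fetch and decode images for the query and its expansions."""
    terms = expand_query(query)
    placeholders = ",".join("?" * len(terms))
    cursor = conn.execute(
        f"SELECT image_b64 FROM images WHERE query IN ({placeholders})", terms
    )
    return [base64.b64decode(row[0]) for row in cursor]


# Made-up byte strings stand in for real image files.
conn = build_database([
    ("cell phone", b"image-bytes-1"),
    ("mobile phone", b"image-bytes-2"),
    ("hot dog", b"image-bytes-3"),
])
results = search(conn, "Cell  Phone")
print(len(results))
```

Because the query is expanded before the lookup, "Cell  Phone" also retrieves the image stored under "mobile phone".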
15. Compound-word Query
book store, picture frame, the lost and bewildered
tourist, ice cream, cell phone, apple pie, a story as
old as time, a cool wet afternoon, many cases of
infectious disease
swimming pool, the senlie old man, pencil box, long
and winding road, tiddy bear, hot dog, jennifer
love hewitt, some cookie shaped like stars
hello kitty coloring page, kelly osbourne drinking,
micky mouse, a wet amd stinky dog
Test: 20 queries, accuracy: 42.28%
17. Findings
Correct spelling can improve retrieval accuracy.
Query expansion can find more related images!
An ambiguous query can be difficult to handle.
A gap exists between users and the result images because words are polysemous.
The user query still has a semantic problem.
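To illustrate the spelling point, here is a small sketch of query correction against a known vocabulary. The slides do not say how spelling was handled, so the use of Python's difflib, the function name correct_query, the cutoff value and the toy vocabulary are all assumptions, not the method used in this work.

```python
import difflib


def correct_query(query, vocabulary, cutoff=0.6):
    """Replace each unknown word with its closest vocabulary entry, if any."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # get_close_matches returns the best candidates above the cutoff ratio.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


# Toy vocabulary; a real one could be built from the query log itself.
vocab = ["teddy", "bear", "mickey", "mouse", "wet", "and", "stinky", "dog"]
print(correct_query("tiddy bear", vocab))
print(correct_query("micky mouse", vocab))
```

Words with no sufficiently close match are left unchanged, so names outside the vocabulary are not rewritten.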
18. Difficulties
In a compound-word query, the relationship between each word and the next is very important.
Query semantics are still a challenge.
Large-scale data processing is a big problem.
How can search performance be sped up?