Building k-nn Graphs From Large Text Data
1. IEEE BigData 2014
Building k-nn Graphs From Large Text Data
Thibault Debatty, Pietro Michiardi,
Olivier Thonnard & Wim Mees
2. The context: TRIAGE
3. The problem
The subject of a spam message is more than a
set of keywords:
Rep|icaWatches For Sale: cRolex
Rep1icaWatches For Sale: R0lex
RepilcaWatches For Sale: Rolex
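The three subjects above illustrate the point. Compared as sets of whitespace-separated tokens, each pair shares only "For" and "Sale:". Compared character by character, each pair is only a few edits apart. The sketch below makes this concrete, assuming Levenshtein edit distance and token-set Jaccard similarity as the two illustrative measures. The slides do not say which string metric TRIAGE uses.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the whitespace-separated token sets (bag-of-words view)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


subjects = [
    "Rep|icaWatches For Sale: cRolex",
    "Rep1icaWatches For Sale: R0lex",
    "RepilcaWatches For Sale: Rolex",
]

# Compare every pair under both views.
for i in range(len(subjects)):
    for j in range(i + 1, len(subjects)):
        a, b = subjects[i], subjects[j]
        print(f"{i}-{j}: edit distance={levenshtein(a, b)}, "
              f"token Jaccard={token_jaccard(a, b):.2f}")
```

Each pair comes out a handful of edits apart in strings of about 30 characters. The token Jaccard is only one third, because the obfuscated tokens never match exactly.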
4. The problem
How can we build a k-nn graph from
large text data using an
arbitrary similarity metric?
– Naive
– Index
– Locality-sensitive hashing (LSH)
– nn-descent
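The naive approach can be sketched as follows. It compares every string with every other string and keeps the k most similar ones. This takes O(n²) similarity evaluations, which is why it does not scale to hundreds of thousands of subjects. The similarity function is a parameter, so any metric can be plugged in. `difflib.SequenceMatcher.ratio` is used here only as an illustrative choice, and `knn_graph_naive` is a hypothetical helper name.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Sequence, Tuple


def knn_graph_naive(items: Sequence[str], k: int,
                    similarity: Callable[[str, str], float]
                    ) -> Dict[int, List[Tuple[float, int]]]:
    """Exact k-nn graph by brute force.

    Returns {node index: [(similarity, neighbour index), ...]}, with each
    list sorted by decreasing similarity and holding at most k entries.
    """
    graph = {}
    for i, a in enumerate(items):
        # Score every other item against item i (O(n) per node, O(n^2) total).
        scored = ((similarity(a, b), j) for j, b in enumerate(items) if j != i)
        graph[i] = heapq.nlargest(k, scored)
    return graph


def ratio(a: str, b: str) -> float:
    """Example similarity in [0, 1]; any other metric can be substituted."""
    return SequenceMatcher(None, a, b).ratio()


subjects = [
    "Rep|icaWatches For Sale: cRolex",
    "Rep1icaWatches For Sale: R0lex",
    "RepilcaWatches For Sale: Rolex",
    "Meeting moved to 3pm",
    "Meeting moved to 4pm",
]
graph = knn_graph_naive(subjects, k=1, similarity=ratio)
for node, neighbours in graph.items():
    print(subjects[node], "->", [subjects[j] for _, j in neighbours])
```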
5. NNCTPH
[Figure: NNCTPH pipeline. Map: each spam subject (SPAM 1, SPAM 2, ...) goes through CTPH* to produce a signature (Sig 1, Sig 2, ...). Reduce: nn-descent.]
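One reading of the diagram is that the map step hashes each subject into a CTPH* signature and uses it as a bucket key. The reduce step then builds a k-nn graph inside each bucket with nn-descent. The sketch below follows that reading under several assumptions.

- `ctph_signature` is a hypothetical, simplified context-triggered piecewise hash and not the authors' CTPH*. A rolling sum over the last few characters decides where chunks end, and each chunk is reduced to one character from a small alphabet.
- Brute force replaces nn-descent inside each bucket to keep the example short.
- `map_phase` and `reduce_phase` are illustrative names, not a MapReduce framework API.

```python
import heapq
from collections import defaultdict
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def ctph_signature(text, block_size=8, window=3, alphabet_size=4):
    """Hypothetical simplified CTPH: content-defined chunks, one char per chunk."""
    sig, chunk_hash = [], 0
    for i, ch in enumerate(text):
        chunk_hash = (chunk_hash * 31 + ord(ch)) & 0xFFFFFFFF  # hash of current chunk
        rolling = sum(ord(c) for c in text[max(0, i - window + 1): i + 1])  # local context
        if rolling % block_size == block_size - 1:  # context trigger: chunk ends here
            sig.append(ALPHABET[chunk_hash % alphabet_size])
            chunk_hash = 0
    sig.append(ALPHABET[chunk_hash % alphabet_size])  # last (possibly partial) chunk
    return "".join(sig)


def map_phase(subjects):
    """Map: emit (signature, (id, subject)) for every input string."""
    for idx, s in enumerate(subjects):
        yield ctph_signature(s), (idx, s)


def reduce_phase(pairs, k, similarity):
    """Group by signature, then build a k-nn graph inside each bucket (brute force here)."""
    buckets = defaultdict(list)
    for key, value in pairs:  # stands in for the MapReduce shuffle
        buckets[key].append(value)
    graph = {}
    for members in buckets.values():
        for i, a in members:
            scored = ((similarity(a, b), j) for j, b in members if j != i)
            graph[i] = heapq.nlargest(k, scored)
    return graph


subjects = [
    "Rep|icaWatches For Sale: cRolex",
    "Rep1icaWatches For Sale: R0lex",
    "RepilcaWatches For Sale: Rolex",
    "Meeting moved to 3pm",
    "Meeting moved to 4pm",
]
graph = reduce_phase(map_phase(subjects), k=2,
                     similarity=lambda a, b: SequenceMatcher(None, a, b).ratio())
print(graph)
```

In this sketch, two subjects are compared only if they share a signature. Neighbours that land in different buckets are never found, so bucketing trades recall for speed.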
6. Experimental results
● Dataset: 200k to 800k spam subjects
● Tests:
– Stages
– Buckets
– Comparison with MapReduce (MR) nn-descent
– Scalability
● Measures:
– Speed
– Recall
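The slides do not define recall precisely. A common definition for approximate k-nn graphs is the fraction of edges in the exact k-nn graph that also appear in the approximate graph. The sketch below assumes that definition. It also assumes both graphs are given as `{node: [neighbour ids]}` dictionaries, a layout chosen here for illustration.

```python
from typing import Dict, Iterable


def knn_recall(approx: Dict[int, Iterable[int]],
               exact: Dict[int, Iterable[int]]) -> float:
    """Fraction of exact k-nn edges that the approximate graph also contains."""
    found = total = 0
    for node, true_neighbours in exact.items():
        true_set = set(true_neighbours)
        found += len(true_set & set(approx.get(node, ())))  # edges recovered for this node
        total += len(true_set)
    return found / total if total else 1.0


exact = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
approx = {0: [1, 3], 1: [0, 2], 2: []}
print(knn_recall(approx, exact))
```

Here node 0 recovers one of its two true neighbours, node 1 both, and node 2 none. That gives 3 of 6 edges, so the recall is 0.5.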
11. Conclusions & future work
● 10x faster than MR nn-descent
● Speedup increases with dataset size
● Limited recall
● Future:
– Improve recall?
– Quality of graph?
– Influence of graph quality?
– Compare with a bag-of-words model