Vector Similarity Search & Indexing Methods
© 2020 Zilliz. All rights reserved.
Information Retrieval: from text to versatile data types
How to measure similarity between data?
Embeddings: represent data as vectors
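Once data is represented as vectors, "similarity" becomes a numeric comparison between two vectors. The sketch below shows two common choices, Euclidean distance and cosine similarity. The vectors `a`, `b`, and `c` are made-up 3-dimensional embeddings used only for illustration; real embeddings come from a model and have many more dimensions.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def euclidean_distance(u, v):
    """Smaller value means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def cosine_similarity(u, v):
    """Larger value means more similar; ignores vector length."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical embeddings, invented for this example.
a = [1.0, 0.0, 0.0]
b = [0.9, 0.1, 0.0]
c = [0.0, 1.0, 0.0]

print(euclidean_distance(a, b), euclidean_distance(a, c))
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Under both measures `b` comes out closer to `a` than `c` does.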
Efficiency problem for big data
• Trade accuracy for efficiency
• Indexing method
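To make the trade-off concrete, the sketch below contrasts an exhaustive scan, whose cost grows with the number of vectors times their dimension, with a deliberately crude approximate search, and measures how much accuracy is lost using recall. The function names, the random data, and the "scan only half the data" shortcut are assumptions chosen for illustration; a real index is far smarter about which vectors to skip.

```python
import heapq
import random

def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def exact_top_k(query, vectors, k):
    """Exhaustive search: compares the query with every stored vector."""
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: squared_l2(query, vectors[i]))

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

random.seed(0)
vectors = [[random.random() for _ in range(8)] for _ in range(1000)]
query = [random.random() for _ in range(8)]

truth = exact_top_k(query, vectors, k=10)
# Crude approximation: half the work, but true neighbors may be missed.
approx = exact_top_k(query, vectors[:500], k=10)
print("recall@10:", recall_at_k(approx, truth))
```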
Graph based Index: general idea
Approximate nearest neighbor algorithm based on navigable small world graphs
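The general idea of a graph index can be sketched as a greedy walk: start at some entry node and keep moving to whichever neighbor is closer to the query until no neighbor improves. The code below is a simplified illustration, not the construction from the cited paper. In particular, `build_knn_graph` is an assumed brute-force builder that links each node to its `m` nearest nodes, whereas navigable small world graphs are built incrementally.

```python
import random

def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def build_knn_graph(vectors, m):
    """Assumed builder: link every node to its m nearest nodes, symmetrically."""
    n = len(vectors)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: squared_l2(vectors[i], vectors[j]))[:m]
        for j in nearest:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors

def greedy_search(query, vectors, neighbors, entry):
    """Walk the graph, always stepping to the neighbor closest to the query."""
    current = entry
    current_dist = squared_l2(query, vectors[current])
    while True:
        best, best_dist = current, current_dist
        for j in neighbors[current]:
            d = squared_l2(query, vectors[j])
            if d < best_dist:
                best, best_dist = j, d
        if best == current:  # local minimum: no neighbor is closer
            return current, current_dist
        current, current_dist = best, best_dist

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(200)]
neighbors = build_knn_graph(vectors, m=5)
query = [0.5, 0.5]
found, dist = greedy_search(query, vectors, neighbors, entry=0)
print(found, dist)
```

The walk only inspects nodes along its path, which is where the speed-up comes from. It can stop in a local minimum, which is why the result is approximate.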
Graph: optimizations
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Approximate nearest neighbor algorithm based on navigable small world graphs
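One ingredient of the hierarchical optimization is that each inserted element is assigned a random top layer, drawn so that upper layers hold exponentially fewer elements. The sketch below shows only that level draw, using the formula floor(-ln(U) * mL) with mL = 1 / ln(M). The function name `random_level` and the value M = 16 are assumptions for illustration; graph construction and search are omitted.

```python
import math
import random

def random_level(m, rng):
    """Draw the top layer for a new element; most draws return 0."""
    m_l = 1.0 / math.log(m)   # level normalization factor
    u = 1.0 - rng.random()    # uniform in (0, 1], avoids log(0)
    return int(-math.log(u) * m_l)

rng = random.Random(0)
levels = [random_level(16, rng) for _ in range(10000)]
share_on_layer0_only = levels.count(0) / len(levels)
print(share_on_layer0_only, max(levels))
```

With M = 16, roughly 15 out of 16 elements stay on the bottom layer only, so the upper layers act as a sparse set of long-range shortcuts.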
Space Partition based index
Example:
Approximate nearest neighbor methods and vector models
The Inverted Multi-Index
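A space partition index can be sketched as an inverted file: every vector is assigned to its nearest centroid, and a query scans only the lists of the few centroids closest to it. The names `build_inverted_lists`, `ivf_search`, and `nprobe` are assumptions for this illustration, and the centroids are simply sampled from the data rather than trained with k-means.

```python
import random

def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def nearest_centroids(vec, centroids, n):
    """Ids of the n centroids closest to vec."""
    return sorted(range(len(centroids)),
                  key=lambda c: squared_l2(vec, centroids[c]))[:n]

def build_inverted_lists(vectors, centroids):
    """Put each vector id into the list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        lists[nearest_centroids(v, centroids, 1)[0]].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe regions nearest to the query."""
    candidates = [i for c in nearest_centroids(query, centroids, nprobe)
                  for i in lists[c]]
    return sorted(candidates, key=lambda i: squared_l2(query, vectors[i]))[:k]

random.seed(0)
vectors = [[random.random(), random.random()] for _ in range(500)]
centroids = random.sample(vectors, 16)  # assumption: sampled, not k-means
lists = build_inverted_lists(vectors, centroids)
query = [0.5, 0.5]
print(ivf_search(query, vectors, centroids, lists, k=5, nprobe=2))
```

Raising `nprobe` scans more regions, which costs time but recovers neighbors that sit just across a region boundary.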
Optimization for space partition
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
Encoding based index: general idea
Product quantization for nearest neighbor search
Encoding: product quantization
Similarity Query Processing for High-Dimensional Data
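Product quantization can be sketched in three steps: split each vector into sub-vectors, replace each sub-vector by the id of its nearest codeword in a per-subspace codebook, and at query time add up per-subspace distances looked up from small tables. The code below is a minimal illustration under stated assumptions: the function names are invented, and the codebooks are sampled from the data instead of being trained with k-means.

```python
import random

def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(vectors, m, ksub, rng):
    """One codebook per subspace. Assumption: codewords are sampled."""
    columns = [[split(v, m)[s] for v in vectors] for s in range(m)]
    return [rng.sample(col, ksub) for col in columns]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    return [min(range(len(book)), key=lambda c: squared_l2(sub, book[c]))
            for sub, book in zip(split(vec, len(codebooks)), codebooks)]

def adc_distance(query, code, codebooks):
    """Asymmetric distance: exact query against a quantized database vector."""
    subs = split(query, len(codebooks))
    tables = [[squared_l2(sub, word) for word in book]
              for sub, book in zip(subs, codebooks)]
    return sum(tables[s][c] for s, c in enumerate(code))

rng = random.Random(0)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train_codebooks(vectors, m=4, ksub=16, rng=rng)
codes = [encode(v, codebooks) for v in vectors]
query = [rng.random() for _ in range(8)]
print(codes[0], adc_distance(query, codes[0], codebooks))
```

Each 8-dimensional vector is stored as 4 small integers. In practice the lookup tables would be computed once per query and reused across all codes; they are rebuilt on every call here only to keep the sketch short.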
Comparison
Fast, accurate, and small, never reached at the same time…
[Figure: trade-off triangle with corners Fast, Accurate, and Small; index types shown: HNSW, L&C, IVF_PQ, IVF_SQ, FLAT, ∅]
Flexible indexes: A layered framework
Layers: function decomposition

Layer Function        Data                 Size     Candidates for a query   Requirement
Space Partition       Regions              Small    Full                     Accurate, Fast
Candidate Filtering   Compressed vectors   Medium   Small portion            Fast
Result Validation     Original vectors     Large    Very small portion       Accurate
Layers: optimization opportunity

Layer Function        Size     Requirement      Index Type (Adjustable)   Optimization Opportunity
Space Partition       Small    Accurate, fast   Graph                     Cache-based optimization
Candidate Filtering   Medium   Small            Coarse encoding           Data locality, inter/intra query parallelism
Result Validation     Large    Accurate         Flat                      SSD-based storage, compute-read pipeline
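The three layers can be sketched as one query pipeline: space partition selects regions, candidate filtering ranks their members using compressed vectors, and result validation re-ranks a short list against the original vectors. This is a minimal sketch under stated assumptions, not the framework's implementation: the partition layer here is a flat scan over sampled centroids rather than a graph, `quantize` is an invented coarse encoding, and the parameter names `nprobe` and `keep` are hypothetical.

```python
import random

def squared_l2(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def quantize(vec, levels=4):
    """Assumed coarse encoding: snap each coordinate in [0, 1] to a few levels."""
    return [round(x * (levels - 1)) / (levels - 1) for x in vec]

def layered_search(query, originals, compressed, centroids, lists,
                   k, nprobe, keep):
    # Layer 1, space partition: pick the regions closest to the query.
    regions = sorted(range(len(centroids)),
                     key=lambda c: squared_l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in regions for i in lists[c]]
    # Layer 2, candidate filtering: rank with the cheap compressed vectors.
    shortlist = sorted(candidates,
                       key=lambda i: squared_l2(query, compressed[i]))[:keep]
    # Layer 3, result validation: re-rank with the original vectors.
    return sorted(shortlist,
                  key=lambda i: squared_l2(query, originals[i]))[:k]

random.seed(0)
originals = [[random.random() for _ in range(4)] for _ in range(1000)]
compressed = [quantize(v) for v in originals]
centroids = random.sample(originals, 20)  # assumption: sampled centroids
lists = {c: [] for c in range(len(centroids))}
for i, v in enumerate(originals):
    nearest = min(range(len(centroids)),
                  key=lambda c: squared_l2(v, centroids[c]))
    lists[nearest].append(i)

query = [0.5, 0.5, 0.5, 0.5]
result = layered_search(query, originals, compressed, centroids, lists,
                        k=5, nprobe=4, keep=50)
print(result)
```

Each layer touches less data than the one before it: all centroids, then compressed vectors from a few regions, then only `keep` original vectors. That shrinking working set is what lets the largest data sit on slower storage.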