Vector Similarity Search & Indexing Methods
Xiaomeng Yi
Senior Researcher, Zilliz
© 2020 Zilliz. All rights reserved.
Vector Similarity Search
Information Retrieval: from text to versatile data types
How to measure similarity between data?
Embeddings: represent data as vectors
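
Once data items are represented as vectors, similarity can be measured with an ordinary vector metric. The sketch below shows two common choices, Euclidean distance and cosine similarity. The slide does not say which metric is used, and the three toy vectors `a`, `b`, `c` are made-up values for illustration only.

```python
import math

def euclidean_distance(u, v):
    """L2 distance: a smaller value means the vectors are more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def inner_product(u, v):
    return sum(x * y for x, y in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: a larger value means more similar."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    return inner_product(u, v) / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings (not taken from the slides).
a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
c = [2.0, 0.0, 0.0]

# a and c point in the same direction but differ in length,
# so cosine similarity and Euclidean distance rank them differently against b.
sim_ac = cosine_similarity(a, c)
dist_ac = euclidean_distance(a, c)
```

Which metric fits depends on how the embedding model was trained, so the choice is usually fixed by the model rather than by the index.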
Efficiency problem for big data
• Trade accuracy for efficiency
• Indexing method
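
To make the efficiency problem concrete, here is a minimal sketch of exact (exhaustive) search, which compares the query with every stored vector, together with a recall measure for judging how much accuracy an approximate index gives up. The function names `flat_search` and `recall_at_k` are illustrative and not taken from the slides.

```python
import heapq

def sq_dist(u, v):
    """Squared Euclidean distance (enough for ranking)."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def flat_search(query, vectors, k):
    """Exact k-nearest-neighbor search: cost grows with N * d per query."""
    return heapq.nsmallest(
        k, range(len(vectors)), key=lambda i: sq_dist(query, vectors[i])
    )

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true neighbors that an approximate result recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.0]]
exact = flat_search([0.0, 0.0], vectors, k=2)
```

An index trades some of this recall for a large reduction in the number of distance computations per query.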
Indexing Methods
Graph based Index: general idea
Approximate nearest neighbor algorithm based on navigable small world graphs
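
The general idea can be sketched as a greedy walk: start at an entry node, move to whichever neighbor is closest to the query, and stop when no neighbor improves on the current node. This is a simplified illustration, not the full algorithm from the cited paper. The graph, the vectors, and the name `greedy_search` are assumptions made up for the example.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def greedy_search(query, vectors, neighbors, entry):
    """Walk the graph greedily toward the query; may stop at a local minimum."""
    current = entry
    current_dist = sq_dist(query, vectors[current])
    while True:
        best, best_dist = current, current_dist
        for n in neighbors[current]:
            d = sq_dist(query, vectors[n])
            if d < best_dist:
                best, best_dist = n, d
        if best == current:          # no neighbor is closer: stop here
            return current
        current, current_dist = best, best_dist

# Five points on a line. The edge 0-4 acts as a long-range "shortcut" link.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
found = greedy_search([2.9], vectors, neighbors, entry=0)
```

Here the walk goes 0 → 4 → 3, using the shortcut instead of stepping through nodes 1 and 2. Because the walk only follows local improvements, the result is approximate in general.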
Graph: optimizations
Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
Approximate nearest neighbor algorithm based on navigable small world graphs
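
One optimization described in the HNSW paper cited on this slide is to organize the graph into layers, where each inserted element draws its top layer from an exponentially decaying distribution, so upper layers stay sparse. A minimal sketch of that level draw follows. The helper name `random_level` and the choice `M = 16` are illustrative assumptions.

```python
import math
import random

def random_level(m_l, rng):
    """Draw the top layer for a new element: floor(-ln(U) * m_l), U in (0, 1]."""
    u = 1.0 - rng.random()           # rng.random() is in [0, 1), so u is in (0, 1]
    return int(-math.log(u) * m_l)

M = 16                               # assumed number of links per node
m_l = 1.0 / math.log(M)              # normalization factor suggested in the paper
rng = random.Random(0)
levels = [random_level(m_l, rng) for _ in range(10000)]
share_on_layer_0_only = levels.count(0) / len(levels)
```

With this normalization, the expected share of elements that live only on layer 0 is 1 - 1/M, so a search can descend from a few sparse upper layers to the dense bottom layer.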
Example:
Space Partition based index
Approximate nearest neighbor methods and vector models
The Inverted Multi-Index
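
A space-partition index can be sketched as follows: each vector is assigned to its nearest centroid, and a query scans only the `nprobe` regions whose centroids are closest to it. The centroids here are given by hand to keep the example short (in practice they would typically come from clustering), and all names and values are assumptions for illustration.

```python
import heapq
from collections import defaultdict

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def build_inverted_lists(vectors, centroids):
    """Map each region id to the ids of the vectors that fall in that region."""
    lists = defaultdict(list)
    for vid, vec in enumerate(vectors):
        region = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[region].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe regions nearest to the query."""
    probed = heapq.nsmallest(
        nprobe, range(len(centroids)), key=lambda c: sq_dist(query, centroids[c])
    )
    candidates = [vid for c in probed for vid in lists.get(c, [])]
    return heapq.nsmallest(k, candidates, key=lambda vid: sq_dist(query, vectors[vid]))

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.0, 1.0], [1.0, 0.0], [9.0, 10.0], [10.0, 10.0], [4.9, 4.9]]
lists = build_inverted_lists(vectors, centroids)

query = [6.0, 6.0]
one_region = ivf_search(query, vectors, centroids, lists, k=1, nprobe=1)
two_regions = ivf_search(query, vectors, centroids, lists, k=1, nprobe=2)
```

The query sits near a region boundary: with `nprobe=1` the search returns vector 2 and misses the true nearest neighbor (vector 4, stored in the other region), while `nprobe=2` finds it at the cost of scanning more vectors.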
Optimization for space partition
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors
Encoding based index: general idea
Product quantization for nearest neighbor search
Encoding: product quantization
Similarity Query Processing for High-Dimensional Data
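
Product quantization can be sketched in two steps: split each vector into sub-vectors and store only the id of the nearest codeword per sub-vector, then estimate query distances from per-sub-vector lookup tables. The tiny hand-written codebooks below are assumptions for illustration. A real index would learn them from data, usually with 256 codewords per sub-vector.

```python
def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def split(vec, m):
    """Cut vec into m equal-length sub-vectors (len(vec) must be divisible by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def pq_encode(vec, codebooks):
    """Replace each sub-vector with the id of its nearest codeword."""
    subs = split(vec, len(codebooks))
    return [
        min(range(len(cb)), key=lambda j: sq_dist(sub, cb[j]))
        for sub, cb in zip(subs, codebooks)
    ]

def build_tables(query, codebooks):
    """One table per sub-space: distance from the query part to every codeword."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, word) for word in cb] for sub, cb in zip(subs, codebooks)]

def pq_distance(tables, code):
    """Approximate squared distance: one table lookup per sub-vector."""
    return sum(table[j] for table, j in zip(tables, code))

# Two sub-spaces with two codewords each (hypothetical values).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
code = pq_encode([0.9, 1.1, 1.8, 2.1], codebooks)
tables = build_tables([0.0, 0.0, 0.0, 0.0], codebooks)
approx = pq_distance(tables, code)
```

The tables are built once per query and reused for every stored code, which is what makes scanning compressed vectors cheap.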
Comparison
Fast, accurate, and small, never reached at the same time…
[Figure: triangle with corners Fast, Accurate, and Small; labels placed on it: HNSW, L&C, IVF_PQ, IVF_SQ, FLAT, ∅]
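
The "small" corner can be illustrated with a rough per-vector storage estimate. This is a back-of-the-envelope sketch under stated assumptions: float32 originals, 8-bit scalar quantization, and product quantization with 16 sub-vectors of one byte each. It ignores codebooks, inverted lists, and graph links, and the function name is made up for the example.

```python
def bytes_per_vector(dim, scheme, pq_subvectors=16):
    """Approximate storage for one vector under each encoding (assumed parameters)."""
    if scheme == "flat":
        return dim * 4               # one float32 per dimension
    if scheme == "sq8":
        return dim * 1               # one byte per dimension
    if scheme == "pq":
        return pq_subvectors * 1     # one byte (codeword id) per sub-vector
    raise ValueError(f"unknown scheme: {scheme}")

dim = 128                            # assumed embedding dimension
sizes = {s: bytes_per_vector(dim, s) for s in ("flat", "sq8", "pq")}
```

Under these assumptions a 128-dimensional vector takes 512 bytes uncompressed, 128 bytes with 8-bit scalar quantization, and 16 bytes with product quantization, with accuracy falling as the encoding gets coarser.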
Flexible indexes: A layered framework
Layers: function decomposition

Layer function        Data                 Size     Candidates for a query   Requirement
Space Partition       Regions              Small    Full                     Accurate, fast
Candidate Filtering   Compressed vectors   Medium   Small portion            Fast
Result Validation     Original vectors     Large    Very small portion       Accurate
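
The three layers can be sketched as one search pipeline: pick regions, shortlist candidates using compressed vectors, then re-rank the shortlist with the original vectors. This is a minimal illustration and not the presenter's implementation. Rounding to integers stands in for a real coarse encoding, and every name and value below is an assumption.

```python
import heapq

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def layered_search(query, centroids, lists, compressed, originals,
                   k, nprobe, n_candidates):
    # Layer 1, space partition: choose the regions nearest to the query.
    regions = heapq.nsmallest(
        nprobe, range(len(centroids)), key=lambda c: sq_dist(query, centroids[c])
    )
    in_regions = [vid for c in regions for vid in lists[c]]
    # Layer 2, candidate filtering: cheap ranking on compressed vectors.
    shortlist = heapq.nsmallest(
        n_candidates, in_regions, key=lambda vid: sq_dist(query, compressed[vid])
    )
    # Layer 3, result validation: exact ranking on the original vectors.
    return heapq.nsmallest(k, shortlist, key=lambda vid: sq_dist(query, originals[vid]))

originals = [[0.2, 0.4], [0.4, 0.2], [0.6, 0.6], [9.6, 9.7]]
compressed = [[round(x) for x in vec] for vec in originals]   # stand-in encoding
centroids = [[0.0, 0.0], [10.0, 10.0]]
lists = {0: [0, 1, 2], 1: [3]}

result = layered_search([0.55, 0.6], centroids, lists, compressed, originals,
                        k=1, nprobe=1, n_candidates=2)
```

Each layer touches fewer vectors than the one before it, so the expensive exact comparison runs only on a very small portion of the data.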
Layers: optimization opportunity

Layer function        Size     Requirement      Index type (adjustable)   Optimization opportunity
Space Partition       Small    Accurate, fast   Graph                     Cache-based optimization
Candidate Filtering   Medium   Small            Coarse encoding           Data locality, inter/intra query parallelism
Result Validation     Large    Accurate         Flat                      SSD-based storage, compute-read pipeline
Thank you!
