This talk is an introduction to the vector search engine Weaviate. You will learn how storing data as vectors enables semantic search and automatic data classification. The talk touches on topics such as the underlying vector storage mechanism and how the pre-trained language vectorization model makes this possible. In addition, the presentation includes live demos that show the power of Weaviate and how you can get started with your own datasets. No prior technical knowledge is required; all concepts are illustrated with real use case examples and live demos.
Most data is unstructured. Additionally, data is often stored without context, meaning, or relation to concepts in the real world, which makes it difficult to index, classify, and search. While this is traditionally solved by manual effort or expensive machine learning models, Weaviate takes another approach. Weaviate is a vector search engine that stores data as vectors and automatically adds context and meaning to new data. This makes it possible to search the data without exact keyword matches and to classify data automatically. Weaviate is completely open source, has a built-in machine learning model and a graph-like data model, is completely API-based, and is cloud-native. Weaviate offers a GraphQL API alongside RESTful endpoints for interacting with the data in an intuitive manner. Additionally, Python, Go and JavaScript clients are available to facilitate interaction between Weaviate and your applications. GraphQL and client examples will be shown in the presentation.
4. Weaviate - why a vector search engine?
"Wine for seafood"
No products found ...
Traditional search engine
"Wine for seafood"
Vector search engine
Covey Run 2005 Chardonnay
{ "data": [{
"Wine": "Covey Run 2005 Chardonnay",
"Description": "... good with fish ..."
}]}
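The contrast on this slide can be sketched in a few lines of Python. This is an illustration only, not Weaviate's implementation: the 3-dimensional vectors are hand-picked toy values, the "Hypothetical Cabernet" product is invented for the example, and the query vector is an assumed embedding of "wine for seafood".

```python
import math

# Toy vectors, hand-picked for illustration only.
# Axes (assumed): wine-ness, seafood-ness, red-meat-ness.
PRODUCTS = {
    "Covey Run 2005 Chardonnay": {
        "description": "good with fish",
        "vector": [0.9, 0.8, 0.1],
    },
    "Hypothetical Cabernet": {  # invented product for contrast
        "description": "pairs with steak",
        "vector": [0.9, 0.1, 0.9],
    },
}

QUERY_TEXT = "wine for seafood"
QUERY_VECTOR = [0.8, 0.9, 0.0]  # assumed embedding of the query text


def keyword_search(query, products):
    """Return products whose description contains every query word."""
    words = query.lower().split()
    return [
        name
        for name, product in products.items()
        if all(w in product["description"].lower().split() for w in words)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query_vector, products):
    """Return the product whose vector is most similar to the query."""
    return max(
        products,
        key=lambda name: cosine(query_vector, products[name]["vector"]),
    )


print(keyword_search(QUERY_TEXT, PRODUCTS))  # no description contains all query words
print(vector_search(QUERY_VECTOR, PRODUCTS))
```

The keyword search finds nothing because "good with fish" shares no words with the query, while the vector search still ranks the Chardonnay first because its vector points in nearly the same direction as the query vector.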
5.
How to do this on your own data?
And how is it so extremely fast?
The question is very abstract. How did Google find exactly this data node?
How can we predict this relation?
6. What's the problem?
● 80% to 90% of data is unstructured, that’s about 34 trillion gigabytes!
● We keep storing more and more data but businesses can’t get valuable insights from it.
● 95% of businesses have issues getting value from their unstructured data.
7.
How to achieve search and automatically
make relations within your own data?
In an easy, fast, secure and scalable way?
8.
Weaviate is a cloud-native, modular,
real-time vector search engine
built to scale your machine learning models
17. Weaviate - How does it work?
● Data is stored as high dimensional vectors
● Vector positions capture data meaning and
context
● Pre-trained NLP module for automatic
○ Vectorization
○ Classification
○ Nearest neighbor search
⇒ Weaviate understands the data
18. Weaviate - How does it work?
1. Weaviate Machine Learning Modules
● E.g. NLP module trained with fastText
● This language model represents all words
and concepts in the hyperspace.
19. Weaviate - How does it work?
2. Automatically vectorize and index your
data
● Weaviate understands your data
● When you import data: Weaviate looks at
the language in your data object
● E.g. a Chardonnay is closely related to Wine,
White and the food it fits with
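As a rough sketch of what "vectorizing on import" can mean, the snippet below turns the text of a data object into one vector by averaging the vectors of its known words. This is a deliberate simplification under stated assumptions: the 2-dimensional word vectors are hand-picked toy values, and the plain unweighted mean is a stand-in for whatever weighting the actual NLP module applies.

```python
import math

# Toy 2-dimensional word vectors, hand-picked for illustration only.
# A real language model has far more words and dimensions.
WORD_VECTORS = {
    "chardonnay": [0.9, 0.8],
    "wine": [1.0, 0.6],
    "white": [0.7, 0.9],
    "fish": [0.6, 1.0],
    "steak": [-0.8, 0.3],
}


def vectorize(text):
    """Average the vectors of all known words in the text (unweighted mean)."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        raise ValueError("no known words in text")
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


object_vector = vectorize("Chardonnay, white wine")

# The object lands nearer to "fish" than to "steak" in this toy space.
print(math.dist(object_vector, WORD_VECTORS["fish"]))
print(math.dist(object_vector, WORD_VECTORS["steak"]))
```

Because the object vector is built from its words, the Chardonnay object ends up near related concepts such as "wine", "white" and "fish" without anyone tagging it by hand.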
20. Weaviate - How does it work?
3. Search query
● Your search queries in natural language will
also be vectorized and understood by the
machine learning module of Weaviate
● The query is placed close to the words and data
objects that are semantically related to it
● E.g. "Wine that fits with a seafood dish"
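A natural-language query like this one is typically sent through the GraphQL API with a `nearText` argument. The sketch below only builds the request body and does not contact a server. The class name `Wine` and the properties `name` and `description` are assumptions made for this example, and `nearText` requires a text vectorization module to be enabled on the instance.

```python
import json


def build_near_text_query(class_name, properties, concepts, limit=3):
    """Build a GraphQL Get query that searches by natural-language concepts."""
    # json.dumps yields a double-quoted list, which is also a valid
    # GraphQL list literal for plain strings like the ones used here.
    concepts_literal = json.dumps(concepts)
    fields = " ".join(properties)
    return "{ Get { %s(nearText: {concepts: %s}, limit: %d) { %s } } }" % (
        class_name,
        concepts_literal,
        limit,
        fields,
    )


# Hypothetical class and property names, for illustration only.
query = build_near_text_query(
    "Wine",
    ["name", "description"],
    ["Wine that fits with a seafood dish"],
)

# Request body that would be POSTed to the GraphQL endpoint (/v1/graphql).
payload = json.dumps({"query": query})
print(payload)
```

The query text itself is never matched word for word; the engine vectorizes the concepts and returns the objects whose vectors lie closest.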
24. Weaviate - Core features & possibilities
Search
how to search the vector space
Unique combination of
scalar ('non-semantic') search and
ANN or vector ('semantic') search
Classification
how to classify in the vector space
Automatically make relations
between data and derive insights
Weaviate Modules
out-of-the-box modules you can use
e.g. Text or Image vectorization,
Question Answering module,
Custom modules or No modules, etc
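The combination of scalar and vector search named on this slide can be sketched as a filter followed by a similarity ranking. This is a toy illustration, not Weaviate's implementation: the vectors are hand-picked, the two "Hypothetical" wines and the `year` values are invented, and an exhaustive scan stands in for an approximate nearest neighbor (ANN) index.

```python
import math

# Invented catalogue with toy 2-dimensional vectors, for illustration only.
WINES = [
    {"name": "Covey Run 2005 Chardonnay", "year": 2005, "vector": [0.9, 0.8]},
    {"name": "Hypothetical Riesling", "year": 2015, "vector": [0.85, 0.9]},
    {"name": "Hypothetical Cabernet", "year": 2004, "vector": [0.9, -0.5]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def filtered_vector_search(query_vector, items, max_year, limit=1):
    """Apply a scalar filter first, then rank the survivors by similarity."""
    candidates = [w for w in items if w["year"] <= max_year]  # scalar part
    ranked = sorted(
        candidates,
        key=lambda w: cosine(query_vector, w["vector"]),  # vector part
        reverse=True,
    )
    return [w["name"] for w in ranked[:limit]]


query_vector = [0.8, 0.9]  # assumed embedding of a seafood-wine query

print(filtered_vector_search(query_vector, WINES, max_year=2020))
print(filtered_vector_search(query_vector, WINES, max_year=2010))
```

With a loose filter the semantically closest wine wins; tightening the scalar filter to older vintages changes the answer even though the query vector is unchanged.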
25. Weaviate Vectorization Modules
● Text vectorization modules
○ Weaviate's NLP module: Contextionary
○ Transformer modules: general purpose from Hugging Face
● Image vectorization modules
○ To store and search through images
● Question Answering (Q&A) modules
○ To find answers in textual data
● Custom modules
● No modules
○ Vector storage and search only
27. Weaviate - Scalable Vector Search Engine
Open Source
all code and models are open
End-to-end
Complete solution for industry,
API driven
Cloud-native
Secure
Scalable and fast
Vectorizing and querying big data
Weaviate Modules
out-of-the-box modules you can use
e.g. Text or Image vectorization,
Question Answering module,
Custom modules or No modules, etc
Search
how to search the vector space
Unique combination of
scalar ('non-semantic') search and
ANN or vector ('semantic') search
Classification
how to classify in the vector space
Automatically make relations
between data and derive insights
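One common way to classify in a vector space is a k-nearest-neighbor vote: a new object receives the label that is most frequent among its closest already-labeled neighbors. The sketch below illustrates that idea only and is not a description of Weaviate's classifier; the vectors and labels are invented toy data.

```python
import math
from collections import Counter

# Invented labeled examples with toy 2-dimensional vectors.
LABELED = [
    ([0.9, 0.8], "white wine"),
    ([0.8, 0.9], "white wine"),
    ([0.9, -0.6], "red wine"),
    ([0.8, -0.7], "red wine"),
]


def classify(vector, labeled, k=3):
    """Label a vector by majority vote among its k nearest labeled neighbors."""
    nearest = sorted(labeled, key=lambda item: math.dist(vector, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify([0.85, 0.7], LABELED))
print(classify([0.85, -0.5], LABELED))
```

The new relation between an object and its label is derived purely from position in the vector space, which is what makes the classification automatic.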