RheoData_23ai_Vector-Datatype-Webinar-2024.pptx
1. Copyright, 2024 RheoData and affiliates
bobby.curtis@rheodata.com | www.rheodata.com | @rheodatallc
Oracle 23ai: AI Vector Search: The What, The How, & The Possibilities
Will start about 3 minutes after the hour.
2. Copyright, 2024 RheoData and affiliates
Webinar Housekeeping
Use the Q&A dialog to ask questions. We will answer them as pauses occur or at the end of the webinar.
3. Copyright, 2024 RheoData and affiliates
Speaker
Bobby Curtis, MBA
Atlanta, GA
• Author
• Speaker
@dbasolved | @rheodatallc
https://dbasolved.com | https://www.rheodata.com/blog
bobby.curtis@rheodata.com
https://www.slideshare.net/BobbyCurtisMBA
4. Copyright, 2024 RheoData and affiliates
Cloud Managed Services Provider (CMSP)
24/7 On-Call Support
Data and Cloud Migrations
Near-Zero Downtime
Data Engineering
Analytics, Machine Learning, Artificial Intelligence (Oracle Digital Assistants)
License Management
On-Premises & Cloud Cost Management
Data Modernization
Data Visualization, Cyber Security, Data Warehouse, etc.
Service Offerings: Co-Managed, Fully-Managed, Professional Services
www.rheodata.com | www.rheodata.ai
6. Copyright, 2024 RheoData and affiliates
The What?
7. Copyright, 2024 RheoData and affiliates
Vector databases are a new class of specialized databases designed for AI workloads. They allow you to query based on semantics rather than keywords.
8. Copyright, 2024 RheoData and affiliates
Vector Search enhances information
retrieval by mapping queries to relevant
data in your database based on
semantics, instead of precise matches,
using vectors to measure similarity.
9. Copyright, 2024 RheoData and affiliates
Vector: [45, 68, 16, 50, 42]
A vector is a sequence of numbers, called dimensions, that captures the important “features” of the data.
Vectors in AI represent the semantics of unstructured data such as images, documents, and videos.
10. Copyright, 2024 RheoData and affiliates
Vector: [45, 68, 16, 50, 42]
The mathematical distance between two vectors indicates how similar they are.
Vectors represent the semantic content of data, not the underlying words or pixels.
13. Copyright, 2024 RheoData and affiliates
Diagram (vector search flow): Input Data Object -> Embedding Model -> Vector Embedding [4, 8, 42, 72] -> Vector Search -> Vector ID Matches -> Data Object Retrieval -> Relevant Content
14. Copyright, 2024 RheoData and affiliates
The How? -> The Math
15. Copyright, 2024 RheoData and affiliates
The Math?
Euclidean/Squared Euclidean Distance: Reflects the distance between each of the vector coordinates being compared -> straight-line distance between vectors.
Cosine Similarity: Measures the cosine of the angle between two vectors.
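The two measures on this slide can be sketched in plain Python. This is only an illustration of the math with made-up two-dimensional vectors; the function names are hypothetical and are not Oracle functions.

```python
import math

def euclidean_squared(a, b):
    """Sum of squared coordinate differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(euclidean_squared(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [3.0, 4.0]
origin = [0.0, 0.0]
c = [6.0, 8.0]  # same direction as a, twice as long

print(euclidean(a, origin))          # 5.0
print(euclidean_squared(a, origin))  # 25.0
print(cosine_similarity(a, c))       # 1.0
```

Squared Euclidean skips the square root, so it orders results the same way as Euclidean distance while being cheaper to compute. Cosine ignores length entirely, which is why `a` and `c` score 1.0 even though they are 5 units apart.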
16. Copyright, 2024 RheoData and affiliates
The Math?
Dot Product Similarity: The similarity of two vectors is found by multiplying the magnitudes of the two vectors by the cosine of the angle between them.
Manhattan Distance: Calculated by summing the absolute differences between the coordinates of two vectors.
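A small Python sketch of both measures, using made-up three-dimensional vectors. The function names are hypothetical. The last lines check that the coordinate-wise dot product agrees with the magnitude-times-cosine description.

```python
import math

def dot_product(a, b):
    """Sum of coordinate-wise products."""
    return sum(x * y for x, y in zip(a, b))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = [4.0, 0.0, -1.0]

print(dot_product(a, b))  # 1.0
print(manhattan(a, b))    # 9.0

# The same dot product expressed as |a| * |b| * cos(angle).
norm_a = math.sqrt(dot_product(a, a))
norm_b = math.sqrt(dot_product(b, b))
cos_theta = dot_product(a, b) / (norm_a * norm_b)
print(math.isclose(norm_a * norm_b * cos_theta, dot_product(a, b)))  # True
```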
17. Copyright, 2024 RheoData and affiliates
The Math?
Hamming Distance: The distance between two vectors is the number of coordinates where they differ.
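As a minimal illustration, Hamming distance can be computed by counting mismatched positions. The function name and the sample vectors are made up for this sketch.

```python
def hamming(a, b):
    """Count the coordinates at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(1 for x, y in zip(a, b) if x != y)

print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))  # 2
```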
18. Copyright, 2024 RheoData and affiliates
The How? -> Oracle
19. Copyright, 2024 RheoData and affiliates
Vector Datatype
Vector data types can be declared in the following forms:
VECTOR: Arbitrary number of dimensions and format.
VECTOR(*, *): Same thing as VECTOR.
VECTOR(number_of_dimensions, *): Vectors must all have the specified number of dimensions, or an error is thrown. No format modification.
VECTOR(*, dimension_element_format): Arbitrary number of dimensions, but each dimension will be up-converted or down-converted to the specified format (INT8, FLOAT32, FLOAT64).
VECTOR(number_of_dimensions, dimension_element_format): Vectors must all have the specified number of dimensions, or an error is thrown. Each dimension will be up-converted or down-converted to the specified format (INT8, FLOAT32, FLOAT64).
A vector can be NULL, but its dimensions cannot. For example, [1.1, NULL, 2.2] is not a valid vector.
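The declaration rules can be modelled in a few lines of Python. This is a sketch of the rules as stated on the slide, not Oracle's implementation: the `conform` function is hypothetical, `None` stands in for `*`, and the rounding and range check used for INT8 are assumptions of this sketch.

```python
import struct

def conform(values, dims=None, fmt=None):
    """Check a candidate vector against VECTOR(dims, fmt); None plays the role of '*'."""
    values = list(values)
    if any(v is None for v in values):
        raise ValueError("a vector dimension cannot be NULL")
    if dims is not None and len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    if fmt is None:
        return values  # no format modification
    if fmt == "INT8":
        converted = [round(v) for v in values]  # assumption: round to nearest
        if any(not -128 <= v <= 127 for v in converted):
            raise ValueError("value does not fit in INT8")
        return converted
    if fmt == "FLOAT32":
        # Round-trip through a 4-byte float to model the loss of precision.
        return [struct.unpack("f", struct.pack("f", v))[0] for v in values]
    if fmt == "FLOAT64":
        return [float(v) for v in values]
    raise ValueError(f"unknown format {fmt!r}")

print(conform([1.5, 2.25, 3.0], dims=3, fmt="FLOAT32"))  # [1.5, 2.25, 3.0]
print(conform([1.2, 2.7], fmt="INT8"))                   # [1, 3]
```

Calling `conform([1.1, None, 2.2])` or `conform([1, 2, 3], dims=2)` raises a `ValueError`, mirroring the "error is thrown" cases above.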
21. Copyright, 2024 RheoData and affiliates
Vector Base Operations
There are a few base functions that are used with vectors.
TO_VECTOR || VECTOR: Constructors for vectors.
FROM_VECTOR || VECTOR_SERIALIZE: Take a vector and return it as a string in VARCHAR2 or CLOB.
VECTOR_NORM: Returns the Euclidean norm of the vector as a BINARY_DOUBLE: SQRT(SUM(xi^2)).
VECTOR_DIMENSION_COUNT: Returns the number of dimensions of a vector as an Oracle NUMBER.
VECTOR_DIMENSION_FORMAT: Returns the storage format of the vector as a VARCHAR2 (INT8, FLOAT32, FLOAT64).
VECTOR_DISTANCE: Returns the distance between two vectors. The metric is an optional third argument (for example EUCLIDEAN_SQUARED, COSINE, DOT); the released Oracle Database 23ai documentation lists COSINE as the default.
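To make the base functions concrete, here are rough Python analogues of four of them. These are stand-ins for illustration only: the lowercase function names are hypothetical, and the text format produced here (JSON) is an assumption that may differ from the string Oracle actually returns.

```python
import json
import math

def to_vector(text):
    """Build a vector (list of floats) from its bracketed text form."""
    return [float(x) for x in json.loads(text)]

def from_vector(vec):
    """Serialize a vector back to bracketed text."""
    return json.dumps([float(x) for x in vec])

def vector_norm(vec):
    """Euclidean norm: SQRT(SUM(xi^2))."""
    return math.sqrt(sum(x * x for x in vec))

def vector_dimension_count(vec):
    """Number of dimensions in the vector."""
    return len(vec)

v = to_vector("[3, 4]")
print(from_vector(v))             # [3.0, 4.0]
print(vector_norm(v))             # 5.0
print(vector_dimension_count(v))  # 2
```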
23. Copyright, 2024 RheoData and affiliates
How do I create a “vector”?
24. Copyright, 2024 RheoData and affiliates
Declare vector datatypes during table creation.
Example 1:
CREATE TABLE text_vec
(
id NUMBER,
embeddings VECTOR
);
Example 2:
CREATE TABLE text_vec
(
id NUMBER,
embeddings VECTOR(384, INT8)
);
25. Copyright, 2024 RheoData and affiliates
Vector datatypes can be added to tables later as well.
Example 1:
CREATE TABLE text_vec
(
id NUMBER,
col1 varchar2(50)
);
ALTER TABLE text_vec add
(
vec vector
);
27. Copyright, 2024 RheoData and affiliates
Let’s go with Python!
28. Copyright, 2024 RheoData and affiliates
Python Approach – 1 of 4
import oracledb
import cohere
import os
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
29. Copyright, 2024 RheoData and affiliates
Python Approach – 2 of 4
with open("./Scripts/oracle/vector/cities.txt") as f:
    cities = f.read()
#print(cities)
# Split on newlines into chunks of at most 35 characters with a 5-character overlap
text_splitter = CharacterTextSplitter(separator="\n",
    chunk_size=35, chunk_overlap=5, length_function=len,
    is_separator_regex=False)
texts = text_splitter.split_text(cities)
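To show roughly what a character splitter does with a separator and a chunk size, here is a simplified pure-Python sketch. It is not LangChain's implementation: it ignores chunk_overlap, the `split_text` function is hypothetical, and the city names are invented because the contents of cities.txt are not shown.

```python
def split_text(text, separator="\n", chunk_size=35):
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # a single long piece may still exceed chunk_size
    if current:
        chunks.append(current)
    return chunks

sample = "Atlanta, GA\nBoston, MA\nChicago, IL\nDenver, CO"
for chunk in split_text(sample, chunk_size=25):
    print(repr(chunk))
# 'Atlanta, GA\nBoston, MA'
# 'Chicago, IL\nDenver, CO'
```

A small chunk size keeps each embedded text short, which is likely why the slide uses 35 characters for a list of city names.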
30. Copyright, 2024 RheoData and affiliates
Python Approach – 3 of 4
api_key = "API Key"  # placeholder; substitute your own Cohere API key
os.environ["COHERE_API_KEY"] = api_key
#print(os.environ["COHERE_API_KEY"])  # debug only; avoid printing a real key
co = cohere.Client(os.getenv("COHERE_API_KEY"))
Cohere offers two types of API Keys:
• Development (Limit 10 calls per minute)
• Production (unlimited calls per minute)
31. Copyright, 2024 RheoData and affiliates
Python Approach – 3.5 of 4
def database_connection():
    # Open a connection to the pluggable database that holds the vectors
    try:
        connection = oracledb.connect(
            user="vector",
            password="vector",
            dsn="100.130.230.100:1521/freepdb1"
        )
        print('connected')
        return connection
    except oracledb.Error as exc:
        print('Could not make a connection:', exc)
        raise
32. Copyright, 2024 RheoData and affiliates
Python Approach – 4 of 4
connection = database_connection()
id_val = 1
for i in texts:
    # Embed one chunk of text with Cohere
    response = co.embed(
        texts=[i],
        model='embed-english-light-v3.0',
        input_type='classification'
    )
    vtext = i
    vec = response.embeddings[0]
    #print(vtext)
    #print(len(vec))
    #print(vec)
    # Store the id, the embedding, and the original text
    cursor = connection.cursor()
    cursor.setinputsizes(None, oracledb.DB_TYPE_VECTOR)
    cursor.execute("insert into vector.text_vec (id, txt_vec, text) "
                   "values (:1, :2, :3)", [id_val, vec, vtext])
    connection.commit()
    #print("recorded inserted")
    id_val = id_val + 1
    print(str(id_val-1) + " inserted")
# Read back everything that was inserted
cursor.execute("select * from vector.text_vec")
for row in cursor:
    print(row)
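One possible variation on the loop above is to collect all rows first and insert them in a single batch. The sketch below assumes that embeddings are bound as `array.array("f", ...)` values and that the connection behaves like a python-oracledb connection (`cursor()`, `executemany()`, `commit()`). The helper names `build_rows` and `insert_rows` are hypothetical, and the code has not been run against a live database.

```python
import array

INSERT_SQL = "insert into vector.text_vec (id, txt_vec, text) values (:1, :2, :3)"

def build_rows(texts, embeddings, start_id=1):
    """Pair each text with its embedding and a sequential id."""
    rows = []
    for offset, (text, vec) in enumerate(zip(texts, embeddings)):
        # Assumption: 32-bit float arrays are an acceptable bind value for a VECTOR column.
        rows.append((start_id + offset, array.array("f", vec), text))
    return rows

def insert_rows(connection, rows):
    """Insert all rows with one executemany call and commit once."""
    with connection.cursor() as cursor:
        cursor.executemany(INSERT_SQL, rows)
    connection.commit()
    return len(rows)

# Hand-made embeddings, just to show the shape of the rows.
rows = build_rows(["Atlanta, GA", "Boston, MA"], [[0.1, 0.2], [0.3, 0.4]])
print([row[0] for row in rows])  # [1, 2]
```

With a real connection, `insert_rows(database_connection(), rows)` would replace the per-row execute and commit, reducing round trips to the database.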
34. Copyright, 2024 RheoData and affiliates
Create Vectors
select text
from vector.text_vec
order by vector_distance(txt_vec,(
select txt_vec
from vector.text_vec_lookup
where id = :1), DOT)
fetch first :2 rows only;
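The query orders rows by their distance to a lookup vector and keeps the first few. A pure-Python sketch of that idea follows, assuming the DOT metric behaves as a negated dot product so that a smaller distance means a closer match. The function name, the sample rows, and the vectors are invented for illustration.

```python
def top_k_by_dot(rows, query_vec, k):
    """Return the texts of the k rows whose vectors are closest to query_vec."""
    def dot_distance(vec):
        # Assumption: distance = -(dot product), so larger dot products sort first.
        return -sum(x * y for x, y in zip(vec, query_vec))
    ranked = sorted(rows, key=lambda row: dot_distance(row[1]))
    return [text for text, _ in ranked[:k]]

rows = [
    ("Atlanta", [1.0, 0.0]),
    ("Boston", [0.0, 1.0]),
    ("Chicago", [0.7, 0.7]),
]
print(top_k_by_dot(rows, [1.0, 0.2], 2))  # ['Atlanta', 'Chicago']
```

Here `rows` plays the part of vector.text_vec, `query_vec` the vector selected from vector.text_vec_lookup, and `k` the `fetch first :2 rows only` bind.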
36. Copyright, 2024 RheoData and affiliates
Similarity Searches (Demo)
Explain Plan
• No indexes
• Selected 43 rows in milliseconds – all text
New webinar: Vector Indexes, coming soon.
37. Copyright, 2024 RheoData and affiliates
How do I embed an LLM within the database?
38. Copyright, 2024 RheoData and affiliates
ONNX Files
ONNX = Open Neural Network Exchange
An open-source format designed for machine-learning models. It ensures cross-platform compatibility and supports major languages and frameworks, facilitating easy and efficient model exchange.
39. Copyright, 2024 RheoData and affiliates
Python Approach – 1 of 5
$ export ORACLE_HOME_23ai=/opt/oracle/product/23ai/dbhome_1
$ cd $ORACLE_HOME_23ai/python/bin
$ python -V
$ export PATH=$ORACLE_HOME_23ai/python/bin:$PATH
$ python -V
• Verify that the correct Python is installed.
• Python 3.12 (min)
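The same version check can be done from inside Python, which is handy when several interpreters are on the PATH. This is a small sketch; the `meets_minimum` helper is hypothetical, and the 3.12 minimum is taken from the bullet above.

```python
import sys

MIN_VERSION = (3, 12)

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the interpreter's (major, minor) is at least the minimum."""
    return tuple(version_info[:2]) >= minimum

status = "ok" if meets_minimum() else "too old"
print(sys.executable, sys.version.split()[0], status)
```

Printing `sys.executable` also confirms that the interpreter under $ORACLE_HOME_23ai/python/bin is the one actually being run.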
40. Copyright, 2024 RheoData and affiliates
Python Approach – 2 of 5
$ cd ~
$ mkdir onnx
• Make directory for all ONNX files
41. Copyright, 2024 RheoData and affiliates
Python Approach – 3 of 5
$ cd ~/onnx
$ unzip ./omlutils.zip -d .
• Change directory to ONNX directory
• Unzip OMLUTILS.zip file
42. Copyright, 2024 RheoData and affiliates
Python Approach – 4 of 5
$ cd ~/onnx
$ python -m pip install -r requirements.txt
$ python -m pip install omlutils-0.13.0-cp312-cp312-linux_x86_64.whl
• Install all the necessary requirements
• Install OMLUTILS wheel file
43. Copyright, 2024 RheoData and affiliates
Python Approach – 5 of 5
$ cd ~/onnx
$ python
>>> from omlutils import EmbeddingModel, EmbeddingModelConfig
>>> em = EmbeddingModel(model_name="sentence-transformers/all-MiniLM-L6-v2")
>>> em.export2file("all-MiniLM-L6-v2",output_dir=".")
>>> exit()
$
• Export the requested LLM to an ONNX file using the Python interactive prompt, ready to install into the database
• View model in Oracle Data Dictionary
44. Copyright, 2024 RheoData and affiliates
New Database Views
*_DATA_MODELS
SELECT * FROM *_DATA_MODELS;
45. Copyright, 2024 RheoData and affiliates
Using the Embedded Model (TO_VECTOR & VECTOR_EMBEDDING)
SELECT to_vector(vector_embedding(VECTOR.DOC_MODEL using url as data)) as embedding
from ggai.base;
46. Copyright, 2024 RheoData and affiliates
Updating existing table with Vectors
update ggai.base_url
set urlvec = (SELECT to_vector(vector_embedding(VECTOR.DOC_MODEL using url as data))
from ggai.base_url where id = 1)
where client = upper('test');
47. Copyright, 2024 RheoData and affiliates
The Possibilities?
48. Copyright, 2024 RheoData and affiliates
The possibilities of Oracle Vector Search are going
to drive growth in everything!
49. Copyright, 2024 RheoData and affiliates
Retrieval Augmented Generation (RAG)
A technique for enhancing the accuracy
and reliability of generative AI models with facts
from external sources
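A toy sketch of the RAG idea: retrieve the stored passages most similar to the question, then place them in the prompt sent to a generative model. Everything here is illustrative. The passages, the hand-made three-dimensional vectors, and the `retrieve` and `build_prompt` helpers are assumptions, and no real embedding model or LLM is called.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, store, k=2):
    """Return the k passages whose vectors are most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, passages):
    """Augment the question with retrieved facts before it goes to a generative model."""
    facts = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the facts below.\n\nFacts:\n{facts}\n\nQuestion: {question}"

store = [
    ("GoldenGate replicates data between databases.", [0.9, 0.1, 0.0]),
    ("Vector indexes speed up similarity search.", [0.1, 0.9, 0.1]),
    ("ONNX is an open format for machine-learning models.", [0.0, 0.2, 0.9]),
]
question_vec = [0.2, 0.8, 0.1]  # stand-in for the embedded question

passages = retrieve(question_vec, store, k=1)
print(build_prompt("How can I make similarity search faster?", passages))
```

In a database-backed version, `retrieve` would be a similarity query over a vector column and `question_vec` would come from the same embedding model used to populate it.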
50. Copyright, 2024 RheoData and affiliates
Diagram (vector search flow): Input Data Object -> Embedding Model -> Vector Embedding [4, 8, 42, 72] -> Vector Search -> Vector ID Matches -> Data Object Retrieval -> Relevant Content
51. Copyright, 2024 RheoData and affiliates
Use Cases:
• Search (known use case)
• Natural-Language Processing/Search (semantic search)
• Recommendation Systems (retail, customer service)
• Biometric and Anomaly Detection (healthcare)
• Drug discovery and genomics (healthcare)
• Patient similarity Analysis (healthcare)
• Personalized Digital Assistants
52. Copyright, 2024 RheoData and affiliates
• Private data shared via vector databases or the vector data type needs to be secured.
• Organizations should evaluate what data is vectorized (shared) before using it with public LLMs.
• Consider building private LLMs to keep data secure.
54. Copyright, 2024 RheoData and affiliates
Next webinars!
June 13, 2024 @ 1 pm EST:
Oracle GoldenGate 23ai: What is new and how to upgrade
June 27, 2024 @ 1 pm EST (tentative):
Oracle Database 23ai: Vector Indexes – What and how to use