Data meets AI - AICUG - Santa Clara
- 1. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |
When Data meets AI
AICUG Meetup – Santa Clara, Oracle
Sandesh Rao
VP AIOps, Autonomous Database
- 2.
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, timing, and pricing of any
features or functionality described for Oracle’s products may change and remains at the
sole discretion of Oracle Corporation.
- 3.
whoami
Real Application Clusters - HA
Data Guard - DR
Machine Learning - AIOps
Enterprise Management
Sharding
Big Data
Operational Management
Home Automation
Geek
@sandeshr
https://www.linkedin.com/in/raosandesh/
- 4.
Agenda
• What motivated us to go into Machine Learning?
• Which algorithms, tools & technologies are used?
• Oracle & Machine Learning initiatives and tools
• cx_Oracle and OML4Py
• Questions and Open Talk
- 5.
Why Machine Learning for us and why now?
• Lots of data generated as exhaust from systems
– Cloud, different formats and interfaces, frameworks
• Machine Learning has become accessible
– Anyone can be a Data Scientist
– Algorithms are accessible as libraries, e.g. scikit-learn, Keras, TensorFlow
– A sandbox to get started is as easy as a docker init
• Business use cases
• Finding value in the data: fewer guesses when making decisions
- 6.
ML Project Workflow
• Set Business Objectives
• Gather, Prepare and Cleanse Data
• Model Data
– Feature Extraction, Test, Train, Optimizer
– Loss Function, effectiveness
– Framework and Library to use
• Apply the Model as an inference engine
– Decision making using the Model’s output
– Tune the Model until the outcome is closer to the Business Objective
Workflow diagram: Set Business Objectives → Understand Use Case → Create Pseudo Code → Synthetic Data Generation → Pick Tools and Frameworks → Train/Test Model → Deploy Model → Measure Results and Feedback
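The workflow above can be sketched end to end in a few lines of Python: generate synthetic data, split it into training and testing sets, train, then measure the result. This is a minimal, hypothetical sketch: the data generator, the one-feature threshold "model", and all function names are stand-ins invented for illustration, and only the standard library is used so that it runs without any ML framework.

```python
import random

def generate_synthetic_data(n, seed=0):
    """Synthetic data generation: feature x in [0, 1), label 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < 0.1:      # flip some labels to simulate noisy real-world data
            y = 1 - y
        rows.append((x, y))
    return rows

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into a training set and a testing set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(rows, threshold):
    """Measure results: fraction of rows where (x > threshold) matches the label."""
    correct = sum(1 for x, y in rows if (1 if x > threshold else 0) == y)
    return correct / len(rows)

def train_threshold(train):
    """Train: choose the threshold (among observed x values) with the best training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(x for x, _ in train):
        acc = accuracy(train, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = generate_synthetic_data(400)
train, test = train_test_split(data)
model = train_threshold(train)          # the "deployed" model is just this threshold
print(f"threshold={model:.3f} test accuracy={accuracy(test, model):.2f}")
```

In a real project the threshold search would be replaced by a library model, but the split, train, and measure steps keep the same shape.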
- 7.
Types of Machine Learning
Supervised Learning
Predict future outcomes with the help of training data provided by human experts
Semi-Supervised Learning
Discover patterns within raw data and make predictions, which human experts then review; their feedback is used to improve model accuracy
Unsupervised Learning
Find patterns without any external input other than the raw data
Reinforcement Learning
Make decisions based on the rewards that this type of action earned in the past
- 8.
Machine Learning Algorithms
Clustering
• Hierarchical k-means, Orthogonal Partitioning Clustering, Expectation-Maximization
Feature Extraction / Attribute Importance / Component Analysis
Classification
• Decision Tree, Naive Bayes, Random Forest, Logistic Regression, Support Vector Machine
Regression
• Multiple Regression, Support Vector Machine, Linear Model, LASSO, Random Forest, Ridge Regression, Generalized Linear Model, Stepwise Linear Regression
Association & Collaborative Filtering
Reinforcement Learning - brute force, Monte Carlo, temporal difference, ...
Neural Networks & Deep Learning with Deep Neural Networks
• Many different use cases
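To make the clustering family concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) in standard-library Python. It illustrates the technique only; it is not Oracle's Hierarchical k-means implementation, and the `kmeans` function name and the sample points are made up for this example.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate an assignment step and a centroid-update step."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # initialize with k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break                                   # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Hierarchical variants apply a split like this recursively to build a tree of clusters instead of a flat list.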
- 9.
Modeling Phase – AutoML to the rescue
• Provide the dataset to AutoML
• Set configuration parameters for the model picked
• Divide the dataset into a training set and a testing set
• Actual training
• Evaluate the performance of the trained model
• Tweak model parameters, change predictors, change test/train data splits and change algorithms
• Pick the model plus parameters depending on the outcome and measure (F1, Precision, Recall, MSE)
• Document all runs and apply A/B testing to see what the variations produce
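The evaluation and selection steps above depend on the listed measures. Here is a minimal sketch of how they are computed, assuming binary labels for precision, recall and F1 and numeric targets for MSE; the function names are illustrative and not taken from any Oracle API.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def mean_squared_error(y_true, y_pred):
    """MSE for regression: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(precision_recall_f1([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1]))
print(mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```

F1 is useful when classes are imbalanced, because a model that always predicts the majority class can score a high accuracy while its recall on the minority class is zero.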
- 10.
Tools & Libraries Assisting ML projects
- 11.
What is Oracle Doing Around Machine Learning?
• Big Data Appliance
• Big Data Discovery, Big Data Preparation, Data Visualization Cloud
• Analytics Cloud
– Sales, Marketing, HCM on top of SaaS
• DaaS – Oracle Data Cloud, Eloqua, ...
• Oracle Labs (labs.oracle.com)
– Machine Learning Research Group
• Autonomous Database
– Zeppelin Notebooks preloaded with use cases
– Applied Machine Learning used for implementing AIOps
- 12.
Oracle AI Platform Cloud Service – Coming Soon…
• Collaborative end-to-end machine learning in the cloud
• Enables data science teams to
– Organize their work
– Access data and computing resources
– Build, Train, Deploy
– Manage models
• Collaborative, Self-Service, Integrated
• https://cloud.oracle.com/en_US/ai-platform
- 13.
Oracle Autonomous Data Warehouse Cloud Key Features
• Highly Elastic: independently scale compute and storage, without having to overpay for fixed blocks of resources
• Built-in Web-Based SQL ML Tool: Apache Zeppelin Oracle Machine Learning notebooks ready to run ML from the browser
• Database Migration Utility: dedicated cloud-ready migration tools for easy migration from Amazon Redshift, SQL Server and other databases
• Enterprise Grade Security: data is encrypted by default in the cloud, as well as in transit and at rest
• High-Performance Queries and Concurrent Workloads: optimized query performance with preconfigured resource profiles for different types of users
• Oracle SQL: Autonomous DW Cloud is compatible with all business analytics tools that support Oracle Database
• Self Driving: fully automated database for self-tuning, patching and upgrading itself while the system is running
• Cloud-Based Data Loading: fast, scalable data loading from Oracle Object Store, AWS S3, or on-premises
Oracle Machine Learning
- 14.
Oracle Machine Learning and Advanced Analytics
• Support multiple data platforms, analytical engines, languages, UIs and deployment strategies
Strategy and Road Map (diagram): platforms Big Data / Big Data Cloud and Relational (Oracle Database, Oracle Database Cloud, DWCS); ML algorithms with a common core, parallel and distributed; languages SQL, R, Python, etc.; GUIs Data Miner and RStudio; Notebooks; Advanced Analytics
- 15.
Oracle’s Machine Learning & Advanced Analytics Algorithms
CLASSIFICATION
– Naïve Bayes
– Logistic Regression (GLM)
– Decision Tree
– Random Forest
– Neural Network
– Support Vector Machine
– Explicit Semantic Analysis
CLUSTERING
– Hierarchical K-Means
– Hierarchical O-Cluster
– Expectation Maximization (EM)
ANOMALY DETECTION
– One-Class SVM
TIME SERIES
– Holt-Winters, regular and irregular, with and without trends and seasonality
– Single and Double Exponential Smoothing
REGRESSION
– Linear Model
– Generalized Linear Model
– Support Vector Machine (SVM)
– Stepwise Linear Regression
– Neural Network
– LASSO
ATTRIBUTE IMPORTANCE
– Minimum Description Length
– Principal Component Analysis (PCA)
– Unsupervised Pair-wise KL Divergence
– CUR decomposition for row and attribute importance
ASSOCIATION RULES
– Apriori / market basket
PREDICTIVE QUERIES
– Predict, cluster, detect, features
SQL ANALYTICS
– SQL Windows, SQL Patterns, SQL Aggregates
FEATURE EXTRACTION
– Principal Component Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition (SVD)
– Explicit Semantic Analysis (ESA)
TEXT MINING SUPPORT
– Algorithms support text type
– Tokenization and theme extraction
– Explicit Semantic Analysis (ESA) for document similarity
STATISTICAL FUNCTIONS
– Basic statistics: min, max, median, stdev, t-test, F-test, Pearson’s, Chi-Sq, ANOVA, etc.
R PACKAGES
– CRAN R algorithm packages through Embedded R Execution
– Spark MLlib algorithm integration
EXPORTABLE ML MODELS
– C and Java code for deployment
• OAA (Oracle Data Mining + Oracle R Enterprise) and ORAAH combined
• OAA includes support for partitioned models and transactional, unstructured, geospatial, and graph data, etc.
- 16.
Oracle Machine Learning Key Features
• Collaborative UI for data scientists
– Packaged with Autonomous Data Warehouse Cloud (V1)
– Easy access to shared notebooks, templates, permissions, scheduler, etc.
– SQL ML algorithms API (V1)
– Supports deployment of ML analytics
Machine Learning Notebook for Autonomous Data Warehouse Cloud
- 17.
Oracle Machine Learning UI in ADW
- 20.
AI and ML with Python and Oracle
1. What is Python
2. Oracle’s Advanced Analytics
3. cx_Oracle Package
4. Oracle Machine Learning for Python
- 21.
What is Python?
• An interpreted, object-oriented, high-level, general-purpose programming language
• Designed for rapid application development and scripting to connect existing components
• Open source scripting language and environment: https://www.python.org
• Created in the late 1980s
• World-wide usage
– Widely taught in universities
– Many Data Scientists know and use Python
• Thousands of open source packages to enhance productivity
- 22.
Popularity by question views for major programming languages
Growth of major programming languages using Stack Overflow question views
https://insights.stackoverflow.com/trends?tags=python%2Cjavascript%2Cjava%2Cc%23%2Cphp%2Cc%2B%2B&utm_source=so-owned&utm_medium=blog&utm_campaign=gen-blog&utm_content=blog-link&utm_term=incredible-growth-python
- 23.
Why use Python?
• Simple language
• Small
• Resembles English
• Strict punctuation rules
• Uniform code formatting
• Many third-party libraries
• PyPI – 168203 projects: https://pypi.python.org/pypi
• Wide-spread user groups
• Heavily used for websites
• Increasingly used by data scientists
- 24.
Python IDEs
- 25.
Oracle’s Advanced Analytics
Multiple interfaces across platforms: SQL, R, Python*, GUI, Dashboards, Apps
Diagram:
• Users: Data / Business Analysts; R & Python programmers; Business Analysts / Managers; Domain End Users
• Interfaces: SQL Developer / Oracle Data Miner; R & Python* clients; OAC / OBIEE / ODV; Applications
• Platforms: Oracle Database Enterprise Edition with the Oracle Advanced Analytics Database Option (SQL, R & Python* integration for scalable, distributed, parallel in-database ML execution); Hadoop with Oracle R Advanced Analytics for Hadoop and Big Data Connectors (parallel, distributed Spark-based algorithms); Oracle Cloud
* Not yet released
- 26.
Oracle Advanced Analytics differentiators
Work directly with data in Database and Hadoop
• Eliminate need to request extracts from IT/DBA – immediate access to database and Hadoop data
• Process data where they reside – minimize or eliminate data movement
Scalability and Performance
• Use parallel, distributed algorithms that scale to big data on Oracle Database and Hadoop platforms
• Leverage powerful engineered systems to build models on billions of rows of data or millions of models in parallel
Ease of deployment
• Using Oracle Database, place Python, R, and SQL scripts immediately in production (no need to recode)
• Use production quality infrastructure without custom plumbing or extra complexity
Process support
• Maintain and ensure data security, backup, and recovery using existing processes
• Store, access, manage, and track analytics objects (models, scripts, workflows, data) in Oracle Database
- 27.
Oracle’s Python Technologies Supporting Oracle Database
• cx_Oracle package
• Oracle Machine Learning for Python
– Component of the Oracle Advanced Analytics option to Oracle Database
- 28.
cx_Oracle
• Python package enabling scalable and performant connectivity to Oracle Database
– Open source, publicly available on PyPI, OTN, and GitHub
– Oracle is maintainer
• Oracle Database interface for Python conforming to the Python DB API 2.0 specification
– Optimized driver based on OCI
– Execute SQL statements from Python
– Enables transactional behavior for insert, update, and delete
Diagram: cx_Oracle connecting to Oracle Database
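Because cx_Oracle conforms to the Python DB API 2.0 specification, the transactional pattern (run DML, then commit or roll back) is the same as with any conforming driver. The sketch below is hypothetical: the `accounts` table and the `transfer` function are invented, and the standard-library `sqlite3` module stands in for cx_Oracle so it runs without an Oracle Database. With cx_Oracle, `con` would come from `cx_Oracle.connect(...)` instead; both drivers accept the named `:name` bind style used here.

```python
import sqlite3

def transfer(con, from_id, to_id, amount):
    """Move amount between two accounts atomically: both updates commit, or neither does."""
    cur = con.cursor()
    try:
        cur.execute("update accounts set balance = balance - :amt where id = :id",
                    {"amt": amount, "id": from_id})
        cur.execute("update accounts set balance = balance + :amt where id = :id",
                    {"amt": amount, "id": to_id})
        cur.execute("select balance from accounts where id = :id", {"id": from_id})
        (balance,) = cur.fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        con.commit()            # make both updates permanent
    except Exception:
        con.rollback()          # undo any partial work in this transaction
        raise
    finally:
        cur.close()

# Stand-in database: an in-memory SQLite table (hypothetical data)
con = sqlite3.connect(":memory:")
setup = con.cursor()
setup.execute("create table accounts (id integer primary key, balance integer)")
# executemany() inserts several rows with one call
setup.executemany("insert into accounts values (:id, :balance)",
                  [{"id": 1, "balance": 100}, {"id": 2, "balance": 50}])
con.commit()
setup.close()
transfer(con, 1, 2, 30)
```

Keeping the commit and rollback in one function makes the unit of work explicit, which is what the "transactional behavior" bullet promises.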
- 29.
cx_Oracle - Requirements
• Easily installed from PyPI
• Support for Python 2 and 3
• Support for Oracle Client 11.2, 12.1, 12.2, 18
– Oracle's standard cross-version interoperability allows easy upgrades and connectivity to different Oracle Database versions
• Connect to Oracle Database 9.2, 10, 11, 12, 18
– (Depending on the Oracle Client version used)
• SQL and PL/SQL Execution
– Underlying Oracle Client libraries have optimizations: compressed fetch, pre-fetching, client and server result set caching, and statement caching with auto-tuning
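- 30.
cx_Oracle Example
import cx_Oracle
# Standalone connection using an Easy Connect string: user/password@host/service_name
con = cx_Oracle.connect('pythonhol/welcome@127.0.0.1/orcl')
print(con.version)  # version of the connected Oracle Database
con.close()
# DRCP connection: ':pooled' requests a pooled server, cclass names the
# connection class, and purity SELF allows reuse of a pooled session
con = cx_Oracle.connect('pythonhol', 'welcome', '127.0.0.1:/orcl:pooled',
                        cclass="HOL", purity=cx_Oracle.ATTR_PURITY_SELF)
con.close()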
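- 31.
cx_Oracle Example
# assumes con is an open connection (see the previous slide)
cur = con.cursor()  # opens a cursor for statements to use
cur.execute('select * from departments order by department_id')
for result in cur:  # iterates over and prints all rows
    print(result)
# or: re-execute (the loop above exhausted the cursor) and fetch one row at a time
cur.execute('select * from departments order by department_id')
row = cur.fetchone()  # returns a single row as a tuple and advances
print(row)
row = cur.fetchone()
print(row)
# or: re-execute and fetch several rows at once
cur.execute('select * from departments order by department_id')
res = cur.fetchmany(numRows=3)  # returns a list of tuples
print(res)
cur.close()
con.close()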
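A common variation on the fetch loop above is to stream a large result set in fixed-size batches with `fetchmany()`, so that only one batch is held in Python memory at a time. This generator is a minimal sketch that relies only on DB API 2.0 cursor methods and attributes; the `iter_batches` name is hypothetical, and it is exercised against the standard-library `sqlite3` module as a stand-in for cx_Oracle, with made-up rows.

```python
import sqlite3

def iter_batches(cursor, query, params=None, batch_size=100):
    """Execute query, then yield lists of at most batch_size rows until the result set is exhausted."""
    cursor.arraysize = batch_size       # DB API 2.0: default row count for fetchmany()
    cursor.execute(query, params or {})
    while True:
        rows = cursor.fetchmany()       # fetches up to cursor.arraysize rows
        if not rows:
            break
        yield rows

# Stand-in data: an in-memory SQLite table shaped like the departments example
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table departments (department_id integer, department_name text)")
cur.executemany("insert into departments values (:id, :name)",
                [{"id": i * 10, "name": f"dept{i}"} for i in range(1, 8)])
for batch in iter_batches(cur, "select * from departments order by department_id",
                          batch_size=3):
    print(batch)
```

With cx_Oracle, `cursor.arraysize` also controls how many rows are fetched per round-trip to the database, so larger batches generally mean fewer round-trips.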
- 32.
Data Types (1)
Full listing for cx_Oracle
cx_Oracle Type | Oracle Type | Python Type
cx_Oracle.BINARY | RAW | bytes (Python 3), str (Python 2)
cx_Oracle.BFILE | BFILE | cx_Oracle.LOB
cx_Oracle.BOOLEAN | boolean (PL/SQL only) | bool
cx_Oracle.CLOB | CLOB | cx_Oracle.LOB
cx_Oracle.CURSOR | REF CURSOR | cx_Oracle.Cursor
cx_Oracle.DATETIME | DATE | datetime.datetime
cx_Oracle.FIXED_CHAR | CHAR | str
cx_Oracle.FIXED_NCHAR | NCHAR | str (Python 3), unicode (Python 2)
cx_Oracle.INTERVAL | INTERVAL DAY TO SECOND | datetime.timedelta
cx_Oracle.LOB | CLOB, BLOB, BFILE, NCLOB | cx_Oracle.LOB
cx_Oracle.LONG_BINARY | LONG RAW | bytes (Python 3), str (Python 2)
- 33.
Data Types (2)
Full listing for cx_Oracle
cx_Oracle Type | Oracle Type | Python Type
cx_Oracle.LONG_STRING | LONG | str
cx_Oracle.NATIVE_FLOAT | BINARY_DOUBLE | float
cx_Oracle.NATIVE_INT | - | int (Python 3), long/int (Python 2)
cx_Oracle.NCHAR | NVARCHAR2 | str (Python 3), unicode (Python 2)
cx_Oracle.NCLOB | NCLOB | cx_Oracle.LOB
cx_Oracle.NUMBER | NUMBER | float
cx_Oracle.OBJECT | instances created by CREATE OR REPLACE TYPE | cx_Oracle.Object
cx_Oracle.ROWID | ROWID | str
cx_Oracle.STRING | VARCHAR2 | str
cx_Oracle.TIMESTAMP | TIMESTAMP, TIMESTAMP WITH TIME ZONE, TIMESTAMP WITH LOCAL TIME ZONE | datetime.datetime
- 34.
Oracle Machine Learning for Python (OML4Py)
- 35.
Traditional Python and Database Interaction
• Access latency
• Memory limitation – data size
• Single threaded
• Paradigm shift: Python → SQL → Python
• Ad hoc production deployment
• Issues for backup, recovery, security
Diagram: a Python script, scheduled by a cron job, exchanges data with the database through flat files (extract / export, read, export / load) or through SQL via drivers such as mxODBC, pyodbc, turboodbc, JayDeBeApi, and cx_Oracle
- 36.
Oracle Machine Learning for Python
Oracle Advanced Analytics option to Oracle Database >= 18c
• Use Oracle Database as HPC environment
• Use in-database parallel and distributed machine learning algorithms
• Manage Python scripts and Python objects in Oracle Database
• Integrate Python results into applications and dashboards via SQL
• Produce better models faster with automated machine learning
Diagram: a Python client running Oracle Machine Learning for Python works with the Oracle Database server machine (user tables, in-database stats), which SQL interfaces such as SQL*Plus and SQL Developer can also access
- 37.
Oracle Machine Learning for Python
• Transparency layer
– Leverage proxy objects so data remain in database
– Overload Python functions translating functionality to SQL
– Use familiar Python syntax to manipulate database data
• Parallel, distributed algorithms
– Scalability and performance
– Exposes in-database algorithms from Oracle Data Mining
• Embedded Python execution
– Manage and invoke Python scripts in Oracle Database
– Data-parallel, task-parallel, and non-parallel execution
– Use open source Python packages
• Automated machine learning
– Feature selection, model selection, hyper-parameter tuning
- 38.
OML4Py Transparency Layer
• Leverages proxy objects for database data: oml.DataFrame
# Create a table from pandas DataFrame data (data is a pandas DataFrame)
DATA = oml.create(data, table='BOSTON')
# Get a proxy object for the DB table BOSTON
DATA = oml.sync(table='BOSTON')
• Overloads Python functions translating functionality to SQL
• Uses familiar Python syntax to manipulate database data
DATA.shape
DATA.head()
DATA.describe()
DATA.std()
DATA.skew()
train_dat, test_dat = DATA.split()
train_dat.shape
test_dat.shape
- 39.
Data Transfer-related functions
• oml.create(x, table[, oranumber, dbtypes, . . . ])
– Creates a table in Oracle Database from a pandas DataFrame, returning a proxy object
• oml.push(x[, oranumber, dbtypes])
– Pushes data to Oracle Database, creating a temporary table and returning a proxy object
• oml.sync(schema=None, regex_match=False, table=None, view=None, query=None)
– Creates a DataFrame proxy object in Python that represents an Oracle Database table
• oml.drop([table, view])
– Drops the named database table or view
• oml.dir()
– Returns the names of OML objects in the workspace
• oml.cursor()
– Returns a cx_Oracle cursor object of the current OML database connection
- 40.
List of functions on OML DataFrame executed in-database
• KFold
• append
• columns
• concat
• corr
• count
• create_view
• crosstab
• cumsum
• describe
• drop
• drop_duplicates
• dropna
• head
• kurtosis
• materialize
• max
• mean
• median
• merge
• min
• nunique
• pivot_table
• pull
• rename
• round
• select_types
• shape
• skew
• sort_values
• split
• std
• sum
• t_dot
• tail
• types
- 41.
Example – create a DataFrame
- 42.
Example using crosstab on oml.DataFrame
- 43.
OML4Py 1.0
Machine Learning algorithms in-Database
Classification
• Decision Tree
• Naïve Bayes
• Generalized Linear Model
• Support Vector Machine
• Random Forest
• Neural Network
Regression
• Generalized Linear Model
• Neural Network
• Support Vector Machine
Attribute Importance
• Minimum Description Length
Clustering
• Expectation Maximization
• Hierarchical k-Means
Feature Extraction
• Singular Value Decomposition
• Explicit Semantic Analysis
Market Basket Analysis
• Apriori – Association Rules
Anomaly Detection
• 1 Class Support Vector Machine
…plus open source Python packages for algorithms in combination with embedded Python execution
Supports integrated partitioned models, text mining
- 44.
Connect to the database
Diagram: Python user on laptop → client Python engine with OML4Py (transparency layer) → Oracle Database
import oml
import os
# Read the SID from the environment and substitute it into the connect descriptor
sid = os.environ["ORACLE_SID"]
oml.connect(user="pyquser", password="pyquser",
            dsn=f"(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=...)"
                f"(PORT=1521))(CONNECT_DATA=(SID={sid})))")
oml.isconnected()  # True once the connection is established
- 45.
Invoke in-database aggregation function
Diagram: Python user on desktop → client Python engine with OML4Py (transparency layer) → Oracle Database (user tables, in-database stats)
ONTIME_S = oml.sync(table="ONTIME_S")
res = ONTIME_S.crosstab('DEST')
type(res)
res.head()
• Source data is an OML DataFrame, ONTIME_S, that proxies an Oracle Database table
• The crosstab() function is overloaded to accept OML DataFrame objects and transparently generates SQL for execution in Oracle Database
• Returns an ‘oml.core.frame.DataFrame’ object
Generated SQL:
select DEST, count(*)
from ONTIME_S
group by DEST
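The transparency-layer idea (a proxy that holds only a table name and translates Python calls into SQL, so that only the small aggregated result leaves the database) can be illustrated with a toy class. This is a hypothetical sketch, not OML4Py's implementation: `TableProxy` and its methods are invented, the table contents are made up, and `sqlite3` stands in for Oracle Database.

```python
import sqlite3

class TableProxy:
    """Hypothetical proxy for a database table: it stores a table name, never the rows."""

    def __init__(self, con, table):
        self.con = con
        self.table = table      # assumed to be a trusted identifier, not user input

    def crosstab_sql(self, column):
        """Translate a one-column crosstab into GROUP BY SQL, like the slide's example."""
        return (f"select {column}, count(*) from {self.table} "
                f"group by {column} order by {column}")

    def crosstab(self, column):
        """Run the aggregation in the database and bring back only the counts."""
        cur = self.con.cursor()
        try:
            cur.execute(self.crosstab_sql(column))
            return dict(cur.fetchall())
        finally:
            cur.close()

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table ONTIME_S (DEST text, DELAY integer)")
cur.executemany("insert into ONTIME_S values (:dest, :delay)",
                [{"dest": d, "delay": 0} for d in ["SFO", "JFK", "SFO", "ORD", "SFO"]])
ontime = TableProxy(con, "ONTIME_S")
print(ontime.crosstab_sql("DEST"))
print(ontime.crosstab("DEST"))
```

The same principle scales: the client only ever holds one row per distinct DEST value, however many flights the table contains.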
- 46.
OML4Py Embedded Python
def fit(data):
    from sklearn.svm import LinearSVC
    x = data.drop('TARGET', axis=1).values
    y = data['TARGET']
    return LinearSVC().fit(x, y)
oml.script.create('sk_svc_fit', fit, overwrite=True)
oml.script.dir()
mod = oml.table_apply(train_dat, func='sk_svc_fit',
                      oml_input_type='pandas.DataFrame')
Diagram (steps 1 to 4): client Python engine with OML4Py → pyq*eval() interface → Oracle Database (user tables) → extproc → DB Python engine with OML4Py
- 47.
oml.group_apply – partitioned data flow
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    X = dat[['PETAL_WIDTH']]
    y = dat[['PETAL_LENGTH']]
    lm.fit(X, y)
    return lm
index = oml.DataFrame(IRIS['SPECIES'])
mods = oml.group_apply(IRIS[:, ['PETAL_LENGTH', 'PETAL_WIDTH', 'SPECIES']],
                       index, func=build_lm)
sorted(mods.pull().items())
Diagram (steps 1 to 4): client Python engine with OML4Py → pyq*eval() interface → Oracle Database (user tables) → several extproc / DB Python engine processes with OML4Py, each handling data partitions
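The data flow of `group_apply` can be emulated locally to see what it does: partition rows by the index column, call the user function once per partition, and collect the results keyed by partition value. In this hypothetical single-process sketch, `local_group_apply` is an invented name (OML4Py instead runs partitions in database-side Python engines), a closed-form least-squares fit stands in for scikit-learn's `LinearRegression`, and the iris-like rows are made-up numbers chosen so that each fit is exact.

```python
from collections import defaultdict

def local_group_apply(rows, index, func):
    """Partition rows (dicts) by rows[index] and apply func once to each partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[index]].append(row)
    return {key: func(part) for key, part in partitions.items()}

def build_lm(dat):
    """Ordinary least squares for PETAL_LENGTH = intercept + slope * PETAL_WIDTH."""
    xs = [r["PETAL_WIDTH"] for r in dat]
    ys = [r["PETAL_LENGTH"] for r in dat]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

iris = [
    {"SPECIES": "setosa", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 2.0},
    {"SPECIES": "setosa", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 4.0},
    {"SPECIES": "setosa", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 6.0},
    {"SPECIES": "versicolor", "PETAL_WIDTH": 1.0, "PETAL_LENGTH": 5.0},
    {"SPECIES": "versicolor", "PETAL_WIDTH": 2.0, "PETAL_LENGTH": 6.0},
    {"SPECIES": "versicolor", "PETAL_WIDTH": 3.0, "PETAL_LENGTH": 7.0},
]
mods = local_group_apply(iris, "SPECIES", build_lm)
print(sorted(mods.items()))
```

Because each partition is independent, the per-group calls can run in parallel, which is what lets the database spread them across several Python engines.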
- 48.
Embedded Python Execution functions
• oml.do_eval(func[, func_value, func_owner, . . . ])
– Executes the user-defined Python function at the Oracle Database server machine
• oml.table_apply(data, func[, func_value, . . . ])
– Executes the user-defined Python function at the Oracle Database server machine, supplying data pulled from Oracle Database
• oml.row_apply(data, func[, func_value, . . . ])
– Partitions a table or view into row chunks and executes the user-defined Python function on each chunk within one or more Python processes running at the Oracle Database server machine
• oml.group_apply(data, index, func[, . . . ])
– Partitions a table or view by the values in column(s) specified in index and executes the user-defined Python function on those partitions within one or more Python processes running at the Oracle Database server machine
• oml.index_apply(times, func[, func_value, . . . ])
– Executes the user-defined Python function multiple times inside the Oracle Database server
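The chunking semantics of `row_apply` and the repeated invocation of `index_apply` can likewise be emulated in a single local process. Both function names below are hypothetical, the real functions run in Python engines at the database server, and passing a 1-based invocation index to the function is an assumption of this sketch.

```python
def local_row_apply(rows, func, rows_per_chunk):
    """Call func on consecutive chunks of at most rows_per_chunk rows; collect the results."""
    return [func(rows[i:i + rows_per_chunk])
            for i in range(0, len(rows), rows_per_chunk)]

def local_index_apply(times, func):
    """Call func(index) for index = 1..times (1-based indexing is an assumption here)."""
    return [func(i) for i in range(1, times + 1)]

# Score rows chunk by chunk: the stand-in "model" simply doubles each value
scores = local_row_apply(list(range(10)), lambda chunk: [2 * v for v in chunk], 4)
# Run a function several times, e.g. one run per simulation seed
runs = local_index_apply(3, lambda i: i * i)
print(scores)
print(runs)
```

Chunking suits row-independent work such as batch scoring, while repeated invocation suits tasks like simulations that differ only by an index.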
- 49.
Script repository functions for saving Python scripts in Oracle Database
• oml.script.create(name, func[, is_global, . . . ])
– Creates a Python script, which contains a single function definition, in the Oracle Database Python script repository
• oml.script.dir([name, regex_match, sctype])
– Lists the scripts present in the Oracle Database Python script repository
• oml.script.load(name[, owner])
– Loads the named script from the Oracle Database Python script repository as a callable object
• oml.script.drop(name[, is_global, silent])
– Drops the named script from the Oracle Database Python script repository
- 50.
Datastore functions for saving Python objects in Oracle Database
• oml.ds.save(objs, name[, description, . . . ])
– Saves Python objects to a datastore in the user’s Oracle Database schema
• oml.ds.dir([name, regex_match, dstype])
– Lists existing datastores available to the current session user
• oml.ds.describe(name[, owner])
– Describes the contents of the named datastore available to the current session user
• oml.ds.load(name[, objs, owner, to_globals])
– Loads Python objects from a datastore in the user’s Oracle Database schema
• oml.ds.delete(name[, objs, regex_match])
– Deletes one or more datastores from the user’s Oracle Database schema, or deletes specific objects from within a datastore
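Conceptually, a datastore is a named collection of serialized Python objects kept in the database. This hypothetical sketch shows that round trip: `ds_save` and `ds_load` are invented names, `pickle` is an assumed serialization format, and an in-memory `sqlite3` table (using SQLite's `insert or replace`) stands in for the user's Oracle Database schema. It does not describe how OML4Py stores objects internally.

```python
import pickle
import sqlite3

def ds_save(con, name, objs):
    """Save a dict of Python objects under a datastore name, one row per object."""
    cur = con.cursor()
    cur.execute("create table if not exists datastore "
                "(ds_name text, obj_name text, obj blob, primary key (ds_name, obj_name))")
    cur.executemany("insert or replace into datastore values (:ds, :name, :obj)",
                    [{"ds": name, "name": k, "obj": pickle.dumps(v)} for k, v in objs.items()])
    con.commit()
    cur.close()

def ds_load(con, name):
    """Load every object saved under a datastore name back into a dict."""
    cur = con.cursor()
    cur.execute("select obj_name, obj from datastore where ds_name = :ds", {"ds": name})
    # pickle.loads must only be used on trusted data such as your own datastore
    objs = {k: pickle.loads(blob) for k, blob in cur.fetchall()}
    cur.close()
    return objs

con = sqlite3.connect(":memory:")
ds_save(con, "ds_models", {"coef": [2.0, 0.5], "meta": {"species": "setosa"}})
restored = ds_load(con, "ds_models")
print(restored)
```

Storing objects in the database means they inherit its backup, recovery, and access control, which is the point of the "process support" differentiator.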
- 51.
Data Types
Mapping between OML4Py and Oracle Database
cx_Oracle Read | Python | cx_Oracle Write
varchar2, char, clob | str | varchar2, char, clob
number, binary_double, binary_float | float | number if oranumber == True (default), else binary_double
- | bool | number if oranumber == True (default), else binary_double
raw, blob | bytes | raw, blob
- 52.
AutoML – new with OML4Py in Oracle Advanced Analytics
• Goal: increase model quality and data scientist productivity while reducing overall compute time
• Auto Feature Selection
– Reduce the number of features by identifying most relevant
– Improve performance and accuracy
• Auto Model Selection for classification and regression
– Identify best algorithm to achieve maximum score
– Find best model many times faster than with exhaustive search techniques
• Auto Tuning of Hyper-parameters
– Significantly improve model accuracy
– Avoid manual or exhaustive search techniques
- 53.
Auto Feature Selection: Motivation & Example
Confidential – Oracle Internal/Restricted/Highly Restricted
• Many real-world datasets have a large number of irrelevant features
• Slows down training
• Goal: Speed up the ML pipeline by selecting the most relevant features
Charts (OpenML dataset 312 with 1925 rows): ML training time (seconds) and prediction accuracy for two runs, labeled 1 and 2; annotated differences: 33x (training time) and +4% (accuracy)
OpenML dataset 40996 (56000 rows, 784 columns)
Using SVM Gaussian with Auto Feature Selection
• Features reduced from 784 to 309
• Accuracy improves from 65.9% to 84.3%
• Training time reduced 1.3x
- 54.
Auto Feature Selection: Evaluation for OAA SVM Gaussian
• 150 Datasets with more than 500 cases
Chart: accuracy (0.5 to 1.0) on datasets 1 to 10 for two series (Series1, Series2)
Avg Accuracy Gain: 2.5%
Avg Feature Reduction: 52%
- 55.
Auto Feature Selection Example
fs = FeatureSelection(mining_function='classification',
                      score_metric='accuracy')
selected_features = fs.reduce('dt', X_train, y_train)
X_train = X_train[:, selected_features]
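The slide does not say how `FeatureSelection.reduce` ranks features. As a generic illustration of automated feature selection only, here is a hypothetical filter-style selector (`select_top_k_features` is an invented name, and this is not the OML4Py algorithm) that keeps the k features with the largest absolute Pearson correlation with the target; the tiny dataset is made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_top_k_features(X, y, k):
    """Return the sorted column indices of the k features most correlated with y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    top = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in top)

X_train = [[1, 7, 0], [2, 3, 0], [3, 9, 0], [4, 1, 0], [5, 5, 0]]
y_train = [0, 0, 1, 1, 1]
selected_features = select_top_k_features(X_train, y_train, k=1)
X_train = [[row[j] for j in selected_features] for row in X_train]
print(selected_features, X_train)
```

Filter methods like this are cheap because they never train a model; wrapper methods that retrain per candidate subset are usually more accurate but much slower.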
- 56.
Auto Model Selection Example
ms = ModelSelection(mining_function='classification',
                    score_metric='accuracy')
best_model = ms.select(X_train, y_train)
y_pred = best_model.predict(X_test)
- 57.
Auto Tune Example
at = Autotune(mining_function='classification',
              score_metric='accuracy')
evals = at.tune('dt', X_train, y_train)
mod = evals['best_model']
y_pred = mod.predict(X_test)
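The slide does not describe Autotune's search strategy. For comparison, here is the exhaustive baseline that auto-tuning aims to avoid: a hypothetical grid search (`grid_search` and the one-dimensional k-nearest-neighbour classifier are invented for this sketch) that scores every candidate value of one hyperparameter, `k`, by accuracy on a held-out validation set and keeps the best. The data is made up.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training points closest to x (1-D feature)."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def accuracy(train, valid, k):
    """Fraction of validation rows whose label the k-NN model predicts correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)

def grid_search(train, valid, grid):
    """Score every candidate k on the validation set; return (best_k, scores)."""
    scores = {k: accuracy(train, valid, k) for k in grid}
    best_k = max(grid, key=lambda k: scores[k])    # the first best k wins ties
    return best_k, scores

train = [(0.0, "a"), (1.0, "a"), (2.0, "a"), (3.0, "a"), (2.9, "b"),
         (6.0, "b"), (7.0, "b"), (8.0, "b"), (9.0, "b")]
valid = [(2.8, "a"), (1.5, "a"), (7.5, "b")]
best_k, scores = grid_search(train, valid, [1, 3, 5])
print(best_k, scores)
```

Here k=1 is fooled by the mislabelled point at 2.9, while larger k smooths it out. The cost of this approach grows with the product of all grid sizes, which is why smarter search strategies pay off once several hyperparameters are tuned together.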
- 58.
OML4Py - Deployment Architecture
Diagram: Oracle Database hosts OAA / OML4Py with a Python 3 engine, the Python script repository, and the Python object datastore; it is accessed from web browsers through a Zeppelin / Jupyter web interface, from the OML4Py client Python engine, and from Oracle Analytics Cloud, Oracle Data Visualization Desktop, and OBIEE; it connects to BDA / Hadoop through Big Data SQL
- 59.
Summary - Oracle Machine Learning for Python
• Oracle Database enabled with Python scripting language and environment for the enterprise via Oracle Advanced Analytics option
• Oracle’s Python technologies extend Python for enterprise use
– Supports data analysis, exploration, and machine learning
– Enables streamlined production development
– Automates key data science steps for greater data scientist productivity, while enhancing accuracy and performance
• Achieve performance and scalability leveraging Oracle Database as a high performance compute engine