SlideShare ist ein Scribd-Unternehmen logo
1 von 42
Downloaden Sie, um offline zu lesen
Practical Machine Learning for DBAs

Alex Gorbachev
Las Vegas, NV
April 2014
Alex Gorbachev
• Chief Technology Officer at Pythian
• Blogger
• Cloudera Champion of Big Data
• OakTable Network member
• Oracle ACE Director
• Founder of BattleAgainstAnyGuess.com
• Founder of Sydney Oracle Meetup
• IOUG Director of Communities
• EVP, Ottawa Oracle User Group
Agenda
• What’s Machine Learning
– Typical Machine Learning applications
• Why using Oracle Database for
Machine Learning
• Practical examples
– Classifying PL/SQL code
– Classifying database schemas into good
and bad
– SQL statements clustering
– Detecting anomalies in database
workload
What is Machine Learning?
data magic
scientific	

data	

analysis
modern	

practical	

AI
building simplified	

models of the universe	

using probabilistic models
Tom Mitchell’s definition
• Machine Learning is the study of computer
algorithms that improve automatically through
experience.
!
• A computer program is said to learn from
experience E with respect to some task T and some
performance measure P, if its performance on T, as
measured by P, improves with experience E.
Why is it useful?
Why is it useful?
Why is it useful?
Why is it useful?
Classes of ML algorithms
• Supervised learning
– Input: data + known facts; Output - predictions
• Unsupervised learning
– Input: data; Output – hypothesis
!
– Other less common algorithms such as reinforcement
learning, recommenders and etc
Supervised Learning: Linear Regression
Supervised Learning: Classification
Unsupervised Learning: Clustering
Unsupervised Learning: Anomaly Detection
Machine Learning workflow
• Gather
• Clean & transform
• Explore
• Model
• Interpret
• Produce value
} today’s focus
Why Machine Learning in
Oracle Database?
Machine Learning in Oracle DB?
• That’s where the data is
• Data in an RDBMS is often clean
• Easy to transform data with SQL
• Powerful algorithms implemented
– Oracle Data Mining option
– Analytic SQL
Machine Learning by example
Applying Machine Learning
to the business of DBAs
Problem: Detect bad PL/SQL
• Goal: automated PL/SQL code grading
– Classify as Good or Bad
• Typical classification task
– Assignment of labels to the set of unlabeled items
based on prior observations
Classification process
• Parse input data
• Extract features
– Manually or automatically or they are clearly defined (if
row is an item, columns may be features)
• Train – calculate model based on labeled input
• Verify – test model on labeled input
• Apply labels to unlabeled input
!
• Classification is supervised learning
Features definition - easy task?
Kittens vs …
Kittens vs Puppies
PL/SQL code features
• Automatically extract words from the text as
features (tokenize)
– EASY TO AUTOMATE
• Assign features intelligently
– Code size
– Author
– Percent of comment lines
– Presence of specific code patterns
– DIFFICULT TO AUTOMATE
Classification model workflow
1. Create Oracle Text policy (define lexer)
2. Configure and build the model on training set
3. Apply model to the testing set
4. Assess model performance
5. Adjust model settings/features/size and repeat
Basic probability lesson
• p(A) is the probability that A is true
A is
false
A is
true
Area is 1
Basic probability lesson
• p(A) is the probability that A is true
• Axioms of Probability
Basic probability lesson
• p(A) is the probability that A is true
• Axioms of Probability
!
!
!
!
• Bayes Law
How Bayes Law can work for us?
!
!
!
• A – presence of a feature
like WHEN OTHERS THEN NULL in PL/SQL
• B – bad PL/SQL code
A
B
Area is 1
B|A
PL/SQL data source
• OBJECT_ID – case ID
• CODE – text column
• TARGET_VALUE – 0 is good and 1 is bad
• Training set
– where mod(object_id, 10) < 5
• Testing set
– where mod(object_id, 10) >= 5
Oracle Text policy
begin
begin
ctx_ddl.drop_policy('plsql_nb_policy');
exception when others then null;
end;
begin
ctx_ddl.drop_preference('plsql_nb_lexer');
exception when others then null;
end;
ctx_ddl.create_preference
('plsql_nb_lexer’, 'BASIC_LEXER');
ctx_ddl.create_policy
('plsql_nb_policy', lexer=>'plsql_nb_lexer');
end;
/
Model settings
CREATE TABLE plsql_nb_settings (
setting_name VARCHAR2(30),
setting_value VARCHAR2(4000));
BEGIN
-- Populate settings table
INSERT INTO plsql_svm_settings VALUES
(dbms_data_mining.algo_name, dbms_data_mining.algo_naive_bayes);
INSERT INTO plsql_nb_settings VALUES
(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
INSERT INTO plsql_nb_settings VALUES
(dbms_data_mining.odms_text_policy_name, 'plsql_nb_policy');
-- INSERT INTO plsql_nb_settings VALUES
-- (dbms_data_mining.NABS_PAIRWISE_THRESHOLD,0.01);
-- INSERT INTO plsql_nb_settings VALUES
-- (dbms_data_mining.NABS_SINGLETON_THRESHOLD,0.01);
COMMIT;
END;
/
Build model
DECLARE
xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
BEGIN DBMS_DATA_MINING.DROP_MODEL('PLSQL_NB');
EXCEPTION WHEN OTHERS THEN NULL; END;
!
dbms_data_mining_transform.SET_TRANSFORM(
xformlist, 'code', null, 'code', null, 'TEXT(TOKEN_TYPE:NORMAL)');
!
DBMS_DATA_MINING.CREATE_MODEL(
model_name => 'PLSQL_NB',
mining_function => dbms_data_mining.classification,
data_table_name => 'plsql_build',
case_id_column_name => 'object_id',
target_column_name => 'target_value',
settings_table_name => 'plsql_nb_settings',
xform_list => xformlist);
END;
/
Test model
SELECT
target_value AS actual_target,
PREDICTION(plsql_nb USING *) AS predicted_target,
COUNT(*) AS cases_count
FROM plsql_test
GROUP BY target_value,
PREDICTION(plsql_nb USING *)
ORDER BY 1, 2;
Demo
40
Skyline and Oculus by Etsy
blackbox anomaly detection
41
Thanks and Q&A
Contact info
gorbachev@pythian.com
+1-877-PYTHIAN
To follow us
pythian.com/blog
@alexgorbachev

@pythian
linkedin.com/company/pythian

Weitere ähnliche Inhalte

Was ist angesagt?

DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationTammy Bednar
 
The Art of Intelligence – Introduction Machine Learning for Oracle profession...
The Art of Intelligence – Introduction Machine Learning for Oracle profession...The Art of Intelligence – Introduction Machine Learning for Oracle profession...
The Art of Intelligence – Introduction Machine Learning for Oracle profession...Lucas Jellema
 
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...
Database Cloud Services Office Hours : Oracle sharding  hyperscale globally d...Database Cloud Services Office Hours : Oracle sharding  hyperscale globally d...
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...Tammy Bednar
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)avanttic Consultoría Tecnológica
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Databricks
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahData Con LA
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problemsAbhishek Gupta
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?Tammy Bednar
 
Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationMichael Rainey
 
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the MapTammy Bednar
 
EDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB
 
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...Tammy Bednar
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsairisData
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1PgTraining
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Connor McDonald
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkJames Chen
 

Was ist angesagt? (20)

DBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through MigrationDBCS Office Hours - Modernization through Migration
DBCS Office Hours - Modernization through Migration
 
The Art of Intelligence – Introduction Machine Learning for Oracle profession...
The Art of Intelligence – Introduction Machine Learning for Oracle profession...The Art of Intelligence – Introduction Machine Learning for Oracle profession...
The Art of Intelligence – Introduction Machine Learning for Oracle profession...
 
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...
Database Cloud Services Office Hours : Oracle sharding  hyperscale globally d...Database Cloud Services Office Hours : Oracle sharding  hyperscale globally d...
Database Cloud Services Office Hours : Oracle sharding hyperscale globally d...
 
SQL On Hadoop
SQL On HadoopSQL On Hadoop
SQL On Hadoop
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
 
Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...Improving Python and Spark Performance and Interoperability with Apache Arrow...
Improving Python and Spark Performance and Interoperability with Apache Arrow...
 
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi MensahTurning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
Turning Relational Database Tables into Hadoop Datasources by Kuassi Mensah
 
Streaming Solutions for Real time problems
Streaming Solutions for Real time problemsStreaming Solutions for Real time problems
Streaming Solutions for Real time problems
 
#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?#dbhouseparty - Should I be building Microservices?
#dbhouseparty - Should I be building Microservices?
 
Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
 
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
#dbhouseparty - Spatial Technologies - @Home and Everywhere Else on the Map
 
EDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to PostgresEDB's Migration Portal - Migrate from Oracle to Postgres
EDB's Migration Portal - Migrate from Oracle to Postgres
 
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...
Database@Home - Data Driven : Loading, Indexing, and Searching with Text and ...
 
Getting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analyticsGetting Spark ready for real-time, operational analytics
Getting Spark ready for real-time, operational analytics
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Spark mhug2
Spark mhug2Spark mhug2
Spark mhug2
 

Ähnlich wie Introduction to Machine Learning for Oracle Database Professionals

ETL for the masses with Power Query and M
ETL for the masses with Power Query and METL for the masses with Power Query and M
ETL for the masses with Power Query and MRégis Baccaro
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptxjasontseng19
 
Querying_with_T-SQL_-_01.pptx
Querying_with_T-SQL_-_01.pptxQuerying_with_T-SQL_-_01.pptx
Querying_with_T-SQL_-_01.pptxQuyVo27
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedLaurenz Wuttke
 
Object Oriented Programming Principles
Object Oriented Programming PrinciplesObject Oriented Programming Principles
Object Oriented Programming PrinciplesAndrew Ferlitsch
 
Data Warehousing Tutorials
Data Warehousing TutorialsData Warehousing Tutorials
Data Warehousing TutorialsSofia Khan
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contentsManoj Jagtap
 
Azure machine learning tech mela
Azure machine learning tech melaAzure machine learning tech mela
Azure machine learning tech melaYogendra Tamang
 
Schema less table & dynamic schema
Schema less table & dynamic schemaSchema less table & dynamic schema
Schema less table & dynamic schemaDavide Mauri
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion EngineAdam Doyle
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxIvo Andreev
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureSanil Mhatre
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningJoaquin Delgado PhD.
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningS. Diana Hu
 
How to Start a Career in Data Science - Jovian.ml
How to Start a Career in Data Science - Jovian.ml How to Start a Career in Data Science - Jovian.ml
How to Start a Career in Data Science - Jovian.ml Aakash N S
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 

Ähnlich wie Introduction to Machine Learning for Oracle Database Professionals (20)

ETL for the masses with Power Query and M
ETL for the masses with Power Query and METL for the masses with Power Query and M
ETL for the masses with Power Query and M
 
machine learning workflow with data input.pptx
machine learning workflow with data input.pptxmachine learning workflow with data input.pptx
machine learning workflow with data input.pptx
 
Querying_with_T-SQL_-_01.pptx
Querying_with_T-SQL_-_01.pptxQuerying_with_T-SQL_-_01.pptx
Querying_with_T-SQL_-_01.pptx
 
Making Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons LearnedMaking Data Science Scalable - 5 Lessons Learned
Making Data Science Scalable - 5 Lessons Learned
 
Object Oriented Programming Principles
Object Oriented Programming PrinciplesObject Oriented Programming Principles
Object Oriented Programming Principles
 
Data Warehousing Tutorials
Data Warehousing TutorialsData Warehousing Tutorials
Data Warehousing Tutorials
 
Where to Start ETL Developer Career
Where to Start ETL Developer CareerWhere to Start ETL Developer Career
Where to Start ETL Developer Career
 
Etl testing contents
Etl testing contentsEtl testing contents
Etl testing contents
 
Azure machine learning tech mela
Azure machine learning tech melaAzure machine learning tech mela
Azure machine learning tech mela
 
ORMs Meet SQL
ORMs Meet SQLORMs Meet SQL
ORMs Meet SQL
 
Schema less table & dynamic schema
Schema less table & dynamic schemaSchema less table & dynamic schema
Schema less table & dynamic schema
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion Engine
 
Apache Spark MLlib
Apache Spark MLlib Apache Spark MLlib
Apache Spark MLlib
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
ORM Methodology
ORM MethodologyORM Methodology
ORM Methodology
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & Azure
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine LearningLucene/Solr Revolution 2015: Where Search Meets Machine Learning
Lucene/Solr Revolution 2015: Where Search Meets Machine Learning
 
How to Start a Career in Data Science - Jovian.ml
How to Start a Career in Data Science - Jovian.ml How to Start a Career in Data Science - Jovian.ml
How to Start a Career in Data Science - Jovian.ml
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 

Mehr von Alex Gorbachev

Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Alex Gorbachev
 
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...Alex Gorbachev
 
Benchmarking Oracle I/O Performance with Orion by Alex Gorbachev
Benchmarking Oracle I/O Performance with Orion by Alex GorbachevBenchmarking Oracle I/O Performance with Orion by Alex Gorbachev
Benchmarking Oracle I/O Performance with Orion by Alex GorbachevAlex Gorbachev
 
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...Alex Gorbachev
 
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...Alex Gorbachev
 
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, PythianMOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, PythianAlex Gorbachev
 
Oracle ASM 11g - The Evolution
Oracle ASM 11g - The EvolutionOracle ASM 11g - The Evolution
Oracle ASM 11g - The EvolutionAlex Gorbachev
 
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...Alex Gorbachev
 

Mehr von Alex Gorbachev (8)

Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
 
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...
UTHOC2 - Under The Hood of Oracle Clusterware 2.0 - Grid Infrastructure by Al...
 
Benchmarking Oracle I/O Performance with Orion by Alex Gorbachev
Benchmarking Oracle I/O Performance with Orion by Alex GorbachevBenchmarking Oracle I/O Performance with Orion by Alex Gorbachev
Benchmarking Oracle I/O Performance with Orion by Alex Gorbachev
 
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...
Demystifying Oracle RAC Workload Management by Alex Gorbachev, Pythian | NoCO...
 
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...
MOW2010: 1TB MySQL Database Migration and HA Infrastructure by Alex Gorbachev...
 
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, PythianMOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
MOW2010: Under the Hood of Oracle Clusterware by Alex Gorbachev, Pythian
 
Oracle ASM 11g - The Evolution
Oracle ASM 11g - The EvolutionOracle ASM 11g - The Evolution
Oracle ASM 11g - The Evolution
 
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...
Oracle 11g New Features Out-of-the-Box by Alex Gorbachev (from Sydney Oracle ...
 

Kürzlich hochgeladen

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Kürzlich hochgeladen (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Introduction to Machine Learning for Oracle Database Professionals

  • 1. Practical Machine Learning for DBAs
 Alex Gorbachev Las Vegas, NV April 2014
  • 2. Alex Gorbachev • Chief Technology Officer at Pythian • Blogger • Cloudera Champion of Big Data • OakTable Network member • Oracle ACE Director • Founder of BattleAgainstAnyGuess.com • Founder of Sydney Oracle Meetup • IOUG Director of Communities • EVP, Ottawa Oracle User Group
  • 3. Agenda • What’s Machine Learning – Typical Machine Learning applications • Why using Oracle Database for Machine Learning • Practical examples – Classifying PL/SQL code – Classifying database schemas into good and bad – SQL statements clustering – Detecting anomalies in database workload
  • 4. What is Machine Learning?
  • 8. building simplified models of the universe using probabilistic models
  • 9. Tom Mitchell’s definition • Machine Learning is the study of computer algorithms that improve automatically through experience. ! • A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  • 10. Why is it useful?
  • 11. Why is it useful?
  • 12. Why is it useful?
  • 13. Why is it useful?
  • 14. Classes of ML algorithms • Supervised learning – Input: data + known facts; Output - predictions • Unsupervised learning – Input: data; Output – hypothesis ! – Other less common algorithms such as reinforcement learning, recommenders and etc
  • 19. Machine Learning workflow • Gather • Clean & transform • Explore • Model • Interpret • Produce value } today’s focus
  • 20. Why Machine Learning in Oracle Database?
  • 21. Machine Learning in Oracle DB? • That’s where the data is • Data in an RDBMS is often clean • Easy to transform data with SQL • Powerful algorithms implemented – Oracle Data Mining option – Analytic SQL
  • 22. Machine Learning by example Applying Machine Learning to the business of DBAs
  • 23. Problem: Detect bad PL/SQL • Goal: automated PL/SQL code grading – Classify as Good or Bad • Typical classification task – Assignment of labels to the set of unlabeled items based on prior observations
  • 24. Classification process • Parse input data • Extract features – Manually or automatically or they are clearly defined (if row is an item, columns may be features) • Train – calculate model based on labeled input • Verify – test model on labeled input • Apply labels to unlabeled input ! • Classification is supervised learning
  • 25. Features definition - easy task?
  • 28. PL/SQL code features • Automatically extract words from the text as features (tokenize) – EASY TO AUTOMATE • Assign features intelligently – Code size – Author – Percent of comment lines – Presence of specific code patterns – DIFFICULT TO AUTOMATE
  • 29. Classification model workflow 1. Create Oracle Text policy (define lexer) 2. Configure and build the model on training set 3. Apply model to the testing set 4. Assess model performance 5. Adjust model settings/features/size and repeat
  • 30. Basic probability lesson • p(A) is the probability that A is true A is false A is true Area is 1
  • 31. Basic probability lesson • p(A) is the probability that A is true • Axioms of Probability
  • 32. Basic probability lesson • p(A) is the probability that A is true • Axioms of Probability ! ! ! ! • Bayes Law
  • 33. How Bayes Law can work for us? ! ! ! • A – presence of a feature like WHEN OTHERS THEN NULL in PL/SQL • B – bad PL/SQL code A B Area is 1 B|A
  • 34. PL/SQL data source • OBJECT_ID – case ID • CODE – text column • TARGET_VALUE – 0 is good and 1 is bad • Training set – where mod(object_id, 10) < 5 • Testing set – where mod(object_id, 10) >= 5
  • 35. Oracle Text policy begin begin ctx_ddl.drop_policy('plsql_nb_policy'); exception when others then null; end; begin ctx_ddl.drop_preference('plsql_nb_lexer'); exception when others then null; end; ctx_ddl.create_preference ('plsql_nb_lexer’, 'BASIC_LEXER'); ctx_ddl.create_policy ('plsql_nb_policy', lexer=>'plsql_nb_lexer'); end; /
  • 36. Model settings CREATE TABLE plsql_nb_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(4000)); BEGIN -- Populate settings table INSERT INTO plsql_svm_settings VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_naive_bayes); INSERT INTO plsql_nb_settings VALUES (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on); INSERT INTO plsql_nb_settings VALUES (dbms_data_mining.odms_text_policy_name, 'plsql_nb_policy'); -- INSERT INTO plsql_nb_settings VALUES -- (dbms_data_mining.NABS_PAIRWISE_THRESHOLD,0.01); -- INSERT INTO plsql_nb_settings VALUES -- (dbms_data_mining.NABS_SINGLETON_THRESHOLD,0.01); COMMIT; END; /
  • 37. Build model DECLARE xformlist dbms_data_mining_transform.TRANSFORM_LIST; BEGIN BEGIN DBMS_DATA_MINING.DROP_MODEL('PLSQL_NB'); EXCEPTION WHEN OTHERS THEN NULL; END; ! dbms_data_mining_transform.SET_TRANSFORM( xformlist, 'code', null, 'code', null, 'TEXT(TOKEN_TYPE:NORMAL)'); ! DBMS_DATA_MINING.CREATE_MODEL( model_name => 'PLSQL_NB', mining_function => dbms_data_mining.classification, data_table_name => 'plsql_build', case_id_column_name => 'object_id', target_column_name => 'target_value', settings_table_name => 'plsql_nb_settings', xform_list => xformlist); END; /
  • 38. Test model SELECT target_value AS actual_target, PREDICTION(plsql_nb USING *) AS predicted_target, COUNT(*) AS cases_count FROM plsql_test GROUP BY target_value, PREDICTION(plsql_nb USING *) ORDER BY 1, 2;
  • 39. Demo
  • 40. 40
  • 41. Skyline and Oculus by Etsy blackbox anomaly detection 41
  • 42. Thanks and Q&A Contact info gorbachev@pythian.com +1-877-PYTHIAN To follow us pythian.com/blog @alexgorbachev
 @pythian linkedin.com/company/pythian