 	
  	
www.oralytics.com   t: @brendantierney   e: brendan.tierney@oralytics.com
The one language to rule all your data
Brendan Tierney
 	
  	
•  Data Warehousing since 1997
•  Data Mining since 1998
•  Analytics since 1993
Store, Access, Analyze, Protect

Analyze
SELECT product, SUM(sale) AS "Total Sales"
FROM order_details
GROUP BY product;
Analyze	
  
SELECT product, SUM(sale) AS "Total Sales"
FROM order_details
GROUP BY product
HAVING SUM(sale) >= 10000;
 	
  	
Let us start with some Basics
SUM(x)
AVG(x)
STDDEV(x)
CORR(x, y)
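A minimal sketch of these four functions in one query. It assumes the order_details table from the earlier Analyze slides; the quantity column is hypothetical and is there only so that CORR has a second numeric input.

```sql
-- Sketch only: order_details comes from the earlier slides,
-- quantity is a hypothetical column added for illustration.
SELECT product,
       SUM(sale)            AS total_sales,
       AVG(sale)            AS avg_sale,
       STDDEV(sale)         AS stddev_sale,
       CORR(sale, quantity) AS corr_sale_quantity  -- Pearson correlation
FROM   order_details
GROUP  BY product;
```

All four are ordinary aggregates, so they combine with GROUP BY and HAVING in the same way as SUM in the previous queries.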
  
 	
  	
Creating a story about our data.
How do we do Analytics?
Sometimes it is how we are told to do Analytics
Do we really need to use other tools & languages?
But!
Our data no longer fits on our laptop.
A Big Data issue?
R - The Challenges
•  Scalability
   –  Regardless of the number of cores on your CPU, R will only use 1 on a default build
•  Performance
   –  R reads data into memory by default. It is easy to exhaust RAM by storing unnecessary data. Typically R will throw an exception at 2GB.
   –  Parallelization can be a challenge. It is not the default, although packages are available.
•  Production Deployment
   –  Difficulties deploying R in production
   –  Typically need to re-code in …..
I’m getting too old for this new stuff!
Can you teach an old dog new tricks?
What if you could use the language and skills you already have?
Did you know?
 	
  	
Statistical Functions in Oracle
All of these are FREE with the Database
These are often forgotten about
DBMS_STAT_FUNCS
set serveroutput on
declare
   s  DBMS_STAT_FUNCS.SummaryType;
begin
   -- Summarise column AGE of DMUSER.MINING_DATA_BUILD_V.
   -- The 4th argument (3) is the sigma value used to flag extreme values.
   DBMS_STAT_FUNCS.SUMMARY('DMUSER', 'MINING_DATA_BUILD_V', 'AGE', 3, s);

   dbms_output.put_line('SUMMARY STATISTICS');
   dbms_output.put_line('Count  : '||s.count);
   dbms_output.put_line('Min    : '||s.min);
   dbms_output.put_line('Max    : '||s.max);
   dbms_output.put_line('Range  : '||s.range);
   dbms_output.put_line('Mean   : '||round(s.mean));
   dbms_output.put_line('Mode Count : '||s.cmode.count);
   dbms_output.put_line('Mode        : '||s.cmode(1));
   dbms_output.put_line('Variance    : '||round(s.variance));
   dbms_output.put_line('Stddev      : '||round(s.stddev));
   dbms_output.put_line('Quantile 5  : '||s.quantile_5);
   dbms_output.put_line('Quantile 25 : '||s.quantile_25);
   dbms_output.put_line('Median      : '||s.median);
   dbms_output.put_line('Quantile 75 : '||s.quantile_75);
   dbms_output.put_line('Quantile 95 : '||s.quantile_95);
   dbms_output.put_line('Extreme Count : '||s.extreme_values.count);
   dbms_output.put_line('Extremes      : '||s.extreme_values(1));
   dbms_output.put_line('Top 5 : '||s.top_5_values(1)||','||
                                    s.top_5_values(2)||','||
                                    s.top_5_values(3)||','||
                                    s.top_5_values(4)||','||
                                    s.top_5_values(5));
   dbms_output.put_line('Bottom 5 : '||s.bottom_5_values(5)||','||
                                       s.bottom_5_values(4)||','||
                                       s.bottom_5_values(3)||','||
                                       s.bottom_5_values(2)||','||
                                       s.bottom_5_values(1));
end;
/
http://www.oralytics.com/2013/04/part-1getting-started-with-statistics.html
Stats and More Stats
•  Correlations (Spearman’s, Kendall)
•  Linear Regression
•  T-Test
•  F-Test
•  Hypothesis testing
•  Anova
•  Ranking
•  Window Aggregate functions
•  Lead / Lag
•  Cross Tabulation
•  PIVOT / UNPIVOT
•  …
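As a rough illustration of a few items in this list (ranking, window aggregates, Lead / Lag and a Spearman correlation), the sketch below again assumes the order_details table from the earlier slides. The sale_date and quantity columns are hypothetical and exist only for this example.

```sql
-- Sketch only: sale_date and quantity are hypothetical columns.
SELECT product,
       sale_date,
       sale,
       -- Ranking: position of each sale within its product, largest first
       RANK()     OVER (PARTITION BY product ORDER BY sale DESC)  AS sale_rank,
       -- Lead / Lag: neighbouring rows in date order
       LAG(sale)  OVER (PARTITION BY product ORDER BY sale_date)  AS prev_sale,
       LEAD(sale) OVER (PARTITION BY product ORDER BY sale_date)  AS next_sale,
       -- Window aggregate: moving average over the current and 2 previous rows
       AVG(sale)  OVER (PARTITION BY product ORDER BY sale_date
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3
FROM   order_details;

-- Spearman's rho correlation coefficient between two measures
SELECT CORR_S(sale, quantity) AS spearman_rho
FROM   order_details;
```

Everything here runs where the data lives, so no rows need to be exported to another tool first.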
  
 	
  	
Scalable
Highly Secure
No Data Movement
Real Time
Production Deployment
Faster
Comprehensive Advanced Analytics Platform
Oracle Data Mining
•  PL/SQL Package
   –  DBMS_DATA_MINING
   –  DBMS_DATA_MINING_TRANSFORM
   –  DBMS_PREDICTIVE_ANALYTICS
•  SQL Functions
   –  PREDICTION
   –  PREDICTION_PROBABILITY
   –  PREDICTION_BOUNDS
   –  PREDICTION_COST
   –  PREDICTION_DETAILS
   –  PREDICTION_SET
   –  CLUSTER_ID
   –  CLUSTER_DETAILS
   –  CLUSTER_DISTANCE
   –  CLUSTER_PROBABILITY
   –  CLUSTER_SET
   –  FEATURE_ID
   –  FEATURE_DETAILS
   –  FEATURE_SET
   –  FEATURE_VALUE
•  12c – Predictive Queries
   –  aka Dynamic Queries
   –  Transitive dynamic Data Mining models
   –  Can scale to many (100+) models, all in one statement
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER () pred_affinity_card
from mining_data_build_v;
A Predictive Query (PQ) to predict the AFFINITY_CARD value.
Using all the data: USING *
With PQs we can dynamically create new DM models based on one or more attributes.
select cust_id, affinity_card,
PREDICTION( FOR to_char(affinity_card) USING *) OVER
(PARTITION BY "COUNTRY_NAME") pred_affinity_card
from mining_data_build_v;
A new DM model will be created for each country (19).
Analytic Functions (in 12c)
>46 Analytic Functions in 12c
What about R?
--
-- There are 2 ways to use the GLM model: in Batch and in Real-Time mode
--
-- First step: build the in-database R script to score your new data
--
begin
   sys.rqScriptDrop('Demo_GLM_Batch');
   sys.rqScriptCreate('Demo_GLM_Batch',
      'function(dat, datastore_name) {
         ore.load(datastore_name)
         prd <- predict(mod, newdata=dat)
         prd[as.integer(rownames(prd))] <- prd
         res <- cbind(dat, PRED = prd)
         res}');
end;
/
--
-- Now you can run the script to score the new data in Batch mode
-- The data is located in the view MINING_DATA_APPLY_V
--
select * from table(rqTableEval(
   cursor(select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION,
                 HOUSEHOLD_SIZE, YRS_RESIDENCE
          from MINING_DATA_APPLY_V
          where rownum <= 10),
   cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name" from dual),
   'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION,
           HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY_V',
   'Demo_GLM_Batch'))
order by 1, 2, 3;
 	
  	
Store, Access, Analyze, Protect
Diagram labels: External View, External View, Conceptual Schema, Physical Schema
CREATE TABLE countries_ext (
   country_code      VARCHAR2(5),
   country_name      VARCHAR2(50),
   country_language  VARCHAR2(50)
)
ORGANIZATION EXTERNAL (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY ext_tab_data
   ACCESS PARAMETERS (
      RECORDS DELIMITED BY NEWLINE
      FIELDS TERMINATED BY ','
      MISSING FIELD VALUES ARE NULL
      (
         country_code      CHAR(5),
         country_name      CHAR(50),
         country_language  CHAR(50)
      )
   )
   LOCATION ('Countries1.txt','Countries2.txt')
)
PARALLEL 5
REJECT LIMIT UNLIMITED;

SELECT * FROM countries_ext ORDER BY country_name;

COUNT COUNTRY_NAME                 COUNTRY_LANGUAGE
----- ---------------------------- -----------------------------
ENG   England                      English
FRA   France                       French
GER   Germany                      German
IRE   Ireland                      English
CREATE TABLE json_dump_file_contents (
   json_document CLOB
)
ORGANIZATION EXTERNAL (
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY order_entry_dir
   ACCESS PARAMETERS (
      RECORDS DELIMITED BY 0x'0A'
      DISABLE_DIRECTORY_LINK_CHECK
      BADFILE loader_output_dir: 'JSONDumpFile.bad'
      LOGFILE order_entry_dir: 'JSONDumpFile.log'
      FIELDS (
         json_document CHAR(5000)
      )
   )
   LOCATION (order_entry_dir:'PurchaseOrders.dmp')
)
PARALLEL
REJECT LIMIT UNLIMITED;

SELECT count(*)
FROM   json_dump_file_contents po
WHERE  to_number(json_value(json_document, '$.PONumber')) > 1500;
CREATE TABLE json_documents (
   id    RAW(16) NOT NULL,
   data  CLOB,
   CONSTRAINT json_documents_pk PRIMARY KEY (id),
   CONSTRAINT json_documents_json_chk CHECK (data IS JSON (STRICT))
);

-- IS JSON (STRICT) rejects bare words, so the value of "Active" is a quoted string
INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{ "FirstName" : "Brendan",
           "LastName" : "Tierney",
           "Job" : "Clerk",
           "Address" : { "Street" : "1 Main Street",
                         "City" : "Dublin",
                         "Country" : "Ireland" },
           "ContactDetails" : { "Email" : "xyz@oralytics.com",
                                "Phone" : "353 123 1234567",
                                "Twitter" : "@brendantierney" },
           "DateOfBirth" : "01-JAN-2000",
           "Active" : "unknown" }');

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
ORDER  BY a.data.FirstName, a.data.LastName;

FIRSTNAME       LASTNAME        EMAIL
--------------- --------------- -------------------------
Brendan         Tierney         xyz@oralytics.com
CREATE TABLE customer (
   id        NUMBER(38),
   name      VARCHAR2(100),
   address   VARCHAR2(100),
   city      VARCHAR2(40),
   country   VARCHAR2(50),
   location  MDSYS.SDO_GEOMETRY
);

INSERT INTO customer VALUES (
   cust_seq.nextval, 'Brendan Tierney', '1 Main Street', 'Dublin', 'Ireland',
   SDO_GEOMETRY(
      2001,          -- Geometry Type: 2-D Point
      8307,          -- SRID, Datum: WGS84
      SDO_POINT_TYPE(
         -6.2603,    -- Longitude for Dublin (west is negative)
         53.3498,    -- Latitude for Dublin
         NULL),
      NULL,
      NULL
   )
);

-- Distance in kilometers between customers 1 and 2
SELECT sdo_geom.sdo_distance(c1.location, c2.location, 0.5, 'unit=kilometer')
FROM   customer c1,
       customer c2
WHERE  c1.id = 1
AND    c2.id = 2;
Diagram labels: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph
Using Oracle Big Data SQL, organizations can:
•  Combine data from Oracle Database, Apache Hadoop and NoSQL in a single SQL query
•  Query and analyze data in Apache Hadoop and NoSQL
•  Maximize query performance on all data using advanced techniques like Smart Scan, Partition Pruning, Storage Indexes, Bloom Filters and Predicate Push-Down in a distributed architecture
•  Integrate big data analyses into existing applications and architectures

Diagram labels: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph, Oracle NoSQL
Accessing data on Hadoop or an Oracle NoSQL Database requires access via Hive/HCatalog.

To use this Hadoop or NoSQL data:
•  Create a NoSQL Store and a Table (or Hadoop data)
•  Configure Hive/HCatalog to access the NoSQL Table or other data
•  Configure Oracle Database to talk to HCatalog via an external table
 	
  	
CREATE TABLE movieapp_log_json (
   custid       INTEGER,
   movieid      INTEGER,
   genreid      INTEGER,
   time         VARCHAR2(20),
   recommended  VARCHAR2(4),
   activity     NUMBER,
   rating       INTEGER,
   price        NUMBER
)
ORGANIZATION EXTERNAL (
   TYPE ORACLE_HIVE
   DEFAULT DIRECTORY DEFAULT_DIR
)
REJECT LIMIT UNLIMITED;

-- Selects Hadoop data (movieapp_log_json) and in-database data (movie)
SELECT f.custid, m.title, m.year, m.gross, f.rating
FROM   movieapp_log_json f, movie m
WHERE  f.movieId = m.movie_id
AND    f.rating > 4;
Store, Access, Analyze, Protect
Data Security
•  We can apply all the typical data security that comes with Oracle to all our data
   –  Masking/Redaction
   –  Virtual Private Databases
   –  Fine-grained access control
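One possible sketch of the redaction item, using the DBMS_REDACT package that is available from 12c. The schema, table, column and policy names are assumptions for illustration only (the customer table from the spatial example, placed in the DMUSER schema), and the privileges and licensing needed depend on your installation.

```sql
-- Sketch only: object, column and policy names are hypothetical.
BEGIN
   DBMS_REDACT.ADD_POLICY(
      object_schema => 'DMUSER',
      object_name   => 'CUSTOMER',
      policy_name   => 'redact_customer_address',
      column_name   => 'ADDRESS',
      function_type => DBMS_REDACT.FULL,   -- replace the whole value
      expression    => '1=1');             -- apply the policy to every query
END;
/
```

Once a policy like this is in place, the same SELECT statements keep working and the redaction is applied by the database, not by each application.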
  
 	
  	
Store, Access, Analyze, Protect
SQL

One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them

Sauron – Lord of the Rings
SQL

One SQL to rule them all, One SQL to find them,
One SQL to bring them all and in the Database bind them

Sauron – Lord of the Rings
Brendan Tierney
SQL

One Language to rule them all, One Language to find them,
One Language to bring them all and in the Database bind them

Sauron – Lord of the Rings
Brendan Tierney
brendan.tierney@oralytics.com
@brendantierney
www.oralytics.com
ie.linkedin.com/in/brendantierney
Word Cloud of the Oracle Advanced Analytics web-pages
http://www.oralytics.com/2015/01/creating-word-cloud-of-oracle-oaa.html

 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

SQL: The one language to rule all your data

  • 1. The one language to rule all your data. Brendan Tierney. www.oralytics.com  t: @brendantierney  e: brendan.tierney@oralytics.com
  • 3. § Data Warehousing since 1997
       § Data Mining since 1998
       § Analytics since 1993
  • 8. Analyze
       SELECT product, SUM(sale) AS "Total Sales"
       FROM order_details
       GROUP BY product;
  • 9. Analyze
       SELECT product, SUM(sale) AS "Total Sales"
       FROM order_details
       GROUP BY product
       HAVING SUM(sale) >= 10000;
  • 11. Let us start with some Basics
  • 15. SUM(x), AVG(x), STDDEV(x), CORR(x, y)
  • 19. Creating a story about our data.
  • 21. How do we do Analytics? Sometimes, how are we told to do Analytics?
  • 22. Do we really need to use other tools & languages?
  • 25. But!
  • 26. But!
  • 27. But! Our data no longer fits on our laptop. A Big Data issue?
  • 29. R - The Challenges
       § Scalability
          § Regardless of the number of cores on your CPU, R will only use 1 on a default build
       § Performance
          § R reads data into memory by default. It is easy to exhaust RAM by storing unnecessary data. Typically R will throw an exception at 2GB.
          § Parallelization can be a challenge. It is not the default, although packages are available
       § Production Deployment
          § Difficulties deploying R in production
          § Typically need to re-code in …..
  • 30. I'm getting too old for this new stuff! Can you teach an old dog new tricks?
  • 31. What if you could use the language and skills you already have?
  • 32. Did you know?
  • 33. Statistical Functions in Oracle. All of these are FREE with the Database. These are often forgotten about.
  • 35. DBMS_STAT_FUNCS
  • 36. set serveroutput on
       declare
          s  DBMS_STAT_FUNCS.SummaryType;
       begin
          -- Summarise the AGE column of DMUSER.MINING_DATA_BUILD_V.
          -- The fourth argument (3) is the sigma value used to flag extreme values.
          DBMS_STAT_FUNCS.SUMMARY('DMUSER', 'MINING_DATA_BUILD_V', 'AGE', 3, s);
          dbms_output.put_line('SUMMARY STATISTICS');
          dbms_output.put_line('Count  : '||s.count);
          dbms_output.put_line('Min    : '||s.min);
          dbms_output.put_line('Max    : '||s.max);
          dbms_output.put_line('Range  : '||s.range);
          dbms_output.put_line('Mean   : '||round(s.mean));
          dbms_output.put_line('Mode Count : '||s.cmode.count);
          dbms_output.put_line('Mode        : '||s.cmode(1));
          dbms_output.put_line('Variance    : '||round(s.variance));
          dbms_output.put_line('Stddev      : '||round(s.stddev));
          dbms_output.put_line('Quantile 5  : '||s.quantile_5);
          dbms_output.put_line('Quantile 25 : '||s.quantile_25);
          dbms_output.put_line('Median      : '||s.median);
          dbms_output.put_line('Quantile 75 : '||s.quantile_75);
          dbms_output.put_line('Quantile 95 : '||s.quantile_95);
          dbms_output.put_line('Extreme Count : '||s.extreme_values.count);
          dbms_output.put_line('Extremes      : '||s.extreme_values(1));
          dbms_output.put_line('Top 5 : '||s.top_5_values(1)||','||
                                           s.top_5_values(2)||','||
                                           s.top_5_values(3)||','||
                                           s.top_5_values(4)||','||
                                           s.top_5_values(5));
          dbms_output.put_line('Bottom 5 : '||s.bottom_5_values(5)||','||
                                              s.bottom_5_values(4)||','||
                                              s.bottom_5_values(3)||','||
                                              s.bottom_5_values(2)||','||
                                              s.bottom_5_values(1));
       end;
       /
       http://www.oralytics.com/2013/04/part-1getting-started-with-statistics.html
  • 37. Stats and More Stats
       § Correlations (Spearman's, Kendall)
       § Linear Regression
       § T-Test
       § F-Test
       § Hypothesis testing
       § Anova
       § Ranking
       § Window Aggregate functions
       § Lead / Lag
       § Cross Tabulation
       § PIVOT / UNPIVOT
       § …
  • 40. Scalable, Highly Secure, No Data Movement, Real Time, Production Deployment, Faster
  • 43. Comprehensive Advanced Analytics Platform
  • 44. Oracle Data Mining
       § PL/SQL Package
          § DBMS_DATA_MINING
          § DBMS_DATA_MINING_TRANSFORM
          § DBMS_PREDICTIVE_ANALYTICS
       § SQL Functions
          – PREDICTION, PREDICTION_PROBABILITY, PREDICTION_BOUNDS, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_SET
          – CLUSTER_ID, CLUSTER_DETAILS, CLUSTER_DISTANCE, CLUSTER_PROBABILITY, CLUSTER_SET
          – FEATURE_ID, FEATURE_DETAILS, FEATURE_SET, FEATURE_VALUE
       § 12c – Predictive Queries
          § aka Dynamic Queries
          § Transitive dynamic Data Mining models
          § Can scale to 100+ models, all in one statement
  • 45. select cust_id, affinity_card,
              PREDICTION( FOR to_char(affinity_card) USING *) OVER () pred_affinity_card
       from mining_data_build_v;
       PQ to predict the AFFINITY_CARD value. Using all the data: USING *
  • 46. select cust_id, affinity_card,
              PREDICTION( FOR to_char(affinity_card) USING *) OVER () pred_affinity_card
       from mining_data_build_v;
       With PQs we can dynamically create new DM models based on an Attribute(s)
  • 47. select cust_id, affinity_card,
              PREDICTION( FOR to_char(affinity_card) USING *) OVER (PARTITION BY "COUNTRY_NAME") pred_affinity_card
       from mining_data_build_v;
       A new DM Model will be created for each Country (19).
       With PQs we can dynamically create new DM models based on an Attribute(s)
  • 48. Analytic Functions (in 12c): >46 Analytics Functions in 12c
  • 50. What about R?
  • 51. --
       -- There are 2 ways to use the GLM model : in Batch and in Real-Time mode
       --
       -- First Step : Build the in-database R script to score your new data
       --
       Begin
          sys.rqScriptDrop('Demo_GLM_Batch');
          sys.rqScriptCreate('Demo_GLM_Batch',
             'function(dat, datastore_name) {
                 ore.load(datastore_name)
                 prd <- predict(mod, newdata=dat)
                 prd[as.integer(rownames(prd))] <- prd
                 res <- cbind(dat, PRED = prd)
                 res}');
       end;
       /
       --
       -- Now you can run the script to score the new data in Batch mode
       -- The data is located in the view MINING_DATA_APPLY_V
       --
       select * from table(rqTableEval(
          cursor(select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME,
                        CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE
                 from MINING_DATA_APPLY_V
                 where rownum <= 10),
          cursor(select 1 as "ore.connect", 'myDatastore' as "datastore_name" from dual),
          'select CUST_GENDER, AGE, CUST_MARITAL_STATUS, COUNTRY_NAME, CUST_INCOME_LEVEL, EDUCATION, HOUSEHOLD_SIZE, YRS_RESIDENCE, 1 PRED from MINING_DATA_APPLY_V',
          'Demo_GLM_Batch'))
       order by 1, 2, 3;
  • 53. Store   Access   Analyze   Protect  
  • 54. Diagram: External View, External View, Conceptual Schema, Physical Schema
  • 55. CREATE TABLE countries_ext (
          country_code      VARCHAR2(5),
          country_name      VARCHAR2(50),
          country_language  VARCHAR2(50)
       )
       ORGANIZATION EXTERNAL (
          TYPE ORACLE_LOADER
          DEFAULT DIRECTORY ext_tab_data
          ACCESS PARAMETERS (
             RECORDS DELIMITED BY NEWLINE
             FIELDS TERMINATED BY ','
             MISSING FIELD VALUES ARE NULL
             (
                country_code      CHAR(5),
                country_name      CHAR(50),
                country_language  CHAR(50)
             )
          )
          LOCATION ('Countries1.txt','Countries2.txt')
       )
       PARALLEL 5
       REJECT LIMIT UNLIMITED;

       SELECT * FROM countries_ext ORDER BY country_name;

       COUNT COUNTRY_NAME                 COUNTRY_LANGUAGE
       ----- ---------------------------- -----------------------------
       ENG   England                      English
       FRA   France                       French
       GER   Germany                      German
       IRE   Ireland                      English

       Diagram: External View, External View, Conceptual Schema, Physical Schema
  • 56. CREATE TABLE json_dump_file_contents (
          json_document CLOB
       )
       ORGANIZATION EXTERNAL (
          TYPE ORACLE_LOADER
          DEFAULT DIRECTORY order_entry_dir
          ACCESS PARAMETERS (
             RECORDS DELIMITED BY 0x'0A'
             DISABLE_DIRECTORY_LINK_CHECK
             BADFILE loader_output_dir: 'JSONDumpFile.bad'
             LOGFILE order_entry_dir: 'JSONDumpFile.log'
             FIELDS ( json_document CHAR(5000) )
          )
          LOCATION (order_entry_dir:'PurchaseOrders.dmp')
       )
       PARALLEL
       REJECT LIMIT UNLIMITED;

       SELECT count(*)
       FROM json_dump_file_contents po
       WHERE to_number(json_value(json_document, '$.PONumber')) > 1500;

       Diagram: External View, External View, Conceptual Schema, Physical Schema
  • 57. CREATE TABLE json_documents (
          id    RAW(16) NOT NULL,
          data  CLOB,
          CONSTRAINT json_documents_pk PRIMARY KEY (id),
          CONSTRAINT json_documents_json_chk CHECK (data IS JSON (STRICT))
       );

       INSERT INTO json_documents (id, data)
       VALUES (SYS_GUID(),
          '{ "FirstName" : "Brendan",
             "LastName" : "Tierney",
             "Job" : "Clerk",
             "Address" : { "Street" : "1 Main Street", "City" : "Dublin", "Country" : "Ireland" },
             "ContactDetails" : { "Email" : "xyz@oralytics.com", "Phone" : "353 123 1234567", "Twitter" : "@brendantierney" },
             "DateOfBirth" : "01-JAN-2000",
             "Active" : "unknown" }');

       SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email
       FROM json_documents a
       ORDER BY a.data.FirstName, a.data.LastName;

       FIRSTNAME       LASTNAME        EMAIL
       --------------- --------------- -------------------------
       Brendan         Tierney         xyz@oralytics.com

       Diagram: External View, External View, Conceptual Schema, Physical Schema
  • 58. CREATE TABLE customer (
          id        NUMBER(38),
          name      VARCHAR2(100),
          address   VARCHAR2(100),
          city      VARCHAR2(40),
          country   VARCHAR2(50),
          location  MDSYS.SDO_GEOMETRY
       );

       INSERT INTO customer VALUES (
          cust_seq.nextval, 'Brendan Tierney', '1 Main Street', 'Dublin', 'Ireland',
          SDO_GEOMETRY (2001,                      -- Geometry Type: 2-D Point
                        8307,                      -- SRID, Datum: WGS84
                        SDO_POINT_TYPE (-6.2603,   -- Longitude for Dublin
                                        53.3498,   -- Latitude for Dublin
                                        NULL),
                        NULL,
                        NULL )
       );

       SELECT sdo_geom.sdo_distance(c1.location, c2.location, 0.5, 'unit=kilometer')
       FROM customer c1, customer c2
       WHERE c1.id = 1 AND c2.id = 2;

       Diagram: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph
  • 59. Using Oracle Big Data SQL, organizations can:
       – Combine data from Oracle Database, Apache Hadoop and NoSQL in a single SQL query
       – Query and analyze data in Apache Hadoop and NoSQL
       – Maximize query performance on all data using advanced techniques like Smart Scan, Partition Pruning, Storage Indexes, Bloom Filters and Predicate Push-Down in a distributed architecture
       – Integrate big data analyses into existing applications and architectures
       Diagram: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph, Oracle NoSQL
  • 60. Accessing data on Hadoop or in an Oracle NoSQL Database requires access via Hive/HCatalog.
       To use this Hadoop or NoSQL data:
       – Create a NoSQL Store and a Table (or Hadoop data)
       – Configure Hive/HCatalog to access the NoSQL Table or other data
       – Configure Oracle Database to talk to HCatalog via an external table
       Diagram: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph, Oracle NoSQL
  • 61. CREATE TABLE movieapp_log_json (
          custid       INTEGER,
          movieid      INTEGER,
          genreid      INTEGER,
          time         VARCHAR2 (20),
          recommended  VARCHAR2 (4),
          activity     NUMBER,
          rating       INTEGER,
          price        NUMBER
       )
       ORGANIZATION EXTERNAL (
          TYPE ORACLE_HIVE
          DEFAULT DIRECTORY DEFAULT_DIR
       )
       REJECT LIMIT UNLIMITED;

       SELECT f.custid, m.title, m.year, m.gross, f.rating
       FROM movieapp_log_json f, movie m
       WHERE f.movieId = m.movie_id
       AND f.rating > 4;

       Selects Hadoop data and in-database data.
       Diagram: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph, Oracle NoSQL
  • 62. Store   Access   Analyze   Protect  
  • 64. Data Security
       § Can apply all the typical data security that comes with Oracle on all our data
          – Masking/Redaction
          – Virtual Private Databases
          – Fine-grained access control
  • 65. Store, Access, Analyze, Protect
       Diagram: External View, External View, Conceptual Schema, Physical Schema, Spatial & Graph, Oracle NoSQL
  • 67. SQL
       One Ring to rule them all, One Ring to find them,
       One Ring to bring them all and in the darkness bind them
       Sauron – Lord of the Rings
  • 68. SQL
       One SQL to rule them all, One SQL to find them,
       One SQL to bring them all and in the Database bind them
       Sauron – Lord of the Rings, Brendan Tierney
  • 69. SQL
       One Language to rule them all, One Language to find them,
       One Language to bring them all and in the Database bind them
       Sauron – Lord of the Rings, Brendan Tierney
  • 72. brendan.tierney@oralytics.com, @brendantierney, www.oralytics.com, ie.linkedin.com/in/brendantierney
  • 73. Word Cloud of the Oracle Advanced Analytics web-pages
       http://www.oralytics.com/2015/01/creating-word-cloud-of-oracle-oaa.html
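
The built-in statistical functions named on slides 15 and 37 can be combined in a single query. The following is only a minimal sketch, not taken from the deck: it assumes the MINING_DATA_BUILD_V view from slide 36 exposes the CUST_GENDER, AGE and YRS_RESIDENCE columns that slide 51 shows for MINING_DATA_APPLY_V, and the column aliases are made up for illustration.

```sql
-- Assumption: MINING_DATA_BUILD_V has CUST_GENDER, AGE and YRS_RESIDENCE columns
-- (slide 51 lists them for MINING_DATA_APPLY_V; the build view is assumed to match).
SELECT cust_gender,
       COUNT(*)                              AS customers,
       ROUND(AVG(age), 1)                    AS mean_age,
       ROUND(STDDEV(age), 1)                 AS stddev_age,
       MEDIAN(age)                           AS median_age,
       ROUND(CORR(age, yrs_residence), 3)    AS pearson_corr,   -- Pearson correlation
       ROUND(CORR_S(age, yrs_residence), 3)  AS spearman_corr   -- Spearman's rho
FROM   mining_data_build_v
GROUP  BY cust_gender
ORDER  BY cust_gender;
```

Because every function here is an ordinary aggregate, the summary runs inside the database with no data movement, which is the point slide 40 makes.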