Advanced Analytics Part II: Do More with Your Data (and your people too..)
DC: Rob Morrow, Senior Systems Engineer
MD: Richard Im, Senior Systems Engineer
September 3

Doing more with Cloudera Enterprise Data Hub and why there is no Competition to EDH.
Do more with data
• Deliver multi-genre analytics in a single platform
• Apply diverse concurrent analytics to your full datasets in-place
• Connect easily to Partner products
• Not more copies of the data; More analytics without moving the data
• The ONLY way to do that is with a single Tool-Rich platform that has Analytics baked-in. It’s more than just Map/Reduce…
• Protect the data products:
  • System Changes (Installation/Upgrades: CM)
  • Data Change (Navigator)
  • Data Over-writes (Spark vs Storm)
  • Unauthorized Access (Sentry, Kerberos, LDAP)
  • Encryption, Data Movement (Rhino, Intel, Gazzang)

Business Intelligence
If all we want to know is what has already happened, then BI is a fine answer.
[Figure: "What happened, where, and when?"; axis labels: Time, Facts, Interpretations]

How about the Future? How about Root Cause? How About “What If’s”? How about “fuzzy” multivariate questions?
[Figure: labels "What will happen?", "What happened, where, and when?", "How can we do better?", "How and why did it happen?"; axis labels: Time, Data Size, Facts, Interpretations]

A drive-by of Statistical options on EDH
Do more with data
• Oryx: Recommender System used for Production Purposes. Real-Time Updates.
• MADlib: In-database Machine Learning libraries for Impala.
• ImPyla: Python UDFs inside Impala. Really fast.
• Mahout: Yep, still there. Easy-to-use Bayesian models, Classifiers, Clustering, etc. Requires data wrangling.
• SparkML: Exact same algorithms as Mahout, only faster because it’s in-memory.
• SAS Visual Analytics: Utilizes Cloudera EDH as data hosting platform.
• RHadoop/Revolution Analytics: Quickly becoming de facto Data Science platform due to cost/performance
• H2O: In-Development, but very promising alternative to R/SAS

Spark: Why do we need it?
• 10x better performance than M/R
• Less Code written using Scala instead of Java
• Allows foundational improvements that drive many new features: in-memory indexing, etc.

How Cloudera Helps…
• Best Data Wrangling Tools on the planet, Period. And all of them at scale!
  • If you can write code in any language available inside Linux (Perl, shell, Java, Python, SQL), you can data wrangle on Cloudera’s EDH. Not to mention partners like Informatica, Pentaho, etc.
• Advanced Analytical Methods baked-in, at-scale and many different ways to utilize them.
  • Cloudera excels at having the right tools to answer tough, fuzzy questions that have an impact on the real world. If we don’t have it, we invent it then give it back!
• Try Another model, improve this one, track changes in results, all on one platform, and securely!
  • The combination of Cloudera EDH’s Navigator, Spark, Mahout, Sentry means you can run advanced models in an Enterprise-supportable way that allows granular tracking of every step.
• Cloudera have been providing advanced methods to the Government for years.
  • Cloudera are deployed to answer other tough questions all over the Intelligence Community (HUMINT, SIGINT, COMMINT, IMINT)
Which Mission problem do you want to address? These DO NOT and WILL NOT show up in our marketing collateral. So, talk to your Account SE!
[Figure labels: Prepare Data, Model, Data Mgt, Contextual Modelling]

Advanced Analytic Tools already on Cloudera’s Enterprise Data Hub
• Advanced Information Locating methods Available out-of-the-box in Cloudera Search
  • Fuzzy Search Finds “boat” and “float” with the same search query and is easily tunable.
  • Batch-query allows users to upload a file with thousands/millions of search terms and which index they want to search and returns a down-selectable set of data for all terms.
  • In-memory indexing coming soon!
• Graph Analytics Tools Extend EDH to uncover hidden relationships
  • Whether you choose Titan Aurelius, Giraph, or the Spark-enabled GraphX for your Graph analytic workload, Cloudera supports your Mission.
  • Spark technology was invented specifically to address iterative and self-referential data at the speed of memory!
• Machine Learning Methods to Predict future events and Rapidly Categorize current trends
  • Whether you choose a Naïve Bayesian filter to mobilize against Insider Threat, a Classifier to group objects for more effective models, or Principal Components Analysis to reveal the 3 relevant factors out of thousands, Cloudera’s EDH is ready for your workload.

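To make the fuzzy-search point concrete: one common way to decide that “boat” and “float” are close is edit distance. The sketch below is plain Python and is not Cloudera Search's implementation; the function names `edit_distance` and `fuzzy_match`, the `max_edits` threshold, and the sample terms are assumptions for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the terms within max_edits of the query (hypothetical helper)."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


# "boat" -> "float" takes one substitution (b -> f) and one insertion (l).
print(edit_distance("boat", "float"))
print(fuzzy_match("boat", ["boat", "float", "goat", "analytics"]))
```

Raising or lowering `max_edits` is the kind of tuning the bullet refers to: a larger threshold returns more, looser matches.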
Current Ways to make decisions:
• Highest Paid Person in the Organization: “I think we should go this way…”
OR
• Politics: “People like us more when we go this way…”
OR
• Subject Matter Expert: “My training/intuition says we should go this way…”
VS
• Data Scientist

Data Science Teams in the Real World. You already have the people…
• Estimates required capital (people and political), team care/feeding. Chooses alternatives. Fallout of each path?
• Provides legit and relevant hypotheses.
• Provides models/Probabilities
• Provides Code, tools, code. Tools.

Strategies that Drive Data Science into your Organization:
• Choose a single Specific Business/Mission problem
  • Never Choose IT first, and then look for a problem it solves. Great way to create a Welfare Program, but the most expensive and lowest payback projects in Government have this approach in common.
• Seek to Disconfirm an Idea/Approach
  • Never begin with an idea/tool you’d like to confirm. Science doesn’t work that way. Sorry.
• Promote Equal Contributions from Areas of Expertise and Acquire Tools as-needed
  • Do not allow micro-steering due to “personalities”. This is quite hard in practice, actually.
• Be honest about Team/Organizational Weaknesses
  • Though it takes slightly longer, creating a data-oriented center of gravity reduces risk, increases effectiveness, and balances contributions while naturally creating independent measures of success. But you can’t begin until you understand the gaps.

Let’s Pick One Area and Go Deeper:
Principal Components Analysis
If a multivariate dataset is viewed as a set of coordinates in a high-dimensional data space, PCA can supply the user with a lower-dimensionality picture.
If the original data has 1B variables, how few of these variables do you need in order to explain 90% of the variance? 80%? 60%?
How few dimensions do you need for a visualization? Humans easily visualize 3D with current technology, but no more.
[Figure: A 25-element high-dimensional space vector (term co-occurrences) of the word "road" rendered in gray scale. Original was 300K terms.]

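The "how few do you need for 90% of the variance?" question can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the analysis from the talk: the helper name `components_needed`, the 50-column toy matrix, and its three hidden factors are all assumptions made up for the example.

```python
import numpy as np


def components_needed(X, target):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target` (a fraction between 0 and 1)."""
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    cumulative = np.cumsum(variance) / variance.sum()
    count = int(np.searchsorted(cumulative, target)) + 1
    return min(count, len(cumulative))


# Synthetic data: 50 observed variables driven by only 3 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

for target in (0.6, 0.8, 0.9):
    print(f"{target:.0%} of variance: {components_needed(X, target)} component(s)")
```

Because the toy data has only three underlying factors, a handful of components covers nearly all the variance even though there are 50 columns, which is the effect the slide describes at much larger scale.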
Real-World: Rescue Objectivity from Subjectivity
People [who want money] complain of an increased focus on DoD training for “culture” over “language”, stating that…
A) “Culture” trains concurrently whenever you train “Language” because the two are linked.
B) This hypothesis isn’t testable because it’s far too subjective and too domain-specific. Assessment Requires SME’s--from the “language” skills community--to make recommendations.
Circular logic, huh? :)
Yielding 2 Distinct Hypotheses:
1) “Culture” is focused upon more than “Language” in DoD Policy
2) “Language” is directionally linked to “Culture” (if you acquire “language”, you also acquire “culture”)

Real-World: Create Objectivity from the Subjective
I Don’t buy it. So, we tested it and published the results… here’s how:
Point 1: As a ratio among Policy docs, “Culture” has fewer mentions than Language in every document except 2.

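A per-document mention ratio like the one in Point 1 can be computed with simple word counts. The sketch below is an assumption about how such a count might be done, not the published method: the helper names are hypothetical, the two sample documents are invented, and only exact whole-word matches are counted.

```python
import re


def mention_counts(text, terms=("culture", "language")):
    """Count case-insensitive whole-word mentions of each term."""
    words = re.findall(r"[a-z]+", text.lower())
    return {term: words.count(term) for term in terms}


def culture_to_language_ratio(text):
    """Mentions of "culture" per mention of "language"; None if "language" never appears."""
    counts = mention_counts(text)
    if counts["language"] == 0:
        return None
    return counts["culture"] / counts["language"]


# Invented stand-ins for policy documents (not real DoD text).
docs = {
    "doc_a": "Language training builds language skills and some culture awareness.",
    "doc_b": "Culture matters. Culture and language are both assessed.",
}
for name, text in docs.items():
    print(name, mention_counts(text), culture_to_language_ratio(text))
```

A ratio below 1 means "culture" is mentioned less often than "language" in that document; a real analysis would also need to decide how to treat derived forms such as "cultural" or "languages".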
Algorithms: “Fuzzy” Hypothesis Testing
Point 2a: After building a high-dimensional model based on weighted co-occurrences which uses all English Language topics from Wikipedia to define “meaning”, it turns out the opposite is true.
First, the Representational Density of Culture is Higher Than Language (average distance is lower)

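One way to read "average distance is lower" is as the mean distance from a word's vector to its closest neighbors. The sketch below takes that reading as an assumption; it is not the model from the paper. The function name `mean_neighbor_distance`, the choice of cosine distance, `k=5`, and the random toy vectors are all made up for illustration.

```python
import numpy as np


def mean_neighbor_distance(vectors, index, k):
    """Mean cosine distance from row `index` to its k nearest other rows.
    A lower value means the word sits in a denser region of the space."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    distances = 1.0 - unit @ unit[index]
    distances = np.delete(distances, index)  # drop the word itself
    return float(np.sort(distances)[:k].mean())


# Toy 25-dimensional space: rows 0-29 form a tight cluster, rows 30-59 are scattered.
rng = np.random.default_rng(0)
center = rng.normal(size=25)
dense = center + 0.05 * rng.normal(size=(30, 25))
sparse = rng.normal(size=(30, 25))
vectors = np.vstack([dense, sparse])

dense_score = mean_neighbor_distance(vectors, 0, k=5)
sparse_score = mean_neighbor_distance(vectors, 30, k=5)
print(f"clustered word: {dense_score:.4f}  scattered word: {sparse_score:.4f}")
```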
Algorithms: “Fuzzy” Hypothesis Testing
Point 2b: Using the same model, we grabbed the 20 nearest-neighbors to each word. Then, after reducing the dimensionality, we showed:
Though “Language” tends to include more function words and fewer semantically rich terms, it DOES have “Culture” as its nearest non-derivative neighbor. But the inverse is not true.
Meaning: Knowing something about Culture contributes to Language, but knowing something about Language contributes far less to Culture (and is equivalent to or less than other noise/function words.)

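Nearest-neighbor relations are not symmetric, which is what makes the directional claim in Point 2b testable. The toy sketch below shows the effect with hand-placed 2-D vectors; the coordinates, the filler words "custom", "tradition", and "heritage", and `k=2` (the slide used 20 neighbors) are assumptions chosen only to demonstrate the asymmetry.

```python
import numpy as np


def nearest_neighbors(vectors, word, k):
    """The k other words with the highest cosine similarity to `word`."""
    names = [w for w in vectors if w != word]
    target = vectors[word] / np.linalg.norm(vectors[word])
    sims = [float(vectors[w] @ target / np.linalg.norm(vectors[w])) for w in names]
    order = np.argsort(sims)[::-1][:k]  # most similar first
    return [names[i] for i in order]


def unit(degrees):
    """Unit vector at the given angle, a convenient way to place toy words."""
    r = np.radians(degrees)
    return np.array([np.cos(r), np.sin(r)])


# "language" sits apart; "culture" sits inside a tight group of related words.
toy = {
    "language": unit(90),
    "culture": unit(40),
    "custom": unit(38),
    "tradition": unit(36),
    "heritage": unit(34),
}

print("language ->", nearest_neighbors(toy, "language", k=2))
print("culture  ->", nearest_neighbors(toy, "culture", k=2))
```

Here "culture" is the closest word to "language", yet "language" does not appear among the closest words to "culture", because "culture" has nearer neighbors of its own.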
[Figure: “Language”]

Why do this stuff?
• Results Published in peer-reviewed scientific journal:
  • Abbe & Morrow, “Lexical and Semantic Analysis of Culture and Foreign Language Policies”, Journal of Culture, Language and International Security, May 2014
• It impacts the DoD/Government.
  • A problem exists when someone establishes an oligarchy: “You can’t measure it easily, therefore let me make all the decisions for you.”
• It uses DATA to drive the decision, and not whichever sub-organization is the most popular for this 5 minutes.
  • Inserts harmony and objectivity. It’s a cool “fuzzy” problem.

Analytics to users: HUE
• Included in EDH
• Multi-capability interface for analytics
• Interactive graph libraries
• Customizable Search, Impala, Hive, Pig Apps
• But Also: Tableau, Pentaho, Platfora, ZoomData, SAS…

Cloudera Manager
End-to-End Administration for CDH
1) Manage: Easily deploy, configure & optimize clusters
2) Monitor: Maintain a central view of all activity
3) Diagnose: Easily identify and resolve issues
4) Integrate: Use Cloudera Manager with existing tools

Thank You!

Enterprise Services
• Ingestion & ETL Pilot
  • Reference implementation up to 3 sources, 5 transformations, 1 target
  • Create, execute, test, and review a custom ingestion/ETL plan
• Security Integration
  • Implementation of role based access control with the data processing environment
• Hadoop Cluster Deployment Certification
  • Fully review hardware, data sources, typical jobs, and existing SLAs
  • Develop, implement, benchmark, and document Hadoop deployment

Path to Success – Services & Training
• Hadoop Cluster Deployment Certification: 1 week
• Ingestion & ETL Pilot: 2 weeks
• Security Integration: 1 week
• Cloudera Admin Training: 3 days
• Hive/Pig Training: 2 days
• Data Science: 3 days
• Developer Training: 4 days

Nominations are open for the Data Impact Awards!
Submission deadline: September 12th
• Winners will receive:
  • Free Strata + Hadoop World pass
  • Free seat to any public Cloudera University Training
  • Invitation to exclusive awards dinner
  • Bragging rights
©2014 Cloudera, Inc. All rights reserved.

Weitere ähnliche Inhalte

Was ist angesagt?

Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...Jongwook Woo
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksJongwook Woo
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Swiss Data Forum Swiss Data Forum
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLPaco Nathan
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisCourse 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisBetacowork
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USCSri Ambati
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentalsrjain51
 
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Amr Awadallah
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?mark madsen
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindwise
 
Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Amr Awadallah
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLPaco Nathan
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
 

Was ist angesagt? (20)

Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
Special talk: Introduction to Big Data and FinTech at Financial Supervisory S...
 
Big Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on NetworksBig Data and Data Intensive Computing on Networks
Big Data and Data Intensive Computing on Networks
 
Big Data, Baby Steps
Big Data, Baby StepsBig Data, Baby Steps
Big Data, Baby Steps
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
Cas pratique de la science de la donnée dans le domaine universitaire - Data ...
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
A Big Data Concept
A Big Data ConceptA Big Data Concept
A Big Data Concept
 
Human-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage MLHuman-in-a-loop: a design pattern for managing teams which leverage ML
Human-in-a-loop: a design pattern for managing teams which leverage ML
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos DeligiannisCourse 3 : Types of data and opportunities by Nikolaos Deligiannis
Course 3 : Types of data and opportunities by Nikolaos Deligiannis
 
Applications of Machine Learning at USC
Applications of Machine Learning at USCApplications of Machine Learning at USC
Applications of Machine Learning at USC
 
Big Data
Big DataBig Data
Big Data
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)Cloudera/Stanford EE203 (Entrepreneurial Engineer)
Cloudera/Stanford EE203 (Entrepreneurial Engineer)
 
Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?Disruptive Innovation: how do you use these theories to manage your IT?
Disruptive Innovation: how do you use these theories to manage your IT?
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligence
 
Yahoo Microstrategy 2008
Yahoo Microstrategy 2008Yahoo Microstrategy 2008
Yahoo Microstrategy 2008
 
Human-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage MLHuman-in-the-loop: a design pattern for managing teams that leverage ML
Human-in-the-loop: a design pattern for managing teams that leverage ML
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 

Ähnlich wie Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data

Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerMicrosoft
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataIMC Institute
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6varshakumar21
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overviewNitesh Ghosh
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Sciencesarith divakar
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera, Inc.
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxPankajkumar496281
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists CCG
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattooMohamed Magdy
 

Ähnlich wie Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data (20)

Big data business case
Big data   business caseBig data   business case
Big data business case
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Innovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringerInnovation med big data – chr. hansens erfaringer
Innovation med big data – chr. hansens erfaringer
 
Data science unit2
Data science unit2Data science unit2
Data science unit2
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
data science chapter-4,5,6
data science chapter-4,5,6data science chapter-4,5,6
data science chapter-4,5,6
 
Big data and Hadoop overview
Big data and Hadoop overviewBig data and Hadoop overview
Big data and Hadoop overview
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your DataCloudera Breakfast Series, Analytics Part 1: Use All Your Data
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
Zarneger "Supporting AI: Best Practices for Content Delivery Platforms"
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists How Cloud is Affecting Data Scientists
How Cloud is Affecting Data Scientists
 
The book of elephant tattoo
The book of elephant tattooThe book of elephant tattoo
The book of elephant tattoo
 

Mehr von Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mehr von Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Breakfast: Advanced Analytics Part II: Do More With Your Data

  • 1. Advanced Analytics Part II: Do More with Your Data (and your people too...). DC: Rob Morrow, Senior Systems Engineer. MD: Richard Im, Senior Systems Engineer. September 3
  • 2. Doing more with Cloudera Enterprise Data Hub and why there is no Competition to EDH. Do more with data:
      • Deliver multi-genre analytics in a single platform
      • Apply diverse concurrent analytics to your full datasets in-place
      • Connect easily to Partner products
      • Not more copies of the data; more analytics without moving the data
      • The ONLY way to do that is with a single Tool-Rich platform that has Analytics baked-in. It’s more than just Map/Reduce…
      • Protect the data products:
          • System Changes (Installation/Upgrades: CM)
          • Data Change (Navigator)
          • Data Over-writes (Spark vs Storm)
          • Unauthorized Access (Sentry, Kerberos, LDAP)
          • Encryption, Data Movement (Rhino, Intel, Gazzang)
  • 3. Business Intelligence: If all we want to know is what has already happened, then BI is a fine answer. [Figure labels: What happened, where, and when? · Time · Facts · Interpretations]
  • 4. How about the Future? How about Root Cause? How about “What Ifs”? How about “fuzzy” multivariate questions? [Figure labels: What will happen? · What happened, where, and when? · How can we do better? · How and why did it happen? · Time · Data Size · Facts · Interpretations]
  • 5. A drive-by of Statistical options on EDH. Do more with data:
      • Oryx: Recommender System used for Production Purposes. Real-Time Updates.
      • MADlib: In-database Machine Learning libraries for Impala.
      • Impyla: Python UDFs inside Impala. Really fast.
      • Mahout: Yep, still there. Easy-to-use Bayesian models, Classifiers, Clustering, etc. Requires data wrangling.
      • SparkML: Exact same algorithms as Mahout, only faster because it’s in-memory.
      • SAS Visual Analytics: Utilizes Cloudera EDH as data hosting platform.
      • RHadoop/Revolution Analytics: Quickly becoming the de facto Data Science platform due to cost/performance.
      • H2O: In development, but a very promising alternative to R/SAS.
  • 6. Spark: Why do we need it?
      • 10x better performance than M/R
      • Less code, written using Scala instead of Java
      • Allows foundational improvements that drive many new features: in-memory indexing, etc.
  • 7. How Cloudera Helps…
      • Best Data Wrangling Tools on the planet, Period. And all of them at scale!
      • If you can write code in any language available inside Linux (Perl, shell, Java, Python, SQL), you can data wrangle on Cloudera’s EDH. Not to mention partners like Informatica, Pentaho, etc.
      • Advanced Analytical Methods baked-in, at-scale, and many different ways to utilize them.
      • Cloudera excels at having the right tools to answer tough, fuzzy questions that have an impact on the real world. If we don’t have it, we invent it, then give it back!
      • Try another model, improve this one, track changes in results, all on one platform, and securely!
      • The combination of Cloudera EDH’s Navigator, Spark, Mahout, and Sentry means you can run advanced models in an Enterprise-supportable way that allows granular tracking of every step.
      • Cloudera have been providing advanced methods to the Government for years.
      • Cloudera are deployed to answer other tough questions all over the Intelligence Community (HUMINT, SIGINT, COMMINT, IMINT).
      Which Mission problem do you want to address? These DO NOT and WILL NOT show up in our marketing collateral. So, talk to your Account SE!
      [Figure labels: Prepare Data Model Data Mgt Contextual Modelling]
  • 8. Advanced Analytic Tools already on Cloudera’s Enterprise Data Hub
      • Advanced Information Locating methods available out-of-the-box in Cloudera Search
      • Fuzzy Search finds “boat” and “float” with the same search query and is easily tunable.
      • Batch-query allows users to upload a file with thousands/millions of search terms and which index they want to search, and returns a down-selectable set of data for all terms.
      • In-memory indexing coming soon!
      • Graph Analytics Tools extend EDH to uncover hidden relationships
      • Whether you choose Titan Aurelius, Giraph, or the Spark-enabled GraphX for your Graph analytic workload, Cloudera supports your Mission.
      • Spark technology was invented specifically to address iterative and self-referential data at the speed of memory!
      • Machine Learning Methods to predict future events and rapidly categorize current trends
      • Whether you choose a Naïve Bayesian filter to mobilize against Insider Threat, a Classifier to group objects for more effective models, or Principal Components Analysis to reveal the 3 relevant factors out of thousands, Cloudera’s EDH is ready for your workload.
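Slide 8's fuzzy-search claim can be illustrated with edit distance. Cloudera Search is built on Apache Solr, whose fuzzy queries match terms within a small edit distance of the query term. The sketch below is not Solr code; it is a plain-Python illustration of the idea, assuming a Levenshtein distance and a hypothetical `max_edits` threshold as the tuning knob. The function names and the sample vocabulary are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the terms within max_edits of query (hypothetical helper)."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


# Illustrative vocabulary, not taken from the slides.
vocabulary = ["boat", "float", "goat", "airplane"]
loose = fuzzy_match("boat", vocabulary, max_edits=2)
strict = fuzzy_match("boat", vocabulary, max_edits=1)
```

With the looser threshold "float" is two edits away from "boat" and matches; tightening the threshold to one edit drops it, which is the kind of tuning the slide alludes to.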
  • 9. Current Ways to make decisions: Highest Paid Person in the Organization (“I think we should go this way…”) OR Politics (“People like us more when we go this way…”) OR Subject Matter Expert (“My training/intuition says we should go this way…”) VS Data Scientist
  • 10. Data Science Teams in the Real World. You already have the people… Estimates required capital (people and political), team care/feeding. Chooses alternatives. Fallout of each path? Provides legit and relevant hypotheses. Provides models/Probabilities. Provides Code, tools, code. Tools.
  • 11. Strategies that Drive Data Science into your Organization:
      • Choose a single Specific Business/Mission problem
      • Never Choose IT first, and then look for a problem it solves. Great way to create a Welfare Program, but the most expensive and lowest payback projects in Government have this approach in common.
      • Seek to Disconfirm an Idea/Approach
      • Never begin with an idea/tool you’d like to confirm. Science doesn’t work that way. Sorry.
      • Promote Equal Contributions from Areas of Expertise and Acquire Tools as-needed
      • Do not allow micro-steering due to “personalities”. This is quite hard in practice, actually.
      • Be honest about Team/Organizational Weaknesses
      • Though it takes slightly longer, creating a data-oriented center of gravity reduces risk, increases effectiveness, and balances contributions while naturally creating independent measures of success. But you can’t begin until you understand the gaps.
  • 12. Let’s Pick One Area and Go Deeper: Principal Components Analysis. If a multivariate dataset is viewed as a set of coordinates in a high-dimensional data space, PCA can supply the user with a lower-dimensionality picture. If the original data has 1B variables, how few of these variables do you need in order to predict 90% of the variance? 80%? 60%? How few dimensions do you need for a visualization? Humans easily visualize 3D with current technology, but no more. [Figure: a 25-element high-dimensional space vector (term co-occurrences) of the word "road" rendered in gray scale. Original was 300K terms.]
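Slide 12's question ("how few … to predict 90% of the variance? 80%? 60%?") comes down to a cumulative explained-variance calculation. The sketch below assumes the PCA itself has already been run and that the per-component variances (the eigenvalues of the covariance matrix) are available; it only performs the selection step. The function name and the eigenvalues are hypothetical, chosen so the arithmetic is easy to follow.

```python
from itertools import accumulate


def components_needed(variances, target):
    """Smallest number of principal components whose combined variance
    reaches the target fraction of the total variance."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(variances, reverse=True)  # largest component first
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues for an 8-variable dataset (they sum to 100).
eigenvalues = [50, 25, 12, 6, 3, 2, 1, 1]
k90 = components_needed(eigenvalues, 0.90)
k80 = components_needed(eigenvalues, 0.80)
k60 = components_needed(eigenvalues, 0.60)
```

In this toy case the first two components carry 75% of the variance, three carry 87%, and four carry 93%, so the answers to the slide's three thresholds are 4, 3, and 2 components.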
  • 13. Real-World: Rescue Objectivity from Subjectivity. People [who want money] complain of an increased focus on DoD training for “culture” over “language”, stating that… A) “Culture” trains concurrently whenever you train “Language” because the two are linked. B) This hypothesis isn’t testable because it’s far too subjective and too domain-specific. Assessment requires SMEs, from the “language” skills community, to make recommendations. Circular logic, huh? :) Yielding 2 Distinct Hypotheses: 1) “Culture” is focused upon more than “Language” in DoD Policy. 2) “Language” is directionally linked to “Culture” (if you acquire “language”, you also acquire “culture”).
  • 14. Real-World: Create Objectivity from the Subjective. I don’t buy it. So, we tested it and published the results… here’s how: Point 1: As a ratio among Policy docs, “Culture” has fewer mentions than “Language” in every document except 2.
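Point 1 on slide 14 is a per-document comparison of term mentions. A minimal sketch of that kind of count follows; the tokenization rule (lowercased alphabetic runs, exact word match) and the one-sentence sample document are assumptions for illustration, not the method or data from the published study.

```python
import re


def mention_counts(text, terms=("culture", "language")):
    """Count exact, case-insensitive whole-word mentions of each term.
    Derived forms such as "cultural" are deliberately not counted."""
    words = re.findall(r"[a-z]+", text.lower())
    return {term: sum(1 for w in words if w == term) for term in terms}


def culture_to_language_ratio(text):
    """Ratio of "culture" mentions to "language" mentions, or None
    when the document never mentions "language"."""
    counts = mention_counts(text)
    if counts["language"] == 0:
        return None
    return counts["culture"] / counts["language"]


# Made-up sample text standing in for one policy document.
sample_doc = "Language training builds language skills; culture training is separate."
counts = mention_counts(sample_doc)
ratio = culture_to_language_ratio(sample_doc)
```

A ratio below 1 for a document means "culture" is mentioned less often than "language" in it; repeating the calculation per document gives the comparison the slide describes.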
  • 15. Algorithms: “Fuzzy” Hypothesis Testing. Point 2a: After building a high-dimensional model based on weighted co-occurrences which uses all English Language topics from Wikipedia to define “meaning”, it turns out the opposite is true. First, the Representational Density of Culture is higher than Language (average distance is lower).
  • 16. Algorithms: “Fuzzy” Hypothesis Testing. Point 2b: Using the same model, we grabbed the 20 nearest neighbors to each word. Then, after reducing the dimensionality, we showed: Though “Language” tends to include more function words and fewer semantically rich terms, it DOES have “Culture” as its nearest non-derivative neighbor. But the inverse is not true. Meaning: Knowing something about Culture contributes to Language, but knowing something about Language contributes far less to Culture (and is equivalent to or less than other noise/function words).
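The asymmetry described in Point 2b (one word's nearest neighbor need not return the favor) is a general property of nearest-neighbor lookups in a vector space. The sketch below shows it with cosine similarity over four hand-made 2-D vectors. The vectors, the extra words "society" and "tradition", and the function names are invented for illustration; they are not the Wikipedia-derived model from the slides, which used 20 neighbors per word.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def nearest(word, vectors, k=1):
    """The k words whose vectors are most similar to word's vector."""
    others = [w for w in vectors if w != word]
    others.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return others[:k]


# Toy vectors (assumed), standing in for reduced co-occurrence vectors.
toy_vectors = {
    "language": (1.0, 0.0),
    "culture": (0.8, 0.6),
    "society": (0.6, 0.8),
    "tradition": (0.0, 1.0),
}
language_nn = nearest("language", toy_vectors)
culture_nn = nearest("culture", toy_vectors)
```

Here "culture" is the closest word to "language", but the closest word to "culture" is "society", so the neighbor relation runs in one direction only, mirroring the shape of the slide's finding.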
  • 19. Why do this stuff?
      • Results published in a peer-reviewed scientific journal:
      • Abbe & Morrow, “Lexical and Semantic Analysis of Culture and Foreign Language Policies”, Journal of Culture, Language and International Security, May 2014
      • It impacts the DoD/Government.
      • A problem exists when someone establishes an oligarchy: “You can’t measure it easily, therefore let me make all the decisions for you.”
      • It uses DATA to drive the decision, and not whichever sub-organization is the most popular for this 5 minutes.
      • Inserts harmony and objectivity. It’s a cool “fuzzy” problem.
  • 20. Analytics to users: HUE
      • Included in EDH
      • Multi-capability interface for analytics
      • Interactive graph libraries
      • Customizable Search, Impala, Hive, Pig Apps
      • But also: Tableau, Pentaho, Platfora, ZoomData, SAS…
  • 21. Cloudera Manager: End-to-End Administration for CDH
      1. Manage: Easily deploy, configure & optimize clusters
      2. Monitor: Maintain a central view of all activity
      3. Diagnose: Easily identify and resolve issues
      4. Integrate: Use Cloudera Manager with existing tools
  • 23. Enterprise Services
      • Ingestion & ETL Pilot: Reference implementation up to 3 sources, 5 transformations, 1 target. Create, execute, test, and review a custom ingestion/ETL plan.
      • Security Integration: Implementation of role-based access control with the data processing environment.
      • Hadoop Cluster Deployment Certification: Fully review hardware, data sources, typical jobs, and existing SLAs. Develop, implement, benchmark, and document Hadoop deployment.
  • 24. Path to Success – Services & Training
      • Hadoop Cluster Deployment Certification: 1 week
      • Ingestion & ETL Pilot: 2 weeks
      • Security Integration: 1 week
      • Cloudera Admin Training: 3 days
      • Hive/Pig Training: 2 days
      • Data Science: 3 days
      • Developer Training: 4 days
  • 25. Nominations are open for the Data Impact Awards! Submission deadline: September 12th. Winners will receive:
      • Free Strata + Hadoop World pass
      • Free seat to any public Cloudera University Training
      • Invitation to exclusive awards dinner
      • Bragging rights
      ©2014 Cloudera, Inc. All rights reserved.