1

Advanced Analytics Part I: Use All Your Data

DC: Rob Morrow, Senior Systems Engineer
MD: Chris Bove, Senior Systems Engineer
August 6

2

From BI to Advanced Analytics

What happened, where, and when?
What will happen?
How and why did it happen?
How can we do better?

Axes: Time, Data Size; Facts to Interpretations

3

Traditional Analytics Process

Operationalize Model
In-Database Model Scoring
Data Cleansing & Processing
Data Extraction
Data Exploration & Discovery
In-Memory Model Development

Time-to-Insight

4

Accessing & Sharing the Data is Difficult

DW
External
Multi-structured
Structured

5

“Are we there yet?”

1. Find the data
2. Get access to data
3. Learn about the data
4. Move sample data to ADW
5. Analysis (finally!)
6. Operationalize the model

Data Discovery: 6-9 Months

6

Silo’d Platforms Challenge Collaboration

Departmental Warehouse (shown twice)
Non-Agile Models
Enterprise Apps
Reporting
Prioritized Operational Processes
Silo’d Analytics (shown twice)
Static schemas accrete over time
Data Sources

7

1. Find the data
2. Get access to data
3. Learn about the data
4. Move to ADW
5. Analysis (finally!)
6. Operationalize the model

6-9 Months

Users & Business Influencers
Data Scientist, Business Analysts
“I’m sick of waiting for my data, I’m going to make my own copy.”

Technical Influencers
DBA/DW Admins
“I need to get those data scientists the data they want, or else they will stand up another data mart; I will have to manage it sooner or later.”

Executives
Executive Sponsors, LOB Manager (PM, Director, R&D, etc.)
“We don’t have the information we need to answer key business questions.”

Ouch!

8

Solution: Cloudera EDH

Batch Processing
Analytic MPP DBMS
Search Engine
Online NoSQL DBMS
Stream Processing
Machine Learning
SQL, Streaming, File System (NFS)
Resource Management
Unified Scale-out Storage For Any Type of Data
Elastic, Fault-tolerant, Self-healing, In-memory capabilities
Data Management
System Management
Metadata, Security, Audit, Lineage
Training & Services

Search
Faster data discovery
Navigator
Multiple tools on one platform: Impala, Spark, Hadoop MapReduce
Use all data with centralized mgmt & security
Metadata, Security
Cloudera Manager
Training & Services
Operationalize Models: Flume / Spark Streaming, HBase

9

Enterprise Data Hub

Solution: Cloudera EDH

Batch Processing
Analytic MPP DBMS
Search Engine
Online NoSQL DBMS
Stream Processing
Machine Learning
SQL, Streaming, File System (NFS)
Resource Management
Unified Scale-out Storage For Any Type of Data
Elastic, Fault-tolerant, Self-healing, In-memory capabilities
Data Management
System Management
Metadata, Security, Audit, Lineage
Training & Services

10

Analytics with EDH

Time-to-Insight
Operationalize Model
In-Database Model Scoring
Data Cleansing & Processing
Data Exploration & Discovery
In-Memory Model Development

Deliver Insight Sooner
Data Exploration & Discovery
Data Cleansing & Processing
Operationalize Model
Data Extraction
In-Platform Model Dev & Scoring

11

Solution Benefits

Use all your data
•  Use 100x more data, and more types of data, with existing tools
•  Reduce sampling and increase model accuracy and precision
•  Centralize information security, metadata, management, and governance

Shorten analytics lifecycle
•  Compress the cycle time from data to insights
•  Facilitate data discovery with real-time SQL and Search
•  Track data life-cycle in place
•  Define, test, deploy, and update models all within the EDH

Do more with data
•  Deliver multi-genre analytics in a single platform
•  Apply diverse concurrent analytics to your full datasets in-place
•  Protect existing technology and skillset investments

12

Business Value Delivered

Users & Business Influencers: Data Scientist, Business Analysts
“I’m sick of waiting for my data, I’m going to make my own copy.”
•  Acquire data necessary for projects
•  Develop analysis/models with better fit faster
•  Share data sets to empower others

Technical Influencers: DBA/DW Admins
“I need to get those data scientists the data they want, or else they will stand up another data mart, which I will have to manage sooner or later.”
•  Spend less time and money reconciling shadow IT environments
•  Shared security, metadata, management, and governance

Executives: Executive Sponsors, LOB Manager (Marketing, Sales, R&D, etc.)
“We don’t have the information we need to answer key business questions.”
•  Acquire necessary information sooner to make critical business decisions

Buyers

13

Data Access: Stores and Connectors

Thrift, pdf/Word/txt, csv, Parquet, Sequence, JSON, Binary, Avro

CONNECTORS
Oracle, Netezza, ODBC/JDBC, Teradata, MongoDB, Splunk/Hunk, MicroStrategy
Impala, HBase, Solr, Spark, Accumulo, ZoomData, Hive, Sqoop, Flume

Partner Native Connectors
Revolution R, SkyTree

14

Historical Archive: Tape vs Data

Aggregate Data Value
•  Direct access to data has value; data stored offsite/offline has cost
•  A single 8k record may have nearly zero value, but 10,000? 10,000,000?
•  What is the business value of testing the predictive power of current data?

Data Availability
•  Assuming locality, is 110MB per drive fast enough?
•  Certainly not fast enough to be included in any current analytics.
•  Striping across tape drives is science fiction. Complex tiers, anyone?
•  I/O IS the problem. Not CPU.

Data Volume/Cost
•  Everyone practices backups. How about restores? Full site restores?
•  Can’t we just more aggressively compress online data?
•  “Tape is cheap”. It had better be, because the data isn’t easily usable.
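
To put the 110MB-per-drive figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the figure means 110 MB per second of sustained sequential read from a single drive (the slide does not state the unit of time), and the archive sizes are hypothetical.

```python
def hours_to_read(total_bytes: float, drives: int = 1, mb_per_sec: float = 110.0) -> float:
    """Hours needed to stream total_bytes sequentially across `drives` drives.

    Assumes perfect parallelism across drives and no seek or mount time,
    so real restores would be slower.
    """
    bytes_per_sec = mb_per_sec * 1_000_000 * drives
    return total_bytes / bytes_per_sec / 3600


TB = 1_000_000_000_000  # decimal terabyte

# Hypothetical archive sizes, read from a single drive.
for size_tb in (1, 100, 1000):
    print(f"{size_tb:>5} TB on one drive: {hours_to_read(size_tb * TB):,.1f} hours")
```

Even under these optimistic assumptions the read time scales linearly with archive size, which is the point behind "I/O IS the problem."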
  
15

Spark Streaming: What is it?

Spark Streaming processes data in micro-batches: Resilient Distributed Datasets (RDDs)
Consistent with HDFS architectural principles
Processing individual records creates inconsistencies (simultaneous writes), as in Storm.

16

What can you do with it?: Stream It

Streaming “Windows” allow time-sliced atomic updates to analytics

Discretized Stream (DStream): Sequence of RDDs arranged as lines/words
Window: Sequence of DStreams time-arranged as windows
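
The windowing idea can be sketched without a cluster. The following is a plain-Python illustration, not the Spark Streaming API: each inner list stands in for one micro-batch of text lines, and the function name and sample data are hypothetical.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length):
    """Yield word counts over a sliding window of the most recent micro-batches."""
    window = deque(maxlen=window_length)  # the oldest batch drops out automatically
    for batch in batches:
        # Count words in the newly arrived micro-batch.
        window.append(Counter(word for line in batch for word in line.split()))
        # Combine the per-batch counts currently inside the window.
        total = Counter()
        for counts in window:
            total.update(counts)
        yield dict(total)


# Three hypothetical micro-batches of log lines.
batches = [["error disk"], ["error net", "ok"], ["ok ok"]]
results = list(windowed_counts(batches, window_length=2))
for i, counts in enumerate(results):
    print(f"after batch {i}: {counts}")
```

Each result is computed from whole batches only, which is what makes the update to downstream analytics atomic per time slice.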
  
17

What can you do with it?: ML

“What about Machine Learning?”

Spark-ML: Same input format and algorithms as Mahout.
Uses Resilient Distributed Datasets in-memory

Useful for:
Clustering (k-Means, etc.)
Classification (email, sentiment)
Recommenders (ratings correlation)
Dimensionality Reduction (PCA, SVD)

18

Model Effectiveness and Sampling

•  Some statisticians (medical) find it hard to turn the corner on the sampling topic:
•  ANOVA vs Multiple Regression. Same tests**, one’s a vector without the power problems
•  Algorithm choice should be related to, not restricted by, data volume.
•  Best approach = simple algorithm, lots of data
•  Sampling should still be used, but to test model effectiveness. Not to fix IT.

**Source: Applied Multiple Regression/Correlation Analysis (Cohen & Cohen, 1983)
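
One way to read "sampling to test model effectiveness" is a hold-out split: fit a simple model on most of the data and measure error on a random sample that was set aside. The sketch below assumes synthetic data (y = 2x + 1 plus noise) and hypothetical helper names; it is an illustration, not the presenters' method.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and set aside a random fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Average absolute prediction error of the fitted line on points."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
rows = [(i / 10, 2 * (i / 10) + 1 + noise.gauss(0, 0.5)) for i in range(200)]

train, test = holdout_split(rows)
model = fit_line(train)
print(f"fitted slope {model[0]:.2f}, intercept {model[1]:.2f}")
print(f"hold-out mean absolute error {mean_abs_error(model, test):.2f}")
```

The model is trained on all available training rows; the sample exists only to check how well it generalizes.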
  
19

Which dataset offers better predictive power?

Remember, this is not testing for an effect…

Dataset 1 (sparse; the column position of each rating was not preserved)

Users: Alice, Bob, Chuck, Donna, Eddie, Frank, Gina
Uses work computer for shopping:   1  4  5  1
Moves data between networks:       4  5  2
Works long hours:                  4  3  3
System/Network admin privs:        5

Dataset 2

                                  Alice  Bob  Chuck  Donna  Eddie  Frank  Gina
Uses work computer for shopping     1     4     2      4      5      1     3
Moves data between networks         4     3     1      5      1      4     3
Works long hours                    2     4     3      3      4      3     2
System/Network admin privs          1     2     1      5      3      5     4

1. As we add dimensions, average distance increases. Add data.
2. Fewer “neighbors” within a certain radius of any given point when the dataset is smaller. Add data.
3. Are you looking at similarity (r/cosine) or are you using dissimilarity (Euclidean)?
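
Points 1 and 3 can be checked directly on the fully populated table. The sketch below uses the seven users' ratings from that table; the helper names are illustrative, and the choice of Alice and Frank as the example pair is arbitrary.

```python
import math
from itertools import combinations

# Ratings from the fully populated table. Column order:
# shopping, moves data between networks, works long hours, admin privs.
ratings = {
    "Alice": [1, 4, 2, 1],
    "Bob":   [4, 3, 4, 2],
    "Chuck": [2, 1, 3, 1],
    "Donna": [4, 5, 3, 5],
    "Eddie": [5, 1, 4, 3],
    "Frank": [1, 4, 3, 5],
    "Gina":  [3, 3, 2, 4],
}


def euclidean(a, b):
    """Dissimilarity: straight-line distance between two rating vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Similarity: cosine of the angle between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def average_distance(dims):
    """Mean pairwise Euclidean distance using only the first `dims` features."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(euclidean(a[:dims], b[:dims]) for a, b in pairs) / len(pairs)


for dims in range(1, 5):
    print(f"{dims} dimension(s): average distance {average_distance(dims):.2f}")

alice, frank = ratings["Alice"], ratings["Frank"]
print(f"Alice vs Frank: cosine {cosine(alice, frank):.2f}, "
      f"Euclidean {euclidean(alice, frank):.2f}")
```

Because every added feature contributes a non-negative squared term, the average distance can only grow as dimensions are added, so the same number of users covers the space more thinly.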
  
20

Algorithms: Clustering

Sort documents, emails, objects by text class and group terms/documents into distinct categories.

Produce visualization.

Question: What’s an emerging topic among users?
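
As a minimal sketch of the clustering idea, here is plain k-means (Lloyd's algorithm) in Python. The 2-D points are hypothetical stand-ins for document feature vectors, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical feature vectors forming two loose groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, assignment = kmeans(points, k=2)
print("assignment:", assignment)
print("centroids:", centroids)
```

For real text, each document would first be turned into a numeric vector (for example term frequencies) before a step like this is applied.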
  
21

Algorithms: Naïve Bayesian Classifier

Given a training set, sort documents by content: Spam/Not, Religion/Politics/Art, etc.

Question: Which content “looks like” other content?
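
A small multinomial naive Bayes classifier shows how a training set drives the Spam/Not decision. This is a sketch with a tiny made-up training set and add-one (Laplace) smoothing; the function names are illustrative.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples.
training = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project status meeting", "not spam"),
]
model = train(training)
print(classify("claim your cash prize", model))
print(classify("monday project meeting", model))
```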
  
22

Algorithms: Recommender Systems

•  User-based filtering for cold start (AKA “likes”)
•  Item-based (user similarity) filtering once there is sufficient user data

Question: If user thinks “A” is useful, how about “B”, “C”?
How similar is one user’s pattern to another?
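
The second question, how similar one user's pattern is to another's, can be sketched with a Pearson correlation over the items two users have both rated. The users, items, and ratings below are hypothetical, and recommending from the single most similar user is a deliberate simplification.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def pearson(u, v):
    """Pearson correlation of two users' ratings over their shared items."""
    shared = sorted(set(ratings[u]) & set(ratings[v]))
    xs = [ratings[u][i] for i in shared]
    ys = [ratings[v][i] for i in shared]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def recommend(user):
    """Suggest items the most similar user rated that `user` has not seen."""
    others = [o for o in ratings if o != user]
    nearest = max(others, key=lambda o: pearson(user, o))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    return nearest, sorted(unseen, key=unseen.get, reverse=True)


print(recommend("ann"))
```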
  
23

Easily Convert between bits/bytes and numbers/words with Avro

•  Serialization
•  Expressive
    •  Records, arrays, unions, enums
•  Efficient
    •  Compact binary, compressed, splittable
•  Interoperable
    •  Langs: C, C++, C#, Java, Perl, Python, Ruby, PHP
    •  Tools: MR, Pig, Hive, Crunch, Flume, Sqoop, etc.
•  Dynamic
    •  Can read & write w/o generating code first
•  Evolvable
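
To make "records, arrays, unions, enums" and "evolvable" concrete, here is a sketch of an Avro schema built as a Python dictionary and printed as JSON, the form Avro schemas take. The record and field names (ClickEvent, user_id, and so on) are hypothetical, and no Avro library is used; the code only constructs the schema documents.

```python
import json

# Hypothetical record schema exercising record, enum, array, and union types.
schema_v1 = {
    "type": "record",
    "name": "ClickEvent",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "action", "type": {"type": "enum", "name": "Action",
                                     "symbols": ["VIEW", "CLICK", "PURCHASE"]}},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        # Union of null and string makes the field optional.
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}

# Evolution: a later version adds a field with a default value, so a reader
# using schema_v2 can still decode data written with schema_v1.
schema_v2 = json.loads(json.dumps(schema_v1))  # deep copy via JSON round trip
schema_v2["fields"].append({"name": "session_ms", "type": "long", "default": 0})

print(json.dumps(schema_v2, indent=2))
```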
  	
  
	
  
24

Query results from large analyses in Impala

•  Brings real-time query capabilities to Hadoop
•  It’s fast! Natively written in C++
•  Same great SQL query language as Hive

25

Analytics to users: HUE

•  Included in EDH
•  Multi-capability interface for analytics
•  Interactive graph libraries
•  Customizable Search, Impala, Hive, Pig Apps
•  But also: Tableau, Pentaho, Platfora, ZoomData, SAS…

26

Cloudera Manager
End-to-End Administration for CDH

1. Manage: Easily deploy, configure & optimize clusters
2. Monitor: Maintain a central view of all activity
3. Diagnose: Easily identify and resolve issues
4. Integrate: Use Cloudera Manager with existing tools

Thank	
  You!	
  
28

Enterprise Services

Ingestion & ETL Pilot
Reference implementation up to 3 sources, 5 transformations, 1 target
Create, execute, test, and review a custom ingestion/ETL plan

Security Integration
Implementation of role based access control with the data processing environment

Hadoop Cluster Deployment Certification
Fully review hardware, data sources, typical jobs, and existing SLAs
Develop, implement, benchmark, and document Hadoop deployment

29

Path to Success – Services & Training

Hadoop Cluster Deployment Certification: 1 week
Ingestion & ETL Pilot: 2 weeks
Security Integration: 1 week
Cloudera Admin Training: 3 days
Hive/Pig Training: 2 days
Data Science: 3 days
Developer Training: 4 days

30

©2014 Cloudera, Inc. All rights reserved.

Nominations are open for the Data Impact Awards!

•  Winners will receive:
    •  Free Strata + Hadoop World pass
    •  Free seat to any public Cloudera University Training
    •  Invitation to exclusive awards dinner
    •  Bragging rights

Submission deadline: September 12th

More Related Content

What's hot

New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...Cloudera, Inc.
 
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...ArabNet ME
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_WebinarSean Spediacci
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsCloudera, Inc.
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsCloudera, Inc.
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudCloudera, Inc.
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchCloudera, Inc.
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Cloudera, Inc.
 
Secure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game ChangersSecure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game ChangersCloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesCloudera, Inc.
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingCloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Customer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCustomer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCloudera, Inc.
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduCloudera, Inc.
 
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...DataStax
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Cloudera, Inc.
 
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Data Con LA
 
Debunking Common Myths of Hadoop Backup & Test Data Management
Debunking Common Myths of Hadoop Backup & Test Data ManagementDebunking Common Myths of Hadoop Backup & Test Data Management
Debunking Common Myths of Hadoop Backup & Test Data ManagementImanis Data
 

What's hot (20)

New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
New Performance Benchmarks: Apache Impala (incubating) Leads Traditional Anal...
 
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet...
 
Data Drive Applications_Webinar
Data Drive Applications_WebinarData Drive Applications_Webinar
Data Drive Applications_Webinar
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
A Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber ThreatsA Community Approach to Fighting Cyber Threats
A Community Approach to Fighting Cyber Threats
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
 
Part 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science WorkbenchPart 1: Introducing the Cloudera Data Science Workbench
Part 1: Introducing the Cloudera Data Science Workbench
 
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)Building a Data Hub that Empowers Customer Insight (Technical Workshop)
Building a Data Hub that Empowers Customer Insight (Technical Workshop)
 
Secure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game ChangersSecure Data - Why Encryption and Access Control are Game Changers
Secure Data - Why Encryption and Access Control are Game Changers
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency ObjectivesHadoop Essentials -- The What, Why and How to Meet Agency Objectives
Hadoop Essentials -- The What, Why and How to Meet Agency Objectives
 
Managing Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team BuildingManaging Successful Data Projects: Technology Selection and Team Building
Managing Successful Data Projects: Technology Selection and Team Building
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Customer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCustomer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWS
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...
Partner Webinar: Mesosphere and DSE: Production-Proven Infrastructure for Fas...
 
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
Enterprise Hadoop in the Cloud. In Minutes. | How to Run Cloudera Enterprise ...
 
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
 
Debunking Common Myths of Hadoop Backup & Test Data Management
Debunking Common Myths of Hadoop Backup & Test Data ManagementDebunking Common Myths of Hadoop Backup & Test Data Management
Debunking Common Myths of Hadoop Backup & Test Data Management
 

Viewers also liked

Dutch Interactive Awards - Nominees and Jury feedback
Dutch Interactive Awards - Nominees and Jury feedbackDutch Interactive Awards - Nominees and Jury feedback
Dutch Interactive Awards - Nominees and Jury feedbackAntoaneta Kyoseva
 
Cloudera Federal Forum 2014: A 360 Degree View of the Insider Threat
Cloudera Federal Forum 2014: A 360 Degree View of the Insider ThreatCloudera Federal Forum 2014: A 360 Degree View of the Insider Threat
Cloudera Federal Forum 2014: A 360 Degree View of the Insider ThreatCloudera, Inc.
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupCloudera, Inc.
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
Big data advance topics - part 2.pptx
Big data   advance topics - part 2.pptxBig data   advance topics - part 2.pptx
Big data advance topics - part 2.pptxMoldovan Radu Adrian
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
The Benefits of Predictive and Proactive Support for an Enterprise Data Hub
The Benefits of Predictive and Proactive Support for an Enterprise Data HubThe Benefits of Predictive and Proactive Support for an Enterprise Data Hub
The Benefits of Predictive and Proactive Support for an Enterprise Data HubCloudera, Inc.
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformCloudera, Inc.
 
Nhom 16 big data
Nhom 16 big dataNhom 16 big data
Nhom 16 big dataDuy Phan
 
The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop The TCO Calculator - Estimate the True Cost of Hadoop
The TCO Calculator - Estimate the True Cost of Hadoop MapR Technologies
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationmattlieber
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon
 
Multi-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsMulti-tenant, Multi-cluster and Multi-container Apache HBase Deployments
Multi-tenant, Multi-cluster and Multi-container Apache HBase DeploymentsDataWorks Summit
 
Live Seminar Cloudera & Big Data Ecosystem
Live Seminar Cloudera & Big Data Ecosystem Live Seminar Cloudera & Big Data Ecosystem
Live Seminar Cloudera & Big Data Ecosystem Xpand IT
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 

Viewers also liked (17)

Dutch Interactive Awards - Nominees and Jury feedback
Dutch Interactive Awards - Nominees and Jury feedbackDutch Interactive Awards - Nominees and Jury feedback
Dutch Interactive Awards - Nominees and Jury feedback
 
Cloudera Federal Forum 2014: A 360 Degree View of the Insider Threat
Cloudera Federal Forum 2014: A 360 Degree View of the Insider ThreatCloudera Federal Forum 2014: A 360 Degree View of the Insider Threat
Cloudera Federal Forum 2014: A 360 Degree View of the Insider Threat
 
HBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User GroupHBase and Accumulo | Washington DC Hadoop User Group
HBase and Accumulo | Washington DC Hadoop User Group
 
SQL on Accumulo
SQL on AccumuloSQL on Accumulo
SQL on Accumulo
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Cloudera Breakfast Series, Analytics Part 1: Use All Your Data

  • 1. Advanced Analytics Part I: Use All Your Data. DC: Rob Morrow, Senior Systems Engineer. MD: Chris Bove, Senior Systems Engineer. August 6
  • 2. From BI to Advanced Analytics. What happened, where, and when? What will happen? How and why did it happen? How can we do better? Time; Data Size; Facts; Interpretations
  • 3. Traditional Analytics Process. Operationalize Model; In-Database Model Scoring; Data Cleansing & Processing; Data Extraction; Data Exploration & Discovery; In-Memory Model Development. Time-to-Insight
  • 4. Accessing & Sharing the Data is Difficult. DW; External; Multi-structured; Structured
  • 5. “Are we there yet?” 1. Find the data; 2. Get access to data; 3. Learn about the data; 4. Move sample data to ADW; 5. Analysis. Finally!; 6. Operationalize the model. Data Discovery: 6-9 Months
  • 6. Silo’d Platforms Challenge Collaboration. Departmental Warehouse; Non-Agile Models; Enterprise Apps; Reporting; Prioritized Operational Processes; Departmental Warehouse; Silo’d Analytics; Static schemas accrete over time; Data Sources; Silo’d Analytics
  • 7. 1. Find the data; 2. Get access to data; 3. Learn about the data; 4. Move to ADW; 5. Analysis. Finally!; 6. Operationalize the model. 6-9 Months. Users & Business Influencers (Data Scientist, Business Analysts): “I’m sick of waiting for my data, I’m going to make my own copy.” Technical Influencers (DBA/DW Admins): “I need to get those data scientists the data they want, or else they will stand up another data mart, I will have to manage it sooner or later.” Ouch! Executives (Executive Sponsors, LOB Manager (PM, Director, R&D, etc.)): “We don’t have the information we need to answer key business questions.”
  • 8. Solution: Cloudera EDH. Unified Scale-out Storage For Any Type of Data; Elastic, Fault-tolerant, Self-healing, In-memory capabilities. Resource Management. Batch Processing; Analytic MPP DBMS; Search Engine; Online NoSQL DBMS; Stream Processing; Machine Learning; SQL; Streaming; File System (NFS). Data Management; System Management; Metadata, Security, Audit, Lineage. Training & Services. Callouts: Search; Faster data discovery; Navigator; Multiple tools on one platform; Impala; Spark; Hadoop MapReduce; Use all data with centralized mgmt & security; Metadata, Security; Cloudera Manager; Training & Services; Operationalize Models; Flume / Spark Streaming; HBase
  • 9. Solution: Cloudera EDH. Enterprise Data Hub. Unified Scale-out Storage For Any Type of Data; Elastic, Fault-tolerant, Self-healing, In-memory capabilities. Resource Management. Batch Processing; Analytic MPP DBMS; Search Engine; Online NoSQL DBMS; Stream Processing; Machine Learning; SQL; Streaming; File System (NFS). Data Management; System Management; Metadata, Security, Audit, Lineage. Training & Services
  • 10. Analytics with EDH. Operationalize Model; In-Database Model Scoring; Data Cleansing & Processing; Data Exploration & Discovery; In-Memory Model Development; Time-to-Insight. Data Exploration & Discovery; Data Cleansing & Processing; Operationalize Model; Data Extraction; In-Platform Model Dev & Scoring. Deliver Insight Sooner
  • 11. Solution Benefits. Use all your data: Use 100x more data, and more types of data, with existing tools; Reduce sampling and increase model accuracy and precision; Centralize information security, metadata, management, and governance. Shorten analytics lifecycle: Compress the cycle time from data to insights; Facilitate data discovery with real-time SQL and Search; Track data life-cycle in place; Define, test, deploy, and update models all within the EDH. Do more with data: Deliver multi-genre analytics in a single platform; Apply diverse concurrent analytics to your full datasets in-place; Protect existing technology and skillset investments
  • 12. Business Value Delivered. Buyers: Users & Business Influencers (Data Scientist, Business Analysts): “I’m sick of waiting for my data, I’m going to make my own copy.” Acquire data necessary for projects; Develop analysis/models with better fit faster; Share data sets to empower others. Technical Influencers (DBA/DW Admins): “I need to get those data scientists the data they want, or else they will stand up another data mart, which I will have to manage sooner or later.” Spend less time and money reconciling shadow IT environments; Shared security, metadata, management, and governance. Executives (Executive Sponsors, LOB Manager (Marketing, Sales, R&D, etc.)): “We don’t have the information we need to answer key business questions.” Acquire necessary information sooner to make critical business decisions
  • 13. Data Access: Stores and Connectors
      File formats: Thrift, pdf/Word/txt, csv, Parquet, Sequence, JSON, Binary, Avro
      CONNECTORS: ORACLE, NETEZZA, ODBC/JDBC, TERADATA, MongoDB, Splunk/Hunk, MICROSTRATEGY, IMPALA, HBASE, SOLR, SPARK, ACCUMULO, ZoomData, Hive, Sqoop, Flume, Revolution R, SkyTree
      Partner Native Connectors
  • 14. Historical Archive: Tape vs Data
      Aggregate Data Value
        • Direct access to data has value; data stored offsite/offline has cost
        • A single 8k record may have nearly zero value, but 10,000? 10,000,000?
        • What is the business value of testing the predictive power of current data?
      Data Availability
        • Assuming locality, is 110MB per drive fast enough?
        • Certainly not fast enough to be included in any current analytics.
        • Striping across tape drives is science fiction. Complex tiers, anyone?
        • I/O IS the problem. Not CPU.
      Data Volume/Cost
        • Everyone practices backups. How about restores? Full site restores?
        • Can’t we just more aggressively compress online data?
        • “Tape is cheap.” It had better be, because the data isn’t easily usable.
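The tape throughput question can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes the slide's figure means a sustained 110 MB/s per tape drive; the 100 TB archive size and the drive counts are hypothetical, and the perfect parallelism across drives is optimistic given the slide's point about striping.

```python
def restore_hours(archive_tb, drives, mb_per_sec_per_drive=110.0):
    """Hours needed to stream an archive off tape.

    Assumes decimal units (1 TB = 1,000,000 MB) and that every drive
    streams at full speed in parallel, which is a best case.
    """
    total_mb = archive_tb * 1_000_000
    seconds = total_mb / (drives * mb_per_sec_per_drive)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 100 TB archive read back on 1, 4, and 16 drives.
    for drives in (1, 4, 16):
        hours = restore_hours(100, drives)
        print(f"{drives:>2} drive(s): {hours:8.1f} hours")
```

Even in this best case a single drive needs more than ten days for 100 TB, which is the sense in which offline data cannot take part in current analytics.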
  • 15. Spark Streaming: What is it?
      Spark Streaming processes data in micro-batches:
        • Resilient Distributed Datasets (RDDs)
        • Consistent with HDFS architectural principles
        • Processing individual records, as Storm does, creates inconsistencies (simultaneous writes).
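The micro-batch idea can be illustrated without Spark. The following is a plain-Python sketch, not the Spark Streaming API: the function name `micro_batches`, the one-second interval, and the sample events are all hypothetical. It shows records being grouped by arrival time so that each batch can be processed as a single unit.

```python
from collections import defaultdict


def micro_batches(events, batch_seconds):
    """Group (timestamp, value) events into fixed-length time batches.

    Intervals that received no events are simply absent from the result.
    """
    grouped = defaultdict(list)
    for timestamp, value in events:
        grouped[int(timestamp // batch_seconds)].append(value)
    return [grouped[key] for key in sorted(grouped)]


# Hypothetical stream: five records arriving over three seconds.
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, batch_seconds=1)

# Each batch is handled as one unit instead of record by record.
batch_sizes = [len(batch) for batch in batches]
```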
  • 16. What can you do with it?: Stream It
      Streaming “windows” allow time-sliced atomic updates to analytics.
        • Discretized Stream (DStream): sequence of RDDs arranged as lines/words
        • Window: the RDDs of a DStream grouped over a sliding time interval
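A sliding window over micro-batches can be sketched in a few lines. This is plain Python rather than the Spark API; `windowed_counts` and its parameters are hypothetical names chosen to mirror the ideas of a window length and a slide interval, both measured here in batches rather than seconds.

```python
def windowed_counts(batches, window_length, slide_interval):
    """Count the records in each sliding window of micro-batches.

    A result is produced only once a full window of batches is available.
    """
    results = []
    for end in range(window_length, len(batches) + 1, slide_interval):
        window = batches[end - window_length:end]
        results.append(sum(len(batch) for batch in window))
    return results


# Hypothetical micro-batches, one list per batch interval.
batches = [["a", "b"], ["c"], ["d", "e"], [], ["f"]]

# Window of three batches, recomputed after every new batch.
counts = windowed_counts(batches, window_length=3, slide_interval=1)
```

Each result depends only on the batches inside its window, so the analytics are updated one whole time slice at a time.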
  • 17. What can you do with it?: ML
      “What about Machine Learning?”
      Spark-ML: same input format and algorithms as Mahout. Uses Resilient Distributed Datasets in memory.
      Useful for:
        • Clustering (k-Means, etc.)
        • Classification (email, sentiment)
        • Recommenders (ratings correlation)
        • Dimensionality Reduction (PCA, SVD)
  • 18. Model Effectiveness and Sampling
        • Some statisticians (medical) find it hard to turn the corner on the sampling topic:
            • ANOVA vs Multiple Regression. Same tests**, one’s a vector without the power problems
        • Algorithm choice should be related to, not restricted by, data volume.
        • Best approach = simple algorithm, lots of data
        • Sampling should still be used, but to test model effectiveness. Not to fix IT.
      **Source: Applied Multiple Regression/Correlation Analysis (Cohen & Cohen, 1983)
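One way to read "sample to test model effectiveness" is a holdout split: fit a simple model on most of the data and keep a random sample aside purely for evaluation. The sketch below is an illustration under that assumption; the synthetic data, the 10% holdout size, and the choice of a least-squares line are all hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(model, points):
    """Average squared prediction error of the fitted line on `points`."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points) / len(points)


# Synthetic data: y = 2x + 1 plus unit-variance noise (seeded for repeatability).
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 1)) for x in range(1000)]
rng.shuffle(data)

# The sample is used only to test the model, not to shrink the training data.
holdout, training = data[:100], data[100:]
model = fit_line(training)
holdout_mse = mean_squared_error(model, holdout)
```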
  • 19. Which dataset offers better predictive power?
      Remember, this is not testing for an effect…
      Dataset 1 (sparse; the person each rating belongs to is not preserved in this transcript)
        Uses work computer for shopping:    1, 4, 5, 1
        Moves data between networks:        4, 5, 2
        Works long hours:                   4, 3, 3
        System/Network admin privs:         5
      Dataset 2 (complete)
                                            Alice  Bob  Chuck  Donna  Eddie  Frank  Gina
        Uses work computer for shopping       1     4     2      4      5      1     3
        Moves data between networks           4     3     1      5      1      4     3
        Works long hours                      2     4     3      3      4      3     2
        System/Network admin privs            1     2     1      5      3      5     4
      1. As we add dimensions, average distance increases. Add data.
      2. Fewer “neighbors” within a certain radius of any given point when the dataset is smaller. Add data.
      3. Are you looking at similarity (r/cosine) or are you using dissimilarity (Euclidean)?
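Points 1 and 3 can be checked directly on the complete ratings table from this slide. The sketch below is illustrative: it computes the average pairwise Euclidean distance using the first one, two, three, and four behaviors, and compares Euclidean distance with cosine similarity for one pair of people. The function names are hypothetical.

```python
import math
from itertools import combinations

# Complete table from the slide. Columns: shopping, moves data,
# long hours, admin privileges.
ratings = {
    "Alice": [1, 4, 2, 1],
    "Bob":   [4, 3, 4, 2],
    "Chuck": [2, 1, 3, 1],
    "Donna": [4, 5, 3, 5],
    "Eddie": [5, 1, 4, 3],
    "Frank": [1, 4, 3, 5],
    "Gina":  [3, 3, 2, 4],
}


def euclidean(u, v):
    """Dissimilarity: straight-line distance between two rating vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Similarity: cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_distance(dims):
    """Mean pairwise Euclidean distance using only the first `dims` columns."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(euclidean(u[:dims], v[:dims]) for u, v in pairs) / len(pairs)


# Average distance as each extra behavior (dimension) is included.
averages = [average_distance(dims) for dims in range(1, 5)]

alice_frank_distance = euclidean(ratings["Alice"], ratings["Frank"])
alice_frank_similarity = cosine(ratings["Alice"], ratings["Frank"])
```

Because every added dimension contributes a non-negative squared difference, the average distance can only grow or stay flat as columns are added, which is why more dimensions call for more data.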
  • 20. Algorithms: Clustering
      Sort documents, emails, objects by text class and group terms/documents into distinct categories. Produce visualization.
      Question: What’s an emerging topic among users?
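As a minimal illustration of clustering, the sketch below runs k-means (Lloyd's algorithm) on a toy set of documents represented as term counts. The two terms, the counts, and the starting centroids are hypothetical, and a real deployment would use a distributed library rather than this pure-Python loop.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with caller-supplied starting centroids.

    Returns the final centroids and the clusters from the last assignment.
    """
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical documents as (count of "hadoop", count of "budget").
documents = [(5, 0), (4, 1), (6, 0), (0, 5), (1, 4), (0, 6)]
final_centroids, clusters = kmeans(documents, [documents[0], documents[3]])
```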
  • 21. Algorithms: Naïve Bayesian Classifier
      Given a training set, sort documents by content: Spam/Not, Religion/Politics/Art, etc.
      Question: Which content “looks like” other content?
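A naive Bayes text classifier is small enough to sketch in full. The version below is a multinomial model with add-one (Laplace) smoothing; the training sentences and labels are hypothetical, and the tokenizer is a plain whitespace split.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """Return per-class document counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, text in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return class_counts, word_counts


def classify(text, class_counts, word_counts):
    """Pick the class with the highest log-probability for `text`."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set.
training_set = [
    ("spam", "win cash prize now"),
    ("spam", "cheap prize win win"),
    ("not spam", "meeting agenda for monday"),
    ("not spam", "project status meeting notes"),
]
class_counts, word_counts = train(training_set)
```

Calling `classify("win a prize", class_counts, word_counts)` then scores the new text against each class and returns the label whose training content it most "looks like".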
  • 22. Algorithms: Recommender Systems
        • User-based filtering for cold start (AKA “likes”)
        • Item-based (user similarity) filtering once there is sufficient user data
      Question: If user thinks “A” is useful, how about “B”, “C”? How similar is one user’s pattern to another?
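The question "how similar is one user's pattern to another?" can be sketched with a user-to-user cosine similarity. The example below is a minimal illustration: the users, items, and ratings are hypothetical, and it recommends only what the single most similar user has rated.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 5, "B": 5, "C": 4},
    "carol": {"A": 1, "C": 5, "D": 4},
}


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts (missing items count as 0)."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user):
    """Suggest items rated by the most similar user but not yet by `user`."""
    similarities = [
        (cosine(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user
    ]
    _, nearest = max(similarities)
    return sorted(item for item in ratings[nearest] if item not in ratings[user])


suggestions = recommend("alice")
```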
  • 23. Easily Convert between bits/bytes and numbers/words with Avro
        • Serialization
        • Expressive
            • Records, arrays, unions, enums
        • Efficient
            • Compact binary, compressed, splittable
        • Interoperable
            • Langs: C, C++, C#, Java, Perl, Python, Ruby, PHP
            • Tools: MR, Pig, Hive, Crunch, Flume, Sqoop, etc.
        • Dynamic
            • Can read & write w/o generating code first
        • Evolvable
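Avro schemas are written in JSON, so the "expressive" and "evolvable" points can be shown by building one with only the Python standard library. In the sketch below the record name, namespace, and fields are hypothetical; no Avro library is used, so nothing here actually serializes data.

```python
import json

# Version 1 of a hypothetical schema using a record, an enum, an array,
# and a union (nullable string).
user_event_v1 = {
    "type": "record",
    "name": "UserEvent",
    "namespace": "example.analytics",
    "fields": [
        {"name": "user_id", "type": "long"},
        {
            "name": "action",
            "type": {
                "type": "enum",
                "name": "Action",
                "symbols": ["VIEW", "CLICK", "PURCHASE"],
            },
        },
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "referrer", "type": ["null", "string"], "default": None},
    ],
}

# Version 2 adds a field with a default, so a reader using v2 can still
# decode records that were written with v1.
user_event_v2 = json.loads(json.dumps(user_event_v1))  # deep copy
user_event_v2["fields"].append(
    {"name": "session_id", "type": ["null", "string"], "default": None}
)

schema_json = json.dumps(user_event_v2, indent=2)
```

The resulting `schema_json` string is what would be saved as an `.avsc` file and handed to an Avro reader or writer.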
  • 24. Query results from large analyses in Impala
        • Brings real-time query capabilities to Hadoop
        • It’s fast! Natively written in C++
        • Same great SQL query language as Hive
  • 25. Analytics to users: HUE
        • Included in EDH
        • Multi-capability interface for analytics
        • Interactive graph libraries
        • Customizable Search, Impala, Hive, Pig Apps
        • But also: Tableau, Pentaho, Platfora, ZoomData, SAS…
  • 26. Cloudera Manager
      End-to-End Administration for CDH
        1. Manage: Easily deploy, configure & optimize clusters
        2. Monitor: Maintain a central view of all activity
        3. Diagnose: Easily identify and resolve issues
        4. Integrate: Use Cloudera Manager with existing tools
  • 28. Enterprise Services
      Ingestion & ETL Pilot
        • Reference implementation up to 3 sources, 5 transformations, 1 target
        • Create, execute, test, and review a custom ingestion/ETL plan
      Security Integration
        • Implementation of role-based access control with the data processing environment
      Hadoop Cluster Deployment Certification
        • Fully review hardware, data sources, typical jobs, and existing SLAs
        • Develop, implement, benchmark, and document Hadoop deployment
  • 29. Path to Success – Services & Training
        • Hadoop Cluster Deployment Certification: 1 week
        • Ingestion & ETL Pilot: 2 weeks
        • Security Integration: 1 week
        • Cloudera Admin Training: 3 days
        • Hive/Pig Training: 2 days
        • Data Science: 3 days
        • Developer Training: 4 days
  • 30. Nominations are open for the Data Impact Awards!
      Winners will receive:
        • Free Strata + Hadoop World pass
        • Free seat to any public Cloudera University Training
        • Invitation to exclusive awards dinner
        • Bragging rights
      Submission deadline: September 12th
      ©2014 Cloudera, Inc. All rights reserved.