Business Integration with CDH 4 (including Apache Hadoop)
Alexander Alten-Lorenz, Cloudera Inc.
Munich, 22 February 2013

A simple, low-level presentation to ease the audience into Hadoop and show them real use cases.

Transcript of "Bi with apache hadoop(en)"

  1. Business Integration with CDH 4 (including Apache Hadoop)
     Alexander Alten-Lorenz, Cloudera Inc.
     Munich, 22 February 2013
  2. Challenges
     Volume, Velocity, Variety
  3. Business Integration
     • CRM
     • Invoicing
     • Analytics
     • Risk Management
     • Social Networks
     • Universal Data Access
     • Marketing
     • Data Governance
     • Document Store
     • SAP / Salesforce
     • Search Indices
     • Article and Storage Management
  4. Use Cases
  5. Risk Management
     • Problem: Scoring of customers and projects
     • Solution: Financial history, communication, and pattern detection
     • Users: Finance, Insurance
  6. Recommendations
     • Problem: Recommend products that complement earlier purchases and match the customer's interests
     • Solution: Statistical analysis of interests and purchase history; detection of matching swarm patterns
     • Users: eCommerce, Advertising
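The "statistical analysis of purchase history" on this slide can be illustrated with a very small co-purchase counter. This is only a sketch under stated assumptions: the basket data, the product names, and the functions `co_purchase_counts` and `recommend` are invented for illustration and are not taken from the talk. On a cluster, the same pair counting would typically run as a distributed job rather than in a single process.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, product, top_n=3):
    """Return up to top_n products most often bought together with `product`."""
    # Sort by descending count, then by name to make ties deterministic.
    ranked = sorted(counts[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical purchase history, invented for this example.
BASKETS = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "bag"],
    ["sd_card", "card_reader"],
]

if __name__ == "__main__":
    counts = co_purchase_counts(BASKETS)
    print(recommend(counts, "camera"))
```

Raw co-occurrence counts favour popular products; a real system would usually normalise them, which this sketch deliberately leaves out.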
  7. Graph Analytics
     • Problem: Detect trends and curves in large distributed networks (wired, social, mesh)
     • Solution: Collect and mine all data, then apply self-learning patterns to detect trends and produce forecasts
     • Users: Enterprises, Government, NGOs, Providers, Telcos, Stock Exchanges
  8. Detection of Dangerous Use
     • Problem: Spam, credit card abuse
     • Solution: Pattern detection, prioritization, heuristic analytics
     • Users: Retail, Finance, Resellers
  9. Text Analysis
     • Problem: Detect the meaning of written text (sentiment analysis)
     • Solution: Keyword patterns, coherence detection, path detection
     • Users: eCommerce, social media service providers, attitude research
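The "keyword patterns" approach to sentiment analysis can be sketched as a simple word-list scorer. The word lists and the functions `sentiment_score` and `classify` are assumptions made for this example; the slide does not specify any particular method, and production systems use far richer models.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def sentiment_score(text):
    """Return (#positive keywords) minus (#negative keywords) found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    return positive_hits - negative_hits


def classify(text):
    """Map the keyword score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ["Great camera, I love it", "Slow delivery and a broken box"]:
        print(review, "->", classify(review))
```

Because each text is scored independently, this kind of function fits naturally into the map step of a MapReduce job over large volumes of text.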
  10. Amounts of Real Data
      • eBay: 12 PB, search optimization
      • Facebook: 50 PB, logs and reports
      • Walmart: 4.5 PB, customer transactions
      Sources: http://wiki.apache.org/hadoop/PoweredBy, http://en.wikipedia.org/wiki/Big_data
  11. Apache Hadoop
      • Software framework for large amounts of unstructured data
      • Apache License
      • Two core components
        • HDFS: distributed data storage
        • MapReduce: distributed data processing
  12. Hadoop Cluster
      [Diagram: grid of 28 Data Nodes (7 rows of 4)]
      Data Node: 4-16 cores, 4-16 disks, 8-64 GB RAM, 1-10 Gbit/s network
  13. Hadoop Distributed File System
      [Diagram: one File split into 7 Blocks, stored across 3 Data Nodes]
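The idea behind this slide, a file cut into blocks that are spread over data nodes, can be sketched in a few lines. The following is a simplified model, not HDFS code: the 128 MB block size is an illustrative value, the node names are invented, and the round-robin placement in `place_blocks` is an assumption made for readability (real HDFS placement is rack-aware). The replication factor of 3 matches the usual HDFS default.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size in bytes of each block for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block only holds the leftover bytes, it is not padded.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_blocks(num_blocks, data_nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS considers racks and free space.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the replication factor")
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB, 128 * MB)  # illustrative sizes
    nodes = ["dn1", "dn2", "dn3", "dn4"]            # hypothetical node names
    for block, replicas in place_blocks(len(blocks), nodes).items():
        print(f"block {block} ({blocks[block] // MB} MB) -> {replicas}")
```

Losing any single node in this model still leaves at least two copies of every block, which is the basis of the fault tolerance listed for HDFS.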
  14. MapReduce
      [Diagram: RDBMS and Hadoop compared, each shown with Data and Query]
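The MapReduce programming model can be sketched in-process with the classic word count. This is a single-machine illustration of the map, shuffle, and reduce phases, not Hadoop API code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are chosen for this example. On a real cluster the map tasks run on the nodes that hold the data blocks, and the framework performs the shuffle over the network.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because `map_phase` looks at one line at a time and `reduce_phase` at one key at a time, both can be run in parallel on many machines without coordination beyond the shuffle.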
  15. Features
                         HDFS   MapReduce
      Distribution        ✔        ✔
      Fault Tolerance     ✔        ✔
      Scalability         ✔        ✔
  16. Hadoop Ecosystem
      [Diagram components: SQL, Scripts, HBase, Whirr, Hive, Pig, Oozie, MapReduce, Avro, Java API, HDFS, ZooKeeper, Sqoop, Flume, Connectors, Hue, RDBMS, Logs, ..., Mahout]
  17. Example of an Integration
  18. Scope
      • Successful audits per ISO 27001
      • Analyze different data sources from different databases and CRM systems
      • Real-time and lifetime statistics per product
      • Periodic analytics and statistics jobs
      • Weekly re-import into CRM
      • Single queries per user (analyst) through a secured GUI
  19. Solution Path
      • Cluster authentication and authorization via Kerberos, with encrypted data communication / data protection
      • Sqoop connector to CRM / DB
        • Teradata, Oracle, Postgres, MySQL, MS SQL
      • Hive-HBase integration
      • Hive analytics, controlled automatically by the Oozie workload orchestrator
      • Hue shell, authentication via Kerberos SPNEGO
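The Sqoop step of this solution path could look roughly like the following, which only assembles a `sqoop import` command line and does not run it. The JDBC URL, user name, and table name are hypothetical, and importing straight into Hive with `--hive-import` is an assumption for this sketch; the slides do not show the actual commands used. The options themselves (`--connect`, `--username`, `-P`, `--table`, `--hive-import`, `--num-mappers`) are standard Sqoop 1 import arguments.

```python
import shlex


def build_sqoop_import(jdbc_url, username, table, num_mappers=4):
    """Build the argument list for a Sqoop import of one table into Hive."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,       # JDBC connection string of the source DB
        "--username", username,
        "-P",                        # prompt for the password on the console
        "--table", table,
        "--hive-import",             # assumption: load the result into Hive
        "--num-mappers", str(num_mappers),
    ]


def as_shell(argv):
    """Render the argument list as a safely quoted shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # Hypothetical connection details, invented for illustration.
    argv = build_sqoop_import("jdbc:mysql://crm-db.example.com/crm", "analyst", "customers")
    print(as_shell(argv))
```

In a setup like the one described, an Oozie workflow would typically schedule such an import followed by the Hive queries, rather than an analyst running it by hand.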
  20. Integration
      [Diagram labels: CRM Park, Integration, CDH, Authentication, Sqoop, Kerberos (AD, MITv5), Real Time, HBase, Hive, Oozie, Automation, End User, Hue]
  21. How to Manage?
  22. Cloudera Manager
      • Automated Deployment
      • Reporting
      • Monitoring
      • Support Integration
      • Service Management
      • Log Management
      • Events and Alerts
  23. Cloudera
      • Founded 2009 in Palo Alto
      • Cloudera's Distribution Including Hadoop
      • CDH4 / Cloudera Manager 4
      • > 320 employees worldwide
      • Training, Consulting, Support, Development
      • Enterprise Tools
  24. Thank You!
      • alexander@cloudera.com
      • Twitter: @mapredit
      • Blog: mapredit.blogspot.com
      • http://www.cloudera.com/
      • http://hadoop.apache.org/