BI with Apache Hadoop (EN)

A simple, low-level presentation to ease the audience into Hadoop and show them real use cases.

Published in: Technology

  1. Business Integration with CDH 4 (including Apache Hadoop)
     Alexander Alten-Lorenz, Cloudera Inc.
     Munich, 22 February 2013
  2. Challenges: Volume, Velocity, Variety
  3. Business Integration
     • CRM
     • Analytics
     • Social Networks
     • Marketing
     • Document Store
     • Search Indices
     • Invoicing
     • Risk Management
     • Universal Data Access
     • Data Governance
     • SAP / Salesforce
     • Article and Storage Management
  4. Use Cases
  5. Risk Management
     • Problem: Scoring of customers and projects
     • Solution: Finance history, communication and pattern detection
     • Users: Finance, Insurance
  6. Recommendations
     • Problem: Recommend suitable products that complement purchased products and match the customer's interests
     • Solution: Statistical analysis of interests and purchase history, detection of matching swarm patterns
     • Users: eCommerce, Advertising
  7. Graph Analytics
     • Problem: Detect trends and curves in large distributed networks (wired, social, mesh)
     • Solution: Collect and mine all data, then apply self-learning patterns to detect trends and make forecasts
     • Users: Enterprises, Government, NGOs, Providers, Telcos, Stock Exchanges
  8. Detection of Dangerous Use
     • Problem: Spam, credit card abuse
     • Solution: Pattern detection, prioritization, heuristic analytics
     • Users: Retail, Finance, Resellers
  9. Text Analysis
     • Problem: Detect the meaning of the written word (sentiment analysis)
     • Solution: Keyword patterns, coherence detection, path detection
     • Users: eCommerce, Social Media Service Providers, Attitude Research
  10. Amounts of Real Data
      • eBay: 12 PB, search optimization
      • Facebook: 50 PB, logs, reports
      • Walmart: 4.5 PB, customer transactions
  11. Apache Hadoop
      • Software framework for large amounts of unstructured data
      • Apache License
      • Two core components
        • HDFS: distributed data storage
        • MapReduce: distributed data processing
  12. Hadoop Cluster
      [Diagram: a grid of Data Nodes]
      • Data Node: 4-16 cores, 4-16 disks, 8-64 GB RAM, 1-10 Gbit/s network
  13. Hadoop Distributed File System
      [Diagram: a File is split into Blocks, which are spread across Data Nodes]
  14. MapReduce
      [Diagram labels: RDBMS (Data, Query); Hadoop (Data, Query)]
  15. Features
      Feature          HDFS  MapReduce
      Distribution     ✔     ✔
      Fault Tolerance  ✔     ✔
      Scalability      ✔     ✔
  16. Hadoop Ecosystem
      [Diagram labels: SQL, Scripts, HBase, Whirr, Hive, Pig, Oozie, MapReduce, Avro, Java API, HDFS, ZooKeeper, Sqoop, Flume, Connectors, Hue, RDBMS, Logs, ..., Mahout]
  17. Example of an Integration
  18. Scope
      • Successful audits per ISO 27001
      • Analyze various data sources from different databases and CRM systems
      • Real-time and lifetime statistics per product
      • Periodic analytics and statistics jobs
      • Weekly re-import into CRM
      • Individual queries per user (analyst) via a secured GUI
  19. Solution Path
      • Cluster authentication and authorization via Kerberos, with encrypted data communication / data protection
      • Sqoop connector to CRM / DB
        • Teradata, Oracle, Postgres, MySQL, MS SQL
      • Hive-HBase integration
      • Hive analytics, controlled automatically by the Oozie workload orchestrator
      • Hue shell, authentication via Kerberos SPNEGO
  20. CRM Park Integration
      [Diagram labels: CDH, Authentication, Sqoop, Kerberos (AD, MITv5), Real Time, HBase, Hive, Oozie, Automation, End User, Hue]
  21. How to Manage?
  22. Cloudera Manager
      • Automated Deployment
      • Monitoring
      • Service Management
      • Log Management
      • Events and Alerts
      • Reporting
      • Support Integration
  23. Cloudera
      • Founded 2009 in Palo Alto
      • Cloudera's Distribution Including Hadoop
      • CDH4 / Cloudera Manager 4
      • More than 320 employees worldwide
      • Training, Consulting, Support, Development
      • Enterprise Tools
  24. Thank You!
      • Twitter: @mapredit
      • Blog:
      • http://hadoop.
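
Slide 11 names MapReduce as the part of Hadoop that works on distributed data, but the transcript does not show what a job looks like. The following is a minimal single-process sketch of the map, shuffle, and reduce steps, using a word count as the example. It is not Hadoop code and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps in parallel on the Data Nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by key (hypothetical stand-in
    for what the framework does between the map and reduce phases)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records: list[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over in-memory records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; not data from the presentation.
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

Each step only needs the output of the step before it, which is what lets a framework spread the map and reduce work over many machines.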