BI with Apache Hadoop (EN)


A simple, introductory presentation that eases the audience into Hadoop and shows them real use cases.

Published in: Technology


  • 1. Business Integration with CDH 4 (including Apache Hadoop). Alexander Alten-Lorenz, Cloudera Inc. Munich, 22 February 2013
  • 2. Challenges: Volume, Velocity, Variety
  • 3. Business Integration
      • CRM
      • Analytics
      • Social Networks
      • Marketing
      • Document Store
      • Search Indices
      • Invoicing
      • Risk Management
      • Universal Data Access
      • Data Governance
      • SAP / Salesforce
      • Article and Storage Management
  • 4. Use Cases
  • 5. Risk Management
      • Problem: Scoring of customers and projects
      • Solution: Finance history, communication and pattern detection
      • Users: Finance, Insurance
  • 6. Recommendations
      • Problem: Recommend products that complement purchased products and match the customer's interests
      • Solution: Statistical analysis of interests and purchase history, detection of matching swarm patterns
      • Users: eCommerce, Advertising
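
The purchase-history analysis named on this slide can be illustrated with a minimal sketch. It assumes a simple co-occurrence count ("bought together") rather than whatever method the presenter had in mind; the basket data and the function names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() so a product listed twice in one basket counts once
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, product, k=2):
    """Return up to k products most often bought together with `product`."""
    # Sort by descending count, then by name so ties are deterministic
    ranked = sorted(co[product].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical purchase history, one list per order
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "bag"],
    ["laptop", "bag"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "camera"))
```

On a cluster the counting step would typically run as a distributed job over the full order history; the in-memory version only shows the idea.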
  • 7. Graph Analytics
      • Problem: Detect trends and curves in large distributed networks (wired, social, mesh)
      • Solution: Collect and mine all data, then apply self-learning patterns to detect trends and produce forecasts
      • Users: Enterprises, Government, NGOs, Providers, Telcos, Stock Exchanges
  • 8. Detection of Dangerous Use
      • Problem: Spam, credit card abuse
      • Solution: Pattern detection, prioritization, heuristic analytics
      • Users: Retail, Finance, Resellers
  • 9. Text Analysis
      • Problem: Detect the meaning of the written word (sentiment analysis)
      • Solution: Keyword patterns, coherence detection, path detection
      • Users: eCommerce, Social Media Service Providers, Opinion Research
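
The keyword-pattern approach to sentiment analysis can be sketched as follows. This is only an illustration under simple assumptions: the word lists `POSITIVE` and `NEGATIVE` and the function names are hypothetical, and real systems use far richer models than counting keywords.

```python
import re

# Hypothetical keyword lists; a real deployment would use a curated lexicon
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def sentiment_score(text):
    """Return (number of positive keywords) minus (number of negative keywords)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


def classify(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Great camera, I love it"))
print(classify("Delivery was slow and the box was broken"))
```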
  • 10. Amounts of Real Data
      • eBay: 12 PB, search optimization
      • Facebook: 50 PB, logs, reports
      • Walmart: 4.5 PB, customer transactions
  • 11. Apache Hadoop
      • Software framework for large amounts of unstructured data
      • Apache License
      • Two core components
          • HDFS: Distributed data storage
          • MapReduce: Distributed data processing
  • 12. Hadoop Cluster (diagram: a grid of Data Nodes)
      • Data Node: 4-16 cores, 4-16 disks, 8-64 GB RAM, 1-10 Gbit network
  • 13. Hadoop Distributed File System (diagram: a file is split into blocks, and the blocks are spread across Data Nodes)
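
The idea behind this slide, a file cut into blocks that are stored on several Data Nodes, can be sketched in a few lines. The block size of 128 MB, the replication factor of 3, the node names, and the round-robin placement are assumptions for illustration; HDFS itself uses a configurable block size and a rack-aware placement policy, which this sketch does not model.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file is cut into."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full, rest = divmod(file_size, block_size)
    # The last block only holds the remaining bytes, if any
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Assumed values: a 300 MB file, 128 MB blocks, 4 hypothetical nodes, 3 replicas
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"], 3)
for block, holders in placement.items():
    print(block, blocks[block] // MB, "MB", holders)
```

Because every block lives on several nodes, losing one node does not lose data, which is the fault tolerance listed for HDFS on the features slide.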
  • 14. MapReduce (diagram comparing Data and Query in an RDBMS with Data and Query in Hadoop)
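
For readers new to the model, the MapReduce phases (map, shuffle, reduce) can be simulated in memory with a word count. This is a sketch of the programming model only, not of the Hadoop Java API; the function names and the sample lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the shuffle across the network.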
  • 15. Features
      Feature          HDFS  MapReduce
      Distribution     ✔     ✔
      Fault Tolerance  ✔     ✔
      Scalability      ✔     ✔
  • 16. Hadoop Ecosystem (diagram labels: SQL, Scripts, HBase, Whirr, Hive, Pig, Oozie, MapReduce, Avro, Java API, HDFS, ZooKeeper, Sqoop, Flume, Connectors, Hue, RDBMS, Logs, ..., Mahout)
  • 17. Example of an Integration
  • 18. Scope
      • Successful audits per ISO 27001
      • Analyze different data sources from different databases and CRM systems
      • Real-time and lifetime statistics per product
      • Periodic analytics and statistics jobs
      • Weekly re-import into the CRM
      • Individual queries per user (analyst) via a secured GUI
  • 19. Solution Path
      • Cluster authentication and authorization via Kerberos, plus encrypted data communication / data protection
      • Sqoop connector to CRM / DB
          • Teradata, Oracle, Postgres, MySQL, MS SQL
      • Hive - HBase integration
      • Hive analytics, controlled automatically by the Oozie workload orchestrator
      • Hue Shell, authentication via Kerberos SPNEGO
  • 20. CRM Park Integration (diagram labels: CDH, Authentication, Sqoop, Kerberos (AD, MITv5), Real Time, HBase, Hive, Oozie, Automation, End User, Hue)
  • 21. How to Manage?
  • 22. Cloudera Manager
      • Automated Deployment
      • Monitoring
      • Service Management
      • Log Management
      • Events and Alerts
      • Reporting
      • Support Integration
  • 23. Cloudera
      • Founded 2009 in Palo Alto
      • Cloudera's Distribution Including Hadoop
      • CDH4 / Cloudera Manager 4
      • > 320 employees worldwide
      • Training, Consulting, Support, Development
      • Enterprise Tools
  • 24. Thank You!
      • Twitter: @mapredit
      • Blog:
      • http://hadoop.