BI with Apache Hadoop (en)
A simple, introductory presentation that eases the audience into Hadoop and shows real use cases.

BI with Apache Hadoop (en): Presentation Transcript

  • 1. Business Integration with CDH 4 (including Apache Hadoop)
    Alexander Alten-Lorenz, Cloudera Inc.
    Munich, 22 February 2013
  • 2. Challenges: Volume, Velocity, Variety
  • 3. Business Integration
      • CRM
      • Analytics
      • Social Networks
      • Marketing
      • Document Store
      • Search Indices
      • Invoicing
      • Risk Management
      • Universal Data Access
      • Data Governance
      • SAP / Salesforce
      • Article and Storage Management
  • 4. Use Cases
  • 5. Risk Management
      • Problem: Scoring of customers and projects
      • Solution: Finance history, communication, and pattern detection
      • Users: Finance, insurance
  • 6. Recommendations
      • Problem: Recommend products that complement earlier purchases and match the customer's interests
      • Solution: Statistical analysis of interests and purchase history; detection of matching swarm patterns
      • Users: eCommerce, advertising
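One simple way to approach the statistical analysis of purchase history named on this slide is co-occurrence counting: products that often appear in the same basket are recommended together. The sketch below is only an illustration of that idea, not the method used in the talk; the function names and the sample baskets are hypothetical, and on a real cluster the counting step would run as a distributed job rather than in memory.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of products shares a basket."""
    pairs = defaultdict(Counter)
    for basket in baskets:
        # set() removes duplicates so a product is counted once per basket
        for a, b in permutations(set(basket), 2):
            pairs[a][b] += 1
    return pairs


def recommend(pairs, purchased, top_n=3):
    """Return the products most often bought together with `purchased`."""
    # Sort by descending count, then by name so ties are deterministic
    ranked = sorted(pairs[purchased].items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


# Hypothetical purchase histories (illustrative data only)
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "bag"],
    ["laptop", "bag"],
]
pairs = build_cooccurrence(baskets)
print(recommend(pairs, "camera"))
```

Raw counts favour popular products; a production system would usually normalise them, but that is beyond what the slide describes.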
  • 7. Graph Analytics
      • Problem: Detect trends and curves in large distributed networks (wired, social, mesh)
      • Solution: Collect and mine all data, then apply self-learning patterns to detect trends and produce forecasts
      • Users: Enterprises, government, NGOs, providers, telcos, stock exchanges
  • 8. Detection of Dangerous Use
      • Problem: Spam, credit card abuse
      • Solution: Pattern detection, prioritization, heuristic analytics
      • Users: Retail, finance, resellers
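The combination of pattern detection, heuristics, and prioritization on this slide can be pictured as a weighted rule set: each rule that matches a transaction adds to its score, and the highest-scoring transactions are reviewed first. This is a minimal sketch under that assumption; the `Transaction` fields, the rules, their weights, and the threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    amount: float
    country: str
    home_country: str
    tx_last_hour: int  # number of transactions by this card in the last hour


# Hypothetical heuristic rules: (description, predicate, weight)
RULES = [
    ("large amount", lambda t: t.amount > 1000, 2),
    ("foreign country", lambda t: t.country != t.home_country, 1),
    ("burst of activity", lambda t: t.tx_last_hour > 5, 3),
]


def score(tx):
    """Sum the weights of all rules that match the transaction."""
    return sum(weight for _, matches, weight in RULES if matches(tx))


def prioritize(transactions, threshold=3):
    """Return IDs of suspicious transactions, highest score first."""
    flagged = [(score(t), t.tx_id) for t in transactions if score(t) >= threshold]
    return [tx_id for _, tx_id in sorted(flagged, key=lambda p: (-p[0], p[1]))]


# Illustrative sample data
transactions = [
    Transaction("t1", 50.0, "DE", "DE", 1),
    Transaction("t2", 2500.0, "US", "DE", 2),
    Transaction("t3", 20.0, "DE", "DE", 9),
    Transaction("t4", 1500.0, "FR", "DE", 8),
]
print(prioritize(transactions))
```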
  • 9. Text Analysis
      • Problem: Detect the meaning of the written word (sentiment analysis)
      • Solution: Keyword patterns, coherence detection, path detection
      • Users: eCommerce, social media service providers, attitude research
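The keyword-pattern part of this slide can be sketched as a lexicon lookup: count positive and negative keywords in a text and compare the totals. The word lists and the `sentiment` function below are hypothetical, and the sketch deliberately ignores negation, context, and the coherence and path detection the slide also mentions.

```python
import re

# Hypothetical keyword lists (illustrative only)
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}


def sentiment(text):
    """Classify text as positive, negative, or neutral by keyword counts."""
    words = re.findall(r"[a-z']+", text.lower())
    # +1 for every positive keyword, -1 for every negative keyword
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("Great camera, fast delivery"))
print(sentiment("The battery is bad and the app is slow"))
```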
  • 10. Amounts of Real Data
      • eBay: 12 PB, search optimization
      • Facebook: 50 PB, logs, reports
      • Walmart: 4.5 PB, customer transactions
    http://wiki.apache.org/hadoop/PoweredBy
    http://en.wikipedia.org/wiki/Big_data
  • 11. Apache Hadoop
      • Software framework for large amounts of unstructured data
      • Apache License
      • Two core components
          • HDFS: distributed data storage
          • MapReduce: distributed data processing
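To make the MapReduce half of this slide concrete, the following sketch runs the classic word count through the three phases of the model (map, shuffle, reduce) in a single Python process. It does not use Hadoop's API; the function names are chosen for illustration, and on a cluster the map and reduce calls would run in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data", "big cluster"]))
```

Because every map call sees only one line and every reduce call sees only one key, the work can be split across machines without the functions needing to know about each other.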
  • 12. Hadoop Cluster
    (Diagram: a grid of Data Nodes)
    Data Node: 4-16 cores, 4-16 disks, 8-64 GB RAM, 1-10 GbE network
  • 13. Hadoop Distributed File System
    (Diagram: a file is split into blocks, and the blocks are distributed across Data Nodes)
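The diagram on this slide shows a file being cut into blocks that are spread over Data Nodes. The sketch below imitates that idea with two small functions. The block size, replication factor, and node names are assumptions for illustration, and the round-robin placement is a simplification: it is not the placement policy HDFS actually uses.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the size of each block; only the last one may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks, nodes, replication):
    """Assign every block to `replication` distinct nodes, round-robin.

    Simplified illustration only, not HDFS's real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Assumed values: a 300 MB file, 128 MB blocks, 3 replicas, 4 nodes
blocks = split_into_blocks(300 * MB, 128 * MB)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], 3)
print([size // MB for size in blocks])
print(placement)
```

Storing each block on several nodes is what lets the file survive the loss of a single machine.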
  • 14. MapReduce
    (Diagram contrasting the relationship between data and query in an RDBMS and in Hadoop)
  • 15. Features
                         HDFS   MapReduce
      Distribution       ✔      ✔
      Fault Tolerance    ✔      ✔
      Scalability        ✔      ✔
  • 16. Hadoop Ecosystem
    (Diagram) Components shown: Hive (SQL), Pig (scripts), HBase, Whirr, Oozie, MapReduce, Avro, Java API, HDFS, ZooKeeper, Sqoop, Flume, Hue, Mahout. Connectors link external sources such as RDBMS, logs, ...
  • 17. Example of an Integration
  • 18. Scope
      • Successful audits per ISO 27001
      • Analyze different data sources from different databases and CRM systems
      • Real-time and lifetime statistics per product
      • Periodic analytics and statistics jobs
      • Weekly re-import into the CRM
      • Single queries per user (analyst) over a secured GUI
  • 19. Solution Path
      • Cluster authentication and authorization via Kerberos, with encrypted data communication / data protection
      • Sqoop connector to CRM / DB
          • Teradata, Oracle, Postgres, MySQL, MS SQL
      • Hive - HBase integration
      • Hive analytics, controlled automatically by the Oozie workload orchestrator
      • Hue shell, authentication via Kerberos SPNEGO
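The Sqoop step of this solution path could look roughly like the command assembled below: an import from a relational database straight into a Hive table. The JDBC URL, user name, and table name are hypothetical placeholders, and the helper `sqoop_import_args` is invented for this sketch. The flags themselves (`--connect`, `--username`, `-P`, `--table`, `--hive-import`, `--num-mappers`) are standard `sqoop import` options. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, num_mappers=4):
    """Build the argument list for a Sqoop import into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,      # JDBC URL of the source database
        "--username", username,
        "-P",                       # prompt for the password on the console
        "--table", table,           # source table to import
        "--hive-import",            # load the imported data into Hive
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]


# Hypothetical connection details (placeholders, not from the talk)
args = sqoop_import_args("jdbc:mysql://crm-db.example.com/crm", "analyst", "customers")
print(shlex.join(args))
```

In the setup the slide describes, a command like this would typically be scheduled as an Oozie action rather than typed by hand.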
  • 20. Integration Overview
    (Diagram) Labels shown: CRM Park; Integration: Sqoop; CDH; Authentication: Kerberos (AD, MITv5); Real Time: HBase; Hive; Automation: Oozie; End user: Hue
  • 21. How to Manage?
  • 22. Cloudera Manager
      • Automated Deployment
      • Monitoring
      • Service Management
      • Log Management
      • Events and Alerts
      • Reporting
      • Support Integration
  • 23. Cloudera
      • Founded 2009 in Palo Alto
      • Cloudera's Distribution Including Hadoop (CDH)
      • CDH4 / Cloudera Manager 4
      • More than 320 employees worldwide
      • Training, consulting, support, development
      • Enterprise tools
  • 24. Thank You!
      • alexander@cloudera.com
      • Twitter: @mapredit
      • Blog: mapredit.blogspot.com
      • http://www.cloudera.com/
      • http://hadoop.apache.org/