How to Build a Scalable and Free Big Data Service

Apigee is releasing a free developer version of Insights—a predictive analytics service built on Hadoop. This webcast will give enterprise architects and Hadoop enthusiasts an overview of how to build a real world system on top of Hadoop.

In this webcast, two Apigee engineering leads will explain how to:
- secure a shared Hadoop service using APIs
- make Hadoop elastically scale on demand
- maximize the efficiency of Hadoop across multiple tenants
- monitor and operate a multi-tenant Hadoop service

Published in: Software

  1. How to build an elastically scalable, multi-tenant, FREE big data service (Webinar)
  2. Alan Ho (@karlunho), Shailendra Baxi (@sbaxi), Rajesh Bhargava (@rbhargava)
  3. youtube.com/apigee
  4. slideshare.com/apigee
  5. Agenda: what and why we built this service; demo; technical architecture; developer experience
  6. Apigee Developer
  7. What we built: a free big data service for building context-aware apps
  8. Context-aware apps are “behavior driven”
  9. Developer alternatives for machine learning: Amazon Machine Learning
  10. Insights approach for Apigee Developer: accelerated development; descriptive & predictive; behavior-based algorithms; E2E experience; free
  11. Architecture (DATA, INSIGHTS)
      - 1. Data upload: structured or unstructured
      - 2. Scalable: volume, variety & velocity
      - 3. Core IP: machine learning, graph processing, unstructured data
      - 4. Analytics offerings: predictive & journey analytics, segmentation
      - Diagram labels: User Interactions; Prediction; Journey; Segmentation; Computational Algorithms; Machine Learning Library; Data Pipelines; Unstructured Data Processors; GRASP Processor; Distributed Processing Foundation; Distributed Data and Job Management; Apache usergrid; Query Language; Modeling Work Bench; User Interface
  12. System components (architecture diagram labels)
      - Applications: UI, Modeling Workbench; Application Data
      - Transactional Datastore
      - Modeling, Scoring, Data Transformation, Aggregation/Reporting
      - Ephemeral Hadoop Cluster Management Service
      - Software Libraries: GRASP, Unstructured Data, Machine Learning
      - Insights Master
      - Data Staging Area; Monitoring service; Ingestion Datastore
      - GRASP Query Service: Query Datastore, Query Server
      - Real Time Service (Edge); Real Time Datastore (usergrid); node
      - Connections: HTTPS, AWS APIs, HTTP(S)
      - Persistent Datastore (legend entries: S3, HDFS, API)
      - Metadata Service, runtime metadata: job queue, job dependencies, data set partitions
      - Metadata Store, static metadata: DataStore & Dataset, Application, Job
  13. How does Insights work?
      - Ingest customer data: batch or browser based; event based or customer profile
      - Aggregate behavior graphs: cross-channel, domain-agnostic customer journey graphs; enriched with customer profile
      - Query capability and machine learning: customer journey visualization; models & scores
      - Data scientist + developer support: R interface for predictive modeling on Hadoop; integrated with API Edge (incl. BaaS, node.js)
      - Data flow, stores: customer data store; persistent data store; HDFS on compute cluster; serving data store (customer, usergrid)
      - Data flow, steps: data ingestion (batch or browser based); data moved to persistent storage; data brought to the compute cluster for processing; processed data exported to the appropriate location
  14. Data-level multi-tenancy (architecture diagram from slide 12, annotated)
      - Datasets segregated/sharded by account ID
      - Data keyed by account ID
  15. Scalability (architecture diagram from slide 12, annotated)
      - Horizontal scaling
      - Elastic/ephemeral scaling
      - Sharding
  16. Insights UI & APIs
      - HTML5 single-page application
      - Interacts with RESTful APIs
      - Guides a novice user through the experience and helps them understand important predictive / machine learning concepts
      - Scalable REST API infrastructure
  17. Insights R SDK
  18. Developer Resources
      - E2E recommendation tutorial (try it free!)
      - Sample datasets
      - Blog posts, embedded documentation
  19. Try it out: Apigee Developer, https://accounts-beta.apigee.com
  20. Summary
      - Be practical when approaching multi-tenancy
      - Cost can be drastically reduced with elastic scaling & multi-tenancy
      - Developer experience requires continual refinement
      - Try out our free service for yourself!
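
Slide 14 describes data-level multi-tenancy as datasets sharded by account ID, with data keyed by account ID. The Python sketch below illustrates that idea only; it is not Apigee's implementation. The names `tenant_key` and `TenantStore`, the `account/dataset/record` key layout, and the in-memory dictionary standing in for a shared datastore are all assumptions made for this example.

```python
from dataclasses import dataclass, field


def tenant_key(account_id: str, dataset: str, record_id: str) -> str:
    """Build a storage key that always starts with the owning account ID.

    The "account/dataset/record" layout is a hypothetical convention.
    """
    for part in (account_id, dataset, record_id):
        # Reject empty parts and embedded separators so one tenant's key
        # can never be crafted to fall under another tenant's prefix.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"{account_id}/{dataset}/{record_id}"


@dataclass
class TenantStore:
    """In-memory stand-in for a datastore shared by many accounts."""

    _data: dict = field(default_factory=dict)

    def put(self, account_id: str, dataset: str, record_id: str, value) -> None:
        self._data[tenant_key(account_id, dataset, record_id)] = value

    def get(self, account_id: str, dataset: str, record_id: str):
        # Lookups are always scoped to the caller's account prefix.
        return self._data.get(tenant_key(account_id, dataset, record_id))

    def list_records(self, account_id: str, dataset: str) -> dict:
        # Validate the prefix parts by building a throwaway key first.
        tenant_key(account_id, dataset, "probe")
        prefix = f"{account_id}/{dataset}/"
        return {
            key[len(prefix):]: value
            for key, value in self._data.items()
            if key.startswith(prefix)
        }


if __name__ == "__main__":
    store = TenantStore()
    store.put("acct-1", "events", "e1", {"page": "home"})
    store.put("acct-2", "events", "e1", {"page": "cart"})
    print(store.list_records("acct-1", "events"))
```

Because every read and write goes through `tenant_key`, two accounts can use the same dataset and record names without seeing each other's data, which is the property the slide's "keyed by account ID" annotation points at.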

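Slide 15 lists elastic/ephemeral scaling among the scalability techniques, and the summary credits it with much of the cost reduction. As a rough illustration, the Python sketch below sizes a cluster from the depth of a job queue and releases every node when the queue is empty. The functions `desired_nodes` and `scaling_action`, and the parameters `jobs_per_node` and `max_nodes` with their default values, are hypothetical and do not come from the talk.

```python
import math
from typing import Tuple


def desired_nodes(pending_jobs: int, jobs_per_node: int = 4, max_nodes: int = 20) -> int:
    """Return how many worker nodes the current queue depth calls for.

    The defaults are illustrative assumptions, not values from the talk.
    """
    if pending_jobs < 0 or jobs_per_node <= 0 or max_nodes < 0:
        raise ValueError("pending_jobs and max_nodes must be >= 0, jobs_per_node > 0")
    # Ephemeral: an empty queue needs no cluster at all.
    if pending_jobs == 0:
        return 0
    # Elastic: grow with demand, but never past the configured ceiling.
    return min(max_nodes, math.ceil(pending_jobs / jobs_per_node))


def scaling_action(current_nodes: int, pending_jobs: int) -> Tuple[str, int]:
    """Compare the current cluster size with the target and name the action."""
    target = desired_nodes(pending_jobs)
    if target > current_nodes:
        return ("scale_up", target - current_nodes)
    if target < current_nodes:
        return ("scale_down", current_nodes - target)
    return ("hold", 0)


if __name__ == "__main__":
    for nodes, jobs in [(0, 9), (3, 12), (3, 0)]:
        print(nodes, jobs, scaling_action(nodes, jobs))
```

In a multi-tenant service the queue would hold jobs from all accounts, so idle tenants cost nothing and busy ones share the same pool of short-lived nodes.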