How to Build a Scalable and Free Big Data Service

How to build an elastically scalable, multi-tenant, FREE big data service
Webinar

Alan Ho (@karlunho)
Shailendra Baxi (@sbaxi)
Rajesh Bhargava (@rbhargava)

Agenda
1. What we built and why
2. Demo
3. Technical Architecture
4. Developer Experience

What we built
Free big data service for building context aware apps

Context Aware Apps are “Behavior Driven”

Developer Alternatives for Machine Learning
Amazon Machine Learning

Insights approach for Apigee Developer
• Accelerated Development
• Descriptive & Predictive
• Behavior Based Algorithms
• E2E Experience
• Free

Architecture
Diagram labels: DATA, INSIGHTS
1. Data upload: Structured or Unstructured
2. Scalable: Volume, Variety & Velocity
3. Core IP: Machine Learning, Graph Processing, Unstructured Data
4. Analytics Offerings: Predictive & Journey analytics, segmentation
Layers shown in the diagram:
• User Interactions
• Prediction, Journey, Segmentation
• Computational Algorithms
• Machine Learning Library
• Data Pipelines
• Unstructured Data Processors
• GRASP Processor
• Distributed Processing Foundation
• Distributed Data and Job Management
• Apache usergrid
• Query Language
• Modeling Work Bench User Interface

System Components
Components labeled in the diagram:
• Applications: UI, Modeling Workbench, Application Data
• API: HTTP(S); HTTPS, AWS APIs
• Management Service: Insights Master
• Software Libraries: GRASP, Unstructured Data, Machine Learning
• Ephemeral Hadoop Cluster: Modeling, Scoring, Data Transformation, Aggregation/Reporting
• Data Staging Area: Ingestion Datastore
• Monitoring service
• GRASP Query Service: Query Datastore, Query Server
• Real Time Service (Edge): Real Time Datastore (usergrid), node
• Transactional Datastore
• Persistent Datastore (the diagram legend marks S3 and HDFS)
• Metadata Service, Runtime Metadata: Job Queue, Job Dependencies, Data Set partitions
• Metadata Store, Static Metadata: DataStore & Dataset, Application, Job
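
The runtime metadata above lists a job queue and job dependencies. As a minimal sketch of how dependencies can determine run order, the example below uses Python's standard `graphlib` module. The job names and the dependency map are hypothetical and are not taken from the Insights service.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each job maps to the set of jobs it depends on.
JOB_DEPENDENCIES = {
    "ingest": set(),
    "stage": {"ingest"},
    "build_graph": {"stage"},
    "train_model": {"build_graph"},
    "score": {"train_model"},
    "export": {"score", "build_graph"},
}


def run_order(dependencies):
    """Return a job order in which every job comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = run_order(JOB_DEPENDENCIES)
print(order)
```

A scheduler built this way can refuse a job set with circular dependencies, because `TopologicalSorter` raises `graphlib.CycleError` when it finds a cycle.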

How does Insights work?
Ingest Customer Data
• Batch or browser based
• Event based or Customer profile
Aggregate behavior graphs
• Cross-channel, domain-agnostic customer journey graphs
• Enriched with Customer profile
Query capability and machine learning
• Customer journey visualization
• Models & Scores
Data scientist + developer support
• R interface for predictive modeling on Hadoop
• Integrated with API Edge (incl BaaS, node.js)
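
To make the idea of aggregating events into a customer journey graph concrete, here is a minimal sketch that counts transitions between consecutive events of the same user. The event tuple layout, the event names, and the function name are assumptions made for illustration; the slides do not describe how GRASP builds its graphs.

```python
from collections import Counter, defaultdict


def build_journey_graph(events):
    """Count transitions between consecutive events of the same user.

    `events` is an iterable of (user_id, timestamp, event_name) tuples.
    Returns a Counter mapping (from_event, to_event) to a transition count.
    """
    per_user = defaultdict(list)
    for user_id, timestamp, name in events:
        per_user[user_id].append((timestamp, name))

    edges = Counter()
    for visits in per_user.values():
        visits.sort()  # order each user's events by timestamp
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical cross-channel events (web and mobile).
EVENTS = [
    ("u1", 1, "web:view_product"),
    ("u1", 2, "mobile:add_to_cart"),
    ("u1", 3, "web:purchase"),
    ("u2", 1, "web:view_product"),
    ("u2", 2, "mobile:add_to_cart"),
]
graph = build_journey_graph(EVENTS)
```

The edge weights are plain counts, so the same structure works for any domain: the graph only needs event names, not knowledge of what they mean.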
Data Flow
Stores, in order: Customer Data store → Persistent Data store → HDFS on compute cluster → Serving Data store (Customer, usergrid)
1. Data Ingestion (Batch or Browser based)
2. Data moved to persistent storage
3. Data brought to the compute cluster for processing
4. Processed data exported to the appropriate location

Data level Multi-tenancy
[Diagram: system components, annotated for multi-tenancy]
• Datasets segregated/sharded by Account ID
• Data keyed by account ID
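
The two annotations above (sharding by account ID, keying by account ID) can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the shard count, the hash choice, and the `TenantStore` class are all hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(account_id, num_shards=NUM_SHARDS):
    """Map an account ID to a shard index using a stable hash."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class TenantStore:
    """In-memory stand-in for a datastore sharded and keyed by account ID."""

    def __init__(self, num_shards=NUM_SHARDS):
        self._shards = [{} for _ in range(num_shards)]

    def _shard(self, account_id):
        return self._shards[shard_for(account_id, len(self._shards))]

    def put(self, account_id, dataset, record_id, value):
        # Every key starts with the account ID, so tenants never collide.
        self._shard(account_id)[(account_id, dataset, record_id)] = value

    def scan(self, account_id, dataset):
        """Return only the records that belong to this account and dataset."""
        return {
            key[2]: value
            for key, value in self._shard(account_id).items()
            if key[0] == account_id and key[1] == dataset
        }


store = TenantStore()
store.put("acct-1", "events", "e1", {"page": "home"})
store.put("acct-2", "events", "e1", {"page": "cart"})
```

Because the account ID is part of every key, two tenants can reuse the same dataset and record names even when they land on the same shard.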

Scalability
[Diagram: system components, annotated for scalability]
• Horizontal Scaling
• Elastic/Ephemeral scaling
• Sharding
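
One way to picture elastic/ephemeral scaling is a sizing rule that derives the worker count from the pending workload and drops to zero when there is nothing to do. The sketch below is an assumption for illustration; the function name, parameters, and limits are hypothetical and do not describe how the Insights service sizes its Hadoop clusters.

```python
import math


def workers_needed(pending_jobs, jobs_per_worker, min_workers=0, max_workers=20):
    """Return how many workers to run for the current backlog.

    The result is clamped to [min_workers, max_workers]. With min_workers=0
    the cluster is ephemeral: it disappears when no jobs are pending.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(pending_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))


idle = workers_needed(0, 5)      # no backlog, no cluster
small = workers_needed(12, 5)    # 12 jobs at 5 per worker
capped = workers_needed(1000, 5) # backlog exceeds the cap
```

Scaling to zero between workloads is what lets the cost of idle tenants approach nothing, which matters for a free tier.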

Insights UI & APIs
• HTML5 Single page application
• Interacts with RESTful APIs
• Guides novice users through the experience and helps them understand important predictive / machine learning concepts
• Scalable REST API infrastructure

Developer Resources
• E2E Recommendation Tutorial: try it free!
• Sample Datasets
• Blog posts, Embedded Documentation

Try it out Apigee Developer
https://accounts-beta.apigee.com

Summary
• Be practical when approaching multi-tenancy
• Cost can be drastically reduced with elastic scaling & multi-tenancy
• Developer Experience requires continual refinement
• Try out our free service for yourself!
