Apache Hadoop and Spark are best-of-breed technologies for distributed processing and storage of very large data sets: Big Data. Join us as we explain how to integrate Salesforce with off-the-shelf big data tools to build flexible applications. You'll also learn how Force.com is evolving in this area and how Big Objects and Data Pipelines will provide Big Data capability within the platform.
Bringing the Power of Big Data Computation to Salesforce
1. Bringing the Power of Big Data
Computation to Salesforce
Arun Bhat
Chief Architect – Model N Inc.
abhat@modeln.com
@parunbhat
Krishna Shekhram
Software Architect – Model N Inc.
kshekhram@modeln.com
@kshekhram
3. • Model N is the leading provider of Revenue Management solutions for the life sciences and technology industries.
• The company helps customers maximize revenues, drive growth and reduce compliance risk by transforming the revenue lifecycle from an inefficient, disjointed operation into a strategic end-to-end process.
Why do we care about big data
Model N – The Pioneer in Revenue Management
Founded in 1999
$120+B
Revenue under management
2+M
Sales lines processed daily
100+
Companies maximizing revenue
with Model N
50,000+
Sales, Sales Ops, FAEs, Finance, Marketing, Manufacturing reps and Distributor users
100+
Countries where Model N
Revenue Management is used
1,000+
Distributors in 50 Countries
4. Arun Bhat
Chief Architect, Revvy Products
15 years in Model N
19 years in Software Industry
Led Architecture of Model N products
Responsible for architecture of multi-tenant
Revvy products on Salesforce
Passionate about technology but likes to read
comics
Krishna Shekhram
Architect, Revvy Products
6 years in Model N
14 years in Software Industry
Architected Model N Analytics Products
Lead for Revvy Big Data Architecture
Enjoys exploring new technologies. Loves to watch documentaries to learn more about the world.
6. Leveraging Salesforce
Computing using Big Data
Metadata as a common fabric
Integrating into a Cohesive Architecture
Building a Data Driven Application
Demo
Data Pipeline and BigObjects
Summary
Agenda
Big Data
10. Data Explosion
• Source: logs, social media, mobile, IoT, POS
• Format: structured, text, picture, video, binary, document
• Speed: real-time streams, transactions, batch upload
Technology Evolution
• Rapid ingestion
• Bigger storage
• Faster processing
• Quicker retrieval
• Better visualization
Business Opportunities
• Hidden insights discovery
• Fact-based decision making
• Business process automation
• Ecosystem engagement
• Growth & monetization of data
Why “Big Data” is a Big Deal
Competitive advantage for today, Survival for tomorrow
11. Big data technology is going through an innovation spurt
Big Data Technology Landscape
12. Hadoop
Components
• HDFS, Map/Reduce, YARN
• Provides a fault-tolerant and scalable cluster
HDFS as storage
• Supports a variety of data formats
• Metadata driven schema evolution
YARN as cluster manager
• Supports security, resource isolation, multi-tenancy
• Highly available, with elastic scaling
Spark
Components
• Spark Core, SQL, MLlib, Streaming, GraphX
• Can run on a variety of clusters (YARN, Mesos, Standalone)
Data Access
• Data access from HDFS, S3, Cassandra, HBase, JDBC, and streaming sources like Kafka
• Supports multiple formats like Parquet, JSON, CSV, etc.
Compute
• General-purpose, low-latency compute engine
• Batch, interactive, query, predictive, graph and stream processing
Hadoop and Spark Advantage
Data driven, flexible, multi-tenant applications at scale
14. Sales Metadata
URL: /tx/sales/Sales.parquet
Columns:
• Sale ID: ID
• Customer: Relationship (Customer)
• Product: Relationship (Product)
• Invoice Date: Date
• Qty: Integer
• Price: Decimal
Sales Data
• Sales: Sale ID, Customer, Product, Invoice Date, Qty, Price
• Product: Product ID, Product #, BU
• Customer: Customer ID, Name, Type
Metadata Example
Metadata describes data
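As a rough illustration of "metadata describes data", the Sales metadata on this slide can be written down as a plain structure and used to interpret raw records. This is a minimal sketch, not the presenters' implementation: the `SALES_METADATA` layout, the `CONVERTERS` mapping and the `coerce_record` helper are all hypothetical names introduced for the example.

```python
from datetime import date
from decimal import Decimal

# Hypothetical in-memory form of the Sales metadata shown on the slide.
SALES_METADATA = {
    "url": "/tx/sales/Sales.parquet",
    "columns": {
        "Sale ID": {"type": "ID"},
        "Customer": {"type": "Relationship", "target": "Customer"},
        "Product": {"type": "Relationship", "target": "Product"},
        "Invoice Date": {"type": "Date"},
        "Qty": {"type": "Integer"},
        "Price": {"type": "Decimal"},
    },
}

# Assumed mapping from metadata type names to Python converters.
CONVERTERS = {
    "ID": str,
    "Relationship": str,  # kept as the ID of the related record
    "Date": date.fromisoformat,
    "Integer": int,
    "Decimal": Decimal,
}


def coerce_record(raw, metadata):
    """Turn a record of strings into typed values, driven only by the metadata."""
    typed = {}
    for name, spec in metadata["columns"].items():
        if name not in raw:
            raise KeyError(f"missing column: {name}")
        typed[name] = CONVERTERS[spec["type"]](raw[name])
    return typed


raw_sale = {
    "Sale ID": "S-1",
    "Customer": "C-7",
    "Product": "P-3",
    "Invoice Date": "2015-09-15",
    "Qty": "4",
    "Price": "19.99",
}
sale = coerce_record(raw_sale, SALES_METADATA)
```

Because the code never names a Sales column directly, adding a column to the metadata changes what is read without changing the code.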
15. Flexibility & Extensibility
Key for multi-tenant cloud applications
[Diagram: Calculation Unit. Input Dataset → Calc Op → Output Dataset, with metadata defining the input and output datasets]
[Diagram: Calculation Model. Several Input Datasets → Calculation Model → several Output Datasets, driven by Metadata and Configuration]
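The calculation unit and calculation model on this slide can be sketched in a few lines. This is only one possible reading of the diagram, under stated assumptions: `CalcUnit`, `run_model` and the two sample operations are hypothetical, and plain Python lists stand in for datasets that would live in HDFS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass
class CalcUnit:
    """One calculation step: input dataset -> operation -> output dataset."""
    input_name: str
    output_name: str
    op: Callable[[Rows, dict], Rows]
    config: dict  # per-tenant variability lives here, not in code


def run_model(units: List[CalcUnit], datasets: Dict[str, Rows]) -> Dict[str, Rows]:
    """Run units in order; each output is available as input to later units."""
    results = dict(datasets)
    for unit in units:
        results[unit.output_name] = unit.op(results[unit.input_name], unit.config)
    return results


def add_revenue(rows: Rows, config: dict) -> Rows:
    # Derive a Revenue column from Qty and Price.
    return [{**r, "Revenue": r["Qty"] * r["Price"]} for r in rows]


def filter_min(rows: Rows, config: dict) -> Rows:
    # Keep rows whose configured field reaches the configured minimum.
    return [r for r in rows if r[config["field"]] >= config["min"]]


model = [
    CalcUnit("sales", "sales_with_revenue", add_revenue, {}),
    CalcUnit("sales_with_revenue", "large_sales", filter_min,
             {"field": "Revenue", "min": 100}),
]
sales = [
    {"Sale ID": "S-1", "Qty": 4, "Price": 10},
    {"Sale ID": "S-2", "Qty": 20, "Price": 10},
]
out = run_model(model, {"sales": sales})
```

Changing the `min` threshold or reordering units is a configuration change, which is the flexibility the slide is pointing at.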
16. • Metadata Capture & Synchronization
  • Define every dataset as an object in Salesforce to capture its metadata. Example: Sales, Inventory, Order
  • Load the actual data into HDFS
  • Synchronize metadata on change
• Master Data Sync
  • Synchronize the master data from SFDC to HDFS. Example: Accounts, Catalog
• HDFS Schema using metadata
  • Use HDFS file formats that support schema evolution (e.g. Parquet, Avro)
  • Use the dataset metadata to read/write HDFS files
• Configure Calculation
  • Define variability in calculations as configuration using Salesforce custom objects
Leverage Salesforce to capture metadata
Flexibility & Extensibility using metadata
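The "synchronize metadata on change" step can be pictured as deriving a file schema from the Salesforce object definition and comparing it with the previous version. The sketch below is an assumption-heavy illustration: `TYPE_MAP`, `to_file_schema` and `diff_schema` are hypothetical, and the target type names (including the decimal precision) are placeholders rather than anything stated in the talk.

```python
# Assumed mapping from Salesforce-style field types to file-schema type names.
TYPE_MAP = {
    "ID": "string",
    "Relationship": "string",
    "Date": "date",
    "Integer": "int64",
    "Decimal": "decimal(18,2)",  # precision and scale are placeholders
}


def to_file_schema(columns):
    """Translate {column: salesforce_type} into {column: file_type}."""
    return {name: TYPE_MAP[ftype] for name, ftype in columns.items()}


def diff_schema(old, new):
    """Report columns added, removed or retyped between two schema versions."""
    added = {c: t for c, t in new.items() if c not in old}
    removed = sorted(c for c in old if c not in new)
    retyped = {c: (old[c], new[c]) for c in old if c in new and old[c] != new[c]}
    return {"added": added, "removed": removed, "retyped": retyped}


sales_v1 = {"Sale ID": "ID", "Qty": "Integer", "Price": "Decimal"}
# An admin adds a Discount field to the Sales object in Salesforce.
sales_v2 = {**sales_v1, "Discount": "Decimal"}

changes = diff_schema(to_file_schema(sales_v1), to_file_schema(sales_v2))
```

An added column is the kind of change that formats with schema evolution support are designed to absorb; removed or retyped columns would likely need an explicit migration decision.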
18. • Exposes all the REST APIs needed by the application
• Stores application and object metadata
• Provides support for multi-tenancy, error handling and recovery
• Provides secure APIs for:
  • Metadata synchronization
  • Data loads
  • Batch calculation
  • Querying the aggregated results
  • Real-time calculation/prediction
Exposes big data computation as service
Web Service as Middleware
[Diagram: Compute Cluster, Cluster Web Service]
19. • Abstracts away the complexity of big data technology
• Translates business-specific service calls into calculation jobs
• Uses metadata to build the calculation model
• Handles the connection to the cluster
• Manages the multi-tenancy context when submitting jobs to the cluster
• Interacts with various cluster components:
  • HDFS
  • YARN
  • Spark
Acts as client for cluster
Web Service as Middleware
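To make "translates business-specific service calls into calculation jobs" and "manages the multi-tenancy context" more concrete, here is a minimal sketch of what the middleware might do before submitting anything to the cluster. Everything in it is assumed for illustration: `TenantContext`, the `TENANTS` registry, the shape of the service call and the job description returned by `build_job` are hypothetical, and no real cluster API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Per-tenant settings the middleware attaches to every job (hypothetical)."""
    tenant_id: str
    hdfs_root: str
    yarn_queue: str


# Hypothetical tenant registry kept by the web service.
TENANTS = {"acme": TenantContext("acme", "/tenants/acme", "queue_acme")}

# Dataset metadata as synchronized from Salesforce (shape assumed).
METADATA = {
    "Sales": {
        "url": "/tx/sales/Sales.parquet",
        "columns": ["Sale ID", "Customer", "Product", "Invoice Date", "Qty", "Price"],
    }
}


def build_job(tenant_id, service_call, dataset_metadata):
    """Turn a business-level call into a tenant-scoped job description."""
    tenant = TENANTS[tenant_id]  # unknown tenants fail here with KeyError
    meta = dataset_metadata[service_call["dataset"]]
    return {
        "job_type": service_call["action"],
        "queue": tenant.yarn_queue,                    # resource isolation per tenant
        "input_path": tenant.hdfs_root + meta["url"],  # data isolation per tenant
        "columns": list(meta["columns"]),
        "params": service_call.get("params", {}),
    }


call = {"action": "segment_customers", "dataset": "Sales", "params": {"by": "revenue"}}
job = build_job("acme", call, METADATA)
```

The caller never sees paths or queues, only a business action and a dataset name, which is one way to keep cluster details out of the application.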
20. Building a Data Driven Application
Getting the best of both worlds to realize business value
21. • Unified transactional and analytics application
• Provides real-time insights from data in a business context
• Calculates KPIs and processes data for the business
• Evaluates performance against goals based on data
• Combines intelligence with action
• Facilitates business process automation
• Learns from data to support fast and accurate decisions
Key Concepts
What is a data driven application
22. Contextual Discovery
Interactive dashboards and analysis in the business context of the transactional application.
Example: account performance dashboard in a CRM application
Business Process Automation
Measuring KPIs and triggering workflow actions, alerts or notifications based on those KPIs.
Examples: claim processing, fraud detection
Data Processing
Processing large amounts of data and running business calculations on it to generate results critical for business operations.
Examples: tax report generation, stock portfolio valuation
Decision Intelligence
Intelligent decisions and actions based on learning from data: prediction, optimization, anomaly detection, AI, recommendation.
Examples: Google Now, price optimization
Data Driven Application Examples
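The business process automation example, measuring a KPI and triggering an action from it, reduces to a small rule. The sketch below is hypothetical throughout: the `evaluate_kpi` function, the 80% alert threshold and the action names are assumptions chosen for illustration, not values from the talk.

```python
def evaluate_kpi(name, actual, goal, alert_below=0.8):
    """Compare a KPI with its goal and choose a follow-up action."""
    attainment = actual / goal
    if attainment < alert_below:
        action = "alert"    # well short of the goal: raise an alert
    elif attainment < 1.0:
        action = "notify"   # close to the goal: send a notification
    else:
        action = "none"     # goal met: nothing to do
    return {"kpi": name, "attainment": attainment, "action": action}


behind = evaluate_kpi("Quarterly revenue", actual=70, goal=100)
close = evaluate_kpi("Quarterly revenue", actual=90, goal=100)
met = evaluate_kpi("Quarterly revenue", actual=120, goal=100)
```

In a real application the returned action would feed a workflow rule or notification in the transactional system rather than a dictionary.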
23. Guideline for building a data driven application
Reference Architecture
[Diagram components]
• Common Library: Metadata Manager, Data Manager, Job Manager, Config Manager
• Application: Account, Catalog, Opportunity, Sales, Segment
• Web App Middleware: Cluster Client, Metadata Service, Data Service, Application Service
• Big Data Cluster: Data Storage, Calculation Runtime
25. User enters segment definition
See Sales metadata in Salesforce
Show Sales lines loaded in Hadoop
Trigger segmentation from Salesforce
Show dashboards with segmented customers in Salesforce
Segmenting customers based on revenue
Demo Overview
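The calculation at the heart of the demo, segmenting customers by revenue, can be sketched on a handful of sales lines. This is a plain-Python stand-in written under assumptions: in the demo the work runs on the cluster, and the segment names, thresholds and helper functions here are hypothetical.

```python
from collections import defaultdict

# Hypothetical segment definition as a user might enter it:
# (segment name, minimum revenue), ordered from highest to lowest.
SEGMENTS = [("Platinum", 100_000), ("Gold", 10_000), ("Standard", 0)]


def revenue_by_customer(sales_lines):
    """Sum Qty * Price per customer."""
    totals = defaultdict(int)
    for line in sales_lines:
        totals[line["Customer"]] += line["Qty"] * line["Price"]
    return dict(totals)


def segment_customers(sales_lines, segments):
    """Assign each customer the first segment whose minimum it reaches."""
    result = {}
    for customer, revenue in revenue_by_customer(sales_lines).items():
        for name, minimum in segments:
            if revenue >= minimum:
                result[customer] = name
                break
    return result


sales_lines = [
    {"Customer": "C-1", "Qty": 1000, "Price": 150},
    {"Customer": "C-2", "Qty": 100, "Price": 120},
    {"Customer": "C-2", "Qty": 10, "Price": 50},
    {"Customer": "C-3", "Qty": 5, "Price": 10},
]
segmented = segment_customers(sales_lines, SEGMENTS)
```

Because the segment definition is data rather than code, a user can change thresholds in Salesforce and rerun the job without redeploying anything.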
27. Data Pipelines
Brings batch processing using Hadoop to the Salesforce Platform
Apache Pig for data flow control and evaluation
BigObjects
Storage of large amounts of data
Data Pipelines and BigObjects (Pilot)
28. Features that can be leveraged
BigObjects to store POS, Order and line items
Apache Pig Script and Hadoop through the Data Pipeline API
Features that need to be incorporated
Support Data Pipeline API through Apex (instead of the Metadata API)
Support for low latency jobs e.g. Spark (as compared to batch processing)
To get big data computation in Salesforce
Collaborate with Salesforce on big data roadmap
31. • How to leverage Salesforce to build flexible cloud applications
• How to use big data computation to realize valuable insights, actions and faster decisions from your data at scale
• How to fuse Salesforce and Big Data technologies together using metadata and integrations
• How to unlock your business potential using data driven applications
• How Salesforce and Big Data technologies can coexist well
What we learnt
Summary