1. © 2016 MapR Technologies
Evolving Beyond the Data Lake
A Story of Wind and Rain
2.
Industry Leaders Are Investing in Disruptive Technology Now
Innovating and reducing costs at the same time
Source: IDC, Gartner; Analysis & Estimates: MapR
Next-gen consists of cloud, big data, software and hardware related expenses
[Chart: Investment in Next-Gen vs. Legacy Technologies for Data, 2013 to 2020, in billions of US dollars. Series: total $ growth of IT market; next-gen growth; legacy market growth/shrink in $.]
90% of data will be on next-gen technology within just four years
3.
Application Development and Deployment
[Diagram: 2005 vs. today. 2005: desktop browser (JavaScript, jQuery); HTML, CSS, JS; IIS, ASP.NET; SQL; SQL Server; Microsoft Reporting Service. Today: desktop browser (JavaScript, 20+ frameworks), tablet, native Android, native iOS; JSON and JSON, CSS, HTML, JS; Backend for Frontend (Java); microservices (.NET, NodeJS, Java); events (Kafka); SQL Server, NoSQL, graph DB, Oracle DB; bulk load into a data lake; machine learning, predictive modeling, BI/reporting, insights, customer insights.]
4.
Application Development and Deployment (diagram repeated from slide 3)
5.
Messaging platforms
6.
What’s a Stream?
A stream is an unbounded sequence of events carried from a set of producers to a set of consumers.
Producers and consumers don’t have to be aware of each other; instead, they participate in shared topics. This is called publish/subscribe.
[Diagram: producers → /Events:Topic → consumers]
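A minimal sketch of the publish/subscribe idea, assuming a toy in-memory broker. The `Broker` class and its methods are hypothetical, not the MapR Streams or Kafka API. Producers and consumers share only the topic name `/Events:Topic`; neither side holds a reference to the other. Unlike a real stream, this toy pushes each event to handlers immediately and keeps nothing, so it shows the decoupling but not the persistence.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Broker:
    """Hypothetical in-memory broker: producers and consumers only share topic names."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # A consumer registers interest in a topic, not in a particular producer.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # A producer hands the event to the topic; it never sees who consumes it.
        for handler in self._subscribers[topic]:
            handler(event)


broker = Broker()
alerts: List[dict] = []
dashboard: List[dict] = []
broker.subscribe("/Events:Topic", alerts.append)
broker.subscribe("/Events:Topic", dashboard.append)

# Two unrelated producers publish to the same topic.
broker.publish("/Events:Topic", {"source": "sensor-7", "temp": 81})
broker.publish("/Events:Topic", {"source": "mobile-app", "click": "buy"})

print(len(alerts), len(dashboard))  # 2 2
```

Adding a third consumer is one more `subscribe` call; no producer code changes.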
7.
Publishers and Subscribers (pub-sub)
[Diagram: publishers (social platforms, servers (logs, metrics), sensors, mobile apps, other apps & microservices) write to /Events:Topic; consumers (analytics, stream processors, alerting systems, stream processing frameworks, databases & search engines, dashboards, other apps & microservices) read from it.]
8.
Considering a Messaging Platform
• 50-100k messages per second used to be considered good
– Not really enough to handle decoupled communication between services
• The Kafka model is BLAZING fast
– Kafka 0.9 API with 200-byte messages
– MapR Streams on a 5-node cluster sustained 18 million events/sec
– Throughput of 3.5 GB/s and over 1.5 trillion events/day
• Manual sharding is not a “great” solution
– Adding more servers should be easy and foolproof, not painful
– Yes, I have lived through this
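The benchmark figures on this slide can be cross-checked with simple arithmetic. The sketch below only multiplies the quoted rate by the quoted message size and by the number of seconds in a day; it counts payload bytes and ignores protocol overhead.

```python
# Back-of-the-envelope check of the benchmark figures quoted on the slide.
EVENTS_PER_SEC = 18_000_000   # sustained rate on the 5-node cluster (from the slide)
MESSAGE_BYTES = 200           # message size used in the test (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_sec = EVENTS_PER_SEC * MESSAGE_BYTES
events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY

print(f"{bytes_per_sec / 1e9:.1f} GB/s")            # 3.6 GB/s
print(f"{events_per_day / 1e12:.2f} trillion/day")  # 1.56 trillion/day
```

The result, about 3.6 GB/s of payload and roughly 1.56 trillion events per day, is in line with the 3.5 GB/s and "over 1.5 trillion events/day" figures above.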
9.
Goals
• Real-time or near-real-time
– Includes situations with deadlines
– Also includes situations where delay is simply undesirable
– Even includes situations where delay is just fine
• Microservices
– Streaming is a convenient idiom for design
– Microservices … you know we wanted it
– Service isolation is a key requirement
10.
Advantages of Messaging and Real-time Enablement
• Fewer moving parts
– Fewer things to go wrong
• Better resource utilization
– Scale any application up or down on demand
• Common deployment model (new isolation model)
– Repeatability between environments (dev, QA, production)
• Improved integration testing
– Listen to production streams in dev and QA (this is a BIG DEAL!)
• Shared file system
– Get at the data anywhere in the cluster
– Simplifies business continuity
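To illustrate why listening to production streams in dev and QA is cheap, here is a sketch assuming an append-only log where each reader keeps its own offset, similar in spirit to Kafka consumer groups. The `StreamLog` class and the group names are hypothetical. Because reading does not remove events, a QA or dev consumer can replay the same traffic without affecting production.

```python
from typing import Dict, List


class StreamLog:
    """Hypothetical append-only log; each consumer group tracks its own read offset."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.offsets: Dict[str, int] = {}

    def append(self, event: str) -> None:
        self.events.append(event)

    def poll(self, group: str, max_events: int = 10) -> List[str]:
        # Reading never removes events, so one group cannot starve another.
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch


orders = StreamLog()
for i in range(5):
    orders.append(f"order-{i}")

prod = orders.poll("production")         # the live service reads everything
qa = orders.poll("qa", max_events=2)     # QA replays the same events at its own pace
dev = orders.poll("dev")                 # dev tests a new build against real traffic

print(prod == dev, qa)  # True ['order-0', 'order-1']
```

A real deployment would also keep dev and QA from writing back into production topics; this sketch only covers the read side.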
11.
A microservice is loosely coupled, with a bounded context
12.
How to Couple Services and Break micro-ness
• Shared schemas, relational stores
• Ad hoc communication between services
• Enterprise service buses
• Brittle protocols
• Poor protocol versioning
Don’t do this!
13.
How to Decouple Services
• Use self-describing data
• Private databases
• Infrastructural communication between services
• Use modern protocols
• Adopt future-proof protocol practices
• Use shared storage where necessary due to scale
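One way to apply "self-describing data" and "future-proof protocol practices" is a tolerant reader over JSON messages: new fields are only ever added with defaults, existing field types never change, and unknown fields are ignored. The sketch assumes a hypothetical `CardSwipe` message; the field names are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass
class CardSwipe:
    card: str
    location: str
    amount_cents: int = 0  # field added in a later version; old messages get a default


def read_swipe(raw: str) -> CardSwipe:
    msg = json.loads(raw)  # self-describing: field names travel with the data
    # Unknown fields from newer producers (e.g. "channel") are simply ignored.
    return CardSwipe(
        card=msg["card"],
        location=msg["location"],
        amount_cents=msg.get("amount_cents", 0),
    )


v1 = '{"card": "4111", "location": "SFO"}'
v2 = '{"card": "4111", "location": "JFK", "amount_cents": 1250, "channel": "web"}'

print(read_swipe(v1))  # CardSwipe(card='4111', location='SFO', amount_cents=0)
print(read_swipe(v2))  # CardSwipe(card='4111', location='JFK', amount_cents=1250)
```

Old and new producers can then share a topic during a gentle migration, without a schema meeting.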
14.
Decoupled Architecture
[Diagram: three producers and an activity handler, with historical data, interesting data, real-time analysis, anomaly detection and a results dashboard.]
15.
Mechanisms for Decoupling
• Traditional message queues?
– Message queues are the classic answer
– Key feature/flaw is out-of-order acknowledgement
– Many implementations
– You pay a huge performance hit for persistence
• Kafka-esque Logs?
– Logs are like queues, but with ordering
– Out-of-order consumption is possible, acknowledgement not so much
– The canonical implementation is Kafka
– Performance plus persistence
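The difference in acknowledgement models can be shown with two toy classes, a sketch under simplifying assumptions rather than any real broker's API. A queue tracks an acknowledgement per message, in any order; a log lets a consumer read any offset but records progress as one committed position per consumer group.

```python
from typing import Dict, List, Set


class Queue:
    """Classic queue: each message is acknowledged individually, in any order."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = dict(enumerate(messages))
        self.acked: Set[int] = set()

    def ack(self, msg_id: int) -> None:
        self.acked.add(msg_id)  # per-message bookkeeping on the broker

    def unacked(self) -> List[str]:
        return [m for i, m in self.messages.items() if i not in self.acked]


class Log:
    """Kafka-style log: a consumer group commits a single position (offset)."""

    def __init__(self, messages: List[str]) -> None:
        self.messages = list(messages)
        self.committed: Dict[str, int] = {}

    def read(self, offset: int) -> str:
        return self.messages[offset]  # any offset can be read (seek/replay)

    def commit(self, group: str, offset: int) -> None:
        # Committing offset N means "everything before N is done".
        self.committed[group] = offset


q = Queue(["a", "b", "c"])
q.ack(2)                 # acknowledge "c" before "a" and "b"
print(q.unacked())       # ['a', 'b']

log = Log(["a", "b", "c"])
print(log.read(2))       # c  (reading out of order is fine)
log.commit("fraud", 1)   # only "a" (offset 0) is recorded as done
print(log.committed)     # {'fraud': 1}
```

The per-message state in `Queue` is the transaction-like cost that slows traditional queues once persistence is added; a log only appends and advances one number per group, which in a real system lets it read and write sequentially.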
17.
Fraud Detection
[Diagram: POS 1 and POS 2 each send (location, t, card #) to a fraud check ("?") and get back a yes/no answer.]
18.
Traditional Solution
[Diagram: POS 1..n → fraud detector ↔ last card use store.]
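The slides do not specify the detection rule, so the sketch below assumes a simple "impossible travel" check: if the distance between this swipe and the card's last use implies a speed above a hypothetical `MAX_KMH` threshold, the swipe is flagged. As in the traditional design, the detector both reads and writes the last card use table, which is what ties every detector instance to that shared store.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Swipe:
    card: str
    lat: float
    lon: float
    t: float  # seconds since epoch


MAX_KMH = 900.0  # hypothetical threshold, roughly airliner speed


def km_between(a: Swipe, b: Swipe) -> float:
    # Haversine great-circle distance in kilometres.
    r = 6371.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def is_fraud(swipe: Swipe, last_use: Dict[str, Swipe]) -> bool:
    prev: Optional[Swipe] = last_use.get(swipe.card)
    last_use[swipe.card] = swipe  # traditional design: the detector also writes
    if prev is None:
        return False
    hours = max((swipe.t - prev.t) / 3600.0, 1e-6)
    return km_between(prev, swipe) / hours > MAX_KMH


last_card_use: Dict[str, Swipe] = {}
print(is_fraud(Swipe("4111", 37.62, -122.38, 0), last_card_use))    # False (first use)
print(is_fraud(Swipe("4111", 40.64, -73.78, 1800), last_card_use))  # True (SFO to JFK in 30 min)
```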
19.
What Happens Next?
[Diagram: the POS 1..n → fraud detector pair is replicated three times, with all copies sharing one last card use store.]
20.
What Happens Next? (diagram repeated from slide 19)
21.
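How to Get Service Isolation
[Diagram: POS 1..n → fraud detector, which reads the last card use store and publishes card activity to a stream; an updater consumes the card activity and maintains the last card use store.]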
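A sketch of the service-isolation pattern, assuming a Python `deque` stands in for the card activity stream and a dict for the last card use table (all names and the 10-minute rule are hypothetical). The fraud detector only reads the table and publishes each swipe; a separate updater is the only writer.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Activity = Tuple[str, str, float]  # (card, location, t): hypothetical event shape

card_activity: Deque[Activity] = deque()  # stands in for the card activity stream
last_card_use: Dict[str, Activity] = {}   # owned exclusively by the updater


def fraud_detector(event: Activity) -> bool:
    """Reads last card use but never writes it; publishes the swipe instead."""
    card, location, t = event
    prev = last_card_use.get(card)
    card_activity.append(event)
    # Placeholder rule: a different location within 10 minutes is suspicious.
    return prev is not None and prev[1] != location and t - prev[2] < 600


def updater() -> int:
    """Drains the card activity stream into the last card use table."""
    applied = 0
    while card_activity:
        card, location, t = card_activity.popleft()
        last_card_use[card] = (card, location, t)
        applied += 1
    return applied


print(fraud_detector(("4111", "SFO", 0.0)))    # False: no history yet
updater()
print(fraud_detector(("4111", "JFK", 300.0)))  # True: SFO, then JFK five minutes later
```

The trade-off is that the detector sees the table only as fresh as the updater has made it; in exchange, detectors and updaters can be scaled, redeployed, or duplicated independently, and other consumers can read the same card activity stream.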
22.
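New Uses of Data
[Diagram: the service-isolation layout (POS 1..n, fraud detector, last card use, updater) plus additional consumers of the card activity stream: card location history and other services.]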
23.
Scaling Through Isolation
[Diagram: two copies of the POS 1..n / fraud detector / updater / last card use layout, each with its own last card use store, fed by the same card activity stream.]
24.
Use Cases
25.
Event-based Data Drives Applications
• Failure alerts
• Real-time application & network monitoring
• Trending now
• Web personalized offers
• Real-time fraud detection
• Ad optimization
• Supply chain optimization
26.
Fighting Fraudulent Web Traffic
[Diagram: an activity stream and a click stream feed classifiers that use a user activity profile: deviation from normal, blacklist activities (known bad classifier) and whitelist activities (all OK classifier). Outcomes: session alteration and a stream to notify security.]
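The classifier layout on this slide can be sketched as a routing function, assuming hypothetical blacklist and whitelist activity names and a per-user history of request rates as the "user activity profile". The 3-standard-deviation threshold and the five-sample warm-up are arbitrary choices for illustration.

```python
import statistics
from typing import Dict, List

# Hypothetical activity names; a real system would maintain these lists elsewhere.
BLACKLIST = {"admin_login_bruteforce", "card_testing"}
WHITELIST = {"view_help_page"}


def classify(user: str, activity: str, rate: float,
             profile: Dict[str, List[float]]) -> str:
    """Route one click-stream event to an outcome, loosely following the slide."""
    if activity in BLACKLIST:            # known bad classifier
        return "alter_session_and_notify_security"
    if activity in WHITELIST:            # all OK classifier
        return "ok"
    history = profile.setdefault(user, [])
    verdict = "ok"
    if len(history) >= 5:                # deviation from normal, once a baseline exists
        mean = statistics.mean(history)
        spread = statistics.pstdev(history) or 1.0
        if (rate - mean) / spread > 3.0:
            verdict = "notify_security"
    history.append(rate)                 # keep the user activity profile up to date
    return verdict


profile: Dict[str, List[float]] = {}
for r in [10, 12, 11, 9, 10]:            # normal browsing builds the baseline
    classify("u1", "browse", r, profile)
print(classify("u1", "browse", 80, profile))       # notify_security
print(classify("u2", "card_testing", 1, profile))  # alter_session_and_notify_security
```

For simplicity the sketch appends anomalous rates to the profile too; a production version would likely exclude flagged events from the baseline.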
27.
Similarities between Marketing and Fraud?
Website Fraud
• Build a user profile
– What are their normal usage patterns?
• Build “segmented” profiles
– What do real users normally do?
• Dynamically alter website
– Prevent user functionality
• Kick off external workflows
– Notify security team
Customer 360
• Build a user profile
– What type of content do they like?
• Build “segmented” profiles
– Company affiliation
• Dynamically alter website
– Show alternate content
• Kick off external workflows
– Nurture emails
28.
Legacy Business Platforms
[Diagram: operational applications (J2EE app server, relational database, message bus, specialized storage) and analytical applications (ETL tool, analytic database, BI tool, specialized storage) as separate stacks.]
• IT must integrate all the products
• Inability to operationalize insights rapidly
• Can’t deal with high-speed data ingestion and processing
• Scale-up architecture leads to high cost
29.
Converged Data Platform
[Diagram: converged applications with complete access to real-time and historical data in one platform, combining operational applications (developers creating database and event-based applications; current data) and analytical applications (analysts creating BI reports and KPIs on the data warehouse; historical data). Labels: bottom-line initiatives, top-line initiatives.]
30.
MapR Platform Services: Open API Architecture
Assures Interoperability, Avoids Lock-in
[Diagram: web-scale storage (MapR-FS), database (MapR-DB) and event streaming (MapR Streams), sharing real-time, unified security, multi-tenancy, disaster recovery, global namespace and high availability. APIs: HDFS API, POSIX, NFS; SQL, HBase API, JSON API; Kafka API.]
31.
Converged Application Benefits
• Consumers scale horizontally with partitions
– 1:1 mapping between consumer and partition
– Enables predictable scaling as production needs grow
• Data can be seamlessly replicated to another cluster
– Enables HA with zero code changes
• Data is indexed dynamically according to receivers and senders
• Scales beyond the capabilities of Kafka
• Snapshots can be taken to capture state
– Enables faster testing and deployment of applications
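The partition-per-consumer point can be sketched as follows, assuming a stable hash (`zlib.crc32`, a stand-in for the client's real partitioner) to map keys to partitions and a simple round-robin assignment of partitions to the consumers in a group. Each partition gets exactly one consumer, so parallelism tops out at the number of partitions, which is where the 1:1 mapping comes from.

```python
import zlib
from typing import Dict, List

NUM_PARTITIONS = 4  # hypothetical partition count for one topic


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash so the same key always lands in the same partition
    # (crc32 is a stand-in; real clients use their own partitioner).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(consumers: List[str], num_partitions: int = NUM_PARTITIONS) -> Dict[str, List[int]]:
    # Round-robin: every partition gets exactly one consumer in the group.
    plan: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        plan[consumers[p % len(consumers)]].append(p)
    return plan


print(assign(["c1", "c2"]))              # {'c1': [0, 2], 'c2': [1, 3]}
print(assign(["c1", "c2", "c3", "c4"]))  # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3]}
print(partition_for("card-4111") == partition_for("card-4111"))  # True
```

Adding a fifth consumer to this group would leave it idle, so growing capacity beyond that point means adding partitions first.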
32.
Not All Data Platforms are the Same
33.
Engage with us!
@kingmesal
jscott@mapr.com
kingmesal
Editor’s Notes
Great news, I have 467 slides today… Haha… I’m just kidding… I only have 465… Over the next four years, companies will experience flat IT spending. But underneath that will be a steady decrease in legacy spend accompanied by a corresponding increase in spend behind next-gen technologies. But this chart also provides insight into the solution. The key to reducing costs while driving innovation is the data.
[CLICK] In fact, the forecast also shows that within four years 90% of data will be on next-gen technology… It’s important to realize how dramatically application development has changed in the past 10 years…
The complexity was driven by the difficulties in dealing with separate silos of data… Real time means you have a choice: now, or when you are ready.
I can’t emphasize enough that this capability allows you to feed your production data stream into dev and QA for testing. Many people would give an arm for that capability. Much more than just traditional real-time…
Not FINITE! It is a stream… There is no explicit end. Real time means you have a choice: now, or when you are ready.
Anyone ever sit in a LONG meeting to discuss changing a database schema? Data warehouse? You know, those 30-minute meetings that run an hour long with no agreed-upon answer?
Add fields, DO NOT CHANGE a field type… Gentle migrations… like JSON…
Not sharing your database with everyone helps.
Avro, binary JSON…
Message-driven architectures are fundamentally sound, but in the past the cost to scale the messaging layer was prohibitive. Reading and acknowledging a message is much like a database transaction, and within message queues this is a major factor in performance.
All I/O is in a continuous, sweeping motion, which increases throughput.
Either due to meetings! Or perhaps one application dominating the use of the shared database.
Event-based data drives applications. Whether it’s collecting machine sensor data to predict and prevent failures, providing key offers to customers, or identifying and preventing fraud before it happens, all these use cases are enabled by event-based data flows and a converged platform.
Bad actors! / Fraudsters
Deviation from normal.
For those who may be more familiar with Customer 360, let me explain just how similar the software to support both models really is. 1 line of code.
Spyglass blueprint for converged applications.
Event-driven microservices.
This table isn’t meant to show you that we do everything better than Cloudera and Hortonworks; it is to show you that not all platforms are built for the same purpose. This is really intended to show you how truly different we are from Cloudera and Hortonworks. We compete with a number of other companies like IBM (DB2 and MQ), Oracle (DW and Database), Tibco (MQ)… we compete with a lot of different companies that cover different parts of your business. We just chose to build it into a single homogeneous platform.