Big data doesn't mean big money. In fact, choosing a NoSQL solution will almost certainly save your business money, in terms of hardware, licensing, and total cost of ownership. What's more, choosing the correct technology for your use case will almost certainly increase your top line as well.
Big words, right? We'll back them up with customer case studies and lots of details.
This webinar will give you the basics for growing your business in a profitable way. What's the use of growing your top line but outspending any gains on cumbersome, ineffective, outdated IT? We'll take you through the specific use cases and business models that are the best fit for NoSQL solutions.
By the way, no prior knowledge is required. If you don't even know what RDBMS or NoSQL stand for, you are in the right place. Get your questions answered, and get your business on the right track to meeting your customers' needs in today's data environment.
Webinar: ROI on Big Data - RDBMS, NoSQL or Both? A Simple Guide for Knowing How to Choose
1. ROI on Big Data: RDBMS, NoSQL or Both?
Robin Schumacher
VP Products, DataStax
2. 3 Big Ideas for Today's Conversation
• Big data != big money
• Big words require big back-up
• All questions are big (and never foolish)
• Exciting News: Launch of DataStax Enterprise 3.1
3. Agenda: What Will We Cover?
• Introduction to DataStax and NoSQL
• Overview of legacy vs. modern big data applications
• Comparing RDBMSs and NoSQL
• Customer examples of RDBMS-to-NoSQL swap-outs and coexistence strategies
• Conclusions
4. Avoid Big Data FUD
• Cost compared to what?
• Value compared to what?
• How do you plan for success?
http://www.informationweek.com/big-data/commentary/big-data-analytics/when-big-data-equals-big-money-waste/240157956
5. DataStax: An Overview
• Founded in April 2010
• We drive Apache Cassandra™, the popular open-source NoSQL database
• We provide DataStax Enterprise for enterprise NoSQL implementations
• 300+ customers
• 100+ employees
• Home to the Apache Cassandra project chair and most committers
• Headquartered in the San Francisco Bay Area
• Funded by prominent venture firms
6. What is Apache Cassandra?
A massively scalable NoSQL database for the datacenter and the cloud, with easy data distribution, that offers uptime, all the time (continuous availability).
Source: http://www.datastax.com/resources/whitepapers/bigdata
7. What is DataStax Enterprise?
DataStax Enterprise: powered by Apache Cassandra™, certified for production
1. DataStax Enterprise Server
2. OpsCenter Enterprise
3. Expert Support & Services
• Massive scalability
• Continuous availability
• Operational simplicity for real-time, analytic, and enterprise search data
8. Details of DataStax Enterprise Server
• Production-certified version of Cassandra for online applications.
• Integrated Hadoop for batch analytics.
• Built-in Solr for enterprise search.
• Comprehensive security for sensitive data.
• Active-everywhere architecture.
• Gold standard for multi-data center and cloud deployments.
• Built-in data replication; removes the need for ETL.
• Complete isolation between different workloads.
• Methods for data migration from legacy RDBMSs.
9. Details of DataStax OpsCenter
A new 10-node DSE cluster (Cassandra or Hadoop) with OpsCenter, running on AWS in 3 minutes: steps 1, 2, 3, done.
10. Launch Today: DataStax Enterprise 3.1
• Lower total cost of ownership
• Better ROI
• Simpler & faster development
• Greater insight
• More flexibility and functionality
11. What's New: Cassandra 1.2 Integration
• Manage up to 10x more Cassandra data per node than prior versions for many use cases
• Use vnodes and parallel operations to increase capacity and perform maintenance operations much faster
• Get much greater functionality with the new CQL binary protocol via the Java and .NET drivers
• Store arrays and lists of data much more easily with collections
• Get deeper visibility into the response times of your queries and other database operations with tracing
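A rough illustration of why vnodes speed up capacity changes: with one token per node, a joining node takes its data from a single neighbor, while with many tokens per node it takes small ranges from every existing node, so data can stream from all of them in parallel. The sketch below is a minimal token-ring model under stated assumptions: md5 stands in for Cassandra's real partitioner, and the node names, the vnode count of 64, and all function names are hypothetical, not Cassandra's API.

```python
import bisect
import hashlib

def token(label: str) -> int:
    """Map a string to a 64-bit ring position (md5 stands in for Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")

def build_ring(nodes, vnodes_per_node):
    """Give each node `vnodes_per_node` tokens; return the sorted tokens and their owners."""
    pairs = sorted(
        (token(f"{node}-vnode-{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [t for t, _ in pairs], [n for _, n in pairs]

def owner(ring, key):
    """A key belongs to the first token at or after its hash, wrapping around the ring."""
    tokens, owners = ring
    idx = bisect.bisect_left(tokens, token(key))
    return owners[idx % len(tokens)]

def donors_when_adding(nodes, new_node, vnodes_per_node, sample_keys):
    """Return the existing nodes that hand some of the sampled keys to `new_node`."""
    before = build_ring(nodes, vnodes_per_node)
    after = build_ring(nodes + [new_node], vnodes_per_node)
    donors = set()
    for key in sample_keys:
        old_owner, new_owner = owner(before, key), owner(after, key)
        if old_owner != new_owner:
            # Adding tokens can only shift keys onto the new node's tokens.
            donors.add(old_owner)
    return donors
```

With keys = [f"user:{i}" for i in range(10_000)], donors_when_adding(["A", "B", "C"], "D", 1, keys) can name at most one donor node, whereas with 64 tokens per node all three existing nodes typically hand data to D.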
12. What's New: Solr 4.3 Integration
• 60+ new features
• Even faster performance
• Stability improvements
• New memory caches and memory monitoring
• Easier customization with new pluggable document handling
14. Cassandra: NoSQL Performance Leader
Netflix Cloud Benchmark…
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
"In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput."
Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2013, p. 10. Benchmark paper presented at the Very Large Data Bases (VLDB) conference, 2013.
http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf
End Point Independent NoSQL Benchmark
Highest in throughput…
Lowest in latency…
15. Use Cases Handled By DataStax Enterprise
Managed by Cassandra:
• Time series data
• Device/sensor/data "exhaust" systems
• Distributed applications
• Media streaming
• Online web retail (transactional, shopping carts, etc.)
• Real-time data analytics
• Social media capture and analysis
• Web click-stream analysis
• Write-intensive transactional systems
Managed by Hadoop:
• Buyer behavior analytics
• Compliance/regulatory analysis
• Customer recommendation output
• Fraud detection
• Risk analysis
• Sales program campaign analysis
• Supply chain analytics
• Batch web click-stream analysis
Managed by Solr:
• General web search
• Web retail faceted (categorization) search
• Search/hit prioritization and highlighting
• Application log search and analysis
• Document (PDF, MS Word, etc.) search and analysis
• Geospatial search
• Real estate location and property search
• Social media match-ups
16. NoSQL Momentum
"The economics don't look great for Oracle. According to analysis by Wikibon's David Floyer (and highlighted in the Wall Street Journal), the NoSQL database market is expected to grow at a compound annual growth rate of nearly 60% between 2011 and 2017. The SQL slice of the Big Data market, in contrast, will grow at just a 26% CAGR during that same time period."
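The gap between the two growth rates is wider than it looks, because CAGR compounds. The short calculation below turns each quoted rate into a total growth multiple over 2011-2017. It assumes six compounding years and treats "nearly 60%" as exactly 60%, so the NoSQL figure is an upper bound; the quote gives no absolute market sizes, so these are multiples, not dollar amounts. The function name is illustrative.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding an annual rate (0.60 means 60%) for `years` years."""
    return (1 + cagr) ** years

YEARS = 2017 - 2011  # six compounding periods between the 2011 and 2017 figures

nosql_multiple = growth_multiple(0.60, YEARS)  # "nearly 60%", treated as 60%
sql_multiple = growth_multiple(0.26, YEARS)    # 26% CAGR for the SQL slice

print(f"NoSQL market: {nosql_multiple:.1f}x, SQL slice: {sql_multiple:.1f}x")
```

At these rates the NoSQL market would grow roughly 16.8-fold over the period, versus roughly 4-fold for the SQL slice.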
19. But does this mean the RDBMS is on the way out?
The truth is that the vast majority of modern application architectures use both an RDBMS and NoSQL. The question is: when and where should each be used?
20. Legacy vs. Today's Data Applications
[Diagram: legacy line-of-business (LOB) apps, each on its own RDBMS (Oracle, MySQL, SQL Server), alongside an RDBMS data warehouse (Teradata/column DBs); today's LOB apps on NoSQL clusters of Cassandra (C*) nodes, alongside a Hadoop data warehouse.]
21. Components of Legacy vs. Today's Data Applications
[Diagram: legacy LOB apps on Oracle, MySQL, and SQL Server with a Teradata/column-DB data warehouse, vs. today's LOB apps on Cassandra (C*) NoSQL clusters with a Hadoop data warehouse, annotated as follows.]
Legacy LOB apps (RDBMS):
• Transactions: LOB style, full consistency
• Analytics: ROLAP, RANK, windowing, PARTITION BY, etc.
• Search: full text
Today's LOB apps (NoSQL):
• Transactions: LOB style, tunable consistency
• Analytics: MapReduce, Hive, Pig, Mahout
• Search: Solr
Legacy data warehouse (RDBMS):
• Transactions: DW style
• Analytics: ROLAP, RANK, windowing, PARTITION BY, etc.
• Search: full text
Today's data warehouse (Hadoop):
• Transactions: none
• Analytics: MapReduce, Hive, Pig, Mahout
• Search: Solr
22. Previous Generation vs. Modern Applications
Legacy Applications | Today's Applications
Slow/medium velocity data | High velocity data
Data coming in from one/few locations | Data coming in from many locations
Rigid, static structured data | Flexible, fluid, multi-type data
Low/medium data volumes; purge often | High data volumes; retain forever
Deploy app in a central location/one server | Deploy app everywhere/many servers
Write data in one location | Write data everywhere/anywhere
Primary concern: scale reads | Scale writes and reads
Scale up for more users/data | Scale out for more users/data
Downtime tolerated | Downtime not tolerated
23. DataStax/Cassandra vs. Legacy RDBMS
DataStax Enterprise/Cassandra | Legacy RDBMS
Fluid and flexible data model | Rigid data model
Easily supports modern data types | Difficulty supporting all data types
Automatic data sharding/distribution | Manual data sharding/distribution
Multi-data center/cloud support | Single data center with data shipping options
Continuous availability | Medium to high availability
Read from anywhere | Read from primary, possibly slaves
Write data anywhere | Write data to primary or specified shards
AID (atomic, isolated, durable) transactions; tunable consistency | ACID transactions
Unlimited scale-out for more capacity | Limited scale-up for capacity (scale-out for reads only)
CQL as primary interface | SQL as primary interface
24. Business Catalysts for NoSQL: Do You Need To…
…keep business always online and serving customers?
…serve customers everywhere (i.e., in multiple locations)?
…deliver information fast, both internally and externally?
…handle increasing customer demand?
…protect information that runs the business?
…make business decisions based on the right information?
…easily find needed information?
…receive strong payback for IT investments?
25. Keep Business Online
Netflix systems run in the cloud across multiple availability zones with Cassandra and maintain constant uptime. Over 95% of Netflix's data is stored in Cassandra (much of it previously on Oracle).
26. Keep Business Online
Commenting on the Amazon outage in October 2012: "We configure all our clusters to use a replication factor of three, with each replica located in a different Availability Zone. This allowed Cassandra to handle the outage remarkably well. When a single zone became unavailable, we didn't need to do anything. Cassandra routed requests around the unavailable zone and when it recovered, the ring was repaired."
- Netflix Tech Blog
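Why a replication factor of three survives the loss of one zone can be worked out with simple arithmetic. The sketch below assumes one replica per availability zone, as the quote describes, and uses Cassandra's single-data-center quorum size of floor(RF / 2) + 1. The quote does not say which consistency levels Netflix's requests used, so the levels checked here (ONE, QUORUM, ALL) are illustrative, and the function names are hypothetical.

```python
REPLICATION_FACTOR = 3  # one replica in each of three availability zones

def required_acks(level: str, rf: int = REPLICATION_FACTOR) -> int:
    """Replica acknowledgements needed at a given consistency level (single data center)."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def survives(level: str, zones_down: int, rf: int = REPLICATION_FACTOR) -> bool:
    """With one replica per zone, can requests at `level` still be served?"""
    live_replicas = rf - zones_down
    return live_replicas >= required_acks(level, rf)

for level in ("ONE", "QUORUM", "ALL"):
    print(level, "still served with one zone down:", survives(level, zones_down=1))
```

With one zone down, requests at ONE and QUORUM can still be served by the two remaining replicas, while a request at ALL cannot; losing a second zone would leave only ONE available.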
27. Serve Customers Everywhere
RightScale keeps its customers in contact with each other all over the world via DataStax clusters in 5+ global data centers.
28. Deliver Information Fast Everywhere
Adobe meets very stringent response-time requirements (12 ms or less for 95% of requests) for its marketing cloud with DataStax clusters in two data centers.
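A "12 ms or less for 95% of requests" target is a 95th-percentile latency objective. The sketch below checks a batch of measured latencies against it using the nearest-rank percentile, which is one of several common percentile definitions. The function names and the sample latencies are hypothetical, not Adobe's data.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

def meets_target(latencies_ms, pct=95, limit_ms=12.0):
    """True if the pct-th percentile latency is within the limit."""
    return percentile(latencies_ms, pct) <= limit_ms

# Hypothetical measurements: 19 fast requests and one slow outlier.
latencies = [4.0] * 19 + [40.0]
print(percentile(latencies, 95), meets_target(latencies))
```

Here the single 40 ms outlier sits above the 95th percentile, so the target is met; a second outlier in the same 20 requests would push the percentile to 40 ms and fail it.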
29. Handle Increasing Customer Demand
Gnip delivers social media data to 95% of the Fortune 500 by using DataStax Enterprise. Data velocity for Twitter alone can reach 20,000 tweets per second.
30. Handle Increasing Customer Demand
Ooyala distributes and analyzes media/video content for companies like ESPN, Rolling Stone, and others. They track about one quarter of all online video viewers each day and generate 1-2 billion events that stream in real time through their DataStax cluster.
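Per-second and per-day figures are easier to compare on the same scale. The arithmetic below converts Gnip's Twitter rate to a daily volume and Ooyala's event count to an average per-second rate. It assumes Ooyala's 1-2 billion events are per day, which the slide implies but does not state outright, and it treats 20,000 tweets per second as if sustained all day, although the slide presents it as a peak; averages also hide the bursts a cluster must actually absorb. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_day(rate_per_second: float) -> float:
    """Daily volume if a per-second rate were sustained for 24 hours."""
    return rate_per_second * SECONDS_PER_DAY

def per_second(events_per_day: float) -> float:
    """Average per-second rate for a daily event count."""
    return events_per_day / SECONDS_PER_DAY

print(f"Gnip, Twitter alone, if 20,000/s were sustained: {per_day(20_000):,.0f} tweets/day")
print(f"Ooyala: {per_second(1e9):,.0f} to {per_second(2e9):,.0f} events/s on average")
```

That works out to about 1.7 billion tweets per day at the peak rate, and an average of roughly 11,600 to 23,100 events per second for Ooyala.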
32. Make the Right Business Decisions
"DataStax made it all work together"
• Cassandra, Hadoop, Solr, Security
Manage costs & improve performance
• 400% ROI over five years
• $750K five-year savings in support costs
• 90% better response and upload time
Analyzing information
• Doctors' notes
• Analyze notes to bill back Medicare/Medicaid
33. Find Information Instantly
Datafiniti, a search engine for data, needs to consume lots of data in real time and provide fast search on top of that same data.
34. Get Strong Payback on IT Investment
Constant Contact found that scaling out with NoSQL instead of IBM DB2 saved them 90% in software costs, and was implemented in one-third of the time.
"To do what we need to do today without Cassandra would cost a couple million dollars more and would be significantly harder to manage operationally."
36. When Legacy RDBMS over NoSQL/Cassandra
• No need for a flexible data model; data is all structured and fits well within an RDBMS schema.
• Data does not come in at high rates, and write speed is not important.
• You need detailed/complex/nested ACID transactions.
• All your data can fit into memory or reside on 1-2 machines, and substantial growth is not expected.
• You have no need for constant uptime; unexpected downtime has little or no impact.
• You don't need to distribute data to multiple locations or cloud availability zones, or keep multiple copies for disaster recovery purposes.
• No need to integrate/seamlessly move data between real-time, analytics, and search systems.
• Software costs are not a concern.
37. When DataStax/Cassandra Over Legacy RDBMS
• You need a more flexible data model.
• You have to store a variety of data types.
• You need constant uptime/continuous availability.
• You need to distribute data across multiple data centers or cloud availability zones.
• You need linear scale-out performance for growing data.
• You need very fast write capabilities.
• You need to write and read data in multiple locations.
• You need transactions, but eventual consistency is OK (or strong consistency with a performance impact for many data copies).
• You need an easy way to integrate real-time, analytics, and search data.
• You need cost savings/a better ROI.
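The trade-off in the consistency bullet above can be made concrete with the usual read/write overlap rule for replicated stores: if a read waits for R replica acknowledgements and a write for W, and R + W > RF, every read set overlaps the latest write set. The sketch below is a simplified model (it ignores details such as timestamp-based conflict resolution and hinted handoff), and the function name is hypothetical. It shows why strong consistency costs more as the number of copies grows: each QUORUM request must wait for more replicas.

```python
def overlapping(read_acks: int, write_acks: int, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_acks + write_acks > rf

for rf in (3, 5):
    quorum = rf // 2 + 1  # Cassandra's QUORUM size within a single data center
    print(
        f"RF={rf}: ONE/ONE overlaps? {overlapping(1, 1, rf)}; "
        f"QUORUM/QUORUM overlaps? {overlapping(quorum, quorum, rf)} "
        f"({quorum} acks per read and per write)"
    )
```

ONE/ONE is the fastest, eventually consistent option; QUORUM/QUORUM gives overlapping reads and writes at the cost of waiting for 2 of 3 replicas, or 3 of 5, on every request.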
38. How Can I Try DataStax Enterprise?
• Go to www.datastax.com/download.
• Download a copy of DataStax Enterprise.
• Installs and configures in minutes.
• Completely free for development and evaluation (no trial time bombs, etc.); subscription required for production deployments.
40. Thank You! Questions?
We power the big data applications that transform business.
Editor's Notes
Point 1: Big data does not equal big money. In fact, choosing a NoSQL solution will almost certainly save your business money, in terms of hardware, licensing, and total cost of ownership. What's more, choosing the correct technology for your use case will almost certainly increase your top line as well.
Point 2: Don't settle for big words without big back-up. In this webinar, and in lots of other materials on our web site, we'll back up what we say with customer case studies and lots of details. After today's conversation, you'll know the basics for growing your business in a profitable way. What's the use of growing your top line but outspending any gains on cumbersome, ineffective, outdated IT? We'll take you through the specific use cases and business models that are the best fit for NoSQL solutions.
Point 3: No prior knowledge is required at this point. If you don't even know what RDBMS or NoSQL stand for, you are in the right place. Get your questions answered, and get your business on the right track to meeting your customers' needs in today's data environment.
Every once in a while a prominent information provider gets it wrong, and that was certainly the case with last week's InformationWeek post. Writer Todd Holmes perpetuated some of the fears you might have already encountered as you've looked into non-relational database technologies. We'll dispel those concerns during today's talk and give you the information you need to take the right next steps for your business.
DataStax Enterprise is an enterprise NoSQL platform built on Cassandra that lets you scale with no surprises and keep your applications running, no matter what. The platform gives you operational simplicity for real-time, analytic, and enterprise search data. Its components are the DataStax Enterprise Server, OpsCenter Enterprise, and Expert Support & Services.