WHITEPAPER
Overview
Businesses today are challenged by the ongoing explosion of data. Organizations capture, track, analyze and store more information than ever before: everything from mass quantities of transactional, online and mobile data, to growing amounts of machine-generated data. In fact, machine-generated data represents the fastest-growing category of Big Data.
How can you effectively address the impact of data overload on application performance, speed and reliability? Where do newer technologies such as columnar databases and NoSQL come into play?
The first thing to recognize is that, in the new data management paradigm, one size will not fit all data needs. The right IT solution may encompass one, two or even three technologies working together.
Figuring out which of the several technologies (and even subvariants of these technologies) meets your needs, while also fitting your IT staffing and budget parameters, is no small issue. We hope this User Guide will help clarify which data management approach is best for which of your company’s data challenges.
INFOBRIGHT
Corporate Headquarters
47 Colborne Street, Suite 403
Toronto, Ontario M5E1P8 Canada
Tel. 416 596 2483
Toll Free 877 596 2483
info@infobright.com
www.infobright.com
Sales:
North America
Tel. 312-924-1695
EMEA
Tel. +353 (0)87 743 7107
User’s Guide to the Emerging Database Landscape:
Row vs. Columnar vs. NoSQL
Today’s Top Data-Management Challenge
Businesses today are challenged by the ongoing explosion of data. Gartner is predicting data growth will exceed 650% over the next five years.¹ Organizations capture, track, analyze and store everything from mass quantities of transactional, online and mobile data, to growing amounts of machine-generated data. In fact, machine-generated data, including sources ranging from web, telecom network and call-detail records, to data from online gaming, social networks, sensors, computer logs, satellites, financial transaction feeds and more, represents the fastest-growing category of Big Data. High-volume web sites can generate billions of data entries every month.
As volumes expand into the tens of terabytes and even the
petabyte range, IT departments are being pushed by end users to
provide enhanced analytics and reporting against these ever-
increasing volumes of data. Managers need to be able to quickly
understand this information, but, all too often, extracting useful
intelligence can be like finding the proverbial ‘needle in the
haystack.’
Using traditional row-based databases that were not designed to analyze this amount of data, IT managers typically try to counter the resulting slow response times in several ways. Unfortunately, each method has a significant adverse impact on analytic effectiveness, costs or both. A recent survey from Unisphere Research² highlighted the most typical approaches:
• Tuning or upgrading the existing database, the most common response, translates into significantly increased costs, either through admin costs or licensing fees.
• Upgrading hardware processing capabilities increases overall TCO.
• Expanding storage systems increases overall costs in direct proportion to the growth of data.
• Archiving old data translates into less data your analysts and business users can analyze at any one time. Frequently, this results in less comprehensive analysis of user patterns and can greatly impact forward-looking analytic conclusions.
• Upgrading network infrastructure leads to both increased costs and, potentially, more complex network configurations.
So, if throwing money at your database problem doesn’t really solve the issues, what should you do? How can you effectively address the impact of data overload on application performance, speed and reliability? Where do newer technologies such as columnar databases and NoSQL come into play?
 
¹ Gartner IT Infrastructure, Operations & Management Summit 2009 Post Event Brief.
² “Keeping Up with Ever-expanding Enterprise Data,” Joseph McKendrick, Research Analyst, Unisphere Research, October 2010.
Figure 1. Machine-Generated Data Drives Big Data
Coexistence. Not Competition.
The first thing to recognize is that, in the new data management paradigm, one size will not fit all data needs. Instead of building the one, single, ultimate database, the driving force behind the behemoth data-warehousing efforts of the last decade or so, IT managers need to identify the right technologies to solve their particular business and data problems. The right IT solution may encompass one, two or even three technologies working together. Open-source technology will coexist with proprietary software. Row-based databases will live peacefully next to columnar databases, and both will share data with NoSQL solutions.
Sounds simple, doesn’t it? Almost idyllic. Of course, there’s a bit more to it than that.
As Mike Vizard of ITBusinessEdge recently noted, “[T]here is more diversity in the database world than any time in recent memory.”³
Figuring out which of the several technologies (and even subvariants of these technologies) meets your needs, while also fitting your IT staffing and budget parameters, is no small issue. We hope this User Guide will help clarify which data management approach is best for which of your company’s data challenges.
The Ubiquity of Thinking in Rows
Organizing data in rows has been the standard approach for so long that it can seem like the only way to do it. An address list, a customer roster, and inventory information: you can just envision the neat row of fields and data going from left to right on your screen. Databases such as Oracle, MS SQL Server, DB2 and MySQL are the best known row-based databases.
Row-based databases are ubiquitous because so many of our most important business systems are transactional. Row-oriented databases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated. Other examples include:
• 	 Mail merging and customized emails
• 	 Inventory transactions
• 	 Billing and invoicing
Where row-based databases run into trouble is when they are used to handle
analytic loads against large volumes of data, especially when user queries are
dynamic and ad hoc.
To see why, let’s look at a database of sales transactions with 50 days of data and 1 million rows per day. Each row has 30 columns of data. So, this database has 30 columns and 50 million rows. Say you want to see how many toasters were sold for the third week of this period. A row-based database would return 7 million rows (1 million for each day of the third week) with 30 columns for each row, or 210 million data elements. That’s a lot of data elements to crunch to find out how many toasters were sold that week. As the data set increases in size, disk I/O becomes a substantial limiting factor since a row-oriented design forces the database to retrieve all column data for any query.
Row-based Database: Transactional Powerhouse
Figure 2. Example Data Set
³ “The Rise of the Columnar Database,” Mike Vizard, IT BusinessEdge, June 14 2011.
As we mentioned above, many companies try to solve this I/O problem by creating indices to optimize queries. This may work for routine reports (e.g., you always want to know how many toasters you sold for the third week of a reporting period), but there is a point of diminishing returns: load speed degrades because indices need to be recreated as data is added. In addition, users are severely limited in their ability to quickly run ad hoc queries (e.g., how many toasters did we sell through our first Groupon offer? Should we do it again?) that can’t depend on indices to optimize results.
Pivoting Your Perspective: Columnar Technology
Column-oriented databases allow data to be stored column-by-column rather than row-by-row. This simple pivot in
perspective—looking down rather than looking across—has profound implications for analytic speed.
Column-oriented databases are better suited for analytics where, unlike transactions, only portions of each record
are required. By grouping the data together this way, the database only needs to retrieve columns that are relevant
to the query, greatly reducing the overall I/O.
Returning to the example in the section above, we see that a columnar database would not only eliminate 43 days of data, it would also eliminate 28 columns of data. Returning only the columns for toasters and units sold, the columnar database would return only 14 million data elements (7 million rows × 2 columns), or about 93% less data than the 210 million elements returned by the row-based database. By returning so much less data, columnar databases are much faster than row-based databases when analyzing large data sets.
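The arithmetic behind the toaster example can be checked with a short sketch. The constants come from the text; the three-row sample table and its `day`, `product`, `units` and `store` fields are made up for illustration and are not part of the original example.

```python
# Figures taken from the toaster example in the text.
ROWS_PER_DAY = 1_000_000
TOTAL_COLUMNS = 30
DAYS_QUERIED = 7       # the third week
COLUMNS_NEEDED = 2     # product and units sold

# A row store reads every column of each qualifying row.
row_elements = DAYS_QUERIED * ROWS_PER_DAY * TOTAL_COLUMNS
# A column store reads only the columns the query names.
column_elements = DAYS_QUERIED * ROWS_PER_DAY * COLUMNS_NEEDED
reduction = 1 - column_elements / row_elements

# The same idea on a tiny, hypothetical table: pivot rows into columns.
rows = [
    {"day": 15, "product": "toaster", "units": 2, "store": "A"},
    {"day": 15, "product": "kettle", "units": 1, "store": "B"},
    {"day": 16, "product": "toaster", "units": 3, "store": "A"},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Only two of the four column lists are touched to answer the query.
toasters_sold = sum(
    units
    for product, units in zip(columns["product"], columns["units"])
    if product == "toaster"
)

print(f"row store:    {row_elements:,} data elements")
print(f"column store: {column_elements:,} data elements")
print(f"reduction:    {reduction:.1%}")
print(f"toasters sold in sample: {toasters_sold}")
```

The pivot step is only a picture of the storage layout. A real columnar engine stores the columns that way on disk, so it never has to build them at query time.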
In addition, some columnar databases (such as Infobright®) compress data at high rates because each column stores a single data type (as opposed to rows that typically contain several data types), which allows compression to be optimized for each particular data type. Rows, by contrast, mix multiple data types and a limitless range of values, making compression less efficient overall.
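A minimal sketch of why a single-type column tends to compress well, using run-length encoding as a stand-in. The text does not say which compression schemes Infobright uses, so run-length encoding is an assumption chosen for simplicity, and the six sample rows are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical sample: six sales rows of (product, region, units).
rows = [
    ("toaster", "east", 1),
    ("toaster", "east", 1),
    ("toaster", "east", 2),
    ("kettle", "east", 1),
    ("kettle", "west", 1),
    ("kettle", "west", 1),
]

# Column layout: one data type per column, so values repeat back to back.
product_column = [row[0] for row in rows]
encoded_column = run_length_encode(product_column)

# Row layout: values of different types alternate, so runs rarely form.
row_major = [value for row in rows for value in row]
encoded_rows = run_length_encode(row_major)

print(f"product column: {len(product_column)} values -> {len(encoded_column)} runs")
print(f"row-major data: {len(row_major)} values -> {len(encoded_rows)} runs")
```

In this sample the product column collapses to two runs, while the row-major sequence does not shrink at all because neighbouring values never match.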
Read the sidebar “Infobright: Putting Intelligence in Columns” to learn how Infobright improves query speed even more, while simplifying administration and lowering costs, with its Knowledge Grid and Domain Expertise™ capabilities.
Figure 3. Pivoting Data for Columnar View
Column-based Database: Lightning Analytics
Will the Real NoSQL Please Stand Up?
A term invented by Carlo Strozzi in 1998,⁴ NoSQL has been a hard term to pin down from the beginning. For one thing, while most people now translate the term to mean ‘Not Only SQL,’ there are other accepted variations. More importantly, the term refers to a broad, emerging class of non-relational database solutions.
NoSQL technologies have evolved to address specific business needs that row technologies couldn’t scale to meet and that column technologies were unsuited to address. Currently, there are over 112 products or open-source projects in the NoSQL space, with each solution matching a specific business need. For example:
• Real-time data logging such as in finance or web analytics
• Web apps or any app which needs better performance without having to define columns in an RDBMS
• Storing frequently requested data for a web app
While each technology addresses different problems, they all share certain attributes: huge volumes of data and high transaction rates, a distributed architecture and often unstructured (or semi-structured) data with heavy read/write workloads. Unstructured information is typically text heavy but may contain data such as dates and other numbers as well. The resulting irregularities and ambiguities make this data unsuitable for traditional row-based or column-based structured databases.
In short, NoSQL solutions are typically beasts in terms of their data
capacity, lookup speed and ability to handle streaming data, especially
over highly scaled environments.
On the other hand, they generally lack a SQL interface and often come with few or no programmatic interfaces, meaning that setup and administration may require some specialized skills. In addition, NoSQL solutions can be limited in their ability to execute complex queries, restricting the types of actionable analytics they can deliver. For example, queries that JOIN two tables or employ nested SELECTs are typically not possible using these technologies.
Below, we go a bit deeper into each of three main NoSQL subvariants: key-value stores, document stores and column stores.
Infobright:
Putting Intelligence in Columns
Infobright’s high performance analytic database is designed to handle business-driven queries on large volumes of data, without IT intervention. Easy to implement and manage, Infobright provides the answers your business users need at a price you can afford.
How is this achieved?
Infobright combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query response with unmatched administrative simplicity: no indexes, no data partitioning, and no manual tuning.
Infobright uses intelligence, not hardware, to drive
query performance:
• Creates information about the data upon load, automatically
• Uses this to eliminate or reduce the need to access data to respond to a query
• The less data that needs to be accessed, the faster the response
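The three points above can be illustrated with a generic sketch of block-level metadata. The text does not describe how the Knowledge Grid works internally, so this should not be read as Infobright's implementation: the `Block` class, the block size, and the choice of minimum and maximum as the stored metadata are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A chunk of one column plus summary metadata computed at load time."""
    values: list
    minimum: int
    maximum: int


def load_column(values, block_size):
    """Split a column into blocks and record min/max for each block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks, threshold):
    """Count values above threshold; return (count, blocks actually read)."""
    count = 0
    blocks_read = 0
    for block in blocks:
        if block.maximum <= threshold:
            continue                    # no value in this block can qualify
        if block.minimum > threshold:
            count += len(block.values)  # every value qualifies: metadata is enough
            continue
        blocks_read += 1                # mixed block: the data must be scanned
        count += sum(1 for value in block.values if value > threshold)
    return count, blocks_read


# Hypothetical "units sold" column, loaded in blocks of four values.
units_sold = [1, 2, 1, 3, 9, 8, 9, 7, 2, 6, 1, 8]
blocks = load_column(units_sold, block_size=4)
count, blocks_read = count_greater_than(blocks, threshold=5)
print(f"{count} values above 5, answered by reading {blocks_read} of {len(blocks)} blocks")
```

Here two of the three blocks are resolved from their metadata alone, which is the sense in which less data accessed means a faster response.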
What this means to customers:
• Self-managing: 90% less administrative effort
• Low-cost: More than 50% less than alternative solutions
• Scalable, high-performance: Up to 50 TB using a single industry standard server
• Fast queries: Ad-hoc queries are as fast as anticipated queries, so users have total flexibility
• Compression: Data compression of 10:1 to 40:1 that means a lot less storage is needed
Infobright offers an open source and a commercial
edition of its software. Both products are designed to
handle data volumes up to 50TB.
Try it yourself—download our Community Edition at
www.infobright.org, or a free trial of our Enterprise
Edition at www.infobright.com.
⁴ Wikipedia, http://en.wikipedia.org/wiki/NoSQL
NoSQL Database: Data Beasts
Key-value Store
A key-value store does what it sounds like it does: values are stored and indexed by a key, usually built on a hash or tree data structure.⁵ Key-value pairs are widely used in tables and configuration files. Key-value stores allow the application to store its data without predefining a schema; there is no need for a fixed data model.
In a key-value store, for example, a record may look like:
12345 => “img456.jpg,checkout.js,20”
Companies turn to key-value stores when they require the functionality of key-values but do not require the technology overhead of a traditional RDBMS, either because they require more efficient, cost-effective scalability or because they are working with unstructured or semi-structured data. Key-value stores are great for unstructured data centered on a single object, and where data is stored in memory with some persistent backup. Consequently, they are typically used as a cache for data frequently requested by web applications such as online shopping carts or social-media sites. As these web pages are created on the fly, the static components are quickly retrieved and served up to the user.
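As a rough illustration, an in-memory key-value store can be sketched with a Python dictionary, which is itself built on a hash table. The `KeyValueStore` class and its method names are hypothetical and do not correspond to any particular product; the record is the one shown above.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # No schema: the value is stored as an opaque blob.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup by key is a single hash probe.
        return self._data.get(key, default)

    def scan(self, predicate):
        # Anything other than a key lookup means visiting every entry.
        return [key for key, value in self._data.items() if predicate(value)]


store = KeyValueStore()
store.put(12345, "img456.jpg,checkout.js,20")

cached = store.get(12345)    # fast path: fetch by key
missing = store.get(99999)   # unknown key falls back to the default
uses_checkout = store.scan(lambda value: "checkout.js" in value)

print(cached)
print(missing)
print(uses_checkout)
```

The `scan` method shows the tradeoff: the store knows nothing about what is inside a value, so any question that is not "give me key X" has to look at every entry.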
Document Store
As with a key-value store, companies turn to NoSQL document stores when they are dealing with huge volumes of data and transactions requiring massive horizontal scaling or sharding. And, similarly, there is no need for a pre-set schema. However, the data in document stores can contain several keys, so queries aren’t as limited as they are in key-value stores. For example, in a document data store a record could read:
“id” => 12345,
“name” => “Jane”,
“age” => 22,
“email” => “jane@gmail.com”
While multiple keys increase the types of possible queries, the data stored in these ‘documents’ does not need to be predefined and can change from document to document. The tradeoff for the more complex query options is speed: queries with a key-value store are much simpler and often faster.
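A minimal sketch of the document model, assuming documents are plain Python dictionaries held in a list. The `find` helper is hypothetical and is not the query API of any real document store; the first document is the record shown above, and the other two are invented to show fields varying from document to document.

```python
def find(documents, **criteria):
    """Return documents whose fields match every given criterion."""
    return [
        doc
        for doc in documents
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


documents = [
    {"id": 12345, "name": "Jane", "age": 22, "email": "jane@gmail.com"},
    {"id": 12346, "name": "Raj", "age": 22},                  # hypothetical: no email
    {"id": 12347, "name": "Ana", "signup_source": "mobile"},  # hypothetical: other fields
]

# Any field can be queried, not just the primary key.
aged_22 = find(documents, age=22)
named_jane = find(documents, name="Jane", age=22)

print([doc["id"] for doc in aged_22])
print([doc["id"] for doc in named_jane])
```

Documents that lack a queried field simply do not match, which is how the model tolerates a structure that changes from one document to the next.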
Document stores are often deployed for web-traffic analysis, user-behavior/action analysis, or log-file analysis in real time. However,
while document stores allow more query capabilities than key-value stores, there are still limitations given the non-relational basis of
the document-store database.
Column Store
Column stores are an emerging NoSQL option, created in response to very specific database problems involving beyond-massive amounts of data across a hugely distributed system. Think Google. Think Facebook. Imagine the colossal amount of data that Google stores in its data farms. And then imagine how many permutations of data sets need to be compiled to respond to all possible Google searches. Clearly, this task could never be accomplished in any reasonable time frame with a traditional relational database. It requires the ability to handle massive amounts of data but with more query complexity than either key-value stores or document stores would deliver.
Most column stores also use MapReduce, a fault-tolerant framework for processing huge datasets on certain kinds of distributable problems using a large number of computers. This technology is still emerging, and use cases may eventually overlap with document stores as both technologies mature. But at the moment, the use cases in production for column stores are generally limited to applications such as Google and Facebook.
⁵ For more on hash functions see http://en.wikipedia.org/wiki/Hash_function. For more on tree data see http://en.wikipedia.org/wiki/Tree_%28data_structure%29.
A Column by Any Other Name…
It should go without saying, but we’ll say it anyway—a column store is only
similar to a column-based database in that they both have the word‘column’
in their names. A column-based database is still a structured relational
database, albeit one optimized for analytics. A column store is still firmly in
the NoSQL camp—this is a system for handling huge volumes of data and
transactions, in a massively distributed manner, without the need to define
the database structure up front—though it tends to have more SQL traits
than either a key-value store or document store.
Can I Get a Hadoop From Anyone?
While this User Guide addresses the emerging database landscape, no conversation would be complete without mentioning Hadoop.
Hadoop is a scalable fault-tolerant distributed system for data storage and
processing (open source under the Apache license). It has two main parts:
• 	 Hadoop Distributed File System (HDFS): self-healing high-bandwidth
clustered storage
• 	 MapReduce: fault-tolerant distributed processing framework
The data typically stored with Hadoop is complex, from multiple data sources
and, well, there’s always lots and lots of it. Beyond being a mass-storage
system, Hadoop, through MapReduce, also is used for batch processing and
computation done in parallel execution spread over a cluster of servers.
While running MapReduce jobs is a common way to access data stored in Hadoop, technologies such as HBase and Hive, which sit on top of HDFS, are also used to query the data.
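The MapReduce processing model mentioned above can be sketched in a single Python process. This is not the Hadoop API: the function names, the three input splits and the sales records are all hypothetical, and a real cluster would run the map and reduce steps on many servers in parallel with fault tolerance.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (key, value) pair for one input record."""
    product, units = record
    yield product, units


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input, already divided into splits as if stored on three nodes.
splits = [
    [("toaster", 2), ("kettle", 1)],
    [("toaster", 1), ("blender", 4)],
    [("kettle", 3), ("toaster", 5)],
]

# Each split is mapped independently, which is what allows parallel execution.
mapped = [pair for split in splits for record in split for pair in map_phase(record)]
grouped = shuffle(mapped)
totals = dict(reduce_phase(key, values) for key, values in grouped.items())

print(totals)
```

Because each split is mapped without reference to the others and each key is reduced on its own, the work can be spread across a cluster and a failed piece can be rerun in isolation.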
LiveRail: Infobright & Hadoop Power
Video Advertising Analytics
LiveRail delivers technology solutions that enable and enhance the monetization of internet-distributed video. By focusing specifically on challenges and opportunities created by online video, LiveRail’s tools are designed to be easier, more efficient and more effective than traditional display ad servers to deliver and track advertising into this medium. Their platform enables publishers, advertisers, ad networks and media groups to manage, target, display and track advertising in online video.
The Challenge: LiveRail’s platform enables publishers, advertisers, ad networks and media groups to manage, target, display and track advertising in online video. With a growing number of customers, LiveRail was faced with managing increasingly large data volumes. They also needed to provide near real-time access to their customers for reporting and ad hoc analysis.
The Solution: LiveRail chose two complementary technologies to manage hundreds of millions of rows of data each day: Apache Hadoop and Infobright. Detail is loaded hourly into Hadoop and at the same time summarized and loaded into Infobright. Customers access Infobright 7x24 for ad-hoc reporting and analysis and can schedule time if needed to access cookie-level data stored in Hadoop.
“Infobright and Hadoop are complementary technologies that help us manage large amounts of data while meeting diverse customers needs to analyze the performance of video advertising investments.”
Andrei Dunca, CTO of LiveRail
Summary and Next Steps
The era of the one-size-fits-all database is over. Myriad technology approaches have been (and are being) developed to meet the challenges of Big Data. This activity impels corporate IT groups to look beyond row-based solutions to find the right fit for their analytic needs, staffing and budget requirements.
We hope that this paper, and the following Emerging Database Landscape chart, serve as a useful resource for figuring out the strengths and the weaknesses of the various database approaches available today.
Infobright: High-performance Analytics for Machine-generated Data
Infobright’s high-performance database is the preferred choice for applications and data marts that analyze large volumes of “machine-generated data” such as Web data, network logs, telecom records, stock tick data and sensor data. Easy to implement and with unmatched data compression, operational simplicity and low cost, Infobright is being used by enterprises, SaaS and software companies in online businesses, telecommunications, financial services and other industries to provide rapid access to critical business data.
If you decide that a columnar database has a place in your analytic solutions, you can try it for yourself, free. Either download our Community Edition at www.infobright.org, or a free trial of our Enterprise Edition at www.infobright.com. For more information, please visit http://www.infobright.com or join our open source community at http://www.infobright.org.
The Emerging Database Landscape
This chart gives a quick overview of the strengths, weaknesses and use cases for row-based, columnar and NoSQL databases.
Basic Description
• Row-Based: Data structured in rows
• Columnar: Data is vertically striped and stored in columns
• NoSQL Key-Value Store: Data stored usually in memory with some persistent backup
• NoSQL Document Store: Persistent storage for unstructured or semi-structured data along with some SQL-like querying functionality
• NoSQL Column Store: Very large data storage, MapReduce support
Common Use Cases
• Row-Based: Transaction processing, interactive transactional applications
• Columnar: Historical data analysis, data warehousing, business intelligence
• NoSQL Key-Value Store: Used as a cache for storing frequently requested data for a web app
• NoSQL Document Store: Web apps or any app which needs better performance and scalability without having to define columns in an RDBMS
• NoSQL Column Store: Real-time data logging as in finance or web analytics
Strengths
• Row-Based: Capturing and inputting new records. Robust, proven technology.
• Columnar: Fast query support, especially for ad hoc queries on large datasets; compression
• NoSQL Key-Value Store: Scalability, very fast storage and retrieval of unstructured and partly structured data
• NoSQL Document Store: Persistent store with scalability features such as sharding built in, and with better query support than key-value stores
• NoSQL Column Store: Very high throughput for Big Data, strong partitioning support, random read-write access
Weaknesses
• Row-Based: Scale issues; less suitable for queries, especially against large databases
• Columnar: Not suited for transactions; import and export speed; heavy computing resource utilization
• NoSQL Key-Value Store: Usually all data must fit into memory, no complex query capabilities
• NoSQL Document Store: Lack of sophisticated query capabilities
• NoSQL Column Store: Low-level API, inability to perform complex queries, high latency of response to queries
Typical Database Size Range
• Columnar: Several GBs to 50 TB
• NoSQL Key-Value Store: Several GBs to several TBs
• NoSQL Document Store: Few TBs to several PBs
• NoSQL Column Store: Few TBs to several PBs
Key Players
• Row-Based: MySQL, Oracle, SQL Server, Sybase ASE
• Columnar: Infobright, Aster Data, Sybase IQ, Vertica, ParAccel
• NoSQL Key-Value Store: MemCached, Amazon S3, Redis, Voldemort
• NoSQL Document Store: MongoDB, CouchDB, SimpleDB
• NoSQL Column Store: HBase, Big Table, Cassandra
© Copyright 2011 Infobright Inc. Infobright is a registered trademark of Infobright Inc. All other trademarks and registered trademarks are the property of their respective owners.

What's the Big Deal About Big Data?
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
Struggling with data management
Struggling with data managementStruggling with data management
Struggling with data management
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Stream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White PaperStream Meets Batch for Smarter Analytics- Impetus White Paper
Stream Meets Batch for Smarter Analytics- Impetus White Paper
 
Database Essay
Database EssayDatabase Essay
Database Essay
 
Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
Week 4 Lecture 1 - Databases and Data WarehousesManagement of .docx
Week 4 Lecture 1 - Databases and Data WarehousesManagement of .docxWeek 4 Lecture 1 - Databases and Data WarehousesManagement of .docx
Week 4 Lecture 1 - Databases and Data WarehousesManagement of .docx
 
Building a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White PaperBuilding a Big Data Analytics Platform- Impetus White Paper
Building a Big Data Analytics Platform- Impetus White Paper
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio3 Reasons Data Virtualization Matters in Your Portfolio
3 Reasons Data Virtualization Matters in Your Portfolio
 
Exercise solution of chapter3 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter3 of datawarehouse cs614(solution of exercise)Exercise solution of chapter3 of datawarehouse cs614(solution of exercise)
Exercise solution of chapter3 of datawarehouse cs614(solution of exercise)
 
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to LifeEvolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
Evolving Big Data Strategies: Bringing Data Lake and Data Mesh Vision to Life
 
The Second Big Bang
The Second Big BangThe Second Big Bang
The Second Big Bang
 
Data warehousing has quickly evolved into a unique and popular busin.pdf
Data warehousing has quickly evolved into a unique and popular busin.pdfData warehousing has quickly evolved into a unique and popular busin.pdf
Data warehousing has quickly evolved into a unique and popular busin.pdf
 
TDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse InfrastructureTDWI checklist 2018 - Data Warehouse Infrastructure
TDWI checklist 2018 - Data Warehouse Infrastructure
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

  • 3. WHITEPAPER 3 WHITEPAPER Coexistence. Not Competition. The first thing to recognize is that, in the new data management paradigm, one size will not fit all data needs. Instead of building the one, single, ultimate database, the driving force behind the behemoth data-warehousing efforts of the last decade or so, IT managers need to identify the right technologies to solve their particular business and data problems. The right IT solution may encompass one to two to even three technologies working together. Open-source technology will coexist with proprietary software. Row-based data- bases will live peacefully next to a columnar databases—and both will share data with NoSQL solutions. Sounds simple, doesn’t it? Almost idyllic. Of course, there’s a bit more to it than that. As Mike Vizard of ITBusinessEdge recently noted,“[T]here is more diversity in the database world than any time in recent memory.”3 Figuring out which of the several technologies (and even subvariants of these technologies) meets your needs—while also fitting your IT staffing and budget parameters—is no small issue. We hope this User Guide will help clarify which data management approach is best for which 0o your company’s data challenges. The Ubiquity of Thinking in Rows Organizing data in rows has been the standard approach for so long that it can seem like the only way to do it. An address list, a cus- tomer roster, and inventory information—you can just envision the neat row of fields and data going from left to right on your screen. Databases such as Oracle, MS SQL Server, DB2 and MySQL are the best known row-based databases. Row-based databases are ubiquitous because so many of our most important business systems are transactional. Row-oriented data- bases are well suited for transactional environments, such as a call center where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated. 
Other examples include: • Mail merging and customized emails • Inventory transactions • Billing and invoicing Where row-based databases run into trouble is when they are used to handle analytic loads against large volumes of data, especially when user queries are dynamic and ad hoc. To see why, let’s look at a database of sales transactions with 50-days of data and 1 million rows per day. Each row has 30 columns of data. So, this data- base has 30 columns and 50 million rows. Say you want to see how many toasters were sold for the third week of this period. A row-based database would return 7-million rows (1 million for each day of the third week) with 30 columns for each row—or 210-million data elements. That’s a lot of data ele- ments to crunch to find out how many toasters were sold that week. As the 3 “The Rise of the Columnar Database,”Mike Vizard, IT BusinessEdge, June 14 2011. Row-basedDatabase TransactionalPowerhouse FIgure 2. Example Data Set
  • 4. WHITEPAPER 4 WHITEPAPER data set increases in size, disk I/O becomes a substantial limiting factor since a row-oriented design forces the database to retrieve all column data for any query. As we mentioned above, many companies try to solve this I/O problem by creating indices to optimize queries. This may work for routine reports (i.e. you always want to know how many toasters you sold for the third week of a reporting pe- riod) but there is a point of diminishing returns as load speed degrades since indices need to be recreated as data is added In addition, users are severely limited in their ability to quickly do ad-hoc queries (i.e., how many toaster did we sell through our first Groupon offer? Should we do it again?) that can’t depend on indices to optimize results. PivotingYour Perspective: Columnar Technology Column-oriented databases allow data to be stored column-by-column rather than row-by-row. This simple pivot in perspective—looking down rather than looking across—has profound implications for analytic speed. Column-oriented databases are better suited for analytics where, unlike transactions, only portions of each record are required. By grouping the data together this way, the database only needs to retrieve columns that are relevant to the query, greatly reducing the overall I/O. Returning to the example in the section above, we see that a columnar database would not only eliminate 43 days of data, it would also eliminate 28 columns of data. Returning only the columns for toasters and units sold, the columnar database would return only 14 million data elements or 93% less data. By return- ing so much less data, columnar databases are much faster than row-based databases when analyzing large data sets. 
In addition, some columnar databases (such as Infobright®) compress data at high rates because each column stores a single data type (as opposed to rows that typically contain several data types), and allow compression to be optimized for each particular data type. Row-based databases have multiple data types and limitless range of values, thus making compression less efficient overall. Read the sidebar“Infobright: Putting Intelligence in Columns”to learn how Infobright improves query speed even more, while simplifying administration and lowering costs, with its Knowledge Grid and Domain ExpertiseTM capabilities. Figure 3. Pivoting Data for Columnar View Column-basedDatabase LightningAnalytics
  • 5. WHITEPAPER 5 WHITEPAPER Will the Real NoSQL Please Stand Up? A term invented by Carlo Strozzi in 19984 , NoSQL has been a hard term to pin down from the beginning. For one thing, while most people now translate the term to mean‘Not Only SQL,’there are other accepted variations. More importantly, the term refers to a broad, emerging class of non-relational database solutions. NoSQL technologies have evolved to address specific business needs for which row technologies couldn’t scale to meet and column technolo- gies were unsuited to address. Currently, there are over 112 products or open-source projects in the NoSQL space, with each solution matching a specific business need. For example: • Real-time data logging such as in finance or web analytics • Web apps or any app which needs better performance without hav- ing to define columns in an RDBMS • Storing frequently requested data for a web app While each technology addresses different problems, they all share certain attributes: huge volume of data and transaction rates, a distrib- uted architecture and often unstructured (or semi-structured data) with heavy read/write workloads. Unstructured information is typically text heavy but may contain data such as dates and other numbers as well. The resulting irregularities and ambiguities make this data unsuitable for traditional row-based or column-based structured databases. In short, NoSQL solutions are typically beasts in terms of their data capacity, lookup speed and ability to handle streaming data, especially over highly scaled environments. On the other hand, they generally lack a SQL interface and often come with little or no programmatic interfaces—meaning that setup and administration may require some specialized skills. In addition, NoSQL can be limited in terms of their ability to execute complex queries, re- stricting the types of actionable analytics they can deliver. 
For example, queries that JOIN two tables or employ nested SELECTs are typically not possible using these technologies. Below, we go a bit deeper into each of three main NoSQL subvariants: key-value stores, document stores and column stores. Infobright: Putting Intelligence in Columns Infobright’s high performance analytic database is designed to handle business-driven queries on large volumes of data—without IT intervention. Easy to im- plement and manage, Infobright provides the answers your business users need at a price you can afford. How is this achieved? Infobright combines a columnar database with intelli- gence we call the Knowledge Grid to deliver fast query response with unmatched administrative simplicity: no indexes, no data partitioning, and no manual tun- ing. Infobright uses intelligence, not hardware, to drive query performance: • Creates information about the data upon load, automatically • Uses this to eliminate or reduce the need to ac- cess data to respond to a query • The less data that needs to be accessed, the faster the response What this means to customers: • Self-managing: 90% less administrative effort • Low-cost: More than 50% less than alternative solutions • Scalable, high-performance: Up to 50 TB using a single industry standard server • Fast queries: Ad-hoc queries are as fast as antici- pated queries, so users have total flexibility • Compression: Data compression of 10:1 to 40:1 that means a lot less storage is needed Infobright offers an open source and a commercial edition of its software. Both products are designed to handle data volumes up to 50TB. Try it yourself—download our Community Edition at www.infobright.org, or a free trial of our Enterprise Edition at www.infobright.com. 4 Wikipedia, http://en.wikipedia.org/wiki/NoSQL
  • 6. WHITEPAPER 6 WHITEPAPER NoSQLDatabase DataBeasts Key-value Store A key-value store does what it sounds like it does: values are stored and indexed by a key, usually built on a hash or tree data-structure. 5 Key-value pairs are widely used in tables and configuration files. Key-value stores allow the application to store its data without predefining a schema—there is no need for a fixed data-model. In a key-value store, for example, a record may look like: 12345 =>“img456.jpg,checkout.js,20” Companies turn to key-value stores when they require the functionality of key-values but do not require the technology overhead of a traditional RDBMS system, either because they require more efficient, cost-effective scalability or they are work- ing with unstructured or semi-structured data. Key-value stores are great for unstructured data centered on a single object, and where data is stored in memory with some persistent backup. Consequently, they are typically used as a cache for data frequently requested by web applications such as online shopping carts or social-media sites. As these web pages are created on the fly, the static components are quickly retrieved and served up to the user. Document Store As with a key-value store, companies turn to NoSQL document stores when they are dealing with huge volumes of data and transac- tions requiring massive horizontal scaling or sharding. And, similarly, there is no need for a pre-set schema. However, the data in docu- ment stores can contain several keys, so queries aren’t as limited as they are in key-value stores. For example, in a document data store an example record could read: “id”=> 12345, “name”=>“Jane”, “age”=> 22, “email”=>“jane@gmail.com” While multiple keys increase the types of possible queries, the data stored in these‘documents’do not need to be predefined and can change from document to document. 
The tradeoff for the more complex query-options is speed: queries with a key-value store are much simpler and often faster. Document stores are often deployed for web-traffic analysis, user-behavior/action analysis, or log-file analysis in real time. However, while document stores allow more query capabilities than key-value stores, there are still limitations given the non-relational basis of the document-store database. Column Store Column stores are an emerging NoSQL option, created in response to very specific database problems involving beyond-massive amounts of data across a hugely distributed system. Think Google. Think Facebook. Imagine the colossal amount of data that Google stores in its data farms. And then imagine how many permutations of data sets need to be compiled to respond to all possible Google 5 For more on hash functions see http://en.wikipedia.org/wiki/Hash_function. For more on tree data see http://en.wikipedia.org/wiki/Tree_%28data_structure%29.
  • 7. WHITEPAPER 7 WHITEPAPER searches. Clearly, this task could never be accomplished in any reasonable time frame with a traditional relational database. It requires the ability to handle massive amounts of data but with more query complexity than either key-value stores or document stores would deliver. Most column stores also use MapReduce, a fault-tolerant framework for processing huge datasets on certain kinds of distributable problems using a large number of computers. This technology is still emerging—and use cases may eventually overlap with document stores as both technologies mature. But at the moment, the use cases in production for column stores are generally limited to applications such as Google and Facebook. A Column by Any Other Name….. It should go without saying, but we’ll say it anyway—a column store is only similar to a column-based database in that they both have the word‘column’ in their names. A column-based database is still a structured relational database, albeit one optimized for analytics. A column store is still firmly in the NoSQL camp—this is a system for handling huge volumes of data and transactions, in a massively distributed manner, without the need to define the database structure up front—though it tends to have more SQL traits than either a key-value store or document store. Can I Get a Hadoop From Anyone? While this User Guide addresses the emerging database landscape, no con- versation would be complete without mentioning Hadoop. Hadoop is a scalable fault-tolerant distributed system for data storage and processing (open source under the Apache license). It has two main parts: • Hadoop Distributed File System (HDFS): self-healing high-bandwidth clustered storage • MapReduce: fault-tolerant distributed processing framework The data typically stored with Hadoop is complex, from multiple data sources and, well, there’s always lots and lots of it. 
Beyond being a mass-storage system, Hadoop, through MapReduce, is also used for batch processing and for computation executed in parallel across a cluster of servers. While running MapReduce jobs is a common way to access data stored in Hadoop, technologies such as HBase and Hive, which sit on top of HDFS, are also used to query the data.
LiveRail: Infobright & Hadoop Power Video Advertising Analytics
LiveRail delivers technology solutions that enable and enhance the monetization of internet-distributed video. By focusing specifically on challenges and opportunities created by online video, LiveRail’s tools are designed to be easier, more efficient and more effective than traditional display ad servers at delivering and tracking advertising in this medium. Their platform enables publishers, advertisers, ad networks and media groups to manage, target, display and track advertising in online video.
The Challenge: With a growing number of customers, LiveRail was faced with managing increasingly large data volumes. They also needed to provide near real-time access to their customers for reporting and ad hoc analysis.
The Solution: LiveRail chose two complementary technologies, Apache Hadoop and Infobright, to manage hundreds of millions of rows of data each day. Detail is loaded hourly into Hadoop and at the same time summarized and loaded into Infobright. Customers access Infobright 7x24 for ad hoc reporting and analysis and can schedule time if needed to access cookie-level data stored in Hadoop.
“Infobright and Hadoop are complementary technologies that help us manage large amounts of data while meeting diverse customers needs to analyze the performance of video advertising investments.”
Andrei Dunca, CTO of LiveRail
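The detail-plus-summary pattern in the LiveRail case (raw detail kept in one system, hourly summaries loaded into an analytic database) can be sketched as a simple roll-up. This is only an illustration under stated assumptions: the record layout (timestamp, campaign id, event type) and the function names are hypothetical and do not describe LiveRail's actual schema or either product's loading interface.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical detail record: (ISO-8601 timestamp, campaign id, event type).
Event = Tuple[str, str, str]
# Summary key: (start of hour, campaign id, event type).
SummaryKey = Tuple[str, str, str]


def hour_bucket(timestamp: str) -> str:
    """Truncate an ISO-8601 timestamp to the start of its hour."""
    parsed = datetime.fromisoformat(timestamp)
    return parsed.replace(minute=0, second=0, microsecond=0).isoformat()


def summarize(events: Iterable[Event]) -> Dict[SummaryKey, int]:
    """Count detail events per (hour, campaign, event type)."""
    counts: Counter = Counter()
    for timestamp, campaign, event_type in events:
        counts[(hour_bucket(timestamp), campaign, event_type)] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical detail rows; the full detail would stay in the bulk store,
    # while only the much smaller summary rows go to the analytic database.
    detail = [
        ("2011-06-01T10:15:00", "campaign-1", "impression"),
        ("2011-06-01T10:45:30", "campaign-1", "impression"),
        ("2011-06-01T11:05:00", "campaign-1", "click"),
    ]
    for key, count in summarize(detail).items():
        print(key, count)
```

The design point is that ad hoc reporting usually needs only the aggregated rows, so queries stay fast, while the cookie-level detail remains available for the occasional deeper analysis.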
Summary and Next Steps
The world of the one-size-fits-all database is over. Myriad technology approaches have been (and are being) developed to meet the challenges of Big Data. This activity impels corporate IT groups to look beyond row-based solutions to find the right fit for their analytic needs, staffing and budget requirements.
We hope that this paper and the following Emerging Database Landscape chart serve as a useful resource for figuring out the strengths and weaknesses of the various database approaches available today.
Infobright: High-performance Analytics for Machine-generated Data
Infobright’s high-performance database is the preferred choice for applications and data marts that analyze large volumes of “machine-generated data” such as Web data, network logs, telecom records, stock tick data and sensor data. Easy to implement and with unmatched data compression, operational simplicity and low cost, Infobright is being used by enterprises, SaaS and software companies in online businesses, telecommunications, financial services and other industries to provide rapid access to critical business data.
If you decide that a columnar database has a place in your analytic solutions, you can try it for yourself, free. Either download our Community Edition at www.infobright.org, or a free trial of our Enterprise Edition at www.infobright.com.
For more information, please visit http://www.infobright.com or join our open source community at http://www.infobright.org.
The Emerging Database Landscape
This chart gives a quick overview of the strengths, weaknesses and use cases for row-based, columnar and NoSQL databases.
Row-Based
  • Basic Description: Data structured in rows
  • Common Use Cases: Transaction processing, interactive transactional applications
  • Strengths: Capturing and inputting new records. Robust, proven technology.
  • Weaknesses: Scale issues; less suitable for queries, especially against large databases
  • Key Players: MySQL, Oracle, SQL Server, Sybase ASE
Columnar
  • Basic Description: Data is vertically striped and stored in columns
  • Common Use Cases: Historical data analysis, data warehousing, business intelligence
  • Strengths: Fast query support, especially for ad hoc queries on large datasets; compression
  • Weaknesses: Not suited for transactions; import and export speed; heavy computing resource utilization
  • Key Players: Infobright, Aster Data, Sybase IQ, Vertica, ParAccel
NoSQL: Key-Value Store
  • Basic Description: Data stored usually in memory with some persistent backup
  • Common Use Cases: Used as a cache for storing frequently requested data for a web app
  • Strengths: Scalability, very fast storage and retrieval of unstructured and partly structured data
  • Weaknesses: Usually all data must fit into memory, no complex query capabilities
  • Key Players: Memcached, Amazon S3, Redis, Voldemort
NoSQL: Document Store
  • Basic Description: Persistent storage for unstructured or semi-structured data along with some SQL-like querying functionality
  • Common Use Cases: Web apps or any app which needs better performance and scalability without having to define columns in an RDBMS
  • Strengths: Persistent store with scalability features such as sharding built in, and with better query support than key-value stores
  • Weaknesses: Lack of sophisticated query capabilities
  • Key Players: MongoDB, CouchDB, SimpleDB
NoSQL: Column Store
  • Basic Description: Very large data storage, MapReduce support
  • Common Use Cases: Real-time data logging as in finance or web analytics
  • Strengths: Very high throughput for Big Data, strong partitioning support, random read-write access
  • Weaknesses: Low-level API, inability to perform complex queries, high latency of response to queries
  • Key Players: HBase, Bigtable, Cassandra
Typical Database Size Range: Several GBs to 50 TB; Several GBs to several TBs; Few TBs to several PBs; Few TBs to several PBs
© Copyright 2011 Infobright Inc. Infobright is a registered trademark of Infobright Inc. All other trademarks and registered trademarks are the property of their respective owners.