PUTTING YOUR BIG DATA STRATEGY ON THE RIGHT TRACK
Big data brings a mix of technologies into organizations, and harnessing those tools can be a challenge. But there are steps IT teams can take to put their projects on the path to success.
BY JACK VAUGHAN
UNLOCKING THE BUSINESS BENEFITS IN BIG DATA
1 FINDING THE RIGHT TOOLS
2 DON’T COUNT OUT THE DATA WAREHOUSE
3 DATA BY ANY OTHER NAME
4 GROWING PAINS
Surging volumes of structured and unstructured data, what we’ve come to know as big data, are putting IT and data management teams under the gun. Information of all types is engulfing computer systems in many organizations, complicating efforts to pull valuable business insights out of it through big data analytics initiatives. At the same time, a cavalcade of new technologies has arrived to help companies cope with the data influx, but sorting through those technologies is often an intimidating task in itself.

In addition, IT managers must assess whether Hadoop clusters, NoSQL databases and other big data management tools can fit comfortably into existing systems architectures or if architectural modifications are needed to accommodate them. The answer varies based on factors such as planned uses, organizational structures and IT maturity.

And the burgeoning business-side interest in extracting business value and deriving competitive advantages from vaults of big data means that there isn’t a lot of time to make those assessments and choose between the available technology options. In more and more companies, big data is viewed as a precious resource that business leaders and data scientists want to sift through like prospectors looking for precious metals.

This “big data gold rush” puts added pressure on IT and data management strategists to quickly deliver systems that can handle the growing amounts, and increasing variety, of incoming data.

One of the biggest issues in planning a big data strategy is where to put all the data for processing and analysis. It wasn’t long ago that transactional data was the primary concern and that the options for managing it boiled down to a handful of relational databases. Multidimensional databases, columnar software and other specialized analytical engines added some choices for warehousing data from transaction systems for analysis. Even so, in many companies the big decision was: enterprise data warehouse (EDW) or collection of independent data marts?

But things have changed. Collecting and analyzing data from social media sites, sensors, system logs and other nontransactional sources has become a priority for many organizations. And big data technologies that can support those initiatives have proliferated to such an extent that the number of different, and disparate, options is dizzying.
Matthew Aslett, an enterprise software analyst at research and advisory company The 451 Group, has depicted the plethora of data storage and management choices now available in the form of a London Underground subway map, arraying the available technologies as stations along color-coded lines representing different product categories. In addition to conventional databases, a sampling of those categories includes Hadoop file system implementations as well as schemaless NoSQL databases and “NewSQL” hybrids that use SQL-based relational data models but aim to provide NoSQL-like levels of data scalability. Heightening the potential for buyer bewilderment even more, some categories house technologies of widely varying stripes. In particular, NoSQL is an umbrella term that encompasses a diverse mix of graph databases; document, column and key-value stores; and other types of repositories.

Initially, many big data applications were “greenfield” projects that didn’t face some of the issues of typical application development initiatives, such as the need to integrate with legacy systems or structured data sources. Often, technology-savvy data analysts and other business users took a first hack at doing something with unstructured or semi-structured data under the radar of IT and business intelligence managers, taking advantage of the open source nature of Hadoop and many NoSQL tools. But big data is definitely on the corporate radar now, and the drive to incorporate non-transactional forms of data into mainstream analytics processes is making effective deployment and management of big data systems by IT teams a necessity.

There are some fundamental steps that companies can take to get started on harnessing big data technologies and putting their projects on the path to success. Let’s take a closer look at a few of them.
1 FINDING THE RIGHT TOOLS

It’s still early in the big data adoption cycle, and different companies are trying out different technologies, sometimes with the same end goal, as a look at available user case studies shows:

- NoSQL databases are being used to analyze network failure and degradation patterns, manage digital assets and track and correlate Web server log activity, among other applications.
- Hadoop systems are being employed for uses such as matching highway traffic patterns with cell phone usage data, evaluating consumer buying behavior for more targeted ethnic demographics and creating new financial services products based on real-time analysis of customer activity.
- NewSQL databases have been tapped to support applications that include automating real-time pricing for air travel and improving the scalability of utility database systems.
- Analytical databases have been applied in initiatives such as dissecting website user activity and uncovering trends in GPS information collected from taxis.
The key is to pick the right database for the job at hand, in the same way bettors at a race track try to choose “the horse for the course,” a phrase that refers to the ability of some thoroughbreds to run better on dirt or grass, or on a dry or muddy track. But multiple database horses might be required for different courses within a big data environment.

ThoughtWorks Inc., a Chicago-based software development services company that also sells application lifecycle management tools, has created a hypothetical online retail application framework to illustrate the concept of polyglot persistence, or using a variety of database technologies to handle different types of data based on which technology is the best fit in each individual case.

For example, a key-value NoSQL data store might be best for managing website user-session data as part of the retail framework, according to the ThoughtWorks model. But it envisions the use of four other flavors of NoSQL databases for tasks such as processing online shopping-cart data, powering the site’s recommendation engine and storing user activity logs.

And SQL-based relational databases still have their place in this new polyglot world. In the online retail framework, relational technology is depicted as a good fit for financial data that requires transactional updates and is best served by a tabular structure. Reporting also could be the province of a relational database with SQL interfaces at the ready for exchanging data with reporting tools.
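
To make the polyglot idea concrete, here is a minimal Python sketch of two of the pairings described above: session data in a key-value store and financial records in a relational database. It is an illustration only, not the ThoughtWorks framework. The class names are hypothetical, a plain dictionary stands in for a real key-value product, and in-memory SQLite stands in for a production relational database.

```python
import sqlite3


class KeyValueSessionStore:
    """Stand-in for a key-value NoSQL store (a plain dict in this sketch)."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = session

    def get(self, session_id):
        return self._data.get(session_id)


class RelationalOrderStore:
    """Relational store for financial records, backed by in-memory SQLite."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
        )

    def record_order(self, order_id, amount_cents):
        # The connection context manager commits on success and rolls back
        # if the insert raises.
        with self._conn:
            self._conn.execute(
                "INSERT INTO orders VALUES (?, ?)", (order_id, amount_cents)
            )

    def total_cents(self):
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount_cents), 0) FROM orders"
        ).fetchone()
        return row[0]


class RetailPersistence:
    """Hypothetical facade that hands each kind of data to its own store."""

    def __init__(self):
        self.sessions = KeyValueSessionStore()
        self.orders = RelationalOrderStore()


stores = RetailPersistence()
stores.sessions.put("s-1", {"user": "alice", "last_page": "/cart"})
stores.orders.record_order("o-1", 2599)
stores.orders.record_order("o-2", 1000)
print(stores.sessions.get("s-1")["last_page"])
print(stores.orders.total_cents())
```

The application code talks to one facade, while each store behind it can be swapped for whatever technology suits that type of data.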
Relational databases are efficient at processing transactions, and through their support for characteristics such as transactional atomicity and consistency, they offer reliability and data recovery capabilities that NoSQL technologies typically can’t match. But relational software often isn’t suited to text and other unstructured forms of big data. And it requires “a lot of maintenance on the back end,” including the need to carefully construct data schemas and modify them when business requirements change, said Pramod Sadalage, a principal consultant at ThoughtWorks. Those issues are minimized with NoSQL and Hadoop offerings.

“What we’re saying is, ‘Give the things that belong to a certain task to a certain database,’” Sadalage said. “If you have, for example, a [product] catalog, put it in a database that is well suited for that; then searches go faster.”
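
The transactional atomicity credited to relational databases earlier in this section can be seen in a few lines. The sketch below uses Python's built-in sqlite3 module; the accounts table, the names and the amounts are invented for illustration. A transfer is written as two updates, and when the second one fails, the first is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # and the rollback then undoes the credit above as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)   # succeeds
ok_large = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
print(ok_small, ok_large, balances(conn))
```

After the failed transfer, neither account shows a partial update, which is the guarantee that financial data typically depends on.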
2 DON’T COUNT OUT THE DATA WAREHOUSE

Big data management projects might be born because existing data warehouse systems are beginning to sag under the weight of the data that is flooding into organizations. But that doesn’t mean data warehouses are all of a sudden obsolete, just that the nature of warehousing data is changing to make room for big data. “Different styles of data warehouse architecture have come and gone over the years,” said Philip Russom, data management research director at The Data Warehousing Institute (TDWI) in Renton, Wash. “As we move to bigger volumes and diversity of data, we have to again evolve the data warehouse, just as we have in the past.”

Hadoop-based big data systems initially were viewed as potential data warehouse killers, but that sentiment has largely given way to expectations of peaceful coexistence. For example, 78% of 263 IT professionals, business users and consultants surveyed by TDWI in November 2012 said they thought Hadoop systems could be a useful complement to their data warehouses for supporting advanced analytics applications. In addition, 41% saw Hadoop as an effective staging area for information on its way to a data warehouse. Asked if Hadoop clusters could fully replace an EDW, more than half of the respondents said no; just 4% said yes (see Figure 1).
Russom thinks that using Hadoop to stage data for loading into data warehouses is a “beachhead” for big data technologies in companies. But the staging process itself is one aspect of data warehousing that has changed significantly in recent years, he said. In many cases, raw data is likely to pile up in Hadoop systems and initially be analyzed there. “In the old days, the data staging area was pretty temporary,” Russom said. “But it has evolved to become a kind of archive.”

Even so, he doesn’t expect those archives to exist in isolation, disconnected from data warehouses. Some of the data will be moved into EDWs, perhaps in the form of aggregated analytics results, and the two technologies increasingly are being used in tandem, according to Russom. “Hadoop-enabled analytics are sometimes deployed in silos, but the trend is toward integrating Hadoop and EDW data at analysis time for maximal visibility into business performance,” he wrote in a report about the TDWI survey.
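
The staging-then-warehouse flow described above can be sketched in miniature. In this Python example, a list of raw records stands in for data archived in a Hadoop staging area, and an in-memory SQLite table stands in for the EDW; the record fields, table name and function names are all assumptions made for illustration. Only the aggregated results are loaded into the warehouse, while the raw records stay where they are.

```python
import sqlite3
from collections import defaultdict

# Stand-in for raw records piling up in a Hadoop staging area.
raw_clicks = [
    {"day": "2013-01-01", "product": "A", "clicks": 3},
    {"day": "2013-01-01", "product": "B", "clicks": 2},
    {"day": "2013-01-02", "product": "A", "clicks": 4},
    {"day": "2013-01-02", "product": "B", "clicks": 3},
]


def aggregate_by_day(records):
    """Reduce raw click records to one total per day."""
    totals = defaultdict(int)
    for record in records:
        totals[record["day"]] += record["clicks"]
    return sorted(totals.items())


def load_into_warehouse(conn, rows):
    """Load aggregated rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks (day TEXT PRIMARY KEY, clicks INTEGER)"
    )
    with conn:
        conn.executemany("INSERT OR REPLACE INTO daily_clicks VALUES (?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load_into_warehouse(warehouse, aggregate_by_day(raw_clicks))
summary = warehouse.execute(
    "SELECT day, clicks FROM daily_clicks ORDER BY day"
).fetchall()
print(summary)
```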
FIGURE 1: HADOOP VERSUS THE DATA WAREHOUSE

Question                                                                          Yes    Maybe   No
Can the HDFS augment your enterprise data warehouse?                              50%    47%     3%
Can the Hadoop Distributed File System replace your enterprise data warehouse?    4%     37%     59%

SOURCE: THE DATA WAREHOUSING INSTITUTE. BASED ON A SURVEY OF 263 IT PROFESSIONALS, BUSINESS USERS AND CONSULTANTS CONDUCTED IN NOVEMBER 2012.

3 DATA BY ANY OTHER NAME

Big data projects begun as skunkworks or standalone undertakings do run the risk of creating information silos. To prevent that, organizations should incorporate them into an overall data management strategy from the start, said Mark Beyer, an analyst at Gartner Inc. in Stamford, Conn. That means asking many of the same questions IT teams ask about conventional data as part of data quality and governance programs, he added. For example, where did a particular set of big data come from, how long must it be kept and does it need to be remediated before being used?
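
One lightweight way to keep those three questions attached to a data set is to record the answers alongside it. The Python sketch below is an assumption about how that might look, not a method described by Beyer; the class, field and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetRecord:
    """Governance answers recorded for one incoming data set."""

    name: str
    source: str              # where did the data come from?
    received_on: date
    retention_days: int      # how long must it be kept?
    needs_remediation: bool  # must it be cleaned up before use?

    def must_keep_until(self) -> date:
        return self.received_on + timedelta(days=self.retention_days)

    def can_be_purged(self, today: date) -> bool:
        return today > self.must_keep_until()

    def usable_as_is(self) -> bool:
        return not self.needs_remediation


mentions = DatasetRecord(
    name="brand_mentions",
    source="social network feed",
    received_on=date(2013, 1, 15),
    retention_days=90,
    needs_remediation=True,
)
print(mentions.must_keep_until(), mentions.usable_as_is())
```

Keeping the record immutable (frozen) means the provenance answers cannot be silently changed after the data has been registered.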
Beyer said applying proven data management processes to pools of big data is especially important with information that comes from external sources, including what he described as “crowdsourced” data collected from Facebook, Twitter and other social networks. With such data, “you don’t know if the ‘create case’ matches the use case,” he said. Understanding the origins of data and factors such as how fast it changes is crucial to effective big data management, he advised.

The bottom line, Beyer said, is that “big data assets are no more accurate than any other digital information” and often less so. As a result, he warned IT managers to get ready for a bumpy ride: “Big data is an invader. Big data breaks things. You don’t control it.” Asserting control over the data once it’s in an organization’s systems could mean the difference between success and failure in making effective use of the information.
4 GROWING PAINS

It’s also important to recognize that technologies such as Hadoop, its associated MapReduce programming model and NoSQL databases aren’t automatic cure-alls for a company’s data management needs. In addition to the data quality and governance challenges, technical complexities lurk around the corners of big data environments.

For many companies, complexity comes in the form of Java-based development. Java is the programming language of choice for Hadoop and other big data technologies. But even the large army of experienced Java developers faces challenges in working with Hadoop because it doesn’t include native support for SQL. As a result, developers can run into difficulties in creating MapReduce programs to distill Hadoop data into subsets for processing on different compute nodes in a cluster, said Paul Dix, CEO and founder of Errplane, a New York-based consultancy and developer of application monitoring software. “Most Java developers face issues in how they think about processing data into the MapReduce paradigm,” said Dix, who also is a member of the New York Hadoop User Group. “They have to learn how to write MapReduce code to work with Hadoop; they have to learn to structure the problem correctly.”
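
To show what structuring a problem for MapReduce involves, here is a single-process Python sketch of the three phases: map, shuffle and reduce. It is not Hadoop code and does not use Hadoop's Java API; the function names are hypothetical, and each inner list merely stands in for the subset of data that one compute node would process.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    # Each split stands in for the data handled by one node in a cluster.
    mapped = [
        pair
        for split in splits
        for record in split
        for pair in map_phase(record)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


splits = [["big data big"], ["data tools"]]
counts = run_job(splits)
print(counts)
```

The developer writes only the map and reduce functions; the hard part Dix describes is recasting a business question so that it fits those two steps.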
Programming directly in MapReduce isn’t the only path developers can take. “There are a lot of ways to do Hadoop without writing MapReduce programs from scratch,” said Paul Mackles, senior manager of software architecture at software vendor Adobe Systems Inc. in San Jose, Calif. For example, Hive, an open source Hadoop offshoot, offers a table-based data model and a SQL-like language whose queries are automatically compiled into MapReduce jobs for analyzing data in Hadoop systems. Apache Pig is a separate platform with a high-level language for creating highly parallelized MapReduce programs. In addition, software vendors such as Cloudera Inc. are starting to offer their own SQL query engines for Hadoop.

A SNAPSHOT OF THE BIG DATA TECHNOLOGY LANDSCAPE

IT architects building big data systems have a variety of technology components at their disposal.

- Distributions of the Hadoop file system and related MapReduce programming model are offered by Cloudera, Hortonworks, MapR Technologies and other vendors.
- Hadoop is not an island: The open source software framework is backed by a long list of supporting tools, including Hive, HBase, Pig, HCatalog and ZooKeeper.
- NoSQL database technology has grown into a flourishing market seemingly overnight, populated with products such as CouchDB, Cassandra, MongoDB, RavenDB, Redis, Riak, Neo4j and InfiniteGraph.
- Hybrid mixes of relational and non-relational technologies are emerging. Referred to as “NewSQL” databases, they include the likes of VoltDB, NuoDB, ScaleBase and Drizzle.
- Analytical databases based on a mix of relational, columnar and massively parallel processing technology include Sybase IQ, Teradata Aster, IBM Netezza, HP Vertica, Greenplum and ParAccel.
Mixing Java skills and SQL add-ons doesn’t assure Hadoop success, though. Converting queries to MapReduce in Hive “works fairly well, but it isn’t always a clean transition,” Dix said.

Hive queries often require tuning to attain the best possible performance, according to Mackles. Data joins are “not its strong suit,” he said during a presentation at TDWI’s 2013 BI Executive Summit in Las Vegas. Working with MapReduce typically incurs performance hits at the start of query jobs and imposes more processing overhead while they’re running, he added.

Finding a good starting point for a would-be Hadoop development team can help build both skills and confidence. One possible starter project recommended by Dix: putting Web server log files into a Hadoop cluster and then applying MapReduce to the data to find out, say, average response times on webpages or the number of page-loading errors generated by a Web application. “That’s the low-hanging fruit,” he said.
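
A rough idea of that starter project, again in plain Python rather than on a real cluster: the sketch below computes the average response time and the error count per page. The log format (path, HTTP status, response time in milliseconds) is an assumption, as is treating any status of 400 or above as a page-loading error. The mapper emits partial sums instead of averages, because sums and counts from different nodes can be merged safely while averages cannot.

```python
from collections import defaultdict

# Assumed log format: "<path> <http_status> <response_ms>"
LOG_LINES = [
    "/home 200 120",
    "/home 200 80",
    "/cart 500 300",
    "/cart 200 100",
    "/home 404 40",
]


def map_line(line):
    """Turn one log line into (path, (response_ms, hits, errors))."""
    path, status, millis = line.split()
    is_error = 1 if int(status) >= 400 else 0
    return path, (int(millis), 1, is_error)


def reduce_path(values):
    """Merge the partial sums for one path and derive the average."""
    total_ms = sum(value[0] for value in values)
    hits = sum(value[1] for value in values)
    errors = sum(value[2] for value in values)
    return {"avg_ms": total_ms / hits, "hits": hits, "errors": errors}


def summarize(lines):
    grouped = defaultdict(list)
    for line in lines:
        path, value = map_line(line)
        grouped[path].append(value)
    return {path: reduce_path(values) for path, values in grouped.items()}


report = summarize(LOG_LINES)
print(report)
```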
Mackles listed a variety of new and upgraded tools that are being developed to help organizations get over the big data hump. The list includes a second-generation version of MapReduce called YARN; a table and storage management utility named HCatalog; and Hadoop 2.0, which is available in an alpha release and is designed to make real-time processing and querying of Hadoop data more feasible, among other improvements. “Hadoop has been around long enough that I think a lot of the shortcomings are pretty well known,” Mackles said, adding that Hadoop 2.0 addresses many of the issues.

Those technologies and others might well help the big data management and analytics cause, but they further add to the vast and growing forest of tools that IT, data warehousing and data management professionals need to navigate in planning and managing deployments. It’s a challenge that likely will be faced in more and more companies, though. In the TDWI survey, only 10% of the respondents said their organizations had Hadoop systems in production use, but another 51% said they expected to be Hadoop users within three years.

The corporate spotlight will be on the IT teams responsible for building scalable big data systems and integrating them into existing data warehousing and analytics environments. Finding the right technologies, and managing the process in a way that gets the most out of them, will help keep the glare of that light from getting too hot.
ABOUT THE AUTHOR

JACK VAUGHAN is news and site editor of SearchDataManagement.com. He covers big data management, data warehousing, databases and data integration. Vaughan was an editor for TechTarget’s SearchSOA.com, SearchVB.com, TheServerSide.net and SearchDomino.com websites. Email him at jvaughan@techtarget.com.

Putting Your Big Data Strategy on the Right Track is a SearchBusinessAnalytics.com e-publication.

Scot Petersen, Editorial Director
Jason Sparapani, Managing Editor, E-Publications
Joe Hebert, Associate Managing Editor, E-Publications
Craig Stedman, Executive Editor
Melanie Luna, Managing Editor
Mark Brunelli, News Director
Linda Koury, Director of Online Design
Neva Maniscalco, Graphic Designer
Doug Olender, Publisher, dolender@techtarget.com
Ed Laplante, Director of Sales, elaplante@techtarget.com

TechTarget Inc.
275 Grove Street, Newton, MA 02466
www.techtarget.com

© 2013 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.

Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 

Putting Your Big Data Management Strategy on the Right Track

PUTTING YOUR BIG DATA STRATEGY ON THE RIGHT TRACK

Big data brings a mix of technologies into organizations, and harnessing those tools can be a challenge. But there are steps IT teams can take to put their projects on the path to success.

BY JACK VAUGHAN

UNLOCKING THE BUSINESS BENEFITS IN BIG DATA

1 FINDING THE RIGHT TOOLS
2 DON’T COUNT OUT THE DATA WAREHOUSE
3 DATA BY ANY OTHER NAME
4 GROWING PAINS

SURGING VOLUMES OF STRUCTURED AND UNSTRUCTURED DATA, WHAT WE’VE COME TO KNOW AS BIG DATA, ARE PUTTING IT AND DATA MANAGEMENT TEAMS UNDER THE GUN. Information of all types is engulfing computer systems in many organizations, complicating efforts to pull valuable business insights out of it through big data analytics initiatives. At the same time, a cavalcade of new technologies has arrived to help companies cope with the data influx, but sorting through those technologies is often an intimidating task in itself.

In addition, IT managers must assess whether Hadoop clusters, NoSQL databases and other big data management tools can fit comfortably into existing systems architectures or if architectural modifications are needed to accommodate them. The answer varies based on factors such as planned uses, organizational structures and IT maturity.

And the burgeoning business-side interest in extracting business value and deriving competitive advantages from vaults of big data means that there isn’t a lot of time to make those assessments and choose between the available technology options. In more and more companies, big data is viewed as a precious resource that business leaders and data scientists want to sift through like prospectors looking for precious metals.

This “big data gold rush” puts added pressure on IT and data management strategists to quickly deliver systems that can handle the growing amounts, and increasing variety, of incoming data.

One of the biggest issues in planning a big data strategy is where to put all the data for processing and analysis. It wasn’t long ago that transactional data was the primary concern and that the options for managing it boiled down to a handful of relational databases. Multidimensional databases, columnar software and other specialized analytical engines added some choices for warehousing data from transaction systems for analysis. Even so, in many companies the big decision was: enterprise data warehouse (EDW) or collection of independent data marts?

But things have changed. Collecting and analyzing data from social media sites, sensors, system logs and other nontransactional sources has become a priority for many organizations. And big data technologies that can support those initiatives have proliferated to such an extent that the number of different, and disparate, options is dizzying.

Matthew Aslett, an enterprise software analyst at research and advisory company The 451 Group, has depicted the plethora of data storage and management choices now available in the form of a London Underground subway map, arraying the available technologies as stations along color-coded lines representing different product categories. In addition to conventional databases, a sampling of those categories includes Hadoop file system implementations as well as schemaless NoSQL databases and “NewSQL” hybrids that use SQL-based relational data models but aim to provide NoSQL-like levels of data scalability.

Heightening the potential for buyer bewilderment even more, some categories house technologies of widely varying stripes. In particular, NoSQL is an umbrella term that encompasses a diverse mix of graph databases; document, column and key-value stores; and other types of repositories.

Initially, many big data applications were “greenfield” projects that didn’t face some of the issues of typical application development initiatives, such as the need to integrate with legacy systems or structured data sources. Often, technology-savvy data analysts and other business users took a first hack at doing something with unstructured or semi-structured data under the radar of IT and business intelligence managers, taking advantage of the open source nature of Hadoop and many NoSQL tools.

But big data is definitely on the corporate radar now, and the drive to incorporate nontransactional forms of data into mainstream analytics processes is making effective deployment and management of big data systems by IT teams a necessity.

There are some fundamental steps that companies can take to get started on harnessing big data technologies and putting their projects on the path to success. Let’s take a closer look at a few of them.

1 FINDING THE RIGHT TOOLS

It’s still early in the big data adoption cycle, and different companies are trying out different technologies, sometimes with the same end goal, as a look at available user case studies shows:

  • NoSQL databases are being used to analyze network failure and degradation patterns, manage digital assets and track and correlate Web server log activity, among other applications.
  • Hadoop systems are being employed for uses such as matching highway traffic patterns with cell phone usage data, evaluating consumer buying behavior for more targeted ethnic demographics and creating new financial services products based on real-time analysis of customer activity.
  • NewSQL databases have been tapped to support applications that include automating real-time pricing for air travel and improving the scalability of utility database systems.
  • Analytical databases have been applied in initiatives such as dissecting website user activity and uncovering trends in GPS information collected from taxis.

The key is to pick the right database for the job at hand, in the same way bettors at a race track try to choose “the horse for the course,” a phrase that refers to the ability of some thoroughbreds to run better on dirt or grass, or on a dry or muddy track. But multiple database horses might be required for different courses within a big data environment.

ThoughtWorks Inc., a Chicago-based software development services company that also sells application lifecycle management tools, has created a hypothetical online retail application framework to illustrate the concept of polyglot persistence, or using a variety of database technologies to handle different types of data based on which technology is the best fit in each individual case. For example, a key-value NoSQL data store might be best for managing website user-session data as part of the retail framework, according to the ThoughtWorks model.

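To make the idea of polyglot persistence concrete, here is a minimal sketch in Python. It is not the ThoughtWorks framework itself: the class names, the `save` routing method and the choice of an in-memory SQLite database as the second store are all assumptions made for illustration. A plain dictionary stands in for a key-value NoSQL store holding session data.

```python
import sqlite3


class KeyValueSessionStore:
    """Hypothetical stand-in for a key-value NoSQL store of session data."""

    def __init__(self):
        self._data = {}

    def put(self, session_id, session):
        self._data[session_id] = dict(session)

    def get(self, session_id):
        return self._data.get(session_id)


class RelationalOrderStore:
    """Hypothetical relational store, backed by in-memory SQLite."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount_cents INTEGER)"
        )

    def record(self, order_id, amount_cents):
        # The connection context manager commits on success
        # and rolls back if the insert raises.
        with self._conn:
            self._conn.execute(
                "INSERT INTO orders VALUES (?, ?)", (order_id, amount_cents)
            )

    def total_cents(self):
        row = self._conn.execute(
            "SELECT COALESCE(SUM(amount_cents), 0) FROM orders"
        ).fetchone()
        return row[0]


class RetailPersistence:
    """Routes each kind of data to the store assumed to fit it best."""

    def __init__(self):
        self.sessions = KeyValueSessionStore()
        self.orders = RelationalOrderStore()

    def save(self, kind, key, value):
        if kind == "session":
            self.sessions.put(key, value)
        elif kind == "order":
            self.orders.record(key, value)
        else:
            raise ValueError(f"no store configured for {kind!r}")
```

A caller would write `RetailPersistence().save("session", "s-1", {"user": "alice"})` without knowing which engine sits behind each kind of data. In a real deployment each store class would wrap a separate database product, chosen per workload.
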
But the model envisions the use of four other flavors of NoSQL databases for tasks such as processing online shopping-cart data, powering the site’s recommendation engine and storing user activity logs.

And SQL-based relational databases still have their place in this new polyglot world. In the online retail framework, relational technology is depicted as a good fit for financial data that requires transactional updates and is best served by a tabular structure. Reporting also could be the province of a relational database with SQL interfaces at the ready for exchanging data with reporting tools.

Relational databases are efficient at processing transactions, and through their support for characteristics such as transactional atomicity and consistency, they offer reliability and data recovery capabilities that NoSQL technologies typically can’t match. But relational software often isn’t suited to text and other unstructured forms of big data. And it requires “a lot of maintenance on the back end,” including the need to carefully construct data schemas and modify them when business requirements change, said Pramod Sadalage, a principal consultant at ThoughtWorks. Those issues are minimized with NoSQL and Hadoop offerings.

“What we’re saying is, ‘Give the things that belong to a certain task to a certain database,’ ” Sadalage said. “If you have, for example, a [product] catalog, put it in a database that is well suited for that; then searches go faster.”

2 DON’T COUNT OUT THE DATA WAREHOUSE

Big data management projects might be born because existing data warehouse systems are beginning to sag under the weight of the data that is flooding into organizations. But that doesn’t mean data warehouses are all of a sudden obsolete, just that the nature of warehousing data is changing to make room for big data.

“Different styles of data warehouse architecture have come and gone over the years,” said Philip Russom, data management research director at The Data Warehousing Institute (TDWI) in Renton, Wash. “As we move to bigger volumes and diversity of data, we have to again evolve the data warehouse, just as we have in the past.”

Hadoop-based big data systems initially were viewed as potential data warehouse killers, but that sentiment has largely given way to expectations of peaceful coexistence. For example, 78% of 263 IT professionals, business users and consultants surveyed by TDWI in November 2012 said they thought Hadoop systems could be a useful complement to their data warehouses for supporting advanced analytics applications.
In addition, 41% saw Hadoop as an effective staging area for information on its way to a data warehouse. Asked if Hadoop clusters could fully replace an EDW, more than half of the respondents said no; just 4% said yes (see FIGURE 1).

FIGURE 1: HADOOP VERSUS THE DATA WAREHOUSE
  • Can the HDFS augment your enterprise data warehouse? Yes: 50%. Maybe: 47%. No: 3%.
  • Can the Hadoop Distributed File System replace your enterprise data warehouse? Yes: 4%. Maybe: 37%. No: 59%.
SOURCE: THE DATA WAREHOUSING INSTITUTE. BASED ON A SURVEY OF 263 IT PROFESSIONALS, BUSINESS USERS AND CONSULTANTS CONDUCTED IN NOVEMBER 2012.

Russom thinks that using Hadoop to stage data for loading into data warehouses is a “beachhead” for big data technologies in companies. But the staging process itself is one aspect of data warehousing that has changed significantly in recent years, he said. In many cases, raw data is likely to pile up in Hadoop systems and initially be analyzed there. “In the old days, the data staging area was pretty temporary,” Russom said. “But it has evolved to become a kind of archive.”

Even so, he doesn’t expect those archives to exist in isolation, disconnected from data warehouses. Some of the data will be moved into EDWs, perhaps in the form of aggregated analytics results, and the two technologies increasingly are being used in tandem, according to Russom. “Hadoop-enabled analytics are sometimes deployed in silos, but the trend is toward integrating Hadoop and EDW data at analysis time for maximal visibility into business performance,” he wrote in a report about the TDWI survey.

3 DATA BY ANY OTHER NAME

Big data projects begun as skunkworks or standalone undertakings do run the risk of creating information silos. To prevent that, organizations should incorporate them into an overall data management strategy from the start, said Mark Beyer, an analyst at Gartner Inc. in Stamford, Conn. That means asking many of the same questions IT teams ask about conventional data as part of data quality and governance programs, he added. For example, where did a particular set of big data come from, how long must it be kept and does it need to be remediated before being used?

Beyer said applying proven data management processes to pools of big data is especially important with information that comes from external sources, including what he described as “crowdsourced” data collected from Facebook, Twitter and other social networks. With such data, “you don’t know if the ‘create case’ matches the use case,” he said. Understanding the origins of data and factors such as how fast it changes is crucial to effective big data management, he advised.

The bottom line, Beyer said, is that “big data assets are no more accurate than any other digital information” (and often less so). As a result, he warned IT managers to get ready for a bumpy ride: “Big data is an invader. Big data breaks things. You don’t control it.” Asserting control over the data once it’s in an organization’s systems could mean the difference between success and failure in making effective use of the information.

4 GROWING PAINS

It’s also important to recognize that technologies such as Hadoop, its associated MapReduce programming model and NoSQL databases aren’t automatic cure-alls for a company’s data management needs. In addition to the data quality and governance challenges, technical complexities lurk around the corners of big data environments.

For many companies, complexity comes in the form of Java-based development. Java is the programming language of choice for Hadoop and other big data technologies.
But even the large army of experienced Java developers faces challenges in working with Hadoop because it doesn’t include native support for SQL. As a result, developers can run into difficulties in creating MapReduce programs to distill Hadoop data into subsets for processing on different compute nodes in a cluster, said Paul Dix, CEO and founder of Errplane, a New York-based consultancy and developer of application monitoring software.

“Most Java developers face issues in how they think about processing data into the MapReduce paradigm,” said Dix, who also is a member of the New York Hadoop User Group. “They have to learn how to write MapReduce code to work with Hadoop; they have to learn to structure the problem correctly.”

Programming directly in MapReduce isn’t the only path developers can take. “There are a lot of ways to do Hadoop without writing MapReduce programs from scratch,” said Paul Mackles, senior manager of software architecture at software vendor Adobe Systems Inc. in San Jose, Calif. For example, Hive, an open source Hadoop offshoot, offers a table-based data model and a SQL-like language that automatically compiles queries into MapReduce jobs for analyzing data in Hadoop systems. Apache Pig is a separate platform with a high-level language for creating highly parallelized MapReduce programs. In addition, software vendors such as Cloudera Inc. are starting to offer their own SQL query engines for Hadoop.

A SNAPSHOT OF THE BIG DATA TECHNOLOGY LANDSCAPE

IT architects building big data systems have a variety of technology components at their disposal.

  • Distributions of the Hadoop file system and related MapReduce programming model are offered by Cloudera, Hortonworks, MapR Technologies and other vendors.
  • Hadoop is not an island: The open source software framework is backed by a long list of supporting tools, including Hive, HBase, Pig, HCatalog and ZooKeeper.
  • NoSQL database technology has grown into a flourishing market seemingly overnight, populated with products such as CouchDB, Cassandra, MongoDB, RavenDB, Redis, Riak, Neo4j and InfiniteGraph.
  • Hybrid mixes of relational and non-relational technologies are emerging. Referred to as “NewSQL” databases, they include the likes of VoltDB, NuoDB, ScaleBase and Drizzle.
  • Analytical databases based on a mix of relational, columnar and massively parallel processing technology include Sybase IQ, Teradata Aster, IBM Netezza, HP Vertica, Greenplum and ParAccel.

Mixing Java skills and SQL add-ons doesn’t assure Hadoop success, though. Converting queries to MapReduce in Hive “works fairly well, but it isn’t always a clean transition,” Dix said.

Hive queries often require tuning to attain the best possible performance, according to Mackles. Data joins are “not its strong suit,” he said during a presentation at TDWI’s 2013 BI Executive Summit in Las Vegas. Working with MapReduce typically incurs performance hits at the start of query jobs and imposes more processing overhead while they’re running, he added.

Finding a good starting point for a would-be Hadoop development team can help build both skills and confidence. One possible starter project recommended by Dix: putting Web server log files into a Hadoop cluster and then applying MapReduce to the data to find out, say, average response times on webpages or the number of page-loading errors generated by a Web application. “That’s the low-hanging fruit,” he said.

Mackles listed a variety of new and upgraded tools that are being developed to help organizations get over the big data hump. Those include a second-generation version of MapReduce called Yarn; a table and storage management utility named HCatalog; and Hadoop 2.0, which is available in an alpha release and is designed to make real-time processing and querying of Hadoop data more feasible, among other improvements. “Hadoop has been around long enough that I think a lot of the shortcomings are pretty well known,” Mackles said, adding that Hadoop 2.0 addresses many of the issues.

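The starter project Dix recommends, working out average response times and error counts from Web server logs, can be sketched in plain Python to show how a problem is structured for the MapReduce paradigm. This is only an illustration and does not run on Hadoop: the log format (`path status response_ms` per line), the function names and the rule that any status of 400 or above counts as a page-loading error are all assumptions.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (page, (response_ms, is_error)) for one log line.

    Assumes a hypothetical format: "<path> <status> <response_ms>".
    """
    path, status, response_ms = line.split()
    is_error = 1 if int(status) >= 400 else 0
    return path, (int(response_ms), is_error)


def shuffle(pairs):
    """Group mapped values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_page(path, values):
    """Reduce step: average response time and error count for one page."""
    total_ms = sum(ms for ms, _ in values)
    errors = sum(err for _, err in values)
    return path, {"avg_ms": total_ms / len(values), "errors": errors}


def run_job(lines):
    """Run map, shuffle and reduce in-process over an iterable of lines."""
    grouped = shuffle(map_log_line(line) for line in lines if line.strip())
    return dict(reduce_page(path, values) for path, values in grouped.items())


if __name__ == "__main__":
    sample = ["/home 200 120", "/home 500 300", "/cart 200 80"]
    for page, stats in sorted(run_job(sample).items()):
        print(page, stats)
```

The point of the exercise is the shape of the problem: the map step looks at one record at a time, and the reduce step sees all values for one key. On a real cluster the framework would handle the grouping and distribute both steps across nodes.
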
Yarn, HCatalog, Hadoop 2.0 and other emerging technologies might well help the big data management and analytics cause, but they further add to the vast and growing forest of tools that IT, data warehousing and data management professionals need to navigate in planning and managing deployments.

It’s a challenge that likely will be faced in more and more companies, though. In the TDWI survey, only 10% of the respondents said their organizations had Hadoop systems in production use, but another 51% said they expected to be Hadoop users within three years.

The corporate spotlight will be on the IT teams responsible for building scalable big data systems and integrating them into existing data warehousing and analytics environments. Finding the right technologies, and managing the process in a way that gets the most out of them, will help keep the glare of that light from getting too hot.

ABOUT THE AUTHOR

JACK VAUGHAN is news and site editor of SearchDataManagement.com. He covers big data management, data warehousing, databases and data integration. Vaughan was an editor for TechTarget’s SearchSOA.com, SearchVB.com, TheServerSide.net and SearchDomino.com websites. Email him at jvaughan@techtarget.com.

Putting Your Big Data Strategy on the Right Track is a SearchBusinessAnalytics.com e-publication.

Scot Petersen, Editorial Director
Jason Sparapani, Managing Editor, E-Publications
Joe Hebert, Associate Managing Editor, E-Publications
Craig Stedman, Executive Editor
Melanie Luna, Managing Editor
Mark Brunelli, News Director
Linda Koury, Director of Online Design
Neva Maniscalco, Graphic Designer
Doug Olender, Publisher, dolender@techtarget.com
Ed Laplante, Director of Sales, elaplante@techtarget.com

TechTarget Inc., 275 Grove Street, Newton, MA 02466
www.techtarget.com

© 2013 TechTarget Inc. No part of this publication may be transmitted or reproduced in any form or by any means without written permission from the publisher. TechTarget reprints are available through The YGS Group.

About TechTarget: TechTarget publishes media for information technology professionals. More than 100 focused websites enable quick access to a deep store of news, advice and analysis about the technologies, products and processes crucial to your job. Our live and virtual events give you direct access to independent expert commentary and advice. At IT Knowledge Exchange, our social community, you can get advice and share solutions with peers and experts.