Reducing the Total Cost of
Ownership of Big Data
WHITE PAPER
Abstract
In this white paper, Impetus shares best practices and
strategies that will enable businesses to lower the total
cost of ownership of Big Data solutions. This white paper
discusses challenges related to the cost of Big Data
solutions, and looks at the technological options available
to address Big Data concerns.
Impetus Technologies Inc.
www.impetus.com
Contents
Introduction
Using Commodity Hardware for Big Data
Using Open Source and Cloud Computing
The Cost Components of a Big Data Warehouse
Lowering the Total Cost of Ownership
Reducing the Cost of Storage
What Technologies, Where?
Big Data Scenarios in OLAP
Analytics with Hadoop
Choosing the Right Technologies
Opting for Faster MapReduce/Hadoop
NoSQL Database Solutions
New Era Relational Databases
Impetus Solutions and Recommendations
Conclusion
Introduction
As the power of Big Data solutions continues to grow, so too does the cost of
collecting, managing, and storing data. According to IDC/EMC estimates, the
total value of the computers, networks, and storage facilities driving the digital
universe now stands at a whopping USD 6 trillion! Furthermore, that figure is
expected to grow significantly over the next few years. In fact, some estimate
that the size of the digital universe doubles every 18 months.
Yet, how much of that information is actually useful? An overload of information
can actually increase the cost of storage, reduce productivity, and essentially
ensure much of the collected data will go to waste. Despite access to this rich
pool of data, many businesses continue to extract information of little value. It
is estimated that businesses spend an extra USD 650 billion to gather and store
data that they never put to use.
Clearly, much more can be done to unearth business intelligence and actionable
insights from Big Data. The question is, what is the best way to do that both
intelligently and cost-effectively? In this white paper, Impetus examines some of
the pros and cons of several Big Data solutions on the market, and offers
practical advice based on years of experience.
Using Commodity Hardware for Big Data
Commodity hardware has many advantages. It is readily available and
accessible, and, above all, businesses can build systems from it themselves,
which opens up many avenues for innovation.
The cost of building reliable storage from commodity hardware is about USD 1
per gigabyte, which is a great deal and a very good start. Keep in mind,
however, that this figure covers only the cost of storage and does not include
the other costs associated with managing, monitoring, and hosting data.
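As a rough illustration of why the per-gigabyte figure is only a starting point, the sketch below adds yearly overheads to the storage cost. Only the USD 1 per gigabyte rate comes from the text; the overhead rates and the function name are assumed values chosen for illustration, not Impetus figures.

```python
# Amounts are kept in cents so the arithmetic stays exact.
STORAGE_CENTS_PER_GB = 100  # USD 1 per gigabyte, the figure cited in the text

# Assumed yearly overheads per gigabyte (hypothetical, for illustration only).
ASSUMED_YEARLY_OVERHEAD_CENTS_PER_GB = {
    "managing": 40,
    "monitoring": 15,
    "hosting": 25,
}


def total_cost_usd(gigabytes: int, years: int = 1) -> float:
    """Return the one-time storage cost plus the assumed recurring overheads."""
    storage = gigabytes * STORAGE_CENTS_PER_GB
    yearly_overhead = sum(ASSUMED_YEARLY_OVERHEAD_CENTS_PER_GB.values())
    overhead = gigabytes * years * yearly_overhead
    return (storage + overhead) / 100


if __name__ == "__main__":
    print(total_cost_usd(10_000))      # storage plus one year of overhead
    print(total_cost_usd(10_000, 3))   # recurring costs outweigh storage
```

Under these assumed rates, the recurring costs overtake the hardware cost within two years, which is why the rest of this paper looks beyond storage alone.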
Using Open Source and Cloud Computing
Using free, open source software to store, manage, and analyze Big Data comes
with a number of benefits. By now, everyone has heard of Hadoop and its ability
to tackle large volumes of data, while still providing significant savings.
Using cloud computing for Big Data also has its advantages. Cloud computing
allows users to rent resources to take care of data and analytics from
providers such as Amazon, with Amazon Web Services, and Microsoft, with its
Windows Azure platform. Users can select the offering from these providers'
portfolios that best fits their needs and requirements.
The downside to using cloud computing, however, is the cost of storage.
Storage is available in the cloud, but it can be very costly.
The Cost Components of a Big Data Warehouse
Many businesses today are turning to Big Data Warehouses as a means of
storage. Before making this decision, it is important to understand the costs
these storage facilities can generate.
Entry Cost
The first expense is entry cost—the cost incurred to identify the right Big Data
solution.
Cost of Migrating Data
Once a Big Data solution has been chosen, the next expense will be the cost of
moving data to the new system. Data migration can be especially expensive for
businesses requiring ETL processes. ETL processes may require the purchase of
specialized tools that can also be quite expensive.
Other Costs
A number of other factors can potentially inflate the cost of Big Data solutions.
For example, every solution requires tooling that makes the system easy to
scale and easy to manage under failure conditions. Thus,
performance analytics and data management may represent additional major
expenses to a Big Data plan.
Ongoing maintenance is also essential, and accounts for another cost. As the
volume of data increases and changes are made, Big Data warehouses will
always require monitoring and tuning.
Taken together, these factors—performance analytics, data management, data
maintenance—can dramatically increase the cost of a Big Data solution.
Lowering the Total Cost of Ownership
Based on years of experience in the field, Impetus has identified a number of
best practices to help businesses reduce the total cost of ownership of Big Data
solutions. This section discusses potential cost savings in hardware and
software, with these two main suggestions in mind:
For hardware, Impetus suggests looking at the cost savings available in
storage and computation.
For software, Impetus suggests a number of solutions that will enable
the processing of more data, more quickly, and for less money.
Reducing the Cost of Storage
Impetus advises businesses to compress data in order to cut storage costs.
Compressed data requires less storage space, and less storage space means less
spending.
Some of the solutions available on the market claim they can compress data to
1/40th of its previous size. When looking at these solutions, however, be careful
to ensure that the read throughput of the data is not compromised when it is
decompressed.
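One way to check both numbers before committing to a product is to measure them on a sample of real data. The sketch below does this with Python's standard zlib module, used here only as a stand-in for whatever compression solution is being evaluated; the function name and the sample records are hypothetical.

```python
import time
import zlib


def compression_report(raw: bytes, level: int = 6) -> dict:
    """Compress raw bytes, then time how quickly they can be read back."""
    packed = zlib.compress(raw, level)

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    elapsed = time.perf_counter() - start

    if restored != raw:
        raise ValueError("round trip changed the data")

    return {
        # How many times smaller the stored form is than the original.
        "ratio": len(raw) / len(packed),
        # Decompression speed in megabytes of original data per second.
        "read_mb_per_s": len(raw) / 1e6 / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Highly repetitive sample rows; real data will compress less well.
    sample = b"2013-05-01,store-17,sku-0042,qty=3\n" * 200_000
    print(compression_report(sample))
```

Running the same report at several compression levels shows the trade-off directly: a higher ratio saves storage, but it only pays off if the read throughput still meets the needs of the queries.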
Additionally, with Big Data analytics, businesses may opt to focus on a specific
subset of data, rather than on all of the data that has accumulated over time.
Another option would be to look into systems designed to store data and
information based on principles very similar to information lifecycle
management (ILM).
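A minimal sketch of such an age-based policy follows. The tier names and the 90-day and 365-day boundaries are assumptions made for illustration; an actual ILM policy would be driven by business and regulatory requirements.

```python
from datetime import date

# Assumed tier boundaries in days since last access (hypothetical values).
ASSUMED_TIERS = [(90, "hot"), (365, "warm")]


def storage_tier(last_accessed: date, today: date) -> str:
    """Pick a storage tier from the age of the data's last access."""
    age_days = (today - last_accessed).days
    for max_age_days, tier in ASSUMED_TIERS:
        if age_days <= max_age_days:
            return tier
    # Anything older than the last boundary goes to the cheapest storage.
    return "archive"


if __name__ == "__main__":
    today = date(2013, 5, 31)
    for accessed in (date(2013, 5, 1), date(2012, 12, 1), date(2011, 1, 1)):
        print(accessed, storage_tier(accessed, today))
```

Data in the assumed "archive" tier can then be moved to slower, cheaper storage or compressed more aggressively, since it is rarely read.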
With all this talk about Big Data, it is easy to forget about small data. Often, it is
easier to gain business insight using smaller sets of data. Thus, Impetus does not
recommend using Big Data solutions for the storage and retrieval of small
amounts of data, as the relative latency of queries will be higher.
What Technologies, Where?
One key to reducing the total cost of ownership is to understand the available
technologies and how they can be used.
With the advent of Big Data, many commercial and specialized hardware
appliances have come to market. These solutions offer rich features such as
fault tolerance, easy capacity scaling, and specialized management tools.
The commodity hardware available today can also be harnessed for Big Data
use cases by leveraging open source stacks or solutions.
Latency is also a critical factor, but the systems with the lowest latency are also
likely to be the most expensive. There is, of course, a niche market that focuses
on latency as a business problem.
For cloud-based Big Data solutions, the first question is whether moving to the
cloud is the only solution given data storage requirements. Moving to a cloud-
based solution can be quite expensive, especially if the data is not already on
the cloud. Businesses will also need to upload all of the data needed for
processing, which adds significantly to the cost.
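The upload step is easy to underestimate. As a back-of-the-envelope sketch, the function below estimates how long a bulk upload takes at a sustained link speed. The function name and the example figures are hypothetical, and the calculation ignores protocol overhead and any per-gigabyte transfer charges.

```python
def upload_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move the data at a sustained link speed (decimal units)."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    # Example: 10 TB over a fully utilized 100 Mbit/s link.
    print(round(upload_days(10, 100), 2))
```

For 10 TB over a dedicated 100 Mbit/s link, this works out to roughly nine days of continuous transfer, before any processing can begin.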
With this thorough understanding of the technologies available to tackle Big
Data, Impetus will now discuss how these technologies can be used. These
technologies can be broadly divided into two categories—online analytical
processing (OLAP) and online transaction processing (OLTP).
Big Data Scenarios in OLTP
When generating or working with large sets of data in an OLTP scenario, cost-
effective NoSQL solutions are ideal. When working with a typical data
warehouse that requires analytical processing, however, Impetus recommends
using MapReduce or MPP-based systems.
Big Data Scenarios in OLAP
Big Data online analytical processing (OLAP) can be divided into three different
scenarios:
Big Input Small Output. This is the most common scenario, and is often
used to draw conclusions and to prepare graphs or charts, or in cases
where the top n-elements in a data set need to be identified.
Small Input Big Output. This scenario occurs when the input data set is
small and the resulting output is big, and typically occurs in cases of
predictive analysis, where n-number of outcomes are possible. It is also
applicable in scenarios where correlation-coefficient matrices must be
populated with a given set of inputs. These inputs may be small, but the
results might turn out to be very large.
Big Input and Big Output. The third scenario occurs in ETL processes.
Here, the magnitude of output data is similar to that of input data.
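The first scenario can be illustrated with a top-n query. The sketch below is a single-machine stand-in written with the Python standard library, not a Hadoop job; the function name and the sample page names are hypothetical.

```python
import heapq
from collections import Counter


def top_n(records, n):
    """Reduce a large stream of records to its n most frequent values."""
    counts = Counter(records)  # accepts any iterable, including a generator
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


if __name__ == "__main__":
    page_views = ["home", "cart", "home", "search", "home", "cart"]
    print(top_n(page_views, 2))
```

However many records are read, the result never holds more than n entries, which is what makes this a Big Input Small Output workload.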
In the real world, whenever businesses summarize or concentrate data with
respect to parameters such as data volume, latency, or cost, there is a decrease
in the volume of data. In such a scenario, small data solutions such as MPP-data
stores, traditional relational databases, and newer NoSQL databases offering
the lowest latency are recommended. Note, however, that when moving from a
small data solution to a Big Data solution, the latency of these systems will
increase while the corresponding cost per gigabyte will decrease.
It is well known that Hadoop systems are cost effective. That said, in the case
of small data solutions, where latency is the key factor, opting for customized
and tailored solutions that enable quicker data retrieval will provide the best
results. The primary drawback of these solutions is that the cost of deployment
will increase the storage cost per GB.
Massively parallel processing (MPP), on the other hand, offers a number of
significant benefits. MPP-data store solutions provide relational stores while
simultaneously accommodating larger sizes of data.
Often it is best to deploy a combination of these systems to address
business needs.
Analytics with Hadoop
Indirect Analytics Over Hadoop
In this approach, Hadoop is used to clean and transform the data into a
structured form, and then to load the structured data into an RDBMS.
This approach provides the end user with the flexibility of parallel processing of
Hadoop and an SQL interface at the summarized data level. This solution is
relatively inexpensive when compared with other options.
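The shape of this pipeline can be sketched on a single machine. In the example below, a plain Python function stands in for the Hadoop cleaning job and an in-memory SQLite database stands in for the RDBMS; the table layout and sample rows are hypothetical.

```python
import sqlite3

RAW_LINES = [
    "2013-05-01, north ,120",
    "2013-05-01,south,80",
    "bad line",
    "2013-05-02,north,200",
]


def clean(lines):
    """Stand-in for the Hadoop step: drop malformed rows, normalize fields."""
    for line in lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) == 3 and parts[2].isdigit():
            yield parts[0], parts[1], int(parts[2])


def load_and_query(lines):
    """Load the cleaned rows into a relational store and summarize with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean(lines))
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(load_and_query(RAW_LINES))
```

End users only ever see the SQL interface over the summarized table, while the heavy cleaning work stays on the parallel processing side.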
Direct Analytics Over Hadoop
Applying analytics directly over a Hadoop system, without moving the data to an
RDBMS, can be an effective way to analyze the data in the
Hadoop Distributed File System (HDFS).
This approach enables both batch and asynchronous analytics of data in the
Hadoop system. This is a very cost-effective approach because it does not
require the management of data sources other than existing Hadoop systems. It
also allows flexibility to scale to any level with summarized data.
Analytics Over Hadoop with MPP Data Warehouse
Today, a number of options available on the market allow for the integration of
MPP-based data warehouses and Hadoop. These options are worth considering
for large volumes of data.
The primary disadvantage to these approaches, however, is the potential cost
involved. Most MPP-based data warehouses are expensive. Some also require
high-end servers for deployment, which only add to the expense.
Choosing the Right Technologies
To choose the right technology stack, businesses first need to weigh these three
factors against their business use cases:
Cost: The first factor is the cost per terabyte for storage. The next
consideration is the cost related to business continuity and vendor lock-in.
Also, understand how the current system is likely to change with
strategic decisions, and if these changes would require a different
vendor.
Latency: The next factor to consider is the latency requirement. Does any
use case take the throughput of the system into account? For a system
handling smaller data, where response times are critical, MPP-based or
relational database systems would be a better choice.
Dollar-per-terabyte: For businesses driven by the dollar-per-terabyte
factor, Impetus advises an MPP-based solution. This option provides a
middle ground between the Hadoop and NoSQL-based solutions, and
can allow storage of large amounts of data without compromising
speed.
For businesses with varying requirements, whose data and related strategies also
change frequently, Impetus does not recommend working with a vendor lock-in
model.
Opting for Faster MapReduce/Hadoop
For business requirements driven by cost or business continuity, opt for
Hadoop. Hadoop will enable storage of all of the data, but has a relatively high
degree of latency. A few vendors offer faster Hadoop implementations or other
parallel processing frameworks. These solutions usually extend standard
Hadoop APIs and offer enhanced system performance, as well as better support
for the production environment.
NoSQL Database Solutions
OLTP scenarios mean that faster reads and writes are required. The vendors in
this market offer a variety of different solutions with different underlying
implementations, each suited to a different business use case:
HBase and Cassandra are recommended for banking and financial
businesses. For random, real-time read/write access to Big ‘table-like’
Data, use HBase. For faster writes, look to Cassandra.
MongoDB and CouchDB are recommended when the primary
requirement is the querying of transactional data and defining indexes.
There are also other databases—graph databases like Neo4j for
instance—that make Big-Data-heavy social media analytics problems
simpler.
New Era Relational Databases
The latest relational databases (RDBMSs) have been specifically designed with
these OLTP scenarios in mind, and have taken major steps toward addressing
latency issues. Many businesses have been using SQL successfully for the last
several years, and most business users still consider SQL to be the best tool to
query structured data.
Other solutions include emerging sets of technologies and new versions of
existing RDBMS engines that are all very adept at handling large volumes of
structured data.
Therefore, for handling large volumes of structured data, look to new era RDBMS
solutions like MySQL Cluster, GridSQL, or later versions of Microsoft SQL Server.
Impetus Solutions and Recommendations
One way to reduce the cost of data migration is to use MapReduce for ETL,
rather than costly ETL tools.
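To show how an ETL job maps onto the MapReduce model, the sketch below runs the map, shuffle, and reduce phases in plain Python on one machine. It is an illustration of the pattern under assumed inputs, not Hadoop API code; the record layout ("customer,amount") and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Extract and transform: turn one raw line into (key, value) pairs."""
    fields = line.strip().split(",")
    if len(fields) != 2:
        return []  # skip malformed input instead of failing the job
    customer, amount = fields
    try:
        return [(customer.strip().lower(), float(amount))]
    except ValueError:
        return []  # amount was not numeric


def reduce_phase(key, values):
    """Aggregate every value that was emitted for one key."""
    return key, sum(values)


def run_etl(lines):
    """Map each line, group the pairs by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    raw = ["Acme,10", "acme , 5", "broken", "Zed,x", "zed,2.5"]
    print(run_etl(raw))
```

Because the map and reduce functions work on one record or one key at a time, the same logic can be distributed across a cluster, which is what lets MapReduce replace a dedicated ETL tool for this kind of cleansing and aggregation.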
Management and provisioning tools are available with commercial Big Data
solutions for easy management of systems. Impetus offers Ankush, a vendor-
neutral tool for cluster management, which can be used to automatically
provision multiple Hadoop clusters.
For ongoing maintenance, the Impetus mantra for success is “automate, automate,
automate!” Any task that needs to be carried out more than once should be
automated. This also holds true for monitoring and tuning.
When dealing with changing capacity, continue to add hardware or look for
alternative methods to speed things up. Using graphics processing units for
general purpose computing can also help.
Impetus also recommends RainStor or similar solutions that help to compress
data and reduce the cost of the hardware required for data storage.
Finally, look to faster, tailored MapReduce solutions that will allow completion
of more tasks in less time.
Conclusion
In summary, best practices and robust strategies can help lower the total cost of
ownership of your Big Data solutions, and transform Big Data challenges into Big
Data opportunities.
At Impetus, we have used these methods paired with the Hadoop Ecosystem to
successfully tackle Big Data problems.
About Impetus
Impetus Technologies is a leading provider of Big Data solutions for the
Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data
and create new business insights across their enterprises.
Website: www.bigdata.impetus.com | Email: bigdata@impetus.com
© 2013 Impetus Technologies, Inc.
All rights reserved. Product and
company names mentioned herein
may be trademarks of their
respective companies.
May 2013

 
Building Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarBuilding Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarImpetus Technologies
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Impetus Technologies
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in ElasticsearchImpetus Technologies
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarImpetus Technologies
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarImpetus Technologies
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Impetus Technologies
 
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Impetus Technologies
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Impetus Technologies
 
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...Impetus Technologies
 
Enterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastEnterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastImpetus Technologies
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Impetus Technologies
 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Impetus Technologies
 
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Impetus Technologies
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabImpetus Technologies
 
Webinar maturity of mobile test automation- approaches and future trends
Webinar  maturity of mobile test automation- approaches and future trendsWebinar  maturity of mobile test automation- approaches and future trends
Webinar maturity of mobile test automation- approaches and future trendsImpetus Technologies
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...Impetus Technologies
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastImpetus Technologies
 

Mehr von Impetus Technologies (20)

Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
Data Warehouse Modernization Webinar Series- Critical Trends, Implementation ...
 
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix WebinarFuture-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
Future-Proof Your Streaming Analytics Architecture- StreamAnalytix Webinar
 
Building Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus WebinarBuilding Real-time Streaming Apps in Minutes- Impetus Webinar
Building Real-time Streaming Apps in Minutes- Impetus Webinar
 
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
Smart Enterprise Big Data Bus for the Modern Responsive Enterprise- StreamAna...
 
Impetus White Paper- Handling Data Corruption in Elasticsearch
Impetus White Paper- Handling  Data Corruption  in ElasticsearchImpetus White Paper- Handling  Data Corruption  in Elasticsearch
Impetus White Paper- Handling Data Corruption in Elasticsearch
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
 
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix WebinarReal-world Applications of Streaming Analytics- StreamAnalytix Webinar
Real-world Applications of Streaming Analytics- StreamAnalytix Webinar
 
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
Real-time Streaming Analytics for Enterprises based on Apache Storm - Impetus...
 
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
Accelerating Hadoop Solution Lifecycle and Improving ROI- Impetus On-demand W...
 
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data...
 
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...SPARK USE CASE-  Distributed Reinforcement Learning for Electricity Market Bi...
SPARK USE CASE- Distributed Reinforcement Learning for Electricity Market Bi...
 
Enterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus WebcastEnterprise Ready Android and Manageability- Impetus Webcast
Enterprise Ready Android and Manageability- Impetus Webcast
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
Leveraging NoSQL Database Technology to Implement Real-time Data Architecture...
 
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
Maturity of Mobile Test Automation: Approaches and Future Trends- Impetus Web...
 
Big Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLabBig Data Analytics with Storm, Spark and GraphLab
Big Data Analytics with Storm, Spark and GraphLab
 
Webinar maturity of mobile test automation- approaches and future trends
Webinar  maturity of mobile test automation- approaches and future trendsWebinar  maturity of mobile test automation- approaches and future trends
Webinar maturity of mobile test automation- approaches and future trends
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
The Shared Elephant - Hadoop as a Shared Service for Multiple Departments – I...
 
Performance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus WebcastPerformance Testing of Big Data Applications - Impetus Webcast
Performance Testing of Big Data Applications - Impetus Webcast
 

Kürzlich hochgeladen

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Kürzlich hochgeladen (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Reducing the Total Cost of Ownership of Big Data- Impetus White Paper

  • 1. Reducing the Total Cost of Ownership of Big Data
    WHITE PAPER
    Abstract
    In this white paper, Impetus shares best practices and strategies that will enable businesses to lower the total cost of ownership of Big Data solutions. This white paper discusses challenges related to the cost of Big Data solutions, and looks at the technological options available to address Big Data concerns.
    Impetus Technologies Inc.
    www.impetus.com
  • 2. Contents
    Introduction (3)
    Using Commodity Hardware for Big Data (3)
    Using Open Source and Cloud Computing (4)
    The Cost Components of a Big Data Warehouse (4)
    Lowering the Total Cost of Ownership (5)
    Reducing the Cost of Storage (5)
    What Technologies, Where? (6)
    Big Data Scenarios in OLAP (7)
    Analytics with Hadoop (8)
    Choosing the Right Technologies (8)
    Opting for Faster MapReduce/Hadoop (9)
    NoSQL Database Solutions (9)
    New Era Relational Databases (10)
    Impetus Solutions and Recommendations (10)
    Conclusion (11)
  • 3. Introduction
    As the power of Big Data solutions continues to grow, so too does the cost of collecting, managing, and storing data. According to IDC/EMC estimates, the total value of the computers, networks, and storage facilities driving the digital universe now stands at USD 6 trillion, and that figure is expected to grow significantly over the next few years. In fact, some estimate that the size of the digital universe doubles every 18 months.
    Yet how much of that information is actually useful? An overload of information can increase the cost of storage, reduce productivity, and all but ensure that much of the collected data goes to waste. Despite access to this rich pool of data, many businesses continue to extract information of little value. It is estimated that businesses spend an extra USD 650 billion to gather and store data that they never put to use.
    Clearly, much more can be done to unearth business intelligence and actionable insights from Big Data. The question is, what is the best way to do that both intelligently and cost-effectively? In this white paper, Impetus examines some of the pros and cons of several Big Data solutions on the market, and offers practical advice based on years of experience.
    Using Commodity Hardware for Big Data
    There are many advantages to using commodity hardware. In addition to being readily available and accessible, its biggest advantage is that businesses can build it themselves, opening up many avenues for innovation. The cost of building reliable storage from commodity hardware is about USD 1 per gigabyte, a great deal and a very good start. Keep in mind, however, that this figure covers only the cost of storage and does not include other costs associated with managing, monitoring, and hosting data.
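As a rough illustration of how quickly raw storage cost alone can grow, the sketch below combines the two figures quoted above: the 18-month doubling estimate and the figure of about USD 1 per gigabyte for commodity storage. Applying the digital-universe doubling rate to a single business, the 10 TB starting volume, and a constant price per gigabyte are all assumptions made for illustration, and the function names are hypothetical.

```python
def projected_volume_gb(initial_gb: float, months: float,
                        doubling_months: float = 18.0) -> float:
    """Data volume after `months`, assuming it doubles every `doubling_months`."""
    return initial_gb * 2 ** (months / doubling_months)


def raw_storage_cost_usd(volume_gb: float, usd_per_gb: float = 1.0) -> float:
    """Storage-only cost; excludes managing, monitoring, and hosting."""
    return volume_gb * usd_per_gb


if __name__ == "__main__":
    start_gb = 10_000.0  # hypothetical starting volume (10 TB)
    for months in (0, 18, 36, 54):
        volume = projected_volume_gb(start_gb, months)
        cost = raw_storage_cost_usd(volume)
        print(f"month {months:>2}: {volume:>9,.0f} GB -> USD {cost:>9,.0f}")
```

Under these assumptions the storage bill doubles with the data, which is why the remaining cost components matter as volumes grow.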
  • 4. Using Open Source and Cloud Computing
    Using free, open source software to store, manage, and analyze Big Data comes with a number of benefits. By now, everyone has heard of Hadoop and its ability to tackle large volumes of data while still providing significant savings.
    Using cloud computing for Big Data also has its advantages. It allows users to rent resources over the cloud to take care of data and analytics, for example from Amazon Web Services or from Microsoft's Windows Azure platform, and to select the offering from a provider's portfolio that best fits their needs and requirements. The downside of cloud computing, however, is storage: while storage is available over the cloud, it can be very costly.
    The Cost Components of a Big Data Warehouse
    Many businesses today are turning to Big Data warehouses as a means of storage. Before making this decision, it is important to understand the costs these storage facilities can generate.
    Entry Cost
    The first expense is the entry cost: the cost incurred to identify the right Big Data solution.
    Cost of Migrating Data
    Once a Big Data solution has been chosen, the next expense is the cost of moving data to the new system. Data migration can be especially expensive for businesses requiring ETL processes, which may call for the purchase of specialized tools that can themselves be quite expensive.
    Other Costs
    A number of other factors can potentially inflate the cost of Big Data solutions. For example, every solution requires tooling that lets the system be managed easily as it scales and when failures occur. Thus,
  • 5. performance analytics and data management may represent additional major expenses in a Big Data plan.
    Ongoing maintenance is also essential, and accounts for another cost. As the volume of data increases and changes are made, Big Data warehouses will always require monitoring and tuning. Taken together, these factors (performance analytics, data management, and data maintenance) can dramatically increase the cost of a Big Data solution.
    Lowering the Total Cost of Ownership
    Based on years of experience in the field, Impetus has identified a number of best practices to help businesses reduce the total cost of ownership of Big Data solutions. This section discusses potential cost savings in hardware and software, with two main suggestions in mind:
    For hardware, Impetus suggests looking at the cost savings available in storage and computation.
    For software, Impetus suggests a number of solutions that will enable the processing of more data, more quickly, and for less money.
    Reducing the Cost of Storage
    Impetus advises businesses to compress data in order to cut storage costs. Compressed data requires less storage space, and less storage space means less spending. Some of the solutions available on the market claim they can compress data to 1/40th of its original size. When evaluating these solutions, however, be careful to ensure that read throughput is not compromised when the data is decompressed.
    Additionally, with Big Data analytics, businesses may opt to focus on a specific subset of data rather than looking at all the data that has accumulated over time. Another option is to look into systems designed to store data and information based on principles very similar to information lifecycle management (ILM).
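One way to check the trade-off described above is to measure both the compression ratio and the decompression throughput on a sample of real data before committing to a solution. The sketch below uses Python's standard `zlib` module purely as a stand-in; it is not one of the commercial products the paper alludes to, and the `compression_report` helper and the sample payload are hypothetical.

```python
import time
import zlib


def compression_report(raw: bytes, level: int = 6) -> dict:
    """Return the compression ratio and decompression throughput for `raw`."""
    compressed = zlib.compress(raw, level)

    # Time only the read path, since that is what queries pay for.
    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    elapsed = time.perf_counter() - start

    if restored != raw:
        raise ValueError("round trip did not reproduce the original data")

    megabytes = len(raw) / 1e6
    return {
        "ratio": len(raw) / len(compressed),
        "decompress_mb_per_s": megabytes / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample; real data will compress differently.
    sample = b"2013-01-01,store-42,sku-1001,3,19.99\n" * 200_000
    report = compression_report(sample)
    print(f"ratio: {report['ratio']:.1f}x, "
          f"read throughput: {report['decompress_mb_per_s']:.0f} MB/s")
```

Running the same report at several compression levels, or with different codecs, shows whether a higher ratio is being bought at the expense of read speed.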
  • 6. With all this talk about Big Data, it is easy to forget about small data. Often, it is easier to gain business insight using smaller sets of data. Impetus therefore does not recommend using Big Data solutions for the storage and retrieval of small amounts of data, as the relative latency of queries will be higher.
    What Technologies, Where?
    One key to reducing the total cost of ownership is to understand the available technologies and how they can be used.
    With the advent of Big Data, many different commercial and specialized hardware products and appliances have come to the market. These solutions offer rich features such as fault tolerance, easy capacity scaling, and specialized management tools.
    The commodity hardware available today can be harnessed for Big Data use cases by leveraging open source stacks or solutions.
    Latency is also a critical factor, but the systems with the lowest latency are also likely to be the most expensive. There is, of course, a niche market that focuses on latency as a business problem.
    For cloud-based Big Data solutions, the first question is whether moving to the cloud is the only solution given data storage requirements. Moving to a cloud-based solution can be quite expensive, especially if the data is not already on the cloud. Businesses will also need to upload all of the data needed for processing, which adds significantly to the cost.
    With this understanding of the technologies available to tackle Big Data, Impetus will now discuss how they can be used. These technologies can be broadly divided into two categories: online analytical processing (OLAP) and online transaction processing (OLTP).
    Big Data Scenarios in OLTP
    When generating or working with large sets of data in an OLTP scenario, cost-effective NoSQL solutions are ideal. When working with a typical data warehouse that requires analytical processing, however, Impetus recommends using MapReduce or MPP-based systems.
  • 7. Big Data Scenarios in OLAP
Big Data online analytical processing (OLAP) can be divided into three different scenarios:
Big Input Small Output. This is the most common scenario, and is often used to draw conclusions and to prepare graphs or charts, or in cases where the top n elements in a data set need to be identified.
Small Input Big Output. This scenario occurs when the input data set is small and the resulting output is big. It typically occurs in cases of predictive analysis, where n outcomes are possible. It is also applicable in scenarios where correlation-coefficient matrices must be populated with a given set of inputs. These inputs may be small, but the results might turn out to be very large.
Big Input and Big Output. The third scenario occurs in ETL processes. Here, the magnitude of output data is similar to that of input data.
In the real world, whenever businesses summarize or condense data to meet constraints such as data volume, latency, or cost, the volume of data decreases. In such a scenario, small data solutions such as MPP data stores, traditional relational databases, and newer NoSQL databases offering the lowest latency are recommended. Note, however, that when moving from a small data solution to a Big Data solution, the latency of these systems will increase while the corresponding cost per gigabyte will decrease.
It is well known that Hadoop systems are cost effective. That said, in the case of small data solutions, where latency is the key factor, opting for customized and tailored solutions that enable quicker data retrieval will provide the best results. The primary drawback of these solutions is that the cost of deployment will increase the storage cost per GB. Massively parallel processing (MPP), on the other hand, offers a number of significant benefits.
MPP data store solutions provide relational stores while simultaneously accommodating larger volumes of data. Often, it is best to deploy a combination of these systems to address business needs.
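As an illustration of the "Big Input Small Output" scenario, the sketch below finds the top n elements of a data set in a map/reduce style. It is plain Python rather than Hadoop code, and the function names and the sample partitions are hypothetical; it is meant only to show how a large input, split into partitions, collapses into a small result.

```python
from collections import Counter
from functools import reduce
import heapq


def map_partition(records):
    """Map step: count how often each key occurs in one partition."""
    return Counter(records)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def top_n(partitions, n):
    """Return the n most frequent keys across all partitions.

    Ties are broken by key so the result is deterministic.
    """
    partials = (map_partition(p) for p in partitions)
    totals = reduce(reduce_counts, partials, Counter())
    return heapq.nlargest(n, totals.items(), key=lambda kv: (kv[1], kv[0]))


if __name__ == "__main__":
    # Two hypothetical partitions of a much larger input.
    partitions = [["a", "b", "a"], ["b", "c", "a"]]
    print(top_n(partitions, 2))
```

However large the input grows, the output stays at n items, which is why this scenario suits summary reports, charts, and rankings.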
  • 8. Analytics with Hadoop
Indirect Analytics Over Hadoop
In this approach, Hadoop is used to clean and transform the data into a structured form, and then to load the structured data into an RDBMS. This approach provides the end user with the flexibility of Hadoop's parallel processing along with an SQL interface at the summarized data level. This solution is relatively inexpensive when compared with other options.
Direct Analytics Over Hadoop
Applying analytics directly over a Hadoop system, without moving the data to an RDBMS, can be an effective way to analyze data in the Hadoop Distributed File System (HDFS). This approach enables both batch and asynchronous analytics of data in the Hadoop system. It is very cost effective because it does not require the management of data sources other than existing Hadoop systems. It also allows flexibility to scale to any level with summarized data.
Analytics Over Hadoop with MPP Data Warehouse
Today, a number of options available on the market allow for the integration of MPP-based data warehouses and Hadoop. These options are worth considering for large volumes of data. The primary disadvantage of these approaches, however, is the potential cost involved. Most MPP-based data warehouses are expensive. Some also require high-end servers for deployment, which only adds to the expense.
Choosing the Right Technologies
To choose the right technology stack for implementing business use cases, businesses first need to look at these three factors:
Cost: The first factor is the cost per terabyte for storage. The next consideration is the cost related to business continuity and vendor lock-in.
  • 9. Also, understand how the current system is likely to change with strategic decisions, and whether these changes would require a different vendor.
Latency: The next factor to consider is latency requirements. Do any use cases take the throughput of the system into account? For systems handling smaller data sets, where response times are critical, MPP-based or relational database systems would be a better choice.
Dollar-per-terabyte: For businesses driven by the dollar-per-terabyte factor, Impetus advises an MPP-based solution. This option provides a middle ground between the Hadoop and NoSQL-based solutions, and can allow storage of large amounts of data without compromising speed.
For businesses with varying requirements, whose data and related strategies also change frequently, Impetus does not recommend working with a vendor lock-in model.
Opting for Faster MapReduce/Hadoop
For business requirements driven by cost or business continuity, opt for Hadoop. Hadoop will enable storage of all of the data, but has relatively high latency. A few vendors offer faster Hadoop implementations or other parallel processing frameworks. These solutions usually extend standard Hadoop APIs and offer enhanced system performance, as well as better support for the production environment.
NoSQL Database Solutions
OLTP scenarios mean that faster reads and writes are required. The vendors in this market offer a variety of solutions with different underlying implementations, each suited to a different business use case:
HBase and Cassandra are recommended for banking and financial businesses. For random, real-time read/write access to Big ‘table-like’ Data, use HBase. For faster writes, look to Cassandra.
MongoDB and CouchDB are recommended when the primary requirement is the querying of transactional data and defining indexes.
  • 10. There are also other databases, such as graph databases like Neo4j, that make Big-Data-heavy social media analytics problems simpler.
New Era Relational Databases
The latest relational databases (RDBMSs) have been specifically designed with these OLTP scenarios in mind, and have taken major steps toward addressing latency issues. Many businesses have been using SQL successfully for years, and most business users still consider SQL to be the best tool to query structured data. Other solutions include emerging sets of technologies and new versions of existing RDBMS engines that are all very adept at handling large volumes of structured data.
Therefore, for handling large volumes of structured data, look to new era RDBMS solutions like MySQL Cluster, GridSQL, or later versions of Microsoft SQL Server.
Impetus Solutions and Recommendations
One way to reduce the cost of data migration is to use MapReduce for ETL, rather than costly ETL tools.
Management and provisioning tools are available with commercial Big Data solutions for easy management of systems. Impetus offers Ankush, a vendor-neutral tool for cluster management, which can be used to automatically provision multiple Hadoop clusters.
For ongoing maintenance, the Impetus mantra for success is, “automate, automate, automate!” Any task that needs to be carried out more than once should be automated. This also holds true for monitoring and tuning.
When dealing with changing capacity, continue to add hardware or look for alternative methods to speed things up. Using graphics processing units for general purpose computing can also help.
Impetus also recommends RainStor or similar solutions that help to compress data and reduce the cost of the hardware required for data storage.
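The recommendation to use MapReduce for ETL can be pictured with a small, self-contained sketch. The Python below only imitates the map, shuffle, and reduce phases in memory; it does not use the Hadoop API, and the record layout ("customer,amount"), the function names, and the sample data are all hypothetical.

```python
from collections import defaultdict


def map_record(line):
    """Extract and transform: parse 'customer,amount' into (key, value).

    Malformed lines yield nothing, which is how this sketch cleans the input.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return
    customer, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return
    yield customer.strip().lower(), value


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Load-ready aggregate: total amount per customer."""
    return key, sum(values)


def run_etl(lines):
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce_group(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    raw = ["Alice,10.5", "bob,2", "alice,1.5", "garbage", "carol,x"]
    print(run_etl(raw))
```

In a real deployment the map and reduce functions would run in parallel across the cluster, and the output would be written to the target store instead of being returned as a dictionary.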
  • 11. Finally, look to faster, tailored MapReduce solutions that will allow completion of more tasks in less time.
Conclusion
In summary, best practices and robust strategies can help lower the total cost of ownership of your Big Data solutions, and transform Big Data challenges into Big Data opportunities. At Impetus, we have used these methods paired with the Hadoop ecosystem to successfully tackle Big Data problems.
About Impetus
Impetus Technologies is a leading provider of Big Data solutions for the Fortune 500®. We help customers effectively manage the “3-Vs” of Big Data and create new business insights across their enterprises.
Website: www.bigdata.impetus.com | Email: bigdata@impetus.com
© 2013 Impetus Technologies, Inc. All rights reserved. Product and company names mentioned herein may be trademarks of their respective companies. May 2013