Big data analytics is the process of examining large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits. Enterprises are increasingly looking for actionable insights in their data. Many big data projects originate from the need to answer specific business questions. With the right big data analytics platforms in place, an enterprise can boost sales, increase efficiency, and improve operations, customer service and risk management. Notably, the business area getting the most attention is increasing efficiency and optimizing operations. By using big data analytics you can extract only the relevant information from terabytes, petabytes and exabytes of data, and analyse it to transform your business decisions for the future. Becoming proactive with big data analytics isn't a one-time endeavour; it is more of a culture change, a new way of gaining ground.
Keywords: business, analytics, exabytes, efficiency, data sets
2. ABSTRACT
• Big data analytics is the process of examining large data sets
containing a variety of data types (i.e., big data) to uncover
hidden patterns, unknown correlations, market trends,
customer preferences and other useful business information.
• The analytical findings can lead to more effective marketing,
new revenue opportunities, better customer service, improved
operational efficiency, competitive advantages over rival
organizations and other business benefits.
• Many big data projects originate from the need to answer
specific business questions. With the right big data analytics
platforms in place, an enterprise can boost sales, increase
efficiency, and improve operations, customer service and risk
management.
4. MOTIVATION
• By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and
analyze it to transform your business decisions for the future.
• With the right big data analytics platforms in place, an
enterprise can boost sales, increase efficiency, and improve
operations, customer service and risk management.
• Technologies that include
Hadoop and related tools such as
YARN, MapReduce, Spark, Hive
and Pig, as well as NoSQL
databases, support the
processing of large and diverse
data sets across clustered
systems.
5. PROBLEM STATEMENT
• The first challenge is in breaking down data silos to access all
data an organization stores in different places and often in
different systems.
• A second big data challenge is in creating platforms that can
pull in unstructured data as easily as structured data.
• A third challenge is sheer volume: the data is typically so
large that it is difficult to process using traditional database
and software methods.
6. PROBLEM STATEMENT (CONT.)
• The above challenges can be overcome by implementing the
following technologies:
   • Parallel database technologies
   • MapReduce
• The best open source tools available are
10. KNOWLEDGE FROM LITERATURE SURVEY (CONT.)
• 2004: Initial versions of HDFS and MapReduce were
implemented.
• 2005: GFS and MapReduce were used to perform operations.
• 2006: Yahoo! created Hadoop based on GFS and MapReduce.
• 2007: Yahoo! started using Hadoop on a 1000-node cluster.
• 2008: Apache took over Hadoop; a 4000-node cluster was tested
with it.
• 2009: A petabyte of data was successfully sorted in less than
17 hours, in support of handling billions of searches and indexing
millions of web pages.
• 2011: Hadoop version 1.0 was released.
• 2013: Version 2.0.6 became available.
12. LITERATURE SURVEY METHODS (CONT.)
Methods                                          Author                                           Year
RDBMS (Relational Database Management Systems)   E. F. Codd                                       1970
Grid computing                                   Ian Foster, Carl Kesselman                       Early 1990s
Volunteer computing                              Luis F. G. Sarmenta                              1996
Hadoop HDFS                                      Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung  2003
Hadoop MapReduce                                 Jeffrey Dean and Sanjay Ghemawat                 2004
Apache Hadoop                                    Doug Cutting & Mike Cafarella                    2011
13. DEMERITS OF PREVIOUS METHODS
• Hardware failure:
As soon as we start using many pieces of hardware,
the chance that one will fail is fairly high.
• Combining the data after analysis:
Most analysis tasks need to be able to combine the
data in some way; data read from one disk may need
to be combined with the data from any of the other
disks (for example, the other 99 in a 100-disk cluster).
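The hardware-failure point can be made concrete with a little arithmetic. The sketch below assumes that devices fail independently and that each has the same failure probability over some period. The 1% figure is purely illustrative and does not come from the slides. Under those assumptions, the chance that at least one of n devices fails is 1 - (1 - p)^n.

```python
def prob_any_failure(n_devices: int, p_single: float) -> float:
    """Probability that at least one of n independent devices fails.

    Assumes every device fails independently with the same
    probability p_single (an illustrative simplification).
    """
    if n_devices < 0 or not 0.0 <= p_single <= 1.0:
        raise ValueError("need n_devices >= 0 and 0 <= p_single <= 1")
    return 1.0 - (1.0 - p_single) ** n_devices


if __name__ == "__main__":
    # Hypothetical per-device failure probability of 1% for the period.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} devices -> P(at least one failure) = "
              f"{prob_any_failure(n, 0.01):.4f}")
```

Even with a small per-device probability, the result climbs quickly as the number of devices grows, which is why a cluster design has to treat failure as routine rather than exceptional.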
14. HADOOP ADVANTAGES
Apache Hadoop is a framework for running applications on large clusters
built of commodity hardware.
A common way of avoiding data loss is through replication: redundant
copies of the data are kept by the system so that in the event of failure,
there is another copy available. The Hadoop Distributed Filesystem (HDFS)
takes care of this problem.
The second problem, combining data read from many disks, is solved by a
simple programming model, MapReduce.
Hadoop is the popular open source implementation of MapReduce, a
powerful tool designed for deep analysis and transformation of very large
data sets.
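To give a feel for the MapReduce programming model, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration, and each input line stands in for a split of data that a real cluster would read from a different disk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, across every split."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence in a single process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["big data analytics", "Big clusters process big data"]
    print(word_count(splits))
```

The shuffle step is where data from different splits is brought together by key, which is the "combine the data" problem described earlier. In a real framework the map and reduce calls would run in parallel on different machines.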
16. CONCLUSION
By using big data analytics you can extract only the relevant
information from terabytes, petabytes and exabytes, and analyze it
to transform your business decisions for the future.
With the right big data analytics platforms in place, an enterprise can
boost sales, increase efficiency, and improve operations, customer
service and risk management.
Pros                  Cons
Cost effective        Cluster management is hard
Parallel processing   Single point of failure
Fault tolerance       Security issues
Scalability