What is big data
1. What Is Big Data ?
By Ashwin Pednekar
Email : ashwinpednekar@gmail.com
2. Agenda
• Introduction to Big Data ( What’s so Big about Big Data ? )
• Understanding Big Data
• Use and Benefits of Big Data
• Technologies used in Big Data
• Famous Quotes about Big Data
3. What is so Big about Big Data ?
• We have heard big data defined in many, many
different ways, so it is no surprise that there is
so much confusion, misunderstanding and
misperception surrounding the term.
• Big data is a collection of data from traditional
and digital sources inside and outside your
company that represents a source for ongoing
discovery and analysis.
5. Enterprises Need to Fully Understand Big Data
• what it is to them
• what it does for them
• what it means to them
Understand Data Itself :
• Structured Data
• Unstructured Data
6. Structured Data :
Structured data refers to information with a high degree of
organization, such that inclusion in a relational database is
seamless and readily searchable by simple, straightforward search
engine algorithms or other search operations.
Unstructured Data :
Unstructured data usually refers to information that doesn't
reside in a traditional row-column database and is not organized
or structured logically.
Examples include e-mail messages, word processing documents,
and videos.
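As a rough illustration of the difference, the sketch below answers a question about structured rows with a single query, then tries to pull a similar fact out of free text with a regular expression. The table, column names, and sample e-mail are invented for this example and do not come from any real system.

```python
import re
import sqlite3

# Structured data: rows and columns in a relational table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "laptop", 900.0), ("bob", "phone", 400.0), ("alice", "mouse", 25.0)],
)

# One straightforward query answers "how much did alice spend?"
total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]
conn.close()

# Unstructured data: a similar fact buried in an e-mail body (made-up text).
email_body = "Hi team, Alice called to say the laptop she paid 900 dollars for arrived late."

# There is no schema to query, so we fall back to pattern matching, which is brittle.
match = re.search(r"paid (\d+) dollars", email_body)
amount_in_email = float(match.group(1)) if match else None

print(total)
print(amount_in_email)
```

The query keeps working as rows are added, while the regular expression fails as soon as the wording of the e-mail changes, which is one reason unstructured data needs different tools.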
7. The management of unstructured data is recognized as one of the
major unsolved problems in the information technology (IT)
industry, the main reason being that the tools and techniques that
have proved so successful transforming structured data into
business intelligence and actionable information simply don't
work when it comes to unstructured data. New approaches are
necessary.
8. Many organizations are missing out on what data experts agree is an opportunity to derive significant business
value from properly harnessing unstructured data. IDC estimates that unstructured content already accounts
for a staggering 90 percent of all digital data, much of which is locked away across a variety of different data
stores, in different locations and in varying formats.
Unstructured data can help companies gain a better understanding of their customers, products, services and
business in general. For example, data from Twitter streams, social media networks and web logs can help a
company gauge customer sentiment toward a product or service, or help identify and address a potential service
or quality issue before it becomes a full-fledged problem. Combining existing data about customers from
transactional systems with data gathered about them from other sources can help an organization get closer to a
360-degree view of its customers.
One answer to achieving this is “Big Data” technologies and methods.
10. Today’s consumers are a tough nut to crack. They look around a lot before they buy, talk to their entire social
network about their purchases, demand to be treated as unique and want to be sincerely thanked for buying
your products. Big Data allows you to profile these increasingly vocal and fickle little ‘tyrants’ in a far-reaching
manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not
actually a luxury. If you don’t treat them the way they want to be treated, they will leave you in the blink of an eye.
Just a small example: when a customer enters a bank, Big Data tools allow the clerk to check his/her profile
in real time and learn which relevant products or services to recommend. Big Data will also have a key role
to play in uniting the digital and physical shopping spheres: a retailer could suggest an offer on a mobile
carrier, on the basis of a consumer indicating a certain need on social media.
11. Big Data can also help you understand how others perceive
your products so that you can adapt them, or your marketing,
if need be. Analysis of unstructured social media text allows
you to uncover the sentiments of your customers and even
segment those in different geographical locations or among
different demographic groups.
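A minimal sketch of the idea, assuming a tiny hand-made word list and a few made-up posts tagged by region. Real sentiment analysis uses much larger lexicons or trained models; the names `POSITIVE`, `NEGATIVE`, `score`, and `sentiment_by_segment` are hypothetical.

```python
from collections import defaultdict

# Tiny hand-made word lists; a real system would use a far larger lexicon or a trained model.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "bad"}

def score(text):
    """Return (number of positive words) minus (number of negative words) for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

# Hypothetical posts, each tagged with the author's region.
posts = [
    ("north", "Love the new phone, great camera!"),
    ("north", "Delivery was fast and I am happy."),
    ("south", "Screen arrived broken. Bad service."),
    ("south", "Love the design but the app is slow."),
]

def sentiment_by_segment(posts):
    """Average the per-post scores within each segment."""
    scores = defaultdict(list)
    for segment, text in posts:
        scores[segment].append(score(text))
    return {seg: sum(vals) / len(vals) for seg, vals in scores.items()}

result = sentiment_by_segment(posts)
print(result)
```

The same grouping step works for any segment label, such as an age band or a product line, as long as each post carries that label.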
Success depends not only on how you run your company.
Social and economic factors are crucial for your
accomplishments as well. Predictive analytics, fueled by Big
Data, allows you to scan and analyze newspaper reports or
social media feeds so that you stay up to speed
on the latest developments in your industry and its
environment. Detailed health checks on your suppliers and
customers are another benefit that comes with Big Data. They
allow you to take action when one of them is at risk of
defaulting.
12. The insights that you gain from analyzing your market and its consumers with Big Data are not just valuable to
you. You could sell them as non-personalized trend data to large industry players operating in the same segment
as you and create a whole new revenue stream.
One of the more impressive examples comes from Shazam, the song identification application. It helps
record labels find out where music sub-cultures are arising by monitoring the use of its service, including
the location data that mobile devices so conveniently provide. The record labels can then find and sign
up promising new artists or remarket their existing ones accordingly.
Previously, if business users needed to analyze large amounts of varied data, they had to ask their IT colleagues
for help as they themselves lacked the technical skills for doing so. Often, by the time they received the
requested information, it was no longer useful or even correct. With Big Data tools, the technical teams can do
the groundwork and then build repeatability into algorithms for faster searches. In other words, they can
develop systems and install interactive and dynamic visualization tools that allow business users to analyze, view
and benefit from the data.
14. Hadoop :
An open source (free) software framework for processing huge datasets,
for certain kinds of problems, on a distributed system. Its development was
inspired by Google’s MapReduce and Google File System. It grew out of the
Apache Nutch project, with much of its early development done at Yahoo!,
and is now managed as a project of the Apache Software Foundation.
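The MapReduce model that inspired Hadoop can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps on one machine, not Hadoop's API; the function names and the two sample strings are assumptions made for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a block of a large file stored on a different node.
splits = ["big data is big", "data about data"]

mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

In a real cluster the map and reduce calls run in parallel on many machines and the framework handles the shuffle, which is why the model suits problems that split cleanly by key.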
R Programming :
An open source (free) programming language and software
environment for statistical computing and graphics. The R
language has become a de facto standard among statisticians for
developing statistical software and is widely used for data
analysis. R is part of the GNU Project, a collaboration that
supports open source projects.
15. Spark :
Apache Spark is a fast and general-purpose cluster computing
system designed for processing data in parallel at a large scale.
Python NLTK : A leading platform for building Python
programs to work with human language data. It provides easy-to-
use interfaces to over 50 corpora and lexical resources, along with
a suite of text processing libraries for classification, tokenization,
stemming, tagging, parsing, and semantic reasoning.
MongoDB : A cross-platform document-oriented database that
stores data in JSON-like documents.
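To give a feel for what a JSON-like document looks like, here is a small sketch using only Python's standard `json` module rather than a MongoDB driver. The customer record and its field names are hypothetical.

```python
import json

# A hypothetical customer record held as one self-contained, JSON-like document.
customer = {
    "name": "Alice",
    "email": "alice@example.com",
    "orders": [
        {"product": "laptop", "amount": 900.0},
        {"product": "mouse", "amount": 25.0},
    ],
    "tags": ["newsletter", "mobile-app"],
}

# A second document with different fields, to show that no fixed column layout is required.
other = {"name": "Bob", "twitter": "@bob"}

# Round-trip through JSON text, then read nested fields directly.
serialized = json.dumps(customer)
restored = json.loads(serialized)
total_spent = sum(order["amount"] for order in restored["orders"])

print(total_spent)
print(sorted(other))
```

Nested lists such as `orders` live inside the record itself, where a relational design would typically split them into a separate table.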
There are many such tools used for data analytics, data mining, and visual and statistical analysis.
Big Data is a huge ecosystem of such tools and technologies.