This slide deck is a brief introduction to big data, with a little bit of fun through memes.
It was prepared from articles about big data on different websites, plus some of my own words, so it would be great if you like it.
3. What Is Big Data?
Big data refers to larger, more complex data sets,
especially from new data sources.
These data sets are so large that traditional data
processing software just can't manage them.
These massive volumes of data can be used to
address business problems you wouldn’t have
been able to tackle before.
5. The Three ‘V’s of Big Data
Volume:
With big data, you’ll have to process high volumes of low-density, unstructured data.
This can be data of unknown value, such as Twitter data feeds or readings from sensor-enabled
equipment.
For some organizations, this might be tens of terabytes of data. For others, it may be
hundreds of petabytes.
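To get a feel for the gap between those two scales, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes) and a made-up sustained read speed of 200 MB/s for a single disk; both figures are illustrative and not from the slides.

```python
TB = 10**12  # bytes in a terabyte (decimal units, an assumption)
PB = 10**15  # bytes in a petabyte (decimal units, an assumption)

def hours_to_scan(num_bytes, bytes_per_second=200 * 10**6):
    """Hours needed to read num_bytes once at a fixed, assumed read speed."""
    return num_bytes / bytes_per_second / 3600

small = 10 * TB    # "tens of terabytes"
large = 100 * PB   # "hundreds of petabytes"

ratio = large // small                       # how many times larger
small_hours = round(hours_to_scan(small), 1)  # one disk reading 10 TB once

print(ratio, small_hours)
```

Under these assumptions the larger data set is 10,000 times the smaller one, which is why a single machine stops being a realistic option.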
Velocity:
Velocity is the fast rate at which data is received and (perhaps) acted on.
Some internet-enabled smart products operate in real time or near real time and will
require real-time evaluation and action.
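As a minimal sketch of "evaluate and act as the data arrives", the snippet below checks a rolling average on every new reading instead of waiting for a full batch. The `monitor` function, the window size, the limit, and the sample readings are all hypothetical.

```python
from collections import deque

def monitor(readings, window=3, limit=80.0):
    """Yield an alert as soon as the rolling average exceeds the limit."""
    recent = deque(maxlen=window)  # keeps only the newest `window` values
    for timestamp, value in readings:
        recent.append(value)
        average = sum(recent) / len(recent)
        if average > limit:
            yield timestamp, round(average, 1)

# Made-up (timestamp, temperature) readings from one sensor.
stream = [(1, 70.0), (2, 75.0), (3, 90.0), (4, 95.0), (5, 60.0)]
alerts = list(monitor(stream))
print(alerts)
```

Because `monitor` is a generator, each alert is produced the moment its reading is seen, which is the property that matters for high-velocity data.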
Variety:
Variety refers to the many types of data that are available. Traditional data types
were structured and fit neatly in a relational database.
With the rise of big data, data also arrives in new, unstructured types.
Unstructured and semi-structured data types, such as text, audio, and video, require
additional preprocessing to derive meaning and support metadata.
7. Structured:
Think spreadsheets; every piece of information is grouped into rows and columns.
Specific elements defined by certain variables are easily discoverable.
Structured data is the easiest type of data to analyze because it requires little to no
preparation before processing.
It’s easy to work with.
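A small sketch of why rows and columns are so convenient: once every value sits under a named column, a question like "total sales in Pune" is a one-liner. The table contents and column names are invented for illustration.

```python
import csv
import io

# A tiny in-memory "spreadsheet" with a header row (made-up data).
table = io.StringIO(
    "customer,city,amount\n"
    "Asha,Pune,120\n"
    "Ben,Leeds,80\n"
    "Chen,Pune,200\n"
)

rows = list(csv.DictReader(table))  # each row becomes a dict keyed by column name
pune_total = sum(int(r["amount"]) for r in rows if r["city"] == "Pune")
print(pune_total)
```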
Unstructured:
Unstructured data is all your unorganized data.
The hardest part of analyzing unstructured data is teaching an application to understand the
information it’s extracting. More often than not, this means translating it into some form of
structured data.
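Here is one hedged sketch of that translation step: pulling a date and a star rating out of free-text reviews with a regular expression. The review texts, the pattern, and the `to_record` helper are all hypothetical, and real unstructured data is usually far messier than this.

```python
import re

# Made-up free-text reviews (unstructured).
reviews = [
    "Ordered on 2023-04-02, the blender arrived broken. 1 star.",
    "Great kettle! Bought 2023-05-17 and I would give it 5 stars.",
]

# An ISO-style date, then (lazily) anything, then a single digit before "star(s)".
pattern = re.compile(r"(\d{4}-\d{2}-\d{2}).*?(\d) stars?")

def to_record(text):
    """Turn one review into a structured record, or None if nothing matches."""
    match = pattern.search(text)
    if match is None:
        return None
    return {"date": match.group(1), "stars": int(match.group(2))}

records = [to_record(t) for t in reviews]
print(records)
```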
Semi-Structured:
Semi-structured data toes the line between structured and unstructured.
Let’s say you take a picture of your cat from your phone. It automatically logs the time the
picture was taken, the GPS data at the time of the capture and your device ID. If you’re using
any kind of web service for storage, like iCloud, your account info becomes attached to the
file.
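The cat photo can be sketched as a record where the image itself stays an opaque blob while the tags around it are neatly labeled. Every field name and value below is made up for illustration; real photo metadata formats differ.

```python
import json

photo = {
    "pixels": "<binary image data>",  # the unstructured part (placeholder string)
    "metadata": {                     # the structured tags attached to it
        "taken_at": "2024-03-09T14:21:00",
        "gps": {"lat": 51.5072, "lon": -0.1276},
        "device_id": "PHONE-1234",
        "account": "user@example.com",
    },
}

encoded = json.dumps(photo)    # what might be sent to a storage service
decoded = json.loads(encoded)  # the tags can be read back without touching the pixels
print(decoded["metadata"]["taken_at"])
```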
8. Big Data in Cloud Computing
Big data refers to the large sets of data collected.
Cloud computing, meanwhile, refers to the mechanism that remotely takes this
data in and performs any operations specified on that data.
Examples:
Cloud Gaming.
Transportation
Media and Entertainment
Government
10. How Do Big Data and Cloud Computing Relate?
Google is a company that famously uses big data. In addition to having access
to user information through its Chrome browser and Gmail products, Google
also receives billions of search requests every day on its search engine.
The company uses that data to train its algorithms, getting better at
fundamental search tasks such as parsing sentences, correcting misspellings
and understanding what a user is trying to search for.
Google also uses data on historical and current search terms to offer
search suggestions to users before they finish typing, providing a useful
autocomplete service.
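To make the autocomplete idea concrete, here is a toy sketch that ranks past search terms by how often they were typed and suggests the most frequent ones that start with the user's prefix. This is only an illustration with made-up history and a hypothetical `suggest` function; it is not how Google's system actually works.

```python
from collections import Counter

# Made-up search history.
history = [
    "big data", "big data tools", "big data", "bigquery",
    "cloud gaming", "big data tools", "big data",
]
counts = Counter(history)  # term -> number of times it was searched

def suggest(prefix, limit=3):
    """Return up to `limit` past terms starting with prefix, most frequent first."""
    matches = [(term, n) for term, n in counts.items() if term.startswith(prefix)]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))  # by count, then alphabetically
    return [term for term, _ in matches[:limit]]

print(suggest("big"))
```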
12. Advantages
It helps improve science and research.
New data is added every second.
Disadvantages
Storing big data on traditional storage can cost a lot of money.
Lots of big data is unstructured.
Big data analysis results are sometimes misleading.
Analyzing the data requires a great deal of processing power.
13. Big Data Tools
Hadoop:
The Apache Hadoop software library is a framework that allows for the distributed
processing of large data sets across clusters of computers using simple
programming models.
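Hadoop's best-known programming model is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in plain Python on a single machine with a word count. It does not use Hadoop itself, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data is everywhere"]  # made-up input
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(word_counts)
```

In a real cluster the map and reduce calls would run on many machines at once; the model stays simple because each call only ever sees one line or one key.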
Apache Storm:
Apache Storm is a real-time distributed tool for processing data streams. It is written
in Java and Clojure, and can be integrated with any programming language.
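Storm describes a stream pipeline in terms of spouts (sources of data) and bolts (processing steps). The sketch below mimics that shape with plain Python generators. It does not use Storm's API, and the function names and sentences are made up.

```python
def sentence_spout():
    """Spout stand-in: emits sentences one at a time."""
    for sentence in ["storm processes streams", "streams never end"]:
        yield sentence

def split_bolt(sentences):
    """Bolt stand-in: splits each incoming sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word

def count_bolt(words):
    """Bolt stand-in: emits an updated running count for every word seen."""
    totals = {}
    for word in words:
        totals[word] = totals.get(word, 0) + 1
        yield word, totals[word]

# Wire the steps together; each item flows through as soon as it is emitted.
updates = list(count_bolt(split_bolt(sentence_spout())))
print(updates)
```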
MongoDB:
MongoDB is a source-available NoSQL database that serves as an alternative to traditional
relational databases. It is a document-oriented database used for storing large volumes of
data.
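"Document-oriented" means each record is a self-contained, JSON-like document, and two documents in the same collection do not need the same fields. The sketch below shows that idea with plain Python dictionaries. It does not talk to a MongoDB server, and the `find` helper is a hypothetical stand-in, not MongoDB's query API.

```python
import json

# Two made-up documents in one "collection"; note the different fields.
collection = [
    {"_id": 1, "name": "Asha", "tags": ["sensor", "iot"]},
    {"_id": 2, "name": "Ben", "address": {"city": "Leeds"}},
]

def find(docs, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

result = find(collection, name="Ben")
print(json.dumps(result[0]))
```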