Beyond The Hype
• Big Data
• Big data: a growing torrent
• 4 V’s of Big Data
• Big Data vs. DWH-DM
• Challenges of Large Scale Social Network Analysis
• Where does it come from?
• Big data Technologies
• Applications of Big data Analysis
• Conclusion
Big data
• “Big data” refers to datasets whose size is beyond the ability
of typical database software tools to capture, store, manage
and analyze.
• This definition can vary by sector, depending on what kinds
of software tools are commonly available and what sizes of
datasets are common there.
• As technology advances over time, the size of datasets that
qualify as Big data will also increase.
• With these caveats, Big data will range from a few dozen
terabytes to multiple petabytes (thousands of terabytes).
Big data: a growing torrent
• $600 buys a disk drive that can store all of the world’s music.
• 5 billion mobile phones were in use in 2010.
• 30 billion pieces of content are shared on Facebook every month.
• 40% projected growth in global data generated per year vs.
5% growth in global IT spending.
• 235 terabytes of data had been collected by the US Library of Congress by
April 2011.
• 15 out of 17 sectors in the US have more data stored per
company than the US Library of Congress.


4 V’s of Big Data
• Volume: the amount of data is larger than ever.
• Velocity: data arrives faster and faster, e.g. complex real-time data.
• Variety: the kinds of data keep multiplying, e.g. unstructured video and voice.
• Variability: data types and formats also differ.
[Figure: “Big Data” at the center, surrounded by Volume, Velocity, Variety and Variability]
Big Data vs. DWH-DM
• Big Data
– Multitude of data types
• Structured, semi-structured and unstructured
– Demographic, psychographic, transactional
– Call center data, social media data, web log
data, sensor networks, etc.
– Requires new storage mechanisms, e.g. Hadoop
– High dimensionality
– Online versions of algorithms
• Online services such as eBay, Yahoo, Amazon and
Facebook have transformed/created big data
Big Data vs. DWH-DM
• Areas like genomics, astronomy, military surveillance and
RFID technology are also contributing to the explosive
growth of the field.
• A jet engine’s sensors send terabytes of data every hour,
which can be used to build predictive models for repair
cycles. Understanding when repairs should be done, instead
of doing traditional preventive maintenance at certain set
intervals, could be worth billions of dollars.
• The challenge in big data analytics is to dig deeply, quickly
and widely.
• DWH-DM
– Structured data
– Off-line algorithms
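The contrast between online and off-line algorithms can be sketched in a few lines. The example below is only an illustration, not something taken from the slides: it uses Welford's update rule to keep a running mean and variance over a stream of values, and compares the result with the off-line computation that needs the whole dataset at once. The class name `RunningStats` and the sensor readings are hypothetical.

```python
import statistics


class RunningStats:
    """Online (single-pass) mean and variance using Welford's update rule."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        # Each new value is folded in with constant memory and constant work.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.count if self.count else 0.0


# Hypothetical sensor readings arriving one at a time.
readings = [612.0, 615.5, 609.8, 640.2, 611.1]

stats = RunningStats()
for reading in readings:
    stats.update(reading)  # online: the stream is never stored

# Off-line equivalent: requires the full dataset in memory.
offline_mean = statistics.mean(readings)
offline_variance = statistics.pvariance(readings)
```

Both approaches give the same numbers here; the online version simply never needs to hold or revisit earlier readings, which is what matters when the data arrives continuously.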
Challenges of Large Scale Social Network
Analysis
• Social networking sites like Facebook, YouTube, Orkut and
Twitter are among the most popular sites on the internet.
• Users of these sites form a social network (SN), which provides
a powerful means of sharing, organizing, and finding content
and contacts.
• However, the rate at which SNs are growing poses many
latent challenges in maintaining the stability of their
underlying systems and the members associated with them.
Challenges of Large Scale Social Network
Analysis
• Social networks (SNs) are living networks that give birth every day
to data traces which can be up to exabytes in volume.
• For example, Facebook produces more than a petabyte of data
per day. Even its logging data exceeds 25 terabytes per day.
• Google creates as much information (social blogs and Orkut)
in two days now as we did from the dawn of man through
2003, i.e., one exabyte of data.
• Analysts need to analyze this huge volume of SN data in limited time
to support system management activities.
Big data and Big Brother
• Perhaps one of the biggest contributors to big data, however,
is social networking.

• People themselves have become contributors of information
as they increasingly use services such as Facebook and
LinkedIn to connect with each other.
• “LinkedIn is a particularly interesting target, given the
professional nature of its audience. By analyzing LinkedIn
network information, we can learn a lot about individuals and
the people that they know”
• While it may be difficult to manipulate big data at a grand
scale, it is relatively easy, given the right tools and techniques,
to analyze small subsets (such as personal networks of
contacts) for potentially useful results.
• We can do this at a micro-analytic level, where we mine
profiles for snippets of information and at the macro-analytic
level, where we look at patterns in the data.
• “Even when people are not part of your network, a
properly filled-out profile reveals their job title, where
they worked in the past, and where they were
educated.”
Where does it come from?
• In the global marketplace, businesses, suppliers and customers are
creating and consuming vast amounts of information.
Cont… Big Data
• Gartner predicts that enterprise data in all forms will grow
650% over the next 5 years.
• According to IDC, the world's volume of data doubles every
18 months.
• This flood of data is referred to as “information overload,”
“data deluge” and “big data”.
• Big data creates a challenge for business leaders.
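To compare the two growth figures quoted above, they can be converted into equivalent yearly rates. The sketch below is only a rough calculation under stated assumptions: it reads "grow 650%" as an increase of 650% (a factor of 7.5) and assumes smooth compound growth. The function names are made up for this illustration.

```python
def annual_rate_from_doubling(months_to_double):
    """Yearly growth rate implied by doubling every `months_to_double` months."""
    return 2 ** (12 / months_to_double) - 1


def annual_rate_from_total_growth(percent_increase, years):
    """Yearly growth rate implied by a total percentage increase over `years`.

    Assumption: "grow 650%" means the volume increases by 650%,
    i.e. it ends at 7.5 times the starting volume.
    """
    factor = 1 + percent_increase / 100
    return factor ** (1 / years) - 1


idc_rate = annual_rate_from_doubling(18)              # doubling every 18 months
gartner_rate = annual_rate_from_total_growth(650, 5)  # 650% over 5 years

print(f"IDC figure:     about {idc_rate:.0%} per year")
print(f"Gartner figure: about {gartner_rate:.0%} per year")
```

Under these assumptions the two predictions come out in the same range, roughly 50% to 60% growth per year.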
NoSQL Databases
• Most of the organizations that have built data platforms have
found it necessary to go beyond the relational database model
to tackle big data, because relational databases become ineffective at this
scale.
• Managing sharding and replication across a horde of
database servers is difficult and slow.
• To store huge datasets effectively, a new breed of databases
has been developed. These databases are called NoSQL databases,
or non-relational databases.
NoSQL Databases
• Many of the NoSQL databases are the logical descendants of
Google’s BigTable and Amazon’s Dynamo.
• These are designed to be distributed across many nodes, to
provide consistency (in many cases eventual rather than strict
consistency) and to have a very flexible schema.
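As a rough illustration of distributing data across many nodes with replication, the sketch below places each key on a primary node chosen by hashing the key, then copies it to the next nodes in the list. This is a deliberately simplified scheme (plain modulo placement), not the partitioning used by any particular NoSQL database. The node names, keys and the helper `replicas_for` are hypothetical.

```python
import hashlib


def replicas_for(key, nodes, replication_factor=3):
    """Return the nodes holding `key`: a hash-chosen primary plus its successors."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    primary = int(digest, 16) % len(nodes)
    count = min(replication_factor, len(nodes))
    return [nodes[(primary + i) % len(nodes)] for i in range(count)]


# Hypothetical five-node cluster.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]

# Flexible schema: rows do not have to share the same set of columns.
rows = {
    "user:1": {"name": "Ada", "city": "London"},
    "user:2": {"name": "Lin", "followers": 1200, "tags": ["ml", "hadoop"]},
}

# Every row is stored on three distinct nodes.
placement = {key: replicas_for(key, nodes) for key in rows}
```

A weakness of plain modulo placement is that adding or removing a node moves most keys, which is one reason real systems tend to use more elaborate partitioning schemes.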
Popular NoSQL databases
Cassandra:
• Developed at Facebook, in production use at Twitter,
Rackspace, Reddit, and other large sites.
• Cassandra is designed for high performance, reliability,
and automatic replication. It has a very flexible data
model. A new startup, Riptano, provides commercial
support.
HBase:
• Part of the Apache Hadoop project, and modeled on
Google’s BigTable.
• Suitable for extremely large databases (billions of rows,
millions of columns), distributed across thousands of
nodes. Along with Hadoop, commercial support is
provided by Cloudera.
Prevalence of Big Data
• Big data is not limited to big companies like Facebook and
Google.
• According to a McKinsey Global Institute study in 2011:
– Most investment firms in the U.S. with fewer than 1,000
employees have 3.8 petabytes of data stored.
– Companies in all sectors have at least 100 terabytes stored.
Big Data Formats
Big data Technologies
• Big data technologies describe a new generation of
technologies and architectures, designed to economically
extract value from very large volumes of a wide variety of
data, by enabling high velocity capture, discovery, and/or
analysis.
• The above definition incorporates all types of data (e.g., real-time, analytic) managed by next generation systems.


• The MapReduce approach is basically a divide-and-conquer
strategy for distributing an extremely large problem across
an extremely large computing cluster.
• In the “map” stage, a programming task is divided into a
number of identical subtasks, which are then distributed
across many processors.
• The intermediate results are then combined by a single
reduce task.
• MapReduce provided a solution to Google’s biggest
problem, i.e. creating large searches.
• MapReduce has proven to be widely applicable to many large
data problems, ranging from search to machine learning.
• The most popular open source implementation of
MapReduce is the Hadoop project.
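The map and reduce stages described on this slide can be sketched with the classic word-count example. This is a minimal single-process illustration, not Hadoop's API: on a real cluster the map subtasks would run on many machines and a framework would handle the grouping step. The function names and sample documents are made up for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one document into (word, 1) pairs. Each call is independent."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all intermediate values for one key."""
    return key, sum(values)


# Hypothetical input split into three "documents".
documents = ["big data big clusters", "data beats opinion", "big ideas"]

# The identical subtask (map_phase) is applied to every document.
mapped = [pair for doc in documents for pair in map_phase(doc)]

# Intermediate results are grouped and then combined into final counts.
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each `map_phase` call only looks at its own document, the subtasks could be handed to different processors without any coordination, which is the property that makes the approach scale.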
Applications of Big data Analysis
• Facebook and LinkedIn use patterns of friendship
relationships to suggest other people you may know, or
should know, with frightening accuracy.
• Amazon saves your searches, correlates what you search for
with what other users search for, and uses it to create
surprisingly appropriate recommendations.
• Medical researchers sift through the health records of
thousands of people to try to identify useful correlations
between medical treatments and health outcomes.
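One simple way to turn friendship patterns into "people you may know" suggestions is to count mutual friends. The sketch below is only an illustration of that idea; it is not how Facebook or LinkedIn actually compute their suggestions. The graph, the names and the function `suggest_contacts` are hypothetical.

```python
from collections import Counter


def suggest_contacts(graph, person, limit=3):
    """Rank non-friends of `person` by the number of mutual friends."""
    friends = graph[person]
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in friends:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for stable output.
    ranked = sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


# Hypothetical friendship graph (each edge is listed in both directions).
graph = {
    "ana": {"ben", "cy"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ana", "ben", "dee", "eli"},
    "dee": {"ben", "cy"},
    "eli": {"cy"},
}

suggestions = suggest_contacts(graph, "ana")
```

Here "dee" is suggested first because she shares two friends with "ana", while "eli" shares only one.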
Conclusion
• As data volumes grow exponentially, so does the
concern over data preservation, access,
dissemination, and usability. Many agencies have
taken initiatives to research areas such as
automated analysis techniques, data mining,
machine learning, privacy, and database
interoperability, and these will help to identify how
big data can enable science in new ways and at new
levels.
BigData
BigData

Weitere ähnliche Inhalte

Was ist angesagt?

Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035Neelam Rawat
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunishaShivlal Mewada
 
3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...PROWEBSCRAPER
 

Was ist angesagt? (20)

Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Big data
Big data Big data
Big data
 
Big data-analytics-cpe8035
Big data-analytics-cpe8035Big data-analytics-cpe8035
Big data-analytics-cpe8035
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data 101
Big data 101Big data 101
Big data 101
 
Big Data World
Big Data WorldBig Data World
Big Data World
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Big Data 101
Big Data 101Big Data 101
Big Data 101
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Big data
Big dataBig data
Big data
 
Big Data
Big DataBig Data
Big Data
 
23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha23 ijcse-01238-1indhunisha
23 ijcse-01238-1indhunisha
 
3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...3 pillars of big data : structured data, semi structured data and unstructure...
3 pillars of big data : structured data, semi structured data and unstructure...
 
WORLD CAT AS BIG DATA
WORLD CAT AS  BIG DATAWORLD CAT AS  BIG DATA
WORLD CAT AS BIG DATA
 

Andere mochten auch

SRS Document Of Course management software system.doc
SRS Document Of Course management software system.docSRS Document Of Course management software system.doc
SRS Document Of Course management software system.docMaRwa Samih AL-Amri
 
SRS for student database management system
SRS for student database management systemSRS for student database management system
SRS for student database management systemSuman Saurabh
 
School management system
School management systemSchool management system
School management systemSoumya Behera
 
School management system
School management systemSchool management system
School management systemasd143
 
Software requirements specification of Library Management System
Software requirements specification of Library Management SystemSoftware requirements specification of Library Management System
Software requirements specification of Library Management SystemSoumili Sen
 
Student Management System Project Abstract
Student Management System Project AbstractStudent Management System Project Abstract
Student Management System Project AbstractUdhayyagethan Mano
 
Student management system
Student management systemStudent management system
Student management systemGaurav Subham
 
Student database management system
Student database management systemStudent database management system
Student database management systemSnehal Raut
 
Student management system
Student management systemStudent management system
Student management systemAmit Gandhi
 
School Management System ppt
School Management System pptSchool Management System ppt
School Management System pptMohsin Ali
 
Library mangement system project srs documentation.doc
Library mangement system project srs documentation.docLibrary mangement system project srs documentation.doc
Library mangement system project srs documentation.docjimmykhan
 

Andere mochten auch (13)

SRS Document Of Course management software system.doc
SRS Document Of Course management software system.docSRS Document Of Course management software system.doc
SRS Document Of Course management software system.doc
 
SRS for student database management system
SRS for student database management systemSRS for student database management system
SRS for student database management system
 
School management system
School management systemSchool management system
School management system
 
School management system
School management systemSchool management system
School management system
 
Software requirements specification of Library Management System
Software requirements specification of Library Management SystemSoftware requirements specification of Library Management System
Software requirements specification of Library Management System
 
Student Management System Project Abstract
Student Management System Project AbstractStudent Management System Project Abstract
Student Management System Project Abstract
 
Student management system
Student management systemStudent management system
Student management system
 
Student database management system
Student database management systemStudent database management system
Student database management system
 
PPT - Powerful Presentation Techniques
PPT - Powerful Presentation TechniquesPPT - Powerful Presentation Techniques
PPT - Powerful Presentation Techniques
 
School Management System
School Management SystemSchool Management System
School Management System
 
Student management system
Student management systemStudent management system
Student management system
 
School Management System ppt
School Management System pptSchool Management System ppt
School Management System ppt
 
Library mangement system project srs documentation.doc
Library mangement system project srs documentation.docLibrary mangement system project srs documentation.doc
Library mangement system project srs documentation.doc
 

Ähnlich wie BigData

Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data miningPolash Halder
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysisPoonam Kshirsagar
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfrajsharma159890
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Seminar presentation
Seminar presentationSeminar presentation
Seminar presentationKlawal13
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Unit 1 Introduction to Data Analytics .pptx
Unit 1 Introduction to Data Analytics .pptxUnit 1 Introduction to Data Analytics .pptx
Unit 1 Introduction to Data Analytics .pptxvipulkondekar
 

Ähnlich wie BigData (20)

Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
1
11
1
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Are you ready for BIG DATA?
Are you ready for BIG DATA?Are you ready for BIG DATA?
Are you ready for BIG DATA?
 
Data minig with Big data analysis
Data minig with Big data analysisData minig with Big data analysis
Data minig with Big data analysis
 
Big data-ppt-
Big data-ppt-Big data-ppt-
Big data-ppt-
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 
Seminar presentation
Seminar presentationSeminar presentation
Seminar presentation
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
 
Big Data
Big DataBig Data
Big Data
 
M.Florence Dayana
M.Florence DayanaM.Florence Dayana
M.Florence Dayana
 
Big Data
Big DataBig Data
Big Data
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Big data
Big dataBig data
Big data
 
Unit 1 Introduction to Data Analytics .pptx
Unit 1 Introduction to Data Analytics .pptxUnit 1 Introduction to Data Analytics .pptx
Unit 1 Introduction to Data Analytics .pptx
 

Kürzlich hochgeladen

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 

Kürzlich hochgeladen (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 

BigData

  • 2.  Big Data  Big data—a growing torrent  4 V’S Of Big Data  Big Data vs. DWH-DM  Challenges of Large Scale Social Network Analysis  Where does it come from??  Big data Technologies  Applications of Big data Analysis  Conclusion
  • 3. Big data  “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.  This definition can vary by sector depending on what kinds of software tools are commonly available and what sizes of datasets are common there.  As technology advances over time, the size of datasets that qualify as Big data will also increase.  With these caveats, Big data will range from a few dozen terabytes to multiple petabytes (thousands of terabytes ).
  • 4. Big data—a growing torrent  $600 to buy a disk drive that can store all of the world’s music  5 billion mobile phones in use in 2010 .  30 billion pieces of content shared on Facebook every month .  40% projected growth in global data generated per year vs. 5% growth in global IT spending .  235 terabytes data collected by the US Library of Congress by April 2011.  15 out of 17 sectors in the US have more data stored per company than the US Library of Congress.
  • 5. 4 V's of Big Data
      - Volume: datasets are bigger than ever.
      - Velocity: data arrives at an increasing rate, e.g. complex real-time data.
      - Variety: data takes increasingly many forms, e.g. unstructured video and voice.
      - Variability: data types and formats also differ.
      - [Diagram: Big Data at the center, surrounded by Volume, Velocity, Variety and Variability]
  • 6. Big Data vs. DWH-DM
      - Big Data
          - Multitude of data types: structured, semi-structured and unstructured
          - Demographic, psychographic, transactional
          - Call center data, social media data, web log data, sensor networks, etc.
          - Requires new storage mechanisms, e.g. Hadoop
          - High dimensionality
          - Online versions of algorithms
      - Online services such as eBay, Yahoo, Amazon and Facebook have transformed or created big data.
  • 7. Big Data vs. DWH-DM
      - Areas like genomics, astronomy, military surveillance and RFID technology are also contributing to the explosive growth of the field.
      - A jet engine's sensors send terabytes of data every hour, which can be used to build predictive models for repair cycles. Understanding when repairs should be done, instead of doing traditional preventive maintenance at set intervals, could be worth billions of dollars.
      - The challenge in big data analytics is to dig deeply, quickly and widely.
      - DWH-DM
          - Structured data
          - Off-line algorithms
  • 8. Challenges of Large Scale Social Network Analysis
      - Social networking sites like Facebook, YouTube, Orkut and Twitter are among the most popular sites on the internet.
      - Users of these sites form a social network (SN), which provides a powerful means of sharing, organizing, and finding content and contacts.
      - However, the rate at which SNs are growing poses many latent challenges in maintaining the stability of their underlying systems and of the members associated with them.
  • 10. Challenges of Large Scale Social Network Analysis
      - Social Networks (SNs) are living networks that give birth every day to data traces which can be up to exabytes in volume.
      - For example, Facebook produces more than a petabyte of data per day. Even its logging data exceeds 25 terabytes per day.
      - Google creates as much information (social blogs and Orkut) in two days now as we did from the dawn of man through 2003, i.e. one exabyte of data.
      - Analysts need to analyze this huge volume of SN data in limited time to support system management activities.
  • 11. Big data and Big Brother
      - Perhaps one of the biggest contributors to big data, however, is social networking.
      - People themselves have become contributors of information as they increasingly use services such as Facebook and LinkedIn to connect with each other.
      - "LinkedIn is a particularly interesting target, given the professional nature of its audience. By analyzing LinkedIn network information, we can learn a lot about individuals and the people that they know."
  • 12.
      - While it may be difficult to manipulate big data at a grand scale, it is relatively easy, given the right tools and techniques, to analyze small subsets (such as personal networks of contacts) for potentially useful results.
      - We can do this at a micro-analytic level, where we mine profiles for snippets of information, and at the macro-analytic level, where we look at patterns in the data.
      - "Even when people are not part of your network, a properly filled-out profile reveals their job title, where they worked in the past, and where they were educated."
  • 13. Where does it come from??
      - In the global marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information.
  • 14. Cont… Big Data
      - Gartner predicts that enterprise data in all forms will grow 650% over the next 5 years.
      - According to IDC, the world's volume of data doubles every 18 months.
      - This flood of data is referred to as "information overload," "data deluge" and "big data".
      - Big data creates a challenge for business leaders.
  • 16. NoSQL Databases
      - Most of the organizations that built data platforms have found it necessary to go beyond the relational database model to tackle big data, because relational databases become ineffective at this scale.
      - Managing sharding and replication across a horde of database servers is difficult and slow.
      - To store huge datasets effectively, a new breed of databases has been developed. These databases are called NoSQL databases, or non-relational databases.
  • 17. NoSQL Databases
      - Many of the NoSQL databases are the logical descendants of Google's BigTable and Amazon's Dynamo.
      - They are designed to be distributed across many nodes, to provide consistency and to have a very flexible schema.
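
To make "distributed across many nodes" more concrete, here is a minimal sketch of consistent hashing, a key-placement technique commonly associated with Dynamo-style stores. It is a toy under stated assumptions, not how any particular database implements it: the class name `HashRing`, the node names and the replica count are all made up for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to an integer that is identical on every run and machine."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical): each key lives on `replicas` nodes."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Place every node on the ring at the position given by its hash.
        self.ring = sorted((stable_hash(node), node) for node in nodes)
        self.positions = [position for position, _ in self.ring]

    def nodes_for(self, key):
        # Walk clockwise from the key's position: the first node found is the
        # primary, the following ones hold the replicas (wrapping around).
        start = bisect.bisect_right(self.positions, stable_hash(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


# Hypothetical four-node cluster; every key is stored on two of the nodes.
ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
for user in ["alice", "bob", "carol"]:
    print(user, "->", ring.nodes_for(user))
```

Because placement depends only on hashes, any client can compute where a key lives without asking a central coordinator, and adding a node moves only the keys adjacent to it on the ring.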
  • 18. Popular NoSQL databases
      - Cassandra
          - Developed at Facebook, in production use at Twitter, Rackspace, Reddit, and other large sites.
          - Cassandra is designed for high performance, reliability, and automatic replication. It has a very flexible data model. A new startup, Riptano, provides commercial support.
      - HBase
          - Part of the Apache Hadoop project, and modeled on Google's BigTable.
          - Suitable for extremely large databases (billions of rows, millions of columns), distributed across thousands of nodes. Along with Hadoop, commercial support is provided by Cloudera.
  • 19. Prevalence of Big Data
      - Big data is not limited to big companies like Facebook and Google.
      - According to a McKinsey Global Institute study in 2011:
          - Most investment firms in the US with fewer than 1,000 employees have 3.8 petabytes of data stored.
          - Companies in all sectors have at least 100 terabytes stored.
  • 22. Big data Technologies
      - Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.
      - The above definition incorporates all types of data (e.g. real-time, analytic) managed by next-generation systems.
  • 23. MapReduce
      - The MapReduce approach is basically a divide-and-conquer strategy for distributing an extremely large problem across an extremely large computing cluster.
      - In the "map" stage, a programming task is divided into a number of identical subtasks, which are then distributed across many processors.
      - In the "reduce" stage, the intermediate results are combined by a reduce task.
      - MapReduce provides a solution to Google's biggest problem, i.e. building very large search indexes.
  • 24. MapReduce
      - MapReduce has proven to be widely applicable to many large data problems, ranging from search to machine learning.
      - The most popular open source implementation of MapReduce is the Hadoop project.
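
The map and reduce stages described above can be sketched with the classic word-count example. This is a single-process illustration only: on a real cluster such as Hadoop the map subtasks would run in parallel on many machines, and the function names below (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for this sketch rather than taken from any framework.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs. Every chunk gets the same subtask."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate pairs by key so each word's values end up together."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all intermediate values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a cluster each chunk would be mapped on a different processor;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Made-up input, split into three chunks as if stored on three machines.
chunks = ["big data big clusters", "data beats opinions", "big wins"]
counts = word_count(chunks)
print(counts)
```

The map step never needs to see more than its own chunk, which is what allows the work to be spread across a large cluster; only the grouped intermediate results have to be brought together for the reduce step.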
  • 25. Applications of Big data Analysis
      - Facebook and LinkedIn use patterns of friendship relationships to suggest other people you may know, or should know, with frightening accuracy.
      - Amazon saves your searches, correlates what you search for with what other users search for, and uses it to create surprisingly appropriate recommendations.
      - Medical researchers sift through the health records of thousands of people to try to identify useful correlations between medical treatments and health outcomes.
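
As a rough illustration of suggesting people from "patterns of friendship relationships", the sketch below ranks non-friends by the number of mutual friends. This is a simple, well-known heuristic and only an assumption about the general idea; it is not the method Facebook or LinkedIn actually use, and the graph and the function name `suggest_friends` are invented for the example.

```python
from collections import Counter


def suggest_friends(graph, user, top_n=3):
    """Rank people the user is not yet connected to by their number of mutual friends."""
    friends = graph[user]
    mutual_counts = Counter()
    for friend in friends:
        for candidate in graph[friend]:
            # Skip the user and anyone who is already a direct friend.
            if candidate != user and candidate not in friends:
                mutual_counts[candidate] += 1
    return mutual_counts.most_common(top_n)


# Made-up friendship graph: each person maps to the set of their friends.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

print(suggest_friends(graph, "ann"))  # → [('dan', 2), ('eve', 1)]
```

Here "dan" ranks first because he shares two friends with "ann" (bob and cat), while "eve" shares only one.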
  • 27.
      - As data volumes grow exponentially, so does the concern over data preservation, access, dissemination, and usability.
      - Many agencies have taken initiatives to research areas such as automated analysis techniques, data mining, machine learning, privacy, and database interoperability. These will help to identify how big data can enable science in new ways and at new levels.