Introduction to Data Science.pptx

CS3352 - Foundations of Data Science
Introduction to Data Science
Dr.V.Anusuya
Associate Professor/IT
Ramco Institute of Technology
Rajapalayam
Introduction
• Data Science is the area of study which involves extracting
insights from vast amounts of data using various scientific
methods, algorithms and processes.
• Data Science is useful to discover hidden patterns from the
voluminous raw data.
• Data Science emerged from the evolution of mathematical
statistics, data analysis, and big data.
Contd.,
• Ingredients = Data
• Consider a chef who is starting out with a new dish.
• After the chef has determined what type of dish he would like to
make, he goes to the fridge to gather ingredients. If he doesn’t
have the necessary ingredients, he takes a trip to the store to
collect them.
• As data scientists, our data points are our ingredients. We may
have some on hand, but we still might have to collect more
through web scraping, SQL queries, etc.
Big Data
• Data Science focuses on processing the huge
volume of heterogeneous data known as big
data.
• Big data refers to the data sets that are large
and complex in nature and difficult to process
using traditional data processing application
software.
Contd.,
• The characteristics of big data are often referred
to as the four Vs:
• Volume: How much data is there?
• Variety: How diverse are the different types of
data?
• Velocity: At what speed is new data generated?
• Veracity: How accurate is the data?
• These four properties make big data different
from the data found in traditional data
management tools.
Contd.,
• The four Vs of big data make capturing, cleaning,
preprocessing, storing, searching, sharing,
transferring, and visualizing the data a complex task.
• Specialized techniques are required to extract
the insights from this huge volume of data.
• The main things that set a data scientist apart
from a statistician are the ability to work with
big data and experience in machine learning,
computing, and algorithm building.
Big data
• Tools: Hadoop, Pig, Spark, R, Python, and Java.
• Python is a great language for data science
because it has many data science libraries
available, and it’s widely supported by
specialized software.
Contd.,
• Data Science is an interdisciplinary field that
allows us to extract knowledge from structured
or unstructured data.
• It translates a business problem or research
project into a practical solution.
Benefits and uses of data science
• Data science and big data are used almost everywhere
in both commercial and noncommercial settings.
• Commercial companies in almost every industry use
data science and big data to gain insights into their
customers, processes, staff, competition, and products.
• Many companies use data science to offer customers a
better user experience, as well as to cross-sell, up-sell,
and personalize their offerings.
Benefits and uses of data science
• Governmental organizations are also aware of data’s value. Many
governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the
public.
• Nongovernmental organizations (NGOs) use it to raise money and
defend their causes.
• Universities use data science not only in their research but also to
enhance the study experience of their students. The rise of massive open
online courses (MOOCs) produces a lot of data, which allows universities to
study how this type of learning can complement traditional classes.
Facets of data
• In data science and big data we’ll come across
many different types of data, and each of them
tends to require different tools and techniques. The
main categories of data are these:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured Data
• Structured data is data that depends on a
data model and resides in a fixed field within
a record.
• It is often easy to store structured data in
tables within databases or Excel files.
• SQL, or Structured Query Language, is the
preferred way to manage and query data
that resides in databases.
• Some structured data, such as hierarchical data
like a family tree, is harder to store in a
traditional relational table.
Example
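As a minimal sketch of structured data queried with SQL, the snippet below uses Python's built-in sqlite3 module. The table name, its columns, and the rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the "people" table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("Asha", 34, "Chennai"), ("Ravi", 28, "Madurai"), ("Meena", 41, "Chennai")],
)

# Every record has the same fixed fields, so a query can address them by name.
rows = conn.execute(
    "SELECT name, age FROM people WHERE city = ? ORDER BY age", ("Chennai",)
).fetchall()
print(rows)
conn.close()
```

Because each record follows the same data model, filtering and sorting need nothing more than the column names.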
Unstructured Data
• Unstructured data is data that isn’t easy to fit
into a data model because the content is
context-specific or varying.
• An example of unstructured data is your regular
email.
• Although email contains structured elements
such as the sender, title, and body text, the
content of the body itself is free-form text.
• The thousands of different languages and
dialects out there further complicate the analysis.
Example
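The contrast between the structured and unstructured parts of an email can be sketched with Python's standard email module. The message below is invented for illustration.

```python
from email import message_from_string

# A made-up message, used only for illustration.
raw = """From: customer@example.com
To: support@example.com
Subject: Late delivery

Hi, my order still has not arrived and nobody answers the phone.
"""

msg = message_from_string(raw)

# The headers are structured: each one is a named field.
sender = msg["From"]
subject = msg["Subject"]

# The body is free-form text with no fixed fields to query.
body = msg.get_payload()

print(sender, "|", subject)
print(body)
```

The sender and title can be read by name, while anything inside the body has to be found by interpreting the text itself.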
Natural language
• Natural language is a special type of unstructured data; it’s challenging to
process because it requires knowledge of specific data science techniques
and linguistics.
• The natural language processing community has had success in entity
recognition, speech recognition, summarization, text completion, and
sentiment analysis, but models trained in one domain don’t generalize well to
other domains.
• Even state-of-the-art techniques aren’t able to decipher the meaning of every
piece of text.
• Example: Personal voice assistants such as Siri and Alexa recognize patterns
in speech and provide a useful response.
• Example: Email filters that sort messages into Primary, Social, and Promotions.
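A rough sketch of sentiment analysis, and of why text is hard to interpret, is a scorer that simply counts words from two small lists. The word lists and sentences are assumptions made for this example; practical systems learn from labeled data instead of fixed lists.

```python
# Tiny hand-made word lists; real systems learn these from data.
POSITIVE = {"good", "great", "love", "useful"}
NEGATIVE = {"bad", "poor", "hate", "slow"}

def sentiment_score(text):
    """Return (#positive words - #negative words) for a piece of text."""
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

score_pos = sentiment_score("The service was great and the staff were useful.")
score_neg = sentiment_score("The delivery was slow and the support was poor.")
# Word counting misses context: "not good" still scores as positive.
score_negated = sentiment_score("This was not good.")

print(score_pos, score_neg, score_negated)
```

The last sentence is scored as positive even though a human reads it as negative, which is one small illustration of why meaning cannot be recovered from every piece of text.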
Machine-generated data
• Machine-generated data is information that’s
automatically created by a computer, process,
application, or other machine without human
intervention.
• Machine-generated data is becoming a major
data resource and will continue to be one.
• The analysis of machine data relies on highly
scalable tools, due to its high volume and
speed. Examples of machine data are web
server logs, call detail records, network event
logs, and telemetry.
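As a small sketch of working with machine data, the snippet below parses one web server log line with a regular expression. The log line is invented and assumed to follow the Common Log Format; real servers may be configured with a different layout.

```python
import re

# One made-up line in the Common Log Format.
line = '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

pattern = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Turn the raw text line into a record with named fields.
match = pattern.match(line)
record = match.groupdict()
record["status"] = int(record["status"])
print(record)
```

At realistic volumes the same parsing step would run inside a scalable tool rather than a single script.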
Graph-based or Networked Data
• A graph is a mathematical structure to model
pair-wise relationships between objects.
• Graph or network data is, in short, data that
focuses on the relationship or adjacency of
objects.
• Graph structures use nodes, edges, and
properties to represent and store graph data.
Contd.,
• Graph-based data is a natural way to represent
social networks, and its structure allows you to
calculate specific metrics such as the influence
of a person and the shortest path between two
people.
• Examples:
• LinkedIn: connections between business colleagues;
Twitter: follower lists.
• Facebook: edges connecting people who are
friends.
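The two metrics mentioned above can be sketched on a tiny, hypothetical friendship network. Breadth-first search finds a shortest path between two people, and the number of direct connections (degree) is used here as a very rough stand-in for influence; the names and links are made up.

```python
from collections import deque

# Hypothetical friendship graph as an undirected adjacency list.
friends = {
    "Anu": ["Bala", "Chitra"],
    "Bala": ["Anu", "Deepa"],
    "Chitra": ["Anu", "Deepa"],
    "Deepa": ["Bala", "Chitra", "Eswar"],
    "Eswar": ["Deepa"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

# Degree (number of direct connections) as a simple influence measure.
degree = {person: len(links) for person, links in friends.items()}

path = shortest_path(friends, "Anu", "Eswar")
most_connected = max(degree, key=degree.get)
print(path, most_connected)
```

Breadth-first search is enough here because every edge counts the same; weighted relationships would call for a different algorithm.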
Graph-based structure
• Graph-based data is queried with specialized
query languages such as SPARQL.
Audio, image, and video
• Audio, image, and video are data types that
pose specific challenges to a data scientist.
• Tasks that are trivial for humans, such as
recognizing objects in pictures, turn out to be
challenging for computers.
• MLBAM (Major League Baseball Advanced
Media) announced in 2014 that it would increase
video capture to approximately 7 TB per game
for the purpose of live, in-game analytics.
Contd.,
• A company called DeepMind succeeded at
creating an algorithm that is capable of
learning how to play video games.
• This algorithm takes the video screen as input
and learns to interpret everything via a
complex process of deep learning.
Streaming Data
• Streaming data flows into the system when an
event happens instead of being loaded into a
data store in a batch.
• Examples are the “What’s trending” list on
Twitter, live sporting or music events, and the
stock market.
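The difference between streaming and batch processing can be sketched with a Python generator that updates a running average as each event arrives. The price values and function names are hypothetical stand-ins for a live feed.

```python
def price_stream():
    """Stand-in for a live feed; the ticks are made-up stock prices."""
    for price in [101.0, 102.5, 99.5, 103.0]:
        yield price

def running_average(stream):
    """Update the average as each event arrives, without storing the batch."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count

averages = list(running_average(price_stream()))
print(averages)
```

Only the count and the running total are kept in memory, so the same approach works for a feed that never ends.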