Frieda Brioschi - frieda.brioschi@gmail.com
Emma Tracanella - emma.tracanella@gmail.com
AROUND DATA SCIENCE
LESSON 5 - 2020/21
LESSON 5
DESCRIBE YOUR PROJECT
Photo by William Iven on Unsplash
A COUPLE OF DIGRESSIONS
▸ storage issues
▸ http://blog.odsi.co.uk/wp-content/uploads/2013/08/History-of-computer-data-storage.png.jpg
▸ the rise of data centers
▸ computational power
▸ the Internet
MARGARET HAMILTON
DATA CENTER CLOUD (4,563 IN 2019)
https://www.digitalic.it/tecnologia/data-center-cloud-numeri-e-diffusione-nel-mondo-litalia-tra-i-paesi-europei-che-ne-ospita-di-piu
WHAT ARE BIG DATA
Photo by ev on Unsplash
DEFINITION
The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to
process using traditional methods. The concept of big data gained momentum in the early 2000s
when industry analyst Doug Laney articulated the definition of big data as the three V’s:
▸ Volume: Organizations collect data from a variety of sources, including business transactions,
smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it
would have been a problem.
▸ Velocity: With the growth of the Internet of Things, data streams into businesses at an
unprecedented speed and must be handled in a timely manner, often in near-real time.
▸ Variety: Data comes in all types of formats, from structured, numeric data in traditional
databases to unstructured text documents, emails, videos, audio, stock ticker data and financial
transactions.
(ACCORDING TO SAS)
https://www.domo.com/learn/data-never-sleeps-8
https://www.visualcapitalist.com/big-data-keeps-getting-bigger/
CORRELATION
When two sets of data are strongly linked, we say they have a high correlation.
▸ Correlation is positive when the values increase together, and
▸ Correlation is negative when one value decreases as the other increases.
The correlation coefficient ranges from -1 to 1:
▸ 1 is a perfect positive correlation
▸ 0 is no linear correlation (the values show no linear link, although another kind of relationship may still exist)
▸ -1 is a perfect negative correlation
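The scale above refers to the Pearson correlation coefficient, the most common measure of linear correlation. A minimal sketch in plain Python computes it as the sum of co-deviations divided by the square root of the product of the squared deviations; the function name `pearson_r` and the sample values are illustrative, not taken from the lesson.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
print(pearson_r(x, [3, 1, 4, 1, 5]))   # weak positive, about 0.35
```

On Python 3.10 and later, `statistics.correlation` in the standard library returns the same coefficient.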
CORRELATION
Correlation is one of the most widely used statistical concepts.
Since the term "correlation" refers to a mutual relationship or association between
quantities, why is it a useful metric?
▸ Correlation can help in predicting one quantity from another
▸ Correlation can (but often does not) indicate the presence of a causal
relationship
▸ Correlation is used as a basic quantity and foundation for many other
modeling techniques
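The first point can be made concrete with simple linear regression: the least-squares slope for predicting y from x equals r * s_y / s_x, where r is the correlation coefficient and s_x, s_y are the sample standard deviations. The sketch below assumes a roughly linear relationship; the function names and the advertising/sales figures are hypothetical. A good fit on its own says nothing about causation.

```python
import statistics


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x.

    The slope is written as r * (s_y / s_x) to show how the correlation
    coefficient r links the two variables.
    """
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    # Sample covariance, then Pearson r
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)
    r = cov / (sx * sy)
    slope = r * sy / sx
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Predict y for a new x with a fitted (slope, intercept) pair."""
    slope, intercept = model
    return intercept + slope * x


# Hypothetical data: advertising spend (x) vs. sales (y)
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]
model = fit_line(spend, sales)
print(model)              # slope about 2, intercept about 1
print(predict(model, 6))  # about 13
```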
https://wearesocial.com/blog/2020/07/digital-use-around-the-world-in-july-2020
LINKED DATA
LINKED DATA / LOD
Linked data is structured data which is interlinked with other data so it becomes
more useful through semantic queries. It builds upon standard Web technologies
but rather than using them to serve web pages only for human readers, it extends
them to share information in a way that can be read automatically by computers.
Part of the vision of linked data is for the Internet to become a global database.
Linked data may also be open data, in which case it is usually described as linked
open data (LOD).
▸ https://en.wikipedia.org/wiki/Linked_data
SCHEMA.ORG
http://schema.org/docs/full.html
GOOGLE KNOWLEDGE GRAPH
https://www.youtube.com/watch?v=mmQl6VGvX-c
WHY LINKED DATA MATTERS
Linked data is a method for publishing structured data using vocabularies like
schema.org that can be connected together and interpreted by machines. Using
linked data, statements encoded in triples can be spread across different
websites.
This enables data from different sources to be connected and queried.
▸ https://wordlift.io/blog/en/entity/linked-data/
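A linked-data statement is a (subject, predicate, object) triple. The sketch below uses plain Python tuples rather than an RDF library, with made-up `example.org` identifiers and schema.org property URIs, to show how triples published by two different sources can be merged and followed across a shared identifier. Real deployments would typically publish RDF (for example as JSON-LD) and query it with SPARQL; the helper functions here are hypothetical.

```python
SCHEMA = "https://schema.org/"
EX = "http://example.org/"  # hypothetical identifiers

# Source A (e.g. a biography site) describes a person
source_a = {
    (EX + "person/ada", SCHEMA + "name", "Ada Lovelace"),
    (EX + "person/ada", SCHEMA + "birthPlace", EX + "place/london"),
}
# Source B (e.g. a gazetteer) describes a place, using the same identifier
source_b = {
    (EX + "place/london", SCHEMA + "name", "London"),
    (EX + "place/london", SCHEMA + "containedInPlace", EX + "place/uk"),
}

# Merging the two sources is just the union of their sets of triples
graph = source_a | source_b


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def birthplace_names(graph, person):
    """Follow birthPlace (from source A) to the place's name (from source B)."""
    return [
        name
        for place in objects(graph, person, SCHEMA + "birthPlace")
        for name in objects(graph, place, SCHEMA + "name")
    ]


print(birthplace_names(graph, EX + "person/ada"))  # ['London']
```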
CLASSICAL DATA MINING
Photo by ev on Unsplash
CONTEXT
You don’t have to be a fancy statistician to do data mining, but you do
have to know something about what the data signies and how the
business works.
Only when you understand the data and the problem that you need to
solve can data-mining processes help you to discover useful
information and put it to use.
NINE LAWS OF DATA MINING - 1
Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining”
to guide new data miners as they get down to work.
▸ 1. “Business Goals Law”
Business objectives are the origin of every data mining solution.
A data miner is someone who discovers useful information from data to support
specific business goals. Data mining isn’t defined by the tool you use.
▸ 2. “Business Knowledge Law”
Business knowledge is central to every step of the data mining process.
You don’t have to be a fancy statistician to do data mining, but you do have to
know something about what the data signifies and how the business works.
NINE LAWS OF DATA MINING - 2
▸ 3. “Data Preparation Law”
Data preparation is more than half of every data mining process.
Pretty much every data miner will spend more time on data preparation than on
analysis.
▸ 4. “No Free Lunch for the Data Miner”
The right model for a given application can only be discovered by experiment.
In data mining, models are selected through trial and error.
▸ 5. “Patterns”
There are always patterns in the data.
As a data miner, you explore data in search of useful patterns. Understanding patterns
in the data enables you to influence what happens in the future.
NINE LAWS OF DATA MINING - 3
▸ 6. “Insight Law”
Data mining amplifies perception in the business domain.
Data mining methods enable you to understand your business better than you
could have done without them.
▸ 7. “Prediction Law”
Prediction increases information locally by generalization.
Data mining helps us use what we know to make better predictions (or
estimates) of things we don’t know.
NINE LAWS OF DATA MINING - 4
▸ 8. “Value Law”
The value of data mining results is not determined by the accuracy or stability
of predictive models.
Accuracy and stability matter, but they are not goals in themselves: a model is
valuable when its predictions consistently support the business objective.
▸ 9. “Law of Change”
All patterns are subject to change.
Any model that gives you great predictions today may be useless tomorrow.
PHASES OF THE DATA MINING PROCESS
The Cross-Industry Standard Process for
Data Mining (CRISP-DM) is the dominant
data-mining process framework. It’s an
open standard; anyone may use it.
BUSINESS UNDERSTANDING
Get a clear understanding of the problem you’re out to solve, how it impacts your
organization, and your goals for addressing it.
Tasks in this phase include:
▸ Identifying your business goals
▸ Assessing your situation
▸ Defining your data mining goals
▸ Producing your project plan
DATA UNDERSTANDING
Review the data that you have, document it, identify data management and data quality
issues.
Tasks in this phase include:
▸ Gathering data
▸ Describing
▸ Exploring
▸ Verifying quality
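The describing and verifying steps can be sketched in plain Python. The record layout (`id`, `age`, `city`), the valid age range and the function `quality_report` are hypothetical; the checks shown (missing values, duplicate identifiers, out-of-range values) are common examples, not an exhaustive list.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value
records = [
    {"id": 1, "age": 34, "city": "Milano"},
    {"id": 2, "age": None, "city": "Torino"},
    {"id": 3, "age": 212, "city": "Milano"},
    {"id": 3, "age": 41, "city": None},
]


def quality_report(rows, fields, valid_ranges):
    """Describe a dataset and flag common data quality issues."""
    # Describing: how many records, how many missing values per field
    missing = {f: sum(1 for r in rows if r.get(f) is None) for f in fields}
    # Verifying: identifiers that occur more than once
    id_counts = Counter(r["id"] for r in rows)
    duplicates = sorted(i for i, c in id_counts.items() if c > 1)
    # Verifying: values outside a plausible range
    out_of_range = {}
    for field, (low, high) in valid_ranges.items():
        out_of_range[field] = [
            r["id"] for r in rows
            if r.get(field) is not None and not low <= r[field] <= high
        ]
    return {"rows": len(rows), "missing": missing,
            "duplicate_ids": duplicates, "out_of_range": out_of_range}


print(quality_report(records, ["id", "age", "city"], {"age": (0, 120)}))
```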
DATA PREPARATION
Get your data ready to use for modeling.
Tasks in this phase include:
▸ Selecting data
▸ Cleaning data
▸ Constructing
▸ Integrating
▸ Formatting
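A minimal sketch of these preparation tasks on two hypothetical sources (customers and orders). The field names, the reference year used to derive age, and the choice to drop rows with a missing birth year (rather than impute them) are assumptions made for illustration.

```python
# Hypothetical raw inputs from two sources, with values stored as text
customers = [
    {"id": 1, "name": " Anna ", "birth_year": "1988"},
    {"id": 2, "name": "Luca", "birth_year": ""},
    {"id": 3, "name": "Marta", "birth_year": "1975"},
]
orders = [
    {"customer_id": 1, "amount": "19.90"},
    {"customer_id": 1, "amount": "5.10"},
    {"customer_id": 3, "amount": "42.00"},
]

REFERENCE_YEAR = 2021  # assumption: year used to derive age


def prepare(customers, orders):
    # Cleaning: strip whitespace, convert types, drop rows without a birth year
    cleaned = []
    for c in customers:
        if not c["birth_year"].strip():
            continue
        cleaned.append({"id": c["id"], "name": c["name"].strip(),
                        "birth_year": int(c["birth_year"])})
    # Integrating: total order amount per customer from the second source
    totals = {}
    for o in orders:
        cid = o["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(o["amount"])
    prepared = []
    for c in cleaned:
        prepared.append({
            "name": c["name"],                                # Selecting: keep useful fields
            "age": REFERENCE_YEAR - c["birth_year"],          # Constructing: derived attribute
            "total_spent": round(totals.get(c["id"], 0.0), 2),  # Formatting: 2 decimals
        })
    return prepared


print(prepare(customers, orders))
```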
MODELING
Use mathematical techniques to identify patterns within your data.
Tasks in this phase include:
▸ Selecting techniques
▸ Designing tests
▸ Building models
▸ Assessing models
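The modeling tasks can be illustrated with a toy example: design a test by holding out part of the data, build two candidate models (a mean baseline and a least-squares line), and assess both on the held-out records with the mean absolute error. The data, the 25% test fraction, the seed and all function names are assumptions; real projects would typically rely on a library such as scikit-learn.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Designing tests: hold out a random subset of rows for evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(train):
    """Baseline model: always predict the training mean of y."""
    mean_y = statistics.mean([y for _, y in train])
    return lambda x: mean_y


def fit_linear(train):
    """Least-squares line y = a + b * x."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def mean_absolute_error(model, test):
    """Assessing models: average absolute prediction error on held-out data."""
    return statistics.mean([abs(model(x) - y) for x, y in test])


# Hypothetical data: a linear trend plus small alternating noise
data = [(x, 3.0 + 2.0 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, test = train_test_split(data)
for name, fit in [("mean baseline", fit_mean), ("linear", fit_linear)]:
    model = fit(train)  # Building models
    print(name, round(mean_absolute_error(model, test), 3))
```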
EVALUATION
Review the patterns you have discovered and assess their potential for business
use.
Tasks in this phase include:
▸ Evaluating results
▸ Reviewing the process
▸ Determining the next steps
DEPLOYMENT
Put your discoveries to work in everyday business. 
Tasks in this phase include:
▸ Planning deployment (your methods for integrating data mining discoveries
into use)
▸ Reporting final results
▸ Reviewing final results
CLASSICAL DATA AGGREGATION
Photo by ev on Unsplash
DATA AGGREGATION
Data aggregation is the process where raw data is gathered and expressed in a summary
form for statistical analysis.
For example, raw data can be aggregated over a given time period to provide statistics. After
the data is aggregated and written to a view or report, you can analyze the aggregated data
to gain insights about particular resources or resource groups.
There are two types of data aggregation:
▸ Time aggregation - All data points for a single resource over a specified time period.
▸ Spatial aggregation - All data points for a group of resources over a specified
geographical area.
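Both kinds of aggregation can be sketched with the standard library. The readings, resource names, regions and the choice of the mean as summary statistic are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sensor readings: (resource, region, day, value)
readings = [
    ("sensor-1", "north", "2021-03-01", 10.0),
    ("sensor-1", "north", "2021-03-01", 14.0),
    ("sensor-1", "north", "2021-03-02", 12.0),
    ("sensor-2", "north", "2021-03-01", 20.0),
    ("sensor-3", "south", "2021-03-01", 7.0),
]


def time_aggregate(rows, resource):
    """Time aggregation: all data points of one resource, summarized per day."""
    by_day = defaultdict(list)
    for res, _, day, value in rows:
        if res == resource:
            by_day[day].append(value)
    return {day: mean(vals) for day, vals in sorted(by_day.items())}


def spatial_aggregate(rows, day):
    """Spatial aggregation: all resources in each geographical area, for one day."""
    by_region = defaultdict(list)
    for _, region, d, value in rows:
        if d == day:
            by_region[region].append(value)
    return {region: mean(vals) for region, vals in sorted(by_region.items())}


print(time_aggregate(readings, "sensor-1"))
print(spatial_aggregate(readings, "2021-03-01"))
```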
SUMMARY STATISTICS
When data is aggregated, groups of observations are replaced with summary statistics based on those observations.
Summary statistics are used to communicate the largest amount of information as simply as possible.
▸ Mean
▸ Count
▸ Maximum
▸ Median
▸ Minimum
▸ Mode
▸ Range
▸ Sum
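Python's `statistics` module, together with the built-ins `len`, `min`, `max` and `sum`, covers the statistics listed above. A minimal sketch on hypothetical daily order counts; the function name `summarize` is illustrative.

```python
import statistics


def summarize(values):
    """Replace a group of observations with summary statistics."""
    return {
        "mean": statistics.mean(values),
        "count": len(values),
        "maximum": max(values),
        "median": statistics.median(values),
        "minimum": min(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sum": sum(values),
    }


# Hypothetical daily order counts
orders_per_day = [4, 8, 15, 16, 23, 42, 8]
print(summarize(orders_per_day))
```

Since Python 3.8, `statistics.mode` returns the first mode it encounters when several values are equally frequent; `statistics.multimode` returns all of them.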
TABLES
Tables are the format in which most numerical data are initially stored and analysed and
are likely to be the means you use to organise data collected during experiments and
dissertation research.
Tables are an effective way of presenting data:
▸ when you wish to show how a single category of information varies when
measured at different points (in time or space).
▸ when the dataset contains relatively few numbers.
▸ when the precise value is crucial to your argument and a graph would not convey it.
BAR CHARTS
Bar charts are one of the most commonly
used types of graph and are used to display
and compare the number, frequency or other
measure for different discrete categories or
groups.
The bars can be drawn either vertically or
horizontally depending upon the number of
categories and length or complexity of the
category labels.
HISTOGRAMS
Histograms are a special form of bar chart
where the data represent continuous rather
than discrete categories. Since a
continuous category may have a large
number of possible values the data are
often grouped to reduce the number of data
points.
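Grouping continuous values into equal-width intervals (bins) is the step that turns raw measurements into histogram bars. A minimal text-based sketch; the heights, the 5 cm bin width, the starting point and the function name `histogram` are illustrative assumptions.

```python
import math


def histogram(values, bin_width, start=0.0):
    """Count values in equal-width bins [lower, lower + bin_width), keyed by lower edge."""
    counts = {}
    for v in values:
        index = math.floor((v - start) / bin_width)  # which bin v falls into
        lower = start + index * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


BIN_WIDTH = 5
heights_cm = [158.2, 161.5, 163.0, 167.9, 170.1, 171.4, 172.8, 176.0, 181.3]
for lower, count in histogram(heights_cm, BIN_WIDTH, start=155).items():
    # One text "bar" per bin; empty bins are simply not listed in this sketch
    print(f"{lower}-{lower + BIN_WIDTH} cm: {'#' * count}")
```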
PIE CHARTS
Pie charts are a visual way of displaying how
the total data are distributed between different
categories. Pie charts should only be used for
displaying nominal data. They are generally
best for showing information grouped into a
small number of categories and are a
graphical way of displaying data that might
otherwise be presented as a simple table.
Pie chart of populations of English native speakers
LINE GRAPHS
Line graphs are usually used to show time
series data – that is how one or more
variables vary over a continuous period of
time. Line graphs are particularly useful for
identifying patterns and trends in the data
such as seasonal effects, large changes and
turning points. As well as time series data,
line graphs can also be appropriate for
displaying data that are measured over other
continuous variables such as distance.
WHAT IS DATA SCIENCE
Photo by ev on Unsplash
DEFINITION
Data Science is a blend of various tools, algorithms, and machine learning
principles with the goal of discovering hidden patterns in raw data and solving
analytically complex problems.
APPLICATIONS OF DATA SCIENCE
EXPLAINING VS PREDICTING
By 2020, more than 80% of data was expected
to be unstructured. This data is
generated from different sources such as
financial logs, text files, multimedia
forms, sensors, and instruments.
https://databasetown.com/introduction-to-data-science-a-beginners-guide/#What_is_Data_Science
The Data Scientist can handle raw data using the latest technologies and
techniques, perform the necessary analysis, and present the resulting
knowledge to colleagues in an informative way.
The Data Analyst works with R, Python and SQL; the role combines technical
and analytical knowledge.
The Data Architect integrates, centralizes, protects and maintains data
sources.
The Statistician can be seen as the pioneer of the data science field. Statisticians
often extract the information from the data and transform it into actionable
insights.
The Database Administrator ensures that the database is accessible to every
stakeholder in the organization and applies the necessary security measures
to keep the stored data safe.
The Business Analyst is probably the least technical profile, but has a deep
understanding of the business processes that are in place and often acts as
the intermediary between business people and technicians.
The Data and Analytics Manager steers the direction of the data science
team, combining strong, specialized skills in a range of technologies
(SQL, R, SAS, …) with the interpersonal skills needed to lead a group.
SOME EXAMPLES
Photo by Jaredd Craig on Unsplash
THE NY TIMES
https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 

KĂźrzlich hochgeladen (20)

Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAĐĄY_INDEX-DM_23-1-final-eng.pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 

Around Data Science (v. 2021 ITA)

  • 1. Frieda Brioschi - frieda.brioschi@gmail.com Emma Tracanella - emma.tracanella@gmail.com AROUND DATA SCIENCE LESSON 5 - 2020/21
  • 2. DESCRIBE YOUR PROJECT (Photo by William Iven on Unsplash)
  • 3. A COUPLE OF DIGRESSIONS ▸ storage issues ▸ http://blog.odsi.co.uk/wp-content/uploads/2013/08/History-of-computer-data-storage.png.jpg ▸ the rise of data centers ▸ computational power ▸ the Internet
  • 5. DATA CENTER CLOUD (4,563 IN 2019) https://www.digitalic.it/tecnologia/data-center-cloud-numeri-e-diffusione-nel-mondo-litalia-tra-i-paesi-europei-che-ne-ospita-di-piu
  • 6. WHAT IS BIG DATA? (Photo by ev on Unsplash)
  • 7. DEFINITION The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The concept of big data gained momentum in the early 2000s when industry analyst Doug Laney articulated the definition of big data as the three V’s: ▸ Volume: Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, social media and more. In the past, storing it would have been a problem. ▸ Velocity: With the growth of the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner, often in near real time. ▸ Variety: Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.
  • 12. CORRELATION When two sets of data are strongly linked together, we say they have a high correlation. ▸ Correlation is positive when the values increase together, and ▸ correlation is negative when one value decreases as the other increases. Correlation takes a value between -1 and 1: ▸ 1 is a perfect positive correlation ▸ 0 is no linear correlation (the values show no straight-line relationship) ▸ -1 is a perfect negative correlation
  • 13. CORRELATION Correlation is one of the most widely used statistical concepts. Since the term "correlation" refers to a mutual relationship or association between quantities, why is it a useful metric? ▸ Correlation can help in predicting one quantity from another ▸ Correlation can (but often does not) indicate the presence of a causal relationship ▸ Correlation is used as a basic quantity and foundation for many other modeling techniques
  • 19. LINKED DATA / LOD Linked data is structured data that is interlinked with other data, so it becomes more useful through semantic queries. It builds upon standard Web technologies, but rather than using them only to serve web pages to human readers, it extends them to share information in a way that computers can read automatically. Part of the vision of linked data is for the Internet to become a global database. Linked data may also be open data, in which case it is usually described as linked open data (LOD). ▸ https://en.wikipedia.org/wiki/Linked_data
  • 21. GOOGLE KNOWLEDGE GRAPH https://www.youtube.com/watch?v=mmQl6VGvX-c
  • 22. WHY LINKED DATA MATTERS Linked data is a method for publishing structured data using vocabularies like schema.org that can be connected together and interpreted by machines. Using linked data, statements encoded in triples can be spread across different websites. This enables data from different sources to be connected and queried. ▸ https://wordlift.io/blog/en/entity/linked-data/
  • 24. CONTEXT You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works. Only when you understand the data and the problem that you need to solve can data-mining processes help you to discover useful information and put it to use.
  • 25. NINE LAWS OF DATA MINING - 1 Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining” to guide new data miners as they get down to work: ▸ 1. “Business Goals Law”
 Business objectives are the origin of every data mining solution. A data miner is someone who discovers useful information from data to support specific business goals. Data mining isn’t defined by the tool you use. ▸ 2. “Business Knowledge Law”
 Business knowledge is central to every step of the data mining process. You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works.
  • 26. NINE LAWS OF DATA MINING - 2 ▸ 3. “Data Preparation Law”
 Data preparation is more than half of every data mining process. Pretty much every data miner will spend more time on data preparation than on analysis. ▸ 4. “No Free Lunch for the Data Miner”
 The right model for a given application can only be discovered by experiment. In data mining, models are selected through trial and error. ▸ 5. “Patterns”
 There are always patterns in the data. As a data miner, you explore data in search of useful patterns. Understanding patterns in the data enables you to influence what happens in the future.
  • 27. NINE LAWS OF DATA MINING - 3 ▸ 6. “Insight Law”
 Data mining amplifies perception in the business domain. Data mining methods enable you to understand your business better than you could have done without them. ▸ 7. “Prediction Law”
 Prediction increases information locally by generalization. Data mining helps us use what we know to make better predictions (or estimates) of things we don’t know.
  • 28. NINE LAWS OF DATA MINING - 4 ▸ 8. “Value Law”
 The value of data mining results is not determined by the accuracy or stability of predictive models. What counts is that your model produces predictions that are useful for the business, consistently. ▸ 9. “Law of Change”
 All patterns are subject to change. Any model that gives you great predictions today may be useless tomorrow.
  • 29. PHASES OF THE DATA MINING PROCESS The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It’s an open standard; anyone may use it.
  • 30. BUSINESS UNDERSTANDING Get a clear understanding of the problem you’re out to solve, how it impacts your organization, and your goals for addressing it. Tasks in this phase include: ▸ Identifying your business goals ▸ Assessing your situation ▸ Defining your data mining goals ▸ Producing your project plan
  • 31. DATA UNDERSTANDING Review the data that you have, document it, and identify data management and data quality issues. Tasks in this phase include: ▸ Gathering data ▸ Describing ▸ Exploring ▸ Verifying quality
  • 32. DATA PREPARATION Get your data ready to use for modeling. Tasks in this phase include: ▸ Selecting data ▸ Cleaning data ▸ Constructing ▸ Integrating ▸ Formatting
  • 33. MODELING Use mathematical techniques to identify patterns within your data. Tasks in this phase include: ▸ Selecting techniques ▸ Designing tests ▸ Building models ▸ Assessing models
  • 34. EVALUATION Review the patterns you have discovered and assess their potential for business use. Tasks in this phase include: ▸ Evaluating results ▸ Reviewing the process ▸ Determining the next steps
  • 35. DEPLOYMENT Put your discoveries to work in everyday business. Tasks in this phase include: ▸ Planning deployment (your methods for integrating data mining discoveries into use) ▸ Reporting final results ▸ Reviewing final results
  • 37. DATA AGGREGATION Data aggregation is the process where raw data is gathered and expressed in a summary form for statistical analysis. For example, raw data can be aggregated over a given time period to provide statistics. After the data is aggregated and written to a view or report, you can analyze the aggregated data to gain insights about particular resources or resource groups. There are two types of data aggregation: ▸ Time aggregation: all data points for a single resource over a specified time period. ▸ Spatial aggregation: all data points for a group of resources over a specified geographical area.
  • 38. SUMMARY STATISTICS When data is aggregated, groups of observations are replaced with summary statistics based on those observations. Summary statistics are used to communicate the largest amount of information as simply as possible. ▸ Mean ▸ Count ▸ Maximum ▸ Median ▸ Minimum ▸ Mode ▸ Range ▸ Sum
  • 39. TABLES Tables are the format in which most numerical data are initially stored and analysed and are likely to be the means you use to organise data collected during experiments and dissertation research. Tables are an effective way of presenting data: ▸ when you wish to show how a single category of information varies when measured at different points (in time or space). ▸ when the dataset contains relatively few numbers. ▸ when the precise value is crucial to your argument and a graph would not convey
  • 40. BAR CHARTS Bar charts are one of the most commonly used types of graph and are used to display and compare the number, frequency or other measure for different discrete categories or groups. The bars can be drawn either vertically or horizontally depending upon the number of categories and length or complexity of the category labels.
  • 41. HISTOGRAMS Histograms are a special form of bar chart where the data represent continuous rather than discrete categories. Since a continuous category may have a large number of possible values, the data are often grouped into intervals (bins) to reduce the number of data points.
  • 42. PIE CHARTS Pie charts are a visual way of displaying how the total data are distributed between different categories. Pie charts should only be used for displaying nominal data. They are generally best for showing information grouped into a small number of categories and are a graphical way of displaying data that might otherwise be presented as a simple table. (Figure: pie chart of populations of English native speakers)
  • 43. LINE GRAPHS Line graphs are usually used to show time series data, that is, how one or more variables vary over a continuous period of time. Line graphs are particularly useful for identifying patterns and trends in the data such as seasonal effects, large changes and turning points. As well as time series data, line graphs can also be appropriate for displaying data that are measured over other continuous variables such as distance.
  • 44. WHAT IS DATA SCIENCE? (Photo by ev on Unsplash)
  • 45. DEFINITION Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data and solving analytically complex problems.
  • 46. APPLICATION OF DATA SCIENCE
  • 48. EXPLAINING VS PREDICTING It was forecast that by 2020 more than 80% of data would be unstructured. This data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments.
  • 51. The Data Scientist can handle raw data using the latest technologies and techniques, perform the necessary analysis, and present the resulting knowledge to colleagues in an informative way.
  • 52. The Data Analyst works with R, Python and SQL; the role combines technical and analytical knowledge.
  • 53. The Data Architect integrates, centralizes, protects and maintains data sources.
  • 54. The Statistician can be seen as the pioneer of the data science field. Statisticians often extract the information from the data and transform it into actionable insights.
  • 55. The Database Administrator ensures that the database is accessible to every stakeholder in the organization and takes the necessary security measures to keep the stored data safe.
  • 56. The Business Analyst is probably the least technical profile but has a deep understanding of the business processes in place, and often acts as the intermediary between the business side and the technical staff.
  • 57. The Data and Analytics Manager steers the direction of the data science team, combining strong, specialized skills in a range of technologies (SQL, R, SAS, …) with the interpersonal skills needed to lead a team.
  • 59. THE NY TIMES https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html
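Slide 12 describes correlation as a number between -1 and 1. Below is a minimal sketch of one common choice, the Pearson coefficient. It assumes two equal-length numeric samples, and the function name `pearson_r` is ours rather than the slides'. Pearson's r measures only linear association, so a value near 0 does not rule out a curved relationship.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples.

    The result lies in [-1, 1]: 1 is a perfect positive linear correlation,
    0 means no linear correlation, and -1 is a perfect negative one.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (proportional to the covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance)
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return cov / math.sqrt(ss_x * ss_y)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))        # values increase together: 1.0
    print(pearson_r([1, 2, 3], [6, 4, 2]))        # one falls as the other rises: -1.0
    print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```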
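Slide 13 notes that correlation can help predict one quantity from another. One common way to do this is a least-squares regression line, whose slope equals r * (s_y / s_x). The slides do not name a method, so this choice is ours. The minimal sketch below uses hypothetical function names `fit_line` and `predict`.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y ~ a + b * x by ordinary least squares and return (a, b).

    The slope b equals r * (s_y / s_x), where r is Pearson's correlation
    and s_x, s_y are the standard deviations of x and y.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # the line passes through (mean_x, mean_y)
    return a, b


def predict(a: float, b: float, x: float) -> float:
    """Predict y for a new x using the fitted line."""
    return a + b * x


if __name__ == "__main__":
    a, b = fit_line([1, 2, 3], [3, 5, 7])
    print(a, b)               # 1.0 2.0
    print(predict(a, b, 10))  # 21.0
```

A stronger correlation makes such predictions tighter. As the slide warns, though, a good fit does not by itself show that x causes y.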
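Slide 22 says that statements encoded as triples can be spread across different websites and then connected and queried. The toy sketch below illustrates this idea and is our own assumption, not a real RDF library. It keeps (subject, predicate, object) triples from two hypothetical publishers under example.org IRIs, reuses schema.org property names, merges the two sets, and follows a link from one source into the other. Production systems would use RDF tooling and a query language such as SPARQL.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

SCHEMA = "https://schema.org/"
SITE_A = "http://example.org/site-a/"  # hypothetical publisher A
SITE_B = "http://example.org/site-b/"  # hypothetical publisher B

# Site A describes a person and links to a place described on site B.
site_a: set[Triple] = {
    (SITE_A + "ada", SCHEMA + "name", "Ada Lovelace"),
    (SITE_A + "ada", SCHEMA + "birthPlace", SITE_B + "london"),
}
# Site B describes the place itself.
site_b: set[Triple] = {
    (SITE_B + "london", SCHEMA + "name", "London"),
}


def match(graph: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]


def birthplace_name(graph: set[Triple], person: str) -> Optional[str]:
    """Follow person -birthPlace-> place -name-> label, possibly across sources."""
    for _, _, place in match(graph, s=person, p=SCHEMA + "birthPlace"):
        for _, _, label in match(graph, s=place, p=SCHEMA + "name"):
            return label
    return None


if __name__ == "__main__":
    merged = site_a | site_b
    print(birthplace_name(site_a, SITE_A + "ada"))  # None: site A alone lacks the label
    print(birthplace_name(merged, SITE_A + "ada"))  # London
```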
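The two kinds of aggregation on slide 37 can be sketched with plain Python grouping. The readings, resource IDs, area names, and the use of the mean as the summary statistic are all hypothetical. Any statistic from slide 38 could take the mean's place.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data points: (resource_id, area, day, value)
READINGS = [
    ("sensor-1", "north", "2021-03-01", 20.0),
    ("sensor-1", "north", "2021-03-01", 22.0),
    ("sensor-1", "north", "2021-03-02", 19.0),
    ("sensor-2", "north", "2021-03-01", 18.0),
    ("sensor-3", "south", "2021-03-01", 25.0),
]


def time_aggregate(rows, resource):
    """Time aggregation: a single resource's data points, averaged per day."""
    by_day = defaultdict(list)
    for res, _area, day, value in rows:
        if res == resource:
            by_day[day].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}


def spatial_aggregate(rows, day):
    """Spatial aggregation: all resources in each area, averaged for one day."""
    by_area = defaultdict(list)
    for _res, area, d, value in rows:
        if d == day:
            by_area[area].append(value)
    return {area: mean(values) for area, values in sorted(by_area.items())}


if __name__ == "__main__":
    print(time_aggregate(READINGS, "sensor-1"))
    # {'2021-03-01': 21.0, '2021-03-02': 19.0}
    print(spatial_aggregate(READINGS, "2021-03-01"))
    # {'north': 20.0, 'south': 25.0}
```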
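Python's standard `statistics` module and built-ins cover all eight summary statistics listed on slide 38. Below is a minimal sketch; the function name `summarize` and the sample values are hypothetical. Note that `statistics.mode` returns the first mode it meets when several values tie (Python 3.8 and later).

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Replace a group of observations with the summary statistics of slide 38."""
    if not values:
        raise ValueError("cannot summarize an empty group")
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # most frequent value (first one on ties, 3.8+)
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


if __name__ == "__main__":
    print(summarize([3, 7, 7, 2, 9, 4]))
```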
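Slide 41 explains that continuous values are grouped to reduce the number of data points before a histogram is drawn. Below is a minimal sketch of equal-width binning. The bin width, starting point, and height data are hypothetical, and each bin is half-open, [lower, upper).

```python
import math


def histogram_bins(values: list[float], start: float,
                   width: float) -> list[tuple[float, float, int]]:
    """Group continuous values into equal-width bins [lower, upper).

    Returns one (lower, upper, count) tuple per bin, including empty bins,
    so gaps in the data show up as zero-height bars.
    """
    if width <= 0:
        raise ValueError("bin width must be positive")
    if not values:
        return []
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1  # index of the bin holding v
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    heights_cm = [151, 158, 162, 165, 167, 171, 174, 180]  # hypothetical sample
    for lower, upper, count in histogram_bins(heights_cm, start=150, width=10):
        print(f"[{lower}, {upper}): {'#' * count}")
```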
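As slide 42 describes, a pie chart shows each category's share of the total. Each slice's angle is that share multiplied by 360 degrees. Below is a minimal sketch; the function name `pie_slices` and the category names and counts are hypothetical.

```python
def pie_slices(counts: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each category to (share of the total, slice angle in degrees)."""
    if any(v < 0 for v in counts.values()):
        raise ValueError("pie charts need non-negative values")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the total must be positive")
    return {cat: (v / total, 360.0 * v / total) for cat, v in counts.items()}


if __name__ == "__main__":
    # Hypothetical nominal categories and their counts
    for cat, (share, angle) in pie_slices({"red": 1, "blue": 3}).items():
        print(f"{cat}: {share:.0%} of total, {angle:.0f} degrees")
```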