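The art of reading between the lines
or: the R language and data mining in action
(original title: Sztuka czytania między wierszami, czyli język R i Data Mining w akcji)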
<me>

Katarzyna Mrowca

</me>
The deal 
Agenda
• Quick glance at theory - Data mining
• Exercises on… paper
• Quick glance at the tool – R console
• Exercises – become friends with R
•…

Theory

Exercise
Agenda
• Quick glance at theory - Data preparation
• Exercises
• Decision trees
• Cluster analysis
• Text mining
•…

Theory

Exercise
Agile is everywhere!
• Retro after second break
A quick glance at theory!
What is data mining?
What does "Google" say?
Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), an interdisciplinary subfield of computer
science, is the computational process of discovering patterns in large
data sets involving methods at the intersection of artificial intelligence,
machine learning, and statistics.
What does "Google" say?
The overall goal of the data mining process is to extract information
from a data set and transform it into an understandable structure for
further use.
What does "Google" say?
Aside from the raw analysis step, it involves database and data
management aspects, data pre-processing, model and inference
considerations, interestingness metrics, complexity considerations,
post-processing of discovered structures, visualization, and online
updating.

Source: Wikipedia
Data mining – what is "inside"
• Predictive:
  • Regression
  • Classification
  • Collaborative filtering

• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
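
The split between predictive and descriptive methods can be illustrated with a minimal R sketch. It is not taken from the original slides: the choice of the built-in `cars` and `iris` data sets, the speed value of 21, and the number of clusters are assumptions made only for illustration.

```r
# Predictive: regression on the built-in `cars` data set
# (stopping distance modelled as a function of speed).
fit <- lm(dist ~ speed, data = cars)
predicted <- predict(fit, newdata = data.frame(speed = 21))

# Descriptive: k-means clustering on the built-in `iris` measurements.
# No target variable is used; the algorithm only groups similar rows.
set.seed(42)  # k-means starts from random centers, so fix the seed
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

print(predicted)
# Compare the discovered groups with the known species, after the fact.
print(table(clusters$cluster, iris$Species))
```

The regression answers a question about an unseen case (what distance to expect at a given speed), while the clustering only describes structure that is already in the data.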
What is data mining not?
Why is data mining so
popular?
What is the difference between
statistics and data mining?
Exercise
Data preparation
Variables
Qualitative & Quantitative
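
A minimal sketch of how the two kinds of variables might look in R. The `customers` data frame and all of its values are hypothetical and invented for illustration. The `postal_code` column echoes the postal code example mentioned in the notes; reading that example as "digits that act as labels" is an assumption.

```r
# Hypothetical customer records, invented for illustration.
customers <- data.frame(
  age         = c(34, 51, 27),                               # quantitative
  income      = c(4200.50, 6100.00, 3900.75),                # quantitative
  postal_code = c("30-001", "00-950", "80-180"),             # qualitative, despite the digits
  segment     = factor(c("retail", "business", "retail")),   # qualitative
  stringsAsFactors = FALSE
)

# Arithmetic is meaningful for quantitative variables...
print(mean(customers$age))

# ...while qualitative variables are counted per category.
print(table(customers$segment))

# str() shows how R stores each column (num, chr, Factor).
str(customers)
```

Storing a code such as a postal code as text or as a factor keeps R from treating it as a number that could be averaged or summed.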
Tame the R console!
Take a break 
Regression
Time series
Decision trees
Regression trees
Classification trees
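
A classification tree can be sketched in a few lines of R. This example is an assumption rather than the workshop's own exercise: it uses the `rpart` package (one of the recommended packages shipped with standard R installations) and the built-in `iris` data set.

```r
library(rpart)  # recommended package bundled with standard R installations

# Fit a classification tree: predict the species from the four measurements.
tree <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes for the training rows and compare with the true labels.
predicted <- predict(tree, newdata = iris, type = "class")
confusion <- table(predicted = predicted, actual = iris$Species)

print(tree)       # the splitting rules, one line per node
print(confusion)  # rows: predicted class, columns: actual class
```

Calling `rpart` with `method = "anova"` and a numeric target would give a regression tree instead. Evaluating on the training rows, as done here, overstates accuracy; a held-out test set would be the more honest check.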
K-means
Text mining
Thank you!

Editor's notes

  1. A paper-based exercise in finding relationships. Draw it on the board and give the loan-repayment example.
  2. The example with a postal code and a phone number.