DS101:
Introduction to AI and DS
Lecture 1: Introduction to Data Science
Dr. Sudheer
hsudheer@ifheindia.org
Course Code: DS101
Course Title: Introduction to Data Science and Artificial Intelligence
L P U: 3 0 3
Team of Instructors: 1. Ms Sathya AR 2. Ms. P Rohini 3. Dr. H Sudheer 4. Dr. P. Sirisha
Course Objectives:
1. To expose students to the fundamental concepts of data science and their implementation using Python programming.
2. To introduce the mathematical foundations required for data science.
3. To explore the various data pre-processing techniques.
4. To summarize the aspects of exploratory data analysis (EDA): uses of EDA, the role of metadata in EDA, and data transformations identified through EDA.
5. To understand the AI approaches in data science.
Textbook(s)
T1 Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
T2 Stuart Russell and Peter Norvig, “Artificial Intelligence: A Modern Approach”, 3rd Edition, Pearson Education, 2010, ISBN-13: 978-0-13-604259-4.
Reference Book(s)
R1 Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, O’Reilly, 2017.
R2 Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly, 2019.
R3 Field Cady, “The Data Science Handbook”, Wiley, 2017.
R4 Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, 2011, ISBN 0123814790.
Online Resources
R5 https://onlinecourses.nptel.ac.in/noc22_cs72/preview
R6 https://www.udemy.com/course/complete-python-bootcamp/
R7 https://lms.simplilearn.com/courses/4227/Introduction-to-Data-Science/syllabus
“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken down, analyzed for it to have
value.” — Clive Humby, 2006
A growing number of companies see themselves as data-driven.
Data science is the science that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Source: Wikipedia
Big Data and Data Science Hype
“Big Data” Sources
It’s All Happening Online
Every:
• Click
• Ad impression
• Billing event
• Fast forward, pause, …
• Server request
• Transaction
• Network message
• Fault
• …
Other sources:
• User generated (Web & Mobile)
• Internet of Things / M2M
• Health/Scientific Computing
The Current Landscape (with a Little History)
Data science is a broad field that refers to the collective
processes, theories, concepts, tools and technologies that
enable the review, analysis and extraction of valuable
knowledge and information from raw data.
Source: Techopedia
Drew Conway’s Venn diagram of data science
Rise of the Data Scientist
Skills of data geeks:
• Statistics – traditional analysis you’re used to thinking about
• Data munging – parsing, scraping, and formatting data
• Visualization – graphs, tools, etc.
Harvard Business Review declared data scientist to be the “Sexiest Job of the
21st Century”.
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
The Role of the Social Scientist in Data Science
Both LinkedIn and Facebook are social network companies.
Oftentimes a description or definition of a data scientist includes hybrid statistician, software engineer, and social scientist.
If they’re social science-y problems like friend recommendations or people
you know or user segmentation, then by all means, bring on the social
scientist! Social scientists also do tend to be good question askers and have
other good investigative qualities, so a social scientist who also has the
quantitative and programming chops makes a great data scientist.
Data Science Jobs
Most job descriptions ask data scientists to be experts in computer science, statistics, communication, and data visualization, and to have extensive domain expertise.
Nobody is an expert in everything, which is why it makes more sense to create teams of people who have different profiles and different expertise. Together, as a team, they can specialize in all those things.
A Data Science Profile:
• Computer science
• Math
• Statistics
• Machine learning
• Domain expertise
• Communication and presentation skills
• Data visualization
Rachel’s data science profile, which she created to illustrate how one might visualize oneself as a data scientist. She wanted students and guest lecturers to “riff” on this: to add buckets or remove skills, use a different scale or visualization method, and think about the drawbacks of self-reporting.
Data science team profiles can be
constructed from data scientist
profiles; there should be alignment
between the data science team
profile and the profile of the data
problems they try to solve
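As a rough illustration of this idea, the sketch below builds a team profile from individual profiles and compares it with the profile of a problem. The 0 to 10 scale, the member names, the scores, and the choice to take the best individual score per skill are all assumptions made for this example; the lecture does not prescribe any particular way of combining profiles.

```python
# Skills taken from the "Data Science Profile" list in the lecture.
SKILLS = [
    "computer science", "math", "statistics", "machine learning",
    "domain expertise", "communication", "data visualization",
]

# Hypothetical self-reported scores on an assumed 0-10 scale.
profiles = {
    "member_a": dict(zip(SKILLS, [8, 5, 4, 6, 2, 3, 4])),
    "member_b": dict(zip(SKILLS, [3, 7, 9, 5, 4, 6, 5])),
    "member_c": dict(zip(SKILLS, [2, 3, 4, 2, 9, 8, 6])),
}


def team_profile(members):
    """Combine profiles by taking the best score per skill (an assumption)."""
    return {s: max(p.get(s, 0) for p in members.values()) for s in SKILLS}


def skill_gaps(team, problem):
    """Return skills where the team scores below what the problem needs."""
    return {
        skill: needed - team.get(skill, 0)
        for skill, needed in problem.items()
        if needed > team.get(skill, 0)
    }


# Hypothetical requirements of a data problem, on the same scale.
problem_profile = {"statistics": 7, "machine learning": 8, "data visualization": 6}

team = team_profile(profiles)
gaps = skill_gaps(team, problem_profile)
print(gaps)
```

An empty result from `skill_gaps` would suggest the team profile is aligned with the problem profile; a non-empty one points at the skills the team would need to add.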
Data science workflow
https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
Stages shown in the workflow diagram:
• Digging around in data
• Hypothesize
• Model
• Large-scale exploitation
• Evaluate
• Interpret
• Clean, prep
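To make the stages more concrete, here is a minimal sketch of one possible pass through them on a toy dataset, using only the Python standard library. The data, the linear hypothesis, the alternating train/held-out split, and all function names are assumptions chosen for illustration; real projects usually loop through these stages several times and in varying order.

```python
from statistics import mean

# Hypothetical raw records: (hours_studied, exam_score); None marks a missing value.
raw = [(1, 52), (2, 55), (None, 60), (3, 61), (4, 64), (5, 70), (6, None), (7, 75)]


def clean(records):
    """Clean, prep: drop records that have a missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Model: ordinary least-squares fit of y = a + b * x."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


def mean_abs_error(points, a, b):
    """Evaluate: average absolute prediction error on the given points."""
    return mean(abs(y - (a + b * x)) for x, y in points)


data = clean(raw)                          # clean, prep
train, held_out = data[::2], data[1::2]    # simple alternating split (assumption)
a, b = fit_line(train)                     # hypothesize a linear trend, then model
error = mean_abs_error(held_out, a, b)     # evaluate on points not used for fitting

# Interpret: b is the estimated change in score per extra hour studied.
print(f"score = {a:.1f} + {b:.1f} * hours, held-out mean abs error = {error:.2f}")
```

Evaluating on held-out points rather than on the training points is a deliberate choice here: it gives a less optimistic picture of how the model would behave once it is exploited at larger scale.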
What is hard about Data Science
• Overcoming assumptions
• Making ad-hoc explanations of data patterns
• Overgeneralizing
• Communication
• Not checking enough (validate models, data pipeline
integrity, etc.)
• Using statistical tests correctly
• Prototype → Production transitions
• Data pipeline complexity (who do you ask?)
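The "not checking enough" item can be illustrated with a small set of automated sanity checks run on a pipeline step's output. This is only a sketch: the function name `check_pipeline`, the field names, and the three checks chosen (row count, missing required fields, value ranges) are assumptions for the example, not a complete validation strategy.

```python
def check_pipeline(rows_in, rows_out, required, valid_ranges):
    """Return a list of human-readable problems found in a step's output rows."""
    problems = []
    # A cleaning or filtering step should not produce more rows than it received.
    if len(rows_out) > len(rows_in):
        problems.append(f"row count grew from {len(rows_in)} to {len(rows_out)}")
    for i, row in enumerate(rows_out):
        # Every required field must be present and not None.
        for field in required:
            if row.get(field) is None:
                problems.append(f"row {i}: missing {field}")
        # Numeric fields must fall inside their expected range.
        for field, (low, high) in valid_ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {field}={value} outside [{low}, {high}]")
    return problems


# Hypothetical input and (faulty) output of a pipeline step.
rows_in = [{"user": "u1", "age": 34}, {"user": "u2", "age": 29}]
rows_out = [
    {"user": "u1", "age": 34},
    {"user": "u2", "age": 290},   # likely a transformation bug
    {"user": None, "age": 41},    # row that appeared from nowhere
]

problems = check_pipeline(
    rows_in, rows_out, required=["user", "age"], valid_ranges={"age": (0, 120)}
)
for problem in problems:
    print(problem)
```

Running checks like these before interpreting a surprising result makes it easier to tell a real finding apart from a pipeline error.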
What are Data Scientists really doing?
https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf



Editor's notes

  1. Ronny Kohavi, keynote at KDD 2015: People are incredibly clever at explaining “very surprising results”. Unfortunately, most very surprising results are caused by data pipeline errors. Beware “HiPPOs” (Highest Paid Person’s Opinion).
  2. Quote from paper: “I’d rather the data go away than be wrong and not know.” Assumptions not communicated; transformations not documented.