Data Scientist Enablement DSE 400 - Week 1
1. Content of this document is under Creative Commons BY-NC-SA
Data Scientist Enablement
DSE 400 - Fast Track to Data Science
Week 1 Roadmap
Advanced Center of Excellence
Modern Renaissance Corporation
In Collaboration with SONO team and others
2. Agenda
You can always find the latest version of this document at bit.ly/1hC5wAV
Welcome
Mission and Objectives
DSE Roadmap
DSE 400 at a glance
Week 1 at a glance
Discussions
Learning
Practice
Assignments and Submission
Looking ahead
References
Acknowledgement
3. Welcome
Welcome to the DSE 2014 Track. You are on one of the most
exciting programs to disseminate knowledge, diffuse
advancements and stimulate the adoption of Data/Decision
Sciences, Big Data Analytics and what we call Evidence-Oriented
Systems Engineering. The content and the courses
are designed to be easy, engaging and engendering.
Consequently, we hope this program will also be highly
rewarding for you from intellectual, pragmatic and
professional development perspectives.
4. Mission and Objectives
The mission of our program is to provide free, open and world-class
enablement of Data Scientists and to help advance the
profession of Data Science and allied disciplines.
We aim to prepare the participants with analytical and
practical skills emphasizing breadth and depth in a range of
relevant disciplines and capabilities in Data/Decision
Sciences, Big Data Analytics, Architecture and Systems
Engineering.
5. Data Scientist Enablement Roadmap - 2014
Ramping up Machine Learning with R
Advanced Techniques in Big Data Analytics
Fast track to Data Science
Modern Data Platforms
“A Data Scientist is someone who knows how to extract meaning from and interpret data, which
requires both tools and methods from statistics and machine learning, as well as being human.”
- Rachel Schutt and Cathy O’Neil, Doing Data Science
6. DSE 2014 with tentative timeline
Fast track to Data Science: Jan 19 - Mar 15
Ramping up Machine Learning with R: Mar 30 - May 10
Modern Data Platforms: May 25 - July 5
Advanced Techniques in Big Data Analytics: July 20 - Aug 30
7. DSE 400 at a glance
An introductory course with NO prerequisites. It employs a
socialized learning paradigm involving individual effort,
teamwork, discussions and collaboration on the SONO (Social
Knowledge) platform.
Topics include Algorithms, Statistical
Inference, Data Analysis, Hadoop, R,
Data Engineering, Machine Learning,
Visualization, Applications, Case Studies,
employing a variety of tools and techniques.
8. DSE 400 - Week 1 at a glance
Discussions (on SONO):
Welcome, introductions, programming and analytics background, etc.
Reading plan:
Read Chapters 1-3 of An Introduction to Data Science by Jeffrey Stanton, and read Big Data
[sorry] & Data Science: What Does a Data Scientist Do?
Activities:
Installing R and R-Studio; Fun with Math; Playing with ML Datasets; Research on Data
Visualization Tools, etc.
Assignment 1:
Download Housing dataset from UCI Machine Learning Repository to your local machine or
cloud drive. Import this dataset into your R environment and display this dataset.
9. Social Engagement on SONO
Log in to the SONO Community. Visit our Jump Pad (or
Knowledge Domain) called DSE 400. Go to DSE 2014
Global, then join the participant group that matches the first letter
of your last name. Also feel free to explore other
knowledge-rich communities on SONO.
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=992
10. Social Engagement on SONO - Week 1
Discussion 1: Welcome to DSE program.
Discussion 2: What programming languages are you
familiar with? Which languages do you use on a day-to-day
basis? Do you have any experience using the R language?
What kinds of analytics tools, if any, have you used before?
<Optional> Discussion 3: Q&A. General questions as well
as questions specific to Week 1 are welcome.
To participate in these discussions, visit DSE 400 Week 1 at
http://getsokno.com/redvinef/controllers/cell.php?user_knocell=1001
11. Week 1 Reading Plan
DSE 400 is designed to be a broad introduction to Data
Science, Analytics Architecture and Visualization from both
learning and pragmatic perspectives. The following plan
is recommended for Week 1 to kick-start the program.
Read Chapters 1-3 from An Introduction to Data Science
by Jeffrey Stanton.
Read Big Data [sorry] & Data Science: What Does a Data
Scientist Do?
12. Activities
<Required> Visit http://www.rstudio.com/ and follow the instructions to
download and install R and R-Studio. For advice specific to your system and its
configuration, see the many how-to videos on installing R and R-Studio
on YouTube. Skip this activity if you already have R and R-Studio.
<Collaborative Research> <Required> Create a presentation on Data
Visualization Tools - A Comparative Study. Incorporate your unique ideas,
research and collective insights to arrive at the right evaluation methodology,
explain your thought process and justify your choices. Note: You will build this
presentation over 4 weeks. You and your team will present it during the 5th week.
13. Activities - contd
<Practice> Math is Fun. Quickly create a bar chart with 10 random values
using the Data Graphs widget on the Math is Fun website. Change the graph to a pie chart.
Display percentages only, not the original values.
<Practice> Visit the UCI Machine Learning Repository. Familiarize yourself
with the various datasets on this site. Feel free to download any dataset you like. We
will use this repository extensively in the DSE program. For Week 1, our focus is
on the “Housing” dataset only.
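In the Math is Fun practice above, each percentage shown on the pie chart is that value divided by the total of all ten values, times 100, so the shares always add up to 100%. A minimal sketch of that arithmetic in Python follows; the function name `to_percentages` is hypothetical and is not part of the Math is Fun widget.

```python
import random

def to_percentages(values):
    """Convert raw values into their percentage share of the total,
    which is what a pie chart labels when it shows percentages only."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [100.0 * v / total for v in values]

if __name__ == "__main__":
    # Ten random whole numbers, similar to values typed into a chart widget.
    values = [random.randint(1, 100) for _ in range(10)]
    shares = to_percentages(values)
    for v, p in zip(values, shares):
        print(f"{v:>4} -> {p:5.1f}%")
    # Rounded shares should add back up to (approximately) 100%.
    print(f"total: {sum(shares):.1f}%")
```

Comparing the printed shares with the widget's pie chart is a quick way to confirm that "percentages only" really means share of the total, not the original values.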
14. Assignment 1 - Submission Required
Download R-Studio, in case you have not already done so.
Download the Housing dataset from the UCI Machine Learning
Repository to your local machine or cloud drive. Import this
dataset into your R environment and display it.
Include a screenshot of your environment.
(See the sample image in the next slide.)
http://archive.ics.uci.edu/ml/datasets.html
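The assignment asks for R, where base `read.table()` reads a whitespace-separated file without a header row by default. To make the structure of the file concrete, here is a minimal sketch in Python rather than R. It assumes the downloaded `housing.data` file has 14 whitespace-separated numeric columns per row, named as in the dataset's accompanying description; the local path `housing.data` and the helpers `parse_housing` and `show` are hypothetical, chosen only for illustration.

```python
from pathlib import Path

# Assumed column layout (14 columns, in this order), taken from the
# dataset's description file; adjust if your copy differs.
COLUMNS = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV",
]

def parse_housing(text):
    """Parse whitespace-separated rows into a list of dicts keyed by column name."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # any run of spaces or tabs separates fields
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {line_no}: expected {len(COLUMNS)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(COLUMNS, map(float, fields))))
    return rows

def show(rows, limit=5):
    """Print a header and the first `limit` rows, similar in spirit to R's head()."""
    print(" ".join(f"{c:>8}" for c in COLUMNS))
    for row in rows[:limit]:
        print(" ".join(f"{row[c]:>8.3f}" for c in COLUMNS))
    print(f"({len(rows)} rows)")

if __name__ == "__main__":
    # Hypothetical local path: wherever you saved the downloaded file.
    path = Path("housing.data")
    if path.exists():
        show(parse_housing(path.read_text()))
    else:
        print(f"{path} not found; download it from the UCI repository first")
```

If the file parses cleanly, the row count printed at the end is a quick sanity check that no record was dropped or split across lines before you repeat the import in R.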
16. Submissions
Deadline: Saturday, Jan 25, 11:59 PM your local time.
Email the screenshots of your R workspace (on your
machine/laptop/desktop) showing the Housing dataset to
<datascience400@gmail.com>. You can either paste the image into the body
of the email or create a document in PDF format and send it as an
attachment. No links please.
19. DSE 400 - Weeks 2-8 ahead
Week 2: Basic Statistics, Hypothesis Testing, Regression, Playing with Spreadsheets, Visualization with
R. If you are new to Statistics or need a refresher, read Think Stats: Probability and Statistics for
Programmers ahead of time, or watch the Statistics playlist by Khan Academy.
Weeks 3-4: Intro to Machine Learning (ML) - Classification, Clustering, Prediction, Naive Bayes,
Recommendations and Boosting algorithms
Week 5: Visualizations. Present your research: Data Visualization Tools - A Comparative Study
Weeks 6-7: Processing large data sets. Hadoop Ecosystem. Stream Computing, etc.
Week 8: Ethics, Privacy and Building Data Products.
20. References and Additional Reading
An Introduction to Data Science by Jeffrey Stanton. This
is a good introduction to Data Science for non-technical
readers. This book is available under a Creative Commons
License.
Learning R - Video Tutorial Lessons on YouTube
R for Machine Learning by Allison Chung
The Value of Big Data Isn't the Data (HBR article)
Prediction, Machine Learning and Statistics (MIT OCW)
21. Citation
Housing Data Set Information: Concerns housing values in suburbs of Boston.
Origin: This dataset was taken from the StatLib library which is maintained at
Carnegie Mellon University. Creator: Harrison, D. and Rubinfeld, D.L.
'Hedonic prices and the demand for clean air', J. Environ. Economics &
Management, vol. 5, 81-102, 1978.
Only content that appears as-is in this document is under Creative Commons
BY-NC-SA. This license may not apply to material referenced here.
22. For More Information
The DSE 2014 stream is all set to commence on Jan 19, 2014.
For more details, visit DSE 400 Announcement Page bit.ly/18zPE1j
Visit DSE 2014 Global to participate in DSE and to get to know the DSE Core
Team and participants. Week 1 discussions can be found at DSE 400 Week 1.
We welcome questions, thoughts and suggestions. Post these on SONO in the
appropriate forum/discussion or write to us at <datascience400@gmail.com>.
You can always find the latest version of this document at bit.ly/1hC5wAV
23. Acknowledgement
We thank our community of committed and passionate
volunteers, experts, educators, innovators, benefactors,
advisers, advocates, mentors and supporters.
We are also grateful for the outstanding support and
encouragement from the SONO team as well as other
organizations such as the R Project, the OpenCourseWare
Consortium, MIT, IBM, Hortonworks, Stanford University,
Caltech and Data Science Central.