This talk covers how to become a data scientist. It was given at the 2nd annual event of the Pune Developer's Community. It focuses on the skill sets required to become a data scientist, and on which role suits you based on your current profile.
4. Agenda
● Why Data Scientist?
● Artificial Intelligence
● Being a Data Scientist
● Broad Skill Sets
● Skill Sets based on your Profile
● Typical Team in Data Science
● Conclusion & Final Remarks
● References to start
22. Who is a Data Scientist?
One who has a wide breadth of abilities:
● Academic curiosity
● Storytelling
● Product sense
● Engineering experience
● Cleverness
And above all
● Deep domain expertise in Mathematics, Statistics and Machine Learning
24. What is Data Science?
Data science, also known as data-driven science, is an
interdisciplinary field about scientific methods, processes,
and systems to extract knowledge or insights from data in
various forms, either structured or unstructured, similar to
data mining.
- Wikipedia : https://en.wikipedia.org/wiki/Data_science
30. Core Areas
Pick One
● Information Retrieval
● Natural Language Processing
● Linguistics
● Machine Learning
● Image Processing
● Video Processing
● Speech Processing
● Then pick Neural Networks and
Deep Learning
31. Tools and Technology
If not all, at least 3-4
● Excel
● R, Python
● Spark
● Hadoop
● Scala
● AWS
● Solr, Elasticsearch
● New ML Libraries - TensorFlow, Caffe
● Queueing System
33. Application of Algorithms
● Practical Implementations
● Follow Kaggle, KDNuggets and
solve problems
● Ability to quickly suggest
algorithms to apply and also to
implement the same
● Working with a Mentor will help
34. Data Savvy
● Data Oriented Mindset
● Quickly understand the problem and give solutions in a short span of time
● Ability to think about how data can add value to the business and what insights can be derived.
35. As a data scientist, if you know nothing
else, you need to know how to take
some data, munge it, clean it, filter it,
mine it, visualize it and then validate it.
It’s a very long process.
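The munge, clean, filter, mine, visualize and validate sequence above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names (`name`, `age`), the plausible age range and the helper functions `clean` and `is_plausible` are all made up for this example and do not come from the talk.

```python
import statistics

# Hypothetical raw records: values arrive as messy strings, as they often do.
raw_rows = [
    {"name": " Asha ", "age": "34"},
    {"name": "Ravi", "age": "n/a"},
    {"name": "Meera", "age": " 29"},
    {"name": "", "age": "41"},
    {"name": "Kiran", "age": "230"},
]

def clean(row):
    """Munge and clean: trim whitespace, parse the age, return None if unusable."""
    name = row["name"].strip()
    age_text = row["age"].strip()
    if not name or not age_text.isdigit():
        return None
    return {"name": name, "age": int(age_text)}

def is_plausible(row):
    """Filter: keep only ages inside an assumed plausible range."""
    return 0 < row["age"] < 120

cleaned = [r for r in (clean(row) for row in raw_rows) if r is not None]
filtered = [r for r in cleaned if is_plausible(r)]

# Mine: compute a simple summary statistic.
mean_age = statistics.mean(r["age"] for r in filtered)

# Visualize: a crude text bar chart, one '#' per 5 years of age.
chart = [f"{r['name']:<6} {'#' * (r['age'] // 5)}" for r in filtered]
print("\n".join(chart))
print("mean age:", mean_age)

# Validate: check that the invariants we rely on actually hold.
assert all(r["name"] and 0 < r["age"] < 120 for r in filtered)
```

Real projects would typically use a library such as pandas for these steps; the point here is only the order of the stages.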
38. Skill Sets Focus
Freshers
● Exceptionally Strong Programming Skills
● Strong Data Structure Knowledge
● Master Python, R, Java
● GitHub Profile
● Then, Work in Companies to Solve Problems
39. Skill Sets Focus
Programmers (> 2 Years Experience)
● Mathematics
● Course - Take Up a Course Online.
● Pick an Area - ML, NLP, Linguistics etc.
● Apply and Solve Problems on Kaggle
40. Skill Sets Focus
Programmers (> 2 Years Experience)
● Strong AWS Knowledge
● Knowledge of ML/DL Libraries and Tools
● Photographer’s Mind
● Be a Data Engineer rather than a Scientist
● Practice, Practice, Practice
41. Skill Set Focus
Programmers with over 10
Years Experience
● Their curiosity helps them find problems on their own and solve them themselves.
● Can take a course and talk to people with experience in these areas.
● Inability to admit the lack of
knowledge
● Understand Scale Challenges
with Data
42. Skill Set Focus
Business Analyst/Managers
● Take a Course
○ Understand how things are built
○ Not necessary to know
mathematics or programming
● Understand the steps in ML
○ Data Collection
○ Data Preparation
○ Model Selection
○ Training
○ Evaluation
○ Parameter Tuning
○ Prediction
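The ML steps listed above can be walked through end to end with a toy example. The sketch below is an assumption-laden illustration, not a method from the talk: the dataset is synthetic, the model is a hand-written k-nearest-neighbours classifier chosen only because it fits in a few lines, and the names `predict` and `accuracy` are hypothetical helpers.

```python
from collections import Counter

# 1. Data collection: a made-up dataset of (feature, label) pairs.
raw = [(i * 0.25, int(i >= 20)) for i in range(40)]
raw.append((None, 1))  # a deliberately broken record

# 2. Data preparation: drop broken records, then split the rest.
rows = [(x, y) for x, y in raw if x is not None]
train = [r for i, r in enumerate(rows) if i % 4 in (0, 1)]
validation = [r for i, r in enumerate(rows) if i % 4 == 2]
test = [r for i, r in enumerate(rows) if i % 4 == 3]

# 3. Model selection: k-nearest neighbours (an assumption for this sketch).
def predict(train_rows, x, k):
    """Return the majority label among the k training points closest to x."""
    neighbours = sorted(train_rows, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# 4. Training: k-NN simply stores the training rows, so nothing more to fit.

# 5. Evaluation: fraction of rows whose predicted label matches the true one.
def accuracy(train_rows, eval_rows, k):
    hits = sum(predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

# 6. Parameter tuning: pick k using the validation set only.
candidates = (1, 3, 5)
best_k = max(candidates, key=lambda k: accuracy(train, validation, k))

# Report the final score on the untouched test set.
test_accuracy = accuracy(train, test, best_k)
print("best k:", best_k, "test accuracy:", test_accuracy)

# 7. Prediction: apply the tuned model to new inputs.
print(predict(train, 2.0, best_k), predict(train, 8.0, best_k))
```

Keeping the test set out of the tuning step is the main design point: the validation score guides the choice of `k`, while the test score is the honest estimate of how the model behaves on unseen data.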
43. Skill Set Focus
Business Analyst/Managers
● Incremental Systems
● Accuracy Models
● View AI Videos applied to
Business.
● Log the data properly in your
applications.
● Ability to convey problems to Solutions Architects
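One way to read "log the data properly" is to emit each application event as a structured, machine-readable record so it can be analysed later. The sketch below uses Python's standard `logging` and `json` modules; the logger name `app.events`, the event names and the fields (`user_id`, `query`, `results`, `position`) are hypothetical and chosen only for illustration.

```python
import io
import json
import logging
from datetime import datetime, timezone

def make_event_logger(stream):
    """Build a logger that writes one JSON object per line to the given stream."""
    logger = logging.getLogger("app.events")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    logger.propagate = False
    return logger

def log_event(logger, name, **fields):
    """Record an event with a UTC timestamp and arbitrary named fields."""
    record = {"event": name, "ts": datetime.now(timezone.utc).isoformat(), **fields}
    logger.info(json.dumps(record, sort_keys=True))

# Usage: write two events to an in-memory stream, then read them back.
stream = io.StringIO()
logger = make_event_logger(stream)
log_event(logger, "search", user_id=42, query="data science course", results=17)
log_event(logger, "click", user_id=42, position=3)

events = [json.loads(line) for line in stream.getvalue().splitlines()]
print(events)
```

Because every line is valid JSON with consistent field names, the log can later be loaded straight into an analysis tool instead of being parsed with ad hoc string matching.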
44. Skill Set Focus
Database Admins
● Understand different types of Data
○ Text, Images, Numbers, Files etc.
● Learn all about storage
mechanisms, advantages,
disadvantages of different
databases
○ NoSQL - MongoDB, Cassandra, graph databases (Neo4j), CouchDB
○ SQL
45. Skill Set Focus
Database Admins
● Ability to convey which database is optimal for which type of data.
● Design and Build Models for
various kinds of data on paper
● Practice Modeling of Data
Extensively.
46. Skill Set Focus
Domain Experts/CxOs
● Same as Business
Analysts/Managers.
● Formulating the Business
around Data
● AI is used to solve business
problem
● AI is used for Automation
51. Data Scientist
● Mathematics, Statistics etc.
● Expert in at least one area such as ML or NLP.
● Knows how to Apply Different ML Models and Algorithms
52. Data Engineers
● Takes inputs from Data
Scientists once the problem is
solved
● Exceptionally good at
Programming and different
tools
● Solves problems at Scale
● Productionizes the solutions.
55. Concluding Remarks
Major Challenges in Companies
● Everything is on cloud, let’s use it.
● Unaware of Business Value
● Clueless About Data Science and Related Technologies
● Solutions Architect and Domain Experts are critical to know before you join.
● Vision of the Company
56. Final Remarks
● Work with a Mentor
● Have a Photographer’s Mind
● Choose your career wisely
58. Online Course
● Coursera : Andrew Ng Machine Learning Course https://goo.gl/fDTwSE
● YouTube : Prof. Sengupta https://goo.gl/JGG6th
59. People and Books
● People to follow.
○ Andrew Ng
○ Bernard Marr - AI Journalist.
○ Geoffrey Hinton
○ Roman Trusov
○ Many people : https://www.quora.com/Who-are-some-notable-machine-learning-researchers
● Books
○ Programming Collective Intelligence