The document summarizes an agenda for a presentation on machine learning and data science. It includes an introduction to CRISP-DM (Cross-Industry Standard Process for Data Mining), guided analytics, and a KNIME demo. It also discusses the differences between machine learning, artificial intelligence, and data science: machine learning produces predictions, artificial intelligence produces actions, and data science produces insights. It provides an overview of the CRISP-DM process for data mining projects, including the business understanding, data understanding, data preparation, modeling, evaluation, and deployment phases. It also discusses guided analytics and interactive systems that assist business analysts in finding insights and predicting outcomes from data.
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi | Automating Machine Learning, Artificial Intelligence, and Data Science
5. ML vs. AI vs. DS?
Data Science produces insights
Machine Learning produces predictions
Artificial Intelligence produces actions
7. What is Artificial Intelligence?
• Artificial Narrow Intelligence (ANI): Machine
intelligence that equals or exceeds human
intelligence or efficiency at a specific task.
• Artificial General Intelligence (AGI): A machine
with the ability to apply intelligence to any
problem, rather than just one specific problem
(human-level intelligence).
• Artificial Superintelligence (ASI): An intellect that
is much smarter than the best human brains in
practically every field, including scientific
creativity, general wisdom and social skills.
8. Machine Learning | Introduction
• Machine Learning is a type of Artificial Intelligence that provides
computers with the ability to learn without being explicitly programmed.
• Provides various techniques that can learn from and make predictions on
data.
9. Machine Learning | Learning Approaches
• Supervised Learning: Learning with a labeled
training set
• Example: email spam detector with training set
of already labeled emails
• Unsupervised Learning: Discovering patterns
in unlabeled data
• Example: cluster similar documents based on
the text content
• Reinforcement Learning: learning based on
feedback or reward
• Example: learn to play chess by winning or
losing
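The spam-detector example for supervised learning can be sketched in a few lines. This is a minimal illustration, assuming a tiny hand-made training set and a naive Bayes word-count model; the emails, labels, and function names are hypothetical and not taken from the slides.

```python
from collections import Counter
import math

# Hypothetical labeled training set: (email text, label).
TRAINING_SET = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_examples = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_examples)
        for word in text.split():
            # Laplace smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING_SET)
prediction = predict("free prize money", word_counts, label_counts)
```

The model only "learns" from the labels supplied in the training set, which is what distinguishes this approach from the unsupervised and reinforcement settings listed above.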
14. Business question is the focus
• what is the business question?
• which problem needs to be solved?
• how sure are you about the question?
• who asked for it?
• what is the real pain point?
• who will use the results?
• where will it be used?
• if you couldn’t do this, what would you do instead?
• what would be a better target?
• how will this be achieved?
• what should the data look like?
CRISP-DM | Business Understanding
16. Initial data collection to get familiar with the data
What does the data mean?
• code book
• name of each variable
• description of each variable
• possible range of values for each variable
How was the data created / collected?
machine-generated data?
• which systems were involved?
• how do these systems work?
human-generated data?
• how standardized is the input process?
• (fixed fields? free text?)
CRISP-DM | Data Understanding
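A code book of the kind described here can also be kept in machine-readable form and used to check incoming data. The sketch below is one possible shape, assuming numeric variables with a documented minimum and maximum; the variable names and ranges are hypothetical.

```python
# Hypothetical code book: one entry per variable with its name,
# description, and possible range of values.
CODE_BOOK = {
    "age": {"description": "Customer age in years", "min": 0, "max": 120},
    "monthly_spend": {"description": "Spend per month", "min": 0.0, "max": 10000.0},
}


def find_violations(rows, code_book):
    """Return (row index, variable, value) for every value that is
    missing or outside the range documented in the code book."""
    violations = []
    for index, row in enumerate(rows):
        for name, spec in code_book.items():
            value = row.get(name)
            if value is None or not (spec["min"] <= value <= spec["max"]):
                violations.append((index, name, value))
    return violations


rows = [
    {"age": 34, "monthly_spend": 120.5},
    {"age": 150, "monthly_spend": 80.0},   # age outside documented range
    {"age": 41, "monthly_spend": None},    # missing value
]
violations = find_violations(rows, CODE_BOOK)
```

Violations found this way are a starting point for the questions on this slide: they often point back to how the data was generated or entered.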
17. All steps from the raw data to the final dataset
Final dataset:
used for statistical modeling
sometimes called ADS (analytical dataset)
Includes or can include:
• data source selection and loading
• table selection and loading
• joining data sources
• data cleaning (missing values, outliers, ...)
• feature generation and data transformation
• taking samples of data
• …
CRISP-DM | Data Preparation
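The preparation steps listed on this slide can be illustrated on a toy example. This is a sketch only, assuming two small in-memory tables and median imputation as the cleaning rule; the table layout, column names, and the choice of median are assumptions, not part of the presentation.

```python
import random
import statistics

# Hypothetical raw sources: a customer table and an orders table.
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": None},   # missing value
    {"customer_id": 3, "age": 52},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 3, "amount": 10.0},
]


def build_analytical_dataset(customers, orders):
    """Join, clean, and derive features; returns one row per customer."""
    # Joining data sources: aggregate order amounts per customer.
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]

    # Data cleaning: impute missing ages with the median of known ages.
    known_ages = [c["age"] for c in customers if c["age"] is not None]
    median_age = statistics.median(known_ages)

    dataset = []
    for customer in customers:
        age = customer["age"] if customer["age"] is not None else median_age
        total = totals.get(customer["customer_id"], 0.0)
        dataset.append({
            "customer_id": customer["customer_id"],
            "age": age,
            "total_spend": total,
            # Feature generation: a derived flag.
            "has_ordered": total > 0,
        })
    return dataset


ads = build_analytical_dataset(customers, orders)

# Taking a sample: a fixed seed keeps the sample reproducible.
sample = random.Random(42).sample(ads, k=2)
```

The result `ads` plays the role of the analytical dataset (ADS) mentioned above: one cleaned row per entity, ready for modeling.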
21. CRISP-DM
Cross-Industry Standard Process for Data Mining
80-20 Rule!
Time consumed: 20%
Success factor: 80%
Source: Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
22. Sharing Tools
Sharing Skills
Sharing Responsibility
A new generation of tools
They can build their own reports
A recipe for disaster
Data is viral - everybody wants it
Start small and just do it
25. Guided Analytics | Definition
• Allowing data scientists to build
interactive systems, interactively
assisting the business analyst in her
quest to find new insights in data and
predict future outcomes.
26. Guided Analytics | Definition
• We explicitly do not aim to replace the
driver (or totally automate the process) but
instead offer assistance and carefully
gather feedback whenever needed
throughout the analysis process.
• To make this successful, the data scientist
needs to be able to easily create powerful
analytical applications that allow
interaction with the business user
whenever their expertise and feedback is
needed.
28. Guided Analytics | Environments
Openness:
• The environment does not impose restrictions in terms of
tools used – this also simplifies collaboration between
scripting gurus (working in R or Python, for example) and
others who just want to reuse their expertise without
diving into their code.
• Being able to reach out to other tools for specific
data types (text, images, …) or specialized high
performance or big data algorithms (such as H2O or
Spark) from within the same environment would be a plus.
Uniformity
Flexibility
Agility
29. Guided Analytics | Environments
Openness
Uniformity:
The experts creating data science can do it all in
the same environment:
• blend data,
• run the analysis,
• mix & match tools,
• build the infrastructure to deploy this as an analytical
application.
Flexibility
Agility
30. Guided Analytics | Environments
Openness
Uniformity
Flexibility:
• Underneath the analytical application, we
can run simple regression models or
orchestrate complex parameter
optimization and ensemble models –
ranging from one to thousands of models.
Agility
31. Guided Analytics | Environments
Openness
Uniformity
Flexibility
Agility:
• Once the application is used in the wild, new demands
will arise quickly: more automation here, more consumer
feedback there.
• The environment that is used to build these analytical
applications needs to make it intuitive for other members
of the data science team to quickly adapt the existing
analytical applications to new and changing
requirements.
32. Guided Analytics | Auto-what?
• So how do all of those driverless, automatic, automated AI or
machine learning systems fit into this picture?
• Their goal is either to encapsulate (and hide!) existing expert data
scientists’ expertise or apply more or less sophisticated
optimization schemes to the fine-tuning of the data science tasks.
33. Guided Analytics | Auto-what?
• Obviously, this can be useful if no in-house data science expertise is available but in
the end, the business analyst is locked into the pre-packaged expertise and the
limited set of hard coded scenarios.
• Both data scientist expertise and parameter optimization can easily be part of a
Guided Analytics workflow as well.
• And since automation of whatever kind tends to always miss the important and interesting
piece, adding a Guided Analytics component to this makes it even more powerful: we can
guide the optimization scheme and we can adjust the pre-coded expert knowledge to
the new task at hand.
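The idea of guiding an optimization scheme can be sketched as a parameter search that consults the analyst before choosing. The example below is a hypothetical illustration, not the KNIME implementation: the `accept` hook stands in for an interactive page, and the data and threshold candidates are made up.

```python
def accuracy(threshold, examples):
    """Share of examples where 'score >= threshold' matches the label."""
    hits = sum((score >= threshold) == label for score, label in examples)
    return hits / len(examples)


def guided_search(candidates, examples, accept):
    """Grid search that only considers candidates the analyst accepts.

    `accept` is a hypothetical feedback hook: in a real application it
    would be an interactive page, here it is a plain function.
    """
    allowed = [c for c in candidates if accept(c)]
    return max(allowed, key=lambda c: accuracy(c, examples))


# Hypothetical validation data: (model score, true label).
examples = [(0.1, False), (0.3, False), (0.45, True), (0.6, True), (0.9, True)]
candidates = [0.2, 0.4, 0.5, 0.7]

# Fully automated search over all candidates.
automatic = guided_search(candidates, examples, accept=lambda c: True)

# Guided search: the analyst knows that thresholds below 0.5 are not
# acceptable for the business, so those candidates are excluded.
guided = guided_search(candidates, examples, accept=lambda c: c >= 0.5)
```

Here the automated search settles on the threshold with the best accuracy, while the guided search trades a little accuracy for a constraint that only the business user knows about.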
34. Data Science Project | Roles
www.sistek.com.tr
• Data scientists
– Workflow development
– Data Analysis
– Model Development
• Business analysts
– WebPortal
– Data Analysis
• IT administrators
– Enterprise Architecture Management
– Cloud solution provider
35. Data Science Project | Data Scientist
Responsible for:
• Creating, updating workflows
• Creating, maintaining metanode
templates
• Building, evaluating, monitoring data
and using ad hoc developed
workflows
• Development of WebPortal
applications
39. Guided Analytics | Design
The workflow defines a fully automated web-based application to
select, train, test, and optimize a number of machine learning
models.
The workflow is designed for business analysts to easily create
predictive analytics solutions by applying their domain knowledge.
Each of the wrapped metanodes outputs a web page with which the
business analyst can interact.
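The select, train, and test steps can be shown in miniature. This sketch is plain Python, not the KNIME workflow itself; the two candidate "models", the data, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
def train_majority(train_rows):
    """Baseline: always predict the most common training label."""
    labels = [label for _, label in train_rows]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def train_threshold(train_rows):
    """Pick the cut point on x with the best training accuracy."""
    def correct(cut):
        return sum((x >= cut) == label for x, label in train_rows)
    best_cut = max(sorted(x for x, _ in train_rows), key=correct)
    return lambda x: x >= best_cut


def evaluate(model, test_rows):
    """Accuracy of a trained model on held-out rows."""
    return sum(model(x) == label for x, label in test_rows) / len(test_rows)


# Hypothetical data: (feature value, label), split into train and test.
train_rows = [(1, False), (2, False), (3, False), (6, True), (7, True)]
test_rows = [(2, False), (5, False), (8, True), (9, True)]

# Select: the analyst chooses which candidates to include.
candidates = {"majority": train_majority, "threshold": train_threshold}

# Train and test each candidate, then keep the best one.
scores = {
    name: evaluate(trainer(train_rows), test_rows)
    for name, trainer in candidates.items()
}
best_model = max(scores, key=scores.get)
```

In a guided application, each of these steps would be a page where the analyst can adjust the choice, for example by removing a candidate or changing the evaluation data.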
41. Sources
• Christian Dietz, Paolo Tamagnini, Simon Schmid, Michael Berthold: Intelligently
Automating Machine Learning, Artificial Intelligence, and Data Science,
https://www.knime.com/blog
• Berthold, Borgelt, Höppner, Klawonn: Guide to Intelligent Data Analysis, Springer 2011
• Michael Berthold: Principles of Guided Analytics, https://www.knime.com/blog
• Ali Alkan: Veri Madenciliği Teknikleri (Data Mining Techniques), Eğitim Notları (training notes) 2017
• Ali Alkan: İleri Analitik Teknikler Seminerleri (Advanced Analytics Techniques Seminars) 1-2-3-5-6-7, Seminer Notları (seminar notes) 2016-17