Improve Your Edge on
Machine Learning - Part 1
https://www.anaconda.com/products/distribution
Python in Anaconda Navigator Tools
Introduction Python
Python: Data Types, Assignment, Operators
Both " and ' can be used to denote strings. If the apostrophe character should be part of the string, use " as the outer boundaries:
"Barack's last name is Obama"
Alternatively, \ can be used as an escape character: 'Barack\'s last name is Obama'
Integers (int)
In [1]:
# Integers
a = 2
b = 239
Floating point numbers (float)
In [2]:
# Floats
c = 2.1
d = 239.0
Strings (str)
In [3]:
e = 'Hello world!'
my_text = 'This is Future Lab'
Boolean (bool)
In [4]:
x = True
y = False
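To check which type a value has, the built-in type() function can be used. The sketch below reuses some of the variables defined above and also shows the built-in conversion functions int(), float() and str():

```python
# Variables from the examples above
a = 2
c = 2.1
e = 'Hello world!'
x = True

# type() returns the class (type) of a value
print(type(a))  # <class 'int'>
print(type(c))  # <class 'float'>
print(type(e))  # <class 'str'>
print(type(x))  # <class 'bool'>

# Converting between types
print(int(c))              # 2 (the decimals are truncated)
print(float(a))            # 2.0
print(str(a) + ' apples')  # 2 apples
```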
Standard calculator-like operations
Most basic operations on integers and floats, such as addition, subtraction and multiplication, work as one would expect:
In [5]: 2 * 4
Out[5]: 8
In [6]: 2 / 5
Out[6]: 0.4
In [7]: 3.1 + 7.4
Out[7]: 10.5
Exponents
Exponents are denoted by **:
In [8]: 2**3
Out[8]: 8
Floor division
Floor division is denoted by //. It divides and then rounds the result down to the nearest integer (towards negative infinity). For positive numbers, this simply removes the decimals after division:
In [9]: 10 // 3
Out[9]: 3
Modulo
Modulo is denoted by %. It returns the
remainder after a division:
In [10]: 10 % 3
Out[10]: 1
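Floor division and modulo behave differently from "removing the decimals" once negative numbers are involved. This sketch shows how the two operators relate, and also shows the built-in divmod(), which returns both results at once:

```python
# Floor division rounds down (towards negative infinity), not towards zero
print(10 // 3)    # 3
print(-10 // 3)   # -4, because -3.33... is rounded down to -4
print(10.0 // 3)  # 3.0 (the result is a float if either operand is a float)

# Modulo is consistent with floor division: a == (a // b) * b + a % b
print(-10 % 3)    # 2

# divmod() returns the floor division result and the remainder as a tuple
q, r = divmod(10, 3)
print(q, r)       # 3 1
```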
Operations on strings
Strings can be added (concatenated) by
use of the addition operator +:
In [11]: 'Bruce' + ' ' + 'Wayne'
Out[11]: 'Bruce Wayne'
Multiplication is also allowed:
In [12]: 'a' * 3
Out[12]: 'aaa'
Python: Control Flow
In Python, code blocks are defined by indentation. See the syntax of an if-statement below:
Syntax of conditional blocks
if condition:
    # Code goes here (must be indented!)
    # Otherwise, an IndentationError will be raised

# Code placed here is outside of the if-statement
The condition is typically an expression that evaluates to a boolean (True or False). Other values are interpreted by their truthiness; for example, 0 and empty containers count as False.
Remember:
1. The : must be present after condition.
2. The line immediately after : must be indented.
3. The if-statement is exited by reverting the indentation as shown
above.
This is how Python interprets the code as a block.
The same indentation rules are required for all types of code blocks, the if-block above is just an
example. Examples of other types of code blocks are for and while loops, functions etc.
Most code editors automatically indent the next line when you press Enter after the :, so it doesn't take long to get used to this.
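The same rules apply when blocks are nested: each level of nesting adds one level of indentation. Here is a minimal sketch with an if-statement inside a for-loop. The list values are arbitrary.

```python
# Indentation defines nested blocks: the if-block lives inside the for-loop
numbers = [1, 5, 8, 12]
count = 0
for n in numbers:
    if n > 4:
        # Indented twice: runs only for numbers larger than 4
        count += 1
# Back at the outer level: runs once, after the loop has finished
print(count)  # 3
```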
if-statements
An if-statement has the following syntax:
In [13]:
x = 2
if x > 1:
    print('x is larger than 1')
if / else-statements
In [14]:
y = 1
if y > 1:
    print('y is larger than 1')
else:
    print('y is less than or equal to 1')
if / elif / else
In [15]:
z = 0
if z > 1:
    print('z is larger than 1')
elif z < 1:
    print('z is less than 1')
else:
    print('z is equal to 1')
Python: Data Structures
Data structures are constructs that can contain one or more variables. They are containers that can store a lot of data in a single entity.
Python's four basic data structures are:
● Lists
● Dictionaries
● Tuples
● Sets
Lists
Lists are defined by square brackets [] with elements separated by commas. They
can have elements of any data type.
Lists are arguably the most used data structure in Python.
List syntax
L = [item_1, item_2, ..., item_n]
Mutability
Lists are mutable. They can be changed after creation.
List examples
In [1]:
# List with integers
a = [10, 20, 30, 40]
# Multiple data types in the same list
b = [1, True, 'Hi!', 4.3]
# List of lists
c = [['Nested', 'lists'], ['are', 'possible']]
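Because lists are mutable, their elements can be read by position and changed in place. This short sketch uses the list a from above to show indexing, which starts at 0, and a few of the standard list methods:

```python
a = [10, 20, 30, 40]

# Indexing starts at 0; negative indices count from the end
print(a[0])    # 10
print(a[-1])   # 40

# Lists are mutable: elements can be changed, added and removed
a[1] = 25      # replace the element at position 1
a.append(50)   # add an element at the end
a.remove(30)   # remove the first element equal to 30
print(a)       # [10, 25, 40, 50]
```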
Dictionaries
Dictionaries have key/value pairs, which are enclosed in curly brackets {}. A value can be fetched by querying the corresponding key. Referring to the data via logically named keys instead of list indexes makes the code more readable.
Dictionary syntax
d = {key1: value1, key2: value2, ..., key_n: value_n}
Note that values can be of any data type like floats, strings etc., but
they can also be lists or other data structures.
Keys must be unique within the dictionary, since a value is extracted by calling out its key. If the same key is given twice, the later value overwrites the earlier one.
Keys must also be of an immutable (more precisely, hashable) type, such as strings, numbers or tuples of immutable values.
Mutability
Dictionaries are mutable. They can be changed after creation.
# Strings as keys and numbers as values
d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23}
# Strings as keys and lists as values
d2 = {'Point1': [1.3, 51, 10.6], 'Point2': [7.1, 11, 6.7]}
# Keys of different types (int and str, don't do this!)
d3 = {1: True, 'hej': 23}
The first two dictionaries above follow a consistent pattern. For d1 the keys are strings and the values are numbers. For d2 the keys are strings and the values are lists. These are well-structured dictionaries.
However, d3 has keys of mixed types! The first key is an integer and the second is a string. This is valid syntax, but not a good idea.
As with much of Python, this flexibility is convenient, but having many different types mixed in the same data structure can be confusing. To make code more readable, it is often preferred to keep the same pattern throughout the dictionary, i.e. all keys of the same type and all values of the same type, as in d1 and d2.
The keys and values can be extracted separately by the methods dict.keys() and
dict.values():
In [3]: d1.keys()
Out[3]: dict_keys(['axial_force', 'moment', 'shear'])
In [4]: d1.values()
Out[4]: dict_values([319.2, 74, 23])
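Beyond keys() and values(), the most common dictionary operations are fetching, adding and iterating. This sketch uses d1 from above. The extra key 'torsion' is made up for illustration.

```python
d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23}

# Fetch a value by its key
print(d1['moment'])          # 74

# .get() returns a default instead of raising KeyError for a missing key
print(d1.get('torsion', 0))  # 0

# Dictionaries are mutable: add (or update) a key/value pair
d1['torsion'] = 12

# Iterate over key/value pairs together
for key, value in d1.items():
    print(key, value)

# Repeating a key in a literal keeps only the last value
d4 = {'a': 1, 'a': 2}
print(d4)                    # {'a': 2}
```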
Tuples
Tuples are very similar to lists, but they are defined by parentheses (). The most notable difference from lists is that tuples are immutable.
Tuple syntax
t = (item_1, item_2, ..., item_n)
Mutability
Tuples are immutable. They cannot be changed after creation.
Tuple examples
In [5]:
# Simple tuple of integers
t1 = (1, 24, 56)
# Multiple types as tuple elements
t2 = (1, 1.62, '12', [1, 2 , 3])
# Tuple of tuples
points = ((4, 5), (12, 6), (14, 9))
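Immutability means that assigning to an element of a tuple raises an error. There is one caveat: a mutable object stored inside a tuple can still change. This sketch uses t1 and t2 from above to show both behaviours, along with tuple unpacking:

```python
t1 = (1, 24, 56)

# Indexing works as for lists
print(t1[1])  # 24

# Assigning to an element raises a TypeError, since tuples are immutable
try:
    t1[0] = 100
except TypeError as err:
    print('TypeError:', err)

# A tuple can be unpacked into separate variables
x, y, z = t1
print(x, y, z)  # 1 24 56

# Caveat: a mutable element (like a list) inside a tuple can still be modified
t2 = (1, 1.62, '12', [1, 2, 3])
t2[3].append(4)
print(t2)  # (1, 1.62, '12', [1, 2, 3, 4])
```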
Sets
Sets are defined with curly brackets {}. They are unordered and don't have an index, so their items cannot be accessed by position. All items in a set are unique.
Set syntax
s = {item_1, item_2, ..., item_n}
The primary idea about sets is the ability to perform set operations.
These are known from mathematics and can determine the union,
intersection, difference etc. of two given sets.
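Python exposes these set operations as operators. The two sets below are arbitrary examples:

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}

print(s1 | s2)  # union: {1, 2, 3, 4, 5}
print(s1 & s2)  # intersection: {3, 4}
print(s1 - s2)  # difference (in s1 but not in s2): {1, 2}
print(s1 ^ s2)  # symmetric difference (in exactly one of them): {1, 2, 5}

# Note: {} creates an empty dict, so an empty set must be written as set()
empty = set()
```

The same operations are also available as methods, e.g. s1.union(s2) and s1.intersection(s2).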
A list, string or tuple can be converted to a set by set(sequence_to_convert). Since sets only have unique items, the resulting set has the same values as the input sequence, but with duplicates removed. This can be a way to create a list with only unique elements (note that the original order is not preserved).
For example:
list_with_duplicates = [3, 1, 3, 2, 1]
# Convert list to set and back to list again, now with only unique elements
list_uniques = list(set(list_with_duplicates))
Python: Function
A function is a block of code that is first defined, and thereafter can be
called to run as many times as needed. A function might have
arguments, some of which can be optional if a default value is
specified.
A function is called with parentheses: function_name(). Arguments are placed inside the parentheses, separated by commas if there is more than one, similar to f(x, y) in mathematics.
A function can return one or more values to the caller. The values to return are put in the return statement. When the code hits a return statement, the function terminates. If no return statement is given, the function returns None.
def function_name(arg1, arg2, default_arg1=0, default_arg2=None):
    '''This is the docstring
    The docstring explains what the function does, so it is like a multiline comment. It does not have to be here,
    but it is good practice to use them to document the code. They are especially useful for more complicated
    functions, although functions should in general be kept as simple as possible.
    Arguments could be explained together with their types (e.g. strings, lists, dicts etc.).
    '''
    # Function code goes here
    # Possible 'return' statement terminating the function. If 'return' is not specified, function returns None.
    return return_val1, return_val2
If multiple values are to be returned, they can be separated by commas as shown; they are then returned together as a tuple.
Note that when using default arguments, it is good practice to only use immutable types (e.g. None, numbers, strings, tuples). A mutable default, such as a list, is created only once, when the function is defined, and is then shared between all calls.
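The pitfall with mutable defaults can be demonstrated in a few lines. The function names append_bad and append_good are hypothetical and serve only as an illustration. The usual fix uses None as the default and creates a new list inside the function.

```python
def append_bad(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call without 'items' shares (and mutates) the same list
    items.append(item)
    return items

def append_good(item, items=None):
    # Use an immutable default (None) and create a new list on each call
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- the list from the first call is reused
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```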
In [5]:
def say_hello_to(name):
    '''Say hello to the input name'''
    print(f'Hello {name}')

say_hello_to('Anders')      # <--- Calling the function prints 'Hello Anders'

r = say_hello_to('Anders')  # <--- Calling the function prints 'Hello Anders' and assigns None to r
print(r)                    # <--- Prints None, since the function had no return statement
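Default arguments and multiple return values can be combined in one small function. The function min_max and its scale parameter are hypothetical, chosen only to illustrate the mechanics:

```python
def min_max(values, scale=1.0):
    '''Return the smallest and the largest value, each multiplied by scale.'''
    return min(values) * scale, max(values) * scale

result = min_max([4, 1, 7])
print(result)     # (1.0, 7.0)  <- multiple return values come back as one tuple

# Unpack the tuple directly and override the default argument by name
low, high = min_max([4, 1, 7], scale=10)
print(low, high)  # 10 70
```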
Let us know your feedback as well ☺
QnA Session!
Numpy
Numpy: Why ?
- Numeric Python
- Alternative to Python lists: the NumPy array
- Calculations over entire arrays
- Easy and fast
- Installation: in the terminal, run pip install numpy
height = [1.73, 1.68, 1.71, 1.89, 1.79]
height
Out[1]: [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight
Out[2]: [65.4, 59.2, 63.6, 88.4, 68.7]
weight / height ** 2
Out[3]: TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
Solution →
Numpy: Calculation
import numpy as np
np_height = np.array(height)
np_height
Out[1]: array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array(weight)
np_weight
Out[2]: array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2
bmi
Out[3]: array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
Subsetting
bmi > 23
Out[4]: array([False, False, False,  True, False])
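The boolean array produced by the comparison can itself be placed inside square brackets. This keeps only the elements where the value is True. Here is a self-contained sketch that rebuilds bmi from the height and weight lists above:

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2

# A comparison produces a boolean array (a "mask") ...
mask = bmi > 23
# ... which selects only the elements where the mask is True
print(bmi[mask])  # only the 4th BMI value (about 24.75) remains
```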
Numpy: Data Types
Numerical types:
•integers (int)
•unsigned integers (uint)
•floating point (float)
•complex
Other data types:
•booleans (bool)
•string
•datetime
•Python object
Data Type | Description
bool_ | Boolean (True or False) stored as a byte
int8 | Byte (-128 to 127)
int16 | Integer (-32768 to 32767)
int32 | Integer (-2.15E+9 to 2.15E+9)
int64 | Integer (-9.22E+18 to 9.22E+18)
uint8 | Unsigned integer (0 to 255)
uint16 | Unsigned integer (0 to 65535)
uint32 | Unsigned integer (0 to 4.29E+9)
uint64 | Unsigned integer (0 to 1.84E+19)
float16 | Half precision signed float
float32 | Single precision signed float
float64 | Double precision signed float
complex64 | Complex number: two 32-bit floats (real and imaginary components)
complex128 | Complex number: two 64-bit floats (real and imaginary components)
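The dtype of an array can be inspected, set at creation and converted afterwards. The ranges in the table can be queried with np.iinfo. Here is a short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.dtype)  # the platform's default integer, e.g. int64 on most Linux/macOS systems

# The dtype can be set explicitly when the array is created
b = np.array([1, 2, 3], dtype=np.int8)
print(b.dtype)  # int8

# astype() returns a new array converted to another type
c = b.astype(np.float32)
print(c.dtype)  # float32

# np.iinfo reports the value range of an integer type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
```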
Numpy: Creating ndarrays
array = np.array([[0, 1, 2], [2, 3, 4]])
print(array)
[[0 1 2]
 [2 3 4]]
array = np.zeros((2, 3))
print(array)
[[0. 0. 0.]
 [0. 0. 0.]]
array = np.ones((2, 3))
print(array)
[[1. 1. 1.]
 [1. 1. 1.]]
array = np.eye(3)
print(array)
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
array = np.arange(0, 10, 2)
print(array)
[0 2 4 6 8]
array = np.random.randint(0, 10, (3, 3))
print(array)  # random values, so your output will differ
[[6 4 3]
 [1 5 6]
 [9 8 5]]
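Because the last example is random, its output changes on every run. When results need to be reproducible, a seeded generator can be used instead. The sketch below uses NumPy's Generator API (np.random.default_rng). The seed value 42 is arbitrary.

```python
import numpy as np

# A generator with a fixed seed produces the same "random" numbers on every run
rng = np.random.default_rng(seed=42)
array = rng.integers(0, 10, size=(3, 3))  # integers from 0 to 9 (10 is excluded)

print(array.shape)  # (3, 3)
print(array.ndim)   # 2
```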
Numpy: Indexing & Slicing
One-dimensional arrays are simple; on the surface they act similarly to Python lists:
arr = np.arange(10)
print(arr)       # [0 1 2 3 4 5 6 7 8 9]
print(arr[5])    # 5
print(arr[5:8])  # [5 6 7]
arr[5:8] = 12
print(arr)       # [ 0  1  2  3  4 12 12 12  8  9]
As you can see, if you assign a scalar value to a slice, as in
arr[5:8] = 12, the value is propagated (or broadcasted) to
the entire selection.
An important first distinction from Python’s built-in lists is
that array slices are views on the original array.
This means that the data is not copied, and any
modifications to the view will be reflected in the source
array.
arr = np.arange(10)
print(arr)
# [0 1 2 3 4 5 6 7 8 9]
arr_slice = arr[5:8]
print(arr_slice)
# [5 6 7]
arr_slice[1] = 12345
print(arr)
# [    0     1     2     3     4     5 12345     7     8     9]
arr_slice[:] = 64
print(arr)
# [ 0 1 2 3 4 64 64 64 8 9]
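If an independent piece of the data is needed, the slice has to be copied explicitly with .copy(). The sketch below also uses np.shares_memory to check whether two arrays share the same underlying data:

```python
import numpy as np

arr = np.arange(10)

# .copy() creates an independent array instead of a view
arr_copy = arr[5:8].copy()
arr_copy[:] = 64
print(arr)  # unchanged: [0 1 2 3 4 5 6 7 8 9]

# np.shares_memory tells whether two arrays use the same data
print(np.shares_memory(arr, arr[5:8]))  # True (a slice is a view)
print(np.shares_memory(arr, arr_copy))  # False (a copy)
```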
Let us know your feedback as well ☺
QnA Session!
Pandas
Pandas: DataFrame Methods & Attributes
df.attribute description
dtypes list the types of the columns
columns list the column names
axes list the row labels and column names
ndim number of dimensions
size number of elements
shape return a tuple representing the dimensionality
values numpy representation of the data
df.method() description
head( [n] ), tail( [n] ) first/last n rows
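describe() generate descriptive statistics (numeric columns only, by default)
max(), min() return max/min values (numeric_only=True restricts them to numeric columns)
mean(), median() return mean/median values for the numeric columns (pass numeric_only=True if the data frame also has non-numeric columns)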
std() standard deviation
sample([n]) returns a random sample of the data frame
dropna() drop all the records with missing values
Unlike attributes, methods are called with parentheses.
All attributes and methods can be listed with a dir() function: dir(df)
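To see the difference between attributes and methods in practice, here is a sketch on a tiny DataFrame. The column names and values are made up for illustration and are not taken from the Salaries data.

```python
import pandas as pd

# A tiny hypothetical DataFrame
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf'],
    'salary': [186960, 93000, 152000, 110515],
    'service': [49, 2, 20, 14],
})

# Attributes: accessed without parentheses
print(df.shape)    # (4, 3)
print(df.columns)  # the column labels
print(df.dtypes)   # rank holds strings; salary and service are integers

# Methods: called with parentheses
print(df.head(2))                  # the first 2 rows
print(df.mean(numeric_only=True))  # mean of the numeric columns only
```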
Pandas
import pandas as pd

# Read csv file
df = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv")
Pandas: Dataframe Data Types
Pandas Type | Native Python Type | Description
object | str (or mixed) | The most general dtype. It is assigned to a column that holds strings or mixed types (e.g. numbers and strings).
int64 | int | Integer numbers. 64 refers to the number of bits of memory used to hold each value.
float64 | float | Numbers with decimals. If a column contains integers and missing values (NaN), pandas defaults to float64, because NaN is itself a floating-point value.
datetime64[ns], timedelta64[ns] | N/A (but see the datetime module in Python’s standard library) | Values meant to hold time data. Look into these for time series experiments.
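The dtype pandas infers for a column can be checked with .dtype. The sketch below uses small made-up Series to show the cases from the table. It shows an integer column turning into float64 once a missing value appears, and how strings can be parsed into datetime64 values with pd.to_datetime.

```python
import pandas as pd

# Integers only: an integer dtype (int64)
s1 = pd.Series([1, 2, 3])
print(s1.dtype)

# The same integers plus a missing value: NaN is a float, so the column becomes float64
s2 = pd.Series([1, 2, None])
print(s2.dtype)  # float64

# Mixed numbers and strings: object
s3 = pd.Series([1, 'two', 3.0])
print(s3.dtype)  # object

# Strings can be parsed into datetime64 values
dates = pd.to_datetime(pd.Series(['2021-09-01', '2021-09-15']))
print(dates.dtype)          # datetime64[ns] in pandas 2.x (the time unit may differ in other versions)
print(dates[1] - dates[0])  # a Timedelta of 14 days
```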
Pandas: Dataframe Group By
Using the groupby method we can:
- Split the data into groups based on some criteria
- Calculate statistics (or apply a function) to each group
Once a groupby object is created, we can calculate various statistics for each group.
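Here is a minimal sketch of the split/apply idea. A small made-up DataFrame stands in for the Salaries data, and the column names rank and salary are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: salaries by academic rank
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf', 'AsstProf'],
    'salary': [186960, 93000, 152000, 110515, 97000],
})

# Split the rows into groups by 'rank' ...
groups = df.groupby('rank')

# ... and compute a statistic for each group
print(groups['salary'].mean())  # mean salary per rank
print(groups.size())            # number of rows in each group
```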
Pandas: Dataframe Filtering
To subset the data, we can apply Boolean indexing.
This indexing is commonly known as a filter.
For example, we can subset the rows in which the salary value is greater than $120K.
Any comparison operator can be used to subset the data:
> greater; >= greater or equal;
< less; <= less or equal;
== equal; != not equal;
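Here is a sketch of the salary filter described above, on made-up data. The column names rank and salary are assumptions and are not checked against the actual CSV. Conditions are combined with & (and) or | (or), and each condition must be wrapped in parentheses.

```python
import pandas as pd

# Hypothetical data standing in for the Salaries table
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf', 'AsstProf'],
    'salary': [186960, 93000, 152000, 110515, 97000],
})

# Boolean indexing: keep the rows in which salary is greater than $120K
high = df[df['salary'] > 120000]
print(high)

# Combine conditions with & (and) / | (or); each condition needs its own parentheses
prof_high = df[(df['rank'] == 'Prof') & (df['salary'] > 160000)]
print(prof_high)
```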
Pandas: Dataframe Selecting Row
If we need to select a range of rows, we can specify the range using ":".
Notice that the first row has position 0, and the last value in the range is omitted.
So for the range 0:10, the first 10 rows are returned, with positions starting at 0 and ending at 9.
Rows (and columns) can also be selected by integer position with iloc.
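Here is a sketch of range-based row selection on a made-up 20-row DataFrame (the column name value is hypothetical). With a default integer index, df[0:10] and df.iloc[0:10] select the same rows. iloc can additionally select columns by position.

```python
import pandas as pd

# Hypothetical DataFrame with 20 rows
df = pd.DataFrame({'value': range(20)})

first_10 = df[0:10]         # rows at positions 0..9 (the end position 10 is excluded)
same_rows = df.iloc[0:10]   # the same selection with explicit position-based indexing
print(len(first_10), len(same_rows))  # 10 10

# iloc[rows, columns]: rows 0..4 of the first column
print(df.iloc[0:5, 0])
```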
Pandas: Dataframe Common Aggregate
Aggregation - computing a summary statistic
about each group, i.e.
compute group sums or means
compute group sizes/counts
Common aggregation functions:
min, max
count, sum, prod
mean, median, mode, mad (note: mad was removed from pandas in version 2.0)
std, var
df.method() description
describe Basic statistics (count, mean, std, min, quantiles, max)
min, max Minimum and maximum values
mean, median, mode Arithmetic average, median and mode
var, std Variance and standard deviation
sem Standard error of mean
skew Sample skewness
kurt kurtosis
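Several of these aggregation functions can be applied to each group in one call with agg(), which takes a list of function names. Here is a sketch on made-up data (the column names rank and salary are assumptions):

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf', 'AsstProf'],
    'salary': [186960, 93000, 152000, 110515, 97000],
})

# Several aggregation functions per group at once
summary = df.groupby('rank')['salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)

# Basic statistics (count, mean, std, min, quantiles, max) for one column
print(df['salary'].describe())
```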
Let us know your feedback as well ☺
QnA Session!
Introduction Machine Learning
Intro ML
Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined
the term “Machine Learning” as – “Field of study that gives computers the capability to
learn without being explicitly programmed”.
How is it different from traditional programming?
- In traditional programming, we feed the input and the program logic, and run the program to get the output.
- In machine learning, we feed the input and the expected output to the machine during training, and the machine creates its own logic (the model), which is then evaluated during testing.
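The contrast can be made concrete with a toy example. The spam data and the learn_threshold helper below are hypothetical. The "learner" is the simplest possible one: it tries every candidate threshold and keeps the one that classifies the most training examples correctly. This kind of learner is known as a decision stump.

```python
# Traditional programming: a human writes the rule (the program logic)
def is_spam_rule(num_links):
    return num_links > 5

# Machine learning: the rule is derived from example inputs and expected outputs.
# Hypothetical training data: number of links in an email -> spam (True) or not (False)
inputs = [0, 1, 2, 3, 8, 9, 12, 15]
outputs = [False, False, False, False, True, True, True, True]

def learn_threshold(xs, ys):
    '''Return the threshold t for which the rule x > t fits the most training examples.'''
    best_t, best_correct = None, -1
    for t in sorted(set(xs)):
        correct = sum((x > t) == y for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

threshold = learn_threshold(inputs, outputs)
print(threshold)  # 3

# The learned logic is then evaluated on new (test) data
print(8 > threshold, 1 > threshold)  # True False
```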
Intro ML: Terminology
Terminologies that one should know before starting Machine Learning:
- Model: A model is a specific representation learned from data by applying some machine learning algorithm. A model is also called a hypothesis.
- Feature: A feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in order to predict a fruit, there may be features like color, smell, taste, etc.
- Target (Label): A target variable or label is the value to be predicted by our model. For the fruit example discussed in the features section, the label for each set of inputs would be the name of the fruit, like apple, orange, banana, etc.
- Training: The idea is to give a set of inputs (features) and their expected outputs (labels), so that after training we have a model (hypothesis) that maps new data to one of the categories it was trained on.
- Prediction: Once our model is ready, it can be fed a set of inputs, for which it will provide a predicted output (label).
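The terms above can be mapped onto a few lines of code. This sketch uses a 1-nearest-neighbour model, chosen for illustration because the deck does not name a specific algorithm. The fruit features (weight in grams and a colour score) and their values are hypothetical. In practice, features on such different scales would usually be normalized first.

```python
import math

# Hypothetical feature vectors: [weight in grams, colour score 0 (green) .. 1 (red)]
training_features = [[150, 0.9], [170, 0.8], [120, 0.2], [130, 0.3]]
# Targets (labels) for each feature vector
training_labels = ['apple', 'apple', 'pear', 'pear']

def train(features, labels):
    '''"Training" a 1-nearest-neighbour model simply stores the labelled examples.'''
    return list(zip(features, labels))

def predict(model, x):
    '''Predict the label of the stored example closest to feature vector x.'''
    _, nearest_label = min(model, key=lambda pair: math.dist(pair[0], x))
    return nearest_label

model = train(training_features, training_labels)  # the model (hypothesis)
print(predict(model, [160, 0.85]))  # apple
print(predict(model, [125, 0.25]))  # pear
```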
Intro ML: Type of Learning
- Supervised Learning
- Unsupervised Learning
- Semi-Supervised Learning
1. Supervised Learning:
Supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one which has both input and output parameters. In this type of learning, both the training and validation datasets are labelled.
Types of Supervised Learning:
- Classification
- Regression
2. Unsupervised Learning:
Unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences, without any prior training on labelled data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels.
Types of Unsupervised Learning:
- Clustering
- Association
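Clustering can be illustrated with a minimal one-dimensional k-means. It alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. The data points, the initial centres and the helper kmeans_1d are hypothetical. No labels are used anywhere.

```python
def kmeans_1d(points, centers, iterations=10):
    '''Group 1-D points around len(centers) cluster centres (a minimal k-means).'''
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest centre
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster (keep it if empty)
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabelled data: only the points themselves are given
points = [1, 2, 3, 10, 11, 12]
centers, clusters = kmeans_1d(points, centers=[0, 5])
print(centers)   # [2.0, 11.0]
print(clusters)  # [[1, 2, 3], [10, 11, 12]]
```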
3. Semi-supervised machine learning:
Labelling large datasets is costly, while unsupervised learning gets no guidance from labels. To counter these disadvantages, the concept of Semi-Supervised Learning was introduced. In this type of learning, the algorithm is trained on a combination of labelled and unlabelled data. Typically, this combination contains a very small amount of labelled data and a very large amount of unlabelled data.
Intro ML: Classification
We have a training set of observations (e.g., labeled images) and a test set
that we use only for evaluation.
Formally, given a training set (x_i, y_i) for i = 1, …, n, we want to create a classification model f that can predict the label y for a new x.
The machine learning algorithm will create the function f.
For binary labels y ∈ {-1, +1}, the predicted value of y for a new x is sign(f(x)).
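One concrete way to obtain such an f is a linear model f(x) = w · x + b trained with the perceptron rule. This is a swapped-in example, since the deck does not name an algorithm. The sketch assumes labels in {-1, +1}, uses a hypothetical 2-D training set, and uses the convention sign(0) = +1.

```python
# Hypothetical 2-D training set with labels y in {-1, +1}
X = [(2.0, 1.0), (3.0, 2.5), (-1.0, -2.0), (-2.0, -0.5)]
y = [1, 1, -1, -1]

def sign(v):
    # Convention used here: sign(0) = +1
    return 1 if v >= 0 else -1

def train_perceptron(X, y, epochs=10, lr=1.0):
    '''Learn w and b of the linear model f(x) = w . x + b with the perceptron rule.'''
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in zip(X, y):
            if sign(w[0] * x1 + w[1] * x2 + b) != target:  # misclassified example
                w[0] += lr * target * x1
                w[1] += lr * target * x2
                b += lr * target
    return w, b

w, b = train_perceptron(X, y)

def f(x):
    return w[0] * x[0] + w[1] * x[1] + b

# The predicted label for a new x is sign(f(x))
print(sign(f((2.5, 2.0))), sign(f((-1.5, -1.0))))  # 1 -1
```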
Where is classification used?
- Yes/No questions: binary classification
- automatic handwriting recognition, speech recognition, biometrics, document
classification, spam detection, predicting credit default risk, detecting credit
card fraud, predicting customer churn, predicting medical outcomes (strokes,
side effects, etc.)
Let us know your feedback as well ☺
QnA Session!
Introduction Scikit-Learn
Intro Scikit-Learn: Why?
A. Simple and efficient tools for predictive data analysis
- Machine Learning methods
- Data processing
- Visualization
B. Accessible to everybody, and reusable in various contexts
- Documented API with lots of examples
- Not bound to Training frameworks (e.g. Tensorflow, Pytorch)
- Building blocks for your data analysis
C. Built on NumPy, SciPy, and matplotlib
- No own data types (unlike Pandas)
- Benefit from NumPy and SciPy optimizations
- Extends the most common visualisation tool
Open source, commercially usable - BSD license
Version 1.0 since September 2021
•https://scikit-learn.org/stable/
Intro Scikit-Learn: Tools
A. Classification:
Categorizing objects to one or more classes.
- Support Vector Machines (SVM)
- Nearest Neighbors
- Random Forest
- . . .
B. Regression:
Prediction of one (uni-) or more (multivariate) continuous-valued attributes.
- Support Vector Regression (SVR)
- Nearest Neighbors
- Random Forest
- . . .
C. Clustering:
Group objects of a set.
- k-Means
- Spectral Clustering
- Mean-Shift
- . . .
D. Dimensionality reduction:
Reducing the number of random variables.
- Principal Component Analysis (PCA)
- Feature Selection
- Non-negative Matrix Factorization
- . . .
E. Model selection:
Compare, validate and choose parameters/models.
- Grid Search
- Cross Validation
- . . .
F. Pre-Processing:
Prepare/transform data before training models.
- Conversion
- Normalization
- Feature extraction
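Several of these building blocks can be combined in a few lines. The sketch below chains pre-processing (StandardScaler) and classification (an SVM) in a pipeline. It then uses model selection (grid search with 5-fold cross validation) to choose the SVM parameter C. The bundled iris dataset and the candidate values of C are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pre-processing (normalization) and classification (SVM) chained into one estimator
pipe = make_pipeline(StandardScaler(), SVC())

# Model selection: try several values of C with 5-fold cross validation.
# make_pipeline names the SVC step 'svc', hence the 'svc__C' parameter name.
search = GridSearchCV(pipe, param_grid={'svc__C': [0.1, 1, 10]}, cv=5)
search.fit(X, y)

print(search.best_params_)          # the best C found
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
```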
Intro Scikit-Learn: Supervised ML Flow
Easy install via pip or conda for Windows, macOS and Linux, e.g.:
$ pip install scikit-learn
or
$ conda install -c intel scikit-learn
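A typical supervised flow in scikit-learn is: load labelled data, split it into training and test sets, fit a model, then predict and evaluate. The sketch below follows these steps using the bundled iris dataset and a k-nearest-neighbours classifier. The dataset, the model, test_size=0.25 and random_state=0 are illustrative choices, not taken from the deck.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Load labelled data: feature vectors X and targets y
X, y = load_iris(return_X_y=True)

# 2. Split into a training set and a test set that is only used for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 3. Choose a model and train it on the training set
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 4. Predict labels for unseen data and evaluate
y_pred = model.predict(X_test)
print(model.score(X_test, y_test))  # mean accuracy on the test set
```

Every scikit-learn estimator follows the same fit/predict pattern. Swapping in another classifier, for example a random forest, only changes the line that creates the model.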
Kindly turn on your camera ☺
Let’s take a group photo!

 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Improve Your Edge on Machine Learning - Day 1.pptx

  • 1. Improve Your Edge on Machine Learning - Part 1
  • 4. Python: Data Types, Assignment, Operators. Both " and ' can be used to denote strings. If the apostrophe character should be part of the string, use " as outer boundaries: "Barack's last name is Obama". Alternatively, \ can be used as an escape character: 'Barack\'s last name is Obama'. Integers (int) In [1]: # Integers a = 2 b = 239 Floating point numbers (float) In [2]: # Floats c = 2.1 d = 239.0 Strings (str) In [3]: e = 'Hello world!' my_text = 'This is Future Lab' Boolean (bool) In [4]: x = True y = False Standard calculator-like operations: most basic operations on integers and floats, such as addition, subtraction and multiplication, work as one would expect: In [5]: 2 * 4 Out[5]: 8 In [6]: 2 / 5 Out[6]: 0.4 In [7]: 3.1 + 7.4 Out[7]: 10.5 Exponents: exponents are denoted by **: In [8]: 2**3 Out[8]: 8 Floor division: floor division is denoted by //. It divides and then rounds the result down to the nearest integer (for positive numbers this simply removes the decimals): In [9]: 10 // 3 Out[9]: 3 Modulo: modulo is denoted by %. It returns the remainder after a division: In [10]: 10 % 3 Out[10]: 1 Operations on strings: strings can be added (concatenated) by use of the addition operator +: In [11]: 'Bruce' + ' ' + 'Wayne' Out[11]: 'Bruce Wayne' Multiplication is also allowed: In [12]: 'a' * 3 Out[12]: 'aaa'
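A minimal plain-Python sketch (no extra libraries) checking the escape-character rule and showing how // and % behave with a negative operand, where "rounding down" and "dropping the decimals" give different answers. The variable names are illustrative only.

```python
# Two equivalent ways to put an apostrophe inside a string
s1 = 'Barack\'s last name is Obama'   # backslash escapes the quote
s2 = "Barack's last name is Obama"    # double quotes as outer boundaries
print(s1 == s2)                       # True

# Floor division rounds down (towards negative infinity), not towards zero
q_pos = 10 // 3      # 3
q_neg = -10 // 3     # -4, because -3.33... rounded down is -4
# Modulo is consistent with floor division: (a // b) * b + a % b == a
r_neg = -10 % 3      # 2
print(q_pos, q_neg, r_neg, q_neg * 3 + r_neg)   # 3 -4 2 -10

# True division always returns a float, even when the result is whole
print(6 / 3)         # 2.0
print('a' * 3 + 'b') # aaab
```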
  • 5. Python: Control Flow. In Python, code blocks are separated by use of indentation. See the definition of an if-statement below: Syntax of conditional blocks if condition: # Code goes here (must be indented!) # Otherwise, IndentationError will be thrown # Code placed here is outside of the if-statement Here condition is evaluated as a boolean (True or False); values that are not booleans are tested for truthiness, so for example 0, None and empty containers count as False. Remember: 1. The : must be present after condition. 2. The line immediately after : must be indented. 3. The if-statement is exited by reverting the indentation as shown above. This is how Python interprets the code as a block. The same indentation rules apply to all types of code blocks; the if-block above is just an example. Other types of code blocks include for and while loops, functions etc. Most code editors automatically indent the next line when you hit Enter after the :, so it doesn't take long to get used to this. if-statements An if-statement has the following syntax: In [13]: x = 2 if x > 1: print('x is larger than 1') if / else-statements In [14]: y = 1 if y > 1: print('y is larger than 1') else: print('y is less than or equal to 1') if / elif / else In [15]: z = 0 if z > 1: print('z is larger than 1') elif z < 1: print('z is less than 1') else: print('z is equal to 1')
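The if / elif / else example can be wrapped in a function so it can be tested with several values. The helper name describe is hypothetical. The last lines sketch the truthiness rule for non-boolean conditions.

```python
def describe(z):
    """Compare z with 1 using an if / elif / else chain (hypothetical helper)."""
    if z > 1:
        return 'z is larger than 1'
    elif z < 1:
        return 'z is less than 1'
    else:
        return 'z is equal to 1'

for value in (0, 1, 2):
    print(value, '->', describe(value))

# Conditions do not have to be literal booleans: Python tests their truthiness.
# 0, None, '' and empty containers count as False; most other values as True.
items = []
message = 'list has items' if items else 'list is empty'
print(message)   # list is empty
```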
  • 6. Python: Data Structures. Data structures are constructs that can contain one or more variables. They are containers that can store a lot of data in a single entity. Python's four basic data structures are: ● Lists ● Dictionaries ● Tuples ● Sets Lists Lists are defined by square brackets [] with elements separated by commas. They can have elements of any data type. Lists are arguably the most used data structure in Python. List syntax L = [item_1, item_2, ..., item_n] Mutability Lists are mutable. They can be changed after creation. List In [1]: # List with integers a = [10, 20, 30, 40] # Multiple data types in the same list b = [1, True, 'Hi!', 4.3] # List of lists c = [['Nested', 'lists'], ['are', 'possible']]
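A short sketch of what "lists are mutable" means in practice, reusing the lists a, b and c from the slide. append and len are standard built-in list operations.

```python
a = [10, 20, 30, 40]
a[0] = 99            # lists are mutable: replace an element in place
a.append(50)         # add an element at the end
print(a)             # [99, 20, 30, 40, 50]
print(len(a))        # 5

b = [1, True, 'Hi!', 4.3]
print(type(b[2]))    # <class 'str'>: each element keeps its own type

c = [['Nested', 'lists'], ['are', 'possible']]
print(c[1][0])       # are  (first index picks the inner list, second the element)
```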
  • 7. Python: Data Structures. Dictionaries: dictionaries hold key/value pairs enclosed in curly brackets {}. A value can be fetched by querying the corresponding key. Referring to the data via logically named keys instead of list indexes makes the code more readable. Dictionary syntax d = {key1: value1, key2: value2, ..., key_n: value_n} Note that values can be of any data type like floats, strings etc., but they can also be lists or other data structures. Keys must be unique within the dictionary: if the same key appears twice, the later value silently replaces the earlier one, so a key always identifies exactly one value. Keys must also be of an immutable (hashable) type. Mutability: dictionaries are mutable. They can be changed after creation. # Strings as keys and numbers as values d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23} # Strings as keys and lists as values d2 = {'Point1': [1.3, 51, 10.6], 'Point2': [7.1, 11, 6.7]} # Keys of different types (int and str, don't do this!) d3 = {1: True, 'hej': 23} The first two dictionaries above follow a consistent pattern. For d1 the keys are strings and the values are numbers. For d2 the keys are strings and the values are lists. These are well-structured dictionaries. However, d3 has keys of mixed types! The first key is an integer and the second is a string. This is valid syntax, but not a good idea. As with most things in Python, the flexibility is very nice, but it can also be confusing to have many different types mixed in the same data structure. To make code more readable, it is usually preferable to keep the same pattern throughout the dictionary, i.e. all keys of the same type and all values of the same type, as in d1 and d2. The keys and values can be extracted separately by the methods dict.keys() and dict.values(): In [3]: d1.keys() Out[3]: dict_keys(['axial_force', 'moment', 'shear']) In [4]: d1.values() Out[4]: dict_values([319.2, 74, 23])
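A minimal sketch of everyday dictionary operations on the slide's d1: lookup by key, adding and updating entries, a safe lookup with a default, and iterating over pairs. The extra keys 'torsion' and 'deflection' are made up for illustration.

```python
d1 = {'axial_force': 319.2, 'moment': 74, 'shear': 23}

# Fetch a value by its key
print(d1['moment'])             # 74

# Dictionaries are mutable: add a new entry and update an existing one
d1['torsion'] = 5.0             # hypothetical new key
d1['shear'] = 25

# .get() returns a default instead of raising KeyError for a missing key
print(d1.get('deflection', 0))  # 0

# Iterate over key/value pairs together
for key, value in d1.items():
    print(key, value)

# Repeating a key in a literal does not raise an error: the last value wins
d4 = {'a': 1, 'a': 2}
print(d4)                       # {'a': 2}
```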
  • 8. Python: Data Structures. Tuples: tuples are very similar to lists, but they are defined by parentheses (). The most notable difference from lists is that tuples are immutable. Tuple syntax t = (item_1, item_2, ..., item_n) Mutability: tuples are immutable. They cannot be changed after creation. Tuple examples In [5]: # Simple tuple of integers t1 = (1, 24, 56) # Multiple types as tuple elements t2 = (1, 1.62, '12', [1, 2, 3]) # Tuple of tuples points = ((4, 5), (12, 6), (14, 9)) Sets: sets are defined with curly brackets {} (an empty set must be created with set(), because {} creates an empty dictionary). They are unordered and don't have an index. See the description of indexing further down. Sets also contain only unique items. Set syntax s = {item_1, item_2, ..., item_n} The primary idea behind sets is the ability to perform set operations. These are known from mathematics and can determine the union, intersection, difference etc. of two given sets. A list, string or tuple can be converted to a set by set(sequence_to_convert). Since sets only have unique items, the resulting set has the same values as the input sequence, but with duplicates removed (and without any guaranteed order). This can be a way to create a list with only unique elements. For example: # Convert list to set and back to list again with now only unique elements list_uniques = list(set(list_with_duplicates))
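A sketch of tuple immutability, the basic set operations, and deduplication. The list_with_duplicates values are made up. dict.fromkeys is shown as an alternative that keeps the original order (dicts preserve insertion order since Python 3.7).

```python
t1 = (1, 24, 56)
try:
    t1[0] = 100                # tuples are immutable ...
except TypeError as err:
    print('TypeError:', err)   # ... so item assignment raises TypeError

# Set operations
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5}
print(s1 | s2)   # union
print(s1 & s2)   # intersection
print(s1 - s2)   # difference

# Removing duplicates: list(set(...)) loses the original order,
# list(dict.fromkeys(...)) keeps the first occurrence of each item in order
list_with_duplicates = [3, 1, 3, 2, 1]
list_uniques = list(set(list_with_duplicates))
ordered_uniques = list(dict.fromkeys(list_with_duplicates))
print(sorted(list_uniques))  # [1, 2, 3]
print(ordered_uniques)       # [3, 1, 2]
```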
  • 9. Python: Functions. A function is a block of code that is first defined, and thereafter can be called to run as many times as needed. A function might have arguments, some of which can be optional if a default value is specified. A function is called with parentheses: function_name(). Arguments are placed inside the parentheses and are comma separated if there is more than one, similar to f(x, y) in mathematics. A function can return one or more values to the caller. The values to return are put in the return statement. When the code hits a return statement the function terminates. If no return statement is given, the function returns None. def function_name(arg1, arg2, default_arg1=0, default_arg2=None): '''This is the docstring The docstring explains what the function does, so it is like a multiline comment. It does not have to be here, but it is good practice to use them to document the code. They are especially useful for more complicated functions, although functions should in general be kept as simple as possible. Arguments could be explained together with their types (e.g. strings, lists, dicts etc.). ''' # Function code goes here # Possible 'return' statement terminating the function. If 'return' is not specified, function returns None. return return_val1, return_val2 If multiple values are to be returned, they can be separated by commas as shown. The returned entity will by default be a tuple. Note that when using default arguments, it is good practice to only use immutable types. An example further below will demonstrate why this is recommended. In [5]: def say_hello_to(name): ''' Say hello to the input name ''' print(f'Hello {name}') say_hello_to('Anders') # <--- Calling the function prints 'Hello Anders' r = say_hello_to('Anders') # <--- Calling the function prints 'Hello Anders' and assigns None to r print(r) # <--- Prints None, since function had no return statement
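A minimal sketch of why mutable default arguments are discouraged, and of returning several values as a tuple. The function names append_bad, append_good and min_max are hypothetical. The key point is that a default value is evaluated once, when the def statement runs, so a default list is shared between calls.

```python
def append_bad(item, items=[]):
    """The default list is created once, when the function is defined."""
    items.append(item)
    return items

def append_good(item, items=None):
    """Use None as the default and create a fresh list inside the function."""
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2]  <- the same default list was reused
print(append_good(1))  # [1]
print(append_good(2))  # [2]

def min_max(values):
    """Return two values; Python packs them into a tuple."""
    return min(values), max(values)

result = min_max([4, 9, 1])
low, high = min_max([4, 9, 1])  # tuple unpacking into two names
print(result)                   # (1, 9)
```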
  • 10. Let us know your feedback as well ☺ QnA Session!
  • 11. Numpy
  • 12. Numpy: Why? - Numeric Python - Alternative to Python List: NumPy Array - Calculations over entire arrays - Easy and Fast - Installation in the terminal: pip install numpy height = [1.73, 1.68, 1.71, 1.89, 1.79] height Out[1]: [1.73, 1.68, 1.71, 1.89, 1.79] weight = [65.4, 59.2, 63.6, 88.4, 68.7] weight Out[2]: [65.4, 59.2, 63.6, 88.4, 68.7] weight / height ** 2 raises TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int' Solution →
  • 13. Numpy: Calculation. import numpy as np np_height = np.array(height) np_height Out[1]: array([1.73, 1.68, 1.71, 1.89, 1.79]) np_weight = np.array(weight) np_weight Out[2]: array([65.4, 59.2, 63.6, 88.4, 68.7]) bmi = np_weight / np_height ** 2 bmi Out[3]: array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836]) Subsetting bmi > 23 Out[4]: array([False, False, False, True, False])
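The "Subsetting" step usually continues by using the boolean array as an index, which keeps only the matching elements. A self-contained sketch with the slide's height and weight data, including the list-comprehension equivalent for comparison:

```python
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2   # element-wise, no explicit loop needed

mask = bmi > 23                    # boolean array, one entry per person
print(mask)                        # [False False False  True False]
print(bmi[mask])                   # only the BMI values above 23

# The same calculation on plain lists needs an explicit loop or comprehension
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]
```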
  • 14. Numpy: Data Types. Numerical types: •integers (int) •unsigned integers (uint) •floating point (float) •complex Other data types: •booleans (bool) •string •datetime •Python object. Data type: description. bool_: Boolean (True or False) stored as a byte; int8: byte (-128 to 127); int16: integer (-32768 to 32767); int32: integer (-2.15E+9 to 2.15E+9); int64: integer (-9.22E+18 to 9.22E+18); uint8: unsigned integer (0 to 255); uint16: unsigned integer (0 to 65535); uint32: unsigned integer (0 to 4.29E+9); uint64: unsigned integer (0 to 1.84E+19); float16: half precision signed float; float32: single precision signed float; float64: double precision signed float; complex64: complex number, two 32-bit floats (real and imaginary components); complex128: complex number, two 64-bit floats (real and imaginary components)
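The ranges in the table do not need to be memorised: NumPy reports them through np.iinfo (integer types) and np.finfo (floating point types). A short sketch:

```python
import numpy as np

# Query the limits of some of the integer types listed in the table
for dtype in (np.int8, np.int16, np.int32, np.int64, np.uint8, np.uint64):
    info = np.iinfo(dtype)
    print(np.dtype(dtype).name, info.min, info.max)

# Floating point types: eps is the gap between 1.0 and the next representable value
print(np.finfo(np.float16).eps, np.finfo(np.float32).eps, np.finfo(np.float64).eps)

# The dtype can be chosen explicitly when creating an array
a = np.array([1, 2, 3], dtype=np.int16)
print(a.dtype, a.itemsize)    # int16 2  (itemsize is the number of bytes per element)

# Mixing ints and floats makes NumPy pick a common type that holds all values
b = np.array([1, 2.5])
print(b.dtype)                # float64
```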
  • 15. Numpy: Creating ndarrays. arr = np.array([[0,1,2],[2,3,4]]) print(arr) # [[0 1 2] [2 3 4]] arr = np.zeros((2,3)) print(arr) # [[0. 0. 0.] [0. 0. 0.]] arr = np.ones((2,3)) print(arr) # [[1. 1. 1.] [1. 1. 1.]] arr = np.eye(3) print(arr) # [[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]] arr = np.arange(0, 10, 2) print(arr) # [0 2 4 6 8] arr = np.random.randint(0, 10, (3,3)) print(arr) # random, e.g. [[6 4 3] [1 5 6] [9 8 5]]
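Two points related to array creation, sketched below. arange excludes its stop value, and reshape changes the shape without copying the values. For random arrays, NumPy's newer Generator API (np.random.default_rng) is shown as an alternative to np.random.randint; with a fixed seed, the "random" values are the same on every run. The seed value 0 is arbitrary.

```python
import numpy as np

arr = np.arange(0, 10, 2)          # start, stop (excluded), step
print(arr)                         # [0 2 4 6 8]

grid = np.arange(6).reshape(2, 3)  # same six values, now 2 rows x 3 columns
print(grid.shape, grid.ndim)       # (2, 3) 2

# A seeded generator makes random arrays reproducible between runs
rng = np.random.default_rng(seed=0)
r = rng.integers(0, 10, size=(3, 3))   # integers in [0, 10)
print(r.shape, r.min() >= 0, r.max() < 10)
```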
  • 16. Numpy: Indexing & Slicing. One-dimensional arrays are simple; on the surface they act similarly to Python lists: arr = np.arange(10) print(arr) # [0 1 2 3 4 5 6 7 8 9] print(arr[5]) # 5 print(arr[5:8]) # [5 6 7] arr[5:8] = 12 print(arr) # [ 0 1 2 3 4 12 12 12 8 9] As you can see, if you assign a scalar value to a slice, as in arr[5:8] = 12, the value is propagated (or broadcast) to the entire selection. An important first distinction from Python's built-in lists is that array slices are views on the original array. This means that the data is not copied, and any modifications to the view will be reflected in the source array. arr = np.arange(10) print(arr) # [0 1 2 3 4 5 6 7 8 9] arr_slice = arr[5:8] print(arr_slice) # [5 6 7] arr_slice[1] = 12345 print(arr) # [ 0 1 2 3 4 5 12345 7 8 9] arr_slice[:] = 64 print(arr) # [ 0 1 2 3 4 64 64 64 8 9]
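When an independent slice is needed, an explicit .copy() breaks the link to the source array. The sketch below uses np.shares_memory to confirm which objects point at the same data, and shows that plain list slicing copies instead.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]                 # slicing returns a view on the same data
sliced_copy = arr[5:8].copy()   # .copy() returns independent data

view[:] = 64                    # changes arr as well
sliced_copy[:] = -1             # leaves arr untouched
print(arr)                      # [ 0  1  2  3  4 64 64 64  8  9]
print(np.shares_memory(arr, view), np.shares_memory(arr, sliced_copy))  # True False

# Plain Python lists behave differently: slicing a list makes a copy
lst = list(range(10))
lst_slice = lst[5:8]
lst_slice[0] = 64
print(lst[5])                   # 5
```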
  • 17. Let us know your feedback as well ☺ QnA Session!
  • 19. Pandas: Dataframe Methods & Attributes. Attributes (df.attribute: description): dtypes: list the types of the columns; columns: list the column names; axes: list the row labels and column names; ndim: number of dimensions; size: number of elements; shape: return a tuple representing the dimensionality; values: NumPy representation of the data. Methods (df.method(): description): head([n]), tail([n]): first/last n rows; describe(): generate descriptive statistics (for numeric columns only, by default); max(), min(): return max/min values for all numeric columns; mean(), median(): return mean/median values for all numeric columns (in pandas 2.x, pass numeric_only=True if the frame also contains non-numeric columns); std(): standard deviation; sample([n]): return a random sample of the data frame; dropna(): drop all the records with missing values. Unlike attributes, Python methods have parentheses. All attributes and methods can be listed with the dir() function: dir(df)
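A minimal sketch of the attribute/method distinction. It uses a tiny hypothetical DataFrame instead of the Salaries.csv file, so the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data, not the real salaries dataset
df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf'],
    'salary': [150000, 80000, 130000, 100000],
    'service': [30, 2, 15, 8],
})

# Attributes: no parentheses
print(df.shape)      # (4, 3)
print(df.ndim)       # 2
print(df.size)       # 12
print(df.dtypes)     # one dtype per column
print(df.columns.tolist())

# Methods: called with parentheses
print(df.head(2))
print(df.describe())                 # numeric columns only by default
print(df.mean(numeric_only=True))    # skip the text column explicitly
```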
  • 20. Pandas: import pandas as pd # Read csv file df = pd.read_csv("http://rcs.bu.edu/examples/python/data_analysis/Salaries.csv")
  • 21. Pandas: Dataframe Data Types. Pandas type / native Python type / description: object / string: the most general dtype; it will be assigned to your column if the column has mixed types (numbers and strings). int64 / int: integer numbers; 64 refers to the number of bits used to store each value. float64 / float: numbers with decimals. If a column contains integers and NaNs (missing values), pandas will default to float64, because NaN is itself a floating-point value and cannot be stored in an int64 column. datetime64, timedelta[ns] / N/A (but see the datetime module in Python's standard library): values meant to hold time data. Look into these for time series experiments.
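A short sketch of how pandas infers column types. A missing value turns an integer column into float64. The nullable 'Int64' extension dtype (capital I) is one way to keep integers alongside missing values. Mixed numbers and strings fall back to object.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
print(ints.dtype)               # int64

with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)           # float64: NaN is a float, so the column is upcast

# Nullable integer dtype: values stay integers, the gap is shown as <NA>
nullable = pd.Series([1, 2, None], dtype='Int64')
print(nullable.dtype)           # Int64

mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)              # object
```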
  • 22. Pandas: Dataframe Group By. Using the "group by" method we can: - Split the data into groups based on some criteria - Calculate statistics (or apply a function) for each group. Once a groupby object is created, we can calculate various statistics for each group.
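A minimal sketch of the split-and-apply steps. It uses a small hypothetical DataFrame (the column names rank and salary are modelled on the salaries example; the values are made up).

```python
import pandas as pd

df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AsstProf', 'AssocProf'],
    'salary': [150000, 80000, 130000, 90000, 100000],
})

grouped = df.groupby('rank')              # split the rows by the value of 'rank'
mean_salary = grouped['salary'].mean()    # apply a statistic to each group
print(mean_salary)                        # one mean per rank, sorted by rank
print(grouped.size())                     # number of rows in each group
```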
  • 23. Pandas: Dataframe Filtering. To subset the data we can apply Boolean indexing. This indexing is commonly known as a filter. For example, we may want to subset the rows in which the salary value is greater than $120K. Any comparison operator can be used to build the Boolean condition: > greater; >= greater or equal; < less; <= less or equal; == equal; != not equal;
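A sketch of the salary filter on hypothetical toy data, assuming the column is named salary as in the salaries example. When combining conditions, pandas uses & and | rather than and / or, and each condition needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AssocProf'],
    'salary': [150000, 80000, 130000, 100000],
})

mask = df['salary'] > 120000     # boolean Series, one value per row
df_sub = df[mask]                # keep only the rows where mask is True
print(df_sub)

# Combine conditions with & (and) or | (or); parentheses are required
df_prof_high = df[(df['rank'] == 'Prof') & (df['salary'] > 140000)]
print(df_prof_high)
```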
  • 24. Pandas: Dataframe Selecting Rows. If we need to select a range of rows, we can specify the range using ":". Notice that the first row has position 0, and the last value in the range is omitted: for the 0:10 range, the first 10 rows are returned, with positions starting at 0 and ending at 9. Position-based selection can also be done with the iloc method.
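A sketch of position-based row selection on a hypothetical 15-row DataFrame. With the default integer index, df[0:10] and df.iloc[0:10] return the same rows. iloc can also pick individual positions and columns.

```python
import pandas as pd

df = pd.DataFrame({'salary': range(100, 115)})   # 15 hypothetical rows

first_ten = df[0:10]          # rows at positions 0..9 (end position excluded)
same = df.iloc[0:10]          # explicit position-based selection
print(len(first_ten), first_ten.index[0], first_ten.index[-1])  # 10 0 9

print(df.iloc[0])             # first row as a Series
print(df.iloc[[0, 4, 9]])     # rows at selected positions
print(df.iloc[0:3, 0])        # rows 0..2 of the first column
```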
  • 25. Pandas: Dataframe Common Aggregates. Aggregation means computing a summary statistic about each group, e.g. computing group sums or means, or computing group sizes/counts. Common aggregation functions: min, max; count, sum, prod; mean, median, mode, mad; std, var (note: the mad method was removed in pandas 2.0). df.method(): description: describe: basic statistics (count, mean, std, min, quantiles, max); min, max: minimum and maximum values; mean, median, mode: arithmetic average, median and mode; var, std: variance and standard deviation; sem: standard error of the mean; skew: sample skewness; kurt: kurtosis
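Several aggregation functions can be applied in one call with .agg, either per group or to a whole column. Since the mad method is gone in pandas 2.x, the last line computes the mean absolute deviation directly. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'rank': ['Prof', 'AsstProf', 'Prof', 'AsstProf'],
    'salary': [150000, 80000, 130000, 90000],
})

# Several statistics per group in one call
summary = df.groupby('rank')['salary'].agg(['count', 'mean', 'min', 'max'])
print(summary)

# The same idea on a whole column
print(df['salary'].agg(['sum', 'median', 'std']))

# Mean absolute deviation without the removed .mad() method
mad = (df['salary'] - df['salary'].mean()).abs().mean()
print(mad)   # 27500.0
```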
  • 26. Let us know your feedback as well ☺ QnA Session!
  • 28. Intro ML. Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning" and described it as the "field of study that gives computers the capability to learn without being explicitly programmed". How it differs from traditional programming: - In traditional programming, we feed in the input and the program logic, and run the program to get the output. - In machine learning, we feed in the input and the expected output, and during training the machine creates its own logic (the model), which is then evaluated during testing.
  • 29. Intro ML: Terminology. Terminology one should know before starting with machine learning: - Model: a model is a specific representation learned from data by applying some machine learning algorithm. A model is also called a hypothesis. - Feature: a feature is an individual measurable property of our data. A set of numeric features can be conveniently described by a feature vector. Feature vectors are fed as input to the model. For example, in order to predict a fruit, there may be features like color, smell, taste, etc. - Target (label): a target variable or label is the value to be predicted by our model. For the fruit example discussed in the features section, the label for each set of inputs would be the name of the fruit, like apple, orange, banana, etc. - Training: the idea is to give a set of inputs (features) and their expected outputs (labels), so that after training we have a model (hypothesis) that maps new data to one of the categories it was trained on. - Prediction: once our model is ready, it can be fed a set of inputs, for which it will provide a predicted output (label).
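The terms can be mapped onto a few lines of NumPy. This sketch swaps in a simple regression problem instead of the fruit classifier: one hypothetical feature (e.g. an area) and a numeric target (e.g. a price), with a straight-line fit (np.polyfit) standing in for the learning algorithm. The data are made up and lie exactly on the line target = 3 * feature.

```python
import numpy as np

# Hypothetical training data: one feature per example and its target (label)
features = np.array([50.0, 70.0, 90.0, 110.0])
targets = np.array([150.0, 210.0, 270.0, 330.0])

# Training: fit target ~ slope * feature + intercept by least squares
slope, intercept = np.polyfit(features, targets, deg=1)

# The learned (slope, intercept) pair is the model, or hypothesis
def predict(x):
    """Prediction: apply the learned model to a new, unseen input."""
    return slope * x + intercept

print(round(predict(100.0), 2))   # 300.0
```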
  • 30. Intro ML: Types of Learning - Supervised Learning - Unsupervised Learning - Semi-Supervised Learning 1. Supervised Learning: supervised learning is when the model is trained on a labelled dataset. A labelled dataset is one that has both input and output parameters. In this type of learning, both the training and validation datasets are labelled, as shown in the figures below. Types of Supervised Learning: - Classification (predicting a category) - Regression (predicting a continuous value)
  • 31. Intro ML: Types of Learning 2. Unsupervised Learning: unsupervised learning is the training of a machine using information that is neither classified nor labelled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns and differences, without any labelled training data. Unsupervised machine learning is more challenging than supervised learning due to the absence of labels. Types of Unsupervised Learning: - Clustering - Association 3. Semi-supervised machine learning: labelled data is expensive to obtain, while learning without labels is harder. To counter these disadvantages, the concept of Semi-Supervised Learning was introduced. In this type of learning, the algorithm is trained on a combination of labelled and unlabelled data. Typically, this combination contains a very small amount of labelled data and a very large amount of unlabelled data.
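For the clustering case, here is a minimal sketch with scikit-learn's KMeans on made-up, unlabelled 2D points that form two obvious groups. Only X is passed to the algorithm, not y. The cluster numbers it assigns are arbitrary, so only the grouping itself is meaningful.

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled toy points forming two clearly separated groups (made-up data)
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # no labels are given: groups are found from X alone
print(labels)                      # e.g. [1 1 1 0 0 0]; the numbering is arbitrary
print(kmeans.cluster_centers_)     # one centre per cluster
```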
  • 33. Intro ML: Classification. We have a training set of observations (e.g., labelled images) and a test set that we use only for evaluation.
  • 38. Intro ML: Classification. Formally, given a training set (x_i, y_i) for i = 1…n, we want to create a classification model f that can predict the label y for a new x. The machine learning algorithm will create the function f. With labels y ∈ {-1, +1}, the predicted value of y for a new x is sign(f(x)). Where is classification used? - Yes/No questions: binary classification - automatic handwriting recognition, speech recognition, biometrics, document classification, spam detection, predicting credit default risk, detecting credit card fraud, predicting customer churn, predicting medical outcomes (strokes, side effects, etc.)
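A minimal sketch of the sign(f(x)) rule, assuming a linear model f(x) = w · x + b. The weights w and b are made-up values; in practice, a learning algorithm would choose them from the training set. The sketch maps a score of exactly 0 to +1, which is one common convention.

```python
import numpy as np

# Hypothetical linear model f(x) = w . x + b with made-up parameters
w = np.array([0.8, -0.5])
b = -0.1

def f(x):
    """Real-valued score of the linear model."""
    return np.dot(w, x) + b

def predict(x):
    """Predicted label sign(f(x)) in {-1, +1}; a score of 0 is mapped to +1 here."""
    return 1 if f(x) >= 0 else -1

print(predict(np.array([1.0, 0.5])))   # f = 0.8 - 0.25 - 0.1 = 0.45 -> 1
print(predict(np.array([0.0, 1.0])))   # f = -0.5 - 0.1 = -0.6     -> -1
```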
  • 39. Let us know your feedback as well ☺ QnA Session!
  • 41. Intro Scikit-Learn: Why? A. Simple and efficient tools for predictive data analysis - Machine learning methods - Data processing - Visualization B. Accessible to everybody, and reusable in various contexts - Documented API with lots of examples - Not bound to training frameworks (e.g. TensorFlow, PyTorch) - Building blocks for your data analysis C. Built on NumPy, SciPy, and matplotlib - No own data types (unlike pandas) - Benefits from NumPy and SciPy optimizations - Extends the most common visualisation tool D. Open source, commercially usable - BSD license - Version 1.0 since September 2021 •https://scikit-learn.org/stable/
  • 42. Intro Scikit-Learn: Tools A. Classification: categorizing objects into one or more classes. - Support Vector Machines (SVM) - Nearest Neighbors - Random Forest - . . . B. Regression: prediction of one (uni-) or more (multivariate) continuous-valued attributes. - Support Vector Regression (SVR) - Nearest Neighbors - Random Forest - . . . C. Clustering: grouping objects of a set. - k-Means - Spectral Clustering - Mean-Shift - . . . D. Dimensionality reduction: reducing the number of random variables. - Principal Component Analysis (PCA) - Feature Selection - Non-Negative Matrix Factorization - . . . E. Model selection: compare, validate and choose parameters/models. - Grid Search - Cross Validation - . . . F. Pre-processing: prepare/transform data before training models. - Conversion - Normalization - Feature Extraction
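Several of these tool groups fit together in one short sketch: pre-processing (StandardScaler), classification (an SVM), and model selection (grid search with cross-validation), chained with a Pipeline. The iris dataset ships with scikit-learn. The grid of C values is an arbitrary illustrative choice. make_pipeline names each step after its lowercased class name, which is why the parameter is written svc__C.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Pre-processing (normalization) and classification chained into one estimator
pipe = make_pipeline(StandardScaler(), SVC())

# Model selection: try several values of C, scoring each with 5-fold cross-validation
search = GridSearchCV(pipe, param_grid={'svc__C': [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```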
  • 43. Intro Scikit-Learn: Supervised ML Flow. Easy install via pip or conda for Windows, macOS and Linux, e.g.: $ pip install scikit-learn or $ conda install -c intel scikit-learn
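After installation, the supervised flow in scikit-learn follows the same steps for most estimators: load features and labels, split into training and test sets, fit, predict, and evaluate. A sketch using the bundled iris dataset and a k-nearest-neighbours classifier, chosen here purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)             # features (150 x 4) and labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = KNeighborsClassifier(n_neighbors=5)   # 1. choose an estimator
model.fit(X_train, y_train)                   # 2. training on labelled data
y_pred = model.predict(X_test)                # 3. prediction on unseen data
print('accuracy:', model.score(X_test, y_test))  # 4. evaluation
```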
  • 44. Kindly turn on your camera ☺ Let's take a group photo!

Speaker notes

  1. You may share the feedback form here while opening up the floor for QnA session
  2. You may share the feedback form here while opening up the floor for QnA session
  3. You may share the feedback form here while opening up the floor for QnA session
  4. In semi-supervised learning, the labelled data is first used to train a model; that model is then used to label the unlabelled data (this is called pseudo-labelling), and finally the whole dataset is used to train the model for further use.
  5. Machine learning as a field, *read* If we want to teach the computer to recognize images of chairs, then we give the computer a whole bunch of images, and tell it which ones are chairs and which are not, and then it's supposed to learn to recognize chairs, even ones it hasn't seen before. It's not like we tell the computer how to recognize a chair; we don't tell it "a chair has 4 legs and a back and a flat surface to sit on and so on", we just give it a lot of examples. Machine learning has close ties to statistics; in fact it's hard to say what's different about predictive statistics and machine learning, and these fields are very closely linked right now.
  6. The problem I just told you about is a classification problem where we are trying to identify chairs. The way we set the problem up is that we have a *read* We use the training set to learn a model of what a chair is. The test set are images that are not in the training set, and we want to be able to make predictions on those, as to whether or not each image is a chair. It could be that some of the labels on the training set are noisy. That could happen. In fact one of these labels is noisy *point*. That's ok, because as long as there isn't too much noise, we should still be able to learn a model for a chair. It just won't be able to classify perfectly, and that happens. Some prediction problems are harder than others, but that's ok, we just do the best we can from the training data. And in terms of the size of the training data, the more the merrier. We want as much data as we can get to train these models.
  7. How do we represent an image of a chair, or a flower, or whatever, in the training set? I just zoomed in on a piece of this image over here, and you can see the pixels in the image. We can represent each pixel according to its RGB values (red, green, blue), so we get three numbers representing each pixel. So you can represent the whole image as a collection of RGB values, and the image becomes this very large vector of numbers. In general, when doing machine learning, we need to represent each observation in the training and test sets as a vector of numbers. The label is also represented by a number. Here the number is -1 because the image is not a chair. The chairs would all get label +1.
  8. Here’s another example. This is a problem that comes from NYC’s power company, where they wanted to predict which manholes were going to have a fire. So we would represent each manhole as a vector, and here are the components in the vector. The first component might be *read*. In general, the first step is to figure out how to represent your data as a vector. You can make the vector very large, you can include lots of factors if you like, that’s fine. Computationally things are easier if you use fewer features, but then you risk leaving out information. So there’s a tradeoff right there that you will have to worry about, and we’ll talk more about that later. But in any case, you can’t do ML if you don’t have your data represented this way, so that’s the first step. *pause* You’d think that manholes with more cables, more recent serious events, etc. would be more prone to explosions and fires in the future. But what combination of them would give you the best predictor? How do you combine them together? You could add them all up but that might not be the best thing. You could give them all weights and add them up, but how do you know the weights? That’s what ML does for you. It tells you what combinations to use to get the best predictors.
  9. Just to be formal about it, *Read* The features are also called *read* so you can choose whatever terminology you like.
  10. Let’s take a simple version of the manhole example where we have only two features, *Read*. So each observation can be represented as a point on a 2d graph, which means I can plot the whole dataset.
  11. You may share the feedback form here while opening up the floor for QnA session