General Requirements
This section contains the general requirements which must be met by your submitted assignment. Marks will be deducted if you fail to meet any of the following general requirements.
- You must complete Tasks 1-3 in the Jupyter Notebook under the Python 3 kernel.
- All code must be written in one single .ipynb file, where each task and the sub-tasks therein (if any) must be clearly separated via Markdown cells to ensure good readability.
- You must include code-level comments in the .ipynb file to explain the key parts of your code.
- You must follow the instructions given in each task to complete the corresponding task.
- You must follow the rules specified in the "Submission Requirements" section to make your final submission.
- Your code in the submitted .ipynb file must be executable during marking, where all necessary files needed for executing the code must also be submitted, as detailed in the "Submission Requirements" section.
- All graphs must be properly sized and formatted to include a meaningful title, appropriate axis labels, and a legend. The fonts contained in the graph must be properly sized for good readability. The components of the graph should be appropriately coloured, if applicable.
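The graph requirement above can be illustrated with a minimal matplotlib sketch. The helper `plot_class_counts`, the class counts, the figure size and the font sizes are all hypothetical choices made for illustration; the assignment does not prescribe specific values.

```python
import matplotlib.pyplot as plt


def plot_class_counts(counts, title, xlabel, ylabel):
    """Draw a bar chart with a title, axis labels, a legend and readable fonts."""
    fig, ax = plt.subplots(figsize=(8, 5))  # explicit figure size (assumed value)
    for i, (name, value) in enumerate(counts.items()):
        # "C0", "C1", ... pick distinct colours from matplotlib's default colour cycle.
        ax.bar(name, value, color=f"C{i}", label=name)
    ax.set_title(title, fontsize=14)
    ax.set_xlabel(xlabel, fontsize=12)
    ax.set_ylabel(ylabel, fontsize=12)
    ax.tick_params(labelsize=11)
    ax.legend(title="Class", fontsize=11)
    fig.tight_layout()
    return fig, ax


# Hypothetical class counts, not taken from any real data set.
example_counts = {"A": 60, "B": 50, "C": 40}
fig, ax = plot_class_counts(
    example_counts, "Records per class", "Class label", "Number of records"
)
```

In a Jupyter Notebook the figure is rendered inline when the cell finishes; the same helper can be reused for other graphs so that titles, labels and legends stay consistent.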
Task 1 - Problem Formulation, Data Acquisition and Preparation (12%)
Please visit the UCI repository at https://archive.ics.uci.edu/ml/datasets.php and click on the "Classification" link under the "Default Task" section, as illustrated in Figure 1, to check the available data sets that fall into the category of classification tasks. You can find the details about each of the listed data sets by clicking on its name (beside its icon), as illustrated in Figure 1, and accordingly gain a better understanding of the data and its domain. After that, you need to choose ONE data set which must satisfy the following criteria:
- The data set must contain at least 150 rows (i.e., data records).
- The data set must contain at least five columns, excluding the class label column.
- The data set must contain at least one categorical column, excluding the class label column.
- The data set must NOT be a multi-label data set, e.g., the Anuran Calls (MFCCs) data set, where each data record is associated with multiple different labels.
Note: If you choose a data set that does not satisfy the above criteria, your total marks for this assignment will be halved.
Once you have chosen a data set, you can click on the "Data Folder" link on the front page of that data set, as shown in Figure 2, to find and download the data file into your local Jupyter Notebook working folder. Note that some data files may not be in .csv, .xls or .xlsx format; in such cases, you need to first convert them into .csv format before loading the data. Next, you need to load the data and perform necessary and appropriate data preparation operations to facilitate the subsequent data analysis and modelling.
Note: If multiple data files exist in the "Data Folder", you may just choose the one which you believe is the most appropriate to work on.
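The conversion to .csv and the size and column criteria above can be checked programmatically. The following is a minimal pandas sketch, assuming a whitespace-delimited raw file without a header row. The raw text, the column names and the helpers `convert_to_csv` and `check_criteria` are hypothetical, and treating every non-numeric column as categorical is a simplifying assumption (integer-coded categories would have to be identified by hand). Whether a data set is multi-label cannot be inferred this way and should be confirmed from its description page.

```python
import io

import pandas as pd

# Hypothetical raw data: whitespace-delimited, no header row (not a real UCI file).
RAW_TEXT = "\n".join(
    f"{5.0 + i % 7} {3.0 + i % 5} {1.0 + i % 4} {0.2 + i % 3} {i % 9} "
    f"{'red' if i % 2 else 'blue'} {'pos' if i % 3 else 'neg'}"
    for i in range(150)
)
COLUMNS = ["f1", "f2", "f3", "f4", "f5", "colour", "label"]  # assumed names


def convert_to_csv(raw_text, columns):
    """Parse whitespace-delimited text and return the same data as CSV text."""
    df = pd.read_csv(io.StringIO(raw_text), sep=r"\s+", header=None, names=columns)
    return df.to_csv(index=False)


def check_criteria(df, label_col):
    """Return one boolean per checkable data set criterion from Task 1."""
    features = df.drop(columns=[label_col])
    categorical = features.select_dtypes(exclude="number").columns
    return {
        "at_least_150_rows": len(df) >= 150,
        "at_least_5_feature_columns": features.shape[1] >= 5,
        "has_categorical_feature": len(categorical) >= 1,
    }


csv_text = convert_to_csv(RAW_TEXT, COLUMNS)
data = pd.read_csv(io.StringIO(csv_text))
report = check_criteria(data, label_col="label")
print(report)
```

The sketch keeps everything in memory so that it runs on its own; in the notebook, `io.StringIO(...)` would be replaced by the path of the downloaded file, and `to_csv` would be given an output path.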
Furthermore, feature engineering might need to be performed in the step of data preparation. You must describe your workflow (including the involved key components) for completing this task, present key observations and analyses, provide justifications of any choices you have made, and discuss any issues if encountered (including the ways you have used to address them) in the report required in Task 4.
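As a hedged illustration of what data preparation and feature engineering can look like, the sketch below imputes missing values and one-hot encodes a categorical column with pandas. The tiny DataFrame, its column names, the helper `prepare` and the choice of median and most-frequent-value imputation are assumptions for demonstration only; the appropriate operations depend on the data set you choose and should be justified in the report.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (illustration only).
raw = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "colour": ["red", "blue", None, "red"],
        "label": ["pos", "neg", "pos", "neg"],
    }
)


def prepare(df, label_col):
    """Impute missing values and one-hot encode categorical feature columns."""
    out = df.copy()
    feature_cols = [c for c in out.columns if c != label_col]
    numeric_cols = out[feature_cols].select_dtypes(include="number").columns
    categorical_cols = [c for c in feature_cols if c not in numeric_cols]
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())  # median imputation (assumed choice)
    for col in categorical_cols:
        out[col] = out[col].fillna(out[col].mode().iloc[0])  # most frequent value
    # Feature engineering step: one 0/1 indicator column per category.
    return pd.get_dummies(out, columns=categorical_cols, dtype=int)


prepared = prepare(raw, label_col="label")
print(prepared)
```

The class label column is deliberately left untouched so that it can be separated from the features later.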
Task 2 - Data Exploration (16%)
Now that you have finished Task 1, you can start to explore the data loaded and prepared in Task 1 by carrying out the following steps:
2.1 Exploring each column (i.e., attribute) by using appropriate descriptive statistics and/or graphical visualisations. If the data set contains more than 10 attributes, you just need to select 10 columns to explore. You must elaborate the way(s) you've used for exploration and present key observations, analyses and conclusions in the report required in Task 4.
2.2 Exploring the relationships between all pairs of columns (for example, 10 selected pairs of columns if the data set contains more than 10 attributes) by using appropriate descriptive statistics and/or graphical visualisations. You must elaborate the way(s) you've used for exploration and present key observations, analyses and conclusions regarding the relationships between the explored pairs in the report required in Task 4.
2.3 Posing one meaningful question and exploring the data by using appropriate methods to find its answer. You must state the question, describe the way you've used to find its answer, report key observations based upon numeric metrics (e.g., descriptive/inferential statistics) and/or graphical visualisations, and present any interesting takeaways in the report required in Task 4.
You must also describe your workflow (including the involved key components) for completing this task, provide justifications of any choices you've made, and discuss any issues if encountered (including the ways you've used to address them) in the report required in Task 4.
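The three exploration steps can be sketched with pandas as follows. This is only one possible approach under stated assumptions: the DataFrame, its column names and the example question are hypothetical, and graphical visualisations (which the task also allows) are left out to keep the sketch short.

```python
import pandas as pd

# Hypothetical prepared data (illustration only).
df = pd.DataFrame(
    {
        "height": [150, 160, 170, 180, 190, 200],
        "weight": [50, 58, 66, 74, 82, 90],
        "colour": ["red", "blue", "red", "blue", "red", "red"],
        "label": ["neg", "neg", "neg", "pos", "pos", "pos"],
    }
)

# 2.1 Each column on its own: summary statistics for numeric columns,
# frequency counts for a categorical column.
numeric_summary = df.describe()
colour_counts = df["colour"].value_counts()

# 2.2 Pairs of columns: Pearson correlation for numeric pairs,
# a contingency table for a pair of categorical columns.
correlations = df.corr(numeric_only=True)
colour_vs_label = pd.crosstab(df["colour"], df["label"])

# 2.3 An example question: "Do positive records have a larger mean height?"
mean_height_by_label = df.groupby("label")["height"].mean()

print(
    numeric_summary,
    colour_counts,
    correlations,
    colour_vs_label,
    mean_height_by_label,
    sep="\n\n",
)
```

Each intermediate result is kept in its own variable so that it can be shown in a separate notebook cell and discussed in the report.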
Task 3 - Data Modelling (32%)
In this task, you are asked to choose TWO classification models, and carry out the following steps:
3.1 Splitting the data into a training set and a test set. Specifically, you need to split the data at the following ratios, respectively, to form three different suites of training and test sets:
- Suite 1: 50% for training and 50% for testing
- Suite 2: 60% for training and 40% for testing
- Suite 3: 80% for training and 20% for testing
You must describe the way you have used for splitting the data to ensure the reproducibility of your work in the report required in Task 4.
3.2 Performing the following steps for each of the two chosen models on each of the above three suites:
- Identifying the method in the "scikit-learn" package which implements the chosen model.
- Selecting appropriate model parameters and using them to train the model via the above-identified method. You must elaborate and justify the way you've used for parameter selection in the report required in Task 4.
- Evaluating the performances of the model on the training and test sets, respectively, in terms of the confusion matrix, classification accuracy, precision, recall and F1 score, and reporting them in the report required in Task 4.
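Steps 3.1 and 3.2 can be sketched with scikit-learn as shown below. The sketch rests on several assumptions that are not part of the assignment: synthetic data from `make_classification` stands in for the prepared data set, a decision tree and k-nearest neighbours are used as the two example models, the parameter values are placeholders rather than tuned choices, the seed value is arbitrary, stratified splitting is an optional extra, and precision, recall and F1 are macro-averaged.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

SEED = 42  # fixed seed so the splits can be reproduced (assumed value)
SUITES = {"Suite 1": 0.5, "Suite 2": 0.4, "Suite 3": 0.2}  # test-set fractions

# Synthetic stand-in for the prepared data set (not real data).
X, y = make_classification(n_samples=200, n_features=6, random_state=SEED)

# Two example models; the parameter values are placeholders, not tuned choices.
MODELS = {
    "decision_tree": lambda: DecisionTreeClassifier(max_depth=3, random_state=SEED),
    "knn": lambda: KNeighborsClassifier(n_neighbors=5),
}


def evaluate(model, X_part, y_part):
    """Compute the metrics listed in step 3.2 for one data partition."""
    pred = model.predict(X_part)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_part, pred, average="macro", zero_division=0
    )
    return {
        "confusion_matrix": confusion_matrix(y_part, pred),
        "accuracy": accuracy_score(y_part, pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = {}
for suite, test_size in SUITES.items():
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=SEED, stratify=y
    )
    for name, make_model in MODELS.items():
        model = make_model().fit(X_tr, y_tr)  # a fresh model for every suite
        results[(suite, name)] = {
            "n_train": len(y_tr),
            "n_test": len(y_te),
            "train": evaluate(model, X_tr, y_tr),
            "test": evaluate(model, X_te, y_te),
        }

for key, res in results.items():
    print(key, res["n_train"], res["n_test"], round(res["test"]["accuracy"], 3))
```

Passing the same `random_state` to `train_test_split` on every run is what makes the three suites reproducible; reporting the seed value in the report lets a marker regenerate the same splits.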
Task 4 - Report
You are asked to write a report for the data science "project" you've completed in Tasks 1-3. The report must have the following structure:
- A cover page, including
  o Report title
  o Your full name and student ID
  o Affiliation
  o Contact details
  o Date
- Abstract (i.e., an executive summary)
- Introduction (including the entire workflow of this data science "project")
- Task 1 (following the italic instructions related to the report as specified in Task 1)
- Task 2 (following the italic instructions related to the report as specified in Task 2)
- Task 3 (following the italic instructions related to the report as specified in Task 3)
- Discussion and Conclusions
The report must be saved in PDF format and named "report.pdf" for submission. It MUST be written in single-column format with a font size between 10 and 12 points and be no more than 15 pages long (including tables, graphs and/or references). Penalties will apply if the report does not satisfy these requirements. Moreover, the quality of the report will be considered when marking, e.g., organisation, clarity, and grammatical mistakes. Please remember to explicitly cite any sources which you've referred to when doing your work!
Submission Requirements