Somebody please help with this. Make use of the scikit-learn (sklearn) Python package in your function implementations.

Complete the following functions in task4.py:

calculate_naive_metrics
Given a train dataframe, test dataframe, target_col and naive assumption, split out the target column from the training and test dataframes to create feature dataframes and target series, then calculate (rounded to 4 decimal places) accuracy, recall, precision and f1 score using the sklearn functions, the train and test target values, and the naive assumption.
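A minimal sketch of how the naive baseline could be scored with sklearn, assuming binary 0/1 labels; the helper name naive_metrics_sketch and the zero_division=0 guard are my additions, not part of the assignment:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def naive_metrics_sketch(df: pd.DataFrame, target_col: str, naive_assumption: int) -> dict:
    # Score a constant prediction (the naive assumption) against the true labels
    y_true = df[target_col]
    y_pred = np.full(len(y_true), naive_assumption)
    return {
        "accuracy":  round(accuracy_score(y_true, y_pred), 4),
        "recall":    round(recall_score(y_true, y_pred, zero_division=0), 4),
        "precision": round(precision_score(y_true, y_pred, zero_division=0), 4),
        "fscore":    round(f1_score(y_true, y_pred, zero_division=0), 4),
    }

You would call this once per split, e.g. naive_metrics_sketch(train_dataset, target_col, naive_assumption), and feed the two dicts into ModelMetrics.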
calculate_logistic_regression_metrics
Given a train dataframe, test dataframe, target_col and logreg_kwargs, split out the target column from the training and test dataframes to create feature dataframes and target series. Then train a logistic regression model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for roc auc) for both the training and test datasets.
For feature importance, use the top 10 features selected by RFE, sorted by absolute value of the coefficient from biggest to smallest. (Make sure you use the same feature and importance column names as ModelMetrics shows in feat_name_col and imp_col, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`.)
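One way the metric block and the RFE importance table could look, assuming binary 0/1 labels and a plain (unscaled) feature matrix; the helper names are illustrative, and how edge cases like an empty positive class should be handled is my assumption:

import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

def classification_metrics_sketch(y_true, y_pred, y_prob) -> dict:
    # fpr and fnr fall out of the confusion-matrix counts
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy":  round(accuracy_score(y_true, y_pred), 4),
        "recall":    round(recall_score(y_true, y_pred), 4),
        "precision": round(precision_score(y_true, y_pred), 4),
        "fscore":    round(f1_score(y_true, y_pred), 4),
        "fpr":       round(fp / (fp + tn), 4),
        "fnr":       round(fn / (fn + tp), 4),
        "roc_auc":   round(roc_auc_score(y_true, y_prob), 4),  # probabilities, not 0/1 labels
    }

def logreg_sketch(train_df, test_df, target_col, logreg_kwargs):
    X_train, y_train = train_df.drop(columns=[target_col]), train_df[target_col]
    X_test, y_test = test_df.drop(columns=[target_col]), test_df[target_col]
    model = LogisticRegression(**logreg_kwargs).fit(X_train, y_train)
    train_metrics = classification_metrics_sketch(
        y_train, model.predict(X_train), model.predict_proba(X_train)[:, 1])
    test_metrics = classification_metrics_sketch(
        y_test, model.predict(X_test), model.predict_proba(X_test)[:, 1])
    # RFE keeps 10 features; sort those by |coefficient|, largest first
    rfe = RFE(LogisticRegression(**logreg_kwargs), n_features_to_select=10).fit(X_train, y_train)
    importance = pd.DataFrame({"Feature": X_train.columns[rfe.support_],
                               "Importance": rfe.estimator_.coef_[0]})
    order = importance["Importance"].abs().sort_values(ascending=False).index
    importance = importance.loc[order].reset_index(drop=True)  # index 0-9
    return train_metrics, test_metrics, importance

Whether the Importance column should keep the signed coefficient or its absolute value is ambiguous in the prompt; the sketch keeps the sign and only sorts by absolute value.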
calculate_random_forest_metrics
Given a train dataframe, test dataframe, target_col and rf_kwargs, split out the target column from the training and test dataframes to create feature dataframes and target series. Then train a random forest model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for roc auc) for both the training and test datasets.
For feature importance, use the top 10 features from the model's built-in feature importance attribute, sorted from biggest to smallest. (Make sure you use the same feature and importance column names as ModelMetrics shows in feat_name_col and imp_col, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`.)
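Here the importance table comes straight from feature_importances_; a sketch under the assumption that the model is trained the same way as in the logistic regression flow above (the helper name is mine):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def rf_importance_sketch(train_df: pd.DataFrame, target_col: str, rf_kwargs: dict) -> pd.DataFrame:
    X_train, y_train = train_df.drop(columns=[target_col]), train_df[target_col]
    model = RandomForestClassifier(**rf_kwargs).fit(X_train, y_train)
    # Built-in impurity-based importances, no RFE; keep the top 10, largest first
    return (pd.DataFrame({"Feature": X_train.columns,
                          "Importance": model.feature_importances_})
            .sort_values("Importance", ascending=False)
            .head(10)
            .reset_index(drop=True))  # index 0-9 per the spec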
calculate_gradient_boosting_metrics
Given a train dataframe, test dataframe, target_col and gb_kwargs, split out the target column from the training and test dataframes to create feature dataframes and target series. Then train a gradient boosting model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Then, using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, f1 score, false positive rate, false negative rate, and area under the receiver operating characteristic curve (using probabilities for roc auc) for both the training and test datasets.
For feature importance, use the top 10 features from the model's built-in feature importance attribute, sorted from biggest to smallest. (Make sure you use the same feature and importance column names as ModelMetrics shows in feat_name_col and imp_col, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`.)
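The gradient boosting version only swaps the estimator; the metrics helper and the feature_importances_ pattern from the two sketches above carry over unchanged. A minimal sketch (names again illustrative), showing in particular that roc_auc uses the positive-class column of predict_proba:

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def gb_sketch(train_df: pd.DataFrame, test_df: pd.DataFrame, target_col: str, gb_kwargs: dict):
    X_train, y_train = train_df.drop(columns=[target_col]), train_df[target_col]
    X_test, y_test = test_df.drop(columns=[target_col]), test_df[target_col]
    model = GradientBoostingClassifier(**gb_kwargs).fit(X_train, y_train)
    # roc_auc is computed from probabilities, not the 0/1 predictions
    test_roc_auc = round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 4)
    return model, test_roc_auc

The provided skeleton code follows: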
import numpy as np
import pandas as pd
from sklearn.metrics import *
from sklearn.linear_model import *
from sklearn.ensemble import *
from sklearn.feature_selection import RFE
class ModelMetrics:
    def __init__(self, model_type: str, train_metrics: dict, test_metrics: dict,
                 feature_importance_df: pd.DataFrame):
        self.model_type = model_type
        self.train_metrics = train_metrics
        self.test_metrics = test_metrics
        self.feat_imp_df = feature_importance_df
        self.feat_name_col = "Feature"
        self.imp_col = "Importance"

    def add_train_metric(self, metric_name: str, metric_val: float):
        self.train_metrics[metric_name] = metric_val

    def add_test_metric(self, metric_name: str, metric_val: float):
        self.test_metrics[metric_name] = metric_val

    def __str__(self):
        output_str = f"MODEL TYPE: {self.model_type}\n"
        output_str += "TRAINING METRICS:\n"
        for key in sorted(self.train_metrics.keys()):
            output_str += f" - {key} : {self.train_metrics[key]:.4f}\n"
        output_str += "TESTING METRICS:\n"
        for key in sorted(self.test_metrics.keys()):
            output_str += f" - {key} : {self.test_metrics[key]:.4f}\n"
        if self.feat_imp_df is not None:
            output_str += "FEATURE IMPORTANCES:\n"
            for i in self.feat_imp_df.index:
                output_str += (f" - {self.feat_imp_df[self.feat_name_col][i]} : "
                               f"{self.feat_imp_df[self.imp_col][i]:.4f}\n")
        return output_str
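For reference, a tiny illustrative use of the ModelMetrics container (the metric values here are invented):

mm = ModelMetrics("Naive", {"accuracy": 0.91}, {"accuracy": 0.88}, None)
mm.add_test_metric("precision", 0.84)
print(mm)
# MODEL TYPE: Naive
# TRAINING METRICS:
#  - accuracy : 0.9100
# TESTING METRICS:
#  - accuracy : 0.8800
#  - precision : 0.8400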
def calculate_naive_metrics(train_dataset: pd.DataFrame, test_dataset: pd.DataFrame,
                            target_col: str, naive_assumption: int) -> ModelMetrics:
    # TODO: Write the necessary code to calculate accuracy, recall, precision and fscore
    # given a train and test dataframe, a train and test target series and a naive assumption
    train_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0
    }
    test_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0
    }
    naive_metrics = ModelMetrics("Naive", train_metrics, test_metrics, None)
    return naive_metrics
def calculate_logistic_regression_metrics(train_dataset: pd.DataFrame, test_dataset: pd.DataFrame,
                                          target_col: str, logreg_kwargs) -> tuple[ModelMetrics, LogisticRegression]:
    # TODO: Write the necessary code to train a logistic regression binary classification model
    # and calculate accuracy, recall, precision, fscore, false positive rate, false negative rate
    # and area under the receiver operating characteristic curve, given a train and test dataframe,
    # train and test target series, and keyword arguments for the logistic regression model
    model = LogisticRegression()
    train_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    test_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    # TODO: Use RFE to select the top 10 features.
    # Make sure the column of feature names is named Feature,
    # the column of importances is named Importance,
    # and the dataframe is sorted by ascending ranking then descending absolute value of Importance.
    log_reg_importance = pd.DataFrame()
    log_reg_metrics = ModelMetrics("Logistic Regression", train_metrics, test_metrics, log_reg_importance)
    return log_reg_metrics, model
def calculate_random_forest_metrics(train_dataset: pd.DataFrame, test_dataset: pd.DataFrame,
                                    target_col: str, rf_kwargs) -> tuple[ModelMetrics, RandomForestClassifier]:
    # TODO: Write the necessary code to train a random forest binary classification model
    # and calculate accuracy, recall, precision, fscore, false positive rate, false negative rate
    # and area under the receiver operating characteristic curve, given a train and test dataframe,
    # train and test target series, and keyword arguments for the random forest model
    model = RandomForestClassifier()
    train_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    test_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    # TODO: Reminder: don't use RFE for rf_importance.
    # Make sure the column of feature names is named Feature,
    # the column of importances is named Importance,
    # and the dataframe is sorted by descending absolute value of Importance.
    rf_importance = pd.DataFrame()
    rf_metrics = ModelMetrics("Random Forest", train_metrics, test_metrics, rf_importance)
    return rf_metrics, model
def calculate_gradient_boosting_metrics(train_dataset: pd.DataFrame, test_dataset: pd.DataFrame,
                                        target_col: str, gb_kwargs) -> tuple[ModelMetrics, GradientBoostingClassifier]:
    # TODO: Write the necessary code to train a gradient boosting binary classification model
    # and calculate accuracy, recall, precision, fscore, false positive rate, false negative rate
    # and area under the receiver operating characteristic curve, given a train and test dataframe,
    # train and test target series, and keyword arguments for the gradient boosting model
    model = GradientBoostingClassifier()
    train_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    test_metrics = {
        "accuracy": 0,
        "recall": 0,
        "precision": 0,
        "fscore": 0,
        "fpr": 0,
        "fnr": 0,
        "roc_auc": 0
    }
    # TODO: Reminder: don't use RFE for gb_importance.
    # Make sure the column of feature names is named Feature,
    # the column of importances is named Importance,
    # and the dataframe is sorted by descending absolute value of Importance.
    gb_importance = pd.DataFrame()
    gb_metrics = ModelMetrics("Gradient Boosting", train_metrics, test_metrics, gb_importance)
    return gb_metrics, model