# somebody plz help with this Make use of the scikit-learn (sklearn) pyt.pdf


Somebody please help with this.

Make use of the scikit-learn (sklearn) Python package in your function implementations. Complete the following functions in task4.py:

`calculate_naive_metrics`: Given a train dataframe, a test dataframe, `target_col` and a naive assumption, split the target column out of the training and test dataframes to create a feature dataframe and a target series for each. Then calculate accuracy, recall, precision and F1 score (rounded to 4 decimal places) using the sklearn functions, the train and test target values, and the naive assumption.

`calculate_logistic_regression_metrics`: Given a train dataframe, a test dataframe, `target_col` and `logreg_kwargs`, split the target column out of the training and test dataframes to create a feature dataframe and a target series for each. Then train a logistic regression model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, F1 score, false positive rate, false negative rate and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both the training and test datasets. For feature importance, use the top 10 features selected by RFE and sort by absolute value of the coefficient from biggest to smallest. Make sure you use the same feature and importance column names as `ModelMetrics` shows in `feat_name_col` and `imp_col`, and that the index is 0-9; you can do that with `df.reset_index(drop=True)`.

`calculate_random_forest_metrics`: Given a train dataframe, a test dataframe, `target_col` and `rf_kwargs`, split the target column out of the training and test dataframes to create a feature dataframe and a target series for each. Then train a random forest model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, F1 score, false positive rate, false negative rate and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both the training and test datasets. For feature importance, use the top 10 features from the built-in feature importance attribute, sorted from biggest to smallest. Use the same column names as `ModelMetrics` shows in `feat_name_col` and `imp_col`, with the index 0-9 (again via `df.reset_index(drop=True)`).

`calculate_gradient_boosting_metrics`: Given a train dataframe, a test dataframe, `target_col` and `gb_kwargs`, split the target column out of the training and test dataframes to create a feature dataframe and a target series for each. Then train a gradient boosting model (initialized using the kwargs) on the training data and predict (both binary predictions and probability estimates) on the training and test data. Using those predictions and estimates along with the target values, calculate (rounded to 4 decimal places) accuracy, recall, precision, F1 score, false positive rate, false negative rate and area under the receiver operating characteristic curve (using probabilities for ROC AUC) for both the training and test datasets. For feature importance, use the top 10 features from the built-in feature importance attribute, sorted from biggest to smallest. Use the same column names as `ModelMetrics` shows in `feat_name_col` and `imp_col`, with the index 0-9 (again via `df.reset_index(drop=True)`).
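The metric steps described above could be sketched roughly as follows. This is a minimal sketch, not the assignment's reference solution: the helper names `split_target`, `naive_metrics`, `binary_metrics` and `fit_and_score` are hypothetical, and it assumes a binary target coded as 0/1 and a classifier that exposes `predict_proba`. False positive and false negative rates are derived from sklearn's `confusion_matrix`, since sklearn has no dedicated functions for them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def split_target(df: pd.DataFrame, target_col: str):
    """Hypothetical helper: return (feature dataframe, target series)."""
    return df.drop(columns=[target_col]), df[target_col]


def naive_metrics(y_true, naive_assumption: int) -> dict:
    """Score a constant prediction (the naive assumption) against y_true."""
    y_pred = np.full(len(y_true), naive_assumption)
    return {
        "accuracy": round(float(accuracy_score(y_true, y_pred)), 4),
        "recall": round(float(recall_score(y_true, y_pred, zero_division=0)), 4),
        "precision": round(float(precision_score(y_true, y_pred, zero_division=0)), 4),
        "fscore": round(float(f1_score(y_true, y_pred, zero_division=0)), 4),
    }


def binary_metrics(y_true, y_pred, y_prob) -> dict:
    """Metrics from binary predictions plus positive-class probabilities."""
    # Assumption: labels are 0 (negative) and 1 (positive).
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": round(float(accuracy_score(y_true, y_pred)), 4),
        "recall": round(float(recall_score(y_true, y_pred, zero_division=0)), 4),
        "precision": round(float(precision_score(y_true, y_pred, zero_division=0)), 4),
        "fscore": round(float(f1_score(y_true, y_pred, zero_division=0)), 4),
        # FPR = FP / (FP + TN), FNR = FN / (FN + TP); guard against empty classes.
        "fpr": round(float(fp / (fp + tn)), 4) if (fp + tn) else 0.0,
        "fnr": round(float(fn / (fn + tp)), 4) if (fn + tp) else 0.0,
        # ROC AUC uses probabilities, not the thresholded predictions.
        "roc_auc": round(float(roc_auc_score(y_true, y_prob)), 4),
    }


def fit_and_score(model, train_df, test_df, target_col):
    """Fit on the training data, then score both the training and test data."""
    X_train, y_train = split_target(train_df, target_col)
    X_test, y_test = split_target(test_df, target_col)
    model.fit(X_train, y_train)
    train_metrics = binary_metrics(
        y_train, model.predict(X_train), model.predict_proba(X_train)[:, 1])
    test_metrics = binary_metrics(
        y_test, model.predict(X_test), model.predict_proba(X_test)[:, 1])
    return train_metrics, test_metrics
```

Under these assumptions, a call such as `fit_and_score(LogisticRegression(**logreg_kwargs), train_dataset, test_dataset, target_col)` would produce the two metric dictionaries, and the same helper could be reused with `RandomForestClassifier(**rf_kwargs)` or `GradientBoostingClassifier(**gb_kwargs)`.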


### somebody plz help with this Make use of the scikit-learn (sklearn) pyt.pdf

Provided skeleton (task4.py):

```python
import numpy as np
import pandas as pd
from sklearn.metrics import *
from sklearn.linear_model import *
from sklearn.ensemble import *
from sklearn.feature_selection import RFE


class ModelMetrics:
    def __init__(self, model_type:str, train_metrics:dict, test_metrics:dict, feature_importance_df:pd.DataFrame):
        self.model_type = model_type
        self.train_metrics = train_metrics
        self.test_metrics = test_metrics
        self.feat_imp_df = feature_importance_df
        self.feat_name_col = "Feature"
        self.imp_col = "Importance"

    def add_train_metric(self, metric_name:str, metric_val:float):
        self.train_metrics[metric_name] = metric_val

    def add_test_metric(self, metric_name:str, metric_val:float):
        self.test_metrics[metric_name] = metric_val

    def __str__(self):
        output_str = f"MODEL TYPE: {self.model_type}\n"
        output_str += f"TRAINING METRICS:\n"
        for key in sorted(self.train_metrics.keys()):
            output_str += f" - {key} : {self.train_metrics[key]:.4f}\n"
        output_str += f"TESTING METRICS:\n"
        for key in sorted(self.test_metrics.keys()):
            output_str += f" - {key} : {self.test_metrics[key]:.4f}\n"
        if self.feat_imp_df is not None:
            output_str += f"FEATURE IMPORTANCES:\n"
            for i in self.feat_imp_df.index:
                output_str += f" - {self.feat_imp_df[self.feat_name_col][i]} : {self.feat_imp_df[self.imp_col][i]:.4f}\n"
        return output_str


def calculate_naive_metrics(train_dataset:pd.DataFrame, test_dataset:pd.DataFrame, target_col:str, naive_assumption:int) -> ModelMetrics:
    # TODO: Write the necessary code to calculate accuracy, recall, precision and fscore given a train and test dataframe
    # and a train and test target series and naive assumption
    train_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0
    }
    test_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0
    }
    naive_metrics = ModelMetrics("Naive", train_metrics, test_metrics, None)
    return naive_metrics


def calculate_logistic_regression_metrics(train_dataset:pd.DataFrame, test_dataset:pd.DataFrame, target_col:str, logreg_kwargs) -> tuple[ModelMetrics,LogisticRegression]:
    # TODO: Write the necessary code to train a logistic regression binary classification model and calculate accuracy, recall, precision, fscore,
    # false positive rate, false negative rate and area under the receiver operator curve given a train and test dataframe and train and test target series
    # and keyword arguments for the logistic regression model
    model = LogisticRegression()
    train_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    test_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    # TODO: Use RFE to select the top 10 features
    # make sure the column of feature names is named Feature
    # and the column of importances is named Importance
    # and the dataframe is sorted by ascending ranking then descending absolute value of Importance
    log_reg_importance = pd.DataFrame()
    log_reg_metrics = ModelMetrics("Logistic Regression", train_metrics, test_metrics, log_reg_importance)
    return log_reg_metrics, model


def calculate_random_forest_metrics(train_dataset:pd.DataFrame, test_dataset:pd.DataFrame, target_col:str, rf_kwargs) -> tuple[ModelMetrics,RandomForestClassifier]:
    # TODO: Write the necessary code to train a random forest binary classification model and calculate accuracy, recall, precision, fscore,
    # false positive rate, false negative rate and area under the receiver operator curve given a train and test dataframe and train and test
    # target series and keyword arguments for the random forest model
    model = RandomForestClassifier()
    train_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    test_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    # TODO: Reminder DONT use RFE for rf_importance
    # make sure the column of feature names is named Feature
    # and the column of importances is named Importance
    # and the dataframe is sorted by descending absolute value of Importance
    rf_importance = pd.DataFrame()
    rf_metrics = ModelMetrics("Random Forest", train_metrics, test_metrics, rf_importance)
    return rf_metrics, model


def calculate_gradient_boosting_metrics(train_dataset:pd.DataFrame, test_dataset:pd.DataFrame, target_col:str, gb_kwargs) -> tuple[ModelMetrics,GradientBoostingClassifier]:
    # TODO: Write the necessary code to train a gradient boosting binary classification model and calculate accuracy, recall, precision, fscore,
    # false positive rate, false negative rate and area under the receiver operator curve given a train and test dataframe and train and test
    # target series and keyword arguments for the gradient boosting model
    model = GradientBoostingClassifier()
    train_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    test_metrics = {
        "accuracy" : 0,
        "recall" : 0,
        "precision" : 0,
        "fscore" : 0,
        "fpr" : 0,
        "fnr" : 0,
        "roc_auc" : 0
    }
    # TODO: Reminder DONT use RFE for gb_importance
    # make sure the column of feature names is named Feature
    # and the column of importances is named Importance
    # and the dataframe is sorted by descending absolute value of Importance
    gb_importance = pd.DataFrame()
    gb_metrics = ModelMetrics("Gradient Boosting", train_metrics, test_metrics, gb_importance)
    return gb_metrics, model
```
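The two feature-importance TODOs in the skeleton might be approached along these lines. This is only a sketch under stated assumptions: `rfe_importance` and `tree_importance` are hypothetical helper names, the `Importance` column for logistic regression is assumed to hold the signed coefficient (sorted by its absolute value), and the demo data comes from `make_classification` rather than the assignment's dataset. Because RFE assigns ranking 1 to every selected feature, keeping only the 10 selected features and sorting by absolute coefficient should match the "ascending ranking, then descending absolute value" ordering mentioned in the comment.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression


def rfe_importance(model, X: pd.DataFrame, y, n_features: int = 10) -> pd.DataFrame:
    """Top features chosen by RFE, sorted by absolute coefficient (descending)."""
    rfe = RFE(estimator=model, n_features_to_select=n_features)
    rfe.fit(X, y)
    # rfe.estimator_ is refit on the selected columns only, so its
    # coefficients line up with X.columns[rfe.support_].
    selected = X.columns[rfe.support_]
    coefs = rfe.estimator_.coef_[0]
    df = pd.DataFrame({"Feature": selected, "Importance": coefs})
    df = df.sort_values("Importance", key=lambda s: s.abs(), ascending=False)
    return df.reset_index(drop=True)


def tree_importance(fitted_model, feature_names, top_n: int = 10) -> pd.DataFrame:
    """Top features by the model's built-in feature_importances_ attribute."""
    df = pd.DataFrame({
        "Feature": list(feature_names),
        "Importance": fitted_model.feature_importances_,
    })
    df = df.sort_values("Importance", ascending=False).head(top_n)
    return df.reset_index(drop=True)


# Demo on synthetic data (an assumption; the real task uses its own dataframes).
X_arr, y = make_classification(n_samples=200, n_features=15, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

log_reg_importance = rfe_importance(LogisticRegression(max_iter=1000), X, y)

rf_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
rf_importance = tree_importance(rf_model, X.columns)
```

The same `tree_importance` helper would apply to a fitted `GradientBoostingClassifier`, since it also exposes `feature_importances_`. Both helpers return a dataframe with `Feature` and `Importance` columns and a 0-9 index, matching `feat_name_col` and `imp_col` in `ModelMetrics`.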