With the help of my mentor Avinash, I have drafted "Prevent COVID-19 spread leveraging ML: Random Forest".
Please note that this is a draft version. I welcome criticism, which will help me create a clean version 1.0. I need your support and inputs to create a go-to-market (GTM) solution that meets current needs.
Prevent COVID-19 using ML
Stop COVID-19 spread by leveraging ML
COVID-19 is spreading extremely fast, and we need to act faster to stop community spread. In a country
like the US, patient data is available from CMS, payers, providers, and pharma. This data can fuel the
engine that stops the spread of COVID-19.
How fast is the next question. Not faster than light, but faster than sound. A one-stop solution
can leverage ML to identify whom to quarantine and to provide preventive care.
The Random Forest (RF) algorithm is a supervised learning algorithm used for both classification
and regression. Many empirical studies have reported that the random forest algorithm achieves
high prediction accuracy with good tolerance for outliers and noise.
RF can be used to classify the vulnerable population across various patient conditions, so that
preventive care can be put in place to stop community spread.
Step 1. Define the training set, validation set, and test set, drawn from the last 2 years of claims data.
Step 2. Construct the RF classifier. Input the training set and use the RF algorithm to build the model:
a combined classifier composed of classification decision trees (target: COVID-19 prone for
infection or not).
Step 3. Get the weight values. Input the validation set and classify each of its samples, treating
each decision tree in the forest as an independent classifier.
Step 4. Input the test set to evaluate the performance of the model.
Step 5. Input the unclassified samples and classify them with the random forest. The result depends
on the weighted vote over the classification results of the individual sub-classifiers.
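The five steps can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data is synthetic (generated with make_classification, not real claims), the 60/20/20 split is an arbitrary choice, and each tree's weight is assumed to be its accuracy on the validation set, since the steps do not say how the weight is computed. Note that scikit-learn's own RandomForestClassifier.predict averages tree probabilities instead of using such weights, so the weighted vote is written out by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for claims data (assumption): 2000 rows, 15 float
# features, binary target where 1 = "prone for infection".
X, y = make_classification(n_samples=2000, n_features=15, n_informative=5,
                           weights=[0.875, 0.125], random_state=0)

# Step 1: training / validation / test sets (60 / 20 / 20, assumed ratio).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

# Step 2: build the forest of classification trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Step 3: treat each tree as an independent classifier; its validation
# accuracy becomes its voting weight (assumed weighting scheme).
tree_weights = np.array(
    [(tree.predict(X_val) == y_val).mean() for tree in rf.estimators_])


def weighted_vote(forest, weights, samples):
    """Classify samples by summing per-tree weights for each predicted class."""
    votes = np.zeros((samples.shape[0], len(forest.classes_)))
    rows = np.arange(samples.shape[0])
    for tree, weight in zip(forest.estimators_, weights):
        # Trees inside the forest predict class indices (as floats).
        class_idx = tree.predict(samples).astype(int)
        votes[rows, class_idx] += weight
    return forest.classes_[votes.argmax(axis=1)]


# Step 4: evaluate on the held-out test set.
test_accuracy = (weighted_vote(rf, tree_weights, X_test) == y_test).mean()
print(f"test accuracy: {test_accuracy:.3f}")

# Step 5: classify new, unlabeled samples (here, five test rows reused).
new_labels = weighted_vote(rf, tree_weights, X_test[:5])
print(new_labels)
```

With real claims data, the synthetic X and y would be replaced by the encoded feature matrix and the risk label.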
Sample classification parameters: the training data set is the last 2 years of claims data.
Let's consider a data set of 2000 claims. Among these, close to 250 patients are marked as super critical,
needing immediate attention. The proportion is about 12.5% (250 of 2000), which clearly represents an
imbalanced data problem.
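To make the imbalance concrete, the short calculation below takes the figures of 250 super-critical patients out of 2000 claims at face value and derives per-class weights. The weighting formula, n_samples / (n_classes * class_count), is the heuristic scikit-learn applies for class_weight="balanced"; using it here is a suggestion, not something the draft prescribes.

```python
n_total = 2000
n_critical = 250                    # "close to 250" in the text
n_other = n_total - n_critical      # 1750

# Share of the minority (super critical) class.
proportion = n_critical / n_total
print(f"minority share: {proportion:.1%}")

# "Balanced" class weights: n_samples / (n_classes * class_count).
n_classes = 2
weight_critical = n_total / (n_classes * n_critical)
weight_other = n_total / (n_classes * n_other)
print(f"weight (super critical): {weight_critical:.2f}")
print(f"weight (other):          {weight_other:.2f}")
```

The minority class ends up with a weight seven times that of the majority class, which is one common way to keep the forest from ignoring the rare super-critical cases.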
Technology stack: we can use Python with NumPy, SciPy, scikit-learn, pandas, and Matplotlib.
Apply RF for feature selection. To call the RF algorithm in Python, all features are converted to
float values.
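One possible way to do the float conversion is one-hot encoding with pandas. The rows and column names below are made up for illustration (they are not real claims records), and one-hot encoding is an assumed choice; ordinal codes would be another option.

```python
import pandas as pd

# Hypothetical claims rows; column names are illustrative assumptions.
claims = pd.DataFrame({
    "age": [92, 81, 63, 52],
    "gender": ["M", "F", "M", "F"],
    "medical_condition": ["Diabetic stage-2", "Broken leg", "Asthma", "BP"],
    # ZIP codes are kept as strings so leading zeros survive.
    "zip_code": ["06426", "06426", "06480", "06413"],
})

# One-hot encode the categorical columns, then cast everything to float.
features = pd.get_dummies(
    claims, columns=["gender", "medical_condition", "zip_code"]
).astype(float)

print(features.dtypes.unique())
print(features.shape)
```

Each category becomes its own 0.0/1.0 column, so no artificial ordering is imposed on values such as ZIP codes or conditions.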
Using the RF-based feature-ranking tool, we input the patient claims data as 2000 rows
with various independent features (Age / Medical Condition / Claim Occurrence / ICD-10 Proc & Diag
Codes / Gender / Plan Type / Zip Code / County / Smoking Type / Alcoholic / etc.) and one dependent
variable. We then rank the features in descending order of importance.
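A minimal sketch of such a ranking, assuming scikit-learn's impurity-based feature_importances_ is the ranking score (the draft does not name the exact tool). The data is synthetic and the feature names are placeholders, not the real claims columns.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 2000 rows, 15 independent features, 1 target.
feature_names = [f"feature_{i}" for i in range(15)]  # placeholder names
X, y = make_classification(n_samples=2000, n_features=15, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank features by importance, highest first.
order = np.argsort(rf.feature_importances_)[::-1]
ranking = [(feature_names[i], float(rf.feature_importances_[i])) for i in order]

for name, score in ranking:
    print(f"{name:12s} {score:.4f}")
```

The importances sum to 1, so each score can be read as a feature's share of the forest's total impurity reduction.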
After removing the lower-scoring features one by one and checking the accuracy, only the top 5
parameters (Age / Gender / Medical Condition / Zip Code / Plan Type) are retained.
Of the above independent features, 10 are removed, and the data are reorganized into 2000 rows with 5
independent features and 1 target variable.
These 5 features, selected by importance, become the features of the patient prediction problem,
and the dimensionality of the problem is reduced significantly.
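The one-by-one removal can be sketched as a simple backward-elimination loop. The assumptions here are that the lowest-importance feature is dropped at each round, that accuracy is tracked on a held-out validation split, and that the loop stops at 5 features to mirror the text; the data is again synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 2000 claims with 15 candidate features.
X, y = make_classification(n_samples=2000, n_features=15, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

kept = list(range(X.shape[1]))  # column indices still in play
history = []                    # (number of features, validation accuracy)

while True:
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X_train[:, kept], y_train)
    history.append((len(kept), rf.score(X_val[:, kept], y_val)))
    if len(kept) == 5:          # assumed stopping point: top 5 features
        break
    # Drop the feature with the lowest importance in this round.
    weakest = int(np.argmin(rf.feature_importances_))
    kept.pop(weakest)

for n_features, accuracy in history:
    print(f"{n_features:2d} features -> validation accuracy {accuracy:.3f}")
print("retained columns:", kept)
```

Comparing the first and last entries of history shows how much accuracy, if any, is given up by going from 15 features to 5.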
To examine the relation between the features and the target variable, we analyse the top 3 feature
combinations. The results are listed below.
1) Age & Medical Condition

Feature      Value Range   Medical Condition                             COVID-19 prone for infection
Member Age   >= 90         Diabetic stage-2                              Super Critical
Member Age   >= 80         Broken leg                                    Critical
Member Age   >= 60         Asthma                                        High
Member Age   >= 50         BP                                            Medium
Member Age   >= 60         Asthma & Diabetic Stage-2 with Amputee leg    Super Critical, immediate attention
Gender       Male          Lung Cancer Stage-2                           Super Critical, immediate attention
Gender       Female        Breast Cancer                                 High
2) Age, Medical Condition and Gender

Feature      Value Range   Medical Condition                             Gender   COVID-19 prone for infection
Member Age   >= 90         Diabetic stage-2                              Male     Super Critical
Member Age   >= 80         Broken leg, Breast Cancer                     Female   Super Critical
Member Age   >= 30         Asthma                                        Male     Medium
Member Age   >= 50         BP                                            Female   Medium
Member Age   >= 60         Asthma & Diabetic Stage-2 with Amputee leg    Female   Super Critical, immediate attention
Member Age   >= 25         Lung Cancer Stage-2                           Male     Super Critical, immediate attention
3) Zip Code, Age, Medical Condition and Gender

ZIP Code           Feature      Value Range   Medical Condition                             Gender   COVID-19 prone for infection
06426 Essex        Member Age   >= 90         Diabetic stage-2                              Male     Super Critical
                   Member Age   >= 80         Broken leg, Breast Cancer                     Female   Super Critical
06480 Portland     Member Age   >= 30         Asthma                                        Male     Medium
06413 Clinton      Member Age   >= 50         BP                                            Female   Medium
06455 Middletown   Member Age   >= 60         Asthma & Diabetic Stage-2 with Amputee leg    Female   Super Critical, immediate attention
                   Member Age   >= 25         Lung Cancer Stage-2                           Male     Super Critical, immediate attention
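As a rough illustration of how per-ZIP results like these could feed planning, the sketch below transcribes the six rows of the ZIP-code table and counts super-critical profiles per ZIP code. It assumes that a row with no ZIP code of its own belongs to the ZIP code listed above it, and the RiskRow type and its field names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class RiskRow(NamedTuple):
    zip_code: str
    town: str
    min_age: int
    condition: str
    gender: str
    risk: str
    immediate_attention: bool


# Rows transcribed from the ZIP-code table; ZIP assignment of the
# second and sixth rows is an assumption (see lead-in).
rows = [
    RiskRow("06426", "Essex", 90, "Diabetic stage-2", "Male",
            "Super Critical", False),
    RiskRow("06426", "Essex", 80, "Broken leg, Breast Cancer", "Female",
            "Super Critical", False),
    RiskRow("06480", "Portland", 30, "Asthma", "Male", "Medium", False),
    RiskRow("06413", "Clinton", 50, "BP", "Female", "Medium", False),
    RiskRow("06455", "Middletown", 60,
            "Asthma & Diabetic Stage-2 with Amputee leg", "Female",
            "Super Critical", True),
    RiskRow("06455", "Middletown", 25, "Lung Cancer Stage-2", "Male",
            "Super Critical", True),
]

# Count super-critical profiles per ZIP code.
super_critical_by_zip = Counter(
    (r.zip_code, r.town) for r in rows if r.risk == "Super Critical")

# ZIP codes with at least one profile flagged for immediate attention.
immediate_zips = sorted({r.zip_code for r in rows if r.immediate_attention})

for (zip_code, town), count in sorted(super_critical_by_zip.items()):
    print(f"{zip_code} {town}: {count} super-critical profile(s)")
print("immediate attention:", immediate_zips)
```

In practice the rows would come from the classifier's predictions on member records, not from a hand-written list.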
The predictions from RF lead to key action items:
Health Care Provider:
Procure COVID-19 testing kits and ventilator support
Plan for COVID-19 home tests, since the elderly population is high
Ensure enough ambulatory services and ventilator support to assist patients faster
Plan extra beds in ICUs
Health Care Payer:
Send a notification to the member to take a COVID-19 test
Ensure dependents are entitled to COVID-19 coverage if the member is diagnosed COVID-19 positive