2. Objective of the work
Analysis to Predict Who Will Die When.
How?
• Create training and validation sets.
• Use the training set to calculate likelihood ratios.
• This is important because it provides forecast information about health outcomes.
• This assignment teaches us to explore data and locate exact information within the data.
3. Data source
Number of cases
What is the distribution of the data
• Data source: the assignment data set
SELECT COUNT(*) FROM dbo.final
• Total number of cases
(17,443,442 cases)
• Distribution of the data
Average AgeAtDx: 59.53186
Standard deviation of AgeAtDx: 4.293136
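The two summary statistics above can be reproduced with a short sketch. The values below are made up for illustration, and the source does not say whether the sample or the population standard deviation was reported, so the choice of the sample formula here is an assumption.

```python
import statistics

# Hypothetical AgeAtDx values; the real column has about 17.4 million rows.
ages_at_dx = [55.0, 58.5, 60.0, 61.5, 65.0]

mean_age = statistics.fmean(ages_at_dx)

# Sample standard deviation (n - 1 denominator). statistics.pstdev gives
# the population version (n denominator) if that is what is wanted.
sd_age = statistics.stdev(ages_at_dx)

print(f"Average AgeAtDx: {mean_age:.5f}")
print(f"Standard deviation of AgeAtDx: {sd_age:.5f}")
```

With this many rows the two standard deviation formulas give almost the same number, so the choice matters little for the real data.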
Start dataset (hap464.dbo.final): 17,443,442 cases and 829,827 IDs
Zombies removed: 17,432,694 cases and 829,659 IDs
>365 Dx/yr removed: 17,379,218 cases and 829,603 IDs → This is your clean data.
80% training set from clean data: 13,760,416 cases and 657,905 IDs
20% validation set from clean data: 3,619,297 cases and 171,698 IDs
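A minimal sketch of the 80/20 split, assuming the split is made by patient ID so that all cases of one patient land in the same set. The reported ID counts (657,905 + 171,698 = 829,603) are consistent with that, but the source does not state the method. The function name `split_ids`, the seed, and the sample cases are hypothetical.

```python
import random

def split_ids(ids, train_fraction=0.8, seed=0):
    """Randomly assign each distinct ID to training or validation."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, validation = set(), set()
    for pid in sorted(set(ids)):
        target = train if rng.random() < train_fraction else validation
        target.add(pid)
    return train, validation

# Hypothetical cases as (ID, diagnosis code) pairs.
cases = [(1, "A"), (1, "B"), (2, "A"), (3, "C"), (4, "A"), (5, "B")]

train_ids, validation_ids = split_ids(pid for pid, _ in cases)

# Every case follows its patient into exactly one of the two sets.
train_cases = [c for c in cases if c[0] in train_ids]
validation_cases = [c for c in cases if c[0] in validation_ids]
```

Because each ID is assigned at random, the training share is only approximately 80%, not exactly 80%.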
4. Preparation of the data
Starting data: 17,443,442 diagnoses and 829,827 distinct IDs
Remove zombies: 168 distinct IDs and 10,748 diagnoses removed
Remove >365 Dx/yr: 53,476 diagnoses removed
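The cleaning steps can be sketched as below. Both rules are assumptions, since the source does not define them: a "zombie" is taken to be a patient with a diagnosis recorded after the age at death, and the second rule drops patients with more than 365 diagnoses within one year of age. The function `clean`, the key `age_at_death`, and the sample records are hypothetical.

```python
from collections import Counter

def clean(cases, max_dx_per_year=365):
    """Drop every case of a patient flagged by either cleaning rule.

    Each case is a dict with hypothetical keys: "id", "age_at_dx", and
    "age_at_death" (None when the patient has no recorded death).
    """
    # Assumed zombie rule: a diagnosis recorded after the age at death.
    zombies = {
        c["id"] for c in cases
        if c["age_at_death"] is not None and c["age_at_dx"] > c["age_at_death"]
    }
    kept = [c for c in cases if c["id"] not in zombies]

    # Assumed rule: more than max_dx_per_year diagnoses in one year of age.
    per_year = Counter((c["id"], int(c["age_at_dx"])) for c in kept)
    over_coded = {pid for (pid, _), n in per_year.items() if n > max_dx_per_year}
    return [c for c in kept if c["id"] not in over_coded]

cases = [
    {"id": 1, "age_at_dx": 60.2, "age_at_death": None},
    {"id": 2, "age_at_dx": 71.0, "age_at_death": 70.5},  # diagnosed after death
    {"id": 2, "age_at_dx": 69.0, "age_at_death": 70.5},
    {"id": 3, "age_at_dx": 55.1, "age_at_death": None},
    {"id": 3, "age_at_dx": 55.4, "age_at_death": None},
]

# A threshold of 1 is used only so the tiny demo triggers the second rule.
clean_cases = clean(cases, max_dx_per_year=1)
```

Removing the whole patient, not just the offending diagnosis, matches the counts above, where both the number of diagnoses and the number of distinct IDs go down at each step.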
5. Calculating Likelihood Ratios
Dead patients: patients who died within six months after diagnosis
Alive patients: patients who lived six months after diagnosis
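A minimal sketch of the calculation, assuming the usual definition: the likelihood ratio of a diagnosis is the share of dead patients who have it divided by the share of alive patients who have it. The source does not spell out its formula, and the function `likelihood_ratios` and the sample data are hypothetical.

```python
def likelihood_ratios(dx_by_patient, dead_ids):
    """Return {diagnosis: LR} where LR = P(dx | dead) / P(dx | alive).

    Assumes both the dead group and the alive group are non-empty.
    """
    dead = [p for p in dx_by_patient if p in dead_ids]
    alive = [p for p in dx_by_patient if p not in dead_ids]
    all_dx = set().union(*dx_by_patient.values())

    ratios = {}
    for dx in sorted(all_dx):
        p_dead = sum(dx in dx_by_patient[p] for p in dead) / len(dead)
        p_alive = sum(dx in dx_by_patient[p] for p in alive) / len(alive)
        # The ratio is undefined when no alive patient has the diagnosis.
        ratios[dx] = p_dead / p_alive if p_alive > 0 else None
    return ratios

# Hypothetical training data: patient ID -> set of diagnosis codes.
dx_by_patient = {
    1: {"A", "B"},
    2: {"A"},
    3: {"B"},
    4: {"A", "C"},
}
dead_ids = {1, 2}  # patients who died within six months after diagnosis

ratios = likelihood_ratios(dx_by_patient, dead_ids)
```

A ratio above 1 means the diagnosis is more common among dead patients than among alive patients, and a ratio below 1 means the opposite.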
7. Usefulness of the project
• The project is useful as practice in writing SQL against a large data set. It also builds skill in selecting an appropriate method of data analysis, removing confounding in the data, presenting complex multivariate data visually, and interpreting quantitative findings and relating them to specific policy issues or management decisions.
• It is also important for our future field of work.