D. Fullerton
STAT 3010.W01
   Final Project
       07/19/09
STAT 3010.W01 Final Project: Analysis of Center for Disease Control
         Chronic Disease Indicators of the United States and Georgia for Year 2005

The aim of this report is to discuss the results of a statistical analysis of the Chronic Disease
Indicators of the United States and Georgia for the year 2005, as published by the Center for
Disease Control. The analysis covered three points: 1) determine descriptive statistics and
describe the distributions of the variables in the data set; 2) compare chronic disease indicator
rates between the United States and Georgia separately for each of five categories; and 3) draw a
random 20-item sample from the data set, estimate the mean Chronic Disease Indicator rate in the
United States and Georgia using 95% and 99% confidence intervals, and determine whether the
mean rate of all 50 initial data points was captured by the estimated confidence intervals. The
analysis was carried out in SAS 9.1.3 SP4, with graphics from SAS and Minitab 15.

This data set was chosen for its relation to healthcare, its size, and its complexity. The
five variables (three categorical and two quantitative) of the Center for Disease Control Chronic
Disease Indicators of the United States and Georgia for Year 2005 were obtained by filtering a
data set from the Center for Disease Control website (http://apps.nccd.cdc.gov/cdi/Default.aspx).
The United States and Georgia were selected for comparison. The data and definitions
were originally developed by the Council of State and Territorial Epidemiologists together with
epidemiologists and chronic disease program directors at the state and federal level, and were
refined between 1999 and 2002; the survey used here was made for 2005.

These data have proved useful in Georgia for developing a database of the indicators by 19 health
districts, available via the internet. In addition, the Division of Diabetes Translation at the
Center for Disease Control uses the data to assist diabetes programs with their surveillance and
epidemiological activities. Table 1 shows a short selection of the data, and the variable names
used in Table 1 are described in Table 2. There are 50 data points from the year 2005; the six
other data points, from different years, were trimmed from the data set before analysis. The
results and analysis are therefore valid only for the year 2005.
The occurrences per 100,000 people in the United States and in Georgia are assessed by Chronic
Disease Indicator category.

The assessment of the quantitative and categorical variables shows the following. Table 3 gives
the descriptive statistics for the Chronic Disease Indicators of the United States and Georgia; in
both, the mean is far larger than the median (102.23 versus 25.95 for the United States, and
100.37 versus 25.90 for Georgia). Figures 1 and 2 show that the distribution of occurrences for
the United States and for Georgia is in each case unimodal and positively skewed. Figures 3 and 4
further demonstrate this. Although the distributions are drastically skewed, no outliers are
shown. The most representative measure of central tendency is therefore the median, 25.95 for
the United States and 25.90 for Georgia.
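
The gap between mean and median can be put on a common scale with Pearson's second skewness coefficient, 3 * (mean - median) / standard deviation. The sketch below is an illustration only: it uses the summary values printed in Table 3, the coefficient was not part of the original SAS analysis, and the names `TABLE_3` and `pearson_median_skewness` are made up for this example.

```python
# Summary statistics copied from Table 3 (mean, median, standard deviation).
TABLE_3 = {
    "UNITED_STATES": {"mean": 102.23, "median": 25.95, "std_dev": 153.70},
    "GEORGIA": {"mean": 100.37, "median": 25.90, "std_dev": 163.10},
}


def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std_dev."""
    return 3.0 * (mean - median) / std_dev


# A positive value means the mean is pulled above the median (right skew).
skewness = {name: pearson_median_skewness(**stats) for name, stats in TABLE_3.items()}

for name, value in skewness.items():
    print(f"{name}: {value:.2f}")
```

Both coefficients come out positive and well above 1, which agrees with the positive skew seen in the histograms.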

Table 4 shows the frequency of data points by category. Cancer dominates the data with 36 of the
50 data points; this modal count is over four times that of the next leading category,
Cardiovascular Disease, with 8. Figures 5 and 6 reinforce this. It is notable, however, that
Cancer has a broader range of results and is skewed, whereas Cardiovascular Disease has a more
even distribution.
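
As a rough check of the frequency figures, the percent and cumulative columns of Table 4 can be rebuilt from the five category counts alone. This is a minimal sketch and not the SAS procedure used for the report; `frequency_table` is a hypothetical helper name.

```python
# Category counts copied from Table 4.
counts = {
    "Cancer": 36,
    "Cardiovascular Disease": 8,
    "Other Diseases and Risk Factors": 2,
    "Overarching Conditions": 2,
    "Tobacco and Alcohol": 2,
}


def frequency_table(counts):
    """Return rows of (category, frequency, percent, cumulative frequency, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for category, freq in counts.items():
        running += freq
        rows.append((category, freq, 100.0 * freq / total, running, 100.0 * running / total))
    return rows


rows = frequency_table(counts)
for category, freq, pct, cum_freq, cum_pct in rows:
    print(f"{category:<32}{freq:>4}{pct:>8.2f}{cum_freq:>4}{cum_pct:>8.2f}")

# Ratio of the modal category to the next leading category.
ratio = counts["Cancer"] / counts["Cardiovascular Disease"]
print(f"Cancer / Cardiovascular Disease = {ratio}")
```

The ratio of 36 to 8 is 4.5, which is the "over four times" comparison made in the text.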

A new categorical variable, size, was created for the occurrences in the United States and in
Georgia. The occurrences were divided into classes of width 150. Contingency Table 5 shows that
Cancer results for the United States fall mostly in the “X-Small” range: 32 of the 36 data
points in this category were below 150 occurrences. Occurrences for Georgia (Table 6) differ in
that some results fall into the “Medium” range, and 50% of the Cardiovascular Disease results
fall in the “X-Small” class.
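
The size classes can be sketched as follows. The report states only the class width of 150; the cut points, the label order, and the rule that a boundary value goes to the upper class are assumptions inferred from the rows visible in Table 1 and the counts in Tables 5 and 6, not the author's SAS code. `size_label` is a hypothetical helper name.

```python
# Class width stated in the report.
BIN_WIDTH = 150
# Assumed label order, from smallest to largest class.
LABELS = ["X-Small", "Small", "Medium", "Large", "X-Large"]


def size_label(rate):
    """Map a rate per 100,000 to a size class; the top class is open-ended."""
    index = min(int(rate // BIN_WIDTH), len(LABELS) - 1)
    return LABELS[index]


# (UNITED_STATES, GEORGIA) rates for the observations visible in Table 1.
visible_rows = {
    1: (9.3, 7.5),
    2: (8.9, 8.1),
    3: (469.8, 402.6),
    4: (458.4, 452.0),
    5: (188.6, 157.2),
    48: (618.6, 711.1),
    49: (1.3, 1.3),
    50: (1.3, 1.5),
}

for obs, (us_rate, ga_rate) in visible_rows.items():
    print(obs, size_label(us_rate), size_label(ga_rate))
```

Under these assumed cut points, observation 3 is “Large” for the United States (469.8) but “Medium” for Georgia (402.6), which is consistent with the single “Medium” Cancer entry in Table 6.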

The categorical indicator is also shown in Figures 7 through 10. They stress again that Cancer is
the leading indicator by far, at 72% of the data points overall (Table 4). Figure 7 groups the
three smallest categories into one, “Other”. The breakdowns by cause for the United States and
for Georgia continue to stress that Cancer and Cardiovascular Disease are the factors that merit
further study.

Figures 11 and 12 again show the breakdown of occurrences by the newly created variable, size.
Each shows that most occurrences, for both the United States and Georgia, fall into the
“X-Small” class, at a frequency of nearly 40 of the 50 data points in each (38 and 39,
respectively). Figures 13 and 14 show the category of incidence by size on stacked bar charts
for the United States and Georgia. Cancer results in the United States fall mostly in the
“X-Small” class, and Cardiovascular Disease results in the “Small” class. The results for
Georgia show that “X-Small” leads in every category except Overarching Conditions, and accounts
for the vast majority of the Cancer indicator.
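
For readers unfamiliar with the four numbers stacked in each cell of Tables 5 and 6 (Frequency, Percent, Row Pct, Col Pct), the sketch below recomputes them from the Table 5 counts. It is an illustration written for this report, not SAS output; `cell_stats` is a hypothetical helper name.

```python
# Cell counts copied from Table 5 (rows: CATEGORY, columns: US_SIZE).
COLUMNS = ["Large", "Small", "X-Large", "X-Small"]
COUNTS = {
    "Cancer": [2, 2, 0, 32],
    "Cardiovascular Disease": [0, 6, 0, 2],
    "Other Diseases and Risk Factors": [0, 0, 0, 2],
    "Overarching Conditions": [0, 0, 2, 0],
    "Tobacco and Alcohol": [0, 0, 0, 2],
}


def cell_stats(category, size):
    """Return (frequency, percent of total, row percent, column percent) for one cell."""
    col = COLUMNS.index(size)
    freq = COUNTS[category][col]
    total = sum(sum(row) for row in COUNTS.values())
    row_total = sum(COUNTS[category])
    col_total = sum(row[col] for row in COUNTS.values())
    return (
        freq,
        100.0 * freq / total,
        100.0 * freq / row_total,
        100.0 * freq / col_total,
    )


freq, pct, row_pct, col_pct = cell_stats("Cancer", "X-Small")
print(f"{freq} {pct:.2f} {row_pct:.2f} {col_pct:.2f}")
```

For the Cancer by “X-Small” cell this gives 32 data points, 64.00% of all data points, 88.89% of the Cancer row, and 84.21% of the “X-Small” column, matching Table 5.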

Finally, a random sample of 20 data points was produced in SAS. Both the 95% and 99%
confidence intervals captured the true means of all 50 data points (Table 7). For the United
States the intervals are 38.69 to 200.55 (95%) and 9.00 to 230.24 (99%), where the true mean is
102.23; for Georgia they are 35.69 to 210.84 (95%) and 3.56 to 242.97 (99%), where the true
mean is 100.37.
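
The intervals in Table 7 can be cross-checked without the raw sample. Assuming they are ordinary Student t intervals for a mean with n = 20 (19 degrees of freedom), the sample mean and standard error can be recovered from the 95% limits and then used to rebuild the 99% limits. The sketch below does this and also tests whether each interval contains the mean of all 50 data points from Table 3. The critical values 2.093 and 2.861 are standard table values for 19 degrees of freedom; the function names are made up for this example, and this is not the SAS code used in the report.

```python
# Two-sided Student t critical values for n - 1 = 19 degrees of freedom.
T_95 = 2.093  # 0.975 quantile
T_99 = 2.861  # 0.995 quantile

# Confidence limits copied from Table 7 (sample of n = 20).
CI_95 = {"UNITED_STATES": (38.69, 200.55), "GEORGIA": (35.69, 210.84)}
CI_99 = {"UNITED_STATES": (9.00, 230.24), "GEORGIA": (3.56, 242.97)}
# Means of all 50 data points, copied from Table 3.
MEAN_ALL_50 = {"UNITED_STATES": 102.23, "GEORGIA": 100.37}


def rebuild_99(lower_95, upper_95):
    """Recover the sample mean and standard error from a 95% t interval,
    then widen it to a 99% interval."""
    sample_mean = (lower_95 + upper_95) / 2
    std_error = (upper_95 - lower_95) / 2 / T_95
    half_width_99 = T_99 * std_error
    return sample_mean - half_width_99, sample_mean + half_width_99


def captures(interval, value):
    """True when value lies inside the closed interval."""
    lower, upper = interval
    return lower <= value <= upper


rebuilt = {name: rebuild_99(*limits) for name, limits in CI_95.items()}
captured = {
    name: (captures(CI_95[name], MEAN_ALL_50[name]), captures(CI_99[name], MEAN_ALL_50[name]))
    for name in CI_95
}

for name in CI_95:
    low, high = rebuilt[name]
    print(f"{name}: rebuilt 99% CI = ({low:.2f}, {high:.2f}), captured = {captured[name]}")
```

The rebuilt 99% limits agree with Table 7 to within rounding, which supports the assumption that both rows of the table come from the same sample mean and standard error.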
APPENDIX I: SAS TABLES AND FIGURES

Table 1: Abbreviated Display of the Center for Disease Control
Chronic Disease Indicators of the United States and Georgia for Year 2005

Obs  CATEGORY                         INDICATOR                                          YEAR  MEASURE            UNITED_STATES  GEORGIA
  1  Tobacco and Alcohol              Chronic liver disease - mortality                  2005  Crude Rate                   9.3      7.5
  2  Tobacco and Alcohol              Chronic liver disease - mortality                  2005  Age-adjusted Rate            8.9      8.1
  3  Cancer                           Invasive cancer (all sites combined) - incidence   2005  Crude Rate                 469.8    402.6
  4  Cancer                           Invasive cancer (all sites combined) - incidence   2005  Age-adjusted Rate          458.4    452.0
  5  Cancer                           Cancer (all sites combined) - mortality            2005  Crude Rate                 188.6    157.2
  .  (observations 6 through 47 not shown)
 48  Overarching Conditions           Premature mortality among adults aged 45-64 years  2005  Age-adjusted Rate          618.6    711.1
 49  Other Diseases and Risk Factors  Asthma - mortality                                 2005  Crude Rate                   1.3      1.3
 50  Other Diseases and Risk Factors  Asthma - mortality                                 2005  Age-adjusted Rate            1.3      1.5

NOTE: The data for other years were minimal and thus eliminated from this data set (the numeration “Obs” was added automatically by SAS).
Table 2: Summary of Variables Contained in Center for Disease Control
Chronic Disease Indicators of the United States and Georgia for Year 2005

Variable Name  Label                         General Type  Specific Type        Measurement Units
Obs            Observation number            Categorical   Identifier Variable  N/A
CATEGORY       Disease category              Categorical   Nominal              N/A
INDICATOR      Disease indicator             Categorical   Nominal              N/A
YEAR           Survey year (only 2005 used)  Categorical   Nominal              N/A
MEASURE        Crude or Age adjusted rate    Categorical   Nominal              N/A
UNITED_STATES  -                             Quantitative  Interval/Ratio       Number of instances per 100,000 persons*
GEORGIA        -                             Quantitative  Interval/Ratio       Number of instances per 100,000 persons*

* standardized by the direct method to the year 2000 standard U.S. population
  based on single years of age from the Census P25-1130 series estimates
Table 3: Descriptive Statistics of Center for Disease Control
     Chronic Disease Indicators of the United States and Georgia for Year 2005


     Variable         N    Mean   Median  Std Dev   Range  Minimum  Maximum
     UNITED_STATES   50  102.23    25.95   153.70  628.60     1.30   629.90
     GEORGIA         50  100.37    25.90   163.10  719.70     1.30   721.00


                Table 4: Frequency Table of Center for Disease Control
                        Chronic Disease Indicators by Category
                                     CATEGORY


                                                                Cumulative      Cumulative
CATEGORY                              Frequency       Percent    Frequency         Percent
Cancer                                        36        72.00              36             72.00
Cardiovascular Disease                         8        16.00              44             88.00

Other Diseases and Risk Factors                2         4.00              46             92.00

Overarching Conditions                         2         4.00              48             96.00

Tobacco and Alcohol                            2         4.00              50            100.00
Figure 1: Histogram of Occurrences United States (per 100,000 people)
[Histogram; x-axis: UNITED STATES (0 to 600), y-axis: Percent (0 to 70)]

Figure 2: Histogram of Occurrences Georgia (per 100,000 people)
[Histogram; x-axis: GEORGIA (0 to 720), y-axis: Percent (0 to 70)]

Figure 3: Box Plot of Occurrences United States (year 2005 per 100,000 people)
[Box plot; x-axis: YEAR (2005), y-axis: UNITED STATES (0 to 800)]

Figure 4: Box Plot of Occurrences Georgia (year 2005 per 100,000 people)
[Box plot; x-axis: YEAR (2005), y-axis: GEORGIA (0 to 800)]

Figure 5: Side by Side Box Plot of Occurrences United States (per 100,000 people)
[Box plots by category; x-axis: CATEGORY (tick labels shown: Cancer, Tobacco and Alcohol), y-axis: UNITED STATES (0 to 800)]

Figure 6: Side by Side Box Plot of Occurrences Georgia (per 100,000 people)
[Box plots by category; x-axis: CATEGORY (tick labels shown: Cancer, Tobacco and Alcohol), y-axis: GEORGIA (0 to 800)]

Table 5: Contingency Table Category of Occurrences by United States Size

    CATEGORY(CATEGORY)                          US_SIZE
  Frequency
  Percent
  Row Pct
  Col Pct                           Large Small X-Large X-Small        Total
  Cancer                                 2       2        0       32      36
                                      4.00    4.00     0.00    64.00   72.00
                                      5.56    5.56     0.00    88.89
                                    100.00   25.00     0.00    84.21
  Cardiovascular Disease                 0       6        0        2       8
                                      0.00   12.00     0.00     4.00   16.00
                                      0.00   75.00     0.00    25.00
                                      0.00   75.00     0.00     5.26
  Other Diseases and Risk Factors        0       0        0        2       2
                                      0.00    0.00     0.00     4.00    4.00
                                      0.00    0.00     0.00   100.00
                                      0.00    0.00     0.00     5.26
  Overarching Conditions                 0       0        2        0       2
                                      0.00    0.00     4.00     0.00    4.00
                                      0.00    0.00   100.00     0.00
                                      0.00    0.00   100.00     0.00
  Tobacco and Alcohol                    0       0        0        2       2
                                      0.00    0.00     0.00     4.00    4.00
                                      0.00    0.00     0.00   100.00
                                      0.00    0.00     0.00     5.26
  Total                                  2       8        2       38     50
                                      4.00   16.00     4.00    76.00 100.00
Table 6: Contingency Table Category of Occurrences by Georgia Size

  CATEGORY(CATEGORY)                                GA_SIZE
Frequency
Percent
Row Pct
Col Pct                            Large Medium Small X-Large X-Small           Total
                         Cancer         1       1        3         0       31      36
                                     2.00    2.00     6.00      0.00    62.00   72.00
                                     2.78    2.78     8.33      0.00    86.11
                                   100.00   50.00    50.00      0.00    79.49
         Cardiovascular Disease         0       1        3         0        4       8
                                     0.00    2.00     6.00      0.00     8.00   16.00
                                     0.00   12.50    37.50      0.00    50.00
                                     0.00   50.00    50.00      0.00    10.26
 Other Diseases and Risk Factors        0       0        0         0        2       2
                                     0.00    0.00     0.00      0.00     4.00    4.00
                                     0.00    0.00     0.00      0.00   100.00
                                     0.00    0.00     0.00      0.00     5.13
        Overarching Conditions          0       0        0         2        0       2
                                     0.00    0.00     0.00      4.00     0.00    4.00
                                     0.00    0.00     0.00    100.00     0.00
                                     0.00    0.00     0.00    100.00     0.00
            Tobacco and Alcohol         0       0        0         0        2       2
                                     0.00    0.00     0.00      0.00     4.00    4.00
                                     0.00    0.00     0.00      0.00   100.00
                                     0.00    0.00     0.00      0.00     5.13
Total                                   1       2        6         2       39     50
                                     2.00    4.00    12.00      4.00    78.00 100.00
Figure 7: Pie Chart Category of Occurrences (per 100,000 people)
Figure 8: Pie Chart Category of Occurrences United States (per 100,000 people)
Figure 9: Pie Chart Category of Occurrences Georgia (per 100,000 people)
Figure 10: Bar Chart of Category of Occurrences (per 100,000 people)
Figure 11: Bar Chart of Category of Occurrences United States (per 100,000 people)
[Bar chart; x-axis: US_SIZE (Large, Small, X-Large, X-Small), y-axis: FREQUENCY (0 to 40)]

Figure 12: Bar Chart of Category of Occurrences Georgia (per 100,000 people)
[Bar chart; x-axis: GA_SIZE (Large, Medium, Small, X-Large, X-Small), y-axis: FREQUENCY (0 to 40)]

Figure 13: Stacked Bar Chart of Category of Occurrences United States (per 100,000 people)
Figure 14: Stacked Bar Chart of Category of Occurrences Georgia (per 100,000 people)

Table 7: 95 and 99% Confidence Intervals for United States and Georgia 20 set Sample

                                                     Lower 95%          Upper 95%
  Variable              Label                 N     CL for Mean        CL for Mean
  UNITED_STATES         UNITED STATES        20             38.69            200.55
  GEORGIA               GEORGIA              20             35.69            210.84
                                                     Lower 99%          Upper 99%
  Variable              Label                 N     CL for Mean        CL for Mean
  UNITED_STATES         UNITED STATES        20              9.00            230.24
  GEORGIA               GEORGIA              20              3.56            242.97
Appendix II: Figures Generated in Minitab

Figure 15: Histogram of UNITED_STATES
[Histogram; x-axis: occurrences/100k (0 to 640), y-axis: Frequency (0 to 25)]

Figure 16: Histogram of GEORGIA
[Histogram; x-axis: occurrences/100k (0 to 700), y-axis: Frequency (0 to 30)]

Figure 17: Boxplot of UNITED_STATES
[Box plot; y-axis: occurrences/100k (0 to 700)]

Figure 18: Boxplot of GEORGIA
[Box plot; y-axis: occurrences/100k (0 to 800)]

Figure 19: Boxplot of UNITED_STATES by CATEGORY
[Box plots by category; x-axis: CATEGORY (Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol), y-axis: occurrences/100k (0 to 700)]

Figure 20: Boxplot of GEORGIA by CATEGORY
[Box plots by category; x-axis: CATEGORY (Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol), y-axis: occurrences/100k (0 to 800)]

Figure 21: Pie Chart of CATEGORY
[Pie chart; legend: Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol; slice labels: 72.0%, 16.0%, 4.0%, 4.0%, 4.0%]

Figure 22: Pie Chart of CATEGORY for UNITED_STATES
[Pie chart; legend: Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol; slice labels: 47.5%, 27.6%, 24.5%, 0.3%, 0.0%]

Figure 23: Pie Chart of CATEGORY for GEORGIA
[Pie chart; legend: Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol; slice labels: 45.3%, 28.6%, 25.7%, 0.3%, 0.0%]

Figure 24: Bar Chart of CATEGORY
[Bar chart; x-axis: CATEGORY (Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol), y-axis: Count (0 to 40)]

Figure 25: Bar Chart of United States Size
[Bar chart; x-axis: US_SIZE (Large, Small, X-Large, X-Small), y-axis: Count (0 to 40)]

Figure 26: Bar Chart of Georgia Size (y-axis: Count, 0 to 40; x-axis: GA_SIZE, with bars for Large, Medium, Small, X-Large, X-Small)

Figure 27: Stacked Bar Chart of CATEGORY by United States Size (y-axis: Count, 0 to 40; x-axis: CATEGORY, with bars for Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol; legend US_SIZE: X-Small, X-Large, Small, Large)



Figure 28: Stacked Bar Chart of CATEGORY by Georgia Size (y-axis: Count, 0 to 40; x-axis: CATEGORY, with bars for Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol; legend GA_SIZE: X-Small, X-Large, Small, Medium, Large)

Appendix III: SAS Code

* FULLERTON, STAT 3010.W01, FINAL PROJECT: DATA ANALYSIS OF Center for Disease Control Chronic
Disease Indicators (CDC - CDI) of the United States and Georgia for Year 2005;

* SETTING SYSTEM OPTIONS;

DM 'LOG;CLEAR;OUT;CLEAR;';
OPTIONS LS=100 PS=75 FORMDLIM="=";

* Loading the previously saved data set into a temporary (WORK) data set;

DATA NEWCDICDC;
      SET 'V:\final.project\CDICDC';
RUN;

* Copying the saved data into the temporary (WORK) data set CDICDC;

DATA CDICDC;
      SET 'V:\final.project\CDICDC';
RUN;

* To view data in SAS;

PROC PRINT DATA = CDICDC;
RUN;

* SETTING LIBREF;

* Saving data as a permanent SAS data set;

LIBNAME W2 'V:\final.project';

DATA W2.CDICDC;
      SET CDICDC;
RUN;

* IMPORT CDC - CDI DATA;

PROC IMPORT
      DATAFILE = 'V:\final.project\FilChrDisIndCDC.xls'
      OUT = T1
      REPLACE;
RUN;

* Variable View in SAS;

PROC CONTENTS DATA = W2.CDICDC;
RUN;

* Table 1 Dataset;

ODS RTF;

PROC PRINT DATA = W2.CDICDC;
      VAR CATEGORY INDICATOR YEAR MEASURE UNITED_STATES GEORGIA;
RUN;

ODS RTF CLOSE;

* Descriptive Statistics for Quantitative Variables;

ODS RTF;
PROC MEANS DATA = W2.CDICDC MAXDEC=2 N MEAN MEDIAN STD RANGE MIN MAX;
      VAR UNITED_STATES GEORGIA;
RUN;
ODS RTF CLOSE;

* Frequency Tables of Category Variables;

ODS RTF;
PROC FREQ DATA = W2.CDICDC;
      TABLES CATEGORY INDICATOR MEASURE;
RUN;
ODS RTF CLOSE;

* Histograms and Boxplots;

DM 'LOG; CLEAR; OUT; CLEAR;';

PROC UNIVARIATE DATA = W2.CDICDC;
      VAR UNITED_STATES GEORGIA;
      HISTOGRAM;
RUN;

PROC SORT DATA = W2.CDICDC;
      BY YEAR;
RUN;

PROC BOXPLOT DATA = W2.CDICDC;
      PLOT UNITED_STATES*YEAR;
      PLOT GEORGIA*YEAR;
RUN;

* Boxplot of Occurrences by Category;

DM 'LOG; CLEAR; OUT; CLEAR; GRAPH; CLEAR';
PROC SORT DATA = W2.CDICDC;
      BY CATEGORY;
RUN;

PROC BOXPLOT DATA = W2.CDICDC;
      PLOT UNITED_STATES*CATEGORY;
      PLOT GEORGIA*CATEGORY;
RUN;

* Creating new variables (size) for contingency table analysis;
* Cutoffs used are 145, 300, 450 and 600. Missing rates are left blank rather than counted as X-Small;

DM 'LOG;CLEAR;OUT;CLEAR';
DATA T1;
      SET T1;
      LENGTH US_SIZE GA_SIZE $ 7;
      IF MISSING(UNITED_STATES) THEN US_SIZE = ' ';
      ELSE IF UNITED_STATES < 145 THEN US_SIZE = 'X-Small';
      ELSE IF UNITED_STATES < 300 THEN US_SIZE = 'Small';
      ELSE IF UNITED_STATES < 450 THEN US_SIZE = 'Medium';
      ELSE IF UNITED_STATES < 600 THEN US_SIZE = 'Large';
      ELSE US_SIZE = 'X-Large';
      IF MISSING(GEORGIA) THEN GA_SIZE = ' ';
      ELSE IF GEORGIA < 145 THEN GA_SIZE = 'X-Small';
      ELSE IF GEORGIA < 300 THEN GA_SIZE = 'Small';
      ELSE IF GEORGIA < 450 THEN GA_SIZE = 'Medium';
      ELSE IF GEORGIA < 600 THEN GA_SIZE = 'Large';
      ELSE GA_SIZE = 'X-Large';
RUN;

PROC PRINT DATA = T1;
RUN;
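
/* Illustrative sketch, not part of the original analysis: the same size bins
   written once as a user-defined format, assuming the cutoffs in the DATA
   step above are the intended ones. The names SIZEFMT, T1_FMT, US_SIZE_F and
   GA_SIZE_F are hypothetical. LOW does not include missing values in a
   numeric format, so the MISSING checks keep missing rates blank. */

PROC FORMAT;
      VALUE SIZEFMT
            LOW -< 145 = 'X-Small'
            145 -< 300 = 'Small'
            300 -< 450 = 'Medium'
            450 -< 600 = 'Large'
            600 - HIGH = 'X-Large';
RUN;

DATA T1_FMT;
      SET T1;
      LENGTH US_SIZE_F GA_SIZE_F $ 7;
      IF NOT MISSING(UNITED_STATES) THEN US_SIZE_F = PUT(UNITED_STATES, SIZEFMT.);
      IF NOT MISSING(GEORGIA) THEN GA_SIZE_F = PUT(GEORGIA, SIZEFMT.);
RUN;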

* Contingency Tables;

DM 'LOG;CLEAR;OUT;CLEAR';

ODS RTF;
PROC FREQ DATA = T1;
      TABLES CATEGORY*US_SIZE;
RUN;
ODS RTF CLOSE;

ODS RTF;
PROC FREQ DATA = T1;
      TABLES CATEGORY*GA_SIZE;
RUN;
ODS RTF CLOSE;

* Pie Charts;

PROC GCHART DATA = W2.CDICDC;
      PIE CATEGORY;
      GOPTIONS HTEXT = 1;
LEGEND;
RUN;
QUIT;

PROC GCHART DATA = W2.CDICDC;
      PIE CATEGORY / SUMVAR = UNITED_STATES PERCENT = INSIDE;
GOPTIONS HTEXT = 1;
LEGEND;
RUN;
QUIT;

PROC GCHART DATA = W2.CDICDC;
      PIE CATEGORY / SUMVAR = GEORGIA PERCENT = INSIDE;
GOPTIONS HTEXT = 1;
LEGEND;
RUN;
QUIT;

* Bar Charts;

PROC GCHART DATA = W2.CDICDC;
      VBAR CATEGORY / TYPE = FREQ;
GOPTIONS HTEXT = 1;
LEGEND;
RUN;

PROC GCHART DATA = T1;
      VBAR US_SIZE / TYPE = FREQ;
GOPTIONS HTEXT = 1;
LEGEND;
RUN;

PROC GCHART DATA = T1;
      VBAR GA_SIZE / TYPE = FREQ;
GOPTIONS HTEXT = 1;
LEGEND;
RUN;

* Stacked Bar Charts;

PROC GCHART DATA = T1;
      VBAR CATEGORY / SUBGROUP = US_SIZE;
      GOPTIONS HTEXT = 1;
      LEGEND;
RUN;

PROC GCHART DATA = T1;
      VBAR CATEGORY / SUBGROUP = GA_SIZE;
      GOPTIONS HTEXT = 1;
      LEGEND;
RUN;

* Generate a random sort key with a fixed seed so the sample can be replicated;

DATA CDICDCN;
      SET W2.CDICDC;
      GROUP = RANUNI(123456);
RUN;

PROC PRINT DATA = CDICDCN;
RUN;

* Sort by the random key and keep only the first 20 observations;

PROC SORT DATA = CDICDCN;
      BY GROUP;
RUN;

DATA CDICDCNN;
      SET CDICDCN;
      IF _N_ <= 20;
RUN;

PROC PRINT DATA = CDICDCNN;
RUN;
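
/* Illustrative sketch, not part of the original analysis: PROC SURVEYSELECT
   (SAS/STAT) can draw a simple random sample of 20 rows in a single step.
   The data set name CDICDC_SRS is hypothetical. The rows selected will
   generally differ from the RANUNI-based sample above, even with the same
   seed value, so the reported intervals would not be reproduced exactly. */

PROC SURVEYSELECT DATA = W2.CDICDC OUT = CDICDC_SRS
      METHOD = SRS SAMPSIZE = 20 SEED = 123456;
RUN;

PROC PRINT DATA = CDICDC_SRS;
RUN;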

* Confidence Intervals on ratio scale variables;

DM 'LOG;CLEAR;OUT;CLEAR;';

ODS RTF;
PROC MEANS DATA = CDICDCNN MAXDEC=2 N CLM ALPHA = .05;
      VAR UNITED_STATES GEORGIA;
RUN;
PROC MEANS DATA = CDICDCNN MAXDEC=2 N CLM ALPHA = .01;
      VAR UNITED_STATES GEORGIA;
RUN;
ODS RTF CLOSE;
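
/* Illustrative sketch, not part of the original analysis: rebuilding the
   95% limits for UNITED_STATES by hand to show what the CLM keyword reports,
   namely mean +/- t * standard error with N - 1 degrees of freedom. The
   names CI_CHECK, XBAR, SE, T95, LOWER95 and UPPER95 are hypothetical.
   For the 99% limits, TINV(0.995, N - 1) would replace TINV(0.975, N - 1). */

PROC MEANS DATA = CDICDCNN NOPRINT;
      VAR UNITED_STATES;
      OUTPUT OUT = CI_CHECK N = N MEAN = XBAR STDERR = SE;
RUN;

DATA CI_CHECK;
      SET CI_CHECK;
      T95 = TINV(0.975, N - 1);
      LOWER95 = XBAR - T95 * SE;
      UPPER95 = XBAR + T95 * SE;
RUN;

PROC PRINT DATA = CI_CHECK;
      VAR N XBAR SE T95 LOWER95 UPPER95;
RUN;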

* Export Data to Minitab;

PROC EXPORT
      OUTFILE = 'V:\final.project\FilChrDisIndCDC.csv'
      DATA = W2.CDICDC
      REPLACE;
RUN;

PROC EXPORT
      OUTFILE = 'V:\final.project\FilChrDisIndCDCT1.csv'
      DATA = T1
      REPLACE;
RUN;
