Proc Compare to Validate Datasets
Angelina Cecilia Casas M.S., PPD Development, Research Triangle Park, NC

ABSTRACT
Comparing two datasets is a technique for determining whether they are
equal or have discrepancies. It can be used to validate that a dataset
has been created correctly, or that the changes in a dataset are only
those that are expected, i.e. that observations were added or removed
or that corrections were made.

INTRODUCTION
Proc Compare is a procedure that compares two datasets with respect to
dataset properties, number of observations, number of variables, and
variable properties.
For a variable, you can get output about differences in:
type, length, formats, informats and label(s).
For a dataset, you can find differences in:
date of creation, date of last modification, and number of variables
and observations. You can also see the labels of the datasets, but
differences in dataset labels are not reported.
For observations, you can compare the value of each variable in each
record. You can also decide how different the values are allowed to be.

THE DATASETS THAT WILL BE COMPARED.
proc print data=demog noobs;
run;

Unique       WEIGHTKG             dob
 20120          101.6          08/19/1949
 20130           94.3          06/02/1934
 20110           85.7          09/17/1949
 20202           99.3          10/17/1931

proc contents data=demog;
run;

Data Set Name: WORK.DEMOG                 Observations: 4
Member Type:   DATA                       Variables:    3

--Alphabetic List of Variables and Attributes--

Variable  Type  Len  Pos  Format     Label
----------------------------------------------------
UNIQUE    Char  200   32             Unique Record
WEIGHTKG  Num     8    8  5.1        Weight in kg
dob       Num     8   24  MMDDYY10.  Date of Birth

proc sql;
  create table Compare as select * from demog;
quit;

AN EXAMPLE OF A CLEAN COMPARE OUTPUT

proc compare base=demog compare=compare;
run;

The COMPARE Procedure
Comparison of WORK.DEMOG with WORK.COMPARE
(Method=EXACT)

Data Set Summary

Dataset       Created        Modified       NVar  NObs
WORK.DEMOG    14JAN03:16:03  14JAN03:16:06     3     4
WORK.COMPARE  14JAN03:16:06  14JAN03:16:06     3     4

Variables Summary
Number of Variables in Common: 3.

Observation Summary

Observation   Base  Compare
First Obs        1        1
Last Obs         4        4

Number of Observations in Common: 4.
Total Number of Observations Read from WORK.DEMOG: 4.
Total Number of Observations Read from WORK.COMPARE: 4.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 4.
NOTE: No unequal values were found. All values compared are exactly equal.

The records in DEMOG and the records in COMPARE are in the same order,
so an ID statement is not needed. However, if the order of the two
datasets is not the same, records might be compared incorrectly.

WHEN THE DATASETS HAVE A DIFFERENT ORDER
proc sort data=compare;
  by DOB;
run;

proc compare base=demog (keep=unique weightkg)
             compare=compare (keep=unique weightkg);
run;

As in other SAS procedures and in the data step, dataset options can be
used to select the observations and variables that will be compared.

Dataset       Created        Modified       NVar  NObs
WORK.DEMOG    20JAN03:13:17  20JAN03:13:17     2     4
WORK.COMPARE  20JAN03:13:17  20JAN03:13:17     2     4

Vars Summary
# of Vars in Common: 2.

Observation Summary
Observation   Base  Compare

# of Obs in Common: 4.
Total # of Obs Read from WORK.DEMOG: 4.
Total # of Obs Read from WORK.COMPARE: 4.
# of Obs with Some Compared Vars Unequal: 3.
# of Obs with All Compared Vars Equal: 1.

Values Comparison Summary
# of Vars Compared with All Obs Equal: 0.
# of Vars Compared with Some Obs Unequal: 2.
Total # of Values which Compare Unequal: 6.
Maximum Difference: 15.9.

All Vars Compared have Unequal Values

Variable  Type  Len  Label          Ndif  MaxDif
Unique    CHAR  200  Unique Record     3
WEIGHTKG  NUM     8  Weight in kg      3  15.900

Value Comparison Results for Vars
_______________________________________________________
       || Unique Record
       || Base Value            Compare Value
   Obs || Unique                Unique
 _____ || ___________________+  ___________________+
       ||
     1 || 20120                 20202
     3 || 20110                 20120
     4 || 20202                 20110
_______________________________________________________
_______________________________________________________
       || Weight in kg
       ||      Base    Compare
   Obs ||  WEIGHTKG   WEIGHTKG      Diff.     % Diff
 _____ || _________  _________  _________  _________
       ||
     1 ||     101.6       99.3    -2.3000    -2.2638
     3 ||      85.7      101.6    15.9000    18.5531
     4 ||      99.3       85.7   -13.6000   -13.6959
_______________________________________________________

The values from the dataset named in the BASE= (or DATA=) option appear
in the Base column. The values from the dataset named in the COMPARE=
option appear in the Compare column.

The two datasets that are going to be compared must be in the same
order, UNLESS the order of the datasets is itself being tested.

What happens when the order of the observations in each of the datasets
is different?

proc sql;
  update compare set unique='34343' where unique='20120';
quit;

proc sort data=demog;
  by unique;
run;

proc sort data=compare;
  by unique;
run;

proc compare base=demog(keep = unique weightkg)
             compare=compare(keep = unique weightkg);
run;

_______________________________________________________
       || Unique Record
       || Base Value            Compare Value
   Obs || Unique                Unique
 _____ || ___________________+  ___________________+
       ||
     2 || 20120                 20130
     3 || 20130                 20202
     4 || 20202                 34343
_______________________________________________________

Etcetera

Even though there is only one real discrepancy in the datasets (one
value of UNIQUE), several discrepancies are reported, because
observations are paired by position. It is necessary to specify which
observations should be compared with each other.
The ID statement tells Proc Compare which observations belong together.
After ID, list the variables that identify which observation in each
dataset should be compared. This is similar to the way datasets are
merged using a BY statement, and as with BY both datasets should be
sorted by the ID variables.

proc compare base=demog compare=compare;
  id unique;
run;

# of Obs in Common: 3.
# of Obs in WORK.DEMOG but not in WORK.COMPARE: 1.
# of Obs in WORK.COMPARE but not in WORK.DEMOG: 1.
Total # of Obs Read from WORK.DEMOG: 4.
Total # of Obs Read from WORK.COMPARE: 4.
# of Obs with Some Compared Variables Unequal: 0.
# of Obs with All Compared Variables Equal: 3.
NOTE: No unequal values were found. All values compared are exactly equal.

INFORMATION ABOUT VARIABLES OR OBSERVATIONS THAT ARE DIFFERENT IN
THE DATASETS BEING COMPARED.

The output reports no unequal values, but it is easy to miss the fact
that only 3 of the 4 observations were compared. Also, you do not know
which observations were not compared unless you use the LIST or LISTALL
option.

proc compare base=demog
             compare=compare
             list;
  id unique;
run;

Observation 2 in WORK.DEMOG not found in WORK.COMPARE: Unique=20120.
Observation 4 in WORK.COMPARE not found in WORK.DEMOG: Unique=34343.

The LIST option lists the observations and variables that exist in one
dataset and are missing in the other. Other options (LISTALL, LISTBASE,
LISTBASEOBS, LISTBASEVAR, LISTCOMP, LISTCOMPOBS, LISTCOMPVAR, LISTOBS
and LISTVAR) limit this report to one or the other dataset, and to
variables and/or observations only. LISTEQUALVAR also lists the
variables, other than the ID variables, for which all values are equal.

LIMITING THE PRINTED OUTPUT
Sometimes it is not necessary to see every observation that has an
error. Once it is known that an error exists, the full list of
discrepancies might not be needed, and it might be desirable to limit
the size of the output. For that, the MAXPRINT option is very useful.

First, restore the value of the ID variable from '34343' back to
'20120'. Then modify the variable WEIGHTKG so that there is something
to report.

proc sql;
  update compare set unique='20120'
  where unique='34343';

  update compare set weightkg=int(weightkg);
quit;

proc sort data=compare;
  by unique;
run;

proc compare data=demog
             compare=compare
             maxprint=(4,2);
  id unique;
run;

In the SAS documentation the syntax is MAXPRINT=total or
MAXPRINT=(per-variable, total). With MAXPRINT=(4,2), at most 4
differences are printed for each variable and at most 2 in total, so
only 2 of the 4 WEIGHTKG differences are listed below.

Variable  Type  Len  Label         Ndif  MaxDif

WEIGHTKG  NUM     8  Weight in kg     4   0.700
_______________________________________________________
        || Weight in kg
        ||      Base    Compare
Unique  ||  WEIGHTKG   WEIGHTKG      Diff.     % Diff
_______ || _________  _________  _________  _________
        ||
20110   ||      85.7       85.0    -0.7000    -0.8168
20120   ||     101.6      101.0    -0.6000    -0.5906

NOTE: The MAXPRINT=2 printing limit has been reached for the variable
WEIGHTKG.
No more values will be printed for this comparison.

PRINTING RELATED INFORMATION TOGETHER.

Sometimes it is useful to see the discrepancies grouped by the
variables in the ID statement, so that all differences for one
observation are printed together.

proc compare data=demog
             compare=compare
             transpose;
  id unique;
run;

Unique=20110:
Variable  Base Value  Compare         Diff.     % Diff
WEIGHTKG        85.7     85.0     -0.700000  -0.816803
     dob  09/17/1949  09/14/1949  -3.000000   0.079830

Unique=20120:
Variable  Base Value  Compare         Diff.     % Diff
WEIGHTKG       101.6    101.0     -0.600000  -0.590551
     dob  08/19/1949  08/16/1949  -3.000000   0.079218

Unique=20130:
Variable  Base Value  Compare         Diff.     % Diff
WEIGHTKG        94.3     94.0     -0.300000  -0.318134
     dob  06/02/1934  05/30/1934  -3.000000   0.032106

Unique=20202:
Variable  Base Value  Compare         Diff.     % Diff
WEIGHTKG        99.3     99.0     -0.300000  -0.302115
     dob  10/17/1931  10/14/1931  -3.000000   0.029118

COMPARING VARIABLES WITH DIFFERENT NAMES.
Sometimes it is necessary to compare datasets in which the variable
names are known to be different even though the variables should have
the same values. The VAR statement names the variable in the base
dataset and the WITH statement names its counterpart in the comparison
dataset.

proc sql;
  alter table compare
  add nage num label "Numeric age";

  alter table demog
  add age num label "New Numeric age";

  update demog set
  age=int((date()-dob)/365.25);

  update compare set
  nage=int((date()-dob)/365.25);
quit;

proc compare data=demog
             compare=compare;
  id unique;
  var age;
  with nage;
run;

# of Vars in Common: 3.
# of Vars in WORK.DEMOG but not in WORK.COMPARE: 1.
# of Vars in WORK.COMPARE but not in WORK.DEMOG: 1.
# of ID Vars: 1.
# of VAR Statement Vars: 1.
# of WITH Statement Vars: 1.

Sometimes there are differences in the datasets that are not important
for the purpose of the comparison. For this scenario, the CRITERION
option sets a threshold: only differences larger than the criterion
are reported as unequal.

proc sql;
  update compare set nage=nage+0.001;
quit;

proc compare data=demog
             compare=compare
             criterion=.01;
  var age;
  with nage;
run;

The COMPARE Procedure
Comparison of WORK.DEMOG with WORK.COMPARE
(Method=RELATIVE(2.21E-12), Criterion=0.01)

Variables Summary
Number of Variables in Common: 3.
Number of Variables in WORK.DEMOG but not in WORK.COMPARE: 1.
Number of Variables in WORK.COMPARE but not in WORK.DEMOG: 1.
Number of VAR Statement Variables: 1.
Number of WITH Statement Variables: 1.

Number of Observations in Common: 4.
Total Number of Observations Read from WORK.DEMOG: 4.
Total Number of Observations Read from WORK.COMPARE: 4.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 4.

Values Comparison Summary
Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 4.
Maximum Difference Criterion Value: 0.000018868.

CREATING AN OUTPUT DATASET
Sometimes it is desirable to create an output dataset that contains
only the equalities or only the discrepancies. The following example
creates a dataset that holds the differences.
The OUTNOEQUAL option specifies that only the observations with
discrepancies are written to the dataset TOPRINT.

proc compare data=demog
             compare=compare
             outnoequal
             out=toprint;
  id unique;
run;

proc print data=toprint;
run;

Variables with Unequal Values

Variable  Type  Len  Label         Ndif  MaxDif

WEIGHTKG  NUM     8  Weight in kg     4   0.700

Value Comparison Results for Variables

_______________________________________________________
          || Weight in kg
          ||     Base    Compare
Unique    || WEIGHTKG   WEIGHTKG     Diff.     % Diff
_________ || ________   ________  ________  _________
          ||
20110     ||     85.7       85.0   -0.7000    -0.8168
20120     ||    101.6      101.0   -0.6000    -0.5906
20130     ||     94.3       94.0   -0.3000    -0.3181
20202     ||     99.3       99.0   -0.3000    -0.3021
_______________________________________________________

OUTPUT OF PROC PRINT

Obs  _TYPE_  _OBS_  Unique  WEIGHTKG  dob

1    DIF         1  20110       -0.7  E
2    DIF         2  20120       -0.6  E
3    DIF         3  20130       -0.3  E
4    DIF         4  20202       -0.3  E

It is possible to use the NOPRINT option to suppress the printed
report. In that case it is strongly recommended to also use the OUTBASE
and OUTCOMP options, so that the output dataset holds the base and
comparison values themselves and _TYPE_ tells them apart.

proc compare data=demog
             compare=compare
             outnoequal
             out=toprint
             noprint
             outcomp
             outbase;
  id unique;
run;

proc print data=toprint;
run;

Obs  _TYPE_   _OBS_  Unique  WEIGHTKG  dob
1    BASE         1  20110       85.7  09/17/1949
2    COMPARE      1  20110       85.0  09/17/1949
3    BASE         2  20120      101.6  08/19/1949
4    COMPARE      2  20120      101.0  08/19/1949
5    BASE         3  20130       94.3  06/02/1934
6    COMPARE      3  20130       94.0  06/02/1934
7    BASE         4  20202       99.3  10/17/1931
8    COMPARE      4  20202       99.0  10/17/1931

A USEFUL DATASET TO REPORT
To report only the observations and variables that have discrepancies,
Proc Transpose can be a big help. Transposing TOPRINT by the ID
variable puts the base and comparison values side by side, and a WHERE
clause then keeps only the rows in which they differ.

proc transpose data=toprint out=transp;
  by unique _obs_;
  id _type_;
run;

proc print data=transp(where=(base^=compare));
run;

Obs  Unique  _OBS_  _NAME_    _LABEL_       BASE   COMPARE
  1  20110       1  WEIGHTKG  Weight in kg   85.7       85
  3  20120       2  WEIGHTKG  Weight in kg  101.6      101
  5  20130       3  WEIGHTKG  Weight in kg   94.3       94
  7  20202       4  WEIGHTKG  Weight in kg   99.3       99

CONCLUSION
You are the person who decides what is important to validate: the
values of the observations and how different they are allowed to be,
or whether a variable must have the same labels, formats and informats
in both datasets. Proc Compare is flexible enough to let you find the
differences that are important to you.

ACKNOWLEDGMENTS
I want to thank Craig Mauldin, Andy Barnett, George Clark, Bonnie
Duncan and Kim Sturgen for their comments and support.

CONTACT INFORMATION
Your comments and questions are valued and encouraged.
Contact the author at:
        A. Cecilia Casas
        PPD Development
        3900 Paramount Parkway
        Morrisville, NC 27560
        Phone: (919) 462-4199
        Fax: (919) 379-6151
        Email: Cecilia.Casas@rtp.ppdi.com

SAS and all other SAS Institute Inc. product or service names are
registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective
companies.
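
For readers who want to rerun the examples, the following is a minimal sketch of the workflow described in this paper, not the author's original program. It assumes the DEMOG values shown in the first Proc Print listing. The DATA steps that build DEMOG and COMPARE are introduced here for illustration and stand in for the Proc SQL updates, and both datasets are sorted by UNIQUE on the assumption that the ID statement needs them in ID order.

```sas
/* Rebuild DEMOG from the values in the PROC PRINT listing (assumed input). */
data demog;
  length unique $200;
  input unique $ weightkg dob :mmddyy10.;
  format weightkg 5.1 dob mmddyy10.;
  label unique   = 'Unique Record'
        weightkg = 'Weight in kg'
        dob      = 'Date of Birth';
  datalines;
20120 101.6 08/19/1949
20130 94.3 06/02/1934
20110 85.7 09/17/1949
20202 99.3 10/17/1931
;
run;

/* COMPARE starts as a copy; truncating WEIGHTKG creates known differences. */
data compare;
  set demog;
  weightkg = int(weightkg);
run;

/* Put both datasets in ID order before using the ID statement. */
proc sort data=demog;   by unique; run;
proc sort data=compare; by unique; run;

/* Write only the differing observations, as BASE and COMPARE rows. */
proc compare base=demog compare=compare
             out=toprint outnoequal outbase outcomp noprint;
  id unique;
run;

/* One row per ID value and variable, BASE and COMPARE side by side. */
proc transpose data=toprint out=transp;
  by unique _obs_;
  id _type_;
run;

/* Keep only the variables whose values actually differ. */
proc print data=transp(where=(base ^= compare));
run;
```

Under these assumptions the final listing should resemble the transposed report in the last section, with one WEIGHTKG row per value of UNIQUE and no rows for dob.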
Weitere ähnliche Inhalte

Ähnlich wie Proc Compare to validate datasets

Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Brian Brazil
 
Bootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph EmbeddingBootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph EmbeddingNanjing University
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsKhalid Belhajjame
 
Efficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkIOSR Journals
 
Introductory Exercise: Establish extent of precarious employment in EU countr...
Introductory Exercise: Establish extent of precarious employment in EU countr...Introductory Exercise: Establish extent of precarious employment in EU countr...
Introductory Exercise: Establish extent of precarious employment in EU countr...Arhiv družboslovnih podatkov
 
Big Data Processing using a AWS Dataset
Big Data Processing using a AWS DatasetBig Data Processing using a AWS Dataset
Big Data Processing using a AWS DatasetVishva Abeyrathne
 
How to Speed Up Your Validation Process Without Really Trying?
How to Speed Up Your Validation Process Without Really Trying?How to Speed Up Your Validation Process Without Really Trying?
How to Speed Up Your Validation Process Without Really Trying?alice_m_cheng
 
Sql Interview Questions
Sql Interview QuestionsSql Interview Questions
Sql Interview Questionsarjundwh
 
153680 sqlinterview
153680  sqlinterview153680  sqlinterview
153680 sqlinterviewzdsgsgdf
 
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdf
[ACNA2022] Hadoop Vectored IO_ your data just got faster!.pdfMukundThakur22
 
Record matching
Record matchingRecord matching
Record matchingNishna Ma
 
BI Portfolio
BI PortfolioBI Portfolio
BI Portfoliopstaerker
 
.NET Core, ASP.NET Core Course, Session 15
.NET Core, ASP.NET Core Course, Session 15.NET Core, ASP.NET Core Course, Session 15
.NET Core, ASP.NET Core Course, Session 15aminmesbahi
 
Set and Merge
Set and MergeSet and Merge
Set and Mergevenkatam
 
Database Change Management | Change Manager 5.1 Beta
Database Change Management | Change Manager 5.1 BetaDatabase Change Management | Change Manager 5.1 Beta
Database Change Management | Change Manager 5.1 BetaMichael Findling
 
VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM...
VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM...VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM...
VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM...AIRCC Publishing Corporation
 
Variations in Outcome for the Same Map Reduce Transitive Closure Algorithm Im...
Variations in Outcome for the Same Map Reduce Transitive Closure Algorithm Im...Variations in Outcome for the Same Map Reduce Transitive Closure Algorithm Im...
Variations in Outcome for the Same Map Reduce Transitive Closure Algorithm Im...AIRCC Publishing Corporation
 

Ähnlich wie Proc Compare to validate datasets (20)

Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)Life of a Label (PromCon2016, Berlin)
Life of a Label (PromCon2016, Berlin)
 
Bootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph EmbeddingBootstrapping Entity Alignment with Knowledge Graph Embedding
Bootstrapping Entity Alignment with Knowledge Graph Embedding
 
Detecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow ResultsDetecting Duplicate Records in Scientific Workflow Results
Detecting Duplicate Records in Scientific Workflow Results
 
Efficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl FrameworkEfficient Record De-Duplication Identifying Using Febrl Framework
Efficient Record De-Duplication Identifying Using Febrl Framework
 
Proc Compare to Validate Datasets
Angelina Cecilia Casas M.S., PPD Development, Research Triangle Park, NC

ABSTRACT
Comparing two datasets is a technique for determining whether they are equal or have discrepancies. This method can be used to validate that a dataset has been created correctly, or that the changes in a dataset are only those that are expected, i.e. observations were added or removed, or corrections were made.

INTRODUCTION
Proc Compare is a procedure that compares two datasets: their attributes, their number of observations and variables, and the values they contain.
For a variable, you can get output about differences in type, length, formats, informats and labels.
For a dataset, you can find differences in the date of creation, the date of last modification, and the number of variables and observations. You can also see the labels of the datasets, but differences in them are not reported.
For observations, you can compare the value of each variable in each record. You can also decide how different the values are allowed to be.

THE DATASETS THAT WILL BE COMPARED

proc print data=demog noobs;
run;

Unique    WEIGHTKG    dob
20120     101.6       08/19/1949
20130      94.3       06/02/1934
20110      85.7       09/17/1949
20202      99.3       10/17/1931

proc contents data=demog;
run;

Data Set Name: WORK.DEMOG     Observations: 4
Member Type:   DATA           Variables:    3

--Alphabetic List of Variables and Attributes--
Variable    Type    Len    Pos    Format       Label
-----------------------------------------------------------
UNIQUE      Char    200     32                 Unique Record
WEIGHTKG    Num       8      8    5.1          Weight in kg
dob         Num       8     24    MMDDYY10.    Date of Birth

proc sql;
create table Compare as select * from demog;
quit;

AN EXAMPLE OF A CLEAN COMPARE OUTPUT

proc compare base=demog compare=compare;
run;

The COMPARE Procedure
Comparison of WORK.DEMOG with WORK.COMPARE
(Method=EXACT)

Data Set Summary
Dataset         Created          Modified         Nvar    Nobs
WORK.DEMOG      14JAN03:16:03    14JAN03:16:06    3       4
WORK.COMPARE    14JAN03:16:06    14JAN03:16:06    3       4

Variables Summary
Number of Variables in Common: 3.

Observation Summary
Observation    Base    Compare
First Obs      1       1
Last Obs       4       4

Number of Observations in Common: 4.
Total Number of Observations Read from WORK.DEMOG: 4.
Total Number of Observations Read from WORK.COMPARE: 4.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 4.
NOTE: No unequal values were found. All values compared are exactly equal.

The records in DEMOG and the records in COMPARE have the same order, so an ID statement is not needed. However, if the order of the two datasets is not the same, records might be compared incorrectly.

WHEN THE DATASETS HAVE A DIFFERENT ORDER

proc sort data=compare;
by DOB;
run;
proc compare base=demog (keep=unique weightkg)
compare=compare (keep=unique weightkg);
run;

As in other SAS procedures and the data step, it is possible to select the observations and variables of the datasets that are going to be used.

Dataset         Created          Modified         NVar    Nobs
WORK.DEMOG      20JAN03:13:17    20JAN03:13:17    2       4
WORK.COMPARE    20JAN03:13:17    20JAN03:13:17    2       4

Vars Summary
# of Vars in Common: 2.

Observation Summary
Observation    Base    Compare
# of Obs in Common: 4.
Total # of Obs Read from WORK.DEMOG: 4.
Total # of Obs Read from WORK.COMPARE: 4.
# of Obs with Some Compared Vars Unequal: 3.
# of Obs with All Compared Vars Equal: 1.

Values Comparison Summary
# of Vars Compared with All Obs Equal: 0.
# of Vars Compared with Some Obs Unequal: 2.
Total # of Values which Compare Unequal: 6.
Maximum Difference: 15.9.

All Vars Compared have Unequal Values
Variable    Type    Len    Label            Ndif    MaxDif
Unique      CHAR    200    Unique Record    3
WEIGHTKG    NUM       8    Weight in kg     3       15.900

Value Comparison Results for Vars
_______________________________________________________
       ||  Unique Record
       ||  Base Value    Compare Value
 Obs   ||  Unique        Unique
 _____ ||  ___________+  ___________+
       ||
   1   ||  20120         20202
   3   ||  20110         20120
   4   ||  20202         20110
_______________________________________________________
_______________________________________________________
       ||  Weight in kg
       ||  Base        Compare
 Obs   ||  WEIGHTKG    WEIGHTKG    Diff.       % Diff
 _____ ||  _________   _________   _________   _________
       ||
   1   ||  101.6        99.3        -2.3000     -2.2638
   3   ||   85.7       101.6        15.9000     18.5531
   4   ||   99.3        85.7       -13.6000    -13.6959
_______________________________________________________

Values from the dataset named in the BASE= (or DATA=) option appear in the Base column. Values from the dataset named in the COMPARE= option appear in the Compare column. The two datasets being compared must have the same order UNLESS the order of the datasets is itself what is being tested.

What happens when the order of the observations in each of the datasets is different?

proc sql;
update compare set unique='34343' where unique='20120';
quit;
proc sort data=compare;
by unique;
run;
proc compare base=demog(keep = unique weightkg)
compare=compare(keep = unique weightkg);
run;

_______________________________________________________
       ||  Unique Record
       ||  Base Value    Compare Value
 Obs   ||  Unique        Unique
 _____ ||  ___________+  ___________+
       ||
   2   ||  20120         20130
   3   ||  20130         20202
   4   ||  20202         34343
_______________________________________________________
Etcetera

Even though there is only one discrepancy (UNIQUE) in the datasets, several discrepancies are reported. It is necessary to specify which observations should be compared with each other.
The statement that tells Proc Compare which observations belong together is ID. After it, list the variables that identify which observation in each of the datasets should be compared. This is similar to the way datasets are merged using a BY statement, and as with a BY statement, both datasets should be sorted by the ID variables.

proc compare base=demog compare=compare;
id unique;
run;

# of Obs in Common: 3.
# of Obs in WORK.DEMOG but not in WORK.COMPARE: 1.
# of Obs in WORK.COMPARE but not in WORK.DEMOG: 1.
Total # of Obs Read from WORK.DEMOG: 4.
Total # of Obs Read from WORK.COMPARE: 4.
# of Obs with Some Compared Variables Unequal: 0.
# of Obs with All Compared Variables Equal: 3.
NOTE: No unequal values were found. All values compared are exactly equal.

INFORMATION ABOUT VARIABLES OR OBSERVATIONS THAT ARE DIFFERENT IN THE DATASETS BEING COMPARED
The output reports no differences, but it is easy to miss the fact that only 3 of the 4 observations were compared. Also, you do not know which observations were not compared unless you use the LIST or LISTALL option.

proc compare data=demog compare=compare list;
id unique;
run;

Observation 2 in WORK.DEMOG not found in WORK.COMPARE: Unique=20120.
Observation 4 in WORK.COMPARE not found in WORK.DEMOG: Unique=34343.

The LIST option lists the observations and variables that exist in one dataset and are missing in the other.
There are other options, LISTALL, LISTBASE, LISTBASEOBS, LISTBASEVAR, LISTCOMP, LISTCOMPOBS, LISTCOMPVAR, LISTOBS and LISTVAR, that limit this report to one or the other dataset and to only variables and/or observations. LISTEQUALVAR also lists the variables, other than those in the ID variable list, for which all values are equal.

LIMITING THE PRINTED OUTPUT
Sometimes it is not necessary to see every observation that has an error. Once it is known that an error exists, it might not be necessary to know all the discrepancies.
It might be desirable to limit the size of the output. For that, the option MAXPRINT is very useful.
First it is necessary to restore the original dataset and then modify the variable WEIGHTKG to have something to report.
For example, let's change the value of the variable in the ID statement from '34343' back to '20120'.

proc sql;
update compare set unique='20120'
where unique='34343';
update compare set weightkg=int(weightkg);
quit;
proc sort data=compare;
by unique;
run;
proc compare data=demog
compare=compare maxprint=(2,4);
id unique;
run;

At the most, 4 differences will be printed, 2 for every variable that
has discrepancies.

Variable    Type    Len    Label           Ndif    MaxDif
WEIGHTKG    NUM       8    Weight in kg    4       0.700
_______________________________________________________
         ||  Weight in kg
         ||  Base        Compare
 Unique  ||  WEIGHTKG    WEIGHTKG    Diff.      % Diff
 _______ ||  _________   _________   _________  _________
         ||
 20110   ||   85.7        85.0       -0.7000    -0.8168
 20120   ||  101.6       101.0       -0.6000    -0.5906

NOTE: The MAXPRINT=2 printing limit has been reached for the variable WEIGHTKG. No more values will be printed for this comparison.

PRINTING RELATED INFORMATION TOGETHER
Sometimes it is desirable to see the discrepancies grouped together by the variables in the ID statement.

proc compare data=demog
compare=compare
transpose
;
id unique;
run;

Unique=20110:
Variable    Base Value    Compare       Diff.        % Diff
WEIGHTKG    85.7          85.0          -0.700000    -0.816803
dob         09/17/1949    09/14/1949    -3.000000     0.079830

Unique=20120:
Variable    Base Value    Compare       Diff.        % Diff
WEIGHTKG    101.6         101.0         -0.600000    -0.590551
dob         08/19/1949    08/16/1949    -3.000000     0.079218

Unique=20130:
Variable    Base Value    Compare       Diff.        % Diff
WEIGHTKG    94.3          94.0          -0.300000    -0.318134
dob         06/02/1934    05/30/1934    -3.000000     0.032106

Unique=20202:
Variable    Base Value    Compare       Diff.        % Diff
WEIGHTKG    99.3          99.0          -0.300000    -0.302115
dob         10/17/1931    10/14/1931    -3.000000     0.029118

COMPARING VARIABLES WITH DIFFERENT NAMES
Sometimes it is necessary to compare datasets when it is known that the variable names are different yet they should have the same values.

Proc sql;
alter table compare
add nage num label "Numeric age";
alter table demog
add age num label "New Numeric age";
update demog set
age=int((date()-dob)/365.25);
update compare set
nage=int((date()-dob)/365.25);
quit;
proc compare data=demog
compare=compare;
id unique;
var age;
with nage;
run;

# of Vars in Common: 3.
# of Vars in WORK.DEMOG but not in WORK.COMPARE: 1.
# of Vars in WORK.COMPARE but not in WORK.DEMOG: 1.
# of ID Vars: 1.
# of VAR Statement Vars: 1.
# of WITH Statement Vars: 1.

Sometimes there are differences in the datasets that are not important for the purpose of the comparison. For this scenario, the CRITERION= option sets a threshold: only differences larger than this value are reported as unequal.

proc sql;
update compare set nage=nage+0.001;
quit;
proc compare data=demog compare=compare
criterion=.01;
var age;
with nage;
run;

The COMPARE Procedure
Comparison of WORK.DEMOG with WORK.COMPARE
(Method=RELATIVE(2.21E-12), Criterion=0.01)

Variables Summary
Number of Variables in Common: 3.
Number of Variables in WORK.DEMOG but not in WORK.COMPARE: 1.
Number of Variables in WORK.COMPARE but not in WORK.DEMOG: 1.
Number of VAR Statement Variables: 1.
Number of WITH Statement Variables: 1.

Number of Observations in Common: 4.
Total Number of Observations Read from WORK.DEMOG: 4.
Total Number of Observations Read from WORK.COMPARE: 4.
Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 4.

Values Comparison Summary
Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 0.
Total Number of Values which Compare Unequal: 0.
Total Number of Values not EXACTLY Equal: 4.
Maximum Difference Criterion Value: 0.000018868.

CREATING AN OUTPUT DATASET
Sometimes it is desirable to create an output dataset that contains only the equalities or only the discrepancies. Here is an example that creates a dataset holding the differences. The OUTNOEQUAL option specifies that only the observations with discrepancies will be written to the dataset TOPRINT.

proc compare data=demog compare=compare
outnoequal
out=toprint
;
id unique;
run;
Proc Print data=toprint;
run;

Variables with Unequal Values
Variable    Type    Len    Label           Ndif    MaxDif
WEIGHTKG    NUM       8    Weight in kg    4       0.700

Value Comparison Results for Variables
_______________________________________________________
         ||  Weight in kg
         ||  Base        Compare
 Unique  ||  WEIGHTKG    WEIGHTKG    Diff.      % Diff
 _______ ||  ________    ________    ________   _________
         ||
 20110   ||   85.7        85.0       -0.7000    -0.8168
 20120   ||  101.6       101.0       -0.6000    -0.5906
 20130   ||   94.3        94.0       -0.3000    -0.3181
 20202   ||   99.3        99.0       -0.3000    -0.3021
_______________________________________________________

OUTPUT OF PROC PRINT
Obs    _TYPE_    _OBS_    Unique    WEIGHTKG    dob
1      DIF       1        20110     -0.7        E
2      DIF       2        20120     -0.6        E
3      DIF       3        20130     -0.3        E
4      DIF       4        20202     -0.3        E

It is possible to use the NOPRINT option, but then it is strongly recommended to also use the options OUTCOMP and OUTBASE so that the base and comparison records can be told apart.

proc compare data=demog compare=compare
outnoequal
out=toprint noprint
outcomp outbase;
id unique;
run;
proc print data=toprint;
run;

Obs    _TYPE_     _OBS_    Unique    WEIGHTKG    dob
1      BASE       1        20110      85.7       09/17/1949
2      COMPARE    1        20110      85.0       09/17/1949
3      BASE       2        20120     101.6       08/19/1949
4      COMPARE    2        20120     101.0       08/19/1949
5      BASE       3        20130      94.3       06/02/1934
6      COMPARE    3        20130      94.0       06/02/1934
7      BASE       4        20202      99.3       10/17/1931
8      COMPARE    4        20202      99.0       10/17/1931

A USEFUL DATASET TO REPORT
To get only the observations and variables that have discrepancies, Proc Transpose can be a big help in reporting the findings.

proc transpose data=toprint out=transp;
by unique _obs_;
id _type_;
run;
proc print data=transp(where=(base^=compare));
run;

Obs    Unique    _OBS_    _NAME_      _LABEL_         BASE     COMPARE
1      20110     1        WEIGHTKG    Weight in kg     85.7     85
3      20120     2        WEIGHTKG    Weight in kg    101.6    101
5      20130     3        WEIGHTKG    Weight in kg     94.3     94
7      20202     4        WEIGHTKG    Weight in kg     99.3     99

CONCLUSION
You are the person who decides what is important to validate: the values of the observations and how different they may be, and whether a variable must have the same labels, formats and informats. Proc Compare is flexible enough to let you find the differences that are important to you.

ACKNOWLEDGMENTS
I want to thank Craig Mauldin, Andy Barnett, George Clark, Bonnie Duncan and Kim Sturgen for their comments and support.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
A. Cecilia Casas
PPD Development
3900 Paramount Parkway
Morrisville, NC 27560
Phone: (919) 462-4199
Fax: (919) 379-6151
Email: Cecilia.Casas@rtp.ppdi.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.
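
SUPPLEMENTARY SKETCH: CHECKING THE RESULT IN CODE
The paper reads the comparison results from the printed output. For a validation program it can also help to test the result in code. Proc Compare stores a return code in the automatic macro variable SYSINFO, where 0 means no differences were found and each kind of difference sets its own bit. The following is a minimal sketch, not part of the original paper. It assumes the DEMOG and COMPARE datasets from the examples above, and the macro variable name cmp_rc is a hypothetical name chosen for illustration.

```sas
/* Sketch only: assumes the DEMOG and COMPARE datasets used in this paper. */
/* The macro variable name cmp_rc is hypothetical.                         */

/* The ID statement expects both datasets sorted by the ID variable. */
proc sort data=demog;   by unique; run;
proc sort data=compare; by unique; run;

proc compare base=demog compare=compare noprint;
   id unique;
run;

/* Copy SYSINFO right away: a later step can overwrite it. */
%let cmp_rc = &sysinfo;

data _null_;
   if &cmp_rc = 0 then put 'NOTE: No differences were found.';
   else do;
      put "WARNING: PROC COMPARE return code is &cmp_rc..";
      /* Test individual bits of the return code with BAND. */
      if band(&cmp_rc, 64)   then put 'WARNING: Base has an observation not in Compare.';
      if band(&cmp_rc, 128)  then put 'WARNING: Compare has an observation not in Base.';
      if band(&cmp_rc, 4096) then put 'WARNING: At least one value comparison is unequal.';
   end;
run;
```

Only three of the documented bits are tested here. Which ones matter depends on what you decide is important to validate, as the conclusion notes.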