Comparing SAS Files
Laura Schild
Consider using PROC COMPARE the
next time you need to…
• prepare to combine two data sets, so you know
what variables need to be reformatted, etc.
• evaluate newly collected data in comparison to
an existing file (e.g., CRANE Study / Healthcore
files)
• test whether program revisions had the
expected effect
• examine whether two algorithms for computing
certain variables produce comparable results
PROC COMPARE with No Options
proc compare
base = TrackingSystem_BR_201112
compare = TrackingSystem_BR_201201
;
title1 'PROC COMPARE with No Options';
run;
Comparing Contents Only
When all you really care about is the contents (variable names and attributes),
add the NOVALUES, NOSUMMARY, and LISTVAR options.
• NOVALUES suppresses the report of the value comparison results.
• NOSUMMARY suppresses the data set, variable, observation, and values
comparison summary reports.
• LISTVAR lists all variables that are found in only one data set.
• WARNING displays a warning message in the SAS log when differences are found.
proc compare novalues nosummary listvar warning
base = TrackingSystem_BR_201112
compare = TrackingSystem_BR_201201
;
title1 'PROC COMPARE with NoValues, NoSummary and ListVar Options';
run;
The Warning Option
Note: The NOVALUES and NOSUMMARY options only suppress the printed
reports; SAS still compares the values in every record. Consequently,
when you use the WARNING option, the log still shows warnings about
unequal values even if the contents (variable names and attributes)
of the two files are identical.
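If you want a program to react to the comparison rather than have someone read the log, one option is the SYSINFO automatic macro variable, which PROC COMPARE sets to a return code. The following is a minimal sketch, not part of the original slides; the macro variable name CompareRC is an assumption, and only four of the documented return-code bits are checked.

```sas
/* Sketch: test the PROC COMPARE return code instead of reading the log. */
proc compare noprint
   base    = TrackingSystem_BR_201112
   compare = TrackingSystem_BR_201201
   ;
run;

/* SYSINFO is reset by the next step, so copy it immediately. */
/* CompareRC is a hypothetical name chosen for this sketch.   */
%let CompareRC = &sysinfo;

data _null_;
   rc = &CompareRC;
   if rc = 0 then put 'Data sets are identical.';
   else do;
      /* Each bit of the return code flags one kind of difference. */
      if band(rc, 1024) then put 'Base has a variable that is not in the comparison data set.';
      if band(rc, 2048) then put 'Comparison data set has a variable that is not in base.';
      if band(rc, 4096) then put 'At least one value comparison was unequal.';
      if band(rc, 8192) then put 'At least one variable has conflicting types.';
   end;
run;
```

A return code of 0 means no differences of any kind were found; the full list of bit values is in the PROC COMPARE documentation.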
PROC COMPARE Identical Files
with Warning Option
*-------------------------------------------------------------------*;
* PROC COMPARE Identical Files with Warning Option *;
*-------------------------------------------------------------------*;
data TrackingSystem_BR_201112_Copy;
set TrackingSystem_BR_201112;
run;
options pageno=1;
proc compare warning
base = TrackingSystem_BR_201112
compare = TrackingSystem_BR_201112_Copy
;
title1 'PROC COMPARE Identical Files with Warning Option';
run;
PROC COMPARE Identical Files
with Warning Option (output)
PROC COMPARE Identical Files
Sorted Differently
*-------------------------------------------------------------------*;
* PROC COMPARE with Warning Option *;
* - Identical Files Sorted Differently *;
*-------------------------------------------------------------------*;
proc sort data=TrackingSystem_BR_201112; by MemberNumber;
run;
proc sort data=TrackingSystem_BR_201112_copy; by LastName;
run;
options pageno=1;
proc compare warning
base = TrackingSystem_BR_201112
compare = TrackingSystem_BR_201112_copy
;
title1 'PROC COMPARE with Warning Option - Identical Files Sorted Differently';
run;
PROC COMPARE Identical Files
Sorted Differently (Log)
56
57 options pageno=1;
58 proc compare warning
59 base = TrackingSystem_BR_201112
60 compare = TrackingSystem_BR_201112_copy
61 ;
62 title1 'PROC COMPARE with Warning Option - Identical Files Sorted Differently';
63 run;
WARNING: Values of the following 18 variables compare unequal: MemberNumber FirstName LastName PhoneNumber Gender BirthDate
StreetAddress StreetAddress2 City ZipCode NumberOfMedications_BR DrugName1_BR DrugName2_BR PrescriptionNumber1_BR
PrescriptionNumber2_BR TransferNumberLive Schedule
TransferNumberScript
WARNING: The data sets WORK.TRACKINGSYSTEM_BR_201112 and WORK.TRACKINGSYSTEM_BR_201112_COPY contain unequal
values.
NOTE: There were 90 observations read from the data set WORK.TRACKINGSYSTEM_BR_201112.
NOTE: There were 90 observations read from the data set WORK.TRACKINGSYSTEM_BR_201112_COPY.
NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format.
NOTE: PROCEDURE COMPARE used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
PROC COMPARE Identical Files
Sorted Differently (Output)
PROC COMPARE Identical Files
Sorted Differently (Output, page 2)
PROC COMPARE Identical Files
Sorted Differently (Output, page 19)
• For each variable, you get a list similar to the following. Full
output not included due to patient confidentiality and due to
length.
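The mismatches above arise because, without an ID statement, PROC COMPARE matches observations by position, so the first record of one file is compared with the first record of the other regardless of who it belongs to. A minimal sketch of the fix for these two files follows; it assumes that MemberNumber uniquely identifies an observation, which the slides do not state.

```sas
/* Sketch: sort both files by the same key, then match observations on it. */
/* Assumption: MemberNumber is unique within each file.                    */
proc sort data=TrackingSystem_BR_201112;      by MemberNumber; run;
proc sort data=TrackingSystem_BR_201112_copy; by MemberNumber; run;

options pageno=1;
proc compare warning
   base    = TrackingSystem_BR_201112
   compare = TrackingSystem_BR_201112_copy
   ;
   id MemberNumber;   /* match records by key rather than by position */
   title1 'PROC COMPARE by ID - Identical Files Sorted the Same Way';
run;
```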
PROC COMPARE by ID
With MaxPrint Option (code)
*-------------------------------------------------------------------------------*;
* PROC COMPARE by ID with ListObs and MaxPrint Options *;
*-------------------------------------------------------------------------------*;
libname local 'G:\Investigators\Davis\Davis_CRANE (SJS Antiepileptic)\SAS Programs\Data';
libname kpga 'G:\Investigators\Davis\Davis_CRANE (SJS Antiepileptic)\SAS Programs\Data\KPGA';
proc sort data=kpga.CRANE01v3_KPGA_Cohort_NoID; by StudyID Cohort;
proc sort data=local.CRANE01v5_KPGA_Cohort_NoID; by StudyID Cohort;
run;
options pageno=1;
PROC COMPARE ListVar ListObs MaxPrint=(10,500)
BASE = kpga.CRANE01v3_KPGA_Cohort_NoID
COMPARE = local.CRANE01v5_KPGA_Cohort_NoID
;
ID StudyID Cohort;
title1 'PROC COMPARE by ID with ListVar, ListObs, and MaxPrint Options';
RUN;
MaxPrint Option
• LISTOBS lists all observations that are found in only one
data set.
• MAXPRINT=total | (per-variable, total) specifies the
maximum number of differences to print, where
– total is the maximum total number of differences to print. The
default value is 500 unless you use the ALLOBS option (or both
the ALLVAR and TRANSPOSE options), in which case the default
is 32000.
– per-variable is the maximum number of differences to print for
each variable within a BY group. The default value is 50 unless
you use the ALLOBS option (or both the ALLVAR and TRANSPOSE
options), in which case the default is 1000.
– The MAXPRINT= option prevents the output from becoming
extremely large when data sets differ greatly.
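The two forms of MAXPRINT= described above can be sketched side by side. This reuses the data sets and ID variables from the preceding code slide; the limit of 100 is an arbitrary value chosen for illustration.

```sas
/* Single value: print at most 100 differences in total. */
proc compare listvar listobs maxprint=100
   base    = kpga.CRANE01v3_KPGA_Cohort_NoID
   compare = local.CRANE01v5_KPGA_Cohort_NoID
   ;
   id StudyID Cohort;
run;

/* Pair: print at most 10 differences per variable and 500 in total. */
proc compare listvar listobs maxprint=(10,500)
   base    = kpga.CRANE01v3_KPGA_Cohort_NoID
   compare = local.CRANE01v5_KPGA_Cohort_NoID
   ;
   id StudyID Cohort;
run;
```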
PROC COMPARE by ID
With MaxPrint Option (output)
Comparing Multiple Files
Run PROC CONTENTS
• Run a PROC CONTENTS on each file to be included in the
comparison, keeping the variable name, type, and length.
*-------------------------------------------------------------------------------*;
* Compare the variable names/formats on each of the Tracking System BR files. *;
*-------------------------------------------------------------------------------*;
proc contents data=qa.TrackingSystem_BR_201112 noprint out=TS_BR_201112 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201201 noprint out=TS_BR_201201 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201202 noprint out=TS_BR_201202 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201203 noprint out=TS_BR_201203 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201204 noprint out=TS_BR_201204 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201205 noprint out=TS_BR_201205 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201206 noprint out=TS_BR_201206 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201207 noprint out=TS_BR_201207 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201208 noprint out=TS_BR_201208 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201209 noprint out=TS_BR_201209 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201210 noprint out=TS_BR_201210 (keep=NAME TYPE
LENGTH);
proc contents data=qa.TrackingSystem_BR_201211 noprint out=TS_BR_201211 (keep=NAME TYPE
LENGTH);
run;
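The twelve PROC CONTENTS steps differ only in the month suffix, so they could also be generated by a macro loop. The following is a sketch under that assumption; the macro name GetContents and its parameter are hypothetical and do not appear in the original program.

```sas
/* Sketch: run PROC CONTENTS once per month suffix.            */
/* GetContents and its MONTHS parameter are hypothetical names. */
%macro GetContents(months);
   %local i month;
   %do i = 1 %to %sysfunc(countw(&months, %str( )));
      %let month = %scan(&months, &i, %str( ));
      proc contents data=qa.TrackingSystem_BR_&month noprint
                    out=TS_BR_&month (keep=NAME TYPE LENGTH);
      run;
   %end;
%mend GetContents;

%GetContents(201112 201201 201202 201203 201204 201205
             201206 201207 201208 201209 201210 201211);
```

Adding a new monthly file then means adding one suffix to the list rather than copying another step.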
Get All Variable Names
• Combine all of the files, keeping only the variable name, and then
eliminate duplicates.
data TS_BR_AllNames (keep=name);
set TS_BR_201112
TS_BR_201201
TS_BR_201202
TS_BR_201203
TS_BR_201204
TS_BR_201205
TS_BR_201206
TS_BR_201207
TS_BR_201208
TS_BR_201209
TS_BR_201210
TS_BR_201211
;
run;
proc sort data=TS_BR_AllNames nodupkey; by name;
run;
Combine All Contents Using SQL
proc sql;
create table TS_BR_AllContents as
select a.name
, b.type as type_201112
, c.type as type_201201
, d.type as type_201202
, e.type as type_201203
, f.type as type_201204
, g.type as type_201205
, h.type as type_201206
, i.type as type_201207
, j.type as type_201208
, k.type as type_201209
, l.type as type_201210
, m.type as type_201211
, b.length as length_201112
, c.length as length_201201
, d.length as length_201202
, e.length as length_201203
, f.length as length_201204
, g.length as length_201205
, h.length as length_201206
, i.length as length_201207
, j.length as length_201208
, k.length as length_201209
, l.length as length_201210
, m.length as length_201211
from TS_BR_AllNames as a
left join TS_BR_201112 as b on a.name = b.name
left join TS_BR_201201 as c on a.name = c.name
left join TS_BR_201202 as d on a.name = d.name
left join TS_BR_201203 as e on a.name = e.name
left join TS_BR_201204 as f on a.name = f.name
left join TS_BR_201205 as g on a.name = g.name
left join TS_BR_201206 as h on a.name = h.name
left join TS_BR_201207 as i on a.name = i.name
left join TS_BR_201208 as j on a.name = j.name
left join TS_BR_201209 as k on a.name = k.name
left join TS_BR_201210 as l on a.name = l.name
left join TS_BR_201211 as m on a.name = m.name
;
quit;
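An alternative to the twelve-way join is to stack the PROC CONTENTS output data sets and transpose them. This is a sketch of a swapped-in technique, not the author's method. It assumes SAS 9.2 or later (for the data set name prefix list and INDSNAME=), that no other WORK data set name starts with TS_BR_2, and it uses hypothetical data set names (TS_BR_Stacked, TypeWide, LengthWide, TS_BR_AllContents_Alt).

```sas
/* Sketch: build the same wide table by stacking and transposing. */
data TS_BR_Stacked;
   length month $6;
   set TS_BR_2: indsname=source;      /* all data sets named TS_BR_2...      */
   month = scan(source, -1, '_');     /* e.g. WORK.TS_BR_201112 -> 201112    */
run;

proc sort data=TS_BR_Stacked; by name month; run;

/* One column per month for the variable type ... */
proc transpose data=TS_BR_Stacked
               out=TypeWide (drop=_name_ _label_) prefix=type_;
   by name;
   id month;
   var type;
run;

/* ... and one column per month for the variable length. */
proc transpose data=TS_BR_Stacked
               out=LengthWide (drop=_name_ _label_) prefix=length_;
   by name;
   id month;
   var length;
run;

data TS_BR_AllContents_Alt;
   merge TypeWide LengthWide;
   by name;
run;
```

As with the left joins, a variable that is absent from a file ends up with a missing type and length for that month.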
Report Variables Not On All Files
proc print data=TS_BR_AllContents;
where type_201112 = .
or type_201201 = .
or type_201202 = .
or type_201203 = .
or type_201204 = .
or type_201205 = .
or type_201206 = .
or type_201207 = .
or type_201208 = .
or type_201209 = .
or type_201210 = .
or type_201211 = .
;
title3 'Variable Name Not On All Tracking System BR Files';
run;
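The chain of OR conditions can be shortened with the NMISS function and a variable-name prefix list. Because WHERE expressions do not accept OF variable lists, this sketch uses a subsetting IF in a DATA step first; the data set name TS_BR_NotOnAll is hypothetical.

```sas
/* Sketch: keep variables whose type is missing for at least one file. */
data TS_BR_NotOnAll;
   set TS_BR_AllContents;
   if nmiss(of type_:) > 0;   /* type_: covers type_201112 ... type_201211 */
run;

proc print data=TS_BR_NotOnAll;
   title3 'Variable Name Not On All Tracking System BR Files';
run;
```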
Correct Discrepancies
• If one or more files contain variables that they should not have (e.g.
HRN on the 201201 file), drop those variables.
• If one or more files are missing variables that they should have, add
them, by:
– linking to another source (if available)
– hardcoding a value
– adding dummy variables with missing values
– getting a replacement file if necessary
• If there are discrepancies in variable names, you might see both
“extra” variables and “missing” variables that need to be fixed by
renaming them.
• Make corrections and rerun the PROC CONTENTS and PROC SQL.
• You might also want to rerun the discrepancy report to make sure
you fixed the discrepancies correctly.
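Two of the corrections above can be sketched as follows. The drop uses the HRN example from the first bullet; the rename uses the hypothetical variable names OldName and NewName on an arbitrarily chosen file, and both steps assume the files are in the WORK library.

```sas
/* Sketch: drop a variable that should not be on the file. */
data TrackingSystem_BR_201201;
   set TrackingSystem_BR_201201 (drop=HRN);
run;

/* Sketch: rename a variable in place without rewriting the data. */
/* OldName and NewName are hypothetical placeholders.             */
proc datasets library=work nolist;
   modify TrackingSystem_BR_201202;
   rename OldName = NewName;
quit;
```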
Report Variable Type Discrepancies
*-------------------------------------------------------------------------------*;
* Check variable types for consistency. *;
*-------------------------------------------------------------------------------*;
proc print data=TS_BR_AllContents;
where type_201112 ne type_201211
or type_201201 ne type_201211
or type_201202 ne type_201211
or type_201203 ne type_201211
or type_201204 ne type_201211
or type_201205 ne type_201211
or type_201206 ne type_201211
or type_201207 ne type_201211
or type_201208 ne type_201211
or type_201209 ne type_201211
or type_201210 ne type_201211
;
title3 'Variable Type Discrepancy On Tracking System BR Files';
run;
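A more compact check, offered here as a sketch rather than a replacement, uses the RANGE function: the type codes are numeric, so a nonzero range means at least two files disagree. Unlike the comparison against 201211 above, RANGE ignores missing values, so a variable that is simply absent from a file is not flagged here. The data set name TS_BR_TypeMismatch is hypothetical.

```sas
/* Sketch: flag variables whose type differs between any two files. */
data TS_BR_TypeMismatch;
   set TS_BR_AllContents;
   if range(of type_:) > 0;   /* missing types are ignored by RANGE */
run;

proc print data=TS_BR_TypeMismatch;
   title3 'Variable Type Discrepancy On Tracking System BR Files';
run;
```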
Correct Type Discrepancies
Converting Character to Numeric
%macro CorrectType (file,var,tempvar);
data TrackingSystem_BR_&file (drop=&tempvar);
set TrackingSystem_BR_&file (rename=(&var=&tempvar));
&var = input(&tempvar, 32.); /* explicit character-to-numeric conversion */
run;
%mend CorrectType;
%CorrectType (201201,PrescriptionNumber2_BR,RxNum2_BR_Char);
%CorrectType (201203,PrescriptionNumber2_BR,RxNum2_BR_Char);
%CorrectType (201204,PrescriptionNumber2_BR,RxNum2_BR_Char);
• Then rerun the PROC CONTENTS and PROC SQL.
• You might also want to rerun the discrepancy reports
to make sure you fixed the type discrepancies
correctly.
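Before converting, it can be worth checking that every character value is actually numeric, because a value that cannot be converted becomes missing. The following is a minimal sketch for one file; it assumes PrescriptionNumber2_BR is still character on that file and that MemberNumber is available as an identifier. The data set name BadRxNum is hypothetical.

```sas
/* Sketch: list values that would be lost by the conversion. */
data BadRxNum;
   set TrackingSystem_BR_201201 (keep=MemberNumber PrescriptionNumber2_BR);
   /* The ?? modifier suppresses the invalid-data notes in the log. */
   if not missing(PrescriptionNumber2_BR)
      and missing(input(PrescriptionNumber2_BR, ?? 32.));
run;

proc print data=BadRxNum;
   title3 'Prescription Numbers That Cannot Be Converted to Numeric';
run;
```

If this data set is empty, the conversion should not introduce any new missing values.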
Report Variable Length Discrepancies
*-------------------------------------------------------------------------------*;
* Check variable lengths for consistency. *;
*-------------------------------------------------------------------------------*;
data TS_BR_AllContents;
set TS_BR_AllContents;
length_max = max(length_201112,length_201201,length_201202,length_201203,
length_201204,length_201205,length_201206,length_201207,
length_201208,length_201209,length_201210,length_201211);
run;
proc print data=TS_BR_AllContents;
where length_201112 ne length_201211
or length_201201 ne length_201211
or length_201202 ne length_201211
or length_201203 ne length_201211
or length_201204 ne length_201211
or length_201205 ne length_201211
or length_201206 ne length_201211
or length_201207 ne length_201211
or length_201208 ne length_201211
or length_201209 ne length_201211
or length_201210 ne length_201211
;
var name length_max length_2: type: ;
title3 'Variable Length Discrepancy On Tracking System BR Files';
run;
Combine Files Keeping Longest
Variable Lengths
*-------------------------------------------------------------------------------*;
* Combine all of the Tracking System BR files. *;
*-------------------------------------------------------------------------------*;
data TrackingSystem_BR_All;
length City $16
DrugName2_BR $21
FirstName $11
LastName $17
PRODUCTID $14
PreviouslyAsked $4
StreetAddress $30
StreetAddress2 $28
;
set TrackingSystem_BR_201112 (in=a)
TrackingSystem_BR_201201 (in=b)
TrackingSystem_BR_201202 (in=c)
TrackingSystem_BR_201203 (in=d)
TrackingSystem_BR_201204 (in=e)
TrackingSystem_BR_201205 (in=f)
TrackingSystem_BR_201206 (in=g)
TrackingSystem_BR_201207 (in=h)
TrackingSystem_BR_201208 (in=i)
TrackingSystem_BR_201209 (in=j)
TrackingSystem_BR_201210 (in=k)
TrackingSystem_BR_201211 (in=l)
;

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to SAS Data Set Options
Introduction to SAS Data Set OptionsIntroduction to SAS Data Set Options
Introduction to SAS Data Set OptionsMark Tabladillo
 
Entity Relationship Diagram presentation
Entity Relationship Diagram presentationEntity Relationship Diagram presentation
Entity Relationship Diagram presentationSopov Chan
 
Relational algebra-and-relational-calculus
Relational algebra-and-relational-calculusRelational algebra-and-relational-calculus
Relational algebra-and-relational-calculusSalman Vadsarya
 
Joins & constraints
Joins & constraintsJoins & constraints
Joins & constraintsVENNILAV6
 
CCS2019-opological time-series analysis with delay-variant embedding
CCS2019-opological time-series analysis with delay-variant embeddingCCS2019-opological time-series analysis with delay-variant embedding
CCS2019-opological time-series analysis with delay-variant embeddingHa Phuong
 
SAS Macros part 1
SAS Macros part 1SAS Macros part 1
SAS Macros part 1venkatam
 
PLSQL Developer tips and tricks
PLSQL Developer tips and tricksPLSQL Developer tips and tricks
PLSQL Developer tips and tricksPatrick Barel
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Languageguest2160992
 
Extrapolation from Progression Free Survival to Overall Survival in Oncology
Extrapolation from Progression Free Survival to Overall Survival in OncologyExtrapolation from Progression Free Survival to Overall Survival in Oncology
Extrapolation from Progression Free Survival to Overall Survival in OncologyOffice of Health Economics
 
Fundamentals of Database ppt ch03
Fundamentals of Database ppt ch03Fundamentals of Database ppt ch03
Fundamentals of Database ppt ch03Jotham Gadot
 
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]Usman Tariq
 

Was ist angesagt? (13)

Introduction to SAS Data Set Options
Introduction to SAS Data Set OptionsIntroduction to SAS Data Set Options
Introduction to SAS Data Set Options
 
Entity Relationship Diagram presentation
Entity Relationship Diagram presentationEntity Relationship Diagram presentation
Entity Relationship Diagram presentation
 
Relational algebra-and-relational-calculus
Relational algebra-and-relational-calculusRelational algebra-and-relational-calculus
Relational algebra-and-relational-calculus
 
Joins & constraints
Joins & constraintsJoins & constraints
Joins & constraints
 
Rdbms (2)
Rdbms (2)Rdbms (2)
Rdbms (2)
 
CCS2019-opological time-series analysis with delay-variant embedding
CCS2019-opological time-series analysis with delay-variant embeddingCCS2019-opological time-series analysis with delay-variant embedding
CCS2019-opological time-series analysis with delay-variant embedding
 
SAS Macros part 1
SAS Macros part 1SAS Macros part 1
SAS Macros part 1
 
1.2 sql create and drop table
1.2 sql create and drop table1.2 sql create and drop table
1.2 sql create and drop table
 
PLSQL Developer tips and tricks
PLSQL Developer tips and tricksPLSQL Developer tips and tricks
PLSQL Developer tips and tricks
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
 
Extrapolation from Progression Free Survival to Overall Survival in Oncology
Extrapolation from Progression Free Survival to Overall Survival in OncologyExtrapolation from Progression Free Survival to Overall Survival in Oncology
Extrapolation from Progression Free Survival to Overall Survival in Oncology
 
Fundamentals of Database ppt ch03
Fundamentals of Database ppt ch03Fundamentals of Database ppt ch03
Fundamentals of Database ppt ch03
 
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]
Data Models [DATABASE SYSTEMS: Design, Implementation, and Management]
 

Ähnlich wie Comparing SAS Files

Pluggable database tutorial 2
Pluggable database tutorial 2Pluggable database tutorial 2
Pluggable database tutorial 2Osama Mustafa
 
Sas-training-in-mumbai
Sas-training-in-mumbaiSas-training-in-mumbai
Sas-training-in-mumbaiUnmesh Baile
 
Bringing OpenClinica Data into SAS
Bringing OpenClinica Data into SASBringing OpenClinica Data into SAS
Bringing OpenClinica Data into SASRick Watts
 
Pluggable database tutorial
Pluggable database tutorialPluggable database tutorial
Pluggable database tutorialOsama Mustafa
 
Moving 12c database from NON-ASM to ASM
Moving 12c database from NON-ASM to ASMMoving 12c database from NON-ASM to ASM
Moving 12c database from NON-ASM to ASMMonowar Mukul
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sasAjay Ohri
 
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
Tony jambu   (obscure) tools of the trade for tuning oracle sq lsTony jambu   (obscure) tools of the trade for tuning oracle sq ls
Tony jambu (obscure) tools of the trade for tuning oracle sq lsInSync Conference
 
12c database migration from ASM storage to NON-ASM storage
12c database migration from ASM storage to NON-ASM storage12c database migration from ASM storage to NON-ASM storage
12c database migration from ASM storage to NON-ASM storageMonowar Mukul
 
br_test_lossof-datafile_10g.doc
br_test_lossof-datafile_10g.docbr_test_lossof-datafile_10g.doc
br_test_lossof-datafile_10g.docLucky Ally
 
C# and Borland StarTeam Connectivity
C# and Borland StarTeam ConnectivityC# and Borland StarTeam Connectivity
C# and Borland StarTeam ConnectivityShreesha Rao
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Kyle Hailey
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureWake Tech BAS
 
Honey I Shrunk the Database
Honey I Shrunk the DatabaseHoney I Shrunk the Database
Honey I Shrunk the DatabaseVanessa Hurst
 
Apache stratos hangout 3
Apache stratos hangout   3Apache stratos hangout   3
Apache stratos hangout 3Nirmal Fernando
 
Dataguard physical stand by setup
Dataguard physical stand by setupDataguard physical stand by setup
Dataguard physical stand by setupsmajeed1
 

Ähnlich wie Comparing SAS Files (20)

Pluggable database tutorial 2
Pluggable database tutorial 2Pluggable database tutorial 2
Pluggable database tutorial 2
 
Sas classes in mumbai
Sas classes in mumbaiSas classes in mumbai
Sas classes in mumbai
 
Sas-training-in-mumbai
Sas-training-in-mumbaiSas-training-in-mumbai
Sas-training-in-mumbai
 
Less04 Instance
Less04 InstanceLess04 Instance
Less04 Instance
 
Bringing OpenClinica Data into SAS
Bringing OpenClinica Data into SASBringing OpenClinica Data into SAS
Bringing OpenClinica Data into SAS
 
Pluggable database tutorial
Pluggable database tutorialPluggable database tutorial
Pluggable database tutorial
 
Moving 12c database from NON-ASM to ASM
Moving 12c database from NON-ASM to ASMMoving 12c database from NON-ASM to ASM
Moving 12c database from NON-ASM to ASM
 
Oracle training in hyderabad
Oracle training in hyderabadOracle training in hyderabad
Oracle training in hyderabad
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sas
 
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
Tony jambu   (obscure) tools of the trade for tuning oracle sq lsTony jambu   (obscure) tools of the trade for tuning oracle sq ls
Tony jambu (obscure) tools of the trade for tuning oracle sq ls
 
12c database migration from ASM storage to NON-ASM storage
12c database migration from ASM storage to NON-ASM storage12c database migration from ASM storage to NON-ASM storage
12c database migration from ASM storage to NON-ASM storage
 
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
 
br_test_lossof-datafile_10g.doc
br_test_lossof-datafile_10g.docbr_test_lossof-datafile_10g.doc
br_test_lossof-datafile_10g.doc
 
C# and Borland StarTeam Connectivity
C# and Borland StarTeam ConnectivityC# and Borland StarTeam Connectivity
C# and Borland StarTeam Connectivity
 
Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle Ash masters : advanced ash analytics on Oracle
Ash masters : advanced ash analytics on Oracle
 
Plsql
PlsqlPlsql
Plsql
 
BAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 LectureBAS 150 Lesson 4 Lecture
BAS 150 Lesson 4 Lecture
 
Honey I Shrunk the Database
Honey I Shrunk the DatabaseHoney I Shrunk the Database
Honey I Shrunk the Database
 
Apache stratos hangout 3
Apache stratos hangout   3Apache stratos hangout   3
Apache stratos hangout 3
 
Dataguard physical stand by setup
Dataguard physical stand by setupDataguard physical stand by setup
Dataguard physical stand by setup
 

Comparing SAS Files

  • 2. Consider using PROC COMPARE the next time you need to… • prepare to combine two data sets, so you know what variables need to be reformatted, etc. • evaluate newly collected data in comparison to an existing file (ex. CRANE Study / Healthcore files) • test whether program revisions have occurred as expected • examine whether two algorithms for computing certain variables produce comparable results
  • 3.
  • 4. PROC COMPARE with No Options proc compare base = TrackingSystem_BR_201112 compare = TrackingSystem_BR_201201 ; title1 'PROC COMPARE with No Options'; run;
  • 5. PROC COMPARE with No Options (continued)
  • 6. PROC COMPARE with No Options (continued)
  • 7. PROC COMPARE with No Options (continued)
  • 8. PROC COMPARE with No Options (continued)
  • 9. PROC COMPARE with No Options (continued)
  • 10. Comparing Contents Only When all you really care about is the contents, add the NoValues and ListVar options. • NOVALUES suppresses the report of the value comparison results. • NOSUMMARY suppresses the data set, variable, observation, and values comparison summary reports. • LISTVAR lists all variables that are found in only one data set. • WARNING displays a warning message in the SAS log when differences are found. proc compare novalues nosummary listvar warning base = TrackingSystem_BR_201112 compare = TrackingSystem_BR_201201 ; title1 'PROC COMPARE with NoValues, NoSummary and ListVar Options'; run;
  • 14. The Warning Option Note: The NoValues and NoSummary options suppress printing reports, but SAS still compares the records. Consequently, when you use the Warning option, you will still get warnings even if the contents are identical.
  • 15. PROC COMPARE Identical Files with Warning Option *-------------------------------------------------------------------*; * PROC COMPARE Identical Files with Warning Option *; *-------------------------------------------------------------------*; data TrackingSystem_BR_201112_Copy; set TrackingSystem_BR_201112; run; options pageno=1; proc compare warning base = TrackingSystem_BR_201112 compare = TrackingSystem_BR_201112_Copy ; title1 'PROC COMPARE Identical Files with Warning Option'; run;
  • 16. PROC COMPARE Identical Files with Warning Option (output)
  • 17. PROC COMPARE Identical Files Sorted Differently *-------------------------------------------------------------------*; * PROC COMPARE with Warning Option *; * - Identical Files Sorted Differently *; *-------------------------------------------------------------------*; proc sort data=TrackingSystem_BR_201112; by MemberNumber; run; proc sort data=TrackingSystem_BR_201112_copy; by LastName; run; options pageno=1; proc compare warning base = TrackingSystem_BR_201112 compare = TrackingSystem_BR_201112_copy ; title1 'PROC COMPARE with Warning Option - Identical Files Sorted Differently'; run;
  • 18. PROC COMPARE Identical Files Sorted Differently (Log) 56 57 options pageno=1; 58 proc compare warning 59 base = TrackingSystem_BR_201112 60 compare = TrackingSystem_BR_201112_copy 61 ; 62 title1 'PROC COMPARE with Warning Option - Identical Files Sorted Differently'; 63 run; WARNING: Values of the following 18 variables compare unequal: MemberNumber FirstName LastName PhoneNumber Gender BirthDate StreetAddress StreetAddress2 City ZipCode NumberOfMedications_BR DrugName1_BR DrugName2_BR PrescriptionNumber1_BR PrescriptionNumber2_BR TransferNumberLive Schedule TransferNumberScript WARNING: The data sets WORK.TRACKINGSYSTEM_BR_201112 and WORK.TRACKINGSYSTEM_BR_201112_COPY contain unequal values. NOTE: There were 90 observations read from the data set WORK.TRACKINGSYSTEM_BR_201112. NOTE: There were 90 observations read from the data set WORK.TRACKINGSYSTEM_BR_201112_COPY. NOTE: At least one W.D format was too small for the number to be printed. The decimal may be shifted by the "BEST" format. NOTE: PROCEDURE COMPARE used (Total process time): real time 0.01 seconds cpu time 0.01 seconds
  • 19. PROC COMPARE Identical Files Sorted Differently (Output)
  • 20. PROC COMPARE Identical Files Sorted Differently (Output, page 2)
  • 21. PROC COMPARE Identical Files Sorted Differently (Output, page 19) • For each variable, you get a list similar to the following. Full output not included due to patient confidentiality and due to length.
  • 22. PROC COMPARE by ID With MaxPrint Option (code) *-------------------------------------------------------------------------------*; * PROC COMPARE by ID with ListObs and MaxPrint Options *; *-------------------------------------------------------------------------------*; libname local 'G:InvestigatorsDavisDavis_CRANE (SJS Antiepileptic)SAS ProgramsData'; libname kpga 'G:InvestigatorsDavisDavis_CRANE (SJS Antiepileptic)SAS ProgramsDataKPGA'; proc sort data=kpga.CRANE01v3_KPGA_Cohort_NoID; by StudyID Cohort; proc sort data=local.CRANE01v5_KPGA_Cohort_NoID; by StudyID Cohort; run; options pageno=1; PROC COMPARE ListVar ListObs MaxPrint=(10,500) BASE = kpga.CRANE01v3_KPGA_Cohort_NoID COMPARE = local.CRANE01v5_KPGA_Cohort_NoID LISTOBS ; ID StudyID Cohort; title1 'PROC COMPARE by ID with ListVar, ListObs, and MaxPrint Options'; RUN;
  • 23. MaxPrint Option • LISTOBS lists all observations that are found in only one data set. • MAXPRINT=total | (per-variable, total) specifies the maximum number of differences to print, where – total is the maximum total number of differences to print. The default value is 500 unless you use the ALLOBS option (or both the ALLVAR and TRANSPOSE options), in which case the default is 32000. – per-variable is the maximum number of differences to print for each variable within a BY group. The default value is 50 unless you use the ALLOBS option (or both the ALLVAR and TRANSPOSE options), in which case the default is 1000. – The MAXPRINT= option prevents the output from becoming extremely large when data sets differ greatly.
  • 24. PROC COMPARE by ID With MaxPrint Option (output)
  • 25.
  • 26.
  • 27.
  • 29. Run PROC CONTENTS • Run a PROC CONTENTS on each file to be included in the comparison, keeping the variable name, type, and length. *-------------------------------------------------------------------------------*; * Compare the variable names/formats on each of the Tracking System BR files. *; *-------------------------------------------------------------------------------*; proc contents data=qa.TrackingSystem_BR_201112 noprint out=TS_BR_201112 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201201 noprint out=TS_BR_201201 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201202 noprint out=TS_BR_201202 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201203 noprint out=TS_BR_201203 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201204 noprint out=TS_BR_201204 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201205 noprint out=TS_BR_201205 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201206 noprint out=TS_BR_201206 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201207 noprint out=TS_BR_201207 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201208 noprint out=TS_BR_201208 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201209 noprint out=TS_BR_201209 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201210 noprint out=TS_BR_201210 (keep=NAME TYPE LENGTH); proc contents data=qa.TrackingSystem_BR_201211 noprint out=TS_BR_201211 (keep=NAME TYPE LENGTH); run;
  • 30. Get All Variable Names • Combine all of the files, keeping only the variable name, and then eliminate duplicates. data TS_BR_AllNames (keep=name); set TS_BR_201112 TS_BR_201201 TS_BR_201202 TS_BR_201203 TS_BR_201204 TS_BR_201205 TS_BR_201206 TS_BR_201207 TS_BR_201208 TS_BR_201209 TS_BR_201210 TS_BR_201211 ; run; proc sort data=TS_BR_AllNames nodupkey; by name; run;
  • 31. Combine All Contents Using SQL proc sql; create table TS_BR_AllContents as select a.name , b.type as type_201112 , c.type as type_201201 , d.type as type_201202 , e.type as type_201203 , f.type as type_201204 , g.type as type_201205 , h.type as type_201206 , i.type as type_201207 , j.type as type_201208 , k.type as type_201209 , l.type as type_201210 , m.type as type_201211 , b.length as length_201112 , c.length as length_201201 , d.length as length_201202 , e.length as length_201203 , f.length as length_201204 , g.length as length_201205 , h.length as length_201206
  • 32. Combine All Contents Using SQL , i.length as length_201207 , j.length as length_201208 , k.length as length_201209 , l.length as length_201210 , m.length as length_201211 from TS_BR_AllNames as a left join TS_BR_201112 as b on a.name = b.name left join TS_BR_201201 as c on a.name = c.name left join TS_BR_201202 as d on a.name = d.name left join TS_BR_201203 as e on a.name = e.name left join TS_BR_201204 as f on a.name = f.name left join TS_BR_201205 as g on a.name = g.name left join TS_BR_201206 as h on a.name = h.name left join TS_BR_201207 as i on a.name = i.name left join TS_BR_201208 as j on a.name = j.name left join TS_BR_201209 as k on a.name = k.name left join TS_BR_201210 as l on a.name = l.name left join TS_BR_201211 as m on a.name = m.name ; quit;
  • 33. Report Variables Not On All Files

    proc print data=TS_BR_AllContents;
      where type_201112 = .
         or type_201201 = .
         or type_201202 = .
         or type_201203 = .
         or type_201204 = .
         or type_201205 = .
         or type_201206 = .
         or type_201207 = .
         or type_201208 = .
         or type_201209 = .
         or type_201210 = .
         or type_201211 = .
      ;
      title3 'Variable Name Not On All Tracking System BR Files';
    run;
  • 35. Correct Discrepancies
    • If one or more files contain variables that they should not have (e.g. HRN on the 201201 file), drop those variables.
    • If one or more files are missing variables that they should have, add them, by:
      – linking to another source (if available)
      – hardcoding a value
      – adding dummy variables w/ missing values
      – getting a replacement file if necessary
    • If there are discrepancies in variable names, you might see both “extra” variables and “missing” variables that need to be fixed by renaming them.
    • Make corrections and rerun the PROC CONTENTS and PROC SQL.
    • You might also want to rerun the discrepancy report to make sure you fixed the discrepancies correctly.
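The corrections listed above might look something like the following DATA step. This is only a sketch: HRN on the 201201 file comes from the slide, but OldName, NewName, and MissingVar (and its length of $10) are hypothetical placeholders to be replaced with the variables your own discrepancy report shows.

```sas
/* Sketch only: OldName, NewName, and MissingVar are placeholder names. */
data TrackingSystem_BR_201201;
  set TrackingSystem_BR_201201
      (drop=HRN                         /* variable that should not be on the file */
       rename=(OldName=NewName));       /* fix a variable name discrepancy         */
  length MissingVar $10;                /* add a dummy variable the file lacks     */
  call missing(MissingVar);             /* with missing values                     */
run;
```

DROP= is applied before RENAME= when both appear as data set options, so the dropped variable never needs to be renamed.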
  • 36. Report Variable Type Discrepancies

    *-------------------------------------------------------------------------------*;
    * Check variable types for consistency.                                         *;
    *-------------------------------------------------------------------------------*;
    proc print data=TS_BR_AllContents;
      where type_201112 ne type_201211
         or type_201201 ne type_201211
         or type_201202 ne type_201211
         or type_201203 ne type_201211
         or type_201204 ne type_201211
         or type_201205 ne type_201211
         or type_201206 ne type_201211
         or type_201207 ne type_201211
         or type_201208 ne type_201211
         or type_201209 ne type_201211
         or type_201210 ne type_201211
      ;
      title3 'Variable Type Discrepancy On Tracking System BR Files';
    run;
  • 38. Correct Type Discrepancies
    Converting Character to Numeric

    %macro CorrectType (file,var,tempvar);
      data TrackingSystem_BR_&file (drop=&tempvar);
        set TrackingSystem_BR_&file (rename=(&var=&tempvar));
        /* Multiplying by 1 makes SAS convert the character value to numeric. */
        &var = &tempvar * 1;
      run;
    %mend CorrectType;

    %CorrectType (201201,PrescriptionNumber2_BR,RxNum2_BR_Char);
    %CorrectType (201203,PrescriptionNumber2_BR,RxNum2_BR_Char);
    %CorrectType (201204,PrescriptionNumber2_BR,RxNum2_BR_Char);

    • Then rerun the PROC CONTENTS and PROC SQL.
    • You might also want to rerun the discrepancy reports to make sure you fixed the type discrepancies correctly.
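Arithmetic on a character variable relies on automatic conversion, which writes a "Character values have been converted to numeric values" note to the log. If you would rather make the conversion explicit, one possible variant uses the INPUT function. The macro name CorrectTypeInput is hypothetical, and the BEST32. informat is an assumption that the character values are plain numbers of at most 32 characters.

```sas
/* Hypothetical variant of CorrectType with an explicit conversion. */
%macro CorrectTypeInput (file,var,tempvar);
  data TrackingSystem_BR_&file (drop=&tempvar);
    set TrackingSystem_BR_&file (rename=(&var=&tempvar));
    /* ?? suppresses invalid-data messages; unreadable values become missing. */
    &var = input(&tempvar, ?? best32.);
  run;
%mend CorrectTypeInput;

%CorrectTypeInput (201201,PrescriptionNumber2_BR,RxNum2_BR_Char);
```

Because the ?? modifier hides conversion problems, it may be worth counting missing values before and after to confirm nothing was silently lost.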
  • 39. Report Variable Length Discrepancies

    *-------------------------------------------------------------------------------*;
    * Check variable lengths for consistency.                                       *;
    *-------------------------------------------------------------------------------*;
    data TS_BR_AllContents;
      set TS_BR_AllContents;
      length_max = max(length_201112,length_201201,length_201202,length_201203,
                       length_201204,length_201205,length_201206,length_201207,
                       length_201208,length_201209,length_201210,length_201211);
    run;

    proc print data=TS_BR_AllContents;
      where length_201112 ne length_201211
         or length_201201 ne length_201211
         or length_201202 ne length_201211
         or length_201203 ne length_201211
         or length_201204 ne length_201211
         or length_201205 ne length_201211
         or length_201206 ne length_201211
         or length_201207 ne length_201211
         or length_201208 ne length_201211
         or length_201209 ne length_201211
         or length_201210 ne length_201211
      ;
      var name length_max length_2: type: ;
      title3 'Variable Length Discrepancy On Tracking System BR Files';
    run;
  • 41. Combine Files Keeping Longest Variable Lengths

    *-------------------------------------------------------------------------------*;
    * Combine all of the Tracking System BR files.                                  *;
    *-------------------------------------------------------------------------------*;
    data TrackingSystem_BR_All;
      length City            $16
             DrugName2_BR    $21
             FirstName       $11
             LastName        $17
             PRODUCTID       $14
             PreviouslyAsked $4
             StreetAddress   $30
             StreetAddress2  $28
      ;
      set TrackingSystem_BR_201112 (in=a)
          TrackingSystem_BR_201201 (in=b)
          TrackingSystem_BR_201202 (in=c)
          TrackingSystem_BR_201203 (in=d)
          TrackingSystem_BR_201204 (in=e)
          TrackingSystem_BR_201205 (in=f)
          TrackingSystem_BR_201206 (in=g)
          TrackingSystem_BR_201207 (in=h)
          TrackingSystem_BR_201208 (in=i)
          TrackingSystem_BR_201209 (in=j)
          TrackingSystem_BR_201210 (in=k)
          TrackingSystem_BR_201211 (in=l)
      ;

Editor's Notes

  1. I could add “var name type: ;” (using colon wildcard), but then the report is too wide to fit on one screen.