This presentation was given to introduce users to SPSS 14's new programmability features. It discusses five new power features in SPSS 14, emphasizing programmability, and contains a number of annotated examples of using programmability (along with other power features).
1. Programmability in SPSS 14:
A Radical Increase in Power
A Platform for Statistical Applications
Jon K. Peck
Technical Advisor
SPSS Inc.
peck@spss.com
May, 2006
Copyright (c) SPSS Inc, 2006
2. The Five Big Things
1. External Programming Language (BEGIN PROGRAM)
2. Multiple Datasets
3. XML Workspace and OMS Enhancements
4. Dataset and Variable Attributes
5. Drive SPSS Processor Externally
Working together, they dramatically increase the power of SPSS.
SPSS becomes a platform that enables you to build statistical/data manipulation applications.
GPL provides new programming power for graphics.
3. Multiple Datasets
Many datasets open at once
One is active at a time (set by syntax or UI)
DATASET ACTIVATE command
Each dataset has a Data Editor window
Copy, paste, and merge between windows
Write tabular results to a dataset using Output Management System
Retrieve via Programmability
No longer necessary to organize jobs linearly
4. XML Workspace
Store dictionary and selected results in workspace
Write results to workspace as XML with Output Management System (OMS)
Retrieve selected contents from workspace via external programming language
Persists for entire session
5. OMS Output: XML or Dataset
Write tabular results to Datasets with OMS
Main dataset remains active
Prior to SPSS 14, write to SAV file, close active, and open to use results
Tables can be accessed via workspace or as datasets
XML workspace and XPath accessors are very general
Accessed via programmability functions
Dataset output more familiar to SPSS users
Accessed via programmability functions or traditional SPSS syntax
Use with DATASET ACTIVATE command
6. Attributes
Extended metadata for files and variables
VARIABLE ATTRIBUTE, DATAFILE ATTRIBUTE
Keep facts and notes about data permanently with the data, e.g., validation rules, source, usage, question text, formula
Two kinds: User defined and SPSS defined
Saved with the data in the SAV file
Can be used in program logic
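As a rough illustration of attributes being written from program logic, the sketch below builds a VARIABLE ATTRIBUTE command as a string. The helper name variable_attribute_command and the attribute names are hypothetical and not part of the presentation; inside BEGIN PROGRAM the resulting text would be handed to spss.Submit.

```python
def variable_attribute_command(varname, attributes):
    """Build a VARIABLE ATTRIBUTE command string (hypothetical helper).

    attributes maps attribute names to text values. Single quotes in a
    value are doubled so the value stays a valid SPSS string literal.
    """
    parts = []
    for name in sorted(attributes):
        value = str(attributes[name]).replace("'", "''")
        parts.append("%s('%s')" % (name, value))
    return "VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=%s." % (
        varname, " ".join(parts))


# Illustrative attribute names and values (assumptions, not from the slides).
cmd = variable_attribute_command(
    "salary",
    {"Source": "HR system",
     "QuestionText": "What is the employee's salary?"})
print(cmd)
```

Sorting the attribute names keeps the generated syntax stable from run to run, which makes the job log easier to compare.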
7. Programmability
Integrates external programming language into SPSS syntax
BEGIN PROGRAM … END PROGRAM
Set of functions to communicate with SPSS
SPSS has integrated the Python language
SDK enabling other languages available
New: VB.NET available soon
External processes can drive SPSS Processor
VB.NET works only in this mode
SPSS Developer Central has SDK, Python Integration Plug-In, and many extension modules
Available for all SPSS 14 platforms
8. The Python Language
Free, portable, elegant, object oriented, versatile, widely supported, easy to learn, …
Download from Python.org.
Version 2.4.1 or later required
Python tutorial
Python user discussion list
The Cheeseshop: Third-party modules
9. Legal Notice
SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.
10. Programmability Enables…
Generalized jobs by controlling logic based on:
Variable Dictionary
Procedure output (XML or datasets)
Case data (requires SPSS 14.0.1)
Environment
Enhanced data management
Manipulation of output
Computations not built in to SPSS
Use of intelligent Python IDE driving SPSS (14.0.1)
statement completion, syntax checking, and debugging
External Control of SPSS Processor
11. Programmability Makes Obsolete…
SPSS Macro
except as a shorthand for lists or constants
Learning Python is much easier than learning Macro
SaxBasic
except for autoscripts
but autoscripts become less important
These have not gone away.
The SPSS transformation language continues to be important.
13. Initialization for Examples
* SPSS Directions, May 2006.
* In preparation for the examples, specify where SPSS
standard data files reside.
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.
This program creates a file handle pointing to the SPSS installation directory, where the sample files are installed.
14. * EXAMPLE 0: My first program.
BEGIN PROGRAM.
import spss
print "Hello, world!"
END PROGRAM.
Inside BEGIN PROGRAM, you write Python code.
import spss connects program to SPSS.
Import needed once per session.
Output goes to Viewer log items.
Executed when END PROGRAM reached.
Example 0: Hello, world
15. *Run an SPSS command from a program; create file handle.
BEGIN PROGRAM.
import spss, spssaux
spss.Submit("SHOW ALL.")
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.
Submit, in the spss module, is called to run one or more SPSS commands within BEGIN PROGRAM.
It is one of many functions (APIs) that interact with SPSS.
GetSPSSInstallDir, in the spssaux module, creates a FILE HANDLE to that directory.
Example 1: Run SPSS Command
16. * Print useful information in the Viewer and then get help
on an API.
BEGIN PROGRAM.
spss.Submit("GET FILE='SPSSDIR/employee data.sav'.")
varcount = spss.GetVariableCount()
casecount = spss.GetCaseCount()
print "The number of variables is " + str(varcount) + \
    " and the number of cases is " + str(casecount)
help(spss.GetVariableCount)  # help() prints the documentation itself
END PROGRAM.
There are APIs in the spss module to get variable dictionary information.
The help function prints short API documentation in the Viewer.
Example 2: Some APIs
17. Example 3a: Data-Directed Analysis
* Summarize variables according to measurement level.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# make variable dictionaries by measurement level
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
scaleVars = spssaux.VariableDict(variableLevel=['scale'])
print "Categorical Variables\n"
for var in catVars:
    print var, var.VariableName, "\t", var.VariableLabel
Continued
18. # summarize variables based on measurement level
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
if scaleVars:
    spss.Submit("DESC " + " ".join(scaleVars.variables))
# create a macro listing scale variables
spss.SetMacroValue("!scaleVars", " ".join(scaleVars.variables))
END PROGRAM.
DESC !scaleVars.
" ".join(['x', 'y', 'z']) produces 'x y z'
Example 3a (continued)
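The command-building step in Example 3a can be tried without SPSS. The sketch below is a stand-alone approximation: a plain dictionary of measurement levels stands in for spssaux.VariableDict, and the function name summary_commands and the sample levels are assumptions made for illustration only.

```python
def summary_commands(levels):
    """Choose a procedure per measurement level (hypothetical helper).

    levels maps variable name -> 'nominal', 'ordinal', or 'scale'.
    Returns the SPSS commands that would be passed to spss.Submit.
    """
    cat_vars = sorted(v for v in levels if levels[v] in ("nominal", "ordinal"))
    scale_vars = sorted(v for v in levels if levels[v] == "scale")
    commands = []
    if cat_vars:
        # " ".join turns the list of names into a blank-separated variable list
        commands.append("FREQUENCIES " + " ".join(cat_vars) + ".")
    if scale_vars:
        commands.append("DESCRIPTIVES " + " ".join(scale_vars) + ".")
    return commands


# Made-up measurement levels, standing in for a real variable dictionary.
levels = {"gender": "nominal", "jobcat": "ordinal",
          "salary": "scale", "jobtime": "scale"}
commands = summary_commands(levels)
for command in commands:
    print(command)
```

Because an empty list is false in Python, a file with no scale variables simply produces no DESCRIPTIVES command rather than a syntax error.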
19. * Handle an error. Use another standard Python module.
BEGIN PROGRAM.
import sys
try:
    spss.Submit("foo.")
except:
    print "That command did not work! ", sys.exc_info()[0]
END PROGRAM.
Errors generate exceptions
Makes it easy to check whether a long syntax job worked
Hundreds of standard modules and many others available from SPSS and third parties
Example 5: Handling Errors
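To show how exceptions make it easy to check whether a long job worked, here is a stand-alone sketch. FakeSpssError and fake_submit are hypothetical stand-ins for the exception and for spss.Submit, so the pattern can be run without SPSS; the real module's exception type is not shown in the slides.

```python
import sys


class FakeSpssError(Exception):
    """Stand-in for the exception a failed SPSS command would raise."""


def fake_submit(command):
    """Hypothetical stand-in for spss.Submit: accepts only known commands."""
    known = ("FREQUENCIES", "DESCRIPTIVES")
    if command.split()[0].upper() not in known:
        raise FakeSpssError("Unknown command: " + command)


def run_job(commands):
    """Run commands in order; stop at the first failure.

    Returns (commands completed, error message or None).
    """
    done = []
    for command in commands:
        try:
            fake_submit(command)
        except FakeSpssError:
            # sys.exc_info()[0] is the exception class, as in the slide
            return done, "%s: %s" % (sys.exc_info()[0].__name__, command)
        done.append(command)
    return done, None


done, error = run_job(["DESCRIPTIVES salary.", "foo.", "FREQUENCIES educ."])
print(done)
print(error)
```

Stopping at the first failure is one design choice; a job could equally collect every error and continue.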
20. * Create set of dummy variables for a categorical
variable and a macro name for them.
BEGIN PROGRAM.
import spss, spssaux, spssaux2
mydict = spssaux.VariableDict()
spssaux2.CreateBasisVariables(mydict["educ"],
    "EducDummy", macroname="!EducBasis")
spss.Submit("REGRESSION /STATISTICS=COEF /DEP=salary"
    + " /ENTER=jobtime prevexp !EducBasis.")
END PROGRAM.
Discovers educ values from the data and generates appropriate transformation commands.
Creates macro !EducBasis
Example 8: Create Basis Variables
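The slides do not show how CreateBasisVariables works internally. As a rough sketch of the idea only, the code below generates one COMPUTE command per category value and skips the first category. The function basis_commands, the naming scheme for the new variables, and the sample values are all assumptions.

```python
def basis_commands(varname, values, prefix):
    """Generate dummy-variable syntax, omitting the first category.

    Hypothetical helper: returns (commands, names), where names are the
    new variables that a macro such as !EducBasis could list.
    """
    commands = []
    names = []
    for value in sorted(values)[1:]:
        name = "%s_%s" % (prefix, value)
        # In SPSS a logical expression evaluates to 1 (true) or 0 (false)
        commands.append("COMPUTE %s = (%s = %s)." % (name, varname, value))
        names.append(name)
    return commands, names


# Made-up category values standing in for values discovered from the data.
commands, names = basis_commands("educ", [8, 12, 16], "EducDummy")
for command in commands:
    print(command)
print(" ".join(names))  # text a macro value could hold
```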
21. * Automatically add cases from all SAV files in a directory.
BEGIN PROGRAM.
import glob, spss
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    # parentheses let the expression continue across lines
    cmd = (["ADD FILES "] +
           ["/FILE='" + fn + "'" for fn in savlist] +
           [".", "EXECUTE."])
    spss.Submit(cmd)
    print "Files merged:\n", "\n".join(savlist)
else:
    print "No files found to merge"
END PROGRAM.
The glob module resolves file-system wildcards.
"if savlist" tests whether there are any matching files.
Example 9: Merge Directory Contents
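The list-building step in Example 9 can be checked on its own before any files are touched. This sketch separates it into a small function; the name add_files_commands and the sample paths are hypothetical, and in a real job the list would come from glob.glob and the result would go to spss.Submit.

```python
def add_files_commands(filenames):
    """Build the ADD FILES command as a list of lines (hypothetical helper).

    Returns an empty list when there is nothing to merge, so the caller
    can test the result the same way Example 9 tests savlist.
    """
    if not filenames:
        return []
    return (["ADD FILES"]
            + ["  /FILE='%s'" % fn for fn in filenames]
            + [".", "EXECUTE."])


# Sample paths for illustration; no files are read here.
cmd = add_files_commands(["c:/temp/parts/a.sav", "c:/temp/parts/b.sav"])
print("\n".join(cmd))
```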
22. * Run regression; get selected statistics, but do not display the
regular Regression output. Use OMS and XPath wrapper functions.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/CARS.SAV")
try:
    handle, failcode = spssaux.CreateXMLOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        visible=False)
    horseCoef = spssaux.GetValuesFromXMLWorkspace(
        handle, "Coefficients", rowCategory="Horsepower",
        colCategory="B", cellAttrib="number")
    print "The effect of horsepower on acceleration is: ", horseCoef
    Rsq = spssaux.GetValuesFromXMLWorkspace(
        handle, "Model Summary", colCategory="R Square",
        cellAttrib="text")
    print "The R square is: ", Rsq
    spss.DeleteXPathHandle(handle)
except:
    print "*** Regression command failed. No results available."
    raise
END PROGRAM.
Example 10: Use Parts of Output - XML
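For readers new to XPath, the sketch below shows the kind of lookup the wrapper functions perform, using Python's standard xml.etree.ElementTree on a small hand-written fragment. The fragment is a simplified, made-up imitation of OMS XML (the real schema has more levels and attributes), the cell values are invented, and get_cell is a hypothetical helper, not the spssaux function.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up fragment in the spirit of OMS XML output.
# Cell values are invented for illustration only.
OXML = """
<outputTree>
  <pivotTable subType="Coefficients">
    <category text="Horsepower">
      <category text="B"><cell text="-.045" number="-0.045"/></category>
      <category text="Std. Error"><cell text=".004" number="0.004"/></category>
    </category>
    <category text="Vehicle Weight">
      <category text="B"><cell text="-.001" number="-0.001"/></category>
    </category>
  </pivotTable>
</outputTree>
"""


def get_cell(root, subtype, row, col, attrib):
    """Return one cell attribute selected by table subtype, row, and column."""
    path = (".//pivotTable[@subType='%s']"
            "/category[@text='%s']"
            "/category[@text='%s']/cell" % (subtype, row, col))
    cell = root.find(path)
    if cell is None:
        return None
    return cell.get(attrib)


root = ET.fromstring(OXML)
print(get_cell(root, "Coefficients", "Horsepower", "B", "number"))
```

Selecting by category text is why the row and column labels in the output must match what the program asks for.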
23. BEGIN PROGRAM.
import spss, spssaux, Transform
spssaux.OpenDataFile('SPSSDIR/employee data.sav')
newvar = Transform.Compute(varname="average_increase",
    varlabel="Salary increase per month of experience if at least a year",
    varmeaslvl="Scale",
    varmissval=[999,998,997],
    varformat="F8.4")
newvar.expression = "(salary-salbegin)/jobtime"
newvar.condition = "jobtime > 12"
newvar.retransformable = True
newvar.generate()  # Get exception if compute fails
Transform.timestamp("average_increase")
spss.Submit("DISPLAY DICT /VAR=average_increase.")
spss.Submit("DESC average_increase.")
END PROGRAM.
Example 11: Transformations in Python Syntax
24. BEGIN PROGRAM.
import spss, Transform
try:
    Transform.retransform("average_increase")
    Transform.timestamp("average_increase")
except:
    print "Could not update average_increase."
else:
    spss.Submit("display dictionary" +
        " /variable=average_increase.")
END PROGRAM.
Transformation saved using Attributes
Example 11A: Repeat Transform
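How a variable might "remember" its formula can be sketched without SPSS. The code below assumes the expression and condition were stored as attributes and rebuilds the transformation syntax from them. The attribute names Formula and Condition and the function transformation_commands are hypothetical; the slides do not show how the Transform module stores this information.

```python
def transformation_commands(varname, attributes):
    """Rebuild transformation syntax from stored attributes.

    Hypothetical helper: attributes is a dict holding the formula and an
    optional condition, as they might be read back from the dictionary.
    """
    expression = attributes["Formula"]
    condition = attributes.get("Condition")
    if condition:
        # IF computes the value only for cases where the condition holds
        command = "IF (%s) %s = %s." % (condition, varname, expression)
    else:
        command = "COMPUTE %s = %s." % (varname, expression)
    return [command, "EXECUTE."]


# Values taken from Example 11; the attribute names are assumptions.
stored = {"Formula": "(salary-salbegin)/jobtime", "Condition": "jobtime > 12"}
rebuilt = transformation_commands("average_increase", stored)
for line in rebuilt:
    print(line)
```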
25. BEGIN PROGRAM.
import spss, viewer
spss.Submit("DESCRIPTIVES ALL")
spssapp = viewer.spssapp()
outname = "c:/temp/myoutput.spo"
try:
    actualName = spssapp.SaveDesignatedOutput(outname)
except:
    # actualName is not assigned if the save raised, so report the requested name
    print "Save failed. Name:", outname
else:
    spssapp.ExportDesignatedOutput(
        "c:/temp/myoutput.doc", format="Word")
    spssapp.CloseDesignatedOutput()
END PROGRAM.
Example 12: Controlling the Viewer Using Automation
26. BEGIN PROGRAM.
import spss, spssaux
from poisson_regression import *
spssaux.OpenDataFile(
    'SPSSDIR/Tutorial/Sample_Files/autoaccidents.sav')
poisson_regression("accident", covariates=["age"],
    factors=["gender"])
END PROGRAM.
Poisson regression module built from SPSS CNLR and transformation commands.
PROGRAMS can get case data and use other Python modules or code on it.
Example 13: A New Procedure (Poisson Regression)
27. * Mean salary by education level.
BEGIN PROGRAM.
import spssdata
data = spssdata.Spssdata(indexes=('salary', 'educ'))
Counts = {}; Salaries = {}
for case in data:
    cat = int(case.educ)
    Counts[cat] = Counts.get(cat, 0) + 1
    Salaries[cat] = Salaries.get(cat, 0) + case.salary
print "educ mean salary\n"
for cat in sorted(Counts):
    print " %2d $%6.0f" % (cat,
        Salaries[cat]/Counts[cat])
del data
END PROGRAM.
Example 14: Using Case Data
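The accumulation logic in Example 14 can be tried outside SPSS. In this sketch a plain list of (salary, educ) pairs stands in for the spssdata cursor; the function name mean_by_category and the sample cases are made up for illustration.

```python
def mean_by_category(cases):
    """Return {educ: mean salary} from an iterable of (salary, educ) pairs."""
    counts = {}
    totals = {}
    for salary, educ in cases:
        cat = int(educ)
        # dict.get supplies a starting value the first time a category appears
        counts[cat] = counts.get(cat, 0) + 1
        totals[cat] = totals.get(cat, 0.0) + salary
    return dict((cat, totals[cat] / counts[cat]) for cat in counts)


# Made-up cases standing in for rows read from the active dataset.
cases = [(30000, 12), (40000, 12), (60000, 16)]
means = mean_by_category(cases)
for cat in sorted(means):
    print(" %2d  $%6.0f" % (cat, means[cat]))
```

Starting the totals at 0.0 keeps the division a floating-point division even when the salaries are integers.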
28. BEGIN PROGRAM.
# <accumulate Counts and Salaries as in Example 14>
import viewer
desViewer = viewer.spssapp().GetDesignatedOutput()
rowcats = []; cells = []
for cat in sorted(Counts):
    rowcats.append(int(cat))
    cells.append(Salaries[cat]/Counts[cat])
ptable = viewer.PivotTable("a Python table",
    tabletitle="Effect of Education on Salary",
    caption="Data from employee data.sav",
    rowdim="Years of Education",
    rowlabels=rowcats,
    collabels=["Mean Salary"],
    cells=cells,
    tablelook="c:/data/goodlook.tlo")
ptable.insert(desViewer)
END PROGRAM.
Example 14a: Output As a Pivot Table
29. get file='c:/spss14/cars.sav'.
DATASET NAME maindata.
DATASET DECLARE regcoef.
DATASET DECLARE regfit.
OMS /IF SUBTYPE=["coefficients"]
/DESTINATION FORMAT = sav OUTFILE=regcoef.
OMS /IF SUBTYPE=["Model Summary"]
/DESTINATION FORMAT = sav OUTFILE=regfit.
REGRESSION /DEPENDENT accel /METHOD=ENTER
weight horse year.
OMSEND.
Use OMS directly to figure out what to retrieve programmatically
Exploring OMS Dataset Output
30. BEGIN PROGRAM.
import spss, spssaux, spssdata
try:
    coefhandle, rsqhandle, failcode = \
        spssaux.CreateDatasetOutput(
            "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
            subtype=["coefficients", "Model Summary"])
    cursor = spssdata.Spssdata(indexes=["Var2", "B"],
        dataset=coefhandle)
    for case in cursor:
        if case.Var2.startswith("Horsepower"):
            print "The effect of horsepower on acceleration is: ", case.B
    cursor.close()
Example 10a: Use Bits of Output - Datasets
31.
    cursor = spssdata.Spssdata(indexes=["RSquare"],
        dataset=rsqhandle)
    row = cursor.fetchone()
    print "The R Squared is: ", row.RSquare
    cursor.close()
except:
    print "*** Regression command failed. No results available."
    raise
spssdata.Dataset("maindata").activate()
spssdata.Dataset(coefhandle).close()
spssdata.Dataset(rsqhandle).close()
END PROGRAM.
Example 10a: Use Bits of Output - Datasets (continued)
32. Variable Dictionary access
Procedures selected based on variable properties
Actions based on environment
Automatic construction of transformations
Error handling
Variables that remember their formulas
Management of the SPSS Viewer
New statistical procedure
Access to case data
What We Saw
33. SPSS Processor (backend) can be embedded and controlled by Python or other processes
Build applications using SPSS functionality invisibly
Application supplies user interface
No SPSS Viewer
Allows use of Python IDE to build programs
PythonWin or many others
Externally Controlling SPSS
34. PythonWin IDE Controlling SPSS
35. Extend SPSS functionality
Write more general and flexible jobs
Handle errors
React to results and metadata
Implement new features
Write simpler, clearer, more efficient code
Greater productivity
Automate repetitive tasks
Build SPSS functionality into other applications
What Are the Programmability Benefits?
36. SPSS 14 (14.0.1 for data access and IDE)
Python (visit Python.org)
Installation
Tutorial
Many other resources
SPSS® Programming and Data Management, 3rd Edition: A Guide for SPSS® and SAS® Users (new)
SPSS Developer Central
Python Plug-In (14.0.1 version covers 14.0.2)
Example modules
Dive Into Python (diveintopython.org) book or PDF
Practical Python by Magnus Lie Hetland
Python Cookbook, 2nd ed. by Martelli, Ravenscroft, & Ascher
Python and Plug-In on the CD in SPSS 15
Getting Started
37. Five power features of SPSS 14
Examples of programmability using Python
How to get started: materials and resources
Recap
39. Working together, these new features give you a dramatically more powerful SPSS.
SPSS becomes a platform that enables you to build your own statistical applications.
1. Programmability
2. Multiple datasets
3. XML Workspace and OMS enhancements
4. Attributes
5. External driver application
In Closing
40. Jon Peck can now be reached at:
peck@us.ibm.com
Contact
Editor's Notes
The plan is as follows:
We will first look at the new set of five general power features, focusing mainly on programmability.
We will look at some code in the demos, but the focus today is on what you can do with a bit of the how to.
Now let’s look at the five power features.
The external programming language is the focus of this talk, but it interacts closely with other features
Not to mention GPL for graphics.
First a brief discussion of #2 through #5, then we will go into #1.
No need to open a dataset and save and close it in order to open another.
Match/add (MATCH FILES, ADD FILES) can work with the active data and other datasets. Especially useful when merging non-SPSS datasets.
Communication point
OMS can also create a new dataset that can be retrieved in the program as case data.
Use XPath to select parts of objects, or retrieve the entire XML tree and parse it in the external programming language.
We expect to do more with the workspace in the future.
SPSS has metadata already: var labels, value labels, missing values etc.
Now users can create their own properties/attributes
This is something radical.
These are different from input programs or transformation programs
SPSS has integrated Python, and, using the SDK, you can integrate other languages such as .NET if you want to use that.
Now let’s talk about the Python language a little.
Python is not on the SPSS 14 cd or on Developer Central
Get it from the Python web site
SPSS does not control the Python language or its future development.
By opening SPSS to third party languages, we can take advantage of progress on those and do more, faster. Not limited by our own resources
Now what can you do with Programmability?
Uses dictionary object indexed by variable name.
The first category is omitted.
Example assumes that SPSS Output Labels are set to "Labels", not "Names".
A later example will show this using datasets instead of XML
Uses OLE automation methods. Client only, local mode
Example assumes that SPSS Output Labels are set to "Labels", not "Names".
We used programmability to easily solve problems that were difficult to handle in earlier versions.
I asked at the beginning about challenges that have been hard to solve with SPSS. I hope that you have seen the glimmer of some solutions with SPSS 14.
Now that you are excited, how do you get started?
The PythonWin IDE is available from http://starship.python.net/crew/mhammond/win32/Downloads.html. There are many other choices for a Python IDE.