Programmability in
SPSS 15
The Revolution Continues
Jon Peck
Technical Advisor
SPSS
Copyright (c) SPSS Inc, 2006
 Recap of SPSS 14 Python programmability
 Developer Central
 New features in SPSS 15 programmability
 Writing first-class procedures
 Updating the data
 The Bonus Pack modules
 Interacting with the user
 Q & A
 Conclusion
Agenda
 "Because of programmability, SPSS 14 is the most
important release since I started using SPSS fifteen
years ago."
 "I think I am going to like using Python."
 "Python, here I come!"
 "I now think Python is an amazing language."
 "Python and SPSS 14 and later are, IMHO, GREAT!"
 "By the way, Python is a great addition to SPSS."
Quotations from SPSS Users
 SPSS provides a powerful engine for statistical and
graphical methods and for data management.
 Python® provides a powerful, elegant, and easy-to-learn
language for controlling and responding to this engine.
 Together they provide a comprehensive system for
serious applications of analytical methods to data.
The Combination of SPSS and
Python
 SPSS 14.0 provided
 Programmability
 Multiple datasets
 Variable and File Attributes
 Programmability read-access to case data
 Ability to control SPSS from a Python program
 SPSS 15 adds
 Read and write case data
 Create new variables directly rather than generating syntax
 Create pivot tables and text blocks via backend APIs
 Easier setup
Programmability Features in
SPSS 14 and 15
 Makes possible jobs that respond to datasets, output, and
environment
 Allows greater generality, more automation
 Makes jobs more robust
 Allows extending the capabilities of SPSS
 Enables better organized and more maintainable code
 Facilitates staff specialization
 Increases productivity
 More fun
Programmability Advantages
 Python extends SPSS via
 General programming language
 Access to variable dictionary, case data, and output
 Access to standard and third-party modules
 SPSS Developer Central modules
 Module structure for building libraries of code
 Runs in "back-end" syntax context (like macro)
 SaxBasic scripting runs in "front-end" context
 Two modes
 Traditional SPSS syntax window
 Drive SPSS from Python (external mode)
 Optional install
Programmability Overview
 SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located on
the Python web site. SPSS is not making any
statement about the quality of the Python program.
SPSS fully disclaims all liability associated with
your use of the Python program.
Legal Notice
 Supports implementing various programming
languages
 Requires a programmer to implement a new language
 VB.NET Plug-In available on Developer Central
 Works only in external mode
The SPSS Programmability SDK
 Python interpreter embedded within SPSS
 SPSS runs in traditional way until BEGIN PROGRAM
command is found
 Python collects commands until END PROGRAM
command is found; then runs the program
 Python can communicate with SPSS through APIs (calls to
functions)
 Includes running SPSS syntax inside Python program
 Includes creating macro values for later use in syntax
 Python can access SPSS output and data
 OMS is a key tool
How Programmability Works
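The collect-then-run behavior described on this slide can be pictured with a toy splitter. This is only an illustration in plain Python, not how SPSS parses commands; `split_job` and the chunk labels are hypothetical names chosen for the sketch.

```python
def split_job(lines):
    """Split a job into ("syntax", text) and ("program", text) chunks.

    Toy illustration only: lines between BEGIN PROGRAM and END PROGRAM
    are collected and handed over as one Python program.
    """
    chunks = []
    buffer = []
    in_program = False
    for line in lines:
        command = line.strip().rstrip(".").upper()
        if not in_program and command == "BEGIN PROGRAM":
            if buffer:
                chunks.append(("syntax", "\n".join(buffer)))
            buffer = []
            in_program = True
        elif in_program and command == "END PROGRAM":
            chunks.append(("program", "\n".join(buffer)))
            buffer = []
            in_program = False
        else:
            buffer.append(line)
    if in_program:
        raise ValueError("BEGIN PROGRAM without matching END PROGRAM")
    if buffer:
        chunks.append(("syntax", "\n".join(buffer)))
    return chunks


job = [
    "GET FILE='employee data.sav'.",
    "BEGIN PROGRAM.",
    "x = 1 + 1",
    "END PROGRAM.",
    "DESCRIPTIVES ALL.",
]
chunks = split_job(job)
```

Each "program" chunk would go to the embedded Python interpreter as a whole, while "syntax" chunks run in the traditional way.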
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# find categorical variables
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
END PROGRAM.
DESC !catVars.
Example:
Summarize Categorical Variables
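The core idea of this example, generating syntax from what the variable dictionary reports, can be tried without SPSS. The sketch below uses an ordinary dict as a hypothetical stand-in for the information spssaux.VariableDict would supply; the function names and the variable names are assumptions for illustration.

```python
def categorical_variables(levels):
    """Return names whose measurement level is nominal or ordinal, sorted."""
    return sorted(name for name, level in levels.items()
                  if level in ("nominal", "ordinal"))


def frequencies_syntax(names):
    """Build a FREQUENCIES command, or None when there is nothing to run."""
    if not names:
        return None
    return "FREQUENCIES " + " ".join(names) + "."


# Hypothetical measurement levels standing in for the SPSS dictionary.
levels = {"gender": "nominal", "jobcat": "ordinal", "salary": "scale"}
names = categorical_variables(levels)
syntax = frequencies_syntax(names)
```

Inside SPSS, a string like `syntax` is what would be passed to spss.Submit, and the joined names are what would be stored in the macro.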
 Two modes of operation
 SPSS Drives mode (inside): traditional syntax context
 BEGIN PROGRAM …program… END PROGRAM
 X Drives mode (outside): eXternal program drives SPSS
 Python interpreter (or VB.NET)
 import spss
 No SPSS Viewer, Data Editor, or SPSS user interface
 Output sent as text to the application – can be suppressed
 Has performance advantages
 Build programs with an IDE
 Even if to be run in traditional mode
Programmability Inside or Outside
SPSS
PythonWin IDE Controlling SPSS
 Python.org
 Python Tutorial
 Global (standard) Module Index
 Python help system and help command
 Cheeseshop
 1627 packages as of Sept 21, 2006
 SPSS Developer Central
 SPSS Programming and Data Management, 3rd ed, 2006.
 Many books
 Look for books at the Python 2.4 level
Python Resources
 Dive Into Python book or PDF
 Practical Python by Magnus Lie Hetland
 Extensive examples and discussion of Python
 Python Cookbook, 2nd ed, by Martelli, Ravenscroft, & Ascher
 Second edition (July, 2006) of Martelli, Python in a Nutshell, O'Reilly
 Very clear, comprehensive reference material
 wxPython in Action by Rappin and Dunn
 Explains user interface building with wxPython
Python Books
 scipy 0.5.0 Scientific Algorithms Library for Python
 scipy is an open source library of scientific tools for
Python. scipy gathers a variety of high level science and
engineering modules together as a single package. scipy
provides modules for statistics, optimization, integration,
linear algebra, Fourier transforms, signal and image
processing, genetic algorithms, ODE solvers, special
functions, and more. scipy requires and supplements
NumPy, which provides a multidimensional array object and
other basic functionality.
 scipy rework currently beta
 Visit Scipy.org
Cheeseshop: scipy
Went Live
21-May-2006
 New Web home for developing SPSS applications
 SPSS Developer Central
 old url: forums.spss.com/code_center
 Python Integration Plug-Ins
 Useful supplementary modules by SPSS and others
 Updated for SPSS 15
 Articles on programmability and graphics
 Place to ask questions and exchange information
 Programmability Extension SDK
 Get Python itself from Python.org
 SPSS uses Python 2.4 (2.4.3)
 Not limited to programmability
 GPL graphics
 User-contributed code
 Key supplementary modules: spssaux, spssdata
 New for SPSS 15: trans, extendedTransforms, rake, pls
SPSS Developer Central
 You can extend SPSS capabilities by building new procedures
 Or use ones that others have built
 Combine SPSS procedures and transformations with Python
logic
 Poisson regression (SPSS 14) example using iterated CNLR
 New raking procedure built over GENLOG
 Calculate data aggregates in SPSS and pass to algorithm
coded in Python
 Raking procedure starts with AGGREGATE
 Acquire case data and compute in Python
 Use Python standard modules and third-party additions
 Partial Least Squares Regression (pls module)
Approaches to
Creating New Procedures
 Common to adapt existing libraries or code for use
as Python extension modules
 C, C++, VB, Fortran,...
 Extension modules are normal Python modules
 Python itself written in C
 Many standard modules are C code
 Python tools and APIs to assist
 Chap 25 in Python in a Nutshell
 Tutorial on extending and embedding the Python
interpreter
Adapt Existing Code Libraries
 Regression with large number of predictors (even k > N)
 Similar to Principal Components but considers dependent
variable simultaneously
 Calculates principal components of (y, X), then uses regression
on the scores instead of the original data
 User chooses number of factors
 Equivalent to ordinary regression when number of factors
equals number of predictors and one y variable
 For more information see An Optimization Perspective on
Kernel Partial Least Squares Regression.pdf.
Partial Least Squares Regression
 Strategy
 Fetches data from SPSS
 Uses scipy matrix operations to compute results
 Third-party module from Cheeseshop
 Writes pivot tables to SPSS Viewer
 Subject to OMS
 SPSS 14 viewer module created pivot table using OLE automation
 Saves predicted values to active dataset
The pls Module
GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
REGRESSION /STATISTICS COEFF R /DEPENDENT sales
/METHOD=ENTER curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width .
begin program.
import spss, pls
pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow
length mpg price resale type wheelbas width""",
yhat="predsales")
end program.
 plsproc defaults to five factors
pls Example: REGRESSION vs
PLS
 PLS with 5 factors almost equals regression with 11 variables
Results
 "Raking" adjusts sample weights to control totals in n
dimensions
 Example: data classified by age and sex with known
population totals or proportions
 Calculated by fitting a main effects loglinear model
 Various adjustments required
 Not a complete solution to reweighting
 Not directly available in SPSS
Raking Sample Weights
 Strategy: combine SPSS procedures with Python logic
 rake.py (part of SPSS 15 Bonus Pack)
 Aggregates data via AGGREGATE to new dataset
 Creates new variable with control totals
 Applies GENLOG, saving predicted counts
 Adjusts predicted counts
 Matches back into original dataset
 Does not use MATCH FILES or require a SORT command
 Written in one (long) day
rake.rake("age sex",
[{0: 1140, 1:1140}, {0: 104.6, 1:2175.4}],
finalweight="finalwt")
Raking Module
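To make the goal of raking concrete, here is a minimal plain-Python sketch that adjusts a two-way table until its margins match the control totals. It uses iterative proportional fitting, a classical raking technique, rather than the AGGREGATE and GENLOG approach the slides describe for rake.py, so it should not be read as that module's implementation. The control totals are the ones in the rake.rake call above; the sample counts and all function names are hypothetical.

```python
def rake(counts, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Rake a two-way table of counts to given row and column totals.

    counts maps (row, col) -> sample count; returns adjusted counts.
    Uses iterative proportional fitting: scale rows, then columns,
    and repeat until the row totals stop moving.
    """
    fitted = dict((cell, float(n)) for cell, n in counts.items())
    for _ in range(max_iter):
        # scale each row to its control total
        for r, target in row_totals.items():
            current = sum(v for (i, j), v in fitted.items() if i == r)
            for cell in fitted:
                if cell[0] == r:
                    fitted[cell] *= target / current
        # scale each column to its control total
        for c, target in col_totals.items():
            current = sum(v for (i, j), v in fitted.items() if j == c)
            for cell in fitted:
                if cell[1] == c:
                    fitted[cell] *= target / current
        # columns now match exactly; stop once the rows match as well
        worst = max(
            abs(sum(v for (i, j), v in fitted.items() if i == r) - target)
            for r, target in row_totals.items())
        if worst < tol:
            break
    return fitted


# Hypothetical sample counts keyed by (age, sex).
sample = {(0, 0): 20, (0, 1): 480, (1, 0): 30, (1, 1): 470}
age_totals = {0: 1140, 1: 1140}      # control totals from the slide
sex_totals = {0: 104.6, 1: 2175.4}   # control totals from the slide

fitted = rake(sample, age_totals, sex_totals)
# final weight for every case in a cell: adjusted count / sample count
weights = dict((cell, fitted[cell] / sample[cell]) for cell in sample)
```

Cases in the same age-by-sex cell all receive the same weight, which is why the real module can aggregate first and match the results back afterward.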
 SPSS 14 programmability can wrap SPSS syntax in Python
logic
 Useful when definitions can be expressed in SPSS syntax
 SPSS 15 programmability can generate new variables directly
 Cursor can have accessType='w'
 SPSS 15 programmability can add cases directly
 Cursor can have accessType='a'
 SPSS 15 programmability can create new datasets from
scratch
 Cursor can have accessType='n'
 spssdata module on Developer Central updated to support
these modes
Extending SPSS Transformations
 trans module facilitates plugging in Python code to
iterate over cases
 Runs as an SPSS procedure
 Passes the data
 Adds variables to the SPSS variable dictionary
 Can apply any calculation casewise
 Use with
 Standard Python functions (e.g., math module)
 Any user-written functions or appropriate classes
 Functions in extendedTransforms module
trans and extendedTransforms
Modules
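The casewise idea behind trans can be sketched without SPSS: run a function over each case and store what it returns under new variable names. The helper below is a hypothetical stand-in operating on a list of dicts; it is not the trans module's API, only an illustration of the pattern.

```python
import math


def apply_casewise(cases, func, new_names, input_names):
    """Return copies of the cases with func's result(s) added.

    func receives the values of input_names for one case and returns
    one value, or a tuple with one value per name in new_names.
    """
    result = []
    for case in cases:
        values = func(*[case[name] for name in input_names])
        if len(new_names) == 1:
            values = (values,)
        new_case = dict(case)
        new_case.update(zip(new_names, values))
        result.append(new_case)
    return result


# a standard math function creating one new variable
cases = [{"x": 1.0}, {"x": 100.0}]
logged = apply_casewise(cases, math.log10, ["logx"], ["x"])

# a user-written function creating two new variables at once
halves = apply_casewise([{"n": 7}], lambda n: divmod(n, 2), ["q", "r"], ["n"])
```

The same shape, a function plus output names plus input names, is what the tproc.append calls later in the deck pass to trans.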
 trans strategy
 Pass case data through Python code, writing results
back to SPSS in new variables
 extendedTransforms: a collection of ten functions to
apply to SPSS variables
 Regular expression search/replace
 Template-based substitution
 soundex and nysiis functions for phonetic equivalence
 Levenshtein distance function for string similarity
 Date/time conversions based on patterns
trans and extendedTransforms
Modules
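For readers unfamiliar with Levenshtein distance, a minimal plain-Python sketch of the standard dynamic-programming algorithm follows. It is an illustration of the measure itself, not the code in extendedTransforms, and the function name is an assumption.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a):
        current = [i + 1]
        for j, cb in enumerate(b):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j + 1] + 1,   # deletion
                               current[j] + 1,        # insertion
                               previous[j] + cost))   # substitution
        previous = current
    return previous[-1]


d1 = levenshtein("Peck", "Pech")   # one substitution
d2 = levenshtein("Peck", "Pek")    # one deletion
```

Small distances flag likely spelling variants of the same name, which is how such a function serves as a string-similarity measure.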
 Pattern matching in text strings
 If you use SPSS index or replace, you need these
 Standardize string data (Mr, Mr., Herr, Senor,...)
 Patterns can be simple strings (as with SPSS
index) or complex patterns
 Pick out variable names with common parts
Python Regular Expressions
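Two of the uses listed on this slide can be sketched with the standard re module alone. The list of honorifics, the choice of "Mr." as the standard form, and the function names are assumptions made for illustration.

```python
import re

# Hypothetical set of honorifics to fold into one standard form.
TITLE = re.compile(r"^(mr\.?|herr|senor)\s+", re.IGNORECASE)


def standardize_title(name):
    """Replace a leading Mr, Mr., Herr, or Senor with 'Mr.'."""
    return TITLE.sub("Mr. ", name)


def names_with_prefix(names, prefix):
    """Pick variable names made of prefix followed by digits (q1, q2, ...)."""
    pattern = re.compile(re.escape(prefix) + r"\d+$")
    return [n for n in names if pattern.match(n)]


standardized = [standardize_title(n)
                for n in ["Herr Schmidt", "mr Jones", "Mr. Peck", "Mrs. Smith"]]
q_vars = names_with_prefix(["q1", "q2", "q10", "qty", "age"], "q")
```

Note that "Mrs. Smith" is left alone because the pattern requires whitespace right after the honorific.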
 "age" – string containing the letters age
 "\bage\b" – string containing the word age
 "abc|xyz|pqrst" – string containing any of abc etc
 "\d+" – a string of any number of digits
 "x.*y" – a string starting with x and ending with y
 Can be case sensitive or not
 Can greatly simplify code currently using SPSS index and
replace functions
Regular Expressions:
A Few Examples
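These kinds of patterns can be tried directly with Python's re module. The test strings below are made up for illustration. Patterns containing backslash escapes such as \d (digit) and \b (word boundary) are written as raw strings so that Python passes the backslash through to re.

```python
import re

# plain letters match anywhere, including inside longer words
contains_age = bool(re.search("age", "average wage"))

# \b marks a word boundary, so only the standalone word matches
word_age = bool(re.search(r"\bage\b", "age at hire"))
not_a_word = bool(re.search(r"\bage\b", "average wage"))

# | separates alternatives
alternatives = re.findall("abc|xyz|pqrst", "xyz then abc")

# \d+ matches a run of one or more digits
digits = re.search(r"\d+", "PO 80mg TAB").group()

# x.*y matches from an x through the last y that follows it
span = re.search("x.*y", "axbyc").group()

# matching can ignore case
ignore_case = bool(re.search("peck", "Jon Peck", re.IGNORECASE))
```

Because re.search looks for the pattern anywhere in the string, "x.*y" finds an x-to-y stretch inside a longer value; anchoring with ^ and $ would be needed to require that the whole string start with x and end with y.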
import spss, trans, spssaux, extendedTransforms
spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction(listwiseDeletion=True)
tproc.append(extendedTransforms.search, 'match','a8',
['names', trans.const('Peck|Pech|Pek')])
tproc.append(extendedTransforms.search, 'matchignorecase','a8',
['names', trans.const('peck'), trans.const(True)])
tproc.append(extendedTransforms.search, ('match2','startpos','length'),
('a12','f4.0','f4.0'), ['names', trans.const('Peck')])
tproc.execute()
spss.Submit("SELECT IF length > 0")
spssaux.SaveDataFile("c:/temp/namesplus.sav")
Using trans and extendedTransforms
search Function
begin program.
import trans, re

def splitAndExtract(s):
    """split a string on "--" and return the left part and the number
    in the right part. Ex: "simvastatin-- PO 80mg TAB" -> "simvastatin", 80"""
    parts = s.split("--")
    try:
        # \d+ matches the first run of digits in the right-hand part
        number = re.search(r"\d+", parts[1]).group()
    except (IndexError, AttributeError):
        # no "--" separator, or no digits after it
        number = None
    return parts[0], number

tproc = trans.Tfunction()
tproc.append(splitAndExtract, ("name", "number"), ("a30", "f5.0"), ["medicine"])
tproc.execute()
end program.
Using trans:
Writing Your Own Function
 Algorithms for approximating phonetic equivalence of
names
 soundexallwords can be used on unstructured text
 Applied to database of 20,000+ surnames
import spss, trans, spssaux, extendedTransforms
spssaux.OpenDataFile("c:/data/names.sav")
tproc = trans.Tfunction()
tproc.append(extendedTransforms.soundex, 'soundex','a5', ['names'])
tproc.append(extendedTransforms.nysiis, 'nysiis', 'a20', ['names'])
tproc.execute()
spssaux.SaveDataFile("c:/temp/namesplusplus.sav")
extendedTransforms
soundex and nysiis
Results
 (Overly) simple processing of unstructured text
 Use soundex word by word to abstract spelling
 No stemming, linguistic analysis etc
 Use STAFS for serious work
 Very simple to use
begin program.
import spss, trans, extendedTransforms
t = trans.Tfunction()
t.append(extendedTransforms.soundexallwords, 'allsoundexn66',
'a108', ['n_66'])
t.execute()
end program.
soundex on Unstructured Text
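As background on what a soundex code is, the sketch below implements the standard four-character American Soundex in plain Python and applies it word by word. It is an illustration under that assumption; the extendedTransforms functions may differ in details such as code length, and the function names here are hypothetical.

```python
# Build the letter -> digit table of standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Four-character Soundex code, or '' if word has no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    last = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != last:
            code += digit
        if c not in "HW":      # H and W do not separate repeated codes
            last = digit
    return (code + "000")[:4]


def soundex_all_words(text):
    """Soundex code of every word in text, joined by single spaces."""
    codes = [soundex(w) for w in text.split()]
    return " ".join(c for c in codes if c)


same = soundex("Peck") == soundex("Pech") == soundex("Pek")
coded = soundex_all_words("Robert Rupert")
```

Replacing each word with its code abstracts away spelling differences, which is the (overly) simple text processing this slide describes.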
 Python comes with Tkinter, a gui toolkit
 There are better ones freely downloadable
 E.g., wxPython
 Visit wxpython.org
 Very easy to do small user interactions
 Examples
 Message box
 File chooser
 Variable picker
Creating a Graphical User
Interface
Simple Message Box Using
wxPython
Simple File Chooser Using
wxPython
Variable Picker Using wxPython
 User-missing values
 GetVarMissingValues
 GetSPSSLowHigh
 Pivot table APIs
 BasePivotTable
 CellText
 Dimension
 Output Text block support
 Good for writing comments to the Viewer
 Miscellaneous
 GetWeightVar
 HasCursor
 SplitChange
Other New spss Module APIs
 SPSS 14 introduced major programmability features
 SPSS 15 adds
 Reading and writing case data: new variables; new cases
 Creating pivot tables and text blocks
 Writing first-class SPSS procedures
 Bonus Pack and Partial Least Squares modules illustrate
these features
 Developer Central improves ability to provide modules and
information
 Will soon have four new SPSS 15 modules
Recap
Questions
 SPSS 15 programmability makes it easy to add
capabilities beyond what is already built in to SPSS
 SPSS 15 makes it easier to build complete
applications on top of SPSS
 SPSS 15 programmability makes you more
productive
 SPSS 15 has lots of other great features, too
 Try it out
SPSS 15:
The Revolution Continues
Write to Me!

 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 

Programmability in SPSS 15

  • 1. Programmability in SPSS 15: The Revolution Continues. Jon Peck, Technical Advisor, SPSS. Copyright (c) SPSS Inc, 2006
  • 2. Agenda: Recap of SPSS 14 Python programmability; Developer Central; New features in SPSS 15 programmability; Writing first-class procedures; Updating the data; The Bonus Pack modules; Interacting with the user; Q & A; Conclusion
  • 3. Quotations from SPSS Users: "Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago." "I think I am going to like using Python." "Python, here I come!" "I now think Python is an amazing language." "Python and SPSS 14 and later are, IMHO, GREAT!" "By the way, Python is a great addition to SPSS."
  • 4. The Combination of SPSS and Python: SPSS provides a powerful engine for statistical and graphical methods and for data management. Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine. Together they provide a comprehensive system for serious applications of analytical methods to data.
  • 5. Programmability Features in SPSS 14 and 15: SPSS 14.0 provided: Programmability; Multiple datasets; Variable and File Attributes; Programmability read-access to case data; Ability to control SPSS from a Python program. SPSS 15 adds: Read and write case data; Create new variables directly rather than generating syntax; Create pivot tables and text blocks via backend APIs; Easier setup
  • 6. Programmability Advantages: Makes possible jobs that respond to datasets, output, environment; Allows greater generality, more automation; Makes jobs more robust; Allows extending the capabilities of SPSS; Enables better organized and more maintainable code; Facilitates staff specialization; Increases productivity; More fun
  • 7. Programmability Overview: Python extends SPSS via: General programming language; Access to variable dictionary, case data, and output; Access to standard and third-party modules; SPSS Developer Central modules; Module structure for building libraries of code. Runs in "back-end" syntax context (like macro); SaxBasic scripting runs in "front-end" context. Two modes: Traditional SPSS syntax window; Drive SPSS from Python (external mode). Optional install
  • 8. Legal Notice: SPSS is not the owner or licensor of the Python software. Any user of Python must agree to the terms of the Python license agreement located on the Python web site. SPSS is not making any statement about the quality of the Python program. SPSS fully disclaims all liability associated with your use of the Python program.
  • 9. The SPSS Programmability SDK: Supports implementing various programming languages; Requires a programmer to implement a new language; VB.NET Plug-In available on Developer Central (works only in external mode)
  • 10. How Programmability Works: Python interpreter embedded within SPSS; SPSS runs in traditional way until BEGIN PROGRAM command is found; Python collects commands until END PROGRAM command is found, then runs the program; Python can communicate with SPSS through APIs (calls to functions), which includes running SPSS syntax inside a Python program and creating macro values for later use in syntax; Python can access SPSS output and data (OMS is a key tool)
  • 11. Example: Summarize Categorical Variables
      BEGIN PROGRAM.
      import spss, spssaux
      spssaux.GetSPSSInstallDir("SPSSDIR")
      spssaux.OpenDataFile("SPSSDIR/employee data.sav")
      # find categorical variables
      catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
      if catVars:
          spss.Submit("FREQ " + " ".join(catVars.variables))
      # create a macro listing categorical variables
      spss.SetMacroValue("!catVars", " ".join(catVars.variables))
      END PROGRAM.
      DESC !catVars.
  • 12. Programmability Inside or Outside SPSS: Two modes of operation. SPSS Drives mode (inside): traditional syntax context; BEGIN PROGRAM …program… END PROGRAM. X Drives mode (outside): eXternal program drives SPSS; Python interpreter (or VB.NET); import spss; No SPSS Viewer, Data Editor, or SPSS user interface; Output sent as text to the application (can be suppressed); Has performance advantages. Build programs with an IDE, even if they are to be run in traditional mode
  • 13. PythonWin IDE Controlling SPSS
  • 14. Python Resources: Python.org (Python Tutorial; Global (standard) Module Index; Python help system and help command); Cheeseshop (1627 packages as of Sept 21, 2006); SPSS Developer Central; SPSS Programming and Data Management, 3rd ed, 2006; Many books (look for books at the Python 2.4 level)
  • 15. Python Books: Dive Into Python, book or PDF; Practical Python by Magnus Lie Hetland; Python Cookbook, 2nd ed, by Martelli, Ravenscroft, & Ascher (extensive examples and discussion of Python); Second edition (July, 2006) of Martelli, Python in a Nutshell, O'Reilly (very clear, comprehensive reference material); wxPython in Action by Rappin and Dunn (explains user interface building with wxPython)
  • 16. Cheeseshop: scipy: scipy 0.5.0, Scientific Algorithms Library for Python. scipy is an open source library of scientific tools for Python. scipy gathers a variety of high level science and engineering modules together as a single package. scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more. scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality. scipy rework currently beta. Visit Scipy.org
  • 17. SPSS Developer Central: Went live 21-May-2006; New Web home for developing SPSS applications; SPSS Developer Central (old url: forums.spss.com/code_center); Python Integration Plug-Ins; Useful supplementary modules by SPSS and others, updated for SPSS 15; Articles on programmability and graphics; Place to ask questions and exchange information; Programmability Extension SDK; Get Python itself from Python.org (SPSS uses 2.4, specifically 2.4.3); Not limited to programmability: GPL graphics, user-contributed code. Key supplementary modules: spssaux, spssdata. New for SPSS 15: trans, extendedTransforms, rake, pls
  • 18. Approaches to Creating New Procedures: You can extend SPSS capabilities by building new procedures, or use ones that others have built. Combine SPSS procedures and transformations with Python logic: Poisson regression (SPSS 14) example using iterated CNLR; New raking procedure built over GENLOG. Calculate data aggregates in SPSS and pass to algorithm coded in Python: Raking procedure starts with AGGREGATE. Acquire case data and compute in Python: Use Python standard modules and third-party additions; Partial Least Squares Regression (pls module)
  • 19. Adapt Existing Code Libraries: Common to adapt existing libraries or code for use as Python extension modules (C, C++, VB, Fortran, ...); Extension modules are normal Python modules; Python itself written in C; Many standard modules are C code; Python tools and APIs to assist: Chap 25 in Python in a Nutshell; Tutorial on extending and embedding the Python interpreter
  • 20. Partial Least Squares Regression: Regression with large number of predictors (even k > N); Similar to Principal Components but considers dependent variable simultaneously; Calculates principal components of (y, X), then uses regression on the scores instead of the original data; User chooses number of factors; Equivalent to ordinary regression when number of factors equals number of predictors and there is one y variable; For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.
  • 21. The pls Module: Strategy: Fetches data from SPSS; Uses scipy matrix operations to compute results (third-party module from Cheeseshop); Writes pivot tables to SPSS Viewer (subject to OMS; the SPSS 14 viewer module created pivot tables using OLE automation); Saves predicted values to active dataset
  • 22. pls Example: REGRESSION vs PLS (plsproc defaults to five factors)
      GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
      REGRESSION
        /STATISTICS COEFF R
        /DEPENDENT sales
        /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width .
      begin program.
      import spss, pls
      pls.plsproc("sales",
          """curb_wgt engine_s fuel_cap horsepow length mpg price resale type wheelbas width""",
          yhat="predsales")
      end program.
  • 23. Results: PLS with 5 factors almost equals regression with 11 variables
  • 24. Raking Sample Weights: "Raking" adjusts sample weights to control totals in n dimensions; Example: data classified by age and sex with known population totals or proportions; Calculated by fitting a main effects loglinear model; Various adjustments required; Not a complete solution to reweighting; Not directly available in SPSS
  • 25. Raking Module: Strategy: combine SPSS procedures with Python logic; rake.py (part of SPSS 15 Bonus Pack); Aggregates data via AGGREGATE to new dataset; Creates new variable with control totals; Applies GENLOG, saving predicted counts; Adjusts predicted counts; Matches back into original dataset; Does not use MATCH FILES or require a SORT command; Written in one (long) day
      rake.rake("age sex", [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}], finalweight="finalwt")
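
The raking idea on the two slides above can be sketched without SPSS. The rake module fits a main-effects loglinear model with GENLOG; the sketch below swaps in plain iterative proportional fitting, a common way to arrive at cell counts that match the control totals, and it is not the module's code. The function name `rake_weights`, the sample cell counts, and the stopping settings are hypothetical; only the control totals are taken from the slide's `rake.rake` call.

```python
def rake_weights(cells, targets, iterations=200, tol=1e-12):
    """Iterative proportional fitting (a stand-in for the GENLOG fit).

    cells:   dict mapping (level_dim0, level_dim1, ...) -> sample count
    targets: one dict per dimension mapping level -> control total
    Returns a dict mapping each cell to its adjusted count.
    """
    fitted = dict((cell, float(n)) for cell, n in cells.items())
    for _ in range(iterations):
        worst = 0.0
        for dim, totals in enumerate(targets):
            # Current margin of the fitted table along this dimension.
            margin = {}
            for cell, value in fitted.items():
                margin[cell[dim]] = margin.get(cell[dim], 0.0) + value
            # Scale every cell so this margin hits its control total.
            for cell in fitted:
                factor = totals[cell[dim]] / margin[cell[dim]]
                fitted[cell] *= factor
                worst = max(worst, abs(factor - 1.0))
        if worst < tol:  # every margin already matched: converged
            break
    return fitted


# Hypothetical sample counts, keyed by (age group, sex).
sample = {(0, 0): 30, (0, 1): 470, (1, 0): 20, (1, 1): 480}
# Control totals from the slide: one dict per dimension.
targets = [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}]

raked = rake_weights(sample, targets)
# Final weight per case in a cell = adjusted count / sample count.
weights = dict((cell, raked[cell] / sample[cell]) for cell in sample)
```

Each pass rescales one margin at a time, so the control totals for the two dimensions must sum to the same grand total (here 2280) for the procedure to settle.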
  • 26. Extending SPSS Transformations: SPSS 14 programmability can wrap SPSS syntax in Python logic (useful when definitions can be expressed in SPSS syntax); SPSS 15 programmability can generate new variables directly (Cursor can have accessType='w'); SPSS 15 programmability can add cases directly (Cursor can have accessType='a'); SPSS 15 programmability can create new datasets from scratch (Cursor can have accessType='n'); spssdata module on Developer Central updated to support these modes
  • 27. trans and extendedTransforms Modules: trans module facilitates plugging in Python code to iterate over cases: Runs as an SPSS procedure; Passes the data; Adds variables to the SPSS variable dictionary; Can apply any calculation casewise. Use with: Standard Python functions (e.g., math module); Any user-written functions or appropriate classes; Functions in extendedTransforms module
  • 28. trans and extendedTransforms Modules: trans strategy: pass case data through Python code, writing the result back to SPSS in new variables. extendedTransforms is a collection of ten functions to apply to SPSS variables: Regular expression search/replace; Template-based substitution; soundex and nysiis functions for phonetic equivalence; Levenshtein distance function for string similarity; Date/time conversions based on patterns
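
As an illustration of the string-similarity item above, here is a minimal pure-Python sketch of Levenshtein distance using the usual dynamic-programming recurrence. It is not the extendedTransforms implementation, whose signature and details are not shown on the slides; the function name `levenshtein` is hypothetical.

```python
def levenshtein(a, b):
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


distance = levenshtein("Peck", "Pech")  # one substitution
```

Only two rows of the table are kept at a time, which keeps memory proportional to the length of the second string.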
  • 29. Python Regular Expressions: Pattern matching in text strings; If you use SPSS index or replace, you need these; Standardize string data (Mr, Mr., Herr, Senor, ...); Patterns can be simple strings (as with SPSS index) or complex patterns; Pick out variable names with common parts
  • 30. Regular Expressions: A Few Examples: "age" – string containing the letters age; "wage" – string containing the word age; "abc|xyz|pqrst" – string containing any of abc etc; "\d+" – a string of any number of digits; "x.*y" – a string starting with x and ending with y; Can be case sensitive or not; Can greatly simplify code currently using SPSS index and replace functions
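
The patterns on the slide above can be tried with Python's standard `re` module, outside SPSS. This is a small sketch: the sample strings and the helper name `matching` are hypothetical. The slide's second pattern appears to have lost its escape characters in extraction; `\bage\b` is one pattern that matches age as a whole word, and it is used here as an assumption rather than as what the slide showed.

```python
import re

# Hypothetical sample strings.
names = ["wage rate", "my age is 42", "page 7", "xyz", "x marks the spy"]


def matching(pattern, strings, flags=0):
    """Return the strings in which the pattern is found anywhere."""
    return [s for s in strings if re.search(pattern, s, flags)]


contains_age = matching(r"age", names)            # the letters age, anywhere
word_age = matching(r"\bage\b", names)            # age as a separate word
alternatives = matching(r"abc|xyz|pqrst", names)  # any one of the alternatives
digits = [re.search(r"\d+", s).group()            # first run of digits
          for s in matching(r"\d+", names)]
x_to_y = matching(r"x.*y", names)                 # an x followed later by a y
ignore_case = matching(r"AGE", names, re.IGNORECASE)
```

Raw strings (the `r"..."` prefix) keep the backslashes in `\d` and `\b` from being interpreted by Python before the pattern reaches `re`.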
  • 31. Using trans and extendedTransforms search Function
      import spss, trans, spssaux, extendedTransforms
      spssaux.OpenDataFile("c:/data/names.sav")
      tproc = trans.Tfunction(listwiseDeletion=True)
      tproc.append(extendedTransforms.search, 'match', 'a8', ['names', trans.const('Peck|Pech|Pek')])
      tproc.append(extendedTransforms.search, 'matchignorecase', 'a8', ['names', trans.const('peck'), trans.const(True)])
      tproc.append(extendedTransforms.search, ('match2', 'startpos', 'length'), ('a12', 'f4.0', 'f4.0'), ['names', trans.const('Peck')])
      tproc.execute()
      spss.Submit("SELECT IF length > 0")
      spssaux.SaveDataFile("c:/temp/namesplus.sav")
  • 32. Using trans: Writing Your Own Function
      begin program.
      import trans, re
      def splitAndExtract(s):
          """split a string on "--" and return the left part and the number in the right part.
          Ex: "simvastatin-- PO 80mg TAB" -> "simvastatin", 80"""
          parts = s.split("--")
          try:
              # first run of digits in the right-hand part
              number = re.search(r"\d+", parts[1]).group()
          except (IndexError, AttributeError):
              # no "--" in the string, or no digits after it
              number = None
          return parts[0], number
      tproc = trans.Tfunction()
      tproc.append(splitAndExtract, ("name", "number"), ("a30", "f5.0"), ["medicine"])
      tproc.execute()
      end program.
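
A function written for trans is ordinary Python, so it can be checked on its own before it is plugged into an SPSS job. The sketch below restates the slide's function under a hypothetical name, `split_and_extract`, with the digit pattern written as `\d+`, and calls it directly. The second and third test strings are made up.

```python
import re


def split_and_extract(s):
    """Split on "--"; return the left part and the first run of digits
    found in the right part (as a string), or None if there is none."""
    parts = s.split("--")
    try:
        number = re.search(r"\d+", parts[1]).group()
    except (IndexError, AttributeError):
        # IndexError: no "--" in s; AttributeError: no digits after it
        number = None
    return parts[0], number


with_dose = split_and_extract("simvastatin-- PO 80mg TAB")
no_separator = split_and_extract("aspirin")
no_digits = split_and_extract("aspirin-- PO TAB")
```

Note that the extracted number comes back as a string; in the slide's job the conversion to the numeric `f5.0` variable is left to trans.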
  • 33. extendedTransforms soundex and nysiis: Algorithms for approximating phonetic equivalence of names; soundexallwords can be used on unstructured text; Applied to database of 20,000+ surnames
      import spss, trans, spssaux, extendedTransforms
      spssaux.OpenDataFile("c:/data/names.sav")
      tproc = trans.Tfunction()
      tproc.append(extendedTransforms.soundex, 'soundex', 'a5', ['names'])
      tproc.append(extendedTransforms.nysiis, 'nysiis', 'a20', ['names'])
      tproc.execute()
      spssaux.SaveDataFile("c:/temp/namesplusplus.sav")
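
To make the idea of phonetic equivalence concrete, here is a minimal sketch of the classic American Soundex coding in plain Python. It is an illustration only: the extendedTransforms soundex function may differ in details such as code length, and the names `CODES` and `soundex` below are hypothetical.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")  # vowels and Y have no code
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeat after it counts
    return (result + "000")[:4]  # pad with zeros, keep four characters


same_code = soundex("Robert") == soundex("Rupert")
```

Names that sound alike but are spelled differently collapse to one code, which is what makes the codes useful as a matching key.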
  • 34. Results
  • 35. soundex on Unstructured Text: (Overly) simple processing of unstructured text; Use soundex word by word to abstract spelling; No stemming, linguistic analysis, etc.; Use STAFS for serious work; Very simple to use
      begin program.
      import spss, trans, extendedTransforms
      t = trans.Tfunction()
      t.append(extendedTransforms.soundexallwords, 'allsoundexn66', 'a108', ['n_66'])
      t.execute()
      end program.
  • 36. soundex on Unstructured Text
  • 37. Creating a Graphical User Interface: Python comes with Tkinter, a GUI toolkit; There are better ones freely downloadable, e.g., wxPython (visit wxpython.org); Very easy to do small user interactions; Examples: Message box; File chooser; Variable picker
  • 38. Simple Message Box Using wxPython
  • 39. Simple File Chooser Using wxPython
  • 40. Variable Picker Using wxPython
  • 41. Other New spss Module APIs: User-missing values: GetVarMissingValues, GetSPSSLowHigh; Pivot table APIs: BasePivotTable, CellText, Dimension; Output text block support (good for writing comments to the Viewer); Miscellaneous: GetWeightVar, HasCursor, SplitChange
  • 42. Recap: SPSS 14 introduced major programmability features; SPSS 15 adds: Reading and writing case data (new variables; new cases); Creating pivot tables and text blocks; Writing first-class SPSS procedures; Bonus Pack and Partial Least Squares modules illustrate these features; Developer Central improves ability to provide modules and information (will soon have four new SPSS 15 modules)
  • 43. Questions
  • 44. SPSS 15: The Revolution Continues: SPSS 15 programmability makes it easy to add capabilities beyond what is already built in to SPSS; SPSS 15 makes it easier to build complete applications on top of SPSS; SPSS 15 programmability makes you more productive; SPSS 15 has lots of other great features, too; Try it out
  • 45. Write to Me!

Editor's Notes

  1. Other new SPSS 14 features enhance programmability: multiple concurrent datasets; variable and file attributes; XML workspace and OMS enhancements.
  2. The PythonWin IDE is available from http://starship.python.net/crew/mhammond/win32/Downloads.html. There are many other choices for a Python IDE.
  3. Names that are phonetically equivalent have identical soundex values and identical nysiis values. The graphic highlights the surnames that are phonetically equivalent to Abercrombie.
  4. Jon Peck can now be reached at peck@us.ibm.com.