Time Series Analysis in Python with statsmodels

                   Wes McKinney (1), Josef Perktold (2), Skipper Seabold (3)

                   (1) Department of Statistical Science, Duke University
                   (2) Department of Economics, University of North Carolina at Chapel Hill
                   (3) Department of Economics, American University


                       10th Python in Science Conference, 13 July 2011



McKinney, Perktold, Seabold (statsmodels)        Python Time Series Analysis          SciPy Conference 2011   1 / 29
What is statsmodels?




          A library for statistical modeling, implementing standard statistical
          models in Python using NumPy and SciPy
          Includes:
                  Linear (regression) models of many forms
                  Descriptive statistics
                  Statistical tests
                  Time series analysis
                  ...and much more




What is Time Series Analysis?




          Statistical modeling of time-ordered data observations
          Inferring structure, forecasting and simulation, and testing
          distributional assumptions about the data
          Modeling dynamic relationships among multiple time series
          Broad applications e.g. in economics, finance, neuroscience, signal
          processing...




Talk Overview



          Brief update on statsmodels development
          Aside: user interface and data structures
          Descriptive statistics and tests
          Auto-regressive moving average models (ARMA)
          Vector autoregression (VAR) models
          Filtering tools (Hodrick-Prescott and others)
          Near future: Bayesian dynamic linear models (DLMs), ARCH /
          GARCH volatility models and beyond




Statsmodels development update



          We’re now on GitHub! Join us:

                         http://github.com/statsmodels/statsmodels

          Check out the slick Sphinx docs:

                                http://statsmodels.sourceforge.net

          Development focus has been largely computational, i.e. writing
          correct, tested implementations of all the common classes of
          statistical models




Statsmodels development update




          Major work to be done on providing a nice integrated user interface
          We must work together to close the gap between R and Python!
          Some important areas:
                  Formula framework, for specifying model design matrices
                  Need integrated rich statistical data structures (pandas)
                  Data visualization of results should always be a few keystrokes away
                  Write a “Statsmodels for R users” guide




Aside: statistical data structures and user interface



          While I have a captive audience...
          Controversial fact: pandas is the only Python library currently
          providing data structures matching (and in many places exceeding)
          the richness of R’s data structures (for statistics)
                  Let’s have a BoF session so I can justify this statement
          Feedback I hear is that end users find the fragmented, incohesive set
          of Python tools for data analysis and statistics to be confusing,
          frustrating, and certainly not compelling them to use Python...
                  (Not to mention the packaging headaches)




Aside: statistical data structures and user interface




          We need to “commit” ASAP (not 12 months from now) to a high
          level data structure(s) as the “primary data structure(s) for statistical
          data analysis” and communicate that clearly to end users
                  Or we might as well all start programming in R...




Example data: EEG trace data


          [Figure: EEG trace plotted as a time series]




Example data: Macroeconomic data


          [Figure: three stacked panels showing cpi, m1, and realgdp, 1960 to 2008]




Example data: Stock data


          [Figure: stock data series for AAPL, GOOG, MSFT, and YHOO, 2001 to 2009]




Descriptive statistics
            Autocorrelation, partial autocorrelation plots
            Commonly used for identification in ARMA(p,q) and ARIMA(p,d,q)
            models
            acf = tsa.acf(eeg, 50)
            pacf = tsa.pacf(eeg, 50)

          [Figure: autocorrelation (left) and partial autocorrelation (right) of the EEG series, lags 0 to 50]
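
As a rough illustration of what the values in an autocorrelation plot represent, the sketch below computes the sample autocorrelation function in plain Python. The function name `sample_acf` and the toy sine series are assumptions made for this example; it is not the statsmodels implementation.

```python
import math

def sample_acf(x, nlags):
    """Sample autocorrelation r_k = c_k / c_0 for k = 0..nlags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    acf = []
    for k in range(nlags + 1):
        # Autocovariance at lag k, normalized by n (the usual biased estimator)
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf

# Toy series (hypothetical): a sine wave with period 20
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
acf_values = sample_acf(series, 20)
```

For a periodic series like this one, the autocorrelation is strongly negative at half the period and strongly positive again at the full period, which is the kind of pattern the plots are used to spot.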

Statistical tests




          Ljung-Box test for zero autocorrelation
          Unit root test (Augmented Dickey-Fuller test), also used when testing for cointegration
          Granger-causality
          Whiteness (iid-ness) and normality
          See our conference paper (when the proceedings get published!)
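
To make the first test in the list concrete, here is a minimal sketch of the Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n − k) summed over lags k = 1..h, which is compared against a chi-square distribution with h degrees of freedom. The function name `ljung_box_q`, the hard-coded 5% critical value, and the toy series are assumptions for illustration only.

```python
import math

def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation at lag k
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# 5% critical value of the chi-square distribution with 5 degrees of freedom
CHI2_95_DF5 = 11.07

# Hypothetical, strongly autocorrelated series: a slow sine wave
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
q_stat = ljung_box_q(series, 5)
reject_zero_autocorrelation = q_stat > CHI2_95_DF5
```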




Autoregressive moving average (ARMA) models
          One of the most common univariate time series models:

                   $$y_t = \mu + a_1 y_{t-1} + \dots + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + \dots + b_q \epsilon_{t-q}$$

                   where $E(\epsilon_t \epsilon_s) = 0$ for $t \neq s$ and $\epsilon_t \sim N(0, \sigma^2)$


          Exact log-likelihood can be evaluated via the Kalman filter, but the
          “conditional” likelihood is easier and commonly used
          statsmodels has tools for simulating ARMA processes with known
          coefficients $a_i$, $b_i$ and also estimation given specified lag orders
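              import scikits.statsmodels.tsa.arima_process as ap
              # coefficients are lag polynomials including the zero lag,
              # so the AR entries carry the opposite sign of a_1, a_2 above
              ar_coef = [1, .75, -.25]
              ma_coef = [1, -.5]
              nobs = 100
              y = ap.arma_generate_sample(ar_coef, ma_coef, nobs)
              y += 4 # add in constant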
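
The recursion behind such a simulation can be sketched in plain Python directly from the ARMA equation. The helper `simulate_arma` is hypothetical and takes the coefficients in equation form (a_i, b_j) rather than as lag polynomials; pre-sample values are assumed to be zero, and the seed is arbitrary.

```python
import random

def simulate_arma(ar_params, ma_params, shocks, mu=0.0):
    """Generate y_t = mu + sum_i a_i y_{t-i} + e_t + sum_j b_j e_{t-j}.

    Pre-sample values of y and e are taken to be zero.
    """
    y = []
    for t, e_t in enumerate(shocks):
        value = mu + e_t
        for i, a in enumerate(ar_params, start=1):
            if t - i >= 0:
                value += a * y[t - i]
        for j, b in enumerate(ma_params, start=1):
            if t - j >= 0:
                value += b * shocks[t - j]
        y.append(value)
    return y

# Lag-polynomial inputs [1, .75, -.25] and [1, -.5] correspond to
# a_1 = -0.75, a_2 = 0.25 and b_1 = -0.5 in the equation form.
rng = random.Random(12345)
shocks = [rng.gauss(0.0, 1.0) for _ in range(100)]
y = simulate_arma([-0.75, 0.25], [-0.5], shocks)

# Response of a smaller ARMA(1, 1) to a single unit shock at t = 0
impulse = simulate_arma([0.5], [0.4], [1.0, 0.0, 0.0, 0.0])
```

Feeding in a single unit shock, as in the last line, is a quick way to check the recursion by hand: the response is 1, then a_1 + b_1, then decays geometrically at rate a_1.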

ARMA Estimation



          Several likelihood-based estimators implemented (see docs)
              model = tsa.ARMA(y)
              result = model.fit(order=(2, 1), trend='c',
                                 method='css-mle', disp=-1)
              result.params
              # array([ 3.97, -0.97, -0.05, -0.13])


          Standard model diagnostics, standard errors, information criteria
          (AIC, BIC, ...), etc. are available in the returned ARMAResults object
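
The "conditional" likelihood mentioned earlier is built on the conditional sum of squares (CSS), which the `css` in the method name above presumably refers to. A minimal sketch for an ARMA(1, 1), assuming the pre-sample shock is zero; the function `arma11_css` and the noise-free toy data are hypothetical.

```python
def arma11_css(y, mu, a1, b1):
    """Conditional sum of squares for an ARMA(1, 1) with e_0 set to zero.

    Residuals follow e_t = y_t - mu - a1 * y_{t-1} - b1 * e_{t-1}.
    """
    e_prev = 0.0
    css = 0.0
    for t in range(1, len(y)):
        e_t = y[t] - mu - a1 * y[t - 1] - b1 * e_prev
        css += e_t * e_t
        e_prev = e_t
    return css

# Hypothetical data generated without noise from y_t = 1 + 0.5 * y_{t-1}
y = [0.0]
for _ in range(20):
    y.append(1.0 + 0.5 * y[-1])

css_true = arma11_css(y, mu=1.0, a1=0.5, b1=0.0)   # parameters that generated y
css_wrong = arma11_css(y, mu=1.0, a1=0.3, b1=0.0)  # misspecified AR coefficient
```

An estimator based on this idea searches for the parameters that minimize the CSS; here the generating parameters reproduce the data exactly, so their CSS is zero.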




Vector Autoregression (VAR) models



          Widely used model for modeling multiple (K-variate) time series,
          especially in macroeconomics:

                           $$Y_t = A_1 Y_{t-1} + \dots + A_p Y_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \Sigma)$$

          Matrices $A_i$ are $K \times K$.
          $Y_t$ must be a stationary process (sometimes achieved by
          differencing). Related class of models (VECM) for modeling
          nonstationary (including cointegrated) processes
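
For the simplest case, a bivariate VAR(1), the stationarity requirement can be checked by hand: the process is stable when every eigenvalue of A_1 lies strictly inside the unit circle. The sketch below assumes a 2 x 2 coefficient matrix; both example matrices and the helper names are hypothetical.

```python
import cmath

def eigenvalues_2x2(a):
    """Eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def is_stable_var1(a):
    """True when every eigenvalue of A_1 has modulus below one."""
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(a))

A_stable = [[0.5, 0.1], [0.2, 0.3]]      # hypothetical coefficients
A_unit_root = [[1.0, 0.0], [0.0, 0.5]]   # first variable is a random walk

stable_case = is_stable_var1(A_stable)
unit_root_case = is_stable_var1(A_unit_root)
```

The second matrix fails the check because its first variable follows a random walk; differencing that variable is the usual remedy mentioned above.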




Vector Autoregression (VAR) models

   >>> model = VAR(data); model.select_order(8)
                    VAR Order Selection
   =====================================================
              aic          bic          fpe         hqic
   -----------------------------------------------------
   0       -27.83       -27.78    8.214e-13       -27.81
   1       -28.77       -28.57    3.189e-13       -28.69
   2       -29.00      -28.64*    2.556e-13       -28.85
   3       -29.10       -28.60    2.304e-13      -28.90*
   4       -29.09       -28.43    2.330e-13       -28.82
   5       -29.13       -28.33    2.228e-13       -28.81
   6      -29.14*       -28.18   2.213e-13*       -28.75
   7       -29.07       -27.96    2.387e-13       -28.62
   =====================================================
   * Minimum
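
Reading the table amounts to taking, for each criterion, the lag order with the smallest value. A small sketch using the numbers transcribed from the table above (the dictionary layout is just for illustration):

```python
# Values transcribed from the order selection table, lag orders 0 to 7
criteria = {
    "aic":  [-27.83, -28.77, -29.00, -29.10, -29.09, -29.13, -29.14, -29.07],
    "bic":  [-27.78, -28.57, -28.64, -28.60, -28.43, -28.33, -28.18, -27.96],
    "fpe":  [8.214e-13, 3.189e-13, 2.556e-13, 2.304e-13,
             2.330e-13, 2.228e-13, 2.213e-13, 2.387e-13],
    "hqic": [-27.81, -28.69, -28.85, -28.90, -28.82, -28.81, -28.75, -28.62],
}

# For each criterion, the selected order is the lag with the smallest value
selected = {name: min(range(len(vals)), key=vals.__getitem__)
            for name, vals in criteria.items()}
```

The criteria disagree here: AIC and FPE favor the longest of the well-fitting models, while BIC and HQIC penalize extra lags more heavily and pick shorter ones.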

Vector Autoregression (VAR) models

   >>> result = model.fit(2)
   >>> result.summary() # print summary for each variable
   <snip>
   Results for equation m1
   ====================================================
               coefficient    std. error t-stat    prob
   ----------------------------------------------------
   const          0.004968      0.001850   2.685 0.008
   L1.m1          0.363636      0.071307   5.100 0.000
   L1.realgdp    -0.077460      0.092975 -0.833 0.406
   L1.cpi        -0.052387      0.128161 -0.409 0.683
   L2.m1          0.250589      0.072050   3.478 0.001
   L2.realgdp    -0.085874      0.092032 -0.933 0.352
   L2.cpi         0.169803      0.128376   1.323 0.188
   ====================================================
   <snip>


Vector Autoregression (VAR) models




   >>> result = model.fit(2)
   >>> result.summary() # print summary for each variable
   <snip>
   Correlation matrix of residuals
                    m1   realgdp       cpi
   m1         1.000000 -0.055690 -0.297494
   realgdp   -0.055690  1.000000  0.115597
   cpi       -0.297494  0.115597  1.000000




VAR: Impulse Response analysis
          Analyze systematic impact of unit “shock” to a single variable

   irf = result.irf(10)
   irf.plot()

          [Figure: impulse responses over 10 periods, one panel per pair: m1 → m1, realgdp → m1, cpi → m1, m1 → realgdp, realgdp → realgdp, cpi → realgdp, m1 → cpi, realgdp → cpi, cpi → cpi]
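
The quantities behind such a plot can be sketched with the standard moving-average recursion for a VAR(p): Φ_0 = I and Φ_i = Σ_{j=1..min(i,p)} Φ_{i−j} A_j. The code below computes non-orthogonalized responses in plain Python; the helper names and the coefficient matrix are hypothetical, and this is not the statsmodels implementation.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(a, b):
    """Elementwise sum of two matrices stored as nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def impulse_responses(coefs, periods):
    """Non-orthogonalized responses Phi_0..Phi_periods of a VAR(p).

    Phi_i[r][c] is the response of variable r, i periods after a unit
    shock to variable c.
    """
    k = len(coefs[0])
    identity = [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]
    phis = [identity]
    for i in range(1, periods + 1):
        phi = [[0.0] * k for _ in range(k)]
        for j in range(1, min(i, len(coefs)) + 1):
            phi = matadd(phi, matmul(phis[i - j], coefs[j - 1]))
        phis.append(phi)
    return phis

# Hypothetical bivariate VAR(1) coefficient matrix
A1 = [[0.5, 0.1], [0.0, 0.4]]
phis = impulse_responses([A1], 10)
```

For a VAR(1) the recursion reduces to Φ_i = A_1^i, so each panel of an impulse response plot traces one entry of the successive matrix powers.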



VAR: Forecast Error Variance Decomposition
          Analyze contribution of each variable to forecasting error

   fevd = result.fevd(20)
   fevd.plot()

          [Figure: forecast error variance decomposition (FEVD) over 20 periods, one panel each for m1, realgdp, and cpi, showing the shares attributed to m1, realgdp, and cpi]
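
A minimal sketch of how such a decomposition can be computed for a bivariate VAR(1), assuming orthogonalized shocks obtained from a Cholesky factor P of the residual covariance (Σ = P P'). The share of variable j's h-step forecast error variance due to shock k is the accumulated squared orthogonalized response divided by the row total. All names and numbers below are hypothetical, and the result depends on the ordering of the variables.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cholesky_2x2(s):
    """Lower-triangular P with P P' = s for a 2 x 2 covariance matrix."""
    l11 = math.sqrt(s[0][0])
    l21 = s[1][0] / l11
    l22 = math.sqrt(s[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def fevd_var1(a1, sigma, horizon):
    """Row j holds the shares of variable j's forecast error variance."""
    p = cholesky_2x2(sigma)
    phi = [[1.0, 0.0], [0.0, 1.0]]          # Phi_0 = I
    contrib = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(horizon):
        theta = matmul(phi, p)               # orthogonalized response Theta_i
        for j in range(2):
            for c in range(2):
                contrib[j][c] += theta[j][c] ** 2
        phi = matmul(phi, a1)                # Phi_{i+1} = Phi_i A1 for a VAR(1)
    return [[v / sum(row) for v in row] for row in contrib]

A1 = [[0.5, 0.1], [0.0, 0.4]]                # hypothetical coefficients
sigma = [[1.0, 0.3], [0.3, 1.0]]             # hypothetical residual covariance
fevd_h1 = fevd_var1(A1, sigma, 1)
fevd_h20 = fevd_var1(A1, sigma, 20)
```

At horizon 1 the first variable in the ordering is explained entirely by its own shock, which is a direct consequence of the triangular factor.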



VAR: Statistical tests



   In [137]: result.test_causality('m1', ['cpi', 'realgdp'])
   Granger causality f-test
   =========================================================
      Test statistic   Critical Value      p-value        df
   ---------------------------------------------------------
            1.248787         2.387325        0.289 (4, 579)
   =========================================================
   H_0: ['cpi', 'realgdp'] do not Granger-cause m1
   Conclusion: fail to reject H_0 at 5.00% significance level




Filtering

          Hodrick-Prescott (HP) filter separates a time series $y_t$ into a trend $\tau_t$
          and a cyclical component $\zeta_t$, so that $y_t = \tau_t + \zeta_t$.

          [Figure: inflation with its HP-filter cyclical and trend components, 1962 to 2006]
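
The HP trend is usually defined as the minimizer of Σ (y_t − τ_t)² + λ Σ (τ_{t+1} − 2τ_t + τ_{t−1})², which leads to the linear system (I + λ D'D) τ = y, where D is the second-difference matrix. The sketch below solves that system with dense Gaussian elimination in plain Python, so it is only suitable for short series; the function name `hp_filter` is hypothetical, and λ = 1600 is the conventional choice for quarterly data.

```python
def hp_filter(y, lamb=1600.0):
    """Return (cycle, trend) by solving (I + lamb * D'D) trend = y.

    D is the (n - 2) x n second-difference matrix.
    """
    n = len(y)
    # Build I + lamb * D'D as a dense matrix, one row of D at a time
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n - 2):
        row = {i: 1.0, i + 1: -2.0, i + 2: 1.0}
        for r, vr in row.items():
            for c, vc in row.items():
                m[r][c] += lamb * vr * vc
    # Gaussian elimination with partial pivoting on the augmented system
    aug = [m[r] + [float(y[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = (aug[r][n] - s) / aug[r][r]
    cycle = [yv - tv for yv, tv in zip(y, trend)]
    return cycle, trend

# Hypothetical series: a pure linear trend, which the filter leaves untouched
series = [2.0 + 0.5 * t for t in range(30)]
cycle, trend = hp_filter(series)
```

A straight line has zero second differences, so it incurs no smoothness penalty and is returned as the trend with a cycle of (numerically) zero.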

Filtering

          In addition to the HP filter, 2 other filters popular in finance and
          economics, Baxter-King and Christiano-Fitzgerald, are available
          We refer you to our paper and the documentation for details on these:

          [Figure: inflation (INFL) and unemployment (UNEMP), Baxter-King filtered (left panel) and Christiano-Fitzgerald filtered (right panel)]



Preview: Bayesian dynamic linear models (DLM)



          A state space model by another name:

                                      $$y_t = F_t \theta_t + \nu_t, \quad \nu_t \sim N(0, V_t)$$
                                      $$\theta_t = G \theta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W_t)$$

          Estimation of basic model by Kalman filter recursions. Provides
          elegant way to do time-varying linear regressions for forecasting
          Extensions: multivariate DLMs, stochastic volatility (SV) models,
          MCMC-based posterior sampling, mixtures of DLMs
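
The Kalman filter recursions are easiest to see in the scalar "local level" special case of the model above, with F_t = 1, G = 1 and constant variances V and W. This is only a sketch under those assumptions; the function name, the prior values, and the data are hypothetical.

```python
def local_level_filter(ys, m0, c0, v, w):
    """Kalman filter for y_t = theta_t + nu_t, theta_t = theta_{t-1} + omega_t.

    Special case of the DLM with F_t = 1, G = 1, V_t = v, W_t = w.
    Returns the filtered means and variances of theta_t.
    """
    m, c = m0, c0
    means, variances = [], []
    for y in ys:
        a = m             # prior mean:        a_t = G m_{t-1}
        r = c + w         # prior variance:    R_t = G C_{t-1} G' + W_t
        f = a             # forecast mean:     f_t = F_t a_t
        q = r + v         # forecast variance: Q_t = F_t R_t F_t' + V_t
        gain = r / q      # adaptive coefficient (Kalman gain)
        m = a + gain * (y - f)
        c = r - gain * gain * q
        means.append(m)
        variances.append(c)
    return means, variances

observations = [1.0, 1.2, 0.9, 1.1, 1.0]   # hypothetical data
means, variances = local_level_filter(observations, m0=0.0, c0=100.0, v=1.0, w=0.1)

# With no prior uncertainty and no state noise, the filter never updates
frozen_means, _ = local_level_filter([5.0, 7.0], m0=1.0, c0=0.0, v=1.0, w=0.0)
```

With a vague prior (large `c0`) the first filtered mean lands almost on the first observation, and the posterior variance shrinks as more data arrive.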




Preview: DLM Example (Constant+Trend model)

   model = Polynomial(2)
   dlm = DLM(close_px['AAPL'], model.F, G=model.G, # model
             m0=m0, C0=C0, n0=n0, s0=s0, # priors
             state_discount=.95) # discount factor
          [Figure: Constant + Trend DLM, Nov 2008 to Nov 2009]

Preview: Stochastic volatility models


          [Figure: JPY-USD exchange rate volatility process, observations 0 to 1000]



Future: sandbox and beyond




          ARCH / GARCH models for volatility
          Structural VAR and error correction models (ECM) for cointegrated
          processes
          Models with non-normally distributed errors
          Better data description, visualization, and interactive research tools
          More sophisticated Bayesian time series models




Conclusions




          We’ve implemented many foundational models for time series
          analysis, but the field is very broad
          User interface can and should be much improved
          Repo: http://github.com/statsmodels/statsmodels
          Docs: http://statsmodels.sourceforge.net
          Contact: pystatsmodels@googlegroups.com




Time series database, InfluxDB & PHP
 
ForecastIT 4. Holt's Exponential Smoothing
ForecastIT 4. Holt's Exponential SmoothingForecastIT 4. Holt's Exponential Smoothing
ForecastIT 4. Holt's Exponential Smoothing
 

Ähnlich wie Scipy 2011 Time Series Analysis in Python

Antao Biopython Bosc2008
Antao Biopython Bosc2008Antao Biopython Bosc2008
Antao Biopython Bosc2008
bosc_2008
 
Colored petri nets theory and applications
Colored petri nets theory and applicationsColored petri nets theory and applications
Colored petri nets theory and applications
Abu Hussein
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
guest5929fa7
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
baoilleach
 

Ähnlich wie Scipy 2011 Time Series Analysis in Python (20)

Antao Biopython Bosc2008
Antao Biopython Bosc2008Antao Biopython Bosc2008
Antao Biopython Bosc2008
 
BOSC 2008 Biopython
BOSC 2008 BiopythonBOSC 2008 Biopython
BOSC 2008 Biopython
 
Python For Scientists
Python For ScientistsPython For Scientists
Python For Scientists
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
人工知能の基本問題:これまでとこれから
人工知能の基本問題:これまでとこれから人工知能の基本問題:これまでとこれから
人工知能の基本問題:これまでとこれから
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
Sci computing using python
Sci computing using pythonSci computing using python
Sci computing using python
 
Automated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform DesignsAutomated Machine Learning via Sequential Uniform Designs
Automated Machine Learning via Sequential Uniform Designs
 
Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0Introduction to Apache Hivemall v0.5.0
Introduction to Apache Hivemall v0.5.0
 
Python Orientation
Python OrientationPython Orientation
Python Orientation
 
Polyglot metadata for Hadoop
Polyglot metadata for HadoopPolyglot metadata for Hadoop
Polyglot metadata for Hadoop
 
Colored petri nets theory and applications
Colored petri nets theory and applicationsColored petri nets theory and applications
Colored petri nets theory and applications
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
Python for Chemistry
Python for ChemistryPython for Chemistry
Python for Chemistry
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
 
Crude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimizationCrude-Oil Scheduling Technology: moving from simulation to optimization
Crude-Oil Scheduling Technology: moving from simulation to optimization
 
Ibmr 2014
Ibmr 2014Ibmr 2014
Ibmr 2014
 
An Overview of Python for Data Analytics
An Overview of Python for Data AnalyticsAn Overview of Python for Data Analytics
An Overview of Python for Data Analytics
 
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
CMSI計算科学技術特論C (2015) ALPS と量子多体問題②
 

Mehr von Wes McKinney

Mehr von Wes McKinney (20)

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Apache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data FrameworkApache Arrow: High Performance Columnar Data Framework
Apache Arrow: High Performance Columnar Data Framework
 
New Directions for Apache Arrow
New Directions for Apache ArrowNew Directions for Apache Arrow
New Directions for Apache Arrow
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data TransportApache Arrow Flight: A New Gold Standard for Data Transport
Apache Arrow Flight: A New Gold Standard for Data Transport
 
ACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data FramesACM TechTalks : Apache Arrow and the Future of Data Frames
ACM TechTalks : Apache Arrow and the Future of Data Frames
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
 
Apache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics StackApache Arrow: Leveling Up the Analytics Stack
Apache Arrow: Leveling Up the Analytics Stack
 
Apache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS Session
 
Apache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science StackApache Arrow: Leveling Up the Data Science Stack
Apache Arrow: Leveling Up the Data Science Stack
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
PyCon.DE / PyData Karlsruhe keynote: "Looking backward, looking forward"
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Shared Infrastructure for Data Science
Shared Infrastructure for Data ScienceShared Infrastructure for Data Science
Shared Infrastructure for Data Science
 
Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)Data Science Without Borders (JupyterCon 2017)
Data Science Without Borders (JupyterCon 2017)
 
Memory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine LearningMemory Interoperability in Analytics and Machine Learning
Memory Interoperability in Analytics and Machine Learning
 

Kürzlich hochgeladen

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Scipy 2011 Time Series Analysis in Python

  • 1. Time Series Analysis in Python with statsmodels. Wes McKinney (1), Josef Perktold (2), Skipper Seabold (3). (1) Department of Statistical Science, Duke University; (2) Department of Economics, University of North Carolina at Chapel Hill; (3) Department of Economics, American University. 10th Python in Science Conference, 13 July 2011. McKinney, Perktold, Seabold (statsmodels) Python Time Series Analysis SciPy Conference 2011 1 / 29
  • 2. What is statsmodels? A library for statistical modeling, implementing standard statistical models in Python using NumPy and SciPy. Includes: linear (regression) models of many forms; descriptive statistics; statistical tests; time series analysis; ...and much more.
  • 3. What is Time Series Analysis? Statistical modeling of time-ordered data observations. Inferring structure, forecasting and simulation, and testing distributional assumptions about the data. Modeling dynamic relationships among multiple time series. Broad applications, e.g. in economics, finance, neuroscience, signal processing...
  • 4. Talk Overview: brief update on statsmodels development; aside on user interface and data structures; descriptive statistics and tests; autoregressive moving average (ARMA) models; vector autoregression (VAR) models; filtering tools (Hodrick-Prescott and others); near future: Bayesian dynamic linear models (DLMs), ARCH / GARCH volatility models, and beyond.
  • 5. Statsmodels development update. We’re now on GitHub (http://github.com/statsmodels/statsmodels); join us! Check out the slick Sphinx docs (http://statsmodels.sourceforge.net). Development focus has been largely computational, i.e. writing correct, tested implementations of all the common classes of statistical models.
  • 6. Statsmodels development update. Major work remains to be done on providing a nice integrated user interface. We must work together to close the gap between R and Python! Some important areas: a formula framework for specifying model design matrices; integrated rich statistical data structures (pandas); data visualization of results, which should always be a few keystrokes away; a “Statsmodels for R users” guide.
  • 7. Aside: statistical data structures and user interface. While I have a captive audience... Controversial fact: pandas is the only Python library currently providing data structures matching (and in many places exceeding) the richness of R’s data structures (for statistics). Let’s have a BoF session so I can justify this statement. Feedback I hear is that end users find the fragmented, incohesive set of Python tools for data analysis and statistics to be confusing, frustrating, and certainly not compelling them to use Python... (Not to mention the packaging headaches.)
  • 8. Aside: statistical data structures and user interface. We need to “commit” ASAP (not 12 months from now) to a high-level data structure(s) as the “primary data structure(s) for statistical data analysis” and communicate that clearly to end users. Or we might as well all start programming in R...
  • 9. Example data: EEG trace data. [Figure: EEG trace.]
  • 10. Example data: Macroeconomic data. [Figure: three stacked panels showing cpi, m1, and realgdp, 1960 to 2008.]
  • 11. Example data: Stock data. [Figure: series for AAPL, GOOG, MSFT, and YHOO, 2001 to 2009.]
  • 12. Descriptive statistics. Autocorrelation and partial autocorrelation plots, commonly used for identification in ARMA(p,q) and ARIMA(p,d,q) models:
        acf = tsa.acf(eeg, 50)
        pacf = tsa.pacf(eeg, 50)
      [Figure: two panels, "Autocorrelation" and "Partial Autocorrelation", for lags 0 to 50.]
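To illustrate what the autocorrelation values on this slide measure, the sample autocorrelation can be computed directly with NumPy. This is a minimal sketch, not the statsmodels implementation; the helper name `sample_acf` and the simulated AR(1) input (standing in for the `eeg` series) are assumptions made for the example.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # work with deviations from the mean
    denom = np.dot(x, x)                  # lag-0 sum of squares
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)]
    )

# Hypothetical input: an AR(1) series with coefficient 0.8.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

acf_values = sample_acf(ar1, 10)          # lag 0 is 1 by construction
```

For an AR(1) process the autocorrelation decays geometrically with the lag, which is the kind of pattern these plots are used to spot when choosing ARMA orders.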
  • 13. Statistical tests. Ljung-Box test for zero autocorrelation; unit root test for cointegration (Augmented Dickey-Fuller test); Granger-causality; whiteness (iid-ness) and normality. See our conference paper (when the proceedings get published!).
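As a hedged illustration of the first test in this list, the Ljung-Box statistic is Q = n(n+2) Σ_{k=1..h} ρ_k² / (n−k), where ρ_k is the lag-k sample autocorrelation. The sketch below computes Q with NumPy only; the function name `ljung_box_q`, the simulated series, and the hard-coded critical value are assumptions for the example, and no p-value is computed.

```python
import numpy as np

def ljung_box_q(x, nlags):
    """Ljung-Box Q statistic over lags 1..nlags."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = np.arange(1, nlags + 1)
    rho = np.array([np.dot(x[:-k], x[k:]) / denom for k in lags])
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))

# Hypothetical input: a strongly autocorrelated AR(1) series.
rng = np.random.default_rng(1)
shocks = rng.standard_normal(500)
persistent = np.zeros(500)
for t in range(1, 500):
    persistent[t] = 0.8 * persistent[t - 1] + shocks[t]

CHI2_95_DF10 = 18.307                     # 95% quantile of chi-square, 10 df
q_stat = ljung_box_q(persistent, 10)
reject_no_autocorrelation = q_stat > CHI2_95_DF10
```

Under the null of no autocorrelation in a raw series, Q is approximately chi-square with `nlags` degrees of freedom, so a value far above the critical value points to serial dependence.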
  • 14. Autoregressive moving average (ARMA) models. One of the most common univariate time series models:
        $y_t = \mu + a_1 y_{t-1} + \dots + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + \dots + b_q \epsilon_{t-q}$
      where $E(\epsilon_t, \epsilon_s) = 0$ for $t \neq s$ and $\epsilon_t \sim N(0, \sigma^2)$. The exact log-likelihood can be evaluated via the Kalman filter, but the “conditional” likelihood is easier and commonly used. statsmodels has tools for simulating ARMA processes with known coefficients $a_i$, $b_i$, and also for estimation given specified lag orders:
        import scikits.statsmodels.tsa.arima_process as ap
        ar_coef = [1, .75, -.25]; ma_coef = [1, -.5]
        nobs = 100
        y = ap.arma_generate_sample(ar_coef, ma_coef, nobs)
        y += 4  # add in constant
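The ARMA recursion on this slide can be simulated by hand, which may help make the roles of the coefficients concrete. This is a sketch under stated assumptions: `simulate_arma`, the burn-in length, and the coefficient values are all illustrative and not taken from statsmodels. The coefficients here are the a_i and b_i exactly as written in the equation, whereas `arma_generate_sample` takes lag-polynomial coefficients with a leading 1, so the two sets of numbers should not be compared sign for sign.

```python
import numpy as np

def simulate_arma(a, b, nobs, sigma=1.0, burnin=200, seed=0):
    """Simulate a zero-mean ARMA(p, q) series by direct recursion."""
    rng = np.random.default_rng(seed)
    p, q = len(a), len(b)
    n = nobs + burnin
    eps = rng.normal(0.0, sigma, size=n)
    y = np.zeros(n)
    for t in range(n):
        # Terms reaching before the start of the sample are treated as zero.
        ar_part = sum(a[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(b[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        y[t] = ar_part + eps[t] + ma_part
    return y[burnin:]                     # drop the start-up transient

# Illustrative stationary ARMA(2, 1), shifted by a constant as on the slide.
y_sim = simulate_arma(a=[0.75, -0.25], b=[-0.5], nobs=100) + 4.0
```

Discarding a burn-in period is one simple way to reduce the influence of the zero starting values on the returned sample.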
  • 15. ARMA Estimation. Several likelihood-based estimators are implemented (see docs):
        model = tsa.ARMA(y)
        result = model.fit(order=(2, 1), trend='c', method='css-mle', disp=-1)
        result.params
        # array([ 3.97, -0.97, -0.05, -0.13])
      Standard model diagnostics, standard errors, information criteria (AIC, BIC, ...), etc. are available in the returned ARMAResults object.
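For a pure AR(p) model with Gaussian errors, maximizing the conditional likelihood reduces to least squares on lagged values, which gives a feel for why the conditional approach is considered easier. The sketch below shows only that special case; it is not the `css-mle` estimator used on the slide, and the function name `fit_ar_conditional` and the simulated data are assumptions.

```python
import numpy as np

def fit_ar_conditional(y, p):
    """Conditional least squares for an AR(p) model with a constant."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Columns: constant, y_{t-1}, ..., y_{t-p}; rows correspond to t = p..n-1.
    X = np.column_stack(
        [np.ones(n - p)] + [y[p - k : n - k] for k in range(1, p + 1)]
    )
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / len(target)  # innovation variance estimate
    return coef, sigma2

# Hypothetical data: AR(2) with intercept 1.0 and coefficients 0.5, -0.3.
rng = np.random.default_rng(42)
n = 2000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 1.0 + 0.5 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

coef, sigma2 = fit_ar_conditional(y, p=2)  # coef = [const, a_1, a_2]
```

Once moving-average terms are present the residuals depend nonlinearly on the parameters, so a numerical optimizer is needed instead of a single least-squares solve.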
  • 16. Vector Autoregression (VAR) models. Widely used model for modeling multiple ($K$-variate) time series, especially in macroeconomics:
        $Y_t = A_1 Y_{t-1} + \dots + A_p Y_{t-p} + \epsilon_t, \quad \epsilon_t \sim N(0, \Sigma)$
      The matrices $A_i$ are $K \times K$. $Y_t$ must be a stationary process (sometimes achieved by differencing). A related class of models (VECM) is used for modeling nonstationary (including cointegrated) processes.
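A VAR(p) of this form can be estimated equation by equation with ordinary least squares on the stacked lags. The following is a minimal NumPy sketch under that assumption; `fit_var_ols`, the added intercept, and the simulated two-variable system are illustrative and are not the statsmodels `VAR` class.

```python
import numpy as np

def fit_var_ols(Y, p):
    """OLS estimate of a VAR(p) with intercept. Y has shape (T, K)."""
    Y = np.asarray(Y, dtype=float)
    T, K = Y.shape
    lags = [Y[p - k : T - k] for k in range(1, p + 1)]   # Y_{t-1}, ..., Y_{t-p}
    X = np.hstack([np.ones((T - p, 1))] + lags)          # (T-p, 1 + K*p)
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)        # (1 + K*p, K)
    resid = Y[p:] - X @ B
    intercept = B[0]
    # Each block of B holds A_k transposed, since rows of X are Y_{t-k}'.
    A = [B[1 + k * K : 1 + (k + 1) * K].T for k in range(p)]
    sigma = resid.T @ resid / (T - p - K * p - 1)        # residual covariance
    return intercept, A, sigma

# Hypothetical data: a stationary bivariate VAR(1).
rng = np.random.default_rng(7)
A1_true = np.array([[0.5, 0.1], [0.0, 0.3]])
T = 3000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A1_true @ Y[t - 1] + rng.standard_normal(2)

intercept, A_hat, sigma_hat = fit_var_ols(Y, p=1)
```

With enough observations `A_hat[0]` should land close to `A1_true`, and `sigma_hat` close to the identity covariance used in the simulation.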
  • 17. Vector Autoregression (VAR) models
        >>> model = VAR(data); model.select_order(8)
                         VAR Order Selection
        =====================================================
                   aic          bic          fpe         hqic
        -----------------------------------------------------
        0       -27.83       -27.78    8.214e-13       -27.81
        1       -28.77       -28.57    3.189e-13       -28.69
        2       -29.00      -28.64*    2.556e-13       -28.85
        3       -29.10       -28.60    2.304e-13      -28.90*
        4       -29.09       -28.43    2.330e-13       -28.82
        5       -29.13       -28.33    2.228e-13       -28.81
        6      -29.14*       -28.18   2.213e-13*       -28.75
        7       -29.07       -27.96    2.387e-13       -28.62
        =====================================================
        * Minimum
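A table like this can be approximated by fitting the VAR at each candidate order on a common sample and penalizing the log-determinant of the residual covariance. The sketch below uses one common textbook form of AIC and BIC; the exact definitions behind `select_order` may differ, FPE and HQIC are omitted, and the function name and simulated data are assumptions.

```python
import numpy as np

def var_information_criteria(Y, maxlags):
    """AIC and BIC for VAR orders 0..maxlags, all fitted on the same sample."""
    Y = np.asarray(Y, dtype=float)
    T_full, K = Y.shape
    T = T_full - maxlags                  # common estimation sample size
    target = Y[maxlags:]
    table = {}
    for p in range(maxlags + 1):
        lags = [Y[maxlags - k : T_full - k] for k in range(1, p + 1)]
        X = np.hstack([np.ones((T, 1))] + lags)
        B, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ B
        sigma_ml = resid.T @ resid / T    # ML-style covariance estimate
        _, logdet = np.linalg.slogdet(sigma_ml)
        n_params = K * (K * p + 1)        # lag coefficients plus intercepts
        table[p] = {
            "aic": logdet + 2.0 * n_params / T,
            "bic": logdet + np.log(T) * n_params / T,
        }
    return table

# Hypothetical data: a bivariate VAR(1).
rng = np.random.default_rng(3)
A1_true = np.array([[0.5, 0.1], [0.0, 0.3]])
Y = np.zeros((1000, 2))
for t in range(1, 1000):
    Y[t] = A1_true @ Y[t - 1] + rng.standard_normal(2)

criteria = var_information_criteria(Y, maxlags=4)
best_bic = min(criteria, key=lambda p: criteria[p]["bic"])
```

As in the slide's output, different criteria can disagree: BIC penalizes extra lags more heavily than AIC and so tends to choose smaller orders.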
  • 18. Vector Autoregression (VAR) models
        >>> result = model.fit(2)
        >>> result.summary()  # print summary for each variable
        <snip>
        Results for equation m1
        ====================================================
                     coefficient  std. error  t-stat    prob
        ----------------------------------------------------
        const           0.004968    0.001850   2.685   0.008
        L1.m1           0.363636    0.071307   5.100   0.000
        L1.realgdp     -0.077460    0.092975  -0.833   0.406
        L1.cpi         -0.052387    0.128161  -0.409   0.683
        L2.m1           0.250589    0.072050   3.478   0.001
        L2.realgdp     -0.085874    0.092032  -0.933   0.352
        L2.cpi          0.169803    0.128376   1.323   0.188
        ====================================================
        <snip>
  • 19. Vector Autoregression (VAR) models
        >>> result = model.fit(2)
        >>> result.summary()  # print summary for each variable
        <snip>
        Correlation matrix of residuals
                         m1    realgdp        cpi
        m1         1.000000  -0.055690  -0.297494
        realgdp   -0.055690   1.000000   0.115597
        cpi       -0.297494   0.115597   1.000000
  • 20. VAR: Impulse Response analysis. Analyze the systematic impact of a unit “shock” to a single variable:
        irf = result.irf(10)
        irf.plot()
      [Figure: "Impulse responses", a 3 × 3 grid of panels over horizons 0 to 10: m1 → m1, realgdp → m1, cpi → m1; m1 → realgdp, realgdp → realgdp, cpi → realgdp; m1 → cpi, realgdp → cpi, cpi → cpi.]
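The responses to a unit shock follow from the VAR coefficient matrices through the recursion Φ_0 = I, Φ_i = Σ_{j=1..min(i,p)} Φ_{i−j} A_j. The sketch below implements that recursion for non-orthogonalized responses; the function name `impulse_responses` and the coefficient matrix are assumptions for illustration, not values from the fitted model on the slides.

```python
import numpy as np

def impulse_responses(A, horizon):
    """Non-orthogonalized impulse responses Phi_0..Phi_horizon of a VAR.

    A is a list of (K, K) coefficient matrices [A_1, ..., A_p].
    Entry [i, r, c] is the response of variable r, i periods after a
    unit shock to variable c.
    """
    K = A[0].shape[0]
    p = len(A)
    phi = [np.eye(K)]
    for i in range(1, horizon + 1):
        phi.append(sum(phi[i - j] @ A[j - 1] for j in range(1, min(i, p) + 1)))
    return np.array(phi)

# Hypothetical VAR(1) coefficients.
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
irf = impulse_responses([A1], horizon=10)   # shape (11, 2, 2)
```

For a VAR(1) the recursion collapses to matrix powers, so `irf[i]` equals `A1` raised to the power `i`; for a stationary system the responses die out as the horizon grows.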
  • 21. VAR: Forecast Error Variance Decomposition. Analyze the contribution of each variable to the forecasting error:
        fevd = result.fevd(20)
        fevd.plot()
      [Figure: "Forecast error variance decomposition (FEVD)", three panels (m1, realgdp, cpi) over horizons 0 to 20, each showing the shares attributed to m1, realgdp, and cpi.]
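One standard way to build such a decomposition is to orthogonalize the shocks with a Cholesky factor of the residual covariance and accumulate squared responses over the forecast horizon. The sketch below assumes that construction, which makes the result depend on the ordering of the variables; `fevd_shares` and the input matrices are hypothetical and this is not claimed to match the statsmodels output exactly.

```python
import numpy as np

def fevd_shares(A, sigma_u, horizon):
    """Forecast error variance shares using Cholesky-orthogonalized shocks.

    Returns an array of shape (horizon, K, K); entry [h - 1, k, j] is the
    share of variable k's h-step forecast error variance due to shock j.
    """
    K = sigma_u.shape[0]
    P = np.linalg.cholesky(sigma_u)       # lower-triangular factor
    phi = [np.eye(K)]
    for i in range(1, horizon):
        phi.append(
            sum(phi[i - j] @ A[j - 1] for j in range(1, min(i, len(A)) + 1))
        )
    theta = np.array([m @ P for m in phi])        # orthogonalized responses
    cum = np.cumsum(theta ** 2, axis=0)           # accumulate over horizons
    return cum / cum.sum(axis=2, keepdims=True)   # normalize each row to 1

# Hypothetical VAR(1) coefficients and residual covariance.
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
sigma_u = np.array([[1.0, 0.3], [0.3, 1.0]])
decomp = fevd_shares([A1], sigma_u, horizon=20)
```

Each row of `decomp[h - 1]` sums to one, and with this ordering the first variable's one-step forecast error is attributed entirely to its own shock.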
  • 22. VAR: Statistical tests
        In [137]: result.test_causality('m1', ['cpi', 'realgdp'])
        Granger causality f-test
        =========================================================
        Test statistic   Critical Value   p-value          df
        ---------------------------------------------------------
              1.248787         2.387325     0.289    (4, 579)
        =========================================================
        H_0: ['cpi', 'realgdp'] do not Granger-cause m1
        Conclusion: fail to reject H_0 at 5.00% significance level
  • 23. Filtering. The Hodrick-Prescott (HP) filter separates a time series $y_t$ into a trend $\tau_t$ and a cyclical component $\zeta_t$, so that $y_t = \tau_t + \zeta_t$. [Figure: inflation with its cyclical component and trend component, 1962 to 2006.]
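The HP trend is usually defined as the series τ that minimizes Σ(y_t − τ_t)² + λ Σ(τ_{t+1} − 2τ_t + τ_{t−1})², which has the closed-form solution τ = (I + λ D'D)⁻¹ y with D the second-difference operator. The sketch below solves that system densely with NumPy; the function name `hp_filter`, the default λ = 1600 (a conventional choice for quarterly data, not stated on the slide), and the test series are assumptions.

```python
import numpy as np

def hp_filter(y, lamb=1600.0):
    """Return (cycle, trend) from a dense-matrix Hodrick-Prescott filter."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2, n) second-difference operator
    trend = np.linalg.solve(np.eye(n) + lamb * D.T @ D, y)
    return y - trend, trend

# Hypothetical series: a linear trend plus an 8-period cycle.
t = np.arange(40, dtype=float)
series = 0.5 * t + np.sin(2 * np.pi * t / 8)
cycle, trend = hp_filter(series, lamb=1600.0)
```

A dense solve is fine for short series; for long ones a sparse or banded solver would be the more economical choice. A purely linear input is returned unchanged as trend, since its second differences are zero.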
  • 24. Filtering. In addition to the HP filter, two other filters popular in finance and economics, Baxter-King and Christiano-Fitzgerald, are available. We refer you to our paper and the documentation for details on these. [Figure: two panels, "Inflation and Unemployment: BK Filtered" and "Inflation and Unemployment: CF Filtered", each showing the INFL and UNEMP series.]
  • 25. Preview: Bayesian dynamic linear models (DLM). A state space model by another name:
        $y_t = F_t \theta_t + \nu_t, \quad \nu_t \sim N(0, V_t)$
        $\theta_t = G \theta_{t-1} + \omega_t, \quad \omega_t \sim N(0, W_t)$
      Estimation of the basic model is by Kalman filter recursions. Provides an elegant way to do time-varying linear regressions for forecasting. Extensions: multivariate DLMs, stochastic volatility (SV) models, MCMC-based posterior sampling, mixtures of DLMs.
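The Kalman filter recursions for this model can be written in a few lines when the observation is scalar and the variances are known. The sketch below assumes constant F, V, and W, and uses a constant + trend specification with made-up data; the function name `kalman_filter` and all numeric settings are illustrative, and it does not use the discount-factor approach that appears in the DLM preview.

```python
import numpy as np

def kalman_filter(y, F, G, V, W, m0, C0):
    """Filtered state means for y_t = F'theta_t + v_t, theta_t = G theta_{t-1} + w_t."""
    m = np.asarray(m0, dtype=float)
    C = np.asarray(C0, dtype=float)
    means = []
    for obs in y:
        a = G @ m                         # prior state mean
        R = G @ C @ G.T + W               # prior state covariance
        f = F @ a                         # one-step-ahead forecast
        Q = F @ R @ F + V                 # forecast variance
        gain = R @ F / Q                  # adaptive (Kalman) gain
        m = a + gain * (obs - f)          # posterior state mean
        C = R - np.outer(gain, gain) * Q  # posterior state covariance
        means.append(m)
    return np.array(means)

# Hypothetical constant + trend model observed without noise.
F = np.array([1.0, 0.0])                  # observe the level only
G = np.array([[1.0, 1.0], [0.0, 1.0]])    # level += slope each period
times = np.arange(1, 51)
observations = 2.0 + 0.5 * times
filtered = kalman_filter(
    observations, F, G, V=1.0, W=1e-4 * np.eye(2),
    m0=np.zeros(2), C0=1e4 * np.eye(2),   # vague prior on level and slope
)
```

Starting from a vague prior, the filtered level tracks the observations and the filtered slope settles near 0.5, the slope used to generate the data.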
  • 26. Preview: DLM Example (Constant+Trend model)
        model = Polynomial(2)
        dlm = DLM(close_px['AAPL'], model.F, G=model.G,  # model
                  m0=m0, C0=C0, n0=n0, s0=s0,            # priors
                  state_discount=.95)                    # discount factor
      [Figure: "Constant + Trend DLM", Nov 2008 to Nov 2009.]
  • 27. Preview: Stochastic volatility models. [Figure: "JPY-USD Exchange Rate Volatility Process", observations 0 to 1000.]
  • 28. Future: sandbox and beyond. ARCH / GARCH models for volatility; structural VAR and error correction models (ECM) for cointegrated processes; models with non-normally distributed errors; better data description, visualization, and interactive research tools; more sophisticated Bayesian time series models.
  • 29. Conclusions. We’ve implemented many foundational models for time series analysis, but the field is very broad. The user interface can and should be much improved. Repo: http://github.com/statsmodels/statsmodels Docs: http://statsmodels.sourceforge.net Contact: pystatsmodels@googlegroups.com