Channel: Active questions tagged r - Stack Overflow

Python statsmodels GLM: change treatment of classes when applying 0s and 1s

I have a stock market dataset from GitHub:

import pandas as pd
import numpy as np
import statsmodels.api as sm

Smarket_url = 'https://raw.githubusercontent.com/selva86/datasets/master/Smarket.csv'
#Load data
Smarket = pd.read_csv(Smarket_url)

I'm running a logistic regression with the GLM function from the 'statsmodels' package. I ran the same regression in RStudio and got the same results, except that the signs are reversed: coefficients that are negative in R come out positive in Python, and vice versa. In Python I initially used:

Smarket_model = sm.formula.glm('Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume',
                         data=Smarket,family=sm.families.Binomial()).fit()

These were the results:

Intercept      0.1260 
Lag1           0.0731
Lag2           0.0423
Lag3          -0.0111
Lag4          -0.0094 
Lag5          -0.0103 
Volume        -0.1354

I narrowed the problem down to how statsmodels was coding the outcome variable: 'Stock Up' was 0 and 'Stock Down' was 1. So I created a NumPy array that reverses this coding, with Stock Up = 1 and Stock Down = 0, and then used the sm.GLM() function instead:

#Create numpy array coding 'Up' as 1 and 'Down' as 0
change = np.where(Smarket['Direction']=='Up',1,0)   

#Add intercept
smarket_vars = sm.add_constant(Smarket[['Lag1','Lag2', 'Lag3', 'Lag4','Lag5','Volume']]) 
#Fit model
market_model = sm.GLM(change, smarket_vars,family=sm.families.Binomial() ).fit()

This gave me coefficients with the right signs, matching R:

const         -0.1260      
Lag1          -0.0731      
Lag2          -0.0423      
Lag3           0.0111      
Lag4           0.0094      
Lag5           0.0103     
Volume         0.1354    
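
The two coefficient tables are exact negatives of each other, which is what one would expect when the 0/1 coding of the outcome is swapped, since logit(1 - p) = -logit(p). The sketch below illustrates this on a small made-up dataset with a hand-written gradient-ascent fit. The function name fit_logistic, the data, and the step settings are all illustrative assumptions and have nothing to do with how statsmodels fits the model internally.

```python
import math

def fit_logistic(xs, ys, steps=2000, lr=0.1):
    """Fit intercept and slope by plain gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # P(y = 1 | x)
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up, non-separable toy data (assumption: not the Smarket data).
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
up = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # 'Up' coded as 1
down = [1 - y for y in up]            # same outcomes, 'Down' coded as 1

coef_up = fit_logistic(xs, up)
coef_down = fit_logistic(xs, down)

print("Up = 1:  ", coef_up)
print("Down = 1:", coef_down)  # same magnitudes, opposite signs
```

So the two fits describe the same model; the only difference is which class is treated as the "success".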

My question is: how can I get the right values without having to create the NumPy array that swaps the 0s and 1s? Why did sm.formula.glm() assume that 'Stock Up' was 0 and 'Stock Down' was 1? Thanks to everyone who read all this nonsense and is willing to help me :)

