statsmodels.tools.tools.categorical(data, col=None, dictnames=False, drop=False)
Returns a dummy matrix given an array of categorical variables.
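Parameters:
data : array
    A structured array, recarray, or array. This can be either a 1d vector of the categorical variable or a 2d array with the column specifying the categorical variable specified by the col argument.
col : str, int, or None
    If data is a structured array or a recarray, col can be a string that is the name of the column that contains the variable. For all arrays, col can be an int that is the (zero-based) column index number. col can only be None for a 1d array. The default is None.
dictnames : bool, optional
    If True, a dictionary mapping the column number to the categorical name is returned as well. This is useful for keeping track of the categories of plain arrays.
drop : bool
    If True, the original categorical variable is dropped from the result; if False, it is kept alongside the dummy variables.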
Returns:
dummy_matrix, [dictnames, optional]
    A matrix of dummy (indicator/binary) float variables for the categorical data. If dictnames is True, then the dictionary is returned as well.
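The shape of the returned matrix can be pictured with a small pure-Python sketch. It is a hypothetical reimplementation, not statsmodels code: the name `dummy_matrix` and the choice of sorted category order are assumptions. Each observation becomes a row with 1.0 in the column of its category and 0.0 elsewhere, and the optional dictionary is assumed to map each column index to its category label.

```python
# Hypothetical pure-Python sketch of dummy (one-hot) encoding.
# This illustrates the idea only; it is not statsmodels' implementation.
def dummy_matrix(values, dictnames=False):
    """Return one float indicator column per distinct value in `values`.

    Categories are sorted, so column j corresponds to the j-th smallest
    category. If `dictnames` is True, also return a dict mapping each
    column index to its category (an assumed analogue of the optional
    second return value).
    """
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    if dictnames:
        return rows, dict(enumerate(categories))
    return rows


matrix, names = dummy_matrix(["no", "yes", "yes", "no"], dictnames=True)
print(names)   # {0: 'no', 1: 'yes'}
print(matrix)  # [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

Every row sums to 1.0, because each observation belongs to exactly one category.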
This returns a dummy variable for EVERY distinct category. If a structured array or recarray is provided, the name of each new variable is the old variable name, an underscore, and the category name. So if a variable 'vote' had answers 'yes' or 'no', the returned array would have two new variables, 'vote_yes' and 'vote_no'. There is currently no checking of the generated names.
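The naming rule can be sketched in a few lines. `dummy_names` is a hypothetical helper, and the collision check at the end is something a caller might do; since the generated names are not checked, nothing prevents them from clashing with fields that already exist.

```python
# Hypothetical sketch of the naming rule described above:
# <variable name> + "_" + <category name>, one name per distinct category.
def dummy_names(varname, values):
    return [f"{varname}_{category}" for category in sorted(set(values))]


answers = ["yes", "no", "no", "yes", "yes"]
print(dummy_names("vote", answers))  # ['vote_no', 'vote_yes']

# Caller-side check (an assumption, not library behavior): find generated
# names that clash with fields already present in the data.
existing = {"vote", "vote_no"}
clashes = set(dummy_names("vote", answers)) & existing
print(clashes)  # {'vote_no'}
```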
>>> import numpy as np
>>> import statsmodels.api as sm
Univariate examples
>>> import string
>>> string_var = [string.ascii_lowercase[0:5], string.ascii_lowercase[5:10],
...               string.ascii_lowercase[10:15], string.ascii_lowercase[15:20],
...               string.ascii_lowercase[20:25]]
>>> string_var *= 5
>>> string_var = np.asarray(sorted(string_var))
>>> design = sm.tools.categorical(string_var, drop=True)
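To see how many dummy columns the string example should produce, the input can be rebuilt with the standard library alone (this checks the data only and does not call statsmodels). Five distinct strings each appear five times, so with one indicator column per distinct value, `design` should have 25 rows and 5 columns.

```python
import string
from collections import Counter

# Rebuild string_var from the example above without numpy.
string_var = [string.ascii_lowercase[i:i + 5] for i in range(0, 25, 5)]
string_var *= 5
string_var = sorted(string_var)

counts = Counter(string_var)
print(len(string_var))       # 25
print(sorted(counts))        # ['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy']
print(set(counts.values()))  # {5}
```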
Or for a numerical categorical variable
>>> instr = np.floor(np.arange(10, 60, step=2) / 10)
>>> design = sm.tools.categorical(instr, drop=True)
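The numerical variable `instr` is the integer part of 1.0, 1.2, and so on up to 5.8, so it takes five distinct values, each five times. The sketch below reproduces the same values with the standard library to make the categories visible; it mirrors `np.floor(np.arange(10, 60, step=2) / 10)` but does not call numpy.

```python
import math
from collections import Counter

# Same values as np.floor(np.arange(10, 60, step=2) / 10), in pure Python.
instr = [float(math.floor(x / 10)) for x in range(10, 60, 2)]

print(len(instr))                      # 25
print(sorted(Counter(instr).items()))  # [(1.0, 5), (2.0, 5), (3.0, 5), (4.0, 5), (5.0, 5)]
```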
With a structured array
>>> num = np.random.randn(25, 2)
>>> struct_ar = np.zeros((25, 1), dtype=[('var1', 'f4'), ('var2', 'f4'),
...                                      ('instrument', 'f4'), ('str_instr', 'S5')])
>>> struct_ar['var1'] = num[:, 0][:, None]
>>> struct_ar['var2'] = num[:, 1][:, None]
>>> struct_ar['instrument'] = instr[:, None]
>>> struct_ar['str_instr'] = string_var[:, None]
>>> design = sm.tools.categorical(struct_ar, col='instrument', drop=True)
Or
>>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)
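For a structured array, the documented naming rule adds one field per category of the chosen column. The sketch below imitates that on a list of dicts. `add_dummies` is a hypothetical helper, and treating `drop=True` as removing the original field is an assumption based on the parameter's name, not a description of statsmodels internals.

```python
# Hypothetical sketch: expand one field of record-like data into dummy
# fields named <field>_<category>. `drop` is assumed to mean "remove the
# original categorical field"; this is not statsmodels code.
def add_dummies(records, col, drop=False):
    categories = sorted({rec[col] for rec in records})
    out = []
    for rec in records:
        new = dict(rec)  # copy so the input records are left unchanged
        if drop:
            del new[col]
        for c in categories:
            new[f"{col}_{c}"] = 1.0 if rec[col] == c else 0.0
        out.append(new)
    return out


rows = [{"var1": 0.3, "str_instr": "abcde"}, {"var1": -1.2, "str_instr": "fghij"}]
result = add_dummies(rows, "str_instr", drop=True)
print(list(result[0]))               # ['var1', 'str_instr_abcde', 'str_instr_fghij']
print(result[1]["str_instr_fghij"])  # 1.0
```

With `drop=False`, the same call would keep the `str_instr` field next to the new dummy fields.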
© 2009–2012 Statsmodels Developers
© 2006–2008 Scipy Developers
© 2006 Jonathan E. Taylor
Licensed under the 3-clause BSD License.
http://www.statsmodels.org/stable/generated/statsmodels.tools.tools.categorical.html