Groups > comp.lang.python > #100500 > unrolled thread

Help on error " ValueError: For numerical factors, num_columns must be an int "

Started by: Robert <rxjwg98@gmail.com>
First post: 2015-12-16 02:44 -0800
Last post: 2015-12-16 18:37 -0800
Articles: 7 (3 participants)


Contents

  Help on error " ValueError: For numerical factors, num_columns must be an int " Robert <rxjwg98@gmail.com> - 2015-12-16 02:44 -0800
    Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Robert <rxjwg98@gmail.com> - 2015-12-16 02:56 -0800
      Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Robert <rxjwg98@gmail.com> - 2015-12-16 03:03 -0800
    Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-12-16 11:33 +0000
      Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Robert <rxjwg98@gmail.com> - 2015-12-16 06:50 -0800
        Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Josef Pktd <josef.pktd@gmail.com> - 2015-12-16 17:57 -0800
          Re: Help on error " ValueError: For numerical factors, num_columns must be an int " Robert <rxjwg98@gmail.com> - 2015-12-16 18:37 -0800

#100500 — Help on error " ValueError: For numerical factors, num_columns must be an int "

From: Robert <rxjwg98@gmail.com>
Date: 2015-12-16 02:44 -0800
Subject: Help on error " ValueError: For numerical factors, num_columns must be an int "
Message-ID: <cb78beb6-7a28-4bb5-8215-8771f1f324e3@googlegroups.com>
Hi,

When I run the following code, there is an error:

ValueError: For numerical factors, num_columns must be an int 


================
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression

X = [0.5,0.75,1.0,1.25,1.5,1.75,1.75,2.0,2.25,2.5,2.75,3.0,3.25,
3.5,4.0,4.25,4.5,4.75,5.0,5.5]
y = [0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1]

zipped = list(zip(X,y))
df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])

y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
=======================

I have checked that 'df' is of this type:
=============
type(df)
Out[25]: pandas.core.frame.DataFrame
=============

I cannot figure out where the problem is. Can you help me?
Thanks.

Error message:
..........


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\Users\rj\pyprj\stackoverflow_logisticregression0.py in <module>()
     17 df = pd.DataFrame(zipped,columns = ['study_hrs','p_or_f'])
     18 
---> 19 y, X = dmatrices('p_or_f ~ study_hrs', df, return_type="dataframe")
     20 
     21 y = np.ravel(y)

C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in dmatrices(formula_like, data, eval_env, NA_action, return_type)
    295     eval_env = EvalEnvironment.capture(eval_env, reference=1)
    296     (lhs, rhs) = _do_highlevel_design(formula_like, data, eval_env,
--> 297                                       NA_action, return_type)
    298     if lhs.shape[1] == 0:
    299         raise PatsyError("model is missing required outcome variables")

C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _do_highlevel_design(formula_like, data, eval_env, NA_action, return_type)
    150         return iter([data])
    151     design_infos = _try_incr_builders(formula_like, data_iter_maker, eval_env,
--> 152                                       NA_action)
    153     if design_infos is not None:
    154         return build_design_matrices(design_infos, data,

C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\highlevel.pyc in _try_incr_builders(formula_like, data_iter_maker, eval_env, NA_action)
     55                                       data_iter_maker,
     56                                       eval_env,
---> 57                                       NA_action)
     58     else:
     59         return None

C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\build.pyc in design_matrix_builders(termlists, data_iter_maker, eval_env, NA_action)
    704                             factor_states[factor],
    705                             num_columns=num_column_counts[factor],
--> 706                             categories=None)
    707         else:
    708             assert factor in cat_levels_contrasts

C:\Users\rj\AppData\Local\Enthought\Canopy\User\lib\site-packages\patsy\design_info.pyc in __init__(self, factor, type, state, num_columns, categories)
     86         if self.type == "numerical":
     87             if not isinstance(num_columns, int):
---> 88                 raise ValueError("For numerical factors, num_columns "
     89                                  "must be an int")
     90             if categories is not None:

ValueError: For numerical factors, num_columns must be an int 



#100501

From: Robert <rxjwg98@gmail.com>
Date: 2015-12-16 02:56 -0800
Message-ID: <fb190c29-132d-42da-a1e3-d7f13a7d800f@googlegroups.com>
In reply to: #100500
BTW, I use Python 2.7 on Canopy. 

patsy version: 0.4.0

Thanks,



#100502

From: Robert <rxjwg98@gmail.com>
Date: 2015-12-16 03:03 -0800
Message-ID: <1bac9aef-43e8-49c9-b07d-254b1100011d@googlegroups.com>
In reply to: #100501
When I use this code snippet, copied from the web, it also fails:

import numpy as np
import pandas as pd
import patsy

time = np.tile([1, 2, 3, 4], 3)
country = np.repeat(['a', 'b', 'c'], 4)
event_int = np.random.randint(0, 2, size=len(time))

df = pd.DataFrame({'event_int':event_int, 'time_day':time, 'country':country})

f0 = 'event_int ~ C(time_day):C(country) - 1'
y,X0 = patsy.dmatrices(f0, df, return_type='dataframe')
print(len(X0.columns))

I am new to these packages. I don't know why it works for other users.
Thanks,



#100503

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2015-12-16 11:33 +0000
Message-ID: <mailman.7.1450265638.30845.python-list@python.org>
In reply to: #100500
Slap the ValueError into a search engine and the first hit is 
https://groups.google.com/forum/#!topic/pystatsmodels/KcSzNqDxv-Q

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



#100511

From: Robert <rxjwg98@gmail.com>
Date: 2015-12-16 06:50 -0800
Message-ID: <51b673c2-589d-4141-8b80-ef17318a9218@googlegroups.com>
In reply to: #100503
On Wednesday, December 16, 2015 at 6:34:21 AM UTC-5, Mark Lawrence wrote:
> Slap the ValueError into a search engine and the first hit is 
> https://groups.google.com/forum/#!topic/pystatsmodels/KcSzNqDxv-Q

Hi,
I don't see a solution to my problem there. I found the following demo code at

https://patsy.readthedocs.org/en/v0.1.0/API-reference.html#patsy.dmatrix

It doesn't work on Canopy either. Does it work on your computer?
Thanks,

/////////////
demo_data("a", "x", nlevels=3)
Out[134]: 
{'a': ['a1', 'a2', 'a3', 'a1', 'a2', 'a3'],
 'x': array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
        -0.97727788])}

mat = dmatrix("a + x", demo_data("a", "x", nlevels=3))



#100550

From: Josef Pktd <josef.pktd@gmail.com>
Date: 2015-12-16 17:57 -0800
Message-ID: <a17dc6a5-fc55-4c5e-9852-403928694ed9@googlegroups.com>
In reply to: #100511
On Wednesday, December 16, 2015 at 9:50:35 AM UTC-5, Robert wrote:
> On Wednesday, December 16, 2015 at 6:34:21 AM UTC-5, Mark Lawrence wrote:
> > Slap the ValueError into a search engine and the first hit is 
> > https://groups.google.com/forum/#!topic/pystatsmodels/KcSzNqDxv-Q

This was fixed in patsy 0.4.1, as discussed in the statsmodels thread linked above.
You need to upgrade patsy from 0.4.0 to 0.4.1 or later.
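
A quick way to confirm whether an installation predates the fix is to compare version strings. The sketch below is only an illustration: `version_tuple` and `needs_upgrade` are hypothetical helper names, and the parsing assumes simple dotted numeric versions such as "0.4.0".

```python
def version_tuple(version):
    """Turn a dotted version string such as '0.4.0' into (0, 4, 0).

    Hypothetical helper: only the leading digits of each dotted piece
    are used, so a suffix like '1+dev' is read as 1.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def needs_upgrade(installed, fixed="0.4.1"):
    """Return True if the installed version is older than the fixed one."""
    return version_tuple(installed) < version_tuple(fixed)


print(needs_upgrade("0.4.0"))  # True: the version reported in this thread
print(needs_upgrade("0.4.1"))  # False: the release said to contain the fix
```

With patsy installed, the same check can be run as `needs_upgrade(patsy.__version__)` after `import patsy`. How to upgrade depends on the distribution; `pip install --upgrade patsy` is the usual route outside a managed environment, while Canopy may require its own package manager.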

AFAIR, the type checking was too strict and broke with recent numpy versions.

Josef
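
To illustrate the kind of failure described above, here is a minimal sketch of how a strict `isinstance(value, int)` test can reject a perfectly good integer-like value. It is not patsy's actual code or its actual fix: `IntLike` is a hypothetical stand-in for an integer scalar type that does not subclass `int` (NumPy integer scalars behave this way on Python 3), and `as_column_count` is a hypothetical helper showing one more tolerant check based on `operator.index`.

```python
import operator


class IntLike(object):
    """Hypothetical stand-in for an integer scalar that is not an int subclass."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        # Lets Python treat this object as an exact integer.
        return self._value


def as_column_count(value):
    """Hypothetical helper: return value as a plain int, or raise ValueError."""
    try:
        return operator.index(value)
    except TypeError:
        raise ValueError("num_columns must be an integer, got %r" % (value,))


n = IntLike(2)
print(isinstance(n, int))   # False: a strict isinstance check rejects it
print(as_column_count(n))   # 2: the tolerant check accepts it
```

Floats are still rejected by this approach, since `operator.index(2.0)` raises `TypeError`, so the check stays strict about non-integers while accepting integer-like types.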





#100551

From: Robert <rxjwg98@gmail.com>
Date: 2015-12-16 18:37 -0800
Message-ID: <dec2bb9d-dfd1-48ea-bd59-15eab60b2594@googlegroups.com>
In reply to: #100550
On Wednesday, December 16, 2015 at 8:57:30 PM UTC-5, Josef Pktd wrote:
> This was fixed in patsy 0.4.1 as discussed in this statsmodels thread.
> You need to upgrade patsy from 0.4.0.
> 
> AFAIR, the type checking was too strict and broke with recent numpy versions.
> 
> Josef

Thanks. That is right.
