Groups > comp.lang.python > #40705 > unrolled thread

Unhelpful traceback

Started by	John Nagle <nagle@animats.com>
First post	2013-03-06 22:33 -0800
Last post	2013-03-07 19:23 -0500
Articles	9 — 5 participants


  Unhelpful traceback John Nagle <nagle@animats.com> - 2013-03-06 22:33 -0800
    Re: Unhelpful traceback Andrew Berg <bahamutzero8825@gmail.com> - 2013-03-07 00:57 -0600
    Re: Unhelpful traceback Chris Rebert <clp2@rebertia.com> - 2013-03-06 23:04 -0800
    Re: Unhelpful traceback Dave Angel <davea@davea.name> - 2013-03-07 08:10 -0500
      Re: Unhelpful traceback John Nagle <nagle@animats.com> - 2013-03-07 10:42 -0800
        Re: Unhelpful traceback John Nagle <nagle@animats.com> - 2013-03-07 14:13 -0800
          Re: Unhelpful traceback Ian Kelly <ian.g.kelly@gmail.com> - 2013-03-07 15:32 -0700
      Re: Unhelpful traceback John Nagle <nagle@animats.com> - 2013-03-07 13:23 -0800
        Re: Unhelpful traceback Dave Angel <davea@davea.name> - 2013-03-07 19:23 -0500

#40705 — Unhelpful traceback

From	John Nagle <nagle@animats.com>
Date	2013-03-06 22:33 -0800
Subject	Unhelpful traceback
Message-ID	<kh9c95$t68$1@dont-email.me>

Here's a traceback that's not helping:

Traceback (most recent call last):
  File "InfoCompaniesHouse.py", line 255, in <module>
    main()
  File "InfoCompaniesHouse.py", line 251, in main
    loader.dofile(infile)                       # load this file
  File "InfoCompaniesHouse.py", line 213, in dofile
    self.dofilezip(infilename)                  # do ZIP file
  File "InfoCompaniesHouse.py", line 198, in dofilezip
    self.dofilecsv(infile, infd)        # as a CSV file
  File "InfoCompaniesHouse.py", line 182, in dofilecsv
    for fields in reader :                              # read entire CSV file
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 14: ordinal not in range(128)

This is weird, because "for fields in reader" isn't directly
doing a decode. That's further down somewhere, and the backtrace
didn't tell me where.

The program is converting some .CSV files that come packaged in .ZIP
files.  The files are big, so rather than expanding them, they're
read directly from the ZIP files and processed through the ZIP
and CSV modules.

Here's the code that's causing the error above:

        decoder = codecs.getreader('utf-8')
        with decoder(infdraw,errors="replace") as infd :
            with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
                headerline = infd.readline()
                self.doheaderline(headerline)
                reader = csv.reader(infd, delimiter=',', quotechar='"')
                for fields in reader :
                    pass

Normally, the "pass" is a call to something that
uses the data, but for test purposes, I put a "pass" in there.  It still
fails.   With that "pass", nothing is ever written to the
output file, and no "encoding" should be taking place.

"infdraw" is a stream from the zip module, created like this:

    with inzip.open(zipelt.filename,"r") as infd :
        self.dofilecsv(infile, infd)

This works for data records that are pure ASCII, but as soon as some
non-ASCII character comes through, it fails.

Where is the error being generated?  I'm not seeing any place
where there's a conversion to ASCII.  Not even a print.

				John Nagle


#40711

From	Andrew Berg <bahamutzero8825@gmail.com>
Date	2013-03-07 00:57 -0600
Message-ID	<mailman.2981.1362639469.2939.python-list@python.org>
In reply to	#40705

On 2013.03.07 00:33, John Nagle wrote:
> This is weird, because "for fields in reader" isn't directly
> doing a decode. That's further down somewhere, and the backtrace
> didn't tell me where.
Looking at the csv module docs, the reader object iterates over the
csvfile argument (which can be any iterator). I think that, in the case
of a file object, it's not decoded until iteration.
I've never used the csv module before though, so I could be wrong.
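
A minimal sketch of the lazy iteration described above, assuming a made-up generator `lines()` in place of the real decoded stream. `csv.reader` only takes an iterator when it is created and pulls one line per step, so a failure while handling a line is reported at the `for` statement. In CPython the reader is implemented in C (the `_csv` module), which is presumably why the traceback shows no frame below that line.

```python
import csv

pulled = []  # records each line handed to the csv reader

def lines():
    """Hypothetical line source standing in for the decoded ZIP stream."""
    for line in ["name,amount\n", "ACME LTD,100\n", "WIDGETS PLC,250\n"]:
        pulled.append(line)
        yield line

reader = csv.reader(lines(), delimiter=',', quotechar='"')
pulled_at_creation = list(pulled)   # nothing has been consumed yet

first_row = next(reader)            # pulls exactly one line from the source
pulled_after_first = list(pulled)

remaining_rows = list(reader)       # iteration is what drives the source
```

Creating the reader touches no data; each row is produced only when the loop asks for it.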

-- 
CPython 3.3.0 | Windows NT 6.2.9200 / FreeBSD 9.1


#40712

From	Chris Rebert <clp2@rebertia.com>
Date	2013-03-06 23:04 -0800
Message-ID	<mailman.2982.1362639844.2939.python-list@python.org>
In reply to	#40705

On Wed, Mar 6, 2013 at 10:33 PM, John Nagle <nagle@animats.com> wrote:
> Here's a traceback that's not helping:
<snip>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> position 14: ordinal not in range(128)
<snip>
> The program is converting some .CSV files that come packaged in .ZIP
> files.  The files are big, so rather than expanding them, they're
> read directly from the ZIP files and processed through the ZIP
> and CSV modules.
<snip>
> This works for data records that are pure ASCII, but as soon as some
> non-ASCII character comes through, it fails.

I'd recommend using the `unicodecsv` package, which, unlike the std
lib `csv` module, is properly Unicode-compatible:
https://pypi.python.org/pypi/unicodecsv

Cheers,
Chris


#40760

From	Dave Angel <davea@davea.name>
Date	2013-03-07 08:10 -0500
Message-ID	<mailman.3016.1362661824.2939.python-list@python.org>
In reply to	#40705

On 03/07/2013 01:33 AM, John Nagle wrote:
> Here's a traceback that's not helping:
>

A bit more context would be helpful.  Starting with Python version.

> Traceback (most recent call last):
>    File "InfoCompaniesHouse.py", line 255, in <module>
>      main()
>    File "InfoCompaniesHouse.py", line 251, in main
>      loader.dofile(infile)                       # load this file
>    File "InfoCompaniesHouse.py", line 213, in dofile
>      self.dofilezip(infilename)                  # do ZIP file
>    File "InfoCompaniesHouse.py", line 198, in dofilezip
>      self.dofilecsv(infile, infd)        # as a CSV file
>    File "InfoCompaniesHouse.py", line 182, in dofilecsv
>      for fields in reader :                              # read entire CSV file
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> position 14: ordinal not in range(128)
>
> This is weird, because "for fields in reader" isn't directly
> doing a decode. That's further down somewhere, and the backtrace
> didn't tell me where.
>
> The program is converting some .CSV files that come packaged in .ZIP
> files.  The files are big, so rather than expanding them, they're
> read directly from the ZIP files and processed through the ZIP
> and CSV modules.
>
> Here's the code that's causing the error above:


>
>          decoder = codecs.getreader('utf-8')
>          with decoder(infdraw,errors="replace") as infd :
>              with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
>                  headerline = infd.readline()
>                  self.doheaderline(headerline)
>                  reader = csv.reader(infd, delimiter=',', quotechar='"')
>                  for fields in reader :
>                      pass
>
> Normally, the "pass" is a call to something that
> uses the data, but for test purposes, I put a "pass" in there.  It still
> fails.   With that "pass", nothing is ever written to the
> output file, and no "encoding" should be taking place.
>
> "infdraw" is a stream from the zip module, created like this:
>
>      with inzip.open(zipelt.filename,"r") as infd :

You probably need a 'rb' rather than 'r', since the file is not ASCII.

>          self.dofilecsv(infile, infd)
>
> This works for data records that are pure ASCII, but as soon as some
> non-ASCII character comes through, it fails.
>
> Where is the error being generated?  I'm not seeing any place
> where there's a conversion to ASCII.  Not even a print.
>
> 				John Nagle
>
>
>
>

If that isn't enough, then please give the whole context, such as where
zipelt and filename came from.  And don't forget to specify the Python
version.  Version 3.x treats nonbinary files very differently than 2.x.

-- 
DaveA


#40807

From	John Nagle <nagle@animats.com>
Date	2013-03-07 10:42 -0800
Message-ID	<khan0m$ud6$1@dont-email.me>
In reply to	#40760

On 3/7/2013 5:10 AM, Dave Angel wrote:
> On 03/07/2013 01:33 AM, John Nagle wrote:
>> Here's a traceback that's not helping:
>>
> 
> A bit more context would be helpful.  Starting with Python version.

    Sorry, Python 2.7.

> 
> If that isn't enough, then please give the whole context, such as where
> zipelt and filename came from.  And don't forget to specify Python
> version.  Version 3.x treats nonbinary files very differently than 2.x

     Here it is.

				John Nagle


    def dofilecsv(self, infilename, infdraw) :
        """
        Loader for Companies House company data, with files already open.
        """
        self.logger.info('Converting "%s"' % (infilename, ))        # log
        (pathpart, filepart) = os.path.split(infilename)            # split off file part to construct output file
        (outfile, ext) = os.path.splitext(filepart)                 # remove extension
        outfile += ".sql"                                           # add SQL suffix
        outfilename = os.path.abspath(os.path.join(self.options.destdir, outfile))
        #   ***NEED TO ENSURE UNIQUE OUTFILENAME EVEN IF DUPLICATED IN ZIP FILES***
        decoder = codecs.getreader('utf-8')                         # UTF-8 reader
        with decoder(infdraw,errors="replace") as infd :
            with codecs.open(outfilename, encoding='utf-8', mode='w') as outfd :
                headerline = infd.readline()                        # read header line
                self.doheaderline(headerline)                       # process header line
                reader = csv.reader(infd, delimiter=',', quotechar='"')  # CSV file
                for fields in reader :                              # read entire CSV file
                    self.doline(outfd, fields)                      # copy fields
        self.logstats(infilename)                                   # log statistics of this file

    def dofilezip(self, infilename) :
        """
        Do a ZIP file containing CSV files.
        """
        try :
            inzip = zipfile.ZipFile(infilename, "r", allowZip64=True)  # try to open
            zipdir = inzip.infolist()                       # get objects in file
            for zipelt in zipdir :                          # for all objects in file
                self.logger.debug('ZIP file "%s" contains "%s".' % (infilename, zipelt.filename))
                (infile, ext) = os.path.splitext(zipelt.filename) # remove extension
                if ext.lower() == ".csv" :                   # if a CSV file
                    with inzip.open(zipelt.filename,"r") as infd :  # do this file
                        self.dofilecsv(infile, infd)        # as a CSV file
                else :
                    self.logger.error('Non-CSV file in ZIP file: "%s"' % (zipelt.filename,))
                    self.errorcount += 1                    # tally

        except zipfile.BadZipfile as message :              # if trouble
            self.logger.error('Bad ZIP file: "%s"' % (infilename,))  # note trouble
            self.errorcount += 1                            # tally

    def dofile(self, infilename) :
        """
        Loader for Companies House company data
        """
        (sink, ext) = os.path.splitext(infilename) # get extension
        if ext == ".zip" :   # if .ZIP file
            self.dofilezip(infilename)                  # do ZIP file
        elif ext == ".csv" :
            self.logger.info('Converting "%s"' % (infilename,))# log
            with open(infilename, "rb") as infd :
                self.dofilecsv(infilename, infd)        # do
            self.logstats(infilename)                   # log statistics of this file
        else :
            self.logger.error('File of unexpected type (not .csv or .zip): %s ' % (infilename,))
            self.errorcount += 1


#40829

From	John Nagle <nagle@animats.com>
Date	2013-03-07 14:13 -0800
Message-ID	<khb3c0$d17$1@dont-email.me>
In reply to	#40807

On 3/7/2013 10:42 AM, John Nagle wrote:
> On 3/7/2013 5:10 AM, Dave Angel wrote:
>> On 03/07/2013 01:33 AM, John Nagle wrote:
>>> Here's a traceback that's not helping:
>>>
>>
>> A bit more context would be helpful.  Starting with Python version.
> 
>     Sorry, Python 2.7.

The trouble comes from here:

        decoder = codecs.getreader('utf-8')          # UTF-8 reader
        with decoder(infdraw,errors="replace") as infd :

It's not the CSV module that's blowing up.  If I just feed the
raw unconverted bytes from the ZIP module into the CSV module,
the CSV module runs without complaint.

I've tried 'utf-8', 'ascii', and 'windows-1252' as codecs.
They all blow up. 'errors="replace"' doesn't help.
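
Since the CSV module accepts the raw bytes, one common workaround on Python 2.7 is to parse first and decode each field afterwards. The sketch below assumes UTF-8 input (which keeps commas, quotes and newlines as plain ASCII bytes); `decode_fields` is a hypothetical helper, not part of the `csv` module, and the sample row is made up.

```python
def decode_fields(fields, encoding='utf-8', errors='replace'):
    """Decode every byte-string field of one parsed CSV row to unicode.

    Hypothetical helper, not part of the csv module.
    """
    return [f.decode(encoding, errors) if isinstance(f, bytes) else f
            for f in fields]

# A row as the Python 2 csv module would return it for UTF-8 input:
# byte strings, with the pound sign still encoded as two bytes.
raw_row = [b'ACME LTD', b'\xc2\xa3100', b'bad byte \xff']
row = decode_fields(raw_row)
```

In `dofilecsv` that would mean passing `infdraw` straight to `csv.reader` and calling the helper on each `fields` list before handing it to `self.doline`.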

				John Nagle


#40831

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-03-07 15:32 -0700
Message-ID	<mailman.3059.1362695621.2939.python-list@python.org>
In reply to	#40829

On Thu, Mar 7, 2013 at 3:13 PM, John Nagle <nagle@animats.com> wrote:
> On 3/7/2013 10:42 AM, John Nagle wrote:
>> On 3/7/2013 5:10 AM, Dave Angel wrote:
>>> On 03/07/2013 01:33 AM, John Nagle wrote:
>>>> Here's a traceback that's not helping:
>>>>
>>>
>>> A bit more context would be helpful.  Starting with Python version.
>>
>>     Sorry, Python 2.7.
>
> The trouble comes from here:
>
>         decoder = codecs.getreader('utf-8')          # UTF-8 reader
>         with decoder(infdraw,errors="replace") as infd :
>
> It's not the CSV module that's blowing up.  If I just feed the
> raw unconverted bytes from the ZIP module into the CSV module,
> the CSV module runs without complaint.
>
> I've tried 'utf-8', 'ascii', and 'windows-1252' as codecs.
> They all blow up. 'errors="replace"' doesn't help.

I believe that the csv module is expecting byte-string (str) data, not
unicode.  Since it receives unicode as a result of your decoder step, it
tries to convert it to a str using str(), which implicitly tries to
encode the data using the ascii codec, hence the error that you're
seeing.
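
The implicit conversion can be reproduced in isolation. This is a small sketch with a made-up string whose pound sign happens to sit at index 14, as in the traceback; it is not the actual data from the file.

```python
text = u'Share capital \xa3100'   # u'\xa3' is the pound sign, at index 14

try:
    text.encode('ascii')          # what str(text) does implicitly on Python 2
    message = None
    bad_position = None
except UnicodeEncodeError as exc:
    message = str(exc)            # "'ascii' codec can't encode character ..."
    bad_position = exc.start      # index of the offending character

# Encoding explicitly to UTF-8 succeeds and yields a byte string.
encoded = text.encode('utf-8')
```

The `errors="replace"` argument in the original code only affects the decoding step, so it has no influence on this later ASCII encode.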


#40827

From	John Nagle <nagle@animats.com>
Date	2013-03-07 13:23 -0800
Message-ID	<khb0d7$q74$1@dont-email.me>
In reply to	#40760

On 3/7/2013 5:10 AM, Dave Angel wrote:
> On 03/07/2013 01:33 AM, John Nagle wrote:

>>
>> "infdraw" is a stream from the zip module, created like this:
>>
>>      with inzip.open(zipelt.filename,"r") as infd :
> 
> You probably need a 'rb' rather than 'r', since the file is not ASCII.
> 
>>          self.dofilecsv(infile, infd)
>>
>> This works for data records that are pure ASCII, but as soon as some
>> non-ASCII character comes through, it fails.

   No, the ZIP module gives you back the bytes you
put in.  "rb" is not accepted there:

  File "InfoCompaniesHouse.py", line 197, in dofilezip
    with inzip.open(zipelt.filename,"rb") as infd :            # do this file
  File "C:\python27\lib\zipfile.py", line 872, in open
    raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
RuntimeError: open() requires mode "r", "U", or "rU"

   "b" for files is about end of line handling (CR LF -> LF), anyway.

					John Nagle
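
A short round-trip sketch of that point, using an in-memory archive with a made-up member name and contents: bytes written into a ZIP member come back unchanged with mode "r", and "rb" is refused. The exception type for the refused mode differs between Python versions, so the sketch catches both candidates.

```python
import io
import zipfile

payload = u'name,amount\nACME LTD,\xa3100\n'.encode('utf-8')

# Build a small ZIP archive in memory (made-up member name and contents).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as outzip:
    outzip.writestr('companies.csv', payload)

# Read the member back with plain "r": the bytes come out unchanged.
with zipfile.ZipFile(buf, 'r') as inzip:
    with inzip.open('companies.csv', 'r') as infd:
        data = infd.read()

# Mode "rb" is rejected; the exception type differs between versions.
try:
    with zipfile.ZipFile(buf, 'r') as inzip:
        inzip.open('companies.csv', 'rb')
    rb_rejected = False
except (RuntimeError, ValueError):
    rb_rejected = True
```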


#40837

From	Dave Angel <davea@davea.name>
Date	2013-03-07 19:23 -0500
Message-ID	<mailman.3063.1362702220.2939.python-list@python.org>
In reply to	#40827

On 03/07/2013 04:23 PM, John Nagle wrote:
>
>>   <snip>
>
>      raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
> RuntimeError: open() requires mode "r", "U", or "rU"
>
>     "b" for files is about end of line handling (CR LF -> LF), anyway.
>

Only for Python 2.  Since originally you didn't specify, I took my best
shot.  If you omit the 'b' when opening a binary file in Python 3, you'd
get problems similar to yours.  Text files will be converted to Unicode.

That's one of the reasons that specifying the full environment is important.
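
For comparison, here is a sketch of how the same job might look on Python 3 only, where the `csv` module works on text rather than bytes. The archive, member name and data are made up for illustration; the variable names `infdraw` and `infd` mirror the original code.

```python
import csv
import io
import zipfile

# Build a small in-memory ZIP with one UTF-8 CSV member (made-up data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as outzip:
    outzip.writestr('companies.csv',
                    'name,amount\nACME LTD,\u00a3100\n'.encode('utf-8'))

rows = []
with zipfile.ZipFile(buf, 'r') as inzip:
    with inzip.open('companies.csv', 'r') as infdraw:
        # Decode bytes to text before the csv module sees them.
        # newline='' leaves line-ending handling to the csv module.
        infd = io.TextIOWrapper(infdraw, encoding='utf-8',
                                errors='replace', newline='')
        for fields in csv.reader(infd, delimiter=',', quotechar='"'):
            rows.append(fields)
```

On Python 3 the decoding step comes before parsing, which is the reverse of the usual Python 2.7 arrangement.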

-- 
DaveA

