
Frustrating circular bytes issue

Started by	J <dreadpiratejeff@gmail.com>
First post	2012-06-26 12:30 -0400
Last post	2012-06-26 19:33 +0200
Articles	2 — 2 participants


  Frustrating circular bytes issue J <dreadpiratejeff@gmail.com> - 2012-06-26 12:30 -0400
    Re: Frustrating circular bytes issue Hans Mulder <hansmu@xs4all.nl> - 2012-06-26 19:33 +0200

#24467 — Frustrating circular bytes issue

From	J <dreadpiratejeff@gmail.com>
Date	2012-06-26 12:30 -0400
Subject	Frustrating circular bytes issue
Message-ID	<mailman.1518.1340728239.4697.python-list@python.org>

This is driving me batty... more enjoyment with the Python3
"Everything must be bytes" thing... sigh...
I have a file that contains a class used by other scripts.  The class
is fed either a file, or a stream of output from another command, then
interprets that output and returns a set that the main program can
use...  confusing, perhaps, but not necessarily important.

The class is created and then called with the load_filename method:


 def load_filename(self, filename):
        logging.info("Loading elements from filename: %s", filename)

        file = open(filename, "rb", encoding="utf-8")
        return self.load_file(file, filename)

As you can see, this calls the load_file method, by passing the
filehandle and filename (in common use, filename is actually an
IOStream object).

load_file starts out like this:


def load_file(self, file, filename="<stream>"):
        elements = []
        for string in self._reader(file):
            if not string:
                break

            element = {}


Note that it now calls the private _reader() passing along the
filehandle further in.  THIS is where I'm failing:

This is the private _reader function:


def _reader(self, file, size=4096, delimiter=r"\n{2,}"):
        buffer_old = ""
        while True:
            buffer_new = file.read()
            print(type(buffer_new))
            if not buffer_new:
                break
            lines = re.split(delimiter, buffer_old + buffer_new)
            buffer_old = lines.pop(-1)

            for line in lines:
                yield line

        yield buffer_old


(The print statement is something I put in to verify the problem.)

So stepping through this, when _reader executes, it executes read() on
the opened filehandle.  Originally, it read in 4096 byte chunks, I
removed that to test a theory.  It creates buffer_new with the output
of the read.

Running type() on buffer_new tells me that it's a bytes object.

However no matter what I do:

file.read().decode()
buffer_new.decode() in the lines = re.split() statement
buffer_str = buffer_new.decode()

I always get a traceback telling me that the str object has no decode() method.

If I remove the decode attempts, I get a traceback telling me that it
can't implicitly convert a bytes_object to a str object.

So I'm stuck in a vicious circle and can't see a way out.

here's sample error messages:
When using the decode() method to attempt to convert the bytes object:
Traceback (most recent call last):
  File "./filter_templates", line 134, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./filter_templates", line 126, in main
    options.whitelist, options.blacklist)
  File "./filter_templates", line 77, in parse_file
    matches = match_elements(template.load_file(file), *args, **kwargs)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new.decode())
AttributeError: 'str' object has no attribute 'decode'

It's telling me that buffer_new is a str object.

so if I remove the decode():

Traceback (most recent call last):
  File "./run_templates", line 142, in <module>
    sys.exit(main(sys.argv[1:]))
  File "./run_templates", line 137, in main
    runner.process(args, options.shell)
  File "./run_templates", line 39, in process
    records = self.process_output(process.stdout)
  File "./run_templates", line 88, in process_output
    return template.load_file(output)
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
73, in load_file
    for string in self._reader(file):
  File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
35, in _reader
    lines = re.split(delimiter, buffer_old + buffer_new)
TypeError: Can't convert 'bytes' object to str implicitly

now it's complaining that buffer_new is a bytes object and can't be
implicitly converted to str.

This is a bug introduced in our conversion from Python 2 to Python 3.
I am really, really starting to dislike some of the things Python3
does... or just am really, really frustrated.


#24472

From	Hans Mulder <hansmu@xs4all.nl>
Date	2012-06-26 19:33 +0200
Message-ID	<4fe9f27b$0$6990$e4fe514c@news2.news.xs4all.nl>
In reply to	#24467

On 26/06/12 18:30:15, J wrote:
> This is driving me batty... more enjoyment with the Python3
> "Everything must be bytes" thing... sigh...
> I have a file that contains a class used by other scripts.  The class
> is fed either a file, or a stream of output from another command, then
> interprets that output and returns a set that the main program can
> use...  confusing, perhaps, but not necessarily important.

It would help if you could post an extract that we can actually run,
to see for ourselves what happens.

> The class is created and then called with the load_filename method:
> 
> 
>  def load_filename(self, filename):
>         logging.info("Loading elements from filename: %s", filename)
> 
>         file = open(filename, "rb", encoding="utf-8")

When I try this in Python3, I get an error message:

    ValueError: binary mode doesn't take an encoding argument


You'll have to decide for yourself whether you want to read strings or
bytes.  If you want strings, you'll have to open the file in text mode:

         file = open(filename, "rt", encoding="utf-8")

Alternatively, if you want bytes, you must leave off the encoding:


         file = open(filename, "rb")
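
The difference between the two modes can be checked with a short script. This is only a sketch: the temporary file and its contents are made up for illustration and are not from the original template code.

```python
import os
import tempfile

# Write a small UTF-8 file to read back in both modes (illustrative data).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("name: caf\u00e9\n\nname: two\n".encode("utf-8"))
    path = tmp.name

try:
    # Text mode: read() returns str, decoded with the given encoding.
    with open(path, "rt", encoding="utf-8") as f:
        text = f.read()

    # Binary mode: read() returns bytes; decode explicitly if you need str.
    with open(path, "rb") as f:
        raw = f.read()

    # Binary mode plus an encoding is rejected before any data is read.
    try:
        open(path, "rb", encoding="utf-8")
    except ValueError as exc:
        mode_error = str(exc)
finally:
    os.remove(path)

print(type(text).__name__, type(raw).__name__)
print(raw.decode("utf-8") == text)
print(mode_error)
```

Whichever mode you pick, everything downstream of the open() call has to agree with it: str buffers and str regex patterns for text mode, bytes buffers and bytes patterns for binary mode.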

>         return self.load_file(file, filename)
> 
> As you can see, this calls the load_file method, by passing the
> filehandle and filename (in common use, filename is actually an
> IOStream object).
> 
> load_file starts out like this:
> 
> 
> def load_file(self, file, filename="<stream>"):
>         elements = []
>         for string in self._reader(file):
>             if not string:
>                 break
> 
>             element = {}
> 
> 
> Note that it now calls the private _reader() passing along the
> filehandle further in.  THIS is where I'm failing:
> 
> This is the private _reader function:
> 
> 
> def _reader(self, file, size=4096, delimiter=r"\n{2,}"):
>         buffer_old = ""
>         while True:
>             buffer_new = file.read()
>             print(type(buffer_new))
>             if not buffer_new:
>                 break
>             lines = re.split(delimiter, buffer_old + buffer_new)
>             buffer_old = lines.pop(-1)
> 
>             for line in lines:
>                 yield line
> 
>         yield buffer_old
> 
> 
> (the print statement is something I put in to verify the problem.
> 
> So stepping through this, when _reader executes, it executes read() on
> the opened filehandle.  Originally, it read in 4096 byte chunks, I
> removed that to test a theory.  It creates buffer_new with the output
> of the read.
> 
> Running type() on buffer_new tells me that it's a bytes object.
> 
> However no matter what I do:
> 
> file.read().decode()
> buffer_new.decode() in the lines = re.split() statement
> buffer_str = buffer_new.decode()
> 
> I always get a traceback telling me that the str object has no decode() method.
> 
> If I remove the decode attempts, I get a traceback telling me that it
> can't implicitly convert a bytes_object to a str object.
> 
> So I'm stuck in a vicious circle and can't see a way out.
> 
> here's sample error messages:
> When using the decode() method to attempt to convert the bytes object:
> Traceback (most recent call last):
>   File "./filter_templates", line 134, in <module>
>     sys.exit(main(sys.argv[1:]))
>   File "./filter_templates", line 126, in main
>     options.whitelist, options.blacklist)
>   File "./filter_templates", line 77, in parse_file
>     matches = match_elements(template.load_file(file), *args, **kwargs)
>   File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
> 73, in load_file
>     for string in self._reader(file):
>   File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
> 35, in _reader
>     lines = re.split(delimiter, buffer_old + buffer_new.decode())
> AttributeError: 'str' object has no attribute 'decode'

Look carefully at the traceback: this line looks deceptively like
a line in your code, except the file name is different.  Your file
is called "./filter_templates" and this line is from a file named
"/usr/lib/python3/dist-packages/checkbox/lib/template.py".

At line 77 in your code, you're calling the "load_file" method on an
instance of a class defined in that file.  Your description sounds
as if you meant to call the "load_file" method on an instance of
your own class.  In other words, it sounds like you're instantiating
the wrong class.

I can't say for certain, because you've left out that bit of your code.
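
One way to check which class an object really is, and which file that class was loaded from, is to ask the object itself. The sketch below uses json.JSONDecoder purely as a stand-in, since the real template class isn't shown; where_defined is a hypothetical helper name.

```python
import inspect
import json


def where_defined(obj):
    """Return (module name, source file) for the class of obj."""
    cls = type(obj)
    # getsourcefile() may return None if no Python source is available.
    return cls.__module__, inspect.getsourcefile(cls)


# Stand-in object; in the real script this would be the template instance.
decoder = json.JSONDecoder()
module_name, source_file = where_defined(decoder)
print(module_name, source_file)
```

If the printed path points into /usr/lib/python3/dist-packages rather than at your own working copy, the edits you are making are not in the file that is being executed.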


> It's telling me that buffer_new is a str object.
> 
> so if I remove the decode():
> 
> Traceback (most recent call last):
>   File "./run_templates", line 142, in <module>
>     sys.exit(main(sys.argv[1:]))
>   File "./run_templates", line 137, in main
>     runner.process(args, options.shell)
>   File "./run_templates", line 39, in process
>     records = self.process_output(process.stdout)
>   File "./run_templates", line 88, in process_output
>     return template.load_file(output)
>   File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
> 73, in load_file
>     for string in self._reader(file):
>   File "/usr/lib/python3/dist-packages/checkbox/lib/template.py", line
> 35, in _reader
>     lines = re.split(delimiter, buffer_old + buffer_new)
> TypeError: Can't convert 'bytes' object to str implicitly
> 
> now it's complaining that buffer_new is a bytes object and can't be
> implicitly converted to str.
> 
> This is a bug introduced in our conversion from Python 2 to Python 3.
> I am really, really starting to dislike some of the things Python3
> does... or just am really, really frustrated.

Try creating a script that we can all run that shows the problem.
Then shorten it by cutting out bits that shouldn't matter for
your problem.  After each cut, run the script, to make sure you
still have the problem.  If the problem goes away, you've cut out
the line with the bug.  Put that line back in, and try to figure
out what's wrong with it.  You might be able to solve your own
problem without even posting the script to this forum.
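
As a starting point, a cut-down script might look something like the sketch below. It is a guess at the situation, not the actual checkbox code: it assumes that the same reader is sometimes handed a text stream and sometimes a byte stream (the two tracebacks come from two different scripts, ./filter_templates and ./run_templates), and it assumes byte streams are UTF-8. The standalone reader function and the sample data are made up for illustration.

```python
import io
import re


def reader(stream, delimiter=r"\n{2,}"):
    """Yield blank-line-separated records from a text OR byte stream."""
    buffer_old = ""
    while True:
        buffer_new = stream.read()
        if not buffer_new:
            break
        # Assumption: byte streams carry UTF-8.  Decode only when needed,
        # so a text-mode stream passes through untouched.
        if isinstance(buffer_new, bytes):
            buffer_new = buffer_new.decode("utf-8")
        lines = re.split(delimiter, buffer_old + buffer_new)
        buffer_old = lines.pop(-1)
        yield from lines
    yield buffer_old


data = "a: 1\n\nb: 2\n"

# The same reader now accepts both kinds of stream.
from_text = list(reader(io.StringIO(data)))
from_bytes = list(reader(io.BytesIO(data.encode("utf-8"))))
print(from_text == from_bytes)

# Each of the two reported errors, reproduced in isolation.
try:
    io.StringIO(data).read().decode()
except AttributeError as exc:
    str_error = str(exc)

try:
    "" + io.BytesIO(b"x").read()
except TypeError as exc:
    bytes_error = str(exc)

print(str_error)
print(bytes_error)
```

This only works as written because read() is called without a size. If the 4096-byte chunking is restored, a chunk boundary can fall in the middle of a multi-byte character, and an incremental decoder (for example from codecs.getincrementaldecoder) would be a safer choice than calling decode() on each chunk.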

Hope this helps,

-- HansM

