Re: using binary in python

Started by: Michiel Overtoom <motoom@xs4all.nl>
First post: 2015-11-09 11:40 +0100
Last post: 2015-11-09 18:17 +0200
Articles: 10 (3 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: using binary in python Michiel Overtoom <motoom@xs4all.nl> - 2015-11-09 11:40 +0100
    Re: using binary in python Marko Rauhamaa <marko@pacujo.net> - 2015-11-09 12:56 +0200
      Re: using binary in python Chris Angelico <rosuav@gmail.com> - 2015-11-09 22:04 +1100
        Re: using binary in python Marko Rauhamaa <marko@pacujo.net> - 2015-11-09 15:25 +0200
          Re: using binary in python Chris Angelico <rosuav@gmail.com> - 2015-11-10 00:52 +1100
            Re: using binary in python Marko Rauhamaa <marko@pacujo.net> - 2015-11-09 16:32 +0200
              Re: using binary in python Chris Angelico <rosuav@gmail.com> - 2015-11-10 02:17 +1100
                Re: using binary in python Marko Rauhamaa <marko@pacujo.net> - 2015-11-09 17:46 +0200
                  Re: using binary in python Chris Angelico <rosuav@gmail.com> - 2015-11-10 02:57 +1100
                    Re: using binary in python Marko Rauhamaa <marko@pacujo.net> - 2015-11-09 18:17 +0200

#98509 — Re: using binary in python

From: Michiel Overtoom <motoom@xs4all.nl>
Date: 2015-11-09 11:40 +0100
Subject: Re: using binary in python
Message-ID: <mailman.168.1447065689.16136.python-list@python.org>

> On 08 Nov 2015, at 22:27, kent nyberg <kent@z-sverige.nu> wrote:
> 
> Well, let's assume I want to write and read binary. How is it done?

With the functions 'open()' and 'read()' and 'write()'. If you're on Windows, don't forget to include a 'b' in the mode string of the open() call, otherwise Python will assume that you're opening a text file.

You might also want to look into the 'struct' module, in particular the functions 'pack()' and 'unpack()'. They convert Python values to and from the binary representation used in binary files.
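
A minimal sketch of the approach described above, using open() in binary mode together with the struct module. The record layout (a 16-bit id followed by a 32-bit float, little-endian), the file name and the helper names are made up for illustration; they are not from the original question.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian unsigned 16-bit id, then a 32-bit float.
RECORD = struct.Struct("<Hf")

def write_record(path, ident, value):
    # "wb" opens the file in binary mode, so write() takes bytes.
    with open(path, "wb") as f:
        f.write(RECORD.pack(ident, value))

def read_record(path):
    # "rb" returns bytes, with no decoding and no newline translation.
    with open(path, "rb") as f:
        return RECORD.unpack(f.read(RECORD.size))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "record.bin")
    write_record(path, 513, 1.5)
    ident, value = read_record(path)
    # Look at the raw bytes that ended up in the file.
    with open(path, "rb") as f:
        raw = f.read()
```

The "<" prefix in the format string fixes the byte order and disables padding, so the file looks the same regardless of the machine that wrote it.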

Greetings,




#98510

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-11-09 12:56 +0200
Message-ID: <87d1vjigqf.fsf@elektro.pacujo.net>
In reply to: #98509

Michiel Overtoom <motoom@xs4all.nl>:

> If you're on Windows, don't forget to include a 'b' in the mode string
> of the open() call, otherwise Python will assume that you're opening a
> text file.

Python has brought that blessing to other operating systems, as well.

One of the principal UNIX innovations was to see files as simple byte
sequences. The operating system would place no semantics on the meaning
or structure of the bytes.

Python presents a different concept of a file; Python files are either
text files or binary files. The dichotomy is built on top of the UNIX
file system. However, the Python model "leaks" in that nothing prevents
you from opening a binary file as a text file or vice versa.
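
The "leak" can be illustrated with a short sketch: the same file is read once in binary mode and once in text mode. The payload bytes and the file name are arbitrary choices for the example.

```python
import os
import tempfile

# Arbitrary non-text bytes; the last one (0xB5) is not valid UTF-8 on its own.
payload = bytes([0x50, 0x4B, 0x03, 0x04, 0xB5])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob")
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode hands back the bytes unchanged.
    with open(path, "rb") as f:
        as_bytes = f.read()

    # Text mode with a permissive 8-bit encoding decodes the same bytes
    # without complaint, even though they were never meant as text.
    with open(path, "r", encoding="latin-1", newline="") as f:
        as_text = f.read()

    # A stricter encoding notices that the data is not valid text.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False
```

Whether the text-mode read fails depends only on the encoding chosen, not on any property the file system records about the file.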


Marko



#98511

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-09 22:04 +1100
Message-ID: <mailman.169.1447067069.16136.python-list@python.org>
In reply to: #98510

On Mon, Nov 9, 2015 at 9:56 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> One of the principal UNIX innovations was to see files as simple byte
> sequences. The operating system would place no semantics on the meaning
> or structure of the bytes.

And you also want to see those files as containing "plain text",
right? Unfortunately, those two goals are in conflict. Either a file
is nothing but bytes, or it contains text in some encoding. From the
file system and operating system's points of view, the files are
indeed nothing but bytes; but from the application's point of view,
text is text and bytes is bytes. In Python, a text file is opened with
a specific encoding, and Python handles the encode/decode steps.
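
As a rough sketch of those encode/decode steps, the following writes a str through a text-mode file and then inspects the bytes that landed on disk. The sample string, file name and choice of UTF-8 are assumptions made for the example.

```python
import os
import tempfile

text = "5 \u00b5m"  # "5 µm"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")

    # Text mode: Python encodes the str to bytes on write.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # The bytes on disk are whatever that encoding produces.
    with open(path, "rb") as f:
        on_disk = f.read()

    # Reading in text mode with the same encoding decodes them back.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
```

Passing encoding= explicitly avoids depending on the platform default, which differs between systems.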

ChrisA



#98521

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-11-09 15:25 +0200
Message-ID: <8737wfi9ss.fsf@elektro.pacujo.net>
In reply to: #98511

Chris Angelico <rosuav@gmail.com>:

> On Mon, Nov 9, 2015 at 9:56 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> One of the principal UNIX innovations was to see files as simple byte
>> sequences. The operating system would place no semantics on the
>> meaning or structure of the bytes.
>
> And you also want to see those files as containing "plain text",
> right? Unfortunately, those two goals are in conflict. Either a file
> is nothing but bytes, or it contains text in some encoding. From the
> file system and operating system's points of view, the files are
> indeed nothing but bytes; but from the application's point of view,
> text is text and bytes is bytes. In Python, a text file is opened with
> a specific encoding, and Python handles the encode/decode steps.

So we have this stack:

  +-------------+
  | Application |
  +-------------+
  |   Python    |
  +-------------+
  |    UNIX     |
  +-------------+

The question is, does Python want to be "just a programming language"
that exposes UNIX to the application program? Or does Python want to
present an abstraction different than UNIX? IOW, is the dividing line
between the application and the operating system above or below Python?

It is evident that Python3 has intentionally moved away from the "just a
programming language" view toward Java's write-once-run-everywhere
ideal.


You would be correct that the original UNIX file system model was based
on somewhat of a naive falsity, namely text=ASCII. No matter how you
view it, there is a conflict of sorts. Python3 is trying to pave over
the conflict, but personally I would prefer the programming language
just give me the OS, warts and all.


Marko



#98523

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-10 00:52 +1100
Message-ID: <mailman.174.1447077141.16136.python-list@python.org>
In reply to: #98521

On Tue, Nov 10, 2015 at 12:25 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> So we have this stack:
>
>   +-------------+
>   | Application |
>   +-------------+
>   |   Python    |
>   +-------------+
>   |    UNIX     |
>   +-------------+
>
> The question is, does Python want to be "just a programming language"
> that exposes UNIX to the application program? Or does Python want to
> present an abstraction different than UNIX? IOW, is the dividing line
> between the application and the operating system above or below Python?
>
> It is evident that Python3 has intentionally moved away from the "just a
> programming language" view toward Java's write-once-run-everywhere
> ideal.
>
>
> You would be correct that the original UNIX file system model was based
> on somewhat of a naive falsity, namely text=ASCII. No matter how you
> view it, there is a conflict of sorts. Python3 is trying to pave over
> the conflict, but personally I would prefer the programming language
> just give me the OS, warts and all.

Then you don't want Python. The point of Python is to give you data
types like "list", "dict", "int" (not a machine word but a bignum),
and so on. It's NOT meant to be a thin wrapper around what your OS
offers. Python's string is a Unicode string, not a series of bytes (as
is C's char* type), because human text is better represented as
Unicode than as bytes; so it stands to reason that Python's files
should be able to contain text, since it's the one most obvious
substrate for data storage other than bytes. You get two easy options
(bytes and text), and for everything else you can use a library that's
built on one of those (pickle, json, etc) or a database.

I expect to be able to write idiomatic Python code and have it run on
Windows, Unix, Mac OS, OS/2, or Mozilla Firefox, and do the same
thing. Since those platforms are so very different, supporting all
five is going to mean restricting myself to only those operations that
are common to them all, but I expect those operations to be spelled
the same way and have the same semantics. I do NOT expect that
multiplying 123456 by 654321 will return 80779853376 on some platforms
and 3470442048 on others, nor do I expect "µ" to render as a micro
sign on some systems, a box drawing character "╡" on others, and as a
capital A with acute "Á" on the rest. (Examples not chosen at random.)
Obviously this is an ideal that sometimes can't be achieved perfectly
(Windows vs Unix file system rules, for instance), but it's definitely
part of Python's goal.
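
Both examples can be checked in a few lines. The sketch below assumes the second number comes from a multiplication that wraps at 32 bits, and that the three renderings correspond to byte 0xB5 read as Latin-1, code page 437 and code page 850; the post itself does not name the platforms or code pages.

```python
# Python's int is arbitrary precision, so the product is exact.
product = 123456 * 654321

# What a multiplication that wraps at 32 bits (unsigned) would keep.
wrapped = product % 2**32

# One byte, three legacy 8-bit encodings, three different characters.
byte = b"\xb5"
renderings = {enc: byte.decode(enc) for enc in ("latin-1", "cp437", "cp850")}

print(product, wrapped, renderings)
```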

If you want C, you know where to get it. Though even C does quite a
bit of papering-over, so maybe you want to be writing assembly code.

ChrisA



#98527

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-11-09 16:32 +0200
Message-ID: <87si4fgs5f.fsf@elektro.pacujo.net>
In reply to: #98523

Chris Angelico <rosuav@gmail.com>:

> On Tue, Nov 10, 2015 at 12:25 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> but personally I would prefer the programming language
>> just give me the OS, warts and all.
>
> Then you don't want Python. The point of Python is to give you data
> types like "list", "dict", "int" (not a machine word but a bignum),
> and so on.

Those examples are out of the scope of the OS abstraction.

> It's NOT meant to be a thin wrapper around what your OS
> offers.

Thankfully, Python hasn't yet taken that away. I can do a lot of nice
things with socket.* and os.* that are unavailable in, say, Java.
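
For instance, the os module exposes raw file descriptors much as the C library does. This is only a small illustration using a pipe; the message contents are arbitrary.

```python
import os

# os.pipe() returns a pair of raw file descriptors, like pipe(2).
read_fd, write_fd = os.pipe()
try:
    os.write(write_fd, b"raw bytes\n")  # os.write() takes bytes, not str
    os.close(write_fd)
    write_fd = None
    data = os.read(read_fd, 1024)       # os.read() returns bytes
finally:
    os.close(read_fd)
    if write_fd is not None:
        os.close(write_fd)
```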

> Python's string is a Unicode string, not a series of bytes (as is C's
> char* type), because human text is better represented as Unicode than
> as bytes;

No problem there, either.

> so it stands to reason that Python's files should be able to contain
> text,

Yes, and lists and dicts and ints and objects and all. No problem there.

However, when filenames and sys.stdin deal with text, things are getting
iffy.


Marko



#98535

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-10 02:17 +1100
Message-ID: <mailman.179.1447082283.16136.python-list@python.org>
In reply to: #98527

On Tue, Nov 10, 2015 at 1:32 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Yes, and lists and dicts and ints and objects and all. No problem there.
>
> However, when filenames and sys.stdin deal with text, things are getting
> iffy.

So where do you mark the boundary between the human and the OS? If I
create a GUI, I should be able to put an entry field down that accepts
Unicode text. And if I make a web form and an HTTP server, a user
should be able to type Unicode text into an <input> field and send
that along. Either way, my program should get a Unicode string. Why
shouldn't I be able to do the same with input()? And why, if a user
enters a plausible file name, should that not be able to be opened as
a file?

At what point do you say "this is for humans, this is for machines"?
Isn't it Python's job to spare us that hassle?

ChrisA



#98538

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-11-09 17:46 +0200
Message-ID: <87lha7goqi.fsf@elektro.pacujo.net>
In reply to: #98535

Chris Angelico <rosuav@gmail.com>:

> On Tue, Nov 10, 2015 at 1:32 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> Yes, and lists and dicts and ints and objects and all. No problem
>> there.
>>
>> However, when filenames and sys.stdin deal with text, things are
>> getting iffy.
>
> So where do you mark the boundary between the human and the OS? If I
> create a GUI, I should be able to put an entry field down that accepts
> Unicode text. And if I make a web form and an HTTP server, a user
> should be able to type Unicode text into an <input> field and send
> that along. Either way, my program should get a Unicode string. Why
> shouldn't I be able to do the same with input()? And why, if a user
> enters a plausible file name, should that not be able to be opened as
> a file?

sys.stdin is not (primarily) a human interface. It is the canonical
channel to relay the input data to the program. The results of the
computation are emitted through sys.stdout.

The input data could well be, say, UTF-8-encoded plain text, or a PDF
file, or a Zip file, or a music recording.
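
In Python 3 such non-text input can still be read through the underlying binary stream, sys.stdin.buffer. A sketch of a byte-for-byte filter follows; copy_bytes is a hypothetical helper name, and in-memory streams stand in for the real stdin and stdout so the example runs on its own.

```python
import io

def copy_bytes(src, dst, chunk_size=65536):
    """Copy one binary stream to another without decoding; return the byte count."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# In a real filter program the call would be:
#     copy_bytes(sys.stdin.buffer, sys.stdout.buffer)
# Here in-memory streams are used instead.
src = io.BytesIO(b"%PDF-1.4\n\xff\xfe binary payload")
dst = io.BytesIO()
copied = copy_bytes(src, dst)
```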

As for file names, even UTF-8 Linux environments often contain filenames
that are illegal UTF-8. Using surrogate characters is a clever trick,
but might even lead to security risks when more than one pathname can
map to the same surrogate encoding.
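
The surrogate trick referred to here is the "surrogateescape" error handler, which os.fsdecode() and os.fsencode() use on POSIX systems. A minimal sketch of how it treats a name that is not valid UTF-8 (the file name is invented for the example):

```python
raw_name = b"report-\xff.txt"  # not valid UTF-8

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
as_str = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes.
back = as_str.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the lone surrogate.
try:
    as_str.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

This shows only the round trip for a single name; it does not by itself settle the security question raised above.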

> At what point do you say "this is for humans, this is for machines"?
> Isn't it Python's job to spare us that hassle?

Python is certainly trying to do that.

   Flik: I was just trying to help.
   Mr. Soil: Then help us; *don't* help us. 

   <URL: http://www.imdb.com/title/tt0120623/quotes>


I program for Linux. I use different programming languages, but the
target is Linux. The systems I build and deal with consist of different
components written in different programming languages but they all
follow Linux-y conventions to work harmoniously together. I don't in any
way benefit from a smoke screen a programming language offers to place
in front of the operating system.


Marko



#98539

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-10 02:57 +1100
Message-ID: <mailman.182.1447084624.16136.python-list@python.org>
In reply to: #98538

On Tue, Nov 10, 2015 at 2:46 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
> I program for Linux. I use different programming languages, but the
> target is Linux. The systems I build and deal with consist of different
> components written in different programming languages but they all
> follow Linux-y conventions to work harmoniously together. I don't in any
> way benefit from a smoke screen a programming language offers to place
> in front of the operating system.

Then, as I said before: You do not want Python. Go use something else
that lets you get closer to the OS - possibly C, but possibly not. I'm
going to keep using a language that lets me write for humans, because
they are my target. Not Linux. People.

ChrisA



#98542

From: Marko Rauhamaa <marko@pacujo.net>
Date: 2015-11-09 18:17 +0200
Message-ID: <87h9kvgnau.fsf@elektro.pacujo.net>
In reply to: #98539

Chris Angelico <rosuav@gmail.com>:

> On Tue, Nov 10, 2015 at 2:46 AM, Marko Rauhamaa <marko@pacujo.net> wrote:
>> I program for Linux. I use different programming languages, but the
>> target is Linux. The systems I build and deal with consist of
>> different components written in different programming languages but
>> they all follow Linux-y conventions to work harmoniously together. I
>> don't in any way benefit from a smoke screen a programming language
>> offers to place in front of the operating system.
>
> Then, as I said before: You do not want Python. Go use something else
> that lets you get closer to the OS

Python still offers almost all of the system programming facilities.
Python3 is slowly drifting away, but it's still all there, thankfully.


Marko


