| Started by | kent nyberg <kent@z-sverige.nu> |
|---|---|
| First post | 2015-11-08 16:27 -0500 |
| Last post | 2015-11-10 17:19 -0700 |
| Articles | 15 — 9 participants |
using binary in python kent nyberg <kent@z-sverige.nu> - 2015-11-08 16:27 -0500
Re: using binary in python Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-09 11:58 +0200
Re: using binary in python Larry Hudson <orgnut@yahoo.com> - 2015-11-09 22:20 -0800
Re: using binary in python Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-10 15:14 -0500
Re: using binary in python mm0fmf <none@mailinator.com> - 2015-11-10 20:36 +0000
Re: using binary in python Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-10 16:02 -0500
OT: Re: using binary in python mm0fmf <none@mailinator.com> - 2015-11-10 22:17 +0000
Re: using binary in python Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-11-10 21:22 +0000
Re: using binary in python Random832 <random832@fastmail.com> - 2015-11-10 19:03 -0500
Re: using binary in python Random832 <random832@fastmail.com> - 2015-11-10 19:04 -0500
Re: using binary in python Larry Hudson <orgnut@yahoo.com> - 2015-11-10 19:53 -0800
Re: using binary in python Random832 <random832@fastmail.com> - 2015-11-10 15:44 -0500
Re: using binary in python kent nyberg <kent@z-sverige.nu> - 2015-11-10 16:29 -0500
Re: using binary in python Christian Gollwitzer <auriocus@gmx.de> - 2015-11-11 22:32 +0100
Re: using binary in python Michael Torrie <torriem@gmail.com> - 2015-11-10 17:19 -0700
| From | kent nyberg <kent@z-sverige.nu> |
|---|---|
| Date | 2015-11-08 16:27 -0500 |
| Subject | using binary in python |
| Message-ID | <mailman.164.1447060794.16136.python-list@python.org> |
Hi there,

Let's say I want to play around with binary files in Python. Opening and using files in Python is something that I think I've sort of got the hang of. The thing I'm wondering about is binary files.

While searching for binary and Python I started reading about bin(). I can use bin() to convert integers to binary. Now, I thought, it should somehow be possible to write "binary" files. That is, non-ASCII files. For example, on Linux, if I run the command 'less' on a binary, for example /bin/ls, then the command asks me if I really want to do it, since it's a binary file. I know why, since the non-ASCII stuff messes up the terminal, and most likely since you rarely want to look at a binary file with less.

Well, let's assume I want to write and read binary. How is it done? The built-in bin() function only converts integers to binary, but the variable or return value is still just letters, right? Converting the integer 1 to binary with bin() returns 0b1. Which is OK. It's the binary representation of integer 1. But since there are files which contain data that is not representable as ASCII, I assume it can be written. But can it be written by Python?

OK, I get the feeling now that it's hard to understand my question. I assume in the C language it's just a matter of writing a complex struct with strange variables and just writing it to a file. But in Python?

Can someone give a short explanation of a Python write to file that makes it a binary file?

Thanks a lot, and forgive me for my stupid questions. :)

/Kent Nyberg
| From | Jussi Piitulainen <harvesting@makes.email.invalid> |
|---|---|
| Date | 2015-11-09 11:58 +0200 |
| Message-ID | <lf5ziynxznx.fsf@ling.helsinki.fi> |
| In reply to | #98504 |
kent nyberg writes:
[- -]
> Well, lets assume I want to write and read binary. How is it done?
[- -]
You open the file with mode "wb" (to write binary) or "rb" (to read
binary), and then you write or read bytes (eight-bit units).
>>> data = '"binääridataa"\n'.encode('utf-8')
>>> f = open('roska.txt', 'wb')
>>> f.write(data)
17
>>> f.close()
The .encode method produced a bytestring, which Python likes to display
as ASCII characters where it can and in hexadecimal where it cannot:
>>> data
b'"bin\xc3\xa4\xc3\xa4ridataa"\n'
An "octal dump" in characters (where ASCII, otherwise apparently octal)
and the corresponding hexadecimal shows that it is, indeed, these bytes
that ended up in the file:
$ od -t cx1 roska.txt
0000000 " b i n 303 244 303 244 r i d a t a a "
22 62 69 6e c3 a4 c3 a4 72 69 64 61 74 61 61 22
0000020 \n
0a
0000021
In UTF-8, the letter 'ä' is encoded as the two bytes c3 a4 (aka 303 244,
and you are welcome to work them out in binary, or even in decimal
though that is less transparent when what you want to see is bits). In
other encodings, there might be just one byte for that letter, and there
are even encodings where the ASCII letters would be different bytes.
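A small sketch of the point about encodings, assuming Python 3 and three codecs that ship with it (`utf-8`, `latin-1`, and the EBCDIC code page `cp037`, picked here only as illustrations):

```python
# Encode the same letters under a few codecs that ship with Python.
letter = 'ä'

utf8_bytes = letter.encode('utf-8')      # two bytes in UTF-8
latin1_bytes = letter.encode('latin-1')  # a single byte in ISO 8859-1

# cp037 is an EBCDIC code page; even plain ASCII letters get other byte values.
ascii_a = 'a'.encode('ascii')
ebcdic_a = 'a'.encode('cp037')

print(utf8_bytes.hex(), latin1_bytes.hex())   # c3a4 e4
print(ascii_a.hex(), ebcdic_a.hex())          # 61 81
```

The text is the same in every case; only the bytes chosen to represent it differ.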
Fixed-size numbers (32-bit and 64-bit integers and floating-point numbers)
have conventional representations as groups of four or eight bytes. You
could interpret parts of the above text as such instead.
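As a hedged illustration of reading text bytes as a number, the standard `struct` module can reinterpret the first four bytes of the data above as one unsigned 32-bit integer. The choice of the first four bytes and of both byte orders is only for the example:

```python
import struct

data = '"binääridataa"\n'.encode('utf-8')

# Reinterpret the first four bytes (22 62 69 6e) as one unsigned 32-bit integer.
first_four = data[:4]
little = struct.unpack('<I', first_four)[0]   # least significant byte first
big = struct.unpack('>I', first_four)[0]      # most significant byte first

print(hex(little))  # 0x6e696222
print(hex(big))     # 0x2262696e
```

The bytes never change; only the rule used to interpret them does.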
| From | Larry Hudson <orgnut@yahoo.com> |
|---|---|
| Date | 2015-11-09 22:20 -0800 |
| Message-ID | <HbGdndTOtKc3EdzLnZ2dnUU7-YnOydjZ@giganews.com> |
| In reply to | #98504 |
Your questions are somewhat difficult to answer because you misunderstand binary. The key is
that EVERYTHING in a computer is binary. There are NO EXCEPTIONS, it's all binary ALL the time.
The difference comes about in how this binary data is displayed and manipulated. I want to
emphasize, ALL the DATA is binary.
On 11/08/2015 01:27 PM, kent nyberg wrote:
> Hi there,
> Let's say I want to play around with binary files in Python.
> Opening and using files in Python is something that I think I've sort of got the hang of.
> The thing I'm wondering about is binary files.
> While searching for binary and Python I started reading about bin().
> I can use bin() to convert integers to binary.
No. It doesn't convert anything. It takes the integer data (which is internally binary) and
gives you a _string_ representing that value in a binary format. The same with hex() and oct()
which give _strings_ of text corresponding to those formats. The original integer data is
unchanged -- and is still internally binary data.
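A short sketch of that distinction, assuming Python 3. The variable names are made up for the example:

```python
n = 5
text = bin(n)             # a str made of the characters '0', 'b', '1', ...
print(text, type(text))   # 0b101 <class 'str'>

# Parsing the string gives back the same integer; n itself never changed.
parsed = int(text, 2)

# Storing the string stores text characters; storing the value stores one byte.
as_text = text.encode('ascii')   # b'0b101' (five bytes)
as_byte = bytes([n])             # b'\x05'  (one byte)
print(as_text, as_byte)
```

Writing `as_text` to a file produces a perfectly readable text file; writing `as_byte` produces the single byte 0x05.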
> Now, I thought, it should somehow be possible to write "binary" files.
> That is, non-ASCII files. For example, on Linux, if I run the command 'less' on a binary,
> for example /bin/ls, then the command asks me if I really want to do it, since it's a binary file.
> I know why, since the non-ASCII stuff messes up the terminal, and most likely since you rarely want to
> look at a binary file with less.
Not so much that it's rare, it's just that less (and similar commands) expects the _binary
data_ it is given to represent text encoded in ASCII, UTF-8, whatever... And ls is an
executable program, not text. So less (or the like) tries to display that data as text, and of
course, the result is garbage. (GIGO)
If you really want to look at a non-text file this way, there are utilities to do this properly,
like hexdump.
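For a feel of what such a utility does, here is a minimal sketch in Python 3. The `hexdump` function below is a hypothetical helper written for this example, not the real hexdump program, and its layout only loosely resembles that tool's output:

```python
def hexdump(data, width=16):
    """Return dump lines for a bytes object (illustrative helper only)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = ' '.join('{:02x}'.format(b) for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append('{:08x}  {}  {}'.format(
            offset, hex_part.ljust(width * 3 - 1), text_part))
    return lines

sample = b'\x7fELF\x02\x01\x01\x00'   # made-up sample bytes
for line in hexdump(sample):
    print(line)
```

To inspect a real file, pass it the result of `open(path, 'rb').read()`.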
> Well, let's assume I want to write and read binary. How is it done?
> The built-in bin() function only converts integers to binary, but the variable or return value is still just letters, right?
> Converting the integer 1 to binary with bin() returns 0b1. Which is OK. It's the binary representation of integer 1.
> But since there are files which contain data that is not representable as ASCII, I assume it can be written.
> But can it be written by Python?
Of course it can. The only difference between a text file and a binary file is the way it's opened.
Text files are opened with 'r' or 'w', while binary files are opened with 'rb' or 'wb'. Being
different modes, the reading/writing is handled differently. One obvious difference, the lines
of a text file are marked by ending them with a newline character, so it's easy to read/write
the text line-by-line. But the data in a binary file is completely arbitrary and is much
trickier to handle (is an individual piece of data 1 byte, two bytes, 12,428 bytes or...) Once
again I'll emphasize that the _data_ in a text file is binary, it's just that these binary data
values represent text codes. The data in a binary file can represent anything, a program, a
.jpg picture, .mp3 music, a company Personnel record, or anything else.
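A minimal sketch of the two modes applied to the same file, assuming Python 3. The temporary path and the sample text are invented for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # throwaway location

# Text mode: Python encodes the characters to bytes on the way out.
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write('héllo\n')

with open(path, 'rb') as f:                    # binary mode: raw bytes
    raw = f.read()
with open(path, 'r', encoding='utf-8') as f:   # text mode: decoded str
    text = f.read()

print(raw)          # b'h\xc3\xa9llo\n'
print(repr(text))   # 'héllo\n'
```

The file on disk is identical in both reads; only the mode decides whether Python hands back bytes or decoded characters.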
BTW, I talk about text codes as a generic term -- there are many choices of how text is encoded.
> OK, I get the feeling now that it's hard to understand my question. I assume in the C language it's just a matter of
> writing a complex struct with strange variables and just writing it to a file. But in Python?
There is no essential difference between how C and Python handle binary files. The principles
are the same, only the details differ. You can ignore comparing to C here.
> Can someone give a short explanation of a Python write to file that makes it a binary file?
A short explanation, probably not in a newsgroup post. Try searching the web for Python binary
files, or similar search terms. Then sit down and play! ;-)
> Thanks a lot, and forgive me for my stupid questions. :)
> /Kent Nyberg
It's not a stupid question, but it is based on your misunderstanding of binary. Don't give up,
you'll get it! Just keep in mind that _ALL_ the underlying _DATA_ is binary, it is just how
this data is displayed and manipulated that makes the differences. And it is important that you
understand the difference between the actual (binary) data and the way it is displayed -- these
are two entirely different things.
Sorry that I'm not more specific on the 'how' to use binary files, but the subject is more
complex than a simple explanation can properly give.
-- Larry -=-
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2015-11-10 15:14 -0500 |
| Message-ID | <mailman.214.1447186467.16136.python-list@python.org> |
| In reply to | #98576 |
On Mon, 9 Nov 2015 22:20:25 -0800, Larry Hudson via Python-list
<python-list@python.org> declaimed the following:
>Of course it can. The only difference between a text file and a binary file is the way it's opened.
>Text files are opened with 'r' or 'w', while binary files are opened with 'rb' or 'wb'. Being
>different modes, the reading/writing is handled differently. One obvious difference, the lines
>of a text file are marked by ending them with a newline character, so it's easy to read/write
>the text line-by-line. But the data in a binary file is completely arbitrary and is much
To be strict -- a text file has <some> system defined means of marking
line endings. UNIX/Linux uses just a <LF> character; Windows uses the pair
<CR><LF>. TRS-DOS used just <CR> for end of line. Some operating systems
may have used count-delimited formats (and then there is the VMS FORTRAN
segmented records with start and end segment bits).
Whatever the system uses, a text file can be read by "lines", the
system detecting the break between lines. A file opened in binary mode does
not have "lines", and if the system uses in-band delimiters (<LF>, et al)
those delimiters are returned as just another byte of data. (I suppose a
count-based system could treat the length as either in-band, returning it
as data, or out-of-band, stripping the count values while returning the
rest).
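The in-band case can be sketched in Python 3 as follows. The file name and contents are invented for the example, and the text-mode read relies on Python's default universal-newline handling:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

# Write Windows-style line endings as raw bytes, whatever the platform.
with open(path, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

with open(path, 'r', encoding='ascii') as f:   # text mode: the file has lines
    as_lines = f.readlines()
with open(path, 'rb') as f:                    # binary mode: delimiters are data
    as_bytes = f.read()

print(as_lines)   # ['one\n', 'two\n']
print(as_bytes)   # b'one\r\ntwo\r\n'
```

In text mode the <CR><LF> pairs come back as a single '\n'; in binary mode they come back as two ordinary bytes.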
The Ada language defines the end of Text file to consist of <end of
line><end of page><end of file> (yes, the language defines end-of-page as a
controllable feature, and explicitly states that all three must be at the
end of a file) BUT then goes on to state that the nature of the delimiters
is implementation defined.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | mm0fmf <none@mailinator.com> |
|---|---|
| Date | 2015-11-10 20:36 +0000 |
| Message-ID | <Hzs0y.118789$4v5.87935@fx37.am4> |
| In reply to | #98606 |
On 10/11/2015 20:14, Dennis Lee Bieber wrote:
> The Ada language defines the end of Text file to consist of

It is 15 years this month since I last worked in a place that used Ada. I think that calls for a wee dram to celebrate ;-)
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2015-11-10 16:02 -0500 |
| Message-ID | <mailman.216.1447189352.16136.python-list@python.org> |
| In reply to | #98607 |
On Tue, 10 Nov 2015 20:36:52 +0000, mm0fmf via Python-list
<python-list@python.org> declaimed the following:
>On 10/11/2015 20:14, Dennis Lee Bieber wrote:
>> The Ada language defines the end of Text file to consist of
>
>It is 15 years this month since I last worked in place that used Ada. I
>think that calls for a wee dram to celebrate ;-)
Given that a dram is 1/8 of a "fluid ounce" that leads to the
conclusion that a "wee dram" is based on US standard fluid ounce, vs British
standard fluid ounce...
My language preferences do tend to be the extremes: Python for quick
throw-away stuff, Ada for more formal stuff (since it has a much more
rigorous syntax than Pascal, Modula-2, C/C++, Java -- no optional block
delimiters, no dangling else, etc.)
Unfortunately, as a hobbyist dabbler at home, I can't justify the time
to port an Ada compiler to Arduino, TIVA, Propeller, Beaglebone (though the
latter may just be a case of porting the hardware access). So... I'm stuck
with variants of C for those devices (again, excluding the Linux based
Beaglebone)
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | mm0fmf <none@mailinator.com> |
|---|---|
| Date | 2015-11-10 22:17 +0000 |
| Subject | OT: Re: using binary in python |
| Message-ID | <82u0y.114989$bA1.105280@fx41.am4> |
| In reply to | #98609 |
On 10/11/2015 21:02, Dennis Lee Bieber wrote:
> On Tue, 10 Nov 2015 20:36:52 +0000, mm0fmf via Python-list
> <python-list@python.org> declaimed the following:
>
>> On 10/11/2015 20:14, Dennis Lee Bieber wrote:
>>> The Ada language defines the end of Text file to consist of
>>
>> It is 15 years this month since I last worked in place that used Ada. I
>> think that calls for a wee dram to celebrate ;-)
>
> Given that a dram is 1/8 of a "fluid ounce" that leads to the
> conclusion that a "wee dram" is based on US standard fluid ounce, vs British
> standard fluid ounce...
>
> My language preferences do tend to be the extremes: Python for quick
> throw-away stuff, Ada for more formal stuff (since it has a much more
> rigorous syntax than Pascal, Modula-2, C/C++, Java -- no optional block
> delimiters, no dangling else, etc.)
>
> Unfortunately, as a hobbyist dabbler at home, I can't justify the time
> to port an Ada compiler to Arduino, TIVA, Propeller, Beaglebone (though the
> latter may just be a case of porting the hardware access). So... I'm stuck
> with variants of C for those devices (again, excluding the Linux based
> Beaglebone)
>

I escaped having to produce new code in Ada, I merely had to run some scripts that added the compiled C binaries into the Ada gloop!

C user since 1983, C++ user since 2002, Python and C# since 2010. I regularly pinch myself that it seems to be painfully easy to be productive using Python compared to the other languages!
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-11-10 21:22 +0000 |
| Message-ID | <mailman.217.1447190561.16136.python-list@python.org> |
| In reply to | #98607 |
On 10/11/2015 20:36, mm0fmf via Python-list wrote:
> On 10/11/2015 20:14, Dennis Lee Bieber wrote:
>> The Ada language defines the end of Text file to consist of
>
> It is 15 years this month since I last worked in place that used Ada. I
> think that calls for a wee dram to celebrate ;-)

Followed by a chorus or two of the Ian Dury song about Ada to put the icing on the cake.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
| From | Random832 <random832@fastmail.com> |
|---|---|
| Date | 2015-11-10 19:03 -0500 |
| Message-ID | <mailman.229.1447200287.16136.python-list@python.org> |
| In reply to | #98607 |
Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
> Given that a dram is 1/8 of a "fluid ounce" that leads to the
> conclusion that a "wee dram" is based on US standard fluid ounce,

29.6 ml

> vs British standard fluid ounce...

28.4 ml

It's our _pints_ that are smaller than yours, not our ounces.
| From | Random832 <random832@fastmail.com> |
|---|---|
| Date | 2015-11-10 19:04 -0500 |
| Message-ID | <mailman.230.1447200312.16136.python-list@python.org> |
| In reply to | #98607 |
Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
> Given that a dram is 1/8 of a "fluid ounce" that leads to the
> conclusion that a "wee dram" is based on US standard fluid ounce,

29.6 ml

> vs British standard fluid ounce...

28.4 ml

It's our _pints_ that are smaller than yours, not our ounces.
| From | Larry Hudson <orgnut@yahoo.com> |
|---|---|
| Date | 2015-11-10 19:53 -0800 |
| Message-ID | <Gq2dne_GjoQtJt_LnZ2dnUU7-e-dnZ2d@giganews.com> |
| In reply to | #98606 |
On 11/10/2015 12:14 PM, Dennis Lee Bieber wrote:
> On Mon, 9 Nov 2015 22:20:25 -0800, Larry Hudson via Python-list
> <python-list@python.org> declaimed the following:
>
>> Of course it can. The only difference between a text file and a binary file is the way it's opened.
>> Text files are opened with 'r' or 'w', while binary files are opened with 'rb' or 'wb'. Being
>> different modes, the reading/writing is handled differently. One obvious difference, the lines
>> of a text file are marked by ending them with a newline character, so it's easy to read/write
>> the text line-by-line. But the data in a binary file is completely arbitrary and is much
>
> To be strict -- a text file has <some> system defined means of marking
> line endings. UNIX/Linux uses just a <LF> character; Windows uses the pair
> <CR><LF>. TRS-DOS used just <CR> for end of line. Some operating systems
> may have used count-delimited formats (and then there is the VMS FORTRAN
> segmented records with start and end segment bits).
>
The main purpose of my message was to get across the idea of separating the actual data (as
binary values) and the way this data is displayed (to the user/programmer). They are two
entirely different concepts, and the OP was obviously confused about this. But of course,
you're right -- I was careless/imprecise in some of my descriptions.
-=- Larry -=-
| From | Random832 <random832@fastmail.com> |
|---|---|
| Date | 2015-11-10 15:44 -0500 |
| Message-ID | <mailman.215.1447188327.16136.python-list@python.org> |
| In reply to | #98576 |
Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
> To be strict -- a text file has <some> system defined means of marking
> line endings. UNIX/Linux uses just a <LF> character; Windows uses the pair
> <CR><LF>. TRS-DOS used just <CR> for end of line. Some operating systems
> may have used count-delimited formats (and then there is the VMS FORTRAN
> segmented records with start and end segment bits).

Another possibility would be fixed-length records. The ANSI C standard permits a maximum line length (no less than 254) and for trailing spaces to be ignored.
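A rough sketch of what a fixed-length-record text format could look like, written in Python 3. The record width of 8 and both helper functions are hypothetical, invented for illustration rather than taken from any real system:

```python
RECORD_LEN = 8  # hypothetical record width, chosen for the example

def to_records(lines):
    """Pad each line with spaces to RECORD_LEN bytes; no newline bytes are stored."""
    return b''.join(line.encode('ascii').ljust(RECORD_LEN) for line in lines)

def from_records(blob):
    """Split a blob into RECORD_LEN-byte records and drop the trailing padding."""
    return [blob[i:i + RECORD_LEN].rstrip(b' ').decode('ascii')
            for i in range(0, len(blob), RECORD_LEN)]

blob = to_records(['alpha', 'be', 'gamma'])
print(len(blob))            # 24
print(from_records(blob))   # ['alpha', 'be', 'gamma']
```

Because the padding cannot be told apart from real trailing spaces, a format like this has to ignore them, and lines longer than the record width need separate handling that this sketch leaves out.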
| From | kent nyberg <kent@z-sverige.nu> |
|---|---|
| Date | 2015-11-10 16:29 -0500 |
| Message-ID | <mailman.218.1447190949.16136.python-list@python.org> |
| In reply to | #98576 |
On Mon, Nov 09, 2015 at 10:20:25PM -0800, Larry Hudson via Python-list wrote:
> Your questions are somewhat difficult to answer because you misunderstand
> binary. The key is that EVERYTHING in a computer is binary. There are NO
> EXCEPTIONS, it's all binary ALL the time. The difference comes about in how
> this binary data is displayed and manipulated. I want to emphasize, ALL the
> DATA is binary.
>

Thanks a lot for taking the time.
I get it now. I sort of, but not fully, misunderstood the concept of binary files.
The thing I was after, and the thing I'm playing with now after a more successful time with Google,
is writing more specific things to a file than just strings.
English is not my native language so please forgive me, but
I wanted to write specific 16-bit codes, and read them. And later play with bitwise operations on them. Sort of.
It might not make sense at all, but hey, it doesn't have to.
Thanks anyway. :)
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-11-11 22:32 +0100 |
| Message-ID | <n20c0p$c0o$1@dont-email.me> |
| In reply to | #98611 |
On 10.11.15 at 22:29, kent nyberg wrote:
> On Mon, Nov 09, 2015 at 10:20:25PM -0800, Larry Hudson via Python-list wrote:
>> Your questions are somewhat difficult to answer because you misunderstand
>> binary. The key is that EVERYTHING in a computer is binary. There are NO
>> EXCEPTIONS, it's all binary ALL the time. The difference comes about in how
>> this binary data is displayed and manipulated. I want to emphasize, ALL the
>> DATA is binary.
>>
>
> Thanks a lot for taking the time.
> I get it now. I sort of, but not fully, misunderstood the concept of binary files.
> The thing I was after, and the thing I'm playing with now after a more successful time with Google,
> is writing more specific things to a file than just strings.
> English is not my native language so please forgive me, but
> I wanted to write specific 16-bit codes, and read them. And later play with bitwise operations on them. Sort of.
> It might not make sense at all, but hey, it doesn't have to

I think I understand what you want. Look at the struct module:

https://docs.python.org/2/library/struct.html

You can write/read binary data from files with standard means. Using struct, you can interpret or format integer values into a specific binary format. That would allow you to create a reader or writer for a given binary format in Python.

Christian
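A minimal sketch of that suggestion in Python 3, using `struct` to write three 16-bit codes and read them back. The code values, the little-endian byte order, and the temporary file name are all assumptions made for the example:

```python
import os
import struct
import tempfile

codes = [0x0001, 0x00FF, 0xABCD]          # example 16-bit values
path = os.path.join(tempfile.mkdtemp(), 'codes.bin')

# '<' = little-endian, 'H' = unsigned 16-bit; three codes become six bytes.
with open(path, 'wb') as f:
    f.write(struct.pack('<{}H'.format(len(codes)), *codes))

with open(path, 'rb') as f:
    raw = f.read()
restored = list(struct.unpack('<{}H'.format(len(raw) // 2), raw))

print(raw)        # b'\x01\x00\xff\x00\xcd\xab'
print(restored)   # [1, 255, 43981]

# Bitwise operations work on the ints, not on the bytes object.
high_byte = restored[2] >> 8          # 0xAB
low_byte = restored[2] & 0xFF         # 0xCD
flipped = restored[2] ^ 0xFFFF        # 0x5432
```

Switching `'<'` to `'>'` would store the same codes with the bytes of each pair swapped, which is why a binary format has to state its byte order.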
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-11-10 17:19 -0700 |
| Message-ID | <mailman.232.1447201165.16136.python-list@python.org> |
| In reply to | #98576 |
On 11/10/2015 02:29 PM, kent nyberg wrote:
> On Mon, Nov 09, 2015 at 10:20:25PM -0800, Larry Hudson via Python-list wrote:
>> Your questions are somewhat difficult to answer because you misunderstand
>> binary. The key is that EVERYTHING in a computer is binary. There are NO
>> EXCEPTIONS, it's all binary ALL the time. The difference comes about in how
>> this binary data is displayed and manipulated. I want to emphasize, ALL the
>> DATA is binary.
>>
>
> Thanks a lot for taking the time.
> I get it now. I sort of, but not fully, misunderstood the concept of binary files.
> The thing I was after, and the thing I'm playing with now after a more successful time with Google,
> is writing more specific things to a file than just strings.
> English is not my native language so please forgive me, but
> I wanted to write specific 16-bit codes, and read them. And later play with bitwise operations on them. Sort of.
> It might not make sense at all, but hey, it doesn't have to.
> Thanks anyway. :)

You're correct; it doesn't make that much sense. If it were me I'd write out my numbers in text format to the file. You can always read them back in and convert them to a number.

Just a quick couple of notes on "binary" vs "text". In the old days on Windows, the difference between "binary" and "ascii" when it came to file reading was simply the interpretation of the end-of-line marker. Whenever you wrote out a \n, it got silently converted to two bytes, 0x0d 0x0a. If you were trying to write a jpeg file, for example, this would corrupt things as you well know. In the Unix world, we never worried about such things because the end-of-line marker was simply 0x0a. It was never translated and never was expanded silently. So when it came to how we worked with files, there was no difference between binary and ascii modes as far as the C library open() was concerned.

Now with Python 3, we once again have to think about the distinction between "text" and "binary" when working with files.
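As a rough illustration of why the distinction matters, the silent newline expansion described for old Windows text mode can be imitated on any platform in Python 3 by passing `newline='\r\n'` to `open()`. The three payload bytes and the use of `latin-1` (which maps every byte to exactly one character) are assumptions made for the example:

```python
import os
import tempfile

payload = bytes([0xFF, 0x0A, 0xD8])   # arbitrary bytes that happen to include 0x0a
path = os.path.join(tempfile.mkdtemp(), 'payload.bin')

# Text mode with newline='\r\n' mimics the old Windows translation.
# Only the newline handling changes the data, since latin-1 is byte-for-byte.
with open(path, 'w', encoding='latin-1', newline='\r\n') as f:
    f.write(payload.decode('latin-1'))
with open(path, 'rb') as f:
    corrupted = f.read()

# Binary mode writes the bytes untouched.
with open(path, 'wb') as f:
    f.write(payload)
with open(path, 'rb') as f:
    intact = f.read()

print(corrupted)   # b'\xff\r\n\xd8'
print(intact)      # b'\xff\n\xd8'
```

The text-mode write grew the file by one byte, which is exactly the kind of damage that ruins an image or any other non-text format.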
If we want to open a text file, we have to open it with the expected text encoding in mind, whether that is UTF-8, UCS-2, UTF-16, or some other old and esoteric encoding. That is to say, when Python reads from a text file, there is always going to be a decoding process going on where text file bytes are read in and then converted into unicode characters. When writing a text file out, unicode characters have to be encoded into a series of bytes. If Python knows what encoding we want, then it can do this automatically as we write to the file. In Python 3, opening a file for binary will read in raw bytes and you can manipulate them however you wish.
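A small sketch of that decoding step in Python 3, reading the same file three ways. The file name and the single-character content are invented for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'note.txt')

with open(path, 'w', encoding='utf-8') as f:     # str -> bytes on write
    f.write('ä')

with open(path, 'r', encoding='utf-8') as f:     # same encoding: round-trips
    right = f.read()
with open(path, 'r', encoding='latin-1') as f:   # wrong encoding: garbled text
    wrong = f.read()
with open(path, 'rb') as f:                      # binary: no decoding at all
    raw = f.read()

print(repr(right), repr(wrong), raw)   # 'ä' 'Ã¤' b'\xc3\xa4'
```

Nothing in the file says which encoding was used, so reading text back correctly depends on passing the same `encoding` that was used to write it.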