

Groups > comp.lang.python > #50110 > unrolled thread

hex dump w/ or w/out utf-8 chars

Started by: blatt <ferdy.blatsco@gmail.com>
First post: 2013-07-07 17:22 -0700
Last post: 2013-07-13 04:51 +0000
Articles 9 on this page of 49 — 15 participants



Contents

  hex dump w/ or w/out utf-8 chars blatt <ferdy.blatsco@gmail.com> - 2013-07-07 17:22 -0700
    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-08 11:17 +1000
    Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-08 05:48 +0000
    Re: hex dump w/ or w/out utf-8 chars ferdy.blatsco@gmail.com - 2013-07-08 10:31 -0700
      Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 03:52 +1000
        Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 06:18 -0700
          Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-11 23:32 +1000
            Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 11:42 -0700
              Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-11 11:44 -0700
              Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-12 03:18 +0000
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-12 14:42 -0700
              Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-12 12:16 +1000
                Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-13 00:56 -0700
                  Re: hex dump w/ or w/out utf-8 chars Lele Gaifax <lele@metapensiero.it> - 2013-07-13 10:24 +0200
                  Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 09:36 +0000
                  Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-13 19:46 +1000
                  Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 09:49 +0000
                    Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-13 20:09 +1000
                    Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-13 07:37 -0700
                      Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-13 15:02 -0400
                        Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-14 01:20 -0700
                          Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-14 10:44 +0000
                            Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-14 06:44 -0700
                              Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-24 06:28 -0700
                      Re: hex dump w/ or w/out utf-8 chars Neil Hodgson <nhodgson@iinet.net.au> - 2013-07-14 09:17 +1000
    Re: hex dump w/ or w/out utf-8 chars ferdy.blatsco@gmail.com - 2013-07-08 10:53 -0700
      Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 04:07 +1000
      Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-08 16:56 -0400
        Re: hex dump w/ or w/out utf-8 chars Neil Cerutti <neilc@norwich.edu> - 2013-07-09 12:22 +0000
          Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-09 08:54 -0400
            Re: hex dump w/ or w/out utf-8 chars Neil Cerutti <neilc@norwich.edu> - 2013-07-09 13:00 +0000
              Re: hex dump w/ or w/out utf-8 chars Skip Montanaro <skip@pobox.com> - 2013-07-09 08:18 -0500
              Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-09 09:23 -0400
      Re: hex dump w/ or w/out utf-8 chars MRAB <python@mrabarnett.plus.com> - 2013-07-08 22:38 +0100
      Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 07:49 +1000
        Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 06:53 +0000
      Re: hex dump w/ or w/out utf-8 chars Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-08 23:02 +0100
      Re: hex dump w/ or w/out utf-8 chars Dave Angel <davea@davea.name> - 2013-07-08 18:45 -0400
      Re: hex dump w/ or w/out utf-8 chars Chris Angelico <rosuav@gmail.com> - 2013-07-09 08:51 +1000
      Re: hex dump w/ or w/out utf-8 chars MRAB <python@mrabarnett.plus.com> - 2013-07-09 00:32 +0100
        Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 06:46 +0000
      Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 07:00 +0000
        Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-09 02:34 -0700
          Re: hex dump w/ or w/out utf-8 chars Chris “Kwpolska” Warrick <kwpolska@gmail.com> - 2013-07-09 12:15 +0200
            Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-09 16:32 +0000
              Re: hex dump w/ or w/out utf-8 chars wxjmfauth@gmail.com - 2013-07-10 01:52 -0700
          Re: hex dump w/ or w/out utf-8 chars Joshua Landau <joshua@landau.ws> - 2013-07-12 23:01 +0100
            Re: hex dump w/ or w/out utf-8 chars Tim Roberts <timr@probo.com> - 2013-07-12 20:42 -0700
            Re: hex dump w/ or w/out utf-8 chars Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-07-13 04:51 +0000

Page 3 of 3 — ← Prev page 1 2 [3]


#50211

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-09 06:46 +0000
Message-ID: <51dbb1d0$0$6512$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50184
On Tue, 09 Jul 2013 00:32:00 +0100, MRAB wrote:

> On 08/07/2013 23:02, Joshua Landau wrote:
>> On 8 July 2013 22:38, MRAB <python@mrabarnett.plus.com> wrote:
>>> On 08/07/2013 21:56, Dave Angel wrote:
>>>> Characters do not have a width.
>>>
>>> [snip]
>>>
>>> It depends what you mean by "width"! :-)
>>>
>>> Try this (Python 3):
>>>
>>> >>> print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")
>>> AＡ
>>
>> Serious question: How would one find the width of a character by that
>> definition?
>>
>  >>> import unicodedata
>  >>> unicodedata.east_asian_width("A")
> 'Na'
>  >>> unicodedata.east_asian_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}")
> 'F'
> 
> The possible widths are:
> 
>      N  = Neutral
>      A  = Ambiguous
>      H  = Halfwidth
>      W  = Wide
>      F  = Fullwidth
>      Na = Narrow
> 
> All you then need to do is find out what those actually mean...

In some East-Asian encodings, there are code-points for Latin characters 
in two forms: "half-width" and "full-width". The half-width form took up 
a single fixed-width column; the full-width forms took up two fixed-width 
columns, so they would line up nicely in columns with Asian characters.

See also:

http://www.unicode.org/reports/tr11/

and search Wikipedia for "full-width" and "half-width".
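
A minimal Python 3 sketch of the column idea described above. The helper name display_width is hypothetical, and treating only the 'W' and 'F' categories as two columns (with 'A', Ambiguous, counted as one) is an assumption; real terminals may disagree:

```python
import unicodedata

def display_width(text):
    """Estimate how many fixed-width columns `text` occupies.

    Assumption: Wide ('W') and Fullwidth ('F') characters take two
    columns, combining marks take none, everything else takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width

print(display_width("A"))                                     # 1
print(display_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}"))  # 2
```

This is only an estimate: the category tells you how a character was classified for East Asian typography, not how a particular font or terminal will render it.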


-- 
Steven



#50214

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-09 07:00 +0000
Message-ID: <51dbb4f2$0$6512$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50165
On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:

> Not using python 3, for me (a programmer who was present at the
> beginning of computer science, badly interacting with many languages
> from assembler to Fortran and from c to Pascal and so on) it was a hard
> job to arrange the abrupt transition from characters only equal to bytes

Characters have *never* been equal to bytes. Not even Perl treats the 
character 'A' as equal to the byte 0x0A:

if (0x0A eq 'A') {print "Equal\n";}
else {print "Unequal\n";}

will print Unequal, even if you replace "eq" with "==". Nor does Perl 
consider the character 'A' equal to 65.

If you have learned to think of characters being equal to bytes, you have 
learned wrong.
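
The same distinction can be sketched in Python 3, where text (str) and bytes are separate types. The helper name utf8_lengths and the sample string are illustrative assumptions, not something from the thread:

```python
def utf8_lengths(text):
    """Return (character, number of UTF-8 bytes) pairs for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]

sample = "A\u00e9\u20ac"  # 'A', 'é' and '€'

print(len(sample))                   # 3 characters (code points)
print(len(sample.encode("utf-8")))   # 6 bytes once encoded as UTF-8
print(utf8_lengths(sample))          # one, two and three bytes respectively
print(ord("A"), hex(ord("A")))       # 65 0x41: a code point number, not the character
```

The number of bytes depends on the encoding chosen; the number of characters does not.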


> to some special characters defined with 2, 3 bytes and even more. I
> should have preferred another solution... but i'm not Guido....!

What's a special character?

To an Italian, the characters J, K, W, X and Y are "special characters" 
which do not exist in the ordinary alphabet. To a German, they are not 
special, but S is special because you write SS as ß, but only in 
lowercase.

To a mathematician, σ is just as ordinary as it would be to a Greek; but 
the mathematician probably won't recognise ς unless she actually is 
Greek, even though they are the same letter.

To an American electrician, Ω is an ordinary character, but ω isn't.

To anyone working with angles, or temperatures, the degree symbol ° is an 
ordinary character, but the radian symbol is not. (I can't even find it.)

The English have forgotten that W used to be a ligature for VV, and 
consider it a single ordinary character. But the ligature Æ is considered 
an old-fashioned way of writing AE.

But to Danes and Norwegians, Æ is an ordinary letter, as distinct from AE 
as TH is from Þ. (Which English used to have.) And so on... 

I don't know what a special character is, unless it is the ASCII NUL 
character, since that terminates C strings.



-- 
Steven



#50228

From: wxjmfauth@gmail.com
Date: 2013-07-09 02:34 -0700
Message-ID: <60b625d1-de98-4307-8e67-ada361111954@googlegroups.com>
In reply to: #50214
On Tuesday, 9 July 2013 09:00:02 UTC+2, Steven D'Aprano wrote:
> On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:
>
> > Not using python 3, for me (a programmer who was present at the
> > beginning of computer science, badly interacting with many languages
> > from assembler to Fortran and from c to Pascal and so on) it was a hard
> > job to arrange the abrupt transition from characters only equal to bytes
>
> Characters have *never* been equal to bytes. Not even Perl treats the
> character 'A' as equal to the byte 0x0A:
>
> if (0x0A eq 'A') {print "Equal\n";}
> else {print "Unequal\n";}
>
> will print Unequal, even if you replace "eq" with "==". Nor does Perl
> consider the character 'A' equal to 65.
>
> If you have learned to think of characters being equal to bytes, you have
> learned wrong.
>
> > to some special characters defined with 2, 3 bytes and even more. I
> > should have preferred another solution... but i'm not Guido....!
>
> What's a special character?
>
> To an Italian, the characters J, K, W, X and Y are "special characters"
> which do not exist in the ordinary alphabet. To a German, they are not
> special, but S is special because you write SS as ß, but only in
> lowercase.
>
> To a mathematician, σ is just as ordinary as it would be to a Greek; but
> the mathematician probably won't recognise ς unless she actually is
> Greek, even though they are the same letter.
>
> To an American electrician, Ω is an ordinary character, but ω isn't.
>
> To anyone working with angles, or temperatures, the degree symbol ° is an
> ordinary character, but the radian symbol is not. (I can't even find it.)
>
> The English have forgotten that W used to be a ligature for VV, and
> consider it a single ordinary character. But the ligature Æ is considered
> an old-fashioned way of writing AE.
>
> But to Danes and Norwegians, Æ is an ordinary letter, as distinct from AE
> as TH is from Þ. (Which English used to have.) And so on...
>
> I don't know what a special character is, unless it is the ASCII NUL
> character, since that terminates C strings.


--------

The concept of "special characters" does not exist.
However, the definition of a "character" is a problem
per se (character, glyph, grapheme, ...).

You are confusing Unicode, typography and linguistics.

There is no symbol for the radian because mathematically
the radian is a pure number, a unitless number. You can,
however, specify a = ... in radians (rad).

Note the difference between SS and ẞ
'FRANZ-JOSEF-STRAUSS-STRAẞE'

jmf




#50256

From: Chris “Kwpolska” Warrick <kwpolska@gmail.com>
Date: 2013-07-09 12:15 +0200
Message-ID: <mailman.4458.1373382597.3114.python-list@python.org>
In reply to: #50228
On Tue, Jul 9, 2013 at 11:34 AM,  <wxjmfauth@gmail.com> wrote:
> Note the difference between SS and ẞ
> 'FRANZ-JOSEF-STRAUSS-STRAẞE'

This is a capital Eszett.  Which just happens not to exist in German.
Germans do not use this character, it is not available on German
keyboards, and the German spelling rules have you replace ß with SS.
And, surprise surprise, STRASSE is the example the Council for German
Orthography used ([0] page 29, §25 E3).

[0]: http://www.neue-rechtschreibung.de/regelwerk.pdf

--
Kwpolska <http://kwpolska.tk> | GPG KEY: 5EAAEA16
stop html mail                | always bottom-post
http://asciiribbon.org        | http://caliburn.nl/topposting.html



#50261

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-09 16:32 +0000
Message-ID: <51dc3b28$0$9505$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50256
On Tue, 09 Jul 2013 12:15:29 +0200, Chris “Kwpolska” Warrick wrote:

> On Tue, Jul 9, 2013 at 11:34 AM,  <wxjmfauth@gmail.com> wrote:
>> Note the difference between SS and ẞ 'FRANZ-JOSEF-STRAUSS-STRAẞE'
> 
> This is a capital Eszett.  Which just happens not to exist in German.
> Germans do not use this character, it is not available on German
> keyboards, and the German spelling rules have you replace ß with SS.
> And, surprise surprise, STRASSE is the example the Council for German
> Orthography used ([0] page 29, §25 E3).
> 
> [0]: http://www.neue-rechtschreibung.de/regelwerk.pdf


Only half-right. Uppercase Eszett has been used in Germany going back at 
least to 1879, and appears to be gaining popularity. In 2010 the use of 
uppercase ß apparently became mandatory for geographical place names when 
written in uppercase in official documentation.

http://opentype.info/blog/2011/01/24/capital-sharp-s/

http://en.wikipedia.org/wiki/Capital_ẞ

Font support is still quite poor, but at least half a dozen Windows 7 
fonts provide it, and at least one Mac font.
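
For readers who want to see how Python 3 maps these letters, here is a small sketch using only built-in str methods. The variable names are illustrative, and the results reflect the Unicode case mappings Python ships with, not any German spelling rule:

```python
small_sharp_s = "\N{LATIN SMALL LETTER SHARP S}"      # ß, U+00DF
capital_sharp_s = "\N{LATIN CAPITAL LETTER SHARP S}"  # ẞ, U+1E9E

# Uppercasing the small letter yields the two-letter form, not the capital Eszett.
print(small_sharp_s.upper())                      # SS

# Lowercasing the capital Eszett yields the ordinary small letter.
print(capital_sharp_s.lower() == small_sharp_s)   # True

# casefold() is intended for caseless comparison and folds both to "ss".
print("STRA\u1e9eE".casefold() == "strasse")      # True
print("Stra\u00dfe".casefold() == "strasse")      # True
```

So an upper()/lower() round trip is not lossless for ß, which is one reason casefold() exists for comparisons.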


-- 
Steven



#50317

From: wxjmfauth@gmail.com
Date: 2013-07-10 01:52 -0700
Message-ID: <bbf6ebc8-6319-4756-89d3-1764d3ff685e@googlegroups.com>
In reply to: #50261
For those who are interested: the official proposal from DIN
(the German Institute for Standardization) to encode the Latin
uppercase letter Sharp S in ISO/IEC 10646 is available on the
web, as a PDF with the rationale. I do not remember where I got
it from, probably from a German web site.

Fonts:
I have been watching the inclusion of this glyph for years. More
and more fonts support it. Although available in many fonts,
it is surprisingly not available in Cambria (at least in the version
I'm using). STIX does not include it; it has been requested. The same
goes for Latin Modern, the default bundle of fonts for the Unicode
TeX engines.

Last but not least, Python.
Thanks to the Flexible String Representation, it is not
necessary to mention the disastrous, erratic behaviour of
Python when processing text containing it. It's more than
clear that a serious user wishing to process the contents of
'DER GROẞE DUDEN' (a reference German dictionary) will be
better served by using something else.

The irony is that this Flexible String Representation has
been created by a German.

jmf



#50554

From: Joshua Landau <joshua@landau.ws>
Date: 2013-07-12 23:01 +0100
Message-ID: <mailman.4657.1373666549.3114.python-list@python.org>
In reply to: #50228

On 9 July 2013 10:34, <wxjmfauth@gmail.com> wrote:

> There is no symbol for the radian because mathematically
> the radian is a pure number, a unitless number. You can,
> however, specify a = ... in radians (rad).
>

Isn't a superscript "c" the symbol for radians?



#50567

From: Tim Roberts <timr@probo.com>
Date: 2013-07-12 20:42 -0700
Message-ID: <04j1u8lr5tbqiaiic7gg9hhh7vkvvro9c5@4ax.com>
In reply to: #50554
Joshua Landau <joshua@landau.ws> wrote:
>
>Isn't a superscript "c" the symbol for radians?

That's very rarely used.  More common is "rad".  The problem with a
superscript "c" is that it looks too much like a degree symbol.
-- 
Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.



#50568

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-07-13 04:51 +0000
Message-ID: <51e0dcce$0$9505$c3e8da3$5496439d@news.astraweb.com>
In reply to: #50554
On Fri, 12 Jul 2013 23:01:47 +0100, Joshua Landau wrote:

> Isn't a superscript "c" the symbol for radians?


Only in the sense that a superscript "o" is the symbol for degrees.

Semantically, both degree-sign and radian-sign are different "things" 
than merely an o or c in superscript.

Nevertheless, in mathematics at least, it is normal to leave out the 
radian sign when talking about angles. By default, "1.2" means "1.2 
radians", not "1.2 degrees".
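
Python's math module follows the same convention, as this small sketch suggests (the variable names are illustrative): a bare number passed to a trigonometric function is taken as radians, and degrees have to be converted explicitly.

```python
import math

angle = 1.2

as_radians = math.sin(angle)                # 1.2 is interpreted as radians
as_degrees = math.sin(math.radians(angle))  # 1.2 degrees, converted explicitly

print(as_radians, as_degrees)   # two clearly different values
print(math.degrees(math.pi))    # approximately 180
```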


-- 
Steven


