
Managing Google Groups headaches

Started by	rusi <rustompmody@gmail.com>
First post	2013-11-28 05:52 -0800
Last post	2013-12-04 08:31 -0800
Articles	20 on this page of 107 — 28 participants


  Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 05:52 -0800
    Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-11-29 00:58 +1100
      Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 06:17 -0800
        Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-11-29 01:25 +1100
          Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 07:04 -0800
            Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-11-29 02:08 +1100
              Re: Managing Google Groups headaches Alister <alister.ware@ntlworld.com> - 2013-11-28 15:50 +0000
                Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 08:22 -0800
                  Re: Managing Google Groups headaches Alister <alister.ware@ntlworld.com> - 2013-11-28 16:33 +0000
              Re: Managing Google Groups headaches Alister <alister.ware@ntlworld.com> - 2013-11-28 15:49 +0000
              Re: Managing Google Groups headaches Alister <alister.ware@ntlworld.com> - 2013-11-28 15:49 +0000
              Re: Managing Google Groups headaches Alister <alister.ware@ntlworld.com> - 2013-11-28 15:50 +0000
                Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-11-28 11:43 -0500
                  Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-11-29 04:29 +1100
                  Re: Managing Google Groups headaches Neil Cerutti <neilc@norwich.edu> - 2013-12-02 13:03 +0000
                    Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-02 08:29 -0500
                      Re: Managing Google Groups headaches Neil Cerutti <neilc@norwich.edu> - 2013-12-02 14:04 +0000
                        Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-02 09:11 -0800
                          Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 17:48 +0000
                          Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-03 04:54 +1100
                          Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-02 18:07 +0000
                      Re: Managing Google Groups headaches Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-02 19:56 -0500
                  Re: Managing Google Groups headaches Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-02 19:54 -0500
                  Re: [OT] Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-12-02 18:17 -0700
                    Re: [OT] Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-02 20:43 -0500
                      Re: [OT] Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-02 18:27 -0800
                      Re: [OT] Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-12-02 20:09 -0700
                        Re: [OT] Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-02 19:26 -0800
                    Re: [OT] Managing Google Groups headaches Grant Edwards <invalid@invalid.invalid> - 2013-12-03 04:27 +0000
                      Re: [OT] Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-03 18:01 +1100
                    Re: [OT] Managing Google Groups headaches alex23 <wuwei23@gmail.com> - 2013-12-03 16:30 +1000
                      Re: [OT] Managing Google Groups headaches Steven D'Aprano <steve@pearwood.info> - 2013-12-03 07:13 +0000
                        Re: [OT] Managing Google Groups headaches alex23 <wuwei23@gmail.com> - 2013-12-04 10:23 +1000
                          Re: [OT] Managing Google Groups headaches Neil Cerutti <neilc@norwich.edu> - 2013-12-04 14:34 +0000
                          Re: [OT] Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-04 15:21 +0000
                  Re: [OT] Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-03 12:09 +0000
            Re: Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-11-28 08:40 -0700
            Re: Managing Google Groups headaches Travis Griggs <travisgriggs@gmail.com> - 2013-11-28 08:23 -0800
            Re: Managing Google Groups headaches Ned Batchelder <ned@nedbatchelder.com> - 2013-11-28 12:23 -0500
            Re: Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-11-28 11:29 -0700
              Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 10:37 -0800
                Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 11:00 -0800
                  Re: Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-11-28 12:55 -0700
                Re: Managing Google Groups headaches Walter Hurry <walterhurry@lavabit.com> - 2013-11-28 19:40 +0000
                Re: Managing Google Groups headaches Michael Torrie <torriem@gmail.com> - 2013-11-28 11:50 -0700
                  Re: Managing Google Groups headaches Arif Khokar <akhokar1234@wvu.edu> - 2013-11-28 19:46 -0500
                    Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-29 14:41 +0000
                    Re: Managing Google Groups headaches Grant Edwards <invalid@invalid.invalid> - 2013-11-29 16:17 +0000
                    Re: Managing Google Groups headaches Cameron Simpson <cs@zip.com.au> - 2013-12-04 11:38 +1100
                      Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-03 17:39 -0800
                        Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-04 13:03 +1100
                        Re: Managing Google Groups headaches Cameron Simpson <cs@zip.com.au> - 2013-12-05 09:47 +1100
                          Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-05 23:42 -0800
                Re: Managing Google Groups headaches Walter Hurry <walterhurry@lavabit.com> - 2013-11-28 20:39 +0000
            Re: Managing Google Groups headaches Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-11-28 16:41 -0500
              Re: Managing Google Groups headaches pecore@pascolo.net - 2013-11-30 14:25 +0100
                Re: Managing Google Groups headaches Cameron Simpson <cs@zip.com.au> - 2013-12-04 11:40 +1100
                  Re: Managing Google Groups headaches Grant Edwards <invalid@invalid.invalid> - 2013-12-04 15:50 +0000
                    Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-04 16:07 +0000
                    Re: Managing Google Groups headaches Ned Batchelder <ned@nedbatchelder.com> - 2013-12-04 11:21 -0500
                    Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-04 16:33 +0000
            Re: Managing Google Groups headaches Zero Piraeus <z@etiol.net> - 2013-11-28 13:29 -0300
              Re: Managing Google Groups headaches Grant Edwards <invalid@invalid.invalid> - 2013-11-29 16:15 +0000
            Re: Managing Google Groups headaches Terry Reedy <tjreedy@udel.edu> - 2013-11-28 17:32 -0500
            Re: Managing Google Groups headaches Terry Reedy <tjreedy@udel.edu> - 2013-11-28 17:44 -0500
            Re: Managing Google Groups headaches Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-29 14:39 +0000
    Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-11-28 10:13 -0800
      Re: Managing Google Groups headaches Rich Kulawiec <rsk@gsp.org> - 2013-12-04 09:52 -0500
        Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-04 19:58 -0500
          Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-05 23:13 -0800
            Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-06 02:36 -0500
              Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-06 05:03 -0800
                Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-07 00:19 +1100
                  Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-06 05:32 -0800
                    Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-07 00:48 +1100
                      Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-06 06:11 -0800
                        Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-07 01:51 +1100
                ASCII and Unicode [was Re: Managing Google Groups headaches] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-06 19:00 +0000
                  Re: ASCII and Unicode [was Re: Managing Google Groups headaches] Gene Heskett <gheskett@wdtv.com> - 2013-12-06 14:34 -0500
                  Re: ASCII and Unicode [was Re: Managing Google Groups headaches] Roy Smith <roy@panix.com> - 2013-12-06 20:54 +0000
                  Re: ASCII and Unicode [was Re: Managing Google Groups headaches] Chris Angelico <rosuav@gmail.com> - 2013-12-07 10:42 +1100
                  Re: ASCII and Unicode [was Re: Managing Google Groups headaches] rusi <rustompmody@gmail.com> - 2013-12-06 18:33 -0800
                    Re: ASCII and Unicode [was Re: Managing Google Groups headaches] Chris Angelico <rosuav@gmail.com> - 2013-12-07 13:41 +1100
                      Re: ASCII and Unicode [was Re: Managing Google Groups headaches] rusi <rustompmody@gmail.com> - 2013-12-06 19:16 -0800
                        Re: ASCII and Unicode [was Re: Managing Google Groups headaches] Chris Angelico <rosuav@gmail.com> - 2013-12-07 15:08 +1100
                    Re: ASCII and Unicode [was Re: Managing Google Groups headaches] MRAB <python@mrabarnett.plus.com> - 2013-12-07 03:19 +0000
                  Re: ASCII and Unicode giacomo boffi <pecore@pascolo.net> - 2013-12-07 17:05 +0100
                    Re: ASCII and Unicode rusi <rustompmody@gmail.com> - 2013-12-08 08:41 -0800
                    Re: ASCII and Unicode Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-08 17:22 +0000
                      Re: ASCII and Unicode rusi <rustompmody@gmail.com> - 2013-12-08 09:39 -0800
                      Re: ASCII and Unicode giacomo boffi <pecore@pascolo.net> - 2013-12-08 21:11 +0100
                        Re: ASCII and Unicode rusi <rustompmody@gmail.com> - 2013-12-08 19:02 -0800
                Re: Managing Google Groups headaches Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2013-12-07 12:27 +1300
                Re: Managing Google Groups headaches Ned Batchelder <ned@nedbatchelder.com> - 2013-12-06 21:24 -0500
                  Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-06 23:43 -0800
                    Re: Managing Google Groups headaches wxjmfauth@gmail.com - 2013-12-07 02:16 -0800
                      Re: Managing Google Groups headaches Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-07 11:25 +0000
                        Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-07 22:49 +1100
                      Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-07 11:08 -0500
                        Re: Managing Google Groups headaches Rotwang <sg552@hotmail.co.uk> - 2013-12-07 16:15 +0000
                        Re: Managing Google Groups headaches Tim Chase <python.list@tim.thechases.com> - 2013-12-07 10:19 -0600
                      Re: Managing Google Groups headaches rusi <rustompmody@gmail.com> - 2013-12-07 08:27 -0800
                        Re: Managing Google Groups headaches Ned Batchelder <ned@nedbatchelder.com> - 2013-12-07 12:04 -0500
            Re: Managing Google Groups headaches Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-12-07 03:07 +0000
              Re: Managing Google Groups headaches Roy Smith <roy@panix.com> - 2013-12-06 22:40 -0500
      Re: Managing Google Groups headaches Chris Angelico <rosuav@gmail.com> - 2013-12-05 02:46 +1100
      Re: Managing Google Groups headaches Travis Griggs <travisgriggs@gmail.com> - 2013-12-04 08:31 -0800


#61194 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-07 10:42 +1100
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<mailman.3673.1386373692.18130.python-list@python.org>
In reply to	#61176

On Sat, Dec 7, 2013 at 6:00 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>     - character 33 was permitted to be either the exclamation
>       mark ! or the logical OR symbol |
>
>     - consequently character 124 (vertical bar) was always
>       displayed as a broken bar ¦, which explains why even today
>       many keyboards show it that way
>
>     - character 35 was permitted to be either the number sign # or
>       the pound sign £
>
>     - character 94 could be either a caret ^ or a logical NOT ¬

Yeah, good fun stuff. I first met several of these ambiguities in the
OS/2 REXX documentation, which detailed the language's operators by
specifying their byte values as well as their characters - for
instance, this quote from the docs (yeah, I still have it all here):

"""
Note:   Depending upon your Personal System keyboard and the code page
you are using, you may not have the solid vertical bar to select. For
this reason, REXX also recognizes the use of the split vertical bar as
a logical OR symbol. Some keyboards may have both characters. If so,
they are not interchangeable; only the character that is equal to the
ASCII value of 124 works as the logical OR. This type of mismatch can
also cause the character on your screen to be different from the
character on your keyboard.
"""
(The front material on the docs says "(C) Copyright IBM Corp. 1987,
1994. All Rights Reserved.")

It says "ASCII value" where on this list we would be more likely to
call it "byte value", and I'd prefer to say "represented by" rather
than "equal to", but nonetheless, this is still clearly distinguishing
characters and bytes. The language spec is on characters, but
ultimately the interpreter is going to be looking at bytes, so when
there's a problem, it's byte 124 that's the one defined as logical OR.
Oh, and note the copyright date. The byte/char distinction isn't new.

ChrisA


#61219 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	rusi <rustompmody@gmail.com>
Date	2013-12-06 18:33 -0800
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<41e2ac37-d751-43b4-a720-e8da58429420@googlegroups.com>
In reply to	#61176

On Saturday, December 7, 2013 12:30:18 AM UTC+5:30, Steven D'Aprano wrote:
> On Fri, 06 Dec 2013 05:03:57 -0800, rusi wrote:

> > Evidently (and completely inadvertently) this exchange has just
> > illustrated one of the inadmissible assumptions:
> > "unicode as a medium is universal in the same way that ASCII used to be"

> Ironically, your post was not Unicode.

> Seriously. I am 100% serious.

> Your post was sent using a legacy encoding, Windows-1252, also known as 
> CP-1252, which is most certainly *not* Unicode. Whatever software you 
> used to send the message correctly flagged it with a charset header:

> Content-Type: text/plain; charset=windows-1252

> Alas, the software Roy Smith uses, MT-NewsWatcher, does not handle 
> encodings correctly (or at all!), it screws up the encoding then sends a 
> reply with no charset line at all. This is one bug that cannot be blamed 
> on Google Groups -- or on Unicode.

> > I wrote a number of ellipsis characters ie codepoint 2026 as in:

> Actually you didn't. You wrote a number of ellipsis characters, hex byte 
> \x85 (decimal 133), in the CP1252 charset. That happens to be mapped to 
> code point U+2026 in Unicode, but the two are as distinct as ASCII and 
> EBCDIC.

> > Somewhere between my sending and your quoting those ellipses became the
> > replacement character FFFD

> Yes, it appears that MT-NewsWatcher is *deeply, deeply* confused about 
> encodings and character sets. It doesn't just assume things are ASCII, 
> but makes a half-hearted attempt to be charset-aware, but badly. I can 
> only imagine that it was written back in the Dark Ages where there were a 
> lot of different charsets in use but no conventions for specifying which 
> charset was in use. Or perhaps the author was smoking crack while coding.

> > Leaving aside whose fault this is (very likely buggy google groups),
> > this mojibaking cannot happen if the assumption "All text is ASCII" were
> > to uniformly hold.

> This is incorrect. People forget that ASCII has evolved since the first 
> version of the standard in 1963. There have actually been five versions 
> of the ASCII standard, plus one unpublished version. (And that's not 
> including the things which are frequently called ASCII but aren't.)

> ASCII-1963 didn't even include lowercase letters. It is also missing some 
> graphic characters like braces, and included at least two characters no 
> longer used, the up-arrow and left-arrow. The control characters were 
> also significantly different from today.

> ASCII-1965 was unpublished and unused. I don't know the details of what 
> it changed.

> ASCII-1967 is a lot closer to the ASCII in use today. It made 
> considerable changes to the control characters, moving, adding, removing, 
> or renaming at least half a dozen control characters. It officially added 
> lowercase letters, braces, and some others. It replaced the up-arrow 
> character with the caret and the left-arrow with the underscore. It was 
> ambiguous, allowing variations and substitutions, e.g.:

>     - character 33 was permitted to be either the exclamation 
>       mark ! or the logical OR symbol |

>     - consequently character 124 (vertical bar) was always 
>       displayed as a broken bar ¦, which explains why even today
>       many keyboards show it that way

>     - character 35 was permitted to be either the number sign # or 
>       the pound sign £

>     - character 94 could be either a caret ^ or a logical NOT ¬

> Even the humble comma could be pressed into service as a cedilla.

> ASCII-1968 didn't change any characters, but allowed the use of LF on its 
> own. Previously, you had to use either LF/CR or CR/LF as newline.

> ASCII-1977 removed the ambiguities from the 1967 standard.

> The most recent version is ASCII-1986 (also known as ANSI X3.4-1986). 
> Unfortunately I haven't been able to find out what changes were made -- I 
> presume they were minor, and didn't affect the character set.

> So as you can see, even with actual ASCII, you can have mojibake. It's 
> just not normally called that. But if you are given an arbitrary ASCII 
> file of unknown age, containing code 94, how can you be sure it was 
> intended as a caret rather than a logical NOT symbol? You can't.

> Then there are at least 30 official variations of ASCII, strictly 
> speaking part of ISO-646. These 7-bit codes were commonly called "ASCII" 
> by their users, despite the differences, e.g. replacing the dollar sign $ 
> with the international currency sign ¤, or replacing the left brace 
> { with the letter s with caron š.

> One consequence of this is that the MIME charset name for ASCII text is 
> "US-ASCII", despite the redundancy, because many people expect "ASCII" 
> alone to mean whatever national variation they are used to.

> But it gets worse: there are proprietary variations on ASCII which are 
> commonly called "ASCII" but aren't, including dozens of 8-bit so-called 
> "extended ASCII" character sets, which is where the problems *really* 
> pile up. Invariably back in the 1980s and early 1990s people used to call 
> these "ASCII" no matter that they used 8-bits and contained anything up 
> to 256 characters.

> Just because somebody calls something "ASCII", doesn't make it so; even 
> if it is ASCII, doesn't mean you know which version of ASCII; even if you 
> know which version, doesn't mean you know how to interpret certain codes. 
> It simply is *wrong* to think that "good ol' plain ASCII text" is 
> unambiguous and devoid of problems.

> > With unicode there are in-memory formats, transportation formats eg
> > UTF-8, 

> And the same applies to ASCII. 

> ASCII is a *seven-bit code*. It will work fine on computers where the 
> word-size is seven bits. If the word-size is eight bits, or more, you 
> have to pad the ASCII code. How do you do that? Pad the most-significant 
> end or the least significant end? That's a choice there. How do you pad 
> it, with a zero or a one? That's another choice. If your word-size is 
> more than eight bits, you might even pad *both* ends.

> In C, a char is defined as the smallest addressable unit of the machine 
> that can contain the basic character set, not necessarily eight bits. 
> Implementations of C and C++ sometimes reserve 8, 9, 16, 32, or 36 bits 
> as a "byte" and/or char. Your in-memory representation of ASCII "a" could 
> easily end up as bits 001100001 or 0000000001100001.

> And then there is the question of whether ASCII characters should be Big 
> Endian or Little Endian. I'm referring here to bit endianness, rather 
> than bytes: should character 'a' be represented as bits 1100001 (most 
> significant bit to the left) or 1000011 (least significant bit to the 
> left)? This may be relevant with certain networking protocols. Not all 
> networking protocols are big-endian, nor are all processors. The Ada 
> programming language even supports both bit orders.

> When transmitting ASCII characters, the networking protocol could include 
> various start and stop bits and parity codes. A single 7-bit ASCII 
> character might be anything up to 12 bits in length on the wire. It is 
> simply naive to imagine that the transmission of ASCII codes is the same 
> as the in-memory or on-disk storage of ASCII.

> You're lucky to be active in a time when most common processors have 
> standardized on a single bit-order, and when most (but not all) network 
> protocols have done the same. But that doesn't mean that these issues 
> don't exist for ASCII. If you get a message that purports to be ASCII 
> text but looks like this:

> "\tS\x1b\x1b{\x02u{'\x1b\x13B"

> you should suspect strongly that it is "Hello World!" which has been 
> accidentally bit-reversed by some rogue piece of hardware.
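
The padding and bit-order points in the quoted text can be checked with a few lines of Python. This is only a sketch: the helper names `reverse_bits` and `bit_reverse_text` are made up for illustration, and it assumes every character is zero-padded to exactly seven bits before being reversed. Under that assumption the space maps to \x02 and the '!' maps to 'B'.

```python
def reverse_bits(code: int, width: int = 7) -> int:
    """Reverse the lowest `width` bits of `code` (hypothetical helper)."""
    bits = format(code, f"0{width}b")  # zero-pad to a fixed width first
    return int(bits[::-1], 2)


def bit_reverse_text(text: str, width: int = 7) -> str:
    """Bit-reverse every character of a 7-bit ASCII string."""
    return "".join(chr(reverse_bits(ord(ch), width)) for ch in text)


# The 7-bit code for 'a' padded on the most-significant end
# into a 9-bit and a 16-bit unit.
padded_9 = format(ord("a"), "09b")
padded_16 = format(ord("a"), "016b")

scrambled = bit_reverse_text("Hello World!")
restored = bit_reverse_text(scrambled)  # reversing twice gives the input back
```

Without the fixed-width padding step, short codes such as the space (binary 100000) would be reversed as six bits rather than seven and come out differently, so the width has to be chosen explicitly.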

Oof! That's a lot of data to digest! Thanks anyway.

There's one thing I want to get into:

> Your post was sent using a legacy encoding, Windows-1252, also known as 
> CP-1252, which is most certainly *not* Unicode. Whatever software you 
> used to send the message correctly flagged it with a charset header:

What the hell! I am using Firefox 25.0 on Debian testing and posting via GG.

$ locale
shows me:
LANG=en_US.UTF-8

and a bunch of other things all en_US.UTF-8.

For the most part when I point FF at any site and go to view ->
character-encoding, it says Unicode (UTF-8).

However when I go to anything in the python archives:
https://mail.python.org/pipermail/python-list/2013-December/

FF shows it as Western (Windows-1252)

That seems to suggest that something is not right with the python
mailing list config. No??


#61221 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-07 13:41 +1100
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<mailman.3693.1386384117.18130.python-list@python.org>
In reply to	#61219

On Sat, Dec 7, 2013 at 1:33 PM, rusi <rustompmody@gmail.com> wrote:
> That seems to suggest that something is not right with the python
> mailing list config. No??

If in doubt, blame someone else, eh?

I'd first check what your browser's actually sending. Firebug will
help there. See if your form fill-out is encoded as UTF-8 or CP-1252.
That's the first step.

ChrisA


#61223 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	rusi <rustompmody@gmail.com>
Date	2013-12-06 19:16 -0800
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<fc0339d0-ffac-4c11-a93c-33282ebf5511@googlegroups.com>
In reply to	#61221

On Saturday, December 7, 2013 8:11:45 AM UTC+5:30, Chris Angelico wrote:
> On Sat, Dec 7, 2013 at 1:33 PM, rusi  wrote:
> > That seems to suggest that something is not right with the python
> > mailing list config. No??

> If in doubt, blame someone else, eh?

> I'd first check what your browser's actually sending. Firebug will
> help there. See if your form fill-out is encoded as UTF-8 or CP-1252.
> That's the first step.

If you give me some tip where to look, I'll do that.
But I don't see what this has to do with forms.

Everything in the python archive (not just my posts) shows as Windows-1252
[I checked about 6]

Every other page that I checked (most nothing to do with python list,
GG etc) shows UTF-8. [I checked about 5]

None of these checks involved filling in a form.


#61231 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-07 15:08 +1100
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<mailman.3698.1386389329.18130.python-list@python.org>
In reply to	#61223

On Sat, Dec 7, 2013 at 2:16 PM, rusi <rustompmody@gmail.com> wrote:
> On Saturday, December 7, 2013 8:11:45 AM UTC+5:30, Chris Angelico wrote:
>> On Sat, Dec 7, 2013 at 1:33 PM, rusi  wrote:
>> > That seems to suggest that something is not right with the python
>> > mailing list config. No??
>
>> If in doubt, blame someone else, eh?
>
>> I'd first check what your browser's actually sending. Firebug will
>> help there. See if your form fill-out is encoded as UTF-8 or CP-1252.
>> That's the first step.
>
> If you give me some tip where to look, I'll do that.
> But I don't see what this has to do with forms.
>

Page encodings specify what comes from the server to your browser.
Your post went the other way. Tracing the data going back to the
server would tell you how it's encoded.

ChrisA
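
To make the "what is the browser actually sending" question concrete, here is a small sketch of how the same ellipsis character looks when percent-encoded for a form submission under the two encodings mentioned. It only uses `urllib.parse.quote` from the standard library; it says nothing about what Google Groups or Firefox really send, which would still have to be checked in the network trace.

```python
from urllib.parse import quote

ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# The same character, percent-encoded as it would appear in a form body
# if the submission were encoded as UTF-8 or as CP-1252.
as_utf8 = quote(ELLIPSIS, encoding="utf-8")
as_cp1252 = quote(ELLIPSIS, encoding="cp1252")
```

Seeing `%E2%80%A6` in the request would point to UTF-8, while a lone `%85` would point to CP-1252.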


#61224 — Re: ASCII and Unicode [was Re: Managing Google Groups headaches]

From	MRAB <python@mrabarnett.plus.com>
Date	2013-12-07 03:19 +0000
Subject	Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
Message-ID	<mailman.3694.1386386359.18130.python-list@python.org>
In reply to	#61219

On 07/12/2013 02:41, Chris Angelico wrote:
> On Sat, Dec 7, 2013 at 1:33 PM, rusi <rustompmody@gmail.com> wrote:
>> That seems to suggest that something is not right with the python
>> mailing list config. No??
>
> If in doubt, blame someone else, eh?
>
> I'd first check what your browser's actually sending. Firebug will
> help there. See if your form fill-out is encoded as UTF-8 or CP-1252.
> That's the first step.
>
Looking back through the thread, it looks like:

     Roy posted a reply in us-ascii.

     rusi replied in windows-1252, adding the '…'.

     Roy replied in us-ascii, but with 'Š' in place of '…'.

     rusi replied in utf-8, with '�' in place of '…'
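
The last step in that sequence is easy to reproduce. The sketch below shows how the single CP-1252 byte for the ellipsis reads under a few different charset assumptions. It does not attempt to reproduce how any particular newsreader arrived at 'Š'; that would depend on the client's internals, which are not known here.

```python
ELLIPSIS = "\u2026"

raw = ELLIPSIS.encode("cp1252")  # the single byte 0x85

decoded_ok = raw.decode("cp1252")                        # the ellipsis again
decoded_as_utf8 = raw.decode("utf-8", errors="replace")  # U+FFFD
decoded_as_latin1 = raw.decode("latin-1")                # C1 control U+0085

# Plain ASCII cannot carry the character at all.
try:
    ELLIPSIS.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```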


#61239 — Re: ASCII and Unicode

From	giacomo boffi <pecore@pascolo.net>
Date	2013-12-07 17:05 +0100
Subject	Re: ASCII and Unicode
Message-ID	<87iov0ir3l.fsf@pascolo.net>
In reply to	#61176

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> Ironically, your post was not Unicode.  [...] Your post was sent
> using a legacy encoding, Windows-1252, also known as CP-1252

i access rusi's post using an NNTP server,
and in his post i see

Content-Type: text/plain; charset=UTF-8

is it possible that what you see is an artifact
of the gateway?
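
One way to compare what different servers or gateways deliver is to save the raw article from each and read the declared charset programmatically. A minimal sketch with the standard `email` package follows; the message text in `raw_post` is a made-up stand-in, not the actual article.

```python
from email import message_from_string

# Hypothetical raw article, standing in for one fetched over NNTP or mail.
raw_post = (
    "Subject: example\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "body text\n"
)

msg = message_from_string(raw_post)
declared = msg.get_content_charset()  # lower-cased charset name

# A message with no charset parameter reports None.
bare = message_from_string("Subject: example\n\nbody text\n")
missing = bare.get_content_charset()
```

Running this against the copy from each source would show whether the header itself differs between them.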


#61300 — Re: ASCII and Unicode

From	rusi <rustompmody@gmail.com>
Date	2013-12-08 08:41 -0800
Subject	Re: ASCII and Unicode
Message-ID	<acb2b7b0-fd0b-4b67-b3ee-7fd6dcdda1f5@googlegroups.com>
In reply to	#61239

On Saturday, December 7, 2013 9:35:34 PM UTC+5:30, giacomo boffi wrote:
> Steven D'Aprano  writes:

> > Ironically, your post was not Unicode.  [...] Your post was sent
> > using a legacy encoding, Windows-1252, also known as CP-1252

> i access rusi's post using a NNTP server,
> and in his post i see

> Content-Type: text/plain; charset=UTF-8

> is it possible that what you see is an artifact
> of the gateway?

Thanks for checking that!


#61301 — Re: ASCII and Unicode

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-12-08 17:22 +0000
Subject	Re: ASCII and Unicode
Message-ID	<52a4aada$0$30003$c3e8da3$5496439d@news.astraweb.com>
In reply to	#61239

On Sat, 07 Dec 2013 17:05:34 +0100, giacomo boffi wrote:

> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> 
>> Ironically, your post was not Unicode.  [...] Your post was sent using
>> a legacy encoding, Windows-1252, also known as CP-1252
> 
> i access rusi's post using a NNTP server, and in his post i see
> 
> Content-Type: text/plain; charset=UTF-8

But *which post* are you looking at?

I have just looked at three posts from him:

Rusi's original post, where he used the ellipsis characters:

  Subject: Re: Managing Google Groups headaches
  Date: Thu, 5 Dec 2013 23:13:54 -0800 (PST)
  Content-Type: text/plain; charset=windows-1252

Then his reply to me:

  Subject: Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
  Date: Fri, 6 Dec 2013 18:33:39 -0800 (PST)
  Content-Type: text/plain; charset=UTF-8

And finally, his reply to you:

  Subject: Re: ASCII and Unicode
  Date: Sun, 8 Dec 2013 08:41:10 -0800 (PST)
  Content-Type: text/plain; charset=ISO-8859-1

It seems to me that whatever client he is using to post (I believe it is 
the Google Groups web interface?) varies the encoding depending on what 
characters are included in his post.
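
A policy of that kind could look roughly like the sketch below: try a list of charsets from narrowest to widest and use the first one that can encode the whole text. This is purely an assumption about how such a client might behave. The function name `narrowest_charset` and the candidate order are invented for illustration, it is not taken from Google Groups, and it does not claim to reproduce the exact headers listed above.

```python
# Hypothetical candidate order, narrowest first.
CANDIDATES = ("us-ascii", "iso-8859-1", "windows-1252", "utf-8")


def narrowest_charset(text: str) -> str:
    """Return the first candidate charset able to encode `text`."""
    for name in CANDIDATES:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        return name
    return "utf-8"  # fallback for anything that could not be encoded


plain = narrowest_charset("hello")
accented = narrowest_charset("caf\u00e9")
with_ellipsis = narrowest_charset("wait\u2026")
with_emoji = narrowest_charset("\U0001F4A9")
```

Under this policy a single character outside Windows-1252 anywhere in the post is enough to push the whole message to UTF-8.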

> is it possible that what you see is an artifact of the gateway?

I doubt it. Unfortunately the email mailing list archive doesn't display 
all the email headers, but for the record here is his original post as 
seen by the email mailing list:

https://mail.python.org/pipermail/python-list/2013-December/661782.html

If you view source, you'll see that Mailman (the mailing list software) 
sets the webpage encoding to US-ASCII and encodes the ellipses as &#8230;, 
which is a perfectly reasonable thing for a web page to do. So we can be 
confident that when Mailman saw Rusi's post, it was able to correctly 
decode the message and see ellipses.
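
The same effect can be reproduced in Python with the standard `xmlcharrefreplace` error handler. This is only a sketch of the technique and is not a claim about how Mailman implements it.

```python
import html

text = "wait\u2026"

# Encode to pure ASCII, replacing unencodable characters with
# numeric character references.
html_safe = text.encode("ascii", errors="xmlcharrefreplace")

# A browser (or html.unescape) turns the reference back into the character.
round_trip = html.unescape(html_safe.decode("ascii"))
```

The page stays pure ASCII on the wire, yet no information is lost.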

Although I think that (probably) Google Groups is being stupid by varying 
the charset (why not just use UTF-8 always?), at least it is setting the 
charset correctly. 

-- 
Steven


#61302 — Re: ASCII and Unicode

From	rusi <rustompmody@gmail.com>
Date	2013-12-08 09:39 -0800
Subject	Re: ASCII and Unicode
Message-ID	<43bf6ffb-696d-4b5f-bfee-ec3e5e138883@googlegroups.com>
In reply to	#61301

On Sunday, December 8, 2013 10:52:34 PM UTC+5:30, Steven D'Aprano wrote:
> On Sat, 07 Dec 2013 17:05:34 +0100, giacomo boffi wrote:

> > Steven D'Aprano  writes:
> >> Ironically, your post was not Unicode.  [...] Your post was sent using
> >> a legacy encoding, Windows-1252, also known as CP-1252
> > i access rusi's post using a NNTP server, and in his post i see
> > Content-Type: text/plain; charset=UTF-8

> But *which post* are you looking at?

> I have just looked at three posts from him:

> Rusi's original post, where he used the ellipsis characters:

>   Subject: Re: Managing Google Groups headaches
>   Date: Thu, 5 Dec 2013 23:13:54 -0800 (PST)
>   Content-Type: text/plain; charset=windows-1252

> Then his reply to me:

>   Subject: Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
>   Date: Fri, 6 Dec 2013 18:33:39 -0800 (PST)
>   Content-Type: text/plain; charset=UTF-8

> And finally, his reply to you:

>   Subject: Re: ASCII and Unicode
>   Date: Sun, 8 Dec 2013 08:41:10 -0800 (PST)
>   Content-Type: text/plain; charset=ISO-8859-1

> It seems to me that whatever client he is using to post (I believe it is 
> Google Groups web interface?) varies the encoding depending on what 
> characters are included in his post.

> > is it possible that what you see is an artifact of the gateway?

> I doubt it. Unfortunately the email mailing list archive doesn't display 
> all the email headers, but for the record here is his original post as 
> seen by the email mailing list:

> https://mail.python.org/pipermail/python-list/2013-December/661782.html

> If you view source, you'll see that Mailman (the mailing list software) 
> sets the webpage encoding to US-ASCII and encodes the ellipses to &#8230;, 
> which is a perfectly reasonable thing for a web page to do. So we can be 
> confident that when Mailman saw Rusi's post, it was able to correctly 
> decode the message and see ellipses.

> Although I think that (probably) Google Groups is being stupid by varying 
> the charset (why not just use UTF-8 always?), at least it is setting the 
> charset correctly. 

I think GG is being sweet and affectionate and delectable enough that a
💩 in the footer will keep it stuck at UTF-8, you think?? :-)

#61325 — Re: ASCII and Unicode

From	giacomo boffi <pecore@pascolo.net>
Date	2013-12-08 21:11 +0100
Subject	Re: ASCII and Unicode
Message-ID	<87ppp7umpu.fsf@pascolo.net>
In reply to	#61301

Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:

> On Sat, 07 Dec 2013 17:05:34 +0100, giacomo boffi wrote:
>
>> Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
>> 
>>> Ironically, your post was not Unicode.  [...] Your post was sent using
>>> a legacy encoding, Windows-1252, also known as CP-1252
>> 
>> i access rusi's post using a NNTP server, and in his post i see
>> 
>> Content-Type: text/plain; charset=UTF-8
>
> But *which post* are you looking at?

<blush> the wrong one...</> i.e., the one JUST BEFORE your change of
subject --- if i look at the "ellipsis" post, i see the same encoding
that you have mentioned

sorry for the confusion

#61351 — Re: ASCII and Unicode

From	rusi <rustompmody@gmail.com>
Date	2013-12-08 19:02 -0800
Subject	Re: ASCII and Unicode
Message-ID	<559eaf34-a2f3-4fb4-8a70-9384e84468fe@googlegroups.com>
In reply to	#61325

On Monday, December 9, 2013 1:41:41 AM UTC+5:30, giacomo boffi wrote:
> <blush> the wrong one...</> i.e, the one JUST BEFORE your change of
> subject --- if i look at the "ellipsis" post, i see the same encoding
> that you have mentioned

> sorry for the confusion

And thank you for pointing the way to the culprit, viz. GG trying to be
too clever.

[Since you neglected to close your <blush> I am included in it :-) ]

#61191

From	Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date	2013-12-07 12:27 +1300
Message-ID	<bgf4s0F6jqhU1@mid.individual.net>
In reply to	#61141

rusi wrote:
> On Friday, December 6, 2013 1:06:30 PM UTC+5:30, Roy Smith wrote:
> 
>>Which means, if I wanted to (and many examples of this exist), I can 
>>write my own client which presents the same information in different 
>>ways.
> 
> Not sure whats your point.

The point is the existence of an alternative interface that's
designed for use by other programs rather than humans.

This is what web forums are missing. If it existed, one could
easily create an alternative client with a newsreader-like
interface. Without it, such a client would have to be a
monstrosity that worked by screen-scraping the html.

It's not about the format of the messages themselves -- that
could be text, or html, or reST, or bbcode or whatever. It's
about the *framing* of the messages, and being able to
query them by their metadata.
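
As a rough sketch of what "query them by their metadata" can look like when the framing is machine-readable, the standard library's `email` parser is enough. The sample messages reuse header values quoted elsewhere in this thread, with placeholder bodies; `RAW_POSTS` and `find_by_charset` are made-up names for the example.

```python
from email import message_from_string

# Header values as quoted elsewhere in this thread; bodies are placeholders.
RAW_POSTS = [
    "Subject: Re: Managing Google Groups headaches\n"
    "Content-Type: text/plain; charset=windows-1252\n"
    "\n"
    "(body omitted)\n",
    "Subject: Re: ASCII and Unicode [was Re: Managing Google Groups headaches]\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "(body omitted)\n",
    "Subject: Re: ASCII and Unicode\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "(body omitted)\n",
]


def find_by_charset(raw_posts, charset):
    """Return the Subject of every post whose declared charset matches."""
    hits = []
    for raw in raw_posts:
        msg = message_from_string(raw)
        # get_content_charset() returns the charset parameter in lower case.
        if msg.get_content_charset() == charset.lower():
            hits.append(msg["Subject"])
    return hits


print(find_by_charset(RAW_POSTS, "windows-1252"))
```

Nothing here depends on how the body is formatted; the query only touches the headers. A web forum that offers no comparable interface leaves a client with nothing but the rendered HTML to scrape.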

-- 
Greg

#61217

From	Ned Batchelder <ned@nedbatchelder.com>
Date	2013-12-06 21:24 -0500
Message-ID	<mailman.3691.1386383105.18130.python-list@python.org>
In reply to	#61141

On 12/6/13 8:03 AM, rusi wrote:
>> I think you're off on the wrong track here.  This has nothing to do with
>> plain text (ascii or otherwise).  It has to do with divorcing how you
>> store and transport messages (be they plain text, HTML, or whatever)
>> from how a user interacts with them.
>
> Evidently (and completely inadvertently) this exchange has just
> illustrated one of the inadmissible assumptions:
>
> "unicode as a medium is universal in the same way that ASCII used to be"
>
> I wrote a number of ellipsis characters ie codepoint 2026 as in:
>
>    - human communication…
> (is not very different from)
>    - machine communication…
>
> Somewhere between my sending and your quoting those ellipses became
> the replacement character FFFD
>
>>> > >   - human communication�
>>> > >(is not very different from)
>>> > >   - machine communication�
> Leaving aside whose fault this is (very likely buggy google groups),
> this mojibaking cannot happen if the assumption "All text is ASCII"
> were to uniformly hold.
>
> Of course with unicode also this can be made to not happen, but that
> is fragile and error-prone.  And that is because ASCII (not extended)
> is ONE thing in a way that unicode is hopelessly a motley inconsistent
> variety.

You seem to be suggesting that we should stick to ASCII.  There are of 
course languages that need more than just the Latin alphabet.  How would 
you suggest we support them?  Or maybe I don't understand?

--Ned.

#61232

From	rusi <rustompmody@gmail.com>
Date	2013-12-06 23:43 -0800
Message-ID	<5137f9a5-d0a0-4f7a-8d50-e464094392e2@googlegroups.com>
In reply to	#61217

On Saturday, December 7, 2013 7:54:50 AM UTC+5:30, Ned Batchelder wrote:
> On 12/6/13 8:03 AM, rusi wrote:

> > Leaving aside whose fault this is (very likely buggy google groups),
> > this mojibaking cannot happen if the assumption "All text is ASCII"
> > were to uniformly hold.
> > Of course with unicode also this can be made to not happen, but that
> > is fragile and error-prone.  And that is because ASCII (not extended)
> > is ONE thing in a way that unicode is hopelessly a motley inconsistent
> > variety.

> You seem to be suggesting that we should stick to ASCII.  There are of 
> course languages that need more than just the Latin alphabet.  How would 
> you suggest we support them?  Or maybe I don't understand?

Heh! Yes I guess that can be read into what I was saying.

Practically: I don't see that as an option or that the question of
going back to ASCII even arises.

I was talking more philosophically/historically.

Up until the time of Unix a file for example was a structured
heavy-duty concept motivated by entirely technological considerations:
http://en.wikipedia.org/wiki/Data_set_%28IBM_mainframe%29

By simplifying that into the modern concept of file -- just a stream
of bytes -- and allowing the puns:

  byte string
= char list
= text

some elegant systems could be made with people having 'beautiful thoughts':

Everything that could be stored anywhere -- core or disk -- being bytes
one could go to the next stage and pass around these bytes between
processes. And so we get the elegant --  pipeline -- beauty of Unix
scripts.
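
A small Python sketch of where the "byte string = char list = text" pun holds and where it stops holding. UTF-8 is an arbitrary choice for the example, and `bytes_equal_chars` is a made-up helper name.

```python
def bytes_equal_chars(text, encoding="utf-8"):
    """True if the encoded bytes are, one for one, the code points of `text`."""
    return list(text.encode(encoding)) == [ord(c) for c in text]


ascii_text = "pipeline"
mixed_text = "pipeline\u2026"   # the same word followed by U+2026

for text in (ascii_text, mixed_text):
    data = text.encode("utf-8")
    print(len(text), len(data), bytes_equal_chars(text))
```

For pure ASCII the character count and the byte count agree and the pun is harmless. With a single ellipsis appended, nine characters become eleven bytes, and a tool that treats the stream as one character per byte no longer sees the same text.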

Of course there was a catch (Isn't there always?):

Things that did not fit in with this philosophy -- eg clicks of a mouse,
bits on display -- were modelled badly or not at all.

Not-at-all: CLI
Badly: Monstrosity called X

And this explains some of the cultural kinks of our field:

Unix guys invariably think of CLIs as natural and obvious whereas GUIs
are just wasteful eye-candy.

[Yours truly is one of those old geezers who does not know how to
write a GUI to save his life. Almost normal in the Unix world except
that he's not proud of it]

Windows/Mac people do not suffer these delusions but then they don't think of 
programming as natural or obvious at all.

I've often been amused at Windows folk: They don't think of Word as a program.
Rather docs are things that magically open when clicked :-)

Brings me to the point I was trying to make (got side-tracked by
the failure of a character to roundtrip between me and Roy -- I'm none the 
wiser why)
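
One plausible mechanism for such a failed round trip, sketched in Python. It is an assumption, not something established here, that the text was encoded as windows-1252 at one end and decoded as UTF-8 at the other.

```python
sent = "- human communication\u2026"

# The sender encodes with windows-1252: the ellipsis becomes the byte 0x85.
wire = sent.encode("windows-1252")

# A receiver that assumes UTF-8 finds 0x85 invalid on its own and
# substitutes U+FFFD, the replacement character.
received = wire.decode("utf-8", errors="replace")

print(ascii(sent))
print(ascii(received))
```

The ASCII part of the line survives untouched, because windows-1252 and UTF-8 agree on those bytes. Only the one character outside ASCII is lost, which matches what was seen in the quoted text.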

The ASCII = Text = Unicode (non)equation is a relatively minor point.

The more central point is that humans use and need more than just
words to communicate.  By straitjacketing communication into the thin
channel of text we are severely impoverishing ourselves.

We communicate with systems with programs that are unstructured
text-files even though programs are conceptually highly structured entities.

Likewise we communicate with each other by this obscenely obsolete
textual mode that I am using right now when rich text formats have been
available for decades.

Some of my more detailed writings on this:

http://blog.languager.org/2013/09/poorest-computer-users-are-programmers.html

http://blog.languager.org/2012/10/html-is-why-mess-in-programming-syntax.html

#61234

From	wxjmfauth@gmail.com
Date	2013-12-07 02:16 -0800
Message-ID	<31f1bb84-1432-446c-a7d4-79ce16f2a543@googlegroups.com>
In reply to	#61232

Rusi:

"unicode as a medium is universal in the same way that
ASCII used to be"

Probably, you do not realize deeply how this sentence
is correct. Unicode and ascii are constructed in the
same way. It has not even to do with "characters", but
with mathematics.

It is on this level the FSR fails. It is mathematically
wrong by design!

jmf

#61235

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-12-07 11:25 +0000
Message-ID	<52a30593$0$30003$c3e8da3$5496439d@news.astraweb.com>
In reply to	#61234

On Sat, 07 Dec 2013 02:16:02 -0800, wxjmfauth wrote:

> Rusi:
> 
> "unicode as a medium is universal in the same way that ASCII used to be"
> 
> Probably, you do not realize deeply how this sentence is correct.
> Unicode and ascii are constructed in the same way. It has not even to do
> with "characters", but with mathematics.
> 
> It is on this level the FSR fails. It is mathematically wrong by design!

I'm reminded of that fellow, I don't remember his name, who *years* after 
the Wright Brothers had flown, and there were dozens of people building 
aeroplanes, was still trying to convince everyone that heavier-than-air 
flight was mathematically impossible.

-- 
Steven

#61237

From	Chris Angelico <rosuav@gmail.com>
Date	2013-12-07 22:49 +1100
Message-ID	<mailman.3700.1386416994.18130.python-list@python.org>
In reply to	#61235

On Sat, Dec 7, 2013 at 10:25 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sat, 07 Dec 2013 02:16:02 -0800, wxjmfauth wrote:
>
>> Rusi:
>>
>> "unicode as a medium is universal in the same way that ASCII used to be"
>>
>> Probably, you do not realize deeply how this sentence is correct.
>> Unicode and ascii are constructed in the same way. It has not even to do
>> with "characters", but with mathematics.
>>
>> It is on this level the FSR fails. It is mathematically wrong by design!
>
>
> I'm reminded of that fellow, I don't remember his name, who *years* after
> the Wright Brothers had flown, and there were dozens of people building
> aeroplanes, was still trying to convince everyone that heavier-than-air
> flight was mathematically impossible.

Nearest I can find is:

https://en.wikipedia.org/wiki/Simon_Newcomb#On_the_impossibility_of_a_flying_machine

He at least accepted the Wrights' work once he found out about it.
Also, he didn't make repeated usenet posts that torpedo you in the
face and leave an "Uh?"-shaped hole. [1] I'm still not sure what jmf
meant by the above.

ChrisA

[1] http://bofh.ntk.net/BOFH/1999/bastard99-24.php

#61240

From	Roy Smith <roy@panix.com>
Date	2013-12-07 11:08 -0500
Message-ID	<roy-F4F08B.11080807122013@news.panix.com>
In reply to	#61234

In article <31f1bb84-1432-446c-a7d4-79ce16f2a543@googlegroups.com>,
 wxjmfauth@gmail.com wrote:

> It is on this level the FSR fails.

What is "FSR"?  I apologize if this was explained earlier in the thread 
and I can't find the reference.

https://en.wikipedia.org/wiki/FSR#Science_and_technology was no help.

#61242

From	Rotwang <sg552@hotmail.co.uk>
Date	2013-12-07 16:15 +0000
Message-ID	<l7vhj6$tf1$2@dont-email.me>
In reply to	#61240

On 07/12/2013 16:08, Roy Smith wrote:
> In article <31f1bb84-1432-446c-a7d4-79ce16f2a543@googlegroups.com>,
>   wxjmfauth@gmail.com wrote:
>
>> It is on this level the FSR fails.
>
> What is "FSR"?  I apologize if this was explained earlier in the thread
> and I can't find the reference.

It's the Flexible String Representation, introduced in Python 3.3:

http://www.python.org/dev/peps/pep-0393/
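
A quick way to see the effect, assuming CPython 3.3 or later (other implementations may store strings differently, and `sys.getsizeof` is an implementation detail). The helper name `bytes_per_char` is made up for the example.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by growing a string by one character."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),
    ("BMP", "\u2026"),
    ("non-BMP", "\U0001F4A9"),
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such an interpreter the string uses 1, 2 or 4 bytes per character, depending on the widest code point it contains, so the four samples should report 1, 1, 2 and 4 respectively.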
