

Groups > comp.lang.python > #20242 > unrolled thread

Re: Python usage numbers

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2012-02-12 12:28 +1100
Last post: 2012-02-15 11:56 +0200
Articles 20 on this page of 109 — 31 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 12:28 +1100
    Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 02:23 +0000
      Re: Python usage numbers Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-11 18:36 -0800
        Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 15:38 +1100
          Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 05:51 +0000
            Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 17:08 +1100
            Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 10:48 -0500
              Re: Python usage numbers Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-12 11:47 -0500
                Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 12:11 -0500
                  Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:49 +0000
            Re: Python usage numbers Dan Sommers <dan@tombstonezero.net> - 2012-02-12 15:55 +0000
            Re: Python usage numbers rusi <rustompmody@gmail.com> - 2012-02-12 08:50 -0800
              Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 12:21 -0500
              Re: Python usage numbers Nick Dokos <nicholas.dokos@hp.com> - 2012-02-12 12:36 -0500
                entering unicode  (was Python usage numbers) rusi <rustompmody@gmail.com> - 2012-02-12 19:09 -0800
                  Re: entering unicode  (was Python usage numbers) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-19 03:44 +0000
                    Re: entering unicode (was Python usage numbers) rusi <rustompmody@gmail.com> - 2012-02-19 00:52 -0800
              How do you Unicode proponents type your non-ASCII characters? (was: Python usage numbers) Ben Finney <ben+python@benfinney.id.au> - 2012-02-13 09:43 +1100
              Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:56 +0000
          Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 10:13 -0500
            Re: Python usage numbers Terry Reedy <tjreedy@udel.edu> - 2012-02-12 17:07 -0500
              Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 17:22 -0500
            Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 09:14 +1100
              Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 17:27 -0500
                Re: Python usage numbers Dave Angel <davea@dejaviewphoto.com> - 2012-02-12 17:40 -0500
                Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 23:29 +0000
                  Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 18:41 -0500
                  Re: Python usage numbers Dave Angel <d@davea.name> - 2012-02-12 19:03 -0500
                  Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 11:59 +1100
                    Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 20:11 -0500
            Re: Python usage numbers Christian Heimes <lists@cheimes.de> - 2012-02-13 01:00 +0100
            Re: Python usage numbers Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-12 21:37 -0500
            Re: Python usage numbers Terry Reedy <tjreedy@udel.edu> - 2012-02-12 22:09 -0500
              Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 22:57 -0500
                Re: Python usage numbers Ben Finney <ben+python@benfinney.id.au> - 2012-02-13 15:19 +1100
                  Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-13 12:26 -0600
              Re: Python usage numbers jmfauth <wxjmfauth@gmail.com> - 2012-02-14 00:00 -0800
        Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 06:10 +0000
          Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-12 01:05 -0600
            Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 09:12 +0000
              Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-12 05:11 -0600
                Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:30 +0000
                  Re: Python usage numbers Dave Angel <d@davea.name> - 2012-02-12 17:50 -0500
              Re: Python usage numbers Peter Pearson <ppearson@nowhere.invalid> - 2012-02-12 17:58 +0000
          Re: Python usage numbers Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-12 20:48 -0800
            Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 16:03 +1100
            OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-13 08:05 +0000
              Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 08:01 -0800
                Re: OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-13 16:12 +0000
                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 08:27 -0800
                Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-13 11:38 -0700
                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 13:01 -0800
                    Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 08:27 +1100
                    Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-13 21:46 +0000
                    Re: OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-14 00:19 +0000
                      Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 17:07 -0800
                    Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-13 18:29 -0700
                      Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-17 17:13 -0800
                        Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-18 13:13 +1100
                        Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-18 02:39 +0000
                        Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-18 00:28 -0700
                          Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-18 07:02 -0800
                            Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-18 16:15 +0000
                              Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-18 10:34 -0800
                                Re: OT: Entitlements [was Re: Python usage numbers] random joe <pywin32@gmail.com> - 2012-02-18 10:49 -0800
                            Re: OT: Entitlements [was Re: Python usage numbers] Albert van der Horst <albert@spenarnc.xs4all.nl> - 2012-02-26 12:14 +0000
                        Re: OT: Entitlements [was Re: Python usage numbers] Terry Reedy <tjreedy@udel.edu> - 2012-02-18 04:16 -0500
                    Re: OT: Entitlements [was Re: Python usage numbers] John O'Hagan <research@johnohagan.com> - 2012-02-14 19:41 +1100
                      Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:21 -0800
                        Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-15 11:44 +1100
                          Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 17:26 -0800
                            Re: OT: Entitlements [was Re: Python usage numbers] John O'Hagan <research@johnohagan.com> - 2012-02-15 19:56 +1100
                              Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-15 07:04 -0800
                                Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-15 15:18 +0000
                                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-15 08:27 -0800
                                    Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-15 17:16 +0000
                                Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-15 09:46 -0700
                            Re: OT: Entitlements [was Re: Python usage numbers] Albert van der Horst <albert@spenarnc.xs4all.nl> - 2012-02-26 12:44 +0000
                              Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-26 12:35 -0800
                                Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-27 07:50 +1100
                                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-26 14:32 -0800
                              Re: OT: Entitlements Ben Finney <ben+python@benfinney.id.au> - 2012-02-27 07:46 +1100
                Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 07:47 +1100
                Re: OT: Entitlements [was Re: Python usage numbers] Michael Torrie <torriem@gmail.com> - 2012-02-13 14:46 -0700
                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 16:39 -0800
                    Re: OT: Entitlements [was Re: Python usage numbers] Michael Torrie <torriem@gmail.com> - 2012-02-13 18:36 -0700
                    Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 12:37 +1100
                      Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-17 17:37 -0800
                Re: OT: Entitlements [was Re: Python usage numbers] Tim Wintle <tim.wintle@teamrubber.com> - 2012-02-13 16:41 +0000
                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:40 -0800
                    RE: OT: Entitlements [was Re: Python usage numbers] "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-02-17 20:09 +0000
                Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-14 11:31 +0000
                  Re: OT: Entitlements [was Re: Python usage numbers] Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-02-14 07:06 -0500
                  Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:48 -0800
                    Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-15 12:32 +1100
                    Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-15 09:47 +0000
                      Re: OT: Entitlements [was Re: Python usage numbers] Arnaud Delobelle <arnodel@gmail.com> - 2012-02-15 09:58 +0000
                        Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-15 10:04 +0000
                          Kill files [was Re: OT: Entitlements [was Re: Python usage numbers]] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-15 10:27 +0000
                            Re: Kill files [was Re: OT: Entitlements [was Re: Python usage numbers]] Ethan Furman <ethan@stoneleaf.us> - 2012-02-15 11:29 -0800
                Re: OT: Entitlements [was Re: Python usage numbers] rusi <rustompmody@gmail.com> - 2012-02-14 04:56 -0800
                Re: OT: Entitlements [was Re: Python usage numbers] Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-14 09:37 -0500
      Re: Python usage numbers Matej Cepl <mcepl@redhat.com> - 2012-02-12 09:14 +0100
        Re: Python usage numbers Matej Cepl <mcepl@redhat.com> - 2012-02-12 09:26 +0100
          Re: Python usage numbers Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-12 12:11 +0000
            Re: Python usage numbers alister <alister.ware@ntlworld.com> - 2012-02-12 18:55 +0000
              Re: Python usage numbers jmfauth <wxjmfauth@gmail.com> - 2012-02-12 11:52 -0800
                French and IDLE on Windows  (was Re: Python usage numbers) Terry Reedy <tjreedy@udel.edu> - 2012-02-12 18:30 -0500
          Re: Python usage numbers Anssi Saari <as@sci.fi> - 2012-02-15 11:56 +0200



#20242 — Re: Python usage numbers

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-02-12 12:28 +1100
Subject: Re: Python usage numbers
Message-ID: <mailman.5711.1329010113.27778.python-list@python.org>

On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
> However, in at
> least one current thread (on python-ideas) and at a variety of times
> in the past, _some_ people have found Unicode in Python 3 to make more
> work.

If Unicode in Python is causing you more work, isn't it most likely
that the issue would have come up anyway? For instance, suppose you
have a web form and you accept customer names, which you then store in
a database. You could assume that the browser submits it in UTF-8 and
that your database back-end can accept UTF-8, and then pretend that
it's all ASCII, but if you then want to upper-case the name for a
heading, somewhere you're going to need to deal with Unicode; and when
your programming language has facilities like str.upper(), that's
going to make it easier, not harder. Sure, the simple case is easier if
you pretend it's all ASCII, but it's still better to have language
facilities.
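
As a rough illustration of the upper-casing point (a sketch only; the
sample name is made up and is not from the original post), compare
upper-casing the text as a str with upper-casing its UTF-8 bytes as if
they were ASCII:

```python
# Hypothetical customer name, as a web form might submit it.
# "\u00eb" is the letter e with diaeresis, so this is "zoe o'brien" with an accented e.
name = "zo\u00eb o'brien"

# Treated as Unicode text: str.upper() knows how to upper-case the accented letter.
heading = name.upper()
assert heading == "ZO\u00cb O'BRIEN"

# Treated as "just ASCII" bytes: bytes.upper() only changes a-z, so the
# two bytes that encode the accented letter pass through unchanged.
raw = name.encode("utf-8")
ascii_style = raw.upper().decode("utf-8")
assert ascii_style == "ZO\u00eb O'BRIEN"

# The two results differ only in the non-ASCII letter.
assert heading != ascii_style
```

The byte-level version does not crash, it just quietly produces a
heading with one lower-case letter left in it.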

ChrisA



#20244

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-02-12 02:23 +0000
Message-ID: <4f37229b$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #20242

On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:

> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurrently@gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
> 
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file 
that contains Unicode characters encoded with some unknown codec, you 
don't have to think about it. Sure, you get mojibake rubbish in your 
database, but that's the fault of people who insist on not being 
American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the 
error requires thought, and even if that is only a minuscule amount of 
thought, that's too much for some developers who are scared of Unicode. 
Hence the FUD that Python 3 is too hard because it makes you learn 
Unicode.

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with 
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have 
at least a basic working knowledge of Unicode, you're the equivalent of a 
doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Learning a basic working knowledge of Unicode is not that hard. You don't 
need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's 
nearly all ASCII text, except for a few weird and bizarre characters like 
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an 
error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF8 or Latin1. Even if you don't get the right 
characters, you'll get *something*.

- Use open(filename, encoding='ascii', errors='surrogateescape')

(Or possibly errors='ignore'.)
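
A minimal sketch of those options, assuming a small Latin-1 encoded
file (the sample text and the temporary file are made up for the
illustration; only open() with the encoding and errors arguments comes
from the post):

```python
import os
import tempfile

# Hypothetical "nearly all ASCII" file: pound sign, copyright sign and
# an accented e, stored as Latin-1 bytes.
raw = "Price: \u00a310 \u00a9 Zo\u00eb\n".encode("latin-1")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Strict UTF-8 decoding is what produces the error being complained about.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Latin-1 maps every byte to a character, so it never fails
    # (though it is only *correct* if the file really is Latin-1).
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.read()

    # ASCII plus surrogateescape: undecodable bytes are smuggled through
    # as lone surrogates and can be written back out unchanged.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        escaped = f.read()
    round_tripped = escaped.encode("ascii", errors="surrogateescape")

    # errors='ignore' silently drops the non-ASCII bytes.
    with open(path, encoding="ascii", errors="ignore") as f:
        ignored = f.read()
finally:
    os.remove(path)

assert utf8_failed
assert as_latin1 == "Price: \u00a310 \u00a9 Zo\u00eb\n"
assert round_tripped == raw
assert ignored == "Price: 10  Zo\n"
```

Of the three, surrogateescape is the only one here that keeps the
original bytes recoverable without knowing the real encoding.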



-- 
Steven



#20245

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2012-02-11 18:36 -0800
Message-ID: <f610859f-3aa3-4c84-a737-40791a217ef1@m5g2000yqk.googlegroups.com>
In reply to: #20244

On Feb 11, 8:23 pm, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> > On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> > <ericsnowcurren...@gmail.com> wrote:
> >> However, in at
> >> least one current thread (on python-ideas) and at a variety of times in
> >> the past, _some_ people have found Unicode in Python 3 to make more
> >> work.
>
> > If Unicode in Python is causing you more work, isn't it most likely that
> > the issue would have come up anyway?
>
> The argument being made is that in Python 2, if you try to read a file
> that contains Unicode characters encoded with some unknown codec, you
> don't have to think about it. Sure, you get mojibake rubbish in your
> database, but that's the fault of people who insist on not being
> American. Or who spell Zoe with an umlaut.

That's not the worst of it... I have many times had a block of text
that was valid ASCII except for some intermixed Unicode white-space.
Who the hell would even consider inserting Unicode white-space!!!

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:

The most obvious answer would be to read the file WITHOUT worrying
about asinine encoding.



#20248

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-02-12 15:38 +1100
Message-ID: <mailman.5715.1329021524.27778.python-list@python.org>
In reply to: #20245

On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<rantingrickjohnson@gmail.com> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> The most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "bush hid the facts" trick
with Windows Notepad shows the folly of trying to use internal
evidence to identify meaning from bytes.

Everything that displays text to a human needs to translate bytes into
glyphs, and the usual way to do this conceptually is to go via
characters. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure,
unless everyone speaks English with no foreign symbols - so, no
mathematical notations.
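
To make the "bytes have no meaning by themselves" point concrete, here
is a small sketch (not from the original post). It decodes the same 18
bytes two ways; UTF-16 little-endian is, as far as I know, the wrong
guess Notepad made in the "bush hid the facts" case:

```python
# The same bytes, interpreted under two different encodings.
data = b"Bush hid the facts"

as_ascii = data.decode("ascii")
as_utf16 = data.decode("utf-16-le")

# Read as ASCII: 18 one-byte characters of English.
assert as_ascii == "Bush hid the facts"
assert len(as_ascii) == 18

# Read as UTF-16-LE: every pair of bytes becomes one CJK character.
assert len(as_utf16) == 9
assert as_utf16[0] == "\u7542"   # bytes 0x42 0x75 ("Bu") as one code unit

# And "one byte per character" breaks as soon as you leave ASCII:
plus_minus = "\u00b1"            # the plus-minus sign
assert len(plus_minus) == 1
assert len(plus_minus.encode("utf-8")) == 2
```

Nothing in the bytes says which reading is right; that information has
to come from outside, which is why an explicit encoding is needed.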

ChrisA



#20250

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-02-12 05:51 +0000
Message-ID: <4f375347$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #20248

On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:

> Everything that displays text to a human needs to translate bytes into
> glyphs, and the usual way to do this conceptually is to go via
> characters. Pretending that it's all the same thing really means
> pretending that one byte represents one character and that each
> character is depicted by one glyph. And that's doomed to failure, unless
> everyone speaks English with no foreign symbols - so, no mathematical
> notations.

Pardon me, but you can't even write *English* in ASCII.

You can't say that it cost you £10 to courier your résumé to the head 
office of Encyclopædia Britannica to apply for the position of Staff 
Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy 
and old-fashioned, but it is traditional English.)

Hell, you can't even write in *American*: you can't say that the recipe 
for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

ASCII truly is a blight on the world, and the sooner it fades into 
obscurity, like EBCDIC, the better.

Even if everyone did change to speak ASCII, you still have all the 
historical records and documents and files to deal with. Encodings are 
not going away.


-- 
Steven



#20251

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-02-12 17:08 +1100
Message-ID: <mailman.5717.1329026906.27778.python-list@python.org>
In reply to: #20250

On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator.

True, but if it cost you $10 (or 10 GBP) to courier your curriculum
vitae to the head office of Encyclopaedia Britannica to become Staff
Coordinator, then you'd be fine. And if it cost you $10 to post your
work summary to Britannica's administration to apply for this Staff
Coordinator position, you could say it without 'e' too. Doesn't mean
you don't need Unicode!

ChrisA



#20273

From: Roy Smith <roy@panix.com>
Date: 2012-02-12 10:48 -0500
Message-ID: <roy-74351C.10483512022012@news.panix.com>
In reply to: #20250

In article <4f375347$0$29986$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> ASCII truly is a blight on the world, and the sooner it fades into 
> obscurity, like EBCDIC, the better.

That's a fair statement, but it's also fair to say that at the time it 
came out (49 years ago!) it was a revolutionary improvement on the 
extant state of affairs (every manufacturer inventing their own code, 
and often different codes for different machines).  Given the cost of 
both computer memory and CPU cycles at the time, sticking to a 7-bit 
code (the 8th bit was for parity) was a necessary evil.

As Steven D'Aprano pointed out, it was missing some commonly used US 
symbols such as ¢ or ©.  This was a small price to pay for the 
simplicity ASCII afforded.  It wasn't a bad encoding.  It was a very good 
encoding.  But the world has moved on and computing hardware has become 
cheap enough that supporting richer encodings and character sets is 
realistic.

And, before people complain about the character set being US-Centric, 
keep in mind that the A in ASCII stands for American, and it was 
published by ANSI (whose A also stands for American).  I'm not trying to 
wave the flag here, just pointing out that it was never intended to be 
anything other than a national character set.

Part of the complexity of Unicode is that when people switch from 
working with ASCII to working with Unicode, they're really having to 
master two distinct things at the same time (and often conflate them 
into a single confusing mess).  One is the Unicode character set.  The 
other is a specific encoding (UTF-8, UTF-16, etc).  Not to mention silly 
things like BOM (Byte Order Mark).  I expect that some day, storage 
costs will become so cheap that we'll all just be using UTF-32, and 
programmers of the day will wonder how their poor parents and 
grandparents ever managed in a world where nobody quite knew what you 
meant when you asked, "how long is that string?".
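
The character-set versus encoding distinction, and the "how long is
that string?" question, can be sketched in a few lines (my own example
text, written with escapes so the exact code points are unambiguous):

```python
import codecs
import unicodedata

# "resume" with two accented e's, a space, a pound sign and "10":
# ten Unicode code points in total.
text = "r\u00e9sum\u00e9 \u00a310"

# One character set, several encodings, several different byte lengths.
assert len(text) == 10
assert len(text.encode("utf-8")) == 13      # 7 ASCII + 3 two-byte characters
assert len(text.encode("utf-16-le")) == 20  # 2 bytes per code point here
assert len(text.encode("utf-32-le")) == 40  # always 4 bytes per code point

# The plain "utf-16" codec prepends a byte order mark (BOM) in the
# machine's native byte order, which adds two more bytes.
with_bom = text.encode("utf-16")
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
assert len(with_bom) == 22

# Even counting code points is not the whole story: an accented letter
# can be one precomposed code point or a base letter plus a combining mark.
decomposed = "e\u0301"
assert len(decomposed) == 2
assert unicodedata.normalize("NFC", decomposed) == "\u00e9"
```

So UTF-32 would settle the bytes-per-code-point question, but the last
three lines suggest "length" would still depend on what you count.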



#20278

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-02-12 11:47 -0500
Message-ID: <mailman.5730.1329065268.27778.python-list@python.org>
In reply to: #20273

On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:

>As Steven D'Aprano pointed out, it was missing some commonly used US 
>symbols such as ¢ or ©.  This was a small price to pay for the 
>simplicity ASCII afforded.  It wasn't a bad encoding.  It was a very good 
>encoding.  But the world has moved on and computing hardware has become 
>cheap enough that supporting richer encodings and character sets is 
>realistic.
>
	Any volunteers to create an Extended Baudot... Instead of "letter
shift" and "number shift" we could have a generic "encoding shift" which
uses the following characters to identify which 7-bit subset of Unicode
is to be represented <G>
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/



#20282

From: Roy Smith <roy@panix.com>
Date: 2012-02-12 12:11 -0500
Message-ID: <roy-EBA03D.12114612022012@news.panix.com>
In reply to: #20278

In article <mailman.5730.1329065268.27778.python-list@python.org>,
 Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:

> On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:
> 
> >As Steven D'Aprano pointed out, it was missing some commonly used US 
> >symbols such as ¢ or ©.

That's interesting.  When I wrote that, it showed on my screen as a cent 
symbol and a copyright symbol.  What I see in your response is an upper 
case "A" with a hat accent (circumflex?) over it followed by a cent 
symbol, and likewise an upper case "A" with a hat accent over it 
followed by copyright symbol.

Oh, for the days of ASCII again :-)

Not to mention, of course, that I wrote <colon><dash><close-paren>, but 
I fully expect some of you will be reading this with absurd clients 
which turn that into some kind of smiley-face image.


> 	Any volunteers to create an Extended Baudot... Instead of "letter
> shift" and "number shift" we could have a generic "encoding shift" which
> uses the following characters to identify which 7-bit subset of Unicode
> is to be represented <G>

I think that's called UTF-8.
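
The comparison is loose (UTF-8 has no shift state; each character
carries its own length marker), but a quick sketch shows what the quip
is getting at. The helper name utf8_bits is made up for this example:

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of one character as space-separated bit strings."""
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))

# ASCII stays a single byte with the top bit clear.
assert utf8_bits("A") == "01000001"

# The cent sign (U+00A2): lead byte 110xxxxx announces one continuation byte.
assert utf8_bits("\u00a2") == "11000010 10100010"

# The euro sign (U+20AC): lead byte 1110xxxx announces two continuation bytes.
assert utf8_bits("\u20ac") == "11100010 10000010 10101100"

for ch in ("A", "\u00a2", "\u20ac"):
    print("U+%04X" % ord(ch), utf8_bits(ch))
```

Every continuation byte starts with the bits 10, so a reader can always
tell a lead byte from the middle of a character.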



#20304

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-02-12 22:49 +0000
Message-ID: <4f3841e4$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #20282

On Sun, 12 Feb 2012 12:11:46 -0500, Roy Smith wrote:

> In article <mailman.5730.1329065268.27778.python-list@python.org>,
>  Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
> 
>> On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:
>> 
>> >As Steven D'Aprano pointed out, it was missing some commonly used US
>> >symbols such as ¢ or ©.
> 
> That's interesting.  When I wrote that, it showed on my screen as a cent
> symbol and a copyright symbol.  What I see in your response is an upper
> case "A" with a hat accent (circumflex?) over it followed by a cent
> symbol, and likewise an upper case "A" with a hat accent over it
> followed by copyright symbol.

Somebody's mail or news reader is either ignoring the message's encoding 
line, or not inserting an encoding line. Either way, that's a bug.
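
The symptom Roy describes (a capital A with a circumflex in front of
each symbol) is what you would expect if UTF-8 bytes were displayed as
Latin-1. That the two encodings involved were exactly these is my
assumption, not something the thread confirms, but the sketch
reproduces the effect:

```python
# "\u00a2" is the cent sign, "\u00a9" the copyright sign.
original = "\u00a2 or \u00a9"

# The sender's client encodes the text as UTF-8...
wire = original.encode("utf-8")
assert wire == b"\xc2\xa2 or \xc2\xa9"

# ...and a reader that ignores (or never receives) the charset header
# decodes those bytes as Latin-1 instead.
misread = wire.decode("latin-1")

# Each symbol now has a stray A-circumflex (U+00C2) in front of it.
assert misread == "\u00c2\u00a2 or \u00c2\u00a9"

# As long as nothing was lost, reversing the wrong step repairs the text.
repaired = misread.encode("latin-1").decode("utf-8")
assert repaired == original
```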

> Oh, for the days of ASCII again :-)

I look forward to the day, probably around 2525, when everybody uses 
UTF-32 always.



-- 
Steven



#20276

From: Dan Sommers <dan@tombstonezero.net>
Date: 2012-02-12 15:55 +0000
Message-ID: <mailman.5727.1329062134.27778.python-list@python.org>
In reply to: #20250

On Sun, 12 Feb 2012 17:08:24 +1100, Chris Angelico wrote:

> On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> You can't say that it cost you £10 to courier your résumé to the head
>> office of Encyclopædia Britannica to apply for the position of Staff
>> Coördinator.
> 
> True, but if it cost you $10 (or 10 GBP) to courier your curriculum
> vitae to the head office of Encyclopaedia Britannica to become Staff
> Coordinator, then you'd be fine. And if it cost you $10 to post your
> work summary to Britannica's administration to apply for this Staff
> Coordinator position, you could say it without 'e' too. Doesn't mean you
> don't need Unicode!

Back in the late 1970's, the economy and the outlook in the USA sucked, 
and the following joke made the rounds:

Mr. Smith:  Good morning, Mr. Jones.  How are you?

Mr. Jones:  I'm fine.

(The humor is that Mr. Jones had his head so far [in the sand] that he 
thought that things were fine.)

American English is my first spoken language, but I know enough French, 
Greek, math, and other languages that I am very happy to have more than 
ASCII these days.  I imagine that even Steven's surname should be spelled 
D’Aprano rather than D'Aprano.

Dan



#20281

From: rusi <rustompmody@gmail.com>
Date: 2012-02-12 08:50 -0800
Message-ID: <e7f457b3-7d49-4c95-bd95-e0f27fa66137@s8g2000pbj.googlegroups.com>
In reply to: #20250

On Feb 12, 10:51 am, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > Everything that displays text to a human needs to translate bytes into
> > glyphs, and the usual way to do this conceptually is to go via
> > characters. Pretending that it's all the same thing really means
> > pretending that one byte represents one character and that each
> > character is depicted by one glyph. And that's doomed to failure, unless
> > everyone speaks English with no foreign symbols - so, no mathematical
> > notations.
>
> Pardon me, but you can't even write *English* in ASCII.
>
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> and old-fashioned, but it is traditional English.)
>
> Hell, you can't even write in *American*: you can't say that the recipe
> for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

[Quite OT but...] How do you type all this?
[Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]



#20283

From: Roy Smith <roy@panix.com>
Date: 2012-02-12 12:21 -0500
Message-ID: <roy-8C0B2D.12212712022012@news.panix.com>
In reply to: #20281

In article 
<e7f457b3-7d49-4c95-bd95-e0f27fa66137@s8g2000pbj.googlegroups.com>,
 rusi <rustompmody@gmail.com> wrote:

> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > Everything that displays text to a human needs to translate bytes into
> > > glyphs, and the usual way to do this conceptually is to go via
> > > characters. Pretending that it's all the same thing really means
> > > pretending that one byte represents one character and that each
> > > character is depicted by one glyph. And that's doomed to failure, unless
> > > everyone speaks English with no foreign symbols - so, no mathematical
> > > notations.
> >
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britanica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger is © 2012 WobblyBurgerWorld Inc.
> 
> [Quite OT but...] How do you type all this?
> [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]

What I do (on a Mac) is open the Keyboard Viewer thingie and try various 
combinations of shift-control-option-command-function until the thing 
I'm looking for shows up on a keycap.  A few of them I've got memorized 
(for example, option-8 gets you a bullet •).  I would imagine if you 
commonly type in a language other than English, you would quickly 
memorize the ones you use a lot.

Or, open the Character Viewer thingie and either hunt around the various 
drill-down menus (North American Scripts / Canadian Aboriginal 
Syllabics, for example) or type in some guess at the official unicode 
name into the search box.



#20284

From Nick Dokos <nicholas.dokos@hp.com>
Date 2012-02-12 12:36 -0500
Message-ID <mailman.5733.1329068559.27778.python-list@python.org>
In reply to #20281
rusi <rustompmody@gmail.com> wrote:

> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > Everything that displays text to a human needs to translate bytes into
> > > glyphs, and the usual way to do this conceptually is to go via
> > > characters. Pretending that it's all the same thing really means
> > > pretending that one byte represents one character and that each
> > > character is depicted by one glyph. And that's doomed to failure, unless
> > > everyone speaks English with no foreign symbols - so, no mathematical
> > > notations.
> >
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britanica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
> 
> [Quite OT but...] How do you type all this?
> [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]

[Emacs specific]

Many different ways of course, but in emacs, you can select e.g. the TeX input method
with C-x RET C-\ TeX RET,
which handles all of the above symbols with the exception of the cent
symbol (or maybe I missed it). You type the thing in the first column and you
get the thing in the second column:

\pounds £
\'e     é
\ae     æ
\"o     ö
^{TM}   ™
\copyright ©

I gave up on the cent symbol and used ucs-insert (C-x 8 RET) which allows you to type
a name, in this case CENT SIGN to get ¢.
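
The same look-up-by-official-name idea is available outside Emacs. As a minimal sketch (the helper name `char_by_name` is made up for illustration; `unicodedata.lookup` is the standard-library call doing the work), Python can turn a Unicode character name such as CENT SIGN into the character itself:

```python
import unicodedata


def char_by_name(name):
    """Return the character whose official Unicode name is `name`.

    Hypothetical helper for illustration; it only wraps unicodedata.lookup,
    which raises KeyError if no character has that name.
    """
    return unicodedata.lookup(name)


# Print the code point, the character, and its name for a few of the
# symbols mentioned in this thread.
for name in ("POUND SIGN", "CENT SIGN", "COPYRIGHT SIGN", "TRADE MARK SIGN"):
    ch = char_by_name(name)
    print(f"U+{ord(ch):04X}  {ch}  {name}")
```

The name has to match the official Unicode name, so a guess like "CENT SYMBOL" would raise KeyError rather than find the character.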

Nick



#20322 — entering unicode (was Python usage numbers)

From rusi <rustompmody@gmail.com>
Date 2012-02-12 19:09 -0800
Subject entering unicode (was Python usage numbers)
Message-ID <d64b98fa-5821-4b9f-947b-25c7d4a2c6e9@og8g2000pbb.googlegroups.com>
In reply to #20284
On Feb 12, 10:36 pm, Nick Dokos <nicholas.do...@hp.com> wrote:
> rusi <rustompm...@gmail.com> wrote:
> > On Feb 12, 10:51 am, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> > > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > > Everything that displays text to a human needs to translate bytes into
> > > > glyphs, and the usual way to do this conceptually is to go via
> > > > characters. Pretending that it's all the same thing really means
> > > > pretending that one byte represents one character and that each
> > > > character is depicted by one glyph. And that's doomed to failure, unless
> > > > everyone speaks English with no foreign symbols - so, no mathematical
> > > > notations.
>
> > > Pardon me, but you can't even write *English* in ASCII.
>
> > > You can't say that it cost you £10 to courier your résumé to the head
> > > office of Encyclopædia Britanica to apply for the position of Staff
> > > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > > and old-fashioned, but it is traditional English.)
>
> > > Hell, you can't even write in *American*: you can't say that the recipe
> > > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> > [Quite OT but...] How do you type all this?
> > [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]
>
> [Emacs specific]
>
> Many different ways of course, but in emacs, you can select e.g. the TeX input method
> with C-x RET C-\ TeX RET.
> which does all of the above symbols with the exception of the cent
> symbol (or maybe I missed it) - you type the thing in the first column and you
> get the thing in the second column
>
> \pounds £
> \'e     é
> \ae     æ
> \"o     ö
> ^{TM}   ™
> \copyright ©
>
> I gave up on the cent symbol and used ucs-insert (C-x 8 RET) which allows you to type
> a name, in this case CENT SIGN to get ¢.
>
> Nick

[OT warning]
I asked this on the emacs list, but got no response there, and the
responses here are more helpful, so I am asking here.
My question there was emacs-specific. If there is some other app that
does this, that's fine.

I have some bunch of sanskrit (devanagari) to type.  It would be
easiest for me if I could have the English (roman) as well as the
sanskrit (devanagari).

For example using the devanagari-itrans input method I can write the
gayatri mantra using

OM bhUrbhuvaH suvaH
tatsaviturvarenyam
bhargo devasya dhImahi
dhiyo yonaH prachodayAt

and emacs produces *on the fly* (ie I cant see/edit the above)

ॐ भूर्भुवः सुवः
तत्सवितुर्वरेण्यम्
भर्गो देवस्य धीमहि
धियो योनः प्रचोदयात्

Can I do it in batch mode? ie write the first in a file and run some
command on it to produce the second?
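
One way to picture a batch version is a small script that reads the roman text and emits Devanagari. What follows is only a rough sketch under loud assumptions: it covers a tiny, hand-picked subset of the ITRANS scheme, the function and table names (`transliterate`, `CONSONANTS`, and so on) are invented for illustration, and it is not how the emacs devanagari-itrans input method is implemented.

```python
import unicodedata


def dv(name):
    """Look up a Devanagari character by the tail of its Unicode name."""
    return unicodedata.lookup("DEVANAGARI " + name)


# Assumption: a deliberately tiny subset of ITRANS, not the full table.
CONSONANTS = {
    "bh": dv("LETTER BHA"), "dh": dv("LETTER DHA"), "ch": dv("LETTER CA"),
    "g": dv("LETTER GA"), "t": dv("LETTER TA"), "d": dv("LETTER DA"),
    "n": dv("LETTER NA"), "p": dv("LETTER PA"), "m": dv("LETTER MA"),
    "y": dv("LETTER YA"), "r": dv("LETTER RA"), "v": dv("LETTER VA"),
    "s": dv("LETTER SA"), "h": dv("LETTER HA"),
}
# Dependent vowel signs, used right after a consonant ("a" is inherent).
MATRAS = {
    "a": "", "A": dv("VOWEL SIGN AA"), "i": dv("VOWEL SIGN I"),
    "I": dv("VOWEL SIGN II"), "u": dv("VOWEL SIGN U"),
    "U": dv("VOWEL SIGN UU"), "e": dv("VOWEL SIGN E"), "o": dv("VOWEL SIGN O"),
}
# Independent vowel letters, used when no consonant is waiting for a vowel.
VOWELS = {
    "a": dv("LETTER A"), "A": dv("LETTER AA"), "i": dv("LETTER I"),
    "I": dv("LETTER II"), "u": dv("LETTER U"), "U": dv("LETTER UU"),
    "e": dv("LETTER E"), "o": dv("LETTER O"),
}
SIGNS = {"H": dv("SIGN VISARGA"), "M": dv("SIGN ANUSVARA")}
VIRAMA = dv("SIGN VIRAMA")
OM = dv("OM")


def transliterate(text):
    """Convert roman text in the subset above to Devanagari."""
    out = []
    pending = False  # True right after a consonant that has no vowel yet
    i = 0
    while i < len(text):
        if not pending and text.startswith("OM", i):
            out.append(OM)
            i += 2
            continue
        for size in (2, 1):  # try two-letter tokens such as "bh" first
            token = text[i:i + size]
            if token in CONSONANTS:
                if pending:  # consonant cluster: suppress the inherent vowel
                    out.append(VIRAMA)
                out.append(CONSONANTS[token])
                pending = True
                break
            if token in MATRAS:
                out.append(MATRAS[token] if pending else VOWELS[token])
                pending = False
                break
            if token in SIGNS:
                if pending:
                    out.append(VIRAMA)
                out.append(SIGNS[token])
                pending = False
                break
        else:  # unknown character (space, newline, ...): copy it through
            if pending:
                out.append(VIRAMA)
            pending = False
            out.append(text[i])
            size = 1
        i += size
    if pending:  # word-final consonant with no vowel
        out.append(VIRAMA)
    return "".join(out)


print(transliterate("OM bhUrbhuvaH suvaH"))
```

To use this on a file, the contents could be read with `open(path, encoding="utf-8").read()` and passed to `transliterate`. Anything outside the small tables is copied through unchanged, so real text would need a far more complete mapping.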



#20603 — Re: entering unicode (was Python usage numbers)

From Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date 2012-02-19 03:44 +0000
Subject Re: entering unicode (was Python usage numbers)
Message-ID <4f407007$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to #20322
On Sun, 12 Feb 2012 19:09:32 -0800, rusi wrote:

> I have some bunch of sanskrit (devanagari) to type.  It would be easiest
> for me if I could have the English (roman) as well as the sanskrit
> (devanagari).
> 
> For example using the devanagari-itrans input method I can write the
> gayatri mantra using
>
> OM bhUrbhuvaH suvaH
> tatsaviturvarenyam
> bhargo devasya dhImahi
> dhiyo yonaH prachodayAt
> 
> and emacs produces *on the fly* (ie I cant see/edit the above)
> 
> ॐ भूर्भुवः सुवः
> तत्सवितुर्वरेण्यम्
> भर्गो देवस्य धीमहि
> धियो योनः प्रचोदयात्
> 
> Can I do it in batch mode? ie write the first in a file and run some
> command on it to produce the second?


What is the devanagari-itrans input method? Do you actually type the 
characters into a terminal?

If so, you should be able to type the first into a file, copy it, then 
paste it into the input buffer to be processed.



-- 
Steven



#20611 — Re: entering unicode (was Python usage numbers)

From rusi <rustompmody@gmail.com>
Date 2012-02-19 00:52 -0800
Subject Re: entering unicode (was Python usage numbers)
Message-ID <6c4323a5-79bb-4105-8581-a2bef19fc39b@4g2000pbz.googlegroups.com>
In reply to #20603
On Feb 19, 8:44 am, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 19:09:32 -0800, rusi wrote:
> > I have some bunch of sanskrit (devanagari) to type.  It would be easiest
> > for me if I could have the English (roman) as well as the sanskrit
> > (devanagari).
>
> > For example using the devanagari-itrans input method I can write the
> > gayatri mantra using
>
> > OM bhUrbhuvaH suvaH
> > tatsaviturvarenyam
> > bhargo devasya dhImahi
> > dhiyo yonaH prachodayAt
>
> > and emacs produces *on the fly* (ie I cant see/edit the above)
>
> > ॐ भूर्भुवः सुवः
> > तत्सवितुर्वरेण्यम्
> > भर्गो देवस्य धीमहि
> > धियो योनः प्रचोदयात्
>
> > Can I do it in batch mode? ie write the first in a file and run some
> > command on it to produce the second?
>
> What is the devanagari-itrans input method? Do you actually type the
> characters into a terminal?

It's one of the dozens (hundreds, actually) of input methods that emacs
has.
In emacs, run M-x set-input-method
and then give devanagari-itrans.
Its details are described (somewhat poorly) here: http://en.wikipedia.org/wiki/ITRANS
>
> If so, you should be able to type the first into a file, copy it, then
> paste it into the input buffer to be processed.

I've got it working in emacs with some glitches, but it will do
for now:

http://groups.google.com/group/gnu.emacs.help/browse_thread/thread/bfa6b05ce565d96d#
>
> --
> Steven

Thanks
[Actually thanks-squared
one for looking
two for backing up this thread to something more useful than ... :-)
]

Coincidentally, along with your response, I've got another mail about
another unicode-related interest of mine: apl under linux.

So far I was the sole (known) user of this: http://www.emacswiki.org/emacs/AplInDebian
I hear the number of users has just doubled :-)



#20303 — How do you Unicode proponents type your non-ASCII characters? (was: Python usage numbers)

From Ben Finney <ben+python@benfinney.id.au>
Date 2012-02-13 09:43 +1100
Subject How do you Unicode proponents type your non-ASCII characters? (was: Python usage numbers)
Message-ID <87d39j3c5z.fsf_-_@benfinney.id.au>
In reply to #20281
rusi <rustompmody@gmail.com> writes:

> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britanica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> [Quite OT but...] How do you type all this?

In GNU+Linux, I run the IBus daemon to manage different keyboard input
methods across all my applications consistently. That makes hundreds of
language-specific input methods available, and also many that are not
language-specific.

It's useful if I want to 英語の書面を書き中 type a passage of Japanese
with the ‘anthy’ input method, or likewise for any of the other
available language-specific input methods.

I normally have IBus presenting the ‘rfc1345’ input method. That makes
just about all keys input the corresponding character just as if no
input method were active. But when I type ‘&’ followed by a two- or
three-key sequence, it inputs the corresponding character from the
RFC 1345 mnemonics table:

    & → &
    P d → £
    e ' → é
    a e → æ
    o : → ö
    C t → ¢
    T M → ™
    C o → ©
    " 6 → “
    " 9 → ”
    …


Those same characters are also available with the ‘latex’ input method,
if I'm familiar with LaTeX character entity names. (I'm not.)
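
The ‘&’ mnemonic scheme described above can be pictured as a plain table lookup. A minimal sketch, assuming only the two-key mnemonics listed in this post (the names `MNEMONICS` and `expand` are made up for illustration, and this is not how IBus itself is implemented):

```python
# Mnemonics copied from the list in this post; the real RFC 1345 table is
# much larger and also has three-key sequences, which this sketch ignores.
MNEMONICS = {
    "&": "&",          # "&&" gives a literal ampersand
    "Pd": "\u00a3",    # POUND SIGN
    "e'": "\u00e9",    # LATIN SMALL LETTER E WITH ACUTE
    "ae": "\u00e6",    # LATIN SMALL LETTER AE
    "o:": "\u00f6",    # LATIN SMALL LETTER O WITH DIAERESIS
    "Ct": "\u00a2",    # CENT SIGN
    "TM": "\u2122",    # TRADE MARK SIGN
    "Co": "\u00a9",    # COPYRIGHT SIGN
    '"6': "\u201c",    # LEFT DOUBLE QUOTATION MARK
    '"9': "\u201d",    # RIGHT DOUBLE QUOTATION MARK
}


def expand(text):
    """Replace '&' followed by a known mnemonic with its character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "&":
            for size in (2, 1):  # prefer two-key mnemonics over "&&"
                key = text[i + 1:i + 1 + size]
                if len(key) == size and key in MNEMONICS:
                    out.append(MNEMONICS[key])
                    i += 1 + size
                    break
            else:  # no mnemonic follows: keep the ampersand as typed
                out.append("&")
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(expand("&Pd10 to courier your r&e'sum&e'"))
print(expand("20&Ct WobblyBurger&TM is &Co 2012"))
```

A real input method works key by key as you type rather than on a finished string, but the lookup step is the same idea.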

-- 
 \           “If [a technology company] has confidence in their future |
  `\      ability to innovate, the importance they place on protecting |
_o__)     their past innovations really should decline.” —Gary Barnett |
Ben Finney



#20307

From Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date 2012-02-12 22:56 +0000
Message-ID <4f3843a0$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to #20281
On Sun, 12 Feb 2012 08:50:28 -0800, rusi wrote:

>> You can't say that it cost you £10 to courier your résumé to the head
>> office of Encyclopædia Britanica to apply for the position of Staff
>> Coördinator. (Admittedly, the umlaut on the second "o" looks a bit
>> stuffy and old-fashioned, but it is traditional English.)
>>
>> Hell, you can't even write in *American*: you can't say that the recipe
>> for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
> 
> [Quite OT but...] How do you type all this? [Note: I grew up on APL so
> unlike Rick I am genuinely asking :-) ]


In my case, I used the KDE application "KCharSelect". I manually hunt 
through the tables for the character I want (which sucks), click on the 
characters I want, and copy and paste them into my editor.

Back in Ancient Days when I ran Mac OS 6, I had memorised many keyboard 
shortcuts for these things. Option-4 was the pound sign, I believe, and 
Option-Shift-4 the cent sign. Or perhaps the other way around?


-- 
Steven



#20272

From Roy Smith <roy@panix.com>
Date 2012-02-12 10:13 -0500
Message-ID <roy-98B81C.10130712022012@news.panix.com>
In reply to #20248
In article <mailman.5715.1329021524.27778.python-list@python.org>,
 Chris Angelico <rosuav@gmail.com> wrote:

> On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
> <rantingrickjohnson@gmail.com> wrote:
> > On Feb 11, 8:23 pm, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> >> "I have a file containing text. I can open it in an editor and see it's
> >> nearly all ASCII text, except for a few weird and bizarre characters like
> >> Ł Š ą or ö. In Python 2, I can read that file fine. In Python 3 I get an
> >> error. What should I do that requires no thought?"
> >>
> >> Obvious answers:
> >
> > the most obvious answer would be to read the file WITHOUT worrying
> > about asinine encoding.
> 
> What this statement misunderstands, though, is that ASCII is itself an
> encoding. Files contain bytes, and it's only what's external to those
> bytes that gives them meaning.

Exactly.  <soapbox class="wise-old-geezer">.  ASCII was so successful at 
becoming a universal standard which lasted for decades, people who grew 
up with it don't realize there was once any other way.  Not just EBCDIC, 
but also SIXBIT, RAD-50, tilt/rotate, packed card records, and so on.  
Transcoding was a way of life, and if you didn't know what you were 
starting with and aiming for, it was hopeless.  Kind of like now where 
we are again with Unicode.  </soapbox>
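
The point that bytes only mean something once you pick an encoding is easy to see in Python 3. A minimal sketch (the sample characters are the ones from the quoted question; the helper `try_decode` is made up for illustration):

```python
# The "weird" characters from the quoted question, written as escapes:
# L WITH STROKE, S WITH CARON, A WITH OGONEK, O WITH DIAERESIS.
text = "\u0141 \u0160 \u0105 \u00f6"
data = text.encode("utf-8")  # what would actually be stored in the file


def try_decode(raw, encoding):
    """Decode raw bytes with the given encoding, or return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The same bytes give the right text, wrong text, or an error depending
# purely on which encoding the reader assumes.
for encoding in ("utf-8", "latin-1", "ascii"):
    print(encoding, repr(try_decode(data, encoding)))
```

Decoding as UTF-8 recovers the text, Latin-1 silently produces different characters, and ASCII fails outright because the bytes fall outside its 7-bit range.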


