| Started by | Chris Angelico <rosuav@gmail.com> |
|---|---|
| First post | 2012-02-12 12:28 +1100 |
| Last post | 2012-02-15 11:56 +0200 |
| Articles | 20 on this page of 109 — 31 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 12:28 +1100
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 02:23 +0000
Re: Python usage numbers Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-11 18:36 -0800
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 15:38 +1100
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 05:51 +0000
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-12 17:08 +1100
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 10:48 -0500
Re: Python usage numbers Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-12 11:47 -0500
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 12:11 -0500
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:49 +0000
Re: Python usage numbers Dan Sommers <dan@tombstonezero.net> - 2012-02-12 15:55 +0000
Re: Python usage numbers rusi <rustompmody@gmail.com> - 2012-02-12 08:50 -0800
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 12:21 -0500
Re: Python usage numbers Nick Dokos <nicholas.dokos@hp.com> - 2012-02-12 12:36 -0500
entering unicode (was Python usage numbers) rusi <rustompmody@gmail.com> - 2012-02-12 19:09 -0800
Re: entering unicode (was Python usage numbers) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-19 03:44 +0000
Re: entering unicode (was Python usage numbers) rusi <rustompmody@gmail.com> - 2012-02-19 00:52 -0800
How do you Unicode proponents type your non-ASCII characters? (was: Python usage numbers) Ben Finney <ben+python@benfinney.id.au> - 2012-02-13 09:43 +1100
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:56 +0000
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 10:13 -0500
Re: Python usage numbers Terry Reedy <tjreedy@udel.edu> - 2012-02-12 17:07 -0500
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 17:22 -0500
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 09:14 +1100
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 17:27 -0500
Re: Python usage numbers Dave Angel <davea@dejaviewphoto.com> - 2012-02-12 17:40 -0500
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 23:29 +0000
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 18:41 -0500
Re: Python usage numbers Dave Angel <d@davea.name> - 2012-02-12 19:03 -0500
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 11:59 +1100
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 20:11 -0500
Re: Python usage numbers Christian Heimes <lists@cheimes.de> - 2012-02-13 01:00 +0100
Re: Python usage numbers Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-12 21:37 -0500
Re: Python usage numbers Terry Reedy <tjreedy@udel.edu> - 2012-02-12 22:09 -0500
Re: Python usage numbers Roy Smith <roy@panix.com> - 2012-02-12 22:57 -0500
Re: Python usage numbers Ben Finney <ben+python@benfinney.id.au> - 2012-02-13 15:19 +1100
Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-13 12:26 -0600
Re: Python usage numbers jmfauth <wxjmfauth@gmail.com> - 2012-02-14 00:00 -0800
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 06:10 +0000
Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-12 01:05 -0600
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 09:12 +0000
Re: Python usage numbers Andrew Berg <bahamutzero8825@gmail.com> - 2012-02-12 05:11 -0600
Re: Python usage numbers Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-12 22:30 +0000
Re: Python usage numbers Dave Angel <d@davea.name> - 2012-02-12 17:50 -0500
Re: Python usage numbers Peter Pearson <ppearson@nowhere.invalid> - 2012-02-12 17:58 +0000
Re: Python usage numbers Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-12 20:48 -0800
Re: Python usage numbers Chris Angelico <rosuav@gmail.com> - 2012-02-13 16:03 +1100
OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-13 08:05 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 08:01 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-13 16:12 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 08:27 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-13 11:38 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 13:01 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 08:27 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-13 21:46 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-14 00:19 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 17:07 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-13 18:29 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-17 17:13 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-18 13:13 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-18 02:39 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-18 00:28 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-18 07:02 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-18 16:15 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-18 10:34 -0800
Re: OT: Entitlements [was Re: Python usage numbers] random joe <pywin32@gmail.com> - 2012-02-18 10:49 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Albert van der Horst <albert@spenarnc.xs4all.nl> - 2012-02-26 12:14 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Terry Reedy <tjreedy@udel.edu> - 2012-02-18 04:16 -0500
Re: OT: Entitlements [was Re: Python usage numbers] John O'Hagan <research@johnohagan.com> - 2012-02-14 19:41 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:21 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-15 11:44 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 17:26 -0800
Re: OT: Entitlements [was Re: Python usage numbers] John O'Hagan <research@johnohagan.com> - 2012-02-15 19:56 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-15 07:04 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-15 15:18 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-15 08:27 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-15 17:16 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Ian Kelly <ian.g.kelly@gmail.com> - 2012-02-15 09:46 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Albert van der Horst <albert@spenarnc.xs4all.nl> - 2012-02-26 12:44 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-26 12:35 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-27 07:50 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-26 14:32 -0800
Re: OT: Entitlements Ben Finney <ben+python@benfinney.id.au> - 2012-02-27 07:46 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 07:47 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Michael Torrie <torriem@gmail.com> - 2012-02-13 14:46 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-13 16:39 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Michael Torrie <torriem@gmail.com> - 2012-02-13 18:36 -0700
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-14 12:37 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-17 17:37 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Tim Wintle <tim.wintle@teamrubber.com> - 2012-02-13 16:41 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:40 -0800
RE: OT: Entitlements [was Re: Python usage numbers] "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-02-17 20:09 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-14 11:31 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-02-14 07:06 -0500
Re: OT: Entitlements [was Re: Python usage numbers] Rick Johnson <rantingrickjohnson@gmail.com> - 2012-02-14 16:48 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Chris Angelico <rosuav@gmail.com> - 2012-02-15 12:32 +1100
Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-15 09:47 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Arnaud Delobelle <arnodel@gmail.com> - 2012-02-15 09:58 +0000
Re: OT: Entitlements [was Re: Python usage numbers] Duncan Booth <duncan.booth@invalid.invalid> - 2012-02-15 10:04 +0000
Kill files [was Re: OT: Entitlements [was Re: Python usage numbers]] Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-02-15 10:27 +0000
Re: Kill files [was Re: OT: Entitlements [was Re: Python usage numbers]] Ethan Furman <ethan@stoneleaf.us> - 2012-02-15 11:29 -0800
Re: OT: Entitlements [was Re: Python usage numbers] rusi <rustompmody@gmail.com> - 2012-02-14 04:56 -0800
Re: OT: Entitlements [was Re: Python usage numbers] Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-02-14 09:37 -0500
Re: Python usage numbers Matej Cepl <mcepl@redhat.com> - 2012-02-12 09:14 +0100
Re: Python usage numbers Matej Cepl <mcepl@redhat.com> - 2012-02-12 09:26 +0100
Re: Python usage numbers Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-02-12 12:11 +0000
Re: Python usage numbers alister <alister.ware@ntlworld.com> - 2012-02-12 18:55 +0000
Re: Python usage numbers jmfauth <wxjmfauth@gmail.com> - 2012-02-12 11:52 -0800
French and IDLE on Windows (was Re: Python usage numbers) Terry Reedy <tjreedy@udel.edu> - 2012-02-12 18:30 -0500
Re: Python usage numbers Anssi Saari <as@sci.fi> - 2012-02-15 11:56 +0200
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-02-12 12:28 +1100 |
| Subject | Re: Python usage numbers |
| Message-ID | <mailman.5711.1329010113.27778.python-list@python.org> |
On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
> However, in at
> least one current thread (on python-ideas) and at a variety of times
> in the past, _some_ people have found Unicode in Python 3 to make more
> work.

If Unicode in Python is causing you more work, isn't it most likely that the issue would have come up anyway? For instance, suppose you have a web form and you accept customer names, which you then store in a database. You could assume that the browser submits it in UTF-8 and that your database back-end can accept UTF-8, and then pretend that it's all ASCII, but if you then want to upper-case the name for a heading, somewhere you're going to need to deal with Unicode; and when your programming language has facilities like str.upper(), that's going to make it easier, not harder.

Sure, the simple case is easier if you pretend it's all ASCII, but it's still better to have language facilities.

ChrisA
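To make the str.upper() point concrete, here is a minimal sketch assuming Python 3. The customer name is made up for illustration. It compares upper-casing the decoded text with upper-casing the raw UTF-8 bytes as if they were ASCII.

```python
# Hypothetical customer name containing non-ASCII letters.
name = "zo\u00eb m\u00fcller"  # "zoë müller"

# str.upper() works on characters and applies Unicode case mapping.
upper_text = name.upper()

# bytes.upper() only changes the ASCII letters a-z, so the
# two-byte UTF-8 sequences for ë and ü pass through untouched.
upper_bytes = name.encode("utf-8").upper().decode("utf-8")

print(upper_text)
print(upper_bytes)
```

Treating the data as "just ASCII" silently produces a half-converted heading, which is the kind of problem that would have come up with or without Python 3.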
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-02-12 02:23 +0000 |
| Message-ID | <4f37229b$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #20242 |
On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurrently@gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file that contains Unicode characters encoded with some unknown codec, you don't have to think about it. Sure, you get moji-bake rubbish in your database, but that's the fault of people who insist on not being American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the error requires thought, and even if that is only a minuscule amount of thought, that's too much for some developers who are scared of Unicode. Hence the FUD that Python 3 is too hard because it makes you learn Unicode.

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have at least a basic working knowledge of Unicode, you're the equivalent of a doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Learning a basic working knowledge of Unicode is not that hard. You don't need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's nearly all ASCII text, except for a few weird and bizarre characters like £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF8 or Latin1. Even if you don't get the right characters, you'll get *something*.
- Use open(filename, encoding='ascii', errors='surrogateescape') (Or possibly errors='ignore'.)

--
Steven
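The "obvious answers" above can be sketched roughly as follows, assuming Python 3. The file name and sample bytes are invented for illustration (a Latin-1 encoded pound sign in otherwise ASCII text). Latin-1 is used for the first option because it maps every byte to a character and so cannot raise.

```python
import os
import tempfile

# Hypothetical sample: ASCII text plus one Latin-1 encoded pound sign (0xA3).
raw = b"Price: \xa310\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # made-up file name
    with open(path, "wb") as f:
        f.write(raw)

    # Option 1: decode as Latin-1; every byte becomes some character.
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.read()

    # Option 2: decode as ASCII, smuggling undecodable bytes through
    # as lone surrogates so they can be written back unchanged later.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        escaped = f.read()

    # Option 3: decode as ASCII and silently drop undecodable bytes.
    with open(path, encoding="ascii", errors="ignore") as f:
        ignored = f.read()

# surrogateescape is reversible: encoding again restores the original bytes.
round_trip = escaped.encode("ascii", errors="surrogateescape")

print(repr(as_latin1), repr(escaped), repr(ignored), round_trip == raw)
```

Of the three, only surrogateescape preserves the original bytes without guessing what they mean; errors='ignore' loses data.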
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2012-02-11 18:36 -0800 |
| Message-ID | <f610859f-3aa3-4c84-a737-40791a217ef1@m5g2000yqk.googlegroups.com> |
| In reply to | #20244 |
On Feb 11, 8:23 pm, Steven D'Aprano <steve +comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> > On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> > <ericsnowcurren...@gmail.com> wrote:
> >> However, in at
> >> least one current thread (on python-ideas) and at a variety of times in
> >> the past, _some_ people have found Unicode in Python 3 to make more
> >> work.
>
> > If Unicode in Python is causing you more work, isn't it most likely that
> > the issue would have come up anyway?
>
> The argument being made is that in Python 2, if you try to read a file
> that contains Unicode characters encoded with some unknown codec, you
> don't have to think about it. Sure, you get moji-bake rubbish in your
> database, but that's the fault of people who insist on not being
> American. Or who spell Zoe with an umlaut.

That's not the worst of it... I have many times had a block of text that was valid ASCII except for some intermixed Unicode white-space. Who the hell would even consider inserting Unicode white-space!!!

> "I have a file containing text. I can open it in an editor and see it's
> nearly all ASCII text, except for a few weird and bizarre characters like
> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
> error. What should I do that requires no thought?"
>
> Obvious answers:

The most obvious answer would be to read the file WITHOUT worrying about asinine encoding.
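For the stray Unicode white-space case, one possible way to locate and normalize such characters is sketched below, assuming Python 3 and text that has already been decoded. The helper name and sample string are hypothetical.

```python
import unicodedata

# Hypothetical sample: a no-break space and an em space hidden in ASCII text.
text = "alpha\u00a0beta\u2003gamma delta"


def find_non_ascii_whitespace(s):
    """Return (index, Unicode name) for each whitespace char outside ASCII."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(s)
        if ch.isspace() and ord(ch) > 127
    ]


hits = find_non_ascii_whitespace(text)

# str.split() with no arguments splits on any Unicode whitespace,
# so re-joining with a plain space normalizes the text.
cleaned = " ".join(text.split())

print(hits)
print(cleaned)
```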
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-02-12 15:38 +1100 |
| Message-ID | <mailman.5715.1329021524.27778.python-list@python.org> |
| In reply to | #20245 |
On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson <rantingrickjohnson@gmail.com> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an encoding. Files contain bytes, and it's only what's external to those bytes that gives them meaning. The famous "bush hid the facts" trick with Windows Notepad shows the folly of trying to use internal evidence to identify meaning from bytes.

Everything that displays text to a human needs to translate bytes into glyphs, and the usual way to do this conceptually is to go via characters. Pretending that it's all the same thing really means pretending that one byte represents one character and that each character is depicted by one glyph. And that's doomed to failure, unless everyone speaks English with no foreign symbols - so, no mathematical notations.

ChrisA
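A small sketch of "bytes only mean something given an encoding", assuming Python 3. The byte value and the three codecs are arbitrary choices for illustration.

```python
# One non-ASCII byte, chosen for illustration.
raw = b"\xe9"

# The same byte decoded under three different assumptions.
decoded = {}
for codec in ("ascii", "latin-1", "cp1251"):
    try:
        decoded[codec] = raw.decode(codec)
    except UnicodeDecodeError:
        decoded[codec] = None  # ASCII has no meaning for bytes above 0x7F

# Pure ASCII bytes happen to decode identically under all three codecs,
# which is why "it's all ASCII" can look like "no encoding at all".
plain = b"facts"
same_everywhere = {plain.decode(c) for c in ("ascii", "latin-1", "cp1251")}

print(decoded)
print(same_everywhere)
```

The byte 0xE9 is "é" under Latin-1 and the Cyrillic letter "й" under cp1251, and nothing in the byte itself says which reading is right.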
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-02-12 05:51 +0000 |
| Message-ID | <4f375347$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #20248 |
On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> Everything that displays text to a human needs to translate bytes into
> glyphs, and the usual way to do this conceptually is to go via
> characters. Pretending that it's all the same thing really means
> pretending that one byte represents one character and that each
> character is depicted by one glyph. And that's doomed to failure, unless
> everyone speaks English with no foreign symbols - so, no mathematical
> notations.

Pardon me, but you can't even write *English* in ASCII.

You can't say that it cost you £10 to courier your résumé to the head office of Encyclopædia Britannica to apply for the position of Staff Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy and old-fashioned, but it is traditional English.)

Hell, you can't even write in *American*: you can't say that the recipe for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

ASCII truly is a blight on the world, and the sooner it fades into obscurity, like EBCDIC, the better.

Even if everyone did change to speak ASCII, you still have all the historical records and documents and files to deal with. Encodings are not going away.

--
Steven
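The claim that such a sentence cannot be written in ASCII is easy to check. A minimal sketch, assuming Python 3, with a shortened sample sentence made up for illustration:

```python
# Shortened, hypothetical version of the sentence: "It cost £10 to courier my résumé."
sentence = "It cost \u00a310 to courier my r\u00e9sum\u00e9."

try:
    sentence.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    first_bad = sentence[exc.start]  # first character ASCII cannot represent

# Forcing the text into ASCII loses information: each offending
# character is replaced with a question mark.
lossy = sentence.encode("ascii", errors="replace")

print(ascii_ok, first_bad, lossy)
```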
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-02-12 17:08 +1100 |
| Message-ID | <mailman.5717.1329026906.27778.python-list@python.org> |
| In reply to | #20250 |
On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator.

True, but if it cost you $10 (or 10 GBP) to courier your curriculum vitae to the head office of Encyclopaedia Britannica to become Staff Coordinator, then you'd be fine. And if it cost you $10 to post your work summary to Britannica's administration to apply for this Staff Coordinator position, you could say it without 'e' too. Doesn't mean you don't need Unicode!

ChrisA
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-02-12 10:48 -0500 |
| Message-ID | <roy-74351C.10483512022012@news.panix.com> |
| In reply to | #20250 |
In article <4f375347$0$29986$c3e8da3$5496439d@news.astraweb.com>, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> ASCII truly is a blight on the world, and the sooner it fades into
> obscurity, like EBCDIC, the better.

That's a fair statement, but it's also fair to say that at the time it came out (49 years ago!) it was a revolutionary improvement on the extant state of affairs (every manufacturer inventing their own code, and often different codes for different machines). Given the cost of both computer memory and CPU cycles at the time, sticking to a 7-bit code (the 8th bit was for parity) was a necessary evil.

As Steven D'Aprano pointed out, it was missing some commonly used US symbols such as ¢ or ©. This was a small price to pay for the simplicity ASCII afforded. It wasn't a bad encoding. It was a very good encoding. But the world has moved on and computing hardware has become cheap enough that supporting richer encodings and character sets is realistic.

And, before people complain about the character set being US-Centric, keep in mind that the A in ASCII stands for American, and it was published by ANSI (whose A also stands for American). I'm not trying to wave the flag here, just pointing out that it was never intended to be anything other than a national character set.

Part of the complexity of Unicode is that when people switch from working with ASCII to working with Unicode, they're really having to master two distinct things at the same time (and often conflate them into a single confusing mess). One is the Unicode character set. The other is a specific encoding (UTF-8, UTF-16, etc). Not to mention silly things like BOM (Byte Order Mark).

I expect that some day, storage costs will become so cheap that we'll all just be using UTF-32, and programmers of the day will wonder how their poor parents and grandparents ever managed in a world where nobody quite knew what you meant when you asked, "how long is that string?".
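The distinction between the character set and a specific encoding, and the "how long is that string?" question, can be illustrated with a short sketch. It assumes Python 3 and uses an arbitrary sample word.

```python
import codecs

# "résumé" written with escapes so the é is the precomposed U+00E9.
text = "r\u00e9sum\u00e9"

# Length in Unicode code points: independent of any encoding.
code_points = len(text)

# Length in bytes: depends entirely on the encoding chosen.
sizes = {
    enc: len(text.encode(enc))
    for enc in ("utf-8", "utf-16-le", "utf-32-le")
}

# The generic "utf-16" codec prepends a Byte Order Mark in native order.
with_bom = text.encode("utf-16")
has_bom = with_bom.startswith(codecs.BOM_UTF16)

print(code_points, sizes, len(with_bom), has_bom)
```

Six code points occupy 8, 12, or 24 bytes depending on the encoding, plus two more when a UTF-16 BOM is added.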
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2012-02-12 11:47 -0500 |
| Message-ID | <mailman.5730.1329065268.27778.python-list@python.org> |
| In reply to | #20273 |
On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:
>As Steven D'Aprano pointed out, it was missing some commonly used US
>symbols such as ¢ or ©. This was a small price to pay for the
>simplicity ASCII afforded. It wasn't a bad encoding. It was a very good
>encoding. But the world has moved on and computing hardware has become
>cheap enough that supporting richer encodings and character sets is
>realistic.
>
Any volunteers to create an Extended Baudot... Instead of "letter
shift" and "number shift" we could have a generic "encoding shift" which
uses the following characters to identify which 7-bit subset of Unicode
is to be represented <G>
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-02-12 12:11 -0500 |
| Message-ID | <roy-EBA03D.12114612022012@news.panix.com> |
| In reply to | #20278 |
In article <mailman.5730.1329065268.27778.python-list@python.org>, Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
> On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:
>
> >As Steven D'Aprano pointed out, it was missing some commonly used US
> >symbols such as ¢ or ©.

That's interesting. When I wrote that, it showed on my screen as a cent symbol and a copyright symbol. What I see in your response is an upper case "A" with a hat accent (circumflex?) over it followed by a cent symbol, and likewise an upper case "A" with a hat accent over it followed by copyright symbol.

Oh, for the days of ASCII again :-)

Not to mention, of course, that I wrote <colon><dash><close-paren>, but I fully expect some of you will be reading this with absurd clients which turn that into some kind of smiley-face image.

> Any volunteers to create an Extended Baudot... Instead of "letter
> shift" and "number shift" we could have a generic "encoding shift" which
> uses the following characters to identify which 7-bit subset of Unicode
> is to be represented <G>

I think that's called UTF-8.
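The "A with a hat followed by a cent symbol" effect is what appears when UTF-8 bytes are decoded as Latin-1. Whether that is exactly what happened between these two news clients is an assumption, but the symptom can be reproduced with a minimal Python 3 sketch:

```python
# Cent sign and copyright sign, as originally typed.
original = "\u00a2 \u00a9"

# Sent over the wire as UTF-8: each symbol becomes two bytes (0xC2 0xA2, 0xC2 0xA9).
wire = original.encode("utf-8")

# A reader that assumes Latin-1 shows one character per byte,
# so every symbol gains a leading "Â" (U+00C2).
garbled = wire.decode("latin-1")

# Reversing the wrong decode step recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

print(wire, garbled, repaired)
```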
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-02-12 22:49 +0000 |
| Message-ID | <4f3841e4$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #20282 |
On Sun, 12 Feb 2012 12:11:46 -0500, Roy Smith wrote:
> In article <mailman.5730.1329065268.27778.python-list@python.org>,
> Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
>
>> On Sun, 12 Feb 2012 10:48:36 -0500, Roy Smith <roy@panix.com> wrote:
>>
>> >As Steven D'Aprano pointed out, it was missing some commonly used US
>> >symbols such as ¢ or ©.
>
> That's interesting. When I wrote that, it showed on my screen as a cent
> symbol and a copyright symbol. What I see in your response is an upper
> case "A" with a hat accent (circumflex?) over it followed by a cent
> symbol, and likewise an upper case "A" with a hat accent over it
> followed by copyright symbol.

Somebody's mail or news reader is either ignoring the message's encoding line, or not inserting an encoding line. Either way, that's a bug.

> Oh, for the days of ASCII again :-)

I look forward to the day, probably around 2525, when everybody uses UTF-32 always.

--
Steven
| From | Dan Sommers <dan@tombstonezero.net> |
|---|---|
| Date | 2012-02-12 15:55 +0000 |
| Message-ID | <mailman.5727.1329062134.27778.python-list@python.org> |
| In reply to | #20250 |
On Sun, 12 Feb 2012 17:08:24 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 4:51 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> You can't say that it cost you £10 to courier your résumé to the head
>> office of Encyclopædia Britannica to apply for the position of Staff
>> Coördinator.
>
> True, but if it cost you $10 (or 10 GBP) to courier your curriculum
> vitae to the head office of Encyclopaedia Britannica to become Staff
> Coordinator, then you'd be fine. And if it cost you $10 to post your
> work summary to Britannica's administration to apply for this Staff
> Coordinator position, you could say it without 'e' too. Doesn't mean you
> don't need Unicode!

Back in the late 1970's, the economy and the outlook in the USA sucked, and the following joke made the rounds:

Mr. Smith: Good morning, Mr. Jones. How are you?
Mr. Jones: I'm fine.

(The humor is that Mr. Jones had his head so far [in the sand] that he thought that things were fine.)

American English is my first spoken language, but I know enough French, Greek, math, and other languages that I am very happy to have more than ASCII these days. I imagine that even Steven's surname should be spelled D’Aprano rather than D'Aprano.

Dan
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-02-12 08:50 -0800 |
| Message-ID | <e7f457b3-7d49-4c95-bd95-e0f27fa66137@s8g2000pbj.googlegroups.com> |
| In reply to | #20250 |
On Feb 12, 10:51 am, Steven D'Aprano <steve +comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > Everything that displays text to a human needs to translate bytes into
> > glyphs, and the usual way to do this conceptually is to go via
> > characters. Pretending that it's all the same thing really means
> > pretending that one byte represents one character and that each
> > character is depicted by one glyph. And that's doomed to failure, unless
> > everyone speaks English with no foreign symbols - so, no mathematical
> > notations.
>
> Pardon me, but you can't even write *English* in ASCII.
>
> You can't say that it cost you £10 to courier your résumé to the head
> office of Encyclopædia Britannica to apply for the position of Staff
> Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> and old-fashioned, but it is traditional English.)
>
> Hell, you can't even write in *American*: you can't say that the recipe
> for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.

[Quite OT but...] How do you type all this?
[Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-02-12 12:21 -0500 |
| Message-ID | <roy-8C0B2D.12212712022012@news.panix.com> |
| In reply to | #20281 |
In article <e7f457b3-7d49-4c95-bd95-e0f27fa66137@s8g2000pbj.googlegroups.com>, rusi <rustompmody@gmail.com> wrote:
> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > Everything that displays text to a human needs to translate bytes into
> > > glyphs, and the usual way to do this conceptually is to go via
> > > characters. Pretending that it's all the same thing really means
> > > pretending that one byte represents one character and that each
> > > character is depicted by one glyph. And that's doomed to failure, unless
> > > everyone speaks English with no foreign symbols - so, no mathematical
> > > notations.
> >
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britannica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger is © 2012 WobblyBurgerWorld Inc.
>
> [Quite OT but...] How do you type all this?
> [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]

What I do (on a Mac) is open the Keyboard Viewer thingie and try various combinations of shift-control-option-command-function until the thing I'm looking for shows up on a keycap. A few of them I've got memorized (for example, option-8 gets you a bullet •). I would imagine if you commonly type in a language other than English, you would quickly memorize the ones you use a lot.

Or, open the Character Viewer thingie and either hunt around the various drill-down menus (North American Scripts / Canadian Aboriginal Syllabics, for example) or type in some guess at the official unicode name into the search box.
| From | Nick Dokos <nicholas.dokos@hp.com> |
|---|---|
| Date | 2012-02-12 12:36 -0500 |
| Message-ID | <mailman.5733.1329068559.27778.python-list@python.org> |
| In reply to | #20281 |
rusi <rustompmody@gmail.com> wrote:
> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > Everything that displays text to a human needs to translate bytes into
> > > glyphs, and the usual way to do this conceptually is to go via
> > > characters. Pretending that it's all the same thing really means
> > > pretending that one byte represents one character and that each
> > > character is depicted by one glyph. And that's doomed to failure, unless
> > > everyone speaks English with no foreign symbols - so, no mathematical
> > > notations.
> >
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britanica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> [Quite OT but...] How do you type all this?
> [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]
[Emacs specific]
Many different ways of course, but in emacs, you can select e.g. the TeX input method
with C-x RET C-\ TeX RET,
which handles all of the above symbols with the exception of the cent
symbol (or maybe I missed it). You type the thing in the first column and you
get the thing in the second column:
\pounds £
\'e é
\ae æ
\"o ö
^{TM} ™
\copyright ©
I gave up on the cent symbol and used ucs-insert (C-x 8 RET), which allows you to type
a name, in this case CENT SIGN, to get ¢.
Nick
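Since this is a Python list: the by-name lookup that ucs-insert does has a rough counterpart in the standard library. This is a small sketch, not anything Nick posted; it assumes only the `unicodedata` module and the `\N{...}` string escape, both of which take the same official character names.

```python
import unicodedata

# Look a character up by its official name, much like C-x 8 RET CENT SIGN.
cent = unicodedata.lookup("CENT SIGN")

# The same name also works as an escape inside a string literal.
same = "\N{CENT SIGN}"
assert cent == same

# Going the other way: ask what a pasted character is called.
for ch in "\u00a3\u00e9\u00e6\u00f6\u2122\u00a9":
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

`unicodedata.lookup` raises `KeyError` for a name it does not know, so a misspelled guess fails loudly rather than inserting the wrong character.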
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-02-12 19:09 -0800 |
| Subject | entering unicode (was Python usage numbers) |
| Message-ID | <d64b98fa-5821-4b9f-947b-25c7d4a2c6e9@og8g2000pbb.googlegroups.com> |
| In reply to | #20284 |
On Feb 12, 10:36 pm, Nick Dokos <nicholas.do...@hp.com> wrote:
> rusi <rustompm...@gmail.com> wrote:
> > On Feb 12, 10:51 am, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> > > On Sun, 12 Feb 2012 15:38:37 +1100, Chris Angelico wrote:
> > > > Everything that displays text to a human needs to translate bytes into
> > > > glyphs, and the usual way to do this conceptually is to go via
> > > > characters. Pretending that it's all the same thing really means
> > > > pretending that one byte represents one character and that each
> > > > character is depicted by one glyph. And that's doomed to failure, unless
> > > > everyone speaks English with no foreign symbols - so, no mathematical
> > > > notations.
>
> > > Pardon me, but you can't even write *English* in ASCII.
>
> > > You can't say that it cost you £10 to courier your résumé to the head
> > > office of Encyclopædia Britanica to apply for the position of Staff
> > > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > > and old-fashioned, but it is traditional English.)
>
> > > Hell, you can't even write in *American*: you can't say that the recipe
> > > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> > [Quite OT but...] How do you type all this?
> > [Note: I grew up on APL so unlike Rick I am genuinely asking :-) ]
>
> [Emacs speficic]
>
> Many different ways of course, but in emacs, you can select e.g. the TeX input method
> with C-x RET C-\ TeX RET.
> which does all of the above symbols with the exception of the cent
> symbol (or maybe I missed it) - you type the thing in the first column and you
> get the thing in the second column
>
> \pounds £
> \'e é
> \ae æ
> \"o ö
> ^{TM} ™
> \copyright ©
>
> I gave up on the cent symbol and used ucs-insert (C-x 8 RET) which allows you to type
> a name, in this case CENT SIGN to get ¢.
>
> Nick
[OT warning]
I asked this on the emacs list.
There was no response there, and the responses here are more helpful, so
I'm asking here.
My question there was emacs-specific. If there is some other app,
that's fine.
I have a bunch of Sanskrit (Devanagari) to type. It would be
easiest for me if I could keep the English (roman) as well as the
Sanskrit (Devanagari).
For example using the devanagari-itrans input method I can write the
gayatri mantra using
OM bhUrbhuvaH suvaH
tatsaviturvarenyam
bhargo devasya dhImahi
dhiyo yonaH prachodayAt
and emacs produces *on the fly* (i.e. I can't see/edit the above)
ॐ भूर्भुवः सुवः
तत्सवितुर्वरेण्यम्
भर्गो देवस्य धीमहि
धियो योनः प्रचोदयात्
Can I do it in batch mode? I.e. write the first in a file and run some
command on it to produce the second?
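One way to picture the batch conversion is a small Python script. What follows is only a sketch under loud assumptions: the mapping table is a tiny hand-picked subset of ITRANS-style spellings (just enough for the verse above), the function name `transliterate` is made up, and it is not the algorithm emacs uses. It looks characters up by their official Unicode names, tracks whether the last consonant still lacks a vowel, and inserts a virama when two consonants meet.

```python
import unicodedata

def dev(name):
    """Look up a Devanagari character by the tail of its Unicode name."""
    return unicodedata.lookup("DEVANAGARI " + name)

# Assumed, partial table: roman spelling -> Unicode name suffix.
CONSONANTS = {k: dev("LETTER " + v) for k, v in {
    "k": "KA", "g": "GA", "ch": "CA", "j": "JA", "t": "TA", "d": "DA",
    "dh": "DHA", "n": "NA", "N": "NNA", "p": "PA", "b": "BA", "bh": "BHA",
    "m": "MA", "y": "YA", "r": "RA", "l": "LA", "v": "VA", "s": "SA",
    "h": "HA",
}.items()}
VOWELS = {"a": "A", "A": "AA", "i": "I", "I": "II",
          "u": "U", "U": "UU", "e": "E", "o": "O"}
OTHER = {"H": dev("SIGN VISARGA"), "M": dev("SIGN ANUSVARA"), "OM": dev("OM")}
VIRAMA = dev("SIGN VIRAMA")

# Try longer spellings first so "bh" wins over "b" and "OM" over "M".
TOKENS = sorted([*CONSONANTS, *VOWELS, *OTHER], key=len, reverse=True)

def transliterate(text):
    out, i, pending = [], 0, False  # pending: consonant awaiting its vowel
    while i < len(text):
        tok = next((t for t in TOKENS if text.startswith(t, i)), text[i])
        i += len(tok)
        if tok in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # join into a conjunct
            out.append(CONSONANTS[tok])
            pending = True
        elif tok in VOWELS:
            if pending:
                if tok != "a":  # "a" is the inherent vowel: no sign needed
                    out.append(dev("VOWEL SIGN " + VOWELS[tok]))
            else:
                out.append(dev("LETTER " + VOWELS[tok]))
            pending = False
        else:
            if pending:
                out.append(VIRAMA)  # word-final bare consonant
            out.append(OTHER.get(tok, tok))  # unknown text passes through
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)

for line in ["OM bhUrbhuvaH suvaH", "bhargo devasya dhImahi"]:
    print(transliterate(line))
```

For real batch use, the loop at the end would read lines from a file instead of a list. Note that in this table lowercase `n` is the dental NA, while the Devanagari shown above has the retroflex NNA in the second line, which this sketch would expect to be typed as `N`.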
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-02-19 03:44 +0000 |
| Subject | Re: entering unicode (was Python usage numbers) |
| Message-ID | <4f407007$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #20322 |
On Sun, 12 Feb 2012 19:09:32 -0800, rusi wrote:
> I have some bunch of sanskrit (devanagari) to type. It would be easiest
> for me if I could have the English (roman) as well as the sanskrit
> (devanagari).
>
> For example using the devanagari-itrans input method I can write the
> gayatri mantra using
>
> OM bhUrbhuvaH suvaH
> tatsaviturvarenyam
> bhargo devasya dhImahi
> dhiyo yonaH prachodayAt
>
> and emacs produces *on the fly* (ie I cant see/edit the above)
>
> ॐ भूर्भुवः सुवः तत्सवितुर्वरेण्यम् भर्गो देवस्य धीमहि धियो योनः
> प्रचोदयात्
>
> Can I do it in batch mode? ie write the first in a file and run some
> command on it to produce the second?
What is the devanagari-itrans input method? Do you actually type the
characters into a terminal?
If so, you should be able to type the first into a file, copy it, then
paste it into the input buffer to be processed.
--
Steven
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2012-02-19 00:52 -0800 |
| Subject | Re: entering unicode (was Python usage numbers) |
| Message-ID | <6c4323a5-79bb-4105-8581-a2bef19fc39b@4g2000pbz.googlegroups.com> |
| In reply to | #20603 |
On Feb 19, 8:44 am, Steven D'Aprano <steve
+comp.lang.pyt...@pearwood.info> wrote:
> On Sun, 12 Feb 2012 19:09:32 -0800, rusi wrote:
> > I have some bunch of sanskrit (devanagari) to type. It would be easiest
> > for me if I could have the English (roman) as well as the sanskrit
> > (devanagari).
>
> > For example using the devanagari-itrans input method I can write the
> > gayatri mantra using
>
> > OM bhUrbhuvaH suvaH
> > tatsaviturvarenyam
> > bhargo devasya dhImahi
> > dhiyo yonaH prachodayAt
>
> > and emacs produces *on the fly* (ie I cant see/edit the above)
>
> > ॐ भूर्भुवः सुवः तत्सवितुर्वरेण्यम् भर्गो
> > देवस्य धीमहि धियो योनः
> > > प्रचोदयात्
>
> > Can I do it in batch mode? ie write the first in a file and run some
> > command on it to produce the second?
>
> What is the devanagari-itrans input method? Do you actually type the
> characters into a terminal?
It's one of the dozens (hundreds actually) of input methods that emacs
has. In emacs, run M-x set-input-method and then give devanagari-itrans.
Its details are described (somewhat poorly) here:
http://en.wikipedia.org/wiki/ITRANS
>
> If so, you should be able to type the first into a file, copy it, then
> paste it into the input buffer to be processed.
I've got it working in emacs with some glitches, but it will do for now:
http://groups.google.com/group/gnu.emacs.help/browse_thread/thread/bfa6b05ce565d96d#
>
> --
> Steven
Thanks [Actually thanks-squared: one for looking, two for backing up
this thread to something more useful than ... :-) ]
Coincidentally, along with your response, I've got another mail about
another unicode-related interest of mine: apl under linux. So far I was
the sole (known) user of this:
http://www.emacswiki.org/emacs/AplInDebian
I hear the number of users has just doubled :-)
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2012-02-13 09:43 +1100 |
| Subject | How do you Unicode proponents type your non-ASCII characters? (was: Python usage numbers) |
| Message-ID | <87d39j3c5z.fsf_-_@benfinney.id.au> |
| In reply to | #20281 |
rusi <rustompmody@gmail.com> writes:
> On Feb 12, 10:51 am, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > Pardon me, but you can't even write *English* in ASCII.
> >
> > You can't say that it cost you £10 to courier your résumé to the head
> > office of Encyclopædia Britanica to apply for the position of Staff
> > Coördinator. (Admittedly, the umlaut on the second "o" looks a bit stuffy
> > and old-fashioned, but it is traditional English.)
> >
> > Hell, you can't even write in *American*: you can't say that the recipe
> > for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> [Quite OT but...] How do you type all this?
In GNU+Linux, I run the IBus daemon to manage different keyboard input
methods across all my applications consistently. That makes hundreds of
language-specific input methods available, and also many that are not
language-specific.
It's useful if I want to 英語の書面を書き中 type a passage of Japanese
with the ‘anthy’ input method, or likewise for any of the other
available language-specific input methods.
I normally have IBus presenting the ‘rfc1345’ input method. That makes
just about all keys input the corresponding character just as if no
input method were active. But when I type ‘&’ followed by a two- or
three-key sequence, it inputs the corresponding character from the
RFC 1345 mnemonics table:
& → &
P d → £
e ' → é
a e → æ
o : → ö
C t → ¢
T M → ™
C o → ©
" 6 → “
" 9 → ”
…
Those same characters are also available with the ‘latex’ input method,
if I'm familiar with LaTeX character entity names. (I'm not.)
--
\ “If [a technology company] has confidence in their future |
`\ ability to innovate, the importance they place on protecting |
_o__) their past innovations really should decline.” —Gary Barnett |
Ben Finney
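The '&'-prefixed mnemonics in Ben's table can be mimicked in a few lines of Python, for instance to expand them in text typed on a plain ASCII keyboard. This is a rough sketch, not how IBus works: the function name `expand` is hypothetical, the table holds only the two-character mnemonics listed in the post (RFC 1345 defines many more), and a real input method is stateful in ways a single regular expression is not.

```python
import re

# Only the two-character mnemonics from the post above.
MNEMONICS = {
    "Pd": "\N{POUND SIGN}",
    "e'": "\N{LATIN SMALL LETTER E WITH ACUTE}",
    "ae": "\N{LATIN SMALL LETTER AE}",
    "o:": "\N{LATIN SMALL LETTER O WITH DIAERESIS}",
    "Ct": "\N{CENT SIGN}",
    "TM": "\N{TRADE MARK SIGN}",
    "Co": "\N{COPYRIGHT SIGN}",
    '"6': "\N{LEFT DOUBLE QUOTATION MARK}",
    '"9': "\N{RIGHT DOUBLE QUOTATION MARK}",
}

def expand(text):
    """Replace '&' plus a known two-character mnemonic; leave the rest alone."""
    return re.sub(
        r"&(..)",
        lambda m: MNEMONICS.get(m.group(1), m.group(0)),
        text,
    )

print(expand("&Pd10 to courier my r&e'sum&e', &Co 2012"))
```

Unknown sequences such as the one in "R&D dept" are passed through unchanged, which seems the least surprising behaviour for a sketch like this.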
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-02-12 22:56 +0000 |
| Message-ID | <4f3843a0$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #20281 |
On Sun, 12 Feb 2012 08:50:28 -0800, rusi wrote:
>> You can't say that it cost you £10 to courier your résumé to the head
>> office of Encyclopædia Britanica to apply for the position of Staff
>> Coördinator. (Admittedly, the umlaut on the second "o" looks a bit
>> stuffy and old-fashioned, but it is traditional English.)
>>
>> Hell, you can't even write in *American*: you can't say that the recipe
>> for the 20¢ WobblyBurger™ is © 2012 WobblyBurgerWorld Inc.
>
> [Quite OT but...] How do you type all this? [Note: I grew up on APL so
> unlike Rick I am genuinely asking :-) ]
In my case, I used the KDE application "KCharSelect". I manually hunt
through the tables for the character I want (which sucks), click on the
characters I want, and copy and paste them into my editor.
Back in Ancient Days when I ran Mac OS 6, I had memorised many keyboard
shortcuts for these things. Option-4 was the pound sign, I believe, and
Option-Shift-4 the cent sign. Or perhaps the other way around?
--
Steven
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-02-12 10:13 -0500 |
| Message-ID | <roy-98B81C.10130712022012@news.panix.com> |
| In reply to | #20248 |
In article <mailman.5715.1329021524.27778.python-list@python.org>,
Chris Angelico <rosuav@gmail.com> wrote:
> On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
> <rantingrickjohnson@gmail.com> wrote:
> > On Feb 11, 8:23 pm, Steven D'Aprano <steve
> > +comp.lang.pyt...@pearwood.info> wrote:
> >> "I have a file containing text. I can open it in an editor and see it's
> >> nearly all ASCII text, except for a few weird and bizarre characters like
> >> Ł Š ą or ö. In Python 2, I can read that file fine. In Python 3 I get an
> >> error. What should I do that requires no thought?"
> >>
> >> Obvious answers:
> >
> > the most obvious answer would be to read the file WITHOUT worrying
> > about asinine encoding.
>
> What this statement misunderstands, though, is that ASCII is itself an
> encoding. Files contain bytes, and it's only what's external to those
> bytes that gives them meaning.
Exactly. <soapbox class="wise-old-geezer">. ASCII was so successful at
becoming a universal standard which lasted for decades, people who grew
up with it don't realize there was once any other way. Not just EBCDIC,
but also SIXBIT, RAD-50, tilt/rotate, packed card records, and so on.
Transcoding was a way of life, and if you didn't know what you were
starting with and aiming for, it was hopeless. Kind of like now where we
are again with Unicode. </soapbox>
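The point that bytes only mean something once you name an encoding is easy to see in Python 3. The snippet below is an illustrative sketch rather than anything from the thread; it uses codecs that ship with CPython (`utf-8`, `latin-1`, `ascii`, and `cp037` as one EBCDIC variant), and the file name in the final comment is hypothetical.

```python
pound = "\N{POUND SIGN}"

# One character, two different byte sequences.
utf8 = pound.encode("utf-8")      # two bytes
latin1 = pound.encode("latin-1")  # one byte
print(utf8, latin1)

# Decoding with the wrong codec either fails loudly...
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# ...or silently produces mojibake.
print(utf8.decode("latin-1"))

# ASCII is just one more codec; an EBCDIC codec maps "A" to another byte.
print("A".encode("ascii"), "A".encode("cp037"))

# For the file in the quoted question, the fix is to state its encoding:
# with open("mixed.txt", encoding="latin-1") as f:  # hypothetical file
#     text = f.read()
```

Which encoding to pass to `open` is exactly the part that cannot be guessed from the bytes alone; it has to come from outside the file.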