Groups > comp.lang.python > #71389 > unrolled thread
| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2014-05-12 16:19 +0100 |
| Last post | 2014-05-14 09:56 -0600 |
| Articles | 20 on this page of 72 — 25 participants |
Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-12 16:19 +0100
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-12 17:47 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 12:31 -0600
Re: Everything you did not want to know about Unicode in Python 3 MRAB <python@mrabarnett.plus.com> - 2014-05-12 20:42 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 16:16 -0600
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 09:42 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 01:18 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 11:39 +1000
Re: Everything you did not want to know about Unicode in Python 3 alex23 <wuwei23@gmail.com> - 2014-05-13 16:25 +1000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:32 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-12 20:58 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 03:33 +0100
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 22:10 -0700
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 00:39 -0500
Re: Everything you did not want to know about Unicode in Python 3 Gene Heskett <gheskett@wdtv.com> - 2014-05-13 01:45 -0400
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-13 16:03 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 23:09 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Roy Smith <roy@panix.com> - 2014-05-13 07:20 -0400
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:39 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:43 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-13 07:30 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 00:36 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:51 +0000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 14:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 15:21 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 23:53 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 10:08 +1000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 22:52 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-16 14:46 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 01:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-17 07:19 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-17 09:35 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 10:29 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 14:15 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:01 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 09:57 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 12:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:07 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-17 19:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-17 21:05 +1000
[OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) Ben Finney <ben@benfinney.id.au> - 2014-05-14 11:01 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:07 -0600
Re: Everything you did not want to know about Unicode in Python 3 Dave Angel <davea@davea.name> - 2014-05-13 21:56 -0400
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:49 +0000
Re: Everything you did not want to know about Unicode in Python 3 gregor <gregor@ediwo.com> - 2014-05-13 09:27 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 10:08 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 11:25 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 18:38 +1000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:06 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 19:29 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve@pearwood.info> - 2014-05-13 09:44 +0000
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:38 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:46 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:59 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 14:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:37 +1000
Re: Everything you did not want to know about Unicode in Python 3 Skip Montanaro <skip@pobox.com> - 2014-05-13 09:02 -0500
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-14 00:00 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 11:19 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-13 10:08 -0600
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-14 00:10 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2014-05-13 17:53 -0700
Re: Everything you did not want to know about Unicode in Python 3 Terry Reedy <tjreedy@udel.edu> - 2014-05-14 17:47 -0400
Re: Everything you did not want to know about Unicode in Python 3 Antoine Pitrou <antoine@python.org> - 2014-05-16 11:50 +0000
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-16 06:20 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:38 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robin Becker <robin@reportlab.com> - 2014-05-14 16:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:56 -0600
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-05-12 16:19 +0100 |
| Subject | Everything you did not want to know about Unicode in Python 3 |
| Message-ID | <mailman.9915.1399907977.18130.python-list@python.org> |
This was *NOT* written by our resident unicode expert
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/

Posted as I thought it would make a rather pleasant change from interminable threads about names vs values vs variables vs objects.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
| From | alister <alister.nospam.ware@ntlworld.com> |
|---|---|
| Date | 2014-05-12 17:47 +0000 |
| Message-ID | <8P7cv.78617$Sp6.8377@fx15.am4> |
| In reply to | #71389 |
On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:

> This was *NOT* written by our resident unicode expert
> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>
> Posted as I thought it would make a rather pleasant change from
> interminable threads about names vs values vs variables vs objects.

Surely those example programs are not the Pythonic way to do things, or am I missing something?

If those code samples are anything to go by, this guy makes JMF look sensible.

--
The Heineken Uncertainty Principle: You can never be sure how many beers you had last night.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-05-12 12:31 -0600 |
| Message-ID | <mailman.9918.1399919514.18130.python-list@python.org> |
| In reply to | #71397 |
On Mon, May 12, 2014 at 11:47 AM, alister <alister.nospam.ware@ntlworld.com> wrote:
> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
>
> Surely those example programs are not the Pythonic way to do things or
> am I missing something?

The _is_binary_reader and _is_binary_writer functions look like they could be simplified by calling isinstance on the io object itself against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than doing those odd 0-length reads and writes. And then perhaps those exception-swallowing try-excepts wouldn't be necessary. But perhaps there's a non-obvious reason why it's written the way it is.

And there appears to be a bug where everything *except* the filename '-' is treated as stdin, so the script probably hasn't been tested at all.

> if those code samples are anything to go by this guy makes JMF look
> sensible.

This is an ad hominem. Just because his code sucks doesn't mean he's wrong about the state of Unicode and UNIX in Python 3.
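The isinstance check suggested above might look like the sketch below. It is only an illustration of the idea, not the code from the blog post: the helper names `is_binary_stream` and `is_text_stream` are made up for this example.

```python
import io


def is_binary_stream(stream):
    """Return True if the stream is a bytes-oriented io object.

    Hypothetical helper: it covers raw and buffered binary streams,
    the two binary base classes named in the post above.
    """
    return isinstance(stream, (io.RawIOBase, io.BufferedIOBase))


def is_text_stream(stream):
    """Return True if the stream is a str-oriented io object."""
    return isinstance(stream, io.TextIOBase)


# In-memory streams stand in for real files here.
print(is_text_stream(io.StringIO()))    # True
print(is_binary_stream(io.BytesIO()))   # True
print(is_binary_stream(io.StringIO()))  # False
```

One limitation of this approach: a duck-typed file-like object that does not inherit from any of the io base classes is classified as neither text nor binary, so code that must cope with arbitrary replacements for sys.stdin or sys.stdout would still need a fallback.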
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2014-05-12 20:42 +0100 |
| Message-ID | <mailman.9919.1399923932.18130.python-list@python.org> |
| In reply to | #71397 |
On 2014-05-12 19:31, Ian Kelly wrote:
> On Mon, May 12, 2014 at 11:47 AM, alister
> <alister.nospam.ware@ntlworld.com> wrote:
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the Pythonic way to do things or
>> am I missing something?
>
> The _is_binary_reader and _is_binary_writer functions look like they
> could be simplified by calling isinstance on the io object itself
> against io.TextIOBase, io.BufferedIOBase or io.RawIOBase, rather than
> doing those odd 0-length reads and writes. And then perhaps those
> exception-swallowing try-excepts wouldn't be necessary. But perhaps
> there's a non-obvious reason why it's written the way it is.

How about checking sys.stdin.mode and sys.stdout.mode?

> And there appears to be a bug where everything *except* the filename
> '-' is treated as stdin, so the script probably hasn't been tested at
> all.
>
>> if those code samples are anything to go by this guy makes JMF look
>> sensible.
>
> This is an ad hominem. Just because his code sucks doesn't mean he's
> wrong about the state of Unicode and UNIX in Python 3.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-05-12 16:16 -0600 |
| Message-ID | <mailman.9923.1399936290.18130.python-list@python.org> |
| In reply to | #71397 |
On Mon, May 12, 2014 at 1:42 PM, MRAB <python@mrabarnett.plus.com> wrote:
> How about checking sys.stdin.mode and sys.stdout.mode?

Seems to work, but I notice that the docs only define the mode attribute for the FileIO class, which sys.stdin and sys.stdout are not instances of.
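A rough sketch of the mode check, written defensively because of the caveat above. The helper name `mode_says_binary` is invented for this example, and the choice to return None when no usable mode attribute exists is an assumption of the sketch, not documented behaviour of any stream class.

```python
import sys


def mode_says_binary(stream):
    """Guess from stream.mode whether a stream is binary.

    Returns True or False when a string mode is available, and None
    when the object has no usable mode attribute (hypothetical
    convention chosen for this sketch).
    """
    mode = getattr(stream, "mode", None)
    if not isinstance(mode, str):
        return None
    return "b" in mode


# What the check reports for the standard streams depends on how the
# interpreter was started and on whether they have been replaced.
print(mode_says_binary(sys.stdin))
print(mode_says_binary(sys.stdout))
```

Because the result can be None, callers still need a second strategy for objects without a mode, which is one reason the attribute alone may not be enough.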
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 09:42 +1000 |
| Message-ID | <mailman.9925.1399938136.18130.python-list@python.org> |
| In reply to | #71397 |
On Tue, May 13, 2014 at 4:31 AM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Just because his code sucks doesn't mean he's
> wrong about the state of Unicode and UNIX in Python 3.

Uhm... I think wrongness of code is generally fairly indicative of wrongness of thinking :) If I write a rant about how Python's list type sucks and it turns out my code is using it like a cons cell and never putting more than two elements into a list, then you would accurately conclude that I'm wrong about the state of data type support in Python.

I don't have a problem with someone coming to the list here with misconceptions. That's what discussions are for. But rants like that, on blogs, I quickly get weary of reading. The tone is always "Look what's so wrong", not inviting dialogue, and I can't be bothered digging into the details to compose a full response. Chances are the author's (a) not looking at 3.4 and what's happened to improve things (and certainly not 3.5 and what's going to happen), and (b) not listening to responses anyway.

ChrisA
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-13 01:18 +0000 |
| Message-ID | <537172eb$0$29980$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #71397 |
On Mon, 12 May 2014 17:47:48 +0000, alister wrote:

> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>
>> This was *NOT* written by our resident unicode expert
>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>
>> Posted as I thought it would make a rather pleasant change from
>> interminable threads about names vs values vs variables vs objects.
>
> Surely those example programs are not the Pythonic way to do things or
> am I missing something?

Feel free to show us your version of "cat" for Python then. Feel free to target any version you like. Don't forget to test it against files with names and content that:

- aren't valid UTF-8;

- are valid UTF-8, but not valid in the local encoding.

> if those code samples are anything to go by this guy makes JMF look
> sensible.

Armin Ronacher is an extremely experienced and knowledgeable Python developer, and a Python core developer. He might be wrong, but he's not *obviously* wrong.

Unicode is hard, not because Unicode is hard, but because of legacy problems. I can create a file on a machine that uses ISO-8859-7 for the file name, put Shift-JIS encoded text inside it, transfer it to a machine that uses Windows-1251 as the file system encoding, then SSH into that machine from a system using Big5, and try to make sense of it. If everybody used UTF-8 any time data touched a disk or network, we'd be laughing. It would all be so simple.

Reading Armin's post, I think that all that is needed to simplify his Python 3 version is:

- have a bytes version of sys.argv (bargv? argvb?) and read the file names from that;

- have a simple way to write bytes to stdout and stderr.

Most programs won't need either of those, but file system utilities will.

--
Steven D'Aprano
http://import-that.dreamwidth.org/
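For comparison, the two ingredients listed above can be approximated in current Python 3 with os.fsencode (to recover the bytes behind each sys.argv entry) and sys.stdout.buffer (to write bytes). The sketch below is one possible bytes-only "cat", not Armin's script and not a proposed standard API; the function names `cat` and `main` are chosen for the example, and it assumes sys.stdout has not been replaced by an object without a buffer attribute.

```python
import os
import shutil
import sys


def cat(filenames, out):
    """Copy each named file to the binary stream `out`, byte for byte.

    `filenames` may be bytes paths, so names that are not valid in the
    local encoding are still usable.
    """
    for name in filenames:
        with open(name, "rb") as f:
            shutil.copyfileobj(f, out)


def main(argv):
    # os.fsencode turns the decoded arguments back into bytes paths.
    names = [os.fsencode(arg) for arg in argv[1:]]
    # Assumption: sys.stdout is a regular text stream exposing the
    # underlying binary stream as .buffer.
    out = sys.stdout.buffer
    cat(names, out)
    out.flush()


# Typical use from a script: main(sys.argv)
```

No decoding of file contents ever happens, so files that are not valid UTF-8 pass through unchanged.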
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 11:39 +1000 |
| Message-ID | <mailman.9934.1399945166.18130.python-list@python.org> |
| In reply to | #71416 |
On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
> the file names from that;

argb? :)

> - have a simple way to write bytes to stdout and stderr.

I'm not sure how that goes with I/O redirection, but sure.

ChrisA
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2014-05-13 16:25 +1000 |
| Message-ID | <lksdru$p5k$1@dont-email.me> |
| In reply to | #71418 |
On 13/05/2014 11:39 AM, Chris Angelico wrote:
> On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> - have a bytes version of sys.argv (bargv? argvb?) and read
>> the file names from that;
>
> argb? :)

I tried and failed to come up with an "argy bargy" joke here so decided to go for a meta-reference instead.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 16:32 +1000 |
| Message-ID | <mailman.9941.1399962736.18130.python-list@python.org> |
| In reply to | #71438 |
On Tue, May 13, 2014 at 4:25 PM, alex23 <wuwei23@gmail.com> wrote:
> On 13/05/2014 11:39 AM, Chris Angelico wrote:
>> On Tue, May 13, 2014 at 11:18 AM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>> - have a bytes version of sys.argv (bargv? argvb?) and read
>>> the file names from that;
>>
>> argb? :)
>
> I tried and failed to come up with an "argy bargy" joke here so decided to
> go for a meta-reference instead.

I'm just waiting for someone to have need for arguments in both network byte order and host byte order. The latter, of course, would be "argh".

ChrisA
| From | Mark H Harris <harrismh777@gmail.com> |
|---|---|
| Date | 2014-05-12 20:58 -0500 |
| Message-ID | <lkru8l$t30$1@speranza.aioe.org> |
| In reply to | #71416 |
On 5/12/14 8:18 PM, Steven D'Aprano wrote:
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems.

Yes. To put a finer point on that, Unicode (which is only a specification constantly being improved upon) is harder to implement when it hasn't been on the design board from the ground up; Python in this case.

Julia has Unicode support from the ground up, and it was easier for those guys to implement (in beta release) than for the Python crew when they undertook the Unicode work that had to be done for Python3.x (just an observation).

Anytime there are legacy code issues, regression testing problems, and a host of domain issues that weren't thought through from the get-go there are going to be more problematic hurdles; not to mention bugs.

Having said that, I still think Unicode is somewhat harder than you're admitting.

marcus
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-05-13 03:33 +0100 |
| Message-ID | <mailman.9935.1399948395.18130.python-list@python.org> |
| In reply to | #71416 |
On 13/05/2014 02:18, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
>
>> On Mon, 12 May 2014 16:19:17 +0100, Mark Lawrence wrote:
>>
>>> This was *NOT* written by our resident unicode expert
>>> http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
>>>
>>> Posted as I thought it would make a rather pleasant change from
>>> interminable threads about names vs values vs variables vs objects.
>>
>> Surely those example programs are not the Pythonic way to do things or
>> am I missing something?
>
> Feel free to show us your version of "cat" for Python then. Feel free to
> target any version you like. Don't forget to test it against files with
> names and content that:
>
> - aren't valid UTF-8;
>
> - are valid UTF-8, but not valid in the local encoding.
>
>> if those code samples are anything to go by this guy makes JMF look
>> sensible.
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems. I can create a file on a machine that uses ISO-8859-7 for the
> file name, put Shift-JIS encoded text inside it, transfer it to a
> machine that uses Windows-1251 as the file system encoding, then SSH into
> that machine from a system using Big5, and try to make sense of it. If
> everybody used UTF-8 any time data touched a disk or network, we'd be
> laughing. It would all be so simple.
>
> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
> the file names from that;
>
> - have a simple way to write bytes to stdout and stderr.
>
> Most programs won't need either of those, but file system utilities will.

I think http://bugs.python.org/issue8776 and http://bugs.python.org/issue8775 are relevant but both were placed in the small round filing cabinet.

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-12 22:10 -0700 |
| Message-ID | <82899649-014a-4309-b06e-b981fc6921fa@googlegroups.com> |
| In reply to | #71416 |
On Tuesday, May 13, 2014 6:48:35 AM UTC+5:30, Steven D'Aprano wrote:
> On Mon, 12 May 2014 17:47:48 +0000, alister wrote:
> > Surely those example programs are not the Pythonic way to do things or
> > am I missing something?
>
> Feel free to show us your version of "cat" for Python then. Feel free to
> target any version you like. Don't forget to test it against files with
> names and content that:
>
> - aren't valid UTF-8;
>
> - are valid UTF-8, but not valid in the local encoding.

Thanks for a non-defensive appraisal!

> > if those code samples are anything to go by this guy makes JMF look
> > sensible.
>
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
>
> Unicode is hard, not because Unicode is hard, but because of legacy
> problems. I can create a file on a machine that uses ISO-8859-7 for the
> file name, put Shift-JIS encoded text inside it, transfer it to a
> machine that uses Windows-1251 as the file system encoding, then SSH into
> that machine from a system using Big5, and try to make sense of it. If
> everybody used UTF-8 any time data touched a disk or network, we'd be
> laughing. It would all be so simple.

I think the most helpful way forward is to accept two things:
a. Unicode is a headache
b. No-unicode is a non-option

> Reading Armin's post, I think that all that is needed to simplify his
> Python 3 version is:
>
> - have a bytes version of sys.argv (bargv? argvb?) and read
> the file names from that;
>
> - have a simple way to write bytes to stdout and stderr.
>
> Most programs won't need either of those, but file system utilities will.

About the technical merits of Armin's post and your suggestions, I've nothing to say, since I am an ignoramus on (the mechanics of) Unicode.
[Consider me an eager, early, ignorant adopter :-) ]

It is, however, good to note that Unicode is rather unique in the history not just of IT/CS but of humanity, in the sense that no one (to the best of my knowledge) has ever tried to come up with an all-encompassing umbrella for all humanity's scripts, writing systems, etc.

So hiccups and mistakes are only to be expected. The absence of these would be much more surprising!
| From | Mark H Harris <harrismh777@gmail.com> |
|---|---|
| Date | 2014-05-13 00:39 -0500 |
| Message-ID | <lksb5p$njf$1@speranza.aioe.org> |
| In reply to | #71425 |
On 5/13/14 12:10 AM, Rustom Mody wrote:
> I think the most helpful way forward is to accept two things:
> a. Unicode is a headache
> b. No-unicode is a non-option

QOTW (so far...)
| From | Gene Heskett <gheskett@wdtv.com> |
|---|---|
| Date | 2014-05-13 01:45 -0400 |
| Message-ID | <mailman.9936.1399960355.18130.python-list@python.org> |
| In reply to | #71428 |
On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine
And Gene did reply:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
>
> QOTW (so far...)

But it's early yet, only Tuesday & it's just barely started... :)

Cheers, Gene
--
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
Gene's Web page <http://geneslinuxbox.net:6309/gene>
US V Castleman, SCOTUS, Mar 2014 is grounds for Impeaching SCOTUS
| From | Ben Finney <ben@benfinney.id.au> |
|---|---|
| Date | 2014-05-13 16:03 +1000 |
| Message-ID | <mailman.9938.1399961037.18130.python-list@python.org> |
| In reply to | #71428 |
Gene Heskett <gheskett@wdtv.com> writes: > On Tuesday 13 May 2014 01:39:06 Mark H Harris did opine > > QOTW (so far...) > > But its early yet, only Tuesday & its just barely started... :) Says who? For some of us, Tuesday is approaching sunset. (It's always a good day to remind people that the rest of the world exists.) -- \ “Reality must take precedence over public relations, for nature | `\ cannot be fooled.” —Richard P. Feynman, _Rogers' Commission | _o__) Report into the Challenger Crash_, 1986-06 | Ben Finney
| From | Rustom Mody <rustompmody@gmail.com> |
|---|---|
| Date | 2014-05-12 23:09 -0700 |
| Message-ID | <72d4f4e7-1bbd-4ceb-8e7f-d8ca18e1c1b2@googlegroups.com> |
| In reply to | #71428 |
On Tuesday, May 13, 2014 11:09:06 AM UTC+5:30, Mark H. Harris wrote:
> On 5/13/14 12:10 AM, Rustom Mody wrote:
> > I think the most helpful way forward is to accept two things:
> > a. Unicode is a headache
> > b. No-unicode is a non-option
>
> QOTW (so far...)

I said that getting Unicode right straight off is unrealistic.

I should have added this: Armin makes a (sarcastic?) dig about the fact that Python (3) goofs because it's mismatched with the assumptions of Unix.

| UNIX is bytes, has been defined that way and will always be that way. To
| Unicode on UNIX is only madness if you force it on everything. But that's not
| how Unicode on UNIX works. UNIX does not have a distinction between unicode
| and byte APIs. They are one and the same which makes them easy to deal with.
| Python 3 takes a very different stance on Unicode than UNIX does. Python 3
| says: everything is Unicode ...

This may be right... Or it may be the other way round, as I claim at
http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

At this point I don't believe that anyone is very clear what is the right way and the wrong way.
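The tension between "UNIX is bytes" and "everything is Unicode" is what Python 3's surrogateescape error handler (PEP 383) tries to bridge: undecodable bytes in file names and arguments are mapped to lone surrogate code points so they can be turned back into the original bytes later. A small sketch, assuming UTF-8 as the encoding (real programs would normally go through os.fsdecode and os.fsencode, which use the configured filesystem encoding); the file name is invented for the example.

```python
def roundtrip_filename(raw):
    """Decode a bytes file name to str and encode it back, losslessly.

    Hypothetical helper; UTF-8 is hard-coded here as an assumption.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


# A Latin-1 encoded name: the byte 0xE9 is not valid UTF-8 on its own.
raw_name = b"caf\xe9.txt"
as_text = raw_name.decode("utf-8", errors="surrogateescape")

print(repr(as_text))                            # 'caf\udce9.txt'
print(roundtrip_filename(raw_name) == raw_name)  # True

# The escaped string is not real text: a strict encode refuses it.
try:
    as_text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```

The round trip preserves the bytes, but the intermediate str cannot be printed or written through a strict encoder, which is roughly where the pain described in this thread shows up.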
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 16:18 +1000 |
| Message-ID | <mailman.9939.1399961928.18130.python-list@python.org> |
| In reply to | #71428 |
On Tue, May 13, 2014 at 4:03 PM, Ben Finney <ben@benfinney.id.au> wrote:
> (It's always a good day to remind people that the rest of the world
> exists.)

Ironic that this should come up in a discussion on Unicode, given that Unicode's fundamental purpose is to welcome that whole rest of the world instead of yelling "LALALALALA America is everything" and pretending that ASCII, or Latin-1, or something, is all you need.

ChrisA
Currently enjoying "Monday Night Flagging" on Threshold RPG... at 4pm on Tuesday.
| From | Mark H Harris <harrismh777@gmail.com> |
|---|---|
| Date | 2014-05-13 01:32 -0500 |
| Message-ID | <5371BC77.4090106@gmail.com> |
| In reply to | #71437 |
On 5/13/14 1:18 AM, Chris Angelico wrote:
> instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.

... it isn't?

LALALALALALALALALA :))
| From | Mark H Harris <harrismh777@gmail.com> |
|---|---|
| Date | 2014-05-13 01:32 -0500 |
| Message-ID | <mailman.9942.1399962753.18130.python-list@python.org> |
| In reply to | #71437 |
On 5/13/14 1:18 AM, Chris Angelico wrote:
> instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.

... it isn't?

LALALALALALALALALA :))