Groups > comp.lang.python > #71389 > unrolled thread
| Started by | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| First post | 2014-05-12 16:19 +0100 |
| Last post | 2014-05-14 09:56 -0600 |
| Articles | 20 on this page of 72 — 25 participants |
Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-12 16:19 +0100
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-12 17:47 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 12:31 -0600
Re: Everything you did not want to know about Unicode in Python 3 MRAB <python@mrabarnett.plus.com> - 2014-05-12 20:42 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-12 16:16 -0600
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 09:42 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 01:18 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 11:39 +1000
Re: Everything you did not want to know about Unicode in Python 3 alex23 <wuwei23@gmail.com> - 2014-05-13 16:25 +1000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:32 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-12 20:58 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 03:33 +0100
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 22:10 -0700
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 00:39 -0500
Re: Everything you did not want to know about Unicode in Python 3 Gene Heskett <gheskett@wdtv.com> - 2014-05-13 01:45 -0400
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-13 16:03 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-12 23:09 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 16:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Mark H Harris <harrismh777@gmail.com> - 2014-05-13 01:32 -0500
Re: Everything you did not want to know about Unicode in Python 3 Roy Smith <roy@panix.com> - 2014-05-13 07:20 -0400
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 13:39 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:43 +1000
Re: Everything you did not want to know about Unicode in Python 3 Rustom Mody <rustompmody@gmail.com> - 2014-05-13 07:30 -0700
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 00:36 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:51 +0000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 14:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 15:21 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-13 23:53 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 10:08 +1000
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:42 +0000
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-14 22:52 +1000
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-16 14:46 +0000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 01:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-17 07:19 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-17 09:35 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 10:29 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 14:15 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:01 +0100
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 09:57 +0100
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-17 12:07 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robert Kern <robert.kern@gmail.com> - 2014-05-17 22:07 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-17 19:18 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ben Finney <ben@benfinney.id.au> - 2014-05-17 21:05 +1000
[OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) Ben Finney <ben@benfinney.id.au> - 2014-05-14 11:01 +1000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:07 -0600
Re: Everything you did not want to know about Unicode in Python 3 Dave Angel <davea@davea.name> - 2014-05-13 21:56 -0400
Re: Everything you did not want to know about Unicode in Python 3 Grant Edwards <invalid@invalid.invalid> - 2014-05-13 13:49 +0000
Re: Everything you did not want to know about Unicode in Python 3 gregor <gregor@ediwo.com> - 2014-05-13 09:27 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 10:08 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 11:25 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 18:38 +1000
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:06 +0300
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 19:29 +1000
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve@pearwood.info> - 2014-05-13 09:44 +0000
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:38 +0200
Re: Everything you did not want to know about Unicode in Python 3 Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-05-13 11:46 +0200
Re: Everything you did not want to know about Unicode in Python 3 Marko Rauhamaa <marko@pacujo.net> - 2014-05-13 12:59 +0300
Re: Everything you did not want to know about Unicode in Python 3 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-13 14:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Chris Angelico <rosuav@gmail.com> - 2014-05-13 23:37 +1000
Re: Everything you did not want to know about Unicode in Python 3 Skip Montanaro <skip@pobox.com> - 2014-05-13 09:02 -0500
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-14 00:00 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-13 11:19 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-13 10:08 -0600
Re: Everything you did not want to know about Unicode in Python 3 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-05-14 00:10 +0000
Re: Everything you did not want to know about Unicode in Python 3 Ethan Furman <ethan@stoneleaf.us> - 2014-05-13 17:53 -0700
Re: Everything you did not want to know about Unicode in Python 3 Terry Reedy <tjreedy@udel.edu> - 2014-05-14 17:47 -0400
Re: Everything you did not want to know about Unicode in Python 3 Antoine Pitrou <antoine@python.org> - 2014-05-16 11:50 +0000
Re: Everything you did not want to know about Unicode in Python 3 wxjmfauth@gmail.com - 2014-05-16 06:20 -0700
Re: Everything you did not want to know about Unicode in Python 3 alister <alister.nospam.ware@ntlworld.com> - 2014-05-14 12:38 +0000
Re: Everything you did not want to know about Unicode in Python 3 Robin Becker <robin@reportlab.com> - 2014-05-14 16:30 +0100
Re: Everything you did not want to know about Unicode in Python 3 Ian Kelly <ian.g.kelly@gmail.com> - 2014-05-14 09:56 -0600
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2014-05-17 12:07 +0000 |
| Message-ID | <537750fc$0$29977$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #71673 |
On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote:
> On 2014-05-17 02:07, Steven D'Aprano wrote:
>> On Fri, 16 May 2014 14:46:23 +0000, Grant Edwards wrote:
>>
>>> At least in the US, there doesn't seem to be such a thing as "placing
>>> a work into the public domain". The copyright holder can transfer
>>> ownership to somebody else, but there is no "public domain" to which
>>> ownership can be transferred.
>>
>> That's factually incorrect. In the US, sufficiently old works, or works
>> of a certain age that were not explicitly registered for copyright, are
>> in the public domain. Under a wide range of circumstances, works
>> created by the federal government go immediately into the public
>> domain.
>
> There is such a thing as the public domain in the US, and there are
> works in it, but there isn't really such a thing as "placing a work"
> there voluntarily, as Grant says. A work either is or isn't in the
> public domain. The author has no choice in the matter.
That's incorrect.
http://cr.yp.to/publicdomain.html
Here's the money quote, from the 9th Circuit Court:
It is well settled that rights gained under the Copyright Act
may be abandoned. But abandonment of a right must be manifested
by some overt act indicating an intention to abandon that right.
There's also this:
http://creativecommons.org/publicdomain/zero/1.0/
which counts as an overt act.
By the way, there's more info on US copyright terms here:
http://copyright.cornell.edu/resources/publicdomain.cfm
although it doesn't specifically mention voluntary abandonment of
copyright.
--
Steven D'Aprano
http://import-that.dreamwidth.org/
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2014-05-17 22:07 +0100 |
| Message-ID | <mailman.10100.1400360883.18130.python-list@python.org> |
| In reply to | #71679 |
On 2014-05-17 13:07, Steven D'Aprano wrote:
> On Sat, 17 May 2014 09:57:06 +0100, Robert Kern wrote:
>
>> On 2014-05-17 02:07, Steven D'Aprano wrote:
>>> On Fri, 16 May 2014 14:46:23 +0000, Grant Edwards wrote:
>>>
>>>> At least in the US, there doesn't seem to be such a thing as "placing
>>>> a work into the public domain". The copyright holder can transfer
>>>> ownership to somebody else, but there is no "public domain" to which
>>>> ownership can be transferred.
>>>
>>> That's factually incorrect. In the US, sufficiently old works, or works
>>> of a certain age that were not explicitly registered for copyright, are
>>> in the public domain. Under a wide range of circumstances, works
>>> created by the federal government go immediately into the public
>>> domain.
>>
>> There is such a thing as the public domain in the US, and there are
>> works in it, but there isn't really such a thing as "placing a work"
>> there voluntarily, as Grant says. A work either is or isn't in the
>> public domain. The author has no choice in the matter.
>
> That's incorrect.
>
> http://cr.yp.to/publicdomain.html
Thanks for the link. While it has not really changed my opinion (as
discussed at length in my other reply), I did not know that the 9th
Circuit had formalized the "overt act" test in their civil procedure
rules, so there is at least one jurisdiction in the US that does
currently work like this. None of the others do, to my knowledge, and
this is the product of judicial common law, not statutory law, so it's
still pretty shaky.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-17 19:18 +1000 |
| Message-ID | <mailman.10077.1400318344.18130.python-list@python.org> |
| In reply to | #71669 |
On Sat, May 17, 2014 at 6:57 PM, Robert Kern <robert.kern@gmail.com> wrote:
> There is such a thing as the public domain in the US, and there are works in
> it, but there isn't really such a thing as "placing a work" there
> voluntarily, as Grant says. A work either is or isn't in the public domain.
> The author has no choice in the matter.
Then what's copyright status on PEPs?
The nearest thing to "assigning to public domain" that works across
legislatures is probably CC0: http://creativecommons.org/about/cc0
ChrisA
| From | Ben Finney <ben@benfinney.id.au> |
|---|---|
| Date | 2014-05-17 21:05 +1000 |
| Message-ID | <mailman.10080.1400324748.18130.python-list@python.org> |
| In reply to | #71669 |
Chris Angelico <rosuav@gmail.com> writes:
> On Sat, May 17, 2014 at 6:57 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > There is such a thing as the public domain in the US, and there are works in
> > it, but there isn't really such a thing as "placing a work" there
> > voluntarily, as Grant says. A work either is or isn't in the public domain.
> > The author has no choice in the matter.
>
> Then what's copyright status on PEPs?
My guess: They are in the default copyright status, with all rights
reserved (i.e. everything that copyright law restricts, is forbidden to
the recipient).
But, if any of those copyright holders were ever to assert their
copyright had been infringed by some recipient, the “this work is in the
public domain” or equivalent would be taken as a clear indication of the
*intent* of the copyright holder.
Ultimately, what matters is the determination of whatever judge you find
yourself facing. To that end, clarifying in the copyright statement and
license terms exactly what is permitted can be immensely helpful in
foreshortening and, ideally, avoiding a future copyright suit.
Copyright is a ridiculous burden on everyone — to the extent that even
those copyright holders who don't *want* those rights which the law
reserves to the copyright holder, and want to divest themselves of the
role of copyright holder, find it frustratingly difficult to do so
effectively across jurisdictions.
--
\ “Computer perspective on Moore's Law: Human effort becomes |
`\ twice as expensive roughly every two years.” —anonymous |
_o__) |
Ben Finney
| From | Ben Finney <ben@benfinney.id.au> |
|---|---|
| Date | 2014-05-14 11:01 +1000 |
| Subject | [OT] Copyright statements and why they can be useful (was: Everything you did not want to know about Unicode in Python 3) |
| Message-ID | <mailman.9985.1400029305.18130.python-list@python.org> |
| In reply to | #71515 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> On Tue, 13 May 2014 14:42:51 +0000, alister wrote:
>
> > You do not need any statements at all, copyright is automatically
> > assigned to anything you create (at least that is the case in UK
> > Law) although proving the creation date may be difficult.
>
> (1) In my lifetime, that wasn't always the case. Up until the 1970s or
> thereabouts, you had to explicitly register anything you wanted
> copyrighted […]
> (2) You don't have to just prove copyright. You also have to *identify*
> who the work is copyrighted by, and it needs to be an identifiable legal
> person (actual person or corporation), not necessarily the author. […]
(3) In all jurisdictions where copyright exists, the copyright holder
nominally has monopoly on the work for only a fixed term, starting from
the date of publication. To know when the copyright will expire, it's
essential to know the date from which copyright starts; this is best
done explicitly in the copyright statement.
I say “nominally”, because another alarming and unilateral trend is to
dramatically extend the nominally fixed term, and to strong-arm national
governments with trade deals to maximise the copyright term around the
world.
The effect, as Lawrence Lessig points out:
The meaning of this pattern is absolutely clear to those who pay to
produce it. The meaning is: No one can do to the Disney Corporation
what Walt Disney did to the Brothers Grimm. That though we had a
culture where people could take and build upon what went before,
that's over. There is no such thing as the public domain in the
minds of those who have produced these 11 extensions these last 40
years because now culture is owned.
<URL:http://www.oreillynet.com/pub/a/policy/2002/08/15/lessig.html>
Or, less poetically, since the term of copyright is only nominally
fixed, and in practice just keeps getting extended by newly-lobbied
legislation every twenty years or so, the copyright maximalists have
de facto instituted “perpetual copyright on the installment plan”
<URL:https://en.wikipedia.org/wiki/Perpetual_copyright>.
Nevertheless, copyright on works created this century will in principle
expire at some date in the future; and to know when that date will be,
we need to know when the copyright began. Hence the need for explicit
copyright statements saying the date of publication.
<URL:http://questioncopyright.org/>
--
\ “[T]he great menace to progress is not ignorance but the |
`\ illusion of knowledge.” —Daniel J. Boorstin, historian, |
_o__) 1914–2004 |
Ben Finney
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-05-14 09:07 -0600 |
| Message-ID | <mailman.10009.1400080047.18130.python-list@python.org> |
| In reply to | #71515 |
On May 13, 2014 6:10 PM, "Chris Angelico" <rosuav@gmail.com> wrote:
>
> On Wed, May 14, 2014 at 9:53 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
> > With the current system, all of us here are technically violating
> > copyright every time we reply to an email and quote more than a small
> > percentage of it.
>
> Oh wow... so when someone quotes heaps of text without trimming, and
> adding blank lines, we can complain that it's a copyright violation -
> reproducing our work with unauthorized modifications and without
> permission...
>
> I never thought of it like that.
I'd be surprised if this doesn't fall under fair use.
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2014-05-13 21:56 -0400 |
| Message-ID | <mailman.10003.1400072207.18130.python-list@python.org> |
| In reply to | #71485 |
On 05/13/2014 09:39 AM, Steven D'Aprano wrote:
> On Tue, 13 May 2014 07:20:34 -0400, Roy Smith wrote:
>
>> ASCII *is* all I need.
>
> You've never needed to copyright something? Copyright © Roy Smith 2014...
> I know some people use (c) instead, but that actually has no legal
> standing. (Not that any reasonable judge would invalidate a copyright
> based on a technicality like that, not these days.)
(c) has no standing whatsoever, as it's properly spelled (copr)
--
DaveA
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2014-05-13 13:49 +0000 |
| Message-ID | <lkt7tn$ncv$1@reader1.panix.com> |
| In reply to | #71437 |
On 2014-05-13, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, May 13, 2014 at 4:03 PM, Ben Finney <ben@benfinney.id.au> wrote:
>> (It's always a good day to remind people that the rest of the world
>> exists.)
>
> Ironic that this should come up in a discussion on Unicode, given that
> Unicode's fundamental purpose is to welcome that whole rest of the
> world instead of yelling "LALALALALA America is everything" and
> pretending that ASCII, or Latin-1, or something, is all you need.
Well, strictly speaking, ASCII or Latin-1 _is_ all I need.
I will however admit to the existence of other people who might need
something else...
--
Grant Edwards grant.b.edwards Yow! How many retired
at bricklayers from FLORIDA
gmail.com are out purchasing PENCIL
SHARPENERS right NOW??
| From | gregor <gregor@ediwo.com> |
|---|---|
| Date | 2014-05-13 09:27 +0200 |
| Message-ID | <20140513092722.444c5a77@florenz> |
| In reply to | #71416 |
On 13 May 2014 01:18:35 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> - have a simple way to write bytes to stdout and stderr.
There is the underlying binary buffer:
https://docs.python.org/3/library/sys.html#sys.stdin
greg
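A minimal sketch of the binary-buffer route mentioned above, assuming Python 3 and a text stream that exposes a `.buffer` attribute (the standard streams normally do, but a replaced `sys.stdout` might not). `write_bytes` is a hypothetical helper name used only for illustration:

```python
import sys


def write_bytes(data: bytes, stream=None) -> int:
    """Write raw bytes through a text stream's underlying binary buffer.

    `stream` defaults to sys.stdout; any object with `.flush()` and a
    binary `.buffer` (such as io.TextIOWrapper) should work.
    """
    stream = sys.stdout if stream is None else stream
    # Flush pending text first so text and bytes keep their relative order.
    stream.flush()
    written = stream.buffer.write(data)
    stream.buffer.flush()
    return written
```

Calling `write_bytes(b"\xff\xfe\n")` should send those three bytes to stdout without any encoding step; passing `sys.stderr` as the second argument does the same for stderr.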
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2014-05-13 10:08 +0200 |
| Message-ID | <lksju6$amr$1@news.albasani.net> |
| In reply to | #71416 |
On 13.05.2014 03:18, Steven D'Aprano wrote:
> Armin Ronacher is an extremely experienced and knowledgeable Python
> developer, and a Python core developer. He might be wrong, but he's not
> *obviously* wrong.
He's correct about file name encodings. Which can be fixed really easily
without messing everything up (sys.argv binary variant, open accepting
binary filenames). But that he suggests that Go would be superior:
> Which uses an even simpler model than Python 2: everything is a byte string. The assumed encoding is UTF-8. End of the story.
Is just a horrible idea. An obviously horrible idea, too.
Having dealt with the UTF-8 problems on Python2 I can safely say that I
never, never ever want to go back to that freaky hell. If I deal with
strings, I want to be able to sanely manipulate them and I want to be
sure that after manipulation they're still valid strings. Manipulating
the bytes representation of unicode data just doesn't work.
And I'm very very glad that some people felt the same way and
implemented a sane, consistent way of dealing with Unicode in Python3.
It's one of the reasons why I switched to Py3 very early and I love it.
Cheers,
Johannes
--
>> Where exactly did you predict the earthquake again?
> At least not publicly!
Ah, the latest and to date most brilliant stunt of our great
cosmologists: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
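To illustrate the point about manipulating the bytes representation, here is a small sketch assuming Python 3 and UTF-8. The sample text and the variable names are made up for the example; it only shows that slicing a `str` keeps valid text while slicing the encoded bytes can cut a character in half:

```python
text = "na\u00efve caf\u00e9"          # "naïve café", 10 code points
data = text.encode("utf-8")            # 12 bytes: "ï" and "é" take two each

# Slicing the str works on code points, so the result is still valid text.
head_text = text[:3]                   # "naï"

# Slicing the bytes works on bytes; the cut lands inside the two-byte "ï".
head_bytes = data[:3]                  # b"na\xc3"

try:
    head_bytes.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    # The truncated sequence is no longer decodable as UTF-8.
    still_valid = False
```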
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-13 11:25 +0300 |
| Message-ID | <87tx8uccgd.fsf@elektro.pacujo.net> |
| In reply to | #71449 |
Johannes Bauer <dfnsonfsduifb@gmx.de>:
> Having dealt with the UTF-8 problems on Python2 I can safely say that
> I never, never ever want to go back to that freaky hell. If I deal
> with strings, I want to be able to sanely manipulate them and I want
> to be sure that after manipulation they're still valid strings.
> Manipulating the bytes representation of unicode data just doesn't
> work.
Based on my background (network and system programming), I'm a bit
suspicious of strings, that is, text. For example, is the stuff that
goes to syslog bytes or text? Does an XML file contain bytes or
(encoded) text? The answers are not obvious to me. Modern computing is
full of ASCII-esque binary communication standards and formats.
Python 2's ambiguity allows me not to answer the tough philosophical
questions. I'm not saying it's necessarily a good thing, but it has its
benefits.
Marko
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 18:38 +1000 |
| Message-ID | <mailman.9951.1399970292.18130.python-list@python.org> |
| In reply to | #71450 |
On Tue, May 13, 2014 at 6:25 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Johannes Bauer <dfnsonfsduifb@gmx.de>:
>
>> Having dealt with the UTF-8 problems on Python2 I can safely say that
>> I never, never ever want to go back to that freaky hell. If I deal
>> with strings, I want to be able to sanely manipulate them and I want
>> to be sure that after manipulation they're still valid strings.
>> Manipulating the bytes representation of unicode data just doesn't
>> work.
>
> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.
These are problems that Unicode can't solve. In theory, XML should
contain text in a known encoding (defaulting to UTF-8). With syslog,
it's problematic - I don't remember what it's meant to be, but I know
there are issues. Same with other log files.
> Python 2's ambiguity allows me not to answer the tough philosophical
> questions. I'm not saying it's necessarily a good thing, but it has its
> benefits.
It's not a good thing. It means that you have the convenience of
pretending there's no problem, which means you don't notice trouble
until something happens... and then, in all probability, your app is
in production and you have no idea why stuff went wrong.
ChrisA
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-13 12:06 +0300 |
| Message-ID | <87ppjicaj9.fsf@elektro.pacujo.net> |
| In reply to | #71451 |
Chris Angelico <rosuav@gmail.com>:
> These are problems that Unicode can't solve.
I actually think the problem has little to do with Unicode. Text is an
abstract data type just like any class. If I have an object (say, a
subprocess or a dictionary) in memory, I don't expect the object to have
any existence independently of the Python virtual machine. I have the
same feeling about Py3 strings: they only exist inside the Python
virtual machine.
An abstract object like a subprocess or dictionary justifies its
existence through its behaviour (its quacking). Now, do strings quack or
are they silent? I guess if you are writing a word processor they might
quack to you. Otherwise, they are just an esoteric storage format.
What I'm saying is that strings definitely have an important application
in the human interface. However, I feel strings might be overused in the
Py3 API. Case in point: are pathnames bytes objects or strings? The
linux position is that they are bytes objects. Py3 supports both
interpretations seemingly throughout:
open(b"/bin/ls") vs open("/bin/ls")
os.path.join(b"a", b"b") vs os.path.join("a", "b")
Marko
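The two pathname interpretations can be sketched as follows, assuming Python 3. The path components are arbitrary and `join_mixed` is a hypothetical helper; the point is only that `os.path.join` returns the type it was given and refuses a mixture of the two:

```python
import os

# Same call, two interpretations: the result type follows the argument type.
str_path = os.path.join("a", "b")        # str in, str out
bytes_path = os.path.join(b"a", b"b")    # bytes in, bytes out


def join_mixed():
    """Try to join a str component with a bytes component."""
    try:
        os.path.join("a", b"b")
    except TypeError:
        # Python 3 does not guess an encoding to reconcile the two.
        return "TypeError"
    return "joined"
```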
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 19:29 +1000 |
| Message-ID | <mailman.9954.1399973382.18130.python-list@python.org> |
| In reply to | #71453 |
On Tue, May 13, 2014 at 7:06 PM, Marko Rauhamaa <marko@pacujo.net> wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> These are problems that Unicode can't solve.
>
> I actually think the problem has little to do with Unicode. Text is an
> abstract data type just like any class. If I have an object (say, a
> subprocess or a dictionary) in memory, I don't expect the object to have
> any existence independently of the Python virtual machine. I have the
> same feeling about Py3 strings: they only exist inside the Python
> virtual machine.
That's true; the only difference is that text is extremely prevalent.
You can share a dict with another program, or store it in a file, or
whatever, simply by agreeing on an encoding - for instance, JSON. As
long as you and the other program know that this file is JSON encoded,
you can write it and he can read it, and you'll get the right data at
the far end. It's no different; there are encodings that are easy to
handle and have limitations, and there are encodings that are
elaborate and have lots of features (XML comes to mind, although
technically you can't encode a dict in XML).
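As a rough sketch of the "agree on an encoding" idea, assuming Python 3 and the standard `json` module: the dict contents below are invented, and UTF-8 is chosen here only as the agreed byte encoding for the JSON text.

```python
import json

record = {"name": "Zo\u00eb", "tags": ["a", "b"], "count": 3}

# Sender: dict -> JSON text -> bytes, using the agreed encoding (UTF-8).
wire = json.dumps(record, ensure_ascii=False).encode("utf-8")

# Receiver: bytes -> JSON text -> dict, reversing the same two steps.
received = json.loads(wire.decode("utf-8"))
```

Both sides have to know two things here: that the payload is JSON, and that the JSON text was encoded as UTF-8. The same holds for plain text, where only the second agreement is needed.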
> Case in point: are pathnames bytes objects or strings? The
> linux position is that they are bytes objects. Py3 supports both
> interpretations seemingly throughout:
>
> open(b"/bin/ls") vs open("/bin/ls")
> os.path.join(b"a", b"b") vs os.path.join("a", "b")
That's a problem that comes from the underlying file systems. If every
FS in the world worked with Unicode file names, it would be easy.
(Most would encode them onto the platters in UTF-8 or maybe UTF-16;
some might choose to use a PEP 393 or Pike string structure, with the
size_shift being a file mode just like the 'directory' bit; others
might use a limited encoding for legacy reasons, storing uppercased
CP437 on the disk, and returning an error if the desired name didn't
fit.) But since they don't, we have to cope with that. What happens if
you're running on Linux, and you have a mounted drive from an OS/2
share, and inside that, you access an aliased drive that represents a
Windows share, on which you've mounted a remote-backup share? A single
path name could have components parsed by each of those systems, so
what's its encoding? How do you handle that? There's no solution.
(Well, okay. There is a solution: don't do something so stupidly
convoluted. But there's no law against cackling admins making circular
mounts. In fact, I just mounted my own home directory as a
subdirectory under my home directory, via sshfs. I can now encrypt my
own file reads and writes exactly as many times as I choose to. I also
cackled.)
ChrisA
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2014-05-13 09:44 +0000 |
| Message-ID | <5371e97b$0$11109$c3e8da3@news.astraweb.com> |
| In reply to | #71453 |
On Tue, 13 May 2014 12:06:50 +0300, Marko Rauhamaa wrote:
> Chris Angelico <rosuav@gmail.com>:
>
>> These are problems that Unicode can't solve.
>
> I actually think the problem has little to do with Unicode. Text is an
> abstract data type just like any class. If I have an object (say, a
> subprocess or a dictionary) in memory, I don't expect the object to have
> any existence independently of the Python virtual machine. I have the
> same feeling about Py3 strings: they only exist inside the Python
> virtual machine.
And you would be correct. When you write them to a device (say, push them
over a network, or write them to a file) they need to be serialized. If
you're lucky, you have an API that takes a string and serializes it for
you, and then all you have to deal with is:
- am I happy with the default encoding?
- if not, what encoding do I want?
Otherwise you ought to have an API that requires bytes, not strings, and
you have to perform your own serialization by encoding it.
But abstractions leak, and this abstraction leaks because *right now*
there isn't a single serialization for text strings. There are HUNDREDS,
and sometimes you don't know which one is being used.
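A small sketch of why the leak matters, using three encodings picked arbitrarily for illustration: the same text serializes to different bytes, and a wrong guess on the way back in either produces garbage silently or fails outright.

```python
text = "café"

# The same abstract text has several different serializations.
as_utf8 = text.encode("utf-8")
as_latin1 = text.encode("latin-1")
as_utf16 = text.encode("utf-16-le")

# Wrong guess, no error: UTF-8 bytes read as Latin-1 give mojibake.
wrong_guess = as_utf8.decode("latin-1")

# Wrong guess, loud error: Latin-1 bytes are not valid UTF-8 here.
try:
    as_latin1.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

The silent case is the dangerous one, because nothing tells you the data is already damaged.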
[...]
> What I'm saying is that strings definitely have an important application
> in the human interface. However, I feel strings might be overused in the
> Py3 API. Case in point: are pathnames bytes objects or strings?
Yes. On POSIX systems, file names are sequences of bytes, with a very few
restrictions. On recent Windows file systems (NTFS I believe?), file
names are Unicode strings encoded to UTF-16, but with a whole lot of
other restrictions imposed by the OS.
> The
> linux position is that they are bytes objects. Py3 supports both
> interpretations seemingly throughout:
>
>     open(b"/bin/ls")            vs  open("/bin/ls")
>     os.path.join(b"a", b"b")    vs  os.path.join("a", "b")
Because it has to, otherwise there will be files that are unreachable on
one platform or another.
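A rough sketch of what supporting both interpretations looks like in practice. It assumes a POSIX system (on Windows the round trip below may behave differently), and the file name is invented for the example.

```python
import os

# os.path functions accept either type, but not a mixture of the two.
str_path = os.path.join("a", "b")
bytes_path = os.path.join(b"a", b"b")

try:
    os.path.join("a", b"b")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# A name containing a byte that is not valid UTF-8 (hypothetical file name).
raw_name = b"report-\xff.txt"

# os.fsdecode/os.fsencode convert between the bytes view and the str view
# using the filesystem encoding; on POSIX the error handler lets
# undecodable bytes survive the round trip.
as_str = os.fsdecode(raw_name)
round_tripped = os.fsencode(as_str)
```

The bytes form is what keeps such a file reachable even when its name does not decode cleanly.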
--
Steven
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2014-05-13 11:38 +0200 |
| Message-ID | <lksp6o$lmf$1@news.albasani.net> |
| In reply to | #71451 |
On 13.05.2014 10:38, Chris Angelico wrote:
>> Python 2's ambiguity allows me not to answer the tough philosophical
>> questions. I'm not saying it's necessarily a good thing, but it has its
>> benefits.
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.

Exactly. With Py2 "strings" you never know what encoding they are in, or
whether they have already been converted. And it's entirely possible to
mix already converted strings with other, not yet encoded strings. What a
mess!

All these issues are avoided by Py3. There is a very clear distinction
between strings and their representation (data bytes), which is
beautiful. Accidental mixing is not possible. And you have some things
*guaranteed* for the string type which aren't guaranteed for the bytes
type (for example when doing string manipulation).

Regards,
Johannes

--
>> Where exactly did you predict the quake again?
> At least not publicly!
Ah, the latest and to date most brilliant stunt of our great
cosmologist: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
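For anyone who wants to see the "accidental mixing is not possible" claim in action, here is a minimal sketch with made-up sample values:

```python
text = "price: "
data = "10 €".encode("utf-8")  # bytes, e.g. as received from a socket

# Python 3 refuses to combine text and bytes implicitly.
try:
    text + data
    mixed = True
except TypeError:
    mixed = False

# The conversion has to be spelled out, naming the encoding.
combined = text + data.decode("utf-8")

# str operations count code points; bytes operations count raw bytes.
char_count = len("10 €")
byte_count = len(data)
```

The length difference is one example of the guarantees mentioned above: indexing and slicing a str never cuts a character in half, whereas slicing the bytes can.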
| From | Johannes Bauer <dfnsonfsduifb@gmx.de> |
|---|---|
| Date | 2014-05-13 11:46 +0200 |
| Message-ID | <lkspl2$mlc$1@news.albasani.net> |
| In reply to | #71450 |
On 13.05.2014 10:25, Marko Rauhamaa wrote:
> Based on my background (network and system programming), I'm a bit
> suspicious of strings, that is, text. For example, is the stuff that
> goes to syslog bytes or text? Does an XML file contain bytes or
> (encoded) text? The answers are not obvious to me. Modern computing is
> full of ASCII-esque binary communication standards and formats.

Traditional Unix programs (syslog for example) are notorious for being
unclear, ambiguous and/or ignorant of character encodings altogether.
And this works, unfortunately, most of the time because many encodings
share a common subset. If they didn't, the problems would be VERY
apparent and people would be forced to handle the issues less sloppily.

Which is the route that Py3 chose. Don't be sloppy; make a clear
distinction between "text" (which is handled naturally as strings) and
its respective encoding.

The only people who are angered by this now are people who always
treated encodings sloppily and it "just worked". Well, there's a good
chance it has worked by pure chance so far. It's a good thing that
Python now does this more strictly, as it gives developers *guarantees*
about what they can and cannot do with text datatypes without having
to deal with encoding issues in many places. Just one place: the
interface where text is read or written, just as it should be.

Regards,
Johannes

--
>> Where exactly did you predict the quake again?
> At least not publicly!
Ah, the latest and to date most brilliant stunt of our great
cosmologist: the secret prediction.
- Karl Kaos on Rüdiger Thomas in dsa <hidbv3$om2$1@speranza.aioe.org>
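The "just one place" idea can be sketched as decoding at the input edge, working on str in the middle, and encoding at the output edge. The helper name `shout` and the choice of UTF-8 are assumptions for illustration only.

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: decode at the edge, work on str, encode on the way out."""
    text = raw.decode(encoding)    # input boundary: bytes -> str
    text = text.upper()            # all processing happens on text
    return text.encode(encoding)   # output boundary: str -> bytes


wire_input = "straße".encode("utf-8")

# Text-aware processing knows that the uppercase of "ß" is "SS".
result = shout(wire_input)

# The same operation applied directly to the bytes only touches ASCII letters.
bytes_only = wire_input.upper()
```

Here `result` is `b"STRASSE"`, while `bytes_only` still contains the two UTF-8 bytes of the lowercase "ß", which is the kind of guarantee that only the text type can give.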
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-13 12:59 +0300 |
| Message-ID | <87iopac83j.fsf@elektro.pacujo.net> |
| In reply to | #71461 |
Johannes Bauer <dfnsonfsduifb@gmx.de>:
> The only people who are angered by this now are people who always
> treated encodings sloppily and it "just worked". Well, there's a good
> chance it has worked by pure chance so far. It's a good thing that
> Python does this now more strictly as it gives developers *guarantees*
> about what they can and cannot do with text datatypes without having
> to deal with encoding issues in many places. Just one place: The
> interface where text is read or written, just as it should be.

I'm not angered by text. I'm just wondering if it has any practical use
that is not misuse...

For example, Py3 should not make any pretense that there is a "default"
encoding for strings. Locales are an abhorrent invention from the early
8-bit days. IOW, you should never input or output text without explicit
serialization.

I get the feeling that Py3 would like to present a world where strings
are first-class I/O objects that can exist in files, in filenames,
inside pipes.

You say, "text is read or written." I'm saying text is never read or
written. It only exists as an abstraction (not even unicode) inside the
virtual machine.

Marko
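For reference, a small sketch of the "default" being objected to, next to the explicit alternative. The in-memory buffer stands in for a real file or pipe, and UTF-8 is an arbitrary choice.

```python
import io
import locale

# The locale-derived encoding that text I/O typically falls back to
# when the caller does not name one. It varies between machines.
default_encoding = locale.getpreferredencoding(False)

# Explicit serialization: the byte stream is the real I/O object, and the
# text layer on top of it is told exactly which encoding to use.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")
writer.write("smörgåsbord\n")
writer.flush()
payload = buffer.getvalue()
```

With the encoding spelled out, `payload` is the same on every machine regardless of what `default_encoding` happens to be.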
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-05-13 14:30 +0100 |
| Message-ID | <mailman.9964.1399987825.18130.python-list@python.org> |
| In reply to | #71450 |
On 13/05/2014 09:38, Chris Angelico wrote:
>
> It's not a good thing. It means that you have the convenience of
> pretending there's no problem, which means you don't notice trouble
> until something happens... and then, in all probability, your app is
> in production and you have no idea why stuff went wrong.
>

Unless you're (un)lucky enough to be working on, IIRC, the 1/3 of major
IT projects that deliver nothing :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2014-05-13 23:37 +1000 |
| Message-ID | <mailman.9965.1399988276.18130.python-list@python.org> |
| In reply to | #71450 |
On Tue, May 13, 2014 at 11:30 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> On 13/05/2014 09:38, Chris Angelico wrote:
>> It's not a good thing. It means that you have the convenience of
>> pretending there's no problem, which means you don't notice trouble
>> until something happens... and then, in all probability, your app is
>> in production and you have no idea why stuff went wrong.
>
> Unless you're (un)lucky enough to be working on IIRC the 1/3 of major IT
> projects that deliver nothing :)

Been there, done that. At least, most likely so... there is a chance,
albeit slim, that the boss/owner will either discover someone who'll
finish the project for him, or find the time to finish it himself. I
gather he's looking at ripping all my code out and replacing it with
PHP of his own design, which should be fun. On the plus side, that
does mean he can get any idiot straight out of a uni course to do the
work; much easier than finding someone who knows Python, Pike, bash,
and C++.

The White King told Alice that cynicism is a disease that can be
cured... but it can also be inflicted, and a promising-looking N-year
project that collapses because the boss starts getting stupid with
code formatting rules and then ends up firing his last remaining
competent employee is a pretty effective means of instilling cynicism.

ChrisA