From: Marko Rauhamaa
Newsgroups: comp.lang.python
Subject: Re: Python 3 is killing Python
Date: Wed, 16 Jul 2014 16:11:26 +0300
Message-ID: <87sim1e9dt.fsf@elektro.pacujo.net>

Steven D'Aprano:

> With a few exceptions, /etc is filled with text files, not binary
> files, and half the executables on the system are text (Python, Perl,
> bash, sh, awk, etc.).

Our debate seems to stem from a different idea of what text is. To me,
text in the Python sense is a sequence of UCS-4 character code points.
The opposite of text is not necessarily binary. Most of those "text"
files under /etc expect ASCII.
In many contexts they tolerate UTF-8 or Latin-3 or whatever, but it's
a bit iffy (how are non-ASCII passwords encoded in /etc/shadow?). Also,
the files under /etc, /var/log, etc. should not depend on the locale,
since they are typically interpreted by daemons, which typically don't
have locales.

> Relatively rare. Like, um, email, news, html, Unix config files,
> Windows ini files, source code in just about every language ever,
> SMSes, XML, JSON, YAML, instant messenger apps,

I would be especially wary of letting Python 3 interpret those files
for me. Python's [text] strings could be a wonderful tool on the inside
of my program, but I definitely would like to micromanage the I/O. Do
I obey the locale or not? That's too big (and painful) a question for
Python to answer on its own (and pretend that everything's under
control).

> word processors... even *graphic* applications invariably have a text
> tool.

Thing is, serious text utilities like word processors probably need
lots of ancillary information, so Python's [text] strings might be too
naive to represent even a single character.

>> More often, len(b'λ') is what I want.
>
> Oh really? Are you sure? What exactly is b'λ'?

That's something that ought to work in the UTF-8 paradise.
Unfortunately, Python only allows ASCII characters in bytes literals.
ASCII only! In this day and age! Even C is not so picky:

    #include <stdio.h>

    int main()
    {
        printf("Hyvää yötä\n");
        return 0;
    }

Marko
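For readers following the len(b'λ') exchange, here is a minimal Python 3 sketch of the distinction being argued. It assumes UTF-8 as the encoding, and the variable names are only for illustration. A str counts code points, while its encoded bytes count octets.

```python
# Python 3: a str is a sequence of Unicode code points,
# while bytes is a sequence of 8-bit values.
text = "λ"                    # one code point, U+03BB
data = text.encode("utf-8")   # its UTF-8 encoding

print(len(text))              # 1 (code point)
print(len(data))              # 2 (bytes: 0xCE 0xBB)
print(hex(ord(text)))         # 0x3bb
print(data)                   # b'\xce\xbb'
```

So "the length of λ" has two defensible answers, depending on whether you mean the text or one particular encoding of it.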
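On the bytes-literal point, the following small sketch shows that Python 3 rejects b'λ' at compile time. It also shows two ways to get the same two UTF-8 bytes anyway. compile() is used here only so the rejection can be caught instead of stopping the script. The exact error message varies between Python versions, so the code does not rely on it.

```python
# A bytes literal may contain only ASCII characters; non-ASCII bytes
# must be written as escapes or produced by encoding a str.
try:
    compile("b'λ'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

utf8_escaped = b"\xce\xbb"          # the two UTF-8 bytes of λ, spelled out
utf8_encoded = "λ".encode("utf-8")  # the same bytes, produced from text

print(rejected)                       # True
print(utf8_escaped == utf8_encoded)   # True
```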
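On micromanaging the I/O, the following sketch reads the raw bytes and chooses the decoding explicitly, instead of letting open() fall back to the locale's preferred encoding. A throwaway temporary file stands in for something under /etc. The UTF-8, Latin-3 and ASCII choices are illustrative only, not a recommendation for any particular file.

```python
import locale
import os
import tempfile

# Bytes as they might sit in a config file: "Hyvää" encoded as UTF-8.
raw = "Hyvää".encode("utf-8")

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(raw)
    path = f.name

try:
    # open() in text mode without encoding= uses the locale's preferred encoding.
    print("locale default:", locale.getpreferredencoding(False))

    # Read the bytes ourselves, then decide how to decode them.
    with open(path, "rb") as f:
        data = f.read()
    as_utf8 = data.decode("utf-8")
    # The same bytes read as Latin-3 give different text;
    # undecodable bytes become U+FFFD with errors="replace".
    as_latin3 = data.decode("iso8859_3", errors="replace")

    # Alternatively, name the encoding and error policy on open() itself.
    with open(path, encoding="ascii", errors="replace") as f:
        as_ascii = f.read()
finally:
    os.remove(path)

# ascii() keeps the output printable even on an ASCII-only terminal.
print(ascii(as_utf8), ascii(as_latin3), ascii(as_ascii))
```

Which decoding is right depends on the file. The point is that the program, not the daemon's (possibly missing) locale, makes the decision.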