From: Marko Rauhamaa
Newsgroups: comp.lang.python
Subject: Re: Everything you did not want to know about Unicode in Python 3
Date: Tue, 13 May 2014 11:25:22 +0300
Message-ID: <87tx8uccgd.fsf@elektro.pacujo.net>

Johannes Bauer:

> Having dealt with the UTF-8 problems on Python2 I can safely say that
> I never, never ever want to go back to that freaky hell. If I deal
> with strings, I want to be able to sanely manipulate them and I want
> to be sure that after manipulation they're still valid strings.
> Manipulating the bytes representation of unicode data just doesn't
> work.

Based on my background (network and system programming), I'm a bit
suspicious of strings, that is, text. For example, is the stuff that
goes to syslog bytes or text? Does an XML file contain bytes or
(encoded) text? The answers are not obvious to me. Modern computing is
full of ASCII-esque binary communication standards and formats.

Python 2's ambiguity allows me not to answer the tough philosophical
questions. I'm not saying it's necessarily a good thing, but it has its
benefits.


Marko
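
The XML question can be made concrete with a small sketch. It assumes Python 3 and the standard-library xml.etree.ElementTree parser; the sample document and the helper name element_text are made up for illustration. The document as stored or transmitted is bytes, and its declaration names the encoding of those bytes; what the parser hands back is text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, as it would sit on disk or on the wire.
# It is a bytes object; the XML declaration says how those bytes are encoded.
raw = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    "<name>J\u00e4rvi</name>"
).encode("iso-8859-1")


def element_text(document: bytes) -> str:
    """Parse an XML document given as bytes and return the root element's text."""
    # Given bytes, the parser reads the encoding declaration and decodes for us.
    root = ET.fromstring(document)
    return root.text


text = element_text(raw)

# The container is bytes, the content the parser exposes is str.
print(type(raw).__name__, type(text).__name__)
print(text == "J\u00e4rvi")
```

Under these assumptions the file is bytes and its content is encoded text, and the decoding step happens inside the parser. Whether that settles the question for syslog or for other ASCII-esque protocols is a separate matter.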