


Re: Chardet, file, ... and the Flexible String Representation

Date Mon, 09 Sep 2013 11:05:44 -0600
From Michael Torrie <torriem@gmail.com>
Newsgroups comp.lang.python



On 09/09/2013 08:28 AM, wxjmfauth@gmail.com wrote:
> Comment: Such differences never happen with utf.

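But with utf (a variable-width encoding such as UTF-8), finding the
character at a given position, and therefore indexing or slicing, is O(n)
(well, that's a simplification, as someone showed an algorithm that is
O(log n)), whereas with a fixed-width encoding (Latin-1, UCS-2, UCS-4) it
is O(1).  Do you understand what this means?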
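
To make the O(n) versus O(1) point concrete, here is a minimal sketch. The helper names utf8_index and utf32_index are made up for illustration, and this is not how CPython stores strings internally; it only contrasts looking up a position in a variable-width buffer with looking it up in a fixed-width one.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start: O(n)."""
    if n < 0:
        raise IndexError(n)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(n)
    return data[start:].decode("utf-8")


def utf32_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-32-LE data by direct offset: O(1)."""
    if not 0 <= n < len(data) // 4:
        raise IndexError(n)
    return data[4 * n:4 * n + 4].decode("utf-32-le")


text = "aé€😀z"
u8 = text.encode("utf-8")
u32 = text.encode("utf-32-le")

# Both give the same answers; only the cost of finding the position differs.
for i, ch in enumerate(text):
    assert utf8_index(u8, i) == ch
    assert utf32_index(u32, i) == ch
```

The UTF-8 version has to walk over every earlier byte because it cannot know how wide the preceding characters are; the fixed-width version just multiplies.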

> Complicate and full of side effects, eg :
> 
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('aé')
> 39

Why on earth are you doing getsizeof?  What are you expecting to prove?
Why are you even trying to concern yourself with implementation
details?  As a programmer you should deal with Unicode.  Period.  All
you should care about is that you can properly index or slice a Unicode
string and that Unicode strings can be operated on at a reasonable speed.

I.e., string[4] should give you the character at position 4.  len(string)
should return the length of the string in *characters*.

The byte encoding used behind the scenes is of no consequence other than
speed (and you have not shown any problem with speed).
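
For anyone curious where the getsizeof differences come from, here is a rough sketch. It assumes CPython 3.3 or later, where the flexible string representation picks 1, 2 or 4 bytes per character depending on the widest character in the string; the helper name bytes_per_char is made up for illustration. Note that len() is unaffected throughout.

```python
import sys


def bytes_per_char(ch: str) -> int:
    """Estimate storage per character for a string made only of `ch`.

    Comparing two lengths cancels out the fixed per-object header.
    CPython-specific; other implementations may report something else.
    """
    small, large = ch * 2, ch * 1002
    return (sys.getsizeof(large) - sys.getsizeof(small)) // 1000


for ch in "aé€😀":
    # Each of these is a single character as far as len() is concerned.
    print(repr(ch), len(ch), bytes_per_char(ch))
```

The absolute getsizeof numbers also include a header whose size varies between string kinds and Python versions, which is why comparing the sizes of two short strings says very little.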

> 
> Is not a latin-1 "é" supposed to count as a latin-1 "a" ?

Of course it does.  'aé'[0] == 'a' and 'aé'[1] == 'é'.  len('aé') returns 2.
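
A small sketch of the same point in Python 3: the string is two characters long no matter what, and the byte count only shows up once you explicitly encode it. The particular list of encodings is just chosen for illustration.

```python
s = "aé"

# Indexing and len() work in characters (code points).
assert s[0] == "a"
assert s[1] == "é"
assert len(s) == 2

# The number of bytes depends entirely on which encoding you ask for.
encoded_lengths = {
    enc: len(s.encode(enc))
    for enc in ("latin-1", "utf-8", "utf-16-le", "utf-32-le")
}
print(encoded_lengths)
# {'latin-1': 2, 'utf-8': 3, 'utf-16-le': 4, 'utf-32-le': 8}
```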

> I picked up random methods, there may be variations, basically
> this general behaviour is always expected.

Eh?  Can you point to something in the unicode spec that doesn't work?

I don't even know that much about Unicode, yet it's clear you're either
deliberately muddying the waters with your stupid and pointless
arguments against the FSR, or you don't really understand the difference
between Unicode and a byte encoding.  Which is it?
