Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #7267

Re: how to avoid leading white spaces

Path csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.albasani.net!feeder.news-service.com!de-l.enfer-du-nord.net!feeder2.enfer-du-nord.net!usenet-fr.net!feeder1-2.proxad.net!proxad.net!feeder2-2.proxad.net!nx02.iad01.newshosting.com!newshosting.com!69.16.185.16.MISMATCH!npeer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!spln!extra.newsguy.com!newsp.newsguy.com!not-for-mail
From Chris Torek <nospam@torek.net>
Newsgroups comp.lang.python
Subject Re: how to avoid leading white spaces
Date 9 Jun 2011 02:32:08 GMT
Organization None of the Above
Lines 36
Message-ID <ispbb8024r6@news2.newsguy.com> (permalink)
References <BANLkTikjY3U9Y24s-GOEyi8CNqCFLXuG6g@mail.gmail.com> <roy-E2FA6F.21571602062011@news.panix.com> <is9ikg083h@news1.newsguy.com> <mailman.2508.1307394262.9059.python-list@python.org>
NNTP-Posting-Host pc577d8fa08f4f5d9fa8bd94c28e96b05add6a5e66298286c.newsdawg.com
X-Newsreader trn 4.0-test76 (Apr 2, 2001)
Originator torek@elf.torek.net (Chris Torek)
Xref x330-a1.tempe.blueboxinc.net comp.lang.python:7267

Show key headers only | View raw


>On 03/06/2011 03:58, Chris Torek wrote:
>>> -------------------------------------------------
>> This is a bit surprising, since both "s1 in s2" and re.search()
>> could use a Boyer-Moore-based algorithm for a sufficiently-long
>> fixed string, and the time required should be proportional to that
>> needed to set up the skip table.  The re.compile() gets to re-use
>> the table every time.

In article <mailman.2508.1307394262.9059.python-list@python.org>
Ian  <hobson42@gmail.com> wrote:
>Is that true?  My immediate thought is that Boyer-Moore would quickly give
>the number of characters to skip, but skipping them would be slow because
>UTF8 encoded characters are variable sized, and the string would have to be
>walked anyway.

As I understand it, strings in python 3 are Unicode internally and
(apparently) use wchar_t.  Byte strings in python 3 are of course
byte strings, not UTF-8 encoded.

>Or am I misunderstanding something.

Here's python 2.7 on a Linux box:

    >>> print sys.getsizeof('a'), sys.getsizeof('ab'), sys.getsizeof('abc')
    38 39 40
    >>> print sys.getsizeof(u'a'), sys.getsizeof(u'ab'), sys.getsizeof(u'abc')
    56 60 64

This implies that strings in Python 2.x are just byte strings (same
as b"..." in Python 3.x) and never actually contain unicode; and
unicode strings (same as "..." in Python 3.x) use 4-byte "characters"
per that box's wchar_t.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Re: how to avoid leading white spaces Chris Rebert <clp2@rebertia.com> - 2011-06-01 10:11 -0700
  Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-01 12:39 -0700
    Re: how to avoid leading white spaces Karim <karim.liateni@free.fr> - 2011-06-01 22:34 +0200
    Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-02 13:21 +0000
      Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 21:57 -0400
        Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 03:41 +0100
        Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 02:58 +0000
          Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 23:44 -0400
            Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:52 +1000
            Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:54 +1000
            Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 04:30 +0000
              Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:11 +0100
          Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:18 +0100
          Re: how to avoid leading white spaces Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-06-04 13:41 +1200
            Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 20:44 +0100
          Re: how to avoid leading white spaces Ian <hobson42@gmail.com> - 2011-06-06 22:04 +0100
            Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-09 02:32 +0000
        Re: how to avoid leading white spaces Thorsten Kampe <thorsten@thorstenkampe.de> - 2011-06-03 10:32 +0200
      Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 05:51 -0700
        Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 13:17 +0000
          Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 08:14 -0700
        Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-03 14:25 +0000
          Re: how to avoid leading white spaces "D'Arcy J.M. Cain" <darcy@druid.net> - 2011-06-03 10:58 -0400
          Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 12:29 -0700
            Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 20:49 +0000
              Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 21:45 +0000
                Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-03 15:11 -0700
                Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 23:38 +0100
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:47 -0700
              Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:44 -0700
                Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 16:08 +0000
                Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:29 -0600
                Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:17 +0000
                Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:40 -0600
                Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:56 +0000
                Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-06 10:48 -0700
                Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:42 -0600
            Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 02:05 +0000
              Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-04 03:24 +0100
                Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 04:59 +0000
              Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-03 22:30 -0400
                Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 05:14 +0000
                Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-04 09:39 -0400
                Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 00:44 +0000
                Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-04 09:36 -0700
                Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 21:02 +0100
                Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 01:01 +0000
                Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-04 16:04 +1000
              Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 23:03 -0700
                Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-06 07:11 +0000
                Re: how to avoid leading white spaces "Octavian Rasnita" <orasnita@gmail.com> - 2011-06-06 11:51 +0300
                Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-06 19:01 +1000
                Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-06 07:33 -0700
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 11:37 -0700
                Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-07 20:30 -0400
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:38 -0700
                Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 09:14 -0700
                Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 01:27 -0700
                Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-06 15:29 +0000
                Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:06 -0600
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 09:00 -0700
                Re: how to avoid leading white spaces Duncan Booth <duncan.booth@invalid.invalid> - 2011-06-08 09:01 +0000
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:39 -0700
          Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-05 04:17 -0700

csiph-web