

Groups > comp.lang.python > #6811 > unrolled thread

Re: how to avoid leading white spaces

Started by: Chris Rebert <clp2@rebertia.com>
First post: 2011-06-01 10:11 -0700
Last post: 2011-06-05 04:17 -0700
Articles 4 on this page of 64 — 19 participants


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: how to avoid leading white spaces Chris Rebert <clp2@rebertia.com> - 2011-06-01 10:11 -0700
    Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-01 12:39 -0700
      Re: how to avoid leading white spaces Karim <karim.liateni@free.fr> - 2011-06-01 22:34 +0200
      Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-02 13:21 +0000
        Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 21:57 -0400
          Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 03:41 +0100
          Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 02:58 +0000
            Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 23:44 -0400
              Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:52 +1000
              Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:54 +1000
              Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 04:30 +0000
                Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:11 +0100
            Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:18 +0100
            Re: how to avoid leading white spaces Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-06-04 13:41 +1200
              Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 20:44 +0100
            Re: how to avoid leading white spaces Ian <hobson42@gmail.com> - 2011-06-06 22:04 +0100
              Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-09 02:32 +0000
          Re: how to avoid leading white spaces Thorsten Kampe <thorsten@thorstenkampe.de> - 2011-06-03 10:32 +0200
        Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 05:51 -0700
          Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 13:17 +0000
            Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 08:14 -0700
          Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-03 14:25 +0000
            Re: how to avoid leading white spaces "D'Arcy J.M. Cain" <darcy@druid.net> - 2011-06-03 10:58 -0400
            Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 12:29 -0700
              Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 20:49 +0000
                Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 21:45 +0000
                  Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-03 15:11 -0700
                  Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 23:38 +0100
                  Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:47 -0700
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:44 -0700
                  Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 16:08 +0000
                    Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:29 -0600
                      Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:17 +0000
                        Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:40 -0600
                          Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:56 +0000
                    Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-06 10:48 -0700
                    Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:42 -0600
              Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 02:05 +0000
                Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-04 03:24 +0100
                  Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 04:59 +0000
                Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-03 22:30 -0400
                  Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 05:14 +0000
                    Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-04 09:39 -0400
                      Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 00:44 +0000
                    Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-04 09:36 -0700
                    Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 21:02 +0100
                      Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 01:01 +0000
                  Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-04 16:04 +1000
                Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 23:03 -0700
                  Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-06 07:11 +0000
                    Re: how to avoid leading white spaces "Octavian Rasnita" <orasnita@gmail.com> - 2011-06-06 11:51 +0300
                    Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-06 19:01 +1000
                    Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-06 07:33 -0700
                      Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 11:37 -0700
                        Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-07 20:30 -0400
                          Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:38 -0700
                            Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 09:14 -0700
                        Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 01:27 -0700
                  Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-06 15:29 +0000
                    Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:06 -0600
                    Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 09:00 -0700
                      Re: how to avoid leading white spaces Duncan Booth <duncan.booth@invalid.invalid> - 2011-06-08 09:01 +0000
                        Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:39 -0700
            Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-05 04:17 -0700

Page 4 of 4 — ← Prev page 1 2 3 [4]


#7160

From: "rurpy@yahoo.com" <rurpy@yahoo.com>
Date: 2011-06-07 09:00 -0700
Message-ID: <cd321576-407f-4eee-8f5c-4f14a45927e2@l26g2000yqm.googlegroups.com>
In reply to: #7087

On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
> On Sun, 05 Jun 2011 23:03:39 -0700, rurpy@yahoo.com wrote:
[...]
> I would argue that the first, non-regex solution is superior, as it
> clearly distinguishes the multiple steps of the solution:
>
> * filter lines that start with "CUSTOMER"
> * extract fields in that line
> * validate fields (not shown in your code snippet)
>
> while the regex tries to do all of these in a single command. This makes
> the regex an "all or nothing" solution: it matches *everything* or
> *nothing*. This means that your opportunity for giving meaningful error
> messages is much reduced. E.g. I'd like to give an error message like:
>
>     found digit in customer name (field 2)
>
> but with your regex, if it fails to match, I have no idea why it failed,
> so can't give any more meaningful error than:
>
>     invalid customer line
>
> and leave it to the caller to determine what makes it invalid. (Did I
> misspell "CUSTOMER"? Put a dot after the initial? Forget the code? Use
> two spaces between fields instead of one?)

I agree that is a legitimate criticism.  Its importance depends
greatly on the purpose and consumers of the code.  While such
detailed error messages might be appropriate in a fully polished
product, in my case, I often have to process files personally
to extract information, or to provide code to others (who typically
have at least some degree of technical sophistication) to do the
same.

In this case, being able to code something quickly, and adapt it
quickly to changes is more important than providing highly detailed
error messages.  The format is simple enough that "invalid customer
line" and the line number are perfectly adequate.  YMMV.

As I said, regexes are a tool, like any tool, to be used
appropriately.
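
To make the trade-off above concrete, here is a minimal sketch of both
styles.  The line format (CUSTOMER, a name, a single initial, a numeric
code, separated by single spaces) is an assumption made up for
illustration; the thread does not show the real format, and the
function names are hypothetical too.

```python
import re

# Hypothetical line format (an assumption; the thread does not show it):
#   CUSTOMER <name> <initial> <code>     e.g. "CUSTOMER Smith J 1234"
LINE_RE = re.compile(r"^CUSTOMER ([A-Za-z]+) ([A-Z]) (\d+)$")


def parse_with_regex(line):
    """All-or-nothing: the whole line matches, or we learn nothing more."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("invalid customer line")
    return m.groups()


def parse_stepwise(line):
    """Filter, split, then validate each field with its own message."""
    if not line.startswith("CUSTOMER"):
        raise ValueError("line does not start with CUSTOMER")
    fields = line.split(" ")
    if len(fields) != 4:
        raise ValueError("expected 4 fields, found %d" % len(fields))
    _, name, initial, code = fields
    if any(ch.isdigit() for ch in name):
        raise ValueError("found digit in customer name (field 2)")
    if len(initial) != 1 or not initial.isalpha():
        raise ValueError("initial must be a single letter (field 3)")
    if not code.isdigit():
        raise ValueError("code must be numeric (field 4)")
    return name, initial, code
```

The regex version is shorter and quicker to adapt; the stepwise version
can say which field was wrong.  Which matters more depends on who reads
the error messages.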

[...]
>> In addition to being wrong (loading is done once, compilation is
>> typically done once or a few times, while the regex is used many times
>> inside a loop so the overhead cost is usually trivial compared with the
>> cost of starting Python or reading a file), this is another
>> micro-optimization argument.
>
> Yes, but you have to pay the cost of loading the re engine, even if it is
> a one off cost, it's still a cost,

~$ time python -c 'pass'
real	0m0.015s
user	0m0.011s
sys	0m0.003s

~$ time python -c 'import re'
real	0m0.015s
user	0m0.011s
sys	0m0.003s

Or do you mean something else by "loading the re engine"?

> and sometimes (not always!) it can be
> significant. It's quite hard to write fast, tiny Python scripts, because
> the initialization costs of the Python environment are so high. (Not as
> high as for, say, VB or Java, but much higher than, say, shell scripts.)
> In a tiny script, you may be better off avoiding regexes because it takes
> longer to load the engine than to run the rest of your script!

Do you have an example?  I am having a hard time imagining that.
Perhaps you are thinking of the time required to compile a RE?

~$ time python -c 'import re; re.compile(r"^[^()]*(\([^()]*\)[^()]*)*$")'
real	0m0.017s
user	0m0.014s
sys	0m0.003s

Hard to imagine a case where 15 ms is fast enough but
17 ms is too slow.  And that's without the diluting effect
of actually doing some real work in the script.  Of course
a more complex regex would likely take longer.

(The times vary greatly on my machine, I am quoting the most
common lowest but not absolutely lowest results.)
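
For anyone who wants to repeat the compile measurement from inside
Python instead of with the shell's time command, a rough sketch
follows, assuming Python 3.  re.purge() clears the module's pattern
cache so that every call really recompiles; the helper name
compile_uncached is my own, not something from this thread, and the
absolute numbers will differ from machine to machine.

```python
import re
import timeit

# Same pattern as in the shell command above.
PATTERN = r"^[^()]*(\([^()]*\)[^()]*)*$"


def compile_uncached():
    """Compile PATTERN from scratch, bypassing re's internal cache."""
    re.purge()
    return re.compile(PATTERN)


RUNS = 200
# Average wall-clock seconds for one uncached compilation.
per_compile = timeit.timeit(compile_uncached, number=RUNS) / RUNS
print("one uncached compile: %.1f microseconds" % (per_compile * 1e6))
```

This isolates the compile step only; interpreter start-up, which
dominated the shell timings, is not included.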

>>>> (Note that "Apocalypse" is referring to a series of Perl design
>>>> documents and has nothing to do with regexes in particular.)
>>>
>>> But Apocalypse 5 specifically has everything to do with regexes. That's
>>> why I linked to that, and not (say) Apocalypse 2.
>>
>> Where did I suggest that you should have linked to Apocalypse 2? I wrote
>> what I wrote to point out that the "Apocalypse" title was not a
>> pejorative comment on regexes.  I don't see how I could have been
>> clearer.
>
> Possibly by saying what you just said here?
>
> I never suggested, or implied, or thought, that "Apocalypse" was a
> pejorative comment on *regexes*. The fact that I referenced Apocalypse
> FIVE suggests strongly that there are at least four others, presumably
> not about regexes.

Nor did I ever suggest you did.  Don't forget that you are
not the only person reading this list.  The comment was for
the benefit of others.  Perhaps you are being overly sensitive?

> [...]
>>> If regexes were more readable, as proposed by Wall, that would go a
>>> long way to reducing my suspicion of them.
>>
>> I am delighted to read that you find the new syntax more acceptable.
>
> Perhaps I wasn't as clear as I could have been. I don't know what the new
> syntax is. I was referring to the design principle of improving the
> readability of regexes. Whether Wall's new syntax actually does improve
> readability and ease of maintenance is a separate issue, one on which I
> don't have an opinion. I applaud his *intention* to reform regex
> syntax, without necessarily agreeing that he has done so.

Thanks for clarifying.  But since you earlier wrote in response
to MRAB,
http://groups.google.com/group/comp.lang.python/msg/43f3a81d9cc75217?

  "Have you considered the suggested Perl 6 syntax? Much of
  it looks good to me."

I'm sure you can understand my confusion.



#7224

From: Duncan Booth <duncan.booth@invalid.invalid>
Date: 2011-06-08 09:01 +0000
Message-ID: <Xns9EFE65FF11DEDduncanbooth@127.0.0.1>
In reply to: #7160

"rurpy@yahoo.com" <rurpy@yahoo.com> wrote:
> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>> Yes, but you have to pay the cost of loading the re engine, even if
>> it is a one off cost, it's still a cost,
> 
> ~$ time python -c 'pass'
> real     0m0.015s
> user     0m0.011s
> sys     0m0.003s
> 
> ~$ time python -c 'import re'
> real     0m0.015s
> user     0m0.011s
> sys     0m0.003s
> 
> Or do you mean something else by "loading the re engine"?

At least part of the reason that there's no difference there is that the 
're' module was imported in both cases:

C:\Python27>python -c "import sys; print('re' in sys.modules)"
True

C:\Python32>python -c "import sys; print('re' in sys.modules)"
True

Steven is right to assert that there's a cost to loading it, but unless you 
jump through hoops it's not a cost you can avoid paying and still use 
Python.
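
The same check can be scripted so that it runs against whichever
interpreter is at hand.  This is only a sketch, assuming Python 3.7 or
later (for capture_output); the function name is hypothetical.  The
answer depends on the Python version and on how the site module is
configured, so it need not match the 2.7 and 3.2 results above.

```python
import subprocess
import sys


def re_loaded_at_startup(extra_flags=()):
    """Start a fresh interpreter and report whether 're' is already
    in sys.modules before the user's code imports anything."""
    cmd = [sys.executable, *extra_flags,
           "-c", "import sys; print('re' in sys.modules)"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"


print("re imported at startup:", re_loaded_at_startup())
# -S skips the site module, one of the "hoops" that changes what
# gets imported during start-up.
print("with -S:", re_loaded_at_startup(["-S"]))
```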

-- 
Duncan Booth http://kupuguy.blogspot.com



#7239

From: "rurpy@yahoo.com" <rurpy@yahoo.com>
Date: 2011-06-08 07:39 -0700
Message-ID: <fe1574e0-439e-443f-9cea-16445b6db8c3@dq9g2000vbb.googlegroups.com>
In reply to: #7224

On 06/08/2011 03:01 AM, Duncan Booth wrote:
> "rurpy@yahoo.com" <rurpy@yahoo.com> wrote:
>> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>>> Yes, but you have to pay the cost of loading the re engine, even if
>>> it is a one off cost, it's still a cost,
[...]
> At least part of the reason that there's no difference there is that the
> 're' module was imported in both cases:

Quite right.  I should have thought of that.

[...]
> Steven is right to assert that there's a cost to loading it, but unless you
> jump through hoops it's not a cost you can avoid paying and still use
> Python.

I would say that it is effectively zero cost then.



#7043

From: rusi <rustompmody@gmail.com>
Date: 2011-06-05 04:17 -0700
Message-ID: <b9ebc697-9db3-4a2c-a3f3-c997c68e3d98@q12g2000prb.googlegroups.com>
In reply to: #6946

On Jun 3, 7:25 pm, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:

> Regarding their syntax, I'd like to point out that even Larry Wall is
> dissatisfied with regex culture in the Perl community:
>
> http://www.perl.com/pub/2002/06/04/apo5.html

This is a very good link.
And it can be a starting point for Python to leapfrog over Perl.
After all, for Perl, changing the regex syntax/semantics means deep
surgery to the language. For Python it's just another module.

In particular, there is something that is possible and easy today that
was not conceivable 20 years ago: using Unicode.
Much of the regex problem stems from what LW calls 'poor Huffman
coding'.
And much of that is thanks to the fact that regexes need different
kinds of grouping, but the hegemony of ASCII has forced a
multicharacter rendering for most of those.

A snip from the article:

----------------------------------
Consider these constructs:

    (??{...})
    (?{...})
    (?#...)
    (?:...)
    (?i:...)
    (?=...)
    (?!...)
    (?<=...)
    (?<!...)
    (?>...)
    (?(...)...|...)

These all look quite similar, but some of them do radically different
things. In particular, the (?<...) does not mean the opposite of the
(?>...). The underlying visual problem is the overuse of parentheses, as
in Lisp. Programs are more readable if different things look
different.

----------------------------------
Some parenthesis usage shown here
http://xahlee.blogspot.com/2011/05/use-of-unicode-matching-brackets-as.html
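
To illustrate the point about look-alike constructs, here is a short
sketch using the subset of them that Python's re module accepts.  The
sample strings are made up, and the scoped flag (?i:...) needs
Python 3.6 or later.

```python
import re

text = "price: $42, qty: 7"

# (?<=...) positive lookbehind: digits preceded by a dollar sign.
lookbehind = re.findall(r"(?<=\$)\d+", text)            # ['42']

# (?<!...) negative lookbehind: digits NOT preceded by '$' or a digit.
neg_lookbehind = re.findall(r"(?<![\$\d])\d+", text)    # ['7']

# (?=...) positive lookahead: words followed by a colon.
lookahead = re.findall(r"\w+(?=:)", text)               # ['price', 'qty']

# (?!...) negative lookahead: whole words NOT followed by a colon.
neg_lookahead = re.findall(r"\b\w+\b(?!:)", text)       # ['42', '7']

# (?:...) groups without capturing; (...) captures, which changes
# what findall returns.
non_capturing = re.findall(r"(?:ab)+", "ababab")        # ['ababab']
capturing = re.findall(r"(ab)+", "ababab")              # ['ab']

# (?i:...) turns on case-insensitivity for just that group.
scoped_flag = re.fullmatch(r"(?i:customer) \d+", "CuStOmEr 12") is not None
```

Six constructs, one or two characters apart, with quite different
behaviour; that is the readability complaint in a nutshell.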


