| Started by | Chris Rebert <clp2@rebertia.com> |
|---|---|
| First post | 2011-06-01 10:11 -0700 |
| Last post | 2011-06-05 04:17 -0700 |
| Articles | 4 on this page of 64 — 19 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: how to avoid leading white spaces Chris Rebert <clp2@rebertia.com> - 2011-06-01 10:11 -0700
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-01 12:39 -0700
Re: how to avoid leading white spaces Karim <karim.liateni@free.fr> - 2011-06-01 22:34 +0200
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-02 13:21 +0000
Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 21:57 -0400
Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 03:41 +0100
Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 02:58 +0000
Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-02 23:44 -0400
Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:52 +1000
Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-03 13:54 +1000
Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 04:30 +0000
Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:11 +0100
Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-03 14:18 +0100
Re: how to avoid leading white spaces Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-06-04 13:41 +1200
Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 20:44 +0100
Re: how to avoid leading white spaces Ian <hobson42@gmail.com> - 2011-06-06 22:04 +0100
Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-09 02:32 +0000
Re: how to avoid leading white spaces Thorsten Kampe <thorsten@thorstenkampe.de> - 2011-06-03 10:32 +0200
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 05:51 -0700
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 13:17 +0000
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 08:14 -0700
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-03 14:25 +0000
Re: how to avoid leading white spaces "D'Arcy J.M. Cain" <darcy@druid.net> - 2011-06-03 10:58 -0400
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-03 12:29 -0700
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-03 20:49 +0000
Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-03 21:45 +0000
Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-03 15:11 -0700
Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-03 23:38 +0100
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:47 -0700
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 22:44 -0700
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 16:08 +0000
Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:29 -0600
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:17 +0000
Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:40 -0600
Re: how to avoid leading white spaces Neil Cerutti <neilc@norwich.edu> - 2011-06-06 17:56 +0000
Re: how to avoid leading white spaces Ethan Furman <ethan@stoneleaf.us> - 2011-06-06 10:48 -0700
Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 11:42 -0600
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 02:05 +0000
Re: how to avoid leading white spaces MRAB <python@mrabarnett.plus.com> - 2011-06-04 03:24 +0100
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 04:59 +0000
Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-03 22:30 -0400
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-04 05:14 +0000
Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-04 09:39 -0400
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 00:44 +0000
Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-04 09:36 -0700
Re: how to avoid leading white spaces Nobody <nobody@nowhere.com> - 2011-06-04 21:02 +0100
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-05 01:01 +0000
Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-04 16:04 +1000
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-05 23:03 -0700
Re: how to avoid leading white spaces Chris Torek <nospam@torek.net> - 2011-06-06 07:11 +0000
Re: how to avoid leading white spaces "Octavian Rasnita" <orasnita@gmail.com> - 2011-06-06 11:51 +0300
Re: how to avoid leading white spaces Chris Angelico <rosuav@gmail.com> - 2011-06-06 19:01 +1000
Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-06 07:33 -0700
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 11:37 -0700
Re: how to avoid leading white spaces Roy Smith <roy@panix.com> - 2011-06-07 20:30 -0400
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:38 -0700
Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 09:14 -0700
Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-08 01:27 -0700
Re: how to avoid leading white spaces Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-06 15:29 +0000
Re: how to avoid leading white spaces Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-06 10:06 -0600
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-07 09:00 -0700
Re: how to avoid leading white spaces Duncan Booth <duncan.booth@invalid.invalid> - 2011-06-08 09:01 +0000
Re: how to avoid leading white spaces "rurpy@yahoo.com" <rurpy@yahoo.com> - 2011-06-08 07:39 -0700
Re: how to avoid leading white spaces rusi <rustompmody@gmail.com> - 2011-06-05 04:17 -0700
| From | "rurpy@yahoo.com" <rurpy@yahoo.com> |
|---|---|
| Date | 2011-06-07 09:00 -0700 |
| Message-ID | <cd321576-407f-4eee-8f5c-4f14a45927e2@l26g2000yqm.googlegroups.com> |
| In reply to | #7087 |
On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
> On Sun, 05 Jun 2011 23:03:39 -0700, rurpy@yahoo.com wrote:
[...]
> I would argue that the first, non-regex solution is superior, as it
> clearly distinguishes the multiple steps of the solution:
>
> * filter lines that start with "CUSTOMER"
> * extract fields in that line
> * validate fields (not shown in your code snippet)
>
> while the regex tries to do all of these in a single command. This makes
> the regex an "all or nothing" solution: it matches *everything* or
> *nothing*. This means that your opportunity for giving meaningful error
> messages is much reduced. E.g. I'd like to give an error message like:
>
> found digit in customer name (field 2)
>
> but with your regex, if it fails to match, I have no idea why it failed,
> so can't give any more meaningful error than:
>
> invalid customer line
>
> and leave it to the caller to determine what makes it invalid. (Did I
> misspell "CUSTOMER"? Put a dot after the initial? Forget the code? Use
> two spaces between fields instead of one?)

I agree that is a legitimate criticism. Its importance depends
greatly on the purpose and consumers of the code. While such
detailed error messages might be appropriate in a fully polished
product, in my case I often have to process files personally
to extract information, or to provide code to others (who
typically have at least some degree of technical sophistication)
to do the same.

In this case, being able to code something quickly, and to adapt it
quickly to changes, is more important than providing highly
detailed error messages. The format is simple enough that
"invalid customer line" plus the line number is perfectly
adequate. YMMV. As I said, regexes are a tool, like any tool,
to be used appropriately.

[...]
>> In addition to being wrong (loading is done once, compilation is
>> typically done once or a few times, while the regex is used many times
>> inside a loop so the overhead cost is usually trivial compared with the
>> cost of starting Python or reading a file), this is another
>> micro-optimization argument.
>
> Yes, but you have to pay the cost of loading the re engine, even if it is
> a one off cost, it's still a cost,

~$ time python -c 'pass'
real 0m0.015s
user 0m0.011s
sys 0m0.003s

~$ time python -c 'import re'
real 0m0.015s
user 0m0.011s
sys 0m0.003s

Or do you mean something else by "loading the re engine"?

> and sometimes (not always!) it can be
> significant. It's quite hard to write fast, tiny Python scripts, because
> the initialization costs of the Python environment are so high. (Not as
> high as for, say, VB or Java, but much higher than, say, shell scripts.)
> In a tiny script, you may be better off avoiding regexes because it takes
> longer to load the engine than to run the rest of your script!

Do you have an example? I am having a hard time imagining that.
Perhaps you are thinking of the time required to compile a RE?

~$ time python -c 'import re; re.compile(r"^[^()]*(\([^()]*\)[^()]*)* $")'
real 0m0.017s
user 0m0.014s
sys 0m0.003s

Hard to imagine a case where 15 ms is fast enough but 17 ms
is too slow. And that's without the diluting effect of actually
doing some real work in the script. Of course a more complex
regex would likely take longer.

(The times vary greatly on my machine; I am quoting the most
common lowest, but not absolutely lowest, results.)

>>>> (Note that "Apocalypse" is referring to a series of Perl design
>>>> documents and has nothing to do with regexes in particular.)
>>>
>>> But Apocalypse 5 specifically has everything to do with regexes. That's
>>> why I linked to that, and not (say) Apocalypse 2.
>>
>> Where did I suggest that you should have linked to Apocalypse 2? I wrote
>> what I wrote to point out that the "Apocalypse" title was not a
>> pejorative comment on regexes. I don't see how I could have been
>> clearer.
>
> Possibly by saying what you just said here?
>
> I never suggested, or implied, or thought, that "Apocalypse" was a
> pejorative comment on *regexes*. The fact that I referenced Apocalypse
> FIVE suggests strongly that there are at least four others, presumably
> not about regexes.

Nor did I ever suggest you did. Don't forget that you are
not the only person reading this list. The comment was for
the benefit of others. Perhaps you are being overly sensitive?

> [...]
>>> If regexes were more readable, as proposed by Wall, that would go a
>>> long way to reducing my suspicion of them.
>>
>> I am delighted to read that you find the new syntax more acceptable.
>
> Perhaps I wasn't as clear as I could have been. I don't know what the new
> syntax is. I was referring to the design principle of improving the
> readability of regexes. Whether Wall's new syntax actually does improve
> readability and ease of maintenance is a separate issue, one on which I
> don't have an opinion on. I applaud his *intention* to reform regex
> syntax, without necessarily agreeing that he has done so.

Thanks for clarifying. But since you earlier wrote in response
to MRAB,
http://groups.google.com/group/comp.lang.python/msg/43f3a81d9cc75217?
"Have you considered the suggested Perl 6 syntax? Much of it looks
good to me."
I'm sure you can understand my confusion.
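
To make the trade-off argued at the top of this post concrete, here is a minimal sketch of both styles. The line format (`CUSTOMER <name> <initial> <code>`, single-space separated) and the function names are assumptions made up for illustration; the real format and code from earlier in the thread are not shown on this page, and the two functions are not strictly equivalent in what they accept.

```python
import re

# Hypothetical line format (an assumption, not taken from the thread):
#   CUSTOMER <name> <initial> <code>
CUSTOMER_RE = re.compile(r"^CUSTOMER ([A-Za-z]+) ([A-Z]) (\d+)$")


def parse_with_regex(line):
    """All-or-nothing: a failed match says nothing about which field was bad."""
    m = CUSTOMER_RE.match(line)
    if m is None:
        raise ValueError("invalid customer line")
    return m.groups()


def parse_stepwise(line):
    """Filter, extract, then validate, reporting which step failed."""
    if not line.startswith("CUSTOMER"):
        raise ValueError("line does not start with CUSTOMER")
    fields = line.split(" ")
    if len(fields) != 4:
        raise ValueError("expected 4 fields, found %d" % len(fields))
    _, name, initial, code = fields
    if any(ch.isdigit() for ch in name):
        raise ValueError("found digit in customer name (field 2)")
    if len(initial) != 1 or not initial.isalpha():
        raise ValueError("initial must be a single letter (field 3)")
    if not code.isdigit():
        raise ValueError("code must be numeric (field 4)")
    return name, initial, code
```

The regex version is shorter and quicker to adapt, while the stepwise version costs more lines in exchange for the per-field messages Steven asks for; which matters more depends on who consumes the output, as discussed above.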
| From | Duncan Booth <duncan.booth@invalid.invalid> |
|---|---|
| Date | 2011-06-08 09:01 +0000 |
| Message-ID | <Xns9EFE65FF11DEDduncanbooth@127.0.0.1> |
| In reply to | #7160 |
"rurpy@yahoo.com" <rurpy@yahoo.com> wrote:
> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>> Yes, but you have to pay the cost of loading the re engine, even if
>> it is a one off cost, it's still a cost,
>
> ~$ time python -c 'pass'
> real 0m0.015s
> user 0m0.011s
> sys 0m0.003s
>
> ~$ time python -c 'import re'
> real 0m0.015s
> user 0m0.011s
> sys 0m0.003s
>
> Or do you mean something else by "loading the re engine"?
At least part of the reason that there's no difference there is that the
're' module was imported in both cases:
C:\Python27>python -c "import sys; print('re' in sys.modules)"
True
C:\Python32>python -c "import sys; print('re' in sys.modules)"
True
Steven is right to assert that there's a cost to loading it, but unless you
jump through hoops it's not a cost you can avoid paying and still use
Python.
--
Duncan Booth http://kupuguy.blogspot.com
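
The check above can be repeated from a script. The following is a minimal sketch, with `loaded_at_startup` being a helper name invented here; it starts a fresh interpreter and asks whether a module is already in `sys.modules`. The answer for `re` is an assumption-free measurement only for the interpreter you run it on: it may differ between Python versions and site configurations from the 2.7 and 3.2 results shown in the post.

```python
import subprocess
import sys


def loaded_at_startup(module_name):
    """Return True if a fresh interpreter already has module_name in sys.modules."""
    code = "import sys; print(%r in sys.modules)" % module_name
    # Run the same interpreter that is executing this script.
    out = subprocess.check_output([sys.executable, "-c", code])
    return out.decode().strip() == "True"


if __name__ == "__main__":
    print("re loaded at startup:", loaded_at_startup("re"))
```

If this reports True, a later `import re` in the script only looks up the already-loaded module, which is why the two `time` runs quoted above are indistinguishable.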
| From | "rurpy@yahoo.com" <rurpy@yahoo.com> |
|---|---|
| Date | 2011-06-08 07:39 -0700 |
| Message-ID | <fe1574e0-439e-443f-9cea-16445b6db8c3@dq9g2000vbb.googlegroups.com> |
| In reply to | #7224 |
On 06/08/2011 03:01 AM, Duncan Booth wrote:
> "rurpy@yahoo.com" <rurpy@yahoo.com> wrote:
>> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>>> Yes, but you have to pay the cost of loading the re engine, even if
>>> it is a one off cost, it's still a cost,
[...]
> At least part of the reason that there's no difference there is that the
> 're' module was imported in both cases:

Quite right. I should have thought of that.

[...]
> Steven is right to assert that there's a cost to loading it, but unless you
> jump through hoops it's not a cost you can avoid paying and still use
> Python.

I would say that it is effectively zero cost then.
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2011-06-05 04:17 -0700 |
| Message-ID | <b9ebc697-9db3-4a2c-a3f3-c997c68e3d98@q12g2000prb.googlegroups.com> |
| In reply to | #6946 |
On Jun 3, 7:25 pm, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> Regarding their syntax, I'd like to point out that even Larry Wall is
> dissatisfied with regex culture in the Perl community:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
This is a very good link.
And it can be a starting point for Python to leapfrog over Perl.
After all, for Perl, changing the regex syntax/semantics means deep
surgery to the language. For Python it's just another module.
In particular, there is something that is possible and easy today that
was not conceivable 20 years ago -- using Unicode.
Much of the regex problem stems from what LW calls 'poor Huffman
coding'.
And much of that is thanks to the fact that regexes need different
kinds of grouping, but the hegemony of ASCII has forced a
multicharacter rendering for most of those.
A snip from the article:
----------------------------------
Consider these constructs:
(??{...})
(?{...})
(?#...)
(?:...)
(?i:...)
(?=...)
(?!...)
(?<=...)
(?<!...)
(?>...)
(?(...)...|...)
These all look quite similar, but some of them do radically different
things. In particular, the (?<...) does not mean the opposite of the
(?>...). The underlying visual problem is the overuse of parentheses, as
in Lisp. Programs are more readable if different things look
different.
----------------------------------
Some parenthesis usage shown here
http://xahlee.blogspot.com/2011/05/use-of-unicode-matching-brackets-as.html
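
As a small illustration of the point in the quoted snippet, the sketch below runs four similar-looking parenthesized constructs through Python's `re` module. The sample string and variable names are made up for this example; only constructs that Python's `re` accepts are used, so the Perl-only forms from the list are left out.

```python
import re

# Made-up sample text for illustration.
text = "USD 100 EUR 200"

# (...) captures: findall returns only the captured group.
capturing = re.findall(r"(USD|EUR) \d+", text)

# (?:...) groups without capturing: findall returns the whole match.
non_capturing = re.findall(r"(?:USD|EUR) \d+", text)

# (?=...) is a lookahead: " 100" must follow but is not part of the match.
lookahead = re.findall(r"[A-Z]+(?= 100)", text)

# (?<=...) is a lookbehind: "EUR " must precede but is not part of the match.
lookbehind = re.findall(r"(?<=EUR )\d+", text)

print(capturing)      # ['USD', 'EUR']
print(non_capturing)  # ['USD 100', 'EUR 200']
print(lookahead)      # ['USD']
print(lookbehind)     # ['200']
```

Four patterns that differ by one or two characters after the opening parenthesis give four different kinds of result, which is the readability problem the article describes.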