Groups > comp.lang.python > #98121 > unrolled thread
| Started by | Seymore4Head <Seymore4Head@Hotmail.invalid> |
|---|---|
| First post | 2015-11-02 20:09 -0500 |
| Last post | 2015-11-03 22:15 +0000 |
| Articles | 20 on this page of 106 — 30 participants |
Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 20:09 -0500
Re: Regular expressions MRAB <python@mrabarnett.plus.com> - 2015-11-03 01:19 +0000
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-02 20:42 -0600
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-02 22:58 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:38 -0700
Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:33 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 19:04 -0700
Re: Regular expressions Dan Sommers <dan@tombstonezero.net> - 2015-11-04 02:55 +0000
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:23 +1100
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 20:47 -0700
Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-04 13:27 +0000
Re: Regular expressions Nobody <nobody@nowhere.invalid> - 2015-11-04 05:05 +0000
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-04 09:57 +0100
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:28 +1100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 20:48 -0600
Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:03 +1100
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-05 09:33 +0100
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 23:05 +1100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-05 08:00 -0600
Re: Regular expressions Albert van der Horst <albert@spenarnc.xs4all.nl> - 2015-11-05 13:39 +0000
Re: Regular expressions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-04 08:00 -0500
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-04 08:13 -0700
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:00 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:24 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:24 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:59 -0800
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 09:18 +0100
Re: Regular expressions rurpy@yahoo.com - 2015-11-06 11:52 -0800
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-06 21:36 +0100
Re: Regular expressions Larry Martell <larry.martell@gmail.com> - 2015-11-06 15:42 -0500
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:34 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 22:27 -0800
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:42 -0600
Re: Regular expressions Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-11-05 20:55 +1300
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:06 +1100
What does “grep” stand for? (was: Regular expressions) Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 05:24 +1100
Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 20:38 +0100
Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:42 +1100
Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 08:32 +0100
Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:00 +1100
Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 10:19 -0500
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 18:29 +0000
Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 14:56 -0500
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 20:19 +0000
Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-05 20:18 -0500
Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-05 19:36 -0800
Re: What does “grep” stand for? Dan Sommers <dan@tombstonezero.net> - 2015-11-06 05:31 +0000
Re: What does “grep” stand for? William Ray Wing <wrw@mac.com> - 2015-11-06 08:25 -0500
Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-06 19:21 -0800
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-06 14:15 +0000
Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-06 20:03 -0500
Re: What does “grep” stand for? (was: Regular expressions) Tim Chase <python.list@tim.thechases.com> - 2015-11-04 13:05 -0600
Re: Regular expressions Terry Reedy <tjreedy@udel.edu> - 2015-11-04 18:08 -0500
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:29 -0500
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 21:12 -0600
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 14:26 +1100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:48 +1100
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 08:21 +0100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 19:47 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:43 -0800
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:38 -0800
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 01:52 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:13 -0800
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:33 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:42 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:26 +1100
Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:07 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:54 -0800
Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-05 10:14 +0100
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:02 -0500
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 11:54 +1100
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-05 10:07 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-06 12:46 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-03 18:15 +1100
Re: Regular expressions Nick Sarbicki <nick.a.sarbicki@gmail.com> - 2015-11-03 08:43 +0000
Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:22 -0800
Re: Regular expressions Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-03 12:38 +0000
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:53 -0600
Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-03 10:34 -0500
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-03 11:10 -0500
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 03:20 +1100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:35 +1100
Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-04 12:41 +0100
Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 14:56 +0000
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 20:51 -0700
Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:33 -0700
Re: Regular expressions Robin Koch <robin.koch@t-online.de> - 2015-11-03 23:58 +0100
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 10:25 +0100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:50 -0600
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 15:00 +0100
Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 17:12 +0200
Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 16:35 +0100
Re: Irregular last line in a text file, was Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 18:42 +0200
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 10:56 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:39 +1100
Re: Irregular last line in a text file, was Re: Regular expressions Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-04 10:07 +0000
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:33 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 18:44 +0100
Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:33 -0700
Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:39 -0700
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 13:45 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 22:15 +0000
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-11-05 23:05 +1100 |
| Message-ID | <563b45fa$0$1593$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #98291 |
On Thu, 5 Nov 2015 07:33 pm, Peter Otten wrote:
> Steven D'Aprano wrote:
>
>> On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>>
>>> I tried Tim's example
>>>
>>> $ seq 5 | grep '1*'
>>> 1
>>> 2
>>> 3
>>> 4
>>> 5
>>> $
>>
>> I don't understand this. What on earth is grep matching? How does "4"
>> match "1*"?
>
> Look for zero or more "1".
Doh!
Oh the shame, I knew that. Somehow I tangled myself in a knot,
thinking that it had to be 1 *followed by* zero or more characters.
But of course it's not a glob, it's a regex.
--
Steven
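The glob-versus-regex mix-up above can be reproduced in Python. This is only a sketch: the list of strings stands in for the output of `seq 5`, and the variable names are made up for illustration.

```python
import fnmatch
import re

# Stand-in for the five lines printed by `seq 5`.
lines = [str(n) for n in range(1, 6)]

# As a regex, "1*" means "zero or more '1' characters", so it matches
# the empty string at the start of every line.
regex_hits = [s for s in lines if re.search(r"1*", s)]

# As a glob, "1*" means "a '1' followed by anything".
glob_hits = fnmatch.filter(lines, "1*")

print(regex_hits)  # ['1', '2', '3', '4', '5']
print(glob_hits)   # ['1']
```

The same pattern text selects every line under regex rules and only one line under glob rules, which is the knot described in the post.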
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-11-05 08:00 -0600 |
| Message-ID | <mailman.52.1446732098.16136.python-list@python.org> |
| In reply to | #98297 |
On 2015-11-05 23:05, Steven D'Aprano wrote:
> Oh the shame, I knew that. Somehow I tangled myself in a knot,
> thinking that it had to be 1 *followed by* zero or more characters.
> But of course it's not a glob, it's a regex.
But that's a good reminder of fnmatch/glob modules too. Sometimes
all you need is to express a simple glob, in which case using a
regexp can cloud the clarity.
The overarching principle is to go for clarity & simplicity, rather
than favoring built-ins/glob/regex/parser modules all the time.
Want to test for presence in a string? Just use the builtin "a in b"
test. At the beginning/end? Use .startswith()/.endswith() for
clarity. Need to check if a string is purely
digits/alpha/alphanumerics/etc? Use the
string .is{alnum,alpha,decimal,digit,identifier,lower,numeric,printable,space,title,upper}
methods on the string.
For simple wild-carding, use the fnmatch module to do simple
globbing.
For more complex pattern matching, you've got regexps.
Finally, for occasions when you're searching for repeated/nested
structures, using an add-on module like pyparsing will give you
clearer code.
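A rough sketch of that ladder, from plainest tool to regexp, might look like the following. The file name and the patterns are invented for illustration, and pyparsing is left out because it is a third-party package.

```python
import fnmatch
import re

name = "report_2015.txt"  # hypothetical input

checks = [
    ("substring", "2015" in name),
    ("prefix", name.startswith("report")),
    ("suffix", name.endswith(".txt")),
    ("all digits", "2015".isdigit()),
    ("glob", fnmatch.fnmatch(name, "report_*.txt")),
    ("regexp", re.fullmatch(r"report_\d{4}\.txt", name) is not None),
]

for label, result in checks:
    print(f"{label:10} {result}")
```

Each step up the list buys more expressive power at some cost in readability, so it seems reasonable to stop at the first one that does the job.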
Oh, and with regexps, people should be less afraid of verbose
multi-line strings with commenting
r = re.compile(r"""
    ^                       # start of the string
    (?P<year>\d{4})         # capture 4 digits
    -                       # a literal dash
    (?P<month>\d{1,2})      # capture 1-2 digits
    -                       # another literal dash
    (?P<day>\d{1,2})        # capture 1-2 digits
    _                       # a literal underscore
    (?P<accountnum>         # capture the account-number
        [A-Z]{1,3}          # 1-3 letters
        \d+                 # followed by 1+ digits
    )
    \.txt                   # the extension of the file (ignored)
    $                       # the end of the string
    """, re.VERBOSE)
They are a LOT easier to come back to if you haven't touched the code
for a year.
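For what it's worth, here is a small usage sketch of a pattern like that one. The sample file names and the name FILENAME are assumptions made up for the example, not something from the post.

```python
import re

# Same shape as the commented pattern above, repeated so this runs alone.
FILENAME = re.compile(r"""
    ^
    (?P<year>\d{4})     # 4-digit year
    -
    (?P<month>\d{1,2})  # 1-2 digit month
    -
    (?P<day>\d{1,2})    # 1-2 digit day
    _
    (?P<accountnum>     # account number:
        [A-Z]{1,3}      #   1-3 capital letters
        \d+             #   then 1+ digits
    )
    \.txt
    $
    """, re.VERBOSE)

m = FILENAME.match("2015-11-05_AB1234.txt")  # hypothetical file name
fields = m.groupdict()
print(fields)

# Lower-case letters are not in [A-Z], so this one is rejected.
no_match = FILENAME.match("2015-11-05_ab1234.txt")
print(no_match)  # None
```

The named groups come back as a dict of strings, so any conversion to int or date is left to the caller.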
-tkc
| From | Albert van der Horst <albert@spenarnc.xs4all.nl> |
|---|---|
| Date | 2015-11-05 13:39 +0000 |
| Message-ID | <563b5c28$0$23757$e4fe514c@news.xs4all.nl> |
| In reply to | #98265 |
Steven D'Aprano <steve@pearwood.info> writes:
>On Wed, 4 Nov 2015 07:57 pm, Peter Otten wrote:
>> I tried Tim's example
>>
>> $ seq 5 | grep '1*'
>> 1
>> 2
>> 3
>> 4
>> 5
>> $
>I don't understand this. What on earth is grep matching? How does "4"
>match "1*"?
>> which surprised me because I remembered that there usually weren't any
>> matching lines when I invoked grep instead of egrep by mistake. So I tried
>> another one
>>
>> $ seq 5 | grep '[1-3]+'
>> $
>>
>> and then headed for the man page. Apparently there is a subset called
>> "basic regular expressions":
>>
>> """
>> Basic vs Extended Regular Expressions
>> In basic regular expressions the meta-characters ?, +, {, |, (,
>> and ) lose their special meaning; instead use the backslashed
>> versions \?, \+, \{, \|, \(, and \).
>> """
>None of this appears relevant, as the metacharacter * is not listed. So
>what's going on?
* is so fundamental that it never loses its special meaning.
Same for [ .
* means zero or more of the preceding char.
This makes + superfluous (a mere convenience) as
[1-3]+
can be expressed as
[1-3][1-3]*
Note that [1-3]* matches the empty string. This happens a lot.
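A quick way to check this in Python's re module (which uses the extended-style syntax, so + needs no backslash) is sketched below. The sample strings are arbitrary.

```python
import re

samples = ["", "1", "22", "123", "4", "a3b", "31x"]

plus = re.compile(r"[1-3]+")
expanded = re.compile(r"[1-3][1-3]*")
star = re.compile(r"[1-3]*")

# The two spellings find exactly the same substrings in every sample.
same = all(plus.findall(s) == expanded.findall(s) for s in samples)
print(same)  # True

# The starred form is satisfied by nothing at all, so it "finds" an
# empty match even in a string that has no 1, 2 or 3 in it.
print(repr(star.search("4").group()))  # ''
print(plus.search("4"))                # None
```

That empty match is why a pattern ending in * can appear to match every line of input.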
Groetjes Albert
>--
>Steven
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2015-11-04 08:00 -0500 |
| Message-ID | <mailman.16.1446642069.16136.python-list@python.org> |
| In reply to | #98203 |
On Wed, 04 Nov 2015 14:23:04 +1100, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following:
>
>I don't even know what grep stands for.
>
As I recall, something like: General(ized) Regular Expression Parser
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Michael Torrie <torriem@gmail.com> |
|---|---|
| Date | 2015-11-04 08:13 -0700 |
| Message-ID | <mailman.19.1446650043.16136.python-list@python.org> |
| In reply to | #98203 |
On 11/04/2015 01:57 AM, Peter Otten wrote:
> and then headed for the man page. Apparently there is a subset
> called "basic regular expressions":
>
> """
> Basic vs Extended Regular Expressions
> In basic regular expressions the meta-characters ?, +, {, |, (,
> and ) lose their special meaning; instead use the backslashed
> versions \?, \+, \{, \|, \(, and \).
> """
Good catch. I think this must have been what my brain was thinking when
I commented about grep and regular expressions earlier. I checked the
man page but didn't read down far enough.
I was still technically wrong though.
It's neat to learn so much on these tangents that the python list goes
on frequently. Hope the OP is still lurking, reading all these comments,
though I suspect he's not.
| From | Seymore4Head <Seymore4Head@Hotmail.invalid> |
|---|---|
| Date | 2015-11-04 18:00 -0500 |
| Message-ID | <3f3l3bpm478nbnsec6c9tr6rre0aontkq1@4ax.com> |
| In reply to | #98231 |
On Wed, 04 Nov 2015 08:13:51 -0700, Michael Torrie <torriem@gmail.com>
wrote:
>On 11/04/2015 01:57 AM, Peter Otten wrote:
>> and then headed for the man page. Apparently there is a subset
>> called "basic regular expressions":
>>
>> """
>> Basic vs Extended Regular Expressions
>> In basic regular expressions the meta-characters ?, +, {, |, (,
>> and ) lose their special meaning; instead use the backslashed
>> versions \?, \+, \{, \|, \(, and \).
>> """
>
>Good catch. I think this must have been what my brain was thinking when
>I commented about grep and regular expressions earlier. I checked the
>man page but didn't read down far enough.
>
>I was still technically wrong though.
>
>It's neat to learn so much on these tangents that the python list goes
>on frequently. Hope the OP is still lurking, reading all these comments,
>though I suspect he's not.
>
I am still here, but I have to admit I am not picking up too much.
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 16:24 -0800 |
| Message-ID | <5e15df62-00b1-4746-83f8-c0821514d20b@googlegroups.com> |
| In reply to | #98251 |
On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>[...]
> I am still here, but I have to admit I am not picking up too much.
The "take away" I recommend is: the folks here are often way overly
negative regarding regular expressions and that you not ignore them,
but take them with a BIG grain of salt and continue learning about
and using regexs. You will find they are an indispensable tool,
not just in Python programming but in many aspects of computer use.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-11-05 13:24 +1100 |
| Message-ID | <563abdda$0$1614$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #98259 |
On Thu, 5 Nov 2015 11:24 am, rurpy@yahoo.com wrote:
> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.
You will find them a useful tool, but not indispensable by any means.
Hint:
- How many languages make arithmetic a built-in part of the language? Almost
all of them. I don't know of any language that doesn't let you express
something like "1 + 1" using built-in functions or syntax. Arithmetic is
much closer to indispensable.
- How many languages make regular expressions a built-in part of the
language? Almost none of them. There's Perl, obviously, and its
predecessors sed and awk, and probably a few others, but most languages
relegate regular expressions to a library.
- How many useful programs can be written with regexes? Clearly there are
many. Some of them would even be quite difficult without regexes. (In
effect, you would have to invent your own pattern-matching code.)
- How many useful programs can be written without regexes? Clearly there are
also many. Every time you write a Python program and fail to import re,
you've written one.
Can you call yourself a well-rounded programmer without at least a basic
understanding of some regex library? Well, probably not. But that's part of
the problem with regexes. They have, to some degree, driven out potentially
better -- or at least differently bad -- pattern matching solutions, such
as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
even alternative idioms, like Hypercard's "chunking" idioms.
When all you have is a hammer, everything looks like a nail.
--
Steven
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 21:59 -0800 |
| Message-ID | <2fd7d161-b1cb-4274-b8dc-0157916413f1@googlegroups.com> |
| In reply to | #98264 |
On 11/04/2015 07:24 PM, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:24 am, wrote:
>
>> You will find they are an indispensable tool, not just in Python
>> programming but in many aspects of computer use.
>
> You will find them a useful tool, but not indispensable by any means.
>
> Hint:
>
> - How many languages make arithmetic a built-in part of the language? Almost
> all of them. I don't know of any language that doesn't let you express
> something like "1 + 1" using built-in functions or syntax. Arithmetic is
> much closer to indispensable.
By my count there are 2377. That's counting rpn languages where
it is 1 1 +. If you don't count them it is 2250.
> - How many languages make regular expressions a built-in part of the
> language? Almost none of them. There's Perl, obviously, and its
> predecessors sed and awk, and probably a few others, but most languages
> relegate regular expressions to a library.
Yes, like python relegates io to a library. Clearly useful but not
indispensable, after all who *really* needs anything beyond print()
and input(). And that stuff in math like sin() and exp(). How many
programs use that geeky trig stuff? Definitely not indispensable.
In fact, now that you pointed it out to me, clearly all that stdlib
stuff is dispensable, all one really needs to write "real programmer"
programs is just core python. Who the hell needs "sys"!
> - How many useful programs can be written with regexes? Clearly there are
> many. Some of them would even be quite difficult without regexes. (In
> effect, you would have to invent your own pattern-matching code.)
Lucky for me then that there are regexes.
> - How many useful programs can be written without regexes? Clearly there are
> also many. Every time you write a Python program and fail to import re,
> you've written one.
By golly, you're right. Not every program I write uses regexes.
Who would have thought?! However, you failed to establish that the
programs I write without re are useful.
> Can you call yourself a well-rounded programmer without at least a basic
> understanding of some regex library? Well, probably not. But that's part of
> the problem with regexes. They have, to some degree, driven out potentially
> better -- or at least differently bad -- pattern matching solutions, such
> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
> even alternative idioms, like Hypercard's "chunking" idioms.
Hmm, very good point. I wonder why all those "potentially better"
solutions have not been more widely adopted? A conspiracy by a
secret regex cabal?
> When all you have is a hammer, everything looks like a nail.
Lucky for us then, that we have more than just hammers!
Sorry for the flippant response (well, not really) but I find your
arguments pedantic beyond the point of absurdity. For me, regular
expressions are indispensable in that if they were not available in
Python I would not use Python. The same is true of a number of other
stdlib modules. I don't give a rat's ass whether they are in a
"library" that has to be explicitly requested with import or a
"library" that is automatically loaded at startup.
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-11-05 09:18 +0100 |
| Message-ID | <n1f39d$o44$1@dont-email.me> |
| In reply to | #98281 |
On 05.11.15 at 06:59, rurpy@yahoo.com wrote:
>> Can you call yourself a well-rounded programmer without at least a basic
>> understanding of some regex library? Well, probably not. But that's part of
>> the problem with regexes. They have, to some degree, driven out potentially
>> better -- or at least differently bad -- pattern matching solutions, such
>> as (E)BNF grammars, SNOBOL pattern matching, or lowly globbing patterns. Or
>> even alternative idioms, like Hypercard's "chunking" idioms.
>
> Hmm, very good point. I wonder why all those "potentially better"
> solutions have not been more widely adopted? A conspiracy by a
> secret regex cabal?
I'm mostly on the pro-side of the regex discussion, but this IS a valid
point. Regexes are not always a good way to express a pattern, even if
the pattern is regular. The point is that you can't build them up
easily piece-by-piece. Say you want a regex like "first an
international phone number, then a name, then a second phone number" -
you will have to *repeat* the pattern for phone number twice. In more
complex cases this can become a nightmare, like the monster that was
mentioned before to validate an email.
A better alternative, then, is PEG for example. You can easily write
pattern <- phone_number name phone_number
phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
name <- [[:alpha:]]+
or something similar using a PEG parser. It has almost the same
quantifiers as a regex, is much more readable, runs in linear time over
all inputs and can parse languages with approximately the same
complexity as the Knuth style parsers (LR(k) etc.), but without
ambiguity. I'm really astonished that PEG parsing is not better
supported in the world of computing, instead most people choose to
stick to the lexer+scanner combination.
Finally, an anecdote from my "early" life of computing. In 1990, when I
was 12 years old, I participated in an annual competition of computer
science for high school students. I was learning how to program without
formal training, and solved one problem where a grammar was depicted as
a flowchart and the task was to write a parser for it, to check the
validity of input strings. The grammar is depicted here (problem 1):
http://www.auriocus.de/StringKurs/RegEx/uebungen1.pdf
As a 12 year old, not knowing anything about pattern recognition, but
thinking I was the king, as is usual for boys at that age, I sat down
and manually constructed a recursive descent parser in a BASIC like
language. It had 1000 lines and took me a few weeks to get it correct.
Finally the solution was accepted as working, but my participation was
rejected because the solutions lacked documentation. 16 years later I
used the problem for a course on string processing (that's what the PDF
is for), and asked the students to solve it using regexes. My own
solution consists of 67 characters, and it took me 5 minutes to write
it down.
Admittedly, this problem is constructed, but solving similar tasks by
regexes is still something that I need to do on a daily basis, when I
get data from other scientists in odd formats and I need to preprocess
them. I know people who use a spreadsheet and copy/paste millions of
datapoints manually because they lack the knowledge of using such
tools.
Christian
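As a side note on the repetition problem: with Python's re module one common workaround is to keep the sub-patterns in named strings and splice them together. The sketch below is only an illustration of that idea, not a PEG parser. The names PHONE, NAME and PATTERN, the whitespace between the fields, and the sample input are all assumptions; [A-Za-z] stands in for [[:alpha:]], which Python's re does not support.

```python
import re

# Hypothetical building blocks, loosely following the PEG rules above.
PHONE = r"\+[0-9]+(?:-[0-9]+)*"   # '+' digits, then optional '-digits' groups
NAME = r"[A-Za-z]+"               # ASCII letters only (an assumption)

# The phone sub-pattern is written once and reused twice.
PATTERN = re.compile(
    rf"(?P<first>{PHONE})\s+(?P<name>{NAME})\s+(?P<second>{PHONE})"
)

m = PATTERN.fullmatch("+49-30-1234 Alice +1-555-0100")  # made-up sample
print(m.group("first"))   # +49-30-1234
print(m.group("name"))    # Alice
print(m.group("second"))  # +1-555-0100
```

This only avoids retyping the pieces. The compiled regex still contains the phone pattern twice, and nothing here gives the grammar-level structure or the recursion that a PEG offers.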
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-06 11:52 -0800 |
| Message-ID | <a379f9ca-dd27-412c-a005-bfef9b9e6abc@googlegroups.com> |
| In reply to | #98290 |
On 11/05/2015 01:18 AM, Christian Gollwitzer wrote:
> On 05.11.15 at 06:59, rurpy wrote:
>>> Can you call yourself a well-rounded programmer without at least
>>> a basic understanding of some regex library? Well, probably not.
>>> But that's part of the problem with regexes. They have, to some
>>> degree, driven out potentially better -- or at least differently
>>> bad -- pattern matching solutions, such as (E)BNF grammars,
>>> SNOBOL pattern matching, or lowly globbing patterns. Or even
>>> alternative idioms, like Hypercard's "chunking" idioms.
>>
>> Hmm, very good point. I wonder why all those "potentially better"
>> solutions have not been more widely adopted? A conspiracy by a
>> secret regex cabal?
>
> I'm mostly on the pro-side of the regex discussion, but this IS a
> valid point. regexes are not always a good way to express a pattern,
> even if the pattern is regular. The point is, that you can't build
> them up easily piece-by-piece. Say, you want a regex like "first an
> international phone number, then a name, then a second phone number"
> - you will have to *repeat* the pattern for phone number twice. In
> more complex cases this can become a nightmare, like the monster that
> was mentioned before to validate an email.
>
> A better alternative, then, is PEG for example. You can easily write
> [...]
That is the solution adopted by Perl 6. I have always thought lexing
and parsing solutions for Python were a weak spot in the Python eco-
system and I was about to write that I would love to see a PEG parser
for python when I saw this:
http://fdik.org/pyPEG/
Unfortunately it suffers from the same problem that Pyparsing, Ply
and the rest suffer from: they use Python syntax to express the
parsing rules rather than using a dedicated problem-specific syntax
such as you used to illustrate peg parsing:
> pattern <- phone_number name phone_number
> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
> name <- [[:alpha:]]+
Some here have complained about excessive brevity of regexs but I
much prefer using problem-specific syntax like "(a*)" to having to
express a pattern using python with something like
star = RegexMatchAny()
a_group = RegexGroup('a' + star)
...
and I don't want to have to do something similar with PEG (or Ply
or Pyparsing) to formulate their rules.
>[...]
> As a 12 year old, not knowing anything about pattern recognition, but
> thinking I was the king, as is usual for boys in that age, I sat down
> and manually constructed a recursive descent parser in a BASIC like
> language. It had 1000 lines and took me a few weeks to get it
> correct. Finally the solution was accepted as working, but my
> participation was rejected because the solutions lacked
> documentation. 16 years later I used the problem for a course on
> string processing (that's what the PDF is for), and asked the
> students to solve it using regexes. My own solution consists of 67
> characters, and it took me 5 minutes to write it down.
>
> Admittedly, this problem is constructed, but solving similar tasks by
> regexes is still something that I need to do on a daily basis, when I
> get data from other scientists in odd formats and I need to
> preprocess them. I know people who use a spreadsheet and copy/paste
> millions of datapoints manually because they lack the knowledge of
> using such tools.
I think in many cases those most hostile to regexes are also
those who use them (or need to use them) the least. While my uses
of regexes are limited to fairly simple ones they are complicated
enough that I'm sure it would take orders of magnitude longer
to get the same effect in python.
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-11-06 21:36 +0100 |
| Message-ID | <n1j2s6$1qm$1@dont-email.me> |
| In reply to | #98364 |
On 06.11.15 at 20:52, rurpy@yahoo.com wrote:
> I have always thought lexing
> and parsing solutions for Python were a weak spot in the Python eco-
> system and I was about to write that I would love to see a PEG parser
> for python when I saw this:
>
> http://fdik.org/pyPEG/
>
> Unfortunately it suffers from the same problem that Pyparsing, Ply
> and the rest suffer from: they use Python syntax to express the
> parsing rules rather than using a dedicated problem-specific syntax
> such as you used to illustrate peg parsing:
>
>> pattern <- phone_number name phone_number
>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>> name <- [[:alpha:]]+
That is actually real syntax of a parser generator used by me for
another language (Tcl). A calculator example using this package can be
found here: http://wiki.tcl.tk/39011
(actually it is a retargetable compiler in a few lines - very impressive)
And exactly as you say, it is working well exactly because it doesn't
try to abuse function composition in the frontend to construct the parser.
Looking through the parser generators listed at
http://bford.info/packrat/ it seems that waxeye could be interesting
http://waxeye.org/manual.html#_using_waxeye - however I'm not sure the
Python backend works with Python 3, maybe there will be unicode issues.
Another bonus would be a compilable backend, like Cython or similar. The
pt package mentioned above allows generating a C module with an
interface for Tcl. Compiled parsers are approximately 100x faster. I
would expect a similar speedup for Python parsers.
> Some here have complained about excessive brevity of regexs but I
> much prefer using problem-specific syntax like "(a*)" to having to
> express a pattern using python with something like
>
> star = RegexMatchAny()
> a_group = RegexGroup('a' + star)
> ...
Yeah that is nonsense. Mechanical verbosity never leads to clarity (XML
anyone?)
> I think in many cases those most hostile to regexes are also
> those who use them (or need to use them) the least. While my use
> of regexes is limited to fairly simple ones, they are complicated
> enough that I'm sure it would take orders of magnitude longer
> to get the same effect in python.
That's also my impression. The "two problems" quote was already lame
the first time. If you are satisfied with simple string functions, then
either you do not have problems where you need regexps/other formal
parsing tools, or you are very masochistic.
Christian
[toc] | [prev] | [next] | [standalone]
| From | Larry Martell <larry.martell@gmail.com> |
|---|---|
| Date | 2015-11-06 15:42 -0500 |
| Message-ID | <mailman.97.1446842610.16136.python-list@python.org> |
| In reply to | #98369 |
On Fri, Nov 6, 2015 at 3:36 PM, Christian Gollwitzer <auriocus@gmx.de> wrote:
> On 06.11.15 at 20:52, rurpy@yahoo.com wrote:
>>
>> I have always thought lexing
>> and parsing solutions for Python were a weak spot in the Python
>> ecosystem and I was about to write that I would love to see a PEG parser
>> for python when I saw this:
>>
>> http://fdik.org/pyPEG/
>>
>> Unfortunately it suffers from the same problem that Pyparsing, Ply
>> and the rest suffer from: they use Python syntax to express the
>> parsing rules rather than using a dedicated problem-specific syntax
>> such as you used to illustrate peg parsing:
>>
>>> pattern <- phone_number name phone_number
>>> phone_number <- '+' [0-9]+ ( '-' [0-9]+ )*
>>> name <- [[:alpha:]]+
>
> That is actually the real syntax of a parser generator I use for another
> language (Tcl). A calculator example using this package can be found here:
> http://wiki.tcl.tk/39011
> (actually it is a retargetable compiler in a few lines - very impressive)

Ah, Tcl - I wrote many a Tcl script back in the 80s to log in to BBSs.
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-11-05 11:34 +1100 |
| Message-ID | <mailman.39.1446691567.16136.python-list@python.org> |
| In reply to | #98259 |
On Thu, Nov 5, 2015 at 11:24 AM, rurpy--- via Python-list <python-list@python.org> wrote:
> On Wednesday, November 4, 2015 at 4:05:06 PM UTC-7, Seymore4Head wrote:
>>[...]
>> I am still here, but I have to admit I am not picking up too much.
>
> The "take away" I recommend is: the folks here are often way
> overly negative regarding regular expressions and that you not
> ignore them, but take them with a BIG grain of salt and continue
> learning about and using regexs.
>
> You will find they are an indispensable tool, not just in Python
> programming but in many aspects of computer use.

The "take away" that I recommend is: Rurpy loves to argue in favour of
regular expressions, but as you can see from the other posts, there
are alternatives, which are often FAR superior.

ChrisA
[toc] | [prev] | [next] | [standalone]
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 22:27 -0800 |
| Message-ID | <7ffc8ea8-6445-4c59-b3b6-611edfbf4f62@googlegroups.com> |
| In reply to | #98267 |
On Wednesday, November 4, 2015 at 7:46:24 PM UTC-7, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:24 AM, rurpy wrote:
> The "take away" that I recommend is: Rurpy loves to argue in favour of
> regular expressions,

No, I don't love it, I quite dislike it.

> but as you can see from the other posts, there
> are alternatives, which are often FAR superior.

No, not FAR superior, just preferable and just in the simple cases,
regexes generally being better in anything beyond simple.
[toc] | [prev] | [next] | [standalone]
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-11-04 09:42 -0600 |
| Message-ID | <mailman.21.1446652375.16136.python-list@python.org> |
| In reply to | #98203 |
On 2015-11-04 09:57, Peter Otten wrote:
> Well, I didn't know that grep uses regular expressions by default.

It doesn't help that grep(1) comes in multiple flavors:

  grep: should use BRE (Basic REs)
  fgrep: same as "grep -F"; uses fixed strings, no REs
  egrep: same as "grep -E"; uses ERE (Extended REs)
  grep -P: a GNUism to use PCREs (Perl Compatible REs)

there's also an "rgrep" which is just "grep -r" which I find kinda
silly/redundant. Though frankly I feel the same way about
fgrep/egrep since they just activate a command-line switch.

You get even crazier when you start adding zgrep/zegrep/zfgrep.

-tkc
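The fixed-string versus regular-expression distinction in that list can be sketched in Python. The grep() function below is a hypothetical helper, not a reimplementation of any grep flavor; with fixed=False it uses Python's own Perl-style re syntax, which is not the same as BRE or ERE.

```python
import re

def grep(pattern, lines, fixed=False):
    """Return the lines that contain `pattern`.

    fixed=True treats the pattern as a literal string (in the spirit of
    "grep -F"); otherwise it is compiled as a Python regular expression.
    """
    if fixed:
        pattern = re.escape(pattern)  # neutralise regex metacharacters
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

lines = ["a.c", "abc", "a+c"]
print(grep("a.c", lines))              # "." matches any character here
print(grep("a.c", lines, fixed=True))  # "." is a literal dot here
```

The same search string selects different lines depending on the mode, which is the surprise behind not knowing that grep uses regular expressions by default.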
[toc] | [prev] | [next] | [standalone]
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2015-11-05 20:55 +1300 |
| Message-ID | <da0gcbFs13sU1@mid.individual.net> |
| In reply to | #98233 |
Tim Chase wrote:
> You get even crazier when you start adding zgrep/zegrep/zfgrep.

It's fitting somehow that we should need an RE
to describe all the possible names of the grep
command.

--
Greg
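Taken literally, one such RE might look like the sketch below. It covers only the seven names mentioned in this thread (grep, fgrep, egrep, rgrep, zgrep, zegrep, zfgrep); the pattern is hand-written for that list and makes no claim about other grep-like commands that may exist on a given system.

```python
import re

# The command names mentioned in this thread.
GREP_NAMES = ["grep", "fgrep", "egrep", "rgrep", "zgrep", "zegrep", "zfgrep"]

# Optional "z", then optional "e" or "f"; or a lone "r"; then "grep".
GREP_NAME_RE = re.compile(r"(?:z?[ef]?|r)grep")

def is_grep_name(name):
    """True if `name` is one of the grep variants listed above."""
    return GREP_NAME_RE.fullmatch(name) is not None

for name in GREP_NAMES + ["zrgrep", "grepx"]:
    print(name, is_grep_name(name))
```

Using fullmatch() rather than search() matters here: the pattern has to describe the whole command name, not just find "grep" somewhere inside it.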
[toc] | [prev] | [next] | [standalone]
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-11-05 19:06 +1100 |
| Message-ID | <mailman.47.1446710789.16136.python-list@python.org> |
| In reply to | #98287 |
On Thu, Nov 5, 2015 at 6:55 PM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Tim Chase wrote:
>
>> You get even crazier when you start adding zgrep/zegrep/zfgrep.
>
>
> It's fitting somehow that we should need an RE
> to describe all the possible names of the grep
> command.

Regex engine golf: Find the shortest regex that matches the names of
all GNU commands which accept regular expressions, and no other
commands!

ChrisA
[toc] | [prev] | [next] | [standalone]
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2015-11-05 05:24 +1100 |
| Subject | What does “grep” stand for? (was: Regular expressions) |
| Message-ID | <mailman.23.1446661467.16136.python-list@python.org> |
| In reply to | #98203 |
Steven D'Aprano <steve+comp.lang.python@pearwood.info> writes:
> On Wednesday 04 November 2015 13:55, Dan Sommers wrote:
>
> > Its very name indicates that its default mode most certainly is
> > regular expressions.
>
> I don't even know what grep stands for.
“grep” stands for ‘g/RE/p’.
The name is a mnemonic for a compound command in ‘ed’ [0], a text editor
that pre-dates extravagant luxuries like “presenting a full screen of
text at one time”.
In an ‘ed’ session, the user is obliged to keep mental track of the
current line in the text buffer, and even what that text contains during
the session.
Single-letter commands, with various terse parameters such as the range
of lines or some text to insert, are issued at a command prompt one
after another.
For these reasons, the manual page describes ‘ed’ as a “line-oriented
text editor”. Everything is done by specifying lines, blindly, to
commands which then operate on those lines.
The name of the ‘vi’ editor means “visual interface (to a text editor)”,
to proudly declare the innovation of a full screen of text that updates
content during the editing session. That was not available for users of
‘ed’.
A very common command to issue, then, is “actually show me the line of
text I just specified”; the ‘p’ (for “print”) command.
Another very common command is “find the text matching this pattern and
perform these commands on it”, which is ‘g’ (for “global”). The ‘g’
command addresses text matching a regular expression pattern, delimited
by slashes ‘/’.
So, for users with feeble human brains incapable of remembering
perfectly the entire content of the text while it changes and therefore
not always knowing exactly which lines they wanted to operate on without
seeing them all the time, a very frequent combination command is:
g/RE/p
meaning “find every line in the buffer that matches the regular expression
pattern ‘RE’, and do nothing to those lines except print them to
standard output”.
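In Python terms, the effect of g/RE/p is roughly a filter over the lines of the buffer. A minimal sketch, assuming the buffer is just a list of strings; the function name g_re_p is made up for illustration, and Python's re syntax is not identical to the basic regular expressions that ed uses:

```python
import re

def g_re_p(pattern, buffer_lines):
    """Return the lines of `buffer_lines` that match `pattern`, in order."""
    regex = re.compile(pattern)
    return [line for line in buffer_lines if regex.search(line)]

# A stand-in for the editor's text buffer.
buffer_lines = ["first line", "second one", "third line"]

# The "p" part: print each selected line to standard output.
for line in g_re_p(r"line$", buffer_lines):
    print(line)
```

Nothing in the buffer is modified; the matching lines are only selected and printed, which is all the combined command does.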
Wikipedia has useful pages on both ‘grep’ and ‘ed’
<URL:https://en.wikipedia.org/wiki/Grep>
<URL:https://en.wikipedia.org/wiki/Ed_%28text_editor%29>.
You can see a full specification of how the ‘ed’ interface is to behave
as part of the “Open Group Base Specifications Issue 7”, which is the
specification for Unix.
<URL:http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html>
See the manual for GNU ed, which includes an example session, to
appreciate just how far things have come.
<URL:https://www.gnu.org/software/ed/manual/ed_manual.html#Introduction-to-line-editing>
Of course, if you yearn for the days of minimalist purity, nothing beats
Ed, man! !man ed
[0] The standard text editor.
<URL:https://www.gnu.org/fun/jokes/ed-msg.txt>
--
\ “If you can't annoy somebody there is little point in writing.” |
`\ —Kingsley Amis |
_o__) |
Ben Finney
[toc] | [prev] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-11-04 20:38 +0100 |
| Subject | Re: What does “grep” stand for? |
| Message-ID | <n1dmn7$fnq$1@dont-email.me> |
| In reply to | #98236 |
On 04.11.15 at 19:24, Ben Finney wrote:
> The name is a mnemonic for a compound command in ‘ed’ [0], a text editor
> that pre-dates extravagant luxuries like “presenting a full screen of
> text at one time”.
>
> [... lots of fun facts ...]

Here is another fun fact: The convincing UI of ed was actually so widely
applied that even Microsoft included a similar editor in MS-DOS, called
EDLIN. EDLIN, of course, was a bastardized version of ed that could do
much less and also lacked regular expressions. Needless to say, the
mighty "VIsual" editor was out 5 years before MS-DOS shipped EDLIN as
the only editor...

In contrast to ed, the stream editor "sed" is used multiple times every
day in a typical Unix session inside shell scripts to perform automated
text processing tasks, including regex replacement.

Christian
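The kind of per-line regex replacement that sed is typically used for can be sketched with re.sub in Python. The substitute() helper and its replace_all parameter are hypothetical names chosen for this example, and Python's pattern and replacement syntax differ in detail from sed's.

```python
import re

def substitute(pattern, replacement, lines, replace_all=True):
    """Apply a regex replacement to each line and return the new lines.

    replace_all=False replaces only the first match on each line.
    """
    count = 0 if replace_all else 1  # for re.sub, count=0 means "all matches"
    return [re.sub(pattern, replacement, line, count=count) for line in lines]

lines = ["a1b22", "no digits"]
print(substitute(r"[0-9]+", "N", lines))
print(substitute(r"[0-9]+", "N", lines, replace_all=False))
```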
[toc] | [prev] | [next] | [standalone]