


Regular expressions

Started by: Seymore4Head <Seymore4Head@Hotmail.invalid>
First post: 2015-11-02 20:09 -0500
Last post: 2015-11-03 22:15 +0000
Articles: 20 on this page of 106; 30 participants



Contents

  Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 20:09 -0500
    Re: Regular expressions MRAB <python@mrabarnett.plus.com> - 2015-11-03 01:19 +0000
      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
    Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-02 20:42 -0600
      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
        Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-02 22:58 -0500
          Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
            Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:38 -0700
              Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:33 -0800
                Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 19:04 -0700
                  Re: Regular expressions Dan Sommers <dan@tombstonezero.net> - 2015-11-04 02:55 +0000
                    Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:23 +1100
                      Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 20:47 -0700
                        Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-04 13:27 +0000
                      Re: Regular expressions Nobody <nobody@nowhere.invalid> - 2015-11-04 05:05 +0000
                      Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-04 09:57 +0100
                        Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:28 +1100
                          Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 20:48 -0600
                          Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:03 +1100
                          Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-05 09:33 +0100
                            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 23:05 +1100
                              Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-05 08:00 -0600
                          Re: Regular expressions Albert van der Horst <albert@spenarnc.xs4all.nl> - 2015-11-05 13:39 +0000
                      Re: Regular expressions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-04 08:00 -0500
                      Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-04 08:13 -0700
                        Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:00 -0500
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:24 -0800
                            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:24 +1100
                              Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:59 -0800
                                Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 09:18 +0100
                                  Re: Regular expressions rurpy@yahoo.com - 2015-11-06 11:52 -0800
                                    Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-06 21:36 +0100
                                      Re: Regular expressions Larry Martell <larry.martell@gmail.com> - 2015-11-06 15:42 -0500
                            Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:34 +1100
                              Re: Regular expressions rurpy@yahoo.com - 2015-11-04 22:27 -0800
                      Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:42 -0600
                        Re: Regular expressions Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-11-05 20:55 +1300
                          Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:06 +1100
                      What does “grep” stand for? (was: Regular expressions) Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 05:24 +1100
                        Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 20:38 +0100
                          Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:42 +1100
                            Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 08:32 +0100
                              Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:00 +1100
                          Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 10:19 -0500
                            Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 18:29 +0000
                              Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 14:56 -0500
                                Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 20:19 +0000
                                  Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-05 20:18 -0500
                                    Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-05 19:36 -0800
                                      Re: What does “grep” stand for? Dan Sommers <dan@tombstonezero.net> - 2015-11-06 05:31 +0000
                                      Re: What does “grep” stand for? William Ray Wing <wrw@mac.com> - 2015-11-06 08:25 -0500
                                        Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-06 19:21 -0800
                                    Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-06 14:15 +0000
                                      Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-06 20:03 -0500
                      Re: What does “grep” stand for? (was: Regular expressions) Tim Chase <python.list@tim.thechases.com> - 2015-11-04 13:05 -0600
                      Re: Regular expressions Terry Reedy <tjreedy@udel.edu> - 2015-11-04 18:08 -0500
                        Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:29 -0500
                Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 21:12 -0600
                Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 14:26 +1100
                Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:48 +1100
                  Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 08:21 +0100
                    Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 19:47 +1100
                      Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:43 -0800
                  Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:38 -0800
                    Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 01:52 +1100
                      Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:13 -0800
                        Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:33 +1100
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:42 -0800
                        Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:26 +1100
                          Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:07 +1100
                          Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:54 -0800
                        Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-05 10:14 +0100
                  Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:02 -0500
                    Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 11:54 +1100
                      Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-05 10:07 -0500
                        Re: Regular expressions rurpy@yahoo.com - 2015-11-06 12:46 -0800
            Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-03 18:15 +1100
              Re: Regular expressions Nick Sarbicki <nick.a.sarbicki@gmail.com> - 2015-11-03 08:43 +0000
              Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:22 -0800
        Re: Regular expressions Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-03 12:38 +0000
        Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:53 -0600
        Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-03 10:34 -0500
          Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-03 11:10 -0500
            Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 03:20 +1100
              Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:35 +1100
                Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-04 12:41 +0100
      Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 14:56 +0000
    Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 20:51 -0700
      Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
        Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:33 -0700
        Re: Regular expressions Robin Koch <robin.koch@t-online.de> - 2015-11-03 23:58 +0100
    Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 10:25 +0100
    Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:50 -0600
    Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 15:00 +0100
      Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 17:12 +0200
        Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 16:35 +0100
          Re: Irregular last line in a text file, was Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 18:42 +0200
        Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 10:56 -0600
          Re: Irregular last line in a text file, was Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:39 +1100
            Re: Irregular last line in a text file, was Re: Regular expressions Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-04 10:07 +0000
            Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:33 -0600
        Re: Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 18:44 +0100
        Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:33 -0700
        Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:39 -0700
        Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 13:45 -0600
          Re: Irregular last line in a text file, was Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 22:15 +0000



#98210

From: Christian Gollwitzer <auriocus@gmx.de>
Date: 2015-11-04 08:21 +0100
Message-ID: <n1cbi4$ao5$1@dont-email.me>
In reply to: #98207
Am 04.11.15 um 04:48 schrieb Steven D'Aprano:
> On Wednesday 04 November 2015 11:33, rurpy@yahoo.com wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexes are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexes more fundamental
>
> I'm sure that there are people who consider the International Space Station
> more fundamental than the lever, the wedge and the hammer, but they would be
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong.

You *know* that they are not equivalent, I assume? Regexes are 
equivalent to finite state machines, which are less powerful than Turing 
machines, and less powerful even than stack machines. You can't even 
construct a regexp which validates whether parentheses are balanced.
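
The balanced-parentheses point can be made concrete. The sketch below is an illustration added for this thread, not code from any of the posters; the names `is_balanced`, `DEPTH_TWO` and `regex_balanced_up_to_two` are made up. A counter (the simplest form of a stack) handles any nesting depth, whereas a pattern written for Python's standard `re` module, which has no recursive patterns, has to hard-code a maximum depth.

```python
import re

def is_balanced(text):
    """Return True if every '(' in text is closed by a later ')'."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ')' appeared before its '('
                return False
    return depth == 0

# Hypothetical pattern: accepts balanced text nested at most two deep.
# Each extra level of nesting would need another copy of the inner group.
DEPTH_TWO = re.compile(r"(?:[^()]|\((?:[^()]|\([^()]*\))*\))*")

def regex_balanced_up_to_two(text):
    """Return True if text is balanced with nesting depth <= 2."""
    return DEPTH_TWO.fullmatch(text) is not None
```

With these definitions, `"((()))"` passes `is_balanced` but is rejected by `regex_balanced_up_to_two`, because it nests three deep. Some engines add recursion extensions that lift this limit, but those go beyond regular expressions in the formal sense used above.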

What rurpy meant was that regexes can surface to a computer user 
earlier than variables and branches; a user who never goes so far as to 
actually program the machine might still encounter them in a 
text editor or database engine. Even some web forms allow a limited 
form of them, e.g. the DVD rental here or Google.

	Christian



#98214

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2015-11-04 19:47 +1100
Message-ID: <5639c63f$0$2897$c3e8da3$76491128@news.astraweb.com>
In reply to: #98210
On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:

> What rurpy meant, was that regexes can surface to a computer user
> earlier than variables and branches; a user who does not go into the
> depth to actually program the machine, might still encounter them in a
> text editor or database engine. Even some web forms allow some limited
> form, like e.g. the DVD rental here or Google.

What Rurpy meant, only Rurpy can say, but I doubt that is what he is talking 
about. By that logic, a full-screen high-def 3D first-person shooter game 
with an advanced AI is "more fundamental" than an assembly language branch 
operation, because there are people who play computer games without doing 
assembly programming.

In context, Michael suggested that programmers should learn the basic 
fundamentals of their chosen language, such as variables, for-loops and 
branching, before regexes -- which Rurpy then disagreed with, claiming that 
regexes are more fundamental than those basic operations.

What *I* think that Rurpy means is that one can construct a mathematical 
system based on pattern matching which is Turing complete, and therefore in 
principle any problem you can solve using a program written in (say) Python, 
C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!) 
could be written as a sufficiently complex regular expression.

I think he is *technically wrong*, if by "regex" we mean actual regular 
expressions. Perl and Python regexes are strictly more powerful than 
regular expressions (despite the name). I know that Perl regexes are Turing 
complete (mainly because they can call out to the Perl interpreter); I'm not 
sure about Python regexes.
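
One way to see that Python regexes go beyond formal regular expressions is the backreference. The sketch below is an added illustration with made-up names (`SAME_RUN`, `same_run_each_side`): it accepts strings of the form "n a's, one b, then the same n a's", a language that no finite state machine can recognise because it requires remembering an unbounded count. It says nothing either way about Turing completeness.

```python
import re

# (a+) captures a run of a's; \1 demands exactly the same run after the b.
SAME_RUN = re.compile(r"(a+)b\1")

def same_run_each_side(text):
    """Return True if text is a^n b a^n for some n >= 1."""
    return SAME_RUN.fullmatch(text) is not None
```

Here `same_run_each_side("aabaa")` holds, while `"aabaaa"` is rejected because the two runs differ in length.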

But I also think that Rurpy is *not even wrong* if he means Perl or Python 
regexes. The (entirely theoretical) ability to solve a problem like "What is 
pi to the power of the first prime number larger than 97531000?" using a 
regex doesn't make regexes more fundamental than variables, branches and 
loops. It just makes them an alternative computing paradigm -- one which is 
*exponentially* more difficult to use than the standard paradigms of 
functional, procedural, OOP, etc. for anything except the limited subset of 
pattern matching problems they were created for.



-- 
Steve



#98229

From: rurpy@yahoo.com
Date: 2015-11-04 06:43 -0800
Message-ID: <f0ae8c1e-5217-495f-842f-b5c6d86a3e8a@googlegroups.com>
In reply to: #98214
On Wednesday, November 4, 2015 at 1:52:31 AM UTC-7, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:
> 
> > What rurpy meant, was that regexes can surface to a computer user
> > earlier than variables and branches; a user who does not go into the
> > depth to actually program the machine, might still encounter them in a
> > text editor or database engine. Even some web forms allow some limited
> > form, like e.g. the DVD rental here or Google.
> [...]
> What *I* think that Rurpy means is that one can construct a mathematical 
> system based on pattern matching which is Turing complete, and therefore in 
> principle any problem you can solve using a program written in (say) Python, 
> C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!) 
> could be written as a sufficiently complex regular expression.

No, Christian was correct.



#98228

From: rurpy@yahoo.com
Date: 2015-11-04 06:38 -0800
Message-ID: <89a2a4a7-f483-4e94-9f68-ba77ce4b7598@googlegroups.com>
In reply to: #98207
On 11/03/2015 08:48 PM, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 11:33, rurpy wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexes are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexes more fundamental
>
> I'm sure that there are people who consider the International Space Station 
> more fundamental than the lever, the wedge and the hammer, but they would be 
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support 
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong. All you would need to do to 
> demonstrate that regexes are more fundamental than branching, loops and 
> variables would be to demonstrate that the primitive operations available in 
> commonly used CPUs are regular expressions, and that (for example) C's for 
> loop and if...else are implemented in machine code as regular expressions, 
> rather than the other way around.

I'm afraid you are making a category error but perhaps that's in 
part because I wasn't clear.  I was not talking about computer 
science.  I was talking about human beings learning about computers.  
Most people I know consider programming to be a higher level activity 
than "using" a computer: editing, sending email etc.  Many computer
users (not programmers) learn to use regular expressions as part
of using a computer without knowing anything about programming.
It was on that basis I called them more fundamental -- something
learned earlier which is expanded on and added to later.  But you
have a bit of a point, perhaps "fundamental" was not the best choice
of word to communicate that.

Here is what I wrote:

> I consider regexes more fundamental.  One need not even be a programmer
> to use them: consider grep, sed, a zillion editors, database query 
> languages, etc.

I thought the context, which you removed even to the point of cutting 
text from the very same line you quoted, made that clear, but perhaps
not.

Indeed it is quite eye-opening when one does learn a little CS and 
discovers these things that were just a useful "feature" actually have 
a deep and profound theoretical basis.



#98230

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-05 01:52 +1100
Message-ID: <mailman.18.1446648732.16136.python-list@python.org>
In reply to: #98228
On Thu, Nov 5, 2015 at 1:38 AM, rurpy--- via Python-list
<python-list@python.org> wrote:
> I'm afraid you are making a category error but perhaps that's in
> part because I wasn't clear.  I was not talking about computer
> science.  I was talking about human beings learning about computers.
> Most people I know consider programming to be a higher level activity
> than "using" a computer: editing, sending email etc.  Many computer
> users (not programmers) learn to use regular expressions as part
> of using a computer without knowing anything about programming.
> It was on that basis I called them more fundamental -- something
> learned earlier which is expanded on and added to later.  But you
> have a bit of a point, perhaps "fundamental" was not the best choice
> of word to communicate that.

The "fundamentals" of something are its most basic functions, not its
most basic uses. The most common use of a computer might be to browse
the web, but the fundamental functionality is arithmetic and logic.

Setting aside the choice of word, though, I still don't think regular
expressions are a more basic use of computing than loops and
conditionals. A regex can't be used for anything other than string
matching; they exist for one purpose, and one purpose only: to answer
the question "Does this string match this pattern?". Sure, you can
abuse that into a primality check and other forms of crazy arithmetic,
but it's not what they truly do. I also would not teach regexes to
people as part of an "introduction to computing" course, any more than
I would teach the use of Microsoft Excel, which some such courses have
been known to do. (And no, it's not because of the Microsoftness. I
wouldn't teach LibreOffice Calc either.) You don't need to know how to
work a spreadsheet as part of the basics of computer usage, and you
definitely don't need an advanced form of text search.
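
For readers who have not seen the primality trick mentioned above, here is a sketch of the well-known version, added as an illustration rather than taken from this thread. It writes n in unary (a string of n ones) and uses a backreference to test whether that string can be cut into two or more equal chunks of length at least two. The names `NOT_PRIME` and `is_prime_by_regex` are made up.

```python
import re

# Matches unary strings whose length is NOT prime:
#   1?          covers lengths 0 and 1
#   (11+?)\1+   covers any length that is a multiple (>= 2x) of some k >= 2
NOT_PRIME = re.compile(r"1?|(11+?)\1+")

def is_prime_by_regex(n):
    """Return True if n is prime, decided purely by pattern matching."""
    return NOT_PRIME.fullmatch("1" * n) is None
```

It works, but only by making the regex engine try divisors through backtracking, which is far slower than ordinary arithmetic and is a fair example of the "abuse" described above.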

ChrisA



#98258

From: rurpy@yahoo.com
Date: 2015-11-04 16:13 -0800
Message-ID: <74351250-2c5b-439d-abc9-65e3480cd930@googlegroups.com>
In reply to: #98230
On 11/04/2015 07:52 AM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>> I'm afraid you are making a category error but perhaps that's in
>> part because I wasn't clear.  I was not talking about computer
>> science.  I was talking about human beings learning about computers.
>> Most people I know consider programming to be a higher level activity
>> than "using" a computer: editing, sending email etc.  Many computer
>> users (not programmers) learn to use regular expressions as part
>> of using a computer without knowing anything about programming.
>> It was on that basis I called them more fundamental -- something
>> learned earlier which is expanded on and added to later.  But you
>> have a bit of a point, perhaps "fundamental" was not the best choice
>> of word to communicate that.
>
> The "fundamentals" of something are its most basic functions, not its
> most basic uses. The most common use of a computer might be to browse
> the web, but the fundamental functionality is arithmetic and logic.

If one accepted that then one would have to reject the term "fundamental 
use" as meaningless.  A quick trip to Google shows that's not true.

> Setting aside the choice of word, though, I still don't think regular
> expressions are a more basic use of computing than loops and
> conditionals. A regex can't be used for anything other than string
> matching; they exist for one purpose, and one purpose only: to answer
> the question "Does this string match this pattern?". 

But string matching *is* a fundamental problem that arises frequently
in many aspects of CS, programming and, as I mentioned, day-to-day
computer use.  Saying it's "only" for pattern matching is like saying 
floating point numbers are "only" for doing non-integer arithmetic,
or Unicode is "only" for representing text.  (Neither of those is a 
good analogy because both lack the important theoretical underpinnings 
that regular expressions have [*]).
There would be far fewer computer languages, and they would be much
more primitive if regular expressions (and the fundamental concepts
that they express) did not exist.

To be sure, I did gloss over Michael Torrie's point that there are 
other concepts that are more basic in the context of learning 
programming; he was correct about that. 

But that does not negate the fact that regexes are important and 
fundamental.  They are both very useful in a practical sense (they 
are even available in Microsoft Excel) and important in a theoretical 
sense.  You are not well rounded as a programmer if you decline to 
learn about regular expressions because "they are too cryptic", or 
"I can do in code anything they do".  

I think the constant negative reception the posters receive here when
they ask about regexes does them a great disservice.

By all means point out that Python offers a number of functions that 
can avoid the need for using regexes in simple cases.  Even point out 
that you (the plural you) don't like them and prefer other solutions
(like writing code that does the same thing in a more half-assed, 
bug-ridden way, the posts in this thread being a case in point.)

But I really wish every mention of regexes here wasn't reflexively 
greeted with a barrage of negative comments and that lame "two problems"
quote, especially without an answer to the poster's regex question.

> Sure, you can
> abuse that into a primality check and other forms of crazy arithmetic,
> but it's not what they truly do. I also would not teach regexes to
> people as part of an "introduction to computing" course, any more than
> I would teach the use of Microsoft Excel, which some such courses have
> been known to do. (And no, it's not because of the Microsoftness. I
> wouldn't teach LibreOffice Calc either.) You don't need to know how to
> work a spreadsheet as part of the basics of computer usage, and you
> definitely don't need an advanced form of text search.

Seems to me that clearly depends on the intent of the class, the students'
goals, what they'll be studying after the class, what their current 
level of knowledge is, etc.  Your scenario seems way too under-specified
to say anything definitive.  And further, the pedagogy of CS (or of any 
subject of education) is not "settled science" and that kind of question
almost never has a clear right/wrong answer.

This list is not a class.  If someone comes here with a question about 
Python's regexes they deserve an answer, not to be bombarded with reasons
why they shouldn't be using regexes beyond mentioning some of the alternatives
in a "oh, by the way" way.  (And yes, I recognize in this case the OP did 
get a good answer from MRAB early on.)

----
[*] yes, I know there is a lot of CS theory underlying floating point.
I don't think it is as deep or as important as that underlying regexes,
automata and language.



#98260

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-11-05 11:33 +1100
Message-ID: <mailman.37.1446683628.16136.python-list@python.org>
In reply to: #98258
On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list
<python-list@python.org> wrote:
> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>> I'm afraid you are making a category error but perhaps that's in
>>> part because I wasn't clear.  I was not talking about computer
>>> science.  I was talking about human beings learning about computers.
>>> Most people I know consider programming to be a higher level activity
>>> than "using" a computer: editing, sending email etc.  Many computer
>>> users (not programmers) learn to use regular expressions as part
>>> of using a computer without knowing anything about programming.
>>> It was on that basis I called them more fundamental -- something
>>> learned earlier which is expanded on and added to later.  But you
>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>> of word to communicate that.
>>
>> The "fundamentals" of something are its most basic functions, not its
>> most basic uses. The most common use of a computer might be to browse
>> the web, but the fundamental functionality is arithmetic and logic.
>
> If one accepted that then one would have to reject the term "fundamental
> use" as meaningless.  A quick trip to google shows that's not true.

A quick trip to Google showed me that there are a number of uses of
the phrase, mostly in scientific papers and such. I've no idea how
that helps your argument.

> But string matching *is* a fundamental problem that arises frequently
> in many aspects of CS, programming and, as I mentioned, day-to-day
> computer use.  Saying it's "only" for pattern matching is like saying
> floating point numbers are "only" for doing non-integer arithmetic,
> or unicode is "only" for representing text.  (Neither of those is a
> good analogy because both lack the important theoretical underpinnings
> that regular expressions have [*]).

String matching does happen a lot. How often do you actually need
pattern matching? Most of the time, you're doing equality checks - or
prefix/suffix checks, at best.
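
As an added illustration of the prefix/suffix case, the sketch below writes one check twice: once with plain string methods and once with `re`. The function names and the `report_`/`.csv` convention are hypothetical, chosen only for the example.

```python
import re

def is_csv_report(name):
    """Plain string methods: one prefix check and one suffix check."""
    return name.startswith("report_") and name.endswith(".csv")

def is_csv_report_re(name):
    """The same check as a regex; the dot must be escaped to be literal."""
    return re.fullmatch(r"report_.*\.csv", name, re.DOTALL) is not None
```

Both versions agree on inputs such as `"report_2015.csv"` and `"summary.csv"`. The string-method version states its intent directly; the regex version starts to pay off only once the middle part needs a real pattern, such as "exactly four digits".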

> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.

So? There would also be far fewer computer languages if braces didn't
exist, because we wouldn't have the interminable arguments about
whether they're good or not.

> To be sure, I did gloss over Michael Torrie's point that there are
> other concepts that are more basic in the context of learning
> programming; he was correct about that.
>
> But that does not negate the fact that regexes are important and
> fundamental.  They are both very useful in a practical sense (they
> are even available in Microsoft Excel) and important in a theoretical
> sense.  You are not well rounded as a programmer if you decline to
> learn about regular expressions because "they are too cryptic", or
> "I can do in code anything they do".

You've proven that they are important, but in no way have you proven
them fundamental. A regular expression library is the ideal solution
to the problem "I want to let my users search for patterns of their
own choosing". That's great, but it's only one specific class of
problem.

> I think the constant negative reception the posters receive here when
> they ask about regexes does them a great disservice.
>
> By all means point out that python offers a number of functions that
> can avoid the need for using regexes in simple cases.  Even point out
> that you (the plural you) don't like them and prefer other solutions
> (like writing code that does the same thing in a more half-assed bug
> ridden way, the posts in this thread being a case in point.)
>
> But I really wish every mention of regexes here wasn't reflexively
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.

When has that happened? Usually there'll be at least two answers - one
that uses a regex and one that doesn't - and people get to read both.

>> Sure, you can
>> abuse that into a primality check and other forms of crazy arithmetic,
>> but it's not what they truly do. I also would not teach regexes to
>> people as part of an "introduction to computing" course, any more than
>> I would teach the use of Microsoft Excel, which some such courses have
>> been known to do. (And no, it's not because of the Microsoftness. I
>> wouldn't teach LibreOffice Calc either.) You don't need to know how to
>> work a spreadsheet as part of the basics of computer usage, and you
>> definitely don't need an advanced form of text search.
>
> Seems to me that clearly depends on the intent of the class, the students
> goal's, what they'll be studying after the class, what their current
> level of knowledge is, etc.  Your scenario seems way too under-specified
> to say anything definitive.  And further, the pedagogy of CS (or of any
> subject of education) is not "settled science" and that kind of question
> almost never has a clear right/wrong answer.

Uhh, "introduction to computing". What's the current level of
knowledge? Close to zero. That's the whole point of an introductory
class. It's a place where you teach the basics.

> This list is not a class.  If someone comes here with a question about
> Python's regexes they deserve an answer and not be bombarded with reasons
> why they shouldn't be using regexes beyond mentioning some of the alternatives
> in a "oh, by the way" way.  (And yes, I recognize in this case the OP did
> get a good answer from MRAB early on.)

"I want to swim from Sydney to Los Angeles, but my gloves keep wearing
out half way across the Pacific. How can I make my gloves strong
enough to get me to LA?"

Response 1: "If you use industrial-strength gloves and go via Papua
New Guinea, you can double up the gloves and swim to LA."

Response 2: "Swimming across the Pacific is a bad idea. Have you
considered taking a boat or plane instead?"

Which is the more helpful response? You can go ahead and assume the OP
always knows best; I'm going to at least offer some alternatives.

ChrisA



#98279

From: rurpy@yahoo.com
Date: 2015-11-04 21:42 -0800
Message-ID: <a33ce924-c795-447b-8fb9-d7e01fee8936@googlegroups.com>
In reply to: #98260
On 11/04/2015 05:33 PM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list
> <python-list@python.org> wrote:
>> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>>> I'm afraid you are making a category error but perhaps that's in
>>>> part because I wasn't clear.  I was not talking about computer
>>>> science.  I was talking about human beings learning about computers.
>>>> Most people I know consider programming to be a higher level activity
>>>> than "using" a computer: editing, sending email etc.  Many computer
>>>> users (not programmers) learn to use regular expressions as part
>>>> of using a computer without knowing anything about programming.
>>>> It was on that basis I called them more fundamental -- something
>>>> learned earlier which is expanded on and added to later.  But you
>>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>>> of word to communicate that.
>>>
>>> The "fundamentals" of something are its most basic functions, not its
>>> most basic uses. The most common use of a computer might be to browse
>>> the web, but the fundamental functionality is arithmetic and logic.
>>
>> If one accepted that then one would have to reject the term "fundamental
>> use" as meaningless.  A quick trip to google shows that's not true.
>
> A quick trip to Google showed me that there are a number of uses of
> the phrase, mostly in scientific papers and such. I've no idea how
> that helps your argument.

I was showing that your objection to my use of "fundamental" on the 
grounds it does not apply to "use" is patently silly.  From Google:

   interferes with B's more fundamental use because
   fundamental use of english
   The fundamental use of testing
   Fundamental Use of the Michigan Terminal System
   negotiate a fundamental use and exchange of power
   the most fundamental use of pointers
   makes fundamental use of statistical theory

This is what I meant in a recent post when I referred to the Alice-
in-Wonderland nature of this group.  I'm afraid I don't have the
time or interest to discuss basic English with you.  If you want
to maintain that "fundamental" does not apply to "use" please go right
ahead, it's your credibility at risk.

>> But string matching *is* a fundamental problem that arises frequently
>> in many aspects of CS, programming and, as I mentioned, day-to-day
>> computer use.  Saying its "only" for pattern matching is like saying
>> floating point numbers are "only" for doing non-integer arithmetic,
>> or unicode is "only" for representing text.  (Neither of those is a
>> good analogy because both lack the important theoretical underpinnings
>> that regular expressions have [*]).
>
> String matching does happen a lot. How often do you actually need
> pattern matching? Most of the time, you're doing equality checks - or
> prefix/suffix checks, at best.
>
>> There would be far fewer computer languages, and they would be much
>> more primitive if regular expressions (and the fundamental concepts
>> that they express) did not exist.
>
> So? There would also be far fewer computer languages if braces didn't
> exist, because we wouldn't have the interminable arguments about
> whether they're good or not.

Sorry, that makes no sense to me.  

>> To be sure, I did gloss over Michael Torries' point that there are
>> other concepts that are more basic in the context of learning
>> programming, he was correct about that.
>>
>> But that does not negate the fact that regexes are important and
>> fundamental.  They are both very useful in a practical sense (they
>> are even available in Microsoft Excel) and important in a theoretical
>> sense.  You are not well rounded as a programmer if you decline to
>> learn about regular expressions because "they are too cryptic", or
>> "I can do in code anything they do".
>
> You've proven that they are important, but in no way have you proven
> them fundamental. A regular expression library is the ideal solution
> to the problem "I want to let my users search for patterns of their
> own choosing". That's great, but it's only one specific class of
> problem.

If you think that is the sole use of pattern matching or even the most
important use, I can understand why you find regexes fairly useless.
Lexing (tokenization) and simple parsing are often done with regular
expressions.  Many dozens of times a year I write programs to extract
or munge data in text files.  Three days ago I had to extract data from
a 500MB log file for insertion in a database; the program used many
regexes, even some that could have been replaced by Python methods.
But mixing the two approaches would have been less clear than using
regexes consistently.

Text recognition and modification is an *extremely* common need, not
some niche application as you suggest.
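
To make the lexing point concrete, here is a minimal sketch of a
regex-driven tokenizer.  The token set and the names TOKEN_RE and
tokenize are made up for illustration (a toy arithmetic language),
not taken from any real program mentioned in this thread.

```python
import re

# Hypothetical token set for a toy arithmetic language (illustration only).
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+(?:\.\d+)?)   # integer or decimal literal
  | (?P<NAME>[A-Za-z_]\w*)      # identifier
  | (?P<OP>[-+*/=()])           # single-character operator or parenthesis
  | (?P<SKIP>\s+)               # whitespace, discarded below
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at position %d"
                             % (text[pos], pos))
        if m.lastgroup != "SKIP":
            # lastgroup is the name of the alternative that matched
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling tokenize("x = 3.5 * (y + 2)") yields a flat list of NAME, OP
and NUMBER tokens that a simple parser could consume.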

>> I think the constant negative reception the posters receive here when
>> they ask about regexes does them a great disservice.
>>
>> By all means point out that python offers a number of functions that
>> can avoid the need for using regexes in simple cases.  Even point out
>> that you (the plural you) don't like them and prefer other solutions
>> (like writing code that does the same thing in a more half-assed bug
>> ridden way, the posts in this thread being a case in point.)
>>
>> But I really wish every mention of regexes here wasn't reflexively
>> greeted with a barrage of negative comments and that lame "two problems"
>> quote, especially without an answer to the poster's regex question.
>
> When has that happened? Usually there'll be at least two answers - one
> that uses a regex and one that doesn't - and people get to read both.

No, usually there is one answer with a regex, five advising against
regexes, and two with the silly "two problems" quote.  The impression
one is left with is that regexes are bad and to be avoided.

Rarely to never does one see a response encouraging a poster 
to learn about and use regular expressions which is why I spoke
up this time.

>>> Sure, you can
>>> abuse that into a primality check and other forms of crazy arithmetic,
>>> but it's not what they truly do. I also would not teach regexes to
>>> people as part of an "introduction to computing" course, any more than
>>> I would teach the use of Microsoft Excel, which some such courses have
>>> been known to do. (And no, it's not because of the Microsoftness. I
>>> wouldn't teach LibreOffice Calc either.) You don't need to know how to
>>> work a spreadsheet as part of the basics of computer usage, and you
>>> definitely don't need an advanced form of text search.
>>
>> Seems to me that clearly depends on the intent of the class, the students
>> goal's, what they'll be studying after the class, what their current
>> level of knowledge is, etc.  Your scenario seems way too under-specified
>> to say anything definitive.  And further, the pedagogy of CS (or of any
>> subject of education) is not "settled science" and that kind of question
>> almost never has a clear right/wrong answer.
>
> Uhh, "introduction to computing". What's the current level of
> knowledge? Close to zero. That's the whole point of an introductory
> class. It's a place where you teach the basics.

"Introduction to computing" covers everything from teaching unemployed
people how to use Word and Excel, to a first "algorithms and data
structures" course for AP high-school kids, to programming with a heavy
dose of hardware architecture.  What "the basics" are is, as far as I
know, still the subject of debate and research among professional
educators.

>> This list is not a class.  If someone comes here with a question about
>> Python's regexes they deserve an answer and not be bombarded with reasons
>> why they shouldn't be using regexes beyond mentioning some of the alternatives
>> in a "oh, by the way" way.  (And yes, I recognize in this case the OP did
>> get a good answer from MRAB early on.)
>
> "I want to swim from Sydney to Los Angeles, but my gloves keep wearing
> out half way across the Pacific. How can I make my gloves strong
> enough to get me to LA?"
>
> Response 1: "If you use industrial-strength gloves and go via Papua
> New Guinea, you can double up the gloves and swim to LA."
>
> Response 2: "Swimming across the Pacific is a bad idea. Have you
> considered taking a boat or plane instead?"
>
> Which is the more helpful response? You can go ahead and assume the OP
> always knows best; I'm going to at least offer some alternatives.

Using a regular expression (even when there are other alternatives)
is not analogous to "Swimming across the Pacific".  (Back in Wonderland
again.)  Using a regex is *not* a life-threatening situation.

I've said repeatedly that pointing out alternatives is fine.  Pointing 
out there is no need for a regex when searching for a constant string
is fine.  And similar...  But the responses here often go well beyond 
that in negativity.

My own theory is that regexes are associated with Perl in the minds 
of many participants here and thus provoke an automatic immune 
reaction.



#98266

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-11-05 13:26 +1100
Message-ID: <563abe6f$0$1614$c3e8da3$5496439d@news.astraweb.com>
In reply to: #98258
On Thu, 5 Nov 2015 11:13 am, rurpy@yahoo.com wrote:

> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.

Well, that's certainly true. But only because contra-factual statements can
imply the truth of anything. If squares had seven sides, then Joseph Stalin
would have been the first woman to go to the moon on horseback.

I can't imagine a world where pattern matching doesn't exist. That's like
trying to imagine a world where arithmetic doesn't exist. But I think we
can safely say that, had nobody thought of the idea of searching for
patterns ('find me all the lines with "green" in them'), there would be far
fewer regex libraries in existence. I doubt that there would be "far fewer"
programming languages. With the possible exception of Perl, sed and awk,
I'm not aware of any languages which were specifically inspired by, and
exist primarily to apply, regular expressions, nor any languages which
*require* regexes in their implementation. Most languages are built on
parsers, not regular expressions.


> But I really wish every mention of regexes here wasn't reflexively 
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.

I don't disagree with this. Certainly we should accept questions from people
who are simply trying to learn how to use regexes without bombarding them
with admonitions to do something different. Yes yes, I know that regexes
aren't the only tool in my tool box, but *right now* I want to learn how to
use regexes.



-- 
Steven



#98270

From: Ben Finney <ben+python@benfinney.id.au>
Date: 2015-11-05 14:07 +1100
Message-ID: <mailman.42.1446693007.16136.python-list@python.org>
In reply to: #98266
Steven D'Aprano <steve@pearwood.info> writes:

> Yes yes, I know that regexes aren't the only tool in my tool box, but
> *right now* I want to learn how to use regexes.

I'll gently suggest this isn't a particularly good forum to do so.

Learn them with a tool like <URL:http://www.regexr.com/> and a tutorial
<URL:http://www.usna.edu/Users/cs/wcbrown/regexp/RegexpTutorial.html> or
something longer.

-- 
 \                “Fascism is capitalism plus murder.” —Upton Sinclair |
  `\                                                                   |
_o__)                                                                  |
Ben Finney



#98280

From: rurpy@yahoo.com
Date: 2015-11-04 21:54 -0800
Message-ID: <8d2e4d8d-dead-4ffd-b6fd-00d16cd0c26c@googlegroups.com>
In reply to: #98266
On Wednesday, November 4, 2015 at 7:31:34 PM UTC-7, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:13 am, rurpy wrote:
> 
> > There would be far fewer computer languages, and they would be much
> > more primitive if regular expressions (and the fundamental concepts
> > that they express) did not exist.
> 
> Well, that's certainly true. But only because contra-factual statements can
> imply the truth of anything. If squares had seven sides, then Joseph Stalin
> would have been the first woman to go to the moon on horseback.

Yes, thank you for that profoundly insightful comment.



#98292

From: Antoon Pardon <antoon.pardon@rece.vub.ac.be>
Date: 2015-11-05 10:14 +0100
Message-ID: <mailman.49.1446714861.16136.python-list@python.org>
In reply to: #98258
Op 05-11-15 om 01:33 schreef Chris Angelico:
> "I want to swim from Sydney to Los Angeles, but my gloves keep wearing
> out half way across the Pacific. How can I make my gloves strong
> enough to get me to LA?"
>
> Response 1: "If you use industrial-strength gloves and go via Papua
> New Guinea, you can double up the gloves and swim to LA."
>
> Response 2: "Swimming across the Pacific is a bad idea. Have you
> considered taking a boat or plane instead?"
>
> Which is the more helpful response? You can go ahead and assume the OP
> always knows best; I'm going to at least offer some alternatives.

What I see often enough doesn't look like offering an alternative but
more like strong argumentation against the direction the OP is going.

I have nothing against offering an alternative. There is the possibility
that there are better methods to solve the original problem and there
is nothing wrong with suggesting this possibility.

But there is also the possibility that the direction the OP is heading
is the correct one, even if you can't see it. Take the original question
on how to recognize a line that ends with a '*' with a regular expression.

What almost no one seems to have considered is that the real problem might
have been more involved: an excellent example of a problem you can solve
with regular expressions, in which only the subproblem of recognizing a '*'
at the end of the line was troublesome for the OP.

This is a possibility that is all too often ignored by the members on this
list. We advise people here to show just the barest code that still
shows the problem, yet we ignore that this affects the part of the problem
we get to see. Often enough people then insist on a better alternative
to deal with the problem, totally ignoring that this better alternative
may be totally useless in the original context.
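
To illustrate with a sketch: both functions below answer the reduced
question, but only the regex form carries over when the trailing '*'
is one piece of a larger pattern. The "name: value*" line format and
all names here are invented for the example; they are not the OP's
actual data.

```python
import re

# The backslash makes '*' a literal; '$' anchors the match at the end.
STAR_AT_END = re.compile(r"\*$")


def ends_with_star_regex(line):
    return STAR_AT_END.search(line) is not None


def ends_with_star_plain(line):
    return line.endswith("*")


# Hypothetical larger problem: lines such as "status: pending*", where the
# trailing '*' flags the field and must not end up in the captured value.
FLAGGED_FIELD = re.compile(r"^(\w+):\s*(.*?)\*$")


def parse_flagged_field(line):
    """Return (name, value) for a flagged line, or None if it does not match."""
    m = FLAGGED_FIELD.match(line)
    return m.groups() if m else None
```

One difference worth knowing: '$' also matches just before a trailing
newline, so the regex version accepts "abc*\n" while str.endswith
does not.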

-- 
Antoon Pardon



#98252

From: Seymore4Head <Seymore4Head@Hotmail.invalid>
Date: 2015-11-04 18:02 -0500
Message-ID: <1i3l3b9836hatsuoopak5gtg2c38g49kb1@4ax.com>
In reply to: #98207
On Wed, 04 Nov 2015 14:48:21 +1100, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:

>On Wednesday 04 November 2015 11:33, rurpy@yahoo.com wrote:
>
>>> Not quite.  Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox.  Regexs are like an electric impact socket wrench.  You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>> 
>> I consider regexs more fundemental
>
>I'm sure that there are people who consider the International Space Station 
>more fundamental than the lever, the wedge and the hammer, but they would be 
>wrong too.
>
>Given primitives for branching, loops and variables, you can build support 
>for regexes. Given regexes, how would you build support for variables?
>
>Of course, you could easily prove me wrong. All you would need to do to 
>demonstrate that regexes are more fundamental than branching, loops and 
>variables would be to demonstrate that the primitive operations available in 
>commonly used CPUs are regular expressions, and that (for example) C's for 
>loop and if...else are implemented in machine code as regular expressions, 
>rather than the other way around.

So far the only use I have for regex is to replace slicing, but I
think it is an improvement.



#98263

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-11-05 11:54 +1100
Message-ID: <563aa8be$0$1596$c3e8da3$5496439d@news.astraweb.com>
In reply to: #98252
On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:

> So far the only use I have for regex is to replace slicing, but I
> think it is an improvement.

I don't understand this. This is like saying "so far the only use I have for
a sandwich press is to replace my coffee pot". Regular expressions and
slicing do very different things.

Slicing extracts substrings, given known starting and ending positions:


py> the_str = "Now is the time for all good men..."
py> the_str[7:12]
'the t'


Regular expressions don't extract substrings with known start/end positions.
They *find* matching text, given a search string with metacharacters. (If
there are no metacharacters in your search string, you shouldn't use a
regex. str.find will be significantly faster and more convenient.)

Slicing is not about finding text, it is about extracting text once you've
already found it. So they are complementary, not alternatives.
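
A small sketch of how the two fit together, reusing the string from
above. The variable names and the pattern are only illustrative.

```python
import re

the_str = "Now is the time for all good men..."

# Fixed text: locate it with str.find, then slice with the positions found.
start = the_str.find("time")
end = start + len("time")
found_by_find = the_str[start:end]

# A pattern (any word beginning with "g"): let the regex do the finding,
# then slice with the positions reported by the match object.
m = re.search(r"\bg\w*", the_str)
found_by_regex = the_str[m.start():m.end()]
```

In both cases the finding step produces the positions and slicing does
the extracting; m.group() is simply a shortcut for that final slice.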



-- 
Steven



#98308

From: Seymore4Head <Seymore4Head@Hotmail.invalid>
Date: 2015-11-05 10:07 -0500
Message-ID: <ftrm3btf2a8ik9h1uora8p1ptq4sqand60@4ax.com>
In reply to: #98263
On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano
<steve@pearwood.info> wrote:

>On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
>
>> So far the only use I have for regex is to replace slicing, but I
>> think it is an improvement.
>
>I don't understand this. This is like saying "so far the only use I have for
>a sandwich press is to replace my coffee pot". Regular expressions and
>slicing do very different things.
>
>Slicing extracts substrings, given known starting and ending positions:
>
>
>py> the_str = "Now is the time for all good men..."
>py> the_str[7:12]
>'the t'
>
>
>Regular expressions don't extract substrings with known start/end positions.
>They *find* matching text, giving a search string with metacharacters. (If
>there are no metacharacters in your search string, you shouldn't use a
>regex. str.find will be significantly faster and more convenient.)
>
>Slicing is not about finding text, it is about extracting text once you've
>already found it. So they are complementary, not alternatives.

Here is an example of the text we are slicing apart.

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
	 by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
	 Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
	 by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
	 Sat, 05 Jan 2008 09:14:16 -0500
Received: from holes.mr.itd.umich.edu (holes.mr.itd.umich.edu
[141.211.14.79])
	by flawless.mail.umich.edu () with ESMTP id m05EEFR1013674;
	Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM paploo.uhi.ac.uk (app1.prod.collab.uhi.ac.uk
[194.35.219.184])
	BY holes.mr.itd.umich.edu ID 477F90B0.2DB2F.12494 ; 
	 5 Jan 2008 09:14:10 -0500
Received: from paploo.uhi.ac.uk (localhost [127.0.0.1])
	by paploo.uhi.ac.uk (Postfix) with ESMTP id 5F919BC2F2;
	Sat,  5 Jan 2008 14:10:05 +0000 (GMT)
Message-ID: <200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Received: from prod.collab.uhi.ac.uk ([194.35.219.182])
          by paploo.uhi.ac.uk (JAMES SMTP Server 2.1.3) with SMTP ID
899
          for <source@collab.sakaiproject.org>;
          Sat, 5 Jan 2008 14:09:50 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (nakamura.uits.iupui.edu
[134.68.220.122])
	by shmi.uhi.ac.uk (Postfix) with ESMTP id A215243002
	for <source@collab.sakaiproject.org>; Sat,  5 Jan 2008
14:13:33 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (localhost [127.0.0.1])
	by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11) with
ESMTP id m05ECJVp010329
	for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008 09:12:19
-0500
Received: (from apache@localhost)
	by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit)
id m05ECIaH010327
	for source@collab.sakaiproject.org; Sat, 5 Jan 2008 09:12:18
-0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender
to stephen.marquard@uct.ac.za using -f
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za

The practice problems are something like pull out all the email
addresses or pull out the days of the week and give the most common.



#98371

From: rurpy@yahoo.com
Date: 2015-11-06 12:46 -0800
Message-ID: <e17a32f3-3332-4452-bc26-c4097c137b78@googlegroups.com>
In reply to: #98308
On Thursday, November 5, 2015 at 8:12:22 AM UTC-7, Seymore4Head wrote:
> On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
> >On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
> >> So far the only use I have for regex is to replace slicing, but I
> >> think it is an improvement.
> >
> >I don't understand this. This is like saying "so far the only use I have for
> >a sandwich press is to replace my coffee pot". Regular expressions and
> >slicing do very different things.
> >[...]
> 
> Here is an example of the text we are slicing apart.
> 
>[...email headers...]
>
> The practice problems are something like pull out all the email
> addresses or pull out the days of the week and give the most common.

Yes, that is a perfectly appropriate use of regexes.

As Steven mentioned though, the term "slicing" is also used with a 
very specific and different meaning in Python, specifically referring
to a part of a list using a syntax like "alist[a:b]".  I can't seem
to get to python.org at the moment but if you look in the Python
docs index under "slicing" you'll find more info.
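
For what it's worth, here is a rough sketch of both practice problems.
The first four lines of SAMPLE are taken from your headers; the last
two "From " lines and their example.org addresses are invented so
that there is more than one day to count. The address pattern is
deliberately simple and is not a full RFC 5322 matcher.

```python
import re
from collections import Counter

SAMPLE = """\
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
From alice@example.org Fri Jan  4 18:10:48 2008
From bob@example.org Fri Jan  4 16:10:39 2008
"""

# Simplified address pattern: local part, '@', then dotted domain labels.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

# Envelope lines start with "From " (no colon); the day name follows the address.
DAY_RE = re.compile(r"^From \S+ (\w{3}) ", re.MULTILINE)

addresses = sorted(set(ADDRESS_RE.findall(SAMPLE)))
day_counts = Counter(DAY_RE.findall(SAMPLE))
most_common_day = day_counts.most_common(1)[0]
```

Note that the "From: " header line is skipped by DAY_RE because the
pattern requires a space directly after "From".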
 



#98137

From: Steven D'Aprano <steve@pearwood.info>
Date: 2015-11-03 18:15 +1100
Message-ID: <56385efc$0$1598$c3e8da3$5496439d@news.astraweb.com>
In reply to: #98130
On Tue, 3 Nov 2015 03:23 pm, rurpy@yahoo.com wrote:

> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool.  They are a fundamental part of
> computer science and are used in all sorts of matching and searching
> from compilers down to your work-a-day text editor.

You are absolutely right.

If only regular expressions weren't such an overly-terse, cryptic
mini-language, with all but no debugging capabilities, they would be great.

If only there wasn't an extensive culture of regular expression abuse within
programming communities, they would be fine.

All technologies are open to abuse. But we don't say:

  Some people, when confronted with a problem, think "I know, I'll use
  arithmetic." Now they have two problems.

because abuse of arithmetic is rare. It's hard to misuse it, and while
arithmetic can be complicated, it's rare for programmers to abuse it. But
the same cannot be said for regexes -- they are regularly misused, abused,
and down-right hard to use right even when you have a good reason for using
them:

http://www.thedailywtf.com/articles/Irregular_Expression

http://blog.codinghorror.com/regex-use-vs-regex-abuse/

http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html


If there is one person who has done more to create a regex culture, it is
Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
overused and their syntax is harmful, and he has recreated them for Perl 6:

http://www.perl.com/pub/2002/06/04/apo5.html

Oh, and the icing on the cake, regexes can be a security vulnerability too:

https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS



-- 
Steven



#98140

From: Nick Sarbicki <nick.a.sarbicki@gmail.com>
Date: 2015-11-03 08:43 +0000
Message-ID: <mailman.9.1446540211.8789.python-list@python.org>
In reply to: #98137
On Tue, Nov 3, 2015 at 7:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:

> On Tue, 3 Nov 2015 03:23 pm, rurpy@yahoo.com wrote:
>
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool.  They are a fundamental part of
> > computer science and are used in all sorts of matching and searching
> > from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse
> within
> programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html
>
>
> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
>
> Oh, and the icing on the cake, regexes can be a security vulnerability too:
>
>
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
>
>
>
> --
> Steven
>


+1

I agree that regex is an entirely necessary part of a programmer's toolkit,
but dear god some people need to be taught restraint. The majority of
people I talk about regex to have no idea when and where it shouldn't be
used.

As an example, part of my job is bringing our legacy Python code into the
modern day, and one of the largest roadblocks is the amount of regex used.

Some is necessary.

Some can be replaced by an `if word in str` or something similarly basic.

Some spans hundreds of lines and causes acute alopecia.

Just yesterday I found a colleague trying to parse HTML with regex.

So yes, teach regex, but teach it after the basics, and please emphasise
when it is appropriate to use it.

Yes I am bitter.

- Nick.



#98194

From: rurpy@yahoo.com
Date: 2015-11-03 16:22 -0800
Message-ID: <455b6498-5104-491a-98c2-6f7e48142496@googlegroups.com>
In reply to: #98137
On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
> 
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool.  They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
> 
> You are absolutely right.
> 
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
> 
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
> 
> All technologies are open to abuse. But we don't say:
> 
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
> 
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
> 
> http://www.thedailywtf.com/articles/Irregular_Expression
> 
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
> 
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 375000000 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the others
was machine-generated code, the other was a "look what you can do with
regexes" example, not serious code.

Here is an example of "abusing" python

  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html

I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
> 
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations and
making regexes even *more* powerful. (He turned them into full-blown
parsers.)

Nowhere does he advocate not using, or avoiding if possible, regexes
as is the mantra in this list.

Here is Larry's "recreation" that you are touting:

  http://design.perl6.org/S05.html

Please explain to us how you think this "fix" addresses the complaints
you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right after
pointing out that some claim Perl is hard to read due largely to regex
syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this list's
anti-regex attitude beyond some superficial verbiage taken out of context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python. There are (at time of
writing) 190 of them.

  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python

So if a security vulnerability is reason not to use regexes, we should
all be *running* from Python. I'm sure you'll point out that most have
been fixed.

But you failed to point out that the same is true of regex engines. From
your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited your
purpose, you could have tried the "Evil Regexes" noted in that source
and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input is
untrusted then sanitize it, or mitigate the threat by imposing a timeout,
etc. Not exactly a problem or solution unique to regexes. And common
sense should tell you that since there are a lot of "try a regex" web
sites, this is not a problem without a solution.
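
As a rough illustration of the "sanitize or mitigate" point, here is a minimal sketch of two common precautions when a pattern fragment or subject string comes from an untrusted source: escaping the fragment with re.escape so it is treated as literal text, and capping the input length before matching. The names MAX_INPUT_LEN and safe_contains are hypothetical, and the 1000-character limit is an arbitrary assumption rather than a recommendation.

```python
import re

# Hypothetical limit; choose one that suits the application.
MAX_INPUT_LEN = 1000


def safe_contains(untrusted_fragment, untrusted_text):
    """Return True if the text contains the fragment as literal text.

    Raises ValueError for inputs over MAX_INPUT_LEN so the regex engine
    never sees an unexpectedly large pattern or subject string.
    """
    if len(untrusted_text) > MAX_INPUT_LEN or len(untrusted_fragment) > MAX_INPUT_LEN:
        raise ValueError("input too long")
    # re.escape neutralises metacharacters such as +, * and ( so the
    # fragment cannot introduce nested quantifiers or other constructs.
    pattern = re.compile(re.escape(untrusted_fragment))
    return pattern.search(untrusted_text) is not None


print(safe_contains("(a+)+", "literal (a+)+ here"))  # True
print(safe_contains("(a+)+", "aaaaaaaa"))            # False
```

This only covers the case where the untrusted text should be matched literally. If users are allowed to supply real patterns, a time or resource limit outside the re module would be needed instead.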

And *certainly* not a reason not to use them in the *far* more common
case when they *are* trusted because you are in control of them.

Finally, preemptively, I'll repeat that I acknowledge regexes are not
the optimum solution in every case where they could be used. But they
are very useful when one passes the border of the trivial; and they are
nowhere near as bad as routinely portrayed here.

----
[*] Yes, I made that number up.


#98149

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-11-03 12:38 +0000
Message-ID: <n1a9rk$s40$1@dont-email.me>
In reply to: #98124

On Mon, 02 Nov 2015 22:17:49 -0500, Seymore4Head wrote:

> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
> <python.list@tim.thechases.com> wrote:
> 
>>On 2015-11-02 20:09, Seymore4Head wrote:

>>> How do I make a regular expression that returns true if the end of the
>>> line is an asterisk

>>Why use a regular expression?

> Because that is the part of Python I am trying to learn at the moment.

The most important thing to learn about regular expressions is when to 
use them and when not to use them.

Returning true if the last character in a string is an asterisk is almost 
certainly a brilliant example of when not to use a regular expression. 
Here are some timings I tested:

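#!/usr/bin/env python3

import re
import timeit

# Raw string so the backslash reaches the regex engine unchanged.
patt = re.compile(r"\*$")

# re.search is used throughout: re.match anchors at the start of the
# string, so r"\*$" would never match "test *".

# Pattern passed as a string on every call, no trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = re.search(r"\*$", "test 1")
elapsed = timeit.default_timer() - start_time
print("re, false", elapsed)

# Pattern passed as a string on every call, trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = re.search(r"\*$", "test *")
elapsed = timeit.default_timer() - start_time
print("re, true", elapsed)

# Precompiled pattern, no trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = patt.search("test 1")
elapsed = timeit.default_timer() - start_time
print("compiled re, false", elapsed)

# Precompiled pattern, trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = patt.search("test *")
elapsed = timeit.default_timer() - start_time
print("compiled re, true", elapsed)

# Plain comparison of the last character, no trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = "test 1"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print("char compare, false", elapsed)

# Plain comparison of the last character, trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = "test *"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print("char compare, true", elapsed)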

RESULTS:

re, false 2.4701731205
re, true 2.42048001289
compiled re, false 0.875837087631
compiled re, true 0.876382112503
char compare, false 0.26283121109
char compare, true 0.263465881348

The compiled re is about 3 times as fast as the uncompiled re. The 
character comparison is about 3 times as fast as the compiled re.
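
For anyone reproducing this comparison, a short sketch of how the candidate checks behave may help. Note that re.match only matches at the start of the string, so an end-of-line pattern generally needs re.search. str.endswith is the non-regex equivalent and, unlike indexing with [-1], copes with an empty string. The helper names below are hypothetical.

```python
import re

END_STAR = re.compile(r"\*$")


def ends_with_star_re(line):
    # search scans the whole string; match would only try position 0.
    return END_STAR.search(line) is not None


def ends_with_star_str(line):
    # No regex needed; safe on "", where line[-1] would raise IndexError.
    return line.endswith("*")


for sample in ["test *", "test 1", ""]:
    print(repr(sample), ends_with_star_re(sample), ends_with_star_str(sample))

# re.match anchors at the start, so it does not find the trailing asterisk.
print(re.match(r"\*$", "test *"))  # None
```

The two helpers differ on a string with a trailing newline: `$` also matches just before a final "\n", whereas endswith("*") does not.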

-- 
Denis McMahon, denismfmcmahon@gmail.com
