Re: Regular expressions

Newsgroups comp.lang.python
Date 2015-11-03 16:22 -0800
References (1 earlier) <mailman.0.1446519578.8789.python-list@python.org> <hp9g3b9hsn06edb0po8bduegjqkmpo4p8n@4ax.com> <mailman.3.1446523111.8789.python-list@python.org> <d39290cf-cb26-470f-a987-2f71e3860f97@googlegroups.com> <56385efc$0$1598$c3e8da3$5496439d@news.astraweb.com>
Message-ID <455b6498-5104-491a-98c2-6f7e48142496@googlegroups.com> (permalink)
Subject Re: Regular expressions
From rurpy@yahoo.com


On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
> 
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool.  They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
> 
> You are absolutely right.
> 
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
> 
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
> 
> All technologies are open to abuse. But we don't say:
> 
>   Some people, when confronted with a problem, think "I know, I'll use
>   arithmetic." Now they have two problems.
> 
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
> 
> http://www.thedailywtf.com/articles/Irregular_Expression
> 
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
> 
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 375,000,000 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the others
was machine-generated code; the other was a "look what you can do with
regexes" example, not serious code.

Here is an example of "abusing" Python:

  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html

I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
> 
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations and
making regexes even *more* powerful. (He turned them into full-blown
parsers.)

Nowhere does he advocate not using regexes, or avoiding them where
possible, as is the mantra on this list.

Here is Larry's "recreation" that you are touting:

  http://design.perl6.org/S05.html

Please explain to us how you think this "fix" addresses the complaints
you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right after
pointing out that some claim Perl is hard to read due largely to regex
syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this list's
anti-regex attitude beyond some superficial verbiage taken out of context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python. There are (at the time of
writing) 190 of them.

  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python

So if a security vulnerability is a reason not to use regexes, we
should all be *running* from Python. I'm sure you'll point out that most
have been fixed.

But you failed to point out that the same is true of regex engines. From
your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited your
purpose, you could have tried the "Evil Regexes" noted in that source
and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input is
untrusted, then sanitize it, or mitigate the threat by imposing a timeout,
etc. That is not exactly a problem, or a solution, unique to regexes. And
common sense should tell you that since plenty of "try a regex" web sites
run arbitrary patterns typed in by strangers, this is not a problem
without a solution.

And *certainly* not a reason not to use them in the *far* more common
case when they *are* trusted because you are in control of them.
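
The timeout mitigation mentioned above can be sketched in Python roughly
as follows, assuming that starting a fresh interpreter per search is an
acceptable cost: run the search in a child process and let
`subprocess.run` kill it if it overruns. The helper `search_with_timeout`
and its `max_len` length cap (a crude stand-in for "sanitize it") are
hypothetical, made up for this illustration, not part of any library.

```python
import json
import re
import subprocess
import sys

# Child program: reads [pattern, text] as JSON on stdin and prints
# "match" or "nomatch". Running it in a separate interpreter means a
# runaway search can be killed without harming the parent process.
_CHILD = (
    "import json, re, sys\n"
    "pattern, text = json.load(sys.stdin)\n"
    "print('match' if re.search(pattern, text) else 'nomatch')\n"
)


def search_with_timeout(pattern, text, timeout=1.0, max_len=10_000):
    """Hypothetical helper, not a library API.

    Return True or False for a match, or None if the text was rejected
    as too long or the search did not finish within `timeout` seconds.
    """
    re.compile(pattern)  # raise re.error here, in the parent, for bad patterns
    if len(text) > max_len:  # crude sanitizing; 10_000 is an arbitrary cap
        return None
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=json.dumps([pattern, text]),
            capture_output=True,
            text=True,
            timeout=timeout,  # on expiry, run() kills the child and raises
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout.strip() == "match"
```

For example, `search_with_timeout(r"\d+", "abc123")` returns `True`,
while a search that overruns `timeout` returns `None` instead of hanging
the caller. Spawning an interpreter per call is slow; a long-lived worker
process would amortize that, at the price of more code.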

Finally, preemptively, I'll repeat that I acknowledge regexes are not
the optimum solution in every case where they could be used. But they
are very useful once a problem passes the border of the trivial, and
they are nowhere near as bad as they are routinely portrayed here.
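
As a small illustration of a just-past-trivial case, and of one answer
to the quoted complaint that regexes are terse and cryptic, Python's
`re.VERBOSE` flag lets a pattern carry whitespace and comments. This is
only a sketch: the name `DATE_RE` and the pattern are made up for the
example, which parses the date format shown in this post's own header.

```python
import re

# re.VERBOSE ignores unescaped whitespace outside character classes and
# treats "#" as the start of a comment, so each piece can be annotated.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4}) - (?P<month>\d{2}) - (?P<day>\d{2})  # calendar date
    \s+
    (?P<hour>\d{2}) : (?P<minute>\d{2})                   # local time
    \s+
    (?P<offset>[+-]\d{4})                                 # UTC offset
    """,
    re.VERBOSE,
)

match = DATE_RE.match("2015-11-03 16:22 -0800")
print(match.groupdict())
# {'year': '2015', 'month': '11', 'day': '03', 'hour': '16', 'minute': '22', 'offset': '-0800'}
```

As for the debugging complaint, passing `re.DEBUG` to `re.compile`
prints the parsed structure of a pattern.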

----
[*] Yes, I made that number up.
