| Started by | Seymore4Head <Seymore4Head@Hotmail.invalid> |
|---|---|
| First post | 2015-11-02 20:09 -0500 |
| Last post | 2015-11-03 22:15 +0000 |
| Articles | 20 on this page of 106 — 30 participants |
Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 20:09 -0500
Re: Regular expressions MRAB <python@mrabarnett.plus.com> - 2015-11-03 01:19 +0000
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-02 20:42 -0600
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-02 22:17 -0500
Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-02 22:58 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:38 -0700
Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:33 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 19:04 -0700
Re: Regular expressions Dan Sommers <dan@tombstonezero.net> - 2015-11-04 02:55 +0000
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:23 +1100
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-03 20:47 -0700
Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-04 13:27 +0000
Re: Regular expressions Nobody <nobody@nowhere.invalid> - 2015-11-04 05:05 +0000
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-04 09:57 +0100
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:28 +1100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 20:48 -0600
Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:03 +1100
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-05 09:33 +0100
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 23:05 +1100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-05 08:00 -0600
Re: Regular expressions Albert van der Horst <albert@spenarnc.xs4all.nl> - 2015-11-05 13:39 +0000
Re: Regular expressions Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-04 08:00 -0500
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-04 08:13 -0700
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:00 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:24 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:24 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:59 -0800
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 09:18 +0100
Re: Regular expressions rurpy@yahoo.com - 2015-11-06 11:52 -0800
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-06 21:36 +0100
Re: Regular expressions Larry Martell <larry.martell@gmail.com> - 2015-11-06 15:42 -0500
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:34 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 22:27 -0800
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:42 -0600
Re: Regular expressions Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2015-11-05 20:55 +1300
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:06 +1100
What does “grep” stand for? (was: Regular expressions) Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 05:24 +1100
Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 20:38 +0100
Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:42 +1100
Re: What does “grep” stand for? Christian Gollwitzer <auriocus@gmx.de> - 2015-11-05 08:32 +0100
Re: What does “grep” stand for? Chris Angelico <rosuav@gmail.com> - 2015-11-05 19:00 +1100
Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 10:19 -0500
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 18:29 +0000
Re: What does “grep” stand for? Random832 <random832@fastmail.com> - 2015-11-05 14:56 -0500
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-05 20:19 +0000
Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-05 20:18 -0500
Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-05 19:36 -0800
Re: What does “grep” stand for? Dan Sommers <dan@tombstonezero.net> - 2015-11-06 05:31 +0000
Re: What does “grep” stand for? William Ray Wing <wrw@mac.com> - 2015-11-06 08:25 -0500
Re: What does “grep” stand for? Larry Hudson <orgnut@yahoo.com> - 2015-11-06 19:21 -0800
Re: What does “grep” stand for? Grant Edwards <invalid@invalid.invalid> - 2015-11-06 14:15 +0000
Re: What does “grep” stand for? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2015-11-06 20:03 -0500
Re: What does “grep” stand for? (was: Regular expressions) Tim Chase <python.list@tim.thechases.com> - 2015-11-04 13:05 -0600
Re: Regular expressions Terry Reedy <tjreedy@udel.edu> - 2015-11-04 18:08 -0500
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:29 -0500
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 21:12 -0600
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 14:26 +1100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:48 +1100
Re: Regular expressions Christian Gollwitzer <auriocus@gmx.de> - 2015-11-04 08:21 +0100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 19:47 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:43 -0800
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 06:38 -0800
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 01:52 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 16:13 -0800
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-05 11:33 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:42 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 13:26 +1100
Re: Regular expressions Ben Finney <ben+python@benfinney.id.au> - 2015-11-05 14:07 +1100
Re: Regular expressions rurpy@yahoo.com - 2015-11-04 21:54 -0800
Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-05 10:14 +0100
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-04 18:02 -0500
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-05 11:54 +1100
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-05 10:07 -0500
Re: Regular expressions rurpy@yahoo.com - 2015-11-06 12:46 -0800
Re: Regular expressions Steven D'Aprano <steve@pearwood.info> - 2015-11-03 18:15 +1100
Re: Regular expressions Nick Sarbicki <nick.a.sarbicki@gmail.com> - 2015-11-03 08:43 +0000
Re: Regular expressions rurpy@yahoo.com - 2015-11-03 16:22 -0800
Re: Regular expressions Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-03 12:38 +0000
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:53 -0600
Re: Regular expressions Joel Goldstick <joel.goldstick@gmail.com> - 2015-11-03 10:34 -0500
Re: Regular expressions Seymore4Head <Seymore4Head@Hotmail.invalid> - 2015-11-03 11:10 -0500
Re: Regular expressions Chris Angelico <rosuav@gmail.com> - 2015-11-04 03:20 +1100
Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:35 +1100
Re: Regular expressions Antoon Pardon <antoon.pardon@rece.vub.ac.be> - 2015-11-04 12:41 +0100
Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 14:56 +0000
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 20:51 -0700
Re: Regular expressions rurpy@yahoo.com - 2015-11-02 20:23 -0800
Re: Regular expressions Michael Torrie <torriem@gmail.com> - 2015-11-02 21:33 -0700
Re: Regular expressions Robin Koch <robin.koch@t-online.de> - 2015-11-03 23:58 +0100
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 10:25 +0100
Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 05:50 -0600
Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 15:00 +0100
Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 17:12 +0200
Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 16:35 +0100
Re: Irregular last line in a text file, was Re: Regular expressions Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-11-03 18:42 +0200
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 10:56 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-11-04 14:39 +1100
Re: Irregular last line in a text file, was Re: Regular expressions Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-04 10:07 +0000
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-04 09:33 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Peter Otten <__peter__@web.de> - 2015-11-03 18:44 +0100
Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:33 -0700
Re: Irregular last line in a text file, was Re: Regular expressions Ian Kelly <ian.g.kelly@gmail.com> - 2015-11-03 11:39 -0700
Re: Irregular last line in a text file, was Re: Regular expressions Tim Chase <python.list@tim.thechases.com> - 2015-11-03 13:45 -0600
Re: Irregular last line in a text file, was Re: Regular expressions Grant Edwards <invalid@invalid.invalid> - 2015-11-03 22:15 +0000
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2015-11-04 08:21 +0100 |
| Message-ID | <n1cbi4$ao5$1@dont-email.me> |
| In reply to | #98207 |
On 04.11.15 at 04:48, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 11:33, rurpy@yahoo.com wrote:
>
>>> Not quite. Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox. Regexs are like an electric impact socket wrench. You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexs more fundamental
>
> I'm sure that there are people who consider the International Space Station
> more fundamental than the lever, the wedge and the hammer, but they would be
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong.

You *know* that they are not equivalent, I assume? Regexes are equivalent to finite state machines, which are less powerful than Turing machines, and even less powerful than stack machines. You can't even construct a regexp which validates whether parentheses are balanced.

What rurpy meant was that regexes can surface to a computer user earlier than variables and branches; a user who does not go into the depth to actually program the machine might still encounter them in a text editor or database engine. Even some web forms allow some limited form, like e.g. the DVD rental here or Google.

Christian
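
The point about balanced parentheses can be sketched in Python. This is only an illustration under stated assumptions: the names `balanced`, `DEPTH_2` and `balanced_up_to_depth_2` are hypothetical and do not come from the thread, and the pattern only considers strings made up entirely of parentheses. A counter (in effect a one-symbol stack) handles any nesting depth, while a pattern built from plain regular-expression operators has to hard-code a maximum depth.

```python
import re


def balanced(text: str) -> bool:
    """Check parenthesis balance with a counter (a one-symbol stack)."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0


# Built only from concatenation and repetition, so it is a true regular
# expression. It accepts nesting up to depth 2 and nothing deeper.
DEPTH_2 = re.compile(r"(?:\((?:\(\))*\))*")


def balanced_up_to_depth_2(text: str) -> bool:
    """Accept balanced strings of parentheses nested at most two deep."""
    return DEPTH_2.fullmatch(text) is not None
```

Here `balanced("((()))")` is true, but `balanced_up_to_depth_2("((()))")` is false: supporting depth 3 would need a longer pattern, depth 4 a longer one still, and no finite pattern of this kind covers every depth.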
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2015-11-04 19:47 +1100 |
| Message-ID | <5639c63f$0$2897$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #98210 |
On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:

> What rurpy meant, was that regexes can surface to a computer user
> earlier than variables and branches; a user who does not go into the
> depth to actually program the machine, might still encounter them in a
> text editor or database engine. Even some web forms allow some limited
> form, like e.g. the DVD rental here or Google.

What Rurpy meant, only Rurpy can say, but I doubt that is what he is talking about. By that logic, a full-screen high-def 3D first-person shooter game with an advanced AI is "more fundamental" than an assembly language branch operation, because there are people who play computer games without doing assembly programming.

In context, Michael suggested that programmers should learn the basic fundamentals of their chosen language, such as variables, for-loops and branching, before regexes -- which Rurpy then disagreed with, claiming that regexes are more fundamental than those basic operations.

What *I* think that Rurpy means is that one can construct a mathematical system based on pattern matching which is Turing complete, and therefore in principle any problem you can solve using a program written in (say) Python, C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!) could be written as a sufficiently complex regular expression.

I think he is *technically wrong*, if by "regex" we mean actual regular expressions. Perl, and Python, regexes are strictly more powerful than regular expressions (despite the name). I know that Perl regexes are Turing complete (mainly because they can call out to the Perl interpreter), I'm not sure about Python regexes.

But I also think that Rurpy is *not even wrong* if he means Perl or Python regexes. The (entirely theoretical) ability to solve a problem like "What is pi to the power of the first prime number larger than 97531000?" using a regex doesn't make regexes more fundamental than variables, branches and loops. It just makes them an alternative computing paradigm -- one which is *exponentially* more difficult to use than the standard paradigms of functional, procedural, OOP, etc. for anything except the limited subset of pattern matching problems they were created for.

--
Steve
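
One concrete way to see that Python regexes go beyond textbook regular expressions is the backreference. The sketch below is an illustration only; the names `SQUARE` and `is_square` are made up for this example. The language of "some non-empty text immediately repeated" is not regular, so no finite state machine recognises it, yet a pattern with `\1` matches it directly.

```python
import re

# (.+) captures some non-empty text and \1 demands the same text again.
# Strings of the form "ww" are a classic non-regular language.
SQUARE = re.compile(r"(.+)\1")


def is_square(text: str) -> bool:
    """Return True if text is some non-empty string repeated exactly twice."""
    return SQUARE.fullmatch(text) is not None
```

With this, `is_square("abcabc")` is true and `is_square("abcabd")` is false. It says nothing about whether Python regexes are Turing complete, only that they are more than finite automata.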
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 06:43 -0800 |
| Message-ID | <f0ae8c1e-5217-495f-842f-b5c6d86a3e8a@googlegroups.com> |
| In reply to | #98214 |
On Wednesday, November 4, 2015 at 1:52:31 AM UTC-7, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 18:21, Christian Gollwitzer wrote:
>
> > What rurpy meant, was that regexes can surface to a computer user
> > earlier than variables and branches; a user who does not go into the
> > depth to actually program the machine, might still encounter them in a
> > text editor or database engine. Even some web forms allow some limited
> > form, like e.g. the DVD rental here or Google.
> [...]
> What *I* think that Rurpy means is that one can construct a mathematical
> system based on pattern matching which is Turing complete, and therefore in
> principle any problem you can solve using a program written in (say) Python,
> C, Lisp, Smalltalk, etc, or execute on a CPU (or simulate in your head!)
> could be written as a sufficiently complex regular expression.

No, Christian was correct.
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 06:38 -0800 |
| Message-ID | <89a2a4a7-f483-4e94-9f68-ba77ce4b7598@googlegroups.com> |
| In reply to | #98207 |
On 11/03/2015 08:48 PM, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 11:33, rurpy wrote:
>
>>> Not quite. Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox. Regexs are like an electric impact socket wrench. You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexs more fundamental
>
> I'm sure that there are people who consider the International Space Station
> more fundamental than the lever, the wedge and the hammer, but they would be
> wrong too.
>
> Given primitives for branching, loops and variables, you can build support
> for regexes. Given regexes, how would you build support for variables?
>
> Of course, you could easily prove me wrong. All you would need to do to
> demonstrate that regexes are more fundamental than branching, loops and
> variables would be to demonstrate that the primitive operations available in
> commonly used CPUs are regular expressions, and that (for example) C's for
> loop and if...else are implemented in machine code as regular expressions,
> rather than the other way around.

I'm afraid you are making a category error but perhaps that's in part because I wasn't clear. I was not talking about computer science. I was talking about human beings learning about computers.

Most people I know consider programming to be a higher level activity than "using" a computer: editing, sending email etc. Many computer users (not programmers) learn to use regular expressions as part of using a computer without knowing anything about programming. It was on that basis I called them more fundamental -- something learned earlier which is expanded on and added to later.

But you have a bit of a point, perhaps "fundamental" was not the best choice of word to communicate that. Here is what I wrote:

> I consider regexs more fundamental. One need not even be a programmer
> to use them: consider grep, sed, a zillion editors, database query
> languages, etc.

I thought the context, which you removed even to the point of cutting text from the very same line you quoted, made that clear but perhaps not.

Indeed it is quite eye-opening when one does learn a little CS and discovers these things that were just a useful "feature" actually have a deep and profound theoretical basis.
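
The quoted claim that branching, loops and variables are enough to build regex support can be made concrete with a very small matcher. This is a sketch only, loosely following the well-known recursive matcher from Kernighan and Pike's "The Practice of Programming", and it is an assumption of this example that a five-feature subset is enough to make the point: literals, `.`, `*`, `^` and `$`. The function names are illustrative, and nothing here reflects how Python's `re` module is actually implemented.

```python
def match(pattern: str, text: str) -> bool:
    """Search for pattern anywhere in text (literals, '.', '*', '^', '$')."""
    if pattern.startswith("^"):
        return match_here(pattern[1:], text)
    # Try every starting position, including the empty tail of the text.
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))


def match_here(pattern: str, text: str) -> bool:
    """Match pattern at the very start of text."""
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == "*":
        return match_star(pattern[0], pattern[2:], text)
    if pattern == "$":
        return text == ""
    if text and pattern[0] in (".", text[0]):
        return match_here(pattern[1:], text[1:])
    return False


def match_star(ch: str, pattern: str, text: str) -> bool:
    """Match zero or more copies of ch, then the rest of the pattern."""
    i = 0
    while True:
        if match_here(pattern, text[i:]):
            return True
        if i < len(text) and ch in (".", text[i]):
            i += 1  # consume one more copy of ch and try again
        else:
            return False
```

For example, `match("^ab*c$", "abbbc")` and `match("a.c", "xxabcxx")` both succeed, using nothing beyond `if`, `while`, recursion and a loop counter.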
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-11-05 01:52 +1100 |
| Message-ID | <mailman.18.1446648732.16136.python-list@python.org> |
| In reply to | #98228 |
On Thu, Nov 5, 2015 at 1:38 AM, rurpy--- via Python-list <python-list@python.org> wrote:
> I'm afraid you are making a category error but perhaps that's in
> part because I wasn't clear. I was not talking about computer
> science. I was talking about human beings learning about computers.
> Most people I know consider programming to be a higher level activity
> than "using" a computer: editing, sending email etc. Many computer
> users (not programmers) learn to use regular expressions as part
> of using a computer without knowing anything about programming.
> It was on that basis I called them more fundamental -- something
> learned earlier which is expanded on and added to later. But you
> have a bit of a point, perhaps "fundamental" was not the best choice
> of word to communicate that.

The "fundamentals" of something are its most basic functions, not its most basic uses. The most common use of a computer might be to browse the web, but the fundamental functionality is arithmetic and logic.

Setting aside the choice of word, though, I still don't think regular expressions are a more basic use of computing than loops and conditionals. A regex can't be used for anything other than string matching; they exist for one purpose, and one purpose only: to answer the question "Does this string match this pattern?". Sure, you can abuse that into a primality check and other forms of crazy arithmetic, but it's not what they truly do.

I also would not teach regexes to people as part of an "introduction to computing" course, any more than I would teach the use of Microsoft Excel, which some such courses have been known to do. (And no, it's not because of the Microsoftness. I wouldn't teach LibreOffice Calc either.) You don't need to know how to work a spreadsheet as part of the basics of computer usage, and you definitely don't need an advanced form of text search.

ChrisA
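
For readers who have not seen the primality "abuse" mentioned above, here is a sketch of the widely circulated unary trick. The names `NOT_PRIME` and `is_prime` are chosen for this example, and the approach is a curiosity rather than something to use in practice: it relies on a backreference and on backtracking, so it gets slow quickly.

```python
import re

# Write n in unary as n copies of "1".
#   1?          matches 0 and 1, which are not prime
#   (11+?)\1+   matches a group of two or more 1s repeated at least twice,
#               i.e. any number with a non-trivial divisor
NOT_PRIME = re.compile(r"1?|(11+?)\1+")


def is_prime(n: int) -> bool:
    """Return True if n is prime, decided purely by a failed pattern match."""
    return NOT_PRIME.fullmatch("1" * n) is None
```

Filtering `range(20)` with `is_prime` leaves 2, 3, 5, 7, 11, 13, 17 and 19.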
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 16:13 -0800 |
| Message-ID | <74351250-2c5b-439d-abc9-65e3480cd930@googlegroups.com> |
| In reply to | #98230 |
On 11/04/2015 07:52 AM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>> I'm afraid you are making a category error but perhaps that's in
>> part because I wasn't clear. I was not talking about computer
>> science. I was talking about human beings learning about computers.
>> Most people I know consider programming to be a higher level activity
>> than "using" a computer: editing, sending email etc. Many computer
>> users (not programmers) learn to use regular expressions as part
>> of using a computer without knowing anything about programming.
>> It was on that basis I called them more fundamental -- something
>> learned earlier which is expanded on and added to later. But you
>> have a bit of a point, perhaps "fundamental" was not the best choice
>> of word to communicate that.
>
> The "fundamentals" of something are its most basic functions, not its
> most basic uses. The most common use of a computer might be to browse
> the web, but the fundamental functionality is arithmetic and logic.

If one accepted that then one would have to reject the term "fundamental use" as meaningless. A quick trip to google shows that's not true.

> Setting aside the choice of word, though, I still don't think regular
> expressions are a more basic use of computing than loops and
> conditionals. A regex can't be used for anything other than string
> matching; they exist for one purpose, and one purpose only: to answer
> the question "Does this string match this pattern?".

But string matching *is* a fundamental problem that arises frequently in many aspects of CS, programming and, as I mentioned, day-to-day computer use. Saying it's "only" for pattern matching is like saying floating point numbers are "only" for doing non-integer arithmetic, or unicode is "only" for representing text. (Neither of those is a good analogy because both lack the important theoretical underpinnings that regular expressions have [*]).

There would be far fewer computer languages, and they would be much more primitive if regular expressions (and the fundamental concepts that they express) did not exist.

To be sure, I did gloss over Michael Torrie's point that there are other concepts that are more basic in the context of learning programming, he was correct about that.

But that does not negate the fact that regexes are important and fundamental. They are both very useful in a practical sense (they are even available in Microsoft Excel) and important in a theoretical sense. You are not well rounded as a programmer if you decline to learn about regular expressions because "they are too cryptic", or "I can do in code anything they do".

I think the constant negative reception the posters receive here when they ask about regexes does them a great disservice.

By all means point out that python offers a number of functions that can avoid the need for using regexes in simple cases. Even point out that you (the plural you) don't like them and prefer other solutions (like writing code that does the same thing in a more half-assed bug ridden way, the posts in this thread being a case in point.)

But I really wish every mention of regexes here wasn't reflexively greeted with a barrage of negative comments and that lame "two problems" quote, especially without an answer to the poster's regex question.

> Sure, you can
> abuse that into a primality check and other forms of crazy arithmetic,
> but it's not what they truly do. I also would not teach regexes to
> people as part of an "introduction to computing" course, any more than
> I would teach the use of Microsoft Excel, which some such courses have
> been known to do. (And no, it's not because of the Microsoftness. I
> wouldn't teach LibreOffice Calc either.) You don't need to know how to
> work a spreadsheet as part of the basics of computer usage, and you
> definitely don't need an advanced form of text search.

Seems to me that clearly depends on the intent of the class, the students' goals, what they'll be studying after the class, what their current level of knowledge is, etc. Your scenario seems way too under-specified to say anything definitive. And further, the pedagogy of CS (or of any subject of education) is not "settled science" and that kind of question almost never has a clear right/wrong answer.

This list is not a class. If someone comes here with a question about Python's regexes they deserve an answer and not be bombarded with reasons why they shouldn't be using regexes beyond mentioning some of the alternatives in a "oh, by the way" way. (And yes, I recognize in this case the OP did get a good answer from MRAB early on.)

----
[*] yes, I know there is a lot of CS theory underlying floating point. I don't think it is as deep or as important as that underlying regexes, automata and language.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-11-05 11:33 +1100 |
| Message-ID | <mailman.37.1446683628.16136.python-list@python.org> |
| In reply to | #98258 |
On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list <python-list@python.org> wrote:
> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>> I'm afraid you are making a category error but perhaps that's in
>>> part because I wasn't clear. I was not talking about computer
>>> science. I was talking about human beings learning about computers.
>>> Most people I know consider programming to be a higher level activity
>>> than "using" a computer: editing, sending email etc. Many computer
>>> users (not programmers) learn to use regular expressions as part
>>> of using a computer without knowing anything about programming.
>>> It was on that basis I called them more fundamental -- something
>>> learned earlier which is expanded on and added to later. But you
>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>> of word to communicate that.
>>
>> The "fundamentals" of something are its most basic functions, not its
>> most basic uses. The most common use of a computer might be to browse
>> the web, but the fundamental functionality is arithmetic and logic.
>
> If one accepted that then one would have to reject the term "fundamental
> use" as meaningless. A quick trip to google shows that's not true.

A quick trip to Google showed me that there are a number of uses of the phrase, mostly in scientific papers and such. I've no idea how that helps your argument.

> But string matching *is* a fundamental problem that arises frequently
> in many aspects of CS, programming and, as I mentioned, day-to-day
> computer use. Saying it's "only" for pattern matching is like saying
> floating point numbers are "only" for doing non-integer arithmetic,
> or unicode is "only" for representing text. (Neither of those is a
> good analogy because both lack the important theoretical underpinnings
> that regular expressions have [*]).

String matching does happen a lot. How often do you actually need pattern matching? Most of the time, you're doing equality checks - or prefix/suffix checks, at best.

> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.

So? There would also be far fewer computer languages if braces didn't exist, because we wouldn't have the interminable arguments about whether they're good or not.

> To be sure, I did gloss over Michael Torrie's point that there are
> other concepts that are more basic in the context of learning
> programming, he was correct about that.
>
> But that does not negate the fact that regexes are important and
> fundamental. They are both very useful in a practical sense (they
> are even available in Microsoft Excel) and important in a theoretical
> sense. You are not well rounded as a programmer if you decline to
> learn about regular expressions because "they are too cryptic", or
> "I can do in code anything they do".

You've proven that they are important, but in no way have you proven them fundamental. A regular expression library is the ideal solution to the problem "I want to let my users search for patterns of their own choosing". That's great, but it's only one specific class of problem.

> I think the constant negative reception the posters receive here when
> they ask about regexes does them a great disservice.
>
> By all means point out that python offers a number of functions that
> can avoid the need for using regexes in simple cases. Even point out
> that you (the plural you) don't like them and prefer other solutions
> (like writing code that does the same thing in a more half-assed bug
> ridden way, the posts in this thread being a case in point.)
>
> But I really wish every mention of regexes here wasn't reflexively
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.

When has that happened? Usually there'll be at least two answers - one that uses a regex and one that doesn't - and people get to read both.

>> Sure, you can
>> abuse that into a primality check and other forms of crazy arithmetic,
>> but it's not what they truly do. I also would not teach regexes to
>> people as part of an "introduction to computing" course, any more than
>> I would teach the use of Microsoft Excel, which some such courses have
>> been known to do. (And no, it's not because of the Microsoftness. I
>> wouldn't teach LibreOffice Calc either.) You don't need to know how to
>> work a spreadsheet as part of the basics of computer usage, and you
>> definitely don't need an advanced form of text search.
>
> Seems to me that clearly depends on the intent of the class, the students'
> goals, what they'll be studying after the class, what their current
> level of knowledge is, etc. Your scenario seems way too under-specified
> to say anything definitive. And further, the pedagogy of CS (or of any
> subject of education) is not "settled science" and that kind of question
> almost never has a clear right/wrong answer.

Uhh, "introduction to computing". What's the current level of knowledge? Close to zero. That's the whole point of an introductory class. It's a place where you teach the basics.

> This list is not a class. If someone comes here with a question about
> Python's regexes they deserve an answer and not be bombarded with reasons
> why they shouldn't be using regexes beyond mentioning some of the alternatives
> in a "oh, by the way" way. (And yes, I recognize in this case the OP did
> get a good answer from MRAB early on.)

"I want to swim from Sydney to Los Angeles, but my gloves keep wearing out half way across the Pacific. How can I make my gloves strong enough to get me to LA?"

Response 1: "If you use industrial-strength gloves and go via Papua New Guinea, you can double up the gloves and swim to LA."

Response 2: "Swimming across the Pacific is a bad idea. Have you considered taking a boat or plane instead?"

Which is the more helpful response? You can go ahead and assume the OP always knows best; I'm going to at least offer some alternatives.

ChrisA
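
The equality, prefix and suffix checks mentioned above map onto plain `str` methods in Python, with `re` only coming in once the structure in the middle of the string matters. The following is a small sketch; the filename and the variable names are hypothetical values invented for this example.

```python
import re

filename = "report_2015-11-04.csv"  # hypothetical example value

# Plain string methods cover equality, prefix and suffix checks.
is_report = filename.startswith("report_")
is_csv = filename.endswith((".csv", ".tsv"))  # a tuple tests several suffixes

# A pattern becomes useful when the pieces in the middle need to be
# validated and pulled apart at the same time.
DATED = re.compile(r"report_(\d{4})-(\d{2})-(\d{2})\.csv")
found = DATED.fullmatch(filename)
date_parts = found.groups() if found else None
```

Both styles coexist comfortably: the first two checks read as plain English, while the pattern validates the date shape and extracts year, month and day in one step.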
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 21:42 -0800 |
| Message-ID | <a33ce924-c795-447b-8fb9-d7e01fee8936@googlegroups.com> |
| In reply to | #98260 |
On 11/04/2015 05:33 PM, Chris Angelico wrote:
> On Thu, Nov 5, 2015 at 11:13 AM, rurpy--- via Python-list
> <python-list@python.org> wrote:
>> On 11/04/2015 07:52 AM, Chris Angelico wrote:
>>> On Thu, Nov 5, 2015 at 1:38 AM, rurpy wrote:
>>>> I'm afraid you are making a category error but perhaps that's in
>>>> part because I wasn't clear. I was not talking about computer
>>>> science. I was talking about human beings learning about computers.
>>>> Most people I know consider programming to be a higher level activity
>>>> than "using" a computer: editing, sending email etc. Many computer
>>>> users (not programmers) learn to use regular expressions as part
>>>> of using a computer without knowing anything about programming.
>>>> It was on that basis I called them more fundamental -- something
>>>> learned earlier which is expanded on and added to later. But you
>>>> have a bit of a point, perhaps "fundamental" was not the best choice
>>>> of word to communicate that.
>>>
>>> The "fundamentals" of something are its most basic functions, not its
>>> most basic uses. The most common use of a computer might be to browse
>>> the web, but the fundamental functionality is arithmetic and logic.
>>
>> If one accepted that then one would have to reject the term "fundamental
>> use" as meaningless. A quick trip to google shows that's not true.
>
> A quick trip to Google showed me that there are a number of uses of
> the phrase, mostly in scientific papers and such. I've no idea how
> that helps your argument.

I was showing that your objection to my use of "fundamental" on the
grounds it does not apply to "use" is patently silly.

From Google:
  interferes with B's more fundamental use
  because fundamental use of english
  The fundamental use of testing
  Fundamental Use of the Michigan Terminal System
  negotiate a fundamental use and exchange of power
  the most fundamental use of pointers
  makes fundamental use of statistical theory

This is what I meant in a recent post when I referred to the
Alice-in-Wonderland nature of this group.

I'm afraid I don't have the time or interest to discuss basic English
with you. If you want to maintain that "fundamental" does not apply to
"use" please go right ahead, it's your credibility at risk.

>> But string matching *is* a fundamental problem that arises frequently
>> in many aspects of CS, programming and, as I mentioned, day-to-day
>> computer use. Saying it's "only" for pattern matching is like saying
>> floating point numbers are "only" for doing non-integer arithmetic,
>> or unicode is "only" for representing text. (Neither of those is a
>> good analogy because both lack the important theoretical underpinnings
>> that regular expressions have [*]).
>
> String matching does happen a lot. How often do you actually need
> pattern matching? Most of the time, you're doing equality checks - or
> prefix/suffix checks, at best.
>
>> There would be far fewer computer languages, and they would be much
>> more primitive if regular expressions (and the fundamental concepts
>> that they express) did not exist.
>
> So? There would also be far fewer computer languages if braces didn't
> exist, because we wouldn't have the interminable arguments about
> whether they're good or not.

Sorry, that makes no sense to me.

>> To be sure, I did gloss over Michael Torrie's point that there are
>> other concepts that are more basic in the context of learning
>> programming, he was correct about that.
>>
>> But that does not negate the fact that regexes are important and
>> fundamental. They are both very useful in a practical sense (they
>> are even available in Microsoft Excel) and important in a theoretical
>> sense. You are not well rounded as a programmer if you decline to
>> learn about regular expressions because "they are too cryptic", or
>> "I can do in code anything they do".
>
> You've proven that they are important, but in no way have you proven
> them fundamental. A regular expression library is the ideal solution
> to the problem "I want to let my users search for patterns of their
> own choosing". That's great, but it's only one specific class of
> problem.

If you think that is the sole use of pattern matching or even the most
important use, I can understand why you find regexes fairly useless.

Lexing (tokenization) and simple parsing are often done with regular
expressions. Many dozens of times a year I write programs to extract
or munge data in text files. Three days ago I had to extract data from
a 500MB log file for insertion in a database that used many regexes,
even some that could have been replaced by Python methods. But mixing
the two approaches would have been less clear than using regexes
consistently. Text recognition and modification is an *extremely*
common need, not some niche application as you suggest.

>> I think the constant negative reception the posters receive here when
>> they ask about regexes does them a great disservice.
>>
>> By all means point out that python offers a number of functions that
>> can avoid the need for using regexes in simple cases. Even point out
>> that you (the plural you) don't like them and prefer other solutions
>> (like writing code that does the same thing in a more half-assed bug
>> ridden way, the posts in this thread being a case in point.)
>>
>> But I really wish every mention of regexes here wasn't reflexively
>> greeted with a barrage of negative comments and that lame "two problems"
>> quote, especially without an answer to the poster's regex question.
>
> When has that happened? Usually there'll be at least two answers - one
> that uses a regex and one that doesn't - and people get to read both.

No, usually there is one answer with a regex, five advising against
regexes, and two with the silly "two problems" quote. The impression
one is left with is that regexes are bad and to be avoided. Rarely to
never does one see a response encouraging a poster to learn about and
use regular expressions which is why I spoke up this time.

>>> Sure, you can
>>> abuse that into a primality check and other forms of crazy arithmetic,
>>> but it's not what they truly do. I also would not teach regexes to
>>> people as part of an "introduction to computing" course, any more than
>>> I would teach the use of Microsoft Excel, which some such courses have
>>> been known to do. (And no, it's not because of the Microsoftness. I
>>> wouldn't teach LibreOffice Calc either.) You don't need to know how to
>>> work a spreadsheet as part of the basics of computer usage, and you
>>> definitely don't need an advanced form of text search.
>>
>> Seems to me that clearly depends on the intent of the class, the students'
>> goals, what they'll be studying after the class, what their current
>> level of knowledge is, etc. Your scenario seems way too under-specified
>> to say anything definitive. And further, the pedagogy of CS (or of any
>> subject of education) is not "settled science" and that kind of question
>> almost never has a clear right/wrong answer.
>
> Uhh, "introduction to computing". What's the current level of
> knowledge? Close to zero. That's the whole point of an introductory
> class. It's a place where you teach the basics.

"Introduction to computing" covers everything from teaching unemployed
people how to use Word and Excel to a first "algorithms and data
structures" for AP high-school kids to programming with a heavy dose
of hardware architecture. What "the basics" are is, as far as I know,
still the subject of debate and research among professional educators.

>> This list is not a class. If someone comes here with a question about
>> Python's regexes they deserve an answer and not be bombarded with reasons
>> why they shouldn't be using regexes beyond mentioning some of the alternatives
>> in a "oh, by the way" way. (And yes, I recognize in this case the OP did
>> get a good answer from MRAB early on.)
>
> "I want to swim from Sydney to Los Angeles, but my gloves keep wearing
> out half way across the Pacific. How can I make my gloves strong
> enough to get me to LA?"
>
> Response 1: "If you use industrial-strength gloves and go via Papua
> New Guinea, you can double up the gloves and swim to LA."
>
> Response 2: "Swimming across the Pacific is a bad idea. Have you
> considered taking a boat or plane instead?"
>
> Which is the more helpful response? You can go ahead and assume the OP
> always knows best; I'm going to at least offer some alternatives.

Using a regular expression (even when there are other alternatives)
is not analogous to "Swimming across the Pacific". (Back in Wonderland
again.) Using a regex is *not* a life threatening situation.

I've said repeatedly that pointing out alternatives is fine. Pointing
out there is no need for a regex when searching for a constant string
is fine. And similar... But the responses here often go well beyond
that in negativity.

My own theory is that regexes are associated with Perl in the minds of
many participants here and thus provoke an automatic immune reaction.
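The lexing use mentioned in the post above can be sketched with a small regex-driven tokenizer. This is only an illustration under assumptions: the token names (`NUMBER`, `IDENT`, `OP`, `SKIP`, `MISMATCH`) and the `tokenize` helper are hypothetical and do not come from any post in the thread.

```python
import re

# Hypothetical token set for a tiny arithmetic language (names are illustrative).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),  # integer or decimal literal
    ("IDENT", r"[A-Za-z_]\w*"),    # identifier
    ("OP", r"[+\-*/=()]"),         # single-character operator or parenthesis
    ("SKIP", r"\s+"),              # whitespace, discarded
    ("MISMATCH", r"."),            # any other character is an error
]
# One alternation of named groups; the first alternative that matches wins.
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs for each token found in text."""
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError("unexpected character %r" % match.group())
        yield kind, match.group()


if __name__ == "__main__":
    for token in tokenize("x = 3.5 * (y + 2)"):
        print(token)
```

Writing the same scanner by hand with index arithmetic is possible, but each token class would need its own loop; here each class is one line of the table.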
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-11-05 13:26 +1100 |
| Message-ID | <563abe6f$0$1614$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #98258 |
On Thu, 5 Nov 2015 11:13 am, rurpy@yahoo.com wrote:
> There would be far fewer computer languages, and they would be much
> more primitive if regular expressions (and the fundamental concepts
> that they express) did not exist.
Well, that's certainly true. But only because contra-factual statements can
imply the truth of anything. If squares had seven sides, then Joseph Stalin
would have been the first woman to go to the moon on horseback.
I can't imagine a world where pattern matching doesn't exist. That's like
trying to imagine a world where arithmetic doesn't exist. But I think we
can safely say that, had nobody thought of the idea of searching for
patterns ('find me all the lines with "green" in them'), there would be far
fewer regex libraries in existence. I doubt that there would be "far fewer"
programming languages. With the possible exception of Perl, sed and awk,
I'm not aware of any languages which were specifically inspired by, and
exist primarily to apply, regular expressions, nor any languages which
*require* regexes in their implementation. Most languages are built on
parsers, not regular expressions.
> But I really wish every mention of regexes here wasn't reflexively
> greeted with a barrage of negative comments and that lame "two problems"
> quote, especially without an answer to the poster's regex question.
I don't disagree with this. Certainly we should accept questions from people
who are simply trying to learn how to use regexes without bombarding them
with admonitions to do something different. Yes yes, I know that regexes
aren't the only tool in my tool box, but *right now* I want to learn how to
use regexes.
--
Steven
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2015-11-05 14:07 +1100 |
| Message-ID | <mailman.42.1446693007.16136.python-list@python.org> |
| In reply to | #98266 |
Steven D'Aprano <steve@pearwood.info> writes:
> Yes yes, I know that regexes aren't the only tool in my tool box, but
> *right now* I want to learn how to use regexes.

I'll gently suggest this isn't a particularly good forum to do so.

Learn them with a tool like <URL:http://www.regexr.com/> and a tutorial
<URL:http://www.usna.edu/Users/cs/wcbrown/regexp/RegexpTutorial.html> or
something longer.

--
 \       “Fascism is capitalism plus murder.” —Upton Sinclair |
  `\                                                           |
_o__)                                                          |
Ben Finney
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-04 21:54 -0800 |
| Message-ID | <8d2e4d8d-dead-4ffd-b6fd-00d16cd0c26c@googlegroups.com> |
| In reply to | #98266 |
On Wednesday, November 4, 2015 at 7:31:34 PM UTC-7, Steven D'Aprano wrote:
> On Thu, 5 Nov 2015 11:13 am, rurpy wrote:
>
> > There would be far fewer computer languages, and they would be much
> > more primitive if regular expressions (and the fundamental concepts
> > that they express) did not exist.
>
> Well, that's certainly true. But only because contra-factual statements can
> imply the truth of anything. If squares had seven sides, then Joseph Stalin
> would have been the first woman to go to the moon on horseback.

Yes, thank you for that profoundly insightful comment.
| From | Antoon Pardon <antoon.pardon@rece.vub.ac.be> |
|---|---|
| Date | 2015-11-05 10:14 +0100 |
| Message-ID | <mailman.49.1446714861.16136.python-list@python.org> |
| In reply to | #98258 |
On 05-11-15 at 01:33, Chris Angelico wrote:
> "I want to swim from Sydney to Los Angeles, but my gloves keep wearing
> out half way across the Pacific. How can I make my gloves strong
> enough to get me to LA?"
>
> Response 1: "If you use industrial-strength gloves and go via Papua
> New Guinea, you can double up the gloves and swim to LA."
>
> Response 2: "Swimming across the Pacific is a bad idea. Have you
> considered taking a boat or plane instead?"
>
> Which is the more helpful response? You can go ahead and assume the OP
> always knows best; I'm going to at least offer some alternatives.

What I see often enough doesn't look like offering an alternative but
more like strong argumentation against the direction the OP is going.

I have nothing against offering an alternative. There is the
possibility that there are better methods to solve the original problem
and there is nothing wrong with suggesting this possibility. But there
is also the possibility that the direction the OP is heading is the
correct one, even if you can't see it.

Take the original question on how to recognize a line that ends with a
'*' with a regular expression. What almost no one seems to have
considered is that the real problem might have been more involved and
an excellent example of a problem you can solve with regular
expressions, but that there was this subproblem of recognizing a '*' at
the end of the line that was troublesome for the OP.

This is a possibility that is all too often ignored by the members on
this list. We advise people here to show just the barest code that
still shows the problem, yet we ignore that this affects the part of
the problem we get to see, and often enough people then insist on a
better alternative to deal with the problem, totally ignoring that this
better alternative may be totally useless in the original context.

--
Antoon Pardon
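For reference, the subproblem mentioned above (recognizing a '*' at the end of a line) can be sketched both ways. The helper names below are hypothetical. The sketch also shows a common stumbling block: `re.match` only tries the pattern at the start of the string, so `re.search` is the call that finds a trailing asterisk.

```python
import re


def ends_with_star_regex(line):
    """True if '*' is the last character, ignoring one trailing newline."""
    # "$" matches at the very end, or just before a final newline.
    return re.search(r"\*$", line) is not None


def ends_with_star_plain(line):
    """Same check without a regex."""
    if line.endswith("\n"):
        line = line[:-1]
    return line.endswith("*")


if __name__ == "__main__":
    for text in ("test *", "test 1", "test *\n"):
        print(repr(text), ends_with_star_regex(text), ends_with_star_plain(text))
    # re.match anchors at position 0, so this is None even though the
    # string ends with an asterisk.
    print(re.match(r"\*$", "test *"))
```

If the asterisk test is one piece of a larger pattern, the regex form composes with the rest of that pattern; on its own, the plain string method is the shorter route.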
| From | Seymore4Head <Seymore4Head@Hotmail.invalid> |
|---|---|
| Date | 2015-11-04 18:02 -0500 |
| Message-ID | <1i3l3b9836hatsuoopak5gtg2c38g49kb1@4ax.com> |
| In reply to | #98207 |
On Wed, 04 Nov 2015 14:48:21 +1100, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>On Wednesday 04 November 2015 11:33, rurpy@yahoo.com wrote:
>
>>> Not quite. Core language concepts like ifs, loops, functions,
>>> variables, slicing, etc are the socket wrenches of the programmer's
>>> toolbox. Regexes are like an electric impact socket wrench. You can do
>>> the same work without it, but in many cases it's slower. But you have to
>>> learn the other hand tools first in order to really use the electric
>>> driver properly (understanding torques, direction of threads, etc), lest
>>> you wonder why you're breaking off so many bolts with the torque of the
>>> impact drive.
>>
>> I consider regexes more fundamental
>
>I'm sure that there are people who consider the International Space Station
>more fundamental than the lever, the wedge and the hammer, but they would be
>wrong too.
>
>Given primitives for branching, loops and variables, you can build support
>for regexes. Given regexes, how would you build support for variables?
>
>Of course, you could easily prove me wrong. All you would need to do to
>demonstrate that regexes are more fundamental than branching, loops and
>variables would be to demonstrate that the primitive operations available in
>commonly used CPUs are regular expressions, and that (for example) C's for
>loop and if...else are implemented in machine code as regular expressions,
>rather than the other way around.

So far the only use I have for regex is to replace slicing, but I
think it is an improvement.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-11-05 11:54 +1100 |
| Message-ID | <563aa8be$0$1596$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #98252 |
On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
> So far the only use I have for regex is to replace slicing, but I
> think it is an improvement.

I don't understand this. This is like saying "so far the only use I have for
a sandwich press is to replace my coffee pot". Regular expressions and
slicing do very different things.

Slicing extracts substrings, given known starting and ending positions:

py> the_str = "Now is the time for all good men..."
py> the_str[7:12]
'the t'

Regular expressions don't extract substrings with known start/end positions.
They *find* matching text, giving a search string with metacharacters. (If
there are no metacharacters in your search string, you shouldn't use a
regex. str.find will be significantly faster and more convenient.)

Slicing is not about finding text, it is about extracting text once you've
already found it. So they are complementary, not alternatives.

--
Steven
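A short sketch of how the two are complementary, reusing the string from the post above. The pattern `\bt\w*` (the first word starting with "t") is only an illustrative choice, not something from the thread.

```python
import re

the_str = "Now is the time for all good men..."

# Slicing: the start and end positions are already known.
by_position = the_str[7:12]

# Searching with a pattern: the positions are not known in advance.
match = re.search(r"\bt\w*", the_str)

# Once the match is found, its span can feed an ordinary slice.
start, end = match.span()
found = the_str[start:end]

# A literal search string needs no regex at all.
literal_index = the_str.find("time")

if __name__ == "__main__":
    print(repr(by_position), match.span(), repr(found), literal_index)
```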
| From | Seymore4Head <Seymore4Head@Hotmail.invalid> |
|---|---|
| Date | 2015-11-05 10:07 -0500 |
| Message-ID | <ftrm3btf2a8ik9h1uora8p1ptq4sqand60@4ax.com> |
| In reply to | #98263 |
On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano
<steve@pearwood.info> wrote:
>On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
>
>> So far the only use I have for regex is to replace slicing, but I
>> think it is an improvement.
>
>I don't understand this. This is like saying "so far the only use I have for
>a sandwich press is to replace my coffee pot". Regular expressions and
>slicing do very different things.
>
>Slicing extracts substrings, given known starting and ending positions:
>
>
>py> the_str = "Now is the time for all good men..."
>py> the_str[7:12]
>'the t'
>
>
>Regular expressions don't extract substrings with known start/end positions.
>They *find* matching text, giving a search string with metacharacters. (If
>there are no metacharacters in your search string, you shouldn't use a
>regex. str.find will be significantly faster and more convenient.)
>
>Slicing is not about finding text, it is about extracting text once you've
>already found it. So they are complementary, not alternatives.
Here is an example of the text we are slicing apart.
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
Sat, 05 Jan 2008 09:14:16 -0500
Received: from holes.mr.itd.umich.edu (holes.mr.itd.umich.edu
[141.211.14.79])
by flawless.mail.umich.edu () with ESMTP id m05EEFR1013674;
Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM paploo.uhi.ac.uk (app1.prod.collab.uhi.ac.uk
[194.35.219.184])
BY holes.mr.itd.umich.edu ID 477F90B0.2DB2F.12494 ;
5 Jan 2008 09:14:10 -0500
Received: from paploo.uhi.ac.uk (localhost [127.0.0.1])
by paploo.uhi.ac.uk (Postfix) with ESMTP id 5F919BC2F2;
Sat, 5 Jan 2008 14:10:05 +0000 (GMT)
Message-ID: <200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Received: from prod.collab.uhi.ac.uk ([194.35.219.182])
by paploo.uhi.ac.uk (JAMES SMTP Server 2.1.3) with SMTP ID
899
for <source@collab.sakaiproject.org>;
Sat, 5 Jan 2008 14:09:50 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (nakamura.uits.iupui.edu
[134.68.220.122])
by shmi.uhi.ac.uk (Postfix) with ESMTP id A215243002
for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008
14:13:33 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (localhost [127.0.0.1])
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11) with
ESMTP id m05ECJVp010329
for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008 09:12:19
-0500
Received: (from apache@localhost)
by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit)
id m05ECIaH010327
for source@collab.sakaiproject.org; Sat, 5 Jan 2008 09:12:18
-0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender
to stephen.marquard@uct.ac.za using -f
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
The practice problems are something like pull out all the email
addresses or pull out the days of the week and give the most common.
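One possible shape for those practice problems, as a rough sketch. The sample text is a shortened copy of the headers above. `EMAIL_RE`, `FROM_LINE_RE` and both helper functions are hypothetical names, and the email pattern is deliberately loose: it is adequate for this exercise and is not a validator for real addresses.

```python
import re
from collections import Counter

sample = """\
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
"""

# Loose pattern: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Envelope line "From <address> <day> ..."; "From:" header lines do not match.
FROM_LINE_RE = re.compile(r"^From \S+ (Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b", re.MULTILINE)


def email_addresses(text):
    """Return every address-like string in text, in order of appearance."""
    return EMAIL_RE.findall(text)


def most_common_day(text):
    """Return (day, count) for the most frequent envelope day, or None."""
    days = Counter(FROM_LINE_RE.findall(text))
    return days.most_common(1)[0] if days else None


if __name__ == "__main__":
    print(email_addresses(sample))
    print(most_common_day(sample))
```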
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-06 12:46 -0800 |
| Message-ID | <e17a32f3-3332-4452-bc26-c4097c137b78@googlegroups.com> |
| In reply to | #98308 |
On Thursday, November 5, 2015 at 8:12:22 AM UTC-7, Seymore4Head wrote:
> On Thu, 05 Nov 2015 11:54:20 +1100, Steven D'Aprano <steve@pearwood.info> wrote:
> >On Thu, 5 Nov 2015 10:02 am, Seymore4Head wrote:
> >> So far the only use I have for regex is to replace slicing, but I
> >> think it is an improvement.
> >
> >I don't understand this. This is like saying "so far the only use I have for
> >a sandwich press is to replace my coffee pot". Regular expressions and
> >slicing do very different things.
> >[...]
>
> Here is an example of the text we are slicing apart.
> >[...email headers...]
>
> The practice problems are something like pull out all the email
> addresses or pull out the days of the week and give the most common.

Yes, that is a perfectly appropriate use of regexes.

As Steven mentioned though, the term "slicing" is also used with a very
specific and different meaning in Python, specifically referring to a
part of a list using a syntax like "alist[a:b]".

I can't seem to get to python.org at the moment but if you look in the
Python docs index under "slicing" you'll find more info.
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2015-11-03 18:15 +1100 |
| Message-ID | <56385efc$0$1598$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #98130 |
On Tue, 3 Nov 2015 03:23 pm, rurpy@yahoo.com wrote:
> Regular expressions should be learned by every programmer or by anyone
> who wants to use computers as a tool. They are a fundamental part of
> computer science and are used in all sorts of matching and searching
> from compilers down to your work-a-day text editor.

You are absolutely right.

If only regular expressions weren't such an overly-terse, cryptic
mini-language, with all but no debugging capabilities, they would be great.

If only there wasn't an extensive culture of regular expression abuse within
programming communities, they would be fine.

All technologies are open to abuse. But we don't say:

    Some people, when confronted with a problem, think "I know, I'll use
    arithmetic." Now they have two problems.

because abuse of arithmetic is rare. It's hard to misuse it, and while
arithmetic can be complicated, it's rare for programmers to abuse it. But
the same cannot be said for regexes -- they are regularly misused, abused,
and down-right hard to use right even when you have a good reason for using
them:

http://www.thedailywtf.com/articles/Irregular_Expression

http://blog.codinghorror.com/regex-use-vs-regex-abuse/

http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

If there is one person who has done more to create a regex culture, it is
Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
overused and their syntax is harmful, and he has recreated them for Perl 6:

http://www.perl.com/pub/2002/06/04/apo5.html

Oh, and the icing on the cake, regexes can be a security vulnerability too:

https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

--
Steven
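The denial-of-service concern can be explored with a small experiment. This is a sketch under assumptions: `NESTED` is a nested-quantifier pattern of the kind usually cited in ReDoS write-ups, `FLAT` is a pattern that accepts the same strings, and `time_failed_match` is a hypothetical helper. No timings are claimed here. On a backtracking engine such as the one in Python's `re` module, the nested form is expected to slow down sharply as the input grows, but the numbers depend on the interpreter and version, so run it to see.

```python
import re
import time

# Nested quantifier: a run of "a"s can be divided between the inner and
# outer "+" in many ways, all of which may be tried before giving up.
NESTED = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, with only one way to consume each "a".
FLAT = re.compile(r"^a+$")


def time_failed_match(pattern, n):
    """Match pattern against n 'a's plus '!' and return (result, seconds)."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are kept small on purpose so the experiment finishes quickly.
    for n in (10, 14, 18):
        _, nested_seconds = time_failed_match(NESTED, n)
        _, flat_seconds = time_failed_match(FLAT, n)
        print(n, nested_seconds, flat_seconds)
```

When the pattern is under the programmer's control, rewriting it so that no input can be matched in more than one way is the usual remedy.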
| From | Nick Sarbicki <nick.a.sarbicki@gmail.com> |
|---|---|
| Date | 2015-11-03 08:43 +0000 |
| Message-ID | <mailman.9.1446540211.8789.python-list@python.org> |
| In reply to | #98137 |
On Tue, Nov 3, 2015 at 7:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy@yahoo.com wrote:
>
> > Regular expressions should be learned by every programmer or by anyone
> > who wants to use computers as a tool. They are a fundamental part of
> > computer science and are used in all sorts of matching and searching
> > from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse
> within programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>     Some people, when confronted with a problem, think "I know, I'll use
>     arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html
>
> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
>
> Oh, and the icing on the cake, regexes can be a security vulnerability too:
>
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
>
> --
> Steven
>
> --
> https://mail.python.org/mailman/listinfo/python-list

+1

I agree that regex is an entirely necessary part of a programmer's
toolkit, but dear god some people need to be taught restraint.

The majority of people I talk about regex to have no idea when and
where it shouldn't be used.

As an example, part of my job is bringing our legacy Python code into
the modern day, and one of the largest roadblocks is the amount of
regex used. Some is necessary. Some can be replaced by an
`if word in str` or something similarly basic. Some spans hundreds of
lines and causes acute alopecia.

Just yesterday I found a colleague trying to parse HTML with regex.

So yes, teach regex, but teach it after the basics, and please
emphasise when it is appropriate to use it.

Yes I am bitter.

- Nick.
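The two replacements mentioned above can be sketched as follows. The log line, the `LinkCollector` class and the `extract_links` helper are made up for illustration. The HTML half assumes the standard library's `html.parser` module is an acceptable substitute for whatever the regex was doing.

```python
import re
from html.parser import HTMLParser

line = "ERROR: disk full"

# A literal substring test needs no pattern language.
has_error_plain = "ERROR" in line
# Same answer via a regex, with more machinery involved.
has_error_regex = re.search("ERROR", line) is not None


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(has_error_plain, has_error_regex)
    print(extract_links('<p><a href="a.html">A</a> <A HREF=\'b.html\' class=x>B</a></p>'))
```

The parser copes with attribute order, quoting style and letter case, which is where hand-written HTML regexes tend to break.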
| From | rurpy@yahoo.com |
|---|---|
| Date | 2015-11-03 16:22 -0800 |
| Message-ID | <455b6498-5104-491a-98c2-6f7e48142496@googlegroups.com> |
| In reply to | #98137 |
On 11/03/2015 12:15 AM, Steven D'Aprano wrote:
> On Tue, 3 Nov 2015 03:23 pm, rurpy wrote:
>
>> Regular expressions should be learned by every programmer or by anyone
>> who wants to use computers as a tool. They are a fundamental part of
>> computer science and are used in all sorts of matching and searching
>> from compilers down to your work-a-day text editor.
>
> You are absolutely right.
>
> If only regular expressions weren't such an overly-terse, cryptic
> mini-language, with all but no debugging capabilities, they would be great.
>
> If only there wasn't an extensive culture of regular expression abuse within
> programming communities, they would be fine.
>
> All technologies are open to abuse. But we don't say:
>
>     Some people, when confronted with a problem, think "I know, I'll use
>     arithmetic." Now they have two problems.
>
> because abuse of arithmetic is rare. It's hard to misuse it, and while
> arithmetic can be complicated, it's rare for programmers to abuse it. But
> the same cannot be said for regexes -- they are regularly misused, abused,
> and down-right hard to use right even when you have a good reason for using
> them:
>
> http://www.thedailywtf.com/articles/Irregular_Expression
>
> http://blog.codinghorror.com/regex-use-vs-regex-abuse/
>
> http://psung.blogspot.com.au/2008/01/wonderful-abuse-of-regular-expressions.html

Thanks for pointing out three cases of misuse of regexes out of the
approximately 375000000 [*] uses of regexes in the wild. I hope you're
not dumb enough to think that constitutes significant evidence.

Even worse, of the three only one was a real example. One of the
others was machine-generated code, the other was a "look what you can
do with regexes" example, not serious code.

Here is an example of "abusing" Python:
  https://benkurtovic.com/2014/06/01/obfuscating-hello-world.html
I wouldn't use this as evidence that Python is to be avoided.

> If there is one person who has done more to create a regex culture, it is
> Larry Wall, inventor of Perl. Even Larry Wall says that regexes are
> overused and their syntax is harmful, and he has recreated them for Perl 6:
>
> http://www.perl.com/pub/2002/06/04/apo5.html

You really should have read beyond the first paragraph. He proposes
fixing regexes by adding even more special character combinations and
making regexes even *more* powerful. (He turned them into full-blown
parsers.) Nowhere does he advocate not using, or avoiding if possible,
regexes as is the mantra in this list.

Here is Larry's "recreation" that you are touting:
  http://design.perl6.org/S05.html
Please explain to us how you think this "fix" addresses the complaints
you and other Python anti-regexers have about regexes.

I hope you also noted Larry's tongue-in-cheek writing style. Right
after pointing out that some claim Perl is hard to read due largely to
regex syntax, he writes:

  "Funny that other languages have been borrowing Perl's regular
  expressions as fast as they can..."

So I don't think you can claim Larry Wall as a supporter of this list's
anti-regex attitude beyond some superficial verbiage taken out of
context.

> Oh, and the icing on the cake, regexes can be a security vulnerability too:
> https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS

And here is a list of CVEs involving Python. There are (at time of
writing) 190 of them.
  http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python
So if a security vulnerability is reason not to use regexes, we should
all be *running* from Python.

I'm sure you'll point out that most have been fixed. But you failed to
point out that the same is true of regex engines. From your source:

  "Notice, that not all algorithms are naïve, and actually Regex
  algorithms can be written in an efficient way."

And in fact, again, had you looked beyond a headline that suited your
purpose, you could have tried the "Evil Regexes" noted in that source
and discovered none of them are a DoS in Python.

Even were that not true, normal practice applies: if the input is
untrusted then sanitize it, or mitigate the threat by imposing a
timeout, etc. Not exactly a problem or solution unique to regexes.

And common sense should tell you that since there are a lot of "try a
regex" web sites, this is not a problem without a solution.

And *certainly* not a reason not to use them in the *far* more common
case when they *are* trusted because you are in control of them.

Finally, preemptively, I'll repeat I acknowledge regexes are not the
optimum solution in every case where they could be used. But they are
very useful when one passes the border of the trivial; and they are
nowhere near as bad as routinely portrayed here.

----
[*] Yes, I made that number up.
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-11-03 12:38 +0000 |
| Message-ID | <n1a9rk$s40$1@dont-email.me> |
| In reply to | #98124 |
On Mon, 02 Nov 2015 22:17:49 -0500, Seymore4Head wrote:
> On Mon, 2 Nov 2015 20:42:37 -0600, Tim Chase
> <python.list@tim.thechases.com> wrote:
>
>>On 2015-11-02 20:09, Seymore4Head wrote:
>>> How do I make a regular expression that returns true if the end of the
>>> line is an asterisk
>>Why use a regular expression?
> Because that is the part of Python I am trying to learn at the moment.
The most important thing to learn about regular expressions is when to
use them and when not to use them.
Returning true if the last character in a string is an asterisk is almost
certainly a brilliant example of when not to use a regular expression.
Here are some timings I tested:
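#!/usr/bin/python
# Python 2 script: times three ways of testing a string for a trailing "*".
import re
import timeit
# Raw string so the backslash reaches the regex engine unchanged.
patt = re.compile(r"\*$")
# NOTE: match() only matches at the start of the string, so none of the
# regex calls below ever succeeds, not even on "test *". search() would
# be needed to find an asterisk at the end of the line.
# Uncompiled pattern, string without a trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = re.match(r"\*$", "test 1")
elapsed = timeit.default_timer() - start_time
print "re, false", elapsed
# Uncompiled pattern, string with a trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = re.match(r"\*$", "test *")
elapsed = timeit.default_timer() - start_time
print "re, true", elapsed
# Precompiled pattern, string without a trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = patt.match("test 1")
elapsed = timeit.default_timer() - start_time
print "compiled re, false", elapsed
# Precompiled pattern, string with a trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = patt.match("test *")
elapsed = timeit.default_timer() - start_time
print "compiled re, true", elapsed
# Plain comparison of the last character, no trailing asterisk.
start_time = timeit.default_timer()
for i in range(1000000):
    x = "test 1"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, false", elapsed
# Plain comparison of the last character, trailing asterisk present.
start_time = timeit.default_timer()
for i in range(1000000):
    x = "test *"[-1] == "*"
elapsed = timeit.default_timer() - start_time
print "char compare, true", elapsed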
RESULTS:
re, false 2.4701731205
re, true 2.42048001289
compiled re, false 0.875837087631
compiled re, true 0.876382112503
char compare, false 0.26283121109
char compare, true 0.263465881348
The compiled re is about 3 times as fast as the uncompiled re. The
character comparison is about 3 times as fast as the compiled re.
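For anyone who wants to rerun the comparison, here is a rough Python 3 sketch of the same idea. It is not the script that produced the numbers above: it swaps in re.search() so that the pattern can actually match at the end of the line, and it adds str.endswith() as a further alternative. The function names are made up for illustration, and timings will differ by machine and Python version, so none are quoted.

```python
import re
import timeit

# Precompiled pattern: a literal asterisk at the end of the string.
PATTERN = re.compile(r"\*$")


def ends_with_star_regex(line):
    # search() scans the whole string; match() would only try position 0.
    # Note that "$" also matches just before a trailing newline.
    return PATTERN.search(line) is not None


def ends_with_star_str(line):
    # endswith() is also safe on an empty string, unlike line[-1].
    return line.endswith("*")


def time_call(func, line, number=100000):
    # Total seconds for `number` calls of func(line).
    return timeit.timeit(lambda: func(line), number=number)


if __name__ == "__main__":
    for line in ("test 1", "test *"):
        for func in (ends_with_star_regex, ends_with_star_str):
            print(func.__name__, repr(line), func(line), time_call(func, line))
```

The two approaches are not exact equivalents: the regex accepts "test *\n" because "$" matches before a final newline, while endswith("*") does not.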
--
Denis McMahon, denismfmcmahon@gmail.com