| Started by | rxjwg98@gmail.com |
|---|---|
| First post | 2014-07-06 04:51 -0700 |
| Last post | 2014-07-06 10:58 -0700 |
| Articles | 12 — 6 participants |
Question about metacharacter '*' rxjwg98@gmail.com - 2014-07-06 04:51 -0700
Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-06 05:09 -0700
Re: Question about metacharacter '*' rxjwg98@gmail.com - 2014-07-07 11:51 -0700
Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-07 13:27 -0700
Re: Question about metacharacter '*' Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-07-07 22:50 +0100
Re: Question about metacharacter '*' MRAB <python@mrabarnett.plus.com> - 2014-07-06 16:32 +0100
Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-06 08:50 -0700
Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 09:24 -0700
Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 09:32 -0700
Re: Question about metacharacter '*' Roy Smith <roy@panix.com> - 2014-07-06 12:47 -0400
Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 10:38 -0700
Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 10:58 -0700
| From | rxjwg98@gmail.com |
|---|---|
| Date | 2014-07-06 04:51 -0700 |
| Subject | Question about metacharacter '*' |
| Message-ID | <3f7ecf04-b881-4e79-aa59-893580090468@googlegroups.com> |
Hi,
I have just begun to learn Python. I do not see the usefulness of '*' from its description below:
The first metacharacter for repeating things that we'll look at is *. * doesn't match the literal character *; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.
For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a characters), and so forth.
Does it have to be used with other search constraints?
Thanks,
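The quoted description can be checked directly. The following is a minimal sketch, assuming Python 3.4 or later (for `fullmatch`); the `candidates` list and the `results` name are illustrative and not part of the original post.

```python
import re

# 'ca*t' means: a 'c', then zero or more 'a' characters, then a 't'.
pattern = re.compile(r"ca*t")

# Illustrative inputs: three that should fit the pattern, two that should not.
candidates = ["ct", "cat", "caaat", "cut", "ca"]

# fullmatch() succeeds only if the whole string fits the pattern.
results = {text: pattern.fullmatch(text) is not None for text in candidates}

for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The `*` applies only to the `a` immediately before it, which is why it is normally combined with other pattern pieces such as the surrounding `c` and `t`.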
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-07-06 05:09 -0700 |
| Message-ID | <mailman.11536.1404648646.18130.python-list@python.org> |
| In reply to | #74006 |
On Sun, Jul 6, 2014 at 4:51 AM, <rxjwg98@gmail.com> wrote:
> I just begin to learn Python. I do not see the usefulness of '*' in its
> description below:
> [...]
> It has to be used with other search constraints?
(BTW, this is a regexp question, not really a Python question per se.)
That's usually when it's useful, yeah. For example, [0-9] matches any of the characters 0 through 9. So to match a natural number written in decimal form, we might use the regexp [0-9][0-9]*, which matches the strings "1", "12", and "007", but not "" or "Jeffrey".
Another useful one is `.*` -- `.` matches exactly one character, no matter what that character is. So, `.*` matches any string at all.
The power of regexps stems from the ability to mix and match all of the regexp pieces in pretty much any way you want.
-- Devin
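As a rough illustration of the natural-number pattern, here is a sketch that assumes whole-string matching via `fullmatch` (Python 3.4+). The helper name `is_natural_number` is hypothetical and only used for this example.

```python
import re

# One required digit followed by zero or more further digits.
natural_number = re.compile(r"[0-9][0-9]*")


def is_natural_number(text):
    """Return True if the whole string is one or more ASCII digits."""
    return natural_number.fullmatch(text) is not None


number_results = {
    text: is_natural_number(text) for text in ["1", "12", "007", "", "Jeffrey"]
}

# '*' alone allows zero repetitions, so '[0-9]*' would accept the empty string.
# That is why the pattern starts with a mandatory '[0-9]'.
star_only_accepts_empty = re.fullmatch(r"[0-9]*", "") is not None

print(number_results)
print(star_only_accepts_empty)
```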
| From | rxjwg98@gmail.com |
|---|---|
| Date | 2014-07-07 11:51 -0700 |
| Message-ID | <86b50887-96b0-4a21-8b0a-26c5a435c76d@googlegroups.com> |
| In reply to | #74007 |
On Sunday, July 6, 2014 8:09:57 AM UTC-4, Devin Jeanpierre wrote:
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.
Would you give me an example using your pattern: `.*` -- `.`?
I try it, but it cannot pass. (of course, I use it incorrectly)
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-07-07 13:27 -0700 |
| Message-ID | <mailman.11609.1404764921.18130.python-list@python.org> |
| In reply to | #74134 |
On Mon, Jul 7, 2014 at 11:51 AM, <rxjwg98@gmail.com> wrote:
> Would you give me an example using your pattern: `.*` -- `.`?
> I try it, but it cannot pass. (of course, I use it incorrectly)
Those are two patterns.
Python 3.4.1 (default, Jul 7 2014, 13:22:02)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.fullmatch(r'.', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.', 'ab')
>>> re.fullmatch(r'.', '')
>>>
>>> re.fullmatch(r'.*', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.*', 'ab')
<_sre.SRE_Match object; span=(0, 2), match='ab'>
>>> re.fullmatch(r'.*', '')
<_sre.SRE_Match object; span=(0, 0), match=''>
-- Devin
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2014-07-07 22:50 +0100 |
| Message-ID | <mailman.11610.1404769798.18130.python-list@python.org> |
| In reply to | #74134 |
On 07/07/2014 19:51, rxjwg98@gmail.com wrote:
Will you please do something about the double spaced google crap that you keep sending? I've already asked you twice.
--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2014-07-06 16:32 +0100 |
| Message-ID | <mailman.11543.1404660755.18130.python-list@python.org> |
| In reply to | #74006 |
On 2014-07-06 13:09, Devin Jeanpierre wrote:
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.
Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
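The newline caveat can be seen with a two-line string. This is a small sketch assuming Python 3.4+; the sample text and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# By default '.' matches any character except a newline,
# so '.*' cannot span the whole two-line string.
default_match = re.fullmatch(r".*", text)

# With re.DOTALL, '.' also matches '\n', so the whole string fits.
dotall_match = re.fullmatch(r".*", text, re.DOTALL)

# Without DOTALL, matching from the start stops at the end of the first line.
first_line = re.match(r".*", text).group()

print(default_match)
print(dotall_match is not None)
print(repr(first_line))
```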
| From | Devin Jeanpierre <jeanpierreda@gmail.com> |
|---|---|
| Date | 2014-07-06 08:50 -0700 |
| Message-ID | <mailman.11545.1404662273.18130.python-list@python.org> |
| In reply to | #74006 |
In related news, the regexp I gave for numbers will match "1a".
-- Devin
On Sun, Jul 6, 2014 at 8:32 AM, MRAB <python@mrabarnett.plus.com> wrote:
> On 2014-07-06 13:09, Devin Jeanpierre wrote:
>> Another useful one is `.*` -- `.` matches exactly one character, no
>> matter what that character is. So, `.*` matches any string at all.
> Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
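Whether the number pattern "matches" "1a" depends on which `re` function is used. A brief sketch, assuming Python 3.4+ and illustrative variable names:

```python
import re

pattern = r"[0-9][0-9]*"

# search() looks for the pattern anywhere in the string, so it finds the '1'.
found = re.search(pattern, "1a")

# match() only anchors at the start, so trailing junk is still accepted.
prefix = re.match(pattern, "1a")

# fullmatch() requires the entire string to fit, so '1a' is rejected.
whole = re.fullmatch(pattern, "1a")

print(found.group())
print(prefix.group())
print(whole)
```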
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2014-07-06 09:24 -0700 |
| Message-ID | <d8f8d76d-0a47-4f59-8f09-da2a44cc1d2e@googlegroups.com> |
| In reply to | #74026 |
On Sunday, July 6, 2014 10:50:13 AM UTC-5, Devin Jeanpierre wrote:
> In related news, the regexp I gave for numbers will match "1a".
Well of course it matched, because your pattern defines "one
or more consecutive digits". So it will match the "1" of
"1a" and the "11" of "11a" likewise.
As an aside, I prefer to only utilize a "character set" when
nothing else will suffice. And in this case r"[0-9][0-9]*"
can be expressed just as correctly (and less noisy IMHO) as
r"\d\d*".
============================================================
INTERACTIVE SESSION: Python 2.x
============================================================
# Note: Grouping used for explicitness.
#
# Using character sets:
>>> import re
>>> re.search(r'([0-9][0-9]*)', '1a').groups()
('1',)
>>> re.search(r'([0-9][0-9]*)', '11a').groups()
('11',)
>>> re.search(r'([0-9][0-9]*)', '111aaa222').groups()
('111',)
#
# Same result without character sets:
>>> re.search(r'(\d\d*)', '1a').groups()
('1',)
>>> re.search(r'(\d\d*)', '11a').groups()
('11',)
>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
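One caveat when reading the session above: it is Python 2.x. Assuming Python 3 and `str` patterns instead, `\d` also matches non-ASCII decimal digits, so it is not strictly identical to `[0-9]` unless the `re.ASCII` flag is used. A minimal sketch of the difference (the sample character is chosen for illustration):

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

# The explicit character set only accepts the ASCII digits 0 through 9.
class_match = re.fullmatch(r"[0-9]", arabic_three)

# In Python 3, \d on a str pattern accepts any Unicode decimal digit.
digit_match = re.fullmatch(r"\d", arabic_three)

# re.ASCII restricts \d to [0-9] again.
ascii_digit_match = re.fullmatch(r"\d", arabic_three, re.ASCII)

print(class_match, digit_match is not None, ascii_digit_match)
```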
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2014-07-06 09:32 -0700 |
| Message-ID | <c40ca02c-93e6-4a3c-86b5-bb4062421554@googlegroups.com> |
| In reply to | #74031 |
[CONTINUED FROM LAST REPLY...]
Likewise if your intent is to filter out any match strings which contain non-digits, then define the start and stop points of the pattern:
# Match only if all are digits
>>> re.match(r'\d\d*$', '111aaa222') # fails
# Match only if all are digits and,
# allow leading white-space
>>> re.match(r'\s*\d\d*$', ' 111')
<_sre.SRE_Match object at 0x026D8410>
# But not trailing space!
>>> re.match(r'\s*\d\d*$', ' 111 ') # fails
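A subtlety with `$` that the session above does not show: it also matches just before a newline at the very end of the string. The sketch below, assuming Python 3.4+, compares `$` with `\Z` and `fullmatch`. The trailing-whitespace pattern at the end is one possible variant, not something proposed in the thread.

```python
import re

# '$' matches at the end of the string or just before a final newline.
with_dollar = re.match(r"\d\d*$", "111\n")

# '\Z' matches only at the very end of the string.
with_z = re.match(r"\d\d*\Z", "111\n")

# fullmatch() likewise requires the whole string, newline included, to fit.
with_fullmatch = re.fullmatch(r"\d\d*", "111\n")

# Allowing optional white-space on both sides (note that \s includes '\n').
padded = re.fullmatch(r"\s*\d\d*\s*", "  111  ")

print(with_dollar is not None, with_z, with_fullmatch, padded is not None)
```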
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2014-07-06 12:47 -0400 |
| Message-ID | <roy-279E10.12473806072014@news.panix.com> |
| In reply to | #74031 |
In article <d8f8d76d-0a47-4f59-8f09-da2a44cc1d2e@googlegroups.com>,
Rick Johnson <rantingrickjohnson@gmail.com> wrote:
> As an aside i prefer to only utilize a "character set" when
> nothing else will suffice. And in this case r"[0-9][0-9]*"
> can be expressed just as correctly (and less noisy IMHO) as
> r"\d\d*".
Even better, r"\d+"
>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
>>> re.search(r'(\d+)', '111aaa222').groups()
('111',)
Oddly enough, I prefer character sets to the backslash notation, but I
suppose that's largely because when I first learned regexes, that
new-fangled backslash stuff hadn't been invented yet. :-)
I know I've said this before, but people should put more effort into
learning regex. There are lots of good tools in Python (startswith,
endswith, split, in, etc) which handle many of the most common regex use
cases. Regex is also not as easy to use in Python as it is in a
language like Perl where it's baked into the syntax. As a result,
pythonistas tend to shy away from regex, and either never learn the full
power, or let their skills grow rusty. Which is a shame, because for
many tasks, there's no better tool.
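For comparison, here is a sketch of the string tools mentioned above next to rough regex equivalents. The sample `line` and all variable names are invented for illustration.

```python
import re

line = "error: disk full on /dev/sda1"

# Prefix test: str.startswith versus an anchored match.
starts_plain = line.startswith("error:")
starts_regex = re.match(r"error:", line) is not None

# Suffix test: str.endswith versus a pattern anchored with '$'.
ends_plain = line.endswith("sda1")
ends_regex = re.search(r"sda1$", line) is not None

# Substring test: the 'in' operator versus re.search.
contains_plain = "disk full" in line
contains_regex = re.search(r"disk full", line) is not None

# Splitting on a fixed separator: str.split versus re.split.
fields_plain = line.split(": ", 1)
fields_regex = re.split(r": ", line, maxsplit=1)

print(starts_plain, ends_plain, contains_plain, fields_plain)
```

For fixed strings like these the plain methods are simpler; the regex forms become worthwhile once the prefix, suffix, or separator has to vary.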
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2014-07-06 10:38 -0700 |
| Message-ID | <2f63eeac-77c3-48fc-b4dd-020ad74cd0b1@googlegroups.com> |
| In reply to | #74034 |
On Sunday, July 6, 2014 11:47:38 AM UTC-5, Roy Smith wrote:
> Even better, r"\d+"
> >>> re.search(r'(\d\d*)', '111aaa222').groups()
> ('111',)
> >>> re.search(r'(\d+)', '111aaa222').groups()
> ('111',)
Yes, good catch! I had failed to reduce your original
pattern down to its most fundamental aspects for the sake
of completeness, and instead opted to modify it in a manner
that mirrored your example.
> Oddly enough, I prefer character sets to the backslash
> notation, but I suppose that's largely because when I
> first learned regexes, that new-fangled backslash stuff
> hadn't been invented yet. :-)
Ha, point taken! :-)
Character sets really shine when you need a fixed range of
letters or numbers which are NOT defined by one of the
"special characters" of \d \D \W \w, etc...
Say you want to match any letters between "c" and "m" or the
digits between "3" and "6". Defining that pattern using OR'd
"char literals" would be a massive undertaking!
Another great use of character sets is skipping chars that
don't match a "target". For instance, a Python comment
starts with one hash char and proceeds to the end of the
line, which, when accounting for leading white-space,
could be defined by the pattern:
r'\s*#[^\n]'
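The two uses of character sets described in this post, an explicit range and a negated set that skips up to a delimiter, can be sketched as follows. The sample strings and variable names are illustrative, and the comma-separated record is a stand-in example rather than the comment pattern above.

```python
import re

# A range set: any letter from 'c' to 'm', or any digit from '3' to '6'.
range_set = re.compile(r"[c-m3-6]")

# The same thing spelled out with OR'd literals is far longer.
alternation = re.compile(r"c|d|e|f|g|h|i|j|k|l|m|3|4|5|6")

sample = "abcxyz m 2 3 7"
range_hits = range_set.findall(sample)
alternation_hits = alternation.findall(sample)

# A negated set: everything up to (but not including) the first comma.
first_field = re.match(r"[^,]*", "name,age,city").group()

print(range_hits, alternation_hits, first_field)
```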
> Regex is also not as easy to use in Python as it is in a
> language like Perl where it's baked into the syntax. As a
> result, pythonistas tend to shy away from regex, and
> either never learn the full power, or let their skills
> grow rusty. Which is a shame, because for many tasks,
> there's no better tool.
Agreed, but unfortunately, like many other languages, Python
has decided to import all the illogical aspects of regex
syntax from other languages instead of creating a "new" regex
syntax that is consistent and logical. They did the same thing
with Tkinter, and what a nightmare!
And don't misunderstand my statements, I don't intend that
we should create a syntax of verbosity, NO, we *CAN* keep
the syntax succinct whilst eliminating the illogical and
inconsistent aspects that plague our patterns.
Will regex ever be easy to learn? Probably not, but they can
be easier to use if only we put on our "big boy" pants and
decide to do something about it!
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2014-07-06 10:58 -0700 |
| Message-ID | <ea577e24-2858-461a-9738-f58994d4f838@googlegroups.com> |
| In reply to | #74036 |
On Sunday, July 6, 2014 12:38:23 PM UTC-5, Rick Johnson wrote:
> r'\s*#[^\n]'
Well, there I go not testing again!
r'\s*#[^\n]*'
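The difference the trailing `*` makes can be shown on a sample line. A minimal sketch, with a made-up input string and variable names:

```python
import re

line = "    # TODO: tidy this up\nx = 1"

# Without the trailing '*', only ONE character after the hash is consumed.
without_star = re.match(r"\s*#[^\n]", line).group()

# With it, the match runs to the end of the comment line and stops at '\n'.
with_star = re.match(r"\s*#[^\n]*", line).group()

print(repr(without_star))
print(repr(with_star))
```

Since `.` does not match a newline by default, `r'\s*#.*'` would behave the same way here as long as the DOTALL flag is not set.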