
Question about metacharacter '*'

Started by	rxjwg98@gmail.com
First post	2014-07-06 04:51 -0700
Last post	2014-07-06 10:58 -0700
Articles	12 — 6 participants


  Question about metacharacter '*' rxjwg98@gmail.com - 2014-07-06 04:51 -0700
    Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-06 05:09 -0700
      Re: Question about metacharacter '*' rxjwg98@gmail.com - 2014-07-07 11:51 -0700
        Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-07 13:27 -0700
        Re: Question about metacharacter '*' Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-07-07 22:50 +0100
    Re: Question about metacharacter '*' MRAB <python@mrabarnett.plus.com> - 2014-07-06 16:32 +0100
    Re: Question about metacharacter '*' Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-07-06 08:50 -0700
      Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 09:24 -0700
        Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 09:32 -0700
        Re: Question about metacharacter '*' Roy Smith <roy@panix.com> - 2014-07-06 12:47 -0400
          Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 10:38 -0700
            Re: Question about metacharacter '*' Rick Johnson <rantingrickjohnson@gmail.com> - 2014-07-06 10:58 -0700

#74006 — Question about metacharacter '*'

From	rxjwg98@gmail.com
Date	2014-07-06 04:51 -0700
Subject	Question about metacharacter '*'
Message-ID	<3f7ecf04-b881-4e79-aa59-893580090468@googlegroups.com>

Hi,

I have just begun to learn Python. I do not see the usefulness of '*' in the
description below:




The first metacharacter for repeating things that we'll look at is *. * doesn't
match the literal character *; instead, it specifies that the previous character
can be matched zero or more times, instead of exactly once.

For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
characters), and so forth. 
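
The quoted description can be checked directly. The following is a minimal sketch, assuming Python 3.4 or later (for re.fullmatch); the test strings 'cut' and 'ca' are additions chosen here to show non-matches.

```python
import re

# 'ca*t' means: a 'c', then zero or more 'a' characters, then a 't'.
pattern = re.compile(r'ca*t')

# fullmatch succeeds only if the whole string fits the pattern.
candidates = ['ct', 'cat', 'caaat', 'cut', 'ca']
results = {s: bool(pattern.fullmatch(s)) for s in candidates}
print(results)
```

Here '*' is already combined with other pieces ('c' and 't'), which is the usual way it earns its keep.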



Does it have to be used with other search constraints?


Thanks,


#74007

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2014-07-06 05:09 -0700
Message-ID	<mailman.11536.1404648646.18130.python-list@python.org>
In reply to	#74006

On Sun, Jul 6, 2014 at 4:51 AM,  <rxjwg98@gmail.com> wrote:
> Hi,
>
> I just begin to learn Python. I do not see the usefulness of '*' in its
> description below:
>
>
>
>
> The first metacharacter for repeating things that we'll look at is *. * doesn't
> match the literal character *; instead, it specifies that the previous character
> can be matched zero or more times, instead of exactly once.
>
> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> characters), and so forth.
>
>
>
> It has to be used with other search constraints?

(BTW, this is a regexp question, not really a Python question per se.)

That's usually when it's useful, yeah. For example, [0-9] matches any
of the characters 0 through 9. So to match a natural number written in
decimal form, we might use the regexp [0-9][0-9]*, which matches the
strings "1", "12", and "007", but not "" or "Jeffrey".
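
A small sketch of that claim, assuming Python 3.4 or later so that re.fullmatch is available; the helper name is_natural is hypothetical and only used for illustration.

```python
import re

# One digit, followed by zero or more further digits.
natural = re.compile(r'[0-9][0-9]*')

def is_natural(text):
    """Return True if the whole of text is one or more ASCII digits."""
    return natural.fullmatch(text) is not None

checks = {s: is_natural(s) for s in ['1', '12', '007', '', 'Jeffrey']}
print(checks)
```

The leading [0-9] is what rules out the empty string; [0-9]* on its own would accept it.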

Another useful one is `.*` -- `.` matches exactly one character, no
matter what that character is. So, `.*` matches any string at all.

The power of regexps stems from the ability to mix and match all of
the regexp pieces in pretty much any way you want.

-- Devin


#74134

From	rxjwg98@gmail.com
Date	2014-07-07 11:51 -0700
Message-ID	<86b50887-96b0-4a21-8b0a-26c5a435c76d@googlegroups.com>
In reply to	#74007

On Sunday, July 6, 2014 8:09:57 AM UTC-4, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM,  <rxjwg98@gmail.com> wrote:
> 
> > Hi,
> 
> >
> 
> > I just begin to learn Python. I do not see the usefulness of '*' in its
> 
> > description below:
> 
> >
> 
> >
> 
> >
> 
> >
> 
> > The first metacharacter for repeating things that we'll look at is *. * doesn't
> 
> > match the literal character *; instead, it specifies that the previous character
> 
> > can be matched zero or more times, instead of exactly once.
> 
> >
> 
> > For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
> 
> > characters), and so forth.
> 
> >
> 
> >
> 
> >
> 
> > It has to be used with other search constraints?
> 
> 
> 
> (BTW, this is a regexp question, not really a Python question per se.)
> 
> 
> 
> That's usually when it's useful, yeah. For example, [0-9] matches any
> 
> of the characters 0 through 9. So to match a natural number written in
> 
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> 
> strings "1", "12", and "007", but not "" or "Jeffrey".
> 
> 
> 
> Another useful one is `.*` -- `.` matches exactly one character, no
> 
> matter what that character is. So, `.*` matches any string at all.
> 
> 
> 
> The power of regexps stems from the ability to mix and match all of
> 
> the regexp pieces in pretty much any way you want.
> 
> 
> 
> -- Devin

Could you give me an example using your pattern: `.*` -- `.`?
I tried it, but it does not work. (Of course, I am probably using it incorrectly.)


#74135

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2014-07-07 13:27 -0700
Message-ID	<mailman.11609.1404764921.18130.python-list@python.org>
In reply to	#74134

On Mon, Jul 7, 2014 at 11:51 AM,  <rxjwg98@gmail.com> wrote:
> Would you give me an example using your pattern: `.*` -- `.`?
> I try it, but it cannot pass. (of course, I use it incorrectly)

Those are two patterns.

Python 3.4.1 (default, Jul  7 2014, 13:22:02)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.fullmatch(r'.', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.', 'ab')
>>> re.fullmatch(r'.', '')
>>>
>>> re.fullmatch(r'.*', 'a')
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.fullmatch(r'.*', 'ab')
<_sre.SRE_Match object; span=(0, 2), match='ab'>
>>> re.fullmatch(r'.*', '')
<_sre.SRE_Match object; span=(0, 0), match=''>

-- Devin


#74136

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-07-07 22:50 +0100
Message-ID	<mailman.11610.1404769798.18130.python-list@python.org>
In reply to	#74134

On 07/07/2014 19:51, rxjwg98@gmail.com wrote:

Will you please do something about the double spaced google crap that 
you keep sending? I've already asked you twice.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#74022

From	MRAB <python@mrabarnett.plus.com>
Date	2014-07-06 16:32 +0100
Message-ID	<mailman.11543.1404660755.18130.python-list@python.org>
In reply to	#74006

On 2014-07-06 13:09, Devin Jeanpierre wrote:
> On Sun, Jul 6, 2014 at 4:51 AM,  <rxjwg98@gmail.com> wrote:
>> Hi,
>>
>> I just begin to learn Python. I do not see the usefulness of '*' in its
>> description below:
>>
>>
>>
>>
>> The first metacharacter for repeating things that we'll look at is *. * doesn't
>> match the literal character *; instead, it specifies that the previous character
>> can be matched zero or more times, instead of exactly once.
>>
>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>> characters), and so forth.
>>
>>
>>
>> It has to be used with other search constraints?
>
> (BTW, this is a regexp question, not really a Python question per se.)
>
> That's usually when it's useful, yeah. For example, [0-9] matches any
> of the characters 0 through 9. So to match a natural number written in
> decimal form, we might use the regexp [0-9][0-9]*, which matches the
> strings "1", "12", and "007", but not "" or "Jeffrey".
>
> Another useful one is `.*` -- `.` matches exactly one character, no
> matter what that character is. So, `.*` matches any string at all.
>
Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
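
A short sketch of the difference, using a two-line sample string that is made up for this illustration:

```python
import re

text = 'first line\nsecond line'

# By default '.' does not match '\n', so '.*' stops at the end of the first line.
default_match = re.match(r'.*', text).group()

# With re.DOTALL, '.' matches newlines too, so '.*' consumes the whole string.
dotall_match = re.match(r'.*', text, re.DOTALL).group()

print(repr(default_match))
print(repr(dotall_match))
```

The same flag can be spelled re.S, or written inline at the start of the pattern as (?s).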

> The power of regexps stems from the ability to mix and match all of
> the regexp pieces in pretty much any way you want.
>


#74026

From	Devin Jeanpierre <jeanpierreda@gmail.com>
Date	2014-07-06 08:50 -0700
Message-ID	<mailman.11545.1404662273.18130.python-list@python.org>
In reply to	#74006

In related news, the regexp I gave for numbers will match "1a".
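
Whether "1a" counts as a match depends on which function applies the pattern. A minimal sketch, assuming Python 3.4 or later for re.fullmatch:

```python
import re

digits = r'[0-9][0-9]*'

# re.match only anchors at the start, so it accepts the leading '1' of '1a'.
prefix_only = re.match(digits, '1a')

# re.fullmatch requires the whole string to fit, so '1a' is rejected.
whole_string = re.fullmatch(digits, '1a')

print(prefix_only.group())
print(whole_string)
```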

-- Devin

On Sun, Jul 6, 2014 at 8:32 AM, MRAB <python@mrabarnett.plus.com> wrote:
> On 2014-07-06 13:09, Devin Jeanpierre wrote:
>>
>> On Sun, Jul 6, 2014 at 4:51 AM,  <rxjwg98@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I just begin to learn Python. I do not see the usefulness of '*' in its
>>> description below:
>>>
>>>
>>>
>>>
>>> The first metacharacter for repeating things that we'll look at is *. *
>>> doesn't
>>> match the literal character *; instead, it specifies that the previous
>>> character
>>> can be matched zero or more times, instead of exactly once.
>>>
>>> For example, ca*t will match ct (0 a characters), cat (1 a), caaat (3 a
>>> characters), and so forth.
>>>
>>>
>>>
>>> It has to be used with other search constraints?
>>
>>
>> (BTW, this is a regexp question, not really a Python question per se.)
>>
>> That's usually when it's useful, yeah. For example, [0-9] matches any
>> of the characters 0 through 9. So to match a natural number written in
>> decimal form, we might use the regexp [0-9][0-9]*, which matches the
>> strings "1", "12", and "007", but not "" or "Jeffrey".
>>
>> Another useful one is `.*` -- `.` matches exactly one character, no
>> matter what that character is. So, `.*` matches any string at all.
>>
> Not quite. It won't match a '\n' unless the DOTALL flag is turned on.
>
>
>> The power of regexps stems from the ability to mix and match all of
>> the regexp pieces in pretty much any way you want.
>>
>
> --
> https://mail.python.org/mailman/listinfo/python-list


#74031

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2014-07-06 09:24 -0700
Message-ID	<d8f8d76d-0a47-4f59-8f09-da2a44cc1d2e@googlegroups.com>
In reply to	#74026

On Sunday, July 6, 2014 10:50:13 AM UTC-5, Devin Jeanpierre wrote:
> In related news, the regexp I gave for numbers will match "1a".

Well of course it matched, because your pattern defines "one
or more consecutive digits". So it will match the "1" of
"1a" and the "11" of "11a" likewise.

As an aside, I prefer to only utilize a "character set" when
nothing else will suffice. And in this case r"[0-9][0-9]*"
can be expressed just as correctly (and less noisily IMHO) as
r"\d\d*".

============================================================
 INTERACTIVE SESSION: Python 2.x
============================================================
# Note: Grouping used for explicitness.

#
# Using character sets:
>>> import re
>>> re.search(r'([0-9][0-9]*)', '1a').groups()
('1',)
>>> re.search(r'([0-9][0-9]*)', '11a').groups()
('11',)
>>> re.search(r'([0-9][0-9]*)', '111aaa222').groups()
('111',)

#
# Same result without character sets:
>>> re.search(r'(\d\d*)', '1a').groups()
('1',)
>>> re.search(r'(\d\d*)', '11a').groups()
('11',)
>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
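
One caveat to the equivalence, which applies to Python 3 rather than to the Python 2.x session above: for str patterns, \d matches any Unicode decimal digit unless re.ASCII is given, while [0-9] matches ASCII digits only. A sketch, using Arabic-Indic digits as an arbitrary sample:

```python
import re

# ARABIC-INDIC DIGITS ONE, TWO, THREE (U+0661..U+0663)
sample = '\u0661\u0662\u0663'

unicode_digits = re.fullmatch(r'\d+', sample)          # matches in Python 3
ascii_class = re.fullmatch(r'[0-9]+', sample)          # no match
ascii_flag = re.fullmatch(r'\d+', sample, re.ASCII)    # no match

print(unicode_digits is not None, ascii_class, ascii_flag)
```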


#74032

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2014-07-06 09:32 -0700
Message-ID	<c40ca02c-93e6-4a3c-86b5-bb4062421554@googlegroups.com>
In reply to	#74031

[CONTINUED FROM LAST REPLY...]

Likewise if your intent is to filter out any match strings
which contain non-digits, then define the start and stop
points of the pattern:

# Match only if all are digits
>>> re.match(r'\d\d*$', '111aaa222') # fails

# Match only if all are digits and,
# allow leading white-space
>>> re.match(r'\s*\d\d*$', '   111')
<_sre.SRE_Match object at 0x026D8410>
# But not trailing space!
>>> re.match(r'\s*\d\d*$', '   111 ') # fails
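
One detail worth knowing about the '$' anchor: without re.MULTILINE it matches at the end of the string and also just before a newline that ends the string. The sketch below, assuming Python 3.4 or later, compares it with '\Z' and re.fullmatch; the sample strings are made up for illustration.

```python
import re

# '$' also matches just before a newline that ends the string.
dollar = re.match(r'\d\d*$', '111\n')

# '\Z' matches only at the very end, so the trailing newline is rejected.
strict = re.match(r'\d\d*\Z', '111\n')

# re.fullmatch needs no explicit anchor at all.
full = re.fullmatch(r'\d+', '111\n')

# Allowing optional white-space on both sides of the digits.
padded = re.fullmatch(r'\s*\d+\s*', '   111 ')

print(dollar is not None, strict, full, padded is not None)
```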


#74034

From	Roy Smith <roy@panix.com>
Date	2014-07-06 12:47 -0400
Message-ID	<roy-279E10.12473806072014@news.panix.com>
In reply to	#74031

In article <d8f8d76d-0a47-4f59-8f09-da2a44cc1d2e@googlegroups.com>,
 Rick Johnson <rantingrickjohnson@gmail.com> wrote:

> As an aside i prefer to only utilize a "character set" when
> nothing else will suffice. And in this case r"[0-9][0-9]*"
> can be expressed just as correctly  (and less noisy IMHO) as
> r"\d\d*".

Even better, r"\d+"

>>> re.search(r'(\d\d*)', '111aaa222').groups()
('111',)
>>> re.search(r'(\d+)', '111aaa222').groups()
('111',)

Oddly enough, I prefer character sets to the backslash notation, but I 
suppose that's largely because when I first learned regexes, that 
new-fangled backslash stuff hadn't been invented yet. :-)

I know I've said this before, but people should put more effort into 
learning regex.  There are lots of good tools in Python (startswith, 
endswith, split, in, etc) which handle many of the most common regex use 
cases.  Regex is also not as easy to use in Python as it is in a 
language like Perl where it's baked into the syntax.  As a result, 
pythonistas tend to shy away from regex, and either never learn the full 
power, or let their skills grow rusty.  Which is a shame, because for 
many tasks, there's no better tool.
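
To make the contrast concrete, here is a rough sketch; the log line and variable names are invented for this example. The string methods cover the simple checks, while the regex handles the part that needs a pattern.

```python
import re

line = 'ERROR: disk /dev/sda1 is 97% full'

# Simple cases that plain string methods handle well.
is_error = line.startswith('ERROR:')
mentions_disk = 'disk' in line
fields = 'a,b,c'.split(',')

# A case where a pattern is the natural tool: digits directly before '%'.
percent = re.search(r'(\d+)%', line)
percent_value = int(percent.group(1))

print(is_error, mentions_disk, fields, percent_value)
```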


#74036

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2014-07-06 10:38 -0700
Message-ID	<2f63eeac-77c3-48fc-b4dd-020ad74cd0b1@googlegroups.com>
In reply to	#74034

On Sunday, July 6, 2014 11:47:38 AM UTC-5, Roy Smith wrote:
> Even better, r"\d+"
> >>> re.search(r'(\d\d*)', '111aaa222').groups()
> ('111',)
> >>> re.search(r'(\d+)', '111aaa222').groups()
> ('111',)

Yes, good catch! I had failed to reduce your original
pattern down to its most fundamental aspects for the sake
of completeness, and instead, opted to modify it in a manner
that mirrored your example.

> Oddly enough, I prefer character sets to the backslash
> notation, but I suppose that's largely because when I
> first learned regexes, that new-fangled backslash stuff
> hadn't been invented yet. :-) 

Ha, point taken! :-)

Character sets really shine when you need a fixed range of
letters or numbers which are NOT defined by one of the
"special characters" of \d \D \W \w, etc... 

Say you want to match any letters between "c" and "m" or the
digits between "3" and "6". Defining that pattern using OR'd
"char literals" would be a massive undertaking!

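A quick sketch of that range example; the sample text is arbitrary, and the alternation is built programmatically here only to show how long the OR'd form gets.

```python
import re

# Letters 'c' through 'm' or digits '3' through '6', as one character set.
in_range = re.compile(r'[c-m3-6]')

# The equivalent pattern written as OR'd single characters.
alternation = re.compile('|'.join('cdefghijklm3456'))

sample = 'abcdefmnz 0123456789'
found = in_range.findall(sample)
found_alt = alternation.findall(sample)

print(found)
print(alternation.pattern)
```
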
Another great use of character sets is skipping chars that
don't match a "target". For instance, a python comment will
start with one hash char and proceed to the end of the
line, which, when accounting for leading white-space,
could be defined by the pattern:

    r'\s*#[^\n]'

> Regex is also not as easy to use in Python as it is in a
> language like Perl where it's baked into the syntax.  As a
> result, pythonistas tend to shy away from regex, and
> either never learn the full power, or let their skills
> grow rusty. Which is a shame, because for many tasks,
> there's no better tool.

Agreed, but unfortunately, like many other languages, Python
has decided to import all the illogical aspects of regex syntax
from other languages instead of creating a "new" regex syntax
that is consistent and logical. They did the same thing with
Tkinter, and what a nightmare!

And don't misunderstand my statements, I don't intend that
we should create a syntax of verbosity, NO, we *CAN* keep
the syntax succinct whilst eliminating the illogical and
inconsistent aspects that plague our patterns.

Will regex ever be easy to learn? Probably not, but it can
be easier to use if only we put on our "big boy" pants and
decide to do something about it!


#74038

From	Rick Johnson <rantingrickjohnson@gmail.com>
Date	2014-07-06 10:58 -0700
Message-ID	<ea577e24-2858-461a-9738-f58994d4f838@googlegroups.com>
In reply to	#74036

On Sunday, July 6, 2014 12:38:23 PM UTC-5, Rick Johnson wrote:

>     r'\s*#[^\n]'

Well, there I go not testing again!

    r'\s*#[^\n]*'
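
The effect of the missing '*' is easy to see side by side. A minimal sketch with a made-up source snippet:

```python
import re

source = '    # a comment\nx = 1'

# Without the trailing '*', [^\n] consumes exactly one character after '#'.
first_try = re.match(r'\s*#[^\n]', source).group()

# With '*', [^\n]* runs on to the end of the line and stops at the newline.
corrected = re.match(r'\s*#[^\n]*', source).group()

print(repr(first_try))
print(repr(corrected))
```

Note that this pattern only looks at the text; a '#' inside a string literal would fool it, so it is not a substitute for a real tokenizer.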

