comp.lang.python > #58291

New to using re. Search for a number before a string.

Started by	Captain Dunsel <jonathan.nyquist@gmail.com>
First post	2013-11-01 14:33 -0700
Last post	2013-11-03 00:27 +0000
Articles	7 — 5 participants


  New to using re. Search for a number before a string. Captain Dunsel <jonathan.nyquist@gmail.com> - 2013-11-01 14:33 -0700
    Re: New to using re. Search for a number before a string. MRAB <python@mrabarnett.plus.com> - 2013-11-01 21:40 +0000
    Re: New to using re. Search for a number before a string. Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-11-02 09:06 +0200
    Re: New to using re. Search for a number before a string. Justin Barber <jbarber@iliff.edu> - 2013-11-02 06:03 -0700
    Re: New to using re. Search for a number before a string. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-02 13:31 +0000
    Re: New to using re. Search for a number before a string. Captain Dunsel <jonathan.nyquist@gmail.com> - 2013-11-02 17:19 -0700
      Re: New to using re. Search for a number before a string. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-03 00:27 +0000

#58291 — New to using re. Search for a number before a string.

From	Captain Dunsel <jonathan.nyquist@gmail.com>
Date	2013-11-01 14:33 -0700
Subject	New to using re. Search for a number before a string.
Message-ID	<f241bc01-d10a-433f-b3af-b22d05a603d8@googlegroups.com>

I have a text file that has lines with numbers occasionally appearing right before a person's name.  For example:

COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER

where I want to search for the name "ELMER FUDD" and extract the number right in front of it ("624309") when such a number appears. The length of the number is variable, however, and a lookbehind (?<=...) in a regular expression only works with a fixed-length pattern.

Any ideas appreciated!
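A minimal sketch of the restriction described above, using only the stdlib re module and the sample line from the post (the variable names are illustrative): a variable-width lookbehind such as (?<=\d+) is rejected when the pattern is compiled, whereas putting the digits first and checking for the name with a lookahead (?=...) has no such width limit.

```python
import re

line = "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER"

# The stdlib re module only accepts fixed-width lookbehind, so \d+ inside
# (?<=...) makes re.compile raise re.error.
try:
    re.compile(r"(?<=\d+)FUDD, ELMER")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False
print(lookbehind_ok)  # False

# A lookahead has no such limit: match the digits, then check (without
# consuming it) that the name follows them.
m = re.search(r"\d+(?=FUDD, ELMER)", line)
number = m.group() if m else None
print(number)  # 624309
```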


#58292

From	MRAB <python@mrabarnett.plus.com>
Date	2013-11-01 21:40 +0000
Message-ID	<mailman.1940.1383342016.18130.python-list@python.org>
In reply to	#58291

On 01/11/2013 21:33, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name.  For example:
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it ("624309") when such a number appears. The length of the number is variable, however, and a lookbehind (?<=...) in a regular expression only works with a fixed-length pattern.
>
> Any ideas appreciated!
>
How about searching with a pattern like r':(\d*)FUDD, ELMER'?
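A sketch of this pattern applied line by line to a file. It assumes the file is read one line at a time; io.StringIO stands in for the real file, and the second and third sample lines are hypothetical. Because of \d*, a name with no number in front still matches, with an empty group 1.

```python
import io
import re

# Stand-in for the text file; only the first line comes from the post.
sample_file = io.StringIO(
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER\n"
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:FUDD, ELMER\n"
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:555SMITH, JOHN\n"
)

# A colon, then zero or more digits (captured as group 1), then the name.
pattern = re.compile(r":(\d*)FUDD, ELMER")

numbers = []
for line in sample_file:
    m = pattern.search(line)
    if m:
        numbers.append(m.group(1))  # '' when no number precedes the name

print(numbers)  # ['624309', '']
```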


#58316

From	Jussi Piitulainen <jpiitula@ling.helsinki.fi>
Date	2013-11-02 09:06 +0200
Message-ID	<qoteh6zcmfn.fsf@ruuvi.it.helsinki.fi>
In reply to	#58291

Captain Dunsel writes:

> I have a text file that has lines with numbers occasionally
> appearing right before a person's name.  For example:
> 
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
> 
> where I want to search for the name "ELMER FUDD" and extract the
> number right in front of it ("624309") when such a number appears.
> The length of the number is variable, however, and a lookbehind
> (?<=...) in a regular expression only works with a fixed-length
> pattern.
> 
> Any ideas appreciated! 

Search for the digits and the name together. Make the digits a group
by putting their pattern in parentheses. If there is a match object,
extract the group.

 >>> import re
 >>> line = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'
 >>> m = re.search(r':(\d*)ELMER FUDD$', line)
 >>> m.group(1) if m else 'not found'
 'not found'
 >>> m = re.search(r':(\d*)FUDD, ELMER$', line)
 >>> m.group(1) if m else 'not found'
 '624309'

If you want a match only when there are digits, use \d+ instead.
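To illustrate the \d* versus \d+ difference, here is a sketch using the anchored pattern above on the sample line and on a hypothetical line with no number in front of the name; extract() is a made-up helper, not part of re.

```python
import re

with_number = "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER"
without_number = "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:FUDD, ELMER"  # hypothetical

def extract(pattern, line):
    """Hypothetical helper: group 1 of the first match, or None if no match."""
    m = re.search(pattern, line)
    return m.group(1) if m else None

star = r":(\d*)FUDD, ELMER$"  # digits optional: matches with an empty group
plus = r":(\d+)FUDD, ELMER$"  # at least one digit required

print(extract(star, with_number), extract(plus, with_number))              # 624309 624309
print(repr(extract(star, without_number)), extract(plus, without_number))  # '' None
```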

Look at regex.search and match.group here:
<http://docs.python.org/3/library/re.html>


#58325

From	Justin Barber <jbarber@iliff.edu>
Date	2013-11-02 06:03 -0700
Message-ID	<6322d405-410e-4505-b2ed-6cfd3dea5fbd@googlegroups.com>
In reply to	#58291

I'm guessing that the name "FUDD, ELMER" varies. In that case, you might try something like this:

>>> import re
>>> t = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'
>>> id_num_regex = re.compile(r'\d+(?=\w+\b,.+?)')
>>> id_num_regex.findall(t)
['624309']

This also works for first names such as 'Mary Ann', and because re.DOTALL is not set, '.' never matches a newline, so the lookahead cannot run past the end of the line.
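A sketch applying this pattern to several lines at once, assuming hypothetical input in which the names vary. The second findall is an illustrative variant, not part of the suggestion above: it captures the name as well, so each number can be looked up by name.

```python
import re

# Hypothetical input: the names vary and the last line has no number.
text = (
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER\n"
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:77SMITH, MARY ANN\n"
    "COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:BUNNY, BUGS\n"
)

# Digits followed (checked by the lookahead, not consumed) by a word, a word
# boundary, a comma and at least one more non-newline character.
id_num_regex = re.compile(r"\d+(?=\w+\b,.+?)")
ids = id_num_regex.findall(text)
print(ids)  # ['624309', '77']

# Illustrative variant: capture the number and the "SURNAME, FIRST" part
# together; re.MULTILINE makes $ match at the end of each line.
pairs = re.findall(r"(\d+)(\w+, .+)$", text, re.MULTILINE)
by_name = {name: number for number, name in pairs}
print(by_name["SMITH, MARY ANN"])  # 77
```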

Justin


#58327

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-11-02 13:31 +0000
Message-ID	<mailman.1952.1383399138.18130.python-list@python.org>
In reply to	#58291

On 01/11/2013 21:33, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name.  For example:
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it ("624309") when such a number appears. The length of the number is variable, however, and a lookbehind (?<=...) in a regular expression only works with a fixed-length pattern.
>
> Any ideas appreciated!
>

As you've had answers, here are some references for future use:
http://docs.python.org/3/howto/regex.html, 
https://wiki.python.org/moin/RegularExpression and 
http://www.regular-expressions.info/python.html

Also, the new regex package is an alternative to the stdlib re package;
unlike re, it supports variable-length lookbehind.
It's available from https://pypi.python.org/pypi/regex

-- 
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence


#58355

From	Captain Dunsel <jonathan.nyquist@gmail.com>
Date	2013-11-02 17:19 -0700
Message-ID	<981273cf-2c82-4b58-b2d0-5f1beb0b0c53@googlegroups.com>
In reply to	#58291

On Friday, November 1, 2013 5:33:43 PM UTC-4, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name.  For example:
> 
> 
> 
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
> 
> 
> 
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it ("624309") when such a number appears. The length of the number is variable, however, and a lookbehind (?<=...) in a regular expression only works with a fixed-length pattern.
> 
> 
> 
> Any ideas appreciated!

Thanks for your help, everyone!


#58356

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-11-03 00:27 +0000
Message-ID	<mailman.1963.1383438482.18130.python-list@python.org>
In reply to	#58355

On 03/11/2013 00:19, Captain Dunsel wrote:
> On Friday, November 1, 2013 5:33:43 PM UTC-4, Captain Dunsel wrote:
>> I have a text file that has lines with numbers occasionally appearing right before a person's name.  For example:
>>
>>
>>
>> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>>
>>
>>
>> where I want to search for the name "ELMER FUDD" and extract the number right in front of it ("624309") when such a number appears. The length of the number is variable, however, and a lookbehind (?<=...) in a regular expression only works with a fixed-length pattern.
>>
>>
>>
>> Any ideas appreciated!
>
> Thanks for your help, everyone!
>

My invoice, like the proverbial cheque, is in the post :)

Slight aside: to avoid the unwanted newlines above, could you
please read and act on this? Thanks.
https://wiki.python.org/moin/GoogleGroupsPython


Mark Lawrence


csiph-web
