Groups > comp.lang.python > #58291 > unrolled thread
| Started by | Captain Dunsel <jonathan.nyquist@gmail.com> |
|---|---|
| First post | 2013-11-01 14:33 -0700 |
| Last post | 2013-11-03 00:27 +0000 |
| Articles | 7 — 5 participants |
New to using re. Search for a number before a string. Captain Dunsel <jonathan.nyquist@gmail.com> - 2013-11-01 14:33 -0700
Re: New to using re. Search for a number before a string. MRAB <python@mrabarnett.plus.com> - 2013-11-01 21:40 +0000
Re: New to using re. Search for a number before a string. Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-11-02 09:06 +0200
Re: New to using re. Search for a number before a string. Justin Barber <jbarber@iliff.edu> - 2013-11-02 06:03 -0700
Re: New to using re. Search for a number before a string. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-02 13:31 +0000
Re: New to using re. Search for a number before a string. Captain Dunsel <jonathan.nyquist@gmail.com> - 2013-11-02 17:19 -0700
Re: New to using re. Search for a number before a string. Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-03 00:27 +0000
| From | Captain Dunsel <jonathan.nyquist@gmail.com> |
|---|---|
| Date | 2013-11-01 14:33 -0700 |
| Subject | New to using re. Search for a number before a string. |
| Message-ID | <f241bc01-d10a-433f-b3af-b22d05a603d8@googlegroups.com> |
I have a text file that has lines with numbers occasionally appearing right before a person's name. For example:

COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER

where I want to search for the name "ELMER FUDD" and extract the number right in front of it "624309" when such a number appears, but the length of the number is variable and using ?<= in a regular expression will only search for a fixed length.

Any ideas appreciated!
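For reference, the fixed-length restriction on ?<= mentioned above can be reproduced with the standard library's re module. This is only a minimal sketch: the variable names are illustrative and not from the post, and the six-digit width is an assumption taken from the sample line.

```python
import re

line = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'

# A lookbehind whose width can vary (\d+) is rejected at compile time by re.
try:
    re.compile(r'(?<=\d+)FUDD, ELMER')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# A lookbehind with a fixed width (exactly six digits) compiles, but it only
# helps when every number has that same length.
fixed_width = re.compile(r'(?<=\d{6})FUDD, ELMER')

print(variable_width_ok)
print(fixed_width.search(line) is not None)
```

Since the numbers here vary in length, a capturing group placed in front of the name sidesteps the restriction.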
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2013-11-01 21:40 +0000 |
| Message-ID | <mailman.1940.1383342016.18130.python-list@python.org> |
| In reply to | #58291 |
On 01/11/2013 21:33, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name. For example:
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it "624309" when such a number appears, but the length of the number is variable and using ?<= in a regular expression will only search for a fixed length.
>
> Any ideas appreciated!
>

How about searching with a pattern like r':(\d*)FUDD, ELMER'?
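As a rough sketch of how that pattern might be used (the second sample line, without a number, is made up for illustration):

```python
import re

# The parentheses capture whatever digits sit between the colon and the name.
pattern = re.compile(r':(\d*)FUDD, ELMER')

with_number = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'
without_number = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:FUDD, ELMER'  # hypothetical line

for text in (with_number, without_number):
    m = pattern.search(text)
    # group(1) is the captured digit run; \d* also allows an empty run.
    number = m.group(1) if m else None
    print(repr(number))
```

Because \d* accepts zero digits, a line with the name but no number still matches and yields an empty string rather than no match at all.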
| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Date | 2013-11-02 09:06 +0200 |
| Message-ID | <qoteh6zcmfn.fsf@ruuvi.it.helsinki.fi> |
| In reply to | #58291 |
Captain Dunsel writes:

> I have a text file that has lines with numbers occasionally
> appearing right before a person's name. For example:
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
> where I want to search for the name "ELMER FUDD" and extract the
> number right in front of it "624309" when such a number appears, but
> the length of the number is variable and using ?<= in a regular
> expression will only search for a fixed length.
>
> Any ideas appreciated!

Search for the digits and the name together. Make the digits a group by putting their pattern in parentheses. If there is a match object, extract the group.

>>> import re
>>> line = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'
>>> m = re.search(r':(\d*)ELMER FUDD$', line)
>>> m.group(1) if m else 'not found'
'not found'
>>> m = re.search(r':(\d*)FUDD, ELMER$', line)
>>> m.group(1) if m else 'not found'
'624309'

If you want a match only when there are digits, use \d+ instead.

Look at regex.search and match.group here: <http://docs.python.org/3/library/re.html>
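The same idea can be wrapped in a small helper that scans many lines. This is a sketch only: the function name find_number_before and the sample list are hypothetical, and it uses \d+ so that lines without a number are skipped.

```python
import re

def find_number_before(name, lines):
    """Return the digits directly before `name` on the first matching line, else None."""
    # re.escape keeps any special characters in the name from being read as regex syntax.
    pattern = re.compile(r'(\d+)' + re.escape(name))
    for line in lines:
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None

# Hypothetical sample data; the first line has the name but no number.
sample = [
    'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:FUDD, ELMER',
    'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER',
]
print(find_number_before('FUDD, ELMER', sample))
```

An open file object can be passed as lines, since iterating over it yields one line at a time.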
| From | Justin Barber <jbarber@iliff.edu> |
|---|---|
| Date | 2013-11-02 06:03 -0700 |
| Message-ID | <6322d405-410e-4505-b2ed-6cfd3dea5fbd@googlegroups.com> |
| In reply to | #58291 |
I'm guessing that the name "FUDD, ELMER" varies. In that case, you might try something like this (here t is the sample line from the original post):

>>> import re
>>> t = 'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER'
>>> id_num_regex = re.compile(r'\d+(?=\w+\b,.+?)')
>>> id_num_regex.findall(t)
['624309']

This would account for first names such as 'Mary Ann'. Because re.DOTALL is not set, '.' never matches a newline, so the lookahead cannot run past the end of the current line.

Justin
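One thing to watch for with this lookahead: \w also matches digits, so on a line where a number is followed directly by a comma, the engine can backtrack and let the last digit stand in for the "name". The sketch below compares the pattern with a stricter variant; the strict pattern and the last two sample lines are illustrative assumptions, not part of the thread.

```python
import re

loose = re.compile(r'\d+(?=\w+\b,.+?)')
# Hypothetical stricter variant: [^\W\d] means "a word character that is not a digit".
strict = re.compile(r'\d+(?=[^\W\d]+\b,.+?)')

samples = [
    'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER',
    'COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:12SMITH, MARY ANN',  # made-up line
    'TOTAL:123456, PENDING',                                       # made-up line, no name
]
for s in samples:
    # Show both results side by side for each line.
    print(loose.findall(s), strict.findall(s))
```

On the first two lines both patterns agree. On the third, the loose pattern returns a truncated number while the strict one returns nothing.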
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-02 13:31 +0000 |
| Message-ID | <mailman.1952.1383399138.18130.python-list@python.org> |
| In reply to | #58291 |
On 01/11/2013 21:33, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name. For example:
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it "624309" when such a number appears, but the length of the number is variable and using ?<= in a regular expression will only search for a fixed length.
>
> Any ideas appreciated!
>

As you've had answers, here are some references for future use: http://docs.python.org/3/howto/regex.html, https://wiki.python.org/moin/RegularExpression and http://www.regular-expressions.info/python.html

Also, the new regex package is an alternative to the stdlib re package. It's available from https://pypi.python.org/pypi/regex

-- 
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence
| From | Captain Dunsel <jonathan.nyquist@gmail.com> |
|---|---|
| Date | 2013-11-02 17:19 -0700 |
| Message-ID | <981273cf-2c82-4b58-b2d0-5f1beb0b0c53@googlegroups.com> |
| In reply to | #58291 |
On Friday, November 1, 2013 5:33:43 PM UTC-4, Captain Dunsel wrote:
> I have a text file that has lines with numbers occasionally appearing right before a person's name. For example:
>
>
>
> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>
>
>
> where I want to search for the name "ELMER FUDD" and extract the number right in front of it "624309" when such a number appears, but the length of the number is variable and using ?<= in a regular expression will only search for a fixed length.
>
>
>
> Any ideas appreciated!

Thanks for your help, everyone!
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-11-03 00:27 +0000 |
| Message-ID | <mailman.1963.1383438482.18130.python-list@python.org> |
| In reply to | #58355 |
On 03/11/2013 00:19, Captain Dunsel wrote:
> On Friday, November 1, 2013 5:33:43 PM UTC-4, Captain Dunsel wrote:
>> I have a text file that has lines with numbers occasionally appearing right before a person's name. For example:
>>
>>
>>
>> COLLEGE:ENROLLMENT:COMPLETED EVALUATIONS:624309FUDD, ELMER
>>
>>
>>
>> where I want to search for the name "ELMER FUDD" and extract the number right in front of it "624309" when such a number appears, but the length of the number is variable and using ?<= in a regular expression will only search for a fixed length.
>>
>>
>>
>> Any ideas appreciated!
>
> Thanks for your help, everyone!
>

My invoice, like the proverbial cheque, is in the post :)

Slight aside: to avoid the unwanted newlines above, could you please read and action this, thanks https://wiki.python.org/moin/GoogleGroupsPython

-- 
Python is the second best programming language in the world.
But the best has yet to be invented. Christian Tismer

Mark Lawrence