Groups > comp.lang.python > #97253 > unrolled thread
| Started by | massi_srb@msn.com |
|---|---|
| First post | 2015-09-30 11:34 -0700 |
| Last post | 2015-10-01 21:31 +0000 |
| Articles | 10 — 5 participants |
Question about regular expression massi_srb@msn.com - 2015-09-30 11:34 -0700
Re: Question about regular expression Emile van Sebille <emile@fenx.com> - 2015-09-30 11:50 -0700
Re: Question about regular expression Tim Chase <python.list@tim.thechases.com> - 2015-09-30 14:20 -0500
Re: Question about regular expression Denis McMahon <denismfmcmahon@gmail.com> - 2015-09-30 23:30 +0000
Re: Question about regular expression Denis McMahon <denismfmcmahon@gmail.com> - 2015-10-02 18:25 +0000
Re: Question about regular expression Emile van Sebille <emile@fenx.com> - 2015-09-30 20:58 -0700
Re: Question about regular expression Tim Chase <python.list@tim.thechases.com> - 2015-10-01 07:39 -0500
Re: Question about regular expression Rob Gaddi <rgaddi@technologyhighland.invalid> - 2015-10-01 15:53 +0000
Re: Question about regular expression Denis McMahon <denismfmcmahon@gmail.com> - 2015-10-01 21:41 +0000
Re: Question about regular expression Denis McMahon <denismfmcmahon@gmail.com> - 2015-10-01 21:31 +0000
| From | massi_srb@msn.com |
|---|---|
| Date | 2015-09-30 11:34 -0700 |
| Subject | Question about regular expression |
| Message-ID | <811788b6-9955-4dcc-bf49-9647891d17ec@googlegroups.com> |
Hi everyone,
firstly the description of my problem. I have a string in the following form:
s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
that is a string made up of groups in the form 'name' (letters only) plus possibly a tuple containing 1 or 2 integer values. Blanks can be placed between names and tuples or not, but they surely are placed between two groups. I would like to process this string in order to get a dictionary like this:
d = {
"name1":(0, 0),
"name2":(1, 0),
"name3":(0, 0),
"name4":(1, 4),
"name5":(2, 0),
}
I guess this problem can be tackled with regular expressions, but I have no idea about how to use them in this case (I'm not a regexp guy). Can anyone give me a hint? Any possible different approach is absolutely welcome.
Thanks in advance!
| From | Emile van Sebille <emile@fenx.com> |
|---|---|
| Date | 2015-09-30 11:50 -0700 |
| Message-ID | <mailman.274.1443639036.28679.python-list@python.org> |
| In reply to | #97253 |
On 9/30/2015 11:34 AM, massi_srb@msn.com wrote:
> Hi everyone,
>
> firstly the description of my problem. I have a string in the following form:
>
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
>
> that is a string made up of groups in the form 'name' (letters only) plus possibly a tuple containing 1 or 2 integer values. Blanks can be placed between names and tuples or not, but they surely are placed between two groups. I would like to process this string in order to get a dictionary like this:
>
> d = {
> "name1":(0, 0),
> "name2":(1, 0),
> "name3":(0, 0),
> "name4":(1, 4),
> "name5":(2, 0),
> }
>
> I guess this problem can be tackled with regular expressions,
Stop there! :)
I'd use string functions. If you can control the string output to drop
the spaces and always output in namex(a,b)<space>namey(c,d)... format,
try starting with
>>> "name1 name2(1) name3 name4(1,4) name5(2)".split()
['name1', 'name2(1)', 'name3', 'name4(1,4)', 'name5(2)']
then create the dict from the result.
Emile
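A minimal sketch of that last step, assuming the string has already been normalized so that no group contains a space. The function name `parse_normalized` is illustrative and not from the thread:

```python
def parse_normalized(s):
    """Parse 'name(a,b) name(c) name' where no group contains a space."""
    d = {}
    for item in s.split():
        # Split off the name; rest is '' for a bare name, else e.g. '1,4)'.
        name, _, rest = item.partition("(")
        nums = [int(n) for n in rest.rstrip(")").split(",") if n]
        # Pad missing values with 0 so every entry is a 2-tuple.
        nums += [0] * (2 - len(nums))
        d[name] = tuple(nums)
    return d


print(parse_normalized("name1 name2(1) name3 name4(1,4) name5(2)"))
```

This only works once the whitespace inside groups is gone, which is the condition Emile states above.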
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-09-30 14:20 -0500 |
| Message-ID | <mailman.276.1443641310.28679.python-list@python.org> |
| In reply to | #97253 |
On 2015-09-30 11:34, massi_srb@msn.com wrote:
> firstly the description of my problem. I have a string in the
> following form:
>
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
>
> that is a string made up of groups in the form 'name' (letters
> only) plus possibly a tuple containing 1 or 2 integer values.
> Blanks can be placed between names and tuples or not, but they
> surely are placed between two groups. I would like to process this
> string in order to get a dictionary like this:
>
> d = {
> "name1":(0, 0),
> "name2":(1, 0),
> "name3":(0, 0),
> "name4":(1, 4),
> "name5":(2, 0),
> }
>
> I guess this problem can be tackled with regular expressions, b
First out of the gate, I suggest you follow Emile's advice and try
using string expressions. However, if you *want* to do it with
regular expressions, you can. It's ugly and might be fragile, but
#############################################################
import re
s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
r = re.compile(r"""
    \b        # start at a word boundary
    (\w+)     # capture the word
    \s*       # optional whitespace
    (?:       # start an optional grouping for things in the parens
      \(      # a literal open-paren
      \s*     # optional whitespace
      (\d+)   # capture the number in those parens
      (?:     # start a second optional grouping for the stuff after a comma
        \s*   # optional whitespace
        ,     # a literal comma
        \s*   # optional whitespace
        (\d+) # the second number
      )?      # make the comma and following number optional
      \)      # a literal close-paren
    )?        # make that stuff in parens optional
    """, re.X)
d = {}
for m in r.finditer(s):
    a, b, c = m.groups()
    d[a] = (int(b or 0), int(c or 0))

from pprint import pprint
pprint(d)
#############################################################
I'd stick with the commented version of the regexp if you were to use
this anywhere so that others can follow what you're doing.
-tkc
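One thing to watch: as written, the pattern above expects the closing paren directly after the last digit, so for input such as "janet( 7,6 )" the parenthesized part would not match. Below is a hedged variant, not Tim's original, that also allows whitespace before the comma and the closing paren. It additionally assumes that names start with a letter, so stray numbers are never taken for names. The names `GROUP` and `parse` are made up for this sketch.

```python
import re

GROUP = re.compile(r"""
    \b([A-Za-z]\w*)     # name: starts with a letter (an assumption)
    \s*                 # optional whitespace before the parens
    (?:                 # optional parenthesized part
        \(\s*(\d+)\s*   # open-paren, first number, padding allowed
        (?:,\s*(\d+)\s*)?   # optional comma and second number
        \)              # close-paren
    )?
""", re.X)


def parse(s):
    # findall yields (name, first, second); unmatched groups are ''.
    return {name: (int(a or 0), int(b or 0))
            for name, a, b in GROUP.findall(s)}


print(parse("name1 name2(1) name3 name4 (1, 4) name5( 2 )"))
```

Malformed groups (for example a missing close-paren) still fall back to (0, 0) silently, so this remains a sketch rather than a validating parser.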
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-09-30 23:30 +0000 |
| Message-ID | <muhrb7$elp$2@dont-email.me> |
| In reply to | #97253 |
On Wed, 30 Sep 2015 11:34:04 -0700, massi_srb wrote:
> firstly the description of my problem. I have a string in the following
> form: .....
The way I solved this was to:
1) replace all the punctuation in the string with spaces
2) split the string on space
3) process each thing in the list to test if it was a number or word
4a) add words to the dictionary as keys with value of a default list, or
4b) add numbers to the dictionary in the list at the appropriate position
5) convert the list values of the dictionary to tuples
It seems to work on my test case:
s = "fred jim(1) alice tom (1, 4) peter (2) andrew(3,4) janet( 7,6 ) james ( 7 ) mike ( 9 )"

d = {'mike': (9, 0), 'janet': (7, 6), 'james': (7, 0), 'jim': (1, 0), 'andrew': (3, 4), 'alice': (0, 0), 'tom': (1, 4), 'peter': (2, 0), 'fred': (0, 0)}
--
Denis McMahon, denismfmcmahon@gmail.com
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-10-02 18:25 +0000 |
| Message-ID | <mumi6u$9d6$1@dont-email.me> |
| In reply to | #97262 |
On Wed, 30 Sep 2015 23:30:47 +0000, Denis McMahon wrote:
> On Wed, 30 Sep 2015 11:34:04 -0700, massi_srb wrote:
>
>> firstly the description of my problem. I have a string in the following
>> form: .....
>
> The way I solved this was to:
>
> 1) replace all the punctuation in the string with spaces
>
> 2) split the string on space
>
> 3) process each thing in the list to test if it was a number or word
>
> 4a) add words to the dictionary as keys with value of a default list, or
> 4b) add numbers to the dictionary in the list at the appropriate
> position
>
> 5) convert the list values of the dictionary to tuples
>
> It seems to work on my test case:
>
> s = "fred jim(1) alice tom (1, 4) peter (2) andrew(3,4) janet( 7,6 ) james ( 7 ) mike ( 9 )"
>
> d = {'mike': (9, 0), 'janet': (7, 6), 'james': (7, 0), 'jim': (1, 0), 'andrew': (3, 4), 'alice': (0, 0), 'tom': (1, 4), 'peter': (2, 0), 'fred': (0, 0)}
Oh yeah, the code:
#!/usr/bin/python

import re

s = 'fred jim(1) alice tom (1, 4) peter (2) andrew(3,4) janet( 7,6 ) james ( 7 ) mike ( 9 ) jon ( 6 , 3 ) charles(0,12)'

# turn the punctuation into spaces, then split into words and numbers
bits = s.replace('(', ' ').replace(',', ' ').replace(')', ' ').split(' ')

d = {}

namep = re.compile('^[A-Za-z]+$')
numbp = re.compile('^[0-9]+$')

for bit in bits:
    if namep.match(bit):
        # a new name: start with the default values
        d[bit] = [0,0]
        w = bit
        nums = 0
    if numbp.match(bit):
        # a number: store it in the next slot of the current name
        n = int(bit)
        d[w][nums] = n
        nums += 1

d = {x:tuple(d[x]) for x in d}

print(s)
print(d)
It uses regex to determine if the list element being processed is a name
or a number, which makes for 2 very simple patterns.
--
Denis McMahon, denismfmcmahon@gmail.com
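The same token-by-token approach can be sketched without the two regexes, using str methods under Python 3. This is an illustration rather than Denis's code: the function name `parse_tokens` is made up here, and like the patterns above it assumes names consist of letters only. It also ignores numbers that appear before any name or beyond the second slot, which is a choice made for this sketch.

```python
def parse_tokens(s):
    # Turn the punctuation into spaces so split() yields bare tokens.
    for ch in "(),":
        s = s.replace(ch, " ")

    d = {}
    current = None  # the name that following numbers belong to
    pos = 0         # next free slot (0 or 1) for the current name
    for token in s.split():
        if token.isalpha():
            current = token
            d[current] = [0, 0]
            pos = 0
        elif token.isdecimal() and current is not None and pos < 2:
            d[current][pos] = int(token)
            pos += 1

    # Convert the mutable lists into tuples.
    return {name: tuple(vals) for name, vals in d.items()}


print(parse_tokens("fred jim(1) alice tom (1, 4) jon ( 6 , 3 ) charles(0,12)"))
```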
| From | Emile van Sebille <emile@fenx.com> |
|---|---|
| Date | 2015-09-30 20:58 -0700 |
| Message-ID | <mailman.281.1443671907.28679.python-list@python.org> |
| In reply to | #97253 |
On 9/30/2015 12:20 PM, Tim Chase wrote:
> On 2015-09-30 11:34, massi_srb@msn.com wrote:
<snip>
>> I guess this problem can be tackled with regular expressions, b
> ... However, if you *want* to do it with
> regular expressions, you can. It's ugly and might be fragile, but
>
> #############################################################
> import re
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
> r = re.compile(r"""
>     \b        # start at a word boundary
>     (\w+)     # capture the word
>     \s*       # optional whitespace
>     (?:       # start an optional grouping for things in the parens
>       \(      # a literal open-paren
>       \s*     # optional whitespace
>       (\d+)   # capture the number in those parens
>       (?:     # start a second optional grouping for the stuff after a comma
>         \s*   # optional whitespace
>         ,     # a literal comma
>         \s*   # optional whitespace
>         (\d+) # the second number
>       )?      # make the comma and following number optional
>       \)      # a literal close-paren
>     )?        # make that stuff in parens optional
>     """, re.X)
> d = {}
> for m in r.finditer(s):
>     a, b, c = m.groups()
>     d[a] = (int(b or 0), int(c or 0))
>
> from pprint import pprint
> pprint(d)
> #############################################################
:)
>
> I'd stick with the commented version of the regexp if you were to use
> this anywhere so that others can follow what you're doing.
... and this is why I use python. That looks too much like a hex sector
disk dump rot /x20. :)
No-really-that's-sick-ly yr's,
Emile
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2015-10-01 07:39 -0500 |
| Message-ID | <mailman.291.1443703705.28679.python-list@python.org> |
| In reply to | #97253 |
On 2015-10-01 01:48, gal kauffman wrote:
> items = s.replace(' (', '(').replace(', ',',').split()
s = "name1  (1)"
Your suggestion doesn't catch cases where more than one space can
occur before the paren.
-tkc
| From | Rob Gaddi <rgaddi@technologyhighland.invalid> |
|---|---|
| Date | 2015-10-01 15:53 +0000 |
| Message-ID | <mujku2$ia9$2@dont-email.me> |
| In reply to | #97253 |
On Wed, 30 Sep 2015 11:34:04 -0700, massi_srb wrote:
> Hi everyone,
>
> firstly the description of my problem. I have a string in the following
> form:
>
> s = "name1 name2(1) name3 name4 (1, 4) name5(2) ..."
>
> that is a string made up of groups in the form 'name' (letters only)
> plus possibly a tuple containing 1 or 2 integer values. Blanks can be
> placed between names and tuples or not, but they surely are placed
> between two groups. I would like to process this string in order to get a
> dictionary like this:
>
> d = {
> "name1":(0, 0),
> "name2":(1, 0),
> "name3":(0, 0),
> "name4":(1, 4),
> "name5":(2, 0),
> }
>
> I guess this problem can be tackled with regular expressions, but I have
> no idea about how to use them in this case (I'm not a regexp guy). Can
> anyone give me a hint? any possible different approach is absolutely
> welcome.
>
> Thanks in advance!
There's a quote for this. 'Some people, when confronted with a problem,
think “I know, I'll use regular expressions.” Now they have two
problems.'
That one's not always true, but any time you're debating a regex solution
it should at least come to mind. Lots of people have posted lots of pure
Python solutions. I will simply comment that using any of them will make
you fundamentally happier as time goes on than trying to shoehorn a regex
in.
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-10-01 21:41 +0000 |
| Message-ID | <muk99h$ogn$2@dont-email.me> |
| In reply to | #97286 |
On Thu, 01 Oct 2015 15:53:38 +0000, Rob Gaddi wrote:
> There's a quote for this. 'Some people, when confronted with a problem,
> think “I know, I'll use regular expressions.” Now they have two
> problems.'
I actually used 2 regexes:
wordpatt = re.compile('[a-zA-Z]+')
numpatt = re.compile('[0-9]+')
1) replace all '(', ',' and ')' in the string with spaces
2) split the string on space
3) create an empty dict d
4) process each thing in the split list:
4a) set d[word] = [0,0] for each word element (wordpatt.match(thing)); a list because I want to be able to modify it
4b) set d[word][n] = int(num) for each num element (numpatt.match(thing)), with n depending on whether it was the first or second num following the previous word
5) then:
d = {x:tuple(d[x]) for x in d}
to convert the lists in the new dict to tuples
--
Denis McMahon, denismfmcmahon@gmail.com
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-10-01 21:31 +0000 |
| Message-ID | <muk8od$ogn$1@dont-email.me> |
| In reply to | #97253 |
On Thu, 01 Oct 2015 01:48:03 -0700, gal kauffman wrote:
> items = s.replace(' (', '(').replace(', ',',').split()
>
> items_dict = dict()
> for item in items:
>     if '(' not in item:
>         item += '(0,0)'
>     if ',' not in item:
>         item = item.replace(')', ',0)')
>
>     name, raw_data = item.split('(')
>     data_tuple = tuple((int(v) for v in raw_data.replace(')','').split(',')))
>
>     items_dict[name] = data_tuple
Please don't top post.
What happens if there's more whitespace than you allow for preceding a
'(' or following a ',', or if there's whitespace following '('?
--
Denis McMahon, denismfmcmahon@gmail.com
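One possible answer to those whitespace cases, sketched under the assumption that groups are still separated by at least one blank: normalize the whitespace around the punctuation first, then split. The helper names `normalize` and `parse` are illustrative and not from the thread, and the `re.sub` calls are a substitute for the fixed-string `replace` calls quoted above.

```python
import re


def normalize(s):
    s = re.sub(r"\s*\(\s*", "(", s)  # drop whitespace around '('
    s = re.sub(r"\s*,\s*", ",", s)   # drop whitespace around ','
    s = re.sub(r"\s+\)", ")", s)     # drop whitespace before ')'
    return s


def parse(s):
    d = {}
    for item in normalize(s).split():
        name, _, rest = item.partition("(")
        nums = [int(n) for n in rest.rstrip(")").split(",") if n]
        # Pad with zeros, then keep exactly two values.
        d[name] = tuple((nums + [0, 0])[:2])
    return d


print(normalize("name1  (1)"))
print(parse("name1 name2(1) name3 name4  ( 1 ,  4 ) name5( 2 )"))
```

Whitespace after ')' is deliberately left alone, since that blank is what separates one group from the next.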