| Started by | rice.cruft@gmail.com |
|---|---|
| First post | 2013-06-12 17:59 -0700 |
| Last post | 2013-06-14 12:14 -0700 |
| Articles | 6 — 4 participants |
Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.cruft@gmail.com - 2013-06-12 17:59 -0700
Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Andreas Perstinger <andipersti@gmail.com> - 2013-06-13 09:17 +0200
Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.stew@gmail.com - 2013-06-14 12:09 -0700
Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Kevin LaTona <lists@studiosola.com> - 2013-06-13 07:42 -0700
Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Kevin LaTona <lists@studiosola.com> - 2013-06-13 14:07 -0700
Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.stew@gmail.com - 2013-06-14 12:14 -0700
| From | rice.cruft@gmail.com |
|---|---|
| Date | 2013-06-12 17:59 -0700 |
| Subject | Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) |
| Message-ID | <e682e1eb-1f7b-4776-82f0-11a0147947ec@googlegroups.com> |
I am parsing the output of an open-iscsi command that contains several blocks of data for each data set. Each block has the format:
Target: iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007
Current Portal: 221.128.52.224:3260,7
Persistent Portal: 221.128.52.224:3260,7
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1996-04.de.suse:01:7c9741b545b5
Iface IPaddress: 221.128.52.214
Iface HWaddress: <empty>
Iface Netdev: <empty>
SID: 154
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
I have worked out the regex to grab the values I am interested in, with the exception of the 'iSCSI Connection State' and 'iSCSI Session State'. My regex is:
regex = re.compile( r'''
# Target name, iqn
Target:\s+(?P<iqn>\S+)\s*
# Target portal
\s+Current\sPortal:\s*
(?P<ipaddr>\w+\.\w+\.\w+\.\w+):(?P<port>\d+),(?P<tag>\d+)
# skip lines...
[\s\S]*?
# Initiator name, iqn
Iface\s+Initiatorname:\s+(?P<initiatorName>\S+)\s*
# Initiator port, IP address
Iface\s+IPaddress:\s+(?P<initiatorIP>\S+)
# skip lines...
[\s\S]*?
# Session ID
SID:\s+(?P<SID>\d+)\s*
# Connection state
iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
[\s\S]*?   # <<<<<< without this the regex fails
# Session state
iSCSI\ +Session\ +State:\s+(?P<sessionState>\w+)
''', re.VERBOSE|re.MULTILINE)
I tried using \s* to swallow the whitespace between the two iSCSI lines. No joy... However, [\s\S]*? allows the regex to succeed. But that seems to me to be overkill (I am not trying to skip lines of text here). Also note that I am using \ + to catch spaces between the words. On the two problem lines, using \s+ between the label words fails.
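A minimal sketch, not from the original posts, of why something has to consume the whitespace between the two state lines. Under re.VERBOSE the pattern's own line breaks and indentation are discarded, so consecutive pattern lines are effectively concatenated. The two-line `sample` string and its tab indentation are assumptions (the archive shows no leading whitespace), and the code targets Python 3 rather than the 2.7 used in the thread.

```python
import re

# Two lines modelled on the sample output; the tab indentation is an assumption.
sample = "\t\tiSCSI Connection State: LOGGED IN\n\t\tiSCSI Session State: LOGGED_IN\n"

# Nothing between the two lines: VERBOSE drops the pattern's own newlines,
# so no token is left to match the newline and tabs in the input.
without_gap = re.compile(r'''
    iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)
    iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
''', re.VERBOSE)

# Same pattern with an explicit \s* after the first group.
with_gap = re.compile(r'''
    iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)\s*
    iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
''', re.VERBOSE)

no_match = without_gap.search(sample)
match = with_gap.search(sample)
print(no_match)
print(match.groupdict())
```

On this sample, `without_gap` finds nothing while `with_gap` matches both states. Since `\s` also matches newlines, the `\s*` inside the `connState` group can itself run across a line break when the state is a single word, which would leave trailing whitespace in the captured value.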
The regex is compiled and fed to a finditer() call... With debug prints:
for m in regex.finditer(inp):
    print 'SSSSSS %d' % len(m.groups())
    for i in range(len(m.groups())):
        print ' SSS--> %s' % (m.group(i+1))
myDetails = [ m.groupdict() for m in regex.finditer(inp)]
print 'ZZZZ myDetails %s' % myDetails
Any help would be appreciated. Lastly, a version of this regex as a non-VERBOSE expression works as expected. Is it something about re.VERBOSE?
Thanks.
--Eric
| From | Andreas Perstinger <andipersti@gmail.com> |
|---|---|
| Date | 2013-06-13 09:17 +0200 |
| Message-ID | <mailman.3169.1371108348.3114.python-list@python.org> |
| In reply to | #47874 |
On 13.06.2013 02:59, rice.cruft@gmail.com wrote:
> I am parsing the output of an open-iscsi command that contains
> several blocks of data for each data set. Each block has the format:
[SNIP]
> I tried using \s* to swallow the whitespace between the two iSCSI
> lines. No joy... However [\s\S]*? allows the regex to succeed. But that
> seems to me to be overkill (I am not trying to skip lines of text here.)
> Also note that I am using \ + to catch spaces between the words. On the
> two problem lines, using \s+ between the label words fails.
Changing
> # Connection state
> iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
> [\s\S]*? <<<<<< without this the regex fails
> # Session state
> iSCSI\ +Session\ +State:\s+(?P<sessionState>\w+)
to
# Connection state
iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)\s*
# Session state
iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
gives me
>>> # 'test' is the example string
>>> myDetails = [ m.groupdict() for m in regex.finditer(test)]
>>> print myDetails
[{'initiatorIP': '221.128.52.214', 'connState': 'LOGGED IN', 'SID':
'154', 'ipaddr': '221.128.52.224', 'initiatorName':
'iqn.1996-04.de.suse:01:7c9741b545b5', 'sessionState': 'LOGGED_IN',
'iqn': 'iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007',
'tag': '7', 'port': '3260'}]
for your example (same for the original regex).
It looks like it works (Python 2.7.3) and there is something else
breaking the regex.
Bye, Andreas
| From | rice.stew@gmail.com |
|---|---|
| Date | 2013-06-14 12:09 -0700 |
| Message-ID | <7527606b-ff62-4d31-9a5e-625e3e559834@googlegroups.com> |
| In reply to | #47915 |
On Thursday, June 13, 2013 3:17:28 AM UTC-4, Andreas Perstinger wrote:
> On 13.06.2013 02:59, rice.cruft@gmail.com wrote:
> > I am parsing the output of an open-iscsi command that contains
> > several blocks of data for each data set. Each block has the format:
> [SNIP]
> for your example (same for the original regex).
> It looks like it works (Python 2.7.3) and there is something else
> breaking the regex.
>
> Bye, Andreas
Indeed. "There is something else breaking the regex."
A note if you are trying this regex: you need more than one block of Target data to see the issues related to scanning multiple instances of the data. My regex works as expected if I leave out those two lines related to the iSCSI Connection and Session states. For now I am scratching my head...
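For anyone reproducing the multi-block case, here is a hedged sketch of one way it can go wrong, plus an alternative technique that is not the one used in the thread: split the input into per-target blocks first, then search each block on its own. Whether this is what happened with Eric's data is not established in the thread. The shortened input, the `iqn.example:...` names, `FIELDS`, and `parse_block` are all hypothetical; the code targets Python 3.

```python
import re

# Two shortened, hypothetical blocks. The first one deliberately lacks
# its "iSCSI Session State" line.
inp = (
    "Target: iqn.example:target-a\n"
    "\t\tSID: 154\n"
    "\t\tiSCSI Connection State: LOGGED IN\n"
    "Target: iqn.example:target-b\n"
    "\t\tSID: 155\n"
    "\t\tiSCSI Connection State: LOGGED IN\n"
    "\t\tiSCSI Session State: LOGGED_IN\n"
)

# One big pattern with a lazy skip: the skip is free to cross into the next block.
single = re.compile(r'''
    Target:\s+(?P<iqn>\S+)
    [\s\S]*?
    iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
''', re.VERBOSE)
merged = [m.groupdict() for m in single.finditer(inp)]

# Alternative: cut the text at each "Target:" header, then parse block by block.
blocks = re.findall(r'^Target:.*?(?=^Target:|\Z)', inp, re.MULTILINE | re.DOTALL)

FIELDS = {
    'iqn': re.compile(r'Target:\s+(\S+)'),
    'SID': re.compile(r'SID:\s+(\d+)'),
    'connState': re.compile(r'iSCSI\s+Connection\s+State:[ \t]+(.+)'),
    'sessionState': re.compile(r'iSCSI\s+Session\s+State:[ \t]+(.+)'),
}

def parse_block(block):
    """Return one dict per block; fields missing from the block become None."""
    out = {}
    for name, rx in FIELDS.items():
        m = rx.search(block)
        out[name] = m.group(1).strip() if m else None
    return out

details = [parse_block(b) for b in blocks]
print(merged)
print(details)
```

In this sketch the single pattern yields one match that pairs target-a with target-b's session state, while the per-block version reports two targets and marks the missing field as `None`.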
| From | Kevin LaTona <lists@studiosola.com> |
|---|---|
| Date | 2013-06-13 07:42 -0700 |
| Message-ID | <mailman.3206.1371136676.3114.python-list@python.org> |
| In reply to | #47874 |
On Jun 12, 2013, at 5:59 PM, rice.cruft@gmail.com wrote:
> I am parsing the output of an open-iscsi command that contains several blocks of data for each data set. Each block has the format:
> Lastly, a version of this regex as a non-VERBOSE expression works as expected.. Something about re.VERBOSE... ????
Snip
With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.
I would say the regex could still use some adjustments.
-Kevin
import re
inp ="""
Target: iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007
Current Portal: 221.128.52.224:3260,7
Persistent Portal: 221.128.52.224:3260,7
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1996-04.de.suse:01:7c9741b545b5
Iface IPaddress: 221.128.52.214
Iface HWaddress: <empty>
Iface Netdev: <empty>
SID: 154
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
"""
regex = re.compile( r'''
# Target name, iqn
Target:\s+(?P<iqn>\S+)\s*
# Target portal
\s+Current\sPortal:\s*
(?P<ipaddr>\w+\.\w+\.\w+\.\w+):(?P<port>\d+),(?P<tag>\d+)
# skip lines...
[\s\S]*?
# Initiator name, iqn
Iface\s+Initiatorname:\s+(?P<initiatorName>\S+)\s*
# Initiator port, IP address
Iface\s+IPaddress:\s+(?P<initiatorIP>\S+)
# skip lines...
[\s\S]*?
# Session ID
SID:\s+(?P<SID>\d+)\s*
# Connection state
iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
[\s\S]*?
# Session state iSCSI
iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)\s*
# Session state Internal
Internal\s+iscsid\s+Session\s+State:.*\s+(?P<ss2>\w+\s\w+)
''', re.VERBOSE|re.MULTILINE)
myDetails = [ m.groupdict() for m in regex.finditer(inp)][0]
for k,v in myDetails.iteritems():
    print k,v
#*************
If you want just the values back, in the order parsed, this will work for now.
for match in regex.findall(inp):
    for item in range(len(match)):
        print match[item]
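A small sketch of the difference between the two retrieval styles used above: `findall()` returns one tuple per match with the groups in pattern order, while `finditer()` plus `groupdict()` returns them keyed by group name. The shortened pattern `rx` and the `text` string are assumptions made for brevity; the code targets Python 3.

```python
import re

# Shortened, hypothetical pattern and input covering two of the fields.
rx = re.compile(
    r'SID:\s+(?P<SID>\d+)\s+'
    r'iSCSI\s+Connection\s+State:\s+(?P<connState>\w+[ ]\w+)'
)
text = "SID: 154\n\t\tiSCSI Connection State: LOGGED IN\n"

as_tuples = rx.findall(text)                           # positional, in pattern order
as_dicts = [m.groupdict() for m in rx.finditer(text)]  # keyed by group name

print(as_tuples)
print(as_dicts)
```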
| From | Kevin LaTona <lists@studiosola.com> |
|---|---|
| Date | 2013-06-13 14:07 -0700 |
| Message-ID | <mailman.3262.1371198031.3114.python-list@python.org> |
| In reply to | #47874 |
On Jun 13, 2013, at 7:42 AM, Kevin LaTona <lists@studiosola.com> wrote:
> With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.
Sorry, I had a small bleep while writing that last line this AM.
Of course the regex pattern would work in VERBOSE mode, as that was how it was presented.
Without VERBOSE, each line of the pattern would have needed to be enclosed in single or double quote marks.
http://docs.python.org/2/library/re.html#re.VERBOSE
-Kevin
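To illustrate the quoting point, here is a hedged sketch of the same two pattern lines written both ways. In the non-VERBOSE form each piece is its own quoted string and Python joins adjacent literals into one pattern. The one-line `line` input is an assumption, and the code targets Python 3.

```python
import re

# Hypothetical input covering just the two fields used below.
line = "SID: 154\n\t\tiSCSI Connection State: LOGGED IN"

# VERBOSE form: whitespace and '#' comments inside the pattern are ignored,
# so a literal space has to be written as '\ '.
verbose = re.compile(r'''
    # Session ID
    SID:\s+(?P<SID>\d+)\s*
    # Connection state
    iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
''', re.VERBOSE)

# Non-VERBOSE form: adjacent string literals are concatenated by Python.
# A plain space is literal here, and comments live outside the strings.
plain = re.compile(
    r'SID:\s+(?P<SID>\d+)\s*'                                # Session ID
    r'iSCSI +Connection +State:\s+(?P<connState>\w+\s*\w*)'  # Connection state
)

v = verbose.search(line).groupdict()
p = plain.search(line).groupdict()
print(v)
print(p)
```

Both forms compile to equivalent patterns for this input, so any difference in behaviour between the two versions would have to come from how the quoting was changed rather than from the flag itself.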
| From | rice.stew@gmail.com |
|---|---|
| Date | 2013-06-14 12:14 -0700 |
| Message-ID | <cff4aa50-31db-49c9-a7a9-86f0c1136f8c@googlegroups.com> |
| In reply to | #48078 |
On Thursday, June 13, 2013 5:07:33 PM UTC-4, Kevin LaTona wrote:
> On Jun 13, 2013, at 7:42 AM, Kevin LaTona <lists@studiosola.com> wrote:
> > With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.
>
> Sorry, I had a small bleep while writing that last line this AM.
> Of course the regex pattern would work in VERBOSE mode, as that was how it was presented.
> Without VERBOSE, each line of the pattern would have needed to be enclosed in single or double quote marks.
> http://docs.python.org/2/library/re.html#re.VERBOSE
> -Kevin
Yes. I tested with and without re.VERBOSE, along with the required quoting changes. In both cases the oddities persist. Why [\s\S]+ is necessary between the two iSCSI Connection/Session lines is a mystery; \s+? or similar should be sufficient to swallow the whitespace.
--Eric