
Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?)

Started by	rice.cruft@gmail.com
First post	2013-06-12 17:59 -0700
Last post	2013-06-14 12:14 -0700
Articles	6 — 4 participants


  Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.cruft@gmail.com - 2013-06-12 17:59 -0700
    Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Andreas Perstinger <andipersti@gmail.com> - 2013-06-13 09:17 +0200
      Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.stew@gmail.com - 2013-06-14 12:09 -0700
    Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Kevin LaTona <lists@studiosola.com> - 2013-06-13 07:42 -0700
    Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) Kevin LaTona <lists@studiosola.com> - 2013-06-13 14:07 -0700
      Re: Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?) rice.stew@gmail.com - 2013-06-14 12:14 -0700

#47874 — Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?)

From	rice.cruft@gmail.com
Date	2013-06-12 17:59 -0700
Subject	Problem creating a regular expression to parse open-iscsi, iscsiadm output (help?)
Message-ID	<e682e1eb-1f7b-4776-82f0-11a0147947ec@googlegroups.com>

I am parsing the output of an open-iscsi command that contains several blocks of data for each data set. Each block has the format:

Target: iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007
        Current Portal: 221.128.52.224:3260,7
        Persistent Portal: 221.128.52.224:3260,7
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1996-04.de.suse:01:7c9741b545b5
                Iface IPaddress: 221.128.52.214
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: 154
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE


I have worked out the regex to grab the values I am interested in, with the exception of 'iSCSI Connection State' and 'iSCSI Session State'. My regex is:

regex = re.compile( r'''
        # Target name, iqn
        Target:\s+(?P<iqn>\S+)\s*
        # Target portal
        \s+Current\sPortal:\s*
        (?P<ipaddr>\w+\.\w+\.\w+\.\w+):(?P<port>\d+),(?P<tag>\d+)
        # skip lines...
        [\s\S]*?
        # Initiator name, iqn
        Iface\s+Initiatorname:\s+(?P<initiatorName>\S+)\s*
        # Initiator port, IP address
        Iface\s+IPaddress:\s+(?P<initiatorIP>\S+)
        # skip lines...
        [\s\S]*?
        # Session ID
        SID:\s+(?P<SID>\d+)\s*
        # Connection state
        iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
        [\s\S]*?    # <<<<<< without this the regex fails
        # Session state
        iSCSI\ +Session\ +State:\s+(?P<sessionState>\w+)
        ''', re.VERBOSE|re.MULTILINE)


I tried using \s* to swallow the whitespace between the two iSCSI lines. No joy... However, [\s\S]*? allows the regex to succeed. But that seems to me to be overkill (I am not trying to skip lines of text here). Also note that I am using \ + to catch spaces between the words. On the two problem lines, using \s+ between the label words fails.
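A minimal sketch, assuming Python 3 (the thread itself uses Python 2.7), of how re.VERBOSE treats the whitespace between label words. It only illustrates the documented rule that unescaped whitespace in a verbose pattern is ignored; it does not reproduce the failure described above, and the variable names are made up for the illustration.

```python
import re

line = "iSCSI Session State: LOGGED_IN"

# Bare spaces are dropped under re.VERBOSE, so this pattern is really
# "iSCSISessionState:" and cannot match the line.
bare = re.compile(r"iSCSI Session State:", re.VERBOSE)

# An escaped space or the \s shorthand survives verbose mode.
escaped = re.compile(r"iSCSI\ +Session\ +State:", re.VERBOSE)
shorthand = re.compile(r"iSCSI\s+Session\s+State:", re.VERBOSE)

results = {
    "bare": bool(bare.search(line)),
    "escaped": bool(escaped.search(line)),
    "shorthand": bool(shorthand.search(line)),
}
print(results)
```

On a line like this one, both `\ +` and `\s+` match between the label words, so the choice between them is unlikely to be the whole story on its own.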

The regex is compiled and fed to a finditer() call... With debug prints:

for m in regex.finditer(inp):
    print 'SSSSSS %d' % len(m.groups())
    for i in range(len(m.groups())):
        print ' SSS--> %s' % (m.group(i+1))

myDetails = [ m.groupdict() for m in regex.finditer(inp)]
print 'ZZZZ myDetails %s' % myDetails
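For readers on Python 3, the same finditer()/groupdict() idea can be sketched on a cut-down pattern. The sample text, the addresses, and the two-field pattern below are illustrative assumptions, not the full iscsiadm output or the full regex from this post.

```python
import re

# Hypothetical two-target sample, trimmed to the first two lines of each block.
sample = """\
Target: iqn.example:target-a
        Current Portal: 192.0.2.10:3260,1
Target: iqn.example:target-b
        Current Portal: 192.0.2.11:3260,2
"""

pattern = re.compile(r"""
    Target:\s+(?P<iqn>\S+)\s+
    Current\s+Portal:\s*
    (?P<ipaddr>\d+\.\d+\.\d+\.\d+):(?P<port>\d+),(?P<tag>\d+)
    """, re.VERBOSE)

# One dict of named groups per match, in the order the matches occur.
details = [m.groupdict() for m in pattern.finditer(sample)]
for d in details:
    print(d)
```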


Any help would be appreciated.   Lastly, a version of this regex as a non-VERBOSE expression works as expected.. Something about re.VERBOSE... ????

Thanks.

--Eric


#47915

From	Andreas Perstinger <andipersti@gmail.com>
Date	2013-06-13 09:17 +0200
Message-ID	<mailman.3169.1371108348.3114.python-list@python.org>
In reply to	#47874

On 13.06.2013 02:59, rice.cruft@gmail.com wrote:
> I am parsing the output of an open-iscsi command that contains
> several blocks of data for each data set. Each block has the format:
[SNIP]
> I tried using \s* to swallow the whitespace between the two iSCSI
> lines. No joy... However [\s\S]*? allows the regex to succeed. But that
> seems to me to be overkill (I am not trying to skip lines of text here.)
> Also note that I am using \ + to catch spaces between the words. On the
> two problem lines, using \s+ between the label words fails.

Changing
>          # Connection state
>          iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
>          [\s\S]*?    # <<<<<< without this the regex fails
>          # Session state
>          iSCSI\ +Session\ +State:\s+(?P<sessionState>\w+)

to
         # Connection state
         iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)\s*
         # Session state
         iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)

gives me

 >>> # 'test' is the example string
 >>> myDetails = [ m.groupdict() for m in regex.finditer(test)]
 >>> print myDetails
[{'initiatorIP': '221.128.52.214', 'connState': 'LOGGED IN', 'SID': 
'154', 'ipaddr': '221.128.52.224', 'initiatorName': 
'iqn.1996-04.de.suse:01:7c9741b545b5', 'sessionState': 'LOGGED_IN', 
'iqn': 'iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007', 
'tag': '7', 'port': '3260'}]

for your example (same for the original regex).
It looks like it works (Python 2.7.3) and there is something else 
breaking the regex.
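One way to narrow down where "something else" goes wrong is to test the two state lines in isolation. The following is a minimal sketch assuming Python 3; the fragment is copied from the changed lines above, and the two input lines are taken from the example block.

```python
import re

fragment_text = (
    "                iSCSI Connection State: LOGGED IN\n"
    "                iSCSI Session State: LOGGED_IN\n"
)

# Only the connection/session part of the pattern, with \s* between the lines.
fragment = re.compile(r"""
    iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)\s*
    iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
    """, re.VERBOSE)

m = fragment.search(fragment_text)
print(m.groupdict() if m else "no match")
```

If the fragment matches on its own, the cause is more likely to be in the input data or in an earlier part of the pattern than in the whitespace between these two lines.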

Bye, Andreas


#48208

From	rice.stew@gmail.com
Date	2013-06-14 12:09 -0700
Message-ID	<7527606b-ff62-4d31-9a5e-625e3e559834@googlegroups.com>
In reply to	#47915

On Thursday, June 13, 2013 3:17:28 AM UTC-4, Andreas Perstinger wrote:
> On 13.06.2013 02:59, rice.cruft@gmail.com wrote:
> > I am parsing the output of an open-iscsi command that contains
> > several blocks of data for each data set. Each block has the format:
> [SNIP]
>
> for your example (same for the original regex).
> It looks like it works (Python 2.7.3) and there is something else
> breaking the regex.
>
> Bye, Andreas

Indeed, "there is something else breaking the regex." A note if you are trying this regex: you need more than one block of Target data to see the issues related to scanning multiple instances of the data. My regex works as expected if I leave out those two lines related to the iSCSI Connection and Session states. For now I am scratching my head...
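A multi-block test along those lines can be sketched as follows, assuming Python 3. The second block is hypothetical: it is the example block with the trailing digit, portal tag, and SID changed, so it may not exhibit whatever differs in the real data. The pattern is the \s* variant suggested earlier in the thread.

```python
import re

# Template built from the example block; {n} and {sid} are filled in below.
BLOCK = """\
Target: iqn.1992-04.com.emc:vplex-000000008460319f-000000000000000{n}
        Current Portal: 221.128.52.224:3260,{n}
        Persistent Portal: 221.128.52.224:3260,{n}
                **********
                Interface:
                **********
                Iface Name: default
                Iface Transport: tcp
                Iface Initiatorname: iqn.1996-04.de.suse:01:7c9741b545b5
                Iface IPaddress: 221.128.52.214
                Iface HWaddress: <empty>
                Iface Netdev: <empty>
                SID: {sid}
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
                Internal iscsid Session State: NO CHANGE
"""
inp = BLOCK.format(n=7, sid=154) + BLOCK.format(n=8, sid=155)

regex = re.compile(r"""
    Target:\s+(?P<iqn>\S+)\s*
    \s+Current\sPortal:\s*
    (?P<ipaddr>\w+\.\w+\.\w+\.\w+):(?P<port>\d+),(?P<tag>\d+)
    [\s\S]*?                      # skip lines
    Iface\s+Initiatorname:\s+(?P<initiatorName>\S+)\s*
    Iface\s+IPaddress:\s+(?P<initiatorIP>\S+)
    [\s\S]*?                      # skip lines
    SID:\s+(?P<SID>\d+)\s*
    iSCSI\s+Connection\s+State:\s+(?P<connState>\w+\s*\w*)\s*
    iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)
    """, re.VERBOSE | re.MULTILINE)

details = [m.groupdict() for m in regex.finditer(inp)]
for d in details:
    print(d["SID"], d["tag"], d["connState"], d["sessionState"])
```

With two structurally identical blocks the pattern finds both, so a failure on real output would point at a block whose lines differ from the example.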


#47977

From	Kevin LaTona <lists@studiosola.com>
Date	2013-06-13 07:42 -0700
Message-ID	<mailman.3206.1371136676.3114.python-list@python.org>
In reply to	#47874

On Jun 12, 2013, at 5:59 PM, rice.cruft@gmail.com wrote:


> I am parsing the output of an open-iscsi command that contains several blocks of data for each data set. Each block has the format:
>  Lastly, a version of this regex as a non-VERBOSE expression works as expected.. Something about re.VERBOSE... ????
Snip


With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.

I would say the regex could still use some more adjustments yet.

-Kevin





import re

inp ="""
Target: iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007
       Current Portal: 221.128.52.224:3260,7
       Persistent Portal: 221.128.52.224:3260,7
               **********
               Interface:
               **********
               Iface Name: default
               Iface Transport: tcp
               Iface Initiatorname: iqn.1996-04.de.suse:01:7c9741b545b5
               Iface IPaddress: 221.128.52.214
               Iface HWaddress: <empty>
               Iface Netdev: <empty>
               SID: 154
               iSCSI Connection State: LOGGED IN
               iSCSI Session State: LOGGED_IN
               Internal iscsid Session State: NO CHANGE
"""

regex = re.compile( r'''
      # Target name, iqn
      Target:\s+(?P<iqn>\S+)\s*
      # Target portal
      \s+Current\sPortal:\s*
      (?P<ipaddr>\w+\.\w+\.\w+\.\w+):(?P<port>\d+),(?P<tag>\d+)
      # skip lines...
      [\s\S]*?
      # Initiator name, iqn
      Iface\s+Initiatorname:\s+(?P<initiatorName>\S+)\s*
      # Initiator port, IP address
      Iface\s+IPaddress:\s+(?P<initiatorIP>\S+)
      # skip lines...
      [\s\S]*?
      # Session ID
      SID:\s+(?P<SID>\d+)\s*
      # Connection state
      iSCSI\ +Connection\ +State:\s+(?P<connState>\w+\s*\w*)
      [\s\S]*?
      # Session state iSCSI
      iSCSI\s+Session\s+State:\s+(?P<sessionState>\w+)\s*
      # Session state Internal
      Internal\s+iscsid\s+Session\s+State:.*\s+(?P<ss2>\w+\s\w+)
       ''', re.VERBOSE|re.MULTILINE)

myDetails = [ m.groupdict() for m in regex.finditer(inp)][0]
for k,v in myDetails.iteritems():
   print k,v




#*************

If you want just the values back in the order parsed this will work for now.


for match in regex.findall(inp):
    for item in match:
        print item
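For reference, a small Python 3 sketch of what findall() returns: with several groups it yields one tuple per match, in group order, and with a single group it yields plain strings. The two-line input is a made-up illustration.

```python
import re

text = "SID: 154\nSID: 155\n"

# Two groups: each match becomes a tuple of the group values, in order.
pairs = re.findall(r"(?P<label>\w+):\s+(?P<value>\d+)", text)

# One group: each match becomes just that group's string.
sids = re.findall(r"SID:\s+(\d+)", text)

print(pairs)
print(sids)
```

Named groups lose their names here; groupdict() on finditer() matches keeps them.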


#48078

From	Kevin LaTona <lists@studiosola.com>
Date	2013-06-13 14:07 -0700
Message-ID	<mailman.3262.1371198031.3114.python-list@python.org>
In reply to	#47874

On Jun 13, 2013, at 7:42 AM, Kevin LaTona <lists@studiosola.com> wrote:
> With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.


Sorry had a small bleep while writing that last line this AM.

Of course the regex pattern would work in VERBOSE mode as that was how it was presented.

Without VERBOSE, each line of the pattern would have needed to be enclosed in single or double quote marks.


http://docs.python.org/2/library/re.html#re.VERBOSE
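The quoting change can be sketched like this, assuming Python 3 and a shortened two-field pattern rather than the full one. Adjacent string literals inside the parentheses are concatenated by Python, so the non-verbose form is one pattern with the comments and layout whitespace removed.

```python
import re

# Verbose form: layout whitespace and "#" comments are ignored.
verbose = re.compile(r"""
    SID:\s+(?P<SID>\d+)\s*          # session ID
    iSCSI\ +Connection\ +State:\s+  # label with escaped spaces
    (?P<connState>\w+\s*\w*)
    """, re.VERBOSE)

# Non-verbose form: each piece is its own quoted literal, and literal
# spaces no longer need escaping.
plain = re.compile(
    r"SID:\s+(?P<SID>\d+)\s*"
    r"iSCSI +Connection +State:\s+"
    r"(?P<connState>\w+\s*\w*)"
)

text = "SID: 154\n    iSCSI Connection State: LOGGED IN\n"
verbose_groups = verbose.search(text).groupdict()
plain_groups = plain.search(text).groupdict()
print(verbose_groups == plain_groups, plain_groups)
```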


-Kevin


#48209

From	rice.stew@gmail.com
Date	2013-06-14 12:14 -0700
Message-ID	<cff4aa50-31db-49c9-a7a9-86f0c1136f8c@googlegroups.com>
In reply to	#48078

On Thursday, June 13, 2013 5:07:33 PM UTC-4, Kevin LaTona wrote:
> On Jun 13, 2013, at 7:42 AM, Kevin LaTona <lists@studiosola.com> wrote:
> > With the following code tweaks in Python 2.7.2, I find it works with VERBOSE for me, but not without.
>
> Sorry had a small bleep while writing that last line this AM.
>
> Of course the regex pattern would work in VERBOSE mode as that was how it was presented.
>
> Without VERBOSE, each line of the pattern would have needed to be enclosed in single or double quote marks.
>
> http://docs.python.org/2/library/re.html#re.VERBOSE
>
> -Kevin

Yes. I tested with and without re.VERBOSE along with the required quoting changes. For both cases the oddities persist. Why [\s\S]+ is necessary between the two iSCSI Connection/Session lines is a mystery -- \s+? or similar should be sufficient to swallow the whitespace.
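As a way to sidestep the question while it stays unresolved, the output could also be parsed line by line instead of with one large pattern. This is only a sketch under assumptions: it uses Python 3, the hypothetical helper `parse_sessions` is not from the thread, the sample is abbreviated with an invented second target, and it assumes every field of interest sits on a "Label: value" line as in the example block.

```python
def parse_sessions(text):
    """Collect 'Label: value' lines into one dict per 'Target:' block."""
    sessions = []
    current = None
    for raw in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # rows without a colon, such as "**********"
        key, value = key.strip(), value.strip()
        if key == "Target":
            current = {}
            sessions.append(current)
        if current is not None:
            current[key] = value
    return sessions


sample = """\
Target: iqn.1992-04.com.emc:vplex-000000008460319f-0000000000000007
        Current Portal: 221.128.52.224:3260,7
                **********
                SID: 154
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
Target: iqn.example:hypothetical-second-target
        Current Portal: 221.128.52.224:3260,8
                SID: 155
                iSCSI Connection State: LOGGED IN
                iSCSI Session State: LOGGED_IN
"""

sessions = parse_sessions(sample)
for s in sessions:
    print(s["Target"], s["SID"], s["iSCSI Connection State"])
```

Because each block starts a fresh dict, a missing or oddly formatted line in one block cannot pull in values from the next one.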

--Eric

