

Group: comp.lang.python

Regex: Perl to Python

Started by: Fillmore <fillmore_remove@hotmail.com>
First post: 2016-03-06 23:38 -0500
Last post: 2016-03-07 07:50 -0500
Articles: 7 (5 participants)



Contents

  Regex: Perl to Python Fillmore <fillmore_remove@hotmail.com> - 2016-03-06 23:38 -0500
    Re: Regex: Perl to Python Chris Angelico <rosuav@gmail.com> - 2016-03-07 15:45 +1100
    Re: Regex: Perl to Python Terry Reedy <tjreedy@udel.edu> - 2016-03-06 23:48 -0500
      Re: Regex: Perl to Python Rustom Mody <rustompmody@gmail.com> - 2016-03-06 21:53 -0800
    Re: Regex: Perl to Python Peter Otten <__peter__@web.de> - 2016-03-07 08:48 +0100
    Re: Regex: Perl to Python Fillmore <fillmore_remove@hotmail.com> - 2016-03-07 07:50 -0500

#104183 — Regex: Perl to Python

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-06 23:38 -0500
Subject: Regex: Perl to Python
Message-ID: <nbj0k0$12gk$1@gioia.aioe.org>
Hi, I'm trying to move away from Perl and go to Python.
Regex seems to be the hardest challenge so far.

Perl:

while (<HEADERFILE>) {
     if (/(\d+)\t(.+)$/) {
	print $1." - ". $2."\n";
     }
}

into python

pattern = re.compile(r"(\d+)\t(.+)$")
with open(fields_Indexfile,mode="rt",encoding='utf-8') as headerfile:
     for line in headerfile:
         #sys.stdout.write(line)
         m = pattern.match(line)
         print(m.group(0))
     headerfile.close()

but I must be getting something fundamentally wrong because:

Traceback (most recent call last):
   File "./slicer.py", line 30, in <module>
     print(m.group(0))
AttributeError: 'NoneType' object has no attribute 'group'


  why is 'm' a None?

the input data has this format:

          :
      3	prop1
      4	prop2
      5	prop3

Thanks



#104184

From: Chris Angelico <rosuav@gmail.com>
Date: 2016-03-07 15:45 +1100
Message-ID: <mailman.6.1457325934.10335.python-list@python.org>
In reply to: #104183
On Mon, Mar 7, 2016 at 3:38 PM, Fillmore <fillmore_remove@hotmail.com> wrote:
> pattern = re.compile(r"(\d+)\t(.+)$")
> with open(fields_Indexfile,mode="rt",encoding='utf-8') as headerfile:
>     for line in headerfile:
>         #sys.stdout.write(line)
>         m = pattern.match(line)
>         print(m.group(0))
>     headerfile.close()
>
> but I must be getting something fundamentally wrong because:
>
> Traceback (most recent call last):
>   File "./slicer.py", line 30, in <module>
>     print(m.group(0))
> AttributeError: 'NoneType' object has no attribute 'group'
>
>
>  why is 'm' a None?

When the regex doesn't match, Python gives you back None instead of a
match object. Your Perl code has an 'if' to guard that; you can do the
same thing:

if m:
    print(m.group(0))
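
A minimal sketch of that guard inside a loop, assuming the lines come from an in-memory list rather than the real file. The sample lines and the helper name matched_text are made up for illustration:

```python
import re

pattern = re.compile(r"(\d+)\t(.+)$")

def matched_text(lines):
    """Return group(0) for every line the pattern matches, skipping the rest."""
    results = []
    for line in lines:
        m = pattern.match(line)
        if m:  # m is None when the line does not match
            results.append(m.group(0))
    return results

# Hypothetical sample lines, not taken from the real file.
sample = ["header without digits\n", "4\tprop2\n", "\n"]
results = matched_text(sample)
print(results)
```

Lines that do not match are skipped silently instead of raising AttributeError.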

ChrisA



#104185

From: Terry Reedy <tjreedy@udel.edu>
Date: 2016-03-06 23:48 -0500
Message-ID: <mailman.7.1457326155.10335.python-list@python.org>
In reply to: #104183
On 3/6/2016 11:38 PM, Fillmore wrote:
>
> Hi, I'm trying to move away from Perl and go to Python.
> Regex seems to be the hardest challenge so far.
>
> Perl:
>
> while (<HEADERFILE>) {
>      if (/(\d+)\t(.+)$/) {
>      print $1." - ". $2."\n";
>      }
> }
>
> into python
>
> pattern = re.compile(r"(\d+)\t(.+)$")
> with open(fields_Indexfile,mode="rt",encoding='utf-8') as headerfile:
>      for line in headerfile:
>          #sys.stdout.write(line)
>          m = pattern.match(line)
>          print(m.group(0))
>      headerfile.close()

Delete this line.  Files opened in a with statement are closed
automatically when the block exits.  That is one of the main reasons
the with statement exists.
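
A small sketch that shows the automatic close, assuming a throwaway file created in a temporary directory. The file name fields.txt and its contents are made up for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fields.txt")  # hypothetical file name
    with open(path, mode="wt", encoding="utf-8") as out:
        out.write("3\tprop1\n")

    with open(path, mode="rt", encoding="utf-8") as headerfile:
        lines = list(headerfile)
        closed_inside = headerfile.closed  # still open inside the block

    closed_after = headerfile.closed  # closed once the block has exited

print(closed_inside, closed_after)
```

No explicit close() call is needed; the file object reports closed as soon as the with block is left.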

> but I must be getting something fundamentally wrong because:
>
> Traceback (most recent call last):
>    File "./slicer.py", line 30, in <module>
>      print(m.group(0))
> AttributeError: 'NoneType' object has no attribute 'group'
>
>
>   why is 'm' a None?

Python has a wonderful interactive help facility.  Learn to use it.

 >>> import re
 >>> help(re.match)
Help on function match in module re:

match(pattern, string, flags=0)
     Try to apply the pattern at the start of the string, returning
     a match object, or None if no match was found.
 >>>

Add 'if m is not None:' before accessing m.group.

-- 
Terry Jan Reedy



#104186

From: Rustom Mody <rustompmody@gmail.com>
Date: 2016-03-06 21:53 -0800
Message-ID: <df11597e-a96c-4201-93d7-18caa8f91476@googlegroups.com>
In reply to: #104185
Also for regex hacking:
Try with findall before using match/search
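
One way to read this suggestion, as a sketch: run findall over a chunk of sample text to see what the pattern captures before writing the loop. The sample string imitates the data format from the first post, and re.MULTILINE is my addition so that $ matches at the end of every line:

```python
import re

# Hypothetical sample imitating the posted data format
# (leading spaces, then a number, a tab and a value).
sample = "      :\n      3\tprop1\n      4\tprop2\n      5\tprop3\n"

# With two groups in the pattern, findall returns a list of tuples.
pairs = re.findall(r"(\d+)\t(.+)$", sample, flags=re.MULTILINE)
print(pairs)
```

Seeing the list of captured tuples in one go makes it easier to check the pattern itself, separately from the match/search question.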



#104191

From: Peter Otten <__peter__@web.de>
Date: 2016-03-07 08:48 +0100
Message-ID: <mailman.8.1457336938.10335.python-list@python.org>
In reply to: #104183
Fillmore wrote:

> 
> Hi, I'm trying to move away from Perl and go to Python.
> Regex seems to be the hardest challenge so far.
> 
> Perl:
> 
> while (<HEADERFILE>) {
>      if (/(\d+)\t(.+)$/) {
> print $1." - ". $2."\n";
>      }
> }
> 
> into python
> 
> pattern = re.compile(r"(\d+)\t(.+)$")
> with open(fields_Indexfile,mode="rt",encoding='utf-8') as headerfile:
>      for line in headerfile:
>          #sys.stdout.write(line)
>          m = pattern.match(line)
>          print(m.group(0))
>      headerfile.close()
> 
> but I must be getting something fundamentally wrong because:
> 
> Traceback (most recent call last):
>    File "./slicer.py", line 30, in <module>
>      print(m.group(0))
> AttributeError: 'NoneType' object has no attribute 'group'
> 
> 
>   why is 'm' a None?

match() only matches at the beginning of the string, and your lines
start with whitespace, so use search():

match = pattern.search(line)
if match is not None:
    print(match.group(1), "-", match.group(2))
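
To see the difference side by side, here is a short sketch using one line in the posted format. The sample line is an assumption based on the data shown in the question:

```python
import re

pattern = re.compile(r"(\d+)\t(.+)$")

# Hypothetical line: leading spaces, a number, a tab, a value.
line = "      3\tprop1\n"

anchored = pattern.match(line)  # None: the line does not start with a digit
found = pattern.search(line)    # scans forward until the pattern fits

formatted = "{} - {}".format(found.group(1), found.group(2))
print(anchored, formatted)
```

Perl's m// behaves like search(), which is why the original Perl code found these lines.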

Also, in Python you can often use string methods instead of regular
expressions:

index, tab, value = line.strip().partition("\t")
if tab and index.isdigit():
    print(index, "-", value)
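
The same idea wrapped in a function and run over a few sample lines, as a sketch. The helper name split_fields and the sample lines are made up for illustration:

```python
def split_fields(lines):
    """Collect (index, value) pairs from lines shaped like '<digits><TAB><value>'."""
    pairs = []
    for line in lines:
        index, tab, value = line.strip().partition("\t")
        # tab is "" when the line has no tab at all
        if tab and index.isdigit():
            pairs.append((index, value))
    return pairs

# Hypothetical sample lines in the posted format, plus one odd line.
sample = ["      :\n", "      3\tprop1\n", "      4\tprop2\n", "abc 5\tprop3\n"]
pairs = split_fields(sample)
print(pairs)
```

Note that this is slightly stricter than the regex: the last sample line is skipped here because everything before the tab must be digits, whereas pattern.search() would still find "5" followed by a tab in it.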

> the input data has this format:
> 
>           :
>       3	prop1
>       4	prop2
>       5	prop3
> 



#104213

From: Fillmore <fillmore_remove@hotmail.com>
Date: 2016-03-07 07:50 -0500
Message-ID: <nbjtdv$q48$1@gioia.aioe.org>
In reply to: #104183
Big thank you to everyone who offered their help!



