How do you print a string after it's been searched for an RE?

Started by: John Salerno <johnjsal@gmail.com>
First post: 2011-06-23 12:58 -0700
Last post: 2011-06-23 15:02 -0700
Articles: 5 (3 participants)


Contents

  How do you print a string after it's been searched for an RE? John Salerno <johnjsal@gmail.com> - 2011-06-23 12:58 -0700
    Re: How do you print a string after it's been searched for an RE? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-23 14:47 -0600
      Re: How do you print a string after it's been searched for an RE? John Salerno <johnjsal@gmail.com> - 2011-06-23 14:14 -0700
    Re: How do you print a string after it's been searched for an  RE? "Thomas L. Shinnick" <tshinnic@prismnet.com> - 2011-06-23 16:47 -0500
      Re: How do you print a string after it's been searched for an RE? John Salerno <johnjsal@gmail.com> - 2011-06-23 15:02 -0700

#8325 — How do you print a string after it's been searched for an RE?

From: John Salerno <johnjsal@gmail.com>
Date: 2011-06-23 12:58 -0700
Subject: How do you print a string after it's been searched for an RE?
Message-ID: <c4486911-f2cb-4e0e-a93c-3d6ff7e302e9@16g2000yqy.googlegroups.com>

After I've run the re.search function on a string and no match was
found, how can I access that string? When I try to print it directly,
it's an empty string, I assume because it has been "consumed." How do
I prevent this?

It seems to work fine for this 2.x code:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

But when I try it with my own code (3.2), it won't print the text of
the page:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

P.S. I plan to clean up my code; I know it's not great right now. My
immediate goal is just to figure out why the 2.x code can print "text"
but my own code can't print "page", when the two are basically the same
thing. Has something significant changed in the urllib.request module or
in the way the response is decoded, or is it just an RE issue?

Thanks.


#8326

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: 2011-06-23 14:47 -0600
Message-ID: <mailman.340.1308862058.1164.python-list@python.org>
In reply to: #8325

On Thu, Jun 23, 2011 at 1:58 PM, John Salerno <johnjsal@gmail.com> wrote:
> After I've run the re.search function on a string and no match was
> found, how can I access that string? When I try to print it directly,
> it's an empty string, I assume because it has been "consumed." How do
> I prevent this?

This has nothing to do with regular expressions. It would appear that
page.read() is letting you read the response body multiple times in
2.x but not in 3.x, probably due to a change in buffering.  Just store
the string in a variable and avoid calling page.read() multiple times.
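
A minimal sketch of that advice. To keep it runnable without a network, an
io.BytesIO object stands in for the response returned by
urllib.request.urlopen(); the stand-in and the sample page body are
assumptions for illustration, not part of the original code. The point is
only that the body is read and decoded once, and the resulting variable is
used for both the search and the print.

```python
import io
import re

pattern = re.compile(r'[0-9]+')

# Stand-in for the object returned by urllib.request.urlopen(); this is an
# assumption so the example runs offline.
page = io.BytesIO(b'no digits on this page')

# Read and decode once, then reuse the variable.
text = page.read().decode()

match_obj = pattern.search(text)
if match_obj:
    result = match_obj.group()
else:
    result = text  # still available, because it was stored

print(result)

# A second read() on the same stream has nothing left to return.
leftover = page.read()
print(repr(leftover))
```

The second read() returning an empty bytes object is the "consumed" effect
described in the question: the stream is at its end, and the regular
expression never touched the data.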


#8328

From: John Salerno <johnjsal@gmail.com>
Date: 2011-06-23 14:14 -0700
Message-ID: <fdab8e20-9c2a-443f-acc2-e509c6c1cad3@b21g2000yqc.googlegroups.com>
In reply to: #8326

On Jun 23, 3:47 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Thu, Jun 23, 2011 at 1:58 PM, John Salerno <johnj...@gmail.com> wrote:
> > After I've run the re.search function on a string and no match was
> > found, how can I access that string? When I try to print it directly,
> > it's an empty string, I assume because it has been "consumed." How do
> > I prevent this?
>
> This has nothing to do with regular expressions. It would appear that
> page.read() is letting you read the response body multiple times in
> 2.x but not in 3.x, probably due to a change in buffering.  Just store
> the string in a variable and avoid calling page.read() multiple times.

Thank you. That worked, and as a result I think my code will look
cleaner.


#8329 — Re: How do you print a string after it's been searched for an RE?

From: "Thomas L. Shinnick" <tshinnic@prismnet.com>
Date: 2011-06-23 16:47 -0500
Subject: Re: How do you print a string after it's been searched for an RE?
Message-ID: <mailman.342.1308865647.1164.python-list@python.org>
In reply to: #8325

There is also
       print(match_obj.string)
which gives you the string that was passed to search().  See end of section
6.2.5. Match Objects
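
A short sketch of the .string attribute, assuming the search succeeds; the
sample text below is made up for illustration. The attribute only exists on
a match object, so this covers the matching case.

```python
import re

pattern = re.compile(r'[0-9]+')

# Made-up sample input, not taken from the real page.
text = 'and the next nothing is 44827'

match_obj = pattern.search(text)

# .string is the string that was passed to search().
searched = match_obj.string
print(searched)

# .group() is only the part that matched.
print(match_obj.group())
```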



#8332

From: John Salerno <johnjsal@gmail.com>
Date: 2011-06-23 15:02 -0700
Message-ID: <dc93c082-dc29-4d91-b49c-3493d4325d9a@m22g2000yqh.googlegroups.com>
In reply to: #8329

On Jun 23, 4:47 pm, "Thomas L. Shinnick" <tshin...@prismnet.com>
wrote:
> There is also
>        print(match_obj.string)
> which gives you a copy of the string searched.  See end of section
> 6.2.5. Match Objects

I tried that, but the only time I wanted the string printed was when
there *wasn't* a match, so search() had returned None and there was no
match object to ask.
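
To illustrate the no-match case, here is a small sketch with made-up sample
text. When search() finds nothing it returns None, and None has no .string
attribute, so the searched text has to come from a variable that was kept
separately.

```python
import re

pattern = re.compile(r'[0-9]+')

# Made-up sample input containing no digits.
text = 'no digits here'

match_obj = pattern.search(text)
print(match_obj is None)

# match_obj.string would raise AttributeError here, so fall back to the
# variable that still holds the searched text.
if match_obj is not None:
    shown = match_obj.string
else:
    shown = text

print(shown)
```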
