Getting a value that follows string.find()

Started by	englishkevin110@gmail.com
First post	2013-08-13 15:51 -0700
Last post	2013-08-14 15:58 +0000
Articles	8 — 5 participants

  Getting a value that follows string.find() englishkevin110@gmail.com - 2013-08-13 15:51 -0700
    Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 18:58 -0400
      Re: Getting a value that follows string.find() englishkevin110@gmail.com - 2013-08-13 16:03 -0700
        Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 19:18 -0400
        Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 19:40 -0400
        Re: Getting a value that follows string.find() Steven D'Aprano <steve@pearwood.info> - 2013-08-14 06:29 +0000
    Re: Getting a value that follows string.find() Dave Angel <davea@davea.name> - 2013-08-14 01:31 +0000
    Re: Getting a value that follows string.find() John Gordon <gordon@panix.com> - 2013-08-14 15:58 +0000

#52474 — Getting a value that follows string.find()

From	englishkevin110@gmail.com
Date	2013-08-13 15:51 -0700
Subject	Getting a value that follows string.find()
Message-ID	<40816fed-38d4-4baa-92cc-c80cd8febd82@googlegroups.com>

I know the title doesn't make much sense, but I didn't know how to explain my problem.

Anywho, I've opened a page's source in URLLIB
starturlsource = starturlopen.read()
string.find(starturlsource, '<a href="/profile.php?id=')
And I used string.find to find a specific area in the page's source.
I want to store what comes after ?id= in a variable.
Can someone help me with this?


#52475

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-08-13 18:58 -0400
Message-ID	<mailman.548.1376434691.1251.python-list@python.org>
In reply to	#52474

Look up urlparse for your answer.
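
A minimal sketch of what the urlparse suggestion might look like once the href value itself is in hand. It assumes Python 3, where the module is urllib.parse (in Python 2 it was called urlparse). The sample href and the id value are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical href value, as it might appear in the page source.
href = "/profile.php?id=12345&tab=info"

# urlparse splits the URL into components; .query is the part after "?".
query = urlparse(href).query

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(query)
profile_id = params["id"][0]

print(profile_id)
```

Note that this only handles the URL; locating the href inside the page source is a separate step.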

On Tue, Aug 13, 2013 at 6:51 PM,  <englishkevin110@gmail.com> wrote:
> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in URLLIB
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?



-- 
Joel Goldstick
http://joelgoldstick.com


#52476

From	englishkevin110@gmail.com
Date	2013-08-13 16:03 -0700
Message-ID	<5b73c6fe-a282-4d28-ab29-2e1dfdd09290@googlegroups.com>
In reply to	#52475

On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
> Look up urlparse for your answer.

I don't want to do any kind of HTML parsing.


#52479

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-08-13 19:18 -0400
Message-ID	<mailman.551.1376435925.1251.python-list@python.org>
In reply to	#52476

On Tue, Aug 13, 2013 at 7:03 PM,  <englishkevin110@gmail.com> wrote:
> I don't want to do any kind of HTML parsing.

Aside from the fact that I really want a pony, and you seem to want
your work done for you, look here:

http://stackoverflow.com/questions/11600681/parse-query-part-from-url



-- 
Joel Goldstick
http://joelgoldstick.com


#52481

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-08-13 19:40 -0400
Message-ID	<mailman.553.1376437227.1251.python-list@python.org>
In reply to	#52476

On Tue, Aug 13, 2013 at 7:18 PM, Joel Goldstick
<joel.goldstick@gmail.com> wrote:
> Aside from the fact that I really want a pony, and you seem to want
> your work done for you, look here:
>
> http://stackoverflow.com/questions/11600681/parse-query-part-from-url

I may have been too quick in my reading of your question.  You wanted
to get the value of the parameter, but also to find the URL in the
page.  You want to do this without parsing, if I understand you.  The
good news is that there is a module called Beautiful Soup that will do
the parsing for you.  The tutorial is way better than excellent, and you
will be up and running in less than half an hour from downloading the
module.

http://www.crummy.com/software/BeautifulSoup/bs4/doc/
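
Beautiful Soup is a third-party package. As a rough stand-in that needs no install, the same idea (let a parser find the links, then read the query string) can be sketched with the standard library's html.parser instead. This is not Beautiful Soup's API; the class name ProfileIdCollector and the page fragment are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ProfileIdCollector(HTMLParser):
    """Collect the id parameter of every <a href="/profile.php?..."> link."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href") or ""
        parts = urlparse(href)
        if parts.path == "/profile.php":
            self.ids.extend(parse_qs(parts.query).get("id", []))


# Made-up page fragment for illustration.
page = '<p><a href="/profile.php?id=42">Ann</a> <a href="/about.php">About</a></p>'

collector = ProfileIdCollector()
collector.feed(page)
print(collector.ids)
```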

-- 
Joel Goldstick
http://joelgoldstick.com


#52497

From	Steven D'Aprano <steve@pearwood.info>
Date	2013-08-14 06:29 +0000
Message-ID	<520b23d3$0$29885$c3e8da3$5496439d@news.astraweb.com>
In reply to	#52476

On Tue, 13 Aug 2013 16:03:46 -0700, englishkevin110 wrote:

> On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
[fixing Joel's top-posting]

>> On Tue, Aug 13, 2013 at 6:51 PM,  <> wrote:
>> 
>> > I know the title doesn't make much sense, but I didn't know how to
>> > explain my problem.
>> 
>> 
>> >
>> > Anywho, I've opened a page's source in URLLIB
>> 
>> > starturlsource = starturlopen.read()
>> 
>> > string.find(starturlsource, '<a href="/profile.php?id=')
>> 
>> > And I used string.find to find a specific area in the page's source.
>> 
>> > I want to store what comes after ?id= in a variable.
>> 
>> > Can someone help me with this?

>> Look up urlparse for your answer.

> I don't want to do any kind of HTML parsing.

What you are doing *is* HTML parsing, or at least a half-baked, fragile, 
likely to go wrong form of parsing.

But if you insist, the algorithm is simple: after calling find(), you 
have the offset to the search string. You know the length of the search 
string. Therefore you can calculate the index of the first character that 
follows the search string:

text = "blah blah blah blah spam spam... blah blah blah blah..."
needle = "spam spam"  # what we search for

i = text.find(needle)
if i == -1:
    print("not found")
else:
    print(text[i+len(needle):])

Of course, the problem is, you need to know not just the *start* offset 
of the bit that follows, but the *ending* offset as well. Which brings 
you into the realm of half-arsed parsing.
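
One way the ending offset might be handled without a real parser is a second find() that starts where the first match ended. This is only a sketch: it assumes the value is terminated by the next double quote, and the helper name value_after and the sample page are made up for illustration.

```python
def value_after(text, needle, terminator='"'):
    """Return the text between needle and the next terminator, or None."""
    start = text.find(needle)
    if start == -1:
        return None          # needle not present at all
    start += len(needle)     # first character after the needle

    # Search for the terminator from that position onward.
    end = text.find(terminator, start)
    if end == -1:
        return None          # no terminator, so the value is unbounded
    return text[start:end]


# Made-up page fragment for illustration.
page = 'x <a href="/profile.php?id=98765">Kevin</a> y'
print(value_after(page, '<a href="/profile.php?id='))
```

It stays fragile in the way described above: single-quoted attributes, extra whitespace, or a different attribute order would all defeat it.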

-- 
Steven


#52487

From	Dave Angel <davea@davea.name>
Date	2013-08-14 01:31 +0000
Message-ID	<mailman.557.1376443939.1251.python-list@python.org>
In reply to	#52474

englishkevin110@gmail.com wrote:

> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in URLLIB
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?

Python 3.3.0 (default, Mar  7 2013, 00:24:38) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> help(string.find)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'find'

There is no find function in the string module [1].  But assuming
starturlsource is a str, you could do:

pattern =  '<a href="/profile.php?id='
index = starturlsource.find( pattern )

index will then be -1 if there's no match, or have a non-negative value
if a match is found.

In the latter case, you can extract the next 17 characters with

newstr = starturlsource[index+len(pattern):index+len(pattern)+17]

You are of course making several assumptions about the web page, which
are perfectly reasonable since it's a page under your control.  Or is
it?

[1]  Assuming Python 3.3 since you omitted stating the version you're
using.  But even in Python 2.7, using the string.find function is
deprecated in favor of the str method.
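
A caveat on the slice above: if find() returns -1, the slice still runs without error and quietly returns unrelated text, so the result has to be checked first. As a sketch of an alternative that avoids both the -1 case and the fixed width of 17, str.partition can split on the pattern and then on the closing quote. The sample source string is made up, and ending the value at the next double quote is an assumption about the page.

```python
pattern = '<a href="/profile.php?id='

# Made-up page fragment for illustration.
source = '<li><a href="/profile.php?id=abc123">profile</a></li>'

# partition returns (before, separator, after); separator is "" on no match.
_, found, rest = source.partition(pattern)

if found:
    # Keep everything up to (not including) the closing double quote.
    newstr, _, _ = rest.partition('"')
else:
    newstr = None

print(newstr)
```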

-- 
DaveA


#52523

From	John Gordon <gordon@panix.com>
Date	2013-08-14 15:58 +0000
Message-ID	<kug9ea$267$1@reader1.panix.com>
In reply to	#52474

In <40816fed-38d4-4baa-92cc-c80cd8febd82@googlegroups.com> englishkevin110@gmail.com writes:

> I know the title doesn't make much sense, but I didn't know how to explain my problem.

> Anywho, I've opened a page's source in URLLIB
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?

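# Note: in Python 3, urlopen(...).read() returns bytes, so decode it first.
starturlsource = starturlopen.read()

match_string = '<a href="/profile.php?id='

# Use the str method: string.find() is deprecated in Python 2 and
# absent in Python 3.
match_index = starturlsource.find(match_string)

if match_index != -1:
    # Everything after the match, through the end of the page source.
    url = starturlsource[match_index + len(match_string):]
else:
    print('not found')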
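
The slice above keeps everything through the end of the page. Nobody in the thread suggested it, but a regular expression is one way to stop at the end of the value. The sketch below assumes the id ends at the next double quote or ampersand; the sample source string and the name PROFILE_ID are made up for illustration.

```python
import re

# Capture the characters after "?id=" up to the next quote or "&".
PROFILE_ID = re.compile(r'<a href="/profile\.php\?id=([^"&]+)')

# Made-up page fragment for illustration.
source = '<a href="/profile.php?id=777&ref=home">me</a>'

match = PROFILE_ID.search(source)
profile_id = match.group(1) if match else None

print(profile_id)
```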

-- 
John Gordon                   A is for Amy, who fell down the stairs
gordon@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"

