Groups > comp.lang.python > #52474 > unrolled thread
| Started by | englishkevin110@gmail.com |
|---|---|
| First post | 2013-08-13 15:51 -0700 |
| Last post | 2013-08-14 15:58 +0000 |
| Articles | 8 — 5 participants |
Getting a value that follows string.find() englishkevin110@gmail.com - 2013-08-13 15:51 -0700
Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 18:58 -0400
Re: Getting a value that follows string.find() englishkevin110@gmail.com - 2013-08-13 16:03 -0700
Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 19:18 -0400
Re: Getting a value that follows string.find() Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-13 19:40 -0400
Re: Getting a value that follows string.find() Steven D'Aprano <steve@pearwood.info> - 2013-08-14 06:29 +0000
Re: Getting a value that follows string.find() Dave Angel <davea@davea.name> - 2013-08-14 01:31 +0000
Re: Getting a value that follows string.find() John Gordon <gordon@panix.com> - 2013-08-14 15:58 +0000
| From | englishkevin110@gmail.com |
|---|---|
| Date | 2013-08-13 15:51 -0700 |
| Subject | Getting a value that follows string.find() |
| Message-ID | <40816fed-38d4-4baa-92cc-c80cd8febd82@googlegroups.com> |
I know the title doesn't make much sense, but I didn't know how to explain my problem.

Anywho, I've opened a page's source in urllib
starturlsource = starturlopen.read()
string.find(starturlsource, '<a href="/profile.php?id=')
And I used string.find to find a specific area in the page's source.
I want to store what comes after ?id= in a variable.
Can someone help me with this?
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-08-13 18:58 -0400 |
| Message-ID | <mailman.548.1376434691.1251.python-list@python.org> |
| In reply to | #52474 |
Look up urlparse for your answer.

On Tue, Aug 13, 2013 at 6:51 PM, <englishkevin110@gmail.com> wrote:
> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in urllib
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?
> --
> http://mail.python.org/mailman/listinfo/python-list

--
Joel Goldstick
http://joelgoldstick.com
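For readers who want to see what the urlparse suggestion looks like in practice, here is a minimal sketch. It assumes Python 3, where the functions live in urllib.parse (in Python 2 the module is named urlparse). The helper name profile_id and the sample value 12345 are made up for illustration and do not come from the thread. Note that this only handles a URL you have already extracted; it does not locate the link inside the page.

```python
from urllib.parse import urlparse, parse_qs  # Python 2: from urlparse import ...


def profile_id(href):
    """Return the value of the `id` query parameter, or None if it is absent.

    `profile_id` is a hypothetical helper name used only for this sketch.
    """
    query = urlparse(href).query          # e.g. "id=12345"
    values = parse_qs(query).get("id")    # parse_qs maps each name to a list
    return values[0] if values else None


# The id value below is an invented sample.
print(profile_id("/profile.php?id=12345"))
```

parse_qs returns a list per parameter because a query string may repeat a name, so the sketch takes the first value.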
| From | englishkevin110@gmail.com |
|---|---|
| Date | 2013-08-13 16:03 -0700 |
| Message-ID | <5b73c6fe-a282-4d28-ab29-2e1dfdd09290@googlegroups.com> |
| In reply to | #52475 |
On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
> Look up urlparse for your answer.
>
> On Tue, Aug 13, 2013 at 6:51 PM, <> wrote:
> > I know the title doesn't make much sense, but I didn't know how to explain my problem.
> >
> > Anywho, I've opened a page's source in urllib
> > starturlsource = starturlopen.read()
> > string.find(starturlsource, '<a href="/profile.php?id=')
> > And I used string.find to find a specific area in the page's source.
> > I want to store what comes after ?id= in a variable.
> > Can someone help me with this?
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> --
> Joel Goldstick
> http://joelgoldstick.com

I don't want to do any kind of HTML parsing.
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-08-13 19:18 -0400 |
| Message-ID | <mailman.551.1376435925.1251.python-list@python.org> |
| In reply to | #52476 |
On Tue, Aug 13, 2013 at 7:03 PM, <englishkevin110@gmail.com> wrote:
> On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
>> Look up urlparse for your answer.
>>
>> On Tue, Aug 13, 2013 at 6:51 PM, <> wrote:
>> > I know the title doesn't make much sense, but I didn't know how to explain my problem.
>> >
>> > Anywho, I've opened a page's source in urllib
>> > starturlsource = starturlopen.read()
>> > string.find(starturlsource, '<a href="/profile.php?id=')
>> > And I used string.find to find a specific area in the page's source.
>> > I want to store what comes after ?id= in a variable.
>> > Can someone help me with this?
>> > --
>> > http://mail.python.org/mailman/listinfo/python-list
>>
>> --
>> Joel Goldstick
>> http://joelgoldstick.com
>
> I don't want to do any kind of HTML parsing.

Aside from the fact that I really want a pony, and you seem to want your work done for you, look here:

http://stackoverflow.com/questions/11600681/parse-query-part-from-url

> --
> http://mail.python.org/mailman/listinfo/python-list

--
Joel Goldstick
http://joelgoldstick.com
| From | Joel Goldstick <joel.goldstick@gmail.com> |
|---|---|
| Date | 2013-08-13 19:40 -0400 |
| Message-ID | <mailman.553.1376437227.1251.python-list@python.org> |
| In reply to | #52476 |
On Tue, Aug 13, 2013 at 7:18 PM, Joel Goldstick <joel.goldstick@gmail.com> wrote:
> On Tue, Aug 13, 2013 at 7:03 PM, <englishkevin110@gmail.com> wrote:
>> On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:
>>> Look up urlparse for your answer.
>>>
>>> On Tue, Aug 13, 2013 at 6:51 PM, <> wrote:
>>> > I know the title doesn't make much sense, but I didn't know how to explain my problem.
>>> >
>>> > Anywho, I've opened a page's source in urllib
>>> > starturlsource = starturlopen.read()
>>> > string.find(starturlsource, '<a href="/profile.php?id=')
>>> > And I used string.find to find a specific area in the page's source.
>>> > I want to store what comes after ?id= in a variable.
>>> > Can someone help me with this?
>>> > --
>>> > http://mail.python.org/mailman/listinfo/python-list
>>>
>>> --
>>> Joel Goldstick
>>> http://joelgoldstick.com
>>
>> I don't want to do any kind of HTML parsing.
>
> Aside from the fact that I really want a pony, and you seem to want
> your work done for you, look here:
>
> http://stackoverflow.com/questions/11600681/parse-query-part-from-url
>
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> --
> Joel Goldstick
> http://joelgoldstick.com

I may have been too quick in reading your question. You wanted to get the value of the parameters, but also to find the URL in the page. You want to do this without parsing, if I understand you.

The good news is there is a module called Beautiful Soup that will do the parsing for you. The tutorial is way better than excellent, and you will be up and running in less than half an hour from downloading the module:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/

--
Joel Goldstick
http://joelgoldstick.com
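To illustrate the "let a parser find the link" idea without installing anything, here is a rough sketch that swaps in the standard library's html.parser in place of Beautiful Soup. It is not Beautiful Soup's API, just the same approach under stated assumptions: Python 3, and a hypothetical class name (ProfileLinkParser) and sample page invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ProfileLinkParser(HTMLParser):
    """Collect the `id` parameter of every <a href="/profile.php?..."> link.

    The class name is hypothetical; it is not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parts = urlparse(href)
        if parts.path == "/profile.php":
            self.ids.extend(parse_qs(parts.query).get("id", []))


# Invented sample page, standing in for starturlsource.
sample_page = (
    '<p><a href="/profile.php?id=42">Kevin</a> '
    '<a href="/about.php">About</a></p>'
)
parser = ProfileLinkParser()
parser.feed(sample_page)
print(parser.ids)
```

The parser handles attribute order, quoting style, and extra whitespace inside the tag, which are the details a plain substring search tends to trip over.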
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2013-08-14 06:29 +0000 |
| Message-ID | <520b23d3$0$29885$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #52476 |
On Tue, 13 Aug 2013 16:03:46 -0700, englishkevin110 wrote:

> On Tuesday, August 13, 2013 5:58:07 PM UTC-5, Joel Goldstick wrote:

[fixing Joel's top-posting]

>> On Tue, Aug 13, 2013 at 6:51 PM, <> wrote:
>>
>> > I know the title doesn't make much sense, but I didn't know how to
>> > explain my problem.
>> >
>> > Anywho, I've opened a page's source in urllib
>> > starturlsource = starturlopen.read()
>> > string.find(starturlsource, '<a href="/profile.php?id=')
>> > And I used string.find to find a specific area in the page's source.
>> > I want to store what comes after ?id= in a variable.
>> > Can someone help me with this?

>> Look up urlparse for your answer.

> I don't want to do any kind of HTML parsing.

What you are doing *is* HTML parsing, or at least a half-baked, fragile,
likely to go wrong form of parsing.
But if you insist, the algorithm is simple: after calling find(), you
have the offset to the search string. You know the length of the search
string. Therefore you can calculate the index of the first character that
follows the search string:

text = "blah blah blah blah spam spam... blah blah blah blah..."
needle = "spam spam"  # what we search for
i = text.find(needle)
if i == -1:
    print("not found")
else:
    # everything that follows the search string
    print(text[i+len(needle):])

Of course, the problem is, you need to know not just the *start* offset
of the bit that follows, but the *ending* offset as well. Which brings
you into the realm of half-arsed parsing.
--
Steven
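The point about needing the ending offset can be made concrete with a second find() call. This is only a sketch of one way to do it, under the assumption that the id is terminated by the closing double quote of the href attribute. The function name value_after, the terminator parameter, and the sample page are hypothetical and were not posted in the thread.

```python
def value_after(text, needle, terminator='"'):
    """Return the text between `needle` and the next `terminator`.

    Returns None if either the needle or the terminator is missing.
    """
    start = text.find(needle)
    if start == -1:
        return None
    start += len(needle)                 # first character after the needle
    end = text.find(terminator, start)   # search only from `start` onward
    if end == -1:
        return None
    return text[start:end]


# Invented sample page, standing in for starturlsource.
page = 'blah <a href="/profile.php?id=12345">profile</a> blah'
print(value_after(page, '<a href="/profile.php?id='))
```

This inherits the fragility described above: if the link carries further parameters (for example id=12345&tab=wall), they come back as part of the value, and a link written with single quotes or a different attribute order is not found at all.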
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-08-14 01:31 +0000 |
| Message-ID | <mailman.557.1376443939.1251.python-list@python.org> |
| In reply to | #52474 |
englishkevin110@gmail.com wrote:

> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in urllib
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?

Python 3.3.0 (default, Mar 7 2013, 00:24:38)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> help(string.find)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'find'

There is no find function in the string module [1]. But assuming starturlsource is a str, you could do:

pattern = '<a href="/profile.php?id='
index = starturlsource.find(pattern)

index will then be -1 if there's no match, or have a non-negative value if a match is found. In the latter case, you can extract the next 17 characters with

newstr = starturlsource[index+len(pattern):index+len(pattern)+17]

You are of course making several assumptions about the web page, which are perfectly reasonable since it's a page under your control. Or is it?

[1] Assuming Python 3.3 since you omitted stating the version you're using. But even in Python 2.7, using the string.find function is deprecated in favor of the str method.

--
DaveA
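It may help to see why the -1 case has to be checked before slicing. In the sketch below, the page text and the helper name next_chars are invented for illustration; the default of 17 characters is simply the figure used in the post above. When the pattern is missing, find() returns -1, and slicing anyway does not raise an error: it quietly returns unrelated text.

```python
pattern = '<a href="/profile.php?id='
# Invented page that does not contain the link at all.
page_without_link = "<html><body>nothing to see here</body></html>"

index = page_without_link.find(pattern)          # -1: no match
start = index + len(pattern)                     # a bogus but valid offset
unchecked = page_without_link[start:start + 17]  # no exception, wrong data


def next_chars(text, pattern, count=17):
    """Return up to `count` characters after `pattern`, or None if not found."""
    index = text.find(pattern)
    if index == -1:
        return None
    start = index + len(pattern)
    return text[start:start + count]


print(repr(unchecked))                            # non-empty, but meaningless
print(next_chars(page_without_link, pattern))     # None
```

Slicing never complains about offsets past the end of the string either, so the guarded version can still return fewer than 17 characters when the match sits near the end of the page.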
| From | John Gordon <gordon@panix.com> |
|---|---|
| Date | 2013-08-14 15:58 +0000 |
| Message-ID | <kug9ea$267$1@reader1.panix.com> |
| In reply to | #52474 |
In <40816fed-38d4-4baa-92cc-c80cd8febd82@googlegroups.com> englishkevin110@gmail.com writes:

> I know the title doesn't make much sense, but I didn't know how to explain my problem.
>
> Anywho, I've opened a page's source in urllib
> starturlsource = starturlopen.read()
> string.find(starturlsource, '<a href="/profile.php?id=')
> And I used string.find to find a specific area in the page's source.
> I want to store what comes after ?id= in a variable.
> Can someone help me with this?

starturlsource = starturlopen.read()
match_string = '<a href="/profile.php?id='
# str method instead of string.find(), which is gone in Python 3
match_index = starturlsource.find(match_string)
if match_index != -1:
    # everything after the match, up to the end of the page
    url = starturlsource[match_index + len(match_string):]
else:
    print('not found')

--
John Gordon A is for Amy, who fell down the stairs
gordon@panix.com B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"