Re: lookup xpath (other?) to value in html

References <CALyJZZXtwD=PKjnnOiAcm3TF6RJZVqjsZCqupooEb8YNBNVoGA@mail.gmail.com> <CANy1k1g6Ws4G2WkPaZDAm14cN9NESUNqhNk4O-jzctxp9TRCJQ@mail.gmail.com> <CALyJZZWBozyDRe7c7hVRExw_V9KSAPQ2+LWjnfkcAo4C2J-rew@mail.gmail.com>
Date 2013-12-31 22:30 -0700
Subject Re: lookup xpath (other?) to value in html
From Jason Friedman <jsf80238@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.4771.1388554247.18130.python-list@python.org>


> For example this URL;
> http://jeffco.us/ats/displaygeneral.do?sch=001690
> The land sqft is 11082.
> Google Chrome gives me the xpath to that data as;
> //*[@id="content"]/p[1]/table[4]/tbody/tr[2]/td[8]
>
> What I would like to do (using python) is given 11082 at what xpath can that
> be found? (may be more than one)
> The examples I can find using google refer to, given xpath what is the value
> (the opposite of what I want)

Which Chrome extension are you using to get that path?

Are you always interested in the square footage?  Here is a solution
using Beautiful Soup:

$ cat square-feet.py
#!/usr/bin/env python
import sys

import bs4
import requests

url = sys.argv[1]
response = requests.get(url)
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = bs4.BeautifulSoup(response.text, "html.parser")
is_sqft_mark_found, is_total_mark_found = False, False
# Scan the page text: once "Land Sqft" and then "Total" have been seen,
# the next line is the square footage.
for line in soup.get_text().splitlines():
    if line.startswith("Land Sqft"):
        is_sqft_mark_found = True
        continue
    elif is_sqft_mark_found and line.startswith("Total"):
        is_total_mark_found = True
        continue
    elif is_total_mark_found:
        print(line.strip() + " total square feet.")
        break

$ python3 square-feet.py "http://jeffco.us/ats/displaygeneral.do?sch=001690"
11082 total square feet.
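
If you really do want the reverse lookup (value in, xpath out), here is a
rough sketch using only the standard library's html.parser. The names
XPathFinder and find_xpaths and the sample markup are made up for
illustration; the sample is not the real jeffco.us page. It assumes
well-nested markup (void tags such as <br> are skipped), and it will not
contain the tbody steps that Chrome adds, so the paths may differ from what
the browser reports.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class XPathFinder(HTMLParser):
    """Collect positional XPaths of elements whose text equals a target."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []     # open elements as (tag, position among same-tag siblings)
        self.counts = [{}]  # per open element: child tag -> number seen so far
        self.matches = []

    def handle_starttag(self, tag, attrs):
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        if tag in VOID_TAGS:
            return  # nothing can be nested inside a void element
        self.stack.append((tag, seen[tag]))
        self.counts.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element;
        # stray closing tags are ignored (assumption: markup is well nested).
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        if self.stack and data.strip() == self.target:
            steps = "/".join(f"{tag}[{pos}]" for tag, pos in self.stack)
            self.matches.append("/" + steps)


def find_xpaths(html, target):
    """Return every positional XPath whose element text equals target."""
    finder = XPathFinder(target)
    finder.feed(html)
    finder.close()
    return finder.matches


# Hypothetical sample markup, only to exercise the function.
sample = """<html><body><div id="content"><table>
<tr><td>Land Sqft</td><td>Total</td></tr>
<tr><td>11082</td><td>11082</td></tr>
</table></div></body></html>"""

for path in find_xpaths(sample, "11082"):
    print(path)
```

To try it on a live page you would pass the downloaded text (for example
response.text from requests) to find_xpaths instead of the sample string.
Real-world pages often leave out closing tags, so a more forgiving parser
may be needed there.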
