RE: extract HTML table in a structured format

From "Prasad, Ramit" <ramit.prasad@jpmorgan.com>
To Jabba Laci <jabba.laci@gmail.com>, Python mailing list <python-list@python.org>
Subject RE: extract HTML table in a structured format
Date Fri, 12 Apr 2013 22:00:25 +0000
Newsgroups comp.lang.python
Message-ID <mailman.538.1365804231.3114.python-list@python.org>

Jabba Laci
> Hi,
> 
> I wonder if there is a nice way to extract a whole HTML table and have the result in a nice structured
> format. What I want is to have the lifetime table at the bottom of this page:
> http://en.wikipedia.org/wiki/List_of_Ubuntu_releases (then figure out with a script until when my
> Ubuntu release is supported).
> 
> I could do it with BeautifulSoup or lxml but is there a better way? There should be :)
> 

I know you have already answered your own question, but I thought this
might be helpful in the future.

Wikipedia has an API for programmatic access.
http://www.mediawiki.org/wiki/API 
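
For what it's worth, here is a minimal sketch of how that could look from
Python, using only the standard library. It is an illustration under some
assumptions, not a tested recipe for this particular page: the endpoint URL
is the usual English Wikipedia one, the function and class names
(build_parse_url, html_from_response, TableRows, extract_rows, fetch_rows)
and the User-Agent string are made up for the example, and the row
collector is deliberately naive (it ignores nested tables, rowspan and
colspan, and returns the rows of every table on the page in one list).

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser

# Assumption: the standard English Wikipedia API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(page, api_url=API_URL):
    """Build an action=parse URL asking for the rendered HTML of a page."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "text",
        "format": "json",
        "formatversion": "2",  # with version 2 the HTML comes back as a plain string
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def html_from_response(payload):
    """Pull the rendered HTML out of a decoded action=parse JSON response."""
    return payload["parse"]["text"]


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> as a list of strings.

    Naive on purpose: nested tables, rowspan and colspan are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row is not None:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return every table row found in the HTML as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rows(page):
    """Network call: fetch a page through the API and return its table rows."""
    request = urllib.request.Request(
        build_parse_url(page),
        headers={"User-Agent": "table-extract-example/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return extract_rows(html_from_response(payload))
```

Calling fetch_rows("List_of_Ubuntu_releases") should then return a list of
rows, each a list of cell strings. Picking out the lifetime table from the
other tables on the page is left to the caller. BeautifulSoup or lxml would
still be the more robust choice for the HTML step; the API only saves you
from scraping the full page chrome.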


~Ramit


