| Header | Value |
|---|---|
| From | Carlos Nepomuceno <carlosnepomuceno@outlook.com> |
| To | "python-list@python.org" <python-list@python.org> |
| Subject | RE: Total Beginner - Extracting Data from a Database Online (Screenshot) |
| Date | Sat, 25 May 2013 02:36:35 +0300 |
| In-Reply-To | <b3730ef1-90bb-4ef4-8683-239e722aa1da@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.2088.1369438663.3114.python-list@python.org> |
### table_data_extraction.py ###
# Usage: tables[table][row][column]
# tables[0]         : 1st table
# tables[1][2]      : 3rd row of 2nd table
# tables[3][4][5]   : cell content of 6th column of 5th row of 4th table
# len(tables)       : number of tables
# len(tables[6])    : number of rows in the 7th table
# len(tables[7][8]) : number of columns in the 9th row of the 8th table
# Note: written for Python 2 (urllib2). The patterns match only bare,
# uppercase <TABLE>, <TR> and <TD> tags without attributes.
import re
import urllib2

# Retrieve the contents of the page
page = urllib2.urlopen("http://example.com/page.html").read().strip()

# Build the nested list: one list of rows per table, one list of cells per row
tables = [[re.findall('<TD>(.*?)</TD>', r, re.S)
           for r in re.findall('<TR>(.*?)</TR>', t, re.S)]
          for t in re.findall('<TABLE>(.*?)</TABLE>', page, re.S)]
Pretty simple. Good luck!
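
For anyone on Python 3, the same regex approach might look roughly like the sketch below. The helper name `extract_tables` and the loosened patterns (case-insensitive, tolerant of tag attributes, accepting `<th>` header cells) are assumptions of this sketch, not part of the script above, and the sample HTML is made up. It still breaks on nested tables, so a real HTML parser is likely the safer choice for messy pages.

```python
import re

# Assumed patterns: loosened from the original script so that lowercase
# tags, tag attributes, and <th> header cells are also matched.
TABLE_RE = re.compile(r'<table\b[^>]*>(.*?)</table>', re.S | re.I)
ROW_RE = re.compile(r'<tr\b[^>]*>(.*?)</tr>', re.S | re.I)
CELL_RE = re.compile(r'<t[dh]\b[^>]*>(.*?)</t[dh]>', re.S | re.I)


def extract_tables(html):
    """Return tables[table][row][column] as nested lists of cell strings.

    Hypothetical helper; surrounding whitespace is stripped from each cell.
    """
    return [[[cell.strip() for cell in CELL_RE.findall(row)]
             for row in ROW_RE.findall(table)]
            for table in TABLE_RE.findall(html)]


# Made-up sample page, used here instead of a network request.
sample = """
<table class="users">
  <tr><th>name</th><th># vids</th></tr>
  <tr><td>alice</td><td> 3 </td></tr>
</table>
"""

tables = extract_tables(sample)
print(tables[0][1][1])  # cell in the 2nd column of the 2nd row of the 1st table
```

To run it against a live page, the `sample` string could be replaced by the decoded result of `urllib.request.urlopen(url).read()`, which is the Python 3 counterpart of the `urllib2` call.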
----------------------------------------
> Date: Fri, 24 May 2013 10:32:26 -0700
> Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
> From: logan.c.graham@gmail.com
> To: python-list@python.org
>
> Hey guys,
>
> I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:
>
> http://i.imgur.com/KgvSKWk.jpg
>
> This is a publicly accessible webpage with a simple database of people who have used the website. Ideally, I'd like to end up with an Excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.
>
> I'd like to use Python to do it -- crawl the page and extract the data in a usable way.
>
> I'd love your input! I'm just a learner.
> --
> http://mail.python.org/mailman/listinfo/python-list
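
The spreadsheet goal in the quoted message could be reached by writing the extracted rows as CSV, which Excel opens directly. The following is only a sketch: the helper names `select_columns` and `to_csv` are hypothetical, the column names are taken from the quoted message, the values are invented, and it assumes the first row holds the headers and every row has the same number of cells.

```python
import csv
import io

# Invented data in the rows[row][column] shape of one extracted table;
# only the column names come from the quoted message.
rows = [
    ["name", "#fb", "# vids", "fb sent?", "# email tm"],
    ["alice", "2", "3", "yes", "1"],
    ["bob", "0", "1", "no", "4"],
]


def select_columns(rows, wanted):
    """Keep only the columns whose header (first row) is in `wanted`."""
    header = rows[0]
    keep = [i for i, name in enumerate(header) if name in wanted]
    return [[row[i] for i in keep] for row in rows]


def to_csv(rows):
    """Serialize rows to CSV text that a spreadsheet program can open."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


selected = select_columns(rows, {"#fb", "# vids", "fb sent?", "# email tm"})
csv_text = to_csv(selected)
print(csv_text)
```

Saving `csv_text` to a file with a `.csv` extension should be enough for a spreadsheet program to load it.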