
RE: Total Beginner - Extracting Data from a Database Online (Screenshot)

From Carlos Nepomuceno <carlosnepomuceno@outlook.com>
To "python-list@python.org" <python-list@python.org>
Subject RE: Total Beginner - Extracting Data from a Database Online (Screenshot)
Date Sat, 25 May 2013 02:36:35 +0300
Newsgroups comp.lang.python



### table_data_extraction.py ###
# Usage: tables[table][row][column] (all indices are zero-based)
# tables[0]        : 1st table
# tables[1][2]     : 3rd row of 2nd table
# tables[3][4][5]  : cell content of 6th column of 5th row of 4th table
# len(tables)      : number of tables
# len(tables[6])   : number of rows in 7th table
# len(tables[7][8]): number of columns in 9th row of 8th table

import re
import urllib2

#to retrieve the contents of the page
page = urllib2.urlopen("http://example.com/page.html").read().strip()

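# Build the nested tables list. re.I and [^>]* let the patterns also match
# lowercase tags and tags with attributes; nested tables are not handled.
flags = re.S | re.I
tables = [[re.findall(r'<td\b[^>]*>(.*?)</td>', r, flags)
           for r in re.findall(r'<tr\b[^>]*>(.*?)</tr>', t, flags)]
          for t in re.findall(r'<table\b[^>]*>(.*?)</table>', page, flags)]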
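
To try the idea without fetching a real page, here is a minimal sketch that runs the same nested findall approach on an inline sample. It is written for Python 3 (the script above uses the Python 2 urllib2 module). The SAMPLE string and the extract_tables name are made up for illustration, and the patterns assume the page has no nested tables.

```python
import re

# Hypothetical sample page: mixed-case tags and a tag attribute on purpose.
SAMPLE = """
<table class="users">
  <tr><td>alice</td><td>3</td></tr>
  <tr><td>bob</td><td>7</td></tr>
</table>
<TABLE>
  <TR><TD>x</TD></TR>
</TABLE>
"""

# re.S lets '.' span newlines; re.I matches <td> as well as <TD>.
FLAGS = re.S | re.I


def extract_tables(html):
    """Return tables[table][row][column] as nested lists of cell strings."""
    return [
        [re.findall(r'<td\b[^>]*>(.*?)</td>', row, FLAGS)
         for row in re.findall(r'<tr\b[^>]*>(.*?)</tr>', table, FLAGS)]
        for table in re.findall(r'<table\b[^>]*>(.*?)</table>', html, FLAGS)
    ]


tables = extract_tables(SAMPLE)
print(len(tables))       # number of tables
print(tables[0][1])      # 2nd row of 1st table
print(tables[1][0][0])   # 1st cell of 1st row of 2nd table
```

Regular expressions are fine for a quick look at a simple, regular page. For messier HTML a real parser (for example the standard library's html.parser) is likely to be more robust.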


Pretty simple. Good luck!
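
Since the goal was a spreadsheet, one possible next step is to write the wanted columns out as CSV, which Excel can open. This is only a sketch under assumptions: it is Python 3, the sample rows are invented, table_to_csv is a hypothetical helper, and it assumes the first row of the table holds the column names (the script above only collects <TD> cells, so a header made of <TH> cells would need its own pattern).

```python
import csv
import io

# Hypothetical data in the tables[table][row][column] shape.
# Assumption: the first row of the table holds the column names.
tables = [[
    ["name", "#fb", "# vids"],
    ["alice", "3", "1"],
    ["bob", "7", "0"],
]]


def table_to_csv(table, wanted, out):
    """Write only the columns named in `wanted` to the file-like object `out`."""
    header, rows = table[0], table[1:]
    # Look up the position of each wanted column in the header row.
    idx = [header.index(name) for name in wanted]
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(wanted)
    for row in rows:
        writer.writerow([row[i] for i in idx])


buf = io.StringIO()
table_to_csv(tables[0], ["#fb", "# vids"], buf)
csv_text = buf.getvalue()
print(csv_text)
```

To get an actual file, pass a file object instead of the StringIO buffer, for example one opened with open("people.csv", "w", newline="").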

----------------------------------------
> Date: Fri, 24 May 2013 10:32:26 -0700
> Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
> From: logan.c.graham@gmail.com
> To: python-list@python.org
>
> Hey guys,
>
> I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:
>
> http://i.imgur.com/KgvSKWk.jpg
>
> This is a publicly accessible webpage that's a simple database of people who have used the website. Ideally, I'd like to end up with an Excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.
>
> I'd like to use Python to do it -- crawl the page and extract the data in a usable way.
>
> I'd love your input! I'm just a learner.



