From: Carlos Nepomuceno
To: "python-list@python.org"
Subject: RE: Total Beginner - Extracting Data from a Database Online (Screenshot)
Date: Sat, 25 May 2013 02:36:35 +0300
Newsgroups: comp.lang.python

### table_data_extraction.py ###
# Usage: tables[id][row][column]
# tables[0]        : 1st table
# tables[1][2]     : 3rd row of 2nd table
# tables[3][4][5]  : cell content of 6th column of 5th row of 4th table
# len(tables)      : quantity of tables
# len(tables[6])   : quantity of rows of 7th table
# len(tables[7][8]): quantity of columns of 9th row of 8th table

import re
import urllib2

# Retrieve the contents of the page.
page = urllib2.urlopen("http://example.com/page.html").read().strip()

# Create the tables list: one list per table, one list per row, one string per cell.
# NOTE: the HTML tags inside the three patterns did not survive in this copy of
# the post; <table>, <tr> and <td> are assumed from the usage comments above.
flags = re.S | re.I
tables = [[re.findall(r'<td[^>]*>(.*?)</td>', r, flags)
           for r in re.findall(r'<tr[^>]*>(.*?)</tr>', t, flags)]
          for t in re.findall(r'<table[^>]*>(.*?)</table>', page, flags)]


Pretty simple. Good luck!

----------------------------------------
> Date: Fri, 24 May 2013 10:32:26 -0700
> Subject: Total Beginner - Extracting Data from a Database Online (Screenshot)
> From: logan.c.graham@gmail.com
> To: python-list@python.org
>
> Hey guys,
>
> I'm learning Python and I'm experimenting with different projects -- I like learning by doing. I'm wondering if you can help me here:
>
> http://i.imgur.com/KgvSKWk.jpg
>
> What this is is a publicly-accessible webpage that's a simple database of people who have used the website. Ideally what I'd like to end up with is an Excel spreadsheet with data from the columns #fb, # vids, fb sent?, # email tm.
>
> I'd like to use Python to do it -- crawl the page and extract the data in a usable way.
>
> I'd love your input! I'm just a learner.
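
The question above also asks for a spreadsheet holding only a few of the columns. What follows is a minimal sketch of that last step, not part of the original reply. It assumes Python 3, assumes the page uses plain <table>/<tr>/<th>/<td> markup (the thread only shows a screenshot), and uses a made-up sample page so that it runs without network access. The names SAMPLE_PAGE, extract_tables, select_columns and to_csv are hypothetical.

```python
import csv
import io
import re

# Hypothetical sample page; the real page's markup is not shown in the thread.
SAMPLE_PAGE = """
<table>
<tr><th>name</th><th>#fb</th><th># vids</th></tr>
<tr><td>alice</td><td>3</td><td>1</td></tr>
<tr><td>bob</td><td>0</td><td>4</td></tr>
</table>
"""


def extract_tables(page):
    """Return tables[table][row][cell] using nested re.findall calls."""
    flags = re.S | re.I
    return [
        [re.findall(r'<t[dh]\b[^>]*>(.*?)</t[dh]>', row, flags)
         for row in re.findall(r'<tr\b[^>]*>(.*?)</tr>', table, flags)]
        for table in re.findall(r'<table\b[^>]*>(.*?)</table>', page, flags)
    ]


def select_columns(rows, wanted):
    """Keep only the columns whose header text (first row) is in `wanted`."""
    header = [cell.strip() for cell in rows[0]]
    indexes = [header.index(name) for name in wanted]
    return [[row[i].strip() for i in indexes] for row in rows]


def to_csv(rows):
    """Serialize rows to CSV text, a format Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


tables = extract_tables(SAMPLE_PAGE)
csv_text = to_csv(select_columns(tables[0], ["#fb", "# vids"]))
print(csv_text)
```

To get a file instead of a string, the same csv.writer call can be pointed at open("out.csv", "w", newline=""). Regular expressions only cope with simple, regular markup; for nested tables or messy HTML, a real HTML parser is the safer choice.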