
Re: Scraping email to make invoice

From Grant Edwards <grant.b.edwards@gmail.com>
Newsgroups comp.lang.python
Subject Re: Scraping email to make invoice
Date Mon, 25 Apr 2016 14:39:45 +0000 (UTC)
Message-ID <mailman.80.1461595206.32212.python-list@python.org>


On 2016-04-24, Michael Torrie <torriem@gmail.com> wrote:
> On 04/24/2016 12:58 PM, CM wrote:
>
>> 1. INPUT: What's the best way to scrape an email like this? The
>>    email is to a Gmail account, and the content shows up in the
>>    email as a series of basically 6x7 tables (HTML?), one table per
>>    PO number/task. I know if the freelancer were to copy and paste
>>    the whole set of tables into a text file and save it as plain
>>    text, Python could easily scrape that file, but I'd much prefer
>>    to save the user those steps. Is there a relatively easy way to
>>    go from the Gmail email to generating the invoice directly? (I
>>    know there is, but wasn't sure what is state of the art these
>>    days).
>
> I would configure Gmail to allow IMAP access (you'll have to set up a
> special password for this most likely),

Your normal Gmail password is used for IMAP.

> and then use an imap library from Python to directly find the
> relevant messages and access the email message body.  If the body is
> HTML-formatted (sounds like it is) I would use either BeautifulSoup
> or lxml to parse it and get out the relevant information.

Warning: don't use the basic imaplib.  IMAP is a miserable protocol,
and imaplib is too thin a wrapper. It'll make you bleed from the ears
and wish you were dead.  Use imapclient or imaplib2.  I've used both
(with Gmail's IMAP server), and IMO both are pretty good.  Either one
is miles ahead of plain imaplib.
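
For what it's worth, here is a rough sketch of how the whole thing
might hang together with imapclient.  Treat it as an illustration
under assumptions, not a tested recipe: the function and class names
are made up for this example, the search is simply "everything in
INBOX from one sender", and `password` is whatever credential your
account accepts for IMAP.  To keep the sketch free of extra
dependencies, the table extraction uses the standard-library
html.parser rather than BeautifulSoup or lxml; either of those would
be the more comfortable choice for messy real-world HTML.

```python
import email
from email import policy
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_from_message(raw_bytes):
    """Return the table rows found in the HTML part of one raw message."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    if part is None:        # no HTML body in this message
        return []
    parser = TableRows()
    parser.feed(part.get_content())
    parser.close()
    return parser.rows


def fetch_raw_messages(user, password, sender):
    """Fetch the raw bytes of every INBOX message from `sender` (hypothetical helper)."""
    # Third-party package (pip install imapclient); imported here so the
    # parsing code above can be used without it.
    from imapclient import IMAPClient

    with IMAPClient("imap.gmail.com", ssl=True) as client:
        client.login(user, password)
        client.select_folder("INBOX", readonly=True)
        uids = client.search(["FROM", sender])
        fetched = client.fetch(uids, ["RFC822"])
        return [data[b"RFC822"] for data in fetched.values()]
```

Usage would be something like calling rows_from_message() on each
item returned by fetch_raw_messages(), then picking the PO number and
task fields out of each row by position.  The rows come back as one
flat list across all tables in the message, so splitting them back
into per-PO tables is left to the caller.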

-- 
Grant Edwards               grant.b.edwards        Yow! But they went to MARS
                                  at               around 1953!!
                              gmail.com            


