| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Stephen Hansen <me+python@ixokai.io> |
| Newsgroups | comp.lang.python |
| Subject | Re: Best way to clean up list items? |
| Date | Mon, 02 May 2016 11:23:51 -0700 |
| Lines | 10 |
| Message-ID | <mailman.328.1462213433.32212.python-list@python.org> |
| References | <ng7v9d$ld8$1@dont-email.me> <1462209930.1313497.595817481.73635112@webmail.messagingengine.com> <mailman.325.1462209932.32212.python-list@python.org> <ng84uj$emf$1@dont-email.me> <1462213431.1326811.595881033.1348150C@webmail.messagingengine.com> |
| In-Reply-To | <ng84uj$emf$1@dont-email.me> |
On Mon, May 2, 2016, at 11:09 AM, DFS wrote:
> I'd prefer to get clean data in the first place, but I don't know a
> better way to extract it from the HTML.

Ah, right. I didn't know you were scraping HTML. Scraping HTML is rarely
clean so you have to do a lot of cleanup.

--
Stephen Hansen
m e @ i x o k a i . i o
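
For illustration only, here is a minimal sketch of the kind of cleanup pass that scraped list items often need. It is not code from this thread: the function name `clean_items` and the sample data are hypothetical, and it assumes the items are plain strings whose only problems are stray whitespace and empty entries.

```python
def clean_items(items):
    """Collapse whitespace runs and drop entries that end up empty.

    Hypothetical helper, assuming every item is a str.
    """
    cleaned = []
    for item in items:
        # str.split() with no arguments splits on any run of whitespace,
        # so rejoining with one space trims and normalizes the item.
        text = " ".join(item.split())
        if text:  # skip items that were empty or whitespace-only
            cleaned.append(text)
    return cleaned


# Made-up sample resembling text pulled out of HTML
scraped = ["  Alice \n", "\tBob", "", "Carol   Smith\r\n", "   "]
print(clean_items(scraped))  # ['Alice', 'Bob', 'Carol Smith']
```

Real scraped data may need more than this (entity decoding, tag remnants, and so on), which depends on the page being scraped.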
Best way to clean up list items? DFS <nospam@dfs.com> - 2016-05-02 12:33 -0400
Re: Best way to clean up list items? Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-02 19:57 +0300
Re: Best way to clean up list items? justin walters <walters.justin01@gmail.com> - 2016-05-02 10:10 -0700
Re: Best way to clean up list items? DFS <nospam@dfs.com> - 2016-05-02 14:06 -0400
Re: Best way to clean up list items? Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-05-02 21:27 +0300
Re: Best way to clean up list items? DFS <nospam@dfs.com> - 2016-05-02 15:04 -0400
Re: Best way to clean up list items? Stephen Hansen <me+python@ixokai.io> - 2016-05-02 10:25 -0700
Re: Best way to clean up list items? DFS <nospam@dfs.com> - 2016-05-02 14:09 -0400
Re: Best way to clean up list items? Stephen Hansen <me+python@ixokai.io> - 2016-05-02 11:23 -0700
Re: Best way to clean up list items? Peter Otten <__peter__@web.de> - 2016-05-02 19:30 +0200