To: python-list@python.org
From: Terry Reedy <tjreedy@udel.edu>
Subject: Re: I wonder if I would be able to collect data from such page using Python
Date: Wed, 21 Aug 2013 15:18:19 -0400
References: <a50210f8-8959-46da-a386-2d9a7a17a79e@googlegroups.com> <mailman.81.1377099024.19984.python-list@python.org> <bfd5cc17-8901-47b4-944f-7841c8d7cc15@googlegroups.com> <mailman.83.1377100719.19984.python-list@python.org> <02caf0a8-1506-4746-9136-3452cbdea14b@googlegroups.com> <CAPM-O+zV25UNAaVagdPCwXig+J==PJsxtmgSXcVuy1kV1k+Jag@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8
In-Reply-To: <CAPM-O+zV25UNAaVagdPCwXig+J==PJsxtmgSXcVuy1kV1k+Jag@mail.gmail.com>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.97.1377112715.19984.python-list@python.org>

On 8/21/2013 1:52 PM, Joel Goldstick wrote:
> On Wed, Aug 21, 2013 at 1:41 PM, Comment Holder <commentholder@gmail.com> wrote:

>> Many thanks for your help. I think I shall start this way and see how it goes. My concern was whether the task can be accomplished with Python, and from your posts, I guess it can, so I shall give it a try :).

CH: You still seem a bit doubtful. If you are wondering why no one else
has answered, it is because Joel has given you a really good answer that
cannot be beaten without writing your code for you.

> You're welcome.  One thought popped into my mind.  Since the site
> seems to be from the Wall Street Journal, you may want to look into
> whether they have an API for searching and retrieving articles.  If
> they do, this would be simpler and probably safer than parsing web
> pages.  From time to time, websites change their layout, which would
> probably break your program.  However, APIs are more stable.

Including this suggestion, which I did not think of.
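
To make Joel's point concrete, here is a minimal sketch of the two approaches, using only the standard library. The HTML markup, the class names, and the JSON field names are all made up for illustration; I do not know what the real site serves or whether it offers an API at all.

```python
import json
from html.parser import HTMLParser

# Hypothetical page markup; a real site's layout will differ and can change.
SAMPLE_HTML = """
<html><body>
<div class="article"><h2 class="headline">Markets rally</h2></div>
<div class="article"><h2 class="headline">Rates hold steady</h2></div>
</body></html>
"""

# Hypothetical API response; the field names are invented for this sketch.
SAMPLE_JSON = (
    '{"articles": [{"headline": "Markets rally"},'
    ' {"headline": "Rates hold steady"}]}'
)


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2 class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "h2" and ("class", "headline") in attrs:
            self.in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_headline = False

    def handle_data(self, data):
        if self.in_headline:
            self.headlines.append(data.strip())


def headlines_from_html(html):
    """Scraping approach: depends on the exact tags and class names."""
    parser = HeadlineParser()
    parser.feed(html)
    return parser.headlines


def headlines_from_json(text):
    """API approach: depends only on the documented field names."""
    return [item["headline"] for item in json.loads(text)["articles"]]


if __name__ == "__main__":
    print(headlines_from_html(SAMPLE_HTML))
    print(headlines_from_json(SAMPLE_JSON))
```

If the site renames the "headline" class, the scraping version quietly returns an empty list, while the API version keeps working for as long as the provider keeps its field names.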

-- 
Terry Jan Reedy