
Re: I wonder if I would be able to collect data from such page using Python

From Piet van Oostrum <piet@vanoostrum.org>
Newsgroups comp.lang.python
Subject Re: I wonder if I would be able to collect data from such page using Python
Date 2013-08-22 00:54 -0400
Message-ID <m2haeiiaur.fsf@cochabamba.vanoostrum.org> (permalink)
References <a50210f8-8959-46da-a386-2d9a7a17a79e@googlegroups.com>



[Multipart message: attachments visible in raw view]

> Hi,
> I am totally new to Python. I noticed that there are many videos showing how to collect data using Python, but I am not sure whether I would be able to accomplish my goal with Python, so I would like to know before I start learning.
>
> Here is the example of the target page:
> http://and.medianewsonline.com/hello.html
> In this example, there are 10 articles.
>
> What exactly I need is the following:
> 1- Collect the article title, date, source, and contents.
> 2- I need to be able to export the final results to Excel or a database client. That is, I need to have all of the items from step 1 in one row, with each of them saved in a separate column. For example:
>
> Title1    Date1   Source1   Contents1
> Title2    Date2   Source2   Contents2
>
> I appreciate any advice regarding my case.
>
> Thanks & Regards//

Here is an attempt for you. It uses BeautifulSoup 4 and is written for Python 3.3, so if you want to use Python 2.x you will have to make some small changes, like
    from urllib import urlopen
and probably something with the print calls, which are statements in Python 2.
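As a minimal sketch of that import change, assuming the script gets urlopen from urllib.request (its location in Python 3), a fallback like the following lets the same file run under both versions. The try/except arrangement is only an illustration and is not taken from the attached script.

```python
# Import urlopen from wherever the running Python version keeps it.
try:
    from urllib.request import urlopen  # Python 3.x location
except ImportError:
    from urllib import urlopen  # Python 2.x location

# print with a single parenthesised argument behaves the same in
# Python 2 and Python 3, so simple calls like this need no change.
print("urlopen imported from " + urlopen.__module__)
```

Print calls with several arguments are the ones that need attention in Python 2, where the parentheses would be read as a tuple.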

The formatting in columns is left as an exercise for you. I wonder how you would want that with multi-paragraph contents.
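One possible approach to the column question, sketched here with the standard csv module rather than anything from the attached script: write one row per article and let the writer quote fields that contain line breaks, so multi-paragraph contents stay in a single cell when the file is opened in a spreadsheet. The field names, the articles_to_csv helper, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical column names matching the four items asked for.
FIELDS = ["title", "date", "source", "contents"]


def articles_to_csv(articles):
    """Return CSV text with one row per article and one column per field."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for article in articles:
        # Fields containing newlines are quoted automatically,
        # so a multi-paragraph text still occupies one cell.
        writer.writerow(article)
    return buffer.getvalue()


# Made-up sample data standing in for the scraped articles.
sample = [
    {"title": "Title1", "date": "Date1", "source": "Source1",
     "contents": "First paragraph.\n\nSecond paragraph."},
    {"title": "Title2", "date": "Date2", "source": "Source2",
     "contents": "Only one paragraph."},
]

csv_text = articles_to_csv(sample)
print(csv_text)
```

To produce a file instead of a string, the same writer can be given a file opened with open("articles.csv", "w", newline="", encoding="utf-8"); the file name is only an example.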



