Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #52809
| From | Piet van Oostrum <piet@vanoostrum.org> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: I wonder if I would be able to collect data from such page using Python |
| Date | 2013-08-22 00:54 -0400 |
| Message-ID | <m2haeiiaur.fsf@cochabamba.vanoostrum.org> (permalink) |
| References | <a50210f8-8959-46da-a386-2d9a7a17a79e@googlegroups.com> |
[Multipart message — attachments visible in raw view] - view raw
> Hi, > I am totally new to Python. I noticed that there are many videos showing how to collect data from Python, but I am not sure if I would be able to accomplish my goal using Python so I can start learning. > > Here is the example of the target page: > http://and.medianewsonline.com/hello.html > In this example, there are 10 articles. > > What I exactly need is to do the following: > 1- Collect the article title, date, source, and contents. > 2- I need to be able to export the final results to excel or a database client. That is, I need to have all of those specified in step 1 in one row, while each of them saved in separate column. For example: > > Title1 Date1 Source1 Contents1 > Title2 Date2 Source2 Contents2 > > I appreciate any advise regarding my case. > > Thanks & Regards// Here is an attempt for you. It uses BeatifulSoup 4. It is written in Python 3.3, so if you want to use Python 2.x you will have to make some small changes, like from urllib import urlopen and probably something with the print statements. The formatting in columns is left as an exercise for you. I wonder how you would want that with multiparagraph contents.
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
I wonder if I would be able to collect data from such page using Python Comment Holder <commentholder@gmail.com> - 2013-08-21 07:55 -0700
Re: I wonder if I would be able to collect data from such page using Python Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-21 11:30 -0400
Re: I wonder if I would be able to collect data from such page using Python Comment Holder <commentholder@gmail.com> - 2013-08-21 08:44 -0700
Re: I wonder if I would be able to collect data from such page using Python Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-21 11:58 -0400
Re: I wonder if I would be able to collect data from such page using Python Comment Holder <commentholder@gmail.com> - 2013-08-21 10:41 -0700
Re: I wonder if I would be able to collect data from such page using Python Joel Goldstick <joel.goldstick@gmail.com> - 2013-08-21 13:52 -0400
Re: I wonder if I would be able to collect data from such page using Python Terry Reedy <tjreedy@udel.edu> - 2013-08-21 15:18 -0400
Re: I wonder if I would be able to collect data from such page using Python Comment Holder <commentholder@gmail.com> - 2013-08-22 07:58 -0700
Re: I wonder if I would be able to collect data from such page using Python Piet van Oostrum <piet@vanoostrum.org> - 2013-08-22 00:54 -0400
Re: I wonder if I would be able to collect data from such page using Python Comment Holder <commentholder@gmail.com> - 2013-08-22 08:03 -0700
Re: I wonder if I would be able to collect data from such page using Python Chris Angelico <rosuav@gmail.com> - 2013-08-23 01:11 +1000
csiph-web