From: Joel Goldstick
To: Comment Holder
Cc: "python-list@python.org"
Newsgroups: comp.lang.python
Date: Wed, 21 Aug 2013 11:30:16 -0400
Subject: Re: I wonder if I would be able to collect data from such page using Python

On Wed, Aug 21, 2013 at 10:55 AM, Comment Holder wrote:
> Hi,
> I am totally new to Python. I noticed that there are many videos showing how to collect data from Python, but I am not sure if I would be able to accomplish my goal using Python so I can start learning.
>
> Here is the example of the target page:
> http://and.medianewsonline.com/hello.html
> In this example, there are 10 articles.
>
> What I exactly need is to do the following:
> 1- Collect the article title, date, source, and contents.
> 2- I need to be able to export the final results to excel or a database client. That is, I need to have all of those specified in step 1 in one row, while each of them saved in separate column. For example:
>
> Title1 Date1 Source1 Contents1
> Title2 Date2 Source2 Contents2
>
> I appreciate any advice regarding my case.
>
> Thanks & Regards//
> --
> http://mail.python.org/mailman/listinfo/python-list

I'm guessing that you are not only new to Python, but that you haven't much experience writing computer programs at all. So you need to learn that first. There is a good tutorial on the Python site, and lots of links to other resources.

Then do this:

1. Write code to access the page you require. The Requests module can help with that.
2. Write code to select the data you want. The BeautifulSoup module is excellent for this.
3. Write code to save your data in comma-separated values (CSV) format.
4. Import the CSV file into Excel or wherever.

Now, go off and write the code. When you get stuck, copy and paste the portion of the code that is giving you problems, along with the traceback.

You can also get help at the python-tutor mailing list.

-- 
Joel Goldstick
http://joelgoldstick.com
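
For reference, steps 1 to 3 above might be sketched roughly like this. It is only a starting point, not a finished solution. It assumes Python 3 with the third-party requests and beautifulsoup4 packages installed. The markup of the target page is not known here, so the "div.article" container and the tag and class names in SELECTORS are hypothetical placeholders; look at the page source and replace them with whatever the page really uses. The function names are also just illustrative.

```python
import csv

# Column order for the output file: one article per row, one field per column.
FIELDS = ["title", "date", "source", "contents"]

# ASSUMPTION: the real page's markup is unknown. These CSS selectors are
# hypothetical placeholders and must be adjusted to the actual page source.
ARTICLE_SELECTOR = "div.article"
SELECTORS = {
    "title": "h2",
    "date": ".date",
    "source": ".source",
    "contents": ".contents",
}


def fetch_page(url):
    """Step 1: download the page and return its HTML as text."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def parse_articles(html):
    """Step 2: return one dict per article found in the HTML."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.select(ARTICLE_SELECTOR):
        row = {}
        for field, selector in SELECTORS.items():
            node = article.select_one(selector)
            # Use an empty string when a field is missing from an article.
            row[field] = node.get_text(strip=True) if node else ""
        rows.append(row)
    return rows


def save_csv(rows, path):
    """Step 3: write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def scrape(url, path):
    """Run steps 1 to 3 in order."""
    save_csv(parse_articles(fetch_page(url)), path)
```

Calling scrape("http://and.medianewsonline.com/hello.html", "articles.csv") would then leave a file that Excel can open directly (step 4), provided the selectors have been changed to match the page. The third-party imports sit inside the functions that need them so that the CSV step can be tried on its own with hand-made rows.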