Newsgroups: comp.lang.python
From: Joel Goldstick
Date: Wed, 21 Aug 2013 11:58:30 -0400
Subject: Re: I wonder if I would be able to collect data from such page using Python
To: Comment Holder
Cc: "python-list@python.org"

On Wed, Aug 21, 2013 at 11:44 AM, Comment Holder wrote:
> Many thanks Joel,
>
> You are right to some extent. I come from Finance background, but I am very familiar with what could be referred to as non-native languages such as Matlab, VBA,.. actually, I have developed couple of complete programs.
>
> I have asked this question, because I am a little worried about the structure of this particular page, as there are no specific defined classes.
>
> I know how powerful Python is, but I wonder if it could do the job with this particular page.
>
> Again, many thanks Joel, I appreciate your guidance.
> All Best//
> --
> http://mail.python.org/mailman/listinfo/python-list

Your biggest hurdle will be to get proficient with python.
Give yourself a weekend with a good tutorial. You won't be very skilled, but you will get the gist of things.

Also, google Beautiful Soup. You need the latest version; it's v4, I think. They have a GREAT tutorial. Spend a few hours with it and you will see your way to getting the data you want from your web pages.

Since you gave a sample web page, I am guessing that you need to log in to the site for the 'real data'. For that, you will need to really understand some things that you might not know yet. At any rate, study the Requests module documentation. Python comes with urllib and urllib2, which cover the same ground, but Requests is a lot simpler to understand.

-- 
Joel Goldstick
http://joelgoldstick.com
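
As a rough illustration of the Beautiful Soup and Requests suggestion above, here is a minimal sketch. It assumes Beautiful Soup 4 and Requests are installed. The sample HTML, the function names `extract_rows` and `fetch_html`, and the login parameters are all made up for illustration; they are not taken from the page discussed in this thread. The point is that a table with no class attributes can still be read by walking its tag structure.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample markup: a table with no class or id attributes,
# standing in for the kind of page the original question describes.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Ticker</th><th>Price</th></tr>
  <tr><td>AAA</td><td>10.50</td></tr>
  <tr><td>BBB</td><td>7.25</td></tr>
</table>
</body></html>
"""


def extract_rows(html):
    """Return every table row in the document as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # Select cells by tag name, since there are no classes to match on.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def fetch_html(url, login_url=None, credentials=None):
    """Download a page, optionally posting a login form first.

    The login step is an assumption: real sites differ in their form
    field names, and some need tokens or cookies beyond a plain POST.
    """
    with requests.Session() as session:
        if login_url is not None:
            # The session keeps any cookies set by the login response.
            session.post(login_url, data=credentials, timeout=10).raise_for_status()
        response = session.get(url, timeout=10)
        response.raise_for_status()
        return response.text


# Parse the inline sample; no network access is needed for this part.
rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

For a live page, the HTML would come from something like `fetch_html("http://example.com/data")` (a placeholder URL) and be passed to `extract_rows` in place of `SAMPLE_HTML`.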