From: Laszlo Nagy
Date: Mon, 30 Jul 2012 16:59:12 +0200
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: simplified Python parsing question

> yeah the problem is also little more complicated than simple parsing
> of Python code. For example, one example (from the white paper)
>
> *meat space blowback = Friends and family [well-meaning attempt]
>
> *could that be parsed by the tools you mention?

It is not valid Python code. Pygments can tokenize code that is not valid Python, because it does not parse; it only tokenizes. But if you put a bunch of random tokens into a file, then of course you will never be able to split that into statements.

You will probably need to process the indent/dedent tokens to identify the "level" of each statement. Then you can tell which file, class, inner class, or method you are in. Inside one "level" or code block, you could try to divide the code into statements.

Otherwise, I have no idea how a blind person could navigate Python source. In fact, I have no idea how they use regular programs. So I'm afraid I cannot help much with this. :-(
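
A rough sketch of the indent/dedent idea, in case it helps. Note the assumptions: it uses the standard library's tokenize module instead of Pygments, because tokenize emits INDENT and DEDENT tokens directly; the names outline and SAMPLE are made up for this illustration. Unlike a pure lexer, tokenize can raise an error on some malformed input (inconsistent dedents, for instance), so this only covers source that is at least laid out like Python.

```python
import io
import tokenize


def outline(source):
    """Return (line, depth, kind, name) for every class/def header in source.

    depth is the number of enclosing indented blocks, tracked by counting
    INDENT and DEDENT tokens. (Hypothetical helper, for illustration only.)
    """
    entries = []
    depth = 0
    pending = None  # "class" or "def" while waiting for the name that follows
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1
        elif tok.type == tokenize.DEDENT:
            depth -= 1
        elif tok.type == tokenize.NAME:
            if pending is not None:
                # The NAME right after "class"/"def" is the block's name.
                entries.append((tok.start[0], depth, pending, tok.string))
                pending = None
            elif tok.string in ("class", "def"):
                pending = tok.string
        else:
            pending = None
    return entries


# Made-up sample input, not taken from the thread.
SAMPLE = '''\
class Outer:
    class Inner:
        def method(self):
            return 1

    def other(self):
        pass

def top():
    pass
'''

for line, depth, kind, name in outline(SAMPLE):
    print("line %d: %s%s %s" % (line, "  " * depth, kind, name))
```

Reading the result top to bottom gives the file / class / inner class / method nesting: an entry at depth 2 belongs to the nearest preceding entry at depth 1, and so on. Splitting a block into statements could be done the same way by watching for NEWLINE tokens, which tokenize emits at the end of each logical line.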