Date: Sat, 23 Feb 2013 03:38:19 +1100
Subject: Re: How to write a language parser ?
From: Chris Angelico
To: python-list@python.org
Newsgroups: comp.lang.python

On Sat, Feb 23, 2013 at 3:29 AM, Timothy Madden wrote:
> For that I would like to write a php parser, in order to detect the proper
> breakpoints line for statements spanning multiple lines.

Are you able to drop to PHP itself for that? It makes its own lexer
available to user-code:

http://php.net/manual/en/function.token-get-all.php

It's supposed to be able to tell you line numbers, too, though I
haven't actually used that. In theory, you should be able to use
token_get_all, then JSON encode it, and write the whole lot out to
stdout, where Python can pick it up and work with it.

ChrisA
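A rough sketch of the Python side of that idea, untested against a real debugger setup. It assumes a `php` executable on the PATH and valid UTF-8 source. The helper names `php_tokens` and `token_lines` are made up for illustration. token_get_all returns each token either as a one-character string (punctuation such as ";") or as an array of [token id, text, line number], so the line of a bare-string token has to be inferred from the token before it.

```python
import json
import subprocess

# PHP one-liner: tokenize the file named in $argv[1] and print the tokens as JSON.
PHP_SNIPPET = "echo json_encode(token_get_all(file_get_contents($argv[1])));"


def php_tokens(path, php="php"):
    """Run the PHP CLI on `path` and return the decoded token list.

    Assumes `php` is installed; "--" separates PHP's own options from
    the script arguments.
    """
    result = subprocess.run(
        [php, "-r", PHP_SNIPPET, "--", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def token_lines(tokens):
    """Yield (line, text) for every token in a decoded token list.

    Array tokens carry their own starting line. One-character string
    tokens do not, so they get the line on which the previous token ended.
    """
    current = 1
    for tok in tokens:
        if isinstance(tok, str):
            yield current, tok
        else:
            _token_id, text, line = tok
            yield line, text
            # A token's text may span lines (whitespace, comments, strings).
            current = line + text.count("\n")
```

Something like `list(token_lines(php_tokens("script.php")))` would then give a line number for every token, including the ";" that ends a multi-line statement. Working out which of those lines is the right one for a breakpoint is a separate step and is not attempted here.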