Groups > comp.lang.python > #39586 > unrolled thread

How to write a language parser ?

Started by: Timothy Madden <terminatorul@gmail.com>
First post: 2013-02-22 18:29 +0200
Last post: 2013-02-23 05:57 -0500
Articles: 6 (6 participants)

Contents

  How to write a language parser ? Timothy Madden <terminatorul@gmail.com> - 2013-02-22 18:29 +0200
    Re: How to write a language parser ? Chris Angelico <rosuav@gmail.com> - 2013-02-23 03:38 +1100
    Re: How to write a language parser ? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-23 00:06 +0000
    Re: How to write a language parser ? mbg1708@planetmail.com - 2013-02-22 17:25 -0800
    Re: How to write a language parser ? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-02-23 02:14 +0000
    Re: How to write a language parser ? Devin Jeanpierre <jeanpierreda@gmail.com> - 2013-02-23 05:57 -0500

#39586 — How to write a language parser ?

From: Timothy Madden <terminatorul@gmail.com>
Date: 2013-02-22 18:29 +0200
Subject: How to write a language parser ?
Message-ID: <51279cfe$0$293$14726298@news.sunsite.dk>

Hello

I am trying to write a DBGp client in Python, to be used for debugging
mostly PHP scripts.

Currently the Xdebug module for PHP allows me to set breakpoints on any
line, including blank ones and lines that are not considered executable,
resulting in breakpoints that will never be hit, even if program flow
control appears to pass through the lines.

To fix that I would like to write a PHP parser, in order to detect the
proper breakpoint line for statements spanning multiple lines.

Is there an (open-source) way to do this in Python code ? Most
parsers I could find after a search are either too simple for a real
programming language, or based on a Python module written in C. My debug
client is a Vim plugin, and I would like to distribute it as script
files only, if that is possible. The generator itself may well be a C
module, as I only distribute the generated output.

The best parser I could find is PLY, and I would like to know if it is 
good enough for the job. My attempt at a bison parser (C only) ended in 
about a hundred conflicts, most of which are difficult to understand, 
although I admit I do not know much about the subject yet.

Are there other parsers you have used for complete languages ?

Thank you,
Timothy Madden



#39588

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-02-23 03:38 +1100
Message-ID: <mailman.2281.1361551115.2939.python-list@python.org>
In reply to: #39586

On Sat, Feb 23, 2013 at 3:29 AM, Timothy Madden <terminatorul@gmail.com> wrote:
> For that I would like to write a php parser, in order to detect the proper
> breakpoint line for statements spanning multiple lines.

Are you able to drop to PHP itself for that? It makes its own lexer
available to user-code:

http://php.net/manual/en/function.token-get-all.php

It's supposed to be able to tell you line numbers, too, though I
haven't actually used that. In theory, you should be able to use
token_get_all, then JSON encode it, and write the whole lot out to
stdout, where Python can pick it up and work with it.
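
A minimal sketch of that pipeline, assuming a `php` executable is on the PATH. The helper names (`php_tokens`, `lines_with_code`, `snap_breakpoint`) and the `NON_CODE` set are made up for illustration. token_get_all returns either a one-character string or an [id, text, line] array, so the PHP side converts ids to names with token_name before encoding. Treating every line that carries a non-comment, non-whitespace token as a breakpoint candidate is an assumption (the lines Xdebug can really break on may be fewer), and moving a breakpoint forward to the next such line is only one possible policy.

```python
import bisect
import json
import subprocess

# PHP code run with `php -r`: tokenize stdin and emit the tokens as JSON,
# with numeric token ids replaced by their names (ids vary across versions).
PHP_SNIPPET = r'''
$tokens = token_get_all(file_get_contents('php://stdin'));
foreach ($tokens as $i => $tok) {
    if (is_array($tok)) {
        $tokens[$i][0] = token_name($tok[0]);
    }
}
echo json_encode($tokens);
'''

# Assumption: tokens of these kinds never make a line "executable".
NON_CODE = {
    "T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT",
    "T_OPEN_TAG", "T_CLOSE_TAG", "T_INLINE_HTML",
}


def php_tokens(source, php="php"):
    """Run PHP's own lexer on `source` and return the decoded token list."""
    result = subprocess.run(
        [php, "-r", PHP_SNIPPET],
        input=source, capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def lines_with_code(tokens):
    """Sorted line numbers on which at least one code token starts."""
    lines = set()
    for tok in tokens:
        # One-character tokens such as ";" arrive as plain strings
        # without a line number, so only array tokens are considered.
        if isinstance(tok, list) and tok[0] not in NON_CODE:
            lines.add(tok[2])
    return sorted(lines)


def snap_breakpoint(requested, code_lines):
    """Move a breakpoint forward to the next line with code, or None."""
    i = bisect.bisect_left(code_lines, requested)
    return code_lines[i] if i < len(code_lines) else None
```

Usage would be along the lines of `snap_breakpoint(12, lines_with_code(php_tokens(open(path).read())))`. Note that a token spanning several lines only reports the line it starts on, so this is a lexer-level approximation, not a parser.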

ChrisA



#39631

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-02-23 00:06 +0000
Message-ID: <512807ff$0$29988$c3e8da3$5496439d@news.astraweb.com>
In reply to: #39586

On Fri, 22 Feb 2013 18:29:42 +0200, Timothy Madden wrote:

[...]
> For that I would like to write a php parser, in order to detect the
> proper breakpoint line for statements spanning multiple lines.
>
> Is there an (open-source) way to do this in python code ?

Try pyparsing:

http://pyparsing.wikispaces.com/



-- 
Steven



#39639

From: mbg1708@planetmail.com
Date: 2013-02-22 17:25 -0800
Message-ID: <ee25ac81-e136-4afe-adf7-461193b3d006@googlegroups.com>
In reply to: #39586

On Friday, February 22, 2013 11:29:42 AM UTC-5, Timothy Madden wrote:
> [...]
> The best parser I could find is PLY, and I would like to know if it is
> good enough for the job. My attempt at a bison parser (C only) ended in
> about a hundred conflicts, most of which are difficult to understand,
> although I admit I do not know much about the subject yet.
>
> Are there other parsers you have used for complete languages ?

Take a look at this whitepaper:
    http://www.cis.upenn.edu/~matuszek/General/recursive-descent-parsing.html

I needed a parser for a chunk of SQL syntax. After trying PyBison and writing crude text analysis in Python, I found this very useful paper. I used the advice in this paper to write my own recursive descent parser in pure Python. The two steps were:

1. Write a yacc grammar (without any actions). This step allowed me to get rid of the various shift/reduce conflicts in my grammar.

2. Use the yacc grammar as a guide for the recursive descent parser. Essentially I wrote one parser function in Python for each yacc production.

The process has the merit that the yacc grammar is known to be robust before you start coding, so the eventual Python code is based on a good design.
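
To illustrate the second step, here is a small sketch for a toy arithmetic grammar (not my SQL grammar; the `Parser` class and `tokenize` helper are invented for this example). Each yacc production is shown as a comment above the function that handles it. One caveat worth stating: yacc rules are usually left-recursive, and a recursive descent function cannot call itself on its leftmost symbol without recursing forever, so those rules become loops.

```python
import re

# A number, or any single non-newline character, after optional whitespace.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Turn text into (kind, value) pairs, ending with an EOF marker."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        elif op.strip():          # skip stray whitespace matches
            tokens.append((op, op))
    tokens.append(("EOF", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.peek()!r}")
        return self.advance()

    def parse(self):
        value = self.expr()
        self.expect("EOF")
        return value

    # expr : expr '+' term | expr '-' term | term ;
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):   # left recursion as a loop
            op, _ = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term : term '*' factor | factor ;
    def term(self):
        value = self.factor()
        while self.peek() == "*":
            self.advance()
            value = value * self.factor()
        return value

    # factor : NUMBER | '(' expr ')' ;
    def factor(self):
        if self.peek() == "NUMBER":
            return self.advance()[1]
        self.expect("(")
        value = self.expr()
        self.expect(")")
        return value
```

This version evaluates as it parses; for a debugger you would more likely return tree nodes carrying line numbers, but the one-function-per-production shape stays the same.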

Good luck.



#39642

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-02-23 02:14 +0000
Message-ID: <mailman.2323.1361585689.2939.python-list@python.org>
In reply to: #39586

On 22/02/2013 16:29, Timothy Madden wrote:
> [...]
> Are there other parsers you have used for complete languages ?

http://nedbatchelder.com/text/python-parsers.html

-- 
Cheers.

Mark Lawrence



#39658

From: Devin Jeanpierre <jeanpierreda@gmail.com>
Date: 2013-02-23 05:57 -0500
Message-ID: <mailman.2337.1361617100.2939.python-list@python.org>
In reply to: #39586

On Fri, Feb 22, 2013 at 9:14 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> http://nedbatchelder.com/text/python-parsers.html

Hm, that list is missing information, e.g. ANTLR 4 doesn't support
Python, and LEPL is dead now.

-- Devin


