Re: A simpler way to tokenize and parse?

From	Kaz Kylheku <864-117-4973@kylheku.com>
Newsgroups	comp.compilers
Subject	Re: A simpler way to tokenize and parse?
Date	2023-03-26 18:19 +0000
Organization	A noiseless patient Spider
Message-ID	<23-03-026@comp.compilers>
References	<23-03-011@comp.compilers> <23-03-019@comp.compilers>


On 2023-03-25, Lieven Marchand <mal@wyrd.be> wrote:
> Roger L Costello <costello@mitre.org> writes:
>
>> I have done some work with Flex and Bison and recently I've done some work
>> with building parsers using read. My experience is the latter is much easier.
>> Why isn't read more widely discussed and used in the compiler community?
>> Surely the concept that read embodies is not specific to Lisp and Scheme,
>> right?
>
> Apart from the already mentioned problem that it forces you into a
> syntax that a lot of people don't like, there's also the problem that
> you have to deal with hostile input. Where you expect "(+ 2 3)" someone
> will enter "(+ 2 3 #.(progn (launch-the-nukes) 4))". A lot of security

Not every Lisp dialect has hash-dot read-time evaluation; that's
a feature of Common Lisp, disabled by setting/binding *read-eval*
to nil. I don't seem to recall that Scheme has it. I deliberately
kept it out of TXR Lisp.

However, that doesn't disable compile-time evaluation in macros,
which kicks in if you feed the read code to the compile function.
compile must be regarded the same as eval from a security POV.
We are seeing compile-time evaluation in newer languages,
though.
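
A rough sketch of the compile-time point, assuming a conforming Common
Lisp. The names *side-effect*, *form* and *fn* are made up for
illustration, and the flag merely stands in for something malicious.
The input is read with *read-eval* bound to nil, yet the body of the
macrolet expander still runs as soon as the form is handed to compile:

```lisp
;; Hypothetical flag standing in for "malice".
(defvar *side-effect* nil)

;; Read the untrusted text with hash-dot disabled. Nothing runs here;
;; the result is plain list structure.
(defparameter *form*
  (let ((*read-eval* nil))
    (read-from-string
     "(lambda ()
        (macrolet ((boom () (setf *side-effect* t) 4))
          (+ 2 3 (boom))))")))

;; Compiling expands the (boom) macro call, which runs the expander body.
(defparameter *fn* (compile nil *form*))

;; *side-effect* is now T even though *fn* has never been called.
;; Calling it afterwards just returns the sum:
(funcall *fn*)   ; => 9
```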

It's not a bona fide security issue, except in applications that
dynamically compile untrusted input. Since the aim is almost
always to execute that input, it makes little difference whether the
malice happens at compile time or run time. Both have to be sandboxed.

When you're building an open-source program, it's a given that
you're running its code: shell scripts, make files or what
have you. It doesn't need read-time evaluation to perpetrate
malice.

> problems in real world settings come from not correctly validating
> inputs and by the time you have worked around all these problems read
> isn't all that easy anymore. C for example has a somewhat similar
> facility scanf that tries to pattern match input and is also considered
> unsafe.

scanf is unsafe, but not in the way that hash-dot read-time evaluation
is unsafe. The situations are not comparable.

scanf doesn't feature a documented, reliable scan-time programming language
that the *input* can use to extend the program which calls scanf,
and which can be turned off by a flag.

You have to exploit a buffer overflow or whatever, enabled by careless
use of scanf.

All they have in common is that calling read on untrusted input without
disabling *read-eval* is a kind of careless use of read.

But setting *read-eval* to nil is something you may be able to do in
just one place in the entire application. (Anything which needs
*read-eval* can opt in using (let ((*read-eval* t)) ...) around its
calls to read.)
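
Something along these lines, as a minimal sketch. The helper names
read-trusted and read-untrusted are invented here, and the global setf
is assumed to live in the application's start-up code:

```lisp
;; Done once for the whole application.
(setf *read-eval* nil)

;; Hypothetical helper for the rare caller that needs #. on trusted data:
;; it opts back in with a local binding only.
(defun read-trusted (string)
  (let ((*read-eval* t))
    (values (read-from-string string))))

;; Hypothetical helper for everything else. It relies on the global
;; default; #. then signals a reader-error, which is mapped to :rejected.
;; (Other conditions, such as end-of-file on truncated input, are not
;; handled in this sketch.)
(defun read-untrusted (string)
  (handler-case (values (read-from-string string))
    (reader-error () :rejected)))

(read-trusted "#.(+ 2 3)")                                 ; => 5
(read-untrusted "(+ 2 3)")                                 ; => (+ 2 3)
(read-untrusted "(+ 2 3 #.(progn (launch-the-nukes) 4))")  ; => :REJECTED
```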

> A good rule of thumb for production ready software is to define
> a grammar for valid input and provide a validating parser.

Sure, if you want to waste your time defining grammars and
writing validating parsers.

This is no longer done that much outside of the Lisp world. People use XML,
JSON, YAML, ..., whose grammar they definitely didn't design or
implement, and validate the content/shape of the object that comes out.
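
The same approach works with read: let the reader do the parsing, then
check the shape of what came out. A sketch, assuming the only acceptable
input is integer arithmetic over +, - and *. The functions valid-expr-p
and parse-expr and the depth and argument limits are arbitrary choices
for illustration, not anything standard:

```lisp
;; True if OBJ is an integer, or a proper list (op arg...) where op is
;; one of + - * and every arg is itself valid. The depth limit and the
;; limit of 8 arguments also stop circular structure built with #n=.
(defun valid-expr-p (obj &optional (depth 6))
  (cond ((integerp obj) t)
        ((or (atom obj) (zerop depth)) nil)
        ((not (member (car obj) '(+ - *))) nil)
        (t (do ((tail (cdr obj) (cdr tail))
                (n 0 (1+ n)))
               ((atom tail) (null tail))   ; a dotted tail is rejected
             (unless (and (< n 8)
                          (valid-expr-p (car tail) (1- depth)))
               (return nil))))))

;; Read with hash-dot disabled, then validate the resulting object.
(defun parse-expr (string)
  (let* ((*read-eval* nil)
         (obj (handler-case (values (read-from-string string))
                (error () :unreadable))))
    (if (valid-expr-p obj) obj :rejected)))

(parse-expr "(+ 2 (* 3 4))")             ; => (+ 2 (* 3 4))
(parse-expr "(+ 2 (launch-the-nukes))")  ; => :REJECTED
(parse-expr "(+ 2 . 3)")                 ; => :REJECTED
(parse-expr "(+ 2 #.(* 3 4))")           ; => :REJECTED
```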

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca


