| From | Kaz Kylheku <864-117-4973@kylheku.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: A simpler way to tokenize and parse? |
| Date | 2023-03-26 18:19 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <23-03-026@comp.compilers> |
| References | <23-03-011@comp.compilers> <23-03-019@comp.compilers> |
On 2023-03-25, Lieven Marchand <mal@wyrd.be> wrote:
> Roger L Costello <costello@mitre.org> writes:
>
>> I have done some work with Flex and Bison and recently I've done some work
>> with building parsers using read. My experience is the latter is much easier.
>> Why isn't read more widely discussed and used in the compiler community?
>> Surely the concept that read embodies is not specific to Lisp and Scheme,
>> right?
>
> Apart from the already mentioned problem that it forces you into a
> syntax that a lot of people don't like, there's also the problem that
> you have to deal with hostile input. Where you expect "(+ 2 3)" someone
> will enter "(+ 2 3 #.(progn (launch-the-nukes) 4))". A lot of security

Not every Lisp dialect has hash-dot read-time evaluation; that's a
feature of Common Lisp, disabled by setting/binding *read-eval* to nil.

I don't seem to recall that Scheme has it. I deliberately kept it out of
TXR Lisp.

However, that doesn't disable compile-time evaluation in macros, which
kicks in if you feed the read code to the compile function. compile must
be regarded the same as eval from a security POV.

We are seeing compile-time evaluation in newer languages, though.

It's not a bona-fide security issue, except in applications that
dynamically compile untrusted input. Since the aim is almost always to
execute the code, it matters little whether the malice happens at compile
time or run time. Both have to be sandboxed.

When you're building an open-source program, it's a given that you're
running its code: shell scripts, make files or what have you. It doesn't
need read-time evaluation to perpetrate malice.

> problems in real world settings come from not correctly validating
> inputs and by the time you have worked around all these problems read
> isn't all that easy anymore. C for example has a somewhat similar
> facility scanf that tries to pattern match input and is also considered
> unsafe.

scanf is unsafe, but not in the way that hash-dot read-time evaluation
is unsafe.

The situations are not comparable. scanf doesn't feature a documented,
reliable scan-time programming language that the *input* can use to
extend the program which calls scanf, and which can be turned off by a
flag. You have to exploit a buffer overflow or whatever, enabled by
careless use of scanf.

All they have in common is that calling read on untrusted input without
disabling *read-eval* is a kind of careless use of read.

But setting *read-eval* to nil is something you may be able to do in
just one place in the entire application. (Anything which needs
*read-eval* can opt in using (let ((*read-eval* t)) ...) around its
calls to read.)

> A good rule of thumb for production ready software is to define
> a grammar for valid input and provide a validating parser.

Sure, if you want to waste your time defining grammars and writing
validating parsers. This is no longer done that much outside of the Lisp
world. People use XML, JSON, YAML, ..., whose grammar they definitely
didn't design or implement, and validate the content/shape of the object
that comes out.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
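
To make the last point concrete, here is a minimal sketch in Python of "let a stock parser do the reading, then validate the shape of what comes out." It is an illustration only, not anything from the thread: the JSON encoding of "(+ 2 3)" as `{"op": "+", "args": [2, 3]}`, the `ALLOWED_OPS` whitelist, and the names `validate_expr` and `parse_untrusted` are all assumptions made up for this example. Only `json.loads` and `json.JSONDecodeError` come from the standard library.

```python
import json

# Hypothetical whitelist of operators; not taken from the post.
ALLOWED_OPS = {"+", "-", "*"}


def validate_expr(node):
    """Return True if node is a number or a well-formed {"op", "args"} object."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(node, bool):
        return False
    if isinstance(node, (int, float)):
        return True
    # Anything else must be an object with exactly the keys "op" and "args".
    if not isinstance(node, dict) or set(node) != {"op", "args"}:
        return False
    op, args = node["op"], node["args"]
    if not isinstance(op, str) or op not in ALLOWED_OPS:
        return False
    # Arguments are validated recursively, so nested expressions are allowed.
    return (
        isinstance(args, list)
        and len(args) > 0
        and all(validate_expr(arg) for arg in args)
    )


def parse_untrusted(text):
    """Parse text with the stock JSON parser, then check the resulting shape."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from exc
    if not validate_expr(obj):
        raise ValueError("JSON parsed, but its shape is not a valid expression")
    return obj
```

Under these assumptions, `parse_untrusted('{"op": "+", "args": [2, 3]}')` returns the parsed object, while an unknown operator or non-JSON text such as `(+ 2 3)` raises `ValueError`. The grammar work is delegated entirely to the JSON parser; the application code only inspects the resulting data.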