From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: Lexical Elements
Date: Fri, 29 Sep 2017 08:47:29 -0700

David Kleinecke writes:
> On Thursday, September 28, 2017 at 6:40:08 PM UTC-7, Keith Thompson wrote:
[...]
>> I suspect I really don't understand what you're saying.
>>
>> Here's an example.  C11 6.8.4.2p1 specifies the following constraint:
>>
>>     The controlling expression of a switch statement shall have integer
>>     type.
>>
>> Unless you extend the meaning of "syntax" beyond (my) recognition, you
>> won't be able to enforce that constraint using only syntax information.
>>
>> Roughly, a syntax error is a failure to parse the source code in
>> accordance with a grammar that can be defined, for example, in BNF
>> (Backus-Naur Form).  (C's treatment of typedefs punches a small hole in
>> this model.)  Any errors that are not syntax errors are what I think of
>> as semantic errors.  In C, this is more or less expressed as syntax
>> rules vs. constraints.
>
> I understand syntax, perhaps, like a linguist would. In
> all the linguistic work I know what fills a slot (a slot
> like the identifier slot in a switch statement) can be
> sub-categorized to a subset of all identifiers. In this
> case the identifier must have the attribute "integer".
> The token (already identified as an identifier) is further
> sub-categorized by being declared, for example, an "int".

I'm not a linguist, but I suspect that you're looking at syntax in a way
that's not useful for analyzing C.

There is no "identifier slot" in a switch statement.  What follows the
"switch" keyword is an expression.  That expression can be arbitrarily
complex, but it must be of some integer type.  The attribute "integer",
if you have such a thing, would need to apply to (your internal
representation of) that expression, not to some identifier.

> The source of my concern about what a token actually is
> comes from this accumulation of additional attributes -
> which would include constant and volatile as well as
> static/extern.

In a typical compiler design, input is split into tokens by the "lexer".
Each token is derived from a sequence of characters in the source.  One
kind of token is an identifier.  At the token level, an identifier has
no type and does not refer to any declaration; those concepts occur
later in processing.  All you need to know about it at that point is
that it's an identifier and how it's spelled.

The stream of tokens is consumed by the parser, which does all the
syntactic analysis.  The parser might build some data structure that
reflects the syntax (declarations, statements, function definitions,
etc.).  That structure may or may not use the token structures built by
the lexer, but it will at least need to annotate them with extra
information.

But as for const/volatile/static/extern, those are attributes that
should apply to a declaration, not to an identifier.  The parser would
figure out, for example, which declaration a given occurrence of an
identifier refers to.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
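
To make the lexer description above concrete, here is a minimal sketch in
C of a token that carries only a kind and a spelling.  All the names in it
(token_kind, struct token, is_keyword, next_token) are invented for this
illustration, the keyword list is deliberately tiny, and a real C lexer
handles far more (multi-character punctuators, string literals,
preprocessing, and so on).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; a real C lexer distinguishes many more. */
enum token_kind { TOK_EOF, TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_PUNCT };

/* At the lexical level a token is just a kind plus a spelling.
   It has no type and no link to any declaration. */
struct token {
    enum token_kind kind;
    const char *start;  /* points into the source text */
    size_t length;      /* number of characters in the spelling */
};

/* Tiny illustrative keyword table, not the full C keyword set. */
static int is_keyword(const char *s, size_t n)
{
    static const char *const keywords[] = { "switch", "case", "int", "double" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    }
    return 0;
}

/* Scan one token starting at *src and advance *src past it. */
struct token next_token(const char **src)
{
    assert(src != NULL && *src != NULL);
    const char *p = *src;

    while (isspace((unsigned char)*p))
        p++;

    struct token t = { TOK_EOF, p, 0 };
    if (*p == '\0') {
        /* end of input: kind stays TOK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.length = (size_t)(p - t.start);
        t.kind = is_keyword(t.start, t.length) ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.length = (size_t)(p - t.start);
        t.kind = TOK_NUMBER;
    } else {
        /* simplification: every other character is a one-char punctuator */
        p++;
        t.length = 1;
        t.kind = TOK_PUNCT;
    }
    *src = p;
    return t;
}
```

Calling next_token repeatedly on "switch (x + 1)" yields a keyword, a
punctuator, an identifier spelled "x", and so on.  Nothing in the result
says whether x was declared as int or as double; in this sketch, as in the
design described above, that information would have to be attached later,
to the expression, by the parser or a subsequent pass.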