
Re: What stage should entities be resolved? Lexical analysis stage? Syntax analysis stage? Semantic analysis stage?

From matt.timmermans@gmail.com
Newsgroups comp.compilers
Subject Re: What stage should entities be resolved? Lexical analysis stage? Syntax analysis stage? Semantic analysis stage?
Date 2022-03-12 05:12 -0800
Organization Compilers Central
Message-ID <22-03-029@comp.compilers> (permalink)
References <22-03-019@comp.compilers>



On Wednesday, 9 March 2022 at 15:21:40 UTC-5, Roger L Costello wrote:
> [...]
> Okay, back to XML. Consider this non-well-formed XML:
> <Publisher>Harper&amp;Row</Publsher>
> (The end-tag is misspelled)
> The &amp; is called an "XML entity." An XML parser will convert it to &. The
> other XML entities are: &lt; ... &gt; ... &quot; ... &apos;
> What stage should the entity &amp; be converted to &?
>
> 1. Lexical analysis stage
> 2. Syntax analysis stage
> 3. Semantic analysis stage
> What stage should detect that the <Publisher> start-tag does not have a
> matching end-tag?

Other answers provide a discussion of how you make this decision in general.

Specifically for XML, though, these are practical questions.

Re Entities:
- You can't really recognize them in lexical analysis, because they aren't
valid everywhere.  <T&#41;G> is not a valid tag, and <![CDATA[&amp;]]> has no
entities in it.  It can be convenient, though, for the lexer to capture them
as ENTITY_REFERENCE tokens that carry their original text (like strings).
Where they occur in CDATA sections, phase 3 (semantic analysis) can convert
them back into their original text.  Otherwise, the lexer should produce
tokens like ENTITY_START, WORD_CHARS, ENTITY_END.
- Regardless of what the lexer produces, the syntax analysis phase ensures
that entities only occur in valid locations, and produces a parse tree with
enough information to determine how they're handled.  This is where <T&#41;G>
is rejected as invalid.
- During semantic processing, entities are converted to whatever their
appropriate final form is.  They will be converted into the indicated
characters in strings or element content, or replaced with their original text
in CDATA sections.
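
To make the split concrete, here is a minimal sketch in Python of the first option above: the lexer keeps each reference as an ENTITY_REFERENCE token with its original text, and a later pass decides what to do with it. It only covers character data, not tags or attributes. Apart from ENTITY_REFERENCE, the token kinds (TEXT, CDATA_START, CDATA_END) and the function names lex, decode and resolve are assumptions made up for illustration, not anything a real XML parser exposes.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, original text)

# Hypothetical token kinds; only ENTITY_REFERENCE is named in the post.
TOKEN_RE = re.compile(
    r"(?P<CDATA_START><!\[CDATA\[)"
    r"|(?P<CDATA_END>\]\]>)"
    r"|(?P<ENTITY_REFERENCE>&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);)"
    r"|(?P<TEXT>[^&<\]]+|.)",
    re.DOTALL,
)

# The five predefined XML entities.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def lex(source: str) -> List[Token]:
    """Lexical phase: split into tokens, keeping every token's original text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def decode(reference: str) -> str:
    """Turn '&amp;', '&#41;' or '&#x29;' into the character it stands for."""
    name = reference[1:-1]  # strip the leading '&' and trailing ';'
    if name.startswith("#x"):
        return chr(int(name[2:], 16))
    if name.startswith("#"):
        return chr(int(name[1:]))
    return PREDEFINED[name]  # KeyError for an entity this sketch does not know


def resolve(tokens: List[Token]) -> str:
    """Semantic phase: decode references in content, keep them raw in CDATA."""
    out = []
    in_cdata = False
    for kind, text in tokens:
        if kind == "CDATA_START" and not in_cdata:
            in_cdata = True
        elif kind == "CDATA_END" and in_cdata:
            in_cdata = False
        elif kind == "ENTITY_REFERENCE" and not in_cdata:
            out.append(decode(text))
        else:
            out.append(text)  # plain text, or a "reference" inside CDATA
    return "".join(out)


print(resolve(lex("Harper&amp;Row")))     # Harper&Row
print(resolve(lex("<![CDATA[&amp;]]>")))  # &amp;
```

The lexer never needs to know whether it is inside a CDATA section; that decision is deferred to the pass that has the context.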

Re Tag Matching:
If you include tag matching in syntax, then the syntax is not context-free and
cannot be described with a context-free grammar, because element names are
arbitrary and the end tag would have to repeat whatever name the start tag
used... so don't do that.  Practically, only semantic analysis can match the
</Publisher> end tag with the preceding <Publisher> start tag.

Unfortunately, that means that your parse tree cannot have the element
hierarchy embedded in it.  Your grammar cannot have an Element non-terminal.
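
As a rough illustration of what that semantic pass could look like, here is a sketch in Python. It assumes the parser hands over a flat sequence of (kind, value) events, where kind is "START", "END" or "TEXT"; that event format, the Element class, MismatchedTag and build_tree are all hypothetical names chosen for this example. A stack does the matching and builds the element hierarchy that the grammar could not express.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple, Union


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


class MismatchedTag(Exception):
    """Raised when an end tag does not match the innermost open start tag."""


def build_tree(events: Iterable[Tuple[str, str]]) -> Element:
    """Match start and end tags with a stack and build the element tree."""
    root = Element("#document")  # synthetic root so the stack is never empty
    stack = [root]
    for kind, value in events:
        if kind == "START":
            element = Element(value)
            stack[-1].children.append(element)
            stack.append(element)
        elif kind == "END":
            if len(stack) == 1:
                raise MismatchedTag(f"</{value}> has no open start tag")
            if stack[-1].name != value:
                raise MismatchedTag(
                    f"</{value}> does not match <{stack[-1].name}>"
                )
            stack.pop()
        else:  # "TEXT": character data belongs to the innermost open element
            stack[-1].children.append(value)
    if len(stack) > 1:
        raise MismatchedTag(f"<{stack[-1].name}> is never closed")
    return root


# The example from the question, with entities already resolved.
events = [("START", "Publisher"), ("TEXT", "Harper&Row"), ("END", "Publsher")]
try:
    build_tree(events)
except MismatchedTag as error:
    print(error)  # </Publsher> does not match <Publisher>
```

With this arrangement the grammar only has to recognize that each tag is individually well-formed; whether the tags nest properly is checked here.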



