| From | matt.timmermans@gmail.com |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: What stage should entities be resolved? Lexical analysis stage? Syntax analysis stage? Semantic analysis stage? |
| Date | 2022-03-12 05:12 -0800 |
| Organization | Compilers Central |
| Message-ID | <22-03-029@comp.compilers> |
| References | <22-03-019@comp.compilers> |
On Wednesday, 9 March 2022 at 15:21:40 UTC-5, Roger L Costello wrote:
> [...]
> Okay, back to XML. Consider this non-well-formed XML:
> <Publisher>Harper&amp;Row</Publsher>
> (The end-tag is misspelled)
> The &amp; is called an "XML entity." An XML parser will convert it to &. The
> other XML entities are: &lt; ... &gt; ... &quot; ... &apos;
> What stage should the entity &amp; be converted to &?
>
> 1. Lexical analysis stage
> 2. Syntax analysis stage
> 3. Semantic analysis stage
> What stage should detect that the <Publisher> start-tag does not have a
> matching end-tag?

Other answers provide a discussion of how you make this decision in general. Specifically for XML, though, these are practical questions.

Re Entities:

- you can't really recognize them in lexical analysis, because they aren't valid everywhere. <T)G> is not a valid tag, and <![CDATA[&amp;]]> has no entities in it. It can be convenient, though, for the lexer to capture them as ENTITY_REFERENCE tokens with their original text (like strings). Where they occur in CDATA sections, phase 3 can convert them back into their original text. Otherwise, the lexer should produce tokens like ENTITY_START, WORD_CHARS, ENTITY_END.

- Regardless of what the lexer produces, the syntax analysis phase ensures that entities only occur in valid locations, and produces a parse tree with enough information to determine how they're handled. This is where <T)G> is rejected as invalid.

- During semantic processing, entities are converted to whatever their appropriate final form is. They will be converted into the indicated characters in strings or element content, or replaced with their original text in CDATA sections.

Re: Tag Matching:

If you include tag matching in syntax, then the syntax is not context-free and cannot be described with a context-free grammar... so don't do that. Practically, only semantic analysis can match the </Publisher> end tag with the preceding <Publisher> start tag.
Unfortunately, that means that your parse tree cannot have the element hierarchy embedded in it. Your grammar cannot have an Element non-terminal.
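
To make the division of labour above concrete, here is a rough Python sketch, not a real XML parser. The names `lex`, `analyze`, `TOKEN_RE` and `PREDEFINED` are made up for illustration, attributes and comments are ignored, and only the five predefined entities are known. Note one simplification that differs from the ENTITY_REFERENCE idea above: the lexer here swallows a whole CDATA section as a single token, so nothing has to be converted back later. The lexer keeps entity references unresolved, and a later pass resolves them and matches end tags against start tags with a stack, since the flat token grammar has no Element non-terminal.

```python
import re

# The five predefined XML entities.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# One named group per alternative, so Match.lastgroup names the token kind.
TOKEN_RE = re.compile(
    r"<!\[CDATA\[(?P<cdata>.*?)\]\]>"      # CDATA section: contents stay literal
    r"|</(?P<end>[A-Za-z_][\w.-]*)\s*>"    # end tag
    r"|<(?P<start>[A-Za-z_][\w.-]*)\s*>"   # start tag (attributes omitted)
    r"|&(?P<entity>[A-Za-z_][\w.-]*);"     # entity reference, left unresolved
    r"|(?P<text>[^<&]+)",                  # plain character data
    re.DOTALL,
)

def lex(source):
    """Lexical phase: yield (KIND, value) pairs; entities keep their name."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {source[pos]!r}")
        kind = m.lastgroup
        yield kind.upper(), m.group(kind)
        pos = m.end()

def analyze(tokens):
    """Semantic phase: resolve entities and match tags with an explicit stack.

    Returns a list of children; an element is a (name, children) tuple.
    """
    root = ("#document", [])
    stack = [root]
    for kind, value in tokens:
        children = stack[-1][1]
        if kind == "START":
            node = (value, [])
            children.append(node)
            stack.append(node)
        elif kind == "END":
            # The end tag must name the innermost element that is still open.
            if len(stack) == 1 or stack[-1][0] != value:
                open_name = stack[-1][0] if len(stack) > 1 else None
                raise ValueError(
                    f"end tag </{value}> does not match open element {open_name!r}"
                )
            stack.pop()
        elif kind == "ENTITY":
            if value not in PREDEFINED:
                raise ValueError(f"undefined entity &{value};")
            children.append(PREDEFINED[value])
        else:
            # TEXT or CDATA: already in final form.
            children.append(value)
    if len(stack) > 1:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return root[1]
```

With this sketch, `analyze(lex("<Publisher>Harper&amp;Row</Publisher>"))` builds one Publisher element whose content contains the resolved `&`, while the misspelled `</Publsher>` from the question gets through the lexer without complaint and is only caught by the stack check in `analyze`.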