| Started by | Roger L Costello <costello@mitre.org> |
|---|---|
| First post | 2022-01-22 23:54 +0000 |
| Last post | 2022-01-23 06:58 -0800 |
| Articles | 4 — 4 participants |
Does the theory and algorithms of compiler design also apply to data formats? Roger L Costello <costello@mitre.org> - 2022-01-22 23:54 +0000
Re: Does the theory and algorithms of compiler design also apply to data formats? gah4 <gah4@u.washington.edu> - 2022-01-22 20:33 -0800
Re: Does the theory and algorithms of compiler design also apply to data formats? Thomas Koenig <tkoenig@netcologne.de> - 2022-01-23 21:05 +0000
Re: Does the theory and algorithms of compiler design also apply to data formats? "matt.ti...@gmail.com" <matt.timmermans@gmail.com> - 2022-01-23 06:58 -0800
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2022-01-22 23:54 +0000 |
| Subject | Does the theory and algorithms of compiler design also apply to data formats? |
| Message-ID | <22-01-100@comp.compilers> |
Hello Compiler Experts!

The books that I've read always talk about applying compiler theory and algorithms to programming languages. But there are other kinds of languages such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such as JPEG, Powerpoint (ppt), Excel (xls) also languages? Does the rich theory and vast algorithms of compilers apply to these non-programming languages? Has anyone created a Bison parser for JPEG? For JSON? For CSV?

/Roger

[You could, but for the most part their syntax is so simple that a formal parser would be overkill. For example, JSON has a handful of atoms and only two data structures, a sequential list and a key:value object. Everything else is the semantics. The Microsoft formats like docx, xlsx, and pptx are in fact zip files containing XML files. Unzip one and take a look. Also look at XDR, a widely used network data format, and rpcgen, which compiles an XDR description into code to read and write it. -John]
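To make the moderator's point about JSON concrete (a handful of atoms plus a list and a key:value object), here is a minimal hand-written recursive-descent sketch in Python. It is not from the thread and is not a complete JSON implementation: the function names are invented for illustration, and strings containing backslash escapes are deliberately rejected to keep it short.

```python
import re

# Illustrative sketch only: a tiny recursive-descent parser for a JSON subset.
# Assumption: strings contain no backslash escapes (real JSON allows them).
_TOKEN = re.compile(r'''
    \s*(?:
        (?P<punct>[{}\[\],:])
      | (?P<string>"[^"\\]*")
      | (?P<number>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)
      | (?P<word>true|false|null)
    )''', re.VERBOSE)

_WORDS = {"true": True, "false": False, "null": None}


def tokenize(text):
    """Yield (kind, text) pairs; kind is punct, string, number, or word."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"bad input at offset {pos}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


def parse(text):
    tokens = list(tokenize(text))
    value, i = _value(tokens, 0)
    if i != len(tokens):
        raise ValueError("trailing tokens")
    return value


def _expect(tokens, i, tok):
    if i >= len(tokens) or tokens[i] != ("punct", tok):
        raise ValueError(f"expected {tok!r}")
    return i + 1


def _value(tokens, i):
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    kind, tok = tokens[i]
    if kind == "string":                      # atom
        return tok[1:-1], i + 1
    if kind == "number":                      # atom
        is_float = any(c in tok for c in ".eE")
        return (float(tok) if is_float else int(tok)), i + 1
    if kind == "word":                        # atoms: true, false, null
        return _WORDS[tok], i + 1
    if tok == "[":                            # structure 1: sequential list
        return _array(tokens, i + 1)
    if tok == "{":                            # structure 2: key:value object
        return _object(tokens, i + 1)
    raise ValueError(f"unexpected token {tok!r}")


def _array(tokens, i):
    items = []
    if i < len(tokens) and tokens[i] == ("punct", "]"):
        return items, i + 1
    while True:
        item, i = _value(tokens, i)
        items.append(item)
        if i < len(tokens) and tokens[i] == ("punct", ","):
            i += 1
            continue
        return items, _expect(tokens, i, "]")


def _object(tokens, i):
    obj = {}
    if i < len(tokens) and tokens[i] == ("punct", "}"):
        return obj, i + 1
    while True:
        if i >= len(tokens) or tokens[i][0] != "string":
            raise ValueError("expected string key")
        key = tokens[i][1][1:-1]
        i = _expect(tokens, i + 1, ":")
        value, i = _value(tokens, i)
        obj[key] = value
        if i < len(tokens) and tokens[i] == ("punct", ","):
            i += 1
            continue
        return obj, _expect(tokens, i, "}")
```

Calling `parse('{"a": [1, 2.5, true]}')` returns an ordinary Python dict. The whole grammar fits in three mutually recursive functions, which is roughly why a generated parser tends to be more machinery than the format needs.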
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-01-22 20:33 -0800 |
| Message-ID | <22-01-102@comp.compilers> |
| In reply to | #2862 |
On Saturday, January 22, 2022 at 5:54:52 PM UTC-8, Roger L Costello wrote:

> The books that I've read always talk about applying compiler theory and
> algorithms to programming languages. But there are other kinds of languages
> such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such
> as JPEG, Powerpoint (ppt), Excel (xls) also languages? Does the rich theory
> and vast algorithms of compilers apply to these non-programming languages? Has
> anyone created a Bison parser for JPEG? For JSON? For CSV?

In the cases where a data format has enough structure to be parsable with compiler tools, it is usually named a programming language. (Unless you define programming language as only something that can be converted into executable object code for actual hardware.)

JPEG files are actually EXIF files containing JPEG image data. The EXIF part contains other information such as date, time, shutter speed, and pretty much anything related to the camera and settings that one could think of.

Many data formats are the simplest format for the internal data structures for some program.

PostScript is a programming language designed for controlling printers, but it does have many of the characteristics of a more general purpose language. It is mostly meant to be written by programs, but can be written by people. Some PostScript programs contain macros to parse data inside the file and format it for output, such as plots.

TeX is a document description language that also has many general language features. It is pretty much not parsable with compiler tools, as just about everything can be changed inside the program, such as which characters are letters. Since changes take effect right away, the parser can't do too much look ahead.

Metafont is a language, meant to be used with TeX, for designing fonts. It looks and works more like a programming language, though with some features that usual programming languages don't have. Among others, instead of the usual assignment statement, it defines the relationships between variables more generally.

In all these cases, and I am sure more, the difference between data and program blurs just enough.
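The remark above about TeX, where the set of "letter" characters can change mid-input and takes effect right away, can be illustrated with a toy tokenizer. This sketch is not TeX and does not model its category codes: the `!c` directive (declare character `c` a letter from here on) and the function name are invented purely for illustration.

```python
def tokenize(text):
    """Toy tokenizer whose character classes can change while it runs.

    Hypothetical directive: '!c' makes character c a letter from that
    point onward. Words are maximal runs of letters; any other
    non-space character is a one-character token.
    """
    letters = set("abcdefghijklmnopqrstuvwxyz")
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "!" and i + 1 < len(text):
            letters.add(text[i + 1])  # takes effect immediately
            i += 2
        elif ch in letters:
            j = i
            while j < len(text) and text[j] in letters:
                j += 1
            tokens.append(text[i:j])
            i = j
        elif ch.isspace():
            i += 1
        else:
            tokens.append(ch)
            i += 1
    return tokens


# The same characters tokenize differently before and after the directive.
print(tokenize("ab_cd !_ ab_cd"))
```

Because the directive changes how later characters are classified, the input cannot be split into tokens ahead of time by a fixed lexer specification. The tokenizer has to process the text strictly in order, which is the look-ahead limitation described for TeX.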
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2022-01-23 21:05 +0000 |
| Message-ID | <22-01-108@comp.compilers> |
| In reply to | #2863 |
gah4 <gah4@u.washington.edu> wrote:

> In the cases where a data format has enough structure to be parsable with
> compiler tools, it is usually named a programming language.

I think STEP (the CAD graphics format) is an exception. A language called EXPRESS (specified in something like BNF) is used to specify a "schema", and this specification can then be used to write parsers for the actual file.

All of this is specified in standards which are quite expensive. When I had occasion to write out CAD data from programs I wrote myself, I looked at this workflow for an hour and decided to use IGES instead.
| From | "matt.ti...@gmail.com" <matt.timmermans@gmail.com> |
|---|---|
| Date | 2022-01-23 06:58 -0800 |
| Message-ID | <22-01-104@comp.compilers> |
| In reply to | #2862 |
On Saturday, 22 January 2022 at 20:54:52 UTC-5, Roger L Costello wrote:

> Hello Compiler Experts!
>
> The books that I've read always talk about applying compiler theory and
> algorithms to programming languages. But there are other kinds of languages
> such as XML, JSON, Comma-Separated-Values (CSV). And aren't data formats such
> as JPEG, Powerpoint (ppt), Excel (xls) also languages? Does the rich theory
> and vast algorithms of compilers apply to these non-programming languages? Has
> anyone created a Bison parser for JPEG? For JSON? For CSV?

As the moderator indicates, these kinds of data formats are designed to be simple, and so it's not usually useful to use grammar-based parser generators for the data format itself. SGML is a notable exception to this. The standard that defines it is large and its grammar is complicated. It wouldn't be crazy to use a parser generator for XML either.

For a lot of these data formats, though, you can apply schemas of some sort to the data (SGML DTDs, XML schema, JSON schema, etc.), and when the data is anticipated to represent a *document*, as in SGML or XML, these schemas are basically a graph of nested regular expressions much like a grammar, and a lot of parsing theory applies.

Furthermore, document *processing*, as in generating a printed manual from the structured document that defines its parts, involves applying rules to structures that are recognized in the content. This is syntax-directed translation (https://en.wikipedia.org/wiki/Syntax-directed_translation), and all the related compiler theory applies. In some ways it is easier, because the content you're translating is a tree instead of flat text, but in some ways it is more difficult, because the job is to implement a manual human process instead of a language that was designed to be parsed.
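As a rough illustration of applying rules to recognized structures in a document tree, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element names (`manual`, `section`, `title`, `para`), the sample document, and the output format are all made up for the example. It shows one simple way to attach a translation rule to each kind of node, not how any particular publishing system works.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented.
DOC = (
    "<manual><title>Widget Guide</title>"
    "<section><title>Setup</title><para>Plug it in.</para></section>"
    "</manual>"
)


def children(elem, depth):
    """Translate every child element in order and concatenate the results."""
    return "".join(translate(child, depth) for child in elem)


# One translation rule per element type, analogous to the semantic action
# attached to a grammar production in syntax-directed translation.
RULES = {
    "manual": lambda e, d: children(e, d + 1),
    "section": lambda e, d: "\n" + children(e, d + 1),
    "title": lambda e, d: "=" * d + " " + e.text + "\n",
    "para": lambda e, d: e.text + "\n",
}


def translate(elem, depth=0):
    """Apply the rule for this element; nesting depth is passed downward."""
    rule = RULES.get(elem.tag)
    if rule is None:
        raise ValueError(f"no rule for element <{elem.tag}>")
    return rule(elem, depth)


print(translate(ET.fromstring(DOC)))
```

The parser has already produced the tree, so the rules only decide what to emit for each node. Here `depth` acts like an inherited attribute passed from parent to child, and the returned text like a synthesized one.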