From: Thomas Koenig
Newsgroups: comp.compilers
Subject: Re: What does it mean to "move characters" in the lexer?
Date: Wed, 22 Jun 2022 11:45:22 -0000 (UTC)
Organization: news.netcologne.de
Message-ID: <22-06-071@comp.compilers>
References: <22-06-057@comp.compilers> <22-06-058@comp.compilers> <22-06-064@comp.compilers> <22-06-066@comp.compilers>
Keywords: parse, performance, parallel

Kaz Kylheku <480-992-1380@kylheku.com> wrote:
> I remember reading some article some years ago whereby some Javascript
> programmer discovered it was faster to read JSON from a file using
> dedicated JSON routines available in Javascript, than to declare the
> same syntax in the Javascript program as a literal and let it be
> scanned along with the program and available to it that way.

This came up on comp.arch recently. There is an insanely fast JSON
parser and UTF-8 validator based on SIMD to be found at
https://github.com/simdjson/simdjson . It selects a different
vector length according to the CPU it finds.

The algorithm is described at https://arxiv.org/pdf/1902.08318.pdf.
It relies heavily on special-casing for JSON and for the SIMD
instructions that are available.

A general SIMD-based parser generator is likely to be even harder
to write and will probably not outperform the package above (nor,
for that matter, a traditional character-at-a-time approach).

Is there research on this?
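
To make the contrast between a character-at-a-time scan and a wide scan a bit more concrete, here is a minimal sketch. It is not simdjson's code and not the algorithm from the paper; it only illustrates the general idea of classifying several input bytes per step and collecting the result as a bitmask. It uses plain 64-bit integer tricks (SWAR) instead of real SIMD instructions, and Python only to show the bit manipulation, so it demonstrates the logic and not the speedup. The function names `quote_mask_scalar` and `quote_mask_swar` and all constants are made up for this illustration.

```python
# Hypothetical illustration: find every '"' byte in a buffer and report the
# positions as a bitmask (bit i set means data[i] == '"').

ONES = 0x0101010101010101    # 0x01 in each of the eight byte lanes
LOW7 = 0x7F7F7F7F7F7F7F7F    # low seven bits of each lane
HIGH = 0x8080808080808080    # top bit of each lane
GATHER = 0x0102040810204080  # multiplier that packs the lane flags into one byte
QUOTE = 0x22                 # ASCII code of '"'


def quote_mask_scalar(data: bytes) -> int:
    """Character-at-a-time reference: one comparison per input byte."""
    mask = 0
    for i, b in enumerate(data):
        if b == QUOTE:
            mask |= 1 << i
    return mask


def quote_mask_swar(data: bytes) -> int:
    """Same result, but eight bytes are classified per loop iteration."""
    mask = 0
    for off in range(0, len(data), 8):
        # Pad the final chunk with NUL bytes, which can never match '"'.
        chunk = data[off:off + 8].ljust(8, b"\0")
        word = int.from_bytes(chunk, "little")
        # Lanes holding '"' become 0x00, every other lane becomes non-zero.
        x = word ^ (QUOTE * ONES)
        # Top bit set in every non-zero lane; the add cannot carry across lanes.
        nonzero = (((x & LOW7) + LOW7) | x) & HIGH
        # A 1 in the lowest bit of each lane that matched.
        hits = (nonzero ^ HIGH) >> 7
        # Pack the eight lane flags into the low eight bits, lane 0 first.
        lane_bits = ((hits * GATHER) >> 56) & 0xFF
        mask |= lane_bits << off
    return mask


if __name__ == "__main__":
    doc = b'{"key": "value", "n": [1, 2]}'
    print(bin(quote_mask_scalar(doc)))
    print(quote_mask_swar(doc) == quote_mask_scalar(doc))
```

Even in this toy form the wide version needs lane-specific constants and a padding rule for the tail, all tailored to one fixed question ("is this byte a quote?"). That gives some feeling for why a generator that has to emit such code for arbitrary token definitions looks hard.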