Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2572 > unrolled thread
| Started by | Roger L Costello <costello@mitre.org> |
|---|---|
| First post | 2020-08-31 10:35 +0000 |
| Last post | 2020-09-09 14:09 -0700 |
| Articles | 12 — 9 participants |
Back to article view | Back to comp.compilers
Parsing using a Graphics Processing Unit (GPU)? Roger L Costello <costello@mitre.org> - 2020-08-31 10:35 +0000
Re: Parsing using a Graphics Processing Unit (GPU)? Christian Gollwitzer <auriocus@gmx.de> - 2020-09-01 09:22 +0200
Re: Parsing using a Graphics Processing Unit (GPU)? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-09-01 19:02 +0200
Parsing using a Graphics Processing Unit (GPU)? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-09-02 01:14 +0300
Re: Parsing using a Graphics Processing Unit (GPU)? Jan Ziak <0xe2.0x9a.0x9b@gmail.com> - 2020-09-02 02:13 -0700
Re: Parsing using a Graphics Processing Unit (GPU)? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-09-02 22:34 +0200
Re: Parsing using a Graphics Processing Unit (GPU)? "A. K." <minforth@arcor.de> - 2020-09-01 01:25 -0700
Re: Parsing using a Graphics Processing Unit (GPU)? arnold@skeeve.com (Aharon Robbins) - 2020-09-02 05:43 +0000
Re: Parsing using a Graphics Processing Unit (GPU)? Elijah Stone <elronnd@elronnd.net> - 2020-09-01 20:13 -0700
Re: Parsing using a Graphics Processing Unit (GPU)? Roger L Costello <costello@mitre.org> - 2020-09-02 11:45 +0000
Re: Parsing using a Graphics Processing Unit (GPU)? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-09-02 17:57 +0300
Re: Parsing using a Graphics Processing Unit (GPU)? gah4 <gah4@u.washington.edu> - 2020-09-09 14:09 -0700
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2020-08-31 10:35 +0000 |
| Subject | Parsing using a Graphics Processing Unit (GPU)? |
| Message-ID | <20-09-001@comp.compilers> |
Hi Folks, I am reading a book [1] on machine learning and the book says some pretty interesting things: "In the search for more speed, machine learning researchers started taking advantage of special hardware found in some computers, originally designed to improve graphics performance. You may have heard these called graphics cards. ... Those graphics cards contain a GPU, or graphics processing unit. Unlike a general purpose CPU, a GPU is designed to perform specific tasks, and do them well. One of those tasks is to carry out arithmetic, including matrix multiplication, in a highly parallel way. ... GPUs have many more [than CPUs] arithmetic cores, thousands are fairly common today. This means a huge workload can be split amongst all those cores and the job can be done quickly." Neat! Has the parsing community found a way to take advantage of GPUs? From the above excerpt, it appears that GPUs are especially good at arithmetic. When I think of parsing, I don't think of lots of arithmetic. Perhaps someone has devised a way to recast the parsing problem into an arithmetic problem? Any thoughts you might have on: (a) parsing-using-GPUs, and (b) recasting-the-parsing-problem-into-an-arithmetic-problem would be appreciated. /Roger [1] "Make Your First GAN with Pytorch" by Tariq Rashid [Parsing is not usually an important factor in compiler performance. The slow parts are the lexer, because it has to look at every character of the input, and some optimizations that have to analyze the entire intermediate form of the program. The first step in lexing is to identify what class each character is, e.g., identifier, white space, or operator. Perhaps a GPU could do vector lookups to speed that up. For optimizations, I can sort of imagine how some analyses like reachability might be expressible as matrices. -John]
[toc] | [next] | [standalone]
| From | Christian Gollwitzer <auriocus@gmx.de> |
|---|---|
| Date | 2020-09-01 09:22 +0200 |
| Message-ID | <20-09-002@comp.compilers> |
| In reply to | #2572 |
Am 31.08.20 um 12:35 schrieb Roger L Costello: > Has the parsing community found a way to take advantage of GPUs? I don't think that you can speed up parsing a lot using GPUs. GPUs are generally good at parallel processing. A single thread is usually slower than a CPU thread, and there is overhead, because they are not self-contained - i.e. you can usually speed up only some part of a program, and it needs to be uploaded to the GPU and downloaded back. GPUs also have faster memory, *if* you access it either in big blocks or as a serial stream, in which case the compiler can transform it to block access. For random accesses, the memory is slower. I have done some basic GPU programming, and I think that parsing is not a parallel task in that sense. The parser reads the input as a stream of tokens; you can't split the C file at some arbitrary point in half and parse both parts independently. Christian
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-09-01 19:02 +0200 |
| Message-ID | <20-09-004@comp.compilers> |
| In reply to | #2573 |
Am 01.09.2020 um 09:22 schrieb Christian Gollwitzer: > Am 31.08.20 um 12:35 schrieb Roger L Costello: >> Has the parsing community found a way to take advantage of GPUs? > I have done some basic GPU programming, and I think that parsing is not > a parallel task in that sense. The parser reads the input as a stream of > tokens; and acts *differently* depending on that input, while a GPU performs essentially the *same* operation to all elements of a stream (merge, scale...). > you can't split the C file at some arbitrary point in half and > parse both parts independently. Multiple files can be parsed in parallel just with a multi-core CPU. More parallel modules require more symbol tables etc., what limits the amount of concurrent threads by available (cached!) memory. DoDi
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2020-09-02 01:14 +0300 |
| Message-ID | <20-09-005@comp.compilers> |
| In reply to | #2573 |
Our esteemed moderator wrote: [Parsing is not usually an important factor in compiler performance. The slow parts are the lexer, because it has to look at every character of the input, and some optimizations that have to analyze the entire intermediate form of the program. The first step in lexing is to identify what class each character is, e.g., identifier, white space, or operator. Perhaps a GPU could do vector lookups to speed that up. For optimizations, I can sort of imagine how some analyses like reachability might be expressible as matrices. -John] As to parallelizing lexers, we did some experiments on speeding up regular expressions (mostly for Snort) while I was at Intel. Charlie Lasswell did experiments on starting parallel scans on different parts of a stream or file that showed some promise. The hyperscan team (aka Sensory Networks, Geoff Langdale &co) had some interesting results on when you could skip ahead and guess the correct state when using NFAs--I believe they called it "and then a miracle happens" (following the famous cartoon about a mathematical proof with some missing steps on a whiteboard). Michela Becchi worked on when you could predict states involved in xFA loops would terminate. Geoff had me do a mode they called Maclellan (the HS folks had some interesting naming conventions, one of them was the use of US Civil War generals for major variations/modes) which was turning NFAs into parallelized DFAs for Hyperscan. The regular expression engine we put into the Cave Creek chip, did a different form of parallel DFAs, spawning them when certain conditions were met, and for things like the Aho-Corasick algorithm you can show that there tends to be a "bushy" part near the head of the automata, with "skinny" parts being forked off after a small number of bytes--that was the key insight that led to our chip. So, there are definite possibilities for doing SIMD regular expression matching that would be suitable for GPU implementation. Geoff and I discussed how to approach it on numerous occasions. It just never quite got past the to-consider list. Some of this "could" also be applied to parsing, particularly in a GLR style parser, where you have forests. But our moderator is correct in saying the big boost would be in speeding up lexing. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | Jan Ziak <0xe2.0x9a.0x9b@gmail.com> |
|---|---|
| Date | 2020-09-02 02:13 -0700 |
| Message-ID | <20-09-008@comp.compilers> |
| In reply to | #2573 |
On Tuesday, September 1, 2020 at 6:03:27 PM UTC+2, Christian Gollwitzer wrote: > The parser reads the input as a stream of > tokens; you can't split the C file at some arbitrary point in half and > parse both parts independently. Of course you can split asm/C/C++/Go/Python/Rust/etc file at arbitrary points: when the state of the lexical analyzer collapses to a single state starting from a random file position with an arbitrary starting state. -atom
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-09-02 22:34 +0200 |
| Message-ID | <20-09-011@comp.compilers> |
| In reply to | #2579 |
Am 02.09.2020 um 11:13 schrieb Jan Ziak: > On Tuesday, September 1, 2020 at 6:03:27 PM UTC+2, Christian Gollwitzer wrote: >> The parser reads the input as a stream of >> tokens; you can't split the C file at some arbitrary point in half and >> parse both parts independently. > > Of course you can split asm/C/C++/Go/Python/Rust/etc file at arbitrary > points: Certainly not for C/C++. The preprocessor and included files modify the original source. That's one of the reasons for the slow C/C++ compilation. > when the state of the lexical analyzer collapses to a single > state starting from a random file position with an arbitrary starting > state. If you mean the lexer by "lexical analysis", that's only a front end for the parser. A compiler needs more state and context information like symbol tables, which can not be built from arbitrary parts of a source file. DoDi [One can certainly pipeline the C preprocessor and later phases but I'm guessing that's not what you're talking about here. -John]
[toc] | [prev] | [next] | [standalone]
| From | "A. K." <minforth@arcor.de> |
|---|---|
| Date | 2020-09-01 01:25 -0700 |
| Message-ID | <20-09-003@comp.compilers> |
| In reply to | #2572 |
Am Dienstag, 1. September 2020 06:44:53 UTC+2 schrieb Roger L Costello: > Has the parsing community found a way to take advantage of GPUs? The old IT classic still holds: Early optimization is the root of all evil. IOW: first identify your bottlenecks by profiling your parsing routines. Repeat it. Most probably bottlenecks will have to do with I/O speed and selected algorithms. Memory access speed perhaps, CPU/GPU performance very behind.
[toc] | [prev] | [next] | [standalone]
| From | arnold@skeeve.com (Aharon Robbins) |
|---|---|
| Date | 2020-09-02 05:43 +0000 |
| Message-ID | <20-09-007@comp.compilers> |
| In reply to | #2574 |
In article <20-09-003@comp.compilers>, A. K. <minforth@arcor.de> wrote: >Am Dienstag, 1. September 2020 06:44:53 UTC+2 schrieb Roger L Costello: >The old IT classic still holds: Early optimization is the root of all evil. Jus to give credit where credit is due, it was Donald Knuth who said "Premature optimization is the root of all evil." IIRC it was in his famous "Structured Programming with GOTO Statements" paper. Arnold
[toc] | [prev] | [next] | [standalone]
| From | Elijah Stone <elronnd@elronnd.net> |
|---|---|
| Date | 2020-09-01 20:13 -0700 |
| Message-ID | <20-09-006@comp.compilers> |
| In reply to | #2572 |
On Mon, 31 Aug 2020, Roger L Costello wrote: > Any thoughts you might have on: > > (a) parsing-using-GPUs, and > (b) recasting-the-parsing-problem-into-an-arithmetic-problem Look up Aaron Hsu's Ph.D thesis, _A data parallel compiler hosted on the GPU_ (https://scholarworks.iu.edu/dspace/handle/2022/24749). As John (and others) mention, I don't think the GPU is an especially interesting target to speed up parsing specifically, but it may be a fruitful line of inquiry. If so then that thesis is, as far as I can tell, the only research that has been done so far; maybe you can make your own experiments based on it. -- time flies like an arrow; fruit flies like a banana
[toc] | [prev] | [next] | [standalone]
| From | Roger L Costello <costello@mitre.org> |
|---|---|
| Date | 2020-09-02 11:45 +0000 |
| Message-ID | <20-09-009@comp.compilers> |
| In reply to | #2577 |
> Look up Aaron Hsu's Ph.D thesis, > A data parallel compiler hosted on the GPU > (https://scholarworks.iu.edu/dspace/handle/2022/24749) Abstract: This work describes a general, scalable method for building data-parallel by construction tree transformations that exhibit simplicity, directness of expression, and high-performance on both CPU and GPU architectures when executed on either interpreted or compiled platforms across a wide range of data sizes, as exemplified and expounded by the exposition of a complete compiler for a lexically scoped, functionally oriented programming commercial language. The entire source code to the compiler written in this method requires only 17 lines of simple code compared to roughly 1000 lines of equivalent code in the domain-specific compiler construction framework, Nanopass, and requires no domain specific techniques, libraries, or infrastructure support. It requires no sophisticated abstraction barriers to retain its concision and simplicity of form. The execution performance of the compiler scales along multiple dimensions: it consistently outperforms the equivalent traditional compiler by orders of magnitude in memory usage and run time at all data sizes and achieves this performance on both interpreted and compiled platforms across CPU and GPU hardware using a single source code for both architectures and no hardware-specific annotations or code. It does not use any novel domain-specific inventions of technique or process, nor does it use any sophisticated language or platform support. Indeed, the source does not utilize branching, conditionals, if statements, pattern matching, ADTs, recursions, explicit looping, or other non-trivial control or dispatch, nor any specialized data models. --------------------------------------------------- Wow! Thanks for the reference Elijah! /Roger
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2020-09-02 17:57 +0300 |
| Message-ID | <20-09-010@comp.compilers> |
| In reply to | #2572 |
To follow up on the part of the question about using GPUs to do arithmetic to implement DFA transitions. It can "easily" be done. The base ideas can be found in the Wu Mamber string search algorithm. You treat states like (usually one-hot) bits in a fixed length (usually one word) bit vector. Your transition function then simply maps bit vectors to bit vectors. You can do that as a SIMD instruction, thus it is suitable for a GPU to execute. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2020-09-09 14:09 -0700 |
| Message-ID | <20-09-025@comp.compilers> |
| In reply to | #2572 |
On Monday, August 31, 2020 at 9:44:53 PM UT C-7, Roger L Costello wrote: > Hi Folks, > > I am reading a book [1] on machine learning and the book says some pretty > interesting things: > > "In the search for more speed, machine learning researchers started taking > advantage of special hardware found in some computers, originally designed to > improve graphics performance. You may have heard these called graphics cards. The usual way to use a GPU is for single precision floating point. They might also be able to do fixed point of a reasonable size. As above, parsing is usually fast enough. GPUs are often enough used for dynamic programming, which is sometimes used for optimization and code generation in compilers. It might be that those could use a speed boost. This might be more true for unusual architectures where optimal code generation is more important.
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web