From: astian
Newsgroups: gnu.bash.bug
Subject: Re: Combination of "eval set -- ..." and $() command substitution is slow
Date: Sat, 13 Jul 2019 17:36:00 +0000
To: bug-bash@gnu.org
Cc: chet.ramey@case.edu
In-Reply-To: <7ba227f9-ed87-8224-6f07-fd444488d472@case.edu>

Dear Chet Ramey, thanks for the explanations, I assume you were able to
reproduce the issue.

First a warning that I forgot to include in the previous email: the quoted
excerpts of the gprof tables correspond to a slightly modified test script in
which I tried to magnify the overhead in order to more easily spot performance
differences.  Therefore, the absolute values displayed, e.g. number of
function calls, are exaggerated.  The relative values (and hopefully your
conclusions) remain largely the same, though.  Sorry for any potential
confusion.

A few short comments prompted by your reply (probably not very useful, feel
free to ignore them).

Chet Ramey:
>>   - $() seems generally slightly slower than ``, but becomes pathologically
>>     so when preceded with "eval set -- ...".
>
> It is slightly slower -- POSIX requires that the shell parse the contents
> of $(...) to determine that it's a valid script as part of finding the
> closing `)'. The rules for finding the closing "`" don't have that
> requirement.
>
>>   - "eval set -- ..." itself doesn't seem slow at all, but obviously it has
>>     side-effects not captured by the "time" measurement tool.
>
> What happens is you end up with a 4900-character command string that you
> have to parse multiple times. But that's not the worst of it.

Since this statement ought to run exactly once, naïvely I would expect that by
"multiple times" you really mean at most "twice": once for the top-level
script, another time "inside" the eval "sub-script".

> The gprof output provides a clue.
>
>
>> case 1 1 0 (pathological):
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  38.89      0.21     0.21    28890     7.27     7.27  set_line_mbstate
>
> set_line_mbstate() runs through each command line before parsing, creating
> a bitmap that indicates whether each element is a single-byte character or
> part of a multi-byte character. The scanner uses this to determine whether
> a shell metacharacter should act as a delimiter or get skipped over as part
> of a multibyte character. For a single run with args `1 1 0', it gets
> called around 7300 times, with around 2400 of them for the 4900-character
> string with all the arguments.
>
> When you're in a multibyte locale (en_US.UTF-8 is one such), each one of
> those characters requires a call to mbrlen/mbrtowc. So that ends up being
> 2400 * 4900 calls to mbrlen.

I am indeed using a UTF-8 locale, but I also tested with export LC_ALL=C and
the behaviour did not change.  I should have mentioned that.

Also, it occurs to me that since all shell metacharacters are surely 7-bit
ASCII, and all code units of multibyte UTF-8 sequences are strictly and
deliberately non-ASCII 8-bit bytes, at least in the case of UTF-8 it may not
be necessary to construct such a bitmap: a shell-metacharacter byte cannot be
part of a UTF-8 sequence representing something else, so there's no need for
the distinction.  Of course, this observation applies only to such specially
crafted multibyte encodings as UTF-8 (which nonetheless is surely by far the
most common).  Furthermore, it does not explain the issue at hand.

> There is something happening here -- there's no way there should be that
> many calls to set_line_mbstate(),

Notice that there are almost as many calls (only 2 fewer) in case "0 1 0" (in
which eval is not used), yet in that case the performance is not harmed.
Quoting from the previous email:

  case 0 1 0:
     2.08      0.32     0.01    28888     0.00     0.00  set_line_mbstate
  case 1 1 0 (pathological):
    38.89      0.21     0.21    28890     7.27     7.27  set_line_mbstate

OTOH, all other test cases show less than 100 calls (these numbers correspond
to a profiling run of the given script, unmodified):

  $ grep -m1 set_line_mbstate gmon*.table
  gmon.out.0-0-X.table:   0.00   0.00   0.00     88   0.00   0.00  set_line_mbstate
  gmon.out.0-1-0.table:   0.00   0.13   0.00   7288   0.00   0.00  set_line_mbstate
  gmon.out.0-1-1.table:   0.00   0.18   0.00     88   0.00   0.00  set_line_mbstate
  gmon.out.1-0-X.table:   0.00   0.01   0.00     90   0.00   0.00  set_line_mbstate
  gmon.out.1-1-0.table:  37.50   0.06   0.06   7290   8.23   8.23  set_line_mbstate
  gmon.out.1-1-1.table:   0.00   0.07   0.00     90   0.00   0.00  set_line_mbstate

> even when you have to save and restore
> the input line because you have to parse the contents of $(). There must
> be some combination of the effect of `eval' on the line bitmap and the
> long string. I'll see what I can figure out.
>
> Chet

Cheers.
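
P.S. To make the UTF-8 remark a bit more concrete, here is a minimal bash
sketch.  It only checks the byte-range property (every byte of a multibyte
UTF-8 sequence is >= 0x80, so none can collide with an ASCII metacharacter
such as ';').  The helper names bytes_of and all_bytes_high are made up for
illustration, and nothing here reflects how bash's scanner or
set_line_mbstate() is actually implemented.

```bash
#!/usr/bin/env bash
# Sketch only: verify that no byte of a multibyte UTF-8 sequence lies in
# the 7-bit ASCII range.  Helper names are hypothetical, not from bash.

# Print the bytes of $1 as two-digit hex values, one per line.
bytes_of() {
    printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '\n' | sed '/^$/d'
}

# Succeed if every byte of $1 is >= 0x80 (i.e. none is 7-bit ASCII).
all_bytes_high() {
    local b
    while read -r b; do
        (( 16#$b >= 0x80 )) || return 1
    done < <(bytes_of "$1")
    return 0
}

# Sample multibyte characters, given as explicit UTF-8 byte escapes so the
# script does not depend on this file's own encoding.
samples=( $'\xc3\xaf' $'\xe2\x82\xac' $'\xf0\x9f\x90\x9a' )

for s in "${samples[@]}"; do
    if all_bytes_high "$s"; then
        echo "no ASCII byte in sequence: $(bytes_of "$s" | tr '\n' ' ')"
    else
        echo "ASCII byte found in sequence: $(bytes_of "$s" | tr '\n' ' ')"
    fi
done
```

A string containing a real metacharacter, e.g. all_bytes_high 'a;b', fails
the check, which is the distinction the bitmap is meant to capture.  This
says nothing about other multibyte encodings, where the property need not
hold.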