From: Chet Ramey
Newsgroups: gnu.bash.bug
Subject: Re: Combination of "eval set -- ..." and $() command substitution is slow
Date: Fri, 12 Jul 2019 10:44:27 -0400
References: <7ba227f9-ed87-8224-6f07-fd444488d472@case.edu>
Reply-To: chet.ramey@case.edu
To: astian, bug-bash@gnu.org
Cc: chet.ramey@case.edu

On 7/10/19 1:21 PM, astian wrote:
> Bash Version: 5.0
> Patch Level: 3
> Release Status: release
>
> Description:
>
> I discovered a curious performance degradation in the combined usage of the
> constructs "eval set -- ..."
> and new-style command substitution. In short,
> setting the positional arguments via eval and then iterating over each one
> while performing $() command substitution(s) is significantly slower than
> not using eval, or not making command substitution, or using `` instead.
>
> I include below a reduced test script that illustrates the issue. A few
> notes:
> - The pathological case is "1 1 0".
> - I did not observe performance difference in unoptimised builds (-O0).
>
> --------------------------
> case 1 1 0
> eval set
> real 0m0.002s
> user 0m0.000s
> sys 0m0.000s
> for loop cmdsubst-currency
> real 0m0.968s
> user 0m0.432s
> sys 0m0.148s
> --------------------------
>
> Observations:
> - The pathological case "1 1 0" spends about 10 times more time doing
>   something in userspace during the loop, relative to the comparable cases
>   "0 1 0", "0 1 1", and "1 1 1".
> - $() seems generally slightly slower than ``, but becomes pathologically
>   so when preceded with "eval set -- ...".

It is slightly slower -- POSIX requires that the shell parse the contents of
$(...) to determine that it's a valid script as part of finding the closing
`)'. The rules for finding the closing "`" don't have that requirement.

> - "eval set -- ..." itself doesn't seem slow at all, but obviously it has
>   side-effects not captured by the "time" measurement tool.

What happens is you end up with a 4900-character command string that you
have to parse multiple times. But that's not the worst of it.

The gprof output provides a clue.

> case 1 1 0 (pathological):
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  38.89      0.21     0.21    28890     7.27     7.27  set_line_mbstate

set_line_mbstate() runs through each command line before parsing, creating
a bitmap that indicates whether each element is a single-byte character or
part of a multi-byte character.
The scanner uses this to determine whether a shell
metacharacter should act as a delimiter or get skipped over as part of a
multibyte character. For a single run with args `1 1 0', it gets called
around 7300 times, with around 2400 of them for the 4900-character string
with all the arguments.

When you're in a multibyte locale (en_US.UTF-8 is one such), each one of
those characters requires a call to mbrlen/mbrtowc. So that ends up being
2400 * 4900 calls to mbrlen.

There is something happening here -- there's no way there should be that
many calls to set_line_mbstate(), even when you have to save and restore
the input line because you have to parse the contents of $(). There must
be some combination of the effect of `eval' on the line bitmap and the
long string.

I'll see what I can figure out.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
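
For anyone who wants to poke at this, here is a minimal sketch of the kind
of comparison discussed above. It is not the reporter's test script (that
is not reproduced in this message); the function names, the argument
format, and the argument count are all assumptions made for illustration.
It sets the positional parameters either through "eval set -- ..." or
without eval, then runs one $() substitution per argument. The last helper
just multiplies out the 2400 * 4900 estimate.

```shell
# Hypothetical sketch, not the original test script. All names are made up.
# Written in plain POSIX shell syntax so it also runs under bash.

# Build one long string of N single-quoted words, e.g. " 'arg0' 'arg1'".
make_args() {
    n=$1; i=0; out=
    while [ "$i" -lt "$n" ]; do
        out="$out 'arg$i'"
        i=$((i + 1))
    done
    printf '%s' "$out"
}

# Case "with eval": set the positional parameters from the long string,
# then perform one new-style command substitution per argument.
loop_with_eval() {
    eval "set -- $(make_args "$1")"
    count=0
    for arg in "$@"; do
        copy=$(printf '%s' "$arg")
        count=$((count + 1))
    done
    echo "$count"
}

# Case "without eval": build the same positional parameters one at a time.
loop_without_eval() {
    total=$1; i=0
    set --
    while [ "$i" -lt "$total" ]; do
        set -- "$@" "arg$i"
        i=$((i + 1))
    done
    count=0
    for arg in "$@"; do
        copy=$(printf '%s' "$arg")
        count=$((count + 1))
    done
    echo "$count"
}

# Product of (calls on the long line) * (characters in that line), i.e. the
# per-character mbrlen estimate from the message above.
mbrlen_calls() {
    echo $(( $1 * $2 ))
}
```

Something like "time loop_with_eval 1000" against "time loop_without_eval
1000" in bash gives the two timings to compare; whether a gap shows up will
presumably depend on the bash version, the build's optimisation level, and
whether the locale is multibyte. "mbrlen_calls 2400 4900" prints the size
of the estimate, 11760000.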