From: Chet Ramey
Reply-To: chet.ramey@case.edu
Newsgroups: gnu.bash.bug
Subject: Re: Bash removes unrequested characters in bracket expressions (not a range).
Date: Sat, 24 Nov 2018 17:32:11 -0500
To: Bize Ma

On 11/24/18 4:32 PM, Bize Ma wrote:

> > Bash is removing characters not explicitly listed in a bracket
> > expression (character range).
> > In this example, it is removing digits from other languages.
>
> What is your locale?
>
> The locale used was en_US.utf-8 but also happens with 459
> locales out of 868 available under Debian (not in C, for example).
>
> Also in all locales affected (except one), setting either
> LC_ALL=$loc or LC_COLLATE=$loc did the same.
> Except in zh_CN.gb18030
>
> But IMO locale collation should not be used for an explicit list.

Collation order is used for each individual character in a bracket
expression when compared against the string, as posix specifies.

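To make the comparison concrete, here is a minimal sketch in C of a
collation-based membership test. It is not bash's actual code: the helper
names `collate_cmp` and `in_collation_range` are hypothetical, and the only
library call assumed is the standard `wcscoll()`. A single listed character
is modelled as a range whose start and end coincide.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: compare two single characters by the collation
   order of the current locale. Returns <0, 0 or >0, like wcscoll(). */
int collate_cmp(wchar_t a, wchar_t b)
{
    wchar_t sa[2] = { a, L'\0' };
    wchar_t sb[2] = { b, L'\0' };
    return wcscoll(sa, sb);
}

/* Hypothetical helper: true if c collates at or after cstart and at or
   before cend. With cstart == cend this reduces to "c collates equal
   to cstart". */
bool in_collation_range(wchar_t c, wchar_t cstart, wchar_t cend)
{
    return collate_cmp(c, cstart) >= 0 && collate_cmp(c, cend) <= 0;
}
```

In the default "C" locale, `in_collation_range(L'6', L'5', L'5')` is false
and only `L'5'` itself passes. In a locale where some other character
collates equal to `cstart`, that character would also pass this test, which
would be consistent with the behavior described in the report.
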
> I have been made aware that there is a
>       cstart = cend = FOLD (cstart);
> inside the `sm_loop.c` file that will convert into a range many
> individual character. If that understanding is correct that is the
> source of the difference with other shells.

I'm not sure what you mean by "convert into a range." If cstart and cend
were treated as a range, the start and end characters would be the
same. If cstart == cend, a character that collates >= cstart and <= cend
would have to collate equal to cstart and cend.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/