Path: csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail
From: István Pásztor
Newsgroups: gnu.bash.bug
Subject: Minor utf32-to-utf8 bug
Date: Sun, 10 Nov 2019 14:07:49 +0000
Lines: 39
Approved: bug-bash@gnu.org
Message-ID:
References:
NNTP-Posting-Host: lists.gnu.org
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="000000000000fb714a0596fe8a10"
X-Trace: usenet.stanford.edu 1573396992 9325 209.51.188.17 (10 Nov 2019 14:43:12 GMT)
X-Complaints-To: action@cs.stanford.edu
To: bug-bash@gnu.org
Envelope-to: bug-bash@gnu.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
        h=mime-version:from:date:message-id:subject:to;
        bh=k5c/Ys6dJcglEKl4aY70ex11TsqZXYM5VnGjtJXhKiM=;
        b=nj7WF7zurOifW96cytPLYGvqcRwJP4X3wNs2OngyzRVVH1ZbOFi7Hx+CSdaTsvJXl6
        Yo/lnIGNwWzHsuq9VStX5H/Md/EmWTVCDRUsIg+rpsGnwd440Mw53C78UPWURtLWccCp
        appgPVjtbG3PW5skcxwDdznz6DagpIpH96ICtmIuVb3P55axTuaFUEoNMCw7Z+pxBfE/
        +8dO3GAYg0Dji9WfpjcUeB5TWeCtJgncHlxEHFLpE/E9GghNlrMcri41LwqaUDtMGuGR
        nNDDoN3+gx72gDe54oNOrqypWCvcKhDXmIuC3uXH2gtWbqIIMq+PGgABpaXKcaJeCRXt
        EOrQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
        bh=k5c/Ys6dJcglEKl4aY70ex11TsqZXYM5VnGjtJXhKiM=;
        b=ssfk9TH579x6dndETA5ODURJ7X0OzLigUqSb4nGKMPdjNlLnCpZihtDLAsOJhbw/wP
        B0ruXBHOxLMOOZUamdVc1GaeyOm4+IybLSAPQU2ssrt3WKHUisRUg2gG8ElqtJ5kQpbZ
        s2AS2qTYw1XIvq0WODtzRA+RKzF+Uh3bvkM8raY8nJtQZOMhTaRLnafpCv9yyvOvTsKZ
        pqFlq8gE/+Ukgai7JHs20vCJpTQjgeXQzjBUl6qfpieIUw6JgRFW4/S/IwUMydvwdbNX
        AEgUR5b8e3s/8mM1qG8FPYu+crKPeg9ZNti1KCSblEF5hb733thJnzJ6Pj8c3LfchwnH
        hfdw==
X-Gm-Message-State: APjAAAX+NgyJ56QgSgYUMKaRI1+0JqirIgScaTVGQ9rj4+X57mRZVp3U
        fiyr44h+zffHMrYmSPooPpkb3pvkcNBQYprrPY8Kqg7v
X-Google-Smtp-Source: APXvYqxWT79eig7eSLOwMIC2I+n3OtkcLKsKYmu9PEgF55hxX0nOleB0DU1eSIGwrNZjDTKE3jNo+iIWJaREy2FNLI4=
X-Received: by 2002:a92:495a:: with SMTP id
        w87mr14650646ila.133.1573394881138; Sun, 10 Nov 2019 06:08:01 -0800 (PST)
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 2607:f8b0:4864:20::12d
X-Mailman-Approved-At: Sun, 10 Nov 2019 09:43:10 -0500
X-Content-Filtered-By: Mailman/MimeDel 2.1.23
X-BeenThere: bug-bash@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: Bug reports for the GNU Bourne Again SHell
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-Mailman-Original-Message-ID:
Xref: csiph.com gnu.bash.bug:15591

--000000000000fb714a0596fe8a10
Content-Type: text/plain; charset="UTF-8"

Hi,

The encoding of six-byte UTF-8 sequences is buggy: in u32toutf8
(lib/sh/unicode.c), the six-byte branch ORs the lead byte with 0xf8, the
five-byte prefix, instead of 0xfc. Unicode today needs at most four bytes
per UTF-8 sequence, but if we handle five- and six-byte sequences too, let's
do it the right way.

The attached patch was created against a fresh clone of master
(d894cfd104086ddf68c286e67a5fb2e02eb43b7b).

I'm writing a tool (pxargs, an xargs variant) that accepts input strings in
shell-quoted format, and I used the bash manual and source code as
references. I haven't compiled the latest bash sources to confirm the bug,
but it is likely to affect ANSI-C quoted strings such as $'\U7fffffff'.

Best Regards,
Istvan Pasztor

--000000000000fb714a0596fe8a10
Content-Type: application/octet-stream;
        name="six_bytes_long_utf8_sequences_d894cfd104086ddf68c286e67a5fb2e02eb43b7b.patch"
Content-Disposition: attachment;
        filename="six_bytes_long_utf8_sequences_d894cfd104086ddf68c286e67a5fb2e02eb43b7b.patch"
Content-Transfer-Encoding: base64
Content-ID:
X-Attachment-Id: f_k2t1zper0

ZGlmZiAtLWdpdCBhL2xpYi9zaC91bmljb2RlLmMgYi9saWIvc2gvdW5pY29kZS5jCmluZGV4IGZl
MTNjNGEwLi4xMjk4MGM0MyAxMDA2NDQKLS0tIGEvbGliL3NoL3VuaWNvZGUuYworKysgYi9saWIv
c2gvdW5pY29kZS5jCkBAIC0xOTYsNyArMTk2LDcgQEAgdTMydG91dGY4ICh3YywgcykKICAgICB9
CiAgIGVsc2UgaWYgKHdjIDwgMHgwODAwMDAwMDApCiAgICAgewotICAgICAgc1swXSA9ICh3YyA+
PiAzMCkgfCAweGY4OworICAgICAgc1swXSA9ICh3YyA+PiAzMCkgfCAweGZjOwogICAgICAgc1sx
XSA9ICgod2MgPj4gMjQpICYgMHgzZikgfCAweDgwOwogICAgICAgc1syXSA9ICgod2MgPj4gMTgp
ICYgMHgzZikgfCAweDgwOwogICAgICAgc1szXSA9ICgod2MgPj4gMTIpICYgMHgzZikgfCAweDgw
Owo=
--000000000000fb714a0596fe8a10--