Minor utf32-to-utf8 bug

From István Pásztor <pasztorpisti@gmail.com>
Newsgroups gnu.bash.bug
Subject Minor utf32-to-utf8 bug
Date Sun, 10 Nov 2019 14:07:49 +0000
Message-ID <mailman.1182.1573396992.13325.bug-bash@gnu.org>
References <CAFg+f+oDHn-5fyQGR4u3AWVTzwwXzDxv8y_kM8hb5_MAdZmurw@mail.gmail.com>

Hi

The encoding of six-byte UTF-8 sequences is buggy. Unicode today requires
UTF-8 sequences of at most 4 bytes, but if we handle 5- and 6-byte sequences
too, then let's do it the right way.
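
For reference, here is a minimal sketch of the original 1-to-6-byte UTF-8
scheme (the pre-RFC 3629 layout that allowed values up to 0x7fffffff). It is
not bash's code and not the attached patch; the function name
utf32_to_utf8_sketch and its interface are hypothetical, chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not bash's function): encode code point `c` with the
   original 1-to-6-byte UTF-8 scheme. `out` must hold at least 6 bytes.
   Returns the number of bytes written, or 0 if c is above 0x7fffffff. */
size_t utf32_to_utf8_sketch(uint32_t c, unsigned char *out)
{
    /* limit[n-1] is the first value that no longer fits in n bytes. */
    static const uint32_t limit[6] = {
        0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000
    };
    /* Lead-byte prefix for each length: 0xxxxxxx, 110xxxxx, ... 1111110x. */
    static const unsigned char lead[6] = {
        0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    size_t len, i;

    assert(out != NULL);

    for (len = 1; len <= 6; len++)
        if (c < limit[len - 1])
            break;
    if (len > 6)
        return 0;   /* not representable even in 6 bytes */

    /* Fill continuation bytes from the end: 10xxxxxx, six payload bits each. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    /* Whatever bits remain go into the lead byte after its length prefix. */
    out[0] = (unsigned char)(lead[len - 1] | c);
    return len;
}
```

With this layout, 0x7fffffff should come out as the six bytes
FD BF BF BF BF BF, and every byte after the first must have the form 10xxxxxx.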

The attached patch was created using a fresh master clone
(d894cfd104086ddf68c286e67a5fb2e02eb43b7b).

I'm writing a tool (pxargs, an xargs variant) that accepts input strings
in shell-quoted format, and I used the bash manual and source code as
references. I haven't compiled the latest bash sources to confirm the bug, but
it is likely to affect ANSI-C quoted strings like $'\U7fffffff'.

Best Regards,
Istvan Pasztor
