From: Enrico Maria De Angelis
Newsgroups: gnu.bash.bug
Subject: Re: Bash vi mode's e command (end of word) goes to eol when hitting a unicode character
Date: Tue, 4 Sep 2018 19:54:15 +0200
To: bug-bash@gnu.org
References: <20180904132833.4f5oo4qjefwzyvyw@eeg.ccf.org>

Ow, I'm sorry for not having investigated further; I thought this
behaviour was more or less expected. Thank you for doing it, Greg.
I hope this will be fixed.

Kind regards,
Enrico Maria

On Tue, 4 Sep 2018 at 15:28, Greg Wooledge wrote:

> On Mon, Sep 03, 2018 at 01:13:03PM +0200, Enrico Maria De Angelis wrote:
> > The version number of bash: GNU bash, version 4.4.23(1)-release
> > The hardware and operating system: Arch Linux (constantly updated)
> > The compiler used to compile: I didn't compile bash myself
> > A description of the bug behaviour: & A short script or `recipe' which
> > exercises the bug:
> > While vi-editing a line like the following
> > $ ls bulk32³ grids.dat COPYING
> > with the cursor in normal mode at the beginning of the line, hitting e
> > repeatedly causes the cursor to move, in order, to
> > s of ls (correct)
> > 2 of bulk32³ (correct, since Vim itself works like this, with an end of
> > word being detected in between 2 and ³)
> > end of line (wrong)
>
> I can confirm this in Debian's bash 4.4.12 and in bash 5.0-alpha. It's
> actually worse than Enrico reports.
>
> First, the cursor doesn't actually move to the end-of-line character
> ('G'). The cursor moves one space *past* that.
> > Once there, pressing either 'h' or 'b' moves the cursor from end-of-line > back to the =C2=B3 character. That's fairly odd on its, own, but it gets > even more interesting. > > If you go back to beginning-of-line, then press 'e' 3 times (so the curso= r > is beyond the 'G'), then press 'i' ' ' to insert a space character, the > multi-byte character gets broken up. What I see is this: > > wooledg:~$ ls bulk32=EF=BF=BD =EF=BF=BD grids.dat COPYING > > So, it seems the space was inserted in the middle of the byte sequence > that constituted the =C2=B3 character (0xc2 0xb3) originally, resulting i= n > two invalid-character bytes with a space in the middle. > > This is in LANG=3Den_US.UTF-8 on Debian 9 amd64. >
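
For anyone who wants to see the byte-level effect Greg describes without touching readline, here is a minimal sketch in Python. It does not reproduce the bug itself; it only assumes, as Greg's analysis suggests, that the space ends up between the two bytes of ³ (0xc2 0xb3). The helper name insert_byte_inside_char is made up for this illustration.

```python
def insert_byte_inside_char(text, char, filler=b" "):
    """Hypothetical helper: insert `filler` after the first byte of the
    UTF-8 encoding of `char`, mimicking an insertion point that landed
    in the middle of a multi-byte sequence."""
    raw = text.encode("utf-8")
    start = raw.index(char.encode("utf-8"))  # byte offset of the character
    cut = start + 1                          # one byte in: between 0xc2 and 0xb3
    return raw[:cut] + filler + raw[cut:]


line = "ls bulk32\u00b3 grids.dat COPYING"   # \u00b3 is the superscript three
broken = insert_byte_inside_char(line, "\u00b3")

# A UTF-8 terminal shows each orphaned byte as U+FFFD (the replacement
# character); decoding with errors="replace" does the same thing here.
shown = broken.decode("utf-8", errors="replace")

print(broken)
print(shown)
```

The lead byte 0xc2 is no longer followed by a continuation byte, and 0xb3 is a continuation byte with no lead byte, so each of them is invalid on its own. That matches the two replacement characters with a space between them in Greg's output.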