From: Greg Wooledge
Newsgroups: gnu.bash.bug
Subject: Re: Bash vi mode's e command (end of word) goes to eol when hitting a unicode character
Date: Tue, 4 Sep 2018 09:28:33 -0400
To: Enrico Maria De Angelis
Cc: bug-bash@gnu.org

On Mon, Sep 03, 2018 at 01:13:03PM +0200, Enrico Maria De Angelis wrote:
> The version number of bash: GNU bash, version 4.4.23(1)-release
> The hardware and operating system: Arch Linux (constantly updated)
> The compiler used to compile: I didn't compile bash myself
> A description of the bug behaviour: & A short script or `recipe' which
> exercises the bug:
> While vi-editing a line like the following
> $ ls bulk32³ grids.dat COPYING
> with the cursor in normal mode at the beginning of the line, hitting e
> repeatedly, causes the cursor to move in order to
> s of ls (correct)
> 2 of bulk32³ (correct, since Vim itself works like this, with an end of
> word being detected in between 2 and ³)
> end of line (wrong)

I can confirm this in Debian's bash 4.4.12 and in bash 5.0-alpha. It's actually worse than Enrico reports.

First, the cursor doesn't actually move to the end-of-line character ('G'). The cursor moves one space *past* that. Once there, pressing either 'h' or 'b' moves the cursor from end-of-line back to the ³ character.

That's fairly odd on its own, but it gets even more interesting. If you go back to beginning-of-line, then press 'e' 3 times (so the cursor is beyond the 'G'), then press 'i' ' ' to insert a space character, the multi-byte character gets broken up. What I see is this:

wooledg:~$ ls bulk32� � grids.dat COPYING

So, it seems the space was inserted in the middle of the byte sequence that originally constituted the ³ character (0xc2 0xb3), resulting in two invalid-character bytes with a space in the middle.

This is in LANG=en_US.UTF-8 on Debian 9 amd64.
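
For anyone who wants to see the byte-level effect outside of bash, here is a small Python sketch. It does not reproduce readline's logic; it only assumes, as the output above suggests, that the space was inserted at a byte offset one past the start of the two-byte ³ sequence. The names `split_at`, `broken`, and `shown` are made up for the illustration.

```python
# Illustration only: mimics the observed corruption, not readline's actual code.
line = "ls bulk32³ grids.dat COPYING"
raw = line.encode("utf-8")

# U+00B3 (superscript three) occupies two bytes in UTF-8: 0xc2 0xb3.
assert "³".encode("utf-8") == b"\xc2\xb3"

# Assumed failure mode: the insert position lands between those two bytes.
split_at = raw.index(b"\xc2\xb3") + 1
broken = raw[:split_at] + b" " + raw[split_at:]

# Neither orphaned byte is valid UTF-8 on its own, so each one is rendered
# as the replacement character U+FFFD, with the space between them.
shown = broken.decode("utf-8", errors="replace")
print(shown)
```

The printed line has the same shape as the terminal output above: two replacement characters with a space between them where the ³ used to be.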