From: Chet Ramey
Newsgroups: gnu.bash.bug
Subject: Re: \c-handling in $'-strings
Date: Mon, 31 Aug 2015 10:17:17 -0400
Organization: ITS, Case Western Reserve University

On 8/28/15 7:28 PM, Helmut Karlowski wrote:
> Hello
>
> The bash-manual says:
>
> Words of the form $'string' are treated specially. The word expands to
> string, with backslash-escaped characters replaced as specified by the
> ANSI C standard. Backslash escape sequences, if present, are decoded as
> follows:
>
> ...
>
> \cx a control-x character
>
> Now when I run this:
>
> {
> echo $LINENO $'\h\ca\ek'
> echo $LINENO $'\h\cA\ek'
> echo $LINENO $'\h\cd\ek'
> echo $LINENO $'\h\c\d\ek'
> echo $LINENO $'\h\c|d\ek'
> echo $LINENO $'\h\c<d\ek'
> echo $LINENO $'\h\c d\ek'
> echo $LINENO $'\h\\c d\ek'
> } | tee /dev/stderr | od -ax
>
> I get (output pasted from my editor):
>
> 2 \h^A^[k
> 3 \h^A^[k
> 4 \h^D^[k
> 5 \h^\d^[k
> 6 \h^\d^[k
> 7 \h^\d^[k
> 8 \h
> 9 \h\c d^[k
> 0000000 2 sp \ h soh esc k nl 3 sp \ h soh esc k nl
> 2032 685c 1b01 0a6b 2033 685c 1b01 0a6b
> 0000020 4 sp \ h eot esc k nl 5 sp \ h fs d esc k
> 2034 685c 1b04 0a6b 2035 685c 641c 6b1b
> 0000040 nl 6 sp \ h fs d esc k nl 7 sp \ h fs d
> 360a 5c20 1c68 1b64 0a6b 2037 685c 641c
> 0000060 esc k nl 8 sp \ h nl 9 sp \ h \ c sp d
> 6b1b 380a 5c20 0a68 2039 685c 635c 6420
> 0000100 esc k nl
> 6b1b 000a
> 0000103
>
> I wonder about the lines 6, 7, 8: 6,7: all non-alnum-characters (here | and
> <) are printed as 0x1c?

Conversion to a control character is done by ANDing the character's code
with 0x1f, since valid control characters lie in the range 0-0x1f. If the
character after \c does not name a valid control character, you get
undefined results; here both | (0x7c) and < (0x3c) happen to become 0x1c.
There is a table in
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/stty.html#tag_20_123
that lists the valid characters.

> And line 8: Why is the output truncated after '\c '?

Space (0x20) is outside the range of valid control characters, and, as it
happens, 0x20 & 0x1f == 0. The resulting NUL terminates the string, so
everything after it is dropped.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/
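The masking rule described in the reply can be checked from the shell itself. What follows is a sketch, assuming bash is the interpreter ($'...' is not available in every sh); mask_ctrl and first_byte are hypothetical helper names introduced here, not bash builtins. For each character from the example it computes (code & 0x1f) by hand and prints it next to the byte bash actually produces for $'\cX'. It then shows the NUL truncation caused by \c followed by a space. Only the characters from the thread are checked; other characters, and other bash versions, may behave differently.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): compare a hand-computed "code & 0x1f" with the
# byte bash produces for $'\cX', then show NUL truncation for \c<space>.

# mask_ctrl CHAR (hypothetical helper): print CHAR's code ANDed with 0x1f.
mask_ctrl() {
    local code
    printf -v code '%d' "'$1"   # leading quote: printf yields the char's numeric code
    echo $(( code & 0x1f ))
}

# first_byte STRING (hypothetical helper): print the first byte of STRING in decimal.
first_byte() {
    printf '%s' "$1" | od -An -tu1 | awk '{ print $1 }'
}

# Columns: character, hand-computed mask, byte produced by bash.
printf '%s %s %s\n' a   "$(mask_ctrl a)"   "$(first_byte $'\ca')"
printf '%s %s %s\n' A   "$(mask_ctrl A)"   "$(first_byte $'\cA')"
printf '%s %s %s\n' d   "$(mask_ctrl d)"   "$(first_byte $'\cd')"
printf '%s %s %s\n' '|' "$(mask_ctrl '|')" "$(first_byte $'\c|')"
printf '%s %s %s\n' '<' "$(mask_ctrl '<')" "$(first_byte $'\c<')"

# Space: 0x20 & 0x1f == 0 (NUL), so the string ends right after "ab".
s=$'ab\c cd'
printf 'length: %d\n' "${#s}"
```

On each of the first five lines the two numbers should agree (1, 1, 4, 28, 28), and the last line should report a length of 2, matching the truncated line 8 in the original output.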