
bash variable names do not comply w/POSIX character set rules

From	Linda Walsh <bash@tlinx.org>
Newsgroups	gnu.bash.bug
Subject	bash variable names do not comply w/POSIX character set rules
Date	2015-12-05 21:43 -0800
Message-ID	<mailman.1501.1449380599.31583.bug-bash@gnu.org>


Section 2.5.3, Shell Variables, of the POSIX shell specification says:

LC_CTYPE
    Determine the interpretation of sequences of bytes of text data as 
characters (for example, single-byte as opposed to multi-byte 
characters), which characters are defined as letters (character class 
alpha) and <blank> characters (character class blank), and the behavior 
of character classes within pattern matching.

If I have LC_CTYPE set to a UTF-8 locale, then the Unicode rules for
how a character is classified (alpha, numeric, alphanumeric, etc.)
seem appropriate to use.

In the bash man page, there is a definition of 'name':
   name   A word consisting only of  alphanumeric  characters  and  under-
          scores,  and beginning with an alphabetic character or an under-
          score.  Also referred to as an identifier.

However, I was looking for a character to visually separate
a "class" from a variable in that class. I would have liked something
like a.b, but "." isn't alphanumeric. "LATIN CAPITAL LETTER O WITH
STROKE" (U+00D8) is alphabetic, but it doesn't work:
>  aØb=1
-bash: aØb=1: command not found
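
The rejection above is consistent with names being limited to ASCII
letters, digits, and underscores. That limit is my assumption about what
bash does here, not something taken from its source. A minimal sketch of
such a check follows; is_portable_name is a hypothetical helper, not a
bash builtin, and it spells the letters out rather than using ranges so
that the result does not depend on the current locale:

```shell
# Hypothetical helper (not part of bash): succeeds only if $1 consists of
# ASCII letters, digits, and underscores and does not start with a digit.
# This mirrors the assumed ASCII-only reading of "name", nothing more.
is_portable_name() {
    case $1 in
        # Empty string, or a first character that is not a letter or "_".
        '' | [!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_]*)
            return 1 ;;
        # Any later character that is not a letter, digit, or "_".
        *[!abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]*)
            return 1 ;;
    esac
    return 0
}

# Try the separators discussed in this message.
for candidate in a_b 'a.b' 'aØb'; do
    if is_portable_name "$candidate"; then
        printf '%s: accepted\n' "$candidate"
    else
        printf '%s: rejected\n' "$candidate"
    fi
done
```

Under this check only a_b is accepted; a.b and aØb are both rejected,
which matches what I see at the prompt.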

The POSIX portable character set:
6. Character Set
6.1 Portable Character Set

Conforming implementations shall support one or more coded character 
sets. Each supported locale shall include the portable character set, 
which is the set of symbolic names for characters in Portable Character 
Set. This is used to describe characters within the text of 
POSIX.1-2008. The first eight entries in Portable Character Set are 
defined in the ISO/IEC 6429:1992 standard and the rest of the characters 
are defined in the ISO/IEC 10646-1:2000 standard.

ISO/IEC 10646 is Unicode, i.e. POSIX appears to base its definition of
alphanumeric characters, for example, on the Unicode character set.

So, theoretically, any alphanumeric class char from Unicode should work
as described in the bash manpages, to compose a "name" (variable or
subroutine name), but this doesn't seem to be the case.

I know this isn't a trivial POSIX requirement to meet, but given
GNU's and bash's changes to shell and Unix command behavior, it
seems that support for the character set would be the foundation of
POSIX compatibility.

If it were me, I'd probably try to look at Perl's Unicode handling
(imperfect as it may be), which has had a lot of work put into it and
may be one of the more complete and up-to-date implementations of Unicode
character handling.  I'd try to see if there was any part that might
either give ideas for bringing bash into compliance or any code that
might provide a pattern for implementation.  But investigating it further
might yield other, better options for bash.  Dunno.

Is this something that's even been thought about or is planned for?

Thanks!
-Linda
