From: Eric Sosman
Newsgroups: comp.lang.java.programmer
Subject: Re: ascii char 26
Date: Sun, 11 Sep 2011 18:28:25 -0400

On 9/11/2011 5:52 PM, Joshua Cranmer wrote:
> On 9/11/2011 4:33 PM, bob wrote:
>> Anyone know why ASCII char 26 is used in place of a hyphen in UTF-8?
>
> The US-ASCII encoder only properly encodes characters in the range of
> 0-127, i.e., the characters that are present in ASCII. Any other
> character is replaced with some sort of substitution character; in this
> case, it looks like the charset has chosen to use ^Z as the "I don't
> know what this character is" character (I would have guessed '?'
> instead, but I suppose they decided to go with the less-commonly used
> variant).

     It makes more sense when you think of 26 not as ^Z, but as SUB.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
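
     For anyone who wants to see the substitution happen, here is a minimal sketch. It assumes the "hyphen" in question is really something outside ASCII, such as U+2013 EN DASH; the thread does not say which character or which encoder was involved. The class name SubDemo and the method encodeWithSub are made up for illustration. As far as I know, Java's own US-ASCII encoder substitutes '?' (63) by default, so the sketch explicitly asks for SUB (26) via CharsetEncoder.replaceWith to mimic what the original poster saw.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not taken from the original thread.
class SubDemo {
    // ASCII code 26: the SUB (SUBSTITUTE) control character, also typed as ^Z.
    static final byte SUB = 0x1A;

    // Encode text as US-ASCII, replacing anything unencodable with SUB.
    static byte[] encodeWithSub(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { SUB });
        try {
            ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
            byte[] out = new byte[encoded.remaining()];
            encoded.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected, since both error actions are set to REPLACE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "a\u2013b"; // assumed input: 'a', EN DASH, 'b'

        // Default replacement for US-ASCII in Java is '?'.
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.US_ASCII)));
        // prints [97, 63, 98]

        // Same text with SUB chosen as the replacement byte.
        System.out.println(Arrays.toString(encodeWithSub(text)));
        // prints [97, 26, 98]
    }
}
```

     Either way the dash is gone for good: the output only records that something unencodable was there, which is exactly the job SUB was defined for.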