| From | bob <bob@coolgroups.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: ascii char 26 |
| Date | 2011-09-11 19:12 -0700 |
| Organization | http://groups.google.com |
| Message-ID | <63554bdb-dab4-43e7-b809-5128fd831f3c@m38g2000vbn.googlegroups.com> |
| References | <16f8836c-27b9-483b-a71f-61d7d6cfd188@i2g2000yqm.googlegroups.com> <j4jakd$dfl$1@dont-email.me> |
You're right. I messed up: it was the em dash. It turned into byte 26
(0x1A) after going through 'b = html.getBytes("US-ASCII");'.
Here's the new code:
public static String convertToAscii(String html) {
    // Plain character replacement; no regex is needed for single characters.
    html = html.replace('\u2019', '\''); // right single quotation mark
    html = html.replace('\u201D', '"');  // right double quotation mark
    html = html.replace('\u201C', '"');  // left double quotation mark
    html = html.replace('\u2014', '-');  // em dash
    // Any other non-ASCII character is still present in the returned string.
    return html;
}
Also, I'm on Android 2.1, so import java.text.Normalizer; doesn't
work.
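
Without java.text.Normalizer, one option is a small hand-written mapping. This is only a rough sketch, not a tested replacement for a real normalizer: the class name AsciiFallback, the method toAscii, the list of mapped characters, and the choice of '?' for everything unmapped are all assumptions made for illustration. It uses only String and StringBuilder, so it should not depend on newer platform APIs.

```java
/** Hypothetical helper: maps a few typographic characters to ASCII by hand. */
final class AsciiFallback {
    private AsciiFallback() {}

    /** Returns a copy of text in which every char is in the range 0-127. */
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u2018': // left single quotation mark
                case '\u2019': // right single quotation mark
                    out.append('\'');
                    break;
                case '\u201C': // left double quotation mark
                case '\u201D': // right double quotation mark
                    out.append('"');
                    break;
                case '\u2013': // en dash
                case '\u2014': // em dash
                case '\u2212': // minus sign
                    out.append('-');
                    break;
                case '\u2026': // horizontal ellipsis
                    out.append("...");
                    break;
                case '\u00A0': // no-break space
                    out.append(' ');
                    break;
                default:
                    // Assumption: anything else outside ASCII becomes '?'.
                    // A surrogate pair therefore turns into two '?' characters.
                    out.append(c < 128 ? c : '?');
            }
        }
        return out.toString();
    }
}
```

Calling AsciiFallback.toAscii(html) before any getBytes("US-ASCII") call means the encoder never sees a character it has to substitute, so the result does not depend on which substitution byte the platform picks.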
On Sep 11, 4:52 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> On 9/11/2011 4:33 PM, bob wrote:
>
> > Anyone know why ASCII char 26 is used in place of a hyphen in UTF-8?
>
> The US-ASCII encoder only properly encodes characters in the range of
> 0-127, i.e., the characters that are present in ASCII. Any other
> character is replaced with some sort of substitution character; in this
> case, it looks like the charset has chosen to use ^Z as the "I don't
> know what this character is" character (I would have guessed '?'
> instead, but I suppose they decided to go with the less-commonly used
> variant).
>
> My guess is your input is using one of the characters like the minus
> sign, em dash, or perhaps an en dash instead (there may be others),
> which are visually close in appearance to a hyphen but do not share the
> same Unicode codepoint.
>
> --
> Beware of bugs in the above code; I have only proved it correct, not
> tried it. -- Donald E. Knuth
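
To see which character is tripping the encoder, and what the platform substitutes for it, something like the following might help. It is a sketch only: the class name AsciiCheck and both method names are made up for this example. The substitution bytes are read from the encoder at run time instead of being hard-coded, since they can differ between runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

/** Hypothetical diagnostic helper for US-ASCII encoding problems. */
final class AsciiCheck {
    private AsciiCheck() {}

    /** Returns the index of the first char US-ASCII cannot encode, or -1 if none. */
    static int firstNonAscii(String text) {
        CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder();
        for (int i = 0; i < text.length(); i++) {
            if (!encoder.canEncode(text.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the bytes this runtime's US-ASCII encoder uses as its replacement. */
    static byte[] substitutionBytes() {
        return Charset.forName("US-ASCII").newEncoder().replacement();
    }
}
```

For example, AsciiCheck.firstNonAscii("a\u2014b") returns 1, the position of the em dash, and printing substitutionBytes()[0] shows the byte value that ends up in the output in its place.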