| Path | csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!border3.nntp.dca.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!postnews.google.com!m38g2000vbn.googlegroups.com!not-for-mail |
|---|---|
| From | bob <bob@coolgroups.com> |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: ascii char 26 |
| Date | Sun, 11 Sep 2011 19:12:28 -0700 (PDT) |
| Organization | http://groups.google.com |
| Lines | 52 |
| Message-ID | <63554bdb-dab4-43e7-b809-5128fd831f3c@m38g2000vbn.googlegroups.com> |
| References | <16f8836c-27b9-483b-a71f-61d7d6cfd188@i2g2000yqm.googlegroups.com> <j4jakd$dfl$1@dont-email.me> |
| NNTP-Posting-Host | 64.134.125.149 |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset=ISO-8859-1 |
| Content-Transfer-Encoding | quoted-printable |
| X-Trace | posting.google.com 1315793651 845 127.0.0.1 (12 Sep 2011 02:14:11 GMT) |
| X-Complaints-To | groups-abuse@google.com |
| NNTP-Posting-Date | Mon, 12 Sep 2011 02:14:11 +0000 (UTC) |
| Complaints-To | groups-abuse@google.com |
| Injection-Info | m38g2000vbn.googlegroups.com; posting-host=64.134.125.149; posting-account=v1lx5wkAAAALWYfGBkwkMb2guPF9cW2u |
| User-Agent | G2/1.0 |
| X-Google-Web-Client | true |
| X-Google-Header-Order | HUALESNKRC |
| X-HTTP-UserAgent | Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:6.0.2) Gecko/20100101 Firefox/6.0.2,gzip(gfe) |
| Xref | x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:7852 |
You're right. I messed up, and it was the em dash. It turned into
byte 26 (the ASCII SUB control character, ^Z) after going through
'b = html.getBytes("US-ASCII");'
Here's the new code:
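import java.io.UnsupportedEncodingException;

public static String convertToAscii(String html) {
    // Literal replacements, so replace() rather than the regex-based replaceAll()
    html = html.replace("\u2019", "'");   // right single quotation mark
    html = html.replace("\u201D", "\"");  // right double quotation mark
    html = html.replace("\u201C", "\"");  // left double quotation mark
    html = html.replace("\u2014", "-");   // em dash
    try {
        // Round-trip through US-ASCII so the result contains only ASCII;
        // anything still unmapped becomes the encoder's substitution byte.
        byte[] b = html.getBytes("US-ASCII");
        return new String(b, "US-ASCII");
    } catch (UnsupportedEncodingException e) {
        // US-ASCII is a required charset, so this should not happen
        throw new RuntimeException(e);
    }
}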
Also, I'm on Android 2.1, so import java.text.Normalizer; doesn't
work.
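Without java.text.Normalizer, one way to find out which characters
still need a hand-written replacement is to scan the input for
anything above 127 before encoding. A minimal sketch, assuming plain
java.util is all that's available; the names NonAsciiReport and
findNonAscii are made up for illustration:

```java
import java.util.LinkedHashSet;
import java.util.Set;

class NonAsciiReport {
    // Returns the distinct non-ASCII chars in text, in order of first
    // appearance, formatted like "U+2014".
    static Set<String> findNonAscii(String text) {
        Set<String> found = new LinkedHashSet<String>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                found.add(String.format("U+%04X", (int) c));
            }
        }
        return found;
    }
}
```

Logging the result for a sample page shows which code points are
slipping through, so each one can get its own replace() line.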
On Sep 11, 4:52 pm, Joshua Cranmer <Pidgeo...@verizon.invalid> wrote:
> On 9/11/2011 4:33 PM, bob wrote:
>
> > Anyone know why ASCII char 26 is used in place of a hyphen in UTF-8?
>
> The US-ASCII encoder only properly encodes characters in the range of
> 0-127, i.e., the characters that are present in ASCII. Any other
> character is replaced with some sort of substitution character; in this
> case, it looks like the charset has chosen to use ^Z as the "I don't
> know what this character is" character (I would have guessed '?'
> instead, but I suppose they decided to go with the less-commonly used
> variant).
>
> My guess is your input is using one of the characters like the minus
> sign, em dash, or perhaps an en dash instead (there may be others),
> which are visually close in appearance to a hyphen but do not share the
> same Unicode codepoint.
>
> --
> Beware of bugs in the above code; I have only proved it correct, not
> tried it. -- Donald E. Knuth
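
The substitution behavior described above can be made explicit rather
than left to the platform default (byte 26 here on Android; desktop
JVMs typically use '?'). A rough sketch using java.nio.charset, where
the class and method names (AsciiSubstitution, encodeWithReplacement)
are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class AsciiSubstitution {
    // Encodes text as US-ASCII, writing the given byte for every character
    // the charset cannot represent. The replacement must itself be ASCII.
    static byte[] encodeWithReplacement(String text, byte replacement) {
        CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { replacement });
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE, not REPORT
            throw new IllegalStateException(e);
        }
    }
}
```

Passing (byte) '?' gives a visible placeholder, and passing (byte) 26
reproduces the ^Z seen in this thread. Switching both actions to
CodingErrorAction.REPORT would instead make encode() throw on the first
unmappable character, which is handy for catching the problem early.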