

Groups > comp.lang.java.programmer > #25475 > unrolled thread

Re: replace extended characters

Started by: v_borchert@despammed.com (Volker Borchert)
First post: 2011-02-11 03:24 +0000
Last post: 2011-02-12 05:58 +0000
Articles: 4 (3 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: replace extended characters v_borchert@despammed.com (Volker Borchert) - 2011-02-11 03:24 +0000
    Re: replace extended characters RedGrittyBrick <RedGrittyBrick@spamweary.invalid> - 2011-02-11 15:31 +0000
    Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 16:57 -0800
      Re: replace extended characters v_borchert@despammed.com (Volker Borchert) - 2011-02-12 05:58 +0000

#25475 — Re: replace extended characters

From: v_borchert@despammed.com (Volker Borchert)
Date: 2011-02-11 03:24 +0000
Subject: Re: replace extended characters
Message-ID: <ij2a4r$llo$1@Gaia.teknon.de>

VIDEO MAN wrote:
> Hi,
> 
> I'm trying to create a java utility that will read in a file that may
> or may not contain extended ascii characters and replace these
> characters with a predetermined character e.g. replace é with e and
> then write the amended file out.
> 
> How would people suggest I approach this from an efficiency  point of
> view given that the input files could be pretty large?
> 
> Any guidance appreciated.

Don't reinvent the wheel, use tr

-- 

"I'm a doctor, not a mechanic." Dr Leonard McCoy <mccoy@ncc1701.starfleet.fed>
"I'm a mechanic, not a doctor." Volker Borchert  <v_borchert@despammed.com>



#25763

From: RedGrittyBrick <RedGrittyBrick@spamweary.invalid>
Date: 2011-02-11 15:31 +0000
Message-ID: <4d555663$0$2510$db0fefd9@news.zen.co.uk>
In reply to: #25475

On 11/02/2011 03:24, Volker Borchert wrote:
> VIDEO MAN wrote:
>> Hi,
>>
>> I'm trying to create a java utility that will read in a file that may
>> or may not contain extended ascii characters and replace these
>> characters with a predetermined character e.g. replace é with e and
>> then write the amended file out.
>>
>> How would people suggest I approach this from an efficiency  point of
>> view given that the input files could be pretty large?
>>
>> Any guidance appreciated.
>
> Don't reinvent the wheel, use tr
>

Or at least consider iconv or recode. At a minimum I'd see how they 
handle the mapping (if any).

The term "Extended ASCII" covers several handfuls of different 8-bit
character sets and encodings, many of which include é. If the file "may or
may not" contain "extended" characters, you will at a minimum have to
assume a specific encoding. If this assumption is wrong you may
translate Ú to e by mistake.
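
For anyone who still wants to do this in Java, here is a minimal sketch of the point above, not a definitive implementation. It assumes the input is ISO-8859-1 (that is only a guess; substitute whatever the file really uses), and it assumes that "replace é with e" means stripping diacritics via Unicode decomposition rather than a hand-written mapping table. The class and method names (AccentStripper, stripAccents, convert, decodeByte) are made up for illustration. Byte 0xE9 is é in ISO-8859-1 but Ú in code page 850, which is how a wrong encoding guess ends up turning Ú into e.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.text.Normalizer;

// Hypothetical helper class; none of these names come from the thread.
final class AccentStripper {

    // Decompose accented letters (NFD) and drop the combining marks,
    // so U+00E9 (e with acute) becomes a plain 'e'.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Stream line by line so memory use does not grow with file size.
    // Note: line endings are rewritten as '\n' and the last line always
    // gets one, even if the input had no trailing newline.
    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(stripAccents(line));
            out.write('\n');
        }
        out.flush();
    }

    // Decode a single byte under a given charset, to show how the
    // encoding assumption changes which character you think you read.
    static String decodeByte(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs);
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: the file is ISO-8859-1. Change this if it is not.
        Charset assumed = StandardCharsets.ISO_8859_1;

        // Same byte, two interpretations. IBM850 is not guaranteed to be
        // present in every Java runtime, so check before using it.
        System.out.println(stripAccents(decodeByte(0xE9, assumed)));
        if (Charset.isSupported("IBM850")) {
            System.out.println(stripAccents(decodeByte(0xE9, Charset.forName("IBM850"))));
        }

        if (args.length == 2) {
            // Write with the same charset: letters that do not decompose
            // stay as they are and must remain encodable on output.
            try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), assumed);
                 BufferedWriter out = Files.newBufferedWriter(Paths.get(args[1]), assumed)) {
                convert(in, out);
            }
        }
    }
}
```

Run it as "java AccentStripper in.txt out.txt" (file names are placeholders). Letters with no canonical decomposition, such as ß or ø, pass through unchanged, so a real utility would probably still need an explicit mapping for those.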

-- 
RGB



#25943

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2011-02-11 16:57 -0800
Message-ID: <3mmbl65838teo7efmij0f5r1hqcs5189cl@4ax.com>
In reply to: #25475

On 11 Feb 2011 03:24:11 GMT, v_borchert@despammed.com (Volker
Borchert) wrote, quoted or indirectly quoted someone who said :

>Don't reinvent the wheel, use tr

Does that not presume Unix? Or is there a decent Java/Windows
implementation?

There is also native2ascii; see
http://mindprod.com/jgloss/native2ascii.html

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
.



#26120

From: v_borchert@despammed.com (Volker Borchert)
Date: 2011-02-12 05:58 +0000
Message-ID: <ij57in$h9f$1@Gaia.teknon.de>
In reply to: #25943

Roedy Green wrote:
> On 11 Feb 2011 03:24:11 GMT, v_borchert@despammed.com (Volker
> Borchert) wrote, quoted or indirectly quoted someone who said :
> 
> >Don't reinvent the wheel, use tr
> 
> Does that not presume Unix? or is there a decent Java/Windows
> implementation?

Cygwin works for me.

-- 

"I'm a doctor, not a mechanic." Dr Leonard McCoy <mccoy@ncc1701.starfleet.fed>
"I'm a mechanic, not a doctor." Volker Borchert  <v_borchert@despammed.com>


