Re: replace extended characters

From Lew <noone@lewscanon.com>
Newsgroups comp.lang.java.programmer
Subject Re: replace extended characters
Date 2011-02-10 19:18 -0500
Organization albasani.net
Message-ID <ij1v7l$jel$1@news.albasani.net>
References <15bd3363-c781-487b-98d5-2243eff7cc8f@24g2000yqa.googlegroups.com>


VIDEO MAN wrote:
> I'm trying to create a java [sic] utility that will read in a file that may
> or may not contain extended ascii [sic] characters and replace these
> characters with a predetermined character [sic] e.g. [sic] replace é with e and
> then write the amended file out.
>
> How would people suggest I approach this from an efficiency  point of
> view given that the input files could be pretty large?
>
> Any guidance appreciated.

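Read from a BufferedReader.  Write to a BufferedWriter.  Process one character
at a time.  The replacement won't be efficient unless you are guaranteed a
limited character-set input.  The Unicode code space holds a little over 2^20
code points (1,114,112, from U+0000 through U+10FFFF).  "Extended ASCII" is a
very tiny subset of that, and which characters it covers depends on the
character encoding: ISO-8859-1 and code page 437, for instance, assign
different characters to the same byte values above 0x7F, so the reader must be
opened with the encoding the file was actually written in.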
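A minimal sketch of the read/process/write loop described above, assuming Java 8 or later. The class and method names (CharStreamFilter, filter, filterFile) are hypothetical, and the substitution rule is passed in as an IntUnaryOperator because the post leaves the rule open. The charset argument is an assumption the caller has to get right.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.IntUnaryOperator;

/** Hypothetical helper: streams characters through a substitution function. */
final class CharStreamFilter {

    private CharStreamFilter() {}

    /**
     * Copies in to out one char at a time, writing substitute(c) in place of
     * each char c. Flushes out but closes neither stream.
     */
    public static void filter(BufferedReader in, BufferedWriter out,
                              IntUnaryOperator substitute) throws IOException {
        int c;
        // read() returns one UTF-16 code unit as an int, or -1 at end of stream
        while ((c = in.read()) != -1) {
            // write(int) emits the low 16 bits of its argument as one char
            out.write(substitute.applyAsInt(c));
        }
        out.flush();
    }

    /**
     * File-to-file variant. cs must match the encoding the input file was
     * written in; the output is written in the same charset (an assumption).
     */
    public static void filterFile(Path src, Path dst, Charset cs,
                                  IntUnaryOperator substitute) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(src, cs);
             BufferedWriter out = Files.newBufferedWriter(dst, cs)) {
            filter(in, out, substitute);
        }
    }
}
```

For example, `CharStreamFilter.filterFile(Paths.get("in.txt"), Paths.get("out.txt"), StandardCharsets.ISO_8859_1, c -> c == 'é' ? 'e' : c)` would rewrite a Latin-1 file with é replaced by e; the file names and charset are placeholders. Because both streams are buffered, most read() and write(int) calls touch only an in-memory buffer, so processing one character at a time does not mean one disk access per character.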

If you are certain that the set of possible input characters is small, and the
set you wish to substitute even smaller, you can use a lookup table.  Use a
'Map<Character,Character>' holding entries for those characters, and only
those, that you wish to substitute.  (It cannot handle supplementary code
points, i.e., those above U+FFFF, because each occupies two Java chars.)  If
the key is absent, pass the source character through unchanged.  If it is
present, replace the character with the associated value.
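A sketch of that lookup table, assuming a hypothetical class CharSubstitution with a small sample table (é and è to e, ü to u); the real entries depend on your input. Map.getOrDefault (Java 8+) expresses "pass through if absent, otherwise replace" in a single call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: a lookup table holding only the characters to replace. */
final class CharSubstitution {

    // Sample entries, assumed for illustration; unlisted characters pass through.
    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00e9', 'e'); // é -> e
        TABLE.put('\u00e8', 'e'); // è -> e
        TABLE.put('\u00fc', 'u'); // ü -> u
    }

    private CharSubstitution() {}

    /** Returns the table's replacement for c, or c itself if c has no entry. */
    public static char substitute(char c) {
        return TABLE.getOrDefault(c, c);
    }

    /**
     * Applies substitute() to every char of s. A supplementary character is
     * stored as two surrogate chars, so a Map<Character,Character> cannot map
     * it as one unit; with these sample entries it passes through unchanged.
     */
    public static String substituteAll(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(substitute(s.charAt(i)));
        }
        return sb.toString();
    }
}
```

If supplementary characters must be replaced as well, the same design works with Integer keys holding code points, iterating with String.codePoints() (Java 8 and later).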

-- 
Lew
Ceci n'est pas une fenêtre.
.___________.
|###] | [###|
|##/  | *\##|
|#/ * |   \#|
|#----|----#|
||    |  * ||
|o *  |    o|
|_____|_____|
|===========|


Thread

replace extended characters VIDEO MAN <bigmush7@googlemail.com> - 2011-02-10 15:33 -0800
  Re: replace extended characters RedGrittyBrick <RedGrittyBrick@spamweary.invalid> - 2011-02-11 15:31 +0000
  Re: replace extended characters Joshua Cranmer <Pidgeot18@verizon.invalid> - 2011-02-11 18:40 -0500
  Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 16:57 -0800
    Re: replace extended characters v_borchert@despammed.com (Volker Borchert) - 2011-02-12 05:58 +0000
  Re: replace extended characters Arne Vajhøj <arne@vajhoej.dk> - 2011-02-10 21:52 -0500
  Re: replace extended characters Joshua Cranmer <Pidgeot18@verizon.invalid> - 2011-02-10 19:37 -0500
  Re: replace extended characters Lew <noone@lewscanon.com> - 2011-02-10 19:18 -0500
    Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 16:55 -0800
  Re: replace extended characters Owen Jacobson <angrybaldguy@gmail.com> - 2011-02-11 22:15 -0500
  Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 15:07 -0800
    Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 15:11 -0800

csiph-web