Re: replace extended characters

From Roedy Green <see_website@mindprod.com.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: replace extended characters
Date 2011-02-11 15:07 -0800
Organization Canadian Mind Products
Message-ID <i4gbl6houh7bon6rvfjii3rdel829ql2hi@4ax.com>
References <15bd3363-c781-487b-98d5-2243eff7cc8f@24g2000yqa.googlegroups.com>


On Thu, 10 Feb 2011 15:33:39 -0800 (PST), VIDEO MAN
<bigmush7@googlemail.com> wrote, quoted or indirectly quoted someone
who said :

>I'm trying to create a java utility that will read in a file that may
>or may not contain extended ascii characters and replace these
>characters with a predetermined character e.g. replace é with e and
>then write the amended file out.
>
>How would people suggest I approach this from an efficiency point of
>view given that the input files could be pretty large?

Have a look at http://mindprod.com/products1.html#ENTITIES

It includes a program called Entify that finds awkward chars and
replaces them with &xxxx; entities in a set of files. There is also a
program that does the reverse, DeEntify.

You could use the code almost as is and simply modify the table of
entities with your unaccented versions of the chars.
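
To give a feel for the table-driven idea, here is a minimal sketch. It is
not the Entify source; the class name Unaccent, the contents of the table
and the default ISO-8859-1 encoding are all assumptions made for
illustration. It streams through a buffer, so a large input file never has
to fit in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of the Entities package.
class Unaccent {
    // Assumed replacement table: extend it with the characters you care about.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u00e0', "a"); // a grave
        TABLE.put('\u00e7', "c"); // c cedilla
        TABLE.put('\u00e8', "e"); // e grave
        TABLE.put('\u00e9', "e"); // e acute
        TABLE.put('\u00ea', "e"); // e circumflex
        TABLE.put('\u00ef', "i"); // i diaeresis
        TABLE.put('\u00f1', "n"); // n tilde
        TABLE.put('\u00f6', "o"); // o diaeresis
        TABLE.put('\u00fc', "u"); // u diaeresis
    }

    // Copy in to out one buffer at a time, replacing any char found in TABLE.
    static void translate(Reader in, Writer out) throws IOException {
        char[] buf = new char[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buf[i];
                if (c < 128) {
                    out.write(c); // plain ASCII: skip the table lookup
                } else {
                    String r = TABLE.get(c);
                    if (r != null) {
                        out.write(r);
                    } else {
                        out.write(c); // no table entry: pass through unchanged
                    }
                }
            }
        }
    }

    // Convenience wrapper for short strings.
    static String translate(String s) {
        StringWriter out = new StringWriter();
        try {
            translate(new StringReader(s), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Usage: java Unaccent infile outfile [encoding]
    public static void main(String[] args) throws IOException {
        // Assumption: the input is ISO-8859-1 unless an encoding is given.
        Charset cs = Charset.forName(args.length > 2 ? args[2] : "ISO-8859-1");
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]), cs);
             Writer out = Files.newBufferedWriter(Paths.get(args[1]), cs)) {
            translate(in, out);
        }
    }
}
```

Getting the encoding right matters more than the loop itself. If the file
is really UTF-8 or windows-1252 and you read it as something else, the
lookups will miss.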


-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
