Re: replace extended characters

From Roedy Green <see_website@mindprod.com.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: replace extended characters
Date 2011-02-11 15:11 -0800
Organization Canadian Mind Products
Message-ID <0bgbl698d19hku2vlldf5rldbsebis933u@4ax.com> (permalink)
References <15bd3363-c781-487b-98d5-2243eff7cc8f@24g2000yqa.googlegroups.com> <i4gbl6houh7bon6rvfjii3rdel829ql2hi@4ax.com>



On Fri, 11 Feb 2011 15:07:22 -0800, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>
>Have a look at http://mindprod.com/products1.html#ENTITIES
>
>It includes a program called Entify that finds awkward chars and
>replaces them with &xxxx; entities in a set of files. There is also a
>program that does the reverse, DeEntify.
>
>You could use the code almost as is and simply modify the table of
>entities with your unaccented versions of the chars.

My version reads the entire file into RAM in one I/O.  You could
modify it to read one line of a file at a time. For the code to do
that, see http://mindprod.com/applet/fileio.html
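
A rough sketch of the line-at-a-time variant, in case it helps. This is not the Entify code; the class name, the buffer size and the transform() placeholder are all made up for illustration, and it assumes you are happy to normalise line endings to '\n'.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class LineAtATime {
    // Hypothetical buffer size (1 Mi chars); tune it by measurement.
    static final int BUFFER_CHARS = 1 << 20;

    /** Placeholder for the real per-line replacement logic. */
    static String transform(String line) {
        return line.replace("\u00e9", "e"); // e-acute -> plain e
    }

    /** Reads one line at a time, transforms it, and writes it back out. */
    static void process(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in, BUFFER_CHARS);
        BufferedWriter writer = new BufferedWriter(out, BUFFER_CHARS);
        String line;
        // readLine() strips the line terminator, so one is written back.
        while ((line = reader.readLine()) != null) {
            writer.write(transform(line));
            writer.write('\n');
        }
        writer.flush();
    }

    /** Convenience wrapper for in-memory text. */
    static String processString(String text) {
        StringWriter out = new StringWriter();
        try {
            process(new StringReader(text), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(processString("caf\u00e9\nna\u00efve\n"));
    }
}
```

For a real file you would hand process() an InputStreamReader wrapped around a FileInputStream, with the charset given explicitly, and the matching OutputStreamWriter on the way out.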

By making whacking huge buffers, you can ensure the bottleneck is the
CPU.  I use a big switch statement.  For extra speed you could use an
array lookup of the replacement string for each char.  The
compiler/JVM is not all that clever about generating code for switch
statements.
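
The array-lookup idea might look something like this. It is only a sketch: the table covers just the first 256 chars, the handful of entries are examples rather than a complete mapping, and the names are invented. Whether it actually beats the switch on your JVM is something you would have to measure.

```java
class CharTable {
    // One slot per char in the Latin-1 range; null means "leave unchanged".
    static final String[] REPLACEMENT = new String[256];

    static {
        // Illustrative entries only, not a full table.
        REPLACEMENT[0xE0] = "a";  // a-grave
        REPLACEMENT[0xE9] = "e";  // e-acute
        REPLACEMENT[0xF1] = "n";  // n-tilde
        REPLACEMENT[0xFC] = "u";  // u-umlaut
        REPLACEMENT[0xDF] = "ss"; // sharp s
    }

    /** Replaces each char that has a table entry; all others pass through. */
    static String replace(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Chars beyond the table are copied unchanged.
            String r = c < REPLACEMENT.length ? REPLACEMENT[c] : null;
            if (r == null) {
                sb.append(c);
            } else {
                sb.append(r);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("caf\u00e9 ma\u00f1ana Stra\u00dfe"));
    }
}
```

The table is indexed directly by the char's numeric value, so each char costs one bounds check and one array load. Extending it to all 65536 chars is a matter of making the array bigger.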
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
.



Thread

replace extended characters VIDEO MAN <bigmush7@googlemail.com> - 2011-02-10 15:33 -0800
  Re: replace extended characters RedGrittyBrick <RedGrittyBrick@spamweary.invalid> - 2011-02-11 15:31 +0000
  Re: replace extended characters Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2011-02-10 21:27 -0400
    Re: replace extended characters Arne Vajhøj <arne@vajhoej.dk> - 2011-02-10 21:42 -0500
    Re: replace extended characters Lawrence D'Oliveiro <ldo@geek-central.gen.new_zealand> - 2011-02-11 15:35 +1300
    Re: replace extended characters Lew <noone@lewscanon.com> - 2011-02-10 21:29 -0500
  Re: replace extended characters Joshua Cranmer <Pidgeot18@verizon.invalid> - 2011-02-11 18:40 -0500
  Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 16:57 -0800
    Re: replace extended characters v_borchert@despammed.com (Volker Borchert) - 2011-02-12 05:58 +0000
  Re: replace extended characters Arne Vajhøj <arne@vajhoej.dk> - 2011-02-10 21:52 -0500
  Re: replace extended characters Joshua Cranmer <Pidgeot18@verizon.invalid> - 2011-02-10 19:37 -0500
  Re: replace extended characters Lew <noone@lewscanon.com> - 2011-02-10 19:18 -0500
    Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 16:55 -0800
  Re: replace extended characters Owen Jacobson <angrybaldguy@gmail.com> - 2011-02-11 22:15 -0500
  Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 15:07 -0800
    Re: replace extended characters Roedy Green <see_website@mindprod.com.invalid> - 2011-02-11 15:11 -0800
