
Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals?

From Roedy Green <see_website@mindprod.com.invalid>
Newsgroups comp.lang.java.programmer
Subject Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals?
Date 2011-02-04 15:22 -0800
Organization Canadian Mind Products
Message-ID <4p1pk6dv6fg1firm1hvvh3jqaga6l69rib@4ax.com>
References (2 earlier) <iig84e$uqu$1@lust.ihug.co.nz> <4d4c2019$0$23753$14726298@news.sunsite.dk> <iihbuo$cqo$1@news.eternal-september.org> <iihhdo$emc$1@news.eternal-september.org> <alpine.DEB.1.10.1102042036190.11442@urchin.earth.li>



On Fri, 4 Feb 2011 21:30:57 +0000, Tom Anderson <twic@urchin.earth.li>
wrote, quoted or indirectly quoted someone who said :

>I am, however, at a loss to suggest a practical alternative!

What might happen is that strings become nominally 32-bit, i.e. 32
bits per character.

You could probably come up with a very rapid compression scheme,
similar to UTF-8 but with a bit more compression, that could be
applied to strings at garbage collection time if they have not been
referenced since the last GC sweep.

Strings are immutable.  This admits some other flavours of
"compression".

If the high three bytes of every character in the string are 0, store
the string UNCOMPRESSED, as a string of bytes.  All the indexOf
indexing arithmetic works identically, since every character still has
the same fixed width.  This behaviour is hidden inside the JVM.  The
String class knows nothing about it.  It is an implementation detail
of 32-bit strings.

If the high two bytes of every character in the string are 0, store
the string uncompressed as a string of unsigned shorts.

If there are any one bits in the high two bytes of any character,
store the string as a string of unsigned ints.
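
A rough sketch of those three cases in ordinary Java, just to make the
idea concrete.  The real thing would live inside the JVM, not in a
class like this.  The name PackedString and all its methods are made
up for illustration.

```java
/** Hypothetical sketch: one fixed width per string, chosen by content. */
final class PackedString {
    // Holds a byte[], short[] or int[] depending on the widest character.
    private final Object store;
    private final int length;

    PackedString(int[] codePoints) {
        length = codePoints.length;
        int all = 0;
        for (int cp : codePoints) {
            all |= cp; // OR everything together to see which high bits are used
        }
        if ((all & 0xFFFFFF00) == 0) {
            // High three bytes are 0 everywhere: one byte per character.
            byte[] b = new byte[length];
            for (int i = 0; i < length; i++) b[i] = (byte) codePoints[i];
            store = b;
        } else if ((all & 0xFFFF0000) == 0) {
            // High two bytes are 0 everywhere: one unsigned short per character.
            short[] s = new short[length];
            for (int i = 0; i < length; i++) s[i] = (short) codePoints[i];
            store = s;
        } else {
            // Some character needs more than 16 bits: keep full ints.
            store = codePoints.clone();
        }
    }

    int length() { return length; }

    int bytesPerChar() {
        return store instanceof byte[] ? 1 : store instanceof short[] ? 2 : 4;
    }

    /** Same index arithmetic for all three forms; only the element width differs. */
    int codePointAt(int i) {
        if (store instanceof byte[]) return ((byte[]) store)[i] & 0xFF;
        if (store instanceof short[]) return ((short[]) store)[i] & 0xFFFF;
        return ((int[]) store)[i];
    }

    int indexOf(int cp) {
        for (int i = 0; i < length; i++) {
            if (codePointAt(i) == cp) return i;
        }
        return -1;
    }
}
```

Because every form is fixed width, the caller never needs to know
which one it got.  The masks (& 0xFF, & 0xFFFF) are there because Java
bytes and shorts are signed.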

Strings are what you gobble up your RAM with.  If we start supporting
32-bit chars, we have to do something to compensate for the doubling
of RAM use compared with today's 16-bit chars.
 

Short lived strings would still be 32-bit.  They would only be
converted to the other forms if they have been sitting around for a
while.  Interned strings would be immediately converted to canonical
form.
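
A toy model of that ageing policy, again only a sketch.  Ordinary Java
code cannot hook the garbage collector like this, so onSweep() below
is a stand-in that you call by hand.  AgingString and its methods are
invented names, and for brevity only the one-byte form is handled.

```java
/** Hypothetical sketch: stay 32-bit while young, compact when idle. */
final class AgingString {
    private int[] wide;     // 32-bit form, used while the string is young
    private byte[] narrow;  // 8-bit form, used once compacted
    private boolean touched = true; // creation counts as a reference

    AgingString(int[] codePoints, boolean interned) {
        wide = codePoints.clone();
        if (interned) {
            compact(); // interned strings are converted immediately
        }
    }

    int codePointAt(int i) {
        touched = true; // remember the reference for the next sweep
        return narrow != null ? (narrow[i] & 0xFF) : wide[i];
    }

    boolean isCompacted() { return narrow != null; }

    /** Stand-in for the GC sweep; a real JVM would drive this internally. */
    void onSweep() {
        if (!touched) {
            compact(); // not referenced since the last sweep
        }
        touched = false;
    }

    private void compact() {
        if (narrow != null) return;
        for (int cp : wide) {
            // This sketch gives up if any character needs more than 8 bits.
            if ((cp & 0xFFFFFF00) != 0) return;
        }
        byte[] b = new byte[wide.length];
        for (int i = 0; i < wide.length; i++) b[i] = (byte) wide[i];
        narrow = b;
        wide = null; // release the 32-bit copy
    }
}
```

A string that dies before its first sweep never pays for conversion,
which is the point of leaving short-lived strings alone.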

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
To err is human, but to really foul things up requires a computer.
~ Farmer's Almanac
It is breathtaking how a misplaced comma in a computer program can
shred megabytes of data in seconds.
