| Date | 2011-02-04 18:41 -0500 |
|---|---|
| From | Arne Vajhøj <arne@vajhoej.dk> |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals? |
| References | (3 earlier) <4d4c2019$0$23753$14726298@news.sunsite.dk> <iihbuo$cqo$1@news.eternal-september.org> <iihhdo$emc$1@news.eternal-september.org> <alpine.DEB.1.10.1102042036190.11442@urchin.earth.li> <4p1pk6dv6fg1firm1hvvh3jqaga6l69rib@4ax.com> |
| Message-ID | <4d4c8ea6$0$23758$14726298@news.sunsite.dk> |
| Organization | SunSITE.dk - Supporting Open source |

On 04-02-2011 18:22, Roedy Green wrote:
> On Fri, 4 Feb 2011 21:30:57 +0000, Tom Anderson <twic@urchin.earth.li>
> wrote, quoted or indirectly quoted someone who said :
>
>> I am, however, at a loss to suggest a practical alternative!
>
> What might happen is strings are nominally 32-bit.
>
> You could probably come up with a very rapid compression scheme,
> similar to UTF-8 but with a bit more compression, that could be
> applied to strings at garbage collection time if they have not been
> referenced since the last GC sweep.
>
> Strings are immutable. This admits some other flavours of
> "compression".
>
> If the high three bytes of every char in the string are 0, store the
> string UNCOMPRESSED, as a string of bytes. All the indexOf indexing
> arithmetic works identically. This behaviour is hidden inside the
> JVM. The String class knows nothing about it. It is an implementation
> detail of 32-bit strings.
>
> If the high two bytes of every char in the string are 0, store the
> string uncompressed as a string of unsigned shorts.
>
> If there are any one bits in the high 2 bytes, store as a string of
> unsigned ints.
>
> Strings are what you gobble up your RAM with. If we start supporting
> 32 bit chars, we have to do something to compensate for the doubling
> of RAM use.
>
>
> Short lived strings would still be 32-bit. They would only be
> converted to the other forms if they have been sitting around for a
> while. Interned strings would be immediately converted to canonical
> form.

indexOf works fine with compression, but substring and charAt
become rather expensive.

Arne
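
For anyone who wants to see the trade-off in code, here is a minimal Java sketch of the two storage ideas in this exchange. It is only an illustration under stated assumptions: `CompactString`, `codePointAtUtf8` and `utf8SeqLength` are hypothetical names, not JDK APIs, and the quoted proposal places this logic inside the JVM rather than in a library class. The first part picks 1, 2 or 4 bytes per code point from the largest code point in the string, so indexed access stays constant time. The second part shows indexed access on a UTF-8 byte array, which has to scan from the start; it assumes well-formed UTF-8 and an in-range index.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical fixed-width string: every code point uses the same number of bytes. */
public final class CompactString {
    private final byte[] data;  // packed code points, big-endian within each slot
    private final int width;    // bytes per code point: 1, 2 or 4
    private final int length;   // number of code points

    public CompactString(String s) {
        int[] cps = s.codePoints().toArray();
        int max = 0;
        for (int cp : cps) {
            max = Math.max(max, cp);
        }
        // Width is chosen once per string from its largest code point.
        width = max <= 0xFF ? 1 : max <= 0xFFFF ? 2 : 4;
        length = cps.length;
        data = new byte[length * width];
        for (int i = 0; i < length; i++) {
            for (int b = 0; b < width; b++) {
                data[i * width + b] = (byte) (cps[i] >>> (8 * (width - 1 - b)));
            }
        }
    }

    public int width() { return width; }

    public int length() { return length; }

    /** Constant time: the slot offset is index * width for every width. */
    public int codePointAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int cp = 0;
        for (int b = 0; b < width; b++) {
            cp = (cp << 8) | (data[index * width + b] & 0xFF);
        }
        return cp;
    }

    /** Linear time: a variable-length encoding must be scanned from the start. */
    public static int codePointAtUtf8(byte[] utf8, int index) {
        int pos = 0;
        for (int i = 0; i < index; i++) {
            pos += utf8SeqLength(utf8[pos]);  // skip one whole encoded code point
        }
        int len = utf8SeqLength(utf8[pos]);
        return new String(utf8, pos, len, StandardCharsets.UTF_8).codePointAt(0);
    }

    /** Length of a UTF-8 sequence, read from its lead byte (valid input assumed). */
    private static int utf8SeqLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b < 0xE0) return 2;
        if (b < 0xF0) return 3;
        return 4;
    }
}
```

With the fixed-width layout, `new CompactString("hello")` uses one byte per character and a string containing a supplementary character uses four, while `codePointAt` costs the same in both cases. The UTF-8-style scan is where indexed access gets expensive, which is one way to read the objection above about charAt and substring.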
Re: Why No Supplemental Characters In Character Literals? "Mike Schilling" <mscottschilling@hotmail.com> - 2011-02-04 09:10 -0800
Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals? Roedy Green <see_website@mindprod.com.invalid> - 2011-02-04 15:22 -0800
Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals? Arne Vajhøj <arne@vajhoej.dk> - 2011-02-04 18:41 -0500
Re: Why No Supplemental Characters In Character Literals? Arne Vajhøj <arne@vajhoej.dk> - 2011-02-04 18:12 -0500
Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals? Tom Anderson <twic@urchin.earth.li> - 2011-02-04 21:30 +0000
Re: Efficient unicode string implementation was: Re: Why No Supplemental Characters In Character Literals? Ken Wesson <kwesson@gmail.com> - 2011-02-05 04:25 +0100
Re: Why No Supplemental Characters In Character Literals? Arne Vajhøj <arne@vajhoej.dk> - 2011-02-04 12:33 -0500
Re: Why No Supplemental Characters In Character Literals? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2011-02-04 13:44 -0500
Re: Why No Supplemental Characters In Character Literals? Roedy Green <see_website@mindprod.com.invalid> - 2011-02-04 15:08 -0800