From: markspace
Newsgroups: comp.lang.java.programmer
Subject: Re: email stop words
Date: Thu, 21 Mar 2013 09:33:12 -0700

On 3/21/2013 6:24 AM, Eric Sosman wrote:
>
>      Integer count = map.get(word);
>      map.put(word, count == null ? 1 : count + 1);

Basically, yes.

>
> ... and that you switched to something more like
>
>      Integer count = map.get(word);
>      map.put(word, new Integer(count == null
>          ? 1 : count.intValue() + 1));
>

No, I made a Counter with a primitive and a reference to the word:

     Counter counter = map.get( word );
     if( counter == null ) {
        counter = new Counter();
        counter.word = word;
        counter.count = 1;
        map.put( word, counter );
     } else {
        counter.count++;
     }

> If so, the slowdown is probably due to increased memory pressure
> and garbage collection: `new' actually creates a new object every

Yeah, that's what I thought too. Although, since there are only as many Counters as there are Strings (words), I don't see why merely doubling the number of objects would slow the system down as badly as it did. There should be only 4 million Strings and therefore also 4 million Counters.
I can't figure out why that would be a problem.

> time, while auto-boxing uses (the equivalent of) Integer.valueOf().
> The latter maintains a pool of a couple hundred small-valued Integers
> and doles them out whenever needed, using `new' only for un-pooled
> values.

I think it would be worth changing the JVM memory parameters from the defaults to see whether that makes a difference. Also, any thoughts on the best way to observe a GC that is thrashing? I'm really curious to pin this down to a root cause. I couldn't rule out a coding error somewhere, either.

> My suggestion would be to implement a Counter class that
> wraps a mutable integer value. Then you'd use

Thanks, I'll take a look at this when I get a chance. A good suggestion!

> Or, you could just go back to auto-boxing.

Yes, A-B-A testing works. Going back to auto-boxing restored the previous run times, so I'm fairly certain the slowdown is related to memory pressure or something similar.
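For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of the Counter approach described in this post. The names WordCounts and count(), and splitting the input on whitespace, are assumptions for illustration; the real program's input and tokenizing aren't shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical word counter (not the original program): one mutable
// Counter per distinct word, incremented in place instead of re-boxing.
public class WordCounts {

    // Mutable holder mirroring the post's Counter; keeping the word here
    // is redundant with the map key but matches the described design.
    static final class Counter {
        String word;
        int count;
    }

    // Counts each whitespace-separated token (tokenizing rule is assumed).
    static Map<String, Counter> count(String text) {
        Map<String, Counter> map = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields one empty token
            }
            Counter counter = map.get(word);
            if (counter == null) {
                counter = new Counter();
                counter.word = word;
                counter.count = 1;
                map.put(word, counter);
            } else {
                counter.count++;
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Counter> counts = count("the cat and the hat");
        System.out.println(counts.get("the").count); // prints 2
    }
}
```

Only the first occurrence of each word allocates a Counter; later occurrences just bump the int field.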
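The pooling Eric mentions is easy to see directly. Auto-boxing an int compiles to Integer.valueOf(int), whose Javadoc guarantees that values from -128 to 127 are always cached, so boxing the same small value twice yields the same object. Outside that range you usually get a fresh object with default JVM settings, although the upper bound is configurable on HotSpot, so don't rely on it. BoxingCache is a made-up name for this sketch.

```java
// Hypothetical demo: compares references produced by auto-boxing.
public class BoxingCache {

    // Both assignments box via Integer.valueOf(value); == compares identity.
    static boolean sameBoxedInstance(int value) {
        Integer a = value;
        Integer b = value;
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(sameBoxedInstance(127));  // prints true: always cached
        System.out.println(sameBoxedInstance(1000)); // usually false by default
    }
}
```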
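On observing a thrashing GC: one option is running with -verbose:gc, which logs each collection. Another is to read the collectors' counters through java.lang.management, as in the sketch below. If accumulated GC time is a large share of wall-clock time, the collector is a likely bottleneck. GcStats and totals() are hypothetical names, and the loop in main is only a placeholder workload to be replaced with the run being measured.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: snapshots GC counts and time across all collectors.
public class GcStats {

    // Returns {total collection count, total collection time in ms}.
    static long[] totals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if a collector doesn't report it.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, timeMs };
    }

    public static void main(String[] args) {
        long[] before = totals();
        long start = System.nanoTime();

        // Placeholder workload (assumption): boxing values outside the
        // Integer cache allocates, though the JIT may elide some of it.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            Integer boxed = i + 1000;
            sink += boxed;
        }

        long wallMs = (System.nanoTime() - start) / 1_000_000;
        long[] after = totals();
        System.out.printf("collections: %d, GC time: %d ms of %d ms wall (sink=%d)%n",
                after[0] - before[0], after[1] - before[1], wallMs, sink);
    }
}
```

Running the same measurement with a larger heap (for example a bigger -Xmx) and comparing the GC share is one way to test the memory-parameter idea.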