| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!us.feeder.erje.net!137.226.231.214.MISMATCH!newsfeed.fsmpi.rwth-aachen.de!weretis.net!feeder4.news.weretis.net!feeder2.ecngs.de!ecngs!feeder.ecngs.de!Xl.tags.giganews.com!border1.nntp.ams.giganews.com!nntp.giganews.com!local2.nntp.ams.giganews.com!nntp.bt.com!news.bt.com.POSTED!not-for-mail |
|---|---|
| NNTP-Posting-Date | Thu, 21 Mar 2013 04:31:36 -0500 |
| Date | Thu, 21 Mar 2013 09:31:35 +0000 |
| From | lipska the kat <"nospam at neversurrender dot co dot uk"> |
| Organization | Trollbusters 3 |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120410 Thunderbird/11.0.1 |
| MIME-Version | 1.0 |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: email stop words |
| References | <kidh9f$57s$1@dont-email.me> <514a50a0$0$32115$14726298@news.sunsite.dk> <kidjmt$f8j$1@dont-email.me> |
| In-Reply-To | <kidjmt$f8j$1@dont-email.me> |
| Content-Type | text/plain; charset=ISO-8859-1; format=flowed |
| Content-Transfer-Encoding | 8bit |
| Message-ID | <AOSdnVU1JL7lTtfMnZ2dnUVZ7o2dnZ2d@bt.com> |
| Lines | 30 |
| X-Usenet-Provider | http://www.giganews.com |
| X-AuthenticatedUsername | NoAuthUser |
| X-Trace | sv3-3UCwWmcaAvQcpnrro/kHzr3AtkIO46Pkv/tXPvwmkW+FQmXe11xV1joXFV+2bNYu8+mNDVs5ALTY8Ay!mEM2fFYrDkhydhXvXhA6YUo5BeAzrWb84zWn6u7kSDiDqSterAX1oNFCTbtgbTWoC9VUaXVLLSc= |
| X-Complaints-To | abuse@btinternet.com |
| X-DMCA-Complaints-To | abuse@btinternet.com |
| X-Abuse-and-DMCA-Info | Please be sure to forward a copy of ALL headers |
| X-Abuse-and-DMCA-Info | Otherwise we will be unable to process your complaint properly |
| X-Postfilter | 1.3.40 |
| X-Original-Bytes | 2179 |
| Xref | csiph.com comp.lang.java.programmer:23016 |
On 21/03/13 00:21, markspace wrote:
> On 3/20/2013 5:13 PM, Arne Vajhøj wrote:
>>
>> I would have discarded special characters:
>> >=-()
>> up front.
>
> It's not nearly that sophisticated yet. In time, it will find actual
> words. Right now I'm just making sure I can read the whole mess in a
> timely fashion. You should have seen how long it took before ByteBuffers
> and using Regex.

This is just a suggestion relating to speed of indexing, and it may not be suitable, but have you ever used Lucene?

http://lucene.apache.org

I've used it to index and search text databases for several years now. I'm always surprised at how fast the indexing is. It's a pretty sophisticated piece of kit, though, and the code can get quite verbose.

Like I say, just a suggestion WRT speed.

lipska

-- 
Lipska the Kat©: Troll hunter, sandbox destroyer
and Farscape dreamer of Aeryn Sun
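
For what it's worth, the "discard special characters up front" idea quoted above can be sketched in plain Java without any library. This is only a rough illustration and makes no use of Lucene: the class name `StopWordSketch`, the method `tokens`, and the tiny stop-word list are all made up for the example, and it assumes Java 9 or later for `Set.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Hypothetical stop-word list, for illustration only; a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "and", "of", "to", "in", "is");

    // Returns the lower-cased words of the text, minus special characters and stop words.
    public static List<String> tokens(String text) {
        // Replace the special characters from the quoted suggestion (> = - ( )) with spaces up front.
        String cleaned = text.replaceAll("[>=\\-()]", " ");
        List<String> out = new ArrayList<>();
        for (String word : cleaned.toLowerCase(Locale.ROOT).split("\\s+")) {
            // split() can yield an empty first element when the text starts with whitespace.
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [speed, lucene, indexing]
        System.out.println(tokens("> The speed of (Lucene) indexing"));
    }
}
```

Replacing the characters with spaces rather than deleting them keeps words on either side of a hyphen or bracket from being glued together, which is probably what you want for a word count.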
email stop words markspace <markspace@nospam.nospam> - 2013-03-20 16:40 -0700
Re: email stop words Arne Vajhøj <arne@vajhoej.dk> - 2013-03-20 20:13 -0400
Re: email stop words Lew <lewbloch@gmail.com> - 2013-03-20 17:21 -0700
Re: email stop words Arne Vajhøj <arne@vajhoej.dk> - 2013-03-20 20:41 -0400
Re: email stop words markspace <markspace@nospam.nospam> - 2013-03-20 17:21 -0700
Re: email stop words lipska the kat <"nospam at neversurrender dot co dot uk"> - 2013-03-21 09:31 +0000
Re: email stop words Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> - 2013-03-20 20:51 -0500
Re: email stop words markspace <markspace@nospam.nospam> - 2013-03-20 19:41 -0700
Re: email stop words Jukka Lahtinen <jtfjdehf@hotmail.com.invalid> - 2013-03-21 08:29 +0200
Re: email stop words Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-03-21 09:24 -0400
Re: email stop words markspace <markspace@nospam.nospam> - 2013-03-21 09:33 -0700
Re: email stop words Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-03-21 14:15 -0400
Re: email stop words Joerg Meier <joergmmeier@arcor.de> - 2013-03-21 14:29 +0100
Re: email stop words Joshua Cranmer 🐧 <Pidgeot18@verizon.invalid> - 2013-03-21 15:38 -0500
Re: email stop words markspace <markspace@nospam.nospam> - 2013-03-21 16:49 -0700
Re: email stop words Fredrik Jonson <fredrik@jonson.org> - 2013-03-21 06:58 +0000