From: Lew
Newsgroups: comp.lang.java.programmer
Subject: Re: email stop words
Date: Wed, 20 Mar 2013 17:21:09 -0700 (PDT)
Message-ID: <5ca94924-be65-45ee-9e0d-38afde16a808@googlegroups.com>
References: <514a50a0$0$32115$14726298@news.sunsite.dk>

Arne Vajhøj wrote:
> I would have discarded special characters:
>
> >=-()
>
> up front.

You need a hyphen to spell words like "laissez-faire" and "higgledy-piggledy".

Apostrophe is a "special" character but very common in English words (most
possessives, for example).

Plus-sign appears in, for example, "Google+" and "G+" and "+1".

So you need to be judicious in your definition of "special".

-- 
Lew
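
To make the point above concrete, here is a minimal sketch of one way to be "judicious": instead of deleting special characters up front, match words with a pattern that allows a hyphen or apostrophe between letters and a plus sign at either end of a token. The class name WordTokenizer, the method tokenize, and the exact rule are assumptions made for illustration; they are not from the thread, and a real stop-word filter would likely need a more careful definition (this one drops the trailing apostrophe in plural possessives such as "dogs'", for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTokenizer {
    // Hypothetical rule (an assumption, not from the thread):
    //   optional leading '+', then letters/digits,
    //   then any number of ('-' or '\'' followed by letters/digits),
    //   then an optional trailing '+'.
    private static final Pattern WORD = Pattern.compile(
            "\\+?[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*\\+?");

    // Returns the tokens found in text, in order of appearance.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [Lew's, laissez-faire, +1, on, Google+, G+]
        System.out.println(tokenize("Lew's laissez-faire +1 on Google+ (G+) >=-()"));
    }
}
```

With this rule the stray ">=-()" run yields no tokens, because none of those characters is attached to a letter or digit, while the hyphen, apostrophe, and plus sign survive where they are part of a word.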