From: supercalifragilisticexpialadiamaticonormalizeringelimatisticantations
Newsgroups: comp.lang.java.programmer
Subject: Re: contains
Date: Thu, 08 Sep 2011 20:14:26 -0400

On 08/09/2011 3:37 PM, Arved Sandstrom wrote:
> I'm not exactly advising any programmer to deal with just ASCII; what I
> am saying here is that if you know that your text is ASCII text (*way*
> more common than you make out) that lowercasing and uppercasing in this
> particular situation is a potential approach. By ASCII text I still mean
> Unicode; simply the ASCII subset thereof.

Not only that -- if everything is passed through
.toLowerCase().toUpperCase() then the input set of strings gets projected
down onto a particular set of canonical representations. Some strings will
get conflated, but I think the conflated ones amount only to alternative
spellings of the same thing -- so finding a match among the canonical
forms does amount to the original inputs having a substring in common.
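
A minimal sketch of that projection, for illustration only. The class and
method names (CaseFoldContains, canonical, containsIgnoreCase) are
hypothetical, and passing Locale.ROOT is an assumption added here so the
result does not depend on the default locale; the post itself only
mentions .toLowerCase().toUpperCase().

```java
import java.util.Locale;

public class CaseFoldContains {
    // Hypothetical helper: project a string onto a canonical form by
    // lowercasing and then uppercasing. Locale.ROOT is an assumption,
    // chosen so the mapping is the same on every machine.
    public static String canonical(String s) {
        return s.toLowerCase(Locale.ROOT).toUpperCase(Locale.ROOT);
    }

    // Substring test carried out on the canonical forms of both strings.
    public static boolean containsIgnoreCase(String haystack, String needle) {
        return canonical(haystack).contains(canonical(needle));
    }

    public static void main(String[] args) {
        // Plain ASCII: only the case differs.
        System.out.println(containsIgnoreCase("Hello, World", "WORLD")); // true
        // Conflation: String.toUpperCase maps U+00DF (sharp s) to "SS",
        // so the two spellings share a canonical form.
        System.out.println(containsIgnoreCase("Stra\u00dfe", "STRASSE")); // true
        // No common substring, so no match.
        System.out.println(containsIgnoreCase("Hello", "xyz")); // false
    }
}
```

The sharp-s case is one example of the conflation described above: two
alternative spellings of the same word end up with the same canonical
form, so the match is reported even though the original characters differ.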