From: Tim Watts
Newsgroups: comp.os.linux.misc
Subject: Re: Re (2): Linux utility with reverse index facility?
Followup-To: comp.os.linux.misc
Date: Mon, 16 May 2011 22:32:54 +0100
Message-ID: <6ph8a8-blt.ln1@squidward.dionic.net>

no.top.post@gmail.com wrote:

> What I want is the KNOWLEDGE, whether *nix designers included
> which one would use eg. to find:
> "which token/field is 'mat' in the record:
> "the cat sat on the mat".

Well, if I'm understanding this, then in perl:

#!/usr/bin/perl
$a="the cat sat on the mat";   # Input
@b=split /\s+/, $a;            # whitespace as delimiter, make array of tokens
%c=map {$_ => $i++} @b;        # Make an associative array (tokenname => index)
print $c{'mat'};               # try one

The index is 0-based, i.e. 0 is the 1st token.

Of course - there is a flaw in that program - hint: what happens if you
search for the token "the"? You didn't define what you wanted: a list of
indices, the first one, or the last one?

If the above is too long, you can abbreviate it:

perl -e '$a="the cat sat on the mat";$b={map {$_ => $i++} split /\s+/, $a}->{mat};print $b'

(That's one line)

This is why I use perl.
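One way round the duplicate-token flaw hinted at above is to keep every position per token rather than a single index. What follows is only a minimal sketch, assuming a list of all 0-based positions is what's wanted; token_positions is a hypothetical helper name, not part of the program above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original program):
# map each token to a list of ALL 0-based positions where it occurs.
sub token_positions {
    my ($record) = @_;
    my %positions;
    my $i = 0;
    # Same tokenising rule as above: split on runs of whitespace.
    for my $token (split /\s+/, $record) {
        push @{ $positions{$token} }, $i++;
    }
    return \%positions;
}

my $pos = token_positions("the cat sat on the mat");
print "mat: @{ $pos->{mat} }\n";   # prints "mat: 5"
print "the: @{ $pos->{the} }\n";   # prints "the: 0 4"
```

With the list in hand, the first occurrence is $pos->{the}[0] and the last is $pos->{the}[-1], so whichever of the three answers was wanted is available.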
Don't ask me about awk, sed, bash - I've found it more productive to learn
one tool that has enough functionality to do pretty much everything in one
place - OK, not so clever on embedded systems where perl may not be
present, but there you go...

One thing to note: the first argument of split is a regex, so you can get
arbitrarily fancy about how you tokenise the string.

And Unicode - no problem...

Cheers

Tim

-- 
Tim Watts