
Re: script to search list of strings in files/directories

From "Mayayana" <mayayana@invalid.nospam>
Newsgroups microsoft.public.scripting.vbscript
Subject Re: script to search list of strings in files/directories
Date 2016-09-01 12:45 -0400
Organization A noiseless patient Spider
Message-ID <nq9m0f$lvk$1@dont-email.me>
References (5 earlier) <XnsA675E89F7DE68eejj99@194.109.6.166> <nq7u9e$ie7$1@dont-email.me> <nq84dk$1f5$1@dont-email.me> <nq9ch8$ikr$1@dont-email.me> <nq9gbg$19l$1@dont-email.me>


"Dave "Crash" Dummy" wrote

| > | >   The case-insensitive search takes more time. With
| > | > a single operation it's not discernible, but with hundreds
| > | > of calls using case-sensitive search can speed things up.
|
| I don't know that normalizing the string prior to running the InStr
| function is any faster. If the case insensitive option is selected the
| function is going to normalize the string before doing the search,
| anyway. It may even be slower to normalize the string in a separate
| operation before running InStr.
|

  Your speculation seems reasonable, but Microsoft
apparently didn't implement it that way. My guess is
that InStr searches numerically. A case-sensitive search
for "A" looks for character code 65. A case-insensitive
search looks for 65 or 97. That gets less efficient as the
search string gets longer, because each character adds a
dual comparison: if 65 or 97 is found, then look for
66 or 98. If any of those 4 combinations is found, then
look for 67 or 99. Etc.

  Here's a simple test:

400 iterations of searching a text file, 573 KB.
The nonsense word "AggyDaggy" (to ensure uniqueness)
was added near the end and then a search was run.

-------------------------------------------
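Dim FSO, Arg, TS, s1, x1, x2, i, Ret

' Path of the text file to search, passed as the first argument.
Arg = WScript.Arguments(0)
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile(Arg, 1)   ' 1 = ForReading
  s1 = TS.ReadAll
  TS.Close
Set TS = Nothing

x1 = Timer
  ' Normalize once, then search case-sensitively.
  s1 = UCase(s1)
For i = 1 to 400
  Ret = InStr(1, s1, "AGGYDAGGY", 0)   ' 0 = binary (case-sensitive) compare
Next
x2 = Timer

' Elapsed seconds for the UCase call plus 400 searches.
MsgBox x2 - x1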
--------------------------------------

Case sensitive:                     .234375 seconds
UCase followed by case sensitive:   .53125 seconds
Non-case sensitive:                 3.875 seconds
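
  For anyone who wants to compare the three variants in one
run, here is a minimal sketch. It is not the script that produced
the numbers above. It assumes the text is built in memory instead
of being read from a file, so the filler text, the variable names
and the ITERATIONS constant are all made up for illustration, and
the timings will differ from machine to machine.

```vbscript
Option Explicit
Const ITERATIONS = 400
Dim parts(), i, text, upperText
Dim posBinary, posUpper, posText
Dim t0, t1, t2, t3

' Build roughly 1 MB of filler with the search word at the very end.
' (Assumption: stands in for the 573 KB file used in the original test.)
ReDim parts(20000)
For i = 0 To 19999
  parts(i) = "The quick brown fox jumps. "
Next
parts(20000) = "AggyDaggy"
text = Join(parts, "")

' 1) Plain case-sensitive search.
t0 = Timer
For i = 1 To ITERATIONS
  posBinary = InStr(1, text, "AggyDaggy", vbBinaryCompare)
Next
t1 = Timer

' 2) UCase once, then case-sensitive search for the upper-cased word.
upperText = UCase(text)
For i = 1 To ITERATIONS
  posUpper = InStr(1, upperText, "AGGYDAGGY", vbBinaryCompare)
Next
t2 = Timer

' 3) Case-insensitive search done by InStr itself.
For i = 1 To ITERATIONS
  posText = InStr(1, text, "aggydaggy", vbTextCompare)
Next
t3 = Timer

' Timer counts seconds since midnight, so a run that crosses
' midnight will report a negative value.
WScript.Echo "Case sensitive:       " & (t1 - t0) & " seconds"
WScript.Echo "UCase + sensitive:    " & (t2 - t1) & " seconds"
WScript.Echo "Case insensitive:     " & (t3 - t2) & " seconds"
```

All three searches should report the same position. Only the
elapsed times should differ.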

  I've consistently found that two things can greatly
increase the speed of scripts that have to do extensive
work with strings:

1) Do case-insensitive searches by calling UCase once and
   then searching case-sensitively.
2) Build strings with an array rather than concatenation.

   The latter method uses an array member for each
concatenation. Instead of doing s = s & "more text"
it does A(x) = "more text". Then it uses Join at the
end. I actually got that idea from Matthew Curland's book.
He was one of the original VB6 designers and pointed
out that Join walks the whole array, measuring the
content, then allocates a single string to accommodate
it all. Concatenating must allocate a new string every
time, so adding "more" to a 3 MB ANSI string requires
allocating a new string of 3 MB + 4 bytes. Memory
allocation takes a lot more time than calculations,
and slows as it gets bigger.
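
  The two ways of building a string, side by side, as a minimal
sketch. The PIECES constant and the variable names are made up
for illustration. Wrap each loop in Timer calls to see the
difference on a given machine.

```vbscript
Option Explicit
Const PIECES = 20000
Dim i, s, A(), joined

' Concatenation: every pass allocates a new, slightly longer string.
s = ""
For i = 1 To PIECES
  s = s & "more text"
Next

' Array plus Join: one array member per piece, one Join at the end.
ReDim A(PIECES - 1)
For i = 0 To PIECES - 1
  A(i) = "more text"
Next
joined = Join(A, "")

WScript.Echo "Same result: " & (s = joined)
```

If the number of pieces isn't known in advance, the array can be
grown with ReDim Preserve in large steps. That still avoids
reallocating the whole string on every addition.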

  In typical usage it doesn't much matter. One InStr call
will be insignificant no matter which way it's done. A half dozen
concatenations don't cost much. But it's not unusual to need
to optimize. Using the two methods above seems like more
work but can actually cut a lot of time out of operations.
Also, Replace is extremely slow, probably because of the
same concatenation problem. It's actually often much faster
to write a complex tokenizing routine than to run a few
Replace operations. 
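
  A minimal sketch of one way to avoid Replace, assuming a plain
case-sensitive substitution is all that's needed. It is not a
full tokenizer. It walks the text with InStr, collects the pieces
in an array and does a single Join at the end. The function name
ReplaceViaJoin is made up for illustration, and whether it beats
Replace for a particular job is something to time rather than
assume.

```vbscript
Option Explicit

' Returns text with every occurrence of findStr replaced by replStr.
Function ReplaceViaJoin(text, findStr, replStr)
  Dim pieces(), n, startPos, hit
  If Len(findStr) = 0 Then
    ReplaceViaJoin = text
    Exit Function
  End If
  ReDim pieces(15)
  n = 0
  startPos = 1
  hit = InStr(startPos, text, findStr, vbBinaryCompare)
  Do While hit > 0
    ' Grow the array in doubling steps when two more slots won't fit.
    If n + 1 > UBound(pieces) Then
      ReDim Preserve pieces(UBound(pieces) * 2 + 1)
    End If
    pieces(n) = Mid(text, startPos, hit - startPos)   ' text before the hit
    pieces(n + 1) = replStr                           ' the replacement
    n = n + 2
    startPos = hit + Len(findStr)
    hit = InStr(startPos, text, findStr, vbBinaryCompare)
  Loop
  ' Resize to exactly n + 1 slots and store whatever follows the last hit.
  ReDim Preserve pieces(n)
  pieces(n) = Mid(text, startPos)
  ReplaceViaJoin = Join(pieces, "")
End Function

WScript.Echo ReplaceViaJoin("a-b-c", "-", "+")
```

To handle several search strings in one pass, the same loop would
have to look for the nearest of several hits, which is where it
starts to turn into a real tokenizing routine.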
