| From | tpeplt <tpeplt@gmail.com> |
|---|---|
| Newsgroups | comp.lang.lisp, comp.lang.scheme |
| Subject | Re: Tokenizer (Re: Most efficient way to read words from string.) |
| Date | 2025-08-26 15:02 -0400 |
| Organization | A noiseless patient Spider |
| Message-ID | <87ldn5zz2k.fsf@gmail.com> |
| References | <108jta2$3uj44$1@dont-email.me> |
Cross-posted to 2 groups.
>> (defun string-split (str &optional (separator #\Space))
>>   "Splits the string STR at each SEPARATOR character occurrence.
>> The resulting substrings are collected into a list which is returned.
>> A SEPARATOR at the beginning or at the end of the string STR results
>> in an empty string in the first or last position of the list
>> returned."
>>   (declare (type string str)
>>            (type character separator))
>>   (loop for start = 0 then (1+ end)
>>         for end = (position separator str :start 0)
>>           then (position separator str :start start)
>>         for substr = (subseq str start end)
>>           then (subseq str start end)
>>         collect substr into result
>>         when (null end) do (return result)
>>         ))
>
1. In a for-as-equals-then clause, if the ‘then’ FORM2 is
omitted, FORM1 is evaluated on every iteration.
So, when FORM2 is identical to FORM1, it may be omitted, as
for ‘substr’ above. The same holds for ‘end’ if its first
form is written with ‘:start start’, since START is 0 on the
first iteration.
See:
http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_6-1-2-1-4.html
2. ‘position’ returns NIL if the item is not found in the
sequence, so its result must be checked before it is passed
to an arithmetic function such as ‘1+’.
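Applying points 1 and 2 to the quoted function gives the minimal sketch below. It is not code from the thread, and the name ‘string-split-simple’ is hypothetical. It drops the redundant ‘then’ forms and places the NIL test on END so that ‘(1+ end)’ only ever runs with a number. Like the quoted original, it still produces empty strings for leading and trailing separators.

```lisp
;; Hypothetical name STRING-SPLIT-SIMPLE: the quoted STRING-SPLIT with the
;; redundant THEN forms removed (point 1) and END tested for NIL before
;; (1+ END) can be evaluated (point 2).
(defun string-split-simple (str &optional (separator #\Space))
  "Split STR at each SEPARATOR; leading/trailing SEPARATORs yield empty strings."
  (declare (type string str) (type character separator))
  (loop for start = 0 then (1+ end)
        ;; START is 0 on the first pass, so one form serves every iteration.
        for end = (position separator str :start start)
        collect (subseq str start end)
        ;; Terminate before the next step, so (1+ END) never sees NIL.
        until (null end)))
```

For example, ‘(string-split-simple " a ")’ returns ‘("" "a" "")’.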
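Here is an alternative version of the function
‘string-split’ that skips empty substrings, so leading,
trailing, and repeated separators produce no empty strings: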
(defun string-split (str &optional (separator #\Space))
  "Splits the string STR at each SEPARATOR character occurrence.
The resulting non-empty substrings are collected into a list which
is returned. SEPARATORs at the beginning or end of STR, and runs of
consecutive SEPARATORs, produce no empty strings."
  (declare (type string str)
           (type character separator))
  (loop
    with end = 0
    for start = (position separator str
                          :start end :test-not #'equal)
    if start
      do (setf end (or (position separator str :start start)
                       (length str)))
      and collect (subseq str start end)
    until (null start)))
Some tests:
(string-split " ")
;;=> NIL
(string-split "stringtolist")
;;=> ("stringtolist")
(string-split "stringtolist ")
;;=> ("stringtolist")
(string-split " stringtolist")
;;=> ("stringtolist")
(string-split " stringtolist ")
;;=> ("stringtolist")
(string-split " string to list ")
;;=> ("string" "to" "list")
(string-split " string to list")
;;=> ("string" "to" "list")
(string-split "string to list ")
;;=> ("string" "to" "list")
>
> Gauche Scheme
>
> "!" is similar to "do".
>
> (define (tokenize str separators)
> (let ((seps (string->list separators)))
> (! (ch :in (reverse (cons (car seps) (string->list str)))
> := sep (member ch seps)
> r cons (list->string tmp) :if (and (pair? tmp) sep)
> tmp '() (if sep '() (cons ch tmp)))
> #f r)))
>
> (tokenize " foo; bar, baz, and ... zap" " ,;.")
> ===>
> ("foo" "bar" "baz" "and" "zap")
>
3. Here is a generalized version of ‘string-split’,
supporting a string of separators as an argument and using
Common Lisp’s ‘position-if’/‘position-if-not’ functions in
place of ‘position’:
```
(defun string-split (str &optional (separators " ;,."))
  "Splits the string STR at each occurrence of any character in SEPARATORS.
The resulting non-empty substrings are collected into a list which
is returned. Separators at the beginning or end of STR, and runs of
consecutive separators, produce no empty strings."
  (declare (type string str)
           (type string separators))
  (loop
    with bag = (coerce separators 'list) and end = 0
    for start = (position-if-not (lambda (ch) (member ch bag)) str
                                 :start end)
    if start
      do (setf end (or (position-if (lambda (ch) (member ch bag)) str
                                    :start start)
                       (length str)))
      and collect (subseq str start end)
    until (null start)))
```
Some tests:
(string-split " " "- ")
;;=> NIL
(string-split " " "-;")
;;=> (" ")
(string-split "string-to-list" "- ")
;;=> ("string" "to" "list")
(string-split "string-to-list " "- ")
;;=> ("string" "to" "list")
(string-split " string-to-list" "- ")
;;=> ("string" "to" "list")
(string-split " string-to-list " "- ")
;;=> ("string" "to" "list")
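The generalized version can also be written without the ‘coerce’ to a list: ‘find’ accepts any sequence, so it can test membership in the SEPARATORS string directly, and ‘flet’ names the predicate once instead of repeating the lambda. This is a sketch under those assumptions, not code from the thread; ‘string-split-any’ is a hypothetical name.

```lisp
;; Hypothetical STRING-SPLIT-ANY: same results as the generalized
;; STRING-SPLIT, but SEPARATORS stays a string and FIND tests membership.
(defun string-split-any (str &optional (separators " ;,."))
  "Split STR at every character found in SEPARATORS, dropping empty substrings."
  (declare (type string str separators))
  (flet ((sep-p (ch) (find ch separators)))
    (loop with end = 0
          ;; Skip any run of separators to find the next token start.
          for start = (position-if-not #'sep-p str :start end)
          ;; No token left: stop and return the collected list.
          while start
          ;; The token ends at the next separator, or at the end of STR.
          do (setf end (or (position-if #'sep-p str :start start)
                           (length str)))
          collect (subseq str start end))))
```

Because ‘while’ exits before the body runs, the ‘if’/‘until’ pair is not needed. Applied to the Gauche example input, ‘(string-split-any " foo; bar, baz, and ... zap" " ,;.")’ returns ‘("foo" "bar" "baz" "and" "zap")’, the same list as ‘tokenize’.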
--
The lyf so short, the craft so long to lerne.
- Geoffrey Chaucer, The Parliament of Birds.
Re: Tokenizer (Re: Most efficient way to read words from string.) "B. Pym" <Nobody447095@here-nor-there.org> - 2025-08-26 09:04 +0000 Re: Tokenizer (Re: Most efficient way to read words from string.) tpeplt <tpeplt@gmail.com> - 2025-08-26 15:02 -0400