

Groups > comp.lang.c > #83783 > unrolled thread

strtok() implementation

Started by: boon <root@localhost>
First post: 2016-03-13 17:13 +0100
Last post: 2016-03-15 22:32 +0100
Articles: 20 on this page of 39, 14 participants



Contents

  strtok() implementation boon <root@localhost> - 2016-03-13 17:13 +0100
    Re: strtok() implementation Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-13 10:04 -0700
      Re: strtok() implementation boon <root@localhost> - 2016-03-13 18:51 +0100
    Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-13 13:38 -0400
      Re: strtok() implementation boon <root@localhost> - 2016-03-13 19:05 +0100
        Re: strtok() implementation Keith Thompson <kst-u@mib.org> - 2016-03-13 13:50 -0700
          Re: strtok() implementation boon <root@localhost> - 2016-03-13 23:10 +0100
    Re: strtok() implementation boon <root@localhost> - 2016-03-13 21:06 +0100
      Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-13 16:26 -0400
        Re: strtok() implementation boon <root@localhost> - 2016-03-13 22:52 +0100
          Re: strtok() implementation boon <root@localhost> - 2016-03-13 23:25 +0100
            Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-14 15:26 +1300
              Re: strtok() implementation boon <root@localhost.localdomain> - 2016-03-14 12:44 +0100
                Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-17 08:23 -0700
                  Re: strtok() implementation boon <root@localhost> - 2016-03-18 21:09 +0100
                    Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-19 14:21 -0700
                      Re: strtok() implementation Randy Howard <rhoward.mx@EverybodyUsesIt.com> - 2016-03-19 16:25 -0500
                        Re: strtok() implementation boon <fred900rbc@gmail.com> - 2016-03-24 13:05 -0700
                        Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-30 09:13 -0700
                          Re: strtok() implementation Randy Howard <rhoward.mx@EverybodyUsesIt.com> - 2016-03-30 14:44 -0500
                      Re: strtok() implementation boon <root@127.10.10.1> - 2016-03-31 10:24 +0200
                        Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-04-05 12:23 -0700
    Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-14 15:31 +1300
      Re: strtok() implementation boon <root@localhost> - 2016-03-14 20:13 +0100
        Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-15 09:48 +1300
          Re: strtok() implementation Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-14 14:05 -0700
            Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-15 10:09 +1300
              Re: strtok() implementation Richard Heathfield <rjh@cpax.org.uk> - 2016-03-14 22:02 +0000
                Re: strtok() implementation Gareth Owen <gwowen@gmail.com> - 2016-03-14 22:16 +0000
            Re: strtok() implementation Keith Thompson <kst-u@mib.org> - 2016-03-14 14:50 -0700
            Re: strtok() implementation raltbos@xs4all.nl (Richard Bos) - 2016-03-14 22:06 +0000
              Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:14 +0100
                Re: strtok() implementation BartC <bc@freeuk.com> - 2016-03-15 21:23 +0000
                Re: strtok() implementation raltbos@xs4all.nl (Richard Bos) - 2016-03-17 12:27 +0000
          Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:04 +0100
            Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-15 18:18 -0400
              Re: strtok() implementation boon <root@localhost> - 2016-03-18 21:19 +0100
          Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:08 +0100
    Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:32 +0100



#83783 — strtok() implementation

From: boon <root@localhost>
Date: 2016-03-13 17:13 +0100
Subject: strtok() implementation
Message-ID: <56e59164$0$27830$426a74cc@news.free.fr>

Hello,

I am writing an implementation of strtok(), just for fun and to improve
my C coding style and skills.

Here is my solution.

char *my_strtok(char *str, const char *delim)
{
     char *ret, *s, *min_s;
     const char *p;
     static char *saveptr, *end;

     if (str) {
         end = str + strlen(str);
         saveptr = str;
     }

     if (*saveptr == '\0') {
         saveptr = end = NULL;
         return NULL;
     }

     ret = saveptr;
     min_s = end;

     for (p = delim; *p != '\0'; p++) {
         s = strchr(saveptr, *p);
         if (s && s < min_s)
             min_s = s;
     }

     if (min_s < end) {
         *min_s++ = '\0';
         saveptr = min_s;
     } else {
         saveptr = end;
     }

     return ret;
}

I often have difficulty naming variables.
Please do not hesitate to criticize this implementation.

Regards.



#83791

From: Malcolm McLean <malcolm.mclean5@btinternet.com>
Date: 2016-03-13 10:04 -0700
Message-ID: <ddd9be59-bb17-48bd-8183-6aed8426a433@googlegroups.com>
In reply to: #83783

On Sunday, March 13, 2016 at 4:12:33 PM UTC, boon wrote:
> Hello,
> 
> I am writing strtok() implementation, just for the fun and to improve my 
> C coding style and skills.
> 
> Here is my solution.
> 
> char *my_strtok(char *str, const char *delim)
> {
>      char *ret, *s, *min_s;
>      const char *p;
>      static char *saveptr, *end;
> 
>      if (str) {
>          end = str + strlen(str);
>          saveptr = str;
>      }
> 
I don't think we need "end". When the string you are parsing hits a NUL, 
that's the end.

>      if (*saveptr == '\0') {
>          saveptr = end = NULL;
>          return NULL;
>      }
>
Now consider what happens if strtok is called with NULL before it is
called with a parse string. I think technically the behaviour is
undefined, so you can do anything, including a null pointer dereference.
But it's far better to check for NULL and handle it gracefully.

> 
>      ret = saveptr;
>      min_s = end;
> 
>      for (p = delim; *p != '\0'; p++) {
>          s = strchr(saveptr, *p);
>          if (s && s < min_s)
>              min_s = s;
>      }
>
This looks OK to me. I would think of it as stepping through the
string and checking whether each character is a delimiter, rather than
stepping through the delimiters and checking if the string contains
them, but the logic is correct.
(However, your way will be a bit slower for most real-life inputs,
where delim is small and the search string is long.)
> 
>      if (min_s < end) {
>          *min_s++ = '\0';
>          saveptr = min_s;
>      } else {
>          saveptr = end;
>      }
>
Looks OK, but as I said, you can rewrite it totally to get rid of
end, I believe.
 
>      return ret;
> }
> 
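
The string-first scan described above can be sketched as follows. This is
a minimal illustration, not code from the thread: is_delim() and
find_delim() are hypothetical helper names, and the sketch assumes that
both arguments are valid null-terminated strings. Because the scan stops
at the terminating '\0', no separate 'end' pointer is needed.

```c
#include <stddef.h>

/* Hypothetical helper: nonzero if c is one of the characters in delim. */
int is_delim(char c, const char *delim)
{
    for (; *delim != '\0'; delim++)
        if (*delim == c)
            return 1;
    return 0;
}

/* Hypothetical helper: walk the string once and return a pointer to the
 * first delimiter in s, or to the terminating '\0' if there is none. */
char *find_delim(char *s, const char *delim)
{
    while (*s != '\0' && !is_delim(*s, delim))
        s++;
    return s;
}
```

The caller can tell the two outcomes apart by testing whether the
returned pointer refers to '\0'.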



#83797

From: boon <root@localhost>
Date: 2016-03-13 18:51 +0100
Message-ID: <56e5a84c$0$22771$426a74cc@news.free.fr>
In reply to: #83791

On 03/13/2016 06:04 PM, Malcolm McLean wrote:
> On Sunday, March 13, 2016 at 4:12:33 PM UTC, boon wrote:
>> Hello,
>>
>> I am writing strtok() implementation, just for the fun and to improve my
>> C coding style and skills.
>>
>> Here is my solution.
>>
>> char *my_strtok(char *str, const char *delim)
>> {
>>       char *ret, *s, *min_s;
>>       const char *p;
>>       static char *saveptr, *end;
>>
>>       if (str) {
>>           end = str + strlen(str);
>>           saveptr = str;
>>       }
>>
> I don't think we need "end". When the string you are parsing hits a NUL,
> that's the end.


'end' was there because of the flawed logic I used to step through
the string to be parsed, as you noticed below.


>>       if (*saveptr == '\0') {
>>           saveptr = end = NULL;
>>           return NULL;
>>       }
>>
> Now if strtok is called with NULL before it is called with a parse string.
> I think technically the behaviour is undefined, so you can do anything,
> including a null pointer dereference. But it's far better to check for NULL
> and handle it gracefully.
>
>>
>>       ret = saveptr;
>>       min_s = end;
>>
>>       for (p = delim; *p != '\0'; p++) {
>>           s = strchr(saveptr, *p);
>>           if (s && s < min_s)
>>               min_s = s;
>>       }
>>
> This looks OK to me, I would think of it as stepping through the
> string and checking each character for delimit rather than
> stepping through the delimiters and checking if the string contains
> them, but the logic is correct.
> (However your way will be a bit slower for most real-life inputs,
> where delim is small and the search string is long)


Of course. You are correct. I do not know why I approached the logic
this way.

>>
>>       if (min_s < end) {
>>           *min_s++ = '\0';
>>           saveptr = min_s;
>>       } else {
>>           saveptr = end;
>>       }
>>
> Looks OK, but as I said, you can rewrite it totally to get rid of
> end, I believe.

I agree. I will rewrite it totally.

>>       return ret;
>> }
>>
>
>

Thanks, Malcolm.



#83795

From: Eric Sosman <esosman@comcast-dot-net.invalid>
Date: 2016-03-13 13:38 -0400
Message-ID: <nc48cd$hqj$1@dont-email.me>
In reply to: #83783

On 3/13/2016 12:13 PM, boon wrote:
> Hello,
>
> I am writing strtok() implementation, just for the fun and to improve my
> C coding style and skills.
>
> Here is my solution.
>
> char *my_strtok(char *str, const char *delim)
> {
>      char *ret, *s, *min_s;
>      const char *p;
>      static char *saveptr, *end;
>
>      if (str) {
>          end = str + strlen(str);
>          saveptr = str;
>      }
>
>      if (*saveptr == '\0') {
>          saveptr = end = NULL;
>          return NULL;
>      }
>
>      ret = saveptr;
>      min_s = end;
>
>      for (p = delim; *p != '\0'; p++) {
>          s = strchr(saveptr, *p);
>          if (s && s < min_s)
>              min_s = s;
>      }
>
>      if (min_s < end) {
>          *min_s++ = '\0';
>          saveptr = min_s;
>      } else {
>          saveptr = end;
>      }
>
>      return ret;
> }
>
> I have often difficulties in variable naming.
> Please do not hesitate to criticize this implementation.

     The behavior of this function does not quite match that of the
official strtok(), as you can see by trying

	strtok("///x///", "/")

That is, it is not enough to search for a delimiter character:
You must search for a *non*-delimiter first, and then for the
delimiter (or '\0') that follows it.

     Since you're doing this as an exercise you might consider
most of the Standard library off-limits.  But since you're using
strlen() and strchr(), perhaps you aren't really trying to avoid
using the Standard functions after all.  If you're willing to use
a few others, I commend strspn() and strcspn() to your attention.
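
     The two-step search (skip delimiters, then find the end of the
token) can be sketched with strspn() and strcspn(). This is only an
illustration under stated assumptions: first_token() is a hypothetical
helper, not part of the Standard library, and unlike strtok() it does not
modify its input. That also makes it safe to call on a string literal,
which the real strtok() would try to overwrite.

```c
#include <string.h>

/* Hypothetical helper: locate the first token of s without modifying s.
 * On success, stores the token length in *len and returns a pointer to
 * the token's first character. Returns NULL if s contains nothing but
 * delimiters (or is empty). */
const char *first_token(const char *s, const char *delim, size_t *len)
{
    s += strspn(s, delim);       /* step 1: skip leading delimiters */
    if (*s == '\0')
        return NULL;             /* no non-delimiter found */
    *len = strcspn(s, delim);    /* step 2: length up to delimiter or '\0' */
    return s;
}
```

For the input "///x///" with delimiter "/", step 1 skips three
characters and step 2 measures a token of length one.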

-- 
esosman@comcast-dot-net.invalid
"Don't be afraid of work. Make work afraid of you." -- TLM



#83798

From: boon <root@localhost>
Date: 2016-03-13 19:05 +0100
Message-ID: <56e5abc8$0$27836$426a34cc@news.free.fr>
In reply to: #83795

On 03/13/2016 06:38 PM, Eric Sosman wrote:
> On 3/13/2016 12:13 PM, boon wrote:
>> Hello,
>>
>> I am writing strtok() implementation, just for the fun and to improve my
>> C coding style and skills.
>>
>> Here is my solution.
>>
>> char *my_strtok(char *str, const char *delim)
>> {
>>      char *ret, *s, *min_s;
>>      const char *p;
>>      static char *saveptr, *end;
>>
>>      if (str) {
>>          end = str + strlen(str);
>>          saveptr = str;
>>      }
>>
>>      if (*saveptr == '\0') {
>>          saveptr = end = NULL;
>>          return NULL;
>>      }
>>
>>      ret = saveptr;
>>      min_s = end;
>>
>>      for (p = delim; *p != '\0'; p++) {
>>          s = strchr(saveptr, *p);
>>          if (s && s < min_s)
>>              min_s = s;
>>      }
>>
>>      if (min_s < end) {
>>          *min_s++ = '\0';
>>          saveptr = min_s;
>>      } else {
>>          saveptr = end;
>>      }
>>
>>      return ret;
>> }
>>
>> I have often difficulties in variable naming.
>> Please do not hesitate to criticize this implementation.
>
>      The behavior of this function does not quite match that of the
> official strtok(), as you can see by trying
>
>      strtok("///x///", "/")
>
> That is, it is not enough to search for a delimiter character:
> You must search for a *non*-delimiter first, and then for the
> delimiter (or '\0') that follows it.

Well, indeed, I understand. According to the strtok() manual page:

  "The start of the next token is determined by scanning
   forward for the next nondelimiter byte in str."

etc.

I apologize. I did not read the manual page completely.

I will take this remark into account for my next implementation.

>      Since you're doing this as an exercise you might consider
> most of the Standard library off-limits.  But since you're using
> strlen() and strchr(), perhaps you aren't really trying to avoid
> using the Standard functions after all.  If you're willing to use
> a few others, I commend strspn() and strcspn() to your attention.
>

Indeed, it would be better to avoid the Standard functions when
rewriting a Standard function.

I will have a look at the manual pages of these two functions, which are
new to me.

Thank you.



#83812

From: Keith Thompson <kst-u@mib.org>
Date: 2016-03-13 13:50 -0700
Message-ID: <lnoaaiksqt.fsf@kst-u.example.com>
In reply to: #83798

boon <root@localhost> writes:
> On 03/13/2016 06:38 PM, Eric Sosman wrote:

[...]

> I apologize. I have not read completely the manual page.

I suggest reading the standard rather than (or in addition to) the
manual page; see
    http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
Man pages are *usually* accurate, but the standard is definitive.

> I will take this remark into an account for my next implementation.
>
>>      Since you're doing this as an exercise you might consider
>> most of the Standard library off-limits.  But since you're using
>> strlen() and strchr(), perhaps you aren't really trying to avoid
>> using the Standard functions after all.  If you're willing to use
>> a few others, I commend strspn() and strcspn() to your attention.

Or you could use strtok().  8-)}

> Indeed, it will be better to avoid the Standard functions to re-write a 
> Standard function.

It depends on your goal.  For a production implementation of
the C standard library, it's perfectly appropriate for standard
functions to call other standard functions.  (For example glibc's
strtok() calls strspn(), strpbrk(), and a non-standard function
called __rawmemchr().)  But if your goal is to learn, implementing
everything from scratch is a good approach.

[...]

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"



#83820

From: boon <root@localhost>
Date: 2016-03-13 23:10 +0100
Message-ID: <56e5e525$0$3295$426a74cc@news.free.fr>
In reply to: #83812

On 03/13/2016 09:50 PM, Keith Thompson wrote:
> boon <root@localhost> writes:
>> On 03/13/2016 06:38 PM, Eric Sosman wrote:
>
> [...]
>
>> I apologize. I have not read completely the manual page.
>
> I suggest reading the standard rather than (or in addition to) the
> manual page; see
>      http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
> Man pages are *usually* accurate, but the standard is definitive.

Downloaded! Thank you Keith.

>> I will take this remark into an account for my next implementation.
>>
>>>       Since you're doing this as an exercise you might consider
>>> most of the Standard library off-limits.  But since you're using
>>> strlen() and strchr(), perhaps you aren't really trying to avoid
>>> using the Standard functions after all.  If you're willing to use
>>> a few others, I commend strspn() and strcspn() to your attention.
>
> Or you could use strtok().  8-)}
>
>> Indeed, it will be better to avoid the Standard functions to re-write a
>> Standard function.
>
> It depends on your goal.  For a production implementation of
> the C standard library, it's perfectly appropriate for standard
> functions to call other standard functions.  (For example glibc's
> strtok() calls strspn(), strpbrk(), and a non-standard function
> called __rawmemchr().)  But if your goal is to learn, implementing
> everything from scratch is a good approach.

Thank you for this information again. So strpbrk() is the 3rd new 
function I will have learned today ;)

Yes. My goal is to learn. C is my preferred language.

> [...]
>

Regards.



#83804

From: boon <root@localhost>
Date: 2016-03-13 21:06 +0100
Message-ID: <56e5c7fe$0$9212$426a74cc@news.free.fr>
In reply to: #83783

On 03/13/2016 05:13 PM, boon wrote:
> Hello,
>
> I am writing strtok() implementation, just for the fun and to improve my
> C coding style and skills.
>
> Here is my solution.
>
> char *my_strtok(char *str, const char *delim)
> {
>      char *ret, *s, *min_s;
>      const char *p;
>      static char *saveptr, *end;
>
>      if (str) {
>          end = str + strlen(str);
>          saveptr = str;
>      }
>
>      if (*saveptr == '\0') {
>          saveptr = end = NULL;
>          return NULL;
>      }
>
>      ret = saveptr;
>      min_s = end;
>
>      for (p = delim; *p != '\0'; p++) {
>          s = strchr(saveptr, *p);
>          if (s && s < min_s)
>              min_s = s;
>      }
>
>      if (min_s < end) {
>          *min_s++ = '\0';
>          saveptr = min_s;
>      } else {
>          saveptr = end;
>      }
>
>      return ret;
> }
>
> I have often difficulties in variable naming.
> Please do not hesitate to criticize this implementation.
>
> Regards.


Here is my new implementation, taking into account Eric's and Malcolm's
remarks.

char *my_strtok(char *str, const char *delim)
{
     char *p, *ret;
     const char *q;
     static char *saveptr;

     if (!str && !saveptr)
         return NULL;

     if (!saveptr)
         saveptr = str;

     ret = saveptr;

     /* Skip any delimiter character */
     for (p = saveptr; *p != '\0'; p++) {

         for (q = delim; *q != '\0'; q++)
             if (*p == *q)
                 break;

         if (*q == '\0')
             /* No more delimiter character found */
             break;
     }

     if (*p == '\0') {
         saveptr = NULL;
         return NULL;
     }

     /* 'ret' is the string to be returned */
     ret = p;

     for (; *p != '\0'; p++) {

         for (q = delim; *q != '\0'; q++)
             if (*p == *q)
                 break;

         if (*q != '\0') {
             *p++ = '\0';
             break;
         }
     }

     saveptr = p;

     return ret;
}



#83810

From: Eric Sosman <esosman@comcast-dot-net.invalid>
Date: 2016-03-13 16:26 -0400
Message-ID: <nc4i81$qr9$1@dont-email.me>
In reply to: #83804

On 3/13/2016 4:06 PM, boon wrote:
> On 03/13/2016 05:13 PM, boon wrote:
>
> Here is my new implementation taking into and accound Eric and Malcolm
> remarks.
>
> char *my_strtok(char *str, const char *delim)
> {
>      char *p, *ret;
>      const char *q;
>      static char *saveptr;
>
>      if (!str && !saveptr)
>          return NULL;
>
>      if (!saveptr)
>          saveptr = str;

     This should be

	if (str)
	    saveptr = str;

The caller of strtok() is not required to keep on calling until it
returns NULL; he can abandon one partly-tokenized string and turn his
attention to a different one.

>      ret = saveptr;
>
>      /* Skip any delimiter character */
>      for (p = saveptr; *p != '\0'; p++) {
>
>          for (q = delim; *q != '\0'; q++)
>              if (*p == *q)
>                  break;
>
>          if (*q == '\0')
>              /* No more delimiter character found */
>              break;
>      }
>
>      if (*p == '\0') {
>          saveptr = NULL;
>          return NULL;
>      }
>
>      /* 'ret' is the string to be returned */
>      ret = p;
>
>      for (; *p != '\0'; p++) {
>
>          for (q = delim; *q != '\0'; q++)
>              if (*p == *q)
>                  break;
>
>          if (*q != '\0') {
>              *p++ = '\0';
>              break;
>          }
>      }
>
>      saveptr = p;
>
>      return ret;
> }

     If you're still willing to use strchr() -- as you were in the
earlier version -- you could use it to write those delimiter-testing
loops more succinctly.  For example, the second one could become

	for (; *p != '\0'; p++) {
	    if (strchr(delim, *p) != NULL) {
	        /* Terminate the token and step past the delimiter. */
	        *p++ = '\0';
	        break;
	    }
	}

     Once again, I suggest you study strspn() and strcspn().  They
seem to me to be quite useful (especially the latter), and for some
reason quite underused.
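
     The point about abandoning a partly tokenized string can be checked
with a small test. The sketch below is an assumption-laden illustration:
restarts_on_new_string() is a hypothetical name, and the tokenizer under
test is passed in as a function pointer so that the same check can be run
against the Standard strtok() or against my_strtok().

```c
#include <string.h>

/* Hypothetical check: returns 1 if tok starts over when it is handed a
 * new string before the previous one has been fully tokenized. */
int restarts_on_new_string(char *(*tok)(char *, const char *))
{
    char first[] = "a b c";
    char second[] = "x y";
    char *t;

    t = tok(first, " ");          /* takes "a"; "b c" is still pending */
    if (t == NULL || strcmp(t, "a") != 0)
        return 0;

    t = tok(second, " ");         /* switch strings without draining first */
    return t != NULL && strcmp(t, "x") == 0;
}
```

An implementation that only adopts a new string when its saved pointer is
NULL would return "b" from the second call and fail this check.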

-- 
esosman@comcast-dot-net.invalid
"Don't be afraid of work. Make work afraid of you." -- TLM



#83818

From: boon <root@localhost>
Date: 2016-03-13 22:52 +0100
Message-ID: <56e5e0dc$0$9225$426a74cc@news.free.fr>
In reply to: #83810

On 03/13/2016 09:26 PM, Eric Sosman wrote:
> On 3/13/2016 4:06 PM, boon wrote:
>> On 03/13/2016 05:13 PM, boon wrote:
>>
>> Here is my new implementation taking into and accound Eric and Malcolm
>> remarks.
>>
>> char *my_strtok(char *str, const char *delim)
>> {
>>      char *p, *ret;
>>      const char *q;
>>      static char *saveptr;
>>
>>      if (!str && !saveptr)
>>          return NULL;
>>
>>      if (!saveptr)
>>          saveptr = str;
>
>      This should be
>
>      if (str)
>          saveptr = str;
>
> The caller of strtok() is not required to keep on calling until it
> returns NULL; he can abandon one partly-tokenized string and turn his
> attention to a different one.

Ok. Understood.

>>      ret = saveptr;
>>
>>      /* Skip any delimiter character */
>>      for (p = saveptr; *p != '\0'; p++) {
>>
>>          for (q = delim; *q != '\0'; q++)
>>              if (*p == *q)
>>                  break;
>>
>>          if (*q == '\0')
>>              /* No more delimiter character found */
>>              break;
>>      }
>>
>>      if (*p == '\0') {
>>          saveptr = NULL;
>>          return NULL;
>>      }
>>
>>      /* 'ret' is the string to be returned */
>>      ret = p;
>>
>>      for (; *p != '\0'; p++) {
>>
>>          for (q = delim; *q != '\0'; q++)
>>              if (*p == *q)
>>                  break;
>>
>>          if (*q != '\0') {
>>              *p++ = '\0';
>>              break;
>>          }
>>      }
>>
>>      saveptr = p;
>>
>>      return ret;
>> }
>
>      If you're still willing to use strchr() -- as you were in the
> earlier version -- you could use it to write those delimiter-testing
> loops more succinctly.  For example, the second one could become
>
>      for (; *p != '\0'; p++) {
>          if (strchr(delim, *p) != NULL) {
>              *p++ = '\0';
>              break;
>          }
>      }

Ok. Again I never think of matching characters of a string to be parsed 
with a string of delimiters. I will remember that.

>      Once again, I suggest you study strspn() and strcspn().  They
> seem to me to be quite useful (especially the latter), and for some
> reason quite underused.
>

In fact both of them may be used to implement strtok(). Here is my new 
version :

char *my_strtok(char *str, const char *delim)
{
     char *ret;
     static char *saveptr;

     if (!str && !saveptr)
         return NULL;

     if (str)
         saveptr = str;

     saveptr += strspn(saveptr, delim);
     ret = saveptr;
     saveptr += strcspn(saveptr, delim);

     if (*saveptr == '\0') {
         saveptr = NULL;
         return ret;
     }

     *saveptr++ = '\0';

     return ret;
}

Sounds correct to me. Note that the string parsing is stopped only when 
both str and saveptr are NULL.

saveptr value is reset when its last character is null character.




#83824

From: boon <root@localhost>
Date: 2016-03-13 23:25 +0100
Message-ID: <56e5e88f$0$3047$426a74cc@news.free.fr>
In reply to: #83818

On 03/13/2016 10:52 PM, boon wrote:
> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>> On 3/13/2016 4:06 PM, boon wrote:
>>> On 03/13/2016 05:13 PM, boon wrote:

[...]

>
> In fact both of them may be used to implement strtok(). Here is my new
> version :
>
> char *my_strtok(char *str, const char *delim)
> {
>      char *ret;
>      static char *saveptr;
>
>      if (!str && !saveptr)
>          return NULL;
>
>      if (str)
>          saveptr = str;
>
>      saveptr += strspn(saveptr, delim);
>      ret = saveptr;
>      saveptr += strcspn(saveptr, delim);
>
>      if (*saveptr == '\0') {
>          saveptr = NULL;
>          return ret;
>      }
>
>      *saveptr++ = '\0';
>
>      return ret;
> }
>
> Sounds correct to me. Note that the string parsing is stopped only when
> both str and saveptr are NULL.
>
> saveptr value is reset when its last character is null character.

Oops! I meant that 'saveptr' is reset to NULL when it points to an
empty string (one consisting of a single null character).




#83839

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-14 15:26 +1300
Message-ID: <dkmlqfF53ugU1@mid.individual.net>
In reply to: #83824

On 03/14/16 11:25, boon wrote:
> On 03/13/2016 10:52 PM, boon wrote:
>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>> On 3/13/2016 4:06 PM, boon wrote:
>>>> On 03/13/2016 05:13 PM, boon wrote:
>
> [...]
>
>>
>> In fact both of them may be used to implement strtok(). Here is my new
>> version :
>>
>> char *my_strtok(char *str, const char *delim)
>> {
>>       char *ret;
>>       static char *saveptr;
>>
>>       if (!str && !saveptr)
>>           return NULL;
>>
>>       if (str)
>>           saveptr = str;
>>
>>       saveptr += strspn(saveptr, delim);
>>       ret = saveptr;
>>       saveptr += strcspn(saveptr, delim);
>>
>>       if (*saveptr == '\0') {
>>           saveptr = NULL;
>>           return ret;
>>       }
>>
>>       *saveptr++ = '\0';
>>
>>       return ret;
>> }
>>
>> Sounds correct to me. Note that the string parsing is stopped only when
>> both str and saveptr are NULL.
>>
>> saveptr value is reset when its last character is null character.
>
> Ooops! I meant, 'saveptr' is reset to NULL value when it points to a
> null string (made of a unique null character).

This fails two of my tests:

{
   char data[] = "  ";
   const char* delim = " ";

   CPPUNIT_ASSERT( !my_strtok( data, delim ) );
}

and

{
   char data[] = "A B ";
   const char* delim = " ";

   my_strtok( data, delim );
   my_strtok( NULL, delim );

   char* p = my_strtok( NULL, delim );

   CPPUNIT_ASSERT( !p );
}

-- 
Ian Collins



#83862

From: boon <root@localhost.localdomain>
Date: 2016-03-14 12:44 +0100
Message-ID: <nc688p$up7$2@adenine.netfront.net>
In reply to: #83839

On 03/14/2016 03:26 AM, Ian Collins wrote:
> On 03/14/16 11:25, boon wrote:
>> On 03/13/2016 10:52 PM, boon wrote:
>>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>>> On 3/13/2016 4:06 PM, boon wrote:
>>>>> On 03/13/2016 05:13 PM, boon wrote:
>>
>> [...]
>>
>>>
>>> In fact both of them may be used to implement strtok(). Here is my new
>>> version :
>>>
>>> char *my_strtok(char *str, const char *delim)
>>> {
>>>       char *ret;
>>>       static char *saveptr;
>>>
>>>       if (!str && !saveptr)
>>>           return NULL;
>>>
>>>       if (str)
>>>           saveptr = str;
>>>
>>>       saveptr += strspn(saveptr, delim);
>>>       ret = saveptr;
>>>       saveptr += strcspn(saveptr, delim);
>>>
>>>       if (*saveptr == '\0') {
>>>           saveptr = NULL;
>>>           return ret;
>>>       }
>>>
>>>       *saveptr++ = '\0';
>>>
>>>       return ret;
>>> }
>>>
>>> Sounds correct to me. Note that the string parsing is stopped only when
>>> both str and saveptr are NULL.
>>>
>>> saveptr value is reset when its last character is null character.
>>
>> Ooops! I meant, 'saveptr' is reset to NULL value when it points to a
>> null string (made of a unique null character).
>
> This fails two of my tests:
>
> {
>    char data[] = "  ";
>    const char* delim = " ";
>
>    CPPUNIT_ASSERT( !my_strtok( data, delim ) );
> }
>
> and
>
> {
>    char data[] = "A B ";
>    const char* delim = " ";
>
>    my_strtok( data, delim );
>    my_strtok( NULL, delim );
>
>    char* p = my_strtok( NULL, delim );
>
>    CPPUNIT_ASSERT( !p );
> }
>

Correct. There was one remaining bug (the last one, I hope ;)) in my
latest implementation. After the delimiter characters have been skipped,
'saveptr' may point to an empty string. In this case, my_strtok() must
return NULL (and not saveptr or ret, which have the same value).
There is then no need to reset saveptr to NULL.

I hope this is the last implementation:

char *my_strtok(char *str, const char *delim)
{
     char *ret;
     static char *saveptr;

     if (!str && !saveptr)
         return NULL;

     if (str)
         saveptr = str;

     saveptr += strspn(saveptr, delim);
     if (*saveptr == '\0')
         return NULL;

     ret = saveptr;
     saveptr += strcspn(saveptr, delim);

     if (*saveptr == '\0') {
         saveptr = NULL;
         return ret;
     }

     *saveptr++ = '\0';

     return ret;
}





#84251

From: Tim Rentsch <txr@alumni.caltech.edu>
Date: 2016-03-17 08:23 -0700
Message-ID: <kfn4mc5f7qu.fsf@x-alumni2.alumni.caltech.edu>
In reply to: #83862

boon <root@localhost.localdomain> writes:

> On 03/14/2016 03:26 AM, Ian Collins wrote:
>> On 03/14/16 11:25, boon wrote:
>>> On 03/13/2016 10:52 PM, boon wrote:
>>>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>>>> On 3/13/2016 4:06 PM, boon wrote:
>>>>>> On 03/13/2016 05:13 PM, boon wrote:
>
> [...] There is a remaining bug (I hope so ;)) in my latest
> implementation.  After having skipped the delimiter characters,
> saveptr' may point to a null-terminated string.  In this case,
> my_strtok() must return NULL (and not saveptr or ret which points have
> the same value).  Then, there is no need to reset it to NULL value.
>
> I hope this is the last implementation :
>
> char *my_strtok(char *str, const char *delim)
> {
>     char *ret;
>     static char *saveptr;
>
>     if (!str && !saveptr)
>         return NULL;
>
>     if (str)
>         saveptr = str;
>
>     saveptr += strspn(saveptr, delim);
>     if (*saveptr == '\0')
>         return NULL;
>
>     ret = saveptr;
>     saveptr += strcspn(saveptr, delim);
>
>     if (*saveptr == '\0') {
>         saveptr = NULL;
>         return ret;
>     }
>
>     *saveptr++ = '\0';
>
>     return ret;
> }

If you imagine there are available functions 'skip_over' to skip
over a sequence of characters in a given set, and 'skip_to' to
skip to the first occurrence of any character in a given set (or
to a terminating null, whichever comes first), then strtok() may
be written as follows, without ever testing the saved pointer
for NULL (because that never happens):

    char *
    my_strtok( char *input, const char *delimiters ){
        static char *saved = "";
        char *result = skip_over( input ? input : saved, delimiters );
        saved        = skip_to( result, delimiters );

        if(  *saved  )  return  *saved++ = 0,  result;

        return  saved = "",   *result  ? result  : NULL;
    }

The functions skip_over() and skip_to() can be written using
strspn() and strcspn() if that is deemed appropriate:

    char *
    skip_over( char *s, const char *space_like ){
        return  s + strspn( s, space_like );
    }

    char *
    skip_to( char *s, const char *space_like ){
        return  s + strcspn( s, space_like );
    }

If for some reason we don't want to rely on strspn() and strcspn(),
skip_over() and skip_to() can be written in terms of a single
sub-function 'contains()', as follows.  Note the parallel
construction of skip_over() and skip_to() in these definitions:

    char *
    skip_over( char *s, const char *set ){
        return  *s && contains( set, *s )  ? skip_over( s+1, set )  : s;
    }

    char *
    skip_to( char *s, const char *set ){
        return  *s && !contains( set, *s )  ? skip_to( s+1, set )  : s;
    }

The sub-function contains() can be written in a similar fashion:

    int
    contains( const char *set, char c ){
        return  *set && *set != c  ? contains( set+1, c )  :  *set != 0;
    }

I hope you find this alternate approach of interest.



#84341

From: boon <root@localhost>
Date: 2016-03-18 21:09 +0100
Message-ID: <56ec6051$0$19772$426a34cc@news.free.fr>
In reply to: #84251

On 03/17/2016 04:23 PM, Tim Rentsch wrote:
> boon <root@localhost.localdomain> writes:
>
>> On 03/14/2016 03:26 AM, Ian Collins wrote:
>>> On 03/14/16 11:25, boon wrote:
>>>> On 03/13/2016 10:52 PM, boon wrote:
>>>>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>>>>> On 3/13/2016 4:06 PM, boon wrote:
>>>>>>> On 03/13/2016 05:13 PM, boon wrote:

[...]

>
> If you imagine there are available functions 'skip_over' to skip
> over a sequence of characters in a given set, and 'skip_to' to
> skip to the first occurrence of any character in a given set (or
> to a terminating null, whichever comes first), then strtok() may
> be written as follows, without ever testing the saved pointer
> for NULL (because that never happens):

I see. Since the saved pointer is reset to an empty string rather than
to NULL, there is indeed no need to test whether it is NULL.

But you still test its first element, *saved, for the null character
(if (*saved))... isn't this the same logic, just with a different value
for the string that 'saved' points to?

I note that your implementation is safer than mine, as only non-NULL
pointers are used.

I like your trick and I will remember it.

>      char *
>      my_strtok( char *input, const char *delimiters ){
>          static char *saved = "";
>          char *result = skip_over( input ? input : saved, delimiters );
>          saved        = skip_to( result, delimiters );
>
>          if(  *saved  )  return  *saved++ = 0,  result;
>
>          return  saved = "",   *result  ? result  : NULL;
>      }
>
> The functions skip_over() and skip_to() can be written using
> strspn() and strcspn() if that is deemed appropriate:
>
>      char *
>      skip_over( char *s, const char *space_like ){
>          return  s + strspn( s, space_like );
>      }
>
>      char *
>      skip_to( char *s, const char *space_like ){
>          return  s + strcspn( s, space_like );
>      }

I noticed these functions work even if an empty string (just the
terminating null) is passed as the 's' parameter: both strspn() and
strcspn() then return 0, so 's' is returned unchanged.

> If for some reason we don't want to rely on strspn() and strcspn(),
> skip_over() and skip_to() can be written in terms of a single
> sub-function 'contains()', as follows.  Note the parallel
> construction of skip_over() and skip_to() in these definitions:
>
>      char *
>      skip_over( char *s, const char *set ){
>          return  *s && contains( set, *s )  ? skip_over( s+1, set )  : s;
>      }
>
>      char *
>      skip_to( char *s, const char *set ){
>          return  *s && !contains( set, *s )  ? skip_to( s+1, set )  : s;
>      }
>
> The sub-function contains() can be written in a similar fashion
>
>      int
>      contains( const char *set, char c ){
>          return  *set && *set != c  ? contains( set+1, c )  :  *set != 0;
>      }
>
> I hope you find this alternate approach of interest.
>

Nice implementation with recursive functions. Thank you.

Here is a new implementation exploiting your trick:

char *my_strtok(char *str, const char *delim)
{
     char *ret;
     static char *saveptr = "";

     if (str)
         saveptr = str;

     ret = saveptr += strspn(saveptr, delim);
     saveptr += strcspn(saveptr, delim);

     if (*saveptr) return *saveptr++ = '\0', ret;

     return saveptr = "", *ret ? ret : NULL;
}



#84373

From: Tim Rentsch <txr@alumni.caltech.edu>
Date: 2016-03-19 14:21 -0700
Message-ID: <kfnd1qqcgfv.fsf@x-alumni2.alumni.caltech.edu>
In reply to: #84341

boon <root@localhost> writes:

> On 03/17/2016 04:23 PM, Tim Rentsch wrote:
>> boon <root@localhost.localdomain> writes:
>>
>>> On 03/14/2016 03:26 AM, Ian Collins wrote:
>>>> On 03/14/16 11:25, boon wrote:
>>>>> On 03/13/2016 10:52 PM, boon wrote:
>>>>>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>>>>>> On 3/13/2016 4:06 PM, boon wrote:
>>>>>>>> On 03/13/2016 05:13 PM, boon wrote:
>
> [...]
>
>> If you imagine there are available functions 'skip_over' to skip
>> over a sequence of characters in a given set, and 'skip_to' to
>> skip to the first occurrence of any character in a given set (or
>> to a terminating null, whichever comes first), then strtok() may
>> be written as follows, without ever testing the saved pointer
>> for NULL (because that never happens):
>
> I see.  Since the saved pointer is reset to an empty string rather than
> to NULL, there is indeed no need to test whether it is NULL.
>
> But you still test its first element, *saved, for the null character
> (if (*saved))... isn't this the same logic, just with a different value
> for the string that 'saved' points to?

It isn't.  The test of *saved is a test for what's happening with
the input, not (necessarily) a test for the previous value of
'saved'.  If the argument 'input' is non-null, convince yourself
that, just before the if() test, the value of '*saved' may be 0,
or the values of '*result' and '*saved' may both be zero,
depending on what 'input' points to.  Note that these values may
arise regardless of the previous value of 'saved' when 'input'
is non-null.

> I note that your implementation is safer than mine, as only non-NULL
> pointers are used.
>
> I like your trick and I will remember it.

I like it because, for me, it makes it easier to reason about how
the function works.

>>      char *
>>      my_strtok( char *input, const char *delimiters ){
>>          static char *saved = "";
>>          char *result = skip_over( input ? input : saved, delimiters );
>>          saved        = skip_to( result, delimiters );
>>
>>          if(  *saved  )  return  *saved++ = 0,  result;
>>
>>          return  saved = "",   *result  ? result  : NULL;
>>      }
>>
>> [...] I hope you find this alternate approach of interest.
>
> Nice implementation with recursive functions.  Thank you.

I'm glad you like it.  Note that gcc with optimization level
greater than -O1 will turn the recursive calls into loops.

> Here is a new implementation exploiting your trick :
>
> char *my_strtok(char *str, const char *delim)
> {
>     char *ret;
>     static char *saveptr = "";
>
>     if (str)
>         saveptr = str;
>
>     ret = saveptr += strspn(saveptr, delim);
>     saveptr += strcspn(saveptr, delim);
>
>     if (*saveptr) return *saveptr++ = '\0', ret;
>
>     return saveptr = "", *ret ? ret : NULL;
> }

I would be inclined to write a version like this using a
conditional-expression assignment rather than an if() (and also
with different variable names, but I'm using your names):

    char *my_strtok(char *str, const char *delim)
    {
        static char *saveptr = "";
        char *ret = str ? str : saveptr;
    
        ret = ret + strspn(ret, delim);
        saveptr = ret + strcspn(ret, delim);
    
        if (*saveptr) return *saveptr++ = '\0', ret;
    
        return saveptr = "", *ret ? ret : NULL;
    }

I find this writing easier to follow than the last one.



#84375

From: Randy Howard <rhoward.mx@EverybodyUsesIt.com>
Date: 2016-03-19 16:25 -0500
Message-ID: <nckg4u$kn9$2@gioia.aioe.org>
In reply to: #84373

On 3/19/16 4:21 PM, Tim Rentsch wrote:

> I would be inclined to write a version like this using a
> conditional-expression assignment rather than an if() (and also
> with different variable names, but I'm using your names):
>
>      char *my_strtok(char *str, const char *delim)
>      {
>          static char *saveptr = "";
>          char *ret = str ? str : saveptr;
>
>          ret = ret + strspn(ret, delim);
>          saveptr = ret + strcspn(ret, delim);
>
>          if (*saveptr) return *saveptr++ = '\0', ret;
>
>          return saveptr = "", *ret ? ret : NULL;
>      }
>
> I find this writing easier to follow than the last one.
>

The one Chris Torek showed in this group many years ago (which you
can probably find fairly easily) is quite good, and doesn't suffer
from the same silliness the standard strtok does.


-- 
Randy Howard
(replace the obvious text in the obvious way if you wish to contact me 
directly)



#84857

From: boon <fred900rbc@gmail.com>
Date: 2016-03-24 13:05 -0700
Message-ID: <61d04bb8-882c-463e-a8b6-b027381abf48@googlegroups.com>
In reply to: #84375

On Saturday, March 19, 2016 at 10:26:23 PM UTC+1, Randy Howard wrote:
> On 3/19/16 4:21 PM, Tim Rentsch wrote:
> 
> > I would be inclined to write a version like this using a
> > conditional-expression assignment rather than an if() (and also
> > with different variable names, but I'm using your names):
> >
> >      char *my_strtok(char *str, const char *delim)
> >      {
> >          static char *saveptr = "";
> >          char *ret = str ? str : saveptr;
> >
> >          ret = ret + strspn(ret, delim);
> >          saveptr = ret + strcspn(ret, delim);
> >
> >          if (*saveptr) return *saveptr++ = '\0', ret;
> >
> >          return saveptr = "", *ret ? ret : NULL;
> >      }
> >
> > I find this writing easier to follow than the last one.
> >
> 
> The one Chris Torek showed in this group many years ago (which you
> can probably find fairly easily) is quite good, and doesn't suffer
> from the same silliness the standard strtok does.
> 
> 
> -- 
> Randy Howard
> (replace the obvious text in the obvious way if you wish to contact me 
> directly)

I found a very old post:

http://yarchive.net/comp/strtok.html



#85344

From: Tim Rentsch <txr@alumni.caltech.edu>
Date: 2016-03-30 09:13 -0700
Message-ID: <kfnh9fo9c6l.fsf@x-alumni2.alumni.caltech.edu>
In reply to: #84375

Randy Howard <rhoward.mx@EverybodyUsesIt.com> writes:

> On 3/19/16 4:21 PM, Tim Rentsch wrote:
>
>> I would be inclined to write a version like this using a
>> conditional-expression assignment rather than an if() (and also
>> with different variable names, but I'm using your names):
>>
>>      char *my_strtok(char *str, const char *delim)
>>      {
>>          static char *saveptr = "";
>>          char *ret = str ? str : saveptr;
>>
>>          ret = ret + strspn(ret, delim);
>>          saveptr = ret + strcspn(ret, delim);
>>
>>          if (*saveptr) return *saveptr++ = '\0', ret;
>>
>>          return saveptr = "", *ret ? ret : NULL;
>>      }
>>
>> I find this writing easier to follow than the last one.
>
> The one Chris Torek showed in this group many years ago (which you
> can probably find fairly easily) is quite good, and doesn't suffer
> from the same silliness the standard strtok does.

If you're talking about the posting mentioned by boon in his
followup, that looks like it's about changing the semantics
and/or interface of the function.  The code above is predicated
on using the existing interface and semantics.



#85383

From: Randy Howard <rhoward.mx@EverybodyUsesIt.com>
Date: 2016-03-30 14:44 -0500
Message-ID: <ndhab8$kmf$3@gioia.aioe.org>
In reply to: #85344

On 3/30/16 11:13 AM, Tim Rentsch wrote:
> Randy Howard <rhoward.mx@EverybodyUsesIt.com> writes:
>
>> On 3/19/16 4:21 PM, Tim Rentsch wrote:
>>
>>> I would be inclined to write a version like this using a
>>> conditional-expression assignment rather than an if() (and also
>>> with different variable names, but I'm using your names):
>>>
>>>       char *my_strtok(char *str, const char *delim)
>>>       {
>>>           static char *saveptr = "";
>>>           char *ret = str ? str : saveptr;
>>>
>>>           ret = ret + strspn(ret, delim);
>>>           saveptr = ret + strcspn(ret, delim);
>>>
>>>           if (*saveptr) return *saveptr++ = '\0', ret;
>>>
>>>           return saveptr = "", *ret ? ret : NULL;
>>>       }
>>>
>>> I find this writing easier to follow than the last one.
>>
>> The one Chris Torek showed in this group many years ago (which you
>> can probably find fairly easily) is quite good, and doesn't suffer
>> from the same silliness the standard strtok does.
>
> If you're talking about the posting mentioned by boon in his
> followup, that looks like it's about changing the semantics
> and/or interface of the function.  The code above is predicated
> on using the existing interface and semantics.
>

I thought that was understood, given I wrote "... doesn't suffer from
the same silliness the standard strtok does."

Point being, strtok() is basically garbage and easily improved upon.

-- 
Randy Howard
(replace the obvious text in the obvious way if you wish to contact me 
directly)


