Groups > comp.lang.c > #83783 > unrolled thread

strtok() implementation

Started by: boon <root@localhost>
First post: 2016-03-13 17:13 +0100
Last post: 2016-03-15 22:32 +0100
Articles: 19 on this page of 39 (14 participants)


Contents

  strtok() implementation boon <root@localhost> - 2016-03-13 17:13 +0100
    Re: strtok() implementation Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-13 10:04 -0700
      Re: strtok() implementation boon <root@localhost> - 2016-03-13 18:51 +0100
    Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-13 13:38 -0400
      Re: strtok() implementation boon <root@localhost> - 2016-03-13 19:05 +0100
        Re: strtok() implementation Keith Thompson <kst-u@mib.org> - 2016-03-13 13:50 -0700
          Re: strtok() implementation boon <root@localhost> - 2016-03-13 23:10 +0100
    Re: strtok() implementation boon <root@localhost> - 2016-03-13 21:06 +0100
      Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-13 16:26 -0400
        Re: strtok() implementation boon <root@localhost> - 2016-03-13 22:52 +0100
          Re: strtok() implementation boon <root@localhost> - 2016-03-13 23:25 +0100
            Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-14 15:26 +1300
              Re: strtok() implementation boon <root@localhost.localdomain> - 2016-03-14 12:44 +0100
                Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-17 08:23 -0700
                  Re: strtok() implementation boon <root@localhost> - 2016-03-18 21:09 +0100
                    Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-19 14:21 -0700
                      Re: strtok() implementation Randy Howard <rhoward.mx@EverybodyUsesIt.com> - 2016-03-19 16:25 -0500
                        Re: strtok() implementation boon <fred900rbc@gmail.com> - 2016-03-24 13:05 -0700
                        Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-03-30 09:13 -0700
                          Re: strtok() implementation Randy Howard <rhoward.mx@EverybodyUsesIt.com> - 2016-03-30 14:44 -0500
                      Re: strtok() implementation boon <root@127.10.10.1> - 2016-03-31 10:24 +0200
                        Re: strtok() implementation Tim Rentsch <txr@alumni.caltech.edu> - 2016-04-05 12:23 -0700
    Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-14 15:31 +1300
      Re: strtok() implementation boon <root@localhost> - 2016-03-14 20:13 +0100
        Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-15 09:48 +1300
          Re: strtok() implementation Malcolm McLean <malcolm.mclean5@btinternet.com> - 2016-03-14 14:05 -0700
            Re: strtok() implementation Ian Collins <ian-news@hotmail.com> - 2016-03-15 10:09 +1300
              Re: strtok() implementation Richard Heathfield <rjh@cpax.org.uk> - 2016-03-14 22:02 +0000
                Re: strtok() implementation Gareth Owen <gwowen@gmail.com> - 2016-03-14 22:16 +0000
            Re: strtok() implementation Keith Thompson <kst-u@mib.org> - 2016-03-14 14:50 -0700
            Re: strtok() implementation raltbos@xs4all.nl (Richard Bos) - 2016-03-14 22:06 +0000
              Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:14 +0100
                Re: strtok() implementation BartC <bc@freeuk.com> - 2016-03-15 21:23 +0000
                Re: strtok() implementation raltbos@xs4all.nl (Richard Bos) - 2016-03-17 12:27 +0000
          Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:04 +0100
            Re: strtok() implementation Eric Sosman <esosman@comcast-dot-net.invalid> - 2016-03-15 18:18 -0400
              Re: strtok() implementation boon <root@localhost> - 2016-03-18 21:19 +0100
          Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:08 +0100
    Re: strtok() implementation boon <root@localhost> - 2016-03-15 22:32 +0100


#85460

From: boon <root@127.10.10.1>
Date: 2016-03-31 10:24 +0200
Message-ID: <ndimro$qg2$1@adenine.netfront.net>
In reply to: #84373
On 03/19/2016 10:21 PM, Tim Rentsch wrote:
> boon <root@localhost> writes:
>
>> On 03/17/2016 04:23 PM, Tim Rentsch wrote:
>>> boon <root@localhost.localdomain> writes:
>>>
>>>> On 03/14/2016 03:26 AM, Ian Collins wrote:
>>>>> On 03/14/16 11:25, boon wrote:
>>>>>> On 03/13/2016 10:52 PM, boon wrote:
>>>>>>> On 03/13/2016 09:26 PM, Eric Sosman wrote:
>>>>>>>> On 3/13/2016 4:06 PM, boon wrote:
>>>>>>>>> On 03/13/2016 05:13 PM, boon wrote:
>>
>> [...]
>>
>>> If you imagine there are available functions 'skip_over' to skip
>>> over a sequence of characters in a given set, and 'skip_to' to
>>> skip to the first occurrence of any character in a given set (or
>>> to a terminating null, whichever comes first), then strtok() may
>>> be written as follows, without ever testing the saved pointer
>>> for NULL (because that never happens):
>>
>> I see.  As the reset value of the saved pointer is a null-terminated
>> string, there is indeed no need to test whether it is NULL.
>>
>> But you test its first element, *saved, for the null character (if
>> (*saved))... isn't this the same logic, but with a different value for
>> the string that 'saved' points to?
>
> It isn't.  The test of *saved is a test for what's happening with
> the input, not (necessarily) a test for the previous value of
> 'saved'.  If the argument 'input' is non-null, convince yourself
> that, just before the if() test, the value of '*saved' may be 0,
> or the values of '*result' and '*saved' may both be zero,
> depending on what 'input' points to.  Note that these values may
> arise regardless of the previous value of 'saved' when 'input'
> is non-null.
>
>> I note that your implementation is safer than mine, as only non-NULL
>> pointers are used.
>>
>> I like your trick and I will remember it.
>
> I like it because, for me, it makes it easier to reason about how
> the function works.
>
>>>       char *
>>>       my_strtok( char *input, const char *delimiters ){
>>>           static char *saved = "";
>>>           char *result = skip_over( input ? input : saved, delimiters );
>>>           saved        = skip_to( result, delimiters );
>>>
>>>           if(  *saved  )  return  *saved++ = 0,  result;
>>>
>>>           return  saved = "",   *result  ? result  : NULL;
>>>       }
>>>
>>> [...] I hope you find this alternate approach of interest.
>>
>> Nice implementation with recursive functions.  Thank you.
>
> I'm glad you like it.  Note that gcc with optimization level
> greater than -O1 will turn the recursive calls into loops.
>
>> Here is a new implementation exploiting your trick :
>>
>> char *my_strtok(char *str, const char *delim)
>> {
>>      char *ret;
>>      static char *saveptr = "";
>>
>>      if (str)
>>          saveptr = str;
>>
>>      ret = saveptr += strspn(saveptr, delim);
>>      saveptr += strcspn(saveptr, delim);
>>
>>      if (*saveptr) return *saveptr++ = '\0', ret;
>>
>>      return saveptr = "", *ret ? ret : NULL;
>> }
>
> I would be inclined to write a version like this using a
> conditional-expression assignment rather than an if() (and also
> with different variable names, but I'm using your names):
>
>      char *my_strtok(char *str, const char *delim)
>      {
>          static char *saveptr = "";
>          char *ret = str ? str : saveptr;
>
>          ret = ret + strspn(ret, delim);
>          saveptr = ret + strcspn(ret, delim);
>
>          if (*saveptr) return *saveptr++ = '\0', ret;
>
>          return saveptr = "", *ret ? ret : NULL;
>      }
>
> I find this writing easier to follow than the last one.
>

I agree with you about using a conditional-expression assignment to
initialize the 'ret' variable.

But I do not feel at ease with using the comma operator within a return
statement. ;)
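
For readers who share that unease, here is a minimal sketch of the same function with the comma operators unfolded into separate statements. It is an illustrative rewrite, not code posted in the thread, and it assumes the first strspn() call is meant to scan from 'ret' (the start of the current scan) rather than from 'saveptr'.

```c
#include <stddef.h>
#include <string.h>

/* Sketch only: same logic as the version quoted above, minus the
 * comma operators.  'saveptr' starts out pointing at an empty string,
 * so it is never NULL and never needs a NULL test. */
char *my_strtok(char *str, const char *delim)
{
    static char *saveptr = "";
    char *ret = str ? str : saveptr;

    ret += strspn(ret, delim);            /* skip leading delimiters */
    saveptr = ret + strcspn(ret, delim);  /* find the end of the token */

    if (*saveptr) {
        *saveptr = '\0';                  /* terminate the token */
        saveptr++;                        /* resume after it next time */
        return ret;
    }

    saveptr = "";                         /* end of input: reset state */
    return *ret ? ret : NULL;
}
```

The string literal is never written to: the store through 'saveptr' only happens when '*saveptr' is non-zero, which is never the case for "".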






#85953

From: Tim Rentsch <txr@alumni.caltech.edu>
Date: 2016-04-05 12:23 -0700
Message-ID: <kfn1t6j5082.fsf@x-alumni2.alumni.caltech.edu>
In reply to: #85460
boon <root@127.10.10.1> writes:

> On 03/19/2016 10:21 PM, Tim Rentsch wrote:
[...]
>>
>> I would be inclined to write a version like this using a
>> conditional-expression assignment rather than an if() (and also
>> with different variable names, but I'm using your names):
>>
>>      char *my_strtok(char *str, const char *delim)
>>      {
>>          static char *saveptr = "";
>>          char *ret = str ? str : saveptr;
>>
>>          ret = ret + strspn(ret, delim);
>>          saveptr = ret + strcspn(ret, delim);
>>
>>          if (*saveptr) return *saveptr++ = '\0', ret;
>>
>>          return saveptr = "", *ret ? ret : NULL;
>>      }
>>
>> I find this writing easier to follow than the last one.
>
> I agree with you about using a conditional-expression assignment to
> initialize the 'ret' variable.
>
> But I do not feel at ease with using the comma operator within a return
> statement. ;)

I understand your reaction.  Let me offer a perspective that the
usage shown is acceptable in this case.

For functions that have an output parameter (eg, 'int *out'),
there is a school of thought that storing those values should be
expressed in 'return' statements along with the function's
result value:

    return  *out = n,  p;

Doing this makes it easier to see that all outputs of a
function have been given appropriate values.

In the code above, the value stored in 'saveptr' is part of the
"output state" of the function.  It isn't state that is directly
visible to the caller, but it is important to the caller that the
state-saving be done.  Indeed, that state-saving is part of the
specification of the function here, modeled as it is after
strtok().  Since it is part of the function's "output state", it
makes sense to include the setting of that state in the return
expression, along with the regular function return value.
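
As a small illustration of the output-parameter style described above, here is a sketch. The function name 'find_char' and its parameters are hypothetical, chosen only for this example; they do not come from the thread.

```c
#include <stddef.h>

/* Hypothetical example: returns a pointer to the first occurrence of
 * 'c' in 's' (or NULL if absent) and stores the number of characters
 * examined in '*out'.  Both outputs are set in the one return
 * statement, so it is easy to see that neither was forgotten. */
const char *find_char(const char *s, char c, size_t *out)
{
    size_t n = 0;

    while (s[n] != '\0' && s[n] != c)
        n++;

    return *out = n, s[n] != '\0' ? s + n : NULL;
}
```

The comma operator has the lowest precedence, so the expression parses as '(*out = n), (s[n] != '\0' ? s + n : NULL)': the store happens first and the conditional expression supplies the return value.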


#83840

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-14 15:31 +1300
Message-ID: <dkmm42F53ugU2@mid.individual.net>
In reply to: #83783
On 03/14/16 05:13, boon wrote:
> Hello,
>
> I am writing strtok() implementation, just for the fun and to improve my
> C coding style and skills.
>
> Here is my solution.

<snip>

Here's an alternative, avoiding any library functions...

#include <stddef.h>
#include <stdbool.h>

static bool
charIn( char c, const char* delim )
{
   while( *delim )
   {
     if( *delim++ == c )
     {
       return true;
     }
   }

   return false;
}

static bool
charNotIn( char c, const char* delim )
{
   while( *delim && *delim != c )
   {
     ++delim;
   }

   return *delim == '\0';
}

char*
my_strtok( char* restrict str, const char* restrict delim )
{
   static char* last = NULL;

   if( !delim ) return NULL;
   if( !str && !last ) return NULL;

   if( !str )
   {
     str = last;
   }
   else
   {
     last = NULL;
   }

   while( *str && charIn( *str, delim) )
   {
     ++str;
   }

   if( *str )
   {
     char* start = str;

     while( *start && charNotIn(*start, delim) )
     {
       ++start;
     }

     if( *start )
     {
       *start++ = '\0';
     }

     last = start;

     return str;
   }

   return NULL;
}

-- 
Ian Collins


#83911

From: boon <root@localhost>
Date: 2016-03-14 20:13 +0100
Message-ID: <56e70d11$0$4562$426a74cc@news.free.fr>
In reply to: #83840
On 03/14/2016 03:31 AM, Ian Collins wrote:
> On 03/14/16 05:13, boon wrote:
>> Hello,
>>
>> I am writing strtok() implementation, just for the fun and to improve my
>> C coding style and skills.
>>
>> Here is my solution.
>
> <snip>
>
> Here's an alternative, avoiding any library functions...
>
> #include <stddef.h>
> #include <stdbool.h>
>
> static bool
> charIn( char c, const char* delim )
> {
>    while( *delim )
>    {
>      if( *delim++ == c )
>      {
>        return true;
>      }
>    }
>
>    return false;
> }
>
> static bool
> charNotIn( char c, const char* delim )
> {
>    while( *delim && *delim != c )
>    {
>      ++delim;
>    }
>
>    return *delim == '\0';
> }
>
> char*
> my_strtok( char* restrict str, const char* restrict delim )
> {
>    static char* last = NULL;
>
>    if( !delim ) return NULL;
>    if( !str && !last ) return NULL;
>
>    if( !str )
>    {
>      str = last;
>    }
>    else
>    {
>      last = NULL;
>    }
>
>    while( *str && charIn( *str, delim) )
>    {
>      ++str;
>    }
>
>    if( *str )
>    {
>      char* start = str;
>
>      while( *start && charNotIn(*start, delim) )
>      {
>        ++start;
>      }
>
>      if( *start )
>      {
>        *start++ = '\0';
>      }
>
>      last = start;
>
>      return str;
>    }
>
>    return NULL;
> }
>

Thank you Ian. I noticed you have not used local variables (except the
ones used as formal parameters and the static 'last' variable that saves
the "parsing context"). I guess this implementation has a chance of being
faster than mine.

Furthermore, you added a check on the 'delim' parameter. This is something
I have missed again.

I will try to be more attentive during next exercises.

Regards.


#83920

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-15 09:48 +1300
Message-ID: <dkomcrFn02uU2@mid.individual.net>
In reply to: #83911
On 03/15/16 08:13, boon wrote:
> On 03/14/2016 03:31 AM, Ian Collins wrote:
>> On 03/14/16 05:13, boon wrote:
>>> Hello,
>>>
>>> I am writing strtok() implementation, just for the fun and to improve my
>>> C coding style and skills.
>>>
>>> Here is my solution.
>>
>> <snip>
>>
>> Here's an alternative, avoiding any library functions...

<snip>

> Thank you Ian. I noticed you have not used local variables (except the
> ones used as formal parameters and the static 'last' variable that saves
> the "parsing context"). I guess this implementation has a chance of being
> faster than mine.

Gaining speed wasn't the intent, more of a case of style.  I often reuse 
parameter values in cases such as this.  If there is any potential 
performance boost it would come from having the equivalent of "strspn" 
in-line.
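
To make "the equivalent of strspn in-line" concrete, here is a rough sketch of hand-written counterparts to strspn() and strcspn(). The names 'span_in' and 'span_not_in' are hypothetical and used only for illustration; whether this actually beats a library implementation would have to be measured.

```c
#include <stddef.h>

/* Length of the initial run of 's' made up only of characters that
 * appear in 'set' (what strspn() computes). */
size_t span_in(const char *s, const char *set)
{
    size_t n = 0;

    for (; s[n] != '\0'; n++) {
        const char *p = set;

        while (*p != '\0' && *p != s[n])
            p++;
        if (*p == '\0')   /* s[n] is not in 'set': the run ends here */
            break;
    }
    return n;
}

/* Length of the initial run of 's' made up only of characters that do
 * NOT appear in 'set' (what strcspn() computes). */
size_t span_not_in(const char *s, const char *set)
{
    size_t n = 0;

    for (; s[n] != '\0'; n++) {
        const char *p = set;

        while (*p != '\0' && *p != s[n])
            p++;
        if (*p != '\0')   /* s[n] is in 'set': the run ends here */
            break;
    }
    return n;
}
```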

> Furthermore you added a check on 'delim' parameter. This is something I
> have missed again.

There's a gap in the standard there, it does not specify the behaviour 
when the delimiter is null.  My standard library's strtok shares a crash 
with your version if I run that particular test :)

-- 
Ian Collins


#83924

From: Malcolm McLean <malcolm.mclean5@btinternet.com>
Date: 2016-03-14 14:05 -0700
Message-ID: <b62598b0-dadc-48c2-aca0-0632ef0a6151@googlegroups.com>
In reply to: #83920
On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
> 
> There's a gap in the standard there, it does not specify the behaviour 
> when the delimiter is null.  My standard library's strtok shares a crash 
> with your version if I run that particular test :)
> 
I'd just match the whole string.
In the olden days an empty string and the null character pointer
were the same.


#83926

From: Ian Collins <ian-news@hotmail.com>
Date: 2016-03-15 10:09 +1300
Message-ID: <dkonksFn02uU3@mid.individual.net>
In reply to: #83924
On 03/15/16 10:05, Malcolm McLean wrote:
> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
>>
>> There's a gap in the standard there, it does not specify the behaviour
>> when the delimiter is null.  My standard library's strtok shares a crash
>> with your version if I run that particular test :)
>>
> I'd just match the whole string.
> In the olden days an empty string and the null character pointer
> were the same.

Were they?

-- 
Ian Collins


#83938

From: Richard Heathfield <rjh@cpax.org.uk>
Date: 2016-03-14 22:02 +0000
Message-ID: <nc7c7p$juc$1@dont-email.me>
In reply to: #83926
On 14/03/16 21:09, Ian Collins wrote:
> On 03/15/16 10:05, Malcolm McLean wrote:
>> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
>>>
>>> There's a gap in the standard there, it does not specify the behaviour
>>> when the delimiter is null.  My standard library's strtok shares a crash
>>> with your version if I run that particular test :)
>>>
>> I'd just match the whole string.
>> In the olden days an empty string and the null character pointer
>> were the same.
>
> Were they?

Nope. YHBM.

-- 
Richard Heathfield
Email: rjh at cpax dot org dot uk
"Usenet is a strange place" - dmr 29 July 1999
Sig line 4 vacant - apply within


#83944

From: Gareth Owen <gwowen@gmail.com>
Date: 2016-03-14 22:16 +0000
Message-ID: <8737rsaeni.fsf@gmail.com>
In reply to: #83938
Richard Heathfield <rjh@cpax.org.uk> writes:

> Nope. YHBM.

:)


#83935

From: Keith Thompson <kst-u@mib.org>
Date: 2016-03-14 14:50 -0700
Message-ID: <lnfuvsk9va.fsf@kst-u.example.com>
In reply to: #83924
Malcolm McLean <malcolm.mclean5@btinternet.com> writes:
> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
>> 
>> There's a gap in the standard there, it does not specify the behaviour 
>> when the delimiter is null.  My standard library's strtok shares a crash 
>> with your version if I run that particular test :)
>> 
> I'd just match the whole string.
> In the olden days an empty string and the null character pointer
> were the same.

Nope.

On some systems (such as, IIRC, SunOS on 68k), a null pointer pointed to
address 0, and that location happened to be readable and contain a 0
byte.  As a result, passing a null pointer to a string function often
had the same effect as passing a valid pointer to an empty string, and
some code (probably inadvertently) depended on that.  A lot of latent
bugs were detected when the system was changed to protect memory page
zero.

K&R1 page 97 says:

    C guarantees that no pointer that validly points at data will
    contain zero ...

Therefore a null (zero) pointer does not point to valid data.  An empty
string is valid data, consisting of a single null character.
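
The distinction can be spelled out in a few lines of C. This is only an illustrative sketch; the names 'str_kind' and 'classify' are made up for the example.

```c
#include <stddef.h>

enum str_kind { STR_NULL, STR_EMPTY, STR_NONEMPTY };

/* A null pointer must be recognised by comparing the pointer itself;
 * only a non-null pointer may be dereferenced to look for the
 * terminating null character of an empty string. */
enum str_kind classify(const char *s)
{
    if (s == NULL)
        return STR_NULL;      /* no object at all */

    return *s == '\0' ? STR_EMPTY : STR_NONEMPTY;
}
```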

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"


#83940

From: raltbos@xs4all.nl (Richard Bos)
Date: 2016-03-14 22:06 +0000
Message-ID: <56e735ca.1726078@news.xs4all.nl>
In reply to: #83924
Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:

> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
> > 
> > There's a gap in the standard there, it does not specify the behaviour 
> > when the delimiter is null.  My standard library's strtok shares a crash 
> > with your version if I run that particular test :)
> > 
> I'd just match the whole string.
> In the olden days an empty string and the null character pointer
> were the same.

Must've been _very_ olden days... even on the Speccy that's bollocks.

Richard


#84054

From: boon <root@localhost>
Date: 2016-03-15 22:14 +0100
Message-ID: <56e87af1$0$27816$426a34cc@news.free.fr>
In reply to: #83940
On 03/14/2016 11:06 PM, Richard Bos wrote:
> Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:
>
>> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
>>>
>>> There's a gap in the standard there, it does not specify the behaviour
>>> when the delimiter is null.  My standard library's strtok shares a crash
>>> with your version if I run that particular test :)
>>>
>> I'd just match the whole string.
>> In the olden days an empty string and the null character pointer
>> were the same.
>
> Must've been _very_ olden days... even on the Speccy that's bollocks.

I searched... did you mean the ZX Spectrum architecture (discontinued
in 1992)?

> Richard
>


#84056

From: BartC <bc@freeuk.com>
Date: 2016-03-15 21:23 +0000
Message-ID: <nc9ub9$d48$1@dont-email.me>
In reply to: #84054
On 15/03/2016 21:14, boon wrote:
> On 03/14/2016 11:06 PM, Richard Bos wrote:
>> Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:
>>
>>> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
>>>>
>>>> There's a gap in the standard there, it does not specify the behaviour
>>>> when the delimiter is null.  My standard library's strtok shares a
>>>> crash
>>>> with your version if I run that particular test :)
>>>>
>>> I'd just match the whole string.
>>> In the olden days an empty string and the null character pointer
>>> were the same.
>>
>> Must've been _very_ olden days... even on the Speccy that's bollocks.
>
> I searched... did you mean the ZX Spectrum architecture (discontinued
> in 1992)?

The ZX used the Z80 chip. On that, a null pointer would point to address 
0x0000, which is also where the program counter starts executing code.

So it would be unlikely to contain 0, necessary for *NULL to yield a 
zero byte just like passing "". Unless there is an inexplicable NOP at 
the start, but C can hardly rely on that. (But being a Sinclair product, 
this wouldn't be surprising.)

-- 
Bartc


#84240

From: raltbos@xs4all.nl (Richard Bos)
Date: 2016-03-17 12:27 +0000
Message-ID: <56eaa29c.3853484@news.xs4all.nl>
In reply to: #84054
boon <root@localhost> wrote:

> On 03/14/2016 11:06 PM, Richard Bos wrote:
> > Malcolm McLean <malcolm.mclean5@btinternet.com> wrote:
> >
> >> On Monday, March 14, 2016 at 8:48:38 PM UTC, Ian Collins wrote:
> >>>
> >>> There's a gap in the standard there, it does not specify the behaviour
> >>> when the delimiter is null.  My standard library's strtok shares a crash
> >>> with your version if I run that particular test :)
> >>>
> >> I'd just match the whole string.
> >> In the olden days an empty string and the null character pointer
> >> were the same.
> >
> > Must've been _very_ olden days... even on the Speccy that's bollocks.
> 
> I searched... did you mean the ZX Spectrum architecture (discontinued
> in 1992)?

Yup, and more relevantly, started in 1982. First C compiler I can find,
1984. I've just tried it, and it uses all-bytes-zero as a null pointer
and duly prints the byte-code of the first interrupt call if you try to
print the string _at_ null. (It also correctly prints nothing if you
print an actual null string, of course.)

Richard


#84052

From: boon <root@localhost>
Date: 2016-03-15 22:04 +0100
Message-ID: <56e8788b$0$665$426a74cc@news.free.fr>
In reply to: #83920
On 03/14/2016 09:48 PM, Ian Collins wrote:
> On 03/15/16 08:13, boon wrote:
>> On 03/14/2016 03:31 AM, Ian Collins wrote:
>>> On 03/14/16 05:13, boon wrote:

[...]

>> Thank you Ian. I noticed you have not used local variables (except the
>> ones used as formal parameters and the static 'last' variable that saves
>> the "parsing context"). I guess this implementation has a chance of being
>> faster than mine.
>
> Gaining speed wasn't the intent, more of a case of style.  I often reuse
> parameter values in cases such as this.  If there is any potential
> performance boost it would come from having the equivalent of "strspn"
> in-line.

Is your style GNU style (the default style of the indent tool)?

>> Furthermore you added a check on 'delim' parameter. This is something I
>> have missed again.
>
> There's a gap in the standard there, it does not specify the behaviour
> when the delimiter is null.  My standard library's strtok shares a crash
> with your version if I run that particular test :)
>

Of course, but I am quite sure that strtok() (3) does not crash in such 
a case.


#84059

From: Eric Sosman <esosman@comcast-dot-net.invalid>
Date: 2016-03-15 18:18 -0400
Message-ID: <nca1hr$p3l$1@dont-email.me>
In reply to: #84052
On 3/15/2016 5:04 PM, boon wrote:
> On 03/14/2016 09:48 PM, Ian Collins wrote:
>> [...]
>> There's a gap in the standard there, it does not specify the behaviour
>> when the delimiter is null.  My standard library's strtok shares a crash
>> with your version if I run that particular test :)
>
> Of course, but I am quite sure that strtok() (3) does not crash in such
> a case.

     The C Standard leaves the behavior "undefined," meaning that
(1) different implementations can behave differently, (2) they
need not behave sensibly, and (3) they need not even document
how they'll behave.  The "gap" Ian refers to isn't an oversight,
but rather a refusal to place requirements on the consequences of
a programming error: Feed a function invalid arguments, and all
bets are off.

     Reference: C Standard, section 7.1.4, paragraph 1.
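
     Since the library is not required to check, code that cannot rule out a null delimiter has to check for itself before calling strtok().  A minimal sketch of such a caller-side guard follows; the wrapper name 'checked_strtok' is hypothetical, and returning NULL for a null delimiter is just one possible policy, not something the Standard prescribes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: refuses a null delimiter pointer instead of
 * handing it to strtok(), whose behavior would then be undefined.
 * It does not guard against other misuse, such as passing a null
 * 'str' on the very first call. */
char *checked_strtok(char *str, const char *delim)
{
    if (delim == NULL)
        return NULL;          /* policy choice: report "no token" */

    return strtok(str, delim);
}
```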

-- 
esosman@comcast-dot-net.invalid
"Don't be afraid of work. Make work afraid of you." -- TLM


#84342

From: boon <root@localhost>
Date: 2016-03-18 21:19 +0100
Message-ID: <56ec627d$0$19746$426a74cc@news.free.fr>
In reply to: #84059
On 03/15/2016 11:18 PM, Eric Sosman wrote:
> On 3/15/2016 5:04 PM, boon wrote:
>> On 03/14/2016 09:48 PM, Ian Collins wrote:
>>> [...]
>>> There's a gap in the standard there, it does not specify the behaviour
>>> when the delimiter is null.  My standard library's strtok shares a crash
>>> with your version if I run that particular test :)
>>
>> Of course, but I am quite sure that strtok() (3) does not crash in such
>> a case.
>
>      The C Standard leaves the behavior "undefined," meaning that
> (1) different implementations can behave differently, (2) they
> need not behave sensibly, and (3) they need not even document
> how they'll behave.  The "gap" Ian refers to isn't an oversight,
> but rather a refusal to place requirements on the consequences of
> a programming error: Feed a function invalid arguments, and all
> bets are off.
>
>      Reference: C Standard, section 7.1.4, paragraph 1.
>

Ok. Thank you for this clarification. The strtok in Ian's standard library
is not buggy; its behavior is simply undefined when a NULL pointer is
passed as the delimiter parameter.


#84053

From: boon <root@localhost>
Date: 2016-03-15 22:08 +0100
Message-ID: <56e87999$0$3054$426a74cc@news.free.fr>
In reply to: #83920
On 03/14/2016 09:48 PM, Ian Collins wrote:
> On 03/15/16 08:13, boon wrote:
>> On 03/14/2016 03:31 AM, Ian Collins wrote:
>>> On 03/14/16 05:13, boon wrote:

[...]

> There's a gap in the standard there, it does not specify the behaviour
> when the delimiter is null.  My standard library's strtok shares a crash
> with your version if I run that particular test :)
>
Well, I read too fast... which C library do you use that has a strtok()
implementation as buggy as mine? ;)


#84058

From: boon <root@localhost>
Date: 2016-03-15 22:32 +0100
Message-ID: <56e87f32$0$3328$426a74cc@news.free.fr>
In reply to: #83783
On 03/13/2016 05:13 PM, boon wrote:
> Hello,
>
> I am writing strtok() implementation, just for the fun and to improve my
> C coding style and skills.
>
> Here is my solution.
>
> char *my_strtok(char *str, const char *delim)
> {
>      char *ret, *s, *min_s;
>      const char *p;
>      static char *saveptr, *end;
>
>      if (str) {
>          end = str + strlen(str);
>          saveptr = str;
>      }
>
>      if (*saveptr == '\0') {
>          saveptr = end = NULL;
>          return NULL;
>      }
>
>      ret = saveptr;
>      min_s = end;
>
>      for (p = delim; *p != '\0'; p++) {
>          s = strchr(saveptr, *p);
>          if (s && s < min_s)
>              min_s = s;
>      }
>
>      if (min_s < end) {
>          *min_s++ = '\0';
>          saveptr = min_s;
>      } else {
>          saveptr = end;
>      }
>
>      return ret;
> }
>
> I have often difficulties in variable naming.
> Please do not hesitate to criticize this implementation.
>
> Regards.

In addition:

an strtok_r() implementation:


char *my_strtok_r(char *str, const char *delim, char **saveptr)
{
     char *ret;

     if (!delim || !saveptr)
         return NULL;

     if (str)
         *saveptr = str;

     *saveptr += strspn(*saveptr, delim);

     if (**saveptr == '\0')
         return NULL;

     ret = *saveptr;
     *saveptr += strcspn(*saveptr, delim);

     if (**saveptr == '\0')
         return ret;    /* last token: leave *saveptr at the terminator */

     *(*saveptr)++ = '\0';

     return ret;
}


This seems to be correct, but I have not written any unit tests. I should
have. I have found a nice unit-testing library for C (libcmocka) which seems
interesting. It uses the ld(1) --wrap option to implement wrapper functions
for the symbols to be mocked. It may be useful for my next C exercises,
but mocking is not required here for such small functions with no
interaction with the outside world.

https://lwn.net/Articles/558106/
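
Even without a test framework, a few checks can be written in plain C. The sketch below is only an illustration: it uses a lightly restructured variant of the function under the hypothetical name 'my_strtok_r' (it tests the pointed-to character and keeps the saved pointer at the terminator at end of input), plus a hypothetical helper 'tokens_match' that compares the produced tokens with an expected list. It does not use libcmocka.

```c
#include <stddef.h>
#include <string.h>

/* Variant of the strtok_r()-style function above (an assumption of
 * this sketch, not the code as posted). */
char *my_strtok_r(char *str, const char *delim, char **saveptr)
{
    char *ret;

    if (!delim || !saveptr)
        return NULL;
    if (str)
        *saveptr = str;
    if (!*saveptr)
        return NULL;                      /* nothing saved yet */

    ret = *saveptr + strspn(*saveptr, delim);
    *saveptr = ret + strcspn(ret, delim);

    if (**saveptr != '\0')
        *(*saveptr)++ = '\0';             /* terminate token, step past it */

    return *ret ? ret : NULL;
}

/* Returns 1 when tokenizing a copy of 'input' yields exactly the
 * 'count' strings in 'expected', in order; 0 otherwise. */
int tokens_match(const char *input, const char *delim,
                 const char *const *expected, size_t count)
{
    char buf[128];
    char *save = NULL;
    char *tok;
    size_t i = 0;

    if (strlen(input) >= sizeof buf)
        return 0;                         /* input too long for this sketch */
    strcpy(buf, input);

    for (tok = my_strtok_r(buf, delim, &save); tok != NULL;
         tok = my_strtok_r(NULL, delim, &save)) {
        if (i >= count || strcmp(tok, expected[i]) != 0)
            return 0;
        i++;
    }
    return i == count;
}
```

A test program could then assert, for instance, that tokens_match(",a,b,,c", ",", expected, 3) holds when 'expected' contains "a", "b" and "c".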
