Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.c > #234882 > unrolled thread

Re: portable way to get highest bit set?

Started byAnton Shepelev <anton.txt@g{oogle}mail.com>
First post2023-10-12 11:11 +0300
Last post2023-10-12 18:31 +0300
Articles 11 — 4 participants

Back to article view | Back to comp.lang.c

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: portable way to get highest bit set? Anton Shepelev <anton.txt@g{oogle}mail.com> - 2023-10-12 11:11 +0300
    Re: portable way to get highest bit set? Michael S <already5chosen@yahoo.com> - 2023-10-12 02:27 -0700
    Re: portable way to get highest bit set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-12 02:41 -0700
      Re: portable way to get highest bit set? Anton Shepelev <anton.txt@g{oogle}mail.com> - 2023-10-12 14:17 +0300
        Re: portable way to get highest bit set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-12 07:01 -0700
          Re: portable way to get highest bit set? scott@slp53.sl.home (Scott Lurndal) - 2023-10-12 14:18 +0000
            Re: portable way to get highest bit set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-12 07:58 -0700
          Re: portable way to get highest bit set? Michael S <already5chosen@yahoo.com> - 2023-10-12 17:30 +0300
            Re: portable way to get highest bit set? Tim Rentsch <tr.17687@z991.linuxsc.com> - 2023-10-12 08:05 -0700
              Re: portable way to get highest bit set? Michael S <already5chosen@yahoo.com> - 2023-10-12 19:16 +0300
          Re: portable way to get highest bit set? Anton Shepelev <anton.txt@g{oogle}mail.com> - 2023-10-12 18:31 +0300

#234882 — Re: portable way to get highest bit set?

FromAnton Shepelev <anton.txt@g{oogle}mail.com>
Date2023-10-12 11:11 +0300
SubjectRe: portable way to get highest bit set?
Message-ID<20231012111100.272c96b3209baad26a150e55@g{oogle}mail.com>
Tim Rentsch:

> I've coded up a dozen[*] different solutions.
> Here is my current avorite[**]:
>
>     typedef union {
>         long double d;
>         struct { unsigned long long s; unsigned e:15;} b;
>     } X;
>
>     unsigned long long
>     high_bit_set( unsigned long long u ){
>         return  1ULL << (X){ .d = u }.b.e - 16383  & u;
>     }

And mine is quite anticlimactic.  It accepts a byte, because
I wrote it yesterday, before learning that the OP needs a
more generic solution:

   unsigned char highest_bit( unsigned char byte )
   {    int           i            ;
        unsigned char mask = 1 << 7;
        for( i = 0; i <= 8; i += 1 )
        {    if( byte & mask )
             {    break;  }
             mask >>= 1;
        }
        return mask;
   }

It is the most straight-forward method that I could think
of, and extensible to larger sizes of the variable.

-- 
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments

[toc] | [next] | [standalone]


#235911

FromMichael S <already5chosen@yahoo.com>
Date2023-10-12 02:27 -0700
Message-ID<1d76464c-1f6d-423d-ab97-ea24af85558cn@googlegroups.com>
In reply to#234882
On Thursday, October 12, 2023 at 11:11:16 AM UTC+3, Anton Shepelev wrote:
> Tim Rentsch:
> > I've coded up a dozen[*] different solutions.
> > Here is my current avorite[**]:
> > 
> > typedef union { 
> > long double d; 
> > struct { unsigned long long s; unsigned e:15;} b; 
> > } X; 
> > 
> > unsigned long long 
> > high_bit_set( unsigned long long u ){ 
> > return 1ULL << (X){ .d = u }.b.e - 16383 & u; 
> > }
> And mine is quite anticlimactic. It accepts a byte, because 
> I wrote it yesterday, before learning that the OP needs a 
> more generic solution: 
> 
> unsigned char highest_bit( unsigned char byte ) 
> { int i ; 
> unsigned char mask = 1 << 7; 
> for( i = 0; i <= 8; i += 1 ) 
> { if( byte & mask ) 
> { break; } 
> mask >>= 1; 
> } 
> return mask; 
> } 
> 
> It is the most straight-forward method that I could think 
> of, and extensible to larger sizes of the variable.
> -- 

There exist more straight-forward method that you could think of 
if you try to think a little longer. It is suggested by IanC near beginning
of this thread starting from "If it was me I'd do it in reverse".


> () ascii ribbon campaign -- against html e-mail 
> /\ www.asciiribbon.org -- against proprietary attachments

[toc] | [prev] | [next] | [standalone]


#236095

FromTim Rentsch <tr.17687@z991.linuxsc.com>
Date2023-10-12 02:41 -0700
Message-ID<86cyxkb2ka.fsf@linuxsc.com>
In reply to#234882
Anton Shepelev <anton.txt@g{oogle}mail.com> writes:

> Tim Rentsch:
>
>> I've coded up a dozen[*] different solutions.
>> Here is my current avorite[**]:
>>
>>     typedef union {
>>         long double d;
>>         struct { unsigned long long s; unsigned e:15;} b;
>>     } X;
>>
>>     unsigned long long
>>     high_bit_set( unsigned long long u ){
>>         return  1ULL << (X){ .d = u }.b.e - 16383  & u;
>>     }
>
> And mine is quite anticlimactic.  It accepts a byte, because
> I wrote it yesterday, before learning that the OP needs a
> more generic solution:
>
>    unsigned char highest_bit( unsigned char byte )
>    {    int           i            ;
>         unsigned char mask = 1 << 7;
>         for( i = 0; i <= 8; i += 1 )
>         {    if( byte & mask )
>              {    break;  }
>              mask >>= 1;
>         }
>         return mask;
>    }
>
> It is the most straight-forward method that I could think
> of, and extensible to larger sizes of the variable.

But the approach you are using requires knowing the size
of the type involved.  The challenge is to write code
that works without having to know the size in advance.
So something like the following, with type T being an
unknown unsigned integer type:

    T
    highest_bit_set( T u ){
        T r;
        /* set r to the highest bit set in u */
        return  r;
    }

I have written code for this form of the problem, but
won't be posting it before the weekend.  Try it!

P.S. My earlier "solution" was just for fun, not a serious
attempt to solve the problem.

[toc] | [prev] | [next] | [standalone]


#236929

FromAnton Shepelev <anton.txt@g{oogle}mail.com>
Date2023-10-12 14:17 +0300
Message-ID<20231012141719.99f5a10ec921db3ee6f7d948@g{oogle}mail.com>
In reply to#236095
Tim Rentsch:

> But the approach you are using requires knowing the size
> of the type involved.  The challenge is to write code that
> works without having to know the size in advance.

Well, I could use macros, or the largest integer type, or:
#include <stddef.h>
#include <limits.h>

char is_le()
{  const int   v = 1;
   const char* p = (unsigned char*)&v;
   return *p == 0;
}

unsigned char highest_bit_1( unsigned char byte )
{  int           i            ;
   unsigned char mask = 1 << (CHAR_BIT-1);
   for( i = 0; i <= CHAR_BIT; i += 1 )
   {  if( byte & mask )
      {  break;  }
      mask >>= 1;
   }
   return mask;
}

unsigned long long highest_bit_2( void* val, size_t sz )
{  int            i,f,inc;
   unsigned char *p = val;
   unsigned char        c;
   char           le = is_le();

   if( le )
   {  inc =  1;            } else
   {  inc = -1; p += sz-1; }

   f = sz;
   for( i = 0; i < sz; i += 1 )
   {  if( (c = *p) == 0 )
      {  p += inc;  }
      else
      {  f = i;
         break;
      }
   }
   return highest_bit_1( c ) << (CHAR_BIT*(sz-1-f));
}


-- 
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments

[toc] | [prev] | [next] | [standalone]


#237611

FromTim Rentsch <tr.17687@z991.linuxsc.com>
Date2023-10-12 07:01 -0700
Message-ID<864jiwaqic.fsf@linuxsc.com>
In reply to#236929
Anton Shepelev <anton.txt@g{oogle}mail.com> writes:

> Tim Rentsch:
>
>> But the approach you are using requires knowing the size
>> of the type involved.  The challenge is to write code that
>> works without having to know the size in advance.
>
> Well, I could use macros, or the largest integer type, or:

Neither necessary nor sufficient.  The code should work for
the type T being __uint128_t, which is larger than uintmax_t.

> #include <stddef.h>
> #include <limits.h>
>
> [..examine value character-by-character, including
>    determining if little-endian encoding is used..]

You're making things more complicated than they need to be.
Looking at character values rather than just values of the type
involved is almost guaranteed to have potential representation
issues.  For example, little endian and big endian are not the
only choices possible for endianness.

Below is a proposed solution that mostly works.  What problems
does it have, and how can they be fixed?  And can we find
something better?

    /* The type T has been supplied somewhere upstream.  It */
    /* is an unsigned type, and possibly outside the realm  */
    /* of 'integer types' as defined by the C standard, as  */
    /* for example __uint128_t.                             */

    T
    highest_bit_set( T u ){
        T r = 1;
        while(  u>>1 > r  )  r <<= 1;
        return  r;
    }

[toc] | [prev] | [next] | [standalone]


#237682

Fromscott@slp53.sl.home (Scott Lurndal)
Date2023-10-12 14:18 +0000
Message-ID<ffTVM.60121$8fO.5299@fx15.iad>
In reply to#237611
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
>
>> Tim Rentsch:
>>
>>> But the approach you are using requires knowing the size
>>> of the type involved.  The challenge is to write code that
>>> works without having to know the size in advance.
>>
>> Well, I could use macros, or the largest integer type, or:
>
>Neither necessary nor sufficient.  The code should work for
>the type T being __uint128_t, which is larger than uintmax_t.
>
>> #include <stddef.h>
>> #include <limits.h>
>>
>> [..examine value character-by-character, including
>>    determining if little-endian encoding is used..]
>
>You're making things more complicated than they need to be.
>Looking at character values rather than just values of the type
>involved is almost guaranteed to have potential representation
>issues.  For example, little endian and big endian are not the
>only choices possible for endianness.
>
>Below is a proposed solution that mostly works.  What problems
>does it have, and how can they be fixed?  And can we find
>something better?
>
>    /* The type T has been supplied somewhere upstream.  It */
>    /* is an unsigned type, and possibly outside the realm  */
>    /* of 'integer types' as defined by the C standard, as  */
>    /* for example __uint128_t.                             */
>
>    T
>    highest_bit_set( T u ){
>        T r = 1;
>        while(  u>>1 > r  )  r <<= 1;
>        return  r;
>    }

Given a reasonable compiler (gcc or clang), I'd use

/**
 * Return the 1-relative bit number of the
 * highest set bit.   If no bits are set,
 * return zero.
 */
size_t
highest_bit_set(unsigned long long u)
{
    return u ? 64ul - __builtin_clzll(u) : 0ul;
}

[toc] | [prev] | [next] | [standalone]


#237840

FromTim Rentsch <tr.17687@z991.linuxsc.com>
Date2023-10-12 07:58 -0700
Message-ID<86zg0nanw5.fsf@linuxsc.com>
In reply to#237682
scott@slp53.sl.home (Scott Lurndal) writes:

> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
>>
>>> Tim Rentsch:
>>>
>>>> But the approach you are using requires knowing the size
>>>> of the type involved.  The challenge is to write code that
>>>> works without having to know the size in advance.
>>>
>>> Well, I could use macros, or the largest integer type, or:
>>
>> Neither necessary nor sufficient.  The code should work for
>> the type T being __uint128_t, which is larger than uintmax_t.
>>
>>> #include <stddef.h>
>>> #include <limits.h>
>>>
>>> [..examine value character-by-character, including
>>>    determining if little-endian encoding is used..]
>>
>> You're making things more complicated than they need to be.
>> Looking at character values rather than just values of the type
>> involved is almost guaranteed to have potential representation
>> issues.  For example, little endian and big endian are not the
>> only choices possible for endianness.
>>
>> Below is a proposed solution that mostly works.  What problems
>> does it have, and how can they be fixed?  And can we find
>> something better?
>>
>>    /* The type T has been supplied somewhere upstream.  It */
>>    /* is an unsigned type, and possibly outside the realm  */
>>    /* of 'integer types' as defined by the C standard, as  */
>>    /* for example __uint128_t.                             */
>>
>>    T
>>    highest_bit_set( T u ){
>>        T r = 1;
>>        while(  u>>1 > r  )  r <<= 1;
>>        return  r;
>>    }
>
> Given a reasonable compiler (gcc or clang), I'd use
>
> /**
>  * Return the 1-relative bit number of the
>  * highest set bit.   If no bits are set,
>  * return zero.
>  */
> size_t
> highest_bit_set(unsigned long long u)
> {
>     return u ? 64ul - __builtin_clzll(u) : 0ul;
> }

That gives a solution to a different problem.

When you have code that solves the problem being
discussed, I'm interested to see it.

[toc] | [prev] | [next] | [standalone]


#237719

FromMichael S <already5chosen@yahoo.com>
Date2023-10-12 17:30 +0300
Message-ID<20231012173021.0000149c@yahoo.com>
In reply to#237611
On Thu, 12 Oct 2023 07:01:47 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
> 
> > Tim Rentsch:
> >
> >> But the approach you are using requires knowing the size
> >> of the type involved.  The challenge is to write code that
> >> works without having to know the size in advance.
> >
> > Well, I could use macros, or the largest integer type, or:
> 
> Neither necessary nor sufficient.  The code should work for
> the type T being __uint128_t, which is larger than uintmax_t.
> 
> > #include <stddef.h>
> > #include <limits.h>
> >
> > [..examine value character-by-character, including
> >    determining if little-endian encoding is used..]
> 
> You're making things more complicated than they need to be.
> Looking at character values rather than just values of the type
> involved is almost guaranteed to have potential representation
> issues.  For example, little endian and big endian are not the
> only choices possible for endianness.
> 
> Below is a proposed solution that mostly works.  What problems
> does it have, and how can they be fixed?  And can we find
> something better?
> 
>     /* The type T has been supplied somewhere upstream.  It */
>     /* is an unsigned type, and possibly outside the realm  */
>     /* of 'integer types' as defined by the C standard, as  */
>     /* for example __uint128_t.                             */
> 
>     T
>     highest_bit_set( T u ){
>         T r = 1;
>         while(  u>>1 > r  )  r <<= 1;
>         return  r;
>     }

That code does not look like working.
That is:

T
highest_bit_set1( T u ){
  T r = 0;
  while (u > r) r = r*2+1;
  return r ^ (r>>1);
}

T
highest_bit_set1( T u ){
  T r = 0;
  while( u > r) r = r*2+1;
  return r ^ (r>>1);
}

[toc] | [prev] | [next] | [standalone]


#237875

FromTim Rentsch <tr.17687@z991.linuxsc.com>
Date2023-10-12 08:05 -0700
Message-ID<86v8bbanjv.fsf@linuxsc.com>
In reply to#237719
Michael S <already5chosen@yahoo.com> writes:

> On Thu, 12 Oct 2023 07:01:47 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
>>
>>> Tim Rentsch:
>>>
>>>> But the approach you are using requires knowing the size
>>>> of the type involved.  The challenge is to write code that
>>>> works without having to know the size in advance.
>>>
>>> Well, I could use macros, or the largest integer type, or:
>>
>> Neither necessary nor sufficient.  The code should work for
>> the type T being __uint128_t, which is larger than uintmax_t.
>>
>>> #include <stddef.h>
>>> #include <limits.h>
>>>
>>> [..examine value character-by-character, including
>>>    determining if little-endian encoding is used..]
>>
>> You're making things more complicated than they need to be.
>> Looking at character values rather than just values of the type
>> involved is almost guaranteed to have potential representation
>> issues.  For example, little endian and big endian are not the
>> only choices possible for endianness.
>>
>> Below is a proposed solution that mostly works.  What problems
>> does it have, and how can they be fixed?  And can we find
>> something better?
>>
>>     /* The type T has been supplied somewhere upstream.  It */
>>     /* is an unsigned type, and possibly outside the realm  */
>>     /* of 'integer types' as defined by the C standard, as  */
>>     /* for example __uint128_t.                             */
>>
>>     T
>>     highest_bit_set( T u ){
>>         T r = 1;
>>         while(  u>>1 > r  )  r <<= 1;
>>         return  r;
>>     }
>
> That code does not look like working.

Yes, deliberately so.  I didn't want to give working
code yet, so other people could work on the problem.

> That is:
>
> T
> highest_bit_set1( T u ){
>   T r = 0;
>   while (u > r) r = r*2+1;
>   return r ^ (r>>1);
> }

As expected, this works.  I like the way it naturally
accommodates a zero input.

Can you find a different solution that works in
logarithmic time rather than linear time?

[toc] | [prev] | [next] | [standalone]


#238355

FromMichael S <already5chosen@yahoo.com>
Date2023-10-12 19:16 +0300
Message-ID<20231012191641.00007493@yahoo.com>
In reply to#237875
On Thu, 12 Oct 2023 08:05:40 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Michael S <already5chosen@yahoo.com> writes:
> 
> > On Thu, 12 Oct 2023 07:01:47 -0700
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >
> >> Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
> >>
> >>> Tim Rentsch:
> >>>
> >>>> But the approach you are using requires knowing the size
> >>>> of the type involved.  The challenge is to write code that
> >>>> works without having to know the size in advance.
> >>>
> >>> Well, I could use macros, or the largest integer type, or:
> >>
> >> Neither necessary nor sufficient.  The code should work for
> >> the type T being __uint128_t, which is larger than uintmax_t.
> >>
> >>> #include <stddef.h>
> >>> #include <limits.h>
> >>>
> >>> [..examine value character-by-character, including
> >>>    determining if little-endian encoding is used..]
> >>
> >> You're making things more complicated than they need to be.
> >> Looking at character values rather than just values of the type
> >> involved is almost guaranteed to have potential representation
> >> issues.  For example, little endian and big endian are not the
> >> only choices possible for endianness.
> >>
> >> Below is a proposed solution that mostly works.  What problems
> >> does it have, and how can they be fixed?  And can we find
> >> something better?
> >>
> >>     /* The type T has been supplied somewhere upstream.  It */
> >>     /* is an unsigned type, and possibly outside the realm  */
> >>     /* of 'integer types' as defined by the C standard, as  */
> >>     /* for example __uint128_t.                             */
> >>
> >>     T
> >>     highest_bit_set( T u ){
> >>         T r = 1;
> >>         while(  u>>1 > r  )  r <<= 1;
> >>         return  r;
> >>     }
> >
> > That code does not look like working.
> 
> Yes, deliberately so.  I didn't want to give working
> code yet, so other people could work on the problem.
> 
> > That is:
> >
> > T
> > highest_bit_set1( T u ){
> >   T r = 0;
> >   while (u > r) r = r*2+1;
> >   return r ^ (r>>1);
> > }
> 
> As expected, this works.  I like the way it naturally
> accommodates a zero input.
> 
> Can you find a different solution that works in
> logarithmic time rather than linear time?

Naturally, I can. But the code is not as easy to comprehend as I would
like. And I am not sure that for typical distributions with prevalence
of small numbers it is faster than above variant.


T
highest_bit_set2( T u ){
  T v;
  for (int rshift = 0; (v = (u >> rshift) >> 1) != 0; rshift = rshift*2+1)
    u |= v;
  return u ^ (u>>1);
}

[toc] | [prev] | [next] | [standalone]


#237990

FromAnton Shepelev <anton.txt@g{oogle}mail.com>
Date2023-10-12 18:31 +0300
Message-ID<20231012183135.a6faa3027ccac0820b24ad96@g{oogle}mail.com>
In reply to#237611
Tim Rentsch:

> The code should work for the type T being __uint128_t,
> which is larger than uintmax_t.

I was in my C89 mode:

#include <stdio.h>
#include <stddef.h>
#include <limits.h>

char is_le()
{  const int   v = 1;
   const char* p = (unsigned char*)&v;
   return *p == 0;
}

unsigned char highest_bit_1( unsigned char byte )
{  int           i            ;
   unsigned char mask = 1 << (CHAR_BIT-1);
   for( i = 0; i <= CHAR_BIT; i += 1 )
   {  if( byte & mask )
      {  break;  }
      mask >>= 1;
   }
   return mask;
}

/* sz must be the size of the value /and/ of the buffer: */
void highest_bit_2( void* val, void* buffer, size_t sz )
{  int            i,inc;
   unsigned char *pval   = val;
   unsigned char *pbuf = buffer;
   unsigned char        c     ;
   char           le  = is_le();
   char           set;

   if( le )
   {  inc =  1;                             } else
   {  inc = -1; pval += sz-1; pbuf += sz-1; }

   set = 0;
   for( i = 0; i < sz; i += 1 )
   {  c = *pval  ;
      if( !set && c != 0 )
      {  *pbuf = highest_bit_1( c );
         set = 1;
      }
      else *pbuf = 0;
      pval += inc; pbuf += inc;
   }
}

void test( long int val )
{  char     buf[sizeof(long int)];

   highest_bit_2(&val, &buf, sizeof( long int ));
   printf("Value: %9x, High-bit mask: %9x\n", val, *(long unsigned int *)&buf);
}


int main(void)
{  test(0x0); test(0x1); test(0x2); test(0x3); test(0x4); test(0x5);

   test((1<<17)-1);  test((1<<17));    test((1<<17)+1);
}

...printf("%x"), anybody?

-- 
()  ascii ribbon campaign -- against html e-mail
/\  www.asciiribbon.org   -- against proprietary attachments

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.c


csiph-web