


Re: Unicode...

From Philipp Klaus Krause <pkk@spth.de>
Newsgroups comp.lang.c
Subject Re: Unicode...
Date 2025-11-23 12:42 +0100
Message-ID <10furus$hl6g$1@solani.org>
References <10f85f9$33pck$1@dont-email.me>



On 14.11.25 at 22:03, Michael Sanders wrote:
> static int utf8_width(const char *s) {
>      int w = 0;
>      const unsigned char *p = (const unsigned char *)s;
> 
>      while (*p) {
>          if (*p < 0x80) { w++; p++; } // ASCII 1-byte
>          else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
>          else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
>          else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
>          else { w++; p++; } // fallback
>      }
> 
>      return w;
> }
Do you need this to work under non-UTF-8 locales? If you only need the
length when the locale is UTF-8, why not just use mblen() from <stdlib.h>?

Philipp



