| From | Philipp Klaus Krause <pkk@spth.de> |
|---|---|
| Newsgroups | comp.lang.c |
| Subject | Re: Unicode... |
| Date | 2025-11-23 12:42 +0100 |
| Message-ID | <10furus$hl6g$1@solani.org> |
| References | <10f85f9$33pck$1@dont-email.me> |
On 14.11.25 at 22:03, Michael Sanders wrote:
> static int utf8_width(const char *s) {
>     int w = 0;
>     const unsigned char *p = (const unsigned char *)s;
>
>     while (*p) {
>         if (*p < 0x80) { w++; p++; }                   // ASCII 1-byte
>         else if ((*p & 0xE0) == 0xC0) { w++; p += 2; } // 2-byte UTF-8
>         else if ((*p & 0xF0) == 0xE0) { w++; p += 3; } // 3-byte UTF-8
>         else if ((*p & 0xF8) == 0xF0) { w++; p += 4; } // 4-byte UTF-8
>         else { w++; p++; }                             // fallback
>     }
>
>     return w;
> }
Do you need this to work under non-UTF-8 locales? If you only need that
length when the locale is UTF-8, why not just use mblen from stdlib.h?
Philipp
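
For readers following along, here is a minimal sketch of what the mblen suggestion might look like. It is not code from the thread: the name mb_count is hypothetical, and the choice to count an undecodable byte as one character is an assumption made to mirror the fallback branch of the quoted function. Like the quoted function, it counts characters, not display columns.

```c
#include <assert.h>
#include <locale.h>   /* for setlocale() in the caller */
#include <stdlib.h>   /* mblen */
#include <string.h>   /* strlen */

/* Hypothetical helper: count the multibyte characters in s according to
 * the LC_CTYPE category of the current locale. */
static int mb_count(const char *s) {
    int count = 0;
    size_t remaining;

    assert(s != NULL);
    remaining = strlen(s);

    (void)mblen(NULL, 0);  /* reset mblen's internal conversion state */

    while (remaining > 0) {
        /* mblen returns the byte length of the next character,
         * or -1 if the bytes do not form a valid character. */
        int n = mblen(s, remaining);
        if (n <= 0)
            n = 1;  /* assumption: count an invalid byte as one character */
        s += n;
        remaining -= (size_t)n;
        count++;
    }
    return count;
}
```

The caller would need to select a UTF-8 locale first, for example with setlocale(LC_ALL, ""), since mblen decodes according to the current locale. Passing the remaining byte count to mblen keeps it from reading past the terminating null on a truncated sequence, which the fixed p += 2/3/4 steps in the quoted code do not guard against.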