From: Keith Thompson
Newsgroups: comp.lang.c
Subject: Re: Testing nodes and lists for hashtable
Date: Mon, 28 Mar 2016 12:37:11 -0700
References: <31d527f2-3806-4195-a754-38bea32b5d79@googlegroups.com>

Alla _ writes:
> Please, take a look at the explanation on the apostrophe issue
> from the member of the course's staff, who didn't believe me initially:
> [quote]
> It's always fun when someone finds a really unique problem! Because you
> were so adamant that there was a problem, I went back and used your test
> data to prove there wasn't. I was surprised when it did what you said.
>
> Well, there's sort of a problem. In this case, the problem is the input data
> and internationalization. As I said, the code is well tested and handles
> apostrophes correctly. The problem here is that " John's " doesn't contain an
> apostrophe. It appears to be something else, a UTF-8 right single quotation
> mark. When I copied it and ran it through a converter, it returned a digital
> value of 8217 or hex 0x2019.
>
> Speller is not set up to handle internationalized character codes, UTF-8, or
> anything similar, only standard ASCII characters. For starters, UTF-8
> characters are too large for standard char storage.
> This particular symbol doesn't appear to be an alpha, a digit or an
> apostrophe to speller, so it treats it as an end of word, thus causing the
> behavior you are seeing. (I'm thinking that there are certain ASCII chars
> that will cause the same behavior, but you can test for that if you wish.)
>
> The short fix is to use an actual apostrophe, " ' ", ASCII 39, or the single
> quote key next to the enter key on US keyboards and not " ' ".[/quote]
>
> When I claimed that I use an English keyboard, and I do always use the standard
> apostrophe (the one that is on the same button as the quotation mark), he
> said that:
> [quote]
> Don't know what's going on with your computer, but I'd suspect that it's quite
> probably configured to use UTF-8 encoding from your keyboard, or it's taking
> whatever your keyboard uses locally and mapping to UTF-8.
> [/quote]
> I have to learn more about this issue, it's interesting )

There are no non-ASCII characters in the above quoted article. All the
"'" characters are ASCII apostrophes, '\x27'.

Getting a Unicode RIGHT SINGLE QUOTATION MARK, 0x2019, in copied text is
a common problem. It can easily occur when copying from a web page, or
from a document created by some word processor such as MS Word. Such
programs will sometimes automatically replace single and double quotes
by their left and right equivalents as you type. Hyphens and dashes can
also be an issue.

To see if there's really an issue with your keyboard, assuming you're on
a Unix/Linux-like system:

    $ echo "'" | od -c
    0000000   '  \n
    0000002
    $

Type the command, don't copy-and-paste it from this article. Also try
inserting a "'" character into a text file using your preferred text
editor, saving the file, and running "od -c" on the file.

But I'm skeptical that it's a keyboard issue. If your keyboard were
generating something other than an ASCII apostrophe when you type "'",
you wouldn't be able to use C character constants.
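
The same check can be made from C rather than with od. What follows is
only a minimal sketch: the helper names find_non_ascii and
find_right_single_quote are made up for this illustration and are not
part of speller. It assumes the input is UTF-8, in which U+2019 is
encoded as the three bytes 0xE2 0x80 0x99, and an ASCII-based execution
character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from speller): return the offset of the first
   byte outside the ASCII range (greater than 0x7F) in s, or -1 if every
   byte is ASCII. */
long find_non_ascii(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        /* Convert to unsigned char so the comparison also works where
           plain char is signed. */
        if ((unsigned char)s[i] > 0x7F)
            return (long)i;
    }
    return -1L;
}

/* Hypothetical helper: return the offset of the first UTF-8 encoded
   U+2019 RIGHT SINGLE QUOTATION MARK in s, or -1 if there is none.
   Assumes s is UTF-8. */
long find_right_single_quote(const char *s)
{
    static const char rsq[] = "\xE2\x80\x99";  /* UTF-8 bytes of U+2019 */
    const char *p;

    assert(s != NULL);
    p = strstr(s, rsq);
    return p != NULL ? (long)(p - s) : -1L;
}
```

Called on a word typed with a plain apostrophe, both helpers should
return -1. On text pasted with a U+2019 in it, they return the offset of
the 0xE2 byte, which is neither a letter nor '\'' to a byte-at-a-time
scanner.
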
It's more likely, I think, that you copied the string "John's" (or
rather "John’s") from a file or web page.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org
Working, but not speaking, for JetHead Development, Inc.
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"