Comparing inputs with source strings

Newsgroups perl.unicode
Subject Comparing inputs with source strings
Date 2016-05-09 16:53 +0200
Message-ID <87y47j1dwk.fsf@hati.baby-gnu.org> (permalink)
From daniel.dehennin@baby-gnu.org (Daniel Dehennin)


Hello,

I tried to make my Perl 5 code Unicode-compliant after reading a post on
Stack Overflow[1].

As suggested in the post:

    “always run incoming stuff through NFD and outbound stuff from NFC.”

I had a hard time figuring out why my Test::More test was failing while
displaying exactly the same strings for “got” and “expected”.
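
One way to make such an invisible difference visible is to dump the code
points of both strings. The following is only a minimal sketch: the
codepoints() helper is a hypothetical name, not part of Test::More, and the
sample word stands in for whatever the test actually compares.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: render a string as a list of U+XXXX code points.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

my $nfc = NFC("cha\x{EE}ne");   # î as a single precomposed code point
my $nfd = NFD($nfc);            # i followed by a combining circumflex

print codepoints($nfc), "\n";
print codepoints($nfd), "\n";
print "equal under eq: ", ($nfc eq $nfd ? "yes" : "no"), "\n";
```

Both strings render identically in a terminal, but the dump shows U+00EE in
one and U+0069 U+0302 in the other, which is why eq reports a difference.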

I finally checked how UTF-8 source files are handled and found that my
string literals are in NFC form. I ran the following script:

#+begin_src perl
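#!/usr/bin/env perl

use strict;
use utf8;
use warnings;

use Test::More;
use Unicode::Normalize qw(NFD NFC NFKD NFKC);

my $unistring = 'C’est une chaîne unicode';

# Map each form name to its function instead of calling by symbolic name.
my %normalize = (NFD => \&NFD, NFC => \&NFC, NFKD => \&NFKD, NFKC => \&NFKC);

for my $form (qw(NFD NFC NFKD NFKC)) {
	# A string is already in a given form if normalizing leaves it unchanged.
	if ($unistring eq $normalize{$form}->($unistring)) {
		print "UTF-8 source is in form '$form'\n";
	}
}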
#+end_src

and got:

#+begin_src
UTF-8 source is in form 'NFC'
UTF-8 source is in form 'NFKC'
#+end_src

So Test::More's is_deeply was comparing an input in NFD with an expected
string in NFC: the two display identically but differ code point by code
point.
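
A minimal sketch of one possible workaround in the test itself: bring both
sides to the same form before comparing. The is_normalized() helper is a
hypothetical name, and NFC is an arbitrary choice of common form (NFD would
work equally well).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Test::More tests => 2;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: compare two strings after bringing both to NFC.
sub is_normalized {
    my ($got, $expected, $name) = @_;
    # Report failures at the caller's line rather than inside this helper.
    local $Test::Builder::Level = $Test::Builder::Level + 1;
    return is(NFC($got), NFC($expected), $name);
}

my $expected = NFC("cha\x{EE}ne");   # stands in for the NFC source literal
my $got      = NFD($expected);       # simulates input normalized to NFD

isnt($got, $expected, 'raw strings differ code point by code point');
is_normalized($got, $expected, 'strings match once both are in NFC');
```

For is_deeply, every string leaf of the data structure would need the same
treatment before the comparison.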

In my own code I can compare strings with Unicode::Collate, but for all the
code I did not write, I wonder whether there is a clean way to handle this.
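
For reference, a minimal sketch of the Unicode::Collate approach, assuming a
collator with default settings (which normalizes its arguments before
comparing); the sample word is only an illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Unicode::Collate;
use Unicode::Normalize qw(NFC NFD);

my $collator = Unicode::Collate->new;

my $nfc = NFC("cha\x{EE}ne");   # precomposed î
my $nfd = NFD($nfc);            # i + combining circumflex

# Plain eq compares code points, so the two forms differ.
print "eq:       ", ($nfc eq $nfd ? "same" : "different"), "\n";

# The collator's eq method compares collation keys, so canonically
# equivalent strings are reported as equal.
print "collator: ", ($collator->eq($nfc, $nfd) ? "same" : "different"), "\n";
```

Note that a default collator answers a looser question than eq: it compares
collation keys rather than code points.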

Or maybe I'm doing something wrong?

Regards.

Footnotes: 
[1]  https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF
