Groups > perl.unicode > #192 > unrolled thread

Re: UTF-8 encoding & decoding

Started by	public@khwilliamson.com (Karl Williamson)
First post	2016-05-06 09:24 -0600
Last post	2016-05-15 05:05 +0200
Articles	2 — 2 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: UTF-8 encoding & decoding public@khwilliamson.com (Karl Williamson) - 2016-05-06 09:24 -0600
    Re: UTF-8 encoding & decoding pagaltzis@gmx.de (Aristotle Pagaltzis) - 2016-05-15 05:05 +0200

#192 — Re: UTF-8 encoding & decoding

From	public@khwilliamson.com (Karl Williamson)
Date	2016-05-06 09:24 -0600
Subject	Re: UTF-8 encoding & decoding
Message-ID	<572CB711.2060307@khwilliamson.com>

On 05/05/2016 08:37 AM, Pali Rohár wrote:
> Hi!
>
> I thought that I understood the UTF-8 encoding/decoding done in Perl
> until I looked into the source code of the Encode package
> (specifically sub encode_utf8).
>
> Before that, I had only read the description of the Encode package
> (not the source code):
> https://metacpan.org/pod/Encode#UTF-8-vs.-utf8-vs.-UTF8
>
> I tried to find more information (ideally something that answers my
> questions) but without success. Can you help me? My questions are:
>
> 1. What is the difference between these two calls?
>
>   utf8::encode($str);
>
> and
>
>   $str = Encode::encode('utf8', $str);
>
> 2. What is the difference between these?
>
>   utf8::decode($str);
>   $str = Encode::decode_utf8($str);

Each pair of functions is supposed to do essentially the same thing. I 
have not studied them to know what subtle differences there may be.
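
As a rough illustration of "essentially the same thing", here is a minimal sketch that runs both pairs on one sample string and compares the results. The sample string and variable names are made up for the example. The calling conventions differ: the utf8:: functions modify their argument in place, while the Encode functions return a new string.

```perl
use strict;
use warnings;
use Encode ();

my $text = "caf\x{e9}";    # 4 characters; the last one is U+00E9

# utf8::encode converts its argument to UTF-8 octets in place.
my $in_place = $text;
utf8::encode($in_place);

# Encode::encode leaves its argument alone and returns the octets.
my $returned = Encode::encode('utf8', $text);

printf "same octets: %s\n", ($in_place eq $returned ? 'yes' : 'no');
printf "length in octets: %d\n", length $in_place;

# utf8::decode also works in place; its return value reports success.
my $ok = utf8::decode($in_place);

# Encode::decode_utf8 returns the decoded character string.
my $decoded = Encode::decode_utf8($returned);

printf "round trip ok: %s\n",
    ($ok && $in_place eq $text && $decoded eq $text ? 'yes' : 'no');
```

This only checks well-formed input; it says nothing about the subtle differences on malformed or unusual data.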
>
> 3. Where is the implementation of the utf8::encode/decode functions? It
> is not in utf8.pm, nor in utf8_heavy.pl, and also not in
> unicore/Heavy.pl. And what do those functions do?

The implementation is in universal.c.  But these are just wrappers for
sv_utf8_encode and sv_utf8_decode, which are implemented in sv.c.  Their
documentation is in perlapi.  It should match the documentation of
utf8::decode and utf8::encode, whose documentation is in utf8.pm.  (I
myself have a hard time mapping the names chosen for these operations
to what they actually do.)
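
Because the functions live in the core rather than in a module, they can be called without loading utf8.pm or Encode. The following small sketch (the sample character is arbitrary) shows that, and also what each name does to a string, which may help with the naming: encode goes from characters to UTF-8 octets, decode goes from UTF-8 octets back to characters.

```perl
use strict;
use warnings;

# Neither "use utf8" nor "use Encode" appears here; the core defines these.
print "utf8::encode defined: ", (defined &utf8::encode ? "yes" : "no"), "\n";
print "utf8::decode defined: ", (defined &utf8::decode ? "yes" : "no"), "\n";

my $s = "\x{263A}";        # a single character, U+263A
my $chars = length $s;

utf8::encode($s);          # characters -> UTF-8 octets, in place
my $octets = length $s;

utf8::decode($s);          # UTF-8 octets -> characters, in place
my $back = length $s;

print "$chars character -> $octets octets -> $back character\n";
```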
>

#199

From	pagaltzis@gmx.de (Aristotle Pagaltzis)
Date	2016-05-15 05:05 +0200
Message-ID	<20160515030528.GA57966@plasmasturm.org>
In reply to	#192

* Pali Rohár <pali.rohar@gmail.com> [2016-05-12 20:23]:
> If both functions should do the same thing, why do we have the duplication?

Encode.pm is big and fairly slow, because it handles a zillion encodings
and has lots of options for handling invalid input data. Perl needs only
UTF-8 transcoding and needs it fast, so it has code for just that. Since
that code is there anyway, it can just as well be exposed to Perl space.

> And which one is preferred to use?

Well, either you need Encode.pm or you don’t. The built-ins are faster
and always loaded, but they only do UTF-8 and if you have invalid data
then all you get is a false return value and no other help. If you need
anything else you pay the memory and take the speed hit of Encode.pm.
(If you are working on a large application, chances are high that you
have Encode.pm loaded anyway.)
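
To make the difference in error handling concrete, here is a minimal sketch that feeds the same malformed octets to both. The input string is invented for the example; the 'UTF-8' encoding name and the FB_CROAK and LEAVE_SRC constants come from Encode's documented interface.

```perl
use strict;
use warnings;
use Encode ();

my $bad = "abc\xFF";    # the octet 0xFF never occurs in well-formed UTF-8

# Built-in: the false return value is the only signal of trouble.
my $copy = $bad;
my $ok   = utf8::decode($copy);
print "utf8::decode returned ", ($ok ? "true" : "false"), "\n";

# Encode with the default CHECK: malformed input is replaced by U+FFFD.
my $lenient = Encode::decode('UTF-8', $bad);
printf "last code point: U+%04X\n", ord substr($lenient, -1);

# Encode with FB_CROAK: malformed input raises an exception instead.
# LEAVE_SRC keeps Encode from overwriting $bad.
my $strict = eval {
    Encode::decode('UTF-8', $bad,
        Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
print "FB_CROAK died: ", (defined $strict ? "no" : "yes"), "\n";
```

Encode offers further CHECK modes beyond these two; the sketch only shows that the choice exists on the Encode side and not on the built-in side.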

Regards,
-- 
Aristotle Pagaltzis // <http://plasmasturm.org/>
