Re: [gawk] Handling variants of CSV input data formats

From Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups comp.lang.awk
Subject Re: [gawk] Handling variants of CSV input data formats
Date 2024-08-27 03:39 +0200
Organization A noiseless patient Spider
Message-ID <vajant$2m8em$1@dont-email.me>
References <vaeh9m$1pfge$1@dont-email.me> <vahop1$2eavu$1@dont-email.me> <vahttd$2f666$1@dont-email.me> <vaj7ps$2lph3$1@dont-email.me>


On 27.08.2024 02:49, Ed Morton wrote:
> On 8/26/2024 7:54 AM, Janis Papanagnou wrote:
>> <snip>
>> I'd have liked to provide more concrete information here, but I'm at
>> the moment even unable to reproduce Awk's behavior as documented in
>> its manual; I've tried the following command with various locales
>>
>> $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
>> -| 5,321
>>
>> but always got just  5  as result.
> 
> You need to specifically TELL gawk to use your locale to read input
> numbers:
> 
> $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
> 5
> 
> $ echo 4,321 | POSIXLY_CORRECT=1 LC_ALL=en_DK.utf-8 gawk '{ print $1 + 1 }'
> 5,321
> 
> $ echo 4,321 | LC_ALL=en_DK.utf-8 gawk -N '{ print $1 + 1 }'
> 5,321
> 
> See
> https://www.gnu.org/software/gawk/manual/gawk.html#Locale-influences-conversions
> for more info on that.

Thanks. That's actually where I got the above example from.

I missed that there was an explicit
$ export POSIXLY_CORRECT=1
set at the very top of these examples. Gee!

It feels strange anyway that an explicit LC_* setting is ineffective
without the additional POSIXLY_CORRECT variable. And the page also
says: "The POSIX standard says that awk always uses the period as
the decimal point when reading the awk program source code".
So despite POSIX saying that, you have to use a variable named
POSIXLY_CORRECT. - Do I need some more coffee to understand that?

And I see there's an additional GNU Awk option '--use-lc-numeric'.
What a mess!

(I suppose the current status can only be explained by the mentioned
back-and-forth in the history of the various GNU Awk versions.)

What are the LC_* variables worth if they are ignored (or maybe not)?
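
For input that is known to use a decimal comma, one way to sidestep the
locale question entirely is to normalize the field before doing any
arithmetic. The following is only a minimal sketch, not something taken
from the manual: the function name add_one_decimal_comma is hypothetical,
and it assumes the field contains a single decimal comma and no thousands
separators.

```shell
# Hypothetical helper: add 1 to a number written with a decimal comma.
# LC_ALL=C pins the output to a period as decimal point.
add_one_decimal_comma() {
    printf '%s\n' "$1" | LC_ALL=C awk '{
        x = $1
        sub(/,/, ".", x)          # replace the decimal comma with a period
        printf "%.3f\n", x + 1    # x is now converted as 4.321, not 4
    }'
}

add_one_decimal_comma '4,321'     # prints 5.321
```

This works with any POSIX awk, but the result is printed with a period,
so it would need converting back if the output format requires a comma.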

Janis

> 
> Regards,
> 
>     Ed

