Regular expression in awk

Started by	gio001 <gcrippa@gmail.com>
First post	2011-04-06 07:40 -0700
Last post	2011-04-08 13:18 +0100
Articles	15 — 7 participants

  Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 07:40 -0700
    Re: Regular expression in awk pk <pk@pk.invalid> - 2011-04-06 16:01 +0100
    Re: Regular expression in awk Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-06 19:24 +0200
      Re: Regular expression in awk Manuel Collado <m.collado@domain.invalid> - 2011-04-06 21:07 +0200
      Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 12:25 -0700
        Re: Regular expression in awk Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-06 23:09 +0200
          Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 19:59 -0700
            Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 07:32 +0000
              Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-07 05:06 -0700
                Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 13:34 +0000
                  Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-07 08:27 -0700
                    Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 18:32 +0000
        Re: Regular expression in awk Geoff Clare <geoff@clare.See-My-Signature.invalid> - 2011-04-07 13:34 +0100
          Re: Regular expression in awk arnold@skeeve.com (Aharon Robbins) - 2011-04-07 13:31 +0000
            Re: Regular expression in awk Geoff Clare <geoff@clare.See-My-Signature.invalid> - 2011-04-08 13:18 +0100

#124 — Regular expression in awk

From	gio001 <gcrippa@gmail.com>
Date	2011-04-06 07:40 -0700
Subject	Regular expression in awk
Message-ID	<4e09d4fd-8351-45f0-8f17-b6ac0d32e19a@l18g2000yql.googlegroups.com>

Hello,
I have thousands of messages (HL7), and I want to use awk to extract only
the ones that have a particular value in PV1.18.
Each record in the file is a whole HL7 message, i.e. when I print $0
I get the whole message (MSH, EVN, PID, etc.); there is an x0d between the
segments.
I would like to use a line something like:

awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile

I do not seem to be able to get this working properly in awk on an AIX
box.
Yet this command works fine against the infile:

grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile

Can anyone please help?
Thanks.

#125

From	pk <pk@pk.invalid>
Date	2011-04-06 16:01 +0100
Message-ID	<inhv8h$fbs$1@speranza.aioe.org>
In reply to	#124

gio001 wrote:

> Hello,
> I have thousand of messages (HL7), I want to use awk to extract only
> the ones that have a particular value in pv1.18
> Each record in the file is the whole HL7 message, ie. when I print $0
> I get the whole message MSH EVN PID etc. ,there is an x0d between the
> segments.
> I would like to use a line somewhat like:
> 
> awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile
> 
> I do not seem to be able to get this working properly in awk on a AIX
> box.
> Yet this statement works fine against the infile:
> 
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile
> 
> Can anyone please help?

It would be useful if you pasted a sample of this HL7 data, since probably not
everyone is familiar with it.

If you're having trouble processing NULs with awk on AIX, you may be able to 
preprocess the input and replace all the \x0 with something else, for 
example using tr.

#127

From	Janis Papanagnou <janis_papanagnou@hotmail.com>
Date	2011-04-06 19:24 +0200
Message-ID	<ini7kr$410$1@news.m-online.net>
In reply to	#124

On 06.04.2011 16:40, gio001 wrote:
> Hello,
> I have thousand of messages (HL7),

Provide sample data for a few HL7 records.

> I want to use awk to extract only
> the ones that have a particular value in pv1.18

Define what you mean by "pv1.18".

> Each record in the file is the whole HL7 message, ie. when I print $0
> I get the whole message MSH EVN PID etc. ,there is an x0d between the

You mean a single ASCII CR character separates some entity in the data?

> segments.

Define what a segment in your sample data is.

> I would like to use a line somewhat like:
> 
> awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile

In awk you can write

  /pattern/ { action }

instead of

  { if ($0 ~ /pattern/) action }

> 
> I do not seem to be able to get this working properly in awk on a AIX
> box.
> Yet this statement works fine against the infile:
> 
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile

Gee! What's that cryptic expression supposed to do?

Please don't try to give your cryptic tries without explaining what you
actually want to achieve.

> 
> Can anyone please help?

It may help you to know that in awk you can define the field and record
separators. Then you can access individual fields in a simple
way. Say your field separator is | (a pipe symbol) and you want to
extract the 18th field only from lines that match the pattern /PV1/:

  awk -F\| '/PV1/ { print $18 }' infile

Janis

> Thanks.
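
Following the separator idea, here is a rough sketch that splits each message into segments and then splits the PV1 segment into fields, so nothing has to be counted from the start of the record. It assumes one message per input line, segments separated by a CR (x0d), and fields separated by |. The function name pv1_18_by_segment and the sample data are made up for illustration; they are not from the original posts.

```shell
#!/bin/sh
# Hypothetical helper: print messages whose PV1 segment has "$1" in field 18.
pv1_18_by_segment() {
    awk -v want="$1" '
    {
        # Assumption: segments inside one message are separated by CR.
        nseg = split($0, seg, "\r")
        for (s = 1; s <= nseg; s++) {
            if (seg[s] ~ /^PV1[|]/) {
                # f[1] is the tag "PV1", so PV1 field 18 lands in f[19].
                split(seg[s], f, "|")
                if (f[19] == want) { print; break }
            }
        }
    }'
}

# Made-up, shortened messages; \r stands in for the x0d separator.
sample_segments() {
    printf 'MSH|x\rPID|y\rPV1|||LWK^^|||||||LWK||||||||X|1|\rORC|SC|\n'
    printf 'MSH|x\rPID|y\rPV1|||LMK^^|||||||LMK||||||||O|2|\rORC|SC|\n'
}

sample_segments | pv1_18_by_segment O
```

With the two sample messages, only the second one (the one carrying O) should be printed. Splitting on the segment separator first keeps the check independent of how many fields the MSH and PID segments happen to have.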

#128

From	Manuel Collado <m.collado@domain.invalid>
Date	2011-04-06 21:07 +0200
Message-ID	<inidsl$gfl$1@peque.uv.es>
In reply to	#127

El 06/04/2011 19:24, Janis Papanagnou escribió:
> On 06.04.2011 16:40, gio001 wrote:
>> Hello,
>> I have thousand of messages (HL7),
>
> Provide sample data for a few HL7 records.

IIRC, HL7 defines an XML markup for health information exchange. So 
please provide sample data, so we can see if your message format is XML 
or plain text.

>
>> I want to use awk to extract only
>> the ones that have a particular value in pv1.18

If your data are XML, then it could be difficult to use plain awk to
process them. You would be better off using the XML extensions of xgawk, or a
native XML tool, like an XSLT processor.

-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado

#129

From	gio001 <gcrippa@gmail.com>
Date	2011-04-06 12:25 -0700
Message-ID	<02482113-48f0-4690-b20c-98f906b1cd51@w7g2000yqe.googlegroups.com>
In reply to	#127

On Apr 6, 1:24 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> [...]
> In awk you can write
>
>   /pattern/ { action }
>
> instead of
>
>   { if ($0 ~ /pattern/) action }
>
> [...]
> It may help you to know that in awk you can define the field separator
> and record separators. Then you can access individual fields in a simple
> way. Say, your field separator is | (a pipe symbol) and you want to
> extract the 18th field only in lines where you've some pattern /PV1/:
>
>   awk -F\| '/PV1/ { print $18 }' infile

I perfectly understand what you are suggesting. I tried this form in
vain:
awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
yet the exact same construct works in
grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
I am trying to write the entire messages that have a PV1.18 containing
X out to a file.
I also understand about the field delimiter; setting it to | would be
OK. The only thing is that my messages are a single entity, i.e. every
transaction in the file is a whole message (MSH, EVN, PID: all the
segments for one message are listed as a single blob, and that is
what $0 returns if I look at each record). That is why I was
trying to apply the regexp to the whole message: I want to
match my search string without knowing how many other |-separated
fields exist before it.
I hope I explained myself.
Thanks.

#130

From	Janis Papanagnou <janis_papanagnou@hotmail.com>
Date	2011-04-06 23:09 +0200
Message-ID	<inikr7$abg$1@news.m-online.net>
In reply to	#129

On 06.04.2011 21:25, gio001 wrote:
> On Apr 6, 1:24 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>> On 06.04.2011 16:40, gio001 wrote:
>>
>>> Hello,
>>> I have thousand of messages (HL7),
>>
>> Provide sample data for a few HL7 records.

(This request has meanwhile been repeated a couple times.)

[...]
> 
> I perfectly understand what you are suggesting, I tried this form in
> vain:
> awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
> yet the same exact costruct works in
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile

You are repeating yourself, thus completely ignoring what I suggested
to clarify your task.

1. Please don't try to give your cryptic tries without explaining what you
actually want to achieve.

2. Provide sample data for a few HL7 records.

[snip of irrelevant repetitions]

> Hope I explained myself.

Nope. (See above.)

> Thanks.

Good luck.

#131

From	gio001 <gcrippa@gmail.com>
Date	2011-04-06 19:59 -0700
Message-ID	<6be0ebd4-c6d6-4dbd-8665-f01ae8142545@l11g2000yqb.googlegroups.com>
In reply to	#130

On Apr 6, 5:09 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
wrote:
> [...]
> 1. Please don't try to give your cryptic tries without explaining what you
> actually want to achieve.
>
> 2. Provide sample data for a few HL7 records.

OK, sorry for the delay. Here are some sample records; each record
starts with an MSH. Let me know if you need additional info.
From one MSH to the next MSH is a single message, inside of which I am
looking for field 18 after the PV1 tag. I want to examine that field's
content and select only the messages that have a particular value in that
field (in this particular case a capital letter 'O'). I am therefore trying
to use awk in the form /pattern/ { action }, and more
specifically:

awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile

Here are 4 messages of this group; I would like only the 2nd message to
be written to outfile:

MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985590|P|2.4^MPID|||
000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
133445567^MPV1|||LWK^^|||||||LWK||||||||X|
000004143354|||||||||||||||||||||||||201103301530||||||^MORC|SC|
00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
00001051|12312333|7777777^Test for exam one|||201103301533|||||||
201104061535|^
^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301553^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||
000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
32225567^MPV1|||LMK^^|||||||LMK||||||||O|
000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
^MOBR||00001001|766666676|7777777^Test for exam one|||
201104011523|||||||201104061535|^
^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980290|P|2.4^MPID|||
000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
133445567^MPV1|||LWH^^|||||||LWH||||||||X|
000004455554|||||||||||||||||||||||||201103301530||||||^MORC|SC|
00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
00001001|760222266|7777777^Test for exam one|||201103301533|||||||
201104061535|^
^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301533^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985594|P|2.4^MPID|||
000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
32225567^MPV1|||LMK^^|||||||LMK||||||||R|
000004155555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
^MOBR||00001001|761234676|7777777^Test for exam one|||
201104011523|||||||201104061535|^
^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~

#132

From	Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID>
Date	2011-04-07 07:32 +0000
Message-ID	<4d9d6886$0$5892$426a74cc@news.free.fr>
In reply to	#131

Wed, 06 Apr 2011 19:59:03 -0700, gio001 did cat :

> [...]
> Ok, sorry for the delay, here are some sample records, each record
> starts with an MSH, let me know if you need additional info. From one
> MSH to the next MSH is a single message inside of which I am looking for
> field 18 after the PV1 tag to examine that field content and only select
> message that satisfy a particular value in that field (in this
> particular case a capital letter 'O'), I am therefore trying to use awk
> in the format of   /pattern/ { action }   and more specifically:
>
> awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile
>
> Here are 4 messages of this group I would like only the 2nd message to
> be written to outfile:
>
> [sample records snipped]

I supposed that, as you stated in a previous post, the records
(which you called "segments") are separated by an "x0d", if so:
--------
$ awk '$49=="O"' FS='|' yourfile
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||O|000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|^MOBR||00001001|766666676|7777777^Test for exam one|||201104011523|||||||201104061535|^^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
--------

Note that your flag might be the 31st field then:
--------
$ awk '{print $31}' FS='|' yourfile
133445567^MPV1
032225567^MPV1
133445567^MPV1
032225567^MPV1
--------

then you may like to lock it up so:
--------
$ awk '$31~/PV1$/&&$49=="O"' FS='|' yourfile
--------

(And if your records/segments don't go that way, please post one "segment"
pushed through 'od -Ad -c', or 'xxd' if you prefer.)

#133

From	gio001 <gcrippa@gmail.com>
Date	2011-04-07 05:06 -0700
Message-ID	<a99d6877-ce45-43f2-8205-b9637d45f1f1@r3g2000yqh.googlegroups.com>
In reply to	#132

On Apr 7, 3:32 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
wrote:
> [...]
> I supposed that, as you stated in a previous post, the records
> (which you called "segments") are separated by an "x0d", if so:
> --------
> $ awk '$49=="O"' FS='|' yourfile
> [output snipped]
> --------
>
> Note that your flag might be the 31st field then:
> --------
> $ awk '{print $31}' FS='|' yourfile
> 133445567^MPV1
> 032225567^MPV1
> 133445567^MPV1
> 032225567^MPV1
> --------
>
> then you may like to lock it up so:
> --------
> $ awk '$31~/PV1$/&&$49=="O"' FS='|'
> --------

Thanks for the help and the suggestions.
Yet I would like to be able to use awk's regexp pattern syntax to
achieve the same result, without having to count manually where I need to
match in my records.
A regexp pattern is flexible enough that even if the
layout before the matched text changes, I can still use the
regexp to find my match.
Thanks again.
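
One likely reason the grep pattern does not carry over: grep without -E takes basic regular expressions, where \( \) group and \{ \} form an interval, while awk takes extended regular expressions, where the unescaped ( ) and { } play those roles and the backslashed forms match literal characters. Below is a minimal sketch of an awk equivalent, assuming the wanted value is the 18th |-separated field after the PV1 tag and that each message sits on one input line. The function name pv1_18_filter and the shortened sample records are made up for illustration. The pattern is built in a loop because not every awk supports interval expressions; where they are supported, /PV1[|]([^|]*[|]){17}O[|]/ should be equivalent.

```shell
#!/bin/sh
# Hypothetical helper: print messages whose PV1 field 18 equals "$1".
pv1_18_filter() {
    awk -v want="$1" '
        BEGIN {
            # "PV1|", then 17 fields of non-pipe characters each closed by
            # a pipe, then the wanted value as field 18 and its closing pipe.
            re = "PV1[|]"
            for (n = 1; n <= 17; n++)
                re = re "[^|]*[|]"
            re = re want "[|]"    # want is used as a regexp fragment, unescaped
        }
        $0 ~ re
    '
}

# Made-up, shortened records (not the real HL7 messages from the thread).
sample_regex() {
    printf '%s\n' 'MSH|a|PID|b|PV1|||LWK^^|||||||LWK||||||||X|0001|'
    printf '%s\n' 'MSH|a|c|PID|b|PV1|||LMK^^|||||||LMK||||||||O|0002|'
}

sample_regex | pv1_18_filter O
```

With these two records only the second should be printed, even though it has one more field before PV1 than the first. Writing the pipe as [|] rather than \| sidesteps differences in how backslash escapes are handled inside dynamic regexp strings. Unlike the original pattern, this version does not require field 17 to be empty.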

#136

From	Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID>
Date	2011-04-07 13:34 +0000
Message-ID	<4d9dbd7a$0$20837$426a34cc@news.free.fr>
In reply to	#133

Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :

> [...]
> 
> Thanks for the help and the suggestions. Yet I would like to be able to
> use the awk regexp pattern syntax to achieve the same result and not
> have to count manually where I need to match in my records.
> The pattern in the regexp will make it flexible enough that if even my
> layout before the pattern to be matched changes I can still use the
> regexp to look for my match.

OK, I understand your idea :-) Still, as the expression you gave doesn't
work on my installations (neither in grep nor in awk), you'd be well advised
to read Geoff Clare's and Aharon Robbins' suggestions :D)

Just for completeness, an algorithmic version of your expression could be
quite flexible too. Here's a possible version, adapted to the data you gave:
--------
$ awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
--------
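
The one-liner above is dense, so here is an expanded sketch of the same idea with two deliberate differences: it searches for the PV1 field in every record instead of remembering the position found in the first one, and it scans from the first field rather than from the last. The function name pv1_field_filter and the sample records are hypothetical; the \r in the sample stands in for the x0d segment separator, which leaves the PV1 tag glued to the end of the preceding field.

```shell
#!/bin/sh
# Hypothetical helper: print records where the field 18 places after the
# one ending in "PV1" equals "$1".
pv1_field_filter() {
    awk -F'|' -v want="$1" '
    {
        # Find the first field whose text ends in "PV1".
        pos = 0
        for (k = 1; k <= NF; k++)
            if ($k ~ /PV1$/) { pos = k; break }

        # Compare the field 18 positions further on with the wanted value.
        if (pos && $(pos + 18) == want)
            print
    }'
}

# Made-up, shortened records; the second has an extra field before PV1.
sample_fields() {
    printf 'MSH|x|PID|y\rPV1|||LWK^^|||||||LWK||||||||X|1|\n'
    printf 'MSH|x|z|PID|y\rPV1|||LMK^^|||||||LMK||||||||O|2|\n'
}

sample_fields | pv1_field_filter O
```

Recomputing the position for each record costs a short loop per line, but it keeps working if the number of fields before PV1 varies from one message to the next.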

#137

From	gio001 <gcrippa@gmail.com>
Date	2011-04-07 08:27 -0700
Message-ID	<04625524-8fec-4c2e-a4ca-21ea748d7961@v16g2000vbq.googlegroups.com>
In reply to	#136

On Apr 7, 9:34 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
wrote:
> Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :
>
> > On Apr 7, 3:32 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
> > wrote:
> >> Wed, 06 Apr 2011 19:59:03 -0700, gio001 did cat :
>
> >> > On Apr 6, 5:09 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
> >> > wrote:
> >> >> On 06.04.2011 21:25, gio001 wrote:
>
> >> >> > On Apr 6, 1:24 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
> >> >> > wrote:
> >> >> >> On 06.04.2011 16:40, gio001 wrote:
>
> >> >> >>> Hello,
> >> >> >>> I have thousand of messages (HL7),
>
> >> >> >> Provide sample data for a few HL7 records.
>
> >> >> (This request has meanwhile been repeated a couple times.)
>
> >> >> [...]
>
> >> >> > I perfectly understand what you are suggesting, I tried this form
> >> >> > in vain:
> >> >> > awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile yet
> >> >> > the same exact costruct works in
> >> >> > grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
>
> >> >> You are repeating yourself, thus completely ignoring what I
> >> >> suggested to clarify your task.
>
> >> >> 1. Please don't try to give your cryptic tries without explaining
> >> >> what you actually want to achieve.
>
> >> >> 2. Provide sample data for a few HL7 records.
>
> >> >> [snip of irrelevant repetitions]
>
> >> >> > Hope I explained myself.
>
> >> >> Nope. (See above.)
>
> >> >> > Thanks.
>
> >> >> Good luck.
>
> >> > Ok, sorry for the delay, here are some sample records, each record
> >> > starts with an MSH, let me know if you need additional info. From one
> >> > MSH to the next MSH is a single message inside of which I am looking
> >> > for field 18 after the PV1 tag to examine that field content and only
> >> > select message that satisfy a particular value in that field (in this
> >> > particular case a capital letter 'O'), I am therefore trying to use
> >> > awk in the format of   /pattern/ { action }   and more specifically:
>
> >> > awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile
>
> >> > Here are 4 messages of this group I would like only the 2nd message
> >> > to be written to outfile:
>
> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985590|P|2.4^MPID|||
> >> > 000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
> >> > 133445567^MPV1|||LWK^^|||||||LWK||||||||X|
> >> > 000004143354|||||||||||||||||||||||||201103301530||||||^MORC|SC|
> >> > 00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
> >> > 00001051|12312333|7777777^Test for exam one|||201103301533|||||||
> >> > 201104061535|^
> >> > ^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301553^^A|~~~
> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||
> >> > 000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
> >> > 32225567^MPV1|||LMK^^|||||||LMK||||||||O|
> >> > 000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
> >> > 00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
> >> > ^MOBR||00001001|766666676|7777777^Test for exam one|||
> >> > 201104011523|||||||201104061535|^
> >> > ^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980290|P|2.4^MPID|||
> >> > 000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
> >> > 133445567^MPV1|||LWH^^|||||||LWH||||||||X|
> >> > 000004455554|||||||||||||||||||||||||201103301530||||||^MORC|SC|
> >> > 00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
> >> > 00001001|760222266|7777777^Test for exam one|||201103301533|||||||
> >> > 201104061535|^
> >> > ^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301533^^A|~~~
> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985594|P|2.4^MPID|||
> >> > 000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
> >> > 32225567^MPV1|||LMK^^|||||||LMK||||||||R|
> >> > 000004155555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
> >> > 00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
> >> > ^MOBR||00001001|761234676|7777777^Test for exam one|||
> >> > 201104011523|||||||201104061535|^
> >> > ^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
>
> >> I supposed that, as you stated in a previous post, the records (which
> >> you called "segments") are separated by an "x0d", if so: --------
> >> $ awk '$49=="O"' FS='|' yourfile
> >> MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||O|000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER
> >> JR^MOLLY|^MOBR||00001001|766666676|7777777^Test for exam
> >> one|||201104011523|||||||201104061535|^^^|114233^MISTER
> >> JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~ --------
>
> >> Note that your flag might be the 31st field then: --------
> >> $ awk '{print $31}' FS='|' yourfile
> >> 133445567^MPV1
> >> 032225567^MPV1
> >> 133445567^MPV1
> >> 032225567^MPV1
> >> --------
>
> >> then you may like to lock it up so:
> >> --------
> >> $ awk '$31~/PV1$/&&$49=="O"' FS='|'
> >> --------
>
> >> (
> >> And if your records/segments don't go that way please post one
> >> "segment" pushed through 'od -Ad -c' (or 'xxd' if you prefer) )
>
> > Thanks for the help and the suggestions. Yet I would like to be able to
> > use the awk regexp pattern syntax to achieve the same result and not
> > have to count manually where I need to match in my records.
> > The pattern in the regexp will make it flexible enough that if even my
> > layout before the pattern to be matched changes I can still use the
> > regexp to look for my match.
>
> OK, I understand your idea :-) Still, as the expression you gave doesn't
> work on my installations (neither in grep nor in awk) you'll be advised
> to read Geoff Clare and Aharon Robbins suggestions :D)
>
> Just for completion, an algorithmical version of your exp could be
> quite flexible too, here's a possible version, adapted to the data you gave:
> --------
> $ awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
> --------

Thanks to all.
With your help I now understand the issue better and have corrected
it; it works properly now :-). All I needed was to escape the |
characters (and nothing else) in my awk statement, like this:
awk '/PV1\|([^\|]*\|){16}\|X/ {print $0}' infile > outfile

Loki, when I execute:
awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
I get a message saying that event i is not defined, so I removed the
leading !i&&/PV1/ condition and ran:
awk '{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
which seems to generate the proper output. Was that condition needed
for some other purpose? If so, how can I get rid of the "event i"
message? Should I initialize i in the BEGIN section?

Again, thanks to Geoff Clare and Aharon Robbins and Loki Harfagr and
everyone else who contributed with their input!

[toc] | [prev] | [next] | [standalone]

#138

From	Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID>
Date	2011-04-07 18:32 +0000
Message-ID	<pan.2011.04.07.18.32.21@thedarkdesign.free.fr.INVALID>
In reply to	#137

Thu, 07 Apr 2011 08:27:56 -0700, gio001 did cat :

> On Apr 7, 9:34 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
> wrote:
>> Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :
>>
>> > On Apr 7, 3:32 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
>> > wrote:
>> >> Wed, 06 Apr 2011 19:59:03 -0700, gio001 did cat :
>>
>> >> > On Apr 6, 5:09 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
>> >> > wrote:
>> >> >> On 06.04.2011 21:25, gio001 wrote:
>>
>> >> >> > On Apr 6, 1:24 pm, Janis Papanagnou
>> >> >> > <janis_papanag...@hotmail.com> wrote:
>> >> >> >> On 06.04.2011 16:40, gio001 wrote:
>>
>> >> >> >>> Hello,
>> >> >> >>> I have thousand of messages (HL7),
>>
>> >> >> >> Provide sample data for a few HL7 records.
>>
>> >> >> (This request has meanwhile been repeated a couple times.)
>>
>> >> >> [...]
>>
>> >> >> > I perfectly understand what you are suggesting, I tried this
>> >> >> > form in vain:
>> >> >> > awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
>> >> >> > yet the same exact costruct works in
>> >> >> > grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
>>
>> >> >> You are repeating yourself, thus completely ignoring what I
>> >> >> suggested to clarify your task.
>>
>> >> >> 1. Please don't try to give your cryptic tries without explaining
>> >> >> what you actually want to achieve.
>>
>> >> >> 2. Provide sample data for a few HL7 records.
>>
>> >> >> [snip of irrelevant repetitions]
>>
>> >> >> > Hope I explained myself.
>>
>> >> >> Nope. (See above.)
>>
>> >> >> > Thanks.
>>
>> >> >> Good luck.
>>
>> >> > Ok, sorry for the delay, here are some sample records, each record
>> >> > starts with an MSH, let me know if you need additional info. From
>> >> > one MSH to the next MSH is a single message inside of which I am
>> >> > looking for field 18 after the PV1 tag to examine that field
>> >> > content and only select message that satisfy a particular value in
>> >> > that field (in this particular case a capital letter 'O'), I am
>> >> > therefore trying to use awk in the format of   /pattern/ { action
>> >> > }   and more specifically:
>>
>> >> > awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile
>>
>> >> > Here are 4 messages of this group I would like only the 2nd
>> >> > message to be written to outfile:
>>
>> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985590|P|2.4^MPID|||
>> >> > 000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
>> >> > 133445567^MPV1|||LWK^^|||||||LWK||||||||X|
>> >> > 000004143354|||||||||||||||||||||||||201103301530||||||^MORC|SC|
>> >> > 00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
>> >> > 00001051|12312333|7777777^Test for exam one|||201103301533|||||||
>> >> > 201104061535|^
>> >> > ^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301553^^A|~~~
>> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||
>> >> > 000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
>> >> > 32225567^MPV1|||LMK^^|||||||LMK||||||||O|
>> >> > 000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
>> >> > 00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
>> >> > ^MOBR||00001001|766666676|7777777^Test for exam one|||
>> >> > 201104011523|||||||201104061535|^
>> >> > ^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
>> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980290|P|2.4^MPID|||
>> >> > 000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|
>> >> > 133445567^MPV1|||LWH^^|||||||LWH||||||||X|
>> >> > 000004455554|||||||||||||||||||||||||201103301530||||||^MORC|SC|
>> >> > 00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||
>> >> > 00001001|760222266|7777777^Test for exam one|||201103301533|||||||
>> >> > 201104061535|^
>> >> > ^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301533^^A|~~~
>> >> > MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985594|P|2.4^MPID|||
>> >> > 000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|0
>> >> > 32225567^MPV1|||LMK^^|||||||LMK||||||||R|
>> >> > 000004155555|||||||||||||||||||||||||201104011521||||||^MORC|SC|
>> >> > 00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|
>> >> > ^MOBR||00001001|761234676|7777777^Test for exam one|||
>> >> > 201104011523|||||||201104061535|^
>> >> > ^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
>>
>> >> I supposed that, as you stated in a previous post, the records
>> >> (which you called "segments") are separated by an "x0d", if so:
>> >> -------- $ awk '$49=="O"' FS='|' yourfile
>> >> MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||O|000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER
>> >> JR^MOLLY|^MOBR||00001001|766666676|7777777^Test for exam
>> >> one|||201104011523|||||||201104061535|^^^|114233^MISTER
>> >> JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~ --------
>>
>> >> Note that your flag might be the 31st field then: -------- $ awk
>> >> '{print $31}' FS='|' yourfile
>> >> 133445567^MPV1
>> >> 032225567^MPV1
>> >> 133445567^MPV1
>> >> 032225567^MPV1
>> >> --------
>>
>> >> then you may like to lock it up so:
>> >> --------
>> >> $ awk '$31~/PV1$/&&$49=="O"' FS='|'
>> >> --------
>>
>> >> (
>> >> And if your records/segments don't go that way please post one
>> >> "segment" pushed through 'od -Ad -c' (or 'xxd' if you prefer) )
>>
>> > Thanks for the help and the suggestions. Yet I would like to be able
>> > to use the awk regexp pattern syntax to achieve the same result and
>> > not have to count manually where I need to match in my records. The
>> > pattern in the regexp will make it flexible enough that if even my
>> > layout before the pattern to be matched changes I can still use the
>> > regexp to look for my match.
>>
>> OK, I understand your idea :-) Still, as the expression you gave
>> doesn't work on my installations (neither in grep nor in awk) you'll be
>> advised to read Geoff Clare and Aharon Robbins suggestions :D)
>>
>> Just for completion, an algorithmical version of your exp could be
>> quite flexible too, here's a possible version, adapted to the data you
>> gave: --------
>> $ awk
>> '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"'
>> FS='|' yourfile --------
> 
> Thanks to all,
> because of your help I have a better understanding of the issue and I
> have it corrected .... it now works properly :-), what I needed was the
> escaping of | (only) in my awk statement, like this: awk
> '/PV1\|([^\|]*\|){16}\|X/ {print $0}' infile > outfile
> 
> Loki, when I execute:
> awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i
> +18)=="O"' FS='|' yourfile
> I am getting a message about event i not defined,

This is typical of a double quote (") being used instead of the
single quote (') that should enclose the program. In some shells and
environments, though, the code has to be protected from the shell, i.e.
put in a script file instead of being typed on the command line. That is
quite common with MSWindows shells and with real shells running under
MSWindows, but I have never seen it on AIX (my last AIX was a 4.3.1 or
4.3.2, so I guess a few things may have rotted since ;-)
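
A message of the "event not found" kind normally comes from the interactive shell's history expansion of "!", not from awk: bash expands "!" inside double quotes, and csh/tcsh expand it even inside single quotes. One way to keep the shell out of it, sketched here, is to store the program in a file and run it with awk -f. The temporary file names, the "off" variable and the two sample records are placeholders made up for the demonstration; in practice the program file would be written with an editor.

```shell
# Put the awk program in a file so the interactive shell never sees the "!".
prog=$(mktemp)
cat > "$prog" <<'EOF'
!i && /PV1/ { for (i = NF; i; i--) if ($i ~ /PV1$/) break }
$i ~ /PV1$/ && $(i + off) == "O"
EOF

# Made-up data: the flag is 1 field after the PV1 field.
data=$(mktemp)
printf '%s\n' 'MSH|x^MPV1|X' 'MSH|y^MPV1|O' > "$data"

from_file=$(awk -F'|' -v off=1 -f "$prog" "$data")
echo "$from_file"
rm -f "$prog" "$data"
```

Writing the test as i == 0 instead of !i should behave the same here and avoids the "!" character altogether.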


> so I removed and ran
> like:
> awk '{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"'
> FS='|' yourfile
> which seems to generate the proper output, was that section necessary
> for some other purpose, and if so how can I resolve the i event message,
> should I initialize i in the BEGIN section?

The !i test is only there so that i is computed once, from the first
record that contains a field matching /PV1$/. It is essentially an
optimization and, as such, it can either backfire or act as a safeguard,
depending on the 'solidity' of your input data ;-)
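
To make the trade-off concrete, here is a small sketch comparing the two variants on made-up data in which the PV1 field is not in the same position in every record (an assumption for illustration only; the records posted in this thread all have it in the same place).

```shell
# Made-up data: the PV1 field is field 1 in the first record, field 2 in the second.
records='x^MPV1|O|pad
pad|y^MPV1|O'

# With "!i": the position is learned from the first matching record only.
cached=$(printf '%s\n' "$records" | awk -F'|' '
!i && /PV1/ { for (i = NF; i; i--) if ($i ~ /PV1$/) break }
$i ~ /PV1$/ && $(i + 1) == "O"')

# Without "!i": the position is searched again in every record.
per_record=$(printf '%s\n' "$records" | awk -F'|' '
{ for (i = NF; i; i--) if ($i ~ /PV1$/) break }
$i ~ /PV1$/ && $(i + 1) == "O"')

printf 'cached:\n%s\nper record:\n%s\n' "$cached" "$per_record"
```

The cached variant selects only the first record, because it keeps looking at field 1; the per-record variant selects both, at the cost of running the loop on every line.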

> Again, thanks to Geoff Clare and Aharon Robbins and Loki Harfagr and
> everyone else who contributed with their input!

[toc] | [prev] | [next] | [standalone]

#134

From	Geoff Clare <geoff@clare.See-My-Signature.invalid>
Date	2011-04-07 13:34 +0100
Message-ID	<qjn078-7j2.ln1@leafnode-msgid.gclare.org.uk>
In reply to	#129

gio001 wrote:

> I perfectly understand what you are suggesting, I tried this form in
> vain:
> awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
> yet the same exact costruct works in
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile

Without the -E option grep uses "basic regular expressions" (BREs),
whereas awk uses "extended regular expressions" (EREs).  There are
some characters that are special in EREs but not in BREs.  Some of
these can be made special in BREs by preceding them with a
backslash, but in EREs doing that makes them not special.

The ERE equivalent of the BRE you used with grep is:

PV1|([^|]*|){16}|X

if your version of grep treats \| as special, or:

PV1\|([^|]*\|){16}\|X

if your version of grep does not treat \| as special.
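
The difference can be checked on a toy input. This is only an illustrative sketch with three made-up lines; it deliberately avoids \| in a BRE, since that is the case whose meaning varies between grep versions.

```shell
# Three made-up input lines.
input='a|b
a
xyz'

# BRE (plain grep): an unescaped | is an ordinary character.
bre=$(printf '%s\n' "$input" | grep -c 'a|b')

# ERE (grep -E, awk): an unescaped | means "or"; \| is a literal bar.
ere_alt=$(printf '%s\n' "$input" | grep -E -c 'a|b')
ere_lit=$(printf '%s\n' "$input" | grep -E -c 'a\|b')
awk_lit=$(printf '%s\n' "$input" | awk '/a\|b/ { n++ } END { print n + 0 }')

printf 'BRE a|b: %s, ERE a|b: %s, ERE literal bar: %s, awk literal bar: %s\n' \
    "$bre" "$ere_alt" "$ere_lit" "$awk_lit"
```

Only the unescaped ERE form matches the line "a" as well, because there the bar is read as alternation.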

-- 
Geoff Clare <netnews@gclare.org.uk>

[toc] | [prev] | [next] | [standalone]

#135

From	arnold@skeeve.com (Aharon Robbins)
Date	2011-04-07 13:31 +0000
Message-ID	<inkeb1$2pi$1@tornado.tornevall.net>
In reply to	#134

In article <qjn078-7j2.ln1@leafnode-msgid.gclare.org.uk>,
Geoff Clare  <geoff@clare.See-My-Signature.invalid> wrote:
>The ERE equivalent of the BRE you used with grep is:
>
>PV1|([^|]*|){16}|X

It may be that awk on AIX does not support interval expressions
(x{16}); this should be tested too.
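
A quick way to run that test, plus a fallback that does not rely on interval expressions at all, could look like the sketch below. The rep() helper and the short sample lines are made up for the illustration; rep() is defined in the script and is not an awk builtin.

```shell
# Probe: does this awk understand interval expressions such as a{3}?
if printf 'aaa\n' | awk '/^a{3}$/ { ok = 1 } END { exit (ok ? 0 : 1) }'; then
    intervals=yes
else
    intervals=no
fi
echo "interval expressions supported: $intervals"

# Fallback that needs no intervals: build the repeated part as a string
# and use it as a dynamic regex. [|] is a literal bar without backslashes.
built=$(printf '%s\n' 'PV1|a|b|X' 'PV1|a|X' | awk '
function rep(s, n,    out) { while (n-- > 0) out = out s; return out }
BEGIN { re = "PV1[|]" rep("[^|]*[|]", 2) "X" }
$0 ~ re')
echo "$built"
```

For the records in this thread the repeat count would be 17 rather than 2 (17 fields between the PV1 field and the flag), and the final letter the flag value being looked for.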
-- 
Aharon (Arnold) Robbins 			arnold AT skeeve DOT com
P.O. Box 354		Home Phone: +972  8 979-0381
Nof Ayalon		Cell Phone: +972 50  729-7545
D.N. Shimshon 99785	ISRAEL

[toc] | [prev] | [next] | [standalone]

#140

From	Geoff Clare <geoff@clare.See-My-Signature.invalid>
Date	2011-04-08 13:18 +0100
Message-ID	<11b378-vho.ln1@leafnode-msgid.gclare.org.uk>
In reply to	#135

Aharon Robbins wrote:

> Geoff Clare  <geoff@clare.See-My-Signature.invalid> wrote:
>>The ERE equivalent of the BRE you used with grep is:
>>
>>PV1|([^|]*|){16}|X
>
> It may be that awk on AIX does not support interval expressions
> (x{16}); this should be tested too.

They are required by the POSIX/UNIX standard, and AIX is certified
as conforming.

I suppose it's possible that they are not supported by default in
awk and you need to do something to set up a conforming environment,
but I think that's unlikely.

-- 
Geoff Clare <netnews@gclare.org.uk>

[toc] | [prev] | [standalone]
