Groups > comp.lang.awk > #124 > unrolled thread
| Started by | gio001 <gcrippa@gmail.com> |
|---|---|
| First post | 2011-04-06 07:40 -0700 |
| Last post | 2011-04-08 13:18 +0100 |
| Articles | 15 — 7 participants |
Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 07:40 -0700
Re: Regular expression in awk pk <pk@pk.invalid> - 2011-04-06 16:01 +0100
Re: Regular expression in awk Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-06 19:24 +0200
Re: Regular expression in awk Manuel Collado <m.collado@domain.invalid> - 2011-04-06 21:07 +0200
Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 12:25 -0700
Re: Regular expression in awk Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-06 23:09 +0200
Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-06 19:59 -0700
Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 07:32 +0000
Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-07 05:06 -0700
Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 13:34 +0000
Re: Regular expression in awk gio001 <gcrippa@gmail.com> - 2011-04-07 08:27 -0700
Re: Regular expression in awk Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> - 2011-04-07 18:32 +0000
Re: Regular expression in awk Geoff Clare <geoff@clare.See-My-Signature.invalid> - 2011-04-07 13:34 +0100
Re: Regular expression in awk arnold@skeeve.com (Aharon Robbins) - 2011-04-07 13:31 +0000
Re: Regular expression in awk Geoff Clare <geoff@clare.See-My-Signature.invalid> - 2011-04-08 13:18 +0100
| From | gio001 <gcrippa@gmail.com> |
|---|---|
| Date | 2011-04-06 07:40 -0700 |
| Subject | Regular expression in awk |
| Message-ID | <4e09d4fd-8351-45f0-8f17-b6ac0d32e19a@l18g2000yql.googlegroups.com> |
Hello,
I have thousands of HL7 messages, and I want to use awk to extract only
the ones that have a particular value in pv1.18.
Each record in the file is the whole HL7 message, i.e. when I print $0
I get the whole message (MSH, EVN, PID, etc.); there is an x0d between the
segments.
I would like to use a line somewhat like:
awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile
I do not seem to be able to get this working properly in awk on an AIX
box.
Yet this statement works fine against the infile:
grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile
Can anyone please help?
Thanks.
| From | pk <pk@pk.invalid> |
|---|---|
| Date | 2011-04-06 16:01 +0100 |
| Message-ID | <inhv8h$fbs$1@speranza.aioe.org> |
| In reply to | #124 |
gio001 wrote:
> Hello,
> I have thousand of messages (HL7), I want to use awk to extract only
> the ones that have a particular value in pv1.18
> Each record in the file is the whole HL7 message, ie. when I print $0
> I get the whole message MSH EVN PID etc. ,there is an x0d between the
> segments.
> I would like to use a line somewhat like:
>
> awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile
>
> I do not seem to be able to get this working properly in awk on a AIX
> box.
> Yet this statement works fine against the infile:
>
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile
>
> Can anyone please help?
It would be useful if you pasted some sample of this HL7, which probably not
everyone is familiar with.
If you're having trouble processing NULs with awk on AIX, you may be able to
preprocess the input and replace all the \x0 with something else, for
example using tr.
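A minimal sketch of that preprocessing step, assuming the troublesome bytes really are NULs (the original post mentions x0d, which is CR, so the octal escape would be '\015' in that case). The file name sample_nul.txt and its content are made up for illustration:

```shell
# Build a tiny sample that contains a NUL byte (hypothetical data).
printf 'PV1|a\000b|O\n' > sample_nul.txt

# Replace every NUL with a space, then let awk read the cleaned stream.
cleaned=$(tr '\000' ' ' < sample_nul.txt | awk -F'|' '{ print $2 }')
echo "$cleaned"
```

The point is only that tr runs before awk ever sees the data, so awk never has to cope with the byte it cannot handle.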
| From | Janis Papanagnou <janis_papanagnou@hotmail.com> |
|---|---|
| Date | 2011-04-06 19:24 +0200 |
| Message-ID | <ini7kr$410$1@news.m-online.net> |
| In reply to | #124 |
On 06.04.2011 16:40, gio001 wrote:
> Hello,
> I have thousand of messages (HL7),
Provide sample data for a few HL7 records.
> I want to use awk to extract only
> the ones that have a particular value in pv1.18
Define what you mean by "pv1.18".
> Each record in the file is the whole HL7 message, ie. when I print $0
> I get the whole message MSH EVN PID etc. ,there is an x0d between the
You mean a single ASCII CR character separates some entity in the data?
> segments.
Define what a segment in your sample data is.
> I would like to use a line somewhat like:
>
> awk '{if(/PV1\|\([^|]*\|\)\{16\}\|X/){print $0}}' infile > outfile
In awk you can write
/pattern/ { action }
instead of
{ if ($0 ~ /pattern/) action }
>
> I do not seem to be able to get this working properly in awk on a AIX
> box.
> Yet this statement works fine against the infile:
>
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile
Gee! What's that cryptic expression supposed to do?
Please don't try to give your cryptic tries without explaining what you
actually want to achieve.
>
> Can anyone please help?
It may help you to know that in awk you can define the field separator
and record separators. Then you can access individual fields in a simple
way. Say, your field separator is | (a pipe symbol) and you want to
extract the 18th field only in lines where you've some pattern /PV1/:
awk -F\| '/PV1/ { print $18 }' infile
Janis
> Thanks.
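A small sketch of the field-separator approach on made-up data, assuming one segment per line (which may not match the real file layout). The file name sample_fields.txt and the field values are hypothetical:

```shell
# Two made-up lines; the PV1 line carries "O" as its 18th pipe-separated field.
printf '%s\n' \
  'MSH|a|b' \
  'PV1|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|O|tail' > sample_fields.txt

# -F'|' is the same as -F\| : a single-character separator is taken literally.
field18=$(awk -F'|' '/PV1/ { print $18 }' sample_fields.txt)
echo "$field18"
```

Note that $1 is the segment tag itself here, so $18 is the 18th field of the line, not necessarily the field an HL7 user would call PV1.18.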
| From | Manuel Collado <m.collado@domain.invalid> |
|---|---|
| Date | 2011-04-06 21:07 +0200 |
| Message-ID | <inidsl$gfl$1@peque.uv.es> |
| In reply to | #127 |
On 06/04/2011 19:24, Janis Papanagnou wrote:
> On 06.04.2011 16:40, gio001 wrote:
>> Hello,
>> I have thousand of messages (HL7),
>
> Provide sample data for a few HL7 records.
IIRC, HL7 defines an XML markup for health information exchange. So
please provide sample data, so we can see if your message format is XML
or plain text.
>
>> I want to use awk to extract only
>> the ones that have a particular value in pv1.18
If your data are XML, then it could be difficult to use plain awk to
process them. You would do better to use the XML extensions of xgawk, or
a native XML tool, like an XSLT processor.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
| From | gio001 <gcrippa@gmail.com> |
|---|---|
| Date | 2011-04-06 12:25 -0700 |
| Message-ID | <02482113-48f0-4690-b20c-98f906b1cd51@w7g2000yqe.googlegroups.com> |
| In reply to | #127 |
I perfectly understand what you are suggesting, I tried this form in
vain:
awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
yet the exact same construct works in
grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
I am trying to write the entire messages that have a PV1.18 containing
X out to a file.
I also understand about the field delimiter; setting it to | would be
OK. The only thing is that my messages are single entities, i.e. every
transaction in the file is a whole message (MSH, EVN, PID: all the
segments for one message are listed as a single blob, and that is
what $0 returns if I look at each record). That is why I was
trying to apply the regexp to the whole message: I want to
match my search string without knowing how many other |-separated
fields exist before it.
Hope I explained myself.
Thanks.
| From | Janis Papanagnou <janis_papanagnou@hotmail.com> |
|---|---|
| Date | 2011-04-06 23:09 +0200 |
| Message-ID | <inikr7$abg$1@news.m-online.net> |
| In reply to | #129 |
On 06.04.2011 21:25, gio001 wrote:
> On Apr 6, 1:24 pm, Janis Papanagnou <janis_papanag...@hotmail.com>
> wrote:
>> On 06.04.2011 16:40, gio001 wrote:
>>
>>> Hello,
>>> I have thousand of messages (HL7),
>>
>> Provide sample data for a few HL7 records.
(This request has meanwhile been repeated a couple times.)
[...]
>
> I perfectly understand what you are suggesting, I tried this form in
> vain:
> awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
> yet the same exact costruct works in
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
You are repeating yourself, thus completely ignoring what I suggested
to clarify your task.
1. Please don't try to give your cryptic tries without explaining what you
actually want to achieve.
2. Provide sample data for a few HL7 records.
[snip of irrelevant repetitions]
> Hope I explained myself.
Nope. (See above.)
> Thanks.
Good luck.
| From | gio001 <gcrippa@gmail.com> |
|---|---|
| Date | 2011-04-06 19:59 -0700 |
| Message-ID | <6be0ebd4-c6d6-4dbd-8665-f01ae8142545@l11g2000yqb.googlegroups.com> |
| In reply to | #130 |
OK, sorry for the delay. Here are some sample records; each record
starts with an MSH. Let me know if you need additional info.
From one MSH to the next MSH is a single message, inside of which I am
looking for field 18 after the PV1 tag. I want to examine that field's
content and select only the messages that have a particular value in
that field (in this particular case a capital letter 'O'). I am therefore
trying to use awk in the format /pattern/ { action }, and more
specifically:
awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile
Here are 4 messages from this group; I would like only the 2nd message to
be written to outfile:
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985590|P|2.4^MPID|||000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|133445567^MPV1|||LWK^^|||||||LWK||||||||X|000004143354|||||||||||||||||||||||||201103301530||||||^MORC|SC|00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||00001051|12312333|7777777^Test for exam one|||201103301533|||||||201104061535|^^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301553^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||O|000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|^MOBR||00001001|766666676|7777777^Test for exam one|||201104011523|||||||201104061535|^^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980290|P|2.4^MPID|||000000121212||JIHNSON^JUDITH^B||19410423|F||||||||||000083755555|133445567^MPV1|||LWH^^|||||||LWH||||||||X|000004455554|||||||||||||||||||||||||201103301530||||||^MORC|SC|00001001|1|760666666|IP||||201104061533|||087677^TESTDR^NAMEDR|^MOBR||00001001|760222266|7777777^Test for exam one|||201103301533|||||||201104061535|^^^|087677^TESTDR^NAMEDR||||||||NCHM|||^^^201103301533^^A|~~~
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22985594|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||R|000004155555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|^MOBR||00001001|761234676|7777777^Test for exam one|||201104011523|||||||201104061535|^^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
| From | Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> |
|---|---|
| Date | 2011-04-07 07:32 +0000 |
| Message-ID | <4d9d6886$0$5892$426a74cc@news.free.fr> |
| In reply to | #131 |
Wed, 06 Apr 2011 19:59:03 -0700, gio001 did cat :
> Ok, sorry for the delay, here are some sample records, each record
> starts with an MSH, let me know if you need additional info. From one
> MSH to the next MSH is a single message inside of which I am looking for
> field 18 after the PV1 tag to examine that field content and only select
> message that satisfy a particular value in that field (in this
> particular case a capital letter 'O'), I am therefore trying to use awk
> in the format of /pattern/ { action } and more specifically:
>
> awk '/PV1\|\([^|]*\|\)\{16\}\|O/ {print $0}}' infile > outfile
I supposed that, as you stated in a previous post, the records
(which you called "segments") are separated by an "x0d", if so:
--------
$ awk '$49=="O"' FS='|' yourfile
MSH|^~\&|LAB|ABC||DEF|201104061535||ORM^O01|22980255|P|2.4^MPID|||000000665544||STATON^JOHN^C||19490625|M||||||||||000083764444|032225567^MPV1|||LMK^^|||||||LMK||||||||O|000004455555|||||||||||||||||||||||||201104011521||||||^MORC|SC|00001001|1|760604676|IP||||201104061524|||114233^MISTER JR^MOLLY|^MOBR||00001001|766666676|7777777^Test for exam one|||201104011523|||||||201104061535|^^^|114233^MISTER JR^MOLLY||||||||NCHM|||^^^201104011523^^A|~~~
--------
Note that your flag might be the 31st field then:
--------
$ awk '{print $31}' FS='|' yourfile
133445567^MPV1
032225567^MPV1
133445567^MPV1
032225567^MPV1
--------
then you may like to lock it up so:
--------
$ awk '$31~/PV1$/&&$49=="O"' FS='|' yourfile
--------
(
And if your records/segments don't go that way please post one "segment"
pushed through 'od -Ad -c' (or 'xxd' if you prefer)
)
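For anyone unsure what such a dump looks like, here is a minimal sketch on a made-up two-segment record, assuming the segments are joined by a CR byte as described earlier in the thread. The record content is hypothetical:

```shell
# A made-up record: two segments joined by CR (octal 015), ended by a newline.
dump=$(printf 'MSH|a\rPV1|b\n' | od -Ad -c)
echo "$dump"
```

In the od -c output a CR shows up as \r and a newline as \n, which settles what actually separates segments and records in the file.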
| From | gio001 <gcrippa@gmail.com> |
|---|---|
| Date | 2011-04-07 05:06 -0700 |
| Message-ID | <a99d6877-ce45-43f2-8205-b9637d45f1f1@r3g2000yqh.googlegroups.com> |
| In reply to | #132 |
Thanks for the help and the suggestions.
Yet I would like to be able to use the awk regexp pattern syntax to
achieve the same result and not have to count manually where I need to
match in my records.
The pattern in the regexp will make it flexible enough that if even my
layout before the pattern to be matched changes I can still use the
regexp to look for my match.
Thanks again.
| From | Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> |
|---|---|
| Date | 2011-04-07 13:34 +0000 |
| Message-ID | <4d9dbd7a$0$20837$426a34cc@news.free.fr> |
| In reply to | #133 |
Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :
> Thanks for the help and the suggestions. Yet I would like to be able to
> use the awk regexp pattern syntax to achieve the same result and not
> have to count manually where I need to match in my records.
> The pattern in the regexp will make it flexible enough that if even my
> layout before the pattern to be matched changes I can still use the
> regexp to look for my match.
OK, I understand your idea :-) Still, as the expression you gave doesn't
work on my installations (neither in grep nor in awk), you'd be well advised
to read Geoff Clare's and Aharon Robbins's suggestions :D)
Just for completeness, an algorithmic version of your expression could be
quite flexible too; here's a possible version, adapted to the data you gave:
--------
$ awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
--------
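A more spelled-out variation on the same idea (not the one-liner above, and only a sketch): split each message into segments on CR, then split the PV1 segment on |, so nothing has to be counted across the whole message. It assumes one message per line with CR between segments. The file name sample_hl7.txt, the variable want and the sample data are made up for illustration:

```shell
# Two made-up messages: segments joined by CR, one message per line.
printf 'MSH|a|b\rPID|x\rPV1|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|O|z\n' >  sample_hl7.txt
printf 'MSH|a|b\rPID|y\rPV1|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|X|z\n' >> sample_hl7.txt

matched=$(awk -v want=O '
{
    n = split($0, seg, "\r")              # one array element per segment
    for (s = 1; s <= n; s++) {
        split(seg[s], f, "|")             # pieces of this segment
        # f[1] is the segment tag, so PV1.18 is the 19th piece.
        if (f[1] == "PV1" && f[19] == want) {
            print                         # print the whole message
            break
        }
    }
}' sample_hl7.txt)
echo "$matched"
```

Because the PV1 segment is located by its tag, the check keeps working if the number of fields in the segments before it changes.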
| From | gio001 <gcrippa@gmail.com> |
|---|---|
| Date | 2011-04-07 08:27 -0700 |
| Message-ID | <04625524-8fec-4c2e-a4ca-21ea748d7961@v16g2000vbq.googlegroups.com> |
| In reply to | #136 |
On Apr 7, 9:34 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
wrote:
> Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :
>
> > Thanks for the help and the suggestions. Yet I would like to be able to
> > use the awk regexp pattern syntax to achieve the same result and not
> > have to count manually where I need to match in my records.
> > The pattern in the regexp will make it flexible enough that if even my
> > layout before the pattern to be matched changes I can still use the
> > regexp to look for my match.
>
> OK, I understand your idea :-) Still, as the expression you gave doesn't
> work on my installations (neither in grep nor in awk) you'll be advised
> to read Geoff Clare and Aharon Robbins suggestions :D)
>
> Just for completion, an algorithmical version of your exp could be
> quite flexible too, here's a possible version, adapted to the data you gave:
> --------
> $ awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
> --------
Thanks to all,
because of your help I have a better understanding of the issue and have
corrected it; it now works properly :-). What I needed was to escape
only the | in my awk statement, like this:
awk '/PV1\|([^\|]*\|){16}\|X/ {print $0}' infile > outfile
Loki, when I execute:
awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
I get a message about event i not being defined, so I removed that part
and ran:
awk '{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
which seems to generate the proper output. Was that section necessary
for some other purpose, and if so, how can I resolve the i event
message? Should I initialize i in the BEGIN section?
Again, thanks to Geoff Clare and Aharon Robbins and Loki Harfagr and
everyone else who contributed with their input!
| From | Loki Harfagr <l0k1@thedarkdesign.free.fr.INVALID> |
|---|---|
| Date | 2011-04-07 18:32 +0000 |
| Message-ID | <pan.2011.04.07.18.32.21@thedarkdesign.free.fr.INVALID> |
| In reply to | #137 |
Thu, 07 Apr 2011 08:27:56 -0700, gio001 did cat :
> On Apr 7, 9:34 am, Loki Harfagr <l...@thedarkdesign.free.fr.INVALID>
> wrote:
>> Thu, 07 Apr 2011 05:06:48 -0700, gio001 did cat :
>> > Thanks for the help and the suggestions. Yet I would like to be able
>> > to use the awk regexp pattern syntax to achieve the same result and
>> > not have to count manually where I need to match in my records. The
>> > pattern in the regexp will make it flexible enough that if even my
>> > layout before the pattern to be matched changes I can still use the
>> > regexp to look for my match.
>>
>> OK, I understand your idea :-) Still, as the expression you gave
>> doesn't work on my installations (neither in grep nor in awk) you'll be
>> advised to read Geoff Clare and Aharon Robbins suggestions :D)
>>
>> Just for completion, an algorithmical version of your exp could be
>> quite flexible too, here's a possible version, adapted to the data you
>> gave:
>> --------
>> $ awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
>> --------
>
> Thanks to all,
> because of your help I have a better understanding of the issue and I
> have it corrected .... it now works properly :-), what I needed was the
> escaping of | (only) in my awk statement, like this:
> awk '/PV1\|([^\|]*\|){16}\|X/ {print $0}' infile > outfile
>
> Loki, when I execute:
> awk '!i&&/PV1/{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
> I am getting a message about event i not defined,
This is typical of a double quote (") being used instead of a single
quote (') around the program. In some shells (some environments), though,
it may be necessary to protect the code from the shell by putting it in a
script file instead of launching it on the command line. That is quite
common in MS Windows shells and in real shells running on MS Windows, but
I have never seen it on AIX (then again, my last AIX was a 4.3.1 or 4.3.2,
so I guess a few things may have rotted since ;-)
> so I removed and ran
> like:
> awk '{for(i=NF;i;i--){if($i~/PV1$/)break}}$i~/PV1$/&&$(i+18)=="O"' FS='|' yourfile
> which seems to generate the proper output, was that section necessary
> for some other purpose, and if so how can I resolve the i event message,
> should I initialize i in the BEGIN section?
The !i test is there just to ensure that only the first record containing
a field that matches /PV1$/ is scanned to get the value of i. It is
essentially an optimization; as such, it can also have adverse and/or
beneficial side effects, depending on the 'solidity' of your input
data ;-)
> Again, thanks to Geoff Clare and Aharon Robbins and Loki Harfagr and
> everyone else who contributed with their input!
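
A minimal sketch of the field-scan approach discussed above, written out as a script rather than a one-liner. The two sample records are hypothetical, shortened stand-ins for the HL7 data in the thread, and the function names sample and select_o are made up for illustration. The test is spelled "i == 0" instead of "!i": if the "event" message came from the shell's history expansion of "!" (an assumption, the thread does not say which shell was in use), this spelling avoids the character altogether. The extra "i > 0" guard is also an addition, not part of the original one-liner.

```shell
#!/bin/sh
# Hypothetical, shortened records: the PV1 field is field 3 here, and the
# flag sits 18 fields after it (field 21), as in the sample data posted.
sample() {
    printf '%s\n' \
        'MSH|A|1^MPV1|||LWK^^|||||||LWK||||||||X|tail' \
        'MSH|B|2^MPV1|||LMK^^|||||||LMK||||||||O|tail'
}

# select_o is a made-up name; it reads records on stdin.
select_o() {
    awk -F'|' '
    # Find the index of the field ending in "PV1", once, on the first
    # record that contains it ("i == 0" plays the role of "!i").
    i == 0 && /PV1/ {
        for (i = NF; i > 0; i--)
            if ($i ~ /PV1$/) break
    }
    # Print records whose 18th field after the PV1 field is exactly "O".
    i > 0 && $i ~ /PV1$/ && $(i + 18) == "O"
    '
}

sample | select_o
```

Because i is computed only once, this relies on the PV1 field sitting at the same position in every record; dropping the "i == 0" test makes awk rescan each record, which is slower but tolerates a varying layout.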
| From | Geoff Clare <geoff@clare.See-My-Signature.invalid> |
|---|---|
| Date | 2011-04-07 13:34 +0100 |
| Message-ID | <qjn078-7j2.ln1@leafnode-msgid.gclare.org.uk> |
| In reply to | #129 |
gio001 wrote:
> I perfectly understand what you are suggesting, I tried this form in
> vain:
> awk '/PV1\|\([^|]*\|\)\{16\}\|X/ {print $0}}' infile > outfile
> yet the same exact construct works in
> grep 'PV1\|\([^|]*\|\)\{16\}\|X' infile > outfile
Without the -E option grep uses "basic regular expressions" (BREs),
whereas awk uses "extended regular expressions" (EREs). There are
some characters that are special in EREs but not in BREs. Some of
these can be made special in BREs by preceding them with a
backslash, but in EREs doing that makes them not special.
The ERE equivalent of the BRE you used with grep is:
PV1|([^|]*|){16}|X
if your version of grep treats \| as special, or:
PV1\|([^|]*\|){16}\|X
if your version of grep does not treat \| as special.
--
Geoff Clare <netnews@gclare.org.uk>
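
To illustrate the point about EREs, here is a small sketch that feeds the second ERE form (literal pipes escaped, grouping and interval unescaped) to both grep -E and awk. The records are hypothetical, shortened stand-ins for the posted data, the flag letter is O rather than X, and the function names are made up. It assumes an awk that supports interval expressions; on one that does not, the awk line would simply print nothing.

```shell
#!/bin/sh
# Hypothetical, shortened records; only the second one has "O" as the
# 18th field after the PV1 field.
sample() {
    printf '%s\n' \
        'MSH|A|1^MPV1|||LWK^^|||||||LWK||||||||X|tail' \
        'MSH|B|2^MPV1|||LMK^^|||||||LMK||||||||O|tail'
}

# Same ERE in both tools: \| is a literal pipe, ( ) groups, {16} repeats.
match_grep() { grep -E 'PV1\|([^|]*\|){16}\|O'; }
match_awk()  { awk '/PV1\|([^|]*\|){16}\|O/'; }

sample | match_grep
sample | match_awk
```

With plain grep (no -E) the roles of the backslashes are reversed, which is why a pattern copied unchanged from grep into awk behaves differently.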
| From | arnold@skeeve.com (Aharon Robbins) |
|---|---|
| Date | 2011-04-07 13:31 +0000 |
| Message-ID | <inkeb1$2pi$1@tornado.tornevall.net> |
| In reply to | #134 |
In article <qjn078-7j2.ln1@leafnode-msgid.gclare.org.uk>,
Geoff Clare <geoff@clare.See-My-Signature.invalid> wrote:
>The ERE equivalent of the BRE you used with grep is:
>
>PV1|([^|]*|){16}|X
It may be that awk on AIX does not support interval expressions
(x{16}); this should be tested too.
--
Aharon (Arnold) Robbins arnold AT skeeve DOT com
P.O. Box 354 Home Phone: +972 8 979-0381
Nof Ayalon Cell Phone: +972 50 729-7545
D.N. Shimshon 99785 ISRAEL
| From | Geoff Clare <geoff@clare.See-My-Signature.invalid> |
|---|---|
| Date | 2011-04-08 13:18 +0100 |
| Message-ID | <11b378-vho.ln1@leafnode-msgid.gclare.org.uk> |
| In reply to | #135 |
Aharon Robbins wrote:
> Geoff Clare <geoff@clare.See-My-Signature.invalid> wrote:
>>The ERE equivalent of the BRE you used with grep is:
>>
>>PV1|([^|]*|){16}|X
>
> It may be that awk on AIX does not support interval expressions
> (x{16}); this should be tested too.
They are required by the POSIX/UNIX standard, and AIX is certified
as conforming.
I suppose it's possible that they are not supported by default in
awk and you need to do something to set up a conforming environment,
but I think that's unlikely.
--
Geoff Clare <netnews@gclare.org.uk>
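
A quick way to check whether a given awk honours interval expressions is to try one on a known string. This is only a sketch: has_intervals is a made-up helper name, and its optional argument is whichever awk command you want to probe. As far as I know, gawk releases before 4.0 needed --re-interval or --posix for this, so a "no" from an old gawk does not necessarily mean the feature is missing.

```shell
#!/bin/sh
# has_intervals is a hypothetical helper; $1 is the awk command to probe
# (defaults to "awk"). It prints "yes" if /^a{3}$/ matches "aaa".
has_intervals() {
    if printf 'aaa\n' | "${1:-awk}" '/^a{3}$/ { ok = 1 } END { exit !ok }'
    then echo yes
    else echo no
    fi
}

has_intervals awk
```

An awk without interval support treats the braces as ordinary characters, so the pattern fails to match "aaa" and the helper prints "no".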