| Newsgroups | comp.lang.awk |
|---|---|
| From | j.eh@mchsi.com |
| Subject | Re: getline timeout (revisited) |
| References | (1 earlier) <CsqdnTvy49IoCjDQnZ2dnUVZ_t2dnZ2d@mchsi.com> <ioka93$13e$1@news.m-online.net> <xpKdne1Tg-IuVTPQnZ2dnUVZ_hCdnZ2d@mchsi.com> <iomj2j$j9o$1@speranza.aioe.org> <s-idnXrjvPkiQDPQnZ2dnUVZ_u-dnZ2d@mchsi.com> |
| Message-ID | <3K-dneqSL9hwly3QnZ2dnUVZ_rCdnZ2d@mchsi.com> |
| Date | 2011-04-21 05:46 -0500 |
In article <s-idnXrjvPkiQDPQnZ2dnUVZ_u-dnZ2d@mchsi.com>, j.eh@mchsi.com wrote:
> In article <iomj2j$j9o$1@speranza.aioe.org>, Janis Papanagnou wrote:
>> On 20.04.2011 13:51, j.eh@mchsi.com wrote:
>>> In article<ioka93$13e$1@news.m-online.net>, Janis Papanagnou wrote:
>>>> On 19.04.2011 16:09, j.eh@mchsi.com wrote:
>>>>> In article<inmun6$lq1$1@speranza.aioe.org>, Janis Papanagnou wrote:
>>>>>> I'm currently looking for a timeout option for getline in
>>>>>> the context of an /inet/tcp/... socket communication with gawk.
>>>>>>
>>>>>> This topic had already been addressed here in c.l.a many many
>>>>>> years ago, as my google search showed, but I haven't found any
>>>>>> positive answers. Has there been something incorporated in gawk
>>>>>> or xgawk, meanwhile, or is the status unchanged? I suppose the
>>>>>> latter, but asking for a confirmation anyway.
>>>>>
>>>>> [ Janis, I hit the wrong button. My intention was to post to
>>>>> the group, not to send a personal email. Sorry about that. ]
>>>>
>>>> Don't worry; in my mailbox only spam isn't welcome. :-)
>>>>
>>>>>
>>>>>
>>>>> I find the discussions in the old threads interesting, but I have
>>>>> a question. Let's assume we use PROCINFO to specify timeout like
>>>>> this:
>>>>>
>>>>> PROCINFO["/inet/tcp/..", "TIMEOUT"] = 1000 (ms)
>>>>>
>>>>> and a value of 0 means the default behavior i.e. no timeout.
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> Also, assume the existence of a builtin either in the gawk source
>>>>> or in an extension:
>>>>>
>>>>> readline("/inet/tcp/..")
>>>>
>>>> Hmm.. - why a new function or builtin 'readline()' and not extending
>>>> the functionality of 'getline'? (Shouldn't conflict given the PROCINFO
>>>> approach.)
>>>
>>> I don't know what extensions of 'getline' (or 'RS', or any other
>>> builtin feature of the language) functionality will be required,
>>> but all I know is that if those are in conflict with the required
>>> semantics and rules of the language, Arnold simply isn't going to
>>> accept it. So, one has to prove that the required functionality can't be
>>> provided by a separate builtin, and/or any proposed changes to 'getline'
>>> functionality aren't going to violate the rules of the language etc. etc.
>>> Besides, a separate function provides a lot more flexibility; just
>>> consider:
>>>
>>> readline("/inet/tcp/...", var, timeout)
>>>
>>> 'var' is passed by reference. You won't need PROCINFO or an environment
>>> variable for timeout specification, and won't even be limited to
>>> returning only -1, 0 or 1.
>>
>> Yes, if you can easily avoid PROCINFO that would be fine.
>> But if, as an alternative, you'd introduce a new function
>> that just implements 99% of an existing language construct;
>> that wouldn't be my preferred choice.
>
> Not mine either, but our hands are tied by the all powerful
> standard. See more on this w.r.t. the return value below.
>
>>
>>>
>>>>
>>>>>
>>>>> which other than being a function call behaves exactly like getline
>>>>> w.r.t. RS, RT and setting fields. We modify 'readline' to handle
>>>>> timeouts any way you think is suitable to serve our purpose. The question
>>>>> is then whether a script using 'readline' is going to look any different
>>>>> from one that uses getline with exactly the same
>>>>> modifications.
>>>>
>>>> I don't understand what you're saying above, and what you're aiming at.
>>>>
>>>
>>> If the use of 'getline' does not simplify things in the script compared
>>> to a new function, then why bother trying to extend 'getline'?
>>
>> What I would least like to see are half a dozen new functions,
>> *if* we can use the existing interface without breaking anything,
>> and if it fits nicely (as I think it does) in the existing concepts.
>>
>>>
>>> Ok, let's just stick with getline and see if it can provide a
>>> general solution;
>>
>> (LOL - you certainly cannot :-)
>>
>>> A solution specific to a particular problem
>>> at hand isn't going to be good enough. The following example from
>>> gawk-inet documentation, I believe, illustrates the most general
>>> usage of getline. There isn't any need to throw input file and
>>> related pattern/action, or even fields into the mix.
>>>
>>> BEGIN {
>>>   RS = ORS = "\r\n"
>>>   HttpService = "/inet/tcp/0/proxy/80"
>>>   print "GET http://www.yahoo.com" |& HttpService
>>>   PROCINFO[HttpService, "TIMEOUT"] = 1000
>>>   while ((HttpService |& getline var) > 0)
>>>     print var
>>>   close(HttpService)
>>> }
>>
>> Change the while() loop to something more appropriate.
>>
>> while (whatever_condition) {
>>   if ((Service |& getline var) >= 0) do_sth_w( var )
>>   else report_a_provided_error_by_any_means();
>>   # or distinguish return values <0 and ==0
>> }
>
> All I can say is that if you replace 'getline' with the
> imaginary 'readline' builtin, the structure will remain
> the same; There is no loss from a user's perspective.
>
> On a related note, someone had a wish for the ability to specify
> a two-way process name as a file name in the gawk command line in
> a thread titled "Dreamer's Wishlist". I don't know what exactly
> he had in mind, but I am assuming the intention was to avoid getline
> altogether to read from the socket like this:
>
> /pat/ { print $0 }
>
> 'pat' is matched against data read from the socket.
>
> This obviously can't be done with a new builtin, and
> will need help, and a lot of it, from gawk in terms
> of how it reads data from a file.
>
>>
>>>
>>> I added the line with the PROCINFO entry, and used 'getline var'
>>> instead of 'getline'.
>>
>> (The latter doesn't contribute to the question.)
>>
>>> What other changes are needed so that we still
>>> get the desired output with timeouts? If the answer is none whatsoever,
>>> then we probably won't need to consider the following:
>>>
>>> 1. Dealing with partial output in case of a timeout.
>>
>> Be aware that you *need* some channel to pass an error from
>> the underlying OS or language level anyway!
>>
>> Getline will fill $0 or the provided var, so the user can
>> decide what to do with the (partial or not) data.
>>
>>> 2. Handling of non-recoverable errors. What is getline
>>> supposed to return if a timeout occurs?
>>
>> An error indication (" <0 ") and some hint WRT the error;
>> this can be a coded error number ("<0" is differentiated
>> into "-1","-2",...,"-n") and/or some error text (any channel
>> to provide that is necessary; PROCINFO, predefined variable,
>> ...).
>
> This may seem trivial, but right there it throws out the possibility
> of using getline, I think. This is what is in the spec. for getline:
>
> "getline shall return 1 for successful input, zero for end-of-file,
> and -1 for an error"
>
> Not even sure if it is the right thing to do for a function call.

Here is an idea:

Let gawk update PROCINFO["..", "TIMEOUT"] to indicate how much
time is left (a la select() on some OSes), and set it to a
negative value in case of a non-recoverable error.

Status of the read operation, and relation to other variables:

status      PROCINFO[..,"TIMEOUT"]   retval   length(var or $0)
----------------------------------------------------------------
Normal      > 0                      > 0      >= 0
EOF         < 0 *                    0        0
timed out   == 0                     -1       0
Error       < 0                      -1       0

* Not necessary, but useful from the user's POV.

The general case usage can be something like this:

    ...
    Service = "/inet/tcp/0/x.y.z/p"
    print "..." |& Service
    do {
        PROCINFO[Service, "TIMEOUT"] = 1000
        if ((Service |& getline) > 0)
            print $0
    } while (PROCINFO[Service, "TIMEOUT"] >= 0)
    if (PROCINFO[Service, "TIMEOUT"] < 0)    # EOF or error
        close(Service);
    print("...") |& Service
    # Read some more, with or without timeout
    ...

If this makes any sense at all, may I claim that I killed
two birds with one stone? Most likely it doesn't; not
enough caffeine yet, and my head hurts thinking about it.

Thanks,

John

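
The status table in the post above can be read as a dispatch on two values: the getline return value and the time left in the proposed PROCINFO[.., "TIMEOUT"] entry. The sketch below only models that table and does no network I/O. The TIMEOUT entry is the proposal under discussion, not an existing gawk feature, and the names read_status, ret and left are made up for illustration. The awk code is wrapped in a shell function so it can be tried from the command line.

```sh
# read_status RETVAL TIME_LEFT
# Hypothetical helper: maps a getline return value and the remaining
# time in the proposed PROCINFO[.., "TIMEOUT"] entry to a status name,
# following the rows of the status table in the post.
read_status() {
    awk -v ret="$1" -v left="$2" 'BEGIN {
        ret += 0; left += 0                 # force numeric comparison
        if (ret > 0 && left > 0)            s = "normal"
        else if (ret == 0)                  s = "eof"
        else if (ret < 0 && left == 0)      s = "timeout"
        else if (ret < 0 && left < 0)       s = "error"
        else                                s = "unknown"   # not in the table
        print s
    }'
}

read_status 1 400     # prints "normal"
read_status 0 -1      # prints "eof"
read_status -1 0      # prints "timeout"
read_status -1 -1     # prints "error"
```

Note that "timeout" and "error" share the return value -1, so under this proposal a script would have to look at the TIMEOUT entry, not at the return value, to tell them apart.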
>
>>
>>> 3. Making sure there is a finite number of retries in the
>>> event of timeouts.
>>
>> On application level see "whatever_condition" in my example
>> above. On awk/library level see Arnold's reply upthread.
>
> I believe those environment variables relate to initial connect(),
> not read() or write(), but I have to double check.
>
> John
>
>
>
>>
>> Janis
>>
>>> 4. ?
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>>> If possible, please consider providing the outline
>>>>> of such a script illustrating the usage of 'readline' or getline
>>>>> with timeout.
>>>>
>>>> In my specific primitive application - which is by no means meant to be
>>>> a general example, covering all needs, or considering any corner cases -
>>>> it was just something like (simplified code)...
>>>>
>>>> BEGIN { P = "/inet/tcp/0/a.b.c.d/e" }
>>>> { print some_funct($0) |& P }
>>>> /pat/ { P |& getline ; do_sth_w($0) }
>>>> END { close(P) }
>>>>
>>>> which might, with timeouts, morph to something like...
>>>>
>>>> BEGIN { P = "/inet/tcp/0/a.b.c.d/e"
>>>> PROCINFO[ P, "TIMEOUT"] = 1000
>>>> }
>>>>
>>>> { print some_funct($0) |& P }
>>>>
>>>> /pat/ { if( (P |& getline)> 0) do_sth_w($0)
>>>> else print "Error:", PROCINFO[ P, "ERROR"]
>>>> }
>>>>
>>>> END { close(P) }
>>>>
>>>>
>>>> Using PROCINFO as a means to return the error is just an ad hoc thought
>>>> based on your timeout setting example. (So feel free to ignore it and
>>>> introduce something else.)
>>>>
>>>>> To keep things uniform, assume we parse ERRNO to
>>>>> find out the cause of error. I'm not very gifted when it comes to writing
>>>>> awk scripts that don't look like C, so I hope you understand my
>>>>> request in this context.
>>>>
>>>> If I've misunderstood your request, please clarify.
>>>>
>>>> Thanks.
>>>>
>>>> Janis
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Janis
>>>>
>>
getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-08 14:23 +0200
Re: getline timeout (revisited) arnold@skeeve.com (Aharon Robbins) - 2011-04-10 18:49 +0000
Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-19 09:09 -0500
Re: getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-19 17:38 +0200
Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-20 06:51 -0500
Re: getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-20 14:20 +0200
Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-20 08:21 -0500
Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-21 05:46 -0500
Re: getline timeout (revisited) arnold@skeeve.com (Aharon Robbins) - 2011-04-22 13:55 +0000
Re: getline timeout (revisited) Grant <omg@grrr.id.au> - 2011-04-23 06:50 +1000