


Re: getline timeout (revisited)

Newsgroups comp.lang.awk
From j.eh@mchsi.com
Subject Re: getline timeout (revisited)
References (1 earlier) <CsqdnTvy49IoCjDQnZ2dnUVZ_t2dnZ2d@mchsi.com> <ioka93$13e$1@news.m-online.net> <xpKdne1Tg-IuVTPQnZ2dnUVZ_hCdnZ2d@mchsi.com> <iomj2j$j9o$1@speranza.aioe.org> <s-idnXrjvPkiQDPQnZ2dnUVZ_u-dnZ2d@mchsi.com>
Message-ID <3K-dneqSL9hwly3QnZ2dnUVZ_rCdnZ2d@mchsi.com>
Date 2011-04-21 05:46 -0500



In article <s-idnXrjvPkiQDPQnZ2dnUVZ_u-dnZ2d@mchsi.com>, j.eh@mchsi.com wrote:
> In article <iomj2j$j9o$1@speranza.aioe.org>, Janis Papanagnou wrote:
>> On 20.04.2011 13:51, j.eh@mchsi.com wrote:
>>> In article<ioka93$13e$1@news.m-online.net>, Janis Papanagnou wrote:
>>>> On 19.04.2011 16:09, j.eh@mchsi.com wrote:
>>>>> In article<inmun6$lq1$1@speranza.aioe.org>, Janis Papanagnou wrote:
>>>>>> I've been looking for a timeout option for getline in
>>>>>> the context of an /inet/tcp/... socket communication with gawk.
>>>>>>
>>>>>> This topic had already been addressed here in c.l.a many, many
>>>>>> years ago, as my Google search showed, but I haven't found any
>>>>>> positive answers. Has something been incorporated into gawk
>>>>>> or xgawk in the meantime, or is the status unchanged? I suppose the
>>>>>> latter, but I'm asking for confirmation anyway.
>>>>>
>>>>> [ Janis, I hit the wrong button. My intention was to post to
>>>>> the group, not to send a personal email. Sorry about that. ]
>>>>
>>>> Don't worry; in my mailbox only spam isn't welcome. :-)
>>>>
>>>>>
>>>>>
>>>>> I find the discussions in the old threads interesting, but I have
>>>>> a question. Let's assume we use PROCINFO to specify timeout like
>>>>> this:
>>>>>
>>>>>       PROCINFO["/inet/tcp/..", "TIMEOUT"] = 1000 (ms)
>>>>>
>>>>> and a value of 0 means the default behavior i.e. no timeout.
>>>>
>>>> Yes.
>>>>
>>>>>
>>>>> Also, assume the existence of a builtin either in the gawk source
>>>>> or in an extension:
>>>>>
>>>>>       readline("/inet/tcp/..")
>>>>
>>>> Hmm.. - why a new function or builtin 'readline()' and not extending
>>>> the functionality of 'getline'? (Shouldn't conflict given the PROCINFO
>>>> approach.)
>>>
>>> I don't know what extensions of 'getline' (or 'RS', or any other
>>> builtin feature of the language) will be required, but I do know
>>> that if they conflict with the required semantics and rules of the
>>> language, Arnold isn't simply going to accept them. So one has to
>>> show that the required functionality can't be provided by a separate
>>> builtin, and/or that any proposed changes to 'getline' aren't going
>>> to violate the rules of the language, etc.
>>> Besides, a separate function provides a lot more flexibility; just
>>> consider:
>>>
>>>    readline("/inet/tcp/...", var, timeout)
>>>
>>> 'var' is pass by reference. You won't need PROCINFO or environment
>>> variable for timeout specification, and won't even be limited to
>>> returning only -1, 0 or 1.
>> 
>> Yes, if you can easily avoid PROCINFO that would be fine.
>> But if, as an alternative, you'd introduce a new function
>> that just implements 99% of an existing language construct,
>> that wouldn't be my preferred choice.
> 
> Not mine either, but our hands are tied by the all powerful
> standard. See more on this w.r.t. the return value below.
> 
>> 
>>>
>>>>
>>>>>
>>>>> which, other than being a function call, behaves exactly like getline
>>>>> w.r.t. RS, RT and setting fields. We modify 'readline' to handle
>>>>> timeouts any way you think is suitable for our purpose. The question
>>>>> is then whether a script using 'readline' is going to look any different
>>>>> from one that uses getline with exactly the same
>>>>> modifications.
>>>>
>>>> I don't understand what you're saying above, and what you're aiming at.
>>>>
>>>
>>> If the use of 'getline' does not simplify things in the script compared
>>> to a new function, then why bother trying to extend 'getline'?
>> 
>> What I least would like to see are half a dozen new functions,
>> *if* we can use the existing interface without breaking anything,
>> and if it fits nicely (as I think it does) in the existing concepts.
>> 
>>>
>>> Ok, let's just stick with getline and see if it can provide a
>>> general solution;
>> 
>> (LOL - you certainly cannot :-)
>> 
>>> A solution specific to a particular problem
>>> at hand isn't going to be good enough. The following example from the
>>> gawk-inet documentation, I believe, illustrates the most general
>>> usage of getline. There isn't any need to throw an input file and
>>> related pattern/actions, or even fields, into the mix.
>>>
>>>       BEGIN {
>>>         RS = ORS = "\r\n"
>>>         HttpService = "/inet/tcp/0/proxy/80"
>>>         print "GET http://www.yahoo.com" |&  HttpService
>>>         PROCINFO[HttpService, "TIMEOUT"] = 1000
>>>         while ((HttpService |&  getline var)>  0)
>>>            print var
>>>         close(HttpService)
>>>       }
>> 
>> Change the while() loop to something more appropriate.
>> 
>>    while (whatever_condition) {
>>      if ((Service |& getline var) >= 0) do_sth_w( var )
>>      else report_a_provided_error_by_any_means();
>>      # or distinguish return values <0 and ==0
>>    }
> 
> All I can say is that if you replace 'getline' with the
> imaginary 'readline' builtin, the structure will remain
> the same; there is no loss from a user's perspective.
> 
> On a related note, someone had a wish for the ability to specify
> a two-way process name as a file name on the gawk command line in
> a thread titled "Dreamer's Wishlist". I don't know what exactly
> he had in mind, but I am assuming the intention was to avoid getline
> altogether to read from the socket like this:
> 
> /pat/ { print $0 }
> 
> 'pat' is matched against data read from the socket.
> 
> This obviously can't be done with a new builtin, and
> will need help, and a lot of it, from gawk in terms
> of how it reads data from a file.
> 
>> 
>>>
>>> I added the line with the PROCINFO entry, and used 'getline var'
>>> instead of 'getline'.
>> 
>> (The latter doesn't contribute to the question.)
>> 
>>> What other changes are needed so that we still
>>> get the desired output with timeouts? If the answer is none whatsoever,
>>> then we probably won't need to consider the following:
>>>
>>> 1. Dealing with partial output in case of a timeout.
>> 
>> Be aware that you *need* some channel to pass an error from
>> the underlying OS or language level anyway!
>> 
>> Getline will fill $0 or the provided var, so the user can
>> decide what to do with the (partial or not) data.
>> 
>>> 2. Handling of non-recoverable errors. What is getline
>>> supposed to return if a timeout occurs?
>> 
>> An error indication (" <0 ") and some hint WRT the error;
>> this can be a coded error number ("<0" is differentiated
>> into "-1","-2",...,"-n") and/or some error text (any channel
>> to provide that is necessary; PROCINFO, predefined variable,
>> ...).
> 
> This may seem trivial, but right there it throws out the possibility
> of using getline, I think. This is what the spec says about getline:
> 
> "getline shall return 1 for successful input, zero for end-of-file,
> and -1 for an error"
> 
> I'm not even sure it is the right thing to do for a function call.
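The three-valued contract quoted above is easy to observe with plain file input; no sockets are needed. The sketch below assumes any POSIX awk. getline_result() is a hypothetical helper name. /dev/null supplies an immediate end-of-file, and the missing path is only a placeholder that should not exist. The point it illustrates is that each of the three return values already has a meaning.

```awk
# Hypothetical helper: map the POSIX getline return value to its meaning.
#   1 = a record was read, 0 = end-of-file, -1 = error (e.g. cannot open)
function getline_result(file,    line, rc) {
    rc = (getline line < file)
    close(file)                      # so the next call starts from the top
    if (rc > 0)  return "input"
    if (rc == 0) return "eof"
    return "error"
}

BEGIN {
    print getline_result("/dev/null")            # end-of-file right away
    print getline_result("/no/such/file/here")   # cannot be opened
}
```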

Here is an idea:

Let gawk update PROCINFO["..", "TIMEOUT"] to indicate how much
time is left (as select() does on some OSes), and set it to a
negative value in case of a non-recoverable error.

Status of the read operation, and relation to other variables:

  status    PROCINFO[..,"TIMEOUT"]    retval   length(var or $0)
  -------------------------------------------------------------
  Normal      > 0                      > 0          >= 0
  EOF         < 0 *                      0            0
  timed out   == 0                      -1            0
  Error       < 0                       -1            0

* Not necessary, but useful from the user's POV.
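The table can be read as a small decision procedure. The sketch below assumes the proposal above, which is not an existing gawk feature. read_status() is a hypothetical helper. It takes getline's return value and the value the proposed PROCINFO[svc, "TIMEOUT"] would hold after the read. Because it tests the return value first, it does not need the optional EOF marker (*) to recognize EOF.

```awk
# Hypothetical helper for the proposed scheme (not real gawk behaviour):
# derive the read status from getline's return value and the time left
# that gawk would store in PROCINFO[svc, "TIMEOUT"].
function read_status(retval, left) {
    if (retval > 0)  return "normal"      # a record was read
    if (retval == 0) return "eof"         # end of input; 'left' not needed
    if (left == 0)   return "timed out"  # retval -1 and no time left
    return "error"                        # retval -1 and TIMEOUT set < 0
}

BEGIN {
    # One call per row of the table: (retval, TIMEOUT after the read)
    print read_status(1, 250)
    print read_status(0, -1)
    print read_status(-1, 0)
    print read_status(-1, -1)
}
```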

General-case usage could look something like this:

      ...
   Service = "/inet/tcp/0/x.y.z/p"
   print "..." |& Service
   do {
      PROCINFO[Service, "TIMEOUT"] = 1000      # (re)arm: 1000 ms for this read
      if ((Service |& getline) > 0)            # a record arrived
         print $0
   } while (PROCINFO[Service, "TIMEOUT"] >= 0) # keep going on data or timeout

   if (PROCINFO[Service, "TIMEOUT"] < 0)      # EOF or error
       close(Service);

   print("...") |& Service
   # ... read some more, with or without a timeout ...
    ...
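Point 3 of the earlier list (a finite number of retries) still applies. As written, the loop re-arms the timeout after every timeout and never gives up. The sketch below adds a cap, under stated assumptions. fake_read() is a hypothetical stub that replays scripted outcomes instead of reading a socket. It updates PROCINFO[Service, "TIMEOUT"] the way the table proposes, which gawk does not do. MAXTRIES is an illustrative limit. The sketch runs in any awk without a network connection.

```awk
# Hypothetical stub standing in for (Service |& getline): it replays a
# scripted sequence of outcomes and sets the proposed TIMEOUT entry the
# way the table describes (> 0 on data, 0 on timeout, < 0 on EOF/error).
function fake_read(svc,    r) {
    r = script[++step]
    if (r == "data") {
        $0 = "line " step
        PROCINFO[svc, "TIMEOUT"] = 500        # time left after this read
        return 1
    }
    if (r == "timeout") { PROCINFO[svc, "TIMEOUT"] = 0; return -1 }
    PROCINFO[svc, "TIMEOUT"] = -1             # script exhausted: treat as EOF
    return 0
}

BEGIN {
    Service = "/inet/tcp/0/x.y.z/p"           # never opened; used as a key only
    split("data timeout data timeout timeout timeout data", script, " ")
    MAXTRIES = 3                              # consecutive timeouts tolerated
    tries = 0
    do {
        PROCINFO[Service, "TIMEOUT"] = 1000
        if (fake_read(Service) > 0) {
            print $0
            tries = 0                         # progress resets the counter
        } else if (PROCINFO[Service, "TIMEOUT"] == 0 && ++tries >= MAXTRIES) {
            print "giving up after " tries " timeouts"
            break
        }
    } while (PROCINFO[Service, "TIMEOUT"] >= 0)
}
```

With the scripted sequence shown, it prints "line 1" and "line 3", then gives up after the third consecutive timeout. Resetting the counter on every successful read limits consecutive stalls rather than the total number of timeouts. That is one possible policy among several.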


If this makes any sense at all, may I claim to have killed
two birds with one stone? Most likely it doesn't; not
enough caffeine yet, and my head hurts thinking about it.

Thanks,

John

> 
>> 
>>> 3. Making sure there are finite number of retries in the
>>> event of timeouts.
>> 
>> On application level see "whatever_condition" in my example
>> above. On awk/library level see Arnold's reply upthread.
> 
> I believe those environment variables relate to the initial connect(),
> not read() or write(), but I have to double-check.
> 
> John
> 
> 
> 
>> 
>> Janis
>> 
>>> 4. ?
>>>
>>> Thanks,
>>>
>>> John
>>>
>>>>> If possible, please consider providing the outline
>>>>> of such a script illustrating the usage of 'readline' or getline
>>>>> with timeout.
>>>>
>>>> In my specific primitive application - which is by no means meant to be
>>>> a general example, covering all needs, or considering any corner cases -
>>>> it was just something like (simplified code)...
>>>>
>>>>    BEGIN { P = "/inet/tcp/0/a.b.c.d/e" }
>>>>          { print some_funct($0) |&   P }
>>>>    /pat/ { P |&  getline ; do_sth_w($0) }
>>>>    END   { close(P) }
>>>>
>>>> which might, with timeouts, morph to something like...
>>>>
>>>>    BEGIN { P = "/inet/tcp/0/a.b.c.d/e"
>>>>            PROCINFO[ P, "TIMEOUT"] = 1000
>>>>          }
>>>>
>>>>          { print some_funct($0) |&   P }
>>>>
>>>>    /pat/ { if( (P |&  getline)>  0) do_sth_w($0)
>>>>            else print "Error:", PROCINFO[ P, "ERROR"]
>>>>          }
>>>>
>>>>    END   { close(P) }
>>>>
>>>>
>>>> Using PROCINFO as a means to return the error is just an ad hoc thought
>>>> based on your timeout setting example. (So feel free to ignore it and
>>>> introduce something else.)
>>>>
>>>>> To keep things uniform, assume we parse ERRNO to
>>>>> find out the cause of the error. I'm not very gifted when it comes to writing
>>>>> awk scripts that don't look like C, so I hope you understand my
>>>>> request in this context.
>>>>
>>>> If I've misunderstood your request, please clarify.
>>>>
>>>> Thanks.
>>>>
>>>> Janis
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John
>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Janis
>>>>
>> 



Thread

getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-08 14:23 +0200
  Re: getline timeout (revisited) arnold@skeeve.com (Aharon Robbins) - 2011-04-10 18:49 +0000
  Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-19 09:09 -0500
    Re: getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-19 17:38 +0200
      Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-20 06:51 -0500
        Re: getline timeout (revisited) Janis Papanagnou <janis_papanagnou@hotmail.com> - 2011-04-20 14:20 +0200
          Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-20 08:21 -0500
            Re: getline timeout (revisited) j.eh@mchsi.com - 2011-04-21 05:46 -0500
            Re: getline timeout (revisited) arnold@skeeve.com (Aharon Robbins) - 2011-04-22 13:55 +0000
              Re: getline timeout (revisited) Grant <omg@grrr.id.au> - 2011-04-23 06:50 +1000
