
Re: Scanning

From Ben Bacarisse <ben.usenet@bsb.me.uk>
Newsgroups comp.programming
Subject Re: Scanning
Date 2023-01-19 18:08 +0000
Organization A noiseless patient Spider
Message-ID <87v8l2z9bv.fsf@bsb.me.uk>
References <Scanning-20230119123241@ram.dialup.fu-berlin.de>


ram@zedat.fu-berlin.de (Stefan Ram) writes:

>   Some idle thoughts about scanning (lexical analysis, or
>   rather what comes before it) ...
>
>   Let's take a very simple task: This scanner for text files
>   has nothing more to do than to return every character,
>   except to strip the spaces at the end of a line.
>
>   It is a function "get_next_token" that on each call will
>   return the next character from a file to its client (caller),
>   except that spaces at the end of a line will be skipped.
>
>   So we read the line and strip the spaces. (One line in
>   Python.)
>
>   But how do I know in advance if the line will fit into
>   memory?

That's a huge assumption!  There's no need to read the line just to skip
spaces at the end.  All you need to do is read and count them so you can
"hand back" the right number of spaces if you don't see a newline
character.
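
A minimal sketch of that counting idea, in Python since the quoted post
mentions it.  The function name is made up for illustration, and
treating end of file like end of line (so spaces before EOF are dropped)
is an assumption, not something stated above.  Only a counter is kept,
so memory use does not depend on line length.

```python
import io


def chars_without_trailing_spaces(stream):
    """Yield characters from a text stream, dropping spaces that end a line."""
    pending = 0  # spaces read but not yet handed to the caller
    while True:
        ch = stream.read(1)
        if ch == " ":
            pending += 1  # only count the space; nothing is buffered
            continue
        if ch not in ("", "\n"):
            # The run of spaces was not at the end of a line: hand it back.
            for _ in range(pending):
                yield " "
        # On newline or EOF (assumed to end the line) the run is discarded.
        pending = 0
        if ch == "":
            return
        yield ch


text = io.StringIO("a  b   \nc \n")
print(repr("".join(chars_without_trailing_spaces(text))))  # 'a  b\nc\n'
```

A caller that wants one character per call, like get_next_token, could
wrap the generator with next().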

But then this is not the real problem, I suspect.  You probably want to
skip spaces and tabs and probably other things at the end of a line.
Then again, maybe you really want to replace multiple spaces with just
one at this stage of the processing?  That is the trouble with cut-down
problem statements -- they can have simple solutions that don't apply in
the real case.

Mind you, I would try hard to avoid reading a line unless a line is
really an important structure.  You might only need to store the
largest token.

-- 
Ben.
