Path: csiph.com!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: D <nospam@example.net>
Newsgroups: comp.misc
Subject: Re: terminal only for two weeks
Date: Wed, 27 Nov 2024 10:54:50 +0100
Organization: i2pn2 (i2pn.org)
Message-ID: <4875e490-ad30-d644-345f-4a09c1935c6b@example.net>
References: <67447ce1$0$22$882e4bbb@reader.netnews.com> <vi3ecs$35u53$1@dont-email.me> <6c4ae24b-7bb8-7d84-8f74-1f5fc14c0ec0@example.net> <87ed2yjkl8.fsf@tilde.institute> <55db8483-58f0-c3dc-de0b-7f44881fa180@example.net> <87jzcp4pzy.fsf@enoch.nodomain.nowhere>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
Injection-Info: i2pn2.org; logging-data="142214"; mail-complaints-to="usenet@i2pn2.org"; posting-account="w/4CleFT0XZ6XfSuRJzIySLIA6ECskkHxKUAYDZM66M";
In-Reply-To: <87jzcp4pzy.fsf@enoch.nodomain.nowhere>
X-Spam-Checker-Version: SpamAssassin 4.0.0
Xref: csiph.com comp.misc:26196



On Tue, 26 Nov 2024, Mike Spencer wrote:

>
> D <nospam@example.net> writes:
>
>> On Tue, 26 Nov 2024, yeti wrote:
>>
>>>    <https://www.brow.sh/>
>>
>> Ah yes... I've seen this before! I dropped it because it depends on
>> Firefox, but the concept is similar. My idea was to aggressively filter
>> a web page before passing it on to elinks or similar.
>>
>> Perhaps rewrite it a bit to avoid the looooooong list of menu options
>> and links that always comes up at the top of the page, so that the
>> actual content only shows after a couple of page downs (this happens,
>> for instance, on wikipedia).
>>
>> Instead, parse it: move those links to the bottom, remove the
>> javascript, and perhaps pass on only the text. Well, those are only
>> ideas. Maybe I'll try, maybe I won't. Time will tell! =)
>
> I've done this for a few individual sites that I visit frequently.
>
>  + A link to that site resides on my browser's "home" page.
>
>  + That home page is a file in ~/html/ on localhost.
>
>  + The link is actually to a target-specific cgi-bin Perl script on
>    localhost where Apache is running, restricted to requests from
>    localhost.
>
>  + The script takes the URL sent from the home page, rewrites it for
>    the routable net, sends it to the target using wget and reads all
>    of the returned data into a variable.
>
>  + Using Perl's regular expressions, stuff identified (at time of
>    writing the script) as unwanted is elided -- js, style, svg,
>    noscript etc.  URLs self-referencing the target are rewritten to
>    be sent through the cgi-bin script.
>
>  + Other tweaks peculiar to the specific target...
>
>  + Result is handed back to the browser preceded by minimal HTTP
>    headers.
>
> So far, works like a charm.  Always the potential that a target host
> will change their format significantly.  That has happened a couple of
> times, requiring fetching an unadorned copy of the target's page,
> tedious reading/parsing and edit to the script.
>
> This obviously doesn't work for those sites that initially send a
> dummy all-js page to verify that you have js enabled and send you a
> condescending reproof if you don't.  Other server-side dominance games
> are a potential challenge or a stone wall.
>
> Writing a generalized version, capable of dealing with pages from
> random/arbitrary sites is a notion perhaps worth pursuing but clearly
> more of a challenge than site-specific scripts. RSN, round TUIT etc.

Brilliant! You are a poet, Mike!

Frogfind.com was a great start! I would love to have some kind of 
crowd-sourced script that turns html5 into html1, minus the javascript 
and other garbage.
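
As a possible starting point for such a script, here is a minimal sketch using Python's standard html.parser. It drops the contents of script, style, svg and noscript, keeps only the text, and moves every link to a numbered list at the bottom, so the long menus end up out of the way. The tag lists and names are my own assumptions, and real pages will surely need per-site tweaks:

```python
from html.parser import HTMLParser

# Elements whose contents are dropped entirely (assumed list; tune to taste).
SKIP = {"script", "style", "svg", "noscript"}
# Tags that start a new line in the text output.
BREAKS = {"p", "br", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextOnly(HTMLParser):
    """Reduce HTML to plain text, turning links into [n] markers."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # text fragments in document order
        self.links = []      # hrefs collected for the bottom of the page
        self.skip_depth = 0  # > 0 while inside a SKIP element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in BREAKS:
            self.out.append("\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href and not href.startswith(("#", "javascript:")):
                self.links.append(href)
                self.out.append(f"[{len(self.links)}]")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def html_to_text(html: str) -> str:
    """Return the page text, with its links listed at the bottom."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty lines.
    lines = [" ".join(line.split()) for line in "".join(parser.out).splitlines()]
    text = "\n".join(line for line in lines if line)
    refs = "\n".join(f"[{i}] {url}" for i, url in enumerate(parser.links, 1))
    return text + ("\n\nLinks:\n" + refs if refs else "")
```

For example, html_to_text('<p>Hello <a href="/wiki/Foo">Foo</a></p>') gives "Hello [1]Foo" followed by a Links: section listing "[1] /wiki/Foo". The output could then be handed to elinks or simply paged.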

I also wondered if another approach might be to take the top 500 sites 
and base the filtering on those? Or, looking through my own history, 
take the top 100.
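
Finding that personal top 100 could be as simple as counting hosts in an exported history. This sketch assumes a plain text file with one URL per line (how to get such an export depends on the browser), folds www.example.com into example.com, and skips lines that don't parse as URLs with a host:

```python
from collections import Counter
from urllib.parse import urlsplit


def top_sites(urls, n=100):
    """Count visits per host and return the n most visited (host, count) pairs."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url.strip()).hostname  # lowercased, or None
        if host:
            if host.startswith("www."):
                host = host[4:]
            counts[host] += 1
    return counts.most_common(n)
```

Since a file object iterates line by line, top_sites(open("history.txt")) would work directly on such an export (the file name is just a placeholder).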

Thanks to the bad way the net has developed, it seems like a greater and 
greater share of our browsing takes place on ever fewer sites.