Re: bad bot behavior

From anthk <anthk@openbsd.home>
Newsgroups comp.misc
Subject Re: bad bot behavior
Date 2025-05-12 06:24 +0000
Organization A noiseless patient Spider
Message-ID <slrn101ue1g.198p.anthk@openbsd.home.localhost>
References <vrc2r4$2okrp$1@dont-email.me> <vrc8qm$2tkq5$1@dont-email.me> <20250318182006.00006ae3@dne3.net>


On 2025-03-18, Toaster <toaster@dne3.net> wrote:
> On Tue, 18 Mar 2025 12:00:07 -0500
> D Finnigan <dog_cow@macgui.com> wrote:
>
>> On 3/18/25 10:17 AM, Ben Collver wrote:
>> > Please stop externalizing your costs directly into my face
>> > ==========================================================
>> > March 17, 2025 on Drew DeVault's blog
>> > 
>> > Over the past few months, instead of working on our priorities at
>> > SourceHut, I have spent anywhere from 20-100% of my time in any
>> > given week mitigating hyper-aggressive LLM crawlers at scale.
>> 
>> This is happening at my little web site, and if you have a web site, 
>> it's happening to you too. Don't be a victim.
>> 
>> Actually, I've been wondering where they're storing all this data;
>> and how much duplicate data is stored from separate parties all
>> scraping the web simultaneously, but independently.
>
> But what can be done to mitigate this issue? Crawlers and bots ruin the
> internet.
>

Gzip bombs + fake links = profit. Remember that gzip-compressed web
pages are standard (HTTP "Content-Encoding: gzip"); even lynx can
parse gz files natively.
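
To make the gzip idea concrete, here is a rough sketch in Python of how
such a payload could be built. The function name and the 10 MiB size are
my own choices for illustration, not anything from a particular tool. It
only shows why the trick is cheap for the server: a long run of zeros
compresses at roughly 1000:1.

```python
import gzip
import io


def make_gzip_payload(uncompressed_size: int, chunk_size: int = 1 << 20) -> bytes:
    """Return gzip bytes that expand to `uncompressed_size` zero bytes.

    Hypothetical helper for illustration; zeros are written in chunks so
    the uncompressed data is never held in memory all at once.
    """
    buf = io.BytesIO()
    zeros = bytes(chunk_size)
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9, mtime=0) as gz:
        remaining = uncompressed_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(zeros[:n])
            remaining -= n
    return buf.getvalue()


# 10 MiB of zeros; the compressed result is only a few kilobytes.
payload = make_gzip_payload(10 * 1024 * 1024)
print(len(payload), "bytes on the wire")
```

The bytes would then be served as an ordinary page with a
"Content-Encoding: gzip" header, so a client that honours the header
inflates the whole thing. How a given crawler reacts is not something
this sketch can promise.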

Also, MegaHAL/Hailo under Perl. Feed it nonsense, and create some
non-visible content under a robots.txt-disallowed directory
full of Markov-chain-generated nonsense and gzip bombs.
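
For anyone who does not want to set up MegaHAL or Hailo, the Markov
part can be sketched in a few lines of Python. This is not how those
Perl tools work internally; it is a plain word-level chain, and the
names build_chain, generate and CORPUS are made up for the example.

```python
import random
from collections import defaultdict

# Tiny stand-in corpus; in practice you would feed in much more text.
CORPUS = (
    "the crawler reads the page and the page links to another page "
    "and another page links back to the crawler that reads the links"
)


def build_chain(text, order=2):
    """Map each tuple of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, length, rng):
    """Walk the chain to produce `length` words of plausible-looking nonsense."""
    if not chain:
        raise ValueError("corpus too short for the chosen order")
    keys = sorted(chain)
    order = len(keys[0])
    out = list(rng.choice(keys))
    while len(out) < length:
        followers = chain.get(tuple(out[-order:]))
        if followers:
            out.append(rng.choice(followers))
        else:
            # Dead end (end of corpus): restart from a random state.
            out.extend(rng.choice(keys))
    return " ".join(out[:length])


chain = build_chain(CORPUS)
print(generate(chain, 40, random.Random()))
```

The output would go into pages under the directory that robots.txt
disallows, so well-behaved crawlers never see it and only the ones
ignoring robots.txt end up ingesting it.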
