| From | anthk <anthk@openbsd.home> |
|---|---|
| Newsgroups | comp.misc |
| Subject | Re: bad bot behavior |
| Date | 2025-05-12 06:24 +0000 |
| Organization | A noiseless patient Spider |
| Message-ID | <slrn101ue1g.198p.anthk@openbsd.home.localhost> |
| References | <vrc2r4$2okrp$1@dont-email.me> <vrc8qm$2tkq5$1@dont-email.me> <20250318182006.00006ae3@dne3.net> |

On 2025-03-18, Toaster <toaster@dne3.net> wrote:
> On Tue, 18 Mar 2025 12:00:07 -0500
> D Finnigan <dog_cow@macgui.com> wrote:
>
>> On 3/18/25 10:17 AM, Ben Collver wrote:
>> > Please stop externalizing your costs directly into my face
>> > ==========================================================
>> > March 17, 2025 on Drew DeVault's blog
>> >
>> > Over the past few months, instead of working on our priorities at
>> > SourceHut, I have spent anywhere from 20-100% of my time in any
>> > given week mitigating hyper-aggressive LLM crawlers at scale.
>>
>> This is happening at my little web site, and if you have a web site,
>> it's happening to you too. Don't be a victim.
>>
>> Actually, I've been wondering where they're storing all this data;
>> and how much duplicate data is stored from separate parties all
>> scraping the web simultaneously, but independently.
>
> But what can be done to mitigate this issue? Crawlers and bots ruin the
> internet.
>

GZip bombs + fake links = profit. Remember that gzipped web pages are
standard; even lynx can decode gz files natively.

Also, MegaHAL/Hailo under Perl. Feed it nonsense, then put some
non-visible content under a robots.txt-disallowed directory, full of
Markov-chain-generated nonsense and gzip bombs.
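
For anyone who wants to try the idea, here is a rough Python sketch of the pieces mentioned above: a word-level Markov chain for the nonsense, a page of fake links pointing back into the trap, and a highly compressible gzip payload. It is only an illustration under assumptions, not what the poster runs: the `/trap/` directory, the function names, and the toy corpus are all made up here, and it stands in for MegaHAL/Hailo rather than using them. The directory is assumed to be listed under `Disallow:` in robots.txt so that well-behaved crawlers never reach it, and the payload would have to be served with a `Content-Encoding: gzip` header by whatever web server is in use.

```python
import gzip
import random
from collections import defaultdict

# Hypothetical directory, assumed to be listed as "Disallow: /trap/" in robots.txt.
TRAP_DIR = "/trap/"


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def generate(chain, length, rng):
    """Walk the chain for `length` words, restarting at a random word on dead ends."""
    word = rng.choice(sorted(chain))
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        word = rng.choice(followers) if followers else rng.choice(sorted(chain))
        out.append(word)
    return " ".join(out)


def trap_page(chain, rng, n_links=5):
    """Build an HTML page of nonsense plus links that lead deeper into the trap."""
    links = "".join(
        f'<a href="{TRAP_DIR}{rng.randrange(10**9)}.html">more</a>\n'
        for _ in range(n_links)
    )
    return f"<html><body><p>{generate(chain, 50, rng)}</p>\n{links}</body></html>"


def compressible_payload(size_bytes):
    """Gzip a run of zero bytes: a small response that inflates to `size_bytes`."""
    return gzip.compress(b"\0" * size_bytes)


if __name__ == "__main__":
    corpus = "the bot reads the page and the page feeds the bot more pages"
    rng = random.Random(0)
    chain = build_chain(corpus)
    print(trap_page(chain, rng))
    blob = compressible_payload(10 * 1024 * 1024)
    print(len(blob), "bytes on the wire for 10 MiB of zeros")
```

A real setup would train the chain on a much larger corpus and generate pages on request, so that every fake link resolves to yet another page of nonsense. The payload size is kept small here on purpose.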
bad bot behavior Ben Collver <bencollver@tilde.pink> - 2025-03-18 15:17 +0000
Re: bad bot behavior D Finnigan <dog_cow@macgui.com> - 2025-03-18 12:00 -0500
Re: bad bot behavior not@telling.you.invalid (Computer Nerd Kev) - 2025-03-19 08:19 +1000
Re: bad bot behavior Toaster <toaster@dne3.net> - 2025-03-18 18:20 -0400
Re: bad bot behavior Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> - 2025-03-19 12:06 +0000
Re: bad bot behavior Rich <rich@example.invalid> - 2025-03-19 16:59 +0000
Re: bad bot behavior candycanearter07 <candycanearter07@candycanearter07.nomail.afraid> - 2025-03-23 14:30 +0000
Re: bad bot behavior Lawrence D'Oliveiro <ldo@nz.invalid> - 2025-03-20 02:22 +0000
Re: bad bot behavior Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> - 2025-03-20 08:33 +0000
Re: bad bot behavior Toaster <toaster@dne3.net> - 2025-03-20 19:01 -0400
Re: bad bot behavior Lawrence D'Oliveiro <ldo@nz.invalid> - 2025-03-21 08:05 +0000
Re: bad bot behavior Ian <${send-direct-email-to-news1021-at-jusme-dot-com-if-you-must}@jusme.com> - 2025-03-21 08:42 +0000
Re: bad bot behavior candycanearter07 <candycanearter07@candycanearter07.nomail.afraid> - 2025-03-23 14:30 +0000
Re: bad bot behavior D Finnigan <dog_cow@macgui.com> - 2025-03-26 08:38 -0500
Re: bad bot behavior candycanearter07 <candycanearter07@candycanearter07.nomail.afraid> - 2025-03-26 17:00 +0000
Re: bad bot behavior not@telling.you.invalid (Computer Nerd Kev) - 2025-03-27 07:55 +1000
Re: bad bot behavior anthk <anthk@openbsd.home> - 2025-05-12 06:24 +0000
Re: bad bot behavior anthk <anthk@openbsd.home> - 2025-05-12 06:24 +0000