Now that we know AI bots will ignore robots.txt and churn through residential IP addresses to scrape websites, does anyone know of a method to block them that doesn't entail handing over your website to Cloudflare?

[-] Deckweiss@lemmy.world 7 points 1 day ago* (last edited 20 hours ago)

The only way I can think of is to blacklist everything by default, redirect visitors to a properly challenging captcha (which can be self-hosted), and temporarily whitelist IPs that have proven to be human.

When you try to "enumerate badness" and block all AI user agents and IP ranges, you'll always let some new ones through, and you'll never be done adding them.

Only allow proven humans.
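
A rough sketch of the "deny by default, temporarily whitelist proven humans" idea, in Python. The class name `TemporaryAllowlist`, its methods, and the one-hour default are made up for illustration; the captcha itself is assumed to live elsewhere and call `mark_human` once a visitor passes it.

```python
import time


class TemporaryAllowlist:
    """Default-deny gate: an IP is allowed only for ttl_seconds after it
    has passed a challenge. All names here are hypothetical."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._expires_at = {}       # ip -> time at which the entry expires

    def mark_human(self, ip):
        """Call this after the visitor solves the captcha."""
        self._expires_at[ip] = self.clock() + self.ttl_seconds

    def is_allowed(self, ip):
        """Unknown or expired IPs are denied and should be sent to the captcha."""
        expiry = self._expires_at.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires_at[ip]  # forget stale entries
            return False
        return True
```

In a real setup this check would sit in the reverse proxy or a middleware, and anything for which `is_allowed` returns `False` would be redirected to the challenge page instead of being served.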


A captcha will inconvenience the users. If you just want to make things harder for the crawlers, make them spend compute resources through something like https://altcha.org/ (which would still allow them to crawl your site, but would make DDoSing very expensive) or AI honeypots.

[-] jagged_circle@feddit.nl 2 points 8 hours ago

Any reason you prefer this to mCAPTCHA?

[-] Deckweiss@lemmy.world 1 points 5 hours ago

I didn't know about mCaptcha. Thanks for sharing.

[-] ctag@lemmy.sdf.org 4 points 1 day ago* (last edited 1 day ago)

I hadn't heard of that before, thanks for the link.

I haven't read through the docs yet... But PoW makes me wonder what the work is and if it's cryptocurrency related.

Edit: Found it: https://altcha.org/docs/proof-of-work/

[-] jagged_circle@feddit.nl 1 points 8 hours ago

Hashcash predates cryptocurrencies.

this post was submitted on 09 Jan 2025
46 points (97.9% liked)
