Technik

The community for everything that can be described as technology

Posts in German or English


Bluesky's open API means anyone can scrape your data for AI training (techcrunch.com)

submitted 1 month ago by tinosaurier@feddit.org to c/technik@feddit.org

[–] BetaDoggo_@lemmy.world 39 points 1 month ago* (last edited 1 month ago) (1 children)

The API is kind of irrelevant. Scraping is possible for any public site. This shouldn't be a reason to close off access.

[–] lolrightythen@lemmy.world 2 points 1 month ago (1 children)

Makes you wonder who stands to gain (or lose less) by a message like this

[–] DoucheBagMcSwag@lemmy.dbzer0.com 4 points 1 month ago

Feels like a hit piece

[–] remotelove@lemmy.ca 30 points 1 month ago (1 children)

That is kind of how the open Internet works. AI or no AI, the rules have been the same for years.

If you put anything online, it is no longer under your control and fancy APIs just remove a step or two in collecting that data.

[–] far_university190@feddit.org 11 points 1 month ago

Scraper do not ask, scraper just scrape.

[–] GbyBE@discuss.tchncs.de 12 points 1 month ago (1 children)

Well, if there's no API, people complain that it's too closed. If there is an API, people complain that it's too open. If you don't want AI to see and use your data, don't post it publicly. Attaching a license or terms could make clear what can and cannot be done with the data, but enforcement will be hard.

My opinion is that having an API is preferable, because anything that is put out there publicly can already be scraped anyway, but an API makes 3rd party clients and other integrations possible, which is a good thing.

[–] ramble81@lemm.ee 1 points 1 month ago (2 children)

There is a solution, but everyone rants about it too: DRM. The problem is that it’s been used in ways no one likes, so people call it evil and restrictive. But in this case it could be used to limit who can do what with the data. Granted, there are always ways around it, like the analog loophole.

[–] GbyBE@discuss.tchncs.de 2 points 1 month ago

I don't believe DRM is the solution here, since it's the public posts that are being scraped. Those public posts are visible to everybody by default, so scrapers can't be stopped with DRM, unless you want to make a platform that can only be accessed by trusted applications.

[–] General_Effort@lemmy.world 1 points 1 month ago

You're not really into this whole technology thing much, are you?

[–] ramble81@lemm.ee 10 points 1 month ago

Got news for anyone using Lemmy….

[–] General_Effort@lemmy.world 6 points 1 month ago

If there's an API, you don't scrape data. Scraping is when you take the data out of some HTML file meant for human consumption. Maybe they should leave more of the writing to AI.
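To make the distinction concrete, here is a minimal sketch in Python. The HTML markup and the JSON payload are invented for illustration and are not Bluesky's actual page structure or API response; the function names are hypothetical too. Both paths end up with the same public text, the API just skips the parsing step.

```python
import json
from html.parser import HTMLParser

# Hypothetical page markup and API payload, invented for illustration only.
HTML_PAGE = (
    '<html><body><div class="post">'
    '<p class="text">Hello from a public post</p>'
    "</div></body></html>"
)
API_RESPONSE = '{"posts": [{"author": "example.user", "text": "Hello from a public post"}]}'


class PostTextScraper(HTMLParser):
    """Collects the text inside <p class="text"> elements."""

    def __init__(self):
        super().__init__()
        self._in_post_text = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "p" and ("class", "text") in attrs:
            self._in_post_text = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_post_text = False

    def handle_data(self, data):
        if self._in_post_text:
            self.texts.append(data.strip())


def scrape_posts(html):
    """Scraping: dig the data out of HTML meant for human readers."""
    scraper = PostTextScraper()
    scraper.feed(html)
    return scraper.texts


def read_posts_from_api(payload):
    """API: the data already arrives in a machine-readable structure."""
    return [post["text"] for post in json.loads(payload)["posts"]]


if __name__ == "__main__":
    print(scrape_posts(HTML_PAGE))
    print(read_posts_from_api(API_RESPONSE))
```

The scraper breaks as soon as the page layout changes, while the API reader only depends on the documented field names. That is the "step or two" an API removes; it doesn't change what is publicly readable.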

[–] BETYU@moist.catsweat.com 3 points 1 month ago

And it's not really decentralized, because the main server controls who can enter the network.

[–] Luffy879@lemmy.ml 1 points 1 month ago

Did Bluesky say anything about this? Because if yes, I'll bet my shiny metal ass that they are going to paywall the API in 2025.