this post was submitted on 11 Sep 2023

X updates its terms to ban crawling and scraping (techcrunch.com)

submitted 1 year ago by geosoco@kbin.social to c/technology@lemmy.world

The new terms, which are effective from September 29, ban any kind of scraping or crawling without “prior written consent.”

NOTE: crawling or scraping the Services in any form, for any purpose without our prior written consent is expressly prohibited.

The previous version of the terms allowed crawling in accordance with robots.txt.

“NOTE: crawling the Services is permissible if done in accordance with the provisions of the robots.txt file, however, scraping the Services without our prior consent is expressly prohibited,” it read.

In the last few months, Twitter has also altered its robots.txt file — a file that gives instructions to robot crawlers about what parts of the site they are permitted to visit — to remove instructions for all crawler bots apart from Google.
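To make the robots.txt mechanism concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt content, the `ExampleBot` agent name, the `example.com` URL, and the `may_crawl` helper are all made up for illustration; this is not X's actual file, only the general shape of a file that admits one named crawler and turns every other one away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (NOT X's real file):
# Googlebot may fetch everything, every other crawler is disallowed.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def may_crawl(user_agent: str, url: str,
              robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some_user/status/1"  # placeholder URL
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, may_crawl(agent, url))
```

Note that robots.txt is only a request to well-behaved crawlers. Nothing in the file technically blocks a fetch, which is why the terms-of-service change is a separate question from what the file says.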

In 2015, Twitter confirmed that it had a firehose deal in place with Google to surface tweets in search results. It is not clear if the nature or terms of that deal have changed under the new management.

[–] reddig33@lemmy.world 43 points 1 year ago (1 children)

Good luck with that.

[–] IHeartBadCode@kbin.social 19 points 1 year ago

Yeah that's literally UNENFORCEABLE. We just had a case last year that indicated that you can scrape data from sites so long as the data being scraped isn't used for profit.

Additionally, scrapers cannot be legally held to have agreed to the TOS. Simply typing an address in and receiving a page back doesn't mean that anyone agrees to the TOS of the server that gave the page. It's pretty much the same reason software couldn't enforce the "if you don't agree with the terms on the CD-ROM, then you cannot open the package the CD-ROM is in." So just because X wrote that in their TOS has zero bearing on whether they can actually enforce it through the court system, and the likely answer there is a big NAH.

That's based on the gate-up/gate-down test given by the courts. If a normal person can find a random "tweet" (are we still calling them that?) by typing a URL, the gate is up, and you cannot pick and choose who gets to enter. If you don't want a random tweet being scraped, the gates must be down. That means nobody typing in a random URL can ever access that tweet; they have to go through the gate house to gain entry to the resource. But gates down means that no one is going to link to a tweet, because when people click the link, instead of seeing the related information, they get handed a login page. X has been trying that, and news outlets are bitching that they're not going to post tweets in their stories if Musk is just going to block everyone.

The thing that X could argue is that someone is using their tweets for "profit", which is exactly the case they're trying with the ADL and the CCDH. They're trying to argue that these not-for-profits are profiting off of convincing ad buyers to not buy ads. Which, if that sounds crazy, OH BOY IS IT. However, Musk's lawyers have attempted to muddy the waters on what is "PROFIT". So grab some popcorn for that one.

The thing is, I get that Musk wants to hold tight copyright on the tweets and not surface a lot to others who might use that data for who knows what purpose. BUT you cannot have cake and have eaten it as well. Musk doesn't get the best of both worlds. He can put everything behind a wall and attempt to enforce his TOS, but that's still not really going to go well for his ADL/CCDH case. Or he can surface the tweets for the Internet to read. But he cannot have both. We've settled that in courts, and Congress hasn't made any kind of motion toward changing that standing.

[–] silverbax@lemmy.world 32 points 1 year ago* (last edited 1 year ago)

Every move Elon Musk comes up with makes Twitter less relevant.

[–] Fisk400@feddit.nu 27 points 1 year ago

Looks like he still can't afford the hosting fees. I wonder if the income will improve once all search engines stop indexing it.

[–] desmosthenes@lemmy.world 15 points 1 year ago (1 children)

They're about to get even more litigious.

[–] geosoco@kbin.social 12 points 1 year ago (2 children)

They're going to have a larger legal team than dev team pretty soon.

[–] Virkkunen@kbin.social 9 points 1 year ago

So they're pulling an Oracle?

[–] IHeartBadCode@kbin.social 4 points 1 year ago

I'm just curious where he's pulling the money for this legal team? It's definitely not from X's "profits" LOL.

[–] Rhaedas@kbin.social 14 points 1 year ago

in any form

Is viewing and copying/pasting a manual form? I know the implied meaning is automation but as a legal document it should probably specify that. Unless the plan is to drive more people away, which does seem to be a trend.

[–] Taleya@aussie.zone 13 points 1 year ago

Ironic given how much crawling and scraping they expect their staff to do...

[–] autotldr@lemmings.world 8 points 1 year ago

This is the best summary I could come up with:

Elon Musk-owned X, formerly Twitter, has updated its terms of service to prohibit scraping and crawling — likely to fend off any AI models training on its data.

The new terms, which are effective from September 29, ban any kind of scraping or crawling without “prior written consent.”

At that time, Musk had said that it was a temporary measure because the site was getting “data pillaged so much that it was degrading service for normal users.”

In April, he threatened to sue Microsoft for illegally using the social network’s data to train AI models.

Earlier this month, X changed its privacy policy to state it might use public data to train AI models.

Musk has previously noted during a Twitter space that xAI, a company founded in July, would use public data such as tweets to train its models.

The original article contains 390 words, the summary contains 140 words. Saved 64%. I'm a bot and I'm open source!

[–] Jaysyn@kbin.social 4 points 1 year ago

As long as they are publishing that information to the public internet, they don't have a leg to stand on legally.

[–] D1SoveR@sopuli.xyz 3 points 1 year ago (1 children)

I gather that the policy as written, apart from bulk data collection, also inadvertently prohibits the use of any alternative front-ends, such as Nitter? Does it also stop any archival (akin to the Wayback Machine) from happening against their service?

[–] geosoco@kbin.social 2 points 1 year ago

It sounds like it might, but whether it stops them from working or just introduces liability such that they can sue is unclear. Likely the latter, but unclear.