Scraping is pretty fragile and not suitable for many things. Reddit is going downhill anyway: my homepage with 130 subs has been recommending the same old posts in Hot ever since the blackout started.
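To illustrate the fragility point: most scrapers key on details of the page markup, so a routine redesign breaks them silently. This is a made-up minimal example (the class names and markup are hypothetical, not Reddit's actual HTML), using only Python's standard library:

```python
from html.parser import HTMLParser

# A scraper hard-wired to one class name. When the site renames
# the class in a redesign, extraction fails without any error.
class TitleScraper(HTMLParser):
    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._capture = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3" and ("class", self.target_class) in attrs:
            self._capture = True

    def handle_data(self, data):
        if self._capture:
            self.titles.append(data.strip())
            self._capture = False

def scrape(html):
    parser = TitleScraper("post-title")
    parser.feed(html)
    return parser.titles

old_markup = '<h3 class="post-title">Hello fediverse</h3>'
new_markup = '<h3 class="Post__title_x9f2">Hello fediverse</h3>'  # after a redesign

print(scrape(old_markup))  # finds the title
print(scrape(new_markup))  # finds nothing, even though the content is identical
```

An official API decouples clients from the markup; scraping couples you to whatever the frontend team shipped this week.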
but who are they gonna sue haha
The admin of the instance that provides such a proxy probably.
I'm a bit conflicted. On one hand I'd like to see Reddit die; on the other hand, I wish I had a backup of subreddits such as r/HFY. Maybe it would be enough to just dump the current archive into an instance and let the various Lemmy communities that map to those subreddits grow from there.
There is a project to archive all of Reddit before the API change.
More info:
https://old.reddit.com/r/DataHoarder/comments/142l1i0/archiveteam_has_saved_over_108_billion_reddit/
https://github.com/ArchiveTeam/reddit-grab
If scraping is against their terms of service (and it probably is), then it doesn't seem like much of a legal gray zone to me. I think they would sue the people running the scrapers.
I used to work for a company that was constantly fighting scrapers. They loved our data! I have no idea how successful the bad guys were at doing it, but there were ways we could slow it down, block it, etc. Also, if you spend enough money with your CDN, there are lots of ways to deal with bots and scrapers. None of it is 100% effective, but you can sure make it a pain in the ass for your casual Lemmy admin.
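One common way sites slow scrapers down, as alluded to above, is per-client rate limiting. This is a hypothetical sketch (the limits, IP, and function names are invented for illustration) of a token-bucket limiter like the ones CDNs and origin servers apply:

```python
import time
from collections import defaultdict

# Token bucket: each client earns `rate` tokens per second up to
# `capacity`; each request spends one token. Bursts beyond the
# capacity get rejected until tokens refill.
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client IP: 1 request/second sustained, bursts of 5.
buckets = defaultdict(lambda: TokenBucket(rate=1.0, capacity=5))

def handle_request(client_ip):
    if buckets[client_ip].allow():
        return 200  # serve the page
    return 429      # Too Many Requests: scraper must back off

# A rapid burst of 10 requests from one address:
# the first 5 pass, the rest are throttled.
results = [handle_request("203.0.113.7") for _ in range(10)]
print(results)
```

None of this stops a determined, distributed scraper, which is the point the commenter makes: it just raises the cost until a casual operator gives up.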
I say we make our own content here, instead of pulling it from Reddit.