Reddit reportedly selling its users' content to an AI company for $60 million per year
Bloomberg report unveils annual $60 million deal with an unnamed AI company in exchange for Reddit submissions as training data
According to a report from Bloomberg, Reddit has signed a content licensing deal, worth $60 million per year, that allows AI models to "train" on data submitted by its users. Officially, Reddit has declined to comment on the matter, but the timing aligns with expectations that the company will hold its Initial Public Offering (IPO) in the coming months.
As expected, the move has drawn scrutiny and backlash in the hours since the story broke, but it isn't clear what recourse users have if they don't want their comments shared with AI engines, short of direct legal action. Social media platforms like Reddit monetize user data all the time; however, it seems questionable whether generative AI can remain viable in the long term without facing legal challenges. The picture is even murkier on a platform like Reddit, where copyrighted or even pirated content is often posted, even if it later gets taken down.
Since the purported AI content licensing deal has yet to be made public (and Reddit itself has yet to go public), there is still a chance that Reddit may decide against implementing it, or that the final sum may be significantly different.
However, Reddit's actions could encourage similar moves from other social media platforms. This would be a major headache for artists and other users who don't want their work used to train AI models, which many say resemble automated content theft machines more than intelligent entities.
Unfortunately, it seems likely that users on any platform will have no recourse until the law is truly "written" around generative AI, the copyrighted works used with it, and so on. Major cases like The New York Times' lawsuit against OpenAI will ultimately determine the long-term fate of business arrangements such as this one.
According to a previous Washington Post report and an anonymous source, Reddit has previously expressed willingness to cut search engines off from Reddit posts, declaring that the service "can survive without search." That threat reportedly stemmed from Reddit's desire to sell AI training data.
Christopher Harper has been a successful freelance tech writer specializing in PC hardware and gaming since 2015, and ghostwrote for various B2B clients in high school before that. Outside of work, Christopher is best known to friends and rivals as an active competitive player in various eSports (particularly fighting games and arena shooters) and a purveyor of music ranging from Jimi Hendrix to Killer Mike to the Sonic Adventure 2 soundtrack.
-
COLGeek Given the nature of much of the hosted content, I question what of value the models will "learn".
Aside from the questionable technical goodness of the data, what may be more interesting is how the legal aspects of these transactions evolve over time.
Good awareness article.
USAFRet
COLGeek said: Given the nature of much of the hosted content, I question what of value the models will "learn".
Garbage in, garbage out.
Giroro "Creators" never own their content. The platforms do.
If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.
But it will be worth it in the end when you finally have a place where you can freely post your personal pictures and tell people what you *really* think.
ThomasKinsley Reddit has become the butt of jokes. Are AI companies sure they want to copy this?
View: https://www.youtube.com/shorts/JzU_5YoSegU
bit_user I don't really mind if my GitHub projects are being used to train AI. I open-sourced that code for the benefit of others, so I don't care too much whether they benefit by using it directly or via an AI service of some kind.
I sure hope nobody is using posts from these forums to train AI models...
: O
Actually, I think the posts marked "Best Answer" might not be a terrible way to educate an AI about general PC troubleshooting, but even those aren't consistently great. For sure, I'd filter out the rest of the posts...
DavidLejdar
Giroro said: "Creators" never own their content. The platforms do.
If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.
...
That is not correct. For example, Taylor Swift songs are not YouTube's (/Google's/Alphabet's) property. The terms and conditions of such sites usually stipulate what licence the creator grants to YouTube. For the details in this case, see: https://www.youtube.com/t/terms#27dc3bf5d9
But yeah, if there is something a creator does not want, such as in the case of Reddit, then they certainly do not have to use it. And it actually isn't that difficult to set up some hosting; the exposure is a different topic, though.
Findecanor
COLGeek said: Given the nature of much of the hosted content, I question what of value the models will "learn".
In my view, there is great diversity between "subreddits" (subforums) on Reddit. Some are mostly silly, others are very serious. Different ones have different rules of conduct and different tones.
I would question the value of the posts as-is, though. There would need to be some kind of filter, perhaps based on an already highly trained model, to even start understanding how to train on forum posts.
Findecanor
Giroro said: "Creators" never own their content. The platforms do.
AFAIK, international copyright laws dictate that both the website and the user own copyright on each post. Neither can decide for the other what the other does with their copy of a post.
I'm allowed to collect my Reddit posts, and sell e.g. printed books with them if I want. But so is Reddit.
... Unless a post consists of something that was already under copyright owned by someone else, say: song lyrics, an image, a video, etc.