2025-08-29 17:46:14

Reddit Blocks Internet Archive’s Access to Its Communities

Reddit has announced that it will block the Internet Archive’s Wayback Machine from crawling its community pages, allowing access only to the Reddit homepage. The move is part of a broader effort to control how Reddit data is accessed and used, particularly given concerns that AI companies are scraping content through the Archive’s resources in violation of platform policies.

The Internet Archive is a nonprofit organization dedicated to preserving web content, maintaining a vast collection of approximately 866 billion web pages. Its Wayback Machine serves as a crucial tool for journalists, researchers, and the public to access historical web data, especially given that an estimated 38% of pages available in 2013 are now offline. Restricting access to Reddit content limits this resource’s ability to archive and provide historical context for one of the largest social platforms.

This development follows Reddit’s earlier steps to tighten data access, including a controversial API pricing overhaul in 2023, aimed at regulating how third parties extract and use Reddit data. The platform’s statement to The Verge emphasized that some AI companies have violated Reddit’s policies by scraping content via the Wayback Machine, prompting this protective measure.

The move reflects a wider trend in the social media industry: major platforms such as LinkedIn and Meta have increasingly taken legal and technical steps to prevent unauthorized data scraping. While these actions protect user data and platform control, they also raise concerns about reduced transparency and the shrinking availability of historical digital records for research. As AI’s demand for large datasets grows, the tension between data protection and open access is intensifying, which could make future digital research and the archiving of social media more difficult.