AI bots strain Wikimedia as bandwidth surges 50%

On Tuesday, the Wikimedia Foundation announced that relentless AI scraping is putting strain on Wikipedia’s servers. Automated bots seeking AI model training data for LLMs have been vacuuming up terabytes of data, growing the foundation’s bandwidth used for downloading multimedia content by 50 percent since January 2024. It’s a scenario familiar across the free and open source software (FOSS) community, as we’ve previously detailed.

The Foundation hosts not only Wikipedia but also platforms like Wikimedia Commons, which offers 144 million media files under open licenses. For decades, this content has powered everything from search results to school projects. But since early 2024, AI companies have dramatically increased automated scraping through direct crawling, APIs, and bulk downloads to feed their hungry AI models. This exponential growth in non-human traffic has imposed steep technical and financial costs—often without the attribution that helps sustain Wikimedia’s volunteer ecosystem.

The impact isn’t theoretical. The foundation says that when former US President Jimmy Carter died in December 2024, his Wikipedia page predictably drew millions of views. But the real stress came when users simultaneously streamed a 1.5-hour video of a 1980 debate from Wikimedia Commons. The surge doubled Wikimedia’s normal network traffic, temporarily maxing out several of its Internet connections. Wikimedia engineers quickly rerouted traffic to reduce congestion, but the event revealed a deeper problem: The baseline bandwidth had already been consumed largely by bots scraping media at scale.

Read full article

Comments