Major sites opt out of Apple’s AI scraping

This summer, Apple gave websites more control over whether the company could train its AI models on their data. Major large publishers and platforms have opted out.

Kate Knibbs for Wired:

WIRED can confirm that Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, the USA Today network, and WIRED’s parent company, Condé Nast, are among the many organizations opting to exclude their data from Apple’s AI training. The cold reception reflects a significant shift in both the perception and use of the robotic crawlers that have trawled the web for decades. Now that these bots play a key role in collecting AI training data, they’ve become a conflict zone over intellectual property and the future of the web.

This new tool, Applebot-Extended, is an extension to Apple’s web-crawling bot that specifically lets website owners tell Apple not to use their data for AI training.

The original Applebot, announced in 2015, initially crawled the internet to power Apple’s search products like Siri and Spotlight. Recently, though, Applebot’s purpose has expanded: The data it collects can also be used to train the foundational models Apple created for its AI efforts.

Publishers can block Applebot-Extended by updating a text file on their websites known as the Robots Exclusion Protocol, or robots.txt. This file has governed how bots go about scraping the web for decades—and like the bots themselves, it is now at the center of a larger fight over how AI gets trained. Many publishers have already updated their robots.txt files to block AI bots from OpenAI, Anthropic, and other major AI players.


MacDailyNews Take: Apple respects publishers’ rights. More info about Applebot here.

We are currently about 1/8th of the way to being sustainable with Substack subscriptions.

Not a bad start!

Please tell your friends about MacDailyNews on Substack (https://macdailynews.substack.com) and, if you’re currently a free subscriber, please consider $5/mo. or $50/year to keep MacDailyNews going. Just hit the subscribe button. Thank you!

– MacDailyNews

Read on Substack


Please help support MacDailyNews — and enjoy subscriber-only articles, comments, chat, and more — by subscribing to our Substack: macdailynews.substack.com. Thank you!

Support MacDailyNews at no extra cost to you by using this link to shop at Amazon.

The post Major sites opt out of Apple’s AI scraping appeared first on MacDailyNews.