Apple Intelligence not training on data from many major sites: report

The foundation models that power Apple’s (NASDAQ:AAPL) artificial intelligence system, Apple Intelligence, are not being trained on data from a number of major websites, Wired reported.
The major websites that have opted to exclude their data from Apple’s AI training include Facebook, Instagram, Craigslist, Tumblr, The New York Times, The Financial Times, The Atlantic, Vox Media, USA Today’s network and Condé Nast (which owns Wired), the news outlet added.
Apple did not immediately respond to a request for comment from Seeking Alpha.
Several of those news outlets, including The Financial Times, The Atlantic and Vox Media, have signed deals with ChatGPT creator OpenAI that allow OpenAI to train its models on their content. By contrast, The New York Times (NYT) is suing OpenAI and its backer Microsoft (MSFT) for alleged copyright infringement.
Apple said in June that its foundation models are trained on “licensed” data and on publicly available data collected by its web crawler, Applebot, and that web publishers can opt out.
Apple also offers a secondary user agent, Applebot-Extended, that publishers can use for this opt-out. “With Applebot-Extended, web publishers can choose to opt out of their website content being used to train Apple’s foundation models powering generative AI features across Apple products, including Apple Intelligence, Services, and Developer Tools,” Apple said.
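
For readers curious about how such an opt-out typically works in practice: publishers generally express it as a rule in their site's robots.txt file addressed to the Applebot-Extended user agent. The sketch below is illustrative only. The robots.txt content, the example.com URL and the `is_allowed` helper are assumptions made for this example, not taken from any of the sites named above, and Python's standard-library parser is used only to show how such a rule would be read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (an assumption, not a real
# publisher's file): block Applebot-Extended, allow every other user agent.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to use url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL used only for the check.
PAGE = "https://example.com/story"

extended_allowed = is_allowed(ROBOTS_TXT, "Applebot-Extended", PAGE)
applebot_allowed = is_allowed(ROBOTS_TXT, "Applebot", PAGE)

print("Applebot-Extended allowed:", extended_allowed)
print("Applebot allowed:", applebot_allowed)
```

Under these assumed rules, the page is off limits to Applebot-Extended while the regular Applebot crawler is still permitted, which is the distinction the opt-out relies on: a site can stay reachable by Applebot while declining AI training use.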