“Until the major labels go through their lawsuits, there’s no way for artists or labels to fight back”: A massive music dataset is allegedly allowing AI to train on 12 million+ songs without permission

sonfapitch



If you’ve ever believed that AI’s use of your work could go under the radar, think again. While The Atlantic’s AI Watchdog has been around for a while now, it’s currently gaining traction amongst musicians of all shapes and sizes – and the results seem to suggest that nobody’s work is safe from being used without their permission.

Last year, independent artists began filing lawsuits against Suno and Udio for “trampling” on the rights of smaller creatives. And, if the AI Watchdog’s findings are correct, the problem could be worse than it originally seemed.

Launched in September 2025, the AI Watchdog tool found that more than “7.5 million books, 81 million research articles, 15 million YouTube videos, and writing from tens of thousands of movies and television shows” were allegedly included in datasets used to train AI products.

Even before the release of the AI Watchdog tool, The Atlantic was hot on AI’s tail. Writer Alex Reisner in particular covered plenty of oddities across the web, from noting AI’s “sneaky” addition to YouTube videos (the option to ‘improve clarity’ seemed to be a way of training AI) to calling out the Common Crawl Foundation for allegedly “funnelling paywalled articles to AI developers”.

Earlier this week, Reisner discovered that “giant datasets of songs” have been shared within AI-development spaces. “One has 12 million tracks,” the writer claims. “Another has 9 million. The two smaller datasets each have more than 100,000. The 12-million-track dataset, on its own, would take 91 years to listen to.”

Scarily, Reisner notes that the datasets have already been downloaded “thousands of times”. Google is reportedly among those that have downloaded one of the smaller datasets, pulling from the Free Music Archive to train AI models. Stability AI has also said it used the Free Music Archive to train its systems.

In light of the new findings, many artists have been sharing their worries over their music being used against their will. Producer DJ Sabrina The Teen DJ in particular has clapped back at anyone calling their music “AI slop”, taking to X to say: “it’s funny how there were no accusations of my music sounding like AI slop until these datasets started getting used to generate slop.”

to everyone who thought my music sounded like ai slop, did you ever think it was because Suno was using a dataset that contained 22 of my songs?

it’s funny how there were no accusations of my music sounding like ai slop until these datasets started getting used to generate slop pic.twitter.com/SerSnaLO46

— DJSabrinaTheTeenDJ (@DJSTTDJ) June 18, 2026

Another X user searched Quedeca’s catalogue on AI Watchdog and found “295 grabs across 8 known data sets from various releases, snippets, videos, and corresponding lyrics from Genius”. The artist’s only response was a bleak note of sarcasm: “Yayyyyy!”

Even the smaller breakcore producer sophia_hjkl, who has a combined following of around 10,000 across their X and Instagram accounts, is being looped into the shitstorm. “Suno and Udio [have] used 138 of my songs across two of their datasets,” the artist writes. “This is almost my entire catalogue of music.”

the atlantic just published a searchable database of the music used by suno and udio. they used *one hundred and thirty eight* of my songs across two of their datasets. this is almost my entire catalogue of music. it’s just about everything i’ve released from 2017 to 2024. pic.twitter.com/dnudQKY83J

— Sophiaaaahjkl;8901 (@sophia_hjkl) June 18, 2026

So far, no major artists appear to have spoken out about the AI Watchdog findings. But producer Vince Valholla, head of Valholla Records, has posted a damning video on X. “Late last night I found out over 100+ songs from our catalogue were used to train AI models,” he says.

“To be honest, until the major labels go through their lawsuits, there’s no way for artists or labels to fight back,” he continues. “They literally scraped the best songs from our catalogue. I’m sick.”

Late last night I found out over 100+ songs from our catalog were used to train AI models. Thanks to The Atlantic, they leaked a database of millions of songs that have been used by the biggest AI music companies like Udio and Suno.

To be honest, until the major labels go… https://t.co/7D0kcVybwS pic.twitter.com/3d2cmei0u9

— Vince Valholla (@VinceValholla) June 19, 2026

Australia’s music rights organisation, APRA AMCOS, has also announced that it will be launching an investigation into The Atlantic’s findings. Opening with a list of notable acts, from Nick Cave to Kylie Minogue, a press release entitled “PROOF OF THEFT” condemns AI companies that have allegedly stolen mass datasets for training purposes.

With Australia having officially rejected a copyright exception for AI platforms in October, APRA AMCOS isn’t too pleased with the findings. According to the press release, the organisation’s investigation will focus on the Australian and New Zealand songs that have been compromised. “No permission, no licence, no payment,” its Chief Executive, Dean Ormston, writes. “These are not bargaining chips, they are the life’s work of Australian and New Zealand songwriters.”

Over on Reddit, many fans are frantically trying to find ways to stop their favourite artists’ work from being used in AI training. “I doubt [someone] like [hypercore rapper] Jane Remover would want their shit used as birdfeed for an AI model,” one user writes.

Another user, Scott The Pisces, a small British producer, notes that his own work is apparently included in some of the datasets. “Found 10 songs of mine on the dataset, and I’m not even famous,” he writes.
