
Why AI tools trained on the whole internet write like no one in particular

The first time I used ChatGPT for client work, the output read like a press release written by committee. Every sentence was technically correct. None of it sounded like the brand I was writing for — a regional HVAC company with forty years in business and an owner who called customers "neighbours."

The AI didn't know that. It couldn't. And the structural reason why AI writing sounds like everyone instead of anyone specific starts with what these tools learned from.

What happens when you train on everything

Large language models learn by reading. A lot. GPT-3 trained on roughly 300 billion tokens — books, websites, forums, documentation, academic papers, product descriptions, blog posts, social media — and its successors trained on far more. The entire written internet, more or less, filtered for quality but not for voice.

When you train on that volume, you're not learning how any particular person or brand writes. You're learning the statistical average of how everyone writes. The model becomes extraordinarily good at producing text that sounds plausible, grammatically correct, and generically appropriate for any topic.

That's the AI generic voice problem in a sentence: the tool optimised for everything, which means it optimised for nothing specific.

Why the average is always wrong for your brand

Think about what happens mathematically. If you average together the writing style of every software company, every law firm, every HVAC contractor, every nonprofit — you get something that belongs to none of them. The language model training bias isn't toward bad writing. It's toward the mean.

This is why AI content describing a "cutting-edge solution" for a plumber sounds identical to AI content describing a "cutting-edge solution" for an enterprise SaaS company. The model learned both from the same corpus. It smoothed out the differences.

Your brand's actual terminology gets replaced with industry-standard language. Your specific product names disappear into generic categories. The way your founder explains things gets flattened into how everyone in your sector explains things.

The homogeneity isn't a bug

Here's what most people miss: AI's homogeneous output isn't a failure of the technology. It's the technology working exactly as designed.

These models were built to be useful across millions of different use cases. A tool that writes specifically like a Brooklyn bakery would be useless for a cybersecurity firm. So the models learned to write like neither — to produce competent, flexible, adaptable prose that works reasonably well for any prompt.

The problem is that "reasonably well" isn't what brands need. Content that could belong to anyone creates no brand differentiation. It doesn't sound like your business because it was never trained on your business.

Why adding "write in a friendly tone" doesn't fix it

The obvious workaround is prompting. Tell the AI to sound casual, or professional, or authoritative, or warm. Most users try this first.

It helps — slightly. The model can shift register. But it's still drawing from the same averaged training corpus. "Friendly" becomes the statistical average of all friendly writing on the internet. "Professional" becomes generic corporate speak. The no-brand-voice problem doesn't disappear because you adjusted the mood. The underlying language patterns stay the same.

I've tested this extensively. Give three different AI tools the same brief with the same tone instructions, and you'll get output that's surprisingly similar. Not identical, but interchangeable. The voice isn't coming from anywhere specific.

What the model would need to actually know

For AI to write like your brand, it would need to learn from your brand's actual content. Not the category. Not similar businesses. Your specific pages, your terminology, your way of describing your products and services.

That's a fundamentally different approach from training on the whole internet. It requires the tool to read and ingest your material before generating anything — to build a temporary understanding of how your business talks, then apply that understanding to the output.
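In code terms, the ingest-then-generate approach amounts to putting the brand's own words into the prompt before anything is written. Here's a minimal, illustrative sketch — the function names and prompt wording are my own assumptions, not any particular tool's implementation, and a real tool would wrap this around an actual LLM call:

```python
# Illustrative sketch of "read the brand's content first, then generate".
# All names here are hypothetical; a real tool would crawl the site and
# pass the assembled prompt to a language model.

def extract_brand_context(pages: list[str], max_chars: int = 2000) -> str:
    """Combine raw page text into one bounded context block."""
    combined = "\n\n".join(p.strip() for p in pages if p.strip())
    return combined[:max_chars]

def build_prompt(brand_pages: list[str], brief: str) -> str:
    """Prepend the brand's actual language so generation anchors on it,
    not on the training-corpus average."""
    context = extract_brand_context(brand_pages)
    return (
        "You are writing for the business described below. "
        "Use its exact product names and terminology.\n\n"
        f"--- Brand content ---\n{context}\n\n"
        f"--- Brief ---\n{brief}\n"
    )

# Example: the brand's own pages supply the specifics the model lacks.
pages = [
    "Our flagship product is the ComfortGuard System.",
    "We've looked after our neighbours' furnaces for forty years.",
]
prompt = build_prompt(pages, "Write a post about winter furnace maintenance.")
```

The point of the sketch is the ordering: the brand's content enters the context window before the brief, so "ComfortGuard System" is available to the model at generation time rather than being smoothed into "comprehensive HVAC solution."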

This is exactly what BrandDraft AI does differently: it reads your website URL first and uses that intelligence to generate articles that reference your actual product names, terminology, and voice. Not the industry average. Yours.

Content personalisation at the brand level isn't about selecting from preset options. It's about the model understanding your specific business before it writes a word.

Why this matters more than most marketers realise

The AI content flood is real. Millions of articles published monthly, most of them sounding exactly alike. Readers are developing immunity to it — that slightly-off quality where everything is technically correct but nothing feels specific.

Brands that sound distinctive cut through. Not because distinctive is inherently better, but because it signals authenticity. It suggests someone who actually knows this business wrote this content. That signal is worth more as generic AI output becomes the baseline.

The structural problem with AI trained on everything is permanent. These models won't suddenly develop individual voices. They'll keep producing average language for average brands — unless you feed them something specific to anchor on.

The input problem has an input solution

The training data can't change after the model is built. But the input at generation time can. This is where what you feed the AI matters more than how you prompt it.

A model that reads your actual content first has context a general prompt can't provide. It knows your flagship product is called the "ComfortGuard System," not a "comprehensive HVAC solution." It knows you call customers neighbours because you always have.

The average language problem doesn't get solved by better prompts. It gets solved by giving the tool better material to learn from — yours.