By Julio Madrid

From Weird Phrases to Hate Speech: Why Responsible Machine Translation Matters for AI Ethics



Research paper "A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism"

Eighteen years in the trenches of translation, navigating the ever-shifting sands of languages and cultures, have taught me one thing: change is the only constant. And oh, how the landscape has changed! From the heady days of dial-up modems and clunky dictionaries to the present AI-powered Babel, it's been a whirlwind ride. And one of the biggest game-changers, the elephant in the multilingual room, is Machine Translation (MT).


Now, I'm not a Luddite. I've seen the magic of MT firsthand. Done right, it's a magnificent tool. It's streamlined my workflow, opened doors to new languages, and even sparked fascinating debates about the future of human-machine collaboration. But here's the catch: like any powerful tool, MT needs adult supervision, a responsible hand on the throttle. And lately, I've been scratching my head at the quality of content swirling around the web, a disturbingly large chunk of it churned out by unsupervised MT.


The recent research paper, "A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism," has exposed what we all suspected. Turns out, the awkward phrasing and nonsensical sentence salad I kept encountering in Urdu and Spanish weren't just my quirky reading glasses acting up. With around 5 million newly crawled (web-scraped) pages every month from Common Crawl alone, you can get an idea of the pace at which low-quality content is being generated.
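The paper's core signal is simple enough to sketch: when the very same sentence shows up "translated" into many languages at once, it was almost certainly produced by MT rather than by human translators. Here's a minimal, hypothetical illustration of that multi-way parallelism heuristic; the data layout, sample sentences, and threshold below are my own assumptions for the sketch, not the authors' actual pipeline:

```python
from collections import defaultdict

# Hypothetical corpus of (language, sentence_id, text) records scraped
# from the web; sentence_id links versions of the same sentence.
records = [
    ("en", 1, "Click here to subscribe to our newsletter."),
    ("es", 1, "Haga clic aqui para suscribirse a nuestro boletin."),
    ("ur", 1, "hamari newsletter ke liye yahan click karein."),
    ("en", 2, "She laughed until the teapot whistled."),
]

def multiway_parallelism(records):
    """Count how many distinct languages each sentence appears in."""
    langs_per_sentence = defaultdict(set)
    for lang, sentence_id, _text in records:
        langs_per_sentence[sentence_id].add(lang)
    return {sid: len(langs) for sid, langs in langs_per_sentence.items()}

# Assumed threshold for illustration: a sentence mirrored into this many
# languages is flagged as likely machine-translated.
LIKELY_MT_THRESHOLD = 3

counts = multiway_parallelism(records)
flagged = [sid for sid, n in counts.items() if n >= LIKELY_MT_THRESHOLD]
print(flagged)  # sentence 1 appears in 3 languages -> likely MT
```

The intuition: a human translates a sentence into one or two languages for a reason; a script pushes it into dozens indiscriminately, and that fingerprint is detectable at web scale.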


And here's the real kicker: this low-quality stuff isn't just an eyesore. It's poison seeping into the very foundation of AI programming. Think of it like this: imagine feeding a gourmet chef nothing but instant ramen and fast food. Can you expect Michelin-star dishes to come out of that kitchen? I don’t think so.


That's exactly what's happening with some of these large language models (LLMs), the AI workhorses trained on massive datasets of text and code. They're being fed a steady diet of not-so-good MT, a linguistic buffet gone bad.


I'm talking about the algorithms powering everything from chatbots to search engines, even the AI assistants we entrust with our homes and schedules. We're training AI on misinformation, cultural faux pas, and downright gibberish. It's like building a skyscraper on a foundation of sand – sooner or later, the whole thing will come crashing down. That's the risk we're facing with AI powered by bad MT. One misinterpreted phrase, one culturally insensitive blunder, and suddenly the chatbots are spewing hate speech, the search engines are serving up propaganda, and the AI assistants are subtly influencing us all to become vegan… [dear God, not vegan!]


Screenshot of LASER ccMatrix scraped content used by the research paper team

This isn't just a translator's paranoia. It's a clarion call for a reality check. We need to stop treating MT like a magic bullet and start demanding real accountability. Platforms need to invest in rigorous quality control measures, filtering out the garbage before it pollutes the AI ecosystem. Developers need to prioritize human oversight in LLMs, ensuring they're trained on clean data, not linguistic sewage.


So, what can we do? First, let's be discerning consumers of the content we encounter online. Just because something's in our language doesn't mean it's trustworthy. We need to develop a healthy skepticism, a palate that can distinguish the carefully crafted dish from the AI-microwaved slop. Question the source, scrutinize the language, and don't blindly trust the machine-generated sheen.

Second, let's advocate for responsible MT use. This means demanding transparency from platforms and content creators, pushing for human oversight and quality control checks. It's not enough to slap an MT label on content and call it a day. We need to ensure accuracy, fluency, and cultural sensitivity – the hallmarks of true translation.


And lastly, let's remember that MT is a tool, not a replacement. There's no robot linguist out there who can capture the nuances of a metaphor, the subtle humor of a pun, or the emotional weight of a poem. The human touch is still paramount, the final brushstroke that brings the linguistic canvas to life.

As translators, we're not just wordsmiths; we're cultural ambassadors. We have a responsibility to bridge the gap between languages with integrity and precision. Let's not let the machine mayhem dilute the melody of human expression. Let's work together to ensure that AI becomes a tool for good, not a conduit for lazy, biased language.

ccMatrix tokens in Catalan displayed in MS Visual Studio Code

Because in the end, it's not just about accuracy or efficiency, or about cheap versus expensive. It's about respect. Respect for the languages we wield, respect for the cultures we connect, and respect for the power of words to build bridges, not walls. And that, my friends, is a battle worth fighting, one keystroke at a time.


Now, if you'll excuse me, I have an HR newsletter to post-edit. One of my clients' monthly HR newsletters just came in, translated by… you guessed it… MT, DeepL specifically. And let me tell you, it needs some serious TLC before reaching all the employees' inboxes in Mexico. But that's the beauty of being a translator: we get to clean up the linguistic messes, polish the gems, and ensure that the world hears the true symphony of language, one translated word at a time.
