AI Watermarks Are No Match for Attackers


Soheil Feizi considers himself an optimistic person. But the University of Maryland computer science professor is blunt when he sums up the current state of watermarking AI images. “We don’t have any reliable watermarking at this point,” he says. “We broke all of them.”

For one of the two types of AI watermarking he tested for a new study (“low perturbation” watermarks, which are invisible to the naked eye), he’s even more direct: “There’s no hope.”

Feizi and his coauthors looked at how easy it is for bad actors to evade watermarking attempts. (He calls it “washing out” the watermark.) In addition to demonstrating how attackers might remove watermarks, the study shows how it’s possible to add watermarks to human-generated images, triggering false positives. Released online this week, the preprint has yet to be peer-reviewed. But Feizi has been a leading figure in examining how AI detection might work, so the research is worth paying attention to even at this early stage.

It’s timely research. Watermarking has emerged as one of the more promising strategies to identify AI-generated images and text. Just as physical watermarks are embedded on paper money and stamps to prove authenticity, digital watermarks are meant to trace the origins of images and text online, helping people spot deepfaked videos and bot-authored books. With the US presidential elections on the horizon in 2024, concerns over manipulated media are high—and some people are already getting fooled. Former US President Donald Trump, for instance, shared a fake video of Anderson Cooper on his social platform Truth Social; Cooper’s voice had been AI-cloned.

This summer, OpenAI, Alphabet, Meta, Amazon, and several other major AI players pledged to develop watermarking technology to combat misinformation. In late August, Google’s DeepMind released a beta version of its new watermarking tool, SynthID. The hope is that these tools will flag AI content as it’s being generated, in the same way that physical watermarking authenticates dollars as they’re being printed.

It’s a solid, straightforward strategy, but it might not be a winning one. This study is not the only work pointing to watermarking’s major shortcomings. “It is well established that watermarking can be vulnerable to attack,” says Hany Farid, a professor at the UC Berkeley School of Information.

This August, researchers at the University of California, Santa Barbara and Carnegie Mellon coauthored another paper outlining similar findings, after conducting their own experimental attacks. “All invisible watermarks are vulnerable,” it reads. This newest study goes even further. While some researchers have held out hope that visible (“high perturbation”) watermarks might be developed to withstand attacks, Feizi and his colleagues say that even this more promising type can be manipulated.

The flaws in watermarking haven’t dissuaded tech giants from offering it up as a solution, but people working within the AI detection space are wary. “Watermarking at first sounds like a noble and promising solution, but its real-world applications fail from the onset when they can be easily faked, removed, or ignored,” Ben Colman, the CEO of AI-detection startup Reality Defender, says.

