Featured in this episode of Tech News of the Week
So here’s a weird dichotomy: On the one hand, Google has a well-earned reputation for happily serving online ads that lead users into scams and malware infections. On the other, Gmail has, for the better part of two decades at this point, had a reputation for doing a borderline amazing job of killing spam messages. In fact, the spam filtering it had from the get-go is probably one major reason a lot of people signed up for, and still use, Gmail.
Obviously it’s always been a cat-and-mouse game with spammers. Google’s latest tool is meant to keep up with one of the dumbest/cleverest ways spammers evade scans: using Unicode lookalike characters so a text scan doesn’t see a word. In the olden days we’d call this leetspeak, where you’d replace an i with a 1, an o with a 0, and so on. Unicode takes it further by letting you use things like “Mathematical Bold Capital C” (U+1D402), which looks damn near identical to the regular ASCII letter C.
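To make the trick concrete, here’s a minimal Python sketch of a keyword scan getting fooled by that exact character. The `BLOCKLIST` set and both function names are made up for illustration; the only real API used is the standard library’s `unicodedata.normalize`. It also shows that the obvious fix, NFKC normalization, only goes so far: it folds the math-bold C back to a plain C, but it does nothing about a Cyrillic “а” standing in for a Latin “a”.

```python
import unicodedata

# Hypothetical keyword list, for illustration only.
BLOCKLIST = {"casino"}


def naive_hit(text: str) -> bool:
    """Plain substring scan on the raw text."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def normalized_hit(text: str) -> bool:
    """Same scan, after folding compatibility characters with NFKC."""
    folded = unicodedata.normalize("NFKC", text).lower()
    return any(word in folded for word in BLOCKLIST)


# U+1D402 is MATHEMATICAL BOLD CAPITAL C, standing in for an ASCII "C".
disguised = "\U0001D402asino bonus inside"

print(naive_hit(disguised))       # the raw scan misses the word
print(normalized_hit(disguised))  # NFKC folds the bold C back to "C"

# U+0430 is CYRILLIC SMALL LETTER A. It has no compatibility mapping,
# so NFKC leaves it alone and the scan still misses.
print(normalized_hit("c\u0430sino"))
```

Normalization tables and confusable lists help, but spammers have a very large character set to pick from, which is the gap a learned approach is trying to close.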
The new Google tool is an obvious backronym called RETVec (Resilient & Efficient Text Vectorizer), and the idea is that it doesn’t matter which encoding tricks are used. The system maps characters to what they look like, rather than to their coded definition, so it can work out the actual meaning instead of relying on a simple blacklist or lookup table. And since it’s a machine-learning model, it should adapt as people continue to mark messages as spam or phishing.
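Here’s a toy Python sketch of why a vectorizer beats a lookup table for this job. To be loud about it: this is not RETVec’s actual mechanism, just a hand-rolled stand-in that uses character bigram counts and cosine similarity, and every name in it is hypothetical. The point is only that a vector representation can say “this is close to a word I know” where a set lookup can only say yes or no.

```python
import math
from collections import Counter


def bigrams(word: str) -> Counter:
    """Count character bigrams, with ^ and $ marking the word edges."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(x: str, y: str) -> float:
    return cosine(bigrams(x), bigrams(y))


known_spam_words = {"lottery"}  # hypothetical lookup table

# Exact lookup: one swapped character and the word is invisible.
print("l0ttery" in known_spam_words)

# Vector comparison: the obfuscated word still lands near the original,
# while an unrelated word does not.
print(similarity("lottery", "l0ttery"))
print(similarity("lottery", "meeting"))
```

A real system would learn its representation from data rather than count bigrams, but the contrast with a lookup table is the same.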
It’s an interesting idea that should work better than some suggested absolutes like “disallow Unicode altogether in emails,” which is wrong for so many reasons I don’t even have time to get into right now.