Can privacy and security be preserved in the course of large-scale textual data analysis? As it turns out, yes. A team of Amazon researchers has proposed a way to anonymize customer-supplied data. They claim that their approach, which works by rephrasing samples and basing the analysis on the new phrasing, results in at least 20-fold greater guarantees on expected privacy.
“Questions about data privacy are frequently met with the answer ‘It’s anonymized! Identifying features have been scrubbed!’ However, studies … show that attackers can deanonymize data by correlating it with ‘side information’ from other data sources,” Tom Diethe, machine learning manager in the Amazon Alexa Shopping organization, wrote in a blog post.
The researchers’ solution involved adding noise to make data related to specific