Detecting hate speech is a task that even state-of-the-art machine learning models struggle with. That’s because harmful speech comes in many different forms, and models must learn to differentiate each one from innocuous turns of phrase. Historically, hate speech detection models have been evaluated by measuring their performance on held-out data using metrics like accuracy. But this makes it tough to identify a model’s weak points, and it risks overestimating a model’s quality due to gaps and biases in hate speech datasets.
In search of a better solution, researchers at the University of Oxford,