‘Noise’ in the machine: Human differences in judgment lead to problems for AI

May 14, 2024 | Science

Many people understand the concept of bias at some intuitive level. In society, and in artificial intelligence systems, racial and gender biases are well documented.If society could somehow remove bias, would all problems go away? The late Nobel laureate Daniel Kahneman, who was a key figure in the field of behavioral economics, argued in his last book that bias is just one side of the coin. Errors in judgments can be attributed to two sources: bias and noise.Bias and noise both play important roles in fields such as law, medicine and financial forecasting, where human judgments are central. In our work as computer and information scientists, my colleagues and I have found that noise also plays a role in AI.Statistical noiseNoise in this context means variation in how people make judgments of the same problem or situation. The problem of noise is more pervasive than initially meets the eye. A seminal work, dating back all the way to the Great Depression, has found that different judges gave different sentences for similar cases.Worryingly, sentencing in court cases can depend on things such as the temperature and whether the local football team won. Such factors, at least in part, contribute to the perception that the justice system is not just biased but also arbitrary at times.Other examples: Insurance adjusters might give different estimates for similar claims, reflecting noise in their judgments. Noise is likely present in all manner of contests, ranging from wine tastings to local beauty pageants to college admissions.[embedded content]Noise in the dataOn the surface, it doesn’t seem likely that noise could affect the performance of AI systems. After all, machines aren’t affected by weather or football teams, so why would they make judgments that vary with circumstance? On the other hand, researchers know that bias affects AI, because it is reflected in the data that the AI is trained on.For the new spate of AI models like ChatGPT, the gold standard is human performance on general intelligence problems such as common sense. ChatGPT and its peers are measured against human-labeled commonsense datasets.Put simply, researchers and developers can ask the machine a commonsense question and compare it with human answers: “If I place a heavy rock on a paper table, will it collapse? Yes or No.” If there is high agreement between the two – in the best case, perfect agreement – the machine is approaching human-level common sense, according to the test.So where would noise come in? The commonsense question above seems simple, and most humans would likely agree on its answer, but there are many questions where there is more disagreement or uncertainty: “Is the following sentence plausible or implausible? My dog plays volleyball.” In other words, there is potential for noise. It is not surprising that interesting commonsense questions would have some noise.But the issue is that most AI tests don’t account for this noise in experiments. Intuitively, questions generating human answers that tend to agree with one another should be weighted higher than if the answers diverge – in other words, where there is noise. Researchers still don’t know whether or how to weigh AI’s answers in that situation, but a first step is acknowledging that the problem exists.Tracking down noise in the machineTheory aside, the question still remains whether all of the above is hypothetical or if in real tests of common sense there is noise. 
The best way to prove or disprove the presence of noise is to take an existing test, r …
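To make the weighting idea above concrete, here is a minimal sketch of how one might score a model against a human-labeled commonsense benchmark while down-weighting noisy questions. This is an illustration only, not the method from the article or from any standard benchmark: the labels are made up, and agreement is measured simply as the fraction of annotators who picked the majority answer.

```python
from collections import Counter

def agreement(human_labels):
    """Fraction of annotators who chose the majority answer.
    1.0 means perfect consensus (no noise); values near 0.5 on a
    yes/no question mean the annotators are split (high noise)."""
    counts = Counter(human_labels)
    return counts.most_common(1)[0][1] / len(human_labels)

def weighted_accuracy(items):
    """Score the model against the majority human label, weighting
    each question by how strongly the annotators agreed on it."""
    score, total_weight = 0.0, 0.0
    for human_labels, model_answer in items:
        majority, _ = Counter(human_labels).most_common(1)[0]
        w = agreement(human_labels)
        total_weight += w
        if model_answer == majority:
            score += w
    return score / total_weight

# Toy example: two questions, five annotators each.
items = [
    # Clear consensus, model agrees: full credit.
    (["yes", "yes", "yes", "yes", "yes"], "yes"),
    # Annotators split 3-2 ("My dog plays volleyball"): this noisy
    # question carries less weight, so the miss costs the model less.
    (["implausible", "plausible", "implausible",
      "plausible", "implausible"], "plausible"),
]
print(weighted_accuracy(items))  # 0.625, vs. 0.5 unweighted
```

An unweighted score treats both questions as equally informative; the weighting formalizes the intuition that a model should not be judged as harshly, or credited as readily, on questions humans themselves cannot agree on. How exactly to set such weights is, as noted above, still an open question.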

