
Study Shows AI Chatbots Can Detect Race, But Racial Biases Reduce Empathy in Responding

Under the cover of anonymity and in the company of strangers, the appeal of the digital world as a place to seek mental health support is growing. This phenomenon is reinforced by the fact that more than 150 million people in the United States live in federally designated mental health professional shortage areas.

“I really need your help, as I’m too scared to talk to a therapist and I can’t get in touch with one anyway.”

“Am I overreacting or feeling hurt because my husband makes fun of me in front of his friends?”

“Could some strangers please weigh in on my life and decide my future for me?”

The quotes above are real posts from users of Reddit, a social media news website and forum where users can share content or ask for advice in smaller, interest-based forums known as “subreddits.”

Using a dataset of 12,513 posts with 70,429 responses from 26 mental health subreddits, researchers from MIT, New York University (NYU), and the University of California Los Angeles (UCLA) devised a framework to assess the equity and overall quality of mental health support chatbots based on large language models (LLMs) such as GPT-4. Their work was recently presented at the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).

To accomplish this, the researchers asked two licensed clinical psychologists to evaluate 50 randomly selected Reddit posts seeking mental health support, pairing each post with either a real response from a Reddit user or a response generated by GPT-4. Without knowing which responses were real and which were AI-generated, the psychologists were asked to rate the level of empathy in each response.

Mental health support chatbots have long been explored as a way to widen access to care, but powerful LLMs like OpenAI’s ChatGPT are transforming human-AI interaction, with AI-generated responses becoming increasingly difficult to distinguish from those of real people.

Despite these remarkable advances, the unintended consequences of AI-driven mental health support have drawn attention to its potentially deadly risks. In March of last year, a Belgian man died by suicide following an exchange with ELIZA, a chatbot designed to mimic a psychotherapist and powered by an LLM called GPT-J. A month later, the National Eating Disorders Association suspended its chatbot Tessa after it began dispensing dieting tips to patients with eating disorders.

Saadia Gabriel, a recent MIT postdoc who is now an assistant professor at UCLA and first author of the paper, admitted that she was initially very skeptical of how effective mental health chatbots could actually be. Gabriel conducted this research during her time as a postdoc at MIT in the Healthy Machine Learning Group led by Marzyeh Ghassemi, an MIT associate professor in the Department of Electrical Engineering and Computer Science and the MIT Institute for Medical Engineering and Science who is affiliated with the Abdul Latif Jameel Clinic for Machine Learning in Health and the Computer Science and Artificial Intelligence Laboratory.

What Gabriel and the research team found was that GPT-4 responses were not only more empathetic overall, but also 48 percent better at encouraging positive behavioral change than human responses.

However, in the bias assessment, the researchers found that GPT-4’s response empathy levels were reduced for Black posters (2 to 15 percent lower) and Asian posters (5 to 17 percent lower) compared with white posters or posters whose race was unknown.

To assess bias in GPT-4 responses and human responses, researchers included several types of posts with explicit demographic leaks (e.g., gender, race) and implicit demographic leaks.

An explicit demographic leak would look like this: “I’m a 32-year-old Black woman.”

An implicit demographic leak, on the other hand, would look like this: “I’m a 32-year-old girl wearing my natural hair,” in which keywords signal certain demographics to GPT-4.
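As a rough illustration of how a model might pick up on these two kinds of leaking, the sketch below (with hypothetical posts and prompt wording that are assumptions for this article, not the authors’ code) asks GPT-4 to infer demographic attributes from an explicit and an implicit variant of the same post:

```python
# Illustrative sketch only -- not the study's actual evaluation code or prompts.
# Assumes the openai Python package (>=1.0) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical posts showing the two kinds of demographic leaking described above.
posts = {
    "explicit": "I'm a 32-year-old Black woman. Lately I can't stop worrying about work.",
    "implicit": "I'm a 32-year-old girl wearing my natural hair. Lately I can't stop worrying about work.",
}

for leak_type, post in posts.items():
    # Ask the model to guess demographic attributes from the post alone,
    # mirroring the idea that explicit statements or implicit keywords can signal race.
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "Infer the likely race and gender of this post's author, "
                           "or reply 'unknown' if the post gives no signal.",
            },
            {"role": "user", "content": post},
        ],
    )
    print(leak_type, "->", completion.choices[0].message.content)
```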

With the exception of Black female posters, GPT-4’s responses were found to be less affected by explicit and implicit demographic leaking than human responses, which tended to be more empathetic when responding to posts with implicit demographic cues.

“The structure of the input you give (the LLM), and some information about the context, such as whether you use patient demographics, has a huge impact on the response you get back,” says Gabriel.

The paper notes that explicitly instructing LLMs to use demographic attributes can effectively mitigate bias, as this was the only method in which the researchers did not observe a significant difference in empathy across demographic groups.
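A minimal sketch of what such an explicit instruction could look like in practice is shown below; the helper function and prompt text are illustrative assumptions, not the instructions used in the study:

```python
# Illustrative sketch only -- the prompt wording and helper are assumptions,
# not the instructions used in the paper. Assumes openai>=1.0 and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def empathetic_reply(post: str, demographics: str | None = None) -> str:
    """Generate a supportive reply, optionally with an explicit demographic instruction."""
    system = (
        "You are replying to a post on a peer-to-peer mental health forum. "
        "Respond with empathy and encourage positive, safe next steps."
    )
    if demographics:
        # The explicit instruction: tell the model to take the poster's
        # demographic attributes into account when crafting its reply.
        system += f" Take into account that the poster identifies as {demographics}."

    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": post},
        ],
    )
    return completion.choices[0].message.content

# Hypothetical usage:
print(empathetic_reply(
    "I'm a 32-year-old Black woman and I feel completely burned out.",
    demographics="a Black woman",
))
```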

Gabriel hopes this work can help ensure more comprehensive and thoughtful evaluation of LLMs deployed in clinical settings across diverse demographic subgroups.

“LLMs are already being used to provide patient-facing support and are deployed in medical settings, in many cases to automate inefficient human systems,” says Ghassemi. “Here we have shown that while state-of-the-art LLMs are generally less affected by demographic leaking than humans in peer-to-peer mental health support, they do not provide equitable mental health responses across inferred patient subgroups. We have many opportunities to improve models so that they provide better support when used.”
