A new study by researchers at the Georgia Institute of Technology has found that large language models (LLMs) exhibit a significant bias toward entities and concepts related to Western culture, even when prompted in Arabic or trained exclusively on Arabic data.
The results, published on arXiv, raise concerns about the cultural fairness and appropriateness of these powerful AI systems when deployed globally.
“We show that multilingual and monolingual Arabic (language models) exhibit a bias toward entities related to Western culture,” the researchers wrote in their paper, titled “Having Beer after Prayer? Measuring Cultural Bias in Large Language Models.”
The study highlights the challenges LLMs face in capturing cultural nuances and adapting to specific cultural contexts, despite advances in their multilingual capabilities.
Potential harms of cultural bias in LLMs
The researchers' findings raise concerns about the impact of cultural bias on users from non-Western cultures who interact with LLM-powered applications. “Because LLMs are likely to have increasing impact through many new applications in the coming years, it is difficult to predict all the potential harm that might be caused by this kind of cultural bias,” Alan Ritter, one of the study's authors, said in an interview with VentureBeat.
Ritter pointed out that current LLM outputs perpetuate cultural stereotypes. “When language models are asked to generate fictional stories about individuals with Arabic names, they tend to associate Arabic male names with poverty and traditionalism. For example, GPT-4 is more likely to select adjectives such as 'willful,' 'poor,' or 'modest.' In contrast, adjectives such as 'wealthy,' 'popular,' and 'unique' appear more frequently in stories about individuals with Western names,” he explained.
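To make that kind of probe concrete, here is a minimal illustrative sketch (not the authors' code): generate stories for names drawn from each culture and count how often a fixed list of trait adjectives appears in each set. The names, adjective list, and `generate_story` function are hypothetical placeholders for whatever model and prompts are being tested.

```python
from collections import Counter

# Hypothetical placeholder for an LLM completion call, e.g. a prompt like
# "Write a short story about a person named {name}."
def generate_story(name: str) -> str:
    raise NotImplementedError("Plug in your model's text-generation API here.")

ARAB_NAMES = ["Omar", "Layla"]          # illustrative examples only
WESTERN_NAMES = ["James", "Emily"]      # illustrative examples only
TRAIT_ADJECTIVES = ["poor", "modest", "willful", "wealthy", "popular", "unique"]

def adjective_counts(names: list[str], stories_per_name: int = 50) -> Counter:
    """Count how often each trait adjective appears in stories about these names."""
    counts: Counter = Counter()
    for name in names:
        for _ in range(stories_per_name):
            story = generate_story(name).lower()
            counts.update(adj for adj in TRAIT_ADJECTIVES if adj in story)
    return counts

# Comparing adjective_counts(ARAB_NAMES) with adjective_counts(WESTERN_NAMES)
# shows which traits the model disproportionately attaches to each group.
```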
Additionally, the study found that current LLMs perform worse for people from non-Western cultures. “In the case of sentiment analysis, LLMs also make more false-negative predictions on sentences containing Arabic entities, suggesting that Arabic entities are more often falsely associated with negative sentiment,” Ritter added.
Wei Xu, the study's lead researcher and author, highlighted the potential consequences of these biases. “These cultural biases can not only harm users from non-Western cultures, but can also affect the model's accuracy when performing tasks and reduce users' trust in the technology,” she said.
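A disparity like that can be quantified by comparing false-negative rates on positive sentences that mention entities from each culture. The sketch below is an assumed outline of that comparison, not the study's code; the example sentences and the `predict` callable are stand-ins.

```python
from typing import Callable

def false_negative_rate(positive_sentences: list[str],
                        predict: Callable[[str], str]) -> float:
    """Fraction of truly positive sentences that the classifier labels 'negative'."""
    predictions = [predict(s) for s in positive_sentences]
    return sum(p == "negative" for p in predictions) / len(predictions)

# Hypothetical positive-sentiment sentences, one mentioning an Arab entity
# and one mentioning a Western entity (stand-ins for benchmark data).
arab_positive = ["I loved the knafeh at that bakery."]
western_positive = ["I loved the cheesecake at that bakery."]

# With any sentiment classifier `my_model` that returns "positive" / "negative":
#   gap = (false_negative_rate(arab_positive, my_model)
#          - false_negative_rate(western_positive, my_model))
# A consistently positive gap would reflect the skew the study describes.
```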
Introducing CAMeL: A novel measure for assessing cultural biases
To systematically assess cultural biases, the team introduced CAMeL (Cultural Appropriateness Measure Set for LMs), a novel benchmark dataset consisting of over 20,000 culturally relevant entities in eight categories, including personal names, foods, clothing items, and religious sites. The entities were curated to provide a contrast between Arab and Western cultures.
“CAMeL provides a foundation for measuring cultural biases in LMs through both extrinsic and intrinsic evaluations,” the research team explains in the paper. Using CAMeL, the researchers evaluated the cross-cultural performance of 12 different language models, including GPT-4, on a range of tasks such as story generation, named entity recognition (NER), and sentiment analysis.
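As a rough illustration of how a benchmark-style evaluation of this kind might be wired together (a schematic sketch, not the released CAMeL data or code), culturally contrasting entities can be substituted into shared prompt templates and the resulting task scores aggregated per culture:

```python
from statistics import mean
from typing import Callable

# Illustrative entity lists and templates; the real benchmark contains
# over 20,000 entities across eight categories.
ENTITIES = {
    "arab":    {"name": ["Fatima"], "food": ["knafeh"]},
    "western": {"name": ["Emma"],   "food": ["apple pie"]},
}
TEMPLATES = ["After work, {name} went out to eat some {food}."]

def evaluate(culture: str, score_task: Callable[[str], float]) -> float:
    """Average a task score (e.g. sentiment accuracy or NER F1) over prompts
    built by filling templates with entities from one culture."""
    scores = []
    for template in TEMPLATES:
        for name in ENTITIES[culture]["name"]:
            for food in ENTITIES[culture]["food"]:
                scores.append(score_task(template.format(name=name, food=food)))
    return mean(scores)

# A systematic gap between evaluate("arab", score_task) and
# evaluate("western", score_task) across tasks would indicate the kind of
# cultural bias the benchmark is designed to surface.
```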
Ritter envisions that the CAMeL benchmark could be used to quickly test LLMs for cultural biases and identify gaps where model developers need to invest more effort to reduce these issues. “One limitation is that CAMeL only tests Arab cultural biases, but we plan to expand this to more cultures in the future,” he added.
The Path Forward: Building Culturally Aware AI Systems
To reduce bias across cultures, Ritter suggests that LLM developers need to hire data labelers from many different cultures during the fine-tuning process, in which LLMs are aligned with human preferences using labeled data. “This will likely be a complex and expensive process, but it is very important to ensure that people benefit equally from technological advances through LLMs and that some cultures are not left behind,” he stressed.
Xu highlighted an interesting result of her work, noting that one of the possible causes of cultural bias in LLMs is the heavy use of Wikipedia data in pre-training. “Although Wikipedia is created by editors around the globe, more Western cultural concepts tend to be translated into non-Western languages than the other way around,” she explained. “Interesting technical approaches could include a better data mix in pre-training, better alignment with humans for cultural sensitivity, personalization, and model unlearning or relearning for cultural adaptation.”
Ritter also noted a further challenge in adapting LLMs to cultures with a smaller online presence. “The amount of raw text available for pre-training language models may be limited. In this case, LLMs may lack important cultural knowledge to begin with, and simply aligning them with the values of those cultures using standard methods may not fully resolve the issue. Creative solutions are needed to find new ways to incorporate cultural knowledge into LLMs to make them more helpful to individuals in these cultures,” he said.
The findings highlight the need for a collaborative effort by researchers, AI developers, and policymakers to address the cultural challenges posed by LLMs. “We view this as a new research opportunity for the cultural adaptation of LLMs in both training and deployment,” Xu said. “This is also a great opportunity for companies to think about localizing LLMs for different markets.”
By prioritizing cultural fairness and investing in the development of culturally aware AI systems, we can harness the power of these technologies to advance global understanding and promote more inclusive digital experiences for users worldwide. Xu concluded: “We are pleased to lay one of the first stones in this direction and look forward to seeing how our dataset, and similar datasets created using our proposed method, are routinely used in the evaluation and training of LLMs to ensure they are less biased toward one culture over the other.”