
I got generative AI to take an undergraduate law exam. It struggled with complex questions

It has been almost two years since generative artificial intelligence was made widely available to the public. Some models showed great promise by passing academic and professional exams.

For example, GPT-4 scored higher than 90% of test takers on the bar exam in the United States. These successes raised concerns that AI systems could easily pass university-level exams. However, my recent study paints a different picture and shows that it is not quite the academic powerhouse some might think it is.

My study

To explore the academic capabilities of generative AI, I looked at how it performed on an undergraduate criminal justice exam at the University of Wollongong, one of the core subjects that students must pass in their degrees. A total of 225 students sat the exam.

The exam lasted three hours and consisted of two sections. In the first, students were asked to evaluate a case study about a crime and the likelihood of a successful prosecution. The second contained a short essay and a set of short-answer questions.

The exam questions assessed a combination of skills, including legal knowledge, critical thinking and the ability to construct persuasive arguments.

Students were not allowed to use AI for their answers, and they sat the assessment in a supervised environment.

I used different AI models to create ten different answers to the exam questions.

Five papers were created by simply pasting the exam question into the AI tool without any prompting. For the other five, I provided detailed guidance and relevant legal content to see whether this would improve the outcome.

I hand-wrote the AI-generated answers into official exam booklets, using fake student names and numbers. These AI-generated answers were mixed with students' actual exam answers and shared anonymously with five tutors for grading.

Importantly, when grading, the tutors did not know that AI had generated ten of the exam answers.

We hand-wrote the AI answers so the tutors would think they came from students.
Kate Aedon/Shutterstock

How did the AI papers perform?

When the tutors were asked about the marking, none of them suspected that any of the answers were AI-generated.

This shows the potential of AI to mimic student responses, and the inability of educators to detect such papers.

But on the whole, the AI papers were unimpressive.

While the AI performed well on the essay-style task, it struggled with complex questions that required in-depth legal analysis.

This means that while AI can mimic human writing style, it lacks the nuanced understanding required for complex legal reasoning.

The students’ exam average was 66%.

On average, the AI papers that had no prompting outperformed only 4.3% of students. Two barely passed the exam (the pass mark is 50%) and three failed.

On average, the papers that used prompts outperformed 39.9% of students. Three of these papers were unimpressive, receiving 50%, 51.7% and 60%, but two did quite well. One achieved 73.3%, the other 78%.

A landing page for ChatGPT asking “How can I help you today?”
Generative AI has earned a reputation for acing difficult tests.
Tada Images/Shutterstock

What does that mean?

These findings have important implications for both education and professional standards.

Despite the hype, generative AI is far from being able to replace humans in intellectually demanding tasks like this law exam.

My study suggests that AI should be viewed more as a tool that, when used appropriately, can enhance human capabilities.

Schools and universities should therefore focus on developing students' skills to collaborate with AI and critically analyse its outputs, rather than relying on the tools' ability to simply spit out answers.

To enable collaboration between AI and students, we may need to rethink some of our traditional ideas about education and assessment.

For example, if a student prompts, reviews and edits an AI-generated piece of work, we might consider that to be their original contribution, and it could still be regarded as a valuable part of learning.
