With its ability to pump out confident, humanlike prose almost instantaneously, ChatGPT is a valuable cheating tool for students who want to outsource their writing assignments. When fed a homework or test question from a college-level course, the generative artificial intelligence program is liable to be graded as highly as a college student, if not higher, according to a new study published on Thursday in Scientific Reports. With no reliable tools for distinguishing AI content from human work, educators will have to rethink how they structure their courses and assess students, and consider what humans might lose if we never learn how to write for ourselves.
In the new research, computer scientists and other academics compiled 233 student assessment questions from 32 professors who taught across eight different disciplines at New York University Abu Dhabi. Then they gathered three randomly selected student answers to those questions from each professor and also generated three different answers from ChatGPT. Trained subject graders, blind to the circumstances of the study, assessed all the answers. In nine of the 32 classes, ChatGPT’s text received equivalent or higher marks than the student work. “The current version of ChatGPT is comparable, or even superior, to students in nearly 30 percent of courses,” wrote study authors Yasir Zaki and Talal Rahwan, both computer scientists at N.Y.U. Abu Dhabi, in an e-mail to Scientific American. “We expect that this percentage will only increase with future versions.”
The findings are far from the first to suggest that generative AI models can excel at assessments that are typically reserved for humans. GPT-3.5, the model that powers ChatGPT, and the newer model GPT-4 can both pass various Advanced Placement tests, the SAT and sections of the GRE with impressive grades, according to OpenAI. GPT-4 also purportedly shines at a bar exam, the LSAT and various sommelier tests, per the company’s assessment. Outside research has shown similar results, with trials demonstrating that GPT-3.5 can surpass the human median score on the Medical College Admission Test and Ivy League final exams. The new study adds to the growing body of work that hints at how disruptive generative AI is set to become in schools, assuming it hasn’t already covertly worked its way into every classroom. In response, teachers and education experts say they need to adapt.
To try to prevent students from fabricating assignment answers with ChatGPT, Debora Weber-Wulff, a computer science professor at the University of Applied Sciences for Engineering and Economics in Berlin (HTW Berlin), has turned to the popular large language model (LLM) herself. She has been preparing for next semester by running exam and homework questions through the AI and then modifying the questions to trip the machine up. “I want to make sure that I have exercises that can’t be simply solved using ChatGPT,” she says. This strategy isn’t foolproof: there are already more-advanced LLMs out there, and updates and fine-tuning mean ChatGPT is liable to change how it responds to prompts over time. There may also be certain tricks to yield suitable answers from ChatGPT that Weber-Wulff hasn’t thought of. “Maybe my students will surprise me and show me that it was possible,” she says. “I don’t know. I will be learning, too.” But what the computer scientist does know is that she’s putting in more effort to try to thwart academic dishonesty now than she was before. And the problem goes far beyond novel technology.
AI developers did not exactly invent cheating. Prior to ChatGPT’s release, thousands of people in Kenya offered essay-writing services to students, notes Ethan Mollick, an associate professor of management at the University of Pennsylvania’s Wharton School of Business, who researches the impacts of AI on education. But getting a person to write your essay costs money, while using ChatGPT does not. LLMs have simply made cheating on certain assignments easier and more accessible than ever before, he says. Mollick highlights a challenge that has been present and growing for decades: some students view school assignments as boxes to check, not opportunities to learn.
The incentive structure of education has become muddled, says Joe Magliano, an educational psychologist at Georgia State University. Students are often rewarded for and reduced to their grades—not their effort or understanding. Higher education, in particular, has “incentivized students to use demonstrably poor learning strategies,” Magliano adds. Ian O’Byrne, an education professor at the College of Charleston, who researches literacy and technology, agrees. “The real big crisis here, it’s less about AI,” he says. “It’s just these generative tools are allowing us to hold up a mirror to what’s really happening in and out of our classrooms.”
The focus for educators thus should not be on preventing students from using ChatGPT but rather on addressing the root causes of academic dishonesty, suggests Kui Xie, an educational psychologist at Michigan State University. Xie studies student motivation, and he chalks up cheating and plagiarism to people’s attitudes toward learning. If a student is motivated to master a skill, there’s no reason to cheat. But if their primary goal is to appear competent, outcompete peers or just get the grade, they’re liable to use any tool they can to come out ahead—AI included.
AI-based cheating not only makes it more difficult to assess students’ knowledge but also threatens to prevent them from learning how to write for themselves. Writing well is a basic human linguistic skill, useful in most professions and valuable as a mode of individual expression. But writing is also a key learning tool in and of itself. Cognitive research has shown that writing helps people build connections between concepts, boosts insight and understanding, and improves memory and recall across a variety of topics, says Kathleen Arnold, a psychologist at Radford University, who studies how writing and learning are interrelated. If a student opts to outsource all their written assignments to ChatGPT, they not only won’t become a better writer—they might also be stunted in their academic and intellectual growth elsewhere. Arnold says it’s a prospect that worries her. But at the same time, it’s an opportunity to rethink teaching and even reconceptualize AI tools as educational opportunities rather than threats to learning.
Educators at every level can design their courses and assignments to better encourage growth over competition, and technology can be a part of that. Teachers could use what Mollick calls “flipped classrooms,” where students would self-direct learning at home—aided in part by AI tutoring tools—and then use class time for working with peers. Instead of proving their grasp of the new material through homework, which could be completed by an AI, they would build on and demonstrate their knowledge through in-class projects.
Phasing out or minimizing grades is another possibility, Xie says. If a teacher’s feedback to students is more individualized and focused on process—rather than just assigning a quantitative value to the final product—students might be less inclined to cheat with AI. More frequent lower-stakes assignments could also help. Qualitative feedback and assessing a larger volume of student work both take more time and effort from teachers, but here again, Xie believes generative AI could be used as a tool to speed up the process.
ChatGPT might also be useful for students in the idea-formation process for any assignment as a brainstorming partner to bounce thoughts off of, O’Byrne says. By teaching students how to apply AI tools for their own benefit, clearly outlining expectations for ethical use and encouraging transparency, educators could end up with tech-savvier pupils who would be less prone to let AI steer the whole ship. Other strategies might include using assessments that avoid a focus on rote memorization and instead shift toward more analysis and synthesis. The N.Y.U. Abu Dhabi study found that ChatGPT was most adept at generating responses to fact-based questions; it fell significantly behind human students’ performance when it was given conceptual prompts.
In an ideal world, our relationship with generative AI might end up similar to the one we have with calculators and spellcheck, Magliano says. All are tools with helpful and less helpful applications. It’s just a matter of ensuring students know when to use them—and when not to.