Catching Computer Science Cheaters


Academic plagiarism has long been a problem in computer science faculties, but instructors and other university staff are increasingly turning to a series of free, Web-based tools to ferret out plagiarized code and catch cheaters. "Computers make plagiarism easier, but also facilitate detection," said Michael Wise, author of YAP, a Web-based program that searches for "borrowed" code in computer science homework assignments.

To deal with such dishonesty, professors are using YAP and other programs, including MOSS, or "measure of software similarity."

MOSS searches for similarities among programs written in the Ada, C, C++, Java, Pascal, Lisp, ML and Scheme programming languages. Professors submit batches of student programs to the MOSS server, then obtain the results minutes later via the tool's Web site, where a visual interface highlights suspect code in red.

During the school year, the MOSS program, developed by a computer science professor at the University of California, Berkeley, processes between 50 and 100 submissions a week.

The MOSS algorithm is based on "code-sequence matching," says Alex Aiken, the program's developer.

Aiken says MOSS does not analyze a program's algorithms, a task that is still too difficult for software. Instead, it bases its findings on syntax, that is, the structure of the program itself. He says this method is more effective than counting how often words appear in a program, the usual method of software plagiarism detection.
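MOSS's actual matching procedure is not public, so the following is only a generic illustration of what matching on a program's structure, rather than counting its words, can look like. It assumes a hypothetical tokenizer that replaces every identifier with ID and every number with NUM. It then compares overlapping runs of five tokens (k-grams) and scores two programs by the Jaccard overlap of those runs. The names `tokenize`, `kgrams` and `sequence_similarity`, the keyword list and k = 5 are all illustrative assumptions, not MOSS's design.

```python
import re

# Illustrative keyword set; any other name in the source becomes "ID".
KEYWORDS = {"int", "for", "while", "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str) -> list[str]:
    """Map source text to a token stream that ignores names and literal values."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")    # variable and function names
        elif tok.isdigit():
            tokens.append("NUM")   # numeric literals
        else:
            tokens.append(tok)     # punctuation and single-character operators
    return tokens


def kgrams(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    """All runs of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def sequence_similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard overlap of the two programs' token k-grams, from 0.0 to 1.0."""
    ga, gb = kgrams(tokenize(a), k), kgrams(tokenize(b), k)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


original = "int total = 0; for (i = 0; i < n; i++) { total = total + a[i]; } return total;"
renamed = "int sum = 0; for (j = 0; j < n; j++) { sum = sum + a[j]; } return sum;"
unrelated = "while (x > 0) { x = x - 1; }"

print(sequence_similarity(original, renamed))  # 1.0
print(sequence_similarity(original, unrelated))
```

Renaming `total` to `sum` and `i` to `j` leaves the token stream unchanged, so the two loops score 1.0, while the short, unrelated `while` loop shares almost none of their five-token runs.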

Guido Malpohl, who wrote the MOSS Web interface, also authored another software plagiarism detector called JPlag. JPlag currently works only on programs written in Java, though Malpohl says it will be extended to other programming languages.

Malpohl says that whereas MOSS keeps a database of internal representations of the submitted programs and searches it for similarities, JPlag compares the submitted programs in pairs, trying to find the largest possible set of similarities within each pair.
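JPlag's authors have described its pairwise comparison in terms of "greedy string tiling" over token streams: repeatedly find the longest runs of tokens the two programs share, then mark them as "tiles" so they cannot be matched again. The sketch below is a simplified, unoptimized version of that idea. It assumes both programs have already been reduced to token lists. The names `greedy_string_tiling` and `tiling_similarity`, the minimum match length of 3 and the 2 * covered / (|a| + |b|) score are illustrative assumptions, not JPlag's actual code or defaults.

```python
def greedy_string_tiling(a: list[str], b: list[str], min_match: int = 3) -> list[tuple[int, int, int]]:
    """Cover shared token runs of a and b with non-overlapping tiles.

    Each tile is (start_in_a, start_in_b, length). Longer runs are tiled
    first; runs shorter than min_match are ignored.
    """
    if min_match < 1:
        raise ValueError("min_match must be at least 1")
    marked_a = [False] * len(a)
    marked_b = [False] * len(b)
    tiles: list[tuple[int, int, int]] = []
    while True:
        best = 0
        matches: list[tuple[int, int, int]] = []
        # Find the longest runs of equal tokens that are not yet part of a tile.
        for i in range(len(a)):
            for j in range(len(b)):
                length = 0
                while (i + length < len(a) and j + length < len(b)
                       and not marked_a[i + length] and not marked_b[j + length]
                       and a[i + length] == b[j + length]):
                    length += 1
                if length < min_match:
                    continue
                if length > best:
                    best, matches = length, [(i, j, length)]
                elif length == best:
                    matches.append((i, j, length))
        if not matches:
            return tiles
        # Turn each maximal run into a tile, unless a tile created earlier
        # in this round already claimed some of its tokens.
        for i, j, length in matches:
            if any(marked_a[i:i + length]) or any(marked_b[j:j + length]):
                continue
            for t in range(length):
                marked_a[i + t] = True
                marked_b[j + t] = True
            tiles.append((i, j, length))


def tiling_similarity(a: list[str], b: list[str], min_match: int = 3) -> float:
    """Share of both token streams covered by tiles: 2 * covered / (|a| + |b|)."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


# Program b holds a's two blocks in swapped order, plus one extra token.
tokens_a = "A B C D E F G".split()
tokens_b = "E F G X A B C D".split()
print(greedy_string_tiling(tokens_a, tokens_b))  # [(0, 4, 4), (4, 0, 3)]
print(tiling_similarity(tokens_a, tokens_b))     # about 0.93 (14 / 15)
```

Because tiles are found wherever they occur, swapping the order of the two blocks does not hide the copying. This naive version tries every pair of starting positions and becomes slow on large inputs; it is only meant to show the idea.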

"The standard algorithm [in plagiarism detection] just looks at the frequency that keywords appear in the file," he said. "For example, count all the IFs, THENs and ELSEs and see if they match in two programs. The thing that people are least likely to change is the control structure of the program."

Aiken said that the thinking behind JPlag is that while a cheater might make all kinds of cosmetic alterations to a program, that program's control structure is the part that is least likely to be altered by someone who doesn't understand the code.

The trouble with keyword counting, however, is that these primitive constructs, the IF, THEN and ELSE statements, are used in roughly the same proportions in just about every program. As a result, plagiarism-detection software that relies on this scheme is prone to false positives.
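To see why keyword counting misfires, here is a minimal sketch of the approach. It assumes each program is reduced to a vector of keyword counts and that two vectors are compared by cosine similarity. The keyword list, the function names and the choice of cosine similarity are illustrative assumptions, not a description of any particular tool.

```python
import math
import re
from collections import Counter

# Illustrative Pascal-style keyword list; a real tool would use the full
# keyword set of the language it analyzes.
KEYWORDS = ("if", "then", "else", "while", "for", "begin", "end")


def keyword_profile(source: str) -> list[int]:
    """Count each keyword's occurrences, ignoring case and all other words."""
    words = re.findall(r"[A-Za-z]+", source.lower())
    counts = Counter(w for w in words if w in KEYWORDS)
    return [counts[k] for k in KEYWORDS]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity of two count vectors; 1.0 means identical proportions."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Two programs with unrelated logic but identical IF/THEN/ELSE counts.
prog1 = "IF x > 0 THEN y := 1 ELSE y := 2; IF z THEN w := 3 ELSE w := 4"
prog2 = "IF balance < limit THEN approve ELSE reject; IF retries THEN retry ELSE abort"

print(keyword_profile(prog1))  # [2, 2, 2, 0, 0, 0, 0]
print(keyword_profile(prog2))  # [2, 2, 2, 0, 0, 0, 0]
print(cosine(keyword_profile(prog1), keyword_profile(prog2)))  # 1.0
```

The two programs share no logic, yet their keyword profiles are identical, so this scheme reports them as a perfect match.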

Aiken claims that MOSS avoids that methodology. Just how the program works, though, is a secret.

"I'd rather not disclose that completely, because that makes it easier to break the system," Aiken said.

MOSS access is restricted to university faculty and staff, so students can't try to circumvent the system by running their programs through it. There are currently about 300 accounts on the system.

And while the tools are leading to improved cheating detection, the problem of plagiarism isn't going away.

"Computer science instructors have guessed that on any given assignment, between 5 and 20 percent of the students have collaborated 'beyond what is reasonable,'" said Kenneth C. Moyle, computing services coordinator for science faculty at McMaster University.

Moyle's impression is that cheating is a serious problem in computer science courses because it is so easy to plagiarize a program by making small changes to alter its appearance.

"It's hard to prove that there's been cheating," Moyle said. "In any given year, there are probably five to 10 times when students are actually confronted about cheating, but it's suspected that it happens much more than that."

The difficulty of proving wrongdoing is perhaps why academics are showing so much interest now in plagiarism-detection software.

"[Plagiarism] is a noted problem everywhere, but one which is deeply unfashionable and generally swept under the carpet," Wise said. "Plagiarism detection is deeply unfashionable because it is seen as being very negative, and the rhetoric goes that we should be teaching our students better. This, of course, is nonsense, because students are under pressure - and under pressure we all, sometimes, do things we'd not do otherwise."

Aiken said that the cheating-detection programs stand a chance of reducing the incidence of plagiarism.

"After students get used to the idea that there's a real risk of getting caught, then I think people will be more circumspect about cheating," Aiken said.