A Brief Look at AI Detectors

Teachers had barely caught their breath after ChatGPT’s release before so-called AI detectors began to flood the market. GPTZero came first, but it didn’t take long for more and more to show up. OpenAI, the company that runs ChatGPT, even had to build its own detector.

But the competency of these detectors does not inspire confidence. OpenAI themselves admit:

Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written”

-OpenAI

26%?? Yikes! GPTZero claims better numbers, but even at 80-90% accuracy, let’s look at what that means.

Imagine I have a class of 25 students, and they all turn in an assignment. Using the accuracy range above, we can see how many students the detector would judge correctly:

  • 25 x .8 = 20
  • 25 x .9 = 22.5 (round up to 23)

That means that anywhere from 2 to 5 students per assignment will be falsely flagged: either accused of having AI write their assignment, or passed as human-written when the work was actually AI. Yikes again!
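If you’d like to check my math or plug in your own class size, here’s a minimal Python sketch of the same arithmetic. The function name `misjudged_range` is just something I made up for illustration, and it assumes “accuracy” is one overall rate that covers both kinds of mistakes (humans flagged as AI, and AI passed as human), which is a simplification.

```python
import math

def misjudged_range(class_size, low_accuracy, high_accuracy):
    """Return (fewest, most) students a detector would be expected to misjudge.

    Assumes a single overall accuracy figure covers both kinds of error.
    """
    # Round to 9 decimal places first so floating-point noise
    # (e.g. 4.999999999999999) can't tip floor/ceil the wrong way.
    fewest = math.floor(round(class_size * (1 - high_accuracy), 9))
    most = math.ceil(round(class_size * (1 - low_accuracy), 9))
    return fewest, most

fewest, most = misjudged_range(25, 0.80, 0.90)
print(f"Expect between {fewest} and {most} of 25 students to be misjudged")
print(f"Worst case: {most / 25:.0%} of the class")
```

Swap in your own roster size and whatever accuracy range a detector advertises. The range only tells you how many mistakes to expect, not which students they’ll land on.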

So now we’ve got up to 20% of my class incorrectly “detected”, and there’s NO WAY to know which detection is right and which is wrong. This leads us to the conundrum: if AI detectors aren’t 100% accurate, they might as well be 0% accurate.

Of course, we should do our own due diligence and try it out for ourselves. I did just that: I took a selection of text created completely by ChatGPT and ran it through several detectors. Here are the results:

  • GPTZero: “Your text is likely to be written entirely by a human”
  • AIWritingCheck.org: “AI Prediction: Text Written by AI”
  • CheckforAI.com: “Very High Risk”
  • AI Text Classifier from OpenAI: “The classifier considers the text to be possibly AI-generated.”

So GPTZero absolutely bombed it. OpenAI’s detector qualified it with the painful modifier, “possibly.” Mind you, this is the company that generated the text! AIWritingCheck and CheckforAI did better, but I’m not feeling reassured.

Not to mention the follow-up cottage industry of “AI Detector Deflectors” that is now upon us. Sites like GPT-Minus1 seem to be built for the SOLE purpose of tricking detectors into not recognizing that a given text was created by an AI bot. A noble pursuit, this is not.

It’s easy to see how quickly we can fall down the rabbit hole. We’ll soon descend into an arms race of finding deflector-detecting detectors that double-detect deception.

Where does it end?

Perhaps we need to recognize that AI detectors are a losing game for everyone. Instead, let’s look at ways to embrace AI and teach our students to use it with integrity.

2 responses to “A Brief Look at AI Detectors”

  1. Eric H. Roth

    Excellent points! While 80% accuracy would be far superior to the estimated 26%, we should certainly mind the gap as you delineate. Many English teachers have become quite familiar with both false positives and missed plagiarism with Turnitin. It seems naive to expect plagiarism detection programs to catch all ChatGPT papers in the near future.

    What’s to be done? Explore, learn, play, share and deploy these powerful, rapidly evolving AI programs in our classrooms! How can we learn alongside our students? Can we help our students ask better, sharper and smarter questions to AI programs? What can we create together with these new tools?

    1. Brent

      All great questions, Eric! I hope to continue exploring those ideas here, and I encourage anyone to share their experiences, good or bad!
