Are ChatGPT Detectors Reliable? Exploring Their Trustworthiness and Accuracy


ChatGPT is an artificial intelligence (AI) model developed by OpenAI that has been specifically fine-tuned for generating human-like text responses in conversational contexts. Due to its popularity, it's becoming harder to distinguish text generated by AI from text created by humans. As a result, various ChatGPT detectors have emerged to help spot AI-written content, but… are they really effective?


What are ChatGPT and Large Language Models?

An LLM (Large Language Model) is a class of artificial intelligence models characterized by their large size and high capacity for processing natural language. These models are trained on enormous amounts of text data to learn the underlying patterns and structures of human language.

LLMs like GPT-3 have demonstrated great capabilities in various natural language processing tasks, such as text generation and completion, summarization, translation, question answering, and sentiment analysis. 

ChatGPT is a type of LLM. Specifically, it’s a fine-tuned version of the GPT architecture (which falls under the category of LLMs) prepared to create human-like text responses in conversations. In other words, ChatGPT is a specific instance of an LLM that has been fine-tuned for generating text responses in conversational settings.

Due to this ability to mimic human conversation, ChatGPT has been deployed in various applications such as chatbots and virtual assistants. 

GPT models, including ChatGPT, are trained on a diverse range of internet text data, such as social media conversations, forum discussions, and website articles. They learn to predict the next word in a sequence of text based on the preceding context, which enables them to generate coherent and contextually relevant responses.
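To make "predicting the next word" concrete, here is a deliberately toy sketch: a bigram model that counts which word follows which in a tiny made-up corpus, then predicts the most frequent continuation. Real GPT models do this with deep neural networks over subword tokens at a vastly larger scale; the corpus and function names here are illustrative only.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, already split into tokens.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each other word (bigram counts).
next_words = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_words[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequent word observed after `word` in the corpus."""
    return next_words[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" — it appears twice after "the", more than "mat" or "fish"
```

A language model like GPT does the same kind of conditional prediction, but conditions on a long context window rather than a single preceding word, and outputs a probability for every token in its vocabulary.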

Understanding ChatGPT Detectors: Insights into Their Functionality

ChatGPT detectors are tools designed to identify content generated by language models like ChatGPT, i.e., AI-generated texts.

They are tuned to recognize linguistic patterns and features such as variations in sentence length and the frequency of certain words.

In fact, these detectors assess two main measures:

  • Perplexity: the complexity of a text and how unpredictable it is. Since large language models work by predicting the next most probable word in a sequence, detectors consider a phrase less likely to be AI-generated if it doesn't follow this predictable pattern.
  • Burstiness: how much the perplexity varies over the whole document. Human writing tends to mix long, complex sentences with short, simple ones, while AI-generated text is often more uniform.

So, these measures help the detectors to identify whether a text is likely to have been generated by AI. However, there can be false positives and false negatives. 
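The arithmetic behind these two measures can be sketched with a toy model. This is a hypothetical illustration, not any real detector's code: it scores sentences against a tiny smoothed bigram model, where perplexity is the exponential of the average negative log-probability, and "burstiness" is taken as the spread of per-sentence perplexities. Real detectors score text against large neural language models, but the idea is the same.

```python
import math
import statistics
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count next-word frequencies in a token list."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def perplexity(tokens, counts, vocab_size, alpha=1.0):
    """exp of the average negative log-probability, with add-alpha smoothing."""
    log_prob = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        c = counts[prev]
        p = (c[nxt] + alpha) / (sum(c.values()) + alpha * vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(tokens) - 1))

# Toy "training" corpus standing in for the model's training data.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = train_bigram(corpus)
vocab = len(set(corpus))

sentences = [
    "the cat sat on the mat".split(),  # predictable: matches training data
    "the mat sat on the cat".split(),  # unusual word order: less predictable
]
ppls = [perplexity(s, counts, vocab) for s in sentences]
burstiness = statistics.stdev(ppls)  # spread of perplexity across sentences

# The predictable sentence gets a lower perplexity score.
print(ppls[0] < ppls[1])
```

In this framing, a detector flags text whose perplexity is consistently low and whose burstiness is small as likely AI-generated; text with higher, more variable perplexity looks more human.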

These are some of the most popular ChatGPT detectors: 

  • GPTZero
  • OpenAI Classifier
  • AI Detector by Sapling
  • Originality.ai
  • Copyleaks
  • Writer
  • Smodin

Facing the Test: Challenges Faced by ChatGPT Detectors

Since ChatGPT and other similar language models can generate text that is difficult to distinguish from human-written text, the detectors are not very reliable.

So far, different detectors' results vary tremendously when analyzing the same text. While one of them may rate a text as 99% likely to be human-written, another may rate it 99% AI-written.

David Gewirtz from ZDNET tested five online tools and compared them. He concluded that "…if you're going to use a GPT detector, you might want to use a bunch of them on a single piece of text and aggregate the results. Even with that, you're not guaranteed complete accuracy." He added, "At this point, I don't think we're ready to let AI fight AI."

Recently, researchers from Stanford University examined how reliable generative AI detectors are when trying to determine whether a text was written by an AI or by a human.

According to the research, the detectors are unreliable, especially when the author of the text is not a native English speaker.

James Zou, a biomedical data scientist at Stanford University and co-author of the study, said: "Current detectors are clearly unreliable and easily gamed, which means we should be very cautious about using them as a solution to the AI cheating problem." He also pointed out that the results of the detectors "pose serious questions about the objectivity of AI detectors and raise the potential that foreign-born students and workers might be unfairly accused of or, worse, penalized for cheating."

Safety and Privacy: Ethical Considerations in ChatGPT Detection

ChatGPT detectors (like many AI models) rely on large amounts of user-generated data for training and operation. As mentioned before, this data is used to understand and predict linguistic patterns, which are essential for the functioning of these models.

However, the collection, storage, and analysis of this data raise significant privacy concerns:

  • Unauthorized Surveillance: ChatGPT detectors may utilize personal information as part of their training and operation, potentially leading to concerns about unauthorized surveillance. Users may worry about their conversations being monitored without their explicit consent.
  • Data Breaches: Storing large amounts of data increases the risk of data breaches. Unauthorized access to this data could result in the exposure of personal information, which could lead to identity theft, financial fraud, etc.
  • Misuse of Personal Information: The personal information within the data used by the detectors could be susceptible to misuse. This could include targeted advertising based on user conversations, profiling individuals for marketing purposes, or even manipulation.

Moreover, individuals often cannot verify whether a detector company stores their personal information, nor request that it be deleted.

What Does the Future Look Like Regarding ChatGPT Detectors?

To date, it's quite clear that ChatGPT detectors are not reliable for determining whether a text was written by an AI or by a human.

They have several limitations: for example, they rely on static training data that is not updated in real time. Inaccurate or biased training datasets may produce false positives or false negatives, and even discriminatory behavior.

Another recent study concludes that ongoing development in the field of AI detection is necessary to improve accuracy.

So, the future of ChatGPT detectors will depend on various factors, including advancements in technology, regulatory considerations, and societal acceptance. 

Meanwhile, since their results vary enormously when analyzing the same text, it's advisable to run several detectors and average the results. Even then, the final verdict won't be fully trustworthy.
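The "use several detectors and aggregate" advice can be sketched in a few lines. The detector names and scores below are made-up examples of the AI-probability each tool might report for the same text; real detectors expose different scales and interfaces.

```python
import statistics

# Hypothetical scores (probability that the text is AI-generated)
# reported by three different detectors for the same passage.
detector_scores = {
    "detector_a": 0.99,  # "almost certainly AI"
    "detector_b": 0.10,  # "almost certainly human"
    "detector_c": 0.62,
}

avg = statistics.mean(detector_scores.values())
spread = max(detector_scores.values()) - min(detector_scores.values())

print(f"average AI probability: {avg:.2f}")
print(f"disagreement between tools: {spread:.2f}")
```

Notice that the spread itself is informative: when the tools disagree this strongly, the averaged verdict should not be treated as trustworthy evidence either way.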
