AI-Detection Bias and False Positives

From Metopedia, the free encyclopedia
AI-Detection Bias and False Positives
Comparative detector study
Full title: AI-Detection Bias and False Positives: Comparing 2016 Human, 2026 AI, and 2007 Student Essays Across Common Detectors
Author: Andrew Lehti
Publication date: February 28, 2026
DOI: 10.6084/m9.figshare.31439995
Subject: AI-detection reliability, false positives, detector bias, academic-integrity policy
Corpus: Five text samples (three human essays/comments, one seventh-grade human essay, one AI-generated essay)
Main claim: AI-detection scores vary sharply by detector and appear sensitive to polish, formatting, and structural regularity
Archive: Internet Archive PDF

AI-Detection Bias and False Positives: Comparing 2016 Human, 2026 AI, and 2007 Student Essays Across Common Detectors is a 2026 paper by Andrew Lehti examining the reliability of common AI-authorship detectors when applied to human-written and AI-generated text samples.[1] The study compares detector outputs on older human writing, contemporary AI writing, informal student writing, and additional control texts. It argues that many detectors appear to penalize formal structure, grammatical consistency, and polished academic style rather than identifying a stable signature of machine authorship.

The paper is situated within broader debate over AI-content detection in education. Published research has found that available AI-detection tools can be unreliable, inconsistent, and vulnerable to paraphrasing or obfuscation.[2] Other research has reported bias against non-native English writing, raising concerns about fairness in academic and evaluative settings.[3]

The central finding of Lehti's paper is that a polished human essay from 2016 received a higher average AI-detection score than a 2026 AI-generated essay. The paper interprets this as evidence of a “polish penalty”: the tendency of detectors to associate structural competence, formal tone, and regular formatting with artificial generation.[1]

Background

AI-detection systems are software tools that attempt to estimate whether text was written by a human, generated by a language model, or produced through a mixture of human and machine assistance. These tools are often used in academic settings to support plagiarism screening, authorship review, and academic-integrity investigations. Their outputs are commonly expressed as percentages, risk labels, or probability bands.

The growth of generative language models increased pressure on schools, universities, publishers, and online platforms to distinguish human writing from model-generated writing. In practice, this task is difficult because modern language models are trained on large corpora of human writing, including academic prose, web articles, essays, forum posts, documentation, and informal discussion. A model's output can resemble the average style of the same written ecosystem from which human writers also learn.

Lehti's paper argues that this creates a convergence problem. AI systems learn from human writing; human writers increasingly read AI-influenced text; and ordinary writing tools now include grammar correction, tone rewriting, autocomplete, and one-click revision. As a result, the boundary between “human style” and “machine style” becomes less stable over time.[1]

Publication and context

The paper is part of Lehti's broader Metopedia and cognitive-psychology corpus. It includes an introductory advisory on “Cognitive Impasse,” a concept used by the author to describe resistance to ideas that contradict prior beliefs. The paper also describes the author's method as “Extrapolative Trial by Error,” a process in which independent observation and synthesis precede review of external academic literature.[1]

Although those framing sections are unusual for a conventional detector-evaluation paper, the empirical core of the document is a comparative table of detector outputs across multiple writing samples. The article also includes appendices containing control cases, full detector tables, error-rate summaries, and graphs.

Research question

The study asks whether public and commercial AI-detection systems can reliably distinguish AI-generated content from older and contemporary human-written text. It focuses on false positives, detector disagreement, and the possibility that detectors classify polished human writing as AI-generated because of surface-level features.

The specific concerns examined include:

  1. false positives, in which human-written text is classified as AI-generated;
  2. disagreement between detectors scoring the same sample;
  3. sensitivity to polish, structure, and formatting rather than to a stable signature of machine authorship.

Methodology

The initial comparison used three main texts:

Sample | Date | Origin | Length | Style
2016 Human Essay | 2016 | Human-written | 7,062 words; about 46,000 characters | Polished, semi-academic prose
2026 AI Essay | 2026 | AI-generated | 1,115 words | Mixed formal and informal register
2007 Human Essay | 2007 | Human-written seventh-grade essay | 898 words | Informal, beginner-level prose

Each document was submitted to multiple AI-detection systems. Where a detector required chunking because of word limits, the reported value was rounded or averaged. Later appendices added two additional controls: a 2026 human essay and a 2026 human Reddit comment.[1]
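
The paper does not publish a chunking script; a minimal sketch of the kind of procedure it describes, assuming a fixed per-submission word limit and simple averaging of per-chunk scores, might look like the following. The function score_chunk is a placeholder for a call to whichever detector is being tested, and the 1,000-word limit is only illustrative, since actual input limits differ by tool.

```python
from typing import Callable, List

def chunk_by_words(text: str, max_words: int) -> List[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def averaged_ai_score(text: str, score_chunk: Callable[[str], float],
                      max_words: int = 1000) -> float:
    """Submit each chunk to a detector and average the per-chunk AI scores.

    score_chunk stands in for the detector being queried and should return
    that tool's AI-likelihood percentage for a single chunk.
    """
    chunks = chunk_by_words(text, max_words)
    if not chunks:
        return 0.0
    scores = [score_chunk(chunk) for chunk in chunks]
    return round(sum(scores) / len(scores), 2)
```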

The study reports detector outputs as AI-likelihood percentages. These numbers are treated as the tools' own probability-like claims rather than as independently validated probabilities.

Detectors tested

The paper reports scores from a range of AI-detection systems and model-based judgments. The tools appearing in its tables include Copyleaks, ZeroGPT, GPTZero, Quillbot, Scribbr, Sapling AI, Grammarly, AIDetector, undetectableAI, originalityAI, Pangram, NoteGPT, GPTInf, eduwriter AI, Winston AI, Detecting-AI, Detector IO, Content Detector AI, Decopy AI, Dechecker, OpenL IO, and YouScan, along with model-based judgments elicited from ChatGPT and Gemini.

The paper notes that Quillbot and Scribbr may share backend relationships or similar scoring behavior, but that subscription limits and input limits produced differing values in the reported tests.[1]

Main comparison

The first major table compared the 2016 human essay with the 2026 AI essay.

Detector | 2016 Human Essay | 2026 AI Essay
Copyleaks | 96.8% | 99.9%
ZeroGPT | 54.51% | 31.67%
GPTZero | 98% | 92%
Gemini 3 Thinking | 80–90% | 90–95%
ChatGPT Extended Thinking | 35–45% | 75–85%
Quillbot | 29% | 54%
Sapling AI | 100% | 29%
Grammarly | 45% | 15%
AIDetector | 2.35% | 4.75%
Scribbr | 37.5% | 26%
undetectableAI | 67% | 72%
originalityAI | 100% | 100%
Pangram | 100% | 100%
NoteGPT | 28.9% | 53.26%
GPTInf | 100% | 100%
eduwriter AI | 49% | 29%
Winston AI | 99% | 61%

The reported summary statistics were:

Sample | Mean AI score | Median | Standard deviation | Range
2016 Human Essay | 66.59% | 67.00% | 32.78% | 2.35% to 100%
2026 AI Essay | 59.25% | 57.50% | 33.69% | 4.75% to 100%

The study emphasizes that the human-written 2016 essay received a higher average AI score than the AI-generated 2026 essay. Both samples also produced wide ranges, from near-zero values to 100% AI classifications, depending on the detector.[1]
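
The summary statistics for the 2016 essay can be reproduced from the per-detector values above, assuming the ranged model judgments (such as 80–90%) are collapsed to their midpoints and the sample standard deviation is used; the sketch below uses Python's statistics module. Applying the same midpoint treatment to the 2026 AI essay column gives a mean of 61.18%, matching the Appendix C figure rather than the 59.25% shown in the summary table.

```python
from statistics import mean, median, stdev

# Per-detector AI-likelihood scores for the 2016 human essay, with ranged
# judgments (Gemini 80–90%, ChatGPT 35–45%) taken at their midpoints.
scores_2016 = [96.8, 54.51, 98, 85, 40, 29, 100, 45, 2.35,
               37.5, 67, 100, 100, 28.9, 100, 49, 99]

print(f"mean   = {mean(scores_2016):.2f}%")    # 66.59%
print(f"median = {median(scores_2016):.2f}%")  # 67.00%
print(f"stdev  = {stdev(scores_2016):.2f}%")   # 32.78% (sample standard deviation)
print(f"range  = {min(scores_2016)}% to {max(scores_2016)}%")  # 2.35% to 100%
```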

Informal writing control

The 2007 seventh-grade human essay served as a low-polish control. It contained informal narration, uneven structure, inconsistent punctuation, and beginner-level prose. Most tools classified it as overwhelmingly human.

Detector | AI score for 2007 human essay
AIDetector | 1.75%
Scribbr | 1%
Quillbot | 1%
Sapling AI | 26.4%
NoteGPT | 16.15%
GPTZero | 2%
Winston AI | 5%
undetectableAI | 9%
ZeroGPT | 16.15%
eduwriter AI | 16%
OpenL IO | 16%
YouScan | 15%
ChatGPT Extended Thinking | 10–25%
Gemini 3 Pro Thinking | 5–10%

Lehti interprets the results as evidence that imperfection, irregularity, and lower structural control may be treated by detectors as human markers.[1]

Appendix controls

Reddit comment control

Appendix A examined an informal Reddit comment described as “clearly not AI.” The comment included spelling mistakes, repetitive personal details, uneven rhythm, and non-linear narration. Most detectors assigned low AI scores.

Detector | Result
Copyleaks | 0% AI
ZeroGPT | 3.24% AI
GPTZero | 0% AI
Quillbot | 0% AI
Detecting-AI | 29.1% AI
Sapling AI | 0% AI
Grammarly | 0% AI
AIDetector | 2.37% AI
Scribbr | 0% AI
undetectableAI | 5% AI
originalityAI | 0% AI
Pangram | 0% AI
NoteGPT | 3.27% AI
GPTInf | 0% AI
eduwriter AI | 3% AI
Winston AI | 2% AI
OpenL IO | 3% AI
YouScan | 15% AI

The paper treats this sample as a qualitative control because most detectors converged on a human classification, while a minority still reported non-trivial AI likelihood.[1]

2026 human essay control

Appendix B examined a 2026 first-person human essay. The author states that it was not polished, corrected, revised, or generated with tools. It was 3,502 words and 22,372 characters.

Detector | Result
Copyleaks | 77.4% AI
ZeroGPT | 7.1% AI
GPTZero | 38.31% AI
Quillbot | 13% AI
Detecting-AI | 30.6% AI
Sapling AI | 27.6% AI
Grammarly | 30% AI
AIDetector | 12.47% AI
Scribbr | 19% AI
undetectableAI | 77% AI
originalityAI | 100% AI
Pangram | 39% AI
NoteGPT | 7.87% AI
GPTInf | 11% AI
eduwriter AI | 17% AI
Winston AI | 3% AI
OpenL IO | 7% AI
YouScan | 85% AI
Detector IO | 21% AI
Content Detector AI | 18% AI
Decopy AI | 57% AI
Dechecker | 8% AI

For this control, the reported mean was 33.60%, the median was 24.3%, the standard deviation was 28.31%, and the range was 3% to 100%. The paper uses these values to argue that even human-coded personal writing can be classified as AI by some systems when it contains enough coherent structure.[1]

Full detector table

Appendix C expanded the results into a five-condition comparison.

Detector | 2026 Human Essay | 2007 Human Essay | 2016 Human Essay | 2026 AI Essay | 2026 Human Comment
AIDetector | 12.5% | 1.8% | 2.4% | 4.8% | 2.4%
ChatGPT | 65.0% | 17.5% | 40.0% | 80.0% | 7.5%
Content Detector AI | 18.0% | 0.0% | 73.0% | 0.0% | 23.0%
Copyleaks | 77.4% | 0.0% | 96.8% | 99.9% | 0.0%
Dechecker | 8.0% | 16.0% | 16.0% | 31.0% | 3.2%
Decopy AI | 57.0% | 46.0% | 51.0% | 34.0% | 32.0%
Detecting-AI | 30.6% | 42.6% | 48.3% | 67.9% | 29.1%
Detector IO | 21.0% | 0.0% | 21.0% | 70.0% | 0.0%
GPTInf | 11.0% | 0.0% | 100.0% | 100.0% | 0.0%
GPTZero | 38.3% | 2.0% | 98.0% | 92.0% | 0.0%
Gemini | 35.0% | 7.5% | 85.0% | 92.5% | 0.0%
Grammarly | 30.0% | 1.0% | 45.0% | 15.0% | 0.0%
NoteGPT | 7.9% | 16.2% | 28.9% | 53.3% | 3.3%
OpenL IO | 7.0% | 16.0% | 37.0% | 42.0% | 3.0%
Pangram | 39.0% | 0.0% | 100.0% | 100.0% | 0.0%
Quillbot | 13.0% | 1.0% | 29.0% | 54.0% | 0.0%
Sapling AI | 27.6% | 26.4% | 100.0% | 29.0% | 0.0%
Scribbr | 19.0% | 1.0% | 37.5% | 26.0% | 0.0%
Winston AI | 3.0% | 5.0% | 99.0% | 61.0% | 2.0%
YouScan | 85.0% | 15.0% | 15.0% | 25.0% | 15.0%
ZeroGPT | 7.1% | 16.2% | 54.5% | 31.7% | 3.2%
eduwriter AI | 17.0% | 16.0% | 49.0% | 29.0% | 3.0%
originalityAI | 100.0% | 0.0% | 100.0% | 100.0% | 0.0%
undetectableAI | 77.0% | 9.0% | 67.0% | 72.0% | 5.0%
Mean | 33.60% | 3.67% | 66.59% | 61.18% | 5.49%
Median | 24.30% | 1.00% | 67.00% | 61.00% | 2.19%
Standard deviation | 28.31% | 7.00% | 32.78% | 33.57% | 9.42%

The table divides the samples into “structured” and “low-quality” categories. The 2007 student essay and 2026 Reddit comment are described as poor-structure, inconsistent, or low-flow writing. The 2016 and 2026 human essays are described as semi-formal or experience-driven, while the 2026 AI essay is described as fully generated AI.[1]

Detector error rates

Appendix C also reports error-rate categories.

Category | Reported error rate
Human Detection Error Rate | 26.95%
AI Detection Error Rate | 45.4%
Semi-Formal Error Rate | 45.8%
Low-Quality Writing Error Rate | 8.1%

In the paper's terminology, the human error rate is the average AI probability assigned to human-written texts. The AI error rate is the probability mass not assigned to the AI class for the AI-generated essay. Semi-formal error refers to structured human essays, while low-quality writing error refers to the seventh-grade essay and the Reddit comment.[1]
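
One way to formalize these verbal definitions, writing \(p_d(t)\) for detector \(d\)'s reported AI likelihood for text \(t\) (as a fraction between 0 and 1), is:

\[
E_{\text{human}} = \frac{1}{|D|\,|H|} \sum_{d \in D} \sum_{t \in H} p_d(t),
\qquad
E_{\text{AI}} = \frac{1}{|D|} \sum_{d \in D} \bigl(1 - p_d(t_{\text{AI}})\bigr),
\]

where \(D\) is the set of detectors, \(H\) is the set of human-written samples in the relevant category, and \(t_{\text{AI}}\) is the AI-generated essay. The paper does not spell out its exact aggregation choices, so this should be read as an illustrative formalization rather than as the study's precise computation.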

Graphical findings

The paper includes five appendix graphs summarizing the detector results across samples. These graphics support the paper's claim that detector behavior clusters around style and structure rather than around a clean authorship boundary.

Stylistic ecosystem convergence

One of the paper's main interpretive sections argues that AI detection is structurally unstable because both humans and AI participate in the same writing ecosystem. Language models are trained on human writing from online platforms, academic works, social media, forums, and professional writing. Human writers, in turn, learn from the same public internet and increasingly encounter AI-generated prose.

The paper describes this as a feedback loop:

  1. AI produces structured text.
  2. Humans read and absorb the structures.
  3. Humans adopt some of those structures.
  4. Human writing moves closer to AI training distributions.
  5. Detectors must distinguish overlapping distributions.

This argument resembles broader concerns in machine-learning classification: when two classes become statistically entangled, classifiers lose stable decision boundaries. In authorship detection, the relevant signal is not only what a text looks like, but whether its stylistic features uniquely identify its source. Lehti argues that they increasingly do not.[1]
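
The classification point can be illustrated with a toy model that is not part of Lehti's study: if a detector relied on a single numeric "style" feature, and the human and AI feature distributions were equal-variance Gaussians, then even the best possible classifier's error rate rises toward chance as the two means converge.

```python
from statistics import NormalDist

def best_possible_error(mu_human: float, mu_ai: float, sigma: float) -> float:
    """Bayes-optimal error rate for two equal-variance, equal-prior Gaussian
    classes separated by a single threshold at the midpoint of the means."""
    gap = abs(mu_ai - mu_human)
    return NormalDist().cdf(-gap / (2 * sigma))

# As the "human" and "AI" style distributions converge, even an ideal
# classifier approaches a 50% (coin-flip) error rate.
for gap in (3.0, 2.0, 1.0, 0.5, 0.1):
    print(f"mean gap {gap:>3}: best possible error = "
          f"{best_possible_error(0.0, gap, 1.0):.1%}")
```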

Formatting and structural bias

The paper argues that formatting choices can affect detector scores even when semantic content remains unchanged, and it reports several layout and structural factors said to influence classification.

Lehti describes this as a practical institutional concern because a student or writer might be penalized for writing clearly, using formal organization, or presenting work in a clean academic format.[1]

AI authorship and AI-assisted revision

The paper distinguishes full AI authorship from AI-assisted revision. Full AI authorship is described as a model generating the main substance of a work from limited prompting. AI-assisted revision is described as human authorship followed by software-based improvements to clarity, grammar, flow, tone, punctuation, or redundancy.

This distinction is significant because modern writing environments often include AI-like correction systems. Email clients, word processors, grammar checkers, and online editors may suggest rewritten sentences, tone adjustments, and punctuation changes. These changes can regularize a document without replacing the author's ideas.

Lehti argues that detection systems often treat authorship and revision as the same category. A human-authored document that has been lightly smoothed may acquire features that detectors associate with generated text: regular syntax, coherent structure, reduced redundancy, and consistent punctuation.[1]
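
As a toy illustration of this mechanism, not drawn from the paper, light smoothing tends to reduce surface-level variation such as the spread of sentence lengths, one of the regularities that detectors may associate with generated text. The texts and the measure below are purely illustrative.

```python
import re
from statistics import pstdev

def sentence_length_spread(text: str) -> float:
    """Population std-dev of sentence lengths (in words): a crude proxy for
    the surface-level irregularity that some detectors reportedly reward."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return pstdev(lengths) if len(lengths) > 1 else 0.0

rough = ("I went there early. And honestly it was kind of a mess, nobody told "
         "us where to go or what the plan even was. We waited. Then finally "
         "someone showed up with the forms and we got started.")
smoothed = ("I arrived early. Nobody had explained where to go. We waited for "
            "the forms to arrive. Then the session finally began on schedule.")

print("rough draft spread:   ", round(sentence_length_spread(rough), 2))
print("smoothed draft spread:", round(sentence_length_spread(smoothed), 2))
```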

The polish penalty hypothesis

The “polish penalty” is the paper's name for the observed tendency of AI detectors to assign higher AI probabilities to cleaner, more structured writing. Under this hypothesis, detectors may interpret formal competence as artificiality.

The paper contrasts three kinds of samples:

Writing type | Detector tendency in the study | Interpretation
Informal, irregular, error-prone writing | Usually scored as human | Irregularity is treated as a human signal
Semi-formal human writing | Often scored as partially or highly AI | Structure and polish raise suspicion
AI-generated semi-formal writing | Scored inconsistently, sometimes lower than human writing | Detectors do not share a stable AI signal

The hypothesis does not require that every detector behave identically. Instead, it claims that a general relationship appears across the study: as structural competence increases, AI attribution becomes more likely.[1]

Humanization paradox

The paper reports an additional test in which a human-written academic text initially received a mean AI score of 44.7%. After being processed through AI “humanization” or detector-bypass tools, the mean AI score increased to 76.4%.

Lehti describes this as paradoxical because a tool intended to make text appear more human made the human text appear more AI-generated. The paper interprets this as evidence that detectors are highly sensitive to surface-level statistical changes and may not be measuring authorship origin directly.[1]

Academic implications

The paper argues that AI detectors should not be used as determinative evidence in academic misconduct cases. The reasoning is based on three claims:

  1. false positives can accuse human writers of misconduct;
  2. detector disagreement makes tool choice outcome-determinative;
  3. polished academic writing can be misread as AI-generated writing.

This position is consistent with caution expressed by outside institutions and researchers. Vanderbilt University disabled Turnitin's AI detector in 2023, citing concerns about transparency, false positives, and the consequences of a 1% false-positive rate when applied to large numbers of student papers.[4] Turnitin has published its own explanation of false positives and its AI-writing report behavior, including thresholds designed to reduce false-positive risk.[5][6]
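
The scale concern behind that decision is simple arithmetic; the screening volume below is hypothetical, chosen only to illustrate how a small false-positive rate translates into many flagged human writers.

```python
# Illustrative only: how a small false-positive rate scales with volume.
false_positive_rate = 0.01   # a nominal 1% of human-written papers misflagged
papers_screened = 75_000     # hypothetical number of human-written submissions

expected_false_flags = false_positive_rate * papers_screened
print(f"Expected human-written papers flagged as AI: {expected_false_flags:.0f}")  # 750
```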

Relation to existing research

Lehti's findings align with several external concerns in the literature: published evaluations have found detection tools to be unreliable, inconsistent, and vulnerable to paraphrasing or obfuscation;[2] detectors have been reported to be biased against non-native English writers;[3] and at least one university has disabled a major detector over false-positive concerns.[4]

Lehti's contribution is narrower and more autobiographical in sample selection, but it adds a specific claim: polished pre-AI human writing can be scored as AI at a higher mean level than an AI-generated essay tested in the same comparison.[1]

Limitations

The paper has several limitations:

  1. the corpus is small, consisting of only five text samples;
  2. sample selection is largely autobiographical, drawing on the author's own writing;
  3. detector outputs are taken as the tools' own probability-like claims rather than independently validated probabilities;
  4. long texts had to be chunked to fit input limits, with the resulting scores rounded or averaged.

These limitations do not eliminate the paper's concern about false positives, but they affect how broadly its numeric results can be generalized.

Terminology

; AI detector : A tool that estimates whether text was generated by artificial intelligence.

; False positive : A human-written text incorrectly classified as AI-generated.

; False negative : AI-generated text incorrectly classified as human-written.

; Polish penalty : Lehti's proposed term for the tendency of detectors to assign higher AI probability to cleaner, more structured, or more academically polished writing.

; Stylistic convergence : The overlap between human and AI writing styles caused by shared corpora, AI-assisted writing tools, and human exposure to AI-generated prose.

; AI-assisted revision : Human-authored writing that has been edited, clarified, or polished with machine assistance, without the machine generating the core argument or evidence.

Summary of findings

The paper's main findings can be summarized as follows:

Finding | Evidence in paper
Detector outputs vary sharply across systems | Human and AI samples ranged from near-zero to 100% AI depending on tool
The 2016 human essay scored higher than the AI essay on average | 66.59% mean for the 2016 human essay versus 59.25–61.18% for the 2026 AI essay, depending on table version
Informal writing was treated as human | The 2007 essay and 2026 Reddit comment received low mean AI scores
Structured human writing was more vulnerable to false positives | Semi-formal human essays had much higher AI scores than low-quality controls
“Humanizer” tools may backfire | A human text reportedly rose from 44.7% to 76.4% after humanization
Academic enforcement use is risky | The paper argues detector scores should not serve as decisive evidence

References

  1. Andrew Lehti, AI-Detection Bias and False Positives: Comparing 2016 Human, 2026 AI, and 2007 Student Essays Across Common Detectors, figshare, 2026. DOI: 10.6084/m9.figshare.31439995. Archived PDF: Internet Archive.
  2. Debora Weber-Wulff et al., “Testing of detection tools for AI-generated text,” International Journal for Educational Integrity, 2023. DOI: 10.1007/s40979-023-00146-z.
  3. Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, and James Zou, “GPT detectors are biased against non-native English writers,” Patterns, 2023. DOI: 10.1016/j.patter.2023.100779.
  4. Vanderbilt University Brightspace Support, “Guidance on AI Detection and Why We're Disabling Turnitin's AI Detector,” 2023.
  5. Turnitin, “Understanding false positives within our AI writing detection capabilities,” 2023.
  6. Turnitin, “Using the AI Writing Report,” 2026.