Research integrity: What's the difference between detection and investigation?

Day, Adam

doi:10.59350/tcp36-2bj94

Published May 4, 2026 | https://doi.org/10.59350/tcp36-2bj94


Oversight screens every published research paper in the world — several million each year. We've shown before how the dashboards — one for every journal and institution in the world — can identify risk at a high level.

Since its official launch in January 2025, Oversight has grown rapidly and now screens a large portion of journal submissions. The number of new articles screened rises every month and has increased ten-fold in the last year.

New manuscript submissions via Oversight. This chart excludes published articles, which we screen in the millions.

For more information about Oversight, get in touch!

The curse of dimensionality

There's a common problem with data analysis. If you've ever built a dashboard, you know this already. Good data analysis is about reducing data to be simple and clear. It's analogous to the editorial process. There are good reasons to publish volume, but the true value is in knowing when not to publish.

The curse of dimensionality is a principle of data analysis. It means that the more methods you have, the harder it becomes to get an accurate detection. Similarly, the more complex your dashboard gets, the harder it is to use.
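One way to see why more methods can make accurate detection harder is to look at how false alarms pile up. The sketch below is an illustration only, under loud assumptions: every method is independent, every method has the same false-positive rate (the 1% figure is invented), and an article is flagged if any single method fires. That naive "flag on anything" combination is not a description of how Oversight works. The function name `any_false_alarm` is made up for this example.

```python
def any_false_alarm(p: float, n: int) -> float:
    """Chance that at least one of n independent methods fires on a clean article.

    p is the assumed false-positive rate of each individual method.
    """
    return 1 - (1 - p) ** n

# 1, 20 and 60 methods, each with an assumed (invented) 1% false-positive rate.
rates = {n: any_false_alarm(0.01, n) for n in (1, 20, 60)}

for n, rate in rates.items():
    print(f"{n:>2} methods: {rate:.1%} of clean articles get at least one alert")
```

Under these assumptions, the share of clean articles that pick up at least one alert climbs from 1% with a single method to roughly 45% with 60. Adding methods only helps if each one is chosen and combined with care.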

But your users will always ask for one thing in the dashboard. (And that thing is 'everything'.) It's a real dilemma. If you make the dashboard complex, it will be hard to use. If you do the hard work to refine it, that might be interpreted as missing features.

Back when Clear Skies was new, I was looking at the papermill detection problem and I had about 60 different methods that I could use to detect papermills. The question was which methods to put into production. I chose carefully and launched the original Papermill Alarm with just one detection pipeline in it. Doing so gave the best balance of coverage, accuracy, cost and sustainability.

Since then, we've worked closely with our users to understand their priorities and decide which methods to develop and release next. Oversight currently has 20 analytical methods running at once. But efficiency and simplicity have always been at the core of what we do.

It's during that time that we've seen the distinction between screening and investigation emerge in how Oversight is used. They are two very distinct things.

Old integrity tools tended to work one article at a time, and each article was treated like an investigation. That's a good way to do things in an ideal world, but there are reasons why this type of checking is no longer practical and why Clear Skies' approach, detection first, is rapidly becoming the new norm. The purpose of detection is to avoid investigation.

When to screen and when to investigate? If we can make a decision on the basis of detection, it is cheap and low-impact. Investigation is only necessary when a high-impact decision is required.
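That rule of thumb can be written down in a few lines. This is a minimal sketch, not Oversight's implementation: the `Case` record, the `decision_impact` labels and the `route` function are all hypothetical names introduced here, and the idea that rejecting a submission is low-impact while a retraction or sanction is high-impact is my reading of the examples in this post.

```python
from dataclasses import dataclass


# Hypothetical record: none of these field names come from Oversight itself.
@dataclass
class Case:
    article_id: str
    alerts: int           # number of detection alerts raised on the article
    decision_impact: str  # "low" (e.g. rejecting a submission) or "high" (e.g. retraction, sanction)


def route(case: Case) -> str:
    """Decide whether detection alone is enough or an investigation is needed."""
    if case.alerts == 0:
        return "proceed"              # nothing detected, carry on as normal
    if case.decision_impact == "high":
        return "investigate"          # lasting impact on a person: a human must judge
    return "decide_on_detection"      # cheap, low-impact editorial decision


cases = [
    Case("submission-1", alerts=0, decision_impact="low"),
    Case("submission-2", alerts=7, decision_impact="low"),
    Case("published-3", alerts=7, decision_impact="high"),
]

routes = {case.article_id: route(case) for case in cases}
print(routes)
```

The point of the sketch is the ordering: detection runs on everything, and only the small set of cases that combine alerts with a high-impact decision ever reaches a human investigator.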

But why?

First: the volume is too high.

In an ideal world, every problem would be investigated and brought carefully to a resolution. But if there are thousands of these every day, ignoring them is the only practical course of action. Investigation simply isn't possible at that scale. I've often compared Oversight's screening tools to a spam filter. A spam filter shows us which emails we don't want to spend time on. In the same way, detection tells us which articles we don't need to spend time investigating. It's pretty easy to make a decision on an Oversight report like this:

This shows the visual part of a report. The box in the middle is an article, the other boxes are references where we found problems. With this number of problematic references, coupled with a low false-positive rate, I highly doubt that the work is suitable for publication (however, this is one article that was not screened by Oversight — it was published a few weeks ago).

There is so much going on here that I would just reject it. But we don't prescribe how to use the tools: getting to a decision from here with the rich data provided by Oversight is a quick process.

Second: we get high precision quickly and cheaply before peer review even starts.

Screening means running detection methods on everything. We don't ration out checks; we check everything. Detection scales that way, because it is cheap and fast, but investigation doesn't. Investigation is too slow.

So screening is now a necessity. It's the only practical solution to rising volume and limited resources. Oversight is designed to facilitate rapid detection and decision making.

Investigation is still necessary in a minority of cases. Oversight's alerts help to triage cases for investigation and then facilitate the investigation process by providing detailed reports.

Third: we can separate editorial decisions from judgements.

This part is really important. If the purpose of detection is to avoid investigation, what is the purpose of investigation? Investigation is only necessary if we need to make some sort of judgement that will have a lasting impact on someone (such as an author). That could be a retraction, or a sanction of some kind.

And that investigation can be extremely simple.

The grey boxes show references that have low relevance to the main article. The dotted lines between them tell us that they share a common author. So here we have a really clear pattern: a high number of irrelevant references have been made to just one person. We don't make a judgement in Oversight about what this means, but for a human to investigate on the basis of what we're showing here represents a massive time-saving. (This article was also not submitted via Oversight. It has now been retracted.)
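The pattern in that report (many low-relevance references that all share one author) is simple enough to sketch. The example below is an illustration under stated assumptions: the reference list, the relevance scores and the 0.2 threshold are all invented, and this post does not describe how Oversight actually measures relevance. The function name `low_relevance_counts` is hypothetical.

```python
from collections import Counter

# Hypothetical reference list for one article. The relevance scores and
# the threshold below are invented for illustration only.
references = [
    {"id": "ref-1", "authors": ["X. Example", "B. Writer"], "relevance": 0.05},
    {"id": "ref-2", "authors": ["X. Example"], "relevance": 0.10},
    {"id": "ref-3", "authors": ["C. Scholar"], "relevance": 0.90},
    {"id": "ref-4", "authors": ["X. Example", "D. Reader"], "relevance": 0.08},
    {"id": "ref-5", "authors": ["E. Editor"], "relevance": 0.15},
    {"id": "ref-6", "authors": ["X. Example"], "relevance": 0.85},
]

LOW_RELEVANCE = 0.2  # assumed cut-off, not a real Oversight setting


def low_relevance_counts(refs, threshold=LOW_RELEVANCE):
    """Count, per author, how many low-relevance references they appear on."""
    counts = Counter()
    for ref in refs:
        if ref["relevance"] < threshold:
            # set() guards against an author being listed twice on one reference
            counts.update(set(ref["authors"]))
    return counts


counts = low_relevance_counts(references)
top_author, top_count = counts.most_common(1)[0]
print(f"{top_author} appears on {top_count} low-relevance references")
```

The output is a count, not a verdict. It gives a human investigator a thread to pull on; what the pattern means is still for a person to decide.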

Judgement is for humans, not machines. This is actually one reason Oversight is called Oversight. It's an automated system, but it is there to enable human Oversight. We have great results with automated detection tools, but at the end of the day, if judgement is called for, that is for people to do.

If we find problematic things in a research paper, that does not mean that we are making judgements about the authors.

I'll give you an example:

When we audit an institution's output, we might find that all of the output passes our checks with just a few exceptions. That probably means it's a really good institution.

A branch of a major institution in the UK. Every institution in the world is screened in Oversight.

On reviewing those exceptions, we might find that alerts have been raised in the authors' histories.
We then find that the author histories in question belong to co-authors from different institutions and not from the institution we are auditing.
In such a case, it seems more likely that an author from the institution has simply worked with someone who has a problematic history.
This is useful information for the institution because they want to take care over how they manage collaborations. But there is no quick decision-making here. Someone has to look at the situation and make a judgement that considers the interests of the individuals fairly.
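The reasoning in that example can be sketched as a small check: for each flagged article, see whether the author histories carrying alerts belong to the institution being audited or to co-authors elsewhere. Everything here is hypothetical (the institution names, the `history_alert` field, the `alert_source` function); it illustrates the logic rather than Oversight's data model.

```python
# Hypothetical flagged articles from an audit of "Institution A".
AUDITED = "Institution A"

flagged_articles = [
    {"id": "article-1", "authors": [
        {"name": "Author 1", "institution": "Institution A", "history_alert": False},
        {"name": "Author 2", "institution": "Institution B", "history_alert": True},
    ]},
    {"id": "article-2", "authors": [
        {"name": "Author 3", "institution": "Institution A", "history_alert": True},
    ]},
]


def alert_source(article, audited=AUDITED):
    """Label where the author-history alerts on an article come from."""
    flagged = [a for a in article["authors"] if a["history_alert"]]
    if not flagged:
        return "no_author_history_alert"
    if all(a["institution"] != audited for a in flagged):
        return "external_coauthor"   # alerts sit with collaborators elsewhere
    return "internal_author"         # at least one alert sits inside the institution


summary = {article["id"]: alert_source(article) for article in flagged_articles}
print(summary)
```

The labels only sort cases for a person to review. An "external_coauthor" label says nothing about whether anyone did anything wrong.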

So, as time has gone on, we've seen integrity solutions move from a transactional, article-by-article grind to a strategic process of dynamic triage, efficient use of resources, and risk-management.

Oversight has always been primarily a screening service, but we support investigation as well. So when investigation is necessary, we can further reduce the costs. This is where more and more detailed information comes in. For example, imagine a single paper is flagged by a reader, but your output looks like this:

Here, Oversight is set for precision. This means alerts have an extremely low false-positive rate. In this scenario, I would investigate some of the greens as well.

At this stage, you might realise that you don't need to investigate the paper; you need to investigate the patterns that cause this many alerts to appear in your output. Investigation is strategic, too. (And that's another reason it's called Oversight. Strategic Oversight is critical with a problem of this magnitude: it's rare that an investigation involves just one article.)

It's at this point that all that extra detail becomes important. When we need to investigate, we need threads to pull on and rich descriptive detail. Oversight provides that, too.

We have some new features to support investigations coming out soon.

For more details about how to access Oversight, contact us.

Additional details


UUID: f1a00504-cbbc-420d-93a4-2ddcc35af48b
GUID: https://medium.com/p/1dfb72215485
URL: https://clearskiesadam.medium.com/research-integrity-whats-the-difference-between-detection-and-investigation-1dfb72215485

Issued: 2026-05-04T13:08:43
Updated: 2026-05-04T13:08:43
