Published September 27, 2023 | https://doi.org/10.59350/4e1n2-7fa50

List of academic search engines that use Large Language models for generative answers and some factors to consider when using

Creators & Contributors

  • 1. Singapore Management University

List of academic search engines that use Large Language models for generative answers (for the latest version - see this page)

This is a non-comprehensive list of academic search engines that use generative AI (almost always Large Language Models) to generate direct answers on top of a list of relevant results, typically using Retrieval Augmented Generation (RAG) techniques. We expect a lot more!

This technique involves grounding the generated answer by using a retriever to find text chunks or sentences (also known as context) that may answer the question. 
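As a rough illustration of this retrieve-then-ground idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Chunk` class, the word-overlap retriever, the paper identifiers and the prompt wording are hypothetical, and no real LLM is called. Actual systems use far more sophisticated retrievers and prompts.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    paper_id: str  # hypothetical identifier of the source paper
    text: str      # a sentence or passage taken from that paper


def words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, index: list[Chunk], k: int = 2) -> list[Chunk]:
    """Toy retriever: rank chunks by the number of words shared with the question."""
    q = words(question)
    scored = [(len(q & words(c.text)), c) for c in index]
    # Stable sort: chunks with equal scores keep their index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(question: str, contexts: list[Chunk]) -> str:
    """Number each retrieved context so the model can cite it as [1], [2], ..."""
    numbered = "\n".join(
        f"[{i}] ({c.paper_id}) {c.text}" for i, c in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the numbered contexts below, "
        "and cite them like [1].\n\n"
        f"Contexts:\n{numbered}\n\nQuestion: {question}"
    )


# Invented example index; these are not real papers.
index = [
    Chunk("p1", "Open access articles receive more citations on average."),
    Chunk("p2", "The citation advantage disappears after controlling for journal quality."),
    Chunk("p3", "Peer review times vary widely by discipline."),
]
question = "Is there an open access citation advantage?"
contexts = retrieve(question, index)
prompt = build_prompt(question, contexts)  # this string would be sent to an LLM
print(prompt)
```

In this sketch the model only ever sees numbered contexts that came out of the index, which is why any citation it produces can be traced back to a real chunk.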

Besides generating direct answers with citations, it seems to me this new class of search engine often, but not always, does the following:

a) Use Semantic Search (as opposed to Lexical search)

b) Use the ability of Large Language Models to extract information from papers such as "method", "limitations", "region" and display them in a literature review matrix format

For more see recording by me - The possible impact of AI on search and discovery (July 2023)

The table below is updated to Oct 2023

To be included on the list, they need to meet the following criteria

a) Has its own search index (academic)

b) Generate a direct answer using RAG type techniques with citations from multiple documents.

Note: ChatGPT Plus plugins are listed separately in a small section below.

As such, this excludes the JSTOR AI research tool, which at the time of writing (Oct 2023) only allows you to ask questions of individual papers. A generic "chat with PDF" tool that allows you to upload multiple documents to query will not qualify either if it does not come with its own search index of content.

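Each entry lists: Name, Sources, LLM used, Upload your own PDF?, Produces literature review matrix?, Other features

Elicit.com / old.elicit.org
  • Sources: Semantic Scholar
  • LLM used: OpenAI GPT models & other open-source LLMs
  • Upload your own PDF? Yes
  • Produces literature review matrix? Yes
  • Other features: List of concept search

Consensus
  • Sources: Semantic Scholar
  • LLM used: GPT4 for summaries
  • Upload your own PDF? No
  • Produces literature review matrix? No, has Consensus meter
  • Other features: (none listed)

scite.ai assistant
  • Sources: Open Scholarly metadata and citation statements from selected partners
  • LLM used: "We use a variety of Language models depending on situation." GPT3.5 (generally), GPT4 (enterprise client), Claude instant (fallback)
  • Upload your own PDF? No
  • Produces literature review matrix? No
  • Other features: Summaries include text from citation statements; many options to control what is being cited

scispace
  • Sources: Unknown
  • LLM used: Unknown
  • Upload your own PDF? Yes
  • Produces literature review matrix? Yes
  • Other features: (none listed)

Zeta alpha (R&D in AI)
  • Sources: Mostly Comp Science content only
  • LLM used: OpenAI GPT Models
  • Upload your own PDF? No
  • Produces literature review matrix? NA
  • Other features: ability to turn on/off semantic/neural search; doc visualization map, showing semantic similarity with cluster labels autogenerated

Core-GPT / technical paper (unreleased?)
  • Sources: CORE
  • LLM used: GPT4
  • Upload your own PDF? No
  • Produces literature review matrix? No
  • Other features: (none listed)

Scopus.ai (closed beta)
  • Sources: Scopus index
  • LLM used: ?
  • Upload your own PDF? No
  • Produces literature review matrix? No
  • Other features: Graphical representation to see connections between keywords

Dimensions AI assistant (closed beta)
  • Sources: Dimensions index
  • LLM used: Dimensions General Sci-Bert and Open AI's ChatGPT
  • Upload your own PDF? No
  • Produces literature review matrix? NA
  • Other features: Provides TLDR

Ask R Discovery
  • Sources: R Discovery index - 115M papers (40M open access) - See FAQ
  • LLM used: ?
  • Upload your own PDF? No
  • Produces literature review matrix? No
  • Other features: ?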

See also - Five AI Research Tools That Referencing Genuine Sources 

Technical aspects to consider

  • What is the source used for the search engine?

A lot of these tools currently use Semantic Scholar, OpenAlex, arXiv etc., which are basically open scholarly metadata and open access full-text sources. Open scholarly metadata is quite comprehensive. However, using open access full text only may lead to unknown biases.

Scite.ai probably has the biggest advantage here, given it also has some paywalled full text (technically citation statements only) from publisher partners.

That said, you cannot assume that just because the source includes full-text it is being used for extraction.

For example, Dimensions and Elicit, which do have access to full text, do not appear to be currently using it for direct answers. For technical or perhaps legal reasons, their direct answers are only extracted from abstracts. This is unlike scite assistant, which does cite text beyond abstracts.

Elicit does seem to use the available (open access) full text for generating the literature review matrix.

  • Are there ways for users to check/verify accuracy of the generated direct answer, or extracted information in the literature review matrix?

RAG-type systems ensure that the citations made are always "real" citations found in their search index. However, there is no guarantee that the generated statement is supported by the citation.

In my view, a basic feature such systems should have is a feature to make it easy to check the accuracy of the answers generated.

When a sentence is followed by a citation, typically the whole paper isn't being cited. The system grounds its answer in a sentence or two from the paper. The best systems, like Elicit or scite assistant, make it easy to see which extracted sentences/contexts were used to support the answer. This can be done via mouseover (scite assistant) or with highlights (Elicit).

  • How accurate are the generated direct answers and/or extracted information in the literature review matrix in general?

Features that allow users to check and verify answers are great, but even better is if the system can provide some scores to give users a sense of how reliable the results generally are over a large number of examples.

One way to measure such citation accuracy is via citation precision and recall scores. However, such scores only measure whether the citation given supports the generated statement; they do not measure whether the generated statements actually answer the question!
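To make citation precision and recall a little more concrete, here is a small Python sketch. It assumes one common reading of the two scores: recall is the share of generated statements supported by at least one of their citations, and precision is the share of individual citations that support the statement they are attached to. The judgments are invented, and in practice they would come from human raters or an automated checker.

```python
# Each generated statement carries one judgment per citation:
# True if the cited passage supports the statement, False otherwise.
# All data below is invented for illustration.
judged_answer = [
    {"statement": "OA papers are cited more.", "citation_supports": [True, False]},
    {"statement": "The effect varies by field.", "citation_supports": [True]},
    {"statement": "The effect is causal.", "citation_supports": [False]},
]


def citation_recall(answer: list[dict]) -> float:
    """Share of statements backed by at least one supporting citation."""
    if not answer:
        return 0.0
    supported = sum(1 for s in answer if any(s["citation_supports"]))
    return supported / len(answer)


def citation_precision(answer: list[dict]) -> float:
    """Share of all citations that actually support their statement."""
    flags = [flag for s in answer for flag in s["citation_supports"]]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


print(f"citation recall:    {citation_recall(judged_answer):.2f}")
print(f"citation precision: {citation_precision(judged_answer):.2f}")
```

Neither function looks at the user's question at all, which is exactly the limitation noted above.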

A more complete solution is based on the ragas framework, which measures four aspects of the generated answer.

The first two relate to the generation part of the pipeline

  • Faithfulness - measures how consistent the generated answer is with the contexts retrieved. This is done by checking if the claims in the generated answer can be deduced from the context

  • Answer Relevancy - measures if the generated answer tries to address the question. This does not actually check if the answer is factually correct (which is checked by faithfulness). There might be a tradeoff between the first two

The other two relate to the retrieval part of the pipeline and measure how good the retrieval is

  • Context Precision - This looks at whether the retriever is able to consistently find contexts that are relevant to the answer, such that most of the citations retrieved are relevant.

  • Context Recall - This is the converse of context precision: is the system able to retrieve most of the contexts that might answer the question?
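The two retrieval-side measures above can be sketched with simple set arithmetic. This is a simplified illustration only, not the ragas implementation (which, as far as I understand, relies on an LLM to judge relevance). The context identifiers and the set of "relevant" contexts are invented and would normally come from a labelled test set.

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Share of retrieved contexts that are relevant to the question."""
    if not retrieved:
        return 0.0
    return sum(1 for c in retrieved if c in relevant) / len(retrieved)


def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Share of all relevant contexts that the retriever managed to find."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical context ids: the retriever returned four chunks,
# while a labelled test set says three chunks could answer the question.
retrieved = ["c1", "c2", "c3", "c4"]
relevant = {"c1", "c3", "c7"}

print(f"context precision: {context_precision(retrieved, relevant):.2f}")
print(f"context recall:    {context_recall(retrieved, relevant):.2f}")
```

Here two of the four retrieved chunks are relevant, and two of the three relevant chunks were found, so the retriever is neither very precise nor complete.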

The final score could be a harmonic mean of all four scores.
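A quick Python sketch shows why a harmonic mean is a reasonable choice here: it is pulled down sharply by the weakest of the four scores, so a system cannot hide poor retrieval behind fluent generation. The four score values are made up for illustration.

```python
from statistics import harmonic_mean, mean

# Invented scores for a hypothetical system, each between 0 and 1.
scores = {
    "faithfulness": 0.9,
    "answer_relevancy": 0.8,
    "context_precision": 0.6,
    "context_recall": 0.3,
}

values = list(scores.values())
overall = harmonic_mean(values)  # n / (1/x1 + 1/x2 + ... + 1/xn)
simple_average = mean(values)

print(f"harmonic mean:   {overall:.2f}")
print(f"arithmetic mean: {simple_average:.2f}")
```

With these numbers the harmonic mean comes out at roughly 0.54 against an arithmetic mean of 0.65, because the low context recall weighs more heavily.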

It would be good if systems could generate these stats for users to have a sense of the reliability of these systems, though as of time of writing none of the academic search systems have released such evaluations.

  • How generative AI features are integrated in the search and how this affects the way you should search

We are still very early in the days of search+generative AI. It's unclear how such features will be integrated into the search.

There are also dozens of ways to do RAG/generative AI + search, either at inference time or even at pretraining stage

  • How does the query get converted to match the retrieved contexts? Some examples:

    • It could just do a simple type of keyword matching

    • It could prompt the language model to come up with a search strategy, which is then used

    • It could convert the query into an embedding and match it with pre-indexed embeddings of documents/text

  • How do you combine the retrieved contexts with the LLM (Large Language Model)?
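To show how keyword matching and embedding matching can rank the same documents differently, here is a toy Python sketch. The `CONCEPTS` table is a hand-made, hypothetical stand-in for a learned embedding model (real systems use neural encoders with hundreds of dimensions), and the two documents are invented.

```python
import math
import re

# Hypothetical stand-in for an embedding model: each known word is mapped
# to one of three hand-picked "concept" dimensions.
CONCEPTS = {
    "open": 0, "access": 0, "oa": 0,
    "citation": 1, "citations": 1, "cited": 1,
    "advantage": 2, "boost": 2,
}


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def keyword_score(query: str, doc: str) -> float:
    """Lexical matching: share of query words that appear verbatim in the doc."""
    q = set(tokens(query))
    return len(q & set(tokens(doc))) / len(q) if q else 0.0


def embed(text: str) -> list[float]:
    """Toy embedding: count how often each concept dimension is mentioned."""
    vec = [0.0, 0.0, 0.0]
    for t in tokens(text):
        if t in CONCEPTS:
            vec[CONCEPTS[t]] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = "open access citation advantage"
doc_a = "Freely available OA articles get a boost in how often they are cited"
doc_b = "Citation style guides: advantage of numbered formats"

for name, doc in [("doc_a", doc_a), ("doc_b", doc_b)]:
    print(name,
          f"keyword={keyword_score(query, doc):.2f}",
          f"embedding={cosine(embed(query), embed(doc)):.2f}")
```

In this toy setup the on-topic doc_a shares no exact words with the query, so keyword matching prefers doc_b, while the embedding comparison prefers doc_a. This is one reason the best way to phrase a query can depend on how a tool does its matching.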

How it is implemented can lead to different optimal ways of searching. 

For example, say you are looking for papers on whether there is an open access citation advantage. Should you search like...

1. Keyword Style - Open Access citation advantage

2. Natural Language style - Is there an Open Access citation advantage?

3. Prompt engineering style - You are a top researcher in the subject of Scholarly communication. Write a 500 word essay on the evidence around Open Access citation advantage with references

Not all methods will work equally well (or at all) for these systems, even those based on RAG. E.g. Elicit works for 1 and 2 but not 3, while scite assistant works for all of them, even 3.

  • Other additional features 

As shown in the table above, the ability to upload PDFs for extraction, to supplement the limitations of the tool's index, is clearly highly desirable.

Scite assistant currently provides dozens of options to control how answers are generated, which is also an interesting direction. For example, you can specify that the citations must come from a certain topic, journal or even an individual set of papers you specify.

  • Other Non-technical factors

The usual non-technical factors when choosing systems to use apply, of course. These include user privacy (is the system training on your queries?), sustainability of the system (what's their business model?), etc.

A (non-comprehensive) list of general web search engines that use LLMs to generate answers

  1. Bing Chat

  2. Perplexity.ai

  3. You.com

Side note: Some systems are chatbots that may decide to search only when necessary, as opposed to Elicit and Scispace, which are search engines that always search.

A (non-comprehensive) list of ChatGPT plugins that search academic papers - Requires ChatGPT Plus (default is Bing Chat)

  1. Scholar.ai

  2. Consensus search

  3. Research by Vector

  4. Scholarly

  5. Litmaps

  6. Paperpile

  7. Science

  8. NextPaper.ai

  9. txyz

  10. Scholarly Graph Link

  11. Bibliography Crossref

  12. MixerBox Scholar

  13. Scholar assist

  14. Scholarly insight

Note that a lot of these just cover arXiv or, at best, open access papers or metadata.

Additional details

Identifiers

UUID
7c375f37-547f-4414-a075-05d57c01b530
GUID
164998118
URL
https://aarontay.substack.com/p/list-of-academic-search-engines-that

Dates

Issued
2023-09-27T22:57:00
Updated
2023-09-27T22:57:00