
arXiv LLM Paper Scraper - Abstracts, Authors, PDFs

Run a keyword search across arXiv for large language model papers, ranked by relevance, with abstracts, author lists, and direct PDF download links.


How it works

  1. Open it on Apify

    Click Run on Apify to open the tool in the cloud; there is nothing to install.

  2. Set the inputs

    Adjust query, sortBy, and maxItems; sensible defaults are pre-filled.

  3. Click Run

    The tool runs on Apify’s cloud and collects the data for you.

  4. Export the results

    Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

query (string): arXiv search query. Use arXiv field prefixes: all: (all fields), ti: (title), au: (author), abs: (abstract), cat: (category). Examples: "all:large language mode…

sortBy (string): How to order results. Relevance ranks by match quality; Submitted date sorts by original submission; Last updated date sorts by most recent revision. Newest/mos…

maxItems (integer): Maximum number of papers to return. The actor paginates 100 per request and pauses ~3s between pages to respect arXiv's rate guidance. arXiv hard-limits total r…

notionConnector (string): Optional. Write each paper as a page into your Notion workspace when the run finishes; handy for building a literature-review database. Authorize a Notion connector once…

notionParentId (string): Optional. The Notion data source ID of the database to write papers into (only used if a Notion connector is set). Leave empty to create the pages privately in…
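How the actor talks to arXiv is not documented on this page. As a minimal sketch, assuming it wraps arXiv's public export API at http://export.arxiv.org/api/query, the inputs above could translate into that API's search_query, sortBy, sortOrder, start, and max_results parameters, with 100 results per page and a ~3 s pause between requests. SORT_MAP, build_page_urls, fetch_all, and the fixed descending sort order are illustrative assumptions, not the actor's own code.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the actor queries arXiv's public Atom API at this endpoint.
ARXIV_API = "http://export.arxiv.org/api/query"

# Hypothetical mapping from the sortBy labels in the inputs list
# to the sortBy values the arXiv API accepts.
SORT_MAP = {
    "Relevance": "relevance",
    "Submitted date": "submittedDate",
    "Last updated date": "lastUpdatedDate",
}

PAGE_SIZE = 100      # results per request, as described for maxItems
PAUSE_SECONDS = 3.0  # ~3 s between pages to respect arXiv's rate guidance


def build_page_urls(query: str, sort_label: str, max_items: int) -> list[str]:
    """Return one API URL per page needed to collect up to max_items results."""
    urls = []
    for start in range(0, max_items, PAGE_SIZE):
        params = {
            "search_query": query,  # e.g. 'ti:"large language model"' or 'cat:cs.CL'
            "sortBy": SORT_MAP[sort_label],
            "sortOrder": "descending",  # assumption: best match / newest first
            "start": start,  # zero-based offset of the first result on this page
            "max_results": min(PAGE_SIZE, max_items - start),  # last page takes the remainder
        }
        urls.append(f"{ARXIV_API}?{urlencode(params)}")
    return urls


def fetch_all(urls: list[str]) -> list[bytes]:
    """Download each page in order (raw Atom XML), pausing between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(PAUSE_SECONDS)  # no pause before the first request
        with urlopen(url) as resp:
            pages.append(resp.read())
    return pages
```

For example, build_page_urls('ti:"large language model"', "Relevance", 250) yields three URLs with start=0, 100, and 200; the last one requests only the remaining 50 results. Keeping the URL builder free of network calls makes the paging arithmetic easy to check before any request is sent.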

What you get

Each run produces a structured dataset. Every result includes fields such as:

absUrl, abstract, arxivId, authors, categories, doi, pdfUrl, primaryCategory, publishedAt, title, updatedAt

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.
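If you download the JSON export, a typed record makes the fields listed above easier to work with. The sketch below assumes the export is a JSON array of objects keyed by those field names; the field types (lists of strings for authors and categories, ISO 8601 strings for the dates, an optional doi) are guesses, and ArxivPaper and load_papers are hypothetical names.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArxivPaper:
    # Field names match the dataset listing; the types are assumptions.
    arxivId: str
    title: str
    abstract: str
    authors: list[str]
    categories: list[str]
    primaryCategory: str
    absUrl: str
    pdfUrl: str
    publishedAt: str  # assumed ISO 8601 timestamp string
    updatedAt: str    # assumed ISO 8601 timestamp string
    doi: Optional[str] = None  # assumed missing or null when arXiv lists no DOI


def load_papers(json_text: str) -> list[ArxivPaper]:
    """Parse a JSON export (assumed: an array of objects) into ArxivPaper records."""
    known = {f.name for f in fields(ArxivPaper)}
    papers = []
    for item in json.loads(json_text):
        # Drop unknown keys; a missing required field raises TypeError.
        papers.append(ArxivPaper(**{k: v for k, v in item.items() if k in known}))
    return papers
```

Unknown keys are ignored, while a missing required field fails loudly, so a schema change shows up immediately instead of as silently empty values. A typical follow-up is filtering, e.g. [p.pdfUrl for p in papers if p.primaryCategory == "cs.CL"].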

More use cases for arXiv Scraper

Latest arXiv cs.CL Papers - Newest NLP Preprints

Track new cs.CL (Computation and Language) preprints the moment they hit arXiv, sorted by submission date, so NLP researchers never miss a release.