Lemmy Scraper

Scrape public Lemmy posts from any instance by feed, community, or keyword. Get title, author, score, comments and more as JSON. No login needed.

How it works

1. Open it on Apify. Hit Run on Apify: it opens the tool in the cloud, no install.
2. Set the inputs. Adjust instance, mode, and query (sensible defaults are pre-filled).
3. Click Run. The tool runs on Apify's cloud and collects the data for you.
4. Export the results. Download as JSON, CSV or Excel, or pipe straight into your app, Google Sheets, or an AI agent.

Inputs

Field	What it does	Type
`instance`	The Lemmy instance host to scrape (bare domain, no https://). Examples: lemmy.world, lemmy.ml, beehaw.org, sh.itjust.works.	string
`mode`	What to scrape: "feed" = the instance front-page feed; "community" = a single community (put its name in Query); "search" = search posts by keyword (put the ter	string
`query`	For mode "community": the community name, e.g. "technology" or "technology@lemmy.world". For mode "search": the keywords to search for, e.g. "linux". Ignored in	string
`sort`	How to sort posts. Hot/Active rank by recent engagement; New is chronological; the Top* options rank by score within a time window.	string
`maxItems`	Maximum number of posts to return. The actor paginates (50 per request) until it reaches this many or runs out of posts.	integer
`notionConnector`	Optional. Write each post as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors,	string
`notionParentId`	Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your wo	string

What you get

A structured dataset — each result includes fields like:

author, authorActor, body, comments, community, communityTitle, downvotes, id, nsfw, postUrl, published, score, thumbnail, title

Export every run as JSON, CSV or Excel, or send it to your app, a database, Google Sheets, or an AI agent.

2 ready-to-run use cases

Scrape lemmy.world Front Page: Hot Posts + Scores

The hottest lemmy.world front-page posts as structured data: title, link, author, score, and comment count. Sort by Hot, New, or Top, no login needed.

Lemmy Keyword Search: Find Posts on Any Topic

Track a brand or topic on Lemmy by pulling every post that mentions your keyword across lemmy.world, sorted newest first, with author and score.

Lemmy Scraper

Scrape public posts from any Lemmy instance — the federated, Reddit-style link aggregator. Browse an instance's front-page feed, pull a single community, or search by keyword. No account, no API key, no login.

Uses the public Lemmy v3 REST API, so reads are fast and clean (structured JSON, not HTML scraping).

Modes

Feed — the instance front page (/api/v3/post/list). Just set instance and sort.
Community — one community's posts. Set mode: "community" and put the community name in query (e.g. technology, or cross-instance technology@lemmy.world).
Search — search posts by keyword. Set mode: "search" and put the term in query (e.g. linux).
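
To make the three modes concrete, here is a minimal sketch of how each one could map onto a Lemmy v3 request URL. Only the /api/v3/post/list path for feed mode comes from this page. The use of community_name for community mode, /api/v3/search with q and type_ for search mode, and the helper name build_request_url are assumptions for illustration, not a description of the actor's internals.

```python
from urllib.parse import urlencode

PAGE_SIZE = 50  # the Inputs table says the actor paginates 50 per request


def build_request_url(instance, mode="feed", query=None, sort="Hot", page=1):
    """Return the URL for one page of results (hypothetical helper)."""
    params = {"sort": sort, "limit": PAGE_SIZE, "page": page}
    if mode == "feed":
        path = "/api/v3/post/list"
    elif mode == "community":
        # Assumption: community posts come from the same list endpoint,
        # filtered by name (plain or cross-instance name@host form).
        path = "/api/v3/post/list"
        params["community_name"] = query
    elif mode == "search":
        # Assumption: keyword search uses the search endpoint, limited to posts.
        path = "/api/v3/search"
        params["q"] = query
        params["type_"] = "Posts"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"https://{instance}{path}?{urlencode(params)}"


print(build_request_url("lemmy.world", "community", "technology"))
```

The helper only builds the URL, so it can be checked without touching the network.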

What you get per post

id, title, url (the external link the post points to, if any), body (post text; markdown, with stray HTML stripped), author, authorActor (the creator's federated actor URL), community, communityTitle, score, comments, upvotes, downvotes, nsfw, thumbnail, published (ISO), and postUrl (the permalink on the instance, e.g. https://lemmy.world/post/123).

Fields that can be null

url / thumbnail — many posts are pure text discussions with no external link or image.
body — link posts often have no body text.
Any field Lemmy omits for a given post comes back null rather than being dropped.
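
The null behaviour can be illustrated with a small flattening sketch. The output field names are the ones listed above. The nested input keys (post, creator, community, counts, thumbnail_url, actor_id and so on) are an assumption about the shape of a Lemmy v3 post view, and to_row is a hypothetical helper rather than the actor's code.

```python
def to_row(view, instance):
    """Flatten one assumed Lemmy post view into the documented row shape."""
    post = view.get("post", {})
    creator = view.get("creator", {})
    community = view.get("community", {})
    counts = view.get("counts", {})
    post_id = post.get("id")
    return {
        "id": post_id,
        "title": post.get("name"),
        "url": post.get("url"),            # None for pure text posts
        "body": post.get("body"),          # None for most link posts
        "author": creator.get("name"),
        "authorActor": creator.get("actor_id"),
        "community": community.get("name"),
        "communityTitle": community.get("title"),
        "score": counts.get("score"),
        "comments": counts.get("comments"),
        "upvotes": counts.get("upvotes"),
        "downvotes": counts.get("downvotes"),
        "nsfw": post.get("nsfw"),
        "thumbnail": post.get("thumbnail_url"),
        "published": post.get("published"),
        # Permalink on the scraped instance, as described above.
        "postUrl": f"https://{instance}/post/{post_id}" if post_id is not None else None,
    }
```

Because every lookup uses dict.get, a key that is absent from the input stays in the row with the value None instead of disappearing.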

Input

Field	Notes
`instance`	Lemmy instance host (bare domain). Default `lemmy.world`.
`mode`	`feed`, `community`, or `search`. Default `feed`.
`query`	Community name (community mode) or search term (search mode).
`sort`	`Hot`, `Active`, `New`, `TopDay`, `TopWeek`, `TopMonth`, `TopAll`. Default `Hot`.
`maxItems`	Max posts to return (paginated 50 at a time). Default 100.
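
The interaction between maxItems and the 50-per-request page size can be sketched as a simple loop. This is an illustration under stated assumptions: fetch_page is a hypothetical callable that returns one page of rows, and stopping on a short or empty page is one plausible reading of "runs out of posts". The de-duplication by id follows the Output section of this page.

```python
def collect_posts(fetch_page, max_items=100, page_size=50):
    """Page through results until max_items unique posts are collected."""
    seen = set()
    rows = []
    page = 1
    while len(rows) < max_items:
        batch = fetch_page(page, page_size)
        if not batch:
            break  # nothing left on this feed
        for post in batch:
            if post["id"] in seen:
                continue  # skip posts already collected on an earlier page
            seen.add(post["id"])
            rows.append(post)
            if len(rows) == max_items:
                break
        if len(batch) < page_size:
            break  # a short page means the feed is exhausted
        page += 1
    return rows


# Usage with an in-memory stand-in for the network call:
fake_feed = [{"id": i} for i in range(120)]
fetch = lambda page, size: fake_feed[(page - 1) * size : page * size]
print(len(collect_posts(fetch, max_items=100)))
```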

Output

One dataset row per post, deduped by post id. Pricing is pay-per-result: you are only charged for genuine post rows (ok: true). Rows we couldn't deliver are never charged:

invalid input — a single ok: false diagnostic row with errorCode: "BAD_INPUT" (bad instance, bad mode, or a missing community name / search term),
no posts for this feed/community/search (NO_RESULTS),
a missing community or non-Lemmy host (NOT_FOUND),
rate limits or network errors (RATE_LIMITED / NETWORK).
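
When consuming a dataset, it can help to separate real post rows from diagnostic rows before further processing. The sketch below assumes each row carries the ok flag and, for diagnostics, the errorCode field described above. summarize_rows is a hypothetical helper, and the "UNKNOWN" fallback label is an assumption for rows that lack an errorCode.

```python
from collections import Counter


def summarize_rows(rows):
    """Split a dataset into post rows and a count of diagnostics by errorCode."""
    posts = [r for r in rows if r.get("ok") is True]
    diagnostics = Counter(
        r.get("errorCode", "UNKNOWN") for r in rows if r.get("ok") is not True
    )
    return posts, dict(diagnostics)


rows = [
    {"ok": True, "id": 1, "title": "First post"},
    {"ok": False, "errorCode": "RATE_LIMITED"},
]
posts, diagnostics = summarize_rows(rows)
print(len(posts), diagnostics)
```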

Proxy

The Lemmy v3 REST API is public and has no anti-bot protection, so no proxy is required and the default configuration runs without one (saving proxy credits). Only enable Apify Proxy if an instance rate-limits your IP at very high volume.

Troubleshooting

NOT_FOUND in community mode? Check the community name. If it lives on another instance, use the cross-instance form name@otherinstance.tld, or set instance to that instance directly.
NO_RESULTS? The feed/community/search genuinely returned nothing on this instance — try a different sort, a broader search term, or a larger instance.
BAD_INPUT? community and search modes both require query. instance must be a bare Lemmy domain like lemmy.world.
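
To catch BAD_INPUT before starting a run, the rules above can be checked locally. This is a rough sketch, not the actor's validator: check_input is a hypothetical helper, and the regular expression is an assumed definition of "bare domain" (dot-separated labels with no scheme, path, or port).

```python
import re

VALID_MODES = {"feed", "community", "search"}
# Assumed rule for a bare host: labels separated by dots, nothing else.
BARE_DOMAIN = re.compile(r"^(?!-)[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def check_input(run_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    instance = run_input.get("instance", "lemmy.world")  # documented default
    mode = run_input.get("mode", "feed")                 # documented default
    query = (run_input.get("query") or "").strip()
    if not BARE_DOMAIN.match(instance):
        problems.append("instance must be a bare domain such as lemmy.world")
    if mode not in VALID_MODES:
        problems.append("mode must be feed, community, or search")
    elif mode in ("community", "search") and not query:
        problems.append(f"{mode} mode requires query")
    return problems


print(check_input({"instance": "https://lemmy.world", "mode": "search"}))
```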

Example

{ "instance": "lemmy.world", "mode": "community", "query": "technology", "sort": "Hot", "maxItems": 50 }

Notes

Lemmy is federated: a large instance like lemmy.world also relays content from communities hosted elsewhere. The postUrl permalink points to the instance you scraped; authorActor and the community's federated identity tell you where the content originates.
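
As a small illustration of the federation note, the host part of authorActor can be compared with the host part of postUrl to see whether a post's author lives on a different instance from the one that was scraped. The helper names are hypothetical, and the check covers only the author: the listed output fields do not include a community actor URL, so a community's home instance is not tested here.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the hostname of a URL, or None when the URL is missing."""
    return urlparse(url).hostname if url else None


def author_is_remote(row):
    """True when the author's home instance differs from the scraped instance."""
    scraped = host_of(row.get("postUrl"))
    origin = host_of(row.get("authorActor"))
    return scraped is not None and origin is not None and scraped != origin


row = {
    "postUrl": "https://lemmy.world/post/123",
    "authorActor": "https://lemmy.ml/u/alice",  # made-up sample value
}
print(host_of(row["authorActor"]), author_is_remote(row))
```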