Pre-launch · invite only

The open-science data engine.

Search, harvest, and build datasets from 200M+ open-access papers, books, and datasets — unified in one workspace, exported at terminal speed.

No tracking · open data · your harvest stays on your drive
bulkker.app / search
Papers · Books · Datasets · Media
Indexed across 30+ sources · 1,284,902 results
Connected sources

Every open archive, one search bar.

Papers, preprints, books, and datasets stream in live from the world's open-access archives, with no scraping setup and no per-site logins.

The platform

From query to dataset, end to end.

Not just a search box — a full harvesting workspace built for researchers, builders, and data teams.

🔍

Search everything

One index across papers, books, datasets, and media. Native source taxonomies, full-text search, and per-category counts, not keyword guesses.

Harvest in bulk

Export millions of records to NDJSON and reload them at ~30,000 rows/sec. No rate-limit babysitting: pull a whole publisher into a local index once.

🧱

Build ML datasets

Turn search results into training data: one-class image datasets, perceptual-hash dedup, auto-labeling, and a ready-to-train data.yaml, all in one click.

📚

Open by design

Open-access papers and public-domain books, with direct PDF and EPUB downloads. Your harvest is written to your own drive — you own the data.

🗂️

Cover-to-cover

Browse real journal categories with cover art, citation counts, and download counts. Bulk-download whole collections with a single confirmation.

🌐

Built to scale

A streaming Server-Sent Events (SSE) backend paginates server-side across hundreds of thousands of records, and a warm pre-crawl cache makes category clicks instant.

How it works

Three steps to a dataset.

Search & filter

Pick a source, browse its real categories or run a full-text search, then narrow by year, citations, downloads, or type.

Harvest the set

Stream the entire result set into a local index — millions of rows, deduped, in minutes.

Export or build

Download the files, export to NDJSON, or compile straight into a labeled, train-ready ML dataset.

Ready to dive into open science?

Bulkker is in private pre-launch. Enter your access code to start harvesting.

Enter the app →