Search, harvest, and build datasets from 200M+ open-access papers, books, and datasets — unified in one workspace, exported at terminal speed.
Papers, preprints, books, and datasets stream in live from the world's open-access archives, with no scraping setup and no per-site logins.
Not just a search box — a full harvesting workspace built for researchers, builders, and data teams.
One unified index across papers, books, datasets, and media. Real native taxonomies, full-text search, and per-category counts, not a keyword guess.
Export millions of records to NDJSON and re-load them at ~30,000 rows/sec. No rate-limit babysitting: pull a publisher's whole catalog into a local index once.
Turn search results into training data: one-class image datasets, perceptual-hash dedup, auto-labeling, and a ready-to-train data.yaml in one click.
Open-access papers and public-domain books, with direct PDF and EPUB downloads. Your harvest is written to your own drive — you own the data.
Browse by real journal categories, cover art, and citation and download counts. Bulk-download whole collections with a single confirmation.
A server-sent events (SSE) backend paginates server-side across hundreds of thousands of records, with a warm pre-crawl cache for instant category clicks.
Pick a source, browse real categories or run a full-text search, and narrow by year, citations, downloads, or type.
Stream the entire result set into a local index — millions of rows, deduped, in minutes.
Download the files, export to NDJSON, or compile straight into a labeled, train-ready ML dataset.
Bulkker is in private pre-launch. Enter your access code to start harvesting.
Enter the app →