Bulkker — The open-science data engine

The platform

From query to dataset, end to end.

Not just a search box — a full harvesting workspace built for researchers, builders, and data teams.

🔍

Search everything

One unified index across papers, books, datasets and media. Real native taxonomies, full-text search, and per-category counts — not a keyword guess.

⚡

Harvest in bulk

Export millions of records to NDJSON and re-load them at ~30,000 rows/sec. No rate-limit babysitting — pull a whole publisher into a local index once.
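
As a rough illustration of the re-load step, here is a minimal sketch that bulk-loads NDJSON lines into a local SQLite table in batches. It is not Bulkker's actual loader: the table layout, the `id` and `title` field names, and the `load_ndjson` helper are assumptions made for the example, and throughput will depend on hardware and record size.

```python
import json
import sqlite3


def load_ndjson(lines, conn, batch_size=10_000):
    """Bulk-load NDJSON records into SQLite; returns the number of records read.

    `lines` is any iterable of text lines (an open file works).
    The `id` and `title` fields are hypothetical, not a documented schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id TEXT PRIMARY KEY, title TEXT, raw TEXT)"
    )
    batch, seen = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        rec = json.loads(line)
        batch.append((rec["id"], rec.get("title"), line))
        seen += 1
        if len(batch) >= batch_size:
            _flush(conn, batch)
            batch.clear()
    if batch:
        _flush(conn, batch)
    return seen


def _flush(conn, batch):
    # One transaction per batch; INSERT OR IGNORE skips ids already stored.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO records VALUES (?, ?, ?)", batch
        )
```

Batching inserts inside a single transaction is what usually makes this kind of load fast; committing row by row tends to be far slower.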

🧱

Build ML datasets

Turn search results into training data: one-class image datasets, perceptual-hash dedup, auto-labeling, and a ready-to-train data.yaml in one click.
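
To make "perceptual-hash dedup" concrete, here is a small sketch using an average hash, one common perceptual hash. Which hash Bulkker uses is not stated, so this is only an assumed stand-in. It works on small grayscale grids (for example 8x8 values in 0-255); decoding and downscaling the images is assumed to happen elsewhere, and the `max_distance` threshold is an arbitrary example value.

```python
def average_hash(pixels):
    """Average hash of a small grayscale grid (rows of values in 0-255).

    Each bit is 1 where the pixel is brighter than the grid's mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def dedup(images, max_distance=4):
    """Keep the first image of each near-duplicate group.

    `images` maps a name to its grayscale grid (hypothetical input shape).
    """
    kept = {}  # name -> hash of every image kept so far
    for name, pixels in images.items():
        h = average_hash(pixels)
        if all(hamming(h, other) > max_distance for other in kept.values()):
            kept[name] = h
    return list(kept)
```

Comparing every image against every kept hash is quadratic, which is fine for a sketch; a large harvest would likely need an index over the hashes.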

📚

Open by design

Open-access papers and public-domain books, with direct PDF and EPUB downloads. Your harvest is written to your own drive — you own the data.

🗂️

Cover-to-cover

Browse by real journal categories, cover art, citation and download counts. Bulk-download whole collections with a single confirm.

🌐

Built to scale

A Server-Sent Events (SSE) backend streams results and paginates server-side across hundreds of thousands of records, with a warm pre-crawl cache for instant category clicks.
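
For readers unfamiliar with SSE, the sketch below parses a `text/event-stream` into events and gathers records page by page. The parsing follows the standard SSE framing (comment lines, `event:` and `data:` fields, a blank line ending each event), but the `page` event name and the JSON payload with a `records` key are purely hypothetical; Bulkker's real wire format is not documented here.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip it if it has no data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def collect_records(lines):
    """Gather records from hypothetical `page` events carrying JSON."""
    records = []
    for event, data in parse_sse(lines):
        if event == "page":  # assumed event name, for illustration only
            records.extend(json.loads(data)["records"])
    return records
```

Because each page arrives as its own event, a client can start indexing records while later pages are still being produced on the server.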

How it works

Three steps to a dataset.

Search & filter

Pick a source, browse real categories or run a full-text search, and narrow by year, citations, downloads or type.

Harvest the set

Stream the entire result set into a local index — millions of rows, deduped, in minutes.

Export or build

Download the files, export to NDJSON, or compile straight into a labeled, train-ready ML dataset.

Every open archive, one search bar.

Ready to dive into open science?
