One unified LLM endpoint — and more — infrastructure for research.

The internal layer behind research.jing.vision and the Article API. Four things in one: a unified LLM endpoint that routes to any model; agents that chain tools across multi-step tasks; prompts‑as‑functions — versioned, typed, callable; and a compound system layer for orchestrating full pipelines like PDF ingestion, parallel extraction passes, and evidence indexing.

one endpoint · any model · OpenAI-compatible
import OpenAI from "openai";

const gw = new OpenAI({
  apiKey:  "gw_••••••••",
  baseURL: "https://gateway.jing.workers.dev/v1",
});

// messages in the standard OpenAI chat format
const messages = [
  { role: "user" as const, content: "Summarize this paper." },
];

// route to any model — zero other changes
await gw.chat.completions.create({
  model: "claude-3-5-sonnet-latest", // or gpt-4o, gemini…
  messages,
  stream: true,
});
multi-step agent · tools · cost-optimal routing
const agent = createAgent({
  tools: [search, scrape, embed, summarize],
  routing: {
    reasoning:  "claude-3-5-sonnet",
    summaries:  "gpt-4o-mini",
    embeddings: "ollama/nomic-embed",
  },
});

const findings = await agent.run(query);
trace · 3 steps · 4.1s
STEP 1 search(query) → 18 results
STEP 2 scrape + embed → 3 docs chunked
STEP 3 summarize(docs) → 320 tokens
prompt as versioned, typed function
const extractMethods = definePrompt({
  name:    "extract-methods",
  model:   "gpt-4o-mini",
  schema:  MethodsSchema,
  version: 4,
});
call it anywhere · schema-validated
const out = await extractMethods({ section });
// { datasets:[…], baselines:[…],
//   confidence: 0.91 }
compound pipeline · PDF → index
const paper = await gateway.ingest({
  id: "2401.12345",  // arXiv ID
  passes: ["entities", "methods",
           "metrics", "topics"],
  strategy: "parallel",
});
output · 4 passes · 2.8s · schema-enforced
PASS methods datasets, baselines isolated
PASS metrics F1 0.88, BLEU 42.1, params 7B
PASS topics RAG · retrieval · long-context
→ INDEX deduped + written to research index

The engine room
for messy
papers.

Research papers are messy in two ways. Each one is long and unstructured — critical details like datasets, baselines, metrics, and limitations are buried deep inside PDFs with no reliable schema. And there are simply too many of them: manual reading doesn't scale, and keyword search returns noise rather than comparable evidence.

What you actually need is to extract evidence and key findings, then compare horizontally across many papers — reliably and repeatedly. That's what gateway makes possible. It's not a standalone product; it's the deep infrastructure behind the Article API extraction pipeline and the research experience at research.jing.vision.

Think of it as the engine room. It runs the LLM passes that make paper intelligence possible — so the app can surface key findings and compare papers at scale without anyone reading a full PDF.

Each extraction pass is a typed, versioned prompt function. A full run: submit an arXiv ID; retrieve and parse the PDF into sections; run six parallel LLM passes; validate every output against a JSON schema with confidence scoring; dedup against the index; and serve structured results to the API. Repeatable, observable, and cheap enough to run across hundreds of papers.
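That run can be sketched end to end. Everything below is illustrative — the helper names, the pass list, and the 0.7 confidence threshold are assumptions, not the real gateway API; the stubs stand in for PDF retrieval, parsing, and LLM calls:

```typescript
type PassOutput = { pass: string; confidence: number; data: unknown };

// Stubs standing in for real PDF retrieval, parsing, and LLM passes.
const fetchPdf = async (id: string) => `pdf-bytes-for-${id}`;
const parseSections = (pdf: string) => [pdf];
const runPass = async (pass: string, _sections: string[]): Promise<PassOutput> => ({
  pass,
  confidence: 0.9,
  data: {},
});

async function extractPaper(arxivId: string): Promise<PassOutput[]> {
  const pdf = await fetchPdf(arxivId);   // retrieve the PDF
  const sections = parseSections(pdf);   // parse into sections
  const passes = ["entities", "methods", "metrics",
                  "topics", "claims", "limitations"]; // illustrative names
  // the passes run in parallel, each independently scored
  const raw = await Promise.all(passes.map((p) => runPass(p, sections)));
  // confidence gate before anything reaches the index
  return raw.filter((r) => r.confidence >= 0.7);
}
```

A real run would swap each stub for the corresponding gateway step; the shape of the flow — parallel passes behind one awaited function — is the point.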

capabilities

Four primitives.
Infinite research workflows.

From a single model call to a full multi-agent research pipeline — gateway gives you the right primitive at every layer. Each card below is a real capability, with the problem it solves.

01 ·
Any Model, One Key

Different providers, different SDKs, different auth tokens.

One OpenAI-compatible endpoint routes to Claude, GPT-4o, Gemini, Mistral, or any local model via Ollama — zero refactor. Swap a baseURL and everything else stays the same.
drop-in · zero refactor
02 ·
Tool-Calling Agents

Research tasks need decision loops, not single completions.

createAgent wires tools — search, scrape, embed, summarize — into a repeatable multi-step loop. A literature review across 40 papers becomes a single agent invocation.
multi-step · tool-calling
03 ·
Prompt-as-Function

Prompts scattered in code can't be versioned, reused, or tested.

definePrompt wraps any prompt in a named, typed, schema-enforced function. Call it anywhere in the stack. Improve it, bump the version; prior results stay intact and comparable.
versioned · typed · callable
04 ·
Compound Pipelines

Complex multi-stage tasks need more than a single model call — they need orchestration.

Chain arbitrary steps — ingest, extract, validate, index — into a single observable pipeline. Gateway manages state between steps so a full paper-to-index run is one function, not a tangle of callbacks. The PDF extraction pipeline behind research.jing.vision is one compound system built on this primitive. Trend monitoring, side-by-side comparisons, and repeatable structured extraction are all others.
orchestrated · stateful · observable
05 ·
Parallel Execution

Sequential LLM passes don't scale to hundreds of papers.

Run extraction passes concurrently — 6 at the latency of 1. Each pass is independently validated so a bad entities pass never blocks a clean metrics pass from writing.
concurrent · independently scoped
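A minimal sketch of that independence, built on `Promise.allSettled` (the pass functions here are stand-ins, not the gateway API):

```typescript
type PassResult = { pass: string; ok: boolean; data?: unknown };

async function runPasses(
  passes: Record<string, () => Promise<unknown>>
): Promise<PassResult[]> {
  const names = Object.keys(passes);
  // All passes start at once: total latency ≈ the slowest single pass.
  const settled = await Promise.allSettled(names.map((n) => passes[n]()));
  // Each result is inspected independently, so one rejected pass
  // never blocks a clean one from writing.
  return settled.map((r, i) =>
    r.status === "fulfilled"
      ? { pass: names[i], ok: true, data: r.value }
      : { pass: names[i], ok: false }
  );
}
```

`Promise.all` would reject the whole batch on the first failure; `allSettled` is what lets a bad entities pass fail alone while a clean metrics pass still lands.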
06 ·
Cost-Optimal Routing

Using the same powerful model for every task burns budget fast.

Route by role: Claude for deep reasoning, GPT-4o-mini for bulk summaries, local Ollama for embeddings. The agent selects the cheapest capable model per step, automatically.
claude · gpt-mini · ollama
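One way to sketch cheapest-capable selection. The cost table and capability tags below are illustrative assumptions, not real pricing:

```typescript
type Model = { name: string; costPer1kTok: number; can: Set<string> };

// Illustrative cost table — not real quotes.
const models: Model[] = [
  { name: "claude-3-5-sonnet",  costPer1kTok: 3.0,  can: new Set(["reasoning", "summaries"]) },
  { name: "gpt-4o-mini",        costPer1kTok: 0.15, can: new Set(["summaries"]) },
  { name: "ollama/nomic-embed", costPer1kTok: 0,    can: new Set(["embeddings"]) },
];

// Pick the cheapest model that can handle the step's role.
function cheapestCapable(role: string): string {
  const fit = models
    .filter((m) => m.can.has(role))
    .sort((a, b) => a.costPer1kTok - b.costPer1kTok);
  if (fit.length === 0) throw new Error(`no model handles ${role}`);
  return fit[0].name;
}
```

Summaries route to gpt-4o-mini, reasoning to Claude, embeddings to the free local model — the premium model is only paid for where capability demands it.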
07 ·
Step-Level Tracing

Agents fail silently; LLM outputs are hard to audit or reproduce.

Every tool call and LLM completion emits a trace — input, output, model, latency. Debug exactly what went wrong. Compare runs to see what a prompt change actually improved.
input · output · latency
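The pattern behind that is a small wrapper; this trace shape is an assumption, not gateway's actual format:

```typescript
type Trace = { step: string; model?: string; input: unknown;
               output?: unknown; error?: string; ms: number };

const traces: Trace[] = [];

// Wrap any async step so every call leaves a comparable record.
async function traced<I, O>(
  step: string, input: I, fn: (input: I) => Promise<O>, model?: string
): Promise<O> {
  const start = Date.now();
  try {
    const output = await fn(input);
    traces.push({ step, model, input, output, ms: Date.now() - start });
    return output;
  } catch (e) {
    traces.push({ step, model, input, error: String(e), ms: Date.now() - start });
    throw e; // the failure is recorded, then surfaced
  }
}
```

Diffing `traces` between two runs is what turns "did that prompt change help?" into an answerable question.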
08 ·
Versioned Prompt Registry

Improving a prompt silently invalidates prior results.

Every prompt function carries a version. Bump it and re-run only that pass. Each extraction record stores the version that produced it — cross-paper comparisons stay valid even as prompts evolve.
reproducible · comparable
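A sketch of why the stored version keeps comparisons valid (the record shape is an assumption):

```typescript
// Each record stores the prompt version that produced it.
type ExtractionRecord = {
  paperId: string; pass: string; version: number; data: unknown;
};

// Two records are only comparable when the same prompt version
// produced them — otherwise a prompt change masquerades as a
// difference between papers.
function comparable(a: ExtractionRecord, b: ExtractionRecord): boolean {
  return a.pass === b.pass && a.version === b.version;
}

// After a version bump, only that pass's stale records need re-running.
function stale(index: ExtractionRecord[], pass: string, current: number) {
  return index.filter((r) => r.pass === pass && r.version < current);
}
```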
09 ·
Structured Output, Always

Freeform LLM text breaks every downstream system that consumes it.

Every extraction is schema-enforced and confidence-scored. Low-confidence outputs are flagged before they reach the index; duplicates are dropped on write. The index stays clean and queryable.
typed · scored · deduped
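The write path can be sketched as two gates (schema validation elided; the 0.7 threshold and the dedup key are assumptions):

```typescript
type Extraction = {
  paperId: string; pass: string; confidence: number;
  data: Record<string, unknown>;
};

const MIN_CONFIDENCE = 0.7;      // assumed threshold
const seen = new Set<string>();  // assumed dedup key: paper + pass
const index: Extraction[] = [];

function writeToIndex(x: Extraction): "written" | "flagged" | "duplicate" {
  // Gate 1: low-confidence outputs are flagged, never indexed silently.
  if (x.confidence < MIN_CONFIDENCE) return "flagged";
  // Gate 2: duplicates are dropped on write.
  const key = `${x.paperId}:${x.pass}`;
  if (seen.has(key)) return "duplicate";
  seen.add(key);
  index.push(x); // only clean, unique rows reach the index
  return "written";
}
```

Because both gates sit on the write path, nothing downstream ever has to re-check the index for junk.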
Structured evidence, not search results.
01 ·

The research app it powers

The live research experience built on top of gateway — key-finding extraction, side-by-side paper comparison, and structured evidence across the full index.

Open research.jing.vision
02 ·

The extraction pipeline

The Article API landing and extraction pipeline — where PDFs go in and structured, machine-readable evidence comes out via gateway's orchestration layer.

Article API ↗