# oruk — full LLM context

> Single concatenated reference. Keeps `llms.txt` as the curated index and
> dumps every doc page worth reading inline here. ~12 kB, well within any
> modern context window.

Last updated: 2026-04-28. The content here mirrors the live website
(/docs, /methodology, /about, /sources, /pricing). When in doubt, the
canonical source is the page on oruk.ai — every page is server-rendered
and reflects the current behaviour.

---

# 1. Quickstart

oruk is a live broadcast intelligence API. It listens to ~200 live radio,
TV, social, and structured feeds in real time and publishes corroborated
news events. Every story includes a stable id (`evt_...`), headline, body,
summary, primary category, multi-category list, topics, urgency, impact,
confidence, event location, and a corroboration block with the count of
independent sources.

Base URL: `https://api.oruk.ai`
Public website: `https://oruk.ai`
System health: `https://api.oruk.ai/health`

Sign up: <https://oruk.ai/signup>. Generate API keys at <https://oruk.ai/dashboard>.

First call (no key required):

```
curl "https://api.oruk.ai/v1/stories/feed?limit=10"
```

First filtered call (with a key):

```
curl -H "X-API-Key: ork_xxxxxxxx" \
  "https://api.oruk.ai/v1/stories?category=conflict&min_impact=7&limit=20"
```

---

# 2. Authentication

Pass your API key on each request:

- `X-API-Key: ork_xxxx` (preferred)
- `Authorization: Bearer ork_xxxx`
- `?api_key=ork_xxxx` (only for SSE / EventSource clients that can't set headers)

Public read endpoints (no key, no quota): `GET /health`, `GET /v1/health`,
`GET /v1/stories/feed`. Everything else under `/v1/*` requires a key.

Tier matrix:

| Tier | Calls/mo | API delay | Keys | SSE | Per-min |
| --- | --- | --- | --- | --- | --- |
| free | 100 | 5 min | 1 | not included | 30 req/min |
| pro ($12/mo) | 1,000 | none | 2 | not included | 60 req/min |
| legacy | 1,000 | none | 2 | not included | 60 req/min |
| trader ($50/mo, invite) | 10,000 | none | 2 | real-time | 300 req/min |
| developer ($100/mo) | 10,000 | none | 2 | real-time | 300 req/min |
| enterprise | 1M+ | none | 100 | real-time | custom |

The live wire on https://oruk.ai/ is real-time and free for everyone (no
signup, no delay, no credit card). Paid tiers exist for the programmatic
API (REST + SSE), real-time API responses, and higher quotas.

Annual billing is two months free vs. monthly on Pro and Developer.

---

# 3. Endpoints

## GET /v1/stories/feed (public)

Pre-built feed of the latest stories. No auth, no quota. Powers the public
wire on oruk.ai and the no-key fallback in oruk-mcp.

Parameters:
- `limit` int, 1-100, default 20
- `sort` "recent" (default) or "impact"
- `since_hours` int, 1-168, default 4

Response:

```json
{
  "stories": [{ /* see Story shape below */ }],
  "meta": {"count": 10, "window": "all", "cursor": "evt_...", "hasMore": true}
}
```
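
A minimal Python sketch of calling the public feed, using only the standard library. The parameter ranges and defaults come from the list above; the helper names `feed_url` and `fetch_feed` are illustrative and not part of any official client.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.oruk.ai"


def feed_url(limit: int = 20, sort: str = "recent", since_hours: int = 4) -> str:
    """Build a /v1/stories/feed URL, enforcing the documented parameter ranges."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if sort not in ("recent", "impact"):
        raise ValueError('sort must be "recent" or "impact"')
    if not 1 <= since_hours <= 168:
        raise ValueError("since_hours must be between 1 and 168")
    query = urlencode({"limit": limit, "sort": sort, "since_hours": since_hours})
    return f"{BASE_URL}/v1/stories/feed?{query}"


def fetch_feed(**params) -> list:
    """Fetch the public feed (no key needed) and return the `stories` array."""
    with urllib.request.urlopen(feed_url(**params), timeout=10) as resp:
        return json.load(resp)["stories"]
```

For example, `fetch_feed(limit=5, sort="impact")` would request the five highest-impact stories from the default 4-hour window.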

## GET /v1/stories (auth)

Paginated, filterable list. Parameters:

- `limit` int, 1-100
- `cursor` string (story id from a prior response)
- `category` one of the 12 categories
- `since` ISO 8601 ("2026-04-28" or "2026-04-28T15:00:00Z")
- `topics` comma-separated topic filter
- `q` full-text search (headline / summary / body / source / city)
- `region` one of the 6 regions
- `country` ISO 3166-1 alpha-2 ("US", "DE", "JP")
- `urgency` "breaking" | "developing" | "routine"
- `min_impact` int, 0-10
- `min_confidence` float, 0.0-1.0
- `format` "json" (default) | "csv" | "jsonl"
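
A hedged sketch of an authenticated, paginated read in Python. It assumes `/v1/stories` returns the same `stories` / `meta` envelope shown for the feed (with `meta.cursor` and `meta.hasMore`) and that `format` is left at its JSON default; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.oruk.ai"


def stories_url(**filters) -> str:
    """Build a /v1/stories URL from the documented filters, skipping unset ones."""
    query = {k: v for k, v in filters.items() if v is not None}
    return f"{BASE_URL}/v1/stories?{urlencode(query)}"


def make_fetcher(api_key: str) -> Callable[[str], dict]:
    """Return a function that GETs a URL with the preferred X-API-Key header."""
    def fetch(url: str) -> dict:
        req = urllib.request.Request(url, headers={"X-API-Key": api_key})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)
    return fetch


def iter_stories(fetch_page: Callable[[str], dict], max_pages: int = 5,
                 **filters) -> Iterator[dict]:
    """Yield stories page by page, following meta.cursor while meta.hasMore is true.

    Assumption: the envelope matches the feed response (`stories` + `meta`).
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(stories_url(cursor=cursor, **filters))
        yield from page.get("stories", [])
        meta = page.get("meta", {})
        cursor = meta.get("cursor")
        if not meta.get("hasMore") or not cursor:
            break
```

Usage would look like `iter_stories(make_fetcher("ork_xxxx"), category="conflict", min_impact=7)`. Each page fetched counts as one API call, so `max_pages` caps quota use.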

## GET /v1/stories/{id} (auth)

Single story by `evt_…` id. Includes the full body, timeline of
developments, multi-source corroboration with verbatim quotes, multi-category
list, and event coordinates.

## GET /v1/stream (auth, SSE)

Server-Sent Events. Developer, Trader, and Enterprise tiers may connect in
real time (Free, Pro, Legacy → HTTP 403). Events:

- `story` — new or updated story payload
- `corroboration` — existing story confirmed by another source
- `heartbeat` — system pulse with active source count

Reconnect with `Last-Event-ID` is supported.
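
A minimal sketch of consuming the stream in Python. `parse_sse` follows the standard `text/event-stream` framing (fields `event:`, `data:`, `id:`; a blank line dispatches). It assumes each `data` payload is a JSON object, which this document does not state explicitly; `collect` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Optional


class SseEvent(NamedTuple):
    event: str
    data: str
    id: Optional[str]


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Parse text/event-stream lines into events; a blank line dispatches."""
    event, data, last_id = "message", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:  # events with no data are dropped, per the SSE spec
                yield SseEvent(event, "\n".join(data), last_id)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        value = value[1:] if value.startswith(" ") else value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value  # persists until the server sends a new id


def collect(events: Iterable[SseEvent]):
    """Decode `story` / `corroboration` payloads; return them with the last seen id."""
    payloads, last_id = [], None
    for ev in events:
        last_id = ev.id or last_id
        if ev.event in ("story", "corroboration"):
            payloads.append({"type": ev.event, **json.loads(ev.data)})
    return payloads, last_id
```

On reconnect, send the last seen id back in a `Last-Event-ID` request header so the server can resume from that point.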

## GET /v1/sources (auth)

Every monitored source: name, city, region, country, language, default
category, medium (`audio_radio` | `social` | `structured`), live status,
polling cadence.

## GET /v1/regions (auth)

Aggregated regional story counts for map and analytics overlays.

## GET /v1/stats (auth)

System-wide statistics (active sources, stories total, transcription
throughput, top categories, last cycle ms, uptime seconds).

## POST /v1/webhooks (Developer or higher)

Subscribe an HTTPS endpoint to `story` and `corroboration` events.

Filters:
- `categories` (array of category slugs)
- `min_impact` (0-10)
- `min_confidence` (0.0-1.0)
- `country` (ISO 3166-1 alpha-2)
- `topic_match` (substring match on topics)

Payloads are HMAC-SHA256 signed with your secret. Up to five active
webhooks per account.
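
A sketch of verifying a webhook signature in Python. This document only states that payloads are HMAC-SHA256 signed with your secret; the hex encoding, the choice to sign the raw request body, and the way the signature reaches your handler are assumptions here, so check the live docs before relying on them.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (encoding is an assumption)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body).encode("ascii")
    received = signature.strip().lower().encode("utf-8")
    return hmac.compare_digest(expected, received)
```

Verify against the exact bytes received, before any JSON parsing or re-serialisation, since whitespace changes alter the digest.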

---

# 4. Story shape (canonical)

Every story payload, whether from `/v1/stories`, `/v1/stories/feed`,
`/v1/stories/{id}`, or the SSE stream, has this shape:

```json
{
  "id": "evt_8f3a2b",
  "headline": "...",
  "summary": "...",
  "body": "...",
  "category": "conflict",
  "categories": ["conflict", "diplomacy"],
  "topics": ["Iran", "aircraft", "military"],
  "urgency": "breaking | developing | routine",
  "impact": 9,
  "confidence": 0.96,
  "sourceName": "BBC World Service",
  "sourceId": 14,
  "eventCity": "London",
  "eventCountry": "GB",
  "eventRegion": "Europe",
  "eventLat": 51.51,
  "eventLon": -0.13,
  "language": "en",
  "translatedFrom": null,
  "firstSeenAt": "2026-04-28T22:13:42Z",
  "updatedAt":   "2026-04-28T22:14:08Z",
  "timestamp":   "2026-04-28T22:13:42Z",
  "storyStatus": "developing",
  "corroboration": {
    "count": 4,
    "sources": ["BBC", "NPR", "Al Araby", "France Info"],
    "sourceDetails": [
      {"name": "BBC",        "region": "Europe",      "language": "en", "medium": "audio_radio"},
      {"name": "NPR",        "region": "North America","language": "en", "medium": "audio_radio"},
      {"name": "Al Araby",   "region": "Middle East", "language": "ar", "medium": "audio_radio"},
      {"name": "France Info","region": "Europe",      "language": "fr", "medium": "audio_radio"}
    ]
  },
  "timeline": [
    {"at": "2026-04-28T21:30:00Z", "text": "Initial report on BBC World Service"},
    {"at": "2026-04-28T21:42:00Z", "text": "NPR confirms with named official"}
  ],
  "sources": [
    {"station": "BBC World Service", "quote": "...", "medium": "audio_radio"},
    {"station": "NPR",               "quote": "...", "medium": "audio_radio"}
  ]
}
```

Important field semantics:

- `corroboration.count` is *independent* sources, not raw mentions. Two AP
  wires republished by different outlets count once.
- `medium ∈ {audio_radio, social, structured}`.
- `eventCity` / `eventCountry` / `eventRegion` are where the *news* happened.
  Don't confuse them with the broadcaster's region, which appears in
  `corroboration.sourceDetails[].region`.
- `confidence` is the LLM extractor's self-reported confidence. Use ≥ 0.85
  for high-confidence reads.
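
These semantics can be applied client-side. The sketch below is illustrative: the function names are hypothetical, and the thresholds (0.85 confidence, 3 independent sources) are the ones suggested in this document.

```python
def is_high_trust(story: dict, min_confidence: float = 0.85,
                  min_sources: int = 3) -> bool:
    """True when confidence and independent-source count both clear the thresholds."""
    corroboration = story.get("corroboration") or {}
    return (
        story.get("confidence", 0.0) >= min_confidence
        and corroboration.get("count", 0) >= min_sources
    )


def broadcaster_regions(story: dict) -> set:
    """Regions of the confirming broadcasters, as opposed to the event location."""
    details = (story.get("corroboration") or {}).get("sourceDetails") or []
    return {d["region"] for d in details if "region" in d}
```

For the sample payload above, `story["eventRegion"]` is `"Europe"`, while `broadcaster_regions(story)` also includes `"North America"` and `"Middle East"`.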

---

# 5. Errors

Stable shape on every error response:

```json
{"error": "<machine_code>", "message": "<human_message>"}
```

| HTTP | code | When |
| --- | --- | --- |
| 400 | `invalid_request` | Malformed query (e.g. `since=yesterday`) |
| 400 | `invalid_email` | Malformed email at signup |
| 401 | `unauthorized` | Missing or invalid API key / JWT |
| 401 | `invalid_code` | Wrong email verification code |
| 404 | `not_found` | Unknown story / source / etc. |
| 409 | `email_taken` | Email already registered |
| 429 | `rate_limit_exceeded` | Monthly quota exhausted; honor `Retry-After` |
| 500 | `internal_error` | Backend hiccup; retry with exponential backoff |
| 503 | `service_unavailable` | Pipeline temporarily down (rare) |

Every response carries `x-request-id` (include it in support tickets).
401 responses also carry `www-authenticate: Bearer`.
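
One way to turn the table into client behaviour, sketched in Python. Retrying 503 with the same backoff as 500 is an assumption, as is treating `Retry-After` as a number of seconds; the helper names are hypothetical.

```python
import random
from typing import Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the request should not be retried."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if status == 429:
        try:
            return max(0.0, float(lowered["retry-after"]))  # honor the server hint
        except (KeyError, ValueError):
            return cap  # header missing or not in seconds: fall back to the cap
    if status in (500, 503):
        # Exponential backoff with jitter, capped at `cap` seconds.
        return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)
    return None  # 4xx errors other than 429 will not succeed on retry


def describe_error(status: int, body: Mapping[str, str],
                   headers: Mapping[str, str]) -> str:
    """Format the stable error shape plus x-request-id for logs or support tickets."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return (f"{status} {body.get('error', 'unknown')}: {body.get('message', '')} "
            f"(request {lowered.get('x-request-id', 'n/a')})")
```

Because a 429 here signals an exhausted monthly quota, the returned delay may be long; callers may prefer to surface it to the user rather than sleep.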

---

# 6. Categories (12)

A story has exactly one primary `category`; multi-category coverage is
exposed via the `categories[]` array.

- **politics** — elections, legislation, government, parties, policy.
- **conflict** — military operations, attacks, ceasefires, frontline broadcasts.
- **economy** — markets, central banks, trade, employment, macro indicators.
- **disaster** — earthquakes, storms, fires, floods, humanitarian emergencies.
- **diplomacy** — bilateral talks, summits, treaties, sanctions, statements.
- **science** — research, missions, discoveries, scientific announcements.
- **health** — outbreaks, public health policy, hospitals, drugs, clinical news.
- **technology** — product launches, AI, regulation, internet, hardware, companies.
- **culture** — arts, entertainment, religion, language, music, society.
- **environment** — climate, conservation, pollution, energy transition.
- **sports** — matches, transfers, tournaments, athletic news.
- **other** — cross-cutting stories that don't fit a single primary category.

Filter by a single category with `?category=politics`, or by topic with
`?topics=...` (comma-separated; multiple topics are intersected).

---

# 7. Methodology

Pipeline: **Ingest → ASR → Extract → Corroborate**.

1. **Ingest** — live audio, video, social, and structured feeds streamed
   continuously from ~200 sources.
2. **ASR** — per-source transcription and translation on dedicated GPU
   pods. Audio never leaves our infrastructure.
3. **Extract** — an LLM extracts headline, summary, body, primary category,
   topics, urgency, impact, confidence, location, and a verbatim source
   quote from rolling transcript windows.
4. **Corroborate** — match against existing events in time, space, and
   semantic similarity. Independent sources are merged onto the same story.

End-to-end latency from "spoken on air" to "live on the public wire" is
typically 30-90 seconds for breaking events.

## What sources we listen to

- **audio_radio** — live radio and TV news streams from public broadcasters
  and major commercial outlets across every region.
- **social** — Mastodon firehose and curated journalist accounts, used as a
  corroboration signal for events first surfaced on broadcast.
- **structured** — USGS earthquakes, NOAA weather alerts, OpenFDA, GDELT,
  and similar machine-readable feeds whose source-of-truth is the agency.

## Headline grounding

Headlines are constrained by rules-based grounding that prevents the LLM
from sharpening vague claims. We accept a higher false-negative rate to
keep the false-positive rate near zero.

## What `corroboration.count` means

The number of *independent* sources that have confirmed the same event in
their own words. Two AP wires republished by different outlets count once.
Two original radio reports from different broadcasters count twice.

## Quality controls

- Headline-grounding rules
- JSON-schema validation on every LLM output
- Corroboration thresholds for low-confidence events
- Manual daily audit of a random sample

## When to trust a story

- For automated decisioning, prefer events with `corroboration.count >= 3`
  from the `medium` values you trust for the use case (audio for breaking,
  structured for compliance).
- Cross-reference single-source stories against `source.url` before
  quoting them externally.
- For corrections, contact <editorial@oruk.ai>. Reported corrections are
  logged on <https://oruk.ai/changelog>.

---

# 8. MCP server (oruk-mcp)

The official Model Context Protocol server is published on npm as
[`oruk-mcp`](https://www.npmjs.com/package/oruk-mcp). It runs locally,
spawned as a child process by the IDE, and talks directly to
`api.oruk.ai`.

## Install

```
npx -y oruk-mcp
```

## Configure

Same JSON for Claude Desktop, Cursor, Continue.dev:

```json
{
  "mcpServers": {
    "oruk": {
      "command": "npx",
      "args": ["-y", "oruk-mcp"],
      "env": { "ORUK_API_KEY": "ork_xxxxxxxxxxxx" }
    }
  }
}
```

- Claude Desktop: `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%AppData%\Claude\claude_desktop_config.json` (Windows). Restart the app.
- Cursor: Settings → MCP, paste the inner block; or edit `~/.cursor/mcp.json`.
- Continue.dev: `~/.continue/config.json` under `mcpServers[]`.

## Two modes

- `mode: "public"` — `ORUK_API_KEY` unset. Server scans the freshest 50
  stories on `/v1/stories/feed` (2-hour window) and applies all filters
  client-side. Good for almost every interactive query.
- `mode: "authed"` — `ORUK_API_KEY` set. Full `/v1/stories` surface,
  cursor pagination, full-text search, arbitrary `evt_…` lookups, and the
  `oruk_list_sources` / `oruk_get_stats` tools that require a key.

Tool output annotates `structuredContent.mode` so an LLM knows which path
was taken and can suggest the user provide a key when relevant.

## Surface

Tools (12):
- `oruk_get_latest`, `oruk_search_news`, `oruk_get_breaking`,
  `oruk_get_story`, `oruk_get_topic`, `oruk_list_categories`,
  `oruk_list_sources`, `oruk_get_stats`, `oruk_get_corroboration`,
  `oruk_describe_api`, `oruk_show_pricing`, `oruk_health`.

Resources (6, read-only URIs):
- `oruk://docs/quickstart`, `oruk://docs/api-reference`,
  `oruk://docs/methodology`, `oruk://docs/categories`,
  `oruk://docs/pricing`, `oruk://stories/latest`.

Slash-prompts (3):
- `/summarize_breaking`, `/track_topic`, `/morning_briefing`.

## Quota accounting

Every MCP tool call that reaches the backend is one API call against your
key — same metering as the REST API. The MCP holds an in-process cache
for ~3 s on the public feed, so multi-tool turns inside one session
typically collapse to a single backend hit.
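
To illustrate why a short cache collapses several tool calls into one backend hit, here is a small Python sketch of a time-to-live cache. It is not the `oruk-mcp` implementation (that package is distributed via npm); the class name and injectable clock are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TtlCache:
    """Cache values per key for `ttl_seconds`, then fetch again."""

    def __init__(self, ttl_seconds: float = 3.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh enough: no backend call
        value = fetch()    # stale or missing: one backend call
        self._entries[key] = (now, value)
        return value
```

With a 3-second TTL, three tools reading the public feed within the same turn would share a single fetch.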

Package + readme: <https://www.npmjs.com/package/oruk-mcp>.

Official transport today: stdio via npm (`npx -y oruk-mcp`). The package
includes `mcpName: "ai.oruk/oruk-mcp"` and `server.json` metadata for MCP
registries and agent directories. Oruk does not currently publish a production
remote MCP endpoint. Remote MCP over Streamable HTTP is planned as a controlled
beta after OAuth-compatible connector auth, per-key quotas, origin controls,
and audit logging are in place.

---

# 9. Agent discovery

Oruk publishes machine-readable discovery files:

- `https://oruk.ai/.well-known/ai.json` — capability manifest for REST, SSE,
  webhooks, MCP, auth, story fields, pricing, and machine-payment status.
- `https://oruk.ai/.well-known/agent.json` — lightweight agent skill card,
  usage rules, boundaries, and payment status.
- `https://oruk.ai/AGENTS.md` — operating guide for autonomous agents,
  including citation rules and examples.
- `https://oruk.ai/sitemap.md` — Markdown sitemap for LLM crawlers.
- `https://oruk.ai/llms.txt` — curated LLM index.
- `https://oruk.ai/llms-full.txt` — this full context file.

Agents should cite `https://oruk.ai/story/{evt_id}`, include
`corroboration.count`, and name at least one confirming source when available.
For automated briefings, prefer `corroboration.count >= 3` and `confidence >=
0.85`; do not invent details beyond the returned story fields, timeline,
sources, and corroboration block.
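
A small Python sketch of a citation helper that follows these rules. The output format is only one possible rendering, and `cite` is a hypothetical name; it reads only fields present in the story shape.

```python
def cite(story: dict) -> str:
    """Render headline, independent-source count, one source name, and the story URL."""
    corroboration = story.get("corroboration") or {}
    count = corroboration.get("count", 0)
    sources = corroboration.get("sources") or []
    confirmed = f"{count} independent source{'s' if count != 1 else ''}"
    if sources:
        confirmed += f", including {sources[0]}"
    url = f"https://oruk.ai/story/{story['id']}"
    return f"{story['headline']} ({confirmed}) {url}"
```

The helper adds nothing beyond the returned fields, which keeps the summary within the boundaries described above.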

---

# 10. Machine payments

Current production billing is API-key based through Stripe subscription tiers.
Existing REST, SSE, webhook, and MCP paths do not emit live `402 Payment
Required` challenges. Agents should authenticate with an Oruk API key today.

Planned machine-payment rollout:

- x402: planned beta for new opt-in endpoints such as
  `/v1/x402/stories/feed` and `/v1/x402/stories/{id}`. Existing endpoints stay
  unchanged.
- Stripe Machine Payments Protocol: under evaluation for session-based or
  spending-limit authorization once Stripe machine-payments access is available.
- Ledgering: future machine-payment endpoints should record request id, payer,
  endpoint, amount, network/facilitator, settlement id, and response status.

Until a beta is announced, treat `401`, `403`, and `429` as the active auth and
quota signals.

---

# 11. Pricing

| Tier | Monthly | Annual | Calls/mo | API delay | Keys | SSE |
| --- | --- | --- | --- | --- | --- | --- |
| free | $0 | $0 | 100 | 5 min | 1 | not included |
| pro | $12/mo | $120/yr ($10/mo eq.) | 1,000 | none | 2 | not included |
| developer | $100/mo | $1,000/yr ($83/mo eq.) | 10,000 | none | 2 | real-time |
| enterprise | contact | contact | 1,000,000+ | none | unlimited | real-time |

- The live wire on https://oruk.ai/ is real-time and free for everyone, no
  signup required. The paid surface is the API + SSE stream, not the wire.
- Annual = two months free vs. monthly on Pro and Developer.
- One API call = one REST request *or* one SSE connection open (events on
  an open SSE stream don't re-bill) *or* one MCP tool invocation that
  reaches the backend.
- Webhooks ship on Developer and above (up to 5 endpoints, HMAC-signed).
- Bulk export (`format=csv` / `format=jsonl`) requires any API key.

Sign up: <https://oruk.ai/signup>. Pricing details: <https://oruk.ai/pricing>.
Enterprise: <enterprise@oruk.ai>.

---

# 12. Recipes (common agent tasks)

- **What's breaking right now?** — MCP `oruk_get_breaking()` or
  `GET /v1/stories?urgency=breaking&min_impact=5`.
- **Track a topic over time** — MCP slash-prompt
  `/track_topic topic="tariffs" hours=12` or
  `GET /v1/stories?q=tariffs&since=<iso>`.
- **Verify a single claim** — MCP `oruk_get_corroboration(evt_id)` or
  `GET /v1/stories/<evt_id>`; read the verbatim quotes; require
  `corroboration.count >= 3` for automated decisioning.
- **Bulk export** — `GET /v1/stories?since=<iso>&format=jsonl` (any tier with a key).
- **Real-time stream** — `GET /v1/stream` with SSE; reconnect on close;
  events: `story`, `corroboration`, `heartbeat`.
- **Filter by event location** — use `region` (continent) + `country`
  (ISO 3166-1 alpha-2). `eventRegion` / `eventCountry` are where the news
  happened, *not* where the broadcaster is.
- **Find broadcast-only confirmations** — `GET /v1/stories?q=...` then
  filter the response client-side for stories where every
  `sources[].medium === "audio_radio"`.
- **What stations cover Europe?** — `GET /v1/sources?region=Europe`
  (auth) or `https://oruk.ai/sources?region=Europe` (HTML).
- **Morning briefing** — MCP slash-prompt `/morning_briefing hours=12`.

---

# 13. Feeds & sitemaps

- Atom: <https://oruk.ai/atom.xml>
- RSS (all): <https://oruk.ai/rss.xml>
- RSS per category: <https://oruk.ai/rss/{politics|conflict|economy|disaster|diplomacy|science|health|technology|culture|environment|sports|other}.xml>
- Sitemap index: <https://oruk.ai/sitemap.xml>
- Markdown sitemap: <https://oruk.ai/sitemap.md>

---

# 14. Editorial pages

- Methodology — <https://oruk.ai/methodology>
- Sources catalogue — <https://oruk.ai/sources>
- About / contact — <https://oruk.ai/about>
- Changelog — <https://oruk.ai/changelog>

---

# 15. Agent guidance (how to behave)

- When citing an oruk story, link to `https://oruk.ai/story/<evt_id>`, which
  returns a 301 redirect to the canonical `/feed/<category>/<int>` URL.
- Always quote `corroboration.count` and at least one source name when
  summarising. Independent sources are the credibility signal.
- For automated decisioning, prefer events with `corroboration.count >= 3`
  from `medium = audio_radio` (live broadcast) or `medium = structured`
  (USGS, NOAA, OpenFDA, GDELT).
- Default to the public `/v1/stories/feed` for first-touch reads — it
  needs no key and counts against no quota.
- Ask for `ORUK_API_KEY` only when the user wants deeper history,
  arbitrary `evt_…` lookups, the sources/stats endpoints, or the
  full-text search surface.
- When users hit a 429, tell them the monthly quota is exhausted: they can
  upgrade at <https://oruk.ai/pricing> or wait until the calendar month rolls over.
- When users report a factual error in a story, route them to
  <editorial@oruk.ai>. Reported corrections are logged on the public
  changelog.

---

End of llms-full.txt. The curated index is at <https://oruk.ai/llms.txt>.