Apr 28 · 02:00
7103cdb
Source Mode — NotebookLM-equivalent bounded answers
Pin files, URLs, or pasted text per session. Toggle Source Mode on. The twelve verifiers must answer ONLY from the pinned sources, cite [SOURCE N] per claim, and refuse if the answer isn’t present. Constitutional Tribune’s veto power enforces it across the quorum.
Backend in src/sources/store.js (JSONL-backed, 1 MB/source, 64 sources/session). Endpoints: POST /api/sources, POST /api/sources/mode, GET /api/sources?session=<id>. Pipeline integration in src/agent/pipeline.js — sources prepended to user input with hard system constraint.
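A minimal sketch of the "sources prepended with a hard system constraint" step, not the actual code in src/agent/pipeline.js. The helper names (validateSources, buildSourceModePrompt), the { label, text } source shape, the constraint wording, and reading 1 MB as 1024 × 1024 bytes are all assumptions made for illustration.

```javascript
// Limits quoted in the entry above. Treating 1 MB as 1024 * 1024 bytes is an assumption.
const MAX_SOURCE_BYTES = 1024 * 1024;
const MAX_SOURCES_PER_SESSION = 64;

// Hypothetical helper: rejects a pinned-source list that exceeds the quoted limits.
function validateSources(sources) {
  if (sources.length > MAX_SOURCES_PER_SESSION) {
    throw new Error(`too many sources: ${sources.length} > ${MAX_SOURCES_PER_SESSION}`);
  }
  sources.forEach((src, i) => {
    const bytes = Buffer.byteLength(src.text, "utf8");
    if (bytes > MAX_SOURCE_BYTES) {
      throw new Error(`source ${i + 1} is ${bytes} bytes, over the per-source limit`);
    }
  });
}

// Hypothetical helper: numbers each source so answers can cite [SOURCE N],
// then places the sources ahead of the user's question.
function buildSourceModePrompt(sources, userInput) {
  validateSources(sources);
  const system =
    "Answer ONLY from the sources below. Cite [SOURCE N] for every claim. " +
    "If the answer is not present in the sources, refuse.";
  const blocks = sources.map((src, i) => `[SOURCE ${i + 1}] ${src.label}\n${src.text}`);
  return { system, user: `${blocks.join("\n\n")}\n\nQuestion: ${userInput}` };
}
```

Numbering sources in pin order keeps each [SOURCE N] citation stable for the whole session, provided the pinned list does not change between requests.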
UI: slide-out panel from the agent header, file/URL/text picker, mode toggle, source list with preview.
Shipped · Backend · UI · Marketing
Apr 28 · 01:30
7103cdb
The cost dial — four tiers, same architecture
User-selectable power level for the 12-persona ensemble. Same governance, same provenance, same verification — just with a different model lineup underneath. Switch via POST /api/cost-tier, dropdown in the agent UI, or per-request override.
- Zero · $0/day forever · Cerebras free tier (Qwen 235B / GPT‑OSS / GLM‑4.7) + Ollama + Moonshot/OpenRouter/Mistral/Together free tiers
- Low · $0.05–0.10/day · Haiku 4.6 + Gemini Flash + free OSS
- Medium · $0.20–0.60/day · current default mix
- Full · $5–15/day · Opus 4.7 + Grok 4 + DeepSeek V4 Pro + Qwen 3.5 397B + Gemini 2.5 Pro + Kimi K2 Thinking + GPT‑4o + Mistral Large: eight distinct labs voting per answer
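The selection order could be sketched as below. The tier names and daily cost ranges come from the list above; the resolveTier helper, the rule that a per-request override beats the session setting, and the "medium" fallback are assumptions, not the behaviour of POST /api/cost-tier.

```javascript
// Tier names and daily cost ranges (USD) as listed in the entry.
const COST_TIERS = {
  zero: { minUsdPerDay: 0, maxUsdPerDay: 0 },
  low: { minUsdPerDay: 0.05, maxUsdPerDay: 0.1 },
  medium: { minUsdPerDay: 0.2, maxUsdPerDay: 0.6 },
  full: { minUsdPerDay: 5, maxUsdPerDay: 15 },
};

// Hypothetical helper. Assumed precedence: per-request override first,
// then the session-level setting, then the fallback tier.
function resolveTier(requestTier, sessionTier, fallback = "medium") {
  const chosen = String(requestTier ?? sessionTier ?? fallback).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(COST_TIERS, chosen)) {
    throw new Error(`unknown cost tier: ${chosen}`);
  }
  return chosen;
}

console.log(resolveTier("Full", "low")); // per-request override wins: "full"
console.log(resolveTier(undefined, "zero")); // session setting: "zero"
```

Rejecting unknown tier names, instead of silently using the fallback, avoids a typo quietly routing a request to a more expensive lineup.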
Shipped · Architecture · Marketing
Apr 28 · 01:00
7103cdb
Router fallback diversity + circuit-breaker reset endpoint
When a provider tripped its circuit breaker (Cerebras 429s during a bench, OpenAI rate limits), every persona was falling back to the same Sonnet 4.6. Quorum collapsed onto one model under stress.
- Fallback diversity pool: rotating across Haiku 4.6, Gemini Flash, Sonnet 4.5, Sonnet 4.6 so the 12-persona quorum keeps real diversity even under provider stress.
- Circuit-breaker bug fix: the breaker used to reset 5 min after the most recent failure, so under bench load every retry restarted the timer and the breaker stayed open forever. It now measures from openedAt.
- Admin reset endpoint: POST /api/admin/router/reset-circuits clears all breakers without bridge restart.
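The breaker fix and the rotating fallback pool could look roughly like this. It is a sketch under stated assumptions, not the router's code: the CircuitBreaker class, makeFallbackPicker, and tripping on the first recorded failure are hypothetical simplifications. Only the 5-minute window, the openedAt field, and the four pool members come from the entry.

```javascript
const RESET_AFTER_MS = 5 * 60 * 1000; // 5-minute window quoted in the entry

// Hypothetical breaker. Assumption: it trips on the first recorded failure.
class CircuitBreaker {
  constructor(resetAfterMs = RESET_AFTER_MS) {
    this.resetAfterMs = resetAfterMs;
    this.openedAt = null; // timestamp of the trip; null while closed
  }

  // Only the failure that opens the breaker sets openedAt, so retries that
  // fail while it is open cannot push the reset time further out.
  recordFailure(now) {
    if (this.openedAt === null) this.openedAt = now;
  }

  isOpen(now) {
    if (this.openedAt === null) return false;
    if (now - this.openedAt >= this.resetAfterMs) {
      this.openedAt = null; // window measured from openedAt has elapsed
      return false;
    }
    return true;
  }

  // What an admin reset would do for a single breaker.
  reset() {
    this.openedAt = null;
  }
}

// Fallback models named in the entry, handed out round-robin so that
// consecutive personas do not all land on the same model.
const FALLBACK_POOL = ["Haiku 4.6", "Gemini Flash", "Sonnet 4.5", "Sonnet 4.6"];

function makeFallbackPicker(pool = FALLBACK_POOL) {
  let next = 0;
  return () => pool[next++ % pool.length];
}
```

Timestamps are passed in as arguments, not read from the clock, so the timing behaviour can be checked without waiting five minutes.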
Fix · Reliability
Apr 28 · 00:45
03a6ee6
MBPP+ measured on Krentix — 81.5% pass@1
Full 378-problem EvalPlus MBPP+ run, medium tier. 308 of 378 passed all hidden unit tests in a fresh Python subprocess. 98.7-minute wall time. Of the 70 failures, 69 were genuine full-pipeline ensemble misses and 1 was a routing bug. Cleaner signal than HumanEval, where most failures were routing bugs.
Harness, dataset, and per-problem result file (with stderr tail on every failure) in github.com/joelrobic-gif/krentix-landing/bench/mbpp. Clone it and run it; your numbers should match within sampling variance.
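For reference, the headline figure is plain arithmetic on the counts above. The sketch assumes one attempt per problem, so pass@1 reduces to passed / total; passAt1 is a name chosen here, not a function from the harness.

```javascript
// pass@1 with a single attempt per problem is the fraction of problems solved.
function passAt1(passed, total) {
  if (!Number.isInteger(passed) || !Number.isInteger(total) || total <= 0 || passed < 0 || passed > total) {
    throw new RangeError(`invalid counts: ${passed} of ${total}`);
  }
  return passed / total;
}

const mbppPlus = passAt1(308, 378);
console.log(`${(mbppPlus * 100).toFixed(1)}%`); // 81.5%
console.log(378 - 308); // 70 failures, matching 69 ensemble misses + 1 routing bug
```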
Measured · Public harness
Apr 27 · 22:30
677f132
HumanEval measured on Krentix — 89.0% pass@1
Full 164-problem OpenAI HumanEval run, medium tier. 146 of 164 passed the dataset’s hidden unit tests. 40.7-minute wall time. 18 failures, of which 12 were Krentix routing-layer bugs (speed_path / instinct / math_engine misroutes) and only 4 were real verification ensemble failures.
First public-dataset Krentix score. Harness, dataset, per-problem JSON in github.com/…/bench/humaneval.
Measured · Public harness
Apr 27 · 22:00
9776d76
20-benchmark reference table at /benchmarks
Standing comparison across the 20 most-cited public AI benchmarks. Every cell shows the model’s score as published by the lab or by Artificial Analysis, with footnote citations linking back to the primary source. Krentix appears in every row — gold + score where measured (HumanEval, MBPP+), dashed with explicit reason where not (gated, vision-only, harness pending).
Shipped · Marketing
Apr 27 · 20:30
3345d71
Marketing landing — brand and copy lock-in
New design language: oak / gold / Fraunces. Hero now reads «Twelve models. Every frontier. You move the dial.» Removed the pre-existing implausible «100% HLE vs 56.8% Mythos» comparator (Mythos wasn’t a real benchmark; it was a placeholder name from internal planning docs).
- §01 Thesis · §02 Mechanism · §03 Positioning
- §04 Benchmarks · §05 Energy & cost
- §06 The Dial (cost selector) · §07 Source mode
Animation framework: count-up numbers, scroll-reveals, table-row stagger, magnetic CTA, hero variable-font cursor proximity. Full prefers-reduced-motion + noscript fallback.
Shipped · Marketing
Apr 27 · 17:30
a4fdf9a
www.krentix.com live on HTTPS
GitHub Pages deployment, custom CNAME, Let’s Encrypt cert covering both apex and www. Initial provisioning stalled in the null state, so re-issuance was forced by toggling the CNAME via the API. The cert then moved new → authorization_pending → approved in ~90 seconds. https_enforced=true.
Shipped · Infra