Spec Refinery v1.0 — Engineering Partner Proposal

01 — Approach

From a validated prototype to an owned enterprise v1.0.

Spec Refinery already works. Product managers across Hill's and Col-Pal domains have used it to turn a guided conversation into a full, readiness-scored spec across 17+ sections. The job for v1.0 is not to reinvent that — it's to put enterprise foundations underneath it without losing the product, or the speed, that made it land.

We treat this as a disciplined re-platforming, not a rewrite. The validated product surface stays alive throughout. The POC scaffolding is replaced with foundations a Colgate-Palmolive security review will pass. And the whole build is sequenced around that review and GIT's readiness — planned for from day one, never bolted on at the end.

Preserve the asset, and the speed.

The 17-section discovery, Definition-of-Ready scoring, the stakeholder outreach loop and Knowledge Mode are the validated product — they carry forward intact. The fast PM feedback loop is a requirement, not a nicety: hardening happens underneath the workflow, never on top of it.

Harden underneath, not on top.

Swap POC scaffolding — the password gate, GitHub-as-database, Resend — for enterprise foundations: Okta identity, managed persistence, an audit layer, GCP-aligned hosting. PMs keep the surface they already know; the change is structural, not cosmetic.

Design for the review, from day one.

Identity, audit and data-access controls are built first, not last. The Col-Pal security review becomes a confirmation of how the system already works — not a late scramble. GIT readiness is treated as a dependency we sequence around, not a formality.

Dozens of clients · SOC 2 Type II · permission-scoped RAG, shipped

Why the speed survives

We're not inventing this on your build. The identity-scoped retrieval, audit trails and optimized retrieval systems v1.0 needs are patterns BetterBrain has shipped many times over — including SOC 2 Type II–compliant, permission-scoped RAG for dozens of clients. Enterprise hardening is well-trodden ground for us, which is exactly why it doesn't cost you the iteration speed that made the prototype work.

The re-platforming map

What changes, and what stays.

Every piece of POC scaffolding has a defined enterprise target. The product logic that PMs validated is left untouched. Nothing here is a guess about scope — it is the migration the brief describes, made concrete.

Identity: Netlify password protection → Okta SSO + role-based access
Persistence: GitHub data-branch model → Managed GCP-aligned persistence
Hosting: Netlify serverless functions → GCP-aligned hosting (e.g. Cloud Run)
AI stack: Anthropic SDK called directly → Provider-abstraction layer · Claude default
Outbound: Resend email → Enterprise notification channel
Audit: None today → Full audit & access log
Knowledge Mode: Raw GitHub PR correction loop → Permissioned, audited correction workflow
Knowledge sources: Single reference spec → RAG over Confluence, Jira, GDocs, Snowflake

The validated product — unchanged

Guided 17+ section discovery
Definition-of-Ready scoring
Stakeholder outreach loop
Knowledge Mode capture
React · Vite · Tailwind · shadcn/ui surface

This is the asset. We harden everything around it, not it.

How the hard parts work

Four decisions that carry the build.

Model portability

A swap, not a rebuild.

Spec generation and interview turns call an interface, not a vendor SDK. Claude (Sonnet for generation, Haiku for interview turns) stays the default today. If Hill's confirms the Google direction, Vertex AI / Gemini becomes a configuration-and-evaluation exercise behind that same boundary.

Treats Vertex / Gemini as a likely direction, not a confirmed requirement — exactly as the brief frames it.
De-risks the single largest open question in the scope before it can become a rebuild.
Proven by the model-swap regression suite below, so portability is demonstrable, not asserted.
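
One way to picture the boundary described above is the minimal sketch below. It assumes a hypothetical `TextModel` interface and a stand-in `EchoModel` in place of any real vendor SDK call; the names, the task keys and the registry shape are illustrative assumptions, not the prototype's actual code.

```python
from dataclasses import dataclass
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical boundary that spec generation and interview turns would call."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Stand-in provider used here instead of a real vendor SDK call."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Hypothetical registry: task -> provider. Swapping vendors means editing this
# mapping (and re-running evaluation), not the calling code.
PROVIDERS: dict[str, TextModel] = {
    "generation": EchoModel("claude-sonnet"),
    "interview": EchoModel("claude-haiku"),
}


def run_task(task: str, prompt: str, providers: dict[str, TextModel] = PROVIDERS) -> str:
    """Route a task to whichever provider is configured for it."""
    return providers[task].complete(prompt)
```

Under these assumptions, a move to another provider would be a new entry in the registry, for example `{**PROVIDERS, "generation": EchoModel("gemini")}`, while every caller of `run_task` stays unchanged.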

RAG across sources

Permission-aware retrieval.

The agents draw on domain knowledge spread across Confluence, Jira, Google Docs, Snowflake and whatever else scoping surfaces. Retrieval is built to be access-aware end to end: a user only ever retrieves what their Okta role allows.

Connector design, chunking and freshness strategy fixed at scoping against the real source mix.
A knowledge graph over entities and their relationships — projects, owners, terms — so retrieval can follow links, not just match text.
Agentic retrieval available as an option where one pass isn't enough: the orchestrator plans, runs multiple scoped queries, and assembles the result.
Per-domain knowledge packs let the same engine extend across product domains.
Every generated claim is traceable to its source passage for the review trail.
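
The access-aware idea in the points above can be sketched as follows. This is a toy illustration only: the `Passage` shape, the role names, the sample passages and the word-overlap scoring are all assumptions made for the example, and a real build would use whatever ranker and role model are fixed at scoping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str                    # e.g. "confluence", "snowflake"
    doc_id: str                    # kept so each claim can be traced to its source
    text: str
    allowed_roles: frozenset[str]  # roles permitted to read this passage


def retrieve(query: str, user_roles: set[str], index: list[Passage], k: int = 3) -> list[Passage]:
    """Filter by role before ranking, so disallowed passages never reach the model."""
    visible = [p for p in index if p.allowed_roles & user_roles]
    terms = set(query.lower().split())
    # Toy lexical score (count of shared words); a stand-in for a real ranker.
    scored = [(len(terms & set(p.text.lower().split())), p) for p in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [p for _, p in ranked[:k]]


# Illustrative passages, invented for this example.
INDEX = [
    Passage("confluence", "CONF-1", "pricing roadmap for the portal", frozenset({"pm"})),
    Passage("snowflake", "SNOW-7", "pricing margin table by region", frozenset({"finance"})),
]
```

With this sample index, `retrieve("pricing roadmap", {"pm"}, INDEX)` returns only the `CONF-1` passage: the finance-only passage is filtered out before scoring, and the returned `doc_id` is what a citation would point back to.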

Evaluation

Co-owned, and provable.

A base evaluation harness, co-owned with the Tallwave lead. It is how we keep refactor speed without regressing the product, and how we prove a model swap is safe before it ships.

Golden specs with Definition-of-Ready regression — the product can't silently get worse.
Retrieval-quality checks across the connected sources.
A model-swap regression suite so Claude → Gemini is a measured decision, not a leap.
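
A rough sketch of how a golden-spec regression check might look is given below. The scoring function is a deliberately simple stand-in (share of required sections that are filled in), not the product's actual Definition-of-Ready scoring, and the function names and section names are assumptions for illustration.

```python
def readiness_score(spec: dict[str, str], required: list[str]) -> float:
    """Toy readiness score: share of required sections that are non-empty."""
    filled = sum(1 for name in required if spec.get(name, "").strip())
    return filled / len(required)


def regressions(golden: dict[str, float], candidate: dict[str, float],
                tolerance: float = 0.0) -> list[str]:
    """Return ids of golden specs whose score dropped by more than the tolerance."""
    return sorted(
        spec_id
        for spec_id, baseline in golden.items()
        if candidate.get(spec_id, 0.0) < baseline - tolerance
    )
```

The same comparison could serve both purposes named above: run it after a refactor to catch silent quality loss, or run it with scores produced by a second model to see which golden specs would regress on a swap.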

Speed & the PM loop

Protected as a requirement.

The feedback loop that made the prototype work is treated as a first-class constraint, not a casualty of hardening. Enterprise rigour goes in underneath the product, never across the PM's path.

The enterprise pieces — SOC 2 Type II permission-scoped RAG, audit, identity — are patterns we've shipped for dozens of clients, so they're delivery, not discovery.
A fast iteration surface stays intact; PMs are not buried in enterprise process.
Daily cadence in the pod, under the Tallwave engagement lead's technical direction.
Changes ship behind flags so hardening never stalls the PM loop mid-flight.
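
As a minimal sketch of shipping behind flags, assuming a hypothetical group-based rollout table (the flag name, group names and store labels are invented for the example):

```python
def is_enabled(flag: str, user_groups: set[str], rollout: dict[str, set[str]]) -> bool:
    """A flag is on for a user when any of their groups is in its rollout set."""
    return bool(rollout.get(flag, set()) & user_groups)


def persistence_backend(user_groups: set[str], rollout: dict[str, set[str]]) -> str:
    """Pick the store per request; unflagged users stay on the existing path."""
    if is_enabled("managed_persistence", user_groups, rollout):
        return "managed-store"
    return "legacy-store"
```

With `rollout = {"managed_persistence": {"pilot-pms"}}`, only users in the pilot group reach the new store; everyone else keeps the path they already use until the flag is widened.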

Sequencing

Built around the review, not in spite of it.

Indicative shape, aligned to the brief's timeline. The security-review spine — identity, audit, data-access — goes in first, so the review confirms a system that already behaves correctly. Full sequence and estimate are set together at scoping.

Phase 0 — Joint scoping

Scope, sequence and estimate, set together.

The first paid activity. With workshop transcripts and the current codebase in hand, we fix scope, sequencing and the source mix jointly with Tallwave — no work priced against a frozen spec.

Phase 1 — Identity & audit foundation

The security-review spine first.

Okta SSO and role-based access replace the password gate. The audit and access log and managed persistence go in early, so access control and traceability are load-bearing from the start of the build.
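
To make "load-bearing from the start" concrete, here is a small sketch in which every access decision is logged whether it is allowed or denied. The role names, action names and in-memory log are assumptions for illustration; the real RBAC model is one of the decisions left to Hill's, and a production log would be durable rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditLog:
    """In-memory stand-in for a durable audit and access log."""

    entries: list[dict] = field(default_factory=list)

    def record(self, user: str, action: str, resource: str, allowed: bool) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })


# Hypothetical role -> permitted actions mapping, invented for this example.
ROLE_ACTIONS = {"pm": {"read_spec", "edit_spec"}, "viewer": {"read_spec"}}


def authorize(user: str, role: str, action: str, resource: str, log: AuditLog) -> bool:
    """Check role-based access and log the decision either way."""
    allowed = action in ROLE_ACTIONS.get(role, set())
    log.record(user, action, resource, allowed)
    return allowed
```

Logging denials as well as grants is the design choice worth noting: a reviewer can then see attempted access, not just successful access.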

Phase 2 — Re-platform & portability

Off Netlify and GitHub-as-database.

Move to GCP-aligned hosting and the provider-abstraction layer, with the validated product surface preserved throughout — PMs keep working as the foundation changes underneath them.

Phase 3 — RAG & integrations

Retrieval and the systems of record.

Permission-aware retrieval across Confluence, Jira, Google Docs and Snowflake; read/write integration with Jira, Confluence and GitHub. Coordinated with GIT, who host and bridge — we build to that bridge.

Phase 4 — Eval & clearance

Through the security review, to v1.0.

The evaluation harness lands, and we support the Colgate-Palmolive security review through to clearance. GIT readiness is sequenced as a dependency, not assumed.

02 — Team

People and judgment. The right ones, where it matters.

A small, named pod selected for this build. Abhishek leads as architect — hands-on through delivery, co-owning evaluation with Tallwave. Dima and Darshan are the forward-deployed engineers who carry the product surface, retrieval and RAG patterns BetterBrain has already shipped. Ilona owns QA to the security-reviewed bar a Colgate-Palmolive system requires — with senior specialists brought in as each part of the project needs them.

Day-to-day delivery · engagement, build & quality

Architect-lead · Named lead

Abhishek Bhargava

CMU computer science and computational finance; ex-YC, with a commodities-trading background. Owns the technical approach and co-owns evaluation with the Tallwave lead — a hands-on architect through delivery, and the main point of contact for roadmap and priorities.

CMU CS · ex-YC · Provider-abstraction design · Eval co-owner

Forward-deployed engineer

Dima Kyrychuk

Owns the product surface and UX — the mini-app interfaces and the fast editing workspace that keep the PM loop quick. Fluent in the React / Vite / Tailwind / shadcn stack Spec Refinery already runs on, so iteration speed survives the re-platforming.

Product surface & UX · React / Vite / shadcn · Iteration speed

Forward-deployed engineer

Darshan Vanol

Builds the knowledge layer, retrieval and app backends directly in your environment. Carries BetterBrain's permission-scoped RAG and knowledge-graph patterns from prior builds — paired with Abhishek on retrieval.

Knowledge layer & retrieval · Permission-scoped RAG · App backends

QA Lead

Ilona Litvinova

Integration and regression testing, evals and UX quality. Builds the evaluation harness — golden specs, Definition-of-Ready regression, model-swap regression — so the product and portability are provable to the security bar.

Eval harness · Regression testing · DoR & model-swap evals

Relevant specialists · matched to the work

Alex Brogan

Commercial & exit-strategy · flex

Ex-Goldman investment banking; runs finance-related project initiatives like cost-containment tooling. Flexes in on commercial framing and the business case.

Michael Boyer

CIO advisor · IT governance · flex

Enterprise IT, security and compliance. Flexes in for the Colgate-Palmolive security review, IT governance and GIT coordination as the review ramps.

Retrieval & knowledge layer

Darshan + Abhishek

Security & compliance

Abhishek + Michael

One engagement, many disciplines — depth wherever the build needs it.

Abhishek, Dima and Darshan are the ~3 FTE build core the brief calls for — a named architect-lead and two engineers with production LLM, RAG and integration experience. Ilona owns the quality and evaluation bar; Alex and Michael flex in as the security review, commercial and governance phases surge.

03 — Depth behind the bench

Meet the experts. Built to deploy, not just advise.

Research-grade AI depth meets operators who've shipped in production — across finance, robotics, enterprise IT and energy. We're backed by leading funds, and by individual investors and advisors from the very companies building the models and data platforms Hill's will run on.

Academia & research

Enterprise & finance

Industry & operations

Core capabilities

Strategy

Roadmap & prioritization · use-case discovery · eval design · governance & risk.

Implementation

Knowledge layer & retrieval · workflow orchestration · context-aware agents · proposal & document automation.

Industry expertise

Finance & project finance · legal & compliance · manufacturing & logistics · energy & infrastructure.

BetterBrain · Prepared for Hill's Pet Nutrition

Academic depth and production scars: the same team that builds the foundation ships it in your environment.

04 — Prior work

Comparable builds. Taken through the security bar.

The brief asks for comparable enterprise AI — LLM and RAG applications taken through enterprise security review. Below is a selection of BetterBrain engagements chosen for how directly they map to Spec Refinery v1.0: permission-scoped retrieval, provenance and audit, format-faithful generation, and guided discovery — built for regulated buyers, including a global bank's AI-governance program.

SOC 2 Type II

certified

7-day

POC turnaround

100+

partner network

Selected builds · matched to what v1.0 needs

Permission-scoped RAG

Client-aware knowledge assistant

A chatbot that indexed data separately for each of a client's 100+ customers, with per-customer access control and a dual-pane UI unifying internal and web results — Vespa semantic + lexical indexing across Slack, ClickUp and Google Drive.

Access-controlled retrieval across 100+ customers, with data isolated per customer.

→ v1.0: Okta-scoped, per-domain retrieval

Enterprise · US · Workflow management

Provenance & audit

Source attribution — global bank

Connected every data source for a global bank and traced each AI answer back to the specific document it came from — accurate and traceable rather than hallucinated, built for the bank's AI-governance program.

Audit-grade governance for a regulated buyer — the bar compliance teams actually require.

→ v1.0: Citations, provenance & the audit trail

Enterprise · Global bank · Financial services

Many-source retrieval

Enterprise search across CRM, Slack & Drive

Best-in-class enterprise search over CRM, Airtable, Slack and Google Drive — custom ranking, re-rankers, query expansion and contextual embeddings — to support VC due diligence across heterogeneous sources.

Surfaced long-forgotten information that changed go/no-go decisions.

→ v1.0: Confluence / Jira / Docs / Snowflake retrieval

VC fund · US · Finance

Format-faithful generation

Report generation that learns your format

A system that learns the format and tone of prior reports and drafts new ones from fresh data — holding a consistent structure across every output, with a human in the loop.

On track to save hundreds of finance-team hours a month.

→ v1.0: Spec generation in the Definition-of-Ready format

VC fund · Document automation

Text-to-SQL · self-learning

Self-learning ad-hoc data agent

An agent that writes SQL and Python, plans, reflects and asks clarifying questions — learning from previous queries, human-in-the-loop throughout — over a structured warehouse.

80% less analyst time on ad-hoc requests; 75% faster for non-technical users.

→ v1.0: Snowflake retrieval + the Knowledge Mode loop

Enterprise · US · Insurance

Guided discovery

Automated workflow-discovery interviewer

A structured discovery agent that interviews practitioners, surfaces the workflows worth automating, and produces an opportunity map — discovery that would take weeks of consulting done in days.

Production-shape discovery at scale — weeks of interviews compressed to days.

→ v1.0: The guided discovery interview itself

Cross-vertical · discovery accelerator

Growth-led ROI · what these builds do to a P&L

≈ $9M

annual benefit

Engineering velocity

80% automation of QA cycles — ~$3.6M from freed capacity, plus ~$5M as ~24 engineers redirect to product.

F500 · ~$20B+ revenue

$5M

annual · $2.5–10.1M sensitivity

Revenue retention

3× recall lift on churn (7% → 21%) — about 56K subscribers retained a year.

~2M-subscriber national chain

$3.4M+

annual benefit

Sales conversion

Coaching patterns surfaced at scale — ~$2.4M time freed plus ~$1M from a 3% conversion uplift.

300-rep sales team

$1.7M+

annual · ~90% growth-led

Margin expansion

Cost attribution & automated quoting — margin projected to more than double (4% → 10%).

$25M digital media publisher

Outcomes from specific BetterBrain engagements — evidence of impact, not a projection for this build.

Where we work · selected references

Strategic partnership · F500

Equinix

World's largest datacenter operator. Shared AI tooling for their Global Solutions Architects team, scoping and closing enterprise deals.

Strategic partnership · $1B-funded

Tenstorrent

Leading NVIDIA challenger in AI. Uses BetterBrain's audit capabilities across engagements.

Enterprise · global bank

AI governance & attribution

Source-attribution platform tracing every AI answer to its document across all connected data.

F500 delivery · ~$20B+ revenue

Engineering QA automation

Automated manual QA cycles across the engineering org, freeing skilled engineers for product velocity.

Commercial · EU distributor

Commercial AI assistant

Plain-language queries over live SKU, inventory and customer pricing. In production daily.

Startup / consultancy · US

Blueprint

Natural-language SQL over billions of rows with a self-learning loop — $600K+ revenue in six months.

Why this is the right track record

Every pattern Spec Refinery v1.0 depends on (permission-scoped retrieval, provenance and audit, retrieval across many enterprise sources, format-faithful generation, and guided discovery) is something BetterBrain has already shipped, much of it for regulated buyers and a global bank's AI-governance program, by a SOC 2 Type II–certified team. The hard parts of this build are delivery for us, not discovery.

Drawn from BetterBrain's case studies and reference materials.

05 — Commercial model

A model, not a frozen bid.

The brief is explicit: propose a model, with joint scoping before any fixed number, and share risk rather than pad a bid. That's how we'd price this — a transparent monthly pod rate, sized as a share of the overall program, with the firm number set together at scoping.

$30–40k

per month · the ~3-FTE build pod

~30–40%

of total program value — aligned, not padded

Scope first

joint scoping sets the number, not us

How it works · scope → build → expand

Step 1 — Joint scoping

The first paid activity, set together.

A short, paid scoping sprint with Tallwave — workshop transcripts and the current codebase in hand — that fixes scope, sequence and estimate jointly. No build number is locked before this.

Step 2 — Build pod

Billed monthly · ~$30–40k.

A time-boxed monthly rate for the pod actually deployed — Abhishek leading, Dima and Darshan building, Ilona on QA. Transparent and adjustable as scope firms up, not a frozen fixed bid against a spec we were told not to freeze.

Step 3 — Sized to the engagement

Roughly a third of program value.

Pricing is calibrated as a share — about 30–40% — of the overall program economics, given Tallwave's team and the build's shape, so our incentive is the success of the whole engagement rather than a padded line item. This is the risk-share the brief asks for.

Step 4 — Expansion

Scoped with you, as v1.0 lands.

As the build delivers, we help Tallwave identify and scope areas for expansion — additional product domains, new agents, deeper integrations — priced the same transparent way.

Step 5 — Longer-term partnership

A later conversation, not a precondition.

Once v1.0 is delivered and owned by Hill's, there's room to discuss an ongoing partnership — scale-out across domains, support, new agents. Worth exploring later; this proposal stands on the build alone.

What you can count on

Joint scoping first

No number is locked before scope is set together — the brief's first paid activity is ours too.

A transparent monthly rate

You pay for the pod you can see, billed monthly — not a padded fixed bid against a frozen spec.

Risk-shared & aligned

Priced as a share of program value, so we win when the whole engagement does — the commercial fit Colgate is looking for.

Owned, with no lock-in

Hill's owns v1.0 outright — any cloud, any model provider, nothing held hostage.

In short

A ~$30–40k/month pod for the build phase, sized as roughly a third of the overall program and structured to share risk — with the firm number set jointly at scoping, expansion scoped as we go, and a longer-term partnership open to discuss once Hill's owns v1.0.

Final figures and billing cadence confirmed at the joint scoping session.

06 — Assumptions & risks

What we'd need, and where the risks are.

The brief asks what we'd need from each party and where the risks sit. Here's what we're assuming, what we'd need from Tallwave, Hill's and GIT, and the main risks — each paired with how the approach already handles it.

Assumptions we're building on

Joint scoping is the first paid activity, and we get the workshop transcripts and current Spec Refinery codebase at kickoff — scope, sequence and estimate set together, not pre-frozen.
Tallwave holds product direction, adoption and the Hill's PM relationship; we build under that direction and co-own evaluation with the Tallwave lead.
GIT hosts and provides the integration bridge but does not build; we build to that bridge rather than handing build work to GIT.
Vertex / Gemini is a likely direction to design for, not a confirmed day-one rewrite; the model decision is Hill's to make.

What we'd need · from each party

From Tallwave

Workshop transcripts and the current codebase at scoping, plus POC deployment details — Netlify, the GitHub data-branch store, Resend, API tokens — for a clean migration.
A single product-decision contact, and the daily-cadence engagement lead who supervises technical direction.
Convening the Hill's PMs and owning adoption and the PM relationship.

From Hill's

Named PMs with committed time for the discovery and feedback loop — the speed depends on it.
The decisions only Hill's can make: the Vertex / Gemini direction, the in-scope knowledge sources and their priority, and the roles / RBAC model.
Scoped access to (or representative samples of) Confluence, Jira, Google Docs and Snowflake for RAG design.
A security-review contact, the review requirements early, and clarity on who owns sign-off.

From GIT

A committed readiness timeline for GCP-aligned hosting and the integration bridge — the hard sequencing dependency.
Okta / SSO configuration and RBAC integration support.
Access to the target GCP (or staging) environment and the Jira / Confluence / GitHub endpoints.
The security-review process and the specific controls required, up front.

Main risks · ranked, with mitigations

Security review timing & scope

High

The Colgate-Palmolive review governs what ships and when; requirements surfaced late mean rework.

Mitigation: Get the checklist early, build identity, audit and data-access first, and treat the review as a dependency rather than a formality.

GIT readiness

High

If hosting or the integration bridge slips, deployment and integration stall.

Mitigation: Sequence around it, secure a committed timeline, and build to interfaces with stubs and staging so the critical path isn't blocked while we wait.

Model portability / Gemini uncertainty

Medium

A mid-build mandate to move to Vertex / Gemini, or a lingering decision. Worth naming: model parity on spec-generation and interview quality isn't guaranteed, so a swap may need prompt and tuning work.

Mitigation: A provider-abstraction layer from day one plus a model-swap eval suite (config and eval, not a rebuild), so the eval catches any regression before it ships.

Knowledge-source access & open-ended scope

Medium

Access, permissions and data quality across Confluence, Jira, Docs and Snowflake — plus the brief's open-ended “any other source” — can expand RAG scope.

Mitigation: Fix the source mix and priority at scoping, use permission-aware retrieval, and phase connectors.

Preserving speed under enterprise process

Medium

Hardening and review can quietly throttle the PM loop the brief calls a requirement.

Mitigation: Keep a fast iteration surface, ship behind flags, and don't route PMs through enterprise process.

PM availability

Medium

If Hill's PMs aren't reliably in the loop, validation and velocity slip.

Mitigation: Named PMs and a committed cadence agreed at scoping.

Three-party boundary drift & scope creep

Medium

Unclear ownership or handoff friction across Tallwave, Hill's and GIT.

Mitigation: Joint scoping fixes scope, a clear RACI assigns ownership, and the GIT boundary stays explicit (host and bridge, not builder).

Data sensitivity & compliance

High

Colgate-Palmolive data handling and PII across connected sources.

Mitigation: A PII and sensitivity guard, RBAC-scoped retrieval, full audit, and no client data used to train external models.
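
As a very small illustration of what a PII guard could do before text reaches a model or an index, the sketch below redacts email addresses only. The pattern, placeholder and function name are assumptions for the example; a real guard would cover more identifier types and would be defined against the security review's requirements.

```python
import re

# Simple email pattern for illustration; not a complete PII detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> tuple[str, int]:
    """Replace email addresses with a placeholder and report how many were found."""
    return EMAIL.subn("[REDACTED_EMAIL]", text)
```

Returning the count alongside the redacted text lets the caller write the redaction into the audit trail without storing the sensitive value itself.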

In short

None of these is a surprise. The two biggest — the security review and GIT readiness — are exactly why the approach builds identity and audit first and sequences around GIT. Two lighter items we'd raise live rather than belabor here: decision latency across three parties, mitigated with a single decision contact per party; and that the brief's indicative timeline stays contingent on scoping and GIT readiness.
