Wetstone
Now in private beta

The proving ground
for the AI era.

// A practice platform for the skill that actually matters now: judgment over AI-generated system and code design.

1,247 engineers practicing · 500+ problems shipped · Top 1% = Wetstone Elite
// 01 · The shift

AI writes the code.
Judgment is the bottleneck.

The classical coding interview is being defeated in real time. Any candidate with an editor and a model can pass the algorithm screen, the take-home, and most live coding rounds.

What companies actually need to know is whether a candidate can read a thousand lines of model-generated code, spot the system design choice that won't survive scale, find the load-bearing assumption that's wrong, and write a spec the next model won't get wrong again. That's what we test.

// 02 · What you'll practice

Four problem types. One rating.

BUG-HUNT

Find what the model missed.

A subtle bug is hiding in plausible AI-generated code. Spot it, fix it, ship it. Tests will tell you if you're right.

// RAG retriever sorts ascending instead of descending.

SPEC-WRITE

Write specs that hold up.

You're given a fuzzy requirement. Author a spec precise enough that three different models all produce a passing implementation.

// Specify a rate limiter that survives clock drift.
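For flavor, a passing implementation of the example above would likely lean on a monotonic clock, which wall-clock adjustments (NTP steps, DST) can't move backwards. A minimal sketch, assuming a token-bucket shape — the `rate` and `capacity` names are illustrative, not the graded spec:

```python
import time

class RateLimiter:
    """Token-bucket limiter keyed to time.monotonic(), so system
    clock drift can neither refill nor drain the bucket."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # burst ceiling
        self.tokens = float(capacity)
        self.last = time.monotonic()  # monotonic: immune to clock steps

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed monotonic time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A spec precise enough for three models would pin down exactly these choices: the clock source, the refill formula, and what happens at the burst ceiling.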

SYSTEM-DESIGN

Judge the architecture the model shipped.

Read a system design generated by a model. Identify where it breaks at scale, what it got wrong about consistency or cost, and rework the boundaries.

// Webhook delivery system that silently drops on retry storms.
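The silent-drop failure in the example usually comes from retrying with no terminal path. A minimal sketch of the repair, under assumed names (`deliver` and the in-memory `dead_letter` list are illustrative): capped exponential backoff with jitter, and exhausted events routed somewhere durable instead of vanishing.

```python
import random

def deliver(event, send, max_retries=5, base=0.5, dead_letter=None):
    """Attempt delivery with capped exponential backoff + jitter.
    On exhaustion, park the event in a dead-letter queue rather
    than dropping it silently."""
    for attempt in range(max_retries):
        try:
            send(event)
            return {"delivered": True, "attempts": attempt + 1}
        except ConnectionError:
            # Cap at 30s so a retry storm can't stack unbounded waits;
            # jitter de-synchronizes retrying clients.
            delay = min(30.0, base * 2 ** attempt) * (0.5 + random.random() / 2)
            _ = delay  # in production: sleep(delay) before the next attempt
    if dead_letter is not None:
        dead_letter.append(event)
    return {"delivered": False, "attempts": max_retries}
```

The design judgment being tested is exactly the last four lines: what happens after the final retry.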

BUILD-LOOP

Build a small agent against a behavioral spec.

Wire up tool calls, retries, and stop conditions to satisfy a behavioral contract. We grade behavior, not lines of code.

// Agent that books a meeting through three flaky APIs.
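A minimal sketch of the loop being graded — tool calls in sequence, a per-tool retry budget, and a hard stop condition. The `call_api` stub and dictionary shapes are illustrative, not the platform's harness:

```python
import random

def call_api(name: str) -> dict:
    """Stand-in for a flaky external API: fails ~30% of the time."""
    if random.random() < 0.3:
        raise TimeoutError(f"{name} timed out")
    return {"api": name, "ok": True}

def run_agent(apis, max_attempts: int = 4):
    """Call each tool in order, retry on failure, and stop hard
    once any tool exhausts its retry budget."""
    steps = []
    for name in apis:
        for attempt in range(1, max_attempts + 1):
            try:
                steps.append(call_api(name))
                break
            except TimeoutError:
                if attempt == max_attempts:
                    # Stop condition: budget spent, report where it failed
                    # instead of looping forever.
                    return {"booked": False, "failed_at": name}
    return {"booked": True, "steps": steps}
```

Grading behavior, not lines of code, means the harness only cares that the contract holds: every success path books, every failure path terminates and names the failing tool.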

// 03 · How a problem works

One bug. Four steps.

// Read plausible AI-generated code, find the failing assumption, patch it, and see how the top 1% diagnosed it.

retrieve.py
BUG-HUNT · 002
def retrieve_context(query: str, k: int = 5):
    embedding = embed(query)
    results = vector_store.query(embedding, top_k=k)
    docs = [r.document for r in results]
    reranked = sorted(docs, key=lambda d: d.score)
    return reranked[:k]

# downstream:
context = retrieve_context(user_question)
answer = llm.generate(prompt, context)

// 1,247 attempted · 31% caught it · median 4m 12s

Try a sample →
01 · Read

Skim the code and scenario. The bug is plausible by design.

02 · Diagnose

Click the suspicious line. Explain the failing assumption in one sentence.

03 · Fix & submit

Patch it. Hidden tests run against your change.

04 · Compare

See how the top 1% diagnosed it, and how fast.

// 04 · Why now

The skill stack changed. The test didn't.

01

AI is the first author.

Most production code and systems are drafted by a model. The valuable engineer is the one who reads, judges, and corrects the design — not the one who types fastest.

02

The interview signal collapsed.

LeetCode is solved. Take-homes are solved. Live coding is theater. The classical funnel cannot tell a great engineer from a confident prompter.

03

AI-native hiring has no standard.

There's no Codeforces rating, no Elo, no public ladder for the design judgment that now decides who ships. Wetstone is that standard.

// 05 · For hiring teams

Hiring? We replaced
your take-home.

Wetstone gives you a calibrated signal on a candidate's judgment over AI-generated system and code design in one hour. Custom problems tied to your stack, auto-graded by the same harness our top engineers train against.

See the business offering →
candidates · backend role · live

candidate    rating   score   result
p.shah       2104     92      PASS
alex.kr      1876     78      PASS
j_morales    1542     64      REVIEW
sara.t       1421     51      FAIL
ben.dvk      1389     47      FAIL

Stop practicing the last war.

// Free forever. Pro unlocks the full library — $19/mo.

Wetstone — The proving ground for the AI era