The proving ground
for the AI era.
// A practice platform for the skill that actually matters now: judgment over AI-generated system and code design.
AI writes the code.
Judgment is the bottleneck.
The classical coding interview is being defeated in real time. Any candidate with an editor and a model can pass the algorithm screen, the take-home, and most live coding rounds.
What companies actually need to know is whether a candidate can read a thousand lines of model-generated code, spot the system design choice that won't survive scale, find the load-bearing assumption that's wrong, and write a spec the next model won't get wrong again. That's what we test.
Four problem types. One rating.
Find what the model missed.
A subtle bug is hiding in plausible AI-generated code. Spot it, fix it, ship it. Tests will tell you if you're right.
// RAG retriever sorts ascending instead of descending.
Write specs that hold up.
You're given a fuzzy requirement. Author a spec precise enough that three different models all produce a passing implementation.
// Specify a rate limiter that survives clock drift.
Judge the architecture the model shipped.
Read a system design generated by a model. Identify where it breaks at scale, what it got wrong about consistency or cost, and rework the boundaries.
// Webhook delivery system that silently drops on retry storms.
Build a small agent against a behavioral spec.
Wire up tool calls, retries, and stop conditions to satisfy a behavioral contract. We grade behavior, not lines of code.
// Agent that books a meeting through three flaky APIs.
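As a rough illustration (not the actual grading harness), an agent loop that wires together tool calls, retries, and a stop condition might be sketched like this; `call_tool`, the retry budget, and the backoff schedule are hypothetical names for this sketch:

```python
import time

def run_agent(call_tool, steps, max_retries=3, backoff=0.5):
    """Execute a sequence of tool calls, retrying flaky ones.

    call_tool(step) may raise; each step gets up to max_retries
    attempts before the whole run aborts (the stop condition).
    """
    results = []
    for step in steps:
        for attempt in range(max_retries):
            try:
                results.append(call_tool(step))
                break
            except Exception:
                if attempt == max_retries - 1:
                    raise RuntimeError(f"step {step!r} failed after {max_retries} tries")
                time.sleep(backoff * (2 ** attempt))  # exponential backoff
    return results
```

A behavioral grader would assert on the sequence of tool calls and the final state, not on how the loop is written.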
One bug. Four steps.
// Read plausible AI-generated code, find the failing assumption, patch it, and see how the top 1% diagnosed it.
def retrieve_context(query: str, k: int = 5):
    embedding = embed(query)
    results = vector_store.query(embedding, top_k=k)
    docs = [r.document for r in results]
    reranked = sorted(docs, key=lambda d: d.score)
    return reranked[:k]

# downstream:
context = retrieve_context(user_question)
answer = llm.generate(prompt, context)
// 1,247 attempted · 31% caught it · median 4m 12s
Try a sample →
- 01 Read
Skim the code and scenario. The bug is plausible by design.
- 02 Diagnose
Click the suspicious line. Explain the failing assumption in one sentence.
- 03 Fix & submit
Patch it. Hidden tests run against your change.
- 04 Compare
See how the top 1% diagnosed it, and how fast.
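In the sample above, the failing assumption is that `sorted` returns highest-score-first; by default it sorts ascending, so the retriever hands the model the *least* relevant documents. A minimal sketch of the patch, assuming each document carries a numeric `score` where higher means more relevant (as in the sample):

```python
from dataclasses import dataclass

@dataclass
class Doc:
    text: str
    score: float  # higher = more relevant

def rerank(docs, k=5):
    # The bug: sorted() is ascending by default, so the least
    # relevant docs came back first. reverse=True fixes it.
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]

docs = [Doc("a", 0.2), Doc("b", 0.9), Doc("c", 0.5)]
print([d.text for d in rerank(docs, k=2)])  # → ['b', 'c']
```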
The skill stack changed. The test didn't.
AI is the first author.
Most production code and systems are drafted by a model. The valuable engineer is the one who reads, judges, and corrects the design — not the one who types fastest.
The interview signal collapsed.
LeetCode is solved. Take-homes are solved. Live coding is theater. The classical funnel cannot tell a great engineer from a confident prompter.
AI-native hiring has no standard.
There's no Codeforces rating, no Elo, no public ladder for the design judgment that now decides who ships. Wetstone is that standard.
Hiring? We replaced
your take-home.
Wetstone gives you a calibrated signal on AI-generated system and code design in one hour. Custom problems tied to your stack, auto-graded by the same harness our top engineers train against.
See the business offering →
Stop practicing the last war.
// Free forever. Pro unlocks the full library — $19/mo.