INFERENCE · INCORPORATED · EST 2026
NVIDIA A100
HBM 80GB · $8k · workhorse
NVIDIA H100
HBM 80GB · $25k · premium
30 DAY RUN
1 datacenter · don't go bankrupt · don't poison the town
> WHAT IS THIS?
You are running a small AI datacenter.
Every time someone asks an AI a question - a coding question, a search, an image generation - a computer somewhere actually does the work. That computer is a GPU (graphics processing unit), and one full "ask" run through the model is called an inference.
Your job: own enough GPUs to handle the day's inference demand, price them right, and keep the power cheap without poisoning the town.
The prompt is broken into tokens (~750 words per 1000 tokens).
The GPU loads the model (tens to hundreds of GB) into its HBM memory.
It runs a forward pass - billions of matrix multiplications.
It streams tokens back to the user, one at a time.
That single response = roughly 1 "inference" in this game.
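The words-to-tokens rule of thumb above can be sketched in a few lines. This is only an estimate: real tokenizers vary by model and language, and the function name here is hypothetical.

```python
WORDS_PER_1000_TOKENS = 750  # rule of thumb quoted in the text above

def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for English text; real tokenizers will differ."""
    return round(word_count * 1000 / WORDS_PER_1000_TOKENS)

print(estimate_tokens(150))  # a 150-word prompt is about 200 tokens
```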
Why the tiers matter:
A100 - older but cheap and plentiful. Can't fit the biggest models. Good for fine-tuned 7B-class workloads.
H100 - current production standard. Most paying customers will only sign for H100-or-better.
B200 - newest, fastest. Frontier-research workloads need these or they go elsewhere.
The catch: GPUs burn a lot of electricity. A rack of B200s pulls more power than a small neighborhood. Where you get that power is the real game - and the regulators are watching.
Hit BACK and pick a difficulty. Easy = short campaign with more starting cash; Hard = 60 days, you're under-capitalized.
> LEADERBOARD - TOP 100
> HOW TO PLAY
You run an AI compute datacenter for 30 days. Make money. Don't go bankrupt. Don't poison the town.
Each day:
Read the news ticker. Model releases drive demand spikes.
Buy GPUs (click +A100 / +H100 / +B200). Each one fills a slot on the floor.
Pick power: diesel (cheap, but smoke fines), grid (mid-priced, but blackouts), solar (clean and cheapest per kWh, but the contract must be signed by day 5).
Set your $/inference. Charge too much, customers leave. Too little, you bleed.
Hit RUN DAY. Watch your floor run. Adjust tomorrow.
Score = ending cash + (reputation × $1k) − fines paid. Top of the leaderboard wins bragging rights.
Tip: premium customers (H100 / B200 buyers) won't sign if reputation drops below 30. Stay clean enough for them to take your calls.
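As a sketch, the score formula and the reputation tip above work out like this. The function names are hypothetical, and treating a reputation of exactly 30 as "still signs" is an assumption based on the wording "drops below 30".

```python
def final_score(ending_cash: float, reputation: float, fines_paid: float) -> float:
    """Score = ending cash + (reputation x $1k) - fines paid."""
    return ending_cash + reputation * 1_000 - fines_paid

def premium_will_sign(reputation: float, threshold: float = 30) -> bool:
    """Premium (H100 / B200) customers walk once reputation drops below 30."""
    return reputation >= threshold

# Example run: $120k cash, reputation 45, $8k in fines.
print(final_score(120_000, 45, 8_000))  # 157000
print(premium_will_sign(29))            # False
```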
Keyboard:
Space / Enter run the day
A / H / B buy A100 / H100 / B200 (hold Shift to sell)
1 / 2 / 3 switch fuel: diesel / grid / solar
↑↓ nudge price $/inf by ±$0.50
X clear cart · M marketing · S sound
L leaderboard · ? this screen · Esc close overlays
> ACADEMY
Six lessons. Each one teaches a piece of the real AI-compute economy
you're playing. You don't have to read these to win - but if you want
to know why an H100 costs $32,000 and what an "NVL72 rack" actually is,
start at the top.
01 Why GPU memory is the expensive part (HBM)
Imagine a checkout clerk who can scan 100 items per minute. The
customer hands items over one at a time, slowly. The clerk's speed
doesn't matter - the line moves at the customer's pace.
A GPU's compute units are the clerk. They can do trillions of math
operations per second. But the math operates on data - model
weights, in-flight conversation state - and that data lives in memory.
Normal computer memory (DDR) delivers ~100 GB/s. GPU compute can eat
~10 TB/s when running flat out. So the bottleneck is memory bandwidth,
not raw math.
HBM (High Bandwidth Memory) is the fix: stacks of DRAM dies
sitting right next to the GPU chip on the same interposer, wired with
a wide parallel bus. A B200 has ~8 TB/s of HBM bandwidth, about 80×
normal RAM. That's why HBM costs 5× normal DRAM and why a 192 GB B200
is so expensive: the memory is the expensive part.
In this game: each tier's
capacity (A100 280 / H100 1100 / B200 1700 inf/day) is
set mostly by HBM size and bandwidth, not raw FLOPS. When you see
"KV cache full" in a news event, the HBM ran out of room for in-flight
conversations.
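A minimal sketch of how the per-tier capacities above add up to a fleet's daily ceiling. The dictionary and function names are illustrative, not the game's actual code.

```python
# Per-GPU daily capacity quoted above (inferences per day).
CAPACITY_INF_PER_DAY = {"A100": 280, "H100": 1100, "B200": 1700}

def fleet_capacity(fleet: dict[str, int]) -> int:
    """Total inferences per day for a fleet given as {tier: GPU count}."""
    return sum(CAPACITY_INF_PER_DAY[tier] * count for tier, count in fleet.items())

# 4 A100s + 2 H100s + 1 B200 = 4*280 + 2*1100 + 1700
print(fleet_capacity({"A100": 4, "H100": 2, "B200": 1}))  # 5020
```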
02 PUE: why the AC eats a third of your power bill
Run a 700 W GPU. It dumps 700 W of heat into the room. In a server
closet, that heat piles up fast. You need cooling - fans, chillers,
sometimes water loops - and the cooling itself burns electricity.
PUE (Power Usage Effectiveness) is the ratio: (total facility
electricity) / (electricity actually doing compute). PUE 1.0 would be
magic: cooling, lights, networking all cost zero. In reality:
Iceland pulling outside air straight in: PUE 1.05-1.12 - basically free cooling.
Mild climate, decent build: PUE 1.20.
Texas in August, air-cooled: PUE 1.40-1.50.
A PUE of 1.45 means for every dollar of GPU electricity, you spend
another 45 cents on cooling and other overhead. That's why Iceland,
Quebec, Wyoming, and northern Sweden win on operating margin - and
Phoenix loses.
In this game: each region carries a
basePUE (TX 1.45, CA 1.20, IS 1.05). Your daily power bill
is fleet kWh × PUE × $/kWh. When a heatwave hits the news ticker, PUE
jumps another 0.35 and your cooling kWh starts bleeding you out.
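The power-bill formula above, sketched with the regional PUE values and the heatwave bump from the text. The fleet kWh and $/kWh figures in the example are made up for illustration, and the function name is hypothetical.

```python
BASE_PUE = {"TX": 1.45, "CA": 1.20, "IS": 1.05}  # per-region values from the text
HEATWAVE_PUE_BUMP = 0.35                          # added on heatwave days

def daily_power_bill(fleet_kwh: float, region: str, price_per_kwh: float,
                     heatwave: bool = False) -> float:
    """Daily bill = fleet kWh x PUE x $/kWh."""
    pue = BASE_PUE[region] + (HEATWAVE_PUE_BUMP if heatwave else 0.0)
    return fleet_kwh * pue * price_per_kwh

# Hypothetical day: 1,000 kWh of GPU load at $0.10/kWh.
print(f"{daily_power_bill(1000, 'TX', 0.10):.2f}")                 # 145.00
print(f"{daily_power_bill(1000, 'TX', 0.10, heatwave=True):.2f}")  # 180.00
print(f"{daily_power_bill(1000, 'IS', 0.10):.2f}")                 # 105.00
```

Same GPUs, same load: the Texas heatwave day costs about 70% more than the Iceland day, all of it overhead.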
03 Inference is two different jobs: prefill + decode
When a user asks an LLM "explain quantum entanglement," two phases
run back to back.
Prefill: the model reads the prompt (say 200 tokens) all at
once, in parallel. Every prompt token is already known up front,
so they don't have to be processed one at a time. This phase is
compute-bound - lots of math for every byte of weights read, and
it batches well across users.
Decode: the model writes the answer one token at a time.
Token N+1 needs tokens 1...N to already exist, so there is no
parallelism across the tokens of a single answer. Each new token
requires re-reading all the model's weights out of HBM.
This phase is memory-bandwidth-bound - the GPU's math units
sit idle waiting on memory.
So a GPU's real "tokens per second" is much lower than its
theoretical FLOPS would suggest. Long answers spend most of their
time in decode, capped by HBM bandwidth, not compute.
In this game: when a news card says
"long-context decode is memory-bound" (day 24), it's telling you the
H100 hits its HBM ceiling before its math ceiling. The B200 (bigger
HBM, faster bus) handles it; the H100 stalls.
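A back-of-envelope sketch of the decode ceiling: if every new token re-reads all the weights once, tokens per second for one conversation cannot exceed HBM bandwidth divided by model size. The 40 GB model size is a hypothetical round number, the H100 bandwidth (~3,350 GB/s) is an assumed figure not given in the text, and the estimate ignores KV cache reads and batching.

```python
def decode_ceiling_tokens_per_sec(hbm_bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound for one sequence: each new token re-reads all weights once."""
    return hbm_bandwidth_gb_s / weights_gb

WEIGHTS_GB = 40  # hypothetical model footprint in HBM, chosen for round numbers

# Bandwidths in GB/s: the B200 figure is from lesson 01; the H100 figure is assumed.
for name, bandwidth in {"H100": 3350, "B200": 8000}.items():
    ceiling = decode_ceiling_tokens_per_sec(bandwidth, WEIGHTS_GB)
    print(f"{name}: at most {ceiling:.0f} tokens/s per conversation")
```

Note that FLOPS never appears in the formula, which is the point of the lesson.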
04 KV cache: why long conversations get expensive
To write token N+1, the model has to "remember" what tokens 1...N
said. Re-deriving that from scratch every step would be 1000× too
slow. So during decode the model keeps a running scratch-pad called
the KV cache (keys and values from the attention mechanism).
It lives in HBM, next to the model weights.
Rough size: ~0.5 MB of KV cache per token per active conversation
for a 70 B model. A single 1 M-token conversation eats ~500 GB of HBM
by itself. You cannot fit that on one H100 (80 GB HBM). Your options:
Spill the cache to slower memory (slow).
Evict another user's session (drops their chat).
Shard the cache across more GPUs (only practical over a fast GPU-to-GPU link such as NVLink).
That's why 1 M-context windows are economically painful: each
long-context user hogs the HBM that would otherwise serve 30 short
users.
In this game: when "Opus 5 ships
1 M-context" hits the news, your H100 capacity is halved that day.
The fix in real life is more/bigger HBM (move to B200) or a connected
cluster (NVL72).
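The KV cache arithmetic above, as a quick sketch. It uses the ~0.5 MB per token figure from the text and decimal units (1 GB = 1000 MB); the 40 GB of free HBM in the example is a made-up number standing in for whatever is left after the weights are loaded.

```python
KV_MB_PER_TOKEN = 0.5  # rough figure for a 70 B model, from the text

def kv_cache_gb(context_tokens: int) -> float:
    """HBM needed for one conversation's KV cache, in GB."""
    return context_tokens * KV_MB_PER_TOKEN / 1000

def conversations_that_fit(free_hbm_gb: float, context_tokens: int) -> int:
    """How many conversations of this length fit in the free HBM."""
    return int(free_hbm_gb // kv_cache_gb(context_tokens))

print(kv_cache_gb(1_000_000))                 # 500.0 GB for one 1 M-token chat
print(conversations_that_fit(40, 8_000))      # 10 short chats in 40 GB
print(conversations_that_fit(40, 1_000_000))  # 0: it does not fit at all
```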
05 Cluster not card: NVLink and NVL72
Frontier models (GPT-6, Opus 5-class, ~1 trillion parameters) don't
fit on one GPU. A B200 has 192 GB HBM; trillion-param weights are
400-800 GB depending on how aggressively you quantize. So you
shard: split the weights across 8 GPUs, each holding 1/8 of
the model.
The catch: on every forward pass, those 8 GPUs have to swap
partial answers with each other. Over normal PCIe (~64 GB/s) they
spend more time waiting on the bus than computing. The cluster
is then slower than a single big GPU would be, if a single big GPU
existed.
NVIDIA's answer is NVLink: a direct GPU-to-GPU bus at
~900 GB/s, about 14× PCIe. NVSwitch ties many NVLink lanes
into a mesh. Bundle 72 B200s in one rack with NVSwitch fabric and
you get an NVL72: 72 GPUs behaving like one logical GPU with
13.8 TB of HBM and ~130 PFLOPS. Costs ~$3M per rack and pulls
~120 kW - more than a small neighborhood draws.
In this game: pre-day 15, your B200s
are isolated cards (1700 inf/day each). Day 15 the OEM ships NVL72
and your B200s become 2200 inf/day each. Same chips, new topology,
frontier workloads now run coherently.
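The sharding numbers above can be checked with a short sketch. The helper names are hypothetical. The GPU count it returns is a weights-only lower bound, since real deployments also need HBM for KV cache, which is one reason to shard wider than the minimum. Reading "Day 15" as "day 15 onward" is an assumption.

```python
import math

B200_HBM_GB = 192  # per-GPU HBM from the text

def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Model weight footprint in GB at a given quantization."""
    return params_billions * bits_per_param / 8

def min_gpus_for_weights(params_billions: float, bits_per_param: int,
                         hbm_gb: float = B200_HBM_GB) -> int:
    """Fewest GPUs whose combined HBM holds the weights alone."""
    return math.ceil(weights_gb(params_billions, bits_per_param) / hbm_gb)

def b200_capacity(day: int) -> int:
    """In-game B200 capacity: 1700 inf/day, 2200 once NVL72 ships on day 15."""
    return 2200 if day >= 15 else 1700

print(weights_gb(1000, 4))            # 500.0 GB for 1 T params at 4 bits
print(min_gpus_for_weights(1000, 4))  # 3
print(72 * B200_HBM_GB / 1000)        # 13.824 TB of HBM in one NVL72
print(round(900 / 64, 1))             # 14.1, NVLink vs PCIe bandwidth ratio
```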
06 Are we in a bubble?
In 2024-2026, the hyperscalers (Microsoft, Meta, Google, Amazon,
Oracle, OpenAI's Stargate consortium) committed roughly $500 B
of new datacenter buildout. Annual AI capex is now 2-3× the entire
1999 telecom buildout, in real dollars. Every quarter brings a bigger
number.
The bull case: inference demand is growing exponentially.
Every white-collar workflow eventually has an agent. Every search
query becomes an LLM query. Capacity built today is full by the time
it's online.
The bear case: model improvement is plateauing (the
"frontier wall"). DeepSeek-style efficiency shocks make models 10×
cheaper to run, blunting demand growth. Stargate alone could glut
the market by 2027.
Truth: nobody knows. The capex is real, the demand is real, the
question is timing. If you want to track it, SemiAnalysis runs the
spreadsheet.
In this game: the day-28 finale is a
coin flip. Heads = demand 3× (bull peak). Tails = compute glut,
resale -40% (bear correction). Same fundamentals, different macro
outcome - which is exactly the actual debate.