gchat Scaling

Explore rough memory requirements for gchat model and training configuration changes. Type exact values or use the sliders to update the estimate immediately.

Estimated total training memory: 38.66 GiB
Parameters: 833,866,752
Total FLOPs: 35.02 EFLOPs (3.50e+19)
Tokens per step: 32,768
Parameter memory: 1.55 GiB
Gradient memory: 1.55 GiB
AdamW state: 6.21 GiB
Hidden activations: 2.34 GiB
Attention workspace: 27.00 GiB
Training examples per step.
Tokens per example.
Tokenizer vocabulary entries.
Transformer blocks.
Attention query heads.
Key/value heads for GQA.
Hidden dimension.
2 = bf16/fp16, 4 = fp32.
2 = bf16/fp16, 4 = fp32.
Total tokens in the training dataset, in billions.
Chips used by one model replica; remaining chips form data-parallel replicas.
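
The breakdown above can be reproduced with a short sketch. It assumes 2-byte (bf16) parameters and gradients and two fp32 AdamW moment tensors per parameter; these choices are inferred from the displayed figures and are not confirmed by the page. The activation and attention-workspace figures are passed in as given, since the page does not state how it derives them. The function name `training_memory_gib` is hypothetical.

```python
GIB = 2**30  # bytes per GiB


def training_memory_gib(n_params, param_bytes=2, grad_bytes=2,
                        adam_state_bytes=4, activations_gib=0.0,
                        attention_gib=0.0):
    """Rough training-memory breakdown in GiB.

    Assumption: AdamW keeps two state tensors (first and second moment)
    per parameter, each stored with adam_state_bytes per element.
    """
    parameters = n_params * param_bytes / GIB
    gradients = n_params * grad_bytes / GIB
    adamw_state = 2 * n_params * adam_state_bytes / GIB
    total = (parameters + gradients + adamw_state
             + activations_gib + attention_gib)
    return {
        "parameters": parameters,
        "gradients": gradients,
        "adamw_state": adamw_state,
        "hidden_activations": activations_gib,
        "attention_workspace": attention_gib,
        "total": total,
    }


# Activation and attention figures are copied from the panel, not derived.
est = training_memory_gib(833_866_752,
                          activations_gib=2.34,
                          attention_gib=27.00)
for name, value in est.items():
    print(f"{name}: {value:.2f} GiB")
```

Under these assumptions the sketch gives 1.55 GiB for parameters and for gradients, 6.21 GiB of AdamW state, and a 38.66 GiB total, which agrees with the panel.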

TPU HBM Fit Grid

Uses the Data parallelism slider as chips per model replica. Each column shows whether that TPU type has enough HBM across one replica at the selected chip count; red cells either need more chips for a replica or have insufficient per-replica HBM.

Each cell shows the HBM available to one model replica; at the current setting every replica is 1 chip.

| TPU type | HBM per chip | Max chips | Topology | 1 chip | 2 chips | 4 chips | 8 chips | 16 chips | 32 chips |
|---|---|---|---|---|---|---|---|---|---|
| TPU 8i | 288 GB | 1,152 | Boardfly (Inference) | 288 GB | 288 GB | 288 GB | 288 GB | 288 GB | 288 GB |
| TPU 8t | 216 GB | 9,600 | 3D Torus (Training) | 216 GB | 216 GB | 216 GB | 216 GB | 216 GB | 216 GB |
| TPU v7 (Ironwood) | 192 GB | 9,216 | 3D Torus | 192 GB | 192 GB | 192 GB | 192 GB | 192 GB | 192 GB |
| TPU v6e (Trillium) | 32 GB | 256 | 2D Torus | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB |
| TPU v5p | 95 GB | 8,960 | 3D Torus | 95 GB | 95 GB | 95 GB | 95 GB | 95 GB | 95 GB |
| TPU v5e | 16 GB | 256 | 2D Torus | 16 GB | 16 GB | 16 GB | 16 GB | 16 GB | 16 GB |
| TPU v4 | 32 GB | 4,096 | 3D Torus | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB |
| TPU v3 | 32 GB | 2,048 | 2D Torus | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB | 32 GB |
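
The fit check behind the grid can be sketched as a comparison between the estimated training memory and the HBM of one replica. This is a minimal sketch with several assumptions: the per-chip capacities are decimal GB while the estimate is in GiB, memory shards evenly across the chips of a replica, and no headroom is reserved for the runtime. The page's exact rule may differ, and `fits_in_hbm` is a hypothetical name.

```python
GIB = 2**30   # bytes per GiB (unit of the memory estimate)
GB = 10**9    # bytes per GB (assumed unit of the per-chip HBM figures)

# Per-chip HBM capacities as listed in the grid.
HBM_GB_PER_CHIP = {
    "TPU 8i": 288,
    "TPU 8t": 216,
    "TPU v7 (Ironwood)": 192,
    "TPU v6e (Trillium)": 32,
    "TPU v5p": 95,
    "TPU v5e": 16,
    "TPU v4": 32,
    "TPU v3": 32,
}


def fits_in_hbm(required_gib, hbm_gb_per_chip, chips_per_replica=1):
    """Return True if one replica's combined HBM covers the estimate.

    Assumption: memory is sharded evenly across the replica's chips.
    """
    required_bytes = required_gib * GIB
    available_bytes = hbm_gb_per_chip * chips_per_replica * GB
    return required_bytes <= available_bytes


# 38.66 GiB is the estimated total training memory from the panel.
fit = {name: fits_in_hbm(38.66, hbm) for name, hbm in HBM_GB_PER_CHIP.items()}
for name, ok in fit.items():
    print(f"{name}: {'fits' if ok else 'insufficient HBM'}")
```

With 1-chip replicas, this marks the 16 GB and 32 GB chips as insufficient for the 38.66 GiB estimate. Calling `fits_in_hbm(38.66, 32, chips_per_replica=2)` shows how a 2-chip replica would change the result under the even-sharding assumption.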

TPU Training Time Estimate

Estimates training time as 6 × parameters × training tokens, divided by the aggregate BF16 peak FLOPS of the chips in use, scaled by an assumed 50% MFU (model FLOPs utilization).

| TPU type | BF16 peak TFLOPS per chip | 1 chip (1 replica) | 2 chips (2 replicas) | 4 chips (4 replicas) | 8 chips (8 replicas) | 16 chips (16 replicas) | 32 chips (32 replicas) |
|---|---|---|---|---|---|---|---|
| TPU 8i | 5,050 | 3 hr 51 min | 1 hr 56 min | 0 hr 58 min | 0 hr 29 min | 0 hr 14 min | 0 hr 07 min |
| TPU 8t | 6,300 | 3 hr 05 min | 1 hr 33 min | 0 hr 46 min | 0 hr 23 min | 0 hr 12 min | 0 hr 06 min |
| TPU v7 (Ironwood) | 2,307 | 8 hr 26 min | 4 hr 13 min | 2 hr 07 min | 1 hr 03 min | 0 hr 32 min | 0 hr 16 min |
| TPU v6e (Trillium) | 918 | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM |
| TPU v5p | 459 | 42 hr 23 min | 21 hr 12 min | 10 hr 36 min | 5 hr 18 min | 2 hr 39 min | 1 hr 19 min |
| TPU v5e | 197 | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM |
| TPU v4 | 275 | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM |
| TPU v3 | 123 | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM | insufficient HBM |
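
The time estimate described for this table can be sketched as follows. The dataset size of 7 billion tokens is an assumption, inferred from the 35.02 EFLOPs total and the parameter count; the page does not print the slider value. The function names are hypothetical, and rounding to the nearest minute is a guess at how the page formats its cells.

```python
def training_seconds(n_params, n_tokens, peak_tflops, chips=1, mfu=0.5):
    """Estimate wall-clock seconds as 6 * N * D / (chips * peak * MFU)."""
    total_flops = 6 * n_params * n_tokens
    sustained_flops_per_s = chips * peak_tflops * 1e12 * mfu
    return total_flops / sustained_flops_per_s


def format_duration(seconds):
    """Format as 'H hr MM min', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours} hr {minutes:02d} min"


N_PARAMS = 833_866_752
N_TOKENS = 7_000_000_000  # assumed: inferred from the 35.02 EFLOPs total

# Per-chip BF16 peak TFLOPS taken from the table, for chips that fit.
for name, tflops in [("TPU 8i", 5050), ("TPU 8t", 6300),
                     ("TPU v7 (Ironwood)", 2307), ("TPU v5p", 459)]:
    row = [format_duration(training_seconds(N_PARAMS, N_TOKENS, tflops, c))
           for c in (1, 2, 4, 8, 16, 32)]
    print(name, row)
```

The linear speedup with chip count reflects pure data parallelism at a fixed MFU; communication overhead is not modeled.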