DLRM - AshokBhat/ml GitHub Wiki

About

DLRMv1 vs DLRMv2

| Feature/Aspect | DLRMv1 | DLRMv2 |
| --- | --- | --- |
| Use Case Representation | | More realistic large-scale recommender system benchmark |
| Model Architecture | Uses an interactions layer to combine features | Replaces interactions layer with a three-layer DCNv2 cross network |
| Input Encoding | One-hot categorical inputs | Multi-hot categorical inputs |
| Dataset | Smaller, less complex dataset | Larger, more complex dataset |
| Hardware Utilization | Less demanding on memory bandwidth | Requires higher memory bandwidth |
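
The architecture and input-encoding rows can be made concrete with a small sketch. It is a plain-Python illustration, not the benchmark's reference code: it uses the full-rank cross layer from the DCN V2 formulation, `x_{l+1} = x_0 * (W x_l + b) + x_l`, and assumes sum pooling for multi-hot lookups. The function names, the tiny dimensions, and the random weights are all hypothetical; the real model's sizes and any low-rank factorization are not covered on this page.

```python
import random


def pool_embeddings(table, indices):
    """Sum the embedding rows selected by `indices`.

    One index corresponds to a one-hot input, several indices to a
    multi-hot input (sum pooling is an assumption of this sketch).
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:
        for d in range(dim):
            pooled[d] += table[idx][d]
    return pooled


def cross_layer(x0, xl, weight, bias):
    """One DCNv2-style cross layer: x0 * (W @ xl + b) + xl, elementwise product."""
    projected = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, xl)) + b_i
        for row, b_i in zip(weight, bias)
    ]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, projected, xl)]


def cross_network(x0, layers):
    """Apply the cross layers in order; every layer reuses the original input x0."""
    x = x0
    for weight, bias in layers:
        x = cross_layer(x0, x, weight, bias)
    return x


def random_layers(dim, num_layers=3, seed=0):
    """Hypothetical helper: small random weights and zero biases for the demo."""
    rng = random.Random(seed)
    return [
        (
            [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)],
            [0.0] * dim,
        )
        for _ in range(num_layers)
    ]


# Toy embedding table: 3 categories, embedding dimension 4.
table = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
one_hot_vec = pool_embeddings(table, [2])       # DLRMv1-style single lookup
multi_hot_vec = pool_embeddings(table, [0, 2])  # DLRMv2-style pooled lookup

# Three cross layers, matching the "three-layer" description in the table.
out = cross_network(multi_hot_vec, random_layers(dim=len(multi_hot_vec)))
print(out)
```

The multi-hot lookup reads several embedding rows per feature instead of one, which is consistent with the higher memory-bandwidth demand noted in the last row of the table.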

MLPerf numbers (DLRMv2 99.99)

| Version | Hardware | Software | CPU | Cores | Offline (samples/s) | Server (samples/s) |
| --- | --- | --- | --- | --- | --- | --- |
| 5.0 | 1 Node, 2 socket, GR | PyTorch | Xeon 6787P | 86 | 12,397 | 11,788 |
| 5.0 | 1 Node, 2 socket, GR | PyTorch | Xeon 6980P | 128 | 18,686 | 18,117 |
| 4.1 | 1 Node, 2 socket, EMR | PyTorch | Xeon 8592+ | 64 | 9,830 | 9,101 |
| 5.0 | GH200 1-GPU | TensorRT | Grace CPU | | 51,970 | 50,070 |
| 5.0 | L40S 8-GPU | TensorRT | Xeon 6740E | | 100,517 | 94,989 |
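
To compare the CPU rows on a more even footing, the figures can be normalized. The sketch below copies the three Xeon rows from the table above and derives offline throughput per core and the server-to-offline ratio. It assumes the Cores column is a per-socket count and that each system has two sockets, as the Hardware column states; if the column is actually a system total, set `SOCKETS = 1`. The row for version 4.1 comes from a different MLPerf round, so treat cross-round comparisons with care.

```python
# Rows copied from the table: (version, cpu, cores, offline, server).
cpu_results = [
    ("5.0", "Xeon 6787P", 86, 12397, 11788),
    ("5.0", "Xeon 6980P", 128, 18686, 18117),
    ("4.1", "Xeon 8592+", 64, 9830, 9101),
]

# Assumption: "Cores" is per socket and every listed system has 2 sockets.
SOCKETS = 2


def summarize(row, sockets=SOCKETS):
    """Derive per-core offline throughput and the server/offline ratio for one row."""
    version, cpu, cores, offline, server = row
    total_cores = cores * sockets
    return {
        "version": version,
        "cpu": cpu,
        "offline_per_core": offline / total_cores,
        "server_to_offline": server / offline,
    }


summaries = [summarize(row) for row in cpu_results]
for s in summaries:
    print(
        f"{s['cpu']:<12} {s['offline_per_core']:6.1f} samples/s per core, "
        f"server/offline = {s['server_to_offline']:.3f}"
    )
```

The GPU rows are left out because the table does not give core counts for them.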

See also