DLRM - AshokBhat/ml GitHub Wiki

About

Feature/Aspect	DLRMv1	DLRMv2
Use Case Representation		More realistic large-scale recommender system benchmark
Model Architecture	Uses an interactions layer to combine features	Replaces interactions layer with a three-layer DCNv2 cross network
Input Encoding	One-hot categorical inputs	Multi-hot categorical inputs
Dataset	Smaller, less complex dataset	Larger, more complex dataset
Hardware Utilization	Less demanding on memory bandwidth	Requires higher memory bandwidth
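
The input-encoding and architecture rows above can be illustrated with a small sketch. It is not the benchmark's reference implementation: the table sizes, the helper names (`lookup_one_hot`, `lookup_multi_hot`, `cross_layer`, `cross_network`), and the use of sum pooling for multi-hot inputs are all assumptions made for illustration. The cross layer uses the full-rank DCNv2 form, x_next = x0 * (W x_l + b) + x_l, applied to a single embedding vector rather than to the concatenated features a real model would use.

```python
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 10  # rows in one embedding table
EMBED_DIM = 4    # width of each embedding vector

embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def lookup_one_hot(table, index):
    """One-hot input: one category ID selects exactly one row."""
    return list(table[index])


def lookup_multi_hot(table, indices):
    """Multi-hot input: several IDs are gathered and sum-pooled (assumed pooling)."""
    pooled = [0.0] * len(table[0])
    for index in indices:
        for j, value in enumerate(table[index]):
            pooled[j] += value
    return pooled


def cross_layer(x0, xl, weight, bias):
    """One full-rank DCNv2 cross layer: x_next = x0 * (W @ xl + b) + xl."""
    projected = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, xl)) + b_i
        for row, b_i in zip(weight, bias)
    ]
    return [x0_i * p_i + xl_i for x0_i, p_i, xl_i in zip(x0, projected, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer re-uses the original input x0."""
    xl = x0
    for weight, bias in zip(weights, biases):
        xl = cross_layer(x0, xl, weight, bias)
    return xl


one_hot_vec = lookup_one_hot(embedding_table, 3)
multi_hot_vec = lookup_multi_hot(embedding_table, [3, 7, 1])

# Three cross layers, matching the layer count in the table above.
weights = [
    [[random.gauss(0.0, 0.1) for _ in range(EMBED_DIM)] for _ in range(EMBED_DIM)]
    for _ in range(3)
]
biases = [[0.0] * EMBED_DIM for _ in range(3)]
crossed = cross_network(multi_hot_vec, weights, biases)
print(len(crossed))  # same width as the input vector
```

In this sketch the multi-hot lookup reads three embedding rows per sample where the one-hot lookup reads one, which is one plausible reason multi-hot inputs put more pressure on memory bandwidth.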

Version	Hardware	Software	CPU	Cores per socket	Offline (samples/s)	Server (samples/s)
5.0	1 node, 2-socket, GNR	PyTorch	Xeon 6787P	86	12397	11788
5.0	1 node, 2-socket, GNR	PyTorch	Xeon 6980P	128	18686	18117
4.1	1 node, 2-socket, EMR	PyTorch	Xeon 8592+	64	9830	9101
5.0	GH200 1-GPU	TensorRT	Grace CPU		51970	50070
5.0	L40S 8-GPU	TensorRT	Xeon 6740E		100517	94989
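
One way to compare the rows above is the ratio of Server to Offline throughput, which shows how much throughput each system keeps when it moves from the Offline column to the Server column. This ratio is not part of the original table; the sketch below derives it from the listed figures, and the function name `server_to_offline_ratio` is a hypothetical helper.

```python
# Throughput figures copied from the table above: (label, offline, server).
results = [
    ("Xeon 6787P", 12397, 11788),
    ("Xeon 6980P", 18686, 18117),
    ("Xeon 8592+", 9830, 9101),
    ("GH200 1-GPU", 51970, 50070),
    ("L40S 8-GPU", 100517, 94989),
]


def server_to_offline_ratio(offline, server):
    """Fraction of Offline throughput retained in the Server column."""
    return server / offline


ratios = {
    label: server_to_offline_ratio(offline, server)
    for label, offline, server in results
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f}")
```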