ECE 4822: Matrix Multiplication Benchmarks

The table below reports benchmarks for multiplying two 1000x1000 matrices of 32-bit floating-point values on nedc_000 (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz). Each cell contains the CPU time in seconds, averaged over a large number of iterations. Each hyperlink points to the code used to generate that benchmark.
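As a rough illustration of how a per-multiplication CPU time like those in the table could be measured, the sketch below times a plain array-indexed multiply with the standard C clock() function and divides by the number of iterations. It is not the participants' code: the names matmul and bench_matmul, the random initialization, and the row-major layout are all assumptions made for this example.

```c
#include <stdlib.h>
#include <time.h>

/* Multiply two n x n row-major float matrices: c = a * b. */
void matmul(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical helper: average CPU seconds per multiplication over
 * iters runs. Returns -1.0 on bad arguments or allocation failure. */
double bench_matmul(int n, int iters) {
    if (n <= 0 || iters <= 0) {
        return -1.0;
    }
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    /* Fill the inputs with values in [0, 1]. */
    for (size_t i = 0; i < count; i++) {
        a[i] = (float)rand() / (float)RAND_MAX;
        b[i] = (float)rand() / (float)RAND_MAX;
    }

    /* clock() reports processor time used by the program, not wall time. */
    clock_t start = clock();
    for (int it = 0; it < iters; it++) {
        matmul(a, b, c, n);
    }
    clock_t stop = clock();

    /* Read one result so the compiler cannot discard the work. */
    volatile float sink = c[0];
    (void)sink;

    free(a);
    free(b);
    free(c);
    return (double)(stop - start) / CLOCKS_PER_SEC / iters;
}
```

Calling bench_matmul(1000, 10) would correspond to the problem size in the table, with an iteration count chosen arbitrarily here. Absolute numbers depend heavily on the compiler and optimization flags, so they are not directly comparable to the values reported.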

Participant	Array	Pointers	Fast	Boost	STL	Eigen	GPU #1	GPU #2
Acosta Del Vecchio, Miguel	1.145	1.267	1.152	1.411	0.781	N/A	12.072	0.006000
Aliman, Yamen	1.171	1.263	0.562	0.218	1.120	N/A	18.140	0.000534
Bici, Daniel	1.092	1.399	1.072	0.747	1.467	0.087	12.753	0.032000
Khantan, Mehdi	1.221	1.265	3.240	0.663	1.241	N/A	34.000	0.000860

Notes:

The times reported under GPU #1 are for a GPU port of the C code reported under "Pointers". This version runs serially and does not exploit parallelism on the GPU. The GPU used was an NVIDIA A40 (hosted on nedc_012, which has two AMD EPYC 7413 24-Core processors).
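To make the GPU #1 numbers easier to interpret, here is a minimal sketch of what a pointer-based multiply might look like in C. The function name matmul_ptr and the row-major layout are assumptions for illustration, and the participants' actual "Pointers" code may differ. Running a loop nest like this inside a single GPU thread leaves the rest of the device idle, which is consistent with GPU #1 being far slower than the CPU columns.

```c
#include <stddef.h>

/* Hypothetical pointer-based multiply: c = a * b for n x n row-major
 * float matrices. Row and column base pointers replace a[i][k]-style
 * two-dimensional indexing. */
void matmul_ptr(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++) {
        const float *a_row = a + i * n;      /* start of row i of a */
        for (size_t j = 0; j < n; j++) {
            const float *b_col = b + j;      /* top of column j of b */
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                /* Consecutive elements of a column are n floats apart. */
                sum += a_row[k] * b_col[k * n];
            }
            *c++ = sum;                      /* results are written in row-major order */
        }
    }
}
```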

The times reported under GPU #2 are for a single-GPU version that uses an optimal number of blocks and threads (shown in parentheses). The GPU used was the same NVIDIA A40 (hosted on nedc_012, which has two AMD EPYC 7413 24-Core processors).
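The relationship between block count and threads per block can be worked out with simple integer arithmetic. The sketch below, written as plain C on the host side, shows the usual ceiling division and the per-dimension index mapping (the CUDA idiom blockIdx.x * blockDim.x + threadIdx.x). The function names and the block widths of 16 and 32 are assumptions chosen for illustration, not the configurations the participants actually used.

```c
#include <stddef.h>

/* Blocks needed along one dimension so that
 * blocks * threads_per_block >= n (ceiling division).
 * Returns 0 if threads_per_block is 0. */
size_t blocks_needed(size_t n, size_t threads_per_block) {
    if (threads_per_block == 0) {
        return 0;
    }
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Global index of a thread along one dimension, mirroring
 * blockIdx.x * blockDim.x + threadIdx.x in a CUDA kernel. */
size_t global_index(size_t block, size_t block_dim, size_t thread) {
    return block * block_dim + thread;
}

/* Threads launched past the end of the data along one dimension.
 * A kernel must skip these with a bounds check (index < n). */
size_t idle_threads(size_t n, size_t threads_per_block) {
    return blocks_needed(n, threads_per_block) * threads_per_block - n;
}
```

For n = 1000, a block width of 16 gives 63 blocks per dimension with 8 idle threads, while a width of 32 gives 32 blocks with 24 idle threads. Which choice is fastest on a given GPU is an empirical question, which is presumably why the best configuration had to be found by experiment.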