ECE 4822: Matrix Multiplication Benchmarks


The table below reports benchmarks for a 1000x1000 32-bit floating-point matrix multiplication run on nedc_000 (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz). Each cell contains the CPU time used, in seconds, averaged over a large number of iterations. The hyperlink sends you to the code used to generate this benchmark.
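As a rough illustration of how a cell like this could be measured, here is a minimal sketch in C. It assumes row-major storage, a plain triple loop with array-style indexing, and the standard `clock()` function for CPU time. The names `matmul` and `mean_cpu_secs` are hypothetical and are not taken from any participant's code, whose methods may differ.

```c
#include <stdlib.h>
#include <time.h>

/* Keeps the compiler from discarding the repeated multiplications. */
static volatile float sink;

/* Multiply two n x n row-major float matrices: c = a * b. */
void matmul(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Return the mean CPU time in seconds for one n x n multiplication,
 * averaged over iters runs. Returns -1.0 on bad input or if memory
 * allocation fails. */
double mean_cpu_secs(size_t n, int iters)
{
    if (n == 0 || iters <= 0) {
        return -1.0;
    }

    float *a = malloc(n * n * sizeof *a);
    float *b = malloc(n * n * sizeof *b);
    float *c = malloc(n * n * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    /* Arbitrary fill values; the timing does not depend on them. */
    for (size_t i = 0; i < n * n; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    /* clock() measures processor time used by the program. */
    clock_t start = clock();
    for (int it = 0; it < iters; it++) {
        matmul(a, b, c, n);
        sink = c[0];
    }
    clock_t stop = clock();

    free(a);
    free(b);
    free(c);

    return (double)(stop - start) / CLOCKS_PER_SEC / iters;
}
```

Calling `mean_cpu_secs(1000, 10)` would time ten 1000x1000 multiplications and return the average. The iteration count of 10 is only an example, since the page does not state how many iterations were used.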

Participant                   Array  Pointers   Fast  Boost    STL  Eigen  GPU #1    GPU #2
Acosta Del Vecchio, Miguel    1.145     1.267  1.152  1.411  0.781    N/A  12.072  0.006000
Aliman, Yamen                 1.171     1.263  0.562  0.218  1.120    N/A  18.140  0.000534
Bici, Daniel                  1.092     1.399  1.072  0.747  1.467  0.087  12.753  0.032000
Khantan, Mehdi                1.221     1.265  3.240  0.663  1.241    N/A  34.000  0.000860
Notes:
  1. The times reported under GPU #1 are for a GPU version of the C code reported under "Pointers". This version runs serially and does not exploit parallelism on the GPU. The GPU used was an NVIDIA A40 (hosted on nedc_012, which has two AMD EPYC 7413 24-Core processors).

  2. The times reported under GPU #2 are for a single-GPU version that uses an optimal number of blocks and threads (shown in parentheses). The GPU used was an NVIDIA A40 (hosted on nedc_012, which has two AMD EPYC 7413 24-Core processors).
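
The difference between the two GPU columns comes down to how the work is divided. In the GPU #1 style a single thread computes every output element, while in the GPU #2 style the output matrix is covered by a grid of blocks and each thread computes one element. The sketch below emulates that second decomposition on the CPU in plain C, so it shows only the indexing and does not run on a GPU. The names `blocks_for`, `thread_body` and `emulated_launch` are hypothetical, and the block sizes shown are illustrative, not the values the participants used.

```c
#include <stddef.h>

/* Number of blocks needed to cover n elements with `threads` threads
 * per block (ceiling division). */
size_t blocks_for(size_t n, size_t threads)
{
    return (n + threads - 1) / threads;
}

/* Work done by one emulated GPU thread: compute one element of c.
 * Threads that fall past the matrix edge do nothing. */
static void thread_body(const float *a, const float *b, float *c,
                        size_t n, size_t row, size_t col)
{
    if (row >= n || col >= n) {
        return;
    }
    float sum = 0.0f;
    for (size_t k = 0; k < n; k++) {
        sum += a[row * n + k] * b[k * n + col];
    }
    c[row * n + col] = sum;
}

/* CPU emulation of a 2-D launch with threads x threads threads per
 * block. On a real GPU the four loops below would not exist; every
 * (block, thread) pair would run thread_body concurrently. */
void emulated_launch(const float *a, const float *b, float *c,
                     size_t n, size_t threads)
{
    size_t blocks = blocks_for(n, threads);
    for (size_t by = 0; by < blocks; by++) {
        for (size_t bx = 0; bx < blocks; bx++) {
            for (size_t ty = 0; ty < threads; ty++) {
                for (size_t tx = 0; tx < threads; tx++) {
                    thread_body(a, b, c, n,
                                by * threads + ty,
                                bx * threads + tx);
                }
            }
        }
    }
}
```

For a 1000x1000 matrix, `blocks_for(1000, 16)` gives 63 blocks per dimension, so the last block in each row and column contains threads that fail the bounds check and exit early. Which block and thread counts are actually fastest depends on the hardware and the kernel, and is what the GPU #2 entries were tuned for.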