Enrico made the code (if the link doesn’t work search for his name on Research Gate) for his LU factorisation code over GF(2) available online under the GPL. This is an implement of the algorithm described by Anna and him in Fast matrix decomposition in F2 and for which they give timings in that paper (discussed a bit here). I had to fix some includes to make it compile on my box, but nothing major. I can also confirm the impressive performance of their software (I ran testRankComputation).
PS: I managed to catch up in terms of performance up to dimension 4096 x 4096 with M4RI. However, 8192 x 8192 is still slower – at least on my machine – than Enrico’s code. I would suspect that this is down to us not using the cache as effectively as Enrico, possibly because of the different matrix representation (row major) .
n | M4RI | PLE Base | PLE | Enrico |
---|---|---|---|---|
128 | 0.019 ms | 0.070 ms | 0.067 ms | 0.137ms |
256 | 0.146 ms | 0.185 ms | 0.200 ms | 0.287ms |
512 | 0.388 ms | 0.571 ms | 0.580 ms | 0.558ms |
1024 | 1.143 ms | 2.176 ms | 2.198 ms | 1.470ms |
2048 | 4.787 ms | 6.205 ms | 7.073 ms | 5.277ms |
4096 | 0.02619 s | 0.02848 s | 0.03076 s | 0.0288 s |
8192 | 0.26933 s | 0.25350 s | 0.24995 s | 0.1915 s |
16384 | 2.04408 s | 3.17302 s | 1.80076 s | 1.3815 s |
Hi, Martin.
I recently came across your blog and spent some time reading through some posts. I have written (in C++) versions of an LU-Decomposition routine and, most recently, a matrix multiplication routine, so your posts about speeding up those routines were especially interesting to me.
I am planning to write a Cholesky Factorization routine in the near future, so would be interested in any of the latest developments and information regarding that task (although that may not be relevant to your own direction).
In any case, I look forward to reading more of your posts.