This example has three times more fibers, and thus also CL-points, than the original one, but it still runs in a reasonable time of ~15s because (1) I hard-coded the matrix-determinant expressions everywhere instead of relying on a slow general purpose function and (2) the batch-processing of the fibers now uses OpenMP in order to put all those multi-cores to work.
(red=vertex contacts, green=facet contacts, blue=edge contacts)
My initial point-ordering scheme based on a complete breadth-first-search at each CL-point is a bit naive and slow (that's not included in the 15s time), so that still needs more work.
Great work..I am really impressed.