Use tighter bounds and faster filters in orientation and isInCircle p… #1162

tinko92 · 2024-09-14T09:52:22Z

This PR proposes changes to the 2D orientation and incircle predicates. The orientation predicate is changed to use a tighter error bound coefficient (which should reduce the number of calls to the slower more robust computation for degenerate inputs) and a computation of the error bound that produces fewer branches. The error bound is due to https://doi.org/10.1007/s10543-015-0574-9 , this paper also has a discussion on how fewer branches improve performance in the predicate and a proof for why abs(detleft + detright) works here (if the signs are the same, it is equivalent, if the signs of detleft and detright differ, then det would never be smaller in absolute value than the error bound anyway).

For the incircle predicate, this PR proposes to not generally use long double (which increases precision a little but is not robust as the code notes) but to use a similar error bound filter instead (bound computation based on https://doi.org/10.1007/s10543-023-00975-x , I don't know the source for alternative expression of the incircle determinant, I have first seen it in CGAL). The rational for that is higher performance (roughly 4-10% speedup on my machine for perf_voronoi) and that a more robust computation here is probably motivated by preventing infinite loops in Delaunay Triangulation due to false cases of INTERIOR, which should be impossible with this version (except possibly for inputs near the lower end of the representable magnitude of double that produce denormalized results/underflow).

PS: I also have an implementation of exact predicates that I could port to geos to be run after failures of the fast filters if it would be considered beneficial to the library by its maintainers. I don't know enough about its use cases to make a guess about that. Exact predicates guarantee consistency wrt to predicate calls. They would not guarantee consistency between predicates and computed geometries (e.g. predicate might say some polygons overlap but the computed intersection may then be empty after rounding to double coordinates in the output geometry, even if all intermediate computations were exact).

…redicates.

pramsey · 2024-09-19T16:12:31Z

I don't have a voronoi test yet, but my performance suite shows no differences, with one exception: the prepared point-in-poly test seems to be consistently about 10% faster.

tinko92 · 2024-09-20T00:04:16Z

@pramsey Thanks for the feedback. Interesting, point-in-poly may benefit from a faster orientation but I would have expected that to be negligible if its implemented with some ray counting then I'd expect runtime to be dominated by walking over segments that don't intersect the ray and need no orientation check (at least for polygons with many segments). I will try to look into that when I find time.

I would expect the change in the orientation test to be measurable in e.g. convex hull or triangulation without Delaunay requirement and synthetic benchmarks (I ran perf_orientation from the benchmark directory and compared the number of Iterations since the ns timing seems too coarse for such a micro benchmark, I think running it in a larger algorithm is a more meaningful benchmark for this) and the change in the incircle test to be measurable in benchmarks like Delaunay Triangulation or Voronoi diagram (I ran perf_voronoi from the benchmark directory in the geos repository with release config for the numbers in the opening post). The incircle test maybe very significantly on non-x86-architectures where no machine hardware accelerated 80-bit floats are available and long double would compile to function calls to some soft float implementation, e.g. https://godbolt.org/z/EosY5ehjq .

pramsey · 2024-09-26T22:59:23Z

I hooked a Delaunay build into my test harness but couldn't measure any different between this PR and 3.13.

tinko92 · 2024-09-29T14:32:12Z

I hooked a Delaunay build into my test harness but couldn't measure any different between this PR and 3.13.

I tested as follows using the Delaunay/Voronoi benchmarks in the benchmark folder from the main repo:
I checked out the main branch, went into the directory and:

cd geos
mkdir build_gcc && build_gcc
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_BENCHMARKS=ON -DCMAKE_CXX_FLAGS="-march=native" .. 
make #will complain about missing override in another benchmark but doesn't affect the build of perf_voronoi
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo #only on intel
cpupower frequency-set -g performance #only on intel 
for i in {1..10}; do taskset -c 2 ./bin/perf_voronoi >> baseline; done;
grep "BM_DelaunayFromSeq/1000000" baseline
BM_DelaunayFromSeq/1000000        5050644908 ns   5045715175 ns            1
BM_DelaunayFromSeq/1000000        5049845175 ns   5045245900 ns            1
BM_DelaunayFromSeq/1000000        6646025308 ns   6638513791 ns            1
BM_DelaunayFromSeq/1000000        6649783153 ns   6642201348 ns            1
BM_DelaunayFromSeq/1000000        6648571528 ns   6641021393 ns            1
BM_DelaunayFromSeq/1000000        5053053815 ns   5048554170 ns            1
BM_DelaunayFromSeq/1000000        5049133353 ns   5044510485 ns            1
BM_DelaunayFromSeq/1000000        5052486875 ns   5047828776 ns            1
BM_DelaunayFromSeq/1000000        5050604032 ns   5046051328 ns            1
BM_DelaunayFromSeq/1000000        5051929265 ns   5047293562 ns            1

#checkout the branch with this patch, same steps, then
for i in {1..10}; do taskset -c 2 ./bin/perf_voronoi >> patched; done;
grep "BM_DelaunayFromSeq/1000000" patched
BM_DelaunayFromSeq/1000000        4659588868 ns   4654149493 ns            1
BM_DelaunayFromSeq/1000000        4654536370 ns   4649391288 ns            1
BM_DelaunayFromSeq/1000000        4653135324 ns   4648135157 ns            1
BM_DelaunayFromSeq/1000000        6127312010 ns   6119523080 ns            1
BM_DelaunayFromSeq/1000000        6125711504 ns   6118105844 ns            1
BM_DelaunayFromSeq/1000000        4655963070 ns   4651342713 ns            1
BM_DelaunayFromSeq/1000000        4652838792 ns   4648184104 ns            1
BM_DelaunayFromSeq/1000000        4654345257 ns   4649771922 ns            1
BM_DelaunayFromSeq/1000000        4666023790 ns   4661365412 ns            1
BM_DelaunayFromSeq/1000000        4653681302 ns   4649277328 ns            1

I honestly wish I knew, where the outliers come from with turbo disabled, maybe the P-core/E-core (it's a mobile i7-12800H) thing can affect it despite fixing the CPU core with taskset. Same steps with CXX=/usr/bin/clang++ before the cmake command yield for me:

grep "BM_DelaunayFromSeq/1000000" baseline
BM_DelaunayFromSeq/1000000        4329365684 ns   4323526473 ns            1
BM_DelaunayFromSeq/1000000        5433784294 ns   5427474646 ns            1
BM_DelaunayFromSeq/1000000        4148863139 ns   4144892923 ns            1
BM_DelaunayFromSeq/1000000        4139543832 ns   4135714625 ns            1
BM_DelaunayFromSeq/1000000        5428346793 ns   5422042258 ns            1
BM_DelaunayFromSeq/1000000        5439176317 ns   5432822740 ns            1
BM_DelaunayFromSeq/1000000        5432680958 ns   5426345764 ns            1
BM_DelaunayFromSeq/1000000        4140026534 ns   4136116714 ns            1
BM_DelaunayFromSeq/1000000        4149153847 ns   4145284595 ns            1
BM_DelaunayFromSeq/1000000        5431052965 ns   5424639386 ns            1

grep "BM_DelaunayFromSeq/1000000" patched 
BM_DelaunayFromSeq/1000000        3767896479 ns   3764021258 ns            1
BM_DelaunayFromSeq/1000000        4923848243 ns   4917679386 ns            1
BM_DelaunayFromSeq/1000000        3764235827 ns   3760602930 ns            1
BM_DelaunayFromSeq/1000000        3762541532 ns   3758951324 ns            1
BM_DelaunayFromSeq/1000000        3765034916 ns   3761276158 ns            1
BM_DelaunayFromSeq/1000000        3764714564 ns   3761178183 ns            1
BM_DelaunayFromSeq/1000000        4928664355 ns   4922710265 ns            1
BM_DelaunayFromSeq/1000000        4922105205 ns   4916260250 ns            1
BM_DelaunayFromSeq/1000000        3767456227 ns   3763747766 ns            1
BM_DelaunayFromSeq/1000000        3760288885 ns   3756814473 ns            1

Similar results for the other workloads.

And on an EPYC-Milan I don't have these outlier timings, so I post just the first sample for each workload:

grep 100000 baseline | head -n5
BM_DelaunayFromSeq/1000000        4916505592 ns   4916348118 ns            1
BM_DelaunayFromGeom/1000000       4671339867 ns   4671204309 ns            1
BM_VoronoiFromSeq/1000000         5506559255 ns   5506240288 ns            1
BM_VoronoiFromGeom/1000000        5435366604 ns   5435120888 ns            1
BM_OrderedVoronoiFromGeom/1000000 7680563331 ns   7680220398 ns            1

grep 100000 patched | head -n5 
BM_DelaunayFromSeq/1000000        4427675159 ns   4427496208 ns            1
BM_DelaunayFromGeom/1000000       4462223273 ns   4461665059 ns            1
BM_VoronoiFromSeq/1000000         5045797628 ns   5045404499 ns            1
BM_VoronoiFromGeom/1000000        4988454644 ns   4987171839 ns            1
BM_OrderedVoronoiFromGeom/1000000 7392607964 ns   7392126698 ns            1

Use tighter bounds and faster filters in orientation and isInCircle p…

4143db7

…redicates.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use tighter bounds and faster filters in orientation and isInCircle p… #1162

Use tighter bounds and faster filters in orientation and isInCircle p… #1162

tinko92 commented Sep 14, 2024 •

edited

Loading

pramsey commented Sep 19, 2024

tinko92 commented Sep 20, 2024 •

edited

Loading

pramsey commented Sep 26, 2024

tinko92 commented Sep 29, 2024 •

edited

Loading

Use tighter bounds and faster filters in orientation and isInCircle p… #1162

Are you sure you want to change the base?

Use tighter bounds and faster filters in orientation and isInCircle p… #1162

Conversation

tinko92 commented Sep 14, 2024 • edited Loading

pramsey commented Sep 19, 2024

tinko92 commented Sep 20, 2024 • edited Loading

pramsey commented Sep 26, 2024

tinko92 commented Sep 29, 2024 • edited Loading

tinko92 commented Sep 14, 2024 •

edited

Loading

tinko92 commented Sep 20, 2024 •

edited

Loading

tinko92 commented Sep 29, 2024 •

edited

Loading