Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drawing performance tweaks #290

Merged
merged 15 commits into from
Sep 9, 2023
Merged

Drawing performance tweaks #290

merged 15 commits into from
Sep 9, 2023

Conversation

JimBobSquarePants
Copy link
Member

@JimBobSquarePants JimBobSquarePants commented Sep 7, 2023

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following matches the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

Some performance tweaks to help with our drawing performance.

  • Avx/Sse41/Arm64 rounding in ScanEdgeCollection
  • Remove unnecessary scaling in ClipperOffset since we don't do fixed precision maths. This should actually make the clipper more accurate.

Given that Avx does not have an exact equivalent to MidpointRounding.AwayFromZero there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.

Only 6 reference files are changed now (due to improved accuracy of the clipper). However, GitHub is marking 30 files as changed with several LFS files marked as invalid. I have no idea why. I've not touched those files and they are present in our CI tests.

Here's the difference in performance with the new rounding method testing a buffer of 1000 vertices. (I can't test the ARM version).

|      Method |       Mean |     Error |    StdDev | Ratio |
|------------ |-----------:|----------:|----------:|------:|
|   RoundYAvx |   224.4 ns |   2.43 ns |   2.03 ns |  0.03 |
| RoundYSse41 | 2,639.2 ns |  10.78 ns |   9.56 ns |  0.35 |
|      RoundY | 7,449.2 ns | 142.53 ns | 119.02 ns |  1.00 |

Here's some comparison benchmarks indicating where we stand against System.Drawing and SkiaSharp.

I've included the benchmarks from #96 to give us a rough idea of how much better we are looking compared to v1. Bear in mind though that the difference is massively influenced by the improved blending performance in ImageSharp v3.

FillPolygonMedium (Mississippi):

v1

|                 Method |         Mean |       Error |      StdDev |    Ratio | RatioSD |
|----------------------- |-------------:|------------:|------------:|---------:|--------:|
|          SystemDrawing |     358.2 us |     7.14 us |    10.68 us |     2.27 |    0.06 |
|    ImageSharp_Original | 187,078.3 us | 3,699.54 us | 6,181.11 us | 1,186.12 |   37.40 |
| ImageSharp_ActiveEdges |   2,884.7 us |    56.27 us |   109.75 us |    18.84 |    0.67 |
|              SkiaSharp |     156.8 us |     2.91 us |     2.43 us |     1.00 |    0.00 |

This PR

|        Method |     Mean |    Error |   StdDev | Ratio | RatioSD |
|-------------- |---------:|---------:|---------:|------:|--------:|
| SystemDrawing | 253.3 us |  2.50 us |  1.95 us |  2.26 |    0.03 |
|    ImageSharp | 766.4 us | 13.54 us | 16.12 us |  6.84 |    0.14 |
|     SkiaSharp | 112.5 us |  1.25 us |  1.17 us |  1.00 |    0.00 |

DrawPolygonMediumThick:

v1

|                 Method |       Mean |     Error |    StdDev |     Median |  Ratio | RatioSD |
|----------------------- |-----------:|----------:|----------:|-----------:|-------:|--------:|
|          SystemDrawing |   1.539 ms | 0.0304 ms | 0.0621 ms |   1.515 ms |   0.97 |    0.04 |
|    ImageSharp_Original | 169.697 ms | 3.3095 ms | 4.7463 ms | 168.241 ms | 104.43 |    2.09 |
| ImageSharp_ActiveEdges |  38.795 ms | 0.7716 ms | 1.0562 ms |  38.591 ms |  24.16 |    0.74 |
|              SkiaSharp |   1.625 ms | 0.0158 ms | 0.0132 ms |   1.623 ms |   1.00 |    0.00 |

This PR

|        Method |      Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------- |----------:|----------:|----------:|------:|--------:|
| SystemDrawing |  1.352 ms | 0.0094 ms | 0.0083 ms |  0.92 |    0.01 |
|    ImageSharp | 14.104 ms | 0.2779 ms | 0.3089 ms |  9.61 |    0.25 |
|     SkiaSharp |  1.473 ms | 0.0068 ms | 0.0053 ms |  1.00 |    0.00 |

Drawing, though much faster is still way off the pace so I did a little digging. It turns out almost half the overall time of the actual operation is spent clipping away self-intersections. Here's the benchmark if I remove that.

|        Method |     Mean |     Error |    StdDev | Ratio | RatioSD |
|-------------- |---------:|----------:|----------:|------:|--------:|
| SystemDrawing | 1.352 ms | 0.0103 ms | 0.0080 ms |  0.90 |    0.02 |
|    ImageSharp | 8.518 ms | 0.0815 ms | 0.0681 ms |  5.65 |    0.14 |
|     SkiaSharp | 1.498 ms | 0.0297 ms | 0.0342 ms |  1.00 |    0.00 |

Only 11 tests fail when I disable clipping, some results are actually better. (Note the ghosting disappearing #106)

image

However, we cannot remove clipping as the offsetter depends on it.

image

I've been looking at both Blend2d and Skia and it appears to me (I could be reading it wrong) that they work differently. I cannot find any trace of clipping in either stroke operation. Perhaps someone would like to have a closer look?

@codecov
Copy link

codecov bot commented Sep 7, 2023

Codecov Report

Merging #290 (d15be1c) into main (3f1b4fa) will decrease coverage by 1%.
The diff coverage is 69%.

@@         Coverage Diff         @@
##           main   #290   +/-   ##
===================================
- Coverage    80%    80%   -1%     
===================================
  Files        97     97           
  Lines      4873   4937   +64     
  Branches    869    878    +9     
===================================
+ Hits       3929   3975   +46     
- Misses      752    769   +17     
- Partials    192    193    +1     
Flag Coverage Δ
unittests 80% <69%> (-1%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
...g/Shapes/Rasterization/ScanEdgeCollection.Build.cs 83% <67%> (-13%) ⬇️
...arp.Drawing/Shapes/PolygonClipper/ClipperOffset.cs 100% <100%> (ø)

... and 1 file with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

📢 Have feedback on the report? Share it here.

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are minor changes to the test output, but they are nonvisible. This is why there are so many files listed.

Do we have at least one test configuration that excercises scalar runs primarily? We may see failures there. If the difference is minor, it could be better to tweak the tolerance than to replace all the files.

@@ -82,12 +85,85 @@ internal static ScanEdgeCollection Create(TessellatedMultipolygon multipolygon,

static void RoundY(ReadOnlySpan<PointF> vertices, Span<float> destination, float subsamplingRatio)
{
for (int i = 0; i < vertices.Length; i++)
int ri = 0;
if (Avx.IsSupported)
Copy link
Member

@antonfirsov antonfirsov Sep 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure all 3 cases (AVX, SSE, scalar) are covered by tests for various input sizes. Given that running an extensive set of integration tests out of process is rather expensive, it would be much better to have the method unit-tested with FeatureTestRunner against random input of various sizes. Here's an example for this approach:

https://github.com/SixLabors/ImageSharp/blob/54b7e04f7a3c2921af3c769bd6c27fd3d5156f04/tests/ImageSharp.Tests/Common/SimdUtilsTests.cs#L155-L166

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've ported FeatureTestRunner across and added a wrapper around existing ScanEdgeCollection tests to cover changes. We now use the same rounding throughout as I figured out an easy way to replicate midpoint rounding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverage is better than reported (actually better than main) because we can't provide a coverage report for the ARM code.

@JimBobSquarePants
Copy link
Member Author

JimBobSquarePants commented Sep 8, 2023

There is something funky going on with Git LFS. If I switch the branch to main I see the following 15 changes which I then can do absolutely nothing with (can't switch branch or discard).

image

It might well be main that's the issue actually as I can see the files are hosted in Git LFS in my feature branch.

https://github.com/SixLabors/ImageSharp.Drawing/blob/a7481bb4cbe160a02ab8793ef46b87d4a260db2a/tests/Images/ReferenceOutput/Drawing/DrawComplexPolygonTests/DrawComplexPolygon__Overlap.png

I would suggest we ignore and merge the new files. I can follow up with a PR to blanket normalize the images if there are issues.

@antonfirsov
Copy link
Member

When trying to copy back files from main I'm seeing a diff of 6 files 🤷

image

Copy link
Member

@antonfirsov antonfirsov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the small overlook with maxIterations LGTM.

I would still prefer to see SIMD optimized utility methods being directly unit tested, since that enables fine grained testing of various element counts and edge cases and may catch bugs we could miss otherwise, but won't block the PR on it.

if (verticesLength > 0)
{
ri = vertices.Length - (remainder / 2);
float maxIterations = verticesLength / (Vector256<float>.Count * 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be nint. (Also for Sse41 and AdvSimd.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed here and in benchmarks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you've pushed the change? I don't see the update in the PR diff.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫢Nope.... committed but didn't push. I'll open a PR.

}
}
}
else if (AdvSimd.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know if this is true on the BuildJet image?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we've specified the ARM images. These are the ones we've used previously to diagnose issues in ImageSharp. https://buildjet.com/for-github-actions/docs/runners/hardware#arm

@JimBobSquarePants JimBobSquarePants merged commit e050b92 into main Sep 9, 2023
@JimBobSquarePants JimBobSquarePants deleted the js/draw-perf branch September 9, 2023 03:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants