I implemented new vectorized code using the Vc library to allow SIMD operations in the generation of the Circular Soft Mask. The implementation was straightforward using methods already provided by Vc; however, the gains were not as dramatic as with the Gaussian masks, because one of the biggest bottlenecks is fetching from memory the predefined values rendered from the curve set by the user.
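The memory bottleneck can be illustrated with a simplified sketch in plain standard C++ (not Vc). The names `renderCurveTable` and `softMaskRow` are hypothetical, and the quadratic curve is only a stand-in for the user's curve. The first pass is pure arithmetic on contiguous data, which maps well onto SIMD registers; the second pass loads one table entry per pixel at a data-dependent index, which in SIMD form becomes a gather and is limited by memory access rather than arithmetic.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the user's curve, pre-rendered into `samples` values.
// Index 0 maps to the dab centre, the last index to the dab edge.
std::vector<float> renderCurveTable(std::size_t samples) {
    std::vector<float> table(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        const float t = static_cast<float>(i) / static_cast<float>(samples - 1);
        table[i] = 1.0f - t * t;  // placeholder curve, not Krita's
    }
    return table;
}

// Fills one row (`out`) of a circular soft mask centred horizontally in the row.
// `y` is the row's vertical offset from the centre; `scratch` must match `out` in size.
void softMaskRow(const std::vector<float>& curve, float radius, float y,
                 std::vector<float>& scratch, std::vector<float>& out) {
    const std::size_t width = out.size();
    const float centre = (static_cast<float>(width) - 1.0f) * 0.5f;
    const float invRadius = 1.0f / radius;
    const float yn = y * invRadius;

    // Pass 1: arithmetic only on contiguous data, maps well onto SIMD lanes.
    for (std::size_t i = 0; i < width; ++i) {
        const float xn = (static_cast<float>(i) - centre) * invRadius;
        scratch[i] = std::sqrt(xn * xn + yn * yn);
    }

    // Pass 2: each pixel reads the table at a data-dependent index.
    // Vectorized, this is a gather, which is bound by memory access.
    const float lastIndex = static_cast<float>(curve.size() - 1);
    for (std::size_t i = 0; i < width; ++i) {
        const float d = scratch[i];
        out[i] = (d < 1.0f) ? curve[static_cast<std::size_t>(d * lastIndex)] : 0.0f;
    }
}
```

Splitting the work into two passes keeps the arithmetic part easy to vectorize, while making it explicit that the table lookups cannot be sped up by wider registers alone.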
Making a plan
Phabricator task: Implement Circular Soft Mask Optim AVX
The code templates work the same way as in the Circular Gaussian Mask generator implementation, which I explained in my [previous post](blog, URL). Taking that into account, the plan consisted of three simple steps:
Understand how the scalar version generates the values for the mask.
The previous implementation was based on a slow scalar model that calculated each mask value per coordinate. I implemented new vectorized code using the Vc library to allow robust SIMD usage, calculating the mask values in parallel. Not all operations are implemented for Vc data types; in particular, erf had to be implemented for them. The new implementation is up to 10 times faster (on my system) at mask generation. Since mask generation accounts for most of the computation in brush stroke generation, this speed-up holds even in the full brush stroke benchmarks. Because of the way it is implemented, the code can become faster as SIMD registers grow on future CPUs.
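Since `erf` is not available for Vc vector types, one option is a polynomial approximation that uses only arithmetic, `exp`, and a sign fix-up, operations SIMD libraries typically provide. The sketch below applies the Abramowitz and Stegun formula 7.1.26 to plain `float`; the name `erfApprox` is hypothetical, and this is not necessarily the approximation used in Krita. In a vectorized version each operation would act on a whole register, and the final conditional negation would become a masked operation.

```cpp
#include <cmath>

// Abramowitz & Stegun 7.1.26: erf(x) ~ 1 - (a1 t + ... + a5 t^5) exp(-x^2),
// with t = 1 / (1 + p|x|), valid for x >= 0; odd symmetry covers x < 0.
// Absolute error is about 1.5e-7 before float rounding.
float erfApprox(float x) {
    const float p  = 0.3275911f;
    const float a1 = 0.254829592f, a2 = -0.284496736f, a3 = 1.421413741f,
                a4 = -1.453152027f, a5 = 1.061405429f;
    const float ax = std::fabs(x);
    const float t = 1.0f / (1.0f + p * ax);
    // Horner evaluation of the degree-5 polynomial in t (no branches).
    const float poly = t * (a1 + t * (a2 + t * (a3 + t * (a4 + t * a5))));
    const float y = 1.0f - poly * std::exp(-ax * ax);
    // erf is odd; in SIMD code this select becomes a masked negation.
    return (x < 0.0f) ? -y : y;
}
```

Branch-free polynomials like this are a common way to port transcendental functions to SIMD, because every lane executes the same instructions regardless of its input.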
Code study and implementation of the Gauss Mask generator.
Phabricator task: Implement Circular Gauss Mask Optim AVX
Vc creates code from templates tailored to each processor instruction set: AVX, AVX2, SSE4.1, SSSE3, SSE2, and scalar. So first, a template must be declared to manage the creation of the code for each instruction set. Using the vectorized Default Mask implementation as a guideline, studying how the generated code is constructed made it possible to extend it to the other MaskGenerators.
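The per-instruction-set idea can be sketched in standard C++ as one class template instantiated once per instruction set, plus a runtime factory that picks the right instantiation. Everything here (`Isa`, `CircularMaskGenerator`, `createMaskGenerator`) is hypothetical and does not reproduce Vc's or Krita's actual helpers; CPU detection is left out, and the best supported instruction set is passed in as a parameter. In a real build, each instantiation would be compiled in a translation unit with the matching compiler flags.

```cpp
#include <memory>
#include <string>

// Hypothetical instruction-set tags (Vc has its own implementation types).
enum class Isa { Scalar, SSE2, SSSE3, SSE41, AVX, AVX2 };

constexpr const char* isaName(Isa isa) {
    switch (isa) {
    case Isa::Scalar: return "scalar";
    case Isa::SSE2:   return "sse2";
    case Isa::SSSE3:  return "ssse3";
    case Isa::SSE41:  return "sse4.1";
    case Isa::AVX:    return "avx";
    case Isa::AVX2:   return "avx2";
    }
    return "unknown";
}

// Common interface so callers do not need to know which variant they got.
struct MaskGenerator {
    virtual ~MaskGenerator() = default;
    virtual std::string implementation() const = 0;
};

// One template; each instantiation stands for code built for one instruction set.
template <Isa I>
struct CircularMaskGenerator final : MaskGenerator {
    std::string implementation() const override { return isaName(I); }
};

// Runtime dispatch: return the instantiation matching the best available ISA.
std::unique_ptr<MaskGenerator> createMaskGenerator(Isa best) {
    switch (best) {
    case Isa::AVX2:  return std::make_unique<CircularMaskGenerator<Isa::AVX2>>();
    case Isa::AVX:   return std::make_unique<CircularMaskGenerator<Isa::AVX>>();
    case Isa::SSE41: return std::make_unique<CircularMaskGenerator<Isa::SSE41>>();
    case Isa::SSSE3: return std::make_unique<CircularMaskGenerator<Isa::SSSE3>>();
    case Isa::SSE2:  return std::make_unique<CircularMaskGenerator<Isa::SSE2>>();
    case Isa::Scalar: break;
    }
    return std::make_unique<CircularMaskGenerator<Isa::Scalar>>();
}
```

Writing the mask logic once as a template and letting the build produce one copy per instruction set keeps a single source of truth, while the factory ensures each CPU runs the widest variant it supports.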
Hi! GSoC student here :]. These first weeks coding for Krita have been so busy that I forgot to write about them, so I'll sum everything up in short posts about each step of the project's implementation.
First Steps, setting up a dev environment
I followed the steps in the 3rdparty directory to compile the base Krita system on OSX. These easy-to-follow instructions helped me get a basic Krita installation in a short time. However, not everything worked so easily: most tests did not work, or did not run at all, on OSX, failing with the message:
QFATAL : FreehandStrokeBenchmark::testDefaultTip() Cannot calculate the bundle path from the app path
After some digging I found out that on OSX no program that uses a GUI can run outside of an app bundle. So, while not future-proof, to start working on the code I made a quick script that installs the tests I'm interested in inside the Krita.app folder, which allows them to run. By default all tests are linked to the libraries in the build dir, but because this won't work on OSX, one approach would be to also install the tests in the bundle and link them to the installed libraries; another approach could be to generate an app bundle for each test.
In any case, the tests could now run, so it was time to start working on the unit test.
Implementing Mask Similarity Test
Phabricator task: Base unit test kis_mask_similarity_test
The intention of this unit test is to check the correctness of the new vectorized mask rendering by comparing it with a mask produced with the same settings by the previous engine. The new versions have to be as close to identical as possible, to ensure that the painting effects the user expects do not change between engines (the user can’t choose how the mask is produced, but we use the scalar version for smaller brush dab sizes).
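A comparison of this kind can be sketched as follows, assuming both engines render into 8-bit alpha buffers of the same dimensions. `compareMasks`, `masksSimilar`, and the tolerance parameter are hypothetical names for illustration; the actual Krita test may compare masks differently. A small tolerance allows for rounding differences between the scalar and the vectorized math.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct MaskDiff {
    int maxAbsDiff = 0;                   // largest per-pixel difference
    std::size_t pixelsOverTolerance = 0;  // pixels differing by more than the tolerance
};

// Compares a reference (scalar) mask with a candidate (vectorized) mask,
// both stored as row-major 8-bit alpha values, over their common length.
MaskDiff compareMasks(const std::vector<std::uint8_t>& reference,
                      const std::vector<std::uint8_t>& candidate, int tolerance) {
    MaskDiff diff;
    const std::size_t n = std::min(reference.size(), candidate.size());
    for (std::size_t i = 0; i < n; ++i) {
        const int d = std::abs(static_cast<int>(reference[i]) - static_cast<int>(candidate[i]));
        diff.maxAbsDiff = std::max(diff.maxAbsDiff, d);
        if (d > tolerance) ++diff.pixelsOverTolerance;
    }
    return diff;
}

// Masks are considered similar if they have the same size and no pixel
// differs by more than `tolerance` alpha levels.
bool masksSimilar(const std::vector<std::uint8_t>& reference,
                  const std::vector<std::uint8_t>& candidate, int tolerance) {
    return reference.size() == candidate.size() &&
           compareMasks(reference, candidate, tolerance).pixelsOverTolerance == 0;
}
```

Reporting the maximum difference as well as the pass/fail result makes it easier to see how far a vectorized engine drifts from the scalar one when a test fails.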
I can’t believe I was selected for the Google Summer of Code program to work on Krita. The project I’ll be working on this summer is optimizing Krita’s brush mask generation to use AVX instructions. These instructions will be coded using the Vc library, a “zero overhead C++ types for parallel computing” library that makes it possible to efficiently transform the mask generator code into SIMD instructions for vectorization.
Brush mask generation is a core part of painting, as it creates the shape that will be imprinted on the canvas. Depending on brush settings, this can happen thousands of times per second. Optimizing it will make painting more enjoyable by keeping brush strokes responsive at bigger sizes.