A few weeks ago I travelled to the Netherlands to take part in the Krita October Sprint. During this Sprint we decided to focus on bug fixing; my tasks included some simple bugs and a couple of more convoluted ones. I started with the simple ones to build momentum. The first was about modifiers not working on OSX: the bug was simple enough, but puzzling, because with that logic missing the code shouldn’t have worked on Linux either, yet it did. The second bug was related to the event logic in the preferences dialog command: my first approach was good but not simple, so talking with the team made me change it to a much simpler solution.
The following days showed me how deep the rabbit hole goes in Krita’s code. My next bug was in the invert color code: some color spaces didn’t show the correct/expected result. A quick dive showed that there was a separate codebase for every color space’s invert operation, and the wrong results appeared for the missing implementations. However, this approach was not very portable: the combination of color spaces and color depths meant I would need to implement 18 color inverters. A short consultation showed me that an invert operation was already implemented for each pixel depth, so refactoring the filter to use these converters in a single class to invert the input colors made the invert filter work as expected, except for the CMYK and Lab color spaces in 16-bit float. After a couple of days of digging into the code and testing, we found that there is a bug in the way CMYK and Lab values are processed: normalized values are not returned in places where they should be.
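To make the “one class instead of 18 inverters” idea concrete, here is a minimal C++ sketch, not Krita’s actual code: a single template inverts pixels of any channel depth, assuming each color channel is stored in the range [0, unit] so that its inverse is simply unit minus value. The names unitValue and invertPixel are hypothetical. Float color spaces whose channels use other ranges would have to be normalized first, which is the step that went wrong for CMYK and Lab.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical per-depth "unit" value: the maximum for integer channels,
// 1.0 for floating-point channels (assumes channels are stored in [0, unit]).
template <typename T>
constexpr T unitValue() {
    if constexpr (std::numeric_limits<T>::is_integer) {
        return std::numeric_limits<T>::max();
    } else {
        return T(1);
    }
}

// Inverts every color channel of one pixel and leaves alpha untouched.
// The same template serves 8-bit, 16-bit and float pixels, instead of one
// hand-written inverter per (color space, depth) combination.
template <typename T, std::size_t N>
void invertPixel(std::array<T, N>& pixel, std::size_t alphaIndex) {
    const T unit = unitValue<T>();
    for (std::size_t i = 0; i < N; ++i) {
        if (i == alphaIndex) {
            continue;
        }
        pixel[i] = static_cast<T>(unit - pixel[i]);
    }
}
```

The channel layout only matters through alphaIndex, so the same call works for any color model whose channels share one depth.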
This being my first Krita Sprint, I was very nervous, but I was even more excited to meet the team. In a way it was also the first time I worked in a code-only environment, which made it very fruitful: it showed me that code is not made by coding super-geniuses, but by small changes made by a coordinated team of normal people.
I implemented new vectorized code using the Vc library to allow SIMD operations in the generation of the Circular Soft Mask. The implementation was straightforward using the internal methods declared in Vc; however, the gains were not as dramatic as with the Gaussian Masks, because one of the biggest bottlenecks is fetching from memory the predefined values rendered from the curve set by the user.
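The bottleneck can be illustrated with a plain C++ sketch (no Vc). Everything here is an assumption for illustration: the function names, a nearest-sample lookup, a linear example curve and the convention that 1 means full coverage and 0 means outside the circle. The distance math is uniform across pixels and maps well onto SIMD lanes, but each pixel then reads the table at a data-dependent index, a scattered load that SIMD speeds up far less.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Samples a falloff curve (t = 0 at the centre, 1 at the edge) into a table
// once, up front. Requires samples >= 2.
std::vector<float> sampleCurve(const std::function<float(float)>& curve,
                               std::size_t samples) {
    std::vector<float> table(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        table[i] = curve(static_cast<float>(i) / static_cast<float>(samples - 1));
    }
    return table;
}

// Fills one row of a circular soft mask. `dy` is the row's vertical offset
// from the centre; the row is centred horizontally.
// Convention of this sketch: 1 = full coverage, 0 = outside the circle.
void softMaskRow(const std::vector<float>& table, float radius, float dy,
                 std::size_t width, float* out) {
    const float centerX = static_cast<float>(width - 1) * 0.5f;
    const std::size_t last = table.size() - 1;
    for (std::size_t x = 0; x < width; ++x) {
        // Uniform arithmetic: easy to spread across SIMD lanes.
        const float dx = static_cast<float>(x) - centerX;
        const float dist = std::sqrt(dx * dx + dy * dy) / radius;
        if (dist >= 1.0f) {
            out[x] = 0.0f;
            continue;
        }
        // Data-dependent load: every pixel may hit a different table entry.
        const auto index = static_cast<std::size_t>(dist * static_cast<float>(last));
        out[x] = table[index];
    }
}
```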
Making a plan
Phabricator task: Implement Circular Soft Mask Optim AVX
The code templates work the same way as in the Circular Gaussian Mask generator implementation, which I explained in my previous post. Taking that into account, the plan consisted of three simple steps.
Understand how the scalar version generates the values for the Mask
The previous implementation was based on a slow scalar model that calculated each mask value per coordinate. I implemented new vectorized code using the Vc library to allow robust SIMD usage, calculating the mask values in parallel. Not all operations are available for Vc data types; in particular, erf had to be implemented for them. The new implementation is up to 10 times faster at mask generation (on my system). Since mask generation requires the most computation during brush stroke generation, this speed improvement holds up even in the full brush stroke benchmarks. Given the way it is implemented, the code can become faster as SIMD registers grow wider on future CPUs.
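Since erf had to be written by hand for Vc types, one common option is a polynomial approximation built only from arithmetic, exp and a sign fix-up, so the same expression can be evaluated lane by lane. The sketch below uses the Abramowitz and Stegun 7.1.26 formula (absolute error at most 1.5e-7) on plain floats; whether Krita uses this particular approximation is an assumption, and approxErf is a hypothetical name.

```cpp
#include <cmath>

// Abramowitz & Stegun 7.1.26, valid for x >= 0:
//   erf(x) ~= 1 - (a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5) exp(-x^2),
//   t = 1 / (1 + p x), absolute error <= 1.5e-7.
// Only arithmetic, exp and a sign fix-up are needed, so the same expression
// can be rewritten for SIMD types that lack a built-in erf.
template <typename T>
T approxErf(T x) {
    const T p = T(0.3275911);
    const T a1 = T(0.254829592);
    const T a2 = T(-0.284496736);
    const T a3 = T(1.421413741);
    const T a4 = T(-1.453152027);
    const T a5 = T(1.061405429);

    const T ax = std::abs(x);                  // evaluate on |x| ...
    const T t = T(1) / (T(1) + p * ax);
    // Horner form of a1 t + a2 t^2 + a3 t^3 + a4 t^4 + a5 t^5
    const T poly = t * (a1 + t * (a2 + t * (a3 + t * (a4 + t * a5))));
    const T y = T(1) - poly * std::exp(-ax * ax);
    return std::copysign(y, x);                // ... then use erf(-x) = -erf(x)
}
```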
Code study and implementation of the Gauss Mask generator.
Phabricator task: Implement Circular Gauss Mask Optim AVX
Vc creates code from templates tailored to each processor instruction set: AVX, AVX2, SSE4.1, SSSE3, SSE2, and scalar. So first, a template must be declared to manage the creation of the code for each instruction set. Using the vectorized Default Mask implementation as a guideline, studying how the generated code is constructed allowed me to extend it to the other MaskGenerators.
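The dispatch idea can be sketched in standard C++ without Vc: one class template instantiated per instruction set, plus a factory that picks the widest supported one at run time. The enum, the class names and the isSupported callback are hypothetical stand-ins; in a real Vc build the callback would be a runtime CPU check and each instantiation would be compiled with its own instruction-set flags.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical labels for the instruction sets the generator is built for.
enum class Impl { Scalar, SSE2, SSSE3, SSE41, AVX, AVX2 };

constexpr const char* implName(Impl impl) {
    switch (impl) {
        case Impl::Scalar: return "scalar";
        case Impl::SSE2:   return "SSE2";
        case Impl::SSSE3:  return "SSSE3";
        case Impl::SSE41:  return "SSE4.1";
        case Impl::AVX:    return "AVX";
        case Impl::AVX2:   return "AVX2";
    }
    return "unknown";
}

// Common interface the rest of the program uses.
struct MaskGenerator {
    virtual ~MaskGenerator() = default;
    virtual std::string name() const = 0;
};

// One class template, instantiated once per instruction set. In a Vc build the
// vectorized mask code would live here.
template <Impl I>
struct OptimizedMaskGenerator final : MaskGenerator {
    std::string name() const override { return implName(I); }
};

// Picks the widest supported implementation, falling back to scalar code.
// `isSupported` is a placeholder for a runtime CPU feature check.
std::unique_ptr<MaskGenerator> createOptimized(const std::function<bool(Impl)>& isSupported) {
    if (isSupported(Impl::AVX2))  return std::make_unique<OptimizedMaskGenerator<Impl::AVX2>>();
    if (isSupported(Impl::AVX))   return std::make_unique<OptimizedMaskGenerator<Impl::AVX>>();
    if (isSupported(Impl::SSE41)) return std::make_unique<OptimizedMaskGenerator<Impl::SSE41>>();
    if (isSupported(Impl::SSSE3)) return std::make_unique<OptimizedMaskGenerator<Impl::SSSE3>>();
    if (isSupported(Impl::SSE2))  return std::make_unique<OptimizedMaskGenerator<Impl::SSE2>>();
    return std::make_unique<OptimizedMaskGenerator<Impl::Scalar>>();
}
```

Ordering the checks from the widest registers to the narrowest means newer CPUs automatically get the fastest path, while older ones still fall back to working scalar code.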
Hi! GSoC student here :]. These first weeks coding for Krita have been so busy that I forgot to write about them, so I’ll sum everything up in short posts about each step of the project implementation process.
First Steps, setting up a dev environment
I followed the instructions in the 3rdparty directory to compile the base Krita system on OSX. These easy-to-follow instructions helped me get a basic Krita installation in a short time. However, not everything went smoothly: most tests did not run at all on OSX, failing with the message:
QFATAL : FreehandStrokeBenchmark::testDefaultTip() Cannot calculate the bundle path from the app path
After some digging I found out that on OSX these GUI-based tests cannot run outside of an app bundle. So, while not future-proof, to start working on the code I made a quick script that installs the tests I’m interested in inside the Krita.app folder, which allows them to run. By default all tests are linked to the libraries in the build dir, but because this won’t work on OSX, one approach would be to also install the tests in the bundle and link them to the installed libraries; another approach could be to generate an app bundle for each test.
In any case, the tests could now run, so it was time to start working on the unit test.
Implementing Mask Similarity Test
Phabricator task: Base unit test kis_mask_similarity_test
The intention of this unit test is to check the correctness of the new vectorized mask rendering by comparing it against the mask produced by the previous engine with the same settings. The new versions have to be as close to identical as possible, to ensure that the painting effects the user expects do not change between engines (the user can’t change how the mask is produced, but we use the scalar version for smaller brush dab sizes).
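At its core, such a test is a tolerant pixel-by-pixel comparison. Below is a minimal sketch, assuming both engines render into 8-bit buffers of the same size; masksAreSimilar and both thresholds are hypothetical, and the actual kis_mask_similarity_test may compare the masks differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Compares the scalar and vectorized renderings of the same mask.
// A pixel counts as different when its values differ by more than `tolerance`;
// the masks are similar if at most `maxDifferentPixels` pixels differ.
bool masksAreSimilar(const std::vector<std::uint8_t>& scalarMask,
                     const std::vector<std::uint8_t>& vectorMask,
                     int tolerance, std::size_t maxDifferentPixels) {
    if (scalarMask.size() != vectorMask.size()) {
        return false;  // different dab sizes can never match
    }
    std::size_t different = 0;
    for (std::size_t i = 0; i < scalarMask.size(); ++i) {
        const int diff = std::abs(int(scalarMask[i]) - int(vectorMask[i]));
        if (diff > tolerance && ++different > maxDifferentPixels) {
            return false;  // stop early once the budget is exceeded
        }
    }
    return true;
}
```

A small tolerance absorbs rounding differences between the scalar and SIMD math (for example a hand-written erf), while the pixel budget still catches real shape changes.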
I can’t believe I was selected for the Google Summer of Code program to work on Krita. The project I’ll be working on this summer is optimizing Krita’s brush masks to work with AVX instructions. These instructions will be coded using the Vc library, “zero overhead C++ types for parallel computing”, which makes it possible to efficiently transform the mask generators’ code into SIMD instructions for vectorization.
Brush mask generation is a core part of painting, as it creates the shape that will be imprinted on the canvas. Depending on the brush settings, this can happen thousands of times per second. Optimizing it will greatly improve the painting experience by keeping brush strokes responsive at bigger sizes.