Thu 26 Mar 2009
Flash 10, Flex 3, and ActionScript 3 have their limitations. For one, the Flash 10 Player is not multithreaded nor does it take advantage of multiple CPUs. This is a limitation that can hopefully be eliminated in future versions of Flash. But for now, this means that if you create an ActionScript program and want users to be able to click buttons and interact with the user interface while other work (like rendering large images) is done at the same time, you have to break big tasks into little chunks and “simulate” multithreading. Nonetheless, you’ll still only ever get one CPU attacking a problem.
So if you want to do some serious calculations in Flash/Flex (which you often have to do to make anything really interesting happen), you will likely be disappointed by the performance.
Enter Adobe Pixel Bender. As the name suggests, this tool lets developers write programs (called kernels) to “bend” pixels to their will and have them do all sorts of crazy things. Pixel Bender lets you write fills, filters, and blenders that you can use inside Photoshop, After Effects, and Flash 10. Gaussian Blurs, Color Channel Swapping, Multiply Mode Image Blending, and a slew of fancy custom algorithms are pretty easy to implement with Pixel Bender.
The nice thing is that Pixel Bender doesn’t really care too much if the input is a JPEG image of flowers, a big matrix of MP3 sound file data, or (in the example below) the state of cells in a cellular automata. What matters is that Pixel Bender lets us take advantage of a little Flash loophole: Flash 10 is able to run filters/shaders/blenders multithreaded AND on multiple CPUs. So if you can figure out how to describe a problem in such a way as to allow Pixel Bender to work on it, you can stand to speed up the computations by a factor of 2 to 10 or more.
The program below demonstrates some speed gain. It implements Conway’s Game of Life. In each image, white pixels are “alive” and black pixels are “dead”. One tab runs the program in pure ActionScript. The second tab runs the algorithm using a Pixel Bender kernel. Run each one and see how long it takes to execute 25 generations. There are two additional options:
Refresh Display: When unchecked, the field will not be redrawn each generation. This allows the update algorithm to run as quickly as possible, without the expense of drawing the image each time.
No Conditionals: The most basic Game of Life algorithm uses a series of if-then-else conditionals to decide what the next state of a cell should be. For example, if the cell is alive and has more than three living neighbors, the cell should die. As my friend Robert Sundling pointed out to me, these conditionals cause a problem in Pixel Bender. When the conditionals are compiled they intersperse the code with jump instructions. These jump instructions create multiple code paths and limit the processors ability to utilize SSE instructions and effectively process several pixels at once. (EDIT NOTE: I didn’t quite make the change my friend Bob had suggested. I still need to change my Pixel Bender code to utilize float4 rather than floats. Then those operations can run on all four floats at once. Will do that soon. So the rest of this paragraph doesn’t make a ton of sense. But it’s still faster, so I’m leaving it for now.) To address this, the simple if-then-else-based Game of Life algorithm was replaced with one that does not use conditionals, just mathematics. For symmetry, the AS3 algorithm was also rewritten without conditionals so it would use the same mathematics (namely, absolute value, multiplication, addition). However, although Pixel Bender can execute an absolute value calculation quite quickly, the same cannot be said for AS3. The Math.abs(x) function is slow compared to a simple (x<0) ? -1*x : x statement. So, to avoid needlessly penalizing the AS3 code, I used the quicker version. You will note that the AS3 code without conditionals is a bit slower than the AS3 with them. I suspect this is mainly due to the fact that the conditionals prevent large chunks of code from being run depending on the pixel. This saving cannot make up for the speed boost realized by Pixel Bender even though it executes all code for every pixel.
I tried to be sure that the algorithm was implemented the same way in both ActionScript and Pixel Bender (aside from the exceptions regarding conditionals in place of Math library calls) so that the performance comparison is valid. However, I did not attempt to optimize the algorithm overall. For example, each pixel (a 32 bit value) represents a single cell that is either living or dead. Clearly, a more efficient algorithm could store the state of many cells in a single integer. It would also be better to use a lookup table for determining the next state for 4×4 (or larger) regions, rather than using a series of if-then-else statements or math functions for each pixel. There are many ways one could make a Game of Life simulation much faster, even before considering Pixel Bender, but this was not the goal for this experiment.
Also, this is a 768×768 simulation. Since Pixel Bender helps most by speeding up access to pixels and running kernels in parallel on multiple CPUs, running the simulation on larger grids for fewer generations will show a larger speed gain than running on tiny grids for many generations (even though the total number of pixels “touched” in each case is the same).
Here it is:
View Source has been enabled and you can see both the ActionScript and Pixel Bender code. I ran this application on three of my computers (all Firefox with non-debugging Flash Player, Refresh Display option turned off so UI redraws wouldn’t be counted in the run times, with both conditional and non-conditional versions of the algorithm) and saw the following results after averaging 10 runs of 25 generations each:
||% Improvement||AS3 (no cond)||PB (no cond)||% Improvement|
|Intel Core 2 6600 @ 2.4GHz (2 CPU)||2.92||2.30||27.0%||3.54||0.96||269%|
|Intel T2500 @ 2.0GHz (2 CPU)||4.28||3.44||24.4%||5.20||1.96||165%|
|Intel P4 @ 2.4GHz (1 CPU)||8.22||17.88||-54.0%||9.25||4.72||96.0%|
Some interesting results here:
- The greatest gain is for faster multicore computers when effort is made to remove conditional branching from the kernel code (don’t use if statements), even though this might mean executing more instructions overall in the kernel.
- Removing conditional branching from AS3 code can easily be counter-productive. If the conditionals allow you to skip over large chunks of code, they are likely worth keeping.
- On slower, single core computers, using Pixel Bender unwisely can hurt you rather than help. Since you don’t benefit from leveraging multiple cores, the overhead of using Pixel Bender can kill you.
- However, even on slower machines, if you can strip out conditionals from your kernel, you can see dramatic improvements in execution time.