Flex/Flash/ActionScript


Flash 10, Flex 3, and ActionScript 3 have their limitations. For one, the Flash 10 Player is not multithreaded nor does it take advantage of multiple CPUs. This is a limitation that can hopefully be eliminated in future versions of Flash. But for now, this means that if you create an ActionScript program and want users to be able to click buttons and interact with the user interface while other work (like rendering large images) is done at the same time, you have to break big tasks into little chunks and “simulate” multithreading. Nonetheless, you’ll still only ever get one CPU attacking a problem.
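For example, one common way to fake it (a rough sketch, assuming the code lives in a Flex component; doWorkItem() and TOTAL_ITEMS are hypothetical placeholders for your own task) is to process a small slice of the job on each frame so the UI stays responsive in between:

import flash.events.Event;

private const CHUNK_SIZE:int = 500;       // tune so each slice fits comfortably within one frame
private var nextIndex:int = 0;

private function startBigJob():void {
    nextIndex = 0;
    addEventListener(Event.ENTER_FRAME, onFrame);
}

private function onFrame(e:Event):void {
    // Do a small slice of the work, then yield so clicks and redraws can be handled.
    var end:int = Math.min(nextIndex + CHUNK_SIZE, TOTAL_ITEMS);
    for (var i:int = nextIndex; i < end; i++) {
        doWorkItem(i);                    // one small piece of the big task (placeholder)
    }
    nextIndex = end;
    if (nextIndex >= TOTAL_ITEMS) {
        removeEventListener(Event.ENTER_FRAME, onFrame);
    }
}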

So if you want to do some serious calculations in Flash/Flex (which you often have to do to make anything really interesting happen), you will likely be disappointed by the performance.

Enter Adobe Pixel Bender. As the name suggests, this tool lets developers write programs (called kernels) to “bend” pixels to their will and have them do all sorts of crazy things. Pixel Bender lets you write fills, filters, and blenders that you can use inside Photoshop, After Effects, and Flash 10. Gaussian Blurs, Color Channel Swapping, Multiply Mode Image Blending, and a slew of fancy custom algorithms are pretty easy to implement with Pixel Bender.
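To give a flavor of how one of these kernels gets used in Flash (a hedged sketch, assuming a Flex component class; the file name myFilter.pbj and the parameter name amount are made up for illustration), a compiled kernel can be embedded and applied as a filter:

import flash.display.DisplayObject;
import flash.display.Shader;
import flash.filters.ShaderFilter;
import flash.utils.ByteArray;

[Embed(source="myFilter.pbj", mimeType="application/octet-stream")]  // a compiled Pixel Bender kernel
private var FilterKernel:Class;

private function applyKernel(target:DisplayObject):void {
    var shader:Shader = new Shader(new FilterKernel() as ByteArray);
    // shader.data.amount.value = [0.5];   // set kernel parameters here, if the kernel defines any
    target.filters = [new ShaderFilter(shader)];
}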

The nice thing is that Pixel Bender doesn’t really care too much if the input is a JPEG image of flowers, a big matrix of MP3 sound file data, or (in the example below) the state of cells in a cellular automaton. What matters is that Pixel Bender lets us take advantage of a little Flash loophole: Flash 10 is able to run filters/shaders/blenders multithreaded AND on multiple CPUs. So if you can figure out how to describe a problem in such a way as to allow Pixel Bender to work on it, you stand to speed up the computations by a factor of 2 to 10 or more.
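In ActionScript, that loophole is exposed through the ShaderJob API, which runs a kernel over a BitmapData (or a Vector of Numbers) asynchronously. A rough sketch, assuming a Flex class and a kernel whose input is named src (the file name lifeKernel.pbj and the name src are illustrative, not taken from the demo's actual source):

import flash.display.BitmapData;
import flash.display.Shader;
import flash.display.ShaderJob;
import flash.events.ShaderEvent;
import flash.utils.ByteArray;

[Embed(source="lifeKernel.pbj", mimeType="application/octet-stream")]  // hypothetical kernel file
private var LifeKernel:Class;

private function runGeneration(current:BitmapData):void {
    var shader:Shader = new Shader(new LifeKernel() as ByteArray);
    shader.data.src.input = current;                 // "src" is whatever the kernel names its input

    var next:BitmapData = new BitmapData(current.width, current.height, false, 0x000000);
    var job:ShaderJob = new ShaderJob(shader, next);

    job.addEventListener(ShaderEvent.COMPLETE, function(e:ShaderEvent):void {
        // "next" now holds the kernel's output, computed off the main rendering path
    });
    job.start();   // start(true) would instead block until the job finishes
}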

The program below demonstrates some speed gain. It implements Conway’s Game of Life. In each image, white pixels are “alive” and black pixels are “dead”. One tab runs the program in pure ActionScript. The second tab runs the algorithm using a Pixel Bender kernel. Run each one and see how long it takes to execute 25 generations. There are two additional options:

Refresh Display: When unchecked, the field will not be redrawn each generation. This allows the update algorithm to run as quickly as possible, without the expense of drawing the image each time.

No Conditionals: The most basic Game of Life algorithm uses a series of if-then-else conditionals to decide what the next state of a cell should be. For example, if a cell is alive and has more than three living neighbors, the cell should die. As my friend Robert Sundling pointed out to me, these conditionals cause a problem in Pixel Bender: when they are compiled, they intersperse the code with jump instructions. These jumps create multiple code paths and limit the processor’s ability to use SSE instructions and effectively process several pixels at once. (EDIT NOTE: I didn’t quite make the change my friend Bob had suggested. I still need to change my Pixel Bender code to use float4 rather than floats so those operations can run on all four floats at once. I will do that soon. So the rest of this paragraph doesn’t make a ton of sense. But it’s still faster, so I’m leaving it for now.) To address this, the simple if-then-else-based Game of Life algorithm was replaced with one that uses no conditionals, just mathematics. For symmetry, the AS3 algorithm was also rewritten without conditionals so it would use the same mathematics (namely absolute value, multiplication, and addition). However, although Pixel Bender can execute an absolute value calculation quite quickly, the same cannot be said for AS3: the Math.abs(x) function is slow compared to a simple (x<0) ? -1*x : x expression, so to avoid needlessly penalizing the AS3 code, I used the quicker version. You will note that the AS3 code without conditionals is a bit slower than the AS3 code with them. I suspect this is mainly because the conditionals let large chunks of code be skipped, depending on the pixel. That saving, however, cannot make up for the speed boost realized by Pixel Bender, even though Pixel Bender executes all of the code for every pixel.
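To make “just mathematics” concrete, here is my own sketch of one common branch-free formulation (not necessarily the demo’s exact formula, and written with Math.abs for readability even though, as noted above, the demo’s AS3 swaps in a ternary for speed):

// state is 0 (dead) or 1 (alive); neighbors is the number of living neighbors (0-8).
// For integers, max(0, 1 - |n - k|) equals 1 exactly when n == k and 0 otherwise,
// and the two terms below are mutually exclusive, so their sum is the next state.
function nextState(state:int, neighbors:int):int {
    var isThree:int = Math.max(0, 1 - Math.abs(neighbors - 3));   // 1 iff exactly 3 neighbors
    var isTwo:int   = Math.max(0, 1 - Math.abs(neighbors - 2));   // 1 iff exactly 2 neighbors
    return isThree + state * isTwo;   // born with 3, survives with 2 or 3, dies otherwise
}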

I tried to be sure that the algorithm was implemented the same way in both ActionScript and Pixel Bender (aside from the ternary used in place of the Math.abs() call, as noted above) so that the performance comparison is valid. However, I did not attempt to optimize the algorithm overall. For example, each pixel (a 32-bit value) represents a single cell that is either living or dead; a more efficient implementation could store the states of many cells in a single integer. It would also be better to use a lookup table to determine the next states of 4×4 (or larger) regions, rather than evaluating a series of if-then-else statements or math functions for each pixel. There are many ways to make a Game of Life simulation much faster, even before considering Pixel Bender, but that was not the goal of this experiment.
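Just to illustrate the packing idea (a sketch of the optimization described above, not something this demo actually does), 32 cells can share one uint by giving each cell a single bit:

// Bit i of "row" holds the state of cell i: 1 = alive, 0 = dead.
function setCell(row:uint, i:int, alive:Boolean):uint {
    return alive ? (row | (1 << i)) : (row & ~(1 << i));
}

function isAlive(row:uint, i:int):Boolean {
    return ((row >> i) & 1) == 1;
}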

Also, this is a 768×768 simulation. Since Pixel Bender helps most by speeding up access to pixels and running kernels in parallel on multiple CPUs, running the simulation on larger grids for fewer generations will show a larger speed gain than running on tiny grids for many generations (even though the total number of pixels “touched” in each case is the same).

Here it is:


Game Of Life: ActionScript vs Pixel Bender (open in new window)

View Source has been enabled and you can see both the ActionScript and Pixel Bender code. I ran this application on three of my computers (all Firefox with non-debugging Flash Player, Refresh Display option turned off so UI redraws wouldn’t be counted in the run times, with both conditional and non-conditional versions of the algorithm) and saw the following results after averaging 10 runs of 25 generations each:

CPU                                 AS3 (cond)  PB (cond)  % Improvement  AS3 (no cond)  PB (no cond)  % Improvement
Intel Core 2 6600 @ 2.4GHz (2 CPU)     2.92       2.30        27.0%           3.54           0.96          269%
Intel T2500 @ 2.0GHz (2 CPU)           4.28       3.44        24.4%           5.20           1.96          165%
Intel P4 @ 2.4GHz (1 CPU)              8.22      17.88       -54.0%           9.25           4.72          96.0%

Some interesting results here:

  1. The greatest gain is for faster multicore computers when effort is made to remove conditional branching from the kernel code (don’t use if statements), even though this might mean executing more instructions overall in the kernel.
  2. Removing conditional branching from AS3 code can easily be counter-productive. If the conditionals allow you to skip over large chunks of code, they are likely worth keeping.
  3. On slower, single core computers, using Pixel Bender unwisely can hurt you rather than help. Since you don’t benefit from leveraging multiple cores, the overhead of using Pixel Bender can kill you.
  4. However, even on slower machines, if you can strip out conditionals from your kernel, you can see dramatic improvements in execution time.

Introduction

The following application generates a single “flawless” composite image by ignoring transient elements detected in multiple images and stitching together the remaining static features.

For example, suppose you want to capture a photo of a popular beach or city street but do not want crowds of random people in the image. Unless you can arrange for the beach or boulevard to be deserted during the shoot, you are likely going to have to retouch the photo to remove the unwanted clutter. You could use the manual process described at dsphotographic.com to mask out flaws or, if you happen to have Photoshop CS3 or CS4 Extended, the more automated “image stack” method described at aecbytes.com. Another option is to use the free tool below…

Instructions

  1. Serious users should Run Photo Decrapifier in a new window (useful since it can be resized).
  2. Enter up to five URLs for images (the images must be online; you cannot upload from your PC through this version of the application). The images must be the same size and should be taken of the same scene, separated by enough time so that moving elements shift position substantially. Photos should ideally be taken using a tripod or this cheap substitute; this version does not correct for image offset due to jitter. If necessary, align the images to each other as best as you can in Photoshop (or whatever) first. The URLs of the five images shown above are set as default.
  3. Click the “Load” button. The images should be visible in the preview panel.
  4. Adjust the settings (explained below). For good results, just accept the defaults (leave the settings alone).
  5. Click the “Process” button. The program will analyze the images, extract static features, and composite them to a new image.
  6. Continue reading below for an explanation of the algorithms used in more detail.

(Thumbnails of the five example input images)

Run Photo Decrapifier in a new window (useful since it can be resized)

Behind the Scenes

In this version, there are three algorithms: Median, Modified Median, and Cluster. Median was the first method I tried, simply because it was the simplest to implement. The median algorithm looks at corresponding pixels from each image and calculates the RGB distance of each pixel from black (0x000000). It then sorts the pixels by distance and chooses the middle one (the median). The following simplified example with non-RGB color values illustrates the algorithm.

Example 1:  Three desired/static and two undesired/transient pixels

Pixel color values: 20, 25, 30, 250, 250
Mean Pixel: 115 (bad)
Median Pixel: 30 (good)

Example 2:  Two desired/static and three undesired/transient pixels

Pixel color values: 20, 25, 100, 175, 250
Mean Pixel: 114 (bad)
Median Pixel: 100 (bad)

Notice that the median pixel algorithm can usually provide a good answer as long as there are more desired pixels than undesired pixels in the set. It can still fail, though, when two very different colors (one desired and one undesired) happen to be nearly the same RGB distance from black.
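A minimal sketch of the median step for one pixel position (my own illustration; the function and variable names are made up, and this is not the application’s actual source):

// pixels: the corresponding pixel (as a 0xRRGGBB uint) from each loaded image.
// Sort by RGB distance from black and take the middle element.
function medianPixel(pixels:Array):uint {
    var sorted:Array = pixels.concat();   // copy so the caller's array is left alone
    sorted.sort(function(a:uint, b:uint):Number {
        return distanceFromBlack(a) - distanceFromBlack(b);
    });
    return sorted[int(sorted.length / 2)];
}

function distanceFromBlack(rgb:uint):Number {
    var r:uint = (rgb >> 16) & 0xFF;
    var g:uint = (rgb >> 8) & 0xFF;
    var b:uint = rgb & 0xFF;
    return Math.sqrt(r * r + g * g + b * b);   // Euclidean length of the RGB vector
}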

The modified median algorithm starts by calculating an average (mean) RGB value for the set of pixels. It then calculates the distance of each pixel from that average value and chooses the pixel closest to it. Again, the results are mediocre at best. As long as there are sufficiently more static than transient pixels, it’s fine, but as this method tends to err toward choosing the pixel closest to 50% gray, it is easy for undesired pixels at the edges of transient features (where antialiasing and blending occur) to be chosen. It can also suffer when very bright or dark pixels shift the value of the mean far away from the value of the static pixels. Both median and modified median algorithms fail easily when there are fewer total static pixels than transient ones.
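Roughly, the modified median step looks like this (again my own sketch, not the application’s source):

// Pick the input pixel whose color is closest (in RGB space) to the mean of all inputs.
function modifiedMedianPixel(pixels:Array):uint {
    var rMean:Number = 0, gMean:Number = 0, bMean:Number = 0;
    for each (var p:uint in pixels) {
        rMean += (p >> 16) & 0xFF;
        gMean += (p >> 8) & 0xFF;
        bMean += p & 0xFF;
    }
    rMean /= pixels.length;  gMean /= pixels.length;  bMean /= pixels.length;

    var best:uint = pixels[0];
    var bestDist:Number = Number.MAX_VALUE;
    for each (var q:uint in pixels) {
        var dr:Number = ((q >> 16) & 0xFF) - rMean;
        var dg:Number = ((q >> 8) & 0xFF) - gMean;
        var db:Number = (q & 0xFF) - bMean;
        var d:Number = dr * dr + dg * dg + db * db;   // squared distance is enough for comparison
        if (d < bestDist) { bestDist = d; best = q; }
    }
    return best;
}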

To solve these deficiencies, I implemented a straightforward agglomerative hierarchical clustering algorithm. First, the RGB distances between each pair of image pixels are calculated. Then the two pixels nearest each other in RGB space are grouped into a cluster. The color of the cluster is updated to be the average of the colors of the pixels in it, and the distances from the other points to the cluster are updated. Next, the smallest remaining distance between a pair is chosen; this could be the distance between two lone pixels, between an existing cluster and a lone pixel, or between two clusters. That pair is merged in the same way. The process repeats until no two pixels (or clusters) are closer together than a threshold distance.

The algorithm then looks for the largest single cluster, which we assume will contain the static pixels, and draws the average color of that cluster. Note that it is no longer necessary for there to be more total static pixels than transient pixels; it is only necessary that the cluster of static pixels be larger than the next largest cluster, as revisiting Example 2 demonstrates:

Example 2:  Two desired/static and three undesired/transient pixels

Pixel color values: 20, 25, 100, 175, 250
Mean Pixel: 114 (bad)
Median Pixel: 100 (bad)
Clusters: (20, 25), (100), (175), (250)
Cluster Pixel: 22.5 (the average of 20 and 25) GOOD!

The cluster algorithm returns a much better color value than either the median or modified median algorithm. As long as the threshold value chosen is larger than 5 and smaller than 75, the static pixels will belong to the largest cluster, even though there are more total transient pixels. This is therefore a more robust algorithm.
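Here is a simplified sketch of the clustering step, using single-channel values as in the example above (the real application clusters full RGB colors; this is my own illustration, not its source):

// Simplified agglomerative clustering over scalar "color" values, as in the example above.
// Clusters start as single values and are repeatedly merged while the two closest
// cluster averages are within the threshold. Returns the average of the largest cluster.
function clusterPixel(values:Array, threshold:Number):Number {
    var clusters:Array = [];                       // each cluster: { sum: Number, count: int }
    for each (var v:Number in values) {
        clusters.push({ sum: v, count: 1 });
    }

    while (clusters.length > 1) {
        // Find the two clusters whose averages are closest together.
        var bestI:int = -1, bestJ:int = -1;
        var bestDist:Number = Number.MAX_VALUE;
        for (var i:int = 0; i < clusters.length; i++) {
            for (var j:int = i + 1; j < clusters.length; j++) {
                var d:Number = Math.abs(clusters[i].sum / clusters[i].count
                                      - clusters[j].sum / clusters[j].count);
                if (d < bestDist) { bestDist = d; bestI = i; bestJ = j; }
            }
        }
        if (bestDist > threshold) break;           // nothing close enough left to merge

        // Merge cluster j into cluster i.
        clusters[bestI].sum   += clusters[bestJ].sum;
        clusters[bestI].count += clusters[bestJ].count;
        clusters.splice(bestJ, 1);
    }

    // Keep the largest cluster and return its average color.
    var biggest:Object = clusters[0];
    for each (var c:Object in clusters) {
        if (c.count > biggest.count) biggest = c;
    }
    return biggest.sum / biggest.count;
}

// clusterPixel([20, 25, 100, 175, 250], 50) returns 22.5, matching Example 2 above.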

Settings

Distance Metric: Choose either the Manhattan distance or the 3D Euclidean distance (both are sketched in code at the end of this section).

Algorithm: Choose Median, Modified Median, or Cluster.

Threshold: The percent of the maximum distance (the distance between white and black) beyond which two points will not be considered as being part of the same cluster. Small threshold values will result in many clusters with only a single pixel. Large values will create large clusters (possibly one cluster), and provide an average of both static and transient pixels (the result will simply be a blending of all images).

Show Unclustered: If checked, any pixel that was not part of a cluster of two or more pixels is drawn in pure blue. This is to help you choose the optimal threshold.
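As mentioned under Distance Metric, the two metrics amount to the following for a pair of 0xRRGGBB colors (a small illustration of the formulas, not the application’s source):

// Manhattan distance: sum of the per-channel differences.
function manhattanDistance(a:uint, b:uint):Number {
    return Math.abs(((a >> 16) & 0xFF) - ((b >> 16) & 0xFF))
         + Math.abs(((a >> 8) & 0xFF) - ((b >> 8) & 0xFF))
         + Math.abs((a & 0xFF) - (b & 0xFF));
}

// 3D Euclidean distance: straight-line distance in RGB space.
function euclideanDistance(a:uint, b:uint):Number {
    var dr:Number = ((a >> 16) & 0xFF) - ((b >> 16) & 0xFF);
    var dg:Number = ((a >> 8) & 0xFF) - ((b >> 8) & 0xFF);
    var db:Number = (a & 0xFF) - (b & 0xFF);
    return Math.sqrt(dr * dr + dg * dg + db * db);
}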

Possible Future Improvements

  1. Try color spaces other than RGB, such as YUV, which are theoretically better tuned to a person’s subjective perception of color.
  2. Use a different clustering algorithm such as spectral clustering or DBSCAN.
  3. Add intelligence that takes into account the clustering choices made for pixels neighboring the one currently being evaluated. Use neighbor information to reduce discontinuities or act as a prior for choosing cluster points.

I was having a hard time drawing a BitmapData object (or Bitmap) onto a Flex canvas. Since neither of these classes is a UIComponent, they cannot be added directly as children of a Canvas with the addChild() method; you’ll get a runtime error. After poking around the web, I quickly found a suggestion to simply add the Bitmap to a UIComponent such as an Image and then add that to the Canvas. Like this:

var myBitmap:Bitmap = new Bitmap(myBitmapData);   // wrap the BitmapData in a Bitmap display object
var myImage:Image = new Image();                  // mx.controls.Image is a UIComponent
myImage.addChild(myBitmap);                       // add the Bitmap as a raw display-list child
myCanvas.addChild(myImage);

So this method will get your Bitmap/BitmapData onto the canvas. But then we have a new problem: if the image is larger than the canvas, the image will extend outside the limits of the canvas, even if the canvas has scrollbars enabled (the large image does not even activate the canvas scrollbars). This was no good for me. I wanted to be able to scroll the canvas to view the entire image.

The solution that worked better than the addChild() approach above was to instead set the Image’s source property to the Bitmap. Like this:

var myBitmap:Bitmap = new Bitmap(myBitmapData);
var myImage:Image = new Image();
myImage.source = myBitmap;     // let the Image treat the Bitmap as its content
myCanvas.addChild(myImage);    // the Canvas now scrolls when the image is larger than it

Presto! The BitmapData is displayed in the canvas and we get nice scrollbars.
