From the Lab
Building a Slow Cumulative Range Geodesic Filter - Part 2
Bernard Llanos — June 18, 2013 - 2:54pm
Filter Mask Generation Using a Binary Heap
Seeking to improve the speed of the cumulative range geodesic filtering program described in my last blog entry, I started by examining code for Dijkstra's algorithm written by David Gleich (downloaded from http://www.mathworks.com/matlabcentral/fileexchange/24134-gaimc-graph-algorithms-in-matlab-code ). I had been using this code to produce data with which I could compare my results to determine if my program was working correctly. David Gleich's code, which uses a binary heap, ran significantly faster than my program.
Using David Gleich's program and some general information on binary heaps as guides, I substituted the Fibonacci heap with a binary heap. The version which used a Fibonacci heap processed an image at a rate of around one pixel per second, while the version with a binary heap, following some initial changes to improve its efficiency, reached a speed of about 50 pixels per second. After working much harder on optimizing the code, the filter ran at about 350 pixels per second.
Unfortunately, the filter was still too slow to be useful. However, I learned many tricks for speeding up MATLAB code, which I will summarize here for future reference:
MATLAB Code Efficiency Tips
-
MATLAB's help system has pages on efficient MATLAB programming, which are especially useful for anyone with experience programming in other languages, but who is not familiar with MATLAB.
-
An article by Pascal Getreuer, available at http://www.getreuer.info/tutorials goes beyond the MATLAB documentation to provide many more useful tips. I disagree, however, with his advice on Page 29 to replace if-statements with the min and max functions when bounding values within a range, based on timing many iterations of both methods using the tic and toc functions. Perhaps his assertion was correct, but was later invalidated by changes between MATLAB releases. This suggests that very detailed optimization of code may actually be unwarranted, as it could reduce the portability of code between MATLAB versions.
-
When learning to use MATLAB, it is nice to make use of multiple indices when accessing elements of multidimensional arrays because it is more readable. However, MATLAB only pretends that arrays are multidimensional and probably converts multiple indices to linear (single) indices before accessing data. Code runs slightly faster if it uses linear indexing where possible. However, I found that initializing arrays as explicitly one-dimensional instead of multidimensional will not necessarily make the program more efficient.
-
Note that arrays are stored in column-major order. Iterating down the columns of an array should, in theory, result in faster execution than iterating across the rows of an array, because column elements are stored in adjacent memory locations. (As mentioned by Pascal Getreuer in his article referenced above)
-
-
It appears to be faster to re-calculate an intermediate value several times (within reason) than to store it as a variable and access that variable several times when calculating final results.
-
MATLAB uses 'double' as a default type for most values. From running my own test code, I found that initialization and retrieval of 'double' data, as well as arithmetic operations on 'double' data, seem to run faster than for single-precision or integer-type data.
-
I like MATLAB's treatment of global variables, wherein variables are only global to the functions in which they are explicitly declared as global. It is as though global variables can live in "parallel universes". Unfortunately, using global variables slows down program execution.
-
In my filtering program I had several functions which needed access to the same image data. From reading MATLAB's help documentation, I had initially found that MATLAB does not pass arguments by reference. Concerned about passing large image data arrays by value, I decided to write a subclass of the "handle" class to hold the image data. MATLAB's handle class allows for objects to be passed to functions without having their data copied. However, I later discovered a MATLAB help page on "Memory Allocation" which provided more detail on how MATLAB passes parameters to functions:
-
Arrays are passed to functions as pointers, which is similar to other langauges (C, for example).
-
However, whenever a function attempts to modify any values in an array, MATLAB copies the entire array. The function now has a local copy of the array containing any modifications.
-
In order to have a function persistently modify an array, the modified array must be returned by the function and used to replace the original array in the calling program.
-
Note that nested functions are an exception to the above rules, as they can directly modify arrays in the workspaces of their parent functions which they are not explicitly passed. Therefore, they can avoid making copies of arrays.
-
Modifying my code to remove all object-oriented aspects, so that arrays were passed directly to functions, resulted in a 60% increase in speed.
-