Archive for the Category Algorithms

 
 

Rhombo

Over the past couple of weeks I wrote some code in C# to generate dissections of the rhombic triacontahedron into golden rhombohedrons. George Hart discusses these types of dissections here and also talks about the problem of enumerating them in an appendix here. Briefly, all this material by Hart and others is about how the fact that the rhombic triacontahedron and the rhombic enneacontahedron are zonohedra leads to both having interesting combinatorial properties which can be explored by coloring their dissections.

I was, however, more interested in how such dissections could be turned into an interlocking puzzle, akin to a traditional burr puzzle, and as such needed code to generate 3D models of the dissections. My generation code is a dumb, constructive, brute force approach in which I just traverse the search space, adding rhombohedrons to a candidate dissection in progress and backtracking when reaching a state in which it is impossible to add a rhombohedron without intersecting one that was already added or the containing triacontahedron, keeping track of configurations that have already been explored.

Dissections of the rhombic triacontahedron into golden rhombohedrons (hereafter “blocks”) turn out to always need 10 each of the two types of blocks that Hart refers to in the above as the “pointy” and “flat” varieties (and which I refer to as yellow and blue). Further, it turns out that in all of these dissections there are four blocks that are completely internal, i.e. sharing no face with the triacontahedron; I also believe that the four internal blocks are always three blue and one yellow, but I’m not sure about that.

My strategy for finding an interlocking puzzle was the following:

  • Generate a bunch of raw dissections into blocks
  • For each dissection, search the adjacency graph for four pieces, each the union of a set of five blocks, such that
    • Each piece forms a simple path in the dissection; that is, each block in the piece
      • is either an end block that is face adjacent to a next or previous block in the piece or is a non-end block that is face adjacent to a next block and a previous block.
      • and does not share any edges with other blocks in the piece except for the edges of the face adjacencies.
    • Each piece contains at least one fully internal block.
    • Each piece is “single axis disentangle-able” from each other piece, by which we mean that, given piece p1 and piece p2, there exists some edge e in the complete construction such that if you offset p1 in the direction of e by a small amount, p1 does not intersect p2.
    • Each piece is not single axis disentangle-able from the union of the other three pieces.
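
The simple-path condition in the list above can be sketched in Python as a search over the face-adjacency graph. This is only an illustration, not the C# code used for Rhombo: the toy graphs are made up, `simple_paths` is a hypothetical helper, and the "no shared edges" test is approximated by the weaker requirement that non-consecutive blocks in a piece are not face adjacent.

```python
def simple_paths(adjacency, length):
    """Yield paths of `length` blocks in which only consecutive blocks touch.

    `adjacency` maps each block id to the set of block ids it shares a face with.
    Each path is reported once (not also in reverse), so `length` should be >= 2.
    """
    def extend(path):
        if len(path) == length:
            if path[0] < path[-1]:  # skip the reversed duplicate
                yield tuple(path)
            return
        for nxt in sorted(adjacency[path[-1]]):
            if nxt in path:
                continue
            # nxt may be adjacent to the current end block only,
            # never to an earlier block in the piece.
            if any(nxt in adjacency[block] for block in path[:-1]):
                continue
            yield from extend(path + [nxt])

    for start in sorted(adjacency):
        yield from extend([start])


# Toy graph (hypothetical): six blocks arranged in a ring.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
# Toy graph (hypothetical): three mutually adjacent blocks.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}

ring_paths = list(simple_paths(ring, 3))
triangle_paths = list(simple_paths(triangle, 3))
```

In the ring every run of three consecutive blocks qualifies, while the triangle yields nothing because its end blocks always touch each other.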

I never managed to do a complete enumeration, generating all of the dissections, for reasons that I don’t feel like going into. (As I said above, I did not do anything fancy, and it would be easier to just be smarter about how I do the generation than to make what I have more efficient; i.e. I could have done the George Hart algorithm if I had known about it, or there are ways of transforming one dissection into another that I don’t use. I do an exhaustive search, period. But I never did the smarter stuff because I found what I was looking for; see below.)

But from about 10 dissections I found one set of pieces that uniquely satisfies all of the above:

Here’s some video. (The individual blocks were 3D printed and super glued together)

I’m calling the above “rhombo”. Those pieces are rough because I only 3D printed the individual rhombohedrons and then superglued them together into the pieces, which is imprecise. I had to sand them heavily to get them to behave  nicely. I’ll eventually put full piece models up on Shapeways.

In the course of doing this work, it became apparent that there is no good computational geometry library for C# to use for something like this. There is one called Math.Net Numerics along with Math.Net Spatial that will get you vectors and matrices but not with all the convenience routines you’d expect to treat vectors like 3D points and so forth. What I ended up doing was extracting the vectors and matrices out of MonoGame and search-and-replacing “float” to “double” to get double precision. Here is that code on GitHub. I also included in there 3D line segment/line segment intersection code and 3D triangle/triangle intersection code which I transliterated to C#. The line segment intersection code came from Paul Bourke’s web site. And the triangle intersection code came from running Tomas Möller’s C code through just a C preprocessor to resolve all the macros and then transliterating the result to C#.

Mean Shift Segmentation in OpenCV

I’ve posted a new repository on GitHub for doing mean shift segmentation in C++ using OpenCV: see here.

OpenCV contains a mean shift filtering function and has a GPU, I think CUDA, implementation of mean shift segmentation. I didn’t evaluate the GPU implementation because I’m personally not interested in GPU for the project I am working on. I did take a look at turning cv::pyrMeanShiftFiltering(…) output into a segmentation but didn’t bother trying because pyrMeanShiftFiltering seems broken to me. This is my gut instinct — I can’t quantify it but basically I agree with this guy. The output just seems to not be as good as the output generated by codebases elsewhere online. I have no idea why … one interesting reason might be that OpenCV is doing mean shift on RGB rather than one of the color spaces that are supposed to be better at modeling human vision. Everybody always says to do things that involve treating colors as points in Euclidean space using L*a*b* or L*u*v* rather than RGB, but in practice, to be honest, it never seems to matter to me. Maybe this is an example of where it does. I don’t know but in any case cv::pyrMeanShiftFiltering in my opinion sucks.

The “elsewhere online” I mention above is the codebase of EDISON, “Edge Detection and Image SegmentatiON”, made freely available by Rutgers University’s “Robust Image Understanding Laboratory”. EDISON is a command line tool that parses a script specifying a sequence of computer vision operations that I wasn’t really interested in except for the part in which it does mean shift segmentation, as its mean shift output seems really good to me. What I have done is extracted the mean shift code, which was C, wrapped it thinly in C++, and ported it to use OpenCV types, e.g. cv::Mat, and OpenCV operations where possible. I also re-factored for concision and removed C-isms where possible, e.g. I replace naked memory allocations with std::vectors and so forth.

The most significant change coming out of this re-factoring work in terms of functionality and/or performance was replacing the EDISON codebase’s L*u*v*-to-RGB/RGB-to-L*u*v* conversion routines with OpenCV calls. This actually changes the output of this code relative to EDISON because OpenCV and EDISON give different L*u*v* values for the same image. I am not sure who is right or what the difference means, but OpenCV is an industry standard so I am erring on the side of OpenCV. Further, the segmentation this code outputs is in my opinion better than what results from EDISON’s L*u*v* routines, while performance is unchanged.

Below is some output:

Lifelike, or the Joy of Killing Time via Breeding Little Squiggles

A couple of weeks ago I got interested in this project but wanted full control of the code, wanted to know exactly what it is doing, wanted a bunch of features like the ability to import and export CA rules, and wanted to have the process not be seeded by cellular automata already featuring gliders (which the web app seems to be). To this end I have pushed a project to github that does something similar but, I hope, more transparently.

I’m calling it Lifelike Cellular Automata Breeder. It is a (C# WinForms) application in which given some settings a user can artificially select and breed cellular automata; i.e., it performs a genetic algorithm in which the user manually provides the fitness criteria interactively.

I decided I wanted to only allow a reproduction step in which I scramble together state tables in various ways, guessing that using “DNA” more complex than commensurate 2D tables of numbers wouldn’t work well for a genetic process in this case. I characterize CA rules as applying only relative to a given number of states and a given, what I call, “cell structure” and “neighborhood function”. Cell structure just means a lattice type and neighborhood, e.g. square, with four neighbors; square, with eight neighbors; hex, with six, etc. “Neighborhood function” is an arbitrary function that given the states of the n cells in some cell’s neighborhood returns an integer from 0 to r where r is dependent on the neighborhood function and possibly number of states. For example, Conway Life uses the neighborhood function I refer to as “alive cell count”, and for an n-cell neighborhood, r equals n because the greatest number of alive cells that can surround a cell is just the size of the neighborhood. If the user has selected s total states, the state tables will be s by r.
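
To make the state table idea concrete, here is a minimal Python sketch (not Lifelike's C# code) of one update step driven by a table and a neighborhood function, using Conway Life on a square, eight-neighbor lattice as the example. The names `step`, `alive_cell_count`, `MOORE` and `LIFE_TABLE` are made up for illustration, the grid wraps around at the edges, and I am assuming the table has one column for each neighborhood value 0 through r.

```python
# Offsets of the eight neighbors in a square lattice.
MOORE = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def alive_cell_count(neighbor_states):
    """Neighborhood function: the number of neighbors that are not dead."""
    return sum(1 for s in neighbor_states if s > 0)


def step(grid, table, neighborhood_fn=alive_cell_count, offsets=MOORE):
    """Return the next generation; new state = table[current state][fn(neighbors)]."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            neighbors = [grid[(r + dr) % rows][(c + dc) % cols] for dr, dc in offsets]
            new_row.append(table[grid[r][c]][neighborhood_fn(neighbors)])
        result.append(new_row)
    return result


# Conway Life as a 2-state table: columns are alive cell counts 0..8.
LIFE_TABLE = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0],  # dead cell: born with exactly 3 neighbors
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # alive cell: survives with 2 or 3 neighbors
]

blinker = [[0] * 5 for _ in range(5)]
for col in (1, 2, 3):
    blinker[2][col] = 1
after_one = step(blinker, LIFE_TABLE)
after_two = step(after_one, LIFE_TABLE)
```

Swapping in a different table, neighborhood function, or offset list changes the rule without touching `step`, which is the property that makes tables convenient "DNA".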

Lifelike works as follows

  1. The user selects a number of states, cellular structure, and neighborhood function and kicks off the genetic process.
  2. Lifelike sets the current generation to nil, where by “generation” we just mean a set of cellular automata that have been tagged with fitness values.
  3. While the user has not clicked the “go to the next generation” button,
    • If the current generation is nil, Lifelike randomly generates a cellular automaton, CA, from scratch by making an s by r state table filled with random numbers from 0 to s-1. (The random states are generated via a discrete distribution controllable by the user). If the generation is not nil, Lifelike selects a reproduction method requiring k parents, selects k parents from the current generation such that this selection is weighted on the fitness of the automata, generates CA using the reproduction method and parents, and then possibly selects a random mutation function and mutates CA, selecting the mutation function via a discrete distribution controllable by the user and applying it with a “temperature” controlled by the user.
    • Lifelike presents CA in a window.
    • The user either skips CA in which case it no longer plays a role in the algorithm or applies a fitness value to it and adds it to the next generation.
  4. When the user decides to go to next generation, the selections the user just made become the new parent generation and processing continues.
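
The reproduction part of the loop above can be sketched roughly as follows. This is a Python illustration under my own assumptions, not the reproduction or mutation functions Lifelike actually ships: parents are drawn with replacement, `crossover` copies each table entry from a randomly chosen parent, and `mutate` treats the “temperature” as a per-entry probability of replacing the value with a random state.

```python
import random


def select_parents(generation, k, rng):
    """Pick k parent tables, weighted by fitness. generation: list of (table, fitness)."""
    tables = [table for table, _ in generation]
    weights = [fitness for _, fitness in generation]
    return rng.choices(tables, weights=weights, k=k)


def crossover(parents, rng):
    """Build a child table by copying each entry from a randomly chosen parent."""
    rows, cols = len(parents[0]), len(parents[0][0])
    return [[rng.choice(parents)[r][c] for c in range(cols)] for r in range(rows)]


def mutate(table, num_states, temperature, rng):
    """Replace each entry with a random state with probability `temperature`."""
    return [
        [rng.randrange(num_states) if rng.random() < temperature else value for value in row]
        for row in table
    ]


rng = random.Random(0)
parent_a = [[0, 0, 0], [0, 0, 0]]
parent_b = [[1, 1, 1], [1, 1, 1]]
chosen = select_parents([(parent_a, 1.0), (parent_b, 0.0)], 2, rng)
child = crossover([parent_a, parent_b], rng)
unchanged = mutate(child, 3, 0.0, rng)
scrambled = mutate(child, 3, 1.0, rng)
```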

Here’s a video of me playing with an early version of the application.


Results, Musings, etc.

Briefly, Lifelike works.

You can produce interesting cellular automata with it and there is a weird feeling that is hard to describe when you first see a tiny glider wiggle across the screen; however, the way it works is somehow more mundane than I thought that it would be. I wonder if all genetic algorithms are like this. Most of what it produces is garbage. When you see something that isn’t garbage you can select for it. However Lifelike doesn’t do magic. It doesn’t magically find phenomena you want just because you have a fancy framework implemented to find such phenomena. For example, it is my belief at this point that there is no simple hex grid analog of Conway Life using alive cell count, a six cell neighborhood, and 2-states. I think you could probably prove the non-existence of gliders in this configuration but it would be a boring proof by exhaustion and running Lifelike on that configuration is boring as well. Just because you, the user, are a step in a “genetic algorithm” doesn’t somehow make it interesting.

The simple hex neighborhood negative result led me to ask the following question: What is the smallest change you can make to hex/6-cell/alive cell count/2 states to allow Conway Life-like behavior? If you google “Hex Life” you will find that it is well-known that interesting things can happen if you go to a 12-cell Star of David shaped neighborhood, but this seems inelegant to me because the simple hex neighborhood is so nice. The question then is: are gliders possible in the simple hex lattice and neighborhood if we add one state and modify “alive cell count” in a trivial way? The answer to this question turns out to be yes. There are beautiful rules that live in the hex lattice with the natural neighborhood if we have 0 = death, 1 = Alive-A, 2 = Alive-B and instead of the simple alive cell count we use its natural analog when states can have values greater than one: sum of states.
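
As a small illustration of the “sum of states” neighborhood function on the simple six-cell hex neighborhood, here is a Python sketch. Storing cells in a dictionary keyed by axial coordinates (q, r) is my own choice for the example and says nothing about how Lifelike represents the lattice; `HEX_NEIGHBORS` and `sum_of_states` are hypothetical names.

```python
# The six neighbors of a hex cell in axial coordinates (q, r).
HEX_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def sum_of_states(cells, q, r):
    """Sum the states of the six neighbors; cells missing from the dict are dead (0)."""
    return sum(cells.get((q + dq, r + dr), 0) for dq, dr in HEX_NEIGHBORS)


# 0 = death, 1 = Alive-A, 2 = Alive-B
cells = {(1, 0): 1, (0, 1): 2, (5, 5): 2}
value = sum_of_states(cells, 0, 0)

# With three states the function ranges from 0 to 12 (six neighbors, all in state 2).
full = {offset: 2 for offset in HEX_NEIGHBORS}
maximum = sum_of_states(full, 0, 0)
```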

Below is such a rule set, and it is probably the best thing that has come out of my work with Lifelike as far as I am concerned (so I am naming it Joe Life, assuming that it is unknown in the literature).

The above has a nice quality that Conway Life also has that I call “burn”. This is a qualitative thing that is really hard to define but it is what I look for now when I play with Lifelike: burn + gliders = a Life-like cellular automaton. “Burn” is the propensity of a cellular automata configuration to descend into segregated regions of chaos that churn for a while before ultimately decaying into gliders, oscillators, and still lifes. Some CAs burn faster than others; the above has a nice slow burn. CAs that exhibit steady controlled burn turn out to be rare. Most CAs either die or devolve instantly into various flavors of unbounded chaos.

However there does turn out to be another quality that is not death or unbounded chaos that is sort of like the opposite of burn. See for example

(The above is hex 6-cell, four states, and uses a neighborhood function I call “state-based binary”.) I have been calling it “Armada” and generally have been referring to these kinds of CAs as being armada-like. Armada-like cellular automata quickly decay completely into only weakly interacting gliders. For example, one from the literature that I would characterize as armada-like is Brian’s Brain. Armada-like rules turn out to be more common than life-like rules. They’re impressive when you first start finding them but they are ultimately less interesting, to me at least. The best thing about armada-like rules is that they indicate that life-like rules are probably “nearby” in the space you are exploring in the genetic process.

Also they can breed weird hybrids that defy classification, such as the following which are all burn with large blob-like gliders and seem sometimes to live around the boundary between armada-like rules and life-like rules.

Magic Carpets (square, 4-cell/ 4 states / sum of states)

or Ink Blots (hex, 6-cell/ 3 states / “0-low-med-high”)

My other major result is that life-like rules exist in the square 4-cell neighborhood if we allow an extra state and use the simple sum of states as the neighborhood function, but they can be boring looking so instead here is an armada-like square 4-cell CA that is on the edge of being lifelike:

The above uses the neighborhood function I call “2-state count” which enumerates all possible combinations of c1, number of neighboring cells in state 1 and c2, number of neighboring cells in state 2 or above, in an n-cell neighborhood i.e. c1 + c2 ≤ n.
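
One way to turn the (c1, c2) pairs into the single integer a neighborhood function must return is to enumerate them in a fixed order and look up the position. The Python sketch below does that; the particular ordering and the names `two_state_count_index` and `two_state_count` are assumptions made for illustration and may not match the numbering Lifelike uses.

```python
def two_state_count_index(n):
    """Map every (c1, c2) with c1 + c2 <= n to a distinct integer 0, 1, 2, ..."""
    pairs = [(c1, c2) for c1 in range(n + 1) for c2 in range(n + 1 - c1)]
    return {pair: i for i, pair in enumerate(pairs)}


def two_state_count(neighbor_states, index):
    """Neighborhood function: index of (count of state 1, count of state 2 or above)."""
    c1 = sum(1 for s in neighbor_states if s == 1)
    c2 = sum(1 for s in neighbor_states if s >= 2)
    return index[(c1, c2)]


index4 = two_state_count_index(4)  # square lattice, four neighbors
value = two_state_count([1, 2, 3, 0], index4)  # c1 = 1, c2 = 2
```

For an n-cell neighborhood there are (n + 1)(n + 2) / 2 such pairs, so the four-cell case has 15 possible values.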

My Code for Doing Two Things That Sooner or Later You Will Want to do with Bezier Curves

I just added full cubic bezier curve support to the vector tessellation creation tool I am developing (see here). I’ll include some video at a later date (Update: here) showing this feature in action but first I want to document and make publicly available some of the code involved: namely, a bezier class in C# that supports two functions that are not the easiest things in the world to implement. Both of these functions come up as soon as you try to do anything non-trivial with beziers and the literature online can be unhelpful.

My code is here: link.

The first function is splitting a cubic bezier into two curves at a given t parameter such that the concatenation of the two curves is congruent to the original. This turns out to be easy; however, I could find no actual code on offer, rather just descriptions of code. There is good coverage elsewhere on the theory so I won’t go into it deeply here, but, briefly, splitting a bezier is easy because how one does so follows naturally from the sort of recursive definition of the bezier that is implied by De Casteljau’s algorithm. Anyway here is my code:

...
 public Tuple<Bezier,Bezier> Split(double t) {
     PointD E = Interpolate(t, CtrlPoint1, CtrlPoint2);
     PointD F = Interpolate(t, CtrlPoint2, CtrlPoint3);
     PointD G = Interpolate(t, CtrlPoint3, CtrlPoint4);
     PointD H = Interpolate(t, E, F);
     PointD J = Interpolate(t, F, G);
     PointD K = Interpolate(t, H, J);

     return new Tuple<Bezier, Bezier>(
         new Bezier(CtrlPoint1, E, H, K),
         new Bezier(K, J, G, CtrlPoint4)
     );
 }

 static private PointD Interpolate(double t, PointD pt1, PointD pt2) {
     double x = (1.0 - t) * pt1.X + t * pt2.X;
     double y = (1.0 - t) * pt1.Y + t * pt2.Y;
     return new PointD(x, y);
 }
....
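
For anyone who wants to convince themselves that the two halves really do trace the original curve, here is the same De Casteljau split as a small Python sketch, with control points as plain (x, y) tuples. The helper names `lerp`, `split` and `point_at` are mine. The check relies on the fact that the left half at parameter u matches the original at t·u and the right half at u matches the original at t + (1 − t)·u.

```python
def lerp(t, a, b):
    """Linear interpolation between 2D points a and b."""
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def split(ctrl, t):
    """Split a cubic bezier (four control points) at parameter t."""
    p1, p2, p3, p4 = ctrl
    e, f, g = lerp(t, p1, p2), lerp(t, p2, p3), lerp(t, p3, p4)
    h, j = lerp(t, e, f), lerp(t, f, g)
    k = lerp(t, h, j)  # the point on the curve at t
    return (p1, e, h, k), (k, j, g, p4)


def point_at(ctrl, t):
    """Evaluate the curve at t; the split point is exactly B(t)."""
    return split(ctrl, t)[0][3]


def close(a, b, eps=1e-9):
    return abs(a[0] - b[0]) < eps and abs(a[1] - b[1]) < eps


curve = ((0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0))
t = 0.3
left, right = split(curve, t)
halves_match = all(
    close(point_at(left, u), point_at(curve, t * u))
    and close(point_at(right, u), point_at(curve, t + (1 - t) * u))
    for u in (0.0, 0.25, 0.5, 0.75, 1.0)
)
```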

The other function is considerably harder: finding the nearest point on a bezier curve to a given point. If you scan the literature (i.e. perform the google search) you will find that what most people end up using is some code from an article in the book Graphics Gems I, “Solving the Nearest-Point-On-Curve Problem” by Philip J. Schneider. I, however, have a couple of problems with this code: (1) I don’t really understand it, (2) it is long and I need to port it to C#, and (3) someone on the internet is claiming that it is incorrect but given (1) I can’t really evaluate the claim; see the comment that begins “There seem to be errors in the ControlPolygonFlatEnough function […]” in the C code here.

But more relevantly I just don’t get this code on a fundamental level. The nearest point on a bezier problem is difficult to solve mathematically but it is easy to formulate mathematically. I think that the Graphics Gems code obscures the straightforward aspect of how this problem can be formulated. Further, I don’t have a problem using code that I don’t understand; however, if I am going to port an opaque block of code from one language to another I’d like to keep that portion of code to a minimum and I can do this easily by structuring my code around the formulation of the problem that I understand.

The formulation I am talking about is as follows: if B(t) is a particular cubic bezier function, B’(t) is its derivative, and P is an arbitrary point, then [B(t) – P] ⋅ B’(t) = 0 when t is the parameter of the closest point on B to P. [B(t) – P] is a vector that points from P towards some point on the curve and B’(t) is a vector that is tangent to the curve at this point. If the distance is going to be a minimum then these two vectors will be perpendicular and thus their dot product will be zero. Since B(t) is cubic, its derivative will be quadratic and thus when we take the scalar product of [B(t) – P] and B’(t) we will end up with a quintic equation of one variable. This leads naturally to a nearest point function that looks like the following on the top level:

 public Tuple<PointD, double> FindNearestPoint(PointD pt)
 {
     var polyQuintic = GetNearestBezierPtQuintic(
         _points[0].X, _points[0].Y,
         _points[1].X, _points[1].Y,
         _points[2].X, _points[2].Y,
         _points[3].X, _points[3].Y,
         pt.X, pt.Y
     );
     List<Complex> roots = FindAllRoots(polyQuintic);

     // Filter out roots with nonzero imaginary parts and roots
     // with real parts that are not between 0 and 1.
     List<double> candidates = roots.FindAll(
         root => root.Real > 0 && root.Real <= 1.0 && Math.Abs(root.Imaginary) < ROOT_EPS
     ).Select(
         root => root.Real
     ).ToList();

     // add t=0 and t=1 ... the edge cases.
     candidates.Add(0.0);
     candidates.Add(1.0);

     // find the candidate that yields the closest point on the bezier to the given point.
     double t = double.NaN;
     PointD output = new PointD(double.NaN,double.NaN);
     double minDistance = double.MaxValue;
     foreach (double candidate in candidates)
     {
         var ptAtCandidate = GetPoint(candidate);
         double distance = DistSqu(ptAtCandidate, pt);
         if (distance < minDistance)
         {
             minDistance = distance;
             t = candidate;
             output = ptAtCandidate;
         }
     }

     return new Tuple<PointD, double>(output, t);
 }

which reduces the problem to implementing GetNearestBezierPtQuintic() and a numeric root finding function for polynomials, given that quintic equations cannot in general be solved via a closed-form formula like the quadratic formula.
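
The quintic [B(t) – P] ⋅ B’(t) can also be built numerically by multiplying polynomials rather than expanding everything symbolically. The Python sketch below does this as a cross-check; it is not the C# routine, the helper names are invented, coefficients are stored lowest degree first, and unlike the C# version the result is not divided by the leading coefficient.

```python
def power_basis(p0, p1, p2, p3):
    """Coefficients (lowest degree first) of one coordinate of a cubic bezier."""
    return [p0, 3 * (p1 - p0), 3 * (p0 - 2 * p1 + p2), -p0 + 3 * p1 - 3 * p2 + p3]


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def multiply(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def nearest_point_quintic(ctrl, pt):
    """Coefficients t^0..t^5 of [B(t) - P] . B'(t) for control points ctrl and point pt."""
    total = [0.0] * 6
    for axis in (0, 1):
        b = power_basis(*(p[axis] for p in ctrl))
        shifted = [b[0] - pt[axis]] + b[1:]  # B(t) - P along this axis
        for i, c in enumerate(multiply(shifted, derivative(b))):
            total[i] += c
    return total


# A straight-line "curve" from (0,0) to (3,0): B(t) = (3t, 0), so the
# quintic collapses to 9t - 4.5 for P = (1.5, 1).
line = nearest_point_quintic(((0, 0), (1, 0), (2, 0), (3, 0)), (1.5, 1.0))
general = nearest_point_quintic(((0, 0), (1, 2), (3, 3), (4, 0)), (2.0, 1.0))
```

The constant term should equal 3 [P0 – P] ⋅ [P1 – P0] and the leading term 3 |–P0 + 3P1 – 3P2 + P3|², which gives two quick sanity checks on any expanded formula.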

GetNearestBezierPtQuintic(), which returns the coefficients of [B(t) – P] ⋅ B’(t) when fully expanded given P and the control points of B, turns out to be a little too much algebra to work out comfortably by hand, so I ran the open source symbolic mathematics application Sage on this Sage script yielding the following when translated into C#: (For the first time on the internet! … as far as I can tell)

static private List<Complex> GetNearestBezierPtQuintic(double x_0, double y_0, double x_1, double y_1,
            double x_2, double y_2, double x_3, double y_3, double x, double y)
{
    double t5 = 3 * x_0 * x_0 - 18 * x_0 * x_1 + 27 * x_1 * x_1 + 18 * x_0 * x_2 - 54 * x_1 * x_2 + 27 * x_2 * x_2 -
        6 * x_0 * x_3 + 18 * x_1 * x_3 - 18 * x_2 * x_3 + 3 * x_3 * x_3 + 3 * y_0 * y_0 - 18 * y_0 * y_1 +
        27 * y_1 * y_1 + 18 * y_0 * y_2 - 54 * y_1 * y_2 + 27 * y_2 * y_2 - 6 * y_0 * y_3 + 18 * y_1 * y_3 -
        18 * y_2 * y_3 + 3 * y_3 * y_3;
    double t4 = -15 * x_0 * x_0 + 75 * x_0 * x_1 - 90 * x_1 * x_1 - 60 * x_0 * x_2 + 135 * x_1 * x_2 -
        45 * x_2 * x_2 + 15 * x_0 * x_3 - 30 * x_1 * x_3 + 15 * x_2 * x_3 - 15 * y_0 * y_0 + 75 * y_0 * y_1 -
        90 * y_1 * y_1 - 60 * y_0 * y_2 + 135 * y_1 * y_2 - 45 * y_2 * y_2 + 15 * y_0 * y_3 - 30 * y_1 * y_3 +
        15 * y_2 * y_3;
    double t3 = 30 * x_0 * x_0 - 120 * x_0 * x_1 + 108 * x_1 * x_1 + 72 * x_0 * x_2 - 108 * x_1 * x_2 +
        18 * x_2 * x_2 - 12 * x_0 * x_3 + 12 * x_1 * x_3 + 30 * y_0 * y_0 - 120 * y_0 * y_1 +
        108 * y_1 * y_1 + 72 * y_0 * y_2 - 108 * y_1 * y_2 + 18 * y_2 * y_2 - 12 * y_0 * y_3 + 12 * y_1 * y_3;
    double t2 = 3 * x * x_0 - 30 * x_0 * x_0 - 9 * x * x_1 + 90 * x_0 * x_1 - 54 * x_1 * x_1 + 9 * x * x_2 -
        36 * x_0 * x_2 + 27 * x_1 * x_2 - 3 * x * x_3 + 3 * x_0 * x_3 + 3 * y * y_0 - 30 * y_0 * y_0 - 9 * y * y_1 +
        90 * y_0 * y_1 - 54 * y_1 * y_1 + 9 * y * y_2 - 36 * y_0 * y_2 + 27 * y_1 * y_2 - 3 * y * y_3 + 3 * y_0 * y_3;
    double t1 = -6 * x * x_0 + 15 * x_0 * x_0 + 12 * x * x_1 - 30 * x_0 * x_1 + 9 * x_1 * x_1 - 6 * x * x_2 +
        6 * x_0 * x_2 - 6 * y * y_0 + 15 * y_0 * y_0 + 12 * y * y_1 - 30 * y_0 * y_1 + 9 * y_1 * y_1 -
        6 * y * y_2 + 6 * y_0 * y_2;
    double t0 = 3 * x * x_0 - 3 * x_0 * x_0 - 3 * x * x_1 + 3 * x_0 * x_1 + 3 * y * y_0 - 3 * y_0 * y_0 -
        3 * y * y_1 + 3 * y_0 * y_1;

    return new List<Complex> { (Complex)t0/t5, (Complex)t1/t5, (Complex)t2/t5, (Complex)t3/t5, (Complex)t4/t5, (Complex)1.0 };
}

and decided to use Laguerre’s Method to solve the equation.

I chose Laguerre’s Method because it is optimized for polynomials, is less flaky than Newton-Raphson, is relatively concise, and is easy to port from C++ given .Net’s System.Numerics’s implementation of complex numbers. I ported the implementation of Laguerre’s Method found in Data Structures and Algorithms in C++ by Goodrich et al. (here). This solver works well enough for my purposes; however, I think the best way to get an optimal nearest-point-on-a-bezier algorithm would be to implement one of a class of newish algorithms that are specifically targeted at solving the quintic, e.g. “Solving the Quintic by Iteration” by Doyle and McMullen. No implementations of these algorithms seem to be publicly available, however, and it would be a research project that I don’t have time for right now to translate that paper from academese into code.
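
For reference, Laguerre’s Method itself fits in a few dozen lines. What follows is a minimal Python sketch written from the textbook iteration, not the port from the Goodrich book: coefficients are lowest degree first (the same order GetNearestBezierPtQuintic() returns), the leading coefficient is assumed to be nonzero, and the tolerances and the nudge used when the iteration stalls are arbitrary choices of mine.

```python
import cmath


def poly_eval(coeffs, x):
    """Return p(x), p'(x), p''(x) via Horner's rule; coeffs are lowest degree first."""
    p = dp = half_ddp = 0j
    for c in reversed(coeffs):
        half_ddp = half_ddp * x + dp
        dp = dp * x + p
        p = p * x + c
    return p, dp, 2 * half_ddp


def laguerre(coeffs, x=0j, tol=1e-12, max_iter=100):
    """Find one root of the polynomial, starting the iteration at x."""
    n = len(coeffs) - 1
    for _ in range(max_iter):
        p, dp, ddp = poly_eval(coeffs, x)
        if abs(p) < tol:
            return x
        g = dp / p
        h = g * g - ddp / p
        root = cmath.sqrt((n - 1) * (n * h - g * g))
        denom = max(g + root, g - root, key=abs)  # larger denominator, smaller step
        if denom == 0:
            x += 1e-3  # stalled (first two derivatives vanish): nudge and retry
            continue
        step = n / denom
        x -= step
        if abs(step) < tol:
            return x
    return x


def deflate(coeffs, root):
    """Divide the polynomial by (x - root) using synthetic division."""
    quotient, carry = [], 0j
    for c in reversed(coeffs):
        carry = carry * root + c
        quotient.append(carry)
    quotient.pop()  # the final value is the remainder, approximately zero
    return quotient[::-1]


def find_all_roots(coeffs):
    """Return all complex roots by repeated Laguerre iteration and deflation."""
    original = [complex(c) for c in coeffs]
    work, roots = list(original), []
    while len(work) > 1:
        root = laguerre(work)
        root = laguerre(original, root)  # polish against the undeflated polynomial
        roots.append(root)
        work = deflate(work, root)
    return roots


# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
cubic_roots = sorted(find_all_roots([-6, 11, -6, 1]), key=lambda z: z.real)
# x^2 + 1 has the purely imaginary roots +i and -i
imaginary_roots = find_all_roots([1, 0, 1])
```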

Discretely Distributed Random Numbers in C#

Random numbers generated from a discrete distribution are a commonly needed thing in game development.

By “discrete distribution” we just mean the roll you get from something like an unfair die, e.g. you want a random number from 0 to 5 but you want 4 and 5 to be twice as likely as 0, 1, 2, or 3. If we think of each possible random value as having a weight, in this case 0, 1, 2, and 3 would have a weight of 1 and 4 and 5 would have a weight of 2.

A simple way to generate these kinds of random values is the following. Given some n such that we want random values ranging from 0 to n-1 where for each 0 ≤ i < n we have a weight w(i):

  1. Build a data structure mapping cumulative weight to each value i. By cumulative weight we mean for each i the sum of w(0), … , w(i-1).
  2. Generate a random number r from 0 to W-1 inclusive where W is the total weight i.e. the sum of w(i) for all i.
  3. If r is a cumulative weight in our data structure return the value associated with it; otherwise, find the value v that is the first item in the data structure that has a cumulative weight greater than r and return v-1.
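
The three steps above can be sketched in Python with a sorted list and the standard `bisect` module. This is an illustration only; the function names are mine. Note that `bisect_right(...) - 1` covers both branches of step 3 in one expression: it returns the position of the last cumulative weight that is less than or equal to r.

```python
import bisect
import random
from itertools import accumulate


def build_cumulative(weights):
    """Step 1: cumulative[i] = w(0) + ... + w(i-1); also return the total weight W."""
    running = [0] + list(accumulate(weights))
    return running[:-1], running[-1]


def value_for(cumulative, r):
    """Step 3: map a number r in [0, W) to the value whose weight range contains it."""
    return bisect.bisect_right(cumulative, r) - 1


def sample(cumulative, total, rng=random):
    """Steps 2 and 3 together: draw r uniformly from 0..W-1 and look it up."""
    return value_for(cumulative, rng.randrange(total))


cumulative, total = build_cumulative([3, 1, 2, 6])
mapping = [value_for(cumulative, r) for r in range(total)]
draw = sample(cumulative, total, random.Random(1))
```

With weights 3, 1, 2, 6 the twelve possible values of r map to value 0 three times, 1 once, 2 twice, and 3 six times, which is exactly the intended distribution.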

Obviously the data structure in the above could just be an unordered array but a better way to do it is to use a binary search tree because 3. will then be O(log n) rather than linear. In Java you can do this with a TreeMap. In C++ you can do this with an std::map (or in C++ you can do the whole thing with boost::random::discrete_distribution).

However, in C# you can’t just use a SortedDictionary, which is a binary search tree under the hood. You can’t use a SortedDictionary because it does not expose the equivalent of C++’s std::lower_bound and std::upper_bound or the equivalent of Java’s TreeMap.floorEntry(…) and TreeMap.ceilingEntry(…). In order to perform the “otherwise” part of 3. above you need to be able to efficiently find the spot in the data structure where a key would go if it were in the data structure when it is in fact not there. There is no efficient way to do this with a SortedDictionary.

However, C#’s List does support a BinarySearch method that will return the bitwise complement of the index of the next element that is larger than the item you searched for so you can use that. The downside of the above is that there will be no way to efficiently add or remove items to the discrete distribution, but often you don’t need this functionality anyway and the code to do the whole algorithm is very concise:

using System;
using System.Collections.Generic;
using System.Linq;

class DiscreteDistributionRnd
    {
        private List<int> m_accumulatedWeights;
        private int m_totalWeight;
        private Random m_rnd;

        public DiscreteDistributionRnd(IEnumerable<int> weights, Random rnd = null)
        {
            int accumulator = 0;
            m_accumulatedWeights = weights.Select(
                (int prob) => {
                    int output = accumulator;
                    accumulator += prob;
                    return output;
                }
            ).ToList();

            m_totalWeight = accumulator;
            m_rnd = (rnd != null) ? rnd : new Random();
        }

        public DiscreteDistributionRnd(Random rnd, params int[] weights) : 
            this(weights, rnd) { }

        public DiscreteDistributionRnd(params int[] weights) : 
            this(weights, null) { }

        public int Next()
        {
            int index = m_accumulatedWeights.BinarySearch(m_rnd.Next(m_totalWeight));
            return (index >= 0) ? index : ~index - 1;
        }
    }

where usage would be like the following:

            DiscreteDistributionRnd rnd = new DiscreteDistributionRnd(3,1,2,6);
            int[] ary = new int[4] {0,0,0,0};
            for (int i = 0; i < 100000; i++)
                ary[rnd.Next()]++;
            System.Diagnostics.Debug.WriteLine(
                "0 => {0}, 1 => {1}, 2 => {2}, 3 => {3}",
                (float)(ary[0] / 100000.0),
                (float)(ary[1] / 100000.0),
                (float)(ary[2] / 100000.0),
                (float)(ary[3] / 100000.0)
            );

Sprite Packing in Python…

(update: newer version mentioned in comments is on github here.)

I’ve been working on my puzzle game Syzygy again, after a long hiatus, and am now writing to iOS/cocos2d-x rather than just working on the prototype I had implemented to Win32.

The way that you get sprite data into cocos2d is by including as a resource a sprite sheet image and a .plist file which is XML that specifies which sprite is where. Plists are apparently an old Mac thing — I had never heard of this format. .plists describing a lot of sprites would be a chore to write by hand so there is a cottage industry of sprite packing applications.

I tried out one called TexturePacker and liked it a lot — except that it is crippleware; I need a few features that are only in the full version; plus I can’t stand crippleware; and I think $30 is too much for something that I can write myself over the weekend. So I decided to write my own sprite packer over the weekend.

The result is pypacker, a python script: source code here. Usage is like

pypacker -i [input] -o [output] -m [mode] -p

where

  • [input] = a path to a directory containing image files. (In any format supported by the python PIL module.)
  • [output] = a path + filename prefix for the two output files e.g. given C:\foo\bar the script will generate C:\foo\bar.png and c:\foo\bar.plist
  • [mode] = the packing mode. Can be either “grow” or fixed dimensions such as “256×256”. “grow” tells the algorithm to begin packing rectangles from a blank slate, expanding the packing as necessary. “256×256” et al. tell the algorithm to start with the given image size and pack sprites into it by subdivision, throwing an error if they won’t all fit.
  • -p = optional flag indicating you want the output image file dimensions padded to the nearest power-of-two-sized square.
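
As an aside on the -p flag, padding to a power-of-two square can be sketched like this in Python. I am assuming here that the target is the smallest power of two that is at least as large as the longer side of the packed image; the function names are hypothetical and are not taken from the pypacker source.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n (for n >= 1)."""
    power = 1
    while power < n:
        power *= 2
    return power


def padded_square_size(width, height):
    """Dimensions of the power-of-two square that can hold a width x height image."""
    side = next_power_of_two(max(width, height))
    return side, side


size_a = padded_square_size(300, 120)
size_b = padded_square_size(256, 256)
```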

The algorithm I used is a recursive bin packing algorithm in which sprites are placed one-by-one into a binary tree. I based it directly on Jake Gordon’s work in Javascript for generating sprite sheets for use in CSS, described here, only my algorithm is sort of like version 2 of his i.e. I fixed an issue that bugged me about his algorithm.

The core of the algorithm is a function that looks like this:

def pack_images( named_images, grow_mode, max_dim):
    root=()
    while named_images:
        named_image = named_images.pop()
        if not root:
            if (grow_mode):
                root = rect_node((), rectangle(0, 0, named_image.img.size[0], named_image.img.size[1]))
            else:
                root = rect_node((), rectangle(0, 0, max_dim[0], max_dim[1]))
            root.split_node(named_image)
            continue
        leaf = find_empty_leaf(root, named_image.img)
        if (leaf):
            leaf.split_node(named_image)
        else:
            if (grow_mode):
                root.grow_node(named_image)
            else:
                raise Exception("Can't pack images into a %d by %d rectangle." % max_dim)
    return root

We iterate through the images we want to pack. For each image, try to find a rectangular node in the tree that can contain the image. If one exists, place the image in the node and subdivide the node such that the remaining space, not taken up by the image, is available in the tree (this is what ‘split_node’ does). If such a node cannot be found, throw an exception if we are not in ‘grow’ mode or expand the root rectangle node to accommodate the new image if we are in ‘grow’ mode.

This routine is very similar to the Javascript implementation I linked to above. The difference is in the details about the structure of the binary tree. Jake Gordon’s Javascript implementation uses a node type that stores an image in the upper left and has children that he calls ‘right’ and ‘down’  like this:

Since actual data is always burnt into the upper left, it means that the tree can never subdivide into this space; we can never recurse into the upper left. This results in the grow_node routine being awkward to write. When we grow the root we either want to extend to the right or extend down. If the upper left can be a node and not image data, this is a simple matter of creating a new node and making the existing root its upper or left child. Anyway, Jake Gordon’s implementation results in a packing tree that cannot both grow right and grow down simultaneously because it would have been complicated to implement this. This limitation is not a problem practically as long as you sort the images from largest to smallest before running the packing algorithm, which is a standard heuristic from the bin packing literature.

I however wanted to see if the standard sorting heuristic is really accomplishing anything. I wanted to be able to pack rectangles in random order. I therefore simplified the trinary node structure of the Javascript implementation into true binary nodes either oriented horizontally or vertically like this:

Further, now only leaves can contain images, and if a node is not a leaf it always has two valid, that is non-null, children. Using this type of tree structure makes the full grow_node routine more or less trivial.
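
A stripped-down version of such a tree might look like the following Python sketch. It is not the pypacker source: the `Node` class, its fields, and the rule for choosing the cut (keep whichever cut leaves the larger empty rectangle) are simplifications I am assuming for illustration, and growing the root is left out.

```python
class Node:
    """A rectangle that is either a leaf (optionally holding one image) or has two children."""

    def __init__(self, x, y, w, h):
        self.x, self.y, self.w, self.h = x, y, w, h
        self.image = None      # name of the image stored in this leaf, if any
        self.children = None   # None for a leaf, otherwise a pair of Nodes

    def split(self, name, w, h):
        """Place a w-by-h image in the upper left of this empty leaf, subdividing as needed."""
        if w == self.w and h == self.h:
            self.image = name
            return
        right_area = (self.w - w) * self.h   # empty node left by a vertical cut
        below_area = self.w * (self.h - h)   # empty node left by a horizontal cut
        if right_area >= below_area:
            first = Node(self.x, self.y, w, self.h)
            second = Node(self.x + w, self.y, self.w - w, self.h)
        else:
            first = Node(self.x, self.y, self.w, h)
            second = Node(self.x, self.y + h, self.w, self.h - h)
        self.children = (first, second)
        first.split(name, w, h)  # at most one more cut until the leaf fits exactly


def find_empty_leaf(node, w, h):
    """Depth-first search for an empty leaf big enough for a w-by-h image."""
    if node.children:
        for child in node.children:
            leaf = find_empty_leaf(child, w, h)
            if leaf:
                return leaf
        return None
    if node.image is None and w <= node.w and h <= node.h:
        return node
    return None


root = Node(0, 0, 100, 100)
root.split("a", 60, 40)
leaf = find_empty_leaf(root, 50, 50)
leaf_rect = (leaf.x, leaf.y, leaf.w, leaf.h)
```

Because the image always ends up in a leaf of exactly its own size, every interior node has two real children and the tree never needs a special "upper left" slot.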

Beyond that, I’m using the following heuristics:

  • If the orientation (horizontally or vertically) of a split is not forced, split with the orientation that will result in the new empty node having the largest area
  • If the orientation of growing the root rect is not forced, grow in the direction that leads to the smallest increase in the maximum side length of the root rectangle. (This heuristic enforces squarishness and is extremely important. Without doing this the grow version of the algorithm is basically unusable, and in this sense this grow heuristic can be considered part of the algorithm rather than a heuristic that can be swapped out)
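
The grow heuristic in the list above boils down to comparing two candidate root rectangles. Here is a hedged Python sketch of that comparison for the case where neither direction is forced; the function name, the way the candidate sizes are computed, and the tie-break in favor of growing right are my own assumptions rather than details taken from pypacker.

```python
def choose_grow_direction(root_w, root_h, img_w, img_h):
    """Return 'right' or 'down', whichever keeps the root's longest side smaller."""
    # Growing right adds a column as wide as the image; growing down adds a row
    # as tall as the image. Either way the root must still fit the image itself.
    grown_right = (root_w + img_w, max(root_h, img_h))
    grown_down = (max(root_w, img_w), root_h + img_h)
    return "right" if max(grown_right) <= max(grown_down) else "down"


wide_root = choose_grow_direction(100, 50, 20, 20)   # already wide, so grow down
tall_root = choose_grow_direction(50, 100, 20, 20)   # already tall, so grow right
```

Applied repeatedly, this keeps pulling the root back toward a square, which is the "squarishness" the heuristic is meant to enforce.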

Sorting by size (max side length) turns out to be about a 6% improvement with this algorithm. Here’s 500 rects packed with sorting (top) and without (bottom):