After getting the basic tasks worked out for the project to de-skew perspective images and attach them inside AutoCAD, I went ahead and got cracking on the next one. I’d already knocked off tasks 1 and 2, and so decided to have a go at task 5. Why task 5? Mainly because I realised I was impatient to put together some code that writes images directly to file, if nothing else to prove the problem can be solved in Python with tolerable performance.
The way the overall project works is that it holds pixel information in a couple of matrices: one for the positions and one for the colours. The positional matrix can then be transformed via a coordinate system transformation – using matrix-matrix multiplication – so that the resultant positional matrix holds values that belong to the new coordinate system (defined in terms of the area of the picture we want to de-skew, stretching from 0,0 to 1,1 for a square region).
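For a perspective correction of this kind, the transformation is typically a 3×3 homography applied to homogeneous pixel coordinates. Purely as a minimal sketch – in plain Python, with names that are mine rather than the project’s – the matrix step might look like this:

```python
# Sketch: transform a 3 x N positional matrix [[x...], [y...], [1...]]
# by a 3x3 matrix H, then normalise back to Cartesian (x, y) pairs in
# the target coordinate system (0,0 to 1,1 for the de-skewed region).
def transform_positions(H, positions):
    transformed = []
    for c in range(len(positions[0])):
        x = sum(H[0][k] * positions[k][c] for k in range(3))
        y = sum(H[1][k] * positions[k][c] for k in range(3))
        w = sum(H[2][k] * positions[k][c] for k in range(3))
        transformed.append((x / w, y / w))  # divide out the scale factor
    return transformed
```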
The existing display code needs to go through the transformed pixel information and create polygonal shapes in SVG embedded in an HTML page. The pixel information is basically stored in a dictionary keyed off the source pixel coordinates (i.e. the X-Y displacement from the top-left of the original image) and whether it’s the X or Y value. The positional matrix – post transformation – holds an X and a Y offset relative to the target coordinate system (that of the cropped section of the de-skewed image we want to display or create).
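To make the later snippets concrete, here’s one plausible shape for that dictionary – the exact key layout is my assumption, not necessarily the project’s:

```python
# Keyed by source pixel coordinates plus an axis flag; values are the
# transformed offsets in the target (de-skewed) coordinate system.
pos = {
    (0, 0, 'x'): 0.0131, (0, 0, 'y'): 0.0248,  # source pixel (0, 0)
    (1, 0, 'x'): 0.0140, (1, 0, 'y'): 0.0251,  # source pixel (1, 0)
    # ... one 'x' and one 'y' entry per source pixel
}
```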
The polygons are created by getting the new locations of the pixels that would be adjacent in the source image: the one to its right, the one immediately below it and the one diagonally right and down. The four-sided polygon to be placed in the output will almost certainly not be square (some may be reasonably close, but many will be distorted quite significantly, depending on the degree of perspective in the original). The output polygon corresponding to a particular source pixel simply has a fill based on that pixel’s colour.
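With a dictionary shaped like the hypothetical one above, assembling the output quad for a given source pixel could be as simple as this (again just a sketch):

```python
def quad_for_pixel(pos, px, py):
    # Corners come from the pixel itself plus its right, diagonal and
    # below neighbours, looked up in the target coordinate system.
    corners = [(px, py), (px + 1, py), (px + 1, py + 1), (px, py + 1)]
    return [(pos[(x, y, 'x')], pos[(x, y, 'y')]) for (x, y) in corners]
```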
One option would have been to use an existing SVG library to generate our graphics. While in many ways the path of least resistance, I didn’t really like this option: the displayed HTML page has moiré patterns that I’d like to avoid, and I really wanted to find a way to generate a (for instance) PNG file without taking on an additional component dependency.
So I looked at ways to generate the image ourselves, as a list of lists of pixel information – one list per row – since the project already has code in place to generate PNG files from that representation.
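The project’s own PNG code isn’t shown here, but to illustrate the representation – and to show that writing a PNG really doesn’t need an external component – here’s a minimal pure-Python writer for a list of rows, each a flat list of R,G,B values. The function name is mine:

```python
import struct
import zlib

def write_png(filename, width, height, rows):
    # Build a PNG chunk: length, tag + data, then a CRC over tag + data.
    def chunk(tag, data):
        payload = tag + data
        return (struct.pack('>I', len(data)) + payload +
                struct.pack('>I', zlib.crc32(payload) & 0xffffffff))

    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b''.join(b'\x00' + bytes(row) for row in rows)
    with open(filename, 'wb') as f:
        f.write(b'\x89PNG\r\n\x1a\n')  # PNG signature
        f.write(chunk(b'IHDR', struct.pack('>IIBBBBB', width, height,
                                           8, 2, 0, 0, 0)))  # 8-bit RGB
        f.write(chunk(b'IDAT', zlib.compress(raw)))
        f.write(chunk(b'IEND', b''))

# A 2x2 test image: red, green on the top row; blue, white below.
write_png('test.png', 2, 2,
          [[255, 0, 0, 0, 255, 0],
           [0, 0, 255, 255, 255, 255]])
```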
I tried a couple of approaches, both of which involved looping through pixel coordinates in the output image space (which I establish based on the size of the portion of the input image we’re choosing to correct) and determining the colour value for each.
In the first attempt, I indexed the output locations on X and used that index to find candidate source pixels for each output location: I collected candidates within a threshold delta on X and then checked them for any within the same threshold on Y. It proved to be quite flaky (there were noise-like spikes in the output image) and rather slow.
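My reading of that first attempt, in sketch form – the bucketing scheme and the threshold value are assumptions on my part:

```python
from collections import defaultdict

def build_x_index(pos, delta=0.002):
    # Bucket each source pixel by its transformed X value.
    index = defaultdict(list)
    for key, value in pos.items():
        if key[2] == 'x':
            index[int(value / delta)].append(key[:2])
    return index

def find_pixel(pos, index, tx, ty, delta=0.002):
    # Collect candidates from the target's bucket and its neighbours...
    b = int(tx / delta)
    candidates = [p for k in (b - 1, b, b + 1) for p in index.get(k, [])]
    # ...then keep any that are also within the threshold on Y.
    matches = [(px, py) for (px, py) in candidates
               if abs(pos[(px, py, 'y')] - ty) < delta]
    return matches[0] if matches else None  # arbitrary pick: hence the noise
```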
The idea for the second attempt came to me in the bath (yes, “Eureka” was very nearly shouted ;-). All that needed to be done was to start at the top-left “whiteboard” pixel in the output dictionary, get the centre points of the surrounding polygons (the mid-point between the 1st and 3rd vertices was actually sufficient) and compare them to the X,Y values of the point we’re trying to output. The closest one gets used (and becomes the next pixel from which to get the surrounding ones, etc.). Once a row in the output has been created, we calculate the pixel to start at on the row below (we can use the fraction along the line between the top-left and bottom-left points). Repeat until done.
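In sketch form, the core of that second attempt might look like the following. All the names are mine, and I’ve simplified the row-seeding to reuse each row’s first pixel rather than the fraction-along-the-left-edge calculation described above:

```python
def centre(pos, px, py):
    # Mid-point between the quad's 1st and 3rd vertices: the transformed
    # positions of source pixels (px, py) and (px + 1, py + 1).
    x1, y1 = pos[(px, py, 'x')], pos[(px, py, 'y')]
    x3, y3 = pos[(px + 1, py + 1, 'x')], pos[(px + 1, py + 1, 'y')]
    return (x1 + x3) / 2.0, (y1 + y3) / 2.0

def closest_pixel(pos, seed, tx, ty):
    # Compare the target point against the seed pixel and its eight
    # neighbours, keeping whichever polygon centre lies nearest.
    best, best_d2 = seed, None
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            px, py = seed[0] + dx, seed[1] + dy
            try:
                cx, cy = centre(pos, px, py)
            except KeyError:
                continue  # off the edge of the source image
            d2 = (cx - tx) ** 2 + (cy - ty) ** 2
            if best_d2 is None or d2 < best_d2:
                best, best_d2 = (px, py), d2
    return best

def deskew(pos, colours, width, height, start):
    # Walk the output row by row; each winning pixel seeds the search
    # for the next point, and each row's first pixel seeds the row below.
    rows, row_seed = [], start
    for j in range(height):
        seed, row = row_seed, []
        for i in range(width):
            seed = closest_pixel(pos, seed,
                                 i / float(width), j / float(height))
            if i == 0:
                row_seed = seed
            row.extend(colours[seed])  # (r, g, b) for the chosen pixel
        rows.append(row)
    return rows
```

The rows that come back are in exactly the list-of-lists form the PNG writer above expects.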
This worked very well and resulted in these two de-skewed images being created for the two perspective images shown in the previous posts.
First the whiteboard…
And then the painting…
I found the results to look pretty good, and with the second approach they didn’t take too long to generate (assuming you’re working with an image of less than 1000x1000 pixels – anything larger can take some time, but will work eventually).
Next up I’ll talk about the challenges of getting the code to work in IronPython rather than standalone Python, so that we can get this code (which I’m getting closer to being able to share) working inside AutoCAD.