Project: Photo Mosaic Renderer and Viewer

Usage:
First, click on a button to load that scene.
On a PC: translate= wasd, zoom in = e, zoom out = f, reset zoom = z, zoom in = left MB (hold), zoom out = right MB (tap)
On a smart device: zoom in = touch (and hold), zoom out=three fingers,
reset zoom = two fingers

Introduction

Photo mosaics using movie frames are really cool, so I went ahead and wrote a renderer (and a viewer). As usual, I start off with a quick prototype code in Matlab and eventually the code becomes way too big to port, so I just continue on with Matlab. So sometimes I wish I had started with C or C++, but I like Matlab.

I quickly realized that to get really nice mosaics, you have to use a large database of mosaic cells. The closer the mosaic cell resembles the template, the better it will look. This requires a lot of testing.

The creation

The best mosaic images are the ones where the mosaic elementare relatively large and still looks as though it fits perfectly with the template. I’m thinking, from here on out we should call them something, but what? They are mosaic cells, so mocel. Mocel sounds like a sea animal, how about mosel?). To accomplish this we have to collect a large number of images.

 

Example photo of Janet Leigh with a grid overlay. This is the template to which each cell will be replaced with another photo from some other source.

Data acquisition

To be on the safe side I switched from block buster movies which I had used in my prototyping step (e.g. Iron man, Memento,…) to movies in the public domain. I downloaded 18 movies, but I decided to stick with 10 of them (I had enough frames). Here is the link of the movies that are in the public domain: https://www.vulture.com/2017/09/30-hollywood-classics-streaming-free-in-the-public-domain.html.

James Finlayson in disbelief

James Finlayson just can’t believe what he is reading (Way Out West (1937))

In total I picked 16000 frames. I wrote a simple bat script (see below) to extract the frames from the mp4 files. I also added support to cull out mosels that are deemed too “similar”, the idea is that if they do not contribute enough variation to the database to justify the extra space they take up, they get tossed. Because I use Matlab, I know speed was going to be an issue, so we cannot simply extract every frame, that would not necessary improve the quality of the mosaics anyway.

 

for %%f IN (*.mp4) do (	
	echo processing... %%~nf
	ffmpeg.exe -i "%%f" -r 0.1 -f image2 frames2/"%%~nf_"%%03d.jpg
)

Additionally, going from my initial tests with color to black/white movies help with the quality of the mosaic because a lot less mosels are needed. Imagine the template having a red background with a line, the renderer has to find such a mosel, all red with a line, but in black and white this is reduced to just a shade of gray.

 

Template cell samples (blue left) compared with candidate samples (green right).

Optimizations

So, the problem of comparing images (with hundreds of RGB values) is simple, but would have been extremely slow in Matlab. The problem can instead be reduced to comparing a few RGB values. The reasoning is that using a sample grid pattern will catch the slow changing features of the template very well.

Comparing samples (vectors) element-by-element is optimized using SIMD by Matlab.

Because we will view the mosaic from a distance, the fast changing features (aka high frequency details) will in essence be blurred to our eyes. So worse case scenario is if a mosel is all dark but at the samples it has white pixels, the mosel we want is dark, but because the samples returned are white, we would get a white mosel. To avoid picking mosels that are “dirty” like this, a blurring filter is applied before the sampling of both the mosel and template, however when assembling the mosels in the mosaic image, no blurring is applied.

Oliver hardy Breaking the fourth wall

Oliver Hardy in disbelief (Way Out West (1937))

Additionally, when dealing with black/white images, we can simply regard a sample as a scalar. Imagine that we compare a candidate mosel with the template cell as we scan the template. We have maybe 100 samples for the mosel and 100 for the template cell. Instead of calculating differences between the mosels RGB values and the template cell RGB values, we will think of all the samples as vectors. So a mosel and template has both a vector of 100 elements that we need to compare. Furthermore, we can think of mosel samples as points in a 100-dimensional space. Ok, I think I looked down the rabbit hole and I fell in now.

 

Illustration showing that the comparing of k samples of one template cell (light blue ball) with multiple candidates (green balls) can be seen as finding the closest point in a k-dimensional space (R is really N here), which is faster than the naïve vector-version above.

Well, we are already in the hole, let’s continue. We can think of the problem as being the shortest distance between all points in a 100-dimensional space. So, how can we do this efficiently?
Actually there is a function that is called dsearchn that does just this. All we need to do is plug in the arguments and we have a speedy mosaic renderer.

Challenges

Some challenges along the way include black borders and different sized videos. Thankfully ffmpeg can deal with borders (with some help), it takes a little adjusting the flag arguments -vf "crop=1280:520:0:100" This is annoying when dealing with 10 movies, so I might fix this in the future. The different sized movies are resized and cropped if needed in the Matlab code.

When looking at the demo on the top of the page, there are artifact that has to do with how textures are sampled in the border of each tile in the spritemap. When zooming in you can see darker lines between tiles. This issue can be kind of resolved by sorting the tiles in the spritemap or creating borders for each tile that repeats the color of that tile. I have done neither, but I’m planning on resolving this issue in a shader instead, leaving the spritemap creation part as it is. Because I’m saving the spritemap as a jpg, this could have an affect on the borders as well, because jpg makes the image smoother (it cannot deal with discontinuities well), so maybe saving the spritemap as a png can help.

I noticed that since I use a grid to sample which mosaic element I should pick, I risk picking mosaic cells that has “dirt” around the samples, making the mosaic image look, well… dirty. I tried adding more samples but that just slows everything down and the result does not improve. The best solution is to blur the mosaic cells slightly before sampling. In this way any “dirty” pixels will leak in to the sample points and give a lower score which will reduce the risk of picking such a mosel.

 

Template (left) only has white pixels while the candidate has white pixels where the sample is but black pixels in the rest of the cell.

The visualization

The mosaics that are output are large. But I want to share them, what to do? I decided to write a simple Javascript viewer that reads a spritemap that is output by the mosaic renderer and a json to place the mosels on it.

The mosaic viewer uses THREEjs. The mosaic is built using quads (two triangles per quad) per mosel. When loading a new mosaic, the same spritemap is used, only the indices change.

new THREE.PlaneGeometry(tilesX, ratio * tilesY, tilesX, tilesY)

By indexing into the spritemap it makes the viewer very lightweight and since it is THREEJS we can zoom in and out, which is neat.

Example spritemap:

Spritemap of 'House of flying daggers'

Spritemap of ‘House of flying daggers’.

 

Simple quad constructed by triangles. Each sub quad is pointing to a UV coordinate in the sprite map.

 

Example mosaic json:

{
"spriteMap":"spritemap_flying_daggers.json", 
"metadata": 
  {
    "columns": 59,
    "rows": 225
  },
"compressionRatio": 0.012505,
"mosaicIndices": [
  875,
  871,
  .
  . 
  .
  811]
}

The spritemap.json contains metadata about the jpg spritemap such as mosel size. The mosaic json contains the indices to the spritemap as integers in row-major order.

The spritemap can be as large as 29 MB. To put that into perspective, a jpg of a mosaic using a spritemap of 29 MB can be 19 MB. Each json that contain the indices into the spritemap takes around 300 KB, so if we have two mosaics as jpg files they would in total take up 40 MB, as opposed to using the mosaic viewer with one spritemap for all mosaic images, which would take 29MB + 300KB*2 = 30 MB.

How about animation?

For fun I also used mosaic elements from “Star Wars 5: The Empire Strikes Back” and created an animation from the movie itself, needless to say, this would require too much data to be feasible for the mosaic viewer, so I just link to YouTube.

Conclusions

In this project I investigated how to make mosaic images using frames from movies. There are still a few issues that I would like to address in the future, such as:

  • Sampling of spritemap tiles leaking pixels. This will be resolved in a future post using a shader.
  • Using the GPU to render the mosaic
  • Smarter way of sampling to avoid picking mosels with dirty pixels (deep learning?)
  • Automatic removal of black borders when extracting frames
  • Full screen viewer of the mosaics

In this article the following movies (in the public domain) where used in creating the mosaic images:

Algiers (1938)
Beat the devil (1953)
Behind office doors (1931)
Love affair (1939)
Father’s Little Dividend (1951)
Penny Serenade (1941)
The Strange Love of Martha Ivers (1946)
The Hitch-hiker (1933)
The Stranger (1946)
Three Guys Named Mike (1951)

 

The github repos for this project:

https://github.com/MatzJB/Javascript-experiments

https://github.com/MatzJB/MosaicRenderer

Leave a Reply

Your email address will not be published.