This is an exciting update, because I’ve been hoping for a long time to release something like this!
We get these questions all the time: how do I improve the frame rate? How do I make performance better?
I’ve taken a few different approaches to solving this and done a lot of exploration in this area. While I have found some successful solutions, none have ever been convenient or practical enough to integrate into the main SDK until this most recent breakthrough.
Quick background on the problem
Our approach to multi-view rendering is to adjust the view and projection matrices so the camera frustum represents each view, render each view separately, then process them all into a final lightfield image. This is something of a brute-force method: Unity must re-render all lights, materials, and shadows in sequence each time camera.Render() is called.
I’ve experimented with the new Scriptable Render Pipeline in ways that may let us reduce this redundancy at a lower level. We will likely release these experiments at some point, but they are neither convenient nor practical, and they require a big commitment to implement in a project.
The most straightforward way to improve performance is to simply reduce the number of views we render and then interpolate between them to generate the rest. We can do this by running a compute shader that reprojects color pixels from our normally rendered views to generate the new ones.
The improvements in performance are substantial and the generated views look very close to the real thing, so we’re excited for all Looking Glass creators to give it a try.
How to use it
Download the new 1.1.0b1 SDK here. In the HoloPlay Capture, you will find a new drawer called Optimization.
View Interpolation: this field is where you select how many views to render normally. The options are Every Other, Every 4th, Every 8th, 4 Views, and 2 Views. We saw the best results with Every Other and Every 4th, with diminishing returns from increasing the interpolation any further.
Fill Gaps: a toggle that runs an additional pass to make a best guess at filling gaps where not enough information was available to interpolate properly. We rarely needed it in our own usage, but it is included in case you find it useful.
Blend Views: Specular highlights and reflections are dependent on the viewing angle, and so we get some artifacts if we merely mix adjacent views without attempting a smooth transition. This option will mitigate that effect at the (minor) cost of some performance.
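To get a sense of how much rendering work each View Interpolation option saves, here is a short Python sketch of the view schedule. The 45-view total and the always-render-the-outermost-view rule are assumptions for illustration, not the SDK's actual internals.

```python
def rendered_view_indices(num_views, step):
    """Indices of views rendered normally when every `step`-th view is
    rendered; the views in between are interpolated. (Illustrative only:
    the endpoint handling is an assumption, not the SDK's exact scheme.)"""
    indices = list(range(0, num_views, step))
    if indices[-1] != num_views - 1:
        indices.append(num_views - 1)  # keep the outermost view real
    return indices

# With a hypothetical 45-view lightfield:
print(len(rendered_view_indices(45, 2)))  # Every Other: 23 rendered, 22 interpolated
print(len(rendered_view_indices(45, 4)))  # Every 4th: 12 rendered, 33 interpolated
```

Even Every Other cuts the number of full camera renders roughly in half, which is where most of the savings come from.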
Currently, the view interpolation does not work with the post-processing stack in Unity. This is a work in progress! We absolutely feel it is necessary to support this and will continue to push on this front. On the bright side, our Simple DOF script does work!
Also, please note that this is an experimental release and these features are subject to change or be removed.
How it works
The HoloPlay Capture normally works by moving in a strictly horizontal direction, shifting its frustum around a zero-parallax plane as it goes from view to view. This makes our lives easy in one crucial way: after all the view and projection matrix math is finished, effectively all we’ve really done has been to move the vertices left or right in clip space depending on the view and the distance from the zero-parallax plane.
Given this, we could emulate the same offset on a per-pixel basis by looking at each pixel in a view’s colormap and depthmap, and redrawing the same color to be in a different location based on the depth.
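To make that per-pixel offset concrete, here is a minimal Python sketch. The linear disparity model, the disparity_scale parameter, and the function name are illustrative assumptions; the real offsets come from the SDK's view and projection matrices.

```python
def reprojected_x(x, depth, view_delta, focal_depth, disparity_scale, width):
    """Horizontally shift a pixel by an amount proportional to its distance
    from the zero-parallax plane. (Illustrative model, not the SDK's math.)"""
    # A pixel exactly on the zero-parallax plane (depth == focal_depth)
    # does not move; pixels behind shift one way, pixels in front the other,
    # scaled by how far between views (view_delta) we are reprojecting.
    disparity = disparity_scale * (depth - focal_depth) / depth
    return int(round(x + view_delta * disparity * width))
```

A pixel at the focal depth stays put regardless of the view, which is exactly why the zero-parallax plane appears locked in place across views.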
Pixel shaders don’t allow this kind of arbitrary writing, and reversing the process (sampling a sizable strip of pixels to figure out whether any nearby pixel should land in the current spot) is tedious and not very performant. It can be done successfully with established ray-marching techniques, as we’ve seen in Masuji’s Looking Glass applications, but interpolating between existing views that way offers barely any performance improvement in our Unity SDK.
Compute shaders, however, can do this, and do it in an intuitive way, simply by telling the program to copy the pixel to the place you want it to go. And there is no marching required, so it’s very fast! Fast enough to be a roughly 30% improvement on average. There is, however, a downside to this I will get to in a moment.
There is one conditional check we must do before we commit to reprojecting and drawing a given pixel: did another pixel already draw to this same location? And if so, did it have a depth value closer to the camera than the one we’re about to write? If so, we want our current pixel to be occluded, so we do not draw it. If not, we write our color to the colormap and our depth to the depthmap, so future pixels know they must have a closer value to overwrite this location.
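The check above can be sketched in a few lines of sequential Python (names are illustrative; the real version is a compute shader operating on GPU textures):

```python
FAR = float("inf")  # depthmap default: nothing has been written yet

def scatter_pixel(x, color, depth, colormap, depthmap):
    """Depth-tested scatter write: draw only if no nearer pixel already
    landed at this location."""
    if depth < depthmap[x]:
        depthmap[x] = depth   # future pixels must be closer to overwrite
        colormap[x] = color

colormap = [None] * 4
depthmap = [FAR] * 4
scatter_pixel(1, "red", 2.0, colormap, depthmap)   # lands: nothing there yet
scatter_pixel(1, "blue", 5.0, colormap, depthmap)  # occluded by the red pixel
```

Run sequentially this is unambiguous; the trouble described next comes from running it in parallel.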
The issue with a compute shader running this in parallel is that two pixels moving to the same location could read the depth value at the same time. Because neither has written to it yet, both read the default (furthest) value, and both decide they can write there. Which pixel actually ends up written is a toss-up, and that randomness plays out every single frame, creating a kind of flickering effect.
We can still run the compute shader in parallel across the y-axis and across multiple views, because pixels can only move left or right in a single view. However, this leads us to run the compute shader 819 times for our default view size, which is fairly slow.
Well, what if we make some assumptions about how far a pixel can possibly move? Can it move half-way across the view texture? A quarter? An eighth? Let’s assume an eighth is the most it can move — then we can run eight columns of pixels at once, move one pixel over and repeat, eight times over, to fill out the entire view. Now we run it 103 times instead. We need fewer runs for closer interpolations, because the horizontal offset is smaller in those cases. So, for example, we can run the program 4 times when generating every other view, but we will get flickering if we try only 4 runs at every 8th view. A simple rule that yielded the best results was to run the program double the interpolation amount (4 for Every Other, 8 for Every 4th, 16 for Every 8th, and so on).
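The dispatch-count arithmetic above can be sketched as follows (Python; sequential_runs is an illustrative name, and the scheduling is a simplification of what the shader actually does):

```python
import math

def sequential_runs(width, max_offset_px):
    """Sequential compute dispatches needed to fill one view, assuming no
    pixel reprojects farther than max_offset_px. Columns at least that far
    apart cannot collide, so they share a dispatch."""
    return min(width, math.ceil(max_offset_px))

print(sequential_runs(819, 819))      # no assumption: 819 runs
print(sequential_runs(819, 819 / 8))  # at most 1/8 of the width: 103 runs
print(sequential_runs(819, 2 * 2))    # rule of thumb for Every Other: 4 runs
```

The trade-off is explicit in the function: a tighter bound on pixel movement means fewer sequential runs, but an offset that exceeds the bound reintroduces the write races and their flickering.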
This reduced flickering to a livable degree while not sacrificing too much performance, as benchmarked on an MSI laptop with a GTX 1070.
I’m so excited to see what everyone makes with these improvements, and I’d love to hear feedback and issues so we can polish this into a major release!
Hope this opens up a world of possibilities for you.
Lead Software Architect, Looking Glass Factory