Sunday, February 28, 2016

Some thoughts around mobile VR and mobile positional tracking

It's been a while since I blogged, mostly because I haven't gotten around to writing things down. So this time I wrote a bit more.

This post summarises a little side project I’ve been playing with lately: how to enable positional tracking for mobile devices. I've also taken the opportunity to share some thoughts on why I’m passionate about VR in general and mobile VR in particular.

Before I go into what I did and how it works, I want to make it clear that I make no claims to have solved mobile positional tracking or to have done something new. That said, I didn't find any blog posts describing the setup I went with, and I ended up with something that works well enough to prove the point I set out to prove, so I wanted to share.

Background

As a mobile evangelist and a big VR enthusiast, combining the two is an obvious fit for me. Mobile VR, or portable VR I should say, is the area that excites me the most. Whenever I think of the future possibilities that its portable nature can enable, my mind melts. The visual experience can’t compete with what stationary VR can offer, but just as with mobile gaming in general, portability often outweighs visual fidelity in my mind.

Stationary VR (PC/console) is in itself absolutely amazing and cannot be described well enough with words; it must be experienced. Being transported into a different world simply by putting on a headset and having your brain accept the virtual reality is a fantastic feeling, and it justifies spending all those dollars on a decent rig + headset(s).

But there are some limitations with stationary VR, which as the name implies, comes down to portability.
One obvious thing is wires. With stationary VR your head is physically attached to a computer using cables, which, besides sometimes being in your way, means you can only experience VR where the computer is. And even if the headsets become wireless, which I assume they will, the stationary nature of the computer itself still ties the experience to the physical space around it.
Another thing is room-scale tracking, i.e. being able to walk around and have the physical movement of your body reflected in VR. Today only the HTC Vive offers room-scale tracking out of the box, and its solution is based around a stationary setup, which again limits the VR experience to the physical place where that setup exists.

Enter mobile. Right out of the box the problem with wires is solved, as the whole experience runs on the mobile device inserted into the headset. But mobile VR lacks a couple of features that are key to why stationary VR is so good.
A main reason the brain buys the illusion of VR is that (at least) the head’s positional movement is tracked correctly in addition to its rotation. When physical movement and virtual movement match, the brain is happy and accepts the illusion.

Positional tracking is something mobile VR generally lacks, as most solutions only track head rotation. Various people are trying to solve positional tracking for mobile, and the remainder of this post is about my experiment with enabling full-body positional tracking on mobile.

The experiment

A common approach to positional tracking is Simultaneous Localization And Mapping (SLAM), using the camera and motion sensors in the mobile phone to track the user’s movement. With SLAM all tracking computations take place on the mobile device, where they compete with the actual experience for valuable CPU resources.

The route I explored was to use a depth camera placed to overlook the play area and offload all tracking computations to a separate device. The tracking data is then streamed to the mobile device over either WiFi or Bluetooth.


The frustum represents what the depth camera sees. The circle represents the user

With this setup it’s possible to do full body tracking of more than one person without any performance impact on the mobile device, as all heavy computations are performed elsewhere.

The main concern I had going into the project was what impact the latency introduced from streaming the positional data would have on responsiveness. Would it feel sluggish moving around? Best way to find out is to try.

In this experiment I only wanted to test whether streaming positional data is a viable option, so I quickly settled on off-the-shelf products. In this case that meant a Kinect v2 connected to a desktop PC. Not only is the Kinect a great device, it also comes with an SDK with built-in support for full skeleton tracking! This allowed me to simply use the SDK to get tracking data and focus most of my time on the streaming and latency parts.
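To give an idea of how little work the SDK leaves you with, here is a minimal sketch of sampling the head joint with the Kinect v2 SDK (Microsoft.Kinect). The SDK types and calls are as I remember them; the class and variable names beyond that are my own, and error handling and disposal are omitted.

```csharp
// Minimal sketch: read the tracked head position from the Kinect v2 SDK.
using Microsoft.Kinect;

class KinectSampler
{
    private readonly KinectSensor sensor = KinectSensor.GetDefault();
    private readonly Body[] bodies;
    private readonly BodyFrameReader reader;

    public KinectSampler()
    {
        bodies = new Body[sensor.BodyFrameSource.BodyCount];
        reader = sensor.BodyFrameSource.OpenReader();
        sensor.Open();
    }

    // Returns the head position of the first tracked body, or null if none.
    public CameraSpacePoint? SampleHead()
    {
        using (BodyFrame frame = reader.AcquireLatestFrame())
        {
            if (frame == null) return null;
            frame.GetAndRefreshBodyData(bodies);
            foreach (Body body in bodies)
            {
                if (body.IsTracked)
                    return body.Joints[JointType.Head].Position;
            }
        }
        return null;
    }
}
```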



To visualize and test the tracking data I used Unity to build a simple test scene to walk around in.

To allow me to iterate quickly and test easily, I started out running everything locally on the PC using an Oculus DK1 instead of a mobile headset. The benefit of using the DK1 was that it too lacks positional tracking, giving me a representative test setup.

To make things easy I decided to stream positional data over a local network using either TCP or UDP rather than Bluetooth. Again, this was the easiest setup for getting results quickly, and I figured that if I needed to I could always test a Bluetooth setup at a later stage. An added benefit of using standard network transfer was that it enabled multiple clients to consume the same data simultaneously.

I approached the streaming problem similarly to how gameplay objects are networked in multiplayer games. On the server side the Kinect is sampled frequently (~60 Hz) to make sure the server always has recent data available. On the client a different frame rate can be used to consume the data, depending on the client's requirements. I typically went with 30 Hz for the client, as that matches the Kinect frame rate.
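To make the server/client split concrete, here is a rough sketch of the pattern. All names are hypothetical and the Kinect sampling is stubbed out; it only illustrates the two loops running at different rates, not my actual implementation.

```csharp
// Sketch: sample the Kinect often, serve each client at its own rate.
using System.Net.Sockets;
using System.Threading;

class TrackingServer
{
    private byte[] latestSnapshot = new byte[0];
    private readonly object snapshotLock = new object();

    // Runs at ~60 Hz so the most recent skeleton is always available.
    public void SampleLoop()
    {
        while (true)
        {
            byte[] snapshot = SampleSkeleton();   // serialized Kinect joints
            lock (snapshotLock) latestSnapshot = snapshot;
            Thread.Sleep(16);                     // roughly 60 Hz
        }
    }

    // Each connected client is served at its own rate (e.g. 30 Hz).
    public void SendLoop(TcpClient client, int hz)
    {
        NetworkStream stream = client.GetStream();
        while (client.Connected)
        {
            byte[] snapshot;
            lock (snapshotLock) snapshot = latestSnapshot;
            if (snapshot.Length > 0)
                stream.Write(snapshot, 0, snapshot.Length);
            Thread.Sleep(1000 / hz);
        }
    }

    // Placeholder for the actual Kinect sampling and serialization.
    private byte[] SampleSkeleton() { return new byte[300]; }
}
```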

The amount of data being transferred is very small. A full Kinect skeleton is 25 joint positions = 75 floats = 300 bytes. At 30 Hz that’s around 9 KB of data to transfer per second. To get the bandwidth requirement down the data could be compressed, or a smaller set of joints could be transferred. As my tests took place on a local network with high throughput this wasn’t an issue, and I preferred to have the full skeleton transferred.
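For illustration, packing a frame amounts to little more than copying the raw floats into a byte buffer. The names below are hypothetical; it is only meant to show where the 300 bytes come from.

```csharp
// Sketch: 25 joints x 3 floats (x, y, z) x 4 bytes = 300 bytes per frame,
// which at 30 Hz works out to roughly 9 KB per second.
using System;

static class SkeletonPacket
{
    public static byte[] Pack(float[] jointPositions) // expects 75 floats
    {
        byte[] buffer = new byte[jointPositions.Length * sizeof(float)];
        Buffer.BlockCopy(jointPositions, 0, buffer, 0, buffer.Length);
        return buffer;
    }
}
```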

Screenshot from PC
Using a simple TCP client/server I quickly got a demo up that samples skeleton data from the Kinect in one process and streams it to the game client running in a separate process. With full skeleton data available it was time to test positional tracking. This was easily achieved by positioning the VR camera relative to the position of the head joint. Combining the built-in head rotation tracking with the streamed head position gave me full head tracking. As it all ran on the same machine, latency wasn't an issue and the tracking worked remarkably well.
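In Unity terms the camera positioning boils down to something like the sketch below. The component and field names are my own placeholders, and the streamed head position is assumed to already be converted into scene units.

```csharp
// Sketch: offset the VR camera rig by the streamed head position.
using UnityEngine;

public class HeadPositionTracker : MonoBehaviour
{
    public Transform cameraRig;    // parent of the VR camera
    public Vector3 rigOrigin;      // where the Kinect origin maps into the scene

    // Latest head joint streamed from the tracking server (set elsewhere).
    public Vector3 headPosition;

    void LateUpdate()
    {
        // Rotation is handled by the headset SDK; we only add position.
        cameraRig.position = rigOrigin + headPosition;
    }
}
```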

As I was curious about the latency, I created a set of test suites to measure transfer latency. I wanted to know what the difference was between TCP and UDP, and whether it even mattered given the small amount of data and the “perfect” conditions of a local network. To limit the number of things that could interfere I created two command line test clients in C#, in addition to the Unity prototype. I chose C# so that I could reuse the same code in the Unity prototype. I also made a C++ version to rule out any issues with .NET.
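One way to measure this kind of transfer latency is to stamp each packet with a high-resolution tick count on the sending side and compare on receipt, which works as long as sender and receiver share a clock (as they do on the loopback interface). The sketch below illustrates the idea; it is not the exact test harness I used.

```csharp
// Sketch: timestamp packets with Stopwatch ticks and measure the delta.
using System;
using System.Diagnostics;

static class LatencyProbe
{
    // Prepend the current tick count to the payload before sending.
    public static byte[] Stamp(byte[] payload)
    {
        long ticks = Stopwatch.GetTimestamp();
        byte[] packet = new byte[8 + payload.Length];
        BitConverter.GetBytes(ticks).CopyTo(packet, 0);
        payload.CopyTo(packet, 8);
        return packet;
    }

    // On receipt, compute how many milliseconds the packet was in flight.
    public static double MillisSinceStamp(byte[] packet)
    {
        long sent = BitConverter.ToInt64(packet, 0);
        long elapsed = Stopwatch.GetTimestamp() - sent;
        return elapsed * 1000.0 / Stopwatch.Frequency;
    }
}
```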

All tests except the iOS one were run on the PC, meaning all network traffic went over the loopback interface. I was a bit surprised by the results:
  • TCP Client (C# command line): ~0.25-0.5 ms latency
  • UDP Client (C# command line): ~2 ms latency
  • TCP Client (C++ command line): ~3 ms latency
  • TCP Client (Unity): ~14 ms latency
  • TCP Client (Unity iOS): ~15 ms latency
The differences between the various command line tests can probably be explained by implementation details, but what surprised me was that the same code that yielded the lowest latency from the command line yielded the highest when running inside Unity. Also interesting is the small difference between running the test inside Unity on the local host and running it on a mobile device: only a 1 ms increase...
At this point I'm assuming I've done something wrong in my code. Something to investigate in the future.


Anyway, even though there is some latency, the test was successful, as can be seen in the video below. It was recorded on an iPhone 6, and as the video shows, the skeleton is tracked pretty accurately and smoothly. The camera is offset so that I could see the skeleton. Apologies for my robot-like movement; I guess recording video, holding a phone and moving at the same time was too much ;)


Conclusion

The experiment clearly showed that it's possible to use an external device to do full body tracking and consume the data on a mobile device. Even though this was only a first small step I learned a lot and have a nice list of possible improvements.
A natural next step would be to investigate why I get such a big difference in latency when using Unity (e.g. fix my code), but there are many other improvements that could be made, such as interpolating joint positions when streaming at lower frequencies.
Naturally it would also be interesting to investigate what it would take to make the tracking setup itself portable, turning this into a true mobile experience.
But that's for another blog post another time.

Latency Updates (March 13th, 2016)

Since this blog post was originally published I've looked into the latency issues I had with Unity and have gotten latency down to microsecond levels by implementing a native plugin that uses Grand Central Dispatch (GCD) to keep the networking off the Unity threads.
