Why I Don’t Care About Your New Mixed Reality Headset

I’m often approached by entrepreneurs in the AR/MR space offering me demos of new hardware.  Competition in this space is fierce. You need three major elements for me to take a new platform seriously.

vrlatryon_2

You Need These Three Things To Have A Successful Mixed Reality Device

The three requirements for any successful AR (or more specifically MR) device are: Display, Computer Vision, Operating System

Display

This is the first element of an AR/MR wearable, and usually this what all hardware companies have. There are a number of different displays out there, but they all seem to share the same limitations: additive translucent graphics, small FOV, and relatively low resolution. Often times devices with claims of wider FOVs end up with even lower resolution visuals as a compromise. Both low and high resolution displays I’ve seen are all additive, thus images appear as translucent. Some companies claim to have solved these problems. As far as I’ve seen, we’re a long ways off from a commercial reality.

getimage

Operating System

When I got my HoloLens devkits, the first thing that impressed me is that Microsoft ported the entirety of Windows 10 to Mixed Reality. Up until now, most AR headsets had simple gaze-optimized skins for Android. Windows Holographic makes even traditional 2D applications able to be run in mixed reality as application windows floating in space or attached to your walls. It’s all tied to a bulletproof content delivery ecosystem (Windows App Store) so distribution is solved as well.

2786932-hololens

Your device needs to be more than just something worn only to run a specific app. Mixed reality wearables will one day replace your computer, phone, and just about anything with a screen. You need a complete Mixed Reality operating system that can run everything from the latest games to a browser and your email client in this inevitable use case.

Computer Vision

I can’t tell you how many device manufacturers have shown me their new display but “just don’t have the computer vision stuff in.” Sorry, but this is the most important element of mixed reality. Amazing localization, spatialization, tracking, and surface reconstruction features are what puts HoloLens light years ahead of its nearest competition.

This stuff is hard to do. Computer Vision was formerly an obscure avenue of computer science not many people studied. Now augmented reality has created a war for talent in this sector, with a small (but growing) number of Computer Vision PhDs commanding huge salaries from well funded startups. There are very few companies that have the Computer Vision expertise to make mixed reality work, and this talent is jealously guarded.

[BONUS] Cloud Super-intelligence

The AR headset of the future is a light, comfortable, and truly mobile device you wear everywhere. This requires a constant, fast connection to the Internet. HoloLens is Wifi only for now, but LTE support must be on the horizon. Not only is this critical for everyday-everywhere use, but many advanced computer vision functions such as object recognition need cloud-based AI systems to analyze images and video. With the explosion of deep learning and machine learning technology, a fast 5G connection to these services will make Mixed Reality glasses something you never want to leave the house without.

Don’t Waste My Time

A lot of people seem impressed with highly staged demos of half baked hardware. It’s only when you begin to develop mixed reality apps that you understand what’s really needed to make these platforms successful. Demos without the critical elements listed in this post will be harder to impress with once more people are familiar with the technology.

The Coming Public Point Cloud

One of the most important elements of Augmented Reality is the ability to seamlessly mesh 3D graphics with the real world.  Current AR technology simply overlays graphics on top of video–even when tracking and recognizing objects like cards and markers. The AR SDK gives the position and orientation of the tracked object to a 3D engine which then renders geometry on top of the video frame coming from the device’s camera.

A 3D scan of myself overlaid on an AR card with Vuforia.

A 3D scan of myself overlaid on an AR card with Vuforia.

New technologies like Google’s Tango Tablet use Kinect-style depth cameras to store not only the color of each pixel, but the depth and position, too. (Well, sort of–the depth camera’s resolution is much lower than that of the color camera). This means that you can build a 3D model out of what the tablet’s camera sees as you move around an environment.

Tango displaying point cloud data of what it currently sees.

Tango displaying point cloud data of what it currently sees.

This feature has huge ramifications. Tango uses this data to do what is called “localization.” This means once an area is scanned, the tablet can compare the internal 3D model of the current environment it has stored with what the camera is currently seeing. When fused with compass and gyro data, the Tango tablet can compute its precise location inside the scanned area. This doesn’t take long either. Tango starts building the model immediately. Walk back to where you started using the tablet, and Tango knows where it is.

Usually this 3D data is stored as a point cloud. This is basically a 3D point for every position the 3D camera records.  Hence, a sufficiently complicated area will look like a cloud of dots–a point cloud. You can see an example of the Tango building a point cloud with the Room Scanner Tango app.

These point clouds are important for not only localization, but AR graphical effects such as occluding rendered 3D objects with the real world.  This is because a 3D mesh can be built out of these points which can be used for occlusion, collision and other features. Having objects in between you and the augmentation occlude the 3D render is essential to nailing the feeling that an AR object is really there.

Point clouds are awesome, but building them can be frustrating. Current point cloud scanners are bulky and slow, not to mention their accuracy issues can lead to jitter and other artifacts. Also, some depth cameras run at a frame rate low enough to make it hard to create point clouds without moving very slowly through an environment. Who wants to play a game where you have to walk around and meticulously scan a room before you can start?

In order for AR games and apps to succeed, devices need to effortlessly be able to sense and detect the 3D geometry of their surroundings. Yet, quick and instant generation of point clouds is far beyond the capabilities of current mobile sensor technology.

That’s where the public point cloud comes in.

A truly great Augmented Reality platform needs to upload point clouds generated by devices to the cloud.  Then, when a user uses some hot new wearable AR glasses, it can pull down a pre-made point cloud for the current location off of a server and use that until the glasses can update it from its own sensors. The device will then upload a fresh point cloud which can be used to refine the version stored online.

You can kind of see this already–Google and Apple Maps’ 3D satellite mode use similar point cloud reconstruction techniques presumably from aerial photos and other sources. Whereas these 3D models often look like something you’d see on the original PlayStation, the public point cloud will have to be much more detailed.  As sensors on mobile devices become more advanced, the crowdsourced point cloud data will become incredibly detailed.

Apple Maps' 3D reconstruction kind of looks like an original PlayStation game. The public point cloud will have to be higher resolution.

This 3D reconstruction kind of looks like an original PlayStation game. The public point cloud will have to be higher resolution. Oh, this is also where you can bet the best pastrami sandwich on the planet.

A massive, publicly accessible point cloud is not just necessary for the next generation of AR wearable devices. But also for self driving cars, drone navigation, and robotics (which is indeed where many of these algorithms came from in the first place). Privacy implications do exist, but perhaps not more so than Google Maps’ street view, or other current technologies that give you very precise information about your location.

In the near future, almost every public place on the planet will be stored in the cloud as 3D reconstructed geometry–passively built up and constantly refined by sensors embedded on countless mobile and wearable devices, perhaps without the user even knowing.

Samsung Gear VR Development Challenges with Unity3D

As you may know, I’m a huge fan of Oculus and Samsung’s Gear VR headset. The reason isn’t about the opportunity Gear VR presents today. It’s about the future of wearables–specifically of self-contained wearable devices. In this category, Gear VR is really the first of its kind. The lessons you learn developing for Gear VR will carry over into the bright future of compact, self-contained, wearable displays and platforms. Many of which we’ve already started to see.

The Gear VR in the flesh (plastic).

The Gear VR in the flesh (plastic).


Gear VR development can be a challenge. Rendering two cameras and a distortion mesh on a mobile device at a rock solid 60fps requires a lot of optimization and development discipline. Now that Oculus’ mobile SDK is public and having worked on a few launch titles (including my own original title recently covered in Vice), I figured I’d share some Unity3D development challenges I’ve dealt with.

THERMAL ISSUES

The biggest challenge with making VR performant on a mobile devices is throttling due to heat produced by the chipset. Use too much power and the entire device will slow itself down to cool off and avoid damaging the hardware. Although the Note 4 approaches the XBOX 360 in performance characteristics, you only have a fraction of its power available. This is because the phone must take power and heat considerations in mind when keeping the CPU and GPU running at full speed.

With the Gear VR SDK you can independently tell the device how fast the GPU and CPU should run. This prevents you from eating up battery when you don’t need the extra cycles, as well as tune your game for performance at lower clock speeds. Still, you have to be aware of what types of things eat up GPU cycles or consume GPU resources. Ultimately, you must choose which to allocate more power for.

GRAPHICAL DETAIL

The obvious optimization is lowering graphical detail. Keep your polycount under 50k triangles. Avoid as much per pixel and per vertex processing as possible. Since you have tons of RAM but relatively little GPU power available–opt for more texture detail over geometry. This includes using lightmaps instead of dynamic lighting. Of course, restrict your usage of alpha channel to a minimum–preferably for quick particle effects, not for things that stay on the screen for a long period of time.

Effects you take for granted on modern mobile platforms, like skyboxes and fog, should be avoided on Gear VR. Find alternatives or design an art style that doesn’t need them. A lot of these restrictions can be made up for with texture detail.

A lot of standard optimizations apply here–for instance, use texture atlasing and batching to reduce draw calls. The target is under 100 draw calls, which is achievable if you plan your assets correctly. Naturally, there are plenty of resources in the Asset Store to get you there. Check out Pro Draw Call Optimizer for a good texture atlasing tool.

CPU OPTIMIZATIONS

There are less obvious optimizations you might not be familiar with until you’ve gone to extreme lengths to optimize a Gear VR application. This includes removing as many Update methods as possible. Most update code spent waiting for stuff to happen (like an AI that waits 5 seconds to pick a new target) can be changed to a coroutine that is scheduled to happen in the future. Converting Update loops to coroutines will take the burden of waiting off the CPU. Even empty Update functions can drain the CPU–death by a thousand cuts. Go through your code base and remove all unnecessary Update methods.

As in any mobile game, you should be pooling prefabs. I use Path-o-Logical’s PoolManager, however it’s not too hard to write your own. Either way, by recycling pre-created instances of prefabs, you save memory and reduce hiccups due to instantiation.

IN CONCLUSION

There’s nothing really new here to most mobile developers, but Gear VR is definitely one of the bigger optimization challenges I’ve had in recent years. The fun part about it is we’re kind of at the level of Dreamcast-era poly counts and effects but using modern tools to create content. It’s better than the good old days!

It’s wise to build for the ground up for Gear VR than to port existing applications. This is because making a VR experience that is immersive and performant with these parameters requires all disciplines (programming, art, and design) to build around these restrictions from the start of the project.