Why I Don’t Care About Your New Mixed Reality Headset

I’m often approached by entrepreneurs in the AR/MR space offering me demos of new hardware.  Competition in this space is fierce. You need three major elements for me to take a new platform seriously.


You Need These Three Things To Have A Successful Mixed Reality Device

The three requirements for any successful AR (or more specifically MR) device are: Display, Computer Vision, and Operating System.


Display

This is the first element of an AR/MR wearable, and usually this is what all hardware companies have. There are a number of different displays out there, but they all seem to share the same limitations: additive translucent graphics, small FOV, and relatively low resolution. Oftentimes devices with claims of wider FOVs end up with even lower resolution visuals as a compromise. Both the low and high resolution displays I’ve seen are additive, thus images appear translucent. Some companies claim to have solved these problems. As far as I’ve seen, we’re a long way off from a commercial reality.


Operating System

When I got my HoloLens devkits, the first thing that impressed me was that Microsoft had ported the entirety of Windows 10 to Mixed Reality. Until then, most AR headsets had simple gaze-optimized skins for Android. Windows Holographic lets even traditional 2D applications run in mixed reality as application windows floating in space or attached to your walls. It’s all tied to a bulletproof content delivery ecosystem (Windows App Store) so distribution is solved as well.


Your device needs to be more than just something worn only to run a specific app. Mixed reality wearables will one day replace your computer, phone, and just about anything with a screen. You need a complete Mixed Reality operating system that can run everything from the latest games to a browser and your email client in this inevitable use case.

Computer Vision

I can’t tell you how many device manufacturers have shown me their new display but “just don’t have the computer vision stuff in.” Sorry, but this is the most important element of mixed reality. Amazing localization, spatialization, tracking, and surface reconstruction features are what put HoloLens light years ahead of its nearest competition.

This stuff is hard to do. Computer Vision was formerly an obscure avenue of computer science not many people studied. Now augmented reality has created a war for talent in this sector, with a small (but growing) number of Computer Vision PhDs commanding huge salaries from well funded startups. There are very few companies that have the Computer Vision expertise to make mixed reality work, and this talent is jealously guarded.

[BONUS] Cloud Super-intelligence

The AR headset of the future is a light, comfortable, and truly mobile device you wear everywhere. This requires a constant, fast connection to the Internet. HoloLens is Wi-Fi only for now, but LTE support must be on the horizon. Not only is this critical for everyday-everywhere use, but many advanced computer vision functions such as object recognition need cloud-based AI systems to analyze images and video. With the explosion of deep learning and machine learning technology, a fast 5G connection to these services will make Mixed Reality glasses something you never want to leave the house without.

Don’t Waste My Time

A lot of people seem impressed with highly staged demos of half-baked hardware. It’s only when you begin to develop mixed reality apps that you understand what’s really needed to make these platforms successful. Demos that lack the critical elements listed in this post will impress fewer and fewer people as more of them become familiar with the technology.

The Challenge of Building Augmented Reality Games In The Real World

InnAR Wars Splash Image - B

Last week I submitted the prototype build of my latest augmented reality project, InnAR Wars, to Google’s Build a Tango App Contest. It’s an augmented reality multiplayer space RTS built for Google’s Tango tablet that utilizes the environment around you as a game map. The game uses the Tango’s camera and Area Learning capabilities to superimpose an asteroid-strewn space battlefield over your real-world environment. Two players holding Tangos walk around the room hunting for each other’s bases while sending attack fleets at the other player’s structures.

Making InnAR Wars fun is tricky because I essentially have no control over the map. The battlefield has to fit inside the confines of the real-world environment the tablets are in. Using the Tango’s Area Learning capabilities with the positions of players, I know the rough size of the play area. With this information I adjust the density of planetoids and asteroids based on the size of the room. It’s one small way I can make sure the game at least has an interesting number of objects in the playfield regardless of the size of the area. As you can see from the videos in this post, it’s already being played in a variety of environments.
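Here is a minimal sketch of that kind of scaling, not the game's actual code. The function name, the objects-per-square-meter figure, and the minimum and maximum counts are all made-up assumptions for illustration. The clamp is the important part: a tiny room still gets enough objects to be interesting, and a huge one does not swamp the device.

```python
import math

def asteroid_count(room_width_m, room_depth_m, per_square_meter=1.5,
                   minimum=6, maximum=60):
    """Return how many asteroids to spawn for a play area of the given size.

    Every default here is an illustrative assumption, not a value from the game.
    """
    area = max(room_width_m, 0.0) * max(room_depth_m, 0.0)
    # Scale with floor area, then clamp so small rooms are not empty
    # and large rooms do not overwhelm the renderer.
    return max(minimum, min(maximum, math.ceil(area * per_square_meter)))
```

Because of the lower clamp, a small room ends up with a higher object density than a large one, which is roughly the effect described above.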

This brings up the biggest challenge of augmented reality games–How do you make a game fun when you have absolutely no control over the environment in which it’s played? One way is to require the user to set up the play space as if she were playing a board game. By using Tango’s depth camera, you could detect the shapes and sizes of objects on a table and use those as the playfield. It’s up to the user to set it up in a way that’s fun–much like playing a tabletop war game.

For the final release, I’m planning on using Tango’s depth camera to figure out where the room’s walls, ceilings, and floors are. Then I can have ships launch from portals that appear to open on the surfaces of the room. Dealing with the limited precision and performance of the Tango depth camera along with the linear algebra involved in plane estimation is a significant challenge. Luckily, there are a few third-party solutions for this I’m evaluating.
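To give a feel for the plane estimation problem, here is a rough sketch using RANSAC, a common technique for fitting a plane to noisy points. This is not the Tango SDK's method or any of the third-party solutions mentioned above. The function names, the iteration count, and the 2 cm tolerance are assumptions chosen for illustration.

```python
import random

def plane_from_points(a, b, c):
    """Return (unit_normal, d) for the plane through three points, or None if collinear.

    Points p on the plane satisfy normal . p + d = 0.
    """
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    # The cross product of two edge vectors is perpendicular to the plane.
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    if length < 1e-9:
        return None
    n = (nx / length, ny / length, nz / length)
    d = -(n[0] * a[0] + n[1] * a[1] + n[2] * a[2])
    return n, d

def ransac_plane(points, iterations=200, tolerance=0.02, seed=0):
    """Find the plane supported by the most points within `tolerance` meters."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        n, d = plane
        inliers = [p for p in points
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

To find several surfaces (floor, then walls), one option is to remove the inliers of the best plane and run the search again on what is left.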

Especially when looking at augmented reality startups’ obligatory fake demo videos, the future of AR gaming seems exciting. But the practical reality of designing a game to be played in reality–which is itself rather poorly designed–can prevent even the most amazing technology from enabling great games. It’s probably going to take a few more hardware generations to not only make the technology usable, but also develop the design language to make great games that work in AR.

If you want to try out the game, I’ll have a few Tangos on hand at FLARB’s VRLA Summer Expo table. Stop by and check it out!

The Coming Public Point Cloud

One of the most important elements of Augmented Reality is the ability to seamlessly mesh 3D graphics with the real world.  Current AR technology simply overlays graphics on top of video–even when tracking and recognizing objects like cards and markers. The AR SDK gives the position and orientation of the tracked object to a 3D engine which then renders geometry on top of the video frame coming from the device’s camera.
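The hand-off from tracker to renderer can be sketched in a few lines. This is a simplified illustration, not any particular AR SDK's API. It assumes the pose arrives as a 3x3 rotation matrix plus a translation in camera space, and that the camera is an ideal pinhole with made-up intrinsics (fx, fy, cx, cy).

```python
def transform(rotation, translation, point):
    """Apply a rigid pose (3x3 rotation as rows, plus translation) to a 3D point."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point (z pointing forward) to pixels."""
    x, y, z = point_cam
    if z <= 0:
        return None  # behind the camera, nothing to draw
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical usage: a marker tracked half a meter in front of the camera.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
marker_translation = (0.0, 0.0, 0.5)
vertex_on_marker = (0.1, 0.0, 0.0)
pixel = project(transform(IDENTITY, marker_translation, vertex_on_marker),
                fx=500, fy=500, cx=320, cy=240)
```

A real engine does the same thing with 4x4 matrices on the GPU, but the idea is the same: the geometry is simply drawn over the video frame at the projected location, with no knowledge of the real scene's depth.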

A 3D scan of myself overlaid on an AR card with Vuforia.


New technologies like Google’s Tango Tablet use Kinect-style depth cameras to store not only the color of each pixel, but the depth and position, too. (Well, sort of–the depth camera’s resolution is much lower than that of the color camera). This means that you can build a 3D model out of what the tablet’s camera sees as you move around an environment.

Tango displaying point cloud data of what it currently sees.


This feature has huge ramifications. Tango uses this data to do what is called “localization.” This means once an area is scanned, the tablet can compare the internal 3D model of the current environment it has stored with what the camera is currently seeing. When fused with compass and gyro data, the Tango tablet can compute its precise location inside the scanned area. This doesn’t take long either. Tango starts building the model immediately. Walk back to where you started using the tablet, and Tango knows where it is.
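The core idea of localization can be shown with a heavily simplified sketch. It works in 2D (a top-down floor plan) and assumes the points the device currently sees have already been matched one-for-one with points in the stored model, which is the hard part a real system has to solve. It is not how Tango does it internally; the function name is hypothetical.

```python
import math

def estimate_pose_2d(map_points, seen_points):
    """Estimate (theta, tx, ty) such that map ~= R(theta) * seen + t.

    Both lists must be matched pair-for-pair. The result is the device's
    heading and position expressed in the stored map's coordinates.
    """
    n = len(map_points)
    mcx = sum(p[0] for p in map_points) / n
    mcy = sum(p[1] for p in map_points) / n
    scx = sum(p[0] for p in seen_points) / n
    scy = sum(p[1] for p in seen_points) / n
    cos_sum = sin_sum = 0.0
    for (mx, my), (sx, sy) in zip(map_points, seen_points):
        # Work relative to the centroids so translation drops out.
        mx, my, sx, sy = mx - mcx, my - mcy, sx - scx, sy - scy
        cos_sum += sx * mx + sy * my
        sin_sum += sx * my - sy * mx
    theta = math.atan2(sin_sum, cos_sum)  # least-squares best rotation
    c, s = math.cos(theta), math.sin(theta)
    tx = mcx - (c * scx - s * scy)
    ty = mcy - (s * scx + c * scy)
    return theta, tx, ty
```

In practice the matching step, the third dimension, and the fusion with compass and gyro data are where most of the difficulty lives.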

Usually this 3D data is stored as a point cloud. This is basically a 3D point for every position the 3D camera records.  Hence, a sufficiently complicated area will look like a cloud of dots–a point cloud. You can see an example of the Tango building a point cloud with the Room Scanner Tango app.
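As a rough illustration of where those dots come from, here is a sketch that turns a single depth image into 3D points. It assumes an ideal pinhole depth camera whose intrinsics (fx, fy, cx, cy) are known, with depth in meters and 0 meaning "no reading." The function name and conventions are assumptions, not the Tango API.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of meters, 0 = no reading) into 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # the sensor returned nothing for this pixel
            # Invert the pinhole projection: pixel (u, v) at depth z.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Points from many frames, each moved into a shared coordinate system using the device's pose at capture time, accumulate into the cloud.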

These point clouds are important for not only localization, but AR graphical effects such as occluding rendered 3D objects with the real world.  This is because a 3D mesh can be built out of these points which can be used for occlusion, collision and other features. Having objects in between you and the augmentation occlude the 3D render is essential to nailing the feeling that an AR object is really there.
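At the pixel level, occlusion boils down to a depth comparison. The sketch below assumes you already have a per-pixel depth for the real world (for example, the reconstructed mesh rendered into a depth buffer) and a per-pixel depth for the virtual object. The function name and the list-of-lists image format are assumptions made to keep the example self-contained.

```python
def composite(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Overlay virtual pixels on the camera frame, hiding those behind real surfaces.

    All inputs are same-sized 2D lists. Depths are in meters. A virtual depth
    of None means no virtual object covers that pixel; a real depth of 0 means
    the sensor has no reading there.
    """
    out = []
    for y, row in enumerate(camera_rgb):
        out_row = []
        for x, camera_pixel in enumerate(row):
            vd = virtual_depth[y][x]
            rd = real_depth[y][x]
            # Draw the virtual pixel only if it is closer than the real surface.
            visible = vd is not None and (rd <= 0 or vd < rd)
            out_row.append(virtual_rgb[y][x] if visible else camera_pixel)
        out.append(out_row)
    return out
```

A 3D engine does this test on the GPU by drawing the real-world mesh into the depth buffer without writing any color.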

Point clouds are awesome, but building them can be frustrating. Current point cloud scanners are bulky and slow, not to mention their accuracy issues can lead to jitter and other artifacts. Also, some depth cameras run at a frame rate low enough to make it hard to create point clouds without moving very slowly through an environment. Who wants to play a game where you have to walk around and meticulously scan a room before you can start?

In order for AR games and apps to succeed, devices need to sense the 3D geometry of their surroundings effortlessly. Yet instant generation of point clouds is far beyond the capabilities of current mobile sensor technology.

That’s where the public point cloud comes in.

A truly great Augmented Reality platform needs to upload point clouds generated by devices to the cloud.  Then, when a user uses some hot new wearable AR glasses, it can pull down a pre-made point cloud for the current location off of a server and use that until the glasses can update it from its own sensors. The device will then upload a fresh point cloud which can be used to refine the version stored online.
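To make the upload, download, and refine loop concrete, here is a toy in-memory stand-in for such a service. Everything about it is hypothetical: the class name, the square map tiles, and the "newest point wins per voxel" merge rule are assumptions for illustration, and a real service would also need networking, geo-referencing, and far smarter merging.

```python
import math

class PublicPointCloud:
    """In-memory stand-in for a server that stores point clouds per map tile."""

    def __init__(self, tile_size_m=10.0, voxel_size_m=0.05):
        self.tile_size_m = tile_size_m
        self.voxel_size_m = voxel_size_m
        self.tiles = {}  # tile key -> {voxel key -> (x, y, z)}

    def _tile_key(self, x, y):
        return (math.floor(x / self.tile_size_m),
                math.floor(y / self.tile_size_m))

    def _voxel_key(self, point):
        return tuple(math.floor(c / self.voxel_size_m) for c in point)

    def upload(self, points):
        """Merge a device's freshly scanned points into the shared cloud."""
        for p in points:
            tile = self.tiles.setdefault(self._tile_key(p[0], p[1]), {})
            # One point per voxel: a newer scan replaces the older sample
            # instead of piling up duplicates.
            tile[self._voxel_key(p)] = tuple(p)

    def download(self, x, y):
        """Return the stored points for the tile containing location (x, y)."""
        return list(self.tiles.get(self._tile_key(x, y), {}).values())
```

A wearable would call download for its current location on startup, use that cloud right away, and call upload once its own sensors have produced something fresher.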

You can kind of see this already–the 3D satellite modes in Google Maps and Apple Maps use similar point cloud reconstruction techniques, presumably from aerial photos and other sources. Whereas these 3D models often look like something you’d see on the original PlayStation, the public point cloud will have to be much more detailed. As sensors on mobile devices become more advanced, the crowdsourced point cloud data will become incredibly detailed.

Apple Maps’ 3D reconstruction kind of looks like an original PlayStation game. The public point cloud will have to be higher resolution. Oh, this is also where you can get the best pastrami sandwich on the planet.

A massive, publicly accessible point cloud is necessary not just for the next generation of AR wearable devices, but also for self-driving cars, drone navigation, and robotics (which is indeed where many of these algorithms came from in the first place). Privacy implications do exist, but perhaps not more so than Google Maps’ street view, or other current technologies that give you very precise information about your location.

In the near future, almost every public place on the planet will be stored in the cloud as 3D reconstructed geometry–passively built up and constantly refined by sensors embedded on countless mobile and wearable devices, perhaps without the user even knowing.