Developing Applications for HoloLens with Unity3D: First Impressions

I started work on HoloLens game development with Unity3D over the past week. This included going through all of the example projects, as well as building simple games and applications to figure out how all of the platform’s features work. Here are some takeaways from my first week as a HoloLens developer.

Baby steps…

The Examples Are Great, But Lack Documentation

If you go through all of the Holo Academy examples Microsoft provides, you’ll go from displaying a basic cube to a full-blown multi-user Augmented Reality experience. However, most of the examples involve dragging and dropping pre-made prefabs and scripts into the scene, and not much about the actual SDK is explained. The examples are a good way to get acquainted with HoloLens features, but you’re going to have to do more work to figure out how to write your own applications.

HoloToolkit is Incredibly Full Featured

All of the examples are based on HoloToolkit, Microsoft’s collection of scripts and prefabs that handle just about every major HoloLens application feature: input, spatial mapping, gesture detection, speech recognition, and even some networking.

I also found that features I needed (such as the placement of objects in the real world using real-time meshing as a collider) are features in the examples I could easily strip out and modify for my own C# scripts. Using these techniques I was able to get a very simple carnival milk bottle game running in a single Saturday afternoon.

Multiplayer Gets Complicated

I’m working on moving my award-winning Tango RTS, InnAR Wars, to HoloLens. However, multiplayer experiences work much differently on HoloLens than on Tango. In the case of Tango, each device shares a single room scan file and is localized in the same coordinate space. This means that once the game starts, placing an object (like a floating planet or asteroid) at any position will make it appear in the same real-world location on both Tangos.

HoloLens shares objects between devices using what are called Spatial Anchors. Spatial Anchors mark parts of the scanned room geometry as an anchored position. You can then place virtual objects in the real world relative to this anchor. When you share a Spatial Anchor with another device, the other HoloLens will look for a similar location in its own scan of the room to position the anchor. These anchors are constantly being updated as the scan continues, which is part of why HoloLens’ tracking is so rock solid.
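To make the difference from Tango concrete, here is a minimal sketch of anchor-relative placement. It is plain Python, not the HoloLens or HoloToolkit API; the function name `anchor_to_world` and the simplification of the anchor pose to a position plus a heading about the vertical axis are assumptions for illustration. Each device locates the shared anchor in its own coordinate frame, and the object is stored only as an offset from that anchor.

```python
import math

def anchor_to_world(anchor_pos, anchor_yaw, local_offset):
    """Convert an offset in the anchor's frame to this device's world frame.

    anchor_pos:   (x, y, z) of the anchor as located by this device
    anchor_yaw:   anchor heading in radians about the vertical (y) axis
    local_offset: (x, y, z) of the object relative to the anchor
    """
    c, s = math.cos(anchor_yaw), math.sin(anchor_yaw)
    lx, ly, lz = local_offset
    # Rotate the offset about the y axis, then translate by the anchor position.
    return (anchor_pos[0] + c * lx + s * lz,
            anchor_pos[1] + ly,
            anchor_pos[2] - s * lx + c * lz)

# The planet is defined once, relative to the shared anchor (hypothetical values).
planet_offset = (1.0, 0.5, 0.0)

# Each device found the same physical anchor at a different pose in its own frame.
on_device_a = anchor_to_world((0.0, 0.0, 2.0), 0.0, planet_offset)
on_device_b = anchor_to_world((3.0, 0.0, -1.0), math.pi / 2, planet_offset)
print(on_device_a, on_device_b)
```

The two devices end up with different world coordinates for the planet, yet both describe the same physical spot, because only the anchor-relative offset is shared. On Tango the world coordinates themselves can be shared.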

Sure, having a single coordinate frame on the Tango is easier to deal with, but the Tango also suffers from drift and inaccuracies that may be symptomatic of its approach. Spatial Anchoring is a rather radical change from how Tango works–which means a lot of refactoring for InnAR Wars, or even a redesign.

First Week Down

This first week has been an enlightening experience. Progress has been fast but also made me aware of how much work it will be to produce a great HoloLens app. At least two independently published HoloLens games popped up in the Windows Store over the past few days. The race is on for the first great indie HoloLens application!

My Week With HoloLens

Microsoft ships the HoloLens and Clicker accessory in the box

My HoloLens development kits finally arrived a week ago. I’ve spent a great deal of time using the device over the past week. I figured I’d post my impressions here.

This Really Works

When I first put my HoloLens on, I placed an application window floating above my kitchen table. Suddenly, I realized I hadn’t taken out the garbage. Still wearing the device, I ran downstairs to put something in the trash. I narrowly missed my neighbor–successfully avoiding an awkward conversation about what this giant contraption on my face does.

When I returned to my kitchen, the application was still there–hovering in space.

As I’ve stated before, Microsoft blew me away. HoloLens is an absolutely incredible leap from previous generation AR glasses (and MUCH cheaper, believe it or not). It also does everything Tango does but at a much higher level of performance and precision, which means most applications built on Tango can be directly moved over to HoloLens.

HoloLens is fast and intuitive enough to attempt getting actual work done with it. Yet a lot of my time is spent just trying to make silly videos like this.

It’s A Full Blown Windows Machine

HoloLens isn’t just a prototype headset–it’s a full featured desktop Windows PC on your face. Not only can you run “Windows Holographic” apps, but any Universal Windows App from the Windows Store. Instead of these applications running in a window on a monitor, they float around in space–positioned wherever you choose.

HoloLens really does need a taskbar of some kind, though. It’s way too easy to forget where Skype physically is because you launched it in the bathroom.

It also helps to connect a Bluetooth keyboard and mouse when running standard applications. Gestures can’t give you the input fidelity of a traditional mouse, and typing in the air is a chore.

HoloLens’ narrow FOV makes using a regular Windows app problematic–as the screen will get cut off and require you to move your head around to see most of it. Also, if you push a window far enough into the background so you can see the whole thing, you’ll notice HoloLens’ resolution is a little low to read small text. We’re going to need a next generation display for HoloLens to really be useful for everyday computing.

Microsoft Has Created A New Input Paradigm

HoloLens can seemingly only recognize two gestures: bloom and “air tap”. Blooming is cool–I feel like a person in a sci-fi movie making the Windows start menu appear in the air by tossing it up with a simple gesture.

The air tap can be unintuitive. Most people I let try the HoloLens poke at the icons by stabbing them with a finger. That’s not what the air tap is for. You still have to gaze at a target by moving your head and then perform the lever-like air tap gesture within the HoloLens camera’s view to select what the reticule is on.

HoloLens can track the motion of your finger and use it as input to move stuff around (such as application windows), but not detect collisions between it and virtual objects. It’s as if it can detect the amount your finger moves but not its precise location in 3D space.

Using apps while holding your hand out in front of the headset is tiring. This is why Microsoft includes the clicker. This is a simple Bluetooth button that when pressed triggers the air tap gesture. Disappointingly, the clicker isn’t trackable–so you can’t use it as a true finger replacement.

Microsoft has adapted Windows to the holographic model successfully. This is the first full blown window manager and gesture interface for augmented reality I’ve ever seen and it’s brilliant. After a few sessions with the device, most people I’ve let use it are launching apps and moving windows around the room like a pro.

This Thing Is Heavy

Although the industrial design is cool in a retro ‘90s way, this thing is really uncomfortable to use for extended periods of time. Maybe I don’t have it strapped on correctly, but after a 20 minute Skype session I had to take the device off. I felt pain above the bridge of my nose. When I looked in the mirror, I saw what can only be described as ‘HoloHead’.

The unfortunate symptom of “HoloHead”

The First Generation Apps Are Amazing

There are already great free apps in the Windows Store that show off the power of the HoloLens platform. Many are made by Asobo Studio–a leader in Augmented Reality game development.

Young Conker

Young Conker is a great example of HoloLens as a games platform. The game is simple: after scanning your surroundings, play a familiar platform game over the floors, walls, tables and chairs as Conker runs, jumps and collects coins scattered about your room.

Conker will jump on top of your coffee table, run into walls, or be occluded by a chair as if he were walking behind it–well, depending on how accurate your scan is. The fact that this works as well as it does is amazing to me.

Fragments

One of the first true game experiences I’ve ever played in augmented reality. You play the part of a futuristic detective, revisiting memories of crimes as their events are re-created holographically in your location. Characters sit on your furniture. You’ll hunt for pieces of evidence scattered about your room–even under tables. It really is an incredible experience. As with Conker, it requires some pre-scanning of your environment. However, applications apparently can share scans with each other, as Fragments was able to re-use a scan of my office I previously made with another app.

Skype

When Skyping with a person not using HoloLens, you simply place their video on a wall in your surroundings. It’s almost like talking to someone on the bridge of the Enterprise, depending on how big you make the video window.

When Skyping with another HoloLens user, you can swap video feeds so either participant can see through the other’s first person view. While looking at someone else’s video feed as a floating window, you can sketch over it with drawing tools or even place pictures from your photos folder in the other person’s environment. 2D lines drawn over the video feed will conform to the other user’s real-world surroundings in 3D–bending around corners, or sticking to the ceiling.

Conclusion

As a consumer electronics device, HoloLens is clearly beta–maybe even alpha, but surprisingly slick. It needs more apps. With Wave 2 underway, developers are working on just that. In my case, I’m moving all of my Tango projects to HoloLens–so you’ll definitely be seeing cool stuff soon!

The Challenge of Building Augmented Reality Games In The Real World

Last week I submitted the prototype build of my latest augmented reality project, InnAR Wars, to Google’s Build a Tango App Contest. It’s an augmented reality multiplayer space RTS built for Google’s Tango tablet that utilizes the environment around you as a game map. The game uses the Tango’s camera and Area Learning capabilities to superimpose an asteroid-strewn space battlefield over your real-world environment. Two players holding Tangos walk around the room hunting for each other’s bases while sending attack fleets at the other player’s structures.

Making InnAR Wars fun is tricky because I essentially have no control over the map. The battlefield has to fit inside the confines of the real-world environment the tablets are in. Using the Tango’s Area Learning capabilities with the positions of players, I know the rough size of the play area. With this information I adjust the density of planetoids and asteroids based on the size of the room. It’s one small way I can make sure the game at least has an interesting number of objects in the playfield regardless of the size of the area. As you can see from the videos in this post, it’s already being played in a variety of environments.
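The density adjustment described above could look something like the following sketch. It is an illustration under assumptions, not the game's actual code: the function names, the bounding-box estimate of the play area, and every constant (objects per square meter, minimum, maximum) are hypothetical.

```python
def play_area_size(player_positions):
    """Rough play-area width and depth from observed (x, z) player positions."""
    xs = [p[0] for p in player_positions]
    zs = [p[1] for p in player_positions]
    return max(xs) - min(xs), max(zs) - min(zs)

def asteroid_count(width_m, depth_m, per_square_meter=0.5, minimum=4, maximum=40):
    """Scale the number of asteroids with floor area, clamped to a playable range."""
    count = int(width_m * depth_m * per_square_meter)
    return max(minimum, min(maximum, count))

# Positions in meters that two players visited while walking around the room.
positions = [(0.0, 0.0), (4.0, 1.0), (2.5, 5.0), (0.5, 3.0)]
width, depth = play_area_size(positions)
print(asteroid_count(width, depth))
```

Clamping keeps a closet from being empty and a warehouse from being flooded with objects, which is the point of scaling density in the first place.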

This brings up the biggest challenge of augmented reality games: how do you make a game fun when you have absolutely no control over the environment in which it’s played? One way is to require the user to set up the play space as if she were playing a board game. By using Tango’s depth camera, you could detect the shapes and sizes of objects on a table and use those as the playfield. It’s up to the user to set it up in a way that’s fun–much like playing a tabletop war game.

For the final release, I’m planning on using Tango’s depth camera to figure out where the room’s walls, ceilings, and floors are. Then I can have ships launch from portals that appear to open on the surfaces of the room. Dealing with the limited precision and performance of the Tango depth camera along with the linear algebra involved in plane estimation is a significant challenge. Luckily, there are a few third-party solutions for this I’m evaluating.
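For a sense of what plane estimation involves, here is a minimal RANSAC sketch in plain Python. RANSAC is one common way to find a dominant plane in noisy depth points; the text does not say which method the game or the third-party solutions use, so treat this as an assumed stand-in. The function names, iteration count, and tolerance are hypothetical.

```python
import math
import random

def plane_from_points(p1, p2, p3):
    """Return (unit_normal, d) with normal . p + d = 0, or None if degenerate."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # The cross product of two edge vectors is perpendicular to the plane.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-9:
        return None  # the three points are (nearly) collinear
    n = tuple(c / length for c in n)
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d

def fit_plane_ransac(points, iterations=200, tolerance=0.02, seed=0):
    """Find the plane supported by the most points within `tolerance` meters."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, 0
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = sum(
            1 for p in points
            if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) <= tolerance
        )
        if inliers > best_inliers:
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# A synthetic floor at y = 0 plus a few stray points standing in for furniture.
floor = [(x * 0.5, 0.0, z * 0.5) for x in range(5) for z in range(5)]
clutter = [(0.3, 0.9, 0.1), (1.2, 1.5, 0.7), (0.5, 0.4, 1.9)]
plane, inliers = fit_plane_ransac(floor + clutter)
print(plane, inliers)
```

Real depth data is far noisier and much larger than this toy cloud, which is where the performance concern comes from: every candidate plane is scored against every point.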

Especially when looking at augmented reality startups’ obligatory fake demo videos, the future of AR gaming seems exciting. But the practical reality of designing a game to be played in reality–which is itself rather poorly designed–can prevent even the most amazing technology from enabling great games. It’s probably going to take a few more hardware generations to not only make the technology usable, but also develop the design language to make great games that work in AR.

If you want to try out the game, I’ll have a few Tangos on hand at FLARB’s VRLA Summer Expo table. Stop by and check it out!

My Week With Project Tango

A few weeks back I got into Google’s exclusive Project Tango developers program. I’ve had a Tango tablet for about a week and have been experimenting with the available apps and Unity3D SDK.

Project Tango uses Movidius’ Myriad 1 Vision Processor chip (or “VPU”), paired with a depth camera not unlike the original Kinect for the Xbox 360. Except instead of being a giant hideous block, it’s small enough to stick in a phone or tablet.

I’m excited about Tango because it’s an important step in solving many of the problems I have with current Augmented Reality technology. What issues can Tango solve?

POSITIONAL TRACKING

First, the Tango tablet has the ability to determine the tablet’s pose. Sure, pretty much every mobile device out there can detect its precise orientation by fusing together compass and gyro information. But by using the Tango’s array of sensors, the Myriad 1 processor can detect position and translation. You can walk around with the tablet and it knows how far and where you’ve moved. This makes SLAM algorithms much easier to develop and more precise than strictly optical solutions.

Another problem with AR as it exists now is that there’s no way to know whether you or the image target moved. Rendering-wise, there’s no difference. But this poses a problem with game physics. If you smash your head (while wearing AR glasses) into a virtual box, the box should go flying. If the box is thrown at you, it should bounce off your head–big distinction!

Pose and position tracking has the potential to factor out the user’s movement and determine the motion of both the observer and the objects that are being tracked. This can then be fed into a game engine’s physics system to get accurate physics interactions between the observer and virtual objects.
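Factoring out the observer's movement can be sketched as follows. This is plain Python rather than any Tango or Unity API, and it assumes two inputs per frame: the device pose in world space (from positional tracking) and the tracked object's position relative to the camera (from image tracking). The names, the yaw-only rotation, and the threshold are hypothetical simplifications.

```python
import math

def to_world(device_pos, device_yaw, target_in_camera):
    """Express a camera-relative position in world coordinates (yaw about y)."""
    c, s = math.cos(device_yaw), math.sin(device_yaw)
    x, y, z = target_in_camera
    return (device_pos[0] + c * x + s * z,
            device_pos[1] + y,
            device_pos[2] - s * x + c * z)

def who_moved(prev, curr, eps=0.01):
    """Each frame is (device_pos, device_yaw, target_in_camera).

    Returns (device_moved, target_moved) between the two frames.
    """
    device_moved = (math.dist(prev[0], curr[0]) > eps
                    or abs(prev[1] - curr[1]) > eps)
    # Once the device pose is factored out, any change left is the target's own.
    target_moved = math.dist(to_world(*prev), to_world(*curr)) > eps
    return device_moved, target_moved

before = ((0.0, 0.0, 0.0), 0.0, (0.0, 0.0, 2.0))
walked = ((0.0, 0.0, 1.0), 0.0, (0.0, 0.0, 1.0))   # observer stepped forward
pushed = ((0.0, 0.0, 0.0), 0.0, (0.5, 0.0, 2.0))   # target slid sideways
print(who_moved(before, walked), who_moved(before, pushed))
```

Without the device pose, the `walked` and `pushed` cases both look like "the target changed position in the camera image", which is exactly the ambiguity described above.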

OCCLUDING VIRTUAL CHARACTERS WITH THE REAL WORLD

Anyway, that’s kind of an esoteric problem. The biggest issue with AR is most solutions can only overlay graphics on top of a scene. As you can see in my Ether Drift project, the characters appear on top of specially designed trading cards. However, wave your hand in front of the characters, and they will still draw on top of everything.

Ether Drift uses Vuforia to superimpose virtual characters on top of trading cards.

With Tango, it is possible to reconstruct the 3D geometry of your surroundings using point cloud data received from the depth camera. Matterport already has an impressive demo of this running on the Tango. It allows the user to scan an area with the tablet (very slowly) and it will build a textured mesh out of what it sees. When meshing is turned off the tablet can detect precisely where it is in the saved environment mesh.

This geometry can possibly be used in Unity3D as a mesh collider which is also rendered to the depth buffer of the scene’s camera while displaying the tablet camera’s video feed. This means superimposed augmented reality characters can accurately collide with the static environment, as well as be occluded by real world objects. Characters can now not only appear on top of your table, but behind it–obscured by a chair leg.
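The occlusion part boils down to a per-pixel depth comparison, which the GPU's depth buffer performs automatically once the environment mesh has been drawn into it. The sketch below spells out that comparison in plain Python for a single row of pixels; it is an illustration of the principle with made-up names and values, not Unity or Tango code.

```python
def composite_pixel(camera_color, real_depth, virtual_color, virtual_depth):
    """Show the virtual pixel only when it is closer than the real surface.

    virtual_depth is None where no virtual object covers the pixel.
    """
    if virtual_depth is not None and virtual_depth < real_depth:
        return virtual_color
    return camera_color

def composite_row(camera_row, real_depths, virtual_row, virtual_depths):
    """Apply the depth test across one row of the image."""
    return [
        composite_pixel(c, rd, v, vd)
        for c, rd, v, vd in zip(camera_row, real_depths, virtual_row, virtual_depths)
    ]

# A character 2 m away; a chair leg 1 m away covers the middle pixel.
camera = ["wall", "chair", "wall"]
real = [3.0, 1.0, 3.0]
virtual = ["hero", "hero", None]
virtual_d = [2.0, 2.0, None]
print(composite_row(camera, real, virtual, virtual_d))
```

The middle pixel keeps the camera image because the chair leg is nearer than the character, so the character appears to stand behind it.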

ENVIRONMENTAL LIGHTING

Finally, this solves the challenge of how to properly light AR objects. Most AR apps assume there’s a light source on the ceiling and place a directional light pointing down. With a mesh built from local point cloud data, you can generate a panoramic render of where the observer is standing in the real world. This image can be used as a cube map for Image-based lighting systems like Marmoset Skyshop. This produces accurate lighting on 3D objects which when combined with environmental occlusion makes this truly a next generation AR experience.

A QUICK TEST

The first thing I did with the Unity SDK is drop the Tango camera in a Camera Birds scene. One of the most common requests for Camera Birds was to be able to walk through the forest instead of just rotating in place. It took no programming at all for me to make this happen with Tango.

This technology still has a long way to go–it has to become faster and more precise. Luckily, Movidius has already produced the Myriad 2, which is reportedly 3-5X faster and 20X more power efficient than the chip currently in the Tango prototypes. Vision Processing technology is a supremely nerdy topic–after all it’s literally rocket science. But it has far-reaching implications for wearable platforms.

The Next Problems to Solve in Augmented Reality

I’m totally amped up about Project Tango. After having worked with augmented reality for a few years, most of the problems I’ve seen with current platforms could be solved with a miniaturized depth-sensing Kinect-style sensor. The Myriad 1 is a revolutionary chip that will dramatically change the quality of experience you get from augmented reality applications–both on mobile devices and wearables.

There are a few other issues in AR I’d like to see addressed. Perhaps they are in research papers, but I haven’t seen anything real yet. Maybe they require some custom hardware as well.

Real-world lighting simulation.

One of the reasons virtual objects in augmented reality look fake is that AR APIs can’t simulate the real-world lighting environment in a 3D engine. For most applications, you place a directional light pointing down and turn up the ambient light for a vague approximation of overhead lighting. This is assuming the orientation of the object you’re tracking is upright, of course.

Camera Birds AR mode using an overhead directional light.

What I’d really like to use is Image Based Lighting. Image based Lighting is a computationally efficient way to simulate environmental lighting without filling a scene up with dynamic lights. It uses a combination of cube maps built from HDR photos with custom shaders to produce great results. A good example of this is the Marmoset Skyshop plug-in for Unity3D.

Perhaps with a combination of sensors and 360 cameras you can build HDR cubemaps out of the viewer’s local environment in real-time to match environmental lighting. Using these with Image Based Lighting will be a far more accurate lighting model than what’s currently available. Maybe building rudimentary cubemaps out of the video feed is a decent half-measure.
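As a rough illustration of the half-measure, the sketch below buckets color samples from the video feed into the six faces of a cube by the direction each sample was seen from, then averages each face. This is a very crude, low-dynamic-range stand-in for a real HDR cubemap. It assumes the device orientation is known so each sample has a world-space view direction, and all names are hypothetical.

```python
def cube_face(direction):
    """Pick the cube face a direction points at: the axis with the largest magnitude."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "+x" if x > 0 else "-x"
    if ay >= az:
        return "+y" if y > 0 else "-y"
    return "+z" if z > 0 else "-z"

def rudimentary_cubemap(samples):
    """Average (r, g, b) samples per face; samples are (direction, color) pairs.

    Directions must be non-zero. Faces that were never seen are simply absent.
    """
    sums = {}
    for direction, color in samples:
        face = cube_face(direction)
        total, count = sums.get(face, ((0.0, 0.0, 0.0), 0))
        sums[face] = (tuple(t + c for t, c in zip(total, color)), count + 1)
    return {face: tuple(t / n for t in total) for face, (total, n) in sums.items()}

samples = [
    ((0.0, 1.0, 0.0), (1.0, 1.0, 0.8)),   # bright ceiling light
    ((0.1, 0.9, 0.0), (0.8, 0.8, 0.6)),
    ((0.0, 0.0, -1.0), (0.2, 0.2, 0.2)),  # dim wall behind the viewer
]
print(rudimentary_cubemap(samples))
```

Even six average colors carry more information than a fixed overhead light, since they capture which side of the room is bright and roughly what tint the light has.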

Which object is moving?

In a 3D engine, virtual objects drawn on top of image targets are rendered in one of two ways. Either the camera is moving around the object, or the object is moving around the camera. In real life, the ‘camera’ is your eye–so it should move if you move your head. If you move an image target, that is effectively moving the virtual object.

Current AR APIs have no way of knowing whether the camera or the object is moving. With Qualcomm’s Vuforia, you can either tell it to always move the camera around the object, or to move the objects around the camera. This can cause problems with lighting and physics.

For instance, on one project I was asked to make liquid pour out of a virtual glass when you tilt the image target it rests upon. To do this I had to force Vuforia to assume the image target was moving–so when the image target tilted, so would the 3D object in the game engine, and the liquid would pour. The only problem is, this would also happen if I moved the phone instead. Vuforia can’t tell what’s actually moving.

There needs to be a way to accurately track the ‘camera’ movement of either the wearable or mobile device so that in the 3D scene the camera and objects can be positioned accurately. This will allow for lighting to be realistically applied and for moving trackable objects to behave properly in a 3D engine. Especially with motion tracking advances such as the M7 chip, I suspect there are some good algorithmic solutions to factoring out the movement of the object and the observer to solve this problem.

Anyway, these are the kinds of problems you begin to think about when staring at augmented reality simulations for years. Once you get over the initial appeal of AR’s gimmick, the practical implications of the technology pose many questions. I’ve applied for my Project Tango devkit and really hope I get my hands on one soon!