The Challenge of Building Augmented Reality Games In The Real World

InnAR Wars splash image.

Last week I submitted the prototype build of my latest augmented reality project, InnAR Wars, to Google’s Build a Tango App Contest. It’s an augmented reality multiplayer space RTS built for Google’s Tango tablet that utilizes the environment around you as a game map. The game uses the Tango’s camera and Area Learning capabilities to superimpose an asteroid-strewn space battlefield over your real-world environment. Two players holding Tangos walk around the room hunting for each other’s bases while sending attack fleets at the other player’s structures.

Making InnAR Wars fun is tricky because I essentially have no control over the map. The battlefield has to fit inside the confines of the real-world environment the tablets are in. By combining the Tango’s Area Learning capabilities with the players’ tracked positions, I can estimate the rough size of the play area. With this information I adjust the density of planetoids and asteroids to match the room. It’s one small way I can make sure the game has an interesting number of objects in the playfield regardless of the size of the area. As you can see from the videos in this post, it’s already being played in a variety of environments.
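
As a rough sketch of that scaling idea (an illustration only, not the actual InnAR Wars code; the bounds, prefab, and tuning numbers are placeholders), the spawner only needs the measured play area:

```csharp
using UnityEngine;

// Sketch: spawn more asteroids in bigger rooms so the playfield never feels empty.
// `playAreaBounds` stands in for whatever extents you derive from Area Learning
// and the players' tracked positions.
public class AsteroidFieldSpawner : MonoBehaviour
{
    public GameObject asteroidPrefab;           // placeholder prefab
    public float asteroidsPerCubicMeter = 0.5f; // placeholder tuning value
    public int minAsteroids = 10;
    public int maxAsteroids = 80;

    public void Spawn(Bounds playAreaBounds)
    {
        float volume = playAreaBounds.size.x * playAreaBounds.size.y * playAreaBounds.size.z;
        int count = Mathf.Clamp(
            Mathf.RoundToInt(volume * asteroidsPerCubicMeter), minAsteroids, maxAsteroids);

        for (int i = 0; i < count; i++)
        {
            // Scatter asteroids uniformly inside the measured play area.
            Vector3 pos = new Vector3(
                Random.Range(playAreaBounds.min.x, playAreaBounds.max.x),
                Random.Range(playAreaBounds.min.y, playAreaBounds.max.y),
                Random.Range(playAreaBounds.min.z, playAreaBounds.max.z));
            Instantiate(asteroidPrefab, pos, Random.rotation);
        }
    }
}
```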

This brings up the biggest challenge of augmented reality games: how do you make a game fun when you have absolutely no control over the environment in which it’s played? One way is to require the user to set up the play space as if she were playing a board game. By using Tango’s depth camera, you could detect the shapes and sizes of objects on a table and use those as the playfield. It’s up to the user to set it up in a way that’s fun–much like playing a tabletop war game.

For the final release, I’m planning on using Tango’s depth camera to figure out where the room’s walls, ceilings, and floors are. Then I can have ships launch from portals that appear to open on the surfaces of the room. Dealing with the limited precision and performance of the Tango depth camera along with the linear algebra involved in plane estimation is a significant challenge. Luckily, there are a few third-party solutions for this I’m evaluating.
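
For context, plane estimation from a depth point cloud usually comes down to a RANSAC-style fit. This is a minimal sketch of the general technique (my own illustration, not one of the third-party solutions I’m evaluating):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Minimal RANSAC plane-estimation sketch for a depth-camera point cloud.
// This illustrates the general technique only; it is not Tango SDK code.
// `points` would come from the device's depth data.
public static class PlaneEstimator
{
    // Returns true if a dominant plane (n·x + d = 0) is found.
    public static bool EstimatePlane(
        List<Vector3> points, int iterations, float inlierThreshold,
        out Vector3 planeNormal, out float planeDistance)
    {
        planeNormal = Vector3.up;
        planeDistance = 0f;
        if (points == null || points.Count < 3) return false;

        int bestInliers = 0;
        for (int i = 0; i < iterations; i++)
        {
            // Fit a candidate plane through three random points.
            Vector3 a = points[Random.Range(0, points.Count)];
            Vector3 b = points[Random.Range(0, points.Count)];
            Vector3 c = points[Random.Range(0, points.Count)];
            Vector3 n = Vector3.Cross(b - a, c - a);
            if (n.sqrMagnitude < 1e-6f) continue; // degenerate (collinear) sample
            n.Normalize();
            float d = -Vector3.Dot(n, a);

            // Count points close enough to the candidate plane.
            int inliers = 0;
            for (int j = 0; j < points.Count; j++)
            {
                if (Mathf.Abs(Vector3.Dot(n, points[j]) + d) < inlierThreshold)
                    inliers++;
            }

            if (inliers > bestInliers)
            {
                bestInliers = inliers;
                planeNormal = n;
                planeDistance = d;
            }
        }

        // Require a reasonable fraction of the cloud to support the plane.
        return bestInliers > points.Count / 4;
    }
}
```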

Especially when looking at augmented reality startups’ obligatory fake demo videos, the future of AR gaming seems exciting. But the practical reality of designing a game to be played in reality–which is itself rather poorly designed–can prevent even the most amazing technology from enabling great games. It’s probably going to take a few more hardware generations to not only make the technology usable, but also develop the design language to make great games that work in AR.

If you want to try out the game, I’ll have a few Tangos on hand at FLARB’s VRLA Summer Expo table. Stop by and check it out!

How To Support Gear VR and Google Cardboard In One Unity3D Project

Google Cardboard is a huge success. Cardboard’s userbase currently dwarfs that of Gear VR. Users, investors, and collaborators who don’t have access to Gear VR often ask for Cardboard versions of my games. As part of planning what to do next with Caldera Defense, I decided to create a workflow to port between Gear VR and Cardboard.

I keep a Cardboard on me at ALL TIMES!

I used my VR Jam entry, Duck Pond VR, as a test bed for my Unity3D SDK switching scripts. It’s much easier to do this on a new project. Here’s how I did it:

Unity 4 vs. Unity 5

Google Cardboard supports Unity 4 and Unity 5. Although Oculus’ mobile SDK will technically work on Unity 5, you can’t ship with it because bugs in the current version of Unity 5 cause memory leaks and other issues on the Gear VR hardware. Unity is working on a fix but I haven’t heard any ETA on Gear VR support in Unity 5.

This is a bummer since the Cardboard SDK for Unity 5 supports skyboxes and other features in addition to the improvements Unity 5 has over 4. Unfortunately you’re stuck with Unity 4 when making a cross-platform Gear VR and Cardboard app.

Dealing With Cardboard’s Lack of Input

Although Gear VR’s simplistic touch controls are a challenge to develop for, the vast majority of Cardboards have no controls at all! Yes, Google Cardboard includes a clever magnetic trigger for a single input event. Yet, the sad fact is that on many Android devices the magnetometer can’t register it reliably, so you can’t count on it.

You have a few other control options that are universal to all Android devices: the microphone and Bluetooth controllers. By keeping the microphone open, you can use loud sounds (such as a shout) to trigger an action. You can probably use something like the Pitch Detector plug-in for this. Or, if your cardboard has a head strap for hands-free operation, you can use a Bluetooth gamepad for controls.
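
A rough sketch of the microphone approach (a generic loudness gate, not the Pitch Detector plug-in; the threshold and window size are arbitrary placeholders):

```csharp
using UnityEngine;

// Sketch: keep the mic open and fire an action when the input gets loud enough.
public class ShoutTrigger : MonoBehaviour
{
    public float loudnessThreshold = 0.3f; // placeholder value, tune per device
    private AudioClip micClip;
    private const int SampleWindow = 256;

    void Start()
    {
        // null = default microphone; 10 second looping buffer at 44.1 kHz.
        micClip = Microphone.Start(null, true, 10, 44100);
    }

    void Update()
    {
        int micPosition = Microphone.GetPosition(null) - SampleWindow;
        if (micPosition < 0) return;

        float[] samples = new float[SampleWindow];
        micClip.GetData(samples, micPosition);

        // Peak level over the last window of samples.
        float peak = 0f;
        for (int i = 0; i < SampleWindow; i++)
            peak = Mathf.Max(peak, Mathf.Abs(samples[i]));

        if (peak > loudnessThreshold)
            OnShout();
    }

    void OnShout()
    {
        Debug.Log("Shout detected - trigger the action here.");
    }
}
```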

Because of this general lack of input, many Cardboard apps use what I call “stare buttons” for GUIs. These are buttons that trigger if you look at them long enough. I’ve implemented my own version. The prefab is here, the code is here. It even hooks into the new Unity UI event system so you can use it with my Oculus world space cursor code.
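
The linked prefab and code are the real implementation; as a rough approximation, the core of a stare button is just a gaze raycast and a timer:

```csharp
using UnityEngine;
using UnityEngine.Events;

// Sketch of a "stare button": fires after the camera's forward ray has rested
// on this object's collider for long enough. Not the actual linked prefab code.
public class StareButton : MonoBehaviour
{
    public float dwellTime = 2f;          // seconds of gaze needed to trigger
    public UnityEvent onStareComplete;    // hook up the click action in the inspector

    private float gazeTimer;

    void Update()
    {
        Transform head = Camera.main.transform;
        RaycastHit hit;
        bool gazedAt = Physics.Raycast(head.position, head.forward, out hit, 100f)
                       && hit.collider.gameObject == gameObject;

        gazeTimer = gazedAt ? gazeTimer + Time.deltaTime : 0f;

        if (gazeTimer >= dwellTime)
        {
            gazeTimer = 0f;
            if (onStareComplete != null) onStareComplete.Invoke();
        }
    }
}
```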

Gear VR apps must be redesigned to fit within Cardboard’s constraints, whether that’s the limited controls or the performance limits of low-end devices. Most of my Cardboard ports are slimmed-down Gear VR experiences. In the case of Caldera Defense, I’m designing a simplified auto-firing survival mode for the Cardboard port. I’ll merge this mode back into the Gear VR version as an extra game mode in the next update.

Swapping SDKs

This is surprisingly easy. You can install the Cardboard and Gear VR SDKs in a single Unity project with almost no problems. The only conflict is they both overwrite the Android manifest in the plugin folder. I wrote an SDK swapper that lets you switch between the Google Cardboard and Oculus manifests before you do a build. You can get it here. This editor script has you pick where each manifest file is for Cardboard and Gear VR and will simply copy over the appropriate file to the plugin folder. Of course for iOS Cardboard apps this isn’t an issue.
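
The linked swapper is the real tool; conceptually it boils down to an editor script along these lines (the manifest paths here are placeholders, and the script belongs in an Editor folder):

```csharp
using System.IO;
using UnityEditor;
using UnityEngine;

// Sketch of the manifest-swapping idea: keep a Cardboard manifest and a Gear VR
// manifest outside Assets/Plugins/Android, and copy the right one in before building.
// The paths below are placeholders, not the layout the actual swapper uses.
public static class ManifestSwapper
{
    const string TargetManifest = "Assets/Plugins/Android/AndroidManifest.xml";

    [MenuItem("Tools/VR/Use Cardboard Manifest")]
    static void UseCardboardManifest()
    {
        CopyManifest("Assets/Manifests/AndroidManifest.Cardboard.xml");
    }

    [MenuItem("Tools/VR/Use Gear VR Manifest")]
    static void UseGearVRManifest()
    {
        CopyManifest("Assets/Manifests/AndroidManifest.GearVR.xml");
    }

    static void CopyManifest(string sourcePath)
    {
        File.Copy(sourcePath, TargetManifest, true); // overwrite the active manifest
        AssetDatabase.Refresh();                     // let Unity pick up the change
        Debug.Log("Switched Android manifest to " + sourcePath);
    }
}
```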

Supporting Both Prefabs

Both Oculus and Cardboard have their own prefabs that represent the player’s head and eye cameras. In Caldera Defense, I originally attached a bunch of game objects to the player’s head to use for traces, GUI positioning, HUDs, and other things that need the player’s head position and orientation. To make these work with both the Cardboard and Oculus prefabs, I moved everything that hangs off the head onto a separate prefab that gets attached to the Cardboard or Oculus head model at runtime.
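
A minimal sketch of that runtime attachment (the object names used to find the head are placeholders; the real SDK prefabs expose their own head transforms):

```csharp
using UnityEngine;

// Sketch: attach a shared "head rig" prefab (trace origins, HUD anchors, GUI
// positioning helpers) to whichever SDK's head object is in the scene at runtime.
public class HeadRigAttacher : MonoBehaviour
{
    public GameObject headRigPrefab; // prefab holding everything that follows the head

    void Start()
    {
        // Look for whichever SDK head object exists in this scene (placeholder names).
        GameObject head = GameObject.Find("OculusHead");
        if (head == null)
            head = GameObject.Find("CardboardHead");

        if (head == null)
        {
            Debug.LogWarning("No VR head object found in scene.");
            return;
        }

        // Parent the rig so HUDs, traces, etc. inherit head position and orientation.
        GameObject rig = (GameObject)Instantiate(headRigPrefab);
        rig.transform.parent = head.transform;
        rig.transform.localPosition = Vector3.zero;
        rig.transform.localRotation = Quaternion.identity;
    }
}
```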

Wrapping Both APIs

Not only do both SDKs have similar prefabs for the head model, they also have similar APIs. In both the Cardboard and Oculus versions, I need to refer to the eye and head positions for various operations. To handle this, I created a simple class that detects which prefab is present in the scene and grabs the respective component to wrap the eye position reference around. The script is in the prefab’s package.
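
The script in the prefab’s package is the real thing; the general shape of such a wrapper is roughly this (again, the head object names are placeholders):

```csharp
using UnityEngine;

// Sketch of an SDK-agnostic head wrapper: detect which SDK's head object is in
// the scene once, then expose a single position/forward API to gameplay code.
// "OculusHead" and "CardboardHead" are placeholder names for the SDK objects.
public static class VRHead
{
    static Transform headTransform;

    public static Vector3 Position { get { return Head().position; } }
    public static Vector3 Forward  { get { return Head().forward;  } }

    static Transform Head()
    {
        if (headTransform == null)
        {
            // Whichever SDK prefab is in the scene wins; fall back to the main camera.
            GameObject head = GameObject.Find("OculusHead") ?? GameObject.Find("CardboardHead");
            headTransform = head != null ? head.transform : Camera.main.transform;
        }
        return headTransform;
    }
}
```

Gameplay code can then call VRHead.Position or VRHead.Forward without caring which SDK is active.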


For the final step, I made separate Cardboard versions of all my relevant Gear VR scenes, which include the Cardboard prefabs along with the modified gameplay and interfaces. If no actual Oculus SDK code is referenced by any of the classes used in the Cardboard version, the Oculus SDK should be stripped out of that build and you’ll have no problem running on Cardboard. This probably means I really need to make Oculus- and Cardboard-specific versions of that CameraBody script.

The upcoming Unity 5.1 includes native Oculus support which may make this process a bit more complicated. Until then, these steps are the best way I can find to support both Cardboard and Gear VR in one project. I’m a big fan of mobile VR, and I think it’s necessary for any developer at this early stage of the market to get content out to as many users as possible.

The Coming Public Point Cloud

One of the most important elements of Augmented Reality is the ability to seamlessly mesh 3D graphics with the real world.  Current AR technology simply overlays graphics on top of video–even when tracking and recognizing objects like cards and markers. The AR SDK gives the position and orientation of the tracked object to a 3D engine which then renders geometry on top of the video frame coming from the device’s camera.

A 3D scan of myself overlaid on an AR card with Vuforia.

New technologies like Google’s Tango Tablet use Kinect-style depth cameras to store not only the color of each pixel, but the depth and position, too. (Well, sort of–the depth camera’s resolution is much lower than that of the color camera). This means that you can build a 3D model out of what the tablet’s camera sees as you move around an environment.

Tango displaying point cloud data of what it currently sees.

This feature has huge ramifications. Tango uses this data to do what is called “localization.” This means once an area is scanned, the tablet can compare the internal 3D model of the current environment it has stored with what the camera is currently seeing. When fused with compass and gyro data, the Tango tablet can compute its precise location inside the scanned area. This doesn’t take long either. Tango starts building the model immediately. Walk back to where you started using the tablet, and Tango knows where it is.

Usually this 3D data is stored as a point cloud. This is basically a 3D point for every position the 3D camera records.  Hence, a sufficiently complicated area will look like a cloud of dots–a point cloud. You can see an example of the Tango building a point cloud with the Room Scanner Tango app.

These point clouds are important for not only localization, but AR graphical effects such as occluding rendered 3D objects with the real world.  This is because a 3D mesh can be built out of these points which can be used for occlusion, collision and other features. Having objects in between you and the augmentation occlude the 3D render is essential to nailing the feeling that an AR object is really there.

Point clouds are awesome, but building them can be frustrating. Current point cloud scanners are bulky and slow, not to mention their accuracy issues can lead to jitter and other artifacts. Also, some depth cameras run at a frame rate low enough to make it hard to create point clouds without moving very slowly through an environment. Who wants to play a game where you have to walk around and meticulously scan a room before you can start?

In order for AR games and apps to succeed, devices need to effortlessly be able to sense and detect the 3D geometry of their surroundings. Yet, quick and instant generation of point clouds is far beyond the capabilities of current mobile sensor technology.

That’s where the public point cloud comes in.

A truly great Augmented Reality platform needs to upload point clouds generated by devices to the cloud. Then, when a user puts on some hot new pair of wearable AR glasses, the device can pull down a pre-made point cloud for its current location from a server and use that until it can update the cloud from its own sensors. The device then uploads a fresh point cloud, which is used to refine the version stored online.

You can kind of see this already–Google and Apple Maps’ 3D satellite mode use similar point cloud reconstruction techniques presumably from aerial photos and other sources. Whereas these 3D models often look like something you’d see on the original PlayStation, the public point cloud will have to be much more detailed.  As sensors on mobile devices become more advanced, the crowdsourced point cloud data will become incredibly detailed.

Apple Maps’ 3D reconstruction kind of looks like an original PlayStation game. The public point cloud will have to be higher resolution. Oh, this is also where you can get the best pastrami sandwich on the planet.

A massive, publicly accessible point cloud is necessary not just for the next generation of AR wearable devices, but also for self-driving cars, drone navigation, and robotics (which is indeed where many of these algorithms came from in the first place). Privacy implications do exist, but perhaps no more so than with Google Maps’ Street View or other current technologies that give you very precise information about your location.

In the near future, almost every public place on the planet will be stored in the cloud as 3D reconstructed geometry–passively built up and constantly refined by sensors embedded on countless mobile and wearable devices, perhaps without the user even knowing.

My Week With Project Tango

A few weeks back I got into Google’s exclusive Project Tango developers program. I’ve had a Tango tablet for about a week and have been experimenting with the available apps and Unity3D SDK.

Project Tango uses Movidius’ Myriad 1 Vision Processor chip (or “VPU”), paired with a depth camera not unlike the original Kinect for the Xbox 360. Except instead of being a giant hideous block, it’s small enough to stick in a phone or tablet.

I’m excited about Tango because it’s an important step in solving many of the problems I have with current Augmented Reality technology. What issues can Tango solve?


First, the Tango tablet can determine its own pose. Sure, pretty much every mobile device out there can detect its precise orientation by fusing together compass and gyro information. But by using the Tango’s array of sensors, the Myriad 1 processor can also detect position and translation. You can walk around with the tablet and it knows how far and where you’ve moved. This makes SLAM algorithms much easier to develop and more precise than strictly optical solutions.

Another problem with AR as it exists now is that there’s no way to know whether you or the image target moved. Rendering-wise, there’s no difference. But this poses a problem for game physics. If you smash your head (while wearing AR glasses) into a virtual box, the box should go flying. If the box is thrown at you, it should bounce off your head–big distinction!

Pose and position tracking has the potential to factor out the user’s movement and determine the motion of both the observer and the objects that are being tracked. This can then be fed into a game engine’s physics system to get accurate physics interactions between the observer and virtual objects.
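
As a purely illustrative sketch (not Tango SDK code; the tracked pose would drive the observer transform, and the script assumes it sits on an object with a collider and a kinematic rigidbody):

```csharp
using UnityEngine;

// Sketch: because the device's pose is tracked, we know the observer's own velocity
// and can hand a physically plausible relative velocity to the physics engine.
// `observer` would be driven by the tracked pose data.
public class ObserverCollider : MonoBehaviour
{
    public Transform observer;        // tracked device/head transform
    private Vector3 lastPosition;
    private Vector3 observerVelocity;

    void Start()
    {
        lastPosition = observer.position;
    }

    void FixedUpdate()
    {
        // Differentiate the tracked pose to get the observer's velocity.
        observerVelocity = (observer.position - lastPosition) / Time.fixedDeltaTime;
        lastPosition = observer.position;

        // Keep this kinematic collider glued to the observer's head.
        transform.position = observer.position;
    }

    void OnCollisionEnter(Collision collision)
    {
        Rigidbody body = collision.rigidbody;
        if (body == null) return;

        // The velocity of the head relative to the object decides who "hit" whom.
        Vector3 relative = observerVelocity - body.velocity;
        body.AddForce(relative, ForceMode.VelocityChange);
    }
}
```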


Anyway, that’s kind of an esoteric problem. The biggest issue with AR is most solutions can only overlay graphics on top of a scene. As you can see in my Ether Drift project, the characters appear on top of specially designed trading cards. However, wave your hand in front of the characters, and they will still draw on top of everything.

Ether Drift uses Vuforia to superimpose virtual characters on top of trading cards.

With Tango, it is possible to reconstruct the 3D geometry of your surroundings using point cloud data received from the depth camera. Matterport already has an impressive demo of this running on the Tango. It allows the user to scan an area with the tablet (very slowly) and it will build a textured mesh out of what it sees. When meshing is turned off the tablet can detect precisely where it is in the saved environment mesh.

This geometry can possibly be used in Unity3D as a mesh collider which is also rendered to the depth buffer of the scene’s camera while displaying the tablet camera’s video feed. This means superimposed augmented reality characters can accurately collide with the static environment, as well as be occluded by real world objects. Characters can now not only appear on top of your table, but behind it–obscured by a chair leg.
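
A sketch of how that wiring might look in Unity (the depth-only material is an assumption: a shader that writes to the depth buffer but draws no color):

```csharp
using UnityEngine;

// Sketch: take a reconstructed environment mesh and make it (a) solid for physics
// and (b) an invisible occluder that still writes depth, so virtual characters get
// hidden behind real furniture while the camera's video feed shows through.
public class EnvironmentOccluder : MonoBehaviour
{
    public Material depthOnlyMaterial; // assumed depth-write, no-color shader

    public void Apply(Mesh reconstructedMesh)
    {
        // Collision against the real-world geometry.
        MeshCollider col = gameObject.AddComponent<MeshCollider>();
        col.sharedMesh = reconstructedMesh;

        // Render the mesh with the depth-only material so it occludes virtual objects.
        MeshFilter filter = gameObject.AddComponent<MeshFilter>();
        filter.sharedMesh = reconstructedMesh;
        MeshRenderer meshRenderer = gameObject.AddComponent<MeshRenderer>();
        meshRenderer.sharedMaterial = depthOnlyMaterial;
    }
}
```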


Finally, this solves the challenge of how to properly light AR objects. Most AR apps assume there’s a light source on the ceiling and place a directional light pointing down. With a mesh built from local point cloud data, you can generate a panoramic render of where the observer is standing in the real world. This image can be used as a cube map for Image-based lighting systems like Marmoset Skyshop. This produces accurate lighting on 3D objects which when combined with environmental occlusion makes this truly a next generation AR experience.
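
A minimal sketch of capturing that cube map in Unity (how you hand it to Skyshop or another IBL system depends on the shader; "_Cube" below is just a common reflection-map property name, not necessarily Skyshop’s):

```csharp
using UnityEngine;

// Sketch: render a cubemap from the observer's position so AR objects can be lit
// by an approximation of the real room. Assumes the reconstructed environment mesh,
// textured from the camera feed, is already in the scene.
public class LocalEnvironmentProbe : MonoBehaviour
{
    public int cubemapSize = 128;

    public Cubemap CaptureAt(Vector3 observerPosition)
    {
        // Temporary camera used to render the six cubemap faces.
        GameObject go = new GameObject("CubemapCamera");
        go.transform.position = observerPosition;
        Camera cam = go.AddComponent<Camera>();

        Cubemap cubemap = new Cubemap(cubemapSize, TextureFormat.RGB24, false);
        cam.RenderToCubemap(cubemap);

        Destroy(go);
        return cubemap;
    }
}
```

The resulting cubemap can then be assigned to whatever reflection slot the lighting shader expects, for example material.SetTexture("_Cube", cubemap) on a typical reflective material.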


The first thing I did with the Unity SDK is drop the Tango camera in a Camera Birds scene. One of the most common requests for Camera Birds was to be able to walk through the forest instead of just rotating in place. It took no programming at all for me to make this happen with Tango.

This technology still has a long way to go–it has to become faster and more precise. Luckily, Movidius has already produced the Myriad 2, which is reportedly 3-5X faster and 20X more power efficient than the chip currently in the Tango prototypes. Vision Processing technology is a supremely nerdy topic–after all it’s literally rocket science. But it has far reaching implications for wearable platforms.

Big Data Bootstrapping Beware

I suppose this is a dumb observation, but the one thing I learned in building ZenThousand is that bootstrapping a big data startup can be expensive. Obviously, it’s due to all of that data you have to deal with before having a single user.

First, there’s the problem of collecting the data. In the case of ZenThousand, I am looking for social network profiles of programmers. Although sites like Github, LinkedIn, and others have collected a treasure trove of personal details on engineers, it’s not like they are just going to let you walk in and take it!

In LinkedIn’s case, over half of their revenue comes from their recruiting features. Essentially, the information they have on you is worth nearly a billion dollars, which is why use of their people search API comes with strict licensing restrictions. Github and other social sites don’t let you do simple searches for users either–you have to use whatever tools their APIs give you to sniff info out.

Collecting your own data can be very expensive. Data intensive services such as mapping require massive effort. This is why companies sitting on large datasets are so valuable. People scoff at Foursquare’s valuation, but while the app might not have great user numbers the location database they’ve built is of immense value.

Secondly, there’s the cost to store and process all of this data. With most startups, the amount of data you store is directly proportional to the number of users you have. Scaling issues become a so-called “good problem to have” since they usually mean your app has a lot of traction. If these are paying users, even better: your data costs are totally covered.

With a big data startup, you have massive amounts of data to store and process with no users. This gets costly really fast. In my case, Google App Engine service fees quickly became prohibitively expensive. My future strategy involves moving off of GAE and on to either Google Compute Engine or a physical box. I know of at least one big data startup that migrated out of the cloud to a colocation facility for both cost and performance reasons.

This doesn’t mean big data isn’t possible without a large investment. It’s just that two of the first big problems you need to solve are how to cost-effectively collect and analyze lots of data before you have any revenue.

Donut Vision: Google App Engine Experiments 2

Some (well, very few) of you may remember my previous post on Google App Engine. Developing a GAE app using JSP was a trip down memory lane, using a technology that has seemingly been left unchanged since 2001.

I recently began a project that involves using Vine and Twitter to sort through video clips. I decided to build on Google App Engine again. This time I’m using Python. My initial hacking has resulted in Donut Vision–a search portal for donut videos on Vine. Hey, don’t laugh. These guys are trying to build an actual business off of the same type of site, presumably with cokehead money.

Using Python (GAE’s original language) has been an absolute pleasure. On GAE, it really does seem much faster than using Java. GAE’s built-in webapp2 framework and Django templates make building sites and APIs a breeze. I swear not having to type brackets has given me some kind of minor productivity boost. Or not. But placebo is a real thing.

My general “get off my lawn” nitpicks with Python mostly come down to it being dynamically yet strongly typed. This gives PyDev in Eclipse trouble with autocomplete, since it really doesn’t know what type you’re referring to in most cases. PyDev and Eclipse are still a decent combination because of the convenience of deploying to GAE from within the IDE. I’d switch to something else with better autocomplete support, though.

As for the details of how this works, it’s really pretty simple. There’s no Vine API yet, so I simply use the Twitter API to search for Vines with relevant hashtags and pull the URLs out of them. Originally I was using Vine’s new embed code to display videos, but I eventually resorted to grabbing the URL of the MP4 file in the S3 bucket it’s stored in to have more control over the video when playing it with video-js. I expect Vine to shut down this method since I’m just running up their AWS bill with no benefit to them–not even a link back to the Vine app. Hey, if Vine provides a proper API, I’d use it.

Oh also, in my earlier post I stated that Google App Engine is not available in China. This is only partially true. The default appspot domain is indeed blocked in China. Yet, when I run my custom domain through a China accessibility checker, I get nothing but green status. Yes, I’m boldly sparking a democratic revolution one French Cruller at a time. So, if you want to serve Chinese customers via GAE, just map a custom domain to it.

I’m seriously considering using Google App Engine as a backend for a new game. The only problem is cost estimation. I have constant paranoia of real-world usage patterns running up my bill. Especially with improperly indexed datastores, you can rack up charges pretty fast. Still, simply writing an app and uploading it to Google’s cloud is significantly easier than fiddling with Amazon Web Services and Beanstalk. If you haven’t checked it out since the early days, GAE is worth another look.

Oh, also the latest version of GAE has sockets support. It’s still experimental, but this may lead to GAE being suitable for real-time applications such as multiplayer game servers.

Google App Engine Experiments

For a while I’ve been complaining about the fact that sites such as AppAnnie and AppFigures don’t send daily summary emails of not just your apps, but the top apps in the App Store. I want to know what’s trending and topping the charts every day.

I could have made something in PHP to do this in a matter of hours, but I like to use side projects to learn something new. As an excuse to learn Google App Engine I built UpTopR: a site that emails a daily summary of the top 10 apps for iOS and iPad. It’s slow and ugly, but does what I need it to.

I used the Java API since I couldn’t find a way to deploy Python projects to GAE as easily as the Google plug-in for Eclipse does. I only had to learn how to use Google’s NoSQL App Engine Datastore and caching APIs. Otherwise, getting up and running on GAE is as easy, if not easier, than deploying a servlet on Tomcat. The whole process of learning GAE and finishing the app took about 4 days.

I’m big on PaaS now. Writing an application that magically scales inside Google’s environment is much easier than managing a cluster of EC2 instances as virtual infrastructure. Of course, writing a giant scaling servlet isn’t appropriate for a lot of tasks–but for the back-end of an asynchronous mobile game it makes a lot of sense.

Although last year’s pricing changes caused a revolt with long time GAE users, low traffic applications fall under the free usage quotas. Noodling around on GAE costs you nothing. This is great for prototyping.

Unfortunately, Google App Engine doesn’t work in China. The vast majority of IAPs in China are fraudulent, but China is kind of a big deal. Also, as useful as Google’s Datastore is, it still can’t search using geolocation without some suspect hacks. Amazon Web Services is available in China, and I can attach any kind of database I want to Amazon’s GAE equivalent, Beanstalk. This includes the geohash-supporting MongoDB. For these reasons I’m most likely going to use Amazon’s Beanstalk as a GAE alternative on future projects.

PROTIP: I had this problem for a while when trying to put the app on a custom domain. Here’s what you have to know about using custom domains for GAE apps:

  • Only domain aliases of the main domain your App Engine account is hosted under can be used with Google App Engine apps.

  • For mysterious reasons, naked domains can’t be used. You have to use a subdomain instead, and set up a URL redirect to point the naked domain at that subdomain.

  • Once your domain alias is registered with Google Apps, you have to type the main active domain the alias belongs to into the Domain settings page for the GAE app. It will then direct you to your Google Apps administration panel, where you will be able to select the alias from a dropdown.

I wish I knew this earlier! It took a few days of banging my head against a wall to figure out how to host my App Engine app on a custom domain.