My Week With HoloLens

Microsoft ships the HoloLens and Clicker accessory in the box

My HoloLens development kits finally arrived a week ago, and I’ve spent a great deal of time with the device since. I figured I’d post my impressions here.

This Really Works

When I first put my HoloLens on, I placed an application window floating above my kitchen table. Suddenly, I realized I hadn’t taken out the garbage. Still wearing the device, I ran downstairs to put something in the trash. I narrowly missed my neighbor–successfully avoiding an awkward conversation about what this giant contraption on my face does.

When I returned to my kitchen, the application was still there–hovering in space.

As I’ve stated before, Microsoft blew me away. HoloLens is an absolutely incredible leap from previous generation AR glasses (and MUCH cheaper, believe it or not). It also does everything Tango does, but at a much higher level of performance and precision, which means most applications built on Tango can be ported directly to HoloLens.

HoloLens is fast and intuitive enough to attempt getting actual work done with. Yet a lot of my time has been spent just trying to make silly videos like this.

It’s A Full Blown Windows Machine

HoloLens isn’t just a prototype headset–it’s a full-featured desktop Windows PC on your face. Not only can you run “Windows Holographic” apps, but also any Universal Windows App from the Windows Store. Instead of these applications running in a window on a monitor, they float around in space–positioned wherever you choose.

HoloLens really does need a taskbar of some kind, though. It’s way too easy to forget where Skype physically is because you launched it in the bathroom.

It also helps to connect a Bluetooth keyboard and mouse when running standard applications. Gestures can’t give you the input fidelity of a traditional mouse, and typing in the air is a chore.

HoloLens’ narrow FOV makes using a regular Windows app problematic–as the screen will get cut off and require you to move your head around to see most of it. Also, if you push a window far enough into the background so you can see the whole thing, you’ll notice HoloLens’ resolution is a little low to read small text. We’re going to need a next generation display for HoloLens to really be useful for everyday computing.

Microsoft Has Created A New Input Paradigm

HoloLens can seemingly only recognize two gestures: bloom and “air tap”. Blooming is cool–I feel like a person in a sci-fi movie making the Windows start menu appear in the air by tossing it up with a simple gesture.

The air tap can be unintuitive. Most people I let try the HoloLens poke at the icons by stabbing them with a finger. That’s not how the air tap works. You have to gaze at a target by moving your head, then perform the lever-like air tap gesture within the HoloLens camera’s view to select whatever the reticle is on.

HoloLens can track the motion of your finger and use it as input to move things around (such as application windows), but it can’t detect collisions between your finger and virtual objects. It’s as if it can measure how far your finger moves but not its precise location in 3D space.
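Under the hood, this is a gaze-then-commit model: your head pose defines a ray, the reticle shows where that ray lands, and the tap merely confirms the selection. Here’s a minimal sketch of that logic in Python (the real Windows Holographic APIs are C#-based; the scene objects and numbers below are invented for illustration):

```python
import numpy as np

# Hypothetical scene objects: each has a center and a bounding-sphere radius.
scene = [
    {"name": "skype_window", "center": np.array([0.5, 1.6, 2.0]), "radius": 0.4},
    {"name": "start_menu",   "center": np.array([-1.0, 1.5, 1.5]), "radius": 0.3},
]

def gaze_target(head_pos, gaze_dir, objects):
    """Return the nearest object whose bounding sphere the gaze ray hits."""
    gaze_dir = gaze_dir / np.linalg.norm(gaze_dir)
    best, best_t = None, np.inf
    for obj in objects:
        to_center = obj["center"] - head_pos
        t = np.dot(to_center, gaze_dir)           # distance along the ray
        if t < 0:
            continue                               # object is behind the user
        closest = head_pos + t * gaze_dir
        if np.linalg.norm(closest - obj["center"]) <= obj["radius"] and t < best_t:
            best, best_t = obj, t
    return best

def on_air_tap(head_pos, gaze_dir):
    """The air tap commits whatever the reticle (gaze ray) is currently on."""
    target = gaze_target(head_pos, gaze_dir, scene)
    if target:
        print(f"Selected {target['name']}")

# The user looks toward the Skype window and taps:
on_air_tap(np.array([0.0, 1.6, 0.0]), np.array([0.25, 0.0, 1.0]))
```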

Using apps while holding your hand out in front of the headset is tiring. That’s why Microsoft includes the Clicker, a simple Bluetooth button that triggers the air tap gesture when pressed. Disappointingly, the Clicker isn’t trackable–so you can’t use it as a true finger replacement.

Microsoft has adapted Windows to the holographic model successfully. This is the first full-blown window manager and gesture interface for augmented reality I’ve ever seen, and it’s brilliant. After a few sessions with the device, most people I’ve let use it are launching apps and moving windows around the room like a pro.

This Thing Is Heavy

Although the industrial design is cool in a retro ‘90s way, this thing is really uncomfortable to use for extended periods of time. Maybe I don’t have it strapped on correctly, but after a 20-minute Skype session I had to take the device off. I felt pain above the bridge of my nose, and when I looked in the mirror, I saw what can only be described as ‘HoloHead’.

The unfortunate symptom of “HoloHead”

The First Generation Apps Are Amazing

There are already great free apps in the Windows Store that show off the power of the HoloLens platform, many of them made by Asobo Studio–a leader in augmented reality game development.

Young Conker

Young Conker is a great example of HoloLens as a games platform. The game is simple: after scanning your surroundings, play a familiar platform game over the floors, walls, tables and chairs as Conker runs, jumps and collects coins scattered about your room.

Conker will jump on top of your coffee table, run into walls, or be occluded by a chair as if he were walking behind it–well, depending on how accurate your scan is. The fact that this works as well as it does is amazing to me.

Fragments

Fragments is one of the first true game experiences I’ve ever played in augmented reality. You play the part of a futuristic detective, revisiting memories of crimes as their events are re-created holographically in your location. Characters sit on your furniture. You’ll hunt for pieces of evidence scattered about your room–even under tables. It really is an incredible experience. As with Conker, it requires some pre-scanning of your environment. However, applications can apparently share scans with each other: Fragments was able to re-use a scan of my office I had previously made with another app.

Skype

When Skyping with a person not using HoloLens, you simply place their video on a wall in your surroundings. It’s almost like talking to someone on the bridge of the Enterprise, depending on how big you make the video window.

When Skyping with another HoloLens user, you can swap video feeds so either participant can see through the other’s first-person view. While looking at someone else’s video feed as a floating window, you can sketch over it with drawing tools or even place pictures from your photos folder in the other person’s environment. 2D lines drawn over the video feed will form around the other user’s real-world surroundings in 3D–bending around corners, or sticking to the ceiling.

Conclusion

As a consumer electronics device, HoloLens is clearly beta–maybe even alpha, but surprisingly slick. It needs more apps. With Wave 2 underway, developers are working on just that. In my case, I’m moving all of my Tango projects to HoloLens–so you’ll definitely be seeing cool stuff soon!

The Coming Public Point Cloud

One of the most important elements of Augmented Reality is the ability to seamlessly mesh 3D graphics with the real world. Current AR technology simply overlays graphics on top of video–even when tracking and recognizing objects like cards and markers. The AR SDK gives the position and orientation of the tracked object to a 3D engine, which then renders geometry on top of the video frame coming from the device’s camera.
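In code, that overlay step amounts to composing a transform from the pose the SDK reports and projecting geometry into the camera frame. A minimal sketch, with made-up pinhole intrinsics (fx, fy, cx, cy) standing in for whatever the SDK calibrates:

```python
import numpy as np

def model_matrix(R, t):
    """Compose a 4x4 transform from the marker pose the AR SDK reports."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M

def project(point_marker, R, t, fx, fy, cx, cy):
    """Map a point defined in marker space to pixel coordinates."""
    p_cam = model_matrix(R, t) @ np.append(point_marker, 1.0)
    x, y, z = p_cam[:3]
    return (fx * x / z + cx, fy * y / z + cy)   # pinhole projection

# A marker half a meter straight in front of the camera, unrotated;
# its origin should land at the image center.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.5])
print(project(np.zeros(3), R, t, fx=800, fy=800, cx=320, cy=240))  # (320.0, 240.0)
```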

A 3D scan of myself overlaid on an AR card with Vuforia.

New technologies like Google’s Tango Tablet use Kinect-style depth cameras to store not only the color of each pixel, but the depth and position, too. (Well, sort of–the depth camera’s resolution is much lower than that of the color camera). This means that you can build a 3D model out of what the tablet’s camera sees as you move around an environment.
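Concretely, each depth pixel can be back-projected through the camera intrinsics into a 3D point. A minimal numpy sketch, with invented intrinsics loosely in the range of a Kinect-class sensor:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into a 3D point cloud,
    one point per pixel, using pinhole camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]   # drop pixels with no depth reading

# A fake 240x320 depth frame where everything is 2 m away:
cloud = depth_to_points(np.full((240, 320), 2.0), fx=570, fy=570, cx=160, cy=120)
print(cloud.shape)  # (76800, 3)
```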

Tango displaying point cloud data of what it currently sees.

This feature has huge ramifications. Tango uses this data to do what is called “localization.” This means once an area is scanned, the tablet can compare the internal 3D model it has stored with what the camera currently sees. When fused with compass and gyro data, the Tango tablet can compute its precise location inside the scanned area. This doesn’t take long either. Tango starts building the model immediately. Walk back to where you started using the tablet, and Tango knows where it is.
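Tango’s internal algorithm isn’t public, but localization like this is classically framed as aligning the live scan against the stored model–for example with ICP (iterative closest point). Here’s a toy single ICP step, assuming the device has moved little enough that nearest-neighbor matching finds the right correspondences:

```python
import numpy as np

def icp_step(live, model):
    """One iterative-closest-point step: match each live point to its
    nearest stored model point, then solve (Kabsch/SVD) for the rigid
    transform that best aligns the matched pairs."""
    # Brute-force nearest neighbors; fine for a toy example.
    dists = np.linalg.norm(live[:, None, :] - model[None, :, :], axis=2)
    matched = model[dists.argmin(axis=1)]

    # Best-fit rotation and translation between the centered point sets.
    mu_live, mu_matched = live.mean(axis=0), matched.mean(axis=0)
    H = (live - mu_live).T @ (matched - mu_matched)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_matched - R @ mu_live
    return R, t                     # pose of the live view in the stored map

# A stored scan (a 6x6x6 grid of points) vs. the same scene seen 5 cm away:
g = np.linspace(0.0, 1.0, 6)
model = np.stack(np.meshgrid(g, g, g), axis=-1).reshape(-1, 3)
live = model - np.array([0.05, 0.0, 0.0])
R, t = icp_step(live, model)
print(np.round(t, 3))   # ~[0.05, 0, 0]: recovers the offset from the stored map
```

In practice you’d iterate that step until the transform converges, then fuse the result with the compass and gyro data mentioned above.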

Usually this 3D data is stored as a point cloud: basically, a 3D point for every position the depth camera records. Hence, a sufficiently complicated area will look like a cloud of dots–a point cloud. You can see an example of Tango building a point cloud with the Room Scanner Tango app.

These point clouds are important not only for localization, but also for AR graphical effects such as occluding rendered 3D objects with the real world. This is because a 3D mesh can be built from these points and used for occlusion, collision and other features. Having objects between you and the augmentation occlude the 3D render is essential to nailing the feeling that an AR object is really there.
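The occlusion test itself is simple once you have real-world depth: at each pixel, draw the virtual object only if it’s closer to the viewer than the scanned surface. A toy compositing sketch over synthetic depth buffers:

```python
import numpy as np

def composite(camera_rgb, real_depth, virtual_rgb, virtual_depth):
    """Per-pixel occlusion: show the virtual object only where it is
    closer to the viewer than the scanned real-world geometry."""
    visible = virtual_depth < real_depth          # real surface in front? hide it
    out = camera_rgb.copy()
    out[visible] = virtual_rgb[visible]
    return out

# A virtual cube at 2 m, with a real wall at 1.5 m covering the right half:
h, w = 4, 8
camera = np.zeros((h, w, 3))
cube = np.ones((h, w, 3))
real = np.full((h, w), np.inf)
real[:, w // 2:] = 1.5                            # wall depth in meters
cube_depth = np.full((h, w), 2.0)
frame = composite(camera, real, cube, cube_depth)
print(frame[..., 0])   # cube visible (1) on the left, occluded (0) by the wall
```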

Point clouds are awesome, but building them can be frustrating. Current point cloud scanners are bulky and slow, not to mention their accuracy issues can lead to jitter and other artifacts. Also, some depth cameras run at a frame rate low enough to make it hard to create point clouds without moving very slowly through an environment. Who wants to play a game where you have to walk around and meticulously scan a room before you can start?

In order for AR games and apps to succeed, devices need to be able to sense the 3D geometry of their surroundings effortlessly. Yet instant generation of point clouds is far beyond the capabilities of current mobile sensor technology.

That’s where the public point cloud comes in.

A truly great Augmented Reality platform needs to upload point clouds generated by devices to the cloud. Then, when a user puts on some hot new wearable AR glasses, the device can pull down a pre-made point cloud for its current location from a server and use that until it can update the model from its own sensors. The device will then upload a fresh point cloud, which can be used to refine the version stored online.
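No such service exists yet, so purely as a sketch of that loop: fetch the shared cloud for your location, localize and scan against it, then push the refinement back. The endpoint, payload shape and parameter names below are all invented:

```python
import requests  # talking to a hypothetical service; no such API exists today

BASE = "https://pointcloud.example.com/v1"   # made-up endpoint

def fetch_cloud(lat, lon, radius_m=50):
    """Pull down the pre-made point cloud near the device's location."""
    r = requests.get(f"{BASE}/clouds",
                     params={"lat": lat, "lon": lon, "radius": radius_m})
    r.raise_for_status()
    return r.json()   # e.g. {"points": [[x, y, z], ...], "version": 1}

def upload_refinement(lat, lon, points, base_version):
    """Push the device's fresh scan back to refine the shared model."""
    r = requests.post(f"{BASE}/clouds",
                      json={"lat": lat, "lon": lon, "points": points,
                            "base_version": base_version})
    r.raise_for_status()

# Bootstrap from the shared map, then contribute the device's own scan:
# cloud = fetch_cloud(40.0, -74.0)                 # device's GPS fix
# ... localize against cloud["points"], scan fresh geometry ...
# upload_refinement(40.0, -74.0, new_points, cloud["version"])
```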

You can kind of see this already–Google and Apple Maps’ 3D satellite modes use similar point cloud reconstruction techniques, presumably from aerial photos and other sources. While these 3D models often look like something you’d see on the original PlayStation, the public point cloud will have to be much more detailed. As sensors on mobile devices become more advanced, the crowdsourced point cloud data will only grow richer.

Apple Maps’ 3D reconstruction kind of looks like an original PlayStation game. The public point cloud will have to be higher resolution. Oh, this is also where you can get the best pastrami sandwich on the planet.

A massive, publicly accessible point cloud is necessary not just for the next generation of wearable AR devices, but also for self-driving cars, drone navigation, and robotics (which is indeed where many of these algorithms came from in the first place). There are privacy implications, but perhaps no more than with Google Maps’ Street View or other current technologies that expose very precise information about your location.

In the near future, almost every public place on the planet will be stored in the cloud as 3D reconstructed geometry–passively built up and constantly refined by sensors embedded in countless mobile and wearable devices, perhaps without the user even knowing.