The Math Behind Augmented Reality Maps

Augmented reality, combined with modern smartphones equipped with cameras, magnetometers, and the ability to display maps on the screen, makes for a powerful tool that can be used in a plethora of applications. Here we are going to explain how to build one such application. We will learn how to put geolocation-based annotations on the screen, taking into account where a device is, which direction it is pointing, and how far place marks of interest are from it, a technique known as geolocation augmented reality. We will explore the math behind such an application, and also walk through a reference implementation for iOS.

Although the reference implementation is for iOS, the concepts presented here are transferable to any other capable device or platform.

The use case we will develop here is that of a user standing at a location with longitude a_x and latitude a_y, pointing the back of her device in a direction that makes an angle α with true north. The device will have a camera with viewing angle φ and a maximum field of interest, from the observer’s perspective, denoted by radius r. We want to overlay annotations on the device’s screen for all place marks situated within the visible area. Furthermore, we want to update the annotations every time the user changes location, rotates the device, or changes the radius of the region of interest.

Figure 1. Blue and red pins are visible, all others are not.

The viewing angle φ of the device will depend on whether its orientation is portrait or landscape. For our calculations we are using:

\phi = \begin{cases} \cfrac{\pi}{3} & \text{Device orientation is landscape} \\ \cfrac{\pi}{4} & \text{Device orientation is portrait} \end{cases}

Figure 2. Field of view depending on the device being portrait or landscape.

Although there may be some discussion about whether those angles are exact, they are close enough to the actual values and perfectly adequate for this application.
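To make this concrete, here is a minimal sketch of how one might pick φ based on the interface orientation. The helper name viewingAngleForOrientation is hypothetical and not part of the reference implementation.

#import <UIKit/UIKit.h>
#include <math.h>

// Hypothetical helper (not part of the reference implementation): returns
// the viewing angle phi we assume for a given interface orientation.
static double viewingAngleForOrientation(UIInterfaceOrientation orientation) {
    return UIInterfaceOrientationIsLandscape(orientation) ? M_PI / 3.0   // landscape, ~60 degrees
                                                          : M_PI / 4.0;  // portrait, ~45 degrees
}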

Before we can proceed we need a way to convert longitude and latitude (which are expressed in degrees) to miles (or kilometers). And since a degree of longitude does not span the same distance at every latitude, we can expect to have two different conversions: one for latitude and one for longitude.

Figure 3. Field of view on the x-y plane.

The Earth’s radius at the equator line is 3,956.547 miles. Let’s denote it by s.

s = 3,956.547 miles

Converting latitude to miles is simpler since we can approximate the path along a meridian as an arc of a circle. We know how to calculate the circumference of a circle (2πr), and we also know a circle has 360 degrees. Thus, if we divide the Earth’s circumference by 360 degrees we will have an approximation of the miles per degree of latitude, t. The accuracy of this approximation is sufficient for our application.

t = \cfrac{2 \pi s}{360} miles per degree of latitude.

Calculating miles per degree of longitude, g, will not be as straightforward. Walking one degree of longitude along the equator means walking a greater distance than walking one degree of longitude closer to the poles.

Figure 4. The distance covered by one degree of longitude depends on the latitude.

g = t \cos{\left( a_y \cfrac{\pi}{180} \right)} miles per degree of longitude.
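As a quick sketch, the two conversion factors could be computed as follows; the helper names are hypothetical, and the constant is the Earth’s radius s defined above.

#include <math.h>

static const double kEarthRadiusMiles = 3956.547;  // s, as defined above

// Hypothetical helpers: miles per degree of latitude (t) and miles per
// degree of longitude (g) at a given latitude, following the formulas above.
static double milesPerDegreeOfLatitude(void) {
    return 2.0 * M_PI * kEarthRadiusMiles / 360.0;                              // t
}

static double milesPerDegreeOfLongitude(double latitudeInDegrees) {
    return milesPerDegreeOfLatitude() * cos(latitudeInDegrees * M_PI / 180.0);  // g
}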

Our user is standing at:

a = \left[ \begin{array}{c} a_x \\ a_y \end{array} \right] = \left[ \begin{array}{c} \text{device's longitude} \\ \text{device's latitude} \end{array} \right]

A user will be pointing the device in a direction, and this direction will make an angle α with true north (the heading). This angle α is therefore measured against the longitude lines (meridians), which run north-south and correspond to the y axis, or any of its parallels, in a Cartesian plane. However, in a Cartesian plane it is common practice to measure angles against the x axis, so we need to convert the angle α to an angle ψ, which gives us the heading relative to a horizontal line parallel to the x axis.

\psi = \cfrac{\pi}{2} - \alpha

And since we now know how to convert between degrees and miles, we can calculate points b and c given a distance of interest r.

b = \left[ \begin{array}{c} b_x \\ b_y \end{array} \right] = \left[ \begin{array}{c} \cfrac{r}{g} \cos \left( \psi + \cfrac{\phi}{2} \right) + a_x \\ \cfrac{r}{t} \sin \left( \psi + \cfrac{\phi}{2} \right) + a_y \end{array} \right]

c = \left[ \begin{array}{c} c_x \\ c_y \end{array} \right] = \left[ \begin{array}{c} \cfrac{r}{g} \cos \left( \psi - \cfrac{\phi}{2} \right) + a_x \\ \cfrac{r}{t} \sin \left( \psi - \cfrac{\phi}{2} \right) + a_y \end{array} \right]
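As an illustration, here is a hedged sketch of how b and c could be computed from a, ψ, φ, r, and the conversion factors t and g. The helper boundaryPoint is hypothetical; coordinates are stored as (longitude, latitude) pairs, mirroring the reference implementation’s use of CGPoint.

#import <CoreGraphics/CoreGraphics.h>
#include <math.h>

// Hypothetical sketch: compute one of the boundary points (b with sign = +1,
// c with sign = -1). Point a is (longitude, latitude) in degrees, psi and phi
// are in radians, r is in miles, and t and g are the conversion factors above.
static CGPoint boundaryPoint(CGPoint a, double psi, double phi, double sign,
                             double r, double t, double g) {
    double angle = psi + sign * phi / 2.0;
    return CGPointMake(r / g * cos(angle) + a.x,   // longitude (x) component
                       r / t * sin(angle) + a.y);  // latitude (y) component
}

// Usage: CGPoint b = boundaryPoint(a, psi, phi, +1.0, r, t, g);
//        CGPoint c = boundaryPoint(a, psi, phi, -1.0, r, t, g);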

We have a set of place marks, each with its own longitude and latitude. Our challenge is to determine which ones fall within the region of interest, and therefore are visible and should have a corresponding annotation overlaid on the screen. We will need to iterate over the set of place marks, and for each place mark:

p = \left[ \begin{array}{c} p_x \\ p_y \end{array} \right] = \left[ \begin{array}{c} \text{place mark's longitude} \\ \text{place mark's latitude} \end{array} \right]

We will need to determine whether its longitude and latitude coordinates fall within the region of interest. The projection of vector \vec{ap} onto vector \vec{ab} can be calculated by:

proj_{\vec{ab}}{\vec{ap}} = \cfrac{\vec{ap} \cdot \vec{ab}}{\| \vec{ab} \|}

We need to express proj_{\vec{ab}}{\vec{ap}} as a coordinate along vector \vec{ab}, or in other words, as a scalar coefficient of \vec{ab}. That is achieved by dividing it by the norm of \vec{ab}. Let’s call this coefficient λ.

\lambda = \cfrac{proj_{\vec{ab}}{\vec{ap}}}{\| \vec{ab} \|} = \cfrac{\vec{ap} \cdot \vec{ab}}{{\| \vec{ab} \|}^2}

Now we need to calculate the projection of vector \vec{ap} on vector \vec{ac}:

proj_{\vec{ac}}{\vec{ap}} = \cfrac{\vec{ap} \cdot \vec{ac}}{\| \vec{ac} \|}

Dividing the projection by the norm of vector \vec{ac} gives us the corresponding coefficient, and allows us to express the coordinate in terms of \vec{ac}. Let’s call this coefficient σ.

\sigma = \cfrac{proj_{\vec{ac}}{\vec{ap}}}{\| \vec{ac} \|} = \cfrac{\vec{ap} \cdot \vec{ac}}{{\| \vec{ac} \|}^2}

With λ and σ in hand we can proceed to the last step in determining whether a place mark lies within the visible region of interest. The arc of circumference determining the distance boundary can be calculated using the Pythagorean theorem:

a^2 + b^2 = c^2

In our case:

( \lambda \, \| \vec{ab} \| )^2 + ( \sigma \, \| \vec{ac} \| )^2 = r^2

However, vectors \vec{ab} and \vec{ac} have the same length, which is equal to the radius r. Thus, in order to simplify the equation, we can divide both sides by r^2, leaving us with:

{\lambda}^2 + {\sigma}^2 = 1

Strictly speaking, this relation assumes \vec{ab} and \vec{ac} are orthogonal; for the viewing angles used here it is a close enough approximation. With it, we can determine whether a place mark is within the region of interest and therefore visible: the coefficients λ and σ must be positive (otherwise the place mark would be located behind the observer), and the sum of their squares must be less than or equal to 1, so the place mark is not farther than the boundary set by the radius arc segment.

p(\lambda, \sigma) = \begin{cases} \text{Visible} & (\lambda > 0) \land (\sigma > 0) \land ({\lambda}^2 + {\sigma}^2 \leq 1) \\ \text{Not visible} & \text{otherwise} \end{cases}
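Putting the pieces together, a minimal sketch of the visibility test could look like the following; isPlacemarkVisible is a hypothetical helper, with points stored as (longitude, latitude) pairs. This is essentially the same test the reference implementation performs inside its place mark loop.

#import <UIKit/UIKit.h>

// Hypothetical sketch of the visibility test: lambda and sigma follow the
// definitions above, computed from the vectors ap, ab, and ac.
static BOOL isPlacemarkVisible(CGPoint a, CGPoint b, CGPoint c, CGPoint p) {
    double apx = p.x - a.x, apy = p.y - a.y;
    double abx = b.x - a.x, aby = b.y - a.y;
    double acx = c.x - a.x, acy = c.y - a.y;

    double lambda = (apx * abx + apy * aby) / (abx * abx + aby * aby);
    double sigma  = (apx * acx + apy * acy) / (acx * acx + acy * acy);

    return (lambda > 0) && (sigma > 0) && (lambda * lambda + sigma * sigma <= 1.0);
}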

Our next step is to translate what we have developed thus far into equations and coordinates on the device’s screen (e.g., iPhone, iPad).

Overlaying Augmented Reality Annotations on the Device’s Screen

So far we have calculated which place marks are visible. Now we need to calculate their respective coordinates, sizes, and perspectives on the device’s screen. If we calculate the mid-point m of the boundary arc segment and determine the vector \vec{am}, we can calculate the distance d, representing how far a place mark p is from the vector \vec{am}. Point m is equivalent to the center of the device’s screen, and points b and c to the screen’s left and right margins. Thus, once d is computed, we can determine its proportional equivalent d' on the screen.

Figure 5. Projection on the device’s screen.

m = \left[ \begin{array}{c} \cfrac{r}{g} \cos \psi + a_x \\ \cfrac{r}{t} \sin \psi + a_y \end{array} \right]

The dot product between two vectors can be calculated as the product of the vector norms times the cosine of the angle (\theta) between them.

\vec{am} \cdot \vec{ap} = \| \vec{am} \| \| \vec{ap} \| \cos \theta

The angle \theta is the only unknown in the equation, and is exactly what we need to discover in order to calculate distance d.

\theta = \arccos \left( \cfrac{\vec{am} \cdot \vec{ap}}{\| \vec{am} \| \| \vec{ap} \|} \right)

Now that we know \theta we can proceed to calculate d:

d = \| \vec{ap} \| \sin \theta

Figure 6. Distance between a place mark and the midpoint vector \vec{am}.
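For reference, here is a small sketch of these two calculations, with vectors passed as plain (x, y) components; the helper names are hypothetical.

#include <math.h>

// Hypothetical sketch: theta is the angle between vectors am and ap, and d is
// the distance of the place mark from the mid-point vector, as derived above.
static double angleBetweenVectors(double amx, double amy, double apx, double apy) {
    double dot = amx * apx + amy * apy;
    double normAM = sqrt(amx * amx + amy * amy);
    double normAP = sqrt(apx * apx + apy * apy);
    return acos(dot / (normAM * normAP));            // theta
}

static double distanceFromMidpointVector(double apx, double apy, double theta) {
    return sqrt(apx * apx + apy * apy) * sin(theta); // d = ||ap|| sin(theta)
}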

The length of the vector \vec{bc} is equivalent to the length l of the screen. With that information we can calculate d':

\cfrac{d}{\| b - c \|} = \cfrac{d'}{l} \implies d' = \cfrac{l \, d}{\| b - c \|}

For each place mark we will need to calculate a coordinate (x, y) representing the center of the augmented reality annotation (annotation for short), and dimensions (w, h) representing its width and height, respectively. Having calculated the distance d', we can compute coordinate x:

x = \cfrac{l}{2} + d'

In the equation above, \cfrac{l}{2} gives us the middle of the screen on the horizontal (Cartesian x) axis, and d' tells us how far from it our annotation should be. This leaves y, w, and h yet to be calculated.
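As a sketch, and assuming d and \| b - c \| have already been computed, the screen coordinate x could be obtained like this (screenXForPlacemark is a hypothetical name). In the reference implementation the equivalent quantity carries a sign, so annotations can fall on either side of the screen’s center.

// Hypothetical sketch: map the distance d onto the screen. l is the screen
// width in points and bcLength is ||b - c||; both names are illustrative.
static double screenXForPlacemark(double d, double bcLength, double l) {
    double dPrime = l * d / bcLength;  // d' = l d / ||b - c||
    return l / 2.0 + dPrime;           // x = l / 2 + d'
}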

However, before we continue, let me introduce a scale factor s, varying in the range [0, 1]. It will become very important in determining the remaining variables.

When plotting annotations on the screen, it would be nice to draw them smaller and closer to the top of the screen when a place mark is farther from the observer, and larger and closer to the bottom of the screen when a place mark is closer to the observer. This is where the scale factor s becomes key.

We can define an annotation’s maximum size to have width defaultWidth and height defaultHeight. Then, by multiplying that size by the scale factor s, we make the size of the annotation drawn on the screen decrease with the place mark’s distance from the observer (which is the length of vector \vec{ap}), behaving as described in the previous paragraph. The scale factor s is calculated by:

s = 1 - \cfrac{\| \vec{ap} \|}{r}

And the dimensions (w, h) of an annotation are given by:

w = s \times \textit{defaultWidth}

h = s \times \textit{defaultHeight}

Figure 7. Augmented reality map with annotations.

The same idea we used for the dimensions of an annotation can be carried over to determine its y coordinate. We just need to establish a maximum y coordinate (y_{max}) on the screen and multiply it by the scale factor s.

y = s \times y_{max}
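Putting the scale factor to work, a hedged sketch of sizing and positioning an annotation could look like this; annotationFrame, defaultSize, yMax, and screenX are illustrative names, with screenX being the x coordinate computed earlier.

#import <UIKit/UIKit.h>

// Hypothetical sketch: size and position an annotation from the scale factor
// s = 1 - ||ap|| / r, following the w, h, and y formulas above.
static CGRect annotationFrame(double s, CGSize defaultSize, double yMax, double screenX) {
    CGFloat w = defaultSize.width * s;   // w = s * defaultWidth
    CGFloat h = defaultSize.height * s;  // h = s * defaultHeight
    CGFloat y = yMax * s;                // y = s * yMax
    // Treat (screenX, y) as the annotation's center point.
    return CGRectMake(screenX - w / 2.0, y - h / 2.0, w, h);
}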

As you can see from the screenshot in Figure 7, the annotations closer to the observer appear larger and closer to the bottom of the screen, and as distances from the observer grow, the annotations get smaller and closer to the top of the screen.

Reference Implementation

A reference implementation for iOS is available on GitHub. You can fork it from:

https://github.com/dcirne/ARGEOM

Let’s begin by taking a look at the source code files and explaining a little about what each of them does.

Figure 8. Project files.

As you can see, we are using some of Cocoa Touch’s frameworks in this project. For example:

  • AVFoundation to capture the input from the device’s camera.
  • MapKit to place a map on the screen.
  • CoreMotion to detect device movement and acceleration (including the gesture to tell whether the device is parallel or perpendicular to the floor).
  • CoreLocation for user location and device heading.

This project’s deployment target is iOS 5.0 or greater.

If you already have the project open in Xcode, you should be able to select a device and hit Run to see it working.

DCViewController is the root view controller of this app and is also responsible for loading the collection [NSArray] of place marks. There are two choices of place mark collections to load, and it is very easy to pick which one you want. The first is to call the method:

- (NSArray *)loadPlacemarks;

In its implementation you can see a simplistic way of entering a collection of place marks. You can manually enter title, subtitle, latitude, and longitude. The other choice is to call the method:

- (void)loadPlacemarks:(PlacemarksLoaded)completionBlock;

This method loads the list of U.S. State Capitals from a CSV file in a background thread, parses it, allocates an instance of DCPlacemark for each capital, and adds it to the collection of place marks. Once the collection is complete, it invokes the completion block on the main thread passing all loaded place marks as parameter.

To keep the selection easy, the choice is controlled by the compiler directive:

#define USE_EXTERNAL_PLACEMARKS

Comment it out if you want the simple, manual collection of place marks; or uncomment it if you want the list of U.S. State Capitals.
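For illustration only, here is a minimal sketch of what the manual collection might look like. It assumes DCPlacemark exposes settable title, subtitle, and coordinate properties; that assumption and the sample values are mine and are not taken from the repository.

#import <CoreLocation/CoreLocation.h>

// Hedged sketch of a manually entered place mark collection. Assumes
// DCPlacemark has settable title, subtitle, and coordinate properties
// (an assumption for illustration, not taken from the repository).
- (NSArray *)loadPlacemarks {
    DCPlacemark *placemark = [[DCPlacemark alloc] init];
    placemark.title = @"Sample place mark";
    placemark.subtitle = @"Entered manually";
    placemark.coordinate = CLLocationCoordinate2DMake(38.8977, -77.0365);
    return @[placemark];
}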

DCAugmentedRealityViewController is the place where most of the calculations take place. Using it is as simple as presenting the view controller and calling its start method passing a collection [NSArray] of DCPlacemark as parameter:

- (void)startWithPlacemarks:(NSArray *)placemarks;

When you hold your device parallel to the floor you will see the standard iOS map view from MapKit, with the place marks shown as pins on the map. However, if you move your arm to hold the device perpendicular to the floor, the map is resized and placed at the bottom-right corner, and the augmented reality mode starts. Depending on the direction you point the device, and the distance set on the slider, you will see annotations appearing on the screen.

The program uses CoreMotion to monitor the device’s motion and detect whether it is parallel or perpendicular to the floor. Then we use this information to start/stop the augmented reality visualization mode. This is handled by the following methods:

- (void)startMonitoringDeviceMotion;
- (void)stopMonitoringDeviceMotion;
- (void)handleDeviceAcceleration:(CMAccelerometerData *)accelerometerData error:(NSError *)error;
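As a hedged sketch (not the repository’s actual implementation), the acceleration handler could decide between the two poses by checking how much of gravity falls on the z axis:

#import <CoreMotion/CoreMotion.h>
#include <math.h>

// Hypothetical sketch: when the device lies flat (parallel to the floor),
// gravity dominates the z axis; when it is held upright (perpendicular),
// it falls mostly on the x/y axes. The 0.8 threshold is illustrative.
- (void)handleDeviceAcceleration:(CMAccelerometerData *)accelerometerData
                           error:(NSError *)error {
    if (error != nil || accelerometerData == nil) {
        return;
    }

    BOOL parallelToFloor = fabs(accelerometerData.acceleration.z) > 0.8;

    if (parallelToFloor) {
        // Show the standard map view.
    } else {
        // Switch to the augmented reality visualization mode.
    }
}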

One point worth mentioning is that, by default, iOS devices assume the top of the screen in portrait mode to represent due north. Thus, if we are holding a device in any other orientation, or if we rotate the device, we need to set the heading orientation to the appropriate value. This is accomplished in the app by the method:

- (void)updateLocationManagerHeadingOrientation;
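A hedged sketch of what such a method could look like is shown below; self.locationManager is an assumed CLLocationManager property, and the mapping is illustrative rather than copied from the repository.

#import <CoreLocation/CoreLocation.h>
#import <UIKit/UIKit.h>

// Hypothetical sketch: keep the location manager's heading orientation in
// sync with the device orientation so headings stay correct as the user
// rotates the device.
- (void)updateLocationManagerHeadingOrientation {
    switch ([UIDevice currentDevice].orientation) {
        case UIDeviceOrientationLandscapeLeft:
            self.locationManager.headingOrientation = CLDeviceOrientationLandscapeLeft;
            break;
        case UIDeviceOrientationLandscapeRight:
            self.locationManager.headingOrientation = CLDeviceOrientationLandscapeRight;
            break;
        case UIDeviceOrientationPortraitUpsideDown:
            self.locationManager.headingOrientation = CLDeviceOrientationPortraitUpsideDown;
            break;
        default:
            self.locationManager.headingOrientation = CLDeviceOrientationPortrait;
            break;
    }
}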

Now, the most important method in this class, and quite likely in the whole project, is the one where the majority of the calculations described in this article are performed. Let’s take a moment to dive deeper into the code:

- (void)calculateVisiblePlacemarksWithUserLocation:(CLLocation *const)location
                                           heading:(CLHeading *const)heading
                                   completionBlock:(PlacemarksCalculationComplete)completionBlock {
    ...

    // Calculations are performed on a separate thread in a serial queue
    dispatch_async(placemarksQueue, ^{
        // Blocks to perform vector operations
        double(^dotProduct)(double *, double *) = ^(double *vector1, double *vector2)...
        double(^norm)(double *) = ^(double *vector)...
        void(^makeVector)(double **, CGPoint, CGPoint) = ^(double **vector, CGPoint point1, CGPoint point2)...
        CLLocationDistance(^calculateDistanceBetweenPoints)(CGPoint, CGPoint) = ^(CGPoint point1, CGPoint point2)...

        ...

        // Loops through place marks calculating which ones are visible and
        // which ones are not
        for (DCPlacemark *placemark in self.placemarks) {
            pointP = CGPointMake(placemark.coordinate.longitude,
                                 placemark.coordinate.latitude);
            makeVector(&vectorAP, pointA, pointP);

            lambda = dotProduct(vectorAP, vectorAB) / pow(norm(vectorAB), 2);
            sigma = dotProduct(vectorAP, vectorAC) / pow(norm(vectorAC), 2);

            if ((lambda > 0) && (sigma > 0) && (pow(lambda, 2) + pow(sigma, 2) <= 1)) {
                thetaDirection = calculateDistanceBetweenPoints(pointB, pointP) <=
                                 calculateDistanceBetweenPoints(pointC, pointP) ? -1.0 : 1.0;
                theta = acos(dotProduct(vectorAM, vectorAP) / (norm(vectorAM) * norm(vectorAP))) * thetaDirection;
                dPrime = l * norm(vectorAP) * sin(theta) / norm(vectorBC);

                distanceFromObserver = [placemark calculateDistanceFromObserver:location.coordinate];
                scale = 1.0 - distanceFromObserver / distance;

                placemark.bounds = CGRectMake(0,
                                              0,
                                              defaultAugmentedRealityAnnotationSize.width * scale,
                                              defaultAugmentedRealityAnnotationSize.height * scale);
                placemark.center = CGPointMake(lOver2 + dPrime, yMax * scale);

                [visiblePlacemarks addObject:placemark];
            } else {
                [nonVisiblePlacemarks addObject:placemark];
            }

            free(vectorAP);
        }

        ...

        dispatch_async(dispatch_get_main_queue(), ^{
            completionBlock([visiblePlacemarks copy], [nonVisiblePlacemarks copy]);
        });
    });
}

This method calculates all visible and non-visible place marks in a background serial dispatch queue and when finished it dispatches completionBlock on the main queue.

In this reference implementation the completion block contains a call to a single method. This method overlays the visible annotations on the screen and also removes the ones that are no longer visible.

- (void)overlayAugmentedRealityPlacemarks:(NSArray *const)visiblePlacemarks
nonVisiblePlacemarks:(NSArray *const)nonVisiblePlacemarks;

We have covered a lot of ground in this article: the math necessary to determine whether a place mark lies within a region of interest, how to translate that result into coordinates on a device’s screen, and, last but not least, a reference implementation for iOS.

As iOS evolves, the reference implementation may become outdated and better ways to write the code will emerge. Regardless, the concepts presented here and the math are likely to remain timeless.
