Calculations for pinch to zoom

In which I discover how to correct for things moving around when you zoom, using only elementary algebra.

Text Collector uses a pinch-pan-zoom view to let people preview how their messages will look in pdf format. Inexplicably, Android provides no pinch-pan-zoom view built-in, so a quick look online reveals implementations to fill that gap littered everywhere. Those that aren’t broken, however, can only handle ImageView content.

If you need pinch-to-zoom for something other than pictures, you need to reinvent it.

I struggled with this implementation for an embarrassing amount of time, and judging by the number of wonky zooms I’ve seen in Android games, I’m not alone in finding it tricky.

Android does give us ScaleGestureDetector to detect pinches; it reports a “scale factor” that is a ratio representing how far our fingers move apart or together. The obvious thing to do is to scale your content, using View.setScale(), something like setScale(getScale() * scaleFactor). That’s the right idea, but insufficient.

Scaling a view transforms it around its “pivot,” an arbitrary point somewhere in the view. What we really want is to scale it around the “focus” of the zoom, that is, the bit of content between our fingers. Focus and pivot don’t line up, so, as we zoom, the content we want to see rushes away offscreen.


We have two different coordinate systems because we need a fixed-size touchable area to detect fingers and a changing-size area to display content. I call these the “window” and the “content,” respectively. As reported by Android, focus is in the window grid and pivot is in the content grid.

Misaligned pivot and focus cause scaling to shift the view content away from wherever it’s supposed to be after the zoom. To correct, we need to translate back by an amount t.

  • t: translation needed to correct for scaling, window units

Android gives us these measurements:

  • f: focal point of the zoom, window units
  • m: margin outside the content, window units
  • s: starting scale, window units per content unit
  • z: scale factor, that is, change in scale, unitless

Two measurements change during scaling. I will denote them with a tick mark meaning “prime:”

  • m': margin after scaling, window units
  • s': scale after scaling, window units per content unit

Scale factor is the ratio of scale before to scale after, so:

s' = zs

Actually, the scale factor and focus used here are approximations that work well, but could be refined in a more complete model.

We’ll use a couple measurements in the content grid as well:

  • P: pivot around which scaling happens, content units
  • D: content that aligns with the zoom focal point when zoom begins, content units

When scaling, measurements in the content grid do not change. Upon reflection, this should be obvious because the content can draw itself without knowing it’s been zoomed. So, even though it looks like P grows in this diagram, remember this diagram shows the window perspective. From the content perspective, P does not change.

Diagram of measurementsAndroid gives us P but we need to calculate D for ourselves. Since f and m are in different coordinates than D we cannot say that D = f - m.

This makes me wish for a language like Frink that attaches units to numbers. You actually can add measurements of different units together, but only if there’s a defined conversion. So, something, like D = f - m could do something sane.

In Java and all mainstream languages, numbers are unitless, so it’s easy to add numbers nonsensically.

For both grids, the origin is at the left side. To convert between coordinates on the window grid (subscript r) and the content grid (subscript c):

x_r = sx_c + m \newline s' = zs \newline x'_r =s'x_c + m'


f = sD + m \newline \Rightarrow D = \frac{f-m}{s}

Given these things, we need to solve for t, the translation that will rescue the content we want to see from wherever it went during scaling.

It is important that even though we call a View function, setTranslation(), on the content to translate it, the number we pass that function is in window coordinates, not content coordinates.


So far, the things we know, given by the Android api are f, m, s, z and P, from which we know how to calculate D and s'.

Next, we need m', the margin after scaling.

In software, you don’t actually have to calculate m' yourself. You can setScale() then getLocationOnScreen() to ask the view where it would place its corner, but that’s cheating.

To find m' in terms of things that we know, another variable helps to translate pivot from content to window:

  • w: position of the pivot, P, in the window grid, that is, w=Ps+m.

w = Ps + m \newline w' = Ps' + m'

Relationship of w to m and PThe t correction will move the pivot in window space, but only after scaling. By definition, the pivot does not move due to scaling alone, so w = w'. This implies:

Ps + m = Ps' + m' \newline \Rightarrow m' = Ps - Ps' + m \newline\Rightarrow m' = Ps(1 - z) + m

Next, we use m' to calculate the thing we really want, the translation t, to compensate for zoom. Define a variable translating the content position under the focus, D, to window coordinates:

  • h: position of the content at D after scaling, in the window grid, so h = Ds'+m'

Relationship of m, h, f and D to t

Because the translation t is in window coordinates, t = f - h. Recalling that s' = zs, we know:

t = f - h \newline t = f - Dzs - m'

Plugging in D:

t = f - \frac{f-m}{s}zs - m' \newline t = f - z(f-m) - m'

Since we previously derived m', we are now done:

t = f - z(f-m) - (Ps(1 - z) + m)

To make this Android-executable code, we just need to translate to Java and do the same for each axis. Source is on Bitbucket.

But wait, there’s more

An eager kid in the front row is waving his hand to tell me about affine transformations. We can simplify further, you see:

t = (f - Ps)(1 - z) + m(z - 1)

I didn’t go this far because it’s no cleaner in Java, but does look more symmetrical in Math: tantalizingly like a dot product. Unfortunately, I forgot most of my linear algebra long ago, so I have no idea why. Best go watch 3Blue1Brown.

How Unicode can save math: part 2

It’s widely known that decimal – or “numbers” to most of us – is an inferior system. Decimal doesn’t work well for computers, which prefer base two and it doesn’t work well for humans either, at least not when compared to dozenal.

Dozenal is also called “duodecimal,” or “base 10” (when writing in dozenal) and it is a much more natural system for humans than decimal. The usual example of why is a clock. Look how neat it is with the number 10 right at the top:

Dozenal clock face
From the Dozenal Society of Great Britain

Dozenal has a big problem though, as we can see from the clock. What number does 10 represent, when you see it out of context? You just don’t know.

For decades, we’ve solved this problem in computer programming with funny prefixes. To a programmer, dozenal and decimal might be “base 0xA” and “base 0xC”. Likewise, in a dozenal world, we might write “hexadecimal” as “base 0z14” or something. If we need to start writing all our numbers with warts to indicate the base, however, dozenal seems doomed.

But wait, there’s hope. Unicode already contains the digits for “dek” and “el.” (That’s ten and eleven, if you’re not a cool dozenal kid.) If your browser doesn’t have a suitable font, refer to the clock above. If it does, they look like this:

↊ ↋

Now all we need is nine more Unicode symbols for the rest of the digits. Zero is special: for zero, there need be only one.

How Unicode can save math: part 1

Every casual math enthusiast has by now heard of the raging war between tau and pi.

The what?

Ok, I mean, tau’s gaining a little ground, but really, pi has the weight of history behind it, so “raging” and “war” might be overstating things a little. The point is that 3.14, et cetera, is a bad circle constant and there’s a more intuitive option, “tau.”

Take a 90-degree angle. In radians, it’s half pi, but one quarter tau makes much more sense:

A quarter cut out of a pizza
Half pi!

If we call 2 times pi “tau,” this slice is one quarter, and things just make more sense.

Doesn’t seem to be catching on.

Not really, no. And tau has some problems too: for example τ=2π, but the tau glyph has only one leg and pi has two, so it looks like pi is twice tau. Shouldn’t tau have four legs? Clearly, the problem is this symbol: we need something more familiar. How about we just redefine pi to be twice itself?

Madness. Only confusion and chaos can result.

But not if we change the spelling. We’ll say that pie = 2pi.

Mathematicians will never go for it.

Perhaps, but food-based math has rich history. If you’re worried that it lacks a succinct one-character symbol, well, that’s where Unicode comes into the picture. In 2017, we finally have pie emoji, 🥧.