Calculations for pinch to zoom
In which I discover how to correct for things moving around when you zoom, using only elementary algebra.
Text Collector uses a pinch-pan-zoom view to let people preview how their messages will look in PDF format. Inexplicably, Android provides no built-in pinch-pan-zoom view, so a quick look online reveals implementations littered everywhere to fill that gap. Those that aren’t broken, however, can only handle ImageView content.
If you need pinch-to-zoom for something other than pictures, you need to reinvent it.
I struggled with this implementation for an embarrassing amount of time, and judging by the number of wonky zooms I’ve seen in Android games, I’m not alone in finding it tricky.
Android does give us ScaleGestureDetector to detect pinches; it reports a “scale factor” that is a ratio representing how far our fingers move apart or together. The obvious thing to do is to scale your content using View.setScaleX() and View.setScaleY(), something like setScaleX(getScaleX() * scaleFactor). That’s the right idea, but insufficient.
Scaling a view transforms it around its “pivot,” an arbitrary point somewhere in the view. What we really want is to scale it around the “focus” of the zoom, that is, the bit of content between our fingers. Focus and pivot don’t line up, so, as we zoom, the content we want to see rushes away offscreen.
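To see the drift in numbers, here is a minimal one-axis sketch in plain Java. It assumes that scaling about a pivot maps a window position x to pivot + z(x − pivot). The class name, method name and sample values are hypothetical, and nothing here touches the Android API.

```java
/** Illustrates how content under the focus drifts when scaling about a pivot. */
final class PivotDrift {
    private PivotDrift() {}

    /** Window position of a point after scaling by z about pivotWindow. */
    static double afterScale(double windowX, double pivotWindow, double z) {
        return pivotWindow + z * (windowX - pivotWindow);
    }

    public static void main(String[] args) {
        double pivot = 200;  // pivot position in the window (assumed value)
        double focus = 300;  // where the fingers are (assumed value)
        double z = 1.5;      // zoom in by 50%

        double moved = afterScale(focus, pivot, z);
        System.out.println("content under focus moved to " + moved);
        System.out.println("drift = " + (moved - focus));
    }
}
```

With these sample values, the content that started under the fingers ends up 50 window units away from them, while the pivot itself stays put.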
Model
We have two different coordinate systems because we need a fixed-size touchable area to detect fingers and a changing-size area to display content. I call these the “window” and the “content,” respectively. As reported by Android, focus is in the window grid and pivot is in the content grid.
Misaligned pivot and focus cause scaling to shift the view content away from wherever it’s supposed to be after the zoom. To correct, we need to translate back by an amount t.
- t: translation needed to correct for scaling, window units
Android gives us these measurements:
- f: focal point of the zoom, window units
- m: margin outside the content, window units
- s: starting scale, window units per content unit
- z: scale factor, that is, change in scale, unitless
Two measurements change during scaling. I will denote their values after scaling with a tick mark, meaning “prime”:
- m’: margin after scaling, window units
- s’: scale after scaling, window units per content unit
Scale factor is the ratio of the scale after to the scale before, so:
z = s’/s
Actually, the scale factor and focus used here are approximations that work well, but could be refined in a more complete model.
We’ll use a couple measurements in the content grid as well:
- P: pivot around which scaling happens, content units
- D: content that aligns with the zoom focal point when zoom begins, content units
When scaling, measurements in the content grid do not change. Upon reflection, this should be obvious because the content can draw itself without knowing it’s been zoomed. So, even though it looks like P grows in this diagram, remember this diagram shows the window perspective. From the content perspective, P does not change.

Android gives us P, but we need to calculate D for ourselves. Since f and m are in different coordinates than D, we cannot say that D = f − m.
This makes me wish for a language like Frink that attaches units to numbers. You actually can add measurements of different units together, but only if there’s a defined conversion. So something like D = f − m could do something sane.
In Java and all mainstream languages, numbers are unitless, so it’s easy to add numbers nonsensically.
For both grids, the origin is at the left side. To convert between coordinates on the window grid (subscript r) and the content grid (subscript c):
x_r = s x_c + m
x’_r = s’ x_c + m’
where s’ = zs.
So, at the start of the zoom, the content at D sits under the focus f:
f = sD + m
⇒ D = (f − m)/s
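As a minimal sketch of the two-grid conversion on one axis, assuming plain doubles and hypothetical names (GridConversion, toWindow, toContent are not from the original source):

```java
/** Conversions between the window grid and the content grid on one axis. */
final class GridConversion {
    private GridConversion() {}

    /** Window coordinate of a content coordinate: x_r = s * x_c + m. */
    static double toWindow(double contentX, double scale, double margin) {
        return scale * contentX + margin;
    }

    /** Content coordinate of a window coordinate: x_c = (x_r - m) / s. */
    static double toContent(double windowX, double scale, double margin) {
        return (windowX - margin) / scale;
    }

    public static void main(String[] args) {
        double f = 300;  // focus, window units (assumed value)
        double m = 100;  // margin, window units (assumed value)
        double s = 2;    // scale, window units per content unit (assumed value)

        // D is simply the focus converted into content units.
        double d = toContent(f, s, m);
        System.out.println("D = " + d);  // prints "D = 100.0"
    }
}
```

Keeping the conversion in named functions is one way to make up for Java's unitless numbers: the only place window and content values meet is inside these two methods.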
Given these things, we need to solve for t, the translation that will rescue the content we want to see from wherever it went during scaling.
It is important that, even though we call View functions, setTranslationX() and setTranslationY(), on the content to translate it, the numbers we pass those functions are in window coordinates, not content coordinates.
Derivation
So far, the things we know, given by the Android API, are f, m, s, z and P, from which we know how to calculate D and s’.
Next, we need m’, the margin after scaling.
In software, you don’t actually have to calculate m’ yourself. You can call setScaleX() and setScaleY(), then getLocationOnScreen() to ask the view where it would place its corner, but that’s cheating.
To find m’ in terms of things that we know, another variable helps to translate pivot from content to window:
- w: position of the pivot, P, in the window grid, that is, w = Ps + m before scaling and w’ = Ps’ + m’ after

The t correction will move the pivot in window space, but only after scaling. By definition, the pivot does not move due to scaling alone, so w = w’. This implies:
Ps + m = Ps’ + m’
⇒ m’ = Ps − Ps’ + m
⇒ m’ = Ps(1 − z) + m
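A quick numeric check of this result, as a sketch with hypothetical names and sample values: compute m’ from the formula, then confirm that the pivot lands at the same window position before and after scaling.

```java
/** Margin after scaling, derived from the fact that the pivot does not move. */
final class MarginAfterScale {
    private MarginAfterScale() {}

    /** m' = P * s * (1 - z) + m, all on one axis. */
    static double marginAfter(double pivot, double scale, double zoom, double margin) {
        return pivot * scale * (1 - zoom) + margin;
    }

    public static void main(String[] args) {
        double p = 50;   // pivot, content units (assumed value)
        double s = 2;    // scale before, window units per content unit
        double z = 1.5;  // scale factor
        double m = 100;  // margin before, window units

        double mAfter = marginAfter(p, s, z, m);
        double w = p * s + m;                // pivot in the window before scaling
        double wAfter = p * (z * s) + mAfter; // pivot in the window after scaling

        System.out.println("m' = " + mAfter);
        System.out.println("w = " + w + ", w' = " + wAfter);
    }
}
```

Zooming in (z greater than 1) with a pivot inside the content gives a smaller margin, which matches the intuition that the content grows outward from the pivot.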
Next, we use m’ to calculate the thing we really want, the translation t, to compensate for zoom. Define a variable translating the content position under the focus, D, to window coordinates:
- h: position of the content at D after scaling, in the window grid, so h = Ds’ + m’

Because the translation t is in window coordinates, t = f − h. Recalling that s’ = zs, we know:
t = f − Dzs − m’
Plugging in D:
t = f − z(f − m) − m’
Since we previously derived m’, we are now done:
t = f − z(f − m) − Ps(1 − z) − m
To make this Android-executable code, we just need to translate to Java and do the same for each axis. Source is on sourcehut.
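What follows is not the sourcehut code, only a minimal sketch of how the derivation might translate to plain Java. The class and method names are hypothetical, the sample values are made up, and the inputs are passed in as doubles rather than read from Android views.

```java
/** One-axis translation that keeps the focused content under the fingers. */
final class ZoomCorrection {
    private ZoomCorrection() {}

    /**
     * @param f focal point, window units
     * @param m margin before scaling, window units
     * @param s scale before scaling, window units per content unit
     * @param z scale factor for this step of the gesture
     * @param p pivot, content units
     * @return translation t, window units
     */
    static double translation(double f, double m, double s, double z, double p) {
        double marginAfter = p * s * (1 - z) + m;  // m' = Ps(1 - z) + m
        return f - z * (f - m) - marginAfter;      // t = f - z(f - m) - m'
    }

    public static void main(String[] args) {
        double s = 2;    // assumed starting scale
        double z = 1.5;  // assumed scale factor

        // Same formula on each axis, fed with that axis's focus, margin and pivot.
        double tx = translation(300, 100, s, z, 50);
        double ty = translation(420, 20, s, z, 80);
        System.out.println("tx = " + tx + ", ty = " + ty);
    }
}
```

In an Android view these two numbers would then be applied as translations on the content, in window units, after the new scale has been set.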
But wait, there’s more
An eager kid in the front row is waving his hand to tell me about affine transformations. We can simplify further, you see:
t = (1 − z)(f − m − Ps)
I didn’t go this far because it’s no cleaner in Java, but it does look more symmetrical in math: tantalizingly like a dot product. Unfortunately, I forgot most of my linear algebra long ago, so I have no idea why. Best go watch 3Blue1Brown.
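For anyone who wants to check the factoring, it can be written out step by step. This sketch starts from the two results of the derivation, t = f − z(f − m) − m’ and m’ = Ps(1 − z) + m, and reuses w = Ps + m from earlier. The last line is one possible regrouping and an assumption about how to read the result, not necessarily the form intended above.

```latex
\begin{align*}
t &= f - z(f - m) - m' \\
  &= f - z(f - m) - Ps(1 - z) - m \\
  &= (1 - z)(f - m) - (1 - z)Ps \\
  &= (1 - z)(f - m - Ps) \\
  &= (1 - z)(f - w), \qquad w = Ps + m
\end{align*}
```

Read this way, the correction is (1 − z) times the window distance from pivot to focus: scaling about the pivot pushes the focused content to w + z(f − w), and t pulls it back to f.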