I think it would be easiest to just ignore manual panning, as it simplifies things and makes it easier to keep track of everything. The camera's maximum zoom will probably be bounded by the area needed to see all nodes, and the minimum zoom will depend on the phase: the planning phase will allow zooming in all the way to the centre node, while the testing phase will stick to the minimum bounds required to show all moving bits.
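As a rough sketch of how the "area needed to see all nodes" bound could be worked out: Unity's orthographicSize is half the height of the view, so the size that fits a bounding box is the larger of the box's half-height and its half-width divided by the aspect ratio. The ZoomBounds class and its Fit method below are hypothetical names, not part of the project, and the sketch uses System.Numerics.Vector2 in place of Unity's Vector2 so that it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical helper, not from the original project.
public static class ZoomBounds
{
    // Returns the centre of the nodes' bounding box and the orthographic
    // half-height needed to fit that box at the given aspect (width / height).
    public static (Vector2 centre, float size) Fit(IReadOnlyList<Vector2> nodes, float aspect)
    {
        if (nodes.Count == 0) throw new ArgumentException("need at least one node");

        Vector2 min = nodes[0], max = nodes[0];
        foreach (Vector2 n in nodes)
        {
            min = Vector2.Min(min, n); // component-wise minimum
            max = Vector2.Max(max, n); // component-wise maximum
        }

        Vector2 centre = (min + max) * 0.5f;
        Vector2 half = (max - min) * 0.5f;

        // The view is `aspect` times wider than it is tall, so the width
        // constraint is converted into an equivalent half-height.
        float size = Math.Max(half.Y, half.X / aspect);
        return (centre, size);
    }
}
```

For example, two nodes at (-4, -1) and (4, 1) on a 2:1 screen would give a centre of (0, 0) and a size of 2, since the width is the limiting side. A margin such as the CameraBorder used further down would be added on top of that.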
I tried the tracking idea, but in the end I figured that, for the effort required and the result, it probably isn't worth it - a prototype test was just dizzying to watch, as signals went back and forth and caused the camera to zoom in and out. I think a gradual zoom from the centre to the rest of the map would be simpler and look nicer.
Adding the zoom is pretty much a case of using the touch positions, as suggested by this answer. It is important to note that it uses the touch positions on the screen, not in the game world, since the world-space distance between the fingers would grow the further out the camera zooms. This is how my zoom works:
float posRatio = CameraZoom / CameraZoomMax;
if (Input.touchCount >= 2)
{
    // Distance between the two fingers, in screen pixels
    float distance = Vector2.Distance(Input.GetTouch(0).position, Input.GetTouch(1).position);
    if (pinchZoom > 0) // skip the first frame of a pinch, when there is no previous distance
    {
        CameraZoom += (pinchZoom - distance) * 0.025f / (1 + posRatio);
    }
    pinchZoom = distance;
}
else
{
    pinchZoom = -1; //reset
}
CameraZoom = Mathf.Clamp(CameraZoom, CameraZoomMin, CameraZoomMax);
SmoothCameraZoom = (SmoothCameraZoom + CameraZoom * Time.deltaTime * 5) / (1 + Time.deltaTime * 5);
MainCamera.camera.orthographicSize = SmoothCameraZoom + CameraBorder;
posRatio = SmoothCameraZoom / CameraZoomMax;
MainCamera.transform.position = new Vector3
(
    CameraCentreNodePos.x + (CameraCentreMapPos.x - CameraCentreNodePos.x) * posRatio,
    CameraCentreNodePos.y + (CameraCentreMapPos.y - CameraCentreNodePos.y) * posRatio,
    MainCamera.transform.position.z // assumption: the original third argument was lost; this keeps the camera's current depth
);
It works by taking the change in the distance between the fingers from the previous frame to the current frame, and changing the zoom based on the result. The camera then pans between the centre node and the whole-level shot using the ratio of the current camera size to the maximum. The zoom itself is smoothed by making it a weighted "average" of the current zoom and the target zoom, with the weight scaled by Time.deltaTime to keep it as smooth and consistent as possible regardless of framerate.
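To see why the smoothing line behaves reasonably across framerates, here is a minimal stand-alone sketch of the same formula, with the constant 5 passed in as speed. The ZoomSmoothing class and the Simulate helper are hypothetical and exist only for illustration. Rearranged, the formula says "new = old + speed * dt * (target - new)", so each step moves part of the way towards the target and cannot overshoot, even on a very long frame.

```csharp
using System;

// Hypothetical stand-alone version of the smoothing line; no Unity types needed.
public static class ZoomSmoothing
{
    // One smoothing step: the same formula as in the update code above,
    // i.e. a weighted average of the current value and the target.
    public static float Step(float smooth, float target, float dt, float speed)
    {
        return (smooth + target * dt * speed) / (1 + dt * speed);
    }

    // Applies Step once per frame for `seconds` of simulated time
    // at a fixed frame rate, and returns the final smoothed value.
    public static float Simulate(float start, float target, float seconds, int fps, float speed)
    {
        float dt = 1f / fps;
        float value = start;
        int frames = (int)Math.Round(seconds * fps);
        for (int i = 0; i < frames; i++)
        {
            value = Step(value, target, dt, speed);
        }
        return value;
    }
}
```

Running Simulate(0f, 10f, 1f, 30, 5f) and Simulate(0f, 10f, 1f, 60, 5f) should give values that are close to each other but not identical, so the result is consistent rather than exactly framerate-independent, which matches the "as smooth and consistent as possible" goal.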
The result is actually OK (or at least good enough) if you ask me. Plus, thanks to the panning ratio, it can also handle levels where the node does not start in the centre (as shown at the top).
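The off-centre case falls out of the position code being a plain linear interpolation between the node and the map centre. A minimal sketch, again using System.Numerics.Vector2 instead of Unity's type, with CameraPan and Position as hypothetical names:

```csharp
using System.Numerics;

// Hypothetical helper mirroring the camera position code above.
public static class CameraPan
{
    // A ratio of 0 puts the camera on the centre node; a ratio of 1 puts it
    // on the map centre. Values in between slide along the line joining them.
    public static Vector2 Position(Vector2 nodePos, Vector2 mapPos, float smoothZoom, float zoomMax)
    {
        float ratio = smoothZoom / zoomMax;
        return nodePos + (mapPos - nodePos) * ratio;
    }
}
```

With the node at (-3, 2), the map centre at (0, 0) and a maximum zoom of 8, a zoom of 8 gives (0, 0) and a zoom of 4 gives (-1.5, 1). Note that the camera only sits exactly on the node when the ratio reaches 0, which assumes the minimum zoom is 0 and the visible size at that point comes from CameraBorder.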