Lane tracking

The default lane tracker, called Pathfinder, uses a correlation approach to identify the lanes on a gel. The original version of Pathfinder took no account of marker lanes, whereas the new version supports them. If Pathfinder fails to identify the marker lanes it reverts to the old, non-marker approach, so that approach is described first and the modifications for marker lanes follow.

How to identify lanes

Since each lane has a finite width, typically 5-10 pixels, the gel image should be well correlated across the width of any particular lane. Consider the cross-correlation of the gel image along two lines at the same down-lane position and in the general direction of the lane. We would expect the correlation to be high if the two lines are in the same lane and low if they are not. Typically we take pairs of lines separated by one third of the expected lane width and calculate the correlation across the full width of the gel. A plot of the correlation function is shown below.

[Figure: Correlation]

The plot shows peaks and troughs corresponding to the centre and edge of each lane respectively. A threshold, here set to 0.9 (the red line), is used to identify the peaks at the centre of each lane (indicated in blue). Note that this approach has not identified all the lanes. Given that the lanes are evenly spaced across the gel, several lanes, e.g. at 55 and 110, have been missed because the correlation there falls below the threshold. Rather than force the number of lanes to be correct simply by lowering the threshold, we have chosen to identify the "best" lanes and add the missing lanes later by interpolation.
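The thresholding step can be sketched in a few lines of Python. This is only an illustration, not Pathfinder's actual code: the function names, the `image[row][col]` layout (lanes running down the rows) and the use of a Pearson coefficient as the correlation measure are all assumptions made for the example.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def correlation_profile(image, lane_width):
    """Correlate pairs of columns about one third of a lane width apart.

    image[row][col] is assumed to hold pixel intensities with lanes running
    down the rows. Positions too close to the right edge are left at 0.0.
    """
    sep = max(1, round(lane_width / 3))
    half = sep // 2
    n_cols = len(image[0])
    profile = [0.0] * n_cols
    for x in range(half, n_cols - (sep - half)):
        left = [row[x - half] for row in image]
        right = [row[x - half + sep] for row in image]
        profile[x] = pearson(left, right)
    return profile

def lane_centres(profile, threshold=0.9):
    """Return the position of the maximum within each run at or above threshold."""
    centres, run = [], []
    for x, value in enumerate(profile):
        if value >= threshold:
            run.append(x)
        elif run:
            centres.append(max(run, key=lambda i: profile[i]))
            run = []
    if run:
        centres.append(max(run, key=lambda i: profile[i]))
    return centres
```

Lanes whose correlation never reaches the threshold are simply absent from the result, which matches the behaviour described above: only the "best" lanes are returned at this stage.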

What happens if the lanes are not straight?

If the lanes were straight we could try to identify lanes by calculating the correlation along the full length of each lane. In practice lanes are not straight so we split the gel into a number of zones and look for the lanes in each zone. Initially "seed points" for each lane are identified in a central zone where we expect the lanes to be relatively straight. The lanes are then extended in both directions by tracking the peaks in the correlation function. The position of the lane in the next zone is found by searching for a maximum in the correlation function over a small interval centred on the expected position given by an extrapolation of the previous points in the lane. This approach can track lanes which are bent, although the maximum allowable deviation from the expected position is restricted by the size of the zone.
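A minimal sketch of the zone-to-zone tracking is given below. It assumes one correlation profile has already been computed per zone; the name `track_lane`, the `search_radius` parameter and the use of linear extrapolation from the two nearest tracked points are illustrative choices, not details taken from Pathfinder itself.

```python
def track_lane(profiles, seed_zone, seed_x, search_radius=3):
    """Follow one lane through a list of per-zone correlation profiles.

    profiles[z][x] is the correlation at across-gel position x in zone z.
    Starting from a seed in seed_zone, the lane is extended in both
    directions. Returns one lane position per zone.
    """
    n_zones = len(profiles)
    path = {seed_zone: seed_x}

    def extend(step):
        z = seed_zone + step
        while 0 <= z < n_zones:
            prev = path[z - step]
            # Linear extrapolation from the two nearest tracked points,
            # falling back to the previous position if only one is known.
            if (z - 2 * step) in path:
                expected = 2 * prev - path[z - 2 * step]
            else:
                expected = prev
            last = len(profiles[z]) - 1
            expected = min(max(expected, 0), last)
            lo = max(0, expected - search_radius)
            hi = min(last, expected + search_radius)
            # Take the correlation maximum inside the search window.
            path[z] = max(range(lo, hi + 1), key=lambda x: profiles[z][x])
            z += step

    extend(+1)
    extend(-1)
    return [path[z] for z in range(n_zones)]
```

The `search_radius` plays the role of the restriction mentioned above: a lane that moves further than this from its expected position between neighbouring zones will be lost.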

A secondary problem is that the gel may be skewed. To account for this, the correlation is actually calculated for a small set of both positive and negative lags and only the maximum value is retained. This can result in the occasional relatively large correlation in the troughs between lanes, which can be seen in the correlation plot.
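As an illustration of the lagged calculation, the sketch below shifts one line against the other by a few pixels in each direction and keeps the largest correlation. The function names and the default `max_lag` are assumptions for the example, and a Pearson coefficient stands in for whatever correlation measure Pathfinder uses.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def max_lagged_correlation(a, b, max_lag=2):
    """Largest correlation between a and b over lags -max_lag..+max_lag."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            # Drop the first `lag` samples of a and the last `lag` of b.
            x, y = a[lag:], b[:len(b) - lag]
        else:
            # Drop the last `-lag` samples of a and the first `-lag` of b.
            x, y = a[:len(a) + lag], b[-lag:]
        if len(x) >= 2:
            best = max(best, pearson(x, y))
    return best
```

For example, with `a = [0, 0, 1, 5, 1, 0, 0, 0]` and `b` equal to `a` shifted by one sample, the zero-lag correlation is low but the lagged maximum is close to 1. The same mechanism explains the occasional high values between lanes: taking a maximum over lags can only raise the correlation, never lower it.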

Usually such an approach will not find all the lanes. To generate a full set of lanes, missing lanes are added by interpolation. First the average distance between neighbouring lanes is calculated and compared to the expected lane width. If the distance between two lanes exceeds the expected distance by some threshold, extra lanes are added until the average distance between the lanes is reduced.
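One simple way to realise this interpolation is sketched below. The rule used here (insert evenly spaced lanes into any gap wider than `tolerance` times the expected spacing) is an assumption chosen for illustration; the exact threshold and stopping criterion used by Pathfinder are not specified above.

```python
def fill_missing_lanes(centres, expected_spacing, tolerance=1.5):
    """Insert evenly spaced lanes into gaps that are too wide.

    centres: detected lane centres (across-gel positions).
    expected_spacing: expected distance between neighbouring lanes.
    tolerance: a gap wider than tolerance * expected_spacing is treated
               as containing one or more missing lanes (hypothetical rule).
    """
    if not centres:
        return []
    centres = sorted(centres)
    filled = [centres[0]]
    for left, right in zip(centres, centres[1:]):
        gap = right - left
        if gap > tolerance * expected_spacing:
            # Number of lanes that best fits the gap at the expected spacing.
            n_missing = round(gap / expected_spacing) - 1
            step = gap / (n_missing + 1)
            filled.extend(left + step * k for k in range(1, n_missing + 1))
        filled.append(right)
    return filled
```

With centres at 10, 20, 30, 50, 60 and 90 and an expected spacing of 10, the sketch adds lanes at 40, 70 and 80.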

Pathfinder and marker lanes

Image requires gels to have marker lanes and a fixed number of lanes between the marker lanes. Provided the marker repeat is greater than one, i.e. there is at least one lane between marker lanes, the new Pathfinder will first attempt to identify the marker lanes. Assuming it can identify the correct number of marker lanes, the other lanes are filled in by interpolation. The only difference from the approach outlined above is at the initial "seeding" stage. First all possible lanes are identified as above. Then marker lanes are identified by correlating each lane against all the other lanes. If a particular lane is a marker lane then we would expect to see peaks in the correlation plot corresponding to each marker lane. The inter-lane correlation is calculated along pairs of lines positioned at the centres of the lanes. Since these marker lanes may be separated by up to the gel width, we have found it necessary to allow a larger range of lags when calculating the maximum correlation. A plot of the inter-lane correlation for the same gel as considered above is given below.

[Figure: Marker correlation]

The peaks correspond to the centre of each possible marker lane. Note that the correlation is 1 for the first marker lane because this is the lane that is correlated against all the other lanes. A threshold, here set to 0.8 (the red line), is used to identify the peaks at the centre of each marker lane (indicated in blue). We use a lower threshold to identify marker lanes because of the increased separation between lines. Provided the section of the gel where the marker lanes are "seeded" is relatively straight and not too skewed, it is usually possible to identify a full set of "seeds" for marker lanes. Once the marker "seeds" have been found, the lanes are extended using the approach described above.
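The marker seeding and the subsequent fill-in can be sketched as follows. The inputs are assumed to be precomputed: `centres` holds the candidate lane positions and `marker_corr` the inter-lane correlation of each candidate against a reference marker lane. The function name and the `lanes_between` parameter (the fixed number of sample lanes between neighbouring markers) are hypothetical names introduced for the example.

```python
def lanes_from_markers(centres, marker_corr, lanes_between, threshold=0.8):
    """Pick marker lanes by thresholding, then interpolate the lanes between them.

    centres: candidate lane centres, in across-gel order.
    marker_corr: correlation of each candidate with the reference marker lane.
    lanes_between: number of non-marker lanes between neighbouring markers.
    Returns (marker_positions, all_lane_positions).
    """
    markers = [c for c, r in zip(centres, marker_corr) if r >= threshold]
    lanes = []
    for left, right in zip(markers, markers[1:]):
        step = (right - left) / (lanes_between + 1)
        # The left marker itself (k = 0) followed by the interpolated lanes.
        lanes.extend(left + step * k for k in range(lanes_between + 1))
    if markers:
        lanes.append(markers[-1])
    return markers, lanes
```

Note that the positions of non-marker candidates are discarded in this sketch: once the markers are trusted, the lanes between them are placed purely by even spacing.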

What happens if Pathfinder can't find the marker lanes?

For most gels Pathfinder is able to find a full set of marker lanes, and the correct number of lanes is inserted between them. When this approach fails, Pathfinder will simply identify lanes where the correlation within the lane is high, and any missing lanes are inserted if the distance between lanes exceeds the expected lane width. This approach can lead to too many extra lanes being interpolated. These are removed by retaining only lanes in which the variation along the lane exceeds a threshold, which excludes extra lanes inserted where there is no signal or only a constant background. The threshold is calculated so that the correct number of lanes is retained. Because this approach is not based on marker lanes, you may need to Edit the lanes to ensure that Image correctly identifies the marker lanes.
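The pruning step can be illustrated with the sketch below, which measures "variation along the lane" as the variance of the intensity trace down the lane centre. That choice of measure, the function name and the `n_expected` parameter are assumptions for the example.

```python
from statistics import pvariance

def retain_lanes(lane_traces, n_expected):
    """Keep the lanes whose along-lane variance is highest.

    lane_traces: one intensity trace (sequence of numbers) per candidate lane.
    n_expected: number of lanes that should remain.
    Returns the indices of the retained lanes, in their original order.
    """
    if n_expected <= 0:
        return []
    if n_expected >= len(lane_traces):
        return list(range(len(lane_traces)))
    variances = [pvariance(trace) for trace in lane_traces]
    # Choose the threshold as the n_expected-th largest variance, so that
    # (barring ties) exactly n_expected lanes pass it.
    threshold = sorted(variances, reverse=True)[n_expected - 1]
    return [i for i, v in enumerate(variances) if v >= threshold]
```

A lane over constant background has a variance near zero and is dropped first. If several lanes share exactly the threshold variance, this sketch keeps all of them and may therefore return more than `n_expected` lanes.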

Refining the lanes

For this algorithm to work we require a good a priori estimate of the expected lane width. If this is too large we may fail to identify the lanes because the correlations are too low. In extreme cases the separation of the two lines on which the correlation is calculated may be greater than the lane width. For this reason, once a set of lanes has been identified an updated lane width is calculated. If this new lane width is significantly different from the expected value, the algorithm is re-run using the new lane width. Several iterations may be required if the lanes only occupy a portion of the gel, since the initial estimate of the expected lane width assumes that the lanes are evenly spaced across the whole gel.
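The refinement loop can be sketched as below. Here `find_lanes` is a hypothetical callback standing in for the whole lane-finding procedure; the relative tolerance, the iteration cap, and the use of the mean spacing between detected centres as the updated lane width are all assumptions made for the example.

```python
def refine_lane_width(find_lanes, initial_width, rel_tol=0.1, max_iter=10):
    """Re-run lane finding until the lane width estimate stops changing.

    find_lanes: callable taking an expected lane width and returning a
                list of lane centres (hypothetical stand-in for the tracker).
    initial_width: first estimate, e.g. gel width / number of lanes.
    Returns (final_width, lane_centres).
    """
    width = initial_width
    centres = sorted(find_lanes(width))
    for _ in range(max_iter):
        if len(centres) < 2:
            break  # not enough lanes to estimate a spacing
        # Updated width: mean spacing between the outermost detected lanes.
        new_width = (centres[-1] - centres[0]) / (len(centres) - 1)
        if abs(new_width - width) <= rel_tol * width:
            break  # estimate is stable, stop iterating
        width = new_width
        centres = sorted(find_lanes(width))
    return width, centres
```

When the lanes occupy only part of the gel, the initial estimate is too large and the first pass typically finds too few lanes; each further pass shrinks the estimate until it settles.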
