Hoshen–Kopelman algorithm

The Hoshen–Kopelman algorithm is a simple and efficient algorithm for labeling clusters on a grid. Where the grid is a regular network of cells, with the cells being either occupied or unoccupied. This algorithm is based on a well-known union-finding algorithm.[1] The algorithm was originally described in by J. Hoshen and R. Kopelman in their 1976 paper "Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm".[2]

Percolation theory

Percolation theory is the study of the behavior and statistics of clusters on lattices. Suppose we have a large square lattice where each cell can be occupied with the probability p and can be empty with the probability 1  p. Each group of neighboring occupied cells forms a cluster. Neighbors are defined as cells having a common side but not those sharing only a corner i.e. we consider 4x4 neighborhood that is top, bottom, left and right. Each occupied cell is independent of the status of its neighborhood. The number of clusters, the size of each cluster and their distribution are important topics in percolation theory.

Consider 5x5 grids in figure (a) and (b).
In figure (a), the probability of occupancy is p = 6/25 = 0.24.
In figure (b), the probability of occupancy is p = 16/25 = 0.64.

Hoshen–Kopelman algorithm for cluster finding

In this algorithm, we scan through a grid looking for occupied cells and labeling them with cluster labels. The scanning process is called as Raster Scan. The algorithm begins with scanning the grid cell by cell and check if the cell is occupied or not. If the cell is occupied, then it must be labeled with a cluster label. This cluster label is decided based on the neighbors of that cell. (For this we are going to use Union-Find Algorithm which is explained in the next section.) If the cell doesn’t have any occupied neighbors then, a new label is assigned to the cell.[3]

Union-find algorithm

This algorithm is a simple method for computing equivalence classes. Calling the function union(x,y) specifies that, items x and y are members of the same equivalence class. Because equivalence relations are transitive; all the items equivalent to x are equivalent to all the items equivalent to y. Thus for any item x, there is a set of items which are all equivalent to x . This set is the equivalence class of which x is a member. A second function find(x) returns a representative member of the equivalence class to which x belongs.

Pseudo-code

During the raster scan of the grid, whenever an occupied cell is encountered, neighboring cells are scanned to check whether any of them have already been scanned. If we find already scanned neighbors, the union operation is performed, to specify that these neighboring cells are in fact members of the same equivalence class. Then thefind operation is performed to find a representative member of that equivalence class with which the current cell will be labeled.

On the other hand,if the current cell has no neighbors, it is assigned a new, previously unused, label. The entire grid is processed in this way.

Following pseudo-code is referred from Tobin Fricke's implementation of the same algorithm.[3]

   Raster Scan and Labeling on the Grid
   largest_label = 0;
   for x in 0 to n_columns {
       for y in 0 to n_rows {
           if occupied[x,y] then
               left = occupied[x-1,y];
               above = occupied[x,y-1];
               if (left == 0) and (above == 0) then /* Neither a label above nor to the left. */
                   largest_label = largest_label + 1; /* Make a new, as-yet-unused cluster label. */
                   label[x,y] = largest_label;
               else if (left != 0) and (above == 0) then /* One neighbor, to the left. */
                   label[x,y] = find(left);
               else if (left == 0) and (above != 0) then /* One neighbor, above. */
                   label[x,y] = find(above);
               else /* Neighbors BOTH to the left and above. */
                   union(left,above); /* Link the left and above clusters. */
                   label[x,y] = find(left);
       }
   }
   Union
   void union(int x, int y)  {
       labels[find(x)] = find(y);
   }
   Find
   int find(int x)  {
       int y = x;
       while (labels[y] != y)
           y = labels[y];
       while (labels[x] != x)  {
           int z = labels[x];
           labels[x] = y;
           x = z;
       }
   return y;
   }

Example

Consider the following example. The dark cells in the grid in figure (a) represent that they are occupied and the white ones are empty. So by running H–K algorithm on this input we would get the output as shown in figure (b) with all the clusters labeled.

The algorithm processes the input grid, cell by cell, as follows: Let's say that grid is a two-dimensional array.

Consider 6x6 grids in figure (a) and (b).
Figure (a), This is the input to the Hoshen–Kopelman algorithm.
Figure (b), This is the output of the algorithm with all the clusters labeled.

Applications

See also

References

This article is issued from Wikipedia - version of the 10/31/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.