Basic sequential algorithmic scheme

The basic sequential algorithmic scheme (BSAS) is a very basic clustering algorithm that is easy to understand. In the basic form vectors are presented only once and the number of clusters is not known at priori. What is needed is the dissimilarity measured as the distance d (x, C) between a vector point x and a cluster C, threshold of dissimilarity Θ and the number of maximum clusters allowed q. The idea is to assign every newly presented vector to an existing cluster or create a new cluster for this sample, depending on the distance to the already defined clusters. As pseudocode, the algorithm looks like the following:

 1. m = 1; Cm = {x1}; // Init first cluster = first sample     
 2. for every sample x from 2 to N      
   a. find cluster Ck such that min d(x, Ck)     
   b. if d(x, Ck) > Θ AND (m < q)      
     i. m = m + 1; Cm = {x} // Create a new cluster      
   c. else      

     i. Ck = Ck + {x} // Add sample to the nearest cluster      
     ii. Update representative if needed      
 3. end algorithm

As can be seen the algorithm is simple but still quite efficient. Different choices for the distance function lead to different results and unfortunately the order in which the samples are presented can also have a great effect to the final result. What’s also very important is a correct value for Θ. This value has a direct effect on the number of formed clusters. If Θ is too small unnecessary clusters are created and if too large a value is chosen less than required number of clusters are formed.

One detail is that if q is not defined the algorithm ‘decides’ the number of clusters on its own. This might be wanted under some circumstances but when dealing with limited resources a limited q is usually chosen. Also, BSAS can be used with a similarity function simply by replacing the min function with max.

There exists a modification to BSAS called modified BSAS (MBSAS), which runs twice through the samples. It overcomes the drawback that a final cluster for a single sample is decided before all the clusters have been created. The first phase of the algorithm creates the clusters (just like 2b in BSAS) and assigns only a single sample to each cluster. Then the second phase runs through the remaining samples and classifies them to the created clusters (step 2c in BSAS).

External links

This article is issued from Wikipedia - version of the Tuesday, September 22, 2015. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.