Sautrela

edu.gtts.sautrela.vq
Class ClusterSet

java.lang.Object
  extended by edu.gtts.sautrela.vq.ClusterSet

public class ClusterSet
extends java.lang.Object

A set of Clusters sharing a Collection of multidimensional data.


Constructor Summary
ClusterSet()
          Creates a new empty ClusterSet
ClusterSet(java.util.List<DoubleData> codebook)
          Creates a new ClusterSet with initial empty Clusters from a codebook.
ClusterSet(java.net.URL cdbkURL, boolean binary)
          Creates a new ClusterSet with initial empty Clusters from a codebook file (containing dumped Data).
 
Method Summary
 void addCluster(Cluster c)
          Adds a new Cluster and all its data to this ClusterSet.
 void addCluster(java.util.Collection<Cluster> c)
          Adds all the Cluster elements in the specified Collection to this ClusterSet.
 void addData(java.util.Collection<double[]> c)
          Adds all the data elements in the specified Collection to this ClusterSet.
 void addData(double[] d)
          Adds new data to this ClusterSet.
 void doClustering()
          Removes data from all Clusters and clusters all the data available in this ClusterSet.
 void dumpCodebook(java.io.File file, boolean binary)
           
 void emptyClusters()
          Removes all data from all the contained Clusters.
 java.util.List<Cluster> getClusters()
          Returns all the Clusters contained in this ClusterSet.
 java.util.Collection<double[]> getData()
          Returns all the data contained in this ClusterSet.
 double getDistorsion()
          Returns the current distortion of the ClusterSet (the sum over all the Clusters).
 double[] getDistorsions()
          Returns a new vector containing the distortion of each Cluster in the ClusterSet.
 void lbg(double epsi, int maxiter, boolean elbg)
          Applies LBG algorithm to this ClusterSet.
 int nearest(double[] d)
          Returns the nearest Cluster's index for a given data.
 int nearestTo(int i)
          Returns the nearest Cluster's index for a Cluster in the ClusterSet.
 void removeData()
          Removes all data both from this ClusterSet and from all the contained Clusters.
 void setRandomClusters(int size)
          Creates new Clusters up to the desired size using this ClusterSet's random data elements as centroids.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClusterSet

public ClusterSet()
Creates a new empty ClusterSet


ClusterSet

public ClusterSet(java.util.List<DoubleData> codebook)
           throws DataProcessorException
Creates a new ClusterSet with initial empty Clusters from a codebook.

Parameters:
codebook - a List of DoubleData containing the initial codebook
Throws:
DataProcessorException - when the constructor fails with the given parameters

ClusterSet

public ClusterSet(java.net.URL cdbkURL,
                  boolean binary)
           throws DataProcessorException
Creates a new ClusterSet with initial empty Clusters from a codebook file (containing dumped Data). The dumped Data is filtered, and only DoubleData elements are loaded.

Parameters:
cdbkURL - initial codebook URL
binary - dumped binary data flag. If true, the Data is expected to be serialized; if false, the Data is expected to be XML text.
Throws:
DataProcessorException - when the constructor fails with the given parameters
Method Detail

dumpCodebook

public void dumpCodebook(java.io.File file,
                         boolean binary)
                  throws DataProcessorException
Throws:
DataProcessorException

addData

public void addData(double[] d)
Adds new data to this ClusterSet. Clusters are not affected.

Parameters:
d - data to be added

addData

public void addData(java.util.Collection<double[]> c)
Adds all the data elements in the specified Collection to this ClusterSet. Clusters are not affected.

Parameters:
c - data elements to be added.

addCluster

public void addCluster(Cluster c)
Adds a new Cluster and all its data to this ClusterSet. WARNING: any of the Cluster's data already contained in this ClusterSet is added again, so it will appear twice.

Parameters:
c - the Cluster to be added.

addCluster

public void addCluster(java.util.Collection<Cluster> c)
Adds all the Cluster elements in the specified Collection to this ClusterSet. WARNING: any of their data already contained in this ClusterSet is added again, so it will appear twice.

Parameters:
c - Cluster elements to be added.

getData

public java.util.Collection<double[]> getData()
Returns all the data contained in this ClusterSet.

Returns:
all the data contained in this ClusterSet.

getClusters

public java.util.List<Cluster> getClusters()
Returns all the Clusters contained in this ClusterSet. Each of these Clusters may contain any of the ClusterSet's data.

Returns:
all the Clusters contained in this ClusterSet.

removeData

public void removeData()
Removes all data both from this ClusterSet and from all the contained Clusters.


emptyClusters

public void emptyClusters()
Removes all data from all the contained Clusters. Data is not removed from this ClusterSet.
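The containment rules spread over addData, addCluster, emptyClusters and removeData are easier to see side by side. The following is a minimal, hypothetical model of those rules only: MiniCluster and MiniClusterSet are illustrative names, not Sautrela classes, and the real ClusterSet surely stores more state than this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the data/Cluster containment rules described on this page.
// MiniCluster and MiniClusterSet are illustrative names, not Sautrela classes.
public class ContainmentSketch {

    static class MiniCluster {
        final double[] centroid;
        final List<double[]> data = new ArrayList<>();

        MiniCluster(double[] centroid) {
            this.centroid = centroid;
        }
    }

    static class MiniClusterSet {
        final List<double[]> data = new ArrayList<>();
        final List<MiniCluster> clusters = new ArrayList<>();

        // addData: the element goes to the set only; Clusters are not affected.
        void addData(double[] d) {
            data.add(d);
        }

        // addCluster: the Cluster and all its data are added, with no duplicate check.
        void addCluster(MiniCluster c) {
            clusters.add(c);
            data.addAll(c.data);
        }

        // emptyClusters: every Cluster loses its data; the set keeps it.
        void emptyClusters() {
            for (MiniCluster c : clusters) {
                c.data.clear();
            }
        }

        // removeData: data is removed from the set and from every Cluster.
        void removeData() {
            emptyClusters();
            data.clear();
        }
    }
}
```

In this model, adding a Cluster whose element was already added with addData leaves that element in the set twice, which is the situation the addCluster warning describes.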


getDistorsion

public double getDistorsion()
Returns the current distortion of the ClusterSet (the sum over all the Clusters).

Returns:
the current distortion of the ClusterSet.

getDistorsions

public double[] getDistorsions()
Returns a new vector containing the distortion of each Cluster in the ClusterSet.

Returns:
the distortions of all the Clusters in the ClusterSet.
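This page does not state which distance the distortion is based on. The sketch below assumes the common choice for vector quantization: the distortion of a Cluster is the sum of squared Euclidean distances from its data to its centroid, getDistorsions() corresponds to one such value per Cluster, and getDistorsion() to their sum. DistortionSketch and its methods are hypothetical names.

```java
import java.util.List;

// Sketch of the two distortion measures, ASSUMING squared Euclidean distance
// to the Cluster centroid. Names are hypothetical, not Sautrela API.
public class DistortionSketch {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            s += diff * diff;
        }
        return s;
    }

    // Distortion of one Cluster: sum of squared distances of its data to its centroid.
    static double clusterDistortion(double[] centroid, List<double[]> data) {
        double s = 0.0;
        for (double[] d : data) {
            s += sqDist(d, centroid);
        }
        return s;
    }

    // Analogue of getDistorsions(): one value per Cluster, in Cluster order.
    static double[] distortions(List<double[]> centroids, List<List<double[]>> members) {
        double[] out = new double[centroids.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = clusterDistortion(centroids.get(i), members.get(i));
        }
        return out;
    }

    // Analogue of getDistorsion(): the sum over all Clusters.
    static double totalDistortion(List<double[]> centroids, List<List<double[]>> members) {
        double s = 0.0;
        for (double v : distortions(centroids, members)) {
            s += v;
        }
        return s;
    }
}
```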

doClustering

public void doClustering()
Removes data from all Clusters and clusters all the data available in this ClusterSet. After the clustering, each Cluster contains the data elements that are nearer to its centroid than to any other centroid. The clustering does not alter the centroids.
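The behaviour described for doClustering() is the assignment step of nearest-centroid clustering. A minimal sketch of that step, assuming squared Euclidean distance and ties resolved toward the lower Cluster index (neither is stated on this page); AssignmentSketch.assign is a hypothetical name. Note that the centroids are only read, matching the statement that clustering does not alter them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the doClustering() assignment step, ASSUMING squared Euclidean
// distance. Centroids are read but never modified.
public class AssignmentSketch {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            s += diff * diff;
        }
        return s;
    }

    // Returns one member list per centroid (assumes at least one centroid);
    // each data element goes to its nearest centroid, ties to the lower index.
    static List<List<double[]>> assign(List<double[]> centroids, List<double[]> data) {
        List<List<double[]>> members = new ArrayList<>();
        for (int i = 0; i < centroids.size(); i++) {
            members.add(new ArrayList<>());
        }
        for (double[] d : data) {
            int best = 0;
            double bestDist = sqDist(d, centroids.get(0));
            for (int i = 1; i < centroids.size(); i++) {
                double dist = sqDist(d, centroids.get(i));
                if (dist < bestDist) {
                    best = i;
                    bestDist = dist;
                }
            }
            members.get(best).add(d);
        }
        return members;
    }
}
```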


nearest

public int nearest(double[] d)
Returns the index of the Cluster nearest to a given data element. This index follows the order of the List returned by getClusters().

Parameters:
d - the given data element
Returns:
the nearest Cluster's index.

nearestTo

public int nearestTo(int i)
Returns the index of the Cluster nearest to a given Cluster in the ClusterSet. Both indexes follow the order of the List returned by getClusters().

Parameters:
i - the index of the given cluster
Returns:
the nearest Cluster's index.
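One plausible reading of nearestTo(i), sketched here under explicit assumptions: "nearest" compares centroids by squared Euclidean distance, and Cluster i itself is excluded (otherwise every Cluster would be nearest to itself). NearestToSketch is a hypothetical name; the indexes follow the order of the centroid list, mirroring the getClusters() order described above.

```java
import java.util.List;

// Sketch of nearestTo(i), ASSUMING squared Euclidean distance between
// centroids and that Cluster i itself is excluded from the search.
public class NearestToSketch {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            s += diff * diff;
        }
        return s;
    }

    static int nearestTo(List<double[]> centroids, int i) {
        double[] ci = centroids.get(i);
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centroids.size(); j++) {
            if (j == i) {
                continue; // a Cluster is trivially nearest to itself
            }
            double dist = sqDist(ci, centroids.get(j));
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best; // -1 when the set holds a single Cluster
    }
}
```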

lbg

public void lbg(double epsi,
                int maxiter,
                boolean elbg)
Applies the LBG algorithm to this ClusterSet. The algorithm stops when either the convergence criterion is met or the maximum number of iterations is reached.

Parameters:
epsi - convergence constant. Convergence is reached when (dold - dnew) / dnew <= epsi, where dold and dnew are the distortions of the previous and current iterations.
maxiter - maximum number of iterations.
elbg - if true, the Enhanced LBG (ELBG) algorithm is used. In each iteration, for each Cluster whose distortion is smaller than the mean (a "low" Cluster):
  • randomly select a "high" Cluster, with the choice weighted by the Cluster distortions
  • find the nearest Cluster to the "low" one
  • split the "high" Cluster by randomly selecting one of its elements and creating the symmetrical one
  • create a new ClusterSet (split1 + split2 + nearest)
  • run one LBG iteration on that ClusterSet
  • if the distortion is reduced, replace the original Clusters
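Two of the ELBG steps above can be sketched in isolation. The exact rules are not given on this page, so both are assumptions: "high" is read as distortion above the mean, "according to their distortions" as probability proportional to distortion (roulette-wheel selection), and "the symmetrical one" as the selected element mirrored through the Cluster centroid, x' = 2c - x. ElbgSketch and its methods are hypothetical names.

```java
import java.util.Random;

// Hypothetical helpers for two ELBG steps. ASSUMPTIONS: "high" means
// distortion above the mean, selection probability is proportional to
// distortion, and "symmetrical" means mirrored through the centroid.
public class ElbgSketch {

    // Roulette-wheel pick among the "high" Clusters; returns -1 if none qualifies.
    static int pickHigh(double[] distortions, Random rnd) {
        double mean = 0.0;
        for (double v : distortions) {
            mean += v;
        }
        mean /= distortions.length;
        double total = 0.0;
        for (double v : distortions) {
            if (v > mean) {
                total += v;
            }
        }
        if (total <= 0.0) {
            return -1;
        }
        double r = rnd.nextDouble() * total;
        int last = -1;
        for (int i = 0; i < distortions.length; i++) {
            if (distortions[i] > mean) {
                last = i;
                r -= distortions[i];
                if (r < 0.0) {
                    return i;
                }
            }
        }
        return last; // guards against floating-point round-off
    }

    // Mirrors element x through centroid c, so c is the midpoint of x and the result.
    static double[] symmetric(double[] c, double[] x) {
        double[] out = new double[c.length];
        for (int k = 0; k < c.length; k++) {
            out[k] = 2.0 * c[k] - x[k];
        }
        return out;
    }
}
```

Under these assumptions, x and symmetric(c, x) would seed the two halves (split1 and split2) of the "high" Cluster.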
See Also:
ELBG demo applet
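The stopping rule of lbg() can be made concrete with a plain (non-enhanced) LBG iteration. This sketch assumes squared Euclidean distance, centroids moved to the mean of their members, and empty Clusters keeping their centroid; it is not the Sautrela implementation, and LbgSketch.lbg is a hypothetical name. Each iteration measures the total distortion after assignment, stops once (dold - dnew) / dnew <= epsi or maxiter iterations have run, and otherwise updates the centroids.

```java
import java.util.List;

// Sketch of plain LBG with the documented stopping rule. ASSUMPTIONS: squared
// Euclidean distance, centroids updated to their members' mean, empty Clusters
// keep their centroid, at least one centroid. ELBG is not included.
public class LbgSketch {

    static double sqDist(double[] a, double[] b) {
        double s = 0.0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            s += diff * diff;
        }
        return s;
    }

    // Updates 'centroids' in place; returns the distortion of the last assignment.
    static double lbg(List<double[]> centroids, List<double[]> data, double epsi, int maxiter) {
        int dim = centroids.get(0).length;
        double dOld = Double.POSITIVE_INFINITY;
        double dNew = Double.POSITIVE_INFINITY;
        for (int iter = 0; iter < maxiter; iter++) {
            // Assignment step: accumulate per-centroid sums and counts.
            double[][] sums = new double[centroids.size()][dim];
            int[] counts = new int[centroids.size()];
            dNew = 0.0;
            for (double[] d : data) {
                int best = 0;
                double bestDist = sqDist(d, centroids.get(0));
                for (int i = 1; i < centroids.size(); i++) {
                    double dist = sqDist(d, centroids.get(i));
                    if (dist < bestDist) {
                        best = i;
                        bestDist = dist;
                    }
                }
                dNew += bestDist;
                counts[best]++;
                for (int k = 0; k < dim; k++) {
                    sums[best][k] += d[k];
                }
            }
            // Convergence test: (dold - dnew) / dnew <= epsi (or a perfect fit).
            if (dNew == 0.0 || (dOld - dNew) / dNew <= epsi) {
                break;
            }
            // Update step: move each non-empty centroid to the mean of its data.
            for (int i = 0; i < centroids.size(); i++) {
                if (counts[i] == 0) {
                    continue;
                }
                double[] c = centroids.get(i);
                for (int k = 0; k < dim; k++) {
                    c[k] = sums[i][k] / counts[i];
                }
            }
            dOld = dNew;
        }
        return dNew;
    }
}
```

On the first iteration dold is infinite, so at least one update always happens unless the initial fit is already perfect.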

setRandomClusters

public void setRandomClusters(int size)
Creates new Clusters up to the desired size, using randomly chosen data elements of this ClusterSet as centroids. Existing Clusters are not affected; only the additional Clusters needed to reach the desired size are created.

Parameters:
size - the number of Clusters this ClusterSet will contain
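A minimal sketch of the setRandomClusters(size) behaviour, assuming the new centroids are distinct data elements drawn without replacement (the page only says "random data elements"). RandomInitSketch is a hypothetical name; existing centroids are kept and only the missing ones are appended.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of setRandomClusters(size). ASSUMPTION: new centroids are copies of
// distinct data elements drawn without replacement. Names are hypothetical.
public class RandomInitSketch {

    static void setRandomClusters(List<double[]> centroids, List<double[]> data,
                                  int size, Random rnd) {
        int needed = size - centroids.size();
        if (needed <= 0) {
            return; // existing Clusters are left untouched
        }
        if (needed > data.size()) {
            throw new IllegalArgumentException("not enough data elements");
        }
        List<double[]> pool = new ArrayList<>(data);
        Collections.shuffle(pool, rnd);
        for (int i = 0; i < needed; i++) {
            centroids.add(pool.get(i).clone()); // copy so centroids can move independently
        }
    }
}
```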
