| Title: | Affinity Propagation Clustering |
|---|---|
| Description: | Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <DOI:10.1126/science.1136800>. The algorithms are largely analogous to the 'Matlab' code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results. |
| Authors: | Ulrich Bodenhofer [aut, cre], Johannes Palme [ctb], Chrats Melkonian [ctb], Andreas Kothmeier [aut], Nikola Kostic [ctb] |
| Maintainer: | Ulrich Bodenhofer <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.4.14 |
| Built: | 2026-05-07 07:52:40 UTC |
| Source: | https://github.com/ubod/apcluster |
The apcluster package implements affinity propagation according to Frey and Dueck and a method for exemplar-based agglomerative clustering. It further offers various functions for plotting clustering results.
The central function is apcluster. It runs affinity
propagation on a given similarity matrix or it creates a similarity matrix
for a given data set and similarity measure and runs affinity propagation
on this matrix. The function returns an APResult
object from which the clustering itself and information about the affinity
propagation run can be obtained. Leveraged affinity propagation clustering
apclusterL allows efficient clustering of large datasets by
using only a subset of the similarities. The package further implements
an exemplar-based agglomerative clustering method aggExCluster
that can be used for computing a complete cluster hierarchy, but also for
joining fine-grained clusters previously obtained by affinity propagation
clustering. Further functions are implemented to visualize the
results and to create distance matrices.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme, Chrats Melkonian, and Nikola Kostic
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim, details=TRUE) ## show details of clustering results show(apres) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## employ agglomerative clustering to join clusters aggres <- aggExCluster(sim, apres) ## show information show(aggres) show(cutree(aggres, 2)) ## plot dendrogram plot(aggres) ## plot clustering result for k=2 clusters plot(aggres, x, k=2) ## plot heatmap heatmap(apres, sim) ## leveraged apcluster apresL <- apclusterL(s=negDistMat(r=2), x=x, frac=0.2, sweeps=3) ## show details of clustering results show(apresL) ## plot clustering result plot(apresL, x)## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim, details=TRUE) ## show details of clustering results show(apres) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## employ agglomerative clustering to join clusters aggres <- aggExCluster(sim, apres) ## show information show(aggres) show(cutree(aggres, 2)) ## plot dendrogram plot(aggres) ## plot clustering result for k=2 clusters plot(aggres, x, k=2) ## plot heatmap heatmap(apres, sim) ## leveraged apcluster apresL <- apclusterL(s=negDistMat(r=2), x=x, frac=0.2, sweeps=3) ## show details of clustering results show(apresL) ## plot clustering result plot(apresL, x)
Runs exemplar-based agglomerative clustering
## S4 method for signature 'matrix,missing' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'matrix,ExClust' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'Matrix,missing' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'Matrix,ExClust' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'missing,ExClust' aggExCluster(s, x, includeSim=TRUE) ## S4 method for signature 'function,ANY' aggExCluster(s, x, includeSim=TRUE, ...) ## S4 method for signature 'character,ANY' aggExCluster(s, x, includeSim=TRUE, ...)## S4 method for signature 'matrix,missing' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'matrix,ExClust' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'Matrix,missing' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'Matrix,ExClust' aggExCluster(s, x, includeSim=FALSE) ## S4 method for signature 'missing,ExClust' aggExCluster(s, x, includeSim=TRUE) ## S4 method for signature 'function,ANY' aggExCluster(s, x, includeSim=TRUE, ...) ## S4 method for signature 'character,ANY' aggExCluster(s, x, includeSim=TRUE, ...)
s |
an |
x |
either a prior clustering of class |
includeSim |
if |
... |
all other arguments are passed to the selected similarity function as they are. |
aggExCluster performs agglomerative clustering.
Unlike other methods, e.g., the ones implemented in hclust,
aggExCluster is computing exemplars for each cluster and
its merging objective is geared towards the identification of
meaningful exemplars, too.
For each pair of clusters, the merging objective is computed as follows:
An intermediate cluster is created as the union of the two clusters.
The potential exemplar is selected from the intermediate cluster as the sample that has the largest average similarity to all other samples in the intermediate cluster.
Then the average similarity of the exemplar with all samples in the first cluster and the average similarity with all samples in the second cluster is computed. These two values measure how well the joint exemplar describes the samples in the two clusters.
The merging objective is finally computed as the average of the two measures above. Hence, we can consider the merging objective as some kind of “balanced average similarity to the joint exemplar”.
In each step, all pairs of clusters are considered and the pair with the largest merging objective is actually merged. The joint exemplar is then chosen as the exemplar of the merged cluster.
aggExCluster can be used in two ways, either by performing
agglomerative clustering of an entire data set or by performing
agglomerative clustering of data previously clustered by
affinity propagation or another clustering algorithm.
Agglomerative clustering of an entire data set can be
accomplished either by calling aggExCluster on a
quadratic similarity matrix without further argument or by
calling aggExCluster for a function or function name
along with data to be clustered (as argument x).
A full agglomeration run is performed that starts from l
clusters (all samples in separate one-element clusters) and ends
with one cluster (all samples in one single cluster).
Agglomerative clustering starting from a given clustering
result can be accomplished by calling aggExCluster for
an APResult or ExClust
object passed as parameter x. The similarity matrix
can either be passed as argument s or, if missing,
aggExCluster looks if the similarity matrix is
included in the clustering object x. A cluster hierarchy
with numbers of clusters ranging from the
number of clusters in x down to 1 is created.
The result is stored in an AggExResult object.
The slot height is filled with the merging
objective of each of the maxNoClusters-1 merges. The slot
order contains a permutation of the samples/clusters for
dendrogram plotting. The algorithm for computing this permutation
is the same as the one used in hclust. If aggExCluster
was called for an entire data set, the slot label
contains the names of the objects to be clustered (if available,
otherwise the indices are used). If aggExCluster was called
for a prior clustering, then labels are set to ‘Cluster 1’,
‘Cluster 2’, etc.
Upon successful completion, the function returns an
AggExResult object.
Similarity matrices can be supplied in dense or sparse format. Note, however, that sparse matrices are converted to full dense matrices before clustering which may lead to memory and/or performance bottlenecks for larger data sets.
Ulrich Bodenhofer, Johannes Palme, and Nikola Kostic
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
AggExResult, apcluster-methods,
plot-methods, heatmap-methods,
cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute agglomerative clustering from scratch aggres1 <- aggExCluster(negDistMat(r=2), x) ## show results show(aggres1) ## plot dendrogram plot(aggres1) ## plot heatmap along with dendrogram heatmap(aggres1) ## plot level with two clusters plot(aggres1, x, k=2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7) ## create hierarchy of clusters determined by affinity propagation aggres2 <- aggExCluster(x=apres) ## show results show(aggres2) ## plot dendrogram plot(aggres2) plot(aggres2, showSamples=TRUE) ## plot heatmap heatmap(aggres2) ## plot level with two clusters plot(aggres2, x, k=2)## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute agglomerative clustering from scratch aggres1 <- aggExCluster(negDistMat(r=2), x) ## show results show(aggres1) ## plot dendrogram plot(aggres1) ## plot heatmap along with dendrogram heatmap(aggres1) ## plot level with two clusters plot(aggres1, x, k=2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7) ## create hierarchy of clusters determined by affinity propagation aggres2 <- aggExCluster(x=apres) ## show results show(aggres2) ## plot dendrogram plot(aggres2) plot(aggres2, showSamples=TRUE) ## plot heatmap heatmap(aggres2) ## plot level with two clusters plot(aggres2, x, k=2)
S4 class for storing results of exemplar-based agglomerative clustering
Objects of this class can be created by calling aggExCluster
for a given similarity matrix.
The following slots are defined for AggExResult objects:
l:number of samples in the data set
sel:subset of samples used for leveraged clustering (empty for normal clustering)
maxNoClusters:maximum number of clusters in the
cluster hierarchy, i.e. it
contains clusterings with 1 - maxNoClusters clusters.
exemplars:list of length maxNoClusters;
the i-th component of the list is a vector of i
exemplars (corresponding to the level with i clusters).
clusters:list of length maxNoClusters;
the i-th component of clusters is a list of i
clusters, each of which is a vector of sample indices.
merge:a maxNoClusters-1 by 2 matrix that
contains the merging hierarchy; fully analogous to the
slot merge in the class hclust.
height:a vector of length maxNoClusters-1 that
contains the merging objective of each merge; largely analogous to
the slot height in the class hclust except
that the slot height in AggExResult objects is
supposed to be non-increasing, since aggExCluster
is based on similarities, whereas hclust uses
dissimilarities.
order:a vector containing a permutation of indices
that can be used for plotting proper dendrograms without crossing
branches; fully analogous to the
slot order in the class hclust.
labels:a character vector containing labels of clustered objects used for plotting dendrograms.
sim:similarity matrix; only available if
aggExCluster was called with similarity
function and includeSim=TRUE.
call:method call used to produce this clustering result
signature(x="AggExResult"): see
plot-methods
signature(x="AggExResult", y="matrix"): see
plot-methods
signature(x="AggExResult"): see
heatmap-methods
signature(x="AggExResult", y="matrix"): see
heatmap-methods
signature(object="AggExResult"): see
show-methods
signature(object="AggExResult", k="ANY",
h="ANY"): see cutree-methods
signature(x="AggExResult"): gives the number of
clustering levels in the clustering result.
signature(x="AggExResult"): see
coerce-methods
signature(object="AggExResult"): see
coerce-methods
In the following code snippets, x is an AggExResult object.
signature(x="AggExResult", i="index", j="missing"):
x[[i]] returns an object of class
ExClust corresponding to the clustering level
with i clusters; synonymous to cutree(x, i).
signature(x="AggExResult", i="index", j="missing",
drop="missing"): x[i] returns a list of ExClust
objects with all clustering levels specified in vector i.
So, the list has as many components as the argument i has
elements. A list is returned even if i is a single level.
signature(x="AggExResult"): gives the similarity
matrix.
Ulrich Bodenhofer, Johannes Palme, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
aggExCluster, show-methods,
plot-methods, cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## compute agglomerative clustering from scratch aggres1 <- aggExCluster(sim) ## show results show(aggres1) ## plot dendrogram plot(aggres1) ## plot heatmap along with dendrogram heatmap(aggres1, sim) ## plot level with two clusters plot(aggres1, x, k=2) ## run affinity propagation apres <- apcluster(sim, q=0.7) ## create hierarchy of clusters determined by affinity propagation aggres2 <- aggExCluster(sim, apres) ## show results show(aggres2) ## plot dendrogram plot(aggres2) ## plot heatmap heatmap(aggres2, sim) ## plot level with two clusters plot(aggres2, x, k=2)## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## compute agglomerative clustering from scratch aggres1 <- aggExCluster(sim) ## show results show(aggres1) ## plot dendrogram plot(aggres1) ## plot heatmap along with dendrogram heatmap(aggres1, sim) ## plot level with two clusters plot(aggres1, x, k=2) ## run affinity propagation apres <- apcluster(sim, q=0.7) ## create hierarchy of clusters determined by affinity propagation aggres2 <- aggExCluster(sim, apres) ## show results show(aggres2) ## plot dendrogram plot(aggres2) ## plot heatmap heatmap(aggres2, sim) ## plot level with two clusters plot(aggres2, x, k=2)
Runs affinity propagation clustering
## S4 method for signature 'matrix,missing' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'dgTMatrix,missing' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'sparseMatrix,missing' apcluster(s, x, ...) ## S4 method for signature 'Matrix,missing' apcluster(s, x, ...) ## S4 method for signature 'character,ANY' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, ...) ## S4 method for signature 'function,ANY' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, ...)## S4 method for signature 'matrix,missing' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'dgTMatrix,missing' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'sparseMatrix,missing' apcluster(s, x, ...) ## S4 method for signature 'Matrix,missing' apcluster(s, x, ...) ## S4 method for signature 'character,ANY' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, ...) ## S4 method for signature 'function,ANY' apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, ...)
s |
an |
x |
input data to be clustered; if |
p |
input preference; can be a vector that specifies
individual preferences for each data point. If scalar,
the same value is used for all data points. If |
q |
if |
maxits |
maximal number of iterations that should be executed |
convits |
the algorithm terminates if the examplars have not
changed for |
lam |
damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur |
includeSim |
if |
details |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value before
adding noise (see above), if |
... |
for the methods with signatures |
Affinity Propagation clusters data using a set of real-valued pairwise data point similarities as input. Each cluster is represented by a cluster center data point (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.
When called with a similarity matrix as input (which may also be a
sparse matrix according to the Matrix package), the function performs
AP clustering. When called with the name of a package-provided
similarity function or a user-provided similarity function object and
input data, the function first computes the similarity matrix before
performing AP clustering. The similarity
matrix is returned for later use as part of the
APResult
object depending on whether includeSim was set to TRUE (see
argument description above).
Apart from minor adaptations and optimizations, the AP
clustering functionality of the function apcluster is
largely analogous to Frey's and Dueck's Matlab code
(see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
The new argument q allows for better controlling the number of
clusters without knowing the distribution of similarity
values. A meaningful range for the parameter p can be determined
using the function preferenceRange. Alternatively, a
certain fixed number of clusters may be desirable. For this purpose,
the function apclusterK is available.
Upon successful completion, the function returns an
APResult object.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme, and Chrats Melkonian
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, show-methods,
plot-methods, labels-methods,
preferenceRange, apclusterL-methods,
apclusterK
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix and run affinity propagation ## (p defaults to median of similarity) apres <- apcluster(negDistMat(r=2), x, details=TRUE) ## show details of clustering results show(apres) ## plot clustering result plot(apres, x) ## plot heatmap heatmap(apres) ## run affinity propagation with default preference of 10% quantile ## of similarities; this should lead to a smaller number of clusters ## reuse similarity matrix from previous run apres <- apcluster(s=apres@sim, q=0.1) show(apres) plot(apres, x) ## now try the same with RBF kernel sim <- expSimMat(x, r=2) apres <- apcluster(s=sim, q=0.2) show(apres) plot(apres, x) ## create sparse similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apcluster(ssim, q=0) apres## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix and run affinity propagation ## (p defaults to median of similarity) apres <- apcluster(negDistMat(r=2), x, details=TRUE) ## show details of clustering results show(apres) ## plot clustering result plot(apres, x) ## plot heatmap heatmap(apres) ## run affinity propagation with default preference of 10% quantile ## of similarities; this should lead to a smaller number of clusters ## reuse similarity matrix from previous run apres <- apcluster(s=apres@sim, q=0.1) show(apres) plot(apres, x) ## now try the same with RBF kernel sim <- expSimMat(x, r=2) apres <- apcluster(s=sim, q=0.2) show(apres) plot(apres, x) ## create sparse similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apcluster(ssim, q=0) apres
Runs affinity propagation demo for randomly generated data set according to Frey and Dueck
apclusterDemo(l=100, d=2, seed=NA, ...)apclusterDemo(l=100, d=2, seed=NA, ...)
l |
number of data points to be generated |
d |
dimension of data to be created |
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value; if |
... |
all other arguments are passed on to
|
apclusterDemo creates l d-dimensional
data points that are uniformly distributed in . Affinity
propagation is executed for this data set with default parameters.
Alternative settings can be passed to apcluster with
additional arguments. After completion of affinity propagation,
the results are shown and the performance measures are plotted.
This function corresponds to the demo function in the original Matlab code of Frey and Dueck. We warn the user, however, that uniformly distributed data are not necessarily ideal for demonstrating clustering, as there can never be real clusters in uniformly distributed data - all clusters found must be random artefacts.
Upon successful completion, the function returns an invisible list
with three components. The first is the data set that has been
created, the second is the similarity matrix, and the third is an
APResult object with the clustering results (see
examples below).
Ulrich Bodenhofer, Johannes Palme, and Johannes Palme
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, plot-methods,
apcluster, apclusterL
## create random data set and run affinity propagation apd <- apclusterDemo() ## plot clustering result along with data set plot(apd[[3]], apd[[1]])## create random data set and run affinity propagation apd <- apclusterDemo() ## plot clustering result along with data set plot(apd[[3]], apd[[1]])
Runs affinity propagation clustering for a given similarity matrix adjusting input preferences iteratively in order to achieve a desired number of clusters
## S4 method for signature 'matrix,missing' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE) ## S4 method for signature 'Matrix,missing' apclusterK(s, x, K, ...) ## S4 method for signature 'dgTMatrix,missing' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE) ## S4 method for signature 'sparseMatrix,missing' apclusterK(s, x, K, ...) ## S4 method for signature 'function,ANY' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...) ## S4 method for signature 'character,ANY' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...)## S4 method for signature 'matrix,missing' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE) ## S4 method for signature 'Matrix,missing' apclusterK(s, x, K, ...) ## S4 method for signature 'dgTMatrix,missing' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE) ## S4 method for signature 'sparseMatrix,missing' apclusterK(s, x, K, ...) ## S4 method for signature 'function,ANY' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...) ## S4 method for signature 'character,ANY' apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...)
s |
an |
x |
input data to be clustered; if |
K |
desired number of clusters |
prc |
the algorithm stops if the number of clusters does not deviate more than prc percent from desired value K; set to 0 if you want to have exactly K clusters |
bimaxit |
maximum number of bisection steps to perform; note that no warning is issued if the number of clusters is still not in the desired range |
exact |
flag indicating whether or not to compute the initial
preference range exactly (see |
maxits |
maximal number of iterations that |
convits |
|
lam |
damping factor for |
includeSim |
if |
details |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value, if |
verbose |
flag indicating whether status information should be displayed during bisection |
... |
for the methods with signatures |
apclusterK first runs preferenceRange to determine
the range of meaningful choices of the input preference p. Then
it decreases p exponentially for a few iterations to obtain a
good initial guess for p. If the number of clusters is still
too far from the desired goal, bisection is applied.
When called with a similarity matrix as input, the function performs
the procedure described above. When called with the name of a package-provided
similarity function or a user-provided similarity function object and
input data, the function first computes the similarity matrix before
running apclusterK on this similarity matrix. The similarity
matrix is returned for later use as part of the APResult object
depending on whether includeSim was set to TRUE (see
argument description above).
Apart from minor adaptations and optimizations, the implementation is largely analogous to Frey's and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
Upon successful completion, the function returns a
APResult object.
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
apcluster, preferenceRange,
APResult
## create three Gaussian clouds cl1 <- cbind(rnorm(70, 0.2, 0.05), rnorm(70, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) cl3 <- cbind(rnorm(60, 0.8, 0.04), rnorm(60, 0.8, 0.05)) x <- rbind(cl1, cl2, cl3) ## run affinity propagation such that 3 clusters are obtained apres <- apclusterK(negDistMat(r=2), x, K=3) ## show details of clustering results show(apres) ## plot clustering result plot(apres, x) ## create sparse similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apclusterK(ssim, K=2) apres## create three Gaussian clouds cl1 <- cbind(rnorm(70, 0.2, 0.05), rnorm(70, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) cl3 <- cbind(rnorm(60, 0.8, 0.04), rnorm(60, 0.8, 0.05)) x <- rbind(cl1, cl2, cl3) ## run affinity propagation such that 3 clusters are obtained apres <- apclusterK(negDistMat(r=2), x, K=3) ## show details of clustering results show(apres) ## plot clustering result plot(apres, x) ## create sparse similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apclusterK(ssim, K=2) apres
Runs leveraged affinity propagation clustering
## S4 method for signature 'matrix,missing' apclusterL(s, x, sel, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'character,ANY' apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...) ## S4 method for signature 'function,ANY' apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...)## S4 method for signature 'matrix,missing' apclusterL(s, x, sel, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=FALSE, nonoise=FALSE, seed=NA) ## S4 method for signature 'character,ANY' apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...) ## S4 method for signature 'function,ANY' apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000, convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE, seed=NA, ...)
s |
an |
x |
input data to be clustered; if |
frac |
fraction of samples that should be used for leveraged clustering. The similarity matrix will be generated for all samples against a random fraction of the samples as specified by this parameter. |
sweeps |
number of sweeps of leveraged clustering performed with changing randomly selected subset of samples. |
sel |
selected sample indices; a vector containing the sample indices of the sample subset used for leveraged AP clustering in increasing order. |
p |
input preference; can be a vector that specifies
individual preferences for each data point. If scalar,
the same value is used for all data points. If |
q |
if |
maxits |
maximal number of iterations that should be executed |
convits |
the algorithm terminates if the examplars have not
changed for |
lam |
damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur |
includeSim |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value before
adding noise (see above), if |
... |
all other arguments are passed to the selected
similarity function as they are; note that possible name conflicts between
arguments of |
Affinity Propagation clusters data using a set of real-valued pairwise similarities as input. Each cluster is represented by a representative cluster center (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.
Leveraged Affinity Propagation reduces dynamic and static load for large datasets. Only a subset of the samples are considered in the clustering process assuming that they provide already enough information about the cluster structure.
When called with input data and the name of a package provided or a user
provided similarity function the function selects a random sample subset
according to the frac parameter, calculates a rectangular
similarity matrix of all samples against this subset and repeats
affinity propagation sweep times. A new sample subset is used
for each repetition. The clustering result of the sweep with the highest
net similarity is returned. Any parameters specific to the chosen
method of similarity calculation can be passed to apcluster
in addition to the parameters described above. The similarity matrix
for the best trial is also returned in the result object when requested
by the user (argument includeSim).
When called with a rectangular similarity matrix (which represents a
column subset of the full similarity matrix) the function performs
AP clustering on this similarity matrix. The information
about the selected samples is passed to clustering with the
parameter sel. This function is only needed when the user needs full
control of distance calculation or sample subset selection.
Apart from minor adaptations and optimizations, the implementation
of the function apclusterL
is largely analogous to Frey's and Dueck's Matlab code
(see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
Upon successful completion, both functions returns an
APResult object.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, show-methods,
plot-methods, labels-methods,
preferenceRange, apcluster-methods,
apclusterK
## create two Gaussian clouds cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## leveraged apcluster apres <- apclusterL(negDistMat(r=2), x, frac=0.2, sweeps=3, p=-0.2) ## show details of leveraged clustering results show(apres) ## plot leveraged clustering result plot(apres, x) ## plot heatmap of clustering result heatmap(apres) ## show net similarities of single sweeps apres@netsimLev ## show samples on which best sweep was based apres@sel## create two Gaussian clouds cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## leveraged apcluster apres <- apclusterL(negDistMat(r=2), x, frac=0.2, sweeps=3, p=-0.2) ## show details of leveraged clustering results show(apres) ## plot leveraged clustering result plot(apres, x) ## plot heatmap of clustering result heatmap(apres) ## show net similarities of single sweeps apres@netsimLev ## show samples on which best sweep was based apres@sel
S4 class for storing results of affinity propagation
clustering. It extends the class ExClust.
Objects of this class can be created by calling apcluster
or apclusterL for a given similarity matrix or calling
one of these procedures with a data set and a similarity measure.
The following slots are defined for APResult objects. Most names are taken from Frey's and Dueck's original Matlab package:
sweeps:number of times leveraged clustering ran with different subsets of samples
it:number of iterations the algorithm ran
p:input preference (either set by user or
computed by apcluster or
apclusterL)
netsim:final total net similarity, defined as the
sum of expref and dpsim
(see below)
dpsim:final sum of similarities of data points to exemplars
expref:final sum of preferences of the identified exemplars
netsimLev:total net similarity of the individual sweeps for leveraged clustering; only available for leveraged clustering
netsimAll:vector containing the total net similarity
for each iteration; only available if
apcluster was called with
details=TRUE
exprefAll:vector containing the sum of preferences
of the identified exemplars
for each iteration; only available if
apcluster was called with
details=TRUE
dpsimAll:vector containing the sum of similarities
of data points to exemplars
for each iteration; only available if
apcluster was called with
details=TRUE
idxAll:matrix with sample-to-exemplar indices
for each iteration; only available if
apcluster was called with
details=TRUE
Class "ExClust", directly.
signature(x="APResult"): see
plot-methods
signature(x="ExClust", y="matrix"): see
plot-methods
signature(x="ExClust"): see
heatmap-methods
signature(x="ExClust", y="matrix"): see
heatmap-methods
signature(object="APResult"): see
show-methods
signature(object="APResult"): see
labels-methods
signature(object="APResult"): see
cutree-methods
signature(x="APResult"): gives the number of
clusters.
signature(x="ExClust"): see
sort-methods
signature(x="ExClust"): see
coerce-methods
signature(object="ExClust"): see
coerce-methods
In the following code snippets, x is an APResult object.
signature(x="APResult", i="index", j="missing"):
x[[i]] returns the i-th cluster as a list of indices
of samples belonging to the i-th cluster.
signature(x="APResult", i="index", j="missing",
drop="missing"): x[i] returns a list of integer vectors with the
indices of samples belonging to this cluster. The list has as
many components as the argument i has elements. A list is
returned even if i is a single integer.
signature(x="APResult"): gives the similarity
matrix.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme
https://github.com/UBod/apcluster
APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
apcluster, apclusterL,
show-methods, plot-methods,
labels-methods, cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim, details=TRUE) ## show details of clustering results show(apres) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## plot heatmap heatmap(apres, sim)## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim, details=TRUE) ## show details of clustering results show(apres) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## plot heatmap heatmap(apres, sim)
Functions for coercing clustering object to hclust and dendrogram objects
## S4 method for signature 'AggExResult' as.hclust(x, base=0.05) ## S4 method for signature 'ExClust' as.hclust(x, base=0.05, ...) ## S4 method for signature 'AggExResult' as.dendrogram(object, base=0.05, useNames=TRUE) ## S4 method for signature 'ExClust' as.dendrogram(object, base=0.05, useNames=TRUE, ...)## S4 method for signature 'AggExResult' as.hclust(x, base=0.05) ## S4 method for signature 'ExClust' as.hclust(x, base=0.05, ...) ## S4 method for signature 'AggExResult' as.dendrogram(object, base=0.05, useNames=TRUE) ## S4 method for signature 'ExClust' as.dendrogram(object, base=0.05, useNames=TRUE, ...)
x |
a clustering result object of class
|
object |
a clustering result object of class
|
base |
fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram (see details below). |
useNames |
if |
... |
all other arguments are passed on to
|
If called for an AggExResult object,
as.hclust creates an hclust object.
The heights are transformed to the interval from base (height
of lowest join) to 1 (height of highest join).
If called for an ExClust or
APResult object, aggExCluster is
called internally to create a cluster hierarchy first. This is only
possible if the pairwise similarities are included in the sim
slot of x (see aggExCluster on how to ensure
this).
If x is an AggExResult object obtained by
clustering an entire data set, as.hclust produces a complete
hierarchy. If, however, x is an ExClust (or
APResult) object or an
AggExResult obtained by running
aggExCluster on an ExClust or
APResult object, then as.hclust produces
a hierarchy of clusters, not of samples.
If called for an AggExResult object,
as.dendrogram creates an
dendrogram object.
Analogously to as.hclust, the heights are transformed to the
interval ranging from base (height
of lowest join) to 1 (height of highest join). So, any information
about heights of merges is lost. If the original join heights are
relevant, call plot on the original
AggExResult object directly without coercing it
to a dendrogram object first.
If called for an ExClust or
APResult object, aggExCluster is
called first to create a cluster hierarchy. Again this is only
possible if the pairwise similarities are included in the sim
slot of object.
If object is an AggExResult object obtained by
clustering an entire data set, as.dendrogram produces a complete
dendrogram. If object is an ExClust (or
APResult) object or an
AggExResult obtained by previously running
aggExCluster on an ExClust or
APResult object, then as.dendrogram produces
a complete dendrogram of all samples, too, but with the difference
that entire clusters of the previous ExClust or
APResult object are not further split up
hierarchically.
Consequently, if x is not a complete cluster hierarchy, but a
hierarchy of clusters, as.dendrogram(as.hclust(x)) produces a
dendrogram of clusters, whereas as.dendrogram(x) in any case
produces a dendrogram of samples (with the special property mentioned
above).
see details above
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult,
AggExResult, ExClust,
heatmap-methods, apcluster,
apclusterL, aggExCluster,
cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(x=apres) ## compute and plot dendrogram dend1 <- as.dendrogram(aggres1) dend1 plot(dend1) ## compute and show dendrogram computed from hclust object dend2 <- as.dendrogram(as.hclust(aggres1)) dend2 plot(dend2) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## compute and plot dendrogram dend3 <- as.dendrogram(aggres2) dend3 plot(dend3)## create two Gaussian clouds cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(x=apres) ## compute and plot dendrogram dend1 <- as.dendrogram(aggres1) dend1 plot(dend1) ## compute and show dendrogram computed from hclust object dend2 <- as.dendrogram(as.hclust(aggres1)) dend2 plot(dend2) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## compute and plot dendrogram dend3 <- as.dendrogram(aggres2) dend3 plot(dend3)
Converts a dense similarity matrix into a sparse one or vice versa
## S4 method for signature 'matrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'Matrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'sparseMatrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'matrix' as.DenseSimilarityMatrix(s, fill=-Inf) ## S4 method for signature 'Matrix' as.DenseSimilarityMatrix(s, fill=-Inf) ## S4 method for signature 'sparseMatrix' as.DenseSimilarityMatrix(s, fill=-Inf)## S4 method for signature 'matrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'Matrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'sparseMatrix' as.SparseSimilarityMatrix(s, lower=-Inf) ## S4 method for signature 'matrix' as.DenseSimilarityMatrix(s, fill=-Inf) ## S4 method for signature 'Matrix' as.DenseSimilarityMatrix(s, fill=-Inf) ## S4 method for signature 'sparseMatrix' as.DenseSimilarityMatrix(s, fill=-Inf)
s |
a similarity matrix in sparse or dense format (see details below) |
lower |
cut-off threshold to apply when converting similarity
matrices into sparse format. All similarities lower than or equal to
|
fill |
value to fill in for entries that are missing from sparse
similarity matrix 's' (defaults to |
The function as.SparseSimilarityMatrix takes a matrix argument,
removes all diagonal elements and all values that are lower than or
equal to the cut-off threshold lower and returns a sparse
matrix of class Matrix:dgTMatrix-class.
If the function as.DenseSimilarityMatrix is called for a
sparse matrix (class sparseMatrix or any
class derived from this class), a dense matrix is returned, where all
values that were missing in the sparse matrix are replaced with
fill.
as.DenseSimilarityMatrix can also be called for dense
matrix and Matrix:Matrix objects.
In this case, as.DenseSimilarityMatrix assumes that the
matrices have three columns that encode for a sparse matrix
in the same way as the Matlab implementation of Frey's and Dueck's
sparse affinity propagation accepts it:
the first column contains 1-based row indices, the second column
contains 1-based column indices, and the third column contains the
similarity values. The same format is also accepted by
as.SparseSimilarityMatrix to convert a sparse similarity matrix
of this format into a Matrix:Matrix object.
Note that, for matrices of this format,
as.DenseSimilarityMatrix replaces the deprectated function
sparseToFull that was used in older versions of the package.
Note that as.SparseSimilarityMatrix and
as.DenseSimilarityMatrix are no S4 coercion methods.
There are no classes named SparseSimilarityMatrix
or DenseSimilarityMatrix.
returns a square similarity matrix in sparse format (class
Matrix:dgTMatrix-class or in dense format (standard class
matrix).
Ulrich Bodenhofer
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create similarity matrix in sparse format according to Frey and Dueck sp <- matrix(c(1, 2, 0.5, 3, 1, 0.2, 5, 4, -0.2, 3, 4, 1.2), 4, 3, byrow=TRUE) sp ## perform conversions as.DenseSimilarityMatrix(sp, fill=0) as.SparseSimilarityMatrix(sp) ## create dense similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apcluster(ssim, q=0) apres## create similarity matrix in sparse format according to Frey and Dueck sp <- matrix(c(1, 2, 0.5, 3, 1, 0.2, 5, 4, -0.2, 3, 4, 1.2), 4, 3, byrow=TRUE) sp ## perform conversions as.DenseSimilarityMatrix(sp, fill=0) as.SparseSimilarityMatrix(sp) ## create dense similarity matrix cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)) x <- rbind(cl1, cl2) sim <- negDistMat(x, r=2) ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2) ## run apcluster() on the sparse similarity matrix apres <- apcluster(ssim, q=0) apres
Cut out a clustering level from a cluster hierarchy
## S4 method for signature 'AggExResult' cutree(tree, k, h) ## S4 method for signature 'APResult' cutree(tree, k, h)## S4 method for signature 'AggExResult' cutree(tree, k, h) ## S4 method for signature 'APResult' cutree(tree, k, h)
tree |
an object of class |
k |
the level (i.e. the number of clusters) to be selected |
h |
alternatively, the level can be selected by specifying a cut-off for the merging objective |
The function cutree extracts a clustering level from a
cluster hierarchy stored in an AggExResult
object. Which level is selected can be determined by one of the
two arguments k and h (see above). If both k and
h are specified, k overrides h. This is
done largely analogous to the standard function
cutree. The differences are (1) that
only one level can be extracted at a time and (2) that an
ExClust is returned instead of an index list.
The function cutree may further be used to convert an
APResult object into an
ExClust object. In this case, the arguments
k and h are ignored.
returns an object of class ExClust
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two simple clusters x <- c(1, 2, 3, 7, 8, 9) names(x) <- c("a", "b", "c", "d", "e", "f") ## compute similarity matrix (negative squared distance) sim <- negDistMat(x, r=2) ## run affinity propagation aggres <- aggExCluster(sim) ## show details of clustering results show(aggres) ## retrieve clustering with 2 clusters cutree(aggres, 2) ## retrieve clustering with cut-off h=-1 cutree(aggres, h=-1)## create two simple clusters x <- c(1, 2, 3, 7, 8, 9) names(x) <- c("a", "b", "c", "d", "e", "f") ## compute similarity matrix (negative squared distance) sim <- negDistMat(x, r=2) ## run affinity propagation aggres <- aggExCluster(sim) ## show details of clustering results show(aggres) ## retrieve clustering with 2 clusters cutree(aggres, 2) ## retrieve clustering with cut-off h=-1 cutree(aggres, h=-1)
S4 class for storing exemplar-based clusterings
Objects of this class can be created by calling cutree
to cut out a clustering level from a cluster hierarchy
of class AggExResult. Moreover,
cutree can also be used to convert an object of
class APResult to class ExClust.
The following slots are defined for ExClust objects:
l:number of samples in the data set
sel:subset of samples used for leveraged clustering
exemplars:vector containing indices of exemplars
clusters:list containing the clusters; the i-th component is a vector of indices of data points belonging to the i-th exemplar (including the exemplar itself)
idx:vector of length l realizing a
sample-to-exemplar mapping; the i-th entry
contains the index of the exemplar the i-th
sample belongs to
sim:similarity matrix; only available if
the preceding clustering method was called with
includeSim=TRUE.
call:method call of the preceding clustering method
signature(x="ExClust"): see
plot-methods
signature(x="ExClust", y="matrix"): see
plot-methods
signature(x="ExClust"): see
heatmap-methods
signature(x="ExClust", y="matrix"): see
heatmap-methods
signature(object="ExClust"): see
show-methods
signature(object="ExClust"): see
labels-methods
signature(object="ExClust", k="ANY", h="ANY"): see
cutree-methods
signature(x="ExClust"): gives the number of
clusters.
signature(x="ExClust"): see
sort-methods
signature(x="ExClust"): see
coerce-methods
signature(object="ExClust"): see
coerce-methods
In the following code snippets, x is an ExClust object.
signature(x="ExClust", i="index", j="missing"):
x[[i]] returns the i-th cluster as a list of indices
of samples belonging to the i-th cluster.
signature(x="ExClust", i="index", j="missing",
drop="missing"): x[i] returns a list of integer vectors with the
indices of samples belonging to this cluster. The list has as
many components as the argument i has elements. A list is
returned even if i is a single integer.
signature(x="ExClust"): gives the similarity
matrix.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
aggExCluster, show-methods,
plot-methods, labels-methods,
cutree-methods, AggExResult,
APResult
## create two Gaussian clouds cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(25, 0.7, 0.08), rnorm(25, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation aggres <- aggExCluster(sim) ## extract level with two clusters excl <- cutree(aggres, k=2) ## show details of clustering results show(excl) ## plot information about clustering run plot(excl, x)## create two Gaussian clouds cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06)) cl2 <- cbind(rnorm(25, 0.7, 0.08), rnorm(25, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation aggres <- aggExCluster(sim) ## extract level with two clusters excl <- cutree(aggres, k=2) ## show details of clustering results show(excl) ## plot information about clustering run plot(excl, x)
Functions for Plotting of Heatmap
## S4 method for signature 'ExClust,missing' heatmap(x, y, ...) ## S4 method for signature 'ExClust,matrix' heatmap(x, y, ...) ## S4 method for signature 'ExClust,Matrix' heatmap(x, y, ...) ## S4 method for signature 'ExClust,sparseMatrix' heatmap(x, y, ...) ## S4 method for signature 'AggExResult,missing' heatmap(x, y, ...) ## S4 method for signature 'AggExResult,matrix' heatmap(x, y, Rowv=TRUE, Colv=TRUE, sideColors=NULL, col=heat.colors(12), base=0.05, add.expr, margins=c(5, 5, 2), cexRow=max(min(35 / nrow(y), 1), 0.1), cexCol=max(min(35 / ncol(y), 1), 0.1), main=NULL, dendScale=1, barScale=1, legend=c("none", "col"), ...) ## S4 method for signature 'matrix,missing' heatmap(x, y, ...) ## S4 method for signature 'missing,matrix' heatmap(x, y, ...)## S4 method for signature 'ExClust,missing' heatmap(x, y, ...) ## S4 method for signature 'ExClust,matrix' heatmap(x, y, ...) ## S4 method for signature 'ExClust,Matrix' heatmap(x, y, ...) ## S4 method for signature 'ExClust,sparseMatrix' heatmap(x, y, ...) ## S4 method for signature 'AggExResult,missing' heatmap(x, y, ...) ## S4 method for signature 'AggExResult,matrix' heatmap(x, y, Rowv=TRUE, Colv=TRUE, sideColors=NULL, col=heat.colors(12), base=0.05, add.expr, margins=c(5, 5, 2), cexRow=max(min(35 / nrow(y), 1), 0.1), cexCol=max(min(35 / ncol(y), 1), 0.1), main=NULL, dendScale=1, barScale=1, legend=c("none", "col"), ...) ## S4 method for signature 'matrix,missing' heatmap(x, y, ...) ## S4 method for signature 'missing,matrix' heatmap(x, y, ...)
x |
a clustering result object of class
|
y |
a similarity matrix |
sideColors |
character vector of colors to be used for plotting color
bars that visualize clusters of the finest clustering level in
|
col |
color ramp used for the heatmap image; see
|
Rowv |
determines whether or not a row dendrogram should be
plotted. If |
Colv |
determines whether or not a column dendrogram should be
plotted. Fully analogous to |
base |
fraction of height used for the very first join in
dendrograms; see |
add.expr, margins, cexRow, cexCol, main
|
largely analogous to the standard
|
dendScale |
factor scaling the width of vertical and height of
horizontal dendrograms; values have to be larger than 0 and no
larger than 2. The default is 1 which corresponds to the same size
as the dendrograms plot by the standard
|
barScale |
factor scaling the width of color bars; values have to
be larger than 0 and no larger than 4. The default is 1 which
corresponds to half the width of the color bars plot by the standard
|
legend |
if |
... |
see details below |
The heatmap functions provide plotting of heatmaps from several
different types of input object. The implementation is similar to the standard
graphics function heatmap.
Plotting heatmaps via the plot command as available in previous
versions of this package is still available for backward
compatibility.
If heatmap is called for objects of classes
APResult or ExClust,
a heatmap of the similarity matrix in slot sim of the parameter
x is created with clusters grouped together and highlighted in
different colors. The order of clusters is determined by running
aggExCluster on the clustering result x. This
variant of heatmap returns an invisible
AggExResult object.
If heatmap is called for an AggExResult
object that contains all levels of clustering, the heatmap is
displayed with the corresponding clustering dendrogram. If the
AggExResult object is the result of running
aggExCluster on a prior clustering result, the same heatmap
plot is produced as if heatmap had been called on this
prior clustering result, however, returning the cluster hierarchy's
dendrogram. In the latter case, color bars are plotted
to visualize the prior clustering result (see description of
argument sideColors above).
All variants described above only work if the input object x
contains a slot sim with the similarity matrix (which is only
the case if the preceding clustering method has been called with
includeSim=TRUE). In case the slot sim of x does not
contain the similarity matrix, the similarity matrix must be supplied
as second argument y.
All variants described above internally use heatmap with signature
AggExResult,matrix, so all arguments list above can be used for
all variants, as they are passed through using the ...
argument. All other arguments, analogously to the standard
heatmap function, are passed on to the
standard function image. This is
particularly useful for using alternative color schemes via the
col argument.
The two variants with one of the two arguments being a matrix and one
being missing are just wrappers around the standard
heatmap function with the aim to provide
compatibility with this standard case.
see details above
Similarity matrices can be supplied in dense or sparse format. Note, however, that sparse matrices are converted to full dense matrices before plotting heatmaps which may lead to memory and/or performance bottlenecks for larger data sets.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult,
AggExResult, ExClust,
apcluster, apclusterL,
aggExCluster, cutree-methods,
plot-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation using negative squared Euclidean apres <- apcluster(negDistMat(r=2), x, p=-0.1) ## plot heatmap clustering run heatmap(apres) ## rerun affinity propagation ## reuse similarity matrix from previous run apres2 <- apcluster(s=apres@sim, q=0.6) ## plot heatmap of second run heatmap(apres2, apres@sim) ## with alternate heatmap coloring, alternating color bars, and no dendrograms heatmap(apres2, apres@sim, Rowv=NA, Colv=NA, sideColors=c("darkgreen", "yellowgreen"), col=terrain.colors(12)) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(apres@sim, apres2) ## plot heatmap heatmap(cutree(aggres1, 2), apres@sim) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## show heatmap along with dendrogram heatmap(aggres2)## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation using negative squared Euclidean apres <- apcluster(negDistMat(r=2), x, p=-0.1) ## plot heatmap clustering run heatmap(apres) ## rerun affinity propagation ## reuse similarity matrix from previous run apres2 <- apcluster(s=apres@sim, q=0.6) ## plot heatmap of second run heatmap(apres2, apres@sim) ## with alternate heatmap coloring, alternating color bars, and no dendrograms heatmap(apres2, apres@sim, Rowv=NA, Colv=NA, sideColors=c("darkgreen", "yellowgreen"), col=terrain.colors(12)) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(apres@sim, apres2) ## plot heatmap heatmap(cutree(aggres1, 2), apres@sim) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## show heatmap along with dendrogram heatmap(aggres2)
Generate a label vector from an clustering result
## S4 method for signature 'ExClust' labels(object, type="names")## S4 method for signature 'ExClust' labels(object, type="names")
object |
|
type |
specifies which kind of label vector should be created, see details below |
The function labels creates a label vector from a clustering
result. Which kind of labels are produced is controlled by the
argument type:
(default) returns the name of the exemplar to which each data sample belongs to; if no names are available, the function stops with an error;
returns the index of the cluster to which
each data sample belongs to, where clusters are enumerated
consecutively from 1 to the number of clusters (analogous to
other clustering methods like kmeans);
returns the index of the exemplar to
which each data sample belongs to, where indices of exemplars are
within the original data, which is nothing else but the slot
object@idx with attributes removed.
returns a label vector as long as the number of samples in the original data set
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two simple clusters x <- c(1, 2, 3, 7, 8, 9) names(x) <- c("a", "b", "c", "d", "e", "f") ## compute similarity matrix (negative squared distance) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim) ## show details of clustering results show(apres) ## label vector (names of exemplars) labels(apres) ## label vector (consecutive index of exemplars) labels(apres, type="enum") ## label vector (index of exemplars within original data set) labels(apres, type="exemplars") ## now with agglomerative clustering aggres <- aggExCluster(sim) ## label (names of exemplars) labels(cutree(aggres, 2))## create two simple clusters x <- c(1, 2, 3, 7, 8, 9) names(x) <- c("a", "b", "c", "d", "e", "f") ## compute similarity matrix (negative squared distance) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim) ## show details of clustering results show(apres) ## label vector (names of exemplars) labels(apres) ## label vector (consecutive index of exemplars) labels(apres, type="enum") ## label vector (index of exemplars within original data set) labels(apres, type="exemplars") ## now with agglomerative clustering aggres <- aggExCluster(sim) ## label (names of exemplars) labels(cutree(aggres, 2))
Functions for Visualizing Clustering Results
## S4 method for signature 'APResult,missing' plot(x, y, type=c("netsim", "dpsim", "expref"), xlab="# Iterations", ylab="Similarity", ...) ## S4 method for signature 'ExClust,matrix' plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA, limitNo=15, ...) ## S4 method for signature 'ExClust,data.frame' plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA, limitNo=15, ...) ## S4 method for signature 'AggExResult,missing' plot(x, y, main="Cluster dendrogram", xlab="", ylab="", ticks=4, digits=2, base=0.05, showSamples=FALSE, horiz=FALSE, ...) ## S4 method for signature 'AggExResult,matrix' plot(x, y, k=NA, h=NA, ...) ## S4 method for signature 'AggExResult,data.frame' plot(x, y, k=NA, h=NA, ...)## S4 method for signature 'APResult,missing' plot(x, y, type=c("netsim", "dpsim", "expref"), xlab="# Iterations", ylab="Similarity", ...) ## S4 method for signature 'ExClust,matrix' plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA, limitNo=15, ...) ## S4 method for signature 'ExClust,data.frame' plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA, limitNo=15, ...) ## S4 method for signature 'AggExResult,missing' plot(x, y, main="Cluster dendrogram", xlab="", ylab="", ticks=4, digits=2, base=0.05, showSamples=FALSE, horiz=FALSE, ...) ## S4 method for signature 'AggExResult,matrix' plot(x, y, k=NA, h=NA, ...) ## S4 method for signature 'AggExResult,data.frame' plot(x, y, k=NA, h=NA, ...)
x |
a clustering result object of class
|
y |
a matrix or data frame (see details below) |
type |
a string or array of strings indicating which
performance measures should be plotted; valid values are
|
xlab, ylab
|
labels for axes of 2D plots; ignored if |
labels |
names used for variables in scatter plot matrix
(displayed if |
limitNo |
if the number of columns/features in |
connect |
used only if clustering is plotted on original data,
ignored otherwise. If |
main |
title of plot |
ticks |
number of ticks used for the axis on the left side of the plot (applies to dendrogram plots only, see below) |
digits |
number of digits used for the axis tickmarks on the left side of the plot (applies to dendrogram plots only, see below) |
base |
fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram. |
showSamples |
if |
horiz |
if |
k |
level to be selected when plotting a single clustering
level of cluster hierarchy (i.e. the number of clusters; see
|
h |
cut-off to be used when plotting a single clustering
level of cluster hierarchy (see |
... |
all other arguments are passed to the plotting command that
are used internally, |
If plot is called for an APResult object
without specifying the second argument y,
a plot is created that displays graphs of performance
measures over execution time of the affinity propagation run.
This only works if apcluster was called with
details=TRUE.
If plot is called for an APResult object
along with a matrix or data frame as argument y, then the dimensions of
the matrix determine the behavior of plot:
If the matrix y has two columns, y is
interpreted as the original data set. Then a plot of
the clustering result superimposed on the original
data set is created. Each cluster is displayed in a
different color. The exemplar of each cluster is highlighted
by a black square. If connect is TRUE, lines
connecting the cluster members to their exemplars are drawn.
This variant of plot does not return any value.
If y has more than two columns, clustering results are
superimposed in a sort of scatter plot matrix. The variant
that y is interpreted as similarity matrix if it is
quadratic has been removed in version 1.3.2. Use
heatmap instead.
If y has only one column, an error is displayed.
If plot is called for an ExClust object
along with a matrix or data frame as argument y, then
plot behaves exactly the same as described in the previous
paragraph.
If plot is called for an AggExResult object
without specifying the second argument y, then a dendrogram
plot is drawn. This variant returns an invisible
dendrogram object. The showSamples argument
determines whether a complete dendrogram or a dendrogram of clusters
is plotted (see above). If the option horiz=TRUE is used, the
dendrogram is rotated. Note that, in this case, the margin to the
right of the plot may not be wide enough to accommodate long
cluster/sample labels. In such a case, the figure margins have to
be widened before plot is called.
If plot is called for an AggExResult object
along with a matrix or data frame y, y is
again interpreted
as original data set. If one of the two arguments k or
h is present, a clustering is cut out from the cluster hierarchy
using cutree and this clustering is displayed with the
original data set as described above. This variant of
plot returns an invisible ExClust object
containing the extracted clustering.
see details above
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult,
AggExResult, ExClust,
heatmap-methods, apcluster,
apclusterL, aggExCluster,
cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(x=apres) ## show dendrograms plot(aggres1) plot(aggres1, showSamples=TRUE) ## show clustering result for 4 clusters plot(aggres1, x, k=4) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## show dendrogram plot(aggres2) ## show heatmap along with dendrogram heatmap(aggres2) ## show clustering result for 2 clusters plot(aggres2, x, k=2) ## cluster iris data set data(iris) apIris <- apcluster(negDistMat(r=2), iris, q=0) plot(apIris, iris)## create two Gaussian clouds cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE) ## plot information about clustering run plot(apres) ## plot clustering result plot(apres, x) ## perform agglomerative clustering of affinity propagation clusters aggres1 <- aggExCluster(x=apres) ## show dendrograms plot(aggres1) plot(aggres1, showSamples=TRUE) ## show clustering result for 4 clusters plot(aggres1, x, k=4) ## perform agglomerative clustering of whole data set aggres2 <- aggExCluster(negDistMat(r=2), x) ## show dendrogram plot(aggres2) ## show heatmap along with dendrogram heatmap(aggres2) ## show clustering result for 2 clusters plot(aggres2, x, k=2) ## cluster iris data set data(iris) apIris <- apcluster(negDistMat(r=2), iris, q=0) plot(apIris, iris)
Determines meaningful ranges for affinity propagation input preference
## S4 method for signature 'matrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'Matrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'dgTMatrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'sparseMatrix' preferenceRange(s, exact=FALSE)## S4 method for signature 'matrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'Matrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'dgTMatrix' preferenceRange(s, exact=FALSE) ## S4 method for signature 'sparseMatrix' preferenceRange(s, exact=FALSE)
s |
an |
exact |
flag indicating whether exact ranges should be computed,
which is relatively slow; if bounds are sufficient,
supply |
Affinity Propagation clustering relies on an appropriate choice of input preferences. This function helps in finding a good choice by determining meaningful lower and upper bounds.
If the similarity matrix s is sparse or if it contains
-Inf similarities, only the similarities are taken into account
that are specified in s and larger than -Inf. In such
cases, the lower bound returned by preferenceRange need not
correspond to one or two clusters. Moreover, it may also happen in
degenerate cases that the lower bound exceeds the upper bound.
In such a case, no warning or error is issued, so it is the user's
responsibility to ensure a proper interpretation of the results.
The method apclusterK makes use of this function
internally and checks the plausibility of the result
returned by preferenceRange.
returns a vector with two entries, the first of which is the minimal input preference (which would lead to 1 or 2 clusters) and the second of which is the maximal input prefence (which would lead to as many clusters as data samples).
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## create similarity matrix sim <- negDistMat(x, r=2) ## determine bounds preferenceRange(sim) ## determine exact range preferenceRange(sim, exact=TRUE)## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## create similarity matrix sim <- negDistMat(x, r=2) ## determine bounds preferenceRange(sim) ## determine exact range preferenceRange(sim, exact=TRUE)
Display methods for S4 classes APResult,
ExClust, and AggExResult
## S4 method for signature 'APResult' show(object) ## S4 method for signature 'ExClust' show(object) ## S4 method for signature 'AggExResult' show(object)## S4 method for signature 'APResult' show(object) ## S4 method for signature 'ExClust' show(object) ## S4 method for signature 'AggExResult' show(object)
object |
an object of class
|
show displays the most important information stored in
object.
For APResult objects,
the number of data samples, the number of clusters, the number of
iterations, the input preference, the final objective
function values, the vector of exemplars, the list of clusters and
for leveraged clustering the selected sample subset are printed.
For ExClust objects,
the number of data samples, the number of clusters,
the vector of exemplars, and list of clusters are printed.
For AggExResult objects,
only the number of data samples and the maximum
number of clusters are printed. For retrieving a particular
clustering level, use the function cutree.
For accessing more detailed information, it is necessary to
access the slots of object directly. Use
str to get a compact overview of all slots of an object.
show returns an invisible NULL
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult,
ExClust, AggExResult,
cutree-methods
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim) ## show details of clustering results show(apres) ## apply agglomerative clustering to apres aggres <- aggExCluster(sim, apres) ## display overview of result show(aggres) ## show clustering level with two clusters show(cutree(aggres, 2))## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)) x <- rbind(cl1, cl2) ## compute similarity matrix (negative squared Euclidean) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim) ## show details of clustering results show(apres) ## apply agglomerative clustering to apres aggres <- aggExCluster(sim, apres) ## display overview of result show(aggres) ## show clustering level with two clusters show(cutree(aggres, 2))
Compute similarity matrices from data set
negDistMat(x, sel=NA, r=1, method="euclidean", p=2) expSimMat(x, sel=NA, r=2, w=1, method="euclidean", p=2) linSimMat(x, sel=NA, w=1, method="euclidean", p=2) corSimMat(x, sel=NA, r=1, signed=TRUE, method="pearson") linKernel(x, sel=NA, normalize=FALSE)negDistMat(x, sel=NA, r=1, method="euclidean", p=2) expSimMat(x, sel=NA, r=2, w=1, method="euclidean", p=2) linSimMat(x, sel=NA, w=1, method="euclidean", p=2) corSimMat(x, sel=NA, r=1, signed=TRUE, method="pearson") linKernel(x, sel=NA, normalize=FALSE)
x |
input data to be clustered; if |
sel |
selected samples subset; vector of row indices for x in increasing order (see details below) |
r |
exponent (see details below) |
w |
radius (see details below) |
signed |
take sign of correlation into account (see details below) |
normalize |
see details below |
method |
type of distance measure to be used; for |
p |
exponent for Minkowski distance; only used for
|
negDistMat creates a square matrix of mutual
pairwise similarities of data vectors as negative distances. The
argument r (default is 1) is used to transform the resulting
distances by computing the r-th power (use r=2 to obtain
negative squared distances as in Frey's and Dueck's demos), i.e.,
given a distance d, the resulting similarity is computed as
. With the parameter sel a subset of samples
can be specified for distance calculation. In this case not the
full distance matrix is computed but a rectangular similarity matrix
of all samples (rows) against the subset (cols) as needed for
leveraged clustering. Internally, the computation of distances is
done using an internal method derived from
dist. All options of this function except
diag and upper can be used, especially method
which allows for selecting different distance measures.
Note that, since version 1.4.4. of the package, there is an additional
method "discrepancy" that implements Weyl's discrepancy measure.
expSimMat computes similarities in a way similar to
negDistMat, but the transformation of distances to similarities
is done in the following way:
The parameter sel allows the creation of a rectangular
similarity matrix. As above, r is an exponent. The parameter w controls
the speed of descent. r=2 in conjunction with Euclidean
distances corresponds to the well-known Gaussian/RBF kernel,
whereas r=1 corresponds to the Laplace kernel. Note that these
similarity measures can also be understood as fuzzy equality relations.
linSimMat provides another way of transforming distances
into similarities by applying the following transformation to a
distance d:
Thw parameter sel is used again for creation of a rectangular
similarity matrix. Here w corresponds to a maximal radius of
interest. Note that this is a fuzzy equality relation with respect to
the Lukasiewicz t-norm.
Unlike the above three functions, linKernel computes pairwise
similarities as scalar products of data vectors, i.e. it corresponds,
as the name suggests, to the “linear kernel”. Use parameter
sel to compute only a submatrix of the full kernel matrix as
described above. If normalize=TRUE, the values are scaled to
the unit sphere in the following way (for two samples x and
y:
The function corSimMat computes pairwise similarities as
correlations. It uses link[stats:cor]{cor} internally.
The method argument is passed on to link[stats:cor]{cor}.
The argument r serves as an exponent with which the correlations
can be transformed. If signed=TRUE (default), negative correlations are
taken into account, i.e. two samples are maximally dissimilar if they
are negatively correlated. If signed=FALSE, similarities are
computed as absolute values of correlations, i.e. two samples are
maximally similar if they are positively or negatively correlated and
the two samples are maximally dissimilar if they are uncorrelated.
Note that the naming of the argument p has been chosen for
consistency with dist and previous versions
of the package. When using leveraged AP in
conjunction with the Minkowski distance, this leads to conflicts with
the input preference parameter p of
apclusterL. In order to avoid that, use the above
functions without x argument to create a custom similarity
measure with fixed parameter p (see example below).
All functions listed above return square or rectangular matrices of similarities.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Micchelli, C. A. (1986) Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr. Approx. 2, 11-20.
De Baets, B. and Mesiar, R. (1997) Pseudo-metrics and T-equivalences. J. Fuzzy Math. 5, 471-481.
Bauer, P., Bodenhofer, U., and Klement, E. P. (1996) A fuzzy algorithm for pixel classification based on the discrepancy norm. In Proc. 5th IEEE Int. Conf. on Fuzzy Systems, volume III, pages 2007–2012, New Orleans, LA. DOI: doi:10.1109/FUZZY.1996.552744.
## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## create negative distance matrix (default Euclidean) sim1 <- negDistMat(x) ## compute similarities as squared negative distances ## (in accordance with Frey's and Dueck's demos) sim2 <- negDistMat(x, r=2) ## compute RBF kernel sim3 <- expSimMat(x, r=2) ## compute similarities as squared negative distances ## all samples versus a randomly chosen subset ## of 50 samples (for leveraged AP clustering) sel <- sort(sample(1:nrow(x), nrow(x)*0.25)) sim4 <- negDistMat(x, sel, r=2) ## example of leveraged AP using Minkowski distance with non-default ## parameter p cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) apres <- apclusterL(s=negDistMat(method="minkowski", p=2.5, r=2), x, frac=0.2, sweeps=3, p=-0.2) show(apres)## create two Gaussian clouds cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) ## create negative distance matrix (default Euclidean) sim1 <- negDistMat(x) ## compute similarities as squared negative distances ## (in accordance with Frey's and Dueck's demos) sim2 <- negDistMat(x, r=2) ## compute RBF kernel sim3 <- expSimMat(x, r=2) ## compute similarities as squared negative distances ## all samples versus a randomly chosen subset ## of 50 samples (for leveraged AP clustering) sel <- sort(sample(1:nrow(x), nrow(x)*0.25)) sim4 <- negDistMat(x, sel, r=2) ## example of leveraged AP using Minkowski distance with non-default ## parameter p cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06)) cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05)) x <- rbind(cl1, cl2) apres <- apclusterL(s=negDistMat(method="minkowski", p=2.5, r=2), x, frac=0.2, sweeps=3, p=-0.2) show(apres)
Rearrange clusters according to sort criterion
## S4 method for signature 'ExClust' sort(x, decreasing=FALSE, sortBy=c("aggExCluster", "size", "nameExemplar", "noExemplar"), ...)## S4 method for signature 'ExClust' sort(x, decreasing=FALSE, sortBy=c("aggExCluster", "size", "nameExemplar", "noExemplar"), ...)
x |
|
decreasing |
logical indicating if sorting should be done in decreasing order, see details below |
sortBy |
sort criterion, see details below |
... |
further arguments are ignored; only defined for S3 method consistency |
The function sort takes an APResult
or ExClust clustering object x and creates
a new clustering object of the same class, but with clusters arranged
according to the sort criterion passed as argument sortBy:
(default) order clusters as they
would appear in the dendrogram produced by
aggExCluster. This is also the same ordering in
which the clusters are arranged by heatmap.
Note that this only works if the similarity matrix is included
in the input object x, otherwise an error message is
produced.
sorts clusters according to their size (from small to large).
sorts clusters according to the names of the examplars (if available, otherwise an error is produced).
sorts clusters according to the indices of the examplars.
If decreasing is TRUE, the order is reversed and, for
example, sortBy="size" sorts clusters with such that the larger
clusters come first.
Note that the cluster numbers of x are not preserved by
sort, i.e. the cluster no. 1 of the object returned by
sort is the one that has been ranked first by sort,
which may not necessarily coincide with cluster no. 1 of the original
clustering object x.
Note that this is an S3 method (whereas all other methods in this
package are S4 methods). This inconsistency has been introduced in
order to avoid interoperability problems with the BiocGenerics
package which may overwrite the definition of the sort generic
if it is loaded after the apcluster package.
returns a copy of x, but with slots exemplars and
clusters (see APResult
or ExClust) reordered.
Ulrich Bodenhofer
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two Gaussian clouds cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06)) cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05)) x <- rbind(cl1,cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7) show(apres) ## show dendrogram plot(aggExCluster(x=apres)) ## default sort order: like in heatmap or dendrogram show(sort(apres)) ## show dendrogram (note the different cluster numbers!) plot(aggExCluster(x=sort(apres))) ## sort by size show(sort(apres, decreasing=TRUE, sortBy="size"))## create two Gaussian clouds cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06)) cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05)) x <- rbind(cl1,cl2) ## run affinity propagation apres <- apcluster(negDistMat(r=2), x, q=0.7) show(apres) ## show dendrogram plot(aggExCluster(x=apres)) ## default sort order: like in heatmap or dendrogram show(sort(apres)) ## show dendrogram (note the different cluster numbers!) plot(aggExCluster(x=sort(apres))) ## sort by size show(sort(apres, decreasing=TRUE, sortBy="size"))