Title: | Affinity Propagation Clustering |
---|---|
Description: | Implements Affinity Propagation clustering introduced by Frey and Dueck (2007) <DOI:10.1126/science.1136800>. The algorithms are largely analogous to the 'Matlab' code published by Frey and Dueck. The package further provides leveraged affinity propagation and an algorithm for exemplar-based agglomerative clustering that can also be used to join clusters obtained from affinity propagation. Various plotting functions are available for analyzing clustering results. |
Authors: | Ulrich Bodenhofer [aut, cre], Johannes Palme [ctb], Chrats Melkonian [ctb], Andreas Kothmeier [aut], Nikola Kostic [ctb] |
Maintainer: | Ulrich Bodenhofer <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.4.13 |
Built: | 2024-12-22 04:18:13 UTC |
Source: | https://github.com/ubod/apcluster |
The apcluster package implements affinity propagation according to Frey and Dueck and a method for exemplar-based agglomerative clustering. It further offers various functions for plotting clustering results.
The central function is apcluster. It runs affinity propagation on a given similarity matrix or it creates a similarity matrix for a given data set and similarity measure and runs affinity propagation on this matrix. The function returns an APResult object from which the clustering itself and information about the affinity propagation run can be obtained. Leveraged affinity propagation clustering (apclusterL) allows efficient clustering of large datasets by using only a subset of the similarities. The package further implements an exemplar-based agglomerative clustering method, aggExCluster, that can be used for computing a complete cluster hierarchy, but also for joining fine-grained clusters previously obtained by affinity propagation clustering. Further functions are implemented to visualize the results and to create distance matrices.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme, Chrats Melkonian, and Nikola Kostic
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
## create two Gaussian clouds
cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)

## run affinity propagation
apres <- apcluster(sim, details=TRUE)

## show details of clustering results
show(apres)

## plot information about clustering run
plot(apres)

## plot clustering result
plot(apres, x)

## employ agglomerative clustering to join clusters
aggres <- aggExCluster(sim, apres)

## show information
show(aggres)
show(cutree(aggres, 2))

## plot dendrogram
plot(aggres)

## plot clustering result for k=2 clusters
plot(aggres, x, k=2)

## plot heatmap
heatmap(apres, sim)

## leveraged apcluster
apresL <- apclusterL(s=negDistMat(r=2), x=x, frac=0.2, sweeps=3)

## show details of clustering results
show(apresL)

## plot clustering result
plot(apresL, x)
Runs exemplar-based agglomerative clustering
## S4 method for signature 'matrix,missing'
aggExCluster(s, x, includeSim=FALSE)

## S4 method for signature 'matrix,ExClust'
aggExCluster(s, x, includeSim=FALSE)

## S4 method for signature 'Matrix,missing'
aggExCluster(s, x, includeSim=FALSE)

## S4 method for signature 'Matrix,ExClust'
aggExCluster(s, x, includeSim=FALSE)

## S4 method for signature 'missing,ExClust'
aggExCluster(s, x, includeSim=TRUE)

## S4 method for signature 'function,ANY'
aggExCluster(s, x, includeSim=TRUE, ...)

## S4 method for signature 'character,ANY'
aggExCluster(s, x, includeSim=TRUE, ...)
s |
an |
x |
either a prior clustering of class |
includeSim |
if |
... |
all other arguments are passed to the selected similarity function as they are. |
aggExCluster performs agglomerative clustering. Unlike other methods, e.g., the ones implemented in hclust, aggExCluster computes exemplars for each cluster, and its merging objective is geared towards the identification of meaningful exemplars, too.
For each pair of clusters, the merging objective is computed as follows:
An intermediate cluster is created as the union of the two clusters.
The potential exemplar is selected from the intermediate cluster as the sample that has the largest average similarity to all other samples in the intermediate cluster.
Then the average similarity of the exemplar with all samples in the first cluster and the average similarity with all samples in the second cluster is computed. These two values measure how well the joint exemplar describes the samples in the two clusters.
The merging objective is finally computed as the average of the two measures above. Hence, we can consider the merging objective as some kind of “balanced average similarity to the joint exemplar”.
In each step, all pairs of clusters are considered and the pair with the largest merging objective is actually merged. The joint exemplar is then chosen as the exemplar of the merged cluster.
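The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the function names mergeObjective and bestMerge are hypothetical, the similarity matrix is assumed to be symmetric, self-similarities are ignored when selecting the joint exemplar, and the per-cluster averages include the exemplar's own self-similarity.

```r
## Merging objective for two clusters c1 and c2 (vectors of sample indices),
## given a square similarity matrix s. Hypothetical helper, not part of apcluster.
mergeObjective <- function(s, c1, c2) {
    joint <- c(c1, c2)                        # intermediate cluster (union)
    sub <- s[joint, joint, drop = FALSE]
    diag(sub) <- NA                           # assumption: ignore self-similarity
    avgSim <- rowMeans(sub, na.rm = TRUE)     # average similarity to all others
    ex <- joint[which.max(avgSim)]            # joint exemplar candidate
    m1 <- mean(s[c1, ex])                     # how well ex describes cluster 1
    m2 <- mean(s[c2, ex])                     # how well ex describes cluster 2
    list(exemplar = ex, objective = (m1 + m2) / 2)
}

## One agglomeration step: evaluate all pairs and pick the best merge.
bestMerge <- function(s, clusters) {
    best <- NULL
    for (i in seq_len(length(clusters) - 1)) {
        for (j in (i + 1):length(clusters)) {
            res <- mergeObjective(s, clusters[[i]], clusters[[j]])
            if (is.null(best) || res$objective > best$objective)
                best <- c(res, list(pair = c(i, j)))
        }
    }
    best
}

## toy data: five points on a line, negative squared Euclidean similarities
pts <- c(0, 1, 2, 10, 11)
s <- -outer(pts, pts, "-")^2
step <- bestMerge(s, list(1:3, 4, 5))
```

In this toy example the two right-most points form the pair with the largest objective, so they would be merged first, and the selected joint exemplar becomes the exemplar of the merged cluster.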
aggExCluster can be used in two ways, either by performing agglomerative clustering of an entire data set or by performing agglomerative clustering of data previously clustered by affinity propagation or another clustering algorithm.
Agglomerative clustering of an entire data set can be accomplished either by calling aggExCluster on a quadratic similarity matrix without further argument or by calling aggExCluster for a function or function name along with data to be clustered (as argument x). A full agglomeration run is performed that starts from l clusters (all samples in separate one-element clusters) and ends with one cluster (all samples in one single cluster).
Agglomerative clustering starting from a given clustering result can be accomplished by calling aggExCluster for an APResult or ExClust object passed as parameter x. The similarity matrix can either be passed as argument s or, if missing, aggExCluster checks whether the similarity matrix is included in the clustering object x. A cluster hierarchy with numbers of clusters ranging from the number of clusters in x down to 1 is created.
The result is stored in an AggExResult object. The slot height is filled with the merging objective of each of the maxNoClusters-1 merges. The slot order contains a permutation of the samples/clusters for dendrogram plotting. The algorithm for computing this permutation is the same as the one used in hclust. If aggExCluster was called for an entire data set, the slot labels contains the names of the objects to be clustered (if available, otherwise the indices are used). If aggExCluster was called for a prior clustering, then labels are set to ‘Cluster 1’, ‘Cluster 2’, etc.
Upon successful completion, the function returns an AggExResult object.
Similarity matrices can be supplied in dense or sparse format. Note, however, that sparse matrices are converted to full dense matrices before clustering which may lead to memory and/or performance bottlenecks for larger data sets.
Ulrich Bodenhofer, Johannes Palme, and Nikola Kostic
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
AggExResult, apcluster-methods, plot-methods, heatmap-methods, cutree-methods
## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(negDistMat(r=2), x)

## show results
show(aggres1)

## plot dendrogram
plot(aggres1)

## plot heatmap along with dendrogram
heatmap(aggres1)

## plot level with two clusters
plot(aggres1, x, k=2)

## run affinity propagation
apres <- apcluster(negDistMat(r=2), x, q=0.7)

## create hierarchy of clusters determined by affinity propagation
aggres2 <- aggExCluster(x=apres)

## show results
show(aggres2)

## plot dendrogram
plot(aggres2)
plot(aggres2, showSamples=TRUE)

## plot heatmap
heatmap(aggres2)

## plot level with two clusters
plot(aggres2, x, k=2)
S4 class for storing results of exemplar-based agglomerative clustering
Objects of this class can be created by calling aggExCluster for a given similarity matrix.
The following slots are defined for AggExResult objects:
l: number of samples in the data set
sel: subset of samples used for leveraged clustering (empty for normal clustering)
maxNoClusters: maximum number of clusters in the cluster hierarchy, i.e. it contains clusterings with 1 to maxNoClusters clusters.
exemplars: list of length maxNoClusters; the i-th component of the list is a vector of i exemplars (corresponding to the level with i clusters).
clusters: list of length maxNoClusters; the i-th component of clusters is a list of i clusters, each of which is a vector of sample indices.
merge: a maxNoClusters-1 by 2 matrix that contains the merging hierarchy; fully analogous to the slot merge in the class hclust.
height: a vector of length maxNoClusters-1 that contains the merging objective of each merge; largely analogous to the slot height in the class hclust except that the slot height in AggExResult objects is supposed to be non-increasing, since aggExCluster is based on similarities, whereas hclust uses dissimilarities.
order: a vector containing a permutation of indices that can be used for plotting proper dendrograms without crossing branches; fully analogous to the slot order in the class hclust.
labels: a character vector containing labels of clustered objects used for plotting dendrograms.
sim: similarity matrix; only available if aggExCluster was called with a similarity function and includeSim=TRUE.
call: method call used to produce this clustering result
signature(x="AggExResult"): see plot-methods
signature(x="AggExResult", y="matrix"): see plot-methods
signature(x="AggExResult"): see heatmap-methods
signature(x="AggExResult", y="matrix"): see heatmap-methods
signature(object="AggExResult"): see show-methods
signature(object="AggExResult", k="ANY", h="ANY"): see cutree-methods
signature(x="AggExResult"): gives the number of clustering levels in the clustering result.
signature(x="AggExResult"): see coerce-methods
signature(object="AggExResult"): see coerce-methods
In the following code snippets, x is an AggExResult object.
signature(x="AggExResult", i="index", j="missing"): x[[i]] returns an object of class ExClust corresponding to the clustering level with i clusters; synonymous to cutree(x, i).
signature(x="AggExResult", i="index", j="missing", drop="missing"): x[i] returns a list of ExClust objects with all clustering levels specified in vector i. So, the list has as many components as the argument i has elements. A list is returned even if i is a single level.
signature(x="AggExResult"): gives the similarity matrix.
Ulrich Bodenhofer and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
aggExCluster, show-methods, plot-methods, cutree-methods
## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)

## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)

## show results
show(aggres1)

## plot dendrogram
plot(aggres1)

## plot heatmap along with dendrogram
heatmap(aggres1, sim)

## plot level with two clusters
plot(aggres1, x, k=2)

## run affinity propagation
apres <- apcluster(sim, q=0.7)

## create hierarchy of clusters determined by affinity propagation
aggres2 <- aggExCluster(sim, apres)

## show results
show(aggres2)

## plot dendrogram
plot(aggres2)

## plot heatmap
heatmap(aggres2, sim)

## plot level with two clusters
plot(aggres2, x, k=2)
Runs affinity propagation clustering
## S4 method for signature 'matrix,missing'
apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100,
          lam=0.9, includeSim=FALSE, details=FALSE,
          nonoise=FALSE, seed=NA)

## S4 method for signature 'dgTMatrix,missing'
apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100,
          lam=0.9, includeSim=FALSE, details=FALSE,
          nonoise=FALSE, seed=NA)

## S4 method for signature 'sparseMatrix,missing'
apcluster(s, x, ...)

## S4 method for signature 'Matrix,missing'
apcluster(s, x, ...)

## S4 method for signature 'character,ANY'
apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100,
          lam=0.9, includeSim=TRUE, details=FALSE,
          nonoise=FALSE, seed=NA, ...)

## S4 method for signature 'function,ANY'
apcluster(s, x, p=NA, q=NA, maxits=1000, convits=100,
          lam=0.9, includeSim=TRUE, details=FALSE,
          nonoise=FALSE, seed=NA, ...)
s |
an |
x |
input data to be clustered; if |
p |
input preference; can be a vector that specifies
individual preferences for each data point. If scalar,
the same value is used for all data points. If |
q |
if |
maxits |
maximal number of iterations that should be executed |
convits |
the algorithm terminates if the exemplars have not
changed for |
lam |
damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur |
includeSim |
if |
details |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value before
adding noise (see above), if |
... |
for the methods with signatures |
Affinity Propagation clusters data using a set of real-valued pairwise data point similarities as input. Each cluster is represented by a cluster center data point (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.
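To make the notion of net similarity more concrete, the following base R sketch evaluates it for a fixed set of exemplars. It uses the standard definition from Frey and Dueck (2007): the sum of the similarities of all non-exemplar points to their exemplars plus the sum of the preferences of the exemplars. The helper names assignToExemplars and netSimilarity are hypothetical and not part of the package; the sketch does not perform any message passing.

```r
## Assign every sample to its most similar exemplar (exemplars map to themselves).
assignToExemplars <- function(s, exemplars) {
    idx <- exemplars[apply(s[, exemplars, drop = FALSE], 1, which.max)]
    idx[exemplars] <- exemplars
    idx
}

## Net similarity of a clustering given by exemplars and preference(s) p.
netSimilarity <- function(s, exemplars, p) {
    n <- nrow(s)
    idx <- assignToExemplars(s, exemplars)
    nonEx <- setdiff(seq_len(n), exemplars)
    pointSim <- sum(s[cbind(nonEx, idx[nonEx])])     # points to their exemplars
    exemplarPref <- sum(rep_len(p, n)[exemplars])    # scalar or per-point p
    c(pointSim = pointSim, exemplarPref = exemplarPref,
      netSim = pointSim + exemplarPref)
}

## toy data: five points on a line, negative squared Euclidean similarities
pts <- c(0, 1, 2, 10, 11)
s <- -outer(pts, pts, "-")^2
ns <- netSimilarity(s, exemplars = c(2, 4), p = -5)
```

A lower (more negative) preference penalizes each additional exemplar more strongly, which is why the preference influences the number of clusters.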
When called with a similarity matrix as input (which may also be a sparse matrix according to the Matrix package), the function performs AP clustering. When called with the name of a package-provided similarity function or a user-provided similarity function object and input data, the function first computes the similarity matrix before performing AP clustering. The similarity matrix is returned for later use as part of the APResult object depending on whether includeSim was set to TRUE (see argument description above).
Apart from minor adaptations and optimizations, the AP clustering functionality of the function apcluster is largely analogous to Frey's and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
The new argument q allows for better controlling the number of clusters without knowing the distribution of similarity values. A meaningful range for the parameter p can be determined using the function preferenceRange. Alternatively, a certain fixed number of clusters may be desirable. For this purpose, the function apclusterK is available.
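The example comments in this documentation describe the default preference as the median of the similarities and q=0.1 as the 10% quantile. A minimal base R sketch of that idea is shown below. The function name preferenceFromQuantile is hypothetical, and the treatment of the diagonal and of infinite entries is an assumption of this sketch rather than a statement about the package internals.

```r
## Derive a scalar input preference as the q-quantile of the similarities.
## Assumption: diagonal entries and non-finite values are excluded.
preferenceFromQuantile <- function(s, q = 0.5) {
    offDiag <- s[row(s) != col(s)]
    offDiag <- offDiag[is.finite(offDiag)]
    unname(quantile(offDiag, probs = q))
}

## toy data: three points on a line, negative squared Euclidean similarities
pts <- c(0, 1, 3)
s <- -outer(pts, pts, "-")^2
pMedian <- preferenceFromQuantile(s)         # q = 0.5, i.e. the median
pLow <- preferenceFromQuantile(s, q = 0)     # smallest similarity
```

Smaller values of q yield lower preferences, which should lead to fewer clusters, without the user having to know the scale of the similarity values.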
Upon successful completion, the function returns an APResult object.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme, and Chrats Melkonian
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, show-methods, plot-methods, labels-methods, preferenceRange, apclusterL-methods, apclusterK
## create two Gaussian clouds
cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix and run affinity propagation
## (p defaults to median of similarity)
apres <- apcluster(negDistMat(r=2), x, details=TRUE)

## show details of clustering results
show(apres)

## plot clustering result
plot(apres, x)

## plot heatmap
heatmap(apres)

## run affinity propagation with default preference of 10% quantile
## of similarities; this should lead to a smaller number of clusters
## reuse similarity matrix from previous run
apres <- apcluster(s=apres@sim, q=0.1)
show(apres)
plot(apres, x)

## now try the same with RBF kernel
sim <- expSimMat(x, r=2)
apres <- apcluster(s=sim, q=0.2)
show(apres)
plot(apres, x)

## create sparse similarity matrix
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)
sim <- negDistMat(x, r=2)
ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2)

## run apcluster() on the sparse similarity matrix
apres <- apcluster(ssim, q=0)
apres
Runs affinity propagation demo for randomly generated data set according to Frey and Dueck
apclusterDemo(l=100, d=2, seed=NA, ...)
l |
number of data points to be generated |
d |
dimension of data to be created |
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value; if |
... |
all other arguments are passed on to
|
apclusterDemo creates l d-dimensional data points that are uniformly distributed. Affinity propagation is executed for this data set with default parameters. Alternative settings can be passed to apcluster with additional arguments. After completion of affinity propagation, the results are shown and the performance measures are plotted.
This function corresponds to the demo function in the original Matlab code of Frey and Dueck. We warn the user, however, that uniformly distributed data are not necessarily ideal for demonstrating clustering, as there can never be real clusters in uniformly distributed data - all clusters found must be random artefacts.
Upon successful completion, the function returns an invisible list with three components. The first is the data set that has been created, the second is the similarity matrix, and the third is an APResult object with the clustering results (see examples below).
Ulrich Bodenhofer and Johannes Palme
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, plot-methods, apcluster, apclusterL
## create random data set and run affinity propagation
apd <- apclusterDemo()

## plot clustering result along with data set
plot(apd[[3]], apd[[1]])
Runs affinity propagation clustering for a given similarity matrix adjusting input preferences iteratively in order to achieve a desired number of clusters
## S4 method for signature 'matrix,missing'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
           maxits=1000, convits=100, lam=0.9, includeSim=FALSE,
           details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE)

## S4 method for signature 'Matrix,missing'
apclusterK(s, x, K, ...)

## S4 method for signature 'dgTMatrix,missing'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
           maxits=1000, convits=100, lam=0.9, includeSim=FALSE,
           details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE)

## S4 method for signature 'sparseMatrix,missing'
apclusterK(s, x, K, ...)

## S4 method for signature 'function,ANY'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
           maxits=1000, convits=100, lam=0.9, includeSim=TRUE,
           details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...)

## S4 method for signature 'character,ANY'
apclusterK(s, x, K, prc=10, bimaxit=20, exact=FALSE,
           maxits=1000, convits=100, lam=0.9, includeSim=TRUE,
           details=FALSE, nonoise=FALSE, seed=NA, verbose=TRUE, ...)
s |
an |
x |
input data to be clustered; if |
K |
desired number of clusters |
prc |
the algorithm stops if the number of clusters does not deviate more than prc percent from desired value K; set to 0 if you want to have exactly K clusters |
bimaxit |
maximum number of bisection steps to perform; note that no warning is issued if the number of clusters is still not in the desired range |
exact |
flag indicating whether or not to compute the initial
preference range exactly (see |
maxits |
maximal number of iterations that |
convits |
|
lam |
damping factor for |
includeSim |
if |
details |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value, if |
verbose |
flag indicating whether status information should be displayed during bisection |
... |
for the methods with signatures |
apclusterK first runs preferenceRange to determine the range of meaningful choices of the input preference p. Then it decreases p exponentially for a few iterations to obtain a good initial guess for p. If the number of clusters is still too far from the desired goal, bisection is applied.
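The bisection stage of this procedure can be sketched in base R as follows. This is a simplified illustration, not the package's code: the function searchPreference and its argument countClusters (a function that maps a preference to the resulting number of clusters) are hypothetical, the exponential initial search is omitted, and the sketch assumes that the number of clusters does not decrease when the preference increases.

```r
## Bisection on the input preference until the number of clusters is within
## prc percent of the desired value K, or bimaxit steps have been performed.
searchPreference <- function(countClusters, pmin, pmax, K,
                             prc = 10, bimaxit = 20) {
    stopifnot(bimaxit >= 1, pmin < pmax)
    inRange <- function(k) abs(k - K) / K * 100 <= prc
    lo <- pmin
    hi <- pmax
    for (it in seq_len(bimaxit)) {
        p <- (lo + hi) / 2
        k <- countClusters(p)
        if (inRange(k))
            return(list(p = p, K = k, steps = it))
        if (k > K) hi <- p else lo <- p   # too many clusters: lower preference
    }
    ## as with bimaxit in the text above, no warning if still out of range
    list(p = p, K = k, steps = bimaxit)
}

## toy stand-in for a clustering run: cluster count grows with preference
toyCount <- function(p) max(1, min(20, floor(20 + p)))
res <- searchPreference(toyCount, pmin = -20, pmax = 0, K = 5, prc = 0)
```

In practice, countClusters would run affinity propagation with the given preference and report the number of exemplars found; here a toy function is used so that the sketch runs on its own.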
When called with a similarity matrix as input, the function performs the procedure described above. When called with the name of a package-provided similarity function or a user-provided similarity function object and input data, the function first computes the similarity matrix before running apclusterK on this similarity matrix. The similarity matrix is returned for later use as part of the APResult object depending on whether includeSim was set to TRUE (see argument description above).
Apart from minor adaptations and optimizations, the implementation is largely analogous to Frey's and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
Upon successful completion, the function returns an APResult object.
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
apcluster, preferenceRange, APResult
## create three Gaussian clouds
cl1 <- cbind(rnorm(70, 0.2, 0.05), rnorm(70, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
cl3 <- cbind(rnorm(60, 0.8, 0.04), rnorm(60, 0.8, 0.05))
x <- rbind(cl1, cl2, cl3)

## run affinity propagation such that 3 clusters are obtained
apres <- apclusterK(negDistMat(r=2), x, K=3)

## show details of clustering results
show(apres)

## plot clustering result
plot(apres, x)

## create sparse similarity matrix
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)
sim <- negDistMat(x, r=2)
ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2)

## run apclusterK() on the sparse similarity matrix
apres <- apclusterK(ssim, K=2)
apres
Runs leveraged affinity propagation clustering
## S4 method for signature 'matrix,missing'
apclusterL(s, x, sel, p=NA, q=NA, maxits=1000, convits=100,
           lam=0.9, includeSim=FALSE, nonoise=FALSE, seed=NA)

## S4 method for signature 'character,ANY'
apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000,
           convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE,
           seed=NA, ...)

## S4 method for signature 'function,ANY'
apclusterL(s, x, frac, sweeps, p=NA, q=NA, maxits=1000,
           convits=100, lam=0.9, includeSim=TRUE, nonoise=FALSE,
           seed=NA, ...)
s |
an |
x |
input data to be clustered; if |
frac |
fraction of samples that should be used for leveraged clustering. The similarity matrix will be generated for all samples against a random fraction of the samples as specified by this parameter. |
sweeps |
number of sweeps of leveraged clustering performed with changing randomly selected subset of samples. |
sel |
selected sample indices; a vector containing the sample indices of the sample subset used for leveraged AP clustering in increasing order. |
p |
input preference; can be a vector that specifies
individual preferences for each data point. If scalar,
the same value is used for all data points. If |
q |
if |
maxits |
maximal number of iterations that should be executed |
convits |
the algorithm terminates if the exemplars have not
changed for |
lam |
damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur |
includeSim |
if |
nonoise |
|
seed |
for reproducibility, the seed of the random number
generator can be set to a fixed value before
adding noise (see above), if |
... |
all other arguments are passed to the selected
similarity function as they are; note that possible name conflicts between
arguments of |
Affinity Propagation clusters data using a set of real-valued pairwise similarities as input. Each cluster is represented by a representative cluster center (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.
Leveraged Affinity Propagation reduces dynamic and static load for large datasets. Only a subset of the samples is considered in the clustering process, on the assumption that this subset already provides enough information about the cluster structure.
When called with input data and either the name of a package-provided similarity function or a user-provided similarity function, the function selects a random sample subset according to the frac parameter, calculates a rectangular similarity matrix of all samples against this subset, and runs affinity propagation. This is repeated sweeps times, with a new sample subset used for each repetition. The clustering result of the sweep with the highest net similarity is returned. Any parameters specific to the chosen method of similarity calculation can be passed to apclusterL in addition to the parameters described above. The similarity matrix for the best sweep is also returned in the result object when requested by the user (argument includeSim).
When called with a rectangular similarity matrix (which represents a column subset of the full similarity matrix), the function performs AP clustering on this similarity matrix. The information about the selected samples is passed to clustering with the parameter sel. This variant is only needed when the user needs full control of distance calculation or sample subset selection.
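The rectangular input described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper neg_sq_dist_rect() is hypothetical, the data and the value of frac are made up, and negative squared Euclidean distance is merely one possible similarity measure.

```r
## minimal base-R sketch; neg_sq_dist_rect() is a hypothetical helper,
## not a function of the package
set.seed(1)
x <- rbind(cbind(rnorm(30, 0.2, 0.05), rnorm(30, 0.8, 0.06)),
           cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05)))

## sample subset in increasing order, as required for 'sel'
frac <- 0.2
sel <- sort(sample(nrow(x), ceiling(frac * nrow(x))))

## negative squared Euclidean distances of all samples (rows)
## against the selected samples (columns)
neg_sq_dist_rect <- function(x, sel) {
    s <- matrix(0, nrow(x), length(sel))
    for (j in seq_along(sel)) {
        d <- sweep(x, 2, x[sel[j], ])   # subtract the j-th selected sample
        s[, j] <- -rowSums(d^2)
    }
    s
}

s <- neg_sq_dist_rect(x, sel)
dim(s)

## s and sel could then be passed on, e.g. apclusterL(s, sel=sel, p=-0.2)
```

The matrix has one row per sample and one column per selected sample, so its size grows linearly rather than quadratically with the number of samples for a fixed subset size.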
Apart from minor adaptations and optimizations, the implementation of the function apclusterL is largely analogous to Frey's and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).
Upon successful completion, both variants return an APResult object.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, show-methods, plot-methods, labels-methods, preferenceRange, apcluster-methods, apclusterK
## create two Gaussian clouds
cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06))
cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
x <- rbind(cl1, cl2)

## leveraged apcluster
apres <- apclusterL(negDistMat(r=2), x, frac=0.2, sweeps=3, p=-0.2)

## show details of leveraged clustering results
show(apres)

## plot leveraged clustering result
plot(apres, x)

## plot heatmap of clustering result
heatmap(apres)

## show net similarities of single sweeps
apres@netsimLev

## show samples on which best sweep was based
apres@sel
S4 class for storing results of affinity propagation clustering. It extends the class ExClust.
Objects of this class can be created by calling apcluster or apclusterL for a given similarity matrix or calling one of these procedures with a data set and a similarity measure.
The following slots are defined for APResult objects. Most names are taken from Frey's and Dueck's original Matlab package:
sweeps: number of times leveraged clustering ran with different subsets of samples
it: number of iterations the algorithm ran
p: input preference (either set by user or computed by apcluster or apclusterL)
netsim: final total net similarity, defined as the sum of expref and dpsim (see below)
dpsim: final sum of similarities of data points to exemplars
expref: final sum of preferences of the identified exemplars
netsimLev: total net similarity of the individual sweeps for leveraged clustering; only available for leveraged clustering
netsimAll: vector containing the total net similarity for each iteration; only available if apcluster was called with details=TRUE
exprefAll: vector containing the sum of preferences of the identified exemplars for each iteration; only available if apcluster was called with details=TRUE
dpsimAll: vector containing the sum of similarities of data points to exemplars for each iteration; only available if apcluster was called with details=TRUE
idxAll: matrix with sample-to-exemplar indices for each iteration; only available if apcluster was called with details=TRUE
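The relationship between netsim, dpsim, and expref can be sketched in base R with made-up numbers. This is an illustration of the definitions above, not the package's code; the preference value and the sample-to-exemplar mapping are assumed, and the sketch assumes that exemplars contribute only through their preferences.

```r
## base-R sketch with made-up numbers; not the package's implementation
x <- c(1, 2, 3, 7, 8, 9)
s <- -outer(x, x, "-")^2        # negative squared distances

p <- -10                        # scalar input preference (assumed value)
idx <- c(2, 2, 2, 5, 5, 5)      # sample-to-exemplar mapping (assumed result)

exemplars <- unique(idx)
nonex <- setdiff(seq_along(idx), exemplars)

dpsim  <- sum(s[cbind(nonex, idx[nonex])])   # similarities to exemplars
expref <- p * length(exemplars)              # preferences of exemplars
netsim <- dpsim + expref
c(dpsim=dpsim, expref=expref, netsim=netsim)
```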
Class "ExClust", directly.
signature(x="APResult"): see plot-methods
signature(x="ExClust", y="matrix"): see plot-methods
signature(x="ExClust"): see heatmap-methods
signature(x="ExClust", y="matrix"): see heatmap-methods
signature(object="APResult"): see show-methods
signature(object="APResult"): see labels-methods
signature(object="APResult"): see cutree-methods
signature(x="APResult"): gives the number of clusters.
signature(x="ExClust"): see sort-methods
signature(x="ExClust"): see coerce-methods
signature(object="ExClust"): see coerce-methods
In the following code snippets, x is an APResult object.
signature(x="APResult", i="index", j="missing"): x[[i]] returns the i-th cluster as a list of indices of samples belonging to the i-th cluster.
signature(x="APResult", i="index", j="missing", drop="missing"): x[i] returns a list of integer vectors with the indices of samples belonging to this cluster. The list has as many components as the argument i has elements. A list is returned even if i is a single integer.
signature(x="APResult"): gives the similarity matrix.
Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
apcluster, apclusterL, show-methods, plot-methods, labels-methods, cutree-methods
## create two Gaussian clouds
cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)

## run affinity propagation
apres <- apcluster(sim, details=TRUE)

## show details of clustering results
show(apres)

## plot information about clustering run
plot(apres)

## plot clustering result
plot(apres, x)

## plot heatmap
heatmap(apres, sim)
Functions for coercing clustering object to hclust and dendrogram objects
## S4 method for signature 'AggExResult'
as.hclust(x, base=0.05)
## S4 method for signature 'ExClust'
as.hclust(x, base=0.05, ...)
## S4 method for signature 'AggExResult'
as.dendrogram(object, base=0.05, useNames=TRUE)
## S4 method for signature 'ExClust'
as.dendrogram(object, base=0.05, useNames=TRUE, ...)
x |
a clustering result object of class
|
object |
a clustering result object of class
|
base |
fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram (see details below). |
useNames |
if |
... |
all other arguments are passed on to
|
If called for an AggExResult object, as.hclust creates an hclust object. The heights are transformed to the interval from base (height of lowest join) to 1 (height of highest join).
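The height transformation can be pictured with the following base-R sketch. It assumes a plain linear rescaling, which the text does not spell out, so the package's actual mapping may differ in detail. The helper rescale_heights() is hypothetical, and the join heights are made up and oriented so that larger values mean later joins.

```r
## hypothetical helper; assumes a linear rescaling to [base, 1]
rescale_heights <- function(h, base=0.05) {
    stopifnot(base >= 0, base < 1, length(unique(h)) > 1)
    base + (1 - base) * (h - min(h)) / (max(h) - min(h))
}

h <- c(0.5, 2, 3.5, 10)         # made-up join heights
rescale_heights(h)              # lowest join maps to base, highest join to 1
```

With base=0.05, the lowest join lands at 5% of the total height and the highest join at 1, whatever the original scale of the heights was.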
If called for an ExClust or APResult object, aggExCluster is called internally to create a cluster hierarchy first. This is only possible if the pairwise similarities are included in the sim slot of x (see aggExCluster on how to ensure this).
If x is an AggExResult object obtained by clustering an entire data set, as.hclust produces a complete hierarchy. If, however, x is an ExClust (or APResult) object or an AggExResult obtained by running aggExCluster on an ExClust or APResult object, then as.hclust produces a hierarchy of clusters, not of samples.
If called for an AggExResult object, as.dendrogram creates a dendrogram object. Analogously to as.hclust, the heights are transformed to the interval ranging from base (height of lowest join) to 1 (height of highest join), so the original heights of merges are lost. If the original join heights are relevant, call plot on the original AggExResult object directly without coercing it to a dendrogram object first.
If called for an ExClust or APResult object, aggExCluster is called first to create a cluster hierarchy. Again, this is only possible if the pairwise similarities are included in the sim slot of object.
If object is an AggExResult object obtained by clustering an entire data set, as.dendrogram produces a complete dendrogram. If object is an ExClust (or APResult) object or an AggExResult obtained by previously running aggExCluster on an ExClust or APResult object, then as.dendrogram produces a complete dendrogram of all samples, too, but with the difference that entire clusters of the previous ExClust or APResult object are not further split up hierarchically.
Consequently, if x is not a complete cluster hierarchy, but a hierarchy of clusters, as.dendrogram(as.hclust(x)) produces a dendrogram of clusters, whereas as.dendrogram(x) in any case produces a dendrogram of samples (with the special property mentioned above).
see details above
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, AggExResult, ExClust, heatmap-methods, apcluster, apclusterL, aggExCluster, cutree-methods
## create two Gaussian clouds
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)

## run affinity propagation
apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE)

## perform agglomerative clustering of affinity propagation clusters
aggres1 <- aggExCluster(x=apres)

## compute and plot dendrogram
dend1 <- as.dendrogram(aggres1)
dend1
plot(dend1)

## compute and show dendrogram computed from hclust object
dend2 <- as.dendrogram(as.hclust(aggres1))
dend2
plot(dend2)

## perform agglomerative clustering of whole data set
aggres2 <- aggExCluster(negDistMat(r=2), x)

## compute and plot dendrogram
dend3 <- as.dendrogram(aggres2)
dend3
plot(dend3)
Converts a dense similarity matrix into a sparse one or vice versa
## S4 method for signature 'matrix'
as.SparseSimilarityMatrix(s, lower=-Inf)
## S4 method for signature 'Matrix'
as.SparseSimilarityMatrix(s, lower=-Inf)
## S4 method for signature 'sparseMatrix'
as.SparseSimilarityMatrix(s, lower=-Inf)
## S4 method for signature 'matrix'
as.DenseSimilarityMatrix(s, fill=-Inf)
## S4 method for signature 'Matrix'
as.DenseSimilarityMatrix(s, fill=-Inf)
## S4 method for signature 'sparseMatrix'
as.DenseSimilarityMatrix(s, fill=-Inf)
s |
a similarity matrix in sparse or dense format (see details below) |
lower |
cut-off threshold to apply when converting similarity
matrices into sparse format. All similarities lower than or equal to
|
fill |
value to fill in for entries that are missing from sparse
similarity matrix 's' (defaults to |
The function as.SparseSimilarityMatrix takes a matrix argument, removes all diagonal elements and all values that are lower than or equal to the cut-off threshold lower, and returns a sparse matrix of class dgTMatrix.
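The thresholding rule can be sketched in base R as follows. This only illustrates which entries survive the cut-off; the package itself returns a dgTMatrix, whereas the sketch collects the retained entries as plain (row, column, value) triples. The matrix values and the threshold are made up.

```r
## base-R sketch of the thresholding rule; not the package's implementation
s <- matrix(c( 0.0, -0.1, -0.5,
              -0.1,  0.0, -0.2,
              -0.5, -0.2,  0.0), 3, 3, byrow=TRUE)
lower <- -0.2

## drop the diagonal and all values lower than or equal to 'lower'
keep <- s > lower & row(s) != col(s)
triples <- cbind(which(keep, arr.ind=TRUE), value=s[keep])
triples
```

Note that the entries equal to the threshold (-0.2 here) are removed as well, since the comparison is "lower than or equal to".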
If the function as.DenseSimilarityMatrix is called for a sparse matrix (class sparseMatrix or any class derived from this class), a dense matrix is returned, where all values that were missing in the sparse matrix are replaced with fill.
as.DenseSimilarityMatrix can also be called for dense matrix and Matrix objects. In this case, as.DenseSimilarityMatrix assumes that the matrices have three columns that encode a sparse matrix in the format accepted by the Matlab implementation of Frey's and Dueck's sparse affinity propagation: the first column contains 1-based row indices, the second column contains 1-based column indices, and the third column contains the similarity values. The same format is also accepted by as.SparseSimilarityMatrix to convert a sparse similarity matrix of this format into a dgTMatrix object.
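The three-column encoding can be unpacked with a few lines of base R. This is a sketch of the format only, not of the package's conversion: it makes no claim about how the package treats the diagonal or symmetric entries, and the fill value is chosen arbitrarily.

```r
## base-R sketch of the three-column encoding; not the package's implementation
sp <- matrix(c(1, 2,  0.5,
               3, 1,  0.2,
               5, 4, -0.2,
               3, 4,  1.2), 4, 3, byrow=TRUE)

n <- max(sp[, 1:2])             # number of samples implied by the indices
fill <- 0                       # value for entries missing from the triples
dense <- matrix(fill, n, n)
dense[sp[, 1:2]] <- sp[, 3]     # matrix indexing by (row, column) pairs
dense
```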
Note that, for matrices of this format, as.DenseSimilarityMatrix replaces the deprecated function sparseToFull that was used in older versions of the package.
Note that as.SparseSimilarityMatrix and as.DenseSimilarityMatrix are not S4 coercion methods. There are no classes named SparseSimilarityMatrix or DenseSimilarityMatrix.
returns a square similarity matrix in sparse format (class dgTMatrix) or in dense format (standard class matrix).
Ulrich Bodenhofer
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create similarity matrix in sparse format according to Frey and Dueck
sp <- matrix(c(1, 2, 0.5, 3, 1, 0.2, 5, 4, -0.2, 3, 4, 1.2), 4, 3, byrow=TRUE)
sp

## perform conversions
as.DenseSimilarityMatrix(sp, fill=0)
as.SparseSimilarityMatrix(sp)

## create dense similarity matrix
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)
sim <- negDistMat(x, r=2)
ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2)

## run apcluster() on the sparse similarity matrix
apres <- apcluster(ssim, q=0)
apres
Cut out a clustering level from a cluster hierarchy
## S4 method for signature 'AggExResult'
cutree(tree, k, h)
## S4 method for signature 'APResult'
cutree(tree, k, h)
tree |
an object of class |
k |
the level (i.e. the number of clusters) to be selected |
h |
alternatively, the level can be selected by specifying a cut-off for the merging objective |
The function cutree extracts a clustering level from a cluster hierarchy stored in an AggExResult object. Which level is selected can be determined by one of the two arguments k and h (see above). If both k and h are specified, k overrides h. This works largely analogously to the standard function cutree. The differences are (1) that only one level can be extracted at a time and (2) that an ExClust object is returned instead of an index list.
The function cutree may further be used to convert an APResult object into an ExClust object. In this case, the arguments k and h are ignored.
returns an object of class ExClust
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two simple clusters
x <- c(1, 2, 3, 7, 8, 9)
names(x) <- c("a", "b", "c", "d", "e", "f")

## compute similarity matrix (negative squared distance)
sim <- negDistMat(x, r=2)

## run agglomerative clustering
aggres <- aggExCluster(sim)

## show details of clustering results
show(aggres)

## retrieve clustering with 2 clusters
cutree(aggres, 2)

## retrieve clustering with cut-off h=-1
cutree(aggres, h=-1)
S4 class for storing exemplar-based clusterings
Objects of this class can be created by calling cutree to cut out a clustering level from a cluster hierarchy of class AggExResult. Moreover, cutree can also be used to convert an object of class APResult to class ExClust.
The following slots are defined for ExClust objects:
l: number of samples in the data set
sel: subset of samples used for leveraged clustering
exemplars: vector containing indices of exemplars
clusters: list containing the clusters; the i-th component is a vector of indices of data points belonging to the i-th exemplar (including the exemplar itself)
idx: vector of length l realizing a sample-to-exemplar mapping; the i-th entry contains the index of the exemplar the i-th sample belongs to
sim: similarity matrix; only available if the preceding clustering method was called with includeSim=TRUE.
call: method call of the preceding clustering method
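How the slots l, exemplars, clusters, and idx relate to each other can be sketched in base R. The mapping below is made up, and the sketch assumes that clusters are listed in order of increasing exemplar index; it mimics the slot contents without creating an actual ExClust object.

```r
## base-R sketch with a made-up sample-to-exemplar mapping
idx <- c(a=2, b=2, c=2, d=5, e=5, f=5)

l <- length(idx)                             # number of samples
exemplars <- sort(unique(unname(idx)))       # indices of exemplars
clusters <- lapply(exemplars,                # members of each cluster,
                   function(e) which(unname(idx) == e))  # exemplar included

exemplars
clusters
```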
signature(x="ExClust"): see plot-methods
signature(x="ExClust", y="matrix"): see plot-methods
signature(x="ExClust"): see heatmap-methods
signature(x="ExClust", y="matrix"): see heatmap-methods
signature(object="ExClust"): see show-methods
signature(object="ExClust"): see labels-methods
signature(object="ExClust", k="ANY", h="ANY"): see cutree-methods
signature(x="ExClust"): gives the number of clusters.
signature(x="ExClust"): see sort-methods
signature(x="ExClust"): see coerce-methods
signature(object="ExClust"): see coerce-methods
In the following code snippets, x is an ExClust object.
signature(x="ExClust", i="index", j="missing"): x[[i]] returns the i-th cluster as a list of indices of samples belonging to the i-th cluster.
signature(x="ExClust", i="index", j="missing", drop="missing"): x[i] returns a list of integer vectors with the indices of samples belonging to this cluster. The list has as many components as the argument i has elements. A list is returned even if i is a single integer.
signature(x="ExClust"): gives the similarity matrix.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
aggExCluster, show-methods, plot-methods, labels-methods, cutree-methods, AggExResult, APResult
## create two Gaussian clouds
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(25, 0.7, 0.08), rnorm(25, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)

## run agglomerative clustering
aggres <- aggExCluster(sim)

## extract level with two clusters
excl <- cutree(aggres, k=2)

## show details of clustering results
show(excl)

## plot clustering result
plot(excl, x)
Functions for Plotting of Heatmap
## S4 method for signature 'ExClust,missing'
heatmap(x, y, ...)
## S4 method for signature 'ExClust,matrix'
heatmap(x, y, ...)
## S4 method for signature 'ExClust,Matrix'
heatmap(x, y, ...)
## S4 method for signature 'ExClust,sparseMatrix'
heatmap(x, y, ...)
## S4 method for signature 'AggExResult,missing'
heatmap(x, y, ...)
## S4 method for signature 'AggExResult,matrix'
heatmap(x, y, Rowv=TRUE, Colv=TRUE, sideColors=NULL, col=heat.colors(12),
        base=0.05, add.expr, margins=c(5, 5, 2),
        cexRow=max(min(35 / nrow(y), 1), 0.1),
        cexCol=max(min(35 / ncol(y), 1), 0.1),
        main=NULL, dendScale=1, barScale=1, legend=c("none", "col"), ...)
## S4 method for signature 'matrix,missing'
heatmap(x, y, ...)
## S4 method for signature 'missing,matrix'
heatmap(x, y, ...)
x |
a clustering result object of class
|
y |
a similarity matrix |
sideColors |
character vector of colors to be used for plotting color
bars that visualize clusters of the finest clustering level in
|
col |
color ramp used for the heatmap image; see
|
Rowv |
determines whether or not a row dendrogram should be
plotted. If |
Colv |
determines whether or not a column dendrogram should be
plotted. Fully analogous to |
base |
fraction of height used for the very first join in
dendrograms; see |
add.expr, margins, cexRow, cexCol, main
|
largely analogous to the standard
|
dendScale |
factor scaling the width of vertical and height of
horizontal dendrograms; values have to be larger than 0 and no
larger than 2. The default is 1 which corresponds to the same size
as the dendrograms plotted by the standard
|
barScale |
factor scaling the width of color bars; values have to
be larger than 0 and no larger than 4. The default is 1 which
corresponds to half the width of the color bars plotted by the standard
|
legend |
if |
... |
see details below |
The heatmap functions provide plotting of heatmaps from several different types of input object. The implementation is similar to the standard graphics function heatmap.
Plotting heatmaps via the plot command, as in previous versions of this package, is still supported for backward compatibility.
If heatmap is called for objects of classes APResult or ExClust, a heatmap of the similarity matrix in slot sim of the parameter x is created with clusters grouped together and highlighted in different colors. The order of clusters is determined by running aggExCluster on the clustering result x. This variant of heatmap returns an invisible AggExResult object.
If heatmap is called for an AggExResult object that contains all levels of clustering, the heatmap is displayed with the corresponding clustering dendrogram. If the AggExResult object is the result of running aggExCluster on a prior clustering result, the same heatmap plot is produced as if heatmap had been called on this prior clustering result; however, the cluster hierarchy's dendrogram is returned. In the latter case, color bars are plotted to visualize the prior clustering result (see description of argument sideColors above).
All variants described above only work if the input object x contains a slot sim with the similarity matrix (which is only the case if the preceding clustering method has been called with includeSim=TRUE). In case the slot sim of x does not contain the similarity matrix, the similarity matrix must be supplied as second argument y.
All variants described above internally use heatmap with signature AggExResult,matrix, so all arguments listed above can be used for all variants, as they are passed through using the ... argument. All other arguments, analogously to the standard heatmap function, are passed on to the standard function image. This is particularly useful for using alternative color schemes via the col argument.
The two variants with one of the two arguments being a matrix and one being missing are just wrappers around the standard heatmap function with the aim to provide compatibility with this standard case.
see details above
Similarity matrices can be supplied in dense or sparse format. Note, however, that sparse matrices are converted to full dense matrices before plotting heatmaps which may lead to memory and/or performance bottlenecks for larger data sets.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, AggExResult, ExClust, apcluster, apclusterL, aggExCluster, cutree-methods, plot-methods
## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## run affinity propagation using negative squared Euclidean
apres <- apcluster(negDistMat(r=2), x, p=-0.1)

## plot heatmap of clustering run
heatmap(apres)

## rerun affinity propagation
## reuse similarity matrix from previous run
apres2 <- apcluster(s=apres@sim, q=0.6)

## plot heatmap of second run
heatmap(apres2, apres@sim)

## with alternate heatmap coloring, alternating color bars, and no dendrograms
heatmap(apres2, apres@sim, Rowv=NA, Colv=NA,
        sideColors=c("darkgreen", "yellowgreen"), col=terrain.colors(12))

## perform agglomerative clustering of affinity propagation clusters
aggres1 <- aggExCluster(apres@sim, apres2)

## plot heatmap
heatmap(cutree(aggres1, 2), apres@sim)

## perform agglomerative clustering of whole data set
aggres2 <- aggExCluster(negDistMat(r=2), x)

## show heatmap along with dendrogram
heatmap(aggres2)
Generate a label vector from a clustering result
## S4 method for signature 'ExClust'
labels(object, type="names")
object |
|
type |
specifies which kind of label vector should be created, see details below |
The function labels creates a label vector from a clustering result. Which kind of labels are produced is controlled by the argument type:
"names": (default) returns the name of the exemplar to which each data sample belongs; if no names are available, the function stops with an error;
"enum": returns the index of the cluster to which each data sample belongs, where clusters are enumerated consecutively from 1 to the number of clusters (analogous to other clustering methods like kmeans);
"exemplars": returns the index of the exemplar to which each data sample belongs, where indices of exemplars are within the original data, which is nothing else but the slot object@idx with attributes removed.
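The three kinds of label vectors can be mimicked in base R from a sample-to-exemplar mapping. The mapping below is made up and stands in for the slot object@idx; the sketch assumes that clusters are enumerated in order of increasing exemplar index and is not the package's implementation. The type values "names", "enum", and "exemplars" are the ones used in the usage and examples of this page.

```r
## base-R sketch with a made-up, named sample-to-exemplar mapping
idx <- c(a=2, b=2, c=2, d=5, e=5, f=5)
exemplars <- sort(unique(unname(idx)))

lab_exemplars <- unname(idx)                  # like type="exemplars"
lab_enum <- match(lab_exemplars, exemplars)   # like type="enum"
lab_names <- names(idx)[lab_exemplars]        # like type="names"

lab_exemplars
lab_enum
lab_names
```

All three vectors have one entry per sample; they differ only in how the cluster of each sample is identified.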
returns a label vector as long as the number of samples in the original data set
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
## create two simple clusters x <- c(1, 2, 3, 7, 8, 9) names(x) <- c("a", "b", "c", "d", "e", "f") ## compute similarity matrix (negative squared distance) sim <- negDistMat(x, r=2) ## run affinity propagation apres <- apcluster(sim) ## show details of clustering results show(apres) ## label vector (names of exemplars) labels(apres) ## label vector (consecutive index of exemplars) labels(apres, type="enum") ## label vector (index of exemplars within original data set) labels(apres, type="exemplars") ## now with agglomerative clustering aggres <- aggExCluster(sim) ## label (names of exemplars) labels(cutree(aggres, 2))
Functions for Visualizing Clustering Results
    ## S4 method for signature 'APResult,missing'
    plot(x, y, type=c("netsim", "dpsim", "expref"),
         xlab="# Iterations", ylab="Similarity", ...)

    ## S4 method for signature 'ExClust,matrix'
    plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA,
         limitNo=15, ...)

    ## S4 method for signature 'ExClust,data.frame'
    plot(x, y, connect=TRUE, xlab="", ylab="", labels=NA,
         limitNo=15, ...)

    ## S4 method for signature 'AggExResult,missing'
    plot(x, y, main="Cluster dendrogram", xlab="", ylab="",
         ticks=4, digits=2, base=0.05, showSamples=FALSE,
         horiz=FALSE, ...)

    ## S4 method for signature 'AggExResult,matrix'
    plot(x, y, k=NA, h=NA, ...)

    ## S4 method for signature 'AggExResult,data.frame'
    plot(x, y, k=NA, h=NA, ...)
x |
a clustering result object of class APResult, ExClust, or AggExResult |
y |
a matrix or data frame (see details below) |
type |
a string or array of strings indicating which performance measures should be plotted; valid values are "netsim", "dpsim", and "expref" |
xlab , ylab
|
labels for axes of 2D plots; ignored if |
labels |
names used for variables in scatter plot matrix
(displayed if |
limitNo |
if the number of columns/features in |
connect |
used only if clustering is plotted on original data,
ignored otherwise. If |
main |
title of plot |
ticks |
number of ticks used for the axis on the left side of the plot (applies to dendrogram plots only, see below) |
digits |
number of digits used for the axis tickmarks on the left side of the plot (applies to dendrogram plots only, see below) |
base |
fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram. |
showSamples |
if |
horiz |
if |
k |
level to be selected when plotting a single clustering
level of cluster hierarchy (i.e. the number of clusters; see
|
h |
cut-off to be used when plotting a single clustering
level of cluster hierarchy (see |
... |
all other arguments are passed to the plotting command that
are used internally, |
If plot is called for an APResult object without specifying the second argument y, a plot is created that displays graphs of performance measures over execution time of the affinity propagation run. This only works if apcluster was called with details=TRUE.
If plot is called for an APResult object along with a matrix or data frame as argument y, then the dimensions of the matrix determine the behavior of plot:
If the matrix y has two columns, y is interpreted as the original data set. Then a plot of the clustering result superimposed on the original data set is created. Each cluster is displayed in a different color. The exemplar of each cluster is highlighted by a black square. If connect is TRUE, lines connecting the cluster members to their exemplars are drawn. This variant of plot does not return any value.
If y has more than two columns, clustering results are superimposed in a sort of scatter plot matrix. The variant in which y is interpreted as a similarity matrix if it is square was removed in version 1.3.2. Use heatmap instead.
If y has only one column, an error is displayed.
If plot is called for an ExClust object along with a matrix or data frame as argument y, then plot behaves exactly the same as described in the previous paragraph.
If plot is called for an AggExResult object without specifying the second argument y, then a dendrogram plot is drawn. This variant returns an invisible dendrogram object. The showSamples argument determines whether a complete dendrogram or a dendrogram of clusters is plotted (see above). If the option horiz=TRUE is used, the dendrogram is rotated. Note that, in this case, the margin to the right of the plot may not be wide enough to accommodate long cluster/sample labels. In such a case, the figure margins have to be widened before plot is called.
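Widening the figure margins can be sketched with base graphics as follows. This is only an illustration under stated assumptions: hclust serves as a stand-in for an AggExResult dendrogram, the sample labels are made up, and the margin width of 12 lines is an arbitrary choice. With an AggExResult object, the call to plot would take the place of the plot call shown here.

```r
## Sketch only: hclust() serves as a stand-in for an AggExResult dendrogram
pdf(NULL)  # off-screen device; omit in interactive sessions

## made-up data with long sample labels
x <- matrix(c(1, 2, 3, 7, 8, 9), ncol=1,
            dimnames=list(paste0("a_rather_long_sample_label_", 1:6), NULL))
dend <- as.dendrogram(hclust(dist(x)))

## widen the right margin (4th entry of mar) before plotting
old <- par(mar=c(5, 4, 4, 12) + 0.1)
mar_wide <- par("mar")

## rotated dendrogram: labels are drawn to the right of the tree
plot(dend, horiz=TRUE)

## restore the previous margins and close the device
par(old)
invisible(dev.off())
```

Saving the return value of par and restoring it afterwards keeps the wider margin from affecting later plots.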
If plot is called for an AggExResult object along with a matrix or data frame y, y is again interpreted as the original data set. If one of the two arguments k or h is present, a clustering is cut out from the cluster hierarchy using cutree and this clustering is displayed with the original data set as described above. This variant of plot returns an invisible ExClust object containing the extracted clustering.
see details above
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, AggExResult, ExClust, heatmap-methods, apcluster, apclusterL, aggExCluster, cutree-methods
    ## create two Gaussian clouds
    cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
    cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    ## run affinity propagation
    apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE)

    ## plot information about clustering run
    plot(apres)

    ## plot clustering result
    plot(apres, x)

    ## perform agglomerative clustering of affinity propagation clusters
    aggres1 <- aggExCluster(x=apres)

    ## show dendrograms
    plot(aggres1)
    plot(aggres1, showSamples=TRUE)

    ## show clustering result for 4 clusters
    plot(aggres1, x, k=4)

    ## perform agglomerative clustering of whole data set
    aggres2 <- aggExCluster(negDistMat(r=2), x)

    ## show dendrogram
    plot(aggres2)

    ## show heatmap along with dendrogram
    heatmap(aggres2)

    ## show clustering result for 2 clusters
    plot(aggres2, x, k=2)

    ## cluster iris data set
    data(iris)
    apIris <- apcluster(negDistMat(r=2), iris, q=0)
    plot(apIris, iris)
Determines meaningful ranges for affinity propagation input preference
    ## S4 method for signature 'matrix'
    preferenceRange(s, exact=FALSE)

    ## S4 method for signature 'Matrix'
    preferenceRange(s, exact=FALSE)

    ## S4 method for signature 'dgTMatrix'
    preferenceRange(s, exact=FALSE)

    ## S4 method for signature 'sparseMatrix'
    preferenceRange(s, exact=FALSE)
s |
an |
exact |
flag indicating whether exact ranges should be computed,
which is relatively slow; if bounds are sufficient,
supply |
Affinity Propagation clustering relies on an appropriate choice of input preferences. This function helps in finding a good choice by determining meaningful lower and upper bounds.
If the similarity matrix s is sparse or if it contains -Inf similarities, only the similarities are taken into account that are specified in s and larger than -Inf. In such cases, the lower bound returned by preferenceRange need not correspond to one or two clusters. Moreover, it may also happen in degenerate cases that the lower bound exceeds the upper bound. In such a case, no warning or error is issued, so it is the user's responsibility to ensure a proper interpretation of the results.
The method apclusterK makes use of this function internally and checks the plausibility of the result returned by preferenceRange.
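Since no warning is issued in degenerate cases, a caller may want to check the returned range before using it. The following is a minimal sketch of such a check in base R. The helper isPlausibleRange is hypothetical and not part of the package, and the numeric vectors are made-up stand-ins for return values of preferenceRange.

```r
## hypothetical helper (not part of apcluster): checks that a two-element
## range is finite and that the lower bound does not exceed the upper bound
isPlausibleRange <- function(pr) {
    length(pr) == 2 && all(is.finite(pr)) && pr[1] <= pr[2]
}

## made-up stand-ins for results of preferenceRange()
ok <- isPlausibleRange(c(-3.2, -0.01))   # lower bound below upper bound
bad <- isPlausibleRange(c(-0.5, -1.2))   # degenerate: lower exceeds upper

print(ok)
print(bad)
```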
returns a vector with two entries, the first of which is the minimal input preference (which would lead to 1 or 2 clusters) and the second of which is the maximal input preference (which would lead to as many clusters as data samples).
Ulrich Bodenhofer and Andreas Kothmeier
https://github.com/UBod/apcluster
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
    ## create two Gaussian clouds
    cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
    cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    ## create similarity matrix
    sim <- negDistMat(x, r=2)

    ## determine bounds
    preferenceRange(sim)

    ## determine exact range
    preferenceRange(sim, exact=TRUE)
Display methods for S4 classes APResult, ExClust, and AggExResult
    ## S4 method for signature 'APResult'
    show(object)

    ## S4 method for signature 'ExClust'
    show(object)

    ## S4 method for signature 'AggExResult'
    show(object)
object |
an object of class APResult, ExClust, or AggExResult |
show displays the most important information stored in object.
For APResult objects, the number of data samples, the number of clusters, the number of iterations, the input preference, the final objective function values, the vector of exemplars, the list of clusters and, for leveraged clustering, the selected sample subset are printed.
For ExClust objects, the number of data samples, the number of clusters, the vector of exemplars, and the list of clusters are printed.
For AggExResult objects, only the number of data samples and the maximum number of clusters are printed. For retrieving a particular clustering level, use the function cutree.
For accessing more detailed information, it is necessary to access the slots of object directly. Use str to get a compact overview of all slots of an object.
show returns an invisible NULL
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
APResult, ExClust, AggExResult, cutree-methods
    ## create two Gaussian clouds
    cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
    cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    ## compute similarity matrix (negative squared Euclidean)
    sim <- negDistMat(x, r=2)

    ## run affinity propagation
    apres <- apcluster(sim)

    ## show details of clustering results
    show(apres)

    ## apply agglomerative clustering to apres
    aggres <- aggExCluster(sim, apres)

    ## display overview of result
    show(aggres)

    ## show clustering level with two clusters
    show(cutree(aggres, 2))
Compute similarity matrices from data set
    negDistMat(x, sel=NA, r=1, method="euclidean", p=2)
    expSimMat(x, sel=NA, r=2, w=1, method="euclidean", p=2)
    linSimMat(x, sel=NA, w=1, method="euclidean", p=2)
    corSimMat(x, sel=NA, r=1, signed=TRUE, method="pearson")
    linKernel(x, sel=NA, normalize=FALSE)
x |
input data to be clustered; if |
sel |
selected samples subset; vector of row indices for x in increasing order (see details below) |
r |
exponent (see details below) |
w |
radius (see details below) |
signed |
take sign of correlation into account (see details below) |
normalize |
see details below |
method |
type of distance measure to be used; for |
p |
exponent for Minkowski distance; only used for
|
negDistMat creates a square matrix of mutual pairwise similarities of data vectors as negative distances. The argument r (default is 1) is used to transform the resulting distances by computing the r-th power (use r=2 to obtain negative squared distances as in Frey's and Dueck's demos), i.e., given a distance d, the resulting similarity is computed as $s = -d^r$. With the parameter sel, a subset of samples can be specified for distance calculation. In this case, not the full distance matrix is computed, but a rectangular similarity matrix of all samples (rows) against the subset (columns) as needed for leveraged clustering. Internally, the computation of distances is done using an internal method derived from dist. All options of this function except diag and upper can be used, especially method, which allows for selecting different distance measures. Note that, since version 1.4.4 of the package, there is an additional method "discrepancy" that implements Weyl's discrepancy measure.
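The transformation can be illustrated in base R without the package. This is a minimal sketch, not the package's internal code: it assumes the similarity is the negative r-th power of the distance, uses stats::dist for the distances, and picks a toy data set and subset sel purely for illustration.

```r
## three 2-D samples with easily checked Euclidean distances (5, 10, 5)
x <- rbind(c(0, 0), c(3, 4), c(6, 8))

## pairwise distances as a full square matrix
d <- as.matrix(dist(x, method="euclidean"))

## negative r-th power of the distances; r=2 gives negative squared distances
r <- 2
s <- -(d^r)

## rectangular variant: all samples (rows) versus a subset sel (columns)
sel <- c(1, 3)
s_rect <- s[, sel, drop=FALSE]

print(s)
print(s_rect)
```

The rectangular matrix has one row per sample and one column per selected sample, which is the shape described above for leveraged clustering.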
expSimMat computes similarities in a way similar to negDistMat, but the transformation of distances to similarities is done in the following way:
$$s = \exp\left(-\left(\frac{d}{w}\right)^r\right)$$
The parameter sel allows the creation of a rectangular similarity matrix. As above, r is an exponent. The parameter w controls the speed of descent. r=2 in conjunction with Euclidean distances corresponds to the well-known Gaussian/RBF kernel, whereas r=1 corresponds to the Laplace kernel. Note that these similarity measures can also be understood as fuzzy equality relations.
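As a rough base R illustration, the sketch below assumes the transformation exp(-(d/w)^r) of a distance d and compares r=2 (Gaussian/RBF) with r=1 (Laplace). The data and the value w=5 are arbitrary choices, and the code is not taken from the package.

```r
## three 2-D samples with Euclidean distances 5, 10, 5
x <- rbind(c(0, 0), c(3, 4), c(6, 8))
d <- as.matrix(dist(x))

## w controls how fast similarity decays with distance (arbitrary choice)
w <- 5

## r=2: Gaussian/RBF-type similarity
s_rbf <- exp(-(d / w)^2)

## r=1: Laplace-type similarity
s_laplace <- exp(-(d / w)^1)

print(round(s_rbf, 4))
print(round(s_laplace, 4))
```

Both variants yield 1 on the diagonal and values between 0 and 1 elsewhere; for distances larger than w, the r=2 variant decays faster.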
linSimMat provides another way of transforming distances into similarities by applying the following transformation to a distance d:
$$s = \max\left(1 - \frac{d}{w},\, 0\right)$$
The parameter sel is used again for creation of a rectangular similarity matrix. Here w corresponds to a maximal radius of interest. Note that this is a fuzzy equality relation with respect to the Lukasiewicz t-norm.
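The role of w as a maximal radius of interest can be sketched in base R. The sketch assumes the transformation max(1 - d/w, 0) and uses an arbitrary toy data set with w=8; it is an illustration, not the package's code.

```r
## three 2-D samples with Euclidean distances 5, 10, 5
x <- rbind(c(0, 0), c(3, 4), c(6, 8))
d <- as.matrix(dist(x))

## maximal radius of interest (arbitrary choice)
w <- 8

## similarity decreases linearly with distance and is cut off at 0;
## pmax() works elementwise and keeps the matrix shape
s_lin <- pmax(1 - d / w, 0)

print(s_lin)
```

Pairs of samples that are farther apart than w receive similarity 0, so they are treated as equally dissimilar regardless of their actual distance.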
Unlike the above three functions, linKernel computes pairwise similarities as scalar products of data vectors, i.e. it corresponds, as the name suggests, to the “linear kernel”. Use parameter sel to compute only a submatrix of the full kernel matrix as described above. If normalize=TRUE, the values are scaled to the unit sphere in the following way (for two samples x and y):
$$s = \frac{\vec{x}^T \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}$$
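A linear kernel and its normalized variant can be sketched in base R as follows. This assumes that normalization divides each scalar product by the product of the Euclidean norms of the two samples; the data are made up and the code is not the package's implementation.

```r
## three 2-D samples, one per row
x <- rbind(c(1, 0), c(1, 1), c(0, 2))

## linear kernel: K[i, j] is the scalar product of rows i and j
K <- tcrossprod(x)

## Euclidean norms of the samples (square roots of the diagonal of K)
norms <- sqrt(diag(K))

## normalized kernel: scalar product divided by the product of the norms
K_norm <- K / outer(norms, norms)

print(K)
print(round(K_norm, 4))
```

After normalization, every sample has similarity 1 to itself, and orthogonal samples have similarity 0.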
The function corSimMat computes pairwise similarities as correlations. It uses cor internally. The method argument is passed on to cor. The argument r serves as an exponent with which the correlations can be transformed. If signed=TRUE (default), negative correlations are taken into account, i.e. two samples are maximally dissimilar if they are negatively correlated. If signed=FALSE, similarities are computed as absolute values of correlations, i.e. two samples are maximally similar if they are positively or negatively correlated and the two samples are maximally dissimilar if they are uncorrelated.
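The difference between signed and unsigned correlation similarities can be illustrated in base R. The sketch covers only the case r=1, because the exact way the exponent is applied is not spelled out here. The sample names and data are made up.

```r
## three samples (rows) measured on four features (columns)
x <- rbind(s1=c(1, 2, 3, 4),
           s2=c(4, 3, 2, 1),
           s3=c(1, 3, 2, 4))

## cor() correlates columns, so samples must be transposed into columns
rho <- cor(t(x), method="pearson")

## signed: negatively correlated samples are maximally dissimilar
s_signed <- rho

## unsigned: only the strength of the correlation counts
s_unsigned <- abs(rho)

print(s_signed)
print(s_unsigned)
```

Samples s1 and s2 are perfectly negatively correlated, so they are maximally dissimilar in the signed variant and maximally similar in the unsigned one.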
Note that the naming of the argument p has been chosen for consistency with dist and previous versions of the package. When using leveraged AP in conjunction with the Minkowski distance, this leads to conflicts with the input preference parameter p of apclusterL. In order to avoid that, use the above functions without the x argument to create a custom similarity measure with fixed parameter p (see example below).
All functions listed above return square or rectangular matrices of similarities.
Ulrich Bodenhofer, Andreas Kothmeier, and Johannes Palme
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: doi:10.1126/science.1136800.
Micchelli, C. A. (1986) Interpolation of scattered data: distance matrices and conditionally positive definite functions. Constr. Approx. 2, 11-20.
De Baets, B. and Mesiar, R. (1997) Pseudo-metrics and T-equivalences. J. Fuzzy Math. 5, 471-481.
Bauer, P., Bodenhofer, U., and Klement, E. P. (1996) A fuzzy algorithm for pixel classification based on the discrepancy norm. In Proc. 5th IEEE Int. Conf. on Fuzzy Systems, volume III, pages 2007–2012, New Orleans, LA. DOI: doi:10.1109/FUZZY.1996.552744.
    ## create two Gaussian clouds
    cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
    cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    ## create negative distance matrix (default Euclidean)
    sim1 <- negDistMat(x)

    ## compute similarities as negative squared distances
    ## (in accordance with Frey's and Dueck's demos)
    sim2 <- negDistMat(x, r=2)

    ## compute RBF kernel
    sim3 <- expSimMat(x, r=2)

    ## compute similarities as negative squared distances
    ## all samples versus a randomly chosen subset
    ## of 50 samples (for leveraged AP clustering)
    sel <- sort(sample(1:nrow(x), nrow(x)*0.25))
    sim4 <- negDistMat(x, sel, r=2)

    ## example of leveraged AP using Minkowski distance with non-default
    ## parameter p
    cl1 <- cbind(rnorm(150, 0.2, 0.05), rnorm(150, 0.8, 0.06))
    cl2 <- cbind(rnorm(100, 0.7, 0.08), rnorm(100, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    apres <- apclusterL(s=negDistMat(method="minkowski", p=2.5, r=2),
                        x, frac=0.2, sweeps=3, p=-0.2)
    show(apres)
Rearrange clusters according to sort criterion
    ## S4 method for signature 'ExClust'
    sort(x, decreasing=FALSE,
         sortBy=c("aggExCluster", "size", "nameExemplar", "noExemplar"), ...)
x | object of class APResult or ExClust |
decreasing |
logical indicating if sorting should be done in decreasing order, see details below |
sortBy |
sort criterion, see details below |
... |
further arguments are ignored; only defined for S3 method consistency |
The function sort takes an APResult or ExClust clustering object x and creates a new clustering object of the same class, but with clusters arranged according to the sort criterion passed as argument sortBy:
"aggExCluster": (default) orders clusters as they would appear in the dendrogram produced by aggExCluster. This is also the same ordering in which the clusters are arranged by heatmap. Note that this only works if the similarity matrix is included in the input object x; otherwise, an error message is produced.
"size": sorts clusters according to their size (from small to large).
"nameExemplar": sorts clusters according to the names of the exemplars (if available, otherwise an error is produced).
"noExemplar": sorts clusters according to the indices of the exemplars.
If decreasing is TRUE, the order is reversed and, for example, sortBy="size" sorts clusters such that the larger clusters come first.
Note that the cluster numbers of x are not preserved by sort, i.e. the cluster no. 1 of the object returned by sort is the one that has been ranked first by sort, which may not necessarily coincide with cluster no. 1 of the original clustering object x.
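The renumbering can be illustrated with plain R lists. This is a sketch under stated assumptions: the variables clusters and exemplars below are made-up stand-ins for the corresponding slots of a clustering object, and the code mimics sorting by size in decreasing order rather than calling the package.

```r
## made-up stand-ins for the slots of a clustering object:
## three clusters given as sample indices, and their exemplars
clusters <- list(c(1, 2, 3, 4), c(5, 6), c(7, 8, 9))
exemplars <- c(2, 5, 8)

## rank clusters by size, largest first
ord <- order(lengths(clusters), decreasing=TRUE)

## reorder clusters and exemplars consistently
clusters_sorted <- clusters[ord]
exemplars_sorted <- exemplars[ord]

print(ord)
print(exemplars_sorted)
```

Here the original cluster no. 3 becomes cluster no. 2 after sorting, which is why cluster numbers should not be compared between the original and the sorted object.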
Note that this is an S3 method (whereas all other methods in this package are S4 methods). This inconsistency has been introduced in order to avoid interoperability problems with the BiocGenerics package which may overwrite the definition of the sort generic if it is loaded after the apcluster package.
returns a copy of x, but with slots exemplars and clusters (see APResult or ExClust) reordered.
Ulrich Bodenhofer
https://github.com/UBod/apcluster
Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: doi:10.1093/bioinformatics/btr406.
    ## create two Gaussian clouds
    cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
    cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
    x <- rbind(cl1, cl2)

    ## run affinity propagation
    apres <- apcluster(negDistMat(r=2), x, q=0.7)
    show(apres)

    ## show dendrogram
    plot(aggExCluster(x=apres))

    ## default sort order: like in heatmap or dendrogram
    show(sort(apres))

    ## show dendrogram (note the different cluster numbers!)
    plot(aggExCluster(x=sort(apres)))

    ## sort by size
    show(sort(apres, decreasing=TRUE, sortBy="size"))