Using clustbench

The Python version of the clustering-benchmarks package for Python can be installed from PyPI, e.g., via a call to:

pip3 install clustering-benchmarks

from the command line. Alternatively, please use your favourite Python package manager.

Once installed, we can import it by calling:

import clustbench

Below we discuss its basic features/usage.


To learn more about Python, check out Marek’s recent open-access (free!) textbook Minimalist Data Wrangling in Python [23].

Fetching Benchmark Data

The datasets from the Benchmark Suite (v1.1.0) can be queried easily. It is best to download the whole repository onto our disk first. Let us assume they are available in the following directory:

# load from a local library (download the suite manually)
import os.path
data_path = os.path.join("~", "Projects", "clustering-data-v1")

Here is the list of the currently available benchmark batteries:

## ['fcps', 'g2mg', 'graves', 'h2mg', 'mnist', 'other', 'sipu', 'uci', 'wut']

We can ask for the names of the dataset in an example battery:

battery = "wut"
print(clustbench.get_dataset_names(battery, path=data_path))
## ['circles', 'cross', 'graph', 'isolation', 'labirynth', 'mk1', 'mk2', 'mk3', 'mk4', 'olympic', 'smile', 'stripes', 'trajectories', 'trapped_lovers', 'twosplashes', 'windows', 'x1', 'x2', 'x3', 'z1', 'z2', 'z3']

For instance, let us load the wut/x2 dataset:

dataset = "x2"
b = clustbench.load_dataset(battery, dataset, path=data_path)

The above call returned a named tuple. For instance, the README file can be accessed via the description field:

## Author: Eliza Kaczorek (Warsaw University of Technology)
## `labels0` come from the Author herself.
## `labels1` were generated by Marek Gagolewski.
## `0` denotes the noise class (if present).

What is more, data is the data matrix, labels gives the list of all ground truth partitions (encoded as label vectors), and n_clusters gives the corresponding numbers of subsets. In case of any doubt, we can always consult the official documentation of the clustbench.load_dataset function.

For instance, here is the shape (n and d) of the data matrix, the number of reference partitions, and their cardinalities k.

print(, len(b.labels), b.n_clusters)
## (120, 2) 2 [3 4]

The following figure (generated via a call to genieclust.plots.plot_scatter) illustrates the benchmark dataset at hand.

import genieclust
for i in range(len(b.labels)):
    plt.subplot(1, len(b.labels), i+1)
    genieclust.plots.plot_scatter(, labels=b.labels[i]-1, axis="equal", title=f"labels{i}"

Figure 7: An example benchmark dataset and the corresponding ground truth labels.

Fetching Precomputed Results

Say we would like to study one of the sets of precomputed clustering results stored in the following directory:

results_path = os.path.join("~", "Projects", "clustering-results-v1", "original")

The results can be fetched by calling:

method_group = "Genie"  # or "*" for everything
res = clustbench.load_results(
    method_group, b.battery, b.dataset, b.n_clusters, path=results_path
## ['Genie_G0.1', 'Genie_G0.3', 'Genie_G0.5', 'Genie_G0.7', 'Genie_G1.0']

We thus have got access to data generated by the Genie algorithm with different gini_threshold parameter settings.

Computing External Cluster Validity Measures

Different external cluster validity measures can be computed by calling clustbench.get_score:

pd.Series({  # for aesthetics
    method: clustbench.get_score(b.labels, res[method])
    for method in res.keys()
## Genie_G0.1    0.870000
## Genie_G0.3    0.870000
## Genie_G0.5    0.590909
## Genie_G0.7    0.666667
## Genie_G1.0    0.010000
## dtype: float64

By default, adjusted asymmetric accuracy is applied. As explained in the tutorial, we compare the predicted clusterings against all the reference partitions (ignoring noise points) and report the maximal score.

Let us depict the results for the "Genie_G0.3" method:

method = "Genie_G0.3"
for i, k in enumerate(res[method].keys()):
    plt.subplot(1, len(res[method]), i+1)
    genieclust.plots.plot_scatter(, labels=res[method][k]-1, axis="equal", title=f"{method}; k={k}"

Figure 8: Results generated by Genie.

Applying Clustering Methods Manually

Naturally, the aim of this benchmark framework is also to test new methods. We can use clustbench.fit_predict_many to generate all the partitions required to compare ourselves against the reference labels.

For instance, let us investigate the behaviour of the k-means algorithm:

import sklearn.cluster
m = sklearn.cluster.KMeans()
res["KMeans"] = clustbench.fit_predict_many(m,, b.n_clusters)
clustbench.get_score(b.labels, res["KMeans"])
## 0.9848484848484849

We see that k-means (which specialises in detecting symmetric Gaussian-like blobs) performs better than Genie on this particular dataset.

method = "KMeans"
for i, k in enumerate(res[method].keys()):
    plt.subplot(1, len(res[method]), i+1)
    genieclust.plots.plot_scatter(, labels=res[method][k]-1, axis="equal", title=f"{method}; k={k}"

Figure 9: Results generated by K-Means.

For more functions, please refer to the package’s documentation (in the next section). Moreover, Colouriser: Planar Data Editor describes a standalone application that can be used for preparing our own two-dimensional datasets.