Hierarchical clustering
Definition
Hierarchical clustering is an unsupervised machine learning method that organizes data into a tree-like structure (a dendrogram) based on a similarity or distance metric. In life sciences, it groups biological entities such as genes, proteins, or samples into nested clusters without requiring predefined categories. The algorithm works either bottom-up (agglomerative), starting with each item as its own cluster and progressively merging the most similar pair, or top-down (divisive), starting with a single cluster and splitting it recursively. Applied to high-dimensional data such as gene expression profiles, it reveals natural groupings: co-expressed genes, functional modules, or patient subtypes. The dendrogram shows relationships at multiple scales, and cutting the tree at different heights yields coarser or finer clusters, which makes the method well suited to exploratory analysis and pattern discovery in omics data.
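The bottom-up procedure and the effect of cutting the tree at different heights can be illustrated with a minimal sketch. It is not a production implementation: it assumes Euclidean distance and average linkage (one of several possible linkage choices), uses only the Python standard library, and runs on made-up two-condition expression values. The function names `average_linkage`, `agglomerative`, and `cut_tree` are hypothetical helpers introduced for this example.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs with one item in each cluster."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerative(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where each cluster is a sorted tuple of item indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        height = average_linkage(a, b, points)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, height))
    return merges


def cut_tree(merges, n_items, height):
    """Clusters obtained by applying only the merges at or below `height`."""
    clusters = {(i,) for i in range(n_items)}
    for a, b, h in merges:
        if h > height:
            break  # average-linkage heights never decrease, so we can stop
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Toy data (assumed): expression of five items under two conditions.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 1.0)]
merges = agglomerative(points)

print(cut_tree(merges, len(points), 1.0))  # [(0, 1), (2, 3), (4,)]
print(cut_tree(merges, len(points), 6.0))  # [(0, 1), (2, 3, 4)]
```

A low cut keeps only the tightest pairs together, while a higher cut absorbs the outlying item into its nearest group. This brute-force version recomputes every pairwise linkage at each step, so it is only practical for small inputs; established libraries use much more efficient update schemes.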
Visualize hierarchical clustering in Nodes Bio
Nodes Bio enables researchers to transform hierarchical clustering results into interactive network graphs where nodes represent biological entities and edge weights reflect clustering distances. Users can visualize dendrogram structures as layered networks, identify tightly connected modules, and overlay functional annotations or pathway information. The platform allows dynamic exploration of cluster membership at different cutoff thresholds, revealing how genes or proteins group together across biological conditions.
Visualization Ideas:
- Gene co-expression networks with hierarchical cluster modules highlighted by color
- Multi-scale protein interaction networks showing nested functional communities
- Patient similarity networks with dendrogram-derived subtype classifications
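As a rough illustration of how distances can become a weighted network whose modules change with the cutoff, the sketch below builds a generic edge list and groups nodes by connected component. It does not use or describe Nodes Bio's own import format or API, which is not specified here. The gene names, expression values, and the helpers `threshold_edges` and `connected_components` are all assumptions made for the example. One useful property: the connected components of the graph that keeps edges with distance at most t are exactly the single-linkage clusters obtained by cutting the dendrogram at height t.

```python
from itertools import combinations
from math import dist


def threshold_edges(profiles, cutoff):
    """Weighted edges (name_a, name_b, distance) for pairs within `cutoff`."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        d = dist(profiles[a], profiles[b])
        if d <= cutoff:
            edges.append((a, b, d))
    return edges


def connected_components(nodes, edges):
    """Group nodes into modules with a small union-find over the edge list."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b, _ in edges:
        parent[find(a)] = find(b)

    modules = {}
    for n in nodes:
        modules.setdefault(find(n), []).append(n)
    return sorted(sorted(m) for m in modules.values())


# Hypothetical expression profiles (two conditions per gene).
profiles = {
    "geneA": (1.0, 1.0),
    "geneB": (1.2, 0.9),
    "geneC": (5.0, 5.1),
    "geneD": (5.2, 4.9),
    "geneE": (9.0, 1.0),
}

tight = connected_components(profiles, threshold_edges(profiles, 1.0))
loose = connected_components(profiles, threshold_edges(profiles, 5.5))
print(tight)  # [['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]
print(loose)  # [['geneA', 'geneB'], ['geneC', 'geneD', 'geneE']]
```

Raising the cutoff adds edges and merges modules, which is the network counterpart of cutting the dendrogram higher. The edge tuples could be written to CSV or any other interchange format a graph tool accepts.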
Example Use Case
A cancer researcher analyzes RNA-seq data from 200 tumor samples across multiple subtypes. Using hierarchical clustering on gene expression profiles, they identify five major patient clusters with distinct molecular signatures. By visualizing the clustering dendrogram alongside protein-protein interaction networks in Nodes Bio, they discover that one cluster uniquely overexpresses a module of immune checkpoint genes. This finding points to a patient subgroup that may be more likely to respond to immunotherapy, which can guide stratification for clinical trials and suggest candidate biomarkers for treatment selection.