Network propagation is a versatile computational technique widely used for analyzing network structures, particularly in the biological sciences. The fundamental idea behind network propagation is to spread information, or “signals,” across a network, where nodes represent elements like genes or proteins and edges represent their relationships, such as interactions or correlations. This process amplifies weak signals by distributing initial data points, often starting from a small subset of known nodes, across the network to uncover patterns and relationships that might not be evident from direct interactions alone.
A key feature of network propagation is its ability to account for both local and global network structures. It allows information from initially identified nodes to spread iteratively through connected nodes, helping identify other relevant nodes that may not be directly connected but are part of a broader network pathway. This makes it useful in various fields, such as integrating multi-omics data, predicting protein functions, and uncovering relationships in large-scale biological networks.
Several popular algorithms, such as Random Walk with Restart (RWR) and Heat Diffusion (HD), are used to implement network propagation. These approaches differ in how they manage signal spreading, but both rely on iteratively updating the values associated with each node based on its neighbors until a steady state is reached. RWR, for example, introduces a “restart” mechanism that allows the signal to occasionally return to its source, ensuring that the propagation stays close to the initial nodes and maintains a balance between local and global influences.
Network propagation algorithms are essential tools for exploring complex networks, such as protein-protein interaction networks or multi-omics data in bioinformatics. These algorithms work by diffusing signals (or information) across the network, with each node’s score being influenced by its connections and neighboring nodes. Several popular algorithms extend this basic concept, each with its strengths and optimizations for specific contexts.
DIAMOnD (Disease Module Detection): DIAMOnD is designed to identify disease modules within a protein-protein interaction network. It iteratively adds nodes (genes) to a set of known disease genes based on their network proximity, aiming to build a coherent module where the added nodes are likely involved in the same biological process as the initial set. DIAMOnD excels at prioritizing disease-associated genes by focusing on topological proximity, but its effectiveness can decline when large numbers of candidate genes are considered.
GUILD (Genes Underlying Inheritance Linked Disorders): GUILD is a framework that integrates various prioritization methods to rank genes based on their relevance to certain diseases. It implements multiple network propagation techniques like NetScore, NetZcore, NetShort, fFlow, and NetRank. For example, NetScore relies on message passing, where nodes transmit information to their neighbors, and the scores of nodes are updated iteratively. NetZcore, on the other hand, normalizes node scores using random networks, reducing the bias towards highly connected nodes. This versatility allows GUILD to be applied across different types of networks, making it suitable for a wide range of applications, such as drug target discovery and gene prioritization.
Random Walk with Restart (RWR): One of the most widely used propagation algorithms, RWR simulates a random walker that moves across the network, occasionally “restarting” at the original node. This ensures that the propagation remains influenced by the initial input data. RWR is highly effective for tasks like gene prioritization and protein function prediction because it balances local and global network information, maintaining proximity to the starting nodes.
Heat Diffusion (HD): Heat Diffusion is another popular propagation method that models the spreading of heat (information) over the network. Unlike RWR, it operates in continuous time and diffuses signals according to a time parameter that controls the extent of spreading. HD is particularly effective for integrating multi-omics data, such as gene expression and protein interaction data, to identify biologically relevant network regions.
Cowen, L., Ideker, T., Raphael, B. et al. Network propagation: a universal amplifier of genetic associations. Nat Rev Genet 18, 551–562 (2017).
Giovanni, V. et al. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery, Briefings in Bioinformatics 25 (2024).