Let's talk about graphs. In simple terms, a graph is just a way to show how things are connected. Think of your social media friends, the roads in a city, or even how websites link to each other. That's a graph. My job often involves making sense of these connections using Python. Today, I want to share some of the most useful methods I use to work with graphs, from the basic building blocks to more complex analysis. I'll write this as if I'm explaining it to you while we code together.
First, we need to decide how to store our graph in the computer's memory. There are two main ways, and the choice depends on your graph.
If your graph has few connections compared to the number of things (we call this a sparse graph), use an adjacency list. It's like keeping a phone book for each point (or node) that lists only its direct friends (or neighbors). It saves space.
If your graph is densely connected, where almost everything is linked to everything else, an adjacency matrix might be better. It's a big grid (a table) that marks a connection between two nodes with a number, often 1. Checking if two things are connected is very fast with this method.
In practice, I often start with a library called NetworkX because it gives me a simple way to handle both. But understanding what happens underneath is crucial. Here's how I might build it from scratch to see the mechanics.
import networkx as nx
import matplotlib.pyplot as plt
from typing import Dict, List, Set, Tuple
import numpy as np
class GraphRepresentations:
    def __init__(self):
        # The adjacency list: a dictionary where each key is a node.
        # The value is a list of its neighbors and the weight of the connection.
        self.adjacency_list = {}
        # The adjacency matrix: a 2D array.
        self.adjacency_matrix = None
        self.node_count = 0

    def add_edge_adjacency_list(self, u: int, v: int, weight: float = 1.0):
        """Add a link between node 'u' and node 'v' to the list."""
        if u not in self.adjacency_list:
            self.adjacency_list[u] = []
        if v not in self.adjacency_list:
            self.adjacency_list[v] = []
        # Add 'v' to the list of 'u's neighbors.
        self.adjacency_list[u].append((v, weight))
        # This class models an undirected graph, so add the reverse link too.
        self.adjacency_list[v].append((u, weight))
        # Keep track of the highest node number we've seen.
        self.node_count = max(self.node_count, u + 1, v + 1)

    def build_adjacency_matrix(self):
        """Convert our adjacency list into a matrix."""
        n = self.node_count
        # Create a grid of zeros of size n by n.
        self.adjacency_matrix = np.zeros((n, n))
        # Fill the grid. If node 'u' is connected to 'v', place the weight at that position.
        for u in self.adjacency_list:
            for v, weight in self.adjacency_list[u]:
                self.adjacency_matrix[u][v] = weight

    def get_neighbors(self, node: int) -> List[Tuple[int, float]]:
        """Who is this node connected to?"""
        return self.adjacency_list.get(node, [])

    def degree_centrality(self) -> Dict[int, float]:
        """A simple measure of importance: how many connections does a node have?"""
        centrality = {}
        total_nodes = len(self.adjacency_list)
        for node in self.adjacency_list:
            # Degree centrality is connections divided by (total nodes - 1).
            centrality[node] = len(self.adjacency_list[node]) / (total_nodes - 1)
        return centrality

    def to_networkx(self) -> nx.Graph:
        """Convert our homemade graph to a NetworkX graph for drawing."""
        G = nx.Graph()
        for u in self.adjacency_list:
            for v, weight in self.adjacency_list[u]:
                G.add_edge(u, v, weight=weight)
        return G
# Let's build a tiny example network.
graph = GraphRepresentations()
# These are connections. (Node 0 connects to Node 1 with weight 1.0, and so on).
edges = [
(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0),
(1, 2, 1.0), (1, 4, 1.0),
(2, 3, 1.0), (2, 4, 1.0),
(3, 5, 1.0),
(4, 5, 1.0), (4, 6, 1.0),
(5, 6, 1.0)
]
for u, v, w in edges:
    graph.add_edge_adjacency_list(u, v, w)
graph.build_adjacency_matrix()
print("Let's look at the adjacency list for the first 3 nodes:")
for node in range(3):
    print(f" Node {node} is connected to: {graph.get_neighbors(node)}")
print(f"\nOur matrix is {graph.adjacency_matrix.shape[0]} nodes by {graph.adjacency_matrix.shape[1]} nodes.")
print(f"Only {np.count_nonzero(graph.adjacency_matrix)} spots in the matrix have a connection.")
centrality = graph.degree_centrality()
print("\nWho are the most connected nodes? (Degree Centrality):")
for node, cent in sorted(centrality.items())[:5]:
    print(f" Node {node}: {cent:.3f}")
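Before moving on, here is a quick back-of-the-envelope sketch of the sparse-versus-dense trade-off. The numbers are my own illustrative example (a hypothetical 1,000-node graph with 2,000 undirected edges), not measurements from the graph above: the list stores a few thousand entries, while the matrix stores a million cells, almost all of them zero.

# A rough sketch of why sparse graphs favor adjacency lists, using a
# hypothetical 1,000-node graph with 2,000 undirected edges.
n_nodes, n_edges = 1_000, 2_000
list_entries = 2 * n_edges            # each undirected edge is stored twice
matrix_entries = n_nodes * n_nodes    # the full n-by-n grid, mostly zeros
print(f"Adjacency list entries:   {list_entries:,}")
print(f"Adjacency matrix entries: {matrix_entries:,}")
print(f"Fraction of matrix cells actually used: {list_entries / matrix_entries:.2%}")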
Once our graph is built, we need to walk through it. There are two fundamental strategies: Breadth-First Search (BFS) and Depth-First Search (DFS).
Think of BFS like taking a cautious, level-by-level approach. You visit all your immediate neighbors first, then their immediate neighbors, and so on. This is perfect for finding the shortest path in a network where all steps cost the same, like finding the fewest friend links between you and someone else.
DFS is different. It's like exploring a maze by always taking the first new path you see and going as far down as you can before backtracking. It's great for figuring out if you can get from A to B at all, for finding loops in a network, or for arranging tasks that depend on each other.
Here is how I implement these walks.
from collections import deque, defaultdict
from typing import List, Set, Dict, Optional
class GraphTraversal:
    def __init__(self, adjacency_list: Dict[int, List[Tuple[int, float]]]):
        self.adjacency_list = adjacency_list

    def bfs_shortest_path(self, start: int, target: int) -> Optional[List[int]]:
        """Find the shortest number of steps from 'start' to 'target'."""
        if start not in self.adjacency_list or target not in self.adjacency_list:
            return None
        # We use a queue (First-In, First-Out).
        queue = deque([start])
        visited = {start}
        # A dictionary to remember how we got to each node.
        parent = {start: None}
        while queue:
            current = queue.popleft()
            if current == target:
                # We found it! Let's trace our steps back.
                path = []
                node = current
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]  # Reverse it to get start -> target.
            # Look at all neighbors of the current node.
            for neighbor, _ in self.adjacency_list.get(current, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    parent[neighbor] = current  # We came from 'current'.
                    queue.append(neighbor)
        return None  # No path exists.

    def dfs_recursive(self, start: int) -> List[int]:
        """Go deep first, using a function that calls itself."""
        visited = set()
        traversal_order = []

        def dfs_visit(node: int):
            visited.add(node)
            traversal_order.append(node)
            for neighbor, _ in self.adjacency_list.get(node, []):
                if neighbor not in visited:
                    dfs_visit(neighbor)  # Go deeper!

        dfs_visit(start)
        return traversal_order

    def dfs_iterative(self, start: int) -> List[int]:
        """Go deep first, using a manual stack (like a pile of books)."""
        stack = [start]
        visited = set()
        traversal_order = []
        while stack:
            node = stack.pop()  # Take from the top of the pile.
            if node not in visited:
                visited.add(node)
                traversal_order.append(node)
                # Add all neighbors to the stack.
                neighbors = [n for n, _ in self.adjacency_list.get(node, [])]
                for neighbor in reversed(neighbors):
                    if neighbor not in visited:
                        stack.append(neighbor)
        return traversal_order

    def detect_cycle_directed(self) -> bool:
        """Does this directed graph have a loop? (Like A->B->C->A)."""
        # We'll give each node a "color".
        # 0 = White (unvisited), 1 = Gray (visiting), 2 = Black (fully visited).
        color = {}

        def dfs_cycle(node: int) -> bool:
            color[node] = 1  # Mark as "currently being explored".
            for neighbor, _ in self.adjacency_list.get(node, []):
                if color.get(neighbor) == 0:  # If the neighbor is unvisited...
                    if dfs_cycle(neighbor):
                        return True
                elif color.get(neighbor) == 1:  # If we find a node that's "currently being explored"...
                    return True  # We've found a cycle!
            color[node] = 2  # Mark as completely done.
            return False

        for node in self.adjacency_list:
            color[node] = 0  # Start everyone as unvisited.
        for node in self.adjacency_list:
            if color[node] == 0:
                if dfs_cycle(node):
                    return True
        return False

    def topological_sort(self) -> Optional[List[int]]:
        """Arrange nodes so all dependencies come first. Only works if there's NO cycle."""
        if self.detect_cycle_directed():
            print("Can't sort topologically; there's a cycle!")
            return None
        visited = set()
        result_stack = []

        def topological_visit(node: int):
            visited.add(node)
            for neighbor, _ in self.adjacency_list.get(node, []):
                if neighbor not in visited:
                    topological_visit(neighbor)
            # After visiting all neighbors, add this node to the stack.
            result_stack.append(node)

        for node in self.adjacency_list:
            if node not in visited:
                topological_visit(node)
        # A node is appended only after everything it points to has been appended,
        # so reversing the stack puts prerequisites first.
        return result_stack[::-1]

    def connected_components(self) -> List[List[int]]:
        """Find all isolated groups in an undirected graph."""
        visited = set()
        components = []

        def dfs_component(start: int) -> List[int]:
            """Find all nodes reachable from 'start'."""
            stack = [start]
            component = []
            while stack:
                node = stack.pop()
                if node not in visited:
                    visited.add(node)
                    component.append(node)
                    for neighbor, _ in self.adjacency_list.get(node, []):
                        if neighbor not in visited:
                            stack.append(neighbor)
            return component

        for node in self.adjacency_list:
            if node not in visited:
                a_component = dfs_component(node)
                components.append(a_component)
        return components
# Let's test our traversal class with the earlier graph.
traversal = GraphTraversal(graph.adjacency_list)
path = traversal.bfs_shortest_path(0, 6)
print(f"BFS says the shortest path from node 0 to 6 is: {path}")
dfs_recursive = traversal.dfs_recursive(0)
dfs_iterative = traversal.dfs_iterative(0)
print(f"DFS (recursive) from node 0: {dfs_recursive}")
print(f"DFS (iterative) from node 0: {dfs_iterative}")
components = traversal.connected_components()
print(f"This graph has these connected groups: {components}")
# Example for cycle detection: A directed graph with a loop.
directed_graph_with_cycle = {
0: [(1, 1.0), (2, 1.0)],
1: [(3, 1.0)],
2: [(1, 1.0)],
3: [(4, 1.0)],
4: [(2, 1.0)] # This creates a cycle: 2 -> 1 -> 3 -> 4 -> 2
}
cycle_checker = GraphTraversal(directed_graph_with_cycle)
print(f"\nDoes the directed graph have a cycle? {cycle_checker.detect_cycle_directed()}")
# Now for a graph without a cycle.
directed_graph_acyclic = {
0: [(1, 1.0), (2, 1.0)],
1: [(3, 1.0)],
2: [(3, 1.0)],
3: [(4, 1.0)],
4: [] # No connection back.
}
acyclic_traversal = GraphTraversal(directed_graph_acyclic)
topological_order = acyclic_traversal.topological_sort()
print(f"A valid order to process tasks (topological sort) is: {topological_order}")
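As a quick sanity check (my addition, not part of the original walkthrough), NetworkX can answer the same questions, assuming the 'edges' list and the 'directed_graph_acyclic' dictionary defined above. Both approaches should agree on the number of hops and on the topological ordering being valid, even if the exact routes differ.

# Cross-check the traversal results with NetworkX.
G_check = nx.Graph()
G_check.add_edges_from([(u, v) for u, v, _ in edges])
print(f"NetworkX shortest path 0 -> 6: {nx.shortest_path(G_check, source=0, target=6)}")
print(f"NetworkX connected components: {[sorted(c) for c in nx.connected_components(G_check)]}")

D_check = nx.DiGraph()
for u, neighbors in directed_graph_acyclic.items():
    for v, _ in neighbors:
        D_check.add_edge(u, v)
print(f"NetworkX topological order: {list(nx.topological_sort(D_check))}")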
Now, what if the connections have different costs? A short road and a long highway both connect two cities, but they aren't the same. For this, we need shortest path algorithms that understand weights.
Dijkstra's algorithm is the most famous. It's like a smart BFS that always explores the cheapest known path next. It only works when all costs are zero or positive. You can't have a "negative distance."
For situations with potential negative costs (which can happen in financial or signal routing networks), we use the Bellman-Ford algorithm. It's slower but more general, and it can also tell us if there's a negative cost cycle—a loop you could travel forever to reduce the total cost infinitely, which usually indicates a problem in the model.
Let's code them.
import heapq
from typing import Dict, List, Tuple, Optional
import math
class ShortestPath:
    def __init__(self, adjacency_list: Dict[int, List[Tuple[int, float]]]):
        # Assumes every node appears as a key, even if its edge list is empty.
        self.adjacency_list = adjacency_list

    def dijkstra(self, start: int) -> Tuple[Dict[int, float], Dict[int, Optional[int]]]:
        """
        Find the cheapest cost from 'start' to every other node.
        Returns:
            distances: The lowest cost to reach each node.
            previous: The best previous node to trace the path back.
        """
        # Start with infinite cost for everyone.
        distances = {node: math.inf for node in self.adjacency_list}
        distances[start] = 0
        # Keep track of the node we came from for the best path.
        previous = {node: None for node in self.adjacency_list}
        # A priority queue: (current_best_distance, node)
        priority_queue = [(0, start)]
        while priority_queue:
            current_dist, current_node = heapq.heappop(priority_queue)
            # If we found a longer path to this node already, skip it.
            if current_dist > distances[current_node]:
                continue
            # Check all neighbors.
            for neighbor, weight in self.adjacency_list.get(current_node, []):
                distance = current_dist + weight
                # If this new path to the neighbor is cheaper...
                if distance < distances[neighbor]:
                    distances[neighbor] = distance
                    previous[neighbor] = current_node
                    heapq.heappush(priority_queue, (distance, neighbor))
        return distances, previous

    def dijkstra_path(self, start: int, target: int) -> Optional[Tuple[List[int], float]]:
        """Get the actual cheapest path and its total cost."""
        distances, previous = self.dijkstra(start)
        if distances[target] == math.inf:
            return None  # No path exists.
        # Reconstruct the path by following the 'previous' links.
        path = []
        node = target
        while node is not None:
            path.append(node)
            node = previous[node]
        path.reverse()
        return path, distances[target]

    def bellman_ford(self, start: int) -> Optional[Tuple[Dict[int, float], Dict[int, Optional[int]]]]:
        """
        Find shortest paths, allowing for negative weights.
        Returns None if a negative-weight cycle is reachable from the start.
        """
        distances = {node: math.inf for node in self.adjacency_list}
        distances[start] = 0
        previous = {node: None for node in self.adjacency_list}
        # We need to relax all edges (V-1) times, where V is number of nodes.
        nodes = list(self.adjacency_list.keys())
        for _ in range(len(nodes) - 1):
            updated = False
            for u in self.adjacency_list:
                for v, weight in self.adjacency_list.get(u, []):
                    if distances[u] + weight < distances[v]:
                        distances[v] = distances[u] + weight
                        previous[v] = u
                        updated = True
            if not updated:
                break  # No changes, we can stop early.
        # One more pass to check for negative-weight cycles.
        for u in self.adjacency_list:
            for v, weight in self.adjacency_list.get(u, []):
                if distances[u] + weight < distances[v]:
                    # We can still improve the path -> negative cycle exists.
                    print("Warning: Graph contains a negative-weight cycle reachable from start.")
                    return None
        return distances, previous
# Let's create a weighted graph.
weighted_graph = {
0: [(1, 4), (2, 1)],
1: [(3, 1)],
2: [(1, 2), (3, 5)],
3: [(4, 3)],
4: []
}
sp = ShortestPath(weighted_graph)
print("=== Dijkstra's Algorithm ===")
distances, previous = sp.dijkstra(0)
print("Cheapest cost from node 0 to all others:")
for node, dist in distances.items():
    print(f" To node {node}: cost = {dist:.1f}, came from {previous[node]}")
path_to_4 = sp.dijkstra_path(0, 4)
if path_to_4:
    path, cost = path_to_4
    print(f"\nCheapest path from 0 to 4: {path}, total cost: {cost}")
# Graph with a negative weight (but no negative cycle).
graph_with_neg = {
0: [(1, 4), (2, 5)],
1: [(3, 1), (2, -2)], # Negative weight here.
2: [(3, 3)],
3: []
}
print("\n=== Bellman-Ford Algorithm (with negative weight) ===")
sp_neg = ShortestPath(graph_with_neg)
bf_result = sp_neg.bellman_ford(0)
if bf_result:
    bf_distances, bf_previous = bf_result
    print("Costs from node 0 (via Bellman-Ford):")
    for node, dist in bf_distances.items():
        print(f" To node {node}: cost = {dist:.1f}")
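One more check worth doing (my addition, using a made-up graph): feed Bellman-Ford a graph that does contain a negative-weight cycle and confirm it refuses to return distances.

# A hypothetical graph with a negative-weight cycle (0 -> 1 -> 2 -> 0 sums to -3),
# just to confirm that bellman_ford() detects it and returns None.
graph_with_neg_cycle = {
    0: [(1, 1)],
    1: [(2, -2)],
    2: [(0, -2)]
}
sp_cycle = ShortestPath(graph_with_neg_cycle)
print("\n=== Bellman-Ford with a negative cycle ===")
result = sp_cycle.bellman_ford(0)
print(f"Result: {result}")  # Expect None after the warning message.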
Finding the shortest path is one thing. But what about the overall flow through a network? Imagine water pipes, internet data, or highway traffic. How much can we push from a source to a destination? That's a maximum flow problem.
The Edmonds-Karp algorithm is a specific implementation of the Ford-Fulkerson method. It repeatedly uses BFS to find the shortest augmenting path (the one with the fewest edges) and pushes flow along it until no path with spare capacity remains. The minimum cut that corresponds to this maximum flow marks the network's biggest bottleneck.
from collections import deque
class MaxFlow:
    """
    Finds the maximum flow from a source node to a sink node in a directed graph.
    Uses the Edmonds-Karp algorithm (Ford-Fulkerson with BFS).
    """
    def __init__(self, adjacency_list: Dict[int, List[Tuple[int, float]]]):
        # We'll store the graph as a capacity matrix for simplicity.
        self.nodes = set()
        for u in adjacency_list:
            self.nodes.add(u)
            for v, cap in adjacency_list[u]:
                self.nodes.add(v)
        self.n = max(self.nodes) + 1
        self.capacity = [[0] * self.n for _ in range(self.n)]
        # Build the capacity matrix from the adjacency list.
        for u in adjacency_list:
            for v, cap in adjacency_list[u]:
                self.capacity[u][v] = cap

    def bfs_augmenting_path(self, source: int, sink: int, parent: List[int]) -> bool:
        """Find a path from source to sink with spare capacity using BFS."""
        visited = [False] * self.n
        queue = deque([source])
        visited[source] = True
        while queue:
            u = queue.popleft()
            for v in range(self.n):
                # If we haven't visited v and there's spare capacity on edge u->v...
                if not visited[v] and self.capacity[u][v] > 0:
                    queue.append(v)
                    visited[v] = True
                    parent[v] = u
                    if v == sink:
                        return True
        return False

    def edmonds_karp(self, source: int, sink: int) -> float:
        """Calculate the maximum flow from source to sink."""
        parent = [-1] * self.n
        max_flow = 0
        # Keep finding paths while BFS can find one.
        while self.bfs_augmenting_path(source, sink, parent):
            # Find the minimum residual capacity along the path we found.
            path_flow = math.inf
            s = sink
            while s != source:
                path_flow = min(path_flow, self.capacity[parent[s]][s])
                s = parent[s]
            # Update capacities along the path.
            v = sink
            while v != source:
                u = parent[v]
                self.capacity[u][v] -= path_flow  # Reduce forward capacity.
                self.capacity[v][u] += path_flow  # Increase reverse capacity (for undo flow).
                v = parent[v]
            max_flow += path_flow
        return max_flow
# Example: A simple network of pipes.
flow_graph_adj = {
0: [(1, 10), (2, 10)], # Source (0) can send 10 to node 1 and 10 to node 2.
1: [(2, 2), (3, 15)],
2: [(4, 10)],
3: [(4, 10), (5, 15)],
4: [(5, 10)],
5: [] # Sink is node 5.
}
max_flow_solver = MaxFlow(flow_graph_adj)
source, sink = 0, 5
flow = max_flow_solver.edmonds_karp(source, sink)
print(f"\n=== Maximum Flow ===")
print(f"The maximum flow from node {source} to node {sink} is: {flow}")
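For comparison (not part of the hand-rolled solver above), NetworkX can compute the same flow and also report the minimum cut explicitly, assuming we rebuild the network with 'capacity' edge attributes:

# Cross-check with NetworkX: build a DiGraph with 'capacity' attributes from the
# same pipe network, then ask for the maximum flow and the minimum cut.
D_flow = nx.DiGraph()
for u, neighbors in flow_graph_adj.items():
    for v, cap in neighbors:
        D_flow.add_edge(u, v, capacity=cap)

nx_flow_value, nx_flow_dict = nx.maximum_flow(D_flow, source, sink)
cut_value, (reachable, non_reachable) = nx.minimum_cut(D_flow, source, sink)
print(f"NetworkX maximum flow: {nx_flow_value}")  # Should match our solver.
print(f"Minimum cut value: {cut_value}")          # Equal to the maximum flow.
print(f"Bottleneck split: {sorted(reachable)} | {sorted(non_reachable)}")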
Sometimes, we need to visit a series of points in the most efficient way, like a delivery truck visiting many houses. If you need to visit every node exactly once and return to start, that's the Traveling Salesperson Problem (TSP). It's famously hard for large graphs, but for small ones or with smart shortcuts, we can find good solutions.
A common approximate method uses a Minimum Spanning Tree (MST). A spanning tree connects all nodes with the fewest edges (no cycles). The "minimum" one does this with the lowest total weight. Kruskal's algorithm builds this by always adding the cheapest edge that doesn't create a loop.
Here’s Kruskal’s algorithm, followed by a simple 2-approximation for the metric TSP.
class UnionFind:
    """A helper class to manage groups of nodes (for Kruskal's algorithm)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, u):
        if self.parent[u] != u:
            self.parent[u] = self.find(self.parent[u])
        return self.parent[u]

    def union(self, u, v):
        root_u = self.find(u)
        root_v = self.find(v)
        if root_u != root_v:
            if self.rank[root_u] > self.rank[root_v]:
                self.parent[root_v] = root_u
            elif self.rank[root_u] < self.rank[root_v]:
                self.parent[root_u] = root_v
            else:
                self.parent[root_v] = root_u
                self.rank[root_u] += 1
            return True
        return False
class SpanningAndTSP:
    def __init__(self, adjacency_list: Dict[int, List[Tuple[int, float]]]):
        self.adjacency_list = adjacency_list
        self.nodes = list(adjacency_list.keys())
        self.n = len(self.nodes)
        self.node_to_index = {node: idx for idx, node in enumerate(self.nodes)}

    def kruskal_mst(self) -> List[Tuple[float, int, int]]:
        """Find the Minimum Spanning Tree using Kruskal's algorithm."""
        edges = []
        # Collect all edges with weights.
        for u in self.adjacency_list:
            for v, weight in self.adjacency_list[u]:
                # Avoid duplicates for undirected graph.
                if u < v:
                    edges.append((weight, self.node_to_index[u], self.node_to_index[v]))
        # Sort edges by weight.
        edges.sort()
        uf = UnionFind(self.n)
        mst_edges = []
        for weight, u_idx, v_idx in edges:
            if uf.union(u_idx, v_idx):
                mst_edges.append((weight, u_idx, v_idx))
                if len(mst_edges) == self.n - 1:
                    break
        # Convert indices back to original node labels.
        result = []
        index_to_node = {idx: node for node, idx in self.node_to_index.items()}
        for weight, u_idx, v_idx in mst_edges:
            result.append((weight, index_to_node[u_idx], index_to_node[v_idx]))
        return result

    def tsp_double_mst_approx(self, start_node: int) -> Tuple[List[int], float]:
        """
        A 2-approximation for the metric TSP.
        Steps: 1. Find an MST. 2. Do a DFS preorder walk on the MST.
        This gives a tour that visits all nodes (with repeats).
        We then shortcut to get a Hamiltonian cycle.
        """
        if start_node not in self.node_to_index:
            raise ValueError("Start node not in graph.")
        # Step 1: Build MST (as an adjacency list).
        mst_edges = self.kruskal_mst()
        mst_adj = {node: [] for node in self.nodes}
        for weight, u, v in mst_edges:
            mst_adj[u].append((v, weight))
            mst_adj[v].append((u, weight))
        # Step 2: Perform DFS to get a preorder traversal.
        visited = set()
        preorder = []

        def dfs_preorder(node):
            visited.add(node)
            preorder.append(node)
            for neighbor, _ in mst_adj.get(node, []):
                if neighbor not in visited:
                    dfs_preorder(neighbor)

        dfs_preorder(start_node)
        # Step 3: Create the tour by adding the start node at the end.
        tour = preorder + [start_node]
        # Step 4: Shortcut - calculate the cost using the original graph's weights.
        # We assume the graph is complete for a true TSP; here we approximate from given edges.
        cost = 0
        for i in range(len(tour) - 1):
            u, v = tour[i], tour[i + 1]
            # Find the weight in the original graph. (Assumes edge exists).
            found = False
            for neighbor, w in self.adjacency_list.get(u, []):
                if neighbor == v:
                    cost += w
                    found = True
                    break
            if not found:
                # If no direct edge, this approximation is not valid for this graph.
                # In practice, for TSP, we'd have a complete graph with distances.
                cost += math.inf
        return tour, cost
# Example: A small graph to find an MST and approximate TSP.
mst_graph_adj = {
0: [(1, 10), (2, 15), (3, 20)],
1: [(0, 10), (2, 35), (3, 25)],
2: [(0, 15), (1, 35), (3, 30)],
3: [(0, 20), (1, 25), (2, 30)]
}
solver = SpanningAndTSP(mst_graph_adj)
mst = solver.kruskal_mst()
print("\n=== Minimum Spanning Tree (Kruskal's) ===")
print("Edges in the MST (weight, node1, node2):")
for edge in mst:
    print(f" {edge}")
print("\n=== Approximate TSP Tour (Double MST method) ===")
tour, approx_cost = solver.tsp_double_mst_approx(0)
print(f"Tour visiting all nodes starting and ending at 0: {tour}")
print(f"Approximate total tour cost: {approx_cost}")
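Because this example graph has only four nodes, we can brute-force every possible tour and see how close the 2-approximation gets. This check is my own addition, not part of the algorithm, and it assumes the complete 'mst_graph_adj' dictionary defined above.

# Brute-force the exact optimum for this tiny complete graph and compare it
# with the double-MST approximation printed above.
from itertools import permutations

def edge_weight(adj, u, v):
    """Look up the weight of the edge u-v in an adjacency list."""
    return next(w for n, w in adj[u] if n == v)

best_cost, best_tour = math.inf, None
other_nodes = [n for n in mst_graph_adj if n != 0]
for perm in permutations(other_nodes):
    candidate = [0, *perm, 0]
    candidate_cost = sum(edge_weight(mst_graph_adj, candidate[i], candidate[i + 1])
                         for i in range(len(candidate) - 1))
    if candidate_cost < best_cost:
        best_cost, best_tour = candidate_cost, candidate

print(f"Exact optimal tour (brute force): {best_tour}, cost: {best_cost}")
print("The approximate tour above stays within twice this optimal cost, as promised.")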
Finally, let's discuss how to find important nodes. Centrality measures help with this. We already saw degree centrality (how many connections). Another powerful one is betweenness centrality. It finds nodes that act as bridges—the ones that sit on the shortest paths between many other pairs of nodes. Controlling such a node can control the flow of information in a network.
Here's a way to compute betweenness centrality using a modified BFS for each node.
class CentralityMeasures:
    def __init__(self, adjacency_list: Dict[int, List[Tuple[int, float]]]):
        self.adjacency_list = adjacency_list
        self.nodes = list(adjacency_list.keys())

    def betweenness_centrality(self) -> Dict[int, float]:
        """Compute betweenness centrality for all nodes (unweighted graph)."""
        centrality = {node: 0.0 for node in self.nodes}
        for s in self.nodes:
            # Single-source shortest paths from node s.
            # We'll use BFS since the graph is unweighted for this example.
            stack = []  # Will hold nodes in order of discovery.
            pred = {w: [] for w in self.nodes}  # Predecessors on shortest paths.
            dist = {w: -1 for w in self.nodes}
            sigma = {w: 0 for w in self.nodes}  # Number of shortest paths.
            dist[s] = 0
            sigma[s] = 1
            queue = deque([s])
            while queue:
                v = queue.popleft()
                stack.append(v)
                for neighbor, _ in self.adjacency_list.get(v, []):
                    if dist[neighbor] < 0:  # First time seeing neighbor?
                        queue.append(neighbor)
                        dist[neighbor] = dist[v] + 1
                    if dist[neighbor] == dist[v] + 1:  # Is this a shortest path to neighbor?
                        sigma[neighbor] += sigma[v]
                        pred[neighbor].append(v)
            # Accumulation - distribute centrality backwards.
            delta = {w: 0.0 for w in self.nodes}
            while stack:
                w = stack.pop()
                for v in pred[w]:
                    delta[v] += (sigma[v] / sigma[w]) * (1.0 + delta[w])
                if w != s:
                    centrality[w] += delta[w]
        # Normalization for an undirected graph: each pair (s, t) is counted twice
        # (once from s, once from t), and there are (n - 1)(n - 2) / 2 pairs that
        # exclude a given node, so we divide by (n - 1)(n - 2) overall.
        n = len(self.nodes)
        if n > 2:
            for node in centrality:
                centrality[node] /= (n - 1) * (n - 2)
        return centrality
# Let's compute centrality on our first simple graph.
centrality_calculator = CentralityMeasures(graph.adjacency_list)
betweenness = centrality_calculator.betweenness_centrality()
print("\n=== Betweenness Centrality ===")
print("Nodes that are crucial bridges (high betweenness):")
for node, cent in sorted(betweenness.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f" Node {node}: {cent:.4f}")
To bring it all together, Python's NetworkX library provides production-ready implementations for almost everything we've built. It's what I use for real projects after prototyping. Here’s a quick example of using NetworkX to do in one line what took us many.
print("\n=== Using NetworkX for Production ===")
import networkx as nx
# Create a graph in one line.
G = nx.Graph()
G.add_edges_from([(u, v) for u, v, _ in edges]) # Using our original 'edges' list.
# Compute various metrics effortlessly.
print(f"NetworkX - Number of nodes: {G.number_of_nodes()}")
print(f"NetworkX - Number of edges: {G.number_of_edges()}")
print(f"NetworkX - Average clustering: {nx.average_clustering(G):.3f}")
print(f"NetworkX - Betweenness centrality (first 3 nodes):")
nx_betweenness = nx.betweenness_centrality(G)
for node in sorted(G.nodes())[:3]:
    print(f" Node {node}: {nx_betweenness[node]:.4f}")
# Find the MST.
mst_nx = nx.minimum_spanning_tree(G)
print(f"\nNetworkX MST has {mst_nx.number_of_edges()} edges.")
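Finally, the matplotlib import at the top and the to_networkx helper haven't earned their keep yet, so here is a small optional sketch that draws the first example graph. The layout seed, colors, and output filename are just my choices.

# Draw our original hand-built graph using its to_networkx() helper.
# spring_layout positions are random unless we fix a seed.
G_draw = graph.to_networkx()
pos = nx.spring_layout(G_draw, seed=42)
nx.draw(G_draw, pos, with_labels=True, node_color="lightblue", edge_color="gray")
plt.title("Our example graph")
plt.savefig("example_graph.png")  # Or plt.show() in an interactive session.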
Graphs are everywhere. Starting with a clear mental model of nodes and edges is half the battle. Whether you're choosing the right way to store your data, walking through it with BFS or DFS, finding optimal paths and flows, or identifying key players, these techniques form a core toolkit. I find that implementing them from scratch once, as we did here, makes using powerful libraries like NetworkX much more intuitive. You understand not just the 'how' but the 'why' behind the results. The next time you look at a social network map, a transportation grid, or a dependency chart, you'll see not just connections, but the rich structure and potential stories hidden within them.