Exercise: Power method for large graphsΒΆ

In the lecture, we used the eig function to compute pagerank. Since this is unsuitable for large graphs, we pursue an alternate idea, also from the lecture: the pagerank vector, being the eigenvector of the dominant eigenvalue of a positive stochastic matrix, is the limit of the sequence $$ x, \quad (P^t) x, \quad (P^t)^2 x, \quad (P^t)^3x, \quad \ldots, $$ which we can computationally approach. This is an instance of the power method, a topic well studied in numerical analysis. This method only needs to apply the matrix $P^t$ repeatedly (and has no need for other operations found in general eigensolvers that carry more memory overhead). This exercise shows you the practical problems with using a general eigensolver as the graph size increases and asks you to implement the power method to be able to solve on large graphs.

Task 1: Compute the stationary distribution of the Markov chain from Gambler's ruin with $p = 0.4, N = 10$. (Do this task with eig.) Do you get more than one stationary distribution?

Task 2: Consider the adjacency matrix (independent of $p$) of the random walk and introduce a restart probability. Using it, compute the pagerank for all states of the Markov chain with $N = 10$ and restart probabilities $r=0.1, 0.01, 10^{-3},$ and $10^{-4}.$

Task 3: Compute the pagerank of the ruin state for $r=0.1, N=1000$. How much farther can you go increasing $N$ in your computing environment, while continuing to use eig?

Task 4: Implement the power method for a positive stochastic matrix $P$ as a python function, with the inputs indicated below.

def powerP(x, aPt, r=0.1, maxn=1000, tol=1e-10):
    """ Apply power iterations to a positive stochastic matrix P.    
     - x: initial probability distribution for the power method,
     - aPt: function that returns P.T @ x given x
     - r: restart probability, 
     - maxn: apply at most 'maxn' power iterations
     - tol: quit if successive iterations differ by less than 'tol'. 
    OUTPUT: Vector of pageranks if converged. """

This function should start with a random initial probability distribution $x$ and compute $(P^t)^n x$ for increasing $n$ until $\| (P^t)^n x - (P^t)^{n-1} x \|$ becomes smaller than a given input tolerance. To save memory and flops, you should not create the transition matrix $P$ in memory, rather, you should make a function that applies $P^t$ to a vector, and pass that function as one of the arguments aPt to the powerP function.

Apply your function to compute the pagerank of the ruin state for $r=0.1, N=100000$.