In the lecture, we used the eig
function to compute pagerank. Since this is unsuitable for large graphs, we pursue an alternate idea, also from the lecture: the pagerank vector, being the eigenvector of the dominant eigenvalue of a positive stochastic matrix, is the limit of the sequence
$$
x, \quad (P^t) x, \quad (P^t)^2 x, \quad (P^t)^3x, \quad \ldots,
$$
which we can computationally approach. This is an instance of the power method, a topic well studied in numerical analysis. This method only needs to apply the matrix $P^t$ repeatedly (and has no need for other operations found in general eigensolvers that carry more memory overhead). This exercise shows you the practical problems with using a general eigensolver as the graph size increases and asks you to implement the power method to be able to solve on large graphs.
Task 1: Compute the stationary distribution of the Markov chain from Gambler's ruin with $p = 0.4, N = 10$. (Do this task with eig
.) Do you get more than one stationary distribution?
Task 2: Consider the adjacency matrix (independent of $p$) of the random walk and introduce a restart probability. Using it, compute the pagerank for all states of the Markov chain with $N = 10$ and restart probabilities $r=0.1, 0.01, 10^{-3},$ and $10^{-4}.$
Task 3: Compute the pagerank of the ruin state for $r=0.1, N=1000$. How much farther can you go increasing $N$ in your computing environment, while continuing to use eig
?
Task 4: Implement the power method for a positive stochastic matrix $P$ as a python function, with the inputs indicated below.
def powerP(x, aPt, r=0.1, maxn=1000, tol=1e-10):
""" Apply power iterations to a positive stochastic matrix P.
INPUTS:
- x: initial probability distribution for the power method,
- aPt: function that returns P.T @ x given x
- r: restart probability,
- maxn: apply at most 'maxn' power iterations
- tol: quit if successive iterations differ by less than 'tol'.
OUTPUT: Vector of pageranks if converged. """
This function should start with a random initial probability distribution $x$ and compute $(P^t)^n x$ for increasing $n$ until $\| (P^t)^n x - (P^t)^{n-1} x \|$ becomes smaller than a given input tolerance. To save memory and flops, you should not create the transition matrix $P$ in memory, rather, you should make a function that applies $P^t$ to a vector, and pass that function as one of the arguments aPt
to the powerP
function.
Apply your function to compute the pagerank of the ruin state for $r=0.1, N=100000$.