In this project I explore the dynamics of random walks via computational simulation.
Consider a particle which starts at the origin of the $x-y$ plane. This particle will take a series of steps of magnitude 1 in a random direction. What will the particle's trajectory look like?
Below I implement a 2-D random walk function. The angle of each unit step, $\theta$, is drawn from a uniform distribution on $[0, 2\pi]$.
import random
import numpy as np
import matplotlib.pyplot as plt

def randomWalks2D(numSteps):
    # set up a matrix of zeros to store the locations of the random walk
    locations = np.zeros((numSteps + 1, 2))
    # take steps and store locations
    for i in range(1, numSteps + 1):
        theta = random.random() * 2 * np.pi  # random theta between 0 and 2 pi
        move = np.array([np.cos(theta), np.sin(theta)])
        locations[i] = locations[i - 1] + move
    return locations
walk = randomWalks2D(20)
plt.plot(walk[:, 0], walk[:, 1], alpha=0.5)
plt.plot([0], [0], 'r.')
plt.xlim(-10, 10)
plt.ylim(-10, 10);

walk = randomWalks2D(20)
plt.plot(walk[:, 0], walk[:, 1], alpha=0.5)
plt.plot([0], [0], 'r.')
plt.xlim(-10, 10)
plt.ylim(-10, 10);
Above are two separate 20-step simulations of the $\texttt{randomWalks2D}$ function; each walk starts at the origin (marked with a red dot). We can see that it is working properly.
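As one more quick check (a small sketch, not part of the assignment), every displacement between consecutive locations should be a unit vector:
# Sanity check (sketch): each step of a walk should have length 1.
test_walk = randomWalks2D(1000)
step_lengths = np.linalg.norm(np.diff(test_walk, axis=0), axis=1)
print(np.allclose(step_lengths, 1.0))  # expect True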
1) a) What is the average distance of the random walk from the origin after n steps? Make a plot showing how the average distance of the random walk from the origin depends on n, and find a function that roughly fits your plot.
def avg_dist(numWalks, numSteps):
    total = 0
    for i in range(numWalks):
        walk = randomWalks2D(numSteps)
        total = total + np.linalg.norm(walk[-1])
    return total / numWalks
list_of_avg_dists = [avg_dist(500, n) for n in range(1, 100)]
plt.plot(range(1, 100), list_of_avg_dists)
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance");
The above graph shows the average distance from the center after $n$ steps. Each data point represents 500 simulations. This looks like it could be a square root graph. To check, I will square the values to see if I get a straight line and find the line of best fit.
sq_vals = [d**2 for d in list_of_avg_dists]
plt.plot(range(1, 100), sq_vals)
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance^2");
Looks pretty linear! I will use $\texttt{scipy}$ to find the line of best fit.
from scipy import stats
result = stats.linregress(range(1,100), sq_vals)
result
LinregressResult(slope=0.7794837440338905, intercept=0.3649916377464777, rvalue=0.9947602039191759, pvalue=6.93612006056966e-98, stderr=0.008134021821916304)
m = result.slope
b = result.intercept
rootFit = [np.sqrt(m*x+b) if m*x+b > 0 else 0 for x in range(1,100)]
plt.plot(range(1,100), rootFit,'r-')
plt.plot(range(1,100), list_of_avg_dists, 'b-')
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance");
As we can see above, the average distance after $n$ steps roughly follows the curve $y = (0.779\,n + 0.365)^{1/2}$.
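As an aside (not required by the problem), a standard asymptotic result for a 2-D random walk with unit steps says the mean end-to-end distance grows like $\sqrt{\pi n}/2 \approx 0.886\sqrt{n}$, so the squared average distance should grow with slope $\pi/4 \approx 0.785$. A quick sketch comparing this to the fitted slope:
# Compare the fitted slope to the theoretical large-n value pi/4 (sketch, not part of the assignment).
print("theoretical slope (pi/4):", np.pi / 4)
print("fitted slope:            ", m)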
(b) Since the walk is not on a grid, it’s very unlikely that it will return exactly to (0, 0). However, it may return to within a circle of radius 1/2 around the origin. What proportion of n-step random walks return at least once to within 1/2 unit of the origin? How does this proportion depend on n? Investigate for various n and make a plot of your results. Do you think that all random walks will eventually return near the origin? Why or why not?
Below is my implementation. The $\texttt{return_to_origin}$ function takes a random-walk trajectory and checks whether any position after the starting point is within $1/2$ of the origin. The $\texttt{prop_returns}$ function runs many simulations with a given number of steps and returns the proportion of them that returned within $1/2$ of the origin.
def return_to_origin(walk):
    for position in walk[1:]:
        if np.linalg.norm(position) < 1/2:
            return True
    return False

def prop_returns(numWalks, numSteps):
    count = 0
    for _ in range(numWalks):
        walk = randomWalks2D(numSteps)
        if return_to_origin(walk):
            count = count + 1
    return count / numWalks
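As a cross-check on $\texttt{return_to_origin}$ (just a sketch of an equivalent vectorized version), the same test can be written in one line with $\texttt{numpy}$:
# Vectorized equivalent of return_to_origin (sketch): check all positions after the start at once.
def return_to_origin_vec(walk):
    return bool(np.any(np.linalg.norm(walk[1:], axis=1) < 1/2))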
list_of_prop_returns = [prop_returns(500, n) for n in range(1, 300)]
plt.plot(range(1, 300), list_of_prop_returns, 'b.')
plt.xlabel("Number of Steps")
plt.ylabel("Proportion Returning to Origin");
The proportion of random walks that return to the circle of radius 1/2 centered at the origin increases as the number of steps increases. Up to 300 steps the growth slows down, but it does not seem to level off at an asymptote. Intuitively, it makes sense to believe that the proportion will eventually approach 100% as the number of steps goes to infinity.
2) What happens if you constrain the random walk to the region −5 ≤ y ≤ 5? Modify your code to keep the y-coordinate of the walk between −5 and 5. You may decide exactly how to implement this modification, but explain your choice and make a few plots of random walks to show that your modified implementation works. Then answer questions (a) and (b) for this constrained random walk.
To keep the random walk within the region $-5 < y < 5$, I will reroll a random step whenever it would take the walk outside of the bounds. A rerolled step does not count as a step, so if the number of steps is 10, the walk still takes 10 steps, all within the bounds.
def constrained_randomWalks2D(numSteps):
    # set up a matrix of zeros to store the locations of the random walk
    locations = np.zeros((numSteps + 1, 2))
    # initialize counter
    i = 1
    # changed the for loop to a while loop so rejected steps can be rerolled
    while i <= numSteps:
        theta = random.random() * 2 * np.pi
        move = np.array([np.cos(theta), np.sin(theta)])
        yloc = locations[i - 1, 1] + np.sin(theta)
        if -5 < yloc < 5:
            locations[i] = locations[i - 1] + move
            i = i + 1  # only increase the counter if a new location is stored
    return locations
Here are a few tests to make sure the function is working properly:
randomWalks2D(5)
array([[ 0.        ,  0.        ],
       [ 0.09257911,  0.99570533],
       [-0.9008849 ,  0.88155952],
       [-1.90020081,  0.84457692],
       [-2.36258823,  1.73125492],
       [-1.65920686,  1.02044225]])
constrained_randomWalks2D(5)
array([[ 0.        ,  0.        ],
       [-0.70106604,  0.71309635],
       [ 0.29678078,  0.77868382],
       [-0.36704162,  1.52657406],
       [-1.35540853,  1.67866238],
       [-2.34890155,  1.56476939]])
The output of $\texttt{constrained_randomWalks2D(5)}$ has the same shape as the output of the original function, so I know that it will work with the other functions I have written above.
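A quick programmatic version of this shape check (a small sketch):
# Both functions should return an array of shape (numSteps + 1, 2).
print(randomWalks2D(5).shape, constrained_randomWalks2D(5).shape)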
walk = constrained_randomWalks2D(1000)
plt.figure(figsize=(4, 3))
plt.plot(walk[:, 0], walk[:, 1], alpha=0.5)
plt.plot([0], [0], 'r.')
plt.xlim(-20, 20)
plt.ylim(-10, 10);
The plot above shows a constrained random walk with 1000 steps. We can see that the walk stays within $-5 < y < 5$: it never steps outside of these bounds.
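We can also verify the bound programmatically (a small sketch using the walk plotted above):
# Every y-coordinate of the constrained walk should lie strictly between -5 and 5.
print(np.all(np.abs(walk[:, 1]) < 5))  # expect True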
Now that I know this works as intended, let's see how the constraint changes the behavior of the random walk. Let's first take a look at the average distance.
def avg_dist2(numWalks, numSteps):
    total = 0
    for i in range(numWalks):
        walk = constrained_randomWalks2D(numSteps)
        total = total + np.linalg.norm(walk[-1])
    return total / numWalks
list_avg_dist2 = [avg_dist2(500, n) for n in range(1, 100)]
plt.figure(figsize=(4, 3))
plt.plot(range(1, 100), list_avg_dist2)
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance");
The average distance for the constrained random walk has a similar shape to the unconstrained walk. Let's see what they look like on top of each other.
plt.plot(range(1, 100), list_avg_dist2, label="Constrained")
plt.plot(range(1, 100), list_of_avg_dists, label="Unconstrained")
plt.legend(loc="lower right");
It looks like the two curves start to deviate around 15 steps.
I will also find the curve of best fit like before.
sq_vals2 = [d**2 for d in list_avg_dist2]
plt.plot(range(1, 100), sq_vals2);
This curve looks roughly linear, like the one for the unconstrained walk. I will find the line that best fits it:
result2 = stats.linregress(range(1, 100), sq_vals2)
slope2 = result2.slope
intercept2 = result2.intercept
print(slope2, intercept2)
0.39849665802616036 4.8793205200891805
rootFit2 = [np.sqrt(slope2*x + intercept2) if slope2*x + intercept2 > 0 else 0 for x in range(1, 100)]
plt.plot(range(1,100), rootFit2,'r-')
plt.plot(range(1,100), list_avg_dist2, 'b-')
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance");
Clearly, this graph does not follow a curve like $y = a\sqrt{x}$. I will see if a different power law fits the data better.
If the data follows a curve that looks like $$y = A x^B,$$ then $$\log(y) = \log(A) + B \log(x)$$ which is an equation of a line, and we know how to find the parameters for a line. I will use this to find $A$ and $B$.
from scipy import stats

log_avg_dist2 = [np.log(d) for d in list_avg_dist2]
loglog_result = stats.linregress(np.log(range(1, 100)), log_avg_dist2)
A = np.exp(loglog_result.intercept)
B = loglog_result.slope
fit = [A*x**B for x in range(1, 100)]
plt.plot(range(1, 100), fit, 'r-')
plt.plot(range(1, 100), list_avg_dist2, 'b-')
plt.xlabel("Number of Steps")
plt.ylabel("Average Ending Distance")
print('A = ', A)
print('B = ', B)
A =  1.0958975146963732
B =  0.39194147957097547
The average ending distance follows the curve $y = 1.10\, x^{0.39}$ nicely.
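To put a number on "nicely" (an optional sketch reusing the lists computed above), we can compare the root-mean-square error of the power-law fit with that of the earlier square-root fit; the power-law value should come out smaller, consistent with the two plots:
# RMS residuals of the two candidate fits against the constrained-walk averages.
rms_sqrt = np.sqrt(np.mean((np.array(rootFit2) - np.array(list_avg_dist2))**2))
rms_power = np.sqrt(np.mean((np.array(fit) - np.array(list_avg_dist2))**2))
print("square-root fit RMS error:", rms_sqrt)
print("power-law fit RMS error:  ", rms_power)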
Now, let's find how often the constrained walk returns to within $1/2$ unit of the origin.
def prop_returns2(numWalks, numSteps):
    count = 0
    for _ in range(numWalks):
        walk = constrained_randomWalks2D(numSteps)
        if return_to_origin(walk):
            count = count + 1
    return count / numWalks
list_con_returns = [prop_returns2(500, n) for n in range(1, 300)]
plt.plot(range(1, 300), list_con_returns, 'r.')
plt.plot(range(1, 300), list_of_prop_returns, 'b.')
plt.xlabel("Number of Steps")
plt.ylabel("Proportion Returning to Origin");
Above, in blue I have the proportion of random walks that return to the circle of radius 1/2 centered at the origin for various numbers of steps for the unconstrained walk, and in red I have the same quantity for the constrained walk. We can see that initially the two follow the same trend, but as the number of steps increases, the constrained walk becomes more likely to return to the center. This makes sense: the constrained walks cannot wander farther than ±5 in the y direction, so you would expect more of them to return to the origin than in the unconstrained version.
3) Investigate the average diameter for both the constrained and unconstrained random walks.
Here I define the diameter of a walk as the largest distance from the origin that the walk reaches. I expect this to behave differently from the average ending distance, which I looked at above.
My implementation is below.
# get the largest distance from the origin reached in a given walk
def diameter(walk):
    dia_list = np.zeros(len(walk))
    for i in range(len(walk)):
        dia_list[i] = np.linalg.norm(walk[i])
    return max(dia_list)

# average diameter, where I can choose which random walk I want: constrained or unconstrained
def avg_diameter(numSteps, numWalks, constraint):
    total = 0
    for _ in range(numWalks):
        if constraint:
            walk = constrained_randomWalks2D(numSteps)
        else:
            walk = randomWalks2D(numSteps)
        total = total + diameter(walk)
    return total / numWalks
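As an aside (a sketch equivalent to the loop in $\texttt{diameter}$), the same maximum distance can be computed with a single vectorized call:
# Vectorized equivalent of diameter(): maximum distance from the origin over all locations.
def diameter_vec(walk):
    return np.linalg.norm(walk, axis=1).max()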
con_list_dias = [avg_diameter(n, 500, True) for n in range(1, 100)]
un_list_dias = [avg_diameter(n, 500, False) for n in range(1, 100)]
plt.plot(range(1, 100), con_list_dias, 'r.', label="Constrained")
plt.plot(range(1, 100), un_list_dias, 'b.', label="Unconstrained")
plt.xlabel("Number of Steps")
plt.ylabel("Average Diameter")
plt.legend(loc="lower right");
Just like for the ending distances, the unconstrained random walks reach larger average diameters than the constrained random walks. My hypothesis for why this happens is that the constrained version rerolls any step whose y-component would take the walk outside of -5 to 5, so the walk cannot drift far in the y direction and its maximum distance from the origin stays smaller than in the unconstrained case.