# questions about a simple random walk

Discussion in 'Probability' started by hanrahan398, Nov 9, 2010.

1. ### hanrahan398 (Guest)

Consider the simple random walk S_n = sum_{i=1}^n A_i, where each A_i
is equiprobably +1 or -1.

Run a large number of trials, each of length n.

Clearly S_n has mean zero and standard deviation sqrt(n), so we can
define the z-score Z_n = S_n / sqrt(n).
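As a sanity check, here is a small Monte Carlo sketch (my own code; the
trial count and n are arbitrary) confirming that Z_n has mean about 0 and
standard deviation about 1:

```python
import random

# Monte Carlo sketch: S_n should have mean ~0 and standard deviation
# ~sqrt(n), so Z_n = S_n/sqrt(n) should have mean ~0 and sd ~1.
random.seed(0)
trials, n = 20000, 100

zs = []
for _ in range(trials):
    s_n = sum(random.choice((-1, 1)) for _ in range(n))  # S_n for one trial
    zs.append(s_n / n ** 0.5)                            # z-score Z_n

mean = sum(zs) / trials
sd = (sum((z - mean) ** 2 for z in zs) / trials) ** 0.5
print(round(mean, 2), round(sd, 2))
```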

Now define the statistic u_n = sum_{i=1}^n (Z_i / n^k), where k >= 1
is a constant.

Run a large number of trials.

1) is it true that its mean is zero?

2) what is its expected standard deviation?

3) is it normal? if not, what is the probability that u_n > t, where t
is a positive constant?

Thanks!
Michael

hanrahan398, Nov 9, 2010

2. ### Ray Vickson (Guest)

Are the Z_1, Z_2, ..., Z_n computed from a single A_i-sample of size n
(case 1), or are they separate z-scores from n independent samples of
size n (case 2)? I will assume you mean case 1, so Z_1 = A_1/sqrt(1),
Z_2 = (A_1 + A_2)/sqrt(2), etc.

(1) Since u_n is a sum of mean-zero random variables, it has mean
zero: the mean of the sum is the sum of the means---no independence
needed.

(2) However, the terms of u_n are correlated, so computing the
variance is more complicated. We can pull the 1/n^k outside the
summation that defines u_n, so it is enough to look at v_n =
sum_{i=1}^n Z_i, with u_n = v_n / n^k. We have E(Z_i) = 0 and
Var(Z_i) = 1 for all i. For i < j we have Z_j = (A_1 + ... + A_j)/
sqrt(j) = [S_i + S_{j-i}]/sqrt(j), where S_{j-i} = A_{i+1}+...+A_j.
Then E(Z_i Z_j) = E[S_i * (S_i + S_{j-i})]/sqrt(i*j) =
Var(S_i)/[sqrt(i)*sqrt(j)] + 0, since S_i and S_{j-i} are independent
with zero means. Thus E(Z_i * Z_j) = i/[sqrt(i)*sqrt(j)] = sqrt(i/j),
hence cov(Z_i,Z_j) = E(Z_i * Z_j) - E(Z_i)*E(Z_j) = sqrt(i/j).
Therefore Var(v_n) = sum(Var(Z_i)) + 2*sum_{i < j} cov(Z_i,Z_j) =
n + 2*sum_{i < j} sqrt(i/j).
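This covariance formula can be checked numerically; the sketch below (my
own code, with arbitrary n and trial count) compares Var(v_n) = n +
2*sum_{i<j} sqrt(i/j) against a Monte Carlo estimate:

```python
import random

# Compare the closed-form variance of v_n = Z_1 + ... + Z_n (note the
# factor of 2 on the cross terms) with a simulated estimate.
random.seed(1)
n, trials = 20, 50000

formula = n + 2 * sum((i / j) ** 0.5
                      for j in range(2, n + 1) for i in range(1, j))

vs = []
for _ in range(trials):
    s, v = 0, 0.0
    for i in range(1, n + 1):
        s += random.choice((-1, 1))   # running sum S_i
        v += s / i ** 0.5             # accumulate Z_i = S_i/sqrt(i)
    vs.append(v)

mc = sum(x * x for x in vs) / trials  # E[v_n^2] = Var(v_n), since mean is 0
print(round(formula, 2), round(mc, 2))
```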

(3) The random variables v_n (and u_n) are NOT normally distributed,
since the terms Z_i are not normally distributed. However, for large
n, most of the summands Z_i involve a large number of terms, so they
will be "approximately" normal. In fact, we can try to look at the
moment-generating function M(w) = E[exp(w*u_n)] for large n, and see
whether it is roughly of the form exp(b*w^2) for some constant b > 0,
which would be the mgf of a mean-zero normal random variable.
This will be difficult to compute exactly.

R.G. Vickson

Ray Vickson, Nov 9, 2010

3. ### hanrahan398 (Guest)

Many thanks for all the work you've done here, Ray - I am much
obliged!

I have to admit, I am close to the limit of my current level of
understanding of statistics here.

I did mean case 1, with Z_i's defined as you say.

But unfortunately I mistyped the definition of u_n. Rather than
"u_n = sum_{i=1}^n (Z_i / n^k), where k >= 1 is a constant"
I should have typed:
"u_n = sum_{i=1}^n (Z_i / i^k), where k >= 1 is a constant"
and then further defined
t_n = u_n / (sum_{j=1}^n (1/j^k)).

Thus for example u_3 is the sum Z_1/(1^k) + Z_2/(2^k) + Z_3/(3^k).

If k=1 then t_3 is (Z_1/1 + Z_2/2 + Z_3/3 ) / (1 + 1/2 + 1/3).

If k=2, then t_3 is (Z_1/1 + Z_2/4 + Z_3/9 ) / (1 + 1/4 + 1/9).

I.e., given a large number of trials of length n, what are the
expected mean and s.d. of t_n, and does it tend to normality for
large n?

The basic idea is that a lot of 1's or -1's near the beginning have a
bigger effect on t_n than a lot near the end.
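For concreteness, here is how one could estimate the mean and s.d. of t_n
empirically (a sketch; the choices of n, k, and trial count are mine):

```python
import random

# Simulate t_n = u_n / H, where u_n = sum_{i=1}^n Z_i / i^k and
# H = sum_{i=1}^n 1/i^k, here with k = 1.
random.seed(2)
n, k, trials = 50, 1, 20000
h = sum(1 / i ** k for i in range(1, n + 1))

ts = []
for _ in range(trials):
    s, u = 0, 0.0
    for i in range(1, n + 1):
        s += random.choice((-1, 1))        # running sum S_i
        u += (s / i ** 0.5) / i ** k       # Z_i / i^k
    ts.append(u / h)

mean = sum(ts) / trials
sd = (sum((t - mean) ** 2 for t in ts) / trials) ** 0.5
print(round(mean, 3), round(sd, 3))
```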

Thanks!
Kind regards,
Michael

hanrahan398, Nov 10, 2010

4. ### Ray Vickson (Guest)

I will follow the Maple convention of writing a/[b*c] as a/b/c, etc.

The terms of u_n are A_1/sqrt(1)/1^k, (A_1+A_2)/sqrt(2)/2^k,
(A_1+A_2+A_3)/sqrt(3)/3^k, etc. In u_n, what is the coefficient of
A_1? It is U_1 = 1/sqrt(1)/1^k + 1/sqrt(2)/2^k + 1/sqrt(3)/3^k + ...
+ 1/sqrt(n)/n^k. The coefficient of A_2 is U_2 = 1/sqrt(2)/2^k + ...
+ 1/sqrt(n)/n^k, ... and the coefficient of A_i is U_i =
sum{1/sqrt(j)/j^k, j = i..n}.

Thus, u_n has the form u_n = U_1*A_1 + U_2*A_2+ ... + U_n*A_n, where
U_i is written above. The terms are independent because the A_i are
mutually independent, so E(u_n)= 0 (sum of terms of mean 0), and
Var(u_n) = sum{U_i^2 * Var(A_i), i=1..n} = sum{[sum{1/sqrt(j)/j^k,
j=i..n}]^2, i=1..n}. (The variance of a sum of independent terms
of the form U_i*A_i is the sum of the variances; and Var(U_i*A_i) =
U_i^2 * Var(A_i) = U_i^2.)
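This variance formula can also be checked numerically (a sketch; n, k, and
the trial count are my arbitrary choices):

```python
import random

# Check Var(u_n) = sum_i U_i^2, where U_i = sum_{j=i}^n 1/(sqrt(j)*j^k),
# against a Monte Carlo estimate of u_n = sum_i U_i * A_i.
random.seed(3)
n, k, trials = 30, 1, 50000

# U[0] is U_1, U[1] is U_2, etc.
U = [sum(1 / (j ** 0.5 * j ** k) for j in range(i, n + 1))
     for i in range(1, n + 1)]
formula = sum(u * u for u in U)

us = []
for _ in range(trials):
    a = [random.choice((-1, 1)) for _ in range(n)]   # the A_i
    us.append(sum(U[i] * a[i] for i in range(n)))    # u_n = sum U_i * A_i

mc = sum(x * x for x in us) / trials  # E[u_n^2] = Var(u_n), mean is 0
print(round(formula, 3), round(mc, 3))
```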

The moment-generating function of u_n is Fn(w) = E[exp(w*u_n)] =
E[exp(w*U_1*A_1) * exp(w*U_2*A_2) * ... * exp(w*U_n*A_n)]. Since the
individual factors exp(w*U_i*A_i) are mutually independent, the
expected value of the product is the product of the expected values:
Fn(w) = product{E[exp(w*U_i*A_i)], i=1..n}. We have E[exp(w*U_i*A_i)]
= (1/2)*exp(w*U_i*1) + (1/2)*exp(w*U_i*(-1)) = cosh(w*U_i), so we
finally have Fn(w) = product{cosh(w*U_i), i=1..n}. From this, you can
get the characteristic function of u_n and, in principle, invert it
(inverse Fourier transform) to get the cumulative distribution Pr{u_n
<= t}; in practice, it is not so easy, though. Numerical inverse
transform methods are available; see http://www.jstor.org/pss/2684571.
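The product-of-cosh formula can be checked directly against an empirical
estimate of E[exp(w*u_n)] (a sketch with small, arbitrary n, k, and w):

```python
import math
import random

# Compare Fn(w) = prod_i cosh(w*U_i) with a Monte Carlo estimate of
# E[exp(w*u_n)], where u_n = sum_i U_i * A_i.
random.seed(4)
n, k, w, trials = 10, 1, 0.5, 100000

U = [sum(1 / (j ** 0.5 * j ** k) for j in range(i, n + 1))
     for i in range(1, n + 1)]
mgf = math.prod(math.cosh(w * u) for u in U)

est = 0.0
for _ in range(trials):
    u_n = sum(u * random.choice((-1, 1)) for u in U)
    est += math.exp(w * u_n)
est /= trials
print(round(mgf, 3), round(est, 3))
```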

Alternatively, you could use B_i = 0 or 2 with probability 1/2
(instead of A_i), to get an expression of the form u_n = c_n + v_n,
where c_n = -sum(U_i) and v_n = sum(U_i*B_i). (This writes A_i = -1 +
B_i and separates out the nonrandom "-1" part from the random B_i
part.) Thus, Fn(w) = exp(w*c_n)*product(1/2 + (1/2)*
exp(2*w*U_i),i=1..n). Since all the B_i are >= 0, we can use Laplace-
transform inversion methods to get the distribution Gn(t) = Pr{v_n <=
t} of v_n;
see http://www.columbia.edu/~ww2040/Fall03/LaplaceInversionJoC95.pdf
(complete article freely available). We can then get Pr{u_n <= t} as
Pr{v_n <= t - c_n}.
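As a consistency check (my own sketch, arbitrary parameters), the shifted
form agrees with the earlier product of cosh terms, since cosh(x) =
exp(-x)*(1 + exp(2*x))/2 and c_n = -sum(U_i):

```python
import math

# Verify exp(w*c_n) * prod(1/2 + exp(2*w*U_i)/2) == prod cosh(w*U_i).
n, k, w = 10, 1, 0.7
U = [sum(1 / (j ** 0.5 * j ** k) for j in range(i, n + 1))
     for i in range(1, n + 1)]
c_n = -sum(U)  # nonrandom part: u_n = c_n + sum(U_i * B_i)

lhs = math.exp(w * c_n) * math.prod(0.5 + 0.5 * math.exp(2 * w * u)
                                    for u in U)
rhs = math.prod(math.cosh(w * u) for u in U)
print(round(lhs, 6), round(rhs, 6))
```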

R.G. Vickson

Ray Vickson, Nov 10, 2010