Randomness is everywhere

In 1827, Robert Brown observed under a microscope that pollen particles suspended in water appear to jiggle randomly. In 1905, Einstein provided a theoretical explanation: in essence, water molecules whack the pollen particles from random directions, and the net effect is the visible jiggle. This theory established the first link between atomic theory - the idea that everything is made of particles - and experiment, since at the time the debate over whether atomic theory was valid was still open. Over the next few years a series of experiments was conducted (notably Perrin’s, for which he was awarded the Nobel Prize in Physics), and the question of whether atomic theory is valid was settled.

Since then, the idea has also been applied in finance. In 1973, Black, Scholes, and Merton showed how to calculate the fair value of a financial contract whose payoff depends on future price changes, assuming prices, like tiny particles, bounce around randomly. Their work turned uncertain price movements into something that could be priced using math instead of guesswork, work for which Scholes and Merton were awarded the 1997 “Nobel Prize in Economics”.

More recently, ideas about randomness have been used in Machine Learning to create software that generates things like pictures and text. These methods work by slowly messing up data with randomness and learning how to undo it. Interested readers can read more about it in “Understanding Generative AI as Galton Board”.

The idea that many phenomena can be modeled by adding randomness turns out to be very useful. We’ll look at how to do calculus when randomness is involved, a subject more officially known as Stochastic Calculus. This topic does demand a solid comfort level with regular calculus and probability, but we won’t take a heavily formal approach. While this is a genuinely fun subject, it often gets buried under layers of technical detail. We’re not here for a hundred pages of formal proofs - instead, we’ll focus on the motivations, conceptual ideas, and definitions that make stochastic calculus interesting, and leave the full rigor for elsewhere.

Modeling Randomness

Random does not mean property-less. We will consider Gaussian randomness, as it is a very common type seen in practice, and in particular we will be interested in equations of the form:

$$ \Delta\tilde x = A(x(t),t)\Delta t + B(x(t),t)\Delta \tilde W $$

where \( \Delta \tilde W \) is Gaussian with zero mean (a note on notation - we put a tilde over variables to remind us that they are random). For example, if \( \tilde x \) is a stock price, its change over the next moment in time is a predictable part plus a random jump, and we might want to calculate \( \mathbb{E}[\tilde x(t)] \) and \( \mathrm{Var}[\tilde x(t)] \).

Note that \( \tilde x \) is a random variable due to \( \Delta \tilde W \), while the equation also has a pretty regular calculus-looking \( \Delta t \) part. Now, to find \( \tilde x \), we need to sum up the individual \( \Delta \tilde x \) as \( \Delta t \to 0 \). Looking at the equation, the \( A\Delta t \) term is familiar from regular calculus - if \( B=0 \), then \( x= x_0 + \int^{t}_{0} A\,dt' \). The unconventional bit is \( B\,\Delta \tilde W \), which is a random variable. This term is also sometimes called noise, in contrast to the deterministic part of the equation. Conceptually, the idea is the same - we want to sum up all the individual \( B\Delta\tilde W \) as \( \Delta t\to 0 \). However, summing infinitesimal random variables is not something we do in regular calculus, so we will need to build a new calculus for random variables, a.k.a. Stochastic Calculus.
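To make this concrete, here is a minimal numerical sketch of such an equation: chop time into small steps and add up the \( A\Delta t \) and \( B\Delta \tilde W \) contributions. The function name and parameter values are our own illustrative choices, not anything fixed by the theory.

```python
import random

def euler_maruyama(A, B, x0, T, n, rng):
    """Simulate dx = A(x, t) dt + B(x, t) dW with a simple forward scheme."""
    dt = T / n
    x, t = x0, 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, dt ** 0.5)  # Gaussian jump over the interval dt
        x += A(x, t) * dt + B(x, t) * dW
        t += dt
    return x

# Illustrative run: A = 0, B = 1, i.e. pure noise starting from 0.
rng = random.Random(0)
x_T = euler_maruyama(lambda x, t: 0.0, lambda x, t: 1.0, 0.0, 1.0, 1000, rng)
```

Running this many times and histogramming `x_T` gives a Gaussian centered at zero, which is the behavior we will derive below.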

What do random variables approach?

It is good to recall from regular calculus that:

$$ I \equiv \int f(x)\,dx \equiv \lim_{\Delta x\to 0}\sum_i f(x_i)\,\Delta x_i $$

In particular, the integral is a limit of a sum, and the sum “approaches” some value. In the random-variable case, what does it mean for one random variable to “approach” another? It turns out this topic needs to be handled with some care: there is more than one way to define “approaches” for a sequence \( \tilde X_n \) as \( n\to\infty \). A natural idea would be to say that random variables converge if their distributions converge, but it turns out that this does not imply that their means or variances converge. A stronger definition is to require that the expected squared difference between the two random variables approaches zero:

$$ \lim_{n\to\infty}\mathbb{E}\left[(\tilde X_n -\tilde X)^2\right]=0 $$

where \( \tilde X_n \) is our variable of interest and \( \tilde X \) is what it “approaches”. This is known as convergence in mean square, and we will use the following notation for the above:

$$ \tilde X = \lim_{n\to\infty}^{\mathrm{ms}} \tilde X_n $$

It can be shown that mean-square convergence implies not only that the distributions converge, but also the expectations and variances. Now that we have a definition of limit, the stochastic integral is defined as a sum of random variables that converges in the mean-square sense:

$$ \tilde I \equiv \int\tilde fd\tilde W \equiv \lim_{n\to\infty}^{\mathrm{ms}} \sum_i \tilde f(t_i)\Delta \tilde W_i $$
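To see why convergence of distributions alone is too weak, here is a classic illustration (our own example, not from any particular text): a variable \( \tilde X_n \) that equals \( n \) with probability \( 1/n \) and \( 0 \) otherwise converges in distribution to \( 0 \), yet \( \mathbb{E}[\tilde X_n] = 1 \) for every \( n \) and \( \mathbb{E}[\tilde X_n^2] = n \) blows up, so there is no mean-square convergence. A quick simulation sketch:

```python
import random

def sample_X(n, rng):
    """X_n equals n with probability 1/n, and 0 otherwise."""
    return float(n) if rng.random() < 1.0 / n else 0.0

rng = random.Random(0)
n = 1000
xs = [sample_X(n, rng) for _ in range(200000)]
frac_nonzero = sum(1 for x in xs if x > 0) / len(xs)  # ~ 1/n: mass piles up at 0
mean = sum(xs) / len(xs)                              # stays near 1, not 0
second_moment = sum(x * x for x in xs) / len(xs)      # ~ n: no mean-square limit
```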

How do ΔW̃ and Δt relate?

In the equation for \( \Delta \tilde x \), there are both \( \Delta t \) and \( \Delta \tilde W \), but \( \Delta \tilde W \) also depends on \( \Delta t \), as the noise is something that happens over the time interval \( \Delta t \). We assume that jumps over disjoint intervals are independent, and that a jump grows with the time passed: \( \Delta \tilde W \sim \mathcal N(0, \Delta t^\alpha) \) for some \( \alpha \). Let us consider the simplified case \( A=0 \) and \( B=1 \), so \( \Delta \tilde x = \Delta \tilde W \), and the solution is \( \tilde x(T)=\sum_{i=1}^n \Delta \tilde W_i \). From standard probability, a sum of independent Gaussian random variables is another Gaussian whose variance is the sum of the individual variances. For a total time period \( T \) we have \( \Delta t = T/n \) and \(\operatorname{Var}[\tilde x(T)] = \sum_{i=1}^{n} \Delta t^\alpha = n \left(\frac{T}{n}\right)^\alpha = T^\alpha\, n^{1-\alpha}\). If \( \alpha>1 \), then \( \operatorname{Var}[\tilde x(T)] \to 0 \) as \( n\to\infty \); if \( \alpha<1 \), then \( \operatorname{Var}[\tilde x(T)] \to \infty \); so only for \( \alpha=1 \) does the variance remain finite and nontrivial, yielding \( \operatorname{Var}[\tilde x(T)] = T \). In other words, the noise term \( \Delta \tilde W \) can be seen as a \( \mathcal N(0,1) \) variable whose amplitude has been rescaled by \( \sqrt{\Delta t} \).
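The scaling argument above is easy to check numerically. The sketch below estimates \( \operatorname{Var}[\tilde x(T)] \) empirically: for \( \alpha = 1 \) the variance hovers around \( T \), while for \( \alpha = 2 \) it shrinks like \( 1/n \). The function name and parameters are illustrative.

```python
import random

def endpoint_variance(alpha, T, n, paths, rng):
    """Empirical Var[x(T)] when x(T) is a sum of n independent jumps ~ N(0, dt**alpha)."""
    dt = T / n
    std = (dt ** alpha) ** 0.5
    finals = [sum(rng.gauss(0.0, std) for _ in range(n)) for _ in range(paths)]
    m = sum(finals) / paths
    return sum((x - m) ** 2 for x in finals) / paths

rng = random.Random(0)
# Theory: Var[x(T)] = T**alpha * n**(1 - alpha), so for T = 1:
v_alpha1 = endpoint_variance(1.0, 1.0, 100, 2000, rng)  # stays near 1 for any n
v_alpha2 = endpoint_variance(2.0, 1.0, 100, 2000, rng)  # ~ 1/n, vanishing
```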

Itô Integral, Rules and Formula

We have come pretty close to defining the integral for noise; however, there is a final touch to add. In the definition above, it actually matters where the evaluation point \( t_i \) of \( \tilde f(t_i) \) is picked within the interval \( \Delta t_i \), in contrast to the regular Riemann integral, where picking any point in the interval yields the same integral. We will pick \( t_i \) at the start of \( \Delta t_i \), i.e. the function is not allowed to peek into the uncertain future modeled by the noise, in which case the integral is known as the Itô integral (if, e.g., \( t_i \) is instead picked in the middle of the interval, it is known as the Stratonovich integral). Putting it all together, we have arrived at a milestone - the Itô integral:

$$ \tilde I_{\mathrm{ito}} \equiv \int_{t_0}^{t} \tilde f\,d\tilde W = \lim_{n\to\infty}^{\mathrm{ms}} \sum_{i=1}^n \tilde f\left(t_i^{\text{start}}\right)\Delta \tilde W_i, \qquad \Delta \tilde W \sim \mathcal N(0,\Delta t). $$
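To see that the choice of evaluation point really matters, consider \( \int_0^T \tilde W\,d\tilde W \). A left-endpoint (Itô) sum converges to \( (\tilde W_T^2 - T)/2 \), while a sum using the average of the two endpoints (which matches the Stratonovich convention for this integrand) telescopes to exactly \( \tilde W_T^2/2 \) - the two differ by the nonvanishing \( T/2 \). A small sketch, with an illustrative step count:

```python
import random

def wdW_sums(T, n, rng):
    """Left-endpoint (Ito) and averaged-endpoint (Stratonovich) sums for the integral of W dW."""
    dt = T / n
    W = [0.0]
    for _ in range(n):
        W.append(W[-1] + rng.gauss(0.0, dt ** 0.5))  # build a Brownian path
    ito = sum(W[i] * (W[i + 1] - W[i]) for i in range(n))
    strat = sum(0.5 * (W[i] + W[i + 1]) * (W[i + 1] - W[i]) for i in range(n))
    return ito, strat, W[-1]

rng = random.Random(0)
ito, strat, W_T = wdW_sums(1.0, 100000, rng)
# Ito:          -> (W_T**2 - T) / 2
# Stratonovich: ->  W_T**2 / 2
```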

We could now, for example, show from the definition that the Itô integral is linear, as one would expect of an integral. More interestingly, let us consider a general function \( f(x) \), expand \( \Delta f \) as a Taylor series, recall that \( \Delta \tilde x = A\,\Delta t + B\,\Delta \tilde W \), and calculate the Itô integral \( I = \lim \sum \Delta \tilde f = \lim \sum_{a,b} (\cdots)\,\Delta t^a\, \Delta \tilde W^b \equiv \int \sum_{a,b} (\cdots)\, dt^a\, d\tilde W^b = \sum_{a,b} \int (\cdots)\, dt^a\, d\tilde W^b \). We see that it is a sum of integrals over terms of the form \( dt^a\,d\tilde W^b \) with \( a\ge 0 \), \( b\ge 0 \). One can plug each such integral into the definition of the Itô integral and, with a bit of crunching, show that:

$$ dW^2 = dt, \qquad dW^n = 0 \qquad (n>2), \qquad dWdt = 0, \qquad dt^n = 0 \qquad (n>1) $$
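These relations can be probed numerically: with \( \Delta \tilde W \sim \mathcal N(0, \Delta t) \), the path sum of \( \Delta \tilde W^2 \) approaches \( T \), while the sums corresponding to \( \Delta \tilde W^3 \), \( \Delta \tilde W\,\Delta t \), and \( \Delta t^2 \) all vanish as the grid is refined. A sketch with illustrative parameters:

```python
import random

def rule_sums(T, n, rng):
    """Path sums probing Ito's rules, with dW ~ N(0, dt)."""
    dt = T / n
    dWs = [rng.gauss(0.0, dt ** 0.5) for _ in range(n)]
    return (
        sum(dW ** 2 for dW in dWs),  # -> T   (the dW^2 = dt rule)
        sum(dW ** 3 for dW in dWs),  # -> 0   (dW^n = 0 for n > 2)
        sum(dW * dt for dW in dWs),  # -> 0   (dW dt = 0)
        n * dt ** 2,                 # -> 0   (dt^n = 0 for n > 1)
    )

rng = random.Random(0)
qv, cubes, cross, dt2 = rule_sums(1.0, 100000, rng)
```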

These rules are to be understood in the context of the integral, e.g. \( \int (\cdots)\,dt\,d\tilde W = 0 \), or \( \int (\cdots)\,d\tilde W^2 = \int (\cdots)\,dt \). They are known as Itô’s rules. Applying these rules to the Taylor-expanded sum for calculating \( \tilde f = \int d\tilde f \), one finds:

$$ d\tilde f = f' A\,dt + \frac{1}{2}f'' B^2\,dt + f' B\,d\tilde W $$

This is known as Itô’s formula or Itô’s lemma. In case \( f \) has explicit time dependence \( f(x,t) \), there will also be an extra time-derivative term \( \partial_t f\,dt \). The result looks quite similar to the differential \( df \) from regular calculus, where one would treat \( dt \) and \( d\tilde W \) as independent variables and apply the chain rule; however, Itô’s formula has the extra \( \tfrac12 f'' B^2\, dt \) term.
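As a worked application of Itô’s formula, take \( f(S) = \ln S \) with \( dS = \mu S\,dt + \sigma S\,dW \), so \( A = \mu S \) and \( B = \sigma S \). The formula gives \( d\ln S = (\mu - \tfrac12\sigma^2)\,dt + \sigma\,dW \), and hence the exact solution \( S(T) = S(0)\exp\big((\mu - \tfrac12\sigma^2)T + \sigma W_T\big) \) - note the \( -\tfrac12\sigma^2 \) the naive chain rule would miss. The sketch below drives a step-by-step simulation and the exact formula with the same noise and compares the endpoints; parameters are illustrative.

```python
import math
import random

def gbm_euler_vs_exact(S0, mu, sigma, T, n, rng):
    """Compare an Euler simulation of dS = mu*S dt + sigma*S dW with the
    exact solution that Ito's formula gives via f(S) = log(S)."""
    dt = T / n
    S, W = S0, 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, dt ** 0.5)
        S += mu * S * dt + sigma * S * dW  # forward step on the SDE
        W += dW                            # accumulate the same noise
    exact = S0 * math.exp((mu - sigma ** 2 / 2) * T + sigma * W)
    return S, exact

rng = random.Random(0)
S_T, S_exact = gbm_euler_vs_exact(100.0, 0.05, 0.2, 1.0, 100000, rng)
# With a fine grid the two endpoints should nearly coincide.
```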

Black–Scholes

As a treat to ourselves, we will apply the above and outline the derivation of the Black-Scholes equation. The issue at hand is: what should be the fair price of an option (a contract whose payoff depends on the future price of a security) if the issuer does not want to take on risk from random price moves? We model the security price \( S(t) \) as

$$ dS = \mu S\,dt + \sigma S\,dW $$

This says that there is a constant proportional drift in value over time, plus Gaussian noise that is also proportional to the current price. Let the option value be a function of both the current price and time, i.e. \( V = V(S,t) \). Now we get to use Itô’s formula, which we worked quite hard to obtain, with its unusual quadratic term! Applying it to \( V(S,t) \) gives \(dV = \big(\partial_t V + \mu S\,\partial_S V + \tfrac12 \sigma^2 S^2\,\partial_{SS}V\big)dt + \sigma S\,\partial_S V\,dW\).

Now suppose we sell an option, but can also trade the underlying security. Define a portfolio \( P = cS - V \), meaning we hold \( c \) shares of the stock but are also on the hook for the one option we sold. Over a short interval, assuming the portfolio value only changes due to price changes, the portfolio change is \( dP = c\,dS - dV \). Substituting the formulas for \( dS \) (the price model) and \( dV \) (Itô’s formula) and collecting terms yields \(dP = \big(c\mu S - \partial_t V - \mu S\,\partial_S V - \tfrac12 \sigma^2 S^2\,\partial_{SS}V\big)dt + \sigma S\big(c - \partial_S V\big)dW\). We eliminate the random term by setting \( c = \partial_S V \), so \(dP = \big(-\partial_t V - \tfrac12 \sigma^2 S^2\,\partial_{SS}V\big)dt\). A riskless portfolio must grow at the risk-free interest rate \( r \), so \(dP=rP\,dt=r(cS - V)\,dt=r(S\,\partial_S V - V)\,dt\). Equating the two expressions for \( dP \) and rearranging yields the Black-Scholes equation:

$$ \frac12\sigma^2 S^2\partial_{SS}V + rS\partial_S V + \partial_t V = rV $$
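For a European call option this equation has a well-known closed-form solution, the Black-Scholes formula, and we can sanity-check it with Monte Carlo: under the risk-free drift \( r \), the fair price is the discounted expected payoff over the exact solution of the price model. The helper names and parameter values below are our own illustrative choices.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, r, sigma, T):
    """Closed-form European call price solving the Black-Scholes equation."""
    d1 = (math.log(S0 / K) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def mc_call(S0, K, r, sigma, T, paths, rng):
    """Monte Carlo price: discounted expected payoff under drift r."""
    total = 0.0
    for _ in range(paths):
        Z = rng.gauss(0.0, 1.0)
        S_T = S0 * math.exp((r - sigma ** 2 / 2) * T + sigma * math.sqrt(T) * Z)
        total += max(S_T - K, 0.0)
    return math.exp(-r * T) * total / paths

rng = random.Random(0)
price_pde = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
price_mc = mc_call(100.0, 100.0, 0.05, 0.2, 1.0, 200000, rng)
```

The two prices should agree to within the Monte Carlo sampling error.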

Summary

We have taken a brisk walk through the basics of Stochastic Calculus, and there are many roads to follow up in greater detail; for example, we did not discuss why the standard notion of a derivative does not work out in stochastic calculus. But we have made a start, and it is a fun topic built on a similar, but somewhat different, foundation than regular calculus. The future might be unknowable, but hey, it might throw out some averages!