Here they are, on my bookshelf. All three volumes of Whitehead and Russell, *Principia Mathematica*.

# Category Archives: Mathematics

# MathJax

This site uses MathJax as implemented in the WordPress plugin MathJax-LaTeX.

Here is an example of how MathJax renders mathematics:

Given \((\mu,\sigma)\) and a setup \((\Omega,\mathcal{F},P,F,W)\), a *solution* on \((\Omega,\mathcal{F},P,F,W)\) of the SDE \((\mu,\sigma)\) is an \(N\)-dimensional Itô process \(X\) on \((\Omega,\mathcal{F},P,F,W)\) with a potentially random initial value \(X(0)\), such that the process \(\mu(X,t)\) belongs to \(\mathcal{L}^{1}\), the process \(\sigma(X,t)\) belongs to \(\mathcal{L}^{2}\), and for all \(t \in [0,\infty)\),

$$X(t) = X(0) + \int_{0}^{t} \mu(X,s) \, ds + \int_{0}^{t} \sigma(X,s) \, dW$$
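As a side note on the content rather than the rendering, the integral equation above is exactly what the Euler–Maruyama scheme discretizes. Here is a minimal one-dimensional sketch; the coefficient functions \(\mu(x) = 0.05x\) and \(\sigma(x) = 0.2x\) are hypothetical choices for illustration, not taken from the text.

```python
import numpy as np

# Euler-Maruyama discretization of X(t) = X(0) + int mu(X,s) ds + int sigma(X,s) dW
# for an illustrative one-dimensional SDE (coefficients chosen arbitrarily).
def euler_maruyama(mu, sigma, x0, t_max, n_steps, rng):
    dt = t_max / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))  # Wiener increment ~ N(0, dt)
        x[i + 1] = x[i] + mu(x[i]) * dt + sigma(x[i]) * dw
    return x

rng = np.random.default_rng(0)
path = euler_maruyama(lambda x: 0.05 * x, lambda x: 0.2 * x,
                      x0=1.0, t_max=1.0, n_steps=1000, rng=rng)
print(path[-1])
```

Each step adds the drift contribution \(\mu(X,t)\,\Delta t\) and the diffusion contribution \(\sigma(X,t)\,\Delta W\), mirroring the two integrals in the displayed equation.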

To use LaTeX in comments on this site, you can enclose it in \(\backslash ( \ldots \backslash ) \) (for inline LaTeX) or \( \$\$ \ldots \$\$ \) (for display LaTeX).

# Unambiguous Stochastic Integration

Stochastic integrals with respect to a Wiener process are defined by a limiting procedure based on convergence in probability.

If you consider two different but equivalent probability measures, such as an original measure and an “equivalent martingale measure”, then convergence in probability under the new and the old measure are equivalent. So, it does not matter whether you define the stochastic integral using convergence in the old or in the new probability measure. On this count, there is nothing to worry about.

There is, however, a more subtle ambiguity that needs to be resolved.

Let us first state Girsanov’s theorem.

Let \(W\) be a \(K\)-dimensional Wiener process with respect to an augmented filtration \(\mathcal{F}\).

Let \(\lambda\) be a \(K\)-dimensional row-vector-valued process. Assume that \(\lambda\) is measurable, adapted, and pathwise square integrable on bounded intervals. That is what is required for the stochastic integral of \(\lambda\) with respect to \(W\) to be well defined.

Define processes \(\eta[0,-\lambda]\) and \(W^{\lambda}\) by

$$\eta[0,-\lambda](t) = \exp \left[ -\frac{1}{2}\int_{0}^{t} \lambda \lambda^{\top} \, ds - \int_{0}^{t} \lambda \, dW \right]$$

and

$$W^{\lambda}(t) = \int_{0}^{t}\lambda^{\top}ds + W(t)$$

**Theorem**

*Girsanov’s theorem*

If \(E\eta[0,-\lambda](T) = 1\), then the process \(W^{\lambda}\) is a standard \(K\)-dimensional Wiener process with respect to \(\mathcal{F}\) and \(Q\), where \(Q\) is the probability measure which has density \(\eta[0,-\lambda](T)\) with respect to the original measure \(P\).

See Karatzas and Shreve (1991), Theorem 3.5.1, or Liptser and Shiryaev (2001), Chapter 6, Theorem 6.4.

Now, if \(b \in \mathcal{L}^{2}\) then the integral

\[ \int_{0}^{t} b \, dW^{\lambda} \]

can be interpreted in two potentially different ways.

On the one hand, it can be seen as an integral with respect to a Wiener process \(W^{\lambda}\) based on \(Q\). As such, it is defined by

\[ \int_{0}^{t} b \, dW^{\lambda} = \sum_{k=1}^{K} \int_{0}^{t} b_{k} \, d W^{\lambda}_{k} \]

and each of the integrals

\[ \int_{0}^{t} b_{k} \, d W^{\lambda}_{k} \]

is defined by approximation by simple processes.
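The approximation by simple processes freezes the integrand at the left endpoint of each subinterval, which is what keeps the approximation non-anticipating. A minimal numerical sketch (the sanity check against Itô's formula is a standard identity, not from the text):

```python
import numpy as np

# Left-endpoint ("simple process") approximation of int_0^t b dW for one
# component: the integrand is evaluated at the left endpoint of each
# subinterval, so the approximating process is non-anticipating.
def ito_integral(b_path, w_path):
    # b_path, w_path: values of b and W on a common uniform time grid
    dw = np.diff(w_path)
    return np.sum(b_path[:-1] * dw)

rng = np.random.default_rng(1)
n = 100_000
dt = 1.0 / n
w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))

# Sanity check: int_0^1 W dW = (W(1)^2 - 1)/2 by Ito's formula.
approx = ito_integral(w, w)
exact = 0.5 * (w[-1] ** 2 - 1.0)
print(abs(approx - exact))  # small on a fine grid
```

Replacing the left endpoint with the midpoint would instead converge to the Stratonovich integral, which is why the choice of evaluation point is part of the definition.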

On the other hand, \(b \lambda^{\top} \in \mathcal{L}^{1}\), so the integral can also be seen as an integral with respect to the Itô process

\[ W^{\lambda}(t) = \int_{0}^{t}\lambda^{\top} ds + W(t) \]

based on \(P\):

\[ \int_{0}^{t} b \, dW^{\lambda} = \int_{0}^{t} b \lambda^{\top} \, ds + \int_{0}^{t} b \, dW = \int_{0}^{t} b \lambda^{\top} \, ds + \sum_{k=1}^{K} \int_{0}^{t} b_{k} \, dW_{k} \]

where each of the integrals

\[ \int_{0}^{t}b_{k} \, dW_{k} \]

is defined by approximation by simple processes.

These two integrals are indeed identical.
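At the level of discrete approximations, the agreement is easy to see, because the increments of \(W^{\lambda}\) are by definition \(\lambda^{\top} \, \Delta t + \Delta W\). Here is a sketch for \(K = 1\), with hypothetical integrands \(b\) and \(\lambda\) chosen purely for illustration:

```python
import numpy as np

# Discrete check that the two readings of int b dW^lambda coincide (K = 1,
# hypothetical integrands): the increments of W^lambda are lam*dt + dW, so
# the two left-endpoint sums agree up to floating-point rounding.
rng = np.random.default_rng(2)
n = 10_000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
dw = rng.normal(0.0, np.sqrt(dt), n)

lam = np.sin(t)   # hypothetical process lambda
b = np.cos(t)     # hypothetical integrand in L^2

dw_lam = lam[:-1] * dt + dw                    # increments of W^lambda
reading_1 = np.sum(b[:-1] * dw_lam)            # integral against W^lambda directly
reading_2 = np.sum(b[:-1] * lam[:-1] * dt) + np.sum(b[:-1] * dw)  # Ito-process reading

assert abs(reading_1 - reading_2) < 1e-9
```

The substance of the proof is showing that this agreement survives the passage to the limit under the two different definitions, not just at the discrete level.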

Did anybody ever bother to prove that? Yes, of course I did. I proved it in my book *Pricing and Hedging of Derivative Securities*, on pages 83–84. That is a necessary part of doing mathematics the way it is supposed to be done. Much of the exposition above is quoted from the book.

# Disavowing Lebesgue

No, you cannot disavow Lebesgue. Forget about it.

Lebesgue integration (and measure theory) are indispensable in stochastic calculus, especially for finance applications.

Take for example the martingale representation theorem, which is central to the theory of dynamic hedging and replication. This version is essentially Rogers and Williams (1987), Theorem 36.5:

**Theorem**

*The martingale representation theorem*

Let \(W\) be a standard Brownian motion of dimension \(K\). If \(X\) is a martingale with respect to the augmented filtration generated by \(W\), then there exists a process \(b \in \mathcal{L}^{2}\) such that

$$X(t) -X(0) = \int_{0}^{t} b \, dW$$ (in the sense that the two processes are indistinguishable).

But what does it mean to say that the process \(b\) is in \(\mathcal{L}^{2}\)? It means that \(b\) is measurable and adapted and pathwise square integrable on bounded intervals. In other words,

$$\int_{0}^{t} \|b\|^{2} \, ds < \infty $$ with probability one, for all \(t \in [0,\infty)\). Or, to write it out in detail, if the underlying probability space is \((\Omega,\mathcal{F},P)\), then the requirement is that

$$\int_{0}^{t} \|b(\omega,s)\|^{2} \, ds < \infty $$ for \(P\)-almost all \(\omega \in \Omega\), for all \(t \in [0,\infty)\).

For each \(\omega\), the time integral is a Lebesgue integral. The martingale representation theorem does not in any way guarantee that the function \(\|b\|^{2}\) is pathwise Riemann integrable. How could it? What a laughable idea.
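A standard example (not from the text above) makes the gap concrete: the Dirichlet function is bounded and Lebesgue integrable, but not Riemann integrable on any interval.

$$f(s) = \mathbf{1}_{\mathbb{Q}}(s) = \begin{cases} 1 & \text{if } s \in \mathbb{Q} \\ 0 & \text{otherwise} \end{cases}$$

Since \(\mathbb{Q}\) has Lebesgue measure zero, \(\int_{0}^{t} f \, ds = 0\) as a Lebesgue integral; but on any partition of \([0,t]\), every upper Riemann sum equals \(t\) and every lower sum equals \(0\), so the Riemann integral does not exist.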

So the statement of the martingale representation theorem requires Lebesgue integration and measure theory.

Of course, my book *Pricing and Hedging of Derivative Securities* uses measure theory and Lebesgue integration throughout. That is just part of the craft of doing mathematics properly.

# Discrete Knowledge and Information

No, I am not talking about the kind of knowledge and information you should keep to yourself.

By discrete knowledge I mean ex post knowledge in the context of a discrete information structure. And by a discrete information structure I mean a partition of a probability space into events with positive probability.

I am going to describe the distinction between ex post knowledge and an ex ante information structure.

There is an underlying probability space \((\Omega, \mathcal{F}, P)\), where the elements of \(\Omega\) are states of the world, and the sets in \(\mathcal{F}\) are events.

Information is represented by a partition \(\Pi \subset \mathcal{F}\) of \(\Omega\) into events with positive probability, or by the sigma-algebra \(\sigma(\Pi)\) generated by \(\Pi\). Because the events in \(\Pi\) have positive probability, there can be at most countably many of them, which implies that \(\sigma(\Pi)\) simply consists of all unions of elements of \(\Pi\) plus the empty set. One can recover \(\Pi\) from \(\sigma(\Pi)\) by picking those nonempty sets in \(\sigma(\Pi)\) that are minimal with respect to inclusion.

In this discrete setup there is supposed to be some particular state \(\omega\) in \(\Omega\) which is *the true state* of the world.

An event *happens* if it contains the true state. Otherwise it does not happen. Only events that happen can possibly be known.

The idea that an agent with information partition \(\Pi\) and information sigma-algebra \(\sigma(\Pi)\) knows an event \(A\) if the true state is \(\omega\) can be formalized like this: say that \(\Pi\) *knows \(A\) at \(\omega\)* if there is some event \(D\) in \(\Pi\) with \(\omega \in D \subset A\), or equivalently, there is some event \(D\) in \(\sigma(\Pi)\) with \(\omega \in D \subset A\). Here and in what follows, the agent is identified with his or her information structure.

Define the event \(K(\Pi,A)\) that \(\Pi\) knows the event \(A\) as:

$$\begin{eqnarray*}
K(\Pi,A)
& = & \{ \omega \in \Omega : \Pi \mbox{ knows } A \mbox{ at } \omega \} \\
& = & \bigcup \{ D \in \Pi : D \subset A \}
\end{eqnarray*}$$

Since the set \(K(\Pi,A)\) is a union of cells in \(\Pi\), it is in \(\sigma(\Pi)\), and in particular, it is an event (it is in \(\mathcal{F}\)).

Since \(K(\Pi,A) \subset A\), the event \(A\) can only be known in states of the world where it actually happens.

An event \(A\) belongs to \(\sigma(\Pi)\) if and only if \(K(\Pi, A) =A\). So, \(\sigma(\Pi)\) consists of those events that will necessarily be known by \(\Pi\) if they happen.

If an event in \(\sigma(\Pi)\) does not happen, then its complement happens. Since the complement is in \(\sigma(\Pi)\), the fact that it happens will be known by \(\Pi\). In other words, if an event is in \(\sigma(\Pi)\), then \(\Pi\) will know whether it happens or not.

Say that \(\Pi\) is *informed about* \(A\) if \(A\) belongs to \(\sigma(\Pi)\).
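For a finite state space, the knowledge operator is a few lines of code. The sketch below implements \(K(\Pi,A)\) exactly as defined above, on a toy six-state example (the states and partition are invented for illustration):

```python
from itertools import chain

# Knowledge operator K(Pi, A) for a finite state space, per the definition
# above: the union of those cells D of the partition Pi with D subset of A.
def K(partition, a):
    """Union of the cells D in Pi with D a subset of A."""
    return frozenset(chain.from_iterable(d for d in partition if d <= a))

omega = frozenset(range(6))
pi = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

a = frozenset({0, 1, 2})   # an event that is NOT a union of cells
print(sorted(K(pi, a)))    # -> [0, 1]: A is known exactly on the cell {0, 1}

b = frozenset({0, 1, 4, 5})  # a union of cells, hence in sigma(Pi)
assert K(pi, b) == b         # A in sigma(Pi) iff K(Pi, A) = A
assert K(pi, a) < a          # otherwise K(Pi, A) is a proper subset of A
```

The two assertions check the facts stated above: \(K(\Pi,A) \subset A\) always, with equality exactly when \(A \in \sigma(\Pi)\), i.e. when \(\Pi\) is informed about \(A\).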

So this is the distinction between ex post knowledge and ex ante information:

- Knowledge depends on the true state of the world. The events that the agent knows are those that contain the cell in \(\Pi\) that contains the true state, or equivalently, those that contain an event from \(\sigma(\Pi)\) that contains the true state.
- Information is independent of the true state. The events that the agent is informed about are those in \(\sigma(\Pi)\), or equivalently, those that are unions of events from \(\Pi\). These are the events such that the agent will know whether they happen or not, no matter what the true state is.

That was easy.

Next steps:

- Define what it means that somebody knows that somebody else knows something, and define the concept of common knowledge
- Define knowledge when the information structure is given by a general sigma-algebra that is not necessarily generated by a partition into events with positive probability

#### Reference

Lars Tyge Nielsen: "Common Knowledge, Communication, and Convergence of Beliefs," *Mathematical Social Sciences* 8 (1984), 1–14.