06 Distribution | Zhijun's Notes

$$ \newcommand{aster}{*} \newcommand{exist}{\exists} \newcommand{B}{\mathbb B} \newcommand{C}{\mathbb C} \newcommand{I}{\mathbb I} \newcommand{N}{\mathbb N} \newcommand{Q}{\mathbb Q} \newcommand{R}{\mathbb R} \newcommand{Z}{\mathbb Z} \newcommand{eR}{\overline {\mathbb R}} \newcommand{cD}{ {\mathbb D}} \newcommand{dD}{ {\part \mathbb D}} \newcommand{dH}{ {\part \mathbb H}} \newcommand{eC}{\overline {\mathbb C}} \newcommand{A}{\mathcal A} \newcommand{D}{\mathcal D} \newcommand{E}{\mathcal E} \newcommand{F}{\mathcal F} \newcommand{G}{\mathcal G} \newcommand{H}{\mathcal H} \newcommand{J}{\mathcal J} \newcommand{L}{\mathcal L} \newcommand{U}{\mathcal U} \newcommand{M}{\mathcal M} \newcommand{O}{\mathcal O} \newcommand{P}{\mathcal P} \newcommand{S}{\mathcal S} \newcommand{T}{\mathcal T} \newcommand{V}{\mathcal V} \newcommand{W}{\mathcal W} \newcommand{X}{\mathcal X} \newcommand{Y}{\mathcal Y} \newcommand{bE}{\symbf E} \newcommand{bF}{\symbf F} \newcommand{bD}{\symbf D} \newcommand{bI}{\symbf I} \newcommand{bX}{\symbf X} \newcommand{bY}{\symbf Y} \newcommand{nz}{\mathcal Z} \newcommand{bT}{\mathbb T} \newcommand{bB}{\mathbb B} \newcommand{bS}{\mathbb S} \newcommand{bA}{\mathbb A} \newcommand{bL}{\mathbb L} \newcommand{bP}{\symbf P} \newcommand{bM}{\symbf M} \newcommand{bH}{\mathbb H} \newcommand{dd}{\mathrm d} \newcommand{Mu}{\mathup M} \newcommand{Tau}{\mathup T} \newcommand{ae}{\operatorname{a.e.}} \newcommand{aut}{\operatorname{aut}} \newcommand{adj}{\operatorname{adj}} \newcommand{char}{\operatorname{char}} \newcommand{cov}{\operatorname{Cov}} \newcommand{cl}{\operatorname{cl}} \newcommand{cont}{\operatorname{cont}} \newcommand{e}{\mathbb E} \newcommand{pp}{\operatorname{primitive}} \newcommand{dist}{\operatorname{dist}} \newcommand{diam}{\operatorname{diam}} \newcommand{fp}{\operatorname{Fp}} \newcommand{from}{\leftarrow} \newcommand{Gal}{\operatorname{Gal}} \newcommand{GCD}{\operatorname{GCD}} \newcommand{LCM}{\operatorname{LCM}} \newcommand{fg}{\mathrm{fg}} 
\newcommand{gf}{\mathrm{gf}} \newcommand{im}{\operatorname{Im}} \newcommand{image}{\operatorname{image}} \newcommand{inj}{\hookrightarrow} \newcommand{irr}{\operatorname{irr}} \newcommand{lcm}{\operatorname{lcm}} \newcommand{ltrieq}{\mathrel{\unlhd}} \newcommand{ltri}{\mathrel{\lhd}} \newcommand{loc}{ {\operatorname{loc}}} \newcommand{null}{\operatorname{null}} \newcommand{part}{\partial} \newcommand{pf}{\operatorname{Pf}} \newcommand{pv}{\operatorname{Pv}} \newcommand{rank}{\operatorname{rank}} \newcommand{range}{\operatorname{range}} \newcommand{re}{\operatorname{Re}} \newcommand{span}{\operatorname{span}} \newcommand{su}{\operatorname{supp}} \newcommand{sgn}{\operatorname{sgn}} \newcommand{syn}{\operatorname{syn}} \newcommand{var}{\operatorname{Var}} \newcommand{res}{\operatorname{Res}} \newcommand{data}{\operatorname{data}} \newcommand{erfc}{\operatorname{erfc}} \newcommand{erfcx}{\operatorname{erfcx}} \newcommand{tr}{\operatorname{tr}} \newcommand{col}{\operatorname{Col}} \newcommand{row}{\operatorname{Row}} \newcommand{sol}{\operatorname{Sol}} \newcommand{lub}{\operatorname{lub}} \newcommand{glb}{\operatorname{glb}} \newcommand{ltrieq}{\mathrel{\unlhd}} \newcommand{ltri}{\mathrel{\lhd}} \newcommand{lr}{\leftrightarrow} \newcommand{phat}{^\widehat{\,\,\,}} \newcommand{what}{\widehat} \newcommand{wbar}{\overline} \newcommand{wtilde}{\widetilde} \newcommand{iid}{\operatorname{i.i.d.}} \newcommand{Exp}{\operatorname{Exp}} \newcommand{abs}[1]{\left| {#1}\right|} \newcommand{d}[2]{D_{\text{KL}}\left (#1\middle\| #2\right)} \newcommand{n}[1]{\|#1\|} \newcommand{norm}[1]{\left\|{#1}\right\|} \newcommand{pd}[2]{\left \langle {#1},{#2} \right \rangle} \newcommand{argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{argmin}[1]{\underset{#1}{\operatorname{argmin}}} \newcommand{p}[1]{\left({#1}\right)} \newcommand{c}[1]{\left \{ {#1}\right\}} \newcommand{s}[1]{\left [{#1}\right]} \newcommand{a}[1]{\left \langle{#1}\right\rangle} 
\newcommand{cc}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{f}{\mathfrak F} \newcommand{fi}{\mathfrak F^{-1}} \newcommand{Fi}{\mathcal F^{-1}} \newcommand{l}{\mathfrak L} \newcommand{li}{\mathfrak L^{-1}} \newcommand{Li}{\mathcal L^{-1}} \newcommand{const}{\text{const.}} $$

Distributions and Stochastic Differential Equations

Särkkä, S., & Solin, A. (2019). Applied Stochastic Differential Equations.
"The solutions of SDEs are stochastic processes, and therefore their solutions have certain probability distributions and statistics."

Strong assumptions

We work under the following assumptions throughout this section.

Suppose $(\Omega, \F, (\F _ t), [0, T], P)$ is a complete filtered probability space. Let $(\symbf B _ t)$ be an $m$-dimensional standard Brownian motion.

Consider an Itô stochastic differential equation (SDE) of the form

$$ \dd \symbf X _ t = \symbf b(\symbf X _ t, t) \dd t + \symbf A(\symbf X _ t, t) \dd \symbf B _ t,\quad \symbf X _ 0 = \symbf Z, \quad t \in [0, T] \label{equ:forward _ sde} $$

For $R=(r _ {ij}), S = (s _ {ij}) \in \R^{n \times n}$, define the Frobenius inner product $R:S := \sum _ {i,j} r _ {ij} s _ {ij} = \tr(R^T S)$.
Define $\symbf C(\symbf x, t): \R^n \times [0, T] \to \R^{n \times n}$ as $$ \symbf C(\symbf x, t) := \frac{1}{2}\symbf A(\symbf x, t) \symbf A^T(\symbf x, t) $$
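A minimal sketch in plain Python of the two definitions above, with nested lists standing in for matrices (the function names are illustrative, not from any library). It also checks one instance of the identity $\nabla^2 f : \symbf C = \tr(\symbf A \symbf A^T)$ for $f(\symbf x) = \norm{\symbf x}^2$, whose Hessian is $2I$; the value of $\symbf A$ is an arbitrary example.

```python
def frobenius(R, S):
    """R:S = sum_ij r_ij s_ij for two n-by-n matrices stored as nested lists."""
    return sum(r * s for row_r, row_s in zip(R, S) for r, s in zip(row_r, row_s))


def diffusion_matrix(A):
    """C = (1/2) A A^T for an n-by-m matrix A stored as nested lists."""
    n, m = len(A), len(A[0])
    return [[0.5 * sum(A[i][k] * A[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]


A = [[1.0, 0.0], [1.0, 2.0]]          # example value of A(x, t) at one point (n = m = 2)
C = diffusion_matrix(A)               # [[0.5, 0.5], [0.5, 2.5]]
hessian = [[2.0, 0.0], [0.0, 2.0]]    # Hessian of f(x) = |x|^2
trace_AAt = sum(A[i][k] ** 2 for i in range(2) for k in range(2))
print(C, frobenius(hessian, C), trace_AAt)  # Hess f : C equals tr(A A^T) here
```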

In the derivations below we make the following strong assumptions.

- $\symbf b \in C(\R^n \times [0, T] \to \R^n)$ and $\symbf A \in C(\R^n \times [0, T] \to \R^{n \times m})$ satisfy the strong conditions.
- The strong solution exists and is unique, and weak solutions are unique in law (weak uniqueness).
- The strong solution $(\symbf X _ t)$ is Markov with respect to $(\F _ t)$.
- $C^2$ Markov transition densities exist:
  - For $t > s$, there is a conditional density $p _ {t, s}(\symbf x _ t|\symbf x _ s)$ in $C^2(\R^n \times [0, T] \to [0, \infty))$.
  - We write $E^{\symbf x, s}$ for $E[\cdot | \symbf X _ {s} = \symbf x]$ and $P^{\symbf x, s}$ for $P(\cdot | \symbf X _ s = \symbf x)$.

Generalized generators of Itô SDEs

The generalized generator $\A$ of the Itô SDE is an operator on $\D :=C _ c^\infty(\R^n \times [0, T] \to \R)$.

Consider $f(\symbf x, t) \in \D$.

Define $\nabla f(\symbf x, t) := (D _ {x _ i} f(\symbf x, t)) _ {i = 1}^n$, and $\nabla^2 f(\symbf x, t) := (D _ {x _ i, x _ j} f (\symbf x, t)) _ {1 \le i, j \le n}$.
Define $f'(\symbf x, t) := D _ t f(\symbf x, t)$; throughout, a prime denotes a time derivative.
For convenience, we omit the arguments $(\symbf X _ t, t)$ of most functions. Itô's formula then gives (recall $\symbf C = \frac{1}{2}\symbf A \symbf A^T$):

$$ \begin{aligned} \dd f(\symbf X _ t, t) & = f' \dd t + \nabla f \cdot\dd \symbf X _ t + \nabla^2 f : \symbf C\, \dd t\\ & = f' \dd t + \nabla f \cdot \p{\symbf b \, \dd t + \symbf A \, \dd \symbf B _ t} + \nabla^2 f: \symbf C \dd t\\ & = \p{f' + \nabla f \cdot \symbf b + \nabla^2 f : \symbf C} \dd t + \nabla f \cdot \symbf A \dd \symbf B _ t \end{aligned} $$

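Taking expectations under $P^{\symbf x, t}$, the Itô integral $\int _ t^{t + \tau} \nabla f \cdot \symbf A \, \dd \symbf B _ u$ has zero mean, because its integrand is bounded ($f$ has compact support and $\symbf A$ is continuous). Therefore:

$$ \begin{aligned} \A f(\symbf x, t) & := \lim _ {\tau \downarrow 0}\frac{E^{\symbf x, t}[f(\symbf X _ {t + \tau}, t + \tau)] - f(\symbf x, t)}{\tau}\\ & = \lim _ {\tau \downarrow 0} \int _ {\Omega} \frac{f(\symbf X _ {t + \tau},t + \tau) - f(\symbf X _ {t}, t)}{\tau} \dd P^{\symbf x, t}(\omega)\\ & = \lim _ {\tau \downarrow 0} \int _ {\Omega}\frac{1}{\tau}\s{\int _ t^{t + \tau} \p{f'(\symbf X _ u, u) + \nabla^T f(\symbf X _ u, u) \, \symbf b(\symbf X _ u, u) + \nabla^2 f(\symbf X _ u, u) : \symbf C(\symbf X _ u, u)} \dd u} \dd P^{\symbf x, t}(\omega)\\ & = f'(\symbf x, t) + \nabla^T f(\symbf x, t) \, \symbf b(\symbf x, t) + \nabla^2 f(\symbf x, t) : \symbf C(\symbf x, t) \end{aligned} $$

The last step uses the continuity of the paths and of the integrand at $u = t$, where $\symbf X _ t = \symbf x$ under $P^{\symbf x, t}$, together with dominated convergence: the integrand is bounded because $f$ has compact support and $\symbf b, \symbf C$ are continuous.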
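As a numerical sanity check of the formula for $\A f$, here is a minimal sketch in plain Python. It approximates $f' + \nabla^T f \, \symbf b + \nabla^2 f : \symbf C$ by central finite differences and compares it with the defining difference quotient for a one-dimensional Ornstein-Uhlenbeck process $\dd X _ t = -\theta X _ t \dd t + \sigma \dd B _ t$, whose second moment is known in closed form. The OU example, the test function $f(x, t) = e^{-t}x^2$ (smooth, but not compactly supported), the step sizes, and all function names are assumptions made for illustration.

```python
import math


def generalized_generator(f, b, C, x, t, h=1e-3):
    """Central-difference approximation of  f' + grad f . b + Hess f : C  at (x, t).

    f(x, t) -> float, b(x, t) -> list of n floats, C(x, t) -> n-by-n nested list,
    x is a list of n floats. The error is O(h^2) for smooth f.
    """
    n = len(x)

    def shift(i, di, j=None, dj=0.0):
        y = list(x)
        y[i] += di
        if j is not None:
            y[j] += dj
        return y

    f_time = (f(x, t + h) - f(x, t - h)) / (2 * h)  # f' (time derivative)
    grad = [(f(shift(i, h), t) - f(shift(i, -h), t)) / (2 * h) for i in range(n)]
    hess = [[(f(shift(i, h, j, h), t) - f(shift(i, h, j, -h), t)
              - f(shift(i, -h, j, h), t) + f(shift(i, -h, j, -h), t)) / (4 * h * h)
             for j in range(n)] for i in range(n)]
    bx, Cx = b(x, t), C(x, t)
    drift = sum(grad[i] * bx[i] for i in range(n))
    diffusion = sum(hess[i][j] * Cx[i][j] for i in range(n) for j in range(n))
    return f_time + drift + diffusion


def ou_quotient(x, t, tau, theta, sigma):
    """(E^{x,t}[f(X_{t+tau}, t+tau)] - f(x, t)) / tau for f = e^{-t} x^2 and the OU process,
    using E^{x,t}[X_{t+tau}^2] = x^2 e^{-2 theta tau} + sigma^2 (1 - e^{-2 theta tau}) / (2 theta)."""
    second_moment = (x * x * math.exp(-2 * theta * tau)
                     - sigma ** 2 * math.expm1(-2 * theta * tau) / (2 * theta))
    return (math.exp(-(t + tau)) * second_moment - math.exp(-t) * x * x) / tau


theta, sigma = 1.0, 0.5


def f(x, t):
    return math.exp(-t) * x[0] ** 2


def b(x, t):
    return [-theta * x[0]]


def C(x, t):
    return [[0.5 * sigma ** 2]]


approx = generalized_generator(f, b, C, [0.8], 0.5)
reference = ou_quotient(0.8, 0.5, 1e-6, theta, sigma)
closed_form = math.exp(-0.5) * (-(0.8 ** 2) - 2 * theta * 0.8 ** 2 + sigma ** 2)
print(approx, reference, closed_form)
```

All three numbers should agree to several digits: `approx` is the finite-difference formula, `reference` the difference quotient from the definition of $\A$, and `closed_form` the formula evaluated by hand.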

Generators of Itô SDEs

The generator of the Itô SDE, denoted by $\A _ t$, is an operator on $\mathcal D := C _ c^\infty(\R^n \to \R)$; from here on, $\D$ denotes this space of time-independent test functions.

Consider $f(\symbf x) \in \D$ and keep the previous conventions. Since $f$ does not depend on $t$, Itô's formula reduces to

$$ \dd f(\symbf X _ t) = \p{\nabla^T f \, \symbf b + \nabla^2 f : \symbf C} \dd t + \nabla^T f \symbf A \,\dd \symbf B _ t \label{generator} $$

Therefore,

$$ \A _ t f(\symbf x) := \lim _ {\tau \downarrow 0}\frac{E^{\symbf x, t}[f(\symbf X _ {t + \tau})] - f(\symbf x)}{\tau} = \nabla^T f(\symbf x) \, \symbf b(\symbf x, t) + \nabla^2 f(\symbf x) : \symbf C(\symbf x, t) $$
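Equation $\ref{generator}$ can be checked along a single simulated path. The sketch below is plain Python under assumptions chosen for illustration: an Ornstein-Uhlenbeck drift $b(x) = -\theta x$, a constant $A = \sigma$ (so $C = \sigma^2/2$), $f(x) = x^2$, an Euler-Maruyama discretization, and a hypothetical function name `ito_check`. It accumulates the right-hand side of equation $\ref{generator}$ step by step and compares it with $f(X _ T) - f(X _ 0)$; dropping the second-order term $\nabla^2 f : \symbf C \, \dd t$ leaves a gap of about $\sigma^2 T$, which is exactly the Itô correction.

```python
import math
import random


def ito_check(theta=1.0, sigma=1.0, x0=1.0, T=1.0, n_steps=100_000, seed=0):
    """Along one Euler-Maruyama path of dX = -theta X dt + sigma dB, compare
    f(X_T) - f(X_0) for f(x) = x^2 with the Ito-formula sum, with and without
    the second-order term Hess f : C dt (here C = sigma^2 / 2)."""
    rng = random.Random(seed)
    dt = T / n_steps
    x = x0
    ito_sum, naive_sum = 0.0, 0.0
    for _ in range(n_steps):
        dB = rng.gauss(0.0, math.sqrt(dt))
        drift = -theta * x
        grad = 2.0 * x                       # df/dx for f(x) = x^2
        hess_C = 2.0 * 0.5 * sigma ** 2      # d^2f/dx^2 times C
        ito_sum += (grad * drift + hess_C) * dt + grad * sigma * dB
        naive_sum += grad * drift * dt + grad * sigma * dB
        x += drift * dt + sigma * dB         # Euler-Maruyama step
    return x ** 2 - x0 ** 2, ito_sum, naive_sum


lhs, ito_sum, naive_sum = ito_check()
print(f"f(X_T) - f(X_0) = {lhs:.4f}, Ito sum = {ito_sum:.4f}, without correction = {naive_sum:.4f}")
```

The remaining discrepancy between `lhs` and `ito_sum` comes from $\sum (\Delta B)^2 - T$ and shrinks like $\sqrt{\Delta t}$ as the step size decreases.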

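Define the formal adjoint $\mathcal A _ t^ *$ of $\mathcal A _ t$, acting on $\mathcal D'$, by

$$ \mathcal A^ * _ t p(\symbf x) := - \nabla \cdot \p{\symbf b(\symbf x, t) p(\symbf x)} + \nabla^2 : \p{\symbf C(\symbf x, t) p(\symbf x)} $$

where $\nabla^2 : \symbf M := \sum _ {i, j} D _ {x _ i, x _ j} m _ {ij}$ for a matrix-valued function $\symbf M = (m _ {ij})$. Integrating by parts (once for the drift term, twice for the diffusion term) shows $\int _ {\R^n} (\A _ t f) \, p \, \dd \symbf x = \int _ {\R^n} f \, (\A _ t^ * p) \, \dd \symbf x$ for $f \in \D$ and sufficiently smooth $p$.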
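The adjoint relation $\int (\A _ t f) \, p \, \dd x = \int f \, (\A _ t^ * p) \, \dd x$ for $f \in \D$ can be checked numerically in one dimension. This is a minimal sketch under assumptions chosen only for illustration: a hypothetical nonlinear drift $b(x) = -x + \sin x$, a hypothetical state-dependent $C(x) = \frac12(1 + \frac12\cos x)^2$, a smooth bump $f \in \D$ supported on $[-2, 2]$, and a Gaussian density standing in for $p$. Derivatives are central differences and integrals use the trapezoidal rule; both integrands vanish outside $[-2, 2]$, so no boundary terms arise.

```python
import math


def b(x):         # hypothetical nonlinear drift
    return -x + math.sin(x)


def C(x):         # hypothetical C = A^2 / 2 with A(x) = 1 + 0.5 cos x
    return 0.5 * (1.0 + 0.5 * math.cos(x)) ** 2


def f(x):         # smooth bump in C_c^infinity, supported on [-2, 2]
    return math.exp(-4.0 / (4.0 - x * x)) if abs(x) < 2.0 else 0.0


def p(x):         # Gaussian density N(0.3, 0.5^2), standing in for p_t
    return math.exp(-(x - 0.3) ** 2 / 0.5) / math.sqrt(0.5 * math.pi)


def d1(g, x, h=1e-3):   # central first derivative
    return (g(x + h) - g(x - h)) / (2 * h)


def d2(g, x, h=1e-3):   # central second derivative
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)


def generator_f(x):     # A_t f = f_x b + f_xx C
    return d1(f, x) * b(x) + d2(f, x) * C(x)


def adjoint_p(x):       # A_t^* p = -(b p)_x + (C p)_xx
    return -d1(lambda y: b(y) * p(y), x) + d2(lambda y: C(y) * p(y), x)


def trapezoid(g, lo, hi, n=8000):
    dx = (hi - lo) / n
    inner = sum(g(lo + k * dx) for k in range(1, n))
    return dx * (0.5 * g(lo) + inner + 0.5 * g(hi))


lhs = trapezoid(lambda x: generator_f(x) * p(x), -2.0, 2.0)
rhs = trapezoid(lambda x: f(x) * adjoint_p(x), -2.0, 2.0)
print(lhs, rhs)
```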

Fokker-Planck-Kolmogorov equation of Itô SDEs

Suppose that for each $t \in [0, T]$, $\symbf X _ t$ has a density $p _ t(\symbf x)$, and that $(\symbf x, t) \mapsto p _ t(\symbf x)$ belongs to $C^2(\R^n \times [0, T] \to [0, \infty))$.

For any $f \in \D = C _ c^\infty(\R^n \to \R)$, equation $\ref{generator}$ gives

$$ f(\symbf X _ {t + \tau}) - f(\symbf X _ {t}) = \int _ t^{t + \tau} \p{\nabla f(\symbf X _ {u}) \cdot \symbf b(\symbf X _ u, u) + \nabla^2 f(\symbf X _ u): \symbf C(\symbf X _ u, u)} \dd u + \int _ t^{t + \tau} \nabla^T f(\symbf X _ u) \symbf A(\symbf X _ u, u) \dd \symbf B _ u $$

Under our assumptions the integrand $\nabla^T f(\symbf X _ u) \symbf A(\symbf X _ u, u)$ of the last term is bounded ($f$ has compact support and $\symbf A$ is continuous), so this Itô integral is a martingale increment with zero expectation. So

$$ \begin{aligned} E\s{f(\symbf X _ {t + \tau}) - f(\symbf X _ {t})} & = E\s{\int _ t^{t + \tau} \p{\nabla f(\symbf X _ {u}) \cdot \symbf b(\symbf X _ u, u) + \nabla^2 f(\symbf X _ u): \symbf C(\symbf X _ u, u)} \dd u}\\ & = \int _ t^{t + \tau} E\s{\nabla f(\symbf X _ {u})\cdot \symbf b(\symbf X _ u, u)} + E\s{\nabla^2 f(\symbf X _ u): \symbf C(\symbf X _ u, u)} \dd u \end{aligned} \label{equ:expectation _ difference} $$

Consider the first term. Writing the expectation as an integral against $p _ u$ (Fubini's theorem) and integrating by parts, with no boundary terms because $f$ has compact support,

$$ \begin{aligned} E\s{\nabla^T f(\symbf X _ {u}) \symbf b(\symbf X _ u, u)} & = \int _ {\R^n} \nabla f(\symbf x)\cdot \symbf b(\symbf x, u) p _ u(\symbf x) \dd \symbf x\\ & = -\int _ {\R^n} f(\symbf x) \nabla\cdot(\symbf b(\symbf x, u) p _ u(\symbf x)) \dd \symbf x \end{aligned} $$

Similarly, for the second term we apply Fubini's theorem and integrate by parts twice:

$$ \begin{aligned} E\s{\nabla^2 f(\symbf X _ u): \symbf C(\symbf X _ u, u)} & = \int _ {\R^n} \nabla^2 f(\symbf x):(\symbf C(\symbf x, u)p _ u(\symbf x)) \dd \symbf x\\ & = \int _ {\R^n} f(\symbf x) \nabla^2:(\symbf C(\symbf x, u)p _ u(\symbf x))\dd \symbf x \end{aligned} $$

Now return to equation $\ref{equ:expectation _ difference}$. The expectations in the integrand are continuous in $u$, therefore

$$ \begin{aligned} \lim _ {\tau \downarrow 0} E\s{\frac{f(\symbf X _ {t + \tau}) - f(\symbf X _ {t})}{\tau}} & = E\s{\nabla f(\symbf X _ {t})\cdot \symbf b(\symbf X _ t, t)} + E\s{\nabla^2 f(\symbf X _ t): \symbf C(\symbf X _ t, t)}\\ & = \int _ {\R^n} f(\symbf x) \c{-\nabla\cdot(\symbf b(\symbf x, t) p _ t(\symbf x)) + \nabla^2:(\symbf C(\symbf x, t)p _ t(\symbf x))} \dd \symbf x \end{aligned} $$

The left-hand side is the right derivative of $t \mapsto E[f(\symbf X _ t)]$. By our assumptions ($f$ has compact support and $p _ t$ is $C^1$ in $t$), differentiation under the integral sign is legitimate:

$$ D _ t E[f(\symbf X _ t)] = D _ t\int _ {\R^n} f(\symbf x) p _ t(\symbf x) \dd \symbf x = \int _ {\R^n} f(\symbf x) p' _ t(\symbf x) \dd \symbf x $$

Therefore

$$ \forall f(\symbf x) \in \D: \int _ {\R^n} f(\symbf x) p' _ t(\symbf x) \dd \symbf x = \int _ {\R^n} f(\symbf x) \A _ t^ * p _ t(\symbf x) \dd \symbf x $$

Since this holds for every test function, $p' _ t = \A _ t^ * p _ t$ in the sense of distributions, and pointwise wherever both sides are continuous. Hence $p _ t(\symbf x)$ satisfies the following Fokker-Planck equation:

$$ p' _ t(\symbf x) = \A _ t^ * p _ t(\symbf x),\quad t \in [0, T] $$
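A minimal numerical check of the Fokker-Planck equation, under illustrative assumptions: a one-dimensional Ornstein-Uhlenbeck SDE $\dd X _ t = -\theta X _ t \dd t + \sigma \dd B _ t$ with Gaussian initial law $N(m _ 0, v _ 0)$. Its marginals are Gaussian with $m _ t = m _ 0 e^{-\theta t}$ and $v _ t = v _ 0 e^{-2\theta t} + \frac{\sigma^2}{2\theta}(1 - e^{-2\theta t})$, so the residual $p' _ t - \A _ t^ * p _ t$, computed by central differences, should vanish up to discretization error. As a contrast, dropping the noise contribution from $v _ t$ leaves a residual of order $\frac12\sigma^2 D _ {xx} p _ t$. The parameter values and function names are hypothetical.

```python
import math

THETA, SIGMA = 1.0, 0.7   # hypothetical OU parameters: dX = -THETA X dt + SIGMA dB
M0, V0 = 1.5, 0.5         # hypothetical Gaussian initial law N(M0, V0)


def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def ou_density(x, t, with_noise=True):
    """Exact marginal density p_t of the OU process started from N(M0, V0)."""
    mean = M0 * math.exp(-THETA * t)
    var = V0 * math.exp(-2 * THETA * t)
    if with_noise:
        var += SIGMA ** 2 / (2 * THETA) * (1 - math.exp(-2 * THETA * t))
    return gaussian(x, mean, var)


def fpk_residual(density, x, t, h=1e-3, k=1e-4):
    """p'_t - A_t^* p_t with b(x) = -THETA x and C = SIGMA^2 / 2, by central differences."""
    dp_dt = (density(x, t + k) - density(x, t - k)) / (2 * k)
    bp = lambda y: -THETA * y * density(y, t)                 # b(y) p_t(y)
    div_bp = (bp(x + h) - bp(x - h)) / (2 * h)
    pxx = (density(x + h, t) - 2 * density(x, t) + density(x - h, t)) / h ** 2
    return dp_dt - (-div_bp + 0.5 * SIGMA ** 2 * pxx)


grid = [-3 + 0.05 * i for i in range(121)]
times = [0.1, 0.5, 1.0]
max_res = max(abs(fpk_residual(ou_density, x, t)) for x in grid for t in times)
wrong = lambda x, t: ou_density(x, t, with_noise=False)     # ignores the diffusion term
max_res_wrong = max(abs(fpk_residual(wrong, x, t)) for x in grid for t in times)
print(f"max residual, exact p_t: {max_res:.2e}; without the diffusion term: {max_res_wrong:.2e}")
```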

Weak-sense time reversal of SDEs

I do not yet have the tools to prove the more general result in the following paper.
Castañón, D.A. (1982). Reverse-time diffusion processes. IEEE Trans. Inf. Theory, 28, 953-956.
For now, I derive a weaker, less general result.

Suppose $(\wbar\Omega, \wbar\F, (\wbar\F _ t), [T, 0], P)$ is a complete filtered probability space whose time index runs backward from $T$ to $0$. Let $(\wbar {\symbf B} _ t)$ be a reverse-time $m$-dimensional standard Brownian motion starting at time $T$.

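Consider the following class of reverse-time SDEs, indexed by a function $\lambda(t) \ge 0$. We would like to find a drift $\symbf {\wbar b}(\symbf x, t)$ that gives an appropriate time-reversal SDE:

$$ \dd \symbf {\wbar X} _ t = \symbf {\wbar b}(\symbf {\wbar X} _ t, t) \dd t + \sqrt{\lambda(t)} \symbf A(\symbf {\wbar X} _ t, t) \dd \symbf {\wbar {B}} _ t,\quad \symbf{\wbar X} _ T = \symbf X _ T, \quad t \in [T, 0] $$

Since time runs from $T$ down to $0$, the increment $\dd t$ here is negative.

Recall the FPK equation of the forward SDE, written in index notation with summation over repeated indices:

$$ p' _ t(\symbf x) = -D _ i \p{\symbf b^{(i)}(\symbf x, t) p _ t(\symbf x)} + D _ {ij} \p{\symbf C^{(i,j)}(\symbf x, t) p _ t(\symbf x)}, \quad t \in [0, T] $$

For the backward SDE, substitute $s = T - t$: the process $s \mapsto \symbf{\wbar X} _ {T - s}$ is a forward-time diffusion with drift $-\symbf {\wbar b}$ and diffusion matrix $\lambda(t) \symbf C$. Requiring its marginal density to be $p _ t$ and using $D _ s = -D _ t$, both the time derivative and the drift flip sign:

$$ -p _ t'(\symbf x) = -D _ i \p{-{\symbf {\wbar b}}^{(i)}(\symbf x, t) p _ t(\symbf x)} + \lambda(t) D _ {ij} \p{\symbf C^{(i, j)}(\symbf x, t) p _ t(\symbf x)}, \quad t \in [0, T] $$

To achieve weak-sense time reversal, meaning that $\symbf{\wbar X} _ t$ and $\symbf X _ t$ have the same marginal density $p _ t$ for every $t$, the two FPK equations must hold simultaneously. Adding them gives

$$ D _ i\p{\symbf {\wbar b}^{(i)}(\symbf x, t) p _ t(\symbf x)} = D _ i\p{\symbf b^{(i)}(\symbf x, t) p _ t(\symbf x) - (1 + \lambda(t)) D _ j\p{\symbf C^{(i,j)}(\symbf x, t) p _ t(\symbf x)}} $$

which holds in particular when the expressions inside $D _ i$ agree. Assuming that $p _ t(\symbf x) > 0$ on $\R^n$, this choice gives

$$ \begin{aligned} {{\symbf {\wbar b}}^{(i)}(\symbf x, t)} & = {\symbf b^{(i)}(\symbf x, t)} - \frac{(1 + \lambda(t))}{p _ t(\symbf x)} D _ {j}\p{\symbf C^{(i,j)}(\symbf x, t) p _ t(\symbf x)}\\ & = {\symbf b^{(i)}(\symbf x, t)} - (1 + \lambda(t)) \p{D _ j \symbf C^{(i, j)}(\symbf x, t) + \symbf C^{(i, j)}(\symbf x, t)D _ j \log p _ t(\symbf x)} \end{aligned} $$

Notice that for $\lambda(t) = 0$ the noise term vanishes and this gives the reverse probability flow ODE, while for $\lambda(t) = 1$ the reverse-time SDE has the same diffusion coefficient $\symbf A$ as the forward SDE.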
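For a linear forward SDE with Gaussian marginals the reverse drift is available in closed form, which makes the result easy to test. The sketch below assumes, for illustration only, the one-dimensional Ornstein-Uhlenbeck SDE $\dd X _ t = -\theta X _ t \dd t + \sigma \dd B _ t$ with $X _ 0 \sim N(m _ 0, v _ 0)$, so $\symbf C = \sigma^2/2$ is constant, $D _ j \symbf C^{(i,j)} = 0$, and $D _ x \log p _ t(x) = -(x - m _ t)/v _ t$. Since time runs backward, each step moves from $t$ to $t - h$ via $x \mapsto x - \symbf{\wbar b}(x, t)\, h + \sqrt{\lambda h}\, \sigma Z$ with $Z \sim N(0, 1)$. With $\lambda = 0$ the probability flow ODE (integrated with RK4) should map the quantile $m _ T + z\sqrt{v _ T}$ to $m _ 0 + z\sqrt{v _ 0}$; with $\lambda = 1$ an Euler-Maruyama simulation should recover the mean and variance of $p _ 0$ up to Monte Carlo error. Parameter values and function names are hypothetical.

```python
import math
import random

THETA, SIGMA = 1.0, 0.7      # hypothetical forward SDE: dX = -THETA X dt + SIGMA dB
M0, V0, T = 1.5, 0.5, 1.0    # hypothetical initial law N(M0, V0) and horizon T


def mean_var(t):
    """Mean and variance of the Gaussian forward marginal p_t."""
    decay = math.exp(-2 * THETA * t)
    return M0 * math.exp(-THETA * t), V0 * decay + SIGMA ** 2 / (2 * THETA) * (1 - decay)


def reverse_drift(x, t, lam):
    """bar b = b - (1 + lam) C D_x log p_t  (D_x C = 0 because C = SIGMA^2 / 2 is constant)."""
    m, v = mean_var(t)
    score = -(x - m) / v                    # D_x log p_t for a Gaussian density
    return -THETA * x - (1 + lam) * 0.5 * SIGMA ** 2 * score


def probability_flow_back(x_T, n_steps=1000):
    """RK4 for dx = bar b(x, t) dt with lam = 0, from t = T down to t = 0."""
    dt = -T / n_steps                       # negative step: time runs backward
    x, t = x_T, T
    for _ in range(n_steps):
        k1 = reverse_drift(x, t, 0.0)
        k2 = reverse_drift(x + 0.5 * dt * k1, t + 0.5 * dt, 0.0)
        k3 = reverse_drift(x + 0.5 * dt * k2, t + 0.5 * dt, 0.0)
        k4 = reverse_drift(x + dt * k3, t + dt, 0.0)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return x


def reverse_sde_samples(lam, n_samples=2000, n_steps=100, seed=0):
    """Euler-Maruyama for the reverse SDE, started from exact samples of p_T."""
    rng = random.Random(seed)
    h = T / n_steps                         # positive step size; time decreases by h
    m_T, v_T = mean_var(T)
    xs = [rng.gauss(m_T, math.sqrt(v_T)) for _ in range(n_samples)]
    for k in range(n_steps):
        t = T - k * h
        xs = [x - reverse_drift(x, t, lam) * h + math.sqrt(lam * h) * SIGMA * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs


z = 1.3                                     # an arbitrary standardized quantile
m_T, v_T = mean_var(T)
x0_flow = probability_flow_back(m_T + z * math.sqrt(v_T))   # should be close to M0 + z sqrt(V0)

xs = reverse_sde_samples(lam=1.0)
mean0 = sum(xs) / len(xs)
var0 = sum((x - mean0) ** 2 for x in xs) / (len(xs) - 1)
print(x0_flow, M0 + z * math.sqrt(V0), mean0, var0)
```

In one dimension the $\lambda = 0$ flow is monotone, so it transports quantiles of $p _ T$ to the matching quantiles of $p _ 0$; the $\lambda = 1$ run only matches the marginals in distribution, which is why it is checked through sample moments.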