02 Derivative

$$ \newcommand{aster}{*} \newcommand{exist}{\exists} \newcommand{B}{\mathbb B} \newcommand{C}{\mathbb C} \newcommand{I}{\mathbb I} \newcommand{N}{\mathbb N} \newcommand{Q}{\mathbb Q} \newcommand{R}{\mathbb R} \newcommand{Z}{\mathbb Z} \newcommand{eR}{\overline {\mathbb R}} \newcommand{cD}{ {\mathbb D}} \newcommand{dD}{ {\part \mathbb D}} \newcommand{dH}{ {\part \mathbb H}} \newcommand{eC}{\overline {\mathbb C}} \newcommand{A}{\mathcal A} \newcommand{D}{\mathcal D} \newcommand{E}{\mathcal E} \newcommand{F}{\mathcal F} \newcommand{G}{\mathcal G} \newcommand{H}{\mathcal H} \newcommand{J}{\mathcal J} \newcommand{L}{\mathcal L} \newcommand{U}{\mathcal U} \newcommand{M}{\mathcal M} \newcommand{O}{\mathcal O} \newcommand{P}{\mathcal P} \newcommand{S}{\mathcal S} \newcommand{T}{\mathcal T} \newcommand{V}{\mathcal V} \newcommand{W}{\mathcal W} \newcommand{X}{\mathcal X} \newcommand{Y}{\mathcal Y} \newcommand{bE}{\symbf E} \newcommand{bF}{\symbf F} \newcommand{bD}{\symbf D} \newcommand{bI}{\symbf I} \newcommand{bX}{\symbf X} \newcommand{bY}{\symbf Y} \newcommand{nz}{\mathcal Z} \newcommand{bT}{\mathbb T} \newcommand{bB}{\mathbb B} \newcommand{bS}{\mathbb S} \newcommand{bA}{\mathbb A} \newcommand{bL}{\mathbb L} \newcommand{bP}{\symbf P} \newcommand{bM}{\symbf M} \newcommand{bH}{\mathbb H} \newcommand{dd}{\mathrm d} \newcommand{Mu}{\mathup M} \newcommand{Tau}{\mathup T} \newcommand{ae}{\operatorname{a.e.}} \newcommand{aut}{\operatorname{aut}} \newcommand{adj}{\operatorname{adj}} \newcommand{char}{\operatorname{char}} \newcommand{cov}{\operatorname{Cov}} \newcommand{cl}{\operatorname{cl}} \newcommand{cont}{\operatorname{cont}} \newcommand{e}{\mathbb E} \newcommand{pp}{\operatorname{primitive}} \newcommand{dist}{\operatorname{dist}} \newcommand{diam}{\operatorname{diam}} \newcommand{fp}{\operatorname{Fp}} \newcommand{from}{\leftarrow} \newcommand{Gal}{\operatorname{Gal}} \newcommand{GCD}{\operatorname{GCD}} \newcommand{LCM}{\operatorname{LCM}} \newcommand{fg}{\mathrm{fg}} 
\newcommand{gf}{\mathrm{gf}} \newcommand{im}{\operatorname{Im}} \newcommand{image}{\operatorname{image}} \newcommand{inj}{\hookrightarrow} \newcommand{irr}{\operatorname{irr}} \newcommand{lcm}{\operatorname{lcm}} \newcommand{ltrieq}{\mathrel{\unlhd}} \newcommand{ltri}{\mathrel{\lhd}} \newcommand{loc}{ {\operatorname{loc}}} \newcommand{null}{\operatorname{null}} \newcommand{part}{\partial} \newcommand{pf}{\operatorname{Pf}} \newcommand{pv}{\operatorname{Pv}} \newcommand{rank}{\operatorname{rank}} \newcommand{range}{\operatorname{range}} \newcommand{re}{\operatorname{Re}} \newcommand{span}{\operatorname{span}} \newcommand{su}{\operatorname{supp}} \newcommand{sgn}{\operatorname{sgn}} \newcommand{syn}{\operatorname{syn}} \newcommand{var}{\operatorname{Var}} \newcommand{res}{\operatorname{Res}} \newcommand{data}{\operatorname{data}} \newcommand{erfc}{\operatorname{erfc}} \newcommand{erfcx}{\operatorname{erfcx}} \newcommand{tr}{\operatorname{tr}} \newcommand{col}{\operatorname{Col}} \newcommand{row}{\operatorname{Row}} \newcommand{sol}{\operatorname{Sol}} \newcommand{lub}{\operatorname{lub}} \newcommand{glb}{\operatorname{glb}} \newcommand{ltrieq}{\mathrel{\unlhd}} \newcommand{ltri}{\mathrel{\lhd}} \newcommand{lr}{\leftrightarrow} \newcommand{phat}{^\widehat{\,\,\,}} \newcommand{what}{\widehat} \newcommand{wbar}{\overline} \newcommand{wtilde}{\widetilde} \newcommand{iid}{\operatorname{i.i.d.}} \newcommand{Exp}{\operatorname{Exp}} \newcommand{abs}[1]{\left| {#1}\right|} \newcommand{d}[2]{D_{\text{KL}}\left (#1\middle\| #2\right)} \newcommand{n}[1]{\|#1\|} \newcommand{norm}[1]{\left\|{#1}\right\|} \newcommand{pd}[2]{\left \langle {#1},{#2} \right \rangle} \newcommand{argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{argmin}[1]{\underset{#1}{\operatorname{argmin}}} \newcommand{p}[1]{\left({#1}\right)} \newcommand{c}[1]{\left \{ {#1}\right\}} \newcommand{s}[1]{\left [{#1}\right]} \newcommand{a}[1]{\left \langle{#1}\right\rangle} 
\newcommand{cc}[2]{\left(\begin{array}{c} #1 \\ #2 \end{array}\right)} \newcommand{f}{\mathfrak F} \newcommand{fi}{\mathfrak F^{-1}} \newcommand{Fi}{\mathcal F^{-1}} \newcommand{l}{\mathfrak L} \newcommand{li}{\mathfrak L^{-1}} \newcommand{Li}{\mathcal L^{-1}} \newcommand{const}{\text{const.}} $$

Derivative #

(Landau notation)

Suppose $(a _ n) _ {n = 1}^\infty, (c _ n) _ {n = 1}^\infty \subseteq \C$ and $(b _ n) _ {n = 1}^\infty \subseteq (0, \infty)$.

  • $a _ n = O(b _ n) \iff \exists M > 0, \forall n \ge 1: |a _ n| \le M b _ n$.
  • $a _ n = o(b _ n) \iff \lim _ {n \to \infty} a _ n / b _ n = 0$.
  • $a _ n = c _ n + O(b _ n) \iff a _ n - c _ n = O(b _ n)$.
  • $a _ n = c _ n + o(b _ n) \iff a _ n - c _ n = o(b _ n)$.

Suppose $S$ is some topological space and $x _ 0 \in S$. Suppose $a: S \to \C$ and $b: S \to (0, \infty)$.

  • $a(x) = o(b(x))$ as $x \to x _ 0$ iff $\lim _ {x \to x _ 0} a(x) / b(x) = 0$.
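
The sequence definitions above can be checked numerically on a concrete case. The following is a minimal sketch, assuming the hypothetical sequence $a _ n = (3n + 5)/n^2$ (not from the notes): it is $O(1/n)$ with $M = 8$ and $o(1/\sqrt{n})$. A finite sample cannot prove a limit; it only illustrates the two ratios.

```python
import math


def a(n: int) -> float:
    """Hypothetical example sequence a_n = (3n + 5) / n^2."""
    return (3 * n + 5) / n**2


def ratio_for_big_o(n: int) -> float:
    """a_n / b_n with b_n = 1/n; algebraically 3 + 5/n, so M = 8 works for n >= 1."""
    return a(n) * n


def ratio_for_little_o(n: int) -> float:
    """a_n / b_n with b_n = 1/sqrt(n); algebraically (3n + 5) / n^1.5, which tends to 0."""
    return a(n) * math.sqrt(n)


ns = [1, 10, 100, 10_000, 1_000_000]
big_o_ratios = [ratio_for_big_o(n) for n in ns]
little_o_ratios = [ratio_for_little_o(n) for n in ns]

for n, r_big, r_small in zip(ns, big_o_ratios, little_o_ratios):
    print(f"n={n:>9}  a_n/(1/n)={r_big:.6f}  a_n/(1/sqrt(n))={r_small:.6f}")
```

The first ratio stays bounded (it approaches 3), while the second keeps shrinking, which is exactly the difference between $O$ and $o$.
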
Derivative #

Consider $f: [a, b] \to \R$. For $c \in [a, b]$, define $f^ * : [a, b] \setminus \{c\} \to \R$ as $$ f^ * (x) := \frac{f(x) - f(c)}{x - c} $$

  • $f^ * $ is undefined at $c$. When $\lim _ {x \to c} f^ * (x)$ exists in $\R$, the discontinuity of $f^ * $ at $c$ is removable.

When $f'(c) := \lim _ {x \to c} f^ * (x)$ exists in $\R$, $f$ is called differentiable at $c$, and $f'(c)$ is the derivative of $f$ at $c$.

  • $f$ is continuous at $c$ if $f$ is differentiable at $c$.
  • $f': F \to \R$ is called the derivative of $f$, where $F \subseteq [a, b]$ is the set of points $x$ at which $f'(x)$ exists.
  • The $n$-th derivative of $f$, denoted by $f^{(n)}$, is the derivative of $f^{(n-1)}$, with $f^{(0)} := f$.

We extend the definition of $f'(c)$ to the case where $f$ is continuous at $c$ and $\lim _ {x \to c} f^ * (x)$ exists in $\eR$.

For $c = a$, $f'(c)$ is called the right-derivative, also denoted $f' _ {+}(c)$, and $f$ is called right-differentiable at $c$ if $f' _ +(c)$ is defined. Symmetrically, for $c = b$ it is the left-derivative $f' _ -(c)$.
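
To make the limit of $f^ * $ concrete, here is a small numerical sketch. It assumes the hypothetical example $f(x) = x^3$ and $c = 2$ (neither is from the notes), where $f^ * (x) = x^2 + 2x + 4$ and the limit is $12$. The helper name `difference_quotient` is also just illustrative.

```python
def difference_quotient(f, c, x):
    """f*(x) = (f(x) - f(c)) / (x - c), defined only for x != c."""
    if x == c:
        raise ValueError("f* is undefined at x = c")
    return (f(x) - f(c)) / (x - c)


def cube(x):
    """Hypothetical example function f(x) = x^3."""
    return x**3


# Approach c = 2 from both sides with shrinking step sizes.
steps = [10.0**-k for k in range(1, 7)]
from_right = [difference_quotient(cube, 2.0, 2.0 + h) for h in steps]
from_left = [difference_quotient(cube, 2.0, 2.0 - h) for h in steps]

for h, right, left in zip(steps, from_right, from_left):
    print(f"h={h:.0e}  f*(2+h)={right:.8f}  f*(2-h)={left:.8f}")
```

Both columns approach the same value, which is what differentiability at $c$ requires; the quotient itself is never evaluated at $x = c$.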

Derivative chain rule #

Consider the following situation:

  • Suppose $f: S \to \R$ is differentiable at $c \in S^\circ$.
  • Suppose $g: T \to \R$ is differentiable at $f(c) \in T^\circ$.
  • Suppose $f[S] \subseteq T$ and $h = g \circ f: S \to \R$.

Then $h$ is differentiable at $c$ and $h'(c) = g'(f(c))f'(c)$.

  • $f(x) = f(c) + f'(c)(x - c) + R _ 0(x-c)$ for $x \in B(c, \delta _ 0)$, where $R _ 0(t)/t \to 0$ as $t \to 0$.
  • $g(y) = g(f(c)) + g'(f(c)) (y - f(c)) + R _ 1(y - f(c))$ for $y \in B(f(c), \delta _ 1)$, where $R _ 1(0) = 0$ and $R _ 1(t)/t \to 0$ as $t \to 0$.
  • w.l.o.g. take $\delta _ 0$ small enough that $f[B(c, \delta _ 0)] \subseteq B(f(c), \delta _ 1)$, which is possible since $f$ is continuous at $c$.
  • Substituting $y = f(x)$ gives $h(x) = g(f(c)) + g'(f(c))f'(c) (x - c) + g'(f(c)) R _ 0(x - c) + R _ 2(x - c)$.
    • Where $R _ 2(x - c) := R _ 1(f'(c) (x - c) + R _ 0(x - c))$.
  • It remains to show that $\lim _ {x \to c}\frac{R _ 2(x - c)}{x - c} = 0$.
    • For $\delta _ 2 \le \delta _ 0$ small enough and $x \in B(c, \delta _ 2)$, $\abs{f'(c) (x - c) + R _ 0(x - c)}$ is bounded by $K\abs{x - c}$ for some constant $K$. Together with $R _ 1(t)/t \to 0$ this gives the limit.
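
As a sanity check of the formula $h'(c) = g'(f(c))f'(c)$, the sketch below compares it with a symmetric difference quotient of $h$. The functions $f(x) = x^2 + 1$, $g(y) = \sin y$ and the point $c = 0.5$ are hypothetical choices for illustration, and the step size is an assumption that works for this smooth example.

```python
import math


def f(x):
    """Hypothetical inner function f(x) = x^2 + 1."""
    return x * x + 1.0


def f_prime(x):
    return 2.0 * x


def g(y):
    """Hypothetical outer function g(y) = sin(y)."""
    return math.sin(y)


def g_prime(y):
    return math.cos(y)


def h(x):
    """The composition h = g o f."""
    return g(f(x))


def chain_rule_derivative(c):
    """h'(c) as given by the chain rule: g'(f(c)) * f'(c)."""
    return g_prime(f(c)) * f_prime(c)


def symmetric_quotient(func, c, step=1e-6):
    """Numerical estimate (func(c + step) - func(c - step)) / (2 * step)."""
    return (func(c + step) - func(c - step)) / (2.0 * step)


exact = chain_rule_derivative(0.5)
estimate = symmetric_quotient(h, 0.5)
print(f"chain rule: {exact:.10f}  numerical: {estimate:.10f}")
```

The two values agree to many digits here; the numerical estimate is only a check, not a substitute for the proof above.
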
Derivative Arithmetic #

For $f, g: S \to \R$ differentiable at $c \in S^\circ$, the usual calculus rules for $(f + g)', (fg)', (f/g)'$ hold at $c$ (the quotient rule requires $g(c) \neq 0$).

See any Calculus book for the derivations of these rules.

Sets of Differentiable Functions #

Consider $f: S \to \R$ where $S \subseteq \R$.

  • $f \in C^k[S]$ means $f^{(k)}$ exists and is continuous on $S$.
  • $f \in C^\infty[S]$ means $f$ is infinitely differentiable on $S$.
  • $f \in D[S]$ means $f$ is differentiable on $S$, and $f \in D^k[S]$ means $f^{(k)}$ exists on $S$.
  • $f \in D'[S]$ means $f'(x) \in \eR$ is defined on $S$.
Extrema of real functions #

Consider $f: S \to \R$ where $S \subseteq \R$.

  • Minima and maxima of $f$ are extrema.
  • $f$ has a local maximum at $a \in S$ (a local maximum point) if there is some $\delta > 0$ such that $\forall x \in B _ S(a, \delta): f(x) \le f(a)$.
  • $f$ has a global maximum at $a \in S$ (a global maximum point) if $\forall x \in S: f(x) \le f(a)$.

Mean Value Theorems #

Rolle #

Suppose $f: [a, b] \to \R$. Suppose $f \in C[a, b]$ and $f \in D'(a, b)$, and $f(a) = f(b)$.

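Then there is some $c \in (a, b)$ such that $f'(c) = 0$.

  • If $f$ is constant the claim is trivial. Otherwise, w.l.o.g. suppose $f(a) = f(b) = 0$ and $\sup f[a, b] > 0$ (replace $f$ by $-f$ if necessary).
  • There is some $c \in (a, b)$ with $f(c) = \max f[a, b]$.
    • Since $f[a, b]$ is compact, the maximum is attained. It is positive, so it is not attained at $a$ or $b$.
  • $f _ +'(c) \le 0$ and $f' _ -(c) \ge 0$. Since $f'(c)$ exists in $\eR$, both equal $f'(c)$, so $f'(c) = 0$.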
Mean Value Theorem #

Suppose $f: [a, b] \to \R$. Suppose $f \in C[a, b]$ and $f \in D'(a, b)$.

Then there is some $c \in (a, b)$ such that $f(b) - f(a) = f'(c) (b - a)$.

  • Take $g(x) = f(x) - (x-a)(f(b) - f(a))/(b -a)$ then apply (Rolle).

Immediate consequences of the Mean Value Theorem:

  • If $\forall x \in (a, b): f'(x) \gt 0$, $f$ is strictly increasing on $[a, b]$.
  • If $\forall x \in (a, b): f'(x) = 0$, $f$ is constant on $[a, b]$.
  • Suppose $f, g \in D'(a, b)$ and $f, g \in C[a, b]$. If $f' = g'$ with finite values on $(a, b)$, then $f - g$ is constant on $[a, b]$.
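
The theorem only asserts that some $c$ exists. For a concrete function it can be located numerically. The sketch below assumes the hypothetical example $f(x) = x^3$ on $[0, 2]$ and uses bisection on $f'(x) - \frac{f(b) - f(a)}{b - a}$. Bisection is a swapped-in technique that additionally assumes $f'$ is continuous and that this difference changes sign on the interval; the theorem itself needs neither.

```python
import math


def mvt_point(f_prime, slope, lo, hi, tol=1e-12):
    """Find c in [lo, hi] with f'(c) = slope by bisection.

    Assumes f' is continuous and f' - slope changes sign on [lo, hi].
    """
    g_lo = f_prime(lo) - slope
    if g_lo * (f_prime(hi) - slope) > 0:
        raise ValueError("f' - slope does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        g_mid = f_prime(mid) - slope
        if g_lo * g_mid <= 0:
            hi = mid  # a root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid  # a root lies in [mid, hi]
    return (lo + hi) / 2.0


def f(x):
    """Hypothetical example function f(x) = x^3."""
    return x**3


def f_prime(x):
    return 3.0 * x**2


a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)
c = mvt_point(f_prime, secant_slope, a, b)
print(f"c = {c:.10f}, f'(c) * (b - a) = {f_prime(c) * (b - a):.10f}")
```

For this example the equation $3c^2 = 4$ can also be solved by hand, giving $c = 2/\sqrt{3}$, which the numerical value matches.
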
Cauchy Mean Value Theorem #

Suppose $f, g: [a, b] \to \R$.

Suppose $f, g \in C[a, b]$ and $f, g \in D'(a, b)$, and $f'(x)$ and $g'(x)$ are not simultaneously infinite.

Then for some $c \in (a, b)$, $f^{\prime}(c)[g(b)-g(a)]=g^{\prime}(c)[f(b)-f(a)]$.

  • Define $h(x) = f(x)[g(b) - g(a)] - g(x)[f(b) - f(a)]$.
  • Notice that $h(a) = h(b) = f(a) g(b) - f(b) g(a)$.
  • $h \in C[a, b]$ and $h \in D'(a, b)$.
  • The result follows from Rolle.
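
The auxiliary function $h$ from the proof can be inspected on a concrete pair. The following sketch assumes the hypothetical choices $f(x) = x^2$, $g(x) = x^3$ on $[1, 2]$, for which $h(x) = 7x^2 - 3x^3$ and solving $h'(c) = 14c - 9c^2 = 0$ by hand gives $c = 14/9$.

```python
def f(x):
    """Hypothetical example f(x) = x^2."""
    return x**2


def g(x):
    """Hypothetical example g(x) = x^3."""
    return x**3


def f_prime(x):
    return 2.0 * x


def g_prime(x):
    return 3.0 * x**2


a, b = 1.0, 2.0


def h(x):
    """Auxiliary function from the proof: f(x)[g(b) - g(a)] - g(x)[f(b) - f(a)]."""
    return f(x) * (g(b) - g(a)) - g(x) * (f(b) - f(a))


# Interior point obtained by solving h'(c) = 0 by hand for this example.
c = 14.0 / 9.0

lhs = f_prime(c) * (g(b) - g(a))
rhs = g_prime(c) * (f(b) - f(a))
print(f"h(a) = {h(a)}, h(b) = {h(b)}")
print(f"f'(c)[g(b) - g(a)] = {lhs:.12f}, g'(c)[f(b) - f(a)] = {rhs:.12f}")
```

Here $h(a) = h(b) = f(a)g(b) - f(b)g(a)$, as stated in the proof, and the two sides of the conclusion agree at $c = 14/9$.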

Intermediate Value Theorem of Derivative #

Intermediate Value Theorem of Derivative #

Consider $f: (a, b) \to \R$. Suppose $f \in C(a, b)$ and $f \in D'(a, b)$.

Then $f'$ attains every intermediate value: the image $f'(a, b)$ is an interval in $\eR$.

  • Define the triangular region $S := \c{(x, y) \in \R^2: a < x < y < b}$.
    • Clearly $S$ is open and connected.
  • Consider the function $g: S \to \R$ defined as $$ g(x, y) := \frac{f(y) - f(x)}{y - x} $$
    • $g$ is continuous on $S$.
    • Since $S$ is connected, $g[S]$ is connected, i.e. an interval.
    • According to the Mean Value Theorem, $g[S] \subseteq f'(a, b)$.
  • Notice that $g[S] \subseteq f'(a, b) \subseteq \overline{g[S]}$, with the closure taken in $\eR$.
    • Each $f'(x)$ is the limit of $g(x, y)$ as $y \to x^+$, so $g[S]$ is dense in $f'(a, b)$, even though $g[S] \neq f'(a, b)$ in general.
  • A set lying between an interval and its closure is itself an interval, so $f'(a, b)$ contains every value between any two of its elements.
Discontinuities of derivative #

Consider $f: (a, b) \to \R$. Suppose $f \in D'(a, b)$.

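$f'$ does not have jump or removable discontinuities.

  • Either kind would force $f'$ to skip values near the discontinuity, contradicting the intermediate value property of $f'$.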
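
A derivative can still be discontinuous, just not in those two ways. The sketch below uses the classical example $f(x) = x^2 \sin(1/x)$ with $f(0) = 0$ (an assumption for illustration, not taken from the notes), for which $f'(0) = 0$ and $f'(x) = 2x\sin(1/x) - \cos(1/x)$ otherwise. Sampling along two hypothetical sequences tending to $0$ shows that $f'$ keeps returning to values near $-1$ and $+1$.

```python
import math


def f_prime(x):
    """Derivative of f(x) = x^2 sin(1/x), with f(0) = 0."""
    if x == 0.0:
        return 0.0  # f*(x) = x sin(1/x) tends to 0 as x -> 0
    return 2.0 * x * math.sin(1.0 / x) - math.cos(1.0 / x)


ks = [1, 10, 100, 1000]

# At x = 1/(2 pi k): sin(1/x) is about 0 and cos(1/x) is about 1.
near_minus_one = [f_prime(1.0 / (2.0 * math.pi * k)) for k in ks]

# At x = 1/((2k + 1) pi): sin(1/x) is about 0 and cos(1/x) is about -1.
near_plus_one = [f_prime(1.0 / ((2.0 * k + 1.0) * math.pi)) for k in ks]

for k, low, high in zip(ks, near_minus_one, near_plus_one):
    print(f"k={k:>4}  f'(1/(2 pi k))={low:+.6f}  f'(1/((2k+1) pi))={high:+.6f}")
```

Since $f'$ takes values near both $-1$ and $+1$ arbitrarily close to $0$, $\lim _ {x \to 0} f'(x)$ does not exist, so the discontinuity at $0$ is neither a jump nor removable.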

Monotonicity and derivative #

Consider $f: (a, b) \to \R$. Suppose $f \in D'(a, b)$.

Suppose $0 \notin f'(a, b)$. Then $f$ is strictly monotonic on $(a, b)$.

  • By the intermediate value property, either $f'(a, b) \subseteq (0, \infty]$ or $f'(a, b) \subseteq [-\infty, 0)$.

Suppose $f'$ is monotonic on $(a, b)$. Then $f' \in C(a, b)$.

  • A monotonic function can only have jump discontinuities, and $f'$ has none.

Convexity #

Convex real functions #

Suppose $g: I \to \R$ where $I$ is an open interval in $\R$. $g$ is called convex if $$ \forall x, y \in I, \forall a \in [0, 1]: g(ax + (1 - a)y) \le ag(x) + (1 - a)g(y) $$

  • Intuitively, the graph of $g$ lies on or below every chord, i.e. every line segment joining two points of the graph.

Suppose $g$ is convex. Then the following hold:

  • $g _ +'$ exists on $I$.
    • $\forall 0 < a < b: [g(x+a)-g(x)]/a \leq [g(x+b)-g(x)]/b$.
  • $g' _ -$ exists on $I$.
    • $\forall 0 < a < b: [g(x) - g(x-b)]/b \leq [g(x) - g(x-a)]/a$.
  • $g$ is continuous on $I$, since $g' _ +$ and $g' _ -$ exist and are finite on $I$.
  • $\forall x \in I: g' _ -(x) \le g' _ +(x)$.
    • For $a > 0$ and $b > 0$, $[g(x) - g(x - a)]/a \le [g(x + b) - g(x)]/b$.
  • $g' _ +, g' _ -$ are increasing on $I$.
    • For $x _ 1 < x _ 2$: $g _ +'(x _ 1) \le [g(x _ 2) - g(x _ 1)]/(x _ 2 - x _ 1) \le g _ -'(x _ 2) \le g _ +'(x _ 2)$.
  • For any $c \in I$ there exists an affine $L _ c: I \to \R$ with $L _ c(c) = g(c)$ and $\forall x \in I: L _ c(x) \le g(x)$.
    • There exists some $k$ with $g _ -'(c) \le k \le g _ +'(c)$. Take $L _ c(x) := g(c) + k(x - c)$.
    • Clearly $\forall x\in I: g(x) \ge g(c) + k(x - c)$.
  • Such a function $L _ c(x)$ is called a line of support for $g$ at $c$.
    • $g(x) = \sup _ {c \in I \cap \Q}L _ c(x)$.
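
Lines of support need not be unique where $g' _ - < g' _ +$. The sketch below assumes the hypothetical example $g(x) = \abs{x}$ at $c = 0$, where $g' _ -(0) = -1$ and $g' _ +(0) = 1$, and checks on a finite grid that every slope $k \in [-1, 1]$ gives a line below $g$ while a steeper slope does not. A grid check is only an illustration, not a proof.

```python
def g(x):
    """Hypothetical convex example g(x) = |x|."""
    return abs(x)


def support_line(c, k):
    """Return the affine function L_c(x) = g(c) + k * (x - c)."""
    return lambda x: g(c) + k * (x - c)


# Sample points in [-3, 3]; the interval and spacing are arbitrary choices.
grid = [i / 10.0 for i in range(-30, 31)]


def lies_below_g(line):
    """Check line(x) <= g(x) on the grid, with a small floating point tolerance."""
    return all(line(x) <= g(x) + 1e-12 for x in grid)


slopes_in_range = (-1.0, -0.5, 0.0, 0.5, 1.0)
valid = [lies_below_g(support_line(0.0, k)) for k in slopes_in_range]
too_steep = lies_below_g(support_line(0.0, 1.5))

print("slopes within [g'_-(0), g'_+(0)] give support lines:", valid)
print("slope 1.5 gives a support line:", too_steep)
```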

Taylor's Theorem #

Taylor polynomial #

Consider $f: [a, b] \to \R$. Suppose $f \in D^{n}[a, b]$, where $n \in \N$.

Define the $n$-th order Taylor polynomial $P _ n: [a, b] \to \R$: $$ P _ n(x) := \sum _ {k=0}^n \frac{f^{(k)}(a)}{k!} (x - a)^k = f(a) + f'(a)(x - a) + \frac{f''(a)}{2} (x - a)^2 + \cdots $$ The $n$-th order Taylor remainder is $R _ n(x) := f(x) - P _ n(x)$.

Lagrange remainder #

Consider $f: [a, b] \to \R$. Suppose $f \in D^{n}[a, b]$ and $f \in D^{n+1}(a, b)$. Then $$ \forall x \in (a, b],\exist \xi _ x \in (a, x): R _ n(x) = \frac{f^{(n+1)}(\xi _ x)}{(n+1)!}(x - a)^{n+1} $$

  • Define $F(x) := R _ n(x) = f(x) - P _ n(x)$ and $G(x):= (x - a)^{n + 1}$.
    • $F(a) = F'(a) = \cdots = F^{(n)}(a) = 0$.
    • $G(a) = G'(a) = \cdots = G^{(n)}(a) = 0$.
    • $F^{(n+1)}(x) = f^{(n+1)}(x)$ for $x \in (a, b)$.
    • $G^{(n+1)}(x) = (n+1)!$
  • By Cauchy's MVT applied $n + 1$ times, we have (division should be expanded to products): $$ \frac{F(x)}{G(x)} = \frac{F'(c _ 1)}{G'(c _ 1)} = \cdots = \frac{F^{(n+1)}(c _ {n+1})}{G^{(n+1)}(c _ {n+1})}; \quad a < c _ {n+1} < \cdots < c _ 1 < x $$
  • Taking $\xi _ x := c _ {n+1}$ gives $R _ n(x) = F(x) = \frac{f^{(n+1)}(\xi _ x)}{(n+1)!}(x - a)^{n+1}$.
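
For a function whose $(n+1)$-th derivative is invertible, the point $\xi _ x$ can be recovered explicitly. The sketch below assumes the hypothetical example $f = \exp$ with $a = 0$, so that $R _ n(x) = e^{\xi _ x} x^{n+1}/(n+1)!$ and hence $\xi _ x = \log\left(R _ n(x)(n+1)!/x^{n+1}\right)$. It then checks that this $\xi _ x$ lies in $(0, x)$ for a few sample values; the helper names are illustrative only.

```python
import math


def taylor_poly_exp(n, x):
    """P_n(x) for f = exp expanded at a = 0: sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))


def lagrange_xi(n, x):
    """Solve R_n(x) = exp(xi) * x^(n+1) / (n+1)! for xi, with x > 0."""
    remainder = math.exp(x) - taylor_poly_exp(n, x)
    return math.log(remainder * math.factorial(n + 1) / x ** (n + 1))


for n in (1, 2, 3, 5):
    for x in (0.5, 1.0, 2.0):
        xi = lagrange_xi(n, x)
        print(f"n={n}  x={x}  xi_x={xi:.6f}  in (0, x): {0.0 < xi < x}")
```

The recovered $\xi _ x$ depends on both $n$ and $x$, as the notation suggests, and for large $n$ the subtraction $e^x - P _ n(x)$ loses floating point accuracy, so the sketch keeps $n$ small.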