$\approx$
### Theorem
The Chain Rule. If $f(u)$ is differentiable at the point $u=g(x)$ and $g(x)$ is differentiable at $x$, then the composite function $(f \circ g)(x)=f(g(x))$ is differentiable at $x$, and
$
(f \circ g)^{\prime}(x)=f^{\prime}(g(x)) \cdot g^{\prime}(x)
$
In Leibniz's notation, if $y=f(u)$ and $u=g(x)$, then
$
\frac{d y}{d x}=\frac{d y}{d u} \cdot \frac{d u}{d x}
$
##### Alternate Notation
$\begin{aligned} \frac{d y}{d x} &=f^{\prime}(g(x)) \cdot g^{\prime}(x) \\ \frac{d}{d x} f(u) &=f^{\prime}(u) \frac{d u}{d x} \end{aligned}$
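As a worked illustration of the Leibniz form (the function here is an arbitrary choice, not taken from the theorem statement), take $y=\left(3 x^{2}+1\right)^{5}$ and set $u=3 x^{2}+1$, so that $y=u^{5}$:
$
\begin{aligned}
\frac{d y}{d u} &=5 u^{4}, \qquad \frac{d u}{d x}=6 x \\
\frac{d y}{d x} &=\frac{d y}{d u} \cdot \frac{d u}{d x}=5\left(3 x^{2}+1\right)^{4} \cdot 6 x=30 x\left(3 x^{2}+1\right)^{4}
\end{aligned}
$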
#### Proof
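Let $\Delta u=g(x+\Delta x)-g(x)$ and $\Delta y=f(u+\Delta u)-f(u)$, and assume $\Delta u \neq 0$ for all sufficiently small $\Delta x \neq 0$. Since $g$ is differentiable at $x$, it is continuous there, so $\Delta u \rightarrow 0$ as $\Delta x \rightarrow 0$.
$
\begin{aligned}
\frac{\Delta y}{\Delta x}&=\frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}\\
\frac{d y}{d x} &=\lim _{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x} \\
&=\lim _{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x} \\
&=\lim _{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta u} \cdot \lim _{\Delta x \rightarrow 0} \frac{\Delta u}{\Delta x} \\
&=\lim _{\Delta u \rightarrow 0} \frac{\Delta y}{\Delta u} \cdot \lim _{\Delta x \rightarrow 0} \frac{\Delta u}{\Delta x} \\
&=\frac{d y}{d u} \cdot \frac{d u}{d x}
\end{aligned}
$
This argument breaks down if $\Delta u=0$ for arbitrarily small $\Delta x$, because $\frac{\Delta y}{\Delta u}$ is then undefined. The second proof, which follows, does not need that assumption.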
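To see that the requirement $\Delta u \neq 0$ is a real restriction, here is one standard example, offered as an illustration and not as part of the argument above. Take $g(x)=x^{2} \sin (1 / x)$ for $x \neq 0$ and $g(0)=0$. At $x=0$:
$
\begin{aligned}
g^{\prime}(0) &=\lim _{\Delta x \rightarrow 0} \frac{(\Delta x)^{2} \sin (1 / \Delta x)}{\Delta x}=\lim _{\Delta x \rightarrow 0} \Delta x \sin (1 / \Delta x)=0 \\
\Delta u &=g(\Delta x)-g(0)=(\Delta x)^{2} \sin (1 / \Delta x)=0 \quad \text{whenever } \Delta x=\frac{1}{n \pi},\ n=1,2,3, \ldots
\end{aligned}
$
So $g$ is differentiable at $0$, yet $\Delta u=0$ for values of $\Delta x$ arbitrarily close to $0$, and the quotient $\frac{\Delta y}{\Delta u}$ cannot be formed there.
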
---
Let $y=f(u)$ and $u=g(x)$, with $g$ differentiable at $x_{0}$ and $f$ differentiable at $u_{0}=g\left(x_{0}\right)$. Let $\Delta x$ be an increment in $x$, and let $\Delta u$ and $\Delta y$ be the corresponding increments in $u$ and $y$.
Using [[Change in y near x=a]],
$
\Delta u=g^{\prime}\left(x_{0}\right) \Delta x+\epsilon_{1} \Delta x=\left(g^{\prime}\left(x_{0}\right)+\epsilon_{1}\right) \Delta x
$
where $\epsilon_{1} \rightarrow 0$ as $\Delta x \rightarrow 0$. Similarly,
$
\Delta y=f^{\prime}\left(u_{0}\right) \Delta u+\epsilon_{2} \Delta u=\left(f^{\prime}\left(u_{0}\right)+\epsilon_{2}\right) \Delta u
$
where $\epsilon_{2} \rightarrow 0$ as $\Delta u \rightarrow 0$. Notice also that $\Delta u \rightarrow 0$ as $\Delta x \rightarrow 0$, so $\epsilon_{2} \rightarrow 0$ as $\Delta x \rightarrow 0$ as well. Substituting $\Delta u=\left(g^{\prime}\left(x_{0}\right)+\epsilon_{1}\right) \Delta x$ into the equation for $\Delta y$ gives
$
\Delta{y}=\left(f^{\prime}\left(u_{0}\right)+\epsilon_{2}\right)\left(g^{\prime}\left(x_{0}\right)+\epsilon_{1}\right) \Delta x
$
so, dividing by $\Delta x$ and expanding the product,
$\frac{\Delta y}{\Delta x}=f^{\prime}\left(u_{0}\right) g^{\prime}\left(x_{0}\right)+\epsilon_{2} g^{\prime}\left(x_{0}\right)+f^{\prime}\left(u_{0}\right) \epsilon_{1}+\epsilon_{2} \epsilon_{1}$
Every term containing an $\epsilon$ approaches 0 as $\Delta x \rightarrow 0$, so
$\left.\frac{d y}{d x}\right|_{x=x_{0}}=\lim _{\Delta x \rightarrow 0} \frac{\Delta y}{\Delta x}=f^{\prime}\left(u_{0}\right) g^{\prime}\left(x_{0}\right)=f^{\prime}\left(g\left(x_{0}\right)\right) \cdot g^{\prime}\left(x_{0}\right)$
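The limit above can be sanity-checked numerically. The following is a minimal sketch, not part of the proof: the choices $f(u)=\sin u$, $g(x)=x^{2}$, and $x_{0}=1$ are assumptions made purely for illustration, and the function names are hypothetical. It compares the difference quotient $\frac{\Delta y}{\Delta x}$ with $f^{\prime}\left(g\left(x_{0}\right)\right) \cdot g^{\prime}\left(x_{0}\right)$ as $\Delta x$ shrinks.

```python
import math

# Illustrative choices (assumptions, not from the text):
# inner function u = g(x) = x^2, outer function y = f(u) = sin(u).
def g(x):
    return x ** 2

def f(u):
    return math.sin(u)

def chain_rule_derivative(x0):
    """f'(g(x0)) * g'(x0), using f'(u) = cos(u) and g'(x) = 2x derived by hand."""
    return math.cos(g(x0)) * 2 * x0

def difference_quotient(x0, dx):
    """Delta y / Delta x for the composite y = f(g(x)) at x0."""
    return (f(g(x0 + dx)) - f(g(x0))) / dx

x0 = 1.0
exact = chain_rule_derivative(x0)

errors = []
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(x0, dx)
    errors.append(abs(approx - exact))
    print(f"dx={dx:g}  quotient={approx:.6f}  error={errors[-1]:.2e}")
```

The gap between the quotient and the chain-rule value should shrink along with $\Delta x$, which mirrors the $\epsilon$ terms vanishing in the proof.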
## Notes
When a function is a composition of more than two functions, the chain rule is applied repeatedly, once for each layer of composition. Understanding the chain rule is also necessary for recognizing any antiderivative beyond the elementary ones.
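As a sketch of a repeated application (the function is an arbitrary example chosen for illustration), take $y=\cos ^{2}(3 x)$ and write it as $y=u^{2}$, $u=\cos v$, $v=3 x$:
$
\begin{aligned}
\frac{d y}{d x} &=\frac{d y}{d u} \cdot \frac{d u}{d v} \cdot \frac{d v}{d x} \\
&=2 u \cdot(-\sin v) \cdot 3 \\
&=-6 \cos (3 x) \sin (3 x)
\end{aligned}
$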