Mean squared error

In statistics, the mean squared error ( $\operatorname {MSE}$ ) of an estimator ${\hat {\theta }}$ of a one-dimensional parameter $\theta$ is a measure of the "accuracy" of that estimator. It is often shortened to "squared error" ("mean" being understood) and is sometimes also called "quadratic risk".

The mean squared error is defined by:

Definition: $\operatorname {MSE} ({\hat {\theta }})\,{\overset {\text{def}}{=}}\,\mathbb {E} \left[({\hat {\theta }}-\theta )^{2}\right]$

Properties

Expression

The mean squared error can be expressed in terms of the bias and the variance of the estimator:

Theorem: $\operatorname {MSE} ({\hat {\theta }})=\operatorname {Bias} ({\hat {\theta }})^{2}+\operatorname {Var} ({\hat {\theta }})$

Proof

Recall first that $\operatorname {Bias} ({\hat {\theta }})\,{\overset {\text{def}}{=}}\,\mathbb {E} ({\hat {\theta }})-\theta$ and $\mathbb {E} ({\hat {\theta }})$ are constants, which allows us to use the linearity of expectation: $\mathbb {E} (c_{1}X+c_{2})=c_{1}\mathbb {E} (X)+c_{2}$ .

${\begin{aligned}\operatorname {MSE} ({\hat {\theta }})\,{\overset {\text{def}}{=}}\,\mathbb {E} \left[({\hat {\theta }}-\theta )^{2}\right]&=\mathbb {E} \left[\left({\hat {\theta }}-\mathbb {E} ({\hat {\theta }})+\operatorname {Bias} ({\hat {\theta }})\right)^{2}\right]\\&=\mathbb {E} \left[\left({\hat {\theta }}-\mathbb {E} ({\hat {\theta }})\right)^{2}+2\left({\hat {\theta }}-\mathbb {E} ({\hat {\theta }})\right)\operatorname {Bias} ({\hat {\theta }})+\operatorname {Bias} ({\hat {\theta }})^{2}\right]\\&=\mathbb {E} \left[\left({\hat {\theta }}-\mathbb {E} ({\hat {\theta }})\right)^{2}\right]+2\mathbb {E} \left({\hat {\theta }}-\mathbb {E} ({\hat {\theta }})\right)\operatorname {Bias} ({\hat {\theta }})+\operatorname {Bias} ({\hat {\theta }})^{2}\\&=\operatorname {Var} ({\hat {\theta }})+2\left(\mathbb {E} ({\hat {\theta }})-\mathbb {E} ({\hat {\theta }})\right)\operatorname {Bias} ({\hat {\theta }})+\operatorname {Bias} ({\hat {\theta }})^{2}\\&=\operatorname {Var} ({\hat {\theta }})+\operatorname {Bias} ({\hat {\theta }})^{2}\end{aligned}}$
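
The decomposition can also be checked numerically. The sketch below is only an illustration under assumed settings: it simulates a deliberately biased estimator (a sample mean of normal draws multiplied by a hypothetical shrink factor of 0.8) and compares the empirical mean squared error with the squared empirical bias plus the empirical variance. The function names and all numerical values are assumptions made for this example, not part of the source.

```python
import random


def mse_decomposition(estimates, theta):
    """Return (mse, bias, variance) computed from simulated estimates."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    mse = sum((e - theta) ** 2 for e in estimates) / m
    bias = mean_est - theta
    # Variance with divisor m (not m - 1), so the identity holds exactly.
    variance = sum((e - mean_est) ** 2 for e in estimates) / m
    return mse, bias, variance


def simulate_shrunk_means(theta=3.0, sigma=2.0, n=5, shrink=0.8,
                          replications=10_000, seed=0):
    """Simulate shrink * (sample mean) for normal samples of size n."""
    rng = random.Random(seed)
    results = []
    for _ in range(replications):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        results.append(shrink * sum(sample) / n)
    return results


estimates = simulate_shrunk_means()
mse, bias, variance = mse_decomposition(estimates, theta=3.0)
print("empirical MSE      :", mse)
print("bias^2 + variance  :", bias ** 2 + variance)
```

With empirical moments the two printed values agree up to floating-point rounding, because the algebra of the proof applies equally to averages over the simulated values.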

Sign

Corollary: Since the variance and the squared bias are both non-negative, $\operatorname {MSE} ({\hat {\theta }})\geq 0$ .

Minimization

Theorem: Let ${\bar {\theta }}$ be an unbiased estimator of the parameter $\theta$ such that $\operatorname {MSE} ({\bar {\theta }})>0$ (if the mean squared error is zero, it is already minimal; see the "Sign" section above).

Among all estimators proportional to ${\bar {\theta }}$ , the mean squared error is minimal for the estimator ${\check {\theta }}\,{\overset {\text{def}}{=}}\,{\frac {\theta ^{2}}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}{\bar {\theta }}$ .

This minimal mean squared error is $\operatorname {MSE} ({\check {\theta }})={\frac {\theta ^{2}\operatorname {MSE} ({\bar {\theta }})}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}$ .

Proof

By definition of an unbiased estimator, $\mathbb {E} ({\bar {\theta }})=\theta$ , so the bias is zero and $\operatorname {Var} ({\bar {\theta }})=\operatorname {MSE} ({\bar {\theta }})$ .

Let ${\hat {\theta }}_{\alpha }=\alpha {\bar {\theta }}$ ; then:

by linearity of expectation, $\mathbb {E} ({\hat {\theta }}_{\alpha })=\mathbb {E} (\alpha {\bar {\theta }})=\alpha \mathbb {E} ({\bar {\theta }})=\alpha \theta$ ;
by homogeneity of the variance, $\operatorname {Var} ({\hat {\theta }}_{\alpha })=\operatorname {Var} (\alpha {\bar {\theta }})=\alpha ^{2}\operatorname {Var} ({\bar {\theta }})=\alpha ^{2}\operatorname {MSE} ({\bar {\theta }})$ ;

hence, by the bias-variance expression above, $\operatorname {MSE} ({\hat {\theta }}_{\alpha })=(\alpha \theta -\theta )^{2}+\alpha ^{2}\operatorname {MSE} ({\bar {\theta }})=(\alpha -1)^{2}\theta ^{2}+\alpha ^{2}\operatorname {MSE} ({\bar {\theta }})$ .

Differentiating with respect to $\alpha$ gives $\operatorname {MSE} '({\hat {\theta }}_{\alpha })=2(\alpha -1)\theta ^{2}+2\alpha \operatorname {MSE} ({\bar {\theta }})=2\left(\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})\right)\alpha -2\theta ^{2}$ .

Since we assumed $\operatorname {MSE} ({\bar {\theta }})>0$ , this derivative is an affine function of $\alpha$ with strictly positive slope. It therefore vanishes at $\alpha _{0}={\frac {\theta ^{2}}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}$ , is strictly negative for $\alpha <\alpha _{0}$ , and is strictly positive for $\alpha >\alpha _{0}$ , so $\operatorname {MSE} ({\hat {\theta }}_{\alpha })$ attains its minimum at $\alpha _{0}$ .

The mean squared error is therefore minimal for ${\hat {\theta }}_{\alpha _{0}}={\frac {\theta ^{2}}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}{\bar {\theta }}\,{\overset {\text{def}}{=}}\,{\check {\theta }}$ .

This minimum equals:

${\begin{aligned}\operatorname {MSE} ({\check {\theta }})&=\operatorname {MSE} ({\hat {\theta }}_{\alpha _{0}})\\&=(\alpha _{0}-1)^{2}\theta ^{2}+\alpha _{0}^{2}\operatorname {MSE} ({\bar {\theta }})\\&=\left(-{\frac {\operatorname {MSE} ({\bar {\theta }})}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}\right)^{2}\theta ^{2}+\left({\frac {\theta ^{2}}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}\right)^{2}\operatorname {MSE} ({\bar {\theta }})\\&={\frac {\theta ^{2}\operatorname {MSE} ({\bar {\theta }})^{2}+\theta ^{4}\operatorname {MSE} ({\bar {\theta }})}{\left(\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})\right)^{2}}}\\&={\frac {\left(\theta ^{2}\operatorname {MSE} ({\bar {\theta }})\right)\left(\operatorname {MSE} ({\bar {\theta }})+\theta ^{2}\right)}{\left(\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})\right)^{2}}}\\&={\frac {\theta ^{2}\operatorname {MSE} ({\bar {\theta }})}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}\end{aligned}}$

Remark: since the value of $\theta$ is unknown by nature (otherwise one would not be looking for an estimator of it), this formula is of practical use only if the coefficient ${\tfrac {\theta ^{2}}{\theta ^{2}+\operatorname {MSE} ({\bar {\theta }})}}$ simplifies to a constant independent of $\theta$ , that is, if and only if $\operatorname {MSE} ({\bar {\theta }})$ is proportional to $\theta ^{2}$ (see the example below).
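
The minimization result can be illustrated with exact arithmetic. The following sketch computes the coefficient $\alpha _{0}$ and the minimal mean squared error from the two formulas of the theorem, then compares them with the quadratic $(\alpha -1)^{2}\theta ^{2}+\alpha ^{2}\operatorname {MSE} ({\bar {\theta }})$ evaluated on a grid of values of $\alpha$ . The function names and the values $\theta =2$ and $\operatorname {MSE} ({\bar {\theta }})=4$ are hypothetical, chosen only for this example.

```python
from fractions import Fraction


def optimal_shrinkage(theta, mse_unbiased):
    """Return (alpha0, minimal MSE) from the formulas of the theorem."""
    theta2 = theta * theta
    alpha0 = theta2 / (theta2 + mse_unbiased)
    min_mse = theta2 * mse_unbiased / (theta2 + mse_unbiased)
    return alpha0, min_mse


def mse_scaled(alpha, theta, mse_unbiased):
    """MSE of alpha * (unbiased estimator): squared bias plus variance."""
    return (alpha - 1) ** 2 * theta ** 2 + alpha ** 2 * mse_unbiased


# Hypothetical values: theta = 2 and MSE of the unbiased estimator = 4.
theta, mse_bar = Fraction(2), Fraction(4)
alpha0, min_mse = optimal_shrinkage(theta, mse_bar)

# Evaluate the MSE on a grid of alpha values between 0 and 1.
grid = [Fraction(k, 20) for k in range(21)]
grid_values = [mse_scaled(a, theta, mse_bar) for a in grid]

print("alpha0      :", alpha0)
print("minimal MSE :", min_mse)
print("grid minimum:", min(grid_values))
```

In this example the unscaled estimator ( $\alpha =1$ ) has a mean squared error of 4, while the optimally scaled one reaches 2, at the cost of knowing $\theta$ .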

Uses

Comparison of estimators

If the two estimators being compared are unbiased, the more efficient one is simply the one with the smaller variance. Likewise, if one estimator has both a larger bias (in absolute value) and a larger variance than another, the latter is clearly better.

However, if one estimator has a larger bias (in absolute value) but a smaller variance, the comparison is no longer immediate: the mean squared error then settles the question.

Example:

Let us compare the two most common estimators of the variance:

$s_{n-1}^{2}\,{\overset {\text{def}}{=}}\,{\frac {1}{n-1}}\sum _{i=1}^{n}\left(y_{i}-{\overline {y}}\right)^{2}$

and

$s_{n}^{2}\,{\overset {\text{def}}{=}}\,{\frac {1}{n}}\sum _{i=1}^{n}\left(y_{i}-{\overline {y}}\right)^{2}={\frac {n-1}{n}}s_{n-1}^{2}$

For sampling with replacement from a probability distribution whose normalized (excess) kurtosis is assumed to be zero [note 1] (e.g. the normal distribution), calculations show that (see Greene, section C.5.1):

$\mathbb {E} (s_{n-1}^{2})=\sigma ^{2}$ , hence $\operatorname {Bias} (s_{n-1}^{2})=0$ , and $\operatorname {Var} (s_{n-1}^{2})={\frac {2\sigma ^{4}}{n-1}}$ , hence $\operatorname {MSE} (s_{n-1}^{2})={\frac {2\sigma ^{4}}{n-1}}$ ;

$\mathbb {E} (s_{n}^{2})={\frac {n-1}{n}}\mathbb {E} (s_{n-1}^{2})={\frac {n-1}{n}}\sigma ^{2}$ , hence $\operatorname {Bias} (s_{n}^{2})=-{\frac {\sigma ^{2}}{n}}$ , and $\operatorname {Var} (s_{n}^{2})=\left({\frac {n-1}{n}}\right)^{2}\operatorname {Var} (s_{n-1}^{2})=\left({\frac {n-1}{n}}\right)^{2}{\frac {2\sigma ^{4}}{n-1}}={\frac {2(n-1)\sigma ^{4}}{n^{2}}}$ , hence $\operatorname {MSE} (s_{n}^{2})={\frac {\sigma ^{4}}{n^{2}}}+{\frac {2(n-1)\sigma ^{4}}{n^{2}}}={\frac {(2n-1)\sigma ^{4}}{n^{2}}}$ .

The estimator $s_{n-1}^{2}$ is unbiased but has a larger variance (lower efficiency) than the estimator $s_{n}^{2}$ .

Comparing the mean squared errors gives:

$\operatorname {MSE} (s_{n}^{2})-\operatorname {MSE} (s_{n-1}^{2})=\sigma ^{4}\left({\frac {2n-1}{n^{2}}}-{\frac {2}{n-1}}\right)=-{\frac {(3n-1)\sigma ^{4}}{n^{2}(n-1)}}<0$

The biased estimator $s_{n}^{2}$ is therefore better in terms of mean squared error.

Still in the case of sampling with replacement and zero excess kurtosis, applying the minimization theorem given above to the unbiased estimator $s_{n-1}^{2}$ shows that, among the estimators proportional to $s_{n-1}^{2}$ , the estimator $s_{n+1}^{2}={\frac {n}{n+1}}s_{n}^{2}={\frac {n-1}{n+1}}s_{n-1}^{2}$ minimizes the mean squared error, which then equals ${\frac {2\sigma ^{4}}{n+1}}$ .
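
The three mean squared errors of this example can be reproduced with exact fractions. The sketch below assumes the setting of the example (sampling with replacement, zero excess kurtosis), in which $\mathbb {E} (s_{n-1}^{2})=\sigma ^{2}$ and $\operatorname {Var} (s_{n-1}^{2})={\frac {2\sigma ^{4}}{n-1}}$ , and evaluates the bias-variance expression for the estimator that divides the sum of squared deviations by an arbitrary divisor. The function name and the choice $n=10$ , $\sigma ^{2}=1$ are assumptions made for illustration.

```python
from fractions import Fraction


def mse_variance_estimator(n, divisor, sigma2=Fraction(1)):
    """Exact MSE of sum((y_i - ybar)^2) / divisor, assuming zero excess
    kurtosis and sampling with replacement (as in the example above)."""
    # The estimator equals (n - 1) / divisor times the unbiased s_{n-1}^2.
    scale = Fraction(n - 1, divisor)
    bias = (scale - 1) * sigma2
    # Var(s_{n-1}^2) = 2 sigma^4 / (n - 1), scaled by scale^2.
    variance = scale ** 2 * Fraction(2, n - 1) * sigma2 ** 2
    return bias ** 2 + variance


n = 10  # hypothetical sample size
mse_unbiased = mse_variance_estimator(n, n - 1)   # divisor n - 1
mse_biased = mse_variance_estimator(n, n)         # divisor n
mse_minimal = mse_variance_estimator(n, n + 1)    # divisor n + 1

print("divisor n-1:", mse_unbiased)
print("divisor n  :", mse_biased)
print("divisor n+1:", mse_minimal)
```

For $n=10$ and $\sigma ^{2}=1$ the three values are 2/9, 19/100 and 2/11, in decreasing order, in agreement with the closed-form expressions above.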

Convergence of the estimator

The mean squared error gives a sufficient condition for an estimator ${\hat {\theta }}$ , computed from a sample of size $n$ , to converge in probability:

Theorem: $\left[\left(\lim _{n\to \infty }\mathbb {E} ({\hat {\theta }})=\theta \quad \mathbf {and} \quad \lim _{n\to \infty }\operatorname {Var} ({\hat {\theta }})=0\right)\Leftrightarrow \lim _{n\to \infty }\operatorname {MSE} ({\hat {\theta }})=0\right]\Rightarrow {\hat {\theta }}{\xrightarrow {p}}\theta$

The proof is given on the page on convergence of random variables.
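
The final implication can be made concrete through Markov's inequality applied to $({\hat {\theta }}-\theta )^{2}$ , which gives $\mathbb {P} (|{\hat {\theta }}-\theta |\geq \varepsilon )\leq \operatorname {MSE} ({\hat {\theta }})/\varepsilon ^{2}$ for every $\varepsilon >0$ . The sketch below only evaluates this bound; it is not the proof referred to above. It uses, as an assumed example, the sample mean of $n$ independent draws with variance $\sigma ^{2}$ , whose mean squared error is $\sigma ^{2}/n$ . The function name and the numerical values are hypothetical.

```python
def deviation_probability_bound(mse, epsilon):
    """Upper bound on P(|estimator - theta| >= epsilon) from Markov's
    inequality applied to the squared error, capped at 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be strictly positive")
    return min(1.0, mse / epsilon ** 2)


# Assumed example: sample mean of n draws with variance sigma2,
# an unbiased estimator whose MSE is sigma2 / n.
sigma2, epsilon = 4.0, 0.5
sample_sizes = (10, 100, 1000, 10000)
bounds = [deviation_probability_bound(sigma2 / n, epsilon)
          for n in sample_sizes]

for n, b in zip(sample_sizes, bounds):
    print(f"n = {n:>5}: P(|error| >= {epsilon}) <= {b}")
```

As $n$ grows the bound tends to zero for any fixed $\varepsilon$ , which is exactly the statement of convergence in probability.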

Generalization

More generally, in a multiparameter model where several parameters are to be estimated, or when estimating a function $f(\theta )$ of one or more parameters, the mean squared error of an estimator $\delta$ of $f(\theta )$ is defined by:

Definition: $\mathbb {E} \left[(\delta -f(\theta ))^{\mathsf {T}}A(\delta -f(\theta ))\right]$

where $A$ is a symmetric positive-definite matrix (which therefore defines an inner product) and ${}^{\mathsf {T}}$ denotes transposition.
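
As a rough illustration of this definition, the sketch below evaluates the quadratic form for a vector-valued estimate and averages it over a few estimates, which approximates the expectation by an empirical mean. The function names, the two-dimensional target and the matrices are hypothetical values chosen for the example; with the identity matrix the quantity reduces to the average squared Euclidean distance.

```python
def quadratic_loss(delta, target, A):
    """Return (delta - target)^T A (delta - target) for plain Python lists."""
    d = [x - t for x, t in zip(delta, target)]
    size = len(d)
    return sum(d[i] * A[i][j] * d[j]
               for i in range(size) for j in range(size))


def empirical_generalized_mse(estimates, target, A):
    """Average the quadratic loss over a list of vector estimates."""
    return sum(quadratic_loss(e, target, A) for e in estimates) / len(estimates)


# Hypothetical two-dimensional example.
target = [1, 1]                      # plays the role of f(theta)
estimates = [[1, 2], [3, 0]]         # two simulated values of delta
identity = [[1, 0], [0, 1]]
weights = [[2, 1], [1, 2]]           # symmetric positive-definite matrix

mse_identity = empirical_generalized_mse(estimates, target, identity)
mse_weighted = empirical_generalized_mse(estimates, target, weights)
print(mse_identity, mse_weighted)
```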

Notes and references

Notes

[note 1] More generally, still for sampling with replacement, $\operatorname {Var} (s_{n-1}^{2})=\left({\frac {\gamma _{2}}{n}}+{\frac {2}{n-1}}\right)\sigma ^{4}$ , where $\gamma _{2}$ denotes the excess kurtosis.