Exponential Family

statduck 2023. 6. 20. 00:37

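Most of the distributions we commonly work with (binomial, multinomial, normal) belong to the exponential family.

The exponential family can be thought of as the set of distributions that satisfy a few shared properties.

By definition, a distribution belongs to the exponential family if its probability function has the following form.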

$$ p(x|\eta) = \dfrac{1}{Z(\eta)} h(x) \exp[\eta^T T(x)] = h(x) \exp[\eta^T T(x) - A(\eta)] $$

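$\eta \in \mathbb{R}^K$, with fixed support over $\mathcal{X}^D \subseteq \mathbb{R}^D$
$\eta$: canonical parameters, $T(x)$: sufficient statistics, $h(x)$: base measure, $Z(\eta)$: partition function
$A(\eta) = \log Z(\eta)$: log partition function ($A$ is a convex function over the convex set $\Omega = \{\eta \in \mathbb{R}^K : A(\eta) < \infty\}$)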
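As a quick illustration (a standard textbook example, not taken from the original post), the Bernoulli distribution with mean $\mu$ can be rewritten in this form:

$$ p(x|\mu) = \mu^x (1-\mu)^{1-x} = \exp\Big[ x \log\dfrac{\mu}{1-\mu} + \log(1-\mu) \Big], \quad x \in \{0, 1\} $$

Matching terms gives $\eta = \log\dfrac{\mu}{1-\mu}$, $T(x) = x$, $h(x) = 1$, and $A(\eta) = -\log(1-\mu) = \log(1 + e^\eta)$.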

The exponential family is considered useful for the following three reasons.

The log partition function generates moments through its derivatives
The covariance of the sufficient statistics is the same as the Fisher Information Matrix
Moment statistics (and even the MLE) are easily derived from $T(x)$

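First bullet - The log partition function generates moments through its derivatives

Let us start here. Two identities are central: $\nabla_\eta A(\eta) = E[T(x)], \; \nabla^2_\eta A(\eta) = Cov[T(x)]$. The first follows by differentiating $A(\eta) = \log Z(\eta)$ and exchanging the derivative with the integral that defines $Z(\eta)$.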
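The proof itself did not survive in the text of this post, so the following is a sketch of the standard argument, assuming $Z(\eta) = \int h(x) \exp[\eta^T T(x)] dx$ and that differentiation and integration can be interchanged (for discrete $x$, replace the integral with a sum):

$$ \nabla_\eta A(\eta) = \nabla_\eta \log Z(\eta) = \dfrac{\nabla_\eta Z(\eta)}{Z(\eta)} = \dfrac{1}{Z(\eta)} \int h(x) T(x) \exp[\eta^T T(x)] dx = \int T(x) p(x|\eta) dx = E[T(x)] $$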

In short, the key point is that moments can be generated by differentiation.
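A small numerical check can make this concrete. The sketch below is not from the original post: it assumes the Bernoulli case, where $T(x) = x$, $h(x) = 1$ and $A(\eta) = \log(1 + e^\eta)$, and compares finite-difference derivatives of $A$ with the mean and variance of $T(x)$ computed directly from the probabilities. The function names and the step size are illustrative choices.

```python
import math


def log_partition(eta):
    """A(eta) = log(1 + e^eta) for the Bernoulli distribution."""
    return math.log1p(math.exp(eta))


def bernoulli_moments(eta):
    """Mean and variance of T(x) = x, summed exactly over x in {0, 1}."""
    probs = {x: math.exp(eta * x - log_partition(eta)) for x in (0, 1)}
    mean = sum(x * p for x, p in probs.items())
    var = sum((x - mean) ** 2 * p for x, p in probs.items())
    return mean, var


def numerical_derivatives(f, eta, step=1e-4):
    """Central finite differences for the first and second derivative of f."""
    grad = (f(eta + step) - f(eta - step)) / (2 * step)
    hess = (f(eta + step) - 2 * f(eta) + f(eta - step)) / step ** 2
    return grad, hess


eta = 0.7  # arbitrary canonical parameter chosen for the check
grad, hess = numerical_derivatives(log_partition, eta)
mean, var = bernoulli_moments(eta)

print(f"dA/d(eta)     = {grad:.6f}   E[T(x)]   = {mean:.6f}")
print(f"d2A/d(eta)^2  = {hess:.6f}   Var[T(x)] = {var:.6f}")
```

The two columns should agree up to finite-difference error, which is what the two identities predict for this one-dimensional case.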

Second bullet - The covariance of the sufficient statistics is the same as the Fisher Information Matrix (FIM)

Under some regularity conditions, the Fisher information is computed as follows.

$$ F(\eta) = -E_{p(x|\eta)} [\nabla^2_\eta \log p(x|\eta)] $$

$$ F(\eta) = -E_{p(x|\eta)} [ \nabla^2_\eta (\eta^T T(x) - A(\eta))] = \nabla^2_\eta A(\eta) = Cov[T(x)] $$

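In other words, the Fisher information is the Hessian matrix of the log partition function, which equals the covariance of the sufficient statistics $T$. (The term $\eta^T T(x)$ is linear in $\eta$, so its Hessian vanishes, and $\log h(x)$ does not depend on $\eta$.)

The FIM is used when computing quantities such as the Cramer-Rao Lower Bound, and for distributions in the exponential family it is easy to compute.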
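To illustrate, here is a minimal sketch (not from the original post) that assumes a Poisson distribution in canonical form, with $T(x) = x$, $h(x) = 1/x!$ and $A(\eta) = e^\eta$. It computes the Fisher information two ways: as the expected squared score, summed over a truncated range of $x$, and as the second derivative of $A$. The truncation point `x_max`, the sample size `n`, and the function names are illustrative assumptions.

```python
import math


def poisson_pmf(x, eta):
    """p(x|eta) = h(x) exp(eta * x - A(eta)) with h(x) = 1/x!, A(eta) = e^eta."""
    return math.exp(eta * x - math.exp(eta) - math.lgamma(x + 1))


def fisher_from_score(eta, x_max=200):
    """E[(d/d eta log p(x|eta))^2], where the score is T(x) - A'(eta) = x - e^eta.

    The infinite sum over x is truncated at x_max (an assumption that is
    harmless when e^eta is much smaller than x_max).
    """
    mean = math.exp(eta)
    return sum((x - mean) ** 2 * poisson_pmf(x, eta) for x in range(x_max + 1))


def fisher_from_log_partition(eta):
    """Second derivative of A(eta) = e^eta, which is again e^eta."""
    return math.exp(eta)


eta = math.log(3.0)  # canonical parameter for a Poisson rate of 3
f_score = fisher_from_score(eta)
f_hess = fisher_from_log_partition(eta)

# Cramer-Rao lower bound on the variance of an unbiased estimator of eta
# from n i.i.d. observations: 1 / (n * F(eta)).
n = 50
crlb = 1.0 / (n * f_hess)

print(f"Fisher information via score:   {f_score:.6f}")
print(f"Fisher information via A''(eta): {f_hess:.6f}")
print(f"CRLB for n = {n}: {crlb:.6f}")
```

Both routes should give the same number, which is why the log partition function is usually the more convenient one: no expectation over $x$ is needed.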

Third bullet - Moment statistics (and even the MLE) are easily derived from $T(x)$

The likelihood of an exponential family has the form

$$ p(D|\eta) = \Big[ \prod^N_{n=1} h(x_n) \Big] \exp\Big( \eta^T \Big[\sum^N_{n=1} T(x_n)\Big] - N A(\eta) \Big) \propto \exp[\eta^T T(D) - N A(\eta)] $$

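$T(D) = [ \sum^N_{n=1} T_1(x_n), \dots, \sum^N_{n=1} T_K(x_n)]$
Taking the log of this expression gives the log-likelihood function. When its gradient is zero, the expected sufficient statistics under the model equal their empirical average over the data.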

$$ \log p(D|\eta) = \eta^T T(D) - N A(\eta) + const $$

$$ \nabla_\eta \log p(D|\eta) = \nabla_\eta \eta^T T(D) - N \nabla_\eta A(\eta) = T(D) - N E[T(x)] $$

For N=1 (the single data case) this becomes:

$$ \nabla_\eta \log p(x|\eta) = T(x) - E[T(x)] $$

Setting the gradient of $\log p(D|\eta)$ to zero (that is, maximizing the log-likelihood) gives the following condition, which is called moment matching.

$$ E[T(x)] = \dfrac{1}{N} \sum^N_{n=1} T(x_n)$$
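As a worked example, the sketch below (not from the original post) assumes a univariate Gaussian, whose sufficient statistics are $T(x) = (x, x^2)$ with $E[T(x)] = (\mu, \mu^2 + \sigma^2)$. Matching these to the empirical averages gives the MLE directly, with no iterative optimization. The data values and function names are made up for illustration.

```python
import math


def mean_sufficient_stats(data):
    """Empirical average of T(x) = (x, x^2), i.e. T(D) / N."""
    n = len(data)
    t1 = sum(data) / n
    t2 = sum(x * x for x in data) / n
    return t1, t2


def gaussian_mle_by_moment_matching(data):
    """Solve E[T(x)] = T(D) / N for (mu, sigma^2).

    E[x] = mu and E[x^2] = mu^2 + sigma^2, so mu = t1 and sigma^2 = t2 - t1^2.
    """
    t1, t2 = mean_sufficient_stats(data)
    return t1, t2 - t1 ** 2


def avg_log_likelihood(data, mu, var):
    """Average Gaussian log-likelihood, used only to sanity-check the estimate."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x in data
    ) / len(data)


data = [2.1, 1.9, 3.4, 2.8, 2.2, 3.0]  # hypothetical observations
mu_hat, var_hat = gaussian_mle_by_moment_matching(data)

print(f"mu_hat = {mu_hat:.4f}, var_hat = {var_hat:.4f}")
print(f"avg log-likelihood at the estimate: {avg_log_likelihood(data, mu_hat, var_hat):.4f}")
```

Note that the variance estimate obtained this way divides by $N$ rather than $N-1$, so it is the maximum likelihood estimate and not the unbiased sample variance.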

Probabilistic Machine Learning: Advanced Topics. probml.github.io. (n.d.). https://probml.github.io/pml-book/book2.html
