Francisco Coelho
Main text adapted...
# Probability Problems
> - What are the general tasks we expect to solve with probabilistic programs?
>     - The **MAP** task is the one with the most important applications. It is also the hardest to compute.
>     - **MLE** is a limiting case of **MAP**: its computations are simpler, but it overfits the data.
## Background
- **Conditional Probability (product rule)** $$P(A, B) = P(B | A) P(A).$$
- **Bayes Theorem** $$P(B | A) = \frac{P(A | B) P(B)}{P(A)}.$$
- **For maximization tasks** $$P(B | A) \propto P(A | B) P(B).$$
- **Marginal** $$P(A) = \sum_{b} P(A, B = b).$$
- In $P(B | A) \propto P(A | B) P(B)$, if the **posterior** $P(B | A)$ and the **prior** $P(B)$ follow distributions of the same family, $P(B)$ is a **conjugate prior** for the **likelihood** $P(A | B)$ (a concrete sketch follows this list).
- **Density Estimation:** Estimate a joint probability distribution from a set of observations; select a probability distribution function and the parameters that best explain the distribution of the observations.
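
As a concrete illustration of conjugacy, the sketch below updates a Beta prior with Bernoulli (coin-flip) observations: the posterior is again a Beta distribution. The prior parameters and the data are made-up assumptions for illustration.

```python
# Minimal sketch of a conjugate-prior update: Beta prior, Bernoulli likelihood.
# The Beta(1, 1) prior and the observations below are illustrative assumptions.

def beta_bernoulli_update(a, b, observations):
    """Posterior Beta(a', b') after observing 0/1 outcomes under a Beta(a, b) prior."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails

# Beta(1, 1) is the uniform prior; five flips with three heads.
a_post, b_post = beta_bernoulli_update(1, 1, [1, 0, 1, 1, 0])
print(a_post, b_post)  # (4, 3): the posterior Beta(4, 3) is in the prior's family
```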
## MLE: Maximum Likelihood Estimation
> Given a probability **distribution** $d$ and a set of **observations** $X$, find the distribution **parameters** $\theta$ that maximize the **likelihood** (_i.e._, the probability of those observations) under that distribution.
>
> **Overfits the data:** the parameter estimate has high variance and is sensitive to random variations in the data. Regularizing with a prior $P(\theta)$ leads to **MAP**.
Given $d, X$, find
$$
\hat{\theta}_{\text{MLE}}(d, X) = \arg\max_{\theta} P_d(X | \theta).
$$
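As a check on the definition, when $d$ is a Bernoulli distribution the likelihood $\prod_i \theta^{x_i}(1-\theta)^{1-x_i}$ is maximized in closed form by the sample mean. The sketch below uses made-up data to compare the closed-form maximizer against another candidate $\theta$:

```python
import math

# MLE sketch for a Bernoulli distribution; the data are illustrative assumptions.
def bernoulli_log_likelihood(theta, xs):
    """log P_d(X | theta) for 0/1 observations xs."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

xs = [1, 0, 1, 1, 0, 1, 1, 1]
theta_mle = sum(xs) / len(xs)  # closed-form argmax: the sample mean
print(theta_mle)  # 0.75
# The sample mean attains a log-likelihood at least as high as any other theta:
print(bernoulli_log_likelihood(theta_mle, xs) >= bernoulli_log_likelihood(0.5, xs))  # True
```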
## MAP: Maximum A Posteriori
> Given a probability **distribution** $d$ and a set of **observations** $X$, find the distribution **parameters** $\theta$ that best explain those observations.
Given $d, X$, find
$$
\hat{\theta}_{\text{MAP}}(d, X) = \arg\max_{\theta} P(\theta | X).
$$
Using $P(B | A) \propto P(A | B) P(B)$ with $B = \theta$ and $A = X$ (the evidence $P(X)$ does not depend on $\theta$),
$$\hat{\theta}_{\text{MAP}}(d, X) = \arg\max_{\theta} P_d(X | \theta) P(\theta).$$
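Continuing the Bernoulli example, take a Beta$(a, b)$ prior as $P(\theta)$; the MAP estimate then has the closed form $(k + a - 1)/(n + a + b - 2)$ for $k$ successes in $n$ trials. The data and prior below are made-up, chosen to show the regularizing effect of $P(\theta)$ on a tiny sample where MLE overfits:

```python
# MAP sketch for a Bernoulli model with a Beta(a, b) prior as the regularizer
# P(theta); the sample and the prior parameters are illustrative assumptions.

def bernoulli_map(xs, a, b):
    """Closed-form argmax of P(X | theta) P(theta) with a Beta(a, b) prior."""
    k, n = sum(xs), len(xs)
    return (k + a - 1) / (n + a + b - 2)

xs = [1, 1, 1]                  # tiny sample: the MLE is theta = 1.0 (overfit)
print(sum(xs) / len(xs))        # 1.0 (MLE)
print(bernoulli_map(xs, 2, 2))  # 0.8: the Beta(2, 2) prior pulls it toward 0.5
```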
Variants:
- **Viterbi algorithm:** Finds the most likely sequence of hidden states (in HMMs) that results in a given sequence of observed events (a minimal sketch follows this list).
- **MPE: Most Probable Explanation** (solved by the **max-sum** and **max-product** algorithms): Finds the most likely assignment to all unobserved random variables that is consistent with the given evidence. By contrast, the sum-product algorithm computes the marginal distribution of each unobserved node, conditional on the observed nodes.
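
The sketch below implements Viterbi over a small two-state HMM; the states, observations, and probability tables are made-up assumptions for illustration.

```python
# Minimal Viterbi sketch for an HMM; the two-state model below is an
# illustrative assumption, not part of the original notes.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for the observed sequence `obs`."""
    # best[t][s]: probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda rp: rp[1])
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
print(viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p))
# ['Sunny', 'Rainy', 'Rainy']
```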