@nanmeng 2016-05-19T08:43:41.000000Z word count 4208 views 1065

# Probabilistic Graphical Models(Stanford) - 1

notes Probabilistic_Graphical_Models

## Pre-Class

Why factors?
* Fundamental building block for defining distributions in high-dimensional spaces
* Set of basic operations for manipulating these probability distributions

## Week 1 Bayesian Network Fundamentals

### 1. Semantics & Factorization

An example of a Bayesian network:

The calculation rule is the chain rule for Bayesian networks: $P(X_1, \ldots, X_n) = \prod_i P(X_i|Par_G(X_i))$

An illustration of calculating the joint distribution (how to compute it from the CPD values):

Bayesian Network:

• A directed acyclic graph (DAG) $G$
• For each node $X_i$ a CPD $P(X_i|Par_G(X_i))$
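A minimal Python sketch of the chain rule: the joint probability of a full assignment is the product of each node's CPD entry given its parents. It assumes the student network (Difficulty $D$, Intelligence $I$, Grade $G$, SAT $S$, Letter $L$, with edges $D \rightarrow G \leftarrow I$, $I \rightarrow S$, $G \rightarrow L$); the CPD numbers and the `joint` helper are illustrative assumptions, not read off the lecture figure.

```python
# Illustrative CPDs (assumed values) for D -> G <- I, I -> S, G -> L.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}  # P(S | I)
P_L = {  # P(L | G)
    "g1": {"l0": 0.1, "l1": 0.9},
    "g2": {"l0": 0.4, "l1": 0.6},
    "g3": {"l0": 0.99, "l1": 0.01},
}

def joint(d, i, g, s, l):
    """Chain rule for BNs: multiply each node's CPD entry given its parents."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s] * P_L[g][l]

print(joint("d0", "i1", "g3", "s1", "l1"))  # ≈ 2.88e-05
```

For $P(d^0, i^1, g^3, s^1, l^1)$ this multiplies $0.6 \times 0.3 \times 0.02 \times 0.8 \times 0.01 \approx 2.88 \times 10^{-5}$; summing `joint` over all assignments gives 1 because every CPD row is normalized.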

#### A trick in BN:

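As shown below, when computing with a Bayesian network, a summation over a variable only needs to range over the "part" of the product whose factors mention that variable, so it can be pushed inside past all the other factors.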
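To make the summation trick concrete, here is a small sketch on a hypothetical chain $A \rightarrow B \rightarrow C$ with made-up binary CPDs. It computes $P(C)$ once by summing the full product over every joint assignment, and once by pushing the sum over $A$ inside, since only $P(A)$ and $P(B|A)$ mention $A$. The helper names are illustrative.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary variables; CPD values are made up.
P_A = {0: 0.6, 1: 0.4}
P_B_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
P_C_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def p_c_naive(c):
    # Sum the whole product over every joint assignment of (A, B).
    return sum(P_A[a] * P_B_given_A[a][b] * P_C_given_B[b][c]
               for a, b in product([0, 1], repeat=2))

def p_c_pushed(c):
    # Sum A out of the only factors that mention it, giving a table over B,
    # then sum B against P(C | B).
    tau_B = {b: sum(P_A[a] * P_B_given_A[a][b] for a in [0, 1]) for b in [0, 1]}
    return sum(tau_B[b] * P_C_given_B[b][c] for b in [0, 1])

for c in [0, 1]:
    print(c, p_c_naive(c), p_c_pushed(c))
```

Both give $P(C=0) = 0.7$ here. Pushing sums inward is the idea behind variable elimination: on a long chain it replaces one sum over exponentially many assignments with a sequence of small local sums.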

#### P Factorizes over G:

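$P$ factorizes over $G$ if $P$ can be written as the product of the CPDs of $G$ (the chain rule above). An example from genetic inheritance: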

### 2. Reasoning Patterns

• Causal Reasoning (top down)
• Evidential Reasoning (bottom up)
• Intercausal Reasoning (information flows between two causes of a common effect)

An example of intercausal reasoning that is hard to figure out:

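Given a poor grade ($g^3$), learning that the class is difficult raises the probability of high intelligence, $P(i^1|g^3,d^1) > P(i^1|g^3)$, because the difficulty explains away the grade. Likewise, if the student aces the SAT ($s^1$), both $P(i^1|g^3,s^1)$ and $P(d^1|g^3,s^1)$ increase: the student is probably intelligent, so the poor grade is better explained by a difficult class.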
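These reasoning patterns can be checked by brute-force enumeration. This sketch assumes the student network restricted to $D$, $I$, $G$, $S$ with illustrative CPD values (assumptions, not taken from the lecture figure); Letter is dropped because it is a leaf whose CPD sums to 1, so it does not affect these queries. The `query` helper is hypothetical.

```python
from itertools import product

# Illustrative CPDs (assumed values) for D -> G <- I and I -> S.
P_D = {"d0": 0.6, "d1": 0.4}
P_I = {"i0": 0.7, "i1": 0.3}
P_G = {  # P(G | I, D)
    ("i0", "d0"): {"g1": 0.3, "g2": 0.4, "g3": 0.3},
    ("i0", "d1"): {"g1": 0.05, "g2": 0.25, "g3": 0.7},
    ("i1", "d0"): {"g1": 0.9, "g2": 0.08, "g3": 0.02},
    ("i1", "d1"): {"g1": 0.5, "g2": 0.3, "g3": 0.2},
}
P_S = {"i0": {"s0": 0.95, "s1": 0.05}, "i1": {"s0": 0.2, "s1": 0.8}}  # P(S | I)

def joint(d, i, g, s):
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_S[i][s]

def query(target, evidence):
    """P(target | evidence) by enumerating every assignment of D, I, G, S.
    Both arguments map variable names ("D", "I", "G", "S") to values."""
    num = den = 0.0
    for d, i, g, s in product(P_D, P_I, ["g1", "g2", "g3"], ["s0", "s1"]):
        a = {"D": d, "I": i, "G": g, "S": s}
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(d, i, g, s)
            den += p
            if all(a[k] == v for k, v in target.items()):
                num += p
    return num / den

print(query({"I": "i1"}, {}))                        # prior: ≈ 0.3
print(query({"I": "i1"}, {"G": "g3"}))               # evidential: ≈ 0.079
print(query({"I": "i1"}, {"G": "g3", "D": "d1"}))    # intercausal: ≈ 0.109
print(query({"I": "i1"}, {"G": "g3", "S": "s1"}))    # aced SAT: ≈ 0.578
print(query({"D": "d1"}, {"G": "g3"}))               # ≈ 0.629
print(query({"D": "d1"}, {"G": "g3", "S": "s1"}))    # ≈ 0.760
```

With these numbers the poor grade lowers $P(i^1)$ from 0.3 to about 0.08; learning the class is hard raises it slightly (about 0.11), while an aced SAT raises it to about 0.58 and pushes $P(d^1)$ from about 0.63 to about 0.76.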

### 3. Flow of Probabilistic Influence

An example of an active trail:

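A trail $X_1 - \cdots - X_k$ is active (when nothing is observed) if it has no v-structures $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$.
Note: a v-structure is a structure in which two nodes point to the same node (like $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$).
Given evidence $Z$, the trail is active if for every v-structure $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$, $X_i$ or one of its descendants is in $Z$, and no other $X_i$ on the trail is in $Z$. The rules for when influence can flow through each two-edge structure are shown below.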

(The final row of the table above, for the v-structure $X \rightarrow W \leftarrow Y$: blocked if $W$ and all of its descendants are not in $Z$; active if either $W$ or one of its descendants is in $Z$.)
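A minimal sketch of the active-trail rules, assuming the DAG is given as a list of `(parent, child)` edges and the trail as a list of node names in order. The helpers `descendants` and `is_active_trail` and the small student-style graph are illustrative, not taken from a library.

```python
def descendants(node, edges):
    """All nodes reachable from `node` by following directed (parent, child) edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_active_trail(trail, edges, evidence):
    """Check whether the trail X1 - ... - Xk is active given the evidence set Z."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        if (prev, mid) in edge_set and (nxt, mid) in edge_set:
            # v-structure prev -> mid <- nxt: active only if mid or a descendant is observed
            if mid not in evidence and not (descendants(mid, edges) & evidence):
                return False
        elif mid in evidence:
            # chain or common parent: observing the middle node blocks the trail
            return False
    return True

EDGES = [("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")]
print(is_active_trail(["D", "G", "I"], EDGES, set()))   # False
print(is_active_trail(["D", "G", "I"], EDGES, {"L"}))   # True
print(is_active_trail(["S", "I", "G"], EDGES, set()))   # True
print(is_active_trail(["S", "I", "G"], EDGES, {"I"}))   # False
```

The second call shows the last table row in action: observing Letter, a descendant of Grade, activates the v-structure $D \rightarrow G \leftarrow I$.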

#### Summary

##### Independencies in BNs
• Types of three-variable structures: chain (aka causal trail or evidential trail), common parent (aka common cause), v-structure (aka common effect)
• Property: A variable $X$ is independent of its non-descendants given its parents
• Property: A variable $X$ is independent of all other variables in the network given its Markov blanket, which consists of its parents, its children, and its co-parents
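The Markov blanket can be read directly off the edge list. A small sketch, assuming the DAG is a list of `(parent, child)` pairs; the helper name and the example graph are hypothetical.

```python
def markov_blanket(node, edges):
    """Parents, children, and co-parents (other parents of the node's children)."""
    parents = {p for p, c in edges if c == node}
    children = {c for p, c in edges if p == node}
    co_parents = {p for p, c in edges if c in children and p != node}
    return parents | children | co_parents

EDGES = [("D", "G"), ("I", "G"), ("I", "S"), ("G", "L")]
print(sorted(markov_blanket("I", EDGES)))  # ['D', 'G', 'S']
print(sorted(markov_blanket("G", EDGES)))  # ['D', 'I', 'L']
```

For $I$ the blanket includes $D$ even though there is no edge between them, because $D$ is a co-parent of their common child $G$.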

### 4. Conditional Independence

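Basics of conditional independence and its notation:

The definition of conditional independence: $P$ satisfies $(X \perp Y | Z)$ if $P(X, Y | Z) = P(X | Z)\,P(Y | Z)$, or equivalently $P(X | Y, Z) = P(X | Z)$ (whenever $P(Y, Z) > 0$).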

An example of conditional independence:

If we have not been told whether the coin is fair, getting heads on the first toss raises the probability of heads on the second toss, because the first toss is evidence about the coin's bias. Once we are told that the coin is fair, the two tosses are independent of each other.
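The coin example can be worked out exactly with fractions. This sketch assumes (hypothetically) that the coin is either fair or biased with $P(\text{heads}) = 0.9$, each with prior probability 1/2; the tosses are independent only once the coin type is known.

```python
from fractions import Fraction as F

# Hypothetical setup: two coin types with equal prior probability.
PRIOR = {"fair": F(1, 2), "biased": F(1, 2)}
P_HEADS = {"fair": F(1, 2), "biased": F(9, 10)}

def p_heads_first():
    return sum(PRIOR[c] * P_HEADS[c] for c in PRIOR)

def p_two_heads():
    # Tosses are independent *given* the coin type, so multiply within each type.
    return sum(PRIOR[c] * P_HEADS[c] ** 2 for c in PRIOR)

p_h1 = p_heads_first()
p_h2_given_h1 = p_two_heads() / p_h1
print(p_h1, p_h2_given_h1)  # 7/10 53/70

# Conditioned on a fair coin: P(H2 | H1, fair) equals P(H2 | fair).
print(P_HEADS["fair"] ** 2 / P_HEADS["fair"], P_HEADS["fair"])  # 1/2 1/2
```

Without knowing the coin type, $P(H_2 | H_1) = 53/70 \approx 0.757 > 7/10 = P(H_2)$, so the tosses are dependent; given the coin type they are not.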

### 5. Independencies in Bayesian Networks

Recap:

A new question:

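• Theorem: If $P$ factorizes over $G$ and $\text{d-sep}_G(X, Y | Z)$, then $P$ satisfies $(X \perp Y | Z)$. ($X$ and $Y$ are d-separated given $Z$ if there is no active trail between them given $Z$.)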

• Red line: all the non-descendants of Letter; given its parent Grade, Letter is independent of all of them.
• Descendants of Letter: Job, Happy.
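A numerical sanity check of the theorem (not a proof): draw random CPDs for a hypothetical binary network $D \rightarrow G \leftarrow I$, $G \rightarrow L$, form the factorized joint, and confirm that Letter is independent of its non-descendants $D$ and $I$ given its parent $G$. The reduced network, the random parameters, and the helper names are assumptions; the lecture figure uses a larger network that also contains Job and Happy.

```python
import random
from itertools import product

# Hypothetical binary network D -> G <- I, G -> L with random CPDs.
# Each parameter is P(X = 1 | parent configuration); values are arbitrary.
rng = random.Random(0)
pD = rng.random()
pI = rng.random()
pG = [rng.random() for _ in range(4)]  # P(G=1 | D=d, I=i) at index 2*d + i
pL = [rng.random() for _ in range(2)]  # P(L=1 | G=g) at index g

def bern(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1 - p

def joint(d, i, g, l):
    """BN factorization: P(D) P(I) P(G | D, I) P(L | G)."""
    return bern(pD, d) * bern(pI, i) * bern(pG[2 * d + i], g) * bern(pL[g], l)

def p_letter(l, g, d=None, i=None):
    """P(L = l | G = g), optionally also conditioning on D = d and/or I = i."""
    num = den = 0.0
    for dd, ii, ll in product([0, 1], repeat=3):
        if (d is not None and dd != d) or (i is not None and ii != i):
            continue  # inconsistent with the evidence
        p = joint(dd, ii, g, ll)
        den += p
        if ll == l:
            num += p
    return num / den

# Letter should be independent of its non-descendants {D, I} given its parent G.
for d, i, g in product([0, 1], repeat=3):
    assert abs(p_letter(1, g, d, i) - p_letter(1, g)) < 1e-12
print("P(L | G, D, I) == P(L | G) for every assignment")
```

Because the independence follows from the factorization alone, changing the seed (any parameter values) should leave the check passing.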

#### I-maps

Example:

G1 is an I-map of P1, while G2 is an I-map of both P1 and P2.

I-Maps
• I-Map: A graph G is an I-map for a distribution P if $I(G) \subseteq I(P)$
• Minimal I-Map: A graph G is a minimal I-map for a distribution P if you cannot remove any edges from G and have it still be an I-map for P
• Perfect I-Map: A graph G is a perfect I-map for a distribution P if $I(G) = I(P)$
• I-Equivalence: Two graphs $G_1$ and $G_2$ are I-equivalent if $I(G_1) = I(G_2)$
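A sketch of the I-map test on the two-variable example. It assumes, from the usual form of this example, that G1 has no edge between $D$ and $I$, so $I(G_1) = \{(D \perp I)\}$, and that G2 has the edge $D \rightarrow I$, so $I(G_2) = \emptyset$. P1 is built as a product of its marginals; P2 is a hypothetical dependent table. The helper names are illustrative.

```python
from itertools import product

def satisfies_independence(joint, tol=1e-9):
    """Check (D ⊥ I) in a table joint[(d, i)] over two binary variables."""
    p_d = {d: sum(joint[(d, i)] for i in (0, 1)) for d in (0, 1)}
    p_i = {i: sum(joint[(d, i)] for d in (0, 1)) for i in (0, 1)}
    return all(abs(joint[(d, i)] - p_d[d] * p_i[i]) < tol
               for d, i in product((0, 1), repeat=2))

def is_imap_G1(joint):
    # I(G1) = {(D ⊥ I)}: G1 is an I-map iff P satisfies that independence.
    return satisfies_independence(joint)

def is_imap_G2(joint):
    # I(G2) is empty, and the empty set is a subset of every I(P).
    return True

P1 = {(0, 0): 0.42, (0, 1): 0.18, (1, 0): 0.28, (1, 1): 0.12}  # = P(D) P(I)
P2 = {(0, 0): 0.40, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.30}  # hypothetical, dependent

print(is_imap_G1(P1), is_imap_G1(P2))  # True False
print(is_imap_G2(P1), is_imap_G2(P2))  # True True
```

A graph with more edges encodes fewer independencies, which is why the fully connected G2 is an I-map of every distribution over $D$ and $I$.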

To illustrate one example in the picture: a node is independent of its non-descendants given its parents. SAT's only parent is Intelligence, and Difficulty and Grade are not descendants of SAT, so $P(S|D,I,G) = P(S|I)$.
Thus, the first equation in the picture is equal to the second equation.
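Written out for the five-node student network (assuming edges $D \rightarrow G \leftarrow I$, $I \rightarrow S$, $G \rightarrow L$; the figure's network may include more nodes), the general chain rule reduces to the Bayesian network factorization term by term:

$$
\begin{aligned}
P(D, I, G, S, L) &= P(D)\,P(I|D)\,P(G|D,I)\,P(S|D,I,G)\,P(L|D,I,G,S) \\
&= P(D)\,P(I)\,P(G|D,I)\,P(S|I)\,P(L|G)
\end{aligned}
$$

Each step drops conditioning variables that are non-descendants of the node, given its parents: $I$ has no parents and $D$ is a non-descendant of $I$; $S$ has parent $I$; $L$ has parent $G$.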

### 6. Naive Bayes

• What independence assumption does the Naive Bayes model make?
Given the class variable, each observed variable is independent of the other observed variables.

That is, once the class is known, the features are independent of each other.

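The posterior odds of two classes factorize as
$$\frac{P(C=c^1|x_1,\ldots,x_n)}{P(C=c^2|x_1,\ldots,x_n)} = \frac{P(C=c^1)}{P(C=c^2)}\prod_{i=1}^{n}\frac{P(x_i|C=c^1)}{P(x_i|C=c^2)}$$
green: the ratio of the prior probabilities of the two classes
blue: the odds ratios of the individual features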
An example of Bernoulli Naive Bayes for text
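A minimal Bernoulli Naive Bayes sketch for text, assuming each document is represented as the set of words it contains and probabilities are estimated with add-one (Laplace) smoothing; the toy corpus and helper names are hypothetical. The score is the log of the posterior odds: the log prior ratio (green) plus one log odds ratio per vocabulary word (blue), where in the Bernoulli model absent words contribute too.

```python
import math

# Hypothetical toy corpus: each document is a set of words with a class label.
TRAIN = [
    ({"ball", "game", "team"}, "sports"),
    ({"game", "score", "team"}, "sports"),
    ({"vote", "election", "party"}, "politics"),
    ({"party", "vote", "debate"}, "politics"),
]

def train(docs):
    vocab = sorted(set().union(*(words for words, _ in docs)))
    classes = sorted({label for _, label in docs})
    prior, p_word = {}, {}
    for c in classes:
        class_docs = [words for words, label in docs if label == c]
        prior[c] = len(class_docs) / len(docs)
        # Laplace smoothing: P(word present | c) = (count + 1) / (n_c + 2)
        p_word[c] = {w: (sum(w in d for d in class_docs) + 1) / (len(class_docs) + 2)
                     for w in vocab}
    return vocab, prior, p_word

def log_posterior_odds(words, c1, c2, model):
    """log P(c1|x) / P(c2|x) = log prior ratio + sum of per-word log odds ratios."""
    vocab, prior, p_word = model
    score = math.log(prior[c1] / prior[c2])
    for w in vocab:
        p1, p2 = p_word[c1][w], p_word[c2][w]
        if w in words:
            score += math.log(p1 / p2)
        else:  # Bernoulli model: an absent word is evidence as well
            score += math.log((1 - p1) / (1 - p2))
    return score

model = train(TRAIN)
print(log_posterior_odds({"team", "game"}, "sports", "politics", model) > 0)   # True
print(log_posterior_odds({"vote", "party"}, "sports", "politics", model) > 0)  # False
```

Working in log space turns the product of odds ratios into a sum, which avoids numerical underflow when the vocabulary is large.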


### Related materials

• Markov Random Fields
• 3.1 Independencies in MRFs
• Two variables $X$ and $Y$ are independent if there is no active trail between them; a trail is active if it doesn’t contain any observed variables.
• Property: A variable $X$ is independent of all other variables in the network given its Markov blanket, which consists of its direct neighbors in the graph.
• 3.2 Parameterization of MRFs
• Markov random fields are parameterized by a set of factors defined over cliques in the graph; factors are not distributions as they do not have to sum to 1.
• The joint probability distribution of the variables in an MRF can be written in factorized form as a normalized product of factors, i.e. $P(X_1, \ldots, X_n) = \frac{1}{Z}\prod_i \phi_i(C_i)$, where $C_i$ is the set of variables in the $i$th clique and $Z = \sum_{X_1, \ldots, X_n} \prod_i \phi_i(C_i)$ is the partition function.
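A brute-force sketch of the MRF factorization for a hypothetical pairwise chain $A - B - C$ with clique factors $\phi_1(A,B)$ and $\phi_2(B,C)$ (made-up values): multiply the factors, sum over all assignments to get $Z$, and divide. Brute force is only feasible for tiny models, since the sum ranges over every joint assignment.

```python
from itertools import product

# Hypothetical pairwise MRF A - B - C over binary variables; factor values are made up.
phi_AB = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}

def unnormalized(a, b, c):
    """Product of the clique factors; not a distribution on its own."""
    return phi_AB[(a, b)] * phi_BC[(b, c)]

# Partition function: total unnormalized mass over all assignments.
Z = sum(unnormalized(a, b, c) for a, b, c in product([0, 1], repeat=3))

def prob(a, b, c):
    return unnormalized(a, b, c) / Z

print(Z)  # 4646.0
```

Here $Z = (30 + 1)(100 + 1) + (5 + 10)(1 + 100) = 4646$, grouping terms by the value of $B$.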

### SMO (Sequential Minimal Optimization)

• Features:

• Memory: The amount of memory required for SMO is linear in the training set size (which allows SMO to handle very large training sets)
• Speed: SMO is fastest for linear SVMs and sparse data sets.
• relevant materials: http://research.microsoft.com/pubs/69644/tr-98-14.pdf
