`@nanmeng`

`2016-05-19T08:43:41.000000Z`


`notes`

`Probabilistic_Graphical_Models`

Why factors?

* Fundamental building block for defining distributions in high-dimensional spaces

* Set of basic operations for manipulating these probability distributions

An example of a Bayesian network:

The calculation rule: **chain rule**

Illustration of calculating the **joint distribution**:

(how to calculate it with concrete values)

Bayesian Network:

- A directed acyclic graph (DAG)
- A conditional probability distribution (CPD) for each node
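The chain rule for a Bayesian network can be sketched in a few lines of Python. This is only a minimal illustration: the three-node network `D -> G <- I` and every number in the CPD tables are made-up assumptions, not values from the lecture.

```python
from itertools import product

# Hypothetical CPDs for a three-node network D -> G <- I (all values illustrative).
P_D = {0: 0.6, 1: 0.4}            # P(D): course difficulty
P_I = {0: 0.7, 1: 0.3}            # P(I): student intelligence
# P(G | I, D): distribution over grades for each (intelligence, difficulty) pair.
P_G = {
    (0, 0): {"A": 0.30, "B": 0.40, "C": 0.30},
    (0, 1): {"A": 0.05, "B": 0.25, "C": 0.70},
    (1, 0): {"A": 0.90, "B": 0.08, "C": 0.02},
    (1, 1): {"A": 0.50, "B": 0.30, "C": 0.20},
}

def joint(d, i, g):
    """Chain rule for this network: P(D, I, G) = P(D) * P(I) * P(G | I, D)."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g]

# Summing the joint over every assignment should give 1.
total = sum(joint(d, i, g) for d, i in product((0, 1), repeat=2) for g in "ABC")
print(joint(0, 1, "A"), total)
```

Each factor in the product is the CPD of one node given its parents, which is why one table per node is enough to define the whole joint distribution.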

As shown below, when computing with a Bayesian network, a summation can be pushed inside the product so that it is applied only to the "part" of the whole equation that involves the summed variable.

An example of Genetic Inheritance

- Causal reasoning (top down)
- Evidential reasoning (bottom up)
- Intercausal reasoning (information flows between two causes)

Intercausal reasoning is harder to figure out; an illustration:

The student acing the SAT contributes to an increase in the probability of high intelligence and in the probability that the course is difficult.
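Intercausal reasoning can be checked numerically by brute-force enumeration. The sketch below is a simplification under stated assumptions: it uses a two-cause network `D -> G <- I` with a binary "poor grade" variable, observes intelligence directly instead of through the SAT, and all probabilities are invented for illustration.

```python
from itertools import product

# Hypothetical numbers for two causes (D, I) of one effect (G).
P_D = {0: 0.6, 1: 0.4}   # D = 1: the course is difficult
P_I = {0: 0.7, 1: 0.3}   # I = 1: the student is intelligent
# P(G = 1 | D, I): probability of a poor grade.
P_BAD = {(0, 0): 0.3, (0, 1): 0.02, (1, 0): 0.7, (1, 1): 0.2}

def joint(d, i, g):
    p_bad = P_BAD[(d, i)]
    return P_D[d] * P_I[i] * (p_bad if g == 1 else 1.0 - p_bad)

def prob(query, evidence):
    """P(query | evidence) by enumeration; both are dicts over 'D', 'I', 'G'."""
    num = den = 0.0
    for d, i, g in product((0, 1), repeat=3):
        world = {"D": d, "I": i, "G": g}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(d, i, g)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

p_prior = prob({"D": 1}, {})                    # no evidence
p_bad = prob({"D": 1}, {"G": 1})                # poor grade observed
p_bad_smart = prob({"D": 1}, {"G": 1, "I": 1})  # poor grade and high intelligence
p_smart_only = prob({"D": 1}, {"I": 1})         # intelligence alone
print(p_prior, p_bad, p_bad_smart, p_smart_only)
```

With these numbers, evidence about intelligence changes the belief about difficulty only once the grade is observed: information flows between the two causes through their shared effect.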

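An example of an active trail:

When nothing is observed, a trail $X_1 - \dots - X_n$ is active if it has no **v-structures** like $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$.

**Notice:** a v-structure is a structure in which two nodes point to the same node (like $X \rightarrow W \leftarrow Y$).

The rules for when influence can flow along a trail, depending on which variables are observed, are given in the table.

(The final row of the table, for the v-structure $X \rightarrow W \leftarrow Y$, is: blocked if $W$ and all of its descendants are not in $Z$ | active if either $W$ or one of its descendants is in $Z$.)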
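The flow-of-influence rules can be turned into a small trail checker. This is a minimal sketch, assuming the graph is stored as a dict from each node to its children; the `STUDENT` graph (D, I, G, S, L) is the usual student example written from memory, and `is_active` and `descendants` are hypothetical helper names.

```python
def descendants(graph, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), list(graph.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, []))
    return seen

def is_active(graph, trail, observed):
    """True if influence can flow along `trail` given the observed set."""
    observed = set(observed)
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        is_v = mid in graph.get(prev, []) and mid in graph.get(nxt, [])
        if is_v:
            # v-structure: active only if mid or one of its descendants is observed
            if not (({mid} | descendants(graph, mid)) & observed):
                return False
        elif mid in observed:
            # chain or common parent: blocked when the middle node is observed
            return False
    return True

# Assumed student network: D -> G <- I, I -> S, G -> L.
STUDENT = {"D": ["G"], "I": ["G", "S"], "G": ["L"]}
trail = ["D", "G", "I", "S"]
print(is_active(STUDENT, trail, []))          # v-structure at G, nothing observed
print(is_active(STUDENT, trail, ["L"]))       # descendant of G observed
print(is_active(STUDENT, trail, ["G", "I"]))  # I observed blocks the common parent
```

The function only inspects interior nodes of the trail, since the endpoints are the variables whose dependence is being tested.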

- Types of three-variable structures: chain (aka causal trail or evidential trail), common parent (aka common cause), v-structure (aka common effect)
- Property: A variable is independent of its non-descendants given its parents
- Property: A variable is independent of all other variables in the network given its Markov blanket, which consists of its parents, its children, and its co-parents

Materials: Probabilistic Graphical Models 10-708 Recitation 1 Handout

Basics of conditional independence and notation

The **definition** of conditional independence

An example of conditional independence:

When people have not been told that the coin is fair, the probability of getting heads on the second toss is higher given that the first toss came up heads. However, once people have been told that the coin is fair, the two tosses are independent of each other.
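The coin example can be checked with a short calculation. The setup below is an assumption made for illustration: the coin is either fair or biased towards heads with probability 0.9, and each type has prior 0.5.

```python
# Hypothetical setup: the coin type is unknown, with two equally likely candidates.
PRIOR = {"fair": 0.5, "biased": 0.5}
P_HEADS = {"fair": 0.5, "biased": 0.9}

def p_first_heads():
    """P(X1 = heads), marginalizing over the coin type."""
    return sum(PRIOR[c] * P_HEADS[c] for c in PRIOR)

def p_both_heads():
    """P(X1 = heads, X2 = heads); tosses are independent once the coin type is fixed."""
    return sum(PRIOR[c] * P_HEADS[c] ** 2 for c in PRIOR)

p_h2 = p_first_heads()                            # P(X2 = heads) equals P(X1 = heads)
p_h2_given_h1 = p_both_heads() / p_first_heads()  # coin type unknown
p_h2_given_h1_fair = P_HEADS["fair"] ** 2 / P_HEADS["fair"]  # coin known to be fair
print(p_h2, p_h2_given_h1, p_h2_given_h1_fair)
```

When the coin type is unknown, the first toss carries information about the type and therefore about the second toss. Conditioning on the type removes that dependence.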

Recap:

A new question:

Theorem: If $P$ factorizes over $G$, and $\text{d-sep}_G(X, Y \mid Z)$, then $P$ satisfies $(X \perp Y \mid Z)$.

Red line: all the non-descendants of Letter. The descendants of Letter are Job and Happy.

Example:

G1 is an I-map of P1, while G2 is an I-map of both P1 and P2.

I-Maps

• I-Map: A graph G is an I-map for a distribution P if $I(G) \subseteq I(P)$

• Minimal I-Map: A graph G is a minimal I-map for a distribution P if you cannot remove any edges from G and have it still be an I-map for P

• Perfect I-Map: A graph G is a perfect I-map for a distribution P if $I(G) = I(P)$

• I-Equivalence: Two graphs G1 and G2 are I-equivalent if $I(G_1) = I(G_2)$

To illustrate with one example from the picture: we know that a node is independent of its non-descendants once its parents are known, so additionally conditioning on non-descendants does not change its distribution.

Thus, the first equation in the picture is equal to the second equation.

What independence assumption does the Naive Bayes model make?

Given the class variable, each observed variable is independent of the other observed variables, so the joint factorizes as $P(C, X_1, \dots, X_n) = P(C) \prod_i P(X_i \mid C)$.

green: prior probabilities of two classes

blue: odds ratio

An example of Bernoulli Naive Bayes for text
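A Bernoulli Naive Bayes text classifier fits in a few lines of plain Python. This is a minimal sketch under stated assumptions: the toy corpus and labels are made up, Laplace smoothing with `alpha=1.0` is my choice rather than something from the notes, and `train`, `log_posterior` and `predict` are hypothetical helper names.

```python
import math

def train(docs, labels, vocab, alpha=1.0):
    """Estimate class priors and per-word Bernoulli parameters with Laplace smoothing."""
    prior, theta = {}, {}
    for c in sorted(set(labels)):
        class_docs = [set(d.split()) for d, y in zip(docs, labels) if y == c]
        prior[c] = len(class_docs) / len(docs)
        theta[c] = {
            w: (sum(w in d for d in class_docs) + alpha) / (len(class_docs) + 2 * alpha)
            for w in vocab
        }
    return prior, theta

def log_posterior(doc, prior, theta, vocab):
    """Unnormalized log P(class | doc); every vocabulary word counts, present or absent."""
    words = set(doc.split())
    scores = {}
    for c in prior:
        s = math.log(prior[c])
        for w in vocab:
            p = theta[c][w]
            s += math.log(p if w in words else 1.0 - p)
        scores[c] = s
    return scores

def predict(doc, prior, theta, vocab):
    scores = log_posterior(doc, prior, theta, vocab)
    return max(scores, key=scores.get)

# Toy corpus, invented for illustration.
docs = ["cheap pills buy now", "buy cheap watches",
        "meeting agenda attached", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
vocab = sorted({w for d in docs for w in d.split()})
prior, theta = train(docs, labels, vocab)
print(predict("buy cheap pills", prior, theta, vocab))
```

In the Bernoulli variant each document is a vector of word-presence indicators, so absent words also contribute a factor `1 - theta` to the class score.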


- Markov Random Fields
- 3.1 Independencies in MRFs

- Two variables $X$ and $Y$ are independent if there is no active trail between them; a trail is active if it doesn’t contain any observed variables.
- Property: A variable is independent of all other variables in the network given its Markov blanket, which consists of its direct neighbors in the graph.

- 3.2 Parameterization of MRFs

- Markov random fields are parameterized by a set of factors defined over cliques in the graph; factors are not distributions as they do not have to sum to 1.
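- The joint probability distribution of the variables in an MRF can be written in factorized form as a normalized product of factors, i.e. $P(X_1, \dots, X_n) = \frac{1}{Z} \prod_i \phi_i(D_i)$, where $D_i$ is the set of variables in the $i$th clique, and $Z = \sum_{X_1, \dots, X_n} \prod_i \phi_i(D_i)$ is the partition function.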
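The normalized product of factors can be computed directly for a tiny model. The sketch below assumes a chain-shaped MRF `A - B - C` over binary variables; the factor tables `phi_AB` and `phi_BC` are invented values, chosen only to show that factors need not sum to 1.

```python
from itertools import product

# Hypothetical pairwise MRF on a chain A - B - C with binary variables.
# Each factor is a nonnegative table over one clique (here, one edge).
phi_AB = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}

def unnormalized(a, b, c):
    """Product of all factors for one full assignment."""
    return phi_AB[(a, b)] * phi_BC[(b, c)]

# Partition function: sum of the factor product over all assignments.
Z = sum(unnormalized(a, b, c) for a, b, c in product((0, 1), repeat=3))

def joint(a, b, c):
    """Normalized joint probability P(A, B, C)."""
    return unnormalized(a, b, c) / Z

total = sum(joint(a, b, c) for a, b, c in product((0, 1), repeat=3))
print(Z, joint(0, 0, 0), total)
```

Enumerating every assignment is only feasible for very small models, since the number of terms in the partition function grows exponentially with the number of variables.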

- Features:

- **Memory:** The amount of memory required for SMO is **linear** in the training set size (which allows SMO to handle very large training sets).
- **Speed:** SMO is fastest for linear SVMs and sparse data sets.
- **Relevant materials:** http://research.microsoft.com/pubs/69644/tr-98-14.pdf
