Learning the structure of a BN

Includes local and global components...

Local: independence tests

Measures of deviance-from-independence between variables

For variables $x_i, x_j$ in dataset $\mathcal D$ of $M$ samples...

  1. Pearson's Chi-squared ($\chi^2$) statistic:

$$ d_{\chi^2}(\mathcal D) = \sum_{x_i, x_j} \frac{(M[x_i,x_j] - M \cdot \hat P(x_i) \cdot \hat P(x_j))^2}{M \cdot \hat P(x_i) \cdot \hat P(x_j)} $$

Independence increases as this value approaches 0

  1. Mutual information (KL divergence) b/w joint and product of marginals:

$$ d_I(\mathcal D) = \frac{1}{M} \sum_{x_i,x_j} M[x_i, x_j] \log \frac{M[x_i, x_j]}{M[x_i]M[x_j]} $$

Independence increases as this value approaches 0

A decision rule for accepting/rejecting hypothesis of independence

Choose some p-value $t$, acept if $d(\mathcal D) <= t$, else reject.

Global: scoring the structure

For a graph $\mathcal G$ with $n$ variables

  1. Log-likelihood score:

$$ \text{score}_L (\mathcal G: \mathcal D) = \sum_{\mathcal D} \sum_{i=1}^n \log \hat P (x_i | \text{parents}(x_i)) $$

  1. Bayesian score:

$$ \text{score}_B (\mathcal G: \mathcal D) = \log p(\mathcal D | \mathcal G) + \log p(\mathcal G) $$

  1. Bayes information criterion (with Dirichlet prior over graphs):

$$ \text{score}_{BIC} (\mathcal G: \mathcal D) = l(\hat \theta) : \mathcal D) - \frac{\log M}{2} \text{Dim}(\mathcal G) $$

Learning algorithms