\documentclass[a4paper, 12pt]{article}

\usepackage[x11colors]{xcolor}
%
\usepackage{tikz}
\usetikzlibrary{calc}
%
\usepackage{hyperref}
\hypersetup{
    colorlinks=true,
    linkcolor=blue,
}
%
\usepackage{commath}
\usepackage{amsthm}
\newtheorem{assumption}{Assumption}
\usepackage{amssymb}
%
% Local commands
%
\newcommand{\note}[1]{\marginpar{\scriptsize #1}}
\newcommand{\naf}{\ensuremath{\sim\!}}
\newcommand{\larr}{\ensuremath{\leftarrow}}
\newcommand{\at}[1]{\ensuremath{\!\del{#1}}}
\newcommand{\co}[1]{\ensuremath{\overline{#1}}}
\newcommand{\fml}[1]{\ensuremath{{\cal #1}}}
\newcommand{\deft}[1]{\textbf{#1}}
\newcommand{\pset}[1]{\ensuremath{\mathbb{P}\at{#1}}}
\newcommand{\ent}{\ensuremath{\lhd}}
\newcommand{\cset}[2]{\ensuremath{\set{#1,~#2}}}
\newcommand{\langof}[1]{\ensuremath{\fml{L}\at{#1}}}
\newcommand{\uset}[1]{\ensuremath{\left|{#1}\right>}}
\newcommand{\lset}[1]{\ensuremath{\left<{#1}\right|}}
\newcommand{\pr}[1]{\ensuremath{\mathrm{P}\at{#1}}}
\newcommand{\given}{\ensuremath{~\middle|~}}

\title{Zugzwang\\\textit{Logic and Artificial Intelligence}}
\author{Francisco Coelho\\ \texttt{fc@uevora.pt}}

\begin{document}
\maketitle

\begin{abstract}
    A major limitation of logical representations is the implicit assumption that the Background Knowledge (BK) is perfect. This assumption is problematic if data is noisy, which is often the case. Here we aim to explore how ASP specifications with probabilistic facts can lead to characterizations of probability functions on the specification's domain. 
\end{abstract}

\section{Introduction and Motivation }

Answer Set Programming (ASP) is a logic programming paradigm based on the stable model semantics of normal logic programs, and it can be implemented using the latest advances in SAT solving technology. ASP is a truly declarative language that supports constructs such as disjunction in the head of a clause, choice rules, and hard and weak constraints.

The Distribution Semantics (DS) is a key approach to extending logical representations with probabilistic reasoning. Probabilistic facts (PFs) are the most basic stochastic DS primitive: a logical fact $a$ labelled with a probability $p$, written $p::a$. Each probabilistic fact represents a Boolean random variable that is true with probability $p$ and false with probability $1 - p$. Each consistent combination of the PFs defines a \textit{total choice} $c = \set{p::a, \ldots}$ such that

\begin{equation}
    \pr{C = c} = \prod_{a\in c} p_a \prod_{a \not\in c} (1 - p_a).
    \label{eq:prob.total.choice}
\end{equation}
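Equation (\ref{eq:prob.total.choice}) can be sketched computationally. The following Python fragment is a minimal illustration; the dictionary representation of probabilistic facts and the helper name \texttt{total\_choice\_prob} are our own, not part of any ASP system:

```python
from itertools import product

def total_choice_prob(pfs, choice):
    # P(C = c): p_a for each fact chosen true, (1 - p_a) for each chosen false.
    prob = 1.0
    for atom, p in pfs.items():
        prob *= p if atom in choice else 1.0 - p
    return prob

# Hypothetical probabilistic facts 0.3::a, 0.2::d.
pfs = {"a": 0.3, "d": 0.2}

# Enumerate all total choices; their probabilities must sum to one.
choices = [{a for a, keep in zip(pfs, picks) if keep}
           for picks in product([True, False], repeat=len(pfs))]
total = sum(total_choice_prob(pfs, c) for c in choices)
```

Since the total choices partition the sample space of the probabilistic facts, \texttt{total} evaluates to $1$.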

Our goal is to extend this probability, from total choices, to cover the specification domain. We can foresee two key applications of this extended probability:

\begin{enumerate}
    \item Support any probabilistic reasoning/task on the specification domain.
    \item Given a dataset and a divergence measure, the specification can be scored (by its divergence w.r.t.\ the \emph{empirical} distribution of the dataset) and ranked amongst other specifications. This is a key ingredient in algorithms that search, for example, for an optimal specification of a dataset.
\end{enumerate}

This goal faces a critical problem concerning situations where multiple standard models result from a given total choice, illustrated by the following example. The specification
\begin{equation}
    \begin{aligned}
        0.3::a&,\cr
        b \vee c& \leftarrow a.
    \end{aligned}
    \label{eq:example.1}
\end{equation}
has three stable models, $\co{a}, ab$ and $ac$. While it is straightforward to set $P(\co{a})=0.7$, there is \textit{no further information} to assign values to $P(ab)$ and $P(ac)$. At best, we can use a parameter $x$ such that
$$
\begin{aligned}
P(ab) &= 0.3 x,\cr
P(ac) &= 0.3 (1 - x).
\end{aligned}
$$

This uncertainty in inherent to the specification, but can be mitigated with the help of a dataset: the parameter $x$ can be estimated from the empirical distribution.

In summary, if an ASP specification is intended to describe some observable system then:

\begin{enumerate}
    \item The observations can be used to estimate the value of the parameters (such as $x$ above and others entailed from further clauses).
    \item With a probability set for the stable models, we want to extend it to all the samples (\textit{i.e.} consistent sets of literals) of the specification. 
    \item This extended probability can then be related to the \textit{empirical distribution}, using a probability divergence, such as Kullback-Leibler; and the divergence value used as a \textit{performance} measure of the specification with respect to the observations.
    \item If that specification is but one of many candidates, then that performance measure can be used, \textit{e.g.} as fitness, by algorithms searching for (optimal) specifications of a dataset of observations.
\end{enumerate}
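Step 3 above can be sketched in Python. Here \texttt{kl\_divergence} is an illustrative helper (not from any library), and all event names and numbers are made up, with the model side using $x = 0.8$ for the split between $ab$ and $ac$:

```python
import math

def kl_divergence(empirical, model, eps=1e-12):
    # D_KL(empirical || model) over a shared, finite set of events.
    total = 0.0
    for event, p in empirical.items():
        if p > 0.0:
            q = max(model.get(event, 0.0), eps)
            total += p * math.log(p / q)
    return total

# Illustrative distributions over the stable models of the running example.
empirical = {"ab": 0.30, "ac": 0.05, "~a": 0.65}
model = {"ab": 0.3 * 0.8, "ac": 0.3 * 0.2, "~a": 0.7}
score = kl_divergence(empirical, model)  # lower is better
```

The score is zero exactly when the two distributions agree, so it can serve directly as the fitness measure mentioned in step 4.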

Currently, we are at step two above: extending a probability function (with parameters such as $x$), defined on the stable models of a specification, to all the events of the specification. This extension must, of course, respect the axioms of probability so that probabilistic reasoning is consistent with the ASP specification.

\section{Extending Probabilities}

Given an ASP specification, we consider the \textit{atoms} $a \in \fml{A}$ and \textit{literals} $z \in \fml{L}$, the \textit{events} $e \in \fml{E} \iff e \subseteq \fml{L}$, the \textit{worlds} $w \in \fml{W}$ (consistent events), the \textit{total choices} $c \in \fml{C}$, each selecting either $a$ or $\neg a$ for every probabilistic fact $a$, and the \textit{stable models} $s \in \fml{S}$.

% In a statistical setting, the outcomes are the literals $x$, $\neg x$ for each atom $x$, the events express a set of possible outcomes (including $\emptyset$, $\set{a, b}$, $\set{a, \neg a, b}$, \textit{etc.}), and worlds are events with no contradictions.

Our path, traced by equations (\ref{eq:prob.total.choice}) and (\ref{eq:prob.stablemodel}--\ref{eq:prob.events}), starts with the probability of total choices, $\pr{C = c}$, extends it to stable models, $\pr{S = s}$, and then to worlds, $\pr{W = w}$, and events, $\pr{E = e}$.

\begin{enumerate}
    \item \textbf{Total Choices.} This case is given by $\pr{C = c}$, from equation \ref{eq:prob.total.choice}. Each total choice $C = c$ (together with the facts and rules) entails some stable models, $s \in S_c$, and each stable model $S = s$ contains a single total choice $c_s \subseteq s$.
    \item \textbf{Stable Models.} Given a stable model $s \in \fml{S}$, and parameters $x_{s,c} \in \intcc{0, 1}$,
    \begin{equation}
        \pr{S = s \given C = c} = \begin{cases}
            x_{s,c} & \text{if~} s \in S_c,\cr
            0&\text{otherwise}
        \end{cases}
        \label{eq:prob.stablemodel}
    \end{equation}
    such that $\sum_{s \in S_c} x_{s,c} = 1$.
    \item\label{item:world.cases} \textbf{Worlds.} Each world $W = w$ falls into one of four cases:
    \begin{enumerate}
        \item Is a \textit{stable model}. Then 
        \begin{equation}
            \pr{W  = w \given C = c} = \pr{S = s \given C = c}.
            \label{eq:world.fold.stablemodel}
        \end{equation}
        \item \textit{Contains} some stable models. Then
        \begin{equation}
            \pr{W  = w \given C = c} = \prod_{s \subset w}\pr{S = s \given C = c}.
            \label{eq:world.fold.superset}
        \end{equation}
        \item \textit{Is contained} in some stable models. Then
        \begin{equation}
            \pr{W  = w \given C = c} = \sum_{s \supset w}\pr{S = s \given C = c}.
            \label{eq:world.fold.subset}
        \end{equation}
        \item Neither contains nor is contained by a stable model. Then
        \begin{equation}
            \pr{W  = w \given C = c} = 0.
            \label{eq:world.fold.independent}
        \end{equation}
    \end{enumerate}
    \item \textbf{Events.} For each event $E = e$,
    \begin{equation}
        \pr{E = e \given C = c} = \begin{cases}
            \pr{W = e \given C = c} & \text{if~} e \in \fml{W}, \cr
            0 & \text{otherwise}.
            \end{cases}
        \label{eq:prob.events}
    \end{equation}
\end{enumerate}
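The four world cases above can be put together in a small Python sketch. The representation is our own: worlds and stable models are frozensets of literals, and \texttt{sm\_prob} holds the values $x_{s,c}$ of equation (\ref{eq:prob.stablemodel}) for a fixed total choice:

```python
def world_prob(world, models, sm_prob):
    # P(W = w | C = c), where `models` are the stable models of a fixed total choice c.
    if world in models:                         # case (a): w is a stable model
        return sm_prob[world]
    inside = [s for s in models if s < world]   # case (b): w contains stable models
    if inside:
        prob = 1.0
        for s in inside:                        # product, by conditional independence
            prob *= sm_prob[s]
        return prob
    outside = [s for s in models if world < s]  # case (c): w is contained in stable models
    if outside:
        return sum(sm_prob[s] for s in outside)
    return 0.0                                  # case (d): unrelated to any stable model

# Running example, total choice a, with x = 0.8.
ab, ac = frozenset("ab"), frozenset("ac")
models = {ab, ac}
sm_prob = {ab: 0.8, ac: 0.2}
```

For instance, the world $a$ is contained in both stable models and gets $0.8 + 0.2 = 1$, while $abc$ contains both and gets $0.8 \times 0.2$.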

Since stable models are minimal, there is no proper chain $s_1 \subset w \subset s_2$, so each world falls into exactly one of the four cases of point \ref{item:world.cases} above.

% PARAMETERS FOR UNCERTAINTY

Equation (\ref{eq:prob.stablemodel}) expresses the lack of knowledge about the probability assignment when a single total choice entails more than one stable model. In that case, how should the respective probability mass be distributed? Our answer is to assign an unknown probability, $x_{s,c}$, conditional on the total choice $c$, to each stable model $s$. This approach allows us to express an unknown quantity and to estimate it later, given observed data.

% STABLE MODEL
The stable model case, in equation (\ref{eq:world.fold.stablemodel}), identifies the probability of a stable model \textit{as a world} with its probability as defined previously in equation (\ref{eq:prob.stablemodel}), as a stable model.

% SUPERSET
Equation (\ref{eq:world.fold.superset}) results from the conditional independence of the stable models $s \subset w$. Conditional independence of stable models asserts a least-informed strategy, which we make explicit:

\begin{assumption}
    Stable models are conditionally independent, given their total choices.
\end{assumption}

Consider the stable models $ab, ac$ from the example above. They result from the clause $b \vee c \leftarrow a$ and the total choice $a$. These formulas alone impose no relation between $b$ and $c$ (given $a$), so none should be assumed. Dependence relations are further discussed in Subsection \ref{subsec:dependence}.

% SUBSET
\hrule

\bigskip
I'm not sure about what to say here.\marginpar{todo}

My first guess was
\begin{equation*}
    \pr{W  = w \given C = c} = \sum_{s \supset w}\pr{S = s \given C = c}.
\end{equation*}

$\pr{W = w \given C = c}$ already separates $\pr{W}$ into \textbf{disjoint} events!

I am also assuming that stable models are independent. If I were bound to set inclusion, this would entail $p(w) = p(s_1) + p(s_2) - p(s_1)p(s_2)$. But I am not: I am defining a relation between worlds and stable models.

Also, if I set $p(w) = p(s_1) + p(s_2)$ and respect the laws of probability, this entails $p(s_1)p(s_2) = 0$.

So maybe what I want is to define the cover $\hat{w} = \cup_{s \supset w} s$ and set
\begin{equation*}
    \pr{W  = w \given C = c} = \sum_{s \supset w}\pr{S = s \given C = c} - \pr{W = \hat{w} \given C = c}.
\end{equation*}

But this doesn't work, because we'd get $\pr{W = a \given C = a} < 1$.
%

%
\bigskip
\hrule

% INDEPENDENCE

A world that neither contains nor is contained in a stable model describes a case that, according to the specification, should never be observed. So the respective probability is set to zero, per equation (\ref{eq:world.fold.independent}).
%
%   ================================================================
%
\subsection{Dependence}
\label{subsec:dependence}

Dependence relations in the underlying system can be explicitly expressed in the specification. 

For example, $b \leftarrow c \wedge d$, where $d$ is an atomic choice, explicitly expresses a dependence of $b$ on $c$. One gets, for example, the specification
$$
0.3::a, b \vee c \leftarrow a, 0.2::d, b \leftarrow c \wedge d.
$$
with the stable models
$
\co{ad}, \co{a}d, a\co{d}b, a\co{d}c, adb
$. 


The interesting case is the branch of the total choice $ad$. Notice that no stable model $s$ contains $adc$, because (1) $adb$ is a stable model and (2) if $adc \subseteq s$ then $b \in s$ (by $b \leftarrow c \wedge d$), so $adb \subset s$, contradicting the minimality of $s$.

Following equations (\ref{eq:world.fold.stablemodel}) and (\ref{eq:world.fold.independent}) this sets 
\begin{equation*}
    \begin{cases}
        \pr{W = adc \given C = ad} = 0,\cr
        \pr{W = adb \given C = ad} = 1
    \end{cases}
\end{equation*}
which concentrates all probability mass from the total choice $ad$ in the $adb$ branch, including the node $W = adbc$. This leads to the following cases:
$$
\begin{array}{l|r}
    x & \pr{W = x \given C = ad}\\
    \hline
    ad & 1 \\
    adb & 1\\
    adc & 0\\
    adbc & 1    
\end{array}
$$
so, for $C = ad$, 
$$
\begin{aligned}
    \pr{W = b} &= \frac{2}{4} \cr
    \pr{W = c} &= \frac{1}{4} \cr
    \pr{W = bc} &= \frac{1}{4} \cr
                &\not= \pr{W = b}\pr{W = c}
\end{aligned}
$$
\textit{i.e.}, the events $W = b$ and $W = c$ are dependent, and that dependence results directly from the segment $0.2::d, b \leftarrow c \wedge d$ of the specification.
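The computation above can be checked mechanically. The sketch below follows the table's normalization over the four listed worlds of the $ad$ branch; worlds are represented as strings of atom names, an encoding of our own:

```python
# P(W = x | C = ad) for the four worlds of the ad branch, as in the table.
p = {"ad": 1.0, "adb": 1.0, "adc": 0.0, "adbc": 1.0}

def frac(atoms):
    # Mass of the worlds containing all given atoms, over the number of worlds.
    return sum(v for w, v in p.items() if all(a in w for a in atoms)) / len(p)

p_b, p_c, p_bc = frac("b"), frac("c"), frac("bc")
dependent = abs(p_bc - p_b * p_c) > 1e-9  # True: b and c are dependent
```

With these values, $p_{bc} = 1/4$ while $p_b \, p_c = 1/2 \times 1/4 = 1/8$, confirming the dependence.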


%

%
\hrule
\begin{quotation}\note{Todo}
    
    Prove the four world cases (done), support the product (done) and sum (tbd) options, with the independence assumptions.
\end{quotation}
    
\section{Developed Example}

We continue with the specification from equation \ref{eq:example.1}. 

\textbf{Step 1: Total Choices.} The total choices, and respective stable models, are
\begin{center}
    \begin{tabular}{l|r|r}
        Total Choice ($c$) & $\pr{C = c}$ & Stable Models ($s$)\\
        \hline 
        $a$ & $0.3$ & $ab$ and $ac$.\\
        $\co{a} = \neg a$ & $\co{0.3} = 0.7$ & $\co{a}$.
    \end{tabular}
\end{center}

\textbf{Step 2: Stable Models.} Suppose now that 
\begin{center}
    \begin{tabular}{l|c|r}
        Stable Model ($s$) & Total Choice ($c$)  & $\pr{S = s \given C = c}$\\
        \hline 
        $\co{a}$    & $\co{a}$ & $1.0$ \\
        $ab$        & $a$ & $0.8$ \\
        $ac$        & $a$ & $0.2 = \co{0.8}$
    \end{tabular}
\end{center}

\textbf{Step 3: Worlds.} Following equations \ref{eq:world.fold.stablemodel}--\ref{eq:world.fold.independent} we get:
\begin{center}
    \begin{tabular}{l|c|l|c|r}
        World ($w$) & S.M. ($s$) & Relation & T.C. ($c$)  & $\pr{W  = w}$\\
        \hline 
        $\emptyset$ & all           & contained & $a$, $\co{a}$ & $1.0$ \\ 
        $a$         & $ab$, $ac$    & contained & $a$ & $0.8\times 0.3 + 0.2\times 0.3 = 0.3$ \\
        $b$         & $ab$          & contained & $a$ & $0.8\times 0.3 = 0.24$ \\ 
        $c$         & $ac$          & contained & $a$ & $0.2\times 0.3 = 0.06$ \\ 
        $\co{a}$    & $\co{a}$      & stable model & $\co{a}$ & $1.0\times 0.7 = 0.7$ \\
        $\co{b}$    & none          & independent & none & $0.0$ \\
        $\co{c}$    & none              & \ldots & & \\
        $ab$        & $ab$          & stable model & $a$ & $0.24$ \\  
        $ac$        & $ac$          & stable model & $a$ & $0.06$ \\  
        $a\co{b}$   & none              & \ldots & & \\
        $a\co{c}$   & none              & \ldots & & \\
        $\co{a}b$   & $\co{a}$      & contains & $\co{a}$ & $1.0\times 0.7 = 0.7$ \\
        $\co{a}c$   &  $\co{a}$         & \ldots & & \\
        $\co{a}\co{b}$  & $\co{a}$      & \ldots & & \\
        $\co{a}\co{c}$  & $\co{a}$      & \ldots & & \\
        $abc$       & $ab$, $ac$    & contains & $a$ & $0.8\times 0.2\times 0.3 = 0.048$ \\
    \end{tabular}
\end{center}
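As a sanity check on the table, some rows can be recomputed by marginalizing over total choices, $\pr{W = w} = \sum_c \pr{W = w \given C = c}\pr{C = c}$. In this sketch (our own encoding) \texttt{"-a"} stands for $\co{a}$:

```python
# P(C = c) from step 1 and P(W = w | C = c) from the four world cases.
p_c = {"a": 0.3, "-a": 0.7}
p_w_given_c = {
    ("ab", "a"): 0.8,        # stable model
    ("ac", "a"): 0.2,        # stable model
    ("a", "a"): 0.8 + 0.2,   # contained in ab and ac: sum
    ("abc", "a"): 0.8 * 0.2, # contains ab and ac: product
    ("-a", "-a"): 1.0,       # stable model
}

def p_w(w):
    # Marginal P(W = w): sum over total choices c of P(W = w | C = c) P(C = c).
    return sum(p * p_c[c] for (w2, c), p in p_w_given_c.items() if w2 == w)
```

For instance, $\pr{W = ab} = 0.8 \times 0.3 = 0.24$ and $\pr{W = abc} = 0.8 \times 0.2 \times 0.3 = 0.048$, matching the table.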


\end{document}