Demo of the Aggregators
========================================================

The file **aggregators.R** includes implementations of the aggregators 
 proposed in the paper, together with some variations.

The file **evaluation.R** contains functions for evaluating the aggregated 
 annotations against the gold standard.

```{r}
source("./aggregators.R")
source("./evaluation.R")
```
(Please make sure the auxiliary file **aux.R** is in the same directory.) 

## Input

Each aggregator takes as input a data frame which should have 3 columns named 
 *Annotator*, *Item*, and *Category*. 
If evaluation is needed (or the oracle aggregator **ORA** is used), 
 then the data frame should have an additional column named *Gold*.
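
For illustration, the sketch below builds a small, made-up data frame in this format. It assumes two things the files above do not confirm: the *Gold* label is repeated on every row of an item, and all columns are stored as character strings. The names `toy` and `required.cols` are hypothetical.

```{r}
# Hypothetical toy annotations (made up, not from the paper):
# three annotators label two items; Gold is repeated on every row of an item.
toy <- data.frame(
  Annotator = c("a1", "a2", "a3", "a1", "a2", "a3"),
  Item      = c("i1", "i1", "i1", "i2", "i2", "i2"),
  Category  = c("1",  "1",  "2",  "3",  "3",  "3"),
  Gold      = c("1",  "1",  "1",  "3",  "3",  "3"),
  stringsAsFactors = FALSE
)
# Check that the columns every aggregator needs are present.
required.cols <- c("Annotator", "Item", "Category")
stopifnot(all(required.cols %in% names(toy)))
str(toy)
```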
 
For example, below are the first rows of the *Question Dialogue Acts (QDA)* 
 dataset used in the paper.
 
```{r}
qda <- read.csv("./QDA-annotations.csv", colClasses="character")
head(qda, n=11)
```
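
The call above reads every column with `colClasses="character"`. A plausible reason, which is our assumption and not stated by the authors, is to keep category labels and item IDs as plain strings so they compare cleanly with the character names of an aggregator's output vector. The minimal sketch below parses a made-up two-row CSV both ways to show the difference. The object names are hypothetical.

```{r}
# Made-up two-row CSV, parsed with and without colClasses.
toy.csv <- "Annotator,Item,Category,Gold\na1,i1,2,2\na2,i1,3,2"
toy.chr <- read.csv(text = toy.csv, colClasses = "character")
toy.num <- read.csv(text = toy.csv)
sapply(toy.chr, class)  # every column is "character"
sapply(toy.num, class)  # Category and Gold are parsed as "integer"
```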

## Use of Aggregators

There are 6 aggregators: **SPR**, **COM**, **INV**, **DIFF**, **RAT**, and **AGR**, 
 plus 2 additional variations of **AGR**: **AGR.PRIOR** and **AGR.ITER**.

The result of applying an aggregator is a named vector showing the collective 
 annotation for each item. 
For instance, below are some results for the *Simple Plurality Rule* (**SPR**).
```{r}
qda.spr <- SPR(qda)
head(qda.spr)
```
As a sanity check, the output above shows that all annotators chose category 2 
 for item sw_0165_4079_A_59_utt2. 
```{r}
qda.spr["sw_0165_4079_A_59_utt2"]
```
Sure enough, **SPR** outputs category 2 for that item.
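
To make the idea of a plurality vote concrete, here is a minimal sketch of a simple plurality rule. It is not the code in **aggregators.R**, and `spr.sketch` is a hypothetical name. Breaking ties by taking the first category in sorted order is an assumption of this sketch, and the actual **SPR** may resolve ties differently. Like **SPR**, it returns a named vector with one collective label per item.

```{r}
# Hypothetical plurality-rule sketch; NOT the SPR implementation in aggregators.R.
# Assumption: ties go to the first category in sorted order.
spr.sketch <- function(df) {
  winners <- tapply(df$Category, df$Item, function(cats) {
    counts <- table(cats)             # votes per category, sorted by label
    names(counts)[which.max(counts)]  # first category with the maximal count
  })
  # tapply returns a 1-d array; turn it into a plain named character vector
  setNames(as.character(winners), names(winners))
}

toy <- data.frame(
  Annotator = c("a1", "a2", "a3", "a1", "a2", "a3"),
  Item      = c("i1", "i1", "i1", "i2", "i2", "i2"),
  Category  = c("1",  "1",  "2",  "3",  "2",  "3"),
  stringsAsFactors = FALSE
)
spr.sketch(toy)  # expected: c(i1 = "1", i2 = "3")
```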


## Evaluation

We can use the functions in **evaluation.R** to extract the gold standard and 
 calculate the observed agreement between an aggregator's output and the gold standard:
```{r}
qda.gold <- ReadGold(qda)
head(qda.gold)
ObservedAgreement(qda.spr, qda.gold)
```
(If the oracle aggregator **ORA** is used, please make sure **evaluation.R** has 
 been sourced, since **ORA** relies on its *ReadGold* function to access the gold standard.)
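
The two evaluation steps can be sketched as follows. The sketch assumes that *Gold* holds one fixed label per item and that observed agreement is the fraction of items on which the aggregated label equals the gold label. `read.gold.sketch` and `observed.agreement.sketch` are hypothetical stand-ins, not the implementations in **evaluation.R**.

```{r}
# Hypothetical sketches; ReadGold and ObservedAgreement in evaluation.R may differ.
read.gold.sketch <- function(df) {
  # Assumes Gold is constant within an item: keep the first Gold value per item
  first.rows <- !duplicated(df$Item)
  setNames(df$Gold[first.rows], df$Item[first.rows])
}
observed.agreement.sketch <- function(pred, gold) {
  # Fraction of items whose aggregated label equals the gold label,
  # matching the two named vectors by item name rather than by position
  items <- intersect(names(pred), names(gold))
  mean(pred[items] == gold[items])
}

toy <- data.frame(
  Item = c("i1", "i1", "i2", "i2", "i3", "i3"),
  Gold = c("1",  "1",  "3",  "3",  "2",  "2"),
  stringsAsFactors = FALSE
)
toy.gold <- read.gold.sketch(toy)
toy.pred <- c(i1 = "1", i2 = "2", i3 = "2")
observed.agreement.sketch(toy.pred, toy.gold)  # 2 of 3 items match
```

Matching by name rather than by position means the result does not depend on the two vectors listing items in the same order.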

## Optional Arguments

The other aggregators are called the same way, but some of them accept 
 optional arguments.
 
For **COM** and the **AGR** family (including **ORA**), the number of 
 categories K defaults to the number of distinct categories that annotators 
 assigned across all items. This undercounts K if some category was never used 
 by any annotator, although in that case it can be reasonable to exclude the 
 unused category. Either way, K can be specified explicitly, e.g., 
```{r}
qda.com <- COM(qda, K=4)
head(qda.com)
ObservedAgreement(qda.com, qda.gold)
``` 
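
The default rule for K amounts to counting the distinct labels in the *Category* column. The made-up example below, with a hypothetical four-category scheme in which category 4 is never used, shows how that count falls short. It says nothing about how **COM** or the **AGR** family use K internally.

```{r}
# Hypothetical toy data: the scheme has categories 1-4, but nobody used 4.
toy <- data.frame(
  Item     = c("i1", "i1", "i2", "i2"),
  Category = c("1",  "2",  "3",  "3"),
  stringsAsFactors = FALSE
)
# Default described above: K = number of distinct categories actually used
toy.K <- length(unique(toy$Category))
toy.K  # 3, although the scheme has 4 categories
```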

In addition, for **AGR.ITER**, the default maximum number of iterations is 50; 
 it can be changed with the `iter.max` argument, e.g.,
```{r}
qda.agrIter <- AGR.ITER(qda, K=4, iter.max=10)
head(qda.agrIter)
ObservedAgreement(qda.agrIter, qda.gold)
```
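
The role of an iteration cap can be illustrated with a generic iterative aggregator. The sketch below is a hypothetical, iteratively re-weighted plurality vote and is NOT the **AGR.ITER** algorithm. In each round it weights annotators by how often they agree with the current consensus, then recomputes the consensus. It stops when the labels stop changing or after `iter.max` rounds. The function name `weighted.vote.iter` and the small weight floor are our own choices.

```{r}
# Hypothetical illustration of an iteration cap; NOT the AGR.ITER algorithm.
weighted.vote.iter <- function(df, iter.max = 50) {
  annotators <- unique(df$Annotator)
  w <- setNames(rep(1, length(annotators)), annotators)  # equal starting weights
  consensus <- NULL
  for (it in seq_len(iter.max)) {
    # Weighted plurality: sum annotator weights per (item, category) cell
    scores <- tapply(w[df$Annotator], list(df$Item, df$Category), sum)
    scores[is.na(scores)] <- 0  # categories nobody assigned to an item
    updated <- setNames(colnames(scores)[max.col(scores, ties.method = "first")],
                        rownames(scores))
    if (identical(updated, consensus)) break  # converged: labels unchanged
    consensus <- updated
    # New weight = share of an annotator's labels matching the consensus,
    # plus a small floor so that no annotator ends up with zero weight
    agree <- df$Category == consensus[df$Item]
    w <- vapply(annotators, function(a) mean(agree[df$Annotator == a]),
                numeric(1)) + 1e-6
  }
  list(labels = consensus, iterations = it)  # iterations = rounds actually run
}

toy <- data.frame(
  Annotator = c("a1", "a2", "a3", "a1", "a2", "a3"),
  Item      = c("i1", "i1", "i1", "i2", "i2", "i2"),
  Category  = c("1",  "1",  "2",  "3",  "3",  "2"),
  stringsAsFactors = FALSE
)
weighted.vote.iter(toy, iter.max = 10)
```

With `iter.max = 1`, this sketch returns the unweighted first-round consensus without any re-weighting. This mirrors the general trade-off of lowering `iter.max`: fewer rounds are cheaper, but the loop may stop before the labels settle.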