Computer Science/Discrete Mathematics Seminar I

Topic: Active learning with "simple" membership queries

Speaker: Shachar Lovett

Affiliation: University of California, San Diego

Date: Monday, January 23

Time/Room: 11:15am - 12:15pm / S-101

Video Link: https://video.ias.edu/csdm/2017/0123-ShacharLovett

An active learning algorithm for a classification problem has access to many unlabelled samples. The algorithm asks for the labels of a small number of carefully chosen samples, so that it can leverage this information to correctly label most of the unlabelled samples. Active learning is motivated by many real-world applications in which unlabelled data is much easier and cheaper to obtain than labelled data. The main question is: can active learning algorithms outperform classical passive learning algorithms? Can we always choose a few unlabelled samples such that revealing their labels will teach us the labels of most of the unlabelled samples?

A classical example where this is possible is that of thresholds on the real line. Consider an unknown distribution over $\mathbb R$, where a point $x$ is labelled 0 if $x < t$ and 1 if $x \ge t$, for an unknown threshold $t$. If one wishes to correctly label a $1-\epsilon$ fraction of the points, then classical VC theory tells us that it suffices to correctly label a sample of $1/\epsilon$ points. This can be done using binary search, where the algorithm asks for the labels of only $\log(1/\epsilon)$ points. Thus, in this case, the label complexity is exponentially better than the sample complexity.

Unfortunately, this breaks down for many other natural concept classes. Even for the concept classes of points and half-planes in 2 dimensions, there are simple examples showing that active learning cannot always outperform classical passive learning. As such, most research has focused on assuming further structure on the unknown underlying distribution or concept. In this work we suggest another approach: allowing the algorithm to ask "simple" membership queries about points outside the observed sample.
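The threshold example can be sketched in a few lines. This is an illustrative simulation (the function and variable names are mine, not from the talk): binary search over a sorted unlabelled sample pins down the threshold using only logarithmically many label queries.

```python
def active_learn_threshold(points, label_oracle):
    """Learn a 1-D threshold with ~log2(n) label queries.

    `label_oracle(x)` returns 0 if x < t and 1 if x >= t, for an
    unknown threshold t. A passive learner would need to see all n
    labels; binary search over the sorted sample asks for only a
    logarithmic number of them.
    """
    pts = sorted(points)
    queries = 0
    lo, hi = 0, len(pts)
    while lo < hi:                      # find the first point labelled 1
        mid = (lo + hi) // 2
        queries += 1
        if label_oracle(pts[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    # everything before index lo is labelled 0, everything from lo on is 1
    t_hat = pts[lo] if lo < len(pts) else float("inf")
    return t_hat, queries
```

On a sample of 1000 points the algorithm issues at most 10 label queries, while labelling the whole sample consistently with the unknown threshold.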
For example, in the setting of points in 2 dimensions where concepts are half-planes, we also allow the algorithm to ask, given two points in the sample, which one is closer to the line defining the unknown half-plane. It turns out that allowing such questions improves the label complexity by an exponential factor. This is made possible by a new combinatorial framework for active learning, based on compression schemes. We show that whenever a learning problem has an "active" compression scheme, active learning outperforms passive learning by an exponential factor. We then show that such schemes exist for the concept classes of points and half-spaces in $d$ dimensions, as well as for large-margin classifiers. This relies on algorithms in discrete geometry for the hyperplane arrangement problem, and in particular on the study of linear decision trees for $k$-SUM and related problems. Joint work with Shay Moran and Jiapeng Zhang.
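To convey the flavor of why distance-comparison queries help, here is a simplified sketch. It assumes (my simplification, not the talk's actual scheme) an oracle that orders two sample points by signed distance to the unknown separating line; the talk's queries are the unsigned "which point is closer", used inside a more careful compression-scheme argument. Once the sample is sorted this way, the 2-D half-plane problem collapses to a 1-D threshold, so logarithmically many label queries suffice.

```python
import functools

def learn_halfplane_with_comparisons(points, cmp_oracle, label_oracle):
    """Toy sketch of comparison-assisted active learning of a half-plane.

    Assumption (for illustration only): `cmp_oracle(p, q)` returns a
    negative/zero/positive number ordering p and q by signed distance
    to the unknown line.  `label_oracle(p)` returns the 0/1 label.
    """
    # Comparison queries sort the sample along the normal direction.
    order = sorted(points, key=functools.cmp_to_key(cmp_oracle))
    # Labels are now monotone along `order`: binary-search the boundary.
    labels_used = 0
    lo, hi = 0, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        labels_used += 1
        if label_oracle(order[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1
    # order[:lo] are predicted 0, order[lo:] are predicted 1
    return order, lo, labels_used
```

The comparison queries cost O(n log n), but only the label queries are expensive in the active-learning model, and the sketch uses about log2(n) of them.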