# Approximating large frequency moments with pick-and-drop sampling

 Computer Science/Discrete Mathematics Seminar I Topic: Approximating large frequency moments with pick-and-drop sampling Speaker: Vladimir Braverman Affiliation: Johns Hopkins University Date: Monday, November 4 Time/Room: 11:15am - 12:15pm/West Bldg. Lect. Hall Video Link: https://video.ias.edu/csdm/2013/1104-VladimirBraverman

Given data stream $D = \{p_1,p_2,...,p_m\}$ of size $m$ of numbers from $\{1,..., n\}$, the frequency of $i$ is defined as $f_i = |\{j: p_j = i\}|$. The $k$-th frequency moment of $D$ is defined as $F_k = \sum_{i=1}^n f_i^k$. We consider the problem of approximating frequency moments for $k\ge 3$. For any constant $c$ we show an $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ upper bound on the space complexity of the problem. Here $\log^{(c)}(n)$ is the iterative $\log$ function. We make the following assumptions: $n$ and $m$ are polynomially far; approximation error $\epsilon$ and parameter $k$ are constants. We observe a natural bijection between streams and special matrices. Our main technical contribution is a non-uniform sampling method on matrices. We call our method a *pick-and-drop sampling*; it samples a heavy element (i.e., element $i$ with frequency $\Omega(F_k)$) with probability $\Omega(1/n^{1-2/k})$ and gives approximation $\tilde{f_i} \ge (1-\epsilon)f_i$. In addition, the estimations never exceed the real values, that is $\tilde{f_j} \le f_j$ for all $j$. As a result, we reduce the space complexity of finding a heavy element to $O(n^{1-2/k}\log(n))$ bits. We apply our method of recursive sketches and resolve the problem with $O(n^{1-2/k}\log(n)\log^{(c)}(n))$ bits. This is joint work with Rafail Ostrovsky.