## Papers and preprints

- The loss landscape of overparameterized neural networks.

arXiv link

At the heart of deep learning is a procedure for minimizing a loss function $L: \mathbb{R}^n \rightarrow \mathbb{R}$. This function $L$ is essentially always nonconvex, and is often pictured as a Morse function. In this paper, we show that $L$ is not Morse, and in fact has a positive-dimensional manifold of global minima, whose dimension we compute.

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$ - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n>d$, we show that the locus $M$ of global minima of $L$ is usually not discrete, but rather an $n-d$ dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.

- A Fock Space Approach to Severi Degrees of Hirzebruch Surfaces.

arXiv link

In this paper, we extend the approach of the previous paper to the setting of Hirzebruch surfaces, and connect the resulting formulas to the groundbreaking work of Caporaso-Harris, as well as subsequent refinements of Vakil, Getzler, Abramovich, and Bertram.

The classical Severi degree counts the number of algebraic curves of fixed genus and class passing through some general points in a surface. In this paper we study Severi degrees as well as several types of Gromov-Witten invariants of the Hirzebruch surfaces $F_k$, and the relationship between these numbers. To each Hirzebruch surface $F_k$ we associate an operator $M_{F_k} \in H[\mathbb{P}^1]$ acting on the Fock space $F[\mathbb{P}^1]$. Generating functions for each of the curve-counting theories we study here on $F_k$ can be expressed in terms of the exponential of the single operator $M_{F_k}$, and counts on $\mathbb{P}^2$ can be expressed in terms of the exponential of $M_{F_1}$. Several previous results can be recovered in this framework, including the recursion of Caporaso and Harris for enumerative curve counting on $\mathbb{P}^2$, the generalization by Vakil to $F_k$, and the relationship of Abramovich-Bertram between the enumerative curve counts on $F_0$ and $F_2$. We prove an analog of Abramovich-Bertram for $F_1$ and $F_3$. We also obtain two differential equations satisfied by generating functions of relative Gromov-Witten invariants on $F_k$. One of these recovers the differential equation of Getzler and Vakil.

- A Fock Space Approach to Severi Degrees (with Rahul Pandharipande).
*Proceedings of the London Mathematical Society*, 114 (2017) no. 3, 476–494.

arXiv link journal link

One of the classical problems in algebraic geometry is that of computing Severi degrees - the number of plane curves of a fixed genus and degree passing through a well chosen number of generic points. In this paper, we develop a new approach to this classical problem by connecting it to representation theory of an infinite dimensional Lie algebra.

The classical Severi degree counts the number of algebraic curves of fixed genus and class passing through points in a surface. We express the Severi degrees of $\mathbb{CP}^1 \times \mathbb{CP}^1$ as matrix elements of the exponential of a single operator $M$ on Fock space. The formalism puts Severi degrees on a similar footing as the more developed study of Hurwitz numbers of coverings of curves. The pure genus 1 invariants of the product $E \times \mathbb{CP}^1$ (with $E$ an elliptic curve) are solved via an exact formula for the eigenvalues of $M$ to initial order. The Severi degrees of $\mathbb{CP}^2$ are also determined by $M$ via the $(-1)^{d-1}/d^2$ disk multiple cover formula for Calabi-Yau 3-fold geometries.

- Mirror Symmetry for Stable Quotients Invariants (with Aleksey Zinger).
*Michigan Mathematical Journal*, 63 (2014), no. 3, 571–621.

arXiv link journal link

The mirror conjecture has motivated work in Gromov-Witten theory for decades and in its full form is still unproved. In this paper, we sought to deepen our understanding of the mirror conjecture in genus one by relating Gromov-Witten invariants, stable quotients invariants, and generating functions arising from the mirror. We find that stable quotients provide a natural bridge between Gromov-Witten invariants and generating functions arising on the mirror.

The moduli space of stable quotients introduced by Marian-Oprea-Pandharipande provides a natural compactification of the space of morphisms from nonsingular curves to a nonsingular projective variety and carries a natural virtual class. We show that the analogue of Givental's J-function for the resulting twisted projective invariants is described by the same mirror hypergeometric series as the corresponding Gromov-Witten invariants (which arise from the moduli space of stable maps), but without the mirror transform (in the Calabi-Yau case). This implies that the stable quotients and Gromov-Witten twisted invariants agree if there is enough "positivity", but not in all cases. As a corollary of the proof, we show that certain twisted Hurwitz numbers arising in the stable quotients theory are also described by a fundamental object associated with this hypergeometric series. We thus completely answer some of the questions posed by Marian-Oprea-Pandharipande concerning their invariants. Our results suggest a deep connection between the stable quotients invariants of complete intersections and the geometry of the mirror families. As in Gromov-Witten theory, computing Givental's J-function (essentially a generating function for genus 0 invariants with 1 marked point) is key to computing stable quotients invariants of higher genus and with more marked points; we exploit this in forthcoming papers.

- The Geometry of Stable Quotients in Genus One.
*Mathematische Annalen*, 361 (2015), no. 3–4, 943–979.

arXiv link journal link

In 2011, Marian, Oprea, and Pandharipande introduced an alternate compactification of the space of stable maps. One striking feature was that some of these spaces were smooth in genus one, while the corresponding space of stable maps was not. In this paper, we compute many features of these smooth spaces, including Betti numbers and cones of ample and effective divisors.

Stable quotient spaces provide an alternative to stable maps for compactifying spaces of maps. When the target is projective space and the domain curve has genus 1, these are smooth proper Deligne-Mumford stacks. In this paper we study the associated coarse moduli schemes. We show these schemes are projective, rationally connected and have Picard number 2. Then we give generators for the Picard group, compute the canonical divisor, and the cones of ample and effective divisors. In certain cases, we also give a closed formula for the Poincaré polynomial.

- Congruences for Modular Forms of Non-positive Weight (with Nicholas Wage and Irena Wang).
*International Journal of Number Theory*, 4 (2008), 1–13.

journal link

Ramanujan made the striking discovery that the partition function $p(n)$ satisfies a congruence $p(5k+4) \equiv 0 \pmod 5$, as well as additional congruences related to other prime numbers. In this paper we show that under some technical conditions, if $f$ is a modular form of non-positive weight then such congruences can exist for only finitely many primes.

In this paper, we consider modular forms $f(z)$ whose $q$-series expansions $\sum b(n)q^n$ have coefficients in a localized ring of algebraic integers $\mathcal{O}_{K,\nu}$. Extending results of Serre and Ono, we show that if $f$ has non-positive weight, a congruence of the form $b(\ell n+a) \equiv 0 \pmod \nu$, where $\nu$ is a place over $\ell$ in $\mathcal{O}_K$, can hold for only finitely many primes $\ell\geq 5$. To obtain this, we establish an effective bound on $\ell$ in terms of the weight and the structure of $f(z)$.

- Properties Determined by the Ihara Zeta Function of a Graph.
*Electronic Journal of Combinatorics*, 16 (2009).

journal link

Given a graph $G$, one can associate a power series $Z_G(u)$ called the Ihara zeta function. In this work, we prove that several properties of a finite graph $G$ can be recovered from its Ihara zeta function. In addition, we show that several properties cannot be recovered by producing examples of non-isomorphic finite graphs with the same Ihara zeta function.

In this paper, we show how to determine several properties of a finite graph $G$ from its Ihara zeta function $Z_G(u)$. If $G$ is connected and has minimal degree at least 2, we show how to calculate the number of vertices of $G$. To do so we use a result of Bass, and in the case that $G$ is nonbipartite, we give an elementary proof of Bass' result. We further show how to determine whether $G$ is regular, and if so, its regularity and spectrum. On the other hand, we extend work of Czarneski to give several infinite families of pairs of non-isomorphic non-regular graphs with the same Ihara zeta function. These examples demonstrate that several properties of graphs, including vertex and component numbers, are not determined by the Ihara zeta function. We end with Hashimoto's edge matrix $T$. We show that any graph $G$ with no isolated vertices can be recovered from its $T$ matrix. Since graphs with the same Ihara zeta function are exactly those with isospectral $T$ matrices, this relates again to the question of what information about $G$ can be recovered from its Ihara zeta function.
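
As a small illustration of the objects in the last abstract (not code from the paper), the sketch below computes the reciprocal of the Ihara zeta function of a simple graph in two ways: as $\det(I - uT)$ with $T$ the edge matrix on oriented edges, and through Bass's determinant formula $(1-u^2)^{m-n}\det(I - uA + u^2(D - I))$, where $n$ and $m$ are the numbers of vertices and edges. The function names, the edge-list input format, and the choice to evaluate at a single rational value of $u$ are assumptions made for this example, and the code assumes a simple graph with no loops or multiple edges.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def edge_matrix(edges):
    """Edge matrix T on the 2m oriented edges of a simple graph.

    T[e][f] = 1 when f starts where e ends and f is not the reverse of e.
    """
    oriented = [(v, w) for v, w in edges] + [(w, v) for v, w in edges]
    return [[1 if e[1] == f[0] and f != (e[1], e[0]) else 0 for f in oriented]
            for e in oriented]


def zeta_inverse_from_edges(edges, u):
    """1 / Z_G(u) computed as det(I - u T)."""
    t = edge_matrix(edges)
    size = len(t)
    return det([[int(i == j) - u * t[i][j] for j in range(size)]
                for i in range(size)])


def zeta_inverse_from_bass(edges, num_vertices, u):
    """1 / Z_G(u) via (1 - u^2)^(m - n) * det(I - uA + u^2 (D - I))."""
    adjacency = [[0] * num_vertices for _ in range(num_vertices)]
    for v, w in edges:
        adjacency[v][w] += 1
        adjacency[w][v] += 1
    degree = [sum(row) for row in adjacency]
    n, m = num_vertices, len(edges)
    matrix = [[int(i == j) * (1 + u * u * (degree[i] - 1)) - u * adjacency[i][j]
               for j in range(n)] for i in range(n)]
    return (1 - u * u) ** (m - n) * det(matrix)


# Two connected graphs with minimal degree at least 2.
triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
u = Fraction(1, 3)

for name, edges, n in [("triangle", triangle, 3), ("K4", k4, 4)]:
    print(name, zeta_inverse_from_edges(edges, u),
          zeta_inverse_from_bass(edges, n, u))
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than up to floating-point error. For the triangle both agree with $(1-u^3)^2$, reflecting its two oriented 3-cycles.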

## Works in progress

- Descendent Gromov-Witten invariants of surfaces via operators.
- The geometry of the critical loci for overparameterized neural networks.

## Notes

- Gradient descent in higher codimension.

arXiv

We consider the behavior of gradient flow and of discrete and noisy gradient descent. It is commonly noted that the addition of noise to the process of discrete gradient descent can affect the trajectory of gradient descent. In previous work, we observed such effects. There, we considered the case where the minima had codimension 1. In this note, we do some computer experiments and observe the behavior of noisy gradient descent in the more complex setting of minima of higher codimension.

- Gradient descent in some simple settings.

arXiv

In this note, we observe the behavior of gradient flow and discrete and noisy gradient descent in some simple settings. It is commonly noted that addition of noise to gradient descent can affect the trajectory of gradient descent. Here, we run some computer experiments for gradient descent on some simple functions, and observe this principle in some concrete examples.
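
The kind of experiment described in these notes can be sketched in a few lines. The example below is not taken from the notes: the function $f(x, y) = (1 + x^2)\,y^2/2$, the step size, the noise level, and the choice to inject noise only in the $y$ coordinate are all assumptions chosen for illustration. The minima of $f$ form the line $y = 0$, and the curvature across that line grows with $|x|$.

```python
import random


def grad(x, y):
    """Gradient of the illustrative function f(x, y) = (1 + x**2) * y**2 / 2."""
    return x * y * y, (1.0 + x * x) * y


def descend(x, y, lr=0.1, steps=2000, sigma=0.0, seed=0):
    """Run discrete gradient descent from (x, y).

    When sigma > 0, Gaussian noise of standard deviation sigma is added to
    the y coordinate after every step (a simplifying assumption: the noise
    acts only transversally to the line of minima y = 0).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    for _ in range(steps):
        gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
        if sigma > 0.0:
            y += sigma * rng.gauss(0.0, 1.0)
    return x, y


plain = descend(2.0, 1.0)
noisy = descend(2.0, 1.0, sigma=0.1)
print("without noise:", plain)
print("with noise:   ", noisy)
```

Without noise, $y$ shrinks geometrically and the iterate settles on the line of minima close to where it started in $x$. With noise, $y$ keeps fluctuating around $0$, so the term $x y^2$ in the gradient keeps acting and the iterate drifts along the line of minima toward $x = 0$, where the valley is widest.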