map of machine learning:
- machine learning models are probability models
- specifying a random variable, then fitting a relevant distribution to it (see the sketch after this list)
- causal models as the future of ML?
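A minimal sketch of the "specify a random variable, fit a distribution" view; the Gaussian choice and toy data are my own illustration, not anything from the notes:

```python
import numpy as np

# View the model as a probability model: posit X ~ N(mu, sigma^2) as the random
# variable, then fit the distribution to observations by maximum likelihood.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # toy observations

mu_hat = data.mean()      # MLE of the mean
sigma_hat = data.std()    # MLE of the standard deviation

def log_density(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

print(mu_hat, sigma_hat, log_density(np.array([1.8, 2.2]), mu_hat, sigma_hat))
```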
AI and algorithms
- gnns:
- permutation invariant functions of features of nodes and their neighbourhoods
- training gnns to execute classical algorithms
- in particular: infinite data
- clearly specified generating function
- (seemingly easy — and yet)
- What Can Neural Networks Reason About?
- gnns align with dynamic programming (Bellman-Ford sketch after this list)
- distances are node features
- minimizing is aggregation
- relaxation is the summing step: add the edge weight to the neighbour's distance
- Neural Execution of Graph Algorithms
- easy to overfit
- shared layer that iterates (arbitrary input sizes)
- encode-process-decode (sketch after this list)
- using a relevant aggregation (sum vs max)
- step-wise supervision
- Neural Turing Machine?
- how is this useful?
- algorithmically aligned models accelerate science
- approximating NP-hard problems
- A Generalist Neural Algorithmic Learner
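A minimal sketch of the dynamic-programming alignment mentioned above (plain numpy, my own toy graph, not a learned model): one Bellman-Ford iteration written so that the relaxation is the per-edge message and the min is the aggregation.

```python
import numpy as np

# One Bellman-Ford iteration as message passing:
#   message(u -> v) = d[u] + w(u, v)          (the "relaxation" / summing step)
#   d_new[v]        = min over incoming msgs  (the aggregation)
# This is the sense in which a GNN layer with min/max aggregation
# aligns with the dynamic-programming update.

INF = np.inf

def bellman_ford_step(dist, edges):
    """edges: list of (u, v, w). dist: current distance estimate per node."""
    new_dist = dist.copy()
    for u, v, w in edges:
        # message from u to v, then min-aggregate over all incoming messages
        new_dist[v] = min(new_dist[v], dist[u] + w)
    return new_dist

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = np.array([0.0, INF, INF, INF])   # distances are the node features
for _ in range(3):                      # iterate the shared update
    dist = bellman_ford_step(dist, edges)
print(dist)   # [0. 3. 1. 8.]
```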
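A hedged sketch of the encode-process-decode pattern with step-wise supervision, assuming PyTorch; the module sizes, the max-aggregation loop, and the MSE loss are illustrative choices of mine, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class Processor(nn.Module):
    """Shared message-passing layer that is iterated at every algorithm step."""
    def __init__(self, h):
        super().__init__()
        self.msg = nn.Linear(2 * h + 1, h)  # message from (h_u, h_v, edge weight)
        self.upd = nn.Linear(2 * h, h)      # update from (h_v, aggregated message)

    def forward(self, h_nodes, edges, w):
        # edges: (E, 2) long tensor of (u, v) pairs; w: (E, 1) edge weights
        u, v = edges[:, 0], edges[:, 1]
        m = torch.relu(self.msg(torch.cat([h_nodes[u], h_nodes[v], w], dim=-1)))
        # max-aggregation of incoming messages (matches the algorithm's min/max
        # flavour); the python loop keeps the sketch simple, real code uses scatter ops
        rows = []
        for node in range(h_nodes.size(0)):
            mask = v == node
            rows.append(m[mask].max(dim=0).values if mask.any()
                        else torch.zeros(m.size(-1)))
        agg = torch.stack(rows)
        return torch.relu(self.upd(torch.cat([h_nodes, agg], dim=-1)))

h = 16
encoder = nn.Linear(1, h)   # encode raw node features (e.g. current distance estimate)
processor = Processor(h)    # one shared layer, reused for any graph size / step count
decoder = nn.Linear(h, 1)   # decode each step's prediction

def rollout_loss(x0, edges, w, targets):
    """targets[t] holds the algorithm's ground-truth state after step t,
    so the loss supervises every intermediate step, not just the final output."""
    h_nodes, loss = encoder(x0), 0.0
    for t in range(len(targets)):
        h_nodes = processor(h_nodes, edges, w)
        loss = loss + nn.functional.mse_loss(decoder(h_nodes), targets[t])
    return loss
```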
bayesian machine learning
- we need an empirical approximation of risk
- that doesn't require knowing the data distribution D (see the sketch after this list)
- problems occur on two axes
- assumptions/knowledge, and data
- max data: we have everything we would ever want to predict on (a lookup table suffices)
- L_S(A(S)) := the empirical risk, on the sample S, of the hypothesis A(S)
- returned by an algorithm A that generates hypotheses h from a finite class H
- SGD changes the hypothesis finder (the algorithm A)
- data augmentation changes the sample S
- parametrize knowledge: p(f) = ∫ p(f∣x, y) p(x, y) dx dy (Monte Carlo sketch after this list)
- knowledge about the points
- prediction: p(g) = ∫ p(f∣x) dp(x)
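A minimal sketch of the empirical-risk idea (toy data and a toy hypothesis of my own choosing): the true risk is an expectation over the unknown D, so it is replaced by an average over the sample S.

```python
import numpy as np

# True risk R(h) = E_{(x,y)~D}[ loss(h(x), y) ] needs the unknown distribution D.
# The empirical risk replaces the expectation with an average over the sample S:
#   L_S(h) = (1/|S|) * sum_i loss(h(x_i), y_i)

def empirical_risk(h, S, loss):
    xs, ys = S
    return np.mean([loss(h(x), y) for x, y in zip(xs, ys)])

# toy example: a linear predictor under squared loss
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=200)
ys = 3.0 * xs + rng.normal(scale=0.1, size=200)   # sample S drawn from some D

h = lambda x: 2.5 * x                             # a candidate hypothesis
sq_loss = lambda pred, y: (pred - y) ** 2

print(empirical_risk(h, (xs, ys), sq_loss))       # estimate of the true risk of h
```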
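The two integrals above are expectations over the data distribution; when D can only be sampled, they are approximated by Monte Carlo averages over the observed points. A small sketch under that reading (my own framing of the notes' notation, with a toy conditional density):

```python
import numpy as np

# p(f) = ∫ p(f∣x, y) p(x, y) dx dy  ≈  (1/n) Σ_i p(f ∣ x_i, y_i)
# p(g) = ∫ p(f∣x) dp(x)             ≈  (1/n) Σ_i p(f ∣ x_i)

def marginalise(conditional_density, samples, f_grid):
    """Monte Carlo estimate of ∫ p(f∣s) dP(s) on a grid of f values, from draws s_i ~ P."""
    return np.mean([conditional_density(f_grid, s) for s in samples], axis=0)

# toy usage: p(f ∣ x) is a Gaussian centred at x; average it over sampled x's
rng = np.random.default_rng(0)
xs = rng.normal(size=100)
cond = lambda f, x: np.exp(-0.5 * (f - x) ** 2) / np.sqrt(2 * np.pi)
f_grid = np.linspace(-3, 3, 61)
p_f = marginalise(cond, xs, f_grid)               # mixture over the data distribution
print(p_f.sum() * (f_grid[1] - f_grid[0]))        # roughly integrates to 1 on the grid
```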
Eric Michaud and grokking
- to generalize on an arithmetic dataset, networks must perform that computation somehow internally
- LU mechanism: train loss vs weight norm is L-shaped, test loss is U-shaped; the mismatch leads to grokking
- weight norms
- the network first finds a fitting solution somewhere in parameter (weight) space, and only then finds the goldilocks zone of weight norm (weight-norm sketch after this list)
- Omnigrok: Grokking Beyond Algorithmic Data
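A minimal sketch of the weight-norm probe behind the goldilocks-zone point, assuming PyTorch; the model and data are toy stand-ins, so this only shows the procedure (sweep the global weight norm, compare train vs test loss), not the L/U curves from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x_train, y_train = torch.randn(256, 10), torch.randn(256, 1)   # toy stand-in data
x_test, y_test = torch.randn(256, 10), torch.randn(256, 1)

def weight_norm(m):
    """Global L2 norm over all parameters."""
    return torch.sqrt(sum((p ** 2).sum() for p in m.parameters()))

@torch.no_grad()
def loss_at_norm(m, alpha, x, y):
    """Rescale all parameters so the global weight norm equals alpha, evaluate, undo."""
    scale = alpha / weight_norm(m)
    for p in m.parameters():
        p.mul_(scale)
    loss = nn.functional.mse_loss(m(x), y).item()
    for p in m.parameters():
        p.div_(scale)   # restore the original weights
    return loss

# sweep the weight norm: on grokking setups the train losses trace an L and the
# test losses a U, with the goldilocks zone at the bottom of the U
for alpha in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(alpha,
          loss_at_norm(model, alpha, x_train, y_train),
          loss_at_norm(model, alpha, x_test, y_test))
```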