map of machine learning:
- machine learning models are probability models
- specifying a random variable, then fitting a relevant distribution to it (see the sketch after this list)
- causal models as the future of ML?
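A minimal sketch of the "specify a random variable, fit a distribution" view; the Gaussian choice and toy data are my own illustration, not anything from the notes:

```python
import numpy as np

# View the model as a probability model: posit X ~ N(mu, sigma^2) as the random
# variable, then fit the distribution to observations by maximum likelihood.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)  # toy observations

mu_hat = data.mean()      # MLE of the mean
sigma_hat = data.std()    # MLE of the standard deviation

def log_density(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

print(mu_hat, sigma_hat, log_density(np.array([1.8, 2.2]), mu_hat, sigma_hat))
```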
AI and algorithms
- gnns:
- permutation invariant functions of features of nodes and their neighbourhoods
- training gnns to execute classical algorithms
- in particular: infinite data
- clearly specified generating function
- (seemingly easy — and yet)
- What Can Neural Networks Reason About?
- gnns align with dynamic programming (Bellman-Ford sketch after this list)
- distances are node features
- minimizing is aggregation
- relaxation is the summing step: add the edge weight to the neighbour's distance
- Neural Execution of Graph Algorithms
- easy to overfit
- shared layer that iterates (arbitrary input sizes)
- encode-process-decode (sketch after this list)
- using a relevant aggregation (sum vs max)
- step-wise supervision
- Neural Turing Machine?
- how is this useful?
- algorithmically aligned models accelerate science
- approximating NP-hard problems
- A Generalist Neural Algorithmic Learner
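A minimal sketch of the dynamic-programming alignment mentioned above (plain numpy, my own toy graph, not a learned model): one Bellman-Ford iteration written so that the relaxation is the per-edge message and the min is the aggregation.

```python
import numpy as np

# One Bellman-Ford iteration as message passing:
#   message(u -> v) = d[u] + w(u, v)          (the "relaxation" / summing step)
#   d_new[v]        = min over incoming msgs  (the aggregation)
# This is the sense in which a GNN layer with min/max aggregation
# aligns with the dynamic-programming update.

INF = np.inf

def bellman_ford_step(dist, edges):
    """edges: list of (u, v, w). dist: current distance estimate per node."""
    new_dist = dist.copy()
    for u, v, w in edges:
        # message from u to v, then min-aggregate over all incoming messages
        new_dist[v] = min(new_dist[v], dist[u] + w)
    return new_dist

edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = np.array([0.0, INF, INF, INF])   # distances are the node features
for _ in range(3):                      # iterate the shared update
    dist = bellman_ford_step(dist, edges)
print(dist)   # [0. 3. 1. 8.]
```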
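A hedged sketch of the encode-process-decode pattern with step-wise supervision, assuming PyTorch; the module sizes, the max-aggregation loop, and the MSE loss are illustrative choices of mine, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class Processor(nn.Module):
    """Shared message-passing layer that is iterated at every algorithm step."""
    def __init__(self, h):
        super().__init__()
        self.msg = nn.Linear(2 * h + 1, h)  # message from (h_u, h_v, edge weight)
        self.upd = nn.Linear(2 * h, h)      # update from (h_v, aggregated message)

    def forward(self, h_nodes, edges, w):
        # edges: (E, 2) long tensor of (u, v) pairs; w: (E, 1) edge weights
        u, v = edges[:, 0], edges[:, 1]
        m = torch.relu(self.msg(torch.cat([h_nodes[u], h_nodes[v], w], dim=-1)))
        # max-aggregation of incoming messages (matches the algorithm's min/max
        # flavour); the python loop keeps the sketch simple, real code uses scatter ops
        rows = []
        for node in range(h_nodes.size(0)):
            mask = v == node
            rows.append(m[mask].max(dim=0).values if mask.any()
                        else torch.zeros(m.size(-1)))
        agg = torch.stack(rows)
        return torch.relu(self.upd(torch.cat([h_nodes, agg], dim=-1)))

h = 16
encoder = nn.Linear(1, h)   # encode raw node features (e.g. current distance estimate)
processor = Processor(h)    # one shared layer, reused for any graph size / step count
decoder = nn.Linear(h, 1)   # decode each step's prediction

def rollout_loss(x0, edges, w, targets):
    """targets[t] holds the algorithm's ground-truth state after step t,
    so the loss supervises every intermediate step, not just the final output."""
    h_nodes, loss = encoder(x0), 0.0
    for t in range(len(targets)):
        h_nodes = processor(h_nodes, edges, w)
        loss = loss + nn.functional.mse_loss(decoder(h_nodes), targets[t])
    return loss
```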
bayesian machine learning
- we need an empirical approximation of risk
- that doesn't require knowing the data distribution D (see the sketch after this list)
- problems occur on two axes
- assumptions/knowledge, and data
- max data: we have everything we would ever want to predict on (a lookup table suffices)
- L_S(A(S)) := the empirical risk, on the sample S, of the hypothesis A(S)
- returned by an algorithm A that generates hypotheses h from a finite class H
- SGD changes the hypothesis finder (the algorithm A)
- data augmentation changes the sample S
- parametrize knowledge: p(f) = ∫ p(f∣x, y) p(x, y) dx dy (Monte Carlo sketch after this list)
- knowledge about the points
- prediction: p(g) = ∫ p(f∣x) dp(x)
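A minimal sketch of the empirical-risk idea (toy data and a toy hypothesis of my own choosing): the true risk is an expectation over the unknown D, so it is replaced by an average over the sample S.

```python
import numpy as np

# True risk R(h) = E_{(x,y)~D}[ loss(h(x), y) ] needs the unknown distribution D.
# The empirical risk replaces the expectation with an average over the sample S:
#   L_S(h) = (1/|S|) * sum_i loss(h(x_i), y_i)

def empirical_risk(h, S, loss):
    xs, ys = S
    return np.mean([loss(h(x), y) for x, y in zip(xs, ys)])

# toy example: a linear predictor under squared loss
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=200)
ys = 3.0 * xs + rng.normal(scale=0.1, size=200)   # sample S drawn from some D

h = lambda x: 2.5 * x                             # a candidate hypothesis
sq_loss = lambda pred, y: (pred - y) ** 2

print(empirical_risk(h, (xs, ys), sq_loss))       # estimate of the true risk of h
```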
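The two integrals above are expectations over the data distribution; when D can only be sampled, they are approximated by Monte Carlo averages over the observed points. A small sketch under that reading (my own framing of the notes' notation, with a toy conditional density):

```python
import numpy as np

# p(f) = ∫ p(f∣x, y) p(x, y) dx dy  ≈  (1/n) Σ_i p(f ∣ x_i, y_i)
# p(g) = ∫ p(f∣x) dp(x)             ≈  (1/n) Σ_i p(f ∣ x_i)

def marginalise(conditional_density, samples, f_grid):
    """Monte Carlo estimate of ∫ p(f∣s) dP(s) on a grid of f values, from draws s_i ~ P."""
    return np.mean([conditional_density(f_grid, s) for s in samples], axis=0)

# toy usage: p(f ∣ x) is a Gaussian centred at x; average it over sampled x's
rng = np.random.default_rng(0)
xs = rng.normal(size=100)
cond = lambda f, x: np.exp(-0.5 * (f - x) ** 2) / np.sqrt(2 * np.pi)
f_grid = np.linspace(-3, 3, 61)
p_f = marginalise(cond, xs, f_grid)               # mixture over the data distribution
print(p_f.sum() * (f_grid[1] - f_grid[0]))        # roughly integrates to 1 on the grid
```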
Eric Michaud and grokking
- to generalize on an arithmetic dataset, networks must perform that computation somehow internally
- LU mechanism: train loss vs weight norm is L-shaped, test loss is U-shaped; the mismatch leads to grokking
- weight norms
- the network first finds a fitting solution somewhere in parameter (weight) space, and only then finds the goldilocks zone of weight norm (weight-norm sketch after this list)
- Omnigrok: Grokking Beyond Algorithmic Data
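A minimal sketch of the weight-norm probe behind the goldilocks-zone point, assuming PyTorch; the model and data are toy stand-ins, so this only shows the procedure (sweep the global weight norm, compare train vs test loss), not the L/U curves from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
x_train, y_train = torch.randn(256, 10), torch.randn(256, 1)   # toy stand-in data
x_test, y_test = torch.randn(256, 10), torch.randn(256, 1)

def weight_norm(m):
    """Global L2 norm over all parameters."""
    return torch.sqrt(sum((p ** 2).sum() for p in m.parameters()))

@torch.no_grad()
def loss_at_norm(m, alpha, x, y):
    """Rescale all parameters so the global weight norm equals alpha, evaluate, undo."""
    scale = alpha / weight_norm(m)
    for p in m.parameters():
        p.mul_(scale)
    loss = nn.functional.mse_loss(m(x), y).item()
    for p in m.parameters():
        p.div_(scale)   # restore the original weights
    return loss

# sweep the weight norm: on grokking setups the train losses trace an L and the
# test losses a U, with the goldilocks zone at the bottom of the U
for alpha in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(alpha,
          loss_at_norm(model, alpha, x_train, y_train),
          loss_at_norm(model, alpha, x_test, y_test))
```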