map of machine learning:

  • machine learning models are probability models
    • specifying a random variable, and then fitting a relevant distribution (sketch after this list)
  • causal models as the future of ML?
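
  a minimal sketch (not from the notes) of the "specify a random variable, then fit a relevant distribution" view: model coin flips as a Bernoulli random variable and fit its parameter by maximum likelihood. the data here is synthetic and purely illustrative.

    import numpy as np

    # specify the random variable: flips ~ Bernoulli(p), p unknown
    rng = np.random.default_rng(0)
    flips = rng.binomial(n=1, p=0.7, size=1000)   # stand-in for observed data

    # fit the distribution: the MLE of a Bernoulli parameter is the sample mean
    p_hat = flips.mean()
    print(f"fitted p: {p_hat:.3f}")

    # prediction is then a query against the fitted distribution
    print(f"P(next flip = 1) is approximately {p_hat:.3f}")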

AI and algorithms

  • gnns:
    • permutation-invariant functions of features of nodes and their neighbourhoods (illustrated by the Bellman-Ford sketch after this list)
  • training gnns to execute classical algorithms
    • in particular: effectively infinite training data (examples can be generated on demand)
    • clearly specified generating function (the algorithm itself)
    • (seemingly easy, and yet it is not)
  • What can neural networks reason about?
  • gnns align with dynamic programming (sketched after this list)
    • distances are node features
    • minimizing over neighbours is the aggregation
    • the relaxation step dist[u] + w(u, v) is the message (a sum of node feature and edge weight)
  • Neural execution of graph algorithms
    • easy to overfit
    • shared layer that iterates (arbitrary input sizes)
    • encode-process-decode (skeleton sketched after this list)
    • using a relevant aggregation (sum vs max)
    • step-wise supervision
      • checking your work
  • Neural Turing Machine?
  • how is this useful?
    • algorithmically aligned models accelerate science
    • approximating np-hard problems
  • A generalist neural algorithmic learner
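
  a sketch of the gnn / dynamic-programming alignment above, using Bellman-Ford: the node feature is the current distance estimate, the message along an edge is the relaxation dist[u] + w, and the aggregation is a min over incoming messages (a permutation-invariant function). this is plain python, not a learned model; the point is only that a gnn with a min/max-style aggregation can imitate this update. the example graph is made up.

    import numpy as np

    def bellman_ford_step(dist, edges):
        """one round of message passing; edges is a list of (u, v, w)."""
        new_dist = dist.copy()
        for u, v, w in edges:
            # message along (u, v): relax the estimate for v through u
            new_dist[v] = min(new_dist[v], dist[u] + w)   # min = permutation-invariant aggregation
        return new_dist

    edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
    dist = np.array([0.0, np.inf, np.inf, np.inf])        # source = node 0

    for _ in range(3):                                     # |V| - 1 rounds
        dist = bellman_ford_step(dist, edges)
    print(dist)                                            # [0. 3. 1. 8.]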
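
  a hypothetical pytorch skeleton of the encode-process-decode pattern named in the "Neural execution of graph algorithms" bullets: one shared processor layer is iterated (so arbitrary input sizes and step counts are handled), aggregation is a max over neighbours, and the decoder runs after every step so that step-wise supervision can check the intermediate work. the layer sizes, names, and dense adjacency representation are illustrative choices of mine, not the paper's.

    import torch
    import torch.nn as nn

    class Processor(nn.Module):
        """one shared message-passing layer, applied repeatedly."""
        def __init__(self, hidden):
            super().__init__()
            self.msg = nn.Linear(2 * hidden, hidden)
            self.upd = nn.Linear(2 * hidden, hidden)

        def forward(self, h, adj):                        # h: [n, hidden], adj: [n, n]
            n = h.size(0)
            senders = h.unsqueeze(0).expand(n, n, -1)     # entry [i, j] is h[j]
            receivers = h.unsqueeze(1).expand(n, n, -1)   # entry [i, j] is h[i]
            m = torch.relu(self.msg(torch.cat([senders, receivers], dim=-1)))
            m = m.masked_fill(adj.unsqueeze(-1) == 0, float("-inf"))
            agg = m.max(dim=1).values                     # max aggregation over neighbours
            agg = torch.where(torch.isinf(agg), torch.zeros_like(agg), agg)
            return torch.relu(self.upd(torch.cat([h, agg], dim=-1)))

    class Executor(nn.Module):
        def __init__(self, in_dim=1, hidden=32):
            super().__init__()
            self.encode = nn.Linear(in_dim, hidden)
            self.process = Processor(hidden)              # shared weights across iterations
            self.decode = nn.Linear(hidden, 1)

        def forward(self, x, adj, n_steps):
            h = torch.relu(self.encode(x))
            preds = []
            for _ in range(n_steps):
                h = self.process(h, adj)
                preds.append(self.decode(h))              # decode every step -> step-wise supervision
            return preds

    # step-wise supervision then compares each intermediate prediction with the
    # algorithm's intermediate state ("checking your work"), e.g.
    #   loss = sum(loss_fn(p, t) for p, t in zip(preds, step_targets))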

bayesian machine learning

  • we need an empirical approximation of risk (sketch after this list)
    • one that doesn’t require knowing the data distribution
  • learning problems vary along two axes
    • assumptions/knowledge, and data
    • max data: having everything we would ever want to predict on (a lookup table suffices)
    • an algorithm that generates hypotheses over a finite set
  • SGD changes the hypothesis finder
  • data augmentation changes the sample
  • parametrize knowledge
  • knowledge about the points
  • prediction of
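
  a sketch of the empirical-risk bullet above: the true risk is an expectation of the loss over the (unknown) data distribution, and the empirical risk replaces it with an average over a sample, so it never needs the distribution itself. the quadratic loss, the hypothesis, and the synthetic data below are only illustrative.

    import numpy as np

    def empirical_risk(h, xs, ys, loss):
        """average loss over the sample: (1/n) * sum of loss(h(x_i), y_i)."""
        return np.mean([loss(h(x), y) for x, y in zip(xs, ys)])

    # synthetic data; the "true" distribution is known here only to generate it
    rng = np.random.default_rng(0)
    xs = rng.normal(size=500)
    ys = 2.0 * xs + rng.normal(scale=0.1, size=500)

    h = lambda x: 2.0 * x                           # a candidate hypothesis
    sq_loss = lambda pred, y: (pred - y) ** 2

    print(empirical_risk(h, xs, ys, sq_loss))       # roughly the noise variance, 0.01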

eric michaud and grokking

  • to generalize on an arithmetic dataset, networks must perform that computation somehow internally (dataset sketch after this list)
  • LU mechanism — leads to grokking
    • weight norms
    • the network first finds some solution in parameter (weight) space, and then drifts into a goldilocks zone of weight norm where it generalizes
  • Omnigrok: Grokking Beyond Algorithmic Data
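
  a minimal sketch of the kind of arithmetic dataset these notes refer to, here modular addition: the full table of (a, b, (a + b) mod p) is enumerable, a fraction of it is used for training, and the held-out rest measures generalization. the modulus and split fraction are illustrative, not taken from any particular paper.

    import numpy as np

    p = 97
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p

    rng = np.random.default_rng(0)
    perm = rng.permutation(len(pairs))
    n_train = int(0.3 * len(pairs))                 # e.g. 30% of the full table
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    x_train, y_train = pairs[train_idx], labels[train_idx]
    x_test, y_test = pairs[test_idx], labels[test_idx]

    # to get the held-out pairs right, a network cannot just memorise the
    # training rows; it has to implement (something like) modular addition
    # internally, which is the point of the first bullet above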