dron's garden!

❯

❯

Mixture of Experts

Mixture of Experts

Last modified: Nov 26, 20241 min read

definition
AI

duplicating the MLPs and route representations via a “routing” layer
you save compute relative to having a bigger MLP
which gives you better inference properties (better long context, tend to do better multilingually)
something maybe about initialization behaviour?
- overtrained base models are good targets for duplication (upcycling https://arxiv.org/abs/2410.07524)

Graph View

Backlinks

No backlinks found

Created with Quartz v4.4.0 © 2025

GitHub
Discord Community