Part of my series of notes from ICLR 2019 in New Orleans.

I missed the beginning of this one and had a little trouble catching up, so these notes make even less sense than usual (especially the first bit). Sorry about that.

## The First Bit

- multiple datasets for same phenomenon but exhibiting potentially different biases
- want to learn real phenomenon without spurious correlations in dataset
- nature doesn’t shuffle examples, we do
- …so maybe we shouldn’t

- robust regression
  - **interpolation** in the convex hull of the seen training environments (minimax sketch below)
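
A minimal sketch of what I take "robust regression" to mean here: minimize the worst per-environment risk. Everything below (the toy environments, the linear model, the subgradient loop) is my own illustration, not the speaker's:

```python
import numpy as np

def robust_regression(envs, lr=0.01, steps=1000):
    """Minimax robust regression: minimize the worst per-environment
    squared loss. The solution stays inside the convex hull of the
    per-environment optima, i.e. interpolation."""
    d = envs[0][0].shape[1]
    w = np.zeros(d)
    for _ in range(steps):
        # Risk of the current w in each environment.
        risks = [np.mean((X @ w - y) ** 2) for X, y in envs]
        # Subgradient step on the worst-case environment only.
        X, y = envs[int(np.argmax(risks))]
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Two toy environments whose "true" slopes disagree.
rng = np.random.default_rng(0)
envs = []
for slope in (1.0, 2.0):
    X = rng.normal(size=(500, 1))
    y = slope * X[:, 0] + 0.1 * rng.normal(size=500)
    envs.append((X, y))
print(robust_regression(envs))  # lands between the two slopes
```

On this toy data the solution settles near 1.5, where the two environments' risks balance: always inside the convex hull of the per-environment optima, never outside it.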

- but is interpolation enough?
- how to learn stable properties across environments?
- invariant regression
  - **extrapolation** rather than just interpolation (my attempt at the two objectives is below)
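
My reconstruction of the contrast, with $R_e$ the risk in environment $e$ and $\Phi$ the learned representation (the second objective follows the Invariant Risk Minimization formulation, which I believe is what was presented):

```latex
% robust regression: best worst case over the seen environments
\min_{w}\; \max_{e \in \mathcal{E}_{\mathrm{train}}} R_e(w)

% invariant regression: one classifier w that is simultaneously
% optimal in every environment, on top of the representation \Phi
\min_{\Phi,\, w}\; \sum_{e \in \mathcal{E}_{\mathrm{train}}} R_e(w \circ \Phi)
\quad \text{s.t.} \quad
w \in \operatorname*{arg\,min}_{\bar{w}} R_e(\bar{w} \circ \Phi)
\quad \forall e \in \mathcal{E}_{\mathrm{train}}
```

The first can only trade off the environments it has seen (interpolation); the second asks for a property that should keep holding in unseen environments (extrapolation).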

- design a function family that is insensitive to spurious correlations

- **invariant representations** – find the relevant variables so that the regression is invariant

- some related work
  - invariance and causation – properties that are invariant after intervention
  - adversarial domain adaptation

## Invariant Regularization

- the method corresponds to inserting a frozen domain adaptation layer (sketch below)
- idea: make the representation good enough that I don’t have to learn a different adaptation layer for each domain
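
This sounds like the Invariant Risk Minimization penalty (IRMv1) from Arjovsky et al., where the frozen "layer" is a dummy scalar classifier fixed at w = 1.0. The sketch below is my PyTorch paraphrase of that penalty, not code from the talk; `loss_fn`, `model`, the batching, and the `lam` default are stand-ins:

```python
import torch

def irm_penalty(loss_fn, logits, y):
    """Squared gradient of the risk w.r.t. a frozen scalar classifier.
    If the representation were truly invariant, rescaling the classifier
    per environment couldn't help, so this gradient would be zero."""
    w = torch.tensor(1.0, requires_grad=True)  # the frozen "adaptation layer"
    loss = loss_fn(logits * w, y)
    grad = torch.autograd.grad(loss, [w], create_graph=True)[0]
    return grad ** 2

def total_loss(model, env_batches, loss_fn, lam=100.0):
    # lam is a made-up default; the real weighting/schedule matters in practice
    erm, pen = 0.0, 0.0
    for x, y in env_batches:  # one (x, y) batch per training environment
        logits = model(x)
        erm = erm + loss_fn(logits, y)
        pen = pen + irm_penalty(loss_fn, logits, y)
    return erm + lam * pen
```

Penalizing that gradient across environments pushes the representation toward one where a single fixed classifier is already optimal everywhere, which is exactly the "don’t learn a per-domain adaptation layer" idea above.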

- they show this works on “colored MNIST”
  - two datasets, each engineered to contain misleading information, but in different ways
  - invariant regularization helps the model perform well even when the test set is perversely chosen to be quite different
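
For concreteness, here is roughly how the Colored MNIST environments are built. The specifics (binary digit task, 25% label noise, the meaning of `e`) come from the IRM paper rather than from my notes, and `make_colored_env` is my own name:

```python
import numpy as np

def make_colored_env(images, digits, e, label_noise=0.25, rng=None):
    """Build one Colored MNIST environment (after the IRM paper).
    images: (n, 28, 28) array; digits: (n,) digit labels.
    e is the probability that color disagrees with the label;
    training envs use small e (color is a strong spurious cue),
    the test env uses e around 0.9 (the cue is reversed)."""
    rng = rng or np.random.default_rng(0)
    labels = (digits < 5).astype(int)                  # binary task
    labels ^= (rng.random(len(labels)) < label_noise)  # noisy labels
    colors = labels ^ (rng.random(len(labels)) < e)    # spurious color
    colored = np.zeros((len(images), 2, 28, 28), images.dtype)
    colored[np.arange(len(images)), colors] = images   # red/green channel
    return colored, labels
```

The training environments use e = 0.1 and e = 0.2, so color predicts the label better than digit shape can (shape tops out at 75% because of the label noise); the test environment uses e = 0.9, flipping the spurious cue.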

- issues scaling it up
  - numerical issues
  - realizable problems work differently (colored MNIST isn’t realizable: the injected label noise means no labelling function is deterministic)

- phenomenon (pre-label) vs. interpretation (post-label, i.e. annotation)
  - supervised – designed to be realizable; labelling is supposed to be deterministic
  - unsupervised – the label comes from nature, so not necessarily realizable

- to sum up…