Language Modeling for Contextual Representation Learning

09 Nov 2018

Mark Neumann (AI2), South England Natural Language Processing Meetup. Slides are here.

1. The beginning: ELMo
2. Others pile on
   - variation
3. About those factors of variation
   - conclusions
   - layers
4. AllenNLP, the advert

Postscript

After this meetup, I had an interesting chat with some people at work about how the sharp syntax/semantics distinction is something that linguists (basically Chomsky, natch) introduced, somewhat artificially, to try to understand language. It may in fact have little or nothing to do with how language actually works. Of course this doesn't detract from the distinction's utility for the study of language, but it does raise the question of whether it is a useful piece of knowledge for neural networks.

And in particular, while a connection between some aspect of a particular model and its ability to solve syntax vs. semantics tasks is an appealing piece of evidence that the model really understands language, we should take this with a grain of salt, because who knows whether even human linguists really understand language! Though it is fascinating to think that a language model trained on a shit ton of unstructured text might have happened upon a "mental model" of how language works similar to Chomsky's.

Finally, the connection between depth in the network and the amount of context a layer draws on actually kinda supports the notion that the distinction between syntax and semantics is not so sharp after all, if you think about how it fleshes out the earlier claim that "shallow is good at syntax, deep is good at semantics". In fact, there does seem to be a spectrum from small-context, syntax-y stuff to large-context, semantics-y stuff, and we can use the layers of a deep network to step from one end to the other, as the sketch below illustrates.
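One concrete way this spectrum gets exposed to downstream tasks is ELMo's "scalar mix": a softmax-weighted sum of the layer representations, scaled by a learned gamma, so each task can learn to lean on shallow (syntax-y) or deep (semantics-y) layers as it needs. Here's a minimal PyTorch sketch of that idea; the class name, shapes, and sizes are illustrative, not the talk's or AllenNLP's actual implementation.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """ELMo-style learned weighting over stacked layer representations."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One scalar per layer (softmax-normalised at use time) plus a
        # global scale gamma, as in the ELMo paper's scalar mix.
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layers: torch.Tensor) -> torch.Tensor:
        # layers: (num_layers, batch, seq_len, dim)
        s = torch.softmax(self.weights, dim=0)
        # Weighted sum over the layer axis: a task that wants syntax-y
        # features can put mass on shallow layers; a semantics-y task
        # can put mass on deep ones.
        return self.gamma * torch.einsum("l,lbsd->bsd", s, layers)

# Toy usage with made-up sizes: 3 layers of 8-dim vectors for 5 tokens.
mix = ScalarMix(num_layers=3)
reps = torch.randn(3, 2, 5, 8)  # (layers, batch, tokens, dim)
print(mix(reps).shape)          # torch.Size([2, 5, 8])
```

The nice property is that the mixing weights are trained with the downstream task, so where a task's learned weights concentrate is itself a clue about where on the syntax-to-semantics spectrum that task sits.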

Not sure whether any of this makes sense (even to the people who were part of the conversation), but I find this stuff quite interesting anyway!