Mirella Lapata: Learning Natural Language Interfaces with Neural Models

Part of my series of notes from ICLR 2019 in New Orleans. This was a fantastic talk, and not just because it included this SNL clip on Alexa.

Introduction

want to interact with computers in human language
semantic parsing
- natural language => machine-executable language
- don’t really care what that language is – SQL, lambda calculus, whatever
applications: question answering, digital assistants, etc.
- “They all have female names, which I object to. Why not Bob… I don’t know”
challenges:
- match natural and artificial language
- well-formed machine language (structured prediction)
- model coverage

[Image: challenges]

seq2seq model to map natural language input to logical form
- encoder-decoder, attention mechanisms, the works
but “we are natural language people and we do not like sequences”
use seq2tree instead
- decoder still predicts tokens sequentially, but it can emit non-terminal nodes; the information decoded so far at a non-terminal is fed to a child LSTM, which decodes that subtree
- training and inference are standard
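To make the tree-shaped output concrete, here is a minimal sketch of the data side of seq2tree only (no LSTMs). It assumes the logical form is given as a nested list and that subtrees are marked with an assumed `<n>` placeholder token; the function names are hypothetical and not from the talk. Each level is an ordinary token sequence, which is why training and inference can stay standard.

```python
NONTERMINAL = "<n>"  # assumed placeholder token marking "a subtree goes here"


def to_level_sequences(tree):
    """Flatten a nested-list logical form into per-node token sequences.

    Sequences come out breadth-first: the root first, then one sequence
    for every <n> placeholder, in the order the placeholders were emitted.
    """
    sequences = []
    queue = [tree]
    while queue:
        node = queue.pop(0)
        tokens = []
        for child in node:
            if isinstance(child, list):
                tokens.append(NONTERMINAL)  # defer the subtree to a child decoder
                queue.append(child)
            else:
                tokens.append(child)
        sequences.append(tokens)
    return sequences


def from_level_sequences(sequences):
    """Inverse of to_level_sequences: rebuild the nested list."""
    remaining = iter(sequences)
    root = list(next(remaining))
    queue = [root]
    while queue:
        node = queue.pop(0)
        for i, token in enumerate(node):
            if token == NONTERMINAL:
                child = list(next(remaining))
                node[i] = child
                queue.append(child)
    return root


example = ["lambda", "$0", "e",
           ["and", ["flight", "$0"], ["from", "$0", "dallas:ci"]]]
for level in to_level_sequences(example):
    print(" ".join(level))
```

In a real model each `<n>` would hand its hidden state to the child decoder; here the queue only records which subtree to expand next.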

[Image: seq2tree]

“coarse-to-fine” decoding
don’t predict logical form immediately
predict meaning sketch – abstraction of logical form
- sketches are shared for examples with same basic meaning
- (how does annotation work for this? is it much more difficult?)
where do the templates come from?
- could learn… probably overkill though
- from lambda expression / code where we know what we want to remove (variables etc.)
experiments – note nothing is specific to particular meaning representation
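A rough illustration of deriving a sketch from a lambda expression by stripping out the parts we know we want to remove. This is a simplification under my own assumptions, not the talk's exact procedure: variables are assumed to look like `$0`, entity constants like `dallas:ci`, and the `VAR` / `CONST` placeholders and the `sketch` function are hypothetical.

```python
import re

VARIABLE = re.compile(r"^\$\d+$")        # assumed variable syntax, e.g. $0
CONSTANT = re.compile(r"^[\w.]+:\w+$")   # assumed typed constant, e.g. dallas:ci


def sketch(logical_form: str) -> str:
    """Abstract a logical form by masking variables and constants."""
    tokens = logical_form.replace("(", " ( ").replace(")", " ) ").split()
    abstracted = []
    for token in tokens:
        if VARIABLE.match(token):
            abstracted.append("VAR")
        elif CONSTANT.match(token):
            abstracted.append("CONST")
        else:
            abstracted.append(token)  # keep predicates, operators, brackets
    return " ".join(abstracted)


dallas = "(lambda $0 e (and (flight $0) (from $0 dallas:ci)))"
boston = "(lambda $1 e (and (flight $1) (from $1 boston:ci)))"
print(sketch(dallas))
print(sketch(dallas) == sketch(boston))
```

The two queries differ in their variable name and city but map to the same sketch, which is the sense in which sketches are shared across examples with the same basic meaning.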

[Image: coarse-to-fine decoding]

humans will always say unpredictable stuff
query paraphrasing framework – not a total solution, but it can make things more robust
train paraphrase model jointly with QA model
aside: where do the paraphrases come from?

architecture
- paraphrase scoring – is this a good paraphrase?
- QA model – wants good paraphrase and good answer
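One way to read the two components together, as a sketch: weight each paraphrase by its normalized score, then mix the QA model's answer distributions with those weights, so an answer only scores well if it comes from a good paraphrase. The marginalization form, the function names, and all the toy numbers below are my assumptions for illustration, not figures from the talk.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn raw paraphrase scores into weights that sum to 1."""
    top = max(scores.values())
    exps = {key: math.exp(value - top) for key, value in scores.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def answer_distribution(paraphrase_scores, qa_probs):
    """Combine p(paraphrase | question) with p(answer | paraphrase)."""
    weights = softmax(paraphrase_scores)
    combined = defaultdict(float)
    for paraphrase, weight in weights.items():
        for answer, prob in qa_probs[paraphrase].items():
            combined[answer] += weight * prob
    return dict(combined)


# Toy inputs (invented): raw scores from a paraphrase scorer and
# per-paraphrase answer distributions from a QA model.
scores = {
    "who wrote hamlet": 2.0,
    "hamlet was written by whom": 1.5,
    "who played hamlet": -1.0,
}
qa = {
    "who wrote hamlet": {"Shakespeare": 0.9, "Olivier": 0.1},
    "hamlet was written by whom": {"Shakespeare": 0.8, "Olivier": 0.2},
    "who played hamlet": {"Shakespeare": 0.1, "Olivier": 0.9},
}
dist = answer_distribution(scores, qa)
print(max(dist, key=dist.get))
```

Because the combination is differentiable in both the paraphrase scores and the QA probabilities, the two models can be trained jointly against the answer alone.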
for paraphrasing, use neural machine translation (e.g. OpenNMT), but want EN-EN…
bilingual pivoting
- if 2 phrases translate to same phrase in another language, likely to be paraphrases
- indirect path EN-XX-EN
- use multiple pivots (including in other languages)
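The pivoting idea can be sketched with toy translation tables: score a candidate paraphrase by summing, over every pivot phrase, the probability of translating out to the pivot times the probability of translating back. The talk used NMT systems rather than lookup tables; the tables, probabilities, language prefixes, and the `pivot_paraphrases` function here are all invented for illustration.

```python
from collections import defaultdict


def pivot_paraphrases(source, forward, backward):
    """Rank paraphrases of `source` via round-trip translation.

    forward[source][pivot]     ~ p(pivot | source)
    backward[pivot][candidate] ~ p(candidate | pivot)
    """
    scores = defaultdict(float)
    for pivot, p_out in forward.get(source, {}).items():
        for candidate, p_back in backward.get(pivot, {}).items():
            if candidate != source:  # the phrase itself is not a paraphrase
                scores[candidate] += p_out * p_back
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy tables with two pivot languages (all numbers invented).
forward = {
    "under control": {"de:unter kontrolle": 0.7, "fr:sous contrôle": 0.3},
}
backward = {
    "de:unter kontrolle": {"under control": 0.6, "in check": 0.4},
    "fr:sous contrôle": {
        "under control": 0.5,
        "in check": 0.2,
        "under supervision": 0.3,
    },
}
for phrase, score in pivot_paraphrases("under control", forward, backward):
    print(f"{phrase}: {score:.2f}")
```

Using several pivots helps because a candidate supported by more than one language ("in check" above) accumulates score, while one that appears through a single pivot stays low.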

[Image: bilingual pivoting]

encoder-decoder performs competitively with minimal engineering effort
constrained decoding is important (trees & sketches)
paraphrases enhance model robustness
all these models are fully general
data and code here
future work:
- learn meaning sketches
- learn model from database alone
  - “we will never have enough training data”
- and of course…

[Image: Brexit]

GANs to generate paraphrases?
- the issues are around constraining the semantics
predicting the sketches matters a lot – they get a high weight (a hyperparameter)
large language model pretraining & fine-tuning?
- could help for small datasets, but benefits diminish for larger datasets
- could help a lot with paraphrasing – similarity scoring etc.
- “Rumour has it that wherever you use BERT you get 2-3% performance increases”
other settings besides supervised?
- e.g. have answer to question, not logical form