Through mysterious means, I was able to attend the last half-day of KDD. So here’s some notes on the talks I went to.
(As an aside, I’m also going to try taking notes directly in markdown rather than on paper like some sort of neanderthal, wish me luck.)
Dynamic Embeddings for User Profiling in Twitter
- Xiangliang Zhang, paper
- the task: given user and tweets, predict keywords over time
    - should be relevant, diverse, dynamic
 
- related work:
    - expert finding (based on documents) - e.g. Rybak 2014
    - Balog 2007 - generative language model (static)
    - dynamic topic modeling
    - dynamic word embedding (Kim 2014, Hamilton 2016, Bamler & Mandt 2017)
 
- want words & users embedded in same semantic space, modeled dynamically

- Dynamic User and Word Embedding - graphical model
    - model user & word representations as diffusion, parametrized by how much user’s documents (tweets) change over time
    - use skipgram filtering (Bamler & Mandt 2017) for inference
 
- Streaming Keyword Diversification Model - not really described
    - (but this has given me some ideas…)
 
- evaluation - not really described, but naively this seems tricky to do right
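To make the "embeddings as diffusion" idea a bit more concrete for myself, here is a minimal sketch of how I read it, not the paper's actual model. Each embedding takes a Gaussian random-walk step whose variance grows with how much the user's tweets changed, and keywords are ranked by cosine similarity in the shared space. The names `diffuse`, `top_keywords`, `change` and `base_var` are all made up for illustration, and this only covers relevance, not the diversification part.

```python
import math
import random


def diffuse(embedding, change, rng, base_var=0.01):
    """One diffusion step: a Gaussian random walk on the embedding.

    `change` is a hypothetical non-negative score for how much the user's
    tweets changed since the previous time step; the step variance is
    assumed to scale linearly with it.
    """
    std = math.sqrt(base_var * change)
    return [x + rng.gauss(0.0, std) for x in embedding]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def top_keywords(user_vec, word_vecs, k=2):
    """Rank words by similarity to the user in the shared embedding space."""
    ranked = sorted(word_vecs, key=lambda w: cosine(user_vec, word_vecs[w]), reverse=True)
    return ranked[:k]
```

Calling `diffuse` once per time step for every user and word, then `top_keywords` on the latest vectors, gives keywords that drift as the user's interests drift. Inference (the skipgram filtering step) is the hard part and is not attempted here.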
TINET: Learning Invariant Networks via Knowledge Transfer
- Chen Luo, paper
- analysing system behaviour of large-scale online services (e.g. AWS) - automate that shit
- invariant network model - capture normal system behaviour, can do anomaly detection etc.
- standard methods of learning (time series etc.) are slow slow slow
- difficult to directly transfer invariant network to new environment
- given complete old IN and incomplete new IN - get complete new IN
    - domain-specific knowledge from target
    - common knowledge from source
    - deal with heterogeneity
 

- TINET = Entity Estimation Model + Dependency Construction Model
    - transfer relevant entities and construct missing dependencies
 
- compare with random walk & collective matrix factorization (other ways to compute node importance & link prediction)
- effectiveness demonstrated in synthetic & real experiments
- converges within 20 iterations, no parameters need to be tuned
Can Who‑Edits‑What Predict Edit Survival?
- Victor Kristof, paper
- distributed peer production systems (open source, crowdsourcing…)
- want to predict quality of contributions
    - e.g. user reputation, specialised classifiers
- want something both general and accurate
 

- INTERANK - simple, general, accurate
- model the probability that an edit is successful as a game between user and item
    - “skill” of user and “difficulty” of item
    - also model closeness of user and item in shared “skill” latent embedding space
 
- Wikipedia experiments
    - difficulty param correlates highly with manually determined controversial articles
    - topical clustering in latent space
 
- Linux experiments
    - core components most difficult (low acceptance rates)
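The user-vs-item "game" can be sketched roughly like this. It is my reading of the talk rather than the authors' code: a logistic function of skill minus difficulty, plus an interaction term from the latent vectors. Using a dot product for "closeness" is my assumption, and the function and parameter names are hypothetical.

```python
import math


def edit_success_probability(skill, difficulty, user_vec=None, item_vec=None, bias=0.0):
    """Probability that a user's edit to an item survives.

    skill and difficulty are scalars for the user and the item. The optional
    latent vectors add an interaction term (assumed here to be a dot product),
    so a user "close" to an item gets a boost. bias is a global offset.
    """
    score = skill - difficulty + bias
    if user_vec is not None and item_vec is not None:
        score += sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-score))
```

With equal skill and difficulty and no latent vectors this gives 0.5, and raising the item's difficulty pushes the probability down, which matches the low acceptance rates on the core Linux components.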
 
An Efficient Two‑Layer Mechanism for Privacy‑Preserving Truth Discovery
- Yaliang Li, paper
- truth discovery - weighted aggregation (e.g. for crowdsourcing)
    - estimate accuracy of answers and reliability of users
 
- but may require sharing info that you want to keep private
- randomized response - return true response with some probability, otherwise some default
- two-layer system
    - sample private probability from some distribution
    - use this for randomized response
 
- analyse privacy with epsilon-local differential privacy
- analyse utility by bounding change in error rate
- I kinda stopped paying attention to this one partway through, whoops   
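For what it's worth, the two-layer idea as I understood it fits in a few lines. This is a sketch under assumptions, not the paper's mechanism: I picked a Beta distribution for the private probability and a uniformly random answer as the "default", neither of which I can confirm from the talk. The function name and parameters are hypothetical.

```python
import random


def two_layer_response(true_answer, choices, rng, alpha=2.0, beta=2.0):
    """Report an answer under a two-layer randomized response scheme.

    Layer 1: the user privately samples their own truth-telling probability
    (assumed Beta(alpha, beta) here). Layer 2: ordinary randomized response
    using that probability, falling back to a uniformly random choice.
    """
    p_truth = rng.betavariate(alpha, beta)
    if rng.random() < p_truth:
        return true_answer
    return rng.choice(choices)
```

The server never sees `p_truth`, only the reported answer, so each user's exact noise level stays private too. The aggregation side (weighting users by estimated reliability) is not shown.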
Generalized Score Functions for Causal Discovery
- Biwei Huang, paper
- causal discovery from observational data
- existing score-based methods (e.g. BIC) make strong assumptions on data distribution, etc.
- want something general and asymptotically correct
    - mixed data types
    - nonlinear causal relations
    - arbitrary data distributions
    - multi-dimensionalities
 
- cross-validated likelihood and marginal likelihood in RKHS
    - locally statistically consistent
 
- I found this presentation really hard to follow, but it seems like a simple and theoretically well-founded idea
R‑VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering
- Pan Lu, paper
- image + question => predict answer
- past work
    - image & question feature extraction, fusion, multiclass classification into answers from training data
    - models tend to extract entities
 
- relation facts (triples) have larger semantic capacities - extract these instead
- existing data sets don’t have aligned relation facts (e.g. COCO-QA, VQA) or aligned question-answer pairs (Visual Genome)
- => Relation-VQA dataset! (to be released soon)

- architecture tries to extract relation facts from image and question
- then semantic attention over relations to answer question
- also context-aware visual attention conditioned on image and question
- gets SOTA on R-VQA (a dataset they just came up with, so of course)
    - … and also COCO-QA (though they trained on strictly more information I guess since there’s also relations? maybe other work does as well)
 
- I still don’t really understand what the point of this task is, though I guess it’s mildly AGI-ish and probably learns some useful semantic representations
And that’s a wrap!