[Paper Review] Wide & Deep Learning for Recommender Systems - penny4860/study-note GitHub Wiki

1. Overview

Summary

1) Wide & Deep model: a pointwise ranking model

P(Y=1 | x) = sigmoid(W_wide · [x, phi(x)] + W_deep · a_final + b)
Logic
- Raw Input : x
- Cross Product Transformation : phi(x)
- Embedding : Embedding(x)
- Wide Component
  - Input : [x, phi(x)]
  - y1 = W_wide · [x, phi(x)] + b
- Deep Component
  - Input : Embedding(x)
  - y2 = Dnn(Embedding(x)) = W_deep · a_final, where a_final is the last hidden-layer activation
- Joint
  - y = sigmoid(y1 + y2)
  - y is the final score
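The logic above can be sketched in NumPy as follows. This is a minimal sketch, not the paper's implementation: the function names, the `params` dictionary, the layer sizes, the ReLU activation choice in this toy, and the random weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def wide_logit(x, crossed, w_wide, b):
    """y1: linear model over the raw features plus the cross-product features."""
    return np.concatenate([x, crossed]) @ w_wide + b

def deep_logit(embedded, hidden_layers, w_deep):
    """y2: feed the embedding through ReLU hidden layers, then a linear output."""
    a = embedded
    for W, bias in hidden_layers:
        a = np.maximum(0.0, W @ a + bias)
    return w_deep @ a

def wide_and_deep_score(x, crossed, embedded, params):
    """Final score: sigmoid of the summed wide and deep logits."""
    y1 = wide_logit(x, crossed, params["w_wide"], params["b"])
    y2 = deep_logit(embedded, params["hidden_layers"], params["w_deep"])
    return sigmoid(y1 + y2)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])       # raw sparse binary features
crossed = np.array([1.0])           # one cross-product feature: x[0] AND x[2]
embedded = rng.normal(size=4)       # stand-in for Embedding(x)
params = {
    "w_wide": rng.normal(size=4),   # 3 raw features + 1 crossed feature
    "b": 0.1,
    "hidden_layers": [(rng.normal(size=(5, 4)), np.zeros(5))],
    "w_deep": rng.normal(size=5),
}
score = wide_and_deep_score(x, crossed, embedded, params)
```

The single bias `b` is shared by both parts, which matches the one-sigmoid form of the equation at the top of this section.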

2) Applied to the Google app store:

Wide component
- Takes the interaction information between sparse input features directly as input
- Input: the cross-product vector of user installed apps and impression apps
Deep component
- Input: the embeddings of the sparse input features
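As a rough illustration of the wide input, the sketch below crosses each installed app with the impression app and marks the pairs that exist in a vocabulary. The function name, the `vocab` mapping, and the app names are hypothetical; a production system would more likely hash or index these pairs at scale.

```python
def cross_installed_with_impression(installed_apps, impression_app, vocab):
    """Return the active crossed features as a sparse {index: 1.0} mapping.

    Each crossed feature means AND(user_installed_app=a, impression_app=b).
    Pairs missing from the vocabulary are dropped.
    """
    active = {}
    for app in installed_apps:
        key = (app, impression_app)
        if key in vocab:
            active[vocab[key]] = 1.0
    return active

# Hypothetical vocabulary of (installed app, impression app) pairs.
vocab = {
    ("netflix", "pandora"): 0,
    ("netflix", "hulu"): 1,
    ("maps", "pandora"): 2,
}
features = cross_installed_with_impression(["netflix", "maps"], "pandora", vocab)
# features == {0: 1.0, 2: 1.0}
```

Because every pair gets its own weight, the linear model can memorize that a specific installed app co-occurs with installs of a specific impression app.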

3) Pros and cons of Wide / Deep

Wide
- (+) : Strong memorization. Captures co-occurrence patterns.
- (-) : Requires difficult feature engineering.
Deep
- (+) : Strong generalization, so it can capture unseen patterns.
- (-) : When interaction data is scarce, training is hard and recommendation quality drops.

Questions

Meaning of user feature, contextual feature, and impression feature

user feature
- Features of the user : age, gender, country
contextual feature
- local time of day, day of week, device, etc.
impression feature
- Features of the item (the one being scored)
- item age, item popularity, etc.
- The YouTube recommender system is split into two stages so that it can use impression features
- They can be used in a pointwise learning-to-rank setup.

2. Details

1. Intro

Recommender system viewed as a search ranking system
- Input Query : A set of user and contextual information
- Output : recommendation list
memorization
- Memorizes historical data
- How it is implemented
  - Uses binary sparse features
  - Expands the input features with cross-product feature transformations.
  - Implemented simply with a linear model
generalization
- Extends historical data to new cases
- Implemented with embedding-based models
Contribution
- Proposes a model that pursues memorization and generalization at the same time through the Wide & Deep architecture

2. Recommender system overview

Retrieval followed by personalized re-ranking

Retrieval
- Input : Query
- Output : a short list of items that best match the query using various signals
Ranking
- Input
  - user feature
    - country, age
  - contextual feature
    - device, hour of the day, day of the week
  - impression feature
    - app age, historical statistics of an app
- Output
  - ranked recommendation list
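The two stages can be sketched as below. Everything here is a toy: `retrieve`, `rank`, `toy_score`, and the dictionary fields are hypothetical names, and the scoring function is a placeholder for a trained ranking model.

```python
def retrieve(query, catalog, limit):
    """Stage 1: cheap filter that keeps a short list of matching candidates."""
    matches = [item for item in catalog if item["category"] == query["category"]]
    return matches[:limit]

def rank(query, candidates, score_fn):
    """Stage 2: score every candidate independently (pointwise), sort descending."""
    scored = [(score_fn(query, item), item["name"]) for item in candidates]
    return [name for _, name in sorted(scored, reverse=True)]

def toy_score(query, item):
    # Placeholder scorer mixing an impression feature (popularity)
    # with a user feature (similarity to installed apps).
    bonus = 0.5 if item["name"] in query["similar_to_installed"] else 0.0
    return item["popularity"] + bonus

catalog = [
    {"name": "a", "category": "music", "popularity": 0.9},
    {"name": "b", "category": "music", "popularity": 0.6},
    {"name": "c", "category": "game", "popularity": 0.99},
]
query = {"category": "music", "similar_to_installed": {"b"}}
candidates = retrieve(query, catalog, limit=10)
ranked = rank(query, candidates, toy_score)
# ranked == ["b", "a"]
```

Retrieval never looks at impression features in detail; only the ranking stage scores each (query, item) pair, which is why the richer features live there.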

3. WIDE & DEEP LEARNING

3.1 The Wide Component

generalized linear model
- y = Wx + b (the sigmoid is applied to the joint wide + deep sum, not to the wide part alone)
Input : x
- Raw input features (Sparse & Binary)
- Cross-product transformed features
  - Represent interactions between input features
  - Used to add non-linearity
Output : y
- scalar score
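The cross-product transformation can be written out as below. The notation follows my reading of the paper: d is the number of raw features, and c_ki is 1 when the i-th feature takes part in the k-th crossed feature, 0 otherwise.

```latex
\phi_k(\mathbf{x}) = \prod_{i=1}^{d} x_i^{c_{ki}}, \qquad c_{ki} \in \{0, 1\}
```

For binary features the product is 1 only when every participating feature is 1. For example, AND(gender=female, language=en) is 1 only if both gender=female and language=en are active, and 0 otherwise.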

3.2. The Deep Component

DNN model
Embedding
- Converts the raw input feature vector into dense embedding vectors
- The embedding vectors are concatenated and fed into the DNN model
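A minimal sketch of the lookup, concatenation, and hidden layers follows. The feature names, vocabulary sizes, embedding dimensions, layer widths, and the ReLU activation are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding tables: one row per category id.
embedding_tables = {
    "device": rng.normal(scale=0.1, size=(3, 2)),    # 3 devices -> 2-d vectors
    "country": rng.normal(scale=0.1, size=(5, 4)),   # 5 countries -> 4-d vectors
}

def embed_and_concat(feature_ids, tables):
    """Look up one dense vector per categorical feature and concatenate them.

    Features are visited in sorted name order so the layout is stable.
    """
    return np.concatenate(
        [tables[name][idx] for name, idx in sorted(feature_ids.items())]
    )

def dnn_forward(a, layers):
    """Apply a <- relu(W a + b) for each hidden layer in turn."""
    for W, b in layers:
        a = np.maximum(0.0, W @ a + b)
    return a

a0 = embed_and_concat({"device": 1, "country": 4}, embedding_tables)  # shape (6,)
layers = [
    (rng.normal(size=(8, 6)), np.zeros(8)),
    (rng.normal(size=(4, 8)), np.zeros(4)),
]
a_final = dnn_forward(a0, layers)                                      # shape (4,)
```

In a trained model the embedding tables are learned together with the hidden-layer weights; here they are random so the block stays self-contained.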

3.3. Joint Training of Wide & Deep Model
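Joint training means the gradient of one shared loss flows into both parts in the same step, instead of training the two models separately and averaging them afterwards. The sketch below shows one such step with log loss and a single hidden layer. Plain SGD is used for brevity, which is a simplification and not the optimizer setup of the paper; the function name and parameter keys are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_sgd_step(x_wide, e, y, p, lr=0.1):
    """One SGD step on log loss; updates wide and deep parameters together."""
    z = p["W1"] @ e + p["b1"]
    h = np.maximum(0.0, z)                          # hidden activation (ReLU)
    logit = p["w_wide"] @ x_wide + p["w_deep"] @ h + p["b"]
    prob = sigmoid(logit)
    loss = -(y * np.log(prob) + (1 - y) * np.log(1 - prob))

    d_logit = prob - y                              # d(loss)/d(logit) for log loss
    d_z = d_logit * p["w_deep"] * (z > 0)           # back through the ReLU
    grads = {
        "w_wide": d_logit * x_wide,                 # wide part
        "b": d_logit,
        "w_deep": d_logit * h,                      # deep output layer
        "W1": np.outer(d_z, e),                     # deep hidden layer
        "b1": d_z,
    }
    for name, g in grads.items():
        p[name] = p[name] - lr * g
    return loss

rng = np.random.default_rng(0)
params = {
    "w_wide": np.zeros(3),
    "b": 0.0,
    "W1": rng.normal(scale=0.1, size=(5, 4)),
    "b1": np.zeros(5),
    "w_deep": rng.normal(scale=0.1, size=5),
}
x_wide = np.array([1.0, 0.0, 1.0])   # wide input (raw + crossed features)
e = rng.normal(size=4)               # deep input (concatenated embeddings)
losses = [joint_sgd_step(x_wide, e, 1.0, params) for _ in range(20)]
```

Repeating the step on the same positive example should drive the loss down, and the weight of the inactive wide feature stays at zero because its gradient is zero.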

4. SYSTEM IMPLEMENTATION

4.1. Data Generation

Categorical Feature : one-hot encoding
Continuous Feature : normalized to the [0, 1] range
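Both preprocessing steps can be sketched as below. The note only states that continuous features end up in [0, 1]; the quantile-bucket mapping used here is one way to do that and is an assumption of this sketch, as are the function names and sample values.

```python
import numpy as np

def one_hot(value, vocabulary):
    """Encode a categorical value as a binary vector over a fixed vocabulary."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

def quantile_normalize(values, n_quantiles):
    """Map each value to bucket / (n_quantiles - 1), where bucket is its
    0-based quantile bucket, so the result always lies in [0, 1]."""
    values = np.asarray(values, dtype=float)
    # Interior quantile boundaries, e.g. the 25%, 50%, 75% points for 4 buckets.
    boundaries = np.quantile(values, np.linspace(0, 1, n_quantiles + 1)[1:-1])
    bucket = np.searchsorted(boundaries, values, side="right")
    return bucket / (n_quantiles - 1)

country = one_hot("KR", ["US", "KR", "JP"])
# country == [0., 1., 0.]

installs = [1, 2, 3, 4, 100, 1000, 5, 6]
normalized = quantile_normalize(installs, n_quantiles=4)
# normalized == [0, 0, 1/3, 1/3, 1, 1, 2/3, 2/3]
```

Compared with min-max scaling, the bucket mapping is insensitive to outliers: 100 and 1000 land in the same top bucket instead of squashing every other value toward 0.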

4.2. Model Training

Model structure
- Wide Component Input
  - cross-product transformation of user installed apps and impression apps
  - It appears that only the cross-product vector is fed in, not the individual sparse vectors.
- Deep Component Input
  - Continuous features
  - Embeddings of categorical features
warm-starting system
- Initializes a new model with the embeddings and the linear model weights from the previous model.
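Warm-starting can be pictured as copying selected parameter groups from the previous model before training continues on new data. The sketch below assumes parameters are stored in a plain dictionary of NumPy arrays; the function name, the keys, and the shape check are hypothetical.

```python
import numpy as np

def warm_start(new_params, previous_params, reuse=("embeddings", "w_wide")):
    """Copy the listed parameter groups from the previous model.

    A group is copied only when it exists in the previous model and the
    shapes match; everything else keeps its fresh initialization.
    """
    for name in reuse:
        old = previous_params.get(name)
        if old is not None and old.shape == new_params[name].shape:
            new_params[name] = old.copy()
    return new_params

previous = {
    "embeddings": np.ones((5, 3)),
    "w_wide": np.full(4, 2.0),
    "hidden": np.ones((2, 2)),
}
fresh = {
    "embeddings": np.zeros((5, 3)),
    "w_wide": np.zeros(4),
    "hidden": np.zeros((2, 2)),
}
model = warm_start(fresh, previous)
# embeddings and w_wide come from the previous model; hidden stays freshly initialized
```

The point of the design is to avoid retraining from scratch every time new data arrives, at the cost of carrying over whatever the previous model had learned.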

4.3 Model Serving

5. EXPERIMENT RESULTS

5.1 App Acquisitions

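Test results
- offline test (AUC, difference from the Wide baseline)
  1. Wide : 0
  2. Deep : -0.004
  3. Wide & Deep : +0.002
- online test (acquisition gain, %)
  1. Wide : 0
  2. Deep : +2.9
  3. Wide & Deep : +3.9
The deep learning models score better on the online metric.
- Interpretation : stronger generalization may explain the better online metric.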

6. Related Work

Factorization Machine
- add generalization to linear models by factorizing the interactions between two variables as a dot product between two low-dimensional embedding vectors.
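The pairwise term of a factorization machine can be sketched as below: each feature i has a low-dimensional vector V[i], and the weight of the interaction between features i and j is the dot product of V[i] and V[j]. The function names and sizes are illustrative; the second function uses the standard algebraic rewrite of the same sum that avoids the double loop.

```python
import numpy as np

def fm_pairwise_naive(x, V):
    """sum over i < j of <V[i], V[j]> * x[i] * x[j]."""
    n = len(x)
    return sum(
        (V[i] @ V[j]) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )

def fm_pairwise_fast(x, V):
    """Same value via 0.5 * sum_f [(sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2]."""
    xv = x @ V                                   # shape (k,)
    return 0.5 * float(np.sum(xv ** 2 - (x ** 2) @ (V ** 2)))

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0, 1.0])               # sparse binary input
V = rng.normal(size=(4, 3))                      # 4 features, 3-d factors
naive = fm_pairwise_naive(x, V)
fast = fm_pairwise_fast(x, V)
```

Because the interaction weight is factorized, a pair that never co-occurred in training still gets a non-trivial weight, which is where the generalization comes from.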