Neural Collaborative Autoencoder - penny4860/study-note GitHub Wiki

1. Summary

Implements autoencoder-based CF using only interaction data
- Provides training methods for both explicit and implicit feedback
- Proposes high-performance CF for implicit datasets via error reweighting and sparsity-aware DA (data augmentation)

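Loss computation for negative samples
- Error reweighting module
- (Hypothesis): if an item is popular but the user has no interaction with it, the user is likely to dislike it.
- Method
  1. Precompute the item popularity vector
    - Sum of its elements = 1.0
  2. Observed items
    - Treated as positive labels when computing the loss: unit weight (1.0)
  3. Unobserved items
    - Treated as negative labels when computing the loss
    - Weighted by item popularity
    - Because the popularity vector sums to 1.0, the loss contributed by any single negative sample is small.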
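A minimal numpy sketch of the three steps above, assuming a binary implicit-feedback matrix and a squared-error loss (the note does not fix the loss type; the function names are hypothetical). Popularity is computed as item interactions / total interactions, so it sums to 1.0.

```python
import numpy as np

def popularity_vector(interactions: np.ndarray) -> np.ndarray:
    """Step 1: item popularity = interactions per item / total interactions.

    `interactions` is a binary (n_users, n_items) matrix; the result sums to 1.0.
    """
    counts = interactions.sum(axis=0).astype(float)
    return counts / counts.sum()

def reweighted_loss(u: np.ndarray, u_hat: np.ndarray, pop: np.ndarray) -> float:
    """Error-reweighted loss for one user (squared error is an assumption).

    Step 2: observed items (u > 0) are positives with unit weight 1.0.
    Step 3: unobserved items (u == 0) are negatives (target 0) weighted by popularity.
    """
    weights = np.where(u > 0, 1.0, pop)
    return float(np.sum(weights * (u - u_hat) ** 2))

R = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 0, 1]])
pop = popularity_vector(R)   # item 0 is the most popular
loss = reweighted_loss(R[1].astype(float), np.full(3, 0.5), pop)
print(pop, loss)
```

Here user 1 has not interacted with items 1 and 2, so their errors are scaled by small popularity weights, while the observed item keeps weight 1.0. An unobserved popular item would receive a larger weight than an unobserved niche item, which matches the hypothesis.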
Data augmentation for sparse users
- (Hypothesis): interactions with less popular items reflect the user's preferences more strongly.
- Method
  - If a user's number of interactions is below a threshold, augment the data by removing popular items.

Meaning of "rank" in a low-rank representation
- Rank of a matrix
  - The dimension of the vector space spanned by its column vectors
- Matrix factorization decomposes the rating matrix into a product of low-rank matrices.
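A small numpy sketch of the rank idea (sizes and random factors are made up for illustration). A rating matrix built as the product of an (n_items, d) factor and a (d, n_users) factor has rank at most d, and a truncated SVD recovers a rank-d factorization of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_users, d = 5, 6, 2

# Thin factors: item latent vectors (rows of V) and user latent vectors (rows of U).
V = rng.normal(size=(n_items, d))
U = rng.normal(size=(n_users, d))
R = V @ U.T                          # (n_items, n_users), same layout as R = V * U'

# Rank = dimension of the space spanned by R's columns; at most d here.
print(np.linalg.matrix_rank(R))

# Keeping the top-d singular values gives a rank-d factorization of R.
P, s, Qt = np.linalg.svd(R, full_matrices=False)
R_d = (P[:, :d] * s[:d]) @ Qt[:d, :]
print(np.allclose(R, R_d))
```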

CF recommendation algorithms based on MF
- MF: models the user-item interaction function with a user latent vector and an item latent vector
- The inner product cannot capture complex (non-linear) relationships.
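As a worked formula (notation assumed here: U_i and V_j are the d-dimensional latent vectors of user i and item j), MF scores a user-item pair by their inner product:

$$
\hat{r}_{ij} = \mathbf{U}_i^\top \mathbf{V}_j = \sum_{k=1}^{d} U_{ik}\, V_{jk}
$$

Each latent dimension k contributes one product term and the terms are simply summed, so the score is linear in each factor with no interaction between dimensions. This is one way to read the limitation stated above.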
Recommendation with DL
1. Integration models
  - Extract features with a DNN and combine them with a CF framework
  - Limitation: interaction data is still ultimately modeled by an inner product.
2. Neural network models
  - Perform CF directly by modeling interaction data with an NN
  - Can discover non-linear relationships.
  - Limitations of existing methods
    - CFN, DAE: could not stack deep layers
    - NCF: overfitting
      - Time-consuming because it uses pointwise ranking
NCAE
- Key idea: perform non-linear MF while reconstructing user/item preferences
- Designed at the user/item level
  - input : sparse user or item vector
  - output : scores
- 3-stage pretraining mechanism
- Addressing overfitting
  1. Error reweighting: use all unobserved data as negatives
  2. Sparsity-aware data augmentation

Data notation
- RR : user-item rating matrix
  - RR_ij : rating score of the i-th user for the j-th item
- R : set of observed user-item pairs
  - (i, j, r_ij)
- R_bar : set of unobserved user-item pairs
- R_i
  - history of user i
- u_i
  - vector of user i's observed items
  - u_ij
    - r_ij if item j is observed
    - 0 if item j is unobserved
  - u_tilde
    - corrupted version of u_i
  - u_hat
    - estimated (reconstructed) u_i
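A minimal sketch of the notation above, assuming ratings arrive as (i, j, r_ij) triples and that the corruption producing u_tilde is dropout-style zeroing of entries (the helper names `user_vector` and `corrupt` are hypothetical).

```python
import numpy as np

def user_vector(triples, user, n_items):
    """Dense u_i: u_ij = r_ij for observed pairs in R, 0 for unobserved items."""
    u = np.zeros(n_items)
    for i, j, r in triples:
        if i == user:
            u[j] = r
    return u

def corrupt(u, drop_prob, rng):
    """u_tilde: zero out each entry with probability drop_prob (assumed corruption)."""
    keep = rng.random(u.shape) >= drop_prob
    return u * keep

triples = [(0, 1, 5.0), (0, 3, 2.0), (1, 0, 4.0)]   # observed set R
u0 = user_vector(triples, user=0, n_items=4)
u0_tilde = corrupt(u0, drop_prob=0.5, rng=np.random.default_rng(0))
print(u0, u0_tilde)
```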
Model notation
- W^l, b^l : weight matrix and bias vector of the l-th layer
- theta^l : activation function of the l-th layer
- z^l : output of the l-th layer (after activation)

NCAE
- input : u_i (sparse vector)
  - a user's rating (click) history
- output : u_hat_i = nn(u_i)
  - predicted scores for all items
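A minimal forward-pass sketch using the model notation (z^0 = u_i, z^l = theta^l(W^l z^{l-1} + b^l)). The layer sizes, the tanh hidden activation, the sigmoid output activation, and the initialization are assumptions for illustration; pretraining and training are not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(layer_sizes, rng):
    """Hypothetical init: one (W^l, b^l) pair per layer, W^l of shape (n_out, n_in)."""
    return [(rng.normal(scale=0.1, size=(n_out, n_in)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def ncae_forward(u, params, hidden_act=np.tanh, out_act=sigmoid):
    """u_hat = nn(u): apply z^l = theta^l(W^l z^{l-1} + b^l) layer by layer."""
    z = u
    for l, (W, b) in enumerate(params):
        act = out_act if l == len(params) - 1 else hidden_act
        z = act(W @ z + b)
    return z

n_items = 10
params = init_params([n_items, 6, 3, n_items], np.random.default_rng(1))
u = np.zeros(n_items)
u[[1, 4, 7]] = 1.0                    # sparse click history of one user
u_hat = ncae_forward(u, params)       # a score for every item
print(u_hat.round(3))
```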
Interpreting NCAE as matrix factorization
- R = V * U'
  - R : (n_items, n_users)
  - V : (n_items, d)
  - U' : (d, n_users)
- R_i = V * U[i, :]'
  - R_i : (n_items, 1), the i-th column of R
  - V : (n_items, d)
  - U[i, :]' : (d, 1)
    - the embedding of user i
- u_hat_i = W * z = nn(u_i) (bias and output activation omitted)
  - W : the last weight matrix
    - (n_items, d)
    - can be interpreted as the item embeddings in MF
  - z : output vector of the last hidden layer
    - (d, 1)
    - can be interpreted as the user embedding in MF
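A numpy sketch of this reading, with random weights standing in for a trained model and a single tanh encoder layer standing in for the hidden stack (both assumptions). Stacking every user's z as a column of Z gives predictions W @ Z, which has the same form as R = V * U', with W playing V and Z playing U'; the result therefore has rank at most d.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_users, d = 8, 5, 3

W_enc = rng.normal(size=(d, n_items))   # hypothetical encoder: u_i -> z_i
W = rng.normal(size=(n_items, d))       # last weight matrix ~ item embeddings V

U_in = (rng.random((n_items, n_users)) < 0.3).astype(float)  # columns are u_i
Z = np.tanh(W_enc @ U_in)               # (d, n_users): column i ~ embedding of user i
R_hat = W @ Z                           # (n_items, n_users); bias/output activation omitted

i = 2
print(np.allclose(R_hat[:, i], W @ Z[:, i]))   # column i is W * z_i
print(np.linalg.matrix_rank(R_hat) <= d)       # low-rank, like MF
```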

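Unobserved ratings
- The error term is set to 0, so those outputs produce no gradient and the corresponding weights are not updated.
Observed ratings
- Trained to reconstruct the input rating
- Ratings that were dropped out of the input are also trained to reconstruct their original value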
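A minimal sketch of this masking for explicit ratings, assuming a squared-error loss (the function name is hypothetical). The mask is taken from the original, uncorrupted u, so ratings that were dropped from the input still count as targets, while unobserved items get zero error and therefore zero gradient.

```python
import numpy as np

def sparse_backward_loss(u, u_hat):
    """Squared error on observed ratings only; returns (loss, d loss / d u_hat).

    `u` is the ORIGINAL rating vector (not the corrupted input), so entries
    dropped from the input are still reconstructed. Unobserved entries (u == 0)
    have their error forced to 0, so no gradient flows from those outputs.
    """
    mask = (u > 0).astype(float)
    err = mask * (u_hat - u)
    return float(np.sum(err ** 2)), 2.0 * err

u = np.array([5.0, 0.0, 3.0])        # item 1 unobserved
u_hat = np.array([4.0, 2.0, 3.0])    # prediction made from a corrupted input
loss, grad = sparse_backward_loss(u, u_hat)
print(loss, grad)
```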

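For implicit datasets, if unobserved ratings are left out as in Sparse Backward,
- the model tends to predict (almost) every item as 1 and overfits easily.
  - Noise is added in the input module, but that alone is not enough.
- It is equivalent to having no negative samples.
  - For explicit data, low ratings play the role of negative samples, but
  - for implicit data, every target is effectively 1.
For implicit datasets, unobserved items must be used as negative samples.
- Used as-is, there would be far too many negative samples.
- So weight them appropriately by popularity.
Weighting of unobserved items
- (number of interactions with the item) / (total number of interactions)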

DA can provide more item correlation patterns.
Assumptions
1. Only sparse users need augmentation.
2. Interactions with less popular items reflect the user's preferences more strongly.
Flow
1. If the number of interactions is below the sparsity threshold (thd),
2. drop popular items in proportion to the dropout ratio.
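A hypothetical numpy sketch of this flow for one binary user vector. The note does not say exactly which popular items are dropped; here it is assumed that the floor(dropout_ratio * n) most popular of the user's observed items are removed. Non-sparse users are returned as None (no augmentation).

```python
import numpy as np

def sparsity_aware_augment(u, pop, sparsity_thd, dropout_ratio):
    """Return an augmented copy of u for sparse users, else None (sketch)."""
    observed = np.flatnonzero(u)
    n = len(observed)
    if n >= sparsity_thd:                 # step 1: only sparse users are augmented
        return None
    n_drop = int(dropout_ratio * n)
    # Step 2 (assumed rule): drop the user's most popular observed items first.
    by_pop = observed[np.argsort(-pop[observed], kind="stable")]
    u_aug = u.copy()
    u_aug[by_pop[:n_drop]] = 0.0
    return u_aug

pop = np.array([0.5, 0.1, 0.3, 0.1])     # item popularity, sums to 1.0
u = np.array([1.0, 1.0, 1.0, 0.0])       # a sparse user with 3 interactions
u_aug = sparsity_aware_augment(u, pop, sparsity_thd=5, dropout_ratio=0.5)
print(u_aug)
```

The augmented vector keeps only the user's less popular items, so an extra training sample built from it would emphasize the interactions that, by assumption 2, carry more preference signal.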

(omitted)

(omitted)