Deep Learning for Sequential Recommendation - penny4860/study-note GitHub Wiki

1. Recap

Summary

Classification of sequential recommendation algorithms

Experience Based
Transaction Based
- Only one behavior type (click) exists
- GRU4Rec / Bert4Rec / NextItNet (CNN-based)
Interaction Based
- Multiple behavior types exist: click and buy are distinguished.

Influential Factors

Questions

What is negative sampling?
- When training needs positive/negative pairs, negative sampling is the procedure for choosing the negative samples.
Popularity-based sampling
- Among popular items (the items in the same batch), take those the user has not interacted with as negative samples.
Attention comes from the seq2seq architecture, so how is it used here?
- As in BERT, only the encoder part is kept, and the model is built on transformer layers.

2. Contents

1. Intro

Traditional recommender systems on online platforms
- Categories
  - content based
  - collaborative filtering
- Problem
  - All of a user's behaviors are given the same importance (i.e., they are not treated as sequential data).
Sequential recommendation
- Terminology
  - Also called session-based, session-aware, or sequence-aware recommendation.
- Sequential recommendation has to consider long-term preference and short-term interest at the same time.
- There have been attempts to bring in methods for modeling sequential data.
Deep learning based sequential recommendation
- Deep learning has been successful in NLP, and there are attempts to apply it to sequential recommendation.
- Application
  - e-commerce / point-of-interest / music / video
Contribution
1. An overview of DL-based sequential recommendation
2. A classification framework built on 3 recommendation scenarios
3. A summary of the factors that influence DL-based sequential recommendation
4. A discussion of open issues / future directions

1.1. Related Survey

Use of deep learning in modern recommendation system: A summary of recent works. arXiv preprint arXiv:1712.07525 (2017).
- A summary of DL-based recommendation
- 3 categories
  - collaborative filtering
  - content-based
  - hybrid
A review on deep learning for recommender systems: challenges and remedies (2018)
- Organized around the deep learning techniques applied to recommendation
Deep learning based recommender system: A survey and new perspectives. (2019)
Sequence-aware recommender systems. (2018)
A survey on session-based recommender systems (2019)

2. Overview

2.1. Concept Definitions

User behavior
- definition
  1. behavior object (item)
    - deal ID
    - May include other information about the deal.
      - text descriptions
      - image
      - interaction time
  2. behavior type
    - search / click / add-to-cart / buy / share
- notation
  - a_i = (c_i, o_i)
    - the i-th action
    - c_i : behavior type
    - o_i : item
Sequential recommender system
- input : behavior sequence
- output : items
3 categories of behavior sequence
1. Experience-based behavior sequence : same objects / different types
  - Records every different behavior on the same item
2. Transaction-based behavior sequence : different objects / same types
  - Records only one behavior type
3. Interaction-based behavior sequence : different objects / different types
  - For each item, records only one behavior (the final action)
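
The notation a_i = (c_i, o_i) and the three sequence categories can be sketched in a few lines of Python. This is only an illustration: the names `Behavior` and `classify_sequence` are made up for this note, and the rule below is a rough heuristic on distinct types and items, not the paper's formal definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """One action a_i = (c_i, o_i)."""
    behavior_type: str  # c_i, e.g. "click" or "buy"
    item_id: int        # o_i


def classify_sequence(sequence):
    """Label a behavior sequence with one of the three categories (heuristic)."""
    n_types = len({b.behavior_type for b in sequence})
    n_items = len({b.item_id for b in sequence})
    if n_types == 1:
        return "transaction-based"   # different objects / same type
    if n_items == 1:
        return "experience-based"    # same object / different types
    return "interaction-based"       # different objects / different types


clicks = [Behavior("click", 1), Behavior("click", 2), Behavior("click", 3)]
funnel = [Behavior("click", 7), Behavior("add-to-cart", 7), Behavior("buy", 7)]
mixed = [Behavior("click", 1), Behavior("buy", 2), Behavior("click", 3)]

for name, seq in [("clicks", clicks), ("funnel", funnel), ("mixed", mixed)]:
    print(name, "->", classify_sequence(seq))
```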

2.2. Sequential Recommendation Tasks

(p_1, p_2, ..., p_I) = func(a_1, a_2, ..., a_t, u)
- input
  - a_1, a_2, ..., a_t : behavior sequence
  - u : user profile
- output
  - (p_1, p_2, ..., p_I)
  - a probability for each of the I items
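
To make the input/output shape of func concrete, here is a minimal sketch. It assumes a deliberately simple model that is not from the paper: the sequence is summarized by the mean of its item embeddings, every item is scored by a dot product, and a softmax turns the scores into (p_1, ..., p_I). The user profile u is left out, and `recommend_probs` and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recommend_probs(behavior_items, item_embeddings):
    """Map the item ids of a_1..a_t to one probability per item."""
    dim = len(item_embeddings[0])
    # Sequence representation: average embedding of the items seen so far.
    seq_vec = [
        sum(item_embeddings[i][d] for i in behavior_items) / len(behavior_items)
        for d in range(dim)
    ]
    # Score every item in the catalog against the sequence representation.
    scores = [sum(e[d] * seq_vec[d] for d in range(dim)) for e in item_embeddings]
    return softmax(scores)


# Toy catalog with I = 4 items and 2-dimensional embeddings.
item_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
probs = recommend_probs([0, 1], item_embeddings)
print(probs)
```

An RNN, CNN, or attention encoder would replace the mean pooling; the output side (one probability per item) stays the same.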

2.3 Related Models

2.3.1 Traditional Methods.

2.3.2 Deep Learning Techniques.

RNNs
- The mainstream of DL-based sequential recommendation
- Strengths
  - Capture dependencies between items well, both within a session and across different sessions.
- Weaknesses
  - Dependencies in longer sequences are hard to model.
CNNs
- Capture dependent relationships in local information well.
Attention mechanisms
- Vanilla attention applied to sequential recommendation
- Self-attention (transformer) applied to sequential recommendation
  - Shows better performance.
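
As a reminder of what self-attention computes over an item sequence, here is a small pure-Python sketch of scaled dot-product self-attention with a causal mask (position t only looks at positions up to t). It is simplified on purpose: a single head with no learned query/key/value projections, which a real transformer layer would have. The function name `causal_self_attention` is made up for this note.

```python
import math


def causal_self_attention(x):
    """x: list of T embedding vectors. Returns T context vectors.

    Queries, keys, and values are all x itself (assumption: no projections).
    """
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # Scaled dot products between position t and every position s <= t.
        scores = [
            sum(x[t][k] * x[s][k] for k in range(d)) / math.sqrt(d)
            for s in range(t + 1)
        ]
        # Softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the visible value vectors.
        out.append([sum(weights[s] * x[s][k] for s in range(t + 1)) for k in range(d)])
    return out


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = causal_self_attention(sequence)
print(context)
```

Dropping the causal mask (attending to all positions) gives the bidirectional variant used by BERT-style encoders.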

3. SEQUENTIAL RECOMMENDATION ALGORITHMS

3.1. Experience-based sequential recommendations

multi-behavior recommendation
Goal : predict the user's next behavior.

3.1.2. DL-based

NMTR : [20]
- Uses cascade prediction to capture the relations between behavior types

3.2. Transaction-based sequential recommendations

Only one behavior type exists (click or buy)
Considers the sequential relationship between items and the user preference

3.2.1. Rnn-based models

3.2.1.1. GRU4Rec-based models

GRU4Rec
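- The first paper to apply an RNN to the sequential recommendation task
- Does not use user information.
- Model structure
  - input layer
    - the item embedding
    - (or) a weighted average of the embeddings of several items
- Training
  - session-parallel mini-batch
    - Runs several sessions side by side in one mini-batch; when a session ends, the next available session takes over its slot, so sessions of different lengths need no padding.
  - popularity-based negative sampling
    - Ranks all items by popularity and builds the negative samples from that ranking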
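
The session-parallel mini-batch idea can be sketched as a generator. This is a simplified reading of the scheme, not the original GRU4Rec code: `session_parallel_batches` is a hypothetical name, sessions are plain lists of item ids, and the sketch simply stops as soon as a slot cannot be refilled.

```python
def session_parallel_batches(sessions, batch_size):
    """Yield (inputs, targets, reset_slots) for each training step.

    Each of the batch_size slots walks through one session; the target is
    the next item of that session. reset_slots lists the slots that just
    started a new session, where the RNN hidden state should be reset.
    """
    sessions = [s for s in sessions if len(s) >= 2]
    if len(sessions) < batch_size:
        raise ValueError("need at least batch_size sessions with 2+ items")
    slots = list(range(batch_size))   # session index held by each slot
    pos = [0] * batch_size            # current position inside that session
    next_session = batch_size         # next unused session
    reset = list(range(batch_size))   # every slot starts fresh
    while True:
        inputs = [sessions[s][p] for s, p in zip(slots, pos)]
        targets = [sessions[s][p + 1] for s, p in zip(slots, pos)]
        yield inputs, targets, reset
        reset = []
        for k in range(batch_size):
            pos[k] += 1
            if pos[k] + 1 >= len(sessions[slots[k]]):
                # Session in slot k is used up: hand the slot to the next one.
                if next_session >= len(sessions):
                    return
                slots[k] = next_session
                pos[k] = 0
                next_session += 1
                reset.append(k)


batches = list(session_parallel_batches([[1, 2, 3], [4, 5], [6, 7, 8, 9]], batch_size=2))
for step in batches:
    print(step)
```

In this toy run the second slot finishes session [4, 5] after one step and is refilled with [6, 7, 8, 9], which is why slot 1 appears in reset_slots at the second step.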
Improvements
- Improved training
  - facilitating training (76)
    - Uses data augmentation
  - dwell time (8)
    - Improves how mini-batches are built
  - Additional sampling for negative sampling (26)
    - Improves popularity-based negative sampling
- Adding item information
  - Adds text descriptions or image information
  - p-rnn (28)
    - Uses the click sequence together with features of the clicked items

3.2.1.2. With user representation

3.2.1.3. Context-aware sequential recommendation

Context information
- input context
  - Information attached to a single user behavior (age, gender, location, time, weather)
- transition context
  - Information about the gap between user behaviors (time interval)
ARNN (71)
- Adds user-side context information to an RNN
- Proposes a way to add user context information to RNN-based recommendation models

3.2.2. CNN-based models

Weaknesses of RNN models
- Work well only on short sequences
- expensive computing costs
CNN models
- Why use them?
  - An attempt to overcome the weaknesses of RNNs
  - A structure well suited to capturing long-term preference
- Related work
  - 3d-cnn (78)
    - Designs embedding matrices for item ID, name, and category
  - caser (77)
    - Treats the embedding matrix of the previous items as an image
    - Uses horizontal and vertical layers together to capture point-level and union-level sequential patterns
  - cnn-rec (29)
    - Similar to (77) but without the vertical convolution
  - NextItNet (95)
    - Uses residual blocks to capture both short- and long-term dependencies

3.2.3. Attention-based models

3.2.3.1. Vanilla Attention

Uses encoder-decoder style attention to filter out noise inside the sequence (e.g., items clicked by mistake)

3.2.3.2. Self Attention

SASRec (98, 34)
Bert4Rec (19)
(9) : aims for diversity in recommendations

3.3. Interaction-based Sequential Recommendation

Because different behaviors (click/buy) are distinguished, the models are more complex.

3.3.1. Rnn-based models

(78)
(37) : splits a session into target / supporting sequences
- target behavior (purchase)
  - carries more efficient information for the prediction task
- the rest (click, wishlist, etc.)
  - used as the supporting sequence for predicting the target behavior
(41) : splits the model into 2 components
1. neural item embedding
2. discriminative behavior learning
  - utilizes all types of behavior
(45) : models multi-behavior sequences
(69) : modifies the RNN structure using context information

3.3.2. Other models

Attention
- AtRank (103)
  - Uses self attention and vanilla attention together
  - self attention : among behaviors
  - vanilla attention : different behaviors
- CASN (30)
  - Improves on AtRank

3.4. Concluding remarks

RNNs and attention mechanisms are the mainstream of the sequential recommendation task.
Interaction-based tasks still have open issues.
1. Behavior type and item are treated almost identically
  - AtRank applies the same attention score to behavior type and item
2. Different behavior types are often not clearly distinguished.
3. Correlations between behaviors are sometimes ignored.

4. Influential Factors on DL-based models

4.1. Input Module

figure 10
table 2

4.1.1. Side Information

Adding item-related information (image/text)
- p-RNN (28), (30)
Adding transaction-related information (dwell time)

4.1.2. Behavior Type

Different behaviors signal different intentions, so they need to be separated.
- buy is useful for capturing long-term preference
- other behaviors such as click are useful for capturing short-term interest
Ways to handle different behaviors
- CBS (37)
  - Trains on a target sequence that contains buy and a supporting sequence with the rest
- BINN (20)
  - Uses all behaviors when capturing short-term interest
  - Uses only buy when capturing long-term preference
- (103)
  - Learns an embedding vector for each behavior type and concatenates it
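
The "embed the behavior type and concatenate" idea looks roughly like this. It is a sketch under assumptions: the tables are random instead of learned, the sizes (100 items, 8 and 4 dimensions) are arbitrary, and `make_table` / `embed_behavior` are hypothetical helper names.

```python
import random


def make_table(n_rows, dim, seed):
    """Random stand-in for a learned embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]


BEHAVIOR_TYPES = {"click": 0, "add-to-cart": 1, "buy": 2}
item_table = make_table(100, 8, seed=0)                   # item id -> 8-dim vector
type_table = make_table(len(BEHAVIOR_TYPES), 4, seed=1)   # type -> 4-dim vector


def embed_behavior(behavior_type, item_id):
    """Concatenate the item embedding with the behavior-type embedding."""
    return item_table[item_id] + type_table[BEHAVIOR_TYPES[behavior_type]]


vec = embed_behavior("buy", 5)
print(len(vec))
```

The same item gets a different input vector depending on whether it was clicked or bought, which is what lets the downstream model tell the two apart.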

4.2. Data Processing

4.2.1. Embedding Design

The step that turns items, users, and sessions into vectors
Examples
- (21) : uses word2vec for item embedding in an e-commerce application
- (41) : w-item2vec
  - An item vector representation based on the skip-gram model
- (90) : session embedding

4.2.2. Data Augmentation

(76) : uses the original input sessions as training sequences
- Improved GRU4Rec performance by 14.7%.
(73) Dropout
- For xs -> y, removes part of xs
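
Both augmentation ideas fit in a few lines. The sketch assumes that "using the original session as training sequences" means generating every prefix of a session as an input with the following item as the label, which is one common reading and not confirmed by these notes. `prefix_augment` and `input_dropout` are hypothetical names.

```python
import random


def prefix_augment(session):
    """Turn one session into (input prefix, target item) training pairs."""
    return [(session[:t], session[t]) for t in range(1, len(session))]


def input_dropout(prefix, drop_prob, rng):
    """Randomly remove items from the input xs of an xs -> y pair."""
    kept = [item for item in prefix if rng.random() >= drop_prob]
    # Never return an empty input: fall back to the most recent item.
    return kept if kept else [prefix[-1]]


pairs = prefix_augment([1, 2, 3, 4])
print(pairs)

rng = random.Random(0)
noisy = [(input_dropout(xs, 0.25, rng), y) for xs, y in pairs]
print(noisy)
```

Dropout here is applied to the input sequence itself, which is different from dropout on hidden units inside the network.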

4.3. Model Structure

4.3.1. Incorporating Attention Mechanisms

Using attention helps.

4.3.2. Combining with Conventional Methods

Ways of merging conventional (traditional) methods with DL

4.3.3. Adding Explicit User Representation

Explicitly learns the user's disposition (long-term preference)
2 approaches
1. user embedded models
  - Learn the user representation through embeddings
  - Weaknesses
    - cold-start
    - Cannot reflect the user's dynamic preference.
  - In the end the recurrent approach works better.
2. user recurrent models
  - Learn the user representation with a recurrent component
  - (56) : an RNN framework that learns the user's long-term preference from the behavior sequence

4.4. Model Training

Describes the training strategy (how training is run)

4.4.1. Negative sampling

popularity-based sampling
- If a user has no interaction with a popular item, assume the user dislikes that item
uniform sampling
additional sampling (26)
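- Negative sampling formula : item i is drawn with probability proportional to supp_i**alpha (supp_i : support of item i, i.e., how many interactions it has)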
- alpha == 0 : uniform sampling
- alpha == 1 : popularity sampling
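
A small sketch of sampling proportional to supp_i**alpha, using only the standard library. The interaction log, the helper names `sampling_weights` / `sample_negatives`, and the choice to exclude items the user has already seen are assumptions made for illustration.

```python
import random
from collections import Counter


def sampling_weights(interactions, alpha):
    """Return (items, weights) with weight_i = supp_i ** alpha."""
    support = Counter(interactions)   # supp_i: number of interactions per item
    items = sorted(support)
    weights = [support[i] ** alpha for i in items]
    return items, weights


def sample_negatives(interactions, seen, alpha, n, rng):
    """Draw n negatives (with replacement) among items the user has not seen."""
    items, weights = sampling_weights(interactions, alpha)
    candidates = [(i, w) for i, w in zip(items, weights) if i not in seen]
    cand_items = [i for i, _ in candidates]
    cand_weights = [w for _, w in candidates]
    return rng.choices(cand_items, weights=cand_weights, k=n)


log = ["a", "a", "a", "b", "c", "c"]
print(sampling_weights(log, 0))    # alpha == 0: every item has weight 1 (uniform)
print(sampling_weights(log, 1))    # alpha == 1: weight equals popularity
print(sampling_weights(log, 0.5))  # in between
negatives = sample_negatives(log, seen={"a"}, alpha=1, n=5, rng=random.Random(0))
print(negatives)
```

Values of alpha between 0 and 1 interpolate between the two extremes, which is the knob the additional-sampling approach tunes.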

4.4.2 Mini-batch Creation

Session-Parallel mini-batch
Variants
1. item boosting
2. user-parallel mini-batch

4.4.3 Loss Function Design.

(omitted)

5. EMPIRICAL STUDIES ON INFLUENTIAL FACTORS

Runs experiments on the influential factors described in the paper

5.1 Experimental Settings

5.1.1 Datasets

RecSys15 / RecSys19 / LastFM

5.1.2 Model

Experimental method
- GRU4Rec is the base model, and influential factors are added one at a time
Influential Factors
1. input module
  - item category
    - adds a category embedding
  - dwell time
    - paper (8)
  - Behavior type
    - adds behavior type information as an embedding
2. Data processing module
  - data augmentation
3. model structure
  - NARM : GRU4Rec with attention added
  - weighted model
    - (31)
    - DL + KNN
  - Adding user representation
    - adds an embedding layer for the user ID
4. Training module
  - loss
  - sampling method
  - size of negative samples

5.1.3 Evaluation Metrics

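Recall : whether the target item appears in the recommendation list
MRR : measures how high the target item ranks in the recommendation list
MAP : the higher the target items rank in the recommendation list, the higher the score
NDCG : how closely the ranking of the recommendation list matches that of the target list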
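
For the common next-item setting with exactly one target item per prediction, the metrics reduce to short functions. The sketch below assumes that single-target case (in which MAP coincides with MRR, so it is not listed separately); the function names are made up for this note.

```python
import math


def recall_at_k(ranked, target, k):
    """1 if the target is among the top-k recommendations, else 0."""
    return 1.0 if target in ranked[:k] else 0.0


def mrr_at_k(ranked, target, k):
    """Reciprocal of the target's rank in the top-k list, 0 if absent."""
    top = ranked[:k]
    return 1.0 / (top.index(target) + 1) if target in top else 0.0


def ndcg_at_k(ranked, target, k):
    """NDCG with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    top = ranked[:k]
    return 1.0 / math.log2(top.index(target) + 2) if target in top else 0.0


ranked = [10, 20, 30, 40]   # recommendation list, best first
target = 30                 # the item the user actually interacted with next
print(recall_at_k(ranked, target, 3), mrr_at_k(ranked, target, 3), ndcg_at_k(ranked, target, 3))
```

Reported scores are the averages of these per-prediction values over all test cases.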

5.2 Results

5.2.1 Input Module

Side Information
- Using item category / dwell time did not hurt in either case.
Behavior Type
- No large difference.
- This may be an artifact of the datasets used.

5.2.2 Data Processing

Using augmentation or not made no large difference.

5.2.3 Model Structure

Adding an attention mechanism
- Improved
Combining with conventional methods (combining with KNN)
- Improved
Adding user representation
- Got worse in the paper's experiments.
- This may be an artifact of the datasets

5.2.4 Model Training

sampling method.
negative sample size
- Fig. 15.
- The more negative samples, the better the performance.
  - Beyond 32 the improvement flattens out
- This trades off against training time, so pick a reasonable size.
loss function
- BPR-max, Top1-max, cross-entropy > BPR, Top1
  - cross-entropy was not bad either.
- Results differed slightly by dataset.
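
For reference, the pairwise ranking losses named above can be written for a single positive item and its sampled negatives. These follow the commonly cited definitions as recalled here, not formulas copied from the paper, so check them against the original papers before relying on them. `pos_score` is the model score of the target item and `neg_scores` are the scores of the negative samples; both names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def bpr_loss(pos_score, neg_scores):
    """BPR: mean of -log sigmoid(r_pos - r_neg) over the negatives."""
    return -sum(math.log(sigmoid(pos_score - r)) for r in neg_scores) / len(neg_scores)


def top1_loss(pos_score, neg_scores):
    """TOP1: mean of sigmoid(r_neg - r_pos) + sigmoid(r_neg ** 2)."""
    return sum(sigmoid(r - pos_score) + sigmoid(r * r) for r in neg_scores) / len(neg_scores)


def bpr_max_loss(pos_score, neg_scores, reg=0.0):
    """BPR-max: negatives are weighted by the softmax of their own scores."""
    s = softmax(neg_scores)
    likelihood = sum(w * sigmoid(pos_score - r) for w, r in zip(s, neg_scores))
    penalty = reg * sum(w * r * r for w, r in zip(s, neg_scores))
    return -math.log(likelihood) + penalty


negatives = [0.0, 1.0]
print(bpr_loss(2.0, negatives), top1_loss(2.0, negatives), bpr_max_loss(2.0, negatives))
```

The softmax weighting in the -max variant concentrates the loss on the highest-scoring negatives, so the gradient does not shrink as more low-scoring negatives are added.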

5.2.5 Concluding Remarks

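Suggestions from the paper
1. try all possible side information (including images and text)
2. Consider the relation between behavior type and target behavior.
  - If the goal is purchase behavior, click behavior can act as noise.
3. During training
  - Use data augmentation.
    - The gain in the experiments was marginal, though.
  - For the loss, use TOP1-max, BPR-max, or cross-entropy
  - A larger negative sample size is better but has to be balanced against training time.
4. Try adding attention and user representation.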

6. Future Directions and Conclusions

6.1 Future Directions

Open Issue
- Objective and comprehensive evaluations across different models
  - A comprehensive comparison of model structures is needed.
- More designs on embedding methods
  - More research on embedding methods is needed.
    - (paper 43) : dependency relationships among items and their attributes.
- Advanced sampling strategies.
  - Sampling in sequential recommendation is simpler than sampling in NLP.
- Better modeling user long-term preference
- Personalized recommendation based on polymorphic behavior trajectory
- Learning behavior sequences in real time.
- Sequential recommendation for specific domains

6.2 Conclusion

Classifies sequential recommendation algorithms into 3 types
Summarizes the influential factors and reports experimental results