Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

** Abstract:

"Based on synonymous substitution strategy, we introduce a new word replacement order determined by both the word saliency and the classification probability, and propose a GREEDY algorithm called **probability weighted word saliency (PWWS) for text and adversarial attack. .. Performing adversarial training using our perturbed datasets improves the robustness of the models. At last, our method also exhibits a good transferrability on the generated ed. ex."

** Intro.: PWWS considers the word saliency as well as the classification probability.

Word Saliency: how much the original word affects the classification, measured by the change in the true-class probability when that word is removed (replaced with an out-of-vocabulary token). The change in classification probability is then used to measure the attack effect of a proposed substitute word.
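A minimal sketch of this saliency computation, assuming a hypothetical `model_prob(words, y_true)` helper that returns the victim classifier's probability for the true class (masking with `<unk>` follows the paper's idea of replacing a word with an out-of-vocabulary token):

```python
# Sketch: word saliency as the drop in the true-class probability when a word
# is masked out. `model_prob` is an assumed helper around the victim model.

def word_saliency(words, y_true, model_prob, unk_token="<unk>"):
    """Return S(x, w_i) = P(y_true | x) - P(y_true | x with w_i -> <unk>) for each i."""
    p_orig = model_prob(words, y_true)
    saliencies = []
    for i in range(len(words)):
        masked = words[:i] + [unk_token] + words[i + 1:]
        saliencies.append(p_orig - model_prob(masked, y_true))
    return saliencies
```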

** Related Work:

Various existing methods need to calculate gradients, which requires access to the model structure, model parameters, and the feature set of the inputs. These are white-box attacks.

There is still plenty of room for improvement in the percentage of modifications, the attack success rate, and the preservation of lexical and grammatical correctness and semantic similarity.

** Text Classification Attack:

Add a small, imperceptible perturbation Δx to the input x so that the classifier yields the wrong label. For text, the perturbed sample must remain grammatically correct and semantically close to the original to escape human perception.
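Restated in standard notation (a generic formalization of the untargeted setup, not copied verbatim from the paper):

```latex
% Untargeted adversarial example: a small perturbation that flips the prediction
F(x) = y_{\text{true}}, \qquad
F(x + \Delta x) \neq y_{\text{true}}, \qquad
\lVert \Delta x \rVert < \epsilon
```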

Replace words in the input text with synonyms, and replace named entities (NEs) with other named entities of the same type.
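The paper draws synonym candidates from WordNet; a small illustrative sketch using NLTK's WordNet interface (the helper name is mine, and `nltk` plus the WordNet corpus are assumed to be installed) might look like:

```python
# Sketch: build a synonym candidate set for a word from WordNet lemmas.
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def synonym_candidates(word):
    """Collect WordNet lemma names that could replace `word`, excluding the word itself."""
    candidates = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                candidates.add(name)
    return candidates
```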

** Word Substitution Strategy: (verbatim)

The change in classification probability between x and x_i^* represents the best attack effect that can be achieved after replacing w_i:

$$\Delta P_i^* = P(y_{true} \mid x) - P(y_{true} \mid x_i^*) \tag{5}$$
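A sketch of this substitution step, reusing the hypothetical `model_prob` and `synonym_candidates` helpers from above: among a word's candidates, keep the replacement that maximizes the probability drop in Eq. (5).

```python
# Sketch: for position i, pick the synonym that maximizes the drop in the
# true-class probability (delta P_i* in Eq. 5).

def best_substitution(words, i, y_true, model_prob, synonym_candidates):
    p_orig = model_prob(words, y_true)
    best_word, best_drop = words[i], 0.0
    for cand in synonym_candidates(words[i]):
        perturbed = words[:i] + [cand] + words[i + 1:]
        drop = p_orig - model_prob(perturbed, y_true)  # delta P_i*
        if drop > best_drop:
            best_word, best_drop = cand, drop
    return best_word, best_drop
```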

** Replacement Order Strategy:

First calculate the word saliency vector S(x) for the text x. Then score each word with the function H(x, x_i*, w_i), which weights the probability change ΔP_i* by the softmax of the word saliency, and replace words greedily in descending order of this score until the prediction changes (see the sketch below).
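A compact sketch of the overall greedy procedure under the same assumptions (hypothetical `predict` helper returning the model's predicted label; `word_saliency` and `best_substitution` as sketched above):

```python
# Sketch: PWWS replacement order. Score each position by
# H(x, x_i*, w_i) = softmax(S(x))_i * delta_P_i*, then replace words greedily
# in descending order of H until the predicted label changes.
import numpy as np

def pwws_attack(words, y_true, model_prob, predict, synonym_candidates):
    saliency = np.array(word_saliency(words, y_true, model_prob))
    softmax_sal = np.exp(saliency) / np.exp(saliency).sum()

    scored = []
    for i in range(len(words)):
        sub, drop = best_substitution(words, i, y_true, model_prob, synonym_candidates)
        scored.append((softmax_sal[i] * drop, i, sub))   # H score for position i

    adversarial = list(words)
    for _, i, sub in sorted(scored, reverse=True):       # highest H first
        adversarial[i] = sub
        if predict(adversarial) != y_true:               # stop once the label flips
            break
    return adversarial
```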

** Empirical Evaluation: Evaluated on text classification. Tested on IMDB, AG's News, and Yahoo! Answers, using a word-based CNN, a bidirectional LSTM, a char-based CNN, and an LSTM.

** Result:

This method achieves the largest reduction in classification accuracy compared to the other methods tried, while also using very few word replacements. The word replacement rate refers to the number of substituted words divided by the total number of words in the original clean sample texts.