Abstract

Language models have become so capable, and are improving so quickly, that distinguishing human-written from machine-generated text is increasingly difficult. This project investigates whether GAN (Generative Adversarial Network) style models can be used for text generation and discrimination, leveraging vector quantization to encode discrete grammatical structures. Our approach employs an OpenAI-powered generator and a BERT-based discriminator. Model performance is judged on information retention scores, lexical richness, and classification accuracy. Our generator achieved an average information retention score of 0.677; however, our discriminator achieved a 100% accuracy rate, which we attribute to a poorly selected dataset. This project highlights both the potential and the challenges of using GANs in natural language processing. In the future, this approach could be improved with a more pertinent dataset, enhancements to the generator’s updating, and an independent generative model.

What this project is about

Generative Adversarial Networks (GANs) are a deep learning architecture that pits two neural networks, a generator and a discriminator, against each other so that both train simultaneously. We use this architecture to explore whether a discrete, grammatical representation of a sentence can be used to reconstruct another version that retains all of the original meaning, and whether this method can be as successful as continuous inputs in NLP models. In other words, this project explores grammar, rather than individual word embeddings, for text generation and meaning retention.

In the case of this project, the generator will get better at creating human-like text as it will be able to see how to fool the discriminator. The inputs to the generator will be a human-written sentence and a sentence grammatical structure with the output being a brand-new sentence that maintains the meaning of the original and has the inputted structure. The discriminator will get better at detecting machine text as it can learn from which guesses it gets right and wrong. The input for the discriminator will simply be the generator’s sentences, with the output being a probability that said sentence was machine-generated.

First, we must obtain a set of inputs - a parallel set of human-written and machine-generated text. At a high level, we took the TripAdvisor hotel reviews dataset, passed each review into a ChatGPT query (via the API), and regenerated the review as machine-generated text. Both the review and its machine-generated version were then added to a CSV file. We then encoded the inputs by using spaCy to extract their grammatical features.

For the generator, we need a model for generating sentences given a grammatical structure and a group of words. While it would be ideal to train our own generative model, that task is outside the scope of our project. Because of this, we settled on using the OpenAI API to generate text. As stated earlier, there are two inputs for the generator - a text and a sentence structure. We developed a prompt and passed both inputs into it to get the output we desired (a new sentence with the new structure that retains the meaning of the input text).

We use vector quantization to encode the inputs to the generator, focusing on grammatical features. Vector quantization approximates a large dataset with a smaller, fixed-size representative set called a codebook. Each tokenized input is quantized, that is, mapped to the nearest vector in the codebook. For our goal of generating AI text that retains all the original meaning of the human-written sample, our encodings contain the sentence structure information of a sentence, and the codebook contains the most common grammatical structures. The quantizer is pre-trained on the text corpora to extract grammatical structures from sentences. Thus, we have a fixed-size codebook of common grammatical structures that can be accessed by applying vector quantization to sentences. We also store every possible pair of sentence structures in the codebook alongside a probability, so that every structure has its own probability distribution over the structures that best retain its meaning. As a result, our codebook and quantizer can not only quantize input sentences but also find strong pairings between quantized sentences that work well for paraphrasing.
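The nearest-entry lookup described above can be sketched in a few lines. This is a minimal illustration, not the project's quantizer: the function name `quantize`, the toy two-dimensional codebook, and the uniform `pairing` table are all assumptions made for the example.

```python
import math

def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector` (Euclidean)."""
    distances = [math.dist(vector, entry) for entry in codebook]
    return min(range(len(codebook)), key=distances.__getitem__)

# Toy 2-D codebook with three entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical pairing table: pairing[i][j] is the probability that
# structure j is a good paraphrase target for structure i.
# Every row starts out as a uniform distribution.
n = len(codebook)
pairing = [[1.0 / n] * n for _ in range(n)]

# The point (0.9, 1.2) lies closest to the entry at index 1.
index = quantize((0.9, 1.2), codebook)
```

In the project itself the vectors would be projected sentence-structure encodings rather than hand-picked points.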

The discriminator’s only goal is to differentiate between machine-generated text and human-written text - it has no job in understanding or computing the ‘information’ or ‘reconstruction’ loss. It will be a binary classifier pre-trained on a subset of the training corpora.

To measure success, we will have two baselines. First, we will compare our final GAN against a BERT model from a Pace University study on discriminating AI-generated text, which reached 93% accuracy. This serves as a baseline for both models we create: we can compare our discriminator's accuracy to it directly, and we can evaluate our generator by checking whether its outputs are detected more or less often than our original machine-generated inputs. Second, we will measure information loss in the generator’s paraphrased output of human text using our information loss model. Other measures to consider are using models like GPT2 to score the outputs for perplexity/semantic complexity, and separately considering the information loss score. These measures will help show how “natural-sounding” and information-retaining our generator is.

Progress made so far

Approach

Main Approach
- Starting with the “tripadvisor-hotel-reviews” dataset that was presented in a previous homework assignment, we have created a dataset of human writing samples and corresponding writing samples from the AI. We first took the dataset and siphoned off the human-generated reviews as a list of strings. Next, we had to create their respective machine-generated texts. To do this, we leveraged OpenAI’s API to query ChatGPT with the following query: “[p]lease rewrite the following text so that the meaning is preserved, but the wording is changed: {input_text}. Additionally, structure the sentence according to this pattern: "{sentence_structure}" ” where the input text is replaced with a review. The query results in a machine-generated version of said review which is added to a list. This list is finally appended to a CSV file along with their respective original human-written reviews.
- We created a quantizer that extracts the grammatical structure of a sentence and maps it to one of the grammar categories in its fixed-length codebook, which is made up of the n most common grammatical structures (found via a k-means algorithm). Given a sentence, the quantizer uses spaCy to extract its relevant grammatical features and represents the sentence as a dynamically-sized vector of dependency-POS (part of speech) pairs. This vector is then converted into a numerical vector: we map the pairs to a vector of integers, which we linearly transform into a fixed-length vector (of dimension 1 × d) via a trained projection layer. The codebook is then learned through k-means clustering, where the number of clusters is the number of entries in the codebook, and sentences with similar grammatical structures are clustered together. When a new sentence is processed, its structure vector is compared to the codebook, and the quantizer outputs the nearest cluster using Euclidean distance.
- We have a discriminator that discriminates between AI and human text. It’s a BERT-based binary classifier that is trained on a subset of our training data.
- We have an information loss model that takes in two sentences A and B, and evaluates how much of A’s original meaning B retained. We created the model by downloading the pre-trained RoBERTa model and fine-tuning it on the STS-B dataset, which contains pairs of sentences and their meaning similarity scores.
- We have combined our generator and quantizer to create a complete generator. First the trained quantizer takes in a human sentence and returns its closest sentence structure for its AI-counterpart. The generator takes in this structure and the original sentence to return an AI-written version of the sentence. More specifically, the generator takes a sentence structure and a human-written sentence as input, passing said input into a query requesting that the sentence be rewritten while maintaining the original sentence meaning while adhering to the inputted structure. We also pass in an example transformation as well as what the structure encodings pertain to (e.g. “‘[a]rticle’ means an article (a, an, the) must be used here”). The output is then a new sentence that has the structure and meaning from both inputs respectively.
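The prompt-filling step in the generator can be sketched as follows. The template text is the query quoted in the first bullet above; the function name `build_prompt` and the choice to join structure tags with spaces are assumptions for illustration, and the OpenAI API call itself is deliberately left out.

```python
# Template taken from the query quoted in this section.
PROMPT_TEMPLATE = (
    "Please rewrite the following text so that the meaning is preserved, "
    "but the wording is changed: {input_text}. Additionally, structure the "
    'sentence according to this pattern: "{sentence_structure}"'
)

def build_prompt(input_text, structure_tags):
    """Fill the rewrite prompt with a sentence and a space-joined structure."""
    return PROMPT_TEMPLATE.format(
        input_text=input_text,
        sentence_structure=" ".join(structure_tags),
    )

# Hypothetical helper usage; the returned string would be sent to the API.
prompt = build_prompt(
    "The cat jumps quickly over the fence.",
    ["Article", "Noun", "Verb", "Adverb",
     "Preposition", "Article", "Noun", "Punctuation"],
)
```

Keeping prompt construction separate from the API call makes it possible to inspect or log the exact query before any request is made.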
Baselines
- We evaluate the success of our codebook in learning a fixed number of the most common sentence structures by visualizing the learned clusters and calculating the average silhouette score. The silhouette score ranges from -1 to 1 and shows how close each data point is to its own cluster compared to how close it is to other clusters. A score below 0 indicates misclassification, meaning the data points are closer to a cluster other than their assigned one; a score below 0.3 is considered weak clustering; and a score above 0.5 indicates strong clustering. The graph and the score will indicate whether our model learned distinct sentence structures for paraphrasing that correspond to specific original sentence structures.
- We will evaluate the discriminability of our model against the BERT model from the paper Detecting AI Generated Text Based on NLP and Machine Learning Approaches, which was also trained for AI text discrimination on a similar but larger dataset and achieved 93% accuracy.
- We will evaluate the meaning retention of our model against the average score of 0.9 that our information loss model assigned to manual samples (reworded sentences with the exact same meaning). Our information loss model is trained as a simple regression, so we can calculate an MSE against the expected score of 1 for all of the generator’s outputs compared with the original sentences.
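To make the silhouette score concrete, here is a small pure-Python sketch of the standard definition, (b - a) / max(a, b) averaged over all points. It is an illustration only and is not claimed to be the code used for the reported scores; it assumes Euclidean distance and at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself contributes 0 to the sum)
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # well-separated clusters
bad = silhouette_score(points, [0, 1, 0, 1])   # deliberately wrong labels
```

With the well-separated labelling the score lands above the 0.5 "strong" threshold, while the deliberately wrong labelling produces a negative score.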
Novelty
- The use of vector quantization to learn text generation is novel because it allows for discrete representations of grammatical structures, while most NLP models use continuous representations. Instead of giving every sentence a unique encoding, the encodings are limited by the number of clusters, and each sentence gets mapped to its nearest encoding. We based our quantizer on the one used in the paper Neural Discrete Representation Learning, which used [vector quantization](https://arxiv.org/abs/1711.00937) and discrete instead of continuous variables to avoid posterior collapse while learning just as well as continuous models.
- Below is a visualization of VQ, where each datapoint is mapped to its closest representation (the red dot in the same blue frame) from the fixed codebook (the set of red dots).

VQ image

Experiments

Data
- For the data, we wanted a CSV file containing a large number of human-written texts and machine-generated versions of the same texts. Because of the nature of the project (discriminating between human and machine-generated text), we wanted a dataset that was written with voice and style. Therefore, we didn’t want a dataset like the Wikipedia one, as its texts don’t have human quirks such as voice. We believed the TripAdvisor hotel reviews dataset had the human voice and style that we required along with ample volume to train on. From there, we took the dataset and siphoned off the human-generated reviews as a list of strings. Next, we had to generate their respective machine-generated texts. To do this, we leveraged OpenAI’s API to query ChatGPT with the following query: “[p]lease rewrite the following text so that the meaning is preserved, but the wording is changed: [INPUT TEXT]” where the input text is replaced with a review. The query results in a machine-generated version of said review, which is added to a list. This list is finally appended to a CSV file along with the respective original human-written reviews.
- We used the [STS-B dataset](https://huggingface.co/datasets/sentence-transformers/stsb) to train the information loss model. This dataset is well suited to training information loss, because each sample is a pair of sentences labelled with a meaning similarity score between 0 and 1. For example, (“A plane is taking off.”, “An airplane is taking off.”) scores 1, but the slightly different pair (“A man is playing a large flute.”, “A man is playing a flute.”) scores 0.76. This data is used to train the information loss model, which in turn evaluates our generator’s text for meaning retention.
Evaluation method
We evaluate the entire GAN by running the GAN results through our information loss and discriminator model. We evaluated each of these models individually on their evaluation sets. The quantizer component is evaluated by visualizing its clusters and calculating its silhouette score. Below are details on the GAN evaluation specifically:
- To compare against the BERT 93% baseline, we will set aside part of our dataset as evaluation data, and for every test sample, we will feed the generator the human writing and have it generate the AI version. Then we will calculate the percentage of generated text the discriminator marks as AI.
- To check for information loss, we will run the test output through the information loss model. The baselines are 0 for no meaning retention, 0.5 for partial meaning retention, and 0.9-1 for perfect meaning retention.
- We will also compare the average Lexical Richness score from our generated samples to their human-written counterparts, which can be used to measure how “natural-sounding” our generator’s text is.
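The two simpler metrics above can be sketched directly. The report does not say how lexical richness was computed, so the type-token ratio used here is an assumption (it is one common definition), and both function names are hypothetical.

```python
def type_token_ratio(text):
    """Unique tokens divided by total tokens, on lowercased whitespace tokens.

    Assumption: lexical richness is approximated by the type-token ratio.
    """
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def discrimination_rate(predictions):
    """Fraction of generated samples the discriminator labels as AI (True)."""
    return sum(predictions) / len(predictions) if predictions else 0.0

# 9 tokens, 7 of them unique ("the" and "was" repeat).
richness = type_token_ratio("the hotel was clean and the staff was friendly")

# Three of four hypothetical generator outputs flagged as AI.
rate = discrimination_rate([True, True, False, True])
```

A type-token ratio depends on text length, so comparisons are most meaningful between texts of similar size, such as a review and its paraphrase.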
Experimental details
- For the quantizer, we created a SentenceStructureQuantizer model that takes in 2 hyperparameters: the codebook size (number of clusters) and the number of dimensions for the encoded vectors. Each sentence is first encoded as its spaCy-extracted sentence structure (example: “I am cold” - ['nsubj-PRON', 'ROOT-AUX', 'acomp-ADJ']), then converted to a numerical vector. We trained a projection layer to reduce the vectors to the dimension hyperparameter and used k-means clustering to get the final clusters. This becomes our fixed codebook, which maps a quantized sentence (its projected numerical vector) to its best sentence structure for a paraphrase. We tuned the hyperparameters by testing 11 different (codebook size, dimensions) configurations ranging from (16,8) to (256,32). The quantizers were trained on the TripAdvisor dataset. Each model was trained for 2 epochs with a 1e-3 learning rate and the Adam optimizer.
- Regarding the generator, we need a model for generating sentences given a grammatical structure and group of words. We took our two inputs for the generator (a text and a sentence structure) and passed them into the following OpenAI prompt: “Please rewrite the following text so that the meaning is preserved, but the wording is changed. Please feel free to generate a sentence that is longer than the original and one that uses new/different words. Additionally, structure the sentence according to this pattern: {sentence_structure}. Each word in the rewritten sentence must strictly follow the given order: “Verb” means a verb must be in this position. “Article” means an article (a, an, the) must be used here. “Adverb” means an adverb must be placed here. “Adjective” means an adjective must be placed here. “Noun” means a noun must be placed here. “Preposition” means a preposition must be placed here. “Punctuation” could mean a sentence ending punctuation OR a comma if placed in the middle of the structure. Example transformation: Original: “The cat jumps quickly over the fence.” Structure: "Article Noun Verb Adverb Preposition Article Noun Punctuation" Rewritten: "A dog runs swiftly under a bridge." Now rewrite the following text: {original_text}” This prompt, as stated previously, is placed into a Python function with an OpenAI API call. The output is then a new sentence with the inputted text’s meaning with the inputted structure.
- We trained the discriminator using Google Colab’s T4 GPU and the TripAdvisor dataset. We used an 80/20 split of all the human and AI-generated sentences. We started with the pre-trained bert-base-uncased model, a transformer-based language model with 12 layers, 12 attention heads in each layer, and a vocabulary size of 30,522 tokens. We used 3 training epochs, a batch size of 8, and a weight decay of 0.01.
- We trained the information loss model using Google Colab’s T4 GPU. We started with the pre-trained RoBERTa-large model, which is a transformer-based language model with 24 layers, 16 attention heads, a vocabulary size of 50,265, and the GELU activation function. We used 3 training epochs, a batch size of 8, and a weight decay of 0.01.
- Per the name, the GAN employs adversarial learning: the generator attempts to produce text that the discriminator cannot distinguish from the human-written samples, while the discriminator improves its classification ability. Improvements in one model lead to iterative refinements in the other. During training, the only components that train are the discriminator and the codebook (the generator trains indirectly through the codebook). The discriminator trains as usual. The codebook learns to map one cluster/sentence structure to another cluster that best retains meaning, through a loss function that combines information loss and discriminator feedback (in order to preserve meaning and improve realism, respectively). The codebook’s training consists of learning an n x n probability matrix, where each entry (i,j) corresponds to the likelihood of sentence structure j being a good paraphrase for sentence structure i. We initialize the distribution evenly, and then select our input sentences from the training data. For a quantized sentence belonging to cluster i, we sample from codebook entry i’s probability distribution to pick a corresponding sentence structure j to try out. We feed sentence i and structure j into the generator. We combine the information loss and discrimination loss scores on the generator’s output to continually update the codebook’s learned distribution: if the loss is high, that pairing’s probability is decreased, and vice versa.
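The codebook update loop described in the last bullet can be sketched as below. The report does not give the exact loss weighting or update rule, so the equal weighting in `combined_loss`, the 0.5 threshold, the 10% multiplicative step, and the row renormalization are all assumptions chosen to keep the example small.

```python
import random

def uniform_matrix(n):
    """n x n matrix whose rows are uniform probability distributions."""
    return [[1.0 / n] * n for _ in range(n)]

def sample_target(matrix, i, rng):
    """Sample a target structure j from row i's distribution."""
    return rng.choices(range(len(matrix[i])), weights=matrix[i], k=1)[0]

def combined_loss(retention, p_machine, weight=0.5):
    """Hypothetical mix of meaning loss (1 - retention) and the
    discriminator's probability that the text is machine-generated."""
    return weight * (1.0 - retention) + (1.0 - weight) * p_machine

def update_row(matrix, i, j, loss, threshold=0.5, step=0.1):
    """Scale entry (i, j) down if loss is high, up otherwise, then
    renormalize row i so it remains a probability distribution."""
    factor = 1.0 - step if loss > threshold else 1.0 + step
    matrix[i][j] *= factor
    total = sum(matrix[i])
    matrix[i] = [p / total for p in matrix[i]]

rng = random.Random(0)
matrix = uniform_matrix(4)
j = sample_target(matrix, 0, rng)
# High retention and a low machine probability give a low loss,
# so the sampled pairing (0, j) is reinforced.
update_row(matrix, 0, j, combined_loss(retention=0.9, p_machine=0.1))
```

In a full training loop, `retention` would come from the information loss model and `p_machine` from the discriminator, both evaluated on the generator's output for the sampled structure.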
Results
- The information loss model has an evaluation MSE loss of 0.0758. Since the similarity scores are on a scale of 0 to 1, this MSE indicates decent performance. Below is a comparison of scores before and after fine-tuning. In particular, the information loss model calculates relative similarity well in spite of red herrings like identical sentence structures with differing meanings. The sentence pairs are arranged in order of expected similarity, and each is presented with its similarity score before and after fine-tuning.
  - ("The dog chased the cat.", "The cat was chased by the dog") | 0.01862064152956009 | 0.899785041809082
  - ("The dog chased the cat.", "The dog was chased by the dog") | 0.017950563132762908 | 0.5314149379730224
  - ("My name is Allie.", "My name is Alex.") | 0.022851869463920593 | 0.24938662052154542
  - ("My name is Allie.", "His name is Alex.") | 0.16397514343261718 | 0.022809723019599916
- The discriminator has an evaluation loss MSE score of 0.000191.
- For the quantizer, the (16,8) model had the best silhouette score, 0.301. This is plausible because fewer clusters and lower dimensionality reduce the risk of overfitting.
  - Silhouette Score for quantizer_n16_d8: 0.30076608061790466
  - Silhouette Score for quantizer_n16_d16: 0.18706172704696655
  - Silhouette Score for quantizer_n16_d32: 0.11842822283506393
  - Silhouette Score for quantizer_n16_d64: 0.05619918555021286
  - Silhouette Score for quantizer_n16_d128: 0.03889497369527817
  - Silhouette Score for quantizer_n32_d16: 0.17034225165843964
  - Silhouette Score for quantizer_n32_d32: 0.07735659927129745
  - Silhouette Score for quantizer_n32_d64: 0.03266051039099693
  - Silhouette Score for quantizer_n64_d32: 0.06313733011484146
  - Silhouette Score for quantizer_n128_d32: 0.05521371588110924
  - Silhouette Score for quantizer_n256_d32: 0.042372141033411026

A silhouette score of about 0.3 sits at our threshold for weak clustering, but interpreted alongside the graph, we attribute this to densely-packed clusters rather than weak ones, since the graph shows distinct, strong clustering.

After training the GAN on the training subset of the TripAdvisor dataset, our results on the evaluation set were as follows:
- Average information retention score (from the information loss model): 0.7190. This indicates decent information retention, falling within our 0.5-1 range for partial to perfect information retention. The example sentence pairs earlier in this Results section can also be used as a relative measure.
- Average lexical richness: 0.9245. This comes close to the human rows’ average score of 0.9880, indicating that our generator retains good vocabulary diversity.
- Average discrimination rate: 1. We attribute the discriminator’s 100% success rate to the many grammatical errors in the TripAdvisor dataset’s human rows, which make it too easy for the discriminator to identify anything grammatically correct as AI-generated. We explored training the discriminator and GAN on other datasets with grammatically correct human and AI sentences, such as ParaAMR (https://github.com/amazon-science/ParaAMR), which contains rows of human text and AI paraphrasing, but that discriminator only reached a 0.69 evaluation loss.
  Codebook loss function distributions: "0": [0.0,80.0,225.0,259.2,120.0,405.0,150.0,270.0,120.0,64.0,3844.3359375,162.0,911.25,1000000.0,100.0,150.0]
  We examined the codebook's learned distributions to see that the GAN training helped the codebook learn meaningful pairings between clusters. For example the above distribution for cluster/sentence structure 0 shows that it pairs strongest with sentence structure 13.
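As a quick check of that reading, the row for structure 0 can be normalized and its strongest pairing located. The weights are copied from the distribution above; treating them as unnormalized scores where a larger value means a stronger pairing is an assumption based on the interpretation given in the text.

```python
# Learned weights for cluster/sentence structure 0, copied from above.
weights = [0.0, 80.0, 225.0, 259.2, 120.0, 405.0, 150.0, 270.0, 120.0, 64.0,
           3844.3359375, 162.0, 911.25, 1000000.0, 100.0, 150.0]

# Normalize so the row can be read as a probability distribution.
total = sum(weights)
probabilities = [w / total for w in weights]

# Index of the strongest pairing for structure 0.
best = max(range(len(weights)), key=weights.__getitem__)
```

After normalization nearly all of the probability mass sits on index 13, so sampling from this row would select structure 13 almost every time.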
We have a completed generator (quantizer + generator) trained with the GAN. The quantizer takes in a sentence s, quantizes it, and samples the codebook’s probability distribution for a corresponding sentence structure to paraphrase s with. The generator generates a new sentence based on that sentence structure, and returns its information loss score compared to s and the discriminator’s result. Below is an example:

We feed this sentence into the quantizer: “Hospitals have systems of clinics around the Midwest.”

The quantizer returns its sentence structure:
- ['quantmod-NUM', 'punct-SYM', 'nummod-NUM', 'dep-NOUN', 'mark-SCONJ', 'det-DET', 'nsubj-NOUN', 'aux-AUX', 'ROOT-VERB', 'amod-ADJ', 'dobj-NOUN', 'prep-ADP', 'pobj-NOUN', 'prep-ADP', 'pcomp-VERB', 'punct-PUNCT']

The quantizer maps this sentence to its closest cluster center for paraphrase structures:
- ['nsubj-PRON', 'aux-AUX', 'ROOT-VERB', 'det-DET', 'dobj-NOUN', 'punct-PUNCT', 'dep-VERB', 'nsubj-PRON', 'ccomp-VERB', 'punct-PUNCT']

The generator takes in the original sentence and the given sentence structure and rewrites the sentence to: “The clinics are organized within the systems of hospitals throughout the Midwest.”

The discriminator says the rewritten text is AI-generated. The rewritten text's information retention score from its original sentence is 0.911.

Future Work

Develop a Standalone Generator: As described before, we currently leverage the OpenAI API to generate text with our generator. Developing a standalone generator (as we did with the discriminator) would better fit the GAN model, in which the two networks train off of each other.
Enhance Vector Quantization Encoding: Increasing the codebook size and reducing the overlapping structures within it could promote sentence diversity.
Improve Generator Updates: Currently, the generator updates with rule-based adjustments, which likely limit its flexibility and adaptability in learning meaningful transformations. It would be beneficial to integrate a loss function that dynamically scales updates based on information retention scores and discriminator feedback.
Rethink Dataset Selection/new way of training discriminator: As noted in the results, the average discrimination rate was 1.00, implying that it was very easy for the discriminator to identify machine-written text. We attribute this in large part to the dataset that was selected - the TripAdvisor Hotel Reviews. Because nearly all of the human-written text ignores grammatical structure and rules, while the machine-generated text follows strict grammar rules, it is very easy to tell human text from machine text. We also tried retraining the discriminator on the ParaAMR dataset, where the human and AI versions of text are both grammatically correct, but this resulted in a poor discriminator with a very high 0.69 evaluation loss. In a future project, it would be better to select a dataset that still has human voice but also follows grammatical rules, or to find a way to train a discriminator so that it can distinguish between grammatically correct human and AI text.

GANs in NLP: Enhancing Text Generation and Discrimination - minalee-research/cs257-students GitHub Wiki
