roberta-large-openai-detector

Overview

RoBERTa large OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa large model on the outputs of the 1.5B-parameter GPT-2 model. It is a classifier that predicts whether a piece of text was generated by a GPT-2 model. OpenAI released it together with the weights of the largest GPT-2 model, the 1.5B-parameter version.

The model's developers state that they developed and released the model to support research on synthetic text generation, so it could potentially be used for downstream tasks in that area. See the associated paper for further discussion.
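As a rough illustration of using the classifier outside the notebooks listed further down, the sketch below wraps the Hugging Face `transformers` text-classification pipeline. Everything in it is an assumption rather than something this page specifies: the helper name `detect` is hypothetical, and the Hub identifier `openai-community/roberta-large-openai-detector` is assumed to point at the same weights as the registry entry described here.

```python
from typing import Dict, List

# Assumed Hugging Face Hub id; this page only names the Azure ML registry entry.
DEFAULT_MODEL_ID = "openai-community/roberta-large-openai-detector"


def detect(texts: List[str], model_id: str = DEFAULT_MODEL_ID) -> List[Dict]:
    """Classify each text and return one {"label", "score"} dict per input."""
    # Imported lazily so that defining this helper does not require the library.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # truncation=True clips inputs that exceed the model's maximum sequence length.
    return classifier(texts, truncation=True)
```

Calling `detect(["Some passage to check."])` downloads the weights on first use, so it needs network access and enough memory for a RoBERTa large model.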

Training Details

Training Data

The model is a sequence classifier built on RoBERTa large (see the RoBERTa large model card for more details on the RoBERTa large training data) and fine-tuned on the outputs of the 1.5B GPT-2 model (available here).

Training Procedure

The model developers write that:

We based a sequence classifier on RoBERTaLARGE (355 million parameters) and fine-tuned it to classify the outputs from the 1.5B GPT-2 model versus WebText, the dataset we used to train the GPT-2 model.

They later state:

To develop a robust detector model that can accurately classify generated texts regardless of the sampling method, we performed an analysis of the model’s transfer performance.

See the associated paper for further details on the training procedure.

Evaluation Results

The following evaluation information is extracted from the associated paper.

The model is intended to be used for detecting text generated by GPT-2 models, so the model developers test the model on text datasets, measuring accuracy by:

testing 510-token test examples comprised of 5,000 samples from the WebText dataset and 5,000 samples generated by a GPT-2 model, which were not used during the training.
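The accuracy measure described above is simply the share of test examples labelled correctly. A minimal sketch follows, using a toy eight-example stand-in for the balanced 5,000 + 5,000 test set. The function name, the label strings `"human"` and `"generated"`, and the toy predictions are all illustrative assumptions, not data from the paper.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy balanced test set: 4 human-written and 4 generated examples (made up).
actual = ["human"] * 4 + ["generated"] * 4
predicted = ["human", "human", "human", "generated",
             "generated", "generated", "generated", "human"]

toy_accuracy = accuracy(predicted, actual)
print(toy_accuracy)  # 0.75 (6 of 8 correct)
```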

The model developers find:

Our classifier is able to detect 1.5 billion parameter GPT-2-generated text with approximately 95% accuracy... The model’s accuracy depends on sampling methods used when generating outputs, like temperature, Top-K, and nucleus sampling (Holtzman et al., 2019). Nucleus sampling outputs proved most difficult to correctly classify, but a detector trained using nucleus sampling transfers well across other sampling methods. As seen in Figure 1 [in the paper], we found consistently high accuracy when trained on nucleus sampling.

See the associated paper, Figure 1 (on page 14) and Figure 2 (on page 16) for full results.
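For readers unfamiliar with nucleus (top-p) sampling, which the developers single out above, the sketch below shows the filtering step on a toy next-token distribution: keep the smallest set of most probable tokens whose cumulative probability reaches `top_p`, then renormalize. This is a generic illustration of the technique, not the developers' code, and the function name, token strings, and probabilities are invented for the example.

```python
from typing import Dict


def nucleus_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    kept: Dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least probable.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Toy distribution (made up): the low-probability tail is cut off.
toy = {"the": 0.5, "a": 0.3, "cat": 0.15, "zebra": 0.05}
filtered = nucleus_filter(toy, top_p=0.75)
print(sorted(filtered))  # ['a', 'the']
```

A sampler would then draw the next token from `filtered` instead of the full distribution.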

Limitations and Biases

In their associated paper, the model developers discuss the risk that the model may be used by bad actors to develop capabilities for evading detection, though one purpose of releasing the model is to help improve detection research.

In a related blog post, the model developers also discuss the limitations of automated methods for detecting synthetic text and the need to pair automated detection tools with other, non-automated approaches. They write:

We conducted in-house detection research and developed a detection model that has detection rates of ~95% for detecting 1.5B GPT-2-generated text. We believe this is not high enough accuracy for standalone detection and needs to be paired with metadata-based approaches, human judgment, and public education to be more effective.
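One way to act on this advice is to accept the detector's verdict only when its score is very high and to send everything else to a human reviewer. The sketch below assumes that policy. The function name `triage`, the route names, and the 0.95 cut-off are hypothetical choices made for illustration and are not recommended by the developers.

```python
def triage(score: float, confident_at: float = 0.95) -> str:
    """Route a prediction: trust it automatically only when the detector is very sure."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    return "auto" if score >= confident_at else "human_review"


# Scores of this magnitude (about 0.6) would be routed to a reviewer.
routes = [triage(s) for s in (0.5973310470581055, 0.99)]
print(routes)  # ['human_review', 'auto']
```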

The model developers also report finding that classifying content from larger models is more difficult, suggesting that detection with automated tools like this model will be increasingly difficult as model sizes increase. The authors find that training detector models on the outputs of larger models can improve accuracy and robustness.

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by RoBERTa large and GPT-2 1.5B (which this model is built/fine-tuned on) can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups (see the RoBERTa large and GPT-2 XL model cards for more information). The developers of this model discuss these issues further in their paper.

Model Evaluation samples

Task	Use case	Dataset	Python sample (Notebook)	CLI with YAML
Text Classification	Sentiment Classification	SST2	evaluate-model-sentiment-analysis.ipynb	evaluate-model-sentiment-analysis.yml

Inference samples

Inference type	Python sample (Notebook)
Real time	sdk-example.ipynb
Real time	text-classification-online-endpoint.ipynb

Sample inputs and outputs

Sample input

{
    "input_data": [
        "Today was an amazing day!",
        "It was an unfortunate series of events."
    ]
}
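A small helper can produce a request body in the shape shown above. This is a sketch: the function name `build_request_body` is hypothetical, and how the body is sent (endpoint URL, authentication) depends on the deployment and is not covered here.

```python
import json
from typing import List


def build_request_body(texts: List[str]) -> str:
    """Serialize texts into the {"input_data": [...]} shape shown in the sample input."""
    if not texts:
        raise ValueError("at least one text is required")
    if any(not isinstance(t, str) for t in texts):
        raise TypeError("every input must be a string")
    return json.dumps({"input_data": list(texts)})


body = build_request_body([
    "Today was an amazing day!",
    "It was an unfortunate series of events.",
])
print(body)
```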

Sample output

[
  {
    "label": "LABEL_0",
    "score": 0.5973310470581055
  },
  {
    "label": "LABEL_0",
    "score": 0.5915216207504272
  }
]
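The response is a list with one prediction per input, in the same order. The sketch below pairs each input with its label and score. The helper name is hypothetical. Note that this page does not say which class `LABEL_0` denotes, so the code leaves the label untranslated; the mapping would need to be checked against the model configuration.

```python
import json
from typing import List, Tuple


def pair_predictions(texts: List[str], raw_response: str) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) triples, assuming one prediction per input in order."""
    predictions = json.loads(raw_response)
    if len(predictions) != len(texts):
        raise ValueError("expected one prediction per input text")
    return [(t, p["label"], float(p["score"])) for t, p in zip(texts, predictions)]


sample_texts = [
    "Today was an amazing day!",
    "It was an unfortunate series of events.",
]
sample_response = (
    '[{"label": "LABEL_0", "score": 0.5973310470581055},'
    ' {"label": "LABEL_0", "score": 0.5915216207504272}]'
)

rows = pair_predictions(sample_texts, sample_response)
for text, label, score in rows:
    print(f"{label}  {score:.3f}  {text}")
```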

Version: 17

Tags

model_specific_defaults : {'apply_deepspeed': 'true', 'apply_lora': 'true', 'apply_ort': 'true'}

View in Studio: https://ml.azure.com/registries/azureml/models/roberta-large-openai-detector/version/17

Properties

SharedComputeCapacityEnabled: True

SHA: 5002d695ecf610d8bbfb1fa0d14f1575185b4915

evaluation-min-sku-spec: 4|0|28|56

evaluation-recommended-sku: Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_DS12_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_FX4mds, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

finetune-min-sku-spec: 4|1|28|64

finetune-recommended-sku: Standard_NV12s_v3, Standard_NV24s_v3, Standard_NV48s_v3, Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24rs_v3, Standard_NC4as_T4_v3, Standard_NC8as_T4_v3, Standard_NC16as_T4_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND40rs_v2, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4

finetuning-tasks: text-classification, question-answering

inference-min-sku-spec: 2|0|8|28

inference-recommended-sku: Standard_DS3_v2, Standard_D4a_v4, Standard_D4as_v4, Standard_DS4_v2, Standard_D8a_v4, Standard_D8as_v4, Standard_DS5_v2, Standard_D16a_v4, Standard_D16as_v4, Standard_D32a_v4, Standard_D32as_v4, Standard_D48a_v4, Standard_D48as_v4, Standard_D64a_v4, Standard_D64as_v4, Standard_D96a_v4, Standard_D96as_v4, Standard_F4s_v2, Standard_FX4mds, Standard_F8s_v2, Standard_FX12mds, Standard_F16s_v2, Standard_F32s_v2, Standard_F48s_v2, Standard_F64s_v2, Standard_F72s_v2, Standard_FX24mds, Standard_FX36mds, Standard_FX48mds, Standard_E2s_v3, Standard_E4s_v3, Standard_E8s_v3, Standard_E16s_v3, Standard_E32s_v3, Standard_E48s_v3, Standard_E64s_v3, Standard_NC4as_T4_v3, Standard_NC6s_v3, Standard_NC8as_T4_v3, Standard_NC12s_v3, Standard_NC16as_T4_v3, Standard_NC24s_v3, Standard_NC64as_T4_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2

languages: en

models roberta large openai detector - Azure/azureml-assets GitHub Wiki
