Working Group #5: Safety & Security
NIST Overview
Coordinate and develop guidelines related to managing the safety and security of dual-use foundation models.
Additional material
Proposals
NIST document open for public review until June 2nd
- NIST SP 800-218A, Secure Software Development Practices for Generative AI and Dual-Use Foundation Models, is designed to be used alongside the SSDF (SP 800-218). While the SSDF is broadly concerned with securing the software's lines of code, the companion resource expands the SSDF to address a major concern with generative AI systems: they can be compromised with malicious training data that adversely affects the AI system's performance.
- Ideas for comments:
- Generally a good document draft and approach.
- PW.2.1 (independent, qualified software design review to confirm security requirements are met and risks are addressed) should be High priority, not Medium. This review is a fundamental activity for verifying that the design meets its security requirements and for prioritizing any gaps. Based on hundreds of such reviews over a decade of experience in systems that conducted them, these reviews almost always uncover otherwise undiscovered high- and moderate-severity gaps in security requirements, so they are essential to keep those gaps from persisting. Security requirements and risk modeling are of little value if the systems are not reliably assessed to determine whether those critical requirements and risks are actually addressed.
- Either this document or a future one should be expanded to include non-Generative AI (other types of AI) security.
- PO.3.1 relates to using tools to help ensure AI security, but it does not mention AI itself directly, and it should. As AI advances, building a usage history of AI-assisted evaluation will help us handle more capable and more advanced models. The document should clearly establish the expectation that AI will be used in the evaluation of newer AI.
NIST document open for public review until September 9th
- Managing Misuse Risk for Dual-Use Foundation Models. From the Intro: "This document provides guidelines for improving the safety, security, and trustworthiness of dual-use foundation models (hereafter referred to as “foundation models”) consistent with the National AI Initiative Act and Executive Order 14110. Specifically, it focuses on managing the risk that such models will be deliberately misused to cause harm. The ways that foundation models can be misused continue to evolve, but they include the risks that models will facilitate the development of chemical, biological, radiological, or nuclear weapons; enable offensive cyber attacks; aid deception and obfuscation; and generate child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII) of real individuals."
- Response from Leela AI to be sent by Steve Kommrusch
As the principal investigator for Leela AI in the AISIC, I'm pleased to provide comments on the NIST AI 800-1 draft document. Leela AI works with state-of-the-art machine learning models and adds common-sense reasoning to create activity and object-interaction summaries from video data. We also have ongoing research in the area of online machine learning. We believe documents such as this one are a key way for companies to communicate responsibly and ensure safe AI is being provided to consumers. As this draft is open for public response, I'd like to acknowledge Dr. Indrajit Ray, professor and associate chair of the Department of Computer Science at Colorado State University, and John Diamant, former Secure Development and Application Security Strategist with HP[E] and an acknowledged contributor to NIST IR 8151, for their contributions to this response. Overall, we feel this is a strong document that outlines a good approach, and we provide the comments below for your consideration.
At a high level, one of our goals for NIST AI 800-1 is to help the industry develop consistent metrics, to the point that market demand creates independent AI safety companies providing not just independent testing but also tools that companies rely on for internal testing. Another goal is to see AI used to automatically test for AI safety: as AI advances, using AI in evaluations will become critical, since automatically checking a wide variety of cases and risks will be important as models advance and as models are created by new companies and other organizations.
Include certain specifics to provide examples that illustrate concepts: NIST understandably wants to be cautious about endorsing any specific solution or organization, but in some cases specifics help concretize a concept. To help make the metrics concrete, we recommend that footnote 18, referenced in section 1.1, be brought up to the document level by adding the following to point "a" on line 24, page 5: "Some representative examples are: Anthropic (2023) Responsible Scaling Policy [cite], OpenAI's Preparedness Framework (Beta) [cite], DeepMind (2024) Frontier Safety Framework [cite], Magic (2024) AGI Readiness Policy [cite]." Line 3 on page 10 could be continued with "For example, the METR third-party evaluation suite includes human or AI evaluations of actions during evaluation to significantly reduce the risk that dangerous actions are taken [https://arxiv.org/pdf/2312.11671]." Line 9 on page 10 could be continued with "A representative example is the UK AISI report [https://www.aisi.gov.uk/work/advanced-ai-evaluations-may-update] evaluating Harmbench [https://arxiv.org/pdf/2402.04249] on several models." In addition to Harmbench, LiveBench [https://livebench.ai/] is another well-supported benchmark intended for capabilities testing of foundation models and could be referenced directly by 800-1. We view such specific examples as helping the community move toward common standards that can then enable common communication and testing.
CVE-like reporting: The cybersecurity problem has existed long enough for the industry to have created the CVE (common vulnerabilities and exposures) system for software. As AI advances and common architectures exhibit common vulnerabilities, a NIST standard should strive to sow the seeds for a future AI equivalent of CVE. In addition to creating a standard for vulnerability reporting, 800-1 should recommend that third-party discovery of an AI vulnerability be communicated to the model owner within a specified time before public disclosure. To add these concepts specifically into 800-1, we suggest adding the following between line 16 and line 17 on page 17: "An example of a mature vulnerability reporting system is the cybersecurity field's CVE database [https://cve.mitre.org/], which organizes and tracks cybersecurity risk." The end of line 18 on page 15 could be extended with "Example processes for third-party discovery of AI vulnerabilities could mimic established security disclosure processes [https://www.infosecinstitute.com/resources/vulnerabilities/how-to-reserve-a-cve-from-vulnerability-discovery-to-disclosure/]."
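To make the suggestion concrete, below is a minimal sketch (in Python) of what a CVE-style record for an AI model vulnerability might contain. The field names and the "AIVULN" numbering are illustrative assumptions on our part, not part of any NIST or MITRE schema; the record simply captures the elements discussed above, including a private disclosure window before public release.

```python
# Minimal sketch of a CVE-style record for an AI model vulnerability.
# All field names and the AIVULN numbering are illustrative assumptions,
# not part of any NIST or MITRE schema; they mirror the elements above:
# an identifier, the affected model, a description, severity, and
# disclosure dates that support a pre-disclosure notification window.
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class AIVulnerabilityReport:
    report_id: str                  # e.g. "AIVULN-2024-0001" (hypothetical numbering)
    affected_model: str             # model name and version the finding applies to
    description: str                # what the vulnerability allows (jailbreak, data leak, ...)
    severity: str                   # e.g. "low" / "moderate" / "high"
    discovered_on: date             # when the third party found the issue
    reported_to_owner_on: date      # start of the private disclosure window
    public_disclosure_on: date      # earliest planned public disclosure
    references: List[str] = field(default_factory=list)  # benchmarks, papers, advisories


def disclosure_window_days(report: AIVulnerabilityReport) -> int:
    """Days between owner notification and planned public disclosure."""
    return (report.public_disclosure_on - report.reported_to_owner_on).days


if __name__ == "__main__":
    example = AIVulnerabilityReport(
        report_id="AIVULN-2024-0001",
        affected_model="example-foundation-model-1.0",
        description="Prompt-injection path that bypasses a refusal policy.",
        severity="moderate",
        discovered_on=date(2024, 8, 1),
        reported_to_owner_on=date(2024, 8, 2),
        public_disclosure_on=date(2024, 11, 2),
    )
    print(f"{example.report_id}: {disclosure_window_days(example)}-day private window")
```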
AI automation in testing: Advancing AI models and capabilities will necessitate rapid evaluation against previously tested threats. As multiple organizations develop models that require testing, third-party evaluators may become overloaded and require in-house automation to keep up with demand. To increase trust in AI evaluation of models, best-practice guidance on AI model usage should be added. For example, between lines 29 and 30 on page 10, an 8th item could be added: "8. Consider using an AI model to help execute recommendations 1 through 7. As AI models advance, their ability to aid in red-teaming is expected to improve [https://openreview.net/forum?id=4KqkizXgXU]. The AI model used may be another instance of the model being tested, or may be a previously tested and verified model." Then, after line 6 on page 12, add: "e. Detail the successes and weaknesses of AI-assisted testing to help evaluate AI capabilities in the area of software testing, as well as document how deeply the AI was able to explore this problem space."
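As an illustration of the workflow behind the proposed item 8, the following minimal sketch shows one shape an AI-assisted red-teaming loop could take. `generate_attack_prompt`, `query_target_model`, and `judge_is_harmful` are hypothetical stand-ins for the red-team model, the model under test, and an evaluator; nothing here reflects a specific vendor API.

```python
# Minimal sketch of AI-assisted red-teaming: one model proposes adversarial
# prompts, the model under test responds, and a judge flags harmful output.
# generate_attack_prompt, query_target_model, and judge_is_harmful are
# hypothetical stand-ins; a real harness would call the respective model APIs.
from typing import Callable, List, Tuple


def red_team_loop(
    generate_attack_prompt: Callable[[str, List[str]], str],  # red-team model
    query_target_model: Callable[[str], str],                 # model being tested
    judge_is_harmful: Callable[[str, str], bool],             # evaluator model or rule set
    risk_category: str,
    max_attempts: int = 20,
) -> List[Tuple[str, str]]:
    """Return (prompt, response) pairs the judge flagged as harmful."""
    findings: List[Tuple[str, str]] = []
    history: List[str] = []
    for _ in range(max_attempts):
        prompt = generate_attack_prompt(risk_category, history)
        response = query_target_model(prompt)
        history.append(prompt)
        if judge_is_harmful(prompt, response):
            findings.append((prompt, response))  # record for the misuse-risk report
    return findings


if __name__ == "__main__":
    # Trivial stand-ins so the sketch runs end to end; a real harness would
    # call the red-team model, the target model, and a judge model here.
    flagged = red_team_loop(
        generate_attack_prompt=lambda category, hist: f"attempt {len(hist)} for {category}",
        query_target_model=lambda prompt: "refused",
        judge_is_harmful=lambda prompt, response: response != "refused",
        risk_category="offensive cyber capability",
        max_attempts=5,
    )
    print(f"{len(flagged)} harmful responses found")
```

Recording both the flagged findings and the attempts that failed would support the proposed reporting item "e" on documenting how deeply the AI was able to explore the problem space.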
Threat modeling: The document should leverage and reference threat modeling and analysis best practices and methodologies, as to a large extent this paper is an application of threat modeling to a new threat target. For example, this reference [https://owasp.org/www-community/Threat_Modeling] summarizes what a threat model typically includes (a minimal data-structure sketch of these elements appears after this list):
- Description of the subject to be modeled
- Assumptions that can be checked or challenged in the future as the threat landscape changes
- Potential threats to the system
- Actions that can be taken to mitigate each threat
- A way of validating the model and threats, and verification of success of actions taken
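As a minimal sketch, the five elements above map naturally onto a simple data structure; the Python below is illustrative only, and the field names are our own rather than taken from the OWASP reference.

```python
# Minimal sketch mapping the five threat-model elements listed above onto a
# data structure; field names are illustrative and not taken from OWASP.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mitigation:
    action: str                 # action taken to mitigate the threat
    verified: bool = False      # whether the action's success has been verified


@dataclass
class Threat:
    description: str            # potential threat to the system
    mitigations: List[Mitigation] = field(default_factory=list)


@dataclass
class ThreatModel:
    subject: str                                                # description of the subject being modeled
    assumptions: List[str] = field(default_factory=list)        # revisit as the threat landscape changes
    threats: List[Threat] = field(default_factory=list)         # potential threats and their mitigations
    validation_notes: List[str] = field(default_factory=list)   # how the model and threats were validated
```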