Working Group #3: Capability Evaluations - SteveKommrusch/CSU_AISafetySecurity GitHub Wiki

NIST Overview

Create guidance and benchmarks for evaluating and auditing AI capabilities, with a focus on capabilities through which AI could cause harm, such as in the areas of chemical, biological, radiological, and nuclear (CBRN) threats, cybersecurity, autonomous replication, control of physical systems, and other areas. Develop, and help ensure the availability of, testing environments such as testbeds that support the development of safe, secure, and trustworthy AI technologies.

Materials

SWE-agent
LiveCodeBench
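Code-generation benchmarks such as LiveCodeBench typically score a model with pass@k: the probability that at least one of k sampled solutions passes a problem's tests. The sketch below shows the unbiased pass@k estimator introduced with the HumanEval benchmark. It assumes n samples were generated per problem and c of them passed. The helper names pass_at_k and mean_pass_at_k are hypothetical and are not part of the LiveCodeBench or SWE-agent code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: number of attempts being scored
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k samples drawn without replacement are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Example: 10 samples per problem, 3 passing on one problem and 0 on another.
print(pass_at_k(10, 3, 1))                       # about 0.3
print(mean_pass_at_k([(10, 3), (10, 0)], k=1))   # about 0.15
```

Sampling n > k solutions and applying this estimator gives a lower-variance score than drawing exactly k samples per problem, which matters when comparing models on small problem sets.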

Proposals