Click on the link below to open the editable version of the Guide.
We welcome your comments there.
This draft Guide to Test and Evaluation Practices in AI-Enabled Military Systems results from an ongoing dialogue among Chinese, American, European and other international experts on the design, deployment and international governance of AI in the military. Aligned with the goals of responsible AI and AI assurance, the Guide offers practical suggestions that governments and militaries can use, and other stakeholders can demand, to make military AI applications safer, more robust and more compliant with international law.
INHR's dialogue among Chinese, U.S. and an "international all-star team" of experts and former officials has met since 2019, online and in person, under various hosts and sponsors. The most recent meeting, in Copenhagen, co-hosted with the Center for a New American Security and the Royal Danish Defense College, produced expert-level agreement on this draft Guide. The Guide includes 22 recommended practices covering the life cycle of a weapon from design through deployment to post-use investigation, as well as international cooperation standards. Among the topics addressed are foundation models for generative AI, data and data integrity, design standards, human-machine teaming, "red teaming" new products to search for unanticipated errors, proposals for standards that draw on corporate and tech community expertise, and a recommendation that all designers, users and purchasers follow the precautionary principle. Because the field of AI testing, evaluation, validation and verification is fast-paced and rapidly changing, we share the draft Guide here to make it more accessible and to encourage experts from the commercial, technical and academic worlds to share their insights and comments with government officials from all countries.
The Guide is divided into three sections: Design, Deployment, and International Cooperation related to Standards, Incidents and Confidence-Building. It also includes commentary analyzing why these recommendations are especially relevant to AI-enabled military systems, as opposed to the testing of traditional computer software or weapons platforms. We hope you find it useful and encourage you to share the recommendations with those in your government or network. If you have proposals to improve the Guide, please offer them below.