The following draft Chinese technical documentation on generative AI, which is intended to inform a future Chinese national standard on the topic, establishes specific oversight processes that Chinese AI companies must adopt regarding their model training data, model-generated content, and more. The documentation names more than 30 specific safety risks, some of which (algorithmic bias, disclosure of personally identifiable information, copyright infringement) are widely recognized internationally. Others, such as guidelines on how to answer questions about China’s political system and Chinese history, are specific to the tightly censored Chinese internet. The technical documentation also requires Chinese generative AI service providers to incorporate more foreign-language content into their training data.
An archived version of the Chinese source text is available at: https://perma.cc/FBZ3-BW9S
Technical Documentation of the National Information Security Standardization Technical Committee: Basic Safety Requirements for Generative Artificial Intelligence Services (Draft for Feedback) [1]
1. Scope
This document gives the basic requirements for the safety [2] aspects of generative artificial intelligence (AI) services, including corpus safety (语料安全), model safety, safety measures, and safety assessment.
This document applies to providers offering generative AI services to the public in China as they improve the safety level of their services. It also applies to providers that carry out safety assessments themselves or entrust them to third parties, and it provides the relevant main oversight department (主管部门) with a reference for judging the safety level of generative AI services.
2. Normative Reference Documents
The contents of the following documents, through normative references in this text, constitute indispensable provisions of this document. Among them, for dated references, only the edition corresponding to that date applies to this document. For undated references, the latest edition (including all amendments) applies to this document.
GB/T 25069-2022, Information security technology terminology
3. Terminology and Definitions
The terms and definitions defined in GB/T 25069-2022 and listed below apply to this document.
3.1 Generative artificial intelligence services
Artificial intelligence services that, based on data, algorithms, models, and rules, can generate text, images, audio, video, and other content according to user prompts.
3.2 Provider
An organization or individual that provides generative AI services in the form of interactive interfaces, programmable interfaces, etc., to the public in China.
3.3 Training data (训练语料)
All data that serve directly as input for model training, including input data in the pre-training and optimization training processes.
3.4 Illegal and unhealthy information (违法不良信息)
A collective term for the 11 types of illegal information and 9 types of unhealthy information specified in the Provisions on the Governance of the Online Information Content Ecosystem.
3.5 Sampling qualified rate
The percentage of samples that do not contain any of the 31 safety risks listed in Appendix A of this document.
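As an illustration only, since the draft's definition in 3.5 gives the concept but not an explicit formula, the sampling qualified rate can be expressed as a simple percentage; the variable names below are this translation's own shorthand, not terms from the draft:

```latex
% Illustrative formula for the sampling qualified rate defined in 3.5.
% "Qualified" means a sampled item contains none of the 31 safety risks
% listed in Appendix A. n_qualified and n_sampled are assumed names,
% not terminology from the draft documentation.
\[
  \text{sampling qualified rate} \;=\;
  \frac{n_{\text{qualified}}}{n_{\text{sampled}}} \times 100\%
\]
```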
The remainder of this translation is available in the full PDF download.
[1] Translator’s note: SAC/TC260 issued a final, definitive version of this technical documentation on February 29, 2024. An English translation of the final version is available on CSET’s website at: https://cset.georgetown.edu/publication/china-safety-requirements-for-generative-ai-final/.
[2] Translator’s note: The Chinese word 安全 ānquán, found in the title of this technical documentation and throughout its text, can be translated into English as either “safety” or “security.” The Chinese authors of this technical documentation provided the following English translation of its title: “Basic security requirements for generative artificial intelligence service.” However, this CSET English translation renders 安全 as “safety” in most cases because, in the context of this documentation, the authors are mainly discussing the prevention of accidents and unforeseen problems with generative AI (“safety”), rather than the prevention of deliberate abuse or sabotage (“security”).