Make a submission: Published response

CSIRO

Published name

CSIRO

1. Do the proposed principles adequately capture high-risk AI?

Yes

Please provide any additional comments.

AI risk is commonly assessed based on its consequences and frequency/likelihood (Xia, B. et al. (2023) ‘Towards Concrete and Connected AI Risk Assessment (C2AIRA)’, CSIRO, in. 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN), arXiv. Available at: https://doi.org/10.48550/arXiv.2301.11616.). The principles outlined employ varying terms such as impacts, severity, and extent. Later, additional words like scale, intensity, and likelihood are introduced to provide further elaboration. On the one hand, the term ‘consequence’ may present limitations in two ways. First, it often refers to the more immediate effects of an AI system, whereas ‘impact’ more accurately captures broader and longer-term adverse effects. Second, consequences can be multi-dimensional, encompassing aspects such as scale/extent and intensity.

On the other hand, it may be beneficial to begin by clarifying the use of key terms (Aven, T. et al. (2018) Society for Risk Analysis Glossary. Society for Risk Analysis. https://www.sra.org/risk-analysis-introduction/risk-analysis-glossary/). For example, ‘consequence’ and ‘frequency/likelihood’ could be established as standard terms, with an emphasis on ‘impact’ to highlight longer-term and wider-ranging adverse effects. Following this, different dimensions of consequence/impact—such as scale/extent and intensity—could be introduced to reflect the multifaceted nature of consequences/impacts and to apply them clearly in the context of the principles.

Finally, the current principles are framed around the risk of AI “due to its use,” not its development. However, we know that the development of AI often requires access to massive amounts of data, some of which may be obtained without permission. It may be worth considering changing “due to its use” to “due to its use and development” to reflect these risks more accurately.

Are there any principles we should add or remove?

Yes

Please provide any additional comments.

It is understandable that the current principles focus on the adverse impacts on affected entities, such as individuals, groups, or the economy, society, and the environment, with AI technology risks considered as risk factors contributing to these impacts.

However, there is merit in listing technology risks as a distinct principle for several reasons. The practical ability to assess AI system risks to affected entities is often very challenging—sometimes nearly impossible—due to unique and intrinsic characteristics of AI, such as autonomy, general-purpose capabilities (often requiring minimal further development thus reducing the opportunity to further control risks), and the inherent inscrutability of the technology.
While the existing principles are sound in theory, applying them in practice can be exceedingly difficult. This is precisely why General Purpose AI (GPAI) models, which exhibit these characteristics most strongly, have been proposed for designation as high-risk, independent of any proposed existing principles or use. The EU has also designated all GPAI systems (https://artificialintelligenceact.eu/high-level-summary/) (in addition to GPAI models) in this category, while the US has focused on GPAI models or systems with advanced capabilities.

Introducing technology risk as a standalone principle would make such designations more consistent with proposed principles, rather than treating them as exceptions. It would also future-proof the principles, as technological evolution may increasingly render entity or outcome-driven risk assessments less practical, despite their theoretical appeal.

A suggested new principle could be: “The technology risks associated with the intrinsic and unique characteristics of AI—such as autonomy, general-purpose/cognitive capabilities, opacity, and lack of explainability and controllability—may lead to unpredictable or systemic adverse impacts, independent of specific use cases.”

2. Do you have any suggestions for how the principles could better capture harms to First Nations people, communities and Country?

Yes

Please provide any additional comments.

Most current thinking addressing harm to First Nations communities focuses on the impact or consequences of AI systems. The risks of misappropriation, exploitation and the continuation of deficit discourse when using AI are high. The consequences are further negative impacts on the health and well-being of Indigenous communities. Understanding about how First Nations data and knowledge can be used to train AI systems is required.

Example: Deficit biases were detected during the development of the Indigenous Jobs Map created by CSIRO in 2023. Managing this within the project required Indigenous expertise and leadership to guide, test and validate the results from the algorithm to provide an accurate representation of Indigenous job advertisements.

Additionally, the use of First Nations knowledge systems in parallel with AI systems (not necessarily trained on First Nation data) and other knowledge systems during operation needs thoughtful consideration. AI does not consider the Indigenous Cultural and Intellectual Property rights of Indigenous peoples globally and has the potential to create more harm to Indigenous peoples due to poor historical research practices drawing on and extracting Indigenous knowledges. The development of Indigenous Data Protocols is essential for the ethical use of data and creation of AI.

This approach, rather than subordinating First Nation knowledge systems as merely inputs to AI, offers a complementary—and sometimes alternative—approach to responsible AI learning from First Nations data with Indigenous leadership and expertise. It is suggested that the Guardrails emphasise both the responsible use of data and the harm that could arise from relegating First Nations knowledge systems to a subordinate role in the context of AI systems.

3. Do the proposed principles, supported by examples, give enough clarity and certainty on high-risk AI settings and high-risk AI models? Is a more defined approach, with a list of illustrative uses, needed?

YES the principles give enough clarity and certainty

5. Are the proposed principles flexible enough to capture new and emerging forms of high-risk AI, such as general-purpose AI (GPAI)?

No

Please provide any additional comments.

AI models form a component of the final AI system, which is the entity that ultimately exhibits both positive and adverse impacts. There are four distinct types of entities: narrow AI models, narrow AI systems, GPAI models, and GPAI systems. The current proposal mentions “high-risk AI systems” and “GPAI models,” which could potentially lead to confusion for several reasons:

- Although one could argue that the term "AI system" includes both narrow AI systems and GPAI systems, leaving the GPAI system implicit may cause confusion, especially when the EU AI Act explicitly differentiates between GPAI models and GPAI systems (https://artificialintelligenceact.eu/high-level-summary/). By only considering high-risk AI systems, this proposal may leave a loophole for non-high-risk GPAI systems. Technically, since GPAI systems are built on GPAI models, which are inherently difficult to control, it is nearly impossible to reliably create a low-risk GPAI system from high-risk GPAI models. Moreover, new risks and different risk factors can arise in GPAI systems, even if the underlying GPAI models have undergone thorough risk assessment and control.

- Another source of confusion may be whether it is possible to build a narrow AI system using a GPAI model by limiting the capabilities of the underlying GPAI model through various methods such as model-level mechanisms, fine-tuning, or system-level mechanisms. Technically, mechanisms to limit a powerful GPAI model remain unreliable. This makes any AI system based on GPAI models inherently high risk, regardless of the use case, risk assessment, or whether it is claimed to be narrow AI via limiting underlying capabilities.

- A third point of confusion involves whether high-risk narrow AI models are included. Narrow AI model development can occur in separate organisations from those developing and deploying AI systems. It would be technically challenging for an AI system developer to take a high-risk narrow AI model, developed without proper guardrails, and exert sufficient control to make the final AI system low risk.
Thus, it is suggested to change “high-risk AI systems and GPAI models” to “high-risk AI models, high-risk AI systems, GPAI models, and GPAI systems.”

(Hooker, S. (2024) ‘On the Limitations of Compute Thresholds as a Governance Strategy’. Available at: https://arxiv.org/abs/2407.05694v1)

6. Should mandatory guardrails apply to all GPAI models?

Yes

7. What are suitable indicators for defining GPAI models as high-risk?

Base on technical capability

What technical capability should it be based on?

Other

8. Do the proposed mandatory guardrails appropriately mitigate the risks of AI used in high-risk settings?

No

Please provide any additional comments.

On page 30, it is stated that “These guardrails aim to reduce the chance of harms.” While it is true that there is always the likelihood or chance of harm, the consequence of such harms should also be considered. Guardrails should aim to reduce both.

The effectiveness of the proposed guardrails in mitigating risk depends heavily on the concrete technical approaches taken. These technical approaches differ between narrow AI models/systems and GPAI models/systems, and also between AI model developers, AI system developers, and AI deployers. It is important to develop or refer to concrete approaches that sufficiently differentiate between them.

Due to the rapid evolution of technology and the disciplinary differences across AI model development (core AI community), AI system development (software/AI engineering community), and governance (policy, law, and management communities), there has been significant confusion over the use of terminology, leading to mischaracterisation and the misuse of approaches at the model, system, and organisational levels (Xia, B. et al. (2024) ‘An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping’. CSIRO. Available at: https://doi.org/10.48550/arXiv.2404.05388.). Practitioners in different roles often consider applying techniques from one field sufficient without incorporating others. For example, some believe that model testing is sufficient without system testing after integration. Figure 2 in this discussion paper reflects this tendency, as it omits AI system testing after integration and before deployment, giving the impression that testing a trained model alone is sufficient. Other gaps may arise from the quality of the test cases themselves and misunderstandings about the subtle differences between benchmarking, capability discovery, and testing for functional accuracy versus testing for all risks.

It is recommended that a more comprehensive set of example techniques, with harmonised or mapped terms across different disciplines, be provided at both the guardrail level and in the standards/guidelines to be developed later.

9. How can the guardrails incorporate First Nations knowledge and cultural protocols to ensure AI systems are culturally appropriate and preserve Indigenous Cultural and Intellectual Property?

See answer to Question 2.

10. Do the proposed mandatory guardrails distribute responsibility across the AI supply chain and throughout the AI lifecycle appropriately?

No

Which of the guardrails should be amended?

Protect AI systems, and implement data governance measures to manage data quality and provenance
Test AI models and systems to evaluate model performance and monitor the system once deployed
Enable human control or intervention in an AI system to achieve meaningful human oversight
Inform end-users regarding AI-enabled decisions, interactions with AI and AI-generated content
Establish processes for people impacted by AI systems to challenge use or outcomes
Be transparent with other organisations across the AI supply chain about data, models and systems to help them effectively address risks
Other

11. Are the proposed mandatory guardrails sufficient to address the risks of GPAI?

No

How could we adapt the guardrails for different GPAI models, for example low-risk and high-risk GPAI models? 

First, it is important to differentiate between GPAI models and GPAI systems. There are only a limited set of unreliable risk mitigation techniques that can be applied at the model level, such as through improved data curation, safety-focused fine-tuning, model manipulation, or reinforcement learning using human/AI feedback.

On the other hand, many additional mechanisms can be applied outside of the model, such as responsible AI design patterns for LLM-based systems (Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer, and Jon Whittle. “Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents.” arXiv preprint arXiv:2311.13148 (2023). CSIRO) or LLM-based agents (Liu, Yue, Sin Kit Lo, Qinghua Lu, Liming Zhu, Dehai Zhao, Xiwei Xu, Stefan Harrer, and Jon Whittle. “Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents.” arXiv preprint arXiv:2405.10467 (2024). CSIRO), as well as out-of-model runtime guardrails (Md Shamsujjoha, Qinghua Lu, Dehai Zhao, and Liming Zhu. “Towards AI-Safety-by-Design: A Taxonomy of Runtime Guardrails in Foundation Model based Systems.” arXiv preprint arXiv:2408.02205 (2024). CSIRO) for detecting, filtering and blocking. Given the inherent inscrutability and lack of controllability of GPAI models—and the fact that most Australian companies will be using GPAI models to develop either narrow-AI systems or GPAI systems—a stronger emphasis on out-of-model guardrails should be placed on GPAI systems or high-risk AI systems that leverage GPAI models.

It is also crucial to clarify the distinction between AI model/system evaluation and testing. While testing involves executing a model or system to verify and validate that it exhibits expected behaviours using test cases, evaluation encompasses a broader set of risk mitigation techniques. These include capability evaluations to detect unforeseen behaviours and capabilities (rather than only expected behaviours), the evaluation of model artifacts without executing them (such as by examining training data and base models via tools), diverse model creation approaches (such as model merging and model configuration), and the internal structure of GPAI models. Some AI developers and deployers may narrowly interpret the term "testing" as verification against predefined expected behaviours, which could limit their use of broader techniques such as user validation, benchmarking and exploratory testing, where expected behaviours are dynamically determined.

For GPAI models and systems, it is suggested that more technical evaluation (beyond testing) approaches and technical indicators (beyond testing results) be considered to mitigate risks, as outlined in the response to question 7.

12. Do you have suggestions for reducing the regulatory burden on small-to-medium sized businesses applying guardrails?

Yes

Please provide any additional comments

There are several key approaches to reducing the compliance/conformance burden on SMEs, especially when they act as AI system developers and deployers without directly engaging in narrow AI or GPAI model development.

- Ideally, the additional work required for implementing AI system developer or deployer guardrails should depend on the nature and extent of the guardrails already put in place by upstream AI developers, with the focus only on addressing remaining gaps. The current guardrails do not emphasise how to identify specific gaps and mitigate them. By placing greater emphasis on identifying gaps and effectively reusing the guardrail work of more transparent upstream players, the compliance burden on SMEs can be significantly reduced.

- SMEs would benefit greatly from automated compliance tools. However, this requires that the guardrails and related standards be as machine interpretable as possible, enabling tool creation. At present, most regulations and standards are written primarily for human interpretation, often with a degree of vagueness—sometimes necessarily so, but other times unnecessarily. Ensuring these are written in a way that machines can interpret would streamline compliance for SMEs.

- Providing more detailed guidance can reduce, not increase, SME compliance and conformance costs. There is a common misconception that SMEs, due to their limited resources and AI literacy, benefit from simpler, shorter, or more abstract guidelines. However, as long as standards are not lowered for SMEs, providing more detailed and technical guidance equips them with the specific information they need to comply effectively. Larger organisations can afford the resources to figure out and develop their own practices from vague or abstract practices, but SMEs need the specifics in order to directly implement them. It is recommended that the guardrails offer more detailed, technical guidance to SMEs, rather than oversimplified or high-level advice for the sake of simplicity.