Make a submission: Published response
Published name
6. Should mandatory guardrails apply to all GPAI models?
Please provide any additional comments.
Such a regulation will stall innovation and progress in Australia given that GPAI models, same as any technology, can be used to benefit individuals and society (e.g., in healthcare, education) and the scenarios where they are benefitting the society continues to grow and is an area of active research.
Recently California Governor Newsome vetoed Senate Bill 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. In his vetoes' explanatory letter, Governor Newsome explains “Smaller, specialized models may emerge as equally or even more dangerous than the models targeted by SB 1047 — at the potential expense of curtailing the very innovation that fuels advancement in favor of the public good”. This view recognises that the lines between GPAI, Generative AI, AI, and indeed automated systems, are unclear, with many such terms difficult to define. (More on the question of definition in our response to Question 7.)
At the same time, it is clear that AI innovation is rapid and globally decentralised. While well intentioned, inappropriately targeted guardrails may stifle innovation in Australia, and risk the nation’s burgeoning technology sector global competitiveness.
To better manage risk for any future guardrails, we would advocate for consideration of carve outs for basic research in AI that is pre-commercialisation.
7. What are suitable indicators for defining GPAI models as high-risk?
8. Do the proposed mandatory guardrails appropriately mitigate the risks of AI used in high-risk settings?
Please provide any additional comments.
*** Guardrail 2: Establish and implement a risk management process to identify and mitigate risks. ***
* Threat model and risk likelihood *
This guardrail should explicitly state threat modelling, i.e., identifying risks without also performing threat modelling means that risks due to misuse of AI models may be missed or under-explored. The process needs to include the considerations of uncertainty and assumptions [1], given the rapid development of GPAI, the fast adaption of threat actors on new services, systems powered by GPAI, and the possibly constrained effectiveness of new defence strategies. Given this threat model, Guardrail 4 should then require “testing” in the view of the threat model.
The proposals paper frames risks as capable of being eliminated. In reality, many threats to systems, spanning privacy, security/integrity, and bias/fairness may be mitigated but often elimination is not possible.
*** Guardrail 3: Protect AI systems, and implement data governance measures to manage data quality and provenance ***
* Data Privacy *
Guardrail 3 should explicitly state protection of data and privacy in its title and expand its scope accordingly.
The risk of AI to data privacy is high due to
1. AI models are developed and trained on often personal data (e.g., healthcare, financial data, images, messages)
2. AI models operate on private data of end-users.
AI models can and have been shown to memorise data they use and can then inadvertently return it as part of their output – violating privacy of an individual [2]. The proposal does not emphasise that data plays an essential role in AI eco-system and has to be considered and protected explicitly. Solely relying on Privacy Act is de-emphasising the fact that AI poses an unprecedented risk to data privacy and this risk is unique to AI eco-system. Moreover, the Privacy Act does not cover and protect personal data appropriately. For example, the Privacy Act suggests that anonymisation presents adequate protection. This is not true, as anonymised data can be linked to an individual without explicit identifiers (e.g., number of people who share the same home and work address is small, allowing for singling out an individual in a dataset that has only this information). AI models operate on data that may not be considered personally identifiable albeit pertains to very sensitive information.
For example, consider a chat between a patient and a health bot that will be used to improve the performance of the chat bot in the future. Even if the name of the patient is stripped, their unique medical condition and language use can be traced to this patient [3].
To this end, data used in an AI model has to be explicitly protected as information about it can be explicitly leaked. For example, it has been shown that by comparing multiple copies of language models it is easy to extract data that was added to train the new model in verbatim form [4]. In the example, above this would indicate that extraction of verbatim sentences of the patient used in chatbot can be extracted.
As a result, we suggest that Guardrail 3 explicitly states protection of privacy in its title and expands the scope of the guardrail. Moreover, the data that requires protection should not only concern personally identifiable information (as it is widely known that one can be easily de-identified through other information pertaining to the individual, even if the data is anonymised).
We also recommend that Figure 3 explicitly states that sensitive data may be used 1. in “Data used for modification” (e.g., fine-tuning is often done on the dataset closer to the final task and may contain confidential, sensitive, IP-protected or personal data) 2. when end user/consumer interacts with the model (e.g., by supplying details of their health condition when interacting with a health bot).
*Alignment with use case and threat model*
The proposed guardrail (including Guardrail 8) calls for transparency regarding the data, models and evaluations for both developers and deployers. This is important, but in addition, a mechanism is desirable that explicitly tracks the alignment between the intended use and the actual deployment of an AI system. This should cover (1) appropriateness of the training data to the domain in which systems are deployed; (2) alignment of the intended use of the system and the actual use case; (3) alignment of developer tests and evaluations to the actual use case.
*** Guardrail 4: Test AI models and systems to evaluate model performance and monitor the system once deployed. ***
We need to adopt an early warning detection mechanism [5] and recommend this is made explicit in Guardrail 4. Before deployment, the developer must conduct adversarial testing, which has been recommended in the report. Additionally, we suggest that the requirement of ‘testing’ should not only cover adversarial testing. It should specifically include testing AI systems’ susceptibility to the threats identified during threat modelling (our recommendation for Guardrail 2), and to test the adequacy of the mitigations and defences deployed against those threats. After deployment, ongoing monitoring and evaluation is also needed. Throughout the process, the misuse of AI and consequences must be well documented, and reported via a responsible report protocol, so the identified risks can be communicated effectively and efficiently.
We believe the proposals paper would ideally expand on the types of threats to be guarded against (and evaluated by adversarial testing). While the proposals paper tends to focus on bias, it is critical that systems (whether incorporating AI or otherwise) protect privacy, copyright, and integrity against known vulnerabilities.
*** Guardrail 5: Enable human control or intervention in an AI system to achieve meaningful human oversight. ***
The guardrail suggests that reversibility may be possible with human intervention. However, it is important to note that in many settings, including online and physical settings and those with high-frequency operations, actions may not be reversible.
*** Guardrail 8: Be transparent with other organisations across the AI supply chain about data, models and systems to help them effectively address risks. ***
* Integration of threat model and use case *
Similar to Guardrail 3, we argue that transparency should be with respect to the intended use and threat model. The framework of Model Cards [6] (as briefly mentioned in the proposal paper) has been shown to be effective for sharing of relevant information by the developers. In addition, deployers should record their compliance with the information shared in the Model Card. This should be done in addition to the reporting of adverse incidents and model failures (Guardrail 8), as the former could prevent cases of the latter.
Transparency through frameworks like model cards and documentation of intended and actual use cases can be applied to closed- and open-sourced models alike, and as such could help address concerns around the guardrailing open-source models (page 33).
* Risk prevention results *
In addition to sharing information of model development, training data sources, AI systems and deployment, it is also crucial to report the safety test results (e.g., benchmarks and red teaming practices), and incidents such as model stealing, unintentional releases of model parameters [5].
For ease of burden on SMEs, it is also practical to explore the level of information disclosure.
[1] Kapoor et al., “On the Societal Impact of Open Foundation Models.” arXiv preprint arXiv:2403.07918, 2024.
[2] Carlini et al., “Extracting Training Data from Large Language Models”. In USENIX Security 2021.
[3] Chris Culnane, Benjamin Rubinstein & Vanessa Teague. “Health data in an open world”, December 2017, available at https://arxiv.org/abs/1712.05627.
[4] S. Zanella-Béguelin et al., “Analyzing Information Leakage of Updates to Natural Language Models”. In ACM Conference on Computer and Communications Security (CCS), 2020
[5] Bommasani et al., “A Path for Science‑ and Evidence‑based AI Policy”, https://understanding-ai-safety.org
[6] Mitchell et al. “Model Cards for Model Reporting”. In Proceedings of the Conference on Fairness, Accountability, and Transparency 2018 https://dl.acm.org/doi/abs/10.1145/3287560.3287596
10. Do the proposed mandatory guardrails distribute responsibility across the AI supply chain and throughout the AI lifecycle appropriately?
Which of the guardrails should be amended?
Please provide any additional comments
*** The end-user – a critical party in the AI eco-system ***
Currently the end-user is missing from the proposal. The end-user presents risks in the AI eco-system that must be captured when regulating AI. The interactions between an AI model and the end-user can be harmful to the deployer and other users of AI models in at least two settings:
1. By interacting with the API of an AI model, the end-user can reverse-engineer the AI model for the purpose of stealing the training data [7] of the model or stealing the model itself [8]. The former is harmful as training data is likely to contain the data of other users that can be personal, confidential and sensitive (e.g., health information, images, text messages, information such as race, gender even when records are de-anonymised). The latter is harmful for the deployer of the AI model, as their “stolen” model can be used for unintended purposes and impersonate the deployer. Moreover, given that the attacker “knows” the original model, some attacks become easier as opposed to the case when the attacker has only API access (e.g., adversarial examples against a model that cause misprediction are known to become easier [9] [10]).
2. AI models integrate user data to train and fine-tune models to better match their specific use case (e.g., chatbot of an insurance provider). As a result, user data has a direct effect on the behaviour of the model. An adversarial end-user can supply data that can “poison” the model such that the model behaves in unexpected manner (e.g., the model can be biased towards certain outputs [11] [12]).
Given that there are already known ways of an end-user adversarial impacting the AI ecosystem, we strongly suggest that the end-user is considered as another party in Table 2 and appropriate guardrails are added for intentionally malicious interactions with AI. Certainly, malicious uses may be dealt with under existing laws, including the e-safety and data protection regimes. The point is that these uses should be foreseen and responded to ex ante.
Importantly, when placing guardrails, it is important not to forbid cases where “white-hat” end-users (e.g., researchers) find vulnerabilities in AI systems and report them – preventing malicious actors of exploiting these vulnerabilities for harmful purposes.
*** Proposal’s distinction between the deployer and the developer is blurry ***
In many circumstances, the deployer will need to adapt the model for their purposes, combine AI with their existing infrastructure. For example, consider a retailer interested in embedding a chatbot on their website. They may use a general chatbot as an initial model and then fine-tune it on their FAQs and product information to make the final chatbot specialised for their use case. To customise the chatbot ever further, they may use dialogues between their customers and past versions of the chatbot. In this example, the roles of the developer and deployer, as defined in Table 2 overlap.
We understand that there is some inevitable blurring between the roles of developer and deployer in any guidance or in this kind of proposals paper. However, we consider that the mandatory guardrails that are adopted and any resultant safety/responsible regulatory framework should be responsible to this mix of possible roles and avoid unnecessary duplication of responsibilities.
[7] Carlini et al., “Extracting Training Data from Large Language Models”. In USENIX Security Symposium 2021.
[8] Tramèr et al. “Stealing machine learning models via prediction APIs”. In USENIX Security Symposium 2016.
[9] Carlini and Wagner, “Audio Adversarial Examples: Targeted Attacks on Speech-to-Text". In Deep Learning and Security Workshop 2018.
[10] Carlini and Wagner, “Towards Evaluating the Robustness of Neural Networks”. In IEEE Symposium on Security and Privacy (SP) 2017.
[11] Hayes and Ohrimenko. “Contamination Attacks and Mitigation in Multi-Party Machine Learning”. In NeurIPS 2018.
[12] Nelson et al., “Misleading learners: Co-opting your spam filter”. In Machine Learning in Cyber Trust: Security, Privacy, Reliability 2009.
11. Are the proposed mandatory guardrails sufficient to address the risks of GPAI?
How could we adapt the guardrails for different GPAI models, for example low-risk and high-risk GPAI models?
We argued that the risk of the model depends on the use-case, and it is the use-case and how it intends to integrate AI that needs to be evaluated for risk. Moreover, each use case and deployment scenario present different risk and evaluation of risk should be associated with likelihood.
For example, it is known that a random guess of a password can circumvent security. However, likelihood of someone guessing a (strong) password is small. Similarly, a machine learning/AI model trained on images may not have seen all the possible images during its training (e.g., all photos used during training were taken at daylight or the black swan example). One can try to evaluate a likelihood of a behaviour of the AI model on input that it has not seen. This is where a threat model of where AI will be deployed (our recommendation for Guardrail 2) is also crucial as it will help understand the likelihood of a malicious actor having access to an AI model.
Rather than focus on GPAI, guardrails need to take into account the scenario and the use case of an AI model when evaluating and reporting risks.
12. Do you have suggestions for reducing the regulatory burden on small-to-medium sized businesses applying guardrails?
Please provide any additional comments
Given that large portion of business in Australia are SMEs, it is likely that AI adoption will be among SMEs. As such guardrail adoption and evaluation of AI risk may put a high burden on SMEs and require strong expertise in AI. To this end, option 3 may fit better in this case, where SMEs can receive help from an expert body on their use of AI.
13. Which legislative option do you feel will best address the use of AI in high-risk settings?
What opportunities should the Government take into account in considering each approach?
We consider that effective regulation of AI requires expertise to allow a responsible approach that does not undermine innovation but also ensures safe and responsible design, deployment and use of AI in Australian Society. This is best achieved through an independent regulator with specific expertise in AI. We do not see this option introducing undue problems of inconsistency or complexity given the focus would be on AI safety standards and targeted ex-ante.
15. Which regulatory option(s) will best ensure that guardrails for high-risk AI can adapt and respond to step-changes in technology?
Upload 1