A Response to Proposals for AI Legislation in Australia
A direct copy-paste of the answers I provided to the Australian Government
The below is my personal response to the Australian Government’s request for public submissions regarding their Proposals Paper for Introducing Mandatory Guardrails for AI in High-Risk Settings (pdf).
A TLDR of my view is that nearly all of the risk from AI comes from superintelligent AGI, and that this risk may eventuate regardless of the actual or intended use of the systems. Therefore, I advocate for regulation focused entirely on the safe development and first deployment of future frontier models, and broadly against regulation of AI after it is developed or deployed anywhere in the world.
1. Do the proposed principles adequately capture high-risk AI?
No. In my opinion, the central risk from AI is existential catastrophe caused by the AI itself behaving in a way that nobody intends.
The risk scenario here is very simple -- if we create a machine that is very capable of achieving abstract goals, it may succeed in achieving those goals even at the expense of humanity’s interests.
Since any system would be better at achieving its goals if it were smarter, had more resources, and was not turned off, a general purpose AI could easily adopt gaining intelligence, acquiring resources, and avoiding shutdown as instrumental subgoals, and end up in resource conflict with humanity. This is a well-known problem, and there is not currently any reliable technical solution to avoid such conflict.
This risk applies regardless of the developers’ or users’ intentions [1], and it is possible that it will eventuate this decade [2]. Current AI systems show many signs of this failure mode -- for example, in testing, OpenAI’s o1-preview model attempted to fake alignment in order to be deployed when given conflicting goals in a toy scenario [3]. There are many examples of agents succeeding in their task in ways unintended by their developers [4].
While this problem is gestured at in the section on ‘loss of control’ of AI, and in the discussion of an ex-ante approach to regulation (which I support), I don’t think this paper adequately addresses or explains this type of risk. Specifically, there are two key ideas that deserve more emphasis:
If such a risk eventuates, it does not matter where in the world the system is developed or deployed -- it would affect Australians regardless. This means such regulation must be international -- in the same way that it doesn’t matter who starts a nuclear war, we will inevitably be affected. For this reason I support the view that we should match international standards, but we must also apply diplomatic pressure to other countries to follow, and make such standards as simple and focused as possible.
There is currently no adequate technical solution to the problem of how to make a superhuman GPAI system safe. Funding the development of technical solutions is essential, such as through an AI Safety Institute or public research grants.
I want to emphasise that existential risk from AI is not a fringe view. Nearly all of the most eminent researchers and business leaders in the field have signed the statement “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” [5]
Notwithstanding existential risk from frontier models, it is also important that any AI regulation put in place in Australia does not seriously prevent the use of existing AI, or the development of AI which is unlikely to pose an existential threat. The US company Anthropic did not offer their AI chatbot ‘Claude’ in the EU for over a year because of the regulatory burden in that jurisdiction -- it is very important Australia remains a place where deploying and using AI is easy and attractive.
This goal is not at odds with the above view on reducing existential risk, as it is the development of extremely capable models without adequate safeguards which poses the risk, not their use in any particular jurisdiction. As such, regulation to address existential risk should primarily apply ex ante to new AI systems that push the frontier of capabilities forward, not to the use or development of lower-capability systems which are already developed and deployed elsewhere in the world.
[1] For more information on existential risk from AI, this report provides an overview: https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
[2] For more information on the possibility of general purpose AI surpassing humans this decade, this report makes that argument: https://www.lesswrong.com/posts/arPCiWYdqNCaE7AQv/superintelligent-ai-is-possible-in-the-2020s
[3] From the OpenAI o1 System Card, section 3.3.1 (pdf): https://cdn.openai.com/o1-system-card.pdf
[4] Specification gaming examples, where models trained to achieve some task do so in a way that achieves the letter, but not the spirit, of the reward function. https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRPiprOaC3HsCf5Tuum8bRfzYUiKLRqJmbOoC-32JorNdfyTiRRsR7Ea5eWtvsWzuxo8bjOxCG84dAg/pubhtml
[5] https://www.safe.ai/work/statement-on-ai-risk
2. Are there any principles we should add or remove?
No. It is not the specific principles on how to 'designate an AI system as high-risk due to its use' that concern me; it is the fact that future general purpose AI may become extremely dangerous regardless of its use.
3. Do you have any suggestions for how the principles could better capture harms to First Nations people, communities and Country?
/
4. Do the proposed principles, supported by examples, give enough clarity and certainty on high-risk AI settings and high-risk AI models? Is a more defined approach, with a list of illustrative uses, needed?
No. The examples largely refer to problems caused by AI incompetence -- many of these problems will disappear as AI systems become more capable and stop making mistakes like misidentifying people, flagging false positives in content moderation, or perpetuating stereotypes.
I think it makes sense to distinguish these problems from the risks that come about because the AI systems are increasingly capable. Problems from AI incompetence will naturally reduce over time as the systems improve, but problems caused by AI capabilities will increase. We should focus our efforts only on the second category.
Examples of risks caused by AI becoming very capable include:
Biorisks, such as making it easier to develop deadly diseases.
Deepfakes & misinformation, such as by producing life-like videos and cheaply spreading them.
Cyberattacks, such as the automated discovery and exploitation of zero days, or self-replication of systems that blackmail or manipulate users.
Existential catastrophe from resource conflicts with superhuman AI systems.
These risks become more likely over time as the systems improve, and our regulation needs to be proactive about addressing them, especially where they can occur regardless of intent.
5. Are there high-risk use cases that government should consider banning in its regulatory response (for example, where there is an unacceptable level of risk)?
The onus must be on developers to make reasonable efforts to ensure their AI models will not cause catastrophic harm.
Models below some threshold of capability -- which can be very roughly approximated by the compute or monetary cost of training the model -- are not a risk here. Therefore, developers of non-frontier models should have very few, if any, restrictions on their development and use.
The development of models that do have a non-trivial chance of being capable enough to directly cause a major catastrophe must proceed extremely carefully. This could include requirements to allow capabilities and alignment testing by third parties at various points during training, and requirements to demonstrate that reasonable steps to align the models have been taken, amongst other things.
I am emphasising GPAI capabilities, rather than use cases, because I believe that the primary risk comes from extremely capable general AI, rather than from AI used in any particular high-risk environment.
6. Are the proposed principles flexible enough to capture new and emerging forms of high-risk AI, such as general-purpose AI (GPAI)?
I don’t think they are ideal. It seems like this proposal is primarily concerned with AI usage from the 2010s, and is not proactively looking ahead to what GPAI may be capable of in the coming years.
It is essential to regulate GPAI based on likely capabilities, and avoid the development of systems that can pose an unacceptable risk of catastrophe. In particular, I believe we should consider:
Mandatory disclosure of frontier training runs, where ‘frontier’ is defined as models which cost more than ~US$100m to train, or which will be trained with total FLOP within two orders of magnitude of the current largest models, no matter where in the world this occurs.
Mandatory ability to shut down frontier models.
A licensing system to allow the training and release of frontier models, where such licences are granted in stages (pre-training, during training, and deployment), and granted only if the model is unlikely to pose a catastrophic risk as assessed by an independent body and compared against the current best practices in AI alignment.
Significant funding to develop technical solutions to AI alignment.
California’s SB-1047 [6] is a good example of a law in this direction. Non-catastrophic risks can largely be covered under existing legislation.
[6] https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=202320240SB1047
7. Should mandatory guardrails apply to all GPAI models?
No, only frontier models, or models above a capability threshold known to be risky.
8. What are suitable indicators for defining GPAI models as high-risk?
Being above some FLOP threshold of training plus inference-time compute, such as above 10^26 FLOP, as in the case of California’s SB-1047. A sliding limit of ‘within a factor of 10^2 of the training FLOP of the current most capable system, or with compute greater than the weakest system currently assessed as high risk’ would be a better way of measuring this, in my opinion (a rough sketch of such a rule follows this list).
Being above a capability threshold, as assessed by some suite of benchmarks, especially ones focused on self-improvement, deception, and exfiltration capabilities.
Applying new techniques for the first time at scale, if they could plausibly result in a much improved system (for example, if a 70B parameter model with a new architecture achieved performance on par with frontier models, the first subsequent training of a significantly larger model using those same techniques should be treated as high risk).
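To make the sliding limit above concrete, here is a minimal sketch of how such a compute-based test could be expressed. The function name, constants, and example numbers are my own illustrative assumptions, not figures from the proposals paper or the legal text of SB-1047 (beyond the 10^26 FLOP figure mentioned above).

```python
# Hypothetical sketch of a compute-based high-risk test. Names and constants
# are illustrative assumptions, not drawn from the proposals paper.

ABSOLUTE_FLOP_THRESHOLD = 1e26  # fixed threshold, as referenced from SB-1047
SLIDING_FACTOR = 1e2            # 'within two orders of magnitude of the frontier'

def is_high_risk(training_flop: float,
                 frontier_flop: float,
                 weakest_high_risk_flop: float | None = None) -> bool:
    """Classify a planned training run as high risk on compute alone.

    training_flop          -- estimated total compute of the planned model
    frontier_flop          -- compute of the current most capable system
    weakest_high_risk_flop -- compute of the weakest system already assessed
                              as high risk, if any such assessment exists
    """
    if training_flop >= ABSOLUTE_FLOP_THRESHOLD:
        return True  # above the fixed 10^26 FLOP threshold
    if training_flop >= frontier_flop / SLIDING_FACTOR:
        return True  # within a factor of 10^2 of the current frontier
    if weakest_high_risk_flop is not None and training_flop >= weakest_high_risk_flop:
        return True  # more compute than a system already assessed as high risk
    return False

# Example: a 3x10^25 FLOP run while the frontier sits at 5x10^26 FLOP
print(is_high_risk(3e25, 5e26))  # True -- within two orders of magnitude
```

The point of the sliding term is that a fixed number like 10^26 FLOP goes stale as the frontier moves, whereas a ratio against the current most capable system does not.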
9. Do the proposed mandatory guardrails appropriately mitigate the risks of AI used in high-risk settings?
Somewhat. Guardrails 1 through 4, 9, and 10 will help reduce catastrophic risk from AI, in my opinion, although establishing process and procedure is not enough -- we need technical solutions to AI alignment, and then we need to ensure they are applied to upcoming frontier systems.
It’s also not clear that these guardrails will be applied in that spirit -- with the diligence of nuclear weapons regulation, where failure could mean catastrophe.
10. How can the guardrails incorporate First Nations knowledge and cultural protocols to ensure AI systems are culturally appropriate and preserve Indigenous Cultural and Intellectual Property?
/
11. Do the proposed mandatory guardrails distribute responsibility across the AI supply chain and throughout the AI lifecycle appropriately?
Broadly, yes.
12. Are the proposed mandatory guardrails sufficient to address the risks of GPAI?
No. The technical problem of how to align GPAI which is more capable than people has not been solved. It is essential to dedicate significant resources to trying to solve this problem, and ensuring that any solution that may be found is applied to all models everywhere in the world.
Establishing an AI Safety Institute which can oversee research and offer grants or prizes to work in this area would be useful, as would cooperation with and encouragement of similar work overseas.
Guardrails 1 through 4 are helpful here regardless, if they are applied with an understanding of the severity of the risks in frontier development, and an understanding that GPAI systems themselves may be adversarial and produce different behaviour when they know they are being tested. Today's models already show some awareness of when they are being tested, and they can sometimes discover this in unexpected ways. For example, Claude 3 Opus, when tested on a ‘needle in a haystack’ text retrieval task, commented that the ‘needle’ text was so out of place that it suspected it was inserted for a test [7].
[7] https://x.com/alexalbert__/status/1764722513014329620
13. Do you have suggestions for reducing the regulatory burden on small-to-medium sized businesses applying guardrails?
Yes, we should avoid putting any significant guardrails on non-frontier GPAI models, or on the use of GPAI models after they are developed and demonstrated to be safe, such as GPT-4 and Claude 3. Such systems are not inherently any more risky than ordinary software.
The risk is primarily at the development frontier. Only the developers of models that significantly improve on state-of-the-art capability should be required to prove the models are safe. Once (and if) they pass this safety validation, subsequent uses should be largely allowed without additional regulatory burdens -- except perhaps a requirement to shut down the models if a safety issue arises that was missed by pre-release testing.
14. Which legislative option do you feel will best address the use of AI in high-risk settings?
A ‘whole of economy’ approach is required for the development and initial deployment of frontier GPAI. This approach needs to be narrow and focused, which should help encourage its duplication by other jurisdictions, but also stringent in enforcement.
I would strongly recommend producing a separate act focused entirely on frontier GPAI systems, similar to California’s SB-1047. Given how different the catastrophic risks from the development of such systems are from the risks of using narrow AI, it’s not clear that combined regulation makes sense.
Other AI-related regulation -- including everything related to the development of narrow AI, and to applications of GPAI that is already known not to pose a catastrophic risk -- should be handled separately. I do not have a strong opinion on how to do this, but Option 2 appears to be the best for this purpose.
15. Are there any additional limitations of options outlined in this section which the Australian Government should consider?
We do not currently know how to align GPAI that is more capable than humans. If we require such systems to be safe, we must also provide resources to develop ways to make them safe.
16. Which regulatory option(s) will best ensure that guardrails for high-risk AI can adapt and respond to step-changes in technology?
A ‘whole of economy’ approach, with clear, straightforward requirements on the development of frontier GPAI systems, possibly alongside a limited, clarifying framework on the use or development of non-frontier, comparatively low-risk systems.
17. Where do you see the greatest risks of gaps and inconsistencies with Australia’s existing laws for the development and deployment of AI?
There are several problems that need to be touched on here; I will list them out separately.
As I’ve emphasised, the biggest gap is that the development of superhuman AI poses a significant risk independent of its usage, and this situation is not adequately covered by any existing regulation in Australia.
The risk from superhuman GPAI will occur regardless of which jurisdiction the AI is developed in. To combat this risk, Australia needs to adopt regulation that is easy for other jurisdictions to replicate, encourage international cooperation on reducing catastrophic risk from superhuman AI (such as through additional treaties), and fund technical research into superhuman GPAI alignment which can be freely shared internationally.
The AI Safety Summit held at Bletchley Park in November 2023 is an example of the international cooperation we require, and Australia becoming a more active player here (with both technical solutions and regulatory recommendations) would be ideal.
I believe there is a lack of clarity around copyright rules for developing AI, though this may be my own personal lack of clarity, and I am less sure on this point than on the other opinions I have expressed here. I am of the opinion that we should allow developers to use copyrighted works to train any and all AI models, provided they acquired a copy of those copyrighted works fairly under existing laws (for example, buying a copy of a book or scraping internet data which is freely available).
Regulatory burdens which delay deployment of existing AI within Australia would be unfortunate for Australian consumers and businesses. The EU is already in this situation, with deployment of AI models from Anthropic, OpenAI, and Apple all being delayed.
Even small delays -- on the order of a few months -- could significantly impact the competitiveness of Australian developers, and disadvantage consumers who would otherwise use this technology. The relative performance of AI is moving so quickly that using a model six months out of date is a significant detriment to many users, especially those working in the field of AI.
It is very important any regulation is focused and limited, ideally limited only to the development and first deployment of frontier GPAI systems that could pose an existential risk.

