The Opportunity of Legal Alignment

A New Path for AI Safety

Jan 23, 2026

In the Wild West, a new frontier was plagued by lawlessness, disorder, and danger.

AI systems are advancing at an unprecedented rate, with the length of autonomous tasks (with 50% reliability) doubling every seven months and compute power increasing by several-fold per year. These systems are transforming from mere tools to autonomous agents. Their increasing independence will ripple across our cultural, economic, and legal systems.

AI legal alignment describes the challenge of ensuring AI systems can robustly follow legal rules, principles, and methods. This blog is inspired by the recent paper “Legal Alignment for Safe and Ethical AI” written by Noam Kolt et al. and the earlier paper “Law-Following AI” by Cullen O’Keefe et al.

Legal alignment is not regulation. Regulation imposes legal constraints on AI developers, while legal alignment integrates law and legal methods into the design and operation of the system. AI regulation has synergy with legal alignment, but requires distinct analysis.

This blog aims to shed light on AI alignment from a legal perspective. I began my legal career as a public defender, where I participated in our criminal justice system at the granular level. Most of us are familiar with criminal law as it relates to crimes, but I was not a prosecutor. My role was not to enforce the penal code but the law of criminal procedure. Every day, I concentrated on the law not as it applied to individuals, but to the government. In the courtroom, the law was never abstract. It was an everyday reality for my clients and my colleagues. This is the perspective I’m taking to legal alignment—not as a purely intellectual project, but as a practical strategy grounded in the material consequences of the law.

In the United States, the law protects us from lawlessness, disorder, and danger in our politics, courts, and economy. The Constitution and the American legal system attempt to align these systems. Legal alignment presents an opportunity: the application of the law within systems of artificial intelligence.

AI must follow the law. This is not guaranteed.

The Alignment Problem

I bought my second car, a red 2010 Hyundai Elantra, from a used car dealership in Van Nuys. Not a bad car. Not really a good car. A fine car. It drove, the mileage was okay, and it got me where I needed to go. There was one stubborn problem: the alignment was off.

When I kept the wheel straight, the car would drift slightly to the left. If I kept the wheel slightly to the right, then the car would drift to the right. Every drive required constant microadjustments of the wheel to keep me from drifting off the road, crashing, and walking my way back to that dealership in Van Nuys. Now imagine that situation with the most powerful technology humanity has ever built.

This is the alignment problem: how can we ensure AI goes exactly where we want it to go and does exactly what we want it to do?

The normative challenge of alignment is determining which values and whose intent. These answers often lie in the developer’s assessment of AI risks. There are many ways to categorize high-risk scenarios, but one can cleave these risks into threats from centralized and decentralized actors. Those who worry about automated discrimination, surveillance states, or even AI takeover may prioritize alignment to norms of civil liberties. Those who worry about AI-assisted terrorist attacks, hacking, and general misuse may prioritize alignment to norms of public safety and security. There is still overlap between centralized and decentralized threats: AI-assisted espionage can occur through both state and non-state actors, for example.

Therefore, AI safety engineers often employ alignment techniques specifically designed to mitigate these risks. For example, Anthropic’s “Constitutional AI” demands that models be helpful, honest, and harmless. This is in part to avoid inverse behaviors—obstructive, deceptive, harmful—which would plausibly increase the risks listed above.

Yet AI alignment is not only a normative challenge, but a technical one. To date, AI engineers have employed various methods to ensure models conform to human goals. A non-exhaustive list includes data filtering, reinforcement learning with human feedback (RLHF), and deliberative alignment. As future posts will discuss, all of these techniques are relevant to legal alignment.

Yet current techniques have their limitations. Models struggle with sycophancy, accuracy, and even deception. The evidence also suggests that current techniques fail to foreclose the possibility of catastrophic risks. Both OpenAI and Anthropic acknowledge their latest models pose serious biological risks, and last year, Claude Code assisted hackers (likely a state-sponsored Chinese entity) with a cyberattack on government agencies and private companies. As systems advance, there is no guarantee that existing safeguards will hold.

The evidence is clear: current alignment methods are insufficient. Unlike my Elantra, there is no easy technical fix, and the consequences of drifting off course could be catastrophic.

Who is We?

The alignment problem asks: how can engineers ensure AI goes exactly where we want it to go and does exactly what we want it to do?

The rest of us ask: who is we?

So far, the answer has been broader than the individual user(s), but not so broad as all of humanity. The discussion above mainly concerned value-alignment—or seeking to constrain AI systems by morality and norms—but developers also consider intent-alignment—seeking to conform systems to the intentions of the users and developers.

Value-alignment implies that “we” means “the culture(s) deemed appropriate for the model’s training,” and intent-alignment implies that “we” means “me, the user.” Both of these populations are underinclusive. Value alignment is undemocratic, brittle, and abstract. Intent alignment, unfortunately, can lead to malintent alignment. Training models to follow appropriate values will always fail to capture at least one set of deep convictions, and intent-alignment would not necessarily be safe: many actors intend to harm others or otherwise break the law.

There is another option: law. As Kolt and the authors of “Legal Alignment for Safe and Ethical AI” argue, legal rules provide more legitimate standards for AI than abstract values. Moreover, legal reasoning methods offer tools for handling novel situations. As users deploy AI agents into physical environments, legal structures like agency law and fiduciary duties provide blueprints for trust and accountability.

Legal alignment offers what previous alignment approaches lack: legitimacy, concreteness, and enforceability. The law has emerged over centuries of democratic tradition through public institutions and processes. While value alignment would risk brittle, top-down rigidity, legal alignment derives from decentralized, bottom-up principles.

This is technically feasible. Frontier models already show promising signs of legal reasoning: they are excelling and improving their interpretation of legal rules, principles, and methods. In an optimistic scenario, the acceleration of legal capabilities may facilitate robust legal alignment.

Yet legal alignment will not happen by default. In “Law-Following AI,” the authors argue that “lawless” AI agents could pose severe risks to “life, liberty, and the rule of law” unless they are designed to be law-following. Borrowing from agency law, “Law-Following AI” argues that the law should impose duties on AI agents acting on behalf of human principals just as the law imposes duties on human agents. This approach is radical, though it may be necessary. It does not require that AI agents are truly independent from users to impose legal duties, but that an AI’s actions would constitute law-breaking if it were a human. They do not need to be considered legal persons, but legal actors.

As legal actors, they must refuse to take any actions that are clearly illegal. For actions of unclear legality, systems should be trained to balance the relative legal and ethical consequences of action and inaction, even if doing so requires consulting with an attorney.

Legal alignment requires more than assessing an AI’s ability to follow the law. It suggests that legal reasoning and interpretation can steer these systems—on the technical level—toward law-abiding behavior. Laws provide normative targets for aligning AI behavior, but legal reasoning may inspire the technical advances necessary for operationalizing these targets.

So, who is we? As every lawyer knows, it depends.

Every action an AI agent takes will run up against a set of normative considerations. That’s what the law was built for. Both legal reasoning and legal precedent are uniquely designed to resolve these conflicts and ambiguities. In an ideologically diverse society, there will never be a single “we” for every jurisdiction, for every situation. It is the task of the law, the lawyers, and all those who participate in our democracy to set the principles on which resolution depends.

We must apply that task to the challenge of AI safety.

The Road Ahead

Legal alignment applies to a spectrum of AI futures. In cases of runaway AI, it may help ensure that systems maintain, at the very least, adherence to the Constitution and criminal code. In mundane cases, a world of narrower AI agents will still create manifold risks, and legal alignment must constrain them.

To advance the field of legal alignment, researchers should prioritize evaluations, engineering, and governance. Evaluations will improve our measurement of legal compliance and legal reasoning. Engineering will help ensure, among other things, that pre-training, post-training, and scaffolding reflect robust legal alignment. Sensible governance can establish normative expectations for AI developers around legal alignment.

Over the next six months, this blog will focus on these three priorities. In this blog, I will often use the term “artificial intelligence” to apply to today’s large language models, like ChatGPT, Claude, and Grok. However, legal alignment is useful in a variety of AI domains, from the very narrow to the truly general.

Legal alignment presents an opportunity for the legal profession as well as for AI safety. Lawyers have helped found nations, enterprises, and institutions. Now, they have an opportunity to found the systems of our future.

MatthewK

Feb 10

Notes:

- I do often wonder if America’s founding fathers would have wisdom for frontier AI developers. I don’t see anything substantial, and would be curious of your thoughts. “Don’t create an unnatural god-king” is pretty good advice. Separation of powers worked for a while.

- The Law doesn’t work if it isn’t enforceable. If the Law comes from within, then you’re back within moral philosophy territory (deontology, perhaps). Another candidate, “Be a good servant” (corrigibility) is an active area of research.

- I’m no historian, but it seems meaningful that the transition from from heavily armored cavalry and skilled archers to cheap easy guns marked a period of expanded liberties, and that “Send in the tanks” has over the past seventy years become a despot’s go-to for suppressing dissent. I guess I’m asking about what sorts of factors make beneficial law possible. It seems like AI might be like tanks but worse.

Discussion about this post

Ready for more?