From last Wednesday’s decision in X.AI LLC v. Bonta, by Judge Jesus Bernal (C.D. Cal.):
Assembly Bill 2013 …, entitled “Artificial Intelligence Training Data Transparency,” requires developers of “a generative artificial intelligence system or service” that is “publicly available to Californians for use” to “post on the developer’s internet website documentation regarding the data used by the developer to train the generative artificial intelligence system or service.” The documentation must include “[a] high-level summary of the datasets used in the development of the generative artificial intelligence system or service” addressing, but not limited to, twelve enumerated topics. Those topics include:
(1) The sources or owners of the datasets.
(2) A description of how the datasets further the intended purpose of the artificial intelligence system or service.
(3) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets.
(4) A description of the types of data points within the datasets….
(5) Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain.
(6) Whether the datasets were purchased or licensed by the developer.
(7) Whether the datasets include personal information ….
(8) Whether the datasets include aggregate consumer information ….
(9) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service.
(10) The time period during which the data in the datasets were collected, including a notice if the data collection is ongoing.
(11) The dates the datasets were first used during the development of the artificial intelligence system or service.
(12) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development….
The court concluded that the law likely didn’t violate the First Amendment. It first concluded that the law likely compelled speech only in the context of commercial speech:
The Ninth Circuit has permitted “compelled disclosures” even in cases that “did not ‘propose a commercial transaction,'” where the statutes “nonetheless provided parties to ‘actual or potential’ commercial transactions with information about those transactions.” That is precisely what A.B. 2013 does. In the marketplace of AI models, A.B. 2013 requires AI model developers to provide information about training datasets, thereby giving the public information necessary to determine whether they will use—or rely on information produced by—Plaintiff’s model relative to the other options on the market. After all, “[p]art of the reason that the First Amendment protects commercial speech is that such speech furthers the consumer’s interest in the free flow of commercial information.”
Plaintiff complains that A.B. 2013 “forces developers to publicly disclose their data sources in an attempt to identify what California deems to be ‘data riddled with implicit and explicit biases.'” That language comes from the California Labor Federation’s arguments in support of the statute, not any legislator’s statements or any language in the adopted bill itself. Plaintiff also asserts that A.B. 2013 “indirectly attempts to influence the viewpoints espoused by xAI’s models (i.e., their outputs) by targeting the data that goes into them.” But nothing in the language of the statute suggests that California is attempting to influence Plaintiff’s models’ outputs by requiring dataset disclosure, rather than simply providing consumers with the information necessary to make judgments about Plaintiff’s—and all other AI model developers’—model quality based on the data that goes into them.
There is nothing political, for example, about a consumer wanting to know if certain medical data or scientific information was used to train a model so that the consumer can evaluate whether the model is likely to be sufficiently comprehensively trained and reliable for the consumer’s purposes. Certainly some may use these disclosures to select or avoid certain models based on perceived political biases in training datasets, but that is only one of many potential metrics for consumer evaluation—and one that consumers in the AI model marketplace are entitled to consider when choosing their model. No part of the statute indicates any plan to regulate or censor models based on the datasets with which they are developed and trained….
And it then held that, as a regulation related to commercial speech, the law is likely constitutional:
Given that the Court has found that Plaintiff has failed to carry its burden to show likelihood of success on the merits of its claim that A.B. 2013 regulates non-commercial speech, the Court considers the level of scrutiny appropriate. Plaintiff contends that Zauderer v. Office of Disciplinary Counsel of Supreme Court of Ohio (1985), provides the appropriate standard, but Central Hudson would appear to be more on-point given existing caselaw.
While the Court might be inclined to find that A.B. 2013 regulates speech that is purely factual, noncontroversial, and not unjustified or unduly burdensome—the Zauderer standard—the Supreme Court’s limited use of Zauderer outside of misleading advertisement regulations counsels against its application in this case. In Milavetz, Gallop & Milavetz, P.A. v. U.S. (2010), the Court noted that it had previously employed Central Hudson in a case involving advertising statements that were not inherently misleading and not likely to mislead consumers. This case is even further afield from the original context under which Zauderer arose, tipping the scale toward Central Hudson.
For A.B. 2013 to “survive intermediate scrutiny under Central Hudson, the State must establish that the law directly advance[s] a substantial governmental interest, and [that] the means chosen [are] not … more extensive than necessary.” Plaintiff’s allegation that “it is far from clear how the trade secrets A.B.2013 would force xAI to disclose are of any value to consumers at all” is not especially compelling. It strains credulity to essentially suggest that no consumer is capable of making a useful evaluation of Plaintiff’s AI models by reviewing information about the datasets used to train them and that therefore there is no substantial government interest advanced by this disclosure statute.
At the same time, it may be possible through the litigation process to demonstrate the limited utility of high-level dataset summaries for important consumer decisionmaking or that the state’s approach with A.B. 2013 is “more extensive than necessary” to achieve the goal of transparency for consumers. While “‘consumer curiosity’ alone is generally insufficient as a substantial state interest,” litigation may reveal that “the States asserted interests here are not limited to transparency for its own sake.” It simply remains to be seen.
Ultimately, Plaintiff has demonstrated a distinct possibility of prevailing on the merits under Central Hudson. But it has not demonstrated a likelihood of success on the merits. The information before the Court is insufficient to come to such a conclusion at this stage. Plaintiff therefore does not satisfy this threshold inquiry for a preliminary injunction on its First Amendment claim….
And the court held the law was likely not unconstitutionally vague:
The statute requires AI model developers like Plaintiff to publish “[a] high-level summary of the datasets used in the development of the generative artificial intelligence system or service.” The “high-level summary” must include, but is not limited to, disclosures on a variety of topics touching sources and owners of datasets; the size of datasets; the period of collection of the data within the datasets; and other information.
Even if the term “high-level summary” is not the picture of clarity standing alone, it is followed by a precise list of information to be included. Plaintiff takes issue with “dataset” and “data point” being undefined in the statute, yet Plaintiff seems to understand and use with ease “dataset” throughout its Complaint. Plaintiff questions the meaning of “dataset” and “data point” in its Complaint and offers various interpretations, but has not actually alleged that this term is ambiguous by industry standards—especially given that Plaintiff appears to know what “dataset” refers to in other parts of its Complaint.
Plaintiff also takes issue with the statute’s list of information being non-comprehensive, because there is apparently “no way of knowing what additional information must be provided to fully comply with that obligation.” But the Ninth Circuit has been clear that “criteria are [not] vague simply because they fail to delineate a set of factors.” Here, there is a list of information required akin to a set of factors—it is simply non-exhaustive. Given that a statute entirely lacking a list of factors can still be sufficiently clear, it is likely that a non-exhaustive list is enough.
Plaintiff’s other arguments are similarly insufficiently persuasive at this stage, absent a better-developed record, to find a likelihood of success on the merits. Plaintiff takes issue with an apparent discrepancy between the disclosure requirements for “training” data versus “development” data. Determining the meaning of the statute will require further development of the record, including on legislative intent and those terms’ usage in the industry.
With respect to which systems the statute covers, Plaintiff questions whether it must make disclosures for licensed systems or incorporated and optimized systems developed by others. But Plaintiff has not alleged facts to suggest it has systems that fall into those categories. Plaintiff has been clear that it is presenting an as-applied, not facial, challenge to this statute. Thus, Plaintiff must actually face such a conundrum—rather than raising an abstract possible issue among AI systems developers—for the Court to make a determination on this issue.
Ultimately, the record at this stage is insufficiently developed for the Court to determine that Plaintiff is likely to succeed on the merits of its vagueness challenge. Evidence may arise during the course of litigation that eventually requires a different determination. But the pleadings and record as they stand are not enough at this time….
Joseph Henry Meeker of the California Justice Department and Kristin A. Liska of the AG’s office represent the state.