OpenJDK Interim Policy on Generative AI
The field of generative AI is evolving quickly. It brings compelling opportunities to improve developer productivity, but it also brings risks: to reviewer burden, to safety and security, and to intellectual property.
Oracle, as the corporate sponsor of the OpenJDK Community, is working to draft a full policy governing the use of generative AI tools in OpenJDK contributions. Oracle will propose that policy to the OpenJDK Governing Board in due course. Until that policy is in place, the Governing Board has approved this interim policy:
Contributions in the OpenJDK Community must not include content generated, in part or in full, by large language models, diffusion models, or similar deep-learning systems. Content, in this context, includes but is not limited to source code, text, and images in OpenJDK Git repositories, GitHub pull requests, e-mail messages, wiki pages, and JBS issues.
Contributors in the OpenJDK Community may use generative AI tools privately to help comprehend, debug, and review OpenJDK code and other content, and to do research related to OpenJDK Projects, so long as they do not contribute content generated by such tools.
This interim policy aims to encourage the use of generative AI tools in ways that limit their risks while we gain further experience that will inform the full policy.
Frequently Asked Questions
What are the risks to reviewer burden of using generative AI tools?
Generative AI tools, by their nature, make it easy to create large quantities of plausible-looking code, with plausible-looking tests, which is nonetheless incorrect or, even if correct, poorly designed and therefore difficult to maintain. Reviewing submissions of such code can easily become a drain on the already limited time of human reviewers. For this reason, some open-source communities have limited, if not banned, the submission of code created by generative AI tools.
What are the risks to safety and security of using generative AI tools?
The JDK, developed and maintained in the OpenJDK Community, is the primary implementation of the Java Platform. It sits at the foundation of mission-critical systems in businesses, governments, and other organizations around the world. Safety and security are paramount. Plausible-looking but incorrect code would put these critical properties at risk.
What are the intellectual-property risks of using generative AI tools?
The Oracle Contributor Agreement (OCA) requires that a contributor own the intellectual property rights in each contribution and be able to grant those rights to Oracle, without restriction. Most generative AI tools, however, are trained on copyrighted and licensed content, and their output can include content that infringes those copyrights and licenses, so contributing such content would violate the OCA. Whether a user of a generative AI tool has IP rights in content generated by the tool is the subject of active litigation.
Despite these risks, generative AI tools can provide significant value. Are OpenJDK contributors forbidden from using them altogether?
No. As the policy says, you are welcome to use such tools to help comprehend, debug, and review OpenJDK code and other content. Anecdotal evidence from other communities suggests that analysis of existing code, rather than creation of new code, is where generative AI tools shine for established projects with large code bases. This is consistent with our experience thus far.
What does it mean to use generative AI tools “privately”?
The intent of that term is to emphasize that you may use such tools on your own, without contributing the content that they generate. It does not mean that you cannot, e.g., share and discuss the output of such tools with a colleague. When sharing such content, consider adding prominent comments that identify it as being AI-generated.
Is it okay to continue using the spell-checking, grammar-checking, auto-completion, and refactoring features in my editor or IDE?
Yes, so long as they are not based on large language models or similar deep-learning systems.
Is it okay to use a generative AI tool to review draft JEPs, JavaDoc, or other documents, so long as I write all of the text myself?
Yes. This is clearly a case of using a generative AI tool to review content, which is fine.
If I use a generative AI tool to create 100 lines of code, and then edit ten of those lines myself, may I contribute the result?
No. Your contribution would still include, in part, AI-generated code.
Can we improve any of our tooling to help remind contributors of this policy?
Yes. We will shortly reconfigure Skara to add a checkbox to the body of each pull request on GitHub. When you create a pull request, you must check the box to affirm that your contribution is in accordance with the policy. More details, including how to add the checkbox to the body of an existing pull request, are available in the wiki.
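For example, the affirmation might appear in the pull-request body as a GitHub task-list item along these lines (the exact wording is determined by the Skara configuration, so this text is only illustrative):

    - [ ] I affirm that this contribution complies with the interim policy on generative AI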
In an OpenJDK Project, is it okay to add a feature that calls out to an external AI service?
That depends upon the service’s terms of use, so this amounts to a legal question. Be aware that many such terms place strict limits on how the service may be used. Consult your attorney, or your employer’s attorney, as appropriate, and make sure that everyone with a vested interest in your Project has also consulted appropriate attorneys.
As a Reviewer in an OpenJDK Project, am I responsible for detecting when a contributor has submitted code or other content created with a generative AI tool?
In this role you are already expected to do your best to ensure that incoming contributions are consistent with OpenJDK Community policies and conventions. In general, reliably distinguishing human-generated content from AI-generated content is impossible. If, however, you see evidence that content in a contribution was created with a generative AI tool, then it is your responsibility to notify the contributor of that fact. If the contributor does not respond positively and remove the content, please bring that to the attention of the appropriate Project Lead.
What are some tell-tale clues of content created by generative AI tools?
Sometimes it is obvious, for example when a commit message in the personal fork from which a contributor initiates a pull request includes a Co-Authored-By trailer line that gives credit to a specific generative AI tool. Other times it is more subtle, for example when a contributor’s comments in a pull-request conversation or an e-mail message are in a chatty, verbose style inconsistent with their past writing. Other clues include highly structured comments with multiple headings, unnecessary comments in code, gratuitously defensive programming, and the use of emoji characters.

Generative AI tools are evolving rapidly, so clues that are effective indicators today might not be effective indicators tomorrow. In general, if something in a pull request seems uncannily cheerful or meticulous, then you could be looking at AI-generated content.
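For illustration, such a trailer follows the standard Git trailer syntax at the end of a commit message; the tool name and address here are hypothetical:

    Fix off-by-one error in range check

    Co-Authored-By: ExampleAI Assistant <noreply@example.com>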