Open Source Development for Public Health Informatics

Open source development is a viable option for public health informatics software. While each project should select an open source license based on its own requirements, the R&D Lab uses the Apache License, Version 2.0 (ASL) for many of its projects.

For those who are unfamiliar with open source, you may enjoy watching this 5-minute video that explains it using LEGOs. (link)

Overview:

The R&D Lab, as a CDC government system within the Center for Surveillance, Epidemiology, and Laboratory Services (CSELS), is frequently asked how to use open source development in collaboration with partners both inside and outside the agency. Specifically, we are asked how we selected an open source license that includes the necessary protections for the R&D Lab as a project sponsor and for the Lab’s partners as peer contributors and committers. While each project will evaluate and select a development methodology and software license based on the needs of its stakeholders, the Lab consulted with CDC’s legal counsel to review and select a software license that allows us to share source code with as wide an audience as possible.

This brief review shares our experiences in the hope that they may be helpful to other organizations within the public health informatics community that are interested in using open source to improve transparency and increase software reuse.

Although some large projects and government entities do have specific “Open Source Policies” [1], this review does not speak for the CDC and certainly does not establish any requirements.

Background:

Open source projects are increasingly common at the Office of the National Coordinator for Health IT (ONC) (e.g., CONNECT, Direct Project), the Department of Veterans Affairs (VA) and Department of Defense (DoD) (e.g., OSEHRA), the National Cancer Institute (NCI) of the National Institutes of Health (NIH) (e.g., caBIG), and other agencies. Projects within CSELS and its public health partners also seek to develop and release software under an open source license. These programs seek to use open source licenses for many positive reasons, including but not limited to: greater community involvement, increased scope of features, greater user adoption, reduced cost of development, and increased reuse and interoperability across federal and partner agencies.

The Open Source Initiative (OSI) defines open source software as complying with 10 criteria [2]: free redistribution of software programs, free availability of source code, allowing derived works, integrity of the author's source code, no discrimination against persons or groups, no discrimination against fields of endeavor, distribution of license, license must not be specific to a product, license must not restrict other software, and license must be technology neutral. Although OSI recognizes over 75 licenses, these licenses generally fall into three categories:

  • Permissive licenses – provide few, if any, restrictions on modifications and extensions to software (e.g., ASL, BSD, MIT)
  • Viral (copyleft) licenses – require that modifications, extensions and linkages be covered and distributed under a compatible license (e.g., GPL)
  • Hybrid (weak copyleft) licenses – combination of permissive and viral licenses that provides limited restrictions on how modifications and extensions to software are distributed back to the community (e.g., LGPL, MPL, EPL).

Licenses are historically chosen by the community and sponsoring organization based on the specific requirements of the project, with permissive licenses allowing for greater reuse and viral or hybrid licenses providing greater control over a project.

CDC has used permissive licenses since 2008 to develop and share software with the open source community through projects sponsored by the R&D Lab and the Epi Info program. Additional programs have expressed interest in developing software in an open source manner, so it is efficient to provide a single guidance document that answers the common question: which license is appropriate for use?

Experiences:

Based on the use of permissive licenses by existing projects within the federal government (ONC CONNECT, VA/DoD OSEHRA, NCI caBIG) and the availability of government-produced source code under the Freedom of Information Act, Lab projects typically select a permissive license as the pragmatic choice. A viral or hybrid license may place restrictions that would be difficult for the government to enforce and would limit reuse of software within the public health community.

Given that the Lab’s key requirements for software projects are collaboration, sharing, reuse and transparency, the default license used by a project should be the ASL. Individual projects may have specific needs that lead them to select an alternative license, including a non-open source license (e.g., sensitive data contained within the source code; security configuration within the source code; intellectual property used within the source code, etc.).

The ASL is selected as guidance based primarily on its permissive nature and on its extensive use within the federal government at ONC and VA/DoD. This license explicitly provides protections against liability, as it clearly states that all software is provided without warranty or support. Issues such as security, privacy, quality and stewardship of federal resources are not negatively impacted solely by the specific selection of the ASL.

Our projects explicitly note their use of the ASL by including a LICENSE.TXT file in the root of the source code repository that contains the text of the license (see Appendix A) and by including a license header in each source file in the project.
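
The convention described above can be checked automatically. The following is a minimal sketch in Python, not a tool used by the Lab: it assumes the license header contains the marker line from the standard Apache License 2.0 boilerplate notice, and the function names, the set of file extensions, and the number of lines scanned are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import List

# Marker line from the standard Apache License 2.0 boilerplate notice.
HEADER_MARKER = "Licensed under the Apache License, Version 2.0"

# Assumptions for this sketch: which extensions count as source files,
# and how many leading lines of each file are scanned for the header.
SOURCE_SUFFIXES = {".py", ".java", ".js", ".c", ".cpp"}
HEADER_SCAN_LINES = 20


def has_license_file(repo_root: Path) -> bool:
    """Return True if LICENSE.TXT exists in the repository root."""
    return (repo_root / "LICENSE.TXT").is_file()


def files_missing_header(repo_root: Path) -> List[Path]:
    """Return source files (relative paths) that lack the header marker."""
    missing = []
    for path in sorted(repo_root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            # Read only the first HEADER_SCAN_LINES lines of the file.
            head = "".join(
                line for _, line in zip(range(HEADER_SCAN_LINES), handle)
            )
        if HEADER_MARKER not in head:
            missing.append(path.relative_to(repo_root))
    return missing


if __name__ == "__main__":
    root = Path(".")
    if not has_license_file(root):
        print("LICENSE.TXT not found in repository root")
    for relative_path in files_missing_header(root):
        print(f"Missing license header: {relative_path}")
```

Run from the root of a repository, the script prints one line per problem found. A project could run a check like this before each release so that new source files do not slip in without the header.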

 

References

 
