IIU’s Structured Evaluation Framework: Origin and Evolution

In January 2010, CDC created a Public Health Informatics Research & Development Unit (IRDU). To carry out its mission of advancing the field of public health informatics through applied research and innovation, IRDU staff developed a draft evaluation framework to be applied to the wide variety of technology-based tools and solutions of potential value to public health. The goal of this framework is to provide a standard that allows comparisons across technologies.

Two documents were created in this process: the evaluation framework itself and a step-by-step user's guide. The scope of this framework was to examine specific technology-based resources that have the capacity to provide value to the public health community. The framework was not designed to evaluate public health programs.

The framework is best used to evaluate the component technologies within a larger system (e.g., a surveillance system). Once developed, the framework was vetted both by Gartner and by members of the American Medical Informatics Association (AMIA).

Methods:

In the first phase, the authors identified the core components of the evaluation. These components were defined as “dimensions” and represent the aspects or perspectives from which to evaluate the technology. Based on an environmental scan and the practical needs of IRDU staff, the following six dimensions were selected: Cost, Ease of Installation, Ease of Use, Stability, Performance, and Support.

Specifically:

  • “Cost” describes the cost of the technology as relevant to the evaluation. This evaluation component should include licensing, training, support, and implementation costs.
  • “Ease of Installation” describes the ease of installation of the technology. This should reflect the supported operating systems and platforms, as well as relevance to CDC and public health environments.
  • “Ease of Use” describes the ease of use of the technology, including the learning curve required to become effective and then proficient with the tool (e.g., is the tool useful for a casual user, an expert user, or both). Evaluators should also take into account whether a particular tool is targeted toward novice or expert users within its technology category.
  • “Stability” describes the maturity and stability of the technology. Stability specifically describes the age of the technology and market penetration (both within public health and within other industries).
  • “Performance” describes the level of performance and responsiveness of the technology.
  • “Support” describes the support and level of community activity that is available for the technology resource. Support also describes the platform that the technology uses and how it affects the supportability of the technology.

Upon further examination, it became clear that a seventh dimension needed to be added, defined as “Domain” (i.e., functionality). This dimension was needed to give the evaluation tool flexibility, given the wide variety of technology-based resources to be evaluated, and it describes the most relevant feature set of the technology. For example, if an analytics tool were evaluated, its analytics capabilities would be examined within this dimension. Other attributes that can affect the usefulness of a technology for public health, including flexibility, scalability, interoperability, security, modularity, and extensibility, are included in this larger dimension. Over time, some of these attributes may evolve into stand-alone dimensions, similar to “Domain.”

Once all the dimensions were articulated, the issue of scoring was addressed. Although, by its nature, the evaluation is more subjective than objective, scoring provides a degree of inter-evaluation comparability and a valuable perspective on the evaluation.

Weighting was added to the scoring methodology to provide a clear numeric score while preserving flexibility. To ensure transparency, all weights (with a required justification) are visible to readers of the evaluation. The evaluator has significant flexibility in dimension weighting; for example, if an evaluator of a data analysis tool places a very high priority on functionality, then the functionality dimension is given a higher weight than the others. The weight represents the highest possible score for a dimension and the dimension's relative importance compared to the other dimensions within the evaluation. The evaluator should modify the weights as needed. In general, the higher the score, the closer the technology is to the ideal value of the dimension.
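To make the weighting concept concrete, the sketch below illustrates one way such weighted dimension scores might be computed. The 0-10 raw scale, the specific weights and scores, and the scaling of raw scores by weight are illustrative assumptions for this example only; they are not values or arithmetic prescribed by the framework.

    # Illustrative sketch of weighted dimension scoring (hypothetical values).
    # Assumption: each dimension receives an unweighted score on a 0-10 scale,
    # and the weight caps the highest possible weighted score for that dimension
    # while expressing its relative importance.

    RAW_MAX = 10  # assumed maximum unweighted score

    # name: (unweighted score, weight chosen and justified by the evaluator)
    dimensions = {
        "Cost":                 (7, 10),
        "Ease of Installation": (6, 5),
        "Ease of Use":          (8, 10),
        "Stability":            (9, 5),
        "Performance":          (7, 5),
        "Support":              (5, 5),
        "Domain":               (8, 20),  # functionality weighted most heavily here
    }

    for name, (raw, weight) in dimensions.items():
        weighted = raw / RAW_MAX * weight  # weighted score cannot exceed the weight
        print(f"{name}: unweighted {raw}/{RAW_MAX}, weight {weight}, weighted {weighted:.1f}")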

If the evaluation requires a domain-specific dimension, this section of the review should list and describe the one or more components that make up that dimension. Some dimensions may be unknowable; if so, this section should explain why, give the dimension a weight of zero, and note a value of “Not Applicable” on visualizations.

With the core evaluation dimensions articulated, the authors then added features to the evaluation that would enhance its quality and utility.

An overview section was then added to improve the value and overall impact of the evaluation. It included a Summary, Keywords, an Introduction, alignment to public health business processes, alternative technologies, and legal/license issues. Specifically, the Summary or “At a Glance” section provided readers with a short executive summary of the evaluation. It also included two small graphics (sparklines): one indicating the overall rating of the technology using 1-5 stars, and one showing where the technology falls on the adoption curve included at the end of the evaluation.

The Keywords section affords the opportunity to include relevant keywords associated with the technology being reviewed, which may also be relevant to other evaluations. The Introduction provides a description of the overall technology/service(s), product(s), purpose/need, and background of the evaluation.

The section focusing on public health business process alignment describes the activities and business processes of public health where the technology is expected to be of use. Many of these details were not created by IIU but refer to a non-exhaustive set of processes articulated by Common Ground: A National Program of the Robert Wood Johnson Foundation, which defines 21 business processes for public health that serve as requirements for select public health information systems. Three example processes are Conduct Syndromic Surveillance; Process, Store, and Analyze Data; and Develop Public Health Interventions.

Overall, while the 21 Common Ground-based processes serve as a guide, it is the authors' perspective that those carrying out evaluations using this framework should feel free to list additional business processes that the technology may affect. The authors further believe that by aligning to specific business processes within public health, technologies can be grouped and searched according to their expected function within the public health workforce.

The Alternatives section provides the opportunity to discuss and describe similar or related technologies that were used as a reference for the evaluation. Where possible, the differences between any alternatives described and the specific technology being evaluated should be stated.

The section on legal/license issues provides the opportunity to describe any relevant legal or policy issues. Licensing models should also be discussed in this section, whether for commercial or government off-the-shelf software (role-based, transaction-based, perpetual, etc.) or for open source software (GPL, Apache, BSD, etc.).

Once the scoring and overview are completed, the final components of the evaluation can be generated: overall score, charting and visualization, and overall recommendations and conclusions.

Once collected, the dimension scores can be combined into an overall score (with a percentage). For improved transparency, the authors felt it is of value to display both the weighted and unweighted dimension scores. If a score is unavailable, the authors recommend displaying “N/A” and adjusting the results accordingly. The authors have included a variety of charting and graphic methods, including pie charts, spider/radar charts, and standard bar charts. Each has its own strengths and weaknesses.
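As a small continuation of the earlier sketch, the fragment below shows one way the weighted dimension scores might be rolled up into an overall score and percentage. Treating a zero-weight dimension as “Not Applicable” and excluding it follows the guidance above, while the specific arithmetic remains an assumption of this example.

    # Continuation of the earlier sketch: roll weighted dimension scores up into
    # an overall score and percentage, treating zero-weight dimensions as "N/A".

    def overall_score(dimensions, raw_max=10):
        """Return (overall weighted score, maximum possible score, percentage)."""
        # Zero-weight dimensions are "Not Applicable" and excluded from the total.
        scored = [(raw, weight) for raw, weight in dimensions.values() if weight > 0]
        total = sum(raw / raw_max * weight for raw, weight in scored)
        maximum = sum(weight for _, weight in scored)
        pct = 100 * total / maximum if maximum else 0.0
        return total, maximum, pct

    dimensions = {
        "Cost": (7, 10), "Ease of Installation": (6, 5), "Ease of Use": (8, 10),
        "Stability": (9, 5), "Performance": (7, 5), "Support": (5, 5),
        "Domain": (0, 0),  # unknowable in this example: weight of zero, shown as "N/A"
    }

    total, maximum, pct = overall_score(dimensions)
    print(f"Overall score: {total:.1f} of {maximum} ({pct:.0f}%)")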

The overall recommendation section describes the recommendation and justification for the technology. The authors recommend specifically addressing the stakeholders and groups within public health informatics that would be most interested in, or affected by, the evaluation.

An adoption timeline was added to this component of the evaluation. It allows the reviewers to describe the current state of adoption of the technology, which can be classified as very early/embryonic, nearly mature, mature, or old/outdated. This provides a critical perspective on the technology. The authors recommend that, whenever possible, the evaluators provide an estimate of the time (e.g., number of years) required for the technology to become mature or to transition from obsolescence to end of life. As expected, technologies placed in the embryonic or outdated states are not highly recommended for adoption at the current time.

The final component of the recommendations section includes a public health/health IT adoption impact, which allows the reviewers to describe the potential impact of the evaluated technology on the public health informatics community. Overall, the authors recommend categorizing the impact as low, moderate, high, or transformational in both the short- and long-term time frames.

Finally, in line with full transparency, the authors recommend that an appendix be included in the review that describes the peer review process used for the evaluation. The fields of expertise and the number of reviewers should be included in this appendix.

Results:

This evaluation framework was presented to evaluation subject matter experts (SMEs) at Gartner, Inc., and vetted with the American Medical Informatics Association (AMIA) community in spring 2011 at its annual spring conference. Positive feedback was consistently received, with only minor refinements suggested. In May 2011, the evaluation framework was used by CDC staff to review the data collection tool EpiCollect, creating a 9-page review. The authors found the framework to be extremely useful, though applying it was not a trivial undertaking, requiring approximately 2 to 3 weeks of effort. This evaluation is publicly available at www.philab.cdc.gov by entering the search term “epiCollect”.