Volume 48 Number 6, 2011
Pages 697 — 706
Abstract — The purposes of this article are to describe usability testing and introduce designs and methods of usability testing research as it relates to upper-limb prosthetics. This article defines usability, describes usability research, discusses research approaches to and designs for usability testing, and highlights a variety of methodological considerations, including sampling, sample size requirements, and usability metrics. Usability testing is compared with other types of study designs used in prosthetic research.
Key words: amputation, case studies, consumer, medical devices, methodology, prosthetics, rehabilitation, research design, upper limb, usability.
Participants at the 2006 State-of-the-Science (SOS) Meeting in Prosthetics and Orthotics (sponsored by the National Institute on Disability and Rehabilitation Research and held at the Rehabilitation Engineering Research Center, Northwestern University, Feinberg School of Medicine; http://www.nupoc.northwestern.edu/news-publications/papers/sos_reports/SOS_2006report.pdf) identified a wide range of research priorities spanning the continuum from product development, to identifying predictors of successful prosthetic wear, to improving education for prosthetists who fit complex prostheses. Most SOS recommendations related to upper-limb prosthetics focused on expanding the capabilities and control inputs of prosthetic technologies, in other words, on developing and evaluating new prosthetic devices. Clearly, product development research was identified as a high priority for the field.
Designing highly functional upper-limb prosthetic devices that are acceptable to users has proven particularly challenging, as evidenced by the high rates of device abandonment reported in the literature. In their 2007 review of more than 200 articles on prosthesis use and abandonment, Biddiss and Chau noted that technological factors related to discomfort and limited function, particularly for people with higher levels of limb loss, were among the major reasons for prosthetic dissatisfaction and abandonment. Other reasons for dissatisfaction and abandonment included poor durability and mechanical failure, discomfort, difficulty with control, and cosmesis [2-6]. Although the design priorities identified by consumers varied by type of device and level of amputation, they generally included reduced weight, lower cost, improved comfort, increased movement, and greater dexterity.
Successful design of prosthetic devices hinges on a research and development process that closely integrates end-users with device developers. This process focuses on the usability of the product, in what is sometimes called a "user-centered" approach to design.
The concept of usability and the design and conduct of usability testing studies may be unfamiliar to many in prosthetics and rehabilitation because so few such studies have been published in these fields. Usability engineering grew out of human factors science and ergonomics and has its origins in the aerospace and automotive industries. The usability concept has been embraced in information technology, assistive technology, and medical device development. However, little has been written about the usability of prosthetic devices or about the best designs for conducting usability tests of prosthetic devices. Thus, the overall purposes of this article are to describe the concept of usability testing and to introduce designs and methods of usability testing research. The perspective on usability testing presented here draws on my experience in usability testing and mixed methods (qualitative and quantitative) research. This synthesis is a direct result of my experience designing and implementing the Department of Veterans Affairs (VA) "Study to Optimize the DEKA Arm," a multisite usability study funded by the VA Rehabilitation Research and Development Service.
The concept of device usability, considered an aspect of "usefulness," is a qualitative attribute reflecting how easily users can operate device-user interfaces. Standardization bodies such as the International Organization for Standardization (ISO) and human factors/usability experts have proposed a variety of definitions of usability [7,10-12]. The ISO definition of product usability that may be most applicable to medical devices is "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use."
Usability is associated with the functionality or utility of the device, in other words, its usefulness and perceived value for its intended purpose. According to Krug, usability means "making sure that something works well: that a person of average (or even below average) ability and experience can use the thing (whether it's a website, a fighter jet, or a revolving door) for its intended purpose without getting hopelessly frustrated."
In his seminal text on usability engineering, Nielsen describes a generic framework for product usability comprising five elements: learnability, efficiency, memorability, errors, and satisfaction. Learnability refers to how easily users accomplish basic tasks with the device the first time they encounter the design. Once the basic controls are learned, efficiency refers to how quickly users can perform tasks with the device, as well as the work and time involved in using it. Memorability refers to how easily users reestablish proficiency after a period away from the device. The errors element refers to the frequency and severity of errors made in using the controls. Satisfaction refers to the user's satisfaction with the design and its usefulness.
Usability can have a variety of other attributes related to user comfort, aesthetics, durability, perceived complexity, or other features; the specific usability attributes may vary with the product and its functions. For users of prosthetic devices, I believe that important attributes of usability include, but are not limited to, the functionality achieved while using the device, ease of initial setup, the ability to maintain good posture and body mechanics while operating the device, and the demands of daily maintenance and repairs.
Studies that focus on developing new products or technology and assessing their usability can be classified as usability research or usability engineering. Usability engineering refers to research and design methods for improving ease of use during the design of new products and devices. Usability research can inform efforts to refine precommercial devices and can also evaluate existing commercially available technology. Usability research has two basic approaches: (1) usability inspection by nonusers (such as designers and other experts) and (2) usability testing by device users themselves. Usability testing is the focus of this article.
Information on usability is needed at each stage of prosthesis development to provide feedback to device developers and manufacturers. In the early research and development phase, technical data, e.g., the speed and torque of joint movements, angular motions, and the durability of components, are essential. Although meeting product engineering goals could be considered an aspect of usability assessment, user testing becomes necessary as development progresses to assess use of the device in real-life settings. A device that meets the manufacturer's development goals is not necessarily usable. For example, a prosthetic manufacturer may have specific goals for grip torque or speed; user testing is needed to assess whether that grip speed and torque are satisfactory for the user's purposes. Thus, user testing provides information on device performance and perceived usability that differs from an assessment of whether manufacturing goals were met.
Usability research can result in recommendations for material changes and other improvements to the device itself, as well as improvements to the human-device interface. As refinements are implemented, additional research is required to assess the changes. Thus, usability research for product development is an iterative process of user engagement and product refinement. Ideally, the incorporation of usability data derived from end-user feedback in an iterative design process enables repeated refinements of product design such that an acceptable level of usability is achieved.
Usability research can also be conducted to evaluate a product after the design is complete and/or after the product is commercially available, in what is sometimes called summative usability research. Summative usability testing can also be used to compare the usability of several products.
Usability inspection by nonusers may employ strategies such as heuristic evaluation, cognitive walkthroughs (to assess how easily users learn to use the device), and design reviews that formally assess elements of task performance with the device.
Dozens of methods can be used in usability testing, and the choice of methods depends on the usability attributes to be assessed. Because device usability cannot be measured directly, it is often assessed through indirect measures, such as observation of the user interacting with the device, the user's impressions of the device's ease of use, and user reports of satisfaction with the device. Many of the qualitative methods employed in usability testing are drawn from ethnography: participant observation, observation of live or video-recorded device use, and investigators' observations of participants' verbal and visual responses during testing. Qualitative methods also include open-ended surveys and questionnaires, interviews, and focus groups. Usability testing also typically involves gathering and analyzing quantitative metrics, derived from methods such as quantitative surveys, time-based tests, measures of task performance and completion, and tracking of errors and usage. Some usability testing methods ask subjects to choose among hypothetical engineering trade-offs in device design. This approach has limitations because users' short- and long-term preferences may differ. Moreover, when assessing hypothetical trade-offs, users do not actually compare versions of the device with and without the various components, so their stated preferences are not based on user testing.
Researchers take a variety of approaches to usability testing, choosing qualitative, quantitative, or mixed methodologies depending on the researcher, the project, and its objectives. Quantitative and qualitative research paradigms rest on distinctly different assumptions, terminologies, and methods. Qualitative research takes an integrative, naturalistic approach that may give researchers a greater understanding of complex phenomena. Although qualitative research is sometimes categorized under the general term descriptive research, its purpose often extends beyond description to interpretation, prediction, and explanation. Thus, qualitative methodology is an appropriate choice for research questions about complex relationships, clinical situations, or new areas of inquiry. In contrast, a quantitative approach to research is "based upon testing a theory, composed of variables, measured with numbers, and analyzed with statistical procedures."
The mixed methodology approach combines qualitative and quantitative methods. Mixed methods studies are particularly useful when seeking triangulation (i.e., convergence) or complementarity (i.e., fuller explanation) of results, for examining overlapping and different facets of a phenomenon, and for discovering paradoxes, contradictions, or fresh perspectives. A mixed methods approach is generally seen as adding breadth and scope to a project. The diversity of methods and data sources can help uncover all types of usability concerns, and triangulating findings from qualitative and quantitative analyses yields a more comprehensive understanding of device usability.
Usability testing typically uses case study or case series designs [20-23] but can also be conducted with within-subjects (crossover) designs and between-subjects quasi-experimental designs. Case studies are typically conducted as uncontrolled observational studies. A single case study involves only one case or individual. A case series, used in many health technology assessments, is the study of more than one individual with similar characteristics. Evidence derived from a case series is considered more robust than that derived from a single case study. A multiple case series comprises two or more case series.
Case studies are considered the weakest type of design for effectiveness research because they lack a control group that matches the experimental group on all factors except receipt of the intervention (i.e., the prosthesis). The primary threats to the internal validity of case study designs are history, maturation, and instrumentation. A history threat to validity occurs when an observed change over time is attributable to events that occur between measurement points rather than to the intervention. The threat of maturation is that subjects may change as they grow or age, so observed changes may reflect maturation rather than the intervention. The threat of instrumentation relates to the effects of test taking or to changes in the way measurements are taken. Without a control or comparison group, it is difficult to guard against these threats to internal validity. Thus, case study and case series designs cannot provide conclusive evaluations of device effectiveness. Despite these limitations, case studies, case series, and multiple case series are the most appropriate design choices for usability studies of prosthetic devices, where per-subject costs are high and eligible subjects are hard to find.
Various design strategies can be used to address the threats to internal validity of simple observational case studies and case series. Quasi-experimental designs for case studies and case series employ repeated measures, with at least one measurement before the intervention (baseline) and one after. Repeating measurements at baseline provides information on the stability of tests and measures and helps minimize the threat of instrumentation.
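One way to put repeated baseline measures to work is the "two standard deviation band" convention from single-subject research. It is not described in this article and is used here only as an illustration. The baseline mean and sample standard deviation define a band, and post-intervention observations outside it are flagged. The function names, the task-time data, and the rule that two successive points outside the band indicate change are all assumptions made for this sketch.

```python
from statistics import mean, stdev


def two_sd_band(baseline, post, k=2.0):
    """Flag post-intervention scores outside baseline mean +/- k sample SDs.

    Returns (lower, upper, flags). Several baseline points are needed for
    the SD to say anything about the stability of the measure.
    """
    if len(baseline) < 3:
        raise ValueError("need at least three baseline measurements")
    m, s = mean(baseline), stdev(baseline)
    lower, upper = m - k * s, m + k * s
    flags = [x < lower or x > upper for x in post]
    return lower, upper, flags


def two_successive_outside(flags):
    """Assumed decision rule: infer change when two consecutive
    post-intervention points fall outside the band."""
    return any(a and b for a, b in zip(flags, flags[1:]))


# Hypothetical task-completion times (seconds) for one participant.
baseline_times = [42, 40, 44, 41, 43]
post_times = [39, 36, 35]
lower, upper, flags = two_sd_band(baseline_times, post_times)
print(f"band: {lower:.2f} to {upper:.2f} s; outside: {flags}")
print("change inferred:", two_successive_outside(flags))
```

An unstable baseline produces a wide band, which makes genuine post-intervention change harder to distinguish from noise.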
Single-subject experimental designs (also called "n-of-1" clinical trials) are most appropriate for early studies of treatment efficacy. These designs borrow techniques from larger randomized controlled trials, such as random allocation of treatment conditions and blinding of subject and assessor to the treatment condition. Because blinding subjects to the device they are using is at odds with asking them for feedback on that device, an n-of-1 design would not be appropriate if the goal of the study is to obtain user feedback on device usability.
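In a single-subject trial, randomization usually applies to the order of treatment periods within one person rather than to groups of people. The sketch below generates a randomized-pairs schedule, one common n-of-1 scheme, in which every treatment appears once per cycle in random order. The labels "A" and "B", the number of cycles, and the seed are hypothetical. Blinding, which code cannot provide, would still have to be arranged separately.

```python
import random


def n_of_1_schedule(cycles, treatments=("A", "B"), seed=None):
    """Return a period-by-period treatment order for one participant.

    Each cycle contains every treatment exactly once, shuffled independently,
    so treatments stay balanced while their order is randomized.
    """
    rng = random.Random(seed)  # a seeded generator makes the schedule reproducible
    periods = []
    for _ in range(cycles):
        block = list(treatments)
        rng.shuffle(block)
        periods.extend(block)
    return periods


print(n_of_1_schedule(cycles=3, seed=7))
```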
Study designs that could be used to compare the effectiveness of one device with another include randomized controlled trials (RCTs), quasi-experimental designs involving two comparison groups, and crossover designs. Large RCTs and quasi-experimental designs may not be feasible for the reasons discussed previously (e.g., heterogeneity of the population). In a crossover study, subjects are followed over time and receive a sequence of different treatments (in this case, prostheses) in a within-subjects design. Thus, each subject serves as his or her own control. The advantage of a crossover study over a quasi-experimental design comparing two separate groups of subjects is that the influence of confounding covariates is reduced, because subjects serve as their own comparison. The strongest crossover designs also borrow methods from clinical trial design, including random assignment to experimental conditions, which reduces threats to internal validity much as an RCT does.
It is well known that the order in which interventions are administered can affect study outcomes. For example, skills gained during hours of training with one prosthesis may carry over to the next device, improving ease of use with that subsequent device. Thus, strong crossover designs both randomize the order in which interventions are received and provide a washout period between treatments to minimize carryover.
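For a two-device comparison, the simplest version of the design described above is the two-period AB/BA crossover. The sketch below uses hypothetical device labels "A" and "B" and assumed period and washout lengths in weeks. It randomly splits participants between the two orders as evenly as possible and lays out each participant's timeline with a washout gap between devices. It covers only allocation and scheduling; the washout length that adequately limits carryover of prosthetic training is a study-specific judgment not settled here.

```python
import random


def allocate_ab_ba(subject_ids, seed=None):
    """Randomly assign subjects to sequence "AB" or "BA".

    Group sizes differ by at most one; with an odd number of subjects the
    extra subject is assigned to "BA".
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {sid: ("AB" if i < half else "BA") for i, sid in enumerate(ids)}


def crossover_timeline(sequence, period_weeks, washout_weeks):
    """Return (device, start_week, end_week) tuples separated by washouts."""
    plan, week = [], 0
    for i, device in enumerate(sequence):
        plan.append((device, week, week + period_weeks))
        week += period_weeks
        if i < len(sequence) - 1:  # no washout needed after the final period
            week += washout_weeks
    return plan


allocation = allocate_ab_ba(["S01", "S02", "S03", "S04", "S05", "S06"], seed=3)
for sid, seq in sorted(allocation.items()):
    print(sid, seq, crossover_timeline(seq, period_weeks=8, washout_weeks=2))
```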
A distinctive feature of prosthetic device development is the need to evaluate the device not only for the amputee's use but also for how easily and proficiently clinical staff (e.g., prosthetists, therapists) can fit and set up the device and train amputees in its optimal use. If clinicians find it difficult to prescribe, fit, or configure the device, refine its setup, or train a patient to use it, amputee users may experience greater problems with the device. Thus, it is important to include clinicians as well as amputees as subjects in usability research studies.
Involving device users in all stages of product development increases the likelihood of producing devices that are safe, usable, effective, and acceptable to users. Conversely, failure to engage users in device development may adversely affect the quality of outcomes and, ultimately, the acceptability of the device. Some of the best examples of user involvement in device development come from the VA [29-30]. The VA model involved device users at all stages of medical device development, beginning with identifying clinical need and conceptualizing a new product and continuing patient and caregiver involvement through an iterative process of device design, development, and testing. Other models of usability research have also arisen within the VA system.
How many subjects are "enough" for usability research is an ongoing question in discussions of usability testing and user interface design. In all research, the cost of development and testing must be weighed against the potential benefits of the knowledge to be gained. The expense of manufacturing research and development versions of prototype devices for study and the complexity of using the new devices are major factors limiting study size, as is the research funding available for new devices. In prosthetics, the time needed to fit subjects with prosthetic sockets (if necessary) and to train them in the use of a new device adds to the cost of such studies. Researchers must balance the cost of producing new devices and/or experimental prosthetic control procedures for testing against the need for a sample large enough to assess usability.
Whereas drug clinical trials require progressively larger samples of human subjects to establish safety, effectiveness, side effects, and dosage, usability research typically requires only a small number of subjects to identify the improvements, limits, and characteristics of a product. Some research has found that 80 percent of usability problems are detected by the first four or five subjects who use a device and that the most severe usability problems are likely to be detected by the first few subjects.
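The "first four or five subjects" finding is often explained with a simple problem-discovery model from the software usability literature (Nielsen and Landauer), which is not part of this article. If each participant independently reveals a given problem with probability p, the expected share of problems found by n participants is 1 - (1 - p)^n. The sketch below assumes p = 0.31, the average reported for software interfaces. The right value for a prosthetic device is unknown, and rarer problems (smaller p) need more participants.

```python
import math


def share_found(p, n):
    """Expected proportion of problems detected by n participants, assuming
    each participant independently reveals a given problem with probability p."""
    return 1 - (1 - p) ** n


def participants_needed(p, target):
    """Smallest n whose expected detected share reaches `target` (0 < target < 1)."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


P_ASSUMED = 0.31  # average detection rate reported for software; an assumption here
for n in range(1, 7):
    print(n, round(share_found(P_ASSUMED, n), 2))
print("participants for 80%:", participants_needed(P_ASSUMED, 0.80))
```

Under this model, four participants find about 77 percent and five about 84 percent of problems when p = 0.31. If a problem surfaces in only 10 percent of users (p = 0.10), reaching the same 80 percent target takes 16 participants. This is one way to see why problems confined to particular subgroups can raise the required sample.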
However, other factors must be considered when estimating sample size for usability studies in upper-limb prosthetics. Sampling in these studies is often "purposeful." The objective of purposeful sampling is to gather data that maximize opportunities to discover variations among concepts and to increase understanding of the phenomena under study. As discussed previously, the upper-limb amputee population is heterogeneous. Thus, the usability of a new prosthetic device or technology may need to be assessed separately in subsamples, for example by level of limb loss (if the device is intended for use at different levels), by unilateral versus bilateral limb loss, or among those with comorbid conditions, cognitive deficits, or sensory loss. It may also be important to compare and contrast the usability of a prosthetic terminal device for new versus experienced users or for men versus women.
The low prevalence of upper-limb amputation, particularly at the higher levels of limb loss, may make accrual of even a small sample size for purposeful sampling quite a challenge. There may be few subjects with the necessary characteristics residing in a given area available to participate in research studies. Thus, research studies may need to use multiple sites to accrue sufficient numbers and include those amputees who are available (convenience sampling).
Purposeful sampling has advantages in helping to maximize the understanding of wide-ranging usability concerns. That said, this sampling strategy, like convenience sampling, also creates an important potential limitation. The use of purposeful or convenience sampling, rather than random selection of subjects from a representative group, can introduce bias by selecting subjects who may not be representative of the broader population. As a hypothetical example, if subjects who agreed to participate in these types of studies were more likely to be unemployed and less happy with their existing devices compared with the broader population of upper-limb amputees, then this could create selection bias.
A variety of metrics can be used in usability testing. Usability performance measures may include quantitative assessments, such as number of errors made while operating the device, time to complete specified tasks, accuracy of task performance, and time to become proficient in device use.
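The quantitative metrics listed above are straightforward to compute once each trial is logged. The sketch below assumes a hypothetical log format (session number, task name, completion, time, and error count) and a hypothetical proficiency criterion expressed as a target completion time. Accuracy of task performance would need a task-specific scoring rule and is left out.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    session: int      # training/testing session number
    task: str         # hypothetical task label
    completed: bool   # whether the task was finished
    seconds: float    # time spent on the attempt
    errors: int       # errors observed during the attempt


def summarize(trials):
    """Completion rate, mean time for completed trials, and errors per trial."""
    if not trials:
        raise ValueError("no trials logged")
    done = [t for t in trials if t.completed]
    return {
        "completion_rate": len(done) / len(trials),
        "mean_time_completed_s": mean(t.seconds for t in done) if done else None,
        "errors_per_trial": sum(t.errors for t in trials) / len(trials),
    }


def sessions_to_proficiency(trials, task, criterion_s):
    """First session in which a completed trial of `task` meets the time
    criterion, or None if the criterion was never met."""
    hits = [t.session for t in trials
            if t.task == task and t.completed and t.seconds <= criterion_s]
    return min(hits) if hits else None


log = [
    Trial(1, "cup", True, 30.0, 2),
    Trial(1, "zip", False, 60.0, 3),
    Trial(2, "cup", True, 22.0, 1),
    Trial(3, "cup", True, 18.0, 0),
]
print(summarize(log))
print("sessions to proficiency (cup, 20 s):", sessions_to_proficiency(log, "cup", 20.0))
```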
A particular challenge in studies of the usability of upper-limb prosthetic devices is that, until recently, there has been little consensus on the important domains or constructs to assess in relation to prosthetic performance. Thus, researchers have needed to develop their own conceptual frameworks for assessment, tailored to the device and its intended uses and capabilities. Recently, the Work Group on Upper Limb Prosthetic Outcome Measures (ULPOM) recommended the World Health Organization's International Classification of Functioning, Disability, and Health (ICF) as an ideal framework for organizing the selection of outcome measures. The group emphasized selecting assessment tools that capture each of the important elements of the ICF, including body structures and functions (performance of the prosthesis), activity (carrying out tasks), and participation (use of the prosthesis in real-life situations). It is generally accepted that multiple outcome measures are needed to cover the range of important elements in upper-limb prosthetic research because no single tool captures all aspects. Thus, a tool kit of multiple types of measures may be needed.
The ULPOM also acknowledged that the selection of measures should match the type of research being conducted and the stage of development of the prosthesis (Figure). Thus, the most appropriate assessments at the earliest stage of prosthesis development are technical measures of grip force, speed, and similar properties.
Figure. Development cycle of prostheses, from research through daily use (black), and International Classification of Functioning, Disability, and Health components of assessment for prostheses and their location within the development cycle (gray). Adapted from Hill W, Stavdahl O, Hermansson LN, Kyberd P, Swanson S, Hubbard S. Functional outcomes in the WHO-ICF model: Establishment of the Upper Limb Prosthetic Outcome Measures Group. J Prosthet Orthot. 2009;21(2):115-19. DOI:10.1097/JPO.0b013e3181a1d2dc
At later stages of device development, outcome measures need to address broader areas, such as performance of specific tasks and use in everyday activities. Thus, objective measurements such as timed dexterity tests, often used as outcomes in studies of upper-limb amputees, need to be supplemented by assessments of the amputee's experience to understand which changes "make a real difference in the lives of patients." Of particular importance in assessing prosthetic usability is the patient's perspective on the usefulness of the device for everyday functions, as well as on the comfort and fit of the prosthesis, health-related quality of life, and mobility.
A challenge in selecting quantitative outcome measures for usability testing is that few measures have been developed and validated for adults with upper-limb amputation. Nor has research been conducted on the responsiveness and sensitivity to change of upper-limb prosthetic measures, information that would help researchers interpret change scores [16,35]. Additional research is needed before the field has the data necessary to select and interpret a complete battery of measures [16,35].
In large clinical trials, the effectiveness of interventions is assessed statistically by comparing mean change in outcome scores between groups of patients. Such comparisons are not possible in clinical practice, individual case studies, or trials with very small samples. Common indices of a measure's responsiveness, such as the effect size, the standardized response mean, or the responsiveness statistic, summarize responsiveness and are useful for comparing measures with one another, but they do not help interpret test results for individual subjects or patients.
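The three group-level indices named above can each be written in a few lines. The sketch below uses common textbook definitions, stated here as assumptions rather than taken from this article. The effect size divides mean change by the baseline standard deviation. The standardized response mean divides it by the standard deviation of the change scores. Guyatt's responsiveness statistic divides mean change in patients expected to improve by the standard deviation of change in clinically stable patients. The scores are hypothetical. Each result is a single number for the whole group, which illustrates why these indices say nothing about whether a particular individual changed.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Paired follow-up minus baseline differences."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up scores must be paired")
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def responsiveness_statistic(changes_improved, changes_stable):
    """Mean change in patients expected to improve divided by the SD of
    change in patients expected to remain stable."""
    return mean(changes_improved) / stdev(changes_stable)


pre = [10, 12, 14, 16]   # hypothetical baseline scores
post = [14, 15, 19, 20]  # hypothetical follow-up scores
print(round(effect_size(pre, post), 2))
print(round(standardized_response_mean(pre, post), 2))
print(round(responsiveness_statistic([4, 3, 5, 4], [1, -1, 0, 2]), 2))
```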
The best instruments for case studies, case series, and other studies with small samples would have superior measurement properties. In particular, instruments intended to measure change in an individual person must be more reliable than measures designed for group use. Research is also needed on how to interpret changes in scores from one point in time to the next. For interpretability, one should be able to answer questions such as, "Does a 10-point change in score on a given measure denote an important change for these patients?" and "Is a 5-point change in score the same as a 10-point change in score?"
Data on two important constructs, minimum detectable change (MDC) [36-38] and minimal clinically important difference (MCID), can help in interpreting scores for individuals and small samples. MDC is a statistical measure of real (nonrandom) change, defined as the minimum amount of change that exceeds measurement error. In contrast, MCID defines the threshold at which an individual has experienced an important change. The MDC of a measure can be calculated from data on its reliability.
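To illustrate computing MDC from reliability data, the sketch below uses a widely used formulation that this article does not spell out. The standard error of measurement is SEM = SD × √(1 − ICC), where SD is the between-subject standard deviation and ICC is a test-retest intraclass correlation coefficient. The 95 percent MDC is then MDC95 = 1.96 × √2 × SEM. The SD, ICC, and change scores are hypothetical. A change smaller than MDC95 cannot be distinguished from measurement error. A larger change is likely real but, without an MCID, is not necessarily important.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a reliability study."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1 - icc)


def mdc95(sd, icc):
    """Minimum detectable change at 95% confidence for a test-retest change score."""
    return 1.96 * math.sqrt(2) * sem_from_icc(sd, icc)


threshold = mdc95(sd=10.0, icc=0.91)  # hypothetical reliability data
print(f"MDC95 = {threshold:.2f} points")
for change in (6, 10):  # hypothetical individual change scores
    verdict = "exceeds" if abs(change) > threshold else "is within"
    print(f"change of {change} points {verdict} measurement error")
```

With these assumed values the threshold is about 8.3 points, so a 6-point change would not be distinguishable from error while a 10-point change would.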
Unfortunately, few outcome measures have yet been validated for adults who use upper-limb prostheses, and information is lacking on the MDC and within-subject MCID of instruments that might be used to assess these patients. Until further data are available, researchers must use their best judgment, informed by a sound theoretical rationale, to guide their choice of instruments, and must be cautious in interpreting the meaningfulness of change scores on these measures.
Although research has been conducted to assess consumer priorities and needs [2,6], a necessary first step in user-centered design, little research has directly involved users of upper-limb prostheses in testing products during the development stage. The lack of usability research in upper-limb prosthetics may be attributed to several factors. First, limited funding has been available for this type of research in the United States. Until recently (with the Defense Advanced Research Projects Agency's Revolutionizing Prosthetics Program), there were few or no major U.S. federally funded research initiatives in upper-limb prosthetic development. Second, studies of upper-limb prosthetics are inherently challenging because of the heterogeneity and relative rarity of upper-limb amputation. Persons with major upper-limb amputation constitute only 3 percent of the U.S. amputee population, an estimated 41,000 persons in 2005. Each year in the United States, an estimated 1,908 upper-limb amputations are performed, compared with 56,912 lower-limb amputations. Furthermore, this relatively small population is diverse in etiology, level of limb loss, and number of limbs lost.
The paucity of usability research in upper-limb prosthetics may also be explained, in part, by the fact that studies of new devices are not typically driven by regulatory requirements. Although prostheses are considered "Medical Devices," subject to regulation by the U.S. Food and Drug Administration (FDA), they are generally considered "Physical Medicine Devices," which are Class I or low-risk devices. As such, manufacturers are not required to submit a "Premarket Approval Application" to the FDA. Thus, manufacturers are rarely required to conduct research on their new products.
Usability studies are one type of scientific research needed to advance the field of prosthetics and to support the design and development of highly functional prosthetic devices that are acceptable to users. Usability research complements other types of prosthetic research, including descriptive epidemiology, neuroscience and engineering research, effectiveness studies, measurement studies, and cost-effectiveness studies. Each type calls for designs and methods suited to its goals, so their research strategies may or may not overlap. Although a full description of each study type and design is beyond the scope of this article, representative goals of each type are shown in the Table. User studies that seek to understand the needs and preferences of prosthetic users, or the predictors of prosthetic use and abandonment, are typically observational studies involving surveys of users [41-47] and providers, medical record abstraction [47,49], focus groups, or in-person interviews. User studies are typically recommended as the first step in product development research. Research in neuroscience and engineering is needed to develop better prosthetic components and controls and to advance the interface between the amputee user and prosthetic controls. Usability research complements neuroscience and engineering research by ensuring that new inventions and products meet user needs. Studies that evaluate the effectiveness of a device, or compare the effectiveness of more than one device type, require experimental or quasi-experimental designs. These designs may also be used to compare the usability of several devices or device prototypes. Studies to develop and validate outcome measures for prosthetics involve a variety of development designs and psychometric testing to ensure that measures have adequate reliability, validity, and responsiveness [52-53].
Such studies are needed to guide the selection of metrics for studies of effectiveness and usability. Studies of the cost-effectiveness of prosthetic care analyze data on the costs and benefits of treatment, providing evidence to inform payment and prescription decisions.
Table. Types of studies in prosthetics and goals for each type.
Type of Study | Example of Goals
User Studies | Understanding prosthetic users and their needs.
Neuroscience and Engineering | Designing and testing prosthetic components, control mechanisms, and control interfaces.
Usability | Developing/refining prosthetics to meet consumer needs.
Effectiveness and Comparative Effectiveness | Evidence-based prosthetic prescription.
Development of Outcome Measures | Developing reliable and valid measures for clinical and research use.
Cost-Effectiveness | Developing evidence to support cost-effective prosthetic care.
Upper-limb prostheses must work well to meet users' needs, or they are likely to be abandoned. Participants at the 2006 SOS Meeting in Prosthetics and Orthotics identified the development and evaluation of new upper-limb prosthetic devices that better meet users' needs as a high priority.
Usability research helps ensure that products are designed, developed, and optimized to meet user needs. User testing studies provide feedback on device performance and perceived usability from the user's perspective. These studies typically employ mixed methods and case study designs, often with purposefully selected samples that best represent typical users or subgroups of users. The choice of testing methods depends on the usability attributes to be assessed but often includes observation of the user interacting with the device, the user's reports of satisfaction and assessment of the usability of specific device features, and collection of standardized usability metrics. Measures used in usability studies should be highly reliable and valid. Although usability testing may be new to many in prosthetics research, usability studies complement the other types of studies needed to advance research and practice in upper-limb prosthetics.
Last Reviewed or Updated Monday, July 11, 2011 9:43 AM