Health and Safety Executive - User performance testing: Support for ISO standards
Wednesday, 03 June 1998

With support from the HSE, we developed a performance-based test for measuring the quality of visual displays. The method is intended to be used as a means of measuring compliance with ergonomics standards. It is a revised version of the procedure originally described in the ergonomics standard, ISO 9241/3. First, we evaluated the original procedure in ISO 9241/3, and showed that it can discriminate between a display that meets the physical requirements of the standard and one that does not. However, we also showed that the experimental results are contaminated by practice effects. Second, we developed a revised test that differs in five main ways from the original method. (i) We changed the procedure describing how the test is to be conducted in order to eliminate areas of ambiguity, uncertainty and lack of clarity. (ii) We changed the test from character identification to character recognition by using a four-alternative forced-choice procedure. (iii) We decided on a fixed period for the test (30 minutes per display). (iv) We made the statistical analysis more critical. (v) We combined the time and error scores into a single measure, by calculating the time taken to select a correct answer rather than the time spent on each screen. Third, we measured the validity of the revised test. The validity of the test hinges on the relationship of the experimental task to the real-world task which the test simulates. We therefore correlated scores on the revised test with scores on a test of readability. The results showed that the revised test has high predictive validity: the correlation between results on this test and on the real-world task was about 0.85. Fourth, we measured the reliability of the revised test. We conducted the test procedure under a variety of circumstances: these included repeating the tests with the same subjects, with different subjects, with different experimenters, with different reference displays and in different laboratories.
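As an illustration of point (v), the sketch below shows one way a combined time-and-error score could be computed: total time on task divided by the number of correct selections. This is a minimal sketch under stated assumptions, not the scoring procedure from the revised test itself. The `Trial` record and the `time_per_correct_answer` function are hypothetical names introduced here, and the assumption that errors are penalised simply by leaving their time in the numerator is ours.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Trial:
    """One screen presented to a subject (hypothetical record layout)."""
    seconds: float  # time the subject spent on this screen
    correct: bool   # whether the selected alternative was the right one


def time_per_correct_answer(trials: Iterable[Trial]) -> float:
    """Return total time divided by the number of correct answers.

    Assumption: time spent on screens answered wrongly still counts
    towards the total, so errors raise the score (a higher score
    means poorer performance).
    """
    trials = list(trials)
    total_time = sum(t.seconds for t in trials)
    n_correct = sum(1 for t in trials if t.correct)
    if n_correct == 0:
        raise ValueError("no correct answers: score is undefined")
    return total_time / n_correct


# Illustrative data only: three screens, one answered incorrectly.
example = [Trial(2.0, True), Trial(3.0, False), Trial(1.0, True)]
score = time_per_correct_answer(example)
```

With the illustrative data above, 6 seconds of total time and two correct answers give a score of 3 seconds per correct answer, whereas the mean time per screen would be 2 seconds. The difference is the penalty contributed by the error.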
Then we looked specifically at the test/re-test reliability of the revised procedure and showed that it is greater than 0.86 for the timing scores and about 0.76 for the comfort ratings. Finally, we showed that the revised test is practical: it is quick to administer, practice effects are negligible, and, if sequential statistics are used, it can frequently reach a decision with no more than 15 subjects. Even under worse conditions, it should rarely require more than about 45 subjects to reach a decision.
Date: 03 June 1998. Category: Usability & HCI

Last Updated (Wednesday, 26 March 2008)
