
Wizard of Oz

User-based evaluation of unimplemented technology in which, generally unknown to the user, a human or team simulates some or all of the system's responses.

The technique has often been used to explore design and usability with speech systems, natural language applications, command languages, imaging systems, and pervasive computing applications.

The originator, J.F. Kelley explains: "The term Wizard of Oz (originally Oz Paradigm) has come into common usage in the fields of Experimental Psychology, Human Factors, Ergonomics and Usability Engineering to describe a testing or iterative design methodology wherein an experimenter (the "Wizard"), in a laboratory setting, simulates the behavior of a theoretical intelligent computer application (often by going into another room and intercepting all communications between participant and system). Sometimes this is done with the participant's a-priori knowledge and sometimes it is a low-level deceit employed to manage the participant's expectations and encourage natural behaviors (though always, I would hope, with appropriate disclosure during the debriefing part of the experiments)."


Related Links


J. F. Kelley coined the terms "Wizard of Oz" and "Oz Paradigm" when he was working on his dissertation in the early 1980s. These articles describe the origins of the method:

Published Studies

  • Nils Dahlbäck, Arne Jönsson, and Lars Ahrenberg, "Wizard of Oz Studies: Why and How," Proc. 1st Int'l Conf. Intelligent User Interfaces (IUI 93), ACM Press, 1993, pp. 193–200.
  • S. Dow et al., "Wizard of Oz Interfaces for Mixed Reality Applications," Extended Abstracts SIGCHI Conf. Human Factors in Computing Systems (CHI 05), ACM Press, 2005, pp. 1339–1343.
  • S. Dow et al., "Exploring Spatial Narratives and Mixed Reality Experiences in Oakland Cemetery," Proc. ACM SIGCHI Conf. Advances in Computer Entertainment (ACE 05), ACM Press, 2005, pp. 51–60.
  • John D. Gould, John Conti, and Todd Hovanyecz, "Composing Letters with a Simulated Listening Typewriter," Communications of the ACM, vol. 26, no. 4, Apr. 1983, pp. 295–308.
  • S. Hudson et al., "Predicting Human Interruptibility with Sensors: A Wizard of Oz Feasibility Study," Proc. SIGCHI Conf. Human Factors in Computing Systems (CHI 03), ACM Press, 2003, pp. 257–264.
  • S.R. Klemmer et al., "Suede: A Wizard of Oz Prototyping Tool for Speech User Interfaces," Proc. ACM Symp. User Interface Software and Technology (UIST 00), ACM Press, 2000, pp. 1–10.
  • Alexandra Klein, Ingrid Schwank, Michel Généreux, and Harald Trost, "Evaluating Multimodal Input Modes in a Wizard-of-Oz Study for the Domain of Web Search," People and Computers XV: Interaction without Frontiers (Joint Proc. HCI 2001 and IHM 2001), Springer, 2001, pp. 475–483.
  • B. Kotelly, The Art and Business of Speech Recognition: Creating the Noble Voice, Addison-Wesley, 2003.
  • D. Maulsby, S. Greenberg, and R. Mander, "Prototyping an Intelligent Agent through Wizard of Oz," Proc. ACM SIGCHI Conf. Human Factors in Computing Systems (CHI 93), ACM Press, 1993, pp. 277–284.
  • W.C. Ogden and P. Bernick, "Using Natural Language Interfaces," Handbook of Human-Computer Interaction, 2nd ed., M. Helander, T.K. Landauer, and P. Prabhu, eds., Elsevier, 1997, pp. 137–158.
  • Microsoft Developer Network, "Usability Testing of Microsoft Speech Applications."

Detailed description

Benefits, Advantages and Disadvantages


The Wizard of Oz technique can provide valuable information on which to base future designs. It can be used to:

  • Gather actual human responses to an interaction that does not yet exist
  • Test the interaction of a device before building a functional (and possibly expensive) model
    • test which input techniques and sensing mechanisms best represent the interaction (so that subsequent effort developing or adapting sensing technologies is appropriately directed)
    • test the design of feedback through output technologies such as speech synthesis
    • test heuristic algorithms that determine how to produce outputs for ambiguous human inputs
  • Find out the kinds of problems people will have with the devices and techniques
  • Investigate aspects of the product's form, such as:
    • visual affordance (whether the product shows how it can be used)
    • linguistic affordance (which words should be used in prompts to be understood in given contexts)
    • para-linguistic affordance (which feedback can be understood in which meaningful way; e.g. blinking LEDs as a confused facial expression)


Advantages

  • You can test future technologies without building an expensive prototype, or can "fill in" functionality that is not yet ready for a prototype.
  • Rapid iterations, particularly minor changes in wording or call flow, are immediately testable.
  • Allows the system to be evaluated at an early stage in the design process.
  • Provides a unique insight into the user's actions, gained from 'interacting' with the user during the evaluation.
  • Colleagues who play the "wizard" can learn about how users interact with computer systems. [But see Disadvantages below.]


Disadvantages

  • Wizard simulations require significant training so the wizard can respond in a credible way.
    • Involving and training a wizard is an additional resource cost.
  • It is difficult for wizards to provide consistent responses across sessions.
    • To mitigate this, a 'behavior instruction' (in effect, a program for the wizard) should be prepared and given to the wizard.
    • This behavior instruction need not script every possible reaction, but it should cover predictable and typical situations and guide the session toward answering the target questions.
  • If a research team member plays the role of wizard, there is a risk that they will improvise beyond the programmed behavior.
    • To avoid this risk, recruit a wizard who is not invested in the design and can follow simple, fixed rules.
  • Computers respond differently than humans, so the wizard needs to match how a computer would respond (for example, the wizard should not make typing errors).
  • Playing the wizard can be exhausting, so the wizard's responses may change over time, mainly due to cognitive fatigue.
  • It is difficult to simulate systems with a large graphical interface component.
  • The approach does not uncover errors that arise from real system performance and recognition rates (unless these are specifically simulated), so it is more effective at revealing problems than at predicting real-world usability.
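One way to keep wizard responses consistent across sessions, and to keep a team-member wizard from improvising, is to encode the behavior instruction as an explicit rule table that a small relay tool applies. The sketch below is a minimal, hypothetical example; the keywords and response wordings are placeholders, not part of the method itself.

```python
# Hypothetical "behavior instruction" for a wizard, encoded as rules.
# The wizard types in what the participant said; the script picks the
# reply, so wording stays identical across sessions and wizards.

RULES = [
    # (keywords to look for in the participant's utterance, canned response)
    ({"balance", "account"}, "Your current balance is one hundred dollars."),
    ({"hours", "open"}, "We are open from nine a.m. to five p.m."),
    ({"agent", "human"}, "Transferring you to an agent. Please hold."),
]

FALLBACK = "Sorry, I didn't understand. Could you rephrase that?"

def wizard_response(utterance: str) -> str:
    """Return the scripted response for a participant utterance."""
    words = set(utterance.lower().split())
    for keywords, response in RULES:
        if keywords & words:  # any keyword present in the utterance
            return response
    return FALLBACK
```

Unanticipated inputs fall through to a single fallback prompt, which also gives the simulated system a plausibly computer-like failure mode.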


Wizard of Oz testing is a highly cost-effective way to compare multiple designs.

Appropriate Uses

This technique can be used to test device concepts, techniques, and proposed functionality before they are implemented. For example, it can simulate a caller-system interaction, so that the user experience is similar to interacting with a functioning interactive voice response (IVR) system.
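For an IVR-style simulation, the call flow the wizard follows can be written down as a simple state table, so every session traverses the same menus. A minimal sketch, with purely illustrative prompts and options:

```python
# Illustrative IVR call flow as a state table. Each state lists the
# prompt the wizard "plays" and where each keypress leads next.
CALL_FLOW = {
    "main": {
        "prompt": "Press 1 for billing, 2 for support.",
        "1": "billing",
        "2": "support",
    },
    "billing": {"prompt": "Your bill is due on the first of the month."},
    "support": {"prompt": "A support agent will be with you shortly."},
}

def next_state(state: str, keypress: str) -> str:
    """Advance to the next state for a keypress; stay put on invalid input."""
    return CALL_FLOW[state].get(keypress, state)
```

Because the wizard only ever reads `CALL_FLOW[state]["prompt"]` aloud and moves along the table, minor wording or call-flow changes between sessions amount to editing the table, which supports the rapid iteration noted above.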



How To


The wizard sits in a back room, observes the user's actions, and simulates the system's responses in real time. For input device testing, the wizard will typically watch live video feeds from cameras trained on the participant's hand(s) and simulate the effects of the observed manipulations. Often users are unaware (until after the experiment) that the system was not real.

The wizard has to be able to quickly and accurately discern the user's input, which is easiest for simple voice input or hand movements. The output must also be sufficiently simple that the wizard can simulate or create it in real time.

The basic wizard of Oz procedure involves the following steps:

  1. Develop a simulated user interface for the target technology.
  2. Develop a detailed test plan with the instructions for the facilitator, wizard, participants and other staff. Determine if you need to set any expectations about the simulation's "performance" so participants are prepared for sub-par performance.
  3. Recruit users who meet the appropriate user profile; try to cover the range of users within the target population.
  4. Prepare realistic task scenarios for the evaluation.
  5. Develop a procedure where the wizard can respond to input from a participant.
  6. Train the wizard.
  7. Design the instructions for the study so that the participant knows that they are working with an early prototype and that performance is not "optimized" yet.
  8. Conduct pilot tests to refine the procedure and give the wizard some practice. Make any changes to the procedures and test plan.
  9. Ensure recording facilities are available and functioning.
  10. Conduct each session. The facilitator instructs the user to work through the allocated tasks, interacting with and responding to the system as appropriate.
  11. Conduct a debriefing of the participants. Obtain feedback on the "performance of the wizard system". Tell the users about the wizard and explain why you couldn't tell them earlier.
  12. Collate, analyze, and summarize the data from the study. Consider the themes and severity of the problems identified.
  13. Summarize design implications and recommendations for improvements and feed them back to the design team. Video recordings can support this.
  14. Where necessary refine the prototype and repeat the above process.
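The wizard-side procedure (steps 5 and 9 above) can be supported by a small relay harness that imposes computer-like response timing and records every exchange for later analysis. A minimal sketch, assuming a text-based interface; the function names and the fixed-delay choice are illustrative assumptions:

```python
import time

def run_session(get_input, send_output, choose_response, delay_s=1.0):
    """Relay loop for one Wizard of Oz session.

    get_input()                -- blocks until the participant's next
                                  utterance, or returns None to end
    choose_response(utterance) -- the wizard (or a script) picks the reply
    send_output(text)          -- delivers the reply to the participant

    A constant delay before every reply makes response timing feel
    machine-like, and each exchange is timestamped for later analysis.
    """
    log = []
    while True:
        utterance = get_input()
        if utterance is None:
            break
        reply = choose_response(utterance)
        time.sleep(delay_s)  # constant "processing" latency
        send_output(reply)
        log.append((time.time(), utterance, reply))
    return log
```

The returned log supports the collation and analysis steps: each entry pairs what the participant said with what the "system" answered, so problem themes can be traced back to specific exchanges.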


Special Considerations

Ethical and Legal Considerations

The Wizard of Oz method can involve a low level of deception: participants are led to believe that they are using a working system rather than a simulation controlled by an expert, the wizard. According to the ASA Code of Ethics: "When deception is an integral feature of the design and conduct of research, (researchers) attempt to correct any misconception that research participants may have no later than at the conclusion of the research."

Such ethical safeguards are not yet universal in usability testing, but they should be taken seriously, especially because a Wizard of Oz test can leave participants with mistaken expectations about the actual state of the technology.


Lifecycle: Design
See also: Paper Prototyping
Sources and contributors: 
Nigel Bevan, Chauncey Wilson, Stanley Chung, Donn DeBoard, Cathy Herzon. Based originally on the UsabilityNet description.
Released: 2006-03
© 2010 Usability Professionals Association