Verification and validation of computer simulation models
Verification and validation of computer simulation models is conducted during the development of a simulation model with the ultimate goal of producing an accurate and credible model.[1][2] "Simulation models are increasingly being used to solve problems and to aid in decision-making. The developers and users of these models, the decision makers using information obtained from the results of these models, and the individuals affected by decisions based on such models are all rightly concerned with whether a model and its results are "correct".[3] This concern is addressed through verification and validation of the simulation model.
Simulation models are approximate imitations of real-world systems and they never exactly imitate the real-world system. Due to that, a model should be verified and validated to the degree needed for the models intended purpose or application.[3]
The verification and validation of simulation model starts after functional specifications have been documented and initial model development has been completed.[4] Verification and validation is an iterative process that takes place throughout the development of a model.[1][4]
Verification
In the context of computer simulation, verification of a model is the process of confirming that it is correctly implemented with respect to the conceptual model (it matches specifications and assumptions deemed acceptable for the given purpose of application).[1][4] During verification the model is tested to find and fix errors in the implementation of the model.[4] Various processes and techniques are used to assure the model matches specifications and assumptions with respect to the model concept. The objective of model verification is to ensure that the implementation of the model is correct.
There are many techniques that can be utilized to verify a model. Including, but not limited to, have the model checked by an expert, making logic flow diagrams that include each logically possible action, examining the model output for reasonableness under a variety of settings of the input parameters, and using an interactive debugger.[1] Many software engineering techniques used for software verification are applicable to simulation model verification.[1]
Validation
Validation checks the accuracy of the model's representation of the real system. Model validation is defined to mean "substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model".[3] A model should be built for a specific purpose or set of objectives and its validity determined for that purpose.[3]
There are many approaches that can be used to validate a computer model. The approaches range from subjective reviews to objective statistical tests. One approach that is commonly used is to have the model builders determine validity of the model through a series of tests.[3]
Naylor and Finger [1967] formulated a three-step approach to model validation that has been widely followed:[1]
Step 1. Build a model that has high face validity.
Step 2. Validate model assumptions.
Step 3. Compare the model input-output transformations to corresponding input-output transformations for the real system.[5]
Face validity
A model that has face validity appears to be a reasonable imitation of a real-world system to people who are knowledgeable of the real world system.[4] Face validity is tested by having users and people knowledgeable with the system examine model output for reasonableness and in the process identify deficiencies.[1] An added advantage of having the users involved in validation is that the model's credibility to the users and the user's confidence in the model increases.[1][4] Sensitivity to model inputs can also be used to judge face validity.[1] For example, if a simulation of a fast food restaurant drive through was run twice with customer arrival rates of 20 per hour and 40 per hour then model outputs such as average wait time or maximum number of customers waiting would be expected to increase with the arrival rate.
Validation of model assumptions
Assumptions made about a model generally fall into two categories: structural assumptions about how system works and data assumptions.
Structural assumptions
Assumptions made about how the system operates and how it is physically arranged are structural assumptions. For example, the number of servers in a fast food drive through lane and if there is more than one how are they utilized? Do the servers work in parallel where a customer completes a transaction by visiting a single server or does one server take orders and handle payment while the other prepares and serves the order. Many structural problems in the model come from poor or incorrect assumptions.[4] If possible the workings of the actual system should be closely observed to understand how it operates.[4] The systems structure and operation should also be verified with users of the actual system.[1]
Data assumptions
There must be a sufficient amount of appropriate data available to build a conceptual model and validate a model. Lack of appropriate data is often the reason attempts to validate a model fail.[3] Data should be verified to come from a reliable source. A typical error is assuming an inappropriate statistical distribution for the data.[1] The assumed statistical model should be tested using goodness of fit tests and other techniques.[1][3] Examples of goodness of fit tests are the Kolmogorov–Smirnov test and the chi-square test. Any outliers in the data should be checked.[3]
Validating input-output transformations
The model is viewed as an input-output transformation for these tests. The validation test consists of comparing outputs from the system under consideration to model outputs for the same set of input conditions. Data recorded while observing the system must be available in order to perform this test.[3] The model output that is of primary interest should used as the measure of performance.[1] For example, if system under consideration is a fast food drive through where input to model is customer arrival time and the output measure of performance is average customer time in line, then the actual arrival time and time spent in line for customers at the drive through would be recorded. The model would be run with the actual arrival times and the model average time in line would be compared actual average time spent in line using one or more tests.
Hypothesis testing
Statistical hypothesis testing using the t-test can be used as a basis to accept the model as valid or reject it as invalid.
The hypothesis to be tested is
- H0 the model measure of performance = the system measure of performance
versus
- H1 the measure of performance ≠ the measure of performance.
The test is conducted for a given sample size and level of significance or α. To perform the test a number n statistically independent runs of the model are conducted and an average or expected value, E(Y), for the variable of interest is produced. Then the test statistic, t0 is computed for the given α, n, E(Y) and the observed value for the system μ0
- and the critical value for α and n-1 the degrees of freedom
- is calculated.
If
reject H0, the model needs adjustment.
There are two types of error that can occur using hypothesis testing, rejecting a valid model called type I error or "model builders risk" and accepting an invalid model called Type II error, β, or "model user's risk".[3] The level of significance or α is equal the probability of type I error.[3] If α is small then rejecting the null hypothesis is a strong conclusion.[1] For example, if α = 0.05 and the null hypothesis is rejected there is only a 0.05 probability of rejecting a model that is valid. Decreasing the probability of a type II error is very important.[1][3] The probability of correctly detecting an invalid model is 1 - β. The probability of a type II error is dependent of the sample size and the actual difference between the sample value and the observed value. Increasing the sample size decreases the risk of a type II error.
Model accuracy as a range
A statistical technique where the amount of model accuracy is specified as a range has recently been developed. The technique uses hypothesis testing to accept a model if the difference between a model's variable of interest and a system's variable of interest is within a specified range of accuracy.[6] A requirement is that both the system data and model data be approximately Normally Independent and Identically Distributed (NIID). The t-test statistic is used in this technique. If the mean of the model is μm and the mean of system is μs then the difference between the model and the system is D = μm - μs. The hypothesis to be tested is if D is within the acceptable range of accuracy. Let L = the lower limit for accuracy and U = upper limit for accuracy. Then
- H0 L ≤ D ≤ U
versus
- H1 D < L or D > U
is to be tested.
The operating characteristic (OC) curve is the probability that the null hypothesis is accepted when it is true. The OC curve characterizes the probabilities of both type I and II errors. Risk curves for model builder's risk and model user's can be develop from the OC curves. Comparing curves with fixed sample size tradeoffs between model builder's risk and model user's risk can be seen easily in the risk curves.[6] If model builder's risk, model user's risk, and the upper and lower limits for the range of accuracy are all specified then the sample size needed can be calculated.[6]
Confidence intervals
Confidence intervals can be used to evaluate if a model is "close enough"[1] to a system for some variable of interest. The difference between the known model value, μ0, and the system value, μ, is checked to see if it is less than a value small enough that the model is valid with respect that variable of interest. The value is denoted by the symbol ε. To perform the test a number, n, statistically independent runs of the model are conducted and a mean or expected value, E(Y) or μ for simulation output variable of interest Y, with a standard deviation S is produced. A confidence level is selected, 100(1-α). An interval, [a,b], is constructed by
- ,
where
is the critical value from the t-distribution for the given level of significance and n-1 degrees of freedom.
- If |a-μ0| > ε and |b-μ0| > ε then the model needs to be calibrated since in both cases the difference is larger than acceptable.
- If |a-μ0| < ε and |b-μ0| < ε then the model is acceptable as in both cases the error is close enough.
- If |a-μ0| < ε and |b-μ0| > ε or vice versa then additional runs of the model are needed to shrink the interval.
Graphical comparisons
If statistical assumptions cannot be satisfied or there is insufficient data for the system a graphical comparisons of model outputs to system outputs can be used to make a subjective decisions, however other objective tests are preferable.[3]
See also
References
- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Banks, Jerry; Carson, John S.; Nelson, Barry L.; Nicol, David M. Discrete-Event System Simulation Fifth Edition, Upper Saddle River, Pearson Education, Inc. 2010 ISBN 0136062121
- ↑ Schlesinger, S.; et al. (1979). "Terminology for model credibility". Simulation 32 (3): 103–104. doi:10.1177/003754977903200304.
- 1 2 3 4 5 6 7 8 9 10 11 12 13 Sargent, Robert G. "VERIFICATION AND VALIDATION OF SIMULATION MODELS". Proceedings of the 2011 Winter Simulation Conference.
- 1 2 3 4 5 6 7 8 Carson, John, "MODEL VERIFICATION AND VALIDATION". Proceedings of the 2002 Winter Simulation Conference.
- ↑ NAYLOR, T. H., AND J. M. FINGER [1967], "Verification of Computer Simulation Models", Management Science, Vol. 2, pp. B92– B101., cited in Banks, Jerry; Carson, John S.; Nelson, Barry L.; Nicol, David M. Discrete-Event System Simulation Fifth Edition, Upper Saddle River, Pearson Education, Inc. 2010 p. 396. ISBN 0136062121
- 1 2 3 Sargent, R. G. 2010. "A New Statistical Procedure for Validation of Simulation and Stochastic Models." Technical Report SYR-EECS-2010-06, Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, New York.