• Ingen resultater fundet

Our strategy has been to attack the program from several angles in search of potential errors. Specifically, we have performed both white box and black box tests on different levels of the program ranging from the on the smallest units possible (testing the results of individual statements), and on the system as a whole.

Creating the white box unit test cases allowed us to deliberately construct complex and unusual examples that have revealed programming errors in

non-1We know that Herde and Fr¨anzle, the authors of the HySat solver, are currently working on a solution for this problem.

CHAPTER 8. TEST trivial parts of the code, e.g. theDCSimplifierand SemanticLookupLiteral-Handler classes. Additionally, it gave us a second chance to have a close look at the code, making it possible for us to clean up, reorganise and even rewrite unnecessarily complex parts of the code.

The automated black box test revealed some errors that were not found in the white box test, for instance an error that consisted of a “missing branch”, and errors occurring in the interplay between classes. The small, manual black box test suite was successful in testing the outskirts of the program that were inconvenient to test automatically.

All known errors have been corrected.

We find that the combination of the white box and black box approaches supplemented each other well. Having automated tests made it possible for us to work with a much larger number of test cases, and to run these repeatedly as errors were corrected. The time we have spent in constructing the test frame-work and the many test cases has been worthwhile because it has significantly increased our confidence in the correctness of the BMC/DCValidator.

Chapter 9

Benchmarks

In this chapter, we shall present some benchmarks for evaluating the perfor-mance of the BMC/DCValidator. We have chosen to primarily investigate two aspects:

1. Are there (combinations of) settings (polarity optimisation, output format etc.) that are clearly superior to others?

2. Is there a difference between the optimal settings for valid and invalid problems?

We could also have attempted to benchmark the BMC/DCValidator against other related tools, but have chosen not to. First of all, the description languages and/or expressiveness may be vastly different. Secondly, besides the problems involved in getting an often poorly documented tool (beyond the mere theoret-ical properties) up and running, it is difficult to create a fair benchmark test when we do not have a strong understanding of the kinds of examples that they are optimised for. Also, the implementation base may be different, so the results of such a comparison would not necessarily teach us much about the usefulness of the various approaches. For these reasons, we have chosen to concentrate on the properties of the BMC/DCValidator.

First, we present and formalise the two cases that will be used for bench-marking. Then, we describe the settings and environment in which the test shall be carried out, and finally we will discuss the results of the benchmarks.

9.1 Cases

9.1.1 Gas Burner

In Section 1.1.1 on page 2 we introduced the gas burner as an example of a real-time system. We shall now try to formulate the design decisions and requirements stated in that section as DC formulas.

First, we will define two states,gas andflame. The predicateleak (stating that the gas is leaking) shall be defined asgas∧ ¬flame.

Design decision 1: Leaks should be detected and stopped within one second.

This may be formulated as the requirement that if we have a period whereleak

CHAPTER 9. BENCHMARKS holds for more than one second, there must be some time within that period whereleak does not hold, i.e. the leak has been stopped:

2 R

leak >1⇒R

¬leak ≥1

(des1) Design decision 2: It should not be possible to switch on the gas for a period of 30 seconds after a leak. In our subset of DC, this is not completely straight-forward to formulate. However, we may state that if we have a period with leak, followed by a period without leak and then again a period with leak, the length of the interval in question must be at least 32:

2 R

leak ≥1_R

¬leak ≥1_R

leak ≥1

⇒`≥32

(des2) Finally, we must formulate the safety requirement: For any time interval at most 30 seconds long, the gas burner leaks no more than nseconds. Using DC, this is formulated as follows:

2 `≤30⇒R

leak ≤n

(req1) Now, we must validate that the conjunction of the two design decisions implies the safety requirement (which it does for any n ≥ 1). Also, we may obtain an invalid problem by leaving out one of the design decisions. We have chosen to leave out design decision 2, stated in equation des2, for the invalid test cases.

9.1.2 Scheduler

In Section 1.1.1 on page 2, we reviewed another example of a real-time system, namely a scheduler. We shall now formulate the design decisions and require-ment for the scheduler withnprocesses using DC. The state ri will be used to indicate that processPi is occupying the processor.

The design decisions were as follows:ProcessPi+1will run only when process Pi has completed its 2 time unit run for this period.

2 ^

m∈{1,...,bk/2c}

i∈{1,...,n−1}

Rri<2m⇒R

ri+1≤2(m−1)

(des1)

and: ProcessP1 will not run again until process Pn has completed its run.

2 ^

m∈{1,...,bk/2c}

Rrn<2m⇒R

r1≤2m

(des2) Then, we will assume that the processor is in constant use. This could have been phrased in several ways, but we shall state the assumption as:

`=k⇒R W

i∈{1,...,n}ri

=k (ass1)

As for safety requirements for the scheduler, we know that only a single process can be allocated the processor at any given time (mutual exclusion).

This requirement could be stated as:

CHAPTER 9. BENCHMARKS

2



^

i∈{1,...,n}

j∈{i+1,...,n}

Rri∧rj= 0



 (req1)

We shall also require that it is possible to schedule the processes so that each process has been allocated 2b2nkcseconds of processor time afterkseconds have elapsed.

`=k⇒ ^

i∈{1,...,n}

Rri= 2b2nk c (req2)

It is now possible to validate that the conjunction of the design decisions and assumptions imply the two safety requirements (mutual exclusion and a fair schedule).

An invalid problem may be obtained by dropping one of the design decisions or assumptions. In the benchmarks we shall present next, equation des2 has been removed in the invalid scheduler case.