
In the dynamic analysis we scan traces of system calls to detect abnormal behaviour. This approach enables us to detect the virus while it is trying to execute itself and to stop it before it actually infects other programs.

The basic idea is to detect deviations from normally behaving traces of system calls, just as the researchers from UNM did in their application for intrusion detection, explained in section 4.1 on page 35. The main differences are that we use HMMs instead of Hamming distance to detect the abnormal behaviour, and that the traces of system calls are not split into sequences. We do not split the traces into sequences because we believe that doing so discards some of the information about how the programs actually behave.

There are two approaches to the dynamic analysis. The first approach is to simulate how the programs will be executed in a real environment: each program is executed with all possible parameters, and the traces of system calls generated during this period are used to train the normal behaviour profile for the program. After the training period, the normal behaviour profile is used to detect any abnormality whenever the program is executed. The second approach is to train the normal behaviour profile from normal traces of system calls generated by a program when it is actually executed by the users of the system. Unlike the first approach, the training period is not determined by a fixed number of traces; we simply keep training the normal behaviour profile until it has settled. The first few traces will probably deviate considerably from each other, causing large fluctuations in the profile, but after a while we expect the normal behaviour profile to settle and represent the users' normal use of the program, and thereby the normal behaviour of the program.

6.4.1 Learning from Synthetic Behaviour

In this subsection we discuss how to build a normal behaviour profile for a program by simulating how the program would be executed under normal circumstances. Furthermore, we describe how to detect deviations from this normal behaviour profile.

During a training period the program is executed with all kinds of parameters, and the traces of system calls generated are used to train an HMM representing the normal behaviour of the program. The average probability of observing the normal traces in the HMM is computed and saved together with the HMM; together, the HMM and this average probability form the normal behaviour profile for the program. Once the training period is over, we can detect abnormal behaviour by testing subsequent traces of the program against the HMM. This is done by obtaining the trace of system calls generated by the program, computing the probability of observing the trace in the HMM, and comparing this probability with the average probability saved in the normal behaviour profile. The procedure for training and detection is as follows:

Training:

1. Execute the program with all different kinds of parameters and obtain the traces of system calls generated: $O^{(1)}, O^{(2)}, \ldots, O^{(n)}$.

2. Train an HMM $\lambda$ on these traces.

3. Compute the average probability of observing the traces in $\lambda$:

   $P_{\text{average}} = \frac{1}{n} \sum_{i=1}^{n} P(O^{(i)} \mid \lambda)$

4. Save $\lambda$ and $P_{\text{average}}$ in the database.
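As an illustration of step 3, the following Python sketch computes $P(O \mid \lambda)$ with the standard forward algorithm and averages it over the training traces. It assumes that $\lambda = (\pi, A, B)$ has already been trained (typically by Baum-Welch re-estimation, which is not shown) and that system calls have been mapped to integer symbols; the `HMM` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HMM:
    """Hypothetical container for lambda = (pi, A, B).

    System calls are assumed to be mapped to integer symbols 0..M-1.
    """
    pi: List[float]        # initial state distribution, length N
    A: List[List[float]]   # A[i][j] = P(state j at t+1 | state i at t)
    B: List[List[float]]   # B[j][k] = P(symbol k | state j)


def forward_probability(hmm: HMM, obs: Sequence[int]) -> float:
    """Return P(O | lambda) using the (unscaled) forward algorithm."""
    if not obs:
        raise ValueError("a trace must contain at least one system call")
    n = len(hmm.pi)
    # alpha_1(j) = pi_j * b_j(o_1)
    alpha = [hmm.pi[j] * hmm.B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = [
            sum(alpha[i] * hmm.A[i][j] for i in range(n)) * hmm.B[j][o]
            for j in range(n)
        ]
    # P(O | lambda) = sum_j alpha_T(j)
    return sum(alpha)


def average_probability(hmm: HMM, traces: Sequence[Sequence[int]]) -> float:
    """P_average = (1/n) * sum_i P(O^(i) | lambda)."""
    return sum(forward_probability(hmm, o) for o in traces) / len(traces)
```

Multiplying probabilities over a long trace quickly underflows double precision, so for realistic traces one would normally use the scaled forward algorithm or work with log-probabilities; the unscaled version is kept here for readability.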

Detecting:

1. Obtain the trace of system calls $\acute{O}$ whenever the program is executed.

2. Compute the probability of observing $\acute{O}$ in $\lambda$: $P(\acute{O} \mid \lambda)$.

3. Compute the match affinity: $P_{ma} = \frac{P(\acute{O} \mid \lambda)}{P_{\text{average}}}$.

4. If $P_{ma}$ is less than some threshold, the trace is considered abnormal.
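Steps 2 to 4 of the detection procedure amount to a ratio and a comparison. The sketch below assumes that $P(\acute{O} \mid \lambda)$ has already been computed, for example with a forward-algorithm routine; `NormalProfile`, `match_affinity` and `is_abnormal` are hypothetical names, and the threshold value is left to the caller because the text does not fix it.

```python
from dataclasses import dataclass


@dataclass
class NormalProfile:
    """Part of a normal behaviour profile needed for detection.

    The HMM itself is assumed to have been used already to obtain
    P(O'|lambda) for the new trace.
    """
    p_average: float   # P_average saved after training
    threshold: float   # hypothetical detection threshold chosen by the caller


def match_affinity(p_trace: float, profile: NormalProfile) -> float:
    """P_ma = P(O'|lambda) / P_average."""
    if profile.p_average <= 0:
        raise ValueError("P_average must be positive")
    return p_trace / profile.p_average


def is_abnormal(p_trace: float, profile: NormalProfile) -> bool:
    """A trace is considered abnormal when P_ma falls below the threshold."""
    return match_affinity(p_trace, profile) < profile.threshold
```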

To be really sure that a detected abnormality is due to a virus infection, and not just some rare execution trace that did not occur during the training period, we can accumulate subsequent abnormalities. If repeated abnormalities are detected in the traces of the same program, it is very likely that the program is infected with a virus, because once a program has been infected, all its subsequent traces will reflect the infection. If a single, rather small abnormality occurs, it is probably due to a program execution that was not simulated during the training period; in this case we should not mark the program as infected. In other words, repeated large abnormalities indicate a virus infection, whereas small, isolated abnormalities indicate normal behaviour that was not simulated during the training period.

The technique of accumulating repeated abnormalities is also known from the biological immune system. Here 10-100 receptors of a lymphocyte need to match nonself before the lymphocyte is activated and engages an immune response to kill the infection. When the receptors of the lymphocyte are bound, signals are sent to the core of the lymphocyte; the stronger the binding, the stronger the signal, and the fewer signals are needed to activate the lymphocyte. In other words, lymphocytes have an activation threshold: the signals sent to the core of a lymphocyte are accumulated, and once they reach the activation threshold the lymphocyte is activated and engages the immune response.
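The accumulation of abnormalities can be sketched as a per-program activation threshold, loosely modelled on the lymphocyte: each abnormal trace contributes a signal whose strength grows with the size of the deviation, and a program is flagged once its accumulated signal reaches the activation threshold. The signal-strength formula, the `AbnormalityAccumulator` class and both threshold parameters are hypothetical choices for illustration, not values given in the text.

```python
from collections import defaultdict
from typing import DefaultDict


class AbnormalityAccumulator:
    """Hypothetical per-program accumulator of abnormality signals."""

    def __init__(self, ma_threshold: float, activation_threshold: float) -> None:
        self.ma_threshold = ma_threshold                  # P_ma below this is abnormal
        self.activation_threshold = activation_threshold  # signal needed to flag a program
        self.signal: DefaultDict[str, float] = defaultdict(float)

    def observe(self, program: str, p_ma: float) -> bool:
        """Record one trace's match affinity; return True if the program is flagged."""
        if p_ma < self.ma_threshold:
            # Assumption: strength in (0, 1], larger for larger deviations,
            # so a few strong abnormalities count as much as many weak ones.
            strength = (self.ma_threshold - p_ma) / self.ma_threshold
            self.signal[program] += strength
        return self.signal[program] >= self.activation_threshold
```

With `ma_threshold=0.5` and `activation_threshold=1.5`, a single small abnormality (`p_ma=0.45`) adds only about 0.1 and does not flag the program, whereas two large ones (`p_ma=0.05`, about 0.9 each) do.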

6.4.2 Learning from Real Behaviour

In the previous subsection we trained the normal behaviour profiles from traces generated by simulated program executions with all different kinds of parameters. In this way we simulated how the programs would normally be executed in a real environment. A quite different approach is to train the normal behaviour profiles from real executions of programs in a real environment.

The first time a program is executed, the trace of system calls it generates is used to train an HMM representing the normal behaviour of the program. All subsequent traces of the program are then used to retrain the HMM until it appears to have settled. In the beginning many different traces will appear, but after some time all normal traces will have occurred, and the HMM will then represent the normal behaviour of the program.

Training:

1. Train an HMM $\lambda$ on the first trace $O^{(1)}$ and initialise the average probability $P_{\text{average}}$ to $P(O^{(1)} \mid \lambda)$.

2. Compute the probability of observing any subsequent trace $O^{(i)}$ in $\lambda$: $P(O^{(i)} \mid \lambda)$.
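The following Python sketch shows one possible form of this training loop. It rests on our own assumptions, not necessarily the procedure intended here: after each new trace, $\lambda$ is retrained on all traces seen so far, $P_{\text{average}}$ is recomputed over them, and the profile counts as settled once $P_{\text{average}}$ changes by less than a relative tolerance `epsilon` for `patience` consecutive traces. The `probability` and `retrain` hooks, the function name and both parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Trace = Sequence[int]  # a trace of system calls mapped to integer symbols


def train_from_real_behaviour(
    traces: Sequence[Trace],
    probability: Callable[[Trace], float],
    retrain: Callable[[List[Trace]], None],
    epsilon: float = 0.01,
    patience: int = 3,
) -> Tuple[float, int, bool]:
    """Return (P_average, number of traces used, settled?).

    `probability(o)` should return P(o | lambda) for the current model, and
    `retrain(seen)` should re-estimate lambda on all traces seen so far;
    both are hypothetical hooks supplied by the caller.
    """
    if not traces:
        raise ValueError("need at least one trace")
    seen: List[Trace] = [traces[0]]
    retrain(seen)                       # step 1: train lambda on O^(1) ...
    p_average = probability(traces[0])  # ... and initialise P_average
    stable = 0
    for trace in traces[1:]:
        seen.append(trace)
        retrain(seen)                   # assumption: retrain after every new trace
        # The new trace's P(O^(i) | lambda) enters this recomputed mean.
        new_average = sum(probability(o) for o in seen) / len(seen)
        if p_average > 0:
            change = abs(new_average - p_average) / p_average
        else:
            change = float("inf")
        p_average = new_average
        # Count consecutive traces with only a small relative change.
        stable = stable + 1 if change < epsilon else 0
        if stable >= patience:
            return p_average, len(seen), True
    return p_average, len(seen), False
```

Recomputing the mean over all seen traces after each retraining keeps $P_{\text{average}}$ consistent with the current $\lambda$, at the cost of one pass over the stored traces per update.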
