
A Validation Study for an Enzyme Analytical Method


Xuan Liu advised by Helle Rootzen and Peter Thyregod

November 10, 2007


Acknowledgements

I would like to thank my supervisors, Helle Rootzen and Peter Thyregod, for their incredible support, understanding and encouragement. They have given many great suggestions and much advice, and have been a great help in structuring the thesis and interpreting the results. Furthermore, I would like to thank my friend, Cheng Xu, for his critical reading, even though statistics is a foreign language to him.


Preface

This thesis was written as a part of my studies for a Master of Science degree at the Department of Informatics and Mathematical Modeling (IMM) of the Technical University of Denmark (DTU), under the supervision of Professor Helle Rootzen in IMM and Peter Thyregod from Novozymes.

This thesis deals with an inter-laboratory validation study analysed with linear mixed models.

Lyngby, November 2007 Xuan Liu


Summary

In 2005 a new methodology was developed to evaluate the activity of the phytase enzyme in liberating phytate-bound phosphorus in animal feed. This thesis presents a validation study of the new method. Two inter-laboratory studies with similar designs were conducted. The larger study, involving more Labs and Materials, is the main object of analysis; the smaller study is analysed as a contrast to the larger one.

Linear Mixed Models with different variance structures, namely Homogeneous Variance Models and Heterogeneous Variance Models, are presented to detect the variability characteristics of the measurement error. The main characteristics of performance precision are Repeatability and Reproducibility. Another topic of interest is whether the Type (Solid/Liquid) of the Materials affects the evaluation. The conjecture that liquid materials have more stable measurements and smaller variances is investigated with Linear Mixed Models. The homogeneity of the Labs' capability in the evaluation is also investigated.

Mathematical models usually rest on assumptions that the data must satisfy, but in most cases inter-laboratory data cannot satisfy such strict assumptions. The objective of most validation studies is to reveal useful information and parameters of the data rather than to find a well-fitting model. This thesis, however, also places particular emphasis on modeling, which can supply more general characteristics of the data.

From Homogeneous Variance Models to Heterogeneous Variance Models, most of the models presented in this thesis were found not to satisfy the assumptions. Nevertheless, the modeling and assumption-checking process reveals a great deal of detail about the data, which can indicate how the covariance structures of the models should be modified.

Key words: Inter-laboratory Validation Study, Linear Mixed Model, Homogeneous Variance Model, Heterogeneous Variance Model, Repeatability and Reproducibility, Assumptions Checking.


Reader Guide

The thesis consists of three main modules: background introduction, Linear Mixed Model theory, and statistical implementation of the theory.

The introduction module describes the design of Study A and Study B and the experimental mechanism. The main characteristics, RSDr and RSDR, are also defined.

The Linear Mixed Model theory module gives a general introduction to LMMs. The model notation, covariance matrix specifications and assumptions follow those of PROC MIXED in SAS 9.1. The Hierarchical Linear Model is one kind of LMM. The HLM section focuses on describing the covariance matrix structures used in this thesis. Readers who know mixed models and PROC MIXED well may skim the LMM section, but the HLM section should be read by everyone because the covariance structure specifications are the core of the HLM design.

In the statistical implementation module, homogeneous and heterogeneous variance models are fitted for both Study A and Study B. Because Study B is more complicated, it is the main object of analysis, while the analysis of Study A serves to confirm the conclusions obtained from Study B. The RSDr and RSDR obtained from the different models are calculated, and approximate confidence intervals for RSDr and RSDR are also discussed.


List of Abbreviations and Notations

REML - Restricted Maximum Likelihood
ML - Maximum Likelihood
RSD - Relative Standard Deviation
RSDr - Relative Standard Deviation of repeatability
RSDR - Relative Standard Deviation of reproducibility
WLS - Weighted Least Squares
GLS - Generalized Least Squares
MS - Mean Square
ANOVA - Analysis of Variance
LMM - Linear Mixed Model
HLM - Hierarchical Linear Model


Contents

1 Introduction
1.1 Background Information
1.1.1 Design of the Inter-Lab Validation Study
1.2 Descriptive Statistics
1.2.1 Logarithmic Transformation of Raw Data and New Estimations of RSD
1.2.2 Data Description of Study A
1.2.3 Data Description of Study B
1.2.4 Explanatory Variables Relationships
1.2.5 Preliminary Analysis

2 Theoretical Methods and Modeling
2.1 Linear Mixed Model
2.1.1 Introduction
2.1.2 Notation, Assumptions and Algorithm of Linear Mixed Model
2.1.3 Homogeneous and Heterogeneous Variance Model
2.2 Hierarchical Linear Model
2.2.1 Notation and Covariance Structure for different layer models of Hierarchical Linear Model

3 Statistical Implementation of Theoretic Methods
3.1 Homogeneous Variance Model of Study B
3.1.1 Test of overall model reduction
3.1.2 Diagnostic Checking for Homogeneous Variance Model 3.4
3.1.3 Practical Implication of Model 3.4
3.1.4 RSD% of Homogeneous Variance Model 3.4
3.2 Heterogeneous Variance Model for Type of Study B
3.2.1 Motivation
3.2.2 Practical Issue of SAS statements
3.2.3 Statistical Implementation
3.2.4 RSD of Homogeneous Variance Model 3.12
3.2.5 Diagnostic Checking for Heterogeneous Variance Model 3.12
3.2.6 Brief Summary
3.3 Homogeneous Variance Model and Heterogeneous Variance Model for Type of Study A
3.3.1 Homogeneous Variance Model of Study A
3.3.2 Diagnostic Checking for Homogeneous Variance Model A3
3.3.3 RSD of Homogeneous Variance Model A3
3.3.4 Heterogeneous Variance Model for Type of Study A
3.3.5 RSD of Heterogeneous Variance Model AT6
3.4 Approximate Confidence Intervals on RSD
3.4.1 Homogeneous Variance Model 3.4: Approximate Confidence Intervals on RSD
3.4.2 Homogeneous Variance Model A3 of Study A: Approximate Confidence Intervals on RSD
3.4.3 Comparison between Approximate Confidence Interval on RSD for Homogeneous Variance Model of Study A and Study B
3.5 Heterogeneous Variance Model for Material
3.5.1 Heterogeneous Variance Model for Material of Study B
3.5.2 Heterogeneous Variance Model for Material of Study A
3.5.3 Brief Summary
3.6 Heterogeneous Variance Model for Lab of Study B
3.6.1 Residual Analysis
3.6.2 Brief Summary

4 Conclusion
4.1 Result and Conclusion
4.1.1 Homogeneous Variance Model
4.1.2 Heterogeneous Variance Model
4.2 Further Study

5 Appendix
5.1 SAS Codes for Modeling
5.1.1 SAS Codes for Model 3.4, homogeneous variance model
5.1.2 SAS Codes for Model 3.11, between-type heteroscedasticity Model
5.1.3 SAS for RSD Approximation interval
5.1.4 SAS Codes for Model MB6, between-material heteroscedasticity Model
5.1.5 SAS Codes for Model LB3, between-Lab and between-Material heteroscedasticity Model
5.1.6 SAS Codes for Model LB5, between-Lab and between-Material heteroscedasticity Model
5.1.7 Log Ratio Likelihood Test code: LB3 -> LB4
5.1.8 Normal distribution analysis of conditional residual of LB5
5.2 One way homogeneity test by Lab of conditional residual of LB5 for different materials
5.2.1 SAS output of LB5
5.3 SAS output of LB3 conditional residual homogeneity one way test by lab for 8 materials
5.4 Normality Checking for Random Effect Lab in LB3


Chapter 1 Introduction

1.1 Background Information

The background information is taken largely from a validation study conducted by FEFANA (European Association of Feed Additive Manufacturers) [3].

Phytase is an enzyme that can be added to feed for monogastric animals. This enzyme can liberate phytate-bound phosphorus in the digestive tract of animals, thereby improving the nutritive value of the feed by increasing the amount of phosphorus available to the animal. In addition, it has a positive impact on the environment by reducing phosphorus excretion in animal manure.

A new evaluation method:

In 2005 a new methodology was developed to evaluate the activity of the phytase enzyme in liberating phytate-bound phosphorus in animal feed. Compared with other official analytical methods applied for this purpose, for instance the AOAC method [4], the main advantage of the new method is its capability of measuring the phytase activity of all phytase products that currently exist on the market.

Principle and Mechanism of the new method:

The principle of the method is that the inorganic phosphate released by the enzyme forms a yellow complex in the presence of an acidic molybdate/vanadate reagent. The yellow complex is measured with a spectrophotometer at a wavelength of 415 nm, and the released inorganic phosphate is expressed as optical density (OD415). The optical density is then quantified through a phosphate standard curve. Finally, the activity is expressed in phytase units (U) per kg of feed sample. One phytase unit (U) is the amount of enzyme that releases 1 µmol of inorganic phosphate from phytate per minute under the above-mentioned reaction conditions. The activity is calculated with the following formula:

\[ \text{Activity} = \frac{\Delta OD \cdot D}{m \cdot W \cdot t} \tag{1.1} \]

where

ΔOD = OD415sample − OD415blank

m = slope of the standard curve [OD415/(µmol·ml^-1)]

D = dilution factor (extraction volume × dilution of the extract) [ml]

W = weight of the sample [kg]

t = 30 [min]

OD415sample is the result of the measurement of the feed sample subjected to the whole analytical procedure, whereas OD415blank is the result of the determination of the inorganic phosphate already present in the same sample before the enzymatic reaction.
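As an illustration of formula (1.1), the calculation can be sketched in a few lines of Python. The function name, argument names and numerical readings below are hypothetical and chosen only to exercise the formula; they are not taken from the study data.

```python
def phytase_activity(od_sample, od_blank, slope, dilution_ml, weight_kg, time_min=30.0):
    """Activity in U/kg following formula (1.1).

    slope is the standard-curve slope in OD415 per (umol/ml),
    dilution_ml the dilution factor D in ml, weight_kg the sample weight W.
    """
    delta_od = od_sample - od_blank          # Delta OD = OD415sample - OD415blank
    return (delta_od * dilution_ml) / (slope * weight_kg * time_min)


# Hypothetical readings (assumptions for illustration only).
activity = phytase_activity(od_sample=0.50, od_blank=0.10, slope=0.02,
                            dilution_ml=1000.0, weight_kg=0.05)
print(f"activity = {activity:.1f} U/kg")
```

Dividing ΔOD by the slope gives a phosphate concentration in µmol/ml; multiplying by D [ml] and dividing by W [kg] and t [min] gives µmol per kg per minute, that is, U/kg.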

The logarithm of the activity (U/kg) is the response variable in all models in this thesis. The details are discussed in the Descriptive Statistics section.

1.1.1 Design of the Inter-Lab Validation Study

Two statistical validation studies were organized. The inter-laboratory validations were structured as follows: samples of different Materials were collected and distributed to different Laboratories.

Training period

Before the formal validations, a number of Labs were selected and took part in training exercises, applying the new method protocol to known samples of materials. After a period of testing in their own laboratories, 14 of them were selected for the final validation study.

Two inter-laboratory validations were then performed, with quite similar organization. Study B has one more layer of block design than Study A. To clarify the design structures, we start with the simpler one, Study A.

Organization of the Validation Study A

Study A was carried out according to ISO guide 5725-2, thereby allowing the determination of the standard deviations for repeatability and reproducibility. In detail, the repeatability standard deviation SDr describes the within-laboratory variation obtained when applying the same method of analysis to the same sample under repeatability conditions (i.e. the same laboratory, the same operator, the same apparatus and a short interval of time), whereas the reproducibility standard deviation SDR describes the between-laboratory variation and is obtained when performing the same method of analysis under reproducibility conditions (i.e. on identical material, by operators in different laboratories). Extreme values reported by the participating laboratories were identified by sequential application of the Cochran and Grubbs outlier tests (at the 2.5% probability level, 1 tail for Cochran and 2 tails for Grubbs) and were not included in the assessment of the method performance characteristics. The sequential application of these outlier tests stops when more than 22.2% (= 2/9) of the participating laboratories are identified as outliers and therefore excluded from the data set. However, the maximum number of outliers for a single material was 2 out of 14 laboratories.
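To make the two precision measures concrete, the following Python sketch computes SDr and SDR for a single material from a balanced set of replicates using the usual one-way analysis of variance decomposition, in which the reproducibility variance is the sum of the repeatability variance and the between-laboratory variance. The function name and the numbers are hypothetical, the sketch assumes the same number of replicates in every laboratory, and it does not include the Cochran and Grubbs outlier screening.

```python
def precision_sds(results_by_lab):
    """Return (s_r, s_R) for one material; one list of replicates per lab."""
    p = len(results_by_lab)                  # number of laboratories
    n = len(results_by_lab[0])               # replicates per laboratory
    if any(len(lab) != n for lab in results_by_lab):
        raise ValueError("this sketch assumes a balanced design")

    lab_means = [sum(lab) / n for lab in results_by_lab]
    grand_mean = sum(lab_means) / p

    ss_within = sum((x - m) ** 2
                    for lab, m in zip(results_by_lab, lab_means) for x in lab)
    ss_between = n * sum((m - grand_mean) ** 2 for m in lab_means)

    ms_within = ss_within / (p * (n - 1))
    ms_between = ss_between / (p - 1)

    var_r = ms_within                                    # repeatability variance
    var_lab = max(0.0, (ms_between - ms_within) / n)     # between-lab variance
    var_R = var_r + var_lab                              # reproducibility variance
    return var_r ** 0.5, var_R ** 0.5


# Hypothetical duplicate results from three laboratories.
s_r, s_R = precision_sds([[10.0, 10.2], [9.8, 10.0], [10.4, 10.6]])
print(f"s_r = {s_r:.4f}, s_R = {s_R:.4f}")
```

The between-laboratory variance is truncated at zero because the moment estimate can be negative when the laboratory means differ by less than the within-laboratory scatter would suggest.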

Organization of the study: Fourteen participants from 9 different countries, representing a cross-section of official feed control laboratories and laboratories with industry affiliations, took part in this collaborative trial. Prior to the validation experiments the participating laboratories attended a training session to become familiar with the method.

For second study each participant received:

(1) 10 coded samples comprised of 5 feed samples separately fortified with 5 different phytase products and sent out in blind duplicates

(2) The description of the method

(3) A report template in Excel format in which the participants had to fill the results of their analysis

Organization of the Validation Study B

Study B also included the assessment of intermediate precision as suggested in ISO guide 5725-3, since the laboratories were requested to carry out duplicate analyses on the same day and to repeat these duplicate analyses on three different days. This study therefore allowed the precision of the method to be estimated under different circumstances regarding its execution, namely (a) repeatability conditions (i.e. the same laboratory and the same day), (b) intermediate conditions (i.e. the same laboratory, but different days) and (c) reproducibility conditions (different laboratories). The data set was also screened for outliers using the same procedure as described for Study A. Since in no case were more than 3 out of 14 laboratories, corresponding to 21%, identified as outliers, the criterion of a maximum of 2 out of 9 laboratories (= 22%) was respected throughout the whole study.

Each participant received:

(1) 8 coded feed samples separately fortified with 8 different phytase products. The laboratories had to take sub-samples from the sample glasses to carry out the analyses on three days in duplicate

(2) The description of the method

(3) A report template in Excel format in which the participants had to fill the results of their analysis

1.2 Descriptive Statistics

1.2.1 Logarithmic Transformation of Raw Data and New Estima- tions of RSD

As introduced in the Background Information section of Chapter 1, the measured quantity is the Activity (Unit/kg). The variance of this response is not stable. Figure 1.1 shows that the standard deviation tends to be proportional to the mean (that is, the relative standard deviation (RSD) is constant).

The log transformation stabilizes the variance. Suppose that

\[ \sqrt{\mathrm{Var}[Y_i]} = C \cdot E[Y_i], \]

where C is a constant, i.e. the standard deviation of the observations is proportional to the mean. Then, to first order,

\[ \mathrm{Var}[\ln(Y_i)] \approx \frac{\mathrm{Var}[Y_i]}{E[Y_i]^2} = C^2 . \]

Here

\[ C = \frac{\sqrt{\mathrm{Var}[Y_i]}}{E[Y_i]} \]

is the RSD (relative standard deviation) of the raw data. It follows that the standard deviation of the log-transformed data, \(\sqrt{\mathrm{Var}[\ln(Y_i)]}\), is approximately the RSD of the raw data. [?]

Figure 1.1: Standard deviation of the observations of Activity (Unit/kg) vs. means for different Materials; the standard deviation tends to be proportional to the mean.

In the later analysis and modeling, the log-transformed activity, ln(Unit/kg), is used as the response variable instead of the raw observations of Activity (Unit/kg). Hence the standard deviations estimated by the following models correspond to the RSD% of the raw data.
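How close the standard deviation on the log scale is to the RSD on the raw scale can be checked numerically. The sketch below assumes, purely for illustration, that ln(Y) is exactly normally distributed with standard deviation sigma; under that assumption the RSD of Y is sqrt(exp(sigma^2) − 1), which can be compared with sigma itself. The function name is hypothetical.

```python
import math


def rsd_from_log_sd(sigma):
    """Exact RSD of Y when ln(Y) is normal with standard deviation sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


# Compare the log-scale standard deviation with the raw-scale RSD.
for sigma in (0.02, 0.05, 0.10, 0.30):
    rsd = rsd_from_log_sd(sigma)
    print(f"sd of ln(Y) = {sigma:.2f}   RSD of Y = {rsd:.4f}   difference = {rsd - sigma:.4f}")
```

For small sigma the two quantities nearly coincide, and the approximation becomes less accurate as sigma grows.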

1.2.2 Data Description of Study A

The first several lines of DATA A are shown below:

Table 1.1: Data Structure of DATA A

Lab  Material  Weighing  Type   ln(Units/kg)
1    1         1         Solid  6.753437919
1    1         2         Solid  6.633318433
2    1         1         Solid  6.683360946
2    1         2         Solid  6.452048954
3    1         1         Solid  6.841615476
3    1         2         Solid  6.650279049

Besides the response variables "Unit/kg" and ln(Unit/kg), the remaining columns correspond to the 4 explanatory variables: Material, Lab, Type and Weighing.

Fourteen participants (Lab) from 9 different countries, representing a cross-section of official feed control laboratories and laboratories with industry affiliations, took part in this collaborative trial.

The 14 Labs received 10 coded samples, comprising 5 feed samples separately fortified with 5 different phytase products (Material) and sent out in blind duplicates (Weighing).

Each Material is either solid or liquid (Type). Of the 5 materials, two are solid and three are liquid.

DATA A has a hierarchical structure, shown below:

Figure 1.2: DATA A hierarchical structure: 5 Materials are nested within Type (Solid/Liquid). 5 Materials are crossed with 14 Labs. Weighs are nested within Days.

The experiment was designed to be complete and balanced. However, because the outliers have already been deleted, DATA A itself is incomplete. The principle and method of detecting outliers are introduced in section 1.1.1, Design of the Inter-Lab Validation Study.

After deleting the outliers the distribution of Materials versus Labs in DATA A is as below:

Figure 1.3: Distribution of Materials vs. Labs in DATA A. For any position X(i,j) (i = 1, 2, ..., 14; j = 1, 2, ..., 5) in the graph, a blue dot means that Lab No. i has observations of Material No. j in DATA A, and a blank means that Lab No. i has no observations of that Material in DATA A.

1.2.3 Data Description of Study B

The first several lines of DATA B are shown below:

Table 1.2: Data Structure of DATA B

Lab  Day  Material  Weighing  Units/kg  ln(Units/kg)  Type
1    1    1         1         1600      7.377758908   Solid
1    1    1         2         1644      7.404887576   Solid
1    2    1         1         1524      7.329093736   Solid
1    2    1         2         1376      7.226936018   Solid
1    3    1         1         1688      7.431299675   Solid

The structure of DATA B is quite similar to that of DATA A, except for the additional factor Day. Besides the response variables "Unit/kg" and ln(Unit/kg), the remaining columns correspond to the 5 explanatory variables (classification factors). The 14 Labs received 8 samples of 8 different Materials and took 6 sub-samples from each bottle, analysing them in duplicate (Weighing) on each of 3 days (Day). Each Material is either solid or liquid (Type). Of the 8 materials, four are solid and four are liquid.

DATA B has a hierarchical structure, shown below:

Figure 1.4: DATA B hierarchical structure: 8 Materials are nested within Type (Solid/Liquid). 8 Materials are crossed with 14 Labs. Days are nested within Labs. Weighs are nested within Days.

The experiment was designed to be complete and balanced. However, because the outliers have already been deleted, DATA B itself is incomplete. The principle and method of detecting outliers are introduced in section 1.1.1, Design of the Inter-Lab Validation Study.

After deleting the outliers the distribution of Materials versus Labs in DATA B is as below:

Figure 1.5: Distribution of Materials vs. Labs in DATA B. For any position X(i,j) (i = 1, 2, ..., 14; j = 1, 2, ..., 8) in the graph, a blue dot means that Lab No. i has observations of Material No. j in DATA B, and a blank means that Lab No. i has no observations of that Material in DATA B.

1.2.4 Explanatory Variables Relationships

Figures 1.2 and 1.4 show the general structure of DATA A and DATA B. Before modeling, it is necessary to clarify the pairwise relationships between the explanatory variables, since these relationships determine the model specification later. Because the structures of DATA A and DATA B are quite similar, only the DATA B case is described here; it is the more complicated one because of the additional layer "Day".

Labs VS Material

In this study all the Labs received all 8 kinds of materials. First assume that the relationship is hierarchical, with Material nested within Lab:

Figure 1.6: Assumption: what the relationship would look like if Material were nested within Lab. In fact Material and Lab are crossed with each other.

If each Lab had received its own distinct materials, so that the materials could be indexed Material1, Material2, ..., Material8 for Lab1, Material9, Material10, ... for Lab2, and so on, the relationship would be hierarchical. Here, however, Material1 of Lab1 and Material1 of Lab2 are exactly the same sample material. The relationship between Material and Lab is therefore crossed rather than hierarchical.

Material VS Type


Figure 1.7: Material is nested within Type(Solid/Liquid).

The Materials are either solid or liquid, so it is natural to nest Material within Type. No material occurs in both liquid and solid form, and the effects of Type and Material are therefore completely confounded.

Labs VS. Day

Day is assumed to be nested within Labs. Each lab evaluated each kind of material during three days.
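The distinction between nested and crossed factors can be checked mechanically from the factor columns of a data set: an inner factor is nested within an outer factor if every level of the inner factor occurs together with exactly one level of the outer factor. The Python sketch below applies this check to a hypothetical miniature of the DATA B layout (2 Labs, 2 Days, 4 Materials); the helper name and the assignment of materials to types are assumptions made for illustration.

```python
def is_nested(pairs):
    """True if every inner level occurs with exactly one outer level.

    pairs is an iterable of (inner_level, outer_level) tuples.
    """
    seen = {}
    for inner, outer in pairs:
        if seen.setdefault(inner, outer) != outer:
            return False
    return True


# Hypothetical miniature layout: every lab measures every material on every day.
material_type = {1: "Solid", 2: "Solid", 3: "Liquid", 4: "Liquid"}
rows = [(lab, day, mat) for lab in (1, 2) for day in (1, 2) for mat in material_type]

material_in_type = is_nested((mat, material_type[mat]) for _, _, mat in rows)
material_in_lab = is_nested((mat, lab) for lab, _, mat in rows)
day_label_in_lab = is_nested((day, lab) for lab, day, _ in rows)
lab_day_in_lab = is_nested(((lab, day), lab) for lab, day, _ in rows)

print(material_in_type, material_in_lab, day_label_in_lab, lab_day_in_lab)
```

The last two checks illustrate the indexing point made for Material and Lab: the bare day labels 1, 2, 3 are reused by every lab and so look crossed, while the combined key (Lab, Day) identifies a lab-specific day and is nested within Lab.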

1.2.5 Preliminary Analysis

Here we use ln(Units/kg) (the log transformation) instead of the original observations (Units/kg) as the response. For simplicity, in the rest of this thesis the terms observations and response values refer to the log-transformed observations.

Observations VS. Material

Because one aim of this thesis is to detect whether the Type (Solid/Liquid) of the Materials leads to different variances, the observations and means of the solid and liquid materials are plotted separately. We start the plotting and analysis with DATA B, which is more complicated and has more observations (618).

In the 2 by 2 figure matrix of DATA B below, the first row contains scatter plots and the second row contains box plots showing the usual interquartile ranges. The variance of the response, ln(Units/kg), differs from material to material. Materials No. 5 and No. 6 have clearly smaller variances than the rest. Comparing the plot of the 4 liquid materials on the left with the plot of the 4 solid materials on the right, the measurements of the liquid materials appear more dispersed than those of the solid materials. However, not every liquid material has a wider range of observations than the solid ones; for instance, No. 5 is more compact than No. 8. It is hard to draw an explicit conclusion in the DATA B case. The 2 by 2 figure matrix of DATA A indicates the opposite situation: the measurements of the liquid materials are more compact than those of the solid materials.

The dispersion of the response values is also affected by the Labs and Days. In these plots the random effects Lab and Day are completely confounded with the residuals. After the final model has been applied to DATA B we will return to the comparison of the errors of the different Types and Materials.

Figure 1.8: DATA B: scatter and box plots of the observations vs. the 8 kinds of Materials, divided into two groups by Type (Liquid/Solid). The two figures on the left show the 4 Liquid Materials and the two on the right show the Solid Materials. The box plots indicate the interquartile ranges of the response values (ln(Units/kg)).

Figure 1.9: DATA A: scatter and box plots of the observations vs. the 5 kinds of Materials, divided into two groups by Type (Liquid/Solid). The two figures on the left show the 3 Liquid Materials and the two on the right show the 2 Solid Materials. The box plots indicate the interquartile ranges of the response values (ln(Units/kg)).

Comparing Activity (ln(Unit/kg)) of materials estimated by different Labs

In the 2 by 2 figures of DATA B in the section above, there are more than 50 observations for each Material. These points are spread along a line only about 2 centimeters long, so it is hard to see clearly how they are distributed. In the box plot the central line is marked and the off-center tendency is clear.

How do the different labs affect the distribution of the observations? Do some Labs always tend to report higher measurements? Do the labs give different estimates for the same material? A simple analysis and plot can give some hints.

Suppose we had only one lab, which evaluated one kind of material on three days with two duplicates per day. From those 6 observations the lab can give an estimate (the mean) of the activity of this material.

With 14 labs, there are 14 such estimates for each kind of material. In the scatter plot there are then at most 14 points on each line. The density of the scattered points is much lower, and we can now see clearly how they are distributed.
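The per-lab estimates described above are simply means over Day and Weighing within each Lab and Material combination. A minimal Python sketch, using only the five rows of Table 1.2 that are reproduced in this chapter (so the mean below is based on 5 rather than all 6 observations of Lab 1 for Material 1); the function name is hypothetical:

```python
from collections import defaultdict

# Rows of Table 1.2: (Lab, Day, Material, Weighing, ln(Units/kg)).
rows = [
    (1, 1, 1, 1, 7.377758908),
    (1, 1, 1, 2, 7.404887576),
    (1, 2, 1, 1, 7.329093736),
    (1, 2, 1, 2, 7.226936018),
    (1, 3, 1, 1, 7.431299675),
]


def lab_material_means(records):
    """Mean response for every (Lab, Material) combination."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lab, _day, material, _weighing, value in records:
        sums[(lab, material)] += value
        counts[(lab, material)] += 1
    return {key: sums[key] / counts[key] for key in sums}


means = lab_material_means(rows)
for (lab, material), mean in sorted(means.items()):
    print(f"Lab {lab}, Material {material}: mean ln(Units/kg) = {mean:.4f}")
```

Applied to the full data set, the same grouping yields up to 14 estimates per material, one per lab, which is what the scatter and line plots display.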

Figure 1.10: DATA B: Scatter and Line plots of Mean Values of Activity estimated by different Labs.

The left figure above is the scatter plot of the estimates of the 8 materials from the 14 different Labs. The estimates are still rather dispersed: the estimated activity of a material varies from lab to lab. In addition, the dispersion pattern is similar to that of the original observations in Figure 1.8: Materials No. 4 and No. 7 are clearly the most dispersed in both Figure 1.8 and Figure 1.10. It seems that the block effect, Lab, is a large source of the variability of the measurements.

The line plot on the right is the same as the scatter plot on the left, except that the estimates from the same lab are connected with colored lines. Because the data structure is not complete and some labs have no estimates for some Materials, the 14 colored lines do not form a perfect cluster: some lines "jump" over certain Material positions because measurements of those materials are missing. Nevertheless, the plot still contains valuable information: some lines tend to lie consistently higher or lower than others. The clearest case is that Lab No. 11 (grey line) gives higher estimates than Lab No. 13 (green line) for all the materials.

Figure 1.11: DATA A: Scatter and Line plots of Mean Values of Activity estimated by different Labs.

Because the lines in Figure 1.10 are too close to each other, it is somewhat hard to distinguish the lines of the different labs. New plots were therefore produced: in Figure 1.12 the estimated activity of the different materials is plotted against Lab, with 8 differently colored lines indicating the 8 materials. In the ideal case the 14 labs would have the same capability and give almost the same measurements for the same material. The lines would then be almost parallel to the X-axis, with some random oscillation.

Obviously this is not the case here. In Figure 1.12 the lines show clear patterns across Labs. For instance, Lab 8 gives smaller estimates for Materials 4, 7, 6 and 8. From the Material point of view, the curves of Materials 3, 5 and 6 are more stable, which means that the estimates of those materials from different labs are closer to each other. This is also supported by the scatter plot in Figure 1.8, in which the dots of Materials No. 3, 5 and 6 are clearly more compact.

In Figure 2.10 the curve of Material 1 oscillates more wildly than the rest, which agrees with the information in Figures 2.6 and 2.8, where the dots of Material 1 are the most dispersed.

Similar plots for DATA A are presented below. They also show some patterns across Labs. For instance, Lab No. 8 gives lower estimates than the other labs, and Material No. 1 oscillates most wildly, in accordance with Figure 1.9, where Material No. 1 has the largest variance.

Figure 1.12: DATA B: Scatter and Line plots of Mean Values of Activity estimated by different Labs.

These conjectures will be examined further after modeling and discussed in the later chapters.


Figure 1.13: DATA A: Scatter and Line plots of Mean Values of Activity estimated by different Labs.


Chapter 2

Theoretical Methods and Modeling

2.1 Linear Mixed Model

2.1.1 Introduction

Classical analysis of variance and regression analysis are based on rather strict assumptions about the data: the structure must be described by a linear model; the observations, or rather the residual or error terms, must follow a normal distribution; they must be independent; and the variability must be homogeneous.

The linear mixed model extends the general linear model by allowing a more flexible specification of the covariance matrix of the errors. In other words, it allows for both correlation and heterogeneous variances. SAS PROC MIXED exploits the great versatility of linear mixed models, whereas even today many statistical packages offer only a limited version of the possibilities of these models. In this thesis I use SAS PROC MIXED as the primary analysis tool. The definitions, notation and assumptions adopted in this theory chapter are based on the PROC MIXED help documentation, so that the theoretical description stays consistent with the implementation and calculation.

The primary assumptions underlying the analyses performed by PROC MIXED are as follows:

The data are normally distributed.

The means (expected values) of the data are linear in terms of a certain set of parameters.

The variances and covariances of the data are in terms of a different set of parameters, and they exhibit a structure matching one of those available in PROC MIXED.


2.1.2 Notation, Assumptions and Algorithm of Linear Mixed Model

Formulation of Linear Mixed Model

The formulation of Mixed Model could be written as:

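\[ y = X\beta + Z\gamma + \varepsilon \tag{2.1} \]

Formula 2.1 is the general notation of the linear mixed model, which contains both fixed and random parameters.

Here \(y\) is an \((n \times 1)\) vector of observed data, \(X\) is an \((n \times p)\) fixed-effects design or regressor matrix of rank \(k\), \(Z\) is an \((n \times g)\) random-effects design or regressor matrix, \(\beta\) is a \((p \times 1)\) vector of fixed effects, \(\gamma\) is a \((g \times 1)\) vector of random effects, and \(\varepsilon\) is an \((n \times 1)\) vector of model errors (also random effects). The distributional assumptions made by the MIXED procedure are as follows:

\[ E\begin{pmatrix} \gamma \\ \varepsilon \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \tag{2.2} \]

\[ \mathrm{Var}\begin{pmatrix} \gamma \\ \varepsilon \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix} \tag{2.3} \]

1. \(\gamma\) is normal with mean 0 and variance \(G\);

2. \(\varepsilon\) is normal with mean 0 and variance \(R\);

3. the random components \(\gamma\) and \(\varepsilon\) are independent.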

Estimating Variance Components (G and R) in the Mixed Model

The covariance structure of the LMM is much more complicated than that of the general linear model (GLM), and so is the estimation of its parameters. The least squares method does not work because the assumptions of the GLM are violated. SAS PROC MIXED implements likelihood-based estimation, which accommodates randomly missing data and complicated variance structures; the method used here is restricted maximum likelihood (REML). REML estimates of the covariance components are based on residuals computed after estimating the fixed effects by WLS or GLS, and are obtained by maximizing a marginal likelihood. REML takes into account the degrees of freedom used in estimating the fixed effects when estimating the covariance components. In addition, REML can constrain the variance component estimates to be non-negative. The REML estimates of G and R are found by maximizing the following log-likelihood function:

$$\text{REML:}\quad -\frac{1}{2}\log|V| - \frac{1}{2}\log|X'V^{-1}X| - \frac{1}{2}r'V^{-1}r - \frac{n-p}{2}\log(2\pi) \tag{2.4}$$

where $V = ZGZ' + R$, the residual vector is $r = y - X(X'V^{-1}X)^{-}X'V^{-1}y$, and p is the rank of X.

PROC MIXED actually minimizes -2 times this function using a ridge-stabilized Newton-Raphson algorithm. The goodness-of-fit value it reports is therefore the -2 Res Log Likelihood rather than the likelihood value itself. Besides REML, other estimation methods exist, such as ML, MIVQUE0, and Type1 to Type3 (moment estimates). REML and ML are the preferred methods, a preference supported by the simulation evidence presented by Swallow and Monahan (1984). In all programs of this thesis the estimation option is set to REML.
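To make formula 2.4 concrete, the following is a minimal sketch that evaluates the REML log-likelihood for a given covariance matrix V. It is not the ridge-stabilized Newton-Raphson algorithm used by PROC MIXED; it only evaluates the function. The helper names (`cholesky`, `solve_spd`, `reml_loglik`, `identity_cov`) and the five data values are hypothetical, and X is assumed to have full column rank so that an ordinary inverse can replace the generalized inverse.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def solve_spd(low, b):
    """Solve (L L') x = b by forward and then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def logdet(low):
    """log-determinant of L L' from the Cholesky factor L."""
    return 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))

def reml_loglik(y, x_cols, v):
    """REML log-likelihood of formula 2.4.

    y: list of n observations; x_cols: list of the p columns of X
    (assumed full column rank); v: n-by-n covariance matrix V.
    """
    n, p = len(y), len(x_cols)
    lv = cholesky(v)
    vinv_y = solve_spd(lv, y)
    vinv_x = [solve_spd(lv, col) for col in x_cols]  # columns of V^{-1} X
    xtvx = [[sum(a * b for a, b in zip(x_cols[i], vinv_x[j])) for j in range(p)]
            for i in range(p)]
    xtvy = [sum(a * b for a, b in zip(col, vinv_y)) for col in x_cols]
    lx = cholesky(xtvx)
    beta = solve_spd(lx, xtvy)  # GLS estimate of the fixed effects
    r = [y[i] - sum(beta[j] * x_cols[j][i] for j in range(p)) for i in range(n)]
    quad = sum(a * b for a, b in zip(r, solve_spd(lv, r)))
    return (-0.5 * logdet(lv) - 0.5 * logdet(lx) - 0.5 * quad
            - 0.5 * (n - p) * math.log(2.0 * math.pi))

def identity_cov(s2, n):
    """V = s2 * I, the one-layer special case."""
    return [[s2 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical data: intercept-only model, residual sum of squares 14.8.
y = [1.0, 2.0, 3.0, 4.0, 6.0]
ones = [1.0] * len(y)
for s2 in (3.0, 3.7, 4.5):
    print(s2, -2.0 * reml_loglik(y, [ones], identity_cov(s2, len(y))))
```

In this toy case the printed -2 Res Log Likelihood is smallest at 3.7, which equals the residual sum of squares divided by n - 1, so the degrees-of-freedom correction of REML is visible directly.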

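Estimating β and γ in the Mixed Model

REML provides the estimates of G and R, denoted $\hat{G}$ and $\hat{R}$. From the solution of the mixed model equations, the estimates of β and γ can be written as:

$$\hat{\beta} = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}y \tag{2.5}$$

$$\hat{\gamma} = \hat{G}Z'\hat{V}^{-1}(y - X\hat{\beta}) \tag{2.6}$$

where $\hat{V} = Z\hat{G}Z' + \hat{R}$.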
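As a hedged illustration of formulas 2.5 and 2.6, consider the special case of a random-intercept model with a single fixed intercept and known variance components. In that case the GLS estimate reduces to a precision-weighted mean of the subject means, and each predicted random effect is a shrunken deviation from that mean. The function name `blup_random_intercept`, the data and the variance values below are hypothetical; this is a sketch of the special case, not the general algorithm of PROC MIXED.

```python
def blup_random_intercept(groups, var_a, var_e):
    """Closed-form versions of formulas 2.5 and 2.6 for y_ij = mu + a_i + e_ij.

    groups: dict mapping a subject label to its list of observations.
    var_a:  variance of the random intercepts (assumed known here).
    var_e:  residual variance (assumed known here).
    """
    means = {s: sum(v) / len(v) for s, v in groups.items()}
    # A subject mean has variance var_a + var_e / n_i; its inverse is the GLS weight.
    weights = {s: 1.0 / (var_a + var_e / len(v)) for s, v in groups.items()}
    mu_hat = sum(weights[s] * means[s] for s in groups) / sum(weights.values())
    # Predicted random effect: shrinkage factor times the deviation of the subject mean.
    a_hat = {s: var_a * weights[s] * (means[s] - mu_hat) for s in groups}
    return mu_hat, a_hat

# Hypothetical data: two subjects with three observations each.
mu_hat, a_hat = blup_random_intercept(
    {"A": [1.0, 2.0, 3.0], "B": [5.0, 6.0, 7.0]}, var_a=1.0, var_e=3.0)
print(mu_hat, a_hat)
```

With these values the shrinkage factor is 0.5, so the subject means 2 and 6 are pulled halfway towards the overall estimate 4.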

2.1.3 Homogeneous and Heterogeneous Variance Model

The observation vector y has a multivariate normal distribution with expected value

$$E(y) = X\beta$$

and variance

$$V(y) = ZGZ' + R$$

The parameters of this model are the fixed effects β and all unknown variance components in the variance matrices G and R. These unknown variance components determine the elements of the covariance matrix V. R is the within-subjects variance component.

ZGZ' is the between-subjects variance component.

PROC MIXED also allows the user to specify, separately and jointly, covariance structures that assume within-subjects and/or between-subjects heteroscedasticity.[?] Within-subjects heteroscedasticity occurs when the variances across repeated measures are unequal. Between-subjects heteroscedasticity occurs when covariance matrices differ across groups. This leads naturally to a distinction between two kinds of linear mixed model:

Homogeneous variance model: All the observations have the same variance components.

Heterogeneous variance model: Within-subjects or between-subjects heteroscedasticity occurs. In this thesis the heterogeneous variance structures for Type, Lab and Material are a crucial topic, discussed later.

Covariance Structure (R and G) and Related Practical SAS Issues

The simple random model contains only random effects, with X = 0 and $R = \sigma^2 I_n$, where $I_n$ is the n×n identity matrix. In contrast, the general linear model contains only fixed effects, so Z = 0, and $R = \sigma^2 I_n$ still holds.

A real LMM (linear mixed model) has a more complicated R matrix. The REPEATED statement of PROC MIXED models the intra-individual variation and specifies the structure of $R_i = V(\varepsilon_i)$, the block of the block-diagonal matrix R that belongs to subject i. The GROUP option of the REPEATED statement defines an effect specifying heterogeneity in the covariance structure of R: all observations having the same level of the group effect have the same covariance parameters. Without a REPEATED statement, R is just $\sigma^2 I_n$ and the $R_i$ are the same for all subjects.[7]

ZGZ' is the between-subjects variance component. The RANDOM statement defines the random effects, Zγ, and the structure of G. Its GROUP option defines an effect specifying heterogeneity in the covariance structure of G, that is, between-subjects heteroscedasticity. All observations having the same level of the group effect have the same covariance parameters. Each new level of the group effect produces a new set of covariance parameters with the same structure as the original group.

Rather than denoting G and R individually, their combination, the covariance matrix V, will be the primary tool in the following Modeling Analysis section for showing the covariance between two observations.

2.2 Hierarchical Linear Model

A hierarchical linear model (HLM) is a type of mixed model for hierarchical data, that is, data that exist at more than one layer. HLMs focus on differences between groups in explaining a dependent variable, which is exactly the situation studied in this thesis. Introducing HLM makes it possible to illustrate the variance structure of the multi-layer models used in this thesis.

The original definition of HLM was given by Raudenbush and Bryk (1986): a hierarchical linear model is a particular regression technique that is designed to take the hierarchical structure of the data into account. Historically, HLM has been used in educational research, for example with students nested within classes and classes nested within schools. More recently, advances in statistical computing have made HLM widely used in many disciplines, including biostatistics.

With traditional regression approaches, such as multiple regression and logistic regression, an underlying assumption is that the observations are independent. This means that the observations of any one individual are not in any way systematically related to the observations of any other individual. The assumption is violated, however, when some of the sampled observations come from the same laboratory or the same equipment. When the assumption of independence is violated, the regression coefficients can be biased, and the estimates of standard errors are smaller than they should be.

Multi-level variance decomposition techniques such as HLM offer a number of advantages over traditional analysis techniques such as ANOVA and regression. First, because HLM separates the criterion variance into within-group and between-group components, error terms are not systematically biased. This leads to more accurate effect size estimates and standard errors. Second, because HLM uses all available information, meaningful variance is not wasted. Finally, HLM allows for testing cross-level effects.[5]


2.2.1 Notation and Covariance Structure for different layer models of Hierarchical Linear Model

As introduced above, HLM is a special case of the mixed model, so the algorithm described for the mixed model applies fully to HLM. In this section the variance structure is the primary object of study. I have adapted the notation so that it conforms to the next modeling step on our two data files.

The experimental design of DATA B can be described as follows:

1. The samples consist of 8 Materials. Each sample is either solid or liquid.

2. 14 Laboratories were chosen as participants. Each Lab received 8 samples, comprising the 8 Materials.

3. The Labs carried out the validation measurements on sub-samples on different days, with 2 duplicates each day.

4. Outliers were tested for and deleted from the validation data. As a result, the data file on which the modeling is carried out is unbalanced and incomplete.

One-Layer Model

The simplest hierarchical model is the one-layer model. The common notation is:

$$y_i = (X\beta)_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \tag{2.7}$$

Here the observations are independent and follow a normal distribution. Xβ denotes the fixed-effects part of the model, because the fixed part predicts the mean value, and $(X\beta)_i$ is its i-th element. This simple model is stated here to clarify, step by step, the structure of the more complicated hierarchical linear models.

In this simple model the mean value can be estimated by maximum likelihood:

$$\hat{\beta} = (X'X)^{-1}X'y \tag{2.8}$$

where β is the vector of fixed-effects coefficients, X is the design matrix of the fixed effects and y is the vector of observed values.

The unbiased estimate of the single variance in the model is:

$$\hat{\sigma}^2 = \frac{1}{N-p}\sum_{i=1}^{N}(y_i - \hat{\mu}_i)^2 \tag{2.9}$$

N is the number of observations and p is the number of means. $\hat{\mu}_i$ is the unbiased estimate of the mean value, calculated as $\hat{\mu}_i = (X\hat{\beta})_i$.

We start the analysis of the phytase validation DATA B with the one-layer model, in which the random effects, Lab and Day, are ignored. The remaining fixed effect is Material, which has 8 levels. The one-layer model for Study B becomes:

$$y_i = \text{intercept} + \text{Material}_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \tag{2.10}$$
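For a model with a single categorical fixed effect such as Material, the least squares fit of formula 2.8 reduces to the group means, and formula 2.9 becomes the pooled within-group variance. The sketch below uses the cell-means parametrization, which gives the same fitted values as the intercept-plus-Material form of 2.10. The function name `one_layer_fit` and the six data values are hypothetical and are not taken from DATA B.

```python
def one_layer_fit(values, groups):
    """Group means and the unbiased residual variance of formula 2.9.

    values: list of observations; groups: list of group labels of equal length.
    """
    by_group = {}
    for y, g in zip(values, groups):
        by_group.setdefault(g, []).append(y)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    sse = sum((y - means[g]) ** 2 for y, g in zip(values, groups))
    n, p = len(values), len(means)  # N observations, p estimated means
    return means, sse / (n - p)

# Hypothetical data: two materials with three measurements each.
means, sigma2_hat = one_layer_fit(
    [1.0, 3.0, 2.0, 6.0, 8.0, 7.0],
    ["M1", "M1", "M1", "M2", "M2", "M2"])
print(means, sigma2_hat)
```

Here the residual sum of squares is 4 on N - p = 6 - 2 = 4 degrees of freedom, so the variance estimate is 1.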

Two-Layer Model

A two-layer hierarchical model can be denoted as:

$$y_i = \mu + a(\text{Subject}_i) + \varepsilon_i, \quad a(\text{Subject}_i) \sim N(0, \sigma_a^2), \quad \varepsilon_i \sim N(0, \sigma^2) \tag{2.11}$$

where $a(\text{Subject}_i)$ and $\varepsilon_i$ are independent. The notation $a(\text{Subject}_i)$, with the effect in brackets, means that this effect is treated as random. In this model the observations from the same subject are positively correlated with each other. For convenience and simplicity, V is used here to denote the covariance matrix instead of G and R.

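The covariance between two observations is:

$$\mathrm{cov}(y_i, y_j) = \begin{cases} 0 & \text{Subject}_i \neq \text{Subject}_j \text{ and } i \neq j; \\ \sigma_a^2 & \text{Subject}_i = \text{Subject}_j \text{ and } i \neq j; \\ \sigma_a^2 + \sigma^2 & i = j. \end{cases} \tag{2.12}$$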

$\sigma_a^2$ is the variance between subjects, and $\sigma^2$ is the variance between observations within the same subject. The variance components and fixed parameters are estimated by restricted maximum likelihood (REML), which results in unbiased estimates. The estimate of the fixed effects $\hat{\beta}$ is:

$$\hat{\beta} = (X'V^{-1}X)^{-1}X'V^{-1}y \tag{2.13}$$

where X is the design matrix of the fixed effects, y is the vector of observations and V is the covariance matrix of y, whose structure is shown below:

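$$V = \begin{pmatrix}
\sigma_a^2+\sigma^2 & \sigma_a^2 & \sigma_a^2 & 0 & 0 & 0 \\
\sigma_a^2 & \sigma_a^2+\sigma^2 & \sigma_a^2 & 0 & 0 & 0 \\
\sigma_a^2 & \sigma_a^2 & \sigma_a^2+\sigma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_a^2+\sigma^2 & \sigma_a^2 & \sigma_a^2 \\
0 & 0 & 0 & \sigma_a^2 & \sigma_a^2+\sigma^2 & \sigma_a^2 \\
0 & 0 & 0 & \sigma_a^2 & \sigma_a^2 & \sigma_a^2+\sigma^2
\end{pmatrix} \tag{2.14}$$

The variance matrix above has a block-diagonal pattern, corresponding to the case of 6 observations within two subjects. Following the covariance rule 2.12, the off-diagonal elements within a subject are $\sigma_a^2$.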
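The block-diagonal pattern can be generated directly from the covariance rule 2.12. The following is a small sketch; the function name `two_layer_cov` and the numeric variance values are hypothetical.

```python
def two_layer_cov(subjects, var_a, var_e):
    """Covariance matrix V of the two-layer model, built from rule 2.12.

    subjects: list giving the subject label of each observation.
    var_a:    between-subject variance (sigma_a^2).
    var_e:    within-subject variance (sigma^2).
    """
    n = len(subjects)
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                v[i][j] = var_a + var_e   # variance of a single observation
            elif subjects[i] == subjects[j]:
                v[i][j] = var_a           # same subject, different observation
    return v

# Six observations within two subjects, as in the matrix above.
V2 = two_layer_cov([1, 1, 1, 2, 2, 2], var_a=2.0, var_e=0.5)
for row in V2:
    print(row)
```

Because the rule only compares subject labels, the same function also handles unbalanced data, where subjects have different numbers of observations.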

Building on the two-layer model structure illustrated above, we continue the modeling with DATA B. In this step the random effect DAY is set as the subject.

$$y_i = \text{intercept} + \text{Material}_i + a(\text{DAY}_i) + \varepsilon_i \tag{2.15}$$

where $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$. Here i = 1, 2, 3, ..., 618, where 618 is the number of observations.
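For intuition about what REML returns in a two-layer model, the balanced case without further fixed effects has a closed form: when the between-subject estimate is positive, the REML estimates coincide with the classical ANOVA estimates based on the between-subject and within-subject mean squares. The sketch below assumes balanced data, which DATA B is not, so it illustrates the principle only. The function name `balanced_variance_components` and the data are hypothetical.

```python
def balanced_variance_components(groups):
    """ANOVA estimates of (sigma_a^2, sigma^2) for y_ij = mu + a_i + e_ij.

    groups: list of equally sized lists, one list of observations per subject.
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    # E[MS_between] = sigma^2 + n * sigma_a^2 and E[MS_within] = sigma^2.
    # Truncating at zero is a simplification: in that case the REML estimate
    # of sigma^2 would no longer equal MS_within.
    var_a = max(0.0, (ms_between - ms_within) / n)
    return var_a, ms_within

# Hypothetical data: three subjects with two observations each.
var_a_hat, var_e_hat = balanced_variance_components(
    [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]])
print(var_a_hat, var_e_hat)
```

In this example the between-subject mean square is 32 and the within-subject mean square is 2, which gives estimates of 15 and 2.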

Three-Layer Model

A three-layer hierarchical model can be denoted as:

$$y_i = \mu + a(\text{Subject}_i)(\text{block}_i) + b(\text{block}_i) + \varepsilon_i \tag{2.16}$$

where $b(\text{block}_i) \sim N(0, \sigma_b^2)$, $a(\text{Subject}_i) \sim N(0, \sigma_a^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$ are independent random variables. Besides the correlation between observations within the same subject, the subjects within the same block are also correlated. The notation $a(\text{Subject}_i)(\text{block}_i)$ means that subject is nested within block. To illustrate the covariance structure of the three-layer model, the matrix is shown below:

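$$V = \begin{pmatrix} A & 0 \\ 0 & A \end{pmatrix}, \quad
A = \begin{pmatrix}
\sigma_b^2+\sigma_a^2+\sigma^2 & \sigma_b^2+\sigma_a^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_b^2+\sigma_a^2 & \sigma_b^2+\sigma_a^2+\sigma^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_b^2+\sigma_a^2+\sigma^2 & \sigma_b^2+\sigma_a^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_b^2+\sigma_a^2 & \sigma_b^2+\sigma_a^2+\sigma^2
\end{pmatrix} \tag{2.17}$$

$\sigma_b^2$ is the variance between blocks, $\sigma_a^2$ is the variance between subjects, and $\sigma^2$ is the variance between observations within the same subject. To simplify the structure we use 8 observations (two blocks, each with two subjects of two observations) and two random effects, which again gives a block-diagonal variance matrix. Each 0 is a 4×4 zero block, and the bottom-right block repeats the top-left 4×4 sub-matrix A.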

On the basis of the three-layer model structure above, I fitted the model to DATA B. The random effect Lab is set as the third layer.

$$y_i = \text{intercept} + \text{Material}_i + a(\text{DAY}_i)(\text{Lab}_i) + b(\text{Lab}_i) + \varepsilon_i \tag{2.18}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$. Here i = 1, 2, 3, ..., 618, where 618 is the number of observations in DATA B.

Three-Layer Model with Interaction Term

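If there is an interaction term between Lab and Material, the model is:

$$y_i = \text{intercept} + \text{Material}_i + a(\text{DAY}_i)(\text{Lab}_i) + b(\text{Lab}_i) + c(\text{Material}_i{:}\text{Lab}_i) + \varepsilon_i \tag{2.19}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$, $c(\text{Material}_i{:}\text{Lab}_i) \sim N(0, \sigma_c^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$.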

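The variance of any observation is

$$\mathrm{Var}(y_i) = \sigma_y^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2 + \sigma^2$$

and the covariance between two observations $i \neq j$ is:

$$\mathrm{cov}(y_i, y_j) = \begin{cases}
0 & \text{Lab}_i \neq \text{Lab}_j; \\
\sigma_b^2 & \text{Material}_i \neq \text{Material}_j \text{ and } \text{Lab}_i = \text{Lab}_j \text{ and } \text{Day}_i \neq \text{Day}_j; \\
\sigma_a^2 + \sigma_b^2 & \text{Material}_i \neq \text{Material}_j \text{ and } \text{Lab}_i = \text{Lab}_j \text{ and } \text{Day}_i = \text{Day}_j; \\
\sigma_b^2 + \sigma_c^2 & \text{Material}_i = \text{Material}_j \text{ and } \text{Lab}_i = \text{Lab}_j \text{ and } \text{Day}_i \neq \text{Day}_j; \\
\sigma_a^2 + \sigma_b^2 + \sigma_c^2 & \text{Material}_i = \text{Material}_j \text{ and } \text{Lab}_i = \text{Lab}_j \text{ and } \text{Day}_i = \text{Day}_j.
\end{cases} \tag{2.20}$$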

It is convenient to think of the observations as a 16×1 vector. There are 2 labs that take measurements on 2 kinds of material. Each lab takes measurements on 2 days, with 2 duplicates on each day (2×2×2×2 = 16). The first 8 observations belong to the first material. Within each material, the first four observations belong to the first lab. Within each lab, the first 2 observations belong to the first day. The structure of the 16×16 covariance matrix is shown below:

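$$V = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \tag{2.21}$$

where $\Sigma_{11}$, $\Sigma_{12}$, $\Sigma_{21}$ and $\Sigma_{22}$ are 8×8 matrices as follows:

$$\Sigma_{11} = \Sigma_{22} = \begin{pmatrix} S & 0 \\ 0 & S \end{pmatrix}, \quad
S = \begin{pmatrix}
\sigma_y^2 & \sigma_a^2+\sigma_b^2+\sigma_c^2 & \sigma_b^2+\sigma_c^2 & \sigma_b^2+\sigma_c^2 \\
\sigma_a^2+\sigma_b^2+\sigma_c^2 & \sigma_y^2 & \sigma_b^2+\sigma_c^2 & \sigma_b^2+\sigma_c^2 \\
\sigma_b^2+\sigma_c^2 & \sigma_b^2+\sigma_c^2 & \sigma_y^2 & \sigma_a^2+\sigma_b^2+\sigma_c^2 \\
\sigma_b^2+\sigma_c^2 & \sigma_b^2+\sigma_c^2 & \sigma_a^2+\sigma_b^2+\sigma_c^2 & \sigma_y^2
\end{pmatrix} \tag{2.22}$$

$\Sigma_{11}$ and $\Sigma_{22}$ are the covariance matrices of the two kinds of material. Each 0 denotes a 4×4 zero block, because observations from different labs are uncorrelated, and the bottom-right block repeats the top-left 4×4 block.

$$\Sigma_{12} = \Sigma_{21}' = \begin{pmatrix} T & 0 \\ 0 & T \end{pmatrix}, \quad
T = \begin{pmatrix}
\sigma_a^2+\sigma_b^2 & \sigma_a^2+\sigma_b^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_a^2+\sigma_b^2 & \sigma_a^2+\sigma_b^2 & \sigma_b^2 & \sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_a^2+\sigma_b^2 & \sigma_a^2+\sigma_b^2 \\
\sigma_b^2 & \sigma_b^2 & \sigma_a^2+\sigma_b^2 & \sigma_a^2+\sigma_b^2
\end{pmatrix} \tag{2.23}$$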

The matrix notation is inspired by Douglas C. Montgomery 2001.[6]
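The covariance rule 2.20 and the ordering of the 16 observations can be checked with a short sketch. The function name `cov_pair`, the tuple layout (material, lab, day, duplicate) and the numeric variance values are hypothetical choices made for illustration. The values 1, 2, 4 and 8 are picked so that every sum of components is unique and easy to recognize.

```python
from itertools import product

def cov_pair(obs_i, obs_j, var_a, var_b, var_c, var_e):
    """Covariance of two observations under model 2.19, following rule 2.20.

    Each observation is a tuple (material, lab, day, duplicate);
    day labels are understood as nested within labs.
    """
    if obs_i == obs_j:
        return var_a + var_b + var_c + var_e  # Var(y_i)
    mat_i, lab_i, day_i, _ = obs_i
    mat_j, lab_j, day_j, _ = obs_j
    if lab_i != lab_j:
        return 0.0                            # different labs are independent
    cov = var_b                               # shared Lab effect
    if day_i == day_j:
        cov += var_a                          # shared Day-within-Lab effect
    if mat_i == mat_j:
        cov += var_c                          # shared Material:Lab effect
    return cov

# Material varies slowest, then lab, then day, then duplicate,
# which matches the ordering described in the text.
design = list(product([1, 2], [1, 2], [1, 2], [1, 2]))
V16 = [[cov_pair(a, b, var_a=1.0, var_b=2.0, var_c=4.0, var_e=8.0)
        for b in design] for a in design]
print(V16[0][:8])   # first row of Sigma_11
print(V16[0][8:])   # first row of Sigma_12
```

The first printed row reproduces the pattern of the first row of 2.22, and the second reproduces the first row of 2.23.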


Chapter 3

Statistical Implementation of Theoretic Methods

3.1 Homogeneous Variance Model of Study B

DATA B has a hierarchical structure, which is shown below:

Figure 3.1: Data B hierarchical structure: 8 Materials are nested within Type (solid/liquid). The 8 Materials are crossed with 14 Labs, and the crossing is complete. Days are nested within Labs. Weighings are nested within Days.

In this section homogeneous variance models are implemented, which means that all the observations have the same variance components.

3.1.1 Test of overall model reduction

I start with an overall model in which all possible effects are set as fixed effects. The p-values of the terms in the ANOVA table can then guide the subsequent model selection and reduction.

The model can be written as:

$$y_i = \mu + \text{Type}_i + \text{Material}_i(\text{Type}_i) + \text{DAY}_i(\text{Lab}_i) + \text{Lab}_i + \text{Material}_i{:}\text{Lab}_i + \varepsilon_i \tag{3.1}$$

where $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

All the effects concerned are described in Chapter One: Data description. Type is the state of the material (solid/liquid). There are 8 kinds of material in total. 14 Labs participated in the inter-laboratory study. Each Lab took sub-samples and evaluated them on three Days, with two duplicates on each Day. After the deletion of outliers, 618 observations are left for modeling.

Part of the SAS analysis output is:

Figure 3.2: SAS output of model 3.1

The ANOVA table shows that SAS could not return the F-values and p-values because of a lack of degrees of freedom. Since the model cannot accommodate so many terms as fixed effects, I remove Type from the fixed model, which gives the following sub-model of 3.1:

$$y_i = \mu + \text{Material}_i + \text{DAY}_i(\text{Lab}_i) + \text{Lab}_i + \text{Material}_i{:}\text{Lab}_i + \varepsilon_i \tag{3.2}$$

where $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

Part of the result returned by SAS:

Figure 3.3: SAS output of model 3.2

The ANOVA table above shows that all the terms are significant at the 0.01% level. Therefore there is no evidence for deleting any term from the model.

From Model 3.1 to Model 3.2, Type was deleted to simplify the model so that the F-values and p-values could be calculated. I am still interested, however, in how Type affects the modeling in a real mixed model, which contains both random and fixed effects. Model 3.3 includes all the terms of 3.1, but Day, Lab and Material:Lab are set as random effects:

$$y_i = \mu + \text{Type}_i + \text{Material}_i(\text{Type}_i) + a(\text{DAY}_i)(b(\text{Lab}_i)) + b(\text{Lab}_i) + c(\text{Material}_i{:}\text{Lab}_i) + \varepsilon_i \tag{3.3}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$, $c(\text{Material}_i{:}\text{Lab}_i) \sim N(0, \sigma_c^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$.

Part of the SAS analysis output is:

Figure 3.4: SAS output of model 3.3

Figure 3.4 shows that in model 3.3 SAS can estimate all the fixed effects and p-values. This model could be a candidate for the final model. The Fit Statistics table supplies four kinds of goodness-of-fit values. In this thesis the -2 Res Log Likelihood value is chosen as the main criterion for selecting the optimum model.

Model 3.3's -2 Res Log Likelihood is -1011.4.

Starting from this full model, several sub-models were constructed by deleting, one by one, the fixed effect Type, the random effects Day and Lab, and the random interaction term Material:Lab. The goodness-of-fit values of all these models are compared at the end.

In model 3.3 all the levels of Material and Type are significant at the 0.01% level. In practice the Type and Material effects are confounded, because each kind of material is either solid or liquid.

For this reason a sub-model 3.4 of 3.3 is implemented, which drops the Type effect:

$$y_i = \mu + \text{Material}_i + a(\text{DAY}_i)(b(\text{Lab}_i)) + b(\text{Lab}_i) + c(\text{Material}_i{:}\text{Lab}_i) + \varepsilon_i \tag{3.4}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$, $c(\text{Material}_i{:}\text{Lab}_i) \sim N(0, \sigma_c^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

Part of the SAS output:

Figure 3.5: SAS output of model 3.4

Figure 3.5 shows that the -2 Res Log Likelihood value of model 3.4 is -1011.4, the same as the value of model 3.3, which means that the fixed effect Type in model 3.3 does not improve the model fit.

Model 3.5 is a sub-model of model 3.4 that drops the random interaction effect Material:Lab.

$$y_i = \mu + \text{Material}_i + a(\text{DAY}_i)(b(\text{Lab}_i)) + b(\text{Lab}_i) + \varepsilon_i \tag{3.5}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

SAS returned the -2 Res Log Likelihood value of model 3.5: -968.4

Model 3.6 is a sub-model of model 3.5 that drops the random block effect Lab.

$$y_i = \mu + \text{Material}_i + a(\text{DAY}_i) + \varepsilon_i \tag{3.6}$$

where $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

SAS returned the -2 Res Log Likelihood value of model 3.6: -855.2

Model 3.7 is a sub-model of model 3.5 that drops the random effect Day.

$$y_i = \mu + \text{Material}_i + b(\text{Lab}_i) + \varepsilon_i \tag{3.7}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$, i = 1, ..., 618.

SAS returned the -2 Res Log Likelihood value of model 3.7: -947.9

Model 3.8 is a sub-model of model 3.7 and contains only the fixed effect Material and the intercept μ.

$$y_i = \mu + \text{Material}_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2) \tag{3.8}$$

SAS returned the -2 Res Log Likelihood value of model 3.8: -854.6

Model 3.9 represents another possible route of reduction. If the interaction effect Material:Lab proves to be significant and cannot be left out of the model, the main effects Material and Lab cannot be dropped either. Hence the remaining question is whether Day is a necessary effect.

$$y_i = \mu + \text{Material}_i + b(\text{Lab}_i) + c(\text{Material}_i{:}\text{Lab}_i) + \varepsilon_i \tag{3.9}$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $c(\text{Material}_i{:}\text{Lab}_i) \sim N(0, \sigma_c^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$.

SAS returned the -2 Res Log Likelihood value of model 3.9: -978.9

Model Comparison

The restricted likelihood ratio test is used here to compare different models. Suppose Model B is a sub-model of Model A, and Model A has r more parameters than Model B. Twice the change in log-likelihood is then referred to a $\chi^2_r$ distribution. The additional parameters could be either fixed parameters or random (variance components).[1] The test statistic can be written as:

$$G_{A \to B} = 2\ell_A - 2\ell_B = -[(-2\ell_A) - (-2\ell_B)] \tag{3.10}$$

where $2\ell$ denotes twice the log-likelihood value of a model. Under the null hypothesis the restricted likelihood ratio statistic G follows a $\chi^2_{df}$ distribution, where df denotes the difference in the number of variance components between Models A and B.
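As a worked example of formula 3.10, the sketch below computes G and its p-value for the case df = 1, using the -2 Res Log Likelihood values reported above for models 3.5, 3.6, 3.7 and 3.8. The function name `restricted_lrt` is hypothetical. For one degree of freedom the chi-square upper tail probability equals erfc(sqrt(G/2)), so only the Python standard library is needed.

```python
import math

def restricted_lrt(neg2ll_full, neg2ll_reduced):
    """G statistic of formula 3.10 and its chi-square p-value for df = 1.

    neg2ll_full:    -2 Res Log Likelihood of the larger model A.
    neg2ll_reduced: -2 Res Log Likelihood of the sub-model B.
    """
    g = neg2ll_reduced - neg2ll_full
    # P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2)); valid for df = 1 only.
    p_value = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p_value

# Dropping Day: model 3.5 (-968.4) against model 3.7 (-947.9).
g_day, p_day = restricted_lrt(-968.4, -947.9)
# Dropping Day when Lab is absent: model 3.6 (-855.2) against model 3.8 (-854.6).
g_small, p_small = restricted_lrt(-855.2, -854.6)
print(round(g_day, 1), p_day)
print(round(g_small, 1), p_small)
```

The first comparison gives G = 20.5 with a p-value of roughly 6E-6, and the second gives G = 0.6 with a p-value of roughly 0.44.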

For models 3.3 to 3.9, seven models with different variance structures were compared:

Table 3.1: Model Comparison Table

| Model | -2 Res Log Likelihood | G-value | df | P-value |
|---|---|---|---|---|
| 3.3: Material, Type, Day, Lab, Lab:Material included | -1011.4 | G3.3→3.4 = 0 | 1 | 1 |
| 3.4: Material, Day, Lab, Lab:Material included | -1011.4 | G3.4→3.5 = 32.5 | 1 | 1.1919E-8 |
| 3.5: Material, Day, Lab included | -968.4 | G3.5→3.6 = 113.2 | 1 | 0 |
| 3.5: Material, Day, Lab included | -968.4 | G3.5→3.7 = 20.5 | 1 | 5.963E-6 |
| 3.6: Material, Day included | -855.2 | G3.6→3.8 = 0.6 | 1 | 0.43858 |
| 3.7: Material, Lab included | -947.9 | G3.7→3.8 = 93.3 | 1 | 0 |
| 3.8: Only Material included | -854.6 | | | |

Another reduction route, starting by dropping the Day effect:

| Model | -2 Res Log Likelihood | G-value | df | P-value |
|---|---|---|---|---|
| 3.4: Material, Day, Lab, Lab:Material included | -1011.4 | G3.4→3.9 = 43 | 1 | 5.474E-11 |
| 3.9: Material, Lab, Lab:Material included | -978.9 | | | |

Table 3.1 lists all the models that may appear in the reduction process.

We start from the full model 3.3. The test statistic from 3.3 to 3.4 is 0 with p-value 1, which means that Type as a fixed effect does not improve the model fit. The test from 3.4 to 3.5 shows that the interaction effect Material:Lab is significant at the 0.0001% level, with a tiny p-value of 1.1919E-8. Therefore the reduction steps that drop the main effects Material and Lab need not be considered.

Next, Day is dropped from model 3.4. The test from 3.4 to 3.9 shows that the random effect Day is also significant at the 0.0001% level, with a small p-value of 5.474E-11.

In conclusion, all terms of Model 3.4 are shown to be necessary, and it fits better than the reduced models:

$$y_i = \mu + \text{Material}_i + a(\text{DAY}_i)(b(\text{Lab}_i)) + b(\text{Lab}_i) + c(\text{Material}_i{:}\text{Lab}_i) + \varepsilon_i$$

where $b(\text{Lab}_i) \sim N(0, \sigma_b^2)$, $a(\text{DAY}_i) \sim N(0, \sigma_a^2)$, $c(\text{Material}_i{:}\text{Lab}_i) \sim N(0, \sigma_c^2)$ and $\varepsilon_i \sim N(0, \sigma^2)$.

The analysis of variance of model 3.4 is summarized in Figure 3.5. Material is a fixed effect. Lab, Day and Material:Lab are random.
