
3. Methodology

3.2 Research Design

Observed events can provide indications of the existence of unobservable mechanisms (Downward & Mearman, 2006; Zachariadis et al., 2013).

Retroduction is a form of abductive reasoning. In brief, abduction is a form of logical inference that moves from an observation to a hypothesis that accounts for it, ideally seeking the simplest and most likely explanation.

Abduction involves analyzing data that fall outside an initial theoretical framework or premise, while retroduction is primarily a method of conceptualizing and theorizing causal relationships. Retroduction refers to applying previously identified mechanisms, or identifying new ones, to explain the causal events that can generate an observed outcome in a specific context (Wynn & Williams, 2012).

Used in conjunction, these forms of inference can lead to the formation of a new conceptual framework or theory (Danermark et al., 2002).

It is generally argued that the retroductive approach to research embraces a wide variety of methods (Downward & Mearman, 2006; Venkatesh et al., 2013; Wynn & Williams, 2012; Zachariadis et al., 2013). This approach integrates qualitative and quantitative methods, with the aim of identifying and hypothesizing about the generative mechanisms that cause the events we experience. CR advocates a marginally different interpretation of the standard validation criteria for qualitative and quantitative data analysis (Zachariadis et al., 2013). In CR-based quantitative research, internal validity goes beyond confirming observed correlations and interprets them as manifestations of particular generative mechanisms in the context of the field. Findings from correlational statistics can therefore prove beneficial, by virtue of providing information about the relationships of events observed in the empirical domain, while causal assumptions must be hypothesized or tested through further means such as experimental design. Qualitative methods can develop this understanding further, by exploring the meaning and mechanisms of particular processes, while econometric methods can be used to explore their generality, in the sense that similar demi-regularities might be detected (Downward & Mearman, 2002).

Thus, combined use of qualitative and quantitative data may serve to generate unique insight into a complex social phenomenon in an unfolding reality (Bhattacherjee, 2012; von Krogh et al., 2012).

Mixed methods research has been termed the third methodological movement or paradigm (Venkatesh et al., 2013). Researchers who apply a mixed methods approach analyze both qualitative and quantitative data and are therefore well positioned to use their substantial body of observations to formulate a unified theory consisting of valid concepts and theoretical mechanisms, or meta-inferences (Venkatesh et al., 2013). Meta-inferences are defined as "theoretical statements, narratives, or a story inferred from an integration of findings from the quantitative and qualitative strands of mixed methods research" (Venkatesh et al., 2013, 38).

In a concurrent mixed methods design, quantitative and qualitative data are collected and analyzed in parallel, and the two strands are then used to create a more holistic view of the phenomenon (Creswell, 2003). As open data are an emerging phenomenon, I undertook my analysis within a constantly evolving reality.

Moreover, I wanted to examine the phenomenon both from up close and from afar, as is recommended for Engaged Scholarship (Van de Ven, 2007). I adopted a complementary approach to the use of mixed methods, using different methods to gain complementary views of the phenomenon (Venkatesh et al., 2013; Zachariadis et al., 2013). I favored this type of design because the overall goal of my inquiry was firstly to understand and document the phenomenon of open data, and then to explain how open data generates value. Moreover, my design evolved into a developmental approach, as I applied the findings from my qualitative studies to guide the development of constructs and the search for quantitative data, which I subsequently used to test the hypothesized relationships (Venkatesh et al., 2013).

I selected a longitudinal case study as the main qualitative research method, because this method conforms to the fundamental doctrines of the CR philosophy (Tsang, 2014). A case study is defined as "an empirical inquiry that investigates a contemporary phenomenon in depth and within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident" (Yin, 2009, 18). In general, the case study is a preferred method when: 1) the investigator asks "how" or "why" questions; 2) the investigator has limited control over events; and 3) the focus is on a contemporary phenomenon within a real-life context (Yin, 2009). The inquiry should rely on multiple sources of evidence, and the data (for example, interviews, observations, documents and archival records) should converge in a triangulating fashion. Strong triangulation of data sources is vital to establishing the necessary reliability and validity of a research study (Yin, 2009). The case study also benefits from the prior development of theoretical propositions to guide data collection and analysis (Yin, 2009). The strength of the case study research method is its ability to discover a wide variety of social, cultural, and political factors potentially related to a phenomenon of interest that may not be well known in advance (Bhattacherjee, 2012).

For this PhD study, I conducted two longitudinal case studies. The first case study pertained to the global energy tech vendor Opower, and was carried out with the goal of understanding how value is generated through data-driven innovation. This case study primarily examines the demand or user side of open data. The second case study investigates and analyzes a particular ODI, the Danish Basic Data Program (BDP), and reports on the tensions and governance challenges the program encounters. This inquiry focused on the supply side of open data. I thus used the qualitative strand to create a detailed insight into the open data phenomenon based on two real-life cases. I explored the phenomenon through observation and interviews, and made use of these data to select and conceptualize relevant constructs and to comprehensively understand the hypothesized relationships. A limitation of case study analysis, however, is that this form of inquiry tends to be heavily contextualized and nuanced (Bhattacherjee, 2012). Moreover, interpretation of findings may depend on the observational and integrative abilities of the researcher. The lack of control over events may lead to difficulty in establishing causality, and findings from a single case site may not be readily generalized to other case sites (ibid.).

In order to address these limitations, I also conducted two empirical quantitative studies (Paper III and Paper VII). I present the conceptual model used in those studies in Chapter 5. The model has foundations in state-of-the-art research, but is extended through triangulation of results from qualitative and quantitative data analysis. I encountered numerous barriers in designing a study that would statistically estimate whether or not openness of data is a relevant enabling factor for societies aiming to stimulate value generation from data. The first barrier relates to the severe complications that arise if one wants to trace the value that governments, companies and individuals gain from using these data. It is understandably difficult to comprehend how societies would have fared without access to these data, and experimentation was not an option in this context. Moreover, due to the very nature of open access, it is a demanding task to identify all the users and usage of multiple sources of data. Consequently, I utilized a correlational approach as my remaining option. I conducted a macro-level study comparing countries that are at distinct stages in the process of opening data. The primary advantage of cross-sectional analysis is that it does not rely on the researcher being able to identify the advent of open data in all countries; in many countries, data have been open for years, long before the mainstream open data initiatives commenced. Instead, cross-sectional analysis allows the researcher to determine the status of selected variables in one particular year, with the aim of detecting whether countries at more advanced stages of opening data systematically demonstrate superior performance, conceptualized as sustainable value.
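To illustrate the logic of this cross-sectional, correlational approach, the following is a minimal sketch with placeholder numbers; the variable names (odi_stage, outcome) are hypothetical and do not correspond to the study's actual data or measures:

```python
import pandas as pd
from scipy import stats

# Hypothetical cross-country data: "odi_stage" is a placeholder index of
# how far a country has progressed in opening data; "outcome" is a
# performance indicator measured in the same year.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "odi_stage": [1, 2, 2, 3, 4],
    "outcome": [52.1, 58.4, 57.9, 63.0, 70.2],
})

# A correlation only describes an association among observed events; under
# CR, any generative mechanism behind it must be hypothesized separately.
r, p = stats.pearsonr(df["odi_stage"], df["outcome"])
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```

In practice, such a bivariate correlation can only motivate further analysis; the structural model discussed below embeds the same logic in a network of hypothesized paths.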

The next barrier was to devise a method of reflecting the level of sustainable value present in different countries. Numerous macro-level analyses have previously attempted to estimate the overall economic value of open data (see for instance de Vries et al., 2011; Houghton, 2011; McKinsey, 2013a; Vickery, 2011).

However, none of these studies has attempted to document the intangible or social dimensions of value, which are recognized to offer even greater benefits than the economic value (McKinsey, 2013a). While ODIs generally highlight the economic potential of increased use of data, even more attention has in fact been paid to concepts such as transparency and collaboration, with the underlying motive of increasing social value.

McKinsey, a premier multinational consulting firm, has highlighted the implications of various informational benefits from open data to consumers, for example the ability to decide which school best satisfies our educational requirements, or which mode of transport is most convenient for our needs (McKinsey, 2013a). These examples illustrate the need for a model that can connect the availability of open data with the level of available information, and provide evidence of whether the existence of such information indicates higher levels of societal-level phenomena, such as the overall level of education.

By virtue of their contextual nature, the basic structure of mechanisms is frequently described as a context-mechanism-outcome pattern. Hence, only a structural model can adequately reflect this structure, which calls for the use of statistical methods that can estimate structural models. A structural model contains formulas representing the relation of every dependent variable to its independent variables, whereas the reduced form exhibits only the net or overall relation between the dependent variable and the ultimate independent variable (Henseler et al., 2009; Tsang, 2006). The ability to operationalize theoretical but unobservable phenomena has long been viewed as the core strength of the structural equation modeling method.
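To make the distinction concrete, consider a hypothetical two-equation system (an illustration only, not the model estimated in this study), in which an exogenous variable $x$ affects an outcome $y_2$ both directly and through a mediator $y_1$:

$$y_1 = \beta_1 x + \zeta_1, \qquad y_2 = \beta_2 y_1 + \beta_3 x + \zeta_2.$$

Substituting the first equation into the second yields the reduced form, which retains only the net relation between $y_2$ and $x$:

$$y_2 = (\beta_1 \beta_2 + \beta_3)\, x + (\beta_2 \zeta_1 + \zeta_2).$$

The structural form preserves the separate paths $\beta_1$, $\beta_2$ and $\beta_3$, and with them the context-mechanism-outcome pattern, whereas the reduced form collapses them into a single net coefficient.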

Thus, I elected to conceptualize sustainable value as a latent variable, applying economic, social and environmental indicators as recommended in the earlier work of Stiglitz et al. (2009). Additionally, the model needed to reflect the importance of the context in which value generation through open data takes place, including elements of the soft infrastructure (King & Uhlir, 2014).
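In a reflective measurement model of this kind, each indicator is treated as a manifestation of the underlying latent variable. A generic sketch (with $\xi$ denoting sustainable value and $x_k$ a standardized economic, social or environmental indicator; the notation is illustrative, not the study's formal specification) is:

$$x_k = \lambda_k \xi + \varepsilon_k, \qquad k = 1, \dots, K,$$

where $\lambda_k$ is the loading of indicator $k$ on the latent variable and $\varepsilon_k$ is its measurement error.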

The partial least squares (PLS) analysis method was chosen for several reasons. Firstly, PLS is a recommended structural equation modeling method for secondary data analysis (Henseler & Sarstedt, 2013). Secondly, theory development efforts were needed to clarify the most essential antecedents of the target constructs, and PLS is recommended when the goal of a study is to build rather than test theory, which was the case in this inquiry due to the emergent state of the open data phenomenon (Hair et al., 2011; Ringle et al., 2012; Sarstedt et al., 2014). Thirdly, I used a formative variable to model the robustness of the regulatory data and privacy protection framework, and PLS is recommended for models that make use of formative variables. Moreover, the model included both mediating and moderating relationships, indicating high model complexity (Sarstedt et al., 2014). Finally, the data used were a relatively small set of cross-country data, including data that are not normally distributed. Since PLS is based on a series of ordinary least squares (OLS) regressions, it makes minimal demands regarding sample size and generally achieves high levels of statistical power (Hair et al., 2011). Nevertheless, I considered the overall characteristics of the data and model in order to ensure that the sample size was sufficient to achieve adequate statistical power (Marcoulides & Saunders, 2006).
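To illustrate these mechanics, the following is a minimal sketch of PLS path modeling on simulated placeholder data (the dissertation's analyses were run in dedicated PLS software; names such as lv1 and X1 are hypothetical). The algorithm alternates two least squares steps: outer estimation forms latent variable scores as weighted composites of their indicators, and inner estimation relates the scores of connected constructs; the structural paths are then estimated by OLS regressions among the final scores:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated latent scores and reflective indicators (placeholder data):
# LV1 -> LV2, each construct measured by three indicators.
lv1 = rng.normal(size=n)
lv2 = 0.6 * lv1 + rng.normal(scale=0.8, size=n)
X1 = lv1[:, None] * np.array([0.9, 0.8, 0.7]) + rng.normal(scale=0.5, size=(n, 3))
X2 = lv2[:, None] * np.array([0.9, 0.8, 0.7]) + rng.normal(scale=0.5, size=(n, 3))

def std(a):
    return (a - a.mean(axis=0)) / a.std(axis=0)

X1, X2 = std(X1), std(X2)

# Alternate outer (mode A) and inner (centroid scheme) estimation until
# the outer weights stabilize.
w1, w2 = np.ones(3), np.ones(3)
for _ in range(300):
    y1, y2 = std(X1 @ w1), std(X2 @ w2)            # latent proxies
    sign = np.sign(np.corrcoef(y1, y2)[0, 1])
    z1, z2 = sign * y2, sign * y1                  # inner proxies
    w1_new, w2_new = X1.T @ z1 / n, X2.T @ z2 / n  # mode A: covariances
    if max(np.abs(w1_new - w1).max(), np.abs(w2_new - w2).max()) < 1e-8:
        w1, w2 = w1_new, w2_new
        break
    w1, w2 = w1_new, w2_new

# With standardized scores, the single structural path estimated by OLS
# equals the correlation of the two latent variable scores.
y1, y2 = std(X1 @ w1), std(X2 @ w2)
print(f"path coefficient LV1 -> LV2: {np.corrcoef(y1, y2)[0, 1]:.2f}")
```

Because each step is a simple regression or covariance computation, rather than a full-information estimation of all parameters at once, the procedure remains feasible with small samples, which is the property relied upon above.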

The empirical papers (Papers III, IV, VI and VII) offer a more detailed discussion of the methodology applied in each of them.