
to be completely free of errors. However, it is reasonable to expect the authors of the experiment to take actions that reduce the risk of any errors being of significant importance to the conclusions of the study.

Focusing on the use of proper design principles, code quality and testing, and general best practices in writing scientific software is one way to build trust in computational results [109], [113], [114], [115]. Another way is to use well-designed computing workflows that systematically address potential flaws in the computational experiment [116], [117].

6.1.2 The Need for Reproducibility of Results

Reproducibility refers to the notion of being able to independently re-implement and repeat an experiment and its outcome from a description of it [103], [118]. A related notion is replicability, which refers to being able to repeat an experiment and its outcome using the same experimental setup as was used in the original experiment1.

Without reproducibility (or at least replicability) in a computational experiment, it becomes difficult (if not impossible) to build on published results and identify and correct any errors in the experiment. Thus, for computational results to be useful for others, it is important that the description of the experiment that led to the results, i.e. the software or code as well as associated data, is transparent, complete, and easily available to others.

Extensive time and effort have been put into defining best practices and guidelines for making computational research reproducible and transparent, see e.g. [105], [106], [119], [120], [121], [122], [123]. As noted in several of these works, such guidelines are constantly evolving as more evidence on practical issues with reproducibility becomes available. Software for aiding in making computational results reproducible is evolving similarly with new tools frequently becoming available, see e.g. [121], [124], [125], [126].


• The entire package, including its full API, has been documented according to the Numpydoc standard3, and the documentation is automatically built and served online4.

• All code included in Magni releases has been reviewed by at least one person other than its author.

• An extensive function input validation system (detailed in Section 6.2.1) is used consistently throughout the package.

The magni.reproducibility subpackage provides tools that may be used to annotate results databases and track provenance in order to aid in making results reproducible. This subpackage is detailed in Section 6.2.2. More details about Magni may be found in [130, (Paper D)].

6.2.1 Input Validation

In implementing a large computational experiment, numerous variables are passed between functions, class constructors, class methods, etc. Typically, a function poses strict requirements on the types of its input variables, e.g. a variable must be a string or it must be an integer. Compliance with such requirements is not automatically checked in dynamically typed languages like Python5, which may lead to unexpected and erroneous results, e.g. if a string is supplied instead of an integer (in Python: 2 * 2 = 4 whereas 2 * '2' = '22').

Additionally, scientific computations in Python typically involve the use of advanced types such as the NumPy ndarray [131] for storing matrices and vectors. The NumPy ndarray is not only itself a data type but also holds equally important information about the shape of the array, the type of its elements, etc. All of these define requirements on the input to functions.
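For instance, the following small snippet illustrates the metadata that an ndarray carries and that a validation scheme must therefore be able to constrain (the array here is arbitrary and only serves as an example):

    import numpy as np

    A = np.zeros((3, 4))  # a 3-by-4 matrix of float64 zeros
    print(A.shape)        # (3, 4)   -- the dimensions of the array
    print(A.dtype)        # float64  -- the type of its elements
    print(A.ndim)         # 2        -- the number of dimensions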

In order to prevent erroneous results due to wrongly passed function input variables, we propose to use an online (at execution time) input validation framework based on the application-driven data types presented in Main Contribution 6. These application-driven data types are specifically tailored for easily expressing the requirements for the numerical variables typically used in scientific computations, e.g. matrices and vectors.

Our Python implementation of the proposed input validation framework enforces the validation requirements by raising an exception if an input variable does not comply with the validation scheme. This input validation framework is used throughout Magni, and we have found that it not only helps us identify potentially disastrous problems in our own code but also helps in identifying problems due to changes in the underlying software libraries that we make use of, i.e. NumPy and other packages from the SciPy stack.
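To make the idea concrete, the following is a minimal sketch of such call-time validation. It is not the magni.utils.validation API; the decorator and its parameters are invented purely for illustration:

    import functools

    import numpy as np

    def validate_matrix(name, dtype=np.floating, shape=None):
        """Return a decorator that validates one ndarray argument at call time."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                value = kwargs.get(name, args[0] if args else None)
                if not isinstance(value, np.ndarray):
                    raise TypeError("'{}' must be a numpy.ndarray, not {}".format(name, type(value)))
                if not np.issubdtype(value.dtype, dtype):
                    raise TypeError("'{}' must have dtype {}, not {}".format(name, dtype, value.dtype))
                if shape is not None and value.shape != shape:
                    raise ValueError("'{}' must have shape {}, not {}".format(name, shape, value.shape))
                return func(*args, **kwargs)
            return wrapper
        return decorator

    @validate_matrix('A', shape=(3, 4))
    def column_means(A):
        return A.mean(axis=0)

    column_means(np.zeros((3, 4)))    # passes validation and computes the means
    # column_means('not a matrix')   # would raise TypeError before any computation is done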

Main Contribution 6 (Application-driven Data Types from [132, (Paper E)]) We suggest the concept of so-called application-driven data types as a signal processing data model for programming. These data types are intended for expressing the validation scheme of function arguments.

An application-driven data type is some "mental" intersection between math and computer science in scientific computing and signal processing in particular. For example, the set of real-valued matrices with dimensions $m \times n$, $\mathbb{R}^{m \times n}$, is an application-driven data type. If the user is able to test the validity of a function argument against this application-driven data type, there is no need for the user to consider the distinction between Python floats, NumPy generics, NumPy ndarrays, and so on.

In Python signal processing applications, there should be an application-driven data type representing the most general numerical value, able to assume any numerical value of any shape. This data type should be able to be restricted to less general data types by specifying the mathematical set, the range or domain of valid values, the number of dimensions, and/or the specific shape of the data type. The suggested validation schemes should be expressed in terms of the desired application-driven data type.

3 The Numpydoc standard is available at https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt.

4 The Magni documentation is available at http://magni.readthedocs.io/en/latest/.

5 The latest versions of Python (3.5/3.6) include a standard for specifying such requirements, though third-party tools are still needed in order to enforce the requirements. See https://www.python.org/dev/peps/pep-0526/ for the details.

A reference implementation of this suggested validation strategy is made available by the open source Magni Python package [130] through the subpackage magni.utils.validation.

More details are given in [132, (Paper E)].
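As a rough sketch of what such a restrictable application-driven data type could look like in code (hypothetical names; this is not Magni's actual interface), a single validation function can express the mathematical requirements while accepting Python floats, NumPy generics, and NumPy ndarrays alike:

    import numpy as np

    def validate_real(name, value, range_=(-np.inf, np.inf), shape=None):
        """Check that `value` is real-valued, lies in `range_`, and has shape `shape`."""
        arr = np.asarray(value)
        if not (np.issubdtype(arr.dtype, np.integer) or np.issubdtype(arr.dtype, np.floating)):
            raise TypeError("'{}' must be real-valued, not of dtype {}".format(name, arr.dtype))
        low, high = range_
        if arr.size > 0 and (arr.min() < low or arr.max() > high):
            raise ValueError("'{}' must lie in the range [{}; {}]".format(name, low, high))
        if shape is not None and arr.shape != shape:
            raise ValueError("'{}' must have shape {}, not {}".format(name, shape, arr.shape))

    # The same 'real scalar in [0; 1]' data type validates all of these representations:
    validate_real('delta', 0.5, range_=(0.0, 1.0), shape=())
    validate_real('delta', np.float64(0.5), range_=(0.0, 1.0), shape=())
    validate_real('delta', np.array(0.5), range_=(0.0, 1.0), shape=())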

6.2.2 Storing Experiments Metadata

In making a computational experiment reproducible, one must create a complete record of its computational environment including details such as library versions, parameter values, and clear detailing of the exact code that is run [103], [109], [110]. Two strategies for tracking such provenance of a computational experiment seem to have found adoption in the scientific community:

1. Using tools that track the computational workflow and store metadata detailing the experiment. Ideally such workflow details enable a later reproduction of the experiment. Examples of such tools include Sumatra [121], Madagascar [124], and ActivePapers [125].

2. Running the experiment in a virtual machine or container such as Docker [133]. Ideally, the virtual machine or container remains forever executable and, thus, allows for future reproduction of the experiment.

The container strategy seems compelling in the current situation where scientific code tends to rot, i.e. it eventually stops working as time passes due to incompatible changes in hardware platforms and low-level software libraries [125], [133]. However, the container strategy has been criticised for being an unstable and non-transparent solution due to its black-box nature6. Thus, at least for now, the tools for implementing the container strategy do not seem to have matured sufficiently to be reliably used in science and research.

In our work, we have adopted a variant of the metadata storage strategy for making computational results reproducible. Specifically, we store metadata about a particular computational experiment along with its results. Such metadata include information about the hardware and software libraries used as well as details about the input data, parameter choices, and the specific code that was run. Our proposed practical solution to this problem of storing metadata about a single computational experiment is detailed in Main Contribution 7.
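To illustrate the general idea (a minimal sketch using h5py and the Python standard library; it is not the magni.reproducibility implementation, and the function, group, and attribute names are invented here), such metadata can be written into the same HDF5 database that holds the results:

    import json
    import platform
    import sys
    from datetime import datetime

    import h5py
    import numpy as np

    def annotate_results(path, parameters, source_code):
        """Store basic reproducibility metadata next to the results in an HDF5 file."""
        metadata = {
            'datetime': datetime.utcnow().isoformat(),
            'platform': platform.platform(),
            'python_version': sys.version,
            'numpy_version': np.__version__,
            'parameters': parameters,
        }
        with h5py.File(path, 'a') as h5file:
            group = h5file.require_group('annotations')
            group.attrs['metadata'] = json.dumps(metadata)
            group.attrs['source_code'] = source_code  # the exact code that was run

    # Example usage: annotate an existing results database with run details.
    annotate_results('results.h5',
                     parameters={'m': 100, 'n': 400, 'seed': 6021},
                     source_code='# contents of the script that produced results.h5')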

Main Contribution 7 (A Pythonic Approach to Storing Reproducibility Metadata from [134, (Paper F)])

We propose creating a scientific Python package that may be imported in existing scientific Python scripts and may be used to store all relevant metadata for a computational experiment along with the results of that experiment in an HDF5 database. For the

6 This criticism has been raised by several leading researchers in the computational sciences. See e.g. the blog by C. Titus Brown http://ivory.idyll.org/blog/2017-pof-software-archivability.html or the blog by Gaël Varoquaux http://gael-varoquaux.info/programming/of-software-and-science-reproducible-science-what-why-and-how.html.