Living Informed Consent

We propose the term Living Informed Consent for aligned business, legal, and technical solutions in which the study participant is empowered to understand the type and quality of the data being collected about her, not only at enrolment, but also while the data is being collected, analysed, and shared with third parties. Rather than a pen-and-clipboard approach to enrolment, users should expect a virtual place (a website or application) where they can change their authorizations, drop out of the study, request data deletion, and audit who is analysing their data and to what extent. As the quantity and quality of the collected data increase, it becomes difficult to claim that a single-sentence description such as "we will collect your location" truly allows the participant to grasp the complexity of the collected signal and the knowledge that can be extracted from it. Such an engaging approach to consent will also benefit the research community: because the relationship with the participant extends beyond the initial enrolment, the system proposed here makes it possible for users to sign up for new studies and to donate their data from other studies.
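To make the idea more concrete, the following is a minimal Python sketch of the state such a virtual place might keep for each participant, assuming a single in-memory registry. All class and method names (ConsentRegistry, request_access, access_summary, and so on) are hypothetical; a real deployment would also need authentication, persistence, and enforcement at every point where data is read. What it illustrates is that every access attempt is checked against the participant's current authorizations and logged, so that the participant can later audit who accessed which data and how often.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical consent state kept for one participant."""
    participant_id: str
    granted: set = field(default_factory=set)      # data types currently authorized
    withdrawn: bool = False                         # True once the participant drops out
    deletion_requested: bool = False                # set when withdrawal includes deletion
    audit_log: list = field(default_factory=list)   # (time, requester, data_type, allowed)


class ConsentRegistry:
    """Hypothetical registry behind a Living Informed Consent website or app."""

    def __init__(self):
        self._records = {}

    def enroll(self, participant_id, data_types):
        self._records[participant_id] = ConsentRecord(participant_id, set(data_types))

    def grant(self, participant_id, data_type):
        self._records[participant_id].granted.add(data_type)

    def revoke(self, participant_id, data_type):
        self._records[participant_id].granted.discard(data_type)

    def withdraw(self, participant_id, delete_data=False):
        # Dropping out revokes every authorization; deletion is only flagged
        # here, to be carried out by the data-management side.
        rec = self._records[participant_id]
        rec.withdrawn = True
        rec.granted.clear()
        rec.deletion_requested = delete_data

    def request_access(self, participant_id, requester, data_type):
        # Every attempt is checked against the *current* consent and logged,
        # whether or not it is allowed.
        rec = self._records[participant_id]
        allowed = not rec.withdrawn and data_type in rec.granted
        rec.audit_log.append((datetime.now(timezone.utc), requester, data_type, allowed))
        return allowed

    def audit(self, participant_id):
        # What the participant sees: who asked for which data, and the outcome.
        return [(who, dtype, ok) for _, who, dtype, ok in self._records[participant_id].audit_log]

    def access_summary(self, participant_id):
        # How many successful accesses each requester has made.
        return Counter(who for _, who, _, ok in self._records[participant_id].audit_log if ok)
```

Because authorizations are checked at access time rather than only at enrolment, a revocation or a withdrawal takes effect for every subsequent analysis.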

Chapter 5

Data Security

The security of the collected data, although necessary for ensuring privacy goals, is rarely discussed [GPA⁺10, MCR⁺10, MLF⁺08, KAB09, VGWR12, KN11, SLC⁺11]. In the next sections we illustrate how security has been addressed in the centralized frameworks and how it can be integrated into (future) distributed solutions. This is not an exhaustive list, but a compendium of techniques that can be applied to CSS frameworks, as well as of attacks that one needs to consider.

5.1 Centralized architecture

The centralized architecture, where the data is collected in a single dataset, has been the preferred solution in the majority of the surveyed projects [VGWR12, URM⁺12, KHFK07, PBB09, MFGPP11, MMLP10, KBD⁺10, MLF⁺08, GPA⁺10, RPD⁺06, API⁺11, OWK⁺09, EP06, MCM⁺11, KO08, QC09, TCD⁺10, MKHS08, KN11]. The centralized architecture suffers from several problems. First, if the server is subject to denial-of-service attacks, it cannot guarantee the availability of the service. This might force the smartphones to retain more information locally, with consequent privacy risks. More importantly, if compromised, a single server can reveal all user data.
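The local-retention risk mentioned above can be bounded on the phone side. The following Python sketch assumes a hypothetical BoundedUploadQueue (not part of any surveyed framework) that keeps records only while the server is unreachable, deletes them from the phone once an upload succeeds, and purges anything older than a configurable age or beyond a size cap, so that a prolonged outage cannot grow the local store of personal data without limit. The parameters are illustrative assumptions.

```python
import time
from collections import deque


class BoundedUploadQueue:
    """Hypothetical phone-side buffer for records awaiting upload.

    Records stay on the device only until they are sent, and never longer
    than max_age_s seconds; at most max_items records are kept at once."""

    def __init__(self, max_age_s, max_items):
        self.max_age_s = max_age_s
        self.max_items = max_items
        self._queue = deque()  # (timestamp, record), oldest first

    def add(self, record, now=None):
        now = time.time() if now is None else now
        self._queue.append((now, record))
        self._purge(now)

    def _purge(self, now):
        # Drop records that are too old, then the oldest ones beyond the cap.
        while self._queue and now - self._queue[0][0] > self.max_age_s:
            self._queue.popleft()
        while len(self._queue) > self.max_items:
            self._queue.popleft()

    def flush(self, send, now=None):
        """Upload in order; `send(record)` returns True on success.

        A record is removed from the phone only after it was sent."""
        now = time.time() if now is None else now
        self._purge(now)
        sent = 0
        while self._queue:
            _, record = self._queue[0]
            if not send(record):
                break  # server unavailable: keep the rest for the next attempt
            self._queue.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._queue)
```

The trade-off is explicit: records purged during a long outage are lost to the study, in exchange for a bounded amount of personal data sitting on the device.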


The amount of malware and viruses targeting mobile phones is growing. Given the amount of sensitive information present on these devices, social scientists should consider using and developing robust portable applications in order to prevent privacy theft [AAE⁺11]. To tackle this problem, some of the studied frameworks reduce the time that the raw information collected by the sensors is kept on the phone. For example, in the Darwin platform the data records are discarded once the classification task has been performed. Since most sensing applications use an opportunistic approach to data uploading, they might temporarily store a large amount of data on external memory [MFGPP11]. This introduces a security threat if the device does not provide an encrypted file system by default. A possible way to tackle this problem is to employ frameworks like Funf, the open-source sensing platform developed for [API⁺11] and also used in the SensibleDTU study. Funf provides a reliable storage system that encrypts the files before moving them to special archives in the phone memory. An automatic process uploads the archives, keeping a temporary (encrypted) backup. This mitigates the risk of information disclosure if the smartphone is lost or stolen. In such a case, a last resort is to provide remote access for deleting the data from the phone. Generally, to reduce the risks, good practice is to minimize the amount of information exchanged and to avoid transmitting raw data [MLF⁺08].
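The Darwin-style strategy of discarding raw records once they have been classified, combined with the advice to avoid transmitting raw data, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the sensor delivers accelerometer magnitudes, a fixed-size window is classified by a deliberately naive rule (low standard deviation means "still"), and the class and function names are hypothetical. Only the inferred label and a timestamp ever reach the outbox; the raw window is cleared right after classification and is never written to storage.

```python
from statistics import pstdev


def classify_activity(magnitudes, still_threshold=0.5):
    """Toy classifier (assumed for illustration): low variability of the
    accelerometer magnitude is labelled 'still', otherwise 'moving'."""
    return "still" if pstdev(magnitudes) < still_threshold else "moving"


class ActivitySensor:
    """Hypothetical sensing loop that never stores or uploads raw samples."""

    def __init__(self, window_size):
        self.window_size = window_size
        self._raw = []     # raw magnitudes, kept in memory only
        self.outbox = []   # what actually leaves the phone

    def on_sample(self, timestamp, magnitude):
        self._raw.append(magnitude)
        if len(self._raw) == self.window_size:
            label = classify_activity(self._raw)
            self.outbox.append({"t": timestamp, "activity": label})
            self._raw.clear()  # raw records discarded after classification
```

A real classifier would be far more elaborate, but the data flow is what matters here: the smaller the outbox, the less there is to encrypt, back up, or lose.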

Some frameworks use plain HTTP to transmit data [HL04, MLF⁺08, GPA⁺10, YL10, MKHS08], others use HTTP over SSL to secure data transmission [SLC⁺11, CKK⁺08, KTC⁺08, KAB09], but pushing data through a WiFi connection remains the most common scenario [API⁺11, MCM⁺11, EP06, EP03, AN10, KBD⁺10, LGPA⁺12, TCD⁺10]. Even encrypted content can disclose information to malicious users, for example through observation of the traffic flow: the opportunistic architecture of transmission and the battery constraints do not allow smartphones to mask communication channels with dummy traffic to prevent such analysis [HL04, CKK⁺08].
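A rough calculation shows why dummy traffic is hard to afford on a phone. The sketch below assumes a simple constant-rate scheme (one fixed-size packet per time slot, real or dummy), which is only one of several possible cover-traffic designs; the function name and all numbers in the example are hypothetical. It returns the number of bytes an observer sees and how many times larger that is than the real data.

```python
import math


def cover_traffic_cost(real_upload_sizes, slots, packet_size):
    """Hypothetical constant-rate cover traffic: the phone sends exactly one
    packet of `packet_size` bytes in every time slot, carrying real data when
    it has some and dummy bytes otherwise, so neither timing nor volume
    reveals when real uploads happen.

    Returns (bytes_on_air, overhead_factor)."""
    # Each real upload is split into fixed-size packets (the last one padded).
    needed = sum(math.ceil(size / packet_size) for size in real_upload_sizes)
    if needed > slots:
        raise ValueError("rate too low to carry the real data")
    bytes_on_air = slots * packet_size  # identical whatever the real load is
    overhead = bytes_on_air / sum(real_upload_sizes)
    return bytes_on_air, overhead
```

For example, a day split into 288 five-minute slots, with three opportunistic uploads of 200 KiB each and 256 KiB packets, puts about 123 times more bytes on the air than the real data, and the radio has to wake up 288 times instead of 3.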

When data reach the central server, they are usually stored in SQL databases (e.g. [API⁺11, GBL06, GBL⁺02, MMLP10, MFGPP11]), which aggregate them for later analysis. We remark that in all of the surveyed frameworks, the mechanisms for access control, user authentication, and data integrity checks (where present) were implemented for the purpose of the given study. For example, in OtaSizzle "the data are stored in local servers within the university to ensure adequate security" [KN11]. Ensuring the security of the data is a very complex task. We believe that common solutions and practices are important so that researchers do not need to build new security platforms for every new study. Finally, given the importance of the stored data, the security of the entire CSS platform (network and server) may be enhanced according to the defence-in-depth paradigm, as illustrated in the guidelines on firewalls [SH09] and intrusion detection systems [SM07] by the National Institute of Standards and Technology (NIST).
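As an illustration of what a reusable, study-independent building block could look like, the following Python sketch combines a data integrity check with per-study access control on top of an SQL store. It assumes, hypothetically, that each phone shares a secret key with the server (provisioned at enrolment) and tags every record with an HMAC-SHA256 over its canonical JSON encoding; the server rejects records from unknown devices or with invalid tags, and only researchers explicitly granted a study can query it. SQLite stands in for the production database, and StudyStore, sign_record, and the study and device names are illustrative, not taken from any surveyed framework.

```python
import hashlib
import hmac
import json
import sqlite3


def sign_record(device_key: bytes, record: dict) -> str:
    """Phone side: HMAC-SHA256 over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(device_key, payload, hashlib.sha256).hexdigest()


class StudyStore:
    """Hypothetical server-side store: verifies integrity before insertion
    and scopes every query to one study the researcher is granted."""

    def __init__(self, device_keys):
        self.device_keys = device_keys  # device_id -> shared secret (assumed)
        self.grants = {}                # researcher -> set of study names
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE records (study TEXT, device TEXT, body TEXT)")

    def ingest(self, study, device_id, record, tag):
        key = self.device_keys.get(device_id)
        # Reject unknown devices and records whose tag does not match;
        # compare_digest avoids leaking information through timing.
        if key is None or not hmac.compare_digest(sign_record(key, record), tag):
            return False
        self.db.execute("INSERT INTO records VALUES (?, ?, ?)",
                        (study, device_id, json.dumps(record, sort_keys=True)))
        return True

    def query(self, researcher, study):
        if study not in self.grants.get(researcher, set()):
            raise PermissionError(f"{researcher} has no access to {study}")
        rows = self.db.execute("SELECT body FROM records WHERE study = ?", (study,))
        return [json.loads(body) for (body,) in rows]
```

An HMAC only detects tampered or forged records; it does not hide their content, so it complements rather than replaces transport encryption and the network-level defences discussed in this section.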
