The IDA protocol - A P2P Backup System for Small and Medium-sized Enterprises

and mathematical operations over the data when comparing to other data types, but underperforms them. It has also the advantage of allowing to represent a high number of bytes by a single integer. This feature favors the multiplication of several bytes in a single operation. And that represents that less operations need to be performed when compared to a byte-per-byte computation.

4.4.1 Algorithm re-Design

The organization and structure of the algorithm cannot be changed, but how some operations are performed or the data type that is used can be altered in order to improve the algorithm performance.

In the core of the IDA algorithm there is a matrix multiplication. The current implementation of it uses a standard algorithm which, when multiplying two squaredn×nmatrices, it performsO(n³) operations. In particular, it performs eight recursive matrix multiplication calls and four matrix additions.

However, there are clever ways of multiplying matrices. Strassen’s [28] pro-poses an algorithm that reduces the number of recursive matrix multiplications calls, one less in the same example as above, but at a cost of 18 matrix ad-ditions and subtractions. In this way, Strassen’s algorithm performs O(n^{lg 7}) operations. Winograd’s [13] variant of Strassen’s algorithm reduces to 15 the number of matrix additions and subtractions, by reusing common subexpres-sions, but keeps the number of multiplications. Any of these two algorithms can be implemented in the core of the IDA algorithm, slightly improving in this way the performance of the operation, since additions and subtractions outperform multiplication calls.

Another proposed improvement to the previous implementation is the change of data type that is used. Instead of the BigInteger, use byte arrays. In terms of data representation of a set of bytes, it can be used multidimensional arrays of bytes, but there will still exist the drawback of having to perform byte-per-byte computations. However, the primitive data type byte-per-byte outperforms the BigInteger, which if implemented in a way that the data can be loaded at once to the computation, there will be a performance gain.

To perform matrix operations, the algorithm uses an auxiliary classMatrix, which holds matrix specific methods. This represents a performance problem in a way that for every multiplication of two matrices, they first need to be converted to a matrix structure. Even though this operation may not be that expensive, if the file to disperse is long, then there will be several matrices to use theMatrix class, and therefore it represents some computation that could be avoided. To improve it, the matrix multiplication is implemented directly in the algorithm, in a way that it does not require to make a call to the Matrix class.

4.4 The IDA protocol 55

4.4.2 Protocol re-Design

Regarding the IDA protocol, the changes pass by re-structuring it to a similar protocol structure as the one designed for the AON operations, and eliminating the send of shares to all sharing peers phase as soon as they are ready.

Figure 4.12: The new IDA backup protocol

The overall of the structure is maintained, as it is presented in Figure4.12, in the way that there exists a component reading the input file, now named fileReader, and another to collect the data shares that are processed in the IDA encoder/decoder, namedcollector. These two components, together with the idaEngine class replace the previous idaFileWorker and idaWorker. In this way, the data shares are computed but not sent right away to their destinations.

They are gathered, grouped and stored in the hard disk and only sent to all sharing peers at the end of the process.

This change will save time that is being spent to send several small data pieces to all sharing peers, by just sending one final file at the end of the process.

And it will still prevent out-of-memory problems while processing long files, since only part of the file is loaded to memory at each moment.

With the introduction of theshareinterface, and in oder to make the IDA sharings agree with the common sharings interface, they need to be re-arranged.

Thus, it is introduced a single sharing structure, theIDAsharing, which replaces the previous receipt and sharingIDA. Also, the IDAshare is not longer nec-essary since the share data is sent in one package at the end of the process.

TheIDAsharing, like the other sharing structures in the application, holds the sharing data that is common to all sharing peers, a list of the sharing peers and IDA-specific information, which depends in who is the recipient of the share.

Master peers will have a share with a complete information of the backup, that is, including the vector key used in the IDA operation and the key that encrypted

the original file. Normal peers will receive a share with no backup-private in-formation.

Chapter 5

Implementation

This chapter describes the implementation of the extension to the application prototype. It does not refer to already implemented components in previous versions, for that please review Chapter 3. Here, the main focus is to describe how the new AON algorithm and its data structures are implemented. It is also described the changes made to the IDA protocol and, in a general way, to the structure of the application.

In document A P2P Backup System for Small and Medium-sized Enterprises (Sider 65-69)