5.3 Data Collection and Processing
5.3.3 Creation of the Final Dataset
CVC Unit              Parent                  Assignee Name                                Assignee ID
BMW i Ventures        BAYER MOTOREN WERKE AG  BMW Rolls-Royce GmbH                         f0f8dc951e2e94716b1645e1ea39366a
BMW i Ventures        BAYER MOTOREN WERKE AG  Bayerische Motoren Werke AG                  2e9d43cf73603ed007ccdba09c877741
BMW i Ventures        BAYER MOTOREN WERKE AG  Bayerische Motoren Werke Aktiengesellschaft  8ebde89594918b97d22deb20e79421c4
BMW Technologies Inc  BAYER MOTOREN WERKE AG  BMW Rolls-Royce GmbH                         f0f8dc951e2e94716b1645e1ea39366a
BMW Technologies Inc  BAYER MOTOREN WERKE AG  Bayerische Motoren Werke AG                  2e9d43cf73603ed007ccdba09c877741
BMW Technologies Inc  BAYER MOTOREN WERKE AG  Bayerische Motoren Werke Aktiengesellschaft  8ebde89594918b97d22deb20e79421c4

In the third phase, we merge the three primary databases in the data analysis software Stata, based on the previously performed manual steps. This creates the final dataset, which we will use for our empirical analysis. We first use the CVC unit’s name to retrieve the associated granular investment data from Thomson One Banker. As our structural variable remains unchanged over time for the individual programs, we decide to gather cross-sectional data (Saunders et al., 2016) by aggregating the information at the investor level over the investment period. Since some corporations pursue multiple CVC activities, which can even overlap in time, we create one entry for each of the parent’s investment activities. We obtain the following list of investment-related information from Thomson One Banker (Table 3).
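The aggregation of round-level investment records into one cross-sectional entry per CVC program can be illustrated with a minimal Python sketch. It is not the Stata code used for the dataset. The field names (`program`, `startup`, `year`, `equity_usd`) and all values are hypothetical and do not reflect the actual Thomson One Banker export. Experience is computed here as the difference between the last and first investment year, which is one possible reading of "the period between the first and last investment".

```python
from collections import defaultdict

# Hypothetical round-level records; field names and values are illustrative
# only and are not the actual Thomson One Banker schema.
rounds = [
    {"program": "Unit A", "startup": "S1", "year": 2005, "equity_usd": 2.0e6},
    {"program": "Unit A", "startup": "S1", "year": 2007, "equity_usd": 3.0e6},
    {"program": "Unit A", "startup": "S2", "year": 2010, "equity_usd": 1.0e6},
    {"program": "Unit B", "startup": "S3", "year": 2012, "equity_usd": 4.0e6},
]


def aggregate_by_program(rows):
    """Collapse round-level rows into one cross-sectional record per program."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["program"]].append(row)

    result = {}
    for program, items in grouped.items():
        years = [r["year"] for r in items]
        startups = {r["startup"] for r in items}
        total_equity = sum(r["equity_usd"] for r in items)
        result[program] = {
            "min_year": min(years),
            "max_year": max(years),
            # Assumption: experience = last year minus first year.
            "experience": max(years) - min(years),
            "n_startups": len(startups),
            "n_investments": len(items),
            "total_equity_usd": total_equity,
            "equity_per_investment": total_equity / len(items),
            "equity_per_startup": total_equity / len(startups),
        }
    return result


summary = aggregate_by_program(rounds)
```

Because each program is collapsed separately, a parent with several (possibly overlapping) CVC activities yields one entry per activity, in line with the procedure described above.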
Table 3 List of investment information derived from Thomson One Banker
The minimum year of investment
The maximum year of investment
The investor’s experience as the period between the first and last investment
The number of different startups
The number of total investments
The total amount of invested equity in USD
The equity amount invested per investment
The equity amount invested per startup
The number of startups with differing SIC codes from the parent organization
The number of startups in differing nations from the parent organization
The number of investments in early stage
The average number of funding round
The average age of the startups at the time of investment

In the next step, we derive the relevant data from Compustat. As already mentioned, Compustat distinguishes between public corporations listed in North America and those listed globally. Therefore, we start by separately merging the CVC program, the parent organization, and the minimum and maximum investment year with the Compustat data for North America and for global corporations, using the allocated parent’s name. Subsequently, we define the time period of interest by dropping all data before and after the respective investment period of the CVC unit. We want to underline that the investing period and the period with available financial data need to overlap for at least one year to be able to proceed. Since financial information is not available for 107 entries of the CVC investors, we continue with 637 CVC programs. We then calculate the averages of the various financial variables across the investment period at the CVC investor level. An overview of the derived variables can be found in Table 4.
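The window restriction and averaging step can be sketched in Python as follows. This is an illustration under stated assumptions, not the Stata procedure itself. All names (`financials`, `programs`, `average_in_window`) and figures are hypothetical, and only one financial variable is shown. A program with no firm-year inside its investment window fails the one-year overlap requirement and is dropped.

```python
# Hypothetical firm-year records; names and figures are illustrative only.
financials = [
    {"parent": "Parent A", "year": 2003, "total_assets": 90.0},
    {"parent": "Parent A", "year": 2005, "total_assets": 100.0},
    {"parent": "Parent A", "year": 2006, "total_assets": 120.0},
    {"parent": "Parent A", "year": 2011, "total_assets": 200.0},
]

# Investment windows per CVC program (assumed to come from the investment data).
programs = [
    {"program": "Unit A", "parent": "Parent A", "min_year": 2005, "max_year": 2010},
    {"program": "Unit C", "parent": "Parent C", "min_year": 2008, "max_year": 2009},
]


def average_in_window(program, rows, field):
    """Mean of `field` over firm-years inside the program's investment window.

    Returns None when no firm-year overlaps the window, which marks the
    program for exclusion.
    """
    values = [
        r[field]
        for r in rows
        if r["parent"] == program["parent"]
        and program["min_year"] <= r["year"] <= program["max_year"]
        and r.get(field) is not None
    ]
    return sum(values) / len(values) if values else None


kept = {}
dropped = []
for p in programs:
    avg = average_in_window(p, financials, "total_assets")
    if avg is None:
        dropped.append(p["program"])  # no overlap with available financial data
    else:
        kept[p["program"]] = {"avg_total_assets": avg}
```

In this toy input, the firm-years 2003 and 2011 fall outside the 2005 to 2010 window of Unit A and do not enter the average, while Unit C has no financial data at all and is excluded.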
Table 4 List of financial information derived from Compustat
Average total assets per year
Average capital expenditures per year
Average common equity per year
Average costs of goods sold per year
Average total long-term debt per year
Average dividends per year
Average earnings before interest and taxes per year
Average earnings before interest, taxes, depreciation and amortization per year
Average number of employees per year
Average gross profit or loss per year
Average intangible assets per year
Average inventories per year
Average liabilities per year
Average net income or loss per year
Average revenue per year
Average research and development expenses per year
Industry classification as SIC code
In the last step, we merge the patent data from PatentsView with the previously created investment and financial data. We retrieve various information on the patents granted between 1985 and 2015 (Table 5).
In particular, we focus on the total number of utility patents applied for by the parent organization within the investment period of the CVC unit and between the last investment year and 2015, including the number of associated claims and forward citations. As previously described, our manual search enables us to unambiguously match the CVC unit and the parent organization with the associated patent assignees. We therefore first count the number of patents and sum the claims and forward citations by assignee ID and application year. We then connect the generated patent data with the respective assignee ID from our dataset. Because a parent organization can have multiple assignee IDs, we subsequently aggregate the number of patents, claims, and forward citations of all associated assignees to the parent organization’s level by application year. Lastly, we take the sum of all patents, claims, and forward citations over the investment period and calculate the same variables for the post-investment period until 2015. As already remarked, 135 parent corporations did not generate any patents in these periods. Therefore, we set the respective values to zero. Moreover, 160 programs, 85 external and 75 internal, made their last investment in 2015. Since this is the last year of data availability, we cannot assign any patents to the period after the last investment. Even though those entries include many relevant CVC investors, we exclude them from our data because they would introduce misleading information into the final empirical model.
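The patent aggregation steps can be sketched in Python as follows. This is a minimal illustration, not the Stata code used for the dataset. All records, identifiers, and names (`patents`, `assignees_of`, `sum_window`) are hypothetical. The sketch also assumes that the post-investment period starts in the year after the last investment, which is one reading of "between the last investment year and 2015".

```python
from collections import defaultdict

# Hypothetical patent records: one row per granted utility patent.
patents = [
    {"assignee_id": "a1", "app_year": 2006, "claims": 10, "citations": 4},
    {"assignee_id": "a1", "app_year": 2006, "claims": 20, "citations": 1},
    {"assignee_id": "a2", "app_year": 2008, "claims": 5, "citations": 0},
    {"assignee_id": "a2", "app_year": 2012, "claims": 8, "citations": 2},
]

# Assumed outcome of the manual matching: parent -> associated assignee IDs.
assignees_of = {"Parent A": ["a1", "a2"], "Parent B": ["b1"], "Parent C": ["c1"]}

programs = [
    {"program": "Unit A", "parent": "Parent A", "min_year": 2005, "max_year": 2010},
    {"program": "Unit B", "parent": "Parent B", "min_year": 2009, "max_year": 2015},
    {"program": "Unit C", "parent": "Parent C", "min_year": 2004, "max_year": 2012},
]

LAST_DATA_YEAR = 2015


def zero_totals():
    return {"patents": 0, "claims": 0, "citations": 0}


# Step 1: count patents and sum claims and citations by assignee ID and year.
by_assignee_year = defaultdict(zero_totals)
for p in patents:
    cell = by_assignee_year[(p["assignee_id"], p["app_year"])]
    cell["patents"] += 1
    cell["claims"] += p["claims"]
    cell["citations"] += p["citations"]

# Step 2: roll all assignee IDs of a parent up to the parent level by year.
parent_of = {aid: parent for parent, ids in assignees_of.items() for aid in ids}
by_parent_year = defaultdict(zero_totals)
for (aid, year), cell in by_assignee_year.items():
    if aid in parent_of:
        target = by_parent_year[(parent_of[aid], year)]
        for key, value in cell.items():
            target[key] += value


# Step 3: sum over a window of application years; no patents yields zeros.
def sum_window(parent, first_year, last_year):
    total = zero_totals()
    for (name, year), cell in by_parent_year.items():
        if name == parent and first_year <= year <= last_year:
            for key, value in cell.items():
                total[key] += value
    return total


results = {}
for prog in programs:
    if prog["max_year"] >= LAST_DATA_YEAR:
        continue  # last investment in 2015: no post-investment period to observe
    results[prog["program"]] = {
        "during": sum_window(prog["parent"], prog["min_year"], prog["max_year"]),
        "after": sum_window(prog["parent"], prog["max_year"] + 1, LAST_DATA_YEAR),
    }
```

In this toy input, Unit B is excluded because its last investment falls in 2015, and Unit C receives zeros because its parent has no patents in either period.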
Our final cross-sectional dataset consists of 477 CVC programs, of which 367 are internally and 110 are externally structured. It contains the name of the CVC investor, its structure, investment data from Thomson One Banker, financial data from Compustat, and patent data from PatentsView.
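The sample counts reported in this section can be cross-checked with a short Python sketch. It uses only figures stated in the text. The split of the 637 programs by structure before the exclusion is not reported and is derived here by adding the excluded programs back to the final sample.

```python
# Figures as reported in the text.
programs_with_financials = 637
excluded_last_investment_2015 = {"external": 85, "internal": 75}
final_sample = {"internal": 367, "external": 110}

n_excluded = sum(excluded_last_investment_2015.values())
n_final = sum(final_sample.values())

# The exclusion of programs with a last investment in 2015 yields the final sample.
assert n_excluded == 160
assert programs_with_financials - n_excluded == n_final == 477

# Derived (not reported): composition by structure before the exclusion.
before_exclusion = {
    kind: final_sample[kind] + excluded_last_investment_2015[kind]
    for kind in final_sample
}
assert sum(before_exclusion.values()) == programs_with_financials
```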
Table 5 List of Patent Data derived from PatentsView