4. Methodology

4.5. Data Understanding

The next step in the CRISP-DM model is data understanding. We first describe the initial data collection, then explore the data sets used, including their quality, and finally discuss the limitations of the data.

4.5.1. Data Collection

Through several interviews with different teams within the TMF of the City of Copenhagen, described in more detail in subsection 3.1.1, we identified three data sources containing data about road maintenance: RoSy, PUMA and Giv et praj. RoSy and Giv et praj are developed by Sweco, an international company that focuses on planning and designing sustainable communities and cities, whereas PUMA is software developed internally by the City of Copenhagen. The systems are explained in more detail in subsection 3.1.2. The respective departments exported the required data as comma-separated values (CSV) files. Most of the original data is either numerical or in Danish; the Danish content has been translated by the city or by us. Previews of the data can be found in appendix A.2.

4.5.2. Data Description

RoSy

After multiple iterations of data gathering, the data provided by the RoSy team consists of six files:

• the current condition of the streets of Copenhagen, with 2918 rows and 25 columns; a few road keys and geodata values are missing

• the historical damages of those streets, with 17152 rows and 23 features. Compared with the current condition CSV, it lacks the construktionsname and acute damage columns. This data set is also missing geodata, district, road class and road status data

• the traffic data for the different street parts, a complete data set with 2562 rows and eight features

• the pavement data for all layers, with 8524 rows and nine features

• the latest longlist from RoSy, which lists the worst streets in Copenhagen in 606 rows and eight features

• information regarding product lifetime and prices, which includes tables about costs per material, the width of the different streets and the predicted lifetime per material

The current condition and the historical damages data sets include the same variables, except for "Acute_Damage" and "construktionsname", which appear only in the current condition file:

• wkt_geom: geometrical data for each street part, saved as a line string to describe the exact geolocation in EPSG:25832 format

• MI_PRINX: a unique identifier for each line in the dataset

• District: contains the district of the street part

• Roadkey: intended as a unique key to identify the road. However, due to previous data preparation by the city, the road key is saved in an unusable format

• road name: describes the road name of the street part

• Lane: encoded integer value which describes different lanes (right lane, left lane, ...)

• FromChainage: defines the start of the street part

• ToChainage: defines the end of the street part

• Update: contains the timestamp of the inspection

• Different damages: See chapter 3.1.3 for more detailed information

• ResidualLifetime: predicted year (by the RoSy program), based on RoSy data, when the street part needs complete restoration

• road class: describes the type of the road

• construktionsname: similar to road class, it describes the type of the road. Based on the construktionsname, the area and the costs for renovating the street part can be calculated (together with a cost table)

• Acute_damage: either 0 or 1; a rudimentary indicator of whether the street part has acute damage

• road status: contains information about who owns the street part, that is, whether it is private or belongs to the municipality

The traffic data contains further information about the approximated daily and heavy traffic.

The pavement data describes the different layers, including their thickness and the product used, together with the year each layer was laid. The latest longlist from RoSy is provided to give a better understanding of the current processes in the department. The document about product lifetime and prices includes tables with the respective information and enables a cost-based calculation for repairing the street parts.
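The missing road keys and geodata values mentioned above can be quantified with a simple per-column count. The following is a minimal sketch using only the Python standard library; the three-row excerpt is hypothetical sample data (not taken from the real export), and the helper name count_missing is an assumption made for illustration.

```python
import csv
import io
from collections import Counter


def count_missing(rows, columns):
    """Count empty or whitespace-only cells per column."""
    missing = Counter({col: 0 for col in columns})
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or not value.strip():
                missing[col] += 1
    return dict(missing)


# Hypothetical excerpt; the real current condition file has 2918 rows
# and 25 columns. Only the column names come from the data description.
SAMPLE_CSV = """MI_PRINX,Roadkey,wkt_geom
1,A-100,"LINESTRING (724000 6176000, 724050 6176020)"
2,,"LINESTRING (724050 6176020, 724100 6176040)"
3,A-102,
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
columns = reader.fieldnames
missing_per_column = count_missing(reader, columns)
print(missing_per_column)
```

Applied to the real CSV files, the same count would show which columns need imputation or exclusion before modelling.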

Giv et praj

Similar to RoSy, Giv et praj is a third-party tool. Users can manually report issues regarding various concerns, including street damages. After talking to the persons responsible for the system, we received data on manually reported potholes in Copenhagen, including their geolocations. The data set used consists of 34382 rows with the following columns/variables:

• the X coordinate of the damage in EPSG:25832 format,

• the Y coordinate of the damage in EPSG:25832 format,

• the type of the issue,

• the type of road where the damage occurred

• and the date when the damage was repaired (if applicable)
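The X and Y coordinates are projected coordinates in EPSG:25832 (ETRS89 / UTM zone 32N), so they can only be compared with latitude/longitude data (EPSG:4326) after a projection step. The sketch below illustrates such a step with the standard Krüger series for the transverse Mercator projection, using only the Python standard library. It is not necessarily how the conversion was done in this work: the function name latlon_to_utm32n is an assumption, the small datum difference between WGS84 and ETRS89 (typically below a metre) is ignored, and in practice a library such as pyproj would be the usual choice.

```python
import math

# GRS80 ellipsoid (used by ETRS89) and UTM zone 32N parameters.
SEMI_MAJOR_AXIS = 6378137.0
FLATTENING = 1 / 298.257222101
SCALE_FACTOR = 0.9996
FALSE_EASTING = 500000.0
CENTRAL_MERIDIAN_DEG = 9.0


def latlon_to_utm32n(lat_deg, lon_deg):
    """Project latitude/longitude in degrees to UTM zone 32N (metres)."""
    n = FLATTENING / (2 - FLATTENING)  # third flattening
    radius = SEMI_MAJOR_AXIS / (1 + n) * (1 + n**2 / 4 + n**4 / 64)
    # Krüger series coefficients, truncated after the third order.
    alpha = (
        n / 2 - 2 * n**2 / 3 + 5 * n**3 / 16,
        13 * n**2 / 48 - 3 * n**3 / 5,
        61 * n**3 / 240,
    )
    phi = math.radians(lat_deg)
    dlam = math.radians(lon_deg - CENTRAL_MERIDIAN_DEG)
    ecc = 2 * math.sqrt(n) / (1 + n)  # first eccentricity
    t = math.sinh(
        math.atanh(math.sin(phi)) - ecc * math.atanh(ecc * math.sin(phi))
    )
    xi = math.atan2(t, math.cos(dlam))
    eta = math.atanh(math.sin(dlam) / math.sqrt(1 + t * t))
    east_term = eta
    north_term = xi
    for j, a_j in enumerate(alpha, start=1):
        east_term += a_j * math.cos(2 * j * xi) * math.sinh(2 * j * eta)
        north_term += a_j * math.sin(2 * j * xi) * math.cosh(2 * j * eta)
    easting = FALSE_EASTING + SCALE_FACTOR * radius * east_term
    northing = SCALE_FACTOR * radius * north_term
    return easting, northing


# Approximate coordinates of central Copenhagen (illustrative input).
easting, northing = latlon_to_utm32n(55.6761, 12.5683)
print(round(easting), round(northing))
```

With all sources in one coordinate system, reported damages can be matched to street parts by distance in metres.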

PUMA

The internally developed PUMA system contains data about acute road damages. The received data includes 34382 rows and ten features:

• the case number that works as a unique identifier for each damage

• the case status that reports if the case is open or done

• the task path, which consists of predefined options from which the person who enters the data must choose. The type of road and the type of damage are described by this attribute

• the task status, which according to interviews is not necessarily correct (the task status can be open although the case status is closed)

• the date when the damage was reported

• the date when the damage was fixed (if applicable)

• the district where the damage is

• the latitude coordinate of the damage in the more common EPSG:4326 format

• the longitude coordinate of the damage in the more common EPSG:4326 format

• the priority/severity of the damage

4.5.3. Data Limitation

Since the owner of the data is the City of Copenhagen, the data is limited to what they agreed to provide us. Detailed information about the date of street renewals is missing. The received data is manually entered and partly incomplete. Thus, we had to make many assumptions.

Due to limited time and resources, the thesis focuses on road maintenance. We decided to exclude any information about bike lanes, sidewalks, fountains or similar objects. For the repair price, we simplified the calculation by assuming that renewing different streets costs the same, except for paved roads, which have a separate price per square meter.
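The simplified price calculation can be sketched as follows. This is a minimal illustration, not the exact calculation used: it assumes that chainage values are given in metres, that the area of a street part is its length times its width, and that the uniform price is also expressed per square meter. The function name and all prices in the example are hypothetical placeholders.

```python
def renewal_cost(from_chainage, to_chainage, width_m, is_paved,
                 default_price_per_m2, paved_price_per_m2):
    """Estimate the renewal cost of one street part.

    Assumes chainage in metres and a rectangular street part.
    """
    length_m = abs(to_chainage - from_chainage)
    area_m2 = length_m * width_m
    # One price for all streets, except paved roads with their own price.
    price = paved_price_per_m2 if is_paved else default_price_per_m2
    return area_m2 * price


# Hypothetical prices; the real values come from the city's cost tables.
asphalt_cost = renewal_cost(0, 120, 7.5, False, 100.0, 250.0)
paved_cost = renewal_cost(0, 120, 7.5, True, 100.0, 250.0)
print(asphalt_cost, paved_cost)
```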

To get a better understanding of the current road lifetime calculation, we requested more information about RoSy from Sweco, which they declined to provide.