Background Notes

Methodology for the Generation of Faster Transport Indicators

Today, there is a greater demand to produce more timely official statistics at a more granular level. The CSO is looking to new and novel data sources to meet this demand. For the CSO, ‘Big Data’, such as the Automatic Identification System (AIS) and Transport Infrastructure Ireland data, represents an innovative opportunity to generate experimental maritime and traffic volume statistics.

Traffic Count Analysis Using TII Data

About TII Data

Transport Infrastructure Ireland (TII) has over 300 active Traffic Monitoring Units (TMUs) around the country that record the volume of traffic by hour of day and vehicle class. Vehicles are counted when they pass over loops embedded in the road surface.

The hourly aggregated counts are provided daily to the CSO by TII via an API, and are then uploaded to and stored in the CSO’s data hub. The daily data files consist of seven variables: the hour, day, month and year, the hourly vehicle count, the vehicle classification, and a unique identifier for each TMU. The unique identifier can be used to look up the TMU location and description.

Data Quality Overview

Given the velocity and volume of the TII data, it falls under the classification of Big Data.

Unfortunately, the TMUs recording data for TII lack the storage capacity typically found in Motorway Incident Detection and Automatic Signalling (MIDAS) sensors. As a result, when a TMU is temporarily out of operation, the traffic counts for that period are lost. The leading causes of temporary inactivity are software updates, roadworks, and sensors failing validation checks. When any of these faults or outages occurs, there will be gaps in the data until the issue is resolved.

A major challenge when working with the TII traffic count data is determining whether a rise or fall in recorded traffic is genuine rather than the result of a TMU failure. The importance of this check depends strongly on spatial resolution: at fine resolution, a single faulty counter can dominate the result. Conversely, as the spatial resolution decreases because a larger geographical area is investigated, values aggregated over several counters allow for a greater margin of error.

Although the TII TMUs are quite comprehensive in their coverage, there may be location bias in the areas in which they operate: TMUs tend to be clustered in more densely populated areas where greater traffic volumes are expected.

Imputation Of Missing Data

While the TII traffic count data can help to improve the periodicity and timeliness of traffic count statistics, producing them is not simply a matter of summing the hourly counts into daily or weekly aggregates. Among the challenges faced are the quality and completeness of the data sets. For example, gaps can occur in the recorded traffic counts when TMUs are inactive as a result of roadworks or technical difficulties. A major component of this work is the preparation of the data and the development of methodologies to deal with missing data.

Missing data for each TMU and each day are imputed using the last available value for the same day of the week. For example, if the count for Monday 08/08/22 was missing, the count from the previous Monday (01/08/22) was imputed.
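
The imputation rule can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the CSO's production code: the function name `impute_from_previous_week`, the dictionary layout, the `max_weeks` look-back limit and the counts themselves are all hypothetical, and one total count per TMU per day is assumed.

```python
from datetime import date, timedelta


def impute_from_previous_week(counts, day, max_weeks=8):
    """Return the count for `day`, falling back to the most recent
    earlier date on the same weekday that has a recorded value.

    counts    -- dict mapping date -> daily count for ONE TMU (hypothetical layout)
    max_weeks -- how far back to search; an assumption, not from the source
    """
    for weeks_back in range(max_weeks + 1):
        candidate = day - timedelta(weeks=weeks_back)
        if candidate in counts:
            return counts[candidate]
    return None  # no usable value within the look-back window


# Made-up counts for one TMU: Monday 08/08/22 is missing.
tmu_counts = {
    date(2022, 8, 1): 12500,  # Monday
    date(2022, 8, 2): 12900,  # Tuesday
    date(2022, 8, 9): 13100,  # Tuesday
}

imputed = impute_from_previous_week(tmu_counts, date(2022, 8, 8))
print(imputed)  # 12500, taken from Monday 01/08/22
```

Stepping back in whole weeks keeps the weekday fixed, so a missing Monday is never filled with a weekend value.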

Ship Arrivals Based On AIS Data

About AIS Data

Automatic Identification System (AIS) data are supplied by the Task Team on AIS Data of the UN Committee of Experts on Big Data and Data Science for Official Statistics (UN-CEBD), and accessed through the UN Global Platform (UNGP), which holds a global repository of live and archived AIS data. These data are provided by ExactEarth, who combine their own satellite data with terrestrial data from FleetMon. In addition to location, bearing and navigation status, the AIS data include two unique identifiers: the International Maritime Organisation (IMO) number and the Maritime Mobile Service Identity (MMSI) number.

The UNGP not only holds AIS data but also the IHS Shipping Registry. Incorporating SeaWeb and Lloyd’s Register of Ships (published since 1764), the IHS Shipping Registry provides detailed information on all self-propelled and seagoing merchant ships. Among the information included in the registry are IMO number, MMSI number, ship name, ship type, cargo type, ownership, registration, tonnage, dimensions, and propulsion.

AIS and H3 Cells

AIS data on the UN Global Platform (UNGP) is improved by adding a spatial index to each AIS message record (see Figure 1). This spatial index system, known as H3 and initially developed by Uber, includes sixteen resolution levels, comprising cells that cover the Earth's surface with corresponding H3 Index values. H3 cells are geometric or geographic unit polygons, typically hexagons or pentagons, and an H3 Index represents an H3 object. This hierarchical geospatial index system organises cells by spatial hierarchy, with each hexagonal cell having seven child cells beneath it, up to the maximum supported resolution. The main benefit of the H3 indexing system is that any position on the Earth maps to an H3 cell, so every AIS message can be assigned an index based on its reported position.

Figure 1: H3 cells at resolution 10 in the area of Waterford docks, with a single H3 cell at resolution 9. This is an example of the hierarchical geospatial index and the seven child cells in this hierarchy.

Port Polygons

Port polygons define the space for port visits and link visits to specific ports. These polygons also serve as a spatial filter to limit the amount of AIS data processing. Port polygons do not overlap and are created using GIS software. It is important to note the port polygons do not represent an area corresponding to port activities but rather an area that contains the port. Data reduction is crucial when dealing with Big Data, so Bounding Box Buffers (BBB) were established around the port polygons to restrict data analysis to this area. The BBB geometry reduced the amount of data to be processed by about 80%.
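
The idea of a bounding-box buffer as a cheap spatial pre-filter can be sketched as follows. The source does not describe how the BBBs were constructed, so everything here is an assumption for illustration: the buffer is expressed in decimal degrees, and the polygon vertices, AIS positions and function names are made up.

```python
def bounding_box_with_buffer(polygon, buffer_deg):
    """Return (min_lon, min_lat, max_lon, max_lat) of a polygon,
    expanded on every side by `buffer_deg` decimal degrees.

    polygon -- list of (lon, lat) vertices (hypothetical layout)
    """
    lons = [lon for lon, _ in polygon]
    lats = [lat for _, lat in polygon]
    return (min(lons) - buffer_deg, min(lats) - buffer_deg,
            max(lons) + buffer_deg, max(lats) + buffer_deg)


def in_box(point, box):
    """True if a (lon, lat) point lies inside the box, using four comparisons."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Made-up port polygon and AIS positions, for illustration only.
port_polygon = [(-7.12, 52.25), (-7.08, 52.25), (-7.08, 52.27), (-7.12, 52.27)]
box = bounding_box_with_buffer(port_polygon, buffer_deg=0.01)

ais_points = [(-7.10, 52.26), (-7.125, 52.255), (-6.50, 52.00)]
kept = [p for p in ais_points if in_box(p, box)]
print(kept)  # the first two points fall inside the buffered box
```

A box test needs only four comparisons per point, so it can discard most records before any more expensive point-in-polygon check is run.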

Stationary Marine Broadcast Method

Arrivals of ships are calculated using the Stationary Marine Broadcast Method (SMBM).

The Stationary Marine Broadcast Method, developed by the CSO, relies on the concept that ships are stationary for a time during loading and unloading. When a ship remains stationary within a port polygon for an extended period, it is likely a ship visit. This method uses the H3 Indexes discussed above. Once a ship is identified as stationary (a triggering event), an escape condition is needed to detect movement. H3 Indexes make this escape condition an attribute comparison instead of a geographic distance calculation. The escape condition checks if the test record's H3 Index is outside a defined neighbourhood of the triggering event, determined by the H3 k-ring (griddisk) function. "k-depth" refers to the number of rings used to define this neighbourhood.

Table 1: K-depth and number of cells for k-ring
K-depth    Number of cells in k-ring
0          1
1          7
2          19
3          37
4          61
5          91
6          127
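
The values in Table 1 follow a simple closed form. Around a central hexagon, ring j (for j ≥ 1) contains 6j cells, so a k-ring holds 1 + 3k(k + 1) cells in total. The short check below reproduces the table; the function name `cells_in_k_ring` is ours, and the formula assumes a neighbourhood made up only of hexagons (a neighbourhood containing one of H3's pentagons has fewer cells).

```python
def cells_in_k_ring(k):
    """Number of hexagonal cells within k rings of a central cell:
    1 centre cell plus 6*j cells in each ring j = 1..k."""
    return 1 + 3 * k * (k + 1)


table = {k: cells_in_k_ring(k) for k in range(7)}
print(table)
# {0: 1, 1: 7, 2: 19, 3: 37, 4: 61, 5: 91, 6: 127}
```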

This function involves two hyper-parameters: the k-depth for the k-ring and the H3 resolution. A level 10 H3 Index resolution is used, with cells approximately 150m wide, matching the ship sizes under study. A k-depth of three is chosen to consider position reporting errors, minor shifts due to tide and current, and small operational movements of the ship. In Ireland, a k-depth of three with a resolution of 10 corresponds to an area of nearly one square kilometre. 

Data Processing

The data processing for the SMBM consists of the following steps:

  1. Geometry creation: First, AIS records are filtered on H3 indices of resolution 2, and then point geometries are created from those AIS records that sit within a defined space around Ireland, called the “Irish Box”.

  2. Data reduction: AIS data points within the Irish Box are further refined to focus on AIS messages within the study area, reducing the data volume significantly.

  3. Adding sorting variable: Timestamps in AIS observations are converted to UNIX timestamp format, enabling natural time sorting and time difference calculations.

  4. Create study area ship list: A list of distinct ships is generated based on ship MMSI numbers, and AIS records are sorted by UNIX timestamp.

  5. Identify stopped ships: The algorithm identifies "stopped ships" based on speed information in AIS records.

  6. Iterate until escape condition met: If a speed of zero is detected, then the algorithm triggers and records it as a “reference” record for testing. The immediately preceding record is also marked as a “prior” record. The H3 index of the “reference” is then used to define a neighbourhood of H3 Indexes known as a k-ring. The next record for the ship is examined and the value of its H3 index is tested against the k-ring set.
    If it is in the k-ring, then the ship is regarded as being sufficiently close in location and the record count for the stopped ship is incremented by one. This record now becomes the new “previous” record and the next record for the ship is tested against the same k-ring.
    If it is not in that k-ring, then the AIS record has met an “escape” condition and is sufficiently far away from the stopped location to be regarded as being in a new position.

  7. Calculate upper and lower stopped-time estimates from stored values: The UNIX times are available for four records: the “reference” record itself, its corresponding “prior” record, the “previous” record immediately before the escape condition record, and the “escape” record itself.

    This allows us to make an upper and lower time estimate of the time the ship was stopped:

    Upper Time Estimate = U_escape - U_prior

    Lower Time Estimate = U_previous - U_reference

  8. Create stopped ship record: The above steps add additional information to each “stopped ship” record:

            the record count value: the number of iterations in step 6 above, that is, the number of records checked before the escape condition was met.

            the upper and lower time estimates of stopped duration.

  9. Recommence identification of stopped ship: The process is repeated for each ship, generating a set of records for stopped ship events.

  10. Link stopped ship to ship register: AIS data is linked to the IHS ship register data using IMO and MMSI numbers. Any records that could not be matched are not counted.
    Vessel types are classified based on the IHS Shipping Registry. The following vessel types are covered:

            Liquid bulk vessels
            Dry bulk vessels
            Container vessels
            Specialised vessels
            General cargo vessels
            Passenger vessels

  11. Select stopped ship points within port area: Stopped ships within port polygons are identified as potential ship visits.

  12. Exclusion of Invalid Visits: Ship visits that have fewer than 5 AIS observations and that spent less than a certain amount of time in port are excluded. It is assumed that visits with so few observations in such a short time are not true stops in port. The lower time limit is calculated from “time_upper”, separately for each of the following ship types:

            Ro-Ro Cargo
            Not Ro-Ro Cargo
            Container
            Dry bulk
            Liquid bulk

    The distribution of estimated visit lengths (measured by time_upper) is plotted for each month. Ships that have a visit time below the 5th percentile are examined. If these ships also have fewer than 5 observations, they are marked as invalid and excluded from the count.

  13. Filter on relevant ship types: Only cargo vessels, car ferries, and other passenger vessels are considered for analysis, aligning with Eurostat Maritime Transport definitions. The following vessels are excluded:

            Fish-catching vessels
            Fish-processing vessels
            Vessels for drilling and exploration
            Tugs
            Pusher craft
            Research and survey vessels
            Dredgers
            Naval vessels
            Vessels used solely for non-commercial purposes

    For example, the following two ferries were excluded:

            Carrigaloe (IMO: 7028386)
            Glenbrook (IMO: 7101607)
    Both are cross-river ferries, so they are not relevant to the goal of measuring port visits.

  14. Preventing double counting of port visits:
    The AIS port visit data is analysed on a monthly basis, so there is a risk that a ship enters port on the last day of one month and leaves on the first day of the next. To avoid counting such visits twice, each month the previous month’s data is checked for repeated visits, and a port visit is recorded only in the month in which the ship entered port.

  15. Coverage: The Dashboard summarises the results for seven main Irish port areas:
            1. Bantry Bay Maritime Area
            2. Drogheda Maritime Area
            3. Dublin Maritime Area
            4. Shannon Foynes Maritime Area
            5. Cork Maritime Area
            6. Rosslare Maritime Area
            7. Waterford Maritime Area
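
Steps 5 to 8 above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the CSO implementation. Records are hypothetical `(unix_time, speed, cell)` tuples for a single ship, already sorted by time. The k-ring is supplied by a caller-provided `neighbourhood` function, and the example uses a toy one-dimensional cell index in place of real H3 indexes. The handling of a track that starts or ends while the ship is stopped is our own assumption, since the source does not describe it.

```python
def find_stops(records, neighbourhood):
    """Detect stopped-ship events for one ship.

    records       -- list of (unix_time, speed, cell), sorted by unix_time
    neighbourhood -- function: cell -> set of cells forming the k-ring
    """
    stops = []
    i = 0
    while i < len(records):
        t_reference, speed, cell = records[i]
        if speed != 0:          # step 5: only a zero speed triggers
            i += 1
            continue

        ring = neighbourhood(cell)
        # "prior" record: the one just before the reference
        # (assumption: fall back to the reference time if there is none)
        t_prior = records[i - 1][0] if i > 0 else t_reference
        t_previous = t_reference
        record_count = 0

        # step 6: walk forward until a record leaves the k-ring
        j = i + 1
        while j < len(records) and records[j][2] in ring:
            t_previous = records[j][0]
            record_count += 1
            j += 1

        # assumption: if the track ends before an escape, close at the last record
        t_escape = records[j][0] if j < len(records) else t_previous

        stops.append({                              # steps 7 and 8
            "cell": cell,
            "record_count": record_count,
            "time_upper": t_escape - t_prior,       # U_escape - U_prior
            "time_lower": t_previous - t_reference,  # U_previous - U_reference
        })
        i = j                   # step 9: resume from the escape record
    return stops


# Toy stand-in for an H3 k-ring: integer cells, k = 3 on a line.
toy_ring = lambda c: set(range(c - 3, c + 4))

track = [
    (1000, 8.0, 50),   # moving ("prior")
    (1600, 0.0, 60),   # stopped ("reference")
    (2200, 0.0, 60),
    (2800, 0.1, 61),   # small movement, still inside the ring
    (3400, 0.0, 59),   # last record inside the ring ("previous")
    (4000, 9.0, 70),   # outside the ring ("escape")
    (4600, 9.5, 80),
]

stops = find_stops(track, toy_ring)
print(stops)
# [{'cell': 60, 'record_count': 3, 'time_upper': 3000, 'time_lower': 1800}]
```

With real AIS data, `neighbourhood` could wrap the k-ring function of an H3 library (for example `grid_disk` in recent versions of the `h3` Python package) with a k-depth of three. Because the k-ring is a set, the escape test is a plain membership check rather than a distance calculation, which is the attribute comparison described in the method.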

Comparison between AIS data and Official Statistics

The AIS-based data source is relatively new. A significant benefit of this data source is that there is one global standard, which provides the opportunity to compile coherent and standardised maritime statistics easily. The AIS-based source, like many new and novel data sources, enables greater temporal granularity. International standards are currently being developed based on experiences of using this data in the UNGP and other projects. Current CSO analysis shows differences between the AIS-based estimates and official statistics: the AIS-based estimates tend to be a little higher. Official statistics are compiled from summary data provided by each port authority on a quarterly basis and are published in the CSO Statistics of Port Traffic release. There are two possible reasons for the higher AIS-based estimates. The first is that the AIS-based estimates may have higher coverage. The second is that the AIS-based estimation procedure may include an element of double counting, which would indicate over-coverage errors in the AIS-based estimates. In all likelihood, the differences are explained by some combination of the two.

The CSO continues to develop and produce experimental statistics for port traffic in parallel with the official statistics. This will allow monitoring of how both sets of statistics behave over a greater time scale. The experimental statistics are compiled at a greater granularity (monthly