
Methodology and Data Sources

CSO statistical publication, , 11am

The analysis in this research paper is based on matching the individual characteristics of respondents to the CSO’s Labour Force Survey (LFS) with corresponding earnings data from the Earnings Analysis using Administrative Data Sources (EAADS).

This continues the approach taken for the 2011-2014 and 2015-2018 analyses. The approach was adopted in the absence of the National Employment Survey (NES), when the CSO sought an alternative data source that would provide information on the earnings of employees in both the public and private sectors. The LFS provides a consistent source of information on the individual attributes of the employees surveyed, and it was linked to the EAADS to provide earnings information for each individual employee.

Summary of Methodology used

Individual characteristics from the LFS are paired with earnings characteristics from the EAADS to create a matched LFS/EAADS file. From this matched file, a subset of permanent, full-time employees aged 25-59 years was created.

Data Sources

EAADS Data

Earnings data was taken from the PAYE Modernisation (PMOD) dataset used to compile the CSO’s publication Earnings Analysis using Administrative Data Sources (EAADS), which provides analysis of earnings data for individuals paid through PAYE for the period 2019 to 2022. The relevant variables used are:

  • CSOPPSN
  • Gross Annual Earnings
  • Weeks worked
  • Weekly Earnings
  • Public-Private sector status
  • NACE Principal Business Activity
  • Firm Size – based on Enterprise

When the EAADS dataset was created, records meeting any of the following criteria were removed from the analysis file:

  • Employments with earnings of less than €500 per annum
  • Employments where the duration was less than two weeks in the year
  • Employments with extremely high or extremely low earnings
  • Employments with missing employer and employee reference numbers
  • Employments with activity in NACE sectors A (Agriculture), T (Household Activities) and U (Activities of Extra-Territorial Organisations)
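The exclusion rules above can be expressed as a single record-level filter. The following is a minimal Python sketch that assumes a hypothetical record layout: `Employment` and its field names are illustrative, not EAADS variable names. The publication does not say how "extremely high and low earnings" were defined, so `low_cut` and `high_cut` are hypothetical parameters. The sketch also assumes a record is excluded if either reference number is missing.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout for illustration; not the actual EAADS/PMOD schema.
@dataclass
class Employment:
    employee_ref: Optional[str]   # e.g. CSOPPSN; None if missing
    employer_ref: Optional[str]   # None if missing
    annual_earnings: float        # gross annual earnings in euro
    weeks_worked: int
    nace_section: str             # NACE section letter

# A (Agriculture), T (Household activities), U (Extra-territorial organisations)
EXCLUDED_NACE = {"A", "T", "U"}

def keep_employment(e: Employment, low_cut: float, high_cut: float) -> bool:
    """Return True if the employment survives every exclusion rule.

    low_cut and high_cut are hypothetical outlier thresholds on annual earnings.
    """
    if e.employee_ref is None or e.employer_ref is None:  # assumption: either missing excludes
        return False
    if e.annual_earnings < 500:        # less than €500 per annum
        return False
    if e.weeks_worked < 2:             # duration under two weeks
        return False
    if not (low_cut <= e.annual_earnings <= high_cut):  # extreme earnings
        return False
    return e.nace_section not in EXCLUDED_NACE

records = [
    Employment("P1", "E1", 42_000.0, 52, "C"),
    Employment("P2", "E2", 400.0, 10, "G"),     # below €500 a year
    Employment("P3", "E3", 30_000.0, 52, "A"),  # NACE A excluded
    Employment("P4", None, 25_000.0, 40, "Q"),  # missing employer reference
]
kept = [r for r in records if keep_employment(r, low_cut=500.0, high_cut=1_000_000.0)]
```

Only the first record survives here. Each rule is a separate early return, which makes it easy to count how many records each criterion removes.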

As some individuals had multiple employments across more than one sector/occupation, it was necessary to identify their principal employment. This was done by selecting the employment with the highest annual earnings on the EAADS file. As a result, in the matching process for 2022, for example, approximately 329,566 non-primary employments were dropped from the EAADS file, which contained 2.97 million employments. These secondary employments were mainly in the Wholesale & Retail sector, the Health sector and the Administrative & Support Services sector (approximately 46,265, 41,149 and 40,150 employments respectively). In addition, approximately 20,038 secondary employments were dropped from the Education sector, representing instances where employees in this sector receive small additional incomes in the course of teaching duties.
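The principal-employment rule (keep each person's highest-earning employment) can be sketched as follows. The tuple layout is hypothetical. The publication does not say how ties are broken, so this sketch keeps whichever tied employment it encounters first.

```python
def principal_employments(rows):
    """rows: iterable of (csoppsn, employer_ref, annual_earnings) tuples.

    Returns {csoppsn: (employer_ref, annual_earnings)}, keeping only the
    employment with the highest annual earnings for each person.
    """
    best = {}
    for person, employer, earnings in rows:
        # Strict '>' keeps the first employment seen when earnings are tied.
        if person not in best or earnings > best[person][1]:
            best[person] = (employer, earnings)
    return best

rows = [
    ("P1", "E1", 38_000.0),
    ("P1", "E9", 2_500.0),   # small secondary employment: dropped
    ("P2", "E4", 51_000.0),
]
principal = principal_employments(rows)
dropped = len(rows) - len(principal)  # number of non-primary employments removed
```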

Labour Force Survey (LFS) Data

Quarterly LFS data was combined to create an annual pooled dataset for each year from 2019 to 2022. The dataset contains only persons who are in employment and have no missing values for the variables listed below. Only one employment record per person is retained.
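The LFS interviews the same households over several consecutive quarters, so one person can appear more than once in a pooled year. That is why only one record per person is kept. The sketch below uses hypothetical field names and a shortened list of required variables. The publication does not say which record is retained when a person appears in several quarters, so keeping the first one encountered is an assumption.

```python
# Hypothetical subset of the required LFS variables (field names are illustrative).
REQUIRED = ("csoppsn", "gender", "age", "occupation", "education")

def pool_lfs_year(quarters, required=REQUIRED):
    """quarters: list of per-quarter lists of respondent dicts.

    Keeps employed respondents with no missing required variables and
    retains the first record seen for each CSOPPSN (an assumption).
    """
    pooled = {}
    for quarter in quarters:
        for rec in quarter:
            if rec.get("in_employment") is not True:
                continue  # only persons in employment
            if any(rec.get(v) is None for v in required):
                continue  # drop records with missing values
            pooled.setdefault(rec["csoppsn"], rec)  # one record per person
    return list(pooled.values())

q1 = [
    {"csoppsn": "P1", "in_employment": True, "gender": "F", "age": 34,
     "occupation": "occ_a", "education": "edu_x"},
    {"csoppsn": "P2", "in_employment": True, "gender": "M", "age": None,
     "occupation": "occ_b", "education": "edu_y"},   # missing age: dropped
]
q2 = [
    {"csoppsn": "P1", "in_employment": True, "gender": "F", "age": 34,
     "occupation": "occ_a", "education": "edu_x"},   # same person, later quarter
    {"csoppsn": "P3", "in_employment": False, "gender": "M", "age": 41,
     "occupation": None, "education": "edu_z"},      # not in employment: dropped
]
annual = pool_lfs_year([q1, q2])
```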

The following variables were used in order to create a file containing the relevant employee characteristics for matching with the EAADS data:

  • CSOPPSN
  • Gender
  • Citizenship
  • Age
  • Full-time/Part-time status
  • Supervisor status
  • Temporary/Permanent status
  • Usual Hours worked
  • Overtime Hours
  • Length of service with current employer
  • Union Membership Status
  • Occupation (UK SOC2010)
  • Highest level of education
  • Grossing Factor

Matching Process

The CSOPPSN was used as the common identifier between the LFS and EAADS data. The matched LFS/EAADS dataset contains the following variables, with the source dataset of each:

Variable                                   Dataset
CSOPPSN                                    EAADS/LFS
Gender                                     LFS
Public-Private sector status               EAADS
NACE Principal Business Activity           EAADS
Age                                        LFS
Nationality/Citizenship                    LFS
Gross Annual Earnings                      EAADS
Weeks Worked                               EAADS
Weekly Earnings                            EAADS
Supervisor status                          LFS
Full-time/Part-time status                 LFS
Temporary/Permanent status                 LFS
Usual Hours worked                         LFS
Overtime Hours                             LFS
Length of service with current employer    LFS
Union Membership Status                    LFS
Occupation (UK SOC2010)                    LFS
Grossing Factor                            LFS
Highest level of education                 LFS
Firm Size class (1-99 & 100+)              EAADS
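The match on CSOPPSN can be sketched as a join that attaches each respondent's principal EAADS employment to their LFS record and derives the two-way enterprise size class. This is a minimal sketch, not the CSO's code. Treating the match as an inner join (unmatched respondents dropped) is an assumption, and the field names `enterprise_employees` and `weekly_earnings` are hypothetical.

```python
def firm_size_class(enterprise_employees: int) -> str:
    """Two-way enterprise size split used in the matched file: 1-99 and 100+."""
    return "1-99" if enterprise_employees < 100 else "100+"

def match_lfs_eaads(lfs_records, eaads_by_ppsn):
    """Inner join on CSOPPSN: keep LFS respondents with a principal EAADS
    employment and attach the EAADS earnings and employer variables."""
    matched = []
    for rec in lfs_records:
        emp = eaads_by_ppsn.get(rec["csoppsn"])
        if emp is None:
            continue  # no EAADS employment for this respondent
        row = dict(rec)
        row.update(emp)
        row["firm_size_class"] = firm_size_class(emp["enterprise_employees"])
        matched.append(row)
    return matched

lfs = [{"csoppsn": "P1", "gender": "F"}, {"csoppsn": "P5", "gender": "M"}]
eaads = {"P1": {"weekly_earnings": 820.0, "sector": "Public", "enterprise_employees": 2500}}
matched = match_lfs_eaads(lfs, eaads)
```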

Grossing & Calibration

The LFS grossing factor was calibrated to the EAADS population using control totals for both of the following:

  • Gender, Public-Private sector status and Age class
  • Gender and NACE Sector
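This section does not state which calibration algorithm was used. One standard way to adjust weights so that they reproduce two sets of population totals at once is raking (iterative proportional fitting), sketched below as an illustration. The toy margins are invented, age class is omitted for brevity, and in the real calibration the targets would be EAADS population counts.

```python
def rake(weights, margins, max_iter=100, tol=1e-9):
    """Iterative proportional fitting (raking).

    weights: initial grossing factors, one per respondent.
    margins: list of (labels, targets) pairs; labels[i] is respondent i's
             category on that margin, targets maps category -> population total.
    Returns weights whose weighted counts match every margin (on convergence).
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for labels, targets in margins:
            totals = {}
            for wi, lab in zip(w, labels):
                totals[lab] = totals.get(lab, 0.0) + wi
            for i, lab in enumerate(labels):
                factor = targets[lab] / totals[lab]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:  # every margin already matched
            break
    return w

# Invented toy example: margin A = gender x sector, margin B = gender x NACE.
gender = ["F", "F", "F", "M", "M", "M"]
sector = ["Pub", "Pri", "Pri", "Pub", "Pri", "Pri"]
nace = ["P", "G", "P", "P", "G", "P"]
margin_a = (list(zip(gender, sector)),
            {("F", "Pub"): 300, ("F", "Pri"): 500, ("M", "Pub"): 200, ("M", "Pri"): 600})
margin_b = (list(zip(gender, nace)),
            {("F", "P"): 400, ("F", "G"): 400, ("M", "P"): 350, ("M", "G"): 450})
calibrated = rake([10.0] * 6, [margin_a, margin_b])
```

Raking changes each weight by a product of per-category factors. It therefore preserves the relative structure of the original grossing factors while matching both sets of totals.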

Additional Superannuation Contribution Deducted from Gross Pay - Quantitative Analysis

The public sector pension-related deduction (known as the pension levy) was introduced with effect from 1st March 2009 via the Financial Emergency Measures in the Public Interest Act 2009, which was originally enacted by the Oireachtas in February 2009. The rates and bands have been adjusted on several occasions since they were introduced.

In 2019, the pension levy was replaced by the additional superannuation contribution (ASC), which became a permanent measure under the Public Service Pay & Pensions Act 2017 (“the Act”). The ASC is deducted only from pensionable pay, which excludes overtime, allowances, bonuses and similar payments. As with the pension levy, the ASC rates and bands are subject to adjustment.

The results in this report are presented both with and without the public sector additional superannuation contribution.
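Presenting results "with and without" the ASC means comparing public sector earnings before and after a banded deduction applied to pensionable pay only. The sketch below shows the mechanics. The bands and rates are HYPOTHETICAL placeholders, not the statutory ASC values, and the function names are illustrative.

```python
# HYPOTHETICAL bands and rates for illustration only; the actual ASC bands
# and rates are set in legislation and have changed over time.
HYPOTHETICAL_ASC_BANDS = [
    (30_000.0, 0.0),         # first €30,000 of pensionable pay: 0%
    (60_000.0, 0.10),        # €30,000 to €60,000: 10%
    (float("inf"), 0.105),   # above €60,000: 10.5%
]

def asc_deduction(pensionable_pay: float, bands=HYPOTHETICAL_ASC_BANDS) -> float:
    """Banded (marginal-rate) deduction on annual pensionable pay."""
    deduction, lower = 0.0, 0.0
    for upper, rate in bands:
        if pensionable_pay > lower:
            deduction += (min(pensionable_pay, upper) - lower) * rate
        lower = upper
    return deduction

def earnings_net_of_asc(gross: float, pensionable: float, public: bool) -> float:
    """Gross earnings after ASC; the deduction applies to public sector
    employees only and is computed on pensionable pay, not gross pay."""
    return gross - asc_deduction(pensionable) if public else gross

# Gross pay of €55,000 includes €5,000 of overtime, which is not pensionable.
net_public = earnings_net_of_asc(55_000.0, 50_000.0, public=True)
net_private = earnings_net_of_asc(55_000.0, 50_000.0, public=False)
```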

Firm Size

Firm size is an important determinant of the public-private pay differential. In the previous two iterations of this paper (2011-2014 and 2015-2018), firm size at the local unit level was included in the study. However, a better data source for firm size, at the enterprise level, has become available in recent years. As wages are generally set at the enterprise level, moving from local unit to enterprise-level firm size was considered appropriate. This latest edition of the public-private sector pay differential therefore constitutes a break in series from the previous two iterations. The enterprise-level data has been backdated to 2015 and is included in this paper.

Results are presented both with and without size of enterprise. A table with the local unit figures is available in the Appendix chapter.

Methods used for analysis

The two methods used in this analysis are:

  • Ordinary least squares (OLS) regression
  • Quantile regression

In keeping with other published analysis examining the public-private sector pay differential (including previous analysis of NES data), the models used in this analysis concentrate on permanent, full-time employees aged between 25 and 59 years.
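The modelling population can be expressed as a simple filter on the matched file. The field names below are hypothetical, and the sketch assumes the 25 and 59 age bounds are inclusive.

```python
def in_analysis_subset(rec: dict) -> bool:
    """True for permanent, full-time employees aged 25 to 59 inclusive.

    Field names are hypothetical; inclusive age bounds are an assumption.
    """
    return bool(rec["permanent"] and rec["full_time"] and 25 <= rec["age"] <= 59)

people = [
    {"id": "P1", "permanent": True, "full_time": True, "age": 25},
    {"id": "P2", "permanent": True, "full_time": False, "age": 40},  # part-time
    {"id": "P3", "permanent": False, "full_time": True, "age": 45},  # temporary
    {"id": "P4", "permanent": True, "full_time": True, "age": 60},   # outside age range
]
analysis = [p["id"] for p in people if in_analysis_subset(p)]
```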

OLS regression

An ordinary least squares (OLS) regression was used to model the natural log of weekly earnings as a function of a set of explanatory variables that account for some of the variation in earnings. Details of the OLS methodology are available in the Detailed OLS Results chapter. This standard OLS model is widely used in the analysis of gender and public-private wage gaps in both the national and international literature. The approach adopted in this report is similar to that used in Belman and Heywood (2004) and uses the following explanatory variables:

  • Occupation
  • Educational attainment
  • Gender
  • Public or Private sector
  • Nationality/ Citizenship
  • Membership of a trade union
  • Age
  • Age-squared*
  • Size of enterprise
  • Length of service with current employer
  • Log of overtime hours worked
  • Log of hours worked
  • Supervisory status

* Age-squared was used as an explanatory variable to capture the non-linear relationship between earnings and age.
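The specification above can be written as a log-earnings equation. The notation is introduced here for illustration and is not the publication's own: w_i is weekly earnings, Public_i is a public sector indicator, and x_i collects the remaining explanatory variables listed above. A common way to read a log-point coefficient as a percentage differential is shown on the second line, although this section does not state how the CSO converts its coefficients. The third line shows why the age-squared term captures a non-linear age profile: when its coefficient is negative, predicted log earnings peak at that age.

```latex
\begin{aligned}
\ln w_i &= \beta_0 + \delta\,\mathrm{Public}_i + \gamma_1\,\mathrm{Age}_i + \gamma_2\,\mathrm{Age}_i^{2} + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \\
\text{differential (\%)} &= 100\,\bigl(e^{\hat{\delta}} - 1\bigr) \\
\mathrm{Age}^{*} &= -\frac{\hat{\gamma}_1}{2\hat{\gamma}_2} \qquad \text{(peak of the age profile, if } \hat{\gamma}_2 < 0\text{)}
\end{aligned}
```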

The approach is sometimes referred to as a hybrid approach (see Belman and Heywood, 1996; Bender and Elliott, 2007), in that it accounts both for differences in the characteristics of the employees in the two sectors and for differences in the characteristics of the workplace. Models both including and excluding size of the enterprise as an explanatory variable were considered in this analysis.
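Including or excluding enterprise size can change the estimated public sector coefficient when sector and enterprise size are correlated. The sketch below shows this on a small synthetic, noise-free example (invented numbers, not CSO data), using a hand-written OLS solver so that the example is self-contained. In practice a statistics library would be used.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, solved by
    Gauss-Jordan elimination with partial pivoting. Rows of X include a leading 1."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic illustration: log weekly earnings = 6.0 + 0.10*public + 0.05*large_enterprise.
obs = [(1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0)]  # (public, large_enterprise)
y = [6.0 + 0.10 * p + 0.05 * s for p, s in obs]
with_size = ols([[1.0, p, s] for p, s in obs], y)
without_size = ols([[1.0, p] for p, s in obs], y)
```

Here two thirds of the public sector observations are in large enterprises, compared with one third of private sector observations. Without the size variable, the public coefficient absorbs part of the enterprise-size effect: 0.10 + 0.05 × (1/3), instead of 0.10.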

Quantile Regression

OLS regression is limited in the information it can provide about earnings, as it estimates only the mean of (log) earnings conditional on the explanatory variables. Quantile regression is used when estimates are required at various points in the distribution (quantiles or percentiles) rather than only at the mean. It is widely used in the literature on the public-private sector wage gap, as it allows us to examine how the public sector differential varies across the earnings distribution.
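Quantile regression replaces the squared-error objective of OLS with the asymmetric "check" loss. The sketch below illustrates this in the simplest case: a model whose only regressor is a public/private dummy. In that case the objective separates by group. The intercept is the private sector τ-quantile, and the dummy coefficient is the public minus private τ-quantile. The numbers are synthetic and chosen only to show that the gap can differ across quantiles; they are not results. Models with the full set of covariates are fitted with dedicated routines in a statistics package.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def intercept_only_qr(y, tau):
    """Minimise the summed check loss over a constant. The objective is piecewise
    linear with kinks at the observations, so a minimiser lies at a data point."""
    return min(sorted(y), key=lambda q: sum(check_loss(v - q, tau) for v in y))

def public_gap_at_quantile(log_w_public, log_w_private, tau):
    """Dummy coefficient in a quantile regression on a public sector indicator only."""
    return intercept_only_qr(log_w_public, tau) - intercept_only_qr(log_w_private, tau)

# Synthetic log weekly earnings (invented for illustration).
private = [6.0, 6.2, 6.4, 6.6, 6.8]
public = [6.3, 6.4, 6.5, 6.6, 6.7]
gaps = {tau: public_gap_at_quantile(public, private, tau) for tau in (0.1, 0.5, 0.9)}
```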