Background Notes

CSO statistical publication, , 11am
Frontier Series Output

CSO Frontier Series outputs may use new methods which are under development and/or data sources which may be incomplete, for example new administrative data sources. Particular care must be taken when interpreting the statistics in this release.
Learn more about CSO Frontier Series outputs.

Reference period

This Frontiers in Statistics Output, Irish Family and Household Estimates from Administrative Data Source 2022, aims to estimate the number of family and household units in Ireland for April 2022. It uses pseudonymised administrative data from public sector bodies to produce experimental estimates of the families and households in Ireland and includes specific breakdowns into types of families and households.

Administrative data sources

The quality of administrative family and household counts ultimately depends on the quality, relevance, and availability of administrative data to the CSO. Note that individual records on administrative data sets were not checked or corrected as all data are pseudonymised. The purpose of this project is to demonstrate the potential for administrative data to be used to produce household and family estimates.

Data Protection

Protected Identifier Keys

Before using personal administrative data for statistical purposes, the CSO removes all identifying personal information. This includes the Personal Public Service Number (PPSN), a unique number used by people in Ireland to access social welfare benefits, personal taxation, and other public services. A pseudonymised Protected Identifier Key (PIK) is created by the CSO when the PPSN is removed. This PIK is unique and non-identifiable and is only used by the CSO.

Using the PIK enables the CSO to link and analyse data for statistical purposes, while protecting the security and confidentiality of the individual data. All records in the matched datasets are pseudonymised and the results are in the form of statistical aggregates which do not identify any individuals.

The linkage and analysis were undertaken by the Central Statistics Office (CSO) for statistical purposes in line with the Statistics Act, 1993 and the CSO Data Protocol. More information on transparency can be found on the CSO website.

CSO Policy and National Data Ecosystem

The CSO is committed to broadening the range of high-quality information it provides on societal and economic change. The large increase in the volume and nature of secondary data in recent years poses a variety of challenges and opportunities for institutes of national statistics. Joining secondary data sources in a safe manner across public service bodies, while adhering to statistical and data protection legislation, can provide new analysis and outputs to support decision-making and accountability in a way that is not possible using discrete datasets. Furthermore, a coordinated approach to data integration can lead to cost savings, greater efficiency, and a reduction in duplication.

The CSO has a formal role in coordinating the integration of statistical and administrative data across public service bodies that together make up the Irish Statistical System (ISS). Underpinning this integration is the development of a National Data Infrastructure (NDI) – a platform for linking data across the administrative system using unique identifiers for individuals, businesses, and locations. The data linking for statistical purposes is carried out by the CSO on pseudonymised datasets using only those variables which are relevant to the research being undertaken. A strong focus on data integration, which requires the collection and storage of identifiers such as PPSN and Eircodes, is a priority of the ISS in its goal of improving the analytical capacity of the system.

Data protection is a core principle of the CSO and is central to the development of the NDI. As well as the strict legal protections set out in the Statistics Act, 1993, and other existing regulations, we are committed to ensuring compliance with all data protection requirements. These include the Data Sharing and Governance Act (2019) and the General Data Protection Regulation (GDPR, EU 2016/679).

Data sources used to derive family and household estimates

Children included in Irish Family and Household Estimates using Administrative Data Sources are identified using the following data sources:

Child Benefit (CB)

The Child Benefit dataset contains information on eligible children’s benefit payments to parents/guardians. Data is supplied by the Department of Employment and Social Protection on an annual basis. The CRS Client file (see Central Records System (CRS) below) is used to identify children born in the year prior to the reference date and not yet in receipt of Child Benefit.

Primary Online Database (POD)

The Primary Online Database contains data on each student enrolled in each recognised primary school collected by the Department of Education. Data is supplied on an annual basis.

Post-Primary Online Database (P-POD)

The Post-Primary Online Database is a central database for student and some school data which is collected by the Department of Education. Data is supplied on an annual basis.

Primary Care Reimbursement Service (PCRS – GMS)

The PCRS is responsible for making payments to healthcare professionals (doctors, dentists, pharmacists, and optometrists/ophthalmologists) for the free or reduced-cost services they provide to the public across a range of community health schemes. The scheme is the infrastructure through which the HSE delivers a significant proportion of Primary Care to the public. PCRS also manages the National Medical Card Unit (NMCU), which was established in 2011 to process all Medical Card and GP Visit Card applications at a national level. Data is supplied by the HSE on an annual basis.

Students included in Irish Family and Household Estimates using Administrative Data Sources are identified using the following data sources:

Higher Education Authority (HEA)

The Higher Education Authority data provides details on annual enrolments and graduations from the publicly funded universities and institutes of technology in Ireland. Data is supplied by the HEA on an annual basis.

Programme Learner Support System (PLSS)

The Programme Learner Support System is used by SOLAS (an tSeirbhís Oideachais Leanúnaigh agus Scileanna) to manage course information, learner records, and reporting. SOLAS is the Further Education and Training Authority. It provides a clear, integrated pathway for learners seeking to enrol in Further Education and Training. Data is supplied by SOLAS on an annual basis.

Quality and Qualifications Ireland (QQI)

Quality and Qualifications Ireland is an amalgamation of the previously operational Further Education and Training Awards Council (FETAC); the Higher Education and Training Awards Council (HETAC); the Irish Universities Quality Board (IUQB) and the National Qualifications Authority of Ireland (NQAI). Data is supplied on an annual basis.

Student Universal Support Ireland (SUSI)

Student Universal Support Ireland contains funding information for all higher and further education grants. SUSI offers funding to eligible students in approved full-time, third-level education. Data is supplied on an annual basis.

HEA Springboard

The HEA Springboard and ICT dataset provides information on students who have undertaken HEA Springboard or ICT courses. This data includes course details and basic demographic information for enrolled students. Data is supplied by the HEA on an annual basis.

Employees, pensioners, and persons in receipt of welfare payments included in Irish Family and Household Estimates using Administrative Data Sources are identified using the following data sources:

DSP Payments (DSP)

The Department of Social Protection’s real-time database, drawn from the Business Object Model implementation (BOMi) and the Integrated Short-Term Payments System (ISTS), contains information on welfare payments, including state pension, unemployment benefit, and child benefit (adults only). Data is supplied monthly.

PAYE Modernisation (PMOD)

The Revenue Commissioners’ PAYE Modernisation (PMOD) dataset contains information on payslip submissions of persons in employment and on occupational pensions from 2019 onwards. Data is supplied monthly.

Self-employed persons included in Irish Family and Household Estimates using Administrative Data Sources are identified using the following data sources:

Form 11 Income Tax returns (ITForm11)

The ITForm11 contains the annual income tax returns of the self-employed. Data for a calendar year is only complete three years after the reference year, because of the nature of self-assessment, although the majority of records are available about 14 months after the reference year.

Linkage across multiple administrative datasets through a pseudonymised version of the PPSN used as a unique identifier ensures that persons who appear in more than one data source will be counted only once in the population. 
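As a rough illustration of the counting rule described above, the sketch below takes the union of pseudonymised keys across sources so that a person appearing in several datasets contributes one count. The function name, source labels, and key values are invented for illustration and do not represent the CSO's actual implementation.

```python
def count_unique_persons(sources):
    """Count distinct pseudonymised keys across several data sources.

    `sources` maps a source label to an iterable of pseudonymised keys.
    A key that appears in more than one source is counted once.
    """
    seen = set()
    for keys in sources.values():
        seen.update(keys)
    return len(seen)


# Hypothetical sample data: labels and key values are made up.
sources = {
    "PMOD": ["pik_001", "pik_002", "pik_003"],
    "DSP": ["pik_002", "pik_004"],
    "ITForm11": ["pik_003", "pik_005"],
}

unique_count = count_unique_persons(sources)
```

In this sample the three sources hold seven records in total, but only five distinct keys, so the person count is five.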

Data sources used to assign geography and other attribute variables include:

Residential Tenancies Board (RTB) Register

The Residential Tenancies Board register contains information on all tenancies registered by landlords, both private and Approved Housing Bodies (AHB). Data is supplied by the RTB on a quarterly basis.

Local Property Tax (LPT)

The LPT file contains one record, the most recent LPT return, for each of the properties in the State. A local property tax return is not an indicator of activity but is used, along with other sources, to determine location. Data is supplied by the Revenue Commissioners on an annual basis.

Central Records System (CRS)

The Central Records System is a legacy system within the Department of Social Protection (DSP) which brings together customer data held on different systems within the DSP. Data from the CRS used in this analysis includes information on age, sex, address, nationality, and relationships (for example dependent children and marital status). Data is supplied by the DSP on a quarterly basis.

Address data from the HEA dataset records students' usual place of residence outside term time and is used to assign up-to-date off-campus geography data for students.

Note: some individuals who are administratively active may not live in the State, for example professionals commuting to work from Northern Ireland or individuals living abroad and receiving a state pension. Persons are excluded from the population count where indicators of usual residence outside Ireland are available from the administrative data sources listed above.

Geography

Information about where people live is contained in many of the administrative data sources used to derive the estimated population count. The quality of this geographic information varies between administrative data sources and depends heavily on the coverage and accuracy of address information on each dataset. Where available, geocoded location data was first sourced from the RTB, LPT, HSE, PUP, SUSI, and HEA datasets. If good-quality geocoded location data was not available on these datasets, then data from the CRS was used.
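The fallback described above can be sketched as a simple priority lookup. This is a minimal illustration only: the text does not state an order of preference among the first six sources, so the order below (the order in which they are listed) is an assumption, and the function name and sample values are hypothetical.

```python
# Assumed order of preference: simply the order in which the sources are
# listed in the text. The real precedence rules are not documented here.
GEOCODED_SOURCES = ["RTB", "LPT", "HSE", "PUP", "SUSI", "HEA"]
FALLBACK_SOURCE = "CRS"


def assign_location(locations):
    """Pick a location for one person.

    `locations` maps a source label to a geocoded location, or to None
    when that source has no good-quality location for the person.
    Returns (source, location), or (None, None) if nothing is available.
    """
    for source in GEOCODED_SOURCES:
        location = locations.get(source)
        if location:
            return source, location
    location = locations.get(FALLBACK_SOURCE)
    if location:
        return FALLBACK_SOURCE, location
    return None, None


# Hypothetical example: no RTB location, so the LPT location is used.
chosen = assign_location({"RTB": None, "LPT": "LOC_B", "CRS": "LOC_C"})
```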

The presence of an accurate Eircode with an address on administrative data sources significantly enhances its statistical value for the purposes of releases like this. It facilitates data linkage and more accurate spatial analysis.

Over 90% of the person records in this release had Eircodes associated with them in the administrative data sets from which they were sourced. For the remaining records, the address data in administrative data sources was matched against the national address database to add an Eircode. This database contains the geographical details of over 2.3 million residential and commercial address points across the State. When this was not possible, primarily in the case of non-unique addresses, a Small Area code was added to the record in lieu of an Eircode.
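The three-step assignment described above (use the existing Eircode, otherwise match the address, otherwise fall back to a Small Area code) might look like the sketch below. It assumes exact matching on a normalised address string and assumes that a Small Area code is already available on the record; the actual matching method is not described in these notes, and all names, addresses, and codes here are invented placeholders.

```python
def normalise_address(address):
    """Uppercase the address and collapse punctuation and extra spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.upper())
    return " ".join(cleaned.split())


def build_address_index(address_db):
    """Map each normalised address to the list of Eircodes that share it."""
    index = {}
    for address, eircode in address_db:
        index.setdefault(normalise_address(address), []).append(eircode)
    return index


def assign_geography(record, index):
    """Return ("eircode", code), ("small_area", code) or (None, None)."""
    # Step 1: keep an Eircode that is already on the record.
    if record.get("eircode"):
        return "eircode", record["eircode"]
    # Step 2: accept an address match only when it is unique.
    matches = index.get(normalise_address(record.get("address", "")), [])
    if len(matches) == 1:
        return "eircode", matches[0]
    # Step 3: non-unique or unmatched address, so use the Small Area code.
    if record.get("small_area"):
        return "small_area", record["small_area"]
    return None, None


# Placeholder address database: two address points share one address string.
address_db = [
    ("1 Main Street, Townville", "X00 AAA1"),
    ("Ballyexample", "X00 BBB2"),
    ("Ballyexample", "X00 CCC3"),
]
index = build_address_index(address_db)

unique_match = assign_geography({"address": "1 main street  townville"}, index)
non_unique = assign_geography(
    {"address": "Ballyexample", "small_area": "SA_0001"}, index
)
```

Here the first record matches a single address point and receives its Eircode, while the second matches two address points and so receives only its Small Area code.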

Improved coverage of up-to-date Eircodes on the administrative data sets received by the CSO will facilitate the production of more robust household and family statistics. See also Methodology.

Data availability

The data sources for this analysis were chosen based on their capacity to meet the following criteria:

  • Timeliness and accessibility – the ease with which the CSO can obtain the data flow and understand the relevance and any limitations of the data and the variables collected in the data source.
  • Coherence – the ease with which the data source can be combined with other data sources.
  • Coverage – the extent to which the data source covers the various cohorts of the population.

The administrative data sources contributing to this project vary in respect of the criteria listed here. The administrative data landscape is constantly developing. The CSO will continue to assess the quality and availability of existing and new data sources as they become available for inclusion and further development of this project.