GESLA Version 3: A major update to the global higher-frequency sea-level dataset

This paper describes a major update to the quasi-global, higher-frequency sea-level dataset known as GESLA (Global Extreme Sea Level Analysis). Versions 1 (released 2009) and 2 (released 2016) of the dataset have been used in many published studies, across a wide range of oceanographic and coastal engineering-related investigations concerned with evaluating tides, storm surges, extreme


| INTRODUCTION
Having access to high-quality sea-level measurements worldwide is vital for many oceanographic and coastal applications. For example, sea-level records form the basis of our understanding of changes in mean sea level, which affects the livelihoods of hundreds of millions of people living in the world's coastal regions and is one of the key indicators of climate change (Oppenheimer et al., 2019). Coastal sea-level extremes are among the costliest and potentially most hazardous impacts affecting densely populated coastal regions (Wong et al., 2014). Analyses of sea-level records help engineers and coastal managers define flood defence heights and other coastal protection measures. Measurements of sea level are used to map the timing and heights of astronomical tides and to calibrate and validate both operational and scientific numerical models of oceanic processes (Muis et al., 2020). Furthermore, coastal sea-level measurements form a key component of the data used in nautical charts and geodetic surveys, and influence legal definitions of shoreline boundaries (Shalowitz, 1962). Building on an earlier study (Woodworth et al., 2017), this paper is concerned with extending a global dataset of higher-frequency (at least hourly) sea-level records from tide gauges at as many locations as possible worldwide.


The international body responsible for coordinating collection and access to in situ sea-level records is the Global Sea Level Observing System (GLOSS), which was established by the UNESCO Intergovernmental Oceanographic Commission (IOC) in 1985 to support a broad research and operational user base. Multiple GLOSS data centres contribute to the aggregation of global sea-level datasets with varying temporal resolutions and levels of quality control. Global datasets of monthly and annual mean sea levels have been available for many decades via the Permanent Service for Mean Sea Level (PSMSL). Established in 1933, PSMSL has been responsible for the collection of mean sea-level data from global tide gauges (Holgate et al., 2013), and its data have been used, with altimeter records, in most past mean sea-level trend and variability studies. PSMSL has always had good global coverage because, historically, tide gauge operators have been more willing to share monthly mean data than higher-frequency data. However, higher-frequency data are required for the study of ocean tides, storm surges, and extreme sea levels (Woodworth et al., 2019). The GLOSS dataset for research-quality hourly sea-level data is the Joint Archive for Sea Level (Caldwell et al., 2015), which was established in 1987 and is hosted by the University of Hawaii Sea Level Center (UHSLC). This dataset is composed of nearly 18,000 years of hourly sea-level data from 696 records in 97 countries. These data have been inspected for outliers, timing issues, and datum shifts, and efforts have been made to reconcile quality issues with the data originators. The locations of records in the UHSLC dataset are distributed globally, with care given to balance global coverage with the time-intensive process of quality assessment. Thus, the UHSLC dataset excludes many records in densely sampled regions to provide global coverage while maintaining an update cycle of approximately 2 years.

The GESLA (Global Extreme Sea Level Analysis) project was established over a decade ago to increase access to a greater volume of global hourly and even higher-frequency sea-level data than is available in the UHSLC dataset. The original aim of the project was to assemble as many higher-frequency sea-level records as were readily available into a common format with consistent quality control flags, making it easier for researchers to maximize the geographic density of data capturing extreme sea levels on a global scale. The first GESLA dataset, denoted GESLA-1, was assembled in 2009 and contained 21,197 years of higher-frequency measurements from 675 records. The majority of the data were obtained by ingesting UHSLC and other GLOSS data. The GLOSS datasets were then supplemented by a small number of other records obtained from national data centres or from contributions received from colleagues in the sea-level community. GESLA-1 was first used in a study of sea-level extremes by Menendez and Woodworth (2010). Subsequent publications based on GESLA-1 included, for example, Hunter et al. (2013), Mawdsley et al. (2015) and Marcos et al. (2015), and GESLA-1 was used in the Intergovernmental Panel on Climate Change's (IPCC) Fifth Assessment Report (Church et al., 2013; Rhein et al., 2013; Wong et al., 2014).

After some years, it became apparent that GESLA-1 needed updating to include additional data and to extend its coverage in under-represented areas. Thus GESLA-2 was assembled in 2015 and 2016. The compilation of GESLA-2 is described in detail in Woodworth et al. (2017). This second version contained almost twice the amount of data compared to the first. GESLA-2 contained 39,151 years of higher-frequency measurements of sea level from 1,355 records; again, the UHSLC dataset made up a significant proportion of this database. Since its release in early 2016, GESLA-2 has been used in a wide range of ocean research, examples including:

1. Assessment of temporal and spatial changes in extreme sea levels and links to regional climate (e.g., Marcos & Woodworth, 2017; Rashid et al., 2021).
2. Calculation of extreme sea-level return periods and sea-level allowances (e.g., Tsitsikas, 2018; Wahl et al., 2017; Woodworth et al., 2021).
3. Provision of information for flood inundation studies (e.g., Hunter et al., 2017).
4. Analysis of nonlinear interactions between tides and nontidal residuals or skew surges (e.g., Arns et al., 2020; Santamaria-Aguilar & Vafeidis, 2019).
5. Investigations of changes in ocean tidal constituents and levels (e.g., Ray, 2020; Schindelegger et al., 2018).
6. Examinations of the magnitude and changes in the perigean and nodal inter-annual tidal cycles (e.g., Peng et al., 2019; Woodworth & Hibbert, 2018).
7. Validation of regional and global ocean tide and tide/surge hydrodynamic models (e.g., Muis et al., 2020; Piccioni et al., 2018).
8. Assessment of compound flooding from coastal, fluvial, and pluvial sources (e.g., Ward et al., 2018).
9. Other applications (e.g., Tadesse et al., 2020; Wolff et al., 2018).

GESLA-2 data have also been used in the IPCC Special Report on Ocean and the Cryosphere (Oppenheimer et al., 2019), and in the Sixth Assessment Report (Fox-Kemper et al., 2021). Furthermore, a secondary database of tidal constituents has been derived from GESLA-2 by Piccioni et al. (2019), and another for skew surges has also been made available through the GESLA website, after Woodworth et al. (2017). All the studies that the authors are aware of that have used the GESLA dataset to date are listed on https://www.gesla.org. In 2016, GESLA was made an official GLOSS dataset.

In this paper, we describe the development of Version 3 of the dataset, which provides a major update. Section 2 of this paper describes the data sources, the data processing, and the revised GESLA data format. Access to the dataset is described in Section 3. A discussion and conclusions are given in Section 4.


| DATA DESCRIPTION AND DEVELOPMENT

Here, we describe the data sources, record locations, and the number of years of data (Section 2.1); outline the data processing and format (Section 2.2); describe the usage licences (Section 2.3); and discuss the dataset with regard to the recently established FAIR (findable, accessible, interoperable, and reusable) data principles (Section 2.4).


| Data sources

We obtained the higher-frequency sea-level dataset for GESLA-3 from 36 international and national data providers (Table 1). Providers are ordered by the number of years of sea-level data available (see Table 2). Below, we use the abbreviated names of the providers; readers should refer to Table 1 for their full names. We define the length of a sea-level dataset for a particular record as the number of years available, where a year is a calendar year containing one or more sea-level measurements for that particular record. We use the term record to refer to a sea-level dataset at a particular tide gauge. A specific tide gauge station can have more than one record, either because (a) duplicate records for that station are available from different providers or (b) the sea-level time series for the same station is sometimes split into different records when there are datum jumps or changes in the location or instrument (e.g., the UHSLC dataset contains such records, and these are denoted by letters, A, B, C, etc. after the station code).
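
The record-length definition above can be illustrated with a minimal sketch. The function name and the toy timestamps below are hypothetical and are not part of the GESLA processing code; the sketch only assumes that a record is available as a sequence of measurement times.

```python
from datetime import datetime


def record_length_years(timestamps):
    """Count calendar years that contain at least one measurement."""
    return len({t.year for t in timestamps})


# Hypothetical record: two readings in 1990, a gap, then one reading in 1993.
example = [
    datetime(1990, 1, 1, 0),
    datetime(1990, 6, 15, 12),
    datetime(1993, 12, 31, 23),
]
n_years = record_length_years(example)
```

Under this definition a single measurement is enough for a calendar year to count, so the length of a record in years can exceed the amount of data it actually contains.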

Data were obtained and processed as follows. First, full records were downloaded again from all the sources used to compile GESLA-2 (Table 2 in Woodworth et al., 2017), except where noted below. Therefore, any changes to quality control or datums made since 2015/16 are reflected in GESLA-3. GESLA-2 included 191 records from the GLOSS Delayed Mode dataset (source 1 glossdm-bodc, see Table 2 in Woodworth et al., 2017). However, this dataset has not been updated for many years and data from all but two of these records (Aasiaat and Maniitsoq in Greenland) are now available from other sources (see Table 1). Hence, we only included these two records in GESLA-3. GESLA-2 also included two datasets for Australia (sources 28 johnhunter and 29 national_tidal_centre, see Table 2 in Woodworth et al., 2017). We did not include either of these in GESLA-3; instead, we replaced them with a more up-to-date sea-level dataset compiled by BOM, with a greater number of records. Next, we obtained measurements from 16 additional providers that were not in GESLA-2 (indicated by the grey shading in Table 1). GESLA-3 now includes higher-frequency sea-level data obtained from paper records via data archaeology (DA) exercises. These included 21 records in the United States collated by Bromirski et al. (2003), Talke et al. (2014, 2018, 2020, 2021) [...] 2009) and 3 records in Spain digitized by Marcos et al. (2013, 2021). These datasets include the earliest higher-frequency data available for the Pacific Ocean (Astoria, 1855-1876 and San Francisco, 1858-1877) and stations on the US East Coast and Europe from the late 19th century. While some information such as the datum and time zone is available in GESLA-3 metadata for these DA sources, users are referred to the references above for more detailed discussions of data provenance and quality.

For five of the 36 sources within GESLA-3 (i.e., UHSLC, NOAA, NHS, MI-C, and MI-R), we downloaded the data automatically and rapidly via an Application Programming Interface (API). For the NHS dataset, we combined the more recent data from the late 1980s onwards, downloaded via API, with historical data going back as far as 1915 that were provided to us directly. For 25 of the 36 sources, we manually downloaded the data from provider websites. For some providers, the data could be downloaded in bulk. However, for other providers, the data had to be downloaded one record at a time. Furthermore, for a few providers, the data had to be downloaded in 1-15-year blocks for each record. For the remaining six sources (i.e., DA, DMI, NOC, ESEAS, ICG, and UZ), we obtained the data directly from the provider or copied the data from GESLA-2 (when updates were not available). The United States providers USGS, CDWR, SFWMD, NWFWMD, and NCDEM, and the Dutch provider RWS, did not distinguish between tidally influenced gauges and river-only gauges; in these cases, we hand-selected stations where tidal forcing was obviously present during at least part of the year, and we did not include the river-only records. The NOAA and MEDS datasets included records in the Great Lakes, and we retained these in GESLA-3.

In GESLA-1 and GESLA-2, we focused primarily on obtaining long records. However, many shorter records (a few days to a few years) are now being routinely provided by data centres. Furthermore, as described in Section 1, the GESLA dataset is increasingly being used for a wider range of analysis purposes. Short records, even those of only a month in duration, can prove useful for a variety of applications, including the calculation of harmonic constituents and the validation of numerical models. Therefore, for GESLA-3, we included all the higher-frequency records that were available from the 36 providers, as long as they had at least 30 days of measurements. As discussed below, the inclusion of short records is a primary reason why the number of records and years greatly increased in GESLA-3, compared to GESLA-2.
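
As a rough illustration of the 30-day inclusion rule, the sketch below keeps a record only if it has measurements on at least 30 distinct calendar days. This interpretation of "30 days of measurements" is an assumption, as are the function names and the synthetic hourly series; the text does not specify how the threshold was evaluated.

```python
from datetime import datetime, timedelta

MIN_DAYS = 30  # threshold stated in the text


def days_with_data(timestamps):
    """Number of distinct calendar days containing at least one measurement."""
    return len({t.date() for t in timestamps})


def keep_record(timestamps, min_days=MIN_DAYS):
    """Assumed inclusion test: at least `min_days` distinct days with data."""
    return days_with_data(timestamps) >= min_days


# Synthetic hourly series starting at midnight on 1 January 2020.
start = datetime(2020, 1, 1)
hourly_29_days = [start + timedelta(hours=h) for h in range(29 * 24)]
hourly_30_days = [start + timedelta(hours=h) for h in range(30 * 24)]

keep_short = keep_record(hourly_29_days)
keep_long = keep_record(hourly_30_days)
```

A stricter alternative would require 30 days' worth of samples at the record's nominal sampling interval; which criterion is more appropriate depends on how gaps are treated.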

For most sources, we obtained the so-called "delayed mode" or "research quality" data, which typically become available to a user with a delay of days to years, enabling the data centres to perform quality control and include flags to highlight periods of good, suspect, and bad data values. The latest years available for each source are listed in Table 2. For around half of the sources, we obtained data up to October 2021 (the date of the final processing of the dataset). Most other datasets included data until the end of 2019 or 2020.

The number of records and years of data in GESLA-3 is listed in Table 2 for each of the 36 contributing sources. In total, GESLA-3 contains 91,021 years from 5,119 records. A map showing the locations of the records for GESLA-3 is shown in Figure 1. The areas where the coverage has most improved, compared to GESLA-2, are North America (Figure 2a), Europe (Figure 2b), Japan (Figure 2c), and Australia (Figure 2d). This is illustrated clearly in Figure 3, which shows the location of new records in GESLA-3 that are more than 50 km from a record in GESLA-2. Coverage outside of these regions is primarily achieved by ingesting the UHSLC dataset, which continues to be updated with new data, but has remained consistent in terms of the number and location of included stations. Coverage in North America has increased enormously for several reasons. First, we added all datasets available from NOAA and MEDS, not just the longer datasets. Furthermore, we also incorporated new datasets from the USGS, CDWR, SFWMD, NWFWMD, NCDEM, and UNAM. In Europe, the largest increase in coverage stems from the records added from CMEMS. However, note that many of the records from CMEMS only cover more recent decades, and not the full period often available from other providers (e.g., for Newlyn, data are available from 1915 from BODC, but only from 1990 from CMEMS). We also added new datasets for the United Kingdom from the [...] et al., 2021). The ANCHORS methodology applied statistical techniques to remove stepwise changes in annual means resulting, for example, from datum shifts and tide gauge relocations, for long tide-gauge records. So that quality control processes applied in GESLA-3 are internally consistent, we only included unhomogenized data from ANCHORS records, which are then quality controlled as described in Section 2.2. In the process of developing ANCHORS, many additional shorter records suitable for GESLA-3 were identified and are also included here.
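
The 50 km criterion used for Figure 3 can be sketched as a great-circle distance test. The haversine formula, the mean Earth radius, the function names, and the coordinates below are assumptions chosen for illustration; the text does not state how the distances were actually computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def is_new_location(candidate, existing, threshold_km=50.0):
    """True if `candidate` is more than `threshold_km` from every existing site."""
    lat, lon = candidate
    return all(
        haversine_km(lat, lon, e_lat, e_lon) > threshold_km
        for e_lat, e_lon in existing
    )


# Hypothetical coordinates (latitude, longitude), not taken from GESLA.
older_sites = [(50.0, -5.0)]
near_site = (50.0, -5.1)   # a few km from the older site
far_site = (51.0, -5.0)    # about one degree of latitude away

near_is_new = is_new_location(near_site, older_sites)
far_is_new = is_new_location(far_site, older_sites)
```

A brute-force comparison like this scales with the product of the two record counts, which is manageable for a few thousand records; a spatial index would only be needed for much larger catalogues.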

In GESLA-3, records are available for 114 countries. The countries with the highest number of records are the United States and Canada, reflecting in part the vast length of the coastlines in these countries. The number of countries covered by each of the 36 contributing sources is listed in Table 2 (final column). The UHSLC dataset contains records from 97 countries, significantly more than any of the other sources. This illustrates how essential the UHSLC dataset is for achieving good global coverage in GESLA-3 (and earlier versions).

GESLA-3 contains 91,021 years of sea-level data (Table 2).The number of records containing different numbers of years is shown in Figure 4a.The record, with the most years of data (168 years between 1851 and 2021) is Olands Norra Udde from the SMHI, and the next longest record is Brest (165 years between 1846 and 2021) from REFMAR.The number of records with different ranges of years is shown in Figure 4b.The inclusion of many new short (i.e., <5 years) records is evident, but GESLA-3 also includes many new longer records, for example, for Japan from JODC_JCG, JODC_GIAJ, and JODC_PAHB, and for the United States and Europe from the DA sources.The record locations, with corresponding numbers of years, are shown in Figure 5.The majority of the sites with >100 years are located in North America and Europe.Four further sites are located in Panama and Australia.The number of records starting in particular year ranges is shown in Figure 4c.The location of records starting in the corresponding year ranges are shown in Figure 6.The earliest record, Katwijk in the Netherlands, starts in the In GESLA-3, records are available for 114 countries.The countries with the highest number of records are the United States and Canada, reflecting in part the vast length of the coastlines in these countries.The number of countries, covered by each of the 36 contributing sources is listed in Table 2 (final column).The UHSLC dataset contains records from 97 countries, significantly higher than any of the other sources.This illustrates how essential the UHSLC dataset is for achieving good global coverage in GESLA-3 (and earlier versions).
GESLA-3 contains 91,021 years of sea-level data (Table 2). The number of records containing different numbers of years is shown in Figure 4a. The record with the most years of data (168 years between 1851 and 2021) is Olands Norra Udde from the SMHI, and the next longest record is Brest (165 years between 1846 and 2021) from REFMAR. The number of records with different ranges of years is shown in Figure 4b. The inclusion of many new short (i.e., <5 years) records is evident, but GESLA-3 also includes many new longer records, for example, for Japan from JODC_JCG, JODC_GIAJ, and JODC_PAHB, and for the United States and Europe from the DA sources. The record locations, with corresponding numbers of years, are shown in Figure 5. The majority of the sites with >100 years are located in North America and Europe. Four further sites are located in Panama and Australia. The number of records starting in particular year ranges is shown in Figure 4c. The locations of records starting in the corresponding year ranges are shown in Figure 6. The earliest record, Katwijk in the Netherlands, starts in the year 1805 (but this record only contains 3 years). Hence, GESLA-3 spans the 217-year period from 1805 to 2021. The next earliest record, Saint Nazaire in France, starts in the year 1821 (this record contains 134 years of data). The number of records containing data each year between 1805 and 2021 is shown in Figure 4d for GESLA-3, plotted alongside the same information for the earlier GESLA-1 and GESLA-2 datasets. There is an apparent decline in data availability in the most recent few years. As discussed above, this is because the "delayed mode" or "research quality" data become available with a typical delay of a few years, during which data centres perform quality control.


| Data processing and format

The sea-level datasets we obtained from the 36 providers have differing units, time zones, and formats, and quality control flags are variously defined. As with GESLA-1 and GESLA-2, we converted height units to metres, adjusted the time zone of each record to Coordinated Universal Time (UTC), matched the specific data provider quality control flags to our defined GESLA flags (see below), and processed the records into a standard format (a slightly modified version of the GESLA-2 format, see below). USGS and CDWR used Daylight Saving Time in summer, and we first shifted these records to standard time before converting to UTC; however, because the times of the annual shifts between Daylight Saving Time and Standard Time are imperfectly documented, some errors may remain.
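The two-step time adjustment described above (daylight saving time to standard time, then standard time to UTC) can be sketched as follows. This is a minimal illustration rather than the processing code used for GESLA-3: the function name `local_to_utc`, its parameters, and the example offset of −8 hours (Pacific Standard Time) are assumptions made for the example, and the caller is assumed to know whether a given timestamp was recorded in daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc(stamp, standard_utc_offset_hours, in_daylight_time):
    """Convert a naive local timestamp to UTC in two steps (hypothetical helper).

    stamp: naive datetime as written in the provider file.
    standard_utc_offset_hours: offset of local standard time from UTC
        (e.g. -8 for Pacific Standard Time).
    in_daylight_time: True if the stamp was recorded in daylight saving time.
    """
    # Step 1: daylight saving time runs one hour ahead of standard time.
    if in_daylight_time:
        stamp = stamp - timedelta(hours=1)
    # Step 2: standard time = UTC + offset, so UTC = standard time - offset.
    utc = stamp - timedelta(hours=standard_utc_offset_hours)
    return utc.replace(tzinfo=timezone.utc)


# Noon recorded in summer (daylight) time and in winter (standard) time.
summer = local_to_utc(datetime(2015, 7, 1, 12, 0), -8, True)
winter = local_to_utc(datetime(2015, 1, 1, 12, 0), -8, False)
```

In practice the difficult part is deciding the value of `in_daylight_time` for each timestamp, which is exactly where the imperfect documentation of the annual shifts can introduce errors.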

In most instances, we did not adjust the frequency of the records, which in all cases was at least hourly, although several sources have data at higher frequency (6, 10, or 15 min). When given an option (e.g., on a provider's website), we always downloaded the hourly data rather than the higher-frequency data, as hourly data are adequate for most analyses that have previously been undertaken using GESLA, and this reduces the file sizes of the final processed datasets. Within the CMEMS dataset, the French data are provided at different frequencies for the same tide gauge. For example, the dataset at Brest is provided at 1-, 2-, 5-, 10-, and 60-min frequencies (for different overlapping periods). The higher-frequency records are generally much shorter, and the quality control is often less rigorous, so we ignored these and, in most instances, included only the hourly resolution dataset. The WSV data had a resolution of 1 min, and the USGS, CDWR, SFWMD, NWFWMD, and NCDEM data had resolutions between 1 and 15 min. We averaged these records to hourly values, again to reduce the file size of the processed dataset. To do this, we selected all the data that lay within plus or minus 30 min of a specific hour and averaged these values. Data from some providers are temporally regular (e.g., there is a date/time stamp every single hour), while for other providers the data are irregular (e.g., there is not a date/time stamp every hour; some are missing). In some cases, the frequency changes over time (e.g., the first part of the record is hourly, while the more recent period has a frequency of 15 min). We did not attempt to make the dataset temporally regular, or (with the exception of that mentioned above) adjust the frequency, as most analysis approaches can handle data with irregular time steps. Furthermore, we wanted the records to remain as consistent as possible with those provided by the originating agency.
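The hourly averaging described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `hourly_means` is hypothetical, and the half-open window (a sample exactly 30 min after an hour is assigned to the following hour) is an assumption, since the text does not say how samples on the window boundary were treated.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hourly_means(samples):
    """Average (time, value) samples into hourly values centred on each hour.

    A sample is assigned to hour H if H - 30 min <= time < H + 30 min.
    The half-open window is an assumption, chosen so that a sample falling
    exactly on the half hour is counted only once.
    """
    bins = defaultdict(list)
    for stamp, value in samples:
        # Shifting by 30 min and truncating to the hour gives the window centre.
        shifted = stamp + timedelta(minutes=30)
        centre = shifted.replace(minute=0, second=0, microsecond=0)
        bins[centre].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(bins.items())}


# Made-up 15-min samples around 12:00 (values in metres).
samples = [
    (datetime(2020, 1, 1, 11, 45), 1.0),
    (datetime(2020, 1, 1, 12, 0), 2.0),
    (datetime(2020, 1, 1, 12, 15), 3.0),
    (datetime(2020, 1, 1, 12, 30), 4.0),
]
hourly = hourly_means(samples)
```

Hours with no samples simply do not appear in the result, which matches the decision not to force the records onto a regular time axis.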


For consistency, we have kept the format of the GESLA-3 data files virtually the same as in GESLA-2. As illustrated in Table 3, each text file contains 41 lines of header information, followed by the data itself. In GESLA-2, we listed only the name of the contributor of the data. However, in GESLA-3 we have included two extra header lines recording the website and the contact details of the contributor. For the international data centres, such as the UHSLC and CMEMS, the data they provide originate from different national centres. To ensure the originators of the data receive the credit they deserve, and so that the data can be traced back to the original providers, we have included three extra header lines listing the originator of the data, their website, and contact details. Where the contributor and originator are the same, the information is simply repeated. In GESLA-3, we have also added a new header line to indicate the record length in years.

In GESLA-3, we have also added many new records located in the upper reaches of estuaries and tidally influenced rivers, and we hope these new records may help spur scientific innovation in these dynamic, highly anthropogenically affected regions (see reviews by Hoitink and Jay, 2016; Haigh et al., 2020; and Talke and Jay, 2020). To aid in analysis, another new header line has therefore been added to indicate the hydrographic environment of the tide-gauge location. This header line denotes whether a record is associated with a (a) coastal, (b) river, or (c) lake environment. We visually inspected each record and location, and distinguished between 'coastal' and 'river' stations based on whether the water level signal was clearly [...]. As mentioned earlier, if a record had no clear tidal signal for at least part of the year, it was removed. Lake stations are in regions hydraulically disconnected from the ocean. The lake sites are mostly in the Great Lakes (from NOAA or MEDS), although a small selection of sites is in the Ijsselmeer in the Netherlands (from RWS). We realize the subdivision into 'coastal' and 'river' is very difficult and somewhat subjective, but we hope it is useful for users of the dataset. In total, 4,159 records are classified as coastal, 784 as river, and 178 as lake. Users interested only in assessing trends in extreme sea levels from oceanographic sources may wish to select just the coastal records and ignore the records associated with river and lake stations.

FIGURE 2 Locations of the sea-level records in GESLA-2 and GESLA-3 for the four regions with the greatest coverage increase: (a) North America; (b) Europe; (c) Japan; and (d) Australia. Note the GESLA-2 locations are also in GESLA-3.

In each file, the data comprise five columns, separated by one or more spaces, consistent with GESLA-1 and GESLA-2. These are (a) the date, (b) the time, (c) the observed sea level, (d) the quality control flag, and (e) the flag indicating whether the data should be used for analysis or not. Each data value in GESLA-3 has therefore been assigned two flags. The first flag (in column 4) indicates the quality control undertaken by the provider. For this we use the following flags, to be consistent with GESLA-1 and GESLA-2: 0 for no quality control; 1 for correct value; 2 for interpolated value; 3 for doubtful value; 4 for an isolated spike or wrong value; and 5 for missing value (set to −99.9999). Where available, we matched each of the provider flags to our system. Due to the huge effort it would require, we did not undertake further extensive quality control of our own. However, we did visually inspect each record individually, and we manually flagged suspect values that were clearly outside of the normal range or were isolated spikes. It is clear that data quality is poor for some sources, and datum jumps do exist, so users should treat these particular records with caution. As discussed earlier, the overall record quality identifies the records that should be treated with caution. The second flag (in column 5) is a 1 or 0, indicating whether that value should be used for analysis or not, respectively. All values whose quality control flag was 0, 1, or 2 were set to analysis flag 1 (use), and all values whose quality control flag was 3, 4, or 5 were set to analysis flag 0 (do not use).
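The five-column layout and the mapping from the quality control flag to the analysis flag can be sketched as below. This is a minimal reader written for illustration, not one of the scripts distributed on the GESLA website: the function names, the date/time format string, and the sample lines are assumptions, while the 41-line header, the column order, and the flag mapping follow the description above.

```python
from datetime import datetime

HEADER_LINES = 41  # header length stated in the text


def analysis_flag(qc_flag):
    """Map the quality control flag (column 4) to the use flag (column 5)."""
    return 1 if qc_flag in (0, 1, 2) else 0


def parse_data_line(line, time_format="%Y/%m/%d %H:%M:%S"):
    """Split one data line into its five columns.

    The date/time layout in time_format is an assumption; adjust it to
    match the files being read.
    """
    date, time, level, qc, use = line.split()
    return {
        "time": datetime.strptime(f"{date} {time}", time_format),
        "sea_level": float(level),
        "qc_flag": int(qc),
        "use_flag": int(use),
    }


def usable_values(lines):
    """Skip the header and keep only values flagged for analysis."""
    rows = (parse_data_line(l) for l in lines[HEADER_LINES:] if l.strip())
    return [(r["time"], r["sea_level"]) for r in rows if r["use_flag"] == 1]


# Made-up file content: 41 placeholder header lines and two data lines.
example_lines = ["# placeholder header line"] * HEADER_LINES + [
    "2001/01/01 00:00:00   1.234  1  1",
    "2001/01/01 01:00:00 -99.9999  5  0",
]
values = usable_values(example_lines)
```

Filtering on the analysis flag in column 5, rather than re-deriving it from column 4, keeps any manual flagging applied during the visual inspection.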

The name of each file is made up of the (lower case) site name, site code, country code, and an abbreviation of the contributor name, separated by hyphens (e.g., brest-822a-fra-uhslc). For the DA records, we have added an underscore and the initials of the person who provided that record to the contributor abbreviation (e.g., da_mm for the three records provided by Marta Marcos). We have replaced all spaces in site names with an underscore. We have also removed all full stops, commas, brackets, accents, hyphens, and other special characters from file names and site codes. Hence, the file name and code might not exactly match those of the data provider. For country codes, we use the three-letter ISO 3166-1 alpha-3 codes (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-3).
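The file naming rules above can be sketched as a small helper. This is an illustration under stated assumptions, not the code used to name the GESLA-3 files: the function names `clean_part` and `gesla_file_name` are hypothetical, accents are assumed to be removed by dropping the diacritic and keeping the base letter, and the site code is assumed to be lower-cased (as in the brest-822a-fra-uhslc example).

```python
import re
import unicodedata


def clean_part(text):
    """Lower-case a name, turn spaces into underscores, drop special characters."""
    # Decompose accented letters and discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    ascii_text = ascii_text.strip().lower().replace(" ", "_")
    # Keep only letters, digits, and underscores.
    return re.sub(r"[^a-z0-9_]", "", ascii_text)


def gesla_file_name(site_name, site_code, country_code, contributor):
    """Join the four name components with hyphens (hypothetical helper)."""
    parts = [
        clean_part(site_name),
        clean_part(site_code),
        country_code.lower(),
        contributor.lower(),
    ]
    return "-".join(parts)


name = gesla_file_name("Brest", "822A", "FRA", "UHSLC")
cleaned = clean_part("St. John's (East)")
```

Because punctuation is stripped before the components are joined, the only hyphens left in a file name are the separators, so a name can be split back into its four components on the hyphen.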


| Data licence

The developers of GESLA-1 only used data that had been provided to them on a personal basis, knowing how it was intended to be used. The dataset was subsequently made available only to trusted scientific users. For GESLA-2, the team divided the dataset into 27 'public' and 3 'private' sub-sets. Subject to the acknowledgement of the data owner, the 'public' dataset was readily available to download from the GESLA website and could be used for both research and consultancy purposes. However, the 'private' dataset could only be used for research and not for consultancy. It could only be obtained from the GESLA website with a password; to be given the password, bona fide researchers had to contact the GESLA team with an explanation of why they would like access to the dataset.

To simplify the process, we have decided not to separate the GESLA-3 data into two sets on the GESLA website. Instead, we have examined the licences associated with each data contributor, where available, included a link to the specific licence in Table 1, and trust the users to comply with the licence conditions. Table 1 also lists whether the data can be freely used for research and/or consultancy. For example, users wishing to use the records provided by CV, UZ, and CMEMS for consultancy purposes must first contact these organizations to obtain permission (or, in the case of CMEMS, the organizations that provided the data to them). In GESLA-2, the Australian records were included in the 'private' sub-set. However, we are pleased that in GESLA-3 permission has been obtained to make these Australian records publicly available.

In summary, the data are accessible but are covered by several different licences, some of which are noncommercial, by-attribution, or a combination of conditions. Access to the data does not currently require authentication, so restricted data are open to all, and we ask users to comply with the licence conditions. In acknowledgment of the central role of the UHSLC dataset in GESLA-3 (and earlier versions) and the decades-long effort to collect and quality assess the UHSLC data, we request that users of GESLA-3 data cite Caldwell et al. (2015) in addition to this paper in their work.

FIGURE 3 Location of sea-level records in GESLA-3 that are more than 50 km from a record in GESLA-2.


| Data principles

While constructing this third version of GESLA, we carefully considered the FAIR data principles conceived by Wilkinson et al. (2016); that is, that data should be findable, accessible, interoperable, and reusable. These principles also help ensure that proper credit is given to all those involved in the data lifecycle. In GESLA-3, we have implemented several improvements compared to GESLA-2, to move the dataset towards being FAIR-compliant. The data archived with the BODC Published Data Library (PDL) have been assessed against the GO-FAIR criteria (https://www.go-fair.org/fair-principles/) and, at the time of writing, partially meet the criteria. The GESLA-3 data are assigned a globally unique and persistent identifier, and the metadata contain the identifier of the data (the DOI universally unique identifier, UUID, is given on the landing page). The datasets are findable in searchable resources, such as Google Dataset Search, and are included in metadata directories (e.g., the European Directory of Marine Environmental). The file header metadata have been improved since GESLA-2, as we have differentiated between who has contributed the data (# CONTRIBUTOR) and where the data originated from (# ORIGINATOR), but in the next version we could look to implement more of the minimum mandatory metadata as detailed in the EuroSea deliverable D3.3 (Pérez Gómez et al., 2021).


We are working towards making the GESLA-3 data more interoperable. We have started to implement the use of some controlled vocabularies (e.g., ISO 3166-1 alpha-3 for country code), but in future versions we would like to include controlled vocabularies for other metadata. These would include using vocabularies such as the Research Organization Registry (https://ror.org) or the European Directory of Marine Organizations (https://edmo.seadatanet.org) for organizations, and SeaDataNet (https://www.seadatanet.org) for coordinate and datum information. The data can easily be converted into NetCDF (Network Common Data Form) files, and we hope to archive and distribute these data via an ERDDAP data server in the future, where allowable. We have also provided computer scripts on the GESLA website in a variety of programming languages (e.g., MATLAB, Python, and R), to allow users to easily load the dataset for scientific analysis.


| DATASET ACCESS

The 5,119 records in GESLA-3, and copies of the earlier two versions of the dataset, can be obtained from https://www.gesla.org. Furthermore, we now also provide a comma-delimited ASCII file containing information about each record and a Keyhole Markup Language (KML) file, which can be opened, for example, in Google Earth, to show record locations and information. On the GESLA website, we keep a list of any problems that we, or others, identify with the data, which we subsequently correct.


The GESLA-3 dataset has also been archived with the BODC, in two parts. The first part, which can be obtained from https://www.bodc.ac.uk/data/published_data_library/catalogue/10.5285/d21a496a-a48e-1f21-e053-6c86abc08512/, contains the 4,527 records that can be used for both research and consultancy purposes and is covered by a Creative Commons CC BY 4.0 licence. The second part, which can be downloaded from https://www.bodc.ac.uk/data/published_data_library/catalogue/10.5285/d21a496a-a48f-1f21-e053-6c86abc08512/, contains the 592 records that can be used for research purposes but not for consultancy, and is covered by a Creative Commons CC BY-NC 4.0 licence.


| DISCUSSION AND CONCLUSIONS

This paper has described the assembly of the third version of the GESLA dataset. GESLA-3 is a major update, containing 91,021 years of sea-level observations, more than double that of GESLA-2. The 5,119 records in GESLA-3 are nearly four times the number in GESLA-2. Many of the records are now available to October 2021, encompassing an extra 6 or 7 years of data compared to GESLA-2. Furthermore, new records have been added, improving spatial coverage, especially in North America, Europe, Japan, and Australia. In particular, we have added many new records for stations located in the upper reaches of estuaries and tidally influenced rivers.

There is some duplication between records provided by the different sources. For example, a record for Brest is provided by UHSLC, REFMAR, and CMEMS, and the data for Newlyn are provided by UHSLC, BODC, and CMEMS. Some duplicate records may be present in USGS and CDWR data, or NOAA and USGS. In some cases, two agencies may operate gauges within several kilometres of each other (e.g., the USGS and NOAA at Vancouver, Washington, or USGS and NOAA at Fort Pulaski, Georgia). The level of quality control may also differ between providers, and the data lengths might not be consistent (e.g., the UHSLC and BODC datasets for Newlyn start in 1915, whereas the CMEMS record starts in 1990). At a tide gauge site with more than one record, we advise users to utilize the longest record, and preferably also the most up-to-date; a complementary strategy would be to use the agency giving the most attention to data quality (e.g., UHSLC in many cases) or the agency with the most experience measuring sea level (e.g., in a US context, this is likely to be NOAA). Our choice to minimize data processing, and remain as consistent as possible with the originating agency, provides more freedom but also puts more responsibility on the end-user. We recommend, therefore, that researchers do due diligence and carry out additional quality assurance that is commensurate with their goals and needs. We are in the process of making a list of the tide gauge sites with duplicate records, and will make this available on the GESLA website in the future. We also hope to add derived products in the future (e.g., time series of astronomical tides and skew surges).
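The advice to prefer the longest, and then the most up-to-date, record at a duplicated site can be sketched as follows. This is only an illustration of that selection rule: the function name `pick_longest`, the dictionary keys, and the example values are all hypothetical, and the sketch assumes duplicates have already been grouped under a common site label, which in practice may require matching on location rather than name.

```python
def pick_longest(records):
    """For each site, keep the record with the most years of data.

    Ties are broken in favour of the record that ends most recently.
    """
    best = {}
    for rec in records:
        key = rec["site"]
        rank = (rec["years"], rec["end_year"])
        if key not in best or rank > (best[key]["years"], best[key]["end_year"]):
            best[key] = rec
    return best


# Made-up duplicate records for one site from three hypothetical providers.
records = [
    {"site": "example_site", "provider": "agency_a", "years": 100, "end_year": 2018},
    {"site": "example_site", "provider": "agency_b", "years": 100, "end_year": 2021},
    {"site": "example_site", "provider": "agency_c", "years": 30, "end_year": 2021},
]
chosen = pick_longest(records)
```

A selection based on data quality or agency experience, the complementary strategy mentioned above, would need a different ranking and cannot be derived from record length alone.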

Despite the large improvement in the number of records and the number of years available, further improvements in the GESLA database are possible and desirable. As Woodworth et al. (2017) pointed out, GESLA-2 did not contain any data from India, for example, and only a few Bangladeshi, Russian, and Chinese sites are made available via UHSLC. Mean sea-level data are available via PSMSL for these countries, but higher-frequency data are not distributed to the international community. A number of data series are only available commercially (e.g., from the National Mapping and Resource Information Authority [NAMRIA] in the Philippines or the Mekong Commission in Vietnam), and are therefore not included in GESLA-3. For example, only a fraction of the more than 1,000 years of data from the Philippines, spread over >50 stations, is available in GESLA-3 (through the UHSLC data set), though data can be purchased. Coverage across South America and Africa could also be better, although this primarily reflects a smaller number of operational stations rather than a lack of access to data. Additional records exist even in regions with high data coverage, for example for the Mississippi Delta from the US Army Corps of Engineers, or from German authorities along the Ems River Estuary. Earlier digital records from our providers (Tables 1 and 2) are often unavailable online. For example, many USGS records from pre-2007 are unavailable (e.g., from Florida) due to uncertain data control. In Germany, many digital records are only in high water/low water format and are unavailable online; a similar issue exists for data archaeology efforts (such as the high water/low water record from 1875 to the present made available in Ralston et al., 2019). In the future, the GESLA effort may therefore include a separate database for high water/low water or irregularly measured data, since these are often critical for assessing long-term trends in extremes (e.g., Dangendorf et al., 2013). Continued data archaeology efforts are needed; a number of records remain in nonelectronic format, even up to the 1980s, sometimes in formats only readable by specialized machines (e.g., Talke and Jay, 2017). Many thousands of years of additional records remain to be digitized, quality-assured, and published from around the Pacific Rim, North America, and Europe (e.g., Bradshaw et al., 2015; Pouvreau, 2008; Talke and Jay, 2013; Talke and Jay, 2017). Many historical records in other countries likely remain undocumented, undigitized, or otherwise unavailable. As these records become available, they will be added to the GESLA-3 database.


Therefore, sea-level data archaeology efforts remain vital for improving 19th- and 20th-century data coverage. Due to the time-consuming nature of this work, updates to GESLA have been made at 5- or 6-year intervals. Because data providers have recently made it easier to obtain datasets via website downloads or APIs, we now hope to update the records more frequently. We also hope to continue to add new records from additional data providers as we become aware of them. In GESLA-3, we have added, for the first time, 29 records captured recently from exercises in data archaeology; in the future, we hope to add many more records of this nature. We ask readers, and encourage data providers, to contact us with details of any higher-frequency records that are available but not currently in GESLA; we will endeavour to include these in future releases. As mentioned earlier, we also hope in the future to make GESLA data available via an ERDDAP data server.

While assembling GESLA-3, we became aware of a recently assembled sea-level dataset called MISELA (Minute Sea-Level Analysis) (Zemunik et al., 2021). This contains 1-minute sea-level data at 331 tide gauges worldwide, as required for studying oceanographic processes such as seiches, meteotsunamis, infragravity waves, and coastal waves. We welcome this new dataset. Combined, the PSMSL, UHSLC, GESLA, and MISELA databases now allow for assessments of sea-level change across the full spectrum of frequencies of interest.

In concluding their paper, Woodworth et al. (2017) noted that the two scientists (Philip Woodworth and John Hunter) who carried out the bulk of the construction of GESLA-2 had now retired. Now, under new leadership, the GESLA initiative continues, and the number of studies that use GESLA continues to grow. We are confident that further advances in the understanding of ocean tides, storm surges, extreme sea levels, and other relevant coastal processes will stem from this new release and enhance insight into how coastal communities might respond to sea-level rise, extreme events, and climate change.



, Familkhalili and Talke (2016), Chant et al. (2018), and Ray and Talke (2019), 5 records in the United Kingdom by Haigh et al. (






added a new header line to indicate the overall record quality, to aid the range of users of GESLA. A brief, qualitative expert judgement assessment was made by visually inspecting every record in GESLA-3. Based on this evaluation, we now indicate if that record has (a) no obvious issues; (b) possible datum issues; (c) possible quality control issues; and (d) possible datum and quality control issues. In total, 4,747 records are classified as having no obvious issues, 149 as having possible datum issues, 179 as having possible quality control issues, and 46 as having possible datum and quality control issues. Users who want to assess trends in extreme sea levels might, for example, only use long records identified to have no obvious issues. By contrast, users who are interested in shorter time periods (e.g., for hydrodynamic model validation or investigation of a specific event) might choose to use all available records.
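The two use cases described above can be sketched as simple filters on record metadata. The following Python example is illustrative only: the `Quality` labels, the `RecordMeta` fields, the station names, and the 30-year threshold for a "long" record are assumptions made for this sketch and do not reflect the actual header format of the GESLA-3 files.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    # Letters (a)-(d) mirror the four categories in the text;
    # the enum and its member names are hypothetical.
    NO_OBVIOUS_ISSUES = "a"
    POSSIBLE_DATUM_ISSUES = "b"
    POSSIBLE_QC_ISSUES = "c"
    POSSIBLE_DATUM_AND_QC_ISSUES = "d"


@dataclass(frozen=True)
class RecordMeta:
    """Hypothetical per-record metadata: name, length in years, quality label."""
    name: str
    years: float
    quality: Quality


def select_for_trends(records, min_years=30.0):
    """Keep only long records with no obvious issues (threshold is assumed)."""
    return [r for r in records
            if r.quality is Quality.NO_OBVIOUS_ISSUES and r.years >= min_years]


def select_for_event_study(records):
    """Keep every record, e.g. for model validation of a specific event."""
    return list(records)


# Made-up example records, not real GESLA-3 stations.
records = [
    RecordMeta("station_1", 95.0, Quality.NO_OBVIOUS_ISSUES),
    RecordMeta("station_2", 60.0, Quality.POSSIBLE_DATUM_ISSUES),
    RecordMeta("station_3", 4.5, Quality.NO_OBVIOUS_ISSUES),
    RecordMeta("station_4", 40.0, Quality.POSSIBLE_QC_ISSUES),
]

trend_names = [r.name for r in select_for_trends(records)]
event_names = [r.name for r in select_for_event_study(records)]
print(trend_names, len(event_names))
```

A stricter or looser length threshold can be passed through `min_years`, since the text does not specify what counts as a long record.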


FIGURE 1 (a) Locations of the sea-level records in GESLA-3; with histograms of the record count plotted along (b) y-axis for latitude and (c) x-axis for longitude

dominated by tidal or river influences, considering the distance from the open coastline. "River" stations were classified as those where a strong river influence is evident in the water levels (and they are often some distance from the open coastline), whereas "coastal sites" were classified as those where the tidal component was clearly dominant.


FIGURE 4 (a) Number of records with the stated number of years of data in GESLA-3 (note the logarithmic scale); (b) the number of records with a particular number of years; (c) the number of records with data starting in a particular span of years; and (d) the number of records with data in a particular year in GESLA-3, -2 and -1 (note the logarithmic scale)


FIGURE 5 Locations of records with: (a) <5 station years; (b) 5-10; (c) 10-20; (d) 20-50; (e) 50-100; and (f) >100


FIGURE 6 Locations of records with data starting in the years: (a) before 1850; (b) 1850-1900; (c) 1900-1950; (d) 1950-2000; (e) 2000-2015; and (f) after


Number Abbreviated name Full name Website Country Download method Licence Use
Woodworth et al. (2017)ogy represents the same provider as the National Tidal Centre Australia in Table 2 of Woodworth et al. (2017).