Study area
Lanzhou City (N 35° 23'–37° 42', E 102° 24'–104° 33'; Fig. 1), located in western China, is the largest and most populous city in Gansu Province; by the end of 2019, the total registered population of the city was 3.79 million. The city has a typical temperate semi-arid climate and is surrounded by mountains. The local industries are mainly petrochemical, metallurgical, and machinery related.
Data collection
Three large general hospitals with complete electronic medical record systems were selected as data sources on the basis of their hospital admission volumes and geographical locations. The city lies on mountain slopes and descends from south to north, with a 40 km urban belt stretching along the river from west to east (see Additional file 1: Fig. S1). Residential areas are mainly distributed in strips from east to west. The three hospitals are located in the central district, have convenient transport access, and are surrounded by the high-density population regions of Lanzhou; all densely populated areas lie within a 15 km radius of the hospitals. They are the largest hospitals in Lanzhou, with 2500, 3500, and 2100 inpatient beds and 8420, 11,860, and 8530 inpatients reported in 2019, respectively. We selected these hospitals mainly because of their reputable level of medical care, sophisticated medical departments, and proven capability to diagnose and treat patients with CVD. They were estimated to serve roughly 75 percent of all patients in Lanzhou [20] and are the preferred choice for local patients with CVD. Daily counts of hospital admissions for CVD were collected from these three hospitals between 1 January 2014 and 12 December 2019. The collected data included each patient's date of admission, principal diagnosis, age, and gender. All patients with CVD were diagnosed by specialist physicians according to the International Classification of Diseases, 10th revision (ICD-10: I00–I99), and were selected according to the ICD-10 code of their primary diagnosis in the electronic medical record. We then calculated the daily count of CVD admissions (ICD-10 code: I00–I99).
In addition, we extracted cause-specific CVD hospitalizations, including IHD (ICD-10 code: I20–I25), heart rhythm disturbances (HRD, ICD-10 code: I44–I49), heart failure (HF, ICD-10 code: I50), and cerebrovascular events (CD, ICD-10 code: I60–I69), which are the most common cardiovascular diseases diagnosed in Lanzhou [20]. To avoid exposure misclassification, patients from locations other than Lanzhou were excluded. We also excluded patients with two or more hospitalization records in the hospital information system (HIS).
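The code-based selection described above can be illustrated with a minimal sketch. It assumes that principal diagnoses are stored as strings such as "I21.0"; the function name `classify_cvd` and the dictionary `CAUSE_GROUPS` are hypothetical and are not taken from the study's data pipeline.

```python
import re

# Cause-specific groups and their ICD-10 category ranges, as listed in the text.
# The names and data layout here are illustrative assumptions.
CAUSE_GROUPS = {
    "IHD": (20, 25),  # I20-I25
    "HRD": (44, 49),  # heart rhythm disturbances, I44-I49
    "HF": (50, 50),   # heart failure, I50
    "CD": (60, 69),   # cerebrovascular events, I60-I69
}


def classify_cvd(code):
    """Return (is_cvd, subgroup) for a principal-diagnosis ICD-10 code.

    is_cvd is True for any code in I00-I99; subgroup is a key of
    CAUSE_GROUPS, or None when the code falls outside those ranges.
    """
    match = re.fullmatch(r"I(\d{2})(\.[0-9A-Z]+)?", code.strip().upper())
    if match is None:
        return False, None
    category = int(match.group(1))
    for name, (low, high) in CAUSE_GROUPS.items():
        if low <= category <= high:
            return True, name
    return True, None
```

Daily counts would then follow by tallying, for each admission date, the records for which `is_cvd` is true or for which the subgroup matches.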
Daily (24 h) average concentrations of air pollutants, including particulate matter (PM2.5 and PM10, μg/m3), sulfur dioxide (SO2, μg/m3), nitrogen dioxide (NO2, μg/m3), and carbon monoxide (CO, mg/m3), together with the maximum daily 8 h moving average concentration of ozone (O38h, μg/m3), were obtained from the Lanzhou Environmental Monitoring Centre. The data were recorded continuously at 3 designated monitoring stations covering the urban districts of Lanzhou. In accordance with construction norms for air quality monitoring stations, these 3 stations are located away from pollution sources, urban traffic, and buildings; hence, the data obtained from them are representative of the overall level of air pollution in the city. In the Chinese air quality online monitoring system, PM2.5 and PM10 were monitored using a continuous automatic β-ray monitoring system, SO2 and O3 using ultraviolet fluorescence, NO2 by chemiluminescence, and CO by infrared absorption. All measurements were made in line with China’s National Air Quality Control standards (GB3095-2012). Because Lanzhou is a long, narrow city situated along a river valley approximately 40 km long from east to west and 3–8 km wide from north to south, the urban area is small (see Additional file 1: Fig. S1). The three hospitals and the three monitoring stations are therefore within 5–15 km of one another, and the average of the three monitoring stations can reasonably reflect the actual air pollution exposure in Lanzhou City. However, we were not able to geocode the locations of the monitoring stations or the residential addresses of the patients using the Baidu Map API.
Because the home address histories of patients were not recorded in a detailed and standardized form by the operators at the three hospitals, the addresses could not be converted into latitude and longitude coordinates with the Baidu Map API (http://api.map.baidu.com/lbsapi/) and managed in ArcGIS 10.0 (Redlands, CA, USA). It was therefore not possible to use spatial interpolation, or data from the nearest air quality monitoring station, to represent the exposure level of the hospital population. Following the relevant literature [8, 21, 22], the values from the three urban stations were averaged to give one daily concentration value each for PM2.5, PM10, SO2, and NO2, and these values were taken as the average pollutant exposure levels of urban residents, in accordance with the recommended methods.
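The station-averaging step can be sketched as follows. The station names and concentration values are invented for illustration, and the sketch assumes that every station reports a value on every day, which the text states was the case.

```python
from statistics import mean


def citywide_daily_mean(station_series):
    """Average per-station daily values into one city-wide daily series.

    station_series maps a station name to a list of daily values; all
    lists must be aligned on the same dates and have equal length.
    """
    lengths = {len(values) for values in station_series.values()}
    if len(lengths) != 1:
        raise ValueError("all stations must cover the same days")
    return [mean(day_values) for day_values in zip(*station_series.values())]


# Hypothetical PM2.5 readings (ug/m3) for three stations over three days.
pm25 = {
    "station_a": [40.0, 55.0, 61.0],
    "station_b": [44.0, 50.0, 70.0],
    "station_c": [48.0, 60.0, 64.0],
}
city_pm25 = citywide_daily_mean(pm25)
```

The same function would be applied separately to each pollutant, giving one exposure series per pollutant for the whole city.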
One weather monitoring station is located in the urban area of Lanzhou (E 103° 53', N 36° 03'), and most of the air quality monitoring stations in Lanzhou lie within a 19 km radius of it. Daily average temperature (°C) and relative humidity (%) were obtained from this station. For evaluating city-wide temperature effects on morbidity, a time-series model based on temperature from a single monitoring station performs comparably to a spatiotemporal model that uses spatially resolved temperatures [23]. Meteorological monitoring was conducted in accordance with the mandatory quality assurance/quality control (QA/QC) procedures set by the China Meteorological Administration, which ensures a high standard of meteorological data. No air pollutant or meteorological data were missing during the study period.
Statistical analyses
Daily hospital admissions for CVD are relatively small in number, and such count data approximately follow a Poisson distribution but are often over-dispersed. For this reason, we estimated the short-term associations of air pollutants and temperature with daily CVD morbidity using quasi-Poisson regression combined with a distributed lag nonlinear model (DLNM). Table 2 shows the correlations between weather conditions and air pollutants. Spearman correlation analysis indicated strong correlations among air pollutants. To avoid multicollinearity, only factors with a correlation of |r| < 0.8 were incorporated into the same model. We ran single-pollutant models, each including only one pollutant. The relationship between air pollutants and daily CVD morbidity was modeled as follows:
$$\begin{aligned} {\text{Log}}\left[ {E\left( {Y_{t} } \right)} \right]\; = & \;\alpha + \beta X_{t,l} \; + \;{\text{ns}}\left( {{\text{Tem}}_{t} ,{\text{df}}} \right)\; + \;{\text{ns}}\left( {{\text{rh}}_{t} ,{\text{df}}} \right) + {\text{ns}}\left( {{\text{Time}}_{t} ,{\text{df}}} \right) \\ & + \;{\text{ns}}\left( {{\text{Season}},{\text{df}}} \right) + {\text{ factor}}\left( {{\text{Dow}}_{t} } \right) + {\text{ factor}}\left( {{\text{Holiday}}_{t} } \right), \\ \end{aligned}$$
(1)
where Yt represents the count of hospital visits for CVD or other disease on day t; E(Yt) represents the expected number of daily hospital visits for CVD or other disease on day t; \(\alpha\) represents the intercept; Xt,l represents the cross-basis matrix obtained by applying the DLNM to the concentration of PM2.5, and β represents the vector of coefficients for Xt,l; l represents the lag day (we used a natural cubic spline for the nonlinear effect and a polynomial function for the lagged effect); ns() indicates the natural cubic spline smoother; Temt represents the daily average temperature on day t; and rht represents the daily average relative humidity on day t. Timet refers to calendar time. Season refers to the day of the year, which controls for seasonal trends. Dowt and Holidayt are dummy variables for day of the week and public holidays. ns(Temt, df), ns(rht, df), and ns(Timet, df) are natural cubic spline functions that control for potential nonlinear confounding by temperature, relative humidity, and the underlying temporal trend. According to the minimum Akaike information criterion for the quasi-Poisson model (Q-AIC), the optimal degrees of freedom (df) were set to 3 for both temperature and relative humidity and 5 per year for the time trend. ns(Season, df) adjusts for seasonal trends; its degrees of freedom were set to df = 4/year [24], and the four seasons, spring (March, April, May), summer (June, July, August), autumn (September, October, November), and winter (December, January, February), were included in the model.
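For reference, the Q-AIC is commonly written as below. The text does not state the formula, so this definition is an assumption about the criterion that was applied:

$$\begin{aligned} {\text{Q-AIC}} = - 2\,\mathcal{L}(\hat{\theta }) + 2\hat{\phi }k, \end{aligned}$$

where \(\mathcal{L}(\hat{\theta })\) is the log-likelihood of the fitted model evaluated at the estimated parameters \(\hat{\theta }\), \(\hat{\phi }\) is the estimated over-dispersion parameter, and \(k\) is the number of model parameters. Under this definition, the candidate df giving the smallest Q-AIC is retained.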
A similar approach was adopted to assess the association between temperature and the CVD hospitalization, and the model formula was defined as:
$$\begin{aligned} {\text{Log}}\left[ {E\left( {Y_{t} } \right)} \right]\; = & \;\alpha + \beta \;{\text{TEM}}_{t,l} \; + \;{\text{ns}}\left( {{\text{Time}}_{t} {\text{,df}}} \right) + {\text{ns}}\left( {{\text{rh}}_{t} {\text{,df}}} \right) + {\text{ns}}({\text{PM}}_{10t} {\text{,df}}) + {\text{ns}}({\text{SO}}_{2t} {\text{,df}}) \\ + \; & {\text{ns}}({\text{NO}}_{{{\text{2t}}}} {\text{,df}}) + {\text{ ns}}\left( {{\text{Season}},{\text{df}}} \right) + {\text{ factor}}({\text{Dow}}_{t} ) + {\text{ factor}}\left( {{\text{Holiday}}_{t} } \right), \\ \end{aligned}$$
(2)
where t represents the day of observation; E(Yt) represents the expected number of hospital admissions for CVD or other disease on day t; α represents the intercept; TEMt,l represents the cross-basis matrix obtained by applying the DLNM to temperature; β represents the vector of coefficients for TEMt,l; and l represents the lag days. A natural cubic spline with 5 degrees of freedom (df; knots at equally spaced percentiles by default) was used for the exposure–response relationship, and a natural cubic spline with 4 df (knots at equally spaced values on the log scale of lags by default) was used for the lag–response relationship. ns() represents the natural cubic spline, which was used to model the nonlinear covariates (e.g., time trend and relative humidity). Timet represents the long-term temporal trend and the seasonal trend. rht represents the relative humidity on day t. PM10t represents the concentration of particulate matter less than 10 μm in aerodynamic diameter on day t. SO2t represents the sulfur dioxide concentration on day t. NO2t represents the nitrogen dioxide concentration on day t. Season, Dowt, and Holidayt have the same meaning as in Eq. (1). The degrees of freedom were selected by minimizing the Akaike information criterion for the quasi-Poisson model (Q-AIC), giving smooth functions of Timet (df = 5), rht (df = 3), PM10t (df = 3), SO2t (df = 3), NO2t (df = 3), and Season (df = 4).
Because the effects of air pollutants may be delayed, both the single-day lag effects (lag0 to lag7) and the cumulative lag effects (lag01 to lag07) were analyzed. The lags with the greatest single-day and cumulative effects for each pollutant were used in further analysis. A cumulative lag was defined as the mean of the current day and the preceding 1–7 days (lag01 to lag07). We also explored the effect of daily temperature on total and cause-specific CVD hospitalizations, with 7 days as the maximum lag [16].
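The two lag definitions can be sketched as follows. The function names are hypothetical, and returning `None` for days without enough history is an illustrative choice; the text does not say how the first days of the series were handled.

```python
def single_lag(series, lag):
    """Exposure on day t - lag for each day t (lag0 is the same day).

    Days without enough history are returned as None.
    """
    return [series[t - lag] if t >= lag else None for t in range(len(series))]


def cumulative_lag(series, lag):
    """Mean exposure over days t, t-1, ..., t-lag (lag01 corresponds to lag=1).

    Days without enough history are returned as None.
    """
    result = []
    for t in range(len(series)):
        if t < lag:
            result.append(None)
        else:
            window = series[t - lag : t + 1]
            result.append(sum(window) / len(window))
    return result
```

For a daily series `x`, `single_lag(x, 3)` gives the lag3 exposure and `cumulative_lag(x, 3)` gives the lag03 moving average.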
In this study, a daily PM2.5 concentration of zero was used as the reference, and the relative risk (RR) with its 95% confidence interval (95% CI) was used to express the lag-specific and cumulative risk of CVD hospitalization for every 10 μg/m3 increase in PM2.5 concentration. For mean daily temperature, the 5th percentile (cold) and the 95th percentile (heat) were compared with the median temperature, and the corresponding RRs and 95% CIs were calculated.
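As a simplified illustration of how an RR per 10 μg/m3 is derived from a log-linear coefficient, the sketch below assumes a single coefficient `beta` (log relative rate per 1 μg/m3) with standard error `se`. In the DLNM itself the estimates come from cross-basis predictions, so this shows only the underlying arithmetic; the function name and arguments are hypothetical.

```python
import math


def relative_risk(beta, se, increment=10.0, z=1.96):
    """Return (RR, lower, upper) for an `increment`-unit rise in exposure.

    beta is the log relative rate per unit of exposure and se its standard
    error; z = 1.96 gives an approximate 95% confidence interval.
    """
    rr = math.exp(beta * increment)
    lower = math.exp((beta - z * se) * increment)
    upper = math.exp((beta + z * se) * increment)
    return rr, lower, upper
```

An interval that excludes 1 corresponds to an association that is statistically significant at the 5% level.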
In order to identify the susceptible populations, we also performed subgroup analysis by gender (male and female) and age (< 65 years and ≥ 65 years). We further conducted a Z-test to verify the statistical significance of the stratified analysis differences by using the formula below [25, 26]:
$$Z = \frac{{\beta_{1} - \beta_{2} }}{{\sqrt {{\text{SE}}_{1}^{2} \; + \;{\text{SE}}_{2}^{2} } }},$$
where \(\beta_{1}\) and \(\beta_{2}\) represent the estimates for the two categories, and \({\text{SE}}_{1}\) and \({\text{SE}}_{2}\) represent their respective standard errors.
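A minimal sketch of this test is given below. The function name is hypothetical, and the two-sided p-value from the standard normal distribution is an assumption about how significance was assessed.

```python
import math


def subgroup_z_test(beta1, se1, beta2, se2):
    """Z statistic and two-sided p-value for the difference of two estimates."""
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

For example, comparing the estimates for males and females would pass the two subgroup coefficients and their standard errors to `subgroup_z_test`.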
Residual plots and the Shapiro–Wilk normality test of residuals were used to assess the appropriateness of the models [27]. To check for residual autocorrelation, plots of the partial autocorrelation function (PACF) were examined to evaluate whether the parameter selections in the model were appropriate.
To assess the stability of the results, several sensitivity analyses were performed. First, we changed the df of the smooth function for the long-term trend (df: 6–10). Second, two-pollutant models were constructed to investigate confounding or combined effects of other pollutants, with the exception of the PM2.5 and PM10 pair, because of the high correlation between PM2.5 and PM10 (Spearman rank correlation of 0.86; Table 2). Third, analyses stratified by the three air quality monitors were performed to examine the stability of the model, using concentrations from a single (reference) site or averaged over multiple sites. We also plotted exposure–response curves for the associations of hospital admissions for total CVD or other disease with PM2.5 at different exposure concentrations. The curves were fitted using cubic spline functions with 4 degrees of freedom, in line with previous studies [28].
All statistical analyses were conducted with R software (version 3.6.3) using “dlnm” and “mgcv” packages.