Temperature forecast


Aug, 2025

Time-series forecasting

Background

In recent summers, southern Navarra has experienced very hot weather, particularly in the area known as La Ribera, near the banks of the Ebro river. My wife’s father is originally from there, and she remembers that during her childhood holidays, there were also some extremely hot days. But were those summers truly as hot as the ones we experience today? She is also concerned about rising temperatures in the coming years, as she fears the area could become too hot to spend time in. So she asked me to look into the data, and here are the results.

The data

The weather station I selected is located in Cadreita (42°12’28.0”N, 1°43’05.9”W).

Daily maximum and minimum temperatures, along with rainfall data, were available in CSV format from 1985 to mid-2025 on this site: https://meteo.navarra.es/estaciones/estacion.cfm?IDEstacion=96

Since the data could only be downloaded one year at a time, I wrote a script to combine all the files into a single dataset containing the complete series.
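The combining script is not shown. The following is a minimal sketch of one way to do it with pandas. It assumes that every yearly file has the same columns and that the date column is `FECHA`, which is the index name seen in the combined dataset. The function name is hypothetical. The real export may need extra `read_csv` options such as `sep`, `decimal`, `encoding` or `dayfirst`, so the function passes them through rather than guessing them.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_yearly_csvs(folder, pattern="*.csv", date_col="FECHA", **read_kwargs):
    """Concatenate one-file-per-year CSV downloads into a single date-indexed frame.

    Extra keyword arguments (e.g. sep=";", decimal=",", encoding="latin-1",
    dayfirst=True) are passed straight to pandas.read_csv, because the exact
    format of the downloaded files is an assumption here.
    """
    files = sorted(Path(folder).glob(pattern))
    frames = [
        pd.read_csv(f, parse_dates=[date_col], index_col=date_col, **read_kwargs)
        for f in files
    ]
    combined = pd.concat(frames).sort_index()
    # If two downloads overlap, keep only the first row for each date
    return combined[~combined.index.duplicated(keep="first")]


# Demo with two tiny files built from four rows of the combined table below
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1985.csv").write_text(
        "FECHA,rainfall,tmax,tmin\n1985-01-02,0.0,10.0,4.0\n1985-01-03,0.0,12.0,-1.0\n"
    )
    Path(tmp, "2025.csv").write_text(
        "FECHA,rainfall,tmax,tmin\n2025-05-30,0.0,36.0,14.0\n2025-05-31,0.0,35.0,13.0\n"
    )
    df = combine_yearly_csvs(tmp)

print(df)
```

Sorting the index after concatenation means the files can be read in any order. Dropping duplicated dates protects against overlapping downloads.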

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14761 entries, 1985-01-01 to 2025-05-31
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   rainfall  14600 non-null  float64
 1   tmax      14532 non-null  float64
 2   tmin      14532 non-null  float64
dtypes: float64(3)
memory usage: 461.3 KB

	rainfall	tmax	tmin
FECHA
1985-01-01	0.0	NaN	NaN
1985-01-02	0.0	10.0	4.0
1985-01-03	0.0	12.0	-1.0
1985-01-04	0.0	13.0	-5.5
1985-01-05	3.8	2.5	-5.0
...	...	...	...
2025-05-27	0.0	28.5	10.0
2025-05-28	0.0	31.5	10.0
2025-05-29	0.0	35.0	12.0
2025-05-30	0.0	36.0	14.0
2025-05-31	0.0	35.0	13.0

14761 rows × 3 columns

Exploratory analysis

Yearly temperatures

I subset the temperature columns from the dataframe, dropping rows with missing values.

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14532 entries, 1985-01-02 to 2025-05-31
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   tmin    14532 non-null  float64
 1   tmax    14532 non-null  float64
dtypes: float64(2)
memory usage: 340.6 KB

I create a new column, tavg, with the average temperature for the day, taken as the midpoint of tmin and tmax.

	tmin	tmax	tavg
FECHA
1985-01-02	4.0	10.0	7.00
1985-01-03	-1.0	12.0	5.50
1985-01-04	-5.5	13.0	3.75
1985-01-05	-5.0	2.5	-1.25
1985-01-07	-7.0	1.5	-2.75
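The two steps above (subsetting and the daily average) can be sketched as follows. The function name is hypothetical. The midpoint formula reproduces the values in the table, for example (4.0 + 10.0) / 2 = 7.00. The toy input reuses the first rows of the dataset shown earlier.

```python
import pandas as pd


def daily_temperatures(df):
    """Keep tmin/tmax, drop days missing either value, and add the daily average."""
    temps = df[["tmin", "tmax"]].dropna()
    # tavg is taken as the midpoint of the daily minimum and maximum
    return temps.assign(tavg=(temps["tmin"] + temps["tmax"]) / 2)


# The first rows of the dataset shown earlier (1985-01-01 has no temperature data)
raw = pd.DataFrame(
    {
        "rainfall": [0.0, 0.0, 0.0, 0.0],
        "tmax": [None, 10.0, 12.0, 13.0],
        "tmin": [None, 4.0, -1.0, -5.5],
    },
    index=pd.DatetimeIndex(
        ["1985-01-01", "1985-01-02", "1985-01-03", "1985-01-04"], name="FECHA"
    ),
)
temps = daily_temperatures(raw)
print(temps)
```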

The data for 2025 is incomplete (available only until May 31, 2025), so I will use data up to December 31, 2024, in order to work with full years.

I will then calculate yearly mean values and a 10-year moving average to smooth them, and plot the results.
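The original code for this step is not shown. The following is a minimal sketch with pandas, assuming a trailing window (each point averages the current year and the nine before it) and yearly bins labelled by their first day (`"YS"`, the same alias used in the plotting code further down). The function name and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def yearly_with_trend(temps, last_day="2024-12-31", window=10):
    """Yearly means of each column, plus a trailing moving average of them."""
    full_years = temps.loc[:last_day]  # drop the incomplete 2025
    # "YS" labels each year by its first day: 1985-01-01, 1986-01-01, ...
    yearly = full_years.resample("YS").mean()
    # Trailing window: each value averages the current year and the window - 1 before it
    smoothed = yearly.rolling(window=window).mean()
    return yearly, smoothed


# Toy check on synthetic daily data: 20 °C all of 2023, 22 °C all of 2024,
# and a 2025 value that should be ignored
days = pd.date_range("2023-01-01", "2025-01-31", freq="D")
toy = pd.DataFrame(
    {"tmax": np.select([days.year == 2023, days.year == 2024], [20.0, 22.0], 99.0)},
    index=days,
)
yearly, smoothed = yearly_with_trend(toy, window=2)
print(yearly, smoothed, sep="\n")
```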

Figure: yearly mean Tmin, Tmax and Tavg, with their 10-year moving averages.

Yearly temperatures have clearly risen over the past 40 years. The moving average shows an increase of about 1 °C in minimum, maximum, and average temperatures. However, each point of the moving average is the mean of the preceding 10 years, so it lags behind the data. It may therefore understate the impact of the most recent (and hottest) years, and the actual increase could be even greater.
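The lag of a trailing moving average can be made concrete with an idealized series that warms at a constant rate. A 10-year trailing mean is centred 4.5 years before its last year. On a straight-line trend, it therefore trails the latest value by 4.5 times the yearly slope. The slope below is purely illustrative and is not fitted to the Cadreita data.

```python
import numpy as np
import pandas as pd

# Idealized yearly series warming at a constant, purely illustrative rate
slope = 0.025  # °C per year (hypothetical, not fitted to the Cadreita data)
years = pd.date_range("1985-01-01", "2024-01-01", freq="YS")
trend = pd.Series(slope * np.arange(len(years)), index=years)

# Trailing 10-year moving average
smoothed = trend.rolling(window=10).mean()
gap = trend.iloc[-1] - smoothed.iloc[-1]

print(f"last year: {trend.iloc[-1]:.4f}  10-year mean: {smoothed.iloc[-1]:.4f}  gap: {gap:.4f}")
```

Any trend that accelerates in recent years would make the gap larger than this linear case.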

Climate change assessment

Let’s check whether this warming is a real change rather than year-to-year variability. For each of the three temperature measures (tmax, tavg, tmin), I will split the yearly values into two groups at a cutoff year, for example 2012.

The year 2012 seems like a good compromise. The recent group (2012 to 2024, 13 years) focuses on the last, more extreme years. Meanwhile, both groups (the earlier one spans 1985 to 2011, 27 years) still have enough data points to keep the confidence intervals from becoming too wide.


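import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# `yearly` (yearly means with a DatetimeIndex, up to 2024) comes from the previous step
cutoff = 2012  # First year of the recent group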

# Subset data series in those groups
yearly_then = yearly[yearly.index.year < cutoff]
yearly_now = yearly[yearly.index.year >= cutoff]

# Prepare data to plot in seaborn
tmin_comp = pd.concat([yearly_then["tmin"].reset_index(drop=True).rename(f"until {cutoff}"),
                       yearly_now["tmin"].reset_index(drop=True).rename(f"from {cutoff}")],
                      axis=1).melt().dropna()

tmax_comp = pd.concat([yearly_then["tmax"].reset_index(drop=True).rename(f"until {cutoff}"),
                       yearly_now["tmax"].reset_index(drop=True).rename(f"from {cutoff}")],
                      axis=1).melt().dropna()

tavg_comp = pd.concat([yearly_then["tavg"].reset_index(drop=True).rename(f"until {cutoff}"),
                       yearly_now["tavg"].reset_index(drop=True).rename(f"from {cutoff}")],
                      axis=1).melt().dropna()

# Plot
fig, ax = plt.subplots(1, 3, figsize=(9, 4))

for i, data in enumerate([tmin_comp, tmax_comp, tavg_comp]):
    sns.swarmplot(ax=ax[i],
                  x="variable",
                  y="value",
                  data=data,
                  color=sns.color_palette("tab10")[i]
                 )
    
    sns.boxplot(ax=ax[i],
                x="variable",
                y="value",
                data=data,
                color=sns.color_palette("tab10")[i],
                boxprops=dict(alpha=0.5)
               )

    ax[i].grid(axis="y", alpha=0.3)
    ax[i].set_axisbelow(True)
    ax[i].set_xlabel("")
    ax[i].set_ylabel("")

ax[0].set_title("Tmin")
ax[1].set_title("Tmax")
ax[2].set_title("Tavg")

sns.despine()
plt.show()

# Plot
fig, ax = plt.subplots(1, 3, figsize=(9, 4))

for i, data in enumerate([tmin_comp, tmax_comp, tavg_comp]):
    sns.pointplot(ax=ax[i],
                  x="variable",
                  y="value",
                  data=data,
                  color=sns.color_palette("tab10")[i]
                 )
    
    ax[i].grid(axis="y", alpha=0.3)
    ax[i].set_axisbelow(True)
    ax[i].set_xlabel("")
    ax[i].set_ylabel("")

sns.despine()
plt.show()

Figure: yearly Tmin, Tmax and Tavg values until 2012 vs. from 2012 (swarm and box plots).

Figure: mean of each group with its confidence interval (point plots).

In the point plots, the 95% confidence intervals for the group means (seaborn's default for pointplot) do not overlap. This is strong evidence that the mean temperature differs between the two periods, which indicates that the local climate is indeed changing.
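Non-overlapping confidence intervals are a fairly conservative criterion. A direct test on the difference of means is a complementary check. The following sketch uses a two-sided permutation test, which is my choice of technique and not the method used above. Under the null hypothesis, the two periods come from the same distribution, so the group labels can be shuffled freely. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def permutation_test(then, now, n_resamples=10_000, seed=0):
    """Two-sided permutation test for the difference in means (now - then).

    Under the null hypothesis both periods come from the same distribution,
    so shuffling the pooled values between the two groups should often
    produce differences as large as the observed one.
    """
    then = np.asarray(then, dtype=float)
    now = np.asarray(now, dtype=float)
    observed = now.mean() - then.mean()

    pooled = np.concatenate([then, now])
    n_then = len(then)
    rng = np.random.default_rng(seed)

    extreme = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_then:].mean() - shuffled[:n_then].mean()
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction: the observed split counts as one of the permutations
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Toy check with two clearly separated groups (synthetic numbers, not station data)
toy_diff, toy_p = permutation_test(
    [14.0, 14.2, 13.9, 14.1, 14.0],
    [15.0, 15.1, 14.9, 15.2, 15.0],
)
print(f"difference = {toy_diff:.2f} °C, p = {toy_p:.4f}")
```

With the groups defined in the code above, the call would be `permutation_test(yearly_then["tavg"], yearly_now["tavg"])`, and likewise for tmin and tmax.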

Days with extreme temperatures

Since daily temperature records are available for all years, let’s calculate the number of days with extreme temperatures. This will serve as a good indicator of the changes taking place in the climate. I will plot the number of days per year with:

Maximum temperatures at or above 38 °C.
Minimum temperatures at or above 22 °C (‘tropical nights’).
Minimum temperatures at or below -5 °C.

I chose these thresholds because they make the trend in each graph clearly visible.
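The three plots below repeat the same counting step. That step can be sketched as a single helper. The function name is hypothetical, and the default 1985 to 2024 range mirrors the `reindex` in the plotting code. As in that code, years with no qualifying day are kept with a count of zero.

```python
import pandas as pd


def days_per_year(series, threshold, above=True, first_year=1985, last_year=2024):
    """Count days per calendar year at/above (or at/below) a threshold.

    Years without any qualifying day are kept with a count of 0, so all the
    bar charts share the same x axis.
    """
    hits = series >= threshold if above else series <= threshold
    counts = hits.groupby(series.index.year).sum()
    years = range(first_year, last_year + 1)
    return counts.reindex(years, fill_value=0).astype(int)


# Toy series (synthetic values) to show the zero-filling of missing years
idx = pd.to_datetime(["1985-07-01", "1985-07-02", "1987-08-01", "1987-01-10"])
toy_tmax = pd.Series([38.0, 37.5, 40.0, 5.0], index=idx)
print(days_per_year(toy_tmax, 38, first_year=1985, last_year=1988))
```

On the station data, `days_per_year(temps["tmax"], 38)`, `days_per_year(temps["tmin"], 22)` and `days_per_year(temps["tmin"], -5, above=False)` would give the yearly counts, and `.rolling(10).mean()` the 10-year average line.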


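import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# `temps` holds the daily tmin/tmax/tavg values from the previous steps
rolling_years = 10  # Window (in years) for the moving average in the graphs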

# Plot
tmax = 38
tmax_days = temps.loc[temps["tmax"] >= tmax, ["tmax"]]

tmax_days_count = tmax_days.groupby(tmax_days.index.year).count().rename(columns={"tmax": "tmax_days"})
tmax_days_count.index = pd.to_datetime(tmax_days_count.index, format="%Y")
tmax_days_count = tmax_days_count.reindex(
    pd.date_range(start="1985-01-01", end="2024-01-01", freq="YS")).fillna(0).astype(int)

fig, ax = plt.subplots(figsize=(8, 4))

tmax_days_count.plot(ax=ax, kind="bar", color=sns.color_palette("tab10")[1])
tmax_days_count.rolling(window=rolling_years).mean().reset_index(drop=True).plot(ax=ax, linewidth=1.5, color="black")

ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)
ax.set_ylabel("Days", size=11)
ax.set_title(f"Tmax >= {tmax} ºC")
x_labels = list(range(1985, 2025))
ax.set_xticks(range(len(x_labels)), labels=x_labels)
ax.set_yticks(range(14))
ax.tick_params(axis='x', labelsize=9, rotation=90)
h, _ = ax.get_legend_handles_labels()
ax.legend(handles=[h[1], h[0]], labels=["number of days", f"Avg. last {rolling_years} years"], loc="upper left")
sns.despine()
plt.show()

# Plot
tmin_above = 22
tmin_above_days = temps.loc[temps["tmin"] >= tmin_above, ["tmin"]]

tmin_above_days_count = tmin_above_days.groupby(tmin_above_days.index.year).count().rename(columns={"tmin": "tmin_above_days"})
tmin_above_days_count.index = pd.to_datetime(tmin_above_days_count.index, format="%Y")
tmin_above_days_count = tmin_above_days_count.reindex(
    pd.date_range(start="1985-01-01", end="2024-01-01", freq="YS")).fillna(0).astype(int)

fig, ax = plt.subplots(figsize=(8, 4))

tmin_above_days_count.plot(ax=ax, kind="bar", color="brown")
tmin_above_days_count.rolling(window=rolling_years).mean().reset_index(drop=True).plot(ax=ax, linewidth=1.5, color="black")

ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)
ax.set_ylabel("Nights", size=11)
ax.set_title(f"Tmin >= {tmin_above} ºC : 'Tropical nights'")
x_labels = list(range(1985, 2025))
ax.set_xticks(range(len(x_labels)), labels=x_labels)
ax.set_yticks(range(6))
ax.tick_params(axis='x', labelsize=9, rotation=90)
h, _ = ax.get_legend_handles_labels()
ax.legend(handles=[h[1], h[0]], labels=["number of nights", f"Avg. last {rolling_years} years"], loc="upper left")
sns.despine()
plt.show()

# Plot
tmin = -5
tmin_days = temps.loc[temps["tmin"] <= tmin, ["tmin"]]

tmin_days_count = tmin_days.groupby(tmin_days.index.year).count().rename(columns={"tmin": "tmin_days"})
tmin_days_count.index = pd.to_datetime(tmin_days_count.index, format="%Y")
tmin_days_count = tmin_days_count.reindex(
    pd.date_range(start="1985-01-01", end="2024-01-01", freq="YS")).fillna(0).astype(int)

fig, ax = plt.subplots(figsize=(8, 4))

tmin_days_count.plot(ax=ax, kind="bar", color=sns.color_palette("tab10")[0])
tmin_days_count.rolling(window=rolling_years).mean().reset_index(drop=True).plot(ax=ax, linewidth=1.5, color="black")

ax.grid(axis="y", alpha=0.3)
ax.set_axisbelow(True)
ax.set_ylabel("Days", size=11)
ax.set_title(f"Tmin <= {tmin} ºC")
x_labels = list(range(1985, 2025))
ax.set_xticks(range(len(x_labels)), labels=x_labels)
ax.set_yticks(range(18))
ax.tick_params(axis='x', labelsize=9, rotation=90)
h, _ = ax.get_legend_handles_labels()
ax.legend(handles=[h[1], h[0]], labels=["number of days", f"Avg. last {rolling_years} years"], loc="upper right")
sns.despine()
plt.show()

Figure: days per year with Tmax >= 38 °C, with the 10-year moving average.

Figure: nights per year with Tmin >= 22 °C (‘tropical nights’), with the 10-year moving average.

Figure: days per year with Tmin <= -5 °C, with the 10-year moving average.

Modelling

We will model the yearly series to forecast temperatures for the coming years.

Let’s run the Augmented Dickey-Fuller (ADF) test to determine whether each yearly series is stationary. The null hypothesis of the ADF test is that the series has a unit root, that is, it is non-stationary, for example because of a trend. If a series is non-stationary (as expected), I will difference it until the test rejects the null hypothesis. The number of differences needed is the order d of the ARIMA model.

'tmin' stationary? False
'tmin' -> difference order d=1
'tmax' stationary? False
'tmax' -> difference order d=2
'tavg' stationary? False
'tavg' -> difference order d=1

According to the differencing orders:

d=2 for tmax suggests that the slope of the trend itself keeps changing over time.
d=1 for tmin and tavg suggests that the slope stays fairly steady.
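The intuition behind this reading can be shown with idealized deterministic trends plus a stationary noise term $\varepsilon_t$. This is an assumption made only for illustration, since ARIMA differencing targets stochastic trends, but the idea carries over. One difference turns a linear trend into a constant slope. A trend whose slope changes with $t$ needs a second difference.

```latex
\nabla y_t = y_t - y_{t-1}, \qquad
\nabla^2 y_t = \nabla y_t - \nabla y_{t-1} = y_t - 2y_{t-1} + y_{t-2}

% Linear trend: one difference leaves a constant slope plus noise
y_t = \beta_0 + \beta_1 t + \varepsilon_t
\;\Rightarrow\;
\nabla y_t = \beta_1 + \varepsilon_t - \varepsilon_{t-1}

% Quadratic trend: the slope depends on t, so a second difference is needed
y_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t
\;\Rightarrow\;
\nabla y_t = \beta_1 - \beta_2 + 2\beta_2 t + \varepsilon_t - \varepsilon_{t-1},
\qquad
\nabla^2 y_t = 2\beta_2 + \varepsilon_t - 2\varepsilon_{t-1} + \varepsilon_{t-2}
```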

Figure: 15-year forecasts of tmin, tmax and tavg with confidence intervals.

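Based on the models and their 15-year forecasts, it is quite clear that rising temperatures are here to stay.
The graph also suggests that maximum temperatures have a slightly steeper upward trend than minimum temperatures. However, their confidence intervals are also wider, pointing to more uncertainty. This is consistent with the higher differencing order found for tmax, because the forecast uncertainty of a twice-differenced (d=2) series grows faster with the horizon than that of a once-differenced one.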

Conclusion

The data clearly show that temperatures are rising. My wife does remember very hot summers in the 1980s, but summers in this area are even hotter now. Does this mean that spending holidays there will become unbearable in the long term? Who knows. For now, the best we can do is keep monitoring the temperatures to see where we’re headed. In the meantime, we can invest in an air conditioning system and stay indoors with the shades drawn during heatwaves.