Climate Change

Climate Change#

\(^{1}\)Image credit: MysteryShot via iStock

Mar, 2023

Stats Probability

Background#

Climate is changing, we all know that. There is no doubt temperatures are rising and weather patterns are shifting. Science repeatedly confirms that.

Besides, when I was a kid, I remember winters were colder and it used to rain more.
But wait! This is now my memory talking. And I know it can be misleading. Where really colder and rainier those distant days in the seventies and eighties?

There is no doubt about the global warming, but is it also happening in my town?
I wanted to check it out by myself.

The data#

I headed for the website of the Spanish national weather agency AEMET, where open data can be downloaded. There is a weather station right in my town, so I selected it, but only data from 2008 onwards was available. So I chose instead the main station in the province, the meteorological observatory in Igueldo (San Sebastian).

Igeldo
\(^{1}\)Image credit: AEMET

This historical observatory started working in 1905, and the available data spans from 1928, so it holds the second longest weather data series in Spain. I also learned from the internet that measurement methods have not changed since then, and that made the data very valuable and this observatory a member of the international organization for the study of the climate change.

	temp	rainfall
1928-01-01	NaN	179.8
1928-02-01	NaN	42.0
1928-03-01	NaN	184.5
1928-04-01	NaN	115.7
1928-05-01	NaN	125.8
...	...	...
2022-08-01	21.4	76.5
2022-09-01	18.6	147.6
2022-10-01	19.6	40.6
2022-11-01	13.4	317.3
2022-12-01	11.9	82.4

1140 rows × 2 columns

The data was downloaded as JSON files (one for each year: 95 files-years), and from them monthly average temperatures and monthly accumulated rainfalls were gathered.

Data validation#

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1140 entries, 1928-01-01 to 2022-12-01
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   temp      1132 non-null   float64
 1   rainfall  1140 non-null   float64
dtypes: float64(2)
memory usage: 26.7 KB

There are 8 month-temperatures missing.

	temp	rainfall
1928-01-01	NaN	179.8
1928-02-01	NaN	42.0
1928-03-01	NaN	184.5
1928-04-01	NaN	115.7
1928-05-01	NaN	125.8
1928-09-01	NaN	19.8
1936-09-01	NaN	77.0
1936-10-01	NaN	166.6

Temperature data is missing for six months in 1928, so I am going to remove that year altogether. Two months are also lacking data in 1936 (beginning of the Spanish War). I am going to assign them the mean value of the closest known month temperatures.

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1128 entries, 1929-01-01 to 2022-12-01
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   temp      1128 non-null   float64
 1   rainfall  1128 non-null   float64
dtypes: float64(2)
memory usage: 58.7 KB

The data is now ready to proceed with the analysis.

Temperatures#

Let’s start exploring the temperatures!

Exploratory Data Analysis#

I will plot yearly average temperatures together with their average value of the last 25 years. This rolling window of 25 years draws a smoother curve and allows to see possible trends.

../_images/767b86c487b1e15d533372dd8c280866a87da44212397b82d53ccd55557a9857.png

Average temperatures do seem to be increasing lately according to the graph.

To evaluate the differences and proceed with the analysis, let’s classify the data into three groups:

Temperatures from the last 25 years (from 1998 to 2022)
Temperatures from the previous last 25 years (from 1973 to 1997)
Temperatures from the previous to the previous last 25 years (from 1948 to 1972)

../_images/544dec6446310f868579eafb78db887a46fd1100a206c79aecc6d6419317c592.png

../_images/9927212f6c2a6bceb3573847d72199126cb1aefb4026118b85941ea9e409804b.png

../_images/f9b8aebea10336b27f0bf7d324b0996b89611bd1d2069c0c33848d28b3a11456.png

Clearly, there has been a hike in annual average temperatures in the las 50 years, the difference is clear between ‘1973-1997’ and ‘1998-2022’ groups.

Difference of sample means
from 1973-1997 to 1998-2022 -> +0.75 ºC

How significant is this increment?
Are temperatures really on the rise, or were this higher temperatures the result of just chance?

We are going to do hypothesis testing to evaluate it.

Hypothesis testing#

The null hypothesis takes the stand of the climate change denial: that is, the measures from ‘1973-1997’ and ‘1998-2022’, they both come from the same data distribution because climate has not changed, and so the difference we obtained in temperatures occurred just by chance.

We will do a permutation test to check if this could be the case. We will scramble the data from both sets in what is known as permutation samples to simulate this belonging to the same source.

Let’s first explore visually in a ECDF (Empirical Cumulative Distribution Function) how permutation samples relate to the original data sets.

../_images/a89e05f053dbdbd1028265e5ad073bbb2faee5ae43fb1fe732eb1f9840a5aa60.png

We see that permutated data sets (in hazed purple points) do not overlap with the original temperature sets, suggesting there is a significant difference among them. Let’s prove it!

To test this hypothesis we will use the difference of means as the test statistic..

We will compute the probability of getting a difference in mean temperatures of 0.75 ºC or more, under the null hypothesis that the distributions of the two groups (‘1973-1997’ and ‘1998-2022’) are identical.

p-value = 0.0001

The p-value says that after randomly scrambling the data from the two groups 10000 times and making two groups each time, only once the difference between mean values of each group was equal or higher than the difference among the original groups. That is to say, the probability of observing this difference in mean temperatures if there was not such a thing as two different groups would be just of 0.01 %.

Graphically looks this way.

../_images/eea1522b69e00a83a66d9168a1ef2ecf1257f17d25ce0a65efad907a1f26d099.png

Extremely low probability, so we reject the null hypothesis and conclude that the two temperature groups must be indeed different and come with different distributions, i.e. temperatures are rising.

Rainfall#

Was it rainier when I was a kid? That is what I thought. But my memories are probably skewed, maybe in part because of the awareness about the ongoing climate change, which suggests drier weather conditions nowadays.

Let’s check it out!

Exploratory Data Analysis#

../_images/8055b57e1a1cf9e77dd363fd1318f737da3d523db33dcb007f94c5831fc62049.png

Yikes, it looks like average rainfall has not changed during all these years!

../_images/51f15894f321b1ee6b9b10a91a19dbd27b1d865c9cbabb3df94930551f0e95b4.png

If we make three groups like in the case of the temperatures, we see very similar measure distributions for the rainfalls.

../_images/d04d707df35e6c2107951828eafa16d0334c63d268ae97fe40303dd2e1d026e6.png

Mean values are quite similar, with confidence intervals overlapping.

../_images/ccb80e448394809c5edd126656e08e1c4d041da1821bd6005b02c82582272358.png

They all overlap and look similar. It does not seem there has been any change during the years.

Empirical diff. in mean
from 1973-1997 to 1998-2022 -> -8.69 mm

The difference in rainfalls looks insignificant.

../_images/6c8e7231a940536a7757188e6a5659a25e670da5f722988544877dff060da2b3.png

We see that permutated data sets (in hazed points) do overlap with the original rainfall sets (orange and green), suggesting there is not a significant difference. I conclude that the two rainfall groups are indeed the same and come with the same distributions, i.e. there is no difference in rainfalls (so my memory was in fact skewed).

Conclusion#

In this little project permutational testing was used to assess whether temperatures and rainfalls are changing locally because of the climate change. I came to the conclusion that temperatures are certainly rising, and rainfalls remain the same.

One of the best things about doing Data Science is the thrill you get from collecting raw data and drawing your own conclusions. Even more if the subject is such an important and universal topic as Climate Change. You are doing Science, probably not high-level science, but science anyway!

Climate Change

Contents

Climate Change#

Background#

The data#

Data validation#

Temperatures#

Exploratory Data Analysis#

Hypothesis testing#

Rainfall#

Exploratory Data Analysis#

Conclusion#