GISS Surface Temperature Analysis (GISTEMP v32)

Introduction

This is a look at the Goddard Institute for Space Studies' (GISS) surface temperature data. In particular, it uses the global-mean monthly, seasonal, and annual means, which run from 1880 to the present (CSV Download Link).

Set Up

Imports

Python

from pathlib import Path
import os

PyPI

from dotenv import load_dotenv
import pandas

Load Dotenv

load_dotenv(".env")
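
The loading steps later in this post read file locations from the environment variables GLOBAL and ZONES, which load_dotenv is expected to populate from the .env file. If a variable is missing, Path(os.environ.get(...)) fails with a fairly opaque TypeError. Here is a minimal sketch of a more explicit lookup; the helper name path_from_environment is hypothetical and not part of the original post.

```python
import os
from pathlib import Path


def path_from_environment(name: str) -> Path:
    """Return the file path stored in the environment variable `name`.

    Hypothetical helper: raises a descriptive error when the variable
    is unset or does not point to an existing file.
    """
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"Environment variable '{name}' is not set; check the .env file")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{name} points to '{path}', which is not a file")
    return path
```

With this in place, something like path_from_environment("GLOBAL") could stand in for the Path(...) and assert lines used when loading the data.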

Load the Data

Take One

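# GLOBAL is set in .env and points to the downloaded global-means CSV
path = Path(os.environ["GLOBAL"]).expanduser()
assert path.is_file()
with path.open() as reader:
    # read from the open file handle rather than re-opening the path
    giss = pandas.read_csv(reader)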
print(giss.head())
                                                                                          Land-Ocean: Global Means
Year Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec  J-D  D-N  DJF  MAM  JJA                       SON
1880 -.29 -.18 -.11 -.19 -.11 -.23 -.20 -.09 -.16 -.23 -.20 -.22 -.18 ***  ***  -.14 -.17                     -.19
1881 -.15 -.17 .04  .04  .02  -.20 -.06 -.02 -.14 -.21 -.22 -.11 -.10 -.11 -.18 .03  -.10                     -.19
1882 .14  .15  .04  -.19 -.16 -.26 -.21 -.05 -.10 -.25 -.16 -.24 -.11 -.10 .06  -.10 -.17                     -.17
1883 -.31 -.39 -.13 -.17 -.20 -.12 -.08 -.15 -.20 -.14 -.22 -.16 -.19 -.20 -.31 -.16 -.12                     -.19

One thing to notice is that the title line ("Land-Ocean: Global Means") was read as the column header, and the real column names were read as the first row of data.

print(giss.iloc[0])
Land-Ocean: Global Means    SON
Name: (Year, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, J-D, D-N, DJF, MAM, JJA), dtype: object

So we're going to have to skip the first row.

Take Two

path = Path(os.environ["GLOBAL"]).expanduser()
assert path.is_file()
with path.open() as reader:
    # skip the title line so the second line becomes the header
    giss = pandas.read_csv(reader, skiprows=1)
print(giss.head())
   Year   Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov  \
0  1880 -0.29  -.18  -.11  -.19  -.11  -.23  -.20  -.09  -.16  -.23  -.20   
1  1881 -0.15  -.17   .04   .04   .02  -.20  -.06  -.02  -.14  -.21  -.22   
2  1882  0.14   .15   .04  -.19  -.16  -.26  -.21  -.05  -.10  -.25  -.16   
3  1883 -0.31  -.39  -.13  -.17  -.20  -.12  -.08  -.15  -.20  -.14  -.22   
4  1884 -0.15  -.08  -.37  -.42  -.36  -.40  -.34  -.26  -.27  -.24  -.30   

    Dec   J-D   D-N   DJF   MAM   JJA   SON  
0  -.22  -.18   ***   ***  -.14  -.17  -.19  
1  -.11  -.10  -.11  -.18   .03  -.10  -.19  
2  -.24  -.11  -.10   .06  -.10  -.17  -.17  
3  -.16  -.19  -.20  -.31  -.16  -.12  -.19  
4  -.29  -.29  -.28  -.13  -.38  -.34  -.27  
print(giss.columns)
Index(['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep',
       'Oct', 'Nov', 'Dec', 'J-D', 'D-N', 'DJF', 'MAM', 'JJA', 'SON'],
      dtype='object')
print(giss.describe())
            Year         Jan
count   140.0000  140.000000
mean   1949.5000    0.027500
std      40.5586    0.396867
min    1880.0000   -0.790000
25%    1914.7500   -0.265000
50%    1949.5000   -0.020000
75%    1984.2500    0.290000
max    2019.0000    1.150000

describe() only summarizes Year and Jan, so the other columns weren't read as numeric, probably because the file uses *** to mark missing data.
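
One way to confirm this is to list the columns pandas did not parse as numbers, then re-read the data with the placeholder declared as missing. Below is a minimal sketch on an inline sample; the sample keeps only four of the columns, with values copied from the first three rows printed above, and the names SAMPLE, raw, non_numeric, and cleaned are illustrative.

```python
from io import StringIO

import pandas

# A trimmed stand-in for the real file: a title line, a header line,
# and three data rows (values copied from the output shown earlier)
SAMPLE = """Land-Ocean: Global Means
Year,Jan,D-N,DJF
1880,-.29,***,***
1881,-.15,-.11,-.18
1882,.14,-.10,.06
"""

# Without na_values, any column containing "***" is read as text
raw = pandas.read_csv(StringIO(SAMPLE), skiprows=1)
non_numeric = raw.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Declaring "***" as a missing-value marker lets those columns parse as floats
cleaned = pandas.read_csv(StringIO(SAMPLE), skiprows=1, na_values="***")
print(cleaned.dtypes)
print(cleaned.isna().sum())
```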

Take Three

with path.open() as reader:
    # treat *** as missing so the affected columns parse as floats
    giss = pandas.read_csv(reader, skiprows=1, na_values="***")
print(giss.describe())
            Year         Jan         Feb         Mar         Apr         May  \
count   140.0000  140.000000  139.000000  139.000000  139.000000  139.000000   
mean   1949.5000    0.027500    0.038201    0.052806    0.026187    0.016043   
std      40.5586    0.396867    0.393732    0.387470    0.363309    0.348825   
min    1880.0000   -0.790000   -0.610000   -0.600000   -0.600000   -0.560000   
25%    1914.7500   -0.265000   -0.235000   -0.230000   -0.260000   -0.240000   
50%    1949.5000   -0.020000   -0.040000   -0.020000   -0.050000   -0.050000   
75%    1984.2500    0.290000    0.325000    0.275000    0.250000    0.260000   
max    2019.0000    1.150000    1.330000    1.300000    1.070000    0.900000   

              Jun         Jul         Aug         Sep         Oct         Nov  \
count  139.000000  139.000000  139.000000  139.000000  139.000000  139.000000   
mean     0.003022    0.026043    0.030863    0.041367    0.060072    0.048561   
std      0.339148    0.317524    0.330365    0.323767    0.335174    0.341057   
min     -0.530000   -0.540000   -0.540000   -0.530000   -0.570000   -0.540000   
25%     -0.245000   -0.210000   -0.210000   -0.180000   -0.190000   -0.185000   
50%     -0.070000   -0.050000   -0.050000   -0.060000    0.000000   -0.020000   
75%      0.190000    0.195000    0.190000    0.205000    0.190000    0.180000   
max      0.780000    0.820000    1.000000    0.880000    1.060000    1.020000   

              Dec         J-D         D-N         DJF         MAM         JJA  \
count  139.000000  139.000000  138.000000  138.000000  139.000000  139.000000   
mean     0.021727    0.032302    0.033116    0.026449    0.031583    0.020360   
std      0.364511    0.336896    0.338215    0.369663    0.361006    0.324987   
min     -0.790000   -0.490000   -0.510000   -0.660000   -0.560000   -0.520000   
25%     -0.220000   -0.200000   -0.215000   -0.240000   -0.255000   -0.220000   
50%     -0.050000   -0.070000   -0.060000   -0.070000   -0.060000   -0.070000   
75%      0.275000    0.215000    0.230000    0.280000    0.265000    0.195000   
max      1.100000    0.980000    1.010000    1.190000    1.090000    0.860000   

              SON  
count  139.000000  
mean     0.050504  
std      0.327437  
min     -0.490000  
25%     -0.190000  
50%     -0.020000  
75%      0.190000  
max      0.970000  

Actually, I just looked at the "official" file given by Coursera, and it turns out I downloaded the wrong one.

The Real Data

The data I was supposed to pull was the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies' Zonal Annual Means, which give the annual mean for each latitude zone in a given year (rather than monthly global averages).

# ZONES is set in .env and points to the zonal annual means CSV
zone_path = Path(os.environ["ZONES"]).expanduser()
assert zone_path.is_file()
with zone_path.open() as reader:
    giss = pandas.read_csv(reader)
print(giss.describe())
              Year        Glob        NHem        SHem     24N-90N  \
count   139.000000  139.000000  139.000000  139.000000  139.000000   
mean   1949.000000    0.032302    0.056043    0.008561    0.077698   
std      40.269923    0.336896    0.393435    0.301848    0.464606   
min    1880.000000   -0.490000   -0.540000   -0.490000   -0.580000   
25%    1914.500000   -0.200000   -0.220000   -0.235000   -0.280000   
50%    1949.000000   -0.070000   -0.010000   -0.080000    0.020000   
75%    1983.500000    0.215000    0.210000    0.265000    0.255000   
max    2018.000000    0.980000    1.260000    0.710000    1.500000   

          24S-24N     90S-24S     64N-90N     44N-64N     24N-44N     EQU-24N  \
count  139.000000  139.000000  139.000000  139.000000  139.000000  139.000000   
mean     0.036115   -0.018561    0.111079    0.117770    0.027698    0.027626   
std      0.331384    0.295695    0.917715    0.516729    0.356416    0.326111   
min     -0.650000   -0.470000   -1.640000   -0.710000   -0.590000   -0.720000   
25%     -0.215000   -0.250000   -0.545000   -0.270000   -0.200000   -0.230000   
50%     -0.030000   -0.100000    0.020000    0.000000   -0.070000    0.000000   
75%      0.255000    0.230000    0.660000    0.360000    0.135000    0.240000   
max      0.970000    0.700000    3.050000    1.440000    1.060000    0.930000   

          24S-EQU     44S-24S     64S-44S     90S-64S  
count  139.000000  139.000000  139.000000  139.000000  
mean     0.045683    0.020432   -0.069353   -0.078129  
std      0.343385    0.312688    0.269380    0.732359  
min     -0.580000   -0.430000   -0.540000   -2.570000  
25%     -0.210000   -0.220000   -0.265000   -0.490000  
50%     -0.030000   -0.080000   -0.090000    0.050000  
75%      0.290000    0.260000    0.180000    0.410000  
max      1.020000    0.780000    0.450000    1.270000  
print(giss.iloc[0])
Year       1880.00
Glob         -0.18
NHem         -0.31
SHem         -0.06
24N-90N      -0.38
24S-24N      -0.17
90S-24S      -0.01
64N-90N      -0.97
44N-64N      -0.47
24N-44N      -0.25
EQU-24N      -0.21
24S-EQU      -0.13
44S-24S      -0.04
64S-44S       0.05
90S-64S       0.67
Name: 0, dtype: float64
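
For charting, it may be easier to work with one row per year and zone rather than one column per zone. Below is a minimal sketch using pandas' melt, built from the 1880 row printed above and limited to three zone columns; the names wide, tidy, Zone, and Anomaly are illustrative and not from the original analysis.

```python
import pandas

# The 1880 row shown above, restricted to three columns to keep this short
wide = pandas.DataFrame(
    {"Year": [1880], "Glob": [-0.18], "NHem": [-0.31], "SHem": [-0.06]}
)

# Turn each zone column into its own (Year, Zone, Anomaly) row
tidy = wide.melt(id_vars="Year", var_name="Zone", value_name="Anomaly")
print(tidy)
```

Applied to the full giss frame, the same call would produce one row for each year and zone pair.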

Criteria

Appropriate Chart Selection and Variables

Did you select the appropriate chart and use the correct chart elements to visualize the nominal, ordinal, discrete, and continuous variables, as described in lecture 2.1.3? Continuous data variables should be assigned to continuous chart elements (e.g., lines between data points), whereas discrete variables should be assigned to discrete chart elements (e.g., separate bars). Furthermore, the assignment of variables to elements should follow the priorities in lecture 2.1.2.

Design of the Chart

Does the chart effectively display the data, based on the design rules in lecture 2.3.1?

Content

How interesting is the result? Does this represent an interesting choice of data and/or an interesting way to display the data? For example, was a streamgraph used instead of an ordinary bar chart?

Grading

| Criteria | Poor (1–2 points) | Fair (3 points) | Good (4 points) | Great (5 points) |
| --- | --- | --- | --- | --- |
| Appropriate Chart Selection and Variables | Chart is indecipherable or significantly misleading because of poor chart type or assignment of variables to elements | Major problem(s) with chart selection or assignment of elements to variables | Minor problem(s) with chart selection or assignment of elements to variables | Chart selection is appropriate for data and its elements properly assigned to appropriate data variables |
| Design of the Chart | No apparent attention paid to design | Evidence that several of the design rules should have been followed but were not | Evidence that one of the design rules should have been followed but was not | Attention paid to all design rules |
| Content | Misleading | Boring | Not boring | Interesting |

Citation