GISS Surface Temperature Analysis (GISTEMP v32)
Introduction
This is a look at the Goddard Institute for Space Studies' surface temperature data. In particular, it uses the global-mean monthly, seasonal, and annual means, which run from 1880 to the present (CSV Download Link).
Set Up
Imports
Python
from pathlib import Path
import os
PyPI
from dotenv import load_dotenv
import pandas
Load Dotenv
load_dotenv(".env")
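The loading code in this post reads file paths from the environment variables GLOBAL and ZONES, which the `.env` file is assumed to define. If one of them is missing, `os.environ.get` returns `None` and `Path(None)` fails with a fairly opaque `TypeError`. A minimal sketch of a guard, where `path_from_env` is a hypothetical helper name and not part of the original code:

```python
import os
from pathlib import Path


def path_from_env(name: str) -> Path:
    """Return the path stored in environment variable `name` (hypothetical helper)."""
    value = os.environ.get(name)
    if value is None:
        # Fail with the variable name instead of the TypeError from Path(None)
        raise KeyError(f"environment variable {name!r} is not set")
    # expanduser() resolves a leading "~" to the home directory
    return Path(value).expanduser()
```

With something like this, `path_from_env("GLOBAL")` could stand in for `Path(os.environ.get("GLOBAL")).expanduser()`.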
Load the Data
Take One
# The path to the downloaded CSV is stored in the GLOBAL environment variable
path = Path(os.environ.get("GLOBAL")).expanduser()
assert path.is_file()
with path.open() as reader:
    giss = pandas.read_csv(reader)
print(giss.head())
Land-Ocean: Global Means
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J-D D-N DJF MAM JJA SON
1880 -.29 -.18 -.11 -.19 -.11 -.23 -.20 -.09 -.16 -.23 -.20 -.22 -.18 *** *** -.14 -.17 -.19
1881 -.15 -.17 .04 .04 .02 -.20 -.06 -.02 -.14 -.21 -.22 -.11 -.10 -.11 -.18 .03 -.10 -.19
1882 .14 .15 .04 -.19 -.16 -.26 -.21 -.05 -.10 -.25 -.16 -.24 -.11 -.10 .06 -.10 -.17 -.17
1883 -.31 -.39 -.13 -.17 -.20 -.12 -.08 -.15 -.20 -.14 -.22 -.16 -.19 -.20 -.31 -.16 -.12 -.19
One thing to notice is that the file's first line (the title) got read in as the column header, and the real column names got read in as the first row.
print(giss.iloc[0])
Land-Ocean: Global Means    SON
Name: (Year, Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec, J-D, D-N, DJF, MAM, JJA), dtype: object
So we're going to have to skip the first row.
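To make the fix concrete, here is a small sketch on an in-memory sample instead of the real file. The sample keeps only the title line, a shortened header, and the first two values of the 1880 row shown above. `skiprows=1` drops the title line, and `header=1` is an equivalent way to tell pandas that the column names are on the second line.

```python
import io

import pandas

# Title line, real header, and one data row, shortened from the output above
text = (
    "Land-Ocean: Global Means\n"
    "Year,Jan,Feb\n"
    "1880,-.29,-.18\n"
)

# Skip the title line so the second line becomes the header
skipped = pandas.read_csv(io.StringIO(text), skiprows=1)

# Alternative: say that the header is on line index 1 (zero-based)
headed = pandas.read_csv(io.StringIO(text), header=1)

print(list(skipped.columns))
print(skipped.equals(headed))
```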
Take Two
path = Path(os.environ.get("GLOBAL")).expanduser()
assert path.is_file()
with path.open() as reader:
    # skiprows=1 drops the title line so the real header is used
    giss = pandas.read_csv(reader, skiprows=1)
print(giss.head())
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov \
0 1880 -0.29 -.18 -.11 -.19 -.11 -.23 -.20 -.09 -.16 -.23 -.20
1 1881 -0.15 -.17 .04 .04 .02 -.20 -.06 -.02 -.14 -.21 -.22
2 1882 0.14 .15 .04 -.19 -.16 -.26 -.21 -.05 -.10 -.25 -.16
3 1883 -0.31 -.39 -.13 -.17 -.20 -.12 -.08 -.15 -.20 -.14 -.22
4 1884 -0.15 -.08 -.37 -.42 -.36 -.40 -.34 -.26 -.27 -.24 -.30
Dec J-D D-N DJF MAM JJA SON
0 -.22 -.18 *** *** -.14 -.17 -.19
1 -.11 -.10 -.11 -.18 .03 -.10 -.19
2 -.24 -.11 -.10 .06 -.10 -.17 -.17
3 -.16 -.19 -.20 -.31 -.16 -.12 -.19
4 -.29 -.29 -.28 -.13 -.38 -.34 -.27
print(giss.columns)
Index(['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep',
'Oct', 'Nov', 'Dec', 'J-D', 'D-N', 'DJF', 'MAM', 'JJA', 'SON'],
dtype='object')
print(giss.describe())
Year Jan
count 140.0000 140.000000
mean 1949.5000 0.027500
std 40.5586 0.396867
min 1880.0000 -0.790000
25% 1914.7500 -0.265000
50% 1949.5000 -0.020000
75% 1984.2500 0.290000
max 2019.0000 1.150000
So most of the columns weren't read as numeric, probably because of the use of *** for missing data.
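A small sketch of why this happens, using an in-memory sample with three columns and the 1880 and 1881 values shown above. A single `***` in a column keeps pandas from parsing it as numeric, and `describe` skips non-numeric columns by default when numeric ones are present. Passing `na_values="***"` turns the marker into `NaN` so the column parses as a float.

```python
import io

import pandas
from pandas.api.types import is_float_dtype, is_numeric_dtype

# Two rows taken from the output above; D-N is missing for 1880
text = "Year,Jan,D-N\n1880,-.29,***\n1881,-.15,-.11\n"

raw = pandas.read_csv(io.StringIO(text))
clean = pandas.read_csv(io.StringIO(text), na_values="***")

# The *** marker keeps the raw column from being numeric
print(is_numeric_dtype(raw["D-N"]))

# With na_values set, the column is float and the marker is counted as missing
print(is_float_dtype(clean["D-N"]))
print(clean["D-N"].isna().sum())
```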
Take Three
with path.open() as reader:
    # Treat *** as missing so the columns parse as numbers
    giss = pandas.read_csv(reader, skiprows=1, na_values="***")
print(giss.describe())
Year Jan Feb Mar Apr May \
count 140.0000 140.000000 139.000000 139.000000 139.000000 139.000000
mean 1949.5000 0.027500 0.038201 0.052806 0.026187 0.016043
std 40.5586 0.396867 0.393732 0.387470 0.363309 0.348825
min 1880.0000 -0.790000 -0.610000 -0.600000 -0.600000 -0.560000
25% 1914.7500 -0.265000 -0.235000 -0.230000 -0.260000 -0.240000
50% 1949.5000 -0.020000 -0.040000 -0.020000 -0.050000 -0.050000
75% 1984.2500 0.290000 0.325000 0.275000 0.250000 0.260000
max 2019.0000 1.150000 1.330000 1.300000 1.070000 0.900000
Jun Jul Aug Sep Oct Nov \
count 139.000000 139.000000 139.000000 139.000000 139.000000 139.000000
mean 0.003022 0.026043 0.030863 0.041367 0.060072 0.048561
std 0.339148 0.317524 0.330365 0.323767 0.335174 0.341057
min -0.530000 -0.540000 -0.540000 -0.530000 -0.570000 -0.540000
25% -0.245000 -0.210000 -0.210000 -0.180000 -0.190000 -0.185000
50% -0.070000 -0.050000 -0.050000 -0.060000 0.000000 -0.020000
75% 0.190000 0.195000 0.190000 0.205000 0.190000 0.180000
max 0.780000 0.820000 1.000000 0.880000 1.060000 1.020000
Dec J-D D-N DJF MAM JJA \
count 139.000000 139.000000 138.000000 138.000000 139.000000 139.000000
mean 0.021727 0.032302 0.033116 0.026449 0.031583 0.020360
std 0.364511 0.336896 0.338215 0.369663 0.361006 0.324987
min -0.790000 -0.490000 -0.510000 -0.660000 -0.560000 -0.520000
25% -0.220000 -0.200000 -0.215000 -0.240000 -0.255000 -0.220000
50% -0.050000 -0.070000 -0.060000 -0.070000 -0.060000 -0.070000
75% 0.275000 0.215000 0.230000 0.280000 0.265000 0.195000
max 1.100000 0.980000 1.010000 1.190000 1.090000 0.860000
SON
count 139.000000
mean 0.050504
std 0.327437
min -0.490000
25% -0.190000
50% -0.020000
75% 0.190000
max 0.970000
Actually, I just looked at the "official" file given by Coursera, and it turns out I downloaded the wrong one.
The Real Data
The data I was supposed to pull was the Zonal Annual Means from the Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies, which gives the annual mean anomaly for each latitude zone in a given year (rather than monthly global averages).
zone_path = Path(os.environ.get("ZONES")).expanduser()
assert zone_path.is_file()
with zone_path.open() as reader:
    giss = pandas.read_csv(reader)
print(giss.describe())
Year Glob NHem SHem 24N-90N \
count 139.000000 139.000000 139.000000 139.000000 139.000000
mean 1949.000000 0.032302 0.056043 0.008561 0.077698
std 40.269923 0.336896 0.393435 0.301848 0.464606
min 1880.000000 -0.490000 -0.540000 -0.490000 -0.580000
25% 1914.500000 -0.200000 -0.220000 -0.235000 -0.280000
50% 1949.000000 -0.070000 -0.010000 -0.080000 0.020000
75% 1983.500000 0.215000 0.210000 0.265000 0.255000
max 2018.000000 0.980000 1.260000 0.710000 1.500000
24S-24N 90S-24S 64N-90N 44N-64N 24N-44N EQU-24N \
count 139.000000 139.000000 139.000000 139.000000 139.000000 139.000000
mean 0.036115 -0.018561 0.111079 0.117770 0.027698 0.027626
std 0.331384 0.295695 0.917715 0.516729 0.356416 0.326111
min -0.650000 -0.470000 -1.640000 -0.710000 -0.590000 -0.720000
25% -0.215000 -0.250000 -0.545000 -0.270000 -0.200000 -0.230000
50% -0.030000 -0.100000 0.020000 0.000000 -0.070000 0.000000
75% 0.255000 0.230000 0.660000 0.360000 0.135000 0.240000
max 0.970000 0.700000 3.050000 1.440000 1.060000 0.930000
24S-EQU 44S-24S 64S-44S 90S-64S
count 139.000000 139.000000 139.000000 139.000000
mean 0.045683 0.020432 -0.069353 -0.078129
std 0.343385 0.312688 0.269380 0.732359
min -0.580000 -0.430000 -0.540000 -2.570000
25% -0.210000 -0.220000 -0.265000 -0.490000
50% -0.030000 -0.080000 -0.090000 0.050000
75% 0.290000 0.260000 0.180000 0.410000
max 1.020000 0.780000 0.450000 1.270000
print(giss.iloc[0])
Year       1880.00
Glob         -0.18
NHem         -0.31
SHem         -0.06
24N-90N      -0.38
24S-24N      -0.17
90S-24S      -0.01
64N-90N      -0.97
44N-64N      -0.47
24N-44N      -0.25
EQU-24N      -0.21
24S-EQU      -0.13
44S-24S      -0.04
64S-44S       0.05
90S-64S       0.67
Name: 0, dtype: float64
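The zonal file is in wide form, with one column per zone. Many plotting libraries are easier to use with long form, where each row is one (year, zone, anomaly) observation. This is one possible next step and not something the original analysis does. The sketch uses only the 1880 values for three of the columns shown above, and the names `Zone` and `Anomaly` are my own choices.

```python
import pandas

# The 1880 row from the output above, restricted to three columns
wide = pandas.DataFrame(
    {"Year": [1880], "Glob": [-0.18], "NHem": [-0.31], "SHem": [-0.06]}
)

# melt turns each zone column into its own row, keeping Year as the identifier
tidy = wide.melt(id_vars="Year", var_name="Zone", value_name="Anomaly")
print(tidy)
```

Applied to the full frame, the same call would give one row per year and zone, so that year could go on a continuous axis and zone on a discrete element such as color.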
Criteria
Appropriate Chart Selection and Variables
Did you select the appropriate chart and use the correct chart elements to visualize the nominal, ordinal, discrete, and continuous variables, as described in lecture 2.1.3? Continuous data variables should be assigned to continuous chart elements (e.g., lines between data points), whereas discrete variables should be assigned to discrete chart elements (e.g., separate bars). Furthermore, the assignment of variables to elements should follow the priorities in lecture 2.1.2.
Design of the Chart
Does the chart effectively display the data, based on the design rules in lecture 2.3.1?
Content
How interesting is the result? Does this represent an interesting choice of data and/or an interesting way to display the data? For example, was a streamgraph used instead of an ordinary bar chart?
Grading
| Criteria | Poor (1–2 points) | Fair (3 points) | Good (4 points) | Great (5 points) |
|---|---|---|---|---|
| Appropriate Chart Selection and Variables | Chart is indecipherable or significantly misleading because of poor chart type or assignment of variables to elements | Major problem(s) with chart selection or assignment of elements to variables | Minor problem(s) with chart selection or assignment of elements to variables | Chart selection is appropriate for data and its elements properly assigned to appropriate data variables |
| Design of the Chart | No apparent attention paid to design | Evidence that several of the design rules should have been followed but were not | Evidence that one of the design rules should have been followed but was not | Attention paid to all design rules |
| Content | Misleading | Boring | Not boring | Interesting |
Citation
- GISTEMP Team, 2019: GISS Surface Temperature Analysis (GISTEMP). NASA Goddard Institute for Space Studies. Dataset accessed 2019-02-27 at https://data.giss.nasa.gov/gistemp/.
- Hansen, J., R. Ruedy, M. Sato, and K. Lo, 2010: Global surface temperature change, Rev. Geophys., 48, RG4004, https://doi.org/10.1029/2010RG000345.