High School Contact Networks

The Departure

This post looks at data provided by SocioPatterns on the interactions between students at a high school in Marseilles, France, in December of 2013.

Imports

From Python

from collections import Counter
from functools import partial
from pathlib import Path
import os

From PyPI

from bokeh.models import HoverTool
from dotenv import load_dotenv
from holoviews import dim, opts
from holoviews.operation.datashader import datashade, bundle_graph
import holoviews
import hvplot.pandas
import networkx
import pandas

My Stuff

from graeae.timers import Timer
from graeae.visualization import EmbedHoloview

Load the Dotenv

load_dotenv(".env")

Build the Timer

TIMER = Timer()

Setup The Plotting

holoviews.extension("bokeh")
SLUG = "high-school-contact-networks/"
output = Path("../../files/posts/networks/" + SLUG)
Embed = partial(EmbedHoloview, folder_path=output)
class Plot:
    """Constants for plotting"""
    width = 1000
    height = 800
    fontsize = 18

Load The Data

Let's take a look at the data before loading it into pandas.

HIGH_SCHOOL = Path(os.environ["HIGH_SCHOOL"]).expanduser()
assert HIGH_SCHOOL.is_dir()

class Files:
    metadata = "metadata_2013.txt"
    contact_diaries = "Contact-diaries-network_data_2013.csv"
    facebook = "Facebook-known-pairs_data_2013.csv"
    friendship = "Friendship-network_data_2013.csv"
    high_school = "High-School_data_2013.csv"

MetaData

metadata_path = HIGH_SCHOOL.joinpath(Files.metadata)
assert metadata_path.is_file()
with metadata_path.open() as reader:
    for _ in range(5):
        print(reader.readline(), end="")

650     2BIO1   F
498     2BIO1   F
627     2BIO1   F
857     2BIO1   F
487     2BIO1   F

This first file has the meta-data for the students. The three columns are the student's ID, class, and gender.

meta_data = pandas.read_csv(metadata_path, sep="\t", 
                            names=["id", "class", "gender"])
# assign the columns directly so they actually take the category dtype
meta_data["class"] = meta_data["class"].astype("category")
meta_data["gender"] = meta_data["gender"].astype("category")
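
As a quick sanity check, the dtypes can be inspected after the conversion. This is a minimal sketch that uses an in-memory copy of the five rows printed above instead of the real file; the names `sample` and `check` are illustrative and not from the original post. It assigns the converted columns directly, because depending on the pandas version an assignment through `.loc[:, ...]` may leave the original object dtype in place.

```python
from io import StringIO

import pandas

# the five rows printed above, tab-separated like the real file
sample = StringIO(
    "650\t2BIO1\tF\n"
    "498\t2BIO1\tF\n"
    "627\t2BIO1\tF\n"
    "857\t2BIO1\tF\n"
    "487\t2BIO1\tF\n"
)

check = pandas.read_csv(sample, sep="\t", names=["id", "class", "gender"])

# assign the converted columns directly so the new dtype replaces the old one
for column in ("class", "gender"):
    check[column] = check[column].astype("category")

print(check.dtypes)
```
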
Classes

First a bar-plot to look at how the classes are distributed.

grouped = meta_data.groupby(["class", "gender"]).agg(
    {"class": "count", "gender": "count"})
grouped.columns = ["class_count", "gender_count"]
grouped = grouped.reset_index()
plot = grouped.hvplot.bar(title="Class Counts by Gender", 
                          x="class", y="class_count", 
                          stacked=True,
                          by="gender", height=Plot.height, 
                          width=Plot.width,
                          ylabel="Count",
                          xlabel="Class",
                          tools=["hover"],
                          fontsize=Plot.fontsize).opts(xrotation=90)
Embed(plot=plot, file_name="gender_counts_stacked", height_in_pixels=Plot.height)()

Figure: Class Counts by Gender (stacked)

Here is the same data with the bars side by side instead of stacked.

plot = grouped.hvplot.bar(title="Class Counts by Gender", x="class", 
                          y="class_count",
                          xlabel="Class",
                          ylabel="Count",
                          by="gender", height=Plot.height, width=Plot.width, 
                          tools=["hover"],
                          fontsize=Plot.fontsize).opts(xrotation=90)
Embed(plot=plot, file_name="gender_counts", height_in_pixels=Plot.height)()

Figure: Class Counts by Gender (not stacked)

Strangely, the classes whose names start with 2BIO have more female than male students, while the other classes have more male students.
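
One way to put numbers on an observation like this is a normalized cross-tabulation of class against gender. The sketch below runs on made-up rows, not the real metadata, and the `OTHER` class label is a placeholder rather than an actual class name from the data set.

```python
import pandas

# made-up rows for illustration only; not the real SocioPatterns metadata
toy = pandas.DataFrame({
    "class": ["2BIO1", "2BIO1", "2BIO1", "OTHER", "OTHER", "OTHER"],
    "gender": ["F", "F", "M", "M", "M", "F"],
})

# rows are classes, columns are genders, values are the fraction within each class
fractions = pandas.crosstab(toy["class"], toy["gender"], normalize="index")
print(fractions.round(2))
```

With the real data, passing `meta_data["class"]` and `meta_data["gender"]` in place of the toy columns should give the per-class gender split directly.
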

Gender

A stacked bar plot to get a sense of not just the distribution among genders but among classes.

plot = grouped.hvplot.bar(title="Gender Counts", x="gender", y="gender_count",
                          stacked=True,
                          by="class", 
                          xlabel="Gender",
                          ylabel="Count",
                          fontsize=Plot.fontsize,
                          width=Plot.width,
                          height=Plot.height).opts(
                              xrotation=90, 
                              xlabel="Gender and Class")
Embed(plot=plot, file_name="class_counts_stacked", height_in_pixels=Plot.height)()

Figure: Gender Counts (stacked by class)

A non-stacked bar plot to get a better sense of how the genders fill the different classes.

plot = grouped.hvplot.bar(title="Gender Counts", x="gender", y="gender_count",
                          xlabel="Gender",
                          ylabel="Count",
                          by="class", 
                          height=Plot.height,
                          width=Plot.width,
                          fontsize=Plot.fontsize).opts(
                              xrotation=90, xlabel="Gender and Class")
Embed(plot=plot, file_name="class_counts", height_in_pixels=Plot.height)()

Figure: Gender Counts (not stacked)

It looks like there were slightly more male than female students, but not many more.

The Descent

The Contact Network

This is a dataset that shows whether a student logged contact with another student.

contact_path = HIGH_SCHOOL.joinpath(Files.contact_diaries)
assert contact_path.is_file()
with contact_path.open() as reader:
    for _ in range(5):
        print(reader.readline(), end="")

3 28 2
3 106 1
3 147 4
3 177 1
3 295 4

The columns are the student making the report, the student identified as a contact, and the time spent, encoded as one of four values.

Code  Lower Limit (minutes)  Upper Limit (minutes)
1     0                      5
2     5                      15
3     15                     60
4     60                     infinity

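
The codes in the table above can be turned into readable labels with a small lookup. This is only a sketch: `DURATION_BOUNDS` and `describe_duration` are hypothetical names, the frame holds just the five rows printed earlier, and whether each bound is inclusive is not stated in the source, so the labels stay neutral on that point.

```python
import pandas

# bounds in minutes for each duration code, copied from the table above
# (None stands for "no upper limit")
DURATION_BOUNDS = {
    1: (0, 5),
    2: (5, 15),
    3: (15, 60),
    4: (60, None),
}


def describe_duration(code: int) -> str:
    """Return a human-readable label for a duration code."""
    lower, upper = DURATION_BOUNDS[code]
    if upper is None:
        return f"{lower}+ minutes"
    return f"{lower} to {upper} minutes"


# the five rows printed above
sample = pandas.DataFrame({
    "reporter": [3, 3, 3, 3, 3],
    "contact": [28, 106, 147, 177, 295],
    "time": [2, 1, 4, 1, 4],
})
sample["duration"] = sample["time"].map(describe_duration)
print(sample)
```
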
contact_data = pandas.read_csv(contact_path, delimiter=" ", 
                                  names=["reporter", "contact", "time"])
contact_data = contact_data.dropna()
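
From here the rows can be read as the edges of a network. The following is a minimal sketch, not necessarily how the rest of the analysis proceeds: it assumes a directed graph is appropriate because each report is one-sided, and it uses the five rows printed above in place of the full `contact_data` frame.

```python
import networkx
import pandas

# the five rows printed above, standing in for the full contact_data frame
sample = pandas.DataFrame({
    "reporter": [3, 3, 3, 3, 3],
    "contact": [28, 106, 147, 177, 295],
    "time": [2, 1, 4, 1, 4],
})

# each row becomes an edge pointing from the reporter to the reported contact,
# with the duration code kept as the "time" edge attribute
graph = networkx.from_pandas_edgelist(
    sample,
    source="reporter",
    target="contact",
    edge_attr="time",
    create_using=networkx.DiGraph,
)

print(graph.number_of_nodes(), graph.number_of_edges())
print(graph[3][28]["time"])
```

Keeping the duration code as an edge attribute leaves the option of weighting or filtering edges by contact time later on.
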

End

Citations

  • R. Mastrandrea, J. Fournet, A. Barrat, Contact patterns in a high school: a comparison between data collected using wearable sensors, contact diaries and friendship surveys. PLoS ONE 10(9): e0136497 (2015)