SQL Alchemy Tutorial

This is a walk through the SQL Alchemy Tutorial. There's nothing original here, I'm just trying to re-acquaint myself with SQL Alchemy.

Imports

from sqlalchemy import (
    create_engine,
    Column,
    Integer,
    String,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Create the Engine

This is going to connect to SQLite (in memory).

engine = create_engine("sqlite:///:memory:", echo=True)

The echo parameter tells SQLAlchemy to start logging.

Declare a Mapping

The Base

This creates a base-class that our classes will inherit from.

Base = declarative_base()

A User

First we're going to create a User class that maps to a user-table in the database.

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    password = Column(String)

    def __repr__(self):
	return "<User (name={} fullname={} password={})".format(self.name,
								self.fullname,
								self.password)

Create a Schema

SQL Alchemy automatically creates a Table object for us.

User.__table__
Table('users', MetaData(bind=None), Column('id', Integer(), table=<users>, primary_key=True, nullable=False), Column('name', String(), table=<users>), Column('fullname', String(), table=<users>), Column('password', String(), table=<users>), schema=None)

If you try and print it you will only get the table-name 'users'.

Create the Table

Now we actually create the table using the Base object.

Base.metadata.create_all(engine)
2018-11-30 18:05:27,009 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2018-11-30 18:05:27,010 INFO sqlalchemy.engine.base.Engine ()
2018-11-30 18:05:27,011 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2018-11-30 18:05:27,011 INFO sqlalchemy.engine.base.Engine ()
2018-11-30 18:05:27,013 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("users")
2018-11-30 18:05:27,014 INFO sqlalchemy.engine.base.Engine ()
2018-11-30 18:05:27,015 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE users (
	id INTEGER NOT NULL, 
	name VARCHAR, 
	fullname VARCHAR, 
	password VARCHAR, 
	PRIMARY KEY (id)
)


2018-11-30 18:05:27,016 INFO sqlalchemy.engine.base.Engine ()
2018-11-30 18:05:27,017 INFO sqlalchemy.engine.base.Engine COMMIT

Create a User

bob = User(name="bob", fullname="Bob Dobolina", password="bob spelled backwards")

print(bob.name)
print(bob.fullname)
print(bob.id)
bob
Bob Dobolina
None

Create a Session

So far we've only been working with python objects, you need a session to actually talk to a database.

Session = sessionmaker(bind=engine)
session = Session()

Add Bob to the Session

session.add(bob)

This still doesn't actually add it to the database, we need to commit it. This is useful because you can continue to create objects and update them without continuously hitting the database.

Now we actually do our first communication with the database.

session.commit()
2018-12-03 14:39:51,652 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2018-12-03 14:39:51,655 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, password) VALUES (?, ?, ?)
2018-12-03 14:39:51,656 INFO sqlalchemy.engine.base.Engine ('bob', 'Bob Dobolina', 'bob spelled backwards')
2018-12-03 14:39:51,658 INFO sqlalchemy.engine.base.Engine COMMIT
print("Bob's ID: {}".format(bob.id))
Bob's ID: 1

Even though we didn't specify the ID, it gets added when the entry is committed to the database.

More Users

Instead of adding them one at a time you can pass in a list of users.

session.add_all([
    User(name="ted", fullname="Ted Thompson", password="wordpass"),
    User(name="tom", fullname="Tom Tuttle", password="tacoma")
])
session.commit()
2018-12-03 14:46:48,764 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, password) VALUES (?, ?, ?)
2018-12-03 14:46:48,765 INFO sqlalchemy.engine.base.Engine ('ted', 'Ted Thompson', 'wordpass')
2018-12-03 14:46:48,766 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, password) VALUES (?, ?, ?)
2018-12-03 14:46:48,767 INFO sqlalchemy.engine.base.Engine ('tom', 'Tom Tuttle', 'tacoma')
2018-12-03 14:46:48,768 INFO sqlalchemy.engine.base.Engine COMMIT

Now we can make a query to see what's there.

for row in session.query(User).order_by(User.id):
    print(row.id, row.fullname)
2018-12-03 14:48:53,556 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.password AS users_password 
FROM users ORDER BY users.id
2018-12-03 14:48:53,557 INFO sqlalchemy.engine.base.Engine ()
1 Bob Dobolina
2 Ted Thompson
3 Tom Tuttle

Remote jupyter Sessions With ob-ipython

Introduction

These are my notes about trying to use a remote jupyter session in org-mode (with ob-ipython). My main source was this blog post from vxlabs, and also the ipython instructions for setting up a remote-session. I also referred to the ob-ipython notes on setting up a remote session, although I didn't really understand everything it said until after I had done it once.

The Layout

Both this post and the jupyter server are running on my desktop (Hades) while I'm editing them in emacs on my laptop.

Setting up the Session

The steps are:

  • Find out where the JSON file you will need is going to be put on the server
  • Start the server
  • Copy the json file that was created to your client
  • Start an ipython session to connect to the remote session

Find Out Where the JSON File Will Be

While ssh'd into the server (and with the appropriate virtual environment running if you need it), enter the command to show the folder where the JSON file will be.

jupyter --runtime-dir

In my case this is what came out.

/run/user/1000/jupyter

Start the Server

Change into the directory where you want the jupyter server to run and start the ipython kernel.

ipython kernel

This is what came out for me.

NOTE: When using the `ipython kernel` entry point, Ctrl-C will not work.

To exit, you will have to explicitly quit this process, by either sending
"quit" from a client, or using Ctrl-\ in UNIX-like environments.

To read more about this, see https://github.com/ipython/ipython/issues/2049


To connect another client to this kernel, use:
    --existing kernel-10181.json

That last line refers to the JSON file that we're going to need on the client side (kernel-10181.json).

Copy the File From the Server To the Client

Go back to your client (my laptop in this case) and check where your jupyter installation is.

jupyter --runtime-dir
/run/user/1000/jupyter

Both my laptop and desktop had the same location, so it's probably the default. Now change into that directory on the client and copy the file from the server.

cd /run/user/1000/jupyter
scp Hades:/run/user/1000/jupyter/kernel-10181.json .

Start the Client

You can run this next command anywhere on the client where you can run jupyter.

jupyter console --existing kernel-10181.json --ssh Hades

Where kernel-10181.json is the files copied from the server and Hades is the ssh alias for my server (so it would be hades@erebus without the alias).

When you run this command it will open up an ipython prompt that will be connected to the server. This turns out to be more useful than I thought it would be because the prompt that ob-ipython normally opens seems to be broken when I'm connected to the remote client. Maybe there's something else to configure. Anyway, whatever you add to your emacs ob-ipython namespace will be accessible to you in the ipython prompt so you can fiddle with things live in there.

Once you open this it will add another file (kernel-10181-ssh.json in this case) that you can use to connect if you want to open up more than ipython prompt for some reason:

jupyter console --existing kernel-10181-ssh.json

You don't pass in the --ssh flag this time. Note that you're still sharing the same jupyter session so all the variables and stuff will show up in the second console as well.

But, more importantly, this second json file is how we can connect while in emacs.

Use The Session

To make use of the session you need to pass in the name of the ssh kernel file as the name of the ipython session.

#+BEGIN_SRC ipython :session kernel-10181-ssh.json :results none

#+END_SRC

Anything you do in a block with that session ID will send the commands to the remote server to be interpreted.

A Plot

I'm going to use this example from the seaborn gallery to test out the setup. It might not be obvious from the post itself but the web-server for this post is also on my remote machine so I can tell if it worked by checking the page in a browser (because if it was using an ipython session on my laptop the page wouldn't get the image).

Imports From PyPi

import pandas
import seaborn

Plotting Setup

%matplotlib inline
seaborn.set(style="whitegrid",
	    rc={"axes.grid": False,
		"font.family": ["sans-serif"],
		"font.sans-serif": ["Latin Modern Sans", "Lato"],
		"figure.figsize": (13, 13)},
	    font_scale=1)

Load the brain networks example dataset

data = seaborn.load_dataset("brain_networks", header=[0, 1, 2], index_col=0)

Select a subset of the networks.

used_networks = [1, 5, 6, 7, 8, 12, 13, 17]
used_columns = (data.columns.get_level_values("network")
		.astype(int)
		.isin(used_networks))
data = data.loc[:, used_columns]

Create a categorical palette to identify the networks.

network_palette = seaborn.husl_palette(8, s=.45)
# I have no idea what the network_lut is.
network_lut = dict(zip(map(str, used_networks), network_palette))

Convert the palette to vectors that will be drawn on the side of the matrix.

networks = data.columns.get_level_values("network")
network_colors = pandas.Series(networks, index=data.columns).map(network_lut)

Plot.

cluster = seaborn.clustermap(data.corr(), center=0, cmap="vlag",
			     row_colors=network_colors,
			     col_colors=network_colors,
			     linewidths=.75)
title = cluster.fig.suptitle("Brain Networks")

cluster_map.png

Besides the ipython shell not working in emacs I also couldn't get it to render images directly in emacs, but they did render on the remote side so I could see the plot in my web-browser.

One More Thing

I stumbled over this a couple of times so I thought I should mention that the virtualenv you activate in emacs has to be on your client (the laptop in my case), so to make it easier you should set it before opening the remote document via tramp. If you try to activate the virtualenv while in tramp it will attempt to use the remote machine's (server's) virtualenv. You can see that it's the case in the mini-buffer, but I think being prompted like that made me pick the wrong side. Use the virtualenv on the machine you're working on.

Using Nvidia Drivers in Ubuntu 18.10

What is this about?

I had previously found that after upgrading from Ubuntu 18.04 to 18.10 my computer would hang whenever I re-booted. I originally "solved" it by purging the nvidia drivers, but I had actually set out to try and get cuda running this weekend (figuring out the nvidia problem took a day and a half so maybe next weekend) so I wanted a way to get the drivers working. Here's how I did it.

What didn't work?

Disabling the initial login

Since I have disk-encrpytion set up there's an initial login screen to enter the decryption passphrase. This never seemed to hang, but if you google 'disable login screen' or something similar this is what you get. Disabling it basically confirmed that it wasn't the problem. For the record, you do this by editing /etc/default/grub and changing the line:

GRUB_CMDLINE_LINUX_DEFAULT = "quiet splash"

To:

GRUB_CMDLINE_LINUX_DEFAULT = ""

Revert the kernel and driver version

Since I had the nvidia drivers working before I upgraded to Ubuntu 18.10 I decided to try using the same driver version that I had used and running the previous kernel using the Grub Menu. No go the bogeyman.

So what did work?

I don't know why other people do it, but this Stack Overflow post tells you how to disable the GUI. It's basically a one-liner.

sudo systemctl set-default multi-user.target

This will cause the machine to boot up without the GUI running so you can bypass the user-login screen (which seems to be what is causing the problem for me) and login at the command-prompt. The post suggested using systemctl to start the GUI, but that didn't work for me (Ubuntu might not be using lightdm anymore, which is what the post refers to). Instead I entered startx and here I am.

lshw -c video
*-display
     description: VGA compatible controller
     product: GM206 [GeForce GTX 950]
     vendor: NVIDIA Corporation
     physical id: 0
     bus info: pci@0000:01:00.0
     version: a1
     width: 64 bits
     clock: 33MHz
     capabilities: vga_controller bus_master cap_list rom
     configuration: driver=nvidia latency=0
     resources: irq:41 memory:fd000000-fdffffff memory:d0000000-dfffffff memory:ce000000-cfffffff ioport:dc00(size=128) memory:c0000-dffff

Recovering From the Ubuntu 18.10 Upgrade

Note: I had it hang again on rebooting so there's an update at the end.

What is this about?

I updated from Ubuntu 18.04 to Ubuntu 18.10 (Cosmic Cuttlefish) last night and upon rebooting found that I couldn't (finish rebooting, that is). In order to write this I had to recover from the problem so I don't have a screen-grab of the error, but it was basically the same one given in this Stack Overflow post:

WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Volume group "ubuntu-vg" not found
Cannot process volume group ubuntu-vg
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
Reading all physical volumes.  This may take a while...
Found volume group "ubuntu-vg" using metadata type lvm2
WARNING: Failed to connect to lvmetad. Falling back to device scanning.
2 logical volume(s) in volume group "ubuntu-vg" now active /dev/mapper/ubuntu--vg-root: clean, 1180656/30154752 files, 16007858/120604672 blocks

What didn't work?

If you read the Stack Overflow post I linked to above, the solution was to downgrade the kernel version using the grub loader so, following that advice, I brought up the Grub menu by holding down the shift key while the machine booted up, then tried to move the cursor down to Advanced in the menu using the arrow keys but for some reason it wouldn't move. I thought maybe it was my Ultimate Hacking Keyboard so I went and grabbed my Vortex keyboard instead and it still wouldn't work, and at this point some people might think that it probably was not the keyboard, but nope, I went and grabbed my Redragon keyboard and all of a sudden it worked. Why did my $30 keyboard work better than my $300 keyboard? And no, I don't think I paid that much when I bought the Ultimate Hacking Keyboard (but it took something like two years to ship so I probably spent more than that in lost time). Well, anyway, it turned out that I plugged the last keyboard in the left USB slot while my other keyboards were in the right USB slot and for some reason the right slot was disabled when I went into the Grub Menu. No mention of that on Stack Overflow, unfortunately.

What did work?

While I was googling around I found this post on Stack Overflow where it was mentioned that the solution did mess with the kernel but it also mentions that he purged his nvidia drivers, which struck a chord with me because I've had problems with the nvidia drivers on my computer at work whenever I upgrade so I thought that maybe this might be something to check out first, before messing with the kernel. The post suggested using ctrl-alt-f3 to get into a terminal which didn't work for me for some reason so I used the Grub Menu instead. Here's the steps for my future self.

  1. Hold the shift key down while the machine boots up
  2. In the Grub Menu use the arrow keys and return to choose the Advanced option
  3. Choose the (recovery mode) version of the current kernel (4.18.0)
  4. In the Recovery Menu choose the root Drop to root shell prompt option (you have to hit enter again at the next prompt)
  5. The file-system for Recovery Mode is in read-only mode so mount the drive with =mount –options remount,rw /" (you can do this from the Recovery Menu as well, but it's an extra step either way)
  6. Purge the drivers with apt purge nvidia-*
  7. Type exit then back out of Recovery Mode and finish the startup

I got the stuff about removing the nvidia driver from this Stack Overflow post. It took a longer time than I thought it would to start up, but I'm writing this from my machine so it seems to work.

Anything Else?

ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001402sv00001043sd00008581bc03sc00i00
vendor   : NVIDIA Corporation
model    : GM206 [GeForce GTX 950]
driver   : nvidia-driver-390 - distro non-free recommended
driver   : xserver-xorg-video-nouveau - distro free builtin


This card actually tended to freeze when I used the open source video drivers that come with Ubuntu so not having the Nvidia drivers seemed like a bad idea. I'm hoping that re-installing them will fix whatever bad setup values were there so I did:

sudo ubuntu-drivers autoinstall

Cross your fingers.

Update

First, I didn't mention it above (because I hadn't noticed) but the bluetooth interface wouldn't start. The GUI actually let me click on the on-switch but nothing changed and when I looked in Journalctl I could see that it was raising errors. Even hciconfig couldn't bring it up. Maddeningly, there are posts out there that showed the same errors that I saw (like this one) but not only did none of the fixes they suggested work, but once I rebooted the problem went away, so I could have tried that before troubleshooting, and avoided all the hassle (maybe).

Of course, every solution creates a problem, and while rebooting fixed my bluetooth interface it also revealed that I had the same problems as before (only my Redragon keyboard really works on the Grub menu and the boot hangs after the initial login to decrypt the disk). I removed the nvidia drivers again and this time I didn't reinstall them before rebooting and now it works, but now I don't know if my GUI is going to start hanging again like it used to. Oh, well, one problem at a time.

Categorical Plotting

Imports

From Python

from functools import partial

From pypi

from tabulate import tabulate
import matplotlib.pyplot as pyplot
import numpy
import pandas
import seaborn

Set Up

The Plotting

%matplotlib inline
seaborn.set(style="whitegrid", color_codes=True)
FIGURE_SIZE = (14, 12)

The Tables

table = partial(tabulate, headers="keys",
		tablefmt="orgtbl")

The Data

This is just some fake stuff to test it out. We're going to simulate how participants in a survey scored two products on a scale from -3 to 3.

Product One

countries = ["china"] * 50 + ["india"] * 20 + ["japan"] * 2
options = [-3, -2, -1, 0, 1, 2, 3]
scores = pandas.Series(numpy.random.choice(options, size=len(countries)), dtype="category")
one_hot = pandas.get_dummies(scores, "Alpha", " ")
data = pandas.DataFrame.from_dict(dict(country=countries),
				  dtype="category")
data = pandas.concat([data, one_hot], axis="columns")
print(table(data.head(), showindex=False))
country Alpha -3 Alpha -2 Alpha -1 Alpha 0 Alpha 1 Alpha 2 Alpha 3
china 0 0 0 0 0 0 1
china 0 0 0 1 0 0 0
china 1 0 0 0 0 0 0
china 0 1 0 0 0 0 0
china 0 0 0 1 0 0 0

Product Two

countries = ["china"] * 20 + ["india"] * 30 + ["japan"] * 25
options = [-3, -2, -1, 0, 1, 2, 3]
scores = pandas.Series(numpy.random.choice(options, size=len(countries)), dtype="category")
one_hot = pandas.get_dummies(scores, "Beta", " ")
data_2 = pandas.DataFrame.from_dict(dict(country=countries),
				    dtype="category")
data_2 = pandas.concat([data_2, one_hot], axis="columns")
print(table(data_2.head()))
country Beta -3 Beta -2 Beta -1 Beta 0 Beta 1 Beta 2 Beta 3
china 0 0 0 1 0 0 0
china 0 0 0 0 0 0 1
china 0 1 0 0 0 0 0
china 0 0 0 1 0 0 0
china 1 0 0 0 0 0 0

Grouping

grouped = data.groupby("country").sum()
print(table(grouped))
country Alpha -3 Alpha -2 Alpha -1 Alpha 0 Alpha 1 Alpha 2 Alpha 3
china 6 8 3 9 4 11 9
india 3 3 3 3 4 1 3
japan 0 0 1 0 0 0 1
grouped_2 = data_2.groupby("country").sum()
print(table(grouped_2))
country Beta -3 Beta -2 Beta -1 Beta 0 Beta 1 Beta 2 Beta 3
china 1 5 4 2 3 1 4
india 5 2 10 2 3 4 4
japan 5 4 2 3 3 5 3

Concatenate our data

figure, axe = pyplot.subplots(figsize=FIGURE_SIZE)
axe.set_title("Country vs Score")
axe.set_ylabel("Score")
with seaborn.color_palette("Reds", 7):
    axe = grouped_2.plot.bar(ax=axe)
with seaborn.color_palette("Blues", 7):
    axe = grouped.plot.bar(ax=axe)
labels = axe.set_xticklabels(grouped.index.unique())

barplot.png

figure, axe = pyplot.subplots(figsize=FIGURE_SIZE)
axe.set_title("Country vs Score")
axe.set_ylabel("Score")
with seaborn.color_palette("Reds", 7):
    axe = grouped_2.plot.bar(ax=axe, stacked=True)
with seaborn.color_palette("Blues", 7):
    axe = grouped.plot.bar(ax=axe, stacked=True)
labels = axe.set_xticklabels(grouped.index.unique())

barplot_2.png

Scaled

alpha_scaled = (grouped.T/grouped.sum(axis="columns").values).T
print(table(alpha_scaled))
country Alpha -3 Alpha -2 Alpha -1 Alpha 0 Alpha 1 Alpha 2 Alpha 3
china 0.12 0.16 0.06 0.18 0.08 0.22 0.18
india 0.15 0.15 0.15 0.15 0.2 0.05 0.15
japan 0 0 0.5 0 0 0 0.5
beta_scaled = (grouped_2.T/grouped_2.sum(axis="columns").values).T
print(table(beta_scaled))
country Beta -3 Beta -2 Beta -1 Beta 0 Beta 1 Beta 2 Beta 3
china 0.05 0.25 0.2 0.1 0.15 0.05 0.2
india 0.166667 0.0666667 0.333333 0.0666667 0.1 0.133333 0.133333
japan 0.2 0.16 0.08 0.12 0.12 0.2 0.12
figure, axe = pyplot.subplots(figsize=FIGURE_SIZE)
axe.set_title("Country vs Score")
axe.set_ylabel("Score")
with seaborn.color_palette("Reds", 7):
    axe = alpha_scaled.plot.bar(ax=axe)
with seaborn.color_palette("Blues", 7):
    axe = beta_scaled.plot.bar(ax=axe)
labels = axe.set_xticklabels(grouped.index.unique())

alpha_scaled.png

figure, axe = pyplot.subplots(figsize=FIGURE_SIZE)
axe.set_title("Country vs Score")
axe.set_ylabel("Score")
with seaborn.color_palette("Reds", 7):
    axe = alpha_scaled.plot.bar(ax=axe, stacked=True)
with seaborn.color_palette("Blues", 7):
    axe = beta_scaled.plot.bar(ax=axe, stacked=True)
labels = axe.set_xticklabels(grouped.index.unique())

stacked_scaled.png

Well, I guess I need to work on making the reds visible, but I'm out of time.

Date Mean Squared Error

What is this?

This is a short sketch to figure out how to group a bunch of values by month and calculate the Root-Mean-Squared-Error (RMSE) for the mean for the values in that month. This probably isn't the most efficient way to do this, but I'm trying to double check everything as I go and doing the typical Train Wreck like you see in most examples on Stack Overflow.

Imports

From PyPi

Just pandas.

import pandas

The Data

I'm going to create some simple values so that it's easy(ish) to do the math by hand and double-check what comes out. I'll use the pandas Timestamp for the dates. I'm still not one-hundred percent sure why it's better than date-time, but hopefully it's optimized or something.

data = {"date": [
    pandas.Timestamp("2018-09-01"),
    pandas.Timestamp("2018-09-05"),
    pandas.Timestamp("2018-09-05"),
    pandas.Timestamp("2018-10-01"),
    pandas.Timestamp("2018-10-05"),
		 ],
	"value": [1, 2, 3, 1, 2]}
frame = pandas.DataFrame.from_dict(data)

I'm going to use pandas' resample method to group the data by months. the resample method expets the data to have the dates as the index, so I'm going to create a new frame by setting the index to the date-column.

date_frame = frame.set_index("date")

The Mean

The value I'm going to use to estimate the values for each month is the mean.

monthly = date_frame.resample("M")
means = monthly.mean()
print(means)
assert all(means.value == [2.0, 1.5])
            value
date             
2018-09-30    2.0
2018-10-31    1.5

Getting the Mean Back Into the Frame

Now that we have the monthly means, I want to re-add them to the original data-frame by giving them a common column named year_month (using apply) so I can broadcast the means by merging the two data-frames.

frame["year_month"] = frame.date.apply(
    lambda date: pandas.Timestamp(year=date.year,
				  month=date.month, day=1))
print(frame.head())
        date  value year_month
0 2018-09-01      1 2018-09-01
1 2018-09-05      2 2018-09-01
2 2018-09-05      3 2018-09-01
3 2018-10-01      1 2018-10-01
4 2018-10-05      2 2018-10-01

mean_frame = means.reset_index()
mean_frame["year_month"] = mean_frame.date.apply(
    lambda date: pandas.Timestamp(year=date.year,
				  month=date.month,
				  day=1))
print(mean_frame)
        date  value year_month
0 2018-09-30    2.0 2018-09-01
1 2018-10-31    1.5 2018-10-01

The value column in the mean_frame is actually the mean of the values for that month so I'll re-name it before I forget.

mean_frame.rename(dict(value="mean"), axis="columns",
		  inplace=True)
print(mean_frame)
        date  mean year_month
0 2018-09-30   2.0 2018-09-01
1 2018-10-31   1.5 2018-10-01

Now I'll merge the two data frames on the year_month column using the default inner-join (intersection) method.

merged = frame.merge(mean_frame, on="year_month")
del(merged["date_y"])
merged.rename(dict(date_x="date"), axis="columns", inplace=True)
print(merged)
assert all(merged["mean"] == [2, 2, 2, 1.5, 1.5])
        date  value year_month  mean
0 2018-09-01      1 2018-09-01   2.0
1 2018-09-05      2 2018-09-01   2.0
2 2018-09-05      3 2018-09-01   2.0
3 2018-10-01      1 2018-10-01   1.5
4 2018-10-05      2 2018-10-01   1.5

Note that I had to use the merged["mean"] form because the data-frame has a mean method which the dot-notation (merged.mean) would call instead of grabbing the column.

Calculating the RMSE

Error

Since I'm estimating the values for each month using the mean the error is the difference between the mean and each of the values.

merged["error"] = merged["value"] - merged["mean"]
print(merged)
assert all(merged.error==[-1, 0, 1, -.5, .5])
        date  value year_month  mean  error
0 2018-09-01      1 2018-09-01   2.0   -1.0
1 2018-09-05      2 2018-09-01   2.0    0.0
2 2018-09-05      3 2018-09-01   2.0    1.0
3 2018-10-01      1 2018-10-01   1.5   -0.5
4 2018-10-05      2 2018-10-01   1.5    0.5

Error Squared

Now I'll square the error to get rid of the negative error values (which would cancel each other out when we take the mean errors) and to make the effect of the errors non-linear (the errors are exagerrated).

merged["error_squared"] = merged.error.pow(2)
print(merged)
        date  value year_month  mean  error  error_squared
0 2018-09-01      1 2018-09-01   2.0   -1.0           1.00
1 2018-09-05      2 2018-09-01   2.0    0.0           0.00
2 2018-09-05      3 2018-09-01   2.0    1.0           1.00
3 2018-10-01      1 2018-10-01   1.5   -0.5           0.25
4 2018-10-05      2 2018-10-01   1.5    0.5           0.25

Mean Squared Error

So now we take the mean of our squared errors to get an initial estimate of how much we are off each month.

mean_of = merged.groupby("year_month").mean()
print(mean_of.error_squared)
year_month
2018-09-01    0.666667
2018-10-01    0.250000
Name: error_squared, dtype: float64

RMSE

Since the squared error would have units squared, I'll take the root of it to get a more interpretable estimate of the error.

print(mean_of.error_squared.pow(.5))
year_month
2018-09-01    0.816497
2018-10-01    0.500000
Name: error_squared, dtype: float64

Python from G to Whatever

Table of Contents

Introduction

This freecodecamp article has some references to python stuff the author likes. Not all of it is of interest to me, but some of it is, so this is just my dump of the links I like.

Links

Geopy

geopy allows you to use python to find out geographic information for locations. It provides a common front end for different services like Google Maps, Bing, etc.

Inspect

inspect is a python built-in that helps you take advantage of python's introspection features. Besides looking at other people's code, it shows you how well (or not) you are providing information for other coders using your code.

Newspaper

I haven't actually used newspaper but it looks interesting.

sh

sh is a subprocess replacement. I've always been kind of leery about using python to run shell commands, but this at least makes it look nicer.

Wikipedia

Although wikipedia is one of the classic web-scraping examples, there's actually a python library to pull articles more directly.

Ubertooth 2018

Introduction

One of the problems of relying on search engines is that the more obscure the subject you are interested in, the greater the chance that what you'll get will be either unlocatable or outdated. These are my notes on setting up the ubertooth on an ubuntu system in September of 2018, which turned out to be a slightly different experience from what I went through earlier in this year so I thought I should try and make a record of it somewhere.

The Current Release

The current release for the Ubertooth firmware and code is 2018-08-R1. Download the ubertooth-2018-08-R1.tar.xz file from that page and untar it. You will also want the matching libbtb code.

Building and Installing the Code

For some reason the instructions on the home page assume you already have the code built and installed, but there's actually some previous steps you need to do.

Install the Dependencies

sudo apt install cmake libusb-1.0-0-dev make gcc g++ libbluetooth-dev \
pkg-config libpcap-dev python-numpy python-pyside python-qt4

The python-<packages> can be downloaded from pip, and you really don't need the pyside and qt4 libraries unless you want to run the spectral scanner, which is kind of their hello-world, which is nice, but the pyside code is actually depracated because qt5 is the current version and it won't work with python 3.6 (the current ubuntu default) so I personally would skip it. numpy is always a good idea no matter what you end up doing.

Build libbtbb

libbtb is the Bluetooth Baseband Library which you need for ubertooth to decode bluetooth packets. Change into the folder you downloaded earlier and build and install it.

mkdir build
cd build
cmake ..
make
sudo make install
[ 88%] Built target btbb
[ 88%] Built target btaptap
[100%] Built target pcapdump
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/lib/pkgconfig/libbtbb.pc
-- Installing: /usr/local/lib/libbtbb.so.1.0
-- Installing: /usr/local/lib/libbtbb.so.1
-- Installing: /usr/local/lib/libbtbb.so
-- Installing: /usr/local/include/btbb.h
running build
running build_py
running install
running install_lib
creating //usr/local/lib/python2.7/site-packages/pcapdump
copying /home/dogen/Downloads/libbtbb-2018-08-R1/build/python/pcaptools/build/lib.linux-x86_64-2.7/pcapdump/pcapdump.py -> //usr/local/lib/python2.7/site-packages/pcapdump
copying /home/dogen/Downloads/libbtbb-2018-08-R1/build/python/pcaptools/build/lib.linux-x86_64-2.7/pcapdump/__init__.py -> //usr/local/lib/python2.7/site-packages/pcapdump
byte-compiling //usr/local/lib/python2.7/site-packages/pcapdump/pcapdump.py to pcapdump.pyc
byte-compiling //usr/local/lib/python2.7/site-packages/pcapdump/__init__.py to __init__.pyc
running install_egg_info
Writing //usr/local/lib/python2.7/site-packages/pcapdump-0.0.0-py2.7.egg-info
-- Installing: /usr/local/bin/btaptap

Now run ldconfig.

sudo ldconfig

Ubertooth Tools

Do the same thing to build the tools. First untar the file you downloaded, change into the host folder (ubertooth-2018-08-R1/host) and build and install the code.

mkdir build
cd build/
cmake ..
make
sudo make install
sudo ldconfig

You should now have some command-line tools that start with ubertooth- (e.g. ubertooth-rx).

Privileges

You can use setcap to make it so that you don't need to run things as root. This is how to make it so you can run ubertooth-scan.

sudo setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' (which ubertooth-scan)

Update the Firmware

One of the commands installed in the previous step was ubertooth-dfu which is the Do Firmware Update command that we need to update the firmware. First insert the ubertooth USB dongle, then change into the ubertooth-2018-08-R1/ubertooth-one-firmware-bin folder and run the update command.

Warning: According to the documentation you need to make sure to have an antenna on the ubertooth before inserting it. Also, it too a couple of attempts inserting, removing, and re-inserting the ubertooth before it worked for some reason (you can tell it's ready when there are two green LEDs and one red LED lit up).

ubertooth-dfu -d bluetooth_rxtx.dfu -r

As I mentioned before, the GUI for the spectral analysis is out of date and didn't install in my setup, but you can check that the ubertooth is working by just running the spectral analysis without a GUI.

ubertooth-specan

This should send a timestamp, frequency, RSSI CSV to stdout as the ubertooth sweeps the channels and shows the amount of electromagnetic radiation it sees on each.

Spectools

Since we can't run the python spectrum analyzer, we should turn to a better alternative, spectools. This was a project to mostly re-create the metageek wi-spy GUI but it supports the ubertooth. It actually looks like a dead project (the last News item on their site was from 2012), but it works for now. Luckily ubuntu has this in the repositories so you should be able to just install it with apt.

sudo apt install spectools

Once you install it you can either run spectool_curses which will run at the command line but only has one view, or spectool_gtk which will run in an X-windows environment and show three views of the 2.4 GHz spectrum.

Kismet

Now we come to the interesting installation. At this point you probably have all the command-line utilities you need, but kismet provides a convenient interface to start capturing packets. Unfortunately it is going through a major upgrade which makes it incompatible with the ubertooth plugin for it (maybe, there's two plugins and one of them might work, but they don't seem to document it (or much else, for that matter)).