Updating a Nikola Shortcode Plugin

Source: Brian (CC License)/Unmodified

Beginning

These are some notes to help me remember what I did to alter a Nikola shortcode plugin. A while back I looked into using their doc reStructured Text extension ("role"?) which automatically creates links to other posts by looking up the slug or title that you pass to it, but since I switched to using org-mode instead of restructured text I couldn't use it. Then I found out the other day that they built a shortcode that allows you to use it in other formats and when I tried it it worked - mostly.

A tag that looks like this:

{{% doc %}}2-nikola-inter-site-links{{% /doc %}}

Gets turned into this: Nikola Inter-Site Links.

The problem is that, while the doc shortcode supports giving the link an alternate title, it expects restructured-text formatting, e.g. `alternate title<some-slug>`. Why is this a problem? I didn't look into where it happens, but when the text within the {{% doc %}} tags gets sent to the shortcode code (meaning the python plugin code) for processing, the angle brackets (< and >) are converted to HTML character entities (e.g. the < becomes &lt;), while the regular expression that extracts the alternate title is looking for the literal angle brackets (presumably the role is processed differently from the shortcode). I was trying to figure out some workarounds, but then I looked at the code and it looked fairly easy to customize, so I decided to do that, both to learn how and to have a way to use alternative titles in org-mode. So, here it goes.
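To make the mismatch concrete, here's a small sketch using only the standard library. The pattern is a hypothetical stand-in for the title-and-slug expression (it is not copied from nikola), and html.escape stands in for whatever does the escaping before the handler sees the text.

```python
import html
import re

# Hypothetical pattern for "alternate title <slug>"; an illustration only,
# not the expression nikola itself uses.
TITLE_AND_SLUG = re.compile(r"^(?P<title>.+?)\s*<(?P<slug>[^<>]+)>$")

raw = "alternate title <some-slug>"
# html.escape stands in for whatever escapes the brackets upstream
escaped = html.escape(raw)

raw_match = TITLE_AND_SLUG.match(raw)
escaped_match = TITLE_AND_SLUG.match(escaped)

print(raw_match.groupdict())  # the title and slug are found
print(escaped)                # the brackets are now entities
print(escaped_match)          # so the same pattern finds nothing
```

Once the brackets have become entities there is nothing left for a bracket-based pattern to latch onto, which is why passing the title as a separate argument sidesteps the problem.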

Middle

I'm going to call my shortcode lancelot. I was going to call it link, but that seems like it might clash with something else, either now or in the future, so I decided to name it after Lancelot Link, Secret Chimp instead. Hopefully that won't end up conflicting with anything.

The Plugin File

The first thing I did was poke around in the nikola folders for something I could copy. I decided to use the existing gist shortcode to start with. Why not use doc? Because it's in a different folder and inherits from the RestExtension, which didn't seem like what I wanted (although maybe that's what makes the angle brackets work), so the gist seemed like a good place to start. They appear to be using yapsy for the plugin system, which requires two files for your plugin: a python file where you define the plugin and a plugin-info file (which looks like an ini file). So to start I copied nikola/plugins/shortcode/gist.plugin, renamed it lancelot.plugin, and edited it to look like this.

[Core]
name = lancelot
module = lancelot

[Nikola]
PluginCategory = Shortcode

[Documentation]
author = The Cloistered Monkey
version = 0.1
website = https://necromuralist.github.io/
description = Variant of the doc shortcode that allows alternate titles for non-restructured text formats.
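Since the plugin-info file looks like an ini file, the standard library's configparser can read it, which makes for a quick sanity check before nikola tries to load the plugin. This is only a sketch: yapsy does its own parsing, and the check here (that the two sections exist and name a module and a category) is an assumption about what matters.

```python
from configparser import ConfigParser

# The [Core] and [Nikola] sections from lancelot.plugin
PLUGIN_INFO = """\
[Core]
name = lancelot
module = lancelot

[Nikola]
PluginCategory = Shortcode
"""

parser = ConfigParser()
parser.read_string(PLUGIN_INFO)

# A KeyError here would mean a section or option is missing
module = parser["Core"]["module"]
category = parser["Nikola"]["PluginCategory"]
print(f"{module}.py should define a {category} plugin")
```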

Seems easy enough. Now to the plugin code.

The Plugin Code

According to the documentation on creating a shortcode, the requirements are that you create a plugin that inherits from the ShortcodePlugin (you'll find it in the nikola/plugin_categories.py file if you want to check it out) and define the handler method that handles the shortcode and returns a tuple of (output, dependencies). The output is the text that will replace the shortcode in the document and the dependencies is a list of files that nikola will use to check if something is out of date (I don't use it here).

There are three required named arguments for the handler method:

  • site: an object that you can use to check the state of the site
  • data: The text between the shortcode tags (if it uses both opening and closing tags)
  • lang: the current language

In addition, any attributes added within the shortcode tag will be passed into the handler method by position or keyword. Anyway, since I copied the doc code I didn't actually read this until just now, but maybe it's good to know. Onward.
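As a rough illustration of "by position or keyword", here's a stand-in handler (a plain function, not a real ShortcodePlugin method) called both ways. The calls imitate what nikola is described as doing above; they are not taken from its code.

```python
def handler(title=None, site=None, data=None, lang=None):
    """Stand-in handler that just reports what it was given."""
    output = f"title={title!r} data={data!r} lang={lang!r}"
    return output, []  # (output, dependencies)


# attribute given by keyword: {{% lancelot title="Alt" %}}slug{{% /lancelot %}}
by_keyword, _ = handler(title="Alt", site=None, data="slug", lang="en")

# attribute given by position: {{% lancelot "Alt" %}}slug{{% /lancelot %}}
by_position, _ = handler("Alt", site=None, data="slug", lang="en")

print(by_keyword)
print(by_position)
```

Putting the optional attribute first in the signature is what lets the positional form work, since site, data and lang always arrive as named arguments.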

The next thing to do was to create lancelot.py in the same folder as the lancelot.plugin file. The basic definition of the class (and the start of the file) looks more-or-less the same for all the plugins.

# -*- coding: utf-8 -*-
# This file is public domain according to its author, the Cloistered Monkey

"""Shortcode for non-restructured text inter-site links."""

from nikola.plugin_categories import ShortcodePlugin


class Plugin(ShortcodePlugin):
    """Plugin for non-rst inter-site links."""

    name = "lancelot"

So, other than some doc-strings all you have to do is set the name and then the handler, which we'll do next.

Stealing From the doc

The file that I stole most of the code from is located in nikola/plugins/compile/rest/doc.py. The Plugin class in that file is handling both the restructured text role and the shortcode so we don't need all of it. According to the documentation I linked to above, the set_site method is used to tell nikola to use something other than the handler method, and in this case the author used it to register the functions for the two things it's handling.

def set_site(self, site):
    """Set Nikola site."""
    self.site = site
    roles.register_canonical_role('doc', doc_role)
    self.site.register_shortcode('doc', doc_shortcode)
    doc_role.site = site
    return super(Plugin, self).set_site(site)

If you look at the line self.site.register_shortcode('doc', doc_shortcode) you can figure out that we need to steal from a function named - wait for it… doc_shortcode. Here's what's in that function:

def doc_shortcode(*args, **kwargs):
    """Implement the doc shortcode."""
    text = kwargs['data']
    success, twin_slugs, title, permalink, slug = _doc_link(text, text, LOGGER)
    if success:
        if twin_slugs:
            LOGGER.warning(
                'More than one post with the same slug. Using "{0}" for doc shortcode'.format(permalink))
        return '<a href="{0}">{1}</a>'.format(permalink, title)
    else:
        LOGGER.error(
            '"{0}" slug doesn\'t exist.'.format(slug))
        return '<span class="error text-error" style="color: red;">Invalid link: {0}</span>'.format(text)

It looks pretty straight-forward except it's using two things not defined within it - LOGGER and _doc_link. The LOGGER is just an import so we can just change the start of our file to grab it. The _doc_link is a function in the same file as doc_shortcode. My first thought for the _doc_link was that since it's a standalone function I could just import it and call it. That turned out to have a small problem though - right in the middle of _doc_link is this for loop:

for p in doc_role.site.timeline:
    if p.meta('slug') == slug:
        if post is None:
            post = p
        else:
            twin_slugs = True
            break

What you'll notice is that the doc_role function has an attribute named site. Well, it doesn't really, until it's set in that set_site method above. So maybe I could figure out some way to set it, or maybe not. Even if I could, it seems like it would get kind of convoluted, and who knows what changes the original author might make in the future, so it made more sense to re-implement it myself.
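If attributes on functions are unfamiliar, this little sketch shows the behavior. The doc_role here is a stand-in that only reads its own attribute, not the real role function.

```python
def doc_role():
    """Stand-in for the real role function; it only reads its own attribute."""
    return doc_role.site


# Before anything sets the attribute, reading it fails
try:
    doc_role()
    failed_before = False
except AttributeError:
    failed_before = True

# This is the kind of assignment set_site makes with the real site object
doc_role.site = "the site object"
after = doc_role()

print(failed_before)
print(after)
```

So importing _doc_link on its own would only work if something had already run that assignment, which is the coupling the re-implementation avoids.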

The Plugin

So, first a different start to our file, this time with the LOGGER and slugify imported (the slugify was in _doc_link which I'm re-defining later).

# -*- coding: utf-8 -*-
# This file is public domain according to its author, the Cloistered Monkey

"""Shortcode for non-restructured text inter-site links.
Re-write of the ``doc`` plugin to allow alternative titles outside of RST"""

from nikola.plugin_categories import ShortcodePlugin
from nikola.utils import LOGGER, slugify


class Plugin(ShortcodePlugin):
    """Plugin for non-rst inter-site links."""

    name = "lancelot"

lancelot_link

To replace the _doc_link I made a new function called lancelot_link which I'll be looking at in parts. First up is the function declaration.

def lancelot_link(site, slug, title):
    """process the slug, check if it exists or is duplicated

    if `title` is None this will grab the post-title

    Args:
     site: the Nikola object
     slug: the text between the shortcode tags
     title: the title passed in by the user (if any)

    Returns:
     tuple (success, has duplicate slugs, title, permalink, slug)
    """

The interface wants the objects that nikola passes into the handler method - the site object, the data (renamed slug) and the title.

Note: I'm not indenting the rest of the code in the post but imagine it's indented four spaces.

  • Slugification

    The first thing the function does is split off any fragment that might have been attached and "slugify" the slug (make sure it's lowercase ASCII, with spaces turned into hyphens and most punctuation removed).

    if '#' in slug:
        slug, fragment = slug.split('#', 1)
    else:
        fragment = None
    slug = slugify(slug)
    
  • Find the Post

    Next it checks to see if the slug refers to an actual page on the site and if there are duplicate slugs. If the page doesn't exist, then it short-circuits the function so the handler can return some error text instead of a link. If it does exist it saves the post object for the next step (using the first post in the timeline if there were duplicates).

    twin_slugs = False
    post = None
    for p in site.timeline:
        if p.meta('slug') == slug:
            if post is None:
                post = p
            else:
                twin_slugs = True
                break
    
    if post is None:
        return False, False, title, None, slug
    
  • The Title

    If the user didn't pass in an alternative title this grabs the title that was given to the post we're linking to.

    Note: The doc shortcode raises then catches a ValueError exception if there's no matching post. I had thought that this was for logging, but that doesn't appear to be the case so I took it out. But since I don't know what it was doing in the first place I might be breaking something. Not that I can tell, though.

    if title is None:
        title = post.title()
    
  • The Permalink

    Now we grab the permalink.

    permalink = post.permalink()
    if fragment:
        permalink += '#' + fragment
    
  • The Return

    And finally we do the return dance to answer some questions for the handler:

    • Did we find the post?
    • Were there duplicate posts with the same slug?
    • What's the text to display for the link?
    • What's the address for the anchor tag?
    • What's the correct slug?

    return True, twin_slugs, title, permalink, slug
    

    The slug is only for the logging messages.
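Put together, the pieces above can be exercised without building a site. In this sketch FakePost, its permalink layout, the "#middle" fragment, and the simplified slugify are all stand-ins made up for testing; only the body of lancelot_link comes from the post.

```python
import re
from types import SimpleNamespace


def slugify(text):
    """Simplified stand-in for nikola.utils.slugify (an assumption)."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


class FakePost:
    """Made-up post offering only the methods lancelot_link calls."""

    def __init__(self, slug, title):
        self._slug = slug
        self._title = title

    def meta(self, key):
        return {"slug": self._slug}[key]

    def title(self):
        return self._title

    def permalink(self):
        # hypothetical URL layout
        return f"/posts/{self._slug}/"


def lancelot_link(site, slug, title):
    """Same steps as above: slugify, find the post, title, permalink."""
    if "#" in slug:
        slug, fragment = slug.split("#", 1)
    else:
        fragment = None
    slug = slugify(slug)

    twin_slugs = False
    post = None
    for p in site.timeline:
        if p.meta("slug") == slug:
            if post is None:
                post = p
            else:
                twin_slugs = True
                break

    if post is None:
        return False, False, title, None, slug

    if title is None:
        title = post.title()

    permalink = post.permalink()
    if fragment:
        permalink += "#" + fragment
    return True, twin_slugs, title, permalink, slug


site = SimpleNamespace(timeline=[
    FakePost("2-nikola-inter-site-links", "Nikola Inter-Site Links"),
])
found = lancelot_link(site, "2-nikola-inter-site-links#middle", None)
missing = lancelot_link(site, "this-error-is-on-purpose", "Oops.")
print(found)
print(missing)
```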

The handler

Now I'll define the handler that's called by nikola. This is a method of the Plugin class that I started above, but I'm showing it after the lancelot_link function since it mostly just calls lancelot_link and creates the output from what it returned. I originally had it all in the same method (and maybe I'll put it back at some point). But I thought it was a little easier to read this way, especially as I referred back to the original doc plugin to see what's going on.

def handler(self, title=None, site=None, data=None, lang=None):
    """Create an inter-site link

    Args:
     title: optional argument to specify a different title from the post

    Returns:
     output HTML to replace the shortcode
    """
    success, twin_slugs, title, permalink, slug = lancelot_link(
        site, data, title)
    if success:
        if twin_slugs:
            LOGGER.warning(
                'More than one post with the same slug. '
                f'Using "{permalink}" for lancelot shortcode')
        output = f'<a href="{permalink}">{title}</a>'
    else:
        LOGGER.error(
            f'"{slug}" slug doesn\'t exist.')
        output = ('<span class="error text-error" style="color: red;">'
                  f'Invalid link: {data}</span>')
    return output, []

One thing to note here is that the original doc plugin only returns the output, not an empty list, even though the documentation says you should. It works either way, but I noticed the gist plugin returned an empty list with the output so I followed, like a lemming to the sea.
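How nikola copes with both return shapes isn't something this post digs into, but a caller that wanted to tolerate them could do something like the following. The normalize function is hypothetical and not part of nikola.

```python
def normalize(result):
    """Return (output, dependencies) whether or not the handler gave a tuple."""
    if isinstance(result, tuple) and len(result) == 2:
        output, dependencies = result
        return output, list(dependencies)
    # a bare string: treat it as output with no dependencies
    return result, []


bare = normalize('<a href="/posts/x/">X</a>')
paired = normalize(('<a href="/posts/x/">X</a>', ["conf.py"]))
print(bare)
print(paired)
```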

Checking It Out

Now let's give it a dry run.

Does it work like the original doc?

If we put this in the post: {{% lancelot %}}2-nikola-inter-site-links{{% /lancelot %}}

We get this:

Nikola Inter-Site Links

Does it accept a title?

Now, what this was all about.

{{% lancelot title="An old post about linking to another post." %}}2-nikola-inter-site-links{{% /lancelot %}}

Gives us:

An old post about linking to another post.

And if you forget the name of the parameter, you can just pass in the alternate title instead.

{{% lancelot "Without the 'title=' part" %}}2-nikola-inter-site-links{{% /lancelot %}}

Gives this:

Without the 'title=' part

What if the slug doesn't exist?

{{% lancelot title="Oops." %}}this-error-is-on-purpose-ignore-it{{% /lancelot %}}

Gives us:

Invalid link: this-error-is-on-purpose-ignore-it

Note that adding this error to this post means it shows up in the logging every time I re-build. I hope I don't forget and try and hunt it down later.

One More Thing

When I was originally trying to figure this out I put the lancelot files in the shortcodes folder next to the gist files (in the virtualenv, so I knew it was a bad idea even as I did it). While I was re-writing them for this post I wanted a place to stash the files, so I put them in the plugins folder that sits in the site repository next to the conf.py file (it already existed because that's where nikola put the org-mode plugin). When I first re-built the site with the code only partially written it raised an Exception, because it turns out that the plugins folder is exactly where nikola/yapsy looks for plugins, including shortcode plugins. Imagine that.

Also, to use an interactive debugger (like my favorite one, pudb) you need to change the verbosity when you build the site to 2.

nikola build -v 2

Otherwise it captures the stdout and you won't see the debugger (it will just look like it hung-up). The other thing is if you see an error something like this:

[2020-07-28 20:53:44] ERROR: Nikola: Shortcode error: Syntax error in shortcode 'lancelot' at line 426, column 27: expecting whitespace!

It more than likely means that the error is actually in the tag: no space after the first % or before the second one. One time I chopped off the end of a tag when copying and pasting and it gave the same error, so it seems to be a generic error that means "check the tag".
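A crude way to hunt for the missing-space version of this mistake is a regular expression over the post's text. This sketch only looks for the two whitespace problems mentioned above (nikola's own parser surely checks more), and the function name is made up.

```python
import re

# "{{%" not followed by whitespace, or "%}}" not preceded by whitespace
MISSING_SPACE = re.compile(r"\{\{%(?!\s)|(?<!\s)%\}\}")


def suspicious_delimiters(text):
    """Return (offset, delimiter) pairs for delimiters missing their space."""
    return [(match.start(), match.group())
            for match in MISSING_SPACE.finditer(text)]


good = "{{% lancelot %}}some-slug{{% /lancelot %}}"
bad = "{{%lancelot %}}some-slug{{% /lancelot%}}"
print(suspicious_delimiters(good))
print(suspicious_delimiters(bad))
```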

I guess that's actually three more things.

End

So, that's my first foray into making a nikola plugin. I've been using nikola for a while now, but I never really looked at the code before. It's nice to see that the plugin system is so easy to use.

Speech and Language Processing

Citation

Jurafsky, D. & Martin, J. (2020). Speech and language processing : an introduction to natural language processing, computational linguistics, and speech recognition. 3rd Edition draft. (URL)

Notes

Online and PDF version of a (work in progress) revision to this text about text processing.

XML To Pandas

Beginning

I went to the Social Security website to change my password (I had forgotten it so they mailed a temporary one to me) and noticed that they have your yearly reported earnings both as a PDF and as a data download. Unfortunately the data is given as an XML file instead of a CSV (it has more than just the earnings data, so it would have to be a series of CSVs instead of one file) so these are my notes on converting it to a pandas DataFrame using BeautifulSoup.

Imports

To actually do the conversion you only need BeautifulSoup and pandas, all the rest of the stuff comes into play because I'm making a post.

# python
from argparse import Namespace
from functools import partial
from pathlib import Path

import os
import random

# from pypi
from bs4 import BeautifulSoup
from dotenv import load_dotenv

import hvplot.pandas
import pandas

# my stuff
from graeae import EmbedHoloviews

Set Up

The Dotenv

I'm using a .env file to point to the location of the file so this call will load it.

load_dotenv(override=True)

The Plotting

This is just a central place to set up some constants so I don't have to re-type them or hunt around for them if I want to change things.

SLUG = "xml-to-pandas"
Embed = partial(EmbedHoloviews,
                create_folder=True,
                folder_path=f"files/posts/{SLUG}")

Plot = Namespace(
    width=990,
    height=780,
    tan="#ddb377",
    blue="#4687b7",
    red="#ce7b6d",
    font_scale=2,
)

The Random Seed

I decided since I'm talking about social security stuff I'd make at least some attempt at obfuscating things so I'm adding a random offset to the years.

OFFSET = random.randint(10, 20)

Middle

The XML

Loading It

First, I'll load the XML into BeautifulSoup.

path = Path(os.environ["SOCIAL_SECURITY"]).expanduser()
assert path.is_file()

with path.open() as reader:
    xml = BeautifulSoup(reader.read(), "xml")

Note that you have to pass in the "xml" argument - BeautifulSoup is primarily an HTML parser so it assumes an HTML document by default. Normally I'd do some exploring at this point, but I didn't find it such an easy thing to do (I don't work with XML data generally) and found that it was easier to look at it with less and figure out what I needed.

The Earnings

The data I wanted is in a set of tags that look like this:

<osss:Earnings endYear="1998" startYear="1998">
<osss:FicaEarnings>526</osss:FicaEarnings>
<osss:MedicareEarnings>526</osss:MedicareEarnings>
</osss:Earnings>

This set represents what was collected in 1998 - $526 for Social Security and Medicare (FICA stands for Federal Insurance Contributions Act and is the name of the tax that funds Social Security). Yes, I only made $526 in 1998 because I was a teenager working a part-time job taking store inventories - still, I can't believe how little I got paid…

Anyway, so the first thing to do is to grab all the nodes representing earning.

earnings = xml.find_all("Earnings")
print(len(earnings))
print(earnings[0])
34
<osss:Earnings endYear="1998" startYear="1998">
<osss:FicaEarnings>526</osss:FicaEarnings>
<osss:MedicareEarnings>526</osss:MedicareEarnings>
</osss:Earnings>

The Years

Now that we have the earnings, we can see about getting the years. Although they have endYear and startYear they're always the same so I'll use startYear. I'm adding the OFFSET here just to obfuscate what years I'm looking at.

for year in earnings[:5]:
    print(int(year.get("startYear")) + OFFSET)
1998
1999
2000
2001
2002

Collected

Next I'll see about grabbing the amounts collected for each year. The FICA and Medicare amounts are always the same so I'll just use the FICA amount.

for year in earnings[:5]:
    print(year.find("FicaEarnings").string)
526
1123
1546
0
0

That looks right. It drops to 0 because I went to college and started working on campus and since I went to a state university they didn't collect FICA.
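If you'd rather not install BeautifulSoup, the same two lookups can be done with the standard library's xml.etree.ElementTree. This swaps the parser, so it isn't what the post uses; the wrapping Statement element and the namespace URI are placeholders invented so the sample parses, and the {*} wildcard needs Python 3.8 or newer.

```python
import xml.etree.ElementTree as ET

# The Earnings nodes follow the shape shown above; the wrapper element and
# the namespace URI are made up for this sketch.
SAMPLE = """\
<osss:Statement xmlns:osss="http://example.invalid/osss">
<osss:Earnings endYear="1998" startYear="1998">
<osss:FicaEarnings>526</osss:FicaEarnings>
<osss:MedicareEarnings>526</osss:MedicareEarnings>
</osss:Earnings>
<osss:Earnings endYear="1999" startYear="1999">
<osss:FicaEarnings>1123</osss:FicaEarnings>
<osss:MedicareEarnings>1123</osss:MedicareEarnings>
</osss:Earnings>
</osss:Statement>
"""

root = ET.fromstring(SAMPLE)

# "{*}" matches the tag in any namespace, so the real URI isn't needed
rows = [
    {
        "year": int(node.get("startYear")),
        "amount": int(node.find("{*}FicaEarnings").text),
    }
    for node in root.findall(".//{*}Earnings")
]
print(rows)
```

A list of dictionaries like rows can be handed straight to pandas.DataFrame if you want the same table as the BeautifulSoup route.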

To a DataFrame

Anyway, that's basically all you need to get it going.

data = pandas.DataFrame.from_dict(
    dict(
        year=[int(year.get("startYear")) for year in earnings],
        amount=[int(year.find("FicaEarnings").string) for year in earnings],
    )
)

data["offset"] = data.year + OFFSET
print(data.head(5)[["offset", "amount"]])
   offset  amount
0    1998     526
1    1999    1123
2    2000    1546
3    2001       0
4    2002       0

Save It

del data["offset"]
csv = Path(os.environ["SOCIAL_SECURITY_CSV"]).expanduser()
data.to_csv(csv, index=False)

End

I usually like to put in a little plotting so I'm going to plot the amount over time. Since I already shared my first five years I'm going to start after that and obscure the numbers a little.

Time Series

First I'll add the offset year back in, then I'm going to scale it.

data["year"] = data.year + OFFSET
# copy the slice so the scaling doesn't try to write into a view of `data`
to_plot = data.iloc[5:].copy()
MAX = to_plot.amount.max()
to_plot["amount"] = to_plot.amount / MAX

plot = to_plot.hvplot(x="year", y="amount").opts(
    title="Income Over Time",
    width=Plot.width, height=Plot.height,
    fontscale=Plot.font_scale,
    xaxis="bare",
    color=Plot.blue,
)

outcome = Embed(plot=plot, file_name="income_over_time")()
print(outcome)

Figure Missing

Well, there you go. Since I'm adding a random offset to the years I removed the x-axis labels. The plot starts out with me still working for the State, so there's no reported income. Then there's an uptick when I took a retail job, and a drop again when I went back to school for a little while to look into getting a design degree. There's another uptick when I gave up on the design degree and went to work as a clerk for a real estate office, and the next drop comes when I went back to get a Computer Science degree. The rise after that shows the difference between working retail/clerical versus being a computer programmer, with the two plateaus representing the two companies I've worked for.

Jupyter-Emacs Sessions with org-mode

Beginning

These are my notes on using emacs-jupyter with a remote session. It works with a local session as well, but I'll just relate the steps as if you're running the jupyter session on a remote machine.

Middle

Starting the Jupyter Server

The first thing to do is start the jupyter session on the remote machine. Since I'm doing this with nikola I should note that you want to start the session in the same location as the file you're editing in emacs, because all your file references will be based on that directory (so if you, for instance, create an image and want to place it in the files folder, you will need to note where that stands relative to the file you are editing and where you start the jupyter kernel).

In my case I'm editing a file in ~/projects/In-Too-Deep/posts/fastai/.

cd ~/projects/In-Too-Deep/posts/fastai/
jupyter kernel

This will start the kernel and show you the file that you need to copy to your local machine (where you are running emacs). Here's an example output of that command.

(In-Too-Deep) hades@erebus ~/p/I/p/fastai (fastai-restart| Dirty:4)> jupyter kernel
[KernelApp] Starting kernel 'python3'
[KernelApp] Connection file: /home/hades/.local/share/jupyter/runtime/kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json
[KernelApp] To connect a client: --existing kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json

The important thing to note is the line with Connection file ([KernelApp] Connection file: /home/hades/.local/share/jupyter/runtime/kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json). You will need to copy that file to the machine that you are running emacs on. Where do you put it? Check your jupyter location on your local machine (where you're running emacs, not where you're running jupyter).

jupyter --runtime-dir

Change into whatever directory is output by that command and then copy the json file from the machine with the running jupyter kernel onto your local machine.

cd ~/.local/share/jupyter/runtime
scp Hades:/home/hades/.local/share/jupyter/runtime/kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json .
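The connection file is a small JSON document that tells a client which ports the kernel is listening on and which key to sign messages with, which is why it has to be copied over. The sketch below parses a sample with made-up values (the port numbers and key are placeholders, not from a real kernel) just to show the shape.

```python
import json

# Made-up values in the shape of a kernel connection file
SAMPLE = """\
{
  "shell_port": 50001,
  "iopub_port": 50002,
  "stdin_port": 50003,
  "control_port": 50004,
  "hb_port": 50005,
  "ip": "127.0.0.1",
  "key": "not-a-real-key",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "python3"
}
"""

connection = json.loads(SAMPLE)

# the five channels a client talks to the kernel over
ports = {name: value for name, value in connection.items()
         if name.endswith("_port")}
print(connection["ip"], sorted(ports))
```

Note that the ip in the file is the kernel's view of itself, so a client on another machine still needs the SSH forwarding set up in the next step.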

Start a Console

Now that you've copied over the information for the jupyter session you can start a console for it. I'll assume you're still in the directory with the json file in it, so I won't pass in the full path.

jupyter console --existing kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json --ssh Hades

Note the second argument where I passed in the SSH alias for my remote machine. If you don't have an alias set up then replace it with something that looks like <username>@<IP Address> (this assumes, of course, that the machine with the jupyter session running on it also has an SSH server running). This command sets up our session to forward our jupyter commands to the remote machine. When you execute this command it should tell you that you can connect to the kernel using a slightly modified JSON file name:

[ZMQTerminalIPythonApp] Forwarding connections to 127.0.0.1 via Hades
[ZMQTerminalIPythonApp] To connect another client via this tunnel, use:
[ZMQTerminalIPythonApp] --existing kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232-ssh.json

Note that --existing argument for the next session (it should be the same as the original json file but with -ssh added to the end of the name).

Setting the Session

Since this is org-mode-based the first thing you should do is connect your emacs buffer to the console. Add this to the top of your buffer (the file where you intend to run python).

#+PROPERTY: header-args :session /home/athena/.local/share/jupyter/runtime/kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232-ssh.json

If you have this in your file when you open it you don't need to do anything special, but otherwise press C-c C-c on it to load the session. What this does is allow subsequent python org-mode blocks to use the remote jupyter session when you execute them, without needing to specify a session.
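Since the tunneled file name is just the original with -ssh added before the extension, the header line can be generated instead of typed. This is a convenience sketch; session_header is a made-up helper, and the runtime directory is whatever jupyter --runtime-dir reports on the machine running emacs.

```python
from pathlib import PurePosixPath


def session_header(runtime_dir, connection_file):
    """Build the org-mode header line for the '-ssh' connection file."""
    original = PurePosixPath(connection_file)
    tunneled = (PurePosixPath(runtime_dir)
                / f"{original.stem}-ssh{original.suffix}")
    return f"#+PROPERTY: header-args :session {tunneled}"


header = session_header(
    "/home/athena/.local/share/jupyter/runtime",
    "kernel-ae33a6cd-f607-450e-a03b-01abe2a3b232.json",
)
print(header)
```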

End

At this point you can run org-mode code blocks that are set up to use emacs-jupyter and they will redirect to the remote jupyter session. That is a whole other adventure so I'll leave it for another time (or to someone else).

emacs-jupyter

Beginning

ob-ipython has become one of the most important tools in my workflow (along with Nikola, and the org-mode plugin), but earlier this week I stumbled upon emacs-jupyter and I thought maybe it'd be worth it to at least take a look. If the readme file is to be believed, it does everything ob-ipython does and more, so maybe my world would change once again. But then I ran into a little problem - trying to install it from MELPA caused emacs to crash… with no messages, nothing. So is this thing ready for the world to use yet?

Middle

Finding the Problem

I tried different things based on the errors in emacs-jupyter's "Issues" but it turned out that this emacs-zmq bug had the answer - the emacs snapshot for Ubuntu wasn't built in a way that works with emacs-jupyter. The bug-report mentions an out of date gcc version, but I didn't confirm that.

Fixing the Problem

So, knowing that the version of emacs I was using was the problem I decided to build it myself. In the earlier days of Ubuntu this was something I did all the time, but it seems like it's been forever since I had to do this so I was a little worried that it might be a huge mess of Makefile debugging, but I found this page on the emacs wiki which made it pretty easy.

First Clone the Repository

git clone --depth 1 git://git.sv.gnu.org/emacs.git

Note: The --depth 1 option pulls only the most recent commit. This helps save on time, as the repository is huge.

Then Install the Dependencies

sudo apt-fast install autoconf automake libtool texinfo build-essential xorg-dev libgtk2.0-dev libjpeg-dev libncurses5-dev libdbus-1-dev libgif-dev libtiff-dev libm17n-dev libpng-dev librsvg2-dev libotf-dev libgnutls28-dev libxml2-dev

Then Build It

Note: This is how I first did it and it doesn't work the way I wanted it to so see the next section for the one that did work.

First do the autogen.

cd emacs
./autogen.sh

Then I did the configure. I wanted to install it in my user directory so I passed in a prefix for the path I wanted. This takes about half a minute.

./configure --prefix=/home/athena/bin/emacs-jupyter/

And now to actually build it. The make bootstrap took around 15 minutes for me.

make bootstrap
make install

And finally link to it in /usr/local/bin.

sudo ln -s /home/athena/bin/emacs-jupyter/bin/emacs /usr/local/bin/emacs-jupyter

That bit about installing it in my home directory and linking it isn't necessary, but just a habit of mine, since I tend to forget how I installed things and having it set up this way makes me remember.

Fix the Other Problem

It turned out that the build went okay, and I could even install emacs-jupyter (yay), but when I tried to execute M-x jupyter-run-repl I got an error message saying that modules weren't supported (what?). So then I found this blog post that said you have to pass in the --with-modules argument when you run configure… So now the process became this:

cd emacs
./autogen.sh
./configure --prefix=/home/athena/bin/emacs-jupyter/ --with-modules
make bootstrap
make install
sudo ln -s /home/athena/bin/emacs-jupyter/bin/emacs /usr/local/bin/emacs-jupyter

And what do you know, it worked.

An Update That Broke It

At some point after I first wrote this I switched to using the emacs-snapshot package, which worked for a while, but when I updated it on March 9, 2021, it caused emacs-jupyter to fail with a ZMQ error:

error in process filter: Error in ZMQ subprocess: error, ("Lisp nesting exceeds ‘max-lisp-eval-depth’")

I tried re-installing emacs-jupyter and emacs-zmq but that didn't help, so I decided to find an older version of emacs. They only had the most recent emacs-snapshot available for Ubuntu 20.10, though, so I decided to go back to building emacs myself.

Since it was an update that broke it (sometime between August of last year, when the previous snapshot came out, and March 9, when I updated), pulling just the most recent commit wouldn't work for me: I needed code that predated whatever broke it. So I pulled the whole history, found the tag for the most recent release from last August (emacs-27.1-rc2), and checked it out.

git checkout emacs-27.1-rc2

According to Stack Overflow you could also just check out that one tag, but I didn't think to look until after I had already cloned the repository. Also, I don't know how I would have found the tag without cloning it first. The answer must be out there somewhere.
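For what it's worth, git can list a remote's tags without cloning (git ls-remote --tags), and git clone accepts a tag name for --branch, so a shallow clone of a single tag is possible. The helpers below only build and print the commands rather than running them; the function names are made up for this sketch.

```python
import shlex

EMACS = "git://git.sv.gnu.org/emacs.git"


def list_remote_tags(url):
    """Command that lists a remote's tags without cloning it."""
    return ["git", "ls-remote", "--tags", url]


def shallow_clone_tag(url, tag):
    """Command that clones only the commit a tag points to."""
    return ["git", "clone", "--depth", "1", "--branch", tag, url]


tags_command = shlex.join(list_remote_tags(EMACS))
clone_command = shlex.join(shallow_clone_tag(EMACS, "emacs-27.1-rc2"))
print(tags_command)
print(clone_command)
```

To actually run one of them, pass the list form to subprocess.run with check=True.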

And then built it like I did in the previous section.

Also, since I pulled an older version I had to re-compile the packages as mentioned on Stack Overflow. First M-: then in the mini-buffer

(byte-recompile-directory package-user-dir nil 'force)

End

emacs-jupyter looks like an improvement over ob-ipython in that it adds a lot of features (and claims to be faster), but getting it to work was way harder than I'm used to. I don't think it was "difficult" in a real sense, given what it used to be like to make and install things on Ubuntu, but I think I've gotten used to things just working. Anyway, now I can see if emacs-jupyter lives up to its own hype.

Update: Since I first wrote this I've come to rely on emacs-jupyter a lot. I don't take advantage of many of its features, but for executing python code in org-babel it's everything that I need (so far).

cuDF With emacs-jupyter

Beginning

This is a first attempt to use RAPIDS using their docker container and emacs-jupyter. So there's multiple places where things can go wrong and I don't know why.

Problems Before I Even Started

The RAPIDS instructions for starting the docker container are out of date

The instructions on the getting started page say to start the docker container using this:

docker run --runtime=nvidia --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 \
    rapidsai/rapidsai:0.8-cuda10.0-runtime-ubuntu18.04-gcc7-py3.7

But the --runtime=nvidia flag is for the now-deprecated nvidia-docker2 package (which isn't compatible with Ubuntu Disco Dingo anyway) and it will cause it to fail with an unknown runtime error if you don't have that older package installed (which I don't). Removing the flag (and having the NVIDIA Container Toolkit installed) fixes the error.

The emacs-snapshot isn't compatible with emacs-jupyter

ob-ipython has become such a center-piece for how I work I can't even remember how I did things before I discovered it, but now there's Emacs Jupyter which claims to have even more features, so I thought I'd try it out, but when I tried to install it emacs would crash (during the installation). According to this bug report the emacs snapshot for Ubuntu is built with an out-of-date version of gcc. I don't know if that's true, but I re-built emacs with the instructions on the emacs wiki and it at least installed emacs-jupyter without crashing. Here's where I find out if it works. Of course, I now have two versions of emacs. One that gets updated automatically and one that works with emacs-jupyter. I'll have to figure out what to do about that, assuming emacs-jupyter turns out to be worth keeping.

Imports

PyPI

import cudf
import dask_cudf
import pandas

Middle

Connecting To the Docker Container

According to the emacs-jupyter documentation you can connect via SSH (but the RAPIDS docker container doesn't have it installed by default) or you can connect to a notebook server. I was originally going to try the SSH route, since I already do that with ob-ipython, but the notebook server might be more suited to this case. Let's see.

print("test")
test

So, the notebook server doesn't seem to work as-is, but the SSH connection does. That's nice, although it's not really different from what ob-ipython gave me (well, it kind of is, in that I didn't have to copy the file over).

Create Series

cuDF Series

This runs on the GPU.

s = cudf.Series([1, 2, 3, None, 4])
print(s)
0    1
1    2
2    3
3     
4    4
dtype: int64

Dask cuDF

This also runs on the GPU, but if you have more than one GPU it will use more than one.

ds = dask_cudf.from_cudf(s, npartitions=2)
print(ds.compute())
0    1
1    2
2    3
3     
4    4
dtype: int64

My machine only has one GPU, so this didn't gain anything, but I do have more than one machine with a GPU so this might help with distributed computing, if I get around to it.

Data Frames

frame = cudf.DataFrame([("a", list(range(10))),
                        ("b", list(range(10)))])
frame["a"] = frame.a * 5
print(frame)
    a  b
0   0  0
1   5  1
2  10  2
3  15  3
4  20  4
5  25  5
6  30  6
7  35  7
8  40  8
9  45  9

From a Pandas DataFrame

frame = pandas.DataFrame({"a": list(range(4)), "b": list(range(4, 8))})
frame = cudf.DataFrame.from_pandas(frame)
print(frame)
   a  b
0  0  4
1  1  5
2  2  6
3  3  7

Selection

print(frame[frame.a > 1])
   a  b
2  2  6
3  3  7

Applying Functions

frame["a"] = frame.a.applymap(lambda row: row + 5)
print(frame)
   a  b
0  5  4
1  6  5
2  7  6
3  8  7

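Since frame.a is a single column, this is basically the pandas.Series.apply method (the function gets called on each value in the column, not on each row), but they renamed it for some reason.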

String Methods

series = cudf.Series(["Alpha", "Beta", "GAMMA", "dELTA"])
print(series.str.lower())
0    alpha
1     beta
2    gamma
3    delta
dtype: object

End

After a certain point this was kind of a boring exercise, mostly because cuDF implements a subset of pandas, just on the GPU, so if you know pandas you know some of cuDF. But just getting it working (with emacs-jupyter) was a little bit of work, so maybe it's useful to have recorded that here.

Opening Remote Files In Emacs Using SSH

Beginning

For some strange reason, the emacs wiki page on tramp mode doesn't have instructions in its main section on how to open a file on a remote machine using tramp. You instead have to go down to the Tips and Tricks and try to pick out a version that works. So I'm writing it down here so that next time I maybe won't have to repeat all the searching. This is the version that worked for me (Emacs 27.0.50, Ubuntu 19.04).

Middle

Editing a Remote User File

You start by opening the file like you would a local file (C-x C-f) and then, when the minibuffer opens up, you use this syntax for the path:

/ssh:<username>@<hostname>:<path to file>

So, for example, I have an SSH alias to hades@erebus named Hades and if I wanted to edit the emacs init file on that machine I would use this:

/ssh:Hades:.emacs.d/init.el

Editing a Remote File As Root

To open a file as root you stick an extra pipe (|sudo) into the previous path syntax.

/ssh:<username>@<hostname>|sudo:<hostname>:<path to file>

This syntax doesn't work with SSH aliases (or didn't seem to when I tried), so editing the /etc/apt/sources.list file on the same machine as before would use this:

/ssh:hades@erebus|sudo:erebus:/etc/apt/sources.list
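
To keep the two forms straight, here's a small Python sketch that assembles both kinds of path from their parts. The `tramp_path` function and its argument names are made up for illustration (tramp itself just takes the string), and it only covers the two cases described above.

```python
def tramp_path(host, path, user=None, as_root=False):
    """Build a tramp file name for opening `path` on `host` over SSH.

    Hypothetical helper: `user` is optional so that an SSH alias can be
    passed in as the host on its own.
    """
    login = f"{user}@{host}" if user else host
    if as_root:
        # The sudo hop repeats the host name after the pipe.
        return f"/ssh:{login}|sudo:{host}:{path}"
    return f"/ssh:{login}:{path}"


# The user-file example, using the SSH alias.
print(tramp_path("Hades", ".emacs.d/init.el"))
# /ssh:Hades:.emacs.d/init.el

# The root example, using the explicit user and host name.
print(tramp_path("erebus", "/etc/apt/sources.list", user="hades", as_root=True))
# /ssh:hades@erebus|sudo:erebus:/etc/apt/sources.list
```

Either string can be pasted into the minibuffer after C-x C-f.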

End

I got the sudo version from Stack Overflow (of course).