Natural Language Toolkit Notes

I've been experimenting with Python's Natural Language Toolkit, following along with Steven Bird, Ewan Klein, and Edward Loper's book "Natural Language Processing with Python --- Analyzing Text with the Natural Language Toolkit" (pdf version).

So far, the book's been great. As I go through it, I've been writing down notes on the book's examples. I've made a GitHub repo here to store these notes, along with any experiments I end up doing with the NLTK.


pushd and popd forever

Tired of typing paths repeatedly in the terminal, I realized that I should be using pushd and popd to navigate directory structures. For the uninitiated, pushd changes your current directory much like cd does, but also keeps the former directory on a stack. You can later return to the former directory by executing popd, which pops the top entry off that stack. Unfortunately, pushd and popd both take at least twice as many characters to type as cd, and they come with the overhead of having to learn to use a new command in place of one that is nearly instinctual. Then it came to me: pushd all the time.

Overriding cd with a muted pushd works just like the standard cd command in everyday use, with the added benefit that the path history is saved. Furthermore, aliasing p to popd allows the previous directory to be popped with minimal effort.

Additionally, while exploring the idea, I came across this StackExchange post illustrating a back function, which lets you switch back and forth between your current and previous directories without removing either from the stack. In the end, this is what I put in my bash profile:

# CD is now silent pushd
cd()
{
  # local keeps the variable from leaking into the shell session
  local dir
  if [ $# -eq 0 ]; then
    dir="${HOME}"
  else
    dir="$1"
  fi

  builtin pushd "${dir}" > /dev/null
}

# Take you back without popd
back()
{
  builtin pushd > /dev/null
  dirs
}

alias p='popd'
alias b='back'
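
To see how the pieces fit together, here's a rough sketch that exercises the same definitions in a throwaway directory tree. The directory names (alpha, beta) and the variable names are made up for the demo, the cd function is a condensed form of the one above, and it assumes bash, since pushd and popd are bash builtins. It calls back and popd directly rather than b and p, because bash doesn't expand aliases in scripts by default.

```shell
#!/usr/bin/env bash
# Sketch only: a condensed copy of the cd/back definitions, run in a temp directory.

cd() {
  # With no argument, fall back to $HOME, as plain cd does.
  builtin pushd "${1:-$HOME}" > /dev/null
}

back() {
  # pushd with no arguments swaps the top two stack entries.
  builtin pushd > /dev/null
  dirs
}

# Hypothetical demo directories, removed again when the script exits.
demo_root="$(mktemp -d)"
trap 'rm -rf "${demo_root}"' EXIT
mkdir -p "${demo_root}/alpha" "${demo_root}/beta"

cd "${demo_root}/alpha"          # stack: alpha, <start dir>
cd "${demo_root}/beta"           # stack: beta, alpha, <start dir>
after_cd="${PWD##*/}"

back > /dev/null                 # swap: now in alpha, beta still on the stack
after_back="${PWD##*/}"

back > /dev/null                 # swap again: back in beta
after_second_back="${PWD##*/}"

popd > /dev/null                 # drop beta from the stack: now in alpha
after_popd="${PWD##*/}"

echo "cd: ${after_cd}, back: ${after_back}, back: ${after_second_back}, popd: ${after_popd}"
```

The point of back is that it's non-destructive: you can bounce between two directories as often as you like, and only p actually shortens the history.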

SSH Port Forwarding

The other week I found myself up at 2am in Canada setting up a VPN between my home computer (running Ubuntu) in Seattle and my laptop <partyhard.jpg>. Ahead of time, I had enabled SSH access on my home computer and had set up port forwarding on my router to allow access from the outside world, but I had forgotten that I would need a port forwarded for the VPN server as well. I tried to SSH into my home box and access the router's admin interface from a command-line browser (using Lynx and w3m). This was a bad idea and didn't work, as the router's admin page required JavaScript for some odd reason.

And then I remembered this command:

ssh -D 8080 -Nf login@server.whatever.com

I pointed my browser's connection settings at a SOCKS proxy with localhost as the server and 8080 as the port and BOOM, I was able to access my Seattle home's router's config page from Canada. I've found this trick useful for all sorts of things, typically for one-offs where I need to access a website from the US while in Canada.
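
For reference, -D 8080 asks ssh for dynamic (SOCKS) forwarding on local port 8080, -N tells it not to run a remote command, and -f sends it to the background once you've authenticated. If you only need a single request rather than a whole browser session, something along these lines should work with curl. This is a sketch, not something from my original setup: socks_curl and SOCKS_PORT are made-up names, and 192.168.1.1 is only a placeholder router address.

```shell
# Sketch: send a single curl request through the SOCKS proxy opened by `ssh -D`.
# SOCKS_PORT is a hypothetical variable name; it defaults to the 8080 used above.
socks_curl() {
  # --socks5-hostname also resolves DNS on the far side of the tunnel,
  # so hostnames are looked up from the remote network, not the local one.
  curl --socks5-hostname "localhost:${SOCKS_PORT:-8080}" "$@"
}

# Example (needs the tunnel to be up; 192.168.1.1 is a placeholder address):
#   socks_curl http://192.168.1.1/
```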


EDIT:

Another useful command for when you need to connect to any given port on a remote server is the following:

ssh -N -L [local_port]:[endpoint]:[remote_port] [user]@[host]
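
To make the placeholders concrete, here's a small wrapper as a sketch. forward_port is a hypothetical helper name, and the router address in the example is a placeholder rather than anything from my actual network.

```shell
# Sketch: wrap the -L form in a helper. forward_port is a made-up name.
# Usage: forward_port <local_port> <endpoint> <remote_port> <user@host>
forward_port() {
  if [ "$#" -ne 4 ]; then
    echo "usage: forward_port <local_port> <endpoint> <remote_port> <user@host>" >&2
    return 2
  fi
  # The endpoint is resolved from the SSH host's side, so "localhost" here means
  # the remote machine itself, and a LAN address means a machine on its network.
  ssh -N -L "$1:$2:$3" "$4"
}

# Example: expose a router's web UI (placeholder address) at http://localhost:8080/
#   forward_port 8080 192.168.1.1 80 login@server.whatever.com
```

Unlike the -D form, this forwards exactly one port to one destination, so nothing needs to be changed in the browser's proxy settings.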

DataFart - Bad Name, Cool Tool

At last week's CUGOS, Aaron Racicot showed off a few cool things that he'd come across over the past month.

One of which was DataFart. Okay, it's a bad name. A horrible name. But it's actually a pretty great idea and so easy to set up. In the site's own words, "DataFart lets you easily graph data from the command line." It's essentially an API end-point to pipe your data to. It returns a URL which presents your data graphed via D3.js, turning this:

1.0 2.61 3.1
1.2 2.11 4.8
2.1 3.40 5.2

into this:

DataFart's Example Image

Best of all, it's really nothing more than a terminal command:

cat somedata.txt | curl --data-binary @- datafart.com

Installation, so-to-speak, is nothing more than an alias in your .bash_profile:

alias datafart='curl --data-binary @- datafart.com'

For extra points, I append xargs open (for Mac OS) or xargs gnome-open (for Ubuntu) to the command to have the returned URL automatically open in my default browser:

alias datafart='curl --data-binary @- datafart.com | xargs open'
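
If you share one profile between a Mac and a Linux box, a function can pick the opener for you. The following is only a sketch: datafart_open and DATAFART_OPENER are names I made up, and it assumes xdg-open is available on the Linux side. It also prints the URL, so you still get something useful when no opener is installed.

```shell
# Sketch: function form of the alias that picks an opener per platform.
# DATAFART_OPENER is a hypothetical override variable, not something DataFart defines.
datafart_open() {
  case "$(uname)" in
    Darwin) opener="open" ;;      # Mac OS
    *)      opener="xdg-open" ;;  # assumed to be present on most Linux desktops
  esac
  # Read data from stdin, post it, and capture the URL the service returns.
  url="$(curl --silent --data-binary @- datafart.com)" || return 1
  echo "${url}"
  "${DATAFART_OPENER:-$opener}" "${url}"
}

# Usage:
#   cat somedata.txt | datafart_open
```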