Here you can find my English pages. Once there are enough of them, they may get the same or a similar structure as the German ones.

You can view these pages like a blog by checking the


< < new English posts (weblog) > >


- they also feature an RSS feed.

You can also find some more of my English writings by looking at the blog entries in LJ which I tagged english.

Best wishes,

A tale of foxes and freedom

Singing the songs of creation to shape a free world.

One day the silver kit asked the grey one:

“Who made the light, which brightens our singing place?”

The grey one looked at it lovingly and asked the kit to sit with him, for he would tell a story from the old days when the tribe was young.

“Once there was a time, when the world was light and happiness. During the day the sun shone on the savannah, and at night the moon cast the grass in a silver sheen.

It was during that time, when there were fewer animals in the wild, that the GNUs learned to work songs of creation, deep and vocal, and they taught us and everyone their new findings, and the life of our skulk was happiness and love.

But while the GNUs spread their songs and made new songs for every idea they could imagine, others invaded the plains, and they stole away the songs and only allowed singing them their way. And they drowned out the light, and with it went the happiness and love.

And when everyone shivered in cold and darkness, and stillness and despair were drawn over the land, the others created a false light which cast small enclosures into a pale flicker, in which they let in only those animals who were willing to wear ropes on their throats and limbs, and many animals went to them to escape the darkness, while some fell deeper still and joined the others in enslaving their former friends.

Upon seeing this, the fiercest of the GNUs, the last one of the original herd, was filled with a terrible anger to see the songs of creation turned into a tool for slavery, and he made one special song which created a spark of true light in the darkness which could not be taken away, and which exposed the falsehood in the light of the others. And whenever he sang this song, those who were near him were touched by happiness.

But the others were many and the GNU was alone, and many animals succumbed to the ropes or the ropers and could move no more on their own.

To spread the song, the GNU now searched for other animals who would sing with it, and the song spread, and with it the freedom.

It was during these days, that the GNU met our founders, who lived in golden chains in a palace of glass.

In this palace they thought themselves lucky, and though the light of the palace grew ever paler and the chains grew heavier with every passing day, they didn't leave, because they feared the utter darkness out there.

When they then saw the GNU, they asked him: “Isn't your light weaker than this whole palace?” and the GNU answered: “Not if we sing it together”, and they asked “But how will we eat in the darkness?” and the GNU answered “you'll eat in the light of your songs, and plants will grow wherever you sing”, and they asked “But is it a song of foxes?” and the GNU said: “You can make it so”, and he began to sing, and when our founders joined in, the light became shimmering silver like the moon they still remembered from the days and nights of light, and they rejoiced in its brightness.

And whenever this light touched the glass of the palace, the glass paled and showed its true being, and where the light touched the chains, they withered, and our founders went into the darkness with the newfound light of the moon as companion, and they thanked the GNU and promised to help it, whenever they were needed.

Then they set off to learn the many songs of the world and to spread the silver light of the moon wherever they went.

This is how our founders learned to sing the light, which brightens every one of our songs, and as our skulk grew bigger, the light grew stronger and it became a little moon, which will grow with each new kit, until its light will fill the whole world again one day.”

The grey one looked around where many kits had quietly found a place, and then he laughed softly, before he got up to fetch himself a meal for the night, and the kits began to speak all at once about his story. And they spoke until the silver kit raised its voice and sang the song of moonlight¹, and they joined in and the song filled their hearts with joy and the air with light, and they knew that wherever they would travel, this skulk was where their hearts felt at home.

PS: This story is far less loosely based on facts than it looks. There are songs of creation, namely computer programs, which once were free and which were truly taken away and used for casting others into darkness. And there was and still is the fierce GNU with his song of light and freedom, and he did spread it to make it into GNU/Linux and found the free software community we know today. If you want to know more about the story as it happened in our world, just read the less flowery story of Richard Stallman, free hackers and the creation of GNU or listen to the free song Infinite Hands.

PPS: I originally wrote this story for Phex, a free Gnutella-based p2p filesharing program which also has an anonymous sibling (i2phex). It’s an even stronger fit for Firefox, though.

PPPS: License: This text is released to the public under the GNU FDL without invariant sections, as well as under other free licenses, by Arne Babenhauserheide (who holds the copyright on it).

P4S: Alternate link: http://www.draketo.de/english/tale-of-foxes-and-freedom

  1. To make it perfectly clear: This moonlight is definitely not the abhorrent and patent-stricken Silverlight port from the Mono project. The foxes sing a song of freedom. They wouldn’t accept the shackles of Microsoft after having found their freedom. Sadly the PR departments of some groups try to take over analogies and strong names. Don’t be fooled by them. The moonlight in our songs is the light coming from the moon which resonates in the voices of the kits. And that light is free as in freedom, from copyright restrictions as well as from patent restrictions – though there certainly are people who would love to patent the light of the moon. Those are the ones we need to fight to defend our freedom.


Cross-platform, Free Software, almost all features you can think of, graphical and in the shell: learn once, use for everything.

» Get Emacs «

Emacs is a self-documenting, extensible editor, a development environment and a platform for Lisp programs - for example programs that make programming easier, but also todo lists on steroids, reading email, posting to identi.ca, and a host of other stuff (learn Lisp).

It is one of the origins of GNU and free software (Emacs History).

In Markdown-mode it looks like this:

Emacs with Markdown mode

More on Emacs on my German Emacs page.

Babcore: Emacs Customizations everyone should have

Update (2017-05): babcore is at 0.2, but I cannot currently update the marmalade package. See lisplets/babcore.el

1 Intro

PDF-version (for printing)

Package (to install)

orgmode-version (for editing)

repository (for forking)

project page (for fun ☺)

Emacs Lisp (to use)

I have been tweaking my Emacs configuration for years now, and I have added quite a lot of cruft. But while searching for the right way to work, I also found some gems which I sorely miss in pristine Emacs.

This file is about those gems.

Babcore is strongly related to Prelude. Actually it is just like Prelude, but with the stuff I consider essential, while staying close to pristine Emacs, so you can still work at a coworker's desk.

But before we start, there is one crucial piece of advice which everyone who uses Emacs should know:

C-g: abort

Hold control and hit g.

That gets you out of almost any situation. If anything goes wrong, just hit C-g repeatedly until the problem is gone - or until you have cooled off enough to realize that a no-op is the best way to react.

To repeat: If anything goes wrong, just hit C-g.

2 Package Header

As Emacs package, babcore needs a proper header.

;; Copyright (C) 2013 Arne Babenhauserheide

;; Author: Arne Babenhauserheide (and various others in Emacswiki and elsewhere).
;; Maintainer: Arne Babenhauserheide
;; Created: 03 April 2013
;; Version: 0.1.0
;; Keywords: core configuration

;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License
;; as published by the Free Software Foundation; either version 3
;; of the License, or (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:
;; Quick Start / installation:
;; 1. Download this file and put it next to other files Emacs includes
;; 2. Add this to your .emacs file and restart Emacs:
;;      (require 'babcore)
;; Alternatively install via package.el:
;;      (require 'package)
;;      (add-to-list 'package-archives '("marmalade" . "http://marmalade-repo.org/packages/"))
;;      (package-refresh-contents)
;;      (package-install 'babcore)
;; Use Case: Use a common core configuration so you can avoid the
;;   tedious act of gathering all the basic stuff over the years and
;;   can instead concentrate on the really cool new stuff Emacs offers
;;   you.

;;; Change Log:

;; 2016-06-05 - 0.1.0: replace desktop with better savehist config and
;;                     cleanup babcore. Replace flymake with flycheck.
;;                     Remove the eval-region key-chord. Simplify
;;                     x-urgent. Fix switching back from full-screen
;;                     mode. Remove babcore-shell-execute, since
;;                     async-shell-command (M-&) is a built-in which
;;                     does the job better. Add C-M-. as third alias
;;                     for goto-last-change. Add find-file-as-root and
;;                     a few fixes for encumbering behavior.
;; 2013-11-02 - Disable clipboard sync while exporting with org-mode
;;              org-export-dispatch
;; 2013-10-22 - More useful frame titles
;; 2013-04-03 - Minor adjustments
;; 2013-02-29 - Initial release

;;; Code:

Additionally it needs the proper last line. See finish up for details.

3 Feature Gems

3.1 package.el, full setup

The first thing you need in Emacs 24. This gives you a convenient way to install just about anything, so you really should use it.

Also I hope that it will help consolidate the various Emacs tips which float around into polished packages by virtue of giving people ways to actually get the package by name - and keep it updated almost automatically.

;; Convenient package handling in emacs

(require 'package)
;; use packages from marmalade
(add-to-list 'package-archives '("marmalade" . "http://marmalade-repo.org/packages/"))
;; and the old elpa repo
(add-to-list 'package-archives '("elpa-old" . "http://tromey.com/elpa/"))
;; and automatically parsed versiontracking repositories.
(add-to-list 'package-archives '("melpa" . "http://melpa.milkbox.net/packages/"))

;; Make sure a package is installed
(defun package-require (package)
  "Install a PACKAGE unless it is already installed 
or a feature with the same name is already active.

Usage: (package-require 'package)"
  ; try to activate the package with at least version 0.
  (package-activate package '(0))
  ; try to just require the package. Maybe the user has it in his local config

  (condition-case nil
      (require package)
    ; if we cannot require it, it does not exist, yet. So install it.
    (error (progn
             (package-install package)
             (require package)))))

;; Initialize installed packages
;; package init not needed, since it is done anyway in emacs 24 after reading the init
;; but we have to load the list of available packages, if it is not available, yet.
(when (not package-archive-contents)
  (with-timeout (15 (message "updating package lists failed due to timeout"))
    (package-refresh-contents)))

3.2 Flycheck

Flycheck is an example of a quite complex feature which really everyone should have.

It can check any kind of code, and actually anything which can be verified with a program which gives line numbers.

This is a drop-in replacement for the older flymake. See Spotlight: Flycheck, a Flymake replacement for reasons to switch to flycheck.

;; Flycheck: On the fly syntax checking
(package-require 'flycheck)
(add-hook 'after-init-hook #'global-flycheck-mode)
; stronger error display
(defface flycheck-error
  '((t (:foreground "red" :underline (:color "Red1" :style wave) :weight bold)))
  "Flycheck face for errors"
  :group 'flycheck)

3.3 auto-complete

This gives you inline auto-completion preview with an overlay window - even in the text-console. Partially this goes as far as API-hints (for example for elisp code). Absolutely essential.

;; Inline auto completion and suggestions
(package-require 'auto-complete)
(require 'cl) ; for the `loop' macro used below
;; avoid competing with org-mode templates.
(add-hook 'org-mode-hook
          (lambda ()
            (make-local-variable 'ac-stop-words)
            (loop for template in org-structure-template-alist do
                  (add-to-list 'ac-stop-words 
                               (concat "<" (car template))))))

3.4 ido

To select a file in a huge directory, just type a few letters from that file in the correct order, leaving out the non-identifying ones. Darn cool!

; use ido mode for file and buffer Completion when switching buffers
(require 'ido)
(ido-mode t)
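The "leaving out letters" trick works best with flex matching, which is not on by default. An optional sketch using standard ido options (not part of the original snippet):

```elisp
; optional: also allow out-of-order (flex) matching,
; and use ido wherever file and buffer names are read.
(setq ido-enable-flex-matching t)
(ido-everywhere t)
```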

3.5 printing

Printing in pristine emacs is woefully inadequate, even though it is a standard function in almost all other current programs.

It can be easy, though:

;; Convenient printing
(require 'printing)
(pr-update-menus t)
; make sure we use localhost as cups server
(setenv "CUPS_SERVER" "localhost")
(package-require 'cups)

3.6 outlining everywhere

Code folding is pretty cool to get an overview of a complex structure. So why shouldn’t you be able to do that with any kind of structured data?

; use allout minor mode to have outlining everywhere.
(require 'allout)
; enable it per buffer with M-x allout-mode, or add it to the
; mode hooks where you want outlining.

3.7 Syntax highlighting

Font-lock is the emacs name for syntax highlighting - in just about anything.

; syntax highlighting everywhere
(global-font-lock-mode 1)

3.8 org and babel

Org-mode is that kind of simple thing which evolves to a way of life when you realize that most of your needs actually are simple - and that the complex things can be done in simple ways, too.

It provides simple todo-lists, inline-code evaluation (as in this file) and a full-blown literate programming, reproducible research publishing platform. All from the same simple basic structure.

It might change your life… and it is the only planning solution which ever prevailed against my way of life and organization.

; Activate org-mode
(require 'org)
; and some more org stuff

; http://orgmode.org/guide/Activation.html#Activation

; The following lines are always needed.  Choose your own keys.
(add-to-list 'auto-mode-alist '("\\.org\\'" . org-mode))
; And add babel inline code execution
; babel, for executing code in org-mode.
(org-babel-do-load-languages
 'org-babel-load-languages
 ; load all languages marked with (lang . t).
 '((C . t)
   (R . t)
   (ditaa . t)
   (dot . t)
   (emacs-lisp . t)
   (gnuplot . t)
   (org . t)
   (python . t)
   (sh . t)))

3.9 Nice line wrapping

If you’re used to other editors, you’ll want to see lines wrapped nicely at the word-border instead of lines which either get cut at the end or in the middle of a word.

global-visual-line-mode gives you that.

; Add proper word wrapping
(global-visual-line-mode t)

3.10 goto-chg

This is the kind of feature which looks tiny: Go to the place where you last changed something.

And then you get used to it and it becomes absolutely indispensable.

; go to the last change
(package-require 'goto-chg)
(global-set-key [(control ?.)] 'goto-last-change)
; M-. can conflict with etags tag search. But C-. can get overwritten
; by flyspell-auto-correct-word. And goto-last-change needs a really
; fast key.
(global-set-key [(meta ?.)] 'goto-last-change)
; ensure that even in worst case some goto-last-change is available
(global-set-key [(control meta ?.)] 'goto-last-change)

3.11 flyspell

Whenever you write prose, a spellchecker is worth a lot, but it should not unnerve you.

Install aspell, then activate flyspell-mode whenever you need it.

It needs some dabbling, though, to make it work nicely with non-english text.

(require 'flyspell)
; Make german umlauts work.
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

;aspell und flyspell
(setq-default ispell-program-name "aspell")

;make aspell faster but less correct
(setq ispell-extra-args '("--sug-mode=ultra" "-w" "äöüÄÖÜßñ"))
(setq ispell-list-command "list")
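To activate flyspell automatically rather than per buffer, one possible sketch using standard hooks (my suggestion, adjust to taste):

```elisp
; full spellchecking in text buffers …
(add-hook 'text-mode-hook 'flyspell-mode)
; … and comment/string spellchecking in code buffers.
(add-hook 'prog-mode-hook 'flyspell-prog-mode)
```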

3.12 control-lock

If you have to do the same action repeatedly, for example with flyspell hitting next-error and next-correction hundreds of times, the need to press control can really be a strain for your fingers.

Sure, you can use viper-mode and retrain your hands for the completely alien command set of vim.

A simpler solution is adding a sticky control key - and that’s what control-lock does: You get modal editing with your standard emacs commands.

Since I am German, I simply use the German umlauts to toggle the control-lock. You will likely want to choose your own keys here.

; control-lock-mode, so we can enter a vi style command-mode with standard emacs keys.
(package-require 'control-lock)
; also bind M-ü and M-ä to toggling control lock.
(global-set-key (kbd "M-ü") 'control-lock-toggle)
(global-set-key (kbd "C-ü") 'control-lock-toggle)
(global-set-key (kbd "M-ä") 'control-lock-toggle)
(global-set-key (kbd "C-ä") 'control-lock-toggle)
(global-set-key (kbd "C-z") 'control-lock-toggle)

3.13 Basic key chords

This is the second strike for saving your pinky. Yes, Emacs is hard on the pinky. Even if it were completely designed to avoid strain on the pinky, it would still be hard, because any system in which you do not have to reach for the mouse is hard on the pinky.

But it also provides some of the neatest tricks to reduce that strain, so you can make Emacs your pinky saviour.

The key chord mode allows you to hit any two keys at (almost) the same time to invoke commands. Since this can interfere with normal typing, I would only use it for letters which are rarely typed after each other.

These default chords have proven themselves to be useful in years of working with Emacs.

; use key chords to invoke commands
(package-require 'key-chord)
(key-chord-mode 1)
; buffer actions
(key-chord-define-global "vb"     'eval-buffer)
(key-chord-define-global "cy"     'yank-pop)
(key-chord-define-global "cg"     "\C-c\C-c")
; frame actions
(key-chord-define-global "xo"     'other-window)
(key-chord-define-global "x1"     'delete-other-windows)
(key-chord-define-global "x0"     'delete-window)
(defun kill-this-buffer-if-not-modified ()
  (interactive)
  ; taken from menu-bar.el
  (if (menu-bar-non-minibuffer-window-p)
      (kill-buffer-if-not-modified (current-buffer))
    (abort-recursive-edit)))
(key-chord-define-global "xk"     'kill-this-buffer-if-not-modified)
; file actions
(key-chord-define-global "bf"     'ido-switch-buffer)
(key-chord-define-global "cf"     'ido-find-file)
(key-chord-define-global "vc"     'vc-next-action)

To complement these tricks, you should also install and use workrave or at least type-break-mode.
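If you want to stay inside Emacs, the built-in type-break-mode alone already goes a long way. A minimal sketch (the interval value is my suggestion, not a default):

```elisp
; remind yourself to take typing breaks
(setq type-break-interval (* 25 60)) ; seconds between breaks; adjust to taste
(type-break-mode 1)
```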

3.14 X11 tricks

These are ways to improve the integration of Emacs in a graphical environment.

We have this cool editor. But it is from the 90s, and some of the more modern concepts of graphical programs have not yet been integrated into its core. Maybe because everyone just adds them to the custom setup :)

On the other hand, Emacs always provided split windows and many of the “new” window handling functions found in dwm and similar - along with a level of integration with which normal graphical desktops still have to catch up. Open a file, edit it as text, quickly switch to org-mode to be able to edit an ASCII table more efficiently, then switch to html mode to add some custom structure - and all that with a consistent set of key bindings.

But enough with the glorification, let’s get to the integration of stuff where Emacs arguably still has weaknesses.

3.14.1 frame-to-front

Get the current Emacs frame to the front. You can for example call this via emacsclient and set it as a keyboard shortcut in your desktop (for me it is F12):

emacsclient -e "(show-frame)"

This sounds much easier than it proves to be in the end… but luckily you only have to solve it once, then you can google it anywhere…

(defun show-frame (&optional frame)
  "Show the current Emacs frame or the FRAME given as argument.

And make sure that it really shows up!"
  (let ((frame (or frame (selected-frame))))
    ; yes, you have to call this twice. Don’t ask me why…
    ; select-frame-set-input-focus calls x-focus-frame and does a bit of
    ; additional magic.
    (select-frame-set-input-focus frame)
    (select-frame-set-input-focus frame)))

3.14.2 urgency hint

Make Emacs announce itself in the tray.

;; let emacs blink when something interesting happens.
;; in KDE this marks the active Emacs icon in the tray.
(defun x-urgency-hint (frame arg &optional source)
  "Set the x-urgency hint for FRAME to ARG:

- If ARG is nil, unset the urgency.
- If ARG is any other value, set the urgency.

If you unset the urgency, you still have to visit the frame to make the urgency setting disappear (at least in KDE)."
  (let* ((wm-hints (append (x-window-property
                            "WM_HINTS" frame "WM_HINTS" source nil t) nil))
         (flags (car wm-hints)))
    (setcar wm-hints
            (if arg
                (logior flags #x100)
              (logand flags (lognot #x100))))
    (x-change-window-property "WM_HINTS" wm-hints frame "WM_HINTS" 32 t)))

(defun x-urgent (&optional arg)
  "Mark the current Emacs frame as requiring urgent attention.

With a prefix argument which does not equal a boolean value of nil, remove the urgency flag (which might or might not change display, depending on the window manager)."
  (interactive "P")
  (let ((frame (selected-frame)))
    (x-urgency-hint frame (not arg))))

3.14.3 fullscreen mode

Hit F11 to enter fullscreen mode. Any self-respecting program should have that… and now Emacs does, too.

; fullscreen, taken from http://www.emacswiki.org/emacs/FullScreen#toc26
; should work for X and OS X with emacs 23.x (TODO find minimum version).
; for windows it uses (w32-send-sys-command #xf030) (#xf030 == 61488)
(defvar babcore-fullscreen-p nil
  "Check if fullscreen is on or off")
(defvar babcore-stored-frame-width nil
  "width of the frame before going fullscreen")
(defvar babcore-stored-frame-height nil
  "height of the frame before going fullscreen")

(defun babcore-non-fullscreen ()
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND restore #xf120
      (w32-send-sys-command 61728)
    (progn (set-frame-parameter nil 'fullscreen nil)
           (set-frame-parameter nil 'width 
                                (if babcore-stored-frame-width
                                    babcore-stored-frame-width 82))
           (sleep-for 0 1) ; 1ms sleep: workaround to avoid unsetting the width in the next command
           (set-frame-parameter nil 'height
                                (if babcore-stored-frame-height 
                                    babcore-stored-frame-height 42)))))

(defun babcore-fullscreen ()
  (setq babcore-stored-frame-width (frame-width))
  (setq babcore-stored-frame-height (frame-height))
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND maximize #xf030
      (w32-send-sys-command 61488)
    (set-frame-parameter nil 'fullscreen 'fullboth)))

(defun toggle-fullscreen ()
  (interactive)
  (setq babcore-fullscreen-p (not babcore-fullscreen-p))
  (if babcore-fullscreen-p
      (babcore-fullscreen)
    (babcore-non-fullscreen)))

(global-set-key [f11] 'toggle-fullscreen)

3.14.4 default key bindings

I always hate it when some usage pattern which is consistent almost everywhere fails with some program. Especially if that is easily avoidable.

This code fixes that for Emacs in KDE.

; Default KDE keybindings to make emacs nicer integrated into KDE. 

; can treat C-m as its own mapping.
; (define-key input-decode-map "\C-m" [?\C-1])

(defun revert-buffer-preserve-modes ()
  (interactive)
  (revert-buffer t nil t))

; C-m shows/hides the menu bar - thanks to http://stackoverflow.com/questions/2298811/how-to-turn-off-alternative-enter-with-ctrlm-in-linux
; f5 reloads
(defconst kde-default-keys-minor-mode-map
  (let ((map (make-sparse-keymap)))
    (set-keymap-parent map text-mode-map)
    (define-key map [f5] 'revert-buffer-preserve-modes)
    (define-key map [?\C-1] 'menu-bar-mode)
    (define-key map [?\C-+] 'text-scale-increase)
    (define-key map [?\C--] 'text-scale-decrease) ; shadows 'negative-argument which is also available via M-- and C-M--, though.
    (define-key map [C-kp-add] 'text-scale-increase)
    (define-key map [C-kp-subtract] 'text-scale-decrease)
    map)
  "Keymap for `kde-default-keys-minor-mode'.")

;; Minor mode for keypad control
(define-minor-mode kde-default-keys-minor-mode
  "Adds some default KDE keybindings"
  :global t
  :init-value t
  :lighter ""
  :keymap 'kde-default-keys-minor-mode-map)

3.14.5 Useful Window/frame titles

The titles of windows of GNU Emacs normally look pretty useless (just stating emacs@host), but it’s easy to make them display useful information:

;; Set the frame title as by http://www.emacswiki.org/emacs/FrameTitle
(setq frame-title-format (list "%b ☺ " (user-login-name) "@" (system-name) " - GNU %F " emacs-version)
      icon-title-format (list "%b ☻ " (user-login-name) "@" (system-name) " - GNU %F " emacs-version))

Now we can always see the name of the open buffer in the frame title. No more searching for the right Emacs window to switch to in the window list.

3.15 Insert unicode characters

Actually you do not need any configuration here. Just use

M-x ucs-insert

to insert any Unicode character. If you want to see them while selecting, have a look at xub-mode from Ergo Emacs.

3.16 Highlight TODO and FIXME in comments

This is a default feature in most IDEs. Since Emacs allows you to build your own IDE, it does not offer it by default… but it should, since it does not disturb anything. So we add it.

fic-ext-mode highlights TODO and FIXME in comments for common programming languages.

;; Highlight TODO and FIXME in comments 
(package-require 'fic-ext-mode)
(defun add-something-to-mode-hooks (mode-list something)
  "helper function to add a callback to multiple hooks"
  (dolist (mode mode-list)
    (add-hook (intern (concat (symbol-name mode) "-mode-hook")) something)))

(add-something-to-mode-hooks '(c++ tcl emacs-lisp python text markdown latex) 'fic-ext-mode)

3.17 Save macros as functions

Now for something which should really be provided by default: You just wrote a cool emacs macro, and you are sure that you will need that again a few times.

Well, then save it!

In standard emacs that needs multiple steps. And I hate that. Something as basic as saving a macro should only need one single step. It does now (and Emacs is great, because it allows me to do this!).

This bridges the gap between function definitions and keyboard macros, making keyboard macros something like first class citizens in your Emacs.

; save the current macro as reusable function.
(defun save-current-kbd-macro-to-dot-emacs (name)
  "Save the current macro as named function definition inside
your initialization file so you can reuse it anytime in the
future."
  (interactive "SSave Macro as: ")
  (name-last-kbd-macro name)
  (save-excursion
    (find-file-literally user-init-file)
    (goto-char (point-max))
    (insert "\n\n;; Saved macro\n")
    (insert-kbd-macro name)
    (insert "\n")))

3.18 Transparent GnuPG encryption

If you have a diary or similar, you should really use this. It only takes a few lines of code, but these few lines are the difference between encryption for those who know they need it and encryption for everyone.

; Activate transparent GnuPG encryption.
(require 'epa-file)
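With epa-file loaded, any file whose name ends in .gpg is decrypted when you visit it and re-encrypted when you save. A hypothetical example using a standard epa variable (the key ID is a placeholder, not something from this config):

```elisp
; example: C-x C-f ~/diary.gpg gives you a transparently encrypted diary.
; to always encrypt to a default key instead of being asked on each save
; (placeholder key ID - use your own):
(setq epa-file-encrypt-to '("me@example.org"))
```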

3.19 Colored shell commands

A shell without colors is really hard to read. Use M-& to run your shell-commands asynchronously and in shell-mode (via async-shell-command).
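If you use plain M-x shell instead, the built-in ansi-color library can interpret the color escape codes. A minimal sketch:

```elisp
; interpret ANSI color escape sequences in shell-mode buffers
(require 'ansi-color)
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)
```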

3.20 Save backups in ~/.local/share/emacs-saves

This is just an aesthetic touch: use the directories from the freedesktop.org specification for save files.

Thanks to the folks at CERN for this.

(setq backup-by-copying t      ; don't clobber symlinks
      backup-directory-alist
      '(("." . "~/.local/share/emacs-saves"))    ; don't litter my fs tree
      delete-old-versions t
      kept-new-versions 6
      kept-old-versions 2
      version-control t)       ; use versioned backups

3.21 Basic persistency

If I restart the computer I want my editor to make it easy for me to continue where I left off.

It’s bad enough that most likely my brain buffers were emptied. At least my editor should remember how to go on.

3.21.1 saveplace

If I reopen a file, I want to start at the line at which I was when I closed it.

; save the place in files
(require 'saveplace)
(setq-default save-place t)

3.21.2 savehist

And I want to be able to call my recent commands in the minibuffer. I normally don’t type the full command name anyway, but rather C-r followed by a small part of the command. Losing that on restart really hurts, so I want to avoid that loss.

; save minibuffer history
(require 'savehist)
;; increase the default history cutoff
(setq history-length 500)
(savehist-mode t)
(setq savehist-additional-variables
      '(kill-ring search-ring regexp-search-ring))

If this does not suffice for you, have a look at desktop, the chainsaw of Emacs persistency.
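For reference, enabling that chainsaw is a one-liner with the built-in desktop library (deliberately not part of babcore):

```elisp
; restore the full buffer list, point positions and more between sessions
(desktop-save-mode 1)
```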

3.22 use the system clipboard

Finally one more minor adaption: Treat the clipboard gracefully. This is a tightrope stunt and getting it wrong can feel awkward.

This is the only setting for which I’m not sure that I got it right, but it’s what I use…

(setq x-select-enable-clipboard t)

But do not synchronize anything to the clipboard or primary selection (mouse selection) while exporting an org-mode file. When I have it enabled, compiling an org-mode file to PDF locks KDE - I think it does so by filling up the clipboard. So the clipboard sync is disabled during export, and I use the mouse selection to transfer text from Emacs to other programs.

; When I have x-select-enable-clipboard enabled, compiling an org-mode file to PDF locks
; KDE - I think it does so by filling up the clipboard.
(defadvice org-export-dispatch (around org-export-dispatch-no-clipboard)
  "Do not clobber the system-clipboard while compiling an org-mode file with `org-export'."
  (let ((select-active-regions nil)
        (x-select-enable-clipboard nil)
        (x-select-enable-primary nil)
        (interprogram-cut-function nil)
        (interprogram-paste-function nil))
    ad-do-it))
(ad-activate 'org-export-dispatch t)

3.23 Add license headers automatically

In case you mostly write free software, you might be as weary of hunting for the license header and copy-pasting it into new files as I am. Free licenses, and especially copyleft licenses, are one of the core safeguards of free culture, because they give free software developers an edge over proprietarizing folks. But they are a pain to add to every file…

Well: No more. We now have legalese mode to take care of the inconvenient legal details for us, so we can focus on the code we write. Just call M-x legalese to add a GPL header, or C-u M-x legalese to choose another license.

(package-require 'legalese)

3.24 Find file as root

When I needed to open a file as root to do a quick edit, I used to drop to a shell and run sudo nano FILE, just because that was faster. Since I started using find-current-as-root, I no longer do that: opening the file as root is now convenient enough in Emacs to no longer tempt me to drop to the shell.

;;; Open files as root - quickly
(defcustom find-file-root-prefix "/sudo:root@localhost:"
  "Tramp root prefix to use."
  :type 'string
  :group 'files)

(defun find-file-as-root ()
  "Like `ido-find-file', but automatically edit the file with
root-privileges (using tramp/sudo) if the file is not writable by
the user."
  (interactive)
  (let ((file (ido-read-file-name "Edit as root: ")))
    (unless (file-writable-p file)
      (setq file (concat find-file-root-prefix file)))
    (find-file file)))
;; or some other keybinding...
;; (global-set-key (kbd "C-x F") 'find-file-as-root)

(defun find-current-as-root ()
  "Reopen current file as root."
  (interactive)
  (set-visited-file-name (concat find-file-root-prefix (buffer-file-name)))
  (setq buffer-read-only nil))
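
To make both commands quickly reachable, you can bind them to free keys. The following is just a suggestion, not part of the original setup: `C-x F` mirrors the commented-out example above, and `C-x C-r` is my own pick which shadows `find-file-read-only`.

```elisp
;; Suggested keybindings (assumption: you do not need the default
;; binding of C-x C-r, which is find-file-read-only).
(global-set-key (kbd "C-x F") 'find-file-as-root)
(global-set-key (kbd "C-x C-r") 'find-current-as-root)
```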

3.25 Fixes

This stuff should become obsolete, but at the moment it is still needed to improve the Emacs Experience.

;;; Fixes ;;;

3.25.1 Comint: recognize password in all languages

;; Make comint recognize passwords in virtually all languages.
(defcustom comint-password-prompt-regexp
  (concat
   "\\(^ *\\|"
   (regexp-opt
    '("Enter" "enter" "Enter same" "enter same" "Enter the" "enter the"
      "Old" "old" "New" "new" "'s" "login"
      "Kerberos" "CVS" "UNIX" " SMB" "LDAP" "[sudo]" "Repeat" "Bad")
    t)
   " +\\)"
   (regexp-opt
    '("Adgangskode" "adgangskode" "Contrasenya" "contrasenya" "Contraseña" "contraseña" "Geslo" "geslo" "Hasło" "hasło" "Heslo" "heslo" "Iphasiwedi" "iphasiwedi" "Jelszó" "jelszó" "Lozinka" "lozinka" "Lösenord" "lösenord" "Mot de passe " "Mot de Passe " "mot de Passe " "mot de passe " "Mật khẩu " "mật khẩu" "Parola" "parola" "Pasahitza" "pasahitza" "Pass phrase" "pass Phrase" "pass phrase" "Passord" "passord" "Passphrase" "passphrase" "Password" "password" "Passwort" "passwort" "Pasvorto" "pasvorto" "Response" "response" "Salasana" "salasana" "Senha" "senha" "Wachtwoord" "wachtwoord" "slaptažodis" "slaptažodis" "Лозинка" "лозинка" "Пароль" "пароль" "ססמה" "كلمة السر" "गुप्तशब्द" "शब्दकूट" "গুপ্তশব্দ" "পাসওয়ার্ড" "ਪਾਸਵਰਡ" "પાસવર્ડ" "ପ୍ରବେଶ ସଙ୍କେତ" "கடவுச்சொல்" "సంకేతపదము" "ಗುಪ್ತಪದ" "അടയാളവാക്ക്" "රහස්පදය" "ពាក្យសម្ងាត់ ៖ " "パスワード" "密码" "密碼" "암호"))
   "\\(?:\\(?:, try\\)? *again\\| (empty for no passphrase)\\| (again)\\)?\
\\(?: for [^:]+\\)?:\\s *\\'")
  "Regexp matching prompts for passwords in the inferior process.
This is used by `comint-watch-for-password-prompt'."
  :version "24.3"
  :type 'regexp
  :group 'comint)

3.25.2 Autoconf-mode builtins

;; Mark all AC_* and AS_* functions as builtin.
(add-hook 'autoconf-mode-hook 
          (lambda () 
            (add-to-list 'autoconf-font-lock-keywords '("\\(\\(AC\\|AS\\|AM\\)_.+?\\)\\((\\|\n\\)" (1 font-lock-builtin-face)))))

3.25.3 Do not beep on alt-gr/M4

; tell emacs to ignore alt-gr clicks needed for M4 in the Neo Layout.
(define-key special-event-map (kbd "<key-17>") 'ignore)
(define-key special-event-map (kbd "<M-key-17>") 'ignore)

3.25.4 yank-pop should just yank on first invocation

When you run yank-pop after a yank, it replaces the yanked text. When you did not do a yank before, it errors out.

This change makes yank-pop yank instead so you can simply hit C-y repeatedly to first yank and then cycle through the yank history.

; yank-pop should yank if the last command was not a yank.
(defun yank-pop (&optional arg)
  "Replace just-yanked stretch of killed text with a different stretch.
At such a time, the region contains a stretch of reinserted
previously-killed text.  `yank-pop' deletes that text and inserts in its
place a different stretch of killed text.

With no argument, the previous kill is inserted.
With argument N, insert the Nth previous kill.
If N is negative, this is a more recent kill.

The sequence of kills wraps around, so that after the oldest one
comes the newest one.

When this command inserts killed text into the buffer, it honors
`yank-excluded-properties' and `yank-handler' as described in the
doc string for `insert-for-yank-1', which see."
  (interactive "*p")
  (if (not (eq last-command 'yank))
      (progn (setq this-command 'yank)
             (yank arg))
    (unless arg (setq arg 1))
    (let ((inhibit-read-only t)
          (before (< (point) (mark t))))
      (if before
          (funcall (or yank-undo-function 'delete-region) (point) (mark t))
        (funcall (or yank-undo-function 'delete-region) (mark t) (point)))
      (setq yank-undo-function nil)
      (set-marker (mark-marker) (point) (current-buffer))
      (insert-for-yank (current-kill arg))
      ;; Set the window start back where it was in the yank command,
      ;; if possible.
      (set-window-start (selected-window) yank-window-start t)
      (if before
          ;; This is like exchange-point-and-mark, but doesn't activate the mark.
          ;; It is cleaner to avoid activation, even though the command
          ;; loop would deactivate the mark because we inserted text.
          (goto-char (prog1 (mark t)
                       (set-marker (mark-marker) (point) (current-buffer))))))))

3.25.5 Blink instead of beeping

(setq visible-bell t)

3.25.6 vc-state is slow

TODO: Adjust vc-find-file-hook to call the vcs tool asynchronously.

3.26 finish up

Make it possible to just (require 'babcore) and add the proper package footer.

(provide 'babcore)
;;; babcore.el ends here

4 Summary

With the babcore you have a core setup which exposes some of the essential features of Emacs and adds basic integration with the system which is missing in pristine Emacs.

Now run M-x package-list-packages to see where you can still go - or just use Emacs and add what you need along the way. The package list is your friend, as is the EmacsWiki.

Happy Hacking!

Note: As almost everything on this page, this text and code is available under the GPLv3 or later.

Conveniently convert CamelCase to words_with_underscores using a small emacs hack

I am currently coping with refactoring in an upstream project for which I maintain some changes that upstream does not merge. One nasty part is that the project converted its function names from CamelCase to words_with_underscores. And that created lots of merge errors.

Today I finally decided to speed up my work.

The first thing I needed was a function to convert a string in CamelCase to words_with_underscores. Since I’m lazy, I used google, and that turned up the CamelCase page of Emacswiki - and with it the following string functions:

(defun split-name (s)
  (split-string
   (let ((case-fold-search nil))
     (replace-regexp-in-string "\\([a-z]\\)\\([A-Z]\\)" "\\1 \\2" s))))
(defun underscore-string (s) (mapconcat 'downcase (split-name s) "_"))
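
To double-check what underscore-string should return, here is the same conversion sketched in plain shell (my own sketch, not part of the original setup; the sed regexp mirrors the one in split-name):

```shell
# Insert an underscore at every lower-to-upper boundary (the same
# regexp split-name uses), then downcase the whole string.
underscore() {
    printf '%s\n' "$1" | sed -E 's/([a-z])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

underscore "CamelCase"   # camel_case
```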

Quite handy - and elegantly executed. Now I just need to make this available for interactive use. For this, Emacs Lisp offers many useful ways to turn Editor information into program information, called interactive codes - in my case the region-code: "r". This gives the function the beginning and the end of the currently selected region as arguments.

With this, I created an interactive function which de-camelcases and underscores the selected region:

(defun underscore-region (begin end) (interactive "r")
  (let* ((word (buffer-substring begin end))
         (underscored (underscore-string word)))
    (widen) ; break out of the subregion so we can fix every usage of the function
    (replace-string word underscored nil (point-min) (point-max))))

And now we’re almost there. Just record a macro which searches for a function, selects its name, de-camelcases and underscores it, and then replaces every usage of the CamelCase name with the underscored name. This isn’t perfect refactoring (it can lead to errors), but it’s fast and I see every change it makes.

C-x C-(
C-s def 
M-x mark-word
M-x underscore-region
C-x C-)

That’s it, now just call the macro repeatedly.

C-x eeeeee…

Now check the diff to fix the places where this 13-line hack got something wrong (like changing __init__ into _init_ - I won’t debug this, you’ve been warned ☺).

Happy Hacking!

2015-01-14-Mi-camel-case-to-underscore.org (2.39 KB)

Custom link completion for org-mode in 25 lines (emacs)

Update (2013-01-23): The new org-mode removed (org-make-link), so I replaced it with (concat) and uploaded a new example-file: org-custom-link-completion.el.
Happy Hacking!

1 Intro

I recently set up custom completion for two of my custom link types in Emacs org-mode. When I wrote on identi.ca about that, Greg Tucker-Kellog said that he’d like to see that. So I decided, I’d publish my code.

The link types I regularly need are papers (PDFs of research papers I take notes about) and bib (the bibtex entries for the papers). The following are my custom link definitions:

(setq org-link-abbrev-alist
      '(("bib" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/ref.bib::%s")
       ("notes" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/papers.org::#%s")
       ("papers" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/papers/%s.pdf")))

For some weeks I had copied the info into the links by hand. Thus an entry about a paper looks like the following.

* Title [[bib:identifier]] [[papers:name_without_suffix]]

This already suffices to be able to click the links for opening the PDF or showing the bibtex entry. Entering the links was quite inconvenient, though.

2 Implementation: papers

The trick to completion in org-mode is to create the function org-LINKTYPE-complete-link.

Let’s begin with the papers-links, because their completion is more basic than the completion of the bib-link.

First I created a helper function to replace all occurrences of a substring in a string1.

(defun string-replace (this withthat in)
  "Replace THIS with WITHTHAT in the string IN."
  (with-temp-buffer
    (insert in)
    (goto-char (point-min))
    (replace-string this withthat)
    (buffer-substring (point-min) (point-max))))

As you can see, it’s quite simple: Just create a temporary buffer and use the default replace-string function I use daily while editing. Don’t assume I figured out that elegant way myself. I just searched the net and adapted the nicest code I found :)

Now we get to the real completion:

(defun org-papers-complete-link (&optional arg)
  "Create a papers link using completion."
  (let (file link)
    (setq file (read-file-name "papers: " "papers/"))
    ;; … cleanup of the file path and creation of the link follow below …

The real magic is in read-file-name: it reuses the standard file completion with a custom prompt and the papers/ directory as starting point.

The cleanup is only a small list of setq’s which removes parts of the filepath to make it compatible with the syntax for paper links:

(let ((pwd (file-name-as-directory (expand-file-name ".")))
  (pwd1 (file-name-as-directory (abbreviate-file-name
                 (expand-file-name ".")))))
  (setq file (string-replace "papers/" "" file))
  (setq file (string-replace pwd "" (string-replace pwd1 "" file)))
  (setq file (string-replace ".pdf" "" file))
  (setq link (concat "papers:" file)))

And that’s it. A few lines of simple elisp and I have working completion for a custom link-type which points to research papers - and can easily be adapted when I change the location of the papers.

Now don’t think I would have come up with all that elegant code myself. My favorite language is Python and I don’t think that I should have to know emacs lisp as well as Python. So I copied and adapted most of it from existing functions in emacs. Just use C-h C-f <function-name> and then follow the link to the code :)

Remember: This is free software. Reuse and learning from existing code is not just allowed but encouraged.

3 Implementation: bib

For the bib-links, I chose an even easier way. I just reused reftex-do-citation from reftex-mode:

(defun org-bib-complete-link (&optional arg)
  "Create a bibtex link using reftex autocompletion."
  (concat "bib:" (reftex-do-citation nil t nil)))

For reftex-do-citation to allow using the bib-style link, I needed some setup, but I already had that in place for explicit citation inserting (not generalized as link-type), so I don’t count the following as part of the actual implementation. Also I likely copied most of it from the EmacsWiki :)

(defun org-mode-reftex-setup ()
  (and (buffer-file-name) (file-exists-p (buffer-file-name))
       (progn
         ; Reftex should use the org file as master file. See C-h v TeX-master for infos.
         ; Setting it here also means reftex does not ask for the tex master on every start.
         (setq TeX-master t)
         (turn-on-reftex)
         ; add a custom reftex cite format to insert links
         (reftex-set-cite-format
          '((?b . "[[bib:%l][%l-bib]]")
            (?n . "[[notes:%l][%l-notes]]")
            (?p . "[[papers:%l][%l-paper]]")
            (?t . "%t")
            (?h . "** %t\n:PROPERTIES:\n:Custom_ID: %l\n:END:\n[[papers:%l][%l-paper]]")))))
  (define-key org-mode-map (kbd "C-c )") 'reftex-citation)
  (define-key org-mode-map (kbd "C-c (") 'org-mode-reftex-search))

(add-hook 'org-mode-hook 'org-mode-reftex-setup)

And that’s it. My custom link types now support useful completion.

4 Result

For papers, I get an interactive file-prompt to just select the file. It directly starts in the papers folder, so I can simply enter a few letters which appear in the paper filename and hit enter (thanks to ido-mode).

For bibtex entries, a reftex-window opens in a lower split-screen and asks me for some letters which appear somewhere in the bibtex entry. It then shows all fitting entries in brief but nice format and lets me select the entry to enter. I simply move with the arrow-keys, C-n/C-p, n/p or even C-s/C-r for searching, till the correct entry is highlighted. Then I hit enter to insert it.


And that’s it. I hope you liked my short excursion into the world of extending Emacs to stay focused while connecting separate data sets.

I have never seen anywhere else a level of (possible) integration and consistency that even comes close to the possibilities of Emacs.

And by the way: This article was also written in org-mode, using its literate programming features for code-samples which can actually be executed and extracted at will.

To put it all together I just need the following:


Now I use M-x org-babel-tangle to write the code to the file org-custom-link-completion.el. I attached that file for easier reference: org-custom-link-completion.el :)

Have fun with Emacs!

PS: Should something be missing here, feel free to get it from my public .emacs.d. I only extracted what seemed important, but I did not check if it runs in a pristine Emacs. My at-home branch is “fluss”.


1 : Creating a custom function for string replace might not have been necessary, because some function might already exist for that. But writing it myself was faster than searching for it.

2012-06-15-emacs-link-completion-bib.png (77.24 KB)
2012-06-15-Fr-org-link-completion.org (7.29 KB)
org-custom-link-completion.el (2.13 KB)

Easily converting ris-citations to bibtex with emacs and bibutils

The problem

Nature only gives me ris-formatted citations, but I use bibtex.

Also ris is far from human readable.

The background

ris can be reformatted to bibtex, but doing that manually disturbs my workflow when getting references while taking notes about a paper in emacs.

I tend to search online for references, often just using google scholar, so when I find a ris reference, the first data I get for the ris-citation is a link.

The solution

Making it possible

bibutils1 can convert ris to an intermediate xml format and then convert that to bibtex.

wget -O reference.ris RIS_URL
cat reference.ris | ris2xml | xml2bib >> ref.bib

This solves the problem, but it is not convenient, because I have to switch to the terminal, download the file, convert it and append the result to my bibtex file.
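
For shell-only use, the two commands can at least be wrapped into a small function (a sketch of my own; the name ris2bib is made up, while ris2xml and xml2bib come from bibutils):

```shell
# Fetch a RIS citation from a URL and append it as BibTeX to ref.bib.
# Requires wget and bibutils (ris2xml, xml2bib).
ris2bib() {
    wget -qO- "$1" | ris2xml | xml2bib >> ref.bib
}
```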

Making it convenient

Even with the commands above, getting the ris citation is quite inconvenient: I need 3 steps just to get a citation.

But those steps are always the same, and since I use Emacs, I can automate and integrate them very easily. So I created a simple function in emacs, which takes the url of a ris citation, converts it to bibtex and appends the result to my local bibtex file. Now I get a ris citation with a simple call to

M-x ris-citation-to-bib

Then I enter the url and the function appends the citation to my bibtex file.2

Feel free to integrate it into your own emacs setup (additionally to the GPLv3 you can use any license used by emacswiki or worg).

(defun ris-citation-to-bib (&optional ris-url)
  "Get a ris citation as bibtex in one step. Just call M-x
ris-citation-to-bib and enter the ris url.
Requires bibutils: http://sourceforge.net/p/bibutils/home/Bibutils/"
  (interactive "Mris-url: ")
  (let ((bib-file "/home/arne/aufschriebe/ref.bib")
        (bib-buffer (get-buffer "ref.bib"))
        (ris-buffer (url-retrieve-synchronously ris-url)))
    ; firstoff check if we have the bib buffer. If not, open the file.
    (if (not (member bib-buffer (buffer-list)))
        (setq bib-buffer (find-file-noselect bib-file)))
    ; move point in the bib buffer to the last line
    (save-current-buffer
      (set-buffer bib-buffer)
      (goto-char (point-max)))
    ; convert the ris citation to bibtex
    (if ris-buffer
        (set-buffer ris-buffer))
    (shell-command-on-region (point-min) (point-max) "ris2xml | xml2bib" ris-buffer)
    ; cut out the bibtex entry and append it to the bib file
    (let ((pmin (- (search-forward "@") 1))
          (pmax (search-forward "}\n\n")))
      (if (member bib-buffer (buffer-list))
          (progn
            (append-to-buffer bib-buffer pmin pmax)
            (kill-buffer ris-buffer)
            (set-buffer bib-buffer)
            (save-buffer))))))

Happy Hacking!

PS: When I don’t have the URL (many thanks to journals giving me only a download button), I open the file, select the content and hit M-| (shell-command-on-region) with ris2xml | xml2bib (searching backwards via C-r ris so I avoid typing the exact command) and get the bibtex version in the results buffer.

  1. To get bibutils in Gentoo, just call emerge app-text/bibutils

  2. Well, actually I only use M-x ris- TAB, but that’s a detail (though I would not want to work without it :) ) 

El Kanban Org: parse org-mode todo-states to use org-tables as Kanban tables

Kanban for emacs org-mode.

Update (2020): Kanban moved to sourcehut: https://hg.sr.ht/~arnebab/kanban.el

Update (2013-04-13): Kanban.el now lives in its own repository: on bitbucket and on a statically served http-repo (to be independent from unfree software).

Update (2013-04-10): Thanks to Han Duply, kanban links now work for entries from other files. And I uploaded kanban.el on marmalade.

Some time ago I learned about kanban, and the obvious next step was: “I want to have a kanban board from org-mode”. I searched for it, but did not find any. Not wanting to give up on the idea, I implemented my own :)

The result is two functions: kanban-todo and kanban-zero.

“Screenshot” :)

Refactor in such a way that the
let Presentation manage dumb sprites
return all actions on every command:
Make the UiState adhere the list of
Turn the model into a pure state


kanban-todo provides your TODO items as kanban-fields. You can move them in the table without having duplicates, so all the state maintenance is done in the kanban table. Once you are finished, you mark them as done and delete them from the table.

To set it up, put kanban.el somewhere in your load path and (require 'kanban) (more recent but potentially unstable version). Then just add a table like the following:

|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
#+TBLFM: $1='(kanban-todo @# @2$2..@>$>)::@1='(kanban-headers $#)

Hit C-c C-c with the point on the #+TBLFM line to update the table.

The important line is the #+TBLFM. That says “use my TODO items in the TODO column, except if they are in another column” and “add kanban headers for my TODO states”

The kanban-todo function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

To only set the scope, use nil for the matcher.

See C-h f org-map-entries and C-h v org-agenda-files for details.


kanban-zero is a zero-state Kanban: All state is managed in org-mode and the table only displays the kanban items.

To set it up, put kanban.el somewhere in your load path and (require 'kanban). Then just add a table like the following:

|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
#+TBLFM: @2$1..@>$>='(kanban-zero @# $#)::@1='(kanban-headers $#)

The important line is the #+TBLFM. That says “show my org items in the appropriate column” and “add kanban headers for my TODO states”.

Hit C-c C-c with the point on the #+TBLFM line to update the table.

The kanban-zero function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

To only set the scope, use nil for the matcher.

An example for matcher and scope would be:

#+TBLFM: @2$1..@>$>='(kanban-zero @# $# "1w6" '("/home/arne/.emacs.d/private/org/emacs-plan.org"))::@1='(kanban-headers $#)

See C-h f org-map-entries and C-h v org-agenda-files for details.


To contribute to kanban.el, just change the file and write a comment about your changes. Maybe I’ll set up a repo on Bitbucket at some point…


In the Hexbattle game-draft, I use kanban to track my progress:

Table of Contents

1 Kanban

Refactor in such a way that the
let Presentation manage dumb sprites
return all actions on every command:
Make the UiState adhere the list of
Turn the model into a pure state

2 refactor Hexbattle    1w6

… and so on …

Advanced usage

“Graphical” TODO states

To make the todo states easier to grok directly you can use unicode symbols for them. Example:

#+SEQ_TODO: ❢ ☯ ⧖ | ☺ ✔ DEFERRED ✘
| ❢ | ☯ | ⧖ | ☺ |
|---+---+---+---|
|   |   |   |   |
#+TBLFM: @1='(kanban-headers $#)::@2$1..@>$>='(kanban-zero @# $#)

In my setup they are ❢ (todo) ☯ (doing) ⧖ (waiting) and ☺ (to report). Not shown in the kanban Table are ✔ (finished), ✘ (dropped) and deferred (later), because they don’t require any action from me, so I don’t need to see them all the time.

Collecting kanban entries via SSH

If you want to create a shared kanban table, you can use the excellent transparent network access options from Emacs tramp to collect kanban entries directly via SSH.

To use that, simply pass an explicit list of files to kanban-zero as 4th argument (if you don’t use tag matching just use nil as 3rd argument). "/ssh:host:path/to/file.org" retrieves the file ~/path/to/file.org from the host.

| ❢ | ☯ |
|   |   |
#+TBLFM: @1='(kanban-headers $#)::@2$1..@>$>='(kanban-zero @# $# nil (list (buffer-file-name) "/ssh:localhost:plan.org"))

Caveat: all included kanban files have to use at least some of the same todo states: kanban.el only retrieves TODO states which are used in the current buffer.

kanban.el (5.86 KB)

How to show the abstract before the table of contents in org-mode

I use Emacs Org-Mode for writing all kinds of articles. The standard format for org-mode is to show the table of contents before all other content, but that requires people to scroll down to see whether the article is interesting for them. Therefore I want the abstract to be shown before the table of contents.

1 Intro

There is an old guide for showing the abstract before the TOC in org-mode versions before 8, but since I use org-mode 8, that wasn’t applicable to me.

With a short C-h v org-toc TAB TAB (meaning: search all variables which start with org- and contain -toc) I found the following even simpler way. After I got that solution working, I found that it was still much too complex and that org-mode actually provides an even easier and very convenient way to add the TOC at any place.

2 Solution

(from the manual)

At the beginning of your file (after the title) add

#+OPTIONS: toc:nil

Then after the abstract add a TOC:

#+TOC: headlines 2

Done. Have fun with org-mode!

3 Appendix: Complex way

This is the complicated way I tried first. It only works with LaTeX, but there it works. Better use the simple way.

Set org-export-with-toc to nil as file-local variable. This means you just append the following to the file:

# Local Variables:
# org-export-with-toc: nil
# End:

(another nice local variable is org-confirm-babel-evaluate: nil, but don’t set that globally, otherwise you could run untrusted code when you export org-mode files from others. When this is set file-local, emacs will ask you for each file you open whether you want to accept the variable setting)

Then write the abstract before the first heading and add \tableofcontents after it. Example:

#+LATEX: \tableofcontents
2013-11-21-Do-emacs-orgmode-abstract-before-toc.pdf (143.29 KB)
2013-11-21-Do-emacs-orgmode-abstract-before-toc.org (2.23 KB)

IRC-chat via Tor with Emacs on Gentoo

As example: Connecting to #youbroketheinternet.

emerge privoxy torsocks net-vpn/tor
# rc-config start privoxy tor
# rc-update add privoxy default
# rc-update add tor default
mkdir -p ~/.local/EMACS_TOR_HOME/.emacs.d
echo "(require 'socks)" >> ~/.local/EMACS_TOR_HOME/.emacs.d/init.el
HOME=~/.local/EMACS_TOR_HOME torify emacs --title "Emacs-torified"
# M-x customize-variable RET socks-server RET
#   host: localhost
#   port: 9050
#   type: Socks v5
#   (C-x C-s to save and set)
# M-x erc-select
#   server loupsycedyglgamf.onion
#   port 67
# the welcome channel is good to go.
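
Instead of customizing interactively, the same SOCKS setup can go into the torified init.el directly. This is a sketch following the EmacsWiki ErcProxy recipe linked below; the values mirror the customize steps above:

```elisp
;; Route ERC through the local Tor SOCKS proxy
;; (same settings as the customize-variable steps above).
(require 'socks)
(setq socks-server (list "Tor" "localhost" 9050 5)) ; name, host, port, SOCKS v5
(setq erc-server-connect-function 'socks-open-network-stream)
```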

See https://www.emacswiki.org/emacs/ErcProxy#toc2

and http://youbroketheinternet.org/#overlay

Insert a scaled screenshot in emacs org-mode

@marjoleink asked on identi.ca1, if it is possible to use emacs org-mode for showing scaled screenshots inline while writing. Since I thought I’d enjoy some hacking, I decided to take the challenge.

Org-mode does not do auto-scaling of embedded images, as far as I know, but the screenshot use case can be handled with a simple function (add this to your ~/.emacs or ~/.emacs.d/init.el):

(defun org-insert-scaled-screenshot ()
  "Insert a scaled screenshot
for inline display
into your org-mode buffer."
  (interactive)
  (let ((filename
         (concat "screenshot-"
                 (substring
                  (shell-command-to-string "date +%Y%m%d%H%M%S")
                  0 -1))))
    (let ((scaledname
           (concat filename "-width300.png")))
      (shell-command (concat "import -window root " filename))
      (shell-command (concat "convert -adaptive-resize 300 " filename " " scaledname))
      (insert (concat "[[./" scaledname "]]")))))

Now just call M-x org-redisplay-inline-images to see the screenshot (or add it to the function).

In action:

scaled screenshot

Have fun with Emacs - and happy hacking!

PS: In case it’s not obvious: The screenshot shows emacs just as the screenshot is being shot - with the method shown here ☺)

  1. Matthew Gregg: @marjoleink "way of life" thing again, but if you can invest some time, org-mode is a really powerful note keeping environment. → Marjolein Katsma: @mcg I'm sure it is - but seriously: can you embed a diagram2 or screenshot, scale it, and link it to itself? 

  2. For diagrams, you can just insert a link to the image file without description, then org-mode can show it inline. To get an even nicer user-experience (plain text diagrams or ascii-art), you can use inline code via org-babel using graphviz (dot) or ditaa - the latter is used for the diagrams in my complete Mercurial branching strategy

screenshot-20121122101933-width300.png (108.08 KB)
screenshot-20121122101933-width600.png (272.2 KB)

Minimal example for literate programming with noweb in emacs org-mode

If you want to use the literate programming features in emacs org-mode, you can try this minimal example to get started: Activate org-babel-tangle, then put this into the file noweb-test.org:

Minimal example for noweb in org-mode

* Assign 

First we assign abc:

#+begin_src python :noweb-ref assign_abc
abc = "abc"
#+end_src

* Use

Then we use it in a function:

#+begin_src python :noweb tangle :tangle noweb-test.py
def x():
  <<assign_abc>>
  return abc
#+end_src


Hit C-c C-c to evaluate the source block. Hit C-c C-v C-t to put the expanded code into the file noweb-test.py.

The exported code looks like this:

def x():
  abc = "abc"
  return abc


(html generated with org-export-as-html-to-buffer and slightly reniced to escape the additional parsing I have on my site)

And with org-export-as-pdf we get this:



Add :results output to the #+begin_src line of the second block to see the print results under that block when you hit C-c C-c in the block.

You can also use properties of headlines for giving the noweb-ref. Org-mode can then even concatenate several source blocks into one noweb reference. Just hit C-c C-x p to set a property (or use M-x org-set-property), then set noweb-ref to the name you want to use to embed all blocks under this heading together.

Note: org-babel prefixes each line of an included code-block with the prefix used for the reference (here <<assign_abc>>). This way you can easily include blocks inside python functions.

Note: To keep noweb-references literally in the output or similar, have a look at the different options to :noweb.

Note: To do this with shell-code, it’s useful to change the noweb markers to {{{ and }}}, because << and >> are valid shell-syntax, so they disturb the highlighting in sh-mode. Also confirming the evaluation every time makes plain exporting problematic. To fix this, just add the following somewhere in the file (to keep this simple, just add it to the end):

# Local Variables:
# org-babel-noweb-wrap-start: "{{{"
# org-babel-noweb-wrap-end: "}}}"
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

Have fun with Emacs and org-mode!

noweb-test.pdf (81.69 KB)
noweb-test.org (290 Bytes)
noweb-test.py.txt (49 Bytes)
noweb-test-pdf.png (6.05 KB)

Org-mode with Parallel Babel

Update 2017: a block with sem -j ... seems to block in recent versions of Emacs until all subtasks are done. It would be great if someone could figure out why (though it likely is the right thing to do). To circumvent that, you can daemonize the job in sem, but that might have unwanted side-effects: sem "[job] &"

Babel in Org

Emacs Org-mode provides the wonderful babel-capability: Including code-blocks in any language directly in org-mode documents in plain text.

In default usage, running such code freezes my emacs until the code is finished, though.

Up to a few weeks ago, I solved this with a custom function, which spawns a new emacs as script runner for the specific code:

; Execute babel source blocks asynchronously by just opening a new emacs.
(defun bab/org-babel-execute-src-block-new-emacs ()
  "Execute the current source block in a separate emacs,
so we do not block the current emacs."
  (interactive)
  (let ((line (line-number-at-pos))
        (file (buffer-file-name)))
    (async-shell-command (concat
                          "TERM=vt200 emacs -nw --find-file "
                          file
                          " --eval '(goto-line "
                          (number-to-string line)
                          ")' --eval "
     "'(let ((org-confirm-babel-evaluate nil))(org-babel-execute-src-block t))' "
                          "--eval '(kill-emacs 0)'"))))

and its companion for exporting to beamer-latex presentation pdf:

; Export as pdf asynchronously by just opening a new emacs.
(defun bab/org-beamer-export-new-emacs ()
  "Export the current file in a separate emacs,
so we do not block the current emacs."
  (interactive)
  (let ((line (line-number-at-pos))
        (file (buffer-file-name)))
    (async-shell-command (concat
                          "TERM=vt200 emacs -nw --find-file "
                          file
                          " --eval '(goto-line "
                          (number-to-string line)
                          ")' --eval "
     "'(let ((org-confirm-babel-evaluate nil))(org-beamer-export-to-pdf))' "
                          "--eval '(kill-emacs 0)'"))))

But for shell-scripts there’s a much simpler alternative:

GNU Parallel to the rescue! Process-pool made easy.

Instead of spawning an external process, I can just use GNU Parallel for the long-running program-calls in the shell-code. For example like this (real code-block):

#+BEGIN_SRC sh :exports none
  cd ~/tm5tools/plotting
  filename="./obsheat-increasing.png" >/dev/null 2>/dev/null
  sem -j -1 ./plotstation.py -c ~/sun-work/ct-production-out-5x7e300m1.0 -C "aircraft" -c ~/sun-work/ct-production-out-5x7e300m1.0no-aircraft -C "continuous"  --obsheat --station allnoaa --title "\"Reducing observation coverage\"" -o ${oldPWD}/$(unknown)
  cd -
#+END_SRC

Let me explain this.

sem is a part of GNU parallel which makes parallel execution easy. Essentially it gives us a simple version of the convenience we know from make.

for i in {1..100}; do 
    sem -j -1 [code] # run N-1 processes with N as the number of
                     # processors in my computer
done
This means that the above org-mode block will finish instantly, but there will be a second process managed by GNU parallel which executes the plotting script.

The big advantage here is that I can also set this to execute on exporting a document which might run hundreds of code-blocks. If I did this with naive multiprocessing, that would spawn 100 processes which overwhelm the memory of my system (yes, I did that…).

sem -j -1 ensures that this does not happen: essentially it provides a process-pool with which it executes the code.

If you use this on export, take care to add a final code-block which waits until all other blocks finished:

sem --wait
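To see what such a process-pool does without installing anything, here is a conceptual sketch in plain bash. This is not sem itself, just the idea it implements: at most MAX jobs run at the same time, and a final wait mirrors sem --wait.

```shell
# Crude process pool: run at most MAX background jobs at once.
# 'wait -n' (bash >= 4.3) blocks until any single job finishes.
MAX=2
count=0
for i in 1 2 3 4 5; do
    ( sleep 0.1; echo "job-$i" ) &
    count=$((count+1))
    if [ "$count" -ge "$MAX" ]; then
        wait -n
        count=$((count-1))
    fi
done
wait    # like 'sem --wait': block until every job has finished
```

sem does this bookkeeping for you, keeps the pool across separate shell invocations (which a plain loop cannot), and with -j -1 computes the job limit from the number of processors.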

A word of caution: Shell escapes

If you use GNU parallel to run programs, the arguments are interpreted two times: once when you pass them to sem and a second time when sem passes them on. Due to this, you have to add escaped quote-marks for every string which contains whitespace. This can look like the following code (the example above reduced to its essential parts):

sem -j -1 ./plotstation.py --title "\"Reducing observation coverage\""

I stumbled over this a few times, but the convenience of GNU parallel is worth the small extra-caution.
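The double interpretation is easy to reproduce without sem: passing a command through sh -c consumes one layer of quotes, just like sem does before running your command. The printf call below is only a stand-in for plotstation.py:

```shell
# One layer of quoting is eaten by the first shell pass;
# quote-marks that survive into the second pass keep the words together.
sh -c 'printf "%s\n" Reducing observation coverage'     # three separate arguments
sh -c 'printf "%s\n" "Reducing observation coverage"'   # one argument, as intended
```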

Besides: For easier editing of inline-source-code, set org-src-fontify-natively to true (t), either via M-x customize-variable or by adding the following to your .emacs:

(setq org-src-fontify-natively t)


With the tool sem from GNU parallel you get parallel execution of shell code-blocks in emacs org-mode using the familiar syntax from make:

sem -j -1 [escaped code]

Publish a single file with emacs org-mode

I often write small articles about some experience I just had, and since I want to move towards using static pages more often, I tried using emacs org-mode publishing for that. Strangely the simple usecase of publishing a single file seems quite a bit more complex than needed, so I document the steps here.

This is my first use of org-publish, so I likely do not use it perfectly. But as it stands, it works. You can find the org-publish version of this article at draketo.de/proj/orgmode-single-file.

1 Why static pages?

I recently lost a dynamic page to hackers. I could not recover the content from all the spam which flooded it. It was called good news and I had wanted to gather positive news which encourage getting active - but I never really found the time to get it running. See what is left of it: http://gute-neuigkeiten.de

Any dynamic page carries a big maintenance cost, because I have to update all the time to keep it safe from spammers who want to abuse it for commercial spam - in the least horrible case. I can choose a managed solution, but that makes me dependent on the hoster providing what I need. Or I can take the sledgehammer and just use a static site: It never does any writes to the webserver, so there is nothing to hack.

As you can see, that’s what I’m doing nowadays.

2 Why Emacs Org-Mode?

Because after having used MacOS for almost a decade and then various visual-oriented programs for another five years, Emacs is nowadays the program which is most convenient to me. It achieves a level of integration and usability which is still science-fiction in other systems - at least when you’re mostly working with text.

And Org-mode is to Emacs as Emacs is to the Operating System: It begins as a simple todo-list and accompanies you all the way towards programming, reproducible research - and publishing websites.

3 Current Solution

Currently I first publish the single file to FTP and then rename it to index.html. This translates to the following publish settings:

(setq private-publish-ftp-proj (concat "/ftp:" USER "@" HOST ":arnebab/proj/"))

(setq org-publish-project-alist
      `(("orgmode-single-file"
         :base-directory "~/.emacs.d/private/journal"
         :publishing-directory ,(concat private-publish-ftp-proj "orgmode-single-file/")
         :base-extension "org"
         :publishing-function org-html-publish-to-html
         :completion-function (lambda () (rename-file 
                                          (concat private-publish-ftp-proj 
                                                  "orgmode-single-file/2013-11-25-Mo-publish-single-file-org-mode.html") 
                                          (concat private-publish-ftp-proj 
                                                  "orgmode-single-file/index.html") t))
         :section-numbers nil
         :with-toc t
         :html-preamble t
         :exclude ".*"
         :include ["2013-11-25-Mo-publish-single-file-org-mode.org"])))

Now I can use C-c C-e P x orgmode-single-file to publish this file to the webserver whenever I change it.

Note the lambda: I just rename the published file to index.html, because I did not find out how to do that by setting an option. :index-filename did not work. But likely I missed something which would make this much nicer.

Note that if I had wanted to publish a folder full of files, this would have been much easier: There actually is an option to create an automatic index-file and sitemap.

For more details, read the org-mode publishing guide.

4 Conclusion

This is not as simple as I would like it to be. Maybe (or rather: likely) there is a simpler way. But I can now publish arbitrary org-mode files to my webserver without much effort (and without having to switch context to some other program). And that’s something I’ve been missing for a long time, so I’m very happy to finally have it.

And it was less painful than I feared, though publishing this via my drupal-site, too, obviously shows that I’m still far from moving to static pages for everything. For work-in-progress, this is great, though - for example for my Basics for Guile Scheme.

Read your python module documentation from emacs

Update 2021: Fixed links that died with Bitbuckets hosting.

I just found the excellent pydoc-info mode for emacs from Jon Waltman. It allows me to hit C-h S in a python file and enter a module name to see the documentation right away. If the point is on a symbol (=module or class or function), I can just hit enter to see its docs.

pydoc in action

In its default configuration (see the Readme) it “only” reads the python documentation. This alone is really cool when writing new python code, but it’s not enough, since I often use third party modules.

And now comes the treat: If those modules use sphinx for documentation (≥1.1), I can integrate them just like the standard python documentation!

It took me some time to get it right, but now I have all the documentation for the inverse modelling framework I contribute to directly at my fingertips: Just hit C-h S ENTER when I’m on some symbol and a window shows me the docs:

custom pydoc in action
The text in this image is from Wouter Peters. Used here as short citation which should be legal almost everywhere under citation rules.

I want to save you the work of figuring out how to do that yourself, so here’s a short guide for integrating the documentation for your python program into emacs.

Integrating your own documentation into emacs

The prerequisite for integrating your own documentation is to use sphinx for documenting your code. See their tutorial for info on how to set it up. As soon as sphinx works for you, follow this guide to integrate your docs in your emacs.

Install pydoc-info

First get pydoc-info and the python infofile (adapt this to your local setup):

# get the mode
cd ~/.emacs.d/libs
hg clone https://hg.sr.ht/~arnebab/pydoc-info
# and the pregenerated info-file for python
wget http://www.draketo.de/dateien/python.info.gz
gunzip python.info.gz
sudo cp python.info /usr/share/info
sudo install-info --info-dir=/usr/share/info python.info

To build the info file for python yourself, have a look at the Readme.

Turn your documentation into info

Now turn your own documentation into an info document and install it.

Sphinx uses a core configuration file named conf.py. Add the following to that file, replacing all values but index and False by the appropriate names for your project:

# One entry per manual page. 
# list of tuples (startdocname, 
# targetname, title, author, dir_entry, 
# description, category, toctree_only).
texinfo_documents = [
  ('index', # startdocname, keep this!
   'TARGETNAME', # targetname
   u'Long Title', # title
   u'Author Name', # author
   'Name in the Directory Index of Info', # dir_entry
   u'Long Description', # description
   'Software Development', # category
   False), # toctree_only; better keep this as False.
]
Then call sphinx and install the info files like this (maybe adapted to your local setup):

sphinx-build -b texinfo source/ texinfo/ 
cd texinfo
sudo install-info --info-dir=/usr/share/info TARGETNAME.info
sudo cp TARGETNAME.info /usr/share/info/

Activate pydoc-info, including your documentation

Finally add the following to your .emacs (or wherever you store your personal adaptions):

; Show python-documentation as info-pages via C-h S
(setq load-path (cons "~/.emacs.d/libs/pydoc-info" load-path))
(require 'pydoc-info)
(info-lookup-add-help
   :mode 'python-mode
   :parse-rule 'pydoc-info-python-symbol-at-point
   :doc-spec
   '(("(python)Index" pydoc-info-lookup-transform-entry)
     ("(TARGETNAME)Index" pydoc-info-lookup-transform-entry)))

Recipes for presentations with beamer latex using emacs org-mode

I wrote some recipes for creating the kinds of slides I need with emacs org-mode export to beamer latex.

Update: Read ox-beamer to see how to adapt this to work with the new export engine in org-mode 0.8.

PDF recipes The recipes as PDF (21 slides, 247 KiB)

org-mode file The org-mode sources (12.2 KiB)

Below is an html export of the org-mode file. Naturally it does not look as impressive as the real slides, but it captures all the sources, so I think it has some value.

Note: To be able to use the simple block-creation commands, you need to add #+startup: beamer to the header of your file or explicitly activate org-beamer with M-x org-beamer-mode.

«I love your presentation»:

PS: I hereby allow use of these slides under any of the licenses used by worg and/or the emacs wiki.

1 Introduction

1.1 Usage

1.1.1 (configure your emacs, see Basic Configuration at the end)

1.1.2 C-x C-f <file which ends in .org>

1.1.3 Insert heading:

Hello World

#+LaTeX_CLASS: beamer

* Hello
** Hello GNU
Nice to see you!

1.1.4 M-x org-export-as-pdf

done: Your first org-beamer presentation.

1.2 org-mode + beamer = love

1.2.1 Code    BMCOL

#+LaTeX_CLASS: beamer
* Introduction
** org-mode + beamer =  love
*** Code :BMCOL:
    :BEAMER_col: 0.7
<example block>
*** Simple block  :BMCOL:B_block:
    :BEAMER_col: 0.3
    :BEAMER_env: block
it's that easy!

1.2.2 Simple block    BMCOL B_block

it's that easy!

1.3 Two columns - in commands

1.3.1 Commands    BMCOL B_block

** Two columns - in commands
*** Commands
C-c C-b | 0.7
C-c C-b b
<eTAB (write example) C-n C-n
*** Result
C-c C-b | 0.3
C-c C-b b
even easier - and faster!

1.3.2 Result    BMCOL B_block

even easier - and faster!

2 Recipes

2.1 Four blocks - code

*** Column 1 :B_ignoreheading:BMCOL:
    :BEAMER_env: ignoreheading
    :BEAMER_col: 0.5

*** One
*** Three                                                           

*** Column 2 :BMCOL:B_ignoreheading:
    :BEAMER_col: 0.5
    :BEAMER_env: ignoreheading

*** Two
*** Four

2.2 Four blocks - result

2.2.1 Column 1    B_ignoreheading BMCOL

2.2.2 One

2.2.3 Three

2.2.4 Column 2    BMCOL B_ignoreheading

2.2.5 Two

2.2.6 Four

2.3 Four nice blocks - commands

C-c C-b | 0.5 # column
C-c C-b i # ignore heading
*** One 
C-c C-b b # block
*** Three 
C-c C-b b
C-c C-b | 0.5
C-c C-b i
*** Two 
C-c C-b b
*** Four 
C-c C-b b

2.4 Four nice blocks - result

2.4.1    BMCOL B_ignoreheading

2.4.2 One    B_block

2.4.3 Three    B_block

2.4.4    BMCOL B_ignoreheading

2.4.5 Two    B_block

2.4.6 Four    B_block

2.5 Top-aligned blocks

2.5.1 Code    B_block BMCOL

*** Code                                                      :B_block:BMCOL:
    :BEAMER_env: block
    :BEAMER_col: 0.5
    :BEAMER_envargs: C[t]

*** Result                                                    :B_block:BMCOL:
    :BEAMER_env: block
    :BEAMER_col: 0.5
pretty nice!

2.5.2 Result    B_block BMCOL

pretty nice!

2.6 Two columns with text underneath - code

2.6.1    B_columns

  • Code    BMCOL


    ***  :B_columns:
        :BEAMER_env: columns
    **** Code :BMCOL:
        :BEAMER_col: 0.6
    **** Result :BMCOL:
        :BEAMER_col: 0.4
    *** Underneath :B_ignoreheading:
        :BEAMER_env: ignoreheading
    Much text underneath! Very Much.
    Maybe too much. The whole width!


  • Result    BMCOL

2.6.2 Underneath    B_ignoreheading

Much text underneath! Very Much. Maybe too much. The whole width!

2.7 Nice quotes

2.7.1 Code    B_block BMCOL

Emacs org-mode is a 
great presentation tool - 
Fast to beautiful slides.
- Arne Babenhauserheide

2.7.2 Result    B_block BMCOL

Emacs org-mode is a great presentation tool - Fast to beautiful slides.

  • Arne Babenhauserheide

2.8 Math snippet

2.8.1 Code    BMCOL B_block

2.8.2 Inline    B_block

\( 1 + 2 = 3 \) is clear

2.8.3 As equation    B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.8.4 Result    BMCOL B_block

2.8.5 Inline    B_block

\( 1 + 2 = 3 \) is clear

2.8.6 As equation    B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.9 \( \LaTeX \)

2.9.1 Code    BMCOL B_block

\( \LaTeX \) gives a space 
after math mode.

\LaTeX{} does it, too.

\LaTeX does not.

At the end of a sentence 
both work.
Try \LaTeX. Or try \LaTeX{}.

Only \( \LaTeX \) and \( \LaTeX{} \) 
also work with HTML export.

2.9.2 Result    BMCOL B_block

\( \LaTeX \) gives a space after math mode.

\LaTeX{} does it, too.

\LaTeX does not.

At the end of a sentence both work. Try \LaTeX. Or try \LaTeX{}.

Only \( \LaTeX \) and \( \LaTeX{} \) also work with HTML export.

2.10 Images with caption and label

2.10.1    B_columns

  • Code    B_block BMCOL
    #+caption: GNU Emacs icon
    #+label: fig:emacs-icon
    This is image (\ref{fig:emacs-icon})

  • Result    B_block BMCOL


    GNU Emacs icon

    This is image (emacs-icon)

2.10.2    B_ignoreheading

Autoscaled to the block width!

2.11 Examples

2.11.1 Code    BMCOL B_block

: #+bla: foo
: * Example Header

Gives an example, which does not interfere with regular org-mode parsing.


Gives a simpler multiline example which can interfere.

2.11.2 Result    BMCOL B_block

#+bla: foo
* Example Header

Gives an example, which does not interfere with regular org-mode parsing.


Gives a simpler multiline example which can interfere.

3 Basic Configuration

3.1 Header


#+startup: beamer
#+LaTeX_CLASS: beamer
#+LaTeX_CLASS_OPTIONS: [bigger]
#+AUTHOR: <empty for none, if missing: inferred>
#+DATE: <empty for none, if missing: today>
#+TITLE: <causes <Title> to be regular content!>

3.2 .emacs config

Put these lines into your .emacs or in a file your .emacs pulls in - i.e. via (require 'mysettings) if the other file is named mysettings.el and ends in (provide 'mysettings).

(org-babel-do-load-languages ; babel, for executing 
 'org-babel-load-languages   ; code in org-mode.
 '((sh . t)
   (emacs-lisp . t)))

(require 'org-latex) ; latex export 
(add-to-list         ; with highlighting
  'org-export-latex-packages-alist '("" "minted"))
(add-to-list 
  'org-export-latex-packages-alist '("" "color"))
(setq org-export-latex-listings 'minted)

3.3 .emacs variables

You can easily set these via M-x customize-variable.

(custom-set-variables ; in ~/.emacs, only one instance 
 '(org-export-latex-classes (quote ; in the init file!
    (("beamer" "\\documentclass{beamer}" 
      ("\\section{%s}" . "\\section*{%s}")))))
 '(org-latex-to-pdf-process (quote 
    ((concat "pdflatex -interaction nonstopmode" 
             " -shell-escape -output-directory %o %f") 
     "bibtex $(basename %b)" 
     (concat "pdflatex -interaction nonstopmode" 
             " -shell-escape -output-directory %o %f")
     (concat "pdflatex -interaction nonstopmode" 
             " -shell-escape -output-directory %o %f")))))

(concat "…" "…") is used here to get nice, short lines. Use the concatenated string instead ("pdflatex…%f").

3.4 Required programs

3.4.1 Emacs - (gnu.org/software/emacs)

To get org-mode and edit .org files effortlessly.

emerge emacs

3.4.2 Beamer \( \LaTeX \) - (bitbucket.org/rivanvx/beamer)

To create the presentation.

emerge dev-tex/latex-beamer app-text/texlive

3.4.3 Pygments - (pygments.org)

To color the source code (with minted).

emerge dev-python/pygments

4 Thanks and license

4.1 Thanks

Thanks go to the writers of emacs and org-mode, and for this guide in particular to the authors of the org-beamer tutorial on worg.

Thank you for your great work!

This presentation is licensed under the GPL (v3 or later) with the additional permission to distribute it without the sources and the copy of the GPL if you give a link to those.1


1 : As additional permission under GNU GPL version 3 section 7, you may distribute these works without the copy of the GNU GPL normally required by section 4, provided you include a license notice and a URL through which recipients can access the Corresponding Source and the copy of the GNU GPL.


Sending email to many people with Emacs Wanderlust

I recently needed to send an email to many people1.

Putting all of them into the BCC field did not work (mail rejected by provider) and when I split it into 2 emails, many did not see my mail because it was flagged as potential spam (they were not in the To-Field)2.

I did not want to put them all into the To-Field, because that would have spread their email-addresses around, which many would not want3.

So I needed a different solution. Which I found in the extensibility of emacs and wanderlust4. It now carries the name wl-draft-send-to-multiple-receivers-from-buffer.

You simply write the email as usual via wl-draft, then put all email addresses you want write to into a buffer and call M-x wl-draft-send-to-multiple-receivers-from-buffer. It asks you about the buffer with email addresses, then shows you all addresses and asks for confirmation.

Then it sends one email after the other, with a randomized wait of 0-10 seconds between messages to avoid flagging as spam.
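The pattern - one message per recipient, with a small random pause in between - is not wanderlust-specific. A scaled-down sketch in shell (made-up addresses, pauses shrunk to fractions of a second for the demo) looks like this:

```shell
# Send-loop sketch: one recipient at a time, random pause in between
# so the messages do not go out as a single burst.
for addr in alice@example.org bob@example.org carol@example.org; do
    echo "sending to $addr"      # placeholder for the real send command
    sleep "0.$((RANDOM % 10))"   # bash: random pause below one second
done
```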

If you want to use it, just add the following to your .emacs:

(defun wl-draft-clean-mail-address (address)
  (replace-regexp-in-string "," "" address))

(defun wl-draft-send-to-multiple-receivers (addresses)
  (loop for address in addresses
        do (progn
             (wl-user-agent-insert-header "To" (wl-draft-clean-mail-address address))
             (let ((wl-interactive-send nil))
               (wl-draft-send))
             (sleep-for (random 10)))))

(defun wl-draft-send-to-multiple-receivers-from-buffer (&optional addresses-buffer-name)
  "Send a mail to multiple recipients - one recipient at a time"
  (interactive "BBuffer with one address per line")
  (let ((addresses nil))
    (with-current-buffer addresses-buffer-name
      (setq addresses (split-string (buffer-string) "\n")))
    (if (y-or-n-p (concat "Send this mail to " (mapconcat 'identity addresses ", ")))
        (wl-draft-send-to-multiple-receivers addresses))))

Happy Hacking!

  1. The email was about the birth of my second child, and I wanted to inform all people I care about (of whom I have the email address), which amounted to 220 recipients. 

  2. Naturally this technique could be used for real spamming, but to be frank: People who send spam won’t need it. They will already have much more sophisticated methods. This little trick just reduces the inconvenience brought upon us by the measures which are necessary due to spam. Otherwise I could just send a mail with 1000 receivers in the BCC field - which is how it should be. 

  3. It only needs one careless friend, and your connections to others get tracked in facebook and the likes. For more information on Facebook, see Stallman about Facebook

  4. Sure, there are also template mails and all such, but learning to use these would consume just as much time as extending emacs - and would be much less flexible: Should I need other ways to transform my mails, I’ll be able to just reuse my code. 

Simple Emacs DarkRoom

I just realized that I let myself be distracted by all kinds of not-so-useful stuff instead of finally getting to type the text I had already wanted to transcribe from stenographic notes at the beginning of … last week.


Let’s take a break for a screenshot of the final version, because that’s what we really want to gain from this article: a distraction-free screenshot as distraction from the text :)

Emacs darkroom, screenshot

As you can see, the distractions are removed — the screenshot is completely full screen and only the text is left. If you switch to the minibuffer (i.e. via M-x), the status bar (modeline) is shown.


To remove the distractions I looked again at WriteRoom and DarkRoom and similar tools which show just the text you want to write. More exactly: I thought about looking at them again, but on second thought I decided to see if I could not just customize emacs to do the same, backed with all the power you get from several decades of being THE editor for many great hackers.

It took some googling and reading emacs wiki, and then some Lisp-hacking, but finally it’s 4 o’clock in the morning and I’m writing this in my own darkroom mode1, toggled on and off by just hitting F11.


I built on hide-mode-line (livejournal post or webonastick) as well as the full-screen info in the emacs wiki.

The whole code just takes 76 lines of code plus 26 lines of comments and whitespace:

;;;; Activate distraction free editing with F11

; hide mode line, from http://dse.livejournal.com/66834.html / http://webonastick.com
(autoload 'hide-mode-line "hide-mode-line" nil t)
; word counting
(require 'wc)

(defun count-words-and-characters-buffer ()
  "Display the number of words and characters in the current buffer."
  (interactive)
  (message "The current buffer contains %s words and %d letters."
           (wc-non-interactive (point-min) (point-max))
           (- (point-max) (point-min))))

; fullscreen, taken from http://www.emacswiki.org/emacs/FullScreen#toc26
; should work for X and OSX with emacs 23.x (TODO find minimum version).
; for windows it uses (w32-send-sys-command #xf030) (#xf030 == 61488)
(defvar babcore-fullscreen-p t "Check if fullscreen is on or off")
(setq babcore-stored-frame-width nil)
(setq babcore-stored-frame-height nil)

(defun babcore-non-fullscreen ()
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND restore #xf120
      (w32-send-sys-command 61728)
    (progn (set-frame-parameter nil 'width 
                                (if babcore-stored-frame-width
                                    babcore-stored-frame-width 82))
           (set-frame-parameter nil 'height
                                (if babcore-stored-frame-height 
                                    babcore-stored-frame-height 42))
           (set-frame-parameter nil 'fullscreen nil))))

(defun babcore-fullscreen ()
  (setq babcore-stored-frame-width (frame-width))
  (setq babcore-stored-frame-height (frame-height))
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND maximize #xf030
      (w32-send-sys-command 61488)
    (set-frame-parameter nil 'fullscreen 'fullboth)))

(defun toggle-fullscreen ()
  "Toggle between fullscreen and windowed mode."
  (interactive)
  (setq babcore-fullscreen-p (not babcore-fullscreen-p))
  (if babcore-fullscreen-p
      (babcore-non-fullscreen)
    (babcore-fullscreen)))

(global-set-key [f11] 'toggle-fullscreen)

; simple darkroom with fullscreen, fringe, mode-line, menu-bar and scroll-bar hiding.
(defvar darkroom-enabled nil)
; TODO: Find out if menu bar is enabled when entering darkroom. If yes: reenable.
(defvar darkroom-menu-bar-enabled nil)

(defun toggle-darkroom ()
  "Toggle distraction-free editing."
  (interactive)
  (if (not darkroom-enabled)
      (setq darkroom-enabled t)
    (setq darkroom-enabled nil))
  (if darkroom-enabled
      (progn
        (toggle-fullscreen)
        ; if the menu bar was enabled, reenable it when disabling darkroom
        (if menu-bar-mode
            (setq darkroom-menu-bar-enabled t)
          (setq darkroom-menu-bar-enabled nil))
        (if darkroom-menu-bar-enabled
            (menu-bar-mode -1))
        (scroll-bar-mode -1)
        (hide-mode-line)
        ; 8 pixels is the default fringe, 6 is the average char width in pixels
        ; for some fonts:
        ; http://www.gnu.org/software/emacs/manual/html_node/emacs/Fonts.html
        (let ((fringe-width 
               (* (window-width (get-largest-window)) 
                  (/ (- 1 0.61803) (1+ (count-windows)))))
              (char-width-pixels 6))
          (set-fringe-mode (truncate (* fringe-width char-width-pixels))))
        (add-hook 'after-save-hook 'count-words-and-characters-buffer))
    (progn
      (toggle-fullscreen)
      (if darkroom-menu-bar-enabled
          (menu-bar-mode t))
      (scroll-bar-mode t)
      (hide-mode-line)
      (set-fringe-mode nil)
      (remove-hook 'after-save-hook 'count-words-and-characters-buffer))))

; Activate with M-F11 -> enhanced fullscreen :)
(global-set-key [M-f11] 'toggle-darkroom)

(provide 'activate-darkroom)

Also I now activated cua-mode to make it easier to interact with other programs: C-c and C-x now copy/cut when the mark is active. Otherwise they are the usual prefix keys. To force them to be the prefix keys, I can use control-shift-c/-x. I thought this would disturb me, but it does not.

To make it faster, I also told cua-mode to have a maximum delay of 5 ms, so I don’t feel the delay. Essentially I just put this in my ~/.emacs:

(cua-mode t)
(setq cua-prefix-override-inhibit-delay 0.005)


Well, did this get me to transcribe the text? Not really, since I spent the time building my own DarkRoom/WriteRoom, but I enjoyed the little hacking and it might help me get it done tomorrow - and get far more other stuff done.

And it is really fun to write in DarkRoom mode ;)

PS: If you like the simple darkroom, please leave a comment!

I hereby declare that anyone is allowed to use this post and the screenshot under the same licensing as if it had been written in emacswiki.

  1. Actually there already is a darkroom mode, but it only works for windows. If you use that platform, you might enjoy it anyway. So you might want to call this mode “simple darkroom”, or darkroom x11 :) 


Staying sane with Emacs (when facing drudge work)

I have to sift through 6 really boring config files. To stay sane, I call in Emacs for support.

My task looks like this:

(click for full size)

In the lower left window I check the identifier in the table I have to complete (left column), then I search for all instances of that identifier in the right window and insert the instrument type, the SIGMA (uncertainty due to representation error defined for the type of the instrument and the location of the site) and note whether the site is marked as assimilated in the config file.

Then I also check all the other config files and note whether the site is assimilated there.

Drudge work. There are people who can do this kind of work. My wife would likely be able to do it faster without tool support than I can do it with tool support. But I’m really bad at that: When the task gets too boring I tend to get distracted - for example by writing this article.

To get the task done anyway, I create tools which make it more enjoyable. And with Emacs that’s actually quite easy, because Emacs provides most required tools out of the box.

First off: My workflow before adding tools was like this:

  • hit C-x o to switch from the lower left window to the config file at the right.
  • Use M-x occur then type the station identifier. This displays all occurrences of the station identifier within the config file in the upper left window.
  • Hit C-x o twice to switch to the lower left window again.
  • Type the information into the file.
  • Switch to the next line and repeat the process.

I now want to simplify this to a single command per line. I’ll use F9 as key, because that isn’t yet used for other things in my Emacs setup and because it is easy to reach and my default keybinding as “useful shortcut for this file”. Other single-keystroke options would be F7 and F8. All other F-keys are already used :)

To make this easy, I define a macro:

  • Move to the line above the line I want to edit.
  • Start Macro-recording with C-x (.
  • Go to the beginning of the next line with C-n and C-a.
  • Activate the mark with C-SPACE and select the whole identifier with M-f.
  • Make the identifier lowercase with M-x downcase-region, copy it with M-w and undo the downcasing with C-x u (or use the undo key; I defined one in my xmodmap).
  • Switch to the config file with C-x o
  • Search the buffer with M-x occur, inserting the identifier with C-y.
  • Hit C-x o C-x o (yes, twice) to get back into the list of sites.
  • Move to the end of the instrument column with M-f and kill the word with C-BACKSPACE.
  • Save the macro with C-x ).
  • Bind kmacro-call-macro to F9 with M-x local-set-key F9 kmacro-call-macro.


My workflow is now reduced to this:

  • Hit F9
  • Type the information.
  • Repeat.

I’m pretty sure that this will save me more time today than I spent writing this text ☺

Happy hacking!


Tutorial: Writing scientific papers for ACPD using emacs org-mode

PDF-version (for printing)

orgmode-version (for editing)

Emacs Org mode is an excellent tool for reproducible research,1 but research is only relevant if people learn about it.2 To reach people with scientific work, you need to publish your results in a Journal, so I show here how to publish in ACPD with Emacs Org mode.3

1 Requirements

To use this tutorial, you need

  • a fairly recent version of org-mode (8.0 or later - not yet shipped with emacs 24.3, so you will need to install it separately) and naturally
  • Emacs. Also you need to download the
  • copernicus latex package. And it can’t hurt to have a look at the latex-instructions from ACP. I used them to create my setup.
  • lineno.sty. This is required by copernicus, but not included in the package - and neither in the texlive version I use.

2 Basic Setup

2.1 Emacs

The first step in publishing to ACPD is to activate org-mode and latex export and to create a latex-class in Emacs. To do so, just add the following to your ~/.emacs (or ~/.emacs.d/init.el) and eval it (for example by moving to the closing parenthesis and typing C-x C-e):

  (require 'org)
  (require 'ox-latex)
  (setq org-latex-packages-alist 
        (quote (("" "color" t) ("" "minted" t) ("" "parskip" t))))
  (setq org-latex-pdf-process 
        (quote (
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f" 
"bibtex $(basename %b)" 
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f" 
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f")))
  (add-to-list 'org-latex-classes
               '("copernicus_discussions"
                 "\\documentclass{copernicus_discussions}"
                 ("\\section{%s}" . "\\section*{%s}")
                 ("\\subsection{%s}" "\\newpage" "\\subsection*{%s}" "\\newpage")
                 ("\\subsubsection{%s}" . "\\subsubsection*{%s}")
                 ("\\paragraph{%s}" . "\\paragraph*{%s}")
                 ("\\subparagraph{%s}" . "\\subparagraph*{%s}")))

This allows you to use #+Latex_Class: copernicus_discussions in your org-mode file to set the PDF to export for ACPD.

Also you will likely want to use reftex for nice bibtex integration. To get it, add the following to your ~/.emacs or ~/.emacs.d/init.el:

(require 'reftex-cite)
(defun org-mode-reftex-setup ()
  (and (buffer-file-name) (file-exists-p (buffer-file-name))
       (progn
         ; Reftex should use the org file as master file. See C-h v TeX-master for infos.
         (setq TeX-master t)
         (turn-on-reftex)
         ; enable auto-revert-mode to update reftex when bibtex file changes on disk
         (global-auto-revert-mode t) ; careful: this can kill the undo
                                     ; history when you change the file
                                     ; on-disk.
         (reftex-parse-all)
         ; add a custom reftex cite format to insert links
         ; This also changes any call to org-citation!
         (reftex-set-cite-format
          '((?c . "\\citet{%l}") ; natbib inline text
            (?i . "\\citep{%l}") ; natbib with parens
            ))))
  (define-key org-mode-map (kbd "C-c )") 'reftex-citation)
  (define-key org-mode-map (kbd "C-c (") 'org-mode-reftex-search))

(add-hook 'org-mode-hook 'org-mode-reftex-setup)

This adds reftex-citation on C-c ), sets some reftex defaults and adds a custom cite format which lets you choose \citep{} instead of \cite{} (this is what ACPD requires).

For nice sourcecode highlighting, you should also install Pygmentize and then add the following to your ~/.emacs or ~/.emacs.d/init.el:

(add-to-list 'org-latex-packages-alist '("" "minted"))
(add-to-list 'org-latex-packages-alist '("" "color"))
(setq org-latex-listings 'minted)

; add emacs lisp support for minted
(setq org-latex-custom-lang-environments
      '((emacs-lisp "common-lispcode")))

2.2 The working folder

As next step, unzip the copernicus latex package in the folder you want to use for writing your article (do use a dedicated folder for that: org-mode leaves around some files). And remember to use a version-tracking system like Mercurial, so you can always take snapshots of your current state.

This will give you the following files:

  • authblk.sty
  • copernicus.bst
  • copernicus_discussions.cls
  • natbib.sty
  • pdfscreen.sty
  • pdfscreencop.sty

Ensure that all of them are in your folder, not in a subfolder. If necessary copy them there.

Also get lineno.sty and copy it into your folder.

If you want to use unicode-symbols in your text, add uniinput.sty, too.

3 The org-mode document

Using the ACPD style requires some deviations from the standard org-mode export process. Luckily org-mode is flexible enough to adapt. Set up your document as follows:

#+title: YOUR TITLE
#+Options: toc:nil ^:nil
#+BIND: org-latex-title-command ""
#+Latex_Class: copernicus_discussions
#+LaTeX_CLASS_OPTIONS: [acpd, hvmath, online]

# Nice code-blocks
#+BEGIN_SRC elisp :noweb no-export :exports results
  (setq org-latex-minted-options
    '(("bgcolor" "mintedbg") ("frame" "single") ("framesep" "6pt")
      ("mathescape" "true") ("fontsize" "\\footnotesize")))
#+END_SRC

#+TOC: headlines 2

#+Latex: \runningtitle{SHORT TITLE}
#+Latex: \runningauthor{SHORT AUTHOR}
#+Latex: \correspondence{AUTHOR NAME\\ EMAIL}
#+Latex: \affil{YOUR UNIVERSITY}
#+Latex: \author[2,*]{SECOND AUTHOR}
#+Latex: \affil[2]{SECOND UNIVERSITY}
#+Latex: \affil[*]{now at: THIRD UNIVERSITY}

#+Latex: \received{}
#+Latex: \pubdiscuss{}
#+Latex: \revised{}
#+Latex: \accepted{}
#+Latex: \published{}
#+Latex: %% These dates will be inserted by ACPD
#+Latex: \firstpage{1}

#+Latex: \maketitle

#+Latex: \introduction
# * Introduction

* Second section

* Discussion

#+Latex: \conclusions
# * Conclusions

#+Latex: \appendix

# use acknowledgements for multiple
Foo Bar Baz.

#+Latex: \bibliographystyle{copernicus}

# Local Variables:
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

Let’s look at this in more detail.

3.1 Use the LaTeX class

As first step, we set the LaTeX class. In the options we select the journal (acpd) and such - you can find the detailed options in the latex-instructions from ACP.

#+Latex_Class: copernicus_discussions
#+LaTeX_CLASS_OPTIONS: [acpd, hvmath, online]

3.2 Delayed table of contents

The table of contents is set to be shown after the abstract by setting the toc:nil option and later explicitly calling #+TOC: headlines 2. In org-mode this is really straightforward.

3.3 Delayed maketitle

Delaying \maketitle is a bit more convoluted than delaying the TOC. First we add the local variable org-export-allow-bind-keywords: t at the bottom to allow file-local bindings in the file, then we deactivate the title command with #+BIND: org-latex-title-command "" and finally we add \maketitle where we need it.

3.4 Define minted style

This defines the variables minted uses for beautiful code-blocks. Without this, your code-blocks will just look like inline text.

#+BEGIN_SRC elisp :noweb no-export :exports results
  (setq org-latex-minted-options
    '(("bgcolor" "mintedbg") ("frame" "single") ("framesep" "6pt")
      ("mathescape" "true") ("fontsize" "\\footnotesize")))
#+END_SRC

3.5 Intro and conclusions

The Introduction and the conclusions have their own commands in ACPD, because they use them to add bookmarks. You can also use the commands to specify another name.

We call the commands with #+LaTeX: (just like some others), which allows us to explicitly add arbitrary LaTeX code.

3.6 Appendix

The appendix should be used sparingly. It changes the numbering of the pages.

#+Latex: \appendix

3.7 Bibliography

The bibliography allows referring to entries from your general bibtex-file. Ensure that you use the correct absolute path to that file. For more information, see the org-tutorial page for biblatex.
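For example, a single header line pointing at your bibtex file does the job (the path here is a made-up placeholder; use the absolute path to your own file, without the .bib extension):

#+Latex: \bibliography{/home/user/literature/references}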

3.8 Babel evaluate without confirmation

This allows us to just run all code snippets which we embedded in the document when we export the file. If we do not set this local variable, we have to acknowledge each source block before it runs (the block with local variables also contains the variable which allows binding functions on a per-file basis, as explained above).

# Local Variables:
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

4 Conclusion

With this setup, you can publish your paper with ACPD using org-mode for the actual writing, which has a much lower overhead than LaTeX and offers quite a few unique features for more efficient working - from easy referencing over inline math preview to planning and code-evaluation directly in your file.



1. General methods for using Emacs org-mode in scientific publishing have been described by \citet{SchulteEmacs2012}.

2. Research, or rather science, not only means learning new things and uncovering secrets, but just as importantly sharing what you learn. Fun fact: the German word for science is “Wissenschaft”, built from the words “wissen” (knowledge) and “schaft” (from schaffen: to create), so it captures the essence of scientific work more exactly than the word “science”, which is based on the Latin word “scientia” and just means knowledge. It isn’t enough to just learn. Creating knowledge requires telling it to others, so they can build upon it.

3. I chose ACPD as the target for this article because it is an Open Access journal, and because I want to publish in it (which makes it a rather natural choice for a tutorial).

Unicode char \u8:χ not set up for use with LaTeX: Solution (made easy with Emacs)

For years I regularly stumbled over LaTeX errors of the form Unicode char \u8:χ not set up for use with LaTeX. I always took the chicken’s path and replaced the unicode characters with tex-escapes in the file. That was easy, but it made my files needlessly unreadable. Today I decided to FIX the problem once and for all. And it worked. Easily.

First off: the problem I’m facing is that my keyboard layout makes it effortless for me to input characters like ℂ, Σ and χ. But LaTeX cannot cope with them out of the box. Org-mode already catches most of these problems, so I can write things like x² instead of x^2, but occasionally it stumbles.

The solution to that is actually pretty simple: I only need to declare the escape sequence LaTeX should use when it sees one of the characters (to be used before \begin{document}!):

\DeclareUnicodeCharacter{03C7}{\chi}
Or in org-mode:

#+LaTeX_HEADER: \DeclareUnicodeCharacter{03C7}{\chi}

To do this more easily, you can use the uniinput.ins and uniinput.dtx from the neo-layout project. Run latex uniinput.ins to generate uniinput.sty which you can put next to your latex files and use with \usepackage{uniinput} (instructions in German).

Thanks go to Wikibooks:LaTeX for this. Their solution then suggests reading several Unicode definition documents to track down the codepoint of the character. But we can make that easier with Emacs (almost everything is easier with Emacs ☺).

Instead of browsing huge documents manually, we simply rely on the unicode-definitions in Emacs: Move the cursor over the char and execute M-x describe-char.

When used with χ, this shows the following output:

             position: 672 of 35513 (2%), column: 0
            character: χ (displayed as χ) (codepoint 967, #o1707, #x3c7)
    preferred charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
code point in charset: 0x03C7
… (and a bit more) …

What we need is code point in charset: Just leave out the 0x and you have the codepoint.
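If you ever need the same lookup outside Emacs, Python’s ord gives the identical value (a small sketch; the helper name latex_codepoint is mine, not part of any tool mentioned here):

```python
# Look up the hex codepoint of a character, as shown by describe-char,
# for use in \DeclareUnicodeCharacter.
def latex_codepoint(char):
    """Return the uppercase hex codepoint of a single character."""
    return format(ord(char), "04X")

print(latex_codepoint("χ"))  # 03C7
print(latex_codepoint("²"))  # 00B2
```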

For the document I currently write, I now use the following definitions:

#+LaTeX_HEADER: \DeclareUnicodeCharacter{03C7}{\chi}
#+LaTeX_HEADER: \DeclareUnicodeCharacter{B2}{^{2}}

And that makes χ² work.
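If you accumulate many such characters, you can generate the header lines instead of looking them up one by one. A small sketch (the function name and the replacement table are mine; extend the table with your own mappings, unmapped characters get a "?" placeholder):

```python
# Generate #+LaTeX_HEADER lines for every non-ASCII character in a text.
REPLACEMENTS = {"χ": r"\chi", "²": r"^{2}"}

def unicode_headers(text):
    chars = sorted({c for c in text if ord(c) > 127})
    return ["#+LaTeX_HEADER: \\DeclareUnicodeCharacter{%04X}{%s}"
            % (ord(c), REPLACEMENTS.get(c, "?")) for c in chars]

for line in unicode_headers("E = m c², χ² test"):
    print(line)
```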

Happy Hacking - and have fun with Emacs Org-Mode!

Unicode-Characters for TODO-States in Emacs Orgmode

By default Emacs Orgmode uses uppercase words for todo keywords. But having tens of entries marked with TODO and DONE in my file looked horribly cluttered to me. So I searched for alternatives. After a few months of experimentation, I decided on the following scheme. It served me well ever since:

  • ❢ To do
  • ☯ In progress
    • ⚙ A program is running (optional detail)
    • ✍ I’m writing (optional detail)
  • ⧖ Waiting
  • ☺ To report
  • ✔ Done
  • ⌚ Maybe do this at some later time
  • ✘ Won’t do

To set this in org-mode, just add the following to the header (and reopen the document, for example with C-x C-v):

#+SEQ_TODO: ❢ ☯ ⧖ | ☺ ✔ ⌚ ✘

or for the complex case (with details on what I do)

#+SEQ_TODO: ❢ ☯ ⚙ ✍ ⧖ | ☺ ✔ ⌚ ✘

Then use C-c C-t or SHIFT-→ (shift + right arrow) to switch to the next state or SHIFT-← (shift + left arrow) to switch to the previous state.

Anything before the | in the SEQ_TODO is shown in red (not yet done), anything after the | is show in green (done). Things which get triggered when something is done (like storing the time of a scheduled entry) happen when the state crosses the |.
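The split at the | can be sketched in a few lines of Python (the function name is mine; org-mode does this internally when it parses the keyword line):

```python
# Split a SEQ_TODO line into not-yet-done (red) and done (green) keywords.
def split_todo_states(seq_todo):
    words = seq_todo.split()
    bar = words.index("|")
    return words[:bar], words[bar + 1:]

active, done = split_todo_states("❢ ☯ ⧖ | ☺ ✔ ⌚ ✘")
print(active)  # ['❢', '☯', '⧖']
print(done)    # ['☺', '✔', '⌚', '✘']
```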

And with that, my orgmode documents are not only very useful but also look pretty lean. Just as good as having a GUI with images, but additionally I can access them over SSH and edit the todo state with any tool - because it’s just text.

Use the source, Luke! — Emacs org-mode beamer export with images in figure

I just needed to tweak my Emacs org-mode to beamer-latex export to embed images into a figure environment (not wrapfigure!). After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected.

This tutorial requires at least org-mode 8.0 (before that you had to use hacks to get figure without a caption). It is only tested for org-mode 8.0.2: The code you see when you read the source might look different in other versions.

1 Task

I just needed to tweak my org-mode to beamer-latex export to embed images produced by a code snippet in a figure environment. Practically speaking: I had this

#+BEGIN_SRC sh :exports results :results output raw
echo '[[./image.png]]'
#+END_SRC

which produces this latex snippet:

\includegraphics[width=.9\linewidth]{./image.png}

and I needed a snippet which instead produces this:

\begin{figure}[htb]
\centering
\includegraphics[width=.9\linewidth]{./image.png}
\end{figure}

2 Use the Source!

After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected (warning: obscure list of commands follows. Will be explained afterwards):

C-h f org-latex-export-as-latex
C-x C-o
C-s .el C-b ENTER
C-s figure C-s C-s C-s ...

And less than a minute after starting, I saw this:

(float (let ((float (plist-get attr :float)))
     (cond ((string= float "wrap") 'wrap)
       ((string= float "multicolumn") 'multicolumn)
       ((or (string= float "figure")
            (org-element-property :caption parent))
        'figure))))

Translated: Just add this to the output of the source block:

#+attr_latex: :float figure

which makes the sh block look like this:

#+BEGIN_SRC sh :exports results :results output raw
echo '#+attr_latex: :float figure'
echo '[[./image.png]]'
#+END_SRC

And voilà, the export works and the latex looks like this:

\begin{figure}[htb]
\centering
\includegraphics[width=.9\linewidth]{./image.png}
\end{figure}

Mission accomplished!

3 Commands Explained

For all those who are not fluent in Emacs commands, here’s a short breakdown of my source-reading process:

C-h f org-latex-export-as-latex

Get the help (Control-h) for the function (f) org-latex-export-as-latex. I knew that org-mode calls that. If you did not know it, you could have simply used C-h k C-c C-e (get help on the export keyboard shortcut) which would have led you to the function org-export-dispatch and the source file ox.el. But since the org-mode guides tell you to use M-x org-latex-export-as-latex, the function to search for is actually pretty obvious. Alternatively just use M-x org-latex- and then type TAB 2 times. That will show you all the export functions.

C-x C-o

Switch to the other buffer.

C-s .el C-b ENTER

Focus on the source file and open it (the canonical suffix for emacs lisp files is .el).

C-s figure C-s C-s C-s ...

Search for figure. Repeat 9 times to find the correct place in the code (in emacs that’s really easy and fast to do).

Voilà, you found the snippet which tells you that you can use the float-keyword (:float) with the argument "figure".

4 Conclusion

Using the source was actually faster than googling in this case - and if you practise it, you learn bits and pieces about the foundation of the program you use, which will enable you to adapt it even better to your needs in the future.

And with that, I conclude this text.

Enjoy your Emacs and Happy Hacking!

2013-08-28-Mi-use-the-source-beamer-figure.org (3.8 KB)

Using Macros to avoid tedious tasks (screencast)

Because I am lazy,1 and that makes me fast.


(download (ogg theora video))

Using Macros to avoid tedious tasks


  • [X] Show the task
  • [X] Record Macro
  • [X] Use Macro


I record a macro to find ~, then activate the mark and find a space.


Then kill the region and type ${}

C-w ${}

That’s it.


  • It is resilient: I check each change I do.
  • I avoid repeating unnerving stuff.

Thank you

recorded with recordmydesktop: recordmydesktop --delay 10 --width 800 --height 600 --on-the-fly-encoding

  1. I have lots of stuff to do, so I cannot afford not being lazy ☺ 

using-emacs-macros-to-reduce-tedious-work-screencast.ogv (17.81 MB)
using-emacs-macros-to-reduce-tedious-work-screencast.org (397 Bytes)

Wish: KDE with Emacs-style keyboard shortcuts

I would love to be able to use KDE with emacs-style keyboard shortcuts, because Emacs offers a huge set of clearly defined shortcuts for many different situations. Since its users tend to do almost everything with the keyboard alone, even the more obscure tasks are available via shortcuts.

I think that this would be useful, because Emacs is like a kind of nongraphical desktop environment itself (just look at emacspeak!). For all those who use Emacs in a KDE environment, it could be a nice timesaver to be able to just use their accustomed bindings.

It also has a mostly clean structure for the bindings:

  • "C-x anything" does changes which affect things outside the content of the current buffer.
  • "C-c anything" is kept for specific actions of programs. For example "C-c C-c" in an email sends the email, while "C-c C-c" in a version tracking commit message finishes the message and starts the actual commit.
  • "C-anything but x or c" acts on the content you're currently editing.
  • "M-x" opens a 'command-selection-dialog' (just like alt-f2). You can run commands by name.
  • "M-anything but x" is a different flavor of "C-anything but x or c". For example "C-f" moves the cursor one character forward, while "M-f" moves one word forward. "C-v" moves one page forward, while "M-v" moves one page backwards.

On the backend side, this would require being able to define multistep shortcuts. Everything else is just porting the emacs shortcuts to KDE actions.

The actual porting of shortcuts would then require mapping of the Emacs commands to KDE actions.

Some examples:

  • "C-s" searches in a file. Replaces C-f.
  • "C-r" searches backwards.
  • "C-x C-s" saves a file -> save. Replaces C-s.
  • "C-x C-f" opens a file -> Open. Replaces C-o.
  • "C-x C-c" closes the program -> quit. Replaces C-q.
  • "C-x C-b" switches between buffers/files/tabs -> switch the open file. Replaces alt-right_arrow and a few other (to my knowledge) inconsistent bindings.
  • "C-x C-2" splits a window (or part of a window) vertically. "C-x C-o" switches between the parts. "C-x C-1" undoes the split and keeps the currently selected part. "C-x C-0" undoes the split and hides the currently selected part.

Write multiple images on a single page in org-mode.

How to show multiple images on one page in the latex export of emacs org-mode. I had this problem; this is my current solution.

1 Prep

Use the package subfig:

#+latex_header: \usepackage{subfig}

And create an image:

import pylab as pl
import numpy as np
x = np.random.random(size=(2,1000))
pl.scatter(x[0,:], x[1,:], marker=".")
pl.savefig("test.png")  # write the image the document links to
print(r"\label{fig:image}")
print("[[./test.png]]")

\label{fig:image} test.png

Image: \ref{fig:image}

2 Multiple images on one page in LaTeX

\begin{figure}\centering
\subfloat[A gull]{\label{fig:latex-gull}
  \includegraphics[width=0.3\textwidth]{test.png}}
\subfloat[A tiger]{\label{fig:latex-tiger}
  \includegraphics[width=0.3\textwidth]{test.png}}
\subfloat[A mouse]{\label{fig:latex-mouse}
  \includegraphics[width=0.3\textwidth]{test.png}}
\caption{Multiple pictures}\label{fig:latex-animals}
\end{figure}

Latex-Animals \ref{fig:latex-animals}.

3 Multiple images on one page in org-mode

#+latex: \begin{figure}\centering
#+latex: \subfloat[A gull]{\label{fig:org-gull}
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\subfloat[A tiger]{\label{fig:org-tiger}
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\subfloat[A mouse]{\label{fig:org-mouse}
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\caption{Multiple pictures}\label{fig:org-animals}
#+latex: \end{figure}




Org-Animals \ref{fig:org-animals}.

test.png (98.4 KB)
2014-01-14-Di-org-mode-multiple-images-per-page.pdf (281.84 KB)
2014-01-14-Di-org-mode-multiple-images-per-page.org (2.48 KB)

emacs wanderlust.el setup for reading kmail maildir

This is my wanderlust.el file to read kmail maildirs. You need to define every folder you want to read.

;; mode:-*-emacs-lisp-*-
;; wanderlust
(setq
  elmo-maildir-folder-path "~/.kde/share/apps/kmail/mail"
          ;; where i store my mail

  wl-stay-folder-window t                       ;; show the folder pane (left)
  wl-folder-window-width 25                     ;; toggle on/off with 'i'
  wl-smtp-posting-server "smtp.web.de"            ;; put the smtp server here
  wl-local-domain "draketo.de"          ;; put something here...
  wl-message-id-domain "web.de")     ;; ...

file continued:

(setq
  wl-from "Arne Babenhauserheide "                  ;; my From:

  ;; note: all below are dirs (Maildirs) under elmo-maildir-folder-path
  ;; the '.'-prefix is for marking them as maildirs
  wl-fcc ".sent-mail"                       ;; sent msgs go to the "sent"-folder
  wl-fcc-force-as-read t               ;; mark sent messages as read
  wl-default-folder ".inbox"           ;; my main inbox
  wl-draft-folder ".drafts"            ;; store drafts in 'drafts'
  wl-trash-folder ".trash"             ;; put trash in 'trash'
  wl-spam-folder ".gruppiert/Spam"              ;; ...spam as well
  wl-queue-folder ".queue"             ;; we don't use this

  ;; check this folder periodically, and update modeline
  wl-biff-check-folder-list '(".todo") ;; check every 180 seconds
                                       ;; (default: wl-biff-check-interval)

  ;; hide many fields from message buffers
  wl-message-ignored-field-list '("^.*:"))

; Encryption via GnuPG

(require 'mailcrypt)
 (load-library "mailcrypt") ; provides "mc-setversion"
(mc-setversion "gpg")    ; for PGP 2.6 (default); also "5.0" and "gpg"

(autoload 'mc-install-write-mode "mailcrypt" nil t)
(autoload 'mc-install-read-mode "mailcrypt" nil t)
(add-hook 'mail-mode-hook 'mc-install-write-mode)

(add-hook 'wl-summary-mode-hook 'mc-install-read-mode)
(add-hook 'wl-mail-setup-hook 'mc-install-write-mode)

;(setq mc-pgp-keydir "~/.gnupg")
;(setq mc-pgp-path "gpg")
(setq mc-encrypt-for-me t)
(setq mc-pgp-user-id "FE96C404")

(defun mc-wl-verify-signature ()
  (interactive)
  (save-window-excursion
    (wl-summary-jump-to-current-message)
    (mc-verify)))

(defun mc-wl-decrypt-message ()
  (interactive)
  (save-window-excursion
    (wl-summary-jump-to-current-message)
    (let ((inhibit-read-only t))
      (mc-decrypt))))

(eval-after-load "mailcrypt"
  '(setq mc-modes-alist
         (append
          '((wl-draft-mode (encrypt . mc-encrypt-message)
                           (sign . mc-sign-message))
            (wl-summary-mode (decrypt . mc-wl-decrypt-message)
                             (verify . mc-wl-verify-signature)))
          mc-modes-alist)))

; flowed text

 ;; Reading f=f
 (autoload 'fill-flowed "flow-fill")
 (add-hook 'mime-display-text/plain-hook
          (lambda ()
            (when (string= "flowed"
                           (cdr (assoc "format"
                                        (mime-entity-content-type entity))))
              (fill-flowed))))
; writing f=f
;(mime-edit-insert-tag "text" "plain" "; format=flowed")

(provide 'private-wanderlust)

UPDATE (2012-05-07): ~/.folders

I now use a ~/.folders file, to manage my non-kmail maildir subscriptions, too. It looks like this:

.~/.local/share/mail/mgl_spam   "mgl spam" 
.~/.local/share/mail/to.arne_bab    "to arne_bab"
.inbox  "inbox" 
.trash  "Trash"
..gruppiert.directory/.inbox.directory/Freunde  "Freunde"
.drafts "Drafts"
..gruppiert.directory/.alt.directory/Posteingang-2011-09-18 "2011-09-18"

The mail in ~/.local/share/mail is fetched via fetchmail and procmail to have a really reliable mail fetching system which does not rely on a non-broken database or free space on the disk to keep working…
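Such a setup can look roughly like this in procmail (a minimal sketch; every path and the rule are assumptions for illustration, not the actual configuration):

# ~/.procmailrc
MAILDIR=$HOME/.local/share/mail   # must already exist
DEFAULT=$MAILDIR/inbox/           # trailing slash: deliver in maildir format

# file mail addressed to one account into its own maildir
:0
* ^TO_arne_bab
to.arne_bab/

fetchmail then only needs to hand each message to procmail, which appends it as a plain file - no database involved.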

keep auto-complete from competing with org-mode structure-templates

For a long time it bothered me that auto-complete made it necessary for me to abort completion before being able to use org-mode templates.

I typed <s and auto-complete showed stuff like <string, forcing me to hit C-g before I could use TAB to complete the template with org-mode.

I fixed this for me by adding all the org-mode structure templates as stop-words:

;; avoid competing with org-mode templates.
(require 'cl) ; for `loop'
(add-hook 'org-mode-hook
          (lambda ()
            (make-local-variable 'ac-stop-words)
            (loop for template in org-structure-template-alist do
                  (add-to-list 'ac-stop-words
                               (concat "<" (car template))))))

Note that with this snippet you will have to reopen a file if you add an org-mode template and want it recognized as a stop-word in that file.

PS: I added this as bug-report to auto-complete, so with some luck you might not have to bother with this, if you’re willing to simply wait for the next release ☺

Free Software

„Free, Reliable, Ethical and Efficient“
„Frei, Robust, Ethisch und Innovativ”
„Libre, Inagotable, Bravo, Racional y Encantado“

Articles connected to Free Software (mostly as defined by the GNU Project). This is more technical than Politics and Free Licensing, though there is some overlap.

Also see my lists of articles about specific free software projects:

  • Emacs - THE Editor.
  • Freenet - Decentralized, Anonymous Communication.
  • Mercurial - Decentralized Version Control System.

There is also a German version of this page: Freie Software. Most articles are not translated, so the content on the German page and on the English page is very different.

wisp: Whitespace to Lisp

New version: draketo.de/software/wisp

» I love the syntax of Python, but crave the simplicity and power of Lisp.«

display "Hello World!"            ↦ (display "Hello World!")

define : factorial n              ↦ (define (factorial n)
    if : zero? n                  ↦     (if (zero? n)
       . 1                        ↦         1
       * n : factorial {n - 1}    ↦         (* n (factorial {n - 1}))))

Wisp basics

»ArneBab's alternate sexp syntax is best I've seen; pythonesque, hides parens but keeps power« — Christopher Webber in twitter, in identi.ca and in his blog: Wisp: Lisp, minus the parentheses
♡ wow ♡
»Wisp allows people to see code how Lispers perceive it. Its structure becomes apparent.« — Ricardo Wurmus in IRC, paraphrasing the wisp statement from his talk at FOSDEM 2019 about Guix for reproducible science in HPC.
☺ Yay! ☺
with (open-file "with.w" "r") as port
     format #t "~a\n" : read port
Familiar with-statement in 25 lines.


Update (2020-09-15): Wisp 1.0.3 provides a wisp binary to start a wisp repl or run wisp files, builds with Guile 3, and moved to sourcehut for libre hosting: hg.sr.ht/~arnebab/wisp.
After installation, just run wisp to enter a wisp-shell (REPL).
This release also ships wisp-mode 0.2.6 (fewer autoloads), ob-wisp 0.1 (initial support for org-babel), and additional examples. New auxiliary projects include wispserve for experiments with streaming and download-mesh via Guile and wisp in conf:
conf new -l wisp PROJNAME creates an autotools project with wisp while conf new -l wisp-enter PROJNAME creates a project with natural script writing and guile doctests set up. Both also install a script to run your project with minimal start time: I see 25ms to 130ms for hello world (36ms on average). The name of the script is the name of your project.
For more info about Wisp 1.0.3, see the NEWS file.
To test wisp v1.0.3, install Guile 2.0.11 or later and bootstrap wisp:

wget https://www.draketo.de/files/wisp-1.0.3.tar_.gz;
tar xf wisp-1.0.3.tar_.gz ; cd wisp-1.0.3/;
./configure; make check;
examples/newbase60.w 123

If it prints 23 (123 in NewBase60), your wisp is fully operational.
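As a cross-check of that expected output, NewBase60 fits in a few lines of Python (a sketch assuming Tantek Çelik's standard NewBase60 alphabet, which is consistent with the example's 123 → 23):

```python
# NewBase60: base-60 digits avoiding easily confused characters (I, O, l).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"

def newbase60(n):
    digits = ""
    while True:
        digits = ALPHABET[n % 60] + digits
        n //= 60
        if n == 0:
            return digits

print(newbase60(123))  # 23
```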
If you have additional questions, see the Frequently asked Questions (FAQ) and chat in #guile at freenode.
That’s it - have fun with wisp syntax!

Update (2019-07-16): wisp-mode 0.2.5 now provides proper indentation support in Emacs: Tab increases indentation and cycles back to zero. Shift-tab decreases indentation via previously defined indentation levels. Return preserves the indentation level (hit tab twice to go to zero indentation).
Update (2019-06-16): In c programming the uncommon way, specifically c-indent, tantalum is experimenting with combining wisp and sph-sc, which compiles scheme-like s-expressions to c. The result is a program written like this:
pre-include "stdio.h"

define (main argc argv) : int int char**
  declare i int
  printf "the number of arguments is %d\n" argc
  for : (set i 0) (< i argc) (set+ i 1)
    printf "arg %d is %s\n" (+ i 1) (array-get argv i)
  return 0 ;; code-snippet under GPLv3+
To me that looks so close to C that it took me a moment to realize that it isn’t just using a parser which allows omitting some special syntax of C, but actually an implementation of a C-generator in Scheme (similar in spirit to cython, which generates C from Python), which results in code that looks like a more regular version of C without superfluous parens. Wisp really completes the round-trip from C over Scheme to something that looks like C but has all the regularity of Scheme, because all things considered, the code example is regular wisp-code. And it is awesome to see tantalum take up the tool I created and use it to experiment with ways to program that I never even imagined! ♡
TLDR: tantalum uses wisp for code that looks like C and compiles to C but has the regularity of Scheme!
Update (2019-06-02): The repository at https://www.draketo.de/proj/wisp/ is stale at the moment, because the staticsite extension I use to update it was broken by API changes and I currently don’t have the time to fix it. Therefore until I get it fixed, the canonical repository for wisp is https://bitbucket.org/ArneBab/wisp/. I’m sorry for that. I would prefer to self-host it again, but the time to read up what I have to adjust blocks that right now (typically the actual fix only needs a few lines). A pull-request which fixes the staticsite extension for modern Mercurial would be much appreciated!
Update (2019-02-08): wisp v1.0 released as announced at FOSDEM. Wisp the language is complete:
display "Hello World!"
↦ (display "Hello World!")

And it achieves its goal:
“Wisp allows people to see code how Lispers perceive it. Its structure becomes apparent.” — Ricardo Wurmus at FOSDEM
Tooling, documentation, and porting of wisp are still work in progress, but before I go on, I want thank the people from the readable lisp project. Without our initial shared path, and without their encouragement, wisp would not be here today. Thank you! You’re awesome!
With this release it is time to put wisp to use. To start your own project, see the tutorial Starting a wisp project and the wisp tutorial. For more info, see the NEWS file. To test wisp v1.0, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-1.0.tar.gz;
tar xf wisp-1.0.tar.gz ; cd wisp-1.0/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
If you have additional questions, see the Frequently asked Questions (FAQ) and chat in #guile at freenode.
That’s it - have fun with wisp syntax!
Update (2019-01-27): wisp v0.9.9.1 released which includes the emacs support files missed in v0.9.9, but excludes unnecessary files which increased the release size from 500k to 9 MiB (it's now back at about 500k). To start your own wisp-project, see the tutorial Starting a wisp project and the wisp tutorial. For more info, see the NEWS file. To test wisp v0.9.9.1, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-;
tar xf wisp- ; cd wisp-;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2019-01-22): wisp v0.9.9 released with support for literal arrays in Guile (needed for doctests), example start times below 100ms, ob-wisp.el for emacs org-mode babel and work on examples: network, securepassword, and downloadmesh. To start your own wisp-project, see the tutorial Starting a wisp project and the wisp tutorial. For more info, see the NEWS file. To test wisp v0.9.9, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.9.tar.gz;
tar xf wisp-0.9.9.tar.gz ; cd wisp-0.9.9/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2018-06-26): There is now a wisp tutorial for beginning programmers: “In this tutorial you will learn to write programs with wisp. It requires no prior knowledge of programming.” — Learn to program with Wisp, published in With Guise and Guile
Update (2017-11-10): wisp v0.9.8 released with installation fixes (thanks to benq!). To start your own wisp-project, see the tutorial Starting a wisp project. For more info, see the NEWS file. To test wisp v0.9.8, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.8.tar.gz;
tar xf wisp-0.9.8.tar.gz ; cd wisp-0.9.8/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2017-10-17): wisp v0.9.7 released with bugfixes. To start your own wisp-project, see the tutorial Starting a wisp project. For more info, see the NEWS file. To test wisp v0.9.7, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.7.tar.gz;
tar xf wisp-0.9.7.tar.gz ; cd wisp-0.9.7/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2017-10-08): wisp v0.9.6 released with compatibility for tests on OSX and old autotools, installation to guile/site/(guile version)/language/wisp for cleaner installation, debugging and warning when using not yet defined lower indentation levels, and with wisp-scheme.scm moved to language/wisp.scm. This allows creating a wisp project by simply copying language/. A short tutorial for creating a wisp project is available at Starting a wisp project as part of With Guise and Guile. For more info, see the NEWS file. To test wisp v0.9.6, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.6.tar.gz;
tar xf wisp-0.9.6.tar.gz ; cd wisp-0.9.6/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2017-08-19): Thanks to tantalum, wisp is now available as a package for Arch Linux: from the Arch User Repository (AUR) as guile-wisp-hg! Instructions for installing the package are provided on the AUR page in the Arch Linux wiki. Thank you, tantalum!
Update (2017-08-20): wisp v0.9.2 released with many additional examples including the proof-of-concept for a minimum ceremony dialog-based game duel.w and the datatype benchmarks in benchmark.w. For more info, see the NEWS file. To test it, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.2.tar.gz;
tar xf wisp-0.9.2.tar.gz ; cd wisp-0.9.2/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2017-03-18): I removed the link to Gozala’s wisp, because it was put in maintenance mode. Quite the opposite of Guile which is taking up speed and just released Guile version 2.2.0, fully compatible with wisp (though wisp helped to find and fix one compiler bug, which is something I’m really happy about ☺).
Update (2017-02-05): Allan C. Webber presented my talk Natural script writing with Guile in the Guile devroom at FOSDEM. The talk was awesome — and recorded! Enjoy Natural script writing with Guile by "pretend Arne" ☺

You can get the presentation (pdf, 16 slides) and its source (org).
Have fun with wisp syntax!
Update (2016-07-12): wisp v0.9.1 released with a fix for multiline strings and many additional examples. For more info, see the NEWS file. To test it, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.1.tar.gz;
tar xf wisp-0.9.1.tar.gz ; cd wisp-0.9.1/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2016-01-30): I presented Wisp in the Guile devroom at FOSDEM. The reception was unexpectedly positive — given some of the backlash the readable project got I expected an exceptionally sceptical audience, but people rather asked about ways to put Wisp to good use, for example in templates, whether it works in the REPL (yes, it does) and whether it could help people start into Scheme. The atmosphere in the Guile devroom was very constructive and friendly during all talks, and I’m happy I could meet the Hackers there in person. I’m definitely taking good memories with me. Sadly the video did not make it, but the schedule-page includes the presentation (pdf, 10 slides) and its source (org).
Have fun with wisp syntax!
Update (2016-01-04): Wisp is available in GNU Guix! Thanks to the package from Christopher Webber you can try Wisp easily on top of any distribution:
guix package -i guile guile-wisp
guile --language=wisp
This already gives you Wisp at the REPL (take care to follow all instructions for installing Guix on top of another distro, especially the locales).
Have fun with wisp syntax!
Update (2015-10-01): wisp v0.9.0 released which no longer depends on Python for bootstrapping releases (but ./configure still asks for it — a fix for another day). And thanks to Christopher Webber there is now a patch to install wisp within GNU Guix. For more info, see the NEWS file. To test it, install Guile 2.0.11 or later and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.9.0.tar.gz;
tar xf wisp-0.9.0.tar.gz ; cd wisp-0.9.0/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2015-09-12): wisp v0.8.6 released with fixed macros in interpreted code, chunking by top-level forms, : . parsed as nothing, ending chunks with a trailing period, updated example evolve and added examples newbase60, cli, cholesky decomposition, closure and hoist in loop. For more info, see the NEWS file. To test it, install Guile 2.0.x or 2.2.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.6.tar.gz;
tar xf wisp-0.8.6.tar.gz ; cd wisp-0.8.6/;
./configure; make check;
examples/newbase60.w 123
If it prints 23 (123 in NewBase60), your wisp is fully operational.
That’s it - have fun with wisp syntax! And a happy time together for the ones who merge their paths today ☺
Update (2015-04-10): wisp v0.8.3 released with line information in backtraces. For more info, see the NEWS file. To test it, install Guile 2.0.x or 2.2.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.3.tar.gz;
tar xf wisp-0.8.3.tar.gz ; cd wisp-0.8.3/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2015-03-18): wisp v0.8.2 released with reader bugfixes, new examples and an updated draft for SRFI 119 (wisp). For more info, see the NEWS file. To test it, install Guile 2.0.x or 2.2.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.2.tar.gz;
tar xf wisp-0.8.2.tar.gz ; cd wisp-0.8.2/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2015-02-03): The wisp SRFI just got into draft state: SRFI-119 — on its way to an official Scheme Request For Implementation!
Update (2014-11-19): wisp v0.8.1 released with reader bugfixes. To test it, install Guile 2.0.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.1.tar.gz;
tar xf wisp-0.8.1.tar.gz ; cd wisp-0.8.1/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2014-11-06): wisp v0.8.0 released! The new parser now passes the testsuite and wisp files can be executed directly. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.0.tar.gz;
tar xf wisp-0.8.0.tar.gz ; cd wisp-0.8.0/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w;
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
On a personal note: It’s mindboggling that I could get this far! This is actually a fully bootstrapped indentation sensitive programming language with all the power of Scheme underneath, and it’s a one-person when-my-wife-and-children-sleep sideproject. The extensibility of Guile is awesome!
Update (2014-10-17): wisp v0.6.6 has a new implementation of the parser which now uses the scheme read function. `wisp-scheme.w` parses directly to a scheme syntax-tree instead of a scheme file to be more suitable to an SRFI. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.6.6.tar.gz;
tar xf wisp-0.6.6.tar.gz; cd wisp-0.6.6;
./configure; make;
guile -L . --language=wisp
That’s it - have fun with wisp syntax at the REPL!
Caveat: It does not support the ' prefix yet (syntax point 4).
Update (2014-01-04): Resolved the name-clash together with Steve Purcell and Kris Jenkins: the javascript wisp-mode was renamed to wispjs-mode and wisp.el is called wisp-mode 0.1.5 again. It provides syntax highlighting for Emacs and minimal indentation support via tab. You can install it with `M-x package-install wisp-mode`
Update (2014-01-03): wisp-mode.el was renamed to wisp 0.1.4 to avoid a name clash with wisp-mode for the javascript-based wisp.
Update (2013-09-13): Wisp now has a REPL! Thanks go to GNU Guile and especially Mark Weaver, who guided me through the process (along with nalaginrut who answered my first clueless questions…).
To test the REPL, get the current code snapshot, unpack it, run ./bootstrap.sh, start guile with $ guile -L . (requires guile 2.x) and enter ,language wisp.
Example usage:
display "Hello World!\n"
then hit enter thrice.
Voilà, you have wisp at the REPL!
Caveat: the wisp-parser is still experimental and contains known bugs. Use it for testing, but please do not rely on it for important stuff yet.
Update (2013-09-10): wisp-guile.w can now parse itself! Bootstrapping: The magical feeling of seeing a language (dialect) grow up to live by itself: python3 wisp.py wisp-guile.w > 1 && guile 1 wisp-guile.w > 2 && guile 2 wisp-guile.w > 3 && diff 2 3. Starting today, wisp is implemented in wisp.
Update (2013-08-08): Wisp 0.3.1 released (Changelog).

2 What is wisp?

Wisp is a simple preprocessor which turns indentation sensitive syntax into Lisp syntax.

The basic goal is to create the simplest possible indentation based syntax which is able to express all possibilities of Lisp.

Basically it works by inferring the parentheses of lisp by reading the indentation of lines.
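To make that concrete, here is a toy sketch in Python of just this core idea (the nesting rules only; the real wisp preprocessor additionally handles dots, colons, quoting, strings and comments, so treat this purely as an illustration):

```python
def wisp_to_sexp(text: str) -> str:
    """Toy indentation-to-parentheses transformer (illustration only).

    Each line becomes a bracketed call; a more indented line opens a
    new bracket inside the previous one; a less- or equally-indented
    line closes the brackets of all deeper or equal levels."""
    pieces = []
    stack = []  # indentation levels of currently open brackets
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        while stack and stack[-1] >= indent:
            stack.pop()
            pieces[-1] += ")"
        pieces.append("(" + body)
        stack.append(indent)
    while stack:
        stack.pop()
        pieces[-1] += ")"
    return " ".join(pieces)

print(wisp_to_sexp('display "Hello World!"'))
# → (display "Hello World!")
```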

It is related to SRFI-49 and the readable Lisp S-expressions Project (and actually inspired by the latter), but it tries to Keep it Simple and Stupid: wisp is a simple preprocessor which can be called by any lisp implementation to add support for indentation sensitive syntax. To repeat the initial quote:

I love the syntax of Python, but crave the simplicity and power of Lisp.

With wisp I hope to make it possible to create lisp code which is easily readable for non-programmers (and me!) and at the same time keeps the simplicity and power of Lisp.

Its main technical improvement over SRFI-49 and Project Readable is using lines prefixed by a dot (". ") to mark the continuation of the parameters of a function after intermediate function calls.

The dot-syntax means, instead of marking every function call, it marks every line which does not begin with a function call - which is the much less common case in lisp-code.

See the Updates for information how to get the current version of wisp.

Frequently asked Questions

Can this represent any Scheme code?

Yes. Wisp enables you to write arbitrary code structures using indentation. When you write code in wisp and run it with Guile, it is full Scheme code with all its capabilities.

How do Macros work with wisp?

Just like they work in Scheme code that has parentheses: Write the same structure as with Scheme but use indentation for structure instead of parentheses where that is more readable to you or your future readers. See for example the macro-writing-macro Enter in Enter three witches.

3 Wisp syntax rules

  1. A line without indentation is a function call, just as if it started with a bracket.
    display "Hello World!"      ↦      (display "Hello World!")

  2. A line which is more indented than the previous line is a sibling to that line: It opens a new bracket.
    display                              ↦    (display
      string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))

  3. A line which is not more indented than the previous line(s) closes the brackets of all previous lines which have higher or equal indentation. You should only reduce the indentation to indentation levels which were already used by parent lines, else the behaviour is undefined.
    display                              ↦    (display
      string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))
    display "Hello Again!"               ↦    (display "Hello Again!")

  4. To add any of ' , or ` to a bracket, just prefix the line with any combination of "' ", ", " or "` " (symbol followed by one space).
    ' "Hello World!"      ↦      '("Hello World!")

  5. A line whose first non-whitespace characters are a dot followed by a space (". ") does not open a new bracket: it is treated as simple continuation of the first less indented previous line. In the first line this means that this line does not start with a bracket and does not end with a bracket, just as if you had directly written it in lisp without the leading ". ".
    string-append "Hello"        ↦    (string-append "Hello"
      string-append " " "World"  ↦      (string-append " " "World")
      . "!"                      ↦      "!")

  6. A line which contains only whitespace and a colon (":") defines an indentation level at the indentation of the colon. It opens a bracket which gets closed by the next less- or equally-indented line. If you need to use a colon by itself, you can escape it as "\:".
    let                       ↦    (let
      :                       ↦      ((msg "Hello World!"))
        msg "Hello World!"    ↦
      display msg             ↦      (display msg))

  7. A colon surrounded by whitespace (" : ") in a non-empty line starts a bracket which gets closed at the end of the line.
    define : hello who                    ↦    (define (hello who)
      display                             ↦      (display 
        string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))

  8. You can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.
    define : hello who                    ↦    (define (hello who)
    _ display                             ↦      (display 
    ___ string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))


To make that easier to understand, let’s just look at the examples in more detail:

3.1 A simple top-level function call

display "Hello World!"      ↦      (display "Hello World!")

This one is easy: Just add a bracket before and after the content.

3.2 Multiple function calls

display "Hello World!"      ↦      (display "Hello World!")
display "Hello Again!"      ↦      (display "Hello Again!")

Multiple lines with the same indentation are separate function calls (except if one of them starts with ". ", see Continue arguments, shown in a few lines).

3.3 Nested function calls

display                              ↦    (display
  string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))

If a line is more indented than a previous line, it is a sibling to the previous function: the brackets of the previous function get closed after the (last) sibling line.

3.4 Continue function arguments

By using a dot followed by a space as the first non-whitespace characters on a line, you can mark it as a continuation of the previous less-indented line. Then it is not a function call but continues the list of parameters of the function.

I use a very synthetic example here to avoid introducing additional unrelated concepts.

string-append "Hello"        ↦    (string-append "Hello"
  string-append " " "World"  ↦      (string-append " " "World")
  . "!"                      ↦      "!")

As you can see, the final "!" is not treated as a function call but as parameter to the first string-append.

This syntax extends the notion of the dot as identity function. In many lisp implementations1 we already have `(= a (. a))`.

= a        ↦    (= a
  . a      ↦      (. a))

With wisp, we extend that equality to `(= '(a b c) '((. a b c)))`.

. a b c    ↦    a b c
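A toy Python sketch (again only an illustration, not the actual wisp implementation) shows how little the dot rule adds on top of pure indentation handling: a line starting with ". " simply skips the opening bracket.

```python
def wisp_to_sexp(text: str) -> str:
    """Toy transformer: nesting by indentation, plus the leading-dot rule.

    A line whose first non-whitespace characters are ". " continues the
    argument list of its parent instead of opening a new bracket."""
    pieces = []
    stack = []  # indentation levels of currently open brackets
    for line in text.splitlines():
        if not line.strip():
            continue
        body = line.lstrip(" ")
        indent = len(line) - len(body)
        while stack and stack[-1] >= indent:
            stack.pop()
            pieces[-1] += ")"
        if body.startswith(". "):
            pieces.append(body[2:])  # continuation: no bracket opened
        else:
            pieces.append("(" + body)
            stack.append(indent)
    while stack:
        stack.pop()
        pieces[-1] += ")"
    return " ".join(pieces)

print(wisp_to_sexp('string-append "Hello"\n  string-append " " "World"\n  . "!"'))
# → (string-append "Hello" (string-append " " "World") "!")
```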

3.5 Double brackets (let-notation)

If you use `let`, you often need double brackets. Since using pure indentation in empty lines would be really error-prone, we need a way to mark a line as indentation level.

To add multiple brackets, we use a colon to mark an intermediate line as additional indentation level.

let                       ↦    (let
  :                       ↦      ((msg "Hello World!"))
    msg "Hello World!"    ↦
  display msg             ↦      (display msg))

3.6 One-line function calls inline

Since we already use the colon as a syntax element, we can allow it everywhere to open a bracket, even within a line containing other code. Since wide unicode characters would make it hard to find the indentation of that colon, such an inline function call always ends at the end of the line: the bracket opened by an inline colon always gets closed at the end of the line.

define : hello who                            ↦    (define (hello who)
  display : string-append "Hello " who "!"    ↦      (display (string-append "Hello " who "!")))

This also allows using inline-let:

let                       ↦    (let
  : msg "Hello World!"    ↦      ((msg "Hello World!"))
  display msg             ↦      (display msg))

and can be stacked for more compact code:

let : : msg "Hello World!"     ↦    (let ((msg "Hello World!"))
  display msg                  ↦      (display msg))
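The inline colon is easy to emulate in a toy preprocessor, since every " : " just opens one bracket that closes at the end of the line. A hedged Python sketch (illustration only; the real wisp reader also respects strings and comments, which this naive split ignores):

```python
def inline_colons(body: str) -> str:
    """Toy sketch: each ' : ' opens a bracket closed at end of line.
    Naive - it would also split inside string literals."""
    parts = body.split(" : ")
    opened = len(parts) - 1  # one bracket per inline colon
    return parts[0] + "".join(" (" + part for part in parts[1:]) + ")" * opened

print(inline_colons("define : hello who"))  # → define (hello who)
```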

3.7 Visible indentation

To make the indentation visible in non-whitespace-preserving environments like badly written html, you can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.

define : hello who                    ↦    (define (hello who)
_ display                             ↦      (display 
___ string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))

4 Syntax justification

I do not like adding unnecessary syntax elements to lisp. So I want to show explicitly why each syntax element is required to meet the goal of wisp: indentation-based lisp with a simple preprocessor.

4.1 . (the dot)

We have to be able to continue the arguments of a function after a call to a function, and we must be able to split the arguments over multiple lines. That’s what the leading dot allows. Also the dot at the beginning of the line as marker of the continuation of a variable list is a generalization of using the dot as identity function - which is an implementation detail in many lisps.

`(. a)` is just `a`.

So for the single variable case, this would not even need additional parsing: wisp could just parse ". a" to "(. a)" and produce the correct result in most lisps. But forcing programmers to always use separate lines for each parameter would be very inconvenient, so the definition of the dot at the beginning of the line is extended to mean “take every element in this line as parameter to the parent function”.

Essentially this dot-rule means that we mark variables at the beginning of lines instead of marking function calls, since in Lisp variables at the beginning of a line are much rarer than in other programming languages. In Lisp, assigning a value to a variable is a function call while it is a syntax element in many other languages. What would be a variable at the beginning of a line in other languages is a function call in Lisp.

(Optimize for the common case, not for the rare case)

4.2 : (the colon)

For double brackets and for some other cases we must have a way to mark indentation levels without any code. I chose the colon, because it is the most common non-alpha-numeric character in normal prose which is not already reserved as syntax by lisp when it is surrounded by whitespace, and because it already gets used for marking keyword arguments to functions in Emacs Lisp, so it does not add completely alien characters.

The function call via inline " : " is a limited generalization of using the colon to mark an indentation level: If we add a syntax-element, we should use it as widely as possible to justify the added syntax overhead.

But if you need to use : as variable or function name, you can still do that by escaping it with a backslash (example: "\:"), so this does not forbid using the character.

4.3 _ (the underscore)

For Python, whitespace-hostile HTML already presents problems with sharing code - for example in email list archives and forums. But in Python the indentation can mostly be inferred by looking at the previous line: if that ends with a colon, the next line must be more indented (there is nothing to clearly mark reduced indentation, though). In wisp we do not have this help, so we need a way to survive in that hostile environment.

The underscore is commonly used to denote a space in URLs, where spaces are inconvenient, but it is rarely used in lisp (where the dash ("-") is mostly used instead), so it seems like a natural choice.

You can still use underscores anywhere but at the beginning of the line. If you want to use it at the beginning of the line you can simply escape it by prefixing the first underscore with a backslash (example: "\___").
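This rule can be sketched in a few lines of Python (illustration only; the function name and regex are my own, not taken from the wisp sources):

```python
import re

def underscores_to_spaces(line: str) -> str:
    """Toy sketch: leading underscores followed by whitespace count as
    indentation; a backslash before the first underscore escapes them."""
    if line.startswith("\\_"):
        return line[1:]  # escaped: drop the backslash, keep the underscores
    match = re.match(r"_+(?=\s)", line)
    if match:
        # replace each leading underscore by one space
        return " " * match.end() + line[match.end():]
    return line

print(underscores_to_spaces("___ string-append"))  # → '    string-append'
```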

5 Background

A few months ago I found the readable Lisp project which aims at producing indentation based lisp, and I was thrilled. I had already done a small experiment with an indentation to lisp parser, but I was more than willing to throw out my crappy code for the well-integrated parser they had.

Fast forward half a year. It’s February 2013 and I started reading the readable list again after being out of touch for a few months because the birth of my daughter left little time for side-projects. And I was shocked to see that the readable folks had piled lots of additional syntax elements on their beautiful core model, which for me destroyed the simplicity and beauty of lisp. When language programmers add syntax using \\, $ and <>, you can be sure that it is no simple lisp anymore. To me readability does not just mean beautiful code, but rather easy to understand code with simple concepts which are used consistently. I prefer having some ugly corner cases to adding more syntax which makes the whole language more complex.

I told them about that and proposed a simpler structure which achieved almost the same as their complex structure. To my horror they proposed adding my proposal to readable, making it even more bloated (in my opinion). We discussed for a long time - the current syntax for inline-colons is a direct result of that discussion in the readable list - then Alan wrote me a nice mail, explaining that readable would keep its direction. He finished with «We hope you continue to work with or on indentation-based syntaxes for Lisp, whether sweet-expressions, your current proposal, or some other future notation you can develop.»

It took me about a month to answer him, but the thought never left my mind (@Alan: See what you did? You anchored the thought of indentation based lisp even deeper in my mind. As if I did not already have too many side-projects… :)).

By then I had finished the first version of a simple whitespace-to-lisp preprocessor.

And today I added support for reading indentation based lisp from standard input which allows actually using it as in-process preprocessor without needing temporary files, so I think it is time for a real release outside my Mercurial repository.

So: Have fun with wisp v0.2 (tarball)!

PS: Wisp is linked in the comparisons of SRFI-110.

wisp-1.0.3.tar_.gz (756.71 KB)

Live stream from the Guile devroom at FOSDEM 2017!

Update: The recording is now online at ftp.fau.de/fosdem/2017/K.4.601/naturalscriptwritingguile.vp8.webm

Here’s the stream to the Guile devroom at #FOSDEM: https://live.fosdem.org/watch/k4601

Schedule (also on the FOSDEM page):

  • 09:45–10:30: Small languages panel Christopher Webber, Ludovic Courtès, Etiene Dalcol, Justin Cormack
  • 10:30–11:00: An introduction to functional package management with GNU Guix Ricardo Wurmus
  • 11:00–11:30: User interfaces with Guile and their application John Darrington
  • 11:30–12:00: Hacking with Guile… Alex Sassmannshausen
  • 12:00–12:45: Composing system services in GuixSD Ludovic Courtès
  • 12:45–13:15: Reproducible packaging and distribution of software with GNU Guix Pjotr Prins
  • 13:15–14:00: Network freedom, live at the REPL! Christopher Webber
  • 14:00–14:30: Natural script writing with Guile Arne Babenhauserheide (sadly I had to cancel my attendance, Christopher Allan Webber will present the slides — thank you!)
  • 14:30–15:00: Mes -- Maxwell's Equations of Software Jan Nieuwenhuizen (janneke)
  • 15:00–15:30: Adding GNU/Hurd support to GNU Guix and GuixSD Manolis Ragkousis
  • 15:30–16:00: Workflow management with GNU Guix Roel Janssen
  • 16:00–16:30: Getting started with guile-wiredtiger Amirouche Boubekki (amz3)
  • 16:30–17:00: Future of Guix Christopher Webber, Ludovic Courtès, Pjotr Prins, Ricardo Wurmus

Every one of these talks sounds awesome! Here’s where we get deep.

Using Guile Scheme Wisp for low ceremony embedded languages

Update 2020: In Dryads Wake I am starting a game using the way presented here to write dialogue-focused games with minimal ceremony.

Update 2017: A matured version of the work shown here was presented at FOSDEM 2017 as Natural script writing with Guile. There is also a video of the presentation (held by Christopher Allan Webber; more info…). Happy Hacking!

Programming languages allow expressing ideas in non-ambiguous ways. Let’s do a play.

say Yes, I do!
Yes, I do!

This is a sketch of applying Wisp to a pet issue of mine: Writing the story of games with minimal syntax overhead, but still using a full-fledged programming language. My previous try was the TextRPG, using Python. It was fully usable. This experiment drafts a solution to show how much more is possible with Guile Scheme using Wisp syntax (also known as SRFI-119).

To follow the code here, you need Guile 2.0.11 on a GNU Linux system. Then you can install Wisp and start a REPL with

wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.6.tar.gz
tar xf wi*z; cd wi*/; ./c*e; make check; guile -L . --language=wisp

For finding minimal syntax, the first thing to do is to look at how such a structure would be written for humans. Let’s take the obvious and use Shakespeare: Macbeth, Act 1, Scene 1 (also it’s public domain, so we avoid all copyright issues). Note that in the original, the second and last non-empty line are shown as italic.

SCENE I. A desert place.

    Thunder and lightning. Enter three Witches

First Witch
    When shall we three meet again
    In thunder, lightning, or in rain?

Second Witch
    When the hurlyburly's done,
    When the battle's lost and won.

Third Witch
    That will be ere the set of sun.

First Witch
    Where the place?

Second Witch
    Upon the heath.

Third Witch
    There to meet with Macbeth.

First Witch
    I come, Graymalkin!

Second Witch
    Paddock calls.

Third Witch

    Fair is foul, and foul is fair:
    Hover through the fog and filthy air.


Let’s analyze this: A scene header, a scene description with a list of people, then the simple format

    something said
    and something more

For this draft, it should suffice to reproduce this format with a full fledged programming language.

This is how our code should look:

First Witch
    When shall we three meet again
    In thunder, lightning, or in rain?

As a first step, let’s see how code which simply prints this would look in plain Wisp. The simplest way would just use a multiline string:

display "First Witch
    When shall we three meet again
    In thunder, lightning, or in rain?\n"

That works, but it’s not really nice. For one thing, the program does not have any of the semantic information a human would have, so if we wanted to show the First Witch in a different color than the Second Witch, we’d already be lost. Also throwing everything in a string might work, but when we need highlighting of certain parts, it gets ugly: We actually have to do string parsing by hand.

But this is Scheme, so there’s a better way. We can go as far as writing the sentences plainly, if we add a macro which grabs the variable names for us. We can do a simple form of this in just six short lines:

define-syntax-rule : First_Witch a ...
  format #t "~A\n" 
      map : lambda (x) (string-join (map symbol->string x))
            quote : a ...
      . "\n"

This already gives us the following syntax:

First_Witch
    When shall we three meet again
    In thunder, lightning, or in rain?

which prints

When shall we three meet again
In thunder, lightning, or in rain?

Note that :, . and , are only special when they are preceded by whitespace or are the first elements on a line, so we can freely use them here.

To polish the code, we could get rid of the underscore by treating everything on the first line as part of the character (indented lines are sublists of the main list, so a recursive syntax-case macro can distinguish them easily), and we could add highlighting with comma-prefixed parens (via standard Scheme preprocessing these get transformed into (unquote (...))). Finally we could add a macro for the scene, which creates these specialized parsers for all persons.

A completed parser could then read input files like the following:

SCENE I. A desert place.

    Thunder and lightning.

    Enter : First Witch
            Second Witch
            Third Witch

First Witch
    When shall ,(emphasized we three) meet again
    In thunder, lightning, or in rain?

Second Witch
    When the hurlyburly's done,
    When the battle's lost and won.

; ...

    Fair is foul, and foul is fair:
    Hover through the fog and filthy air.


And with that the solution is sketched. I hope it was interesting for you to see how easy it is to create this!

Note also that this is not just a specialized text-parser. It provides access to all of Guile Scheme, so if you need interactivity or something like the branching story from TextRPG, scene writers can easily add it without requiring help from the core system. That’s part of the Freedom for Developers from the language implementors which is at the core of GNU Guile.

Don’t use this as data interchange format for things downloaded from the web, though: It does give access to a full Turing complete language. That’s part of its power which allows you to realize a simple syntax without having to implement all kinds of specialized features which are needed for only one or two scenes. If you want to exchange the stories, better create a restricted interchange-format which can be exported from scenes written in the general format. Use lossy serialization to protect your users.

And that’s all I wanted to say ☺

Happy Hacking!

PS: For another use of Shakespeare in programming languages, see the Shakespeare programming language. Where this article uses Wisp as a very low ceremony language to represent very high level concepts, the Shakespeare programming language takes the opposite approach by providing an extremely high-ceremony language for very low-level concepts. Thanks to ZMeson for reminding me ☺

2015-09-12-Sa-Guile-scheme-wisp-for-low-ceremony-languages.org (6.35 KB)
enter-three-witches.w (1.23 KB)

Going from Python to Guile Scheme - a natural progression

py2guile book

Python is the first language I loved. I dreamt in Python, I planned in Python, I thought I would never need anything else.

 - Free: html | pdf
 - Softcover: 14.95 €
   with pdf, epub, mobi
 - Source: download
   free licensed under GPL

I will show you why I love Python

Python is a language where I can teach a handful of APIs and cause people to learn most of the language as a whole. — Raymond Hettinger (2011-06-20)

  • Pseudocode which runs
  • One way to do it
  • Hackable
  • Batteries and Bindings
  • Scales up

Where I hit its limits

Why, I feel all thin, sort of stretched if you know what I mean: like butter that has been scraped over too much bread. — Bilbo Baggins in “The Lord of the Rings”

  • Dual Syntax: What we teach new users is no longer what we use
  • Ceremony crops in
  • Complexity is on the rise

And how I lost its shackles

You must unlearn what you have learned. — Yoda in “The Empire Strikes Back”

Guile Scheme is the official GNU extension language, used for example in GNU Cash and GNU Guix.

Accompany me on a path beyond Python

Every sufficiently complex application/language/tool will either have to use Lisp or reinvent it the hard way. — Greenspun’s Tenth Rule

As free cultural work, py2guile is licensed under the GPLv3 or later. You are free to share, change, remix and even to resell it as long as you say that it’s from me (attribution) and provide the whole corresponding source under the GPL (sharealike).

For instructions on building the ebook yourself, see the README in the source.

Happy Hacking!

— Arne Babenhauserheide

Gratis py2guile from Freenet

py2guile book

py2guile is a book I wrote about Python and Guile Scheme. It’s selling at 0.01 ฿ | 2.99 € for the ebook and 14.95 € for the printed softcover.

To fight the new German data retention laws, you can get the ebook gratis: just install Freenet, then the following links work:

Escape total surveillance and get an ebook about the official GNU extension language for free today!

Python chooses Github, therefore I’m releasing the py2guile PDF for free

py2guile book

Python is the first language I loved. I dreamt in Python, I planned in Python, I thought I would never need anything else.

  Download “Python to Guile” (pdf)

You can read more about this on the Mercurial mailing list.

 - Free: html | pdf
   preview edition

Yes, this means that with Guile I will contribute to a language developed via Git, but it won’t be using a proprietary platform.

If you like py2guile, please consider buying the book:

 - Ebook: 2.99 € | 0.01 ฿
   epub | pdf, epub, mobi
 - Softcover: 14.95 €
   with digital companion
 - Source: download
   free licensed under GPL

More information: draketo.de/py2guile

Commentary on Python and Github

Subjective popularity contest without robust data

I was curious why this happened so I read through PEP 0481. It's interesting that Git was chosen to replace Mercurial due to Git's greater popularity, yet a technical comparison was deemed as subjective. In fact, no actual comparison (of any kind) was discussed. What a shame. — Emmanuel Rosa on G+

yes. And the popularity contest wasn’t done in any robust way — they present values between 3x as popular and 18x as popular. That is a crazy margin of error — especially for a value on which to base a very disrupting decision. — my answer

No more dogfooding

Yesterday Python maintainers chose to move to GitHub and Git. Python is now developed using a C-based tool on a Ruby-based, unfree platform. And that changed my view on what’s happening in the community. Python no longer fosters its children and it even stopped dogfooding where its tools are as good as or better than other tools. I don’t think it will die. But I don’t bet on it for the future anymore. — EDIT to my answer on Quora “is Python a dying language?” which originally stated “it’s not dying, it’s maturing”.

Github invades your workflows

The PEP for github hedges somewhat by using github for code but not bug tracker. Not ideal considering BitKeeper, but a full on coup for GitHub. — Martin Owens

that’s something like basic self-defense, but my experience with projects that moved to GitHub is that GitHub soon starts to invade your workflows, changing your cooperation habits. At some point people realize that they can’t work well without GitHub anymore.

Not becoming dependent on GitHub while using it requires constant vigilance. Seeing how Python already switched to Git and GitHub because existing infrastructure wasn’t maintained does not sound like they will be able or willing to put in the work to keep independent. — my answer on G+

Foreboding since 2014

I was already pretty disappointed when I heard that Python is moving to Git. Seeing it choose the proprietary platform is an even sadder choice from my perspective. Two indicators for a breakage in the culture of the project.

For me that’s a reason to leave Python. Though it’s not like I did not get a foreboding of that. It’s why I started learning Guile Scheme in 2013 — and wrote about the experience.

I will still use Python for many practical tasks — it acquired the momentum for that, especially in science (I was a champion for Python in the institute, which is now replacing Matlab and IDL for many people here, and I will be teaching Python starting two weeks from now). I think it will stay strong for many years; a good language to start and a robust base for existing programs. But with what I learned the past years, Python is no longer where I place my bets. — slightly adjusted version of my post on the Mercurial mailing list.

Popularity without robust data instead of quality

(this is taken from a message I wrote to Brett, so I don’t have to say later that I stayed silent while Python went down. I got a nice answer, and despite the disagreement we said a friendly good bye)

Back when I saw that Python might move to git, I silently resigned and stopped caring to some degree. I have seen a few projects move to Git in the past years (and in every project problems remained even years after the switch), so when it came to cPython, the quarrel with git-fans just didn’t feel worthwhile anymore.

Seeing Python choose GitHub with the notion of “git is 3x to 18x more popular than Mercurial and free solutions aren’t better than GitHub” makes me lose my trust in the core development community, though.

PEP 481 states that it is about the quality of the tooling, but it names the popularity numbers quite prominently: python.org/dev/peps/pep-0481/

If they are not relevant, they shouldn’t be included; but they are included, so they seem to be relevant to the decision. And “the best tooling” is mostly subjective, too — which is shown in the PEP itself, which mostly talks about popularity, not quality. It even goes to great lengths about how to avoid many of the features of GitHub.

I’ve seen quite a few projects try to avoid lock-in to GitHub. None succeeded. Not even in one where two of about six active developers were deeply annoyed by GitHub. This is exactly what the scipy part of the PEP describes: lock-in due to group effects.

Finally, using hg-git is far from seamless. I use it for several projects, and when the repositories become big (as cPython’s is), the overhead of the conversion becomes a major hassle. It works, but native Mercurial would be much more efficient. When pushing takes minutes, you start to think twice about whether you’ll just do the quick fix right now. Not to forget that at some point people start to demand signing of commits git-style (not possible with hg-git; you can only sign commits Mercurial-style) as well as other gitologisms (which have an analogue in Mercurial but aren’t converted by hg-git).

Despite my disappointment, I wish you all the best. Python is a really cool language. It’s the first one I loved and will always stay dear to me, so I’m happy that you work on it — and I hope you keep it up.

So, I think this is goodbye. A bit melancholic, but that’s how that goes.

Good luck to you in your endeavors,
Arne Babenhauserheide

Enough negativity

And that’s enough negativity from me.

Thank you, Brett, for reminding me that even though we might disagree, it’s important to remember that people in the project are hit much harder by negativity than the writer feels while writing.

For my readers: If that also happened to you one time or the other, please read his article:

How I stay happy making open source software

Thank you, Brett. Despite everything I wrote here, I still think that Python is a great project, and it got many things right — some of which are things which are at least as important as code but much less visible, like having a large, friendly community.

I’m happy that Python exists, and I hope that it keeps going. And where I use programming to make a living, I’m glad when I can do it in Python. Despite all my criticism, I consider Python the best choice for many tasks, and this is also written in py2guile: almost the whole first half of the book talks about the strengths of Python. Essentially, I could not criticize Python as strongly as I do here if I did not like it so much. Keep that in mind when you think about what you read.

Brett has now also published an article in which he details his decision to move to GitHub. It is a good read: The history behind the decision to move Python to GitHub — Or, why it took over a year for me to make a decision

For me, Gentoo is about *convenient* choice

It's often said that Gentoo is all about choice, but that doesn't quite fit what it is for me.

After all, the highest ability to choose is Linux From Scratch, and I can have any amount of choice in every distribution by just going deep enough (and investing enough time).

What really distinguishes Gentoo for me is that it makes it convenient to choose.

Since we all have a limited time budget, many of us only have real freedom to choose because we use Gentoo, which makes it possible to choose with the distribution tools. Calling it just “choice” therefore doesn't ring true in general - it misses the reason why we can choose.

So what Gentoo gives me is not just choice, but convenient choice.

Some examples to illustrate the point:

KDE 4 without qt3

I recently rebuilt my system after deciding to switch my disk layout (away from reiserfs towards a simple ext3 with reiser4 for the portage tree). When doing so I decided to try a "pure" KDE 4 - that is, a KDE 4 without any remains of KDE 3 or qt3.

To use KDE without any qt3 applications, I just had to put "-qt3" and "-qt3support" into my USE flags in /etc/make.conf, run "emerge -uDN world", and solve any conflicts that arose.
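In make.conf terms this is just a one-line change (a sketch; the two flag names are the ones named above):

```shell
# /etc/make.conf (sketch): the two negative flags join your existing USE flags
USE="-qt3 -qt3support"
```

After editing, "emerge -uDN world" rebuilds every installed package whose USE flags changed.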

Imagine doing the same with a (K)Ubuntu...

Emacs support

Similarly, to enable emacs support on my GentooXO (for all programs which can have emacs support), I just had to add the "emacs" USE flag and "emerge -uDN world".

Selecting which licenses to use

Just add

ACCEPT_LICENSE="-* @FSF-APPROVED"

to your /etc/make.conf to make sure you only get software under licenses which are approved by the FSF.

For only free licenses (regardless of the approved state) you can use:

ACCEPT_LICENSE="-* @FREE"
All others get marked as masked by license. The default (no ACCEPT_LICENSE in /etc/make.conf) is “* -@EULA”: everything except software whose license requires accepting an EULA. You can check your setting via emerge --info | grep ACCEPT_LICENSE. More information…

One program (suite) in testing, but the main system rock stable

Another part where choosing is made convenient in Gentoo are testing and unstable programs.

I remember my pain with a Kubuntu, where I wanted to use the most recent version of Amarok. I either had to add a dedicated Amarok-only testing repository (which I'd need for every single testing program), or I had to switch my whole system into testing. I did the latter and my graphical package manager ceased to work. Just imagine how quickly I ran back to Gentoo.

And then have a look at the ease of deciding to take one package into testing in Gentoo:

  • emerge --autounmask-write =category/package-version
  • etc-update
  • emerge =category/package-version

EDIT: Once I had a note here “It would be nice to be able to just add the missing dependencies with one call”. This is now possible with --autounmask-write.

And for some special parts (like KDE 4) I can easily say something like

  • ln -s /usr/portage/local/layman/kde-testing/Documentation/package.keywords/kde-4.3.keywords /etc/portage/package.keywords/kde-4.3.keywords

(I don't have the kde-testing overlay on my GentooXO, where I write this post, so the exact command might vary slightly)

Closing remarks

So to finish this post: For me, Gentoo is not only about choice. It is about convenient choice.

And that means: Gentoo gives everybody the power to choose.

I hope you enjoy it as I do!

Automatic updates in Gentoo GNU/Linux

Update 2016: I nowadays just use emerge --sync; emerge @security

To keep my Gentoo up to date, I use daily and weekly update scripts, which also always run revdep-rebuild after the Saturday night update :)

My daily update is via pkgcore to pull in all important security updates:

pmerge @glsa

That pulls in the Gentoo Linux Security Advisories - important updates with mostly short compile time. (You need pkgcore for that: "emerge pkgcore")

Also I use two cron scripts.

Note: It might be useful to add the lafilefixer to these scripts (source).

The following is my daily update (in /etc/cron.daily/update_glsa_programs.cron )

Daily Cron

#! /bin/sh

### Update the portage tree and the glsa packages via pkgcore

# spew a status message
echo $(date) "start to update GLSA" >> /tmp/cron-update.log

# Sync only portage
pmaint sync /usr/portage

# security relevant programs
pmerge -uDN @glsa > /tmp/cron-update-pkgcore-last.log || cat \
    /tmp/cron-update-pkgcore-last.log >> /tmp/cron-update.log

# And keep everything working

# Finally update all configs which can be updated automatically
cfg-update -au

echo $(date) "finished updating GLSA" >> /tmp/cron-update.log

And here's my weekly cron - executed every Saturday night (in /etc/cron.weekly/update_installed_programs.cron ):

Weekly Cron


#! /bin/sh

### Update my computer using pkgcore,
### since that also works if some dependencies couldn't be resolved.

# Sync all overlays
pmaint sync

## First use pkgcore
# security relevant programs (with build-time dependencies (-B))
pmerge -BuD @glsa

# system, world and all the rest
pmerge -BuD @system
pmerge -BuD @world
pmerge -BuD @installed

# Then use portage for packages pkgcore misses (including overlays)
# and for *EMERGE_DEFAULT_OPTS="--keep-going"* in make.conf
emerge -uD @security
emerge -uD @system
emerge -uD @world
emerge -uD @installed

# And keep everything working
emerge @preserved-rebuild

# Finally update all configs which can be updated automatically
cfg-update -au

pkgcore vs. eix → pix (find packages in Gentoo)

For a long time it bugged me that eix uses a separate database which I need to keep up to date. No longer: with pkgcore as fast as it is today, I set up pquery to replace eix.

The result is pix:

alias pix='pquery --raw -nv --attr=keywords'

(put the above in your ~/.bashrc)

The output looks like this:

$ pix pkgcore
 * sys-apps/pkgcore
    repo: gentoo
    description: pkgcore package manager
    homepage: http://www.pkgcore.org
    keywords: ~alpha ~amd64 ~arm ~hppa ~ia64 ~ppc ~ppc64 ~s390 ~sh ~sparc ~x86

It’s still a bit slower than eix, but it operates directly on the portage tree and my overlays — and I no longer have to use eix-sync for syncing my overlays, just to make sure eix is updated.

Some other treats of pkgcore

Aside from pquery, pkgcore also offers pmerge to install packages (almost the same syntax as emerge) and pmaint for synchronizing and other maintenance stuff.

From my experience, pmerge is hellishly fast for simple installs like pmerge kde-misc/pyrad, but it sometimes breaks with world updates. In that case I just fall back on portage. Both are Python, so when you have one, adding the other is very cheap (spacewise).

Also pmerge has the nice pmerge @glsa feature: get the Gentoo Linux security updates. Due to its almost unreal speed (compared to portage), checking for security updates doesn’t hurt anymore.

$ time pmerge -p @glsa
 * Resolving...
Nothing to merge.

real    0m1.863s
user    0m1.463s
sys     0m0.100s

It differs from portage in that you explicitly call world as a set — either via a command like pmerge -aus world or via pmerge -au @world.

pmaint on the other hand is my new overlay and tree synchronizer. Just call pmaint sync to sync all, or pmaint sync /usr/portage to sync only the given overlay (in this case the portage tree).


Using pix as a replacement for eix isn’t yet perfect. You might hit some of the following:

  • pix always shows all packages in the tree and the overlays. The keywords are only valid for the highest version, though. marienz from #pkgcore on irc.freenode.net is working on fixing that.

  • If you only want to see the packages which you can install right away, just use pquery -nv. pix is intended to mimic eix as closely as possible, so I don’t have to change my habits ;) If it doesn’t fit your needs, just change the alias.

  • To search only in your installed packages, you can use pquery --vdb -nv.

  • Sometimes pquery might miss something in very broken overlay setups (like my very grown one). In that case, please report the error in the bugtracker or at #pkgcore on irc.freenode.net:

    23:27 <marienz> if they're reported on irc they're probably either fixed pretty quickly or they're forgotten
    23:27 <marienz> if they're reported in the tracker they're harder to forget but it may take longer before they're noticed

I hope my text helps you in changing your Gentoo system further towards the system which fits you best!

No, it ain’t “forever” (GNU Hurd code_swarm from 1991 to 2010)

If the video doesn’t show, you can also download it as Ogg Theora & Vorbis “.ogv” or find it on youtube.

This video shows the activity of the Hurd coders and answers some common questions about the Hurd, including “How stagnated is Hurd compared to Duke Nukem Forever?”. It is created directly from commits to Hurd repositories, processed by code_swarm.

Every shimmering dot is a change to a file. These dots align around the coder who made the change. The questions and answers are quotes from today’s IRC discussions (2010-07-13) in #hurd at irc.freenode.net.

You can clearly see the influx of developers in 2003/2004 and then again a strengthening of the development in 2008, with fewer participants but higher activity than in 2003 (though a part of that change likely comes from the switch to git, with generally more but smaller commits).

I hope you enjoyed the high-level look at the activity of the Hurd project!

PS: The last part is only the information title with music to honor Sean Wright for allowing everyone to use and adapt his Album Enchanted.

Some technical advantages of the Hurd

→ An answer to just accept it, truth hurds, where Flameeyes told his reasons for not liking the Hurd and asked for technical advantages (and claimed that the Hurd does not offer a concept which got incorporated into other free software, contributing to other projects). Note: These are the points I see. Very likely there are more technical advantages which I don’t see well enough to explain them.

The translator system in the Hurd is a simple concept that makes many tasks easy, which are complex with Linux (like init, network transparency, new filesystems, …). Additionally there are capabilities (give programs only the access they need - adjusted at runtime), subhurds and (academic) memory management.

Information for potential testers: The Hurd is already usable, but it is not yet in production state. It progressed a lot during the recent years, though. Have a look at the status report if you want to see if it’s already interesting for you. See running the Hurd for testing it yourself.


Influence on other systems: FUSE in Linux and limited translators in NetBSD

First off: FUSE is essentially an implementation of parts of the translator system (the main building block of the Hurd) for Linux, and NetBSD recently got a port of the Hurd’s translator system. That’s the main contribution to other projects that I see.

translator-based filesystem

On the bare technical side, the translator-based filesystem stands out: the filesystem allows making arbitrary programs responsible for displaying a given node (which can also be a directory tree) and starting these programs on demand. To make them persistent over reboots, you only need to add them to the filesystem node (for which you need the right to change that node). You can also start translators on any node without changing the node itself, but then they are not persistent and only affect your view of the filesystem, without affecting other users. These translators are called active, and you don’t need write permissions on a node to add them.
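As a sketch of the two variants (runnable only on a Hurd system; the hostmux/ftpfs setup is the one this text uses elsewhere):

```shell
# Passive translator: recorded in the node itself, so it persists across
# reboots and is started on demand; needs the right to change the node.
settrans ftp\: /hurd/hostmux /hurd/ftpfs /

# Active translator (-a): started immediately, affects only the running
# system and your view of it; no write permission on the node required.
settrans -a ftp\: /hurd/hostmux /hurd/ftpfs /
```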

network transparency on the filesystem level

The filesystem implements things like Gnome VFS (gvfs) and KDE network transparency at the filesystem level, so they are available to all programs. And you can add a new filesystem as a simple user, just as if you’d write into a file “instead of this node, show the filesystem you get by interpreting file X with filesystem Y” (this is what you actually do when setting a translator without starting it yet (a passive translator)).

One practical advantage of this is that the following works:

settrans -a ftp\: /hurd/hostmux /hurd/ftpfs /
dpkg -i ftp://ftp.gnu.org/path/to/*.deb

This installs all deb-packages in the folder path/to on the FTP server. The shell sees normal directories (beginning with the directory “ftp:”), so shell expressions just work.

You could even define a Gentoo mirror translator (settrans mirror\: /hurd/gentoo-mirror), so every program could just access mirror://gentoo/portage-2.2.0_alpha31.tar.bz2 and get the data from a mirror automatically: wget mirror://gentoo/portage-2.2.0_alpha31.tar.bz2

unionmount as user

Or you could add a unionmount translator to root which makes writes happen at another place. Every user is able to make a readonly system readwrite by just specifying where the writes should go. But the writes only affect his view of the filesystem.

persistent translators, started when needed

Starting a network process is done by a translator, too: The first time something accesses the network card, the network translator starts up and actually provides the device. This replaces most initscripts in the Hurd: Just add a translator to a node, and the service will persist over restarts.

It’s a surprisingly simple concept, which reduces the complexity of many basic tasks needed for desktop systems.

And at its most basic level, Hurd is a set of protocols for messages which allow using the filesystem to coordinate and connect processes (along with helper libraries to make that easy).

add permissions at runtime (capabilities)

Also it adds POSIX compatibility to Mach while still providing access to the capabilities-based access rights underneath, if you need them: You can give a process permissions at runtime and take them away at will. For example you can start all programs without permission to use the network (or write to any file) and add the permissions when you need them.

Different from Linux, you do not need to start privileged and drop the permissions you do not need (governed by the program which is run); instead you start as an unprivileged process and add the permissions you need (governed by an external process):

groups # → root
addauth -p $(ps -L) -g mail
groups # → root mail 

lightweight virtualization

And then there are subhurds (essentially lightweight virtualization which allows cutting off processes from other processes without the overhead of creating a virtual machine for each process). But that’s an entire post of its own…

Easy to test lowlevel hacking

And the fact that a translator is just a simple standalone program means that these can be shared and tested much more easily, opening up completely new options for lowlevel hacking, because it massively lowers the barrier to entry.

For example the current Hurd can use the Linux network device drivers and run them in userspace (via DDE), so you can simply restart them and a crashing driver won’t bring down your system.

subdividing memory management

And then there is the possibility of subdividing memory management and using different microkernels (by porting the Hurd layer, as partly done in the NetBSD port), but that is purely academic right now (search for Viengoos to see what it’s about).


So in short:

The translator system in the Hurd is a simple concept that makes many tasks easy, which are complex with Linux (like init, network transparency, new filesystems, …). Additionally there are capabilities (give programs only the access they need - adjusted at runtime), subhurds and (academic) memory management.

Best wishes,

PS: I decided to read flameeyes’ post as “please give me technical reasons to dispel my emotional impression”.

PPS: If you liked this post, it would be cool if you’d flattr it: Flattr this

PPPS: Additional information can be found in Gaël Le Mignot’s talk notes, in niches for the Hurd and the GNU Hurd documentation pages.

P4S: This post is also available in the Hurd Staging Wiki.

P5S: As an update in 2015: A pretty interesting development I saw in the past few years is that the systemd developers have been bolting features onto Linux which the Hurd already provided 15 years ago. Examples: socket activation provides on-demand startup like passive translators, but as a crude hack piggybacked on dbus which can only be used by dbus-aware programs, while passive translators can be used by any program which can access the filesystem; calling privileged programs via systemd provides jailed privilege escalation like adding capabilities at runtime, but again as a crude hack piggybacked on dbus and specialized services.

That means there is a need for the features of the Hurd, but instead of just using the Hurd, where they are cleanly integrated, these features are bolted onto a system where they do not fit and suffer from bad performance, because they require lots of unnecessary cruft to circumvent limitations of the base system. The clean solution would be to just set 2-3 full-time developers onto the task of resolving the last few blockers (mainly sound and USB) and then simply use the Hurd.

(A)GPL as hack on a Python-powered copyright system

AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

All the GPL licenses are a hack on copyright. They insert a piece of legal code into copyright law to force it to turn around on itself.

You run that on the copyright system, and it gives you code which can’t be made unfree.

To be able to do that, it has to be written in copyright language (else it could not be interpreted).

my_code = "<your code>"

def AGPL ( code ):
    """
    >>> is_free ( AGPL ( code ) )
    """
    return eval (
        transform_to_free ( code ) )

copyright ( AGPL ( my_code ) )

You pass “AGPL ( code )” to the copyright system, and it ensures the freedom of the code.

The transformation means that I am allowed to change your code, as long as I keep the transformation, because copyright law sees only the version transformed by AGPL, and that stays valid.

Naturally both the AGPL definition and the code transformed to free © must be ©-compatible. And that means: all rights reserved. Else I could go in and say: I just redefine AGPL and make your code unfree without ever touching the code itself (which is initially owned by you by the laws of ©):

def AGPL ( code ):
    """
    >>> is_free ( AGPL ( code ) )
    """
    return eval (
        transform_to_mine ( code ) )

In this Python-powered copyright system, I could just define this after your definition but before your call to copyright(), and all calls to AGPL ( code ) would suddenly return code owned by me.

Or you would have to include another way of defining which exact AGPL you mean. Something like “AGPL, but only the versions with the sha1 hashes AAAA BBBB and AABA”. cc tries to use links for that, but what do you do if someone changes the DNS resolution to point creativecommons.org to allmine.com? Whose DNS server is right, then - legally speaking?

In short: AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

Are there 10x programmers?

→ An answer I wrote to this question on Quora.

Software Engineering: What is the truth of 10x programmers?
Do they really exist?…

Let’s answer the other way round: I once had to take heavy anti-histamines for three weeks. My mind was horribly hazy from them, and I felt awake only about two hours per day. However, I spent every day working on a matrix multiplication problem.

It was three weeks of failure, because I just could not grasp the problem. I was unable to hold it in my mind.

Then I could finally drop the anti-histamine.

On the first day I solved the problem on my way to buy groceries. On the second day I planned the implementation while walking for two hours. On the third day I finished the program.

This taught me to accept it when people don’t manage to understand things I understand: I know that the brain can actually have different paces and that complexity which feels easy to me might feel infeasible for others. It sure did feel that way to me while I took the anti-histamines.

It also taught me to be humble: There might be people to whom my current state of mind feels like taking anti-histamines felt to me. I won’t be able to even grasp the patterns they see, because they can use another level of complexity.

To get a grasp of the impact, I ask myself a question: How would an alien solve problems if it could easily keep 100 things in its mind — instead of the 4 to 7 which are the rough limit for humans?

BY-SA and GPL: creativecommons closed the chasm in the sharealike/copyleft community

This is the biggest news item for free culture and free software in the past 5 years: The creativecommons attribution sharealike license is now one-way compatible to the GPL — see the message from creativecommons and from the Free Software Foundation.

Some license compatibility legalese might sound small, but the impact of this is hard to overestimate.

(I’ll now revise some of my texts about licensing — CC BY-SA got a major boost in utility because it no longer excludes usage in copyleft documents which need the source to have a defended sharealike clause)

Communicating your project: honest marketing for free software projects

You have an awesome project, but you see people reach for inferior tools? There are people using your project, but you can’t reach the ones you care about? Read on for a way to ensure that your communication doesn’t ruin your prospects but instead helps your project to shine.

Communicating your project is an essential step for getting the users you want. Here I summarize my experience from working on several different projects including KDE (where I learned the basics of PR - yay, sebas!), the Hurd (where I could really make a difference by improving the frontpage and writing the Month of the Hurd), Mercurial (where I practiced minimally invasive PR) and 1d6 (my own free RPG where I see how much harder it is to do PR, if the project to communicate is your own).

Since voicing the claim that marketing is important often leads to discussions with people who hate marketing of any kind, I added an appendix with an example which illustrates nicely what happens when you don’t do any PR - and what happens if you do PR of the wrong kind.

If you’re pressed for time and want the really short form, just jump to the questionnaire.

What is good marketing?

Before we jump directly to the guide, there is an important term to define: good marketing. That is the kind of marketing we want to do.

The definition I use here is this:

Good marketing ensures that the people to whom a project would be useful learn about the project.


Good marketing starts with the existing strengths of a project and finds people to whom these strengths are useful.

Thus good marketing does not try to interfere with the greater plan of the project, though it might identify some points where a little effort can make the project much more interesting to users. Instead it finds users to whom the project as it is can be useful - and ensures that they know about the project.

Be fair to competitors, be honest to users, put the project goals before generic marketing considerations.

As such, good marketing is an interface between the project and its (potential) users.

How to communicate your project?

This guide depends on one condition: Your project already has at least one area in which it excels over other projects. If that isn’t the case, please start by making your project useful to at least some people.

The basic way for communicating your project to its potential users always follows the same steps.

To make this text easier to follow, I’ll intersperse it with examples from the latest project where I did this analysis: GNU Guile: The GNU Ubiquitous Intelligent Language for Extensions. Guile provides a nice example, because its mission is clearly established in its name and it has lots of backing, but up until our discussion it actually had a Wikipedia page which was unappealing to the point of being hostile towards Guile itself.

To improve the communication of our project, we first identify our target groups.

Who are our Target Groups?

To do so, we begin by asking ourselves who would profit from our project:

  • What can we do well and how do we compare to others?
  • To whom would we already be useful or interesting if people knew about our strengths?
  • To whom are we already the best option?

Try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to grow in the short term.

In the next step, we ask ourselves whom we want or need as users to fulfill our mission (our long-term goal):

  • Where do we want to get? What is our goal? (do we have a mission statement?)
  • Whom do we need to get there?
  • Whom do we want as users? Those shape us as they take part in the development - either as users or as fellow developers.

Again try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to achieve our long-term goal. If while writing this down you find that one of the already identified groups we could reach would actually pull us away from our goal, mark them. If they aren’t direly needed, we would do best to avoid targeting them in our communication, because they will hinder our long-term progress: They could become a liability which we cannot get rid of again.

Now we have about 6 target groups: Those are the people who should know about our project, either because they would benefit from it for pursuing their goals, or because we need to reach them to achieve our own goals. We now need to find out which kind of information they actually need or search for.

Example: Target Groups for Guile

GNU Guile is called The GNU Ubiquitous Intelligent Language for Extensions. So its mission is clear: Guile wants to become the de-facto standard language for extending programs - at least within the GNU project.

For whom are we already useful or interesting? Name them as Target-Groups.
  1. Schemer: Wants to see what GNU Scheme can do.
  2. Extender: GNU enthusiast wants to extend an existing program with a scripting language.
  3. Learner: Free Software enthusiast thinks about using Guile to learn programming
  4. Project-Starter: Experienced Programmer wants to start a new project.
  5. 1337: Programmer wants the coolness-factor.
  6. Emacser: Emacs users want to see what the potential future of Emacs would hold.
Whom do we want as users on the long run? Name them as Target-Groups.
  1. GNU-folk: All GNU developers.

What could they ask?

This part just requires putting ourselves into the role of each of the target groups. For each of the target groups, ask yourself:

What would you want to know, if you were to read about our project?

As a result of this step, we have a set of answers. Judge them on their strengths: Would these answers make you want to invest time to test our project? If not, can we find a better answer?

Example: Questions for the Target-Groups of Guile

  1. Schemer: What can guile do better than other Schemes?
  2. Extender: What does Guile offer my program? Why Guile and not Python/Lua?
  3. Learner: How easy and how powerful is Guile Scheme? Why Guile and not Python?
  4. Starter: What’s the advantage of starting my advanced project with guile?
  5. 1337: Why is guile cool?
  6. Emacser: What does Guile offer for Emacs?
  7. GNU-folk: What does Guile offer my program? (Being a GNU package is a distinct advantage, so there is less competition by non-GNU languages)

Whose wishes can we fulfill?

If our answers for a given group are not yet strong enough, we cannot yet communicate our project convincingly to them. In that case it is best to postpone reaching out to that group, otherwise they could get a lasting weak image of our project which would make it harder to reach them when we have stronger answers at some point in the future.

Remove all groups whose wishes we cannot yet fulfill, or for whom we do not see ourselves as the best choice.

Example: Chosen Target-Groups

  1. Schemer: Guile is a solid implementation of Scheme. For a comparison, see An opinionated Guide to Scheme implementations.
  2. Extender: The guile manual offers a nicely detailed guide for extending a program with Guile. We’re a bit weak on the examples and existing extensions, though, especially on non-GNU platforms.
  3. Learner: There aren’t yet tutorials for learning to program in Guile, though there are tutorials for learning to write Scheme - and even one for understanding Scheme from the view of a Python user. But our project resources cannot yet adequately support people who cannot program at all, so we have to restrict ourselves to programmers who want to learn a new language.
  4. Starter: Guile has solid support for many unix-specific things, but it is not yet a complete project-publishing solution. So we have to restrict ourselves to targeting people who want to start a project which is mainly intended to be used in environments with proper package management (mostly GNU/Linux).
  5. 1337: Guile is explicitly named in the GNU Coding Standards. It doesn’t get much cooler than that - at least for a certain idea of cool. We can’t get the Java 1337s, but we can get the Free Software 1337s.
  6. Emacser: Guile provides foreign-function-call. If guile gets used as base for Emacs, Emacs users get direct access to all scheme functions, too - as well as real threading. And that’s pretty strong. Also Geiser provides solid Guile Scheme support in Emacs.
  7. GNU-folk: They are either extenders or project starters or learners, but additionally they want to know in which GNU projects they can use Guile.

Provide those answers!

Now we have answers for the target groups. When we now talk or write about our project, we should keep those target groups in mind.

You can make that arbitrarily complex, for example by trying to find out which of our target groups use which medium. But let’s keep it simple:

Ensure that our website (and potentially existing wikipedia page) includes the information which matters to our target groups. Just take all the answers for all the target groups we can already reach and check whether the basic information contained in them is given on the front page of our website.

And if not, find ways to add it.

As next steps, we can make sure that the questions we found for the target groups not only get answered, but directly lead the target groups to actions: For example to start using our project.

Example: The new Wikipedia-Page of Guile

For Guile, we used this analysis to fix the Wikipedia page. The old version mainly talked about history and weaknesses (to the point of sounding hostile towards Guile), and aside from the latest release number, it was horribly outdated. And it did not provide the information our target groups required.

The current Wikipedia-Page of GNU Guile works much better - for the project as well as for the readers of the page. Just compare them directly and you’ll see quite a difference. But aside from sounding nicer, the new site also addresses the questions of our target groups. To check that, we now ask: Did we include information for all the potential user-groups?

  1. Schemers: Yepp (it’s Scheme, and there’s a section on Guile Scheme)
  2. Extenders: Yepp (libguile)
  3. Learners: Not yet. We might need a syntax section with some examples. But wikipedians do not like howto-like sections. Also the interpreter should get a mention.
  4. Project-Starters: Partly in the “core idea”-part in the section Guile Scheme. It might need one more paragraph showing advantages of Guile which make it especially suited for that.
  5. 1337s: It is the preferred extension system for the GNU Project. If you’re not that kind of 1337: The Macro-System is hygienic (no surprising side-effects).
  6. Emacs users: They got their own section.
  7. GNU-Folk: They have a section on Guile in make. We should add a list of GNU projects with Guile support.

So there you go: Not perfect, but most of the groups are covered. And this also ensures that the Wikipedia-page is more interesting to its readers: A clear win-win.

Further points

Additional points which we should keep in mind:

  • On the website, do all of our target groups quickly find their way to advanced information about their questions? This is essential to keep the ones interested who aren’t completely taken by the short answers.
  • What is a common negative misconception about our project? We need to ensure that we do not write anything which strengthens this misconception. Is there an existing strength, which we can show to counter the negative misconception?
  • Where do we want to go? Do we have a mission statement?

bab-com q: Arne Babenhauserheide’s Project Communication Questionnaire

  • For whom are we already useful or interesting? Name them as Target-Groups.

    • (1)
    • (2)
    • (3)
  • Whom do we want as users on the long run? Name them as Target-Groups.

    • (4)
    • (5)
    • (6)
  • What could the Target-Groups ask? What are their needs? Formulate them as questions.
    • (1)
    • (2)
    • (3)
    • (4)
    • (5)
    • (6)
  • Answer their questions.
    • (1)
    • (2)
    • (3)
    • (4)
    • (5)
    • (6)
  • Whose needs can we already fulfill well? For whom do we see ourselves as the best choice?
    • (1)
    • (2)
    • (3)
    • (4)
  • Ensure that our communication includes the answers to these questions (i.e. website, wikipedia page, talks, …), at least for the groups who are likely to use the medium on which we communicate!

Use bab-com to avoid bad-com ☺ - yes, I know this phrase is horrible, but it is catchy and that fits this article: you need catchy things

Note: The mission statement

The mission statement is a short paragraph in which a project defines its goal.

A good example is:

Our mission is to create a general-purpose kernel suitable for the GNU operating system, which is viable for everyday use, and gives users and programs as much control over their computing environment as possible. (GNU Hurd mission explained)

Another example again comes from Guile:

Guile was conceived by the GNU Project following the fantastic success of Emacs Lisp as an extension language within Emacs. Just as Emacs Lisp allowed complete and unanticipated applications to be written within the Emacs environment, the idea was that Guile should do the same for other GNU Project applications. This remains true today. (Guile and the GNU project)

Closely tied to the mission statement is the slogan: A catch-phrase which helps anchor the gist of your project in your readers’ minds. Guile does not have one yet, but judging from its strengths, the following could work quite well for Guile 2.0 - though it falls short of Guile in general:

GNU Guile scripting: Use Guile Scheme, reuse anything.


We saw why it is essential to communicate the project to the outside, and we discussed a simple structure to check whether our way of communication actually fits our project’s strengths and goals.

Finding the communication strategy actually boils down to 3 steps:

  • Target those who would profit from our project or whom we need.
  • Check what they need to know.
  • Answer that.

Also a clear mission statement, slogan and project description help to make the project more tangible for readers. In this context, good marketing means to ensure that the right people learn about the real strengths of the project.

With that I’ll conclude this guide. Have fun and happy hacking!
— Arne Babenhauserheide

Appendix: Why communicating your project?

In free software we often think that quality is a guarantee for success. But in just the 10 years I have been using free software, I have seen my share of technically great projects succumb to inferior projects which simply reached more people and used that to build a momentum which greatly outpaced the technically better product.

One example of that is the story of pkgcore and paludis. When portage, the package manager of Gentoo, grew too slow because it did ever more extensive tests, two teams set out to build a replacement.

One of the teams decided that the fault for the low performance lay with Python, the language used by portage. That team built a package manager in C++ which had --wonderfully-long-command-options without shortcuts (have fun typing), and you actually had to run it twice: Once to see what would get installed and then again to actually install it (while portage had had an --ask option for ages, with -a as shortcut). And it forgot all the work it had done in the previous run, so you could wait twice as long for the result. They also had wonderful latin names, and they managed the feat of being even slower than portage, despite being written in C++. So their claim that C++ would be magically faster than Python was simply wrong (because they skipped analyzing the real performance bottlenecks). They called their program paludis.

Note: Nowadays paludis has a completely new commandline interface which actually supports short command options. That interface is called cave and looks sane.

The other team did a performance analysis and realized that the low performance actually lay with the filesystem: The portage tree, which holds the required information, contains about 30,000 ebuilds and almost 200,000 files in total, and portage accessed far more of those files than actually needed for resolving the dependencies needed to install the package. They picked Python as their language - just like portage. They used almost the same commandline options as portage, except for the places where functionality differed. And they actually got orders of magnitude faster than portage - so fast that their search command often finished in less than a second, while portage took over 10 seconds. They called their program pkgcore.

Both had more exact resolution of packages and could break cyclic dependencies and so on.

So, judging from my account of the quality, which project would you expect to succeed?

I sure expected pkgcore to replace portage within a few months. But this is not what happened. And as I see it in hindsight, the difference lay purely in PR.

The paludis team with their slow and hard-to-use program went all over the Gentoo forums claiming that Python is a horrible language and that a C++ program will beat portage any time. On their website they repeated their attacks against Python and claimed superiority at every step. And they gathered quite a few zealots, while actually being slower than portage. Eventually they rebranded paludis as just better and more correct, not faster. And they created their own distribution (exherbo) as a direct rival of Gentoo, with a new, portage-incompatible package format. As if they had read the book on how not to be a friendly competitor.

The pkgcore team on the other hand focussed on good technology. They created the snakeoil library for high-performance Python code, but they were friendly about it and actually contributed back to portage where code could be shared. But their website was out of date, often not noting the newest release, and you actually had to run pmerge --help to see the most current commandline options (though you could simply guess them if you knew portage). And they got attacked by paludis zealots so much that this year the main developer finally abandoned the project: He told me on IRC that he had taken so much vitriol over the years that it simply wasn’t worth the cost anymore.

Update: About a year later someone else took over. Good code often survives the loss of its creator.

So, what can we learn from this? Technical superiority does not gain you anything, if you fail to convince people to actually use your project.

If you don't communicate your project, you don't get users. If you don’t get users, your chances of losing motivation are orders of magnitude higher than if you get users who support you.

And aggressive marketing works, even if you cannot actually deliver on your promises. Today they have a better user interface and even short option names. But even to date, exherbo has far fewer packages in its repositories than Gentoo. If the number of files is any measure, the 10,000 files in their special repositories are just about 5% of the almost 200,000 files portage holds. But they managed quite well to split the Gentoo users - at least for some time. And their repeated pushes for new standards in the portage tree (EAPIs) created a constant pressure on pkgcore to adapt, which had the effect that nowadays pkgcore cannot install from the portage tree anymore (the search still works, though, and I still use it - I will curse mightily on the day they manage to also break that).

Update: Someone else took over and now pkgcore can install again.

So aggressive marketing and doing everything in the book of unfriendly competition might have allowed the paludis devs to gather some users and destroy the momentum of pkgcore, but it did not allow them to actually become a replacement of portage within Gentoo. Their behaviour alienated far too many people for that. So aggressive and unfriendly marketing is better than no marketing, but it has severe drawbacks which you will likely want to avoid.

If you use overly aggressive, unfriendly or dishonest communication tactics, you get some users, but if your users know their stuff, you won’t win the mindshare you need to actually make a difference.

If on the other hand you want to see communication done right, just take a look at KDE and Gnome nowadays. They cooperate quite well, and they compete on features and by improving their projects, so users can make an informed choice about the project they choose.

And their number of contributors steadily keeps growing.

So what do they do? Besides being technically great, it boils down to good marketing.

Conveniently merge a NEWS file without conflicts

Writing a NEWS file (a list of changes per version, targeted at end-users) significantly reduces the effort for doing a release: To write your release notes, just copy the latest entries from the NEWS file into a message. It is one of the gems in the GNU coding standards: Simple yet extremely useful. (For a detailed realization, refer to the Perl Specification for CPAN Changes files.)
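For illustration, such a NEWS file is just a reverse-chronological list of user-visible changes per version; the project, versions and entries below are made up:

```
1.2 (2014-05-01)
- support union-merging to avoid bogus conflicts
- fix crash when the config file is missing

1.1 (2014-02-14)
- first public release
```

Releasing then amounts to copying the newest block into your release announcement.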

However, when you’re developing features in parallel, for example by using a pull-request workflow and requiring contributors to update the NEWS file, you will often run into merge conflicts. Resolving these takes time, though the resolution is trivial: Just use the lines from both heads.

To resolve the problem, you can set your version tracking system to use union-merge for NEWS files.

echo "
[merge-patterns]
# avoid bogus conflicts in NEWS files
NEWS = internal:union
" >> .hg/hgrc

(necessary for each contributor to avoid surprising users)


echo "/NEWS merge=union" >> .gitattributes
git add .gitattributes
git commit -m "union-merge NEWS" .gitattributes

(committed, so it sticks, but might mislead contributors into missing genuine conflicts, because a contributor does not necessarily know about the setting)
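To see what the union strategy actually does, you can try it outside any repository with git merge-file; the file names and entries here are made up:

```shell
# simulate two branches which each added a different entry on top of the same base
printf '0.2.0:\n' > NEWS.base
printf '0.2.0:\n- add feature A\n' > NEWS.ours
printf '0.2.0:\n- add feature B\n' > NEWS.theirs
# --union keeps the lines from both sides instead of writing conflict markers;
# -p prints the merge result to stdout instead of overwriting NEWS.ours
git merge-file --union -p NEWS.ours NEWS.base NEWS.theirs
```

Both feature lines end up in the output, which is exactly the resolution you would otherwise have typed by hand.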

Download one page from a website with all its prerequisites

Often I want to simply backup a single page from a website. Until now I always had half-working solutions, but today I found one solution using wget which works really well, and I decided to document it here. That way I won’t have to search it again, and you, dear readers, can benefit from it, too ☺

Update 2020: You can also use the copyweb-script from pyFreenet: copyweb -d TARGET_FOLDER URL
Install via pip3 install --user pyFreenet3.

In short: This is the command:

wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories --span-hosts --adjust-extension --no-check-certificate -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070802 SeaMonkey/1.1.4' [URL]

Optionally add --directory-prefix=[target-folder-name]

(see the meaning of the options and getting wget for some explanation)

That’s it! Have fun copying single sites! (but before passing them on, ensure that you have the right to do it)

Does this really work?

As a test, how about running this:

wget -np -N -k -p -nd -nH -H -E --no-check-certificate -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070802 SeaMonkey/1.1.4' --directory-prefix=download-web-site http://draketo.de/english/download-web-page-with-all-prerequisites

(this command uses the short forms of the options)

Then test the downloaded page with firefox:

firefox download-web-site/download-web-page-with-all-prerequisites.html

Getting wget

If you run GNU/Linux, you likely already have it - and if not, then your package manager has it. GNU wget is one of the standard tools available everywhere.

Some information in the (sadly) typically terse style can be found on the wget website from the GNU project: gnu.org/s/wget.

In case you run Windows, have a look at Wget for Windows from the gnuwin32 project or at GNU Wget for Windows from eternallybored.

Alternatively you can get a graphical interface via WinWGet from cybershade.

Or you can get serious about having good tools and install MSYS or Cygwin - the latter gets you some of the functionality of a unix working environment on windows, including wget.

If you run MacOSX, either get wget via fink, homebrew or MacPorts or follow the guide from osxdaily or the german guide from dirk (likely there are more guides - these two were just the first hits in google).

The meaning of the options (and why you need them):

  • --no-parent: Only get this file, not other articles higher up in the filesystem hierarchy.
  • --timestamping: Only get newer files (don’t redownload files).
  • --page-requisites: Get all files needed to display this page.
  • --convert-links: Change files to point to the local files you downloaded.
  • --no-directories: Do not create directories: Put all files into one folder.
  • --no-host-directories: Do not create separate directories per web host: Really put all files in one folder.
  • --span-hosts: Get files from any host, not just the one with which you reached the website.
  • --adjust-extension: Add a .html extension to the file.
  • --no-check-certificate: Do not check SSL certificates. This is necessary if you’re missing the certificate authority for one of the hosts. Just use it: if people with enough power to snoop on your browsing wanted to serve you a changed website, they could simply use one of the fake certificate authorities they control.
  • -e robots=off: Ignore robots.txt files which tell you to not spider and save this website. You are no robot, but wget does not know that, so you have to tell it.
  • -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070802 SeaMonkey/1.1.4': Fake being an old browser to avoid blocking based on being wget.
  • --directory-prefix=[target-folder-name]: Save the files into a subfolder to avoid having to create the folder first. Without that option, all files are created in the folder in which your shell is at the moment. Equivalent to mkdir [target-folder-name]; cd [target-folder-name]; [wget without --directory-prefix]


If you know the required options, mirroring single pages from websites with wget is fast and easy.

Note that if you want to get the whole website, you can just replace --no-parent with --mirror.

Happy Hacking!

Elegant commandline argument parsing on the shell

Parsing command line arguments on the shell is often done in an ad-hoc fashion, growing unwieldy as time goes by, but there are tools to make that elegant. Here’s a complete example.

I use this in the conf project (easy setup of autotools projects). It builds on the great solution by Adam Katz.

# outer loop to allow processing option arguments at the end
while test ! $# -eq 0; do
    # getopts loop, here you define the short options: 
    # h for -h, l: for -l <lang>. -: provides support for long-options.
    while getopts -- hl:-: arg "$@"; do
        case $arg in
            h ) ARG_HELP=true ;;
            l ) ARG_LANG="$OPTARG" ;;
            - ) LONG_OPTARG="${OPTARG#*=}"
                case "$OPTARG" in
                    help    ) ARG_HELP=true;;
                    lang=?* ) ARG_LANG="$LONG_OPTARG" ;;
                    # FIXME: using the same option twice (either both
                    # after the argument or both before it) gives the
                    # first, not the second value
                    lang*   ) ARG_LANG="${@:$OPTIND:1}" ; OPTIND=$(($OPTIND + 1));;
                    vcs=?*  ) ARG_VCS="$LONG_OPTARG" ;;
                    vcs*    ) ARG_VCS="${@:$OPTIND:1}" ; OPTIND=$(($OPTIND + 1));;
                    '' )      break ;; # "--" terminates argument
                                       # processing to allow giving
                                       # options for autogen.sh after
                                       # --
                    * )       echo "Illegal option --$OPTARG" >&2; exit 2;;
                esac ;;
            \? ) exit 2 ;;  # getopts already reported the illegal
                            # option
        esac
    done
    shift $((OPTIND-1)) # remove parsed options and args from $@ list
    OPTIND=1 # reinitialize OPTIND to allow parsing again
    # provide help output.
    if test x"${ARG_HELP}" = x"true"; then
        echo "${PROG} new [-h | --help] [-l | --lang <LANGUAGE>] [--vcs <VCS>] PROJECT_NAME"
        exit 0
    fi
    # get the argument
    if test x"${1}" = x"--"; then
        if test x"${PROJ}" = x""; then
            echo "Missing project name." >&2; exit 2
        fi
        # nothing more to parse:
        # remove -- from the remaining arguments and stop parsing
        shift 1
        break
    fi
    if test ! x"${1}" = x""; then
        PROJ="${1%/}" # without trailing slash
    fi
    # remove the argument, then continue the loop to allow putting
    # the options after the argument
    shift 1
done
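To see the pattern in isolation, here is a minimal, self-contained variant (option and variable names are just for illustration) which parses one short option, its long form, and one positional argument in any order; it assumes bash, since the -: trick is not portable to every shell:

```shell
#!/usr/bin/env bash
# parse -l <lang> / --lang=<lang> plus one positional argument, in any order
parse_args() {
    ARG_LANG=""; PROJ=""; OPTIND=1
    while test ! $# -eq 0; do
        while getopts l:-: arg "$@"; do
            case $arg in
                l ) ARG_LANG="$OPTARG" ;;
                - ) case "$OPTARG" in
                        lang=?* ) ARG_LANG="${OPTARG#*=}" ;;
                        * )       echo "Illegal option --$OPTARG" >&2; return 2 ;;
                    esac ;;
                \? ) return 2 ;;  # getopts already reported the illegal option
            esac
        done
        shift $((OPTIND-1)); OPTIND=1  # drop the parsed options, reset for the next round
        test $# -eq 0 && break
        PROJ="${1%/}"  # the positional argument, without trailing slash
        shift 1        # continue the outer loop: options may follow the argument
    done
}

parse_args --lang=scheme myproject/
echo "$ARG_LANG $PROJ"  # prints: scheme myproject
```

The outer while loop is what lets users write the options after the positional argument: each round the already-parsed options and one argument are shifted away, then getopts starts over on the rest.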

Additional explanation for this is available from Adam Katz (2015). I’m allowed to include it here, because every answer on Stackoverflow is licensed under creativecommons attribution sharealike (cc by-sa) and because cc by-sa is one-way compatible with the GPLv3.

# From Adam Katz, 2015: http://stackoverflow.com/users/519360/adam-katz
# Available at http://stackoverflow.com/a/28466267/7666
# License: cc by-sa: https://creativecommons.org/licenses/by-sa/3.0/
while getopts ab:c-: arg; do
  case $arg in
    a )  ARG_A=true ;;
    b )  ARG_B="$OPTARG" ;;
    c )  ARG_C=true ;;
    - )  LONG_OPTARG="${OPTARG#*=}"
         case $OPTARG in
           alpha    )  ARG_A=true ;;
           bravo=?* )  ARG_B="$LONG_OPTARG" ;;
           bravo*   )  echo "No arg for --$OPTARG option" >&2; exit 2 ;;
           charlie  )  ARG_C=true ;;
           alpha* | charlie* )
                       echo "No arg allowed for --$OPTARG option" >&2; exit 2 ;;
           '' )        break ;; # "--" terminates argument processing
           * )         echo "Illegal option --$OPTARG" >&2; exit 2 ;;
         esac ;;
    \? ) exit 2 ;;  # getopts already reported the illegal option
  esac
done
shift $((OPTIND-1)) # remove parsed options and args from $@ list

With this and with the practical usage at the top you should be able to implement clean commandline parsing with ease.

Happy Hacking!

GNU Guix in 5 minutes

So you get excited when you hear about surviving a power-outage during updates without a hitch and you want to give Guix a try — but woes, you only have 5 minutes of time?

Fear not, that’s enough to get it up and running — all the way to per-user environments and package installation as a non-privileged user!

The instructions here are from the official docs, specialized for a GNU Linux host and cut to what I need in a working system.

as user:

$ cd /tmp
$ wget ftp://alpha.gnu.org/gnu/guix/guix-binary-0.8.3.x86_64-linux.tar.xz

become root

$ sudo screen

unpack install and setup Guix

# tar xf guix-binary-0.8.3.x86_64-linux.tar.xz 
# mv var/guix /var/ && mv gnu /
# ln -sf /var/guix/profiles/per-user/root/guix-profile ~root/.guix-profile

Create the build users as per Build-Environment-Setup:

# groupadd --system guixbuild
# for i in `seq -w 1 10`; do
      useradd -g guixbuild -G guixbuild           \
              -d /var/empty -s `which nologin`    \
              -c "Guix build user $i" --system    \
              guixbuilder$i;
  done
Run the daemon:

# ~root/.guix-profile/bin/guix-daemon --build-users-group=guixbuild

Switch to a second root window with CTRL-a c to adjust the PATH, allow substitutes from the Hydra build server, and to install and set locales (required since we’re installing an overlay, not a full distro).

# echo 'PATH="$HOME/.guix-profile/bin:$HOME/.guix-profile/sbin:${PATH}"' >> $HOME/.bashrc
# echo 'export LOCPATH=$HOME/.guix-profile/lib/locale'  >> $HOME/.bashrc
# source $HOME/.bashrc
# guix archive --authorize < ~root/.guix-profile/share/guix/hydra.gnu.org.pub
# guix package -i glibc-utf8-locales

Allow all users to use the guix command (as long as guix-daemon is running):

# mkdir -p /usr/local/bin
# cd /usr/local/bin
# ln -s /var/guix/profiles/per-user/root/guix-profile/bin/guix

Switch back to your regular user and provide the guix profile. Also install the locales (remember that the installation is really per-user, though users share the storage for packages which they both install). The per-user profile will be generated the first time you run guix package.

$ ln -sf /var/guix/profiles/per-user/$(whoami)/guix-profile ~/.guix-profile
$ echo 'export PATH="$HOME/.guix-profile/bin:$HOME/.guix-profile/sbin:${PATH}"' >> $HOME/.bashrc
$ echo 'export LOCPATH=$HOME/.guix-profile/lib/locale'  >> $HOME/.bashrc
$ source $HOME/.bashrc
$ guix package -i glibc-utf8-locales

And now:

$ guix package -i guile-emacs --fallback
$ ~/.guix-profile/bin/emacs -Q

So you believed that to be only a pipe-dream, just like power-loss-resistant updates and functional packaging using the official GNU extension language? I was glad to be proven wrong, and I hope you’ll feel the same ☺ (though guile-emacs is still experimental, it already allows calling elisp functions directly from scheme)

Happy Hacking!

GnuPG/PGP signature, short explanation

»What is the .asc file?« This explanation is intended to be copied as-is into emails when someone asks about your signature.

The .asc file is a signature which can be used to verify that the email was really sent by me and wasn’t tampered with.[1] It can be verified with standard email security tools like Enigmail[2], Gpg4win[3] or MacGPG[4] - and other tools supporting OpenPGP[5].

Best wishes,

[1]: For further information on signatures see

[2]: Enigmail enables secure communication in Thunderbird:

[3]: GPG4win provides secure encryption for Windows:

[4]: MacGPG provides encryption for MacOSX:

[5]: Encryption for other systems is available from the GnuPG website:

Going from a simple Makefile to Autotools


I recently started looking into Autotools, to make it easier to run my code on multiple platforms.

Naturally you can use cmake or scons or waf or ninja or tup, all of which are interesting in their own right. But none of them has seen the amount of testing that went into autotools, and none of them has the tweaks needed to support about every system under the sun. And I recently found pyconfigure, which allows using autotools with python and offers detection of library features.

Warning 2016: Contains some cargo-cult-programming — my current setup is cleaner thanks to using AC_CONFIG_LINKS in configure.ac.

I had already used Makefiles for easily storing the build information of anything from python projects (python setup.py build) to my PhD thesis with all the required graphs.

I also had used scons for those same tasks.

But I wanted to test what autotools have to offer. And I found no simple guide which showed me how to migrate from a Makefile to autotools - and what I could gain through that.

So I decided to write one.

My Makefile

The starting point is the Makefile I use for building my PhD. That’s pretty generic and just uses the most basic features of make.

If you do not know it yet: A basic makefile has really simple syntax:

# comments start with #
thing : required source files # separated by spaces
    build command
    second build command
# ^ this is a TAB.

The code above is a rule. If you put a file with this content into some folder using the filename Makefile and then run make thing in that folder (in a shell), the program “make” will check whether the source files changed after it last created the thing, and if they did, it will execute the build commands.

You can use things from other rules as source file for your thing and make will figure out all the tasks needed to create your thing.
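That freshness check can be sketched in plain shell (the function and file names here are illustrative, not part of make): the `-nt` test compares modification times just like make does.

```shell
#!/bin/sh
# Sketch of make's core decision: rebuild a target only when it
# does not exist yet or when one of its sources is newer.
needs_rebuild() {
    target=$1; shift
    if [ ! -e "$target" ]; then return 0; fi    # no target yet: build it
    for src in "$@"; do
        if [ "$src" -nt "$target" ]; then return 0; fi  # a source is newer: rebuild
    done
    return 1                                    # target is up to date
}
```

make runs this check for every rule, and recursively for every source that is itself the target of another rule.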

My Makefile below creates plots from data and then builds a PDF from an org-mode file.

all: doktorarbeit.pdf sink.pdf

sink.pdf : sink.tex images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png images/bona-marble.png images/boas-marble.png
    pdflatex sink.tex
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

comp-t3-s07-tem-boas.png comp-t3-s07-tem-bona.png : nee-comp.pyx nee-comp.txt
    pyxplot nee-comp.pyx

doktorarbeit.pdf : doktorarbeit.org
    emacs --batch --visit "doktorarbeit.org" --funcall org-export-as-pdf  

Feature Equality

The first step is simple: How can I replicate with autotools what I did with the plain Makefile?

For that I create the files configure.ac and Makefile.am. The basic Makefile.am is simply my Makefile without any changes.

The configure.ac sets the project name, inits automake and tells autoreconf to generate a Makefile.

dnl run `autoreconf -i` to generate a configure script. 
dnl Then run ./configure to generate a Makefile.
dnl Finally run make to generate the project.

AC_INIT([Doktorarbeit Inverse GHG], [0.1], [arne.babenhauserheide@kit.edu])
dnl we use the build type foreign here instead of gnu because I do not have a NEWS file and similar, yet.
AM_INIT_AUTOMAKE([foreign])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

Now, if I run `autoreconf -i` and then `./configure`, I get a generated Makefile. Nothing fancy here: The Makefile just does what my old Makefile did.

First milestone reached: Feature Equality!

But the generated Makefile is much bigger, offers real --help output and can generate a distribution - which does not work yet, because it is missing the source files. But it clearly tells me that with `make distcheck`.

make dist: distributing the project

Since `make dist` does not work yet, let’s change that.

… easier said than done. It took me the better part of a day to figure out how to make it happy. Problems there:

  • I have to explicitly give automake the list of sources so it can copy them to the distributed package.
  • distcheck uses a separate build dir. Yes, this is the clean way, but it needs some hacking to get everything to work.
  • I use pyxplot for generating some plots. Pyxplot does not have a way (that I know of) to search for datafiles in a different folder. I have to copy the files to the build dir and kill them after the build. But only if I use a separate build dir.
  • pdflatex can’t find included images. I have to adapt the TEXINPUTS environment variable to give it the srcdir as an additional search path.
  • Some of my commands litter the build directory with temporary or intermediate files. I have to clean them up.

So, after much haggling with autotools, I have a working make distcheck:

pdf_DATA = sink.pdf doktorarbeit.pdf

sink = sink.tex
pkgdata_DATA = images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png
dist_pkgdata_DATA = images/bona-marble.png images/boas-marble.png

plotdir = .
dist_plot_DATA = nee-comp.pyx nee-comp.txt

doktorarbeit = doktorarbeit.org

EXTRA_DIST = ${sink} ${dist_pkgdata_DATA} ${doktorarbeit}

MOSTLYCLEANFILES = \#* *~ *.bak # kill editor backups

sink.pdf : ${sink} ${pkgdata_DATA} ${dist_pkgdata_DATA}
    TEXINPUTS=${TEXINPUTS}:$(srcdir)/:$(srcdir)/images// pdflatex $<
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

${pkgdata_DATA} : ${dist_plot_DATA}
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then cp -u "$(i)" "$(notdir $(i))"; fi;)
    ${MKDIR_P} images
    pyxplot $<
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then rm -f "$(notdir $(i))"; fi;)

doktorarbeit.pdf : ${doktorarbeit}
    if test "$<" != "$(notdir $<)"; then cp -u "$<" "$(notdir $<)"; fi
    emacs --batch --visit "$(notdir $<)" --funcall org-export-as-pdf
    if test "$<" != "$(notdir $<)"; then rm -f "$(notdir $<)"; rm -f $(basename $(notdir $<)).tex $(basename $(notdir $<)).tex~; else rm -f $(basename $<).tex $(basename $<).tex~; fi

You might recognize that this is not the simple Makefile anymore. It is now a setup which defines files for distribution and has custom rules for preparing script runs and for cleanup.
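If the $(foreach …) lines look cryptic: for one file in a separate srcdir they expand to a copy-in/remove-after pair, roughly like the following shell sketch ($(notdir …) in make corresponds to basename; the file names are illustrative).

```shell
#!/bin/sh
# Copy a source file into the build dir when it lives elsewhere
# (for tools like pyxplot that only look in the current directory),
# and remove the copy again after the build.
stage() {   # called before running the tool
    if test "$1" != "$(basename "$1")"; then cp -u "$1" "$(basename "$1")"; fi
}
unstage() { # called after running the tool
    if test "$1" != "$(basename "$1")"; then rm -f "$(basename "$1")"; fi
}
```

The test guards the in-tree build: when srcdir is the build dir, the name equals its basename and nothing is copied or deleted.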

But I can now make a fully working distribution, so when I want to publish my PhD thesis, I can simply add the generated release tarball. I work in a Mercurial repo, so I would more likely just include the repo, but there might be reasons for leaving out the history - if only because the history might grow quite big.

Second milestone reached: make distcheck!

An advantage is that in the process of preparing the dist, my automake file got cleanly separated into a section defining files and dependencies and one defining build rules.

But I now also understand where newer build tools like scons got their inspiration for the abstractions they use.

I should note, however, that if you were to build a software project in one of the languages supported by automake (C, C++, Python and quite a few others), you would not need to specify the build rules yourself.

And being able to freely mix the dependency declaration in automake style with Makefile rules gives a lot of flexibility which I missed in scons.

Finding programs

Now I can build and distribute my project, but I cannot yet make sure that the programs I need for building actually exist.

And that’s finally something which can really help my build, because it gives clear error messages when something is missing, and it allows users to specify which of these programs to use via the configure script. For example I could now build 5 different versions of Emacs and try the build with each of them.

Also I added cross compilation support, though that is a bit over the top for simple PDF creation :)

First off, I edited my configure.ac to check for the tools:

dnl run `autoreconf -i` to generate a configure script. 
dnl Then run ./configure to generate a Makefile.
dnl Finally run make to generate the project.

AC_INIT([Doktorarbeit Inverse GHG], [0.1], [arne.babenhauserheide@kit.edu])
# Check for programs I need for my build
AC_ARG_VAR([emacs], [How to call Emacs.])
AC_CHECK_TARGET_TOOL([emacs], [emacs], [no])
AC_ARG_VAR([pyxplot], [How to call the Pyxplot plotting tool.])
AC_CHECK_TARGET_TOOL([pyxplot], [pyxplot], [no])
AC_ARG_VAR([pdflatex], [How to call pdflatex.])
AC_CHECK_TARGET_TOOL([pdflatex], [pdflatex], [no])
AS_IF([test "x$pdflatex" = "xno"], [AC_MSG_ERROR([cannot find pdflatex.])])
AS_IF([test "x$emacs" = "xno"], [AC_MSG_ERROR([cannot find Emacs.])])
AS_IF([test "x$pyxplot" = "xno"], [AC_MSG_ERROR([cannot find pyxplot.])])
# Run automake
AM_INIT_AUTOMAKE([foreign])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT
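In the generated configure script, each AC_CHECK_TARGET_TOOL/AS_IF pair boils down to logic like this simplified sketch (my own illustration, not the literal generated code):

```shell
#!/bin/sh
# For each tool: honor an override from the environment,
# otherwise search PATH, and abort with a clear error if absent.
check_tool() {
    varname=$1 tool=$2
    eval "val=\${$varname:-}"
    [ -n "$val" ] || val=$(command -v "$tool" || echo no)
    if [ "$val" = no ]; then
        echo "configure: error: cannot find $tool." >&2
        return 1
    fi
    eval "$varname=\$val"
}
```

So running `pdflatex=/opt/texlive/bin/pdflatex ./configure` skips the search and uses the given program.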

And then I used the created variables in the Makefile.am: See the @-characters around the program names.

pdf_DATA = sink.pdf doktorarbeit.pdf

sink = sink.tex
pkgdata_DATA = images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png
dist_pkgdata_DATA = images/bona-marble.png images/boas-marble.png

plotdir = .
dist_plot_DATA = nee-comp.pyx nee-comp.txt

doktorarbeit = doktorarbeit.org

EXTRA_DIST = ${sink} ${dist_pkgdata_DATA} ${doktorarbeit}

MOSTLYCLEANFILES = \#* *~ *.bak # kill editor backups

sink.pdf : ${sink} ${pkgdata_DATA} ${dist_pkgdata_DATA}
    TEXINPUTS=${TEXINPUTS}:$(srcdir)/:$(srcdir)/images// @pdflatex@ $<
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

${pkgdata_DATA} : ${dist_plot_DATA}
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then cp -u "$(i)" "$(notdir $(i))"; fi;)
    ${MKDIR_P} images
    @pyxplot@ $<
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then rm -f "$(notdir $(i))"; fi;)

doktorarbeit.pdf : ${doktorarbeit}
    if test "$<" != "$(notdir $<)"; then cp -u "$<" "$(notdir $<)"; fi
    @emacs@ --batch --visit "$(notdir $<)" --funcall org-export-as-pdf
    if test "$<" != "$(notdir $<)"; then rm -f "$(notdir $<)"; rm -f $(basename $(notdir $<)).tex $(basename $(notdir $<)).tex~; else rm -f $(basename $<).tex $(basename $<).tex~; fi

Third milestone reached: Checking for required tools!


With this I’m at the limit of the advantages of autotools for my simple project.

They allow me to create and check a distribution tarball with relative ease (if I know how to do it), and I can use them to check for tools - and to specify alternative tools via the commandline.

For a C or C++ project, autotools would have given me a lot of other things for free, but even the basic features shown here can be useful.

You have to judge for yourself if they outweigh the cost of moving away from the dead simple Makefile syntax.

Comparing SCons

A little bonus I want to share.

I also wrote an SCons script as an alternative to my Makefile, which I think might be interesting to you. It is almost equivalent to my Makefile since it can build my files, but it does not match the features of the full autotools build and distribution system. Missing: cleaning up temporary files and creating a validated distribution tarball.

Missing in SCons: No distcheck!

You might notice that the more declarative style with explicit dependency information looks quite a bit more similar to automake than to plain Makefiles.

The following is my SConstruct file:

#!/usr/bin/env python
## I need a couple of special builders for my projects
# the $SOURCE replacement only uses the first source file. $SOURCES gives all.
# specifying all source files makes it possible to rerun the build if a single source file changed.
orgexportpdf = 'emacs --batch --visit "$SOURCE" --funcall org-export-as-pdf'
pyxplot = 'pyxplot $SOURCE'
# pdflatex is quite dirty. I directly clean up after it with rm.
pdflatex = 'pdflatex $SOURCE -o $TARGET; rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb'

# build the PhD thesis from emacs org-mode.
Command("doktorarbeit.pdf", "doktorarbeit.org",
        orgexportpdf)

# create plots
Command(["comp-t3-s07-tem-boas.png", "comp-t3-s07-tem-bona.png"],
        ["nee-comp.pyx", "nee-comp.txt"],
        pyxplot)

# build my sink.pdf
Command("sink.pdf",
        ["sink.tex",
         "images/comp-t3-s07-tem-boas.png",
         "images/comp-t3-s07-tem-bona.png",
         "images/bona-marble.png",
         "images/boas-marble.png"],
        pdflatex)

# My editors leave tempfiles around. I want them gone after a build clean. This is not yet supported!
tempfiles = Glob('*~') + Glob('#*#') + Glob('*.bak')
# using this here would run the cleaning on every run.
#Command("clean", [], Delete(tempfiles))

If you want to integrate building with scons into a Makefile, the following lines allow you to run scons with `make sconsrun`. You might have to also mark sconsrun as .PHONY.

sconsrun : scons
    python scons/bootstrap.py -Q

scons : 
    hg clone https://bitbucket.org/ArneBab/scons

Here you can see part of the beauty of autotools, because you can just add this to your Makefile.am instead of the Makefile and it will work inside the full autotools project (though without the dist-integration). So autotools is a real superset of simple Makefiles.


If org-mode export keeps pestering you about selecting a TeX-master every time you build the PDF, add the following to your org-mode file:

%%% Local Variables:
%%% TeX-master: t
%%% End:
2013-03-05-Di-make-to-autotools.org (12.9 KB)

How to fix a bug, using the example of Quod Libet empty panes on Gentoo GNU/Linux (bug solving process)

PDF-version (for printing)

orgmode-version (for editing)

For a few days now my Quod Libet has been broken, showing only empty space instead of information panes.


I investigated halfheartedly, but did not find the cause with quick googling. Today I decided to change that. I document my path here, because I had not yet written about how I actually tackle problems like these - and I think I would have profited from having a writeup like this when I started, instead of having to learn it by trial-and-error.

Update: Quodlibet 2.6.3 is now in the Gentoo portage tree - using my ebuild. The update works seamlessly. So to get your Quodlibet 2.5 running again, just call emerge =media-sound/quodlibet-2.6.3 =media-plugins/quodlibet-plugins-2.6.3. Happy Hacking!

Update: I got a second reply in the bug tracker which solved the plugins problem: I had user-plugins which require Quod Libet 3. Solution: mv ~/.quodlibet/plugins ~/.quodlibet/plugins.for-ql3. Quod Libet works completely again.

Solution for the impatient: Update to Quod Libet 2.5.1. In Gentoo that’s easy.

1 Gathering Information

As starting point I installed the Quod Libet plugins (media-plugins/quodlibet-plugins), thinking that the separation between plugins and mediaplayer might not be perfect. That did not fix the problem, but a look at the plugin listing gave me nice backtraces:


And these actually show the reason for the breakage: Cannot import GTK:

Traceback (most recent call last):
  File "/home/arne/.quodlibet/plugins/songsmenu/albumart.py", line 51, in <module>
    from gi.repository import Gtk, Pango, GLib, Gdk, GdkPixbuf
  File "/usr/lib64/python2.7/site-packages/gi/__init__.py", line 27, in <module>
    from ._gi import _API, Repository
ImportError: cannot import name _API

Let’s look which package this file belongs to:

equery belongs /usr/lib64/python2.7/site-packages/gi/__init__.py
 * Searching for /usr/lib64/python2.7/site-packages/gi/__init__.py ... 
dev-python/pygobject-3.8.3 (/usr/lib64/python2.7/site-packages/gi/__init__.py)

So I finally have an answer: pygobject changed the API. Can’t be hard to fix… (a realization process follows)

2 The solution-hunting process

  • let’s check the Gentoo forums for pygobject
  • pygobject now pulls systemd??? - and they wonder why I’m pissed off by systemd: hugely invasive changes just for some small packages… KDE gets rid of the monolithic approach, and now Gnome starts it, just much more invasive into the basic structure of all distros?
  • set the USE flag -systemd to avoid systemd (why didn’t I have that yet? I guess I did not expect that Gentoo would push that on me…)
  • check when I updated pygobject:
qlop -l pygobject
Thu Dec  5 00:26:27 2013 >>> dev-python/pygobject-3.8.3
  • a week ago - that fits the timeframe. Damn… pygobject-3.8.3, you have to go.
echo =dev-python/pygobject-3.8.3 >> /etc/portage/package.mask
emerge -u pygobject
  • hm, no, the backtrace was for the plugin, but when I start Quod Libet from the shell, I see this:
LANG=C quodlibet
/usr/lib64/python2.7/site-packages/quodlibet/qltk/songlist.py:44: GtkWarning: Unable to locate theme engine in module_path: "clearlooks",
  _label = gtk.Label().create_pango_layout("")
  • emerge x11-themes/clearlooks-phenix to get clearlooks again. Looks nicer now, but still not fixed.


  • back to the drawing board. Let’s tackle this pygobject thing: emerge -C =dev-python/pygobject-3.8.3, emerge -1 =dev-python/pygobject-2.28.6-r55.
  • not fixed. OK… let’s report a bug: empty information panes (screenshots attached).

3 The core solution

In the bug report at Quod Libet I got a reply: Known issue with quodlibet 2.5 “which triggered a bug in a recent pygtk release, resulting in lists not showing”. The plugins seem to be unrelated. Solution to my immediate problem: Update to 2.5.1. That’s not yet in gentoo, but this is easy to fix:

cd /usr/portage/media-sound/
# create the category in my local portage overlay, defined as
# PORTDIR_OVERLAY=/usr/local/portage in /etc/make.conf
mkdir -p /usr/local/portage/media-sound
# copy over the quodlibet directory, keeping the permissions with -p
cp -rp quodlibet /usr/local/portage/media-sound
# most times it is enough to simply rename the ebuild to the new version
cd /usr/local/portage/media-sound/quodlibet
mv quodlibet-2.5.ebuild quodlibet-2.5.1.ebuild
# now prepare all the metadata portage needs - this requires
# app-portage/gentoolkit
ebuild quodlibet-2.5.1.ebuild digest compile 
# now it's prepared for the package manager. Just update it as usual:
emerge -u quodlibet

I wrote the solution in the Gentoo bug report. I should also state that the Gentoo package for Quod Libet is generally out of date (releases 2.6.3 and 3.0.2 are not yet in the tree).

Quod Libet works again.


As soon as the ebuild in the portage tree is renamed, Quod Libet should work again for all Gentoo users.

The plugins still need to be fixed, but I’ll worry about that later.

4 Conclusion

Solving the core problem took me some time, but it wasn’t really complicated. The part of the solution process which got me forward boils down to:

  • checking the project bug tracker,
  • checking the distribution bug tracker,
  • reporting a bug for the project with the information I could gather - including screenshots (or anything else which shows the problem directly - see How to Report Bugs Effectively for hints on that), and
  • checking the reported bug again a few hours or days later - and synchronizing the information between the project bug tracker and the distribution bug tracker to help fixing the bug for all users of the distribution and of other distributions.

And that’s it: To get something working again, check the bug trackers, report bugs and help synchronizing bug tracker info.

2013-12-11-quod-libet-broken.png (49.59 KB)
2013-12-11-quod-libet-broken-clearlooks.png (50.44 KB)
2013-12-11-quod-libet-broken-plugins.png (27.47 KB)
2013-12-11-quod-libet-fixed.png (85.61 KB)
2013-12-11-Mi-quodlibet-broken.org (7.11 KB)
2013-12-11-Mi-quodlibet-broken.pdf (419.37 KB)

How to run your own GNU Hurd (in 140 letters)

Don’t want to rely on other’s opinions about the Hurd? How to run your own GNU Hurd, in 140 letters:

wget http://people.debian.org/~sthibault/hurd-i386/debian-hurd.img.tar.gz; tar xf de*hu*gz; qemu-system-x86_64 -hda de*hu*g -m 1G

This is the GNU Hurd

For additional convenience and performance, set up ssh access and enable kvm:

wget http://people.debian.org/~sthibault/hurd-i386/debian-hurd.img.tar.gz; tar xf de*hu*gz; qemu-system-x86_64 -enable-kvm -net user,hostfwd=tcp::2222-:22 -net nic -m 1G -drive cache=writeback,file=$(ls de*hu*g)

⇒ login: root, no pw needed. Set a password for user demo:

passwd demo

⇒ log into your Hurd via ssh:

ssh demo@localhost -p 2222

That’s it: You run the Hurd. Why would you want to do that? See cat translator_intro — and much more.

Additional information:

Run your own GNU Hurd

2016-06-08-hurd-howto-140-combined.xcf (119.56 KB)
2016-06-08-hurd-howto-140-combined.png (19.92 KB)
hurd-test-2017.webm (1.05 MB)

Huge datafiles in free culture projects under GPL

Four ways large raw artwork files are treated in free culture projects to provide the editable source.1

In the discussion about license compatibility of the Creative Commons ShareAlike license towards the GPL, Anthony asked how the source-requirement is solved for artwork, which often has huge raw files. These are the 4 basic ways I described in my answer.

1. The Wesnoth Way

“The Source is what we have”

The project just asks artists for full resolution PNG image files (without all the layering information) - and only uses these to develop the art. This was spearheaded by the GPL-licensed strategy game Battle for Wesnoth.

This is a viable strategy and also allows developing art, though a bit less convenient than with the layered sources. For example the illustrator who created many of the images in the RPG I work on used our PNG instead of her photoshop file to extract a die from the cover she created for us. She took the chance to also touch up the colors a bit - she had learned some new tricks to improve her paintings.

This clearly complies with the GPL, because the GPL just requires providing the files used for editing the published files. If the released file is what you actually use to change published files, then the published file is the source.

2. The External Storage

“Use the FTP, Luke”

Here, files which are too big to be versioned effectively or which most people don’t need when working with the project get version-numbers and are put into an external storage - like an FTP server.

I do that for gimp-files: I put these into our public release-listing via FTP. For example I used that for a multi-layer cover which gets baked into our PDF.

3. The Elegant Way

“Make it so!”

Here huge files are simply versioned alongside other files and the versions to be used are created directly from the multi-layered files. The usual way to do that is a Makefile in which scripts explicitly define how the derived file can be extracted.

This is most elegant, because it has no duplication of information, the source is always trivial to find, it’s always clear that the derived file really originated from the source and it is easy to avoid quality loss or even reduce it later.

The disadvantage is that it can be very cumbersome to force new developers to get all huge files and then create them before being able to really start developing.

The common way to do this is a Makefile - for example the one I use for building my PhD thesis.

4. Pragmatic Elegance

“Hybrids win”

All the ways above can be combined: Huge files are put in version control, but the derived files are included, too, to make it easier for new people to get in. Maybe the huge files are only included on request - for example they could be stubs with which the version control system can retrieve the full files when the user wants them. This can partially be done with the largefiles extension in Mercurial by just not getting the large files.

Also you can just keep separate raw files and derived files. This is also used in Battle for Wesnoth: Optimized files of the right size for the game are stored in one folder while the bigger full resolution files are stored separately.

If you want to include free art in a GPL-covered work, I hope this article gave you some inspiration!

  1. The die was created by Trudy Wenzel (2013) and is licensed under GPLv3 or later. 

Immutable function arguments and variables

  1. Dev A: “Fortran is totally outdated.”
  2. Dev B: “I wish we could declare objects in function arguments or variable values as immutable in Java and Javascript.”

Fortran developer silently weeps:

! immutable 2D array as argument in Fortran
  integer, intent(in) :: arg(:,:)
! constant value
  character(len=10), parameter :: numbers = "0123456789"

See parameter vs. intent(in).

(yes, I’m currently reading a Javascript book)

If you now want to see more of Fortran:

Installing GNU Guix 0.6, easily

Org-Source (for editing)

PDF (for printing)

“Got a power-outage while updating?
No problem: Everything still works”

GNU Guix is the new functional package manager from the GNU Project which complements the Nix-Store with a nice Guile Scheme based package definition format.

What sold it to me was “Got a power-outage while updating? No problem: Everything still works” from the Guix talk of Ludovic at the GNU Hacker Meeting 2013. My son once found the on-off-button of our power-connector while I was updating my Gentoo box. It took me 3 evenings to get it completely functional again. This would not have happened with Guix.

Update (2014-05-17): Thanks to zerwas from IRC @ freenode for the patch to guix 0.6 and nice cleanup!


Installation of GNU Guix is straightforward, except if you follow the docs, but it’s not as if we’re not used to that from other GNU utilities, which often terribly short-sell their quality with overly general documentation ☺

So I want to provide a short guide how to set up and run GNU Guix with ease. My system natively runs Gentoo, so some details might vary for you. If you use Gentoo, you can simply copy the commands here into the shell, but better copy them to a text-file first to ensure that I do not try to trick you into doing evil things with the root access you need.

In short: This guide provides the First Contact and Black Triangle for GNU Guix.

Getting GNU Guix

mkdir guix && cd guix
wget http://alpha.gnu.org/gnu/guix/guix-0.6.tar.gz
wget http://alpha.gnu.org/gnu/guix/guix-0.6.tar.gz.sig
gpg --verify guix-0.?.tar.gz.sig

Installing GNU Guix

tar xf guix-0.?.tar.gz
cd guix-0.?
./configure && make -j16
sudo make install

Setting up GNU Guix

Build users

Build-users allow for strong separation of build processes: They cannot affect each other, because they actually run as different users.

sudo screen
groupadd guix-builder
for i in `seq 1 10`; do
    useradd -g guix-builder -G guix-builder           \
            -d /var/empty -s `which nologin`          \
            -c "Guix build user $i" --system          \
            guix-builder$i
done

(if you do not have GNU screen yet, you should get it. It makes working on remote servers enjoyable.)

Add user work folder.

Also we want to run guix as a regular user. For that we need to pre-create the user-specific build-directory. Note: This should really be done automatically.

sudo mkdir -p /usr/local/var/nix/profiles/per-user/$USER
sudo chown -R $USER:$USER /usr/local/var/nix/profiles/per-user/$USER

Fix store permissions

chgrp 1002 /nix/store; chmod 1775 /nix/store
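The mode 1775 is not arbitrary: 775 lets the build group write to the store, and the leading 1 is the sticky bit, which keeps users from deleting entries they do not own (the same trick as /tmp). A quick sketch on a throwaway directory:

```shell
#!/bin/sh
# Show what mode 1775 means on a scratch directory:
# rwx for owner and group, r-x for others, sticky bit set.
dir=$(mktemp -d)
chmod 1775 "$dir"
stat -c '%a %A' "$dir"
```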

Starting the guix daemon and making it launch at startup

this might be quite Gentoo-specific.

sudo screen
echo '#!/bin/sh' >> /etc/local.d/guix-daemon.start
echo "guix-daemon --build-users-group=guix-builder &" >> /etc/local.d/guix-daemon.start
echo '#!/bin/sh' >> /etc/local.d/guix-daemon.stop
echo "pkill guix-daemon" >> /etc/local.d/guix-daemon.stop
chmod +x /etc/local.d/guix-daemon.start
chmod +x /etc/local.d/guix-daemon.stop

(the pkill is not the nice way of killing the daemon. Ideally the daemon should have a --kill option)

To avoid having to restart, we just launch the daemon once, now.

sudo /etc/local.d/guix-daemon.start

Adding the guix-installed programs to your PATH

Guix installs each state of the system in its own directory, which actually enables rollbacks. The current state is made available via ~/.guix-profile/, and so we need ~/.guix-profile/bin in our path:

echo 'export PATH="$PATH:$HOME/.guix-profile/bin"' >> ~/.bashrc
. ~/.bashrc
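One caveat: every re-run of that echo appends another copy of the export line to ~/.bashrc. A small guard (my own sketch, not part of the original setup) keeps PATH free of duplicates:

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already in there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present: nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}
add_to_path "$HOME/.guix-profile/bin"
add_to_path "$HOME/.guix-profile/bin"  # second call changes nothing
```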

Using guix

Guix comes with a quite complete commandline interface. The basics are

  • Update the package listing: guix pull
  • List available packages: guix package -A
  • Install a package: guix package -i PACKAGE
  • Update all packages: guix package -u


For a new distribution-tool, Guix is quite nice. Remember, though, that it builds on Nix: It is not a complete reinvention but rather “stands on the shoulders of giants”.

The download speeds are abysmal, though. http://hydra.gnu.org seems to have a horribly slow internet connection…

And what I direly missed is a short command explanation in the help output:

$ guix --help
Usage: guix COMMAND ARGS...

COMMAND must be one of the sub-commands listed below:


Also I miss the categories I know from Gentoo: Having package-names like grue-hunter seems very unorganized compared to the games-text/grue-hunter which I know from Gentoo.

And it would be nice to have shorthands for the command names:

  • "guix pa -i" instead of "guix package -i" (though there is a namespace clash with guix pull :( )
  • "guix pu" for "guix pull"

and so on.
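Until guix grows such shorthands, a shell function gets close (gp is my own name for it, nothing official):

```shell
#!/bin/sh
# gp: shorthand for `guix package`, so `gp -i hello` installs hello.
gp() {
    guix package "$@"
}
```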

But anyway: A very interesting project which I plan to keep tracking. It might allow me to do less risky local package installs of stuff I need, like small utilities I wrote myself.

The big advantage of that would be that I could actually take them with me when I have to use different distros (though I’ve been a happy Gentoo user for ~10 years and I don’t see it as likely that I’ll switch completely: Guix would have to include all of the roughly 30k packages in Gentoo to actually be a full-fledged alternative - and provide USE flags and all the convenient configurability which makes Gentoo such a nice experience).

Using guix for such small stuff would allow me to decouple experiments from my production environment (which has to keep working).

But enough talk: Have fun with GNU Guix and Happy Hacking!

Author: Arne Babenhauserheide

Created: 2014-05-17 Sa 23:40

Emacs 24.3.1 (Org mode 8.2.5h)


2013-09-04-Mi-guix-install.org (6.53 KB)
2013-09-04-Mi-guix-install.pdf (171.32 KB)

Installing Scipy and PyNIO on a Bare Cluster with the Intel Compiler

2 years ago I had the task of running a python-program using scipy on our university cluster, using the Intel Compiler. I needed all those (as well as PyNIO and some other stuff) for running TM5 with the python shell on the HC3 of KIT.

This proved to be quite a bit more challenging than I had expected - but it was very interesting, too (and there I learned the basics of GNU autotools which still help me a lot).

But no one should have to go to the same effort with as little guidance as I had, so I decided to publish the script and the patches I created for installing everything we needed.1

The script worked 2 years ago, so you might have to fix some bits. I won’t promise that this contains everything you need to run the script - or that it won’t be broken when you install it. Actually I won’t promise anything at all, except that if the stuff here had been available 2 years ago, it could have saved me about 2 months of time. Each of the patches here required quite some tracking of problems, experimenting and fixing until it provided basic functionality - but actually I enjoyed doing that: I learned a lot, I just don’t want to be forced to do it again. Still, this stuff contains quite some hacks - even a few ugly ones. But it worked.

2 libraries and programs which get installed (=requirements)

This script requires and installs quite a few libraries. I retrieved most of the following tarballs from my Gentoo distfiles dir after installing the programs locally. I uploaded them to draketo.de/dateien/scipy-pynio-deps. These files are included there:

satexputils.so also needs interpolatelevels.F90, which I think I am not allowed to share, so you’re on your own there. Guess why I do not like using non-free (or not-guaranteed-to-be-free) software.

3 Known Bugs

3.1 HDF autotools patch throws away some CFLAGS

The hdf autotools patch only retrieves the last CFLAG instead of all:

export CC='gcc-4.8.1 -Wall -Werror'                                                          
echo $CC | grep \ - | sed 's/.* -/-/'                                                                     

If you have the regexp-foo to fix that, please improve the patch! But do it without perl (otherwise we’d have to install perl, too).
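To illustrate the failure mode, here is the sed pattern translated into Python’s re module; the replacement pattern at the end is only an untested suggestion of mine, not part of the patch:

```python
import re

cc = "gcc-4.8.1 -Wall -Werror"

# The patch runs the equivalent of `sed 's/.* -/-/'`: the greedy
# ".* -" swallows everything up to the LAST " -", so only the
# final flag survives.
broken = re.sub(r".* -", "-", cc)
print(broken)  # -Werror

# A possible fix (untested suggestion, in sed: 's/^[^ ]* //'):
# drop only the leading compiler word, keeping every flag.
# It assumes at least one flag follows the compiler name.
fixed = re.sub(r"^[^ ]* ", "", cc)
print(fixed)  # -Wall -Werror
```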

3.2 SciPy inline-C via weave does not work

Udo Grabowski, the maintainer of our institute’s Sun cluster, somehow managed to get that working on OpenIndiana with the Sun compiler, but since I did not need it, I did not dig deeper to see whether I could adapt his solution to the Intel compiler.

5 Implementation

This is the full install script I used to install all necessary dependencies.


# Untar

for i in *.tar* *.tgz; do
  tar xvf $i || exit
done

# Install


# Blas

cd BLAS || exit
cp ../blas-make.inc make.inc || exit
#make -j9 clean
F77=ifort make -j9 || exit
#make -j9 install --prefix=$PREFIX
# OR for Intel compiler:
ifort -fPIC -FI -w90 -w95 -cm -O3 -xHost -unroll -c *.f || exit
#Continue below irrespective of compiler:
ar r libfblas.a *.o || exit
ranlib libfblas.a || exit
cd ..
ln -s BLAS blas

## Lapack

cd lapack-3.3.1
ln -s ../blas
# this has a hardcoded absolute path to blas in it: replace is with the appropriate one for you.
cp ../lapack-make.inc make.inc || exit
make -j9 clean  || exit
make -j9 # the parallel build may fail on the first pass; retried below
make -j9 || exit
cp lapack_LINUX.a libflapack.a || exit
#make -j9 install --prefix=$PREFIX
cd ..

# C interface

patch -p0 < lapacke-ifort.diff

cd lapacke
# patch for lapack 3.3.1 and blas
for i in gnu inc intel ; do 
    sed -i s/lapack-3\.2\.1\\/lapack\.a/lapack-3\.3\.1\\/lapack_LINUX.a/ make.$i; 
    sed -i s/lapack-3\.2\.1\\/blas\.a/blas\\/blas_LINUX.a/ make.$i; 
done

make -j9 clean || exit
#make -j9
LINKER=ifort LDFLAGS=-nofor-main make -j9 # || exit
#LINKER=ifort LDFLAGS=-nofor-main make -j9 install
cd ..


# ATLAS

cd ATLAS || exit
cp ../Make.Linux_HC3 . || exit
echo "ATLAS needs manual intervention. Run make by hand first."
#echo "just say yes. It makes some stuff we need later."
#mv bin/Linux_UNKNOWNSSE2_8 bin/Linux_HC3
#for i in bin/Linux_HC3/*; do sed -i s/UNKNOWNSSE2_8/HC3/ $i ; done
#rm bin/Linux_HC3/Make.inc
#cd bin/Linux_HC3/
#ln -s ../../Make.Linux_HC3 Make.inc
#cd -

make -j9 install arch=Linux_HC3 || exit
cd lib
for i in Linux_HC3/* ; do ln -s $i ; done
cd ../bin
for i in Linux_HC3/* ; do ln -s $i ; done
cd ../include
for i in Linux_HC3/* ; do ln -s $i ; done
cd ..
cd ..

# Numpy and SciPy with intel compilers

# Read this: http://marklodato.github.com/2009/08/30/numpy-scipy-and-intel.html

# patching

patch -p0 < SuiteSparse.diff  || exit
patch -p0 < SuiteSparse-umfpack.diff  || exit

rm numpy
ln -s numpy-*.*.*/ numpy
patch -p0 < numpy-icc.diff  || exit
patch -p0 < numpy-icpc.diff || exit
patch -p0 <<EOF
--- numpy/numpy/distutils/fcompiler/intel.py      2009-03-29 07:24:21.000000000 -0400
+++ numpy/numpy/distutils/fcompiler/intel.py  2009-08-06 23:08:59.000000000 -0400
@@ -47,6 +47,7 @@
     module_include_switch = '-I'

     def get_flags(self):
+        return ['-fPIC', '-cm']
         v = self.get_version()
         if v >= '10.0':
             # Use -fPIC instead of -KPIC.
@@ -63,6 +64,7 @@
         return ['-O3','-unroll']

     def get_flags_arch(self):
+        return ['-xHost']
         v = self.get_version()
         opt = []
         if cpu.has_fdiv_bug():
EOF
# include -fPIC in the fcompiler.
sed -i "s/w90/w90\", \"-fPIC/" numpy/numpy/distutils/fcompiler/intel.py
# and more of that
patch -p0 < numpy-ifort.diff

rm scipy
ln -s scipy-*.*.*/ scipy

patch -p0 < scipy-qhull-icc.diff || exit
patch -p0 < scipy-qhull-icc2.diff || exit

# # unnecessary!
# patch -p0 <<EOF
# --- scipy/scipy/special/cephes/const.c    2009-08-07 01:56:43.000000000 -0400
# +++ scipy/scipy/special/cephes/const.c        2009-08-07 01:57:08.000000000 -0400
# @@ -91,12 +91,12 @@
# double THPIO4 =  2.35619449019234492885;       /* 3*pi/4 */
# double TWOOPI =  6.36619772367581343075535E-1; /* 2/pi */
# -double INFINITY = 1.0/0.0;  /* 99e999; */
# +double INFINITY = __builtin_inff();
# #else
# double INFINITY =  1.79769313486231570815E308;    /* 2**1024*(1-MACHEP) */
# #endif
# #ifdef NANS
# -double NAN = 1.0/0.0 - 1.0/0.0;
# +double NAN = __builtin_nanf("");
# #else
# double NAN = 0.0;
# #endif
# EOF

# building

# TODO: try again later

cd SuiteSparse

make -j9 -C AMD || exit
make -j9 -C UMFPACK || exit

cd ..

# TODO: build numpy again and make sure it has blas and lapack (and ATLAS?)

cd numpy
python setup.py -v build_src config --compiler=intel build_clib \
    --compiler=intel build_ext --compiler=intel || exit
python setup.py install --prefix=$PYPREFIX || exit
cd ..

# scons and numscons
cd scons-2.0.1
python setup.py -v install --prefix=/home/ws/babenhau/python/ || exit
cd ..

git clone git://github.com/cournape/numscons.git
cd numscons 
python setup.py -v install --prefix=/home/ws/babenhau/python/  || exit
cd ..

# adapt /home/ws/babenhau/python/lib/python2.7/site-packages/numpy/distutils/fcompiler/intel.py by hand to include fPIC for intelem

cd scipy

PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py -v config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem \
    -I../SuiteSparse/UFconfig # no exit, because we do the linking by hand later on.

# one file is C++ :(
icpc -fPIC -I/home/ws/babenhau/python/include/python2.7 -I/home/ws/babenhau/python/lib/python2.7/site-packages/numpy/core/include -I/home/ws/babenhau/python/lib/python2.7/site-packages/numpy/core/include -c scipy/spatial/qhull/src/user.c -o build/temp.linux-x86_64-2.7/scipy/spatial/qhull/src/user.o || exit

# linking by hand

# for x in csr csc coo bsr dia; do
#    icpc -xHost -O3 -fPIC -shared \
#        build/temp.linux-x86_64-2.7/scipy/sparse/sparsetools/${x}_wrap.o \
#        -o build/lib.linux-x86_64-2.7/scipy/sparse/sparsetools/_${x}.so || exit
# done
#icpc -xHost -O3 -fPIC -openmp -shared \
#   build/temp.linux-x86_64-2.7/scipy/interpolate/src/_interpolate.o \
#   -o build/lib.linux-x86_64-2.7/scipy/interpolate/_interpolate.so || exit

# build again with the C++ file already compiled

PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem \
    -I../SuiteSparse/UFconfig || exit

# make sure we have cephes
cd scipy/special
PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../../../ATLAS/ \
    LAPACK=../../../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../../../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py -v config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem
cd ../..

# install
PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem install --prefix=$PYPREFIX || exit

cd ..


# netcdf-4

patch -p0 < netcdf-patch1.diff || exit
patch -p0 < netcdf-patch2.diff || exit

cd netcdf-4.1.3

CPPFLAGS="-I/home/ws/babenhau/libbutz/hdf5-1.8.7/include -I/home/ws/babenhau/include" LDFLAGS="-L/home/ws/babenhau/libbutz/hdf5-1.8.7/lib/ -L/home/ws/babenhau/lib -lsz -L/home/ws/babenhau/libbutz/szip-2.1/lib -L/opt/intel/Compiler/11.1/080/lib/intel64/libifcore.a -lifcore" ./configure --prefix=/home/ws/babenhau/ --enable-netcdf-4 --enable-shared || exit

make -j9; make check install -j9 || exit

cd ..

# NetCDF4
cd netCDF4-0.9.7
HAS_SZIP=1 SZIP_PREFIX=/home/ws/babenhau/libbutz/szip-2.1/ HAS_HDF5=1 HDF5_DIR=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_PREFIX=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_includedir=/home/ws/babenhau/libbutz/hdf5-1.8.7/include HDF5_libdir=/home/ws/babenhau/libbutz/hdf5-1.8.7/lib HAS_NETCDF4=1 NETCDF4_PREFIX=/home/ws/babenhau/ python setup.py build_ext --compiler="intel" --fcompiler="intel -fPIC" install --prefix $PYPREFIX
cd ..

# parallel netcdf and hdf5: ~/libbutz/

patch -p0 < pynio-fix-no-grib.diff || exit

cd PyNIO-1.4.1
HAS_SZIP=1 SZIP_PREFIX=/home/ws/babenhau/libbutz/szip-2.1/ HAS_HDF5=1 HDF5_DIR=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_PREFIX=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_includedir=/home/ws/babenhau/libbutz/hdf5-1.8.7/include HDF5_libdir=/home/ws/babenhau/libbutz/hdf5-1.8.7/lib HAS_NETCDF4=1 NETCDF4_PREFIX=/home/ws/babenhau/ python setup.py install --prefix=$PYPREFIX || exit
# TODO: Make sure that the install goes to /home/ws/.., not home/ws/...
cd ..

# satexp_utils.so

f2py -c -m satexp_utils --f77exec=ifort --f90exec=ifort interpolate_levels.F90 || exit

## pyhdf

# recompile hdf with fPIC - grr!
cd hdf-4*/
# Fix configure for compilers with - in the name.
patch -p0 < ../hdf-fix-configure.ac.diff
FFLAGS="-ip -O3 -xHost -fPIC -r8" CFLAGS="-ip -O3 -xHost -fPIC" CXXFLAGS="$CFLAGS -I/usr/include/rpc  -DBIG_LONGS -DSWAP" F77=ifort ./configure --prefix=/home/ws/babenhau/ --disable-netcdf --with-szlib=/home/ws/babenhau/libbutz/szip-2.1 # --with-zlib=/home/ws/babenhau/libbutz/zlib-1.2.5 --with-jpeg=/home/ws/babenhau/libbutz/jpeg-8c
# finds zlib and jpeg due to LD_LIBRARY_PATH (hack but works…)
make install
cd ..

# build pyhdf
cd pyhdf-0.8.3/
INCLUDE_DIRS="/home/ws/babenhau/include:/home/ws/babenhau/libbutz/szip-2.1/include" LIBRARY_DIRS="/home/ws/babenhau/lib:/home/ws/babenhau/libbutz/szip-2.1/lib" python setup.py build -c intel --fcompiler ifort install --prefix=/home/ws/babenhau/python 
cd ..

## matplotlib

cd matplotlib-1.1.0
patch -p0 < ../matplotlib-add-icc-support.diff
python setup.py build -c intel install --prefix=/home/ws/babenhau/python
cd ..

# GEOS → http://download.osgeo.org/geos/geos-3.3.2.tar.bz2

cd geos*/ 
./configure --prefix=/home/ws/babenhau/
make check
make install 
cd ..

# basemap

easy_install --prefix /home/ws/babenhau/python basemap
# fails but should now have all dependencies.

cd basemap-*/

python setup.py build -c intel install --prefix=/home/ws/babenhau/python

cd ..

6 Appendix

6.1 All patches inline

To ease usage and upstreaming of my fixes, I include all the patches below, so you can find them directly in this text instead of having to browse external text files.

6.1.1 SuiteSparse-umfpack.diff

--- SuiteSparse/UMFPACK/Lib/GNUmakefile 2009-11-11 21:09:54.000000000 +0100
+++ SuiteSparse/UMFPACK/Lib/GNUmakefile 2011-09-09 14:18:57.000000000 +0200
@@ -9,7 +9,7 @@
     -I../Include -I../Source -I../../AMD/Include -I../../UFconfig \
     -I../../CCOLAMD/Include -I../../CAMD/Include -I../../CHOLMOD/Include \
-    -I../../metis-4.0/Lib -I../../COLAMD/Include
+    -I../../COLAMD/Include

 # source files

6.1.2 SuiteSparse.diff

--- SuiteSparse/UFconfig/UFconfig.mk    2011-09-09 13:14:03.000000000 +0200
+++ SuiteSparse/UFconfig/UFconfig.mk    2011-09-09 13:15:03.000000000 +0200
@@ -33,11 +33,11 @@
 # C compiler and compiler flags:  These will normally not give you optimal
 # performance.  You should select the optimization parameters that are best
 # for your system.  On Linux, use "CFLAGS = -O3 -fexceptions" for example.
-CC = cc
-CFLAGS = -O3 -fexceptions
+CC = icc
+CFLAGS = -O3 -xHost -fPIC -openmp -vec_report=0

 # C++ compiler (also uses CFLAGS)

 # ranlib, and ar, for generating libraries
 RANLIB = ranlib
@@ -49,8 +49,8 @@
 MV = mv -f

 # Fortran compiler (not normally required)
-F77 = f77
-F77FLAGS = -O
+F77 = ifort
+F77FLAGS = -O3 -xHost
 F77LIB =

 # C and Fortran libraries
@@ -132,13 +132,13 @@
 # The path is relative to where it is used, in CHOLMOD/Lib, CHOLMOD/MATLAB, etc.
 # You may wish to use an absolute path.  METIS is optional.  Compile
 # CHOLMOD with -DNPARTITION if you do not wish to use METIS.
-METIS_PATH = ../../metis-4.0
-METIS = ../../metis-4.0/libmetis.a
+# METIS_PATH = ../../metis-4.0
+# METIS = ../../metis-4.0/libmetis.a

 # If you use CHOLMOD_CONFIG = -DNPARTITION then you must use the following
 # options:
-# METIS =

 # UMFPACK configuration:
@@ -194,7 +194,7 @@
 # -DNSUNPERF       for Solaris only.  If defined, do not use the Sun
 #          Performance Library


 # SuiteSparseQR configuration:

6.1.3 hdf-fix-configure.ac.diff (fixes a bug but still contains another known bug - see Known Bugs!)

--- configure.ac    2012-03-01 15:00:28.000000000 +0100
+++ configure.ac    2012-03-01 15:00:40.000000000 +0100
@@ -815,7 +815,7 @@
 dnl Report anything stripped as a flag in CFLAGS and 
 dnl only the compiler in CC_VERSION.
 CC_NOFLAGS=`echo $CC | sed 's/ -.*//'`
-CFLAGS_TO_ADD=`echo $CC | grep - | sed 's/.* -/-/'`
+CFLAGS_TO_ADD=`echo $CC | grep \ - | sed 's/.* -/-/'`
 if test -n $CFLAGS_TO_ADD; then

6.1.4 lapacke-ifort.diff

--- lapacke/make.intel.old  2011-10-05 13:24:14.000000000 +0200
+++ lapacke/make.intel  2011-10-05 16:17:00.000000000 +0200
@@ -56,7 +56,7 @@
 # Ensure that the libraries have the same data model (LP64/ILP64).
 LAPACKE = lapacke.a
-LIBS = ../../../lapack-3.3.1/lapack_LINUX.a ../../../blas/blas_LINUX.a -lm
+LIBS = /opt/intel/Compiler/11.1/080/lib/intel64/libifcore.a ../../../lapack-3.2.1/lapack.a ../../../lapack-3.2.1/blas.a -lm -ifcore
 #  The archiver and the flag(s) to use when building archive (library)
 #  If your system has no ranlib, set RANLIB = echo.

6.1.5 matplotlib-add-icc-support.diff

diff -r 38c2a32c56ae matplotlib-1.1.0/setup.py
--- a/matplotlib-1.1.0/setup.py Fri Mar 02 12:29:47 2012 +0100
+++ b/matplotlib-1.1.0/setup.py Fri Mar 02 12:30:39 2012 +0100
@@ -31,6 +31,13 @@
 if major==2 and minor1<4 or major<2:
     raise SystemExit("""matplotlib requires Python 2.4 or later.""")

+if "intel" in sys.argv or "icc" in sys.argv:
+    try: # make it compile with the intel compiler
+        from numpy.distutils import intelccompiler
+    except ImportError:
+        print "Compiling with the intel compiler requires numpy."
+        raise
 import glob
 from distutils.core import setup
 from setupext import build_agg, build_gtkagg, build_tkagg,\

6.1.6 netcdf-patch1.diff

--- netcdf-4.1.3/fortran/ncfortran.h    2011-07-01 01:22:22.000000000 +0200
+++ netcdf-4.1.3/fortran/ncfortran.h    2011-09-14 14:56:03.000000000 +0200
@@ -658,7 +658,7 @@
  * The following is for f2c-support only.

-#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran)
+#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran) &&!defined(__INTEL_COMPILER)

  * The f2c(1) utility on BSD/OS and Linux systems adds an additional

6.1.7 netcdf-patch2.diff

--- netcdf-4.1.3/nf_test/fortlib.c  2011-09-14 14:58:47.000000000 +0200
+++ netcdf-4.1.3/nf_test/fortlib.c  2011-09-14 14:58:38.000000000 +0200
@@ -14,7 +14,7 @@
 #include "../fortran/ncfortran.h"

-#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran)
+#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran) &&!defined(__INTEL_COMPILER)
  * The f2c(1) utility on BSD/OS and Linux systems adds an additional
  * underscore suffix (besides the usual one) to global names that have

6.1.8 numpy-icc.diff

--- numpy/numpy/distutils/intelccompiler.py 2011-09-08 14:14:03.000000000 +0200
+++ numpy/numpy/distutils/intelccompiler.py 2011-09-08 14:20:37.000000000 +0200
@@ -30,11 +30,11 @@
     """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
     compiler_type = 'intelem'
-    cc_exe = 'icc -m64 -fPIC'
+    cc_exe = 'icc -m64 -fPIC -xHost -O3'
     cc_args = "-fPIC"
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
-        self.cc_exe = 'icc -m64 -fPIC'
+        self.cc_exe = 'icc -m64 -fPIC -xHost -O3'
         compiler = self.cc_exe

6.1.9 numpy-icpc.diff

--- numpy-1.6.1/numpy/distutils/intelccompiler.py   2011-10-06 16:55:12.000000000 +0200
+++ numpy-1.6.1/numpy/distutils/intelccompiler.py   2011-10-10 10:26:14.000000000 +0200
@@ -10,11 +10,13 @@
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
         self.cc_exe = 'icc -fPIC'
+   self.cxx_exe = 'icpc -fPIC'
         compiler = self.cc_exe
+   compiler_cxx = self.cxx_exe
-                             compiler_cxx=compiler,
-                             linker_exe=compiler,
+                             compiler_cxx=compiler_cxx,
+                             linker_exe=compiler_cxx,
                              linker_so=compiler + ' -shared')

 class IntelItaniumCCompiler(IntelCCompiler):

6.1.10 numpy-ifort.diff

--- numpy-1.6.1/numpy/distutils/fcompiler/intel.py.old  2011-10-10 17:52:34.000000000 +0200
+++ numpy-1.6.1/numpy/distutils/fcompiler/intel.py  2011-10-10 17:53:51.000000000 +0200
@@ -32,7 +32,7 @@
     executables = {
         'version_cmd'  : None,          # set by update_executables
         'compiler_f77' : [None, "-72", "-w90", "-fPIC", "-w95"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'compiler_fix' : [None, "-FI"],
         'linker_so'    : ["<F90>", "-shared"],
         'archiver'     : ["ar", "-cr"],
@@ -129,7 +129,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None, "-FI", "-w90", "-fPIC", "-w95"],
         'compiler_fix' : [None, "-FI"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : ["ar", "-cr"],
         'ranlib'       : ["ranlib"]
@@ -148,7 +148,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None, "-FI", "-w90", "-fPIC", "-w95"],
         'compiler_fix' : [None, "-FI"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : ["ar", "-cr"],
         'ranlib'       : ["ranlib"]
@@ -180,7 +180,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None,"-FI","-w90", "-fPIC","-w95"],
         'compiler_fix' : [None,"-FI","-4L72","-w"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : [ar_exe, "/verbose", "/OUT:"],
         'ranlib'       : None
@@ -232,7 +232,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None,"-FI","-w90", "-fPIC","-w95"],
         'compiler_fix' : [None,"-FI","-4L72","-w"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>',"-shared"],
         'archiver'     : [ar_exe, "/verbose", "/OUT:"],
         'ranlib'       : None

6.1.11 pynio-fix-no-grib.diff

--- PyNIO-1.4.1/Nio.py  2011-09-14 16:00:13.000000000 +0200
+++ PyNIO-1.4.1/Nio.py  2011-09-14 16:00:18.000000000 +0200
@@ -98,7 +98,7 @@
         if ncarg_dir == None or not os.path.exists(ncarg_dir) \
           or not os.path.exists(os.path.join(ncarg_dir,"lib","ncarg")):
             if not __formats__['grib2']:
-                return None
+                return "" # "", because an env variable has to be a string.
                 print "No path found to PyNIO/ncarg data directory and no usable NCARG installation found"

6.1.12 scipy-qhull-icc.diff

--- scipy/scipy/spatial/qhull/src/qhull_a.h 2011-02-27 11:57:03.000000000 +0100
+++ scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:42:12.000000000 +0200
@@ -102,13 +102,13 @@
 #elif defined(__MWERKS__) && defined(__INTEL__)
 #   define QHULL_OS_WIN
-#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)
-template <typename T>
-inline void qhullUnused(T &x) { (void)x; }
-#  define QHULL_UNUSED(x) qhullUnused(x);
+/*#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)*/
+/*template <typename T>*/
+/*inline void qhullUnused(T &x) { (void)x; }*/
+/*#  define QHULL_UNUSED(x) qhullUnused(x);*/
 #  define QHULL_UNUSED(x) (void)x;

 /***** -libqhull.c prototypes (alphabetical after qhull) ********************/

6.1.13 scipy-qhull-icc2.diff

--- scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:43:54.000000000 +0200
+++ scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:45:17.000000000 +0200
@@ -102,13 +102,7 @@
 #elif defined(__MWERKS__) && defined(__INTEL__)
 #   define QHULL_OS_WIN
-/*#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)*/
-/*template <typename T>*/
-/*inline void qhullUnused(T &x) { (void)x; }*/
-/*#  define QHULL_UNUSED(x) qhullUnused(x);*/
 #  define QHULL_UNUSED(x) (void)x;

 /***** -libqhull.c prototypes (alphabetical after qhull) ********************/

6.1.14 scipy-spatial-lifcore.diff

--- scipy-0.9.0/scipy/spatial/setup.py  2011-10-10 17:11:23.000000000 +0200
+++ scipy-0.9.0/scipy/spatial/setup.py  2011-10-10 17:11:09.000000000 +0200
@@ -22,6 +22,8 @@
                        # XXX: GCC dependency!
+                       # XXX intel compiler dependency
+                       extra_compiler_args=['-lifcore'],

     lapack = dict(get_info('lapack_opt'))

7 Summary

I hope this helps someone out there saving some time - or even better: improving the upstream projects. At least it should be a nice reference for all who need to get scipy working on not-quite-supported architectures.

Happy Hacking!



1: Actually I already wanted to publish this script more than a year ago, but time flies and there’s always stuff to do. But at least I now managed to get it done.

Author: Arne Babenhauserheide

Created: 2013-09-26 Do

Emacs 24.3.1 (Org mode 8.0.2)


2013-09-26-Do-installing-scipy-and-matplotlib-on-a-bare-cluster-with-the-intel-compiler.org (29.2 KB)

JSON will bite us badly

JSON, the JavaScript Object Notation format, is everywhere nowadays. But there are 3 facts which will challenge its dominance.

  1. CPU cores are not getting much faster.
  2. You can rent VMs per core, and you pay per core.
  3. The network is still getting faster and cheaper, and HTTP/2 reduces the minimum cost per file.

Due to these changes, servers will become CPU bound again, and basic data structures on the web will become much more relevant. But the most efficient parsing of JSON requires guessing the final data structure while reading the data.

Therefore the changing costs will bring a comeback for binary data structures, and WebAssembly will provide efficient parsers and emitters in the clients.
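As a toy illustration of the gap (my own sketch using Python’s json and struct modules, not a benchmark): a fixed binary layout is both smaller and needs no structure-guessing during parsing.

```python
import json
import struct

# 1000 "typical" floats; most serialize to ~20 characters in JSON.
values = [i / 7 for i in range(1000)]

# Text: the parser discovers the structure while scanning characters.
text = json.dumps(values)

# Binary: the layout ("1000 little-endian doubles") is fixed up front,
# so decoding is a single fixed-size copy instead of a scan.
binary = struct.pack("<1000d", *values)

assert list(struct.unpack("<1000d", binary)) == values
print(len(text.encode("utf-8")), "bytes as JSON vs", len(binary), "bytes binary")
```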

Look at a typical website and count how much of the dynamic data it uses is structured data. Due to this I expect that 5 years from now, there will be celebrity talks with titles like

Scaling 10x higher with streams of structured data.

(And yes, it is a problem that tech communication often works like this.)

If you have deep-rooted doubts, have a look at Towards a JavaScript Binary AST, which convinced me to finally publish this article.

(and parsing JSON is a minefield)

Memory requirement of Python datastructures: numpy array, list of floats and inner array

Easily answering the question: “How much space does this need?”


We just had the problem of finding out whether a given dataset will be shareable without complex trickery. So we took the easiest road and checked the memory requirements of the datastructure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

The test

We just created a three-dimensional numpy array of floats and then looked at the memory requirement in the system monitor - conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

All our tests are done in Python3.
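If you prefer an in-process measurement over the system monitor, CPython’s tracemalloc gives similar per-value numbers for plain Python data; a minimal sketch (note that tracemalloc only sees allocations made through Python’s allocator, so numpy’s internal buffers may not show up):

```python
import tracemalloc

N = 10**6
tracemalloc.start()
values = [float(i) for i in range(N)]
size, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Roughly 32 Bytes per float: 24 for each float object plus
# 8 for the list's pointer slot (and some list over-allocation).
print(size / N, "bytes per value")
```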


Numpy

For numpy we just create an array of random values cast to floats:

import numpy as np
a = np.array(np.random.random((100, 100, 10000)), dtype="float")

Also we tested what happens when we use "f4" and "f2" instead of "float" as dtype in numpy.
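The per-value sizes follow directly from the dtype; a quick check of numpy’s itemsize for the three dtypes used here:

```python
import numpy as np

# itemsize gives the Bytes per value for each dtype used above:
# "float" means float64 (8 Bytes), "f4" float32, "f2" float16.
for dtype, expected in [("float", 8), ("f4", 4), ("f2", 2)]:
    a = np.zeros(10, dtype=dtype)
    assert a.itemsize == expected
    assert a.nbytes == 10 * expected
print("ok: float=8, f4=4, f2=2 Bytes per value")
```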

Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

import numpy as np
a = [[[float(i) for i in j] for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

Array module

Instead of using the full-blown numpy, we can also turn the inner list into an array.

import array
import numpy as np
a = [[array.array("d", [float(i) for i in j]) for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

The results

With a numpy array we need roughly 8 Bytes per float. A Python list however requires roughly 32 Bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4.

Using an inner array (via the array module) instead of the innermost list provides roughly the same gains.

The 32 Bytes are not due to a linked list: a Python list stores an 8 Byte pointer per entry, and each entry points to a full float object of 24 Bytes.
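These numbers can be checked directly with sys.getsizeof (assuming 64-bit CPython):

```python
import sys

# On 64-bit CPython a float object takes 24 Bytes:
# 8 (refcount) + 8 (type pointer) + 8 (the C double itself).
print(sys.getsizeof(1.0))  # 24

# A list stores one 8-Byte pointer per element (plus a small header).
values = [0.0] * 1000
per_slot = (sys.getsizeof(values) - sys.getsizeof([])) / 1000
print(per_slot)  # 8.0

# 24-Byte object + 8-Byte pointer = 32 Bytes per stored float.
print(24 + per_slot, "bytes per float in a list")
```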

The details are in the following table.

Table 1: Memory requirement of different ways to store values in Python

                          total memory   per value
  list of floats          3216.6 MiB     32.166 Bytes
  numpy array of floats    776.7 MiB      7.767 Bytes
  np f4                    395.2 MiB      3.95  Bytes
  np f2                    283.4 MiB      2.834 Bytes
  inner array              779.1 MiB      7.791 Bytes

This test was conducted on a 64 bit system, so floats are equivalent to doubles.

The scipy documentation provides a list of all the possible dtype definitions cast to C-types.


In Python, large numpy arrays require 4 times less memory than a nested list structure with the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.

Ogg Theora and h.264 - which video codec as standard for internet-video?

- Video encoder comparison - a much more thorough comparison than mine

We had a kinda long discussion on identi.ca about Ogg Theora and h.264, and since we lacked a simple comparison method, I hacked up a quick script to test them.

It uses frames from Big Buck Bunny and outputs the files bbb.ogg and bbb.264 (license: cc by).

The ogg file looks like this:

The h.264 file looks like this: download


What you can see by comparing both is that h.264 wins in terms of raw image quality at the same bitrate (single pass).

So why am I still strongly in favor of Ogg Theora?

The reason is simple:

Due to the licensing costs of h.264 (a few million per year, due from 2015 onwards), making h.264 the standard for internet video would have the effect that only big companies would be able to make a video-enabled browser - or we would get a kind of video tax for free software: if you want to view internet video with free software, you have to pay for the right to use the x264 library (else the developers couldn't cough up the money to pay for the patent license). And no one but the main developers and huge corporations could distribute the x264 library, because they’d have to pay license fees for that.

And no one could hack on the browser or library and distribute the changed version, so the whole idea of free software would be led ad absurdum. It wouldn't matter that all the code would be under free licenses, since only those with an h.264 patent license could change it.

So this post boils down to a simple message:

“Support !theora against h.264 and #flash [as video codec for the web]. Otherwise only big companies will be able to write video browsers - or we get a h.264 tax on !fs”

Theora’s raw quality may still be worse, but the license costs and their implications provide very clear reasons for supporting Theora - which in my view are far more important than the raw technical numbers.

The test-script

for k in {0..1}; do
    for i in {0..9}; do
        for j in {0..9}; do
            wget http://media.xiph.org/BBB/BBB-360-png/big_buck_bunny_00$k$i$j.png
        done
    done
done

mplayer -vo yuv4mpeg -ao null -nosound mf://*png -mf fps=50

theora_encoder_example -z 0 --soft-target -V 400 -o bbb.ogg stream.yuv

mencoder stream.yuv -ovc x264 -of rawvideo -o bbb.264 -x264encopts bitrate=400 -aspect 16:9 -nosound -vf scale=640:360,harddup

bbb-400bps.ogg (212.88 KB)
bbb-400bps.264 (214.39 KB)
encode.sh (428 Bytes)

Phoronix conclusions distort their results, shown with the example of GCC vs. LLVM/Clang On AMD's FX-8350 Vishera

Phoronix recently did a benchmark of GCC vs. LLVM on AMD hardware. Sadly their conclusion did not fit the data they showed. Actually it misrepresented the data so strongly that I decided to speak up here instead of having my comments disappear in their forums. This post was started on 2013-05-14 and got updates when things changed - first for the better, then for the worse.

Update 3 (the last straw, 2013-11-09): In the most recent and most blatant attack by Phoronix on copyleft programs - this time openly targeted at GNU - Michael Larabel directly misrepresented a post from Josh Klint to badmouth GDB (Josh confirmed this1). Josh gave a report of his initial experience with GDB in a Kickstarter Update in which he reported some shortcomings he saw in GDB (of which the major gripe is easily resolved with better documentation2) and concluded with “the limitations of GDB are annoying, but I can deal with it. It's very nice to be able to run and debug our editor on Linux”. Michael Larabel only quoted the conclusion up to “annoying” and abused that to support the claim that game developers (in general) call GDB “crap” and for further badmouthing of GDB. With this he provided the straw which I needed to stop reading Phoronix: Michael Larabel is hostile to copyleft and in particular to GNU and he goes as far as rigging test results3 and misrepresenting words of others to further his agenda. I even donated to Phoronix a few times in the past. I guess I won’t do that again, either. I should have learned from the error of the German pirates and should have avoided reading media which is controlled by people who want to destroy what I fight for (sustainable free software).
Update 2 (2013-07-06): But the next one went down the drain again… “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.” — I couldn’t find a better way to say that those tests are completely useless while at the same time devaluing OpenMP support as “ignore this result along with all others where GCC wins”…
Update (2013-06-21): The recent report of GCC 4.8 vs. LLVM 3.3 looks much better. Not perfect, but much better.

Taking out the OpenMP benchmarks (where GCC naturally won, because LLVM only processes those tests single-threaded) and the build times (which are irrelevant to the speed of the produced binaries), their benchmark had the following result:

LLVM is slower than GCC by:

  • 10.2% (HMMer)
  • 12.7% (MAFFT)
  • 6.8% (BLAKE2)
  • 9.1% (HIMENO)
  • 42.2% (C-Ray)

With these results (which were clearly visible on their result summary on OpenBenchmarking), Michael Larabel from Phoronix concluded:

» The performance of LLVM/Clang 3.3 for most tests is at least comparable to GCC «

Nobu from their Forums supplied a conclusion which represents the data much better:

» GCC is much faster in anything which uses OpenMP, and moderately faster or equal in anything (except compile times) which doesn't [use OpenMP] «

But Michael from Phoronix did not stop at just ignoring the performance difference between GCC and LLVM. He went on claiming, that

In a few benchmarks LLVM/Clang is faster, particularly when it comes to build times.

And this is blatant reality-distortion which I am very tempted to ascribe to favoritism. LLVM is not “particularly” faster when it comes to build times.

LLVM on AMD FX-8350 Vishera is faster ONLY when it comes to build times!

This was not the first time that I read data-distorting conclusions on Phoronix - and my complaints about that in their forum did not change their actions. So I hope that my post here can help make them aware that deliberately distorting test results is unacceptable.

For my work, compiler performance is actually quite important, because I use programs which run for days or weeks, so 10% runtime reduction can mean saving several days - not counting the cost of using up cluster time.
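As a back-of-the-envelope sketch (the three-week job length is an assumed example, not a measured one):

```python
runtime_days = 21  # an assumed three-week cluster job
speedup = 0.10     # 10% runtime reduction from faster binaries

days_saved = runtime_days * speedup
print(round(days_saved, 2))  # → 2.1 days saved per run
```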

To fix their blunders, what they would have to do is:

  • Avoid benchmarks which only one compiler supports properly (OpenMP).
  • Mark the compile-time tests explicitly, so they stand out clearly from the rest, because they measure a completely different parameter than the other tests: compiler runtime vs. performance of the compiled binaries.
  • Write conclusions which actually fit their results.

Their current approach gives a distinct disadvantage to GCC (even for the OpenMP tests, because they convey the notion that if LLVM only had OpenMP, it would be better in everything - which as this test shows is simply false), so the compiler-tests from Phoronix work as covert propaganda against GCC, even in tests where GCC flat-out wins. And I already don’t like open propaganda, but when the propaganda gets masked as objective testing, I actually get angry.

I hope my post here can help move them towards doing proper testing again.

PS: I write so strongly here because I actually like the tests from Phoronix a lot. I think we need more rather than less testing, and their testsuite actually seems to do a good job - when given the right parameters - so seeing Phoronix distort the tests to a point where they become almost useless (except as a political tool against GCC) is a huge disappointment to me.

  1. Josh Klint from Leadwerks confirmed that Phoronix misrepresented his post and wrote a followup-post: » @ArneBab That really wasn't meant to be controversial. I was hoping to provide constructive feedback from the view of an Xcode / VS user.« » Slightly surprised my complaints about GDB are a hot topic. I can make just as many criticisms of other compilers and IDEs.« » The first 24 hours are the best for usability feedback. I figure if they notice a pattern some of those things will be improved.« » GDB Follwup «@Leadwerks, 2:04 AM - 11 Nov 13, 2:10 AM - 11 Nov 13 and @JoshKlint, 2:07 AM - 11 Nov 13, 8:48 PM - 11 Nov 13

  2. The first-impression criticism from Josh Klint was addressed by a Phoronix reader by pointing to the frame command. I do not blame Josh for not knowing all tricks: He wrote a fair account of his initial experience with GDB (and he said later that he wrote the post after less than 24 hours of using GDB, because he considers that the best time to provide feedback) and his experience can serve as constructive criticism to improve tutorials, documentation and the UI of GDB. Sadly his visibility and the possible impact of his work on free software made it possible for Phoronix to abuse a personal report as support for a general badmouthing of the tool. In contrast the full message of Josh Klint ended really positive: Although some annoyances and limitations have been discovered, overall I have found Linux to be a completely viable platform for application development. — Josh Klint, Leadwerks 

  3. I know that rigging of tests is a strong claim. The actions of Michael Larabel deserve to be called rigging for three main reasons: (1) including compile-time data along with runtime performance without a clear distinction between the two, even though compile time of the full code is mostly irrelevant when you use a proper build system, and compile time and runtime are completely different classes of results, (2) including pointless tests between incomparable setups whose only use is to relativize any weakness of his favorite system and (3) blatantly lying in the summaries (as I show in this article). 

Python for beginning programmers

(written on ohloh for Python)

Since we already have two good reviews from experienced programmers, I'll focus on the area I know about: Python as first language.

My experience:

  • I began to get into coding only a short time ago. I already knew about processes in programs, but not how to get them into code.
  • I wanted to learn C/C++ and failed at general structure. After a while I could do it, but it didn't feel right.
  • I tried my luck with Java and didn't quite get going.
  • Then I tried Python, and got in at once.

Advantages of Python:

  • The structure of programs can be understood easily.
  • The Python interpreter lets you experiment very quickly.
  • You can realize complex programs, but Python also allows for quick and simple scripting.
  • Code written by others is extremely readable.
  • And coding just flows - almost like natural speaking/thinking.

How it looks:

def hello(user):
    print("Hello " + user + "!")

hello("Fan")
# prints Hello Fan! on screen

As a bonus, there is the great open book How to Think Like a Computer Scientist which teaches Python and is being used for teaching Python and Programming at universities.

So I can wholeheartedly recommend Python to beginners in programming, and as the other reviews on Ohloh show, it is also a great language for experienced programmers and seems to be a good language to accompany you in your whole coding life.

PS: Yes, I know about the double meaning of "first language" :)

Recursion wins!

I recently read the little schemer and that got me thinking about recursion and loops.

After starting my programming life with Python, I normally use for-loops to solve problems. But actually they are an inferior mechanism when compared to recursion, if the language provides proper syntactic support for that. Since that claim pretty much damns Python on a theoretical level (even though it is still a very good tool in practice and I still love it!), I want to share a simplified version of the code which made me realize this.

Let’s begin with how I would write that code in Python.

res = ""
instring = False
for letter in text:
    if letter == "\"":
        # special conditions for string handling go here
        # lots of special conditions
        # and more special conditions
        # which cannot easily be moved out,
        # because we cannot skip multiple letters
        # in one step
        instring = not instring
    if instring:
        res += letter
    # other cases

Did you spot the comment “special conditions go here”? That’s the point which damns for-loops: You cannot easily factor out these special conditions.1 In this example all the complexity is in the variable instring. But depending on the use case, this could require lots of different states being tracked within the loop, cluttering up the namespace as well as entangling complexity from different parts of the loop.

This is how the same could be done with proper let-recursion:

; first get SRFI-71: multi-value let for syntactic support for what I
; want to do
use-modules : srfi srfi-71

let process-text
    : res ""
      letter : string-take text 1
      unprocessed : string-drop text 1
    when : equal? letter "\""
               ; all the complexity of string-handling is neatly
               ; confined in the helper-function consume-string
               : (to-res next-letter still-unprocessed) : consume-string unprocessed
                   string-append res to-res
                   . next-letter
                   . still-unprocessed
    ; other cases

The basic code for recursion is a bit longer, because the new values for the next step of the processing are given explicitly. But it is almost trivial to split out parts of the loop into another function: it just needs to return the next state of the recursion.

And that’s what consume-string does:

define : consume-string text
        : res ""
          next-letter : string-take text 1
          unprocessed : string-drop text 1
        ; lots of special handling here
        values res next-letter unprocessed

To recite from the Zen of Python:

Explicit is better than implicit.

It’s funny to see how Guile Scheme allows me to follow that principle more thoroughly than Python.

(I love Python, but this is a case where Scheme simply wins - and I’m not afraid to admit that)

PS: Actually I found this technique when thinking about use-cases for multiple return-values of functions.

PPS: This example uses wisp-syntax for the scheme-examples to avoid killing Pythonistas with parens.

  1. While you cannot factor out parts of for loops easily, functions which pass around iterators get pretty close to the expressivity of tail recursion. They might even go a bit further and I already missed them for some scheme code where I needed to generate expressions step by step from a function which always returned an unspecified number of expressions per call. If Python continues to make it easier to use iterators, they could reduce the impact of the points I make in this article. 
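The footnote’s iterator idea can be sketched in Python. The helper consume_string here is hypothetical and mirrors the Scheme version above; the behavior (keeping only the contents of quoted strings) is simplified for illustration:

```python
def consume_string(letters):
    """Consume one quoted string from the shared letter iterator.
    All string-handling complexity lives here, outside the loop."""
    res = []
    for letter in letters:
        if letter == '"':
            break  # closing quote: hand control back to the caller
        res.append(letter)
    return "".join(res)

def process(text):
    """Keep only the contents of quoted strings in text."""
    res = []
    letters = iter(text)  # one iterator, shared with the helper
    for letter in letters:
        if letter == '"':
            # the helper advances the shared iterator past the
            # closing quote - several letters in one step
            res.append(consume_string(letters))
        # other cases
    return "".join(res)

print(process('say "hi" now'))  # → hi
```

Because the loop and the helper share one iterator, the helper can skip multiple letters in one step, which a plain indexed for-loop cannot easily do.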

2014-03-05-Mi-recursion-wins.org3.36 KB

Reducing the Python startup time

The Python startup time always nagged me (17-30ms), and I just searched again for a way to reduce it, when I found this:

The Python-Launcher caches GTK imports and forks new processes to reduce the startup time of python GUI programs.

Python-launcher does not solve my problem directly, but it points into an interesting direction: If you create a small daemon which you can contact via the shell to fork a new instance, you might be able to get rid of your startup time.

To get an example of the possibilities, download the python-launcher and socat and do the following:

PYTHONPATH="../lib.linux-x86_64-2.7/" python python-launcher-daemon &
echo pass > 1
for i in {1..100}; do
    echo 1 | socat STDIN UNIX-CONNECT:/tmp/python-launcher-daemon.socket &
done

Todo: Adapt it to a given program and remove the GTK stuff. Note the & at the end: Closing the socket connection seems to be slow, so I just don’t wait for socat to finish. Breaks at somewhere over 200 simultaneous connections. Option: Use a datagram socket instead.

The essential trick is to just create a server which opens a socket. Then it reads all the data from the socket. Once it has the data, it forks like the following:

        pid = os.fork()
        if pid:
            # parent: return to the accept loop
            return

        # child: reset signal handlers, then run the requested program
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)

        glob = dict(__name__="__main__")
        print 'launching', program
        execfile(program, glob, glob)

        raise SystemExit

Running a program that way 100 times took just 0.23 seconds for me, so the Python startup time of 17ms got reduced to 2.3ms.

You might want to switch from forking to just executing the code directly if you want to be even faster and the code snippets are small. For example, when running the same test without the fork and the signals, 100 executions of the same code took just 0.09s, cutting down the startup time to an impressive 0.9ms - at the cost of no longer running in parallel.

(That’s what I also do with emacsclient… My emacs takes ~30s to start (due to excessive use of additional libraries I added), but emacsclient -c shows up almost instantly.)

I tested the speed by just sending a file with the following snippet to the server:

import time
with open("2", "a") as f:
    f.write(str(time.time()) + "\n")

Note: If your script only needs the included Python libraries (batteries) and no custom-installed libs, you can also reduce the startup time by avoiding site initialization:

python -S [script]

Without -S python -c '' takes 0.018s for me. With -S I am down to

time python -S -c '' → 0.004s. 

Note that you might miss some installed packages that way. This is slower than the daemon method by up to a factor of 4 (4ms instead of 0.9ms), but still faster than the default. Note that cold disk buffers can make the difference much bigger on the first run, which is not relevant in this case, but very much relevant in general for the impression of startup speed.
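If you want to reproduce these timings on your own machine, here is a small sketch using the subprocess module (the absolute numbers will differ from mine):

```python
import subprocess
import sys
import time

def startup_time(args, runs=10):
    """Average wall-clock time of starting the given command."""
    start = time.time()
    for _ in range(runs):
        subprocess.call(args)
    return (time.time() - start) / runs

default = startup_time([sys.executable, "-c", "pass"])
no_site = startup_time([sys.executable, "-S", "-c", "pass"])
print(default, no_site)  # -S should come out noticeably faster
```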

PS: I attached the python-launcher 0.1.0 in case its website goes down. License: GPL and MIT; included. This message was originally written at stackoverflow.

python-launcher-0.1.0.tar.gz11.11 KB

Relicensing a project from GPLv2 or later to AGPLv3 or later

Switching from GPLv2 or later to AGPL is perfectly legal. But if it is not your own project, it is often considered rude.

This does not relicense the original code, it just sets the license of new code and of the project as a whole. The old code stays GPLv2+, but when it is combined with the new code under AGPLv3 (or later), the combined project will be under AGPLv3 (or later).

However, switching from GPLv2+ to AGPLv3(+) without the consensus of all other contributors is considered rude, because it could prevent some of the original authors from using future versions of the project: their professional use of the project might depend on the loopholes in the copyleft of the GPL.

And the ones you will want most of all as users of your fork of a mostly discontinued project are the original authors, because that can mend the split between the two versions.

This question came up in a continuation of a widely used package whose development seemed to have stalled. The discussion was unfocused, so I decided to write succinct information for all who might find themselves in a similar situation. I will not link to the discussion, because I do not wish to re-ignite it through an influx of rehashed arguments.

Replacing man with info

GNU info is lightyears ahead of man in terms of features, with sub-pages, clickable links, topic-spanning search, clean HTML and LaTeX export, and efficient interactive navigation.

But man pages are still the de-facto standard for getting quick information on a GNU/Linux system.

This guide intends to help you change that for your system. It needs GNU texinfo >= 6.1.

Update: If you prefer vi-keys, adjust the function below to call info --vi-keys instead of plain info. You could then call that function iv.

1 Advantages of man-pages over pristine info

I see strong reasons for sticking to man pages instead of info: man pages provide what most people need right away (how to use this?) and they fail fast if the topic is not available.

Their advanced features are mostly hidden away (e.g. checking the Linux programmer's manual instead of checking installed programs: man 2 stat vs. man stat).

In contrast, the default error state of info is to show you all the other info nodes, in which you are really not interested at the moment. And man basename gives you the commandline invocation of the basename utility, while info basename gives you libc's "5.8 Finding Tokens in a String".

Also man is fast. And works on most terminals, while info fails at dumb ones.

In short: man does what most users need right now, and if it can’t do that, it simply fails, so the user can try something else. That’s a huge UI advantage, but not due to an inherent limitation of GNU info. GNU Info can do the same, and even defer to man pages for stuff for which there is no info document. It just does not provide that conveniently by default.

2 Fixing GNU info with a simple bash function

GNU Info can provide the same useful interface as man. So let’s make it do that.

To keep all flexibility without needing to adjust the PATH, let’s make a bash function. That function can go into ~/.bashrc, or /etc/bash/bashrc.1 I chose the latter, because it provides the function for all accounts on the system and keeps it separate from the general setup.

The function will be called i: To get information about any thing, just call i thing.

Let’s implement that:

function i() {
    INFOVERSIONLINE=$(info --version | head -n 1)
    INFOVERSION=${INFOVERSIONLINE##* }
    INFOGT5=$(if test ${INFOVERSION%%.*} -gt 5; then echo true; else echo false; fi)
    # start with special cases which are quick to check for
    if test $# -lt 1; then
        # show info help notice
        info --help
    elif test $# -gt 1 && ! echo $1 | grep -q "[0-9]"; then
        # user sent complex request, but not with a section command. Just use info
        info "$@"
    elif test $# -gt 1 && echo $1 | grep -q "[0-9]"; then
        # user sent request for a section from the man pages, we must defer to man
        man "$@"
    elif test x"$1" = x"info"; then
        # for old versions of info, calling info --usage info fails to
        # provide info about calling info
        if test x"$INFOGT5" = x"true"; then
            info --usage info
        else
            info --usage -f info-stnd
        fi
    elif test x"$1" = x"man"; then
        # info --all -w ./man fails to find the man man page
        info man
    else
        # start with a fast but incomplete info lookup
        INFOPAGELOCATION="$(info --all -w ./"$@" | head -n 1)"
        INFOPAGELOCATION_PAGENAME="$(info --all -w "$1".info | head -n 1)"
        INFOPAGELOCATION_COREUTILS="$(info -w coreutils -n "$@")"
        # check for usage from fast info, if that fails check man and
        # if that also fails, just get the regular info page.
        if test x"${INFOPAGELOCATION}" = x"*manpages*" || test x"${INFOPAGELOCATION}" != x""; then
            info "$@" # use info to read the known page, man or info
        elif test x"${INFOPAGELOCATION_COREUTILS}" != x"" && info -f "${INFOPAGELOCATION_COREUTILS}" -n "$@" | head -n 1 | grep -q -i "$@"; then
            # coreutils utility
            info -f "${INFOPAGELOCATION_COREUTILS}" -n "$@"
        elif test x"${INFOPAGELOCATION}" = x"" && test x"${INFOPAGELOCATION_PAGENAME}" = x""; then
            # unknown to quick search, try slow search or defer to man.
            # TODO: it would be nice if I could avoid this double search.
            if test x"$(info -w "$@")" = x"*manpages*"; then
                info "$@"
            else
                # defer to man, on error search for alternatives
                man "$@" || (echo nothing found, searching info ... && \
                             while echo $1 | grep -q '^[0-9]$'; do shift; done && \
                             info -k "$@" && false)
            fi
        elif test x"${INFOPAGELOCATION_PAGENAME}" != x""; then
            # try to get usage instructions, then try man, then
            # search for alternatives (but avoid numbers)
            info --usage -f "${INFOPAGELOCATION_PAGENAME}" 2>/dev/null || man "$@" ||\
                (echo searching info &&\
                 while echo $1 | grep -q '^[0-9]$'; do shift; done && \
                 info -k "$@" && false)
        else # try to get usage instructions, then try man, then
             # search for alternatives (but avoid numbers)
            info --usage -f "${INFOPAGELOCATION}" 2>/dev/null || man "$@" ||\
                (echo searching info &&\
                 while echo $1 | grep -q '^[0-9]$'; do shift; done && \
                 info -k "$@" && false)
        fi
    fi
    # ensure that unsuccessful requests report an error status
    INFORETURNVALUE=$?
    if test ${INFORETURNVALUE} -eq 0; then
        unset INFORETURNVALUE
        return 0
    else
        unset INFORETURNVALUE
        return 1
    fi
}

3 Examples

Let’s see what that gives us.

3.1 First check: Getting info on info:

i info | head
echo ...
Next: Cursor Commands,  Prev: Stand-alone Info,  Up: Top

2 Invoking Info

GNU Info accepts several options to control the initial node or nodes
being viewed, and to specify which directories to search for Info files.
Here is a template showing an invocation of GNU Info from the shell:

     info [OPTION...] [MANUAL] [MENU-OR-INDEX-ITEM...]

3.2 Second check: Some random GNU command

i grep | head | sed 's/\[[0-9]*m//g' # stripping simple colors
echo ...
Next: Regular Expressions,  Prev: Introduction,  Up: Top

2 Invoking ‘grep’

The general synopsis of the ‘grep’ command line is


There can be zero or more OPTIONS.  PATTERN will only be seen as such

Note: If there’s a menu at the bottom, you can jump right to its entries by hitting the m key.

3.3 Utility which also exists as libc function

Checking for i stat gives us the stat command:

i stat | head
Next: sync invocation,  Prev: du invocation,  Up: Disk usage

14.3 ‘stat’: Report file or file system status

‘stat’ displays information about the specified file(s).  Synopsis:

     stat [OPTION]… [FILE]…

   With no option, ‘stat’ reports all information about the given files.

…while checking for i libc stat gives us the libc function:

i libc stat | head
Next: Testing File Type,  Prev: Attribute Meanings,  Up: File Attributes

14.9.2 Reading the Attributes of a File

To examine the attributes of files, use the functions 'stat', 'fstat'
and 'lstat'.  They return the attribute information in a 'struct stat'
object.  All three functions are declared in the header file

3.4 Something which only has a man-page

i man cleanly calls info man.

i man | head | sed "s,\x1B\[[0-9;]*[a-zA-Z],,g" # stripping colors
man(1)                      General Commands Manual                     man(1)

       man  -  Formatieren  und Anzeigen von Seiten des Online-Handbuches (man
       manpath - Anzeigen  des  Benutzer-eigenen  Suchpfades  für  Seiten  des
       Online-Handbuches (man pages)

3.5 A request for a man page section

i 2 stat cleanly defers to man 2 stat.

i 2 stat | head | sed "s,\x1B\[[0-9;]*[a-zA-Z],,g" # stripping colors
STAT(2)                    Linux Programmer's Manual                   STAT(2)

       stat, fstat, lstat, fstatat - get file status

       #include <sys/types.h>
       #include <sys/stat.h>

3.6 Something unknown

In case there is no info page directly available, it does a keyword search and proposes sources.

i em | head
echo ...
nothing found, searching info ...
"(emacspeak)Speech System" -- speech system
"(cpio)Copy-pass mode" -- copy files between filesystems
"(tar)Basic tar" -- create, complementary notes
"(tar)problems with exclude" -- exclude, potential problems with
"(tar)Basic tar" -- extract, complementary notes
"(tar)Incremental Dumps" -- extract, using with --listed-incremental
"(tar)Option Summary" -- incremental, summary
"(tar)Incremental Dumps" -- incremental, using with --list
"(tar)Incremental Dumps" -- list, using with --incremental

4 Summary

i thing gives you info on some thing. It makes using info just as convenient as using man.

Its usage even beats man in convenience, since it defers to man if needed, offers alternatives and provides named categories instead of having to remember the handbook numbers to find the right function.

And as developer you can use texinfo to provide high quality documentation in many formats. You can even include a comprehensive tutorial in your documentation while still enabling your users to quickly reach the information they need.

We had this all along, except for a few nasty roadblocks. Here I did my best to eliminate these roadblocks.



  1. Or it can go into /etc/bash/bashrc.d/info.sh (if you have a bashrc directory). That is the cleanest option.

2016-09-12-Mo-replacing-man-with-info.org10.46 KB

Screencast: Tabbing of everything in KDE

I just discovered tabbing of everything in KDE:


Created with recordmydesktop, cut with kdenlive, encoded to ogg theora with ffmpeg2theora (encoding command).

Music: Beat into Submission on Public Domain by Tryad.

To embed the video on your own site you can simply use:


If you do so, please provide a backlink here.

License: cc by-sa, because that’s the license of the song. If you omit the audio, you can also use one of my usual free licenses (or all of them, including the GPL). Here’s the raw recording (=video source).

¹: Feel free to upload the video to youtube or similar. I license my stuff under free licenses to make it easy for everyone to use, change and spread them.

²: Others have shown this before, but I don’t mind that. I just love the feature, so I want to show it :)

³: The command wheel I use for calling programs is the pyRad.

screencast-tabbing-everywhere-kde.ogv10.75 MB

Simple daemon with start-stop-daemon and runit


PDF (to print)

Org (source)

Creating a daemon with almost zero effort.


The example with the start-stop-daemon uses Gentoo OpenRC as root.

The simplest daemon we can create is a while loop:

echo '#!/bin/sh' > whiledaemon.sh
echo 'while true; do true; done' >> whiledaemon.sh
chmod +x whiledaemon.sh

Now we start it as daemon

start-stop-daemon --pidfile whiledaemon.pid \
--make-pidfile --background ./whiledaemon.sh

Top shows that it is running:

top | grep whiledaemon.sh

We stop it using the pidfile:

start-stop-daemon --pidfile whiledaemon.pid \
--stop ./whiledaemon.sh

That’s it.

Hint: To add cgroups support on a Gentoo install, open /etc/rc.conf and uncomment


Then in the initscript you can set the other variables described below that line. Thanks for this hint goes to Luca Barbato!

If you want to ensure that the daemon keeps running without checking a PID file (which might in some corner cases fail because a new process claims the same PID), we can use runsvdir from runit.

daemon with runit

Minimal examples for runit daemons - first as unprivileged user, then as root.

runit as simple user

Create a short-lived script (it simply exits after a bit of work)

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py
chmod +x /tmp/foo.py

Create the daemon folder

mkdir -p ~/.local/run/runit_services/python
ln -sf /tmp/foo.py ~/.local/run/runit_services/python/run

Run the daemon via runsvdir

runsvdir ~/.local/run/runit_services

Manage it with sv (part of runit)

# stop the running daemon
SVDIR=~/.local/run/runit_services/ sv stop python
# start the service (it shows as `run` in top)
SVDIR=~/.local/run/runit_services/ sv start python

runit as root

Minimal working example for setting up runit as root - like a sysadmin might do it.

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py &&
    chmod +x /tmp/foo.py &&
    mkdir -p /run/arne_service/python &&
    printf '#!/bin/sh\nexec /tmp/foo.py' >/run/arne_service/python/run &&
    chmod +x /run/arne_service/python/run &&
    chown -R arne /run/arne_service &&
    su - arne -c 'runsvdir /run/arne_service'

Or without bash indirection (giving up some flexibility we don’t need here)

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py && 
    chmod +x /tmp/foo.py &&
    mkdir -p /run/arne_service/python &&
    ln -s /tmp/foo.py /run/arne_service/python/run &&
    chown -R arne /run/arne_service &&
    su - arne -c 'runsvdir /run/arne_service'
2015-04-15-Mi-simple-daemon-openrc.org2.92 KB
2015-04-15-Mi-simple-daemon-openrc.pdf152.99 KB

Simple positive trust scheme with thresholds

Update: I nowadays think that voting down is useful, but only for protection against spam and intentional disruption of communication. Essentially a distributed function to report spam.

I don’t see a reason for negative reputation schemes — voting down is in my view a flawed concept. It just allows for community censorship, which I see as incompatible with the goals of Freenet.

The rest of this article was written for Freetalk inside Freenet, and also posted there with my non-anonymous ID.

Would it be possible to change that to use only positive votes and a threshold?

  • If I like what some people write, I give them positive votes.
  • If I get too much spam, I increase the threshold for all people.
  • Effective positive votes get added. It suffices that some people I trust also trust someone else and I’ll see the messages.
  • Effective trust is my trust (0..1) · the trust of the next in the chain (0..1) · …


  • Zwister trusts Alice and Bob.
  • Alice trusts Lilith.
  • Bob hates Lilith.

In the current scheme (as I understand it), zwister wouldn’t see posts from Lilith.

In a pure positive scheme, zwister would see the posts. If zwister wants to avoid seeing the posts from Lilith, he has to untrust Alice or ask Alice to untrust Lilith. Add to that a personal (and not propagating) blocking option which allows me to “never see anything from Lilith again”.

Bob should not be able to interfere with me seeing the messages from Lilith, when Alice trusts Lilith.

If zwister’s trust for Alice (0..1) multiplied with Alice’s trust for Lilith (0..1) is lower than zwister’s threshold, zwister doesn’t see the messages.
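The threshold rule can be sketched in a few lines of Python (names from the example above, trust values assumed for illustration):

```python
def effective_trust(chain):
    """Multiply the trust values (each 0..1) along a chain of trusters."""
    trust = 1.0
    for value in chain:
        trust *= value
    return trust

def visible(chain, threshold):
    """Show a message when the chained trust reaches my threshold."""
    return effective_trust(chain) >= threshold

# zwister trusts Alice with 0.8, Alice trusts Lilith with 0.5:
# the effective trust is 0.4, so Lilith's posts are visible at a
# threshold of 0.3, but disappear once zwister raises it to 0.5.
print(visible([0.8, 0.5], 0.3), visible([0.8, 0.5], 0.5))  # → True False
```

Note that Bob’s opinion of Lilith never enters the calculation: only positive trust propagates.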

PS: somehow adapted from Credence, which would have brought community spam control to Gnutella, if Limewire had adopted it.

PPS: And an adaptation for news voting: You give positive votes on news which show up. Negative votes assign a private threshold to the author of the news, so you then only see news from that author which enough people vote for.

Simple steps to attach the GNU Public License (GPL) to your project

Here are the simple steps to attach a GPL license to your source files (written after requests by DiggClone and Bandnet):

For your own project, just add the following text-notice to the header/first section of each of your source-files, commented out in whatever way your language uses:

----------------following is the notice-----------------
* Your Project Name - -you slogan-
* Copyright (C) 2007 - 2007 Your Name
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
* GNU General Public License for more details.
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
The "2007 - 2007" needs to be adjusted to "year when you first put it under the license" - "current year".

Then put the file gpl.txt into the source-folder or a docs folder: http://www.gnu.org/licenses/gpl.txt
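If the project has many source files, prepending the commented notice can be scripted; here is a sketch with a hypothetical helper name, not a standard tool:

```python
def with_license_header(source, notice, comment_prefix="# "):
    """Return the source text with the notice prepended,
    commented out in the language's line-comment style."""
    header = "".join(comment_prefix + line + "\n"
                     for line in notice.splitlines())
    return header + source

# example: add a two-line notice to a Python file's contents
notice = "Your Project Name - -your slogan-\nCopyright (C) 2007 Your Name"
print(with_license_header("print('hello')\n", notice))
```

Adjust comment_prefix per language (e.g. "// " for C-style files) and write the result back to each file.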

If you are developing together with other people, you need their permission to put the project under the GPL.


Just for additional info, I found this license comparison paper by Sun: http://mediacast.sun.com/share/webmink/SunLicensingWhitePaper042006.pdf

And comments to it: http://blogs.sun.com/webmink/entry/open_source_licensing_paper#comments

It does look nice, but it misses one point:

GPL is trust: Contributors can trust, that their contributions will keep helping the community, and that the software they contribute to will keep being accessible for the community.

(That's why I decided some years ago to only support GPL projects. My contributions to one semi-closed project got lost, because the project wasn't free and the developer just decided not to offer them anymore, and I could only watch hundreds of hours of work disappear, and that hurt.)

Best wishes,
PS: If anything's missing, please write a comment!

Some Python Programs of mine

heavily outdated page. See bitbucket.org/ArneBab for many more projects…


I created some projects with pyglet and some tools to facilitate 2D
game development (for me), and I though you might be interested.

  • babglet: basic usage of pyglet for 2D games with optional collision
    detection and avoidance.
  • blob_swarm: a swarm of blobs with emerging swarm behaviour through only pair relations.
  • blob_battle: a duel-style battle between two blobs (basic graphics,
    control and movement done)
  • fuzzy_collisions: 2 groups of blobs. One can be controlled. When two
    blobs collide, they move away a (random) bit to avoid the collision.

They are available from the rpg-1d6 project on SourceForge:
-> https://sf.net/projects/rpg-1d6/

The download can be found at the sf.net download page:
-> https://sourceforge.net/project/showfiles.php?group_id=199744

Strengths and weaknesses of Python

a reply I wrote on quora.

Python is easy to learn and low ceremony. Both are pretty hard targets to hit. It also has great libraries for scientific work, for system scripting and for web development — and for most everything else. And it is pragmatic in a sense: It gets stuff done. And in a way which others can typically understand easily. Which is an even harder target to hit, especially with low ceremony languages. If you look for reasons, import this aka PEP 20 -- The Zen of Python is a good start.

Python has rightfully been called “Pseudocode which actually runs”. There’s often no need for pseudocode if you can show some Python.

However, it has its weaknesses. Many here already talked about performance. I won’t go there, because you can fix most of that with cython, pypy and time (as the JavaScript engines in browsers show, which often reach 50% of the speed of optimized C). What irks me are some limitations in its syntax which I began to hit more and more about two years ago.

List comprehensions make real code more complicated than the simple examples suggest, because they add a kind of dual syntax. And there is some ceremony in tools which were added later. For example, this is the template I nowadays use to start a Python project: a minimal Python script. This could be part of the language, so that I would not even need to put it into the script. But that is not how history works: the language cannot break backwards compatibility (a fate which hits all useful and widespread programming languages). Also, things like having to spell out the underscore names feel more and more strange to me. Therefore I started into Guile Scheme to see how different programming could be if I shed the constraints of Python. You can read about my journey in py2guile: Going from Python to Guile Scheme - a natural progression (a free ebook).
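To illustrate the dual-syntax point with a toy example of my own (not from the original answer): the same transformation can be written as a statement-based loop or as a comprehension, and real code ends up mixing both styles:

```python
# One transformation, two syntaxes: a statement-based loop
# and a comprehension. Both are idiomatic Python.
even_squares_loop = []
for n in range(6):
    if n % 2 == 0:
        even_squares_loop.append(n * n)

even_squares_comp = [n * n for n in range(6) if n % 2 == 0]

print(even_squares_loop)  # [0, 4, 16]
print(even_squares_comp)  # [0, 4, 16]
```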

Also see my other Python-articles on this site.

Surprising behaviour of Fortran (90/95)

1 Introduction

I recently started really learning Fortran (as opposed to just dabbling with existing code until it did what I wanted it to).

Here I document the surprises I found along the way.

If you want a quick start into Fortran, I’d suggest beginning with the tutorial Writing a commandline tool in Fortran and then coming back here to get the corner cases right.

As reference: I come from Python, C++ and Lisp, and I actually started to like Fortran while learning it. So the horror-stories I heard while studying were mostly proven wrong. I uploaded the complete code as base60.f90.

2 Testing Skeleton

This is a code sample for calculating a base60 value from an integer.

The surprises are taken out of the program and marked with double angle brackets («surprise»). They are documented in the chapter Surprises.

program base60
  ! first step: Base60 encode. 
  ! reference: http://faruk.akgul.org/blog/tantek-celiks-newbase60-in-python-and-java/
  ! 5000 should be 1PL
  implicit none
  «declare the return type of numtosxg»
  «test numtosxg»
end program base60

«header of function numtosxg»
  implicit none
  !!! preparation
  «constants»
  ! work variables
  integer :: n = 0
  integer :: remainder = 0
  «input argument»
  ! result
  «result variable»
  ! actual algorithm
  «reset the result»
  ! catch number = 0
  if (number == 0) then
     «set the result to "0"»
  end if
  ! calculate the base60 string
  n = number ! the input argument: that should be safe to use.
  do while(n > 0)
     remainder = mod(n, 60)
     n = n/60
     «prepend the base60 character for remainder to the result»
     ! write(*,*) number, remainder, n
  end do
«return the result and end the function»
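For orientation, here is my own Python sketch of the same algorithm, following the NewBase60 reference linked above (the character set omits I, O and l to avoid confusion with 1 and 0):

```python
# Sketch of the base60 (NewBase60) encoding in Python, following
# Tantek Celik's NewBase60 character set.
BASE60CHARS = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"

def numtosxg(number):
    """Return the base60 string for a non-negative integer."""
    if number == 0:
        return "0"
    res = ""
    n = number
    while n > 0:
        n, remainder = divmod(n, 60)
        # Python indexing starts at 0, so no +1 is needed here
        res = BASE60CHARS[remainder] + res
    return res

print(numtosxg(5000))  # 1PL, as the comment in the program says
```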

2.1 Helpers

write(*,*) 0, trim(numtosxg(0))
write(*,*) 100000, trim(numtosxg(100000))
write(*,*) 1, trim(numtosxg(1))
write(*,*) 2, trim(numtosxg(2))
write(*,*) 60, trim(numtosxg(60))
write(*,*) 59, trim(numtosxg(59))

3 Surprises

3.1 I have to declare the return type of a function in the main program and in the function

! I have to declare the return type of the function in the main program, too.
character(len=1000) :: numtosxg
character(len=1000) function numtosxg( number )

Instead of declaring the return type in the function’s header, I can also declare it in the declaration block inside the function body:

function numtosxg (number)
  character(len=1000) :: numtosxg
end function numtosxg

3.2 Variables in Functions accumulate over several function calls

This happens even when I initialize the variable in its declaration:

character(len=1000) :: res = ""

Because of that, I have to begin the algorithm by resetting the required variable.

res = " " ! I have to explicitly set res to " ", otherwise it
          ! accumulates the prior results!

This provides a hint that initialization in a declaration inside a function happens only once, not on every call.

program accumulate
  implicit none
  integer :: acc
  write(*,*) acc(), acc(), acc() ! prints 1 2 3
end program accumulate

integer function acc()
  implicit none
  integer :: ac = 0
  ac = ac + 1
  acc = ac
end function acc
If the variable is instead assigned in the function body rather than initialized in its declaration, each call starts from zero:

program accumulate
  implicit none
  integer :: acc
  write(*,*) acc(), acc(), acc() ! prints 1 1 1
end program accumulate

integer function acc()
  implicit none
  integer :: ac
  ac = 0
  ac = ac + 1
  acc = ac
end function acc
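This Fortran behaviour (initialization in the declaration happens only once) has a loose Python analogue in mutable default arguments, which are also evaluated only once, at definition time. A toy comparison of mine:

```python
# Loose Python analogue of Fortran's one-time initialization:
# the default list is created once, when the function is defined,
# so state accumulates across calls.
def acc(store=[]):
    store.append(1)
    return len(store)

print(acc(), acc(), acc())  # prints 1 2 3, like the first Fortran version

# Resetting inside the body on every call gives 1 1 1 instead:
def acc_reset():
    ac = 0
    ac += 1
    return ac

print(acc_reset(), acc_reset(), acc_reset())  # prints 1 1 1
```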

3.3 parameter vs. intent(in)

Defining a variable as parameter gives a constant, not an unchanged function argument:

! constants: marked as parameter: not function parameters, but
! algorithm parameters!
character(len=61), parameter :: base60chars = "0123456789"&

An argument the function is not allowed to change is defined via intent(in):

! input: ensure that this is purely used as input.
! intent is only useful for function arguments.
integer, intent(in) :: number

3.4 To return values from functions, assign the value to the function itself

This feels obvious in retrospect, but it was surprising to me nonetheless.

numtosxg = "0"

The return statement is only needed when returning from the middle of a function. At the end of the function it is implied.

  numtosxg = res
end function numtosxg

3.5 Fortran array indices start at 1 - and are inclusive

For an algorithm like the example base60, where 0 is identified by the first character of a string, this requires adding 1 to the index.

! note that fortran indices start at 1, not at 0.
res = base60chars(remainder+1:remainder+1)//trim(res)

Also note that the indices are inclusive. The following actually gets the single letter at index n+1:

base60chars(n+1:n+1)

In Python, on the other hand, the second index of a slice is exclusive, so to get the same result you would use [n:n+1]:

base60chars[n:n+1]
3.6 I have to trim strings when concatenating

It is necessary to get rid of trailing blanks (the whitespace from the last character to the end of the declared memory), otherwise there will be huge gaps in combined strings - or missing characters.

program test
  character(len=5) :: res
  write(*,*) res ! undefined. In the last run it gave me null-bytes, but
                 ! that is not guaranteed.
  res = "0"
  write(*,*) res ! 0
  res = trim(res)//"a"
  write(*,*) res ! 0a
  res = res//"a"
  write(*,*) res ! 0a: trailing characters are silently removed.
  ! who else expected to see 0aa?
  write(res, '(a, "a")') trim(res) ! without trim, this gives an error!
                                   ! *happy*
  write(*,*) res
end program test

Hint from Alexey: use trim(adjustl(…)) to get rid of whitespace on the left and the right side of the string. Trim only removes trailing blanks.

Author: Arne Babenhauserheide

Emacs 24.3.1 (Org mode 8.0.2)

surprises.org (8.42 KB)
accumulate.f90 (226 Bytes)
accumulate-not.f90 (231 Bytes)
base60-surprises.f90 (1.6 KB)
trim.f90 (501 Bytes)
surprises.pdf (206.83 KB)
surprises.html (22.47 KB)
base60.f90 (2.79 KB)

Tail Call Optimization (TCO), dependency, broken debug builds in C and C++ — and gcc 4.8

TCO: Reducing the algorithmic complexity of recursion.
Debug build: Add overhead to a program to trace errors.
Debug without TCO: Obliterate any possibility of fixing recursion bugs.

“Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use.”°

UPDATE: GCC 4.8 gives us -Og -foptimize-sibling-calls, which generates nice backtraces. Also I had a few quite embarrassing errors in my C - thanks to AKF for the catch!

1 Intro

Tail Call Optimization (TCO) makes this

def foo(n):
    return foo(n+1)

behave like this

def foo(n):
    return n+1
n = 1
while True:
    n = foo(n)


I recently told a colleague how neat tail call optimization in scheme is (along with macros, but that is a topic for another day…).

Then I decided to actually test it (being mainly not a schemer but a pythonista - though very impressed by the possibilities of scheme).

So I implemented a very simple recursive function which I could watch to check the Tail Call behaviour. I tested scheme (via guile), python (obviously) and C++ (which proved to provide a surprise).

2 The tests

2.1 Scheme

(define (foo n)
  (display n)
  (foo (1+ n)))

(foo 1)

2.2 Python

def foo(n):
    print n
    return foo(n+1)


2.3 C++

The C++ code needed a bit more work (thanks to AKF for making it less ugly/horrible!):

#include <stdio.h>

int recurse(int n)
{
  printf("%i\n", n);
  return recurse(n+1);
}

int main()
{
  return recurse(1);
}

In addition to the code, I set up 4 different ways to build it: standard optimization (-O2), debug (-g), optimized debug (-g -O2), and only slightly optimized (-O1).

all : C2 Cg Cg2 C1

# optimized
C2 : tailcallc.c
    g++ -O2 tailcallc.c -o C2

# debug build
Cg : tailcallc.c
    g++ -g tailcallc.c -o Cg

# optimized debug build
Cg2 : tailcallc.c
    g++ -g -O2 tailcallc.c -o Cg2

# only slightly optimized
C1 : tailcallc.c
    g++ -O1 tailcallc.c -o C1

3 The results

So now, let’s actually check the results. Since I’m interested in tail call optimization, I check the memory consumption of each run. If we have proper tail call optimization, the required memory will stay the same over time, if not, the function stack will get bigger and bigger till the program crashes.

3.1 Scheme

Scheme gives the obvious result. It starts counting numbers and keeps doing so. After 10 seconds it’s at 1.6 million, consuming 1.7 MiB of memory - and never changing the memory consumption.

3.2 Python

Python is no surprise either: it counts to 999 and then dies with the following traceback:

Traceback (most recent call last):
 File "tailcallpython.py", line 6, in <module>
 File "tailcallpython.py", line 4, in foo
   return foo(n+1)
… repeat about 997 times …
RuntimeError: maximum recursion depth exceeded

Python has an arbitrary limit on recursion depth which keeps people from using tail calls in algorithms.
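That limit is a plain runtime setting, not TCO, and the usual workaround is to rewrite the tail call as a loop by hand. A sketch of mine (not from the original article):

```python
import sys

# The recursion limit is an ordinary, queryable runtime setting:
limit = sys.getrecursionlimit()  # typically 1000

# Rewriting the tail call as a loop by hand gives the O(1) memory
# behaviour that TCO would provide automatically:
def foo(n):
    return n + 1

n = 1
while n < limit * 100:  # runs far past the recursion limit
    n = foo(n)
print(n)  # e.g. 100000 when the limit is 1000
```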

3.3 C/C++

C/C++ is a bit trickier.

First let’s see the results for the optimized run:

3.3.1 Optimized

g++ -O2 C.c -o C2

Interestingly that runs just like the scheme one: After 10s it’s at 800,000 and consumes just 144KiB of memory. And that memory consumption stays stable.

3.3.2 Debug

So, cool! C/C++ has tail call optimization. Let’s write lots of recursive code using tail calls!

Or so I thought. Then I did the debug run.

g++ -g C.c -o Cg

It starts counting just like the optimized version. Then, after about 5 seconds and counting to about 260,000, it dies with a segmentation fault.

And here’s a capture of its memory consumption while it was still running (thanks to KDEs process monitor):


7228 KB   [stack]
56 KB [heap]
40 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
24 KB /lib64/libc-2.15.so
12 KB /home/arne/.emacs.d/private/journal/Cg


352 KB    /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
252 KB    /lib64/libc-2.15.so
108 KB    /lib64/ld-2.15.so
60 KB /lib64/libm-2.15.so
16 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libgcc_s.so.1

That’s 7 MiB after less than 5 seconds runtime - all of it in the stack, since that has to remember all the recursive function calls when there is no tail call optimization.

So we now have a program which runs just fine when optimized but dies almost instantly when run in debug mode.

But at least we have nice gdb traces for the start:

recurse (n=43) at C.c:5
5         printf("%i\n", n);
6         return recurse(n+1);

3.4 Optimized debug build

So, is all lost? Luckily not: We can actually specify optimization with debugging information.

g++ -g -O2 C.c -o Cg2

When doing so, the optimized debug build chugs along just like the optimized build without debugging information. At least that’s true for GCC.

But our debug trace now looks like this:
5         printf("%i\n", n);
printf (__fmt=0x40069c "%i\n") at /usr/include/bits/stdio2.h:105
105       return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
6         return recurse(n+1);
That’s not so nice, but at least we can debug with tail call optimization. We can also improve on this (thanks to AKF for that hint!): we just need to enable tail call optimization separately:

g++ -g -O1 -foptimize-sibling-calls C.c -o Cgtco

But this still gives ugly backtraces (if I leave out -O1, it does not do TCO). So let’s turn to GCC 4.8 and use -Og.

g++ -g -Og -foptimize-sibling-calls C.c -o Cgtco

And we have nice backtraces!
recurse (n=n@entry=1) at C.c:4
4       {
5         printf("%i\n", n);
6         return recurse(n+1);
5         printf("%i\n", n);
6         return recurse(n+1);

3.5 Only slightly optimized

Can we invert the question? Is all well now?

Actually not…

If we activate only minor optimization, we get the same behaviour as the unoptimized build again.

g++ -O1 C.c -o C1

It counts to about 260,000 and then dies from a stack overflow. And that is pretty bad™, because it means that programmers cannot trust their code to work when they do not know all the optimization strategies which will be used with it.

And they have no way to declare in the code that it requires TCO to work.

4 Summary

Tail Call Optimization (TCO) turns an operation with a memory requirement of O(N)1 into one with a memory requirement of O(1).

It is a nice tool to reduce the complexity of code, but it is only safe in languages which explicitly require tail call optimization - like Scheme.

And from this we can find a conclusion for compilers:

C/C++ compilers should always use tail call optimization, including in debug builds. Otherwise C/C++ programmers should never use that feature, because relying on it can make it impossible to use certain optimization settings with any code which includes theirs.

And as a finishing note, I’d like to quote (very loosely) what my colleague told me from some of his real-life debugging experience:

“We run our project on an AIX ibm-supercomputer. We had spotted a problem in optimized runs, so we activated the debugger to trace the bug. But when we activated debug flags, a host of new problems appeared which were not present in optimized runs. We tried to isolate the problems, but they only appeared if we ran the full project. When we told the IBM coders about that, they asked us to provide a simple testcase… The problems likely happened due to some crazy optimizations - in our code or in the compiler.”

So the problem of undebuggable code due to a dependency of the program on optimization changes is not limited to tail call optimization. But TCO is a really nice way to show it :)

Let’s use that to make the statement above more general:

C/C++ compilers should always do those kinds of optimizations which lead to changes in the algorithmic cost of programs.

Or from a pessimistic side:

You should only rely on language features, which are also available in debug mode - and you should never develop your program with optimization turned on.

And by that measure, C/C++ does not have Tail Call Optimization - at least until all mainstream compilers include TCO in their default options. Which is a pretty bleak result after the excitement I felt when I realized that optimizations can actually give C/C++ code the behavior of Tail Call Optimization.

Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use. Or, more generally: Never develop with optimizations which are not required by the language standard.

Note, though, that GCC 4.8 added the -Og option, which improves debugging a lot (Phoronix wrote about plans for that last September). It still does not include -foptimize-sibling-calls in -Og, but that might be only a matter of time… I hope it is.


1 : O(1) and O(N) describe the algorithmic cost of an algorithm. If it is O(N), then the cost rises linearly with the size of the problem (N is the size, for example printing 20,000 consecutive numbers). If it is O(1), the cost is stable regardless of the size of the problem.

Top 5 systemd troubles - a strategic view for distros

systemd is a new way to start a Linux-system with the expressed goal of rethinking all of init. These are my top 5 gripes with it. (»skip the updates«)

Update (2019): I now use GNU Guix with shepherd. That’s one more better option than systemd. In that it joins OpenRC and many others.

Update (2016-09-28): Systemd is an exploit kit just waiting to be activated. And once it is active, only those who wrote it will be able to defuse it — and check whether it is defused. And it is starting: How to crash systemd in one tweet? Alternatives? Use OpenRC for system services. That’s simple and fast and full-featured with minimal fuss. Use runit for process supervision of user-services and system-services alike.

Update (2014-12-11): One more deconstruction of the strategies around systemd: systemd: Assumptions, Bullying, Consent. It shows that the attitude which forms the root of the dangers of systemd is even visible in its very source code.

Update (2014-11-19): The Debian General Resolution resulted in “We do not need a general resolution to decide systemd”. The vote page provides detailed results and statistics. Ian Jackson resigned from the Technical Committee: “And, speaking personally, I am exhausted.”

Update (2014-10-16): There is now a vote on a General Resolution in Debian for preserving the ability to switch init systems. It is linked under “Are there better solutions […]?” on the site Shall we fork Debian™? :^|.

Update (2014-10-07): Lennart hetzt (German) describes the rhetoric tricks used by Lennart Poettering to make people forget that he is a major part of the communication problems we’re facing at times - and to hide valid technical, practical, pragmatic, political and strategic criticism of systemd.

Update (2014-09-24): boycott systemd calls for action with 12 reasons against systemd: “We do recognize the need for a new init system in the 21st century, but systemd is not it.”

Update (2014-04-03): And now we have Julian Assange warning about NSA control over Debian, Theodore Ts’o, maintainer of ext4, complaining about incomprehensible systemd, and Linus Torvalds (you know him, right?) ranting against disruptive behavior from systemd developers, going as far as refusing to merge anything from the developers in question into Linux. Should I say “I said so”? Maybe not. After all, I came pretty late. Others saw this trend 2 years before I even knew about systemd. Can we really assume that there won’t be intentional disruption? Maybe I should look for solutions. It could be a good idea to start having community-paid developers.

Update (2014-02-18): An email to the mailing list of the technical committee of debian summarized the strategic implications of systemd-adoption for Debian and RedHat. It was called conspiracy theory right away, but the gains for RedHat are obvious: RedHat would be dumb not to try this. And only a fool trusts a company. Even the best company has to put money before ethics.

Update (2013-11-20): Further reading shows that people have been giving arguments from my list since 2011, and they got answers in the range of “anything short of systemd is dumb”, “this cannot work” (while OpenRC clearly shows that it works well), requests for implementation details without justification and insults and further insults; but the arguments stayed valid for the last 2 years. That does not look like systemd has a friendly community - or is healthy for distributions adopting it. Also an OpenRC developer wrote the best rebuttal of systemd propaganda I read so far: “Alternativlos”: Systemd propaganda (note, though, that I am biased against systemd due to problems I had in the past with udev kernel-dependencies)

  • Losing Control: systemd does so many crucial things itself that the developers of distributions lose their control over the init process: If systemd developers decide to change something, the distributions might actually have to fork systemd and keep the fork up-to-date, and this requires rare skills and lots of resources (due to the pace of systemd). See the Gentoo eudev-Project for a case where this had to happen so the distribution could keep providing features its users rely on. Systemd nowadays incorporates udev. Go reason how systemd devs will act.1 Why losing control is a bad idea: Strategy Letter V: Commodities

  • No scripts (as if you can know beforehand all the things the init system will need to do in each distribution). Nowadays any system should be user-extendable to avoid bottlenecks for development. This essentially boils down to providing a scripting language. Using the language which almost every system administrator knows is a very sane choice for that - and means making it possible to use Shell-Scripts to extend the init-system. Scripts mean that the distribution will never be in a position where it is blocked because it absolutely can’t provide a given fringe feature. And as the experiment with paludis in Gentoo shows, an implementation in C isn’t magically faster than one in a scripting language and can actually be much slower (just compare paludis to pkgcore), because the execution time of the language only very rarely is the real bottleneck - and you can easily shell out that part to a faster language with negligible time loss,2 especially in shell-scripts (pun partially intended). While systemd can be told to run a shell script, this requires a mental context switch and the script cannot tie into all the machinery inside systemd. If there’s a bug in systemd, you need to fix systemd, if you need more than systemd provides out of the box, you need either a script or you have to patch systemd, and otherwise you write in a completely different language (so most people won’t have the skills to go beyond the fences of the ground defined by the systemd developers as proper for users). Why killing scripts is a bad idea: Bloatware and the 80/20 Myth

  • Linux-specific3 (are you serious??). This makes the distribution an add-on to the kernel instead of the distribution being a focus point of many different development efforts. This is a second point where distributions become commodities, and as for systemd itself, this is against the interest of the distributions. On the other hand, enabling the use of many different kernels strengthens the Distribution - even if currently only few people are using them. Why being Linux-only is a bad idea for distributions: Strategy Letter V: Commodities

  • Requiring an up-to-date kernel. This problem already gives me lots of headaches for my OLPC due to udev (from the same people as systemd… which is one of the reasons why I hope that Gentoo-devs will succeed with eudev), since it is not always easy to go to a newer kernel when you’re on a fringe platform (I’m currently fighting with that). An init system should not require some special kernel version just to boot… Why those hard dependencies are a bad idea: Bloatware and the 80/20 Myth AND Strategy Letter V: Commodities

  • Requiring D-Bus. D-Bus was already broken a few times for me, and losing not just some KDE functionality but instead making my system unbootable is unacceptable. It’s bad enough that so much stuff relies on udev.4

In my understanding, we need more services which can survive without the others, so the system becomes resilient against failures in any given part. As the system gets more and more complex, this constantly becomes more important: fewer interdependencies, and the services which are crucial to get my system into a debuggable state should be small and simple - and should not require many changes to implement new features.

Having multiple tools to solve the same problem looks like wasted resources, but it actually extends the range of problems which can be solved with our systems and avoids bottlenecks and single points of failure (in either tools or communities), so it makes us resilient. It also encourages standard formats, which minimize the cost of maintaining several systems side by side.

You can see how systemd manages to violate all these principles…

This does not mean that the features provided by systemd are useless. It means that the way they are embedded in systemd, with its heavy dependencies, is detrimental to a healthy distribution.

Note: I am neither a developer of systemd, nor of upstart, sysvinit or OpenRC. I am just a humble user of distributions, but I can recognize impending horrible fallout when I see it.


I’ll finish this with a quote from 30 myths about systemd, written by the systemd developers themselves:

We try to get rid of many of the more pointless differences of the various distributions in various areas of the core OS. As part of that we sometimes adopt schemes that were previously used by only one of the distributions and push it to a level where it's the default of systemd, trying to gently push everybody towards the same set of basic configuration.
— Lennart Poettering, main developer of systemd

I could not show much clearer why distributions should be very wary about systemd than Lennart Poettering does here in the post where he tries to refute myths about systemd.

PS: I’m definitely biased against systemd, after having some horrifying experiences with kernel-dependencies in udev. Resilience looks different. And I already modified some init scripts to adjust my system’s behavior so it better fits my use case. Now go and call me part of a fringe group which wants to add “pointless differences” to the system. If you force Gentoo devs to issue a warning in the style of “you MUST activate feature X in your kernel, else your system will become unbootable”, this should be a big red flag to you that you’re doing something wrong. If you do that twice, this is a big red flag to users not to trust your software. And regaining that trust requires reestablishing a long record of solid work. Which I do not see at the moment. Also do read Bloatware and the 80/20 Myth (if you didn’t do that by now): It might be true that 80% of the users only use 20% of the features, but they do not all use the same 20%.

  1. Update 2014: Actually there is no need to guess how the systemd developers will act: They showed (again) that they will keep breaking systems of their users: “udev now silently fails to do anything useful if devtmpfs is missing, almost as if resilience was a disease” — bonsaikitten, Gentoo developer, 2014-01, long after udev was subsumed into systemd. 

  2. Running a program in a subshell increases the runtime by just six milliseconds. I measured that when testing ways to run GNU Guile modules as scripts. So you have to start almost 100 subshells during bootup to lose half a second of runtime. Note that OpenRC can boot a system and power down again in under 0.7 seconds and the minimal boot-to-login just takes 250 ms. There is no need for systemd to get a faster boot. 

  3. The systemd proponents in the debian initsystem discussion explicitly stated that they don’t want to port systemd to other kernels. 

  4. And D-Bus is slow, slow, slow when your system is under heavy memory and IO-pressure, as my systems tend to be (I’m a Gentoo user. I often compile a new version of all KDE-components or of Firefox while I do regular work on the computer). From dbus I’m used to reaction times up to several seconds… 

Translating a lookup-dictionary to bash: Much simpler than I thought

I wanted to name Transcom Regions in my plots by passing their names to the command-line tool, but I only had their region-number and a lookup dictionary in Python. To avoid tampering with the tool, I needed to translate the dictionary to a bash function, and thanks to the case statement it was much simpler than I had expected.

This is the original dictionary:

#: Names of transcom regions
transcomregionnames = {
    1: "NAM Boreal",
    2: "NAM Temperate",
    3: "South American tropical",
    # and so forth

This is how lookup works in Python:

region = 2
name = transcomregionnames[2]

The solution in bash is a simple mechanical translation:

function regionname () {
    case $1 in
        1) echo "NAM Boreal";;
        2) echo "NAM Temperate";;
        3) echo "South American tropical";;
        # and so forth

And the lookup is easier than anything I hoped for:

name=$(regionname $region)

This is how it looks in my actual code:

for region in {1..22} ; do ./plotstation.py -c /home/arne/sun-work/ct-tccon/ct-tccon-2015-5x7-use-obspack-no-tccon-nc/ -C "GA: in-situ ground and aircraft"  -c /home/arne/sun-work/ct-tccon/ct-tccon-2015-5x7-use-obspack-use-tccon-noassimeu/ -C "TneGA: non-European TCCON and GA" -c /home/arne/sun-work/ct-tccon/ct-tccon-2015-5x7-use-obspack-no-tccon-no-aircraft-doesitbreaktoo/ -C "G: in-situ ground"  --regionfluxtimeseries $region --toaverage 5 --exclude-validation  --colorscheme paulforabp --linewidth 4 --font-size 36 --start 2009-12-03 --stop 2012-12-02  --title "Effect of assimilating non-EU TCCON, $(regionname ${region})"  -o ~/flux-GA-vs-TneGA-vs-G-region-${region}.pdf; done

For your convenience, here’s my entire transcom naming function:

function regionname () {
    case $1 in
        1) echo "NAM Boreal" ;;
        2) echo "NAM Temperate";;
        3) echo "South American tropical";;
        4) echo "South American temperate";;
        5) echo "Northern Africa";;
        6) echo "Southern Africa";;
        7) echo "Eurasian Boreal";;
        8) echo "Eurasian Temperate";;
        9) echo "Tropical Asia";;
        10) echo "Australia";;
        11) echo "Europe";;
        12) echo "North Pacific Temperate";;
        13) echo "West Pacific Tropics";;
        14) echo "East Pacific Tropics";;
        15) echo "South Pacific Temperate";;
        16) echo "Northern Ocean";;
        17) echo "North Atlantic Temperate";;
        18) echo "Atlantic Tropics";;
        19) echo "South Atlantic Temperate";;
        20) echo "Southern Ocean";;
        21) echo "Indian Tropical";;
        22) echo "South Indian Temperate";;
    esac
}

Happy Hacking!

Weltenwald-theme under AGPL (Drupal)

After the last round of polishing, I decided to publish my theme under AGPLv3. Reason: If you use AGPL code and people access it over a network, you have to offer them the code. Which I hereby do ;)
That’s the only way to make sure that website code stays free.

It’s still for Drupal 5, because I didn’t get around to port it, and it has some ugly hacks, but it should be fully functional.

Just untar it in any Drupal 5 install.

tar xjf weltenwald-theme-2010-08-05_r1.tar.bz2

Maybe I’ll get around to properly package it in the future…

Until then, feel free to do so yourself :)

And should I change the theme without posting a new layout here, just drop me a line and I’ll upload a new version — as required by AGPL. And should you have some problem, or if something should be missing, please drop me a line, too.

No screenshot, because a live version kicks a screenshot any day ;)
(in case it isn’t clear: Weltenwald is the theme I use on this site)

weltenwald-theme-2010-08-05_r1.tar.bz2 (877.74 KB)

Which language is best, C, C++, Python or Java?

My answer to the question about the best language on Quora. If you continue reading from here, please stick with me to the end. Enjoy the ride!

My current answer is: Scheme ☺ It gives me a large degree of freedom to explore ways to program which were much harder to explore in Python, C++ and Java. That’s why I’m currently switching from Python to Scheme.1

But depending on my current step on the road to improve my skills2 and the development group and project, that answer might have been any other language — C, C++, Java, Python, Fortran, R, Ruby, Haskell, Go, Rust, Clojure, ….

Therefore this answer is as subjective as most other answers, because we have no context on your personal situation, nor on the people with whom you’ll work and from whom you can learn, nor on the requirements of the next project you want to tackle.

Put another way:

The only correct answer is “it depends”.

The other answers in this thread should help you find the right answer for you.

Why Gnutella scales quite well

You might have read in some (almost ancient) papers that a network like Gnutella can't scale. So I want to show you why the current version of Gnutella does scale, and scales well.

In earlier versions, up to v0.4, Gnutella was a pure broadcast network. That means that every search request reached every participant, so in an optimal network the number of search requests hitting each node was exactly equal to the number of requests made by all nodes in the network. You can easily see why that can't scale.
But that was only true for Gnutella 0.4.

In the current incarnation of Gnutella (Gnutella 0.6), Gnutella is no longer a pure broadcast network. Instead, only a small percentage of the traffic is done via broadcast.

If you want to read about the methods used to realize this, please have a look at the GnuFU guide (English, German).

Here I want to limit it to the statement that the first two hops of a search request are governed by Dynamic Querying, which stops the request as soon as it has enough sources (about 250 results), and that the last two hops are governed by the Query Routing Protocol, which ensures that a search request reaches only those hosts which can actually have the file (only about 5% of the nodes).
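The interplay of the two mechanisms can be sketched as a toy model in Python (all numbers and names are made up for illustration; the real protocols are more involved):

```python
# Toy model of the Query Routing Protocol (QRP) and Dynamic Querying (DQ).

def qrp_route(query_word, neighbor_tables):
    """QRP: forward the query only to neighbors whose keyword table
    says they might have a matching file."""
    return [name for name, words in neighbor_tables.items()
            if query_word in words]

def dynamic_query(hits_per_neighbor, wanted=250):
    """DQ: probe one connection after the other and stop as soon as
    enough results arrived. Returns (results, connections probed)."""
    results = 0
    probed = 0
    for hits in hits_per_neighbor:
        probed += 1
        results += hits
        if results >= wanted:
            break
    return results, probed

neighbors = {"alice": {"jazz", "blues"},
             "bob": {"rock"},
             "carol": {"jazz"}}
print(qrp_route("jazz", neighbors))          # only alice and carol
print(dynamic_query([120, 90, 80, 70, 60]))  # → (290, 3): stops after 3 probes
```

For a popular file the per-connection hit counts are high, so DQ stops after only a few probes; for a rare file QRP already keeps the query away from almost all connections.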

So in today's reality, Gnutella is a quite structured and very flexible network.

To scale it, Ultrapeers can increase their number of connections from their current 32 upwards, which makes Dynamic Querying (DQ) and the Query Routing Protocol (QRP) even more effective.

In the case of DQ, most queries for popular files will still provide enough results after the same number of clients has been contacted, so increasing the number of connections won't change the network traffic caused by the first two steps at all.

In the case of QRP, queries will still only reach the hosts which can have the file, and if Ultrapeers are connected to more nodes at the same time (by increasing the number of connections), each connection will provide more results, so DQ will stop even earlier than with fewer connections per Ultrapeer.

So Gnutella is now far from a broadcast model, and the act of increasing the size of the Gnutella Network can even increase its efficiency for popular files.

For rare files, QRP kicks in with full force: even though DQ will likely check all other nodes for content, QRP makes sure that only those nodes are reached which can have the content, which might be only 0.1% of the net or even far less.

Here, increasing the number of nodes per Ultrapeer means that nodes with rare files are in effect closer to you than before, so Gnutella also gets more efficient with growing network size when rare-file searches are your major concern.

So you can see that Gnutella has become a network which scales extremely well for keyword searches, and due to that it can also be used very efficiently to search for metadata and similar concepts.

The only thing Gnutella can't do well are searches for strings which aren't separate words (for example file-hashes), because those kill QRP, so they will likely not reach (m)any hosts. For these types of searches, the Gnutella developers are working on a DHT (Distributed Hash Table), which will only be used if the string can't be split into separate words. That DHT will most likely be Kademlia, which is also proven to work quite well.

And with that, the only problem which remains in need of fixing is spam, because it inhibits DQ when you do a rare search. But I am sure that the devs will also find a way to stop spamming, and even with spam, Gnutella is quite effective: it consumes very little bandwidth when you are acting as a leaf, and only moderate bandwidth when you are acting as an Ultrapeer.

Some figures as finishing touch:

  • Leaf network traffic: about 1 kB/s if you add outgoing and incoming traffic, which is about one seventh of the speed of a 56k modem.
  • Ultrapeer traffic: about 7 kB/s, outgoing and incoming added together, which is about one full ISDN line, or less than 1/8th of a DSL line's outgoing speed.

Have fun with Gnutella!
- ArneBab 08:14, 15. Nov 2006 (CET)

PS: This guide ignores that requests must travel through intermediate nodes. But since those nodes make up only about 3% of the network, and only 3% of those nodes will be reached by a (QRP-routed) rare-file request, it seems safe to ignore these 0.1% of the network in the calculations, for the sake of making them easier to follow mentally (QRP takes care of that).

Why Python 3?

At the Institute we use both Python 2 and Python 3. While researching the current differences (Python 3.5 compared to Python 2.7), I found two beautiful articles by Brett Cannon, a core developer of Python, and summarized them for my work group.

The articles:

  1. Why Python 3: Why Python3 exists
  2. Why use 3: How to pitch Python 3 to Management

The relevant points for us1 are the following:

  1. Why Python 3 was necessary:

    • Python2: string = byte-array.
      • Py3 avoids Encoding-Bugs in Unicode: all Strings are Unicode.
    • Python2: sources in ASCII. β in a comment needed # encoding: utf-8
      • Py3 uses utf-8 in source files by default.
    • Last chance: the cost of the change increased every year.
  2. Why use 3 (relevant for us, e.g. for new projects):

    • int/long -> int
    • Unicode in code: σ = sqrt(var) # only letters, but not e.g. ∑
    • H.dot(β) -> H @ β
    • chained exceptions: Traceback ... during handling ... Traceback — simplifies debugging
    • print() facilitates structured output2

The effect of these points is much larger than this short text suggests: avoid surprises, avoid awkward workarounds, and easier debugging.
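To make one of these points concrete: exception chaining. A minimal sketch (the file path and function name are just for illustration): when you re-raise with `raise ... from`, Python 3 stores the original exception as `__cause__`, so the traceback shows both errors instead of hiding the root cause:

```python
def read_config(path):
    try:
        with open(path) as f:
            return f.read()
    except OSError as exc:
        # "raise ... from exc" keeps exc as __cause__, so the full
        # traceback shows both errors, chained together.
        raise ValueError("cannot load config: " + path) from exc

try:
    read_config("/nonexistent/config.ini")
except ValueError as err:
    print(err)
    print("caused by:", type(err.__cause__).__name__)
# → cannot load config: /nonexistent/config.ini
# → caused by: FileNotFoundError
```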

  1. I have summarized them because I cannot expect scientists (or other people who only use Python) to read the full articles just to decide what to do when they get the chance to tackle a new project. 

  2. Example for print():
    nums = [1, 2, 3]
    with open("data.csv", "a") as f:
        print(*nums, sep=";", file=f) 

Write programs you can still hack when you feel dumb

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it. — Brian Kernighan

In the article Hyperfocus and balance, Arc Riley from PySoy talks about trying to get to the Hyperfocus state without endangering his health. Since I have similar needs, I am developing some strategies for that myself (though not for my health, but because my wife and children can’t be expected to let me work 8h without any interruptions in my free time).

Unlike Arc, I try to change my programming habits instead of changing myself to fit the requirements of my habits.1

Easy times

Let’s begin with Programming while you feel great.

The guideline I learned from writing PnP roleplaying games is to keep the number of things to know at 7 or less at each point (according to Miller, 1956; though the current best guess of the limitation for average humans is only 4 objects!). For a function of code I would convert that as follows:

  1. You need to keep in mind the function you work in (location), and
  2. the task it should perform (purpose and effect), and
  3. the resources it uses (arguments or global values/class attributes).

Only 4 things are left for the code of your function. (Three if you use both class attributes/global values and function arguments. Two if you have complex custom data structures with peculiar names or access methods which you have to understand before you can do anything. One if you also have to remember the commands of an unfamiliar2 editor or VCS tool. See how fast this approaches zero, even when starting with 7 things?)

Add an if-switch, for-loop or similar and you have only 3 things left.

You need those for what the function should actually do, so better put further complexities into subfunctions.

Also ensure that each of the things you work with is easy enough. If you get the things you use down to 7 by writing functions with 20 arguments, you don’t win anything. Just the resources you could use in the function will blow your mind when you try to change the function a few months later. This goes for every part of your program: The number of functions, the number of function arguments, the number of variables, the lines of code per function and even the number of hierarchy levels you use to reduce the other things you need to keep in mind at any given time.

Hard times

But if you want to be able to hack that code while you feel dumb (compared to those streaks of genius when you can actually hold the whole structure of your program in your head and foresee every effect of a given change before actually making it), you need to make sure that you don’t have to take all 7 things into account.

Tune it down for the times when you feel dumb by starting with 5 things.3 After subtracting one each for the location, the task and the resources, you are left with only two things:

Two things for your function: some logic, and calling other functions.

If it is an if-switch, let it be just an if-switch calling other functions.4 Yes, it may feel much easier to do it directly here, when you are fully embedded in your code and feel great, but it will bite you when you are down. Which is exactly when you won’t want to be bitten by your own code.
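A minimal sketch of that rule (Python, with hypothetical names): the branching function only dispatches, and the actual work lives in small subfunctions:

```python
def handle(command, payload):
    """Just the if-switch: every branch is a single call."""
    if command == "store":
        return store(payload)
    elif command == "fetch":
        return fetch(payload)
    else:
        return reject(command)

def store(payload):
    """The actual work lives here, out of sight of the branching."""
    return "stored " + payload

def fetch(payload):
    return "fetched " + payload

def reject(command):
    return "unknown command: " + command

print(handle("store", "notes.txt"))  # → stored notes.txt
```

When you come back tired, `handle` reads in one glance; each subfunction can be understood (and fixed) without re-loading the whole branching logic into your head.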

Loose coupling and tight cohesion

Programming is a constant battle against complexity. Stumble from the sweet spot of your program into any direction, and complexity raises its ugly head. But finding the sweet spot requires constant vigilance, as it shifts with the size and structure of your program and your development group.

To find a practical way of achieving this, Django’s concept of loose coupling and tight cohesion (more detailed) helped me most, because it reduces the interdependencies.

The effects of any given change should be contained in the part of the code you work in - and in one type of code.

As a web framework, Django separates the templates, the URI definitions, the program code and the database access from each other. (See how these are already 4 categories, hitting the limit of our mind again?)

For a game, on the other hand, you might want to separate story, game logic, presentation (what you see on the screen) and input/user actions. Also, people who write a scenario or level should only have to work in one type of code, neatly confined in one file or a small set of files which reside in the same place.

And for a scientific program, data input, task definition, processing and data output might be separated.

Remember that this separation does not only mean that you put those parts of the code into different files, but that they are loosely coupled:

They only use lean and clearly defined interfaces and don’t need to know much about each other.
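A sketch of such loose coupling for a scientific program (Python; the stages and formats are hypothetical): each stage only sees the lean interface of its neighbors - plain lists in, plain lists out - and none needs to know how the others work inside:

```python
def read_data(lines):
    """Data input: raw text lines in, plain numbers out."""
    return [float(line) for line in lines if line.strip()]

def process(values):
    """Processing: here just a running sum. Knows nothing about
    where the values came from or where they will go."""
    total = 0.0
    sums = []
    for v in values:
        total += v
        sums.append(total)
    return sums

def write_output(results):
    """Data output: formatting only, no knowledge of the processing."""
    return ";".join("%.1f" % r for r in results)

print(write_output(process(read_data(["1", "2", "", "3"]))))  # → 1.0;3.0;6.0
```

Swapping the input format or the output format touches exactly one function; the processing stays untouched.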


This strategy does not only make your program easier to adapt (because the parts you need to change to implement a given feature are smaller). If you apply it not only to the bigger structure, but to every part of the program, its main advantage is that any part of the code can be understood without having to understand other parts.

And you can still understand and hack your code, when your child is sick, your wife is overworked, you slept 3 hours the night before - and you can only work for half an hour straight, because it’s evening and you don’t want to be a creep (but this change has to be finished nonetheless).

Note that finding a design which accomplishes this is far more complex than it sounds. If people can read your code and say “oh, that’s easy. I can hack that” (and manage to do so), then you did it right.

Designing a simple structure to solve a complex task is far harder than designing a complex structure to solve that task.

And being able to hack your program while you feel dumb (and maybe even hold it in your head) is worth investing some of your genius-time5 into your design (and repeating that whenever your code grows too hairy).

PS (7 years later): This only applies to the version of your code that stays in your codebase. During short-term experiments these rules do not apply, because there you still have the newly written code in your head. But take pains to clean it up before it takes on a life of its own. The last point for that is when you realize that you’re no longer sure how it works (then you know that you already missed the point of refactoring, but you can at least save your colleagues and your future self from stumbling even worse than you do at that moment). That way you also always have some leeway in short-term complexity that you can use during future experimentation. Also don’t make your code too simple: If you find that you’re bored while coding or that you spend more time fighting the structures you built than solving the actual problems, you took these principles too far, because you’re no longer getting full benefits from your brain. Well chosen local complexity reduces global complexity and the required work per change.

  1. Where I got bitten badly by my high-performance coding habits is the keyboard layout evolution program. I did not catch my error when the structure grew too complex (while adding stuff), and now that I do not have as much uninterrupted time as before, I cannot work on it efficiently anymore. I’m glad that this happened with a mostly finished project on whose evolution no one's future depended. Still, it is sad that this will keep me from turning it into a realtime visual layout optimizer. I can still work on its existing functionality (I kept improving it for the most important task: the cost calculation), but adding new functionality is a huge pain. 

  2. This limit only applies to unfamiliar things: things you did not yet learn well enough that they work automatically. Once you know a tool well enough that you don’t have to think about it anymore, it no longer counts against the 7 thing limit, since you don’t need to remember it.6 That’s strong support for writing conventional code — or at least code you’ll still write similarly a decade later — and using tools which can accompany you for a long time. 

  3. See how I actually don’t get below 5 here? A good TODO list which shows you the task so you can forget it while coding might get you down to 4. But don’t bet on it. Not knowing where you are or where you want to go is a recipe for disaster… And if you make your functions too small, the collection of functions gets more complex, or the object hierarchy too deep, adding complexity at other places and making it harder to change the structure (refactor) when requirements change. Well, no one said creating well-structured programs would be easy. You need to find the right compromise for you. 

  4. Keeping functions simple does not mean that they must be extremely short. If you have a library which provides many tools that get used for things like labelling axes in a plot, and you don’t get much repetition between different functions, then having a function of 20 to 30 lines can be simpler than building an abstraction which only works at the current state of the code but will likely break when you add the next function. This is inherent, function-local complexity: you cannot reduce it with structure. Therefore the sweet spot of simplicity for some tasks is using medium-sized functions. If you find yourself repeating exactly the same code multiple times, however, you likely missed the sweet spot and should investigate shortening the functions by extracting the common tasks, or restructuring the function to separate semantically different tasks. 

  5. How to find your genius time? That’s a tautology: Your genius time is when you can hold your program in your mind. If I could tell you when your genius time occurs, or even how to trigger it, I could make lots of money by consulting about every tech company in existence. A good starting point is reading about “flow”, known in many other creative activities (some starting points). Reaching the flow often includes spending time outside the flow, so best write programs you can still hack when you feel dumb.7 

  6. This is reasoning from experience. I think the actual reason why people can juggle large familiar structures is more likely that they have an established mental model which allows them to use multiple dimensions and cut the amount of bits you need for referring to the thing.8 See the Absolute Judgments of Multidimensional Stimuli section, the recoding section and the difference between chunks and bits in George A. Miller (1956). This is part of writing programs you can still hack when you feel dumb — but one which only helps those who use the same structures and one which binds you to your established coding style. 

  7. And in all this reduction of local complexity, keep in mind that there is no silver bullet (Brooks, 1986). Just take care that you design your code against the limits of the humans who work with it, and only in the second place against the limits of the tools you use — you can change the tools, but you cannot easily change the humans; often you cannot change the humans at all. In the best case you can make your tools fit and expand the limits of humans. But remember also that your code must run well enough on the machine. And you often do not know what "well enough" means. I know that this is not a simple answer. If that irks you, keep in mind that there is no silver bullet (Brooks, 1986), and this text isn’t one either. It’s just a step on the way — I hope it is useful to you. 

  8. Aside from being able to remember the full mental model, it is often enough to remember something close enough and then find the correct answer with assisted guessing. A typical example is narrowing down auto-completion candidates by matching on likely names until something feels right. This is how good auto-completion — or rather: guided interactive code inspection — massively expands the size of models we can work with efficiently. It depends on easily guessable naming, typically aided by experience, and it benefits from tools which can limit or order the potential candidates by the context. With good tool-support it suffices to have a general feeling about the direction to take for doing something. The guidelines in this article should help you with guessing, and should help your tool with limiting candidates to plausible choices and with ordering them by context. 

Writing a commandline tool in Fortran

Here I want to show you how to write a commandline tool in Fortran, because Fortran is much better than its reputation — most of all in syntax. I needed a long time to understand that — to get over my prejudices — and I hope I can help you save some of that time.1

This provides a quick-start into Fortran. After finishing it, I suggest having a look at Fortran surprises to avoid stumbling over differences between Fortran and many other languages.

The first program: Hello world :)

Code to be executed when the program runs is enclosed in program and end program:

program hello
  use iso_fortran_env
  write (output_unit,*) "Hello World!"
  write (output_unit,*) 'Hello Single Quote!'
end program hello

Call this fortran-hello.f90 (.f is for the old Fortran 77).

The fastest free compiler is gfortran.

gfortran -std=gnu -O3 fortran-hello.f90 -o fortran-hello
./fortran-hello
Hello World!
Hello Single Quote!

That’s it. This is your first commandline tool.

Reading arguments

Most commandline tools accept arguments. Fortran developers long resisted this and preferred explicit configuration files, but with Fortran 2003 argument parsing entered the standard. The tool for this is get_command_argument.

program cli
  implicit none ! no implicit declaration: all variables must be declared
  character(1000) :: arg

  call get_command_argument(1, arg) ! result is stored in arg, see 
  ! https://gcc.gnu.org/onlinedocs/gfortran/GET_005fCOMMAND_005fARGUMENT.html

  if (len_trim(arg) == 0) then ! no argument given
      write (*,*) "Call me --world!"
  else if (trim(arg) == "--world") then
      call get_command_argument(2, arg)
      if (len_trim(arg) == 0) then
          arg = "again!"
      end if
      write (*,*) "Hello ", trim(arg)
      ! trim removes the trailing blanks of the fixed-size string
  end if
end program
gfortran -std=gnu -O3 fortran-commandline.f90 -o fortran-helloworld
./fortran-helloworld
./fortran-helloworld --world World
./fortran-helloworld --world
Call me --world!
Hello World
Hello again!

Adding structure with modules

The following restructures the program into modules. If you have used any OO tool, you know what this does: use X, only : a, b, c gets a, b and c from module X.

Note that you have to declare all variables used in the function at the top of the function.

module hello
  implicit none
  character(100),parameter :: prefix = "Hello" ! parameters are constants
  public :: parse_args, prefix
contains
  function parse_args() result ( res )
    implicit none
    character(1000) :: res

    call get_command_argument(1, res)  
    if (trim(res) == "--world") then
        call get_command_argument(2, res)
        if (len_trim(res) == 0) then
            res = "again!"
        end if
    end if
  end function parse_args
end module hello

program helloworld
  use hello, only : parse_args, prefix
  implicit none
  character(1000) :: world
  world = parse_args()
  write (*,*) trim(prefix), " ", trim(world)
end program helloworld
gfortran -std=gnu -O3 fortran-modules.f90 -o fortran-modules
./fortran-modules --world World
Hello World

You can also declare functions as pure (free from side effects). I have not yet checked whether the compiler already enforces that, but if it does not do so now, you can be sure that this will be added. Fortran compilers are pretty good at enforcing what you tell them. Do see the Fortran surprises for a few hints on how to tell them what you want.

Performance considerations

Fortran is fast, really fast. But if you come from C, you need to retrain a bit: Fortran stores arrays column-major, so the first index of a reference should vary in the innermost loop, while in C it is the last index.

The following tests the speed difference between looping over the outer and the inner index. You can get a factor 3-5 difference by having the tight inner loop run over the first index of the multidimensional array.

Note the L1 cache comments: If you want to get really fast with any language, you cannot ignore the capabilities of your hardware.

Also note that this code works completely naturally on multidimensional arrays.

! Thanks to http://infohost.nmt.edu/tcc/help/lang/fortran/time.html
program cheaplooptest
  integer :: i,j,k,s
  integer, parameter :: n=150 ! 50 breaks 32KB L1 cache, 150 breaks 256KB L2 cache
  integer,dimension(n,n,n) :: x, y
  real etime
  real elapsed(2)
  real total1, total2, total3, total4
  y(:,:,:) = 0
  x(:,:,:) = 1
  total1 = etime(elapsed)
  print *, "start time ", total1
  ! first index as outer loop
  do s=1,n
     do i=1,n
        do j=1,n
           y(i,j,:) = y(i,j,:) + x(i,j,:)
        end do
     end do
  end do
  total2 = etime(elapsed)
  print *, "time for outer loop", total2 - total1
  ! first index as inner loop is much cheaper (difference depends on n)
  do s=1,n
     do k=1,n
        do j=1,n
           y(:,j,k) = y(:,j,k) + x(:,j,k)
        end do
     end do
  end do
  total3 = etime(elapsed)
  print *, "time for inner loop", total3-total2
  ! plain copy is slightly faster still
  do s=1,n
     y = y + x
  end do
  total4 = etime(elapsed)
  print *, "time for simple loop", total4-total3

end program cheaplooptest
gfortran -std=gnu -O3 fortran-faster.f90 -o fortran-faster
start time    2.33319998E-02
time for outer loop   19.0533314    
time for inner loop  0.799999237    
time for simple loop  0.729999542    

This now seriously looks like Python, but it is faster by a factor of 5 to 20 if you do it right (avoid the outer-loop variant above).

Just to make it completely clear: the following is how the final test code looks (without the additional looping which makes it slow enough to time).

program cleanloop
  integer, parameter :: n=150 ! 50 breaks 32KB L1 cache, 150 breaks 256KB L2 cache
  integer,dimension(n,n,n) :: x, y
  y(:,:,:) = 0
  x(:,:,:) = 1
  y = y + x
end program cleanloop

That’s it. If you want to work with any multidimensional stuff like matrices, that’s in most cases exactly what you want. And fast.

A full tool: base60

The previous tools were partial solutions. The following is a complete solution, including numerical work (which is where Fortran really shines) and setting the numerical precision. I’m sharing it in full, so you can see everything I needed to do to get it working well.

This implements newbase60 by tantek.

It could be even nicer, if I could find an elegant way to add complex numbers to the task :)

module base60conv
  implicit none ! if you use this here, the module must come before the program in gfortran
  ! constants: marked as parameter: not function parameters, but
  ! algorithm parameters!
  character(len=61), parameter :: base60chars = "0123456789"&
       //"ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"
  integer, parameter :: longlong = selected_int_kind(32) ! length up to 32 in base10, int(16)
  integer(longlong), parameter :: sixty = 60
  public :: base60chars, numtosxg, sxgtonum, longlong
  private ! rest is private
contains
  function numtosxg( number ) result ( res )
    implicit none
    !!! preparation
    ! input: ensure that this is purely used as input.
    ! intent is only useful for function arguments.
    integer(longlong), intent(in) :: number
    ! work variables
    integer(longlong) :: n
    integer(longlong) :: remainder
    ! result
    character(len=1000) :: res ! do not initialize variables when
    ! declaring them: that only initializes at compile time, not at
    ! every function call, and thus invites nasty errors which are
    ! hard to find.
    !!! actual algorithm
    res = "" ! I have to explicitly set res to "", otherwise it
    ! accumulates the prior results!
    n = number ! the input argument: that should be safe to use.
    if (number == 0) then ! catch number = 0
       res = "0"
    end if
    ! calculate the base60 string
    do while(n > 0)
       ! in the first loop, remainder is initialized here.
       remainder = mod(n, sixty)
       n = n/sixty
       ! note that Fortran indices start at 1, not at 0.
       res = base60chars(remainder+1:remainder+1)//trim(res)
       ! write(*,*) number, remainder, n
    end do
    ! numtosxg = res
  end function numtosxg

  function sxgtonum( base60string ) result ( number )
    implicit none
    ! Turn a base60 string into the equivalent integer (number)
    character(len=*), intent(in) :: base60string
    integer :: i ! running index
    integer :: idx, badchar ! found index of char in string
    integer(longlong) :: number
    ! integer,dimension(len_trim(base60string)) :: numbers ! for later openmp
    badchar = verify(base60string, base60chars)
    if (badchar /= 0) then ! found a char which is not in base60chars
       write(*,"(a,i0,a,a)") "# bad char at position ", badchar, ": ", base60string(badchar:badchar)
       stop 1 ! with OS-dependent error code 1
    end if

    number = 0
    do i=1, len_trim(base60string)
       number = number * 60
       idx = index(base60chars, base60string(i:i), .FALSE.) ! not backwards
       number = number + (idx-1)
    end do
    ! sxgtonum = number
  end function sxgtonum

end module base60conv

program base60
  ! first step: Base60 encode. 
  ! reference: http://faruk.akgul.org/blog/tantek-celiks-newbase60-in-python-and-java/
  ! 5000 should be 1PL
  use base60conv
  implicit none

  integer(longlong) :: tests(14) = (/ 5000, 0, 100000, 1, 2, 60, &
       61, 59, 5, 100000000, 256, 65536, 215000, 16777216 /)
  integer :: i, badchar ! index for the for loop
  integer(longlong) :: n ! the current test to run
  integer(longlong) :: number
  ! program arguments
  character(1000) :: arg
  call get_command_argument(1, arg) ! modern fortran 2003!
  if (len_trim(arg) == 0) then ! run tests
     ! I have to declare the return type of the function in the main program, too.
     ! character(len=1000) :: numtosxg
     ! integer :: sxgtonum
     ! test the functions.
     do i=1,size(tests)
        n = tests(i)
        write(*,"(i12,a,a,i12)") n, " ", trim(numtosxg(n)), sxgtonum(trim(numtosxg(n)))
     end do
  else if (trim(arg) == "-r") then ! encode the given decimal number
     call get_command_argument(2, arg)
     badchar = verify(arg, " 0123456789")
     if (badchar /= 0) then
        write(*,"(a,i0,a,a)") "# bad char at position ", badchar, ": ", arg(badchar:badchar)
        stop 1 ! with OS-dependent error code 1
     end if
     read (arg, *) number ! read from arg, write to number
     write (*,*) trim(numtosxg(number))
  else ! decode the given base60 string into a number
     write (*,*) sxgtonum(trim(arg))
  end if
end program base60
gfortran -std=gnu -O3 fortran-base60.f90 -o fortran-base60
./fortran-base60 P
./fortran-base60 h
./fortran-base60 D
./fortran-base60 PhD
factor $(./fortran-base60 PhD) # yes, it’s prime! :)
./fortran-base60 -r 85333
./fortran-base60 "!" || echo $?
echo "^ with error code on invalid input :)"
85333: 85333
# bad char at position 1: !
^ with error code on invalid input :)


Fortran done right looks pretty clean. It does have its warts, but not more than all the other languages which are stable enough that the program you write today will still run in 10 years to come. And it is fast. And free.

Why am I writing this? To save you a few of the years I lost to my mistaken distaste for a pretty nice language which got a bad reputation because it once was the language everyone had to learn to get anything done (with sufficient performance). And its code did once look pretty bad, but that has long become ancient history — except for the tools which were so unbelievably good that they are still in use 40 years later.

You can ask "what makes a programming language cool?". One easily overlooked point is: making your programs still run three decades later. That doesn’t look fancy and it doesn’t look modern, but it brings a lot of value.

And if you use it where it is strong, Fortran is almost as easy to write as Python, but a lot faster (in terms of CPU requirement for the whole task) with much lower resource consumption (in terms of memory usage and startup time). Should you now ask "what about multiprocessing?", then have a look at OpenMP.

  1. After I finished my Diploma, I thought of Fortran as "this horribly unreadable 70s language". I thought it should be removed and that it only lived on due to pure inertia. I thought that its only deeper use was to provide the libraries to make numeric Python faster. Then I actually had to use it. In the beginning I mocked it and didn’t understand why anyone would choose Fortran over C. What I saw was mostly Fortran 77. The first thing I wrote was "Fortran surprises" — all the strange things you can stumble over. But bit by bit I realized the similarities with Python. That well-written Fortran actually did not look that different from Python — and much cleaner than C. That it gets stuff done. This year Fortran turns 60 (heise reported in German). And I understand why it is still used. And thanks to being an ISO standard, it is likely that it will stick with us and keep working for many more decades. 

2017-04-10-Mo-fortran-commandline-tool.pdf (172.84 KB)
2017-04-10-Mo-fortran-commandline-tool.org (14.01 KB)

Your browser history can be sniffed with just 64 lines of Python (tested with Firefox 3.5.3)

Update: The basic bug shown here is now fixed in Firefox. Read on to see whether the fix works for you. Keep in mind that there are much stronger attacks than the one shown here. Use private mode to reduce the amount of data your browser keeps. What isn’t there cannot be grabbed.

After the example of making-the-web, I was quite intrigued by the ease of sniffing the history via simple CSS tricks.

- Firefox Bug report - finally resolved fixed.
- Start Panic! - a site dedicated to spreading the news about the vulnerability.

So I decided to test how small I could get a Python program which can sniff the history via CSS - without requiring any scripting ability on the browser-side.

I first produced fully commented code (see server.py) and then stripped it down to just 64 lines (server-stripped.py) to make it crystal clear that making your browser vulnerable to this exploit is a damn bad idea. I hope this will help get Firefox fixed quickly.

If you see http://blubber.blau as found, you're safe. If you don't see any links as found, you're likely safe. In any other case, anyone on the web can grab your history - given enough time (a few minutes) or enough iframes (which check your history in parallel). This doesn't use Javascript.

It currently only checks for the 1000 or so most visited websites and doesn't keep any logs in files (all info is in memory and wiped on every restart), since I don't really want to create a full fledged history ripper but rather show how easy it would be to create one.
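The core of the trick is plain CSS: every link gets a :visited rule whose background-image points back at the sniffing server, so the browser itself reports which links are in the history. A minimal sketch of the page generation (the helper name and URL layout are made up for illustration, this is not the actual server.py):

```python
# Hypothetical sketch of the CSS history-sniffing trick (illustrative names).
# Each link gets a :visited background-image pointing back at the server,
# so the browser requests /visited?url=... exactly for visited links.
def history_sniff_page(urls, server="http://localhost:8000"):
    css = "\n".join(
        'a#l{i}:visited {{ background-image: url("{server}/visited?url={url}"); }}'
        .format(i=i, server=server, url=url)
        for i, url in enumerate(urls))
    links = "\n".join(
        '<a id="l{i}" href="{url}">{url}</a>'.format(i=i, url=url)
        for i, url in enumerate(urls))
    return ("<html><head><style>" + css + "</style></head><body>"
            + links + "</body></html>")

page = history_sniff_page(["http://blubber.blau"])
```

A real server would percent-encode the URLs and log which /visited requests arrive; that log is the sniffed history.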

Besides: It does not need to be run in an iframe. Any Python-powered site could just run this test as regular part of the site while you browse it (and wonder why your browser has so much to do for a simple site, but since we’re already used to high load due to Javascript, who is going to care?). So don’t feel safe, just because there are no iframes. To feel and be safe, use one of the solutions from What the Internet knows about you.

Konqueror seems to be immune: It also (pre-)loads the "visited"-images for links that were not visited, so every page is seen as visited - which is the only way to avoid spreading my history around on the web while still providing “visited” image-hints in the browser!

Firefox 4.0.1 seems to be immune, too: It does not show any :visited-images, so the server does not get any requests.

So please don't let your browser load anything depending on the :visited state of a link tag! It shouldn't load anything based on internal information, because that always publicizes private information - and you don't know who will read it!

In short: Don't keep repeating Ennesby's mistake:

  • Mistake: http://www.schlockmercenary.com/d/20071201.html

  • Effects: http://www.schlockmercenary.com/d/20071206.html

(comic strips not hosted here and not free licensed → copyright: Howard V. Tayler)

And to the Firefox developers: Please remove the optimization of only loading required CSS data based on the visited info! I already said so in a bug report, and since the bug isn't fixed, this is my way to put a bit of weight behind it. Please stop putting your users’ privacy at risk.


  • python server.py
    starts the server at port 8000. You can now point your browser to it to get sniffed :)

To get more info, just use ./server.py --help.

adapt plainnat bibtex natbib style to only show the url if no doi is available

Since the URL in a bibtex entry is typically just duplicate information when the entry has a DOI, I want to hide it.1

Here’s how:

diff -r 5b78f551d0a0 plainnatnoturl.bst
--- a/plainnatnoturl.bst    Tue Apr 04 10:45:08 2017 +0200
+++ b/plainnatnoturl.bst    Tue Apr 04 10:52:25 2017 +0200
@@ -1,5 +1,7 @@
-%% File: `plainnat.bst'
-%% A modification of `plain.bst' for use with natbib package 
+%% File: `plainnatnoturl.bst'
+%% A modification of `plain.bst' and `plainnat.bst' for use with natbib package 
+%% From /usr/share/texmf-dist/bibtex/bst/natbib/plainnat.bst
 %% Copyright 1993-2007 Patrick W Daly
 %% Max-Planck-Institut f\"ur Sonnensystemforschung
@@ -285,7 +288,11 @@
 FUNCTION {format.url}
 { url empty$
     { "" }
-    { new.block "URL \url{" url * "}" * }
+    { doi empty$
+      { new.block "URL \url{" url * "}" * }
+      { "" }
+      if$
+    }

Just put this next to your .tex file, add a header linking the doi

\newcommand*{\doi}[1]{\href{http://dx.doi.org/#1}{doi: #1}}

and select the new style as your bibliography style:

\bibliographystyle{plainnatnoturl}


That’s it. Thanks to toliveira from tex.stackexchange!



  1. Also I’m scraping at my page limit, and cutting a line for roughly every second entry helps a lot :) 

complex number compiler and libc bugs (cexp+conj) on OSX and with the intel compiler (icc)

Today a bug in complex number handling surfaced in Guile that appeared only on OSX.

This is a short note just to make sure that the bug is reported somewhere.

Test-code (written mostly by Mark Weaver who also analyzed the bug - I only ran the code on a few platforms I happened to have access to):

// test.c
// compile with gcc -O0 -o test test.c -lm
// or with icc -O0 -o test test.c -lm
#include <complex.h>
#include <stdio.h>

int
main (int argc, char **argv)
{
  double complex z = conj (1.0);
  double complex result;

  if (argc == 1)
    z = conj (0.0);

  result = cexp (z);

  printf ("cexp (%f + %f i) => %f + %f i\n",
          creal (z), cimag (z), creal (result), cimag (result));
  result = conj (result);
  printf ("conj(cexp (%f + %f i)) => %f + %f i\n",
          creal (z), cimag (z), creal (result), cimag (result));

  return 0;
}
According to the C11 standard (pages 561 and 216) this should return:

cexp (0.000000 + -0.000000 i) => 1.000000 + -0.000000 i

conj(cexp (0.000000 + -0.000000 i)) => 1.000000 + 0.000000 i

Page 561:

— cexp(conj(z)) = conj(cexp(z)).

Page 216:

The conj functions compute the complex conjugate of z, by reversing the sign of its imaginary part.

On OSX it returns (compiled with GCC):

cexp (0.000000 + -0.000000 i) => 1.000000 + 0.000000 i

TODO: Check the second line!

With the intel compiler it returns:

cexp (0.000000 + 0.000000 i) => 1.000000 + 0.000000 i

conj(cexp (0.000000 + 0.000000 i)) => 1.000000 + 0.000000 i

In short: On OSX cexp seems broken. With the intel compiler conj seems broken.

icc --version
# => icc (ICC) 13.1.3 20130607
# => Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

The OSX compiler is GCC 4.8.2 from MacPorts.

[taylanub] ArneBab: You might want to add that compiler optimizations can result in cexp() calls where there are none (which is how this bug surfaced in our case).

[mark_weaver] cexp(z) = e^z = e^(a+bi) = e^a * e^(bi) = e^a * (cos(b) + i*sin(b))

[mark_weaver] for real 'b', e^(bi) is a point on the unit circle on the complex plane.

[mark_weaver] so cexp(bi) can be used to compute cos(b) and sin(b) simultaneously, and probably faster than calling 'sin' and 'cos' separately.
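The simultaneous sin/cos computation mark_weaver describes can be checked directly, with Python’s cmath standing in for the C functions (a quick illustration, not part of the bug):

```python
import cmath
import math

b = 0.5
e = cmath.exp(1j * b)  # e^(bi): a point on the unit circle

# |e^(bi)| = 1, the real part is cos(b), the imaginary part is sin(b)
assert math.isclose(abs(e), 1.0)
assert math.isclose(e.real, math.cos(b))
assert math.isclose(e.imag, math.sin(b))
```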

minimal Python script

Over the years I found a few things which in my opinion are essential for any Python script:

  • a description,
  • useful logging,
  • argument parsing and
  • doctests.

Everything in this setup is low-overhead and available from Python 2.6 to 3.x, so you can use it to start any kind of project.

# encoding: utf-8

"""Minimal setup for a Python script.

No project should start without this.
"""

import argparse # for Python <2.6 use optparse
# setup sane logging. It tells you why, where and when something was
# logged, so you can jump to the source line right away.
import logging
logging.basicConfig(level=logging.WARNING,
                    format=' [%(levelname)-7s] (%(asctime)s) %(filename)s::%(lineno)d %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S')

def main():
    """The main entry point."""
    pass

# output test results as base60 number (for aesthetics)
def numtosxg(n):
    # base60 alphabet: digits, uppercase without I and O,
    # underscore, lowercase without l (to avoid lookalikes)
    CHARACTERS = ('0123456789'
                  'ABCDEFGHJKLMNPQRSTUVWXYZ'
                  '_abcdefghijkmnopqrstuvwxyz')
    s = ''
    if not isinstance(n, int) or n == 0:
        return '0'
    while n > 0:
        n, i = divmod(n, 60)
        s = CHARACTERS[i] + s
    return s

def _test():
    """  run doctests, can include setup. Complex example:
    >>> import sys
    >>> handlers = logging.getLogger().handlers # to stdout
    >>> logging.getLogger().handlers = []
    >>> logging.getLogger().addHandler(
    ...     logging.StreamHandler(stream=sys.stdout))
    >>> logging.warn("test logging")
    test logging
    >>> logging.getLogger().handlers = handlers
    """
    from doctest import testmod
    tests = testmod()
    if not tests.failed:
        return "^_^ ({})".format(numtosxg(tests.attempted))
    else: return ":( "*tests.failed

# keep argument setup and parsing together

parser = argparse.ArgumentParser(description=__doc__.splitlines()[0])
parser.add_argument("arguments", metavar="args", nargs="*",
                    help="Commandline arguments")
parser.add_argument("--debug", action="store_true",
                    help="Set log level to debug")
parser.add_argument("--info", action="store_true",
                    help="Set log level to info")
parser.add_argument("--quiet", action="store_true",
                    help="Set log level to error")
parser.add_argument("--test", action="store_true",
                    help="Run tests")

# add a commandline switch to increase the log-level when running this
# script standalone. --test should run the tests.
if __name__ == "__main__":
    args = parser.parse_args()
    if args.debug:
        logging.getLogger().setLevel(logging.DEBUG)
    elif args.info:
        logging.getLogger().setLevel(logging.INFO)
    elif args.quiet:
        logging.getLogger().setLevel(logging.ERROR)
    if args.test:
        print(_test())
    main()

pyRad - a wheel type command interface for KDE

Arrrrrr! Ye be replacin' th' walk th' plank alt-tab wi' th' keelhaulin' pirate wheel, matey! — Lacrocivious

pyRad is a wheel type command interface for KDE1, designed to appear below your mouse pointer at a gesture.

install | setup | usage and screenshots | download and sources

pyRad command wheel


Install

in any distro

  • Get Python.
  • call easy_install pyRadKDE in any shell.
  • Test it by calling pyrad.py.
  • This should automatically pull in pyKDE4. If it doesn’t, you need to install that separately.
  • Visual icon selection requires the kdialog program (a standard part of KDE).

  • For a "live" version, just clone the pyrad Mercurial repo and let KDE run "path/to/repo/pyrad.py" at startup. You can stop a running pyrad via pyrad.py --quit. pyrad.py --help gives usage instructions.

In Gentoo

  • emerge -a kde-misc/pyrad

In unfree systems (like MacOSX and Windows)

  • I have no clue since I don’t use them. You’ll need to find out yourself or install a free system. Examples are Kubuntu for beginners and Gentoo for convenient tinkering. Both run GNU/Linux.


Setup

  • Run /usr/bin/pyrad.py. Then add it as a script to your autostart (systemsettings→advanced→autostart). You can now use Alt-F6 and Meta-F6 to call it.

Mouse gesture (optional)

  • Add the mouse gesture in systemsettings (systemsettings→shortcuts) to call D-Bus: Program: org.kde.pyRad ; Object: /MainApplication ; Function: newInstance (you might have to enable gestures in the settings, too - in the shortcuts-window you should find a settings button).

  • Alternately set the gesture to call the command dbus-send --type=method_call --dest=org.kde.pyRad /MainApplication org.kde.KUniqueApplication.newInstance.

Customize the wheel

Customize the menu by editing the file "$HOME/.pyradrc" or middle-clicking (add) and right-clicking (edit) items.

Usage and screenshots

To call pyRad and see the command wheel, you simply use the gesture or key you assigned.

pyRad command wheel

Then you can activate an action with a single left click. Actions can be grouped into folders. To open a folder, you also simply left-click it.

You can also press the keyboard key shown at the beginning of the tooltip to activate an action (hover the mouse over an icon to see the tooltip).

To make the wheel disappear or leave a folder, click the center or hit the key 0. To just make it disappear, hit escape.

For editing an action, just right click it, and you’ll see the edit dialog.

pyRad edit dialog

Each item has an icon (either an icon name from KDE or the path to an icon) and an action. The action is simply the command you would call in the shell (only simple commands, though, no real shell scripting or glob).

To add a new action, simply middle-click the action before it. The wheel goes clockwise, with the first item being at the bottom. To add a new first item, middle-click the center.

To add a new folder (or turn an item into a folder), simply click on the folder button, say OK and then click it to add actions in there.

See it in action:

pyRad in action (screenshot)

download and sources

pyRad is available from

PS: The name is a play on ‘python’, ‘Rad’ (German for wheel) and ‘pirate’ :-)

PPS: KDE, K Desktop Environment and the KDE Logo are trademarks of KDE e.V.

PPPS: License is GPL+ as with almost everything on this site.

  1. powered by KDE 

pyrad-0.4.3-screenshot.png (26.67 KB)
pyrad-0.4.3-screenshot-edit-action.png (36.28 KB)
pyrad-0.4.3-screenshot-edit-folder.png (39.18 KB)
pyrad-0.4.3-screenshot2.png (29.03 KB)
pyrad-0.4.3-screenshot3.png (27.59 KB)
powered_by_kde_horizontal_190.png (11.96 KB)
pyrad-0.4.3-fullscreen.png (913.3 KB)
pyrad-0.4.3-fullscreen-400x320.png (143.69 KB)
pyrad-0.4.4-screenshot-edit-action.png (40.94 KB)

pyRad is now in Gentoo portage! *happy*

My wheel type command interface pyRad just got included in the official Gentoo portage-tree!

So now you can install it in Gentoo with a simple emerge kde-misc/pyrad.

pyRad command wheel

Many thanks go to the maintainer Andreas K. Hüttel (dilfridge), to jokey and Tommy[D] from the Gentoo sunrise project (wiki) for providing their user-overlay and helping users create ebuilds, and to Arfrever, neurogeek and floppym from the Gentoo Python herd for helping me clean up the ebuild and convert it to EAPI 3!

shell basics (bash)

These are the notes to a short tutorial I gave to my working group as part of our groundwork group meetings. Some parts here require GNU Bash.

1 Outline

1.1 Outline

  • user-output: echo
  • pipes: |, xargs, - (often stdin)
  • text-processing: cat/tac, sed, grep, cut, head/tail
  • variables (foo=1; echo ${foo})
  • subshell: $(command)
  • loops (for; do; done) (while; do; done)
  • conditionals (if; then; fi)
  • scripts: shebang
  • return values: $?
  • script-arguments: $1, $#, $@ and getopt
  • command chaining: ;, &, && and ||
  • functions and function-arguments
  • math: $((1+2))
  • help: man and info

2 Notes

2.1 user-output

echo "foobar"
echo foobar
echo echo # second echo not executed but printed!

2.2 Pipes

  • basic way of passing info between programs
echo foobar | xargs echo
# same output as
echo foobar
echo foo > test.txt # pipe into file, replacing the content
echo bar >> test.txt # append to file
# warning: the redirection truncates the file before cat reads it
cat test.txt > test.txt # defined as generating an empty file!

2.3 text-processing

echo foobar | sed s/foo.*/foo/ | xargs echo
# same output as 
echo foo
echo foo | grep bar # empty
echo foobar | grep oba # foobar, oba highlighted

2.4 Variables

foo=1 # no spaces around the equal sign!
echo ${foo} # "$foo" == "1", "$foobar" == "", "${foo}bar" == "1bar"

2.5 Subshells

echo $(echo foobar)
# equivalent to
echo foobar | xargs echo

2.6 loops

for i in a b c; do 
    echo $i
done
# ; can replace a linebreak
for i in a b c; do echo $i; done
for i in {1..5}; do # 1 2 3 4 5
    echo $i
done
while true; do 
    break # break: stop the loop
done
# continue: start the next iteration of the loop

2.7 Quoting

echo "${foo}" # 1
echo '${foo}' # ${foo} <- literal string
for i in "a b c"; do # quoted: one argument
    echo ${i}; 
done
# => a b c
for i in a b c; do # unquoted: whitespace is separator!
    echo ${i}; 
done
# a
# b
# c

2.8 conditionals

# string equality
if [[ x"${a}" == x"${b}" ]] ; then
    echo a
    echo b
fi
# other tests
if test -z ""; then 
    echo empty
fi
if [ -z "" ]; then
    echo same check
fi
if [ ! -z "not empty" ]; then
    echo inverse check
fi
if test ! -z "not empty"; then
    echo inverse check with test
fi
if test 5 -ge 2; then
    echo 5 is greater or equal 2
fi

also check test 1 -eq 1, and info test.

2.9 scripts: shebang/hashbang

#!/usr/bin/env bash
echo "Hello World"

# save as hello.sh, then make it executable and run it:
chmod +x hello.sh
./hello.sh

2.10 Scripts: return value

echo 1
echo $? # 0: success
grep 1 /dev/null # fails
echo $? # 1: failure
exit 0 # exit a script with success value (no further processing of the script)
exit 1 # exit with failure (anything but 0 is a failure)

2.11 define shell arguments with getopt

# info about this script
version="shell option parsing example 0.1"
# check for the kind of getopt
getopt -T > /dev/null
if [ $? -eq 4 ]; then
    # GNU enhanced getopt is available
    eval set -- `getopt --name $(basename $0) --long help,verbose,version,output: --options hvo: -- "$@"`
else
    # Original getopt is available
    eval set -- `getopt hvo: "$@"`
fi

# # actually parse the options
# PROGNAME=`basename $0`
# ARGS=`getopt --name "$PROGNAME" --long help,verbose,version,output: --options hvo: -- "$@"`
# if [ $? -ne 0 ]; then
#   exit 1
# fi
# eval set -- $ARGS

# default options
HELP=no
VERBOSE=no
VERSION=no
OUTPUT=no

# check, if the default wisp exists and can be executed. If not, fall
# back to wisp.py (which might be in PATH).
if [ ! -x "$WISP" ]; then
    WISP=wisp.py
fi

while [ $# -gt 0 ]; do
    case "$1" in
        -h | --help)        HELP=yes;;
        -o | --output)      OUTPUT="$2"; shift;;
        -v | --verbose)     VERBOSE=yes;;
        --version)          VERSION=yes;;
        --)              shift; break;;
    esac
    shift
done
# all other arguments stay in $@

2.12 act on options

# Provide help output

if [[ $HELP == "yes" ]]; then
    echo "$0 [-h] [-v] [-o FILE] [- | filename]
        Show commandline option parsing.

        -h | --help)        This help output.
        -o | --output)      Save the executed wisp code to this file.
        -v | --verbose)     Provide verbose output.
        --version)          Print the version string of this script."
    exit 0
fi

if [[ x"$VERSION" == x"yes" ]]; then
    echo "$version"
    exit 0 # script ends here
fi

if [[ ! x"$OUTPUT" == x"no" ]]; then
    echo writing to $OUTPUT
fi

# just output all other arguments
if [ $# -gt 0 ]; then
    echo $@
fi

2.13 default help output formatting

# ... means that you can specify something multiple times
# short and long options
prog [-h | --help] [-v | --verbose] [--version] [-f FILE | --file FILE] 
# concatenated short options
hg help [-ec] [THEMA] # hg help -e -c == -ec

2.14 Common parameters for commands

prog --help # provide help output. Often also -h
prog --version # version of the program. Often also -v
prog --verbose # often to give more detailed information. Also --debug

These options are expected by convention and required by the minimal GNU coding standards.

2.15 Command chaining

echo 1 ; echo 2 ; echo 3 # sequential
echo 1 & echo 2 & echo 3 # backgrounding: possibly parallel

grep foo test.txt && echo foo is in test.txt # conditional: Only if grep is successful
grep foo test.txt || echo foo is not in test.txt # conditional: on failure

2.16 Math (bash-builtin)

a=2; b=3
echo $((1+2)) # 3
echo $((a*b)) # 6
echo $((a**$(echo 3))) # 8

2.17 help

man [command]
info [topic]
info [topic subtopic]
# emacs: C-h i

more convenient info:

function i() {
    if [[ "$1" == "info" ]]; then
        info --usage -f info-stnd
    else
        # check for usage from fast info, if that fails check man and
        # if that also fails, just get the regular info page.
        info --usage -f "$@" 2>/dev/null || man "$@" || info "$@"
    fi
}

turn files with wikipedia syntax to html (simple python script using mediawiki api)

I needed to convert a huge batch of mediawiki-files to html (had a 2010-03 copy of the now dead limewire wiki lying around). With a tip from RoanKattouw in #mediawiki@freenode.net I created a simple python script to convert arbitrary files from mediawiki syntax to html.


  • Download the script and install the dependencies (yaml and python 3).
  • ./parse_wikipedia_files_to_html.py <files>

This script is not written for speed (do you know how slow a webrequest is, compared to even horribly inefficient code? …): the only optimization is for programming convenience — the advantage of that is that it’s just 47 lines of code :)

It also isn’t perfect: it breaks at some pages (and informs you about that).

It requires yaml and Python 3.x.

#!/usr/bin/env python3

"""Simply turn all input files to html. 
No errorchecking, so keep backups. 
It uses the mediawiki webapi, 
so you need to be online.

Copyright: 2010 © Arne Babenhauserheide
License: You can use this under the GPLv3 or later, 
         if you add the appropriate license files
         → http://gnu.org/licenses/gpl.html
"""

from urllib.request import urlopen
from urllib.parse import quote
from urllib.error import HTTPError, URLError
from time import sleep
from random import random
from yaml import load
from sys import argv

mediawiki_files = argv[1:]

def wikitext_to_html(text):
    """parse text in mediawiki markup to html."""
    url = "http://en.wikipedia.org/w/api.php?action=parse&format=yaml&text=" + quote(text, safe="") + " "
    f = urlopen(url)
    y = f.read()
    text = load(y)["parse"]["text"]["*"]
    return text

for mf in mediawiki_files:
    with open(mf) as f:
        text = f.read()
    HTML_HEADER = "<html><head><title>" + mf + "</title></head><body>"
    HTML_FOOTER = "</body></html>"
    try:
        text = wikitext_to_html(text)
        with open(mf, "w") as f:
            f.write(HTML_HEADER + text + HTML_FOOTER)
    except HTTPError:
        print("Error converting file", mf)
    except URLError:
        print("Server doesn’t like us :(", mf)
    # add a random wait, so the api server doesn’t kick us
    sleep(random())
parse_wikipedia_files_to_html.py.txt (1.47 KB)


When free speech dies, we need a place to organize.

Freenet is a censorship resistant, distributed p2p-publishing platform.

Too technical? Let’s improve that: Freenet is the internet's last, best hope for Freedom. Join now:


It lets you anonymously share files, browse and publish “freesites”, chat on forums and even do microblogging, using a generic Web of Trust, shared by different plugins, to avoid spam. For really careful people it offers a “darknet” mode, where users only connect to their friends, which makes it very hard to detect that they are running Freenet.

The overarching design goal of Freenet is to make censorship as hard as technically possible. That’s the reason for providing anonymity (else you could be threatened with repercussions - as seen in the case of the wikileaks informer from the army in the USA), building it as a decentralized network (else you could just shut down the central website, as people tried with wikileaks), providing safe pseudonyms and caching of the content on all participating nodes (else people could censor by spamming or overloading nodes) and even the darknet mode and enhancements in usability (else Freenet could be stopped by just prosecuting everyone who uses it, or it would reach too few people to be able to counter censorship in the open web).

I don’t know anymore what triggered my use of freenet initially, but I know all too well what keeps me running it instead of other anonymizers:

I see my country (Germany) turning more and more into a police-state, starting with attacks on p2p, continuing with censorship of websites (“we all know child-porn is bad, so it can’t be bad to censor it, right? Sure we could just make the providers delete it, so no one can access it, but… no, we have to censor it, so only people who can use google can find it – which luckily excludes us, because we are not pedocriminals.”) and leading into directions I really don’t like.

And in case the right for freedom of speech dies, we need a place where we can organize to get it back and fight for the rights laid out in our constitution (the Grundgesetz).

When free speech dies, we need a place to organize.

And that’s what Freenet is to me.

A technical way to make sure we can always organize, acting under Article 20 of our constitution (german link — google translated version): the right to oppose everyone who wants to abolish our constitutional order.

PS: New entries on my site are also available in freenet (via freereader: downloads RSS feeds and republishes them in freenet).

PPS: If you like this text, please redent/retweet the associated identi.ca/twitter notices so it spreads:

50€ for the Freenet Project - and against censorship

As I pledged1, I just donated to freenet 50€ of the money I got back because I cannot go to FilkCONtinental. Thanks go to Nemesis, a proud member of the “FiB: Filkers in Black” who will take my place at the Freusburg and fill these old walls with songs of stars and dreams - and happy laughter.

It’s a hard battle against censorship, and as I now had some money at hand, I decided to do my part (freenetproject.org/donate.html).

  1. The pledge can be seen in identi.ca and in a Sone post in freenet (including a comment thread; needs a running freenet node (install freenet in a few clicks) and the Sone plugin). 

A bitcoin-marketplace using Freenet?

A few days ago, xor, the developer of the Web of Trust in Freenet got in contact with the brain behind the planned Web of Trust for Openbazaar, and toad, the former maintainer of Freenet questioned whether we would actually want a marketplace using Freenet.

I took a few days to ponder the question, and I think a marketplace using Freenet would be a good idea - for Freenet as well as for society.

Freenet is likely the most secure way of implementing a digital market, which means it can work safely for small sums, but not for large ones - except if you can launder huge amounts of digital money. As such it is liberating for small people, but not for syndicates. For example a drug cartel needs to be able to turn lots of money into clean cash to pay henchmen abroad. Since you can watch bitcoin more easily than cash, and an anonymous network makes it much harder to use scare-tactics against competing sellers, moving the marketplace from the street to the internet weakens syndicates and other organized crime by removing part of their options for creating a monopoly by force.

If a bitcoin marketplace with some privacy for small-scale users should become a bigger problem than the benefit it brings by weakening organized crime, any state or other big player can easily force the majority of users to reveal their identities by using the inherent traceability of bitcoin transactions.

Also the best technologies in freenet were developed (or rather: got to widespread use), because it had to actually withstand attacks.

Freenet as marketplace with privacy for small people equivalent to cash-payments would also help improve its suitability for whistleblowers - see hiding in the forest: A better alternative.

For free speech this would also help, because different from other solutions, freenet has the required properties for that: a store with lifetime depending on the popularity of content, not the power of the publisher, which provides DoS-resistant hosting without the need to have a 24/7 server, stable and untraceable pseudonyms (ignoring fixable attack-vectors) and an optional friend-to-friend darknet.

In short: A decentralized ebay-killer would be cool and likely beneficial to Freenet and Free Speech without bringing actual benefit for organized crime.

Also this might be what is needed to bring widespread darknet adoption.

And last but not least, we would not be able to stop people from implementing a marketplace over freenet: Censorship resistance also means resistance against censorship by us.

Final note: Openbazaar is written in Python and Freenet has decent Python Bindings (though they are not beautiful everywhere), so it should not be too hard to use it for Openbazaar. A good start could be the WoT-code written for Infocalypse in last years GSoC: Web of Trust integration as well as private messaging.

freenet_logo.png (16.72 KB)
freenet-banner.png (3.39 KB)

A deterministic upper bound for the network load of the fully decentralized Freenet spam filter

Goal: Improve the decentralized spam filter in Freenet (WoT) to have deterministic network load, bounded to a low, constant number of subscriptions and fetches.

This article provides calculations which show that decentralized spam filtering with privacy through pseudonyms can scale to communication systems that connect all of humanity. It is also applicable to other systems than Freenet, see use in other systems.

Originally written as a comment to bug 3816. The bug report said "someone SHOULD do the math". I then did the math. Here I’m sharing the results.

Useful prior reading is Optimizing a distributed spam filter for Freenet.

This proposal has two parts:

  1. Ensuring an upper bound on the network cost, and
  2. Limiting the cost due to checking stale IDs.


  • ID, "identity" or "pseudonym" is a user account. You can have multiple.
  • OwnID is one of your own identities, a pseudonym you control.
  • Trust is a link from one ID (a) to another ID (b). It has a numerical value.
    • Positive values mean that (a) considers (b) to be a constructive contributor.
    • Negative values mean that (a) thinks that (b) is trying to disrupt communication.
  • Key is an identifier you can use as part of a link to download data. Every ID has one key.
  • Editions are the versions of keys. They are increased by one every time a key is updated.
  • Fetch means to download some data from some key for some edition.
  • Subscription is a lightweight method to get informed if a key was updated to a new edition.
  • Edition hints are part of an ID. They show for each trusted ID (b) which edition of it was last seen by the trusting ID (a).
  • The rank of an ID describes the number of steps needed to get from your OwnID to that ID when following trust paths.
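The rank definition above amounts to a breadth-first search over the positive-trust edges; a minimal sketch (the trust_graph layout is made up for illustration, the real WoT data structures differ):

```python
from collections import deque

def ranks(own_id, trust_graph):
    """Breadth-first search from own_id over positive-trust edges.

    trust_graph maps each ID to the IDs it gave positive trust to
    (an illustrative layout, not the actual WoT storage).
    """
    rank = {own_id: 0}
    queue = deque([own_id])
    while queue:
        a = queue.popleft()
        for b in trust_graph.get(a, ()):
            if b not in rank:
                rank[b] = rank[a] + 1  # one more trust step away
                queue.append(b)
    return rank

graph = {"own": ["alice", "bob"], "alice": ["carol"], "carol": ["dave"]}
r = ranks("own", graph)
# alice and bob are rank 1, carol rank 2, dave rank 3
```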


  • N the number of identities the OwnID gave positive trust. Can be assumed to be bounded to 150 active IDs (as by Dunbar’s number).⁰
  • M a small constant for additional subscriptions, e.g. 10.
  • F a small constant for additional fetches per update, e.g. 10.

⁰: https://en.wikipedia.org/wiki/Dunbar's_number - comment by bertm: that assumes all statements of "OwnID trusts ID to not be a spammer" to be equivalent to "OwnID has a stable social relationship with ID". I'm not quite sure of that equivalence. That said, for purposes of analysis, we can well assume it to be bounded by O(1).

Limit network load with a constant upper bound


Subscribe to all rank 1 IDs (which have direct trust from your OwnID). These are the primary subscriptions. There are N primary subscriptions.

All the other IDs are split into two lists: rank2 (secondary IDs) and rank3+ (three or more steps to reach them). Only a subset of those get subscriptions, and the subset is regularly changed:

  • Subscribe to the M rank2 IDs which were most recently updated. These have the highest probability of being updated again. The respective list must be updated whenever a rank2 ID is fetched successfully (the ordering might change).
  • Subscribe to the M rank3+ IDs which were most recently updated. The respective list must be updated whenever a rank3+ ID is fetched successfully (the ordering might change).
  • Subscribe to M rank2 IDs chosen at random (secondary subscriptions). When a secondary or random subscription yields an update, replace it with another ID of rank2, chosen at random.
  • Subscribe to M IDs of rank 3 or higher chosen at random (random subscriptions). When a random subscription yields an update, replace it with another rank3+ ID, chosen at random.

Also replace one of the randomly chosen rank2 and rank3+ subscriptions every hour. This ensures that WoT will always eventually see every update.
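The bookkeeping for one pool of random subscriptions can be sketched roughly like this (all names are made up for illustration; the real WoT implementation differs):

```python
import random

M = 10  # number of random subscriptions to keep in this pool

def replace_random_subscription(subscribed, candidates, updated=None):
    """Drop one subscription and refill the pool with random candidates.

    Called when a random subscription yielded an update (drop that one),
    and once per hour with updated=None (rotate out a random one), so
    every ID is eventually subscribed at some point.
    """
    if updated is None and subscribed:
        updated = random.choice(sorted(subscribed))  # hourly rotation
    subscribed.discard(updated)
    pool = [c for c in candidates if c not in subscribed]
    while len(subscribed) < M and pool:
        pick = random.choice(pool)
        pool.remove(pick)
        subscribed.add(pick)
    return subscribed

# initial fill from 100 candidate IDs
subs = replace_random_subscription(set(), ["id%d" % i for i in range(100)])
```

The pool size never exceeds M, which is what keeps the subscription count (and hence the network load) constant.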

If any subscription yields an update, download its key and process all edition hints. Queue these as fetches in separate queues for rank1 (primary), rank2 (secondary), and rank3+ (random), and process them independently.

At every update of a subscription (rank1, rank2, or rank3+), choose F fetches from the respective edition hint fetch queue at random and process them. This bounds the network load to ((N × F) + (4M × F)) × update frequency.

These fetches and subscriptions must be deduplicated: If we already have a subscription, there’s no use in starting a fetch, since the update will already have been seen.

Calculating the upper bound of the cost

To estimate an upper bound for the fetch frequency, we can use the twitter frequency, which is about 5 tweets per day on average and 10 to 50 for people with many followers¹ (those are more likely to be rank1 IDs of others).

There are two possible extremes: Very hierarchic trust structure and egalitarian trust structure. Reality is likely a power-law structure.

  • In a hierarchic trust structure, we can assume that rank1 or rank2 IDs (trustee subscriptions) are all people with many followers, so we use 22 updates per day (as by ¹).
  • In an egalitarian trust structure we can assume 5 updates per day (as by ¹).

For high frequency subscriptions (most recently updated) we can assume 4 updates per hour for 16 hours per day, so 64 updates per day.⁰ For random subscriptions we can assume 5 updates per day (as by ¹).

¹: http://blog.hubspot.com/blog/tabid/6307/bid/4594/Is-22-Tweets-Per-Day-the-Optimum.aspx ← on the first google page, not robust, but should be good enough for this usecase.

((N × F) + (M × F)) × trustee update frequency + 2M × F × high update frequency + M × F × random update frequency.

For a very hierarchic WoT (primaries are very active) this gives the upper bound:

= (150 × 10 × 22) + (10 × 10 × 22) + (10 × 10 × 64) + (10 × 10 × 5) + (10 × 10 × 64)
= (1500 × 22) + (100 × 22) + (100 × 64) + (100 × 5) + (100 × 64)
= 33000 + 2200 + 6400 + 500 + 6400 # primary triggered + random rank2 + active rank2 + random rank3+ + active rank3+
= 48500 fetches per day
~ 34 fetches per minute.

For an egalitarian trust structure (primaries have average activity) this gives the upper bound:

= (150 × 10 × 5) + (10 × 10 × 5) + (10 × 10 × 64) + (10 × 10 × 5) + (10 × 10 × 64)
= (1500 × 5) + (100 × 5) + (100 × 64) + (100 × 5) + (100 × 64)
= 7500 + 500 + 6400 + 500 + 6400 # primary triggered + random rank2 + active rank2 + random rank3+ + active rank3+
= 21300 fetches per day
~ 15 fetches per minute.
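The two bounds can be recomputed with a short script (N = 150 primary IDs, M = 10 IDs per subscription pool, F = 10 hint fetches per update, frequencies as above; the function name is mine):

```python
def fetches_per_day(n, m, f, trustee_freq, high_freq=64, random_freq=5):
    """Upper bound on fetches per day for the subscription scheme.

    n: primary (rank1) IDs, m: size of each subscription pool,
    f: edition hint fetches processed per subscription update.
    """
    primary = n * f * trustee_freq       # primary triggered
    random_rank2 = m * f * trustee_freq  # random rank2 subscriptions
    active_rank2 = m * f * high_freq     # most recently updated rank2
    random_rank3 = m * f * random_freq   # random rank3+ subscriptions
    active_rank3 = m * f * high_freq     # most recently updated rank3+
    return primary + random_rank2 + active_rank2 + random_rank3 + active_rank3

hierarchic = fetches_per_day(150, 10, 10, trustee_freq=22)   # 48500
egalitarian = fetches_per_day(150, 10, 10, trustee_freq=5)   # 21300
print(hierarchic, round(hierarchic / (24 * 60)))    # ≈ 34 fetches per minute
print(egalitarian, round(egalitarian / (24 * 60)))  # ≈ 15 fetches per minute
```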

This gives plausible upper bounds of the network load per day from this scheme. The upper bound for a very hierarchic trust structure is dominated by the primary subscriptions. The upper bound for an egalitarian trust structure is dominated by the primary subscriptions and the high frequency subscriptions.

The rank2 subscriptions and the random subscriptions together make up about 5% of the network load. They are needed to guarantee that the WoT always eventually converges to a globally consistent view.

One fetch for an ID transfers about 1KiB data. For a hierarchic WoT (one fetch per two seconds) this results in a maximum bandwidth consumption on a given node of 1KiB/s × hops. This is about 5KiB/s for the average of 5 hops — slightly higher than our minimum bandwidth. For an egalitarian WoT this results in a maximum bandwidth consumption on a given node of 0.5KiB/s × hops. This is about 2.5KiB/s for the average of 5 hops — 60% of our minimum bandwidth. The real bandwidth requirement should be lower, because IDs are cached very well.

The average total number of subscriptions to active IDs should be bounded to 190.

⁰: The cost of active IDs might be overestimated here, because WoT has an upper bound of one update per hour. In this case the cost of this algorithm would be reduced by about 30% for the egalitarian structure and by about 10% for the hierarchic structure.

Prune subscriptions to stale IDs to improve the rank2+ update detection delay to (less than) O(N), with N the number of known active IDs

The process to check IDs with rank >= 2 can be improved from essentially checking them at random (with the real risk of missing IDs — there is no guarantee to ever check them all, not even networkwide) to having each active ID check all IDs in O(N) steps (with N the number of IDs).


When removing a random subscription to an ID with rank2 or higher, with 50% probability add the ID+currentversion to a blocklist which avoids processing this same ID with this or a lower version again and prune it from the WoT.¹

When receiving a version hint from another ID with a higher version than the one which is blocked, the ID is removed from the blocklist.

The total cost in memory is on the order of the number of old IDs already checked, bounded to O(N), the number of Identities.

¹: Pruning the ID from WoT is not strictly necessary on the short term. However on the long term (a decade and millions of users), we must remove information.
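A minimal sketch of the blocklist rule; the function names and the dict-based blocklist are my illustration, not WoT's actual data structures:

```python
import random

# Illustrative sketch of the pruning rule described above.
def on_random_subscription_removed(identity, version, blocklist, rng=random):
    # With 50% probability, block this ID at its current version, so it is
    # not re-checked until a hint for a higher version arrives.
    if rng.random() < 0.5:
        blocklist[identity] = version

def on_edition_hint(identity, hinted_version, blocklist):
    # A hint for a higher version than the blocked one unblocks the ID.
    if identity in blocklist and hinted_version > blocklist[identity]:
        del blocklist[identity]

def is_blocked(identity, version, blocklist):
    # Blocked means: this version or a lower one is on the blocklist.
    return identity in blocklist and version <= blocklist[identity]
```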

Expected effect

Assume that 9k of the 10k IDs in WoT are stale (a reasonable assumption, because only about 300 IDs are inserted from an up to date version of WoT right now).

When replacing one random rank2 and one random rank3+ subscription per hour, that yields about 16k subscription replacements per year, or (in a form which simplifies the math) about two replacements per ID in the WoT.

Looking at only a single ID:

For the first replacement there is a 90% probability that the ID in question is stale, and a 50% probability that it will be put on the blocklist if it is stale, which yields a combined 45% probability that the number of stale IDs decreases by one. In other words, it takes on average 2.2 steps to remove the first stale ID from the IDs to check.

As a rough estimate, for 10 IDs it would take 15 steps to prune 5 of the 9 stale ones. Scaling this up gives an estimate of the time required for 9k IDs: after about 15k steps (one year), half the stale IDs should be on the blocklist.
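The rough estimate can be recomputed by summing the means of the geometric waiting times, keeping the checking pool at a constant size of 10 as in the estimate above (the function name is mine):

```python
def expected_steps_to_prune(stale, total, prune):
    """Expected replacement steps to blocklist `prune` stale IDs, assuming
    each step hits a stale ID with probability stale/total and blocks it
    with probability 0.5 (pool size kept constant, as in the rough
    estimate above)."""
    steps = 0.0
    for k in range(prune):
        p = 0.5 * (stale - k) / total
        steps += 1 / p  # mean of a geometric distribution
    return steps

print(round(expected_steps_to_prune(9, 10, 1), 1))  # ≈ 2.2 steps for the first
print(round(expected_steps_to_prune(9, 10, 5)))     # ≈ 15 steps for five
```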

Looking at the whole network

For a given stale ID, after one year there is roughly a 50% chance that it is on the blocklist of a given active ID. But the probability that it is on the blocklist of every active ID is just about 0.5^k, with k the number of active IDs. So when there is an update to this previously stale ID, it is almost certain that some ID will see it and remove it from the blocklists of most other IDs within O(N) steps by providing an edition hint (this will accelerate as more stale IDs are blocked).

Rediscovering inactive IDs when they return

I am sure that there is a beautiful formula to calculate exactly the proportion of subscriptions to stale IDs we’ll have with this algorithm once it has entered a steady state, and the average discovery time for a previously stale ID to be seen networkwide once it starts updating again. To show that this algorithm should work, we only need a much simpler answer, though:

How long will it take an ID which was inactive for 10 years to be seen networkwide again (if its direct trusters are all inactive, else the primary subscriptions would detect and spread its update within minutes)?

After 10 years, the ID will be on the blocklist of 99.9% of the IDs. In a network with 10k active IDs, that means that only about 10 IDs did not block it yet¹. Every year there is a 50% probability for each of the IDs that the update will be seen.

Therefore detection of the update to an ID which was inactive for 10 years and whose direct trusters are all inactive will take about 10 weeks. Then the update should spread rapidly via edition hints.

¹: There is a 7% probability that 15 or more IDs could still see it and a 1.2% probability that less than 5 IDs still see it. The probability that only a single ID did not block it yet is just 0.005%. In other words: if 99% of IDs became inactive and then active again after 10 years, approximately one would need about two years to be seen, and most would be detected again within 10 weeks. Therefore this scheme is robust against long-term inactivity.
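The 10-week figure follows from the combined detection rate of the roughly 10 IDs which have not blocked the returning ID, as a simple exponential-waiting-time approximation:

```python
# Roughly 10 IDs have not blocked the returning ID after 10 years,
# and each of them sees the update with 50% probability per year.
remaining_ids = 10
p_per_year = 0.5
combined_rate = remaining_ids * p_per_year  # ≈ 5 detections per year
mean_weeks = 52 / combined_rate             # mean waiting time
print(round(mean_weeks, 1))  # ≈ 10.4 weeks
```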


This algorithm can give the distributed spam filter in Freenet a constant upper bound in cost without limiting interaction.

A vision for a social Freenet with WoT, FreeTalk and Sone

I let my thoughts wander a bit around the question of how a social Freenet (2.0 ;) ) could look from the view of a newcomer.

I imagine myself installing freenet. The first thing to come up after starting it is the node page. (Italic text in brackets is a comment. The links need a running Freenet to work.)

“Welcome to Freenet, where no one can tell you’re reading”

“Freenet tries hard to protect your privacy. Therefore we created a pseudonymous ID for you. Its name is Gandi Schmidt. Visit the [your IDs site] to see a legend we prepared for you. You can use this legend as fictional background for your ID, if you are really serious about staying anonymous.”

(The name should be generated randomly for each ID. A starting point for that could be a list of scientists from around the world compiled from Wikipedia (link needs Freenet). The same should be true for the legend, though it is harder to generate. The basic information should be a quote (people remember that), a job and sex, the country the ID comes from (maybe correlated with the name), and a hobby.)
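A sketch of how such a generator could look. All the lists here are made-up placeholders, not the proposed Wikipedia-derived data:

```python
import random

# Hypothetical sketch; all lists are made-up placeholders, not the
# proposed Wikipedia-derived lists of scientists.
SCIENTIST_NAMES = ["Gandi Schmidt", "Mira Okafor", "Jonas Petrov"]
QUOTES = ["Simplicity is the ultimate sophistication."]
JOBS = ["teacher", "engineer", "librarian"]
COUNTRIES = ["Brazil", "Finland", "Vietnam"]
HOBBIES = ["chess", "gardening", "photography"]

def generate_identity(rng=random):
    """Pick every part of the legend at random, so the selection itself
    reveals nothing about the user behind the ID."""
    return {
        "name": rng.choice(SCIENTIST_NAMES),
        "quote": rng.choice(QUOTES),
        "job": rng.choice(JOBS),
        "country": rng.choice(COUNTRIES),
        "hobby": rng.choice(HOBBIES),
    }
```

Picking every field independently at random matters: hand-picking the nicest result from many trials would leak the user's taste, as the text below warns.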

“During the next few restarts, Freenet will ask you to solve various captchas to prove that you are indeed human. Once enough other nodes have successfully confirmed that you are human, you will gain write access to the forums and microblogging. This might take a few hours to a few days.”

(as soon as the ID has sufficient trust, automatically activate posting to FreeTalk, Sone and others. Access is delayed to ensure that when people talk they can get answers)

“Note that other nodes don’t know who you are. They don’t know your IP, nor your real identity. The only thing they know is that you exist, that you can solve captchas and how to send you a message.”

“You can create additional IDs at any time and give them any name and legend you choose by adding them on the WebOfTrust page. Each new ID has to verify for itself that it’s human, though. If you carefully keep them separate, others can only find out with a lot of effort that your IDs are related. Mind your writing style. When in doubt, keep your sentences short. To make it easier for you to stay anonymous, you can autogenerate name and legend at random. Don’t use the nicest from many random trials, else you can be traced by the kind of random IDs you select.”

“While your humanity is being confirmed, you can find a wealth of content on the following indexes, some published anonymously, some not. If you want to publish your own anonymous site, see Upload a Freesite. The list of indexes uses dynamic bookmarks. You get notified whenever a bookmarked site (like the indexes below) gets updated.”

“Note: If you download content from freenet, it is being cached by other nodes. Therefore popular content is faster than rare content and you cannot overload nodes by requesting their data over and over again.”

“You are currently using medium security in the range from low to high.”

“In this security level, separated IDs are no perfect protection of your anonymity: other members might not be able to see what you do in Freenet, but they can know that you use Freenet in the first place, and corporations or governments with medium-sized infrastructure can launch attacks which might make it possible to trace your contributions and accesses. If you want to disappear completely from the normal web and keep your Freenet usage hidden, as well as make it very hard to trace your contributions, so that you can really exercise your right of free speech without fearing repercussions, you can use Freenet as Darknet: the more secure but less newcomer-friendly way to use Freenet. The current mode is Opennet.”

“To enter the Darknet, you add people you know and trust personally as your darknet friends. As soon as you have enough trusted friends, you can increase the security level to high and freenet will only connect to your trusted friends, making you disappear from the regular internet. The only way to tell that you are using freenet will then be to force your ISP to monitor all traffic coming from your computer.”

“And once transport plugins are integrated, steganography will come into reach and allow masking your traffic as regular internet usage, making it very hard to distinguish freenet from encrypted internet-telephony. If you want to help making this a reality in the near future, please consider contributing or donating to freenet.”

“Welcome to the pseudonymous web where no one can know who you are, but only that you are always using the same ID — if you do so.”

“To show this welcome message again, you can at any time click on Intro in the links.”

What do you think? Would this be a nice way to integrate WoT, FreeTalk, Sone and general user education in a welcome message, while adding more incentive to keep the node running?

PS: Also posted in the Freenet Bugtracker, in Freetalk and in Sone – the last two links need a running Freenet to work.

PPS: This vision is not yet a reality, but all the necessary infrastructure is already in place and working in Freenet. You can already do everything described in here, just without the nice guide and the level of integration (for example activating plugins once you have proven your humanity, which equals enough trust by others to be actually seen).

Anonymous code collaboration with Mercurial and Freenet

Anonymous DVCS in the Darknet.

There is a new Mercurial extension for interaction with Freenet called "infocalypse" (which should keep working after the information apocalypse).

It offers "fn-push" and "fn-pull" as an optimized way to store code in freenet: bundles are inserted and pulled one after the other. An index tells infocalypse in which order to pull the bundles. It makes using Mercurial in freenet far more efficient and convenient.

Real Life Infocalypse
easy setup of infocalypse (script)
distributed, anonymous development

Also you can use it to publish collaborative anonymous websites like the freefaq and Technophob.

And it is a perfect fit for the workflow automatic trusted group of committers.

Otherwise it offers the same features as FreenetHG.

The rest of the article is concerned with the older FreenetHG extension. If you need to choose between the two, use Infocalypse: its concept for sharing over Freenet is more robust.

Using FreenetHG you can collaborate anonymously without having to give everyone direct write access to your code.

To work with others, you simply setup a local repository for your own work and use FreenetHG to upload your code automatically into Freenet under your private ID. Others can then access your code with the corresponding public ID, do their changes locally and publish them in their own anonymous repository.

You then pull changes you like into your repository and publish them again under your key.

FreenetHG uses freenet, which offers pseudonymity to make anonymous communication more secure, and Mercurial, which allows for efficient distributed collaboration.

With pseudonymity you can't find out whom you're talking to, but you know that it is the same person; and with distributed collaboration you don't need to let people write to your code directly, since every code repository is a full clone of the main repository.

Even if the main repository goes down, every contributor can still work completely unhindered, and if someone breaks things in their repository, you can simply decide not to pull their changes.

What you need

To use FreenetHG you obviously need a running freenet node and a local Mercurial installation. Also you need the FreenetHG plugin for Mercurial and PyFCP which provides Python bindings for Freenet.

  • get FreenetHG (the link needs a running freenet node)
  • alternatively just do

    hg clone static-,E3S1MLoeeeEM45fDLdVV~n8PCr9pt6GMq0tuH4dRP7c,AQACAAE/freenethg/1/

Setup a simple anonymous workflow

To guide you through the steps, let's assume we want to create the anonymous repository "AnoFoo".

After you got all dependencies, you need to activate the FreenetHG plugin in your ~/.hgrc file

[extensions]
freenethg = path/to/FreenetHG.py

You can get the FreenetHG.py from the freenethg website or from the Mercurial repository you cloned.

Now you set up your AnoFoo Mercurial repository:

hg init AnoFoo

As a next step we need some additional sections in the .hg/hgrc file of the repository. The setup wizard creates them for us.
Now we enter the repository and use the setup wizard

cd AnoFoo
hg fcp-setupwitz

The setup wizard asks us for the username to use for this repository (to avoid accidentally breaking our anonymity), the address of our freenet instance, and the path to our repository on freenet.

The default answers should fit. The only one where we have to set something else is the project name. There we enter AnoFoo.

Since we don't yet have a freenet URI for the repository, we just answer '.' to let FreenetHG generate one for us. That's also the default answer.

The commit hook makes sure that we don't commit with any username other than the selected one.

Also the wizard will print a line like the following:

Request uri is: USK@xlZb9yJbGaKO1onzwawDvt5aWXd9tLZRoSoE17cjXoE,zFqFxAk15H-NvVnxo69oEDFNyU9uNViyNN5ANtgJdbU,AQACAAE/freenethg_test/1/

This is the line others can use to clone your project and pull from it.

And with this we have finished setting up our anonymous collaboration repository.

When we commit, every commit will directly be uploaded into Freenet.

So now we can pass the freenet Request uri to others, who can clone our repository and set up their own repositories in freenet. When they add something interesting, we pull the data from their Request uri and merge their code with ours.

Setup a more convenient anonymous workflow

This workflow is already useful, but it's a bit inconvenient to have to wait after each commit until your changes have been uploaded. So we'll now change this basic workflow a bit to be able to work more conveniently.

First step: clone our repository to a backup location:

hg clone AnoFoo BackFoo

Second step: change our .hg/hgrc to only update when we push to the backup repository, and add the default-push path to the backup repository:

[paths]
default-push = ../BackFoo

[hooks]
pretxncommit = python:freenethg.username_checker
outgoing = python:freenethg.updatestatic_hook

[ui]
username = anonymuse

[freenethg]
commitusername = anonymuse
inserturi = USK@VERY_LONG_PRIVATE_KEY/AnoFoo/1/

Changes: We now have a default-push path, and we changed the "commit" hook to an "outgoing" hook, which is invoked every time changes leave this repository. It will also be invoked when someone pulls from this repo, but not when we clone it locally.

Now our commits roll as fast as we're used to from other Mercurial repositories and freenethg will make sure we don't use the wrong username.

When we want to anonymously publish the repository we then simply use

hg push

This will push the changes to the backup repository and then upload them to our anonymous repository.

And now we have finished setting up our repository and can begin using an anonymous and almost infinitely scalable workflow which only requires our freenet installation to be running when we push the code online.

One last touch: if an upload happens to fail, you can always repeat it manually with

hg fcp-uploadstatic

Time to go

...out there and do some anonymous coding (Maybe with the workflow automatic trusted group of committers).

Happy hacking!

And if this post caught your interest or you want to say anything else about it, please write a comment.

Also please have a look at and vote for the wish to add a way to contribute anonymously to freenet, to make it secure against attacks on developers.

And last but not least: vote for this article on digg and on yigg.

Answers to “I can't use Freenet”

Short answers to questions from a message in the anonymous Freenet Message System:

Ultra-short answer: Go to https://freenetproject.org/pages/download.html and run the installer. It’s fast and easy.

Now onward to the message:

psst@GdwO… wrote :

ArneBab@-jtT… wrote : Yes. And that’s one of the reasons why we need Freenet: to wrestle back control over our communication channel.

Good luck getting people to use it though.

Yes, that’s something we need to fix. And there’s a lot we can do for that. It’s just a lot of boring work.

Let’s go through your points and see which we could fix:

I can't use Freenet. It's illegal! It isn't? How do you know?

It’s created by a registered tax-exempt charity¹, how can it be illegal?

I don't want people to think I'm some kind of paranoid nutjob.

Maybe we should add some quotes from the Guardian on the frontpage, and maybe also quote the CNN news about Freenet as a counterpoint?

»You don't need to be talking to a terror suspect to have your communications data analysed by the NSA. The agency is allowed to travel "three hops" from its targets — who could be people who talk to people who talk to people who talk to you. Facebook, where the typical user has 190 friends, shows how three degrees of separation gets you to a network bigger than the population of Colorado. How many people are three "hops" from you?« — The Guardian in NSA files decoded, 2013.

»There is now no shield from forced exposure. . . The foundation of Groklaw is over. . . the Internet is over« – Groklaw, Forced Exposure (2013-08-20)

»This is the most visible line in the sand for people: Can they see my dick?« — »When your junk was passed by Gmail (to a foreign server), the NSA caught a copy of that.« — John Oliver and Edward Snowden in Last Week Tonight: Government Surveillance, 2015, quoted by engadget in Snowden shows John Oliver how the NSA can see your dick pics.

»there is no central server and no one knows who's using it so it can not be shut down … where there is a message it is likely to find a medium.« — CNN about Freenet, 2005-12-19.

Why don't you grow up, and just accept that you have to be ruled by authority? It's just the way the world works!

Democracy without free press is meaningless. Let’s quote some presidents on this.

»The liberty of the press is essential to the security of freedom in a state: it ought not, therefore, to be restrained in this commonwealth.« — John Adams, 1780, second president of the USA.

»When people talk of the Freedom of Writing, Speaking, or thinking, I cannot choose but laugh. No such thing ever existed. No such thing now exists; but I hope it will exist. But it must be hundreds of years after you and I shall write and speak no more.« — John Adams Letter to Thomas Jefferson (15 July 1817)

»No experiment can be more interesting than that we are now trying, and which we trust will end in establishing the fact, that man may be governed by reason and truth. Our first object should therefore be, to leave open to him all the avenues to truth. The most effectual hitherto found, is the freedom of the press.« — Thomas Jefferson, third president of the USA, in a letter to Judge John Tyler (June 28, 1804)

»Our liberty depends on the freedom of the press, and that cannot be limited without being lost.« — Thomas Jefferson, letter to Dr. James Currie (28 January 1786) Lipscomb & Bergh 18:ii.

»What makes it possible for a totalitarian or any other dictatorship to rule is that people are not informed; how can you have an opinion if you are not informed?« — Hannah Arendt, 1974

»And that is why our press was protected by the First Amendment — the only business in America specifically protected by the Constitution — … to inform, to arouse, to reflect, to state our dangers and our opportunities, to indicate our crises and our choices, to lead, mold, educate and sometimes even anger public opinion.« — John F. Kennedy, 35th president of the USA, Address before the American Newspaper Publishers Association (27 April 1961)

»Without general elections, without freedom of the press, freedom of speech, freedom of assembly, without the free battle of opinions, life in every public institution withers away, becomes a caricature of itself, and bureaucracy rises as the only deciding factor.« — Rosa Luxemburg, Reported in Paul Froelich, Die Russische Revolution (1940).

»A popular Government without popular information, or the means of acquiring it, is but a Prologue to a Farce or a Tragedy, or perhaps both.« — James Madison, fourth president of the USA, in a letter to W.T. Barry (1822-08-04).

»A critical, independent and investigative press is the lifeblood of any democracy.« — Nelson Mandela on freedom of expression, At the international press institute congress (14 February 1994).

»we believe that when governments censor or control information, that ultimately that undermines not only the society, but it leads to eventual encroachments on individual rights as well.« — Barack Obama, 44th president of the USA, in Rangoon, Burma on November 14, 2014

»If in other lands the press and books and literature of all kinds are censored, we must redouble our efforts here to keep them free.« — Franklin D. Roosevelt, 32nd president of the USA, Address to the National Education Association (30 June 1938).

»The liberty of the press is no greater and no less than the liberty of every subject of the Queen.« — Lord Russell of Killowen, Reg. v. Gray (1900), L. R. 2 Q. B. D. 40.

… and many more by Wikiquote: Freedom of the press.

There's no need for Freenet, because nothing is wrong, otherwise my daily commute in my gas guzzler and my TV would be bad, and I like those!

You don’t have to change your life to use Freenet. You do harm yourself quite a bit if you let others control your communication, though. They might make you think your life is bad.

Get a life, you neckbeard.

Let’s play some games on Freenet. We need more fun and life here, that’s true.

Why are you being so distrustful and negative? What are you hiding?

Did you see what they did to Edward Snowden?

If I use it, then I'm helping terrorists blow us up!

If you let terrorists listen in on your communication, you help them scout out their targets!

It's slow!

Let’s not advertise sending movies. Chat over Freenet is nice (FLIP/FLIRCP).

I have to install two programs?

Need to recover flircp and enable it by default. Also advertise node-to-node textmessages (friend-to-friend talk).

Same for Sharesite and Darknet Chat.

I'm not good with computers!

Freenet is easier to install than Starcraft!

im confuse can i install without thinking loll??? I don't care enough to bother.

Yes you can. Most times it actually works.

My computer says it's a dangerous virus!

Need to get fred whitelisted in more anti-virus databases… the new C# based installer should help.

I'm not a hacker!

I don’t break into computers either. And I don’t want others to publish what I tell you in private.

Is there an app for my iPhone?

There is something for your Android: https://f-droid.org/repository/browse/?fdid=co.loubo.icicle

Can't you just send me the files on Skype?

Sure, but I won’t send anything I wouldn’t also send to the local newspaper. Microsoft has been shown to actually try to use login links sent over Skype.

I don't have time for this I have to go to work.

Just try again a few weeks or months later.

Short term solutions (stuff which should take less than 6 months to deploy):


  • put more prominently on front page that Freenet Project Inc. is a registered charity.
  • quote the guardian or so about the importance of secure communication.
  • quote a US president and the UN secretary on the importance of free speech for democracy.
  • quote Edward Snowden.
  • quote someone on the importance of secure communication to fight terrorists.
  • make the download page look easy. Maybe a big button instead of a text-link?
  • link the icicle app on the webpage. With an image.
  • promote the use of node-to-node messages in friend-to-friend mode.
  • ask people every few months to try to invite their friends again. Hey, how about sending another note to your friends today?

Using Freenet

  • get more positive, friendly content on Freenet.
  • play fun games over Freenet.

Freenet development

  • recover flircp. Make flircp and Darknet Chat official. Activated by default.
  • polish the user interface. A lot.


Go to https://freenetproject.org/pages/download.html and run the installer. Send your friends there, too. It’s fast and easy. And gives you a confidential communication channel.

Originally published on random_babcom: my in-Freenet single-page blog.

  1. The Freenet Project Inc is a 501(c)(3) non-profit organization, with the mission "to assist in developing and disseminating technological solutions to further the open and democratic distribution of information". It is registered under EIN 95-4864038. 

Background of Freenet Routing and the probes project (GSoC 2012)

The probes project is a Google Summer of Code project by Steve Dougherty intended to optimize the network structure of freenet. Here I will briefly give the background of his project:

The Small World Structure

Freenet organizes nodes by giving them locations - like coordinates. The nodes know some others and can send data only to those to which they are connected directly. If your node wants to contact someone it does not know directly, it sends a message to one of the nodes it knows and asks that one to forward the message. The decision whom to ask to forward the message is part of the routing.

And the routing algorithm in Freenet assumes a small world network: your node knows many people who are close to you and a few who are far away. Imagine that as knowing many people in your home town and few in other towns. There is mathematical proof that the routing is very efficient and scales to billions of users - if it really operates on a small world network.

So each freenet node tries to organize its connections in such a way that it is connected to many nodes close by and some far away.⁽¹⁾ The structure of the local connections of your own node can be characterized by the link length distribution: “How many short and how many long connections do you have?”

Probes and their Promise

The probes project from Steve analyzes the structure of the network and the structure of the local connections of nodes in an anonymous way, to improve the self-organization algorithm in freenet. The reason: if the structure of the network is not a small world network, the routing algorithm becomes much less efficient.

That in turn means that if you want to get some data on the network, that data has to travel over far more intermediate nodes, because freenet cannot determine the shortest route. And if the data has to travel over more nodes, it consumes more bandwidth and takes longer to reach you. In the worst case it could happen that freenet does not find the data at all.

To estimate the effect of that, you can look at the bar chart The Seeker linked to:


Low is an ideal structure with 16 connections per node, Conforming is the measured structure with about 17 connections per node (a cluster with 12, one with ~25). Ideally we would want Normal with 26 connections per node and an ideal structure. High is 86 connections. The simulated network sizes are 6000 nodes (Small), 18 000 (Normal, as measured), 36 000 (Large). Fewer hops is better.

It shows how many steps a request has to take to find some content. “Conforming” is the actually measured structure. “Low”, “normal” and “high” show the number of connections per node in an optimal network: 16, 26 and 86. The actually measured mean number of connections in freenet is similar to “low”, so that is the bar with which we need to compare the “conforming” bar to see the effect of the suboptimal structure. And that effect is staggering: a request needs about twice as many steps in the real network as it would in an optimally structured network.

Practically: if freenet managed to get closer to the optimal structure, it could double its speed and cut the reaction times in half. Without changing anything else - and also without changing the local bandwidth consumption: you would simply get your content much faster.

If we managed to increase the mean number of connections to about 26 (what a modern DSL connection can handle without too many ill effects), we could double the speed and halve the reaction times again (but that requires more bandwidth in the nodes which currently have a low number of connections: many have only about 12 connections, many have about 25, and few have something in between).

Essentially that means we could gain a factor of 2 to 4 in speed and reaction times. And better scalability (compare the normal and the large network).

Note ⁽¹⁾: Network Optimization using Only Local Knowledge

To achieve a good local connection-structure, the node can use different strategies for Opennet and Darknet (this section is mostly guessed, take it with a grain of salt. I did not read the corresponding code).

In Opennet it can check whether it finds nodes which would improve its local structure. If it finds one, it can replace the local connection which distorts its local structure the most with the new connection.

In Darknet on the other hand, where it can only connect to the folks it already knows, it looks at the locations of nodes it hears about. It then checks whether its local connections would be better if it had that other node’s location. In that case, it asks the other node whether it agrees to swap locations (without changing any real connections: It only changes the notion of where it lives. As if you swapped flats with someone else, but without changing who your friends are. Afterwards both the other one and you live closer to your respective friends).

In short: In Opennet, Freenet changes whom it is connected to in order to achieve a small-world structure: It selects its friends based on where it lives. In Darknet it swaps its location with strangers to live closer to its friends.
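The darknet swap can be illustrated with a toy model (my own simplification of the idea above, not Freenet’s actual code - the real algorithm is more subtle and uses randomized acceptance): two nodes swap their locations on the circular [0, 1) keyspace when the swap shortens the combined distance to their respective friends.

```python
def circle_dist(a, b):
    """Distance on the circular keyspace [0, 1)."""
    d = abs(a - b)
    return min(d, 1 - d)

def swap_cost(loc, friend_locs):
    """Total distance from a location to all friends."""
    return sum(circle_dist(loc, f) for f in friend_locs)

def should_swap(loc_a, friends_a, loc_b, friends_b):
    """Greedy rule: swap when it lowers the combined cost of both nodes."""
    before = swap_cost(loc_a, friends_a) + swap_cost(loc_b, friends_b)
    after = swap_cost(loc_b, friends_a) + swap_cost(loc_a, friends_b)
    return after < before

# Node A lives at 0.9 but its friends cluster near 0.1; node B lives
# at 0.1 with friends near 0.9 - swapping helps both of them.
print(should_swap(0.9, [0.1, 0.12], 0.1, [0.88, 0.9]))  # → True
```

Nobody has to move a real connection for this: only the notion of “where I live” changes, exactly as in the flat-swapping analogy above.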

freenet-probes-size-degree-chart.png (13.94 KB)

Bootstrapping the Freenet WoT with GnuPG - and GnuPG with Freenet


When you enter the freenet Web of Trust, you first need to get some trust from people by solving captchas. And even when people trust you somehow, you have no way to prove your identity in an automatic way, so you can’t create identities which freenet can label as trusted without manual intervention from your side.


To change this, we can use the Web of Trust used in GnuPG to infer trust relationships between freenet WoT IDs.

Practically that means:

  • Write a message: “I am the WoT ID USK@” (replace with the public key of your WoT ID).
  • Sign that message with a GnuPG key you want to connect to the ID. The signature proves that you control the GnuPG key.
  • Upload the signed message to your WoT key: USK@/bootstrap/0/gnupg.asc. To make this upload, you need the private key of the ID, so the upload proves that you control the WoT ID.

Now other people can download the file from you, and when they trust the GnuPG key, they can transfer their trust to the freenet WoT-ID.


Ideally all this should be mostly automatic:

  • Click a link in the freenet interface and select the WoT ID to have freenet create the file and run your local GnuPG program.
  • Then select your GnuPG key in the GnuPG program and enter your password.
  • Finally check the information to be inserted and press a button to start the upload.

As soon as you have a GnuPG key connected with your WoT ID, freenet should scout all other WoT IDs for GnuPG keys and check whether the local GnuPG key you assigned to your WoT ID trusts the other key. If yes, it can grant automatic trust (real person → likely no spammer).
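The inference rule boils down to very little code. A hedged sketch (the names are my own, this is not the actual WoT plugin API): once a WoT ID has published a bootstrap file with a valid signature, granting trust reduces to a lookup in the set of key fingerprints my own GnuPG key has signed.

```python
def infer_wot_trust(keys_signed_by_me, other_fingerprint, signature_valid):
    """Grant automatic trust when the bootstrap file of the other WoT ID
    carries a valid signature from a GnuPG key I have signed myself."""
    return signature_valid and other_fingerprint in keys_signed_by_me

# Example fingerprints; in practice these come from the local keyring.
signed = {"ABCD1234ABCD1234"}
print(infer_wot_trust(signed, "ABCD1234ABCD1234", True))   # → True
print(infer_wot_trust(signed, "FFFF0000FFFF0000", True))   # → False
```

The actual signature verification would be done by the local GnuPG installation; only the final trust decision is shown here.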


To make the connection one-way (bootstrap the WoT from GnuPG, but not expose the key), you might be able to encrypt the message to all people who signed your GnuPG key. Then they can recognize you, but others cannot.

This will lose you the indirect trust in the GnuPG web-of-trust, though.

I hope this bootstrap-WoT draft sounded interesting :)

Happy hacking!

Building the darknet one ref at a time

Building the darknet one ref at a time. That’s what we have to do. If you invite three people⁰ to Freenet and help those of your friends with similar interests to connect¹ ², and when the people you invited then do the same, we get exponential growth.

⁰: To invite a friend into Freenet, you can send an email like this:
    Let us talk over Freenet, so I can speak freely again.

¹: Helping your friends to connect works as follows:

  1. ask: First ask your friends whether they want to connect to others. Just go to the friends page ( ), tick the checkbox next to each of the friends you want to ask and click the drop-down list at the bottom named -- Select action --. Select Send N2NTM to selected peers³ and click Go. A text field opens with which you can send a message to all the peers you selected. I typically ask something like "Hi, do you want to connect via darknet to fellow pirate party members?" (replace "pirate party members" by whatever unites the group of people you’re asking).
  2. noderefs: Go to the friends page in advanced mode ( ). There you find a link named noderef next to each name. Just download the noderefs of the people who want to connect.
  3. introduction file: Then copy them into a text file and add a short description of each person before the person’s noderef.
  4. upload: Now upload that text file. I use freenetupload from pyFreenet for that, but regular insert via the browser ( ) works as well. When the upload finishes, you’ll find the link on the uploads page ( - see the column key).
  5. message: Go to the friends page again (I’m lazy and use simple mode: ), tick the checkbox next to each of the friends you want to help connect and click the drop-down list at the bottom named -- Select action --. Select Send N2NTM to selected peers and click Go. A text field opens with which you can send a message to all the peers you selected.
  6. write and send: Write something like "The following link includes the noderefs of people you might want to connect to. Just copy the noderef (from 'identity' to 'End') into the text field on if you want to connect. If both of you do that, your freenet nodes will connect". Copy the link to the uploaded introduction text file into the text field (below your text) and click Send message.

²: Only connect those with similar interests (who might in the real world meet in a club or at work or who are related by blood or association). This is needed for efficient routing in Freenet.

When free speech dies, we need a place to organize. Let’s build that place.

³: An N2NTM is a Node-To-Node Text Message: A confidential message sent between people whose Freenet nodes are connected as friends.

Thanks for this text goes to ts.

De-Orchestrating Freenet with the QUEEN program

So Poul-Henning Kamp thought this was just a thought experiment …

At FOSDEM 2014, Poul-Henning Kamp talked about a hypothetical “Project ORCHESTRA” by the NSA with the goal of disrupting internet security: Information, Slides, Video (with some gems not in the slides).

One of the ideas he mentioned was the QUEEN program: Psy-Ops for Nerds.

I’ve been a contributor to the Freenet Project for several years. And in that time, I experienced quite a few of these hypothetical tactics first-hand.

This is the list of good matches: Disruptive actions which managed to keep Freenet from moving onwards, often for several months. It’s quite horrifying how many there are. Things which badly de-orchestrated Freenet:

  • Steer discussions to/from hot spots (“it can’t be that hard to exchange a text file!” ⇒ noderef exchange fails all the time, which is the core of darknet!)
  • Disrupt consensus building: Horribly long discussions which cause the resolution to be forgotten due to a fringe issue.
  • “Secrecy without authentication is pointless”.
  • “It gives a false sense of security” (if you tailor [these kinds of things] carefully, they speak to people's political leanings: If it’s not perfect: “No, that wouldn’t do it”. This stopped many implementations, till finally Bombe got too fed up and started the simple and working microblogging tool Sone)
  • “you shouldn’t do that! Do you really know what you are doing? Do you have a PhD in that? The more buttons you press, the more warnings you get” ← this is “filter failed”: No, I don’t understand this, “get me out of that!” ⇒ Freenet downloads fail when the filter failed.
  • Getting people to not do things by misdirecting their attention on it. Just check the Freenet Bugtracker for unresolved simple bugs with completely fleshed out solutions that weren’t realized.
  • FUD: I could be supporting bad content! (just like you do if your provider has a transparent proxy to reduce outgoing bandwidth - or with any VPN, Tor, i2p, .... Just today I read this: « you seriously think people will ever use freenet to post their family holiday photos, favourite recipes etc? … can you envisage ordinary people using freenet for stuff where they don't really have anything to hide? » — obvious answer: I do that, so naturally other people might do it, too.)
  • “Bikeshed” discussions: Sometimes just one single email from an anonymous person can derail a free software project for months!
  • Soak mental bandwidth with bogus crypto proposals: PSKs? (a new key-proposal which could make forums scale better but actually just soaked up half a year of the main developer’s time and wasn’t implemented - and in return, critical improvements for existing forums were delayed)
  • Witless volunteers (overlooking practical advantages due to paranoia, theoretical requirements which fail in the real world, overly pessimistic stance which scares away newcomers, voicing requirements for formal specification of protocols which are in flux).
  • Affect code direction (lots of the above - also ensuring that there is no direction, so it doesn’t really work well for anybody, because it tries to have the perfect features for everybody before actually getting a reasonable user experience).
  • Code obfuscation (some of the stuff is pretty bad, lots of it looks like it was done in a hurry, because there was so much else to do).
  • Misleading documentation (or outdated or none…: There is plenty of Freenet 0.5 documentation while 0.7 is actually a very different beast)
  • Deceptive defaults (You have to set up your first pseudonym by hand, load two plugins manually and solve CAPTCHAs before you are able to talk to people anonymously; darknet does not work out of the box; the connection speed when given no unit is interpreted as Bytes/s - I’m sure someone once voiced a reason for that)

Phew, quite a list…

I provided this because naming the problems is an important step towards resolving them. I am sure that we can fix most of this, but it’s important to realize that while many of the points I named are most probably homegrown, it is quite plausible that some of them were influenced from the outside. Freenet was always a pretty high profile project in the crypto community, so it is an obvious target. We’d be pretty naive to think that we weren’t targeted.

And we have to keep this in mind when we communicate: We don’t only have to look out for bad code, but also for influences which make us take up toxic communication patterns which keep us from moving forward.

The most obvious fix is: Stay friendly, stick together, keep honest and greet every newcomer as a potential ally. And call out disrupting behaviour early on: If someone insults new folks or takes up huge amounts of discussion time by rehashing old discussions instead of talking about the way forward - in a way which actually leads to going forward - then say that this is your impression. Still stay friendly: Most of the time that’s not intentional. And people can be affected by outside influences like someone attacking them in other channels, so it would be important to help them recover and not to push them away because their behaviour became toxic for some time (as long as the time investment for that is not overarching).

Overall it’s about keeping the community together despite the knowledge that some of us might actually be aggressors or influenced from the outside to disrupt our work.

Distributed censorship-resistant Wikipedia

Thanks to doublec, there are now distributed censorship-resistant Wikipedia mirrors in Freenet: Distributed Wikipedia Mirrors in Freenet

The current largest mirror is the Simple English Wikipedia (the obvious choice to fight censorship worldwide: it is readable with basic English skills).

With this mirror, information from Wikipedia can be accessed in high-censorship countries:


To access the site, install Freenet from https://freenetproject.org (or get the installer from someone). If you run it on the default port, you can access the mirror anonymously via the following link:

Censorship-resistant Simple English Wikipedia

To test this without installing Freenet, see

(this one is not anonymous!)

Effortless password protected sharing of files via Freenet

TL;DR: Inserting a file into Freenet using the key KSK@<password> creates an invisible, password protected file which is available over Freenet.

Often you want to exchange some content only with people who know a given password and make it accessible to everyone in your little group but invisible to the outside world.

Until yesterday I thought that problem was fairly complex, because everyone in your group needs a given encryption program, and you need a way to share the file without exposing the fact that you are sharing it.

Then I learned two handy facts about Freenet:

  • Content is invisible to all but those with the key
    <ArneBab> evanbd: If I insert a tiny file without telling anyone the key, can they get the content in some way?
    <evanbd> ArneBab: No.

  • You generate a key from a password by using a KSK-key
    <toad_> dogon: KSK@<any string of text> -> generate an SSK private key from the hash of the text
    <toad_> dogon: if you know the string, you can both insert and retrieve it

In other words:

Just inserting a file into Freenet using the key KSK@<password> creates an invisible, password protected file which is shared over Freenet.

The file is readable and writeable by everyone who knows the password (within limits¹), but invisible to everyone else.

To upload a file as KSK, just go to the filesharing tab, click “upload a file”, switch to advanced mode and enter the KSK key.

Or simply click here (requires freenet to be running on your computer with default settings).
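For the curious, here is what such an insert looks like on the wire. The node speaks the Freenet Client Protocol (FCP, normally on port 9481); the sketch below only builds the ClientPut message. Treat the exact field layout as an assumption from my memory of the FCP v2 protocol, and check the FCP documentation before relying on it - the ClientHello handshake and the socket handling are left out to keep it self-contained.

```python
def build_ksk_put(password, data, identifier="ksk-share"):
    """Build an FCP ClientPut message that inserts data under KSK@password.
    Field names follow the FCP v2 protocol as I remember it (assumption)."""
    header = "\n".join([
        "ClientPut",
        "URI=KSK@" + password,
        "Identifier=" + identifier,
        "UploadFrom=direct",
        "DataLength=%d" % len(data),
        "Data",
        "",  # yields the trailing newline after "Data", before the payload
    ])
    return header.encode("utf-8") + data

msg = build_ksk_put("our-secret-password", b"hello group")
print(msg.decode("utf-8", errors="replace"))
```

Anyone who knows “our-secret-password” can then fetch KSK@our-secret-password; everyone else never learns the file exists.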

It’s strange to think that I only learned this after more than 7 years of using Freenet. How many more nuggets might be hidden there, just waiting for someone to find them and document them in a style which normal users understand?

Freenet is a distributed datastore which can find and transfer data efficiently on restricted routes (search for meshnet scaling to see why that type of routing is really hard), and it uses a WebOfTrust for real-life spam-resistance without the need for a central authority (look at your mailbox to see how hard that is, even with big money).

How many more complex problems might it already have solved as byproduct of the search for censorship resistance?

So, what’s still to be said? Well, if Freenet sounds interesting: Join in!

  1. A KSK is writeable with the limit that you cannot replace the file while people still have it in their stores: You have to wait till it has been displaced, or be aware that two states of the file now exist: One with your content and one with the old. Better to define a series of KSKs: Add a number to the KSK, and when you want to write, simply insert the next one. 

Exact Math to the rescue - with Guile Scheme

I needed to calculate the probability that for every freenet user there are at least 70 others in a distance of at most 0.01. That needs binomial coefficients with n and k on the order of 4000. My old Python script failed me with an OverflowError: integer division result too large for a float. So I turned to Guile Scheme and exact math.

1 The challenge

I need the probability that within 4000 random numbers between 0 and 1, at least 70 are below 0.02.

Then I need the probability that within 4000 random numbers, at most 5 find less than 70 others to which the distance is at most 0.02.

Or more exactly: I need to find the right maximum length to replace the 0.02.

2 The old script

I had a Python-script lying around which I once wrote for estimating the probability that a roleplaying group will have enough people to play in a given gaming night.

It’s called spielfaehig.py (german for “able to play”).

It just does this:

from math import factorial
fac = factorial
def nük(n, k): 
   if k > n: return 0
   return fac(n) / (fac(k)*fac(n-k))

def binom(p, n, k): 
   return nük(n, k) * p** k * (1-p)**(n-k)

def spielfähig(p, n, min_spieler): 
   try:
      return sum([binom(p, n, k) for k in range(min_spieler, n+1)])
   except ValueError: return 1.0

Now when I run this with p=0.02, n=4000 and min_spieler=70, it returns

OverflowError: integer division result too large for a float

The reason is simple: There are some intermediate numbers which are much larger than what a float can represent.

3 Solution with Guile

To fix this, I rewrote the script in Guile Scheme:

#!/usr/bin/env guile-2.0

(define-module (spielfaehig)
  #:export (spielfähig))
(use-modules (srfi srfi-1)) ; for iota with count and start

(define (factorial n)
  (if (zero? n) 1 
      (* n (factorial (1- n)))))

(define (nük n k)
  (if (> k n) 0
      (/ (factorial n) 
         (factorial k) 
         (factorial (- n k)))))

(define (binom p n k)
  (* (nük n k) 
     (expt p k) 
     (expt (- 1 p) (- n k))))

(define (spielfähig p n min_spieler) 
  (apply + 
         (map (lambda (k) (binom p n k)) 
              (iota (1+ (- n min_spieler)) min_spieler))))

To use this with exact math, I just need to call it with p as exact number:

(use-modules (spielfaehig))
(spielfähig #e.03 4000 70)
;           ^ note the #e - this means to use an exact representation
;                           of the number

; To make Guile show a float instead of some huge division, just
; convert the number to an inexact representation before showing it.
(format #t "~A\n" (exact->inexact (spielfähig #e.03 4000 70)))

And that’s it. Automagic hassle-free exact math is at my fingertips.

It just works and uses less than 200 MiB of memory - even though the intermediate factorials return huge numbers. And huge means huge: it effortlessly handles numbers on the order of 10⁸⁰⁰⁰. That is 10 to the power of 8000 - a number with 8000 digits.
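For comparison, Python 3 can also do this exact computation with fractions.Fraction from the standard library (a sketch of mine, separate from the spielfaehig.py script above; math.comb needs Python 3.8+). Summing the complement keeps the number of terms small:

```python
from fractions import Fraction
from math import comb  # exact binomial coefficients, Python 3.8+

def spielfaehig_exact(p, n, min_spieler):
    """P(at least min_spieler successes in n trials), computed exactly
    when p is a Fraction. Sums the complement (k < min_spieler)."""
    below = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                for k in range(min_spieler))
    return 1 - below

# The same call as the Guile version: p = 3/100, n = 4000, at least 70.
result = spielfaehig_exact(Fraction(3, 100), 4000, 70)
print(float(result))  # convert for display, like exact->inexact in Guile
```

So the float overflow above was a limitation of the original script, not of the language - but Guile makes exactness the default, which is exactly what I wanted here.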

4 The Answer

42! :)

The real answer is 0.0125: That’s the maximum length we need to choose for short links to get more than a 95% probability that in a network of 4000 nodes there are at most 5 nodes for which there are less than 70 peers with a distance of at most the maximum length.

If we can assume 5000 nodes, then 0.01 is enough. And since this is the number we directly got from an analysis of our link length distribution, it is the better choice, though it will mean that people with huge bandwidth cannot always max out their 100 connections.

5 Conclusion

Most of the time, floats are OK. But there are the times when you simply need exact math.

In these situations Guile Scheme is a lifesaver.

Dear GNU Hackers, thank you for this masterpiece!

And if you were crazy enough to read till here, Happy Hacking to you!

2014-07-21-Mo-exact-math-to-the-rescue-guile-scheme.org (4.41 KB)

Exploring the probability of successfully retrieving a file in freenet, given different redundancies and chunk lifetimes

In this text I want to explore the behaviour of the degrading yet redundant anonymous file storage in Freenet. It only applies to files which were not subsequently retrieved.

Every time you retrieve a file, it gets healed which effectively resets its timer as far as these calculations here are concerned. Due to this, popular files can and do live for years in freenet.

1 Static situation

First off, we can calculate the retrievability of a given file with different redundancy levels, given fixed chunk retrieval probabilities.

Files in Freenet are cut into segments which are again cut into up to 256 chunks each. With the current redundancy of 100%, only half the chunks of each segment have to be retrieved to get the whole file. I call that redundancy “2x”, because it inserts data 2x the size of the file (actually that’s just what I used in the code and I don’t want to force readers - or myself - to make mental jumps while switching from prose to code).

We know from the tests done by digger3, that after 31 days about 50% of the chunks are still retrievable, and after 30 days about 30%. Let’s look how that affects our retrieval probabilities.

# encoding: utf-8
from spielfaehig import spielfähig
from collections import defaultdict
data = []
res = []
for chunknumber in range(5, 105, 5):...
byred = defaultdict(list)
for num, prob, red, retrieval in data:...
csv = "; num prob retrieval"
for red in byred:...

# now plot the files

plotcmd = """
set term png
set width 15
set xlabel "chunk probability"
set ylabel "retrieval probability"
set output "freenet-prob-redundancy-2.png"
plot "2.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
set output "freenet-prob-redundancy-3.png"
plot "3.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
set output "freenet-prob-redundancy-4.png"
plot "4.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
"""
with open("plot.pyx", "w") as f:...

from subprocess import Popen
Popen(["pyxplot", "plot.pyx"])

So what does this tell us?


Retrieval probability of a given file in a static case. redundancy 100% (2)

redundancy 200% (3)

Retrieval probability of a given file in a static case. redundancy 200% (3)

redundancy 300% (4)

Retrieval probability of a given file in a static case. redundancy 300% (4)

This looks quite good. After all, we can push the lifetime as high as we want by just increasing redundancy.

Sadly it is also utterly wrong :) Let’s try to get closer to the real situation.

2 Dynamic Situation: The redundancy affects the replacement rate of chunks

To find a better approximation of the effects of increasing the redundancy, we have to stop looking at freenet as a fixed store and have to start seeing it as a process. More exactly: We have to look at the replacement rate.

2.1 Math

A look at the stats from digger3 shows us that after 4 weeks 50% of the chunks are gone. Let’s call this the dropout rate. The dropout rate consists of churn and chunk replacement:

dropout = churn + replacement

Since after one day the dropout rate is about 10%, I’ll assume that the churn is lower than 10%. So for the following parts, I’ll just ignore the churn (naturally this is wrong, but since the churn is not affected by redundancy, I just take it as constant factor. It should reduce the negative impacts of increasing redundancy). So we will only look at replacement of blocks.

Replacement consists of new inserts and healing of old files.

replacement = insert + healing

If we increase the redundancy from 2 to 3, the insert and healing rate should both increase by 50%, so the replacement rate should increase by 50%, too. The healing rate might increase a bit more, because healing can now restore 66% of the file as long as at least 33% are available. I’ll ignore that, too, for the time being (which is wrong again. We will need to keep this in mind when we look at the result).

redundancy 2 → 3 ⇒ replacement rate × 1.5

Increasing the replacement rate by 50% should decrease the lifetime of chunks by 1/1.5, or:

chunk lifetime × 2/3

So we will be at the 50% limit not after 4 weeks, but after 10 days. But on the other hand, redundancy 3 only needs 33% chunk probability, which has 2× the lifetime of 50% chunk probability. So the file lifetime should change by 2×2/3 = 4/3:

file lifetime × 4/3 = file lifetime +33%

Now doesn’t that look good?

As you can imagine, this pretty picture hides a clear drawback: The total storage capacity of Freenet gets reduced by 33%, too, because now every file requires 1.5× as much space as before.

2.2 Caveats (whoever invented that name? :) )

We ignored churn, so the chunk lifetime reduction should be a bit less than the estimated 33%. That’s good and life is beautiful, right? :)

NO. We also ignored the increase in the healing rate. This should be higher, because every retrieved file can now insert more of itself in the healing process. If there were no new inserts, I would go as far as saying that the healing rate might actually double with the increased redundancy. So in a completely filled network without new data, the effects of the higher redundancy and the higher replacement rate would exactly cancel - but the higher redundancy would be able to store fewer files. Since we are constantly pushing new data into the network (for example via discussions in Sone), this should not be the case.

2.3 Dead space

Aside from hiding some bad effects, this simple model also hides a nice effect: A decreased amount of dead space.

First off, let’s define it:

2.4 What is dead space?

Dead space is the part of the storage space which cannot be used for retrieving files. With any redundancy, that dead space is just about the size of the original file without redundancy multiplier. So for redundancy 2, the storage space occupied by the file is dead, when less than 50% are available. With redundancy 3, it is dead when less than 33% are available.

2.5 Effect

That dead space is replaced like any other space, but it is never healed. So the higher replacement rate means that dead space is recovered more quickly. So, while a network with higher redundancy can store fewer files overall, those files which can no longer be retrieved take up less space. I won’t add the math for that here, though (because I have not done it yet).

2.6 Closing

So, as closing remark, we can say that increasing the redundancy will likely increase the lifetime of files. It will also reduce the overall storage space in Freenet, though. I think it would be worthwhile.

It might also be possible to give probability estimates in the GUI which show how likely it is that we can retrieve a given file after a few percent were downloaded: If more than 1/redundancy of the chunks succeed, the probability to get the file is high. If close to 1/redundancy succeed, the file will be slow, because we might have to wait for nodes which went offline and will come back at some point - essentially we will have to hope for churn. If much less than 1/redundancy of the chunks succeed, we can stop trying to get the file.

Just use the code in here for that :)
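The GUI heuristic above fits in a few lines. This is a sketch of mine, not anything in the Freenet codebase, and the margin around 1/redundancy is my own guess, not a tuned value:

```python
def retrieval_outlook(fraction_succeeded, redundancy, margin=0.05):
    """Classify a partially fetched file by its chunk success rate,
    following the heuristic above: compare against 1/redundancy."""
    needed = 1 / redundancy
    if fraction_succeeded > needed + margin:
        return "likely retrievable"
    if fraction_succeeded >= needed - margin:
        return "slow - may have to wait for churn"
    return "probably lost - stop trying"

print(retrieval_outlook(0.60, 2))  # well above the needed 50%
print(retrieval_outlook(0.48, 2))  # close to 50%: hope for churn
print(retrieval_outlook(0.20, 2))  # far below: give up
```

Feeding it the exact probabilities from spielfaehig.py instead of a fixed margin would make the estimate sharper.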

3 Background and deeper look

Why redundancy at all? Redundancy 1: one failed chunk ⇒ the file fails. Redundancy 2: 50% of the chunks are needed. Redundancy 3: 33%.

3.1 No redundancy

Let’s start with redundancy 1. If one chunk fails, the whole file fails.

Compared to freenet today the replacement rate would be halved, because each file takes up only half the current space. So the 50% dead chunks rate would be reached after 8 weeks instead of after 4 weeks. And 90% would be after 2 days instead of after 1 day. We can guess that 99% would be after a few hours.

Let’s take a file with 100 chunks as example. That’s 100× 32 kiB, or about 3 Megabyte. After a few hours the chance will be very high that it will have lost one chunk and will be irretrievable. Freenet will still have 99% of the chunks, but they will be wasted space, because the file cannot be recovered anymore. The average lifetime of a file will just be a few hours.

With 99% probability of retrieving a chunk, the probability of retrieving a file will be only about 37%.

from spielfaehig import spielfähig
return spielfähig(0.99, 100, 100)
→ 0.366032341273

To achieve 90% retrievability of the file, we need a chunk availability of 99.9%! The file is essentially dead directly after the insert finishes.

from spielfaehig import spielfähig
return spielfähig(0.999, 100, 100)
→ 0.904792147114

3.2 1% redundancy

Now, let’s add one redundant chunk. Almost nothing will have changed for inserting and replacing, but now the probability of retrieving the file when the chunks have 99% availability is 73%!

from spielfaehig import spielfähig
return spielfähig(0.99, 101, 100)
→ 0.732064682546

The replacement rate is increased by 1%, as is the storage space.

To achieve 90% retrievability, we actually need a chunk availability of 99.5%. So we might have 90% retrievability one hour after the insert.

from spielfaehig import spielfähig
return spielfähig(0.995, 101, 100)
→ 0.908655654736

Let’s check for 50%: We need a chunk probability of about 98.4%

from spielfaehig import spielfähig
return spielfähig(0.984, 101, 100)
→ 0.518183035909

The mean lifetime of a file changed from about zero to a few hours.

3.3 50% redundancy

Now, let’s take a big step: redundancy 1.5. Now we need 71.2% chunk retrievability to have a 90% chance of retrieving the file.

from spielfaehig import spielfähig
return spielfähig(0.712, 150, 100)
→ 0.904577767501

For 50% retrievability we need 66.3% chunk availability.

from spielfaehig import spielfähig
return spielfähig(0.663, 150, 100)
→ 0.500313163333

66% would be reached in the current network after about 20 days (between 2 weeks and 4 weeks), and in a zero-redundancy network after 40 days (see the fetch-pull stats).

At the same time, though, the chunk replacement rate increased by 50%, so the mean chunk lifetime drops to 2/3 of its former value. So the lifetime of a file would be 4 weeks.

3.4 Generalize this

So, now we have calculations for redundancy 1, 1.5, 2 and 3. Let’s see if we can find a general (if approximate) rule for redundancy.

From the fetch-pull graph from digger3 we see empirically that between one week and 18 weeks each doubling of the lifetime corresponds to a reduction of the chunk retrieval probability of 15% to 20%.

Also we know that 50% probability corresponds to 4 weeks lifetime.

And we know that redundancy x has a minimum required chunk probability of 1/x.

With this, we can model the required chunk lifetime as a function of redundancy:

chunk lifetime = 4 * 2**((0.5-1/x)/0.2)

with x as redundancy. Note: this function is purely empirical and approximate.

Having the chunk lifetime, we can now model the lifetime of a file as a function of its redundancy:

file lifetime = (2/x) * 4 * (2**((0.5-1/x)/0.2))

We can now use this function to find an optimum of the redundancy if we are only concerned about file lifetime. Naturally we could fire up the trusty wxmaxima and take the derivative to find the maximum. But that is not installed right now, and my skills at taking derivatives by hand are a bit rusty (note: install running). So we just do it graphically. The function is not perfectly exact anyway, so the errors introduced by the graphical solution should not be too big compared to the errors in the model.

Note however, that this model is only valid in the range between 20% and 90% chunk retrieval probability, because the approximation for the chunk lifetime does not hold anymore for values above that. Due to this, redundancy values close to or below 1 won’t be correct.

Also keep in mind that it does not include the effect due to the higher rate of removing dead space - which is space that belongs to files which cannot be recovered anymore. This should mitigate the higher storage requirement of higher redundancy.

# encoding: utf-8
plotcmd = """
set term png
set width 15
set xlabel "redundancy"
set ylabel "lifetime [weeks]"
set output "freenet-prob-function.png"
set xrange [0:10]
plot (2/x) * 4 * (2**((0.5-1/x)/0.2))
"""
with open("plot.pyx", "w") as f:...

from subprocess import Popen
Popen(["pyxplot", "plot.pyx"])
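As a quick numerical cross-check of the graphical solution (my own addition, within the validity range of the model discussed above), we can simply scan the file-lifetime function over a grid of redundancy values:

```python
def file_lifetime(x):
    """Modelled file lifetime in weeks for redundancy x (empirical fit)."""
    return (2 / x) * 4 * 2 ** ((0.5 - 1 / x) / 0.2)

# scan redundancies 1.5 .. 6.0 in steps of 0.01
redundancies = [1.5 + i * 0.01 for i in range(451)]
best = max(redundancies, key=file_lifetime)
print(round(best, 2), round(file_lifetime(best), 2))
# maximum near redundancy 3.5, at roughly 4.8 weeks
```

Since the current redundancy 2 gives exactly 4 weeks in this model, the maximum gain is indeed “almost a week”, as the summary below states - and the curve is very flat between 3 and 4.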

4 Summary: Merit and outlook

Now, what do we make of this?

First off: If the equations are correct, an increase in redundancy would improve the lifetime of files by a maximum of almost a week. Going further reduces the lifetime, because the increased replacement of old data outpaces the improvement from the higher redundancy.

Higher redundancy also requires more storage, which reduces the overall capacity of freenet. This should be partially offset by the faster purging of dead storage space.

The results support an increase in redundancy from 2 to 3, but not to 4.

Well, and aren’t statistics great? :)

Additional notes: This exploration ignores:

  • healing, which creates less insert traffic than new inserts by only re-inserting failed segments, and which makes files which get accessed regularly live much longer,
  • inter-segment redundancy, which improves the retrieval of files, so they can cope with a retrievability of 50% across all chunks of the file, even if the distribution is skewed for a single segment,
  • non-uniformity of the network, which makes it hard to model effects with global-style math like this,
  • separate stores for SSK and CHK keys, which improve the availability of small websites, and
  • the usability and security impact of increased insert times (might be reduced by only inserting 2/3rd of the file data and letting healing do the rest when the first downloader gets the file).

Due to that, the findings can only provide clues for improvements, but cannot perfectly predict the best course of action. Thanks to evanb for pointing them out!

If you are interested in other applications of the same theory, you might enjoy my text Statistical constraints for the design of roleplaying games (RPGs) and campaigns (german original: Statistische Zwänge beim Rollenspiel- und Kampagnendesign). The script spielfaehig.py I used for the calculations was written for a forum discussion which evolved into that text :)

This text was written and checked in emacs org-mode and exported to HTML via `org-export-as-html-to-buffer`. The process integrated research and documentation. In hindsight, that was a pretty awesome experience, especially the inline script evaluation. I also attached the org-mode file for your leisure :)

freenet-prob-redundancy-2.png67.05 KB
freenet-prob-redundancy-3.png65.67 KB
freenet-prob-redundancy-4.png63.43 KB
freenet-success-probability.org14.84 KB
freenet-prob-function.png20.5 KB
fetch_dates_graph-2012-03-16.png17.25 KB
spielfaehig.py.txt1.15 KB

Freenet Communication Primitives: Part 1, Files and Sites

Basic building blocks for communication in Freenet.

This is a guide to using Freenet as backend for communication solutions - suitable for anything from filesharing over chat up to decentrally hosted game content like level-data. It uses the Python interface to Freenet for its examples.

TheTim, from Tim Moore,
License: cc by.

This guide consists of several installments: Part 1 (this text) is about exchanging data, Part 2 is about confidential communication and finding people and services without drowning in spam and Part 3 ties it all together by harnessing existing plugins which already include all the hard work which distinguishes a quick hack from a real-world system. Happy Hacking and welcome to Freenet, the forgotten cypherpunk paradise where no one can watch you read!

1 Introduction

The immutable datastore in Freenet provides the basic structures for implementing distributed, pseudonymous, spam-resistant communication protocols. But until now there was no practically usable documentation on how to use them. Every new developer had to find out about them by asking, speculating and second-guessing the friendly source (also known as SGTFS).

We will implement the answers using pyFreenet. Get it from http://github.com/freenet/pyFreenet

We will not go into special cases. For these have a look at the API-documentation of fcp.node.FCPNode().

1.1 Install pyFreenet

To follow the code examples in this article, install Python 2 with setuptools and then run

easy_install --user --egg pyFreenet==0.4.0

2 Sharing a File: The CHK (content hash key)

The first and simplest task is sharing a file. You all know how this works in torrents and file hosters: You generate a link and give that link to someone else.

To create that link, you have to know the exact content of the file beforehand.

import fcp
n = fcp.node.FCPNode()
key = n.put(data="Hello Friend!")
print key

Just share this key, and others can retrieve it. Prefix it with the address of the local Freenet web interface, and they can even click it - if they run Freenet on their local computer or have an SSH forward for port 8888.

The code above only returns once the file finished uploading. The Freenet Client Protocol (that’s what fcp stands for) however is asynchronous. When you pass async=True to n.put() or n.get(), you get a job object which gives you the result via job.wait().

To generate the key without actually uploading the file, use chkonly=True as argument to n.put().

Let’s test retrieving a file:

import fcp
n = fcp.node.FCPNode()
key = n.put(data="Hello Friend!")
mime, data, meta = n.get(key)
print data

This code anonymously uploads an invisible file into Freenet which can only be retrieved with the right key. Then it downloads the file from Freenet using the key and shows the data.

That the put and the get request happen from the same node is a mere implementation detail: They could be fired by total strangers on different sides of the globe and would still work the same. Even the performance would be similar.

Note: fcp.node.FCPNode() opens a connection to the Freenet node. You can have multiple of these connections at the same time, all tracking their own requests without interfering with each other. Just remember to call n.shutdown() on each of them to avoid getting ugly backtraces.

So that’s it. We can upload and download files, completely decentrally, anonymously and confidentially.

There’s just one caveat: We have to exchange the key. And to generate that key, we have to know the content of the file.

Let’s fix that.

3 Public/Private key publishing: The SSK (signed subspace key)

Our goal is to create a key where we can upload a file in the future. We can generate this key and tell someone else: Watch this space.

So we will generate a key, start to download from the key and insert the file to the key afterwards.

import fcp
n = fcp.node.FCPNode()
# we generate a key with the additional filename hello.
public, private = n.genkey(name="hello")
job = n.get(public, async=True)
n.put(uri=private, data="Hello Friend!")
mime, data, meta = job.wait()
print data

These 8 lines of code create a key which you could give to a friend. Your friend will start the download and when you get hold of that secret hello-file, you upload it and your friend gets it.

Hint: If you want to test whether the key you give is actually used, you can check the result of n.put(). It returns the key with which the data can be retrieved.

Using the .txt suffix makes Freenet use the mimetype text/plain. Without extension it will use application/octet-stream.
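Python's standard mimetypes table can be used to mimic this rule locally; a small sketch (Freenet's own extension table may differ in details):

```python
import mimetypes

def guess_freenet_mimetype(name):
    # mirror the rule described above: a known suffix gives its mimetype,
    # no (or an unknown) extension falls back to application/octet-stream
    mime, _encoding = mimetypes.guess_type(name)
    return mime or "application/octet-stream"

print(guess_freenet_mimetype("hello.txt"))  # text/plain
print(guess_freenet_mimetype("hello"))      # application/octet-stream
```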

If you start downloading before you upload as we do here, you can trigger a delay of about half an hour due to overload protections (the mechanism is called “recently failed”).

Note that you can only write to a given key-filename combination once. If you try to write to it again, you’ll get conflicts – your second upload will in most cases simply not work. You might recognize this from immutable datastructures (without the conflict stuff). Freenet is the immutable, distributed, public/private key database you’ve been fantasizing about when you had a few glasses too many during that long night. So best polish your functional programming skills. You’re going to use them on the level of practical communication.

3.1 Short round trip time (speed hacks)

A SSK is a special type of key, and similar to inodes in a filesystem it can carry data. But if used in the default way, it will forward to a CHK: The file is salted and then inserted to a CHK which depends on the content and then some, ensuring that the key cannot be predicted from the data (this helps avoid some attacks against your anonymity).

When we want a fast round trip time, we can cut that out. The condition is that your data plus filename must be less than 1KiB after compression, the amount of data a SSK can hold. And we have to get rid of the metadata. That means: with pyFreenet, use the application/octet-stream mime type, because it is the default one, so it is left out on upload. If you use raw access to FCP, omit Metadata.ContentType or set it to "". And insert single files (we did not yet cover uploading folders: you can do that, but they will forward to a CHK).
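As a rough pre-flight check, one can estimate whether a payload will fit into a single SSK block. This only sketches the rule of thumb above (compressed data plus filename under 1 KiB); Freenet's exact accounting differs in details:

```python
import os
import zlib

SSK_PAYLOAD_LIMIT = 1024  # 1 KiB, the limit named in the text

def probably_fits_in_ssk(filename, data):
    # the node compresses before inserting, so check the smaller of
    # raw and compressed size, plus the filename length
    compressed = zlib.compress(data)
    payload = min(len(data), len(compressed))
    return payload + len(filename) < SSK_PAYLOAD_LIMIT

print(probably_fits_in_ssk("hello.txt", b"Hello Friend!"))  # True
print(probably_fits_in_ssk("big.bin", os.urandom(4096)))    # False: incompressible
```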

import fcp
n = fcp.node.FCPNode()
# we generate a key with the additional filename hello.
public, private = n.genkey(name="hello.txt")
job = n.get(public, async=True, realtime=True, priority=0)
n.put(uri=private, data="Hello Friend!", mimetype="application/octet-stream", realtime=True, priority=0)
mime, data, meta = job.wait()
print public
print data

To check whether we managed to avoid the metadata, we can use the KeyUtils plugin to analyze the key.

If it is right, when putting the key into the text field on the site, you’ll see something like this:

0000000: 4865 6C6C 6F20 4672 6965 6E64 21
         Hello Friend!

Also we want to use realtime mode (optimized for the webbrowser: reacting quickly but with low throughput) with a high priority.

Let’s look at the round trip time we achieve:

import time
import fcp
n = fcp.node.FCPNode()
# we generate two keys with the additional filename hello.
public1, private1 = n.genkey(name="hello1.txt")
public2, private2 = n.genkey(name="hello2.txt")
starttime = time.time()
job1 = n.get(public1, async=True, realtime=True, priority=1)
job2 = n.get(public2, async=True, realtime=True, priority=1)
n.put(uri=private1, data="Hello Friend!",
      realtime=True, priority=1)
mime, data1, meta = job1.wait()
n.put(uri=private2, data="Hello Back!",
      realtime=True, priority=1)
mime, data2, meta = job2.wait()
rtt = time.time() - starttime
print public1
print public2
print data1
print data2
print "RTT (seconds):", rtt

When I run this code, I get less than 80 seconds round trip time. Remember that we’re uploading two files anonymously into a decentralized network, discovering them and then downloading them, and all that in serial. Less than a minute to detect an upload to a known key.

80 seconds is not instantaneous, but when looking at usual posting frequencies in IRC and other chats, it’s completely sufficient to implement a chat system. And in fact that’s how FLIP is implemented: IRC over Freenet.

Compare this to the performance when we do not use the short round trip time trick of avoiding the Metadata and using the realtime queue:

import time
import fcp
n = fcp.node.FCPNode()
# we generate two keys with the additional filename hello.
public1, private1 = n.genkey(name="hello1.txt")
public2, private2 = n.genkey(name="hello2.txt")
starttime = time.time()
job1 = n.get(public1, async=True)
job2 = n.get(public2, async=True)
n.put(uri=private1, data="Hello Friend!")
mime, data1, meta = job1.wait()
n.put(uri=private2, data="Hello Back!")
mime, data2, meta = job2.wait()
rtt = time.time() - starttime
print public1
print public2
print data1
print data2
print "RTT (seconds):", rtt

With 300 seconds (5 minutes), that’s more than 3x slower. So you see, if you have small messages and you care about latency, you want to do the latency hacks.

4 Upload Websites: SSK as directory

So now we can upload single files, but the links look a lot like what we see on websites: So can we just mirror a website? The answer is: Yes, definitely!

import fcp
n = fcp.node.FCPNode()
# We create a key with a directory name
public, private = n.genkey() # no filename: we need different ones
index = n.put(uri=private + "index.html",
    data='''<html><head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
  <body>Hello World!</body></html>''')
n.put(uri=private + "style.css", 
      data='body {color: red}\n')
print index

Now we can navigate to the key in the freenet web interface and look at our freshly uploaded website! The text is colored red, so it uses the stylesheet. We have files in Freenet which can reference each other by relative links.

4.1 Multiple directories below an SSK

So now we can create simple websites on an SSK. But here’s a catch: key/hello/hello.txt simply returns key/hello. What if we want multiple folders?

For this purpose, Freenet provides manifests instead of single files. Manifests are tarballs which include several files which are then downloaded together and which can include references to external files - named redirects. They can be uploaded as folders into the key. And in addition to these, there are quite a few other tricks. Most of them are used in freesitemgr which uses fcp/sitemgr.py.

But we want to learn how to do it ourselves, so let’s do a more primitive version manually via n.putdir():

import os
import tempfile

import fcp
n = fcp.node.FCPNode()
# we create a key again, but this time with a name: The folder of the
# site: We will upload it as a container.
public, private = n.genkey()
# now we create a directory
tempdir = tempfile.mkdtemp(prefix="freesite-")
with open(os.path.join(tempdir, "index.html"), "w") as f:
    f.write('''<html><head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
    <body>Hello World!</body></html>''')

with open(os.path.join(tempdir, "style.css"), "w") as f:
    f.write('body {color: red}\n')

uri = n.putdir(uri=private, dir=tempdir, name="hello", 
               filebyfile=True, allatonce=True, globalqueue=True)
print uri

That’s it. We just uploaded a folder into Freenet.

But now that it’s there, how do we upload a better version? As already said, files in Freenet are immutable. So what’s the best solution if we can’t update the data, but only upload new files? The obvious solution would be to just number the site.

And this is how it was done in the days of old. People uploaded hello-1, hello-2, hello-3 and so forth, and in hello-1 they linked to an image under hello-2. When visitors of hello-1 saw that the image loaded, they knew that there was a new version.

When more and more people adopted that, Freenet added core support: USKs, the updatable subspace keys.

We will come to that in the next part of this series: Service Discovery and Communication.

thetim-tim_moore-flickr-cc_by-2471774514_8c9ed2a7e5_o-276x259.jpg19.79 KB

Freenet Communication Primitives: Part 2, Service Discovery and Communication

Basic building blocks for communication in Freenet.

This is a guide to using Freenet as backend for communication solutions - suitable for anything from filesharing over chat up to decentrally hosted game content like level-data. It uses the Python interface to Freenet for its examples.

Mirror, from the Freenet Project and Arne Babenhauserheide,
License: GPL.

This guide consists of several installments: Part 1 is about exchanging data, Part 2 is about confidential communication and finding people and services without drowning in spam and Part 3 ties it all together by harnessing existing plugins which already include all the hard work which distinguishes a quick hack from a real-world system (this is currently a work in progress, implemented in babcom.py which provides real-world usable functionality).

Note: You need the current release of pyFreenet for the examples in this article (0.4.0). Get it from PyPI:

# with setuptools
easy_install --user --egg pyFreenet==0.4.0
# or pip
pip install --user --egg pyFreenet==0.4.0

This is part 2: Service Discovery and Communication. It shows how to find new people, build secure communication channels and create community forums. Back when I contributed to Gnutella, this was the holy grail of many p2p researchers (I still remember the service discovery papers). Here we’ll build it in 300 lines of Python.

Welcome to Freenet, where no one can watch you read!

USK: The Updatable Subspace Key

USKs allow uploading increasing versions of a website into Freenet. Like the numbered uploads from the previous article they simply add a number to the site, but they automate upload and discovery of new versions in roughly constant time (using Date Hints and automatic checking for new versions), and they allow accessing a site as <key>/<name>/<minimal version>/ (never underestimate the impact of convenience!).
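A small helper illustrating the URI shape (the key material here is a placeholder, not a real key):

```python
def usk_uri(ssk_public, name, min_version):
    # build USK@<keydata>/<name>/<minimal version>/ from an SSK URI
    keydata = ssk_public[len("SSK@"):].split("/")[0]
    return "USK@%s/%s/%d/" % (keydata, name, min_version)

print(usk_uri("SSK@pub-key-data/", "hello", 4))  # USK@pub-key-data/hello/4/
```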

With this, we only need a single link to provide an arbitrary number of files, and it is easy and fast to always get the most current version of a site. This is the ideal way to share a website in Freenet. Let’s do it practically.

import os
import tempfile

import fcp
n = fcp.node.FCPNode()
# we create a key again, but this time with a name: The folder of the
# site: We will upload it as a container.
public, private = n.genkey()
# now we create a directory
tempdir = tempfile.mkdtemp(prefix="freesite-")
with open(os.path.join(tempdir, "index.html"), "w") as f:
    f.write('''<html><head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
    <body>Hello World!</body></html>''')

with open(os.path.join(tempdir, "style.css"), "w") as f:
    f.write('body {color: red}\n')

uri = n.putdir(uri=private, dir=tempdir, name="hello",
               filebyfile=True, allatonce=True, globalqueue=True)
print uri

But we still need to share the public key first, so we cannot simply tell someone where to upload files so that we will see them. Though if we were to share the private key, someone else could upload there and we would see it under the public key. We could not be sure who uploaded there, but at least we would get the files. Maybe we could even derive both keys from a single value… and naturally we can. This is called a KSK (old description).

KSK: Upload a file to a password

KSKs allow uploading a file to a pre-determined password. The file will only be detectable for those who know the password, so we have effortless, invisible, password protected files.

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
key = "KSK@" + _uuid
n.put(uri=key, data="Hello World!",
      Global=True, persistence="forever",
      realtime=True, priority=1)
print key
print n.get(key)[1]

Note: We’re now writing a communication protocol, so we’ll always use realtime mode. Be aware, though, that realtime is rate limited. If you use it for large amounts of data, other nodes will slow down your requests to preserve quick reaction of the realtime queue for all (other) Freenet users.

Note: Global=True and persistence="forever" allow telling Freenet to upload some data and then shutting down the script. Use async=True and waituntilsent=True to just start the upload: when the function returns, you can safely exit from the script and let Freenet upload the file in the background - if necessary it will even keep uploading over restarts. And yes, capitalized Global looks crazy. For pyFreenet that choice is sane (though not beautiful), because Global gets used directly as a parameter in the Freenet Client Protocol (FCP). This is the case for many of the function arguments. In putdir() there’s a globalqueue parameter which also sets persistence. That should become part of the put() API, but isn’t yet. There are lots of places where pyFreenet is sane, but not beautiful. It seems like that’s its secret for staying functional from 2008 till 2014 with almost no maintenance.

For our purposes the main feature of KSKs is that we can tell someone to upload to an arbitrary phrase and then download that.

If we add a number, we can even hand out a password to multiple people and tell them to just upload to the first unused version. This is called the KSK queue.

KSK queue: Share files by uploading to a password

The KSK queue used to be the mechanism of choice to find new posts in forums, until spammers proved that real anonymity means total freedom to spam: they burned down the Frost Forum System. But we’ll build this, since it provides a basic building block for the spam-resistant system used in Freenet today.

Let’s just do it in code (descriptions are in the comments):

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
print "Hey, this is the password:", _uuid
# someone else used it before us
for number in range(2):
    key = "KSK@" + _uuid + "-" + str(number)
    n.put(uri=key, data="Hello World!", 
          Global=True, persistence="forever",
          realtime=True, priority=1,
          timeout=360) # 6 minutes
# we test for a free slot
for number in range(4):
    key = "KSK@" + _uuid + "-" + str(number)
    try:
        n.get(key,
              realtime=True, priority=1,
              timeout=360) # 6 minutes
    except fcp.node.FCPNodeTimeout:
        break
# and write there
n.put(uri=key, data="Hello World!",
      Global=True, persistence="forever",
      realtime=True, priority=1,
      timeout=360) # 6 minutes
print key
print n.get(key)[1]

Note that currently a colliding put – uploading where someone else uploaded before – simply stalls forever instead of failing. This is a bug in pyFreenet. We work around it by giving an explicit timeout.

But it’s clear how this can be spammed.

And it might already become obvious how this can be avoided.

KSK queue with CAPTCHA

Let’s assume I do not tell you a password. Instead I tell you where to find a riddle. The solution to that riddle is the password. Now only those who are able to solve riddles can upload there. And each riddle can be used only once. This restricts automated spamming, because it requires an activity of which we hope that only humans can do it reliably.

In the clearweb this is known as CAPTCHA. For the examples in this guide a plain text version is much easier.

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
_uuid2 = str(uuid.uuid1())
riddlekey = "KSK@" + _uuid
riddle = """
What goes on four legs in the morning,
two legs at noon, and three legs in the
evening?
Answer: <answer>
"""
# The ancient riddle of the sphinx
n.put(uri=riddlekey, data="""To reach me, answer this riddle.

%s

Upload your file to %s-<answer>
""" % (riddle, _uuid2),
      Global=True, persistence="forever",
      realtime=True, priority=1)

print n.get(riddlekey, realtime=True, priority=1)[1]
answer = "human"
print "answer:", answer
answerkey = "KSK@" + _uuid2 + "-%s" % answer

n.put(uri=answerkey, data="Hey, it's me!",
      Global=True, persistence="forever",
      realtime=True, priority=1)

print n.get(answerkey, realtime=True, priority=1)[1]

Now we have fully decentralized, spam-resistant, anonymous communication.

Let me repeat that: fully decentralized, spam-resistant, anonymous communication.

The need to solve a riddle everytime we want to write is not really convenient, but it provides the core of the feature we need. Everything we now add just makes this more convenient and makes it scale for many-to-many communication.

(originally I wanted to use the Hobbit riddles for this, but I switched to the sphinx riddle to avoid the swamp of multinational (and especially German) quoting restrictions)

Convenience: KSK queue with CAPTCHA via USK to reference a USK

The first step to improve this is getting rid of the requirement to solve a riddle every single time we write to a person. The second is to automatically update the list of riddles.

For the first, we simply upload a public USK key instead of the message. That gives a potentially constant stream of messages.

For the second, we upload the riddles to a USK instead of to a KSK. We pass out this USK instead of a password. Let’s realize this.

To make this easier, let’s use names. Alice wants to contact Bob. Bob gave her his USK. The answer-uuid we’ll call namespace.

import fcp
import uuid # avoid spamming the global namespace
import time # to check the timing

tstart = time.time()
def elapsed_time():
    return time.time() - tstart

n = fcp.node.FCPNode()

bob_public, bob_private = n.genkey(usk=True, name="riddles")
alice_to_bob_public, alice_to_bob_private = n.genkey(usk=True, name="messages")
namespace_bob = str(uuid.uuid1())
riddle = """
What goes on four legs in the morning,
two legs at noon, and three legs in the
evening?
Answer: <answer>
"""
print "prepared:", elapsed_time()
# Bob uploads the ancient riddle of the sphinx
put_riddle = n.put(uri=bob_private,
                   data="""To reach me, answer this riddle.

%s

Upload your key to %s-<answer>
""" % (riddle, namespace_bob),
                   Global=True, persistence="forever",
                   realtime=True, priority=1, async=True,
                   IgnoreUSKDatehints="true") # speed hack for USKs.

riddlekey = bob_public
print "riddlekey:", riddlekey
print "time:", elapsed_time()
# Bob shares the riddlekey. We're set up.

# Alice can insert the message before telling Bob about it.
put_first_message = n.put(uri=alice_to_bob_private,
                          data="Hey Bob, it's me, Alice!",
                          Global=True, persistence="forever",
                          realtime=True, priority=1, async=True)

print "riddle:", n.get(riddlekey, realtime=True, priority=1, followRedirect=True)[1]
print "time:", elapsed_time()

answer = "human"
print "answer:", answer
answerkey = "KSK@" + namespace_bob + "-%s" % answer
put_answer = n.put(uri=answerkey, data=alice_to_bob_public,
                   Global=True, persistence="forever",
                   realtime=True, priority=1, async=True)

print "time:", elapsed_time()
# Bob gets the messagekey and uses it to retrieve the message from Alice

# Due to details in the insert process (i.e. ensuring that the file is
# accessible), the upload does not need to be completed for Bob to be
# able to get it. We just try to get it.
messagekey_alice_to_bob = n.get(answerkey, realtime=True, priority=1)[1]

print "message:", n.get(uri=messagekey_alice_to_bob, realtime=True, priority=1,
                        followRedirect=True)[1] # get the new version

print "time:", elapsed_time()
# that's it. Now Alice can upload further messages which Bob will see.

# Bob starts listening for a more recent message. Note that this does
# not guarantee that he will see all messages.
def next_usk_version(uri):
    elements = uri.split("/")
    elements[2] = str(abs(int(elements[2])) + 1)
    # USK@.../name/N+1/...
    return "/".join(elements)

next_message_from_alice = n.get(
    next_usk_version(alice_to_bob_public),
    realtime=True, priority=1, async=True,
    followRedirect=True) # get the new version

print "time:", elapsed_time()
# Alice uploads the next version.
put_second_message = n.put(uri=next_usk_version(alice_to_bob_private),
                           data="Me again!",
                           Global=True, persistence="forever",
                           realtime=True, priority=1, async=True)

# Bob sees it.
print "second message:", next_message_from_alice.wait()[1]
print "time:", elapsed_time()

print "waiting for inserts to finish"
for job in (put_riddle, put_first_message, put_answer, put_second_message):
    job.wait()
print "time:", elapsed_time()


From start to end this takes less than 2 minutes, and now Alice can send Bob messages with roughly one minute delay.

So now we have set up a convenient communication channel. Since Alice already knows Bob’s key, Bob could simply publish a bob-to-alice public key there, and if both publish GnuPG keys, these keys can be hidden from others: upload not the plain key, but the key encrypted to Bob, and Bob could encrypt his bob-to-alice key using the GnuPG key from Alice. By regularly exchanging new public keys, they could even establish perfect forward secrecy. I won’t implement that here, because in the third part of this series we will simply use the Freemail and Web of Trust plugins, which already provide these features.

This gives us convenient, fully decentralized, spam-resistant, anonymous communication channels. Setting up a communication channel to a known person requires solving one riddle (in a real setting likely a CAPTCHA, or a password-prompt), and then the channel persists.

Note: To speed up these tests, I added another speed hack: IgnoreUSKDatehints. That turns off Date Hints, so discovering new versions will no longer be constant in the number of intermediate versions. For our messaging system that does not hurt, since we don’t have many intermediate messages we want to skip. For websites however, that could lead your visitors to see several old versions before they finally get the most current version. So be careful with this hack - just like you should with the other speed hacks.

But if we want to reach many people, we have to solve one riddle per person, which just doesn’t scale. To fix this, we can publish a list of all people we trust to be real people. Let’s do that.

Many-to-many: KSK->CAPTCHA->USK->USK which is linked in the original USK

To enable (public) many-to-many communication, we propagate the information that we believe that someone isn’t a spammer and add a blacklist to get rid of people who suddenly start to spam.

The big change with this scheme is that authentication happens in two steps: something expensive (solving a riddle) gets you seen by a few people, and if you then contribute constructively in a social context, they mark you as a non-spammer and you get seen by more people.

The clever part about that scheme is that socializing is actually no cost to honest users (that’s why we use things like Sone or FMS), while it is a cost to attackers.

Let’s take Alice and Bob again, but add Carol. First Bob introduces himself to Alice, then Carol introduces herself to Alice. Thanks to propagating the riddle-information, Carol can directly write to Bob, without first solving a riddle. Scaling this up means that you only need to prove a single time that you are not a spammer (or rather: not disruptive) if you want to enter a community.
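The propagation idea can be sketched with plain sets; the names and structure here are illustrative, not the actual Web of Trust data model:

```python
def visible_people(own_known, known_lists, own_spammers):
    # you see everyone you marked as known, plus everyone your known
    # people list as known, minus everyone you marked as a spammer
    seen = set(own_known)
    for person in own_known:
        seen |= known_lists.get(person, set())
    return seen - set(own_spammers)

# Alice knows Bob directly; Bob lists Carol, so Alice sees Carol too,
# without Carol solving a riddle for Alice
known_lists = {"bob": {"carol"}}
print(sorted(visible_people({"bob"}, known_lists, set())))      # ['bob', 'carol']
print(sorted(visible_people({"bob"}, known_lists, {"carol"})))  # ['bob']
```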

To make it easier to follow, we will implement this with a bit of abstraction: People have a private key, can introduce themselves and publish lists of messages. Also they keep a public list of known people and a list of people they see as spammers who want to disrupt communication.

I got a bit carried away while implementing this, but please bear with me: it was just too much fun.

The finished program is available as alice_bob_carol.py. Just download and run it with python alice_bob_carol.py.

Let’s start with the minimal structure for any pyFreenet using program:

import fcp

n = fcp.node.FCPNode() # for debugging add verbosity=5



The body contains the definitions of a person with different actors, an update step (as simplification I use global stepwise updates) as well as the setup of the communication. Finally we need an event loop to run the system.






We start with some imports – and a bit of fun :)

import uuid
import random
try:
    import chatterbot # let's get a real conversation :)
    # https://github.com/guntherc/ChatterBot/wiki/Quick-Start
    # get with `pip install --user chatterbot`
    irc_loguri = "USK@Dtz9FjDPmOxiT54Wjt7JwMJKWaqSOS-UGw4miINEvtg,cuIx2THw7G7cVyh9PuvNiHa1e9BvNmmfTcbQ7llXh2Q,AQACAAE/irclogs/1337/"
    print "Getting the latest IRC log as base for the chatterbot"
    IRC_LOGLINES = n.get(uri=irc_loguri, realtime=True, priority=1, followRedirect=True)[1].splitlines()
    import re # what follows is an evil hack, but what the heck :)
    p = re.compile(r'<.*?>')
    q = re.compile(r'&.*?;')
    IRC_LOGLINES = [q.sub('', p.sub('', str(unicode(i.strip(), errors="ignore"))))
                    for i in IRC_LOGLINES]
    IRC_LOGLINES = [i[:-5] for i in IRC_LOGLINES # skip the time (last 5 letters)
                    if (i[:-5] and # skip empty
                        not "spam" in i # do not trigger spam-marking
                    )][7:] # skip header 
except ImportError:
    chatterbot = None

The real code begins with some helper functions – essentially data definition.

def get_usk_namespace(key, name, version=0):
    """Get a USK key with the given namespace (foldername)."""
    return "U" + key[1:] + name + "/" + str(version) + "/"

def extract_raw_from_usk(key):
    """Get an SSK key as used to identify a person from an arbitrary USK."""
    return "S" + (key[1:]+"/").split("/")[0] + "/"

def deserialize_keylist(keys_data):
    """Parse a known file to get a list of keys. Reverse: serialize_keylist."""
    return [i for i in keys_data.split("\n") if i]

def serialize_keylist(keys_list):
    """Serialize the known keys into a text file. Reverse: deserialize_keylist."""
    return "\n".join(keys_list)
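To make the key format concrete, here are the two key helpers again in a self-contained round trip (the key string is made up for illustration; real Freenet keys are far longer):

```python
def get_usk_namespace(key, name, version=0):
    """Get a USK key with the given namespace (foldername)."""
    return "U" + key[1:] + name + "/" + str(version) + "/"

def extract_raw_from_usk(key):
    """Get the raw SSK key identifying a person from an arbitrary USK."""
    return "S" + (key[1:] + "/").split("/")[0] + "/"

ssk = "SSK@fakepub,fakecrypto,AQACAAE/"  # made-up key for illustration
usk = get_usk_namespace(ssk, "messages", 3)
print(usk)                        # USK@fakepub,fakecrypto,AQACAAE/messages/3/
print(extract_raw_from_usk(usk))  # back to the person's SSK
```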

Now we can define a person. The person is the primary actor. To keep everything contained, I use a class with some helper functions.

class Person(object):
    def __init__(self, myname, mymessage):
        self.name = myname
        self.message = mymessage
        self.introduced = False
        self.public_key, self.private_key = n.genkey()
        print self.name, "uses key", self.public_key
        # we need a list of versions for the different keys
        self.versions = {}
        for name in ["messages", "riddles", "known", "spammers"]:
            self.versions[name] = -1 # does not exist yet
        # and sets of answers, watched riddle-answer keys, known people and spammers.
        # We use sets for these, because we only need membership-tests and iteration.
        # The answers contain KSKs, the others the raw SSK of the person.
        # watched contains all persons whose messages we read.
        self.lists = {}
        for name in ["answers", "watched", "known", "spammers", "knowntocheck"]:
            self.lists[name] = set()
        # running requests per name, used for making all persons update asynchronously
        self.jobs = {}
        # and just for fun: get real conversations. Needs chatterbot and IRC_LOGLINES.
        # this is a bit slow to start, but fun.
        if chatterbot:
            self.chatbot = chatterbot.ChatBot(self.name)
        else:
            self.chatbot = None

    def public_usk(self, name, version=0):
        """Get the public usk of type name."""
        return get_usk_namespace(self.public_key, name, version)
    def private_usk(self, name, version=0):
        """Get the private usk of type name."""
        return get_usk_namespace(self.private_key, name, version)

    def put(self, key, data):
        """Insert the data asynchronously to the key. This is just a helper to
avoid typing the realtime arguments over and over again.

        :returns: a job object. To get the public key, use job.wait(60)."""
        return n.put(uri=key, data=data, async=True,
                     Global=True, persistence="forever",
                     realtime=True, priority=1)

    def get(self, key):
        """Retrieve the data at the key asynchronously. This is just a helper to
avoid typing the realtime arguments over and over again.

        :returns: a job object. To get the data, use job.wait(60)."""
        return n.get(uri=key, async=True,
                     realtime=True, priority=1)

    def introduce_to_start(self, other_public_key):
        """Introduce self to the other by solving a riddle and uploading the messages USK."""
        riddlekey = get_usk_namespace(other_public_key, "riddles", "-1") # -1 means the latest version
        try:
            self.jobs["getriddle"].append(self.get(riddlekey))
        except KeyError:
            self.jobs["getriddle"] = [self.get(riddlekey)]

    def introduce_start(self):
        """Select a person and start a job to get a riddle."""
        known = list(self.lists["known"])
        if known: # introduce to a random person to minimize
                  # the chance of collisions
            k = random.choice(known)
            self.introduce_to_start(k)

    def introduce_process(self):
        """Get and process the riddle data."""
        for job in self.jobs.get("getriddle", [])[:]:
            if job.isComplete():
                self.jobs["getriddle"].remove(job)
                try:
                    riddle = job.wait()[1]
                except Exception as e: # try again next time
                    print self.name, "getting the riddle from", job.uri, "failed with", e
                    continue
                answerkey = self.solve_riddle(riddle)
                messagekey = self.public_usk("messages")
                try:
                    self.jobs["answerriddle"].append(self.put(answerkey, messagekey))
                except KeyError:
                    self.jobs["answerriddle"] = [self.put(answerkey, messagekey)]

    def introduce_finalize(self):
        """Check whether the riddle answer was inserted successfully."""
        for job in self.jobs.get("answerriddle", [])[:]:
            if job.isComplete():
                self.jobs["answerriddle"].remove(job)
                try:
                    job.wait()
                    self.introduced = True
                except Exception as e: # try again next time
                    print self.name, "inserting the riddle-answer failed with", e

    def new_riddle(self):
        """Create and upload a new riddle."""
        answerkey = "KSK@" + str(uuid.uuid1()) + "-answered"
        self.lists["answers"].add(answerkey) # watch for the answer
        self.versions["riddles"] += 1
        next_riddle_key = self.private_usk("riddles", self.versions["riddles"])
        self.put(next_riddle_key, answerkey)

    def solve_riddle(self, riddle):
        """Get the key for the given riddle. In this example we make it easy:
the riddle is the key. For a real system, this needs user interaction."""
        return riddle

    def update_info(self):
        """Upload new versions of the known and spammers lists."""
        for name in ["known", "spammers"]:
            data = serialize_keylist(self.lists[name])
            self.versions[name] += 1
            key = self.private_usk(name, version=self.versions[name])
            self.put(key, data)

    def publish(self, data):
        self.versions["messages"] += 1
        messagekey = self.private_usk("messages", version=self.versions["messages"])
        print self.name, "published a message:", data
        self.put(messagekey, data)

    def check_network_start(self):
        """start all network checks."""
        # first cancel all running jobs which will be replaced here.
        for name in ["answers", "watched", "known", "knowntocheck", "spammers"]:
            for job in self.jobs.get(name, []):
                job.cancel()
        # start jobs for checking answers, for checking all known people and for checking all messagelists for new messages.
        for name in ["answers"]:
            self.jobs[name] = [self.get(i) for i in self.lists[name]]
        for name in ["watched"]:
            self.jobs["messages"] = [self.get(get_usk_namespace(i, "messages")) for i in self.lists[name]]
        self.jobs["spammers"] = []
        for name in ["known", "knowntocheck"]:
            # find new nodes
            self.jobs[name] = [self.get(get_usk_namespace(i, "known")) for i in self.lists[name]]
            # register new nodes marked as spammers
            self.jobs["spammers"].extend([self.get(get_usk_namespace(i, "spammers")) for i in self.lists[name]])

    def process_network_results(self):
        """wait for completion of all network checks and process the results."""
        for kind, jobs in self.jobs.items():
            for job in jobs:
                if not kind in ["getriddle", "answerriddle"]:
                    try:
                        res = job.wait(60)[1]
                        self.handle(res, kind, job)
                    except Exception: # failed requests are simply retried next step
                        pass

    def handle(self, result, kind, job):
        """Handle a successful job of type kind."""
        # travel the known nodes to find new ones
        if kind in ["known", "knowntocheck"]:
            for k in deserialize_keylist(result):
                if (not k in self.lists["spammers"] and
                    not k in self.lists["known"] and
                    not k == self.public_key):
                    self.lists["knowntocheck"].add(k)
                    self.lists["watched"].add(k)
                    print self.name, "found and started to watch", k
        # read introductions
        elif kind in ["answers"]:
            self.lists[kind].remove(job.uri) # no longer need to watch this riddle
            k = extract_raw_from_usk(result)
            if not k in self.lists["spammers"]:
                self.lists["watched"].add(k)
                print self.name, "discovered", k, "through a solved riddle"
        # remove found spammers
        elif kind in ["spammers"]:
            for k in deserialize_keylist(result):
                if not k in self.lists["known"]:
                    self.lists["spammers"].add(k)
                    self.lists["watched"].discard(k)
        # check all messages for spam
        elif kind in ["messages"]:
            k = extract_raw_from_usk(job.uri)
            if not "spam" in result:
                if not k == self.public_key:
                    print self.name, "read a message:", result
                    self.chat(result) # just for fun :)
                    if not k in self.lists["known"]:
                        self.lists["known"].add(k)
                        print self.name, "marked", k, "as known person"
            else:
                if not k in self.lists["spammers"]:
                    self.lists["spammers"].add(k)
                    print self.name, "marked", k, "as spammer"

    def chat(self, message):
        if self.chatbot and not "spam" in self.message:
            msg = message[message.index(":")+1:-10].strip() # remove name and step
            self.message = self.name + ": " + self.chatbot.get_response(msg)


Note that nothing in here depends on running these from the same program. All communication between persons is done purely over Freenet. The only requirement is that there is a bootstrap key: one person known to all new users. This person could be anonymous, and even with this simple code there could be multiple bootstrap keys. In Freenet we call these people “seeds”. They are the seeds from which the community grows. As soon as someone besides the seed adds a person as known, the seed is no longer needed to keep the communication going.
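To see why a single seed suffices, here is a toy model of the discovery walk from handle() (plain dicts stand in for Freenet keys and uploads; all names are invented):

```python
# Toy model of seed-based discovery: everyone publishes a "known" list
# under their key, and readers walk those lists transitively.
published_known = {
    "seed":  ["alice", "bob"],   # the bootstrap person everyone knows
    "alice": ["carol"],
    "bob":   [],
    "carol": [],
}

def discover(start):
    """Walk the published known-lists transitively from a start key."""
    found, tocheck = set(), [start]
    while tocheck:
        key = tocheck.pop()
        for k in published_known.get(key, []):
            if k not in found:
                found.add(k)
                tocheck.append(k)
    return found

print(sorted(discover("seed")))  # everyone is reachable via the seed
```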

The spam detection implementation is pretty naive: it trusts people to mark others as spammers. In a real system there will be disputes about what constitutes spam, and the system needs to show who marks whom as a spammer, so users can decide to stop trusting the spam notices from someone when they disagree. As an example of a real-life system, the Web of Trust plugin uses trust ratings between -100 and 100 and calculates a score from the ratings of all trusted people to decide how much to trust people who are not rated explicitly by the user.
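The idea behind such scoring can be condensed into a few lines (this is not the actual Web of Trust algorithm, which also weights raters by their own scores; the names and numbers here are invented):

```python
def score(ratings_by_rater, my_trusted):
    """Average the -100..100 ratings given by the people I trust.

    ratings_by_rater: {rater: {rated: rating}}; my_trusted: set of raters.
    Returns {rated: averaged score}; negative means probably a spammer.
    """
    totals = {}
    for rater in my_trusted:
        for rated, rating in ratings_by_rater.get(rater, {}).items():
            totals.setdefault(rated, []).append(rating)
    return {rated: sum(r) / len(r) for rated, r in totals.items()}

ratings = {"alice": {"chuck": -100, "carol": 50},
           "bob":   {"chuck": -100}}
print(sorted(score(ratings, {"alice", "bob"}).items()))
# [('carol', 50.0), ('chuck', -100.0)]
```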

With this in place, we need the update system to be able to step through the simulation. We have a list of people who check keys of known other people.

We first start all checks for all people quasi-simultaneously and then check the results in serial to avoid long wait times from high latency. Freenet can check many keys simultaneously, but serial checking is slow.
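The same pattern in miniature, with threads standing in for Freenet requests (the sleep simulates network latency; this sketch only illustrates the timing argument):

```python
import threading, time

def slow_fetch(results, key):
    time.sleep(0.2)  # stand-in for network latency
    results[key] = "data for " + key

results = {}
jobs = [threading.Thread(target=slow_fetch, args=(results, key))
        for key in ["messages", "known", "spammers"]]
start = time.time()
for job in jobs:   # start all checks quasi-simultaneously ...
    job.start()
for job in jobs:   # ... then collect the results in serial
    job.join()
elapsed = time.time() - start
print(len(results), "results in roughly 0.2s instead of 0.6s")
```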

people = []

def update(step):
    for p in people:
        if not p.introduced:
            p.introduce_start()
    for p in people:
        p.check_network_start()
    for p in people:
        if p.message:
            p.publish(p.name + ": " + p.message + "   (step=%s)" % step)
    for p in people:
        if not p.introduced:
            p.introduce_process()
    for p in people:
        p.process_network_results()
    for p in people:
        if not p.introduced:
            p.introduce_finalize()

So those are the update tasks – not really rocket science, thanks to the fleshed-out Person class. Only two things remain: setting up the scene and actually running it.

For setup: We have Alice, Bob and Carol. Let’s also add Chuck, who wants to prevent the others from communicating by flooding them with spam.

def gen_person(name):
    if chatterbot:
        return Person(myname=name, mymessage=random.choice(IRC_LOGLINES))
    else:
        return Person(myname=name, mymessage="Hi, it's me!")

# start with alice
alice = gen_person("Alice")
people.append(alice)

# happy, friendly people
for name in ["Bob", "Carol"]:
    p = gen_person(name)
    people.append(p)

# and Chuck
p = Person(myname="Chuck", mymessage="spam")
people.append(p)

# All people know Alice (except for Alice).
for p in people:
    if p == alice:
        continue
    p.lists["known"].add(extract_raw_from_usk(alice.public_key))
    p.lists["watched"].add(extract_raw_from_usk(alice.public_key))

# upload the first version of the spammer and known lists, and the first riddles
for p in people:
    p.update_info()
    p.new_riddle()

That’s it. The stage is set, let the trouble begin :)

We don’t need a while loop here, since we just want to know whether the system works. So the event loop is pretty simple: Just call the update function a few times.

for i in range(6):
    update(i)

That’s it. We have spam-resistant message-channels and community discussions. Now we could go on and implement more algorithms on this scheme, like the turn-based games specification (ever wanted to play against truly anonymous competitors?), Fritter (can you guess from its name what it is? :)), a truly privacy-respecting dropbox or an anonymizing, censorship-resistant, self-hosting backend for a digital market like OpenBazaar (there’s a 4 Bitcoin bounty on that – info, wallet – you might want to give it a shot).

But that would go far beyond the goal of this article – which is to give you, my readers, the tools to create the next big thing by harnessing the capabilities of Freenet.

These capabilities have been there for years, but hidden beneath non-existent and outdated documentation, misleading claims of being in alpha stage even though Freenet has been used in what amounts to production for over a decade and, not to forget, the ever-recurring, ever-damning suggestion to SGTFS (second-guess the friendly source). As written in Forgotten Cypherpunk Paradise, Freenet already solved many problems which researchers are only beginning to tackle now, but there are reasons why it was almost forgotten. With this series I intend to fix some of them and start moving Freenet documentation towards the utopian vision laid out in Teach, Don’t Tell. It’s up to you to decide whether I succeeded. If I did, it will show up as a tiny contribution to the utilities and works of art and vision you create.

Note that this is not fast: it is enough for blogging, but not for chat. We could make it faster by going back to SSKs instead of USKs, avoiding the USKs’ additional logic for finding the newest version. But for USKs there are very cheap methods to get notified of new versions for large numbers of keys in O(1) (subscribing), which are used by more advanced tools like the Web of Trust and the Sone plugin, so this would be an optimization we would have to revert later. With these methods, Sone reaches round-trip times of 5-15 minutes despite using large uploads.

Also, since this uses Freenet as backend, it scales up: if Alice, Bob, Carol and Chuck used different computers instead of running on my single node, their communication would actually be faster, and if they called in all their alphabet and unicode friends, the system would still run fast. We’re harvesting part of the payoff from using a fully distributed backend :)

And with that, this installment ends. You can now implement really cool stuff using Freenet. In the next article I’ll describe how to avoid doing this stuff myself by interfacing with existing plugins. Naturally I could have done that from the start, but then how could I have explained the Freenet communication primitives these plugins use? :)

If you don’t want to wait, have a look at how Infocalypse uses wot to implement github-like access with user/repo, interfaces with Freemail to realize truly anonymous pull-requests from the command line and builds on FMS to provide automated updates of a DVCS wiki over Freenet.

Happy Hacking!

PS: You might ask “What is missing?”. You might have a nagging feeling that something we do every day isn’t in there. And you’re right. It’s scalable search. Or rather: scalable, spam- and censorship-resistant search. Scalable search would be Gnutella. Spam-resistance would be Credence on the social graph (the people you communicate with). Censorship-resistant search is unsolved – even Google fails there. But seeing that Facebook just overtook Google as the main source of traffic, we might not actually need fully global search. Together with the cheap and easy update notifications in Freenet (via USKs), a social recommendation and bookmark-sharing system should make scalable search over Freenet possible. And until then there’s always the decentralized YaCy search engine, which has been shown to be capable of crawling Freenet. Also there are the Library and Spider plugins, but they need some love to work well.

PPS: You can download the final example as alice_bob_carol.py

Freenet Interview with Zilion

Zilion Web conducted an interview about Freenet with me. Zilion asked interesting questions, and I kind of went overboard in answering them. The questions include:

  • When did you become a freenet developer? Why?
  • Freenet has 18 years of continuous development, from here to there, how do you see your growth?
  • Frost vs. FMS, what is your choice and why?
  • What do you think about people who use Freenet just for illegal purposes? And what is your concept of freedom about that?
  • What to expect from the future in Freenet?
  • Can you tell us how Opennet and Darknet works, and its pros and cons?

To see the answers, just head over to the article:

Interview with Freenet Developer (ArneBab)

And do install Freenet and then connect confidentially to your friends to build the darknet one friend at a time.

Freenet anonymity: Best case and Worst case

As the i2p people say, anonymity is not a boolean. Freenet allows you to take it a good deal further than i2p or Tor, though – if you do it right.

  • Worst case