Marcin Borkowski: emacs-reveal

Some time ago, I learned from the Org-mode mailing list about a very interesting extension to the well-known org-reveal package. emacs-reveal lets you embed audio files in reveal.js presentations. I find this quite fascinating, especially since I have prepared quite a few educational presentations.
-1:-- emacs-reveal (Post)--L0--C0--October 16, 2017 04:19 PM

Wilfred Hughes: These Weeks in Remacs III

Time for another Remacs update: lots of contributions, a wide range of features, and even a logo!

Contributing

Since the last update, we’ve seen contributions from lots of new people. We’ve added @brotzeit and @shanavas786, bringing us to seven wonderful people who can approve your PRs.

Speaking of PRs, we’ve merged an amazing 64 pull requests since the last update!

If you’re looking for a good feature for your first contribution, @brotzeit has been regularly adding new suggestions under the ‘good first issue’ label.

Features

Many Emacs features have now been ported to Rust, with new Rust APIs for accessing elisp datastructures.

Here’s an overview of the features that have landed.

Arithmetic: arithmetic, floating point, random number generation (using a Rust RNG!), and comparisons.

Symbols: symbol properties, interning, obarrays, unbinding, keywords and indirect symbols.

Checksums: MD5sum (using a Rust MD5 crate!).

Windows: liveness check, type check, overlays and minibuffer, minibuffer check, positions and margins.

Processes: accessing, type check, data structures and names.

Buffers: for the current thread, accessing, file names, size and modification.

Point: bobp, bolp, eolp, markers, point-min, point-max, forward-point and goto-char.

Hash tables: copying and accessing.

Characters: multibyte conversions, character tables, category tables.

Fonts: type checks.

Miscellaneous: prefix arguments and identity.

We’re also periodically pulling GNU Emacs features into Remacs, so all the features available in GNU Emacs trunk are included in Remacs.

Idiomatic Rust in Remacs

Remacs has gradually developed a set of conventions for elisp data types. For each type Foo, we define LispObject::as_foo, LispObject::as_foo_or_error, and a FooRef to use when you know your elisp datatype is actually a Foo.

For example, here’s how overlay-start was implemented in C:

DEFUN ("overlay-start", Foverlay_start, Soverlay_start, 1, 1, 0,
       doc: /* Return the position at which OVERLAY starts.  */)
  (Lisp_Object overlay)
{
  CHECK_OVERLAY (overlay);

  return (Fmarker_position (OVERLAY_START (overlay)));
}

The C codebase makes heavy use of macros for checking types (CHECK_OVERLAY) and for accessing struct attributes (OVERLAY_START).

Here’s the Rust equivalent:

/// Return the position at which OVERLAY starts.
#[lisp_fn]
fn overlay_start(overlay: LispObject) -> LispObject {
    let marker = overlay.as_overlay_or_error().start();
    marker_position(marker)
}

We use procedural macros to simplify defining an elisp primitive function, and type checking is much more explicit.

(This example is from PR #298.)

Other exciting Rusty features include variadic macros to replace call1, call2 in C with just call! in Rust, and the ability to mock extern C functions so we can write unit tests.

Hash Maps

We’re not always able to leverage the Rust libraries available. @DavidDeSimone showed some amazing Rust-fu exploring using Rust’s FnvHashMap inside Remacs.

Sadly, we weren’t able to use the Rust hash map implementation. The C layer assumes that it can mutate hash table keys in place, and unexec does not play nicely with mmap. See the PR for the full details.

Finally, we’re discussing a logo for Remacs. We’ve had some great submissions.

You can join the logo discussion at PR #360.

As always, if you fancy writing some Rust in support of the world’s lispiest text editor, you can join us on GitHub!

-1:-- These Weeks in Remacs III (Post Wilfred Hughes (me@wilfred.me.uk))--L0--C0--October 16, 2017 12:00 AM

Pragmatic Emacs: Using a visible bell in Emacs

Here’s a tiny and basic tip. If you want your Emacs to flash at you instead of beeping for an error, add the following to your Emacs config file:

;; turn on visible bell
(setq visible-bell t)
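
If the full-frame flash turns out to be too intrusive, one alternative is to supply your own ring-bell-function. The sketch below briefly inverts the mode line instead. The function name my/flash-mode-line and the 0.1 second delay are illustrative choices and not part of the original tip.

```elisp
;; A minimal sketch: flash only the mode line when Emacs "rings the bell".
;; `my/flash-mode-line' is a hypothetical name; the delay is arbitrary.
(defun my/flash-mode-line ()
  "Invert the mode line briefly, then restore it."
  (invert-face 'mode-line)
  (run-with-timer 0.1 nil #'invert-face 'mode-line))

(setq visible-bell nil
      ring-bell-function #'my/flash-mode-line)
```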
-1:-- Using a visible bell in Emacs (Post Ben Maughan)--L0--C0--October 15, 2017 11:35 PM

Irreal: Start an Engineering Notebook

Camilla over at Winterflower argues that software engineers should keep an engineering notebook. That’s advice that everybody knows they should follow but that too many of us don’t. We’re busy and we think, “I’ll remember what I just did, I don’t need to write it down.” Of course, a little later we don’t remember and have to go through the pain of figuring things out all over again.

I keep a journal in which I record everything I do and discover but I’ve only been doing this for about 3 years. I really wish I’d started earlier. A good way of making it easier to get started and keep with it is to have a good system for recording things.

Of course, as an Emacs and Org mode user that means I have a built-in infrastructure for such things. One of the things that Emacs and Org mode provide is an easy way of retrieving information from your notebook. Org mode tags provide an excellent way of finding things. For example, months ago I revised the way I compile Emacs (which can be a bit finicky on macOS), put it in my journal, and added the tags emacs and compiling. When I want to compile Emacs, I merely type Ctrl+c a m emacs:compiling to find all my journal entries with those tags. Even better, I have the commands in a code block so I can run them automatically by just typing Ctrl+c Ctrl+c in the block. That’s a real win over trying to figure out everything each time I compile a new Emacs.
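
The same kind of tag lookup can also be done from Lisp. Here is a minimal sketch that collects the headings carrying both tags, run on a throwaway buffer rather than a real journal. The function name and the sample headings are made up for illustration, and the match string uses the emacs+compiling form, which is an assumption about how you would spell "both tags" in a match expression.

```elisp
(require 'org)

;; Hypothetical helper: list headings in the current Org buffer
;; whose tags satisfy MATCH.
(defun my/journal-headings-matching (match)
  "Return the headings in the current Org buffer whose tags satisfy MATCH."
  (org-map-entries
   (lambda () (org-get-heading t t))   ; heading text without tags or TODO
   match))

;; Demonstration on invented sample entries.
(with-temp-buffer
  (org-mode)
  (insert "* Build Emacs from source :emacs:compiling:\n"
          "* Tweak init file :emacs:\n"
          "* Build GCC :compiling:\n")
  (my/journal-headings-matching "emacs+compiling"))
```

Only the first sample heading has both tags, so it is the only one collected.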

There are lots of examples like this, of course, and the more you put into your notebook, the more you can get out and the easier it will make things for you. If you aren’t already keeping an engineering notebook, you should start. Or at least make a New Year’s resolution to start. I promise you, you’ll be glad you did.

-1:-- Start an Engineering Notebook (Post jcs)--L0--C0--October 15, 2017 05:55 PM

Irreal: Elisp for Configuration

Chris Done has a nice introduction to Elisp for configuration. This introduction is aimed at programmers who are relatively new to Emacs and want to start doing some simple customization.

He doesn’t intend to provide a thorough guide to the Elisp language but rather to give you just enough to be able to use more comprehensive documentation such as the Elisp manual. He assumes the reader has enough programming experience that an “Introduction to Programming” section isn’t necessary.

If you fall into the intended readership, you should give Done’s article a read. It might help you get started making Emacs your own editor.

-1:-- Elisp for Configuration (Post jcs)--L0--C0--October 13, 2017 04:24 PM

Endless Parentheses: Mold Slack entirely to your liking with Emacs

Although fine-tuning your slack notifications is already reason enough to run slack in Emacs, that’s only the beginning. Once everything is up and running you get to decide what you want out of your slack. Some of the snippets below simply make up for missing functionality, others customize the package beyond what you can do on the Slack Webapp.

Priorities first. The most important improvement you can implement is to install emojify-mode and turn it on for slack chats.

(add-hook 'slack-mode-hook #'emojify-mode)

Secondly, make sure you customize the chat faces to your liking. Just open a chat buffer, place your cursor on a piece of text whose face you want to customize, and call customize-face.

In order to keep track of new messages in the mode-line, slack.el uses a package called tracking, which is the same one circe uses for IRC chats. The command tracking-next-buffer is a fantastic way to cycle through your pending messages; bind it to something short.

(with-eval-after-load 'tracking
  (define-key tracking-mode-map [f11]
    #'tracking-next-buffer))
;; Ensure the buffer exists when a message arrives on a
;; channel that wasn't open.
(setq slack-buffer-create-on-notify t)

I’ll never know who thought user statuses were a good idea for Slack. But, thanks to a tip by _asummers on HackerNews, I can live in a world where they don’t exist.

(defun slack-user-status (_id _team) "")

I like notifications with minimal titles, and the package is kind enough to make these configurable.

;;; Channels
(setq slack-message-notification-title-format-function
      (lambda (_team room threadp)
        ;; concat does not expand "%s", so format the thread title explicitly
        (if threadp (format "Thread in #%s" room) room)))

(defun endless/-cleanup-room-name (room-name)
  "Make group-chat names a bit more human-readable."
  (replace-regexp-in-string
   "--" " "
   (replace-regexp-in-string "#mpdm-" "" room-name)))

;;; Private messages and group chats
(setq
 slack-message-im-notification-title-format-function
 (lambda (_team room threadp)
   ;; concat does not expand "%s", so format the thread title explicitly
   (let ((name (endless/-cleanup-room-name room)))
     (if threadp (format "Thread in %s" name) name))))

Slack.el uses lui for the chat buffers. If you, like me, are a heavy user of abbrevs in Emacs, you’ll find it annoying that the final word of each message won’t get expanded unless you explicitly hit SPC before RET. That’s easy to remedy with an advice.

(advice-add #'lui-send-input :before
            (lambda (&rest _)
              (ignore-errors (expand-abbrev))))
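
A lambda used as advice is awkward to remove later. A possible variant, sketched here with a hypothetical function name, gives the advice a name so it can be taken off again with advice-remove.

```elisp
;; Sketch only: `my/lui-expand-abbrev-before-send' is a made-up name.
(defun my/lui-expand-abbrev-before-send (&rest _)
  "Expand the abbrev before point, ignoring any errors."
  (ignore-errors (expand-abbrev)))

(advice-add 'lui-send-input :before #'my/lui-expand-abbrev-before-send)

;; To undo it later:
;; (advice-remove 'lui-send-input #'my/lui-expand-abbrev-before-send)
```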

Finally, the biggest missing feature from this package is that it displays the author on every message output, even when the same user sends several messages in a row. The snippet below adds a hook to omit the author name for a message whenever it’s the same author as the previous message.

(defun endless/-author-at (pos)
  (replace-regexp-in-string
   (rx "\n" (* anything)) ""
   (or (get-text-property pos 'lui-raw-text) "")))

(defun endless/-remove-slack-author ()
  "Remove author here if it's the same as above."
  (let ((author-here (endless/-author-at (point)))
        (author-above (endless/-author-at (1- (point)))))
    (when (and (looking-at-p (regexp-quote author-here))
               (equal author-here author-above))
      (delete-region (1- (point))
                     (1+ (line-end-position))))))

(defun endless/remove-slack-author-hook ()
  "For usage in `lui-pre-output-hook'."
  (when (derived-mode-p 'slack-mode)
    (save-excursion
      (goto-char (point-min))
      (save-restriction
        (widen)
        (endless/-remove-slack-author)))))

(add-hook 'lui-pre-output-hook
          #'endless/remove-slack-author-hook)

You don’t have to stop here, of course. Want to fine-tune which buffers get tracked on the mode-line? Hack into tracking.el. Want to change the face used for your own messages, or even align them to the right? Redefine slack-buffer-insert. Your workflow is yours to build.


-1:-- Mold Slack entirely to your liking with Emacs (Post)--L0--C0--October 09, 2017 11:43 PM

Pragmatic Emacs: Tree-style directory views in dired with dired-subtree

By default, Emacs’ file browser/manager dired usually presents you with a flat list of files in a given directory. Entering a subdirectory then opens a new buffer with the listing of the subdirectory. Sometimes you might want to be able to see the contents of the subdirectory and the current directory in the same view. Many GUI file browsers visualise this with a tree structure with nodes that can be expanded or collapsed. In Emacs there is a built-in function dired-insert-subdir that inserts a listing of the subdirectory under the cursor, at the bottom of the current buffer instead of in a new buffer, but I’ve never found that very helpful.

The dired-subtree package (part of the magnificent dired hacks) improves on this by allowing you to expand subdirectories in place, like a tree structure. To install the package, use the following code:

(use-package dired-subtree
  :config
  (bind-keys :map dired-mode-map
             ("i" . dired-subtree-insert)
             (";" . dired-subtree-remove)))

This sets up the key bindings so that in dired, hitting i on a subdirectory expands it in place with an indented listing. You can expand sub-subdirectories in the same way, and so on. Hitting ; inside an expanded subdirectory collapses it.
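
If you would rather use one key for both directions, dired-subtree also ships a toggle command. The following is a sketch of that alternative; the choice of the tab key is my own assumption, so pick whatever suits your setup.

```elisp
;; Alternative sketch: a single key that expands or collapses the
;; subdirectory at point.  The "<tab>" binding is an arbitrary choice.
(use-package dired-subtree
  :config
  (bind-keys :map dired-mode-map
             ("<tab>" . dired-subtree-toggle)))
```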

Happily, some of my other favourite tools from dired hacks like dynamically narrowing the directory listing or copying and pasting files work as you would want in these expanded subdirectories.

-1:-- Tree-style directory views in dired with dired-subtree (Post Ben Maughan)--L0--C0--October 08, 2017 11:29 PM

Manuel Uberti: To shell or not to shell

As much as most of my daily workflow revolves around Emacs, I always have GNOME terminal ready to fly with Fish shell and tmux. I keep EShell next to me for quick tasks, but I have never relied on shell-mode or ansi-term for other CLI-intensive work.

I don’t know what happened to the lovely “Emacs Chat” series from Sacha Chua, but more than three years ago she interviewed Mickey Petersen. Mickey talked with great enthusiasm about shell-mode and at the time I admittedly made a mental note about giving it a try. Regretfully, I have only recently come back to that note.

My first M-x shell didn’t look that great. I haven’t debugged the compatibility issues with Fish, probably something related to my heavily customised config.fish. Anyway, falling back to Bash is enough.

(validate-setq explicit-shell-file-name "/bin/bash")

Note that I am using validate-setq as explained here.

Another thing I have noticed is the lack of colours for the output of ls. Fortunately, Emacs StackExchange has an answer for that, so I have added this line to my .bash_aliases file:

alias ls="TERM=ansi ls --color=always"

The input echoing is easily turned off following the instructions in the manual. Also, the history is much cleaner and easier to navigate with counsel-shell-history.

(unbind-key "C-c C-l" shell-mode-map)
(bind-key "C-c C-l" #'counsel-shell-history shell-mode-map)

Note that unbind-key and bind-key are macros from bind-key.el, which is part of the fantastic use-package.

Last but not least, I like to have my shell buffer filling the whole window in the current frame. Thus, display-buffer-alist to the rescue.

(validate-setq
 display-buffer-alist
 `(
   ;; … other stuff …
   (,(rx bos "*shell")
    (display-buffer-same-window)
    (reusable-frames . nil))
   ;; … other stuff …
  ))
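
To see which buffer names that first element would catch, you can evaluate the rx form on its own. This is a small sketch for checking the pattern; the sample buffer names are invented.

```elisp
(require 'rx)

;; Check the condition used in the alist entry against a few names.
(let ((shell-buffer-regexp (rx bos "*shell")))
  (list shell-buffer-regexp
        (and (string-match-p shell-buffer-regexp "*shell*") t)
        (and (string-match-p shell-buffer-regexp "*shell*<2>") t)
        (and (string-match-p shell-buffer-regexp "my *shell* notes") t)))
```

The pattern is anchored at the start of the name, so the first two names match and the last one does not.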
-1:-- To shell or not to shell (Post)--L0--C0--October 07, 2017 12:00 AM

Modern Emacs: Solving ligature spacing in Emacs - proof of concept

Ligatures are single-character replacements of strings. Examples of ligatures: replacing "alpha" with the alpha symbol and "!=" with a slashed equals sign. See Coding with Mathematical Notation for details and pictures.

There is a serious flaw with ligatures - either the indentation you see with ligatures or without ligatures is correct, not both. So if someone who does not use ligatures works on your code, your indentations will not match. An example:


;; True indentation, what you want others to see
(alpha b
       c)

;; Emacs indentation, what you want to see when working
(a b
   c)

This problem significantly hampers ligature adoption.

I do not believe any editor implements a solution to ligatures such that you see the indentation you want to see, while the true indentation remains correct.

I present a proof-of-concept solution to ligature spacing.

How Emacs displays text

Emacs associates text-properties with strings. A property can be anything. Some property names are special and tell Emacs to handle the text in a particular way, like face for how a text is highlighted.

An overlay has associated text-properties but is buffer-local. So when we move that text to another buffer, if that overlay had a face, then that face would not be carried over.

Properties to be aware of:

  • display : How Emacs displays that region, can be any string.
  • invisible : Whether the text should be displayed.
  • modification-hooks : When text in the overlay is edited, run these hooks.
  • evaporate (overlays) : Once the overlay is "done-with", delete the overlay.
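
The difference between text properties and overlays can be seen in a few lines. This is a minimal sketch with a made-up function name: the face set as a text property survives copying the text out of the buffer, while the face set through an overlay does not.

```elisp
;; Hypothetical demo: copy text out of a buffer and inspect its faces.
(defun my/props-vs-overlays-demo ()
  "Return the `face' found on \"alpha\" and on \"beta\" in a copied string."
  (let (copied)
    (with-temp-buffer
      (insert "alpha beta")
      ;; Text property on "alpha": part of the text itself.
      (put-text-property 1 6 'face 'bold)
      ;; Overlay on "beta": belongs to this buffer only.
      (overlay-put (make-overlay 7 11) 'face 'italic)
      (setq copied (buffer-string)))
    (list (get-text-property 0 'face copied)    ; "alpha" in the copy
          (get-text-property 6 'face copied)))) ; "beta" in the copy
```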

Compose region

Additionally, compose-region is similar to display in that the composed region is displayed as (possibly many) characters. Current implementations of ligatures all leverage compose-region by searching the buffer for, say, alpha and composing the region from alpha's beginning to its end point into the Unicode symbol for alpha.

There are several important distinctions between compose-region and put-text-property 'display:

  1. Indentation uses the composed character for indenting, while the text-property display indents with the true, original string.
  2. Composition cannot be set for overlays. The internal composition text property, unlike all other properties, cannot be put manually.
  3. Editing within a composed region will undo the composition, while one must delete the whole region with the display property to undo the display.

Working through a solution

To compose or display the ligature?

Because composition adjusts the underlying indentation, it cannot be used for a ligature spacing solution. Indentation cannot be adjusted in a major-mode agnostic manner. Indentation always considers the true number of characters preceding the text on the line, so dynamically adding invisible spaces will not work.

But how to make editing a display behave like a composition?

It is a serious issue to have to delete the whole text for the ligature to disappear.

The solution is the modification-hooks text-property.


(defun lig-mod-hook (overlay post-mod? start end &optional _)
  (when post-mod?
    (overlay-put overlay 'display nil)
    (overlay-put overlay 'modification-hooks nil)))  ; force evaporation

(overlay-put lig-overlay 'modification-hooks '(lig-mod-hook))

Now editing text with the display property will behave as desired.
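
Here is a self-contained way to try that behaviour outside of any major mode. It is only a sketch: the names my/lig-demo-mod-hook and my/lig-demo are made up, and the hook mirrors the one above rather than replacing it.

```elisp
;; Hypothetical stand-alone version of the hook, for experimenting.
(defun my/lig-demo-mod-hook (overlay after-p _beg _end &optional _len)
  "After a change inside OVERLAY, stop displaying the replacement."
  (when after-p
    (overlay-put overlay 'display nil)
    (overlay-put overlay 'modification-hooks nil)))

(defun my/lig-demo ()
  "Return the overlay's `display' value before and after an edit."
  (with-temp-buffer
    (insert "say hello there")
    (let ((ov (make-overlay 5 10))   ; covers "hello"
          before)
      (overlay-put ov 'display "☺")
      (overlay-put ov 'modification-hooks '(my/lig-demo-mod-hook))
      (setq before (overlay-get ov 'display))
      (goto-char 7)                  ; strictly inside the overlay
      (insert "x")                   ; any edit inside triggers the hook
      (list before (overlay-get ov 'display)))))
```

Calling my/lig-demo returns the display value before the edit and after it; the second value is nil once the hook has run.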

So how to visually collapse the indentation?

We could set invisible on the first 5 spaces of the line to collapse the visual indentation by 5. But the invisible property will reduce subsequent lines' indentation by 5 (if necessary), an issue that cannot be resolved as we cannot determine in general the "if necessary" part.

The trick is to make the 5 first spaces display as one space. Because display doesn't modify indentation, subsequent lines will be indented properly.


(overlay-put space-overlay 'display " ")
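
A quick way to convince yourself of this is to put such an overlay on a line and ask Emacs for the indentation afterwards. The sketch below uses a made-up function name and sample text; it assumes current-indentation is the right measure of the "true" indentation.

```elisp
;; Hypothetical check: does a display overlay change the reported indent?
(defun my/collapsed-indent-demo ()
  "Return the indentation Emacs reports for a visually collapsed line."
  (with-temp-buffer
    (insert "(hello how\n       are)")  ; second line has 7 leading spaces
    (goto-char (point-min))
    (forward-line 1)
    ;; Show the first 5 spaces as a single space.
    (let ((ov (make-overlay (point) (+ (point) 5))))
      (overlay-put ov 'display " ")
      (current-indentation))))
```

The value returned is the indentation of the underlying text, which is what other people opening the file will get.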

How do we determine the indentation we want to see then?

We let Emacs do the work - we create a mirror buffer where the ligatures are actually composed and compare the differences in indentation.

Overlays are not just buffer-local, they also do not transfer to indirect buffers. Ideally we would have a hidden indirect buffer where we keep ligatures composed instead. Unfortunately, since the composition text property is special, it can only be set with compose-region which does not work for overlays.

Further, calculating indentation always adjusts the indentation. The significance is that whenever we indent the indirect buffer, all the text will move back-and-forth. So indirect buffers are out.

Instead we create temporary buffers for the composition and retrieve an alist of lines and their composed indentations.

A working example

The current ligature snippets floating around hack font-locks to perform the ligature substitutions. I recently became familiar with context-sensitive syntax highlighting via the syntax-propertize-function in my work on hy-mode.

I develop a minimal major-mode lig-mode that uses the syntax function to implement ligatures.

Setup

First we set up a basic major-mode for testing.


(provide 'lig-mode)

(add-to-list 'auto-mode-alist '("\\.lig\\'" . lig-mode))

(define-derived-mode lig-mode fundamental-mode "Lig"
  (setq-local indent-line-function 'lisp-indent-line)
  (setq-local syntax-propertize-function 'lig-syntax-propertize-function))

This is a proof-of-concept - we implement spacing for a single ligature for now. Let's replace "hello" with a smiley face.


(defun lig--match-lig (limit)
  (re-search-forward (rx word-start "hello" word-end) limit t))

(setq lig-char #x263a)
(setq lig-str "☺")
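
The word-start and word-end anchors are what keep the ligature from firing inside longer words. A small sketch to illustrate, using a hypothetical variable name and invented sample strings:

```elisp
(require 'rx)

;; Hypothetical variable holding the same pattern as `lig--match-lig'.
(defvar my/lig-demo-regexp (rx word-start "hello" word-end)
  "Regexp matching \"hello\" only as a whole word.")

(list (and (string-match-p my/lig-demo-regexp "(hello how") t) ; whole word
      (and (string-match-p my/lig-demo-regexp "othello") t)    ; only a suffix
      (and (string-match-p my/lig-demo-regexp "hellos") t))    ; only a prefix
```

Only the first string contains "hello" as a word of its own, so only it matches.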

Determining the indents we want to see

We copy the buffer contents to a temporary buffer, search and compose the symbols, indent the buffer, and copy the indentation for each line.


(defvar lig-diff-indents nil)

(defun lig-get-diff-indents ()
  (setq lig-diff-indents nil)
  (save-excursion
    ;; Compose the ligatures
    (goto-char (point-min))
    (while (re-search-forward (rx word-start "hello" word-end) nil t)
      (compose-region (match-beginning 0) (match-end 0) lig-char))

    ;; Change indent to match the composed symbol
    (indent-region (point-min) (point-max))

    ;; Build an alist of line and indentation column
    (goto-char (point-min))
    (setq line 1)
    (while (< (point) (point-max))
      (push (cons line (current-indentation))
            lig-diff-indents)
      (forward-line)
      (setq line (1+ line)))))

(defun run-lig-get-diff-indents ()
  (let ((true-buffer (current-buffer)))
    (with-temp-buffer
      (fundamental-mode)
      (setq-local indent-line-function 'lisp-indent-line)
      (insert-buffer-substring-no-properties true-buffer)
      (lig-get-diff-indents))))

Bringing it together

For details on how syntax-propertize-function works, check this post.

Whenever we edit the buffer this hook will run, recalculating and visually collapsing all the leading spaces as needed.


(defun lig-syntax-propertize-function (start-limit end-limit)
  ;; Make sure visual indentations are current
  (run-lig-get-diff-indents)

  (save-excursion
    (goto-char (point-min))

    (while (lig--match-lig end-limit)
      (let ((start (match-beginning 0))
            (end (match-end 0)))
        (unless (-contains? (overlays-at start) lig-overlay)
          ;; Create and set the lig overlays if not already set
          (setq lig-overlay (make-overlay start end))
          (overlay-put lig-overlay 'display lig-str)
          (overlay-put lig-overlay 'evaporate t)
          (overlay-put lig-overlay 'modification-hooks '(lig-mod-hook)))))

    ;; Remove all spacing overlays from buffer
    (remove-overlays nil nil 'invis-spaces t)

    ;; Recalculate and add all spacing overlays
    (goto-char (point-min))
    (setq line 1)

    (while (< (point) (point-max))
      ;; Don't add the spacing overlay until we indent
      (unless (> (+ (current-indentation) (point))
                 (point-max))
        (let* ((vis-indent (alist-get line lig-diff-indents))
               (num-spaces (- (current-indentation) vis-indent))
               (start (point))
               (end (+ num-spaces (point))))

         ;; only add invisible spaces if the indentations differ
         (unless (<= num-spaces 1)
            (setq space-overlay (make-overlay start end))
            (overlay-put space-overlay 'invis-spaces t)
            (overlay-put space-overlay 'display " ")
            (overlay-put space-overlay 'evaporate t))

         (setq line (1+ line))
         (forward-line))))))

The result

Enable lig-mode to see:


;; The true text
(hello how
       are
       you (hello hi
                  again))

;; What we see
(☺ how
   are
   you (☺ hi
         again))

The indentation we see is not the true indentation anymore!

The full and current code is hosted here.

The missing space on the second hello is a bug. There are many issues with this implementation - this is a proof of concept. I suspect a completely correct solution to be still some time and effort away, if only because this approach is incredibly inefficient.

This post shows that we maybe can have our cake and eat it too in regards to ligatures.

-1:-- Solving ligature spacing in Emacs - proof of concept (Post)--L0--C0--October 05, 2017 12:00 AM

(or emacs: Extending completion-at-point for Org-mode

Intro

When creating documents, context-aware completion is a powerful mechanism that can help you improve speed, correctness and discoverability.

Emacs provides context aware completion via the complete-symbol command, bound to C-M-i by default. In order for it to do something useful, completion-at-point-functions has to be set up.

Documentation:

Special hook to find the completion table for the thing at point.
Each function on this hook is called in turn without any argument and should
return either nil to mean that it is not applicable at point,
or a list of the form (START END COLLECTION) where
START and END delimit the entity to complete and should include
point, COLLECTION is the completion table to use to complete it.

For each major-mode, a different value of completion-at-point-functions can (and probably should) apply. One of the modes that's set up nicely by default is emacs-lisp-mode: press C-M-i to get completion for Elisp variable and function names. Org-mode, on the other hand, is quite lacking in this regard: nothing useful happens with C-M-i.
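
To make the (START END COLLECTION) contract concrete, here is about the smallest completion function I can think of. It is a sketch, not something from my config: the names my/demo-words and my/demo-capf and the word list are invented.

```elisp
(require 'thingatpt)

(defvar my/demo-words '("alpha" "alphabet" "beta")
  "Hypothetical fixed candidate list for the demo.")

(defun my/demo-capf ()
  "Offer `my/demo-words' as completions for the symbol at point."
  (let ((bounds (bounds-of-thing-at-point 'symbol)))
    (when bounds                      ; nil means "not applicable here"
      (list (car bounds) (cdr bounds) my/demo-words))))

;; Register it for the current buffer only (normally done in a mode hook).
(add-hook 'completion-at-point-functions #'my/demo-capf nil t)
```

With point after "al" in a buffer, C-M-i would then offer alpha and alphabet, since the completion engine filters the collection by the text between START and END.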

Here's my current setting for Org-mode:

(setq completion-at-point-functions
      '(org-completion-symbols
        ora-cap-filesystem
        org-completion-refs))
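
Since the variable should differ per major mode, one way to apply the setting is from org-mode-hook with setq-local. This is a sketch of that wiring; the hook function name is made up, and it assumes the three completion functions are already defined.

```elisp
;; Hypothetical hook function: install the sources in Org buffers only.
(defun my/org-setup-completion ()
  "Install the three completion sources in the current Org buffer."
  (setq-local completion-at-point-functions
              '(org-completion-symbols
                ora-cap-filesystem
                org-completion-refs)))

(add-hook 'org-mode-hook #'my/org-setup-completion)
```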

org-completion-symbols

When I write about code in Org-mode, I quote items like this:

=/home/oleh/=, =HammerFactoryFactory=, etc.

Quoting has several advantages:

  • It looks nice, since it's in a different face,
  • flyspell doesn't need to check it, which makes sense since it would fail on most variable and class names,
  • Prevents Org from confusing directory names for italics mark up.

Completion has one more advantage on top of that: if I refer to a symbol name multiple times within a document, completion helps me to enter it quickly and correctly. Here's the corresponding completion source:

(defun org-completion-symbols ()
  (when (looking-back "=[a-zA-Z]+")
    (let (cands)
      (save-match-data
        (save-excursion
          (goto-char (point-min))
          (while (re-search-forward "=\\([a-zA-Z]+\\)=" nil t)
            (cl-pushnew
             (match-string-no-properties 0) cands :test 'equal))
          cands))
      (when cands
        (list (match-beginning 0) (match-end 0) cands)))))
  1. First of all, it checks if the point is e.g. after =A, i.e. we are in fact entering a new quoted symbol. If that's not the case, return nil and let the other completion sources have a go.

  2. Next, it looks through the current buffer for each =foo= and =bar=, accumulating them into a list.

  3. Finally, it returns the bounds of what we've got so far, plus the found candidates. It's important that the bounds are passed to the completion engine, so that it can delete everything inside the bounds before inserting the whole selected symbol.

ora-cap-filesystem

This source is for completing file names:

(defun ora-cap-filesystem ()
  (let (path)
    (when (setq path (ffap-string-at-point))
      (let ((compl
             (all-completions path #'read-file-name-internal)))
        (when compl
          (let ((offset (ivy-completion-common-length (car compl))))
            (list (- (point) offset) (point) compl)))))))

I usually enter ~, so that ffap-string-at-point recognizes it as a path. Then complete each part of the path with C-M-i. It's very similar to counsel-find-file. In fact, I could just use counsel-find-file for this, with M-o i to insert the file name instead of opening the selected file.

org-completion-refs

org-completion-refs is very similar to org-completion-symbols: it will collect all instances of e.g. \label{foo}, and offer them for completion when you enter \ref{. If you want to look at the code, it's available in my config.
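
For readers who don't want to dig through the config, here is a rough sketch of what such a source could look like. It is not the actual org-completion-refs; the function name and the regexps are assumptions written for illustration.

```elisp
(require 'cl-lib)

;; Hypothetical stand-in, NOT the real `org-completion-refs'.
(defun my/org-completion-refs-sketch ()
  "Complete the argument of \\ref{ from the \\label{...} names in the buffer."
  (when (looking-back "\\\\ref{\\([^}\n]*\\)" (line-beginning-position))
    (let ((beg (match-beginning 1))   ; start of what was typed after \ref{
          (end (match-end 1))
          cands)
      (save-excursion
        (goto-char (point-min))
        (while (re-search-forward "\\\\label{\\([^}\n]+\\)}" nil t)
          (cl-pushnew (match-string-no-properties 1) cands :test #'equal)))
      (when cands
        (list beg end (nreverse cands))))))
```

As with org-completion-symbols, it returns nil when point is not in the right context, so the other sources get their turn.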

Outro

I hope I convinced you about the usefulness of completion at point. It's especially cool since it's a universal interface for major-mode-specific completion. So any IDE-like package for any language could provide its own completion using the familiar interface. That could go a long way towards providing a "just works" experience, particularly when dealing with a new language.

-1:-- Extending completion-at-point for Org-mode (Post)--L0--C0--October 03, 2017 10:00 PM

Modern Emacs: Deep diving into a major mode - Part 1

I've taken up maintaining hy-mode - a major mode for lispy python.

I narrate working through specific problems in auto-completion, indentation, shell integration, and so on.

This post touches on: syntax, indentation, font-locking, and context-sensitive syntax.

All code snippets require the Emacs packages dash and s.

Syntax Tables

The first step in a major mode is the syntax table.

In any major mode run describe-syntax to see its syntax table. As we are working with a lisp, we copy its syntax-table to start with.


(defconst hy-mode-syntax-table
  (-let [table
         (copy-syntax-table lisp-mode-syntax-table)]
    ;; syntax modifications...
    table)
  "Hy mode's syntax table.")

The syntax table isn't set explicitly; define-derived-mode picks it up by name, so a variable called hy-mode-syntax-table becomes the syntax table for hy-mode.

Configuration is performed with modify-syntax-entry; its docstring lists all the syntactic constructs we can pick from.

A subset to be familiar with:

  • ( ) : open/close parenthesis. These are for all bracket-like constructs such as [ ] or { }. The first character should be the syntactic construct, namely "(" or ")", and the second character should be the matching delimiter.

(modify-syntax-entry ?\{ "(}" table)
(modify-syntax-entry ?\} "){" table)
(modify-syntax-entry ?\[ "(]" table)
(modify-syntax-entry ?\] ")[" table)
  • ' : prefix character. Prefixes a symbol/word.

;; Quote characters are prefixes
(modify-syntax-entry ?\~ "'" table)
(modify-syntax-entry ?\@ "'" table)
  • _ and w : symbol and word constituent respectively.

;; "," is a symbol in Hy, namely the tuple constructor
(modify-syntax-entry ?\, "_ p" table)

;; "|" is a symbol in hy, naming the or operator
(modify-syntax-entry ?\| "_ p" table)

;; "#" is a tag macro, we include # in the symbol
(modify-syntax-entry ?\# "_ p" table)
  • | : generic string fence. A more general string quote syntactic construct. Used for delimiting multi-line strings like with triple quotes in Python. I go into depth on this construct in the "context-sensitive syntax" section.
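
To check that entries like these do what you expect, you can build a table and query it directly. The sketch below uses a throwaway table with a made-up name, not hy-mode's real one, and only a few of the entries from above.

```elisp
;; Hypothetical table used only for this demonstration.
(defvar my/demo-syntax-table
  (let ((table (copy-syntax-table lisp-mode-syntax-table)))
    (modify-syntax-entry ?\{ "(}" table)
    (modify-syntax-entry ?\} "){" table)
    (modify-syntax-entry ?\~ "'" table)
    table)
  "Copy of the Lisp syntax table with braces as parens and ~ as a prefix.")

(defun my/demo-syntax-classes ()
  "Return the syntax classes of {, } and ~, plus the end of a {...} form."
  (with-temp-buffer
    (set-syntax-table my/demo-syntax-table)
    (insert "{a b} c")
    (list (char-syntax ?\{)             ; open-paren class, ?\(
          (char-syntax ?\})             ; close-paren class, ?\)
          (char-syntax ?\~)             ; prefix class, ?'
          (scan-sexps (point-min) 1)))) ; position just after the "}"
```

Once the braces are parenthesis-class characters, sexp motion such as scan-sexps treats {a b} as one balanced expression.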

Indentation

Look through calculate-lisp-indent, the indentation workhorse of lisp-mode derivatives, and it is quickly seen that indentation is hard.

Indentation is set with indent-line-function.

In the case of a lisp, we actually do:


(setq-local indent-line-function 'lisp-indent-line)
(setq-local lisp-indent-function 'hy-indent-function)

The real work is performed by calculate-lisp-indent, which calls lisp-indent-function with an indent-point and a state.

The function at heart is parse-partial-sexp, taking limiting points and retrieving a 10-element list describing the syntax at the point.

As this is a (necessarily) excessive amount of information, I recommend doing what many other modes have done and defining some aliases. I have:


(defun hy--sexp-inermost-char (state) (nth 1 state))
(defun hy--start-of-last-sexp (state) (nth 2 state))
(defun hy--in-string? (state) (nth 3 state))
(defun hy--start-of-string (state) (nth 8 state))

Observe that you can also omit state and call syntax-ppss to get it. syntax-ppss runs parse-partial-sexp from point-min to the current point, with the caveat that the 2nd and 6th elements of the state aren't reliable. I prefer to pass the state manually.
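
If the state list feels abstract, it helps to parse a tiny piece of text and look at the elements the aliases above pick out. This is an illustrative sketch with a made-up function name and sample text; it uses the Emacs Lisp syntax table rather than Hy's.

```elisp
;; Hypothetical demo: inspect the parse state inside an unfinished string.
(defun my/demo-parse-state ()
  "Parse a small sample and return a few elements of the resulting state."
  (with-temp-buffer
    (set-syntax-table emacs-lisp-mode-syntax-table)
    (insert "(foo (bar \"baz")
    (let ((state (parse-partial-sexp (point-min) (point))))
      (list (nth 0 state)           ; paren depth
            (nth 1 state)           ; start of innermost containing list
            (and (nth 3 state) t)   ; non-nil when inside a string
            (nth 8 state)))))       ; start of that string
```

Here the point ends up two lists deep and inside a string, and element 1 gives the position of the second opening paren, which is exactly what hy--sexp-inermost-char reads.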

These are the building blocks for indentation - we can then write utilities to better get our head around indentation like:


(defun hy--prior-sexp? (state)
  (number-or-marker-p (hy--start-of-last-sexp state)))

The indent function

The three cases:


;; Normal Indent
(normal b
        c)
(normal
  b c)

;; Special Forms
(special b
  c)

;; List-likes
[a b
 c]

Hy's current indent function:


(defun hy-indent-function (indent-point state)
  "Indent at INDENT-POINT where STATE is `parse-partial-sexp' for INDENT-POINT."
  (goto-char (hy--sexp-inermost-char state))

  (if (hy--not-function-form-p)
      (1+ (current-column))  ; Indent after [, {, ... is always 1
    (forward-char 1)  ; Move to start of sexp

    (cond ((hy--check-non-symbol-sexp (point))  ; Comma tuple constructor
           (+ 2 (current-column)))

          ((hy--find-indent-spec state)  ; Special form uses fixed indentation
           (1+ (current-column)))

          (t
           (hy--normal-indent calculate-lisp-indent-last-sexp)))))

When we indent, we jump to the sexp's innermost char, i.e. "(", "[", "{", etc.

If that character opens a list-like, then we return its column plus one and are done.

Otherwise we move to the start of the sexp and check whether (thing-at-point 'symbol) returns a symbol. If it does, then we check a list of special forms like when, do, defn for a match. If we find a (possibly fuzzy) match, then regardless of whether the first line contains args or not, we indent the same.


(defun hy--normal-indent (last-sexp)
  "Determine normal indentation column of LAST-SEXP.

Example:
 (a (b c d
       e
       f))

1. Indent e => start at d -> c -> b.
Then `backward-sexp' will throw an error trying to jump to a.
Observe 'a' need not be on the same line as the ( will cause a match.
Then we determine indentation based on whether there is an arg or not.

2. Indenting f will go to e.
Now since there is a prior sexp d but we have no sexps-before on same line,
the loop will terminate without error and the prior lines indentation is it."
  (goto-char last-sexp)
  (-let [last-sexp-start nil]
    (if (ignore-errors
          (while (hy--anything-before? (point))
            (setq last-sexp-start (prog1
                                      ;; Indentation should ignore quote chars
                                      (if (-contains? '(?\' ?\` ?\~)
                                                      (char-before))
                                          (1- (point))
                                        (point))
                                    (backward-sexp))))
          t)
        (current-column)
      (if (not (hy--anything-after? last-sexp-start))
          (1+ (current-column))
        (goto-char last-sexp-start)  ; Align with function argument
        (current-column)))))

Normal indent does the most work. Note that if we are on the next line without a function arg above, then last-sexp-start will be nil, as backward-sexp will throw an error and the setq won't go off.

If there is a function call above, then the current-column of the innermost, non-opening sexp, will end up as the indent point.

If we indent the line of the funcall, it will jump to the containing sexp and calculate its indent.

Other indentation functions are a bit more advanced in that they track the number of prior sexps in the indent-function to distinguish between e.g. the then and else clauses of an if statement. Those cases use the same fundamentals seen here.

Developing indentation from scratch can be challenging. The approach I took was to look at Clojure's indentation and trim it down until it fit this language. I've removed most of the extraneous details that it adds to handle special rules for e.g. clojure.spec, but it is still possible that I could trim this further.

Font Locks and Highlighting

Font locking has two entry points to be aware of: hy-font-lock-kwds and hy-font-lock-syntactic-face-function.


(setq font-lock-defaults
        '(hy-font-lock-kwds
          nil nil
          (("+-*/.<>=!?$%_&~^:@" . "w"))  ; syntax alist
          nil
          (font-lock-mark-block-function . mark-defun)
          (font-lock-syntactic-face-function  ; Differentiates (doc)strings
           . hy-font-lock-syntactic-face-function)))

Font lock keywords

There exist many posts on modifying the variable font-lock-keywords.

The approach taken in hy-mode is to separate out the language by category:


(defconst hy--kwds-constants
  '("True" "False" "None" "Ellipsis" "NotImplemented")
  "Hy constant keywords.")

(defconst hy--kwds-defs
  '("defn" "defun"
    "defmacro" "defmacro/g!" "defmacro!"
    "defreader" "defsharp" "deftag")
  "Hy definition keywords.")

(defconst hy--kwds-operators
  '("!=" "%" "%=" "&" "&=" "*" "**" "**=" "*=" "+" "+=" "," "-"
    "-=" "/" "//" "//=" "/=" "<" "<<" "<<=" "<=" "=" ">" ">=" ">>" ">>="
    "^" "^=" "|" "|=" "~")
  "Hy operator keywords.")

;; and so on

We then use the amazing rx macro to construct the regexes.

Because of how the rx macro works internally, using variable definitions in the regex construction requires calling rx-to-string instead.

The simplest definition:


(defconst hy--font-lock-kwds-constants
  (list
   (rx-to-string
    `(: (or ,@hy--kwds-constants)))

   '(0 font-lock-constant-face))

  "Hy constant keywords.")

A more complex example with multiple groups taking different faces:


(defconst hy--font-lock-kwds-defs
  (list
   (rx-to-string
    `(: (group-n 1 (or ,@hy--kwds-defs))
        (1+ space)
        (group-n 2 (1+ word))))

   '(1 font-lock-keyword-face)
   '(2 font-lock-function-name-face nil t))

  "Hy definition keywords.")

Of course not all highlighting constructs are determined by symbol name. We can highlight the shebang line for instance as:


(defconst hy--font-lock-kwds-shebang
  (list
   (rx buffer-start "#!" (0+ not-newline) eol)

   '(0 font-lock-comment-face))

  "Hy shebang line.")

We then collect all our nice and modular font locks as hy-font-lock-kwds that we set earlier:


(defconst hy-font-lock-kwds
  (list hy--font-lock-kwds-constants
        hy--font-lock-kwds-defs
        ;; lots more ...
        hy--font-lock-kwds-shebang)

  "All Hy font lock keywords.")

Syntactic face function

This function is typically used for distinguishing between strings, docstrings, and comments. It does not need to be set unless you want to distinguish docstrings.


(defun hy--string-in-doc-position? (state)
  "Is STATE within a docstring?"
  (if (= 1 (hy--start-of-string state))  ; Identify module docstring
      t
    (-when-let* ((first-sexp (hy--sexp-inermost-char state))
                 (function (save-excursion
                             (goto-char (1+ first-sexp))
                             (thing-at-point 'symbol))))
      (s-matches? (rx "def" (not blank)) function))))  ; "def"=="setv"

(defun hy-font-lock-syntactic-face-function (state)
  "Return syntactic face function for the position represented by STATE.
STATE is a `parse-partial-sexp' state, and the returned function is the
Lisp font lock syntactic face function. String is shorthand for either
a string or comment."
  (if (hy--in-string? state)
      (if (hy--string-in-doc-position? state)
          font-lock-doc-face
        font-lock-string-face)
    font-lock-comment-face))

It is rather straightforward - we start out within either a string or comment. If needed, we jump to the first sexp and see if it is a "def-like" symbol, in which case we know it's a doc.

This implementation isn't perfect, as any string with a parent def-sexp will use the doc-face, so if your function returns a raw string, then it will be highlighted as if it's a doc.

Context sensitive syntax

An advanced feature Emacs enables is context-sensitive syntax. Some examples are multi-line Python strings, where there must be three single quotes together, or Haskell's multiline comments.

Hy implements multiline string literals for automatically escaping quote characters. The syntax is #[optional-delim[the-string]optional-delim] where the string can span lines.

In order to identify and treat the bracket as a string, we look to setting the syntax-propertize-function.

It takes two arguments, the start and end points of the region to search through. syntax.el handles the internals of limiting and passing the start and end and applying/removing the text properties as the construct changes.


(defun hy--match-bracket-string (limit)
  "Search forward for a bracket string literal."
  (re-search-forward
   (rx "#["
       (0+ not-newline)
       "["
       (group (1+ (not (any "]"))))
       "]"
       (0+ not-newline)
       "]")
   limit
   t))

(defun hy-syntax-propertize-function (start end)
  "Implements context sensitive syntax."
  (save-excursion
    (goto-char start)

    ;; Start goes to current line, need to go to char-before the #[ block
    (when (nth 1 (syntax-ppss))
      (goto-char (- (hy--sexp-inermost-char (syntax-ppss)) 2)))

    (while (hy--match-bracket-string end)
      (put-text-property (1- (match-beginning 1)) (match-beginning 1)
                         'syntax-table (string-to-syntax "|"))

      (put-text-property (match-end 1) (1+ (match-end 1))
                         'syntax-table (string-to-syntax "|")))))

We go to the start and jump before its innermost containing sexp begins minus two for the hash sign and bracket characters.

If the regex matches a bracket string, we then set the innermost brackets on both sides to have the string-fence syntax.

When the syntax is set - parse-partial-sexp and in particular font lock mode and indent-line will now recognize that block as a string - so proper indentation and highlighting follow immediately. And when we modify the brackets, the string-fence syntax is removed and behaves as expected.

This function can handle any kind of difficult syntactic construct. For instance, I could modify it to only work if the delimiters on both sides of the bracket string are the same. I could also associate some arbitrary, custom text property that other parts of hy-mode interact with.

Note that there is the macro syntax-propertize-rules for automating the searching and put-text-property portions. I prefer to do the searching and application manually to 1. have more flexibility and 2. step through the trace more easily.

Closing

Building a major mode teaches a lot about how Emacs works. I'm sure I've made errors, but so far this has been enough to get hy-mode up and running. The difference in productivity in Hy I've enjoyed since taking over maintainership has made the exercise more than worth it.

I also have auto-completion and shell/process integration working which I'll touch on in future posts.

-1:-- Deep diving into a major mode - Part 1 (Post)--L0--C0--October 03, 2017 12:00 AM

Endless Parentheses: Turbo up your Ruby console in Emacs

Keeping a REPL (or a console) always by your side is never a bad habit, and if you use an IDE-package (like Robe for Ruby, or Cider for Clojure) it’s nigh unavoidable. Being an essential part of your environment, it would be ridiculous not to invest some time optimizing it.

One obvious optimization is to bind a key to your “start console” command, but that’s just the start. You pretty much never need two running consoles for the same project, so why not have the same key switch to it if it’s already running?

But we can go a bit farther with very little work. I have a file where I define a lot of small helper methods for my Ruby console, so let’s require it automatically whenever a new console is started.

(defcustom endless/ruby-extensions-file
  "../console_extensions.rb"
  "File loaded when a ruby console is started.
Name is relative to the project root."
  :type 'string)

;; Skip ENV prompt that shows up in some cases.
(setq inf-ruby-console-environment "development")

(defun endless/run-ruby ()
  (interactive)
  (require 'inf-ruby)
  (let ((default-directory (projectile-project-root))
        (was-running (get-buffer-process inf-ruby-buffer)))
    ;; This function automatically decides between starting
    ;; a new console or visiting an existing one.
    (inf-ruby-console-auto)
    (when (and (not was-running)
               (get-buffer-process (current-buffer))
               (file-readable-p endless/ruby-extensions-file))
      ;; If this brand new buffer has lots of lines then
      ;; some exception probably happened.
      (process-send-string
       (get-buffer-process (current-buffer))
       (concat "require '" endless/ruby-extensions-file
               "'\n")))))

;; CIDER users might recognize this key.
(define-key ruby-mode-map (kbd "C-c M-j")
  #'endless/run-ruby)

If you use Projectile and want to go even faster, check out the j key on my post about Projectile.


-1:-- Turbo up your Ruby console in Emacs (Post)--L0--C0--October 02, 2017 08:57 PM

Marcin Borkowski: Converting TeX sequences to Unicode characters

I quite often deal with LaTeX files using stuff like \'a or \"e, and I really prefer having those encoded in UTF-8. So the natural question arises: how to convert one into another? The problem is especially frustrating because Emacs can do this – either via C-x 8 prefix, or with the TeX input method. It is not trivial, however, to find out how it does these things, and to get hold of the data used to actually perform the conversion. (At least, I didn’t find a way to do it.) After a bit of searching, however, I came up with another solution. I’m hesitant to call it “clever”; it’s rather hackish, but hey, it works, so who cares.
-1:-- Converting TeX sequences to Unicode characters (Post)--L0--C0--October 02, 2017 06:14 PM

Jonas Bernoulli: Borg 2.0 and Epkg 3.0 released

I am excited to announce the release of Borg v2.0, Epkg v3.0, Closql v0.4 and Emir v2.0.
-1:-- Borg 2.0 and Epkg 3.0 released (Post)--L0--C0--September 20, 2017 03:00 PM

Manuel Uberti: Taming closing delimiters in my s-expressions

As I explained when I wrote about my daily Clojure workflow, I rely heavily on Smartparens for my editing. With Lisp-like languages in particular, I enable smartparens-strict-mode to keep my s-expressions balanced even when I happen to use delete-char or kill-word dangerously near a closing parenthesis.

I have sp-kill-sexp bound to C-M-k; however, out of habit I often use C-k to kill a line, which in my configuration is set up as Artur Malabarba explained in his Kill Entire Line with Prefix Argument. Doing that in the middle of an s-expression creates unnerving chaos.

Smartparens comes with a handy binding to temporarily disable the enforced balancing and let me insert a closing delimiter. Just pressing C-q followed by the desired matching parenthesis brings the order back.

Unfortunately, it’s not always that easy. Take this snippet which appears at the end of a ClojureScript function:

(when-not (empty? @data)
            [:div
             {:style {:padding "1em" :text-align "center"}}
             [graph]])]]))))

Carelessly hitting C-k near [graph] disrupts an otherwise elegant s-expression. I could undo, of course, but what if after C-k I do other kill-and-yank edits?

This is exactly why I have come to love syntactic-close.

(use-package syntactic-close            ; Automatically insert closing delimiter
  :ensure t
  :bind ("C-c x c" . syntactic-close))

As soon as I discover an unbalanced s-expression, I can use C-c x c as many times as needed to add back the right closing delimiters.

-1:-- Taming closing delimiters in my s-expressions (Post)--L0--C0--September 17, 2017 12:00 AM

Bryan Murdock: Not Leaky, Just Wrong

Intel recently announced new tools for FPGA design. I should probably try to understand OpenCL better before bagging on it, but when I read, "[OpenCL] allows users to abstract away hardware-specific development and use a higher-level software development flow," I cringe. I don't think that's how we get to a productive, higher level of abstraction in FPGA design. When you look at the progress of software from low-level detailed design to high-level abstract design you see assembly to C to Java to Python (to pick one line of progression among many). The thing that happened every time a new higher-level language gained traction is people recognized patterns that developers were using over and over in one language and made language features in a new language that made those patterns one-liners to implement.

There are plenty of examples of design patterns turning into language features. In assembly, people developed the pattern of function calls: push arguments onto the stack, save the program counter, jump to the code that implements the function, the function code pops arguments off the stack, does its thing, then jumps back to the code that called it. In C the tedium of all that was abstracted away by the language providing you with syntax to define a function, pass it arguments, and just call return at the end. In C people then started developing patterns of structs containing data and function pointers for operating on that data, which turned into classes and objects in Java. Java also abstracted away memory management with a garbage collector. Patterns in Java (Visitor, State, etc.) are no longer needed in Python because of features in that language (related discussion here).

This is the path that makes most sense to me for logic design as well. Right now in RTL Verilog people use patterns like registers (always block that activates on posedge clk, has reset, inputs, outputs, etc.), state machines (case statement and state registers, next_state logic...), interfaces (SV actually attempted to add syntax for this), and so on. It seems like the next step in raising the abstraction level is to have a language with those sorts of constructs built-in. Then let people use that for a while and see what new patterns develop and encapsulate those patterns in new language features. Maybe OpenCL does this? I kind of doubt it if it's a "software development flow." It's probably still abstracting away CPU instructions.

-1:-- Not Leaky, Just Wrong (Post Bryan (noreply@blogger.com))--L0--C0--September 15, 2017 03:04 PM

Timo Geusch: Emacs 25.3 released

Emacs 25.3 was released on Monday. Given that it’s a security fix, I’m downloading the source as I write this. If you’re using the latest Emacs, I’d recommend you update your Emacs. The vulnerability has been around since Emacs […]

The post Emacs 25.3 released appeared first on The Lone C++ Coder's Blog.

-1:-- Emacs 25.3 released (Post Timo Geusch)--L0--C0--September 15, 2017 04:20 AM

punchagan: Emacs frame as a pop-up input

I wanted to try using a dialog box/pop-up window as a prompt to remind me to periodically make journal entries. I had the following requirements:

  • Simple, light-weight dialog box that allows text of arbitrary length
  • Ability to launch the dialog from the shell
  • Ability to have some placeholder or template text, each time the dialog is shown
  • Save the input text to a specific org-mode file
  • Write as little code of my own, as possible, to do this

I had initially thought about using a tool like zenity, or writing a simple dialog box in Python using Qt, wx or even tk, and then yanking the input text at the desired location. This probably wouldn’t have turned out to be too hard, but getting things to look and work exactly the way I wanted would have required more code than I was willing to write or maintain.

After avoiding doing this for a while, I finally realized that I could simply use Emacs with a new frame with the appropriate dimensions, and with the correct file/buffer open to the desired location. This would

  • eliminate the need for me to write the UI myself
  • eliminate the need to do text manipulation in code, to yank it at the right place, in the right form. By directly opening up the editor at the required location, the onus is on me (as a text inputting user) to put it in, the way I want it.
  • additionally provide me the comfort of being able to write with the full power of Emacs - keybindings and all that jazz.
  • let me leverage elisp to do essentially whatever I want with the buffer being displayed as the dialog box.

I ended up with a command that looks something like this

emacsclient -c -n\
            -F '((title . "Title") (left . (+ 550)) (top . (+ 400)) (width . 110) (height . 12))'\
            -e '(pc/open-journal-buffer)'

This worked pretty nicely, except for the fact that with gnome-shell, the pop-up frame doesn’t always appear raised. It often gets hidden in the Emacs windows group, and the whole idea of the pop-up acting as a reminder goes for a toss! But, thanks to this Ask Ubuntu post, I could fix this pretty easily.

emacsclient -c -n\
            -F '((title . "Title") (left . (+ 550)) (top . (+ 400)) (width . 110) (height . 12))'\
            -e '(progn (pc/open-journal-buffer) (raise-frame) (x-focus-frame (selected-frame)))'
-1:-- Emacs frame as a pop-up input (Post)--L0--C0--September 14, 2017 04:56 PM

Jonas Bernoulli: Magit 2.11 released

I am excited to announce the release of Magit version 2.11, consisting of 303 commits since the last feature release six months ago.
-1:-- Magit 2.11 released (Post)--L0--C0--September 13, 2017 11:00 AM

Steven Pigeon: Undo that mess

During last marking season (at the end of the semester), I had, of course, to grade a lot of assignments. For some reason, every semester, I have a good number of students that write code like they just don’t care. I get code that looks like this:

int fonction              (int random_spacing)^M{           ^M
  int            niaiseuses;

  for (int i=0;i<random_spacing;         i++){
                    {
       {
        std::cout
         << bleh
         << std::endl;
    }}

  }
}

There’s a bit of everything. Random spacing. Traces of conversions from one OS to another, braces at the end of line. Of course, they lose points, but that doesn’t make the code any easier to read. In a previous installment, I proposed something to rebuild the whitespaces only. Now, let’s see how we can repair as many defects as possible with an Emacs function.

Let’s start at the beginning: a list of the things to repair:

  • OS-related conversion. Linux/*nixes end lines in \n, Windows in \r\n. Other platforms may use something else. Let’s not concern ourselves with the ZX80.
  • Replace long series of (white)spaces by only one space.
  • Deal with braces at the end of lines.
  • Reindent everything else using the defined style.

The first two items can be combined. Since transforming \r\n into \n only requires removing \r, we can bundle series of (white)spaces and \r for replacement. I’m not a regex ninja: I came up with this:

; replaces runs of whitespace and stray ^M with a single space
; (no | or ? inside the brackets: there they would match literally)
(while (re-search-forward "[[:space:]\r]+" nil t)
  (replace-match " " nil nil))

Trailing braces are a bit more complicated. They may, or mayn’t, be preceded by spaces and followed by spaces. This time, the regex is a bit more complicated:

; remove fiendish { at end of (non-empty) line
(while (re-search-forward
 "\\([^[:space:]{\n]+\\)\\([[:space:]]*\\)\\({\\)\\([[:space:]]*$\\)" nil t)
 (replace-match "\\1\n{" nil nil))

It matches three parts. Something that is not whitespaces, followed by something that is whitespaces, the brace {, then whitespaces to the end of line. OK, that makes four. The only one we’re interested in not replacing is the first (the \\1 argument in replace). Everything else, most of it whitespaces, is replaced by newline, { , newline.

Now, the buffer should be in a rather messy state, possibly with trailing whitespaces and destroyed indentation. Calls to whitespace-cleanup and indent-region should finish the job.

Putting all that together:

(defun cleanup-whole-buffer()
   "Removes ^M, tabs, and reindent whole buffer"
   (interactive)
   (save-excursion
     (undo-boundary)

     (beginning-of-buffer)
     ; replaces runs of whitespace and stray ^M with a single space
     (while (re-search-forward "[[:space:]\r]+" nil t)
       (replace-match " " nil nil))

     (beginning-of-buffer)
     ; remove fiendish { at end of (non-empty) line
     (while (re-search-forward
             "\\([^[:space:]{\n]+\\)\\([[:space:]]*\\)\\({\\)\\([[:space:]]*$\\)" nil t)
       (replace-match "\\1\n{" nil nil))

     (beginning-of-buffer)
     (whitespace-cleanup)
     (indent-region (point-min) (point-max) nil)
     )
   )

A few explanations on the other stuff we haven’t discussed yet. The save-excursion primitive saves cursor position so that when the function ends, we are still where we called it from. The undo-boundary makes sure that we won’t need a series of undos to undo the cleanup. beginning-of-buffer moves the cursor… to the beginning of the buffer.

Applying it to the above code snippet, we end up with:

int fonction (int random_spacing)
{
  int niaiseuses;

  for (int i=0;i<random_spacing; i++)
   {
    {
     {
      std::cout
       << bleh
       << std::endl;
     }}

   }
}

There are still a number of issues. For example, i++ still has an extraneous space before it, and we still have two closing braces on the same line. Maybe we should fix that sometime.


-1:-- Undo that mess (Post Steven Pigeon)--L0--C0--September 12, 2017 03:33 PM

Chris Wellons: Gap Buffers Are Not Optimized for Multiple Cursors

Gap buffers are a common data structure for representing a text buffer in a text editor. Emacs famously uses gap buffers — long-standing proof that gap buffers are a perfectly sufficient way to represent a text buffer.

  • Gap buffers are very easy to implement. A bare minimum implementation is about 60 lines of C.

  • Gap buffers are especially efficient for the majority of typical editing commands, which tend to be clustered in a small area.

  • Except for the gap, the content of the buffer is contiguous, making the search and display implementations simpler and more efficient. There’s also the potential for most of the gap buffer to be memory-mapped to the original file, though typical encoding and decoding operations prevent this from being realized.

  • Due to having contiguous content, saving a gap buffer is basically just two write(2) system calls. (Plus fsync(2), etc.)

A gap buffer is really a pair of buffers where one buffer holds all of the content before the cursor (or point for Emacs), and the other buffer holds the content after the cursor. When the cursor is moved through the buffer, characters are copied from one buffer to the other. Inserts and deletes close to the gap are very efficient.

Typically it’s implemented as a single large buffer, with the pre-cursor content at the beginning, the post-cursor content at the end, and the gap spanning the middle. Here’s an illustration:

The top of the animation is the display of the text content and cursor as the user would see it. The bottom is the gap buffer state, where each character is represented as a gray block, and a literal gap for the cursor.

Ignoring for a moment more complicated concerns such as undo and Unicode, a gap buffer could be represented by something as simple as the following:

struct gapbuf {
    char *buf;
    size_t total;  /* total size of buf */
    size_t front;  /* size of content before cursor */
    size_t gap;    /* size of the gap */
};

This is close to how Emacs represents it. In the structure above, the size of the content after the cursor isn’t tracked directly, but can be computed on the fly from the other three quantities. That is to say, this data structure is normalized.
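To make the structure above concrete, here is a minimal sketch of cursor movement and insertion on it. The function names (gapbuf_init, gapbuf_move, gapbuf_insert, gapbuf_backspace, gapbuf_text) are hypothetical and are not taken from Emacs or from the source code linked at the end of this post. The buffer is fixed-size for brevity, whereas a real editor would grow it when the gap fills up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gapbuf {
    char *buf;
    size_t total;  /* total size of buf */
    size_t front;  /* size of content before cursor */
    size_t gap;    /* size of the gap */
};

/* Allocate an empty buffer whose gap spans the whole allocation. */
int gapbuf_init(struct gapbuf *b, size_t capacity)
{
    b->buf = malloc(capacity);
    b->total = capacity;
    b->front = 0;
    b->gap = capacity;
    return b->buf ? 0 : -1;
}

/* Size of the content after the cursor, derived from the other fields. */
size_t gapbuf_back(const struct gapbuf *b)
{
    return b->total - b->front - b->gap;
}

/* Move the cursor (and with it the gap) to absolute position pos. */
void gapbuf_move(struct gapbuf *b, size_t pos)
{
    assert(pos <= b->front + gapbuf_back(b));
    if (pos < b->front)   /* left: shift [pos, front) to the far side of the gap */
        memmove(b->buf + pos + b->gap, b->buf + pos, b->front - pos);
    else                  /* right: pull post-gap bytes in front of the gap */
        memmove(b->buf + b->front, b->buf + b->front + b->gap, pos - b->front);
    b->front = pos;
}

/* Insert one byte at the cursor; returns -1 if the gap is full. */
int gapbuf_insert(struct gapbuf *b, char c)
{
    if (b->gap == 0)
        return -1;
    b->buf[b->front++] = c;
    b->gap--;
    return 0;
}

/* Delete the byte before the cursor, if any, by widening the gap. */
void gapbuf_backspace(struct gapbuf *b)
{
    if (b->front > 0) {
        b->front--;
        b->gap++;
    }
}

/* Copy the content (without the gap) into out, NUL-terminated.
 * out must hold at least total - gap + 1 bytes. */
void gapbuf_text(const struct gapbuf *b, char *out)
{
    size_t back = gapbuf_back(b);
    memcpy(out, b->buf, b->front);
    memcpy(out + b->front, b->buf + b->front + b->gap, back);
    out[b->front + back] = '\0';
}
```

Note that inserting and deleting at the cursor never copy any text; only gapbuf_move does, and the amount it copies is proportional to the distance moved. That is why edits clustered near the gap are cheap.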

As an optimization, the cursor could be tracked separately from the gap such that non-destructive cursor movement is essentially free. The difference between cursor and gap would only need to be reconciled for a destructive change — an insert or delete.
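One way this optimization might look is sketched below. The struct lazybuf and its functions are hypothetical names chosen for illustration, and the syncs counter exists only to make the effect observable. Moving the cursor is a plain assignment, and the gap is only brought to the cursor when an insert actually happens.

```c
#include <assert.h>
#include <string.h>

struct lazybuf {
    char *buf;
    size_t total;   /* total size of buf */
    size_t front;   /* size of content before the gap */
    size_t gap;     /* size of the gap */
    size_t cursor;  /* logical cursor position, independent of the gap */
    size_t syncs;   /* times the gap had to be relocated (illustration only) */
};

/* Moving the cursor touches no text: it is just an assignment. */
void lazybuf_set_cursor(struct lazybuf *b, size_t pos)
{
    assert(pos <= b->total - b->gap);
    b->cursor = pos;
}

/* Bring the gap to the cursor; only destructive edits call this. */
void lazybuf_sync(struct lazybuf *b)
{
    if (b->cursor == b->front)
        return;
    if (b->cursor < b->front)
        memmove(b->buf + b->cursor + b->gap, b->buf + b->cursor,
                b->front - b->cursor);
    else
        memmove(b->buf + b->front, b->buf + b->front + b->gap,
                b->cursor - b->front);
    b->front = b->cursor;
    b->syncs++;
}

/* Insert one byte at the cursor; returns -1 if the gap is full. */
int lazybuf_insert(struct lazybuf *b, char c)
{
    if (b->gap == 0)
        return -1;
    lazybuf_sync(b);
    b->buf[b->front++] = c;
    b->gap--;
    b->cursor = b->front;
    return 0;
}
```

With this layout, any number of lazybuf_set_cursor calls in a row cost nothing, and the first insert afterwards pays for a single relocation of the gap.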

A gap buffer certainly isn’t the only way to do it. For example, the original vi used an array of lines, which sort of explains some of its quirky line-oriented idioms. The BSD clone of vi, nvi, uses an entire database to represent buffers. Vim uses a fairly complex rope-like data structure with page-oriented blocks, which may be stored out-of-order in its swap file.

Multiple cursors

Multiple cursors is a fairly recent text editor invention that has gained a lot of popularity in recent years. It seems every major editor either has the feature built in or a readily-available extension. I myself used Magnar Sveen’s well-polished package for several years. Though obviously the concept didn’t originate in Emacs or else it would have been called multiple points, which doesn’t roll off the tongue quite the same way.

The concept is simple: If the same operation needs to be done in many different places in a buffer, you place a cursor at each position, then drive them all in parallel using the same commands. It’s super flashy and great for impressing all your friends.

However, as a result of improving my typing skills, I’ve come to the conclusion that multiple cursors is all hat and no cattle. It doesn’t compose well with other editing commands, it doesn’t scale up to large operations, and it’s got all sorts of flaky edge cases (off-screen cursors). Nearly anything you can do with multiple cursors, you can do better with old, well-established editing paradigms.

Somewhere around 99% of my multiple cursors usage was adding a common prefix to a contiguous series of lines. As similar brute force options, Emacs already has rectangular editing, and Vim already has visual block mode.

The most sophisticated, flexible, and robust alternative is a good old macro. You can play it back anywhere it’s needed. You can zip it across a huge buffer. The only downside is that it’s less flashy and so you’ll get invited to a slightly smaller number of parties.

But if you don’t buy my arguments about multiple cursors being tasteless, there’s still a good technical argument: Gap buffers are not designed to work well in the face of multiple cursors!

For example, suppose we have a series of function calls and we’d like to add the same set of arguments to each. It’s a classic situation for a macro or for multiple cursors. Here’s the original code:

foo();
bar();
baz();

The example is tiny so that it will fit in the animations to come. Here’s the desired code:

foo(x, y);
bar(x, y);
baz(x, y);

With multiple cursors you would place a cursor inside each set of parentheses, then type x, y. Visually it looks something like this:

Text is magically inserted in parallel in multiple places at a time. However, if this is a text editor that uses a gap buffer, the situation underneath isn’t quite so magical. The entire edit doesn’t happen at once. First the x is inserted in each location, then the comma, and so on. The edits are not clustered so nicely.

From the gap buffer’s point of view, here’s what it looks like:

For every individual character insertion the buffer has to visit each cursor in turn, performing lots of copying back and forth. The more cursors there are, the worse it gets. For an edit of length n with m cursors, that’s O(n * m) calls to memmove(3). Multiple cursors scales badly.

Compare that to the old school hacker who can’t be bothered with something as tacky and modern (eww!) as multiple cursors, instead choosing to record a macro, then play it back:

The entire edit is done locally before moving on to the next location. It’s perfectly in tune with the gap buffer’s expectations, only needing O(m) calls to memmove(3). Most of the work flows neatly into the gap.
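The difference between the two strategies can be checked with a small model. The sketch below is not a real gap buffer: it tracks only where the gap sits and counts how often it has to be relocated, on the assumption that each relocation costs one memmove(3) call and that the gap starts at position 0. All names in it (gapmodel, insert_at_cursor, multi_cursor_moves, macro_moves) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a gap buffer that ignores the text and tracks only the gap. */
struct gapmodel {
    size_t gap_pos;  /* where the gap currently sits */
    size_t moves;    /* gap relocations, i.e. modeled memmove(3) calls */
};

/* Insert one character at cursor j; it and all later cursors shift right.
 * Cursors are assumed to be sorted in ascending order. */
void insert_at_cursor(struct gapmodel *g, size_t *cursor, size_t m, size_t j)
{
    assert(j < m && (j == 0 || cursor[j - 1] < cursor[j]));
    if (g->gap_pos != cursor[j])
        g->moves++;              /* gap must be relocated first */
    g->gap_pos = cursor[j] + 1;  /* gap ends up just after the new character */
    for (size_t k = j; k < m; k++)
        cursor[k]++;
}

/* Multiple cursors: each typed character is inserted at every cursor in turn. */
size_t multi_cursor_moves(size_t *cursor, size_t m, size_t n)
{
    struct gapmodel g = {0, 0};
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            insert_at_cursor(&g, cursor, m, j);
    return g.moves;
}

/* Macro playback: the whole edit is finished at one location before the next. */
size_t macro_moves(size_t *cursor, size_t m, size_t n)
{
    struct gapmodel g = {0, 0};
    for (size_t j = 0; j < m; j++)
        for (size_t i = 0; i < n; i++)
            insert_at_cursor(&g, cursor, m, j);
    return g.moves;
}
```

For the three-line example, the cursors start at offsets 4, 11 and 18 and the inserted text x, y has four characters. Under this model multi_cursor_moves counts 12 relocations (n * m) while macro_moves counts 3 (m).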

So, don’t waste your time with multiple cursors, especially if you’re using a gap buffer text editor. Instead get more comfortable with your editor’s macro feature. If your editor doesn’t have a good macro feature, get a new editor.

If you want to make your own gap buffer animations, here’s the source code. It includes a tiny gap buffer implementation:

-1:-- Gap Buffers Are Not Optimized for Multiple Cursors (Post)--L0--C0--September 07, 2017 01:34 AM

Chen Bin (redguardtoo): Split Emacs window with certain ratio


The idea comes from yangdaweihit. Here is the implementation.

(defvar my-ratio-dict
  '((1 . 1.61803398875)
    (2 . 2)
    (3 . 3)
    (4 . 4)
    (5 . 0.61803398875))
  "The ratio dictionary.")

(defun my-split-window-horizontally (&optional ratio)
  "Split window horizontally and resize the new window.
RATIO is a key of `my-ratio-dict'.  Without it, or when it is not
a known key, split evenly.  Always focus bigger window."
  (interactive "P")
  ;; unknown keys (e.g. a plain C-u) yield nil and fall back to an even split
  (let ((ratio-val (cdr (assoc ratio my-ratio-dict))))
    (if ratio-val
        (split-window-horizontally (floor (/ (window-body-width)
                                             (1+ ratio-val))))
      (split-window-horizontally))
    ;; open another window with other-buffer
    (set-window-buffer (next-window) (other-buffer))
    ;; move focus if new window bigger than current one
    (if (or (not ratio-val)
            (>= ratio-val 1))
        (windmove-right))))

(defun my-split-window-vertically (&optional ratio)
  "Split window vertically and resize the new window.
RATIO is a key of `my-ratio-dict'.  Without it, or when it is not
a known key, split evenly.  Always focus bigger window."
  (interactive "P")
  ;; unknown keys (e.g. a plain C-u) yield nil and fall back to an even split
  (let ((ratio-val (cdr (assoc ratio my-ratio-dict))))
    (if ratio-val
        (split-window-vertically (floor (/ (window-body-height)
                                           (1+ ratio-val))))
      (split-window-vertically))
    ;; open another window with other-buffer
    (set-window-buffer (next-window) (other-buffer))
    ;; move focus if new window bigger than current one
    (if (or (not ratio-val)
            (>= ratio-val 1))
        (windmove-down))))

(global-set-key (kbd "C-x 2") 'my-split-window-vertically)
(global-set-key (kbd "C-x 3") 'my-split-window-horizontally)

Usage is simple. For example, C-x 2 is similar to the original split-window-vertically, while C-u 1 C-x 2 splits the window in the golden ratio.
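To see what the ratios do, it may help to isolate the size computation. This is only a sketch: my-split-size is a hypothetical helper that repeats the arithmetic from the functions above, and the total of 100 columns is an assumed example value.

```elisp
(defun my-split-size (total ratio-val)
  "Size left to the original window when TOTAL is split by RATIO-VAL.
The new window receives the remainder."
  (floor (/ total (1+ ratio-val))))

(my-split-size 100 1.61803398875) ; => 38, new window gets the larger part
(my-split-size 100 2)             ; => 33
(my-split-size 100 0.61803398875) ; => 61, original window stays larger
```

A ratio of at least 1 makes the new window the bigger one, which is why the functions above move focus only in that case.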

-1:-- Split Emacs window with certain ratio (Post Chen Bin)--L0--C0--September 05, 2017 01:26 PM

Raimon Grau: Everyone welcome Wilfred to the emacs hall of fame.

The recent Emacs hall of fame: Magnars, Malabarba, abo-abo... and now we have Wilfred Hughes.

Thank you all for inspiring us, each one with different styles, influences and strategies. Kudos!


-1:-- Everyone welcome Wilfred to the emacs hall of fame. (Post Raimon Grau (noreply@blogger.com))--L0--C0--August 31, 2017 06:38 PM

Wilfred Hughes: Helpful: Adding Contextual Help to Emacs

I’ve just released Helpful, a new way of getting help in Emacs!

The *Help* built into Emacs is already pretty good. Helpful goes a step further and includes lots of contextual info. Let’s take a look.

Have you ever wondered which major modes have a keybinding for a function? Helpful reports keybindings in all keymaps!

When you’re hacking on some new code, you might end up with old function aliases after renaming a function. Helpful provides discoverable debug buttons, so you don’t need to remember fmakunbound.

Helpful also has strong opinions on viewing docstrings. Summaries are given focus, and text is fontified. We solve the text-quoting-style debate by removing superfluous punctuation entirely.

Helpful will even show all the references to the symbol you’re looking at, using elisp-refs. This is great for understanding how and where a function is used.

Finally, Helpful will rifle through your Emacs instance to find source code to functions:

  • If you’ve defined a function interactively, Helpful will use edebug properties to find the source code.

  • If Emacs can only find the raw closure, Helpful will convert it back to an equivalent defun.

  • If Emacs can only find the byte-compiled files, Helpful will just pretty-print that.

I’ve just released v0.1, so there will be bugs. Please give it a try, and let me know what you think, or how we can make it even more, well, helpful!

-1:-- Helpful: Adding Contextual Help to Emacs (Post Wilfred Hughes (me@wilfred.me.uk))--L0--C0--August 30, 2017 12:00 AM

emacsninja: Parsing the Hard Way

Hello again and sorry for the long downtime! My current Emacs project is an EPUB reader, something that requires parsing XML and validating/extracting data from it. The former can be done just fine in Emacs Lisp with the help of libxml2 (or alternatively, xml.el), but for the latter there is no good solution. Typically people go for one of the following approaches:

  • Don’t parse at all and just use regular expressions on the raw XML. This works somewhat okayish if your input is predictable and doesn’t change much.
  • Parse and walk manually through the parse tree with car, cdr and assoc. Rather tedious and requires writing your own tree traversal functions for anything less than static XML.
  • Invent your own library and use a selector DSL for DOM traversal. I’ve seen a few of those, like xml+.el, enlive.el and xml-query.el, however they support relatively few features in their selectors, use their own language instead of a well-established one (such as CSS selectors or XPath) and are usually not available from a package archive for easy installation.

As I’m a big fan of APIs like Python’s lxml with the cssselect module and used the esxml package before, I decided to implement CSS selectors for it. The general strategy was to parse a CSS selector into a suitable form, do tree traversal by interpreting the parse tree and return the nodes satisfying the selector. Surprisingly enough, the hardest part of this was the parsing, so I’ll go into a bit more detail on how you’d do it properly without any dependencies.

The approach taken in esxml-query.el is recursive-descent parsing, as seen in software like GCC. Generally speaking, a language can be described by a set of rules where the left side refers to its name and the right side explains what it expands to. Expansions are sequences of other rules or constants (which naturally cannot be expanded) and may contain syntactic sugar, such as the Kleene star (as seen in regular expressions). Given an input string described by the grammar, a parser breaks it down according to its rules until it has found a valid combination. The easiest way to turn a grammar into code is by expressing it with a function for each rule, with each function being free to call others. Success and failure can be expressed by returning a piece of the parse tree, a special sentinel value (I’ve chosen to return nil if the rule wasn’t completely matched) or throwing an error, thereby terminating the computation. If all recursively called rule functions returned a bit of the parse tree, the top-level call returns the complete parse tree and the parsing attempt has been successful.

Traditionally there is an extra step before parsing the string. As it’s a bit tedious to express the terminating rules as a sequence of characters, the string is typically preprocessed by a so-called lexer into a list of tagged tokens. This is relatively simple to do in Emacs Lisp by treating the string like a buffer, finding a token that matches the current position, adding it to the list of found tokens and advancing the position until the input has been exhausted. There is one non-trivial problem though: depending on the token definitions, two different kinds of tokens can match at a given position in the input string. A simple solution is picking the longer match, which is why the tokenization in esxml--tokenize-css-selector finds all possible matches and picks the longest one.
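As an illustration of the longest-match idea, here is a minimal tokenizer sketch. It is not the code from esxml-query.el: the token table is a made-up subset, it works on the string directly rather than on a buffer, and all my- names are hypothetical.

```elisp
(defvar my-token-rules
  '((space    . "[ \t\n]+")
    (ident    . "[a-zA-Z][a-zA-Z0-9-]*")
    (tilde    . "~")
    (includes . "~=")
    (comma    . ","))
  "Hypothetical token definitions as (TYPE . REGEXP) pairs.")

(defun my-tokenize (string)
  "Split STRING into a list of (TYPE . TEXT) tokens.
When several rules match at a position, the longest match wins."
  (let ((case-fold-search nil)
        (pos 0)
        tokens)
    (while (< pos (length string))
      (let (best)
        ;; Try every rule and remember the longest match starting at POS.
        (dolist (rule my-token-rules)
          (when (eql (string-match (cdr rule) string pos) pos)
            (let ((text (match-string 0 string)))
              (when (or (null best)
                        (> (length text) (length (cdr best))))
                (setq best (cons (car rule) text))))))
        (unless best
          (error "No token matches at position %d" pos))
        (push best tokens)
        (setq pos (+ pos (length (cdr best))))))
    (nreverse tokens)))

(my-tokenize "a ~= b")
;; => ((ident . "a") (space . " ") (includes . "~=") (space . " ") (ident . "b"))
```

Both tilde and includes match at the position of the ~ character; keeping the longer text is what yields a single includes token instead of a tilde followed by an unmatched =.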

The syntactic sugar used for the official CSS grammars consists of alternation (|), grouping ([...]), optionals (?) and greedy repetition (* and +). Given the basic token operations (peek) (return first token in the stream) and (next) (pop first token in the stream), it’s straightforward to translate them to working code by using conditionals and loops. For example, the rule whitespace: SPACE* is consumed by calling (next) while (peek) returns a whitespace token. To make things easier, I’ve also introduced an (accept TYPE) helper that uses (peek) to check whether the following token matches TYPE and either consumes it and returns the value or returns nil without consuming. With it the preceding example can be shortened to (while (accept 'space)). Similarly, alternation is expressed with cond and grouping with a while where the body checks whether the grouped content could be matched.
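The following sketch shows how these helpers might fit together. It is not taken from esxml-query.el: the grammar (a comma-separated list of names) is invented for the example, tokens are assumed to be (TYPE . TEXT) pairs, and every my- name is hypothetical.

```elisp
(defvar my-token-stream nil
  "Remaining tokens, a list of (TYPE . TEXT) pairs.")

(defun my-peek ()
  "Return the first token in the stream without consuming it."
  (car my-token-stream))

(defun my-next ()
  "Pop and return the first token in the stream."
  (pop my-token-stream))

(defun my-accept (type)
  "Consume the next token and return its text if it has TYPE, else nil."
  (when (eq (car (my-peek)) type)
    (cdr (my-next))))

(defun my-parse-whitespace ()
  "Rule `whitespace: SPACE*'."
  (while (my-accept 'space)))

(defun my-parse-name ()
  "Rule `name: IDENT | HASH', alternation expressed with `cond'."
  (let (value)
    (cond
     ((setq value (my-accept 'ident)) (list 'tag value))
     ((setq value (my-accept 'hash)) (list 'id value)))))

(defun my-parse-list (tokens)
  "Rule `list: name [ COMMA name ]*', with optional whitespace.
Return the parsed names, or nil if TOKENS do not match the rule."
  (let ((my-token-stream tokens)
        names name)
    (my-parse-whitespace)
    (when (setq name (my-parse-name))
      (push name names)
      (my-parse-whitespace)
      ;; Grouping with repetition: loop while the group keeps matching.
      (while (my-accept 'comma)
        (my-parse-whitespace)
        (unless (setq name (my-parse-name))
          (error "Expected a name after the comma"))
        (push name names)
        (my-parse-whitespace))
      ;; Succeed only if the whole input has been consumed.
      (unless (my-peek)
        (nreverse names)))))

(my-parse-list '((ident . "a") (comma . ",") (space . " ") (hash . "#b")))
;; => ((tag "a") (id "#b"))
```

Returning nil for "no match" and signaling an error for "matched so far, but then went wrong" mirrors the two failure styles described above.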

This parsing strategy allows for highly flexible error reporting going beyond the “Invalid selector” errors I’ve seen previously in a browser console, as you immediately know at which place the parser fails and are free to insert code dealing with the error as you see fit. Be warned though that you must understand the grammar well enough to transform it into a more suitable, yet equivalent form if you run into rules that are hard or even impossible to express as code. Debugging isn’t too bad either: you can observe the junctions taken by your code and quickly spot the one at which it goes wrong.

I’m looking forward to venturing into parser combinators and PEGs next as they follow the same approach, but involve less code to achieve similar results.

-1:-- Parsing the Hard Way (Post Vasilij Schneidermann)--L0--C0--August 26, 2017 09:52 PM

Chris Wellons: Vim vs. Emacs: The Working Directory

Vim and Emacs have different internal models for the current working directory, and these models influence the overall workflow for each editor. They decide how files are opened, how shell commands are executed, and how the build system is operated. These effects even reach outside the editor to influence the overall structure of the project being edited.

In the traditional unix model, which was eventually adopted everywhere else, each process has a particular working directory tracked by the operating system. When a process makes a request to the operating system using a relative path — a path that doesn’t begin with a slash — the operating system uses the process’ working directory to convert the path into an absolute path. When a process forks, its child starts in the same directory. A process can change its working directory at any time using chdir(2), though most programs never need to do it. The most obvious way this system call is exposed to regular users is through the shell’s built-in cd command.

Vim’s spiritual heritage is obviously rooted in vi, one of the classic unix text editors, and the most elaborate text editor standardized by POSIX. Like vi, Vim closely follows the unix model for working directories. At any given time Vim has exactly one working directory. Shell commands that are run within Vim will start in Vim’s working directory. Like a shell, the cd ex command changes and queries Vim’s working directory.

Emacs eschews this model and instead each buffer has its own working directory tracked using a buffer-local variable, default-directory. Emacs internally simulates working directories for its buffers like an operating system, resolving relative paths into absolute ones itself, giving credence to the idea that Emacs is an operating system (“lacking only a decent editor”). Perhaps this model comes from ye olde lisp machines?
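A small sketch of what this per-buffer resolution looks like from Lisp. The helper my-resolve-in and the /tmp paths are made up for the example, and the expected results assume a POSIX-style file system; expand-file-name does not require the directories to exist.

```elisp
(defun my-resolve-in (directory relative)
  "Resolve RELATIVE the way a buffer whose directory is DIRECTORY would."
  (with-temp-buffer
    ;; Setting `default-directory' only affects this temporary buffer.
    (setq default-directory directory)
    (expand-file-name relative)))

(my-resolve-in "/tmp/project-a/" "src/main.c")
;; => "/tmp/project-a/src/main.c"
(my-resolve-in "/tmp/project-b/" "src/main.c")
;; => "/tmp/project-b/src/main.c"
```

The same relative name resolves differently depending on the buffer it is evaluated in, while the working directory of the Emacs process never changes.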

In contrast to Vim’s cd, Emacs’ M-x cd command manipulates the buffer-local variable and has no effect on the Emacs process’ working directory. In fact, Emacs completely hides its operating system working directory from Emacs Lisp. This can cause some trouble if that hidden working directory happens to be sitting on a filesystem you’d like to unmount.

Vim can be configured to simulate Emacs’ model with its autochdir option. When set, Vim will literally chdir(2) each time the user changes buffers, switches windows, etc. To the user, this feels just like Emacs’ model, but this is just a convenience, and the core working directory model is still the same.

Single instance editors

For most of my Emacs career, I’ve stuck to running a single, long-lived Emacs instance no matter how many different tasks I’m touching simultaneously. I start the Emacs daemon shortly after logging in, and it continues running until I log out — typically only when the machine is shut down. It’s common to have multiple Emacs windows (frames) for different tasks, but they’re all bound to the same daemon process.

While with care it’s possible to have a complex, rich Emacs configuration that doesn’t significantly impact Emacs’ startup time, the general consensus is that Emacs is slow to start. But since it has a really solid daemon, this doesn’t matter: hardcore Emacs users only ever start Emacs occasionally. The rest of the time they’re launching emacsclient and connecting to the daemon. Outside of system administration, it’s the most natural way to use Emacs.

The case isn’t so clear for Vim. Vim is so fast that many users fire it up on demand and exit when they’ve finished the immediate task. At the other end of the spectrum, others advocate using a single instance of Vim like running a single Emacs daemon. In my initial dive into Vim, I tried the single-instance, Emacs way of doing things. I set autochdir out of necessity and pretended each buffer had its own working directory.

At least for me, this isn’t the right way to use Vim, and it all comes down to working directories. I want Vim to be anchored at the project root with one Vim instance per project. Everything is smoother when it happens in the context of the project’s root directory, from opening files, to running shell commands (ctags in particular), to invoking the build system. With autochdir, these actions are difficult to do correctly, particularly the last two.

Invoking the build

I suspect Emacs’ model of per-buffer working directories has, in a Sapir-Whorf sort of way, been responsible for leading developers towards poorly-designed, recursive Makefiles. Without a global concept of working directory, it’s inconvenient to invoke the build system (M-x compile) in some particular grandparent directory that is the root of the project. If each directory has its own Makefile, it usually makes sense to invoke make in the same directory as the file being edited.

Over the years I’ve been reinventing the same solution to this problem, and it wasn’t until I spent time with Vim and its alternate working directory model that I truly understood the problem. Emacs itself has long had a solution lurking deep in its bowels, unseen by daylight: dominating files. The function I’m talking about is locate-dominating-file:

(locate-dominating-file FILE NAME)

Look up the directory hierarchy from FILE for a directory containing NAME. Stop at the first parent directory containing a file NAME, and return the directory. Return nil if not found. Instead of a string, NAME can also be a predicate taking one argument (a directory) and returning a non-nil value if that directory is the one for which we’re looking.

The trouble of invoking the build system at the project root is that Emacs doesn’t really have a concept of a project root. It doesn’t know where it is or how to find it. The vi model inherited by Vim is to leave the working directory at the project root. While Vim can simulate Emacs’ working directory model, Emacs cannot (currently) simulate Vim’s model.

Instead, given a file name unique to the project’s root (i.e. a “dominating” file) such as Makefile or build.xml, locate-dominating-file can discover the project root. All that’s left is wrapping M-x compile so that default-directory is temporarily adjusted to the project’s root.

That looks very roughly like this (and needs more work):

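(defun my-compile ()
  "Run make from the nearest ancestor directory containing a Makefile."
  (interactive)
  (let ((default-directory
          ;; stay in the current directory if no Makefile is found
          (or (locate-dominating-file default-directory "Makefile")
              default-directory)))
    (compile "make")))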
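Since NAME may also be a predicate, the lookup can be extended to several marker files. The following is only a sketch: my-root-markers and my-project-root are hypothetical names, and the choice of markers is an assumption.

```elisp
(defvar my-root-markers '("Makefile" "build.xml")
  "Hypothetical list of file names that mark a project root.")

(defun my-project-root (&optional dir)
  "Return the nearest ancestor of DIR containing a root marker, or nil.
DIR defaults to `default-directory'."
  (locate-dominating-file
   (or dir default-directory)
   (lambda (candidate)
     ;; CANDIDATE is a directory; accept it if any marker file exists there.
     (let (found)
       (dolist (marker my-root-markers found)
         (when (file-exists-p (expand-file-name marker candidate))
           (setq found t)))))))
```

A wrapper around compile could then bind default-directory to the result, falling back to the current directory when it is nil.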

It’s a pattern I’ve used again and again and again, working against the same old friction. By running one Vim instance per project at the project’s root, I get the correct behavior for free.

-1:-- Vim vs. Emacs: The Working Directory (Post)--L0--C0--August 22, 2017 04:51 AM

emacspeak: Emacs Start-Up: Speeding It Up


1 TL;DR:

Describes my Emacs start-up file, and what I did to speed it up from
12 seconds to under 4 seconds.

2 Overview Of Steps

  • Byte-compile start-up files.
  • Temporarily increase gc-cons-threshold during startup.
  • Load package autoloads (not packages) during start-up.
  • Use eval-after-load to advantage for post-package setup.
  • Let-bind file-name-handler-alist to nil if start-up
    is split across many files.
  • Use memoization to avoid network lookup of current location during startup.


I have a large number of elpa/melpa packages installed:

(length load-path)
400


With the above, my emacs (Emacs 26 built from Git) startup time is on
average 4 seconds. This includes starting up emacspeak (including
speech servers), as well as launching a number of project-specific
shell buffers. Given that I rarely restart Emacs, the startup time is
academic — but speeding up Emacs startup did get me to clean-up my
Emacs setup.


3 Introduction

I have now used Emacs for more than 25 years, and my Emacs start-up
file has followed the same structure through this time.


  1. The init file defines a start-up-emacs function that does the
    bulk of the work.
  2. Package-specific configuration is split up into
    <package>-prepare.el files.
  3. All of these files are byte-compiled.

As a first step, I added code to my start-up file to time the loading
of various modules.
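The timing code itself is not shown here. A minimal sketch of such a helper, assuming nothing about the actual start-up file and using the made-up name my-time-it, could look like this:

```elisp
(defmacro my-time-it (label &rest body)
  "Run BODY, report the elapsed time under LABEL, and return BODY's value."
  (declare (indent 1))
  (let ((start (make-symbol "start")))
    `(let ((,start (current-time)))
       (prog1 (progn ,@body)
         (message "%s took %.3f seconds"
                  ,label
                  (float-time (time-subtract (current-time) ,start)))))))

;; Example: time an arbitrary form; any `load' or `require' call fits here.
(my-time-it "building a list"
  (number-sequence 1 1000))
```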


4 Load Byte-Compiled Start-Up File

I keep my emacs-startup.el checked into GitHub.
My Emacs init-file is a symlink to the byte-compiled version of the
above — this is something that goes back to my time as a
grad-student at Cornell (when GitHub of course did not exist).
That is also when I originally learnt the trick of temporarily setting
gc-cons-threshold to 8MB — Emacs' default is 800K.
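One way to make the change temporary is to let-bind the variable around the expensive part, so that the previous value comes back automatically. This is a sketch with a hypothetical macro name, not the code from the start-up file:

```elisp
(defmacro my-with-relaxed-gc (&rest body)
  "Run BODY with `gc-cons-threshold' raised to 8MB, then restore it."
  (declare (indent 0))
  `(let ((gc-cons-threshold (* 8 1024 1024)))
     ,@body))

;; Inside the macro the threshold is 8MB; outside it is unchanged.
(my-with-relaxed-gc
  gc-cons-threshold) ; => 8388608
```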


5 Package Autoloads And eval-after-load

Over time, some of the package-specific setup files had come to
directly load packages — it just made it easier to do
package-specific setup at the time. As part of the cleanup, I updated
these to strictly load package-autoload files and wrapped post-package
setup code in eval-after-load — this is effectively the same as
using use-package.
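The pattern looks roughly as follows. The package, command and variable names are all hypothetical, and the sketch uses with-eval-after-load, the macro form of eval-after-load available since Emacs 24.4:

```elisp
(defvar my-demo-configured nil
  "Non-nil once the hypothetical package has been configured.")

;; Cheap at start-up: this only records that `my-demo-command' lives in
;; the (hypothetical) library "my-demo-package"; nothing is loaded yet.
(autoload 'my-demo-command "my-demo-package" "A hypothetical command." t)

;; Deferred: this body runs only once `my-demo-package' has been loaded.
(with-eval-after-load 'my-demo-package
  (setq my-demo-configured t))
```

Start-up then pays only for registering the autoload and the deferred form; the package and its configuration are loaded on first use.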



6 Loading Files Faster

Emacs has an extremely flexible mechanism for loading files — this
means you can load compressed, encrypted or remote files without
having to worry about it. That flexibility comes at a cost — if you
are sure you don't need this flexibility during start-up, then locally
binding file-name-handler-alist to nil is a big win — in my
case, it sped things up by 50%.
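A minimal sketch of that binding, with a made-up function name. Note the trade-off described above: files that need a handler (compressed, encrypted or remote ones) cannot be loaded this way.

```elisp
(defun my-load-quickly (file)
  "Load FILE with `file-name-handler-alist' bound to nil for the duration.
The previous value is restored when the load finishes."
  (let ((file-name-handler-alist nil))
    (load file nil 'nomessage)))
```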


7 Avoid Network Calls During Start-Up

In my case, I set calendar-latitude and calendar-longitude by
geocoding my address — geocoding is done by calling the Google Maps
API. The geocoding API is fast enough that you normally don't notice
it — but it was adding anywhere from 1–3 seconds during
startup. Since my address doesn't change that often, I updated module
gmaps to use a memoized version. My address is set via Customize,
and the geocoded lat/long is saved to disk automatically.
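The idea can be sketched as follows. This is not the gmaps module: the names are hypothetical, the lookup function is passed in as a parameter so that no particular geocoding API is assumed, and the cache lives only in memory, whereas the real setup saves the result to disk.

```elisp
(defvar my-geocode-cache nil
  "Hypothetical alist mapping an address to its (LATITUDE . LONGITUDE).")

(defun my-geocode-memoized (address lookup)
  "Return the location of ADDRESS, calling LOOKUP only on a cache miss.
LOOKUP is a function taking ADDRESS and returning (LATITUDE . LONGITUDE)."
  (let ((hit (assoc address my-geocode-cache)))
    (if hit
        (cdr hit)
      (let ((location (funcall lookup address)))
        (push (cons address location) my-geocode-cache)
        location))))
```

As long as the address stays the same, only the first call pays for the network round trip.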





8 References

  1. Emacs Speed What got it all started.
  2. file-name-handler-alist The article that gave me the most useful
    tip of them all.



-1:-- Emacs Start-Up: Speeding It Up (Post T. V. Raman (noreply@blogger.com))--L0--C0--August 21, 2017 08:01 PM

sachachua: 2017-08-21 Emacs news

Links from reddit.com/r/emacs, /r/orgmode, /r/spacemacs, Hacker News, planet.emacsen.org, YouTube, the changes to the Emacs NEWS file, and emacs-devel.

Past Emacs News round-ups

-1:-- 2017-08-21 Emacs news (Post Sacha Chua)--L0--C0--August 21, 2017 07:16 AM

sachachua: 2017-08-14 Emacs news

Links from reddit.com/r/emacs, /r/orgmode, /r/spacemacs, Hacker News, planet.emacsen.org, YouTube, the changes to the Emacs NEWS file, and emacs-devel.

Past Emacs News round-ups

-1:-- 2017-08-14 Emacs news (Post Sacha Chua)--L0--C0--August 14, 2017 06:33 AM

emacshorrors: make-temp-name

Update: Reddit points out that this has been fixed on master by replacing most of the code with a call to gnulib’s gen_tempname.

For someone not terribly experienced in writing safe programs, one can only hope that building blocks like make-temp-file are doing the right thing and cannot be subverted by a malicious third party. The general advice here is that it’s preferable to use the primitive for creating the temporary file instead of the primitive to generate its name. Now, does Emacs reuse mkstemp(3) for this? Or at least tmpnam(3)? Of course not! Where we go, we can just invent our own source of randomness:

The C implementation behind make-temp-name looks as follows:

static const char make_temp_name_tbl[64] =
{
  'A','B','C','D','E','F','G','H',
  'I','J','K','L','M','N','O','P',
  'Q','R','S','T','U','V','W','X',
  'Y','Z','a','b','c','d','e','f',
  'g','h','i','j','k','l','m','n',
  'o','p','q','r','s','t','u','v',
  'w','x','y','z','0','1','2','3',
  '4','5','6','7','8','9','-','_'
};

static unsigned make_temp_name_count, make_temp_name_count_initialized_p;

/* Value is a temporary file name starting with PREFIX, a string.

   The Emacs process number forms part of the result, so there is
   no danger of generating a name being used by another process.
   In addition, this function makes an attempt to choose a name
   which has no existing file.  To make this work, PREFIX should be
   an absolute file name.

   BASE64_P means add the pid as 3 characters in base64
   encoding.  In this case, 6 characters will be added to PREFIX to
   form the file name.  Otherwise, if Emacs is running on a system
   with long file names, add the pid as a decimal number.

   This function signals an error if no unique file name could be
   generated.  */

Lisp_Object
make_temp_name (Lisp_Object prefix, bool base64_p)
{
  Lisp_Object val, encoded_prefix;
  ptrdiff_t len;
  printmax_t pid;
  char *p, *data;
  char pidbuf[INT_BUFSIZE_BOUND (printmax_t)];
  int pidlen;

  CHECK_STRING (prefix);

  /* VAL is created by adding 6 characters to PREFIX.  The first
     three are the PID of this process, in base 64, and the second
     three are incremented if the file already exists.  This ensures
     262144 unique file names per PID per PREFIX.  */

  pid = getpid ();

  if (base64_p)
    {
      pidbuf[0] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[1] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[2] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidlen = 3;
    }
  else
    {
#ifdef HAVE_LONG_FILE_NAMES
      pidlen = sprintf (pidbuf, "%"pMd, pid);
#else
      pidbuf[0] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[1] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidbuf[2] = make_temp_name_tbl[pid & 63], pid >>= 6;
      pidlen = 3;
#endif
    }

  encoded_prefix = ENCODE_FILE (prefix);
  len = SBYTES (encoded_prefix);
  val = make_uninit_string (len + 3 + pidlen);
  data = SSDATA (val);
  memcpy (data, SSDATA (encoded_prefix), len);
  p = data + len;

  memcpy (p, pidbuf, pidlen);
  p += pidlen;

  /* Here we try to minimize useless stat'ing when this function is
     invoked many times successively with the same PREFIX.  We achieve
     this by initializing count to a random value, and incrementing it
     afterwards.

     We don't want make-temp-name to be called while dumping,
     because then make_temp_name_count_initialized_p would get set
     and then make_temp_name_count would not be set when Emacs starts.  */

  if (!make_temp_name_count_initialized_p)
    {
      make_temp_name_count = time (NULL);
      make_temp_name_count_initialized_p = 1;
    }

  while (1)
    {
      unsigned num = make_temp_name_count;

      p[0] = make_temp_name_tbl[num & 63], num >>= 6;
      p[1] = make_temp_name_tbl[num & 63], num >>= 6;
      p[2] = make_temp_name_tbl[num & 63], num >>= 6;

      /* Poor man's congruential RN generator.  Replace with
         ++make_temp_name_count for debugging.  */
      make_temp_name_count += 25229;
      make_temp_name_count %= 225307;

      if (!check_existing (data))
        {
          /* We want to return only if errno is ENOENT.  */
          if (errno == ENOENT)
            return DECODE_FILE (val);
          else
            /* The error here is dubious, but there is little else we
               can do.  The alternatives are to return nil, which is
               as bad as (and in many cases worse than) throwing the
               error, or to ignore the error, which will likely result
               in looping through 225307 stat's, which is not only
               dog-slow, but also useless since eventually nil would
               have to be returned anyway.  */
            report_file_error ("Cannot create temporary name for prefix",
                               prefix);
          /* not reached */
        }
    }
}

DEFUN ("make-temp-name", Fmake_temp_name, Smake_temp_name, 1, 1, 0,
       doc: /* Generate temporary file name (string) starting with PREFIX (a string).
The Emacs process number forms part of the result, so there is no
danger of generating a name being used by another Emacs process
\(so long as only a single host can access the containing directory...).

This function tries to choose a name that has no existing file.
For this to work, PREFIX should be an absolute file name.

There is a race condition between calling `make-temp-name' and creating the
file, which opens all kinds of security holes.  For that reason, you should
normally use `make-temp-file' instead.  */)
  (Lisp_Object prefix)
{
  return make_temp_name (prefix, 0);
}

The generated file name is therefore a combination of the prefix, the Emacs PID and three characters from the above table. This makes about 200,000 possible temporary files that can be generated with a given prefix in an Emacs session. This range can be traversed in a negligible amount of time to recreate the state of the RNG and accurately predict the next temporary file name.
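To illustrate how little work the prediction takes, here is a sketch that mirrors the counter arithmetic of the C code in Emacs Lisp. It assumes the observed suffix is not the very first one of the session (the first name encodes only the low 18 bits of the time() seed) and that no existing-file retry happened in between. The my- names are made up.

```elisp
(require 'cl-lib)

(defconst my-temp-name-table
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
  "The 64 characters of make_temp_name_tbl, in the same order.")

(defun my-temp-suffix-encode (num)
  "Encode the low 18 bits of NUM as three characters, lowest bits first."
  (let (chars)
    (dotimes (_ 3)
      (push (aref my-temp-name-table (logand num 63)) chars)
      (setq num (ash num -6)))
    (apply #'string (nreverse chars))))

(defun my-temp-suffix-decode (suffix)
  "Inverse of `my-temp-suffix-encode' for a three-character SUFFIX."
  (let ((num 0))
    (dotimes (i 3 num)
      (setq num (+ num (ash (cl-position (aref suffix i) my-temp-name-table)
                            (* 6 i)))))))

(defun my-temp-suffix-next (suffix)
  "Predict the suffix following SUFFIX, using the C code's update step."
  (my-temp-suffix-encode
   (% (+ (my-temp-suffix-decode suffix) 25229) 225307)))

(my-temp-suffix-next "AAA") ; => "NKG"
```

Under these assumptions no search is needed at all: after the first update the counter is always below 225307, so it fits into the 18 bits that the three characters encode and can be read directly off an observed file name.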

make-temp-file, in turn, looks as follows:

(defun make-temp-file (prefix &optional dir-flag suffix)
  "Create a temporary file.
The returned file name (created by appending some random characters at the end
of PREFIX, and expanding against `temporary-file-directory' if necessary),
is guaranteed to point to a newly created empty file.
You can then use `write-region' to write new data into the file.

If DIR-FLAG is non-nil, create a new empty directory instead of a file.

If SUFFIX is non-nil, add that at the end of the file name."
  ;; Create temp files with strict access rights.  It's easy to
  ;; loosen them later, whereas it's impossible to close the
  ;; time-window of loose permissions otherwise.
  (with-file-modes ?\700
    (let (file)
      (while (condition-case ()
                 (progn
                   (setq file
                         (make-temp-name
                          (if (zerop (length prefix))
                              (file-name-as-directory
                               temporary-file-directory)
                            (expand-file-name prefix
                                              temporary-file-directory))))
                   (if suffix
                       (setq file (concat file suffix)))
                   (if dir-flag
                       (make-directory file)
                     (write-region "" nil file nil 'silent nil 'excl))
                   nil)
               (file-already-exists t))
        ;; the file was somehow created by someone else between
        ;; `make-temp-name' and `write-region', let's try again.
        nil)
      file)))

It’s interesting that the docstring of this function states that the return value “is guaranteed to point to a newly created empty file”. If there were to exist a file for every possible combination for a prefix, this function would just fall into an infinite loop and block Emacs for no apparent reason. Both of these issues have been solved in a better way in glibc.

At least the impact of predicting the name is lessened if one uses make-temp-file instead of make-temp-name on its own. An attacker cannot create a symlink pointing to a rogue location with the predicted name as that would trigger a file-already-exists error and make the function use the next random name. All they could do is read out the file afterwards iff they have the same permission as the user Emacs runs with. A symlink attack can only be executed successfully with a careless make-temp-name user. Thankfully, I’ve not been able to find one worth subverting on GitHub yet.
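For completeness, a sketch of the recommended usage: let make-temp-file create the file, then work with the returned name. The helper name and the "my-demo-" prefix are made up.

```elisp
(defun my-call-with-temp-file (function)
  "Call FUNCTION with the name of a fresh temporary file, then delete it.
The file is created by `make-temp-file', so it already exists when
FUNCTION runs.  Return whatever FUNCTION returns."
  (let ((file (make-temp-file "my-demo-")))
    (unwind-protect
        (funcall function file)
      (delete-file file))))

;; Example: write some text into the file and read its size back.
(my-call-with-temp-file
 (lambda (file)
   (write-region "hello" nil file nil 'silent)
   (nth 7 (file-attributes file)))) ; => 5
```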

Thanks to dale on #emacs for bringing this to my attention!

-1:-- make-temp-name (Post Vasilij Schneidermann)--L0--C0--August 13, 2017 06:37 PM