Emacs align-regexp explained in detail with examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Emacs comes with powerful and useful functions to align text to a specific column, by regexp.
M-x align is a simplified relative of align-regexp that applies predefined rules for the current major mode. It does not always work well, so we focus on align-regexp here.
== Example ==
Take the following poorly aligned code as an example.
    #define A 0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE 0xabcde
We want it aligned as follows:
    #define A         0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE     0xabcde
== Howto ==
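0. Recommended settings before you start:
    (setq align-to-tab-stop nil)
Evaluate this with C-x C-e if you want columns placed exactly where the text requires instead of being rounded up to tab stops.
    (defalias 'ar 'align-regexp)
Evaluate this with C-x C-e so that M-x ar runs align-regexp with less typing.
Or just put both forms in your Emacs init file.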
1. Select the target lines.
2. C-u M-x align-regexp RET, then enter at the prompts: \(\s-*\)␣[A-Z], 1, 0, n
3. Select the lines again, then C-u M-x align-regexp RET and enter at the prompts: \(\s-*\)␣0, 1, 0, n
Or do the previous two steps with a single command: C-u M-x align-regexp RET, then enter at the prompts: \(\s-*\)␣[A-Z0], 1, 0, y
An even simpler one: C-u M-x align-regexp RET, then enter at the prompts: \(\s-*\)␣, 1, 0, y
Note: ␣ means exactly one space.
== Explanation ==
First let's look at the signature of align-regexp, as shown by C-h f align-regexp RET:
    (align-regexp BEG END REGEXP &optional GROUP SPACING REPEAT)
When invoked interactively with the C-u prefix, this function reads four arguments from the minibuffer. For the input \(\s-*\)␣[A-Z], 1, 0, n they are:
- REGEXP: \(\s-*\)␣[A-Z]
- GROUP: 1
- SPACING: 0
- REPEAT: n
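The mapping from the four minibuffer answers to the Lisp arguments can be sketched as a small command. This is only an illustration: my-align-define-names is a hypothetical name, not part of Emacs, and the region bounds supply BEG and END.

```elisp
(require 'align)

(defun my-align-define-names (beg end)
  "Align macro names in the region from BEG to END.
A hypothetical helper that replays step 2 of the Howto from Lisp."
  (interactive "r")
  ;; Interactive answer  ->  Lisp argument
  ;;   \(\s-*\)␣[A-Z]    ->  REGEXP (backslashes are doubled inside a Lisp string)
  ;;   1                 ->  GROUP
  ;;   0                 ->  SPACING
  ;;   n                 ->  REPEAT = nil
  (align-regexp beg end "\\(\\s-*\\) [A-Z]" 1 0 nil))
```

After evaluating the form, select the target lines and run M-x my-align-define-names.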
REGEXP describes the field to align on: it identifies the character that is to be aligned. Read on to find out how it works.
In our example, the pattern \(\s-*\)␣[A-Z] matches, on each line, the whitespace after #define together with the first uppercase letter of the name:
    #define A 0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE 0xabcde
The subexpression \(\s-*\) is entered automatically by Emacs when we invoke align-regexp with the prefix C-u. The text matched inside \(\) is where characters are inserted or deleted to achieve the alignment, and this is usually a whitespace field. \s- means a whitespace character (space or tab), the same as \s␣, and the same as ␣ if there are no tabs. \s-* means zero or more whitespace characters.
So \(\s-*\) matches the whitespace field (every whitespace character before the name except the last space), and ␣[A-Z] matches that last space along with an uppercase letter.
Now let's talk about alignment. To align means to put the first character right after the \(\) subexpression at the same column on every line.
In our case, this character is the space before the uppercase letter, so these spaces on the different lines are what gets aligned.
The second argument, GROUP, is the parenthesis group to modify. When invoked interactively there is usually exactly one parenthesis group \(\), so just keep the default value 1. In some cases we need this to be -1; we will come back to that when we do a right-alignment.
The third argument, SPACING, is the number of spaces we want in the whitespace field. In our case we want all the spaces in the parenthesis group deleted, so we give 0.
The fourth argument, REPEAT, says whether to repeat the match and alignment throughout the line. If we need exactly one match and alignment per line, we answer n. See the next section for more on REPEAT.
== Get a deep understanding of \(\) ==
Let's get a deeper understanding of \(\). Use the following variant of the REGEXP pattern to get the same alignment done: C-u M-x align-regexp RET, then enter at the prompts: \(\s-+\)[A-Z0], 1, 1, y
Note there is no space between ) and [ this time.
\s-+ means one or more whitespace characters, which equals \s-*␣ if there are no tabs. This REGEXP matches each run of whitespace together with the uppercase letter or 0 that follows it:
    #define A 0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE 0xabcde
The difference lies in the whitespace field that is matched and the character chosen for alignment. All the spaces are now inside the \(\) group, and we need one space in the final result, so the third argument SPACING needs to be 1 instead of 0. If you need more spaces, just provide the value you want. The REGEXP matches two columns and both need to be aligned, so REPEAT is y.
Once we understand how align works, the REGEXP can be simplified: C-u M-x align-regexp RET, then enter at the prompts: \(\s-+\), 1, 1, y
This means: align whatever character comes immediately after each run of whitespace.
The regexp matches as follows. Notice the subtle difference in the matched part: only the whitespace runs are matched now, not the character after them.
    #define A 0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE 0xabcde
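If this simplified pattern is used often, it could be wrapped in a command such as the sketch below. The name my-align-whitespace-columns is hypothetical, and the two let-bindings are an assumption about what most users want: indent-tabs-mode is bound to nil so that padding is done with spaces, and align-to-tab-stop is bound to nil as recommended in the Howto.

```elisp
(require 'align)

(defun my-align-whitespace-columns (beg end)
  "Align every whitespace-separated column in the region from BEG to END.
A hypothetical helper; the name is not part of Emacs."
  (interactive "r")
  (let ((indent-tabs-mode nil)    ; intended to pad with spaces, not tabs
        (align-to-tab-stop nil))  ; do not round columns up to tab stops
    ;; REGEXP = \(\s-+\), GROUP = 1, SPACING = 1, REPEAT = t (the "y" answer).
    (align-regexp beg end "\\(\\s-+\\)" 1 1 t)))
```

Select the lines and run M-x my-align-whitespace-columns. Because the bindings are local to the call, the global settings stay untouched.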
== Align to the right ==
Now let's talk about alignment to the right side. Say we need the following alignment.
    #define         A 0x41
    #define WHAT_EVER 0x1BADB002
    #define     ABCDE 0xabcde
We can do it like this: C-u M-x align-regexp RET, then enter at the prompts: \(\s-+[A-Z_]+\), -1, 1, n
\s-+ means one or more whitespace characters. [A-Z_]+ matches uppercase names such as UPPER and UPPER_WORD.
So on each line this regexp matches the whitespace after #define together with the whole name:
    #define A 0x41
    #define WHAT_EVER 0x1BADB002
    #define ABCDE 0xabcde
Now we set the argument GROUP to -1, which means justify. According to the documentation in the source code, justify means that non-whitespace characters in the group are NOT deleted; only the whitespace at the beginning of the group is inserted or deleted. The character after the last uppercase letter is chosen as the alignment character (the character right after \(\)), i.e. the space that follows the name. With GROUP = -1, spaces are inserted or deleted to the left of the first uppercase letter to achieve the alignment. In this way we get right-alignment.
We can match more characters before \(\), e.g. .*␣\(\s-*[A-Z_]+\), but this is not necessary in our case. If the REGEXP matches more fields than the ones we are targeting, it should be extended to match more characters so that it can tell the fields apart.
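The right-alignment above can also be requested from Lisp by passing a negative GROUP. The following is a minimal sketch with a hypothetical command name, my-right-align-define-names; it uses the same regexp and values as the interactive example.

```elisp
(require 'align)

(defun my-right-align-define-names (beg end)
  "Right-align the uppercase macro names in the region from BEG to END.
A hypothetical helper; the name is not part of Emacs."
  (interactive "r")
  ;; GROUP = -1 selects group 1 in justify mode: only the whitespace at
  ;; the start of the group is adjusted, the name itself is left alone.
  ;; SPACING = 1, REPEAT = nil (the "n" answer).
  (align-regexp beg end "\\(\\s-+[A-Z_]+\\)" -1 1 nil))
```

Select the #define lines and run M-x my-right-align-define-names.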
Afterwards, use C-u M-x align-regexp RET with \(\s-*\)0x, 1, 4, n to left-align the 0x part, or C-u M-x align-regexp RET with \(\s-*0x\), -1, 4, n to right-align the 0x part.
== More practice ==
①
    struct stat64 {
        unsigned long long st_dev; /* Device. */
        unsigned long long st_ino; /* File serial number. */
        unsigned int st_mode; /* File mode. */
        unsigned int st_nlink; /* Link count. */
        unsigned int st_uid; /* User ID of the file's owner. */
        unsigned int st_gid; /* Group ID of the file's group. */
        unsigned long long st_rdev; /* Device number, if device. */
        unsigned long long __pad1;
        long long st_size; /* Size of file, in bytes. */
        int st_blksize; /* Optimal block size for I/O. */
        int __pad2;
        long long st_blocks; /* Number 512-byte blocks allocated. */
        int st_atime; /* Time of last access. */
        unsigned int st_atime_nsec;
        int st_mtime; /* Time of last modification. */
        unsigned int st_mtime_nsec;
        int st_ctime; /* Time of last status change. */
        unsigned int st_ctime_nsec;
        unsigned int __unused4;
        unsigned int __unused5;
    };
Mark all the lines inside {} before running align-regexp. (Tip: if you use Evil, vi{; if you use expand-region, M-x er/mark-inside-pairs.)
C-u M-x align-regexp RET, then enter at the prompts: \(\s-*\)␣[s_], 1, 0, n
C-u M-x align-regexp RET, then enter at the prompts: \(\s-*\)␣/, 1, 4, n
    struct stat64 {
        unsigned long long st_dev;         /* Device. */
        unsigned long long st_ino;         /* File serial number. */
        unsigned int       st_mode;        /* File mode. */
        unsigned int       st_nlink;       /* Link count. */
        unsigned int       st_uid;         /* User ID of the file's owner. */
        unsigned int       st_gid;         /* Group ID of the file's group. */
        unsigned long long st_rdev;        /* Device number, if device. */
        unsigned long long __pad1;
        long long          st_size;        /* Size of file, in bytes. */
        int                st_blksize;     /* Optimal block size for I/O. */
        int                __pad2;
        long long          st_blocks;      /* Number 512-byte blocks allocated. */
        int                st_atime;       /* Time of last access. */
        unsigned int       st_atime_nsec;
        int                st_mtime;       /* Time of last modification. */
        unsigned int       st_mtime_nsec;
        int                st_ctime;       /* Time of last status change. */
        unsigned int       st_ctime_nsec;
        unsigned int       __unused4;
        unsigned int       __unused5;
    };
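Both passes of this exercise can be chained in one command. The sketch below is an illustration under assumptions: my-align-struct-members is a hypothetical name, and a marker is used for the end of the region because the first pass inserts spaces, which would leave a plain integer END pointing too early for the second pass.

```elisp
(require 'align)

(defun my-align-struct-members (beg end)
  "Align member names, then trailing comments, between BEG and END.
A hypothetical helper; the name is not part of Emacs."
  (interactive "r")
  ;; A marker moves along with the text, so it still marks the end of
  ;; the region after the first pass has inserted padding.
  (let ((end-marker (copy-marker end)))
    ;; Pass 1: \(\s-*\)␣[s_], GROUP 1, SPACING 0, no repeat.
    (align-regexp beg end "\\(\\s-*\\) [s_]" 1 0 nil)
    ;; Pass 2: \(\s-*\)␣/, GROUP 1, SPACING 4, no repeat.
    (align-regexp beg (marker-position end-marker) "\\(\\s-*\\) /" 1 4 nil)
    ;; Detach the marker so it no longer has to be updated.
    (set-marker end-marker nil)))
```

Mark the lines inside {} and run M-x my-align-struct-members.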
--------------------------------------------------------------------------------
②
    my @primes = (
        1,2,3,5,7,
        11,13,17,19,23,
        29,31,37,41,43,
    );
C-u M-x align-regexp RET, then enter at the prompts: ,\(\s-*\)[0-9], 1, 1, y
    my @primes = (
        1,  2,  3,  5,  7,
        11, 13, 17, 19, 23,
        29, 31, 37, 41, 43,
    );
C-u M-x align-regexp RET, then enter at the prompts: \([0-9]+,\), -1, 1, y
    my @primes = (
         1,  2,  3,  5,  7,
        11, 13, 17, 19, 23,
        29, 31, 37, 41, 43,
    );
--------------------------------------------------------------------------------
③
    California 423,970 km²
    Taiwan 36,008 km²
    Japan 377,944 km²
    Germany 357,021 km²
    Iraq 438,317 km²
    Iran 1,648,195 km²
    Korea (North+South) 219,140 km²
    Mexico 1,964,375 km²
C-u M-x align-regexp RET, then enter at the prompts: \(\s-*␣[0-9,]+\), -1, 1, n
Or C-u M-x align-regexp RET, then enter at the prompts: \(\s-+[[:digit:],]+\), -1, 1, n
Or C-u M-x align-regexp RET, then enter at the prompts: .*␣\(\s-*[0-9,]+\), -1, 0, n
Or C-u M-x align-regexp RET, then enter at the prompts: .*\(\s-*␣[0-9,]+\s-*\).*, -1, 1, n
    California            423,970 km²
    Taiwan                 36,008 km²
    Japan                 377,944 km²
    Germany               357,021 km²
    Iraq                  438,317 km²
    Iran                1,648,195 km²
    Korea (North+South)   219,140 km²
    Mexico              1,964,375 km²