Typesetting - More Lua Scripting

More Lua Scripting

I keep hearing people say "I need to learn some Lua" or "I need to learn to write automation scripts", and not many seem to really have gotten into it. This should help get you started. You should read lyger's guide first, because I'm not gonna explain the same things again, but I want to provide some more practical tips. Rather than explaining Lua itself, I'll explain more about scripts for Aegisub specifically.

Learning Lua is not much of an issue. You can learn all the Lua stuff you need in an hour. It's mostly just if/then/end, the for cycle, gsub, and a few other things.

A large part of what you need is regular expressions, or rather Lua's simplified pattern matching. That, again, is something you can learn in an hour.

What I want to talk about the most is how to work with the Subtitles object, which is not really a matter of Lua, but rather of Aegisub and the ASS format. This is explained in the Aegisub manual, but since that may be confusing for beginners, I'll provide some specific practical examples. The goal is to explain how to write a basic automation script in as simple terms as possible. Once you understand how a script that adds blur works, adding more complex functions will be easy because that's just maths.

Here's the very basics:

script_name="test"
script_description="testing stuff"
script_author="some guy"
script_version="1"

function test(subs, sel, act)
	-- stuff goes here
end

aegisub.register_macro(script_name, script_description, test)

The last line puts an entry in your automation menu. It has 3 parts. 2 of them are defined at the beginning - script_name and script_description. The name will appear in the menu and as an undo entry when you run the script. You can see the description in the Automation Manager. The 3rd part means that running this script will run a function called "test".

script_author and script_version aren't really important, but I'm sure you get the idea.

Let's look at function test(subs, sel, act). I probably wrote at least 20 scripts before I actually understood what this is. Since this function is referenced by register_macro, it's the main function of the script, and as such is by default given the Subtitles object to work with. The 3 parts — subtitles, selected lines, and active line — give you 3 things you can work with.

You can name them whatever you want. You just have to stick to the naming. I tend to keep everything short, though I'm sure I'm not the only one who uses subs/sel/act. It's probably best to use these even just because of the fact that others do it too, which makes it easier to make sense of each others' scripts.

subs is the whole subtitles object. You always have to use this. In simple terms, it's like a table of all lines in the ASS script, including headers, styles, etc.

sel is selected lines, and if you want your function applied to all lines, you don't have to use this. You can have function test(subs).

act is the active line, and you probably won't need it very often. You can use it for functions that are supposed to run on only one line or read some info from the active line. If you select one line and use sel, it's pretty much the same as using act.

The "stuff goes here" part is where the actual function will be written.

Here's an example of a simple function that runs on the whole script:

function test(subs)
    for i=1,#subs do
        if subs[i].class=="dialogue" then
            local line=subs[i]
            local text=subs[i].text

            line.effect="test"

            line.text=text
            subs[i]=line
        end
    end
    aegisub.set_undo_point(script_name)
end

The for loop, the class check, and the lines that copy subs[i] into line and write it back are what you'll usually have in every script that runs on all lines.
The line.effect="test" line is the actual specific function.

#subs is how many lines there are in subs (including headers and all). If the ASS file has 200 lines, the for cycle will run 200 times. You only want to apply this to dialogue lines, not to styles or headers, so you have to specify this condition:
if subs[i].class=="dialogue".

So, the iterator i is going from 1 to 200, so when it's let's say 25, subs[i] is subs[25], or the 25th line in the ASS file. line=subs[i] means that you create element line and put subs[i] into it. Note that single = does not mean "equals". You could read it as "line is now subs[25]" (when i is 25). Then you work with line, and for it to be of any use, you have to put the line back in subs[i] at the end. line is something you created, subs[i] is the actual line in the subtitles, so you need the subs[i]=line at the end.

You see the same with text, even though in this case I don't need it, but usually you work with text the most. The purpose is to use something that's short instead of typing subs[i].text all the time. Also, it could also say text=line.text since line is already defined at that point. You can name those things anything you want, for example just l and t, which may be good for a short script, but again, line and text are commonly used by most of us, so it keeps things clear.

aegisub.set_undo_point(script_name) sets the undo point, and should be at the end of the main function, though I think Aegisub does it automatically anyway. You can, however, create multiple undo points, like for every function in your script, but it's usually only confusing and not very practical.

Now, the actual thing this script does is line.effect="test". line.effect is the effect field, and here it takes the value "test", which means the text of the effect field will be "test". So what this script does is that it puts "test" in the effect field of every dialogue line.

Now, the thing I did here with text would have made more sense if I'd done it with "effect" instead (because I didn't actually do anything with text), i.e. effect=line.effect. Then the line that does the work could be just effect="test". You have to always think about what pays off and what doesn't. For this script, that line would be 5 characters shorter, but you would need two extra lines, to assign value to effect and to give the value back to line.effect, so that doesn't pay off. If you use something only once, you might as well keep it as is. The more often you use something, the more sense it makes to assign it to something with a short name.
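If you want to try the loop outside Aegisub, here's a minimal sketch that uses a plain Lua table as a stand-in for subs. The names fake_subs and mark_dialogue are made up for this example, and the fake table only imitates the class, text, and effect fields, not the real Subtitles object.

```lua
-- fake_subs is a hypothetical stand-in for the Subtitles object (assumption, not the real API)
function fake_subs()
    return {
        {class="info"},
        {class="style"},
        {class="dialogue", text="First line.", effect=""},
        {class="dialogue", text="Second line.", effect=""},
    }
end

-- same loop as above: only dialogue lines get the effect field filled in
function mark_dialogue(subs)
    for i=1,#subs do
        if subs[i].class=="dialogue" then
            local line=subs[i]
            line.effect="test"
            subs[i]=line
        end
    end
    return subs
end

-- print class and effect of every entry to see which ones were touched
local demo=mark_dialogue(fake_subs())
for i=1,#demo do
    print(i, demo[i].class, demo[i].effect)
end
```

The header and style entries are skipped by the class check, so only the two dialogue entries end up with "test" in effect.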

Let's look at working with selected lines now.

function test(subs, sel)
    for x, i in ipairs(sel) do
        local line=subs[i]
        local text=line.text

        if text:match("you're tie is crooked") then line.effect="the editor is an idiot" end

        line.text=text
        subs[i]=line
    end
    aegisub.set_undo_point(script_name)
    return sel
end

I'm too lazy to do high-effort html, but you can paste the code into Notepad++ for proper syntax highlighting. You can see the for loop is using ipairs. sel is a list of line numbers, and ipairs goes through it giving you two numbers each time. The first is the position within the selection, and the second is the index in subs. If the ASS file has 50 lines and you select the last 3, then the x in ipairs will be 1, 2, and 3, and i will be 48, 49, and 50. In the previous example there was only i, because the loop went through all lines.

Don't forget that the function must have (subs, sel). Of course you can always include the sel even if you're not using it, just to be sure that you never forget it. I pretty much always use (subs, sel) and in rare cases add act.

The if line in the middle is a basic example of an action dependent on a condition. You can read it as:
If you find "you're tie is crooked" in text, put "the editor is an idiot" in effect.

return sel makes sure that you keep the selection you started with (or new selection if you changed it).

You could also use for i=1,#sel do instead of ipairs, like we did with subs. If your script is deleting or adding lines, you need to go backwards, because the new/deleted lines are changing the index of the selected lines. If you delete line 1, line 2 becomes line 1 and line 3 becomes line 2, so going in the normal direction you'd either be skipping lines or going through them twice.

    for i=#sel,1,-1 do
        local line=subs[sel[i]]

        -- ...

        subs[sel[i]]=line
    end

This is what I use. It starts at the last selected line and goes backwards to the first. i=a,b,c means it goes from a to b by steps of c. i=8,2,-2 would go through lines 8, 6, 4, 2. The default for steps is 1, so unless you go backwards like here, you don't need to write it.

An important thing is that if you use this, then line is subs[sel[i]], not subs[i]. Here i is the number of the selected line, starting from 1, so if you used subs[i] when i is 1, you'd have the first line in the ASS file, not the first selected line. sel[3] is the number in subs corresponding to the 3rd selected line.

This thing kept confusing me for quite a while, so let's try a more specific example. Let's say subs has 50 lines (including headers and styles) and you select the last 5 lines.
sel would now be {46,47,48,49,50}.
sel[1]==46
sel[2]==47
sel[5]==50
Using the for cycle will go from 1 to 5, so i will be 1-5, and sel[i] will be 46-50. subs[i] would be lines 1-5, which is not what you want. subs[sel[i]] will be lines 46-50. That's what you need.
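Here's a small sketch of why you go backwards when deleting, using a plain table instead of real subs. The function name delete_selected is made up, and table.remove is only standing in for whatever deletion the Subtitles object provides (I believe that's subs.delete, but check the manual).

```lua
-- delete the selected entries from a plain table, last one first
-- (table.remove is a stand-in here; in Aegisub you'd delete from the Subtitles object)
function delete_selected(subs, sel)
    for i=#sel,1,-1 do
        table.remove(subs, sel[i])
    end
    return subs
end

local lines={"a","b","c","d","e"}
delete_selected(lines, {2,3,5})
print(table.concat(lines, ","))
```

Going from the end, the indexes of the lines you haven't reached yet never change. Going forwards, deleting entry 2 would turn "c" into entry 2, and sel[2], which is 3, would then delete "d" instead.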

So, that about covers the structure of the main function. With this and a bunch of if/then/end lines you can make simple scripts.

Now, let's look at some ways to manipulate text.


text=text.." end of line"

This attaches a string to the end of text.


text="Start of line "..text

This attaches a string to the beginning of text. This way you can add tags:


text="{\\blur0.6}"..text

This is how the old "Add edgeblur" script worked. Of course, this doesn't join it with other tags, doesn't replace existing blur, etc.


text="This is a test."

text=""

Here the first one sets "This is a test." as text, deleting whatever was there before.
The second one just deletes text, by making it an empty string.

gsub

gsub is pretty much the core of every automation script. It's what replaces one thing with another.
It works like this:


text2=text:gsub("string A","string B")

This translates to: If you find "string A" in text, replace it with "string B" and assign the modified text to text2.
I used text2 for the sake of explanation, but normally you'd use text=text:gsub, which just keeps the result in text.

"I could not see him."

text=text:gsub("could not","couldn't")

» "I couldn't see him."

This way you can, for example, write a script for making contractions.


text=text
:gsub("([cws]h?)ould not","%1ouldn't")
:gsub("did not","didn't")
:gsub("was not","wasn't")

You only need the text=text part once. Then you can add as many :gsub lines as you want and create a whole list of contractions.
While you can just add them one by one, you can also use pattern matching (Lua's version of regexp) to keep the code short. The first gsub line will match could, would, and should. It will also match chould and whould, but as those don't exist, that doesn't bother us. The part in parentheses is a capture. [cws] means "c or w or s", and h? means "h if it's there or nothing if it's not". In standard regexp you could replace this capture with (c|w|s|ch|wh|sh) to get the same result. Lua doesn't have this option, so sometimes you just need more lines than you'd need with full regexp.
The %1 is the capture, so whatever it matched in the first part will be pasted in the second.
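As a quick sketch, the contraction chain can be wrapped in a function and tried on a few strings. The function name contract is made up, and I've changed the first pattern slightly (capital letters allowed, and the capture now includes "ould") just to show that captures can be cut in different places.

```lua
-- chain several gsubs; only the first return value (the new string) is kept in text
function contract(text)
    text=text
        :gsub("([CcWwSs]h?ould) not","%1n't")
        :gsub("did not","didn't")
        :gsub("was not","wasn't")
    return text
end

print(contract("I could not see him."))
print(contract("Would not that be nice?"))
```

Whatever the capture matched (could, Would, should...) is pasted back by %1, and only the " not" part is swapped for "n't".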

Now we can use this to replace existing blur value with our new value.


text=text:gsub("\\blur[%d%.]+","\\blur0.6")

Blur can have numbers and a decimal point, so use [%d%.]+ to match anything that's a number or a dot as many times in a row as possible, so whatever value the blur has will be replaced with 0.6.
The same effect could be achieved in different ways:


text=text:gsub("(\\blur)[%d%.]+","%10.6")
text=text:gsub("\\blur[^\\}]+","\\blur0.6")

The first one captures the \\blur part, so you don't have to type it again (may be useful if it's something longer).
The second one matches anything that's not a backslash or } as many times it can, ie. until it hits something that IS a backslash or }, which is where the blur value would logically end. This can pretty efficiently capture the whole value of any tag, since any tag has to end with \ or }. Of course with tags like \pos, you'll want to capture the coordinates rather than include the ().
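To convince yourself that the three versions do the same thing, here's a small sketch that runs all of them on the same text. The function name blur_variants is just for this example.

```lua
-- run the three equivalent patterns on the same text and return all three results
function blur_variants(text)
    local a=text:gsub("\\blur[%d%.]+","\\blur0.6")   -- match digits and dots
    local b=text:gsub("(\\blur)[%d%.]+","%10.6")     -- %1 is the captured \blur, followed by 0.6
    local c=text:gsub("\\blur[^\\}]+","\\blur0.6")   -- match up to the next \ or }
    return a,b,c
end

print(blur_variants("{\\bord2\\blur1.5\\fs40}Some text"))
```

Note that "%10.6" is read as %1 followed by "0.6", since a capture reference is always a single digit.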

You can also use a function within gsub:


text=text:gsub("(\\blur)([%d%.]+)",function(a,b) return a .. 2*b end)

a and b are the captures. The function uses them, returning a (\\blur) as is, and multiplying b by 2, thus giving you the blur value doubled. So you can divide your pattern into a bunch of captures and do some operations with them.
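A slightly more careful sketch of the same idea, with a made-up function name scale_blur. It assumes you want to skip values that aren't valid numbers (like "1.2.3") instead of getting an error, which works because gsub keeps the original match when the function returns nil.

```lua
-- multiply every \blur value by factor; malformed values are left untouched
function scale_blur(text, factor)
    local result=text:gsub("(\\blur)([%d%.]+)", function(a,b)
        local n=tonumber(b)
        if not n then return nil end   -- nil means "don't replace this match"
        return a..n*factor
    end)
    return result
end

print(scale_blur("{\\blur0.6}Some text", 2))
```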

Here's how you capitalize the first letter of a line:


text=text:gsub("^(%l)([^{]*)", function (c,d) return c:upper()..d end)

First capture is a lowercase letter at the beginning of a line. Second capture is from after the first letter until {, meaning before it hits a comment or tag. (It has to be * here, not -, because a - at the very end of a pattern matches as little as possible, which is nothing.) Returned is the first capture capitalized and second capture as is (which means it doesn't even have to be there in this case, but you could for example return d:lower() to be sure that the rest of the string will be lowercase).

Now you can understand how my Teleporter works:


text=text:gsub("\\pos%(([%d%.%-]+),([%d%.%-]+)%)",function(a,b) return "\\pos(".. a+xx.. "," ..b+yy..")" end)

Notice that literal ( and ), as in not captures, have to be escaped with %. Coordinates captures are [%d%.%-]+. You see that compared to what we had for blur, these include %-, because coordinates can be negative. If you don't include that, the script will only work when coordinates are positive. So it captures X and Y coordinate, and adds to them the user input, which is xx and yy here. Yep, that simple.
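Here's the same idea as a standalone sketch you can test without Aegisub. shift_pos is a made-up name, and xx and yy are passed in as parameters instead of coming from a GUI. This is a simplified illustration, not the actual Teleporter code.

```lua
-- add xx and yy to the coordinates of every \pos tag in text
function shift_pos(text, xx, yy)
    local result=text:gsub("\\pos%(([%d%.%-]+),([%d%.%-]+)%)", function(a,b)
        local x,y=tonumber(a),tonumber(b)
        if not (x and y) then return nil end   -- leave malformed coordinates alone
        return "\\pos("..x+xx..","..y+yy..")"
    end)
    return result
end

print(shift_pos("{\\pos(100,200.5)}Sign", 10, -20))
```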

One more example. This is "reverse move":


text=text:gsub("\\move%(([%d%.%-]+),([%d%.%-]+),([%d%.%-]+),([%d%.%-]+)","\\move(%3,%4,%1,%2")

That's the whole thing. Capture the 4 coordinates and return them in changed order: 3, 4, 1, 2. This is a good example of how captures can be useful. You may notice that the ( is not escaped in the right half. Things in the right part of gsub don't need to be escaped with % - it's only used for captures there (the one exception is a literal %, which has to be written as %%). Only the left part uses regexp.
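A quick sketch to check that timecodes in \move survive the swap, since the pattern stops after the 4th coordinate. The function name reverse_move is just for this example.

```lua
-- swap start and end coordinates of \move; anything after the 4th number stays as it was
function reverse_move(text)
    return (text:gsub("\\move%(([%d%.%-]+),([%d%.%-]+),([%d%.%-]+),([%d%.%-]+)","\\move(%3,%4,%1,%2"))
end

print(reverse_move("{\\move(10,20,300,400,0,500)}Sign"))
```

The extra parentheses around text:gsub(...) drop gsub's second return value (the number of replacements), so the function returns only the string.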

Escape characters

When using regexp, these characters have to be escaped with %: ^ $ . ? * - + ( ) [ ] and % itself

Characters that have to be escaped with \: the quotation mark that delimits the string (" or ') and \ itself

Backslashes and the delimiting quotation marks always have to be escaped, even in strings that aren't patterns. (An actual quotation mark ends the string.)
If you want to match an actual question mark in a sentence, you must match %?.
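If you need to match some text literally and don't want to escape it by hand, a small helper can do it. This is a sketch with made-up names (escape_pattern, count_literal), not something built into Lua or Aegisub.

```lua
-- put % in front of every character that is special in Lua patterns
function escape_pattern(s)
    return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]","%%%0"))
end

-- count how many times a literal piece of text appears
function count_literal(text, literal)
    local _,n=text:gsub(escape_pattern(literal),"")
    return n
end

print(escape_pattern("1.5 (a+b)?"))
print(count_literal("What? Really?", "?"))
```

In the replacement, %% is a literal % and %0 is the whole match, so each special character gets a % in front of it.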

Regular Expressions

I'm not gonna explain regexp from scratch, because there's plenty about that on the Internet. What I'm gonna do is list some patterns that are useful for Aegisub automation scripts.


{[^\\}]-}	-- comment (stuff between { and } that doesn't have a backslash)
{\\[^}]-}	-- tags (stuff between { and } that starts with a backslash)
{[^}]-}		-- comment or tags (stuff between { and })

The third one shows you a typical way of matching stuff between two markers. You match the first marker, then what's-not-the-second-marker with a - or *, and then the second marker.
The difference between - and * is that {.-} matches only one comment or set of tags, while {.*}, if you have a string like "abc{def}ghi{jkl}", will match from the first { to the last }, so "{def}ghi{jkl}". You always have to think about whether you need +, -, or *. If you choose the wrong one, it may still work in simple cases, like if there's only one comment in the line, but it will break on more complex lines. I recommend creating a testing ASS file and filling it with all kinds of different lines, including ones with mistakes, bad tags, broken comments, etc. Have all combinations of text, tags, and comments, use some transforms, some mocha lines, anything that can be in a script. If you write a function, it needs to do what it's supposed to do no matter what line you apply it to.
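You can see the difference between - and * directly with a few lines of plain Lua. The strip_braces name is made up for this sketch.

```lua
local sample="abc{def}ghi{jkl}"

print(sample:match("{.-}"))      -- shortest possible match
print(sample:match("{.*}"))      -- longest possible match
print(sample:match("{[^}]-}"))   -- can't run past the first } either way

-- remove every comment or tag block, one at a time
function strip_braces(text)
    return (text:gsub("{[^}]-}",""))
end

print(strip_braces(sample))
```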


%d+		-- sequence of numbers
[%d%.]+		-- sequence of numbers, can have decimal point (values of \bord, \blur, \fscx, and so on)
[%d%.%-]+	-- sequence of numbers, can have decimal point, can be negative (\xshad, \fsp, \frz...)
&H%x+&		-- values for colours and alpha
%([^%)]-%)	-- stuff between ( and )

%(([%d%.%-]+),([%d%.%-]+)%)

This will capture coordinates of \pos or \org. It could also capture fade in and out in \fad, though that doesn't need the -.
For \move, capture 4 coordinates and don't include the ending %), because \move may or may not have timecodes.
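For reading values rather than replacing them, match returns the captures directly. A minimal sketch, with get_pos as a made-up name:

```lua
-- return the \pos coordinates as numbers, or nil if the line has no \pos
function get_pos(text)
    local x,y=text:match("\\pos%(([%d%.%-]+),([%d%.%-]+)%)")
    if not x then return nil end
    return tonumber(x),tonumber(y)
end

print(get_pos("{\\an5\\pos(-12.5,300)}Sign"))
print(get_pos("No position here"))
```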


[%a']+		-- word (sequence of letters or apostrophes, to match words like "don't")
[^%s]+		-- word (sequence of what's not a space)

You may need different ways of matching a word. The first one here will not include punctuation, the second one will. Sometimes you may need one, sometimes the other. You may also wanna replace %a with %w, if you want to include "words" like AK47 or just count 20 in "I'm 20 years old." as a word.
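To see how much the choice of pattern matters, here's a sketch that counts "words" with whatever pattern you give it. count_words is a made-up name.

```lua
-- count how many times pattern matches in text
function count_words(text, pattern)
    local n=0
    for _ in text:gmatch(pattern) do n=n+1 end
    return n
end

local sentence="I'm 20 years old."
print(count_words(sentence, "[%a']+"))   -- letters and apostrophes only
print(count_words(sentence, "[%w']+"))   -- digits count too
print(count_words(sentence, "[^%s]+"))   -- anything that's not a space
```

With %a the "20" isn't counted at all, which may or may not be what you want.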


\\[1234]?c&	-- colour tag (doesn't match value, just matches that the tag is there)

This matches \c, \1c, \2c, \3c, \4c, but not \clip (important!).
(Also note that \\fs matches \\fsp and \\fscx, so be careful about patterns that may match things you don't want.)
Since primary can be \c or \1c, in order to avoid complicated code that would deal with both, I recommend using this at the beginning:

text=text:gsub("\\1c&","\\c&")

\c is what the inbuilt Aegisub tool creates, so keep those as standard.

Speaking of which... tricks like this are often very useful. If your code needs to account for a lot of different things, see if you can reduce the number of these things with some easy trick. A common issue is for example matching the beginning of a word. A word starts either after a space, or at the beginning of a line. You need to match two patterns for that. However, you can start with adding a space at the beginning of a line, then use just one matching pattern, and then remove the space at the end of the script.
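The add-a-space trick could look something like this. It's a sketch with a made-up function name, capitalize_words, and it only deals with plain text (no tags).

```lua
-- capitalize the first letter of every word using a single pattern
function capitalize_words(text)
    text=" "..text                                   -- now every word starts after a space
    text=text:gsub(" (%l)", function(c) return " "..c:upper() end)
    text=text:gsub("^ ","")                          -- remove the space we added
    return text
end

print(capitalize_words("hello big world"))
```

Without the added space you'd need a second gsub with "^(%l)" just for the first word.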

Another thing is dealing with lines with and without tags (when working with text). You can start with this:

tags=""
if text:match("^{\\[^}]*}") then tags=text:match("^({\\[^}]*})") end
text=text:gsub("^{\\[^}]*}","")

If the line has no tags, then tags will be an empty string. If it finds tags at the beginning, they will be saved to tags, thus replacing the empty string. Then the gsub deletes the tags.
Now you can work with the text knowing that you have no tags in the way.
When you're done with the text, you'll do this:

text=tags..text

This attaches your saved tags at the start of the line. If there were no tags, you have an empty string saved there, so basically nothing happens.
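Put together, the save/strip/reattach steps can be sketched as one helper. with_clean_text is a made-up name, and fn is whatever function you want to run on the text without the starting tags.

```lua
-- run fn on the text with the starting tag block removed, then put the tags back
function with_clean_text(text, fn)
    local tags=text:match("^({\\[^}]*})") or ""   -- empty string when there are no tags
    text=text:gsub("^{\\[^}]*}","")
    text=fn(text)
    return tags..text
end

print(with_clean_text("{\\blur0.6}hello", string.upper))
print(with_clean_text("hello", string.upper))
```

The "or" replaces the separate if check: when match finds nothing it returns nil, and tags falls back to the empty string. Note that this only handles tags at the start of the line, so inline tags would still get passed to fn.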

Another trick I use is when I want to add some tags, and the line may or may not already have some tags.


text="{\\tag}"..text				-- add tag when no tags are present
text=text:gsub("^{\\","{\\tag\\")		-- attach tag before other tags
text=text:gsub("^({\\[^}]-)}","%1\\tag}")	-- attach tag after other tags

These would be the regular options. The second and third depend on what you want to do. Just a matter of preference. It works either way. But you have to first find out if the line has tags, and then use the appropriate method. So again, there's a way to avoid that.

if not text:match("^{\\") then text="{\\}"..text end

You start with adding {\} at the beginning of a line without tags. The second and third method of adding tag now work just fine, but you have an extra backslash somewhere in there. It will end up as either a doubleslash somewhere in there, or at the end before }. So at the end of the script, you do a simple cleanup.

text=text
:gsub("\\\\","\\")
:gsub("\\}","}")

:gsub("{}","")

The first two are what you really need. The third is another "cleanup" line, useful when you've removed some tags and possibly ended up with just empty {}. (Of course the gsub is for text.)
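Here's the whole trick as one sketch: add {\} if there are no tags, attach the new tag after the others, and clean up. add_tag is a made-up name, and it assumes the tag you pass in has no % in it (that would be read as a capture reference in the replacement).

```lua
-- attach tag at the end of the first tag block, creating the block if needed
function add_tag(text, tag)
    if not text:match("^{\\") then text="{\\}"..text end
    text=text:gsub("^({\\[^}]-)}","%1"..tag.."}")
    -- cleanup: double backslash, backslash before }, empty {}
    text=text:gsub("\\\\","\\"):gsub("\\}","}"):gsub("{}","")
    return text
end

print(add_tag("Some text", "\\blur0.6"))
print(add_tag("{\\bord2}Some text", "\\blur0.6"))
```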


\\t%([^%(%)]-%)				-- transforms
\\t%([^%(%)]-%([^%)]-%)[^%)]-%)		-- transforms with \clip in them

The tricky thing about transforms is that they can have () within () if there's a transform for a clip, so to efficiently get all transforms, you always need both patterns. Yeah, this is a bit messy.
Matching transforms is useful when you modify tags' values but don't want to change the tags inside transforms. You create transforms="". Then you match those two patterns and save them for example to tf1 and tf2. Then do transforms=transforms..tf1..tf2 and you'll have transforms saved in the transforms string. Then you remove them from the text with gsub and work with the text... and at the end put them back. This is a bit complex and you actually need gmatch because there may be many transforms. So once you get familiar enough with the code and everything, here's what I do:


function trem(tags)
	trnsfrm=""
	for t in tags:gmatch("(\\t%([^%(%)]-%))") do trnsfrm=trnsfrm..t end
	for t in tags:gmatch("(\\t%([^%(%)]-%([^%)]-%)[^%)]-%))") do trnsfrm=trnsfrm..t end
	tags=tags:gsub("(\\t%([^%(%)]-%))","")
	tags=tags:gsub("(\\t%([^%(%)]-%([^%)]-%)[^%)]-%))","")
	return tags
end

You run this function on tags. It goes through every instance of a transform and adds it to the trnsfrm string. When you're done with whatever you're doing to the other tags, you put this string at the end of the tags.
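Here's a variant of the same idea that you can test on its own. Instead of keeping the transforms in a global variable, it returns them as a second value. The name split_transforms is made up, and this is just a sketch of the approach, not a replacement for the function above.

```lua
-- returns the tags without transforms, and all the transforms joined in one string
function split_transforms(tags)
    local simple="\\t%([^%(%)]-%)"                     -- \t(...) with no () inside
    local nested="\\t%([^%(%)]-%([^%)]-%)[^%)]-%)"     -- \t(...(...)...), e.g. with \clip
    local saved=""
    for t in tags:gmatch(simple) do saved=saved..t end
    for t in tags:gmatch(nested) do saved=saved..t end
    tags=tags:gsub(simple,""):gsub(nested,"")
    return tags,saved
end

local cleaned,transforms=split_transforms("{\\blur1\\t(0,500,\\blur3)\\bord2\\t(\\clip(0,0,10,10))}")
print(cleaned)
print(transforms)
```

When you're done with the other tags, you'd put the saved string back before the closing }, for example with cleaned:gsub("}$", function() return transforms.."}" end). A function is used there so that nothing in the transforms gets read as a capture.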

Some more regexp examples:

\\i?clip	-- match clip or iclip
\\[xy]?bord	-- match bord or xbord or ybord

-- remove spaces at the beginning and end of a line
text=text:gsub("^%s+","")  :gsub("%s+$","")

GUI

Here's a very simple GUI:

dialog_config=
{
    {x=0,y=0,width=1,height=1,class="label",label="\\blur",},
    {x=1,y=0,width=1,height=1,class="floatedit",name="blur",value=0.6},
} 	
buttons={"blur","cancel"}
pressed,res=aegisub.dialog.display(dialog_config,buttons)
if pressed=="cancel" then aegisub.cancel() end
if pressed=="blur" then blur(subs, sel) end

dialog_config is a table with all the stuff in the GUI except the buttons. This one contains two things - a label, and an editbox for numbers. The label is "\\blur". That's what you'll see in the GUI, followed by the editbox, which will have "0.6" in it as default value.
buttons is the buttons you will click on.
aegisub.dialog.display is what displays the GUI, using the dialog_config and buttons. pressed determines which button was pressed. (You can name this whatever you want.)
res is the user input from editboxes, checkboxes, etc.
I started with "pressed, result", as that's what lyger's guide had, but over time changed "result" to "res" because it gets typed a lot. Again, it can be anything you want.
As you can see in the last line, if you press the "blur" button, function blur(subs, sel) will be executed.
To get the blur value inside that function, you'll use this:

blurtag="\\blur"..res.blur

res.blur is the value given by the user, so if you type "1.5" in the editbox, blurtag will now be "\\blur1.5".
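The GUI itself only runs inside Aegisub, but the part that uses res can be sketched and tested with a plain table in its place. apply_blur is a made-up name, and the {blur=1.5} table below only imitates what the dialog would give you.

```lua
-- replace an existing \blur value, or add the tag at the start of the line
function apply_blur(text, res)
    local blurtag="\\blur"..res.blur
    local n
    text,n=text:gsub("\\blur[%d%.]+",blurtag)     -- n is how many blurs were replaced
    if n==0 then
        text="{"..blurtag.."}"..text
        text=text:gsub("^({\\[^}]-)}{\\","%1\\")  -- join with an existing tag block
    end
    return text
end

local res={blur=1.5}   -- stand-in for the dialog result
print(apply_blur("Some text", res))
print(apply_blur("{\\bord2}Some text", res))
print(apply_blur("{\\blur3}Some text", res))
```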

Other types of input:

-- Checkbox
{x=0,y=1,width=1,height=1,class="checkbox",name="nobreak",label="remove linebreaks",value=false},

Usage: if res.nobreak==true then (stuff)
You don't have to type "==true" because that's implied by default, so it can be just:
if res.nobreak then followed by what should be done if the checkbox is checked.

The opposite would be either if res.nobreak==false then or if not res.nobreak then.
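A minimal sketch of using the checkbox value, again with a plain table standing in for res. The function name apply_options is made up, and replacing \N with a space is just my guess at what "remove linebreaks" would do.

```lua
-- replace \N linebreaks with spaces, but only if the checkbox was checked
function apply_options(text, res)
    if res.nobreak then
        text=text:gsub("\\N"," ")
    end
    return text
end

print(apply_options("first line\\Nsecond line", {nobreak=true}))
print(apply_options("first line\\Nsecond line", {nobreak=false}))
```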


-- Dropdown menu
{x=3,y=0,width=1,height=1,class="dropdown",name="an",
	items={"an1","an2","an3","an4","an5","an6","an7","an8","an9"},value="an8"},


-- Colour
{x=4,y=0,width=1,height=1,class="color",name="c1"},

The colours come in this format: "#000000" in RRGGBB order. The actual tag is "&H000000&" in BBGGRR order, so you have to transform the result, for example like this:

colour1=res.c1:gsub("#(%x%x)(%x%x)(%x%x)","&H%3%2%1&")
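Here's a sketch that turns the dropdown and colour results into actual tags. The names html_to_ass and tags_from_dialog are made up, and the table passed in only imitates res with the an and c1 fields from the examples above.

```lua
-- "#RRGGBB" from the dialog to "&HBBGGRR&" for the tag
function html_to_ass(colour)
    return (colour:gsub("#(%x%x)(%x%x)(%x%x)","&H%3%2%1&"))
end

-- build a tag block from the dropdown (res.an) and the colour picker (res.c1)
function tags_from_dialog(res)
    return "{\\"..res.an.."\\c"..html_to_ass(res.c1).."}"
end

print(tags_from_dialog({an="an8", c1="#FF8000"}))
```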

Debugging / logging

This is very useful when you're getting errors. If you want to find where exactly your script has failed, you can use aegisub.log. There are two main ways to use it. One is to check whether the script passed a certain point, and another is to check a specific value.

The first one will usually go after then, like this:

if text:match("\\blur") then aegisub.log("check")	-- function continues after this

If this "check" gets logged, you know the condition has been met, ie. "\blur" was found in the text. This tells you how far the function has gone before something broke and helps you narrow down where the problem is.

The second way is for checking the value of something.

aegisub.log("abc: "..abc)

The first part is just text, so that you know what's being logged if you're using several logs. The part after .. is the value of the variable called "abc".
I often log multiple things when testing/debugging, and it gets chaotic unless each log is on a new line, so I automatically put "\n" into each log:

aegisub.log("\n text: "..text)

This would be the most common one. Usually you work with text and make changes to it, so this shows you which changes did or didn't happen.
So if you're getting errors, use logging to find out what exactly breaks and where.
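One thing to watch for: joining a string with .. fails if the variable is nil or a table, which is exactly the kind of thing you're trying to find when debugging. A small sketch of a helper that avoids this, with the made-up name log_line:

```lua
-- build a log message on its own line; tostring makes nil and tables safe to join
function log_line(name, value)
    return "\n"..name..": "..tostring(value)
end

-- inside Aegisub you would pass the result to aegisub.log; print is used here
-- so the example runs in plain Lua
print(log_line("text", "{\\blur0.6}Some text"))
print(log_line("abc", nil))
```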

Various stuff


-- duration of a line
dur=line.end_time-line.start_time


-- character count
-- this counts the string length after removing comments/tags. you can add :gsub(" ","") to not count spaces.
visible=text:gsub("{[^}]-}","")
characters=visible:len()


-- working with the line after the current one
-- if you're on the last line, there is no next line and you'd get an error, thus the condition
if i~=#subs then nextline=subs[i+1] end


-- working with previous line
-- previous line always exists, but the one before the first dialogue line would be a style or something
prevline=subs[i-1]
if prevline.class=="dialogue" then blabla end


-- counting stuff. "count=0" must be at the beginning, before the for loop.
count=0
-- then in the main function:
if text:match("stuff") then count=count+1 end
-- at the end, after the for loop, you can log the result like this:
aegisub.log("Stuff appears "..count.." times.")


-- error messages to the user
-- if a script requires \pos tag and the user runs it on a line that doesn't have one, you can do this:
if not text:match("\\pos") then aegisub.dialog.display({{class="label",
	label="No \\pos tag found.",x=0,y=0,width=1,height=2}},{"OK"}) aegisub.cancel() end


-- marking lines that have changed (gsub also returns the number of replacements as a second value, but comparing the text before and after works too)
text2=text
text=text:gsub("\\1c&","\\c&")
if text2~=text then effect=effect.."colour tag modified" end


-- a simple script to convert between clip and iclip
-- you can't do one and then the other, because that would just convert the iclips you made back to clips
-- therefore, you need elseif, which only comes to play if the first condition isn't met
if text:match("\\clip") then text=text:gsub("\\clip","\\iclip")
elseif text:match("\\iclip") then text=text:gsub("\\iclip","\\clip") end
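Two of the snippets above, wrapped into functions so they can be tried outside Aegisub. The names visible_length and toggle_clip are made up for this sketch.

```lua
-- length of the text with comments and tags removed
function visible_length(text)
    local visible=text:gsub("{[^}]-}","")
    return visible:len()
end

-- \clip becomes \iclip, or \iclip becomes \clip, never both
function toggle_clip(text)
    if text:match("\\clip") then
        return (text:gsub("\\clip","\\iclip"))
    elseif text:match("\\iclip") then
        return (text:gsub("\\iclip","\\clip"))
    end
    return text
end

print(visible_length("{\\blur1}Hello {note}world"))
print(toggle_clip("{\\clip(0,0,100,100)}Sign"))
print(toggle_clip("{\\iclip(0,0,100,100)}Sign"))
```

"\\clip" doesn't match inside \iclip, because there the backslash is followed by i, not c. That's why the first condition can be checked safely.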

Functions

Click here for some small functions that I've written (or got from someone, in a few cases) and that you can use.

Here is my Italicize script explained line by line. I figured I would take the smallest script I've made and explain it, but it kinda turns out that this one is actually pretty complicated and I could barely make sense of it myself (because it checks the style, deals with inline tags, checks for some mistakes, etc.). Anyway, I explained it as best I could, so hopefully it helps. (Every comment refers to what comes after it.)

This should be more than enough to get you started. Once you learn all this, you can figure out more from looking at existing scripts.
