Gm microoptimisation war crime - An experimental optimisation for Garry's Mod

Overview

Wat

This attractively named repository contains a Garry's Mod module that applies a micro-optimisation to all Lua scripts, making use of LuaJIT's constant folding.

The module intercepts the loading of Lua scripts and preprocesses the Lua code to replace all instances of SERVER and CLIENT with true and false depending on the current realm.

The module does not replace these in comments or strings. It also pads each replacement with extra whitespace, so neither the size of the file nor the span of each token changes.
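The replacement described above can be sketched in plain Lua. This is only an illustration under stated assumptions, not the module's actual implementation: `pad`, `skip_span` and `preprocess` are hypothetical names, and the sketch only understands standard Lua strings and comments (not GLua's C-style `//` and `/* */` comments).

```lua
-- Hypothetical sketch, not the module's real code.

-- Pad a replacement with trailing spaces so it occupies `width` bytes.
local function pad(word, width)
    return word .. string.rep(" ", width - #word)
end

-- If a string or comment starts at index `i`, return the index of its last
-- byte; otherwise return nil.
local function skip_span(src, i)
    local n = #src
    local c = src:sub(i, i)
    if c == '"' or c == "'" then
        -- Short string: scan to the matching quote, stepping over escapes.
        local j = i + 1
        while j <= n do
            local d = src:sub(j, j)
            if d == "\\" then
                j = j + 2
            elseif d == c then
                return j
            else
                j = j + 1
            end
        end
        return n
    end
    local is_comment = src:sub(i, i + 1) == "--"
    -- Long bracket ([[ ... ]] or [==[ ... ]==]), as a string or a comment body.
    local eq = src:match("^%[(=*)%[", is_comment and i + 2 or i)
    if eq then
        local _, e = src:find("]" .. eq .. "]", i, true)
        return e or n
    elseif is_comment then
        -- Line comment: runs to the end of the line.
        return (src:find("\n", i, true)) or n
    end
    return nil
end

-- Replace whole-word SERVER / CLIENT with padded boolean literals.
local function preprocess(src, is_server)
    local replacements = {
        SERVER = pad(tostring(is_server), #"SERVER"),
        CLIENT = pad(tostring(not is_server), #"CLIENT"),
    }
    local out, i, n = {}, 1, #src
    while i <= n do
        local stop = skip_span(src, i)
        if stop then
            out[#out + 1] = src:sub(i, stop) -- copy strings/comments verbatim
            i = stop + 1
        else
            local word = src:match("^[%w_]+", i)
            if word then
                out[#out + 1] = replacements[word] or word
                i = i + #word
            else
                out[#out + 1] = src:sub(i, i)
                i = i + 1
            end
        end
    end
    return table.concat(out)
end

local input = 'if SERVER then print("SERVER") end -- CLIENT\nif CLIENT then end'
local output = preprocess(input, true)
print(output)
```

Because `true` and `false` are both shorter than the six-byte names they replace, trailing spaces are enough to keep every byte offset unchanged. A word-level scan like this would also rewrite field accesses such as `t.SERVER` or assignments to `SERVER`, which a real implementation would need to handle.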

Explanation

In Garry's Mod there are two runtime realms, the server realm and the client realm. Because Lua doesn't have conditional compilation, developers have to use an if statement to determine whether the script is running on the server or the client:

if SERVER then
    print("Hello, server!")
end
if CLIENT then
    print("Hello, client!")
end

The equivalent C preprocessor code would look like this:

#ifdef SERVER
    printf("Hello, server!\n");
#endif
#ifdef CLIENT
    printf("Hello, client!\n");
#endif

Results

First I should probably note that this is just a dumb experiment and I expected it to not have much performance impact.

CPU branch predictors are pretty good these days, and LuaJIT should hopefully help too, so the branch itself shouldn't be much of a performance issue.

Additionally, it's quite rare that a runtime realm check would find itself in a hot loop or hot function, so the impact of such an optimisation on overall server/game performance is probably completely unnoticeable.

Bytecode

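The generated bytecode is only slightly different after this hack. It would seem that LuaJIT does not do dead code elimination in this case: in the optimised listing below, where SERVER has become true and CLIENT false, the body of the CLIENT check is still emitted (0006 to 0008). It does, however, eliminate the conditional test and replace it with a direct jump.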

Sample Code

function helloWorld()
    if SERVER then
        print("Hello SERVER")
    end
    if CLIENT then
        print("Hello CLIENT")
    end
end

Unoptimised

0000	FUNCF    3
0001	GGET     0   0      ; "SERVER"
0002	ISF          0
0003	JMP      1 => 0007
0004	GGET     0   1      ; "print"
0005	KSTR     2   2      ; "Hello SERVER"
0006	CALL     0   1   2
0007	GGET     0   3      ; "CLIENT"
0008	ISF          0
0009	JMP      1 => 0013
0010	GGET     0   1      ; "print"
0011	KSTR     2   4      ; "Hello CLIENT"
0012	CALL     0   1   2
0013	RET0     0   1

Optimised

0000	FUNCF    3
0001	GGET     0   0      ; "print"
0002	KSTR     2   1      ; "Hello SERVER"
0003	CALL     0   1   2
0004	JMP      0 => 0005
0005	JMP      0 => 0009
0006	GGET     0   0      ; "print"
0007	KSTR     2   2      ; "Hello CLIENT"
0008	CALL     0   1   2
0009	RET0     0   1
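Listings in this format can be produced with the `jit.bc` module that ships with LuaJIT (the same code that backs `luajit -bl`). The following is a minimal sketch, assuming the `jit.*` Lua libraries are on the package path, which may not be the case inside Garry's Mod; `hello_world` and `dump_bytecode` are illustrative names, and the function body mimics what the preprocessed sample would look like on the server.

```lua
-- What the sample function looks like after preprocessing on the server.
local function hello_world()
    if true then
        print("Hello SERVER")
    end
    if false then
        print("Hello CLIENT")
    end
end

-- Dump a function's bytecode to stdout when jit.bc is available.
-- Returns true if a listing was printed, false otherwise.
local function dump_bytecode(fn)
    local ok, bc = pcall(require, "jit.bc")
    if not ok then
        return false -- not running under LuaJIT, or jit.* libraries missing
    end
    bc.dump(fn)
    return true
end

local dumped = dump_bytecode(hello_world)
if not dumped then
    print("jit.bc is not available in this environment")
end
```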

Benchmarks

jit.off() jit.on()
jit.flush()
print(jit.status()) print()

local seed = math.random(1, 2)

do
    local function globals()
        local low = math.huge
        local high = 0
        local total = 0
        for i = 1, 10 do
            local start = SysTime()
            for i = 1, 100000 do
                local n = 0
                if SERVER then
                    n = n + seed
                end
                if CLIENT then
                    n = n + seed
                end
            end
            local delta = SysTime() - start
            low = math.min(low, delta)
            high = math.max(high, delta)
            total = total + delta
        end
        print("(Global) Unoptimised")
        print("Low: " .. low * (10 ^ 6) .. "us")
        print("High: " .. high * (10 ^ 6) .. "us")
        print("Mean: " .. (total / 10) * (10 ^ 6) .. "us\n")
    end
    globals()
end

do
    local function localised()
        local SERVER = SERVER
        local CLIENT = CLIENT

        local low = math.huge
        local high = 0
        local total = 0
        for i = 1, 10 do
            local start = SysTime()
            for i = 1, 100000 do
                local n = 0
                if SERVER then
                    n = n + seed
                end
                if CLIENT then
                    n = n + seed
                end
            end
            local delta = SysTime() - start
            low = math.min(low, delta)
            high = math.max(high, delta)
            total = total + delta
        end
        print("(Local) Unoptimised")
        print("Low: " .. low * (10 ^ 6) .. "us")
        print("High: " .. high * (10 ^ 6) .. "us")
        print("Mean: " .. (total / 10) * (10 ^ 6) .. "us\n")
    end
    localised()
end

do
    local function localised_no_const()
        local SERVER = math.random(1, 2) == 1
        local CLIENT = not SERVER

        local low = math.huge
        local high = 0
        local total = 0
        for i = 1, 10 do
            local start = SysTime()
            for i = 1, 100000 do
                local n = 0
                if SERVER then
                    n = n + seed
                end
                if CLIENT then
                    n = n + seed
                end
            end
            local delta = SysTime() - start
            low = math.min(low, delta)
            high = math.max(high, delta)
            total = total + delta
        end
        print("(Local NoConst) Unoptimised")
        print("Low: " .. low * (10 ^ 6) .. "us")
        print("High: " .. high * (10 ^ 6) .. "us")
        print("Mean: " .. (total / 10) * (10 ^ 6) .. "us\n")
    end
    localised_no_const()
end

do
    local function optimised()
        local low = math.huge
        local high = 0
        local total = 0
        for i = 1, 10 do
            local start = SysTime()
            for i = 1, 100000 do
                local n = 0
                if true then
                    n = n + seed
                end
                if false then
                    n = n + seed
                end
            end
            local delta = SysTime() - start
            low = math.min(low, delta)
            high = math.max(high, delta)
            total = total + delta
        end
        print("Optimised")
        print("Low: " .. low * (10 ^ 6) .. "us")
        print("High: " .. high * (10 ^ 6) .. "us")
        print("Mean: " .. (total / 10) * (10 ^ 6) .. "us")
    end
    optimised()
end

Output

true    SSE2    SSE3    SSE4.1  AMD     BMI2    fold    cse     dce     fwd     dse     narrow  loop    abc     sink   fuse

(Global) Unoptimised
Low: 77.999999575695us
High: 115.39999968591us
Mean: 85.85999994466us

(Local) Unoptimised
Low: 34.700000014709us
High: 53.599999773724us
Mean: 39.509999987786us

(Local NoConst) Unoptimised
Low: 34.700000014709us
High: 70.099999902595us
Mean: 39.699999979348us

Optimised
Low: 34.699999559962us
High: 51.699999858101us
Mean: 37.459999975908us
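The four benchmarks above repeat the same timing loop. As a sketch only (not the code that produced these numbers), the loop could be factored into a helper. `bench` and `now` are hypothetical names, and the fallback to `os.clock` is an assumption so the snippet also runs outside Garry's Mod, where `SysTime` does not exist. Note that `os.clock` measures CPU time rather than wall-clock time, and that calling the loop body through a function adds a small constant overhead to every run.

```lua
-- SysTime is Garry's Mod's high-resolution timer; fall back to os.clock elsewhere.
local now = SysTime or os.clock

-- Time `body(iterations)` `runs` times and report low/high/mean in microseconds.
local function bench(label, body, runs, iterations)
    local low, high, total = math.huge, 0, 0
    for _ = 1, runs do
        local start = now()
        body(iterations)
        local delta = now() - start
        low = math.min(low, delta)
        high = math.max(high, delta)
        total = total + delta
    end
    local stats = {
        low = low * 1e6,
        high = high * 1e6,
        mean = total / runs * 1e6,
    }
    print(label)
    print(("Low: %.2fus"):format(stats.low))
    print(("High: %.2fus"):format(stats.high))
    print(("Mean: %.2fus\n"):format(stats.mean))
    return stats
end

local seed = math.random(1, 2)

-- Same workload as the (Global) Unoptimised case: two global realm checks.
local global_stats = bench("(Global) Unoptimised", function(iterations)
    for _ = 1, iterations do
        local n = 0
        if SERVER then
            n = n + seed
        end
        if CLIENT then
            n = n + seed
        end
    end
end, 10, 100000)
```

The other three cases would pass a different closure (localised copies, non-constant locals, or literal `true` / `false`) while reusing the same helper.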

Conclusion

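It is clear that most of the overhead from if SERVER and if CLIENT runtime checks comes from global lookups: the (Global) Unoptimised mean (~85.86us) is more than double the (Local) Unoptimised mean (~39.51us).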

Unsurprisingly, the branching overhead is extremely small. We can see this because the (Local) Unoptimised mean is only ~2.05us higher than the Optimised mean (the Local NoConst mean is ~2.24us higher), and the lows are effectively identical at ~34.7us.
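The differences can be recomputed directly from the mean values printed in the results above. The snippet below is just that arithmetic written out; the table and variable names are illustrative, and since the figures come from a single run of 10 samples each, they should be read as rough indications rather than precise measurements.

```lua
-- Mean times in microseconds, copied from the benchmark output above.
local mean_us = {
    global    = 85.85999994466,
    localised = 39.509999987786,
    no_const  = 39.699999979348,
    optimised = 37.459999975908,
}
local iterations = 100000 -- inner loop count used by every benchmark

-- Cost attributable to the two global lookups per iteration.
local lookup_cost = mean_us.global - mean_us.localised
-- Cost attributable to the branches once the lookups are gone.
local branch_cost = mean_us.localised - mean_us.optimised
local branch_cost_no_const = mean_us.no_const - mean_us.optimised

print(("Global lookups: %.2fus per run (%.2fns per iteration)")
    :format(lookup_cost, lookup_cost * 1000 / iterations))
print(("Branches (Local): %.2fus per run"):format(branch_cost))
print(("Branches (Local NoConst): %.2fus per run"):format(branch_cost_no_const))
```

On these numbers the global lookups account for roughly 46.35us per run (about 0.46ns per loop iteration), against roughly 2us for the branches.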

Because SERVER and CLIENT are just booleans, they are copied by value into the localised variables. This could potentially enable additional optimisations, since LuaJIT may be able to treat the local values as constants. I tested this with the Local NoConst benchmark, but it made a negligible difference, so in this case I don't think it was relevant.

In conclusion, this module can optimise realm checks that look SERVER and CLIENT up in the global table, but it has negligible impact when those globals have already been localised.

Alternative Solution

A module that prepends common globals as locals to the top of scripts could be an alternative, and potentially even better, method. It is definitely more error prone, though: Lua limits the number of upvalues a function may have, so adding new locals to the top of a script could cause an error. However, a more advanced module that implements a real GLua lexer and parser could catch this issue and skip optimising that file.
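A naive version of that alternative is easy to sketch. This is only an illustration, with `prepend_locals` as a hypothetical name. It places the declaration on the script's existing first line so that error line numbers stay the same, and it performs none of the safety checks discussed above: to the best of my knowledge Lua 5.1 and LuaJIT cap a function at 60 upvalues and 200 locals, and a script that assigns to SERVER or CLIENT would silently change meaning.

```lua
-- Prepend `local A, B = A, B` to a script without adding a new line.
local function prepend_locals(src, names)
    local list = table.concat(names, ", ")
    return "local " .. list .. " = " .. list .. " " .. src
end

-- loadstring exists in Lua 5.1 / LuaJIT; load accepts strings in Lua 5.2+.
local load_chunk = loadstring or load

-- Stand-ins for the realm globals, so the sketch runs outside Garry's Mod.
SERVER, CLIENT = true, false

local patched = prepend_locals("return SERVER, CLIENT", { "SERVER", "CLIENT" })
print(patched)

local is_server, is_client = load_chunk(patched)()
print(is_server, is_client)
```

Unlike the padded replacement, this approach changes the size of the file, so any tooling that relies on byte offsets into the original source would be affected.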

Comments
  • You mentioned dead code elimination not being done by LuaJIT

    So regarding DCE, I've never heard the term but I'm going to assume it basically means dropping code that would logically NEVER be run at all.

    You mention it doesn't get performed by LuaJIT in this case. Would it be very difficult to get the preprocessor here to perform an outright DCE itself, instead of changing SERVER / CLIENT to true / false? I'd imagine it giving similar results (maybe slightly better numbers, given that there's no local or global lookup at all anymore), and the dead code wouldn't exist at all anymore. I have no clue if the dead code would be loaded but not used, or if it would cause some small impact. I do know that removing it would mean less memory usage.

    For custom-built servers that use only their own stuff it likely won't make much of a difference (I'd imagine smarter servers would make sure not to use SERVER / CLIENT much at all), but on, say, your average DarkRP server running a ton of workshop and gmodstore scripts, it could probably free a good chunk of memory. Some imaginary numbers just for a guess here, but say dropping from 300-400 MB down to 250-350 MB, depending on how bloated the server is?

    I mean it's likely not very noticeable, but for the sake of micro-optimisation I figured I'd at least bring up the idea, get your thoughts on it, and ask whether it would be worth attempting.

    opened by VaasKahnGrim 1
Owner
William
aka Billy