Speeding up Brilirs #122

Pat-Lafon · 2021-01-13T19:31:14Z

This pr does some of the low hanging fruit to bring brilirs to about 7.5x-11x faster than brili. Dependent on #117.

All but three benchmarks do not run for long enough to be accurately reported by hyperfine (hyperfine reports ~5ms for all of them with brilirs, about ~40-60ms when running with brili).

make release
./long_benchmark.sh

Benchmark	`brili`	`brilirs`	Speedup
ackermann	555ms ±44ms	68.9ms ± 2.5ms	~8 ± .7
eight-queens	511ms ±66ms	46.5 ms ± 9.2ms	~11 ± 2.5
mat-mul	876ms ± 84ms	82.9ms ±11 ms	~10.5 ± 2

This isn't attempting to be a rigorous benchmarking of brili vs brilirs but to more so show a significant improvement in the performance of brilirs(which started at ~1x-1.5x over brili on these benchmarks ).

Two of the most significant optimizations are a drastic reduction in the amount of clone()/String::from() calls and switching from Hashmap to FxHashMap with an initialized capacity.

Other optimizations relate to the release profile in cargo.toml, switching to the MiMalloc allocator and annotating many of the functions with #[inline(always)].

Typechecking has been pulled out and is now done first before executing main for a small, but noticeable performance increase. It should now also do a mostly complete validation of the bril code.

I made attempts to use async code, the smallvec crate, unsafe argument indexing, and program parallelism but there was no immediate improvement.

brilirs now uses nightly rust for or patterns in match statements and a slight performance increase.

sampsyo

Well, that's pretty cool! Those are some nice performance results.

Just a couple of high-level questions:

Now that the type checking is static rather than dynamic, it seems a little odd that it's coupled with the interpreter. Conceptually, it could be a separate flag or even a completely separate executable (like the current Python type checker and the proposed OCaml one in #107). IMO, a fast interpreter need not necessarily check types… just doing something crazy or crashing on bad programs is fine.
Nightly Rust does make things a bit more inconvenient to use (especially since the underlying library doesn't require it). If it's really necessary, it's probably OK, especially if the features you want seem like they're coming to stable Rust soon… but if it's just for a little extra performance, then it doesn't quite seem worth it to require it?

sampsyo · 2021-01-14T13:02:15Z

brilirs/benchmark.sh

+        "../benchmarks/sum-bits.json" "../benchmarks/sum-sq-diff.json"
+        )
+args=( "3 6" "128" "50" "7" "645634654" "8" "" "10" "101" "4 20" "8" "50 109658" "96 false" "496" "125" \
+        "-5 8 21" "" "8" "100" "" "42" "100")


It is a little bit too bad that this script needs to re-list all the benchmarks and all their arguments instead of just using ../benchmarks/*.bril. Maybe Brench could help with this—it could even wrap a call to Hyperfine? Not sure.

I haven't really tried to use Brench yet but I will look into it. I was mostly going off of the blog post so I threw something together.

The closest I've gotten is the following one-liner as a single stage pipeline for brench. (export bril=$(bril2json </dev/stdin); export arg='3 6'; hyperfine -s basic --warmup 5 'echo $bril | brili -p $arg') where I'm only trying to get the Ackermann.bril benchmark to run. I get a nice ValueError: I/O operation on closed file. when running Brench which I'm unsure of. It runs as expected on it's own(providing the ackermann file to stdin).

Weird. Any chance it's a bash/sh difference?

#!/bin/sh ps -p $$ (export bril=$(bril2json </dev/stdin); export arg='3 6'; hyperfine -s basic --warmup 5 'echo $bril | brili -p $arg') < ../benchmarks/ackermann.bril

This appears to show the above line using /bin/sh and running without issue.

Pat-Lafon · 2021-01-14T18:17:57Z

I had originally planned to have a flag to turn off type checking but the overall runtime is dominated by the actual running of the program. Some rough numbers in lib.rs(running on Ackermann.bril), loading in the program and creating the cfg takes 156 microseconds, type_check takes 7 microseconds, and execute_main takes 65 milliseconds. I could add flags like --check to only type check and --no-check to run without type checking but the performance will not be noticable(or measurable with hyperfine).
I mainly added it for the or_patterns feature which is slated to be added to stable in March. It's alot nicer to write:

Instruction::Value {
            op: ValueOps::And | ValueOps::Or,
            dest,
            op_type,
            args,
            funcs,
            labels,
        }

over

Instruction::Value {
            op: ValueOps::And,
            dest,
            op_type,
            args,
            funcs,
            labels,
        } |
Instruction::Value {
            op: ValueOps::Or,
            dest,
            op_type,
            args,
            funcs,
            labels,
        }

I probably need to go through one more time to check that the type checking is correct. I'll look at the work over in Algebraic types extension #107. The issues of undefined variables for to_ssa.py probably applies here as well.

sampsyo · 2021-01-15T13:34:02Z

Cool! Not sure a --no-check flag is strictly necessary (why not, I guess?), but I do think a "check only" flag would be super useful because the standalone checking seems broadly applicable.

or_patterns does seem cool. Any chance you might remember to come back in March and turn off the nightly requirement once the feature stabilizes? 😇

Pat-Lafon · 2021-01-27T07:26:24Z

Two more changes have been made. The first being the previously mentioned --check flag which only runs the type checking phase. Looking at the other implementations, this checker supports more bril extensions though I don't attempt any type inference like in the python version. There do not appear to be addition test cases provided by these implementations to test on(beyond inference/algebraic types) so I would be interested in cases where the brilirs checker is less correct. The second change is to replace unreachable!() with it's unsafe version unreachable_unchecked. This allows the compiler to optimize the From<&Value> functions into a branchless version as shown by godbolt which brings ~10ms reductions to the eight-queens and mat-mul benchmarks.

sampsyo · 2021-01-27T13:33:22Z

Neat!

Pat-Lafon · 2021-02-02T17:43:37Z

To bring things full circle, I've reimplemented a version of the original optimization which replaces string variable names with an int identifier. This provides another considerable improvement in performance. I implemented a first run of error messages by implementing the Display trait for InterpError. They don't quite line up with the brili error messages but it's hopefully a little nicer.

I've updated the setup instructions for brilirs. Currently they don't recommend the --release flag which might lead to a significant and unexpected performance hit for someone who isn't familiar with Rust/cargo. Instead I'm recommending to use the cargo install command which would allow a user to use brilirs as a true drop-in replacement for brili.

Some final numbers on my limited benchmark suite. Seeing the changes in brili times/variance suggests that benchmarking is hard.

Benchmark	`brili`	`brilirs`	Speedup
ackermann	545ms ±35ms	30.8ms ± 1.6ms	~17.7 ± 1.5
eight-queens	434ms ±3.4ms	18.2 ms ± 1.2ms	~24 ± 1.5
mat-mul	863ms ± 37ms	38.3ms ±2.7 ms	~22.5 ± 2

sampsyo · 2021-02-03T12:53:42Z

This all looks great. I fear I may have done the wrong thing by merging #117 first—looks like there may be a conflict to resolve now? In any case, it seems cool to merge this if you're interested. I'm still dreaming of getting rid of the nightly requirement, but I guess we can revisit that when the next Rust release rolls around.

Pat-Lafon · 2021-02-04T18:07:03Z

Merged

sampsyo

LGTM; thanks!

sampsyo reviewed Jan 14, 2021

View reviewed changes

Pat-Lafon added 10 commits February 4, 2021 12:59

benchmarking, removing clones, and fxhashmap

a4f7af9

Slightly better allocator

7d3e1cb

Do typechecking seperately and use nightly

933db48

--check arg and fmt check.rs

63ebc65

bump clap version

4b2d5ed

unsafe_unreachable and reference fixes

e913e0b

Use ints for variable names

1a84603

Slightly improved error messages

c1094d6

Some comments and documentation

c0853a9

document --check

f4b5be7

Pat-Lafon force-pushed the brilirs-benchmarking branch from f99b2dd to f4b5be7 Compare February 4, 2021 18:01

Pat-Lafon mentioned this pull request Feb 4, 2021

Brilirs stable #125

Merged

sampsyo approved these changes Feb 5, 2021

View reviewed changes

sampsyo merged commit 8ac601d into sampsyo:master Feb 5, 2021

Pat-Lafon deleted the brilirs-benchmarking branch March 7, 2021 00:05

Pat-Lafon mentioned this pull request Mar 31, 2022

Rust code doesn't conform to commonly accepted unsafe semantics #183

Closed

Speeding up Brilirs #122

Speeding up Brilirs #122

Uh oh!

Conversation

Pat-Lafon commented Jan 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sampsyo left a comment

Choose a reason for hiding this comment

Uh oh!

sampsyo Jan 14, 2021

Choose a reason for hiding this comment

Uh oh!

Pat-Lafon Jan 14, 2021

Choose a reason for hiding this comment

Uh oh!

Pat-Lafon Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

sampsyo Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

Pat-Lafon Jan 27, 2021

Choose a reason for hiding this comment

Uh oh!

Pat-Lafon commented Jan 14, 2021

Uh oh!

sampsyo commented Jan 15, 2021

Uh oh!

Pat-Lafon commented Jan 27, 2021

Uh oh!

sampsyo commented Jan 27, 2021

Uh oh!

Pat-Lafon commented Feb 2, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sampsyo commented Feb 3, 2021

Uh oh!

Pat-Lafon commented Feb 4, 2021

Uh oh!

sampsyo left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Pat-Lafon commented Jan 13, 2021 •

edited

Loading

Pat-Lafon commented Feb 2, 2021 •

edited

Loading