Skip to content

Conversation

@Pat-Lafon
Copy link
Contributor

@Pat-Lafon Pat-Lafon commented Jan 13, 2021

This pr does some of the low hanging fruit to bring brilirs to about 7.5x-11x faster than brili. Dependent on #117.

All but three benchmarks do not run for long enough to be accurately reported by hyperfine (hyperfine reports ~5ms for all of them with brilirs, about ~40-60ms when running with brili).

make release
./long_benchmark.sh 
Benchmark brili brilirs Speedup
ackermann 555ms ±44ms 68.9ms ± 2.5ms ~8 ± .7
eight-queens 511ms ±66ms 46.5 ms ± 9.2ms ~11 ± 2.5
mat-mul 876ms ± 84ms 82.9ms ±11 ms ~10.5 ± 2

This isn't attempting to be a rigorous benchmarking of brili vs brilirs but to more so show a significant improvement in the performance of brilirs(which started at ~1x-1.5x over brili on these benchmarks ).

Two of the most significant optimizations are a drastic reduction in the amount of clone()/String::from() calls and switching from Hashmap to FxHashMap with an initialized capacity.

Other optimizations relate to the release profile in cargo.toml, switching to the MiMalloc allocator and annotating many of the functions with #[inline(always)].

Typechecking has been pulled out and is now done first before executing main for a small, but noticeable performance increase. It should now also do a mostly complete validation of the bril code.

I made attempts to use async code, the smallvec crate, unsafe argument indexing, and program parallelism but there was no immediate improvement.

brilirs now uses nightly rust for or patterns in match statements and a slight performance increase.

Copy link
Owner

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's pretty cool! Those are some nice performance results.

Just a couple of high-level questions:

  • Now that the type checking is static rather than dynamic, it seems a little odd that it's coupled with the interpreter. Conceptually, it could be a separate flag or even a completely separate executable (like the current Python type checker and the proposed OCaml one in #107). IMO, a fast interpreter need not necessarily check types… just doing something crazy or crashing on bad programs is fine.
  • Nightly Rust does make things a bit more inconvenient to use (especially since the underlying library doesn't require it). If it's really necessary, it's probably OK, especially if the features you want seem like they're coming to stable Rust soon… but if it's just for a little extra performance, then it doesn't quite seem worth it to require it?

"../benchmarks/sum-bits.json" "../benchmarks/sum-sq-diff.json"
)
args=( "3 6" "128" "50" "7" "645634654" "8" "" "10" "101" "4 20" "8" "50 109658" "96 false" "496" "125" \
"-5 8 21" "" "8" "100" "" "42" "100")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little bit too bad that this script needs to re-list all the benchmarks and all their arguments instead of just using ../benchmarks/*.bril. Maybe Brench could help with this—it could even wrap a call to Hyperfine? Not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't really tried to use Brench yet but I will look into it. I was mostly going off of the blog post so I threw something together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closest I've gotten is the following one-liner as a single stage pipeline for brench. (export bril=$(bril2json </dev/stdin); export arg='3 6'; hyperfine -s basic --warmup 5 'echo $bril | brili -p $arg') where I'm only trying to get the Ackermann.bril benchmark to run. I get a nice ValueError: I/O operation on closed file. when running Brench which I'm unsure of. It runs as expected on it's own(providing the ackermann file to stdin).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird. Any chance it's a bash/sh difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#!/bin/sh
ps -p $$

(export bril=$(bril2json </dev/stdin); export arg='3 6'; hyperfine -s basic --warmup 5 'echo $bril | brili -p $arg') < ../benchmarks/ackermann.bril

This appears to show the above line using /bin/sh and running without issue.

@Pat-Lafon
Copy link
Contributor Author

  • I had originally planned to have a flag to turn off type checking but the overall runtime is dominated by the actual running of the program. Some rough numbers in lib.rs(running on Ackermann.bril), loading in the program and creating the cfg takes 156 microseconds, type_check takes 7 microseconds, and execute_main takes 65 milliseconds. I could add flags like --check to only type check and --no-check to run without type checking but the performance will not be noticable(or measurable with hyperfine).
  • I mainly added it for the or_patterns feature which is slated to be added to stable in March. It's alot nicer to write:
Instruction::Value {
            op: ValueOps::And | ValueOps::Or,
            dest,
            op_type,
            args,
            funcs,
            labels,
        }

over

Instruction::Value {
            op: ValueOps::And,
            dest,
            op_type,
            args,
            funcs,
            labels,
        } |
Instruction::Value {
            op: ValueOps::Or,
            dest,
            op_type,
            args,
            funcs,
            labels,
        }
  • I probably need to go through one more time to check that the type checking is correct. I'll look at the work over in Algebraic types extension #107. The issues of undefined variables for to_ssa.py probably applies here as well.

@sampsyo
Copy link
Owner

sampsyo commented Jan 15, 2021

Cool! Not sure a --no-check flag is strictly necessary (why not, I guess?), but I do think a "check only" flag would be super useful because the standalone checking seems broadly applicable.

or_patterns does seem cool. Any chance you might remember to come back in March and turn off the nightly requirement once the feature stabilizes? 😇

@Pat-Lafon
Copy link
Contributor Author

Two more changes have been made. The first being the previously mentioned --check flag which only runs the type checking phase. Looking at the other implementations, this checker supports more bril extensions though I don't attempt any type inference like in the python version. There do not appear to be addition test cases provided by these implementations to test on(beyond inference/algebraic types) so I would be interested in cases where the brilirs checker is less correct. The second change is to replace unreachable!() with it's unsafe version unreachable_unchecked. This allows the compiler to optimize the From<&Value> functions into a branchless version as shown by godbolt which brings ~10ms reductions to the eight-queens and mat-mul benchmarks.

@sampsyo
Copy link
Owner

sampsyo commented Jan 27, 2021

Neat!

@Pat-Lafon
Copy link
Contributor Author

Pat-Lafon commented Feb 2, 2021

To bring things full circle, I've reimplemented a version of the original optimization which replaces string variable names with an int identifier. This provides another considerable improvement in performance. I implemented a first run of error messages by implementing the Display trait for InterpError. They don't quite line up with the brili error messages but it's hopefully a little nicer.

I've updated the setup instructions for brilirs. Currently they don't recommend the --release flag which might lead to a significant and unexpected performance hit for someone who isn't familiar with Rust/cargo. Instead I'm recommending to use the cargo install command which would allow a user to use brilirs as a true drop-in replacement for brili.

Some final numbers on my limited benchmark suite. Seeing the changes in brili times/variance suggests that benchmarking is hard.

Benchmark brili brilirs Speedup
ackermann 545ms ±35ms 30.8ms ± 1.6ms ~17.7 ± 1.5
eight-queens 434ms ±3.4ms 18.2 ms ± 1.2ms ~24 ± 1.5
mat-mul 863ms ± 37ms 38.3ms ±2.7 ms ~22.5 ± 2

@sampsyo
Copy link
Owner

sampsyo commented Feb 3, 2021

This all looks great. I fear I may have done the wrong thing by merging #117 first—looks like there may be a conflict to resolve now? In any case, it seems cool to merge this if you're interested. I'm still dreaming of getting rid of the nightly requirement, but I guess we can revisit that when the next Rust release rolls around.

@Pat-Lafon Pat-Lafon force-pushed the brilirs-benchmarking branch from f99b2dd to f4b5be7 Compare February 4, 2021 18:01
@Pat-Lafon
Copy link
Contributor Author

Merged

@Pat-Lafon Pat-Lafon mentioned this pull request Feb 4, 2021
Copy link
Owner

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks!

@sampsyo sampsyo merged commit 8ac601d into sampsyo:master Feb 5, 2021
@Pat-Lafon Pat-Lafon deleted the brilirs-benchmarking branch March 7, 2021 00:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants