Improve lexer performance by 5-10% overall, improve string lexer performance 15% #149689
Conversation
r? @nnethercote

rustbot has assigned @nnethercote.
This is the benchmark library I use to track performance changes:
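The suite itself is external; as a rough illustration of the idea, a self-contained micro-benchmark of a scanning hot loop can be built with nothing but `std::time::Instant` (all names here are hypothetical, not from the linked suite):

```rust
use std::time::Instant;

// Toy workload standing in for a lexer hot loop: count newline bytes.
fn count_newlines(s: &str) -> usize {
    s.bytes().filter(|&b| b == b'\n').count()
}

fn main() {
    let input = "fn main() {}\n".repeat(10_000);
    let start = Instant::now();
    let mut total = 0;
    for _ in 0..100 {
        total += count_newlines(&input);
    }
    println!("counted {} newlines in {:?}", total, start.elapsed());
    assert_eq!(total, 1_000_000);
}
```

A real suite would use a statistics-aware harness (such as Criterion) rather than a single `Instant` measurement, since one-shot timings are noisy.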
@bors try @rust-timer queue |
Finished benchmarking commit (e0cf684): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive, so we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never.

- Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
- Max RSS (memory usage): Results (secondary 2.1%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Cycles: Results (primary 3.1%, secondary 1.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 470.249s -> 469.703s (-0.12%)
Thanks for looking into this. I like the attempt and the careful measurements. Unfortunately it doesn't seem to help full-compiler performance, as measured by rust-timer, and it slightly regresses a few benchmarks. Lexing is a really small component of overall compilation time, and it's already been micro-optimized quite a bit, so it's hard for it to have much effect on performance. I suspect adding #[inline] to all those functions had the biggest effect and caused the regressions.
compiler/rustc_lexer/src/cursor.rs (outdated)

```rust
/// Bumps the cursor if the next character is either of the two expected characters.
#[inline]
pub(crate) fn bump_if2(&mut self, expected1: char, expected2: char) -> bool {
```
I would call this bump_if_either. bump_if2 makes me think that expected1 must be followed by expected2.
Sure, I can rename it.
I think it would be logical to rename eat_past2 to eat_past_either too, and to use byte1 and byte2 to match the already existing eat_until style. What do you suggest?
compiler/rustc_lexer/src/cursor.rs (outdated)

```diff
 #[inline]
 pub(crate) fn bump_bytes(&mut self, n: usize) {
-    self.chars = self.as_str()[n..].chars();
+    self.chars = self.as_str().get(n..).unwrap_or("").chars();
```
What's the thinking behind this change?
It removes the panic-handling code generation and branching; in my experiments this is always faster, even when the panic never actually happens. If LLVM can prove that the index never panics, the unwrap_or is optimized away just like the panic handling would be.
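A minimal sketch of the difference, using standalone functions with hypothetical names (the actual change lives in Cursor::bump_bytes):

```rust
// Indexing version: the compiler must emit a bounds check plus
// panic-handling machinery for the out-of-range case.
fn rest_indexed(s: &str, n: usize) -> &str {
    &s[n..]
}

// get + unwrap_or version: the out-of-range case collapses to "",
// so no panic path is generated. If LLVM can prove n is always in
// range, the fallback branch is optimized away entirely.
fn rest_checked(s: &str, n: usize) -> &str {
    s.get(n..).unwrap_or("")
}

fn main() {
    assert_eq!(rest_indexed("hello", 2), "llo");
    assert_eq!(rest_checked("hello", 2), "llo");
    // The checked version silently returns "" instead of panicking.
    assert_eq!(rest_checked("hello", 99), "");
}
```

Note that the two versions also differ in behavior, not just codegen: the checked one masks out-of-range indices (and non-char-boundary indices) instead of panicking, which is only acceptable when that case is known to be unreachable.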
It seems I was wrong and this doesn't apply everywhere: even in my own benchmark suite it reduces performance in cursor_eat_until/eat_until_newline. I'm going to remove this.
It seems that it only works if the function is #[inline].
compiler/rustc_lexer/src/lib.rs (outdated)

```diff
-let nl_fence_pattern = format!("\n{:-<1$}", "", length_opening as usize);
-if let Some(closing) = self.as_str().find(&nl_fence_pattern) {
+#[inline]
+fn find_closing_fence(s: &str, dash_count: usize) -> Option<usize> {
```
Micro-optimizing frontmatter lexing doesn't seem worthwhile. It's just a tiny fraction of general lexing.
I agree that frontmatter lexing doesn't need to be heavily optimized. That said, the memchr version eliminates a possible heap allocation (rare in practice, but still nice to avoid) and gives a ~4× speedup on that path. It improves the overall tone and consistency of the lexer code, and there's no real harm in keeping it. Totally your call, though; if you'd rather drop it for simplicity, I can remove it without issue.
I removed it in my last commit, but let me know if you are interested in having it back.
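For reference, the allocation-free search can be sketched roughly like this, using std's byte search as a stand-in for `memchr` (the `find_closing_fence` name and `dash_count` parameter come from the diff above; the exact fence semantics are an assumption, not the real rustc logic):

```rust
// Find the byte offset of a '\n' that is followed by at least
// `dash_count` dashes. Unlike the format!-based version, this never
// allocates a pattern string such as "\n----".
fn find_closing_fence(s: &str, dash_count: usize) -> Option<usize> {
    let bytes = s.as_bytes();
    let mut pos = 0;
    // Stand-in for memchr: locate each newline, then check the dashes after it.
    while let Some(nl) = bytes[pos..].iter().position(|&b| b == b'\n') {
        let nl = pos + nl;
        let after = &bytes[nl + 1..];
        if after.len() >= dash_count && after[..dash_count].iter().all(|&b| b == b'-') {
            return Some(nl);
        }
        pos = nl + 1;
    }
    None
}

fn main() {
    let src = "title: x\n---\nrest";
    assert_eq!(find_closing_fence(src, 3), Some(8)); // '\n' before "---"
    assert_eq!(find_closing_fence(src, 4), None);    // fence too short
}
```

With the real `memchr` crate, the `position` call would become `memchr::memchr(b'\n', &bytes[pos..])`, which is SIMD-accelerated.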
Thank you for reviewing this PR. I'm exploring the rustc codebase in my spare time, and the lexer was the first part I dove into. I'm just trying to contribute what I can to help improve the compiler's performance. I'm happy to drop the
…per function names for better readability
Let's do another perf run just for completeness:

@bors try @rust-timer queue

What to do will depend on the result there. Overall this does add more code without particularly improving readability, IMO, so if there's no perf improvement the impetus to merge is low.

It is cool that you are looking at performance, though. If you have a Linux machine then I would recommend trying out rustc-perf and using that as a starting point for investigating compiler performance.
Finished benchmarking commit (1566a60): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive, so we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never.

- Instruction count: Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
- Max RSS (memory usage): Results (secondary 3.2%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Cycles: Results (primary -2.7%, secondary -0.4%). A less reliable metric. May be of interest, but not used to determine the overall result above.
- Binary size: This benchmark run did not return any relevant results for this metric.

Bootstrap: 471.709s -> 470.591s (-0.24%)
Thanks for taking the time to review this PR!

```rust
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.eat_past_either(b'"', b'\\') {
        match c {
            b'"' => {
                return true;
            }
            b'\\' => _ = self.bump_if_either('\\', '"'),
            _ => unreachable!(),
        }
    }
    false
}
```

vs

```rust
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.bump() {
        match c {
            '"' => {
                return true;
            }
            '\\' if self.first() == '\\' || self.first() == '"' => {
                // Bump again to skip escaped character.
                self.bump();
            }
            _ => (),
        }
    }
    // End of file reached.
    false
}
```
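As a sanity check that the two loops agree, both strategies can be modeled on a plain byte slice. This is a simplified stand-in for the real `Cursor`, with `eat_past_either` approximated by a linear scan rather than `memchr2`; the function names are hypothetical:

```rust
// Model of the jump-scanning version: leap to the next '"' or '\\'.
fn scan_jump(s: &[u8]) -> bool {
    let mut i = 0;
    // Stand-in for eat_past_either(b'"', b'\\').
    while let Some(off) = s[i..].iter().position(|&b| b == b'"' || b == b'\\') {
        let c = s[i + off];
        i += off + 1;
        if c == b'"' {
            return true; // closing quote found
        }
        // c == b'\\': skip an escaped '\\' or '"', like bump_if_either.
        if i < s.len() && (s[i] == b'\\' || s[i] == b'"') {
            i += 1;
        }
    }
    false // end of input: unterminated string
}

// Model of the original byte-at-a-time version.
fn scan_bump(s: &[u8]) -> bool {
    let mut i = 0;
    while i < s.len() {
        let c = s[i];
        i += 1;
        match c {
            b'"' => return true,
            b'\\' if i < s.len() && (s[i] == b'\\' || s[i] == b'"') => i += 1,
            _ => {}
        }
    }
    false
}

fn main() {
    let cases: [&[u8]; 4] = [b"abc\"", b"a\\\"still open", b"no close", b"\\\\\""];
    for case in cases {
        assert_eq!(scan_jump(case), scan_bump(case));
    }
}
```

The observable behavior is identical; the performance argument is purely that the jump version lets a vectorized two-byte search replace the per-character loop.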
For the quoted example, the new code is slightly shorter but it requires the separate function, and it's clunkier in a way because it does a double comparison of the char: first in eat_past_either and then again in the match.

If you want to morph this PR into a readability-oriented one instead of a performance-oriented one, that's fine, but at the moment it feels a bit like an attempt to do both and it's not quite working on either front.
…ic and improve readability in lexer
Thank you for your response. I've removed the

As I said before, the decision is entirely yours and I fully respect it. I've invested a lot of time into this PR, and naturally I'd love to see it merged, but my only real motivation is to help make Rust faster. If you feel it doesn't belong here, it's better not to merge it at all. I'm currently reading the compiler source code and will follow your guidance by using

P.S. Now I think it is actually much cleaner:

```rust
fn double_quoted_string(&mut self) -> bool {
    debug_assert!(self.prev() == '"');
    while let Some(c) = self.eat_past_either(b'"', b'\\') {
        if c == b'"' {
            return true;
        }
        // Current is '\\', bump again if next is an escaped character.
        self.bump_if_either('\\', '"');
    }
    // End of file reached.
    false
}
```
Hi, this PR improves lexer performance by ~5-10% when lexing the entire standard library. It specifically targets the string lexer, comment lexer, and frontmatter lexer.
- Added an `eat_past2` function that leverages `memchr2`.
- Removed the `format!` and rewrote the frontmatter lexer using `memchr`-based scanning, which is roughly 4× faster.
- I also applied a few minor optimizations in other areas.
I’ll send the benchmark repo in the next message. Here are the results on my x86_64 laptop (AMD 6650U):