Fix regexp not removing multi-line comments #88
base: master
The previous regular expression had two issues:
1. It could only remove single-line comments.
2. It was greedy, so it could remove content between two comments. For example, when something like `<!-- comment --> content <!-- comment -->` contains no newline character, the content between the comments is removed as well.

We ran into this issue at https://gerrit.wikimedia.org/r/489323. As a temporary workaround we made all our comments single-line comments and made sure each comment is on a separate line.
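To make the two issues concrete, here is a minimal sketch that applies the old comment pattern from the diff to two sample strings. The variable names and the sample inputs are made up for illustration and are not taken from the project.

```javascript
// Old pattern from the diff: `.` stops at newlines and `*` is greedy.
const oldComment = /<!--.*-->/gi;

// Issue 1: a comment that spans two lines is not matched at all.
const multiLine = "<svg><!-- first\nsecond --><path/></svg>";
const multiLineResult = multiLine.replace(oldComment, "");

// Issue 2: two comments on the same line are matched as one span,
// so the content between them is removed as well.
const sameLine = "<!-- comment --> content <!-- comment -->";
const sameLineResult = sameLine.replace(oldComment, "");

console.log(multiLineResult === multiLine);  // true: nothing was removed
console.log(JSON.stringify(sameLineResult)); // "": the content is gone too
```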
  // SVG XML -> HTML5
- [/\<([A-Za-z]+)([^\>]*)\/\>/g, "<$1$2></$1>"], // convert self-closing XML SVG nodes to explicitly closed HTML5 SVG nodes
+ [/\<([a-z]+)([^\>]*)\/\>/gi, "<$1$2></$1>"], // convert self-closing XML SVG nodes to explicitly closed HTML5 SVG nodes
In case anyone else is a little rusty on their RegExps, this is a readability improvement: it drops `A-Z` from the character class and makes `a-z` case-insensitive with the `i` flag, so the pattern matches the same tag names as before.
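As a quick sanity check of that claim, the sketch below runs both versions of the pattern over the same input and compares the results. The sample SVG string and the variable names are invented for this example.

```javascript
// The two patterns from the diff, before and after the change.
const before = /\<([A-Za-z]+)([^\>]*)\/\>/g;
const after = /\<([a-z]+)([^\>]*)\/\>/gi;

// Hypothetical input with upper-case and mixed-case self-closing tags.
const svg = '<svg><PATH d="M0 0"/><linearGradient id="g"/></svg>';

const expandedBefore = svg.replace(before, "<$1$2></$1>");
const expandedAfter = svg.replace(after, "<$1$2></$1>");

console.log(expandedBefore === expandedAfter); // true
console.log(expandedAfter);
// <svg><PATH d="M0 0"></PATH><linearGradient id="g"></linearGradient></svg>
```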
  [/<\?xml[\s\S]*?>/gi, ""],
  [/<!doctype[\s\S]*?>/gi, ""],
- [/<!--.*-->/gi, ""],
+ [/<!--[\s\S]*?-->/g, ""],
Again, in case anyone else is rusty: `.` does not match newline characters, so the old pattern could never span lines. `(.|\n)` would match them, but it adds a capturing group, so `[\s\S]` is used instead. It matches every whitespace and every non-whitespace character, which together form the set of all characters. The pattern contains no letters, so the `i` flag has no effect and is dropped. `*?` is like `*` but non-greedy, so each match stops at the first `-->`.
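For illustration, here is a small sketch of the new pattern on an input that mixes a same-line comment with a multi-line one, plus a comparison with the `(.|\n)` alternative. The input string and variable names are assumptions made up for this example.

```javascript
// New pattern from the diff: [\s\S] crosses newlines, *? is non-greedy.
const newComment = /<!--[\s\S]*?-->/g;

// Hypothetical input: one single-line comment, then one multi-line comment.
const input = "<!-- a --> content <!-- line 1\nline 2 --><path/>";
const cleaned = input.replace(newComment, "");

console.log(JSON.stringify(cleaned)); // " content <path/>"

// The (.|\n) alternative also matches, but it adds a capture group.
const withGroup = "<!--x-->".match(/<!--(.|\n)*?-->/);
const withClass = "<!--x-->".match(/<!--[\s\S]*?-->/);

console.log(withGroup.length); // 2: full match plus one captured group
console.log(withClass.length); // 1: full match only
```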