I know that a parser would be best suited for this situation, but in my current situation it has to be plain JavaScript.
I have a regex to find the closing body tag of an HTML doc:
var closing_body_tag = /(<\/body>)/i;
However, this fails when the source has more than one body tag set. So I was thinking about going with something like this:
var last_closing_body_tag = /(<\/body>)$/gmi;
This works when multiple tags are present, but for some reason it fails in cases with just one set of tags.
Am I making a mistake that would cause mixed results for single-tag cases?
Yes, I understand that more than one body tag is incorrect; however, we have to handle all bad source.
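One plausible explanation for the mixed results, sketched below with made-up sample strings: with the m flag, $ means "end of a line", so </body> only matches when nothing else follows it on that line (a document ending in </body></html> on one line never matches). Separately, because of the g flag, reusing the same regex object with test() or exec() carries lastIndex over between calls, which can make identical input alternate between success and failure. This is a diagnosis under those assumptions, not a confirmed account of the asker's code.

```javascript
// Pattern from the question. With the `m` flag, `$` means "end of a line",
// so </body> only matches when nothing follows it on the same line.
const lastClosingBodyTag = /(<\/body>)$/gmi;

// Made-up sample documents, each containing a single body.
const singleLine = '<html><body>hi</body></html>';
const multiLine = '<html>\n<body>\nhi\n</body>\n</html>';

const singleResult = singleLine.match(lastClosingBodyTag); // null
const multiResult = multiLine.match(lastClosingBodyTag);   // ['</body>']

// With the `g` flag, test() and exec() resume from lastIndex, so reusing
// one regex object on the same string can alternate between true and false.
const doc = '<body>\n</body>\n';
const firstTest = lastClosingBodyTag.test(doc);  // true, lastIndex is now 14
const secondTest = lastClosingBodyTag.test(doc); // false, search began at 14

console.log(singleResult, multiResult, firstTest, secondTest);
```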
Asked Apr 24, 2015 at 15:06 by Adam; edited Apr 24, 2015 at 15:16.
- And why would you have more than one body tag? – adeneo, Apr 24, 2015 at 15:08
- Just curious. Why do you need to find the closing body tag? What are you going to do with that? – hindmost, Apr 24, 2015 at 15:09
- You don't need jQuery for parsing HTML. – Ram, Apr 24, 2015 at 15:09
- @Adam You don't need a regexp for that. Use DOM manipulation methods instead. – hindmost, Apr 24, 2015 at 15:11
- document.body.appendChild inserts an element right before the closing tag. A regex does not? – adeneo, Apr 24, 2015 at 15:13
4 Answers
You can use this regex:
/<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i
The lookahead (?![\s\S]*<\/body>[\s\S]*$) ensures that no further closing body tag follows the match before the end of the string, so only the last one matches.
Sample code for adding a tag:
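var re = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
var str = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
// $& stands for the matched </body>, so <tag/> is inserted before the
// last closing body tag instead of replacing it.
var subst = '<tag/>$&';
var result = str.replace(re, subst);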
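A sketch of the same lookahead wrapped in a reusable function (the name insertBeforeLastBodyClose is hypothetical). It puts the matched tag back through a replacer function, so the new content lands before the last </body> rather than replacing it; a function also avoids $ sequences in the snippet being read as replacement patterns. Sample strings are made up.

```javascript
// Regex from this answer: a </body> that is not followed by another </body>.
const LAST_BODY_CLOSE = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;

// Hypothetical helper: insert `snippet` right before the last </body>.
// The replacer's return value is used literally, so `$` in `snippet` is safe,
// and the matched tag (in its original case) is appended back unchanged.
function insertBeforeLastBodyClose(html, snippet) {
  return html.replace(LAST_BODY_CLOSE, (tag) => snippet + tag);
}

const twoBodies = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
const oneBody = '<html><body>hi</BODY></html>';

console.log(insertBeforeLastBodyClose(twoBodies, '<tag/>'));
console.log(insertBeforeLastBodyClose(oneBody, '<tag/>'));
```

If there is no </body> at all, replace finds nothing and the input is returned unchanged.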
RegExp
As I suggested in the comments, use:
/^[\S\s]+(<\/body>)/i
How
This greedily matches all text up to the last </body>; the i flag makes it case-insensitive. It works no matter how many body tags you have:
</body>
</BODY>
</BoDY>
</body><!--This one's selected-->
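Since you are using JavaScript, you can use it like this:
yourString.match(/^[\S\s]+(<\/body>)/i)[1];
.match returns the capture groups here because the regex has no g flag (with g it returns only the full matches). It returns null when there is no </body> at all, so check the result before indexing it. To explain this RegExp further: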
Explanation
^: matches at the beginning of the whole string (not of each line, because there is no m flag).
[\S\s]+: greedily matches everything up to what follows. The + can be replaced by a * (which also allows a </body> at the very start of the string).
(<\/body>): captures the body tag that comes after everything matched so far, i.e. the last one.
i: the i flag makes the match case-insensitive (remove it if you want case-sensitive matching).
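A minimal sketch of using this greedy pattern to locate the last tag (the function name lastBodyCloseIndex is hypothetical). It swaps + for *, as the explanation allows, and guards against the null that .match returns when there is no </body>. Because of ^ the match starts at index 0, so the captured tag begins at the whole match length minus the tag length.

```javascript
// Greedy pattern from this answer, with `*` instead of `+` so that a
// </body> at the very start of the string is also found.
const GREEDY_LAST_BODY = /^[\s\S]*(<\/body>)/i;

// Hypothetical helper: index of the last </body> (any case), or -1.
function lastBodyCloseIndex(html) {
  const m = html.match(GREEDY_LAST_BODY);
  if (m === null) return -1; // no </body> anywhere in the string
  // The match runs from index 0 to just past the last </body>,
  // so the tag itself starts at (match length - tag length).
  return m[0].length - m[1].length;
}

console.log(lastBodyCloseIndex('<body></body><body></BODY>'));
console.log(lastBodyCloseIndex('no tags'));
```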
JavaScript appendChild
If you have multiple body tags, you can still add an element before the closing tag.
var elem = document.createElement('div');
elem.setAttribute('id', 'mydiv');
elem.innerHTML = 'Foo';
Now, elem can be added in multiple ways:
1:
window.document.body.appendChild(elem);
2:
var body_elems = document.getElementsByTagName('body');
body_elems[body_elems.length - 1].appendChild(elem);
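Use
/(.|[\r\n])*(<\/body>)/mi
as a regexp. The last closing tag is in capture group 2 ($2).
This exploits greedy matching: (.|[\r\n])* consumes as much as possible and then backtracks to the last </body>. Note that the "any char" symbol . does not match newlines or carriage returns, so they have to be listed explicitly. The m flag is not actually required here, since it only changes how ^ and $ behave.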
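A short sketch, with a made-up sample string, of what the groups in this pattern hold. One caution: a repeated group keeps only its final iteration, so $1 is a single character rather than the text before the tag, and it cannot be used to rebuild that prefix in a replacement. The last part assumes an ES2018+ engine, where the s (dotAll) flag lets . match line terminators.

```javascript
// Pattern from this answer: group 1 is the repeated "any character",
// group 2 is the last </body>.
const anyCharThenBody = /(.|[\r\n])*(<\/body>)/mi;

// Made-up sample with two closing tags in different case.
const mixedCaseDoc = '<body>\n</body>\n<body>\n</Body>\n</html>';

const groups = mixedCaseDoc.match(anyCharThenBody);
const lastTag = groups[2];  // '</Body>': the last closing tag, case preserved
const groupOne = groups[1]; // '\n': only the final iteration is captured

// Alternative assuming ES2018+: with the `s` flag, `.` also matches
// line terminators, so no alternation is needed.
const dotAll = /.*(<\/body>)/is;
const lastTagDotAll = mixedCaseDoc.match(dotAll)[1]; // '</Body>'

console.log(lastTag, JSON.stringify(groupOne), lastTagDotAll);
```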
The regex to match the last body tag is fairly simple:
/[\s\S]*(<\/body>)/i
What this does is match as many characters as possible (more specifically, anything that is whitespace or not whitespace) before the last </body>.
The i flag means that it matches </body> in any case, so all of these match:
</body>
</BODY>
</BodY>
I used [\s\S] instead of . because . matches everything except line terminators (such as \n and \r), which probably isn't what you want. \s matches all whitespace (spaces, tabs, every kind of newline), and \S is equivalent to [^\s], so it matches everything that isn't whitespace. Together, they match every possible character. The same trick works with [\w\W], [\d\D], etc., but [\s\S] is my preference.
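A short sketch, with made-up sample strings, of this pattern and of the character-class point above. Note that the / inside </body> has to be escaped in a JavaScript regex literal, since an unescaped / ends the literal.

```javascript
// Pattern from this answer, written as a valid JavaScript regex literal.
const lastBody = /[\s\S]*(<\/body>)/i;

// Made-up sample: the first pair is upper-case, the second lower-case.
const sampleHtml = '<BODY>one</BODY>\n<body>two</body>\n</html>';
const captured = sampleHtml.match(lastBody)[1]; // '</body>', from the second pair

// Why [\s\S] instead of `.`: without the `s` flag, `.` does not match \n.
const dotCrossesNewline = /a.b/.test('a\nb');            // false
const classCrossesNewline = /a[\s\S]b/.test('a\nb');     // true
const wordPairCrossesNewline = /a[\w\W]b/.test('a\nb');  // true
const digitPairCrossesNewline = /a[\d\D]b/.test('a\nb'); // true

console.log(captured, dotCrossesNewline, classCrossesNewline,
  wordPairCrossesNewline, digitPairCrossesNewline);
```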
Title: javascript - Regex find last body tag - Stack Overflow. Source: http://it.en369.cn/questions/1745629118a2160051.html (user-contributed content; views are the authors' own).