I know that a parser would be best suited to this, but in my current situation it has to be plain JavaScript.

I have a regex to find the closing body tag of an HTML document.

var closing_body_tag = /(<\/body>)/i;

However, this fails when the source has more than one set of body tags. So I was thinking about going with something like this:

var last_closing_body_tag = /(<\/body>)$/gmi;

This works when multiple tags are found, but for some reason it fails on cases with just one set of tags.

Am I making a mistake that would cause mixed results for single-tag cases?

Yes, I understand that more than one body tag is incorrect; however, we have to handle all bad source.

asked Apr 24, 2015 at 15:06 by Adam, edited Apr 24, 2015 at 15:16
  • 7 And why would you have more than one body tag ? – adeneo Commented Apr 24, 2015 at 15:08
  • 1 Just curious. Why do you need to find the closing body tag? What are you going to do with that? – hindmost Commented Apr 24, 2015 at 15:09
  • 3 You don't need jQuery for parsing HTML. – Ram Commented Apr 24, 2015 at 15:09
  • 1 @Adam You don't need Regexp for that. Use DOM manipulation methods instead – hindmost Commented Apr 24, 2015 at 15:11
  • 1 document.body.appendChild inserts an element right before the closing tag. A regex does not ? – adeneo Commented Apr 24, 2015 at 15:13

4 Answers

You can use this regex:

  /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i

(?![\s\S]*<\/body>[\s\S]*$) is a negative lookahead that ensures no further closing body tag appears between the match and the end of the string.


Sample code for adding a tag:

var re = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
var str = '<html>\n<body>\n</body>\n</html>\n<html>\n<body>\n</body>\n</html>';
// $& re-inserts the matched </body>, so the new tag lands just before it
var subst = '<tag/>$&';
var result = str.replace(re, subst);
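As a rough sketch of how the lookahead pattern behaves on both the single-tag and the multiple-tag case, the replacement can be wrapped in a small helper. The name insertBeforeLastBody and the sample strings are hypothetical, chosen only for illustration. The last lines check the $-anchored pattern from the question against the single-tag sample, where </body> is followed by </html> on the same line.

```javascript
// Hypothetical helper: inserts `snippet` immediately before the last </body>.
function insertBeforeLastBody(html, snippet) {
  var lastClosingBody = /<\/body>(?![\s\S]*<\/body>[\s\S]*$)/i;
  // A replacement function keeps the matched tag as written and avoids
  // any special treatment of "$" sequences inside `snippet`.
  return html.replace(lastClosingBody, function (tag) {
    return snippet + tag;
  });
}

// Sample inputs (assumptions for illustration only).
var single = '<html><body><p>one</p></body></html>';
var multiple = '<body>a</body>\n<body>b</BODY>\n';

var singleResult = insertBeforeLastBody(single, '<tag/>');
var multipleResult = insertBeforeLastBody(multiple, '<tag/>');

// The pattern from the question: with the m flag, $ matches only at the end
// of a line, so a </body> followed by more text on the same line is skipped.
var anchored = /(<\/body>)$/gmi;
var anchoredFindsSingle = anchored.test(single);

console.log(singleResult);
console.log(multipleResult);
console.log(anchoredFindsSingle);
```

This suggests one likely cause of the mixed results in the question: the anchored pattern depends on </body> being the last thing on its line, which has nothing to do with how many body tags the source contains.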

RegExp

As I suggested in the comments, use:

/^[\S\s]+(<\/body>)/i

How

This greedily matches all text up to the last </body>. The i flag makes the match case-insensitive. It works no matter how many body tags you have:

</body>
</BODY>
</BoDY>
</body><!--This one's selected-->

You said you were using JavaScript, where it can be used like this:

yourString.match(/^[\S\s]+(<\/body>)/i)[1];

.match returns the capture groups as long as the pattern does not have the g flag. To further explain this RegExp:

Explanation

^ matches at the beginning of the whole string, because the m flag is not set

[\S\s]+ matches as many characters as possible, newlines included, before the part that follows. The + can be replaced by a *

(<\/body>) captures the closing body tag that comes after the greedy part (so, the last one) as group 1

i makes the match case-insensitive (remove it if you want a case-sensitive match)
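The explanation above can be sketched as a helper that reports where the last closing tag starts. The function name lastClosingBodyIndex and the sample string are hypothetical. The sketch uses * rather than +, so that a string beginning with </body> still matches, and it checks for null, because .match returns null when nothing is found and indexing it with [1] would throw.

```javascript
// Hypothetical helper: returns the index of the last </body>, or -1 if none.
function lastClosingBodyIndex(html) {
  var match = html.match(/^[\S\s]*(<\/body>)/i);
  if (match === null) {
    return -1; // no closing body tag anywhere in the string
  }
  // The whole match ends right after the captured tag, so subtracting the
  // tag's length from the end of the match gives the tag's start position.
  return match.index + match[0].length - match[1].length;
}

// Sample input (an assumption for illustration only).
var source = '<body>a</body>\n<BODY>b</BoDY>\n</html>';
var index = lastClosingBodyIndex(source);
var tagAtIndex = source.slice(index, index + '</body>'.length);

console.log(index, tagAtIndex);
```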

JavaScript appendChild

If you have multiple body tags, you can still add an element before the closing tag.

var elem = document.createElement('div');
elem.setAttribute('id', 'mydiv');
elem.innerHTML = 'Foo';

Now, elem can be added in multiple ways:

1:

window.document.body.appendChild(elem);

2:

var body_elems = document.getElementsByTagName('body');
body_elems[body_elems.length - 1].appendChild(elem);

Use

/(.|[\r\n])*(<\/body>)/mi

as a regexp. The closing tag is in capture group 2 ($2).

This exploits greedy matching. Note that the 'any char' symbol (.) does not match newlines or carriage returns, which therefore need to be listed explicitly. The m flag only changes how ^ and $ behave, so it has no effect on this pattern.

The regex to match the last body tag is fairly simple:

/[\s\S]*(<\/body>)/i

What this does is match as many characters as possible (more specifically, any whitespace character or anything that's not whitespace) before </body>.

The i flag means that it'll match any case for </body>, so anything like:

</body>
</BODY>
</BodY>

Will all match.

I used [\s\S] instead of . because . matches everything except line terminators, which probably isn't what you want. \s matches all whitespace -- spaces, tabs, every kind of newline -- and \S is equivalent to [^\s], so it matches everything that isn't whitespace. Together, these match every possible character. I'd imagine a similar thing is possible with [\w\W], [\d\D], etc., but [\s\S] is my preference.
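To illustrate the difference between . and [\s\S], here is a minimal comparison on a multi-line string. The sample string and variable names are made up for the example. Because . stops at line breaks, the greedy part can never reach across lines, and the first closing tag is found instead of the last.

```javascript
// Sample input (an assumption for illustration only): two body blocks.
var html = '<body>\nfirst\n</body>\n<body>\nsecond\n</BODY>\n';

// . does not cross line breaks, so the greedy part stays on one line and
// the earliest position that can match is the first </body>.
var dotMatch = html.match(/.*(<\/body>)/i);

// [\s\S] matches any character, so the greedy part runs to the end of the
// string and backtracks to the last closing tag.
var anyCharMatch = html.match(/[\s\S]*(<\/body>)/i);

console.log(dotMatch.index, dotMatch[1]);
console.log(anyCharMatch.index, anyCharMatch[1]);
```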
