I have an HTML document that might have &lt; and &gt; in some of the attributes. I am trying to extract this and run it through an XSLT, but the XSLT engine errors, telling me that < is not valid inside of an attribute.
I did some digging and found that it is properly escaped in the source document, but when this is loaded into the DOM via innerHTML, the DOM is unencoding the attributes. Strangely, it does this for &lt; and &gt;, but not for some others like &amp;.
Here is a simple example:
var div = document.createElement('DIV');
div.innerHTML = '<div asdf="&lt;50" fdsa="&amp;50"></div>';
console.log(div.innerHTML); // as described above, this logged: <div asdf="<50" fdsa="&amp;50"></div>
I'm assuming that the DOM implementation decided that HTML attributes can be less strict than XML attributes, and that this is "working as intended". My question is, can I work around this without writing some horrible regex replacement?
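The asymmetry comes from serialization, not parsing: the parser decodes every reference (&lt; and &amp; alike), and innerHTML then re-escapes only some characters when it writes the markup back out. The sketch below models that rule as the HTML spec described it around 2015 (in attribute values, & becomes &amp;, U+00A0 becomes &nbsp; and " becomes &quot;, while < and > are left as-is; later spec revisions may also escape < and >, so check the browsers you target) next to the escaping XML requires. Both helper names are hypothetical.

```javascript
// Hypothetical helper modelling innerHTML's attribute-value escaping (HTML spec, c. 2015).
// Only &, U+00A0 and " are escaped; < and > pass through untouched.
function htmlAttrEscape(value) {
  return value
    .replace(/&/g, '&amp;') // first, so the entities added below are not escaped again
    .replace(/\u00A0/g, '&nbsp;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: escaping that keeps an attribute value well-formed XML.
// XML forbids a raw < (and a raw &) in attribute values; > is escaped for symmetry.
function xmlAttrEscape(value) {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// What the parser stores for asdf="&lt;50" fdsa="&amp;50": every reference decoded.
const parsed = { asdf: '<50', fdsa: '&50' };
for (const [name, value] of Object.entries(parsed)) {
  console.log(`${name}="${htmlAttrEscape(value)}"`, 'vs', `${name}="${xmlAttrEscape(value)}"`);
}
```

Run against the question's values, this prints asdf="<50" vs asdf="&lt;50" and fdsa="&amp;50" vs fdsa="&amp;50": only the < case differs, which is exactly the character the XSLT engine rejects.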
asked Oct 6, 2015 at 15:31 by murrayju
@Abel I am using jQuery's .html(); I just attempted to reduce it down to where I think the "problem" is occurring. The source document is XML, which I run through a browser XSLT before inserting with .html(). Later I take it through the inverse process to get the XML back out. I just find it strange that the DOM is unescaping this character (and not others). – murrayju Commented Oct 6, 2015 at 16:13
I can't modify the source XML, and need to preserve the same content in the output at the end. I could run whatever transforms are necessary in the middle, but am looking for a way to do it better than some regex replace, especially considering the character is <, which the document is full of. – murrayju Commented Oct 6, 2015 at 16:16
@Abel my only goal is to get it back out of the DOM the same way it went in (as &lt;). I'm putting it in with .text(string) and getting it out with .text(). The problem I have with this round trip is that the input doesn't equal the output (only in this case). – murrayju Commented Oct 6, 2015 at 16:40
Ah, sorry. Well, that is probably only possible with other DOM methods, not with innerHTML. I.e., this works: div.firstChild.attributes['title']. But this requires a whole lot of extra machinery to "mimic" innerHTML. – Abel Commented Oct 6, 2015 at 16:45
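Abel's suggestion, sketched: read each attribute's already-decoded value through DOM methods and do the escaping yourself, so that < comes out as &lt;. To keep the sketch runnable outside a browser it takes plain [name, value] pairs (in a browser they could come from Array.from(el.attributes, a => [a.name, a.value])). serializeStartTag and escapeXmlAttr are hypothetical names, and child nodes, namespaces and end tags are left out.

```javascript
// Hypothetical helper: escape an already-decoded attribute value for XML output.
function escapeXmlAttr(value) {
  return value
    .replace(/&/g, '&amp;') // first, so the entities added below are not escaped again
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: rebuild a start tag from a tag name and [name, value] pairs.
// In a browser: serializeStartTag(el.localName, Array.from(el.attributes, a => [a.name, a.value]))
function serializeStartTag(tagName, attrPairs) {
  const attrs = attrPairs
    .map(([name, value]) => ` ${name}="${escapeXmlAttr(value)}"`)
    .join('');
  return `<${tagName}${attrs}>`;
}

// Values as the DOM hands them back: already decoded.
console.log(serializeStartTag('div', [['asdf', '<50'], ['fdsa', '&50']]));
// <div asdf="&lt;50" fdsa="&amp;50">
```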
4 Answers
Try XMLSerializer:
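// Compare the HTML and the XML serialization of the same element
var div = document.getElementById('d1');
var pre = document.createElement('pre');
pre.textContent = div.outerHTML; // HTML serializer: < and > in attribute values come out unescaped
document.body.appendChild(pre);
pre = document.createElement('pre');
pre.textContent = new XMLSerializer().serializeToString(div); // XML serializer: escaped as &lt; and &gt;
document.body.appendChild(pre);
<!-- markup used by the snippet -->
<div id="d1" data-foo="a &lt; b &amp;&amp; b &gt; c">This is a test</div>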
You might need to adapt the XSLT to take account of the XHTML namespace XMLSerializer inserts (at least here in a test with Firefox).
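I am not sure if this is what you are looking for, but do have a look.
var div1 = document.createElement('DIV');
var div2 = document.createElement('DIV');
// Store the escaped text itself as the attribute value
div1.setAttribute('asdf', '&lt;50');
div1.setAttribute('fdsa', '&amp;50');
div2.appendChild(div1);
// innerHTML escapes every & as &amp;; stripping that extra layer restores the original escaping
console.log(div2.innerHTML.replace(/&amp;/g, '&'));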
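A string-only sketch of why this works, assuming the attributes are set to the escaped text itself (e.g. '&lt;50'): the serializer turns each & into &amp;, and the final replace peels that one extra layer off again. serializeAttr is a hypothetical stand-in for the innerHTML serializer that models only its & and " escaping, so the sketch runs without a DOM.

```javascript
// Hypothetical stand-in for innerHTML's attribute escaping (only & and " are modelled).
function serializeAttr(name, value) {
  return `${name}="${value.replace(/&/g, '&amp;').replace(/"/g, '&quot;')}"`;
}

// Attribute values stored as the literal escaped text from the source document.
const stored = [['asdf', '&lt;50'], ['fdsa', '&amp;50']];
const markup = `<div ${stored.map(([n, v]) => serializeAttr(n, v)).join(' ')}></div>`;
console.log(markup);                        // <div asdf="&amp;lt;50" fdsa="&amp;amp;50"></div>
console.log(markup.replace(/&amp;/g, '&')); // <div asdf="&lt;50" fdsa="&amp;50"></div>
```

Because the replace runs over the whole markup string, it also turns an &amp; that belongs in a text node back into a bare &, so the trick assumes every & in the content was stored pre-escaped.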
What ended up working best for me was to double-escape these using an XSLT on the incoming document (and reverse this on the outgoing doc). So &lt; in an attribute becomes &amp;lt;. Thanks to @Abel for the suggestion.
Here is the XSLT I added, in case others find it helpful:
First is a template for doing string replacements in XSLT 1.0. If you can use XSLT 2.0, you can use the built-in replace() function instead.
<xsl:template name="string-replace-all">
  <xsl:param name="text"/>
  <xsl:param name="replace"/>
  <xsl:param name="by"/>
  <xsl:choose>
    <xsl:when test="contains($text, $replace)">
      <xsl:value-of select="substring-before($text,$replace)"/>
      <xsl:value-of select="$by"/>
      <xsl:call-template name="string-replace-all">
        <xsl:with-param name="text" select="substring-after($text,$replace)"/>
        <xsl:with-param name="replace" select="$replace"/>
        <xsl:with-param name="by" select="$by"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
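For comparison, the recursive template ported line for line to JavaScript, as a sketch of how it walks the string: emit everything before the first match, then the replacement, then recurse on the rest. stringReplaceAll is a hypothetical name, and the empty-string guard is an addition of this sketch.

```javascript
// Port of the XSLT 1.0 "string-replace-all" template, kept close to the original.
function stringReplaceAll(text, replace, by) {
  if (replace === '') return text;  // guard added here: an empty search string would recurse forever
  const i = text.indexOf(replace);  // contains($text, $replace)
  if (i === -1) return text;        // xsl:otherwise
  return text.slice(0, i)           // substring-before($text, $replace)
    + by                            // $by
    + stringReplaceAll(text.slice(i + replace.length), replace, by); // recurse on substring-after
}

console.log(stringReplaceAll('a < b < c', '<', '&lt;')); // a &lt; b &lt; c
```

Outside XSLT 1.0 none of this recursion is needed: text.split(replace).join(by) gives the same result.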
Next are the templates that do the specific replacements I need:
<!-- xml -> html -->
<xsl:template name="replace-html-codes">
  <xsl:param name="text"/>
  <xsl:variable name="lt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$text"/>
      <xsl:with-param name="replace" select="'&lt;'"/>
      <xsl:with-param name="by" select="'&amp;lt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:variable name="gt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$lt"/>
      <xsl:with-param name="replace" select="'&gt;'"/>
      <xsl:with-param name="by" select="'&amp;gt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$gt"/>
</xsl:template>
<!-- html -> xml -->
<xsl:template name="restore-html-codes">
  <xsl:param name="text"/>
  <xsl:variable name="lt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$text"/>
      <xsl:with-param name="replace" select="'&amp;lt;'"/>
      <xsl:with-param name="by" select="'&lt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:variable name="gt">
    <xsl:call-template name="string-replace-all">
      <xsl:with-param name="text" select="$lt"/>
      <xsl:with-param name="replace" select="'&amp;gt;'"/>
      <xsl:with-param name="by" select="'&gt;'"/>
    </xsl:call-template>
  </xsl:variable>
  <xsl:value-of select="$gt"/>
</xsl:template>
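The two templates above, expressed in JavaScript as a sketch for checking the round trip: replaceHtmlCodes turns the characters < and > into the literal text &lt; and &gt; before the document reaches the DOM, and restoreHtmlCodes reverses that on the way out. The names mirror the XSLT templates but the functions are hypothetical; split/join stands in for string-replace-all.

```javascript
// xml -> html: mirrors "replace-html-codes" (< first, then >).
function replaceHtmlCodes(text) {
  return text.split('<').join('&lt;').split('>').join('&gt;');
}

// html -> xml: mirrors "restore-html-codes".
function restoreHtmlCodes(text) {
  return text.split('&lt;').join('<').split('&gt;').join('>');
}

const original = 'x < 50 && y > 10';
const escaped = replaceHtmlCodes(original);
console.log(escaped);                   // x &lt; 50 && y &gt; 10
console.log(restoreHtmlCodes(escaped)); // x < 50 && y > 10
```

The sketch also makes one limitation easy to see: restoreHtmlCodes(replaceHtmlCodes('&lt;')) returns '<', so a value that already contains the literal text &lt; or &gt; does not survive the round trip unchanged.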
The XSLT is mostly a pass-through. I just call the appropriate template when copying attributes:
<xsl:template match="@*">
  <xsl:attribute name="data-{local-name()}">
    <xsl:call-template name="replace-html-codes">
      <xsl:with-param name="text" select="."/>
    </xsl:call-template>
  </xsl:attribute>
</xsl:template>
<!-- copy all nodes -->
<xsl:template match="node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
Several things worth mentioning that might help someone:
- Make sure that your HTML is truly valid; e.g. I was accidentally using \ when I should have had /, and it caused this problem.
- As the OP pointed out in the question, you can use &amp;, so you might try e.g. &amp;lt; and &amp;gt;.
- There are alternatives to < and > that look similar.
- There is an alternate way to express < and >: &#60; and &#62;.
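Whichever spelling is used, the parser decodes named and numeric character references to the same character, so &lt;, &#60; and &#x3C; all end up as a bare < in the attribute value and serialize the same way afterwards. A minimal decoder sketch covering only the references discussed on this page (decodeRefs is a hypothetical name; real HTML parsers know far more named references and also handle malformed ones):

```javascript
// Hypothetical minimal decoder: numeric references plus a few named ones.
const NAMED = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };

function decodeRefs(text) {
  return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, ref) => {
    if (ref[0] === '#') {
      // Hex (&#x3C;) or decimal (&#60;); no range check, so out-of-range values throw a RangeError.
      const code = ref[1] === 'x' || ref[1] === 'X'
        ? parseInt(ref.slice(2), 16)
        : parseInt(ref.slice(1), 10);
      return String.fromCodePoint(code);
    }
    // Unknown names are left as-is rather than guessed at.
    return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : match;
  });
}

for (const ref of ['&lt;', '&#60;', '&#x3C;']) {
  console.log(ref, '->', decodeRefs(ref)); // each line ends in <
}
```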
Title: javascript - innerHTML unencodes &lt; in attributes - Stack Overflow. Source: http://it.en369.cn/questions/1745555881a2155870.html