I want to make a Greasemonkey script that, while you are on URL_1, parses the whole HTML web page of URL_2 in the background in order to extract a text element from it.
To be specific, I want to download the whole page's HTML code (a Rotten Tomatoes page) in the background, store it in a variable, and then use getElementsByClassName()[0] to extract the text I want from the element with class name "critic_consensus".
I've found this on MDN: HTML in XMLHttpRequest, so I ended up with this unfortunately non-working code:
var xhr = new XMLHttpRequest();
xhr.onload = function() {
    alert(this.responseXML.getElementsByClassName("critic_consensus")[0].innerHTML);
};
xhr.open("GET", "http://www.rottentomatoes.com/m/godfather/", true);
xhr.responseType = "document";
xhr.send();
It shows this error message when I run it in Firefox Scratchpad:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://www.rottentomatoes.com/m/godfather/. This can be fixed by moving the resource to the same domain or enabling CORS.
PS. The reason why I don't use the Rotten Tomatoes API is that they've removed the critics consensus from it.
- 2 What is not working? What error do you get? – Bergi Commented Nov 5, 2014 at 19:20
- 2 There was no error message inside Firefox's Scratchpad. After seeing Igor Barinov's reply, I checked the Firefox Web Console, and that's where the error message he mentioned appears. I added the error message to my question. – darkred Commented Nov 5, 2014 at 19:52
- I edited my answer with new idea, give it a try! – Igor Barinov Commented Nov 5, 2014 at 20:38
3 Answers
For cross-origin requests, where the fetched site has not helpfully set a permissive CORS policy, Greasemonkey provides the GM_xmlhttpRequest() function. (Most other userscript engines also provide this function.)
GM_xmlhttpRequest is expressly designed to allow cross-origin requests.
To get your target information, create a DOMParser and run it on the result. Do not use jQuery methods for parsing, as that will cause extraneous images, scripts, and objects to load, slowing things down or even crashing the page.
Here's a complete script that illustrates the process:
// ==UserScript==
// @name     _Parse Ajax Response for specific nodes
// @include  http://stackoverflow.com/questions/*
// @require  http://ajax.googleapis.com/ajax/libs/jquery/2.1.0/jquery.min.js
// @grant    GM_xmlhttpRequest
// ==/UserScript==
GM_xmlhttpRequest ( {
    method: "GET",
    url: "http://www.rottentomatoes.com/m/godfather/",
    onload: function (response) {
        var parser = new DOMParser ();
        /* IMPORTANT!
            1) For Chrome, see
            https://developer.mozilla.org/en-US/docs/Web/API/DOMParser#DOMParser_HTML_extension_for_other_browsers
            for a work-around (a minimal fallback sketch also appears after this script).
            2) jQuery.parseHTML() and similar are bad because they cause images, etc., to be loaded.
        */
        var doc = parser.parseFromString (response.responseText, "text/html");
        var criticTxt = doc.getElementsByClassName ("critic_consensus")[0].textContent;

        $("body").prepend ('<h1>' + criticTxt + '</h1>');
    },
    onerror: function (e) {
        console.error ('**** error ', e);
    },
    onabort: function (e) {
        console.error ('**** abort ', e);
    },
    ontimeout: function (e) {
        console.error ('**** timeout ', e);
    }
} );
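In case you need the Chrome work-around referenced in the script's comment, here is a minimal, generic sketch (not part of the original answer; the helper name parseHTMLString is just for illustration). It falls back to document.implementation.createHTMLDocument() when DOMParser does not accept the "text/html" type:

// Sketch only: parse an HTML string into a detached Document, falling back
// for older browsers whose DOMParser rejects the "text/html" type.
function parseHTMLString (markup) {
    try {
        var doc = new DOMParser ().parseFromString (markup, "text/html");
        if (doc) {
            return doc;
        }
    } catch (e) {
        // DOMParser exists but does not support "text/html"; use the fallback below.
    }
    // Build an empty detached HTML document and inject the fetched markup into it.
    var fallbackDoc = document.implementation.createHTMLDocument ("");
    fallbackDoc.documentElement.innerHTML = markup;
    return fallbackDoc;
}

With a helper like that, the onload handler above could call parseHTMLString(response.responseText) instead of using DOMParser directly.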
The problem is: XMLHttpRequest cannot load http://www.rottentomatoes.com/m/godfather/. No 'Access-Control-Allow-Origin' header is present on the requested resource.
Because you are not the owner of the resource, you cannot set up this header.
What you can do is set up a proxy on Heroku which will proxy all requests to the Rotten Tomatoes web site. Here is a small Node.js proxy: https://gist.github.com/igorbarinov/a970cdaf5fc9451f8d34
var https = require('https'),
    http = require('http'),
    util = require('util'),
    path = require('path'),
    fs = require('fs'),
    colors = require('colors'),
    url = require('url'),
    httpProxy = require('http-proxy'),
    dotenv = require('dotenv');

dotenv.load();

var proxy = httpProxy.createProxyServer({});
var host = "www.rottentomatoes.com";
var port = Number(process.env.PORT || 5000);

process.env.NODE_TLS_REJECT_UNAUTHORIZED = "0";

var server = require('http').createServer(function(req, res) {
    // You can define here your custom logic to handle the request
    // and then proxy the request.
    var path = url.parse(req.url, true).path;
    req.headers.host = host;
    res.setHeader("Access-Control-Allow-Origin", "*");
    proxy.web(req, res, {
        target: "http://" + host + path
    });
}).listen(port);

proxy.on('proxyRes', function (res) {
    console.log('RAW Response from the target', JSON.stringify(res.headers, true, 2));
});

util.puts('Proxying to ' + host + '. Server'.blue + ' started '.green.bold + 'on port '.blue + port);
I modified the code from https://github.com/massive/firebase-proxy/ for this.
I published the proxy at http://peaceful-cove-8072.herokuapp.com/ and you can test it at http://peaceful-cove-8072.herokuapp.com/m/godfather
Here is a fiddle to test it: http://jsfiddle.net/uuw8nryy/
var xhr = new XMLHttpRequest();
xhr.onload = function() {
    alert(this.responseXML.getElementsByClassName("critic_consensus")[0]);
};
xhr.open("GET", "http://peaceful-cove-8072.herokuapp.com/m/godfather", true);
xhr.responseType = "document";
xhr.send();
The JavaScript same origin policy prevents you from accessing content that belongs to a different domain.
The above reference also gives you four techniques for relaxing this rule (CORS being one of them).
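For completeness, if you did control the server that serves the data (which is not the case with Rotten Tomatoes), enabling CORS only requires sending the Access-Control-Allow-Origin response header. Below is a minimal, generic Node.js sketch, purely for illustration; the port and the wildcard origin are placeholders:

// Illustration only: a server you control opting in to CORS.
// '*' allows any origin; a real deployment would usually name a specific origin.
var http = require('http');

http.createServer(function (req, res) {
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Content-Type', 'text/plain');
    res.end('This response can be read cross-origin via XMLHttpRequest.');
}).listen(8080);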