node.js — using cheerio.js to find all script elements in a page

Finding <script> nodes in a page

Why? Because it’s useful when a page carries dynamic content in JavaScript. Whether the parsed JavaScript can subsequently be evaluated is a topic for another article. For now, I’m going to assume you have node.js installed and have at least some idea of how to use it.

The idea

Find all the <script> nodes in an HTML page fetched using ‘request.get()’.

In the example, the url (in this case https://www.amazon.com) is requested and the returned HTML is loaded. That HTML is then passed to cheerio using this expression:

var $ = cheerio.load(html,{ normalizeWhitespace: false, xmlMode: false, decodeEntities: true });

The matching elements are then iterated over using the .each() method:

$('script').each(function () { …

In the very simple example that follows, each script is logged to the console (STDOUT) for display. In a more advanced and useful implementation, the returned JavaScript would be parsed, interacted with, or otherwise acted upon.
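One possible "other action" is pulling embedded data out of a script block. The sketch below assumes, purely for illustration, that a page assigns a JSON literal to a variable named `pageData`. Both that variable name and the `extractAssignedJson` helper are hypothetical and not part of cheerio or request. The helper works on the script source as a plain string, so it can be applied to whatever text the iteration yields.

```javascript
// Hypothetical helper: find `var <name> = {...};` in script source and
// parse the object literal as JSON. Returns null when nothing usable is found.
// Assumes `name` is a plain identifier with no regex metacharacters.
function extractAssignedJson(scriptText, name) {
  // Capture from the "{" after the assignment up to the first "}" that is
  // followed by ";". A "};" inside a string value would end the match early.
  var pattern = new RegExp(
    '(?:var|let|const)\\s+' + name + '\\s*=\\s*(\\{[\\s\\S]*?\\})\\s*;'
  );
  var match = pattern.exec(scriptText);
  if (!match) {
    return null;
  }
  try {
    return JSON.parse(match[1]);
  } catch (e) {
    return null; // the literal was JavaScript, not strict JSON
  }
}

// Usage with made-up script text:
var sampleScript =
  'var other = 1;\n' +
  'var pageData = {"title":"Demo","items":[1,2,3],"meta":{"ok":true}};\n' +
  'console.log(pageData);';

var data = extractAssignedJson(sampleScript, 'pageData');
console.log(data ? data.title : 'no data found');
```

A regular expression is a deliberately crude choice here. It is enough for simple, well-formed assignments, but a real JavaScript parser would be the more robust option for arbitrary pages.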

The Script

// Requirements
var request = require('request');
var cheerio = require('cheerio');

// Local vars
var url = 'https://www.amazon.com';

// Define the request's default params (cookie jar and a browser-like User-Agent)
var browserRequest = request.defaults({
  jar: true,
  headers: { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0' }
});

// Execute the request and log all the JavaScript blocks
browserRequest(url, function (error, response, html) {
  if (!error && response.statusCode === 200) {

    // Load the HTML into cheerio
    var $ = cheerio.load(html, { normalizeWhitespace: false, xmlMode: false, decodeEntities: true });

    // Iterate over all of the JS blocks in the page
    $('script').each(function () {
      // .html() returns the inline script source; some cheerio versions
      // return an empty string from .text() on <script> elements
      console.log('JS: %s', $(this).html());
    });
  }
  else {
    // response is undefined when the request itself failed
    console.log('ERR: %j\t%j', error, response ? response.statusCode : null);
  }
});

End

