node.js — using cheerio.js to find all script elements in a page

Finding <script> nodes in a page

Why? Because it’s useful when a page keeps its dynamic content in JavaScript. Whether there is a way to subsequently evaluate the parsed JavaScript is a topic for another article. For now, I’m going to assume you have node.js installed, and that you have at least some idea of how to use it.

The idea

Finding all the <script> nodes in an HTML page, fetched using ‘request.get()’.

In the example, url (in this case www.amazon.com) is resolved and the HTML loaded. The loaded HTML is then passed to cheerio using this expression:

var $ = cheerio.load(html,{ normalizeWhitespace: false, xmlMode: false, decodeEntities: true });

The matched elements are then iterated over using the .each() method:

$('script').each( function () {…

In the very simple example that follows, each script is logged to the console (STDOUT) for display. In a more advanced and useful implementation, the returned JavaScript would be parsed, or some other action taken on it.

The Script

// MAKE REQUIREMENTS
var request = require('request');
var cheerio = require('cheerio');

// Local Vars
var url = 'https://www.amazon.com';

// Define the request's default params (cookie jar plus a browser-like User-Agent)
request = request.defaults({
    jar: true,
    headers: { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0' }
});

// execute request and parse all the javascript blocks
request(url, function (error, response, html) {
    if (!error && response.statusCode == 200) {

        // load the html into cheerio
        var $ = cheerio.load(html, { normalizeWhitespace: false, xmlMode: false, decodeEntities: true });

        // iterate on all of the JS blocks in the page
        $('script').each(function () {
            console.log('JS: %s', $(this).text());
        });
    }
    else {
        // response is undefined when the request itself fails, so guard the status code
        console.log('ERR: %j\t%j', error, response && response.statusCode);
    }
});

End

node.js — parse page title (simple example)

node.js — Toolkit of the Code Gods!!

Or, so some would have you believe. Is it pretty awesome? YES. Am I sold on it yet? NO. But it’s growing on me.

Since parsing webpages has been my business for nearly 15 years now, I’ve used a lot of tools and strategies, but it was only recently I decided to try out node.js for a few of my projects.

Starting with node.js

If you are new to node.js, go check out these URLS here. They more than successfully cover getting started with node.

The goal here is to answer a question that for some reason eluded my best searches for code examples. I thought I had the syntax dialed but still saw some strange responses. This page will show you definitively how to get a page title. Every time (every time the page loads at least).

How I parsed the title off a page

Here is how I did it, using cheerio and request:

/*
 * MAKE REQUIREMENTS
 */
var request = require('request');
var cheerio = require('cheerio');

/*
 * Handle Commandline Params
 */
var url = process.argv[2];

/*
 * Local Vars
 */
// Define the request's default params (a browser-like User-Agent)
request = request.defaults({
    headers: { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0' }
});

// DO THE WORK!!
request(url, function (error, response, html) {
    if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html, { normalizeWhitespace: true, decodeEntities: true });
        var title = $('title').text();
        console.log("TITLE: %j", title);
    }
    else {
        // response is undefined when the request itself fails, so guard the status code
        console.log('ERR: %j\t%j', error, response && response.statusCode);
    }
});

Running the example from the command line looks like this (I’m using the 1st available parameter to pass my URL, hard-coding is for fools):

node get.title.js http://www.yahoo.com
TITLE: "Yahoo"

Conclusion

The reason I’ve posted this is that this specific cheerio syntax was not clearly documented anywhere I looked:

var title = $('title').text();

Enjoy toying around with node.js to parse your super-awesome pages.

Patching OSX against the ‘ShellShock’ exploit

While everyone waits for Apple to release a patch for the ShellShock bug, one of the maintainers of BASH helped detail how to patch BASH (and SH) on OSX to close the vulnerability. This comes from the helpful Apple section of Stack Exchange.

NOTE: To perform this patch you MUST be granted sudo privs on your machine — if not you won’t be able to move the new files into the required location.

Testing to see if you are vulnerable

First things first: check your version of BASH by running bash --version. The desired version is GNU bash, version 3.2.54.

If you are not seeing that, then you should check to see if you have the vuln. When I checked my fully updated OS X Mavericks, I was on Bash 3.2.52 and it was vulnerable to the exploit.

If you see the word ‘vulnerable’ when you run this, you’re at risk!
env x='() { :;}; echo vulnerable' bash -c 'echo hello'

This is a PASS (OK):
env x='() { :;}; echo vulnerable' bash -c 'echo hello'
hello

This is a FAIL:
env x='() { :;}; echo vulnerable' bash -c 'echo hello'
vulnerable
hello
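
The PASS/FAIL check above can be wrapped in a small function so it is easy to repeat for both bash and sh, before and after patching. This is only a sketch: the function name check_shellshock is made up for illustration, and it tests for nothing beyond the single behaviour shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original instructions): run the
# same one-liner as above against a given shell and report PASS or FAIL.
check_shellshock() {
    local shell_bin="${1:-bash}"
    local output
    # stderr is discarded because some patched versions print warnings there
    output=$(env x='() { :;}; echo vulnerable' "$shell_bin" -c 'echo hello' 2>/dev/null)
    case "$output" in
        *vulnerable*) echo "FAIL: $shell_bin is vulnerable"; return 1 ;;
        *)            echo "PASS: $shell_bin did not run the injected command"; return 0 ;;
    esac
}

# Check both shells that this article replaces
check_shellshock bash
check_shellshock sh
```

A PASS here only means the injected command did not run for this particular test; it is not proof that the shell is free of every related bug.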

Time to get down to patching

This process is going to require you to do some command line work, namely compiling bash and replacing the bad versions with the good ones. If you are NOT comfortable doing that, it is best to wait for Apple to create the installable patch. If your geek level is above basic, continue forward:

First, agree to using xcodebuild
If you have not run xcodebuild before, you are going to need to run it and agree to the terms before you’ll be able to finish this build. I recommend you run it NOW and get that out of the way:
xcodebuild

Set environment to NOT auto-include
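Bash’s ability to automatically import function definitions from the environment is part of the reason the exploit exists. It’s highly recommended that you set this variable before starting the build. Ignore at your own peril. The variable is checked in the patching stage below to decide which of two patches gets applied: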

export ADD_IMPORT_FUNCTIONS_PATCH=YES

Make a place to build the new objects
I dropped everything into the directory ‘new-bash’… and did it thus. NOTE: I am not using sudo, (yet)

mkdir new-bash

Download bash-92 source
Move to that directory, then download the bash-92 source using good old curl and extract the compressed tarball:

cd new-bash
curl https://opensource.apple.com/tarballs/bash/bash-92.tar.gz | tar zxf -

Get the patch packages next
Change to the source directory for bash, then download and apply 2 patches:

cd bash-92/bash-3.2
curl https://ftp.gnu.org/pub/gnu/bash/bash-3.2-patches/bash32-052 | patch -p0
curl https://ftp.gnu.org/pub/gnu/bash/bash-3.2-patches/bash32-053 | patch -p0

Apply the remaining patch
Execute these two commands, in order. Only one of them will actually download and apply a patch, depending on whether ADD_IMPORT_FUNCTIONS_PATCH is set to YES:

[ "$ADD_IMPORT_FUNCTIONS_PATCH" == "YES" ] && curl http://alblue.bandlem.com/import_functions.patch | patch -p0
[ "$ADD_IMPORT_FUNCTIONS_PATCH" == "YES" ] || curl https://ftp.gnu.org/pub/gnu/bash/bash-3.2-patches/bash32-054 | patch -p0
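
The && / || pair above is a compact if/else: when the variable is YES the import_functions patch is fetched, otherwise bash32-054 is. As a sketch, the same decision can be written out explicitly. The function name choose_patch_url is hypothetical and only used here for illustration; the two URLs are the ones from the commands above.

```shell
#!/bin/bash
# Hypothetical helper: print the URL of the patch that the two
# commands above would fetch for a given ADD_IMPORT_FUNCTIONS_PATCH value.
choose_patch_url() {
    if [ "$1" = "YES" ]; then
        # Variable set to YES: patch that disables auto-importing functions
        echo "http://alblue.bandlem.com/import_functions.patch"
    else
        # Anything else (including unset): the regular bash32-054 patch
        echo "https://ftp.gnu.org/pub/gnu/bash/bash-3.2-patches/bash32-054"
    fi
}

# Show which patch the current environment would select
echo "Selected patch: $(choose_patch_url "$ADD_IMPORT_FUNCTIONS_PATCH")"

# Equivalent to the two original lines (left commented out on purpose):
# curl "$(choose_patch_url "$ADD_IMPORT_FUNCTIONS_PATCH")" | patch -p0
```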

Start building!
Traverse back up the tree and start running the build. It is recommended that you NOT run xcodebuild with sudo at this point. Doing so could enable root powers in the shell and that is something that you certainly do not want!

cd ..
xcodebuild

OK.. PATCH MADE!
At this point you have newly built bash and sh binaries to replace the exploitable ones. Back up your old versions, move the new ones into place, and you are now safe.

# Test your versions:
build/Release/bash --version # you should see "version 3.2.54(1)-release"
build/Release/sh --version # you should see "version 3.2.54(1)-release"

# move the files into location
sudo mv /bin/bash /bin/bash.BAD
sudo mv /bin/sh /bin/sh.BAD
sudo mv build/Release/bash /bin
sudo mv build/Release/sh /bin
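
Since the move into /bin replaces your system shell, it may be worth making the version check a hard gate rather than something you eyeball. The sketch below assumes a helper name, is_expected_version, that is not part of the original steps, and uses a sample version string in place of real build output.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the text in $1 (the output of
# "<binary> --version") mentions the version number given in $2.
is_expected_version() {
    case "$1" in
        *"version $2"*) return 0 ;;
        *)              return 1 ;;
    esac
}

# Sample string standing in for: build/Release/bash --version
sample='GNU bash, version 3.2.54(1)-release (x86_64-apple-darwin13)'

if is_expected_version "$sample" "3.2.54"; then
    echo "ok to install"
else
    echo "unexpected version, not installing"
fi
```

In real use you would pass "$(build/Release/bash --version)" and "$(build/Release/sh --version)" instead of the sample string, and only run the sudo mv commands when both checks succeed. One optional follow-up that is not part of the steps above: sudo chmod a-x /bin/bash.BAD /bin/sh.BAD removes the execute bit from the old copies so the vulnerable binaries cannot be run by accident.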

Now clean up the local mess
The local directory where you built bash is no longer needed. I don’t like to leave cruft around on my system that creates a confusing environment, so removing the source tree is my last task. You can leave it if you like, but if I need to do this again I’m going to perform a full fresh rebuild, so this will not be re-used.

cd
rm -rf new-bash

YOU ARE DONE!

BIG HUGE THANKS TO ALL THAT DID THE REAL WORK HERE.. the people maintaining bash, the people that post awesome solutions to StackExchange and all the other fantastic resources on the net!