
Retrieved Anchors List Gets Corrupted?

I am trying to analyze anchor links (specifically their text property) in PhantomJS. The retrieval happens here: var list = page.evaluate(function() { return document.getElementsByTagName('

Solution 1:

The current version of PhantomJS only lets primitive values (boolean, string, number) and plain arrays and objects ([] and {}) pass to and from the page context. Functions are stripped along the way, and DOM elements are not plain data, so they do not survive the transfer intact. t.niese found the quote from the docs:

Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.

Closures, functions, DOM nodes, etc. will not work!
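The rule of thumb can be tried outside PhantomJS with a plain JSON round trip. This is only a sketch: `crossBoundary` and `fakeAnchor` are hypothetical names, and PhantomJS's actual serialization may differ in details from `JSON.stringify`.

```javascript
// Hypothetical helper: approximates the page-context boundary with a JSON
// round trip, following the "if it can be serialized via JSON" rule of thumb.
function crossBoundary(value) {
    return JSON.parse(JSON.stringify(value));
}

// Stand-in for a DOM node: some data properties plus a method.
var fakeAnchor = {
    innerText: 'Home',
    href: 'http://example.com/',
    click: function () {} // functions are not JSON-serializable
};

var result = crossBoundary(fakeAnchor);
console.log(result.innerText);    // Home
console.log(typeof result.click); // undefined
```

The data properties come through, while the method is silently dropped.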

You need to do a part of the work inside of the page context. If you want the innerText property of every node, then you need to map it to a primitive type first:

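var list = page.evaluate(function() {
    // getElementsByTagName returns an array-like collection, so borrow Array's map
    return Array.prototype.map.call(document.getElementsByTagName('a'), function(a){
        return a.innerText; // a string can cross the page-context boundary
    });
});
console.log(list[0]); // innerText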

You can of course map multiple properties at the same time:

return Array.prototype.map.call(document.getElementsByTagName('a'), function(a){
    return { text: a.innerText, href: a.href };
});
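To see why `Array.prototype.map.call` is used here, the same mapping can be run against an array-like stand-in outside PhantomJS. This is a minimal sketch: `describeAnchors` and `fakeCollection` are hypothetical names, and the fake object only imitates the shape of the collection that `getElementsByTagName` returns.

```javascript
// Hypothetical helper: the same mapping as above, but taking the collection
// as an argument so it can be tried without a real page.
function describeAnchors(anchors) {
    // anchors only needs a length and indexed items; it need not be a real Array
    return Array.prototype.map.call(anchors, function (a) {
        return { text: a.innerText, href: a.href };
    });
}

// Stand-in for the collection: array-like, with no map method of its own.
var fakeCollection = {
    length: 2,
    0: { innerText: 'Home', href: 'http://example.com/', click: function () {} },
    1: { innerText: 'About', href: 'http://example.com/about', click: function () {} }
};

var described = describeAnchors(fakeCollection);

// The mapped result is plain data, so a JSON round trip leaves it intact.
var roundTripped = JSON.parse(JSON.stringify(described));
console.log(roundTripped[1].text); // About
```

Inside PhantomJS, a helper like this would have to be defined within the function passed to page.evaluate, since closures do not cross into the page context.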
