# PhantomJS
*Deprecated*
PhantomJS is being replaced by Puppeteer.
PhantomJS is a headless web browser scriptable with JavaScript. At Marfeel, we use it to run the whiteCollar, which extracts and organises section pages.
This article describes all the options for running it locally, and the error codes it can generate. Those error codes are visible both in a local environment during development and in the documents shown in Kibana, under the invalidations index.
WARNING
PhantomJS doesn't support ES2015+ JavaScript.
# Command line
The script that runs PhantomJS is mrf-phantomjs, whose implementation is in MarfeelXP/Jinks.
If you don't use one of the options that specify the section to extract, this command always extracts the home section.
TIP
You can always run mrf-phantomjs -h from anywhere in the console, in order to see all the options.
Help output:

    mrf-phantomjs [-h] [-e | -p SECTIONNUM | -n SECTIONNAME] [-b SUBFOLDER]
                  [-g PAGENUM] [-w WCPATH] [-u URL] [-s USERAGENT]
                  [-m METADATA] [-l] [-d] [-v]

    optional arguments:
      -h, --help            show this help message and exit
      -e, --extract         extract manually
      -p SECTIONNUM, --sectionNum SECTIONNUM
                            the number of the section you want to extract
      -n SECTIONNAME, --sectionName SECTIONNAME
                            the name of the section you want to extract
      -b SUBFOLDER, --subfolder SUBFOLDER
                            the subfolder of the tenant
      -g PAGENUM, --pageNum PAGENUM
                            the number of the page you want to extract
      -w WCPATH, --wcPath WCPATH
                            the path of the whiteCollar script
      -u URL, --url URL     the url of the page you want to extract
      -s USERAGENT, --useragent USERAGENT
                            the useragent you want to use
      -m METADATA, --metadata METADATA
                            the metadataProviders to use
      -l, --legacy          use legacy alibaba in order to support apiOrder,
                            layout and disableSortByRelevance
      -d, --debug           debug in safari at localhost:9001
      -v, --verbose         show traceback logs
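
As an illustration of how the options in the help output combine, here is a minimal sketch of a few invocations. The section number, section name, URL and user agent are hypothetical values, and `run` is a hypothetical helper that prints each command and only executes it when mrf-phantomjs is actually installed. The sketch assumes you are inside the tenant's repository.

```shell
#!/bin/sh
# Hypothetical helper: print the command, and run it only if the tool exists.
run() {
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    fi
}

# No section option: extracts the home section.
run mrf-phantomjs

# Section selected by number (2 is a hypothetical value), with traceback logs.
run mrf-phantomjs -p 2 -v

# Section selected by name ("sports" is a hypothetical section name).
run mrf-phantomjs -n sports

# A specific URL with a custom user agent (both values are hypothetical).
run mrf-phantomjs -u "https://www.example.com/section/" -s "ExampleAgent/1.0"
```

According to the usage line, -e, -p and -n are alternatives, so only one of them should be passed per invocation.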
# Exit codes
To pinpoint issues or controlled errors that arise during extraction, Marfeel uses the following set of exit codes.
| Code | Message | Description |
| --- | --- | --- |
| 3 | fail loading page from arg[1] | The page PhantomJS is trying to extract items from cannot be loaded. That is, the URL is not valid. |
| 4 | fail injecting WhiteCollar script | The page loads correctly, but the whiteCollar being used is not found. |
| 5 | no items found on loaded page | No items could be found on the page, but the body size is greater than 1KB. |
| 7 | Tenant whiteCollar script not present | The whiteCollar has a bad configuration. |
| 8 | Tenant whiteCollar script malformed | The whiteCollar is not configured correctly. |
| 11 | whiteCollar script failed to extract items | The whiteCollar failed to extract and format items for Marfeelization. |
| 12 | MetadataProvider error | Extracting the active MetadataProviders failed, or the MetadataProviders were not added in the metadataProvider JS files. |
| 13 | Redirection response without redirectURL | PhantomJS is unable to handle the redirection. |
| 14 | Fail loading request from page | A resource requested by the page fails to load. |
| 16 | Redirection loop encountered | The URL is a redirection loop and will never end in a valid 200 response. |
| 17 | Client timeout | PhantomJS could not retrieve the answer from the tenant because it took too long. |
| 18 | Empty Body | Thrown when the body of the retrieved section is smaller than 1KB. |
| 19 | Redirect found | Thrown when a redirect is found. Currently only used on the consumer profile. |
| 255 | | Something unexpected that cannot be identified occurred. |
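
When running extractions locally, it can help to translate the numeric exit code back into its message. The following is a minimal sketch, not part of mrf-phantomjs: `describe_exit_code` is a hypothetical helper whose messages are copied from the exit codes listed above, and treating 0 as success is an assumption based on the usual shell convention.

```shell
#!/bin/sh
# Hypothetical helper: map an exit code to the message documented above.
describe_exit_code() {
    case "$1" in
        0)   echo "OK" ;;  # assumption: 0 means success (shell convention)
        3)   echo "fail loading page from arg[1]" ;;
        4)   echo "fail injecting WhiteCollar script" ;;
        5)   echo "no items found on loaded page" ;;
        7)   echo "Tenant whiteCollar script not present" ;;
        8)   echo "Tenant whiteCollar script malformed" ;;
        11)  echo "whiteCollar script failed to extract items" ;;
        12)  echo "MetadataProvider error" ;;
        13)  echo "Redirection response without redirectURL" ;;
        14)  echo "Fail loading request from page" ;;
        16)  echo "Redirection loop encountered" ;;
        17)  echo "Client timeout" ;;
        18)  echo "Empty Body" ;;
        19)  echo "Redirect found" ;;
        255) echo "Something unexpected that cannot be identified occurred" ;;
        *)   echo "Unknown exit code $1" ;;
    esac
}

# Usage: run the extraction, then report what the exit code means.
# Guarded so the sketch does nothing where mrf-phantomjs is not installed.
if command -v mrf-phantomjs >/dev/null 2>&1; then
    mrf-phantomjs -v
    status=$?
    echo "mrf-phantomjs exited with $status: $(describe_exit_code "$status")"
fi
```

The fallback branch keeps the helper usable if a code that is not documented here ever appears.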
# Debugging
To debug PhantomJS's behaviour in detail, you can use the Safari browser.
You might need to do so in the following cases:
- Properties are missing or malformed across the extracted items (articles). For instance, with a wrong uri selector, no articles will appear.
- To test against the real DOM PhantomJS is working with (without the tenant's JavaScript being executed).
- To spot bugs in the functions created in the whiteCollar.
To start a debugging session:
- From the console, place yourself inside the tenant's repository and execute PhantomJS with the -d option:
cd www.example.com
mrf-phantomjs -d -pX<section number>
- Open the Safari browser and go to localhost:9000
- In the browser console, type document.marfeel.alibaba.execute() in order to start the extraction process.
WARNING
At this point, the whiteCollar has already been executed once, including any setup function it contains.
Running it again may create duplicated content.