# WC library
The WhiteCollar library functions can be used either as modifier functions, or anywhere in the whiteCollar where they are useful.
Whenever you need a custom behavior not covered by the library yet, evaluate if it makes sense to add it for everybody.
The library code is in Gutenberg.
# getAllProcessedNodesFromGetters
WC.getAllProcessedNodesFromGetters(Array getters, Array globalModifiers)
Receives an array of `getters` functions and an array of `modifiers`.
Applies the modifiers to all items selected by `getters`.
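As a rough illustration of the flow described above, the sketch below collects the items from every getter and then runs each modifier over the combined list. The `collectAndModify` helper is hypothetical, and it assumes that getters return arrays and that modifiers map an array of items to a new array; the library's actual implementation may differ.

```javascript
// Hypothetical stand-in, not the library implementation.
// Assumes each getter returns an array of items and each
// modifier takes an array of items and returns a new array.
function collectAndModify(getters, globalModifiers) {
  // Gather the items selected by every getter into one flat list.
  const items = getters.reduce((all, getter) => all.concat(getter()), []);
  // Apply the modifiers in order, feeding each result to the next one.
  return globalModifiers.reduce((current, modifier) => modifier(current), items);
}

const getters = [
  () => [{ uri: '/a' }, { uri: '/b' }],
  () => [{ uri: '/c' }],
];
const keepFirstTwo = (items) => items.slice(0, 2);

const processed = collectAndModify(getters, [keepFirstTwo]);
console.log(processed.map((item) => item.uri)); // [ '/a', '/b' ]
```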
# limitArticles
WC.limitArticles(Number limit)
Limits the number of extracted articles to `limit`.
# filterEqConsecutiveArticles
WC.filterEqConsecutiveArticles(String propertyName [default 'uri'])
Filters out consecutive articles that share the same value for the property passed as the argument.
If no argument is specified, it defaults to `uri`.
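The behavior can be pictured with the sketch below. `filterEqConsecutive` is a hypothetical re-implementation written for illustration; it assumes the function returns a modifier that receives the array of items and that the first item of each run is the one that is kept.

```javascript
// Hypothetical re-implementation for illustration only.
// Returns a modifier that drops an item when the previous item
// has the same value for `propertyName` (default 'uri').
function filterEqConsecutive(propertyName = 'uri') {
  return (items) =>
    items.filter(
      (item, index) =>
        index === 0 || item[propertyName] !== items[index - 1][propertyName]
    );
}

const articles = [
  { uri: '/a', title: 'First' },
  { uri: '/a', title: 'First again' },
  { uri: '/b', title: 'Second' },
  { uri: '/a', title: 'First, later' },
];

const filtered = filterEqConsecutive()(articles);
console.log(filtered.map((article) => article.title));
// [ 'First', 'Second', 'First, later' ]
```

Note that the last item survives: only adjacent repetitions are removed, not every repetition.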
# uniqueBy
WC.uniqueBy(String propertyName)
Filters out articles that share the same value for the property passed as the argument, whether or not they are consecutive.
If no argument is specified, it defaults to `uri`.
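A minimal sketch of this kind of de-duplication follows. `uniqueByProperty` is a hypothetical name, and the sketch assumes a modifier that receives the array of items and keeps the first occurrence of each value; the library may choose differently.

```javascript
// Hypothetical sketch, not the library implementation.
// Keeps the first item seen for each value of `propertyName`.
function uniqueByProperty(propertyName = 'uri') {
  return (items) => {
    const seen = new Set();
    return items.filter((item) => {
      const value = item[propertyName];
      if (seen.has(value)) {
        return false; // a previous item already had this value
      }
      seen.add(value);
      return true;
    });
  };
}

const articles = [
  { uri: '/a', title: 'A' },
  { uri: '/b', title: 'B' },
  { uri: '/a', title: 'A duplicate' },
];

const unique = uniqueByProperty('uri')(articles);
console.log(unique.map((article) => article.title)); // [ 'A', 'B' ]
```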
# applyBlacklist
WC.applyBlacklist(Array blacklistedStrings)
Filters out all items whose URI contains any of the strings in the `blacklistedStrings` array.
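The matching rule can be sketched as a plain substring check. `applyBlacklistSketch` is a hypothetical stand-in, and it assumes each item exposes its URI as a string in `item.uri`.

```javascript
// Hypothetical sketch, not the library implementation.
// Drops every item whose `uri` includes at least one blacklisted string.
function applyBlacklistSketch(blacklistedStrings) {
  return (items) =>
    items.filter(
      (item) => !blacklistedStrings.some((blocked) => item.uri.includes(blocked))
    );
}

const items = [
  { uri: 'https://example.com/news/1' },
  { uri: 'https://example.com/sponsored/2' },
  { uri: 'https://example.com/tag/3' },
];

const allowed = applyBlacklistSketch(['/sponsored/', '/tag/'])(items);
console.log(allowed.map((item) => item.uri)); // [ 'https://example.com/news/1' ]
```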
# contains
WC.contains(Array or String container)(Any content)
The curried function checks if `content` is contained by `container`.
For example, to check if the URI is part of the "awesome" subdomain:
WC.contains(item.uri)("awesome.");
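For readers unfamiliar with currying, the sketch below shows the call shape with a hypothetical `containsSketch` built on the standard `includes` method, which exists on both strings and arrays. Whether the library uses `includes` internally is an assumption.

```javascript
// Hypothetical sketch of a curried "contains" check.
// The first call fixes the container, the second call tests the content.
const containsSketch = (container) => (content) => container.includes(content);

const item = { uri: 'https://awesome.example.com/news/1' };

// Fix the container once, then reuse the partially applied function.
const uriContains = containsSketch(item.uri);
console.log(uriContains('awesome.')); // true
console.log(uriContains('boring.')); // false

// The same shape works with an array as the container.
console.log(containsSketch(['sports', 'politics'])('sports')); // true
```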
# getHref
WC.getHref(Node node)
Returns the `href` attribute from the node.
# getSrc
WC.getSrc(Node node)
Returns the `src` attribute from the node.
# getAlt
WC.getAlt(Node node)
Returns the `alt` attribute from the node. If it is not found, it falls back to the `title` attribute.
# getLazyImg
WC.getLazyImg(String selector, String attribute, String altAttribute)
Retrieves the image for lazy-loading images, including its source and alt.
The `selector` is used as the argument in a call to `qs`.
The `attribute` is the element attribute where the image source is stored. It defaults to `data-src`.
The `altAttribute` is the element attribute where the image alt is stored. It defaults to `alt`, then falls back to `title`.
# getSectionName
WC.getSectionName(function extractor)
Applies the `extractor` function to the current page URI, and returns a String.
If no argument is specified, the default extractor takes the first segment of the pathname.
For example:
// current page URI: "http://example.tenant.com/sports/and/others"
var sectionName = WC.getSectionName(); // sectionName will be "sports"
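The default behavior can be approximated outside the browser with the standard `URL` class. `firstPathSegment` is a hypothetical helper that takes the URI as an argument instead of reading the current page, and returning an empty string for a root path is an assumption of this sketch.

```javascript
// Hypothetical sketch of a default section extractor.
// Takes the first non-empty segment of the pathname.
function firstPathSegment(uri) {
  const segments = new URL(uri).pathname.split('/').filter(Boolean);
  // Assumption: an empty string when the path has no segments.
  return segments.length > 0 ? segments[0] : '';
}

console.log(firstPathSegment('http://example.tenant.com/sports/and/others')); // sports
console.log(firstPathSegment('http://example.tenant.com/')); // (empty string)
```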
# getPageNumber
WC.getPageNumber(function extractor)
Applies the `extractor` function to the current page URI, and returns a String.
If no argument is specified, the default extractor takes the number from '/page/(number)'. If that pattern doesn't exist in the current URI, it returns `1`.
For example:
// current page URI: "http://example.tenant.com/home/page/3"
var pageNumber = WC.getPageNumber(); // pageNumber will be 3
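The default pattern can be sketched with a regular expression. `pageNumberFromUri` is a hypothetical helper that takes the URI as an argument; this sketch returns a number, and whether the library returns a number or a numeric string is not confirmed here.

```javascript
// Hypothetical sketch of the default page-number extractor.
// Looks for '/page/<digits>' and falls back to 1 when it is absent.
function pageNumberFromUri(uri) {
  const match = /\/page\/(\d+)/.exec(uri);
  return match ? Number(match[1]) : 1;
}

console.log(pageNumberFromUri('http://example.tenant.com/home/page/3')); // 3
console.log(pageNumberFromUri('http://example.tenant.com/home')); // 1
```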
# convertToArray
WC.convertToArray(Object arrayLike)
Converts array-like objects to Array.
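In plain JavaScript the same conversion is available through `Array.from`. The sketch below uses a hand-made array-like object in place of a `NodeList` so it can run outside the browser; it is not a claim about how the library implements the conversion.

```javascript
// An array-like object: indexed keys plus a length, but no array methods.
const arrayLike = { 0: 'first', 1: 'second', length: 2 };

// Array.from copies the indexed values into a real Array.
const asArray = Array.from(arrayLike);

console.log(Array.isArray(arrayLike)); // false
console.log(Array.isArray(asArray)); // true
console.log(asArray.map((value) => value.toUpperCase())); // [ 'FIRST', 'SECOND' ]
```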
# filterFalsy
WC.filterFalsy(Array content)
Filters out all items of an array which are falsy values (`''`, `0`, `NaN`, `null`...).
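The effect matches the common `filter(Boolean)` idiom, shown here with a hypothetical `filterFalsySketch` helper.

```javascript
// Hypothetical sketch: Boolean(value) is false for every falsy value,
// so filter(Boolean) keeps only the truthy items.
const filterFalsySketch = (content) => content.filter(Boolean);

const cleaned = filterFalsySketch(['', 0, NaN, null, undefined, false, 'title', 42]);
console.log(cleaned); // [ 'title', 42 ]
```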
# getText
WC.getText(Node node)
Returns the text content of a node, trimmed.
# qs
WC.qs(String selector, Node node[default document])
Equivalent to the standard `querySelector`, starting from `node` if provided, or from `document` otherwise. Returns the first matching element.
# qsAll
WC.qsAll()
Equivalent to the standard `querySelectorAll`, starting from `node` if provided, or from `document` otherwise. Returns an array of all matching elements.
# merge
WC.merge(obj1, obj2)
Equivalent to the Rambda merge `R.merge`.
Creates a new object with the own properties of the first object merged with the own properties of the second object. If a key exists in both objects, the value from the second object will be used.
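The described semantics correspond to a shallow merge into a fresh object, sketched below with the standard `Object.assign`. `mergeSketch` is a hypothetical stand-in used only to show the precedence rule.

```javascript
// Hypothetical sketch of a shallow, non-mutating merge.
// Properties of `second` win when a key exists in both objects.
const mergeSketch = (first, second) => Object.assign({}, first, second);

const defaults = { width: '100%', height: 'auto' };
const overrides = { height: '250px' };

const merged = mergeSketch(defaults, overrides);
console.log(merged); // { width: '100%', height: '250px' }
console.log(defaults.height); // auto (the inputs are left untouched)
```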
# create3piWidget
WC.create3piWidget(String className, Object options)
Creates a 3pi widget in Mosaic. The parameters are the following:
- `className`: the class that the parent element of the widget iframe will have.
- `options`: a JSON object with the following parameters:
  - `src`: widget source. The same as in the `widgets.json` file. This is mandatory.
  - `selector`: widget selector. The same as in `widgets.json`. This is also mandatory and must be a class, not an id.
  - `width`: iframe width. If there is no width, the default value will be 100%.
  - `height`: iframe height. If there is no height, the default value will be auto.
  - `params`: JSON object with the parameters needed.
# getBalcon
(deprecated)
Builds the pocket object for a content group.
This method is deprecated in favour of only defining a `key` in the whiteCollar pocket.
Use the layout descriptor for the layout-related configuration of the content group.
# getBalconKey
(only puppeteer ripper)
Returns the key for the specified content group.
It uses the deprecated logic of getBalcon to retrieve the content group from the node and returns the previous `balcon.name`.
This method is compatible with the layout descriptor approach.
Example:
pocket: (node) => ({
key: WC.getBalconKey(node, '.news', 'h1')
})
# getBestSrcFromSrcSet
WC.getBestSrcFromSrcSet(String srcset)
Returns the image source from the srcset whose width is closest to 480px.
Example:
"elva-fairy-320w.jpg 320w, elva-fairy-480w.jpg 480w, elva-fairy-800w.jpg 800w"
-> it will return elva-fairy-480w.jpg
`srcset`: the srcset string from the img.
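The selection rule can be sketched as follows. `closestSrcFromSrcset` is a hypothetical helper: it only understands width descriptors such as `480w`, splits candidates naively on commas, and returns `null` when nothing usable is found, all of which are assumptions of this sketch rather than documented library behavior.

```javascript
// Hypothetical sketch: pick the srcset candidate whose width
// descriptor is closest to the target width (480 by default).
function closestSrcFromSrcset(srcset, targetWidth = 480) {
  const candidates = srcset
    .split(',')
    .map((entry) => entry.trim().split(/\s+/))
    // Keep only "<url> <number>w" entries.
    .filter(([url, descriptor]) => url && /^\d+w$/.test(descriptor || ''))
    .map(([url, descriptor]) => ({ url, width: parseInt(descriptor, 10) }));

  if (candidates.length === 0) {
    return null;
  }

  // Keep the candidate with the smallest distance to the target width.
  const best = candidates.reduce((currentBest, candidate) =>
    Math.abs(candidate.width - targetWidth) < Math.abs(currentBest.width - targetWidth)
      ? candidate
      : currentBest
  );
  return best.url;
}

const srcset =
  'elva-fairy-320w.jpg 320w, elva-fairy-480w.jpg 480w, elva-fairy-800w.jpg 800w';
console.log(closestSrcFromSrcset(srcset)); // elva-fairy-480w.jpg
```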
# difference
WC.difference(Array first, Array second)
From R.difference.
Finds the set (i.e. no duplicates) of all elements in the first list not contained in the second list. Objects and Arrays are compared in terms of value equality, not reference equality.
WC.difference([1,2,3,4], [7,6,5,4,3]); //=> [1,2]
# notExtractableIf
WC.notExtractableIf(Function checker)
Runs the `checker` function for each item and marks the item as NOT extractable if the function returns true.
Should be used inside the modifiers array.
...
modifiers: [WC.notExtractableIf(function (item) {
return item.title.indexOf('test') > -1;
})]
...
# getUniqueUri
WC.getUniqueUri()
Returns a URI made from the current href and a random query parameter, for example https://page.com/?marfeelqp=1234567.
Useful when you want to create a dummy item for an iframe/widget.
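A comparable URI can be built with the standard `URL` class. `uniqueUriSketch` is a hypothetical helper that takes the href as an argument so it can run outside the browser; the `marfeelqp` parameter name comes from the example above, while the range of the random number is an assumption.

```javascript
// Hypothetical sketch: append a random `marfeelqp` query parameter
// to the given href so the resulting URI is unlikely to repeat.
function uniqueUriSketch(href) {
  const url = new URL(href);
  // Assumption: any random integer is acceptable as the value.
  const randomValue = Math.floor(Math.random() * 1e7);
  url.searchParams.set('marfeelqp', String(randomValue));
  return url.toString();
}

const dummyUri = uniqueUriSketch('https://page.com/');
console.log(dummyUri); // something like https://page.com/?marfeelqp=1234567
```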
# getInnerHtml
Dangerous
We should be very sure about the html that we're extracting as injecting it can break the whole section page.
WC.getInnerHtml(HTMLNode node)
Shortcut function to get the innerHTML from a node.
# extractIframe
(deprecated)
WC.extractIframe(Object pocket, String title)
Returns the selectors to extract an iframe for WC. We should use layoutDescriptor to insert iframes into mosaic.
...
{
title: 'h2 > a',
uri: 'h2 > a',
...
},
WC.extractIframe({}, 'Top results of Tennis table'),
{
title: '.title',
uri: '.link > a',
...
},
...
# cleanBalconName
WC.cleanBalconName(String name)
Removes invalid characters from the name. The only allowed characters are `a-z` and `0-9`.
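One way to picture the rule is a regular expression that removes everything outside the allowed ranges. `cleanNameSketch` is a hypothetical helper; in particular, it simply drops uppercase letters because they fall outside `a-z`, and whether the library lowercases them first is not stated here.

```javascript
// Hypothetical sketch: remove every character that is not a-z or 0-9.
const cleanNameSketch = (name) => name.replace(/[^a-z0-9]/g, '');

console.log(cleanNameSketch('top news 2024!')); // topnews2024
console.log(cleanNameSketch('sports_and-others')); // sportsandothers
```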
# capitalize
WC.capitalize(String string)
Transforms the first character of a string to upper case.
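A minimal sketch of the transformation, using a hypothetical `capitalizeSketch` helper; the rest of the string is left unchanged.

```javascript
// Hypothetical sketch: upper-case the first character only.
const capitalizeSketch = (string) =>
  string.charAt(0).toUpperCase() + string.slice(1);

console.log(capitalizeSketch('sports news')); // Sports news
console.log(capitalizeSketch('')); // (empty string)
```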
# containsClass
WC.containsClass(HTMLNode element, String className)
Returns true if the element class list contains the className defined.
# containsId
WC.containsId(HTMLNode element, String id)
Returns true if the element.id is equal to the id.
# encodeToLatinAlfabet
WC.encodeToLatinAlfabet(String string)
Applies the JavaScript `encodeURI` function and then applies `cleanBalconName`.
Very useful for content group names that contain accented or Asian characters.
# prefferedImgAttribute
(only puppeteer ripper)
modifiers: [WC.prefferedImgAttribute(attributeName)]
Prioritizes the attribute specified when obtaining the image using the default media extractor.
Example:
selector: '.news-article',
extractors: {
uri: 'h2 > a',
title: 'h2 > a',
media: 'img'
},
modifiers: [WC.prefferedImgAttribute('data-image-url')]
It will try to get the media from the `data-image-url` attribute of the selected IMG node.
Call to contributions
Not all WC library methods are described yet.
Open an issue on MarfeelDocs if you know what a function does, or directly open a PR to update this file.
Missing functions:
- createGroups
- removeUnwantedNodes
- createSectionStaticContent
- createSectionHtmlContent