# JSOUP
Content groups in JSOUP
At the moment, the Jsoup ripper does not support content groups.
TIP
JSOUP performs better than WhiteCollarRipper, so use it whenever possible. If a section doesn't need content groups, try extracting it with JSOUP.
To use the Jsoup ripper for a specific section, use the feedRipper attribute in the section configuration.
"sectionDefinitions" : [ {
  "name" : "seo", --> the name of the section
  "title" : "Seo", --> the title of the section
  "feedDefinitions" : [ {
    "uri" : "https://example.com/seo",
    "alibabaDefinition" : {
      "configuration" : {
        "feedRipper" : "jsoupRipper",
        "jsoupSelectors" : "index/src/jsoup/seo.properties"
      }
    }
  } ]
} ]
jsoupSelectors defines the path of the file containing the article selectors. This file must be under src/jsoup/ in the site code repository.
TIP
The .properties file extension is mainly used in Java to store configurable parameters.
The .properties file functions like a whitecollar: every selector the section needs must be identified and defined in it in order to be extracted.
The following example shows the contents of the seo.properties file referenced in the configuration above:
ARTICLES=article, .article
TITLE=.title
URI=a
IMG=img
DATE=date
AUTHOR=.author
EXCERPT=.excerpt
SUBTITLE=.subtitle
TIP
As the ARTICLES entry shows, multiple selectors can be combined by separating them with commas.
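To make the file format concrete, here is a minimal sketch of how a selector file like the one above can be read with the standard java.util.Properties loader, and how a comma-separated entry breaks down into its alternatives. The class name SelectorFileSketch and the helpers load and alternatives are hypothetical and only serve the illustration; this is not the ripper's actual implementation, which is not described here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper class, for illustration only (not part of the ripper).
class SelectorFileSketch {

    // Parses .properties text into key/value pairs using the standard Java loader.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Splits a comma-separated selector group into its trimmed alternatives.
    // Assumption: the selectors are simple and contain no commas of their own.
    static List<String> alternatives(String selectorGroup) {
        List<String> result = new ArrayList<>();
        for (String part : selectorGroup.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // A shortened copy of the example file shown above.
        String text = String.join("\n",
            "ARTICLES=article, .article",
            "TITLE=.title",
            "URI=a");
        Properties props = load(text);
        System.out.println(alternatives(props.getProperty("ARTICLES")));
        System.out.println(props.getProperty("TITLE"));
    }
}
```

The split is only there to show that ARTICLES holds two alternatives. Jsoup's selector syntax accepts comma-separated groups directly, so a value such as `article, .article` can be passed to a selector query as a whole and matches elements that satisfy either alternative.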
# Static content with JSOUP
We can retrieve static elements using the special selector HTML_ARTICLES in our .properties file.
HTML_ARTICLES=.static-content
HTML_ARTICLES
This is a special extractor. It returns all the HTML inside the selector as static content.
Use it in the layout descriptor with the value jsoupWidget.
Layout descriptor example for this section:
{
  "layouts": [
    {
      "name": "newspaper/pill",
      "key": "jsoupWidget"
    },
    "newspaper/thumb"
  ]
}
In this case, the static content is placed at the top of the section, followed by all the articles.