# Section pagination
The section pagination feature replaces a section's feed definition, respecting the tenant's original website pagination.
It works on any section paginated by the tenant, including the usual dynamic sections such as tags or author.
Pagination improves SEO because it leads to more crawling of section pages, which in turn results in more articles being crawled.
Infinite scroll pairs with section pagination to improve user engagement by transparently appending more content to the section while the user scrolls.
Infinite scroll is not visible to search engine crawlers, so it does not affect the SEO impact of pagination.
We must answer these two questions to paginate a section:
- Before the extraction, at page request time: is the requested url a section page? If so, from which section?
- At section extraction time: what are the next and previous pages of this section?
Once we answer these questions, we can paginate anything.
# Required configuration
# Page pattern
We need a pagePattern to recognise a section page.
A pagePattern at Marfeel is a regular expression that reflects the way the tenant implements pagination.
For instance, if the tenant appends /page/2/ to a section url to mark page #2, the pagePattern is "/page/([0-9]+)/".
- "[0-9]+" makes it work on any page number,
- the parentheses capture the actual page number.
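As a rough illustration of the two points above, the sketch below applies the example pattern with Python's `re` module. The helper name `extract_page_number` is hypothetical, and Python's regex engine only approximates whatever engine the Marfeel backend uses.

```python
import re
from typing import Optional

# Pattern taken from the example above.
PAGE_PATTERN = re.compile(r"/page/([0-9]+)/")


def extract_page_number(url: str) -> Optional[int]:
    """Return the captured page number, or None if the URL is not a section page."""
    match = PAGE_PATTERN.search(url)
    # group(1) is the part captured by the parentheses.
    return int(match.group(1)) if match else None
```

Calling `extract_page_number("https://tenant.com/sport/page/2/")` yields the integer 2, while a URL without the `/page/<number>/` suffix yields `None`.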
Once a page is identified as a section page, rippers can look for clues in the DOM while extracting the section.
Selectors such as [.next, .previous] help the ripper find the right tags: the ones whose href links to the previous/next pages.
# Standard Example
Let's render tenant.com/sport/page/42.
Is the requested url a section page? If so, from which section?
Yes, it is a page of the sport section, since it matches the sport pagePattern: "tenant.com/sport/page/([0-9]+)".
TIP
page/number is a standard way of building pagination URLs, so it can be detected automatically by Gutenberg.
Let's load this section page.
What are next/previous pages of this section?
When we extract sport/page/42, we find the pagination links:
previous = tenant.com/sport/page/41 ; next = tenant.com/sport/page/43
We can now render the pagination with appropriate links [ 41 | 42 | 43 ]
Infinite scroll kicks in and appends the content of these links automatically while the user scrolls down.
# Custom Example
Let's render tenant.com/news/10_0_0_0_0_3.
Is the requested url a section page? If so, from which section?
Without any configuration in the definition.json, this page appears to be an article.
If the news section has the pattern "tenant.com/news/10_0_0_0_0_([0-9]+)", it can be recognised as a section.
What are next/previous pages of this section?
If the HTML tags for pagination are not standard, like this:
<div class="paginate">
  <a class="prev2" href="/news/10_0_0_0_0_2">prev</a>
  <a href="/news/10">1</a>
  <a href="/news/10_0_0_0_0_2">2</a>
  <a href="/news/10_0_0_0_0_3">3</a>
  <a href="/news/10_0_0_0_0_4">4</a>
  <a href="/news/10_0_0_0_0_5">5</a>
  <a class="next2" href="/news/10_0_0_0_0_4">next</a>
</div>
We need the tags to be configured in the section's ripper file, with .paginate>.prev2 and .paginate>.next2 to identify the previous and next pages.
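To make the effect of those two selectors concrete, here is a minimal sketch that emulates them with Python's standard `html.parser`. It is not the ripper's implementation: the class `PaginateLinkParser` and the function `find_prev_next` are hypothetical names, and the sketch only handles `<a>` tags that are direct children of a `.paginate` element.

```python
from html.parser import HTMLParser

# The tenant markup from the example above.
HTML = """
<div class="paginate">
<a class="prev2" href="/news/10_0_0_0_0_2">prev</a>
<a href="/news/10">1</a>
<a href="/news/10_0_0_0_0_2">2</a>
<a href="/news/10_0_0_0_0_3">3</a>
<a href="/news/10_0_0_0_0_4">4</a>
<a href="/news/10_0_0_0_0_5">5</a>
<a class="next2" href="/news/10_0_0_0_0_4">next</a>
</div>
"""


class PaginateLinkParser(HTMLParser):
    """Hypothetical helper emulating `.paginate>.prev2` and `.paginate>.next2`."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # class lists of the currently open elements
        self.links = {}         # class name -> href of the first matching <a>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        parent_classes = self.open_classes[-1] if self.open_classes else []
        # Only direct children of an element with class "paginate" qualify.
        if tag == "a" and "paginate" in parent_classes and attributes.get("href"):
            for name in classes:
                self.links.setdefault(name, attributes["href"])
        self.open_classes.append(classes)

    def handle_endtag(self, tag):
        if self.open_classes:
            self.open_classes.pop()


def find_prev_next(html):
    """Return the (previous, next) hrefs, or None for a link that is missing."""
    parser = PaginateLinkParser()
    parser.feed(html)
    return parser.links.get("prev2"), parser.links.get("next2")
```

For the markup above, `find_prev_next(HTML)` returns the hrefs of the `prev2` and `next2` anchors. The element stack is deliberately naive: unclosed void elements such as `<br>` would confuse it, which a real CSS selector engine handles properly.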
# Feature toggles
Enable pagination with the feature flag renderSectionPagination (disabled by default).
Infinite scroll is controlled by the flag lazyPagination.
It is enabled by default, so it automatically kicks in when pagination is enabled.
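The interaction between the two flags can be summarised in a few lines. This is only a sketch of the defaults described above: the function `pagination_state` and the plain dictionary of flags are hypothetical, and it assumes infinite scroll is active only when pagination itself is enabled.

```python
def pagination_state(flags: dict) -> tuple:
    """Return (pagination_enabled, infinite_scroll_enabled) for a tenant's flags."""
    # renderSectionPagination is disabled by default.
    render = bool(flags.get("renderSectionPagination", False))
    # lazyPagination is enabled by default.
    lazy = bool(flags.get("lazyPagination", True))
    # Assumption: infinite scroll needs pagination to be rendered in the first place.
    return render, render and lazy
```

With no flags set, nothing is paginated. Setting only `renderSectionPagination` to true enables both pagination and infinite scroll, and additionally setting `lazyPagination` to false keeps pagination without infinite scroll.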
# Configuration
Most tenants are paginated with no configuration at all.
If pagination doesn't work by default on a site, it might be due to unrecognised tags or special URL patterns.
# Customize extracted pages
Marfeel relies on tags with a rel attribute to detect pagination.
These tags are <link rel="prev"> and <link rel="next">, or <a rel="prev"> and <a rel="next">.
Each tag must contain an href attribute with the appropriate absolute url.
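The default detection described above can be approximated with Python's standard library. This is a sketch, not the ripper's code: `RelLinkParser` and `find_rel_pagination` are hypothetical names, and skipping relative hrefs is an assumption based on the requirement that the url be absolute.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RelLinkParser(HTMLParser):
    """Collect the first rel="prev" and rel="next" hrefs from <link> and <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href") or ""
        parsed = urlparse(href)
        # Assumption: only absolute http(s) urls are usable as section pages.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return
        for rel in ("prev", "next"):
            if rel in rel_values:
                self.links.setdefault(rel, href)


def find_rel_pagination(html):
    """Return a dict with the 'prev' and/or 'next' urls found in the markup."""
    parser = RelLinkParser()
    parser.feed(html)
    return parser.links
```

A page whose head contains both `<link rel="prev">` and `<link rel="next">` yields a dictionary with both keys. A page with neither yields an empty dictionary, which is the situation where custom selectors become necessary.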
Section pages are extracted depending on those tags, and they can be configured in definition.json.
This configuration is propagated and used by default in both JSOUPRipper and WhiteCollar.
{
  ...
  "configuration" : {
    "pageSelectors" : ".pagination li:first-child a | .pagination li:last-child a"
  },
  ...
}
Note that both prev and next selectors are specified in the same property separated by a pipe (|) character as follows:
"pageSelectors" : "${prevSelector} | ${nextSelector}"
# Show section pagination
In this section, "showing" section pagination is equivalent to infinite scroll, if it is active.
The most common pagination pattern is: /page/([0-9]+)/.
It means that for a given section url, http://example.com/desporte/, the next page would be http://example.com/desporte/page/2/.
This pattern is supported by default, no configuration needed.
Other similar page patterns such as /([0-9]+)/ are automatically configured at tenant scaffolding time by MarfeelAlfred.
If MarfeelAlfred has not detected the page patterns adequately, configure them in the definition.json file.
Affect all sections at once by declaring a pagePattern in the root configuration object of definition.json:
{
  ...,
  "configuration" : {
    "pagePattern": "/page/([0-9]+)/"
  },
  ...
}
Prefer this method if most sections are paginated with the same pattern.
When pagePattern is configured globally in the definition.json, there's no need to add the ".*" prefix.
The backend adds it when necessary, depending on whether the section is a default or a dynamic one.
When the pattern is configured for only one section, add the ".*" yourself when necessary.
If some of the tenant's URLs end with a "/" and others don't, you can also try the following pattern: /page/([0-9]+)/?.
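The effect of the optional trailing slash can be checked quickly with Python's `re` module. This is a sketch that uses `search`, so the pattern may match anywhere in the url; how the backend anchors the pattern is not covered here, and the example urls reuse the example.com section from above.

```python
import re

strict = re.compile(r"/page/([0-9]+)/")    # requires the trailing slash
lenient = re.compile(r"/page/([0-9]+)/?")  # trailing slash is optional

with_slash = "http://example.com/desporte/page/2/"
without_slash = "http://example.com/desporte/page/2"

# The strict pattern only recognises the url that ends with a slash.
strict_results = [bool(strict.search(url)) for url in (with_slash, without_slash)]

# The lenient pattern recognises both and captures the same page number.
lenient_pages = [lenient.search(url).group(1) for url in (with_slash, without_slash)]
```

Here `strict_results` shows that only the first url is recognised by the strict pattern, while `lenient_pages` holds the same captured page number for both urls.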
Exclude individual sections with the enablePagination flag in their specific configuration:
"enablePagination": "false"
By default, the home section is excluded. Include it with the same flag:
"enablePagination": "true"
When tenants don't count the first page of a section in their page numbering, the pageNumberStartsFromZero flag is required in the definition.json configuration.
For example, "Page 1" is test.com/games/ and "Page 2" is test.com/games/page/1/.
"configuration": {
  "pageNumberStartsFromZero": "true"
}
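The numbering shift that this flag describes can be sketched as follows. The function `displayed_page` is a hypothetical helper, not part of Marfeel, and it assumes that a url without a page suffix is always the first page of the section.

```python
import re

# Optional trailing slash, anchored at the end of the url for this sketch.
PAGE_PATTERN = re.compile(r"/page/([0-9]+)/?$")


def displayed_page(url: str, page_number_starts_from_zero: bool = False) -> int:
    """Return the human-facing page number for a section url."""
    match = PAGE_PATTERN.search(url)
    if match is None:
        # No page suffix: this is the section's first page.
        return 1
    number = int(match.group(1))
    # When the tenant does not count the first page, /page/1/ is really "Page 2".
    return number + 1 if page_number_starts_from_zero else number
```

With the flag on, `test.com/games/page/1/` maps to page 2, matching the example above. With the flag off, the same url maps to page 1.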
TIP
Section pagination rendering is now feature toggled for each tenant, so you need to activate the feature renderSectionPagination in order to show section pages.
For sites without a common pattern, set specific pagePatterns in the configuration of each section:
{
  "name" : "tag",
  "title" : "Tags",
  "type" : "DYNAMIC",
  "uri" : "/tag/**",
  "configuration" : {
    "titlePattern" : "/tag/(.*)"
  },
  "alibabaDefinition" : {
    "configuration": {
      "feedRipper" : "jsoupRipper",
      "jsoupSelectors" : "index/src/jsoup/tag.properties"
    }
  },
  "pagePatterns": [{
    "pattern": ".*/page/([0-9]+)"
  }]
}
pagePatterns is an array: define as many patterns as required for a section:
{
  "pagePatterns": [
    {
      "pattern": ".*/page/([0-9]+)"
    },
    {
      "pattern": ".*/pagina/([0-9]+)"
    }
  ]
}
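One way to picture how several patterns might be evaluated is to try them in order and stop at the first match. This is a sketch under assumptions: the function `match_page` is hypothetical, the urls are made up for illustration, and matching the whole url with `re.fullmatch` is a guess suggested by the ".*" prefix rather than documented behaviour.

```python
import re
from typing import Optional

# The two patterns from the configuration above.
PAGE_PATTERNS = [r".*/page/([0-9]+)", r".*/pagina/([0-9]+)"]


def match_page(url: str, patterns=PAGE_PATTERNS) -> Optional[int]:
    """Return the page number from the first pattern that matches the whole url."""
    for pattern in patterns:
        match = re.fullmatch(pattern, url)
        if match:
            return int(match.group(1))
    # No pattern matched: the url is not a page of this section.
    return None
```

Both `/page/3` and `/pagina/12` style urls are recognised, while a url with neither suffix returns `None`. Under this whole-url assumption a trailing slash would prevent a match unless the pattern ends with `/?`.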
A pattern can also hold its own alibabaDefinition object:
{
  "pagePatterns": [
    {
      "pattern": ".*/page/([0-9]+)",
      "alibabaDefinition" : {
        "configuration" : {
          "feedRipper" : "jsoupRipper",
          "jsoupSelectors" : "index/src/jsoup/tag.properties"
        }
      }
    }
  ]
}
# UI Customization
Rendering the section pagination is not limited to the next and previous pages. We can show as many section pages as we were able to discover during the extraction.
Use ui.json to set the maximum number of pages the paginated section should show:
{
  "siteStructure": {
    "pagination": {
      "numPages": 5
    }
  }
}
The default is 7:
- 3 previous pages
- The current page
- 3 next pages
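The split of numPages around the current page can be sketched as a simple window computation. The function `page_window` is hypothetical, and its handling of the edges (clamping at page 1 and at an optional last page) is an assumption, since the real renderer only shows pages discovered during extraction.

```python
from typing import List, Optional


def page_window(current: int, num_pages: int = 7, last_page: Optional[int] = None) -> List[int]:
    """Return the page numbers to display around the current page."""
    # With the default of 7, this gives 3 pages on each side of the current one.
    half = (num_pages - 1) // 2
    first = max(1, current - half)
    last = current + half
    if last_page is not None:
        last = min(last, last_page)
    return list(range(first, last + 1))
```

For page 42 with the default settings the window runs from 39 to 45. With `num_pages=5`, as in the ui.json example above, it runs from 40 to 44.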
# Local and preview branch behaviour
Because section pagination uses the absolute links provided by the tenant's HTML, keep in mind that clicking a section page link on a local or preview branch takes you to the production environment.