# Glossary
This glossary defines the content-platform terms used at Marfeel and shows how they relate to each other.
Reach out to add more definitions!
# Alibaba
MarfeelAlibaba is the section extraction orchestrator. It selects the configured Ripper and Extractor to retrieve section information from the target tenant.
By default, it uses whiteCollarRipper and boilerpipeExtractor.
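As a rough illustration of the orchestration described above, the following sketch picks a ripper and an extractor by name and runs them in sequence. Only the two default names come from this glossary; `TenantConfig`, `RIPPERS`, `EXTRACTORS`, `extract_section`, and the stand-in functions are hypothetical and do not reflect the real MarfeelAlibaba code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases: a ripper returns raw section items for a URL,
# an extractor turns each raw item into structured section information.
Ripper = Callable[[str], List[str]]
Extractor = Callable[[str], Dict[str, str]]


def white_collar_ripper(url: str) -> List[str]:
    # Stand-in for the default ripper; a real one would fetch and parse the page.
    return [f"{url}#item-1", f"{url}#item-2"]


def boilerpipe_extractor(raw_item: str) -> Dict[str, str]:
    # Stand-in for the default extractor.
    return {"source": raw_item, "extractor": "boilerpipeExtractor"}


# Hypothetical registries mapping configured names to implementations.
RIPPERS: Dict[str, Ripper] = {"whiteCollarRipper": white_collar_ripper}
EXTRACTORS: Dict[str, Extractor] = {"boilerpipeExtractor": boilerpipe_extractor}


@dataclass
class TenantConfig:
    # Hypothetical configuration shape; the defaults mirror the ones named above.
    url: str
    ripper: str = "whiteCollarRipper"
    extractor: str = "boilerpipeExtractor"


def extract_section(config: TenantConfig) -> List[Dict[str, str]]:
    """Pick the configured ripper and extractor, then run them in sequence."""
    ripper = RIPPERS[config.ripper]
    extractor = EXTRACTORS[config.extractor]
    return [extractor(item) for item in ripper(config.url)]
```

A tenant that sets no overrides would call `extract_section(TenantConfig(url="https://example.com"))` and get the default pair.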
# AMP
AMP stands for Accelerated Mobile Pages. It is an open standard framework for any publisher to have pages load quickly on mobile devices.
At Marfeel, AMP pages are generated automatically using MarfeelJigsaw.
# Boilerpipe
Boilerpipe is the component in charge of extracting article pages. It includes Fetchers, Extractors, and SAXProcessors.
# Details
Details is the Marfeel term for article pages.
# DocumentModifiers
DocumentModifiers allow further transformations to HTML elements. They target specific elements and have multiple purposes.
Typical uses include collapsing images into galleries and removing unwanted content from the article.
TIP
Find the details of DocumentModifiers in the Article extraction article.
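To make the idea of a modifier that targets specific elements more concrete, here is a minimal sketch that removes unwanted elements by class name. It uses Python's standard `xml.etree.ElementTree` on a well-formed fragment; the function name `remove_by_class` and the `related-links` class are assumptions for illustration, not actual DocumentModifiers.

```python
import xml.etree.ElementTree as ET


def remove_by_class(root: ET.Element, class_name: str) -> int:
    """Remove every descendant whose class attribute contains class_name."""
    removed = 0
    # Materialise the iterators first: removing children while iterating is unsafe.
    for parent in list(root.iter()):
        for child in list(parent):
            if class_name in child.get("class", "").split():
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical article fragment with one unwanted block.
article = ET.fromstring(
    "<article>"
    "<p>First paragraph.</p>"
    '<div class="related-links"><a href="/other">Other story</a></div>'
    "<p>Second paragraph.</p>"
    "</article>"
)
removed_count = remove_by_class(article, "related-links")
cleaned = ET.tostring(article, encoding="unicode")
```

After the call, `cleaned` holds the article markup without the `related-links` block.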
# Extractor
Extractors are components in charge of information retrieval. Depending on the context, the term can refer to different Marfeel components.
# Extractor in WhiteCollar
In a WhiteCollar configuration file, extractor is where the configuration for article retrieval is set.
# Section extractor
Section extraction is handled by MarfeelAlibaba.
# Article extractor
MarfeelExtractor is a component within Boilerpipe. It retrieves the text content in article pages.
TIP
BoilerpipeExtractor is the default extractor for non-MarfeelPress tenants.
BoilerpipePressExtractor is the default extractor configuration for tenants using MarfeelPress.
# Provider extractor
Provider extractors are designed to automatically detect providers in tenant pages. They are created along with the provider implementation.
TIP
Learn the details in its dedicated article.
# Metadata extractor
A metadata extractor retrieves information from the tenant's page, either mosaic or details, to pass it to widgets or ad servers.
# Fetcher
Fetchers retrieve content from the tenant's site. The content is then processed by MarfeelExtractor.
TIP
Check the Article pages extraction article for a complete picture of the article extraction process.
# Gutenberg
Gutenberg is Marfeel's backend: a monorepo that contains the core components for extracting content, processing it, and serving the Marfeelized result. It also includes the Marfeel Insight application.
# Invalidation
Invalidation at Marfeel refers to the process of refreshing content. It covers triggering the process at the right time, extracting and processing the content, and refreshing the cache layers. The invalidation is finished once the new content lands in the production environment.
# Jigsaw
MarfeelJigsaw is a library that transforms Marfeel HTML into AMP-compliant HTML using filters and XSL transformations.
# Mosaic
Mosaic is the Marfeel term for section pages.
# Ripper
A Ripper is the component of MarfeelAlibaba in charge of retrieving the necessary information to populate section pages in Marfeel. There are several Ripper implementations: whitecollarRipper, PuppeteerRipper, JsoupRipper, MPressRipper...
These implementations can run on Gutenberg or on the MarfeelMRippers microservice, depending on the active feature toggles. For example, one toggle controls JsoupRipper microservice execution.
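The toggle-based choice of execution target can be pictured with the sketch below. The toggle naming scheme (`<ripper>.microservice`) and the functions `run_in_gutenberg`, `run_in_microservice`, and `run_ripper` are invented for illustration; the real toggle names and dispatch logic are not documented here.

```python
from typing import Set


def run_in_gutenberg(ripper: str, url: str) -> str:
    # Stand-in for running the ripper inside Gutenberg.
    return f"gutenberg:{ripper}:{url}"


def run_in_microservice(ripper: str, url: str) -> str:
    # Stand-in for delegating the ripper to the MarfeelMRippers microservice.
    return f"mrippers:{ripper}:{url}"


def run_ripper(ripper: str, url: str, active_toggles: Set[str]) -> str:
    """Route the ripper to the microservice only when its toggle is active."""
    toggle = f"{ripper}.microservice"  # hypothetical toggle naming scheme
    if toggle in active_toggles:
        return run_in_microservice(ripper, url)
    return run_in_gutenberg(ripper, url)
```

With the set `{"JsoupRipper.microservice"}` active, only JsoupRipper would be routed to the microservice; every other ripper would keep running in Gutenberg.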
Ripper changes deploy
Because Ripper implementations are also used in the MarfeelMRippers microservice, it is important to update the microservice after deploying changes to Gutenberg. For any change to a ripper in Gutenberg, contact the Content Platform chapter to receive instructions regarding the microservices shuttle.
# SAXProcessors
SAXProcessors are processing tools in charge of detecting and modifying HTML elements during article extraction. They process images, media, commenting systems, and similar elements, replacing them as needed to produce the Marfeelized version of the article.
SAX
SAX stands for Simple API for XML.
TIP
There are two SAXProcessors in Marfeel: ImageDocumentSAXProcessor and HTMLDocumentSAXProcessor.
Check their behaviour in detail in the Article extraction article.
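For readers unfamiliar with SAX-style processing, the sketch below shows the general technique with Python's standard `xml.sax` module: the document is streamed event by event, and selected elements are rewritten on the fly. The rewrite itself (moving `src` to `data-src` on `<img>`) and the names `ImageRewritingHandler` and `process` are assumptions chosen for illustration; they are not the behaviour of ImageDocumentSAXProcessor or HTMLDocumentSAXProcessor.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class ImageRewritingHandler(XMLGenerator):
    """Streams the document through unchanged, except for <img> elements."""

    def startElement(self, name, attrs):
        if name == "img":
            # Hypothetical rewrite: move src to data-src, keep other attributes.
            rewritten = {k: v for k, v in attrs.items() if k != "src"}
            rewritten["data-src"] = attrs.get("src", "")
            attrs = AttributesImpl(rewritten)
        super().startElement(name, attrs)


def process(markup: str) -> str:
    """Parse well-formed markup and return it with <img> elements rewritten."""
    out = io.StringIO()
    handler = ImageRewritingHandler(out, encoding="utf-8", short_empty_elements=True)
    xml.sax.parseString(markup.encode("utf-8"), handler)
    return out.getvalue()


result = process('<article><p>Hello</p><img src="a.jpg" alt="A"/></article>')
```

Because the handler reacts to start-element events instead of building a full tree, the same pattern works on large documents without loading them entirely into memory. The sketch requires well-formed input; real-world HTML would need a more tolerant parser.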