
Catalog One

Topsort's catalog standardization system helps marketplaces, retailers, and brands keep their product data consistent, searchable, and free of duplicates. By recognizing and unifying brands, categories, and products across varied catalogs, Topsort enables better product discovery, reporting, and ad targeting.

Requirements:

Customers send their product catalog to our Catalog API, which accepts product data in a structured format that includes key attributes such as title, description, brand name, category name, and product ID (see Schema Description).
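As a minimal sketch of preparing a catalog for upload, the Python below models one product record and serializes a batch to JSON. The field names (product_id, title, description, brand_name, category_name), the "products" wrapper key, and the build_payload helper are hypothetical; the authoritative field names and format are given in the Schema Description.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical record layout: the real Catalog API schema is defined in the
# Schema Description and may use different field names.
@dataclass
class ProductInput:
    product_id: str
    title: str
    description: str
    brand_name: str
    category_name: str


def build_payload(products: list[ProductInput]) -> str:
    """Serialize a batch of products to JSON, rejecting records without an ID or title."""
    for p in products:
        if not p.product_id or not p.title:
            raise ValueError(f"product is missing an ID or title: {p!r}")
    # Hypothetical top-level key wrapping the product list.
    return json.dumps({"products": [asdict(p) for p in products]})
```

Validating required attributes client-side before sending is optional, but it catches incomplete records before they reach the standardization task.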

Endpoint engineering features:

Asynchronous processing: Once a catalog is received, Topsort triggers an offline standardization task that processes it using matching logic and machine learning models. Total processing time depends on the size of the catalog; as a rough estimate, the task handles around 10 products per second.
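Because throughput is roughly constant, the expected duration can be approximated by dividing the product count by about 10 products per second. The helper below is a hypothetical illustration of that arithmetic, not part of the API: a 10,000-product catalog works out to about 1,000 seconds, or roughly 17 minutes.

```python
import math

# Rough throughput quoted above; actual speed may vary with catalog contents.
PRODUCTS_PER_SECOND = 10


def estimate_processing_seconds(num_products: int, rate: float = PRODUCTS_PER_SECOND) -> float:
    """Return a rough estimate of standardization time in seconds."""
    if num_products < 0:
        raise ValueError("num_products must be non-negative")
    return num_products / rate


# 10,000 products / 10 per second = 1,000 s, rounded up to whole minutes.
seconds = estimate_processing_seconds(10_000)
print(f"{seconds:.0f} s (~{math.ceil(seconds / 60)} min)")  # prints "1000 s (~17 min)"
```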

Endpoint ML features:

Brand recognition: Matches the input brand to a canonical brand ID using a mix of fuzzy matching, pre-trained models, and large language models.
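The fuzzy-matching part of brand recognition can be illustrated with Python's standard difflib module. This sketch covers only string similarity (the pre-trained models and LLMs Topsort also uses are not shown), and the CANONICAL_BRANDS table, its IDs, and the 0.8 cutoff are hypothetical.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical brand table: normalized name -> canonical brand ID.
CANONICAL_BRANDS = {
    "nike": "brand_001",
    "adidas": "brand_002",
    "new balance": "brand_003",
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def match_brand(raw_brand: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical brand ID for raw_brand, or None if no close match exists."""
    key = normalize(raw_brand)
    if key in CANONICAL_BRANDS:  # exact match after normalization
        return CANONICAL_BRANDS[key]
    # Fall back to the most similar known name above the similarity cutoff.
    candidates = difflib.get_close_matches(key, list(CANONICAL_BRANDS), n=1, cutoff=cutoff)
    return CANONICAL_BRANDS[candidates[0]] if candidates else None
```

For example, a misspelling such as "New Balanse" still resolves to the "new balance" entry, while an unknown brand returns None and would be left for the model-based steps.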

Category classification: Automatically maps free-text categories into a standardized category taxonomy, enabling consistent browsing and reporting. Topsort uses the Google product taxonomy as its reference for categories.
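As a rough illustration of mapping free text onto a fixed taxonomy, the baseline below scores each candidate path by token overlap (Jaccard similarity) and returns the best one. This is a naive stand-in, not Topsort's classifier, and the three taxonomy paths are a small illustrative subset written in the style of the Google product taxonomy.

```python
import re
from typing import Optional

# Illustrative subset only; the full taxonomy contains thousands of paths.
TAXONOMY = [
    "Apparel & Accessories > Shoes",
    "Electronics > Audio > Audio Components > Headphones & Headsets",
    "Home & Garden > Kitchen & Dining > Cookware & Bakeware",
]


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens; separators such as '&' and '>' are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_category(raw_category: str) -> Optional[str]:
    """Return the taxonomy path with the highest Jaccard token overlap, or None if nothing overlaps."""
    query = tokens(raw_category)
    best_path, best_score = None, 0.0
    for path in TAXONOMY:
        candidate = tokens(path)
        union = query | candidate
        score = len(query & candidate) / len(union) if union else 0.0
        if score > best_score:
            best_path, best_score = path, score
    return best_path
```

Exact token overlap misses synonyms and singular/plural variants ("shoe" vs "shoes"), which is one reason learned models are preferred for this task.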

Product deduplication: Detects and links duplicate products across the catalog, grouping them under a unified master_product_id using techniques such as similarity scoring and vector-based clustering.
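A minimal sketch of the grouping step, assuming each product already has an embedding vector: pairs whose cosine similarity reaches a threshold are merged with a union-find structure, and each group's smallest product ID serves as its master_product_id. The embeddings, the 0.95 threshold, and the ID-selection rule are assumptions for illustration. The all-pairs comparison is O(n²), so a production system would more likely use approximate nearest-neighbor search.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_duplicates(embeddings: dict[str, list[float]], threshold: float = 0.95) -> dict[str, str]:
    """Map each product ID to a hypothetical master_product_id (smallest ID in its group)."""
    ids = sorted(embeddings)
    parent = {pid: pid for pid in ids}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Attach the larger root under the smaller so each root is its group's smallest ID.
                    parent[max(ra, rb)] = min(ra, rb)
    return {pid: find(pid) for pid in ids}
```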

Output description:

The output is a clean, deduplicated, and enriched catalog in which each product is linked to its recognized brand, standardized category, and master product group. Topsort can also include a score for each inference task.
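One way a consumer might represent a standardized record and use the per-task scores, for example to route low-scoring inferences to manual review, is sketched below. The field names, the score keys, and the 0.8 threshold are hypothetical; the actual output schema is described in the endpoint documentation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandardizedProduct:
    # Hypothetical field names for one record of the standardized catalog.
    product_id: str
    brand_id: Optional[str]
    category: Optional[str]
    master_product_id: str
    # Hypothetical: one score per inference task, e.g. {"brand": 0.93}.
    scores: dict[str, float] = field(default_factory=dict)


def low_score_tasks(product: StandardizedProduct, min_score: float = 0.8) -> list[str]:
    """Return the inference tasks whose score falls below min_score, sorted by name."""
    return sorted(task for task, score in product.scores.items() if score < min_score)
```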

To consume the standardized catalog, refer to the endpoint documentation. The endpoint is paginated and has a rate limit of 10 requests per second.
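Because the endpoint is paginated and limited to 10 requests per second, a client should space its requests at least 0.1 seconds apart while following pages until none remain. The sketch below assumes a cursor-style pagination scheme accessed through a hypothetical fetch_page callable that returns (items, next_cursor); the real request parameters are defined in the endpoint documentation.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and
# returns (items, next_cursor), with next_cursor None on the last page.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_catalog(fetch_page: FetchPage, max_rps: float = 10.0) -> Iterator[dict]:
    """Yield every product across all pages without exceeding max_rps requests per second."""
    min_interval = 1.0 / max_rps
    last_request = float("-inf")
    cursor: Optional[str] = None
    while True:
        # Sleep just long enough to keep requests at least min_interval apart.
        wait = min_interval - (time.monotonic() - last_request)
        if wait > 0:
            time.sleep(wait)
        last_request = time.monotonic()
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break
```

To use it, wrap your HTTP client call in a function with that shape and iterate, e.g. `for product in iter_catalog(my_fetch_page): ...`. Spacing requests evenly is the simplest way to stay under the limit; a token bucket would allow short bursts instead.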