Catalog One

Catalog One is Topsort’s catalog standardization system designed to help marketplaces, retailers, and brands maintain consistent, searchable, and duplicate-free product data. By identifying and unifying brands, categories, and products across different catalogs, Topsort enhances product discovery, reporting, and ad targeting.

Key Benefits

Cleaner Data

No more inconsistent brand and category labels.

Improved UX

Deduplicated listings avoid clutter and repeated content.

Better Attribution

Accurate product identifiers allow better tracking of performance and attribution across channels.

Ad Targeting Ready

Unified data supports product-level ads, retargeting, and analytics.

API Integration

Easy-to-integrate ingestion process with feedback on the result.

Requirements

Customers send their product catalog to the Topsort Catalog API, which accepts product data in a structured format that includes key attributes such as title, description, brand name, category name, and product ID (see Schema Description).
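
As an illustration of the attributes listed above, here is a minimal sketch of a pre-submission check. The class name, the field names, and the `missing_fields` helper are assumptions made for this example only; the actual field names are defined in the Schema Description.

```python
from dataclasses import dataclass, asdict


@dataclass
class CatalogProduct:
    # Hypothetical field names; the real ones come from the Schema Description.
    product_id: str
    title: str
    description: str
    brand_name: str
    category_name: str


def missing_fields(product: CatalogProduct) -> list[str]:
    """Return the names of attributes that are empty or whitespace-only."""
    return [name for name, value in asdict(product).items() if not value.strip()]


product = CatalogProduct(
    product_id="sku-1",
    title="Sparkling Water 500ml",
    description="",
    brand_name="Acme",
    category_name="Beverages",
)
print(missing_fields(product))
```

Running a check like this before sending a catalog can surface incomplete records early, before the standardization task starts.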

Endpoint Engineering Features

Asynchronous Processing: Once a catalog is received, Topsort triggers an offline standardization task. This task processes the catalog using advanced matching logic and machine learning models. Total processing time depends on the size of the original catalog; as a rough estimate, throughput is around 10 products per second.
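
To make the throughput estimate concrete, the sketch below converts a catalog size into an approximate processing time. It assumes the rough figure of 10 products per second holds uniformly, which is a simplification; the function names are illustrative.

```python
def estimate_processing_seconds(num_products: int, products_per_second: float = 10.0) -> float:
    """Approximate standardization time from the rough throughput estimate."""
    if products_per_second <= 0:
        raise ValueError("products_per_second must be positive")
    return num_products / products_per_second


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes, and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


seconds = estimate_processing_seconds(100_000)
print(format_duration(seconds))  # 2h 46m 40s
```

Under this assumption, a catalog of 100,000 products would take a little under three hours, so results should be treated as available later rather than immediately.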

Endpoint ML Features

Brand Recognition: Matches the input brand to a canonical brand ID using a mix of fuzzy matching, pre-trained models, and large language models (LLMs).
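
The fuzzy-matching part of brand recognition can be sketched as follows. This is not Topsort's implementation: it uses Python's standard `difflib` in place of the models mentioned above, and the brand IDs, brand names, and the 0.85 threshold are made up for the example.

```python
from difflib import SequenceMatcher

# Hypothetical canonical brands, keyed by an invented brand ID.
CANONICAL_BRANDS = {
    "brand_001": "Acme Foods",
    "brand_002": "Globex",
}


def normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_brand(raw_brand: str, canonical: dict[str, str], threshold: float = 0.85):
    """Return (brand_id, score) for the closest brand, or (None, score) below the threshold."""
    target = normalize(raw_brand)
    best_id, best_score = None, 0.0
    for brand_id, name in canonical.items():
        score = SequenceMatcher(None, target, normalize(name)).ratio()
        if score > best_score:
            best_id, best_score = brand_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


print(match_brand("ACME-Foods", CANONICAL_BRANDS))
print(match_brand("Unknown Brand", CANONICAL_BRANDS))
```

Returning the score alongside the ID keeps low-confidence matches visible instead of silently forcing every input onto a canonical brand.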

Category Classification: Automatically maps free-text categories into a standardized category taxonomy, enabling consistent browsing and reporting. Topsort uses the Google taxonomy as the reference for categories.

Product Deduplication: Detects and links duplicate products across the catalog. Duplicates are grouped under a unified master_product_id using techniques like similarity scoring and vector-based clustering.
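
The idea of grouping duplicates under one `master_product_id` can be illustrated with a small sketch. It assumes title-only string similarity and a union-find grouping, which stands in for the vector-based clustering described above; the product records, the 0.9 threshold, and the choice of the smallest ID as the master are all assumptions for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_master_ids(products: list[dict], threshold: float = 0.9) -> dict[str, str]:
    """Map each product ID to a master ID shared by all of its near-duplicates."""
    parent = {p["id"]: p["id"] for p in products}

    def find(x: str) -> str:
        # Follow parent links to the group root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Pairwise comparison is quadratic; acceptable for a sketch, not for a large catalog.
    for i, a in enumerate(products):
        for b in products[i + 1:]:
            if similarity(a["title"], b["title"]) >= threshold:
                root_a, root_b = find(a["id"]), find(b["id"])
                if root_a != root_b:
                    # Keep the smaller ID as the root so the master ID is deterministic.
                    parent[max(root_a, root_b)] = min(root_a, root_b)

    return {p["id"]: find(p["id"]) for p in products}


catalog = [
    {"id": "p1", "title": "Acme Sparkling Water 500ml"},
    {"id": "p2", "title": "ACME Sparkling Water 500 ml"},
    {"id": "p3", "title": "Globex Trail Mix 200g"},
]
print(assign_master_ids(catalog))
```

Here the two sparkling water listings end up in the same group, while the trail mix keeps its own ID as its master.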

Output Description

The output is a clean, deduplicated, and enriched catalog where each product is linked to recognized brands, standardized categories, and master product groups. Topsort can also include scores for each inference task.

To consume the standardized catalog, see the endpoint documentation. The endpoint is paginated and rate-limited to 10 requests per second.
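
A client that walks a paginated endpoint while staying under 10 requests per second could look like the sketch below. The pagination scheme (a token returned with each page, `None` on the last page) and the `fetch_page` callable are assumptions for this example; the endpoint documentation defines the real request and response shapes.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical page fetcher: takes a page token, returns (items, next_token).
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def iter_catalog(
    fetch_page: FetchPage,
    max_requests_per_second: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator:
    """Yield items from every page, spacing requests to respect the rate limit."""
    min_interval = 1.0 / max_requests_per_second
    token: Optional[str] = None
    last_request: Optional[float] = None
    while True:
        if last_request is not None:
            # Wait out whatever remains of the minimum gap between requests.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        items, token = fetch_page(token)
        yield from items
        if token is None:
            break


# Usage with an in-memory stand-in for the real endpoint.
PAGES = {None: (["a", "b"], "t1"), "t1": (["c"], None)}
print(list(iter_catalog(lambda token: PAGES[token])))
```

Passing `sleep` and `clock` as parameters keeps the throttling logic testable without real delays.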