Shopify Metafield Normalisation at Scale

Overview

This project involved the large-scale normalisation of product data into structured, typed metafields across a live Shopify store with 6,348 products.

The objective was not cosmetic improvement.
It was to permanently fix broken filtering, inconsistent search behaviour, and weak SEO foundations caused by unreliable underlying data.


The Problem

The store contained thousands of products with:

  • Missing or inconsistent dimensions

  • Unstructured material and colour data

  • Attributes embedded in descriptions instead of structured fields

  • Filters and search results behaving unpredictably

From a user perspective, filters “looked” available — but didn’t work reliably.
From an operational perspective, staff were compensating with manual work and workarounds.


Why It Mattered

Shopify’s filtering, search, feeds, and integrations depend on clean, typed metafields.

Without them:

  • Filters silently fail or return misleading results

  • Search relevance degrades

  • SEO structured data is incomplete

  • Feeds to third-party tools become unreliable
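To make "clean, typed metafields" concrete, a dimension field can be written as a `number_integer` metafield via the Admin REST API. The sketch below builds such a payload; the `custom` namespace and `depth_mm` key are illustrative assumptions, not the store's actual schema.

```python
def build_dimension_metafield(depth_mm: int) -> dict:
    """Build a typed metafield payload for a parent product.

    Shape matches the Shopify Admin REST API's metafield resource
    (POST /admin/api/{version}/products/{id}/metafields.json).
    Namespace and key here are assumed for illustration.
    """
    return {
        "metafield": {
            "namespace": "custom",       # assumed namespace
            "key": "depth_mm",           # assumed key
            "type": "number_integer",    # typed: filters/feeds can rely on it
            "value": depth_mm,
        }
    }
```

Because the field is typed, a decimal or free-text value is rejected by the API instead of silently polluting filters downstream.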

Manual correction was estimated at ~300 staff-hours, with no guarantee of consistency or future safety.


Constraints & Risks

This was a live production store.

Key risks included:

  • Accidentally targeting variants instead of parent products

  • Inserting invalid data types (e.g. decimals into integer fields)

  • API rate limits and partial failures

  • Irreversible data corruption at scale

A one-off script was not acceptable.
The solution had to be safe, repeatable, and auditable.


What Was Done

A custom, production-safe automation pipeline was designed and built with the following characteristics:

Data Extraction

  • Parsed product descriptions, titles, and supplier source material

  • Used OCR where dimensions existed only in PDFs or images

  • AI assistance used for extraction only — nothing was inserted without validation

Validation & Normalisation

  • Strict type enforcement (e.g. integer-only dimensions)

  • Controlled vocabularies for materials and colour families

  • Parent-product targeting only (variant-safe)
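The validation stage can be sketched as a pure function that either accepts a value under strict typing or records an error — nothing ambiguous passes through. The vocabulary terms below are illustrative, not the store's actual list:

```python
# Assumed controlled vocabulary for the material field.
MATERIALS = {"oak", "walnut", "steel", "glass", "rattan"}

def validate_metafields(fields: dict) -> tuple[dict, list[str]]:
    """Return (clean_fields, errors); reject anything that fails typing."""
    clean, errors = {}, []
    # Integer-only dimensions: a decimal is an error, never rounded in.
    for key in ("width_mm", "depth_mm", "height_mm"):
        value = fields.get(key)
        if isinstance(value, int) and value > 0:
            clean[key] = value
        elif value is not None:
            errors.append(f"{key}: expected positive integer, got {value!r}")
    # Materials must come from the controlled vocabulary.
    material = (fields.get("material") or "").strip().lower()
    if material in MATERIALS:
        clean["material"] = material
    elif material:
        errors.append(f"material: {material!r} not in controlled vocabulary")
    return clean, errors
```

Collecting errors instead of raising lets a whole batch be reviewed before any write is attempted.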

Import & Safeguards

  • Shopify Admin API–based updates

  • Idempotent logic (safe to re-run)

  • Full CSV and JSONL outputs

  • Detailed logging for audit and rollback
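The idempotent core can be sketched as a diff between existing and desired values: re-running against already-normalised products plans zero writes, and every decision becomes a JSONL-ready audit record. A minimal sketch under those assumptions:

```python
def plan_writes(existing: dict, desired: dict) -> tuple[dict, list[dict]]:
    """Compute the minimal set of metafield writes (idempotent core).

    Values already correct are skipped, so the pipeline is safe to
    re-run. Each decision is emitted as an audit record that can be
    serialised to JSONL for rollback.
    """
    writes, audit = {}, []
    for key, value in desired.items():
        action = "skip" if existing.get(key) == value else "write"
        if action == "write":
            writes[key] = value
        audit.append({"key": key, "old": existing.get(key),
                      "new": value, "action": action})
    return writes, audit
```

Keeping the old value in each audit record is what makes rollback possible: the inverse plan can be generated from the log alone.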

Every step was verifiable before anything touched the live store.


Scale of Execution

  • 6,348 parent products processed

  • Thousands of structured metafields written

  • Zero variant corruption

  • Zero manual intervention required during execution

This was not a bulk edit — it was controlled, staged automation.


Outcome

  • ~300 staff-hours eliminated

  • ~700–1,000% ROI compared to outsourcing manual entry

  • Filters and faceted navigation now behave predictably

  • Search relevance improved immediately

  • SEO foundations strengthened through structured data

  • A reusable automation system created for future catalogues


Permanent Change

The most important outcome was not speed — it was structural improvement.

Product data hygiene is now enforced by tooling, not people.

Future catalogue additions can be normalised using the same pipeline, preventing the problem from re-emerging.


Key Takeaway

Most Shopify stores don’t have a “filter problem” or a “search problem”.

They have a data problem — and fixing it manually does not scale.

This project replaced fragile human processes with a system that is safe, repeatable, and built for real-world scale.