Shopify Metafield Normalisation at Scale
Overview
This project involved the large-scale normalisation of structured product data across a live Shopify store with 6,348 products.
The objective was not cosmetic improvement: it was to permanently fix broken filtering, inconsistent search behaviour, and weak SEO foundations caused by unreliable underlying data.
The Problem
The store contained thousands of products with:
- Missing or inconsistent dimensions
- Unstructured material and colour data
- Attributes embedded in descriptions instead of structured fields
- Filters and search results behaving unpredictably
From a user perspective, filters “looked” available — but didn’t work reliably.
From an operational perspective, staff were compensating with manual work and workarounds.
Why It Mattered
Shopify’s filtering, search, feeds, and integrations depend on clean, typed metafields.
Without them:
- Filters silently fail or return misleading results
- Search relevance degrades
- SEO structured data is incomplete
- Feeds to third-party tools become unreliable
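What "clean, typed metafields" means in practice can be sketched as follows. The namespace and key names are illustrative, not this store's actual schema; the type strings ("number_integer", "single_line_text_field") are standard Shopify metafield types.

```python
# Sketch: the shape of a typed metafield entry as Shopify's Admin API
# expects it. Namespace/key names here are illustrative examples.
def metafield_payload(namespace: str, key: str, value, mf_type: str) -> dict:
    """Build a single metafield entry with an explicit Shopify type."""
    return {
        "namespace": namespace,
        "key": key,
        "type": mf_type,      # e.g. "number_integer", "single_line_text_field"
        "value": str(value),  # sent as a string; Shopify validates it against the type
    }

width = metafield_payload("specs", "width_cm", 120, "number_integer")
```

A typed field lets Shopify filter and sort numerically instead of treating "120" and "120cm" as unrelated strings.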
Manual correction was estimated at ~300 staff-hours, with no guarantee of consistency or future safety.
Constraints & Risks
This was a live production store.
Key risks included:
- Accidentally targeting variants instead of parent products
- Inserting invalid data types (e.g. decimals into integer fields)
- API rate limits and partial failures
- Irreversible data corruption at scale
A one-off script was not acceptable.
The solution had to be safe, repeatable, and auditable.
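Rate limits and partial failures are the risks a repeatable pipeline has to absorb. A minimal sketch of one standard approach, retries with exponential backoff and jitter; the helper name and parameters are illustrative, not the project's actual code:

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    """Retry a flaky API call with exponential backoff and jitter.

    Illustrative sketch: `call` is any zero-argument function. A real
    Shopify pipeline would also honour the Retry-After header on 429s.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise  # surface the failure after the final attempt
            # Back off exponentially, with jitter to avoid bursts.
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.random() * 0.1)
```

Because each write is also idempotent, a re-run after a partial failure simply picks up where the last run left off.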
What Was Done
A custom, production-safe automation pipeline was designed and built with the following characteristics:
Data Extraction
- Parsed product descriptions, titles, and supplier source material
- Used OCR where dimensions existed only in PDFs or images
- Assisted by AI for extraction only, never blind insertion
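The description-parsing step can be sketched as a pattern match for "W x D x H" style dimensions. The regex and unit handling are illustrative; the real pipeline also handled OCR output and supplier source material.

```python
import re

# Illustrative pattern: three numbers separated by "x" (or "×"),
# followed by a cm unit, anywhere in free text.
DIM_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*[x×]\s*(\d+(?:\.\d+)?)\s*cm",
    re.IGNORECASE,
)

def extract_dimensions(text: str):
    """Return (width, depth, height) in cm, or None if nothing matched."""
    m = DIM_RE.search(text)
    if not m:
        return None
    return tuple(float(g) for g in m.groups())

extract_dimensions("Oak sideboard, 120 x 45 x 80cm, two doors")
# → (120.0, 45.0, 80.0)
```

Returning None instead of guessing keeps ambiguous products out of the write path and in a review queue.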
Validation & Normalisation
- Strict type enforcement (e.g. integer-only dimensions)
- Controlled vocabularies for materials and colour families
- Parent-product targeting only (variant-safe)
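The validation layer can be sketched in two parts: reject anything that would violate an integer field rather than coercing it silently, and map free-text materials onto a controlled vocabulary. The vocabulary entries below are illustrative examples, not the store's actual list.

```python
# Illustrative controlled vocabulary: raw strings → canonical values.
MATERIAL_VOCAB = {
    "oak": "Oak",
    "solid oak": "Oak",
    "mdf": "Engineered Wood",
    "plywood": "Engineered Wood",
}

def validate_integer_dimension(value):
    """Accept only whole-number dimensions; anything else returns None
    so the product is flagged for review, never coerced silently."""
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return None

def normalise_material(raw: str):
    """Map a raw material string to the vocabulary, or None for review."""
    return MATERIAL_VOCAB.get(raw.strip().lower())
```

Anything that fails validation never reaches the API, which is what removed the "decimals into integer fields" risk.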
Import & Safeguards
- Shopify Admin API–based updates
- Idempotent logic (safe to re-run)
- Full CSV and JSONL outputs
- Detailed logging for audit and rollback
Every step was verifiable before anything touched the live store.
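The idempotent, auditable write step can be sketched as: only plan an update when the proposed value differs from what the store already holds, and serialise every planned change as a JSONL audit line. Field names here are illustrative, not the pipeline's actual log schema.

```python
import json

def plan_update(product_id: int, key: str, current, proposed):
    """Return an update action, or None when the store already matches.

    Unchanged fields produce no writes, which is what makes re-runs safe.
    """
    if current == proposed:
        return None
    return {
        "product_id": product_id,
        "key": key,
        "old": current,   # retained so the change can be rolled back
        "new": proposed,
    }

def audit_line(action: dict) -> str:
    """Serialise one planned change for the JSONL audit/rollback log."""
    return json.dumps(action, sort_keys=True)
```

Because every action is logged with its old value before being applied, the run can be verified up front and reversed if needed.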
Scale of Execution
- 6,348 parent products processed
- Thousands of structured metafields written
- Zero variant corruption
- Zero manual intervention required during execution
This was not a bulk edit — it was controlled, staged automation.
Outcome
- ~300 staff-hours eliminated
- ~700–1,000% ROI compared to outsourcing manual entry
- Filters and faceted navigation now behave predictably
- Search relevance improved immediately
- SEO foundations strengthened through structured data
- A reusable automation system created for future catalogues
Permanent Change
The most important outcome was not speed — it was structural improvement.
Product data hygiene is now enforced by tooling, not people.
Future catalogue additions can be normalised using the same pipeline, preventing the problem from re-emerging.
Key Takeaway
Most Shopify stores don’t have a “filter problem” or a “search problem”.
They have a data problem — and fixing it manually does not scale.
This project replaced fragile human processes with a system that is safe, repeatable, and built for real-world scale.