This document describes how to auto-correct postcode input to improve lookup accuracy: stripping spaces, fixing look-alike characters (0/O and 1/L), converting Shift-key symbols back to digits, and trimming trailing junk when the rest is a valid postcode. Applied to a single postcode field in the Lovefilm checkout, these fixes produced a 2.5% increase in conversion.
Postcode auto correct
1. Lovefilm postcode lookup:
As typed    Corrected   Shift-key entries
CRO 1XA     CR0 1XA     SE!£ ^DH
CR0 lXA     CR0 1XA     SE% $RL
CRO1XA      CR01XA      CT^&EF
CR0lXA      CR01XA      EC!A 1DF
CRO 1XA     CR0 1XA     SE£ (SH
CR0 lXA     CR0 1XA     SE3 9SH.
Strip spaces. Fix transposed characters (0/1/L/O). Transform
shifted characters. Trim if front part matches post office codes.
Clean it. Auto fix = 2.5% increase in conversion for one field.
Outcodes and Incodes: bit.ly/yhR9Oy
Editor's notes
Aha, hard to explain this without being there. I spent 2 days once optimising a postcode lookup field for Lovefilm.

OK, let’s say you get no problems: everyone finds their addresses, orders their stuff and is happy. Birds tweet. Plinky music plays. SkreeeeEEET. Wait a minute, what if it’s not like that? So I looked at the postcode rules (use the link at the bottom) and studied these in detail. Worked out how they are made and validated etc. I then looked at the website and found tons of stuff in the FAILED postcode lookups.

Apart from the odd person putting in things like DICK and laughing alone whilst they do it, what did we find? People transpose the letter ‘L’ and the number ‘1’. People also transpose the letter ‘O’ and the number ‘0’. People put 1, 2 or 3 spaces in the middle or at the end of the postcode. Some people use CAPS; the majority use lower case.

Why do they do those then? Well, the transposition is because they write it like an envelope. It’s in CAPS on the envelope (it’s probably an older demographic?) so it gets typed in as CAPS. And people confuse their own postcode or use the wrong letter.

We can autocorrect this, so we just fix the transpositions (because we know what the customer meant anyway, based on the post office rules: all these codes in the two columns can be inferred). We then remove spaces or anything else that doesn’t look relevant (a full stop on the end, say, when the rest is a valid postcode).

Now, here’s the really clever bit. What the hell is that stuff in the right hand column? Rubbish? No, perfectly serviceable postcodes. Why are people shifting characters? Well, they’re not touch typists so they’re looking at the keyboard, right? And they’re typing in CAPS, so they’re holding Shift for each character. So when they do the numbers, they get £ and % signs. We can auto convert these.

We applied many small techniques like this to ONE FIELD in a checkout process. This resulted in a 2.5% increase in conversion. Sadly they’ve now lost this ‘feature’, which is pretty silly.
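
The fixes described in these notes can be sketched in a few lines of Python. This is an illustration only, not the code Lovefilm ran: the function name `autocorrect`, the UK-keyboard Shift mapping and the two format patterns are all assumptions. In particular, the patterns only check the general shape of an outcode and incode; a real implementation would check the outcode against the actual post office list.

```python
import re

# Assumption: UK keyboard layout, where Shift+1..0 gives ! " £ $ % ^ & * ( )
SHIFTED = str.maketrans('!"£$%^&*()', "1234567890")

# Look-alike characters that get typed in place of a digit.
LOOKALIKE = {"O": "0", "L": "1"}

# Loose shape checks only (assumption): not a full validity check.
OUTCODE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?")
INCODE = re.compile(r"[0-9][A-Z]{2}")


def autocorrect(raw):
    """Return a cleaned 'OUTCODE INCODE' string, or None if it cannot be fixed."""
    # Upper-case, then turn Shift-key symbols back into digits.
    text = raw.upper().translate(SHIFTED)
    # Drop spaces, full stops and anything else that is not a letter or digit.
    text = re.sub(r"[^A-Z0-9]", "", text)
    if not 5 <= len(text) <= 7:
        return None

    # The incode is always the last three characters: digit, letter, letter.
    out, inc = text[:-3], text[-3:]
    inc = LOOKALIKE.get(inc[0], inc[0]) + inc[1:]
    if not INCODE.fullmatch(inc):
        return None

    # Only touch the outcode if it does not already have a valid shape,
    # so a genuine 'L' or 'O' (e.g. L1, OX1) is left alone.
    if not OUTCODE.fullmatch(out):
        for i in range(1, len(out)):
            if out[i] in LOOKALIKE:
                candidate = out[:i] + LOOKALIKE[out[i]] + out[i + 1:]
                if OUTCODE.fullmatch(candidate):
                    out = candidate
                    break
        else:
            return None

    return out + " " + inc
```

With these assumptions, `autocorrect("CRO 1XA")` returns `"CR0 1XA"` and `autocorrect("SE!£ ^DH")` returns `"SE13 6DH"`, while junk such as `"DICK"` returns `None` and can fall through to the normal error message.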