20-minute talk given at PyData London 2014.

A client in the energy sector wanted to create predictive behavioural models of business customers at the company level. The CRM data was messy, however: a single business often had several sub-accounts with no grouping identifier, so aggregation to company level was impossible. In this talk I describe a short project in which we used text mining, a handful of unsupervised learning techniques and pragmatic use of human skill to identify the true company-level structures in the CRM data.
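
To make the idea concrete, here is a minimal sketch of one way sub-accounts could be grouped by name similarity. It is not the method used in the project, which the abstract does not specify. The suffix list, the 0.5 threshold, the choice of character trigrams with Jaccard similarity, and all function names are assumptions made for illustration.

```python
import re
from itertools import combinations

# Hypothetical list of legal suffixes to ignore; a real project would tune this.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "llp", "inc", "co"}


def normalise(name):
    """Lowercase the name, drop punctuation and common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def trigrams(text):
    """Return the set of character trigrams of a space-padded string."""
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_accounts(names, threshold=0.5):
    """Single-linkage grouping: link every pair whose similarity meets the
    threshold, then return the connected components (via union-find)."""
    parent = list(range(len(names)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    grams = [trigrams(normalise(n)) for n in names]
    for i, j in combinations(range(len(names)), 2):
        if jaccard(grams[i], grams[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up account names, for illustration only.
    accounts = [
        "Acme Energy Ltd",
        "ACME Energy Limited",
        "Acme Energy Ltd.",
        "Brightwater Foods PLC",
        "Brightwater Foods",
    ]
    for group in group_accounts(accounts):
        print(group)
```

In a sketch like this, pairs whose similarity falls close to the threshold are natural candidates for manual review, which is one place the "pragmatic use of human skill" could come in. The all-pairs comparison is quadratic in the number of accounts, so a large CRM extract would need some form of blocking or indexing first.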