5. How to Process Text
• Create features
• TF-IDF weighting for bag of words
• N-grams
• Other features, e.g. length of text, contains URL, …
• Convert to a matrix of 0/1 values
• Singular value decomposition (SVD)
• Machine learning models
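The feature steps above can be sketched in plain Python. This is a minimal illustration, not the system's actual implementation: the function names, the tokenizer, the bigram limit, and the smoothed idf formula (one of several common variants) are all assumptions. The SVD and model-fitting steps are left out because they would normally rely on a numerical library.

```python
import math
import re
from collections import Counter

# Hypothetical URL pattern used for the "contains URL" feature.
URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n_max=2):
    """Return all word n-grams for n = 1..n_max, joined by spaces."""
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(docs, n_max=2):
    """Return one {term: weight} dict per document (smoothed idf, an assumed variant)."""
    term_lists = [ngrams(tokenize(d), n_max) for d in docs]
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))  # count each term once per document
    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero on empty text
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def extra_features(text):
    """Hand-crafted features: text length and whether a URL is present."""
    return {"length": len(text), "contains_url": int(bool(URL_RE.search(text)))}


def binary_matrix(docs, n_max=2):
    """Return (vocabulary, rows) where rows[i][j] is 1 if term j occurs in document i."""
    term_sets = [set(ngrams(tokenize(d), n_max)) for d in docs]
    vocab = sorted(set().union(*term_sets))
    rows = [[1 if term in terms else 0 for term in vocab] for terms in term_sets]
    return vocab, rows
```

In a fuller pipeline, the TF-IDF or 0/1 matrix would be reduced with SVD and the result, together with the extra features, passed to the classifier.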
7. Training Models
• Daily training
• Queued and trained when servers are underutilized
• Lower priority as the system still has working models
• Store to S3
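One way to picture the low-priority training queue is the sketch below. It is only an assumed design: the `TrainingQueue` class, the load threshold, and the `train` and `store` callables are hypothetical names, and `store` stands in for whatever code uploads the trained model to S3.

```python
import heapq
import itertools


class TrainingQueue:
    """Queue of per-client training jobs that only run when the server is idle."""

    def __init__(self, load_threshold=0.5):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order per priority
        self.load_threshold = load_threshold  # hypothetical utilization cutoff (0..1)

    def enqueue(self, client_id, priority=10):
        """Add a training job; a lower number means it is trained sooner."""
        heapq.heappush(self._heap, (priority, next(self._counter), client_id))

    def run_if_idle(self, current_load, train, store):
        """Train and store at most one model if load is below the threshold.

        Returns the client id that was trained, or None if nothing ran.
        """
        if current_load >= self.load_threshold or not self._heap:
            return None
        _, _, client_id = heapq.heappop(self._heap)
        store(client_id, train(client_id))  # e.g. upload the model to S3
        return client_id
```

Because existing models keep serving traffic, skipping a cycle when the servers are busy only delays the refresh; it does not interrupt classification.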
8. Models in Production
• Servers dedicated to prioritization
• Chef to configure new servers
• Servers download models from S3
• Cache as many models in memory as possible
• Evict older models
• Use the client-specific model to classify each message
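The caching behavior above could look roughly like the following sketch. It assumes a least-recently-used eviction policy, which is one reading of "evict older models", and it treats a model as a plain callable. `ModelCache`, `load_model`, and `max_models` are hypothetical names; `load_model` stands in for the code that downloads a client's model from S3.

```python
from collections import OrderedDict


class ModelCache:
    """Keeps up to max_models client models in memory, evicting the least recently used."""

    def __init__(self, load_model, max_models=100):
        self._load_model = load_model  # hypothetical loader, e.g. a download from S3
        self._max_models = max_models
        self._models = OrderedDict()   # client_id -> model, oldest use first

    def get(self, client_id):
        """Return the client's model, loading it on a cache miss."""
        if client_id in self._models:
            self._models.move_to_end(client_id)  # mark as most recently used
            return self._models[client_id]
        model = self._load_model(client_id)
        self._models[client_id] = model
        if len(self._models) > self._max_models:
            self._models.popitem(last=False)  # evict the least recently used model
        return model

    def classify(self, client_id, message):
        """Classify a message with the client-specific model (assumed to be callable)."""
        return self.get(client_id)(message)
```

In practice the limit would more likely be set by available memory than by a fixed model count; a count is used here only to keep the sketch short.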