The bar is raised: when we first started Lucid, the problems were mostly about standing up Lucene or Solr or dealing with performance issues. Now the large majority of them are about taking search to the next level: better relevance, personalization, recommendations, etc.
How do you gain insight? The search box is the UI for data these days. Feed improvements back into the system for users. Extract key metrics for business understanding.
Make into images?
All about ad hoc and bulk storage and computation. All about the analytics that drive your computation. Glue to make it all work together: data where it needs to be when it needs to be there. All are examples of ways to do this. There are actually a fair number of viable alternatives for all of these pieces, all in open source. I tend to stick to Apache and “commercial” friendly licenses, where possible.
Analytics. Discovery: recommendations, trends, related searches.
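One way to picture the "related searches" piece of discovery is query co-occurrence within a session. This is only a minimal sketch, not how any particular product does it: the function name `related_searches`, the `sessions` input shape (one list of query strings per user session), and the sample data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def related_searches(sessions, top_n=3):
    """Map each query to the queries that most often share a session with it.

    `sessions` is a hypothetical input: one list of query strings per session.
    """
    co_counts = defaultdict(Counter)
    for queries in sessions:
        # Count each unordered pair once per session, ignoring repeated queries.
        for a, b in combinations(sorted(set(queries)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        query: [other for other, _ in counts.most_common(top_n)]
        for query, counts in co_counts.items()
    }


# Made-up sessions, for illustration only.
sessions = [
    ["solr", "lucene", "solr faceting"],
    ["solr", "lucene"],
    ["hadoop", "hbase"],
]
related = related_searches(sessions)
print(related["solr"])
```

At real log volumes the same pair counting would typically run as a batch job over the session logs rather than in memory.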
Authoritative store: managing across, consistency, etc. Analysis should be done where it makes the most sense, given the location of the data and the type of analysis being done. The Hadoop and HBase pieces are all pretty straightforward.
Log and navigation: clicks, search trails, etc. Data cleanliness: never-viewed docs that are related to other documents.
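As a rough illustration of what mining click logs can look like, the sketch below computes a click-through rate per query and lists documents that were never clicked. The event format (`(event_type, query, doc_id)` tuples), the function names, and the sample data are hypothetical; real logs will need their own parsing and cleanup first.

```python
from collections import Counter


def click_through_rates(events):
    """Return {query: clicks / searches} from (event_type, query, doc_id) tuples."""
    searches, clicks = Counter(), Counter()
    for event_type, query, _doc_id in events:
        if event_type == "search":
            searches[query] += 1
        elif event_type == "click":
            clicks[query] += 1
    # Only queries that were actually searched get a rate, so n is never zero.
    return {query: clicks[query] / n for query, n in searches.items()}


def never_clicked(all_doc_ids, events):
    """Return the doc ids that have no click event in the log."""
    clicked = {doc_id for event_type, _query, doc_id in events
               if event_type == "click"}
    return set(all_doc_ids) - clicked


# Made-up log events, for illustration only.
events = [
    ("search", "solr", None),
    ("click", "solr", "d1"),
    ("search", "solr", None),
    ("search", "hbase", None),
]
ctr = click_through_rates(events)
unseen = never_clicked(["d1", "d2", "d3"], events)
print(ctr, unseen)
```

Queries with a low click-through rate and documents that never get clicked are both reasonable places to start looking for relevance or data-quality problems.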
Big Picture: too often devs are stuck in the weeds