Content analytics uses natural language processing techniques such as n-grams and TF-IDF to analyze content relevance. An n-gram is a sequence of n consecutive words; counting n-grams surfaces the expressions that occur most often in a document. TF-IDF, in the simplified form used here, compares how frequent an expression is in a topic corpus versus a general corpus to estimate how important it is to the topic. By scoring a document's n-grams against a relevant topic corpus and a general corpus, a content analyst can identify keywords that are highly relevant to the topic and optimize the document accordingly, with the aim of improving search engine rankings.
6. NLP Data
● Natural Language Processing statistics
New data:
– How many times do the main keywords appear in my content?
– How many times are these keywords the subject of a sentence?
– How relevant are the words I am using?
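The first question above can be answered with a simple count. This is a minimal sketch, assuming plain-text content and case-insensitive whole-word matching; the `count_phrase` helper and the sample sentence are hypothetical, not taken from the slides.

```python
import re

def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase in text."""
    # Escape each word, then allow any run of whitespace between words.
    words = [re.escape(w) for w in phrase.lower().split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text.lower()))

# Made-up sample content, for illustration only.
content = "Measure Camp is an analytics conference. I enjoyed Measure Camp."

for keyword in ["measure camp", "analytics conference", "big data"]:
    print(keyword, count_phrase(content, keyword))
```

Counting how often a keyword is the subject of a sentence needs a syntactic parser and is not covered by this sketch.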
8. Metric: TF-IDF
A numerical statistic intended to reflect how important a word is to a document in a corpus.
TF: the frequency of a word (or series of words) in a document.
IDF: to avoid rewarding words that are common everywhere rather than specific to the document, that frequency is weighed against how often the word appears across the corpus.
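For reference, here is a sketch of one common textbook variant of TF-IDF: term frequency multiplied by the log of the inverse document frequency. Many other weightings exist, and the slides state that they use a simplified version, so this may not match the presenter's numbers. The `tf_idf` function and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one document, relative to a corpus of token lists."""
    # Term frequency: share of the document's tokens that are this term.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of corpus documents containing the term.
    df = sum(1 for doc in corpus if term in doc)
    # Inverse document frequency; 0.0 when the term appears nowhere.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Toy corpus, made up for illustration.
docs = [
    ["measure", "camp", "is", "fun"],
    ["the", "camp", "is", "open"],
    ["the", "event", "is", "big"],
]

for term in ["measure", "camp", "is"]:
    print(term, round(tf_idf(term, docs[0], docs), 4))
```

A word such as "is" that appears in every document gets an IDF of zero, which is how the metric discounts expressions that carry no topical signal.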
15. 2nd - Topic Corpus
Now, create a topic corpus around your keywords (basically, the pages ranked in Google).
Let's get the top 100 results for each of these keywords:
● Analytics event
● Analytics conference
● Measure Camp
Extract the n-grams from all the documents (around 200 documents once duplicates are removed).
Calculate TF-IDF for each n-gram.
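The n-gram extraction step can be sketched as follows. Fetching the ranked pages is out of scope here, so `pages` is a hypothetical list of already-downloaded page texts. Splitting on whitespace is a deliberate simplification of real tokenization, and duplicates are detected only when two texts are exactly identical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_ngram_counts(documents, n=2):
    """Count n-grams across a corpus after dropping exact duplicate documents."""
    unique_docs = list(dict.fromkeys(documents))  # keeps first occurrence, in order
    counts = Counter()
    for doc in unique_docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts, len(unique_docs)

# Placeholder page texts, made up for illustration.
pages = [
    "Measure Camp is an analytics conference",
    "Measure Camp is an analytics conference",  # the same page ranked for two keywords
    "Join Measure Camp in London",
]

counts, n_docs = corpus_ngram_counts(pages, n=2)
print(n_docs, counts.most_common(3))
```

The resulting counts are the raw material for whichever TF-IDF weighting is applied afterwards.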
16. YAY!!! My first relevant content metrics :)
● Measure Camp: 100 (very frequent)
● Analytics conference: 60 (quite frequent)
● Peter O'Neill: 50 (quite frequent)
● Stay (in) London: 30 (somewhat frequent)
* Not actual data. Simplified version of TF-IDF.
17. 3rd - Topic-neutral corpus
Now, create a topic-neutral corpus (basically, take thousands and thousands of random web pages and build a corpus from them).
Extract the n-grams from it.
Excerpt:
● Click here (very frequent)
● Stay London (appears a few times)
● Peter O'Neill (nowhere to be found)
● Measure Camp (1 time in the corpus)
18. 4 - Now let's compare
● Stay London: somewhat frequent in both corpora, so not so relevant for your content
● Peter O'Neill: Yay!
● Measure Camp: not so frequent in English, very frequent in our topic corpus. I shall use it
19. ● Big data: very frequent in the topic corpus, not so frequent
→ Oh, sounds like something people want to hear about. Let's write content about it.
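The comparison between the two corpora can be sketched as a ratio of relative frequencies. This is one possible scoring, not necessarily the one behind the slides. The `relevance` function, its add-one style smoothing, and every count below are made up for illustration.

```python
def relevance(ngram, topic_counts, neutral_counts):
    """Ratio of an n-gram's relative frequency in the topic corpus
    to its relative frequency in the topic-neutral corpus."""
    topic_total = sum(topic_counts.values())
    neutral_total = sum(neutral_counts.values())
    # Add one to each count so an n-gram absent from a corpus
    # does not produce a zero rate or a division by zero.
    topic_rate = (topic_counts.get(ngram, 0) + 1) / (topic_total + 1)
    neutral_rate = (neutral_counts.get(ngram, 0) + 1) / (neutral_total + 1)
    return topic_rate / neutral_rate

# Hypothetical n-gram counts, not actual data.
topic_counts = {
    "measure camp": 100,
    "analytics conference": 60,
    "peter o'neill": 50,
    "stay london": 30,
    "click here": 40,
}
neutral_counts = {"click here": 5000, "stay london": 30, "measure camp": 1}

ranked = sorted(topic_counts,
                key=lambda g: relevance(g, topic_counts, neutral_counts),
                reverse=True)
for g in ranked:
    print(g, round(relevance(g, topic_counts, neutral_counts), 2))
```

Expressions that are frequent in the topic corpus but rare in the neutral one rank highest, while generic phrases such as "click here" fall to the bottom.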
20. 5 – Optimize your content
Proofread your content with these new relevant expressions in mind.
Can I add more value to the user?
Can it help improve my organic ranking?