3.1 The Paradigm Shift to Semantic SEO
Modern search engines have evolved far beyond simple keyword matching. Through initiatives like the Knowledge Graph, Google has shifted its focus to semantic search: understanding the real-world entities (people, places, organizations, concepts) within a query and a document, and the relationships between them.28 An SEO tool that merely counts words is operating on an obsolete model. To provide value, it must mirror this semantic leap.
When a user searches for “who was president when the twin towers fell,” Google does not look for pages containing those exact keywords. It understands the entities “U.S. President” and “World Trade Center,” the event “September 11 attacks,” and the temporal relationship between them. It returns the answer “George W. Bush” by comprehending the meaning, not by matching strings.30 An advanced SEO tool must therefore move from analyzing words to analyzing concepts.
3.2 Implementing Named Entity Recognition (NER) in JavaScript
Named Entity Recognition (NER) is the core NLP task that enables this semantic understanding. NER is the process of scanning text to identify and classify named entities into predefined categories such as PERSON, ORGANIZATION, LOCATION, PRODUCT, DATE, and more.32
For SEO, NER is invaluable. It helps:
- Disambiguate terms: It can distinguish between “Apple” (the ORGANIZATION) and “apple” (the fruit), allowing for more precise topical analysis.28
- Identify core subjects: It reveals what a page is truly about by highlighting the key entities discussed.
- Facilitate structured data recommendations: By identifying entities, the tool can check if they are correctly marked up with Schema.org structured data, a critical technical SEO factor.29
While benchmark NER systems like StanfordNER are Java-based,32 the JavaScript ecosystem offers powerful and accessible alternatives. Libraries like wink-ner and nlp.js provide pre-trained models that can be readily integrated into a Node.js or browser environment.35
The implementation involves passing text to the NER model, which returns an array of identified entities with their types and positions.
```javascript
// Hypothetical example: 'some-ner-library' and its recognize() API are
// placeholders for a real library such as wink-ner or nlp.js.
const ner = require('some-ner-library');

const text = 'Sundar Pichai, CEO of Google, announced the new Pixel phone in California.';
const entities = ner.recognize(text);
console.log(entities);
/*
Illustrative output shape (the exact format varies by library):
[
  { text: 'Sundar Pichai', type: 'PERSON',       start: 0,  end: 13 },
  { text: 'Google',        type: 'ORGANIZATION', start: 22, end: 28 },
  { text: 'Pixel',         type: 'PRODUCT',      start: 48, end: 53 },
  { text: 'California',    type: 'LOCATION',     start: 63, end: 73 }
]
*/
```
By integrating NER, the analysis engine gains a foundational layer of semantic comprehension, moving it closer to how modern search engines interpret content.
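To make the mechanics concrete without any dependency, the lookup step can be illustrated with a toy gazetteer-based recognizer. This is a minimal sketch only: the `GAZETTEER` dictionary and `recognize` function are invented for illustration, and real libraries such as wink-ner or nlp.js use trained statistical models rather than exact string lookup.

```javascript
// Toy gazetteer-based NER: matches known entity names from a small
// hand-built dictionary. Illustrative only; not a statistical model.
const GAZETTEER = {
  'Sundar Pichai': 'PERSON',
  'Google': 'ORGANIZATION',
  'Pixel': 'PRODUCT',
  'California': 'LOCATION',
};

function recognize(text) {
  const entities = [];
  for (const [term, type] of Object.entries(GAZETTEER)) {
    let start = text.indexOf(term);
    while (start !== -1) {
      entities.push({ text: term, type, start, end: start + term.length });
      start = text.indexOf(term, start + term.length);
    }
  }
  // Sort by position so the output reads left to right.
  return entities.sort((a, b) => a.start - b.start);
}

const text = 'Sundar Pichai, CEO of Google, announced the new Pixel phone in California.';
console.log(recognize(text));
```

The trade-off mirrors the real libraries: a gazetteer is fast and precise for known names but cannot generalize to unseen entities, which is exactly what the pre-trained models add.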
3.3 A Hybrid Scoring Model: Unifying TF-IDF, N-grams, and NER
The true power of these new analytical layers is realized when they are combined. Relying on TF-IDF alone is purely statistical. Relying on NER alone identifies entities but doesn’t weigh their importance. A hybrid scoring model that synthesizes these inputs provides a far more accurate measure of a concept’s importance on a page. This approach is supported by research showing that combining statistical methods like TF-IDF with semantic information yields superior results.37
A proposed algorithm for a “Concept Importance Score” is as follows:
- Execute the full text pre-processing pipeline from Part I (cleaning, normalization, tokenization, lemmatization).
- Calculate TF-IDF scores for all relevant unigrams and n-grams (bigrams, trigrams) against the competitor corpus.
- Perform Named Entity Recognition on the original, unprocessed text to identify entities and their types.
- Create a unified list of “concepts,” which can be any unigram, n-gram, or named entity.
- For each concept, calculate a final Concept Score using a weighted formula:
ConceptScore = TF-IDF_Score × EntityType_Multiplier
The EntityType_Multiplier is a configurable weight that boosts the score of terms identified as meaningful entities. For example:
- PERSON, ORGANIZATION, LOCATION: Multiplier of 1.5
- PRODUCT, EVENT: Multiplier of 1.3
- Not an Entity (a simple n-gram): Multiplier of 1.0
This model intrinsically values concepts that are both statistically significant (high TF-IDF) and semantically important (recognized as a named entity). A term like “Nvidia” will receive a higher final score than a non-entity term with an identical TF-IDF score, more accurately reflecting its importance to a page about graphics cards.
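The weighted formula can be sketched directly. This is a minimal illustration of the scoring step only; the `tfidfScores` and `entities` maps are assumed to be produced by the earlier pipeline stages (TF-IDF calculation and NER), and the sample values are invented.

```javascript
// Configurable multipliers from the table above; non-entities default to 1.0.
const ENTITY_MULTIPLIERS = {
  PERSON: 1.5, ORGANIZATION: 1.5, LOCATION: 1.5,
  PRODUCT: 1.3, EVENT: 1.3,
};

// tfidfScores: Map of concept term -> TF-IDF score (algorithm step 2).
// entities:    Map of entity term  -> entity type   (algorithm step 3).
function scoreConcepts(tfidfScores, entities) {
  const concepts = [];
  for (const [term, tfidfScore] of tfidfScores) {
    const type = entities.get(term); // undefined if not a named entity
    const multiplier = ENTITY_MULTIPLIERS[type] ?? 1.0;
    concepts.push({
      term,
      isEntity: entities.has(term),
      tfidfScore,
      conceptScore: tfidfScore * multiplier,
    });
  }
  // Highest-scoring concepts first.
  return concepts.sort((a, b) => b.conceptScore - a.conceptScore);
}

const tfidf = new Map([['nvidia', 4.78], ['graphics card', 4.78], ['frame rate', 3.2]]);
const ents = new Map([['nvidia', 'ORGANIZATION']]);
console.log(scoreConcepts(tfidf, ents));
// 'nvidia' scores 4.78 * 1.5 = 7.17, outranking 'graphics card' at 4.78.
```

Keeping the multipliers in a plain configuration object lets the weights be tuned per vertical without touching the scoring logic.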
3.4 Re-architecting the keywordAnalysis Data Structure
The existing websiteData.keywordAnalysis structure is insufficient to hold this new, richer data. A new, more robust schema is required to serve as the backbone for the revamped analysis and recommendation engine. A unified concepts array provides a clean, scalable architecture.
The following table details the proposed schema.
| Field Name | Data Type | Description | Example |
|---|---|---|---|
| concepts | Array<Object> | An array containing all identified concepts on the page. | […] |
| concept.term | string | The text of the concept itself. | “natural language processing” |
| concept.type | string | The type of concept. | “trigram”, “PERSON”, “ORGANIZATION” |
| concept.isEntity | boolean | True if the concept was identified by the NER model. | true |
| concept.lemma | string | The lemmatized root form of the term. | “natural language process” |
| concept.tfidfScore | number | The calculated TF-IDF score against the competitor corpus. | 4.78 |
| concept.conceptScore | number | The final hybrid score after applying multipliers. | 4.78 (if not an entity) or 7.17 (if an entity with a 1.5x multiplier) |
| concept.locations | Array<string> | An array indicating where the term appears on the page. | [“title”, “h1”, “body”] |
| concept.count | number | The raw frequency of the term on the page. | 5 |
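A single element of the `concepts` array conforming to this schema might look like the following (a sketch with illustrative values, populated by the hybrid scoring pass):

```javascript
// One entry of websiteData.keywordAnalysis.concepts (illustrative values).
const concept = {
  term: 'Nvidia',
  type: 'ORGANIZATION',      // an n-gram label or an NER entity type
  isEntity: true,            // identified by the NER model
  lemma: 'nvidia',
  tfidfScore: 4.78,          // against the competitor corpus
  conceptScore: 4.78 * 1.5,  // ORGANIZATION multiplier applied -> 7.17
  locations: ['title', 'h1', 'body'],
  count: 5,                  // raw on-page frequency
};
console.log(concept);
```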
This structure provides a comprehensive foundation for building a next-generation recommendation engine, enabling specific, data-driven advice that was previously impossible.