1. Named Entity Recognition (NER)
NER identifies and classifies real-world entities mentioned in text.
Common entity types
| Entity Type | Example |
|---|---|
| PERSON | Elon Musk |
| ORGANIZATION | Tesla |
| LOCATION | USA |
| DATE | January 2024 |
| MONEY | $500 million |
| PERCENTAGE | 12% |
Example
Sentence: "Elon Musk is the CEO of Tesla and lives in the USA." NER output: Elon Musk → PERSON Tesla → ORGANIZATION USA → LOCATION
Why NER matters
- Extracts structured information from unstructured text
- Used in resume parsing and document processing
- Widely applied in medical and legal NLP systems
- Improves search engines and chatbots
2. Bag of Words (BoW)
Bag of Words is one of the simplest techniques for converting text into numbers. Word order and grammar are ignored; only word frequency matters.
Example
Sentence 1: "I love NLP" Sentence 2: "I love AI" Vocabulary: [I, love, NLP, AI] Sentence 1 → [1, 1, 1, 0] Sentence 2 → [1, 1, 0, 1]
| Advantages | Limitations |
|---|---|
| Very easy to implement | No understanding of context |
| Works well for small datasets | No semantic meaning |
| Useful as a baseline model | Treats all words as equally important |
3. TF-IDF (Term Frequency – Inverse Document Frequency)
TF-IDF improves on Bag of Words by assigning importance scores to words. Words that are frequent within a document but rare across all documents score higher.
TF-IDF(t, d) = TF(t, d) × IDF(t)

TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)

IDF(t) = log(total number of documents / number of documents containing term t)
Intuition: A word like "the" appears everywhere — low IDF, low score. A word like "neural" appears rarely — high IDF, high score when present.
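A short sketch of this idea using scikit-learn's `TfidfVectorizer` (one common implementation; its IDF formula includes smoothing terms, so the numbers differ slightly from the textbook formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "neural networks learn representations",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Words appearing in every document (like "the") get a low IDF,
# rare words (like "neural") get a high IDF
for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term}: idf={idf:.2f}")
```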
Better than BoW because
- Reduces the importance of common words like "the" and "is"
- Highlights meaningful, document-specific words
Works well for
- Search engines
- Spam detection
- Document similarity tasks
Limitations
- Does not capture semantic meaning
- Synonyms are treated as different words
4. Word2Vec
Word2Vec represents words as dense numerical vectors that capture meaning and context. Words used in similar contexts get similar vectors.
Famous arithmetic examples:
- King − Man + Woman ≈ Queen
- Paris − France + Italy ≈ Rome
CBOW (Continuous Bag of Words)
Predicts a target word using surrounding context words.
Sentence: "Raj went to school yesterday" (window size: 1) Input: [Raj, to] → Output: went Input: [went, school] → Output: to Input: [to, yesterday] → Output: school How it works: 1. Context words converted to one-hot vectors 2. Vectors are summed or averaged 3. Passed through hidden layer 4. Model predicts the target word 5. Error calculated, weights updated via backpropagation
Skip-Gram
Predicts surrounding context words from a target word — the reverse of CBOW.
Sentence: "Raj went to school yesterday" (window size: 1) Target: went → Training pairs: (went→Raj), (went→to) Target: to → Training pairs: (to→went), (to→school) Target: school → Training pairs: (school→to), (school→yesterday) How it works: 1. Target word converted to one-hot vector 2. Passed through hidden layer 3. Model predicts each context word 4. Error calculated, weights updated via backpropagation 👉 The hidden layer weights become the word embeddings
| Advantages | Limitations |
|---|---|
| Captures semantic relationships | Same word has one vector regardless of context |
| Dense and meaningful embeddings | bank (river) and bank (money) get the same vector |
| Useful for clustering and similarity | Limitation later addressed by contextual models like BERT |
5. When to Use Each Technique
| Technique | Use when |
|---|---|
| Bag of Words | Building simple text classifiers or baseline NLP models |
| TF-IDF | Search systems, document similarity, spam detection |
| Word2Vec | Semantic similarity, recommendation systems, text clustering |
These techniques show the evolution of NLP: from counting words → weighting word importance → understanding semantic meaning. They form the foundation for modern NLP and Generative AI systems.
6. Linguistic Fundamentals
| Level | Focus | Example |
|---|---|---|
| Syntax | Grammatical structure of sentences | Dependency parsing, constituency parsing |
| Semantics | Meaning of words and sentences | Word sense disambiguation, contextual interpretation |
| Pragmatics | Speaker intent in real-world context | "Can you open the window?" is a request, not an ability test |
7. Core NLP Tasks
| Task | Description |
|---|---|
| Tokenization | Breaking text into words, phrases, or symbols |
| POS Tagging | Classifying words into grammatical categories (noun, verb, etc.) |
| NER | Identifying names of people, places, organizations |
| Parsing | Understanding structural relationships between words |
| Sentiment Analysis | Evaluating emotional tone (positive, negative, neutral) |
| Machine Translation | Converting text from one language to another |
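Several of these tasks can be demonstrated in a single pass with spaCy (again assuming the `en_core_web_sm` model is installed); this sketch shows tokenization, POS tagging, and dependency parsing:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

# Each token with its part-of-speech tag and dependency relation
for token in doc:
    print(token.text, token.pos_, token.dep_)
```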
8. The NLP Pipeline
Step 1: Text Preprocessing
Tokenization, stop-word removal, stemming, lemmatization
↓
Step 2: Feature Extraction
TF-IDF, word embeddings, or statistical models
↓
Step 3: Model Training
Supervised or unsupervised learning
↓
Step 4: Parsing & Semantic Analysis
Understanding sentence structure and meaning
↓
Step 5: Inference & Decision Making
Translation, summarization, question answering
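A compact sketch of steps 1 to 3 using scikit-learn, with inference at the end; the sentences and labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical labeled data: 1 = positive, 0 = negative
texts = ["I love this phone", "great battery life", "terrible screen", "I hate the camera"]
labels = [1, 1, 0, 0]

# Steps 1-2: TfidfVectorizer lowercases, tokenizes, removes stop words, builds TF-IDF features
# Step 3: LogisticRegression is trained on those features
clf = Pipeline([
    ("features", TfidfVectorizer(stop_words="english")),
    ("model", LogisticRegression()),
])
clf.fit(texts, labels)

# Step 5: inference on unseen text
print(clf.predict(["the battery is great"]))
```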
9. Modern Deep Learning Approaches
| Approach | Description | Examples |
|---|---|---|
| Word Embeddings | Capture semantic meaning of words as dense vectors | Word2Vec, GloVe, FastText |
| Sequence-to-Sequence | Process text where order matters | RNNs, LSTMs, GRUs |
| Transformers | Understand deep context via attention mechanisms | BERT, GPT, T5 |
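As a rough illustration of the Transformers row, the Hugging Face `transformers` library exposes pretrained models behind a one-line pipeline (the default model it downloads may change between library versions):

```python
from transformers import pipeline

# Downloads a pretrained Transformer-based sentiment model on first use
classifier = pipeline("sentiment-analysis")

print(classifier("NLP has come a long way since Bag of Words."))
# Example output shape: [{'label': 'POSITIVE', 'score': 0.99...}]
```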
10. Evaluation & Ethical Considerations
- Evaluation metrics: Precision, Recall, F1 Score, BLEU, ROUGE (a small sketch of the classification metrics follows this list)
- Bias and fairness: Reduce bias in training data and ensure fair model outputs
- Explainability: Make complex models transparent, especially for critical applications like medical or legal NLP
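A minimal sketch of Precision, Recall, and F1 with scikit-learn; the label arrays are invented for illustration:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical gold labels and model predictions for a binary task
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Precision:", precision_score(y_true, y_pred))  # of predicted positives, how many are correct
print("Recall:   ", recall_score(y_true, y_pred))     # of actual positives, how many were found
print("F1 Score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```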