1. Named Entity Recognition (NER)
NER identifies and classifies real-world entities mentioned in text.
Common entity types
| Entity Type | Example |
|---|---|
| PERSON | Elon Musk |
| ORGANIZATION | Tesla |
| LOCATION | USA |
| DATE | January 2024 |
| MONEY | $500 million |
| PERCENTAGE | 12% |
Example
Sentence: "Elon Musk is the CEO of Tesla and lives in the USA." NER output: Elon Musk → PERSON Tesla → ORGANIZATION USA → LOCATION
Why NER matters
- Extracts structured information from unstructured text
- Used in resume parsing and document processing
- Widely applied in medical and legal NLP systems
- Improves search engines and chatbots
2. Bag of Words (BoW)
Bag of Words is one of the simplest techniques for converting text into numbers. Word order and grammar are ignored; only word frequency matters.
Example
Sentence 1: "I love NLP" Sentence 2: "I love AI" Vocabulary: [I, love, NLP, AI] Sentence 1 → [1, 1, 1, 0] Sentence 2 → [1, 1, 0, 1]
| Advantages | Limitations |
|---|---|
| Very easy to implement | No understanding of context |
| Works well for small datasets | No semantic meaning |
| Useful as a baseline model | Treats all words as equally important |
3. TF-IDF (Term Frequency – Inverse Document Frequency)
TF-IDF improves on Bag of Words by assigning importance scores to words. Words that are frequent within a document but rare across all documents score higher.
TF-IDF(t, d) = TF(t, d) × IDF(t)

TF(t, d) = (number of times term t appears in document d) / (total number of terms in document d)

IDF(t) = log(total number of documents / number of documents containing term t)
Intuition: A word like "the" appears everywhere — low IDF, low score. A word like "neural" appears rarely — high IDF, high score when present.
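A short sketch of this idea using scikit-learn's `TfidfVectorizer` (one common implementation; its IDF formula includes smoothing terms, so the numbers differ slightly from the textbook formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "neural networks learn representations",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Words appearing in every document (like "the") get a low IDF,
# rare words (like "neural") get a high IDF
for term, idf in zip(vectorizer.get_feature_names_out(), vectorizer.idf_):
    print(f"{term}: idf={idf:.2f}")
```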
Better than BoW because
- Reduces the importance of common words like "the" and "is"
- Highlights meaningful, document-specific words
Works well for
- Search engines
- Spam detection
- Document similarity tasks
Limitations
- Does not capture semantic meaning
- Synonyms are treated as different words
4. Word2Vec
Word2Vec represents words as dense numerical vectors that capture meaning and context. Words used in similar contexts get similar vectors.
Famous arithmetic examples:
- King − Man + Woman ≈ Queen
- Paris − France + Italy ≈ Rome
CBOW (Continuous Bag of Words)
Predicts a target word using surrounding context words.
Sentence: "Raj went to school yesterday" (window size: 1) Input: [Raj, to] → Output: went Input: [went, school] → Output: to Input: [to, yesterday] → Output: school How it works: 1. Context words converted to one-hot vectors 2. Vectors are summed or averaged 3. Passed through hidden layer 4. Model predicts the target word 5. Error calculated, weights updated via backpropagation
Skip-Gram
Predicts surrounding context words from a target word — the reverse of CBOW.
Sentence: "Raj went to school yesterday" (window size: 1) Target: went → Training pairs: (went→Raj), (went→to) Target: to → Training pairs: (to→went), (to→school) Target: school → Training pairs: (school→to), (school→yesterday) How it works: 1. Target word converted to one-hot vector 2. Passed through hidden layer 3. Model predicts each context word 4. Error calculated, weights updated via backpropagation 👉 The hidden layer weights become the word embeddings
| Advantages | Limitations |
|---|---|
| Captures semantic relationships | Same word has one vector regardless of context |
| Dense and meaningful embeddings | bank (river) and bank (money) get the same vector |
| Useful for clustering and similarity | Limitation later addressed by contextual models like BERT |
5. When to Use Each Technique
| Technique | Use when |
|---|---|
| Bag of Words | Building simple text classifiers or baseline NLP models |
| TF-IDF | Search systems, document similarity, spam detection |
| Word2Vec | Semantic similarity, recommendation systems, text clustering |
These techniques show the evolution of NLP: from counting words → weighting word importance → understanding semantic meaning. They form the foundation for modern NLP and Generative AI systems.
6. Linguistic Fundamentals
| Level | Focus | Example |
|---|---|---|
| Syntax | Grammatical structure of sentences | Dependency parsing, constituency parsing |
| Semantics | Meaning of words and sentences | Word sense disambiguation, contextual interpretation |
| Pragmatics | Speaker intent in real-world context | "Can you open the window?" is a request, not an ability test |
7. Core NLP Tasks
| Task | Description |
|---|---|
| Tokenization | Breaking text into words, phrases, or symbols |
| POS Tagging | Classifying words into grammatical categories (noun, verb, etc.) |
| NER | Identifying names of people, places, organizations |
| Parsing | Understanding structural relationships between words |
| Sentiment Analysis | Evaluating emotional tone (positive, negative, neutral) |
| Machine Translation | Converting text from one language to another |
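Several of these tasks can be demonstrated in a single pass with spaCy (again assuming the `en_core_web_sm` model is installed); this sketch shows tokenization, POS tagging, and dependency parsing:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog.")

# Each token with its part-of-speech tag and dependency relation
for token in doc:
    print(token.text, token.pos_, token.dep_)
```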
8. The NLP Pipeline
Step 1: Text Preprocessing
Tokenization, stop-word removal, stemming, lemmatization
↓
Step 2: Feature Extraction
TF-IDF, word embeddings, or statistical models
↓
Step 3: Model Training
Supervised or unsupervised learning
↓
Step 4: Parsing & Semantic Analysis
Understanding sentence structure and meaning
↓
Step 5: Inference & Decision Making
Translation, summarization, question answering
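A compact sketch of steps 1 to 3 using scikit-learn, with inference at the end; the sentences and labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical labeled data: 1 = positive, 0 = negative
texts = ["I love this phone", "great battery life", "terrible screen", "I hate the camera"]
labels = [1, 1, 0, 0]

# Steps 1-2: TfidfVectorizer lowercases, tokenizes, removes stop words, builds TF-IDF features
# Step 3: LogisticRegression is trained on those features
clf = Pipeline([
    ("features", TfidfVectorizer(stop_words="english")),
    ("model", LogisticRegression()),
])
clf.fit(texts, labels)

# Step 5: inference on unseen text
print(clf.predict(["the battery is great"]))
```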
9. Modern Deep Learning Approaches
| Approach | Description | Examples |
|---|---|---|
| Word Embeddings | Capture semantic meaning of words as dense vectors | Word2Vec, GloVe, FastText |
| Sequence-to-Sequence | Process text where order matters | RNNs, LSTMs, GRUs |
| Transformers | Understand deep context via attention mechanisms | BERT, GPT, T5 |
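As a rough illustration of the Transformers row, the Hugging Face `transformers` library exposes pretrained models behind a one-line pipeline (the default model it downloads may change between library versions):

```python
from transformers import pipeline

# Downloads a pretrained Transformer-based sentiment model on first use
classifier = pipeline("sentiment-analysis")

print(classifier("NLP has come a long way since Bag of Words."))
# Example output shape: [{'label': 'POSITIVE', 'score': 0.99...}]
```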
10. Evaluation & Ethical Considerations
- Evaluation metrics: Precision, Recall, F1 Score, BLEU, ROUGE (a small sketch of the classification metrics follows this list)
- Bias and fairness: Reduce bias in training data and ensure fair model outputs
- Explainability: Make complex models transparent, especially for critical applications like medical or legal NLP
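A minimal sketch of Precision, Recall, and F1 with scikit-learn; the label arrays are invented for illustration:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical gold labels and model predictions for a binary task
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print("Precision:", precision_score(y_true, y_pred))  # of predicted positives, how many are correct
print("Recall:   ", recall_score(y_true, y_pred))     # of actual positives, how many were found
print("F1 Score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```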