Slide 2:
- Text preprocessing
- Stopword removal
- Stemming
- Basic stemming methods
- Remove endings
- Transform words
- Digits
- Hyphens
- Punctuation Marks
- Case of Letters
- Identifying different text fields
- Identifying anchor text
- Removing HTML tags
- Identifying main content blocks
- Partitioning based on visual cues
- Tree matching
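
A minimal sketch of the text preprocessing steps listed for Slide 2 (tag removal, case folding, digits/hyphens/punctuation, stopword removal, stemming by removing endings). The stopword list, the suffix rules, and all function names are illustrative assumptions, not the ones used in the slides; a real system would use a proper HTML parser and a full stemmer such as Porter's.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Naive suffix rules, tried in order; real stemmers use many more
# rules and conditions than this.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_tags(html):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", html)


def naive_stem(word):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Strip tags, lowercase, tokenize, remove stopwords, then stem."""
    text = strip_tags(text).lower()
    # Keeping only letter runs drops digits and punctuation and
    # splits hyphenated words into their parts.
    tokens = re.findall(r"[a-z]+", text)
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("<p>The walking dogs walked in 2 state-of-the-art parks.</p>")` yields `['walk', 'dog', 'walk', 'state', 'art', 'park']`.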
Slide 9: Duplicate Detection: n-grams
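
One common way to use n-grams for duplicate detection is to compare the sets of word n-grams (shingles) of two documents with Jaccard similarity. The sketch below assumes that approach; the function names and the choice of n = 3 are illustrative, and the slides may use a different similarity measure or threshold.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (as tuples) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


doc1 = "the quick brown fox jumps"
doc2 = "the quick brown fox sleeps"
similarity = jaccard(shingles(doc1), shingles(doc2))  # 2 shared of 4 total
```

Documents whose similarity exceeds a chosen threshold would be flagged as near duplicates.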
Slide 11: Inverted Index
Slide 14: Search Using an Inverted Index
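
A small sketch of an inverted index and an AND query over it, assuming whitespace tokenization and integer document IDs. `build_index`, `intersect`, and `search` are hypothetical helper names chosen for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping IDs present in both."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Return IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start with the shortest postings list to keep intermediate results small.
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result
```

With `docs = {1: "web mining and search", 2: "web search engines", 3: "data mining"}`, the call `search(build_index(docs), "web search")` returns `[1, 2]`.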
Slide 16: Index Construction
Slide 21:
- Inverted Index Compression
- Variable-bit scheme
- Unary coding
- Elias gamma coding
- Elias delta coding
- Golomb coding
- Variable-byte scheme
Slide 24: Storing Document IDs as Gaps
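
A sketch of gap storage: since a postings list is sorted, each document ID can be replaced by its difference from the previous ID, which gives smaller numbers that compress better. The sketch assumes the first gap is measured from 0; the function names are illustrative.

```python
def to_gaps(doc_ids):
    """Convert a sorted list of document IDs to gaps (first gap from 0)."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Rebuild the document IDs by taking running sums of the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids
```

For example, `to_gaps([4, 10, 300, 305])` gives `[4, 6, 290, 5]`, and `from_gaps` reverses it.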
Slide 25: Unary Coding
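
A sketch of unary coding under one common convention, assumed here: a positive integer x is written as x - 1 zero bits followed by a single one bit. Other texts swap the roles of 0 and 1, so check which convention the slides use. Bits are kept as strings for readability.

```python
def unary_encode(x):
    """Encode x >= 1 as (x - 1) zeros followed by a one."""
    if x < 1:
        raise ValueError("unary coding is defined here for x >= 1")
    return "0" * (x - 1) + "1"


def unary_decode(bits):
    """Decode one unary code from the front; return (value, remaining bits)."""
    end = bits.index("1")  # position of the terminating one bit
    return end + 1, bits[end + 1:]
```

Under this convention, `unary_encode(5)` is `"00001"`.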
Slide 27: Elias Gamma Coding
Slide 28: Elias Gamma Decoding
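
A sketch of Elias gamma coding and decoding, assuming the usual definition: write 1 + floor(log2 x) in unary (zeros terminated by a one), then the binary form of x without its leading one bit. Because the unary terminator and the leading bit of x coincide, this is the same as prefixing the binary form of x with len - 1 zeros. Function names are illustrative.

```python
def gamma_encode(x):
    """Elias gamma code of x >= 1 as a bit string."""
    if x < 1:
        raise ValueError("Elias gamma coding requires x >= 1")
    binary = bin(x)[2:]  # binary form, always starts with '1'
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode one gamma code from the front; return (value, remaining bits)."""
    k = bits.index("1")       # number of leading zeros
    end = 2 * k + 1           # k zeros plus k + 1 bits of the number
    return int(bits[k:end], 2), bits[end:]
```

For example, 9 is `1001` in binary, so `gamma_encode(9)` is `"0001001"`.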
Slide 29: Elias Delta Coding
Slide 30: Elias Delta Coding
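
A sketch of Elias delta coding, assuming the usual definition: the length 1 + floor(log2 x) is written in Elias gamma code, followed by the binary form of x without its leading one bit. The gamma helpers are repeated so the block runs on its own; all names are illustrative.

```python
def gamma_encode(x):
    """Elias gamma code of x >= 1 (helper for the delta code)."""
    binary = bin(x)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits):
    """Decode one gamma code; return (value, remaining bits)."""
    k = bits.index("1")
    end = 2 * k + 1
    return int(bits[k:end], 2), bits[end:]


def delta_encode(x):
    """Elias delta code of x >= 1 as a bit string."""
    if x < 1:
        raise ValueError("Elias delta coding requires x >= 1")
    binary = bin(x)[2:]
    # Gamma-code the bit length, then append x without its leading one.
    return gamma_encode(len(binary)) + binary[1:]


def delta_decode(bits):
    """Decode one delta code from the front; return (value, remaining bits)."""
    length, rest = gamma_decode(bits)
    tail = rest[:length - 1]
    return int("1" + tail, 2), rest[length - 1:]
```

For example, 9 has 4 bits, the gamma code of 4 is `00100`, and the remaining bits of 9 are `001`, so `delta_encode(9)` is `"00100001"`.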
Slide 31: Golomb Coding
Slide 33: Golomb Decoding
Slide 34: Golomb Decoding Example
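
A sketch of Golomb coding and decoding with parameter b, under assumptions that may differ from the slides: x >= 0, the quotient q = floor(x / b) is written as q + 1 in unary (q zeros then a one), and the remainder r is written in truncated binary. With i = floor(log2 b) and d = 2^(i+1) - b, a remainder r < d takes i bits, otherwise r + d is written in i + 1 bits.

```python
def golomb_encode(x, b):
    """Golomb code of x >= 0 with parameter b >= 1, as a bit string."""
    if x < 0 or b < 1:
        raise ValueError("requires x >= 0 and b >= 1")
    q, r = divmod(x, b)
    i = b.bit_length() - 1          # floor(log2 b)
    d = (1 << (i + 1)) - b          # number of short (i-bit) remainders
    code = "0" * q + "1"            # q + 1 in unary
    if r < d:
        code += format(r, f"0{i}b") if i else ""
    else:
        code += format(r + d, f"0{i + 1}b")
    return code


def golomb_decode(bits, b):
    """Decode one Golomb code from the front; return (value, remaining bits)."""
    q = bits.index("1")             # count of leading zeros is the quotient
    pos = q + 1
    i = b.bit_length() - 1
    d = (1 << (i + 1)) - b
    r = int(bits[pos:pos + i], 2) if i else 0
    pos += i
    if r >= d:                      # long remainder: read one more bit
        r = ((r << 1) | int(bits[pos])) - d
        pos += 1
    return q * b + r, bits[pos:]
```

With b = 3 (so i = 1, d = 1), `golomb_encode(9, 3)` is `"00010"` and `golomb_encode(10, 3)` is `"000110"`.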
Slide 35: Variable-Byte Coding
Slide 36: Variable-Byte Decoding
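
A sketch of variable-byte coding and decoding. The byte layout is an assumption: each byte carries seven data bits in its upper positions, and the least significant bit is 1 if more bytes follow or 0 on the last byte. Many implementations instead use the most significant bit as the flag, so the layout on the slides should be checked before relying on this.

```python
def vb_encode(x):
    """Variable-byte code of x >= 0, most significant 7-bit group first."""
    if x < 0:
        raise ValueError("variable-byte coding requires x >= 0")
    groups = []
    while True:
        groups.append(x & 0x7F)     # lowest seven bits
        x >>= 7
        if x == 0:
            break
    groups.reverse()
    # Flag bit 1 on every byte except the last, which gets flag bit 0.
    head = [(g << 1) | 1 for g in groups[:-1]]
    return bytes(head + [groups[-1] << 1])


def vb_decode(data):
    """Decode a concatenation of variable-byte codes into a list of ints."""
    values = []
    current = 0
    for byte in data:
        current = (current << 7) | (byte >> 1)
        if byte & 1 == 0:           # flag 0 marks the last byte of a number
            values.append(current)
            current = 0
    return values
```

Under this layout, 135 needs two bytes, `00000011 00001110`, and numbers below 128 need one.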
Slide 38: Space vs. Time Trade-off
Slide 39: Latent Semantic Indexing
Slide 42: Singular Value Decomposition
Slide 44: LSI Organization
Slide 45: Query and Retrieval
Slide 52: LSI Disadvantages