Awesome Search
April 20, 2026
Support Ukraine's fight for freedom
RUSSIAN WARSHIP, GO F*CK YOURSELF
I've been building e-commerce search applications for more than ten years. Below is a list of some publications, conferences, and books that have inspired me, grouped by topic. If an item fits into multiple topics, it appears in multiple sections.
:star: Star us on GitHub — it helps!
Also check my other collections: awesome e-commerce, awesome knowledge graphs, and awesome cloud apps.
Topics
- General, fun, philosophy
- Types of search
- Search Quality Assurance
- Areas of application
- Search Results
- Search UX
- Spelling correction
- Synonyms
- Stopwords
- Suggestions
- Graphs/Taxonomies/Knowledge Graph
- Query expansion
- Query understanding
- Algorithms
- Tracking, profiling, GDPR, Analysis
- Experiments
- Testing, metrics, KPIs
- Architecture
- Education and networking
- Search Team. Management, composition, hiring
- Economics of Search
- Blogposts series
- Industry players
- Case studies
- Videos
- Datasets
- Tools
- Other awesome stuff
- Unsorted
General, fun, philosophy
- Falsehoods Programmers Believe About Search
- Ethical Search: Designing an irresistible journey with a positive impact
- On Semantic Search
- Feedback debt: what the segway teaches search teams
- Supporting the Searcher’s Journey: When and How
- Shopping is Hard, Let’s go Searching!
- An Introduction to Search Quality
- On-Site Search Design Patterns for E-Commerce: Schema Structure, Data Driven Ranking & More
- In Search of Recall
- Balance Your Search Budget!
Types of search
Classic/Lexical Search
- Etsy. Targeting Broad Queries in Search
- How Etsy Uses Thermodynamics to Help You Search for “Geeky”
- Broad and Ambiguous Search Queries
- Deconstructing E-Commerce Search: The 12 Query Types
Vectors/Semantic search
- Migrating to Elasticsearch with dense vector for Carousell Spotlight search engine
- From zero to semantic search embedding model
- Guidelines to choose an index
- Pinecone Series
- Innovating Search Experience with Amazon OpenSearch and Amazon Bedrock
Symmetric and Asymmetric semantic search
Embeddings
Types
- Bi-encoder vs Cross encoder? When to use which one?
- What is ColBERT and Late Interaction and Why They Matter in Search?
Vector retrieval
Query/Document tokens interaction
No interactions - Two towers / Bi-encoders
Early interactions - Cross-encoders
Late interactions - ColBERT
- Announcing the Vespa ColBERT embedder
- What is ColBERT and Late Interaction and Why They Matter in Search?
Dense Vectors
Size of input and Chunking
- Chunking Strategies for LLM Applications
- Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex
- How to Chunk Text Data — A Comparative Analysis
Positional chunking
Semantic chunking
Hypothetical Document Embeddings
Matryoshka embeddings
- Matryoshka embeddings: faster OpenAI vector search using Adaptive Retrieval
- Introduction to Matryoshka Embedding Models
- Matryoshka representations. A guide to faster semantic search
Context-aware embeddings
- Improve your RAG applications by moving to Task-aware Embeddings
- How Context-Aware Embeddings Are Transforming Enterprise Search
Sparse Vectors
SPLADE
- Hybrid Search: SPLADE (Sparse Encoder)
- SPLADE for Sparse Vector Search Explained
- Improving information retrieval in the Elastic Stack. Introducing Elastic Learned Sparse Encoder, our new retrieval model
- SPLADE – a sparse bi-encoder BERT-based model achieves effective and efficient first-stage ranking
Handling high-dimension embeddings
Dimensionality reduction
PCA
t-SNE
Quantization
Scalar quantization
Binary quantization
Product quantization
Rotational quantization
Finetuning models
- Fine-Tuning Text Embeddings For Domain-Specific Search
- Fine-tuning Multimodal Embedding Models
- Is Fine-Tuning an Embedding Model Worth it?
Hybrid search
Reciprocal rank fusion (RRF)
Linear Score Combination
Multimodal search
- Muves: Multimodal & multilingual vector search w/ Hardware Acceleration
- Model Selection for Multimodal Search
Multimodality Problems
Modality Gap
Contrastive Gap
Agentic search
Search Quality Assurance
Evaluation Paradigms
Session-based Evaluation
Query-based Evaluation
Random sampling
Stratified sampling
Probability-proportional-to-size sampling
Metrics
- Choosing your search relevance evaluation metric
- Visualizing search metrics
- Measuring Search: Metrics Matter
Focused on ranking quality
- Discounted cumulative gain
- Flavors of NDCG - normalized to what!?
- Mean reciprocal rank
- P@k
- Demystifying nDCG and ERR
- https://en.wikipedia.org/wiki/Precision_and_recall
- https://en.wikipedia.org/wiki/F1_score
Focused on diversity of results
MMR
- How to Calculate MMR?
- Maximal Marginal Relevance to Re-rank results in Unsupervised KeyPhrase Extraction
Average Pairwise Distance, APD
- edit distance
- semantic distance
Entropy
Behavioral / Product / Performance
Clicks
Zero clicks
Clicks residual
Zero results
Evaluation Modes
Offline
- How to Implement a Normalized Discounted Cumulative Gain (NDCG) Ranking Quality Scorer in Quepid
- Compute Mean Reciprocal Rank (MRR) using Pandas
Judgements
Human judgements
- What Is a Judgment List?
- Evaluating Search: Using Human Judgments
- Measuring Search, A Human Approach
Implicit judgements
TODO: add something on click streams
Using LLM as judge
- Improving retrieval with LLM-as-a-judge
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods
Online
Areas of application
Enterprise search
e-Commerce search
Conversational search
Search Results
Retrieval
Relevance
- Humans Search for Things not for Strings
- What is a ‘Relevant’ Search Result?
- How to Achieve Ecommerce Search Relevance
- Setting up a relevance evaluation program
Relevance Algorithms
BM25
- Understanding the BM25 full text search algorithm
- Practical BM25: How Shards Affect Relevance Scoring in Elasticsearch, The BM25 Algorithm and its Variables
- The influence of TF-IDF algorithms in eCommerce search
- BM25 The Next Generation of Lucene Relevance
- Lucene Similarities (BM25, DFR, DFI, IB, LM) Explained
Bayesian BM25 (BB25)
- Bayesian BM25 is cool
- Releasing bb25 0.2.0: Why Bayesian BM25 (bb25) extends well far-beyond search?
Ranking
Multi-stage ranking
Reranking
Learning to Rank
- How is search different than other machine learning problems?
- Reinforcement learning assisted search ranking
- E-commerce Search Re-Ranking as a Reinforcement Learning Problem
- When to use a machine learned vs. score-based search ranker
- What is Learning To Rank?
- Using AI and Machine Learning to Overcome Position Bias within Adobe Stock Search
- Train and Test Sets Split for Evaluating Learning To Rank Models
- How LambdaMART works - optimizing product ranking goals
Click models for search
Bias
Diversification
- Search Result Diversification using Causal Language Models
- Learning to Diversify for E-commerce Search with Multi-Armed Bandit
- Search Quality for Discovery & Inspiration
- How to measure Diversity of Search Results
- Searching for Goldilocks
- Broad and Ambiguous Search Queries - Recognizing When Search Results Need Diversification
- Thoughts on Search Result Diversity
MMR
- How to Calculate MMR?
- Maximal Marginal Relevance to Re-rank results in Unsupervised KeyPhrase Extraction
Personalisation
- Patterns for Personalization in Recommendations and Search
- Daniel Tunkelang Personalization
- Airbnb - Real-time personalization in search
- 98 personal data points that facebook uses to target ads to you
- Architecture of real world recommendation systems
- Feature engineering for personalized search
Zero search results
- Strategies for using alternative queries to mitigate zero results and their application to online marketplaces
- Semantic Equivalence of e-Commerce Queries
Search UX
Baymard Institute
- Deconstructing E-Commerce Search: The 12 Query Types
- Autodirect or Guide Users to Matching Category
- 13 Design Patterns for Autocomplete Suggestions (27% Get it Wrong)
- E-Commerce Search Needs to Support Users’ Non-Product Search Queries (15% Don’t)
- Search UX: 6 Essential Elements for ‘No Results’ Pages
- Product Thumbnails Should Dynamically Update to Match the Variation Searched For (54% Don’t)
- Faceted Sorting - A New Method for Sorting Search Results
- The Current State of E-Commerce Search
- E-Commerce Sites Need Multiple of These 5 ‘Search Scope’ Features
- E-Commerce Search Field Design and Its Implications
- E-Commerce Sites Should Include Contextual Search Snippets (96% Get it Wrong)
- E-Commerce Search Usability: Report & Benchmark
- Six ‘COVID-19’ Related E-Commerce UX Improvements to Make
Nielsen Norman Group
- The Love-at-First-Sight Gaze Pattern on Search-Results Pages
- Good Abandonment on Search Results Pages
- Complex Search-Results Pages Change Search Behavior: The Pinball Pattern
- Site Search Suggestions
- Search-Log Analysis: The Most Overlooked Opportunity in Web UX Research
- Scoped Search: Dangerous, but Sometimes Useful
- 3 Guidelines for Search Engine "No Results" Pages
Enterprise Knowledge LLC
Facets
- Facets of Faceted Search
- Coffee, Coffee, Coffee!
- Faceted Search (start here!)
- How to implement faceted search the right way
- Metadata and Faceted Search
- Metacrap: Putting the torch to seven straw-men of the meta-utopia
- 7 Filtering Implementations That Make Macy’s Best-in-Class
- Facet Search: The Most Comprehensive Guide. Best Practices, Design Patterns, Hidden Caveats, And Workarounds
- Facets: Constraints or Preferences?
- Facets, But Which Ones?
Accidental Taxonomist
- How Many Facets Should a Taxonomy Have
- When a Taxonomy Should not be Hierarchical
- Customizing Taxonomy Facets
Other
- Learning from Friction to Improve the Search Experience
- Why is it so hard to sort by price?
- Faceted Sorting
- Google kills Instant Search
Spelling correction
- Peter Norvig. "How to Write a Spelling Corrector". Classic publication.
- Daniel Tunkelang. "Spelling Correction"
- A simple spell checker built from word vectors
- A closer look into the spell correction problem: 1, 2, 3, preDict
- Deep Spelling
- Modeling Spelling Correction for Search at Etsy
- Wolf Garbe, author of SymSpell. 1000x Faster Spelling Correction algorithm, SymSpell vs. BK-tree: 100x faster fuzzy string search & spell checking, Fast Word Segmentation of Noisy Text
- Chars2vec: character-based language model for handling real world texts with spelling errors and
- JamSpell, spelling correction taking into account surrounding context - library, (in Russian) Исправляем опечатки с учётом контекста (Correcting typos with context taken into account)
- Embedding for spelling correction
- What are some algorithms of spelling correction that are used by search engines?
- Moman - lucene/solr/elasticsearch spell correction/autocorrect is (was?) actually powered by this library.
- Query Segmentation and Spelling Correction
- Applying Context Aware Spell Checking in Spark NLP
- Autocorrect in Google, Amazon and Pinterest and how to write your own one
Synonyms
- Boosting the power of Elasticsearch with synonyms
- Real Talk About Synonyms and Search
- Synonyms in Solr I — The good, the bad and the ugly
- Synonyms and Antonyms from WordNet
- Synonyms and Antonyms in Python
- Dive into WordNet with NLTK
- Creating Better Searches Through Automatic Synonym Detection
- Multiword synonyms in search using Querqy
- How to Build a Smart Synonyms Model
- The importance of Synonyms in eCommerce Search
Stopwords
Suggestions
Also known as: autocomplete, search as you type, suggestions
- Giovanni Fernandez-Kincade. Bootstrapping Autosuggest, Building an Autosuggest Corpus, Part 1, Building an Autosuggest Corpus, Part 2, Autosuggest Retrieval Data Structures & Algorithms, Autosuggest Ranking
- On two types of suggestions
- Improving Search Suggestions for eCommerce
- Autocomplete Search Best Practices to Increase Conversions
- Why we’ve developed the searchhub smartSuggest module and why it might matter to you
- Nielsen Norman Group: Site Search Suggestions
- 13 Design Patterns for Autocomplete Suggestions
- Autocomplete
- Autocomplete and User Experience
- IMPLEMENTING A LINKEDIN LIKE SEARCH AS YOU TYPE WITH ELASTICSEARCH
- Smart autocomplete best practices: improve search relevance and sales
- OLX: Building Corpus for AutoSuggest (Part 1), AutoSuggest Retrieval & Ranking (Part 2)
- Autocomplete, Live Search Suggestions, and Autocorrection: Best Practice Design Patterns
- Mirror, Mirror, What Am I Typing Next? All About Search Suggestions
- How we built the lightning fast autosuggest for otto.de
Graphs/Taxonomies/Knowledge Graph
- Knowledge graphs applied in the retail industry. Knowledge graphs are becoming increasingly popular in tech. We explore how they can be used in the retail industry to enrich data, widen search results and add value to a retail company.
Integrating Search and Knowledge Graphs (by Enterprise Knowledge)
Query expansion
Query understanding
- Daniel Tunkelang Query Understanding.
- Query Understanding, Divided into Three Parts
- Search for Things not for Strings
- Understanding the Search Query. Part 1, Part 2, Part 3
- Food Discovery with Uber Eats: Building a Query Understanding Engine
- AI for Query Understanding
Search Intent
Query segmentation
- Paper Unsupervised Query Segmentation Using only Query Logs
- Paper Towards Semantic Query Segmentation
Algorithms
BERT
- Understanding BERT and Search Relevance
- Google is improving web search with BERT – can we use it for enterprise search too?
ColBERT
Collocations, common phrases
- Automatically detect common phrases – multi-word expressions / word n-grams – from a stream of sentences.
- The Unreasonable Effectiveness of Collocations
Other Algorithms
Hashing
- Locality Sensitive Hashing
- Locality Sensitive Hashing (LSH): The Practical and Illustrated Guide
- Minhash
Sorting by average ratings
Keywords extraction
Tracking, profiling, GDPR, Analysis
Tools, platforms, helpers for search tracking
- OpenSearch User Behavior Insights
- Site Search tracking with Google Analytics 4
- Snowplow
- search-collector
- OpenTelemetry with search additions
- Pulse Query Analytics
- Tracking who's hot and who's not presents an algorithmic challenge
Resources
- Anonymisation: managing data protection risk (code of practice)
- The Anonymisation Decision-Making Framework
- 98 personal data points that facebook uses to target ads to you
- Opportunity Analysis for Search
- A Face Is Exposed for AOL Searcher No. 4417749
- AOL search data leak
- Personal data
Experiments
- Common Pitfalls of Search Experimentation
- Improving Search @scale with efficient query experimentation
A/B testing, MABs
Testing, metrics, KPIs
KPIs
- 5 Right Ways to Measure How Search Is Performing
- E-commerce Site-Search KPIs. Part 1 – Customers, Part 2 – Products, Part 3 - Queries
- Learning from Friction to Improve the Search Experience
- Behind the Wizardry of a Seamless Search Experience
- Analyzing online search relevance metrics with the Elastic Stack
- How to Gain Insight From Search Analytics
Evaluating Search (by Daniel Tunkelang)
Measuring Search (by James Rubinstein)
- Statistical and human-centered approaches to search engine improvement
- A Human Approach
- Setting up a relevance evaluation program
- Metrics Matter
- A/B Testing Search: thinking like a scientist
- Query Triage: The Secret Weapon for Search Relevance
- The Launch Review: bringing it all together…
Three Pillars of Search Relevancy (by Andreas Wagner)
Architecture
- The Art Of Abstraction – Revisiting Webshop Architecture
- Canva - Search Pipeline
- Event-Driven Architecture for Efficient Search Indexing
Education and networking
Conferences
Trainings and courses
- Cheat at Search with LLMs. Doug Turnbull. Next: July 2025
- Relevant Search Masterclass by Doug Turnbull. Next: July 2025
- OpenSource Connections
- Search Fundamentals. Daniel Tunkelang, Grant Ingersoll. Next: Feb 6, 2023
- Search with Machine Learning. Daniel Tunkelang, Grant Ingersoll. Next: Feb 27, 2023
- Search for Product Managers. Daniel Tunkelang. Next: Apr 3, 2023
- Sematext's Solr, Elasticsearch, and OpenSearch trainings
Fall 2023
- https://dtunkelang.medium.com/upcoming-search-classes-this-fall-58f877fe00ad
Books
- AI-powered search
- Relevant Search
- Deep Learning for search
- Interactions with search systems
- Embeddings in Natural Language Processing. Theory and Advances in Vector Representation of Meaning
- Search User Interfaces
- Search Patterns
- Search Analytics for Your Site: Conversations with Your Customers
- Click Models for Web Search
- Optimization Algorithms
- Query Understanding for Search Engines
Blogs and Portals
Papers
Search Team. Management, composition, hiring
- Search is a Team Sport
- Thoughts about Managing Search Teams
- On Search Leadership
- Building an Effective Search Team: the key to great search & relevancy
- Query Triage: The Secret Weapon for Search Relevance
- The Launch Review: bringing it all together
- The Role of Search Product Owners
- Search Product Management: The Most Misunderstood Role in Search?
- Search relevance for understaffed teams
Job Interviews
- Interview Questions for Search Relevance Engineers, Data Scientists, and Product Managers
- Data Science Interviews: Ranking and search
Engineering
Economics of Search
Blogposts series
Search Optimization 101 (by Charlie Hull)
- How do I know that my search is broken?
- What does it mean if my search is ‘broken’?
- How do you fix a broken search?
- Reducing business risk by optimizing search
Query Understanding (by Daniel Tunkelang)
Better search through query understanding.
- An Introduction
- Language Identification
- Character Filtering
- Tokenization
- Spelling Correction
- Stemming and Lemmatization
- Query Rewriting: An Overview
- Query Expansion
- Query Relaxation
- Query Segmentation
- Query Scoping
- Entity Recognition
- Taxonomies and Ontologies
- Autocomplete
- Autocomplete and User Experience
- Contextual Query Understanding: An Overview
- Session Context
- Location as Context
- Seasonality
- Personalization
- Search as a Conversation
- Clarification Dialogues
- Relevance Feedback
- Faceted Search
- Search Results Presentation
- Search Result Snippets
- Search Results Clustering
- Question Answering
- Query Understanding and Voice Interfaces
- Query Understanding and Chatbots
Grid Dynamics
- Not your father’s search engine: a brief history of retail search
- Semantic vector search: the new frontier in product discovery
- Boosting product discovery with semantic search
- Semantic query parsing blueprint
Considering Search: Search Topics (by Derek Sisson)
- Intro
- Assumptions About Search
- Assumptions About User Search Behavior
- Types of Information Collections
- A Structural Look at Search
- Users and the Task of Information Retrieval
- Testing Search
- Useful Search Links and References
Industry players
Personalities and influencers
Search Engines
- Bing
- Yandex
- Amazon
- eBay
Products and services
- Algolia
- Vespa (https://vespa.ai/)
- Elasticsearch - Distributed search & analytics engine
- ParadeDB - Modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
- Solr - Solr is the blazing-fast, open source, multi-modal search platform built on the full-text, vector, and geospatial search capabilities of Apache Lucene
- Fess Enterprise Search Server
- Typesense - an open source alternative to Algolia.
- TopK - combines AI-powered query understanding with adaptive ranking to provide the most relevant results in your domain.
- SearchHub.io
- Datafari - an open source enterprise search solution.
- Qdrant - an open source vector database.
- Awakari - Real-Time search from unlimited sources like RSS, Fediverse, Telegram. Text keyword matching conditions, numeric conditions, condition groups. Reverse search index based.
- Meilisearch - Open source search API that supports full-text, vector, geospatial & faceted search.
Consulting companies
Case studies
- Airbnb - Machine Learning-Powered Search Ranking of Airbnb Experiences
- Airbnb - Listing Embeddings in Search Ranking
- Algolia - The Architecture Of Algolia’s Distributed Search Network
- Meituan - Exploration and practice of BERT in the core ranking of Meituan search (🇨🇳 BERT在美团搜索核心排序的探索和实践)
- Netflix - How Netflix Content Engineering makes a federated graph searchable (Part 1, Part 2)
- Netflix - Elasticsearch Indexing Strategy in Asset Management Platform (AMP)
- Skyscanner - Learning to Rank for Flight Itinerary Search
- Slack - Search at Slack
- Twitter - Stability and scalability for search
- Amazon SEO Explained: How to Rank Your Products #1 in Amazon Search Results in 2020
- Building a Better Search Engine for Semantic Scholar
General search
- How Bing Ranks Search Results: Core Algorithm & Blue Links
- How Google Search Ranking Works – Darwinism in Search
E-commerce
Multisided markets
Videos
Channels
Featured
Datasets
- Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search
- ESCI-S: extended metadata for Amazon ESCI dataset
- Home Depot Product Search Relevance
- WANDS - Wayfair ANnotation Dataset
Tools
Spacy
Awesome Spacy - Natural language understanding, content enrichment etc.
Word2Vec
- Word2Vec For Phrases — Learning Embeddings For More Than One Word
- Gensim Word2Vec Tutorial
- How to incorporate phrases into Word2Vec – a text mining approach
- Word2Vec — a baby step in Deep Learning but a giant leap towards Natural Language Processing
- How to Develop Word Embeddings in Python with Gensim
Libs
- Query Segmenter
- Tantivy - Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
- https://github.com/zentity-io/zentity
- https://github.com/mammothb/symspellpy
- https://github.com/searchhub/search-collector
- Kiri - State-of-the-art semantic search made easy.
- Haystack - End-to-end Python framework for building natural language search interfaces to data.
- https://github.com/castorini/docTTTTTquery
Other
Other awesome stuff
Unsorted
- IndexFox - AI-powered site search SaaS combining keyword and semantic search with instant AI-generated answers.
- sandbox Jun 2021
- sandbox May 2021
- sandbox April 2021
- sandbox Dec 2020
- sandbox Jan 2020
OLD TOC (to review)
- General, fun, philosophy
- Types of search
- Search Quality Assurance
- Areas of application
- Search Results
- Search UX
- Spelling correction
- Suggestions
- Synonyms
- Stopwords
- Graphs/Taxonomies/Knowledge Graph
- Integrating Search and Knowledge Graphs (by Enterprise Knowledge)
- Query expansion
- Query understanding
- Algorithms
- Tracking, profiling, GDPR, Analysis
- Experiments
- A/B testing, MABs
- Evaluating search
- MRR
- Testing, metrics, KPIs
- KPIs
- Evaluating Search (by Daniel Tunkelang)
- Measuring Search (by James Rubinstein)
- Three Pillars of Search Relevancy (by Andreas Wagner)
- Architecture
- Vectors search
- Education and networking
- Management, Search Team
- Industry players
- Personalities and influencers
- Search Engines
- Products and services
- Consulting companies
- Blogposts series
- Search Optimization 101 (by Charlie Hull)
- Query Understanding (by Daniel Tunkelang)
- Grid Dynamics
- Considering Search: Search Topics (by Derek Sisson)
- Videos
- Channels
- Featured
- Case studies
- Datasets
- Tools