How Instacart Built a Modern Search Infrastructure on Postgres

Authors: Ankit Mittal, Vinesh Gudla, Guanghua Shu, Akshay Nair, Tejaswi Tenneti, Andrew Tanner
In a previous blog post “Improving search at Instacart using hybrid recall”, we shared our progress on adaptively combining traditional full text search with embedding-based retrieval. Embarking on this journey required rethinking our search infrastructure to support hybrid recall while ensuring scalability and reliability. In this blog post, we will dive deeper into the architecture and engineering efforts that made this possible and lessons learned along the way.
Introduction
At Instacart, our mission is to create a seamless and efficient shopping experience for our customers. Given the size of our catalog — billions of items across thousands of retailers — the importance of an effective search infrastructure cannot be overstated. Search is often the primary entry point for users navigating our app, and delivering fast, relevant results is crucial for customer satisfaction and retention.
Retrieval is a critical component in search and most industry standard systems today are powered by text and embedding retrieval. Traditional full-text retrieval methods, while effective for simple keyword-based queries, struggle to capture the nuances of a user’s semantic intent. Embedding-based retrieval can complement the capabilities of full-text search by understanding the semantic relationship between a query and document. For example, a specific query like “pesto pasta sauce 8oz” is best served by keyword retrieval while a more ambiguous query like “healthy foods” is better served by semantic search.
We previously had distinct search stacks for these two retrieval mechanisms, a design which created challenges, such as:
- Overfetching & Post Filtering: Each retrieval source had to overfetch documents to account for the merging and post-filtering performed downstream.
- No control over Precision & Recall*: We wanted fine-grained control over precision & recall to hit the right balance for each search query, which was difficult to achieve across two separate stacks.
- Operational Burden: The need to maintain and scale multiple retrieval systems.
*Precision is the percentage of retrieved results that are relevant, and Recall is the percentage of relevant documents retrieved from the entire corpus.
To address these limitations, we recognized the need for a hybrid retrieval solution that leverages the strengths of both traditional and embedding-based approaches in a single datastore. Our goals were to:
- Enhance Relevance: Combine keyword matching with semantic understanding to deliver more accurate results that align with user intent.
- Improve Performance: Optimize search latency and throughput to handle our high volume of requests efficiently and reduce the maintenance overhead of multiple retrieval systems.
Scale of Search Retrieval at Instacart
To provide clearer context, let’s take a step back and examine the fundamental challenges that we face at Instacart in building a robust search system:
- High read throughput: We handle millions of search requests daily with a wide variance in query distribution throughout the day. Our infrastructure needs to provide quick and accurate results under this read workload to meet user expectations.
- Dynamic Inventory: Grocery items are fast-moving goods. Prices, availability, and discounts change multiple times a day and our search system must reflect these real-time changes to provide up-to-date information. As a result, our search and ranking database receives billions of writes per day. This includes catalog changes, pricing data, availability and inventory data, ancillary tables for ranking and display, personalization, and replacement data.
- Complex user preferences: Users have varied preferences, influenced by factors like dietary restrictions, brand loyalties and price sensitivity. Incorporating these personalized factors into search retrieval adds another layer of complexity.
The Evolution of Search
Outlined below is the journey of how our system evolved to tackle these challenges.
- Full Text Search in Elasticsearch: This was our initial search implementation.
- Full Text Search in Postgres: We transitioned our full text search functionality from Elasticsearch to Postgres.
- Semantic Search with FAISS: The FAISS library was introduced to add semantic search capabilities.
- Hybrid Search in Postgres with pgvector for semantic search: Our current solution, which combines lexical and embedding-based retrieval in a single Postgres engine.
Now let’s look at how our architecture and system evolved in more detail.
Original Architecture
Originally, Instacart’s full text search was implemented in Elasticsearch, an industry-standard solution at the time. However, as the system grew, we encountered significant challenges. Our Elasticsearch implementation did not scale well due to our denormalized data model and the nature of our write workloads. Specifically, frequent partial writes to documents were needed to update billions of items to reflect price changes and inventory availability. Over time, the indexing load and throughput caused the cluster to struggle so much that erroneous data could take days to correct.
Additionally, we wanted to support rich ML features and sophisticated models for search retrieval. These further increased the already high indexing load and the cost of indexing. As a result, read performance continued to degrade, which made our overall search performance untenable.
Moving Full Text Search to Postgres
This was when we decided to migrate our text retrieval stack to sharded Postgres instances with a high degree of data normalization. While this might seem somewhat unconventional, it made sense for our use case.
Since Instacart’s catalog was already being served out of Postgres and since the cluster was already scaled to handle the high loads, moving the search traffic to it proved to be relatively straightforward and quick for the majority of use cases.
By using Postgres GIN indexes and a modified version of Postgres’ ts_rank function, we were able to implement highly performant text matching in our retrieval. Since Postgres is a relational database, we could also store the ML features and model coefficients in separate tables. The advantage of this was that different tables could have different update/write frequencies and they could be joined in SQL which enabled support for more sophisticated ML models for retrieval.
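As a rough illustration of this setup (the table and column names below are hypothetical, and our production ranking uses a modified version of ts_rank rather than the stock function):

```sql
-- Illustrative sketch: a GIN-indexed tsvector column plus a retrieval query
-- that joins per-product ML features kept in their own table with their own
-- write cadence. Object names are hypothetical.
ALTER TABLE products ADD COLUMN searchable_tsv tsvector;
CREATE INDEX products_tsv_gin ON products USING gin (searchable_tsv);

SELECT p.product_id,
       -- stock ts_rank shown here; production uses a modified version
       ts_rank(p.searchable_tsv, q.query) * coalesce(f.popularity_score, 1.0) AS score
FROM   plainto_tsquery('english', 'pesto pasta sauce') AS q(query),
       products p
JOIN   product_features f USING (product_id)
WHERE  p.searchable_tsv @@ q.query
ORDER  BY score DESC
LIMIT  100;
```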
We saw distinct advantages over our previous architecture.
- A normalized data model allowed us to have a 10x reduction in write workload compared to the denormalized data model that we had to use in Elasticsearch. This led to substantial savings on storage and indexing.
- It allowed us to index hundreds of GBs of ML features alongside the documents enabling more complex retrieval models.
- We had stronger operational expertise with running Postgres at scale compared to Elasticsearch. Postgres tends to fail in predictable ways and degrades gracefully instead of failing abruptly.
A key insight was to bring compute closer to storage. This is the opposite of more recent database patterns, where the storage and compute layers are separated by networked I/O. The Postgres-based search ended up being twice as fast because we pushed logic and computation down to the data layer instead of pulling data up to the application layer for computation. This approach, combined with running Postgres on NVMe storage, further improved data-fetching performance and reduced latency. Fig 1 shows the high-level topology of this Postgres cluster.
To be more specific, our application layer previously had to make multiple network calls to Elasticsearch and other services such as the item availability data service, join the data, and finally filter the results to produce the final result set. This not only resulted in overfetching of documents but also added to the latency and overall complexity of the system. In the case of Postgres, by pushing the availability information and other data sources into the database and using a normalized data model, we were able to consolidate all the network calls and avoid the need to overfetch and post-filter the results. This delivered results to users noticeably faster and simplified the application layer as well.
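Conceptually, the consolidated lookup becomes a single statement along these lines; the item_availability table and its columns are illustrative stand-ins for the data that previously sat behind separate services.

```sql
-- Illustrative only: one round trip that text-matches, pre-filters on
-- real-time availability for the retailer being searched, and scores inside
-- the database, replacing separate Elasticsearch and availability-service
-- calls plus application-side filtering.
SELECT p.product_id,
       ts_rank(p.searchable_tsv, q.query) AS text_score
FROM   plainto_tsquery('english', 'healthy foods') AS q(query),
       products p
JOIN   item_availability a
  ON   a.product_id  = p.product_id
 AND   a.retailer_id = 42     -- hypothetical retailer id; searches are retailer-scoped
 AND   a.in_stock             -- availability as a pre-filter, not a post-filter
WHERE  p.searchable_tsv @@ q.query
ORDER  BY text_score DESC
LIMIT  100;
```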
Semantic Search
By 2021, we had already consolidated full-text search onto Postgres and wanted to add support for semantic retrieval. Because Postgres still lacked native support for the approximate nearest neighbor (ANN) search needed for semantic retrieval, we spun up a standalone service to support ANN search. The query and document embeddings were generated using a bi-encoder model based on the Hugging Face MiniLM-L3-v2 architecture, and the document embedding indexes were built using Meta’s FAISS library.
For each search query, parallel calls would be made to the Postgres database for full text search and the ANN service for semantic search. The results from these calls were combined in the application layer into a single list using a linear ranking model. The top k items from this merged retrieval set would then be passed to the subsequent ranking & reranking layers to generate the final results list as shown in Fig 2.
Combining the documents from the two sources in the application layer to generate the final set yielded significant improvements in overall search quality, but this architecture was not very flexible, and we faced an increasing number of challenges as we introduced more sophisticated models in the recall layer:
- We couldn’t support better algorithms that could optimally use the relative strengths of the two retrieval mechanisms when recalling documents.
- FAISS had limitations on filtering documents based on their attributes at retrieval time which meant that an excess number of documents had to be fetched first (overfetching) before being filtered in a post-processing step. This resulted in some relevant documents not being fetched and was also wasteful from a system resources perspective.
- Maintaining two separate services for recall came with developmental and operational overhead. Storing data in two separate systems and keeping them in-sync led to inconsistencies.
Bringing it all Together: An Emergent Hybrid Search Infrastructure
While migrating full-text search to Postgres improved our search infrastructure, and introducing semantic retrieval in FAISS boosted search quality, we wanted to explore a modern datastore to tackle the challenges mentioned above and unlock new capabilities by building the next generation of our search infrastructure.
We wanted to unify our overall retrieval system and address the three challenges from earlier:
- Minimize document overfetching and reduce post filtering by implementing both retrieval mechanisms in one system.
- Enable fine-grained control of the retrieval datasets to improve the precision & recall.
- Reduce operational overhead by consolidating the datastores and simplifying the stack.
Choosing the right datastore for combined retrieval
Across the industry, two approaches are typically used to combine the results from text and semantic retrieval, a paradigm also called hybrid retrieval:
- Using standalone vector stores like Milvus, Pinecone, or LanceDB alongside text search datastores such as Elasticsearch, with results retrieved from each datastore and combined in the application layer. Some, like LanceDB, also support query patterns similar to relational databases.
- Using semantic search support in existing text search datastores. For example, pgvector for Postgres or approximate kNN vector search in Elasticsearch could be used to support both forms of retrieval in a single datastore. Combining both retrieval mechanisms into a single datastore eliminates the need to maintain separate indexing pipelines and datastores, reduces infrastructure requirements, and, most importantly, offers finer-grained control over how the recall set is generated and combined.
Solution 1 is popular among new applications and for cases where the retrieved number of documents is small (order of tens of thousands) as the retrieval sets can be combined in the application layer. Solution 2 is more appealing for applications with existing text retrieval solutions that also support semantic retrieval.
Solution 2 appealed to us since we were already serving our full text search from Postgres; adopting pgvector would let us consolidate both retrieval mechanisms into a single system, minimize data duplication, and reduce maintenance costs. Thanks to pgvector’s rapid development cycle, it had already become a strong contender. Additionally, similar to full text search, we could leverage real-time item availability information in Postgres as a pre-filter to reduce latency and avoid overfetching. Fig 3 below illustrates the architecture of search retrieval and ranking implemented in our solution.
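A minimal sketch of what hybrid recall in a single engine can look like is shown below. The object names, the limits, and the plain UNION are illustrative (availability is shown as a simple in_stock column for brevity), and the actual blending logic described in our hybrid recall post is more sophisticated.

```sql
-- Illustrative hybrid recall in one statement: a lexical arm and a semantic
-- (pgvector) arm, each pre-filtered on availability, merged into one candidate
-- set for the downstream ranking layers. $1 is the query text and $2 the query
-- embedding produced by the bi-encoder, both bound by the application.
WITH lexical AS (
    SELECT product_id
    FROM   products
    WHERE  searchable_tsv @@ plainto_tsquery('english', $1)
      AND  in_stock
    ORDER  BY ts_rank(searchable_tsv, plainto_tsquery('english', $1)) DESC
    LIMIT  200
),
semantic AS (
    SELECT product_id
    FROM   products
    WHERE  in_stock
    ORDER  BY embedding <=> $2::vector   -- cosine distance; served by the ANN index
    LIMIT  200
)
SELECT product_id FROM lexical
UNION
SELECT product_id FROM semantic;
```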
ANN Prototype Cluster
In order to migrate from FAISS to pgvector, we first created a lab-scale cluster for offline experimentation. By mimicking production traffic, we were able to verify that pgvector could handle our throughput and latency requirements. This was a critical test to ensure that the architecture we designed would work in production.
ANN Index Design in Postgres
Most Instacart searches are performed within the context of a given retailer. Given the nature of our query distribution and the variance in catalog sizes across retailers, in our previous FAISS-based ANN service we had created a separate HNSW index for each retailer. This approach had a relatively high maintenance overhead, as we had to maintain hundreds of indexes.
Using our prototype cluster, we were able to quickly arrive at a configuration that worked well in our offline evaluations. Table 1 below compares FAISS with pgvector for semantic search. The product count column indicates the number of products for a retailer. While pgvector was marginally slower than FAISS for larger retailers, it had better recall performance.
We also found that a few additional parameter tweaks worked better for us than the out-of-the-box configuration, as listed below (see the sketch after this list):
- Built hybrid indexes based on the retailer characteristics instead of dedicated per-retailer indexes.
- Updated Postgres tuning parameters, such as increasing max_parallel_workers_per_gather and max_parallel_workers to 8. This speeds up the scans where Postgres decides not to use a pgvector index.
- Updated the embedding column to use inline storage instead of TOAST.
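A hedged sketch of these changes follows; the HNSW index type, operator class, and object names are assumptions rather than the exact production configuration.

```sql
-- Assumed settings for illustration only.

-- A shared ANN index covering many retailers rather than one index per
-- retailer (the exact grouping by retailer characteristics is not shown here).
CREATE INDEX products_embedding_ann
    ON products USING hnsw (embedding vector_cosine_ops);

-- More parallel workers for the plans where Postgres chooses not to use the
-- pgvector index and falls back to a parallel scan.
ALTER SYSTEM SET max_parallel_workers_per_gather = 8;
ALTER SYSTEM SET max_parallel_workers = 8;
SELECT pg_reload_conf();

-- Store the embedding column inline instead of letting it be TOASTed,
-- avoiding an extra out-of-line fetch per row during distance computation.
ALTER TABLE products ALTER COLUMN embedding SET STORAGE PLAIN;
```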
Interestingly, tuning pgvector’s index parameters to vary by retailer catalog size did not yield significant benefits.
Online Performance
Based on the offline performance of pgvector, we launched a production A/B test for a subset of users. We saw a 6% drop in the number of searches with zero results due to better recall. This led to a substantial increase in incremental revenue for the platform, as users ran into fewer dead-end searches and were better able to find the items they were looking for. Overall, the pgvector migration was a great success from both a system and a quality point of view, and it unlocked subsequent improvements in our retrieval stack.
Attribute Filtering
Another advantage of performing all retrieval inside a single datastore, and with pgvector specifically, is that we can also filter documents based on their attributes, something that was not possible with the FAISS-based ANN service. Attribute filtering can improve retrieval performance when applied as a pre-filter or post-filter. As discussed previously, in the current system real-time inventory availability information is used as a pre-filter to reduce the search space for semantic search, which improves latency. Other attributes like brand and category can further improve the quality of the recall set. This is one area that we plan to explore further in upcoming iterations.
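Extending the semantic arm sketched earlier, attribute pre-filtering might look like the following; the brand column and values are hypothetical, and availability is again shown as a simple column for brevity.

```sql
-- Illustrative pre-filtering: retailer scope, availability, and a brand
-- attribute narrow the candidate set before nearest-neighbor ordering.
SELECT product_id
FROM   products
WHERE  retailer_id = 42             -- hypothetical retailer id
  AND  in_stock
  AND  brand = 'Acme'               -- hypothetical attribute filter
ORDER  BY embedding <=> $1::vector  -- $1: query embedding from the bi-encoder
LIMIT  100;
```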
Summary
By migrating to a hybrid search infrastructure that leverages pgvector within Postgres and ts_rank for text matching, we successfully unified our retrieval mechanisms into a single, scalable system. This consolidation not only enhanced search relevance by combining keyword matching with semantic understanding but also reduced operational overhead by collapsing our separate retrieval stacks into one.
The transition allowed for greater flexibility in handling dynamic inventory and complex user preferences, ultimately leading to a more efficient and personalized shopping experience for our customers. As we continue to evolve our search capabilities, this foundation positions us to better meet user needs and adapt to future challenges in the ever-changing landscape of e-commerce.
Acknowledgments
This project required the collaboration of multiple teams across the company including ML, ML infra, backend and core infra teams to be realized. Special thanks to Xiao Xiao, Raochuan Fan, Alex Charlton, Prakash Putta, Jonathan Phillips & Xukai Tang who also contributed to this work and made this vision a reality. I’d also like to thank Naval Shal, Eric Hacke, & Riddhima Sejpal for their thoughtful and thorough review of the blog post.