3 Ways to Optimize Your Semantic Search Engine With Graft

JD Prater

November 7, 2023

Building a semantic search engine from the ground up can feel like navigating a maze with infinite paths. You're faced with a barrage of decisions: which embedding strategies to employ, which models to select, and how to configure re-ranking. Each choice looms large, with the pressure to quickly find the winning combination.

You've been there, haven't you? Changing chunking methods isn't just a switch flip; it's a code rewrite, a data reprocess, a multi-hour marathon. And the task of deploying and evaluating new embedding models? That's a whole other level of complexity.

What if you could have the flexibility you crave without the migraines? Imagine being able to iterate on text embeddings with a simple click. A platform where you can enhance semantic search results as effortlessly as new data and models surface, all without touching a line of code.

Graft's Modern AI Platform makes semantic search experimentation not just possible—it's powerful. With us, you can seamlessly test embedding strategies, interchange models, and integrate re-ranking, all while bypassing the maintenance chaos.

The intricacies of semantic search dissolve into simplicity, transforming the search for the perfect parameter combination into a journey of discovery and delight.

"Spend less time building out the semantic search, and more time experimenting and making sure the search performs."
~ Deven Navani, Senior Software Engineer @ Graft (Source: How to Build a Two-Stage Semantic Search Pipeline on Your Data, With Zero Code)

In this post I highlight how we make semantic search optimization not just effortless but enjoyable. I’ll explore the nuances of text embedding strategies, embedding model selection, and the art of re-ranking, all designed to elevate your search solutions to new heights.

1. Try Different Text Embedding Strategies

In the intricate dance of text analysis, the steps you choose can make all the difference. We understand that the rhythm of your data is unique, and so we offer a suite of six text embedding strategies, each with its own cadence and style, designed to match the beat of your content's heart.

From the nuanced blend of Chunk Average and Special Token Average—our default go-to that harmonizes local context with global semantics—to the precise footwork of Truncate and Token Average, our strategies are tailored to ensure that every word in your text moves in sync with your search and analysis goals.

Whether you're choreographing a complex sequence with lengthy documents or a quick shuffle with shorter texts, our strategies like Chunk Average and Classification Token or Truncate and Special Token Average are your partners in this dance, leading you to a performance that captures the essence of your data.

Let's take the stage and explore how each of these strategies can spotlight the semantics of your input, ensuring that every search, every analysis, and every decision is a step in the right direction.

Graft supports six text embedding strategies.

  1. Chunk Average and Special Token Average (Default)
  2. Truncate and Special Token Average
  3. Chunk Average and Classification Token
  4. Truncate and Classification Token
  5. Chunk Average and Token Average
  6. Truncate and Token Average

What is a classification token? It is a special token (written [CLS] in BERT-style models) added to the input so that its embedding can capture the semantics of the entire input.

1. Chunk Average and Special Token Average (Graft Default)

The text is split into 512-token chunks. Each chunk is embedded separately, together with its special tokens, and the resulting embeddings are averaged to create the final embedding.

Use when: You want a general-purpose default. It balances local context and global semantics and works well for many tasks.
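
To make the chunk-and-average idea concrete, here is a minimal sketch in plain Python. It is not Graft's implementation: the helper names are made up for illustration, and toy_embed_chunk is a hypothetical stand-in for a real embedding model.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_chunks(tokens: Sequence[str], chunk_size: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def chunk_average(tokens: Sequence[str],
                  embed_chunk: Callable[[List[str]], Vector],
                  chunk_size: int = 512) -> Vector:
    """Embed every chunk on its own, then average the chunk embeddings."""
    chunks = split_into_chunks(tokens, chunk_size)
    return mean_vector([embed_chunk(chunk) for chunk in chunks])


def toy_embed_chunk(chunk: List[str]) -> Vector:
    # Hypothetical stand-in for a model: [token count, mean token length].
    return [float(len(chunk)), sum(len(t) for t in chunk) / len(chunk)]


tokens = ["tok"] * 1200
print([len(c) for c in split_into_chunks(tokens)])  # [512, 512, 176]
print(chunk_average(tokens, toy_embed_chunk))       # [400.0, 3.0]
```

A 1,200-token input becomes three chunks, and the final vector is the plain average of the three chunk vectors.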

2. Truncate and Special Token Average

The truncate method limits the input text to a fixed length of 512 tokens. The truncated text is embedded along with special [CLS] and [SEP] tokens. The final embedding is an average of the truncated sentence and special tokens.

Use when: You need a straightforward single vector representation of the overall semantics, especially for short texts.
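
As a rough sketch of the truncate step, the snippet below cuts the input and wraps it in [CLS] and [SEP] before averaging. Two things here are assumptions rather than anything stated above: the two special tokens are counted inside the 512-token budget, and the vectors are a toy static lookup, whereas a real model produces contextual vectors.

```python
from typing import Dict, List, Sequence

CLS, SEP = "[CLS]", "[SEP]"


def truncate_with_special_tokens(tokens: Sequence[str], max_tokens: int = 512) -> List[str]:
    """Keep the leading tokens and wrap them as [CLS] ... [SEP].

    Assumption: the two special tokens count toward max_tokens.
    """
    if max_tokens < 3:
        raise ValueError("max_tokens must leave room for [CLS], [SEP], and one token")
    return [CLS] + list(tokens[: max_tokens - 2]) + [SEP]


def average_embedding(tokens: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of every token, special tokens included."""
    rows = [vectors[t] for t in tokens]
    return [sum(column) / len(rows) for column in zip(*rows)]


# Toy static vectors; a real model would produce contextual vectors instead.
toy_vectors = {CLS: [1.0, 0.0], SEP: [0.0, 1.0], "search": [2.0, 2.0]}

wrapped = truncate_with_special_tokens(["search"] * 10, max_tokens=4)
print(wrapped)                                  # ['[CLS]', 'search', 'search', '[SEP]']
print(average_embedding(wrapped, toy_vectors))  # [1.25, 1.25]
```

Everything past the cutoff is simply dropped, which is why this strategy suits short texts.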

3. Chunk Average and Classification Token

The chunk and average strategy takes the entire text and breaks it into chunks of n tokens, extracts a classification token embedding for each chunk, and then averages those embeddings. The chunk size depends on the chosen model but is often 512 tokens, or about 400 words. If the input text is shorter than the maximum sequence length of the model, this technique is equivalent to the truncate method.

Use when: Most of your text is longer than around 400 words, and your primary goal is to train a classifier on the text.
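
The equivalence with truncation for short inputs can be checked with a small sketch. The function names are illustrative, and toy_cls_embedding is a hypothetical stand-in for a model's classification token output.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_cls_embedding(chunk: Sequence[str]) -> Vector:
    # Hypothetical stand-in for a model's classification token vector:
    # [token count, total character count].
    return [float(len(chunk)), float(sum(len(t) for t in chunk))]


def truncate_cls(tokens: Sequence[str],
                 embed_cls: Callable[[Sequence[str]], Vector],
                 n: int = 512) -> Vector:
    """Embed only the first n tokens."""
    return embed_cls(list(tokens[:n]))


def chunk_average_cls(tokens: Sequence[str],
                      embed_cls: Callable[[Sequence[str]], Vector],
                      n: int = 512) -> Vector:
    """Embed every n-token chunk, then average the per-chunk vectors."""
    if not tokens:
        raise ValueError("need at least one token")
    chunks = [list(tokens[i:i + n]) for i in range(0, len(tokens), n)]
    vectors = [embed_cls(chunk) for chunk in chunks]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


short_text = ["a"] * 100   # fits in one chunk
long_text = ["a"] * 600    # needs two chunks: 512 + 88 tokens

print(truncate_cls(short_text, toy_cls_embedding))       # [100.0, 100.0]
print(chunk_average_cls(short_text, toy_cls_embedding))  # [100.0, 100.0]
print(truncate_cls(long_text, toy_cls_embedding))        # [512.0, 512.0]
print(chunk_average_cls(long_text, toy_cls_embedding))   # [300.0, 300.0]
```

With a single chunk the average is just that chunk's vector, so the two strategies only diverge once the text exceeds n tokens.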

4. Truncate and Classification Token

The truncate method looks at the first n tokens in the input text, prepends a special classification token and extracts the embedding for this token. The value of n is often 512 tokens and thus usually captures the meaningful semantics of the text.

Use when: Most of your text is short, less than around 400 words, and your primary goal is to train a classifier on the text.

5. Chunk Average and Token Average

The chunk and token average strategy splits the text into chunks of n tokens. Within each chunk, the individual token embeddings are extracted and averaged into a single embedding for that chunk. The chunk embeddings are then averaged in turn.

Use when: Most of your text is longer than around 400 words, and your primary goal is to perform similarity or semantic searches on the text.
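
One consequence of averaging twice is worth seeing in numbers. The sketch below follows the description literally, with made-up one-dimensional token vectors, and assumes every chunk is weighted equally in the second average; whether a given platform weights chunks by length is not stated here.

```python
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def chunk_and_token_average(token_vectors: Sequence[Vector], n: int) -> Vector:
    """Average token vectors inside each n-token chunk, then average the chunks."""
    chunks = [token_vectors[i:i + n] for i in range(0, len(token_vectors), n)]
    chunk_embeddings = [mean(chunk) for chunk in chunks]
    return mean(chunk_embeddings)


# Five made-up token vectors; with n=4 the last chunk holds a single token.
token_vectors = [[0.0], [0.0], [0.0], [0.0], [4.0]]

print(chunk_and_token_average(token_vectors, n=4))  # [2.0]
print(mean(token_vectors))                          # [0.8]
```

Under this equal-weight assumption, a short trailing chunk counts as much as a full one, so the result can differ from a single average over all tokens.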

6. Truncate and Token Average

The token average strategy takes the first n tokens of the text, extracts their individual embeddings and averages them to make a single embedding for the input text.

Use when: Most of your text is short, less than around 400 words, and your primary goal is to perform similarity or semantic searches on the text.
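
The six "Use when" notes can be read as a small decision table. The sketch below is one reading of that guidance, not a Graft API: the function name, the goal labels, and the hard cutoff at exactly 400 words are all assumptions made for illustration.

```python
LONG_TEXT_WORDS = 400  # "around 400 words" in the guidance above; the exact cutoff is an assumption


def suggest_strategy(typical_word_count: int, goal: str = "general") -> str:
    """Map typical document length and goal to one of the six strategy names.

    goal is one of "general", "single_vector", "classification", "search"
    (hypothetical labels, chosen for this sketch).
    """
    is_long = typical_word_count > LONG_TEXT_WORDS
    if goal == "classification":
        return ("Chunk Average and Classification Token" if is_long
                else "Truncate and Classification Token")
    if goal == "search":
        return ("Chunk Average and Token Average" if is_long
                else "Truncate and Token Average")
    if goal == "single_vector":
        return "Truncate and Special Token Average"
    if goal == "general":
        return "Chunk Average and Special Token Average"
    raise ValueError(f"unknown goal: {goal!r}")


print(suggest_strategy(1200, "search"))         # Chunk Average and Token Average
print(suggest_strategy(150, "classification"))  # Truncate and Classification Token
print(suggest_strategy(800))                    # Chunk Average and Special Token Average
```

Treat the output as a starting point for experimentation rather than a rule; the point of trying several strategies is to see which one performs best on your own data.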

2. Compare Results and Performance of Multiple Text Embedding Models

Graft provides one-click access to some of the most popular open source text embedding models like BERT, RoBERTa, and GTE as well as third-party models from OpenAI and Cohere. Now you have an extensive set of options to experiment with and optimize for your specific use case.

"It’s an amazing feature to support multiple variations of embedding strategies. Most of the ML practitioners are probably not even aware of the impact of embedding strategies on the performance of downstream tasks (semantic search, classification, etc.). It even pushes further to expose experimentation and comparison of different strategies in a fairly user-friendly fashion. So we can pick the best model tailored to our own data rather than to those less relevant public benchmarks. On top of this, I could foresee auto foundation model selection against bespoke data in the near future."

Steve Han, Senior Machine Learning Scientist @ Graft

For example, models like GTE offer an excellent balance of performance and efficiency for general semantic search applications. Scientific domains may benefit more from SciBERT's vocabulary tailored for academic literature. Applications that need to handle multiple languages can leverage multilingual models like mBERT.

Our Modern AI Platform empowers users to compare the results and performance across models to determine the best fit for their goals. Whether you need blazing speed, state-of-the-art accuracy, or specialized vocabulary, Graft has a text embedding model for you.
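
If you want a feel for what comparing models on your own data involves, here is a minimal harness sketch. It is not how Graft runs comparisons: the two "models" are toy embedders invented for this example, the documents and queries are made up, and recall@k is just one reasonable metric to pick.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Embedder = Callable[[str], List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(embed: Embedder, docs: Sequence[str],
                labeled_queries: Sequence[Tuple[str, int]], k: int = 1) -> float:
    """Fraction of queries whose relevant document lands in the top k."""
    doc_vectors = [embed(d) for d in docs]
    hits = 0
    for query, relevant_index in labeled_queries:
        q = embed(query)
        ranked = sorted(range(len(docs)),
                        key=lambda i: cosine(q, doc_vectors[i]), reverse=True)
        hits += relevant_index in ranked[:k]
    return hits / len(labeled_queries)


def compare_models(models: Dict[str, Embedder], docs: Sequence[str],
                   labeled_queries: Sequence[Tuple[str, int]], k: int = 1) -> Dict[str, float]:
    """Score every candidate embedder on the same labeled queries."""
    return {name: recall_at_k(fn, docs, labeled_queries, k) for name, fn in models.items()}


# Two toy embedders standing in for real models (both hypothetical).
VOCAB = ["password", "reset", "refund", "order", "late", "shipping"]


def keyword_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def first_letter_embed(text: str) -> List[float]:
    first = text.lower()[0]
    return [1.0 if first == chr(ord("a") + i) else 0.0 for i in range(26)]


docs = ["how to reset your password",
        "refund policy for late order",
        "shipping times and tracking"]
labeled_queries = [("password reset help", 0),
                   ("my order is late refund", 1),
                   ("shipping cost", 2)]

scores = compare_models({"keyword": keyword_embed, "first_letter": first_letter_embed},
                        docs, labeled_queries)
print(scores["keyword"])  # 1.0
```

The key design choice is that every candidate is scored on the same labeled queries drawn from your own data, so the numbers are directly comparable.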

Want to learn more about these models and how Graft simplifies leveraging them? Check out our guide on the 8 Best Embedding Models for Semantic Search. Discover how Graft unlocks the flexibility of open source and third-party models while removing the headaches of integrating, comparing, and maintaining these advanced NLP techniques.

3. The Power of Re-Ranking in One Click

Semantic search pipelines typically have two key stages: an initial retrieval model, such as text embeddings, that finds potentially relevant candidates, followed by a re-ranking model that refines the results.

This re-ranking step is crucial for filtering out false positives and homing in on documents highly aligned with the query's intent. It provides a second pass of scrutiny by evaluating query-document interactions directly using cross-encoder architectures.

Without re-ranking, results may be polluted with tangentially related matches. But precise re-ranking ensures only the most relevant candidates are ranked highly.
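
The shape of a two-stage pipeline can be sketched in a few lines. Both scorers below are toy stand-ins chosen for illustration: word overlap plays the role of embedding similarity, and a bigram match plays the role of a cross-encoder that reads query and document together. Neither reflects Graft's built-in models.

```python
from typing import Callable, List, Sequence, Set, Tuple

Scorer = Callable[[str, str], float]


def overlap_score(query: str, doc: str) -> float:
    """Stage 1 stand-in: how often the query's words occur in the document."""
    doc_words = doc.lower().split()
    return float(sum(doc_words.count(w) for w in set(query.lower().split())))


def phrase_score(query: str, doc: str) -> float:
    """Stage 2 stand-in: number of query word pairs found in the same order."""
    def bigrams(text: str) -> Set[Tuple[str, str]]:
        words = text.lower().split()
        return set(zip(words, words[1:]))
    return float(len(bigrams(query) & bigrams(doc)))


def two_stage_search(query: str, docs: Sequence[str],
                     retrieve_score: Scorer, rerank_score: Scorer,
                     pool_size: int = 3, top_k: int = 2) -> List[str]:
    # Stage 1: cheap scoring over every document, keep a small candidate pool.
    pool = sorted(docs, key=lambda d: retrieve_score(query, d), reverse=True)[:pool_size]
    # Stage 2: costlier scoring over the pool only, then cut to the final results.
    return sorted(pool, key=lambda d: rerank_score(query, d), reverse=True)[:top_k]


docs = [
    "password length password reuse and password expiry rules",
    "how to reset password on mobile",
    "reset the router to factory settings",
    "shipping times and tracking",
]

results = two_stage_search("reset password", docs, overlap_score, phrase_score)
print(results[0])  # how to reset password on mobile
```

In this toy data, stage 1 alone would put the password rules document first because it repeats a query word three times; the second pass promotes the document that actually matches the intent.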

Implementing quality re-ranking models involves finding the right deep learning architectures and training datasets. But with Graft, a robust re-ranking model is built in and ready to deploy with one click.

Graft lets you compare retrieval-only and retrieval + re-ranking pipelines side by side through its evaluation tools. Users consistently find significant gains from adding re-ranking; it's an easy lift with dramatic impact.
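
For readers who want to see what such a comparison measures, here is a small sketch using mean reciprocal rank (MRR), a common ranking metric. The metric choice is an assumption, since the evaluation tools' internals are not described here, and the rankings below are invented purely to show the arithmetic; they are not measured results.

```python
from typing import List, Sequence, Tuple

Run = Tuple[List[str], str]  # (ranked document ids, id of the relevant document)


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """1 / position of the relevant document, or 0.0 if it is missing."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Run]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Invented rankings for two queries, before and after a re-ranking pass.
retrieval_only = [(["d3", "d1", "d2"], "d1"), (["d5", "d4", "d6"], "d5")]
with_rerank = [(["d1", "d3", "d2"], "d1"), (["d5", "d6", "d4"], "d5")]

print(mean_reciprocal_rank(retrieval_only))  # 0.75
print(mean_reciprocal_rank(with_rerank))     # 1.0
```

Running both pipelines over the same labeled queries and comparing a single number like this makes the effect of re-ranking easy to judge.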

By handling the complexities of re-ranking behind the scenes, Graft makes it simple to benefit from more accurate and intent-aligned semantic search. The key to search that feels almost telepathic lies in this powerful retrieval + re-ranking combination unlocked by Graft.


You’re often faced with the daunting task of constructing a semantic search engine that is both sophisticated and user-friendly. In this post, I showcased three pivotal optimizations that Graft's Modern AI Platform offers to streamline this process:

  1. Text Embedding Strategies: Tailored to your content's length and complexity, Graft provides diverse strategies to ensure your search aligns perfectly with your data's nuances.
  2. Embedding Model Selection: With access to a wide range of text embedding models, Graft enables you to experiment and find the optimal balance between speed, accuracy, and linguistic scope for your specific needs.
  3. Re-Ranking Power: Graft's integrated re-ranking model refines search results to match user intent with precision, transforming a good search into a great one.

Embrace the simplicity and power of Graft to elevate your semantic search capabilities. Dive in, experiment with ease, and deliver a search experience that delights your users. Let's move beyond the traditional—step into the future of AI-powered search with Graft.

Last Updated

January 18, 2024

JD Prater

Head of Marketing

JD writes about his experience using and building AI solutions. Outside of work, you'll find him spending time with his family, cycling the backroads of the Santa Cruz mountains, and surfing the local sandbars. Say hi on LinkedIn.
