Performing Searches and Understanding Results

To run full-text searches with SearchIndex, first initialize it with a BookmarkRepository. Then call its search method to retrieve the bookmarks that match your query.

Here's how to initialize SearchIndex and execute a search query:

from app.services.search_service import SearchIndex
from app.models.bookmark import Bookmark # Assuming Bookmark model exists
from app.repositories.bookmark_repository import BookmarkRepository # Assuming BookmarkRepository exists

# Assume a BookmarkRepository instance is available
# For demonstration, let's create a mock repository and bookmarks
class MockBookmarkRepository(BookmarkRepository):
    def __init__(self):
        # In-memory store keyed by bookmark ID
        self._bookmarks = {}

    def get_all(self):
        return list(self._bookmarks.values())

    def get_by_id(self, bookmark_id: str):
        return self._bookmarks.get(bookmark_id)

    def add(self, bookmark: Bookmark):
        self._bookmarks[bookmark.id] = bookmark

# Create some sample bookmarks
bookmark1 = Bookmark(
    id="b1",
    title="Python Programming Guide",
    description="A comprehensive guide to Python programming for beginners and advanced users.",
    url="https://example.com/python-guide"
)
bookmark2 = Bookmark(
    id="b2",
    title="Data Science with Python",
    description="Learn data science fundamentals using Python libraries like Pandas and NumPy.",
    url="https://example.com/data-science"
)
bookmark3 = Bookmark(
    id="b3",
    title="Advanced JavaScript Techniques",
    description="Explore modern JavaScript features and best practices.",
    url="https://example.com/javascript"
)

mock_repo = MockBookmarkRepository()
mock_repo.add(bookmark1)
mock_repo.add(bookmark2)
mock_repo.add(bookmark3)

# Initialize the SearchIndex with the repository
search_index = SearchIndex(repository=mock_repo)

# Index the bookmarks explicitly for clarity. If SearchIndex already indexes
# the repository's bookmarks on initialization, these calls are redundant.
search_index.index_bookmark(bookmark1)
search_index.index_bookmark(bookmark2)
search_index.index_bookmark(bookmark3)

# Perform a search
query = "python programming"
results = search_index.search(query=query, limit=5)

print(f"Search results for '{query}':")
for bookmark in results:
    print(f"- {bookmark.title} ({bookmark.url})")

# Example with a different query
query_data_science = "data science"
results_data_science = search_index.search(query=query_data_science, limit=5)

print(f"\nSearch results for '{query_data_science}':")
for bookmark in results_data_science:
    print(f"- {bookmark.title} ({bookmark.url})")

Understanding Search Results

The search method returns a list of Bookmark objects that match the provided query, ordered by relevance. The SearchIndex uses an inverted index to facilitate full-text search, processing bookmark titles and descriptions.
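To make the inverted-index idea concrete, here is a minimal sketch that maps each token to the set of bookmark IDs containing it. It is not the SearchIndex implementation: the function name build_inverted_index, the whitespace-based splitting, and the use of titles only are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document IDs that contain it.

    Hypothetical helper for illustration; not part of SearchIndex.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Titles of the sample bookmarks used earlier in this section
docs = {
    "b1": "Python Programming Guide",
    "b2": "Data Science with Python",
    "b3": "Advanced JavaScript Techniques",
}

index = build_inverted_index(docs)
python_ids = sorted(index["python"])
print(python_ids)  # ['b1', 'b2']
```

Looking up a token is then a single dictionary access, which is why this structure suits full-text search better than scanning every bookmark per query.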

Tokenization and Stop Words

When bookmarks are indexed using index_bookmark, their title and description are tokenized. Tokenization involves breaking down the text into individual words or "tokens." During this process, common words (stop words) like "a," "the," "is," etc., are typically ignored, and words might be converted to lowercase to ensure case-insensitive matching.

For example, if a bookmark has the title "A Comprehensive Guide to Python Programming", it might be tokenized into tokens like comprehensive, guide, python, programming. When you search for "python programming", the query is similarly tokenized, and the index looks for bookmarks containing these tokens.
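The tokenization described above can be sketched as follows. The stop-word set and the regular expression are assumptions chosen for illustration; the actual SearchIndex may use a different list and different splitting rules.

```python
import re

# Illustrative stop-word list (an assumption, not the SearchIndex list)
STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "for", "with"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


title_tokens = tokenize("A Comprehensive Guide to Python Programming")
query_tokens = tokenize("python programming")

print(title_tokens)  # ['comprehensive', 'guide', 'python', 'programming']
print(query_tokens)  # ['python', 'programming']
```

Because the title and the query pass through the same function, a query in any letter case produces tokens that can be compared directly with the indexed ones.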

Result Ranking

After identifying all bookmarks that contain any of the query tokens, the SearchIndex ranks these results to present the most relevant ones first. The ranking algorithm (implemented in _rank_results) likely considers factors such as:

  • Token Frequency: Bookmarks with more occurrences of the query tokens might be ranked higher.
  • Token Proximity: If query tokens appear close to each other in the bookmark's title or description, it might indicate higher relevance.
  • Field Importance: Matches in the title might be given more weight than matches in the description.

The limit parameter in the search method allows you to control the maximum number of results returned. If limit is set to 5, only the top 5 most relevant bookmarks will be returned.
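As a rough sketch of how two of these factors could combine, the example below scores each bookmark by token frequency, counts title matches double, and then applies a limit. The score function, the title_weight value of 2.0, and the shortened descriptions are assumptions for illustration; _rank_results may work differently, and token proximity is left out to keep the sketch short.

```python
def score(title, description, query_tokens, title_weight=2.0):
    """Sum token occurrences, weighting title matches more than description matches.

    Hypothetical scoring function for illustration only.
    """
    title_words = title.lower().split()
    desc_words = description.lower().split()
    total = 0.0
    for token in query_tokens:
        total += title_weight * title_words.count(token)
        total += desc_words.count(token)
    return total


bookmarks = [
    ("b1", "Python Programming Guide", "A comprehensive guide to Python programming"),
    ("b2", "Data Science with Python", "Learn data science using Python libraries"),
    ("b3", "Advanced JavaScript Techniques", "Explore modern JavaScript features"),
]
query_tokens = ["python", "programming"]

# Keep only bookmarks that match at least one token, best score first
scored = [(score(title, desc, query_tokens), bid) for bid, title, desc in bookmarks]
matches = sorted((pair for pair in scored if pair[0] > 0), reverse=True)

limit = 1
top_ids = [bid for _, bid in matches[:limit]]
print(top_ids)  # ['b1']
```

Raising limit to 5 in this sketch would return both matching bookmarks, b1 before b2, since b3 never matches and is filtered out before the slice.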

Variations and Scenarios

Limiting Search Results

You can adjust the limit parameter to retrieve a specific number of top results.

# Search for "python" and get only 1 result
query = "python"
results_limited = search_index.search(query=query, limit=1)

print(f"\nTop 1 search result for '{query}':")
for bookmark in results_limited:
    print(f"- {bookmark.title}")

Searching with Partial Matches

Whether a partial word matches depends on the tokenization strategy. An inverted index matches whole tokens, so a query for part of a word matches only if the index normalizes words to a shared form (for example, through stemming) or supports prefix matching.

# Searching for "program" matches "programming" only if stemming or prefix matching is applied
query_partial = "program"
results_partial = search_index.search(query=query_partial, limit=5)

print(f"\nSearch results for '{query_partial}':")
for bookmark in results_partial:
    print(f"- {bookmark.title}")
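The difference between exact token lookup and prefix matching can be seen in a small sketch. The index contents and the helper names exact_lookup and prefix_lookup are hypothetical; whether SearchIndex supports either behavior for partial words is not confirmed by this section.

```python
# Hand-built inverted index for illustration: token -> bookmark IDs
index = {
    "python": {"b1", "b2"},
    "programming": {"b1"},
    "javascript": {"b3"},
}


def exact_lookup(index, token):
    """Return IDs only for tokens that match the query token exactly."""
    return set(index.get(token, set()))


def prefix_lookup(index, prefix):
    """Return IDs for every indexed token that starts with the prefix."""
    matched = set()
    for token, ids in index.items():
        if token.startswith(prefix):
            matched |= ids
    return matched


exact = exact_lookup(index, "program")
by_prefix = prefix_lookup(index, "program")

print(exact)      # set()
print(by_prefix)  # {'b1'}
```

Prefix lookup scans every token in this sketch, so it trades speed for recall; stemming at index time is a common alternative that keeps lookups exact.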

Handling Empty or No-Match Queries

If the query yields no tokens, or no bookmark contains any of the query tokens, the search method returns an empty list.

# Query with no matching bookmarks
query_no_match = "nonexistent_topic"
results_no_match = search_index.search(query=query_no_match, limit=5)

print(f"\nSearch results for '{query_no_match}':")
if not results_no_match:
    print("No bookmarks found matching the query.")
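Both empty-result cases can be illustrated with a small stand-alone lookup. The tokenize and lookup helpers, the stop-word set, and the index contents are assumptions for this sketch; they show one way a search could reach an empty list, not how SearchIndex does it.

```python
import re

# Illustrative stop-word list (an assumption)
STOP_WORDS = {"a", "an", "the", "is", "to", "of"}


def tokenize(text):
    """Lowercase, split into alphanumeric words, and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def lookup(index, query):
    """Return sorted IDs matching any query token, or [] when nothing matches."""
    tokens = tokenize(query)
    if not tokens:
        # Case 1: the query produced no usable tokens
        return []
    ids = set()
    for token in tokens:
        # Case 2: unknown tokens contribute nothing
        ids |= index.get(token, set())
    return sorted(ids)


index = {"python": {"b1", "b2"}, "guide": {"b1"}}

stop_only = lookup(index, "the a")
unknown = lookup(index, "nonexistent_topic")
found = lookup(index, "python")

print(stop_only)  # []
print(unknown)    # []
print(found)      # ['b1', 'b2']
```

Returning an empty list in both cases lets callers use a single `if not results` check, as the example above does.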