> ## Documentation Index
> Fetch the complete documentation index at: https://docs.paradedb.com/llms.txt
> Use this file to discover all available pages before exploring further.

# BM25 Scoring

> BM25 scores sort the result set by relevance

BM25 scores measure how relevant a score is for a given query. Higher scores indicate higher relevance.

## Basic Usage

The `pdb.score(<key_field>)` function produces a BM25 score and can be added to any query where any of the ParadeDB operators are present.

<CodeGroup>
  ```sql SQL theme={null}
  SELECT id, pdb.score(id)
  FROM mock_items
  WHERE description ||| 'shoes'
  ORDER BY pdb.score(id) DESC
  LIMIT 5;
  ```

  ```ts Drizzle theme={null}
  import { desc } from "drizzle-orm";
  import { search } from "@paradedb/drizzle-paradedb";

  await db
    .select({
      id: mockItems.id,
      score: search.score(mockItems.id),
    })
    .from(mockItems)
    .where(search.matchAny(mockItems.description, "shoes"))
    .orderBy(desc(search.score(mockItems.id)))
    .limit(5);
  ```

  ```python Django theme={null}
  from paradedb import MatchAny, ParadeDB, Score

  MockItem.objects.filter(
      description=ParadeDB(MatchAny('shoes'))
  ).annotate(
      score=Score()
  ).values('id', 'score').order_by('-score')[:5]
  ```

  ```python SQLAlchemy theme={null}
  from sqlalchemy import desc, select
  from sqlalchemy.orm import Session
  from paradedb.sqlalchemy import pdb, search

  stmt = (
      select(MockItem.id, pdb.score(MockItem.id).label("score"))
      .where(search.match_any(MockItem.description, "shoes"))
      .order_by(desc("score"))
      .limit(5)
  )

  with Session(engine) as session:
      session.execute(stmt).all()
  ```

  ```ruby Rails theme={null}
  MockItem.search(:description)
          .matching_any("shoes")
          .with_score
          .select(:id)
          .order(search_score: :desc)
          .limit(5)
  ```

  ```cs EF Core theme={null}
  await dbContext
      .MockItems.Where(item => EF.Functions.MatchAny(item.Description, "shoes"))
      .Select(item => new { item.Id, Score = EF.Functions.Score(item.Id) })
      .OrderByDescending(item => item.Score)
      .Take(5)
      .ToListAsync();
  ```
</CodeGroup>

In order for a field to be factored into the BM25 score, it must be present in the BM25 index. For instance,
consider this query:

<CodeGroup>
  ```sql SQL theme={null}
  SELECT id, pdb.score(id)
  FROM mock_items
  WHERE description ||| 'keyboard' OR rating < 2
  ORDER BY pdb.score(id) DESC
  LIMIT 5;
  ```

  ```ts Drizzle theme={null}
  import { desc, lt, or } from "drizzle-orm";
  import { search } from "@paradedb/drizzle-paradedb";

  await db
    .select({
      id: mockItems.id,
      score: search.score(mockItems.id),
    })
    .from(mockItems)
    .where(
      or(
        search.matchAny(mockItems.description, "keyboard"),
        lt(mockItems.rating, 2),
      ),
    )
    .orderBy(desc(search.score(mockItems.id)))
    .limit(5);
  ```

  ```python Django theme={null}
  from django.db.models import Q
  from paradedb import MatchAny, ParadeDB, Score

  MockItem.objects.filter(
      Q(description=ParadeDB(MatchAny('keyboard'))) | Q(rating__lt=2)
  ).annotate(
      score=Score()
  ).values('id', 'score').order_by('-score')[:5]
  ```

  ```python SQLAlchemy theme={null}
  from sqlalchemy import desc, or_, select
  from sqlalchemy.orm import Session
  from paradedb.sqlalchemy import pdb, search

  stmt = (
      select(MockItem.id, pdb.score(MockItem.id).label("score"))
      .where(or_(search.match_any(MockItem.description, "keyboard"), MockItem.rating < 2))
      .order_by(desc("score"))
      .limit(5)
  )

  with Session(engine) as session:
      session.execute(stmt).all()
  ```

  ```ruby Rails theme={null}
  MockItem.search(:description)
          .matching_any("keyboard")
          .or(MockItem.where(rating: ...2))
          .with_score
          .select(:id)
          .order(search_score: :desc)
          .limit(5)
  ```

  ```cs EF Core theme={null}
  await dbContext
      .MockItems.Where(item =>
          EF.Functions.MatchAny(item.Description, "keyboard") || item.Rating < 2
      )
      .Select(item => new { item.Id, Score = EF.Functions.Score(item.Id) })
      .OrderByDescending(item => item.Score)
      .Take(5)
      .ToListAsync();
  ```
</CodeGroup>

While BM25 scores will be returned as long as `description` is indexed, including `rating` in the BM25 index definition will allow results matching
`rating < 2` to rank higher than those that do not match.

## Joined Scores

First, let's create a second table called `orders` that can be joined with `mock_items`:

```sql theme={null}
CALL paradedb.create_bm25_test_table(
  schema_name => 'public',
  table_name => 'orders',
  table_type => 'Orders'
);

ALTER TABLE orders
ADD CONSTRAINT foreign_key_product_id
FOREIGN KEY (product_id)
REFERENCES mock_items(id);

CREATE INDEX orders_idx ON orders
USING bm25 (order_id, product_id, order_quantity, order_total, customer_name)
WITH (key_field = 'order_id');
```

Next, let's compute a "combined BM25 score" over a join across both tables.
The Django example assumes an `Order` model with `product = models.ForeignKey(MockItem, db_column='product_id', to_field='id', ...)`.

<CodeGroup>
  ```sql SQL theme={null}
  SELECT o.order_id, o.customer_name, m.description, pdb.score(o.order_id) + pdb.score(m.id) as score
  FROM orders o
  JOIN mock_items m ON o.product_id = m.id
  WHERE o.customer_name ||| 'Johnson' AND m.description ||| 'running shoes'
  ORDER BY score DESC, o.order_id
  LIMIT 5;
  ```

  ```ts Drizzle theme={null}
  import { and, desc, eq, sql } from "drizzle-orm";
  import { search } from "@paradedb/drizzle-paradedb";

  const score = sql`${search.score(orders.orderId)} + ${search.score(mockItems.id)}`;

  await db
    .select({
      orderId: orders.orderId,
      customerName: orders.customerName,
      description: mockItems.description,
      score,
    })
    .from(orders)
    .innerJoin(mockItems, eq(orders.productId, mockItems.id))
    .where(
      and(
        search.matchAny(orders.customerName, "Johnson"),
        search.matchAny(mockItems.description, "running shoes"),
      ),
    )
    .orderBy(desc(score), orders.orderId)
    .limit(5);
  ```

  ```python Django theme={null}
  from django.db.models import F, FloatField
  from django.db.models.expressions import RawSQL
  from paradedb import MatchAny, ParadeDB, Score

  Order.objects.filter(
      customer_name=ParadeDB(MatchAny('Johnson')),
      product__description=ParadeDB(MatchAny('running shoes')),
  ).annotate(
      order_score=Score(),
      product_score=RawSQL('pdb.score(mock_items.id)', [], output_field=FloatField()),
  ).annotate(
      score=F('order_score') + F('product_score')
  ).values(
      'order_id', 'customer_name', 'product__description', 'score'
  ).order_by('-score', 'order_id')[:5]
  ```

  ```python SQLAlchemy theme={null}
  from sqlalchemy import desc, select
  from sqlalchemy.orm import Session
  from paradedb.sqlalchemy import pdb, search

  stmt = (
      select(
          Order.order_id,
          Order.customer_name,
          MockItem.description,
          (pdb.score(Order.order_id) + pdb.score(MockItem.id)).label("score"),
      )
      .select_from(Order)
      .join(MockItem, Order.product_id == MockItem.id)
      .where(
          search.match_any(Order.customer_name, "Johnson"),
          search.match_any(MockItem.description, "running shoes"),
      )
      .order_by(desc("score"), Order.order_id)
      .limit(5)
  )

  with Session(engine) as session:
      session.execute(stmt).all()
  ```

  ```ruby Rails theme={null}
  orders = Order.arel_table
  mock_items = MockItem.arel_table
  combined_score = Arel::Nodes::Addition.new(
    orders[:order_id].pdb_score,
    mock_items[:id].pdb_score
  )
  join = orders.join(mock_items).on(orders[:product_id].eq(mock_items[:id])).join_sources

  Order.joins(join)
       .search(:customer_name)
       .matching_any("Johnson")
       .search(mock_items[:description])
       .matching_any("running shoes")
       .select(
         orders[:order_id],
         orders[:customer_name],
         mock_items[:description].as("product_description"),
         combined_score.as("score")
       )
       .order(
         Arel::Nodes::Descending.new(combined_score),
         Arel::Nodes::Ascending.new(orders[:order_id])
       )
       .limit(5)
  ```

  ```cs EF Core theme={null}
  await dbContext
      .Orders.Join(
          dbContext.MockItems,
          order => order.ProductId,
          item => item.Id,
          (order, item) => new { Order = order, Item = item }
      )
      .Where(row =>
          EF.Functions.MatchAny(row.Order.CustomerName, "Johnson")
          && EF.Functions.MatchAny(row.Item.Description, "running shoes")
      )
      .Select(row => new
      {
          row.Order.OrderId,
          row.Order.CustomerName,
          row.Item.Description,
          Score = EF.Functions.Score(row.Order.OrderId) + EF.Functions.Score(row.Item.Id)
      })
      .OrderByDescending(row => row.Score)
      .ThenBy(row => row.OrderId)
      .Take(5)
      .ToListAsync();
  ```
</CodeGroup>

## Score Refresh

The scores generated by the BM25 index may be influenced by dead rows that have not been cleaned up by the `VACUUM` process.

Running `VACUUM` on the underlying table will remove all dead rows from the index and ensures that only rows visible to the current
transaction are factored into the BM25 score.

```sql theme={null}
VACUUM mock_items;
```

This can be automated with [autovacuum](/documentation/performance-tuning/overview).
