Apache Solr (apache/solr)

https://github.com/apache/solr
# Apache Solr

Apache Solr is an open-source, multi-modal search platform built on Apache Lucene. It supports full-text, vector, and geospatial search, and provides enterprise-grade features such as distributed indexing, replication, load-balanced querying, and automated failover and recovery.

Solr provides a REST-like API for indexing and searching documents, with support for faceting, highlighting, spell-checking, and more. It can run in standalone mode for smaller deployments or in SolrCloud mode for distributed, highly-available deployments. The platform includes SolrJ, a Java client library, and supports communication via HTTP with JSON, XML, or binary formats.

## REST API - Search Query

The `/select` endpoint handles search queries with support for full-text search, filtering, faceting, and highlighting.

```bash
# Basic search query
curl "http://localhost:8983/solr/myCollection/select?q=title:solr&rows=10"

# Search with filter query and faceting
curl "http://localhost:8983/solr/myCollection/select" \
  -d "q=*:*" \
  -d "fq=category:books" \
  -d "facet=true" \
  -d "facet.field=author" \
  -d "facet.field=genre" \
  -d "rows=20" \
  -d "start=0" \
  -d "fl=id,title,author,price" \
  -d "sort=price asc"

# JSON response example:
# {
#   "responseHeader": {"status": 0, "QTime": 5},
#   "response": {
#     "numFound": 125,
#     "start": 0,
#     "docs": [
#       {"id": "978-0641723445", "title": "The Lightning Thief", "author": "Rick Riordan", "price": 12.50}
#     ]
#   },
#   "facet_counts": {
#     "facet_fields": {
#       "author": ["Rick Riordan", 5, "Michael McCandless", 3]
#     }
#   }
# }
```
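Two details of the request and response above are easy to get wrong when calling `/select` from code: parameter values such as `*:*` or `price asc` need percent-encoding when placed in a URL, and `facet_fields` comes back as a flat list that alternates value and count. The following is a minimal sketch using only the Java standard library; the class and method names (`SelectUrlSketch`, `buildSelectUrl`, `facetCounts`) are hypothetical helpers, not part of Solr or SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class for illustration; not part of Solr or SolrJ.
class SelectUrlSketch {

    // Builds a /select URL, percent-encoding each parameter name and value.
    // Repeated parameters (e.g. facet.field) are passed as separate entries.
    static String buildSelectUrl(String baseUrl, String collection,
                                 List<Map.Entry<String, String>> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return baseUrl + "/" + collection + "/select?" + query;
    }

    // Converts the flat [value, count, value, count, ...] list into an ordered map.
    static Map<String, Long> facetCounts(List<Object> flat) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < flat.size(); i += 2) {
            counts.put(String.valueOf(flat.get(i)), ((Number) flat.get(i + 1)).longValue());
        }
        return counts;
    }

    public static void main(String[] args) {
        String url = buildSelectUrl("http://localhost:8983/solr", "myCollection", List.of(
                Map.entry("q", "*:*"),
                Map.entry("fq", "category:books"),
                Map.entry("facet", "true"),
                Map.entry("facet.field", "author"),
                Map.entry("sort", "price asc")));
        System.out.println(url);
        System.out.println(facetCounts(List.of("Rick Riordan", 5, "Michael McCandless", 3)));
    }
}
```

The map preserves Solr's ordering, so the first entry is the first facet value in the response.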

## REST API - Document Indexing

The `/update` endpoint accepts documents for indexing in JSON, XML, or CSV formats with optional commit parameters.

```bash
# Index a single document with JSON
curl -X POST "http://localhost:8983/solr/myCollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "978-0641723445",
      "cat": ["book", "hardcover"],
      "name": "The Lightning Thief",
      "author": "Rick Riordan",
      "series_t": "Percy Jackson and the Olympians",
      "genre_s": "fantasy",
      "inStock": true,
      "price": 12.50,
      "pages_i": 384
    }
  ]'

# Index multiple documents with commitWithin (auto-commit within 1 second)
curl -X POST "http://localhost:8983/solr/myCollection/update?commitWithin=1000" \
  -H "Content-Type: application/json" \
  -d '[
    {"id": "doc1", "title": "First Document", "content": "Full text content here"},
    {"id": "doc2", "title": "Second Document", "content": "More searchable content"}
  ]'

# Delete documents by ID
curl -X POST "http://localhost:8983/solr/myCollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '{"delete": {"id": "978-0641723445"}}'

# Delete documents by query
curl -X POST "http://localhost:8983/solr/myCollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '{"delete": {"query": "category:obsolete"}}'
```
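The same update call can be issued from Java without SolrJ. The sketch below only constructs the request with the JDK's `java.net.http` API and does not send it, so it can be inspected without a running Solr; `UpdateRequestSketch` and `buildUpdate` are hypothetical names, and the host and collection are the placeholders used in the curl examples.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper class for illustration; not part of Solr or SolrJ.
class UpdateRequestSketch {

    // Builds (but does not send) a JSON update request.
    // A commitWithinMs of 0 or less leaves the commit decision to the server configuration.
    static HttpRequest buildUpdate(String baseUrl, String collection,
                                   String jsonBody, int commitWithinMs) {
        String url = baseUrl + "/" + collection + "/update";
        if (commitWithinMs > 0) {
            url += "?commitWithin=" + commitWithinMs;
        }
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String docs = "[{\"id\": \"doc1\", \"title\": \"First Document\"}]";
        HttpRequest request = buildUpdate("http://localhost:8983/solr", "myCollection", docs, 1000);
        System.out.println(request.method() + " " + request.uri());
        // To send it against a live server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```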

## REST API - JSON Query DSL

The JSON Request API provides a structured way to build complex queries with facets, filters, and sorting.

```bash
# Complex JSON query with facets and highlighting
curl -X POST "http://localhost:8983/solr/myCollection/query" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "content:(search engine)",
    "filter": ["category:technology", "inStock:true"],
    "limit": 10,
    "offset": 0,
    "fields": ["id", "title", "author", "score"],
    "sort": "score desc, title asc",
    "facet": {
      "categories": {
        "type": "terms",
        "field": "category",
        "limit": 5
      },
      "price_ranges": {
        "type": "range",
        "field": "price",
        "start": 0,
        "end": 100,
        "gap": 20
      }
    },
    "params": {
      "hl": true,
      "hl.fl": "content",
      "hl.tag.pre": "<em>",
      "hl.tag.post": "</em>"
    }
  }'
```
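The top-level keys of the JSON Request API correspond to the classic query parameters used with `/select`: `query` to `q`, `filter` to `fq`, `limit` to `rows`, `offset` to `start`, `fields` to `fl`, and `sort` to `sort`. The sketch below illustrates that correspondence for simple values only; `JsonRequestMappingSketch` is a hypothetical class, and it deliberately does not cover `facet` or `params`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration of the key-to-parameter correspondence; not Solr code.
class JsonRequestMappingSketch {

    static final Map<String, String> KEY_TO_PARAM = Map.of(
            "query", "q", "filter", "fq", "offset", "start",
            "limit", "rows", "fields", "fl", "sort", "sort");

    static List<String> toClassicParams(Map<String, Object> jsonRequest) {
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, Object> entry : jsonRequest.entrySet()) {
            String param = KEY_TO_PARAM.get(entry.getKey());
            if (param == null) {
                throw new IllegalArgumentException("Key not covered by this sketch: " + entry.getKey());
            }
            Object value = entry.getValue();
            if (!(value instanceof List)) {
                params.add(param + "=" + value);
            } else if (param.equals("fl")) {
                // A field list collapses into one comma-separated fl value.
                params.add("fl=" + ((List<?>) value).stream()
                        .map(String::valueOf).collect(Collectors.joining(",")));
            } else {
                // Other lists (e.g. filter) repeat the parameter once per entry.
                for (Object item : (List<?>) value) {
                    params.add(param + "=" + item);
                }
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("query", "title:solr");
        request.put("filter", List.of("category:technology", "inStock:true"));
        request.put("limit", 10);
        request.put("fields", List.of("id", "title"));
        toClassicParams(request).forEach(System.out::println);
    }
}
```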

## REST API - Collection Admin

The Collections API manages SolrCloud collections including creation, deletion, and modification operations.

```bash
# Create a new collection with 3 shards and 2 replicas
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=myNewCollection&numShards=3&replicationFactor=2&collection.configName=_default"

# List all collections
curl "http://localhost:8983/solr/admin/collections?action=LIST"

# Get collection status
curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=myCollection"

# Delete a collection
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=myOldCollection"

# Reload a collection (to pick up config changes)
curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=myCollection"

# Create an alias pointing to a collection
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=myAlias&collections=myCollection"
```
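In the CREATE call, `replicationFactor` is the number of copies of each shard, so the cluster has to place `numShards` times `replicationFactor` replicas in total. A small sketch of that arithmetic and of assembling the URL from its parts follows; `CollectionCreateSketch` is a hypothetical helper, and it assumes the collection and configset names are URL-safe, so no encoding is applied.

```java
// Hypothetical helper class for illustration; not part of Solr or SolrJ.
class CollectionCreateSketch {

    // Total number of replicas the cluster must place for the collection.
    static int totalReplicas(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    // Assembles a Collections API CREATE URL from its individual parameters.
    // Assumes name and configName contain only URL-safe characters.
    static String createUrl(String baseUrl, String name, int numShards,
                            int replicationFactor, String configName) {
        return baseUrl + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=" + numShards
                + "&replicationFactor=" + replicationFactor
                + "&collection.configName=" + configName;
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8983/solr", "myNewCollection", 3, 2, "_default"));
        // 3 shards with 2 copies each means 6 replicas.
        System.out.println("replicas to place: " + totalReplicas(3, 2));
    }
}
```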

## SolrJ Client - HttpJdkSolrClient

HttpJdkSolrClient is a lightweight Java client using the built-in Java 11+ HTTP client for minimal dependencies.

```java
import org.apache.solr.client.solrj.impl.HttpJdkSolrClient;
import org.apache.solr.client.solrj.request.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

import java.util.concurrent.TimeUnit;

// Create a client connected to a standalone Solr instance
try (HttpJdkSolrClient client = new HttpJdkSolrClient.Builder("http://localhost:8983/solr")
        .withDefaultCollection("myCollection")
        .withConnectionTimeout(5000, TimeUnit.MILLISECONDS)
        .withRequestTimeout(30000, TimeUnit.MILLISECONDS)
        .build()) {

    // Index a document
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "book-001");
    doc.addField("title", "Apache Solr Reference Guide");
    doc.addField("author", "Apache Solr Team");
    doc.addField("category", "technology");
    doc.addField("price", 49.99);
    client.add(doc);
    client.commit();

    // Execute a search query
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(10);
    query.setStart(0);
    query.addFilterQuery("category:technology");
    query.setSort("price", SolrQuery.ORDER.asc);
    query.setFields("id", "title", "author", "price");

    QueryResponse response = client.query(query);
    SolrDocumentList results = response.getResults();

    System.out.println("Found " + results.getNumFound() + " documents");
    for (SolrDocument result : results) {
        System.out.println("ID: " + result.getFieldValue("id"));
        System.out.println("Title: " + result.getFieldValue("title"));
    }

    // Delete by ID
    client.deleteById("book-001");
    client.commit();
}
```

## SolrJ Client - CloudSolrClient

CloudSolrClient routes requests to the correct nodes in a SolrCloud cluster with automatic failover.

```java
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.request.SolrQuery;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Connect to SolrCloud using ZooKeeper; the chroot is the second constructor argument
List<String> zkHosts = Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181");
try (CloudSolrClient client = new CloudSolrClient.Builder(zkHosts, Optional.of("/solr"))
        .withDefaultCollection("myCollection")
        .build()) {

    // Create a new collection
    CollectionAdminRequest.Create createRequest =
        CollectionAdminRequest.createCollection("newCollection", "_default", 2, 2);
    CollectionAdminResponse createResponse = createRequest.process(client);

    // Index documents - automatically routed to correct shards
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "user-123");
    doc1.addField("name", "John Doe");
    doc1.addField("email", "john@example.com");

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", "user-456");
    doc2.addField("name", "Jane Smith");
    doc2.addField("email", "jane@example.com");

    client.add(Arrays.asList(doc1, doc2));
    client.commit();

    // Query across all shards
    SolrQuery query = new SolrQuery("name:*");
    query.setRows(100);
    QueryResponse response = client.query(query);

    // Get document by ID (real-time get)
    SolrDocument doc = client.getById("user-123");
}

// Alternative: Connect using Solr URLs instead of ZooKeeper
try (CloudSolrClient client = new CloudSolrClient.Builder(
        Arrays.asList("http://solr1:8983/solr", "http://solr2:8983/solr"))
        .withDefaultCollection("myCollection")
        .build()) {
    // Use client...
}
```

## SolrJ Client - SolrQuery Builder

SolrQuery provides a fluent API for building complex search queries with facets, highlighting, and more.

```java
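import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.SolrQuery;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

import java.util.List;
import java.util.Map;

// Build a comprehensive search query (assumes an existing SolrClient named `client`)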
SolrQuery query = new SolrQuery();

// Main query and filters
query.setQuery("content:search AND content:engine");
query.addFilterQuery("type:article");
query.addFilterQuery("date:[2023-01-01T00:00:00Z TO NOW]");
query.addFilterQuery("-status:draft");

// Pagination and sorting
query.setStart(0);
query.setRows(25);
query.setSort("score", SolrQuery.ORDER.desc);
query.addSort("date", SolrQuery.ORDER.desc);

// Field selection
query.setFields("id", "title", "author", "date", "score");

// Faceting configuration
query.setFacet(true);
query.addFacetField("category");
query.addFacetField("author");
query.setFacetLimit(10);
query.setFacetMinCount(1);
query.addFacetQuery("price:[0 TO 25}");
query.addFacetQuery("price:[25 TO 50}");
query.addFacetQuery("price:[50 TO *]");

// Highlighting configuration
query.setHighlight(true);
query.addHighlightField("content");
query.addHighlightField("title");
query.setHighlightSimplePre("<mark>");
query.setHighlightSimplePost("</mark>");
query.setHighlightSnippets(3);
query.setHighlightFragsize(150);

// Execute query
QueryResponse response = client.query(query);

// Process facet results
for (FacetField facet : response.getFacetFields()) {
    System.out.println("Facet: " + facet.getName());
    for (FacetField.Count count : facet.getValues()) {
        System.out.println("  " + count.getName() + ": " + count.getCount());
    }
}

// Access highlighting
Map<String, Map<String, List<String>>> highlighting = response.getHighlighting();
for (SolrDocument doc : response.getResults()) {
    String id = (String) doc.getFieldValue("id");
    Map<String, List<String>> docHighlights = highlighting.get(id);
    if (docHighlights != null && docHighlights.containsKey("content")) {
        System.out.println("Highlighted: " + docHighlights.get("content").get(0));
    }
}
```
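Two values in a query like this are usually computed rather than hard-coded: the `start` offset for a given page, and the set of range facet queries. When ranges are generated programmatically, half-open ranges (`[a TO b}`) keep a boundary value such as 25 in exactly one bucket. The following sketch uses only the standard library; `QueryMathSketch` and its methods are hypothetical helpers whose results could be passed to `setStart` and `addFacetQuery`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of SolrJ.
class QueryMathSketch {

    // Converts a zero-based page number into Solr's `start` offset.
    static int startFor(int page, int rows) {
        return page * rows;
    }

    // Turns ascending lower bounds into half-open range queries.
    // The last bucket is open-ended and written as [bound TO *].
    static List<String> bucketQueries(String field, int... bounds) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i < bounds.length; i++) {
            boolean last = (i == bounds.length - 1);
            String upper = last ? "*]" : bounds[i + 1] + "}";
            queries.add(field + ":[" + bounds[i] + " TO " + upper);
        }
        return queries;
    }

    public static void main(String[] args) {
        System.out.println("start for page 2 at 25 rows: " + startFor(2, 25));
        bucketQueries("price", 0, 25, 50).forEach(System.out::println);
    }
}
```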

## SolrJ Client - JsonQueryRequest

JsonQueryRequest enables building queries using the JSON Request API format for complex structured queries.

```java
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;
import org.apache.solr.client.solrj.request.json.TermsFacetMap;
import org.apache.solr.client.solrj.request.json.RangeFacetMap;
import org.apache.solr.client.solrj.response.QueryResponse;

import java.util.HashMap;
import java.util.Map;

// Build a JSON query request
JsonQueryRequest jsonQuery = new JsonQueryRequest()
    .setQuery("*:*")
    .setLimit(20)
    .setOffset(0)
    .returnFields("id", "title", "author", "price", "score");

// Add filter queries
jsonQuery.withFilter("category:books");
jsonQuery.withFilter("inStock:true");

// Add terms facet
TermsFacetMap authorFacet = new TermsFacetMap("author")
    .setLimit(10)
    .setMinCount(1);
jsonQuery.withFacet("top_authors", authorFacet);

// Add range facet
RangeFacetMap priceFacet = new RangeFacetMap("price", 0, 100, 25);
jsonQuery.withFacet("price_ranges", priceFacet);

// Add nested sub-facet
Map<String, Object> categoryFacet = new HashMap<>();
categoryFacet.put("type", "terms");
categoryFacet.put("field", "category");
categoryFacet.put("limit", 5);

Map<String, Object> subFacet = new HashMap<>();
subFacet.put("type", "terms");
subFacet.put("field", "author");
subFacet.put("limit", 3);
categoryFacet.put("facet", Map.of("top_author_per_category", subFacet));

jsonQuery.withFacet("categories_with_authors", categoryFacet);

// Execute and process response
QueryResponse response = jsonQuery.process(client, "myCollection");
System.out.println("Found: " + response.getResults().getNumFound());
```

## CLI - Solr Commands

The `bin/solr` command-line interface manages Solr instances, collections, and provides administrative operations.

```bash
# Start Solr in standalone mode
bin/solr start -p 8983

# Start Solr in SolrCloud mode with embedded ZooKeeper
bin/solr start -c -p 8983

# Start SolrCloud connecting to external ZooKeeper
bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181/solr

# Start with specific memory settings
bin/solr start -m 4g

# Check Solr status
bin/solr status

# Stop Solr
bin/solr stop -p 8983
bin/solr stop -all

# Create a collection
bin/solr create -c myCollection -n _default -shards 2 -replicationFactor 2

# Create a core (standalone mode)
bin/solr create_core -c myCore -d _default

# Delete a collection or core
bin/solr delete -c myCollection

# Post documents to Solr
bin/solr post -c myCollection /path/to/documents.json
bin/solr post -c myCollection /path/to/data/*.xml
bin/solr post -c myCollection -filetypes json,xml /path/to/files/

# Export collection data
bin/solr export -c myCollection -query "*:*" -out /path/to/export.json

# Health check
bin/solr healthcheck -c myCollection -z localhost:2181

# Enable basic authentication
bin/solr auth enable -type basicAuth -credentials admin:password
```

## Docker Deployment

Solr provides official Docker images for containerized deployments with support for SolrCloud clustering.

```bash
# Run Solr standalone
docker run -d -p 8983:8983 --name solr solr:latest

# Run Solr with persistent data
docker run -d -p 8983:8983 -v solr_data:/var/solr --name solr solr:latest

# Create a core on startup
docker run -d -p 8983:8983 --name solr solr:latest solr-precreate mycore

# Run in demo mode (creates a "demo" core and loads sample documents)
docker run -d -p 8983:8983 --name solr solr:latest solr-demo
```

```yaml
# docker-compose.yml for SolrCloud with ZooKeeper
version: '3.8'
services:
  zookeeper:
    image: zookeeper:3.9
    ports:
      - "2181:2181"
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zookeeper:2888:3888;2181

  solr1:
    image: solr:latest
    ports:
      - "8983:8983"
    environment:
      ZK_HOST: zookeeper:2181
    depends_on:
      - zookeeper
    volumes:
      - solr1_data:/var/solr

  solr2:
    image: solr:latest
    ports:
      - "8984:8983"
    environment:
      ZK_HOST: zookeeper:2181
    depends_on:
      - zookeeper
    volumes:
      - solr2_data:/var/solr

volumes:
  solr1_data:
  solr2_data:
```

## Schema Configuration

The managed-schema.xml file defines field types, fields, and dynamic field patterns for document indexing.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example-config" version="1.7">
  <!-- Unique key field -->
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>

  <!-- Version field for optimistic concurrency -->
  <field name="_version_" type="plong" indexed="false" stored="false"/>

  <!-- Content fields -->
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="content" type="text_general" indexed="true" stored="true"/>
  <field name="author" type="string" indexed="true" stored="true"/>
  <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="price" type="pfloat" indexed="true" stored="true"/>
  <field name="inStock" type="boolean" indexed="true" stored="true"/>
  <field name="publishDate" type="pdate" indexed="true" stored="true"/>

  <!-- Catch-all field for full-text search -->
  <field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>

  <!-- Copy fields for catch-all search -->
  <copyField source="title" dest="_text_"/>
  <copyField source="content" dest="_text_"/>
  <copyField source="author" dest="_text_"/>

  <!-- Dynamic fields - type inferred from suffix -->
  <dynamicField name="*_i"   type="pint"    indexed="true" stored="true"/>
  <dynamicField name="*_s"   type="string"  indexed="true" stored="true"/>
  <dynamicField name="*_l"   type="plong"   indexed="true" stored="true"/>
  <dynamicField name="*_t"   type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_b"   type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_f"   type="pfloat"  indexed="true" stored="true"/>
  <dynamicField name="*_d"   type="pdouble" indexed="true" stored="true"/>
  <dynamicField name="*_dt"  type="pdate"   indexed="true" stored="true"/>

  <!-- Field types -->
  <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="pint" class="solr.IntPointField"/>
  <fieldType name="pfloat" class="solr.FloatPointField"/>
  <fieldType name="plong" class="solr.LongPointField"/>
  <fieldType name="pdouble" class="solr.DoublePointField"/>
  <fieldType name="pdate" class="solr.DatePointField"/>

  <!-- Text field with standard analysis -->
  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

  <!-- Unique key specification -->
  <uniqueKey>id</uniqueKey>
</schema>
```
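The schema above resolves a field name in two steps: an explicitly declared field wins, and otherwise the name is matched against the dynamic field patterns. The sketch below models that lookup for a subset of the fields above. It is a simplification rather than Solr's implementation: it handles only suffix patterns of the form `*_x` and assumes the longest matching suffix is preferred. `DynamicFieldSketch` is a hypothetical class.

```java
import java.util.Map;
import java.util.Optional;

// Simplified model of schema field resolution; not Solr's actual implementation.
class DynamicFieldSketch {

    // A few of the explicitly declared fields from the schema above.
    static final Map<String, String> EXPLICIT_FIELDS = Map.of(
            "id", "string", "title", "text_general", "price", "pfloat");

    // Dynamic field patterns from the schema above, keyed by suffix (the part after '*').
    static final Map<String, String> DYNAMIC_SUFFIXES = Map.of(
            "_i", "pint", "_s", "string", "_l", "plong", "_t", "text_general",
            "_b", "boolean", "_f", "pfloat", "_d", "pdouble", "_dt", "pdate");

    // Returns the field type name, or empty if neither an explicit nor a dynamic field matches.
    static Optional<String> resolveType(String fieldName) {
        String explicit = EXPLICIT_FIELDS.get(fieldName);
        if (explicit != null) {
            return Optional.of(explicit);
        }
        String best = null;
        for (String suffix : DYNAMIC_SUFFIXES.keySet()) {
            // Assumption of this sketch: prefer the longest matching suffix.
            if (fieldName.endsWith(suffix) && (best == null || suffix.length() > best.length())) {
                best = suffix;
            }
        }
        return Optional.ofNullable(best).map(DYNAMIC_SUFFIXES::get);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"title", "pages_i", "series_t", "genre_s", "unmapped"}) {
            System.out.println(name + " -> " + resolveType(name).orElse("(no match)"));
        }
    }
}
```

This is why the indexing example earlier can send `series_t`, `genre_s`, and `pages_i` without declaring them individually.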

## solrconfig.xml Configuration

The solrconfig.xml file configures request handlers, caching, indexing settings, and other Solr behaviors.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>10.0</luceneMatchVersion>

  <!-- Index configuration -->
  <indexConfig>
    <ramBufferSizeMB>100</ramBufferSizeMB>
    <lockType>native</lockType>
  </indexConfig>

  <!-- Query handler for searches -->
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="df">_text_</str>
      <int name="rows">10</int>
      <str name="wt">json</str>
    </lst>
  </requestHandler>

  <!-- Update handler for indexing -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler"/>

  <!-- Real-time get handler -->
  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
      <str name="wt">json</str>
    </lst>
  </requestHandler>

  <!-- Query caches -->
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
    <filterCache class="solr.CaffeineCache"
                 size="512" initialSize="512" autowarmCount="256"/>
    <queryResultCache class="solr.CaffeineCache"
                      size="512" initialSize="512" autowarmCount="256"/>
    <documentCache class="solr.CaffeineCache"
                   size="512" initialSize="512"/>
    <enableLazyFieldLoading>true</enableLazyFieldLoading>
  </query>

  <!-- Update processor chain with schema guessing -->
  <updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="true">
    <processor class="solr.UUIDUpdateProcessorFactory">
      <str name="fieldName">id</str>
    </processor>
    <processor class="solr.RemoveBlankFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseBooleanFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseLongFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseDoubleFieldUpdateProcessorFactory"/>
    <processor class="solr.ParseDateFieldUpdateProcessorFactory"/>
    <processor class="solr.AddSchemaFieldsUpdateProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <!-- Auto-commit settings -->
  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>15000</maxTime>
      <maxDocs>10000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <autoSoftCommit>
      <maxTime>1000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>
```
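The `autoCommit` block above triggers a hard commit when either limit is reached, whichever comes first: 15000 ms after uncommitted documents arrive, or 10000 uncommitted documents. Because `openSearcher` is `false`, search visibility is governed by the 1000 ms `autoSoftCommit` instead. The sketch below is a rough back-of-the-envelope model under the assumption of a steady indexing rate; `CommitTimingSketch` is a hypothetical class and does not reflect how Solr schedules commits internally.

```java
// Rough timing model under a steady indexing rate; not Solr's scheduling logic.
class CommitTimingSketch {

    // Milliseconds until the first automatic hard commit is triggered,
    // taking whichever of the two limits (maxTime, maxDocs) is reached first.
    // docsPerSecond must be greater than zero.
    static long msUntilHardCommit(long maxTimeMs, long maxDocs, long docsPerSecond) {
        long msToReachMaxDocs = maxDocs * 1000L / docsPerSecond;
        return Math.min(maxTimeMs, msToReachMaxDocs);
    }

    public static void main(String[] args) {
        // Values taken from the autoCommit block above.
        long maxTimeMs = 15000;
        long maxDocs = 10000;
        System.out.println("at 100 docs/s:  " + msUntilHardCommit(maxTimeMs, maxDocs, 100) + " ms");
        System.out.println("at 2000 docs/s: " + msUntilHardCommit(maxTimeMs, maxDocs, 2000) + " ms");
    }
}
```

At low rates the time limit dominates; at high rates the document limit fires first.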

Apache Solr is well suited to applications that need full-text search, faceted navigation, near-real-time indexing, and high availability. Common use cases include e-commerce product search, content management systems, log analytics, and enterprise search portals.

The platform supports several integration patterns: direct REST API calls for simple applications, SolrJ for Java applications that want a typed client API, and Docker or Kubernetes deployments for containerized environments. SolrCloud mode enables horizontal scaling through sharding, leader election, and distributed queries across the nodes of a cluster.