Core Concepts - sourcegraph/zoekt

Zoekt is an open-source code search engine maintained by Sourcegraph. It is built for fast substring and regular expression search over large collections of source code. This document explains the fundamental architecture of Zoekt: indexing, search, and data storage and retrieval.

Indexing

Zoekt builds a trigram-based index over the contents of each repository and can use ctags, through the go-ctags library, to extract symbol information. The indexer writes one or more .zoekt index files (shards) per repository, typically to an SSD so that searches stay fast. Indexing and serving are separate processes: the indexer writes the .zoekt index files, and zoekt-webserver answers searches by reading them.

Here’s an example of how to index a repository:

zoekt-git-index -index /path/to/index /path/to/repo
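
Zoekt's index is trigram-based, and the idea behind that can be illustrated with a small Go sketch. The TrigramIndex type and its BuildIndex and Candidates functions below are hypothetical names chosen for this example; they are not Zoekt's API, and the in-memory map is not its on-disk format.

```go
package main

import (
	"fmt"
	"sort"
)

// TrigramIndex maps each 3-byte sequence to the sorted names of the files
// that contain it. Hypothetical type for illustration only.
type TrigramIndex map[string][]string

// BuildIndex records every distinct trigram of every file's content.
func BuildIndex(files map[string]string) TrigramIndex {
	idx := TrigramIndex{}
	for name, content := range files {
		seen := map[string]bool{}
		for i := 0; i+3 <= len(content); i++ {
			t := content[i : i+3]
			if !seen[t] {
				seen[t] = true
				idx[t] = append(idx[t], name)
			}
		}
	}
	for t := range idx {
		sort.Strings(idx[t])
	}
	return idx
}

// Candidates returns the files that contain every trigram of query.
// They are only candidates: the caller still has to verify a real match.
func (idx TrigramIndex) Candidates(query string) []string {
	if len(query) < 3 {
		return nil // too short for the index; a real engine would fall back to scanning
	}
	counts := map[string]int{}
	seen := map[string]bool{}
	for i := 0; i+3 <= len(query); i++ {
		t := query[i : i+3]
		if seen[t] {
			continue
		}
		seen[t] = true
		for _, name := range idx[t] {
			counts[name]++
		}
	}
	var out []string
	for name, c := range counts {
		if c == len(seen) { // file has all distinct query trigrams
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	idx := BuildIndex(map[string]string{
		"a.go": "func functionName() {}",
		"b.go": "func other() {}",
	})
	fmt.Println(idx.Candidates("functionName")) // prints [a.go]
}
```

The index narrows the search to files that could match, so the expensive content scan only runs on a small candidate set.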

Search

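Zoekt first uses its trigram index to narrow a query down to candidate files, then confirms the matches against the file contents. Regular expression matching relies on the Grafana regexp library. Beyond plain substring and regular expression queries, the query language supports filters such as file name (file:), repository (repo:), language (lang:), and case sensitivity (case:). Zoekt indexes file contents rather than commit history, so it does not filter by author or date range.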

Here’s an example of how to search a repository:

zoekt -index_dir /path/to/index 'functionName'
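
As a rough illustration of regular expression search combined with a file filter, the Go sketch below scans an in-memory set of files. The Match type and Search function are hypothetical, and the standard library regexp package stands in for the Grafana regexp library so that the example stays self-contained.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// Match is one matching line. Hypothetical type for this sketch.
type Match struct {
	File string
	Line int // 1-based line number
	Text string
}

// Search returns the lines matching contentPattern, restricted to files
// whose name matches filePattern.
func Search(files map[string]string, filePattern, contentPattern string) ([]Match, error) {
	fileRe, err := regexp.Compile(filePattern)
	if err != nil {
		return nil, err
	}
	contentRe, err := regexp.Compile(contentPattern)
	if err != nil {
		return nil, err
	}

	// Apply the file filter first, and sort for deterministic output.
	var names []string
	for name := range files {
		if fileRe.MatchString(name) {
			names = append(names, name)
		}
	}
	sort.Strings(names)

	var out []Match
	for _, name := range names {
		for i, line := range strings.Split(files[name], "\n") {
			if contentRe.MatchString(line) {
				out = append(out, Match{File: name, Line: i + 1, Text: line})
			}
		}
	}
	return out, nil
}

func main() {
	files := map[string]string{
		"main.go":   "package main\nfunc functionName() {}",
		"README.md": "call functionName to start",
	}
	matches, err := Search(files, `\.go$`, `func\s+\w+\(`)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		fmt.Printf("%s:%d: %s\n", m.File, m.Line, m.Text) // prints main.go:2: func functionName() {}
	}
}
```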

Data Storage and Retrieval

Index data lives in plain .zoekt files in an index directory, usually on an SSD. Because the index is just a set of flat files, it is easy to copy, back up, and delete. Zoekt also supports sharding: the index for a large repository is split across several .zoekt files, and shards can be spread across multiple nodes for greater scalability.
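
The sharding idea can be sketched in Go as follows. This is a simplified illustration: the Doc type, the SplitIntoShards function, and the byte threshold are all hypothetical, and Zoekt's actual rules for deciding shard boundaries are not described here.

```go
package main

import "fmt"

// Doc is a file to be indexed. Hypothetical type for this sketch.
type Doc struct {
	Name    string
	Content string
}

// SplitIntoShards groups docs in order and starts a new shard whenever
// adding the next doc would push the current shard past maxBytes of content.
// A single doc larger than maxBytes ends up in a shard of its own.
func SplitIntoShards(docs []Doc, maxBytes int) [][]Doc {
	var shards [][]Doc
	var cur []Doc
	size := 0
	for _, d := range docs {
		if len(cur) > 0 && size+len(d.Content) > maxBytes {
			shards = append(shards, cur)
			cur, size = nil, 0
		}
		cur = append(cur, d)
		size += len(d.Content)
	}
	if len(cur) > 0 {
		shards = append(shards, cur)
	}
	return shards
}

func main() {
	docs := []Doc{
		{Name: "a.go", Content: "0123456789"}, // 10 bytes
		{Name: "b.go", Content: "0123456789"}, // 10 bytes
		{Name: "c.go", Content: "01234"},      // 5 bytes
	}
	// With a 15-byte limit this yields two shards: [a.go] and [b.go c.go].
	for i, shard := range SplitIntoShards(docs, 15) {
		fmt.Printf("shard %d: %d file(s)\n", i, len(shard))
	}
}
```

Each shard can be searched independently, so a query can run over all shards in parallel and merge the results.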

Zoekt does not include a distributed storage system or a consensus protocol of its own. Each zoekt-webserver process serves only the shards in its local index directory, so keeping index files assigned, consistent, and available across a cluster is the job of the surrounding deployment.

Here’s an example of how to start a Zoekt web server that serves the shards in an index directory:

zoekt-webserver -index /path/to/index -listen :6070

Dependencies

Zoekt depends on several open-source projects, including:

  • Go programming language
  • gRPC
  • Protocol Buffers
  • Go standard library
  • go-ctags
  • go-cmp
  • Slothfs
  • Grafana regexp
  • Jaeger
