File-Based Indexes

Last modified: 19 March 2025

File-based indexes are based on a Map/Reduce architecture. Each index has a specific type of key and a particular type of value.

The key is what's later used to retrieve data from the index.

Example: in the word index, the key is the word itself.

The value is arbitrary data, which is associated with the key in the index.

Example: in the word index, the value is a mask indicating in which context the word occurs (code, string literal, or comment).

In the simplest case, when one needs to know in what files some data is present, the value has type Void and is not stored in the index.

When the index implementation indexes a file, it receives a file's content and returns a map from the keys found in the file to the associated values.

When accessing an index, specify the key you're interested in and get back the list of files in which the key occurs, and the value associated with each file.
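The flow described above can be modeled in plain Java. This is a conceptual sketch only: `WordIndexSketch`, its context constants, and the use of file paths as strings are assumptions made for illustration and are not IntelliJ Platform classes. It shows a map step that turns one file's content into key/value pairs, and a lookup that returns the files and per-file values for a key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only: none of these types are IntelliJ Platform classes.
final class WordIndexSketch {
    // Hypothetical context bits, loosely mirroring the "mask" value of the word index.
    static final int IN_CODE = 1;
    static final int IN_COMMENT = 4;

    // key -> (file path -> value)
    private final Map<String, Map<String, Integer>> storage = new TreeMap<>();

    // "Map" step: depends only on the content passed in and returns key -> value for one file.
    static Map<String, Integer> map(String content) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            int context = line.trim().startsWith("//") ? IN_COMMENT : IN_CODE;
            for (String word : line.split("\\W+")) {
                if (!word.isEmpty()) {
                    // Combine contexts when the same word occurs more than once.
                    result.merge(word, context, (a, b) -> a | b);
                }
            }
        }
        return result;
    }

    void indexFile(String path, String content) {
        // Drop stale entries of this file, then store the fresh key/value pairs.
        storage.values().removeIf(files -> {
            files.remove(path);
            return files.isEmpty();
        });
        map(content).forEach((key, value) ->
            storage.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(path, value));
    }

    // Lookup: for a key, the files it occurs in and the value associated with each file.
    Map<String, Integer> lookup(String key) {
        return storage.getOrDefault(key, Map.of());
    }

    public static void main(String[] args) {
        WordIndexSketch index = new WordIndexSketch();
        index.indexFile("A.java", "int count = 0;\n// count words");
        index.indexFile("B.java", "// nothing about count");
        System.out.println(index.lookup("count"));
    }
}
```

Re-indexing a file replaces only that file's entries, which is why the map step must not read anything other than the content it is given.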

tip
In some cases, using Gists can be considered as an alternative.

Implementing a File-Based Index

tip
A relatively simple file-based index implementation is the UI Designer bound forms index, which stores the fully qualified name (FQN) of the bound implementation class for GUI Designer .form files.

Each specific index implementation is a class that extends FileBasedIndexExtension and is registered in the com.intellij.fileBasedIndex extension point.

An implementation of a file-based index consists of the following main parts:

getIndexer() returns the DataIndexer implementation actually responsible for building a set of key/value pairs based on file content.
getKeyDescriptor() returns the KeyDescriptor responsible for comparing keys and storing them in a serialized binary format. Probably the most commonly used implementation is EnumeratorStringDescriptor, which is designed for storing identifiers efficiently.
getValueExternalizer() returns the DataExternalizer responsible for storing values in a serialized binary format.
getInputFilter() allows restricting the indexing only to a certain set of files. Consider using DefaultFileTypeSpecificInputFilter.
getName() returns a unique index ID. Consider using the fully qualified name of the index class, e.g., com.example.myplugin.indexing.MyIndex, to avoid clashing with other plugins that define an index with the same ID.
getVersion() returns the version of the index implementation. The index is automatically rebuilt if the current version differs from the version of the index implementation used to build it.

If there's no value to associate with the files (i.e., the value type is Void), simplify the implementation by extending ScalarIndexExtension. If there is a single value per file, extend SingleEntryFileBasedIndexExtension.

Please see also Improving indexing performance.

warning
Critical Implementation Notes
The value class must implement equals() and hashCode() properly, so that a value deserialized from binary data is equal to the original one.
The data returned by DataIndexer.map() must depend only on input data passed to the method, and must not depend on any external files. Otherwise, your index will not be correctly updated when the external data changes, and you will have stale data in your index.
During development, set the system properties intellij.idea.indices.debug / intellij.idea.indices.debug.extra.sanity to true to enable additional debugging assertions that check the index implementation for correctness.
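The equals()/hashCode() requirement can be checked with a serialization round trip. The sketch below is an assumption-laden illustration: `ValueExternalizer` is a local stand-in modeled on the save/read shape of DataExternalizer so that the code compiles without the IntelliJ Platform SDK, and `FormBinding`, `FormBindingExternalizer`, and `RoundTripCheck` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Local stand-in (assumption): same save/read shape as the platform's DataExternalizer.
interface ValueExternalizer<T> {
    void save(DataOutput out, T value) throws IOException;
    T read(DataInput in) throws IOException;
}

// Hypothetical value type with value-based equals() and hashCode().
final class FormBinding {
    final String className;
    final int fieldCount;

    FormBinding(String className, int fieldCount) {
        this.className = className;
        this.fieldCount = fieldCount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FormBinding)) return false;
        FormBinding other = (FormBinding) o;
        return fieldCount == other.fieldCount && className.equals(other.className);
    }

    @Override
    public int hashCode() {
        return Objects.hash(className, fieldCount);
    }
}

final class FormBindingExternalizer implements ValueExternalizer<FormBinding> {
    @Override
    public void save(DataOutput out, FormBinding value) throws IOException {
        out.writeUTF(value.className);
        out.writeInt(value.fieldCount);
    }

    @Override
    public FormBinding read(DataInput in) throws IOException {
        // Fields are read in exactly the order they were written.
        String className = in.readUTF();
        int fieldCount = in.readInt();
        return new FormBinding(className, fieldCount);
    }
}

final class RoundTripCheck {
    // Serializes a value to bytes and reads it back through the same externalizer.
    static <T> T roundTrip(ValueExternalizer<T> externalizer, T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            externalizer.save(new DataOutputStream(bytes), value);
            return externalizer.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FormBinding original = new FormBinding("com.example.MainPanel", 3);
        FormBinding restored = roundTrip(new FormBindingExternalizer(), original);
        System.out.println(original.equals(restored));
    }
}
```

A value class that keeps the default identity-based equals() would fail this check, because the restored object is always a different instance.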

Accessing a File-Based Index

Access to file-based indexes is performed through the FileBasedIndex class.

note
Please note index access is restricted during dumb mode.

The following primary operations are supported:

getAllKeys() and processAllKeys() allow obtaining the list of all keys found in files, which are a part of the specified project. To optimize performance, consider returning true from FileBasedIndexExtension.traceKeyHashToVirtualFileMapping() (see its Javadoc for details).

note
The returned data is guaranteed to contain all keys found in up-to-date project content, but may also include additional keys not currently found in the project.

getValues() returns all values associated with a specific key, but not the files in which they were found.
getContainingFiles() allows collecting all files in which a particular key was encountered.
processValues() allows iterating through all files in which a specific key was encountered and accessing the associated values simultaneously.
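To make the difference between these operations concrete, here is a small in-memory model. It is a sketch under stated assumptions, not the FileBasedIndex API: `IndexQuerySketch` is a hypothetical class, files are plain strings, and the method names are borrowed from the list above only to show what each operation conceptually returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Conceptual model only; the real operations live on the platform's FileBasedIndex class.
final class IndexQuerySketch<K, V> {
    // key -> (file -> value)
    private final Map<K, Map<String, V>> data = new LinkedHashMap<>();

    void put(K key, String file, V value) {
        data.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(file, value);
    }

    // All keys known to the index.
    Set<K> getAllKeys() {
        return new LinkedHashSet<>(data.keySet());
    }

    // Values for a key, without the files they came from.
    List<V> getValues(K key) {
        return new ArrayList<>(data.getOrDefault(key, Map.of()).values());
    }

    // Files in which the key was encountered, without the values.
    Set<String> getContainingFiles(K key) {
        return new LinkedHashSet<>(data.getOrDefault(key, Map.of()).keySet());
    }

    // Visits file and value together; the processor returns false to stop early.
    boolean processValues(K key, BiPredicate<String, V> processor) {
        for (Map.Entry<String, V> entry : data.getOrDefault(key, Map.of()).entrySet()) {
            if (!processor.test(entry.getKey(), entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        IndexQuerySketch<String, Integer> index = new IndexQuerySketch<>();
        index.put("count", "A.java", 5);
        index.put("count", "B.java", 4);
        System.out.println(index.getValues("count"));
        System.out.println(index.getContainingFiles("count"));
        index.processValues("count", (file, value) -> {
            System.out.println(file + " -> " + value);
            return true;
        });
    }
}
```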

Nested Index Access

When accessing index data in nested calls (usually from multiple indexes), limitations might apply.

2023.1 and later

Nested index access is now possible.

NOTE: Please do not use yet. This is known to cause problems under certain conditions, please watch this issue.

2022.3 and earlier

warning
Nested index access is forbidden as it might lead to a deadlock. Collect all necessary data from index A first, then process results while accessing index B.
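The advice to collect data from index A first and only then access index B can be sketched as follows. This is a plain-Java model: `TinyIndex` and `CollectThenProcess` are hypothetical stand-ins, and an in-memory map cannot actually deadlock, so the example only shows the shape of the two-step approach.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Conceptual model only; TinyIndex is a hypothetical stand-in, not a platform class.
final class TinyIndex {
    // key -> (file -> value)
    private final Map<String, Map<String, String>> data = new LinkedHashMap<>();

    TinyIndex put(String key, String file, String value) {
        data.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(file, value);
        return this;
    }

    void processValues(String key, BiConsumer<String, String> processor) {
        data.getOrDefault(key, Map.of()).forEach(processor);
    }
}

final class CollectThenProcess {
    // Finds files in index B for every value that index A stores under keyInA.
    static List<String> filesForValues(TinyIndex indexA, TinyIndex indexB, String keyInA) {
        // Step 1: only collect inside the callback of index A; index B is not touched here.
        List<String> valuesFromA = new ArrayList<>();
        indexA.processValues(keyInA, (file, value) -> valuesFromA.add(value));

        // Step 2: index A is no longer being read, so index B is queried on its own.
        List<String> result = new ArrayList<>();
        for (String value : valuesFromA) {
            indexB.processValues(value, (file, ignored) -> result.add(file));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyIndex forms = new TinyIndex().put("Form", "Main.form", "com.example.MainPanel");
        TinyIndex classes = new TinyIndex().put("com.example.MainPanel", "MainPanel.java", "");
        System.out.println(filesForValues(forms, classes, "Form"));
    }
}
```

The nested variant to avoid would call `indexB.processValues(...)` from inside the lambda passed to `indexA.processValues(...)`.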

Standard Indexes

The IntelliJ Platform contains several standard file-based indexes. The most useful indexes for plugin developers are:

Word Index

Generally, the word index should be accessed indirectly by using helper methods of the PsiSearchHelper class.

File Name Index

FilenameIndex provides a quick way to find all files matching a specific file name.

File Type Index

FileTypeIndex serves a similar goal: it quickly finds all files of a particular FileType.

Additional Index Roots

To add additional files or directories to be indexed, implement IndexableSetContributor and register it in the com.intellij.indexedRootsProvider extension point.
