IntelliJ Platform Plugin SDK Help

4. Lexer and Parser Definition

The lexical analyzer defines how the contents of a file are broken into tokens, which is the basis for supporting custom language features. The easiest way to create a lexer is to use JFlex.

Define a Lexer

Define a Simple.flex file with rules for the Simple Language lexer in package org.intellij.sdk.language.

// Copyright 2000-2022 JetBrains s.r.o. and other contributors. Use of this source code is governed by the Apache 2.0 license that can be found in the LICENSE file.
package org.intellij.sdk.language;

import com.intellij.lexer.FlexLexer;
import com.intellij.psi.tree.IElementType;
import org.intellij.sdk.language.psi.SimpleTypes;
import com.intellij.psi.TokenType;

%%

%class SimpleLexer
%implements FlexLexer
%unicode
%function advance
%type IElementType
%eof{  return;
%eof}

CRLF=\R
WHITE_SPACE=[\ \n\t\f]
FIRST_VALUE_CHARACTER=[^ \n\f\\] | "\\"{CRLF} | "\\".
VALUE_CHARACTER=[^\n\f\\] | "\\"{CRLF} | "\\".
END_OF_LINE_COMMENT=("#"|"!")[^\r\n]*
SEPARATOR=[:=]
KEY_CHARACTER=[^:=\ \n\t\f\\] | "\\ "

%state WAITING_VALUE

%%

<YYINITIAL> {END_OF_LINE_COMMENT}                           { yybegin(YYINITIAL); return SimpleTypes.COMMENT; }

<YYINITIAL> {KEY_CHARACTER}+                                { yybegin(YYINITIAL); return SimpleTypes.KEY; }

<YYINITIAL> {SEPARATOR}                                     { yybegin(WAITING_VALUE); return SimpleTypes.SEPARATOR; }

<WAITING_VALUE> {CRLF}({CRLF}|{WHITE_SPACE})+               { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }

<WAITING_VALUE> {WHITE_SPACE}+                              { yybegin(WAITING_VALUE); return TokenType.WHITE_SPACE; }

<WAITING_VALUE> {FIRST_VALUE_CHARACTER}{VALUE_CHARACTER}*   { yybegin(YYINITIAL); return SimpleTypes.VALUE; }

({CRLF}|{WHITE_SPACE})+                                     { yybegin(YYINITIAL); return TokenType.WHITE_SPACE; }

[^]                                                         { return TokenType.BAD_CHARACTER; }
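The rules above rely on two lexer states: YYINITIAL, where keys, comments, and separators are recognized, and WAITING_VALUE, which is entered after a separator so that the rest of the line is read as a value. To make that idea concrete, here is a minimal hand-written sketch in plain Java. It is not the generated SimpleLexer: the class name TwoStateLexerSketch and the "TYPE:text" string output are assumptions made for illustration, and backslash escapes and line continuations are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration of two-state lexing; NOT the JFlex-generated SimpleLexer. */
class TwoStateLexerSketch {

    enum State { INITIAL, WAITING_VALUE }

    /** Splits the input into "TYPE:text" strings. Backslash escapes are not handled. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        State state = State.INITIAL;
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            char c = input.charAt(pos);
            String type;
            if (isWhitespace(c)) {
                boolean sawLineBreak = false;
                while (pos < input.length() && isWhitespace(input.charAt(pos))) {
                    sawLineBreak |= input.charAt(pos) == '\n';
                    pos++;
                }
                // A line break ends the entry, so a key or comment is expected next.
                if (sawLineBreak) state = State.INITIAL;
                type = "WHITE_SPACE";
            } else if (state == State.WAITING_VALUE) {
                // After a separator, everything up to the end of the line is the value.
                while (pos < input.length() && input.charAt(pos) != '\n') pos++;
                state = State.INITIAL;
                type = "VALUE";
            } else if (c == '#' || c == '!') {
                while (pos < input.length() && input.charAt(pos) != '\n') pos++;
                type = "COMMENT";
            } else if (c == ':' || c == '=') {
                pos++;
                state = State.WAITING_VALUE;
                type = "SEPARATOR";
            } else if (isKeyChar(c)) {
                while (pos < input.length() && isKeyChar(input.charAt(pos))) pos++;
                type = "KEY";
            } else {
                pos++; // e.g. a lone backslash, which this sketch does not interpret
                type = "BAD_CHARACTER";
            }
            tokens.add(type + ":" + input.substring(start, pos));
        }
        return tokens;
    }

    private static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\f' || c == '\n' || c == '\r';
    }

    private static boolean isKeyChar(char c) {
        return !isWhitespace(c) && c != ':' && c != '=' && c != '\\';
    }

    public static void main(String[] args) {
        for (String token : tokenize("# note\nkey = some value\n")) {
            System.out.println(token.replace("\n", "\\n"));
        }
    }
}
```

The state is what lets the same character mean different things: in this sketch, "#" at the start of a line begins a comment, while "#" right after a separator is simply part of the value.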

Generate a Lexer Class

Now generate a lexer class via Run JFlex Generator from the context menu on the Simple.flex file.

The Grammar-Kit plugin uses JFlex for lexer generation. When run for the first time, it prompts for a destination folder into which to download the JFlex library and skeleton. Choose the project root directory, for example code_samples/simple_language_plugin.

After that, the IDE generates the lexer under the gen directory, for example in simple_language_plugin/src/main/gen/org/intellij/sdk/language/SimpleLexer.

Define a Lexer Adapter

The JFlex lexer needs to be adapted to the IntelliJ Platform Lexer API. Implement SimpleLexerAdapter by subclassing FlexAdapter.

// Copyright 2000-2022 JetBrains s.r.o. and other contributors. Use of this source code is governed by the Apache 2.0 license that can be found in the LICENSE file.
package org.intellij.sdk.language;

import com.intellij.lexer.FlexAdapter;

public class SimpleLexerAdapter extends FlexAdapter {

  public SimpleLexerAdapter() {
    super(new SimpleLexer(null));
  }

}
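To illustrate what "adapting" means here, the following plain-Java sketch wraps a pull-style scanner, whose advance() call returns the next token type, in a cursor-style lexer whose current token can be inspected repeatedly. It only shows the general adapter pattern and does not reproduce the FlexAdapter implementation. PullScanner, CursorLexerAdapter, WordScanner, and AdapterDemo are hypothetical names invented for this example, and token types are plain strings rather than IElementType instances.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a generated scanner: advance() returns the next token type, or null at the end. */
interface PullScanner {
    void reset(CharSequence buffer);
    String advance();
    int tokenStart();
    int tokenEnd();
}

/** Hypothetical cursor-style lexer that delegates to a PullScanner and remembers the current token type. */
final class CursorLexerAdapter {
    private final PullScanner scanner;
    private CharSequence buffer = "";
    private String currentType;

    CursorLexerAdapter(PullScanner scanner) {
        this.scanner = scanner;
    }

    void start(CharSequence text) {
        buffer = text;
        scanner.reset(text);
        currentType = scanner.advance(); // position the cursor on the first token
    }

    String getTokenType() {
        return currentType;
    }

    String getTokenText() {
        return buffer.subSequence(scanner.tokenStart(), scanner.tokenEnd()).toString();
    }

    void advance() {
        currentType = scanner.advance();
    }
}

/** Toy scanner that alternates between runs of spaces and runs of other characters. */
final class WordScanner implements PullScanner {
    private CharSequence buffer = "";
    private int start;
    private int end;

    public void reset(CharSequence buffer) {
        this.buffer = buffer;
        start = 0;
        end = 0;
    }

    public String advance() {
        start = end;
        if (start >= buffer.length()) return null;
        boolean space = buffer.charAt(start) == ' ';
        while (end < buffer.length() && (buffer.charAt(end) == ' ') == space) end++;
        return space ? "SPACE" : "WORD";
    }

    public int tokenStart() { return start; }

    public int tokenEnd() { return end; }
}

class AdapterDemo {
    /** Walks the adapter the way a consumer of a cursor-style lexer would. */
    static List<String> collect(String text) {
        CursorLexerAdapter lexer = new CursorLexerAdapter(new WordScanner());
        lexer.start(text);
        List<String> out = new ArrayList<>();
        while (lexer.getTokenType() != null) {
            out.add(lexer.getTokenType() + ":" + lexer.getTokenText());
            lexer.advance();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("key = value"));
    }
}
```

In the tutorial itself none of this needs to be written by hand: subclassing FlexAdapter and passing it the generated SimpleLexer is sufficient.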

Define a Root File

The SimpleFile implementation is the top-level node of the tree of PsiElements for a Simple Language file.

// Copyright 2000-2022 JetBrains s.r.o. and other contributors. Use of this source code is governed by the Apache 2.0 license that can be found in the LICENSE file.
package org.intellij.sdk.language.psi;

import com.intellij.extapi.psi.PsiFileBase;
import com.intellij.openapi.fileTypes.FileType;
import com.intellij.psi.FileViewProvider;
import org.intellij.sdk.language.SimpleFileType;
import org.intellij.sdk.language.SimpleLanguage;
import org.jetbrains.annotations.NotNull;

public class SimpleFile extends PsiFileBase {

  public SimpleFile(@NotNull FileViewProvider viewProvider) {
    super(viewProvider, SimpleLanguage.INSTANCE);
  }

  @NotNull
  @Override
  public FileType getFileType() {
    return SimpleFileType.INSTANCE;
  }

  @Override
  public String toString() {
    return "Simple File";
  }

}

Define SimpleTokenSets

Define all sets of related token types from SimpleTypes in SimpleTokenSets.

// Copyright 2000-2022 JetBrains s.r.o. and contributors. Use of this source code is governed by the Apache 2.0 license.
package org.intellij.sdk.language.psi;

import com.intellij.psi.tree.TokenSet;

public interface SimpleTokenSets {

  TokenSet IDENTIFIERS = TokenSet.create(SimpleTypes.KEY);

  TokenSet COMMENTS = TokenSet.create(SimpleTypes.COMMENT);

}

Define a Parser

The Simple Language parser is defined in SimpleParserDefinition, which implements the ParserDefinition interface. To avoid unnecessary classloading when initializing the extension point implementation, all TokenSet return values should use constants from a dedicated $Language$TokenSets class (here, SimpleTokenSets).

// Copyright 2000-2022 JetBrains s.r.o. and other contributors. Use of this source code is governed by the Apache 2.0 license that can be found in the LICENSE file.
package org.intellij.sdk.language;

import com.intellij.lang.ASTNode;
import com.intellij.lang.ParserDefinition;
import com.intellij.lang.PsiParser;
import com.intellij.lexer.Lexer;
import com.intellij.openapi.project.Project;
import com.intellij.psi.FileViewProvider;
import com.intellij.psi.PsiElement;
import com.intellij.psi.PsiFile;
import com.intellij.psi.tree.IFileElementType;
import com.intellij.psi.tree.TokenSet;
import org.intellij.sdk.language.parser.SimpleParser;
import org.intellij.sdk.language.psi.SimpleFile;
import org.intellij.sdk.language.psi.SimpleTokenSets;
import org.intellij.sdk.language.psi.SimpleTypes;
import org.jetbrains.annotations.NotNull;

public class SimpleParserDefinition implements ParserDefinition {

  public static final IFileElementType FILE = new IFileElementType(SimpleLanguage.INSTANCE);

  @NotNull
  @Override
  public Lexer createLexer(Project project) {
    return new SimpleLexerAdapter();
  }

  @NotNull
  @Override
  public TokenSet getCommentTokens() {
    return SimpleTokenSets.COMMENTS;
  }

  @NotNull
  @Override
  public TokenSet getStringLiteralElements() {
    return TokenSet.EMPTY;
  }

  @NotNull
  @Override
  public PsiParser createParser(final Project project) {
    return new SimpleParser();
  }

  @NotNull
  @Override
  public IFileElementType getFileNodeType() {
    return FILE;
  }

  @NotNull
  @Override
  public PsiFile createFile(@NotNull FileViewProvider viewProvider) {
    return new SimpleFile(viewProvider);
  }

  @NotNull
  @Override
  public PsiElement createElement(ASTNode node) {
    return SimpleTypes.Factory.createElement(node);
  }

}
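The classloading advice for token sets can be illustrated with plain Java class initialization rules: a class is initialized on first use of one of its non-constant static fields, not when another class that merely mentions it is instantiated. The sketch below is one reading of that recommendation and uses no IntelliJ Platform classes. InitLog, DemoTypes, DemoTokenSets, and DemoParserDefinition are hypothetical stand-ins, and a List<Object> takes the place of TokenSet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Records the order in which the demo classes are initialized. */
class InitLog {
    static final List<String> EVENTS = new ArrayList<>();
}

/** Hypothetical stand-in for a class that holds the token types. */
class DemoTypes {
    static final Object COMMENT = new Object();

    static {
        InitLog.EVENTS.add("DemoTypes initialized");
    }
}

/** Hypothetical dedicated holder, playing the role of a $Language$TokenSets class. */
class DemoTokenSets {
    static final List<Object> COMMENTS = Collections.singletonList(DemoTypes.COMMENT);

    static {
        InitLog.EVENTS.add("DemoTokenSets initialized");
    }
}

/** Hypothetical definition class: it has no static token fields of its own. */
class DemoParserDefinition {
    List<Object> getCommentTokens() {
        return DemoTokenSets.COMMENTS; // the first call initializes DemoTokenSets, then DemoTypes
    }

    public static void main(String[] args) {
        DemoParserDefinition definition = new DemoParserDefinition();
        System.out.println(InitLog.EVENTS); // still empty: construction touched neither holder
        definition.getCommentTokens();
        System.out.println(InitLog.EVENTS); // both classes are initialized only now
    }
}
```

Creating the definition object leaves both token classes uninitialized; they are initialized only when a token set is first requested.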

Register the Parser Definition

Registering the parser definition in the plugin.xml file makes it available to the IntelliJ Platform. Use the com.intellij.lang.parserDefinition extension point for registration. For example, see simple_language_plugin/src/main/resources/META-INF/plugin.xml.

<extensions defaultExtensionNs="com.intellij">
  <lang.parserDefinition
      language="Simple"
      implementationClass="org.intellij.sdk.language.SimpleParserDefinition"/>
</extensions>

Run the Project

Run the plugin by using the Gradle runIde task.

Create a test.simple file with the following content:

# You are reading the ".properties" entry.
! The exclamation mark can also mark text as comments.
website = https://en.wikipedia.org/
language = English
# The backslash below tells the application to continue reading
# the value onto the next line.
message = Welcome to \
          Wikipedia!
# Add spaces to the key
key\ with\ spaces = This is the value that could be looked up with the key "key with spaces".
# Unicode
tab : \u0009

Now open the PsiViewer tool window and check how the lexer breaks the content of the file into tokens, and the parser transforms the tokens into PSI elements.

PSI Elements
Last modified: 27 September 2022