syntax_styler.ts

Declarations
#

13 declarations

view source

AddSyntaxGrammar
#

syntax_styler.ts view source

AddSyntaxGrammar

HookAfterTokenizeCallback
#

syntax_styler.ts view source

HookAfterTokenizeCallback

HookAfterTokenizeCallbackContext
#

syntax_styler.ts view source

HookAfterTokenizeCallbackContext

`code`

type string

`grammar`

type SyntaxGrammar

`lang`

type string

`tokens`

type SyntaxTokenStream

HookBeforeTokenizeCallback
#

syntax_styler.ts view source

HookBeforeTokenizeCallback

HookBeforeTokenizeCallbackContext
#

syntax_styler.ts view source

HookBeforeTokenizeCallbackContext

`code`

type string

`grammar`

type SyntaxGrammar

`lang`

type string

`tokens`

type undefined

HookWrapCallback
#

syntax_styler.ts view source

HookWrapCallback

HookWrapCallbackContext
#

syntax_styler.ts view source

HookWrapCallbackContext

`type`

type string

`content`

type string

`tag`

type string

`classes`

type Array<string>

`attributes`

type Record<string, string>

`lang`

type string
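
As a rough illustration of how a wrap hook might use this context, the sketch below registers a callback that mutates `classes` and `attributes`. The interface is a local copy of the fields listed above; the callback signature `(ctx) => void`, the `render_wrapped` helper, and the `token_keyword` class name are assumptions made for the example, not the module's confirmed behavior.

```typescript
// Local copy of the documented context shape, so the sketch is self-contained.
interface HookWrapCallbackContext {
	type: string;
	content: string;
	tag: string;
	classes: Array<string>;
	attributes: Record<string, string>;
	lang: string;
}

// ASSUMPTION: a wrap callback receives the context and mutates it in place.
type HookWrapCallback = (ctx: HookWrapCallbackContext) => void;

// Example hook: mark keyword tokens with an extra class and a title attribute.
const mark_keywords: HookWrapCallback = (ctx) => {
	if (ctx.type === 'keyword') {
		ctx.classes.push('emphasized');
		ctx.attributes['title'] = `${ctx.lang} keyword`;
	}
};

// Hypothetical renderer: ASSUMES the final HTML is built from the mutated context.
function render_wrapped(ctx: HookWrapCallbackContext, hooks: Array<HookWrapCallback>): string {
	for (const hook of hooks) hook(ctx);
	const attrs = Object.entries(ctx.attributes)
		.map(([name, value]) => ` ${name}="${value.replace(/"/g, '&quot;')}"`)
		.join('');
	return `<${ctx.tag} class="${ctx.classes.join(' ')}"${attrs}>${ctx.content}</${ctx.tag}>`;
}

const wrapped_html = render_wrapped(
	{
		type: 'keyword',
		content: 'const',
		tag: 'span',
		classes: ['token_keyword'],
		attributes: {},
		lang: 'ts',
	},
	[mark_keywords],
);
console.log(wrapped_html);
```

Mutating the context rather than returning a value lets several hooks compose without knowing about each other.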

SyntaxGrammar
#

syntax_styler.ts view source

SyntaxGrammar

A grammar after normalization. All values are arrays of normalized tokens with consistent shapes.

SyntaxGrammarRaw
#

syntax_styler.ts view source

SyntaxGrammarRaw

SyntaxGrammarToken
#

syntax_styler.ts view source

SyntaxGrammarToken

Grammar token with all properties required. This is the normalized representation used at runtime.

`pattern`

type RegExp

`lookbehind`

type boolean

`greedy`

type boolean

`alias`

type Array<string>

`inside`

type SyntaxGrammar | null

SyntaxGrammarTokenRaw
#

syntax_styler.ts view source

SyntaxGrammarTokenRaw

The expansion of a simple RegExp literal to support additional properties.

The inside grammar will be used to tokenize the text value of each token of this kind.

This can be used to make nested and even recursive language definitions.

Note: This can cause infinite recursion. Be careful when you embed different languages or even the same language into one another.

Note: Grammar authors can use optional properties, but they will be normalized to required properties at registration time for optimal performance.

`pattern`

The regular expression of the token.

type RegExp

`lookbehind`

If true, then the first capturing group of pattern will (effectively) behave as a lookbehind group, meaning that the captured text will not be part of the matched text of the new token.

type boolean
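
The lookbehind behavior described above can be sketched as follows. `match_with_lookbehind` is a hypothetical helper written for this example, not a function of syntax_styler.ts; it only shows how the first capturing group can be required for a match yet excluded from the token text.

```typescript
// Hypothetical helper, not part of syntax_styler.ts: shows how a "lookbehind"
// capture group can be trimmed from the start of a match.
interface TrimmedMatch {
	index: number; // where the token text starts in the input
	text: string; // the token text, without the lookbehind prefix
}

function match_with_lookbehind(
	pattern: RegExp,
	input: string,
	lookbehind: boolean,
): TrimmedMatch | null {
	const match = pattern.exec(input);
	if (!match) return null;
	// Group 1 is context only: it must match, but it is not part of the token.
	const group = match[1];
	const skipped = lookbehind && group ? group.length : 0;
	return {index: match.index + skipped, text: match[0].slice(skipped)};
}

// The group (^|\s) requires whitespace or start-of-input before the keyword.
const keyword_pattern = /(^|\s)return\b/;
const with_lookbehind = match_with_lookbehind(keyword_pattern, 'x; return y', true);
const without_lookbehind = match_with_lookbehind(keyword_pattern, 'x; return y', false);
console.log(with_lookbehind, without_lookbehind);
```

With `lookbehind` set, the token covers only `return`; without it, the leading space becomes part of the token.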

`greedy`

Whether the token is greedy.

type boolean

`alias`

An optional alias or list of aliases.

type string | Array<string>

`inside`

The nested grammar of this token.

type SyntaxGrammarRaw | null

SyntaxGrammarValueRaw
#

syntax_styler.ts view source

SyntaxGrammarValueRaw

SyntaxStyler
#

syntax_styler.ts view source

Based on Prism (https://github.com/PrismJS/prism) by Lea Verou (https://lea.verou.me/)

MIT license

`langs`

type Record<string, SyntaxGrammar | undefined>

`add_lang`

type (id: string, grammar: SyntaxGrammarRaw, aliases?: string[] | undefined): void

`id`

type string

`grammar`

type SyntaxGrammarRaw

`aliases?`

type string[] | undefined

optional

returns void

`add_extended_lang`

type (base_id: string, extension_id: string, extension: SyntaxGrammarRaw, aliases?: string[] | undefined): SyntaxGrammar

`base_id`

type string

`extension_id`

type string

`extension`

type SyntaxGrammarRaw

`aliases?`

type string[] | undefined

optional

returns SyntaxGrammar

`get_lang`

type (id: string): SyntaxGrammar

`id`

type string

returns SyntaxGrammar

`stylize`

Generates HTML with syntax highlighting from source code.

Process:

1. Runs before_tokenize hook
2. Tokenizes code using the provided or looked-up grammar
3. Runs after_tokenize hook
4. Runs wrap hook on each token
5. Converts tokens to HTML with CSS classes

Parameter Relationship:

- lang is ALWAYS required for hook context and identification
- grammar is optional; when undefined, automatically looks up via this.get_lang(lang)
- When both are provided, grammar is used for tokenization, lang for metadata

Use cases:

- Standard usage: stylize(code, 'ts') - uses registered TypeScript grammar
- Custom grammar: stylize(code, 'ts', customGrammar) - uses custom grammar but keeps 'ts' label
- Extended grammar: stylize(code, 'custom', this.extend_grammar('ts', extension)) - new language variant

type (text: string, lang: string, grammar?: SyntaxGrammar | undefined): string

`text`

- The source code to syntax highlight.

type string

`lang`

- Language identifier (e.g., 'ts', 'css', 'html'). Used for:
  - Grammar lookup when grammar is undefined
  - Hook context (lang field passed to hooks)
  - Language identification in output

type string

`grammar`

- Optional custom grammar object. When undefined, automatically looks up the grammar via this.get_lang(lang). Provide this to use a custom or modified grammar instead of the registered one.

type SyntaxGrammar | undefined

default this.get_lang(lang)

returns string

HTML string with syntax highlighting using CSS classes (.token_*)
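
To make the order of the documented steps concrete, here is a deliberately tiny stand-in pipeline. It is a sketch under assumptions: `mini_stylize`, `mini_tokenize`, `MiniContext`, and the `token_` class prefix are hypothetical names for this example, the tokenizer is far simpler than a real one, and the wrap step is omitted.

```typescript
// Hypothetical miniature of the documented pipeline. Every name below is a
// stand-in for illustration, not the API of syntax_styler.ts.
type MiniToken = string | {type: string; content: string};

interface MiniContext {
	code: string;
	lang: string;
	tokens: Array<MiniToken> | undefined;
}

type MiniHook = (ctx: MiniContext) => void;

const escape_html = (text: string): string =>
	text.replace(/&/g, '&amp;').replace(/</g, '&lt;');

// Splits plain-text segments on each pattern, in grammar key order.
// Patterns must not use the global flag, because exec() runs on fresh slices.
function mini_tokenize(code: string, grammar: Record<string, RegExp>): Array<MiniToken> {
	let tokens: Array<MiniToken> = [code];
	for (const [type, pattern] of Object.entries(grammar)) {
		const next: Array<MiniToken> = [];
		for (const token of tokens) {
			if (typeof token !== 'string') {
				next.push(token); // already classified by an earlier pattern
				continue;
			}
			let rest = token;
			let match = pattern.exec(rest);
			while (match && match[0].length > 0) {
				if (match.index > 0) next.push(rest.slice(0, match.index));
				next.push({type, content: match[0]});
				rest = rest.slice(match.index + match[0].length);
				match = pattern.exec(rest);
			}
			if (rest) next.push(rest);
		}
		tokens = next;
	}
	return tokens;
}

function mini_stylize(
	code: string,
	lang: string,
	grammar: Record<string, RegExp>,
	before: Array<MiniHook> = [],
	after: Array<MiniHook> = [],
): string {
	const ctx: MiniContext = {code, lang, tokens: undefined};
	for (const hook of before) hook(ctx); // step 1: tokens is still undefined
	ctx.tokens = mini_tokenize(ctx.code, grammar); // step 2
	for (const hook of after) hook(ctx); // step 3: hooks can inspect tokens
	const tokens = ctx.tokens ?? [];
	// Steps 4 and 5, with the wrap hook left out: convert tokens to HTML.
	return tokens
		.map((token) =>
			typeof token === 'string'
				? escape_html(token)
				: `<span class="token_${token.type}">${escape_html(token.content)}</span>`,
		)
		.join('');
}

const seen_steps: Array<string> = [];
const mini_html = mini_stylize(
	'const a = 1 < 2',
	'ts',
	{keyword: /\b(?:const|let)\b/, number: /\b\d+\b/},
	[(ctx) => { seen_steps.push(`before:${ctx.tokens === undefined}`); }],
	[(ctx) => { seen_steps.push(`after:${ctx.tokens?.length}`); }],
);
console.log(seen_steps, mini_html);
```

The before hook sees `tokens` as undefined and the after hook sees the token list, which mirrors the two context types documented earlier.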

`grammar_insert_before`

Inserts tokens before another token in a language definition or any other grammar.

Usage

This helper method makes it easy to modify existing languages. For example, the CSS language definition not only defines CSS styling for CSS documents, but also needs to define styling for CSS embedded in HTML through <style> elements. To do this, it needs to modify syntax_styler.get_lang('markup') and add the appropriate tokens. However, syntax_styler.get_lang('markup') is a regular JS object literal, so if you do this:

syntax_styler.get_lang('markup').style = {
    // token
};

then the style token will be added (and processed) at the end. grammar_insert_before allows you to insert tokens before existing tokens. For the CSS example above, you would use it like this:

syntax_styler.grammar_insert_before('markup', 'cdata', {
    'style': {
        // token
    }
});

Special cases

If the grammars of inside and insert have tokens with the same name, the tokens in inside's grammar will be ignored.

This behavior can be used to insert tokens after the `before` token:

syntax_styler.grammar_insert_before('markup', 'comment', {
    'comment': syntax_styler.get_lang('markup').comment,
    // tokens after 'comment'
});

Limitations

The main problem grammar_insert_before has to solve is iteration order. Since ES2015, the iteration order for object properties is guaranteed to be the insertion order (except for integer keys), but some browsers behave differently when keys are deleted and re-inserted. Inserting at an arbitrary position in place would require temporarily deleting properties, so grammar_insert_before can't be implemented that way.

To solve this problem, grammar_insert_before doesn't actually insert the given tokens into the target object. Instead, it creates a new object and replaces all references to the target object with the new one. This can be done without temporarily deleting properties, so the iteration order is well-defined.

However, only references that can be reached from syntax_styler.langs or insert will be replaced. I.e. if you hold the target object in a variable, then the value of the variable will not change.

const oldMarkup = syntax_styler.get_lang('markup');
const newMarkup = syntax_styler.grammar_insert_before('markup', 'comment', { ... });

assert(oldMarkup !== syntax_styler.get_lang('markup'));
assert(newMarkup === syntax_styler.get_lang('markup'));
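
The "build a new object instead of mutating" idea can be sketched in a few lines. `insert_before_key` is a hypothetical helper for this example and works on a flat record; it does not replace references elsewhere, which the real method is described as doing.

```typescript
// Hypothetical helper: builds a NEW object whose keys follow insertion order,
// with the `insert` entries placed right before the `before` key.
function insert_before_key<T>(
	target: Record<string, T>,
	before: string,
	insert: Record<string, T>,
): Record<string, T> {
	const result: Record<string, T> = {};
	for (const [key, value] of Object.entries(target)) {
		if (key === before) {
			for (const [new_key, new_value] of Object.entries(insert)) {
				result[new_key] = new_value;
			}
		}
		// Skip target entries that `insert` also defines, so the inserted value wins.
		if (!Object.prototype.hasOwnProperty.call(insert, key)) {
			result[key] = value;
		}
	}
	return result;
}

const markup_sketch: Record<string, string> = {comment: 'c', cdata: 'd', tag: 't'};

// Insert a new token before an existing one.
const with_style = insert_before_key(markup_sketch, 'cdata', {style: 's'});

// Re-supplying `comment` places the other inserted tokens after it.
const after_comment = insert_before_key(markup_sketch, 'comment', {
	comment: markup_sketch['comment'] ?? '',
	doctype: 'x',
});

console.log(Object.keys(with_style), Object.keys(after_comment), Object.keys(markup_sketch));
```

Because nothing is deleted from `markup_sketch`, its key order never changes; only the returned object has the new layout.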

type (inside: string, before: string, insert: SyntaxGrammarRaw, root?: Record<string, any>): SyntaxGrammar

`inside`

- The property of root (e.g. a language id in syntax_styler.langs) that contains the object to be modified.

type string

`before`

- The key to insert before.

type string

`insert`

- An object containing the key-value pairs to be inserted.

type SyntaxGrammarRaw

`root`

- The object containing inside, i.e. the object that contains the object to be modified.

Defaults to syntax_styler.langs.

type Record<string, any>

default this.langs

returns SyntaxGrammar

the new grammar object

`stringify_token`

Converts the given token or token stream to an HTML representation.

Runs the wrap hook on each SyntaxToken.

type (o: string | SyntaxTokenStream | SyntaxToken, lang: string): string

`o`

- The token or token stream to be converted.

type string | SyntaxTokenStream | SyntaxToken

`lang`

- The name of the current language.

type string

returns string

The HTML representation of the token or token stream.

`extend_grammar`

Creates a deep copy of the language with the given id and appends the given tokens.

If a token in extension also appears in the copied language, then the existing token in the copied language will be overwritten at its original position.

Best practices

Since the position of overwriting tokens (tokens in extension that overwrite tokens in the copied language) doesn't matter, they can technically be in any order. However, this can be confusing to others who are trying to understand the language definition because, normally, the order of tokens matters in grammars.

Therefore, it is encouraged to order overwriting tokens according to the positions of the overwritten tokens. Furthermore, all non-overwriting tokens should be placed after the overwriting ones.
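
The ordering rule above (overwritten tokens keep their original position, new tokens are appended) matches how object spread behaves in JavaScript, which the sketch below demonstrates. `extend_shallow` is a hypothetical helper and makes only a shallow copy, whereas the method is documented as creating a deep copy.

```typescript
// Hypothetical helper: shallow stand-in that shows only the key ordering.
function extend_shallow<T>(
	base: Record<string, T>,
	extension: Record<string, T>,
): Record<string, T> {
	// Spread keeps the position of keys that already exist and appends new ones.
	return {...base, ...extension};
}

const base_sketch = {comment: 'base comment', string: 'base string', keyword: 'base keyword'};
const extended_sketch = extend_shallow(base_sketch, {
	keyword: 'new keyword', // overwrites in place
	decorator: 'new decorator', // appended at the end
});

console.log(Object.keys(extended_sketch));
```

Listing `keyword` before `decorator` in the extension follows the recommendation: overwriting tokens first, in their original order, then the new ones.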

type (base_id: string, extension: SyntaxGrammarRaw): SyntaxGrammar

`base_id`

- The id of the language to extend. This has to be a key in syntax_styler.langs.

type string

`extension`

- The new tokens to append.

type SyntaxGrammarRaw

returns SyntaxGrammar

the new grammar

`normalize_pattern`

Normalize a single pattern to have consistent shape. This ensures all patterns have the same object shape for V8 optimization.

type (pattern: RegExp | SyntaxGrammarTokenRaw, visited: Set<number>): SyntaxGrammarToken

private

`pattern`

type RegExp | SyntaxGrammarTokenRaw

`visited`

type Set<number>

returns SyntaxGrammarToken
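
A minimal sketch of what "consistent shape" can mean in practice is shown below. The types are simplified local stand-ins, the defaults (`false`, `[]`, `null`) are assumptions, and nested `inside` grammars and the `visited` set are left out entirely.

```typescript
// Simplified local stand-ins for the raw and normalized token types.
interface RawTokenSketch {
	pattern: RegExp;
	lookbehind?: boolean;
	greedy?: boolean;
	alias?: string | Array<string>;
}

interface NormalizedTokenSketch {
	pattern: RegExp;
	lookbehind: boolean;
	greedy: boolean;
	alias: Array<string>;
	inside: null; // nested grammars are omitted in this sketch
}

function normalize_token_sketch(raw: RegExp | RawTokenSketch): NormalizedTokenSketch {
	// A bare RegExp is shorthand for a token with only `pattern` set.
	const token: RawTokenSketch = raw instanceof RegExp ? {pattern: raw} : raw;
	const alias =
		token.alias === undefined ? [] : Array.isArray(token.alias) ? token.alias : [token.alias];
	// Always assign the same keys in the same order so every result has one shape.
	return {
		pattern: token.pattern,
		lookbehind: token.lookbehind ?? false, // ASSUMED default
		greedy: token.greedy ?? false, // ASSUMED default
		alias,
		inside: null,
	};
}

const from_regexp = normalize_token_sketch(/\d+/);
const from_object = normalize_token_sketch({pattern: /"[^"]*"/, greedy: true, alias: 'string'});
console.log(from_regexp, from_object);
```

Creating every token through one object literal means engines such as V8 can treat them as the same hidden class, which is the optimization the description refers to.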

`normalize_grammar`

Normalize a grammar to have consistent object shapes. This performs several optimizations:

1. Merges rest property into main grammar
2. Ensures all pattern values are arrays
3. Normalizes all pattern objects to have consistent shapes
4. Adds global flag to greedy patterns

This is called once at registration time to avoid runtime overhead.

type (grammar: SyntaxGrammarRaw, visited: Set<number>): void

private

`grammar`

type SyntaxGrammarRaw

`visited`

- Set of grammar object IDs already normalized (for circular references)

type Set<number>

returns void
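
Two of the listed steps, wrapping values in arrays and adding the global flag to greedy patterns, can be sketched as below. This is an illustration under assumptions: the types are simplified stand-ins, and unlike the documented method (which returns void) the hypothetical `normalize_grammar_sketch` returns a new object instead of mutating its input.

```typescript
// Simplified stand-in token: only the fields this sketch needs.
interface GreedyTokenSketch {
	pattern: RegExp;
	greedy: boolean;
}

type RawValueSketch = GreedyTokenSketch | Array<GreedyTokenSketch>;

function normalize_grammar_sketch(
	raw: Record<string, RawValueSketch>,
): Record<string, Array<GreedyTokenSketch>> {
	const result: Record<string, Array<GreedyTokenSketch>> = {};
	for (const [name, value] of Object.entries(raw)) {
		// Every value becomes an array, so later code never branches on its form.
		const list = Array.isArray(value) ? value : [value];
		result[name] = list.map((token) => ({
			// Greedy patterns get the `g` flag so matching can resume from `lastIndex`.
			pattern:
				token.greedy && !token.pattern.global
					? new RegExp(token.pattern.source, token.pattern.flags + 'g')
					: token.pattern,
			greedy: token.greedy,
		}));
	}
	return result;
}

const normalized_sketch = normalize_grammar_sketch({
	string: {pattern: /"[^"]*"/, greedy: true},
	number: [
		{pattern: /\b0x[\da-f]+\b/i, greedy: false},
		{pattern: /\b\d+\b/, greedy: false},
	],
});
console.log(normalized_sketch);
```

Doing this once at registration keeps the per-call tokenizer free of shape checks and flag fixes.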

`plugins`

type Record<string, any>

`hooks_before_tokenize`

type Array<HookBeforeTokenizeCallback>

`hooks_after_tokenize`

type Array<HookAfterTokenizeCallback>

`hooks_wrap`

type Array<HookWrapCallback>

`add_hook_before_tokenize`

type (cb: HookBeforeTokenizeCallback): void

`cb`

type HookBeforeTokenizeCallback

returns void

`add_hook_after_tokenize`

type (cb: HookAfterTokenizeCallback): void

`cb`

type HookAfterTokenizeCallback

returns void

`add_hook_wrap`

type (cb: HookWrapCallback): void

`cb`

type HookWrapCallback

returns void

`run_hook_before_tokenize`

type (ctx: HookBeforeTokenizeCallbackContext): void

`ctx`

type HookBeforeTokenizeCallbackContext

returns void

`run_hook_after_tokenize`

type (ctx: HookAfterTokenizeCallbackContext): void

`ctx`

type HookAfterTokenizeCallbackContext

returns void

`run_hook_wrap`

type (ctx: HookWrapCallbackContext): void

`ctx`

type HookWrapCallbackContext

returns void

Depends on
#

Imported by
#

syntax_styler_global.ts
