syntax_styler.ts

Declarations
#

13 declarations

AddSyntaxGrammar
#

HookAfterTokenizeCallback
#

HookAfterTokenizeCallbackContext
#

HookBeforeTokenizeCallback
#

HookBeforeTokenizeCallbackContext
#

HookWrapCallback
#

HookWrapCallbackContext
#

type

type string

content

type string

tag

type string

classes

type Array<string>

attributes

type Record<string, string>

lang

type string

SyntaxGrammar
#

A grammar after normalization. All values are arrays of normalized tokens with consistent shapes.

SyntaxGrammarRaw
#

SyntaxGrammarToken
#

Grammar token with all properties required. This is the normalized representation used at runtime.

pattern

type RegExp

lookbehind

type boolean

greedy

type boolean

alias

type Array<string>

inside

type SyntaxGrammar | null

SyntaxGrammarTokenRaw
#

The expansion of a simple RegExp literal to support additional properties.

The inside grammar will be used to tokenize the text value of each token of this kind.

This can be used to make nested and even recursive language definitions.

Note: This can cause infinite recursion. Be careful when embedding different languages, or even the same language, into one another.

Note: Grammar authors can use optional properties, but they will be normalized to required properties at registration time for optimal performance.

pattern

The regular expression of the token.

type RegExp

lookbehind

If true, the first capturing group of pattern will (effectively) behave as a lookbehind group, meaning that the captured text will not be part of the matched text of the new token.

type boolean

greedy

Whether the token is greedy.

type boolean

alias

An optional alias or list of aliases.

type string | Array<string>

inside

The nested grammar of this token.

type SyntaxGrammarRaw | null
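
To make the lookbehind property concrete, here is a minimal sketch. It assumes a local stand-in type (RawTokenSketch) that mirrors the documented fields, and a hypothetical helper (match_token_sketch); neither is part of the library, and the real tokenizer does considerably more.

```typescript
// Local stand-in mirroring the documented shape of SyntaxGrammarTokenRaw;
// an assumption for illustration, not the library's actual export.
interface RawTokenSketch {
	pattern: RegExp;
	lookbehind?: boolean;
	greedy?: boolean;
	alias?: string | Array<string>;
	inside?: Record<string, unknown> | null;
}

// Hypothetical helper: returns the text a token would cover, dropping the
// first capturing group when `lookbehind` is set.
const match_token_sketch = (token: RawTokenSketch, text: string): string | null => {
	const match = token.pattern.exec(text);
	if (!match) return null;
	const skipped = token.lookbehind && match[1] ? match[1].length : 0;
	return match[0].slice(skipped);
};

// A function name preceded by the `function` keyword. The keyword and the
// following whitespace are captured by group 1 and excluded from the token.
const function_name_token: RawTokenSketch = {
	pattern: /(function\s+)\w+/,
	lookbehind: true,
	alias: 'function',
};

const covered = match_token_sketch(function_name_token, 'function greet() {}');
// covered is 'greet'; without lookbehind it would be 'function greet'.
```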

SyntaxGrammarValueRaw
#

SyntaxStyler
#

Based on Prism (https://github.com/PrismJS/prism) by Lea Verou (https://lea.verou.me/)

MIT license

see also

  • LICENSE

langs

type Record<string, SyntaxGrammar | undefined>

add_lang

type (id: string, grammar: SyntaxGrammarRaw, aliases?: string[] | undefined): void

id
type string
grammar
aliases?
type string[] | undefined
optional
returns void

add_extended_lang

type (base_id: string, extension_id: string, extension: SyntaxGrammarRaw, aliases?: string[] | undefined): SyntaxGrammar

base_id
type string
extension_id
type string
extension
aliases?
type string[] | undefined
optional

get_lang

type (id: string): SyntaxGrammar

id
type string

stylize

Generates HTML with syntax highlighting from source code.

Process:

  1. Runs before_tokenize hook
  2. Tokenizes code using the provided or looked-up grammar
  3. Runs after_tokenize hook
  4. Runs wrap hook on each token
  5. Converts tokens to HTML with CSS classes

Parameter Relationship:

  • lang is ALWAYS required for hook context and identification
  • grammar is optional; when undefined, automatically looks up via this.get_lang(lang)
  • When both are provided, grammar is used for tokenization, lang for metadata

Use cases:

  • Standard usage: stylize(code, 'ts') - uses registered TypeScript grammar
  • Custom grammar: stylize(code, 'ts', customGrammar) - uses custom grammar but keeps 'ts' label
  • Extended grammar: stylize(code, 'custom', this.extend_grammar('ts', extension)) - new language variant

type (text: string, lang: string, grammar?: SyntaxGrammar | undefined): string

text

The source code to syntax highlight.

type string
lang

Language identifier (e.g., 'ts', 'css', 'html'). Used for:

  • Grammar lookup when grammar is undefined
  • Hook context (lang field passed to hooks)
  • Language identification in output

type string
grammar

Optional custom grammar object. When undefined, automatically looks up the grammar via this.get_lang(lang). Provide this to use a custom or modified grammar instead of the registered one.

type SyntaxGrammar | undefined
default this.get_lang(lang)
returns string

HTML string with syntax highlighting using CSS classes (.token_*)
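
The five steps listed under Process can be sketched with a toy pipeline. Everything here is hypothetical: the Toy* types, the simplified tokenizer, the `code` and `tokens` context field names, and the exact markup are assumptions made for illustration. Only the wrap context fields follow HookWrapCallbackContext as documented above.

```typescript
// Toy stand-ins; the library's real token and grammar types are richer.
type ToyToken = {type: string; content: string};
type ToyStream = Array<string | ToyToken>;
type ToyGrammar = Record<string, RegExp>; // patterns must not use the `g` flag here

interface ToyWrapContext {
	type: string;
	content: string;
	tag: string;
	classes: Array<string>;
	attributes: Record<string, string>;
	lang: string;
}
// The `code` and `tokens` field names are assumptions.
interface ToyHooks {
	before_tokenize: Array<(ctx: {code: string; lang: string}) => void>;
	after_tokenize: Array<(ctx: {tokens: ToyStream; lang: string}) => void>;
	wrap: Array<(ctx: ToyWrapContext) => void>;
}

const toy_escape = (s: string): string =>
	s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/"/g, '&quot;');

// Repeatedly takes the earliest non-empty match among all patterns.
const toy_tokenize = (text: string, grammar: ToyGrammar): ToyStream => {
	const stream: ToyStream = [];
	let rest = text;
	while (rest.length > 0) {
		let best: {type: string; index: number; value: string} | null = null;
		for (const [type, pattern] of Object.entries(grammar)) {
			const m = pattern.exec(rest);
			if (m && m[0].length > 0 && (best === null || m.index < best.index)) {
				best = {type, index: m.index, value: m[0]};
			}
		}
		if (best === null) {
			stream.push(rest);
			break;
		}
		if (best.index > 0) stream.push(rest.slice(0, best.index));
		stream.push({type: best.type, content: best.value});
		rest = rest.slice(best.index + best.value.length);
	}
	return stream;
};

const toy_stylize = (text: string, lang: string, grammar: ToyGrammar, hooks: ToyHooks): string => {
	// Step 1: before_tokenize hooks may rewrite the code.
	const before_ctx = {code: text, lang};
	for (const cb of hooks.before_tokenize) cb(before_ctx);
	// Steps 2 and 3: tokenize, then let after_tokenize hooks adjust the stream.
	const after_ctx = {tokens: toy_tokenize(before_ctx.code, grammar), lang};
	for (const cb of hooks.after_tokenize) cb(after_ctx);
	// Steps 4 and 5: run wrap hooks per token and serialize to HTML.
	return after_ctx.tokens
		.map((token) => {
			if (typeof token === 'string') return toy_escape(token);
			const ctx: ToyWrapContext = {
				type: token.type,
				content: toy_escape(token.content),
				tag: 'span',
				classes: ['token_' + token.type],
				attributes: {},
				lang,
			};
			for (const cb of hooks.wrap) cb(ctx);
			const attrs = Object.entries(ctx.attributes)
				.map(([k, v]) => ` ${k}="${toy_escape(v)}"`)
				.join('');
			return `<${ctx.tag} class="${ctx.classes.join(' ')}"${attrs}>${ctx.content}</${ctx.tag}>`;
		})
		.join('');
};

const toy_html = toy_stylize(
	'const x = 1;',
	'ts',
	{keyword: /\b(?:const|let)\b/, number: /\b\d+\b/},
	{before_tokenize: [], after_tokenize: [], wrap: []},
);
// toy_html: <span class="token_keyword">const</span> x = <span class="token_number">1</span>;
```

Text that no pattern matches passes through as an escaped plain string, so only tokens reach the wrap hooks.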

grammar_insert_before

Inserts tokens before another token in a language definition or any other grammar.

Usage

This helper method makes it easy to modify existing languages. For example, the CSS language definition not only defines CSS styling for CSS documents, but also needs to define styling for CSS embedded in HTML through <style> elements. To do this, it needs to modify syntax_styler.get_lang('markup') and add the appropriate tokens. However, syntax_styler.get_lang('markup') is a regular JS object literal, so if you do this:

    syntax_styler.get_lang('markup').style = {
        // token
    };

then the style token will be added (and processed) at the end. grammar_insert_before allows you to insert tokens before existing tokens. For the CSS example above, you would use it like this:

    syntax_styler.grammar_insert_before('markup', 'cdata', {
        'style': {
            // token
        }
    });

Special cases

If the grammars of inside and insert have tokens with the same name, the tokens in inside's grammar will be ignored.

This behavior can be used to insert tokens after the before token:

    syntax_styler.grammar_insert_before('markup', 'comment', {
        'comment': syntax_styler.get_lang('markup').comment,
        // tokens after 'comment'
    });

Limitations

The main problem grammar_insert_before has to solve is iteration order. Since ES2015, the iteration order for object properties is guaranteed to be the insertion order (except for integer keys), but some browsers behave differently when keys are deleted and re-inserted. Inserting at an arbitrary position in place would require temporarily deleting properties, so grammar_insert_before cannot be implemented that way.

To solve this problem, grammar_insert_before doesn't actually insert the given tokens into the target object. Instead, it creates a new object and replaces all references to the target object with the new one. This can be done without temporarily deleting properties, so the iteration order is well-defined.

However, only references that can be reached from syntax_styler.langs or insert will be replaced. That is, if you hold the target object in a variable, the value of that variable will not change.

    const oldMarkup = syntax_styler.get_lang('markup');
    const newMarkup = syntax_styler.grammar_insert_before('markup', 'comment', { ... });

    assert(oldMarkup !== syntax_styler.get_lang('markup'));
    assert(newMarkup === syntax_styler.get_lang('markup'));
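
The "build a new object instead of mutating" approach described above can be sketched as follows. The helper insert_before_sketch is hypothetical and works on plain records; it does not replace references inside a language registry, which the real method is documented to do.

```typescript
// Hypothetical helper, not the library's implementation: returns a new object
// with the entries of `insert` placed immediately before the key `before`.
const insert_before_sketch = <T>(
	target: Record<string, T>,
	before: string,
	insert: Record<string, T>,
): Record<string, T> => {
	const result: Record<string, T> = {};
	for (const [key, value] of Object.entries(target)) {
		if (key === before) {
			for (const [new_key, new_value] of Object.entries(insert)) {
				result[new_key] = new_value;
			}
		}
		// Keys that `insert` also defines are skipped, so the inserted version wins.
		if (!Object.prototype.hasOwnProperty.call(insert, key)) result[key] = value;
	}
	return result;
};

const markup_sketch = {comment: /<!--/, cdata: /<!\[CDATA\[/, tag: /<\w+/};
const updated_sketch = insert_before_sketch(markup_sketch, 'cdata', {style: /<style/});
// Object.keys(updated_sketch) is ['comment', 'style', 'cdata', 'tag'];
// markup_sketch itself keeps its original three keys.
```

Because no property is ever deleted, the key order of the result depends only on insertion order.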

type (inside: string, before: string, insert: SyntaxGrammarRaw, root?: Record<string, any>): SyntaxGrammar

inside

The property of root (e.g. a language id in syntax_styler.langs) that contains the object to be modified.

type string
before

The key to insert before.

type string
insert

An object containing the key-value pairs to be inserted.

root

The object containing inside, i.e. the object that contains the object to be modified.

Defaults to syntax_styler.langs.

type Record<string, any>
default this.langs

the new grammar object

stringify_token

Converts the given token or token stream to an HTML representation.

Runs the wrap hook on each SyntaxToken.

type (o: string | SyntaxTokenStream | SyntaxToken, lang: string): string

o

The token or token stream to be converted.

type string | SyntaxTokenStream | SyntaxToken
lang

The name of the current language.

type string
returns string

The HTML representation of the token or token stream.

extend_grammar

Creates a deep copy of the language with the given id and appends the given tokens.

If a token in extension also appears in the copied language, then the existing token in the copied language will be overwritten at its original position.

Best practices

Since the position of overwriting tokens (tokens in extension that overwrite tokens in the copied language) doesn't matter, they can technically be in any order. However, this can be confusing to others who are trying to understand the language definition because, normally, the order of tokens matters in grammars.

Therefore, it is encouraged to order overwriting tokens according to the positions of the overwritten tokens. Furthermore, all non-overwriting tokens should be placed after the overwriting ones.

type (base_id: string, extension: SyntaxGrammarRaw): SyntaxGrammar

base_id

The id of the language to extend. This has to be a key in syntax_styler.langs.

type string
extension

The new tokens to append.

the new grammar
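
The overwrite-in-place and append behavior can be illustrated with plain object spreading. This is a sketch under a stated simplification: extend_sketch is a hypothetical helper that makes a shallow copy, whereas extend_grammar is documented to deep copy the base language.

```typescript
// Shallow stand-in for the documented deep copy: spreading keeps each base key
// at its original position and appends keys that only `extension` defines.
const extend_sketch = <T>(
	base: Record<string, T>,
	extension: Record<string, T>,
): Record<string, T> => ({
	...base,
	...extension,
});

const base_sketch = {comment: /\/\/.*/, string: /"[^"]*"/, keyword: /\bif\b/};
const extended_sketch = extend_sketch(base_sketch, {
	keyword: /\b(?:if|else)\b/, // overwrites `keyword` at its original position
	decorator: /@\w+/, // not in the base, so it is appended at the end
});
// Object.keys(extended_sketch) is ['comment', 'string', 'keyword', 'decorator'].
```

The extension above follows the recommended ordering: the overwriting token comes first, the new token last.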

normalize_pattern

Normalize a single pattern to have consistent shape. This ensures all patterns have the same object shape for V8 optimization.

type (pattern: RegExp | SyntaxGrammarTokenRaw, visited: Set<number>): SyntaxGrammarToken

private
pattern
type RegExp | SyntaxGrammarTokenRaw
visited
type Set<number>

normalize_grammar

Normalize a grammar to have consistent object shapes. This performs several optimizations:

  1. Merges rest property into main grammar
  2. Ensures all pattern values are arrays
  3. Normalizes all pattern objects to have consistent shapes
  4. Adds global flag to greedy patterns

This is called once at registration time to avoid runtime overhead.

type (grammar: SyntaxGrammarRaw, visited: Set<number>): void

private
grammar
visited

Set of grammar object IDs already normalized (for circular references)

type Set<number>
returns void
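
Steps 2 to 4 of the normalization can be sketched as below. All names ending in Sketch are hypothetical. Unlike the documented private methods, this version returns a new object instead of normalizing in place, and it leaves out the rest merge, nested inside grammars, and the visited set.

```typescript
// Local stand-ins for the raw and normalized token shapes (assumptions).
interface RawShapeSketch {
	pattern: RegExp;
	lookbehind?: boolean;
	greedy?: boolean;
	alias?: string | Array<string>;
}
interface FullShapeSketch {
	pattern: RegExp;
	lookbehind: boolean;
	greedy: boolean;
	alias: Array<string>;
	inside: null; // nested grammars are omitted in this sketch
}
type RawValueSketch = RegExp | RawShapeSketch | Array<RegExp | RawShapeSketch>;

// Every result has the same five properties, whatever the input looked like.
const normalize_pattern_sketch = (raw: RegExp | RawShapeSketch): FullShapeSketch => {
	const token: RawShapeSketch = raw instanceof RegExp ? {pattern: raw} : raw;
	const greedy = token.greedy ?? false;
	let pattern = token.pattern;
	// Greedy patterns get the global flag if they lack it.
	if (greedy && !pattern.global) {
		pattern = new RegExp(pattern.source, pattern.flags + 'g');
	}
	const alias =
		token.alias === undefined ? [] : Array.isArray(token.alias) ? token.alias : [token.alias];
	return {pattern, lookbehind: token.lookbehind ?? false, greedy, alias, inside: null};
};

// Wraps single values in arrays and normalizes each entry.
const normalize_grammar_sketch = (
	grammar: Record<string, RawValueSketch>,
): Record<string, Array<FullShapeSketch>> => {
	const result: Record<string, Array<FullShapeSketch>> = {};
	for (const [key, value] of Object.entries(grammar)) {
		const list = Array.isArray(value) ? value : [value];
		result[key] = list.map((entry) => normalize_pattern_sketch(entry));
	}
	return result;
};

const normalized_sketch = normalize_grammar_sketch({
	comment: /\/\/.*/,
	string: {pattern: /"[^"]*"/, greedy: true, alias: 'str'},
});
```

Doing this once at registration means the tokenizer never has to branch on whether a value is a RegExp, an object, or an array.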

plugins

type Record<string, any>

hooks_before_tokenize

type Array<HookBeforeTokenizeCallback>

hooks_after_tokenize

type Array<HookAfterTokenizeCallback>

hooks_wrap

type Array<HookWrapCallback>

add_hook_before_tokenize

type (cb: HookBeforeTokenizeCallback): void

cb
returns void

add_hook_after_tokenize

type (cb: HookAfterTokenizeCallback): void

cb
returns void

add_hook_wrap

type (cb: HookWrapCallback): void

cb
returns void

run_hook_before_tokenize

type (ctx: HookBeforeTokenizeCallbackContext): void

ctx
returns void

run_hook_after_tokenize

type (ctx: HookAfterTokenizeCallbackContext): void

ctx
returns void

run_hook_wrap

type (ctx: HookWrapCallbackContext): void

ctx
returns void
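
The add_hook_* and run_hook_* pairs suggest a simple callback registry. The following sketch shows that pattern for the wrap hook only. HookRegistrySketch and its internals are assumptions; the context fields follow HookWrapCallbackContext as documented above.

```typescript
// Mirrors the documented fields of HookWrapCallbackContext.
interface WrapContextSketch {
	type: string;
	content: string;
	tag: string;
	classes: Array<string>;
	attributes: Record<string, string>;
	lang: string;
}
type WrapCallbackSketch = (ctx: WrapContextSketch) => void;

// Hypothetical registry: callbacks run in registration order and mutate the
// shared context object in place.
class HookRegistrySketch {
	hooks_wrap: Array<WrapCallbackSketch> = [];

	add_hook_wrap(cb: WrapCallbackSketch): void {
		this.hooks_wrap.push(cb);
	}

	run_hook_wrap(ctx: WrapContextSketch): void {
		for (const cb of this.hooks_wrap) cb(ctx);
	}
}

const registry_sketch = new HookRegistrySketch();
// Tag keyword tokens with an extra class and a title attribute.
registry_sketch.add_hook_wrap((ctx) => {
	if (ctx.type === 'keyword') {
		ctx.classes.push('is_keyword');
		ctx.attributes['title'] = `${ctx.lang} keyword`;
	}
});

const wrap_ctx_sketch: WrapContextSketch = {
	type: 'keyword',
	content: 'const',
	tag: 'span',
	classes: ['token_keyword'],
	attributes: {},
	lang: 'ts',
};
registry_sketch.run_hook_wrap(wrap_ctx_sketch);
// wrap_ctx_sketch.classes is now ['token_keyword', 'is_keyword'].
```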
