
LLM integration services in TypeScript turn static applications into AI-powered tools. Get started now or get left behind.
WebLLM
Dependencies
Adding LLM configuration file
This code configures the application to use a predefined list of models and enables the use of web workers:
model_list: This property is set to the model_list from the prebuiltAppConfig. It contains a list of models that the application can use. Here are the primary families of models currently supported:
Llama: Llama 3, Llama 2, Hermes-2-Pro-Llama-3
Phi: Phi 3, Phi 2, Phi 1.5
Gemma: Gemma-2B
Mistral: Mistral-7B-v0.3, Hermes-2-Pro-Mistral-7B, NeuralHermes-2.5-Mistral-7B, OpenHermes-2.5-Mistral-7B
Qwen: Qwen2 0.5B, 1.5B, 7B
use_web_worker: This property is set to true, indicating that the application should use a web worker for running tasks. Web workers run scripts in background threads, which can improve responsiveness by offloading work from the main thread.
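The configuration described above might be sketched as follows. This is a minimal illustration, not the article's exact file: `buildAppConfig`, `ModelRecordLike`, `ChatAppConfig`, and `samplePrebuilt` are hypothetical names introduced here, and the stand-in object replaces the `prebuiltAppConfig` that the real application would import from `@mlc-ai/web-llm`.

```typescript
// In the real app the prebuilt config would come from the package:
//   import { prebuiltAppConfig } from "@mlc-ai/web-llm";
// A local stand-in type keeps this sketch runnable without the dependency.
interface ModelRecordLike {
  model_id: string;
}

interface ChatAppConfig {
  model_list: ModelRecordLike[];
  use_web_worker: boolean;
}

// Hypothetical helper: reuse the prebuilt model list and enable the web worker.
function buildAppConfig(
  prebuilt: { model_list: ModelRecordLike[] },
  useWebWorker: boolean = true,
): ChatAppConfig {
  return {
    model_list: prebuilt.model_list,
    use_web_worker: useWebWorker,
  };
}

// Stand-in for prebuiltAppConfig, for illustration only.
const samplePrebuilt = {
  model_list: [{ model_id: "Llama-3.2-1B-Instruct-q4f32_1-MLC" }],
};

const appConfig = buildAppConfig(samplePrebuilt);
console.log(appConfig.use_web_worker, appConfig.model_list.length);
```

In the application itself, passing the imported `prebuiltAppConfig` in place of `samplePrebuilt` would yield the full list of prebuilt models.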
Instantiate the Engine
This code performs the following four steps:
Step 1. Importing all the exported members
The first line imports all the exported members (functions, classes, constants, etc.) from the @mlc-ai/web-llm package and makes them available under the namespace webllm.
Step 2. Determine Whether to Use a Web Worker
The second line retrieves the use_web_worker setting from the appConfig object. This setting determines whether the application should use a web worker for running tasks.
Step 3. Declare the Engine Variable
The third line declares a variable engine of type webllm.MLCEngineInterface. This variable will hold the instance of the machine learning engine.
Step 4. Instantiate the Engine:
If useWebWorker is true:
It creates an instance of webllm.WebWorkerMLCEngine.
This instance is initialized with a new web worker, created from the worker.ts file.
The web worker is set up to run as a module.
The engine is also configured with appConfig and a log level of "INFO".
If useWebWorker is false:
It creates an instance of webllm.MLCEngine directly, without using a web worker.
This instance is also configured with appConfig.
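The branching in steps 2 to 4 can be sketched as a small factory function. To keep the example runnable outside a browser, the library is passed in as a parameter instead of being imported; `createEngine`, `EngineLibrary`, and `AppConfigLike` are hypothetical names, and the constructor shapes are an assumed subset of what `@mlc-ai/web-llm` exposes.

```typescript
// Minimal shapes this sketch relies on; the real types live in @mlc-ai/web-llm.
interface AppConfigLike {
  use_web_worker: boolean;
}

interface EngineLibrary<E, W> {
  WebWorkerMLCEngine: new (
    worker: W,
    config: { appConfig: AppConfigLike; logLevel: "INFO" },
  ) => E;
  MLCEngine: new (config: { appConfig: AppConfigLike }) => E;
}

// Hypothetical helper mirroring steps 2 to 4 of the walkthrough.
function createEngine<E, W>(
  lib: EngineLibrary<E, W>,
  appConfig: AppConfigLike,
  makeWorker: () => W,
): E {
  // Step 2: read the setting that decides where the engine runs.
  const useWebWorker = appConfig.use_web_worker;
  // Step 3: declare the variable that will hold the engine.
  let engine: E;
  // Step 4: instantiate the worker-backed or the in-thread engine.
  if (useWebWorker) {
    engine = new lib.WebWorkerMLCEngine(makeWorker(), {
      appConfig,
      logLevel: "INFO",
    });
  } else {
    engine = new lib.MLCEngine({ appConfig });
  }
  return engine;
}
```

In the browser, `lib` would be the `webllm` namespace from step 1 (`import * as webllm from "@mlc-ai/web-llm"`), and `makeWorker` would return something like `new Worker(new URL("./worker.ts", import.meta.url), { type: "module" })`. The worker is created lazily so that no background thread is started when `use_web_worker` is false. The type parameters may need adjusting to match the library's own `AppConfig` type.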
Main Entry Point
The entry point in this example is the asynchronous CreateAsync method, which initializes the ChatUI class, passing the engine instance as an argument. This method sets up UI elements with the specified engine, and registers event handlers:
Chat Completion
Once the engine is successfully initialized, you can utilize the engine.chat.completions interface to call chat completions in the OpenAI style:
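As a rough sketch of such a call, the helper below sends a single question and returns the assistant's text. `askOnce` and the `...Like` interfaces are hypothetical names; they describe only the small, OpenAI-style subset of the engine that this example assumes, so any object with a matching `chat.completions.create` method can be passed in. The system prompt is also just an illustrative choice.

```typescript
// Minimal OpenAI-style shapes used by this sketch (assumed subset of the WebLLM types).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatCompletionLike {
  choices: { message: { content: string | null } }[];
}

interface ChatEngineLike {
  chat: {
    completions: {
      create(request: { messages: ChatMessage[] }): Promise<ChatCompletionLike>;
    };
  };
}

// Hypothetical helper: send one user question and return the assistant's reply text.
async function askOnce(engine: ChatEngineLike, question: string): Promise<string> {
  const messages: ChatMessage[] = [
    { role: "system", content: "You are a helpful AI assistant." },
    { role: "user", content: question },
  ];
  const reply = await engine.chat.completions.create({ messages });
  // Fall back to an empty string if the engine returned no choices or no text.
  return reply.choices[0]?.message.content ?? "";
}
```

With an initialized engine, usage would look like `const answer = await askOnce(engine, "What is WebGPU?");`.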
Streaming
WebLLM also supports streaming chat completions. To use it, include stream: true in the engine.chat.completions.create call:
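A minimal sketch of consuming such a stream is shown below, assuming the call resolves to an async iterable of OpenAI-style chunks whose text arrives in `choices[0].delta.content`. `streamReply`, the `...Like` interfaces, and the `onDelta` callback are hypothetical names introduced for illustration.

```typescript
// Minimal OpenAI-style shapes used by this sketch (assumed subset of the WebLLM types).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatChunkLike {
  choices: { delta: { content?: string | null } }[];
}

interface StreamingEngineLike {
  chat: {
    completions: {
      create(request: {
        stream: true;
        messages: ChatMessage[];
      }): Promise<AsyncIterable<ChatChunkLike>>;
    };
  };
}

// Hypothetical helper: stream a reply, reporting each text fragment as it arrives.
async function streamReply(
  engine: StreamingEngineLike,
  messages: ChatMessage[],
  onDelta: (text: string) => void,
): Promise<string> {
  const chunks = await engine.chat.completions.create({ stream: true, messages });
  let reply = "";
  for await (const chunk of chunks) {
    // Some chunks may carry no text (for example, a final bookkeeping chunk).
    const delta = chunk.choices[0]?.delta.content ?? "";
    if (delta) {
      reply += delta;
      onDelta(delta);
    }
  }
  return reply;
}
```

In a chat UI, `onDelta` could append each fragment to the message bubble so the answer appears progressively, while the returned string holds the complete reply once the stream ends.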
Testing
Run `npm install` and `npm start` in CMD or PowerShell to start the application. In our case, the system automatically selected the Llama-3.2-1B-Instruct-q4f32_1-MLC model. A chatbot client had also already been developed, so it only needed to be integrated with the WebLLM interface described above.
As we can see, the integrated LLM copes well with abstract questions that draw on the knowledge it was trained on. However, the model may not have real-time data access or the capability to provide specific weather updates.
The example demonstrates how to invoke chat completions using OpenAI-style chat APIs and how to enable streaming for real-time responses. These make the chat experience more dynamic and responsive.
Conclusion
LLM integration: Terms Explained
Prompt Engineering
Retrieval-Augmented Generation (RAG)
Embeddings
Vector Database
Function Calling
Context Window
Agent Frameworks