Building MLC-LLM: A Step-by-Step Guide

In this post, we’ll walk through how to set up and compile the MLC-LLM project using TVM and MLC. Before diving in, please note the following prerequisites and best practices:
  • CUDA Environment: Ensure your CUDA toolchain is properly configured. CUDA 12.4 has been verified to work correctly.
  • Environment Isolation: Using Anaconda for environment isolation is strongly recommended to avoid unexpected dependency conflicts.
  • Compilation Order: Build TVM first, then MLC. The Conda environment only needs to be created once, during the TVM build; it covers all dependencies MLC requires.
Tip: Always start by cloning the latest source code from GitHub:
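For example, a recursive clone of the official MLC-LLM repository pulls in TVM and the other third-party submodules in one step:

```shell
# Clone MLC-LLM together with its submodules (including 3rdparty/tvm).
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm

# If you forgot --recursive, fetch the submodules afterwards:
git submodule update --init --recursive
```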

1. Compiling TVM from Source

1.1 Source Preparation

Simply use the TVM source code cloned into the 3rdparty/tvm directory within the MLC-LLM project. Alternatively, you may choose to clone the TVM repository directly from GitHub.

1.2 Creating and Activating the Conda Environment

Create a new Conda environment that includes all build dependencies. Run:
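A minimal sketch along the lines of MLC's own documentation (the environment name `mlc-venv` is my own choice, not from the original post):

```shell
# Create an isolated environment with the toolchain both TVM and MLC need.
conda create -n mlc-venv -c conda-forge \
    "cmake>=3.24" rust git python=3.11

# Activate it; all subsequent build steps assume this environment is active.
conda activate mlc-venv
```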

1.3 Compiling the TVM Source

Assume your TVM source is located at ~/mlc-llm/3rdparty/tvm (remember this path as TVM_SOURCE_DIR).

1.3.1 Generating and Configuring config.cmake

Navigate to the TVM directory, create a build directory, and copy the sample config file:
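Concretely, assuming the source lives at ~/mlc-llm/3rdparty/tvm as above:

```shell
cd ~/mlc-llm/3rdparty/tvm
mkdir -p build
# Start from the sample configuration shipped with TVM.
cp cmake/config.cmake build/
```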
Next, append the following settings to the config.cmake file in the build folder. These configurations enable CUDA and OpenCL by default (adjust as needed):
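For example (this particular option set is an assumption; keep only what matches your hardware, and see the note that follows regarding USE_FLASHINFER):

```cmake
set(USE_CUDA ON)
set(USE_OPENCL ON)
set(USE_LLVM ON)
```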
Note:
  • If your GPU has a CUDA compute capability of 8.0 (sm_80) or higher, you must build with set(USE_FLASHINFER ON).
  • If you plan to use CUDA later, be sure to review and enable the CUDA options in the configuration now!

1.3.2 Modifying CMakeLists.txt in TVM

Within the TVM source directory, edit the CMakeLists.txt file to include the following:
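As one illustration (these particular settings are assumptions, not prescribed by TVM; adjust to your setup), a typical local tweak is to pin the build type and target GPU architecture near the top of the file:

```cmake
# Example local tweaks (assumed; adjust to your needs).
set(CMAKE_BUILD_TYPE RelWithDebInfo)
set(CMAKE_CUDA_ARCHITECTURES 80)  # match your GPU's compute capability
```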

1.3.3 Building TVM

In the build directory, run the following commands:
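A sketch of the standard CMake out-of-source build, assuming the path from section 1.3:

```shell
cd ~/mlc-llm/3rdparty/tvm/build
cmake ..
make -j"$(nproc)"
```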
(Tip: You can replace $(nproc) with a specific number, such as 8, to control the number of parallel threads.)

1.3.4 Converting TVM into a Python-Executable Extension

Activate your Conda environment and ensure Python is installed, then navigate to the python directory inside TVM:
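In shell terms, with TVM_SOURCE_DIR as defined above (an editable install via pip is the approach MLC's documentation recommends):

```shell
export TVM_SOURCE_DIR=~/mlc-llm/3rdparty/tvm
cd "$TVM_SOURCE_DIR/python"
# Editable install: Python picks up the libtvm.so built in ../build.
pip install -e .
```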

1.3.5 Verifying the TVM Installation

You can check if TVM is installed correctly by running:
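For instance (assuming the install above succeeded):

```shell
# Should import without errors and report where TVM was installed from.
python -c "import tvm; print(tvm.__file__)"
# Dump the build configuration TVM was compiled with.
python -c "import tvm; print(tvm.support.libinfo())"
```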
Expected output includes details such as CUDA_VERSION: 12.4, among other configurations. For further details, refer to the TVM installation guide.

2. Compiling the MLC Source Code

2.1 Building MLC

Ensure your Conda environment is still activated, then compile the MLC source code:
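A sketch of the build, assuming the repository root at ~/mlc-llm (gen_cmake_config.py is the interactive configuration helper shipped in MLC-LLM's cmake/ directory):

```shell
cd ~/mlc-llm
mkdir -p build && cd build
# Generate config.cmake interactively (choose CUDA when prompted).
python ../cmake/gen_cmake_config.py
cmake ..
make -j"$(nproc)"
```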

2.2 Converting MLC into a Python-Executable Extension

While still in the activated environment, move to the Python folder of MLC-LLM and install it:
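For example:

```shell
cd ~/mlc-llm/python
# Editable install, mirroring the TVM step above.
pip install -e .
```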

2.3 Verifying the MLC Installation

You should see the libmlc_llm.so and libtvm_runtime.so files in the build directory:
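For example (assuming the build directory from section 2.1):

```shell
# Both shared libraries should be listed if the build succeeded.
ls ~/mlc-llm/build/libmlc_llm.so ~/mlc-llm/build/libtvm_runtime.so
```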
Also, verify by running the help command:
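For instance:

```shell
# Should print the chat subcommand's usage without errors.
mlc_llm chat --help
```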
For further reference, consult the MLC-LLM installation guide.

3. Potential Risks and Troubleshooting

3.1 Virtual Machine Considerations

If you are setting up the environment in a virtual machine (such as WSL) and compilation aborts with the process being killed but no clear error message (e.g., "xxx process killed xxxx"), the build is likely running out of memory; try increasing your swap space. For example:
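One common approach is a swap file (the 16 GB size here is an arbitrary example; WSL users can alternatively raise the memory/swap limits in .wslconfig):

```shell
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# Verify the new swap is active.
free -h
```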

3.2 Common CMake Errors

For issues related to missing libraries like zlib or libxml2 during TVM compilation on Ubuntu, run:
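For example, installing the development packages for both libraries:

```shell
sudo apt-get update
sudo apt-get install -y zlib1g-dev libxml2-dev
```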