The Dance of Memory: mimalloc and tikv-jemallocator in Practice

houseme
24 Oct, 2025


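In the kingdom of programs, memory is the lifeblood, and two dancers move through it, each to a rhythm of its own. Let us step into this graceful dance of performance and look for the craft behind memory management.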

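Chapter 1: Meeting the Dancers

mimalloc - the nimble ballet dancer

Built by Microsoft and known for being compact and efficient, it performs well across platforms and dances especially gracefully on Apple Silicon.

tikv-jemallocator - the powerful modern dancer

A Rust binding to jemalloc, the allocator that originated in FreeBSD, maintained by the TiKV project. It shows its strength in multithreaded workloads and is particularly good at limiting memory fragmentation.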

Chapter 2: Setting the Stage

Creating the demo project

cargo new memory-allocator-demo
cd memory-allocator-demo

Cargo.toml configuration

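[package]
name = "memory-allocator-demo"
version = "0.1.0"
edition = "2021"

[dependencies]
# Optional, so each allocator is only built when its feature is enabled
mimalloc = { version = "0.1.39", optional = true }
tikv-jemallocator = { version = "0.5.0", optional = true }
rayon = "1.7.0"  # used for the parallel test
rand = "0.8.5"   # for generating random data

[features]
# No default: pick exactly one allocator feature on the command line
default = []
use-mimalloc = ["dep:mimalloc"]
use-jemalloc = ["dep:tikv-jemallocator"]
use-system = []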
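
Cargo features are resolved at compile time, so a build knows which allocator it was asked to use. As a rough illustration, the sketch below reports that choice with the `cfg!` macro. It assumes the feature names from the manifest above; `active_allocator` is a hypothetical helper and is not part of the demo project.

```rust
/// Returns a label for the allocator selected at compile time.
/// Hypothetical helper: the feature names are assumed to match Cargo.toml.
pub fn active_allocator() -> &'static str {
    if cfg!(feature = "use-mimalloc") {
        "mimalloc"
    } else if cfg!(feature = "use-jemalloc") {
        "jemalloc"
    } else {
        // Covers both `use-system` and a build with no allocator feature,
        // since Rust falls back to the system allocator by default.
        "system"
    }
}
```

Because `cfg!` expands to a plain `true` or `false`, every branch is type-checked in every build, which makes this form convenient for labels and log lines. Items that only exist for one allocator still need the `#[cfg(...)]` attribute form.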

Chapter 3: Hands-On Practice

Basic usage example

// src/allocators/mod.rs
use std::time::Instant;

// Conditionally import the allocator selected by a Cargo feature
#[cfg(feature = "use-mimalloc")]
use mimalloc::MiMalloc;

#[cfg(feature = "use-jemalloc")]
use tikv_jemallocator::Jemalloc;

#[cfg(feature = "use-system")]
use std::alloc::System;

// Install the global allocator for the enabled feature.
// Enable exactly one feature: a second one would define GLOBAL twice and fail to compile.
#[cfg(feature = "use-mimalloc")]
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

#[cfg(feature = "use-jemalloc")]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

#[cfg(feature = "use-system")]
#[global_allocator]
static GLOBAL: System = System;

pub struct AllocatorBenchmark;

impl AllocatorBenchmark {
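    /// Times the allocation of many small objects
    pub fn test_small_allocations(num_objects: usize) -> f64 {
        let start = Instant::now();

        let mut vecs = Vec::with_capacity(num_objects);
        for i in 0..num_objects {
            // Allocate vectors of varying size (1 to 64 bytes)
            let size = (i % 64) + 1;
            let v = vec![i as u8; size];
            vecs.push(v);
        }

        // Keep the objects observable so the allocations are not optimized away
        std::hint::black_box(&vecs);

        // Stop the clock before `vecs` is dropped, so only allocation is timed
        let duration = start.elapsed();
        duration.as_secs_f64()
    }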
    
    /// Times the allocation of large memory blocks
    pub fn test_large_allocations(num_allocations: usize) -> f64 {
        let start = Instant::now();

        let mut large_blocks = Vec::with_capacity(num_allocations);
        for i in 0..num_allocations {
            // Allocate blocks of 1 MiB to 10 MiB
            let size = (1 + (i % 10)) * 1024 * 1024;
            let block = vec![0u8; size];
            large_blocks.push(block);
        }

        // Keep the blocks observable so the allocations are not optimized away
        std::hint::black_box(&large_blocks);

        let duration = start.elapsed();
        duration.as_secs_f64()
    }
    
    /// Times allocations made in parallel on the rayon thread pool
    pub fn test_parallel_allocations() -> f64 {
        use rayon::prelude::*;

        let start = Instant::now();

        let results: Vec<Vec<u32>> = (0..1000)
            .collect::<Vec<_>>()
            .par_chunks(100)
            .map(|chunk| {
                chunk.iter()
                    .map(|&x| x * 2)
                    .collect()
            })
            .collect();

        let duration = start.elapsed();
        // Keep the result observable so the work is not optimized away
        std::hint::black_box(&results);
        duration.as_secs_f64()
    }
}
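
The benchmarks above measure wall-clock time only. To see how many allocation requests a workload actually makes, one option is a thin wrapper that implements `GlobalAlloc` and forwards to another allocator. The sketch below is a minimal illustration under stated assumptions: `CountingAlloc` and `count_allocations` are hypothetical names, the wrapper forwards to `System`, and it uses only the standard library. A binary can have only one `#[global_allocator]`, so in the demo project this would replace the `GLOBAL` static rather than sit beside it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Process-wide counters; allocations from every thread are included.
static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);
static ALLOC_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper: counts allocation requests, then forwards to `System`.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        ALLOC_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `alloc_zeroed` and `realloc` keep their default implementations,
    // which call `alloc` above, so they are counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `work` and returns (allocation calls, bytes requested) seen meanwhile.
pub fn count_allocations<T>(work: impl FnOnce() -> T) -> (usize, usize) {
    let calls_before = ALLOC_CALLS.load(Ordering::Relaxed);
    let bytes_before = ALLOC_BYTES.load(Ordering::Relaxed);
    // black_box keeps the optimizer from eliding allocations made by `work`
    let result = std::hint::black_box(work());
    let calls = ALLOC_CALLS.load(Ordering::Relaxed) - calls_before;
    let bytes = ALLOC_BYTES.load(Ordering::Relaxed) - bytes_before;
    drop(result);
    (calls, bytes)
}
```

For example, `count_allocations(|| vec![0u8; 4096])` reports at least one call and at least 4096 requested bytes. The counters are shared by the whole process, so the numbers are a lower bound on noise-free counts only when no other thread is allocating at the same time.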

Performance comparison tool

// src/benchmark.rs
use crate::allocators::AllocatorBenchmark;
use std::collections::HashMap;

pub struct BenchmarkRunner {
    results: HashMap<String, HashMap<String, f64>>,
}

impl BenchmarkRunner {
    pub fn new() -> Self {
        Self {
            results: HashMap::new(),
        }
    }
    
    pub fn run_all_benchmarks(&mut self, allocator_name: &str) {
        println!("\n🎯 Running benchmarks: {}", allocator_name);
        println!("{}", "─".repeat(50));

        let mut allocator_results = HashMap::new();

        // Test 1: small-object allocation
        println!("🧩 Testing small-object allocation...");
        let small_time = AllocatorBenchmark::test_small_allocations(100_000);
        allocator_results.insert("small_allocations".to_string(), small_time);
        println!("   Done: {:.4} s", small_time);

        // Test 2: large-block allocation
        println!("🏗️  Testing large-block allocation...");
        let large_time = AllocatorBenchmark::test_large_allocations(100);
        allocator_results.insert("large_allocations".to_string(), large_time);
        println!("   Done: {:.4} s", large_time);

        // Test 3: parallel allocation
        println!("⚡ Testing parallel allocation...");
        let parallel_time = AllocatorBenchmark::test_parallel_allocations();
        allocator_results.insert("parallel_allocations".to_string(), parallel_time);
        println!("   Done: {:.4} s", parallel_time);

        self.results.insert(allocator_name.to_string(), allocator_results);
    }
    
    pub fn print_comparison(&self) {
        println!("\n📊 Performance comparison");
        println!("{}", "═".repeat(60));

        // The first entry serves as the baseline; HashMap iteration order is unspecified
        if let Some(first_results) = self.results.values().next() {
            for (test_name, &first_time) in first_results {
                println!("\n📈 Test: {}", test_name);
                println!("{}", "─".repeat(40));

                for (alloc_name, results) in &self.results {
                    if let Some(&time) = results.get(test_name) {
                        // Time relative to the baseline: above 1.00x means slower
                        let ratio = time / first_time;
                        println!("  {:12}: {:.6} s (relative time: {:.2}x)",
                                alloc_name, time, ratio);
                    }
                }
            }
        }

        println!("\n💡 Recommendations:");
        self.print_recommendations();
    }

    fn print_recommendations(&self) {
        println!("• mimalloc: suits cross-platform projects, especially on Apple Silicon Macs");
        println!("• jemalloc: suits server-side, heavily multithreaded applications");
        println!("• system:   suits simple applications or serves as a baseline");
    }
}
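
The runner above times each test once, and a single wall-clock sample can be noisy. One common remedy is to repeat the measurement and report the median. The sketch below is only an illustration of that idea using the standard library; `median_secs` and `median_of_sorted` are hypothetical helpers, not part of the demo project.

```rust
use std::time::Instant;

/// Median of an already-sorted, non-empty slice.
pub fn median_of_sorted(sorted: &[f64]) -> f64 {
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        // Even length: average the two middle samples
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

/// Runs `f` exactly `runs` times and returns the median wall-clock time in seconds.
pub fn median_secs(runs: usize, mut f: impl FnMut()) -> f64 {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<f64> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_secs_f64()
        })
        .collect();
    // total_cmp gives a total order on f64, so the sort cannot panic
    samples.sort_by(|a, b| a.total_cmp(b));
    median_of_sorted(&samples)
}
```

A call such as `median_secs(5, || { AllocatorBenchmark::test_small_allocations(100_000); })` would then stand in for a single timed run. The median is used rather than the mean because one slow outlier, for example a run interrupted by the scheduler, shifts it far less.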

A real-world style example

// src/real_world_examples.rs
use std::collections::HashMap;

pub struct DatabaseCache {
    data: HashMap<String, Vec<u8>>,
}

impl DatabaseCache {
    pub fn new() -> Self {
        Self {
            data: HashMap::new(),
        }
    }
    
    pub fn insert(&mut self, key: String, value: Vec<u8>) {
        self.data.insert(key, value);
    }
    
    pub fn get(&self, key: &str) -> Option<&Vec<u8>> {
        self.data.get(key)
    }
    
    pub fn memory_usage(&self) -> usize {
        self.data.iter()
            .map(|(k, v)| k.capacity() + v.capacity())
            .sum()
    }
}

pub fn simulate_workload() {
    println!("\n🔄 Simulating a realistic workload...");

    let mut cache = DatabaseCache::new();

    // Fill the cache
    for i in 0..5000 {
        let key = format!("user:{}:profile", i);
        let value = vec![b'x'; 1024 + (i % 512)]; // values of varying size
        cache.insert(key, value);
    }

    println!("   Cache size: {:.2} MB",
             cache.memory_usage() as f64 / 1024.0 / 1024.0);

    // Simulate lookups: keys 0..6000 are requested, but only 0..5000 were inserted
    let lookups = 6000;
    let hit_count = (0..lookups)
        .filter(|i| cache.get(&format!("user:{}:profile", i)).is_some())
        .count();

    println!("   Cache hit rate: {:.1}%", (hit_count as f64 / lookups as f64) * 100.0);
}

Chapter 4: The Complete Demo Program

// src/main.rs
mod allocators;
mod benchmark;
mod real_world_examples;

use benchmark::BenchmarkRunner;
use real_world_examples::simulate_workload;

fn main() {
    println!("{}", "✨".repeat(60));
    println!("         Memory allocator performance comparison demo");
    println!("{}", "✨".repeat(60));

    // Detect the current platform
    #[cfg(target_os = "macos")]
    {
        #[cfg(target_arch = "aarch64")]
        println!("🌍 Platform: macOS (Apple Silicon)");
        #[cfg(target_arch = "x86_64")]
        println!("🌍 Platform: macOS (Intel)");
    }

    #[cfg(target_os = "linux")]
    println!("🌍 Platform: Linux");

    #[cfg(target_os = "windows")]
    println!("🌍 Platform: Windows");

    let mut runner = BenchmarkRunner::new();

    // Run the benchmarks for the allocator selected at build time
    #[cfg(not(all(target_os = "macos", target_arch = "aarch64")))]
    {
        // Benchmark jemalloc everywhere except Apple Silicon Macs
        #[cfg(feature = "use-jemalloc")]
        runner.run_all_benchmarks("jemalloc");
    }

    #[cfg(feature = "use-mimalloc")]
    runner.run_all_benchmarks("mimalloc");

    #[cfg(feature = "use-system")]
    runner.run_all_benchmarks("system");

    // Print the comparison
    runner.print_comparison();

    // Run the real-world style example
    simulate_workload();

    println!("\n{}", "🎉".repeat(60));
    println!("     Demo complete! Thanks for exploring the art of memory management");
    println!("{}", "🎉".repeat(60));
}

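Chapter 5: Running and Testing

Build and run

# Build in release mode so the timings reflect optimized code

# Use mimalloc
cargo run --release --features use-mimalloc

# Use the system allocator (baseline)
cargo run --release --features use-system

# Use jemalloc on supported platforms
cargo run --release --features use-jemalloc

# On Apple Silicon Macs, main.rs skips the jemalloc benchmark, so no jemalloc results are printed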

Sample expected output

✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
         Memory allocator performance comparison demo
✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨✨
🌍 Platform: macOS (Apple Silicon)

🎯 Running benchmarks: mimalloc
──────────────────────────────────────────────────
🧩 Testing small-object allocation...
   Done: 0.0456 s
🏗️  Testing large-block allocation...
   Done: 0.1234 s
⚡ Testing parallel allocation...
   Done: 0.0678 s

📊 Performance comparison
════════════════════════════════════════════════════════

📈 Test: small_allocations
────────────────────────────────────────
  mimalloc    : 0.045632 s (relative time: 1.00x)
  system      : 0.051234 s (relative time: 1.12x)
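
The figure in parentheses is a time ratio against the baseline row, so a value above 1.00x means slower than the baseline. The sketch below reproduces that arithmetic for the two sample rows; `relative_time` and `comparison_row` are hypothetical helpers written for this illustration, and the row format is an assumption modelled on the sample above.

```rust
/// Ratio of a measured time to the baseline time.
/// Values above 1.0 mean the run was slower than the baseline.
pub fn relative_time(time_secs: f64, baseline_secs: f64) -> f64 {
    time_secs / baseline_secs
}

/// Formats one comparison row: name padded to 12 columns, time, then the ratio.
pub fn comparison_row(name: &str, time_secs: f64, baseline_secs: f64) -> String {
    format!(
        "{:12}: {:.6} s ({:.2}x)",
        name,
        time_secs,
        relative_time(time_secs, baseline_secs)
    )
}
```

With the sample numbers, 0.051234 / 0.045632 is roughly 1.1228, which rounds to 1.12x: in that run the system allocator took about 12% longer than mimalloc on the small-allocation test.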

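Chapter 6: A Selection Guide - Poetic Summary

When the choice lingers at your fingertips

If you seek cross-platform harmony, moving freely between Intel and Apple Silicon, mimalloc is your poetic choice: light as a spring breeze, steady as an autumn leaf.

If your stage is the server in the cloud, braving the waves in a sea of threads, jemalloc shows the beauty of strength, hard to beat on the battlefield of memory.

If your project is as simple as a mountain stream, the system allocator is elegant enough; let the complexity drift away on the wind and keep the code pure and true.

Remember, there is no absolute winner, only the partner that suits you best. Test and verify in your own scenario, and let the data show the way forward.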