Running DeepSeek Janus-Pro-1B in the Browser: A Comprehensive Guide

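The ability to run large language models (LLMs) directly in the browser opens up new possibilities for privacy-preserving, client-side AI applications. In this post, we explore how to run DeepSeek Janus-Pro-1B, a unified multimodal model that handles both image understanding and text-to-image generation, entirely in the browser using WebGPU and Hugging Face's Transformers.js library.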

Why Browser-Based Inference?

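Privacy: data never leaves the user's device.
Cost efficiency: no server infrastructure is required.
Accessibility: runs on any device with a modern browser that supports WebGPU.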

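DeepSeek Janus-Pro-1B is designed for multimodal tasks such as text-to-image generation, and thanks to optimizations in Transformers.js and WebGPU acceleration, it can now run entirely with browser-based inference.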

Key Tools and Libraries

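Transformers.js: a JavaScript port of Hugging Face's Transformers library, optimized for running in the browser.
WebGPU: a modern browser API for GPU acceleration and a successor to WebGL, offering much better performance for machine learning workloads.
ONNX Runtime: executes the exported model as an optimized computation graph; Transformers.js uses it under the hood, which is why the code below loads an ONNX export of the model.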
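The split between the two backends can be made concrete with a small decision function. This is a minimal sketch, not code from the demo: `chooseBackend` and its flag names are hypothetical, and falling back to the WASM backend when WebGPU is missing is an assumption (the demo's worker simply reports an error instead). The `"webgpu"`/`"wasm"` device strings and the `"q4"`/`"q4f16"` dtype strings are the values the worker below passes to Transformers.js.

```javascript
// Hypothetical helper: choose a Transformers.js device and a dtype for the
// language model from two capability flags. The demo itself requires WebGPU;
// the "wasm" fallback below is an assumption added for illustration.
function chooseBackend({ hasWebGPU, hasShaderF16 }) {
  if (!hasWebGPU) {
    // Assumed CPU fallback through the WASM backend (expect much slower generation).
    return { device: "wasm", languageModelDtype: "q4" };
  }
  // Same choice the demo makes: q4f16 when the adapter reports shader-f16, else q4.
  return {
    device: "webgpu",
    languageModelDtype: hasShaderF16 ? "q4f16" : "q4",
  };
}

// In the browser, the flags would come from navigator.gpu and the adapter:
//   const adapter = await navigator.gpu?.requestAdapter();
//   const flags = { hasWebGPU: !!adapter, hasShaderF16: !!adapter?.features.has("shader-f16") };
console.log(chooseBackend({ hasWebGPU: true, hasShaderF16: false }));
// { device: 'webgpu', languageModelDtype: 'q4' }
```

Keeping the decision in a pure function makes it easy to unit-test without a GPU; only the flag gathering shown in the comment needs a real browser.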

Demo Code Walkthrough

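The following example shows how to load and run DeepSeek Janus-Pro-1B inside a Web Worker so that inference never blocks the UI. The worker answers text questions, describes an attached image, and generates an image when a message starts with "/imagine ". The full code is available in the GitHub repository.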

import {
  AutoProcessor,
  MultiModalityCausalLM,
  BaseStreamer,
  TextStreamer,
  InterruptableStoppingCriteria,
} from "@huggingface/transformers";

// Define constants
const IMAGE_GENERATION_COMMAND_PREFIX = "/imagine ";
const MAX_NEW_TEXT_TOKENS = 1024;

/**
 * Helper function to perform WebGPU feature detection
 */
let fp16_supported = false;
async function check() {
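  try {
    // Browsers without WebGPU do not expose navigator.gpu at all
    if (!navigator.gpu) {
      throw new Error("WebGPU is not supported (navigator.gpu is unavailable)");
    }
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
      throw new Error("WebGPU is not supported (no adapter found)");
    }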
    fp16_supported = adapter.features.has("shader-f16");
    self.postMessage({
      status: "success",
      data: fp16_supported,
    });
  } catch (e) {
    self.postMessage({
      status: "error",
      data: e.toString(),
    });
  }
}

/**
 * This class uses the Singleton pattern to enable lazy-loading of the pipeline
 */
class ImageGenerationPipeline {
  static model_id = "onnx-community/Janus-Pro-1B-ONNX";

  static async getInstance(progress_callback = null) {
    this.processor ??= AutoProcessor.from_pretrained(this.model_id, {
      progress_callback,
    });

    this.model ??= MultiModalityCausalLM.from_pretrained(this.model_id, {
      dtype: fp16_supported
        ? {
            prepare_inputs_embeds: "q4",
            language_model: "q4f16",
            lm_head: "fp16",
            gen_head: "fp16",
            gen_img_embeds: "fp16",
            image_decode: "fp32",
          }
        : {
            prepare_inputs_embeds: "fp32",
            language_model: "q4",
            lm_head: "fp32",
            gen_head: "fp32",
            gen_img_embeds: "fp32",
            image_decode: "fp32",
          },
      device: {
        prepare_inputs_embeds: "wasm", // TODO use "webgpu" when bug is fixed
        language_model: "webgpu",
        lm_head: "webgpu",
        gen_head: "webgpu",
        gen_img_embeds: "webgpu",
        image_decode: "webgpu",
      },
      progress_callback,
    });

    return Promise.all([this.processor, this.model]);
  }
}

class ProgressStreamer extends BaseStreamer {
  constructor(total, on_progress) {
    super();
    this.total = total;
    this.on_progress = on_progress;

    this.count = null;
    this.start_time = null;
  }

  put(value) {
    if (this.count === null) {
      // Ignore the first batch of tokens (prompt)
      this.count = 0;
      this.start_time = performance.now();
      return;
    }

    const progress = ++this.count / this.total;

    this.on_progress({
      count: this.count,
      total: this.total,
      progress,
      time: performance.now() - this.start_time,
    });
  }

  end() {
    /* do nothing */
  }
}

const stopping_criteria = new InterruptableStoppingCriteria();

async function generate(messages) {
  // For this demo, we only respond to the last message
  const message = messages.at(-1);

  // Tell the main thread we are starting
  self.postMessage({ status: "start" });

  // Load the pipeline
  const [processor, model] = await ImageGenerationPipeline.getInstance();

  // Determine if the user wants to generate an image or text
  if (message.content.startsWith(IMAGE_GENERATION_COMMAND_PREFIX)) {
    const text = message.content.replace(IMAGE_GENERATION_COMMAND_PREFIX, "");

    const conversation = [
      {
        role: "<|User|>", // uses title case
        content: text,
      },
    ];
    const inputs = await processor(conversation, {
      chat_template: "text_to_image",
    });

    const callback_function = (output) => {
      self.postMessage({
        status: "image-update",
        ...output,
      });
    };

    const num_image_tokens = processor.num_image_tokens;
    const streamer = new ProgressStreamer(num_image_tokens, callback_function);

    const outputs = await model.generate_images({
      ...inputs,
      min_new_tokens: num_image_tokens,
      max_new_tokens: num_image_tokens,
      do_sample: true,
      streamer,
    });

    const blob = await outputs[0].toBlob();

    // Send the output back to the main thread
    self.postMessage({
      status: "image-update",
      blob,
    });
  } else {
    const inputs = await processor(
      message.image
        ? [
            {
              role: "<|User|>",
              content: "<image_placeholder>\n" + message.content,
              images: [message.image],
            },
          ]
        : [
            {
              role: "<|System|>",
              content:
                "You are a helpful assistant. Answer the user's questions in a concise manner.",
            },
            {
              role: "<|User|>",
              content: message.content,
            },
          ],
    );

    let startTime;
    let numTokens = 0;
    let tps;
    const token_callback_function = () => {
      startTime ??= performance.now();

      if (numTokens++ > 0) {
        tps = (numTokens / (performance.now() - startTime)) * 1000;
      }
    };
    const callback_function = (output) => {
      self.postMessage({
        status: "text-update",
        output,
        tps,
        numTokens,
      });
    };

    const streamer = new TextStreamer(processor.tokenizer, {
      skip_prompt: true,
      skip_special_tokens: true,
      callback_function,
      token_callback_function,
    });

    // Generate response
    const outputs = await model.generate({
      ...inputs,
      max_new_tokens: MAX_NEW_TEXT_TOKENS,
      do_sample: false,
      streamer,
      stopping_criteria,
    });
  }

  // Tell the main thread we are done
  self.postMessage({
    status: "complete",
  });
}

async function load() {
  self.postMessage({
    status: "loading",
    data: "Loading model...",
  });

  // Load the pipeline and save it for future use.
  const [processor, model] = await ImageGenerationPipeline.getInstance((x) => {
    // We also add a progress callback to the pipeline so that we can
    // track model loading.
    self.postMessage(x);
  });

  self.postMessage({ status: "ready" });
}

// Listen for messages from the main thread
self.addEventListener("message", async (e) => {
  const { type, data } = e.data;

  switch (type) {
    case "check":
      check();
      break;

    case "load":
      load();
      break;

    case "generate":
      stopping_criteria.reset();
      generate(data);
      break;

    case "interrupt":
      stopping_criteria.interrupt();
      break;

    case "reset":
      stopping_criteria.reset();
      break;
  }
});
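On the main thread, the page posts `{ type, data }` messages (`check`, `load`, `generate`, `interrupt`, `reset`) to the worker and reacts to the `status` field of the replies. The sketch below models that bookkeeping as a pure reducer; `initialState`, `applyWorkerMessage` and the state shape are hypothetical, and only the statuses the worker above actually posts are handled explicitly.

```javascript
// Hypothetical main-thread state for the worker protocol shown above.
const initialState = {
  phase: "idle", // "idle" | "loading" | "ready" | "generating" | "error"
  fp16Supported: null,
  text: "",
  tps: null,
  imageProgress: null,
  imageBlob: null,
  error: null,
};

// Pure reducer: returns the next state for each message the worker posts.
function applyWorkerMessage(state, msg) {
  switch (msg.status) {
    case "success": // result of the "check" request
      return { ...state, fp16Supported: msg.data };
    case "error":
      return { ...state, phase: "error", error: msg.data };
    case "loading":
      return { ...state, phase: "loading" };
    case "ready":
    case "complete":
      return { ...state, phase: "ready" };
    case "start":
      // A new request: clear the previous output.
      return {
        ...state,
        phase: "generating",
        text: "",
        tps: null,
        imageProgress: null,
        imageBlob: null,
      };
    case "text-update":
      // TextStreamer delivers decoded text chunks, so append them.
      return { ...state, text: state.text + msg.output, tps: msg.tps ?? state.tps };
    case "image-update":
      // Either a ProgressStreamer payload or the final image blob.
      return msg.blob
        ? { ...state, imageBlob: msg.blob }
        : { ...state, imageProgress: msg.progress };
    default:
      // Other statuses (e.g. file-loading progress forwarded from
      // progress_callback) leave the state unchanged.
      return state;
  }
}

let s = initialState;
for (const m of [
  { status: "start" },
  { status: "text-update", output: "Hel", tps: undefined, numTokens: 1 },
  { status: "text-update", output: "lo", tps: 12.5, numTokens: 2 },
  { status: "complete" },
]) {
  s = applyWorkerMessage(s, m);
}
console.log(s.phase, JSON.stringify(s.text), s.tps); // ready "Hello" 12.5
```

In a page, this could be wired up with `worker.addEventListener("message", (e) => { state = applyWorkerMessage(state, e.data); })` and a request started with `worker.postMessage({ type: "generate", data: messages })`, where `messages` is the chat history that the worker's `generate()` reads the last entry from.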

Running the Demo

Try the live demo here: DeepSeek Janus-Pro-1B Browser Demo.

Key features of the demo:

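Real-time progress updates during model loading and inference.
WebGPU-accelerated generation (requires Chrome 113+ or Edge 113+).
Fully client-side execution: no data is sent to external servers.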
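The real-time image progress comes from `ProgressStreamer`, which reports `count`, `total`, `progress` (count divided by total) and `time` (milliseconds since the streamer received the prompt). A hypothetical formatter such as `formatImageProgress` could turn those fields into a status line with a rough ETA; the ETA assumes each remaining token costs about the average time so far, which is only an approximation.

```javascript
// Hypothetical formatter for the payloads ProgressStreamer sends via
// "image-update" messages: { count, total, progress, time }.
// The ETA assumes every remaining image token takes about as long as the
// average so far.
function formatImageProgress({ count, total, progress, time }) {
  const percent = Math.round(progress * 100);
  const eta =
    count > 0
      ? `~${Math.round(((time / count) * (total - count)) / 1000)}s remaining`
      : "estimating time remaining";
  return `${count}/${total} tokens (${percent}%), ${eta}`;
}

// Made-up example values: 144 of 576 tokens generated after 12 seconds.
console.log(formatImageProgress({ count: 144, total: 576, progress: 0.25, time: 12000 }));
// 144/576 tokens (25%), ~36s remaining
```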

Challenges and Optimizations

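Model quantization: instead of full precision, the worker loads the language model with 4-bit weights ("q4f16" when the GPU supports shader-f16, otherwise "q4") and, when shader-f16 is available, uses fp16 for lm_head, gen_head and gen_img_embeds. This shrinks the download and speeds up loading.
Responsiveness: running inference in a Web Worker keeps the main thread free, so the UI does not freeze during generation.
Browser compatibility: WebGPU is not yet available in every browser, but it is critical for performance; the worker's check() handler reports whether a WebGPU adapter exists and whether it supports shader-f16, which decides the dtypes used at load time.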
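A back-of-the-envelope calculation shows why lower-precision dtypes matter for a browser download: raw weight storage is roughly parameter count times bytes per parameter. The sketch below is illustrative only: `estimateWeightGB` is a hypothetical helper, the parameter count is the nominal 1 billion from the model name rather than an exact figure, and real 4-bit formats add per-block scales and other overhead, so treat the results as lower bounds.

```javascript
// Approximate bytes per parameter for each dtype (weights only). Real
// quantized formats also store scales/zero-points, so results are lower bounds.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, q8: 1, q4: 0.5 };

// Hypothetical helper: estimated weight size in gigabytes (1 GB = 1e9 bytes).
function estimateWeightGB(numParams, dtype) {
  if (!Object.prototype.hasOwnProperty.call(BYTES_PER_PARAM, dtype)) {
    throw new Error(`Unknown dtype: ${dtype}`);
  }
  return (numParams * BYTES_PER_PARAM[dtype]) / 1e9;
}

// Nominal 1e9 parameters, taken from the "1B" in the model name.
for (const dtype of ["fp32", "fp16", "q8", "q4"]) {
  console.log(`${dtype}: ~${estimateWeightGB(1e9, dtype)} GB`);
}
// fp32: ~4 GB, fp16: ~2 GB, q8: ~1 GB, q4: ~0.5 GB
```

Going from fp32 to 4-bit weights cuts raw weight storage by about 8×, which is why the worker loads the language model as q4 or q4f16.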

Conclusion

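Running DeepSeek Janus-Pro-1B in the browser shows the potential of client-side AI. With tools like Transformers.js and WebGPU, complex models can now run efficiently in resource-constrained environments while protecting user privacy.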

Next steps:

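Experiment with different prompts and model configurations.
Explore fine-tuning the model for domain-specific tasks.
Keep an eye on WebGPU adoption to ensure broader compatibility.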

For developers, this marks an exciting step toward decentralized, user-centric AI applications. Dive into the sample code and start building! 🚀

Source: https://dev.to/emojiiii/running-deepseek-janus-pro-1b-in-the-browser-a-step-by-step-guide-kj2
