<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Inference on Osman&#39;s Odyssey: Byte &amp; Build</title>
    <link>https://www.ahmadosman.com/tags/inference/</link>
    <description>Recent content in Inference on Osman&#39;s Odyssey: Byte &amp; Build</description>
    <image>
      <title>Osman&#39;s Odyssey: Byte &amp; Build</title>
      <url>https://www.ahmadosman.com/logo/byte-and-build.png</url>
      <link>https://www.ahmadosman.com/logo/byte-and-build.png</link>
    </image>
    <generator>Hugo -- 0.145.0</generator>
    <language>en-us</language>
    <lastBuildDate>Fri, 14 Feb 2025 02:56:56 -0600</lastBuildDate>
    <atom:link href="https://www.ahmadosman.com/tags/inference/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Key Highlights From Running DeepSeek R-1 671B on 14x RTX 3090s &#43; Epyc 7713 &amp; 512GB RAM</title>
      <link>https://www.ahmadosman.com/blog/r1-ktransformers-inference-livestream/</link>
      <pubDate>Fri, 14 Feb 2025 02:56:56 -0600</pubDate>
      <guid>https://www.ahmadosman.com/blog/r1-ktransformers-inference-livestream/</guid>
      <description>Key takeaways from livestreaming DeepSeek R-1 671B (4-bit) on a 14x RTX 3090 basement AI server. See how KTransformers crushed llama.cpp in prompt eval speeds, compare setups, and get real-world insights into massive LLM inference with vLLM, ExLlamaV2, and more.</description>
    </item>
    <item>
      <title>Serving AI From The Basement — Part II</title>
      <link>https://www.ahmadosman.com/blog/serving-ai-from-the-basement-part-ii/</link>
      <pubDate>Wed, 18 Sep 2024 05:57:26 -0500</pubDate>
      <guid>https://www.ahmadosman.com/blog/serving-ai-from-the-basement-part-ii/</guid>
      <description>SWE Agentic Framework, MoEs, Quantizations &amp; Mixed Precision, Batch Inference, LLM Architectures, vLLM, DeepSeek v2.5, Embedding Models, and Speculative Decoding: An LLM Brain Dump... I have been working on a multi-agent system that simulates a team of software engineers. The system assigns projects, creates teams and adds members to them based on areas of expertise and need, and asks team members to build features, assign story points, hold pair programming sessions, etc.</description>
    </item>
  </channel>
</rss>
