- FP8 is ~100 TFLOPS faster when the kernel name has "cutlass" in it
by limoce on 7/11/25, 10:36 AM, with comments
- Polaris: A Post-training recipe for scaling RL on Advanced Reasoning models
by limoce on 7/9/25, 6:58 AM, with comments
- Overclocking LLM Reasoning: Monitoring and Controlling LLM Thinking Path Lengths
by limoce on 7/6/25, 12:53 PM, with comments
- Neutrino: Probing-Based eBPF-Like GPU Kernel Profiling
by limoce on 7/1/25, 10:33 AM, with comments
- Machine Learning Conferences Should Establish a "Refutations and Critiques" Track
by limoce on 6/26/25, 10:27 AM, with comments
- SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines
by limoce on 3/4/25, 7:26 AM, with comments
- SepLLM: Accelerate LLMs by Compressing One Segment into One Separator
by limoce on 3/3/25, 1:27 PM, with comments
- Step-Video-T2V: The Practice, Challenges, and Future of Video Foundation Model
by limoce on 2/17/25, 9:54 AM, with comments
- Logic R1: Reproduce DeepSeek R1 Zero on 2K Logic Puzzle Dataset
by limoce on 2/5/25, 4:02 AM, with comments
- Libnginx: Nginx as a Shared Library
by limoce on 2/4/25, 7:56 AM, with comments
- DeepSeek-VL2: MoE Vision-Language Models for Advanced Multimodal Understanding [pdf]
by limoce on 12/13/24, 12:53 PM, with comments
- Fast vectorizable algorithms of binary searching for floating point numbers
by limoce on 11/15/24, 12:53 AM, with comments
- New OpenAI Feature: Predicted Outputs
by limoce on 11/5/24, 2:47 AM, with comments
- Collaborative Filtering Is Wrong and Here Is Why
by limoce on 10/24/24, 9:03 AM, with comments
- REST: A Plug-and-Play Method for Accelerating LLM Without Additional Training
by limoce on 10/20/24, 6:13 AM, with comments
- Smoke 'em if you got 'em: Hacker gains root access using cigarette lighter
by limoce on 10/12/24, 1:20 PM, with comments
- O1 Replication Journey: A Strategic Progress Report
by limoce on 10/9/24, 8:09 AM, with comments
- Failures of Gradient-Based Deep Learning (2017) [pdf]
by limoce on 8/15/24, 10:19 AM, with comments
- Qwen2-VL
by limoce on 8/14/24, 8:20 AM, with comments
- Qwen2-Math
by limoce on 8/8/24, 3:00 PM, with comments
- FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention
by limoce on 8/8/24, 7:24 AM, with comments
- MiniCPM-v2.6: GPT-4V Level MLLM for Single/Multi Image and Video on Your Phone
by limoce on 8/7/24, 2:00 AM, with comments
- MindSearch: LLM-Based Web Search Engine Similar to Perplexity.ai and SearchGPT
by limoce on 8/1/24, 8:53 AM, with comments
- Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
by limoce on 6/12/24, 11:01 AM, with comments
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone
by limoce on 6/11/24, 2:19 PM, with comments
- Large-scale photonic chiplet Taichi empowers 160TOPS/W AI
by limoce on 4/12/24, 7:49 AM, with comments
- Asterinas: OS kernel written in Rust and providing Linux-compatible ABI
by limoce on 3/5/24, 8:52 AM, with comments
- mq-deadline scalability improvements (with more than 100% improvement)
by limoce on 1/20/24, 12:04 PM, with comments
- Researchers Create First Functional Semiconductor Made from Graphene
by limoce on 1/5/24, 4:45 AM, with comments
- Wayland Enjoyed Many Successes in 2023
by limoce on 1/2/24, 3:54 AM, with comments
- Improving our safety with a physical quantities and units library
by limoce on 12/23/23, 6:01 AM, with comments
- Nesting chinstrap penguins sleep by seconds-long microsleeps
by limoce on 12/22/23, 2:31 PM, with comments
- PowerInfer: High-Speed Large Language Model Serving on Consumer-Grade GPUs
by limoce on 12/19/23, 12:19 PM, with comments
- Zpoline: System Call Hook for Linux
by limoce on 7/20/23, 4:18 AM, with comments
- Randomized Single-Source Shortest Path Algo. on Undirected Real-Weighted Graphs
by limoce on 7/11/23, 7:26 AM, with comments
- ChatGPT powered Rust proc macro that generates code at compile-time
by limoce on 3/13/23, 2:31 AM, with comments
- Memcpy is faster than memset on Intel i7 12700 with glibc 2.36
by limoce on 1/1/23, 12:50 PM, with comments
- Year-in-search-trends: Visualization of search interest over time
by limoce on 12/30/22, 1:28 PM, with comments
- Make call_rcu() lazy to save power
by limoce on 12/30/22, 12:58 PM, with comments
- IOMMUFD
by limoce on 12/30/22, 12:57 PM, with comments
- Chinese Chipmaker Loongson Readies 3A6000 to Tackle Zen 3 and Tiger Lake
by limoce on 10/18/22, 2:26 PM, with comments
- Skyplane: Optimizing Transfer Cost and Throughput Using Cloud-Aware Overlays
by limoce on 10/18/22, 9:39 AM, with comments
- Slirp Is Dead, Long Live Slirp: a New Approach to User-Mode Networking
by limoce on 9/21/22, 2:32 PM, with comments
- eRPC: A fast remote procedure call library for datacenters
by limoce on 8/30/22, 7:01 AM, with comments
- Poly-time algorithm for deciding Hilbert Nullstellensatz. A proof of P=NP
by limoce on 8/16/22, 3:03 PM, with comments
- How hard is it to open a file (2016)
by limoce on 8/14/22, 1:19 AM, with comments