{"ID":2879324,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16151","arxiv_id":"2508.16151","title":"Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates","abstract":"The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric, achieving several orders of magnitude computational efficiency improvement by extreme specialization. However, a significant challenge still lies in the scale of modern LLMs. A straightforward hardwiring of gpt-oss 120 B would require fabricating photomask sets valued at over 6 billion dollars, rendering this straightforward solution economically impractical. Addressing this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 out of 70 photomask layers are homogeneous across chips, including all EUV photomasks. In total, Metal-Embedding reduced the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieved 249,960 tokens/s (5,555x/85x that of GPU/WSE), 36 tokens/J (1,047x/283x that of GPU/WSE), 13,232 mm2 total die area, $59.46 M-123.5 M estimated NRE at 5 nm technology. Analysis shows that HNLPU achieved 41.7-80.4x improvement in cost-effectiveness and 357x reduction in carbon footprint compared to OpenAI-scale H100 clusters, under an annual weight updating assumption.","short_abstract":"The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwir...","url_abs":"https://arxiv.org/abs/2508.16151","url_pdf":"https://arxiv.org/pdf/2508.16151v2","authors":"[\"Yang Liu\",\"Yi Chen\",\"Yongwei Zhao\",\"Yifan Hao\",\"Zifu Zheng\",\"Weihao Kong\",\"Zhangmai Li\",\"Dongchen Jiang\",\"Ruiyang Xia\",\"Zhihong Ma\",\"Zisheng Liu\",\"Zhaoyong Wan\",\"Yunqi Lu\",\"Ximing Liu\",\"Hongrui Guo\",\"Zhihao Yang\",\"Zhe Wang\",\"Tianrui Ma\",\"Mo Zou\",\"Rui Zhang\",\"Ling Li\",\"Xing Hu\",\"Zidong Du\",\"Zhiwei Xu\",\"Qi Guo\",\"Tianshi Chen\",\"Yunji Chen\"]","published":"2025-08-22T07:20:19Z","proceeding":"cs.AR","tasks":"[\"cs.AR\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}