# Scalable and Open-source Edge AI Architecture

**練維漢** Wei-han Lien, Chief Architect and Senior Fellow



July 2024

## **Digital Transformation**



AI



Automation



### The human race is entering a digital AI transformation

- AI revolution: machine intelligence replaces human intellect
- Reshape business models, practices, and cultures for competitiveness
- Data analytics are revolutionizing the digital landscape
- Real-time data and streamlined processing enable agile decision-making and strategy adjustments
- Digital insights allow personalized experiences and tailored solutions, fostering customer loyalty



## AI Personalization

- ChatGPT3 significantly improves AI usability
- Intimacy of AI
  - Tailored experiences
  - Real-time processing of nuanced inputs
  - Human-like interactions
  - Privacy and performance
  - Adaptable and Evolving





## Digital Transformation Compute Everywhere

- Exponential growth in AI model size since 2012
- RISC-V started in 2010
- ChatGPT4 = 2 trillion parameters
- Data generation = 2.5 quintillion bytes per day
- Both are still growing
- How about power and cost?

2 × 10<sup>12</sup> parameters × 2.5 × 10<sup>18</sup> bytes of data per day
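To put the two figures above in more familiar units, here is a back-of-envelope sketch. The parameter count and daily data volume are the ones quoted on this slide; the bytes-per-parameter widths are illustrative assumptions and are not taken from the talk.

```python
# Back-of-envelope sizing for the figures quoted on this slide.
PARAMS = 2e12            # parameter count quoted above
BYTES_PER_DAY = 2.5e18   # daily data generation quoted above

# Assumed storage widths per parameter (illustrative, not from the talk).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_footprint_tb(params: float, bytes_per_param: int) -> float:
    """Return the raw weight storage in terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


def daily_data_eb(bytes_per_day: float) -> float:
    """Return daily data generation in exabytes (1 EB = 1e18 bytes)."""
    return bytes_per_day / 1e18


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_footprint_tb(PARAMS, width):.0f} TB of weights")
    print(f"data generated per day: {daily_data_eb(BYTES_PER_DAY):.1f} EB")
```

Even at one byte per parameter, the weights alone would occupy terabytes under these assumptions, which is one way to read the slide's closing question about power and cost.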



Compute everywhere

RISC-V Start

## Distribute Compute for AI



- Localized Processing
- Adaptive Resource Allocation
- Hierarchical Data Processing
- Efficient Pathway for Communication
- Intelligent Scaling and Dormancy
- Redundancy for Fault Tolerance and Recovery
- Real-time Energy
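As a rough illustration of the localized and hierarchical processing items in the list above, the sketch below has each edge node reduce its raw samples to a small fixed-size summary before forwarding it to a higher tier. The node names, sample data, and summary format are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Summary:
    """Fixed-size digest that a node forwards instead of raw samples."""
    count: int
    total: float

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def summarize_locally(samples: List[float]) -> Summary:
    """Edge tier: reduce raw samples to a summary on the device itself."""
    return Summary(count=len(samples), total=sum(samples))


def merge(summaries: List[Summary]) -> Summary:
    """Upper tier: combine summaries without ever seeing the raw samples."""
    return Summary(
        count=sum(s.count for s in summaries),
        total=sum(s.total for s in summaries),
    )


if __name__ == "__main__":
    # Hypothetical readings from two edge nodes.
    raw: Dict[str, List[float]] = {
        "sensor-a": [1.0, 2.0, 3.0],
        "sensor-b": [4.0, 6.0],
    }
    local = [summarize_locally(values) for values in raw.values()]
    combined = merge(local)
    print(combined.count, combined.mean)
```

The upper tier receives two numbers per node regardless of how many samples the node collected, so communication cost stays constant as local data grows.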

![](_page_4_Picture_9.jpeg)

## Unified AI Architecture

- AI pervasive computing from mW to MW
  - Client devices
  - Edge devices
  - Data centers
- Tenstorrent provides key scalable AI enablement technologies
  - CPU
  - AI
  - Chiplets

![](_page_5_Picture_9.jpeg)

CPU/AI

![](_page_5_Picture_10.jpeg)

![](_page_5_Picture_12.jpeg)

Chiplet

![](_page_5_Picture_14.jpeg)

Whitebox

![](_page_5_Figure_16.jpeg)

![](_page_5_Picture_17.jpeg)

## Benefits of Open-source

![](_page_6_Figure_1.jpeg)

![](_page_6_Picture_2.jpeg)

Open Standard CPU

![](_page_6_Picture_4.jpeg)

Efficient

Stable

## Software, Silicon and Systems to Run AI and ML Fast

![](_page_7_Figure_1.jpeg)

## Our Technology

#### IP (Ascalon / Tensix-Neo)

![](_page_8_Picture_2.jpeg)

- Scales from mW to MW for efficiency and performance
- IP available for licensing
- Industry-leading performance
- Modular design available in varied configurations

#### Chips & Chiplets

![](_page_8_Picture_8.jpeg)

- Portfolio of cards powered by scalable Tensix AI cores
- Inference and Training, CNN and NLP, Recommendation Engines, all on the same silicon
- Hardware available for purchase, as well as IP available for licensing
- Multi-component modular chiplets

#### Servers (Galaxy)

![](_page_8_Picture_14.jpeg)

- Galaxy Server: 32 high-performance cards in a custom chassis, starts shipping in 2024
- Servers are easily combined into a Galaxy Rack for high-bandwidth chip-to-chip connectivity

![](_page_8_Figure_17.jpeg)

#### Software

- ML compilers that scale from one chip to thousands
- Buda Automated AI/ML Compiler
- Metalium Bare Metal Software Stack

![](_page_8_Picture_21.jpeg)

![](_page_9_Figure_0.jpeg)

tenstorrent


## Scalable Tensix Element

![](_page_10_Picture_1.jpeg)

Grayskull: 120 Tensix cores

![](_page_10_Picture_3.jpeg)

![](_page_10_Figure_4.jpeg)

- Communication Subsystem
  - RISC-V controlled NoC subsystems
- Computation subsystem
  - General computation: "Baby" RISC-V
  - RISC-V controlled Matrix and vector engines
- Collaboration Mechanisms
  - Hardware supported through RISC-V
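One way to picture how work spreads across a grid of compute cores like the one described above is a tiled matrix multiply in which each core owns one output tile. The sketch below is a plain-Python illustration only: the grid shape, function names, and tiling scheme are hypothetical and do not represent the Tenstorrent software stack or its APIs.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Reference matrix multiply, standing in for one core's matrix engine."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def tiled_matmul(a: Matrix, b: Matrix, grid_rows: int, grid_cols: int) -> Matrix:
    """Split the output into grid_rows x grid_cols tiles, one per hypothetical core."""
    m, n = len(a), len(b[0])
    assert m % grid_rows == 0 and n % grid_cols == 0, "sizes must divide evenly"
    tile_m, tile_n = m // grid_rows, n // grid_cols
    out = [[0.0] * n for _ in range(m)]
    for core_r in range(grid_rows):
        for core_c in range(grid_cols):
            # Each core receives a row band of A and a column band of B.
            a_band = a[core_r * tile_m:(core_r + 1) * tile_m]
            b_band = [row[core_c * tile_n:(core_c + 1) * tile_n] for row in b]
            tile = matmul(a_band, b_band)
            # Write the finished tile back into the full result.
            for i, tile_row in enumerate(tile):
                r = core_r * tile_m + i
                out[r][core_c * tile_n:(core_c + 1) * tile_n] = tile_row
    return out
```

Because every tile depends only on its own bands of the inputs, the two loops over cores could run concurrently; on real hardware the bands would travel over the on-chip network rather than being sliced from shared memory.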

## Scalable AI Architecture

![](_page_11_Figure_1.jpeg)

#### AI scalability from 1 Tensix core to thousands of chips

![](_page_11_Picture_3.jpeg)

## Ascalon O-o-O Superscalar Processor

- Disruptive high-performance RISC-V processor for AI and server
- Best performance & power efficiency

#### RVA23

- Advanced branch predictors
- 8-wide decode
- 3 LD/ST with large load/store queues
- 6 ALU/2 BR
- 2 × 256-bit vector units
- 2 FPU units
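The unit counts above bound how many instructions of each class can issue per cycle. The sketch below is a simplified upper-bound model, not a description of Ascalon's actual scheduler: it assumes every unit accepts one instruction per cycle, and the instruction mixes passed to it are hypothetical.

```python
# Unit counts taken from the list above.
# One issue per unit per cycle is an assumption made for this model.
DECODE_WIDTH = 8
UNITS = {"ldst": 3, "alu": 6, "branch": 2, "vector": 2, "fpu": 2}
VECTOR_BITS = 256


def peak_ipc(mix: dict) -> float:
    """Upper bound on instructions per cycle for an instruction mix.

    `mix` maps a unit class to its fraction of the instruction stream.
    IPC is capped by decode width and by the most contended unit class.
    """
    total = sum(mix.values())
    assert abs(total - 1.0) < 1e-9, "fractions must sum to 1"
    bound = float(DECODE_WIDTH)
    for unit, fraction in mix.items():
        if fraction > 0:
            bound = min(bound, UNITS[unit] / fraction)
    return bound


def vector_lanes_per_cycle(element_bits: int) -> int:
    """Elements processed per cycle across both vector units at full occupancy."""
    return UNITS["vector"] * VECTOR_BITS // element_bits
```

For example, a stream that is half loads/stores and half ALU operations is limited by the three load/store units rather than by decode width, while `vector_lanes_per_cycle(32)` gives the number of 32-bit elements both vector units could process per cycle under this model.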

![](_page_12_Figure_10.jpeg)

![](_page_12_Picture_11.jpeg)

![](_page_12_Picture_12.jpeg)

### Tenstorrent RISC-V O-o-O Processor Family

![](_page_13_Picture_1.jpeg)

Decode Width

## Chiplet

- Design Reuse
- Composability
- Scalability

![](_page_14_Figure_4.jpeg)

## Auto IP potential use cases in ADAS/ADS

![](_page_15_Picture_1.jpeg)


## Summary

- AI compute is pervasive
- Unified scalable architecture
  - Scalable AI
  - Scalable RISC-V
  - Chiplet
  - Open-source for innovation

- Edge AI
  - Necessary for tailored user experiences
  - Deployment constraints
    - Power and Thermal
    - Confidentiality
    - Safety

![](_page_16_Picture_13.jpeg)

![](_page_16_Picture_14.jpeg)