Abstract
In wireless communication systems, stringent requirements such as ultra-low latency and low power consumption have significantly increased the demand for efficient algorithm-to-hardware deployment. A2HCoder addresses the persistent gap between algorithm design and hardware implementation by introducing a hierarchical framework that enhances both robustness and interpretability while suppressing common hallucination issues in LLM-generated code.
Framework Architecture
A2HCoder operates under a hierarchical transformation strategy that incorporates both horizontal algorithm decomposition and vertical refinement flow. The framework bridges the semantic and structural gap between high-level algorithm design and hardware synthesis through a structured, multi-stage transformation pipeline.

Horizontal Decomposition
Disassembles complex communication algorithms into modular, loosely coupled components, simplifying the translation process and improving robustness.
Vertical Refinement
Performs step-by-step refinement from MATLAB source code to synthesizable C code, ultimately targeting HDL synthesis via existing HLS toolchains.
Stream-based Adaptation
Reconciles the frame-based global memory paradigm of MATLAB with the stream-based, dataflow-oriented execution preferred in hardware.
Feedback Loops
Agent-style feedback loops allow the LLM to iteratively revise outputs based on synthesis feedback and task constraints.
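
The feedback loop can be pictured as a revise-until-pass cycle. The sketch below is only an illustration under stated assumptions: `SynthReport`, `refineUntilPass`, and the two callbacks are hypothetical names, with `generate` standing in for an LLM call and `synthesize` standing in for an HLS run. The source does not describe the loop at this level of detail.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical result of one synthesis attempt (not a real tool report format).
struct SynthReport {
    bool passed;           // true when the design met all constraints
    std::string feedback;  // messages handed back to the code generator
};

// Generic revise-until-pass loop.
// `generate` takes the previous feedback and returns new code.
// `synthesize` checks the code and reports success or failure.
// Returns the number of attempts used, or -1 if the budget is exhausted.
int refineUntilPass(
    const std::function<std::string(const std::string&)>& generate,
    const std::function<SynthReport(const std::string&)>& synthesize,
    int maxAttempts) {
    assert(maxAttempts > 0);
    std::string feedback;  // empty on the first attempt
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        const std::string code = generate(feedback);
        const SynthReport report = synthesize(code);
        if (report.passed) {
            return attempt;
        }
        feedback = report.feedback;  // feed the failure back into the next try
    }
    return -1;
}
```

Bounding the number of attempts keeps the loop from running indefinitely when a constraint cannot be met.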
Three-Stage Processing Pipeline
1. Code Adaptation within MATLAB Domain
The first stage operates entirely within the MATLAB domain, adapting high-level, CPU-oriented code to meet FPGA architecture constraints. Global memory accesses are restructured into sequential patterns, and batch-based operations are rewritten into sample-wise streaming pipelines.
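
To illustrate what moving from free indexing to sequential access means, here is a small sketch written in C++ rather than MATLAB. It uses a three-point local-maximum detector as a hypothetical example (it is not one of the paper's modules): the frame-based version indexes neighbours in a stored array, while the streaming version keeps only the two previous samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Frame-based style: the whole frame is in memory and indexed freely.
std::vector<int> localMaximaFrame(const std::vector<int>& x) {
    std::vector<int> peaks;
    for (std::size_t i = 1; i + 1 < x.size(); ++i) {
        if (x[i] > x[i - 1] && x[i] > x[i + 1]) {
            peaks.push_back(static_cast<int>(i));
        }
    }
    return peaks;
}

// Streaming style: samples arrive one at a time; only two past samples are kept.
class LocalMaxStream {
public:
    // Push one sample. Returns the index of a confirmed peak (always the
    // previous sample), or -1 when this sample confirms no peak.
    int push(int sample) {
        int peak = -1;
        if (count_ >= 2 && prev1_ > prev2_ && prev1_ > sample) {
            peak = count_ - 1;
        }
        prev2_ = prev1_;
        prev1_ = sample;
        ++count_;
        return peak;
    }

private:
    int prev1_ = 0;  // x[n-1]
    int prev2_ = 0;  // x[n-2]
    int count_ = 0;  // number of samples consumed so far
};

// Convenience wrapper that feeds a frame through the streaming detector.
std::vector<int> localMaximaStream(const std::vector<int>& x) {
    LocalMaxStream detector;
    std::vector<int> peaks;
    for (int v : x) {
        const int p = detector.push(v);
        if (p >= 0) {
            peaks.push_back(p);
        }
    }
    return peaks;
}
```

Both versions report the same peak indices; the streaming one reports each peak one sample late, which is the usual price of not being able to look ahead.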

2. Code Translation from MATLAB to HLS C++
The second stage translates the adapted MATLAB code into synthesizable HLS C++, restructuring function interfaces and internal operations around stream-oriented structures suited to hardware execution.
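
The interface change can be sketched in plain C++. `SimpleStream` below is a hypothetical FIFO stand-in built on `std::queue`, not the stream class of any HLS toolchain, and `scaleFrame`/`scaleStream` are made-up functions used only to contrast the two calling conventions.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Minimal FIFO stand-in for an HLS stream type (hypothetical; a real design
// would use the stream class shipped with the HLS toolchain).
template <typename T>
class SimpleStream {
public:
    void write(const T& v) { q_.push(v); }
    T read() {
        assert(!q_.empty());  // reading an empty stream is a usage error here
        T v = q_.front();
        q_.pop();
        return v;
    }
    bool empty() const { return q_.empty(); }

private:
    std::queue<T> q_;
};

// Array-style interface: input and output frames are whole containers.
std::vector<int> scaleFrame(const std::vector<int>& in, int gain) {
    std::vector<int> out;
    for (int v : in) {
        out.push_back(v * gain);
    }
    return out;
}

// Stream-style interface: one sample is read and one written per iteration.
void scaleStream(SimpleStream<int>& in, SimpleStream<int>& out,
                 int gain, int numSamples) {
    for (int i = 0; i < numSamples; ++i) {
        out.write(in.read() * gain);
    }
}
```

The stream version never sees the frame as a whole, which is what allows the loop body to become a pipelined hardware stage.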

3. Optimization and Refinement
The third stage optimizes the validated HLS C++ code to reduce resource utilization and computation latency, using buffer management mechanisms and design space exploration techniques.
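
The source does not say which design space exploration method is used. As one possible reading, the sketch below picks the lowest-latency candidate that fits a resource budget. The `Candidate` fields, the `pickBest` helper, and any numbers fed to it are hypothetical; in practice the estimates would come from HLS reports.

```cpp
#include <cassert>
#include <vector>

// One candidate configuration with its estimated cost.
// All fields are illustrative placeholders.
struct Candidate {
    int unrollFactor;
    int luts;
    int latencyCycles;
};

// Returns the index of the lowest-latency candidate whose LUT estimate fits
// the budget, or -1 if no candidate fits.
int pickBest(const std::vector<Candidate>& candidates, int lutBudget) {
    assert(lutBudget >= 0);
    int best = -1;
    for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
        if (candidates[i].luts > lutBudget) {
            continue;  // violates the resource constraint
        }
        if (best < 0 ||
            candidates[i].latencyCycles < candidates[best].latencyCycles) {
            best = i;
        }
    }
    return best;
}
```

An exhaustive scan like this is only practical for small candidate sets; larger spaces would need a smarter search.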

4. System-Level Integration
After the three submodule-level stages, the individually refined modules are composed into a unified, executable hardware design using a stream-based dataflow architecture.
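
Composition through streams can be sketched as follows. The two stages and the `topLevel` wrapper are hypothetical and use `std::queue` as a software stand-in for hardware FIFOs; they are not the modules of the SSB detector.

```cpp
#include <cassert>
#include <queue>
#include <vector>

using Fifo = std::queue<int>;  // software stand-in for a hardware stream

// Stage 1 (hypothetical): squared magnitude of each real-valued sample.
void energyStage(Fifo& in, Fifo& out) {
    while (!in.empty()) {
        const int v = in.front();
        in.pop();
        out.push(v * v);
    }
}

// Stage 2 (hypothetical): flag samples whose energy exceeds a threshold.
void detectStage(Fifo& in, Fifo& out, int threshold) {
    while (!in.empty()) {
        const int e = in.front();
        in.pop();
        out.push(e > threshold ? 1 : 0);
    }
}

// Top level: modules talk only through FIFOs, so each one can be verified
// on its own and then wired together.
std::vector<int> topLevel(const std::vector<int>& samples, int threshold) {
    assert(threshold >= 0);  // energies are non-negative
    Fifo input, energy, flags;
    for (int s : samples) {
        input.push(s);
    }
    energyStage(input, energy);
    detectStage(energy, flags, threshold);
    std::vector<int> result;
    while (!flags.empty()) {
        result.push_back(flags.front());
        flags.pop();
    }
    return result;
}
```

In this software model the stages run one after another; in a dataflow hardware design they would run concurrently, each consuming samples as the previous stage produces them.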

Experimental Results
We validated A2HCoder through a real-world deployment case in the 5G wireless communication domain, implementing a complete 5G New Radio (NR) Synchronization Signal Block (SSB) detection system with five core submodules.
Ablation Study Results
Method | LUT | FF | DSP | BRAM | Latency | Clock (MHz) |
---|---|---|---|---|---|---|
calcThreshold - Direct | 36,500 | 80,434 | 38 | 16 | 6,385 | Failed |
calcThreshold - Adaptation | 685 | 1,176 | 24 | 4 | 6,301 | 277.09 |
calcThreshold - Refinement | 173 | 274 | 3 | 1 | 6,013 | 322.27 |
extractSSBsig - Direct | 4,468 | 7,071 | 0 | 24 | 24,890 | 265.11 |
extractSSBsig - Adaptation | 275 | 353 | 0 | 4 | 12,441 | 253.29 |
extractSSBsig - Refinement | 155 | 148 | 0 | 4 | 6,730 | 269.11 |
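
The relative savings in the table can be worked out with a one-line formula. The helper name `reductionPercent` is an illustrative choice; the inputs used in the note below are taken directly from the table.

```cpp
#include <cassert>
#include <cmath>

// Percentage reduction when a metric goes from `before` to `after`.
double reductionPercent(double before, double after) {
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For example, `reductionPercent(36500, 173)` gives about 99.5% for the calcThreshold LUT count (Direct to Refinement), and `reductionPercent(24890, 6730)` gives about 73% for the extractSSBsig latency.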
Interactive Demo
Explore the three-stage transformation process of A2HCoder with the calcThreshold function. See how the original MATLAB code evolves through the adaptation, translation, and optimization stages to become an efficient hardware implementation.
Stage 1: Code Adaptation within MATLAB Domain
Transform the original MATLAB algorithm to be hardware-friendly while staying in MATLAB. The filter-based approach is converted to a circular buffer with a running sum for streaming compatibility.
Key Changes: Filter function → Circular buffer, Batch processing → Sample-wise streaming, Global memory → Sequential access

Original MATLAB code:

    % This function calculates threshold for correlation values from signal energy.
    % Author: Jie Lei
    % Date: 03/19/2025
    % Revision: 0.1
    %--------------------------------------------------------------------------
    function threshold = calcThreshold(RxSignal)
        % Calculate signal energy using a moving average filter
        Nfft = 256; % Number of FFT points

        % Create a moving average filter with length Nfft
        % The filter coefficients are all equal to 1
        % The filter is applied to the squared magnitude of the input signal
        % to calculate the signal energy
        % The filter is implemented using the filter function in MATLAB
        energyFilt = ones(Nfft,1);
        sig_mag_sq = abs(RxSignal).^2;
        sigEnergy = filter(energyFilt,1,sig_mag_sq);

        minimumThreshold = Nfft*((2^-12)^2); % Set minimumThreshold to avoid triggering on noise
        PSSThreshold_dB = -6; % Strength threshold in dBs (0 is least sensitive).
        thresholdScaling = 10^(PSSThreshold_dB/10); % Threshold scaling factor

        % Calculate threshold for correlation values from signal energy.
        threshold = sigEnergy.*thresholdScaling; % Calculate the threshold using the signal power and threshold scaling factor
        threshold(threshold<minimumThreshold) = minimumThreshold; % Apply minimum threshold value where calculated threshold is lower than the minimum
    end

Adapted MATLAB code:

    function threshold = calcThreshold_hls(RxSignal)
        Nfft = 256;
        minimumThreshold = Nfft*(2^-12)^2;
        PSSThreshold_dB = -6;
        thresholdScaling = 10^(PSSThreshold_dB/10);

        inputLength = length(RxSignal);
        threshold = zeros(inputLength, 1);
        circularBuffer = zeros(Nfft, 1);
        writePtr = 0;
        bufferSum = 0;

        for i = 0:inputLength-1
            currentSample = RxSignal(i+1);
            sig_mag_sq = abs(currentSample)^2;

            oldValue = circularBuffer(writePtr + 1);
            circularBuffer(writePtr + 1) = sig_mag_sq;
            bufferSum = bufferSum - oldValue + sig_mag_sq;

            sigEnergy = bufferSum;
            thresholdValue = sigEnergy * thresholdScaling;
            if thresholdValue < minimumThreshold
                thresholdValue = minimumThreshold;
            end
            threshold(i+1) = thresholdValue;

            writePtr = writePtr + 1;
            if writePtr >= Nfft
                writePtr = 0;
            end
        end
    end
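
The two MATLAB versions should produce the same energy sequence, because `filter(ones(N,1),1,x)` with zero initial state is a sum over the last N inputs. The C++ sketch below checks that equivalence on integer data; the function names `movingSumDirect` and `movingSumCircular` are illustrative and do not appear in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference form: y[n] is the sum of the last N inputs, recomputed from
// scratch for every output (O(N) work per sample).
std::vector<long> movingSumDirect(const std::vector<long>& x, std::size_t N) {
    assert(N > 0);
    std::vector<long> y(x.size(), 0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        const std::size_t start = (n + 1 >= N) ? n + 1 - N : 0;
        for (std::size_t k = start; k <= n; ++k) {
            y[n] += x[k];
        }
    }
    return y;
}

// Streaming form: circular buffer plus running sum (O(1) work per sample).
std::vector<long> movingSumCircular(const std::vector<long>& x, std::size_t N) {
    assert(N > 0);
    std::vector<long> buffer(N, 0);
    std::vector<long> y;
    std::size_t writePtr = 0;
    long sum = 0;
    for (long v : x) {
        sum += v - buffer[writePtr];  // add the new sample, drop the oldest
        buffer[writePtr] = v;
        writePtr = (writePtr + 1) % N;
        y.push_back(sum);
    }
    return y;
}
```

With `x = {1, 2, 3, 4, 5, 6}` and `N = 3`, both functions return `{1, 3, 6, 9, 12, 15}`.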
Stage 2: Code Translation from MATLAB to HLS C++
Translate the hardware-optimized MATLAB code to HLS C++ with streaming interfaces and hardware-specific optimizations.
Key Changes: MATLAB syntax → C++, Arrays → Streams, Function calls → Hardware interfaces, Data types → Fixed-point

Translated HLS C++ code:

    void calcThreshold(
        ComplexStream &rxSignal,
        ThresholdStream &threshold,
        int numSamples
    ) {
        // HLS interface pragmas for AXI4-Stream and AXI4-Lite
        #pragma HLS INTERFACE axis port=rxSignal
        #pragma HLS INTERFACE axis port=threshold
        #pragma HLS INTERFACE s_axilite port=numSamples
        #pragma HLS INTERFACE s_axilite port=return

        // Circular buffer for storing squared magnitude values
        FixedPoint circularBuffer[NFFT];
        #pragma HLS ARRAY_PARTITION variable=circularBuffer type=cyclic factor=2
        #pragma HLS RESOURCE variable=circularBuffer core=RAM_2P_BRAM

        // State variables
        int writePtr = 0;
        EnergyType bufferSum = 0;

        // Constants for threshold calculation
        const ThresholdType thresholdScaling = THRESHOLD_SCALING;
        const ThresholdType minimumThreshold = MINIMUM_THRESHOLD;

        // Initialize circular buffer to zero
        INIT_BUFFER: for (int i = 0; i < NFFT; i++) {
            #pragma HLS PIPELINE II=1
            circularBuffer[i] = 0;
        }

        // Main processing loop
        PROCESS_SAMPLES: for (int sampleIdx = 0; sampleIdx < numSamples; sampleIdx++) {
            #pragma HLS PIPELINE II=1
            #pragma HLS LOOP_TRIPCOUNT min=1024 max=8192 avg=4096

            // Read input sample from stream
            ComplexType currentSample = rxSignal.read();

            // Calculate squared magnitude
            FixedPoint realPart = currentSample.real();
            FixedPoint imagPart = currentSample.imag();
            FixedPoint sigMagSq = realPart * realPart + imagPart * imagPart;

            // Circular buffer operation
            FixedPoint oldValue = circularBuffer[writePtr];
            circularBuffer[writePtr] = sigMagSq;
            bufferSum = bufferSum - oldValue + sigMagSq;

            // Calculate and apply threshold
            EnergyType sigEnergy = bufferSum;
            ThresholdType thresholdValue = sigEnergy * thresholdScaling;
            if (thresholdValue < minimumThreshold) {
                thresholdValue = minimumThreshold;
            }
            threshold.write(thresholdValue);

            writePtr = (writePtr + 1) % NFFT;
        }
    }
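
The C++ listing refers to `NFFT`, `THRESHOLD_SCALING`, and `MINIMUM_THRESHOLD` without showing their definitions. Assuming they carry over the values from the MATLAB code, they can be computed as below. This sketch uses `double` and hypothetical helper names; the actual design presumably stores them in fixed-point types that are not shown here.

```cpp
#include <cassert>
#include <cmath>

constexpr int NFFT = 256;  // Nfft in the MATLAB code

// Nfft * (2^-12)^2 = 2^8 * 2^-24 = 2^-16 (about 1.53e-5).
inline double computeMinimumThreshold() {
    return NFFT * std::ldexp(1.0, -24);
}

// 10^(PSSThreshold_dB / 10) with PSSThreshold_dB = -6 (about 0.2512).
inline double computeThresholdScaling() {
    const double pssThresholdDb = -6.0;
    return std::pow(10.0, pssThresholdDb / 10.0);
}
```

Because the minimum threshold is an exact power of two, it is representable without rounding in any fixed-point format with at least 16 fractional bits; the scaling factor is not, so its fixed-point value depends on the chosen word length.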
Stage 3: Optimization and Refinement
Apply advanced optimizations to reduce latency and resource utilization. Smart buffer management eliminates initialization overhead.
Key Optimizations: Remove initialization loop (saves 256 cycles), Smart buffer management, Conditional logic for uninitialized values, Enhanced HLS pragmas

Optimized HLS C++ code:

    void calcThreshold(
        ComplexStream &rxSignal,
        ThresholdStream &threshold,
        int numSamples
    ) {
        // HLS interface pragmas for AXI4-Stream and AXI4-Lite
        // #pragma HLS INTERFACE axis port=rxSignal
        // #pragma HLS INTERFACE axis port=threshold
        // #pragma HLS INTERFACE s_axilite port=numSamples
        // #pragma HLS INTERFACE s_axilite port=return

        // Circular buffer for storing squared magnitude values
        EnergyType circularBuffer[NFFT];
        // #pragma HLS ARRAY_PARTITION variable=circularBuffer type=cyclic factor=2
        #pragma HLS RESOURCE variable=circularBuffer core=RAM_2P_BRAM

        // State variables
        int writePtr = 0;
        EnergyType bufferSum = 0;

        // Constants for threshold calculation
        const ThresholdType thresholdScaling = THRESHOLD_SCALING;
        const ThresholdType minimumThreshold = MINIMUM_THRESHOLD;

        // LATENCY OPTIMIZATION: Removed INIT_BUFFER loop (saves NFFT=256 cycles)
        // Smart buffer management: treat uninitialized values as zero for first NFFT samples

        // Main processing loop - optimized for reduced latency
        PROCESS_SAMPLES: for (int sampleIdx = 0; sampleIdx < numSamples; sampleIdx++) {
            #pragma HLS PIPELINE II=1
            #pragma HLS LOOP_TRIPCOUNT min=1024 max=8192 avg=4096

            // Read input sample from stream
            ComplexType currentSample = rxSignal.read();

            // Calculate squared magnitude
            FixedPoint realPart = currentSample.real();
            FixedPoint imagPart = currentSample.imag();
            EnergyType sigMagSq = realPart * realPart + imagPart * imagPart;

            // Smart circular buffer operation - no initialization required
            EnergyType oldValue;
            if (sampleIdx < NFFT) {
                oldValue = 0; // Treat uninitialized as zero for first NFFT iterations
            } else {
                oldValue = circularBuffer[writePtr]; // Normal circular buffer operation
            }

            // Store new value in circular buffer
            circularBuffer[writePtr] = sigMagSq;

            // Update running sum: subtract old, add new
            bufferSum = bufferSum - oldValue + sigMagSq;

            // Calculate signal energy and apply threshold
            EnergyType sigEnergy = bufferSum;
            ThresholdType thresholdValue = sigEnergy * thresholdScaling;

            // Apply minimum threshold constraint
            if (thresholdValue < minimumThreshold) {
                thresholdValue = minimumThreshold;
            }

            // Write threshold to output stream
            threshold.write(thresholdValue);

            // Update circular buffer pointer
            writePtr = (writePtr + 1) % NFFT;
        }
    }
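
Dropping the initialization loop is safe because every buffer slot is written during the first NFFT iterations, before it is ever read. The sketch below models that argument in plain C++ on integer data: `runningSum` is a hypothetical helper, and the `buffer` argument is passed in with arbitrary contents to imitate memory that was never cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Running sum over the last N samples, where N is the buffer size.
// With skipInit == false the buffer is cleared first (the extra loop).
// With skipInit == true the first N reads are replaced by zero instead.
std::vector<long> runningSum(const std::vector<long>& x,
                             std::vector<long> buffer,  // copied on purpose
                             bool skipInit) {
    const std::size_t N = buffer.size();
    assert(N > 0);
    if (!skipInit) {
        for (long& slot : buffer) {
            slot = 0;  // explicit initialization pass
        }
    }
    std::vector<long> y;
    std::size_t writePtr = 0;
    long sum = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // During the first N samples the slot has not been written yet,
        // so its stale content must not be used.
        const long oldValue = (skipInit && i < N) ? 0 : buffer[writePtr];
        buffer[writePtr] = x[i];
        sum += x[i] - oldValue;
        writePtr = (writePtr + 1) % N;
        y.push_back(sum);
    }
    return y;
}
```

Calling it with a buffer full of stale values gives identical results for both settings of `skipInit`, which is the property the optimized listing relies on.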
Real-World Deployment
The complete 5G SSB detection system was successfully deployed on a USRP X310 platform equipped with a Xilinx Kintex-7 FPGA, demonstrating A2HCoder's capability to generate modular, high-performance, and synthesizable hardware directly from high-level MATLAB specifications.

System-Level Performance
Module | LUTs | FFs | DSP | BRAMs | Latency | Clock (MHz) |
---|---|---|---|---|---|---|
pssCorrelator | 6,329 | 21,088 | 276 | 0 | 54,060 | 254.00 |
calcThreshold | 173 | 274 | 3 | 1 | 6,013 | 322.27 |
peakFinder | 1,061 | 1,439 | 0 | 0 | 6,007 | 279.02 |
collectLocations | 85 | 211 | 0 | 0 | 6,004 | 332.78 |
extractSSBsig | 155 | 148 | 0 | 4 | 6,730 | 269.11 |
detectSSB (Complete System) | 8,669 | 24,216 | 279 | 7 | 53,872 | 292.23 |
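
One way to read the table is to compare the complete system against the sum of its five submodules. The sketch below does that arithmetic with the values copied from the rows above; the `Resources` struct and `total` helper are illustrative names.

```cpp
#include <cassert>
#include <vector>

struct Resources {
    int luts;
    int ffs;
    int dsps;
    int brams;
};

// Element-wise sum of the resource counts of all modules.
Resources total(const std::vector<Resources>& modules) {
    Resources t{0, 0, 0, 0};
    for (const Resources& m : modules) {
        t.luts += m.luts;
        t.ffs += m.ffs;
        t.dsps += m.dsps;
        t.brams += m.brams;
    }
    return t;
}

// Values copied from the table above.
const std::vector<Resources> kSubmodules = {
    {6329, 21088, 276, 0},  // pssCorrelator
    {173, 274, 3, 1},       // calcThreshold
    {1061, 1439, 0, 0},     // peakFinder
    {85, 211, 0, 0},        // collectLocations
    {155, 148, 0, 4},       // extractSSBsig
};
const Resources kSystem = {8669, 24216, 279, 7};  // detectSSB
```

The submodules sum to 7,803 LUTs, 23,160 FFs, 279 DSPs, and 5 BRAMs, so the integrated design uses 866 more LUTs, 1,056 more FFs, and 2 more BRAMs, with the same DSP count. The source does not state what accounts for the difference.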