Christopher Al Khawand
Portfolio > Posts > Calculating the Fast Fourier Transform (FFT) on a microcontroller

Calculating the Fast Fourier Transform (FFT) on a microcontroller

C/C++ Engineering Programming

As part of a bigger project at the end of my 3rd engineering year, we were tasked to perform an FFT calculation on a LPC1768 (Arch Pro) microcontroller and display the 2 “loudest” frequencies on an LCD screen, alongside their respective strengths in dB.

The beauty of this project lies in the details. We are used to our computers having plenty of resources, be it RAM, storage, etc… But in the microcontroller world, things get a lot less powerful. In fact, the Arch Pro microcontroller only has 64Kb of RAM and 512Kb of storage, which already imposes many constraints.

Adding insult to injury, the Keil uVision IDE being an education version imposed a linker size of just 32Kb, reducing even more the amount of storage we can use.

Let’s get to work and discover how I optimized my code to make all of this work.

For this implementation, I decided to use the radix FFT method since it uses a divide-and-conquer algorithm and works with power-of-2 number of samples. You can find an in-depth explanation on how this algorithm works here. I highly recommend you read the explanation before continuing to be able to grasp the extent of optimizations.

Constraint #1: Sampling

To perform an FFT, we first need audio samples. I used a microphone connected to my microcontroller to record audio samples at a frequency of 44.1kHz, or so I thought…

I was using Mbed library‘s Ticker class to trigger interruptions at regular intervals to collect audio samples. It turns out the library was maxing out at only 11kHz… 4 times less than what I desired.

What had to be done was done, I scraped Mbed altogether and started programming the microcontroller’s registers directly (Mbed is an abstraction to register programming, so the implementation might have been slower in a trade off to add more functionalities).

Here’s an excerpt of the code I wrote to program the timer:

  • Initializing the timer
FFT::FFT() {
    recording_done = false;
    recording_timer = 0;

    mic = new AnalogIn(A3);

    LPC_TIM0->MR0 = PCLK/Fe - 1; // Match 0 for a frequency of 44100 Hz
    NVIC_EnableIRQ(TIMER0_IRQn);
    LPC_TIM0->MCR = 3;           // IRQ and Reset if TC = MR0
    LPC_TIM0->TCR = 0x02;        // Stop timer
}
  • Starting the timer
void FFT::acquire() {
    LPC_TIM0->TCR = 0x01;        // Start timer
}
  • Handling the timer’s “ticks”
extern "C" void TIMER0_IRQHandler() {
    LPC_TIM0->IR = 1; // acknowledge IRQ Match 0
    fft.x[fft.recording_counter] = fft.mic->read();
    fft.recording_counter++;
    if (fft.recording_counter >= FFT::N) {
        LPC_TIM0->TCR = 0x02; // Stop timer
        fft.recording_counter = 0;
        fft.recording_done = true;
    }
}

Using this method, I was able to match the desired frequency of 44,100Hz.

Constraint #2: RAM usage

Having a limited RAM space (64Kb), I had to limit the FFT to 2048 points.

As you may know, in C/C++ we have to allocate memory for our lists. We have 2 options to do so:

  1. Allocate the list statically at compile time, it will live on the heap
  2. Allocate the list dynamically at runtime, it will live on the stack

After trying both options, I went with option #1 because it guaranteed that the space would be allocated before execution. Indeed, by using option #1 I could perform an FFT of 2048 points, and by using option #1 I could only perform an FFT on 1024 points.

Constraint #3: Linker

Having the free version of Keil uVision, the linker imposes a code size of only 32Kb. This constraint has induced the most optimizations in this project.

At the start of the project, I was only printf‘ing the FFT results to the console for simplicity. But the project’s instructions were clear: we had to display the results on an LCD display.

Not wanting to re-invent the wheel, I tried using the LCD_RGB_Suli to send the output to the LCD screen. But Keil was quick to point out that I had exceeded the allowed size. So I got to optimizing…

  1. I deleted all functions in the library that weren’t used. Memory saved: ~600 bytes
  2. I tried rewriting less precise versions of the sine and cosine functions (which the FFT uses). The program compiled successfully, but the FFT was accurate up until around 7,500 Hz (yikes!). So I went back to using math.h‘s sine and cosine functions.
  3. I finally created my own display library for the LCD screen which is independent of Suli.h and in consequence is more optimized for my use case and weighs less.

BINGO! By implementing the 3rd optimization, the linker compiled my code successfully and the FFT was precise in the frequency range imposed up until 22,050Hz (Nyquist–Shannon sampling theorem). The resulting program was fast and efficient!

Constraint #4: Linker (again)

As a final step, I had to calculate the spectral power of the given frequencies using the formula 20*log(|FFT|^2). What caused the linker to scream was the log function which apparently added lots of bytes to my program. So I had to do some math engineering shenanigans.

For context, computers love calculating in base 2. And calculating the log in base 2 is surprisingly easy:

static inline int log_2(int n) {
    int val;
    for (val = 0; n > 1; val++, n >>= 1);

    return val;
}

Then, throwing back to the math classes I used to hate, the formula to go from base 2 to the desired base 10 is, simply:

#define LOG2_10 3.32192809488736234787
static inline float log_10(float n) {
    return (float)(log_2((int)n)/LOG2_10);
}

As simple as that!

Please note that this method, while fast and simple, has some caveats. Notably, the log_2 function is not very precise, but is precise enough for my use case since I only needed to get an idea of the spectral power of the measured frequencies. After all, engineering is all about coming up with solutions that are “good enough”.

Speed and precision

As indicated above, I was limited to N=2048 points, providing an accuracy of around 10 Hz, very acceptable in my case since I am doing an FFT on an embedded system with fairly limited resources.
With regard to the calculation time, using Mbed’s Timer to measure delta-times, the average time of calculation of the FFT over the frequency range [20; 22,050] Hz is roughly 516ms, which is well within the standards for an FFT on a resource-constrained microcontroller.

Closing thoughts

Wow! You’ve made it this far already! Thank you for reading through my journey of optimizations. Don’t hesitate to leave a comment below if you have any comments and/or remarks about anything you’ve read, and let’s have a discussion on what I could have improved further. I might release the codebase on my GitHub profile at a later point so stay tuned!


Christopher Al Khawand avatar Christopher Al Khawand

I’m a university student currently living in France, in my 4th engineering year at Sorbonne University.
Passionate about programming, solving complex IT problems as well as new technological progress, learning new skills never stops.

Comments

By clicking "Show comments", you agree to Disqus's: Terms of Use, Privacy Policy .