Tag Archives: Teensy

IR Modulation Processing Algorithm Development. Part VIII

Posted 18 June 2017

In my last post on this subject, I showed how I could speed up ADC cycles for the Teensy 3.5 SBC, ending up with a configuration that took only about 5μSec/analog read.  This in turn gave me some confidence that I could implement a full four-sensor digital BPF running at 20 samples/cycle at 520Hz without running out of time.

So, I decided to code this up in an Arduino sketch and see if my confidence was warranted.  The general algorithm for one sensor channel is as follows:

  1. Collect a 1/4 cycle group of samples, and add them all to form a ‘sample_group’
  2. For each sample_group, form I & Q components by multiplying the single sample_group by the appropriate sign for that position in the cycle.  The sign sequence for I is (+,+,-,-), and for Q it is (-,+,+,-) .
  3. Perform steps 1 & 2 above 4 times to collect an entire cycle’s worth of samples.  As each I/Q sample_group component is generated, add it to a ‘cycle_group_sum’ – one for the I and one for the Q component.
  4. When a new set of cycle_group_sums (one for I, one for Q) is completed, use it to update a set of two N-element running sums (one for I, one for Q).
  5. Add the absolute values of the I & Q running sums to form the final demodulated signal value for the sensor channel.

To generalize the above algorithm for K sensor channels, the ‘sample_group’ and ‘cycle_group_sum’ variables become K-element arrays, and each step becomes a K-step loop. The N-element running sum arrays (circular buffers) become [K][M] arrays, i.e. two M-element array for each sensor (one for I, one for Q).

All of the above sampling, summing, and circular buffer management must take place within the ~96μSec ‘window’ between samples, but not all steps have to be performed each time.  A new sample for each sensor channel is acquired at each point, but sample groups are converted to cycle group sums only once every 5 passes, and  the running sum and final values are only updated every 20 passes.

I built up the algorithm in VS2017 and put in some print statements to show how the gears are turning.  In addition, I added code to set a digital output HIGH at the start of each sample window, and LOW when all processing for that pass was finished.  The idea is that if the HIGH portion of the pulse is less than the available window time, all is well. When I ran this code on my Teensy 3.5, I got the following print output (truncated for brevity)

And the digital output pulse on the scope is shown in the following photo

Timing pulse for BPF algorithm run, shown at 10uS/cm. Note the time between rising edges is almost exactly 96uSec, and there is well over 60uSec ‘free time’ between the end of processing and the start of the next acquisition window.

As can be seen in the above photo, there appears to be plenty of time (over 60μSec) remaining between the end of processing for one acquisition cycle, and the start of the next acquisition window.  Also, note the fainter ‘fill-in’ section over the LOW part of the digital output.  I believe this shows that not all acquisition cycles take the same amount of processing time.  Four acquisition cycles out of every 5 require much less processing, as all that happens is the individual samples are grouped into a ‘sample_group’.  So the faint ‘fill-in’ section probably shows the additional time required for the processing that occurs after collection/summation of each ‘sample_group’.

The code for these measurements is included below:

More to come,

Frank

IR Modulation Processing Algorithm Development. Part VI

Posted 14 June 2017

In my previous posts on this subject, I have been working with an Arduino Uno as the demodulator processor, but I have been plagued by its limitation of 2KB for program memory. This has caused severe limitations with timing debug, as I can’t make debug arrays long enough for decent time averaging, and I can’t do more than one sensor channel at a time.

So, I finally took the plunge and acquired some of Paul J Stoffregen’s Teensy 3.5 processors from their store.  From their site: “Version 3.5 features a 32 bit 120 MHz ARM Cortex-M4 processor with floating point unit. All digital pins are 5 volt tolerant.” The tech specs are shown on this page, but the main features I was interested in are:

  • 120MHz processor speed vs 16MHz for the Uno
  • 192KB RAM vs 2KB for the Uno
  • Analog input has 13 bit resolution vs 12 for the Uno
  • As an added bonus, the Cortex-M4 has an FPU, so integer-only math may be unnecessary.
  • Much smaller physical footprint – the Teensy 3.5 is about 1/4 the area of the Uno
  • Lower power consumption – The Teensy 3.5 at 120MHz consumes about 30mA at 5V vs about 45mA at 5V for the Uno.

Here are some photos of the Teensy 3.5 as installed on my algorithm test bed, and also on my Wall-E2 robot where it might be installed:

Teensy 3.5 installed on my algorithm test bed, with the Uno shown for size comparison. The small processor in the foreground is an Adafruit ‘Trinket’

Side-by-side comparison of the Uno and Teensy 3.5 SBC’s

Closeup of the Teensy 3.5 shown atop the ‘sunshade’ surrounding the IR sensors.  this is a possible installed location

Wider view of a Teensy 3.5 placed atop the ‘sunshade’ surrounding the IR sensors

In addition to all these goodies, the folks at Visual Micro added the Teensy line to their Microsoft Visual Studio add-on, so programming a Teensy 3.5 is just as easy as programming a Uno – YAY!

Of course, I’ll need to re-run all the timing tests I did before, but being able to create and load (almost) arbitrary-length sample capture arrays for debugging purposes will be a great help, not to mention the ability to use floating-point calculations for better accuracy.

Stay tuned,

Frank