Tag Archives: I2C Freeze

I2C Hangup bug cured! Miracle of Miracles! Film at 11!

Posted 06 July 2020

Miracle of miracles!  Arduino finally got off their collective asses and decided to do something about the well-known, well-documented, and long-ignored I2C hangup bug.  Thanks to Grey Christoforo of Oxford, England for submitting the pull request that started the ball rolling.  See this GitHub issue thread for all the gory details.  However, in a bizarre outcome, the needed timeouts aren’t enabled by default! You have to modify your code to add a call to a new function, like the following:

Note that you have to explicitly add a timeout value (3000 in my example above) or the timeout feature will still not be enabled! The ‘true’ parameter tells the library to reset the I2C bus if a timeout is detected – surely something you will want to do.

I’m currently working on a ‘before/after’ post to demonstrate that the new timeout feature actually works with real hardware scenarios.  However, due to the intermittent nature of the I2C hangup bug, it takes a while (hours/days) to grind through enough iterations to excite the bug reliably, so it may be a while before I have a good demonstration.

One last thing; at some point the examples in C:\Program Files (x86)\Arduino\hardware\arduino\avr\libraries\Wire\examples (on my Win 10 machine) will probably be updated/expanded to show how to properly implement the new timeout feature, but this has not happened yet AFAICT.

The rest of this post describes my attempt to verify that the new timeout feature does, in fact, work as advertised.  The idea is to construct a “before-and-after” demonstration, where the ‘before’ configuration reliably hangs up using the Wire library without the timeout enabled, and an ‘after’ configuration that is identical to the ‘before’ setup except with the timeout enabled.

Before Configuration:

I actually started with a ‘before-before’ configuration using the SBWire library, as I have been working with I2C projects and the SBWire library ever since I gave up on the Arduino Wire library two years ago.  This configuration is patterned after Wall-E2, my current autonomous wall-following robot, which uses an Adafruit RTC, an Adafruit FRAM, a DFRobots MPU6050 IMU, and six VL53L0X time-of-flight proximity sensors (the ToF sensors are managed by a slave Teensy over the I2C bus).  For this test, I arranged all the I2C components on a plug board and connected to them using an Arduino Mega 2560 (the same controller I have on Wall-E2), as shown in the following photo.

From left to right; two VL53L0X ToF modules, FRAM module, DS3231 RTC module, MPU6050 IMU module

The software is a cut down version of the robot software, and in this first test all it does is print out time/date from the RTC and the relative heading value from the IMU.  After almost 13 hours, it was still running fine, as shown below:

So now I have a ‘known good’ (with SBWire) hardware configuration.  The next step is to change the software back from SBWire to Wire without the timeout implemented.  This should fail – the IMU readout should hangup within a few hours as it did before I originally switched to SBWire.

July 08 2020 Update:

After laboriously changing back from SBWire to Wire, I got the configuration shown in the following photo to work properly using the new Wire library without the new timeout feature enabled.

From left to right: MPU6050 IMU, DS3231 RTC, Adafruit I2C FRAM, and three VL53L0X ToF proximity sensors, all on the Mega 2560’s I2C bus

I programmed the Mega to access everything but the FRAM 10 times/second and print out the results on the serial monitor, and then let it run overnight.  When I got up this morning I expected to see that it had hung up after a few hours, but discovered that it was still running fine after eight hours – bummer!  At 10 meas/sec that is 480 min * 60 sec/min * 10 meas/sec = 288,000 I2C measurement cycles, and at 5 I2C transactions per cycle that comes to 1,440,000 I2C transactions.  I was bummed out because it will be impossible to verify whether or not the timeout feature actually works if I can’t get a configuration that reliably hangs up. When I came back a few hours later, I saw that the printout to the serial monitor had stopped at around 700 minutes, but this turned out to be the monitor hanging up – not the I2C bus – double bummer.

So, I modified the program to only report results every second instead of 10/second so I won’t run out of serial monitor again, and restarted the ‘before’ configuration.

10 July 2020 Update:

I added the Sunfounder 20 x 4 I2C LCD display to the setup so I could display the IMU heading and proximity sensor distances locally, as shown below

I2C Test setup with Sunfounder 20 x 4 I2C LCD added

After getting this setup running, I was trying to figure out how to definitively demonstrate I2C bus hangups without the Wire library timeout feature (the ‘before’ configuration) and then demonstrate continued operation with timeouts enabled (the ‘after’ configuration).  In an email conversation, Grey Christoforo pointed me to another poster who was doing the same thing, by using an external transistor to short one I2C line to ground under program control, thereby demonstrating that the timeout feature allowed continued operation.  This gave me the idea that manually shorting one of the I2C lines to ground should do the same thing, and would allow me to demonstrate the ‘before’ and ‘after’ configurations.

The following code snippet shows the code necessary to enable the Wire library timeout feature

Although not entirely necessary, this is how I instrumented my code to capture timeout events and display them on my serial monitor
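As a rough illustration of what enabling and instrumenting the timeout can look like, here is a minimal sketch. It assumes an Arduino AVR core recent enough to include the Wire timeout functions (`setWireTimeout()`, `getWireTimeoutFlag()`, `clearWireTimeoutFlag()`). The names `enableI2cTimeout`, `checkI2cTimeout`, and `timeoutCount` are made up for this example, and the `FakeWire` struct is only a hypothetical host-side stand-in so the logic can be compiled off-target; it is not part of the real library.

```cpp
#include <stdint.h>

#if defined(ARDUINO)
  #include <Arduino.h>
  #include <Wire.h>
#else
  // Hypothetical stand-in for off-target builds only. It mimics the three
  // timeout calls used below and nothing else; it is NOT the real Wire library.
  struct FakeWire {
    uint32_t timeout_us = 0;
    bool reset_on_timeout = false;
    bool flag = false;
    void begin() {}
    void setWireTimeout(uint32_t t = 25000, bool r = false) {
      timeout_us = t;
      reset_on_timeout = r;
    }
    bool getWireTimeoutFlag() { return flag; }
    void clearWireTimeoutFlag() { flag = false; }
  } Wire;
#endif

const uint32_t I2C_TIMEOUT_US = 3000;  // microseconds; the value settled on in this post
uint32_t timeoutCount = 0;             // illustrative counter (hypothetical name)

// Turn the timeout on. Without an explicit value the feature stays disabled.
void enableI2cTimeout() {
  Wire.begin();
  Wire.setWireTimeout(I2C_TIMEOUT_US, true);  // true: reset the I2C bus on timeout
}

// Call after each I2C transaction. Returns true if a timeout was flagged.
bool checkI2cTimeout() {
  if (!Wire.getWireTimeoutFlag()) return false;
  Wire.clearWireTimeoutFlag();  // the flag stays set until explicitly cleared
  ++timeoutCount;
#if defined(ARDUINO)
  Serial.print(F("I2C timeout #"));
  Serial.println(timeoutCount);
#endif
  return true;
}

#if defined(ARDUINO)
void setup() {
  Serial.begin(115200);
  enableI2cTimeout();
}

void loop() {
  // ... I2C reads from the IMU, RTC, etc. would go here ...
  checkI2cTimeout();
  delay(100);
}
#endif
```

Checking the flag after every transaction, rather than once per loop, makes it easier to tell which device was being addressed when the timeout fired.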

All my other hardware setup code has been removed for clarity.  Notice though, that I tried a number of different timeout values, starting from the default value of 25000 (25 mSec) down to 2000, and then back up to 3000.  At least in my particular configuration, the 1000 value was too small – it caused a timeout flag to be generated on every pass through the loop.  This was an unexpected result, as the SBWire library uses 100 uSec (i.e. a timeout value of 100) as its default timeout value, and this setting has always worked fine in all my I2C projects.

In any case, here’s a short video that demonstrates that the Wire library can now recover from an I2C bus traffic interruption via the use of the new timeout feature.

Stay tuned!

I2C Bus Sniffing with Excel VBA

In my never-ending quest to figure out why my I2C connection to an MPU6050 dies intermittently, I decided to try and record the I2C bus conversation to see if I can determine whether it is the MPU6050 or the microcontroller that goes tits-up on me.

Of course, this adventure turned out to be a LOT more complicated than I thought – I mean, how hard could it be to find and run one of the many (or so I thought) I2C sniffer setups out there in the i-verse?  Well, after a fair bit of Googling and forum searches, I found that there just aren’t any good I2C sniffer programs out there, or at least nothing that I could find.

I did run across one promising program; it’s a Teensy 3.2 sniffer program written by ‘Kito’ and posted on the PJRC Teensy forum in this post.  I also found this program written for the Arduino Mega.  So, I created a small Arduino Mega test program connected to a MPU6050 using Jeff Rowberg’s I2CDev library.

This program sets up the connection to the MPU6050 and then once every 200 mSec tests the I2C connection, resets the FIFO, and then repeatedly checks the FIFO count to verify that the MPU6050 is actually doing something.

When I ran Kito’s I2C sniffer program on a Teensy 3.2 (taking care to switch the SCL & SDA lines, as Kito’s code has them backwards), I got the following output

which isn’t very useful, when compared to the debug output from Jeff Rowberg’s I2CDev program, as follows:

As can be seen from Jeff’s output, there is a LOT of data being missed by Kito’s program. It gets the initial sequence right (S,Addr=0x68,W,N,P), but skips the 8-bit data sequence after the ‘W’, and mis-detects the following RESTART as a STOP.  The next sequence (S,Addr=0x68,R,N,P) is correct as far as the initial address is concerned, but again omits the 8-bit data value after the ‘R’ direction modifier.

Notwithstanding its problems, Kito’s program, along with this I2C bus specifications document  did teach me a LOT about the I2C protocol and how to parse it effectively. In addition, Kito’s program showed me how to use direct port bus reads to bypass the overhead associated with ‘digitalRead()’ calls – nice!

I got lost pretty quickly trying to understand Kito’s programming logic, so I decided I would do what any good researcher does when trying to understand a complex situation – CHEAT!!  I modified Kito’s program to simply capture the I2C bus transitions associated with my little test program into a 1024 byte buffer, then stop and print the contents of the buffer out to the serial port.  Then I copy/pasted this output into an Excel spreadsheet and wrote a VBA script to parse through the output, line-by-line. By doing it this way, I could easily examine the result of each script change, and step through the script one line at a time, watching the parsing machinery run.

Here’s a partial output from the data capture program:

So then I copy/pasted this into Excel and wrote the following VBA script to parse the data:

The above script assumes the data is in column A, starting at A1. A partial output from the program is shown below, showing the first few sequences

The above output corresponds to this line in the debug output from Jeff Rowberg’s I2Cdev code:

So, the VBA program is parsing OK-ish, but is missing big chunks, and there are some weird 1 and 2 bit sequences floating around too.

After some more research, I finally figured out that part of the problem is that the I2C protocol allows a slave device to pull the SCL line low unilaterally to temporarily suspend transmissions until the slave device catches up.  This causes ‘NOP’ sequences to appear more or less randomly in the data stream.  So, I again modified Kito’s program to first capture a 1024 byte data sample, and then parse through the sample, eliminating any NOP sequences. The result is a ‘clean’ data sample.  Here’s the modified Kito program

and a partial output from the run:

After processing all 1024 transition codes, 96 invalid transitions were removed, resulting in 928 valid I2C transitions.

When this data was copy/pasted into my Excel VBA program, it was able to parse the entire sample correctly, as shown below:

This corresponds to the following lines from Jeff’s program:

Although the VBA code correctly parsed all the data and missed nothing, there is still a small ‘fly in the ointment’; there is still an extra ‘0’ bit after every transmission sequence.  Instead of

we  have

with an extra ‘0’ between the ACK/NAK and the RESTART.  This appears in every transmission sequence, so it must be a real part of the I2C protocol, but I haven’t yet found an explanation for it.

In any case, it is clear that the Excel VBA program is correctly parsing the captured sequence, so I should now be able to port it into C++ code for my Teensy I2C sniffer.
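To give a rough idea of what such a port might look like, here is a minimal host-side C++ sketch of the parsing logic. It is not the VBA script, and it does not use the capture format from the runs above: it assumes each sample encodes the two bus lines as `(SCL << 1) | SDA`, and the helper functions that synthesize a waveform exist only so the decoder can be tried without hardware. All names are hypothetical. A START is SDA falling while SCL is high, a STOP is SDA rising while SCL is high, and data bits are read on rising SCL edges.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Decode samples encoded as (SCL << 1) | SDA. Encoding is an assumption.
std::vector<std::string> decodeI2c(const std::vector<uint8_t>& samples) {
  std::vector<std::string> out;
  if (samples.empty()) return out;

  uint8_t prev = samples[0] & 0x3;
  bool inFrame = false;      // true between START and STOP
  bool addrPending = false;  // next complete byte is address + R/W
  int bitCount = 0;
  unsigned shift = 0;
  char buf[16];

  for (std::size_t i = 1; i < samples.size(); ++i) {
    const uint8_t cur = samples[i] & 0x3;
    if (cur == prev) continue;  // no change on either line
    const bool pScl = (prev & 2) != 0, pSda = (prev & 1) != 0;
    const bool scl = (cur & 2) != 0, sda = (cur & 1) != 0;
    prev = cur;

    if (pScl && scl) {  // SDA moved while SCL stayed high
      if (pSda && !sda) {
        out.push_back("S");  // START or repeated START
        inFrame = true;
        addrPending = true;
      } else {
        out.push_back("P");  // STOP
        inFrame = false;
      }
      bitCount = 0;  // drop any partial byte clocked in before the condition
      shift = 0;
      continue;
    }
    if (!inFrame || pScl || !scl) continue;  // only rising SCL edges carry bits

    if (bitCount < 8) {
      shift = (shift << 1) | (sda ? 1u : 0u);
      ++bitCount;
      continue;
    }
    // Ninth clock: emit the byte, then ACK ('A') or NAK ('N').
    if (addrPending) {
      std::snprintf(buf, sizeof buf, "Addr=0x%02X", shift >> 1);
      out.push_back(buf);
      out.push_back((shift & 1u) ? "R" : "W");
      addrPending = false;
    } else {
      std::snprintf(buf, sizeof buf, "0x%02X", shift);
      out.push_back(buf);
    }
    out.push_back(sda ? "N" : "A");
    bitCount = 0;
    shift = 0;
  }
  return out;
}

// Helpers that synthesize an idealized waveform for trying the decoder out.
void addBit(std::vector<uint8_t>& s, bool bit) {
  const uint8_t d = bit ? 1 : 0;
  s.push_back(d);      // SCL low, SDA set up
  s.push_back(2 | d);  // SCL high, bit sampled here
  s.push_back(d);      // SCL low again
}
void addByte(std::vector<uint8_t>& s, uint8_t value, bool nack) {
  for (int b = 7; b >= 0; --b) addBit(s, ((value >> b) & 1) != 0);
  addBit(s, nack);  // ninth bit: 0 = ACK, 1 = NAK
}
void addStart(std::vector<uint8_t>& s) {
  s.push_back(1); s.push_back(3); s.push_back(2); s.push_back(0);
}
void addStop(std::vector<uint8_t>& s) {
  s.push_back(0); s.push_back(2); s.push_back(3);
}
```

Note that the SCL rising edge that precedes every STOP or repeated START is clocked in as a stray bit before the condition is recognized; this sketch simply discards any partial byte at that point. Starting from an idle sample (`3`), a write of one register byte followed by a repeated-START read can be built with `addStart`, `addByte`, and `addStop` and passed to `decodeI2c`.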

Stay tuned!

Frank

Integrating Time, Memory, and Heading Capability, Part VI

Posted 25 August 2018

In my previous posts, I have been describing my efforts to give Wall-E2, my autonomous wall-following robot, relative heading sensing ability using the DFRobots MPU6050 6DOF module.   As I went through this process, I discovered that the ‘standard’ Arduino Wire library was seriously defective, and the problem had been known, but not fixed, for almost a decade!   Once I figured this out, I was able to fix my local copies of Wire.cpp/h and twi.c/h and all my hangup problems went away.   Subsequently I found another Wire library (SBWire by Shuning (Steve) Bian) that also incorporates the necessary fixes, so I started using his library instead of my own local fixes.

Anyway, after all the I2C drama, I finally got the damned thing working, and ran some tests to demonstrate Wall-E2’s new-found ability to make reasonably precise and consistent turns.   In the first test I had Wall-E2 make a series of 90-deg (ish) turns, and in the second one I had him make some 180-deg (ish) K-turns to simulate what he might want to do after disconnecting from (or avoiding) a charging station.

Known defect in Arduino I2C code causes hangup problems

Posted 20 August 2018

06 July 2020 Update

Miracle of miracles!  Arduino finally got off their collective asses and decided to do something about the well-known, well-documented, and long-ignored I2C hangup bug.  Thanks to Grey Christoforo of Oxford, England for submitting the pull request that started the ball rolling.  See https://github.com/arduino/ArduinoCore-avr/pull/107 for all the gory details.  However, in a bizarre outcome, the needed timeouts aren’t enabled by default! You have to modify your code to add a call to a new function, like the following:

Note that you have to explicitly add a timeout value (1000 in my example above) or the timeout feature will still not be enabled! The ‘true’ parameter tells the library to reset the I2C bus if a timeout is detected – surely something you will want to do.

I’m currently working on a ‘before/after’ post to demonstrate that the new timeout feature actually works with real hardware scenarios.  However, due to the intermittent nature of the I2C hangup bug, it takes a while (hours/days) to grind through enough iterations to excite the bug reliably, so it may be a while before I have a good demonstration.

One last thing; at some point the examples in C:\Program Files (x86)\Arduino\hardware\arduino\avr\libraries\Wire\examples (on my Win 10 machine) will probably be updated/expanded to show how to properly implement the new timeout feature, but this has not happened yet AFAICT.

Stay tuned!

In my continuing quest to add relative heading sensing to Wall-E2, my autonomous wall-following robot, I have been trying to make the Invensense MPU-6050 module sold by DFRobots work on my robot.

In my last post on this topic, I had finally figured out that the program lockup problems I had been experiencing were due to a well-known-but-never-fixed bug in twi.c, the low-level code associated with the Arduino I2C library.   This utility program has a number of while() loops used to send and receive bytes across the I2C bus, and every one of them is prone to deadlock when the device(s) on the other end of the bus misbehave at all.   When that happens, the while() loop never exits, and whatever program is running dies a horrible death.

The weird thing about this problem is that it has been known for at least a decade (yep – 10 years!!!), and has actually been fixed multiple times by multiple people over this period, but the fixes have never made it into the ‘official’ Arduino Wire library.   This makes  NO SENSE, as the Wire library code is open-source, and is available on GitHub.   I thought the whole idea behind open-source code and GitHub was that others could contribute code fixes in a reliable revision-tracked way, so that when someone finds a bug, it can be fixed quickly and then propagated out to all users.   Apparently the guys at Arduino never got the memo, because I found it impossible to get a ‘Pull Request’ containing the bug fix through the code-maintainer’s gauntlet.

Thinking this was just a logistics problem that I could solve with just a few hours of elbow grease, and would be a good training exercise for other open-source collaboration projects, I decided to take a swing at this problem myself – how hard could it be?

  • I thoroughly researched the technical issues, made the changes to my local copies of Wire.cpp/h and twi.c/h, and verified that they indeed fully solved the hangup problems
  • Found the relevant Arduino Wire library source tree on GitHub
  • Forked the Arduino Wire library source tree to my own GitHub Account
  • Cloned my fork of the Arduino Wire Library to my PC
  • Made all the relevant changes to my local repo, tested the result, and pushed the changes to my GitHub repo.
  • Created a ‘Pull Request’ with all the changes, with a descriptive note

By this time, I had expended a LOT of time, but that was OK as I had learned a lot that would pay off in future efforts, and besides I was finished – I thought!

Then I got a very nice email from the Arduino maintainer of the Wire library, listing all the things I had done wrong, and making it clear that the changes wouldn’t be merged into the ‘official’ Wire library until all was correct to their satisfaction.   When I looked at the list of problems, I realized most of it was about ‘whitespace’ mismatches between my submission and the official version.   Now, I don’t know about you, but I stopped thinking about whitespace a decade or so ago, when it became clear that whitespace was just a figment of the programmer’s mind, and had NOTHING WHATSOEVER to do with how well or poorly the code actually worked.   Now I was being asked to manually correct all the literally hundreds/thousands of places where my code had 2 spaces and the ‘official’ code had 3!   So, if I wanted this bugfix to get into the main distribution, I was going to   have to spend a HUGE amount of time dealing with nit-picking aesthetics that have nothing whatsoever to do with anything but somebody’s misplaced idea of right and wrong with respect to whitespace, for source files that are rarely, if ever, viewed by 99% of the Arduino programming community.   I mean, this would be like refusing to make a small, but important change to the maintenance manual for a car because the shop technician’s penmanship wasn’t up to par!   What is penmanship going to matter when known defects aren’t corrected?

So, I thought about that some more, and I came to realize why this I2C hangup bug has been around for so long – nobody’s pull request has ever made it through the ‘penmanship contest’ gauntlet; the Arduino maintainers are more interested in penmanship than in fixing clearly defective code that has caused (and still is causing) grief for anyone who tries to use the I2C bus.   My personal response to this problem was “screw them – I’m not going to spend all that effort just to please someone’s weird affection for whitespace, especially since my local copy of these files has already been fixed”.

With just a little bit of searching, I found Steve Bian’s ‘SBWire’ library with timeouts added to all the while() loops in twi.c, and was quickly able to ascertain that Steve’s library did indeed solve my hangup problems.   Moreover, Steve actually answered my emails, and is undoubtedly much more open to open-source collaboration than the guys at Arduino.

The sad thing about all this is that Arduino is not doing themselves any favors by making themselves part of the problem rather than part of the solution. If they aren’t going to actively maintain their baseline code distribution, it (and Arduino) will become irrelevant as users find other ways around the obstacles.

Frank

25 August update:

So, I did the same thing with Shuning (Steve) Bian’s SBWire library that I had done with Arduino’s Wire library.   Forked his repo, cloned it to my PC, made the small changes I wanted, pushed to my repo, and created a pull request.    Two days later, Shuning had merged my changes into the library.   Now I do realize that SBWire isn’t Arduino Wire, so maybe a ‘higher standard’ might be justified for the ‘gold standard’ I2C library.   However, I think we could all agree that EIGHT FRIGGIN’ YEARS  of known defects is probably a bit much!

So, my advice, if you’ve been having problems with I2C hangups, is to throw the Arduino Wire library in the nearest trashcan and use Shuning’s SBWire library.

26 August Update:

I have been running SBWire on a little I2C test board, and I left it running over the weekend while my wife and I were away on a trip.   When I came back, some 95 hours later, the board was still running merrily.   I did note that the ‘lockup counter’ (the number of times the standard Wire library code would have locked up) stood at 14, or about once every 7 hours or so.   Actually I’m a bit surprised by this number, as in my personal experience the Wire library never lasted more than about 2 hours before locking up.

Just another reason to dump the Arduino Wire library and use something useful like SBWire ;-).

Integrating Time, Memory, and Heading Capability, Part V

Posted 10 August 2018

Well, it appears I spoke too soon about having solved the I2C hangup problem on my Wall-E2 wall-following robot.   In my last post on this subject, I described all the troubleshooting efforts I employed to nail down the cause of intermittent hangups when trying to use the MPU6050 6DOF IMU on the robot, along with several other I2C devices (a Teensy 3.5 used for IR homing, and Adafruit RTC, and FRAM modules).

After (I thought) figuring out that the I2C SCL/SDA line lengths were the root problem of the hangups I had been experiencing, my grandson Danny and I spent some quality time reworking Wall-E2’s layout to accommodate shorter line lengths.   Instead of mounting the IMU and its companion sensors on the second deck as before, we 3D printed a small plastic plate to attach to one of the hexagonal 2nd deck standoff posts and provide a 1st deck mounting area for the sensors.   The previous and new mounting locations are shown below:

2nd deck mounting location. The MPU6050 is the module with the illuminated blue LED toward the rear of the robot

1st deck mounting location for I2C sensors (lower right-hand corner of the photo). The Teensy 3.5 IR homing module is shown mounted on the IR detector housing (above the red plastic plate)

Unfortunately, as I was doing some final tests on this setup, I started experiencing hangups again.   After a day or so moping and some very choice words, I started all over again trying to figure out what happened.

On previous searches through the i-verse, I had run across several posts indicating that the Arduino Wire library had some basic problems with I2C bus edge conditions; there were several places where it uses blocking ‘while()’ loops to transmit and receive data on the I2C bus, and there was no way to recover from a ‘while()’ loop where the exit condition was never satisfied.    After literally exhausting all the other possibilities, it was becoming apparent that this must be what was happening – the MPU6050 must occasionally fail to respond correctly to an I2C transaction, causing the associated ‘while()’ loop to never exit.

So, I started looking for solutions to this problem.   Again, I found some posts where folks had modified the low-level I2C bus handling code found in twi.c/.h, the code underlying the Arduino Wire class.   I found a post by ‘unaie’ (http://forum.arduino.cc/index.php/topic,19624.0.html) with the same complaint, but he also posted modified versions of twi.c and twi.h that solved these problems by forcing the ‘while()’ loops to exit after a set number of iterations, and resetting the I2C bus when this happens.   His modified versions can be downloaded at:

http://liken.otsoa.net/pub/ntwi/twi.h

http://liken.otsoa.net/pub/ntwi/twi.c

I downloaded these files and tried to replace the ‘stock’ twi.c/h with the modified versions. Unfortunately, unaie’s modifications were made on a quite old version of the files, and conflicted with the later ‘repeated start’ versions of these files that are in the current ‘wire’ library.

So, I did a ‘diff’ between the ‘repeated start’ version and unaie’s version, and created a modified version of the latest ‘repeated start’ twi.c/h.   In addition, I added a couple of functions to allow monitoring of the number of times a bus reset was required due to a ‘while()’ loop timeout.   When I was finished, I ran the sensor for over 24 hours with no failures, but in that time there were three instances where a ‘while()’ loop timed out and an I2C bus reset was required.   A small snippet of this run is shown below.   The blue line is the yaw value, and the plot snippet shows where I manually rotated the sensor just after 24 hours, and the horizontal orange line shows the number of bus resets.

Small snippet of 24-hour sensor run. Blue line is reported yaw value; orange shows the I2C bus reset counter

So it is clear that, absent the lockup recovery modifications, the I2C bus would have locked up long before, and that with the modifications ‘while()’ loop deadlocks have been successfully handled.
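The general pattern behind this kind of fix can be sketched in a few lines. This is only an illustration of a bounded wait loop with a reset counter, not the actual twi.c changes: `waitForFlag`, `resetBus`, and `busResetCount` are hypothetical names, and the "register" is an ordinary variable standing in for a hardware status register so the code can be compiled on a PC.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the hardware side. On an AVR the flag would be
// a bit in a TWI register and resetBus() would re-initialise the peripheral.
static uint32_t g_busResets = 0;  // number of times a wait loop gave up

void resetBus() {
  ++g_busResets;  // placeholder for the real bus reset
}

// The unbounded original pattern looks like:
//   while (!(reg & mask)) { }   // never exits if the flag never appears
//
// Bounded version: spin until (reg & mask) is non-zero, but give up after
// maxIterations passes. Returns true if the flag was seen, false on timeout.
bool waitForFlag(const volatile uint8_t& reg, uint8_t mask, uint32_t maxIterations) {
  for (uint32_t i = 0; i < maxIterations; ++i) {
    if (reg & mask) return true;
  }
  resetBus();  // recover instead of deadlocking
  return false;
}

// Accessor so the main program can log how often recovery was needed.
uint32_t busResetCount() {
  return g_busResets;
}
```

An iteration count is the simplest bound to add, but its real-time duration depends on the CPU clock and compiler; a timeout expressed in microseconds avoids that at the cost of reading a timer inside the loop.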

11 August 2018 Update:

The sensor is still going strong after 44 hours with no hangups, and the reset counter is still holding at 3.

The complete twi.c & twi.h codes are included below:

Stay tuned!

Frank