No, I am not going to repeat the lyrics of the famous Doors’ song here. It’s just that I have decided to start a new post for reporting my progress and open issues in porting my driver to dahdi. To begin by a short summary, dahdi porting is going very well, modulo some problems that I think I will be able to resolve sooner or later. All this makes me feel like when approaching land after a long sea journey, hence the title of this post.
Here are the details (nasty as usual).
My new module, called oufxs.ko, loads successfully and registers with dahdi-base without any problems. I have now fixed the “location” of the card, so dahdi_cfg when the card is first plugged into a USB port reports e.g.:
 active=yes alarms=UNCONFIGURED description=Open USB FXS board 1: setup not started name=OUFXS/1 manufacturer=Angelos Varvitsiotis devicetype=Open USB FXS location=USB 2-1 device #6 basechan=1 totchans=1 irq=0 type=analog port=1,FXS
OK, you noticed the “USB 2-1 device #6″ above; did you also notice the “description” message with the “setup not started” note? This is thanks to the board initialization worker thread, that changes the description string as it goes along with initialization, so successive calls to dahdi_scan report the following, in the order listed here, with changes from one step to the next taking approx. half to one second, depending on the initialization step:
description=Open USB FXS board 1: setup not started description=Open USB FXS board 1: SLIC sanity check description=Open USB FXS board 1: initial powerdown description=Open USB FXS board 1: set indirect regs description=Open USB FXS board 1: set dc-dc convrtr description=Open USB FXS board 1: dc-dc cnv powerup description=Open USB FXS board 1: VBAT pwrleak test description=Open USB FXS board 1: ADC calibration description=Open USB FXS board 1: Q5/6 calibration description=Open USB FXS board 1: LBAL calibration description=Open USB FXS board 1: final step setup description=Open USB FXS board 1: up-and-running
BTW, dahdi_tool also reports correctly the board and displays the description, refreshing it whenever the current board status changes. The “alarms=UNCONFIGURED” line is the next thing to worry about. The dahdi_cfg utility needs to be executed to configure signaling before the board can be used by dahdi. After some guesswork, I think that the only configuration really required in /etc/dahdi/system.conf is a line containing “fxols=N”, where N is the the dahdi channel for the board (1, if no other dahdi hardware exists on the system). After that, dahdi_scan reports:
active=yes alarms=OK description=Open USB FXS board 1: up-and-running name=OUFXS/1 manufacturer=Angelos Varvitsiotis devicetype=Open USB FXS location=USB 2-1 device #7 basechan=1 totchans=1 irq=0 type=analog port=1,FXS
At this stage, “fxstest” can be used to play with the board. For example, “fxstest /dev/dahdi/1 stats” proudly reports:
TIP: -3.3840 Volts RING: -27.8240 Volts VBAT: -36.8480 Volts
(yes, this is the low-power mode that Edwin suggested in his comment to my previous post), while “fxstest /dev/dahdi/1 regdump” outputs the following, very enlightening dump:
Direct registers: 0. 05 1. 28 2. 01 3. 00 4. 01 5. 00 6. 00 7. 00 8. 00 9. 00 10. 08 11. 33 12. 00 13. 00 14. 00 15. 00 16. 00 17. 00 18. 00 19. 00 20. 00 21. 00 22. ff 23. 01 24. 00 25. 00 26. 00 27. 00 28. 00 29. 00 30. 00 31. 00 32. 00 33. 00 34. 00 35. 00 36. 00 37. 00 38. 00 39. 00 40. 00 41. 00 42. 00 43. 00 44. 00 45. 00 46. 00 47. 00 48. 00 49. 00 50. 00 51. 00 52. 00 53. 00 54. 00 55. 00 56. 00 57. 00 58. 00 59. 00 60. 00 61. 00 62. 00 63. 54 64. 11 65. 61 66. 01 67. 1f 68. 04 69. 0a 70. 0a 71. 00 72. 10 73. 02 74. 2c 75. 28 76. 00 77. 00 78. 10 79. 42 80. 09 81. 49 82. 62 83. 62 84. 01 85. 16 86. 00 87. 00 88. 00 89. 00 90. 00 91. 00 92. ca 93. 0c 94. 65 95. 00 96. 00 97. 00 98. 0b 99. 12 100. 0f 101. 0f 102. 05 103. 46 104. 05 105. 27 106. 02 107. 08 108. eb Indirect registers: 0. 55c2 1. 51e6 2. 4b85 3. 4937 4. 3333 5. 0202 6. 0202 7. 0198 8. 0198 9. 0611 10. 0202 11. 00e5 12. 0a1c 13. 7b30 14. 0063 15. 0000 16. 7870 17. 007d 18. 0000 19. 0000 20. 7ef0 21. 0108 22. 0000 23. 2000 24. 2000 25. 0000 26. 2000 27. 4000 28. 1000 29. 3600 30. 1000 31. 0200 32. 0ff0 33. 7f80 34. 0ff0 35. 8000 36. 0320 37. 0010 38. 0010 39. 0010 40. 0000 41. 0c00 42. b4f9 43. 1000 44. 0000 45. 0000 46. 0000 47. 0000 48. 0000 49. 0000 50. 0000 51. 0000 52. 0000 53. 0000 54. 0000 55. 0000 56. 0000 57. 0000 58. 0000 59. 0000 60. 0000 61. 0000 62. 0000 63. 0000 64. 0000 65. 0000 66. 0000 67. 0000 68. 0000 69. 0000 70. 0000 71. 0000 72. 0000 73. 0000 74. 0000 75. 0000 76. 0000 77. 0000 78. 0000 79. 0000 80. 0000 81. 0000 82. 0000 83. 0000 84. 0000 85. 0000 86. 0000 87. 0000 88. 0000 89. 0000 90. 0000 91. 0000 92. 0000 93. 0000 94. 0000 95. 0000 96. 0000 97. 0000 98. 0000 99. 4bfe 100. b6c4 101. 9ea5 102. 99ef 103. bffe 104. c699
Notice in the above dunp that unimplemented or reserved 3210 registers are not probed for their contents, and their contents are printed as zeros.
How about actual audio? So far I have implemented only the OUT part (if you ask why not also the IN part, I was too lazy — or precautious, if you prefer – to start writing the IN leg of the code without having yet tested thouroughly the OUT one). The OUT part appears to be working OK: using “fxstest /dev/dahdi/1 tones” as a test, I hear various beeping tones, including “dial tone”, “ring tone”, “busy tone”, etc., in succession. A few audible clicks from time to time can, according to my former experience, be attributed to VMWare’s sploppiness, so no worries here, although I owe to test more thoroughly under native (non-virtualized) linux. Provided that everything goes OK with the OUT part, I trust that the IN part will not be that hard to write as well. Summary: dahdi does fine with sending audio to my board by means of the new driver, and receiving audio does not look like an issue at all.
So far, so good. Now come the bad news: “fxstest /dev/dahdi/1 ring” (or “fxstest /dev/dahdi/1 polarity, which also sets the phone set ringing for a very short while) cause a complete system hang :-(. So far, I have not been able to find a good explanation for that. The nature of the hang is similar to what I have seen when trying to lock a spinlock twice, and by means of various debugging attempts (e.g., by not loading a tonezone in /etc/dahdi/system.conf and expecting the code circa line 5323 of dahdi-base.c to return -ENODATA since no zone is loaded, but contrarily to what is expected, getting another system hang), it could be related to the locking of chan->lock in line 5324 (my version is from debian source package dahdi-linux-18.104.22.168~dfsg). Why would this lock hang the system is a little mystery to me, but I trust I ‘ll be able to debug the situation further, if not otherwise, by means of extensive printk statements in the dahdi code.
There is also another caveat that I need to resolve. Wctdm.c, whose logic I am mimicking in my new driver, reads or sets various board and ProSLIC (3210/3215 in Digium boards) registers in interrupt context. As an example, while servicing a PCI interrupt from the board, if the data received indicates a 3210 hook change alarm, the hook-state ProSLIC register 68 is checked in-band. OK, this is fine if you work with a PCI board, where setting a 3210 register only involves a few inb/outb instructions, however it’s a disaster with USB, where one needs at least one bulk OUT packet (and its confirmation, hence a 2-ms separate I/O sequence) in order to communicate with the board. So, I guess that I need to somehow embed such commands within the isochronous OUT stream. This is not as hard as it sounds. The above example (checking the hook state) is already serviced in this manner (status is embedded in the IN packet stream) in my former driver, so copying it here is no problem. But embedding commands into the OUT stream will definitely need some additional work and will require a firmware upgrade.
Meanwhile, Alok Prasad (see his comments in previous posts) has reported an issue with porting the original openusbfxs driver to kernels prior to 2.6.20. The problem lies in usb_anchor, which was introduced in late kernel versions. In my reply to Alok, I am holding that
…the anchor is a place where pending URBs can be stored in order to cancel them easily if need be (e.g., if the driver unloads). It is relatively easy to craft a workaround, because in my code the active URBs are known at any time: I use usb_bulk_msg() for bulk IN/OUT, so the anchor is not used there. For isochronous IN/OUT, if one does not have an anchor, one can cancel active urbs by walking through dev->outbufs and dev->in_bufs and checking for the state of each of those: if dev->XXXbufs[i].state==st_sbmtd then dev->XXXbufs[i].urb is submitted to the core and must be canceled in case of a disconnect or driver unload. With this workaround, the anchor can be avoided.
While I keep myself busy with hunting nasty spinlock bugs and with other ugly details of dahdi porting, if any reader of this blog can assist by coming up with the above usb_anchor workaround, this would be of great help.
This is the situation right now. I am going to report further progress later in this post. Apart from the worrysome system hang, everything else looks really good! Maybe the board’s “hello, Asterisk!” day is not anymore that far away. Even though this day is not today or tomorrow, or not even next week, I have to admit that it certainly looks much, much closer now than when I first started the project. After all, this post is great news: while not yet in the harbor, I can see land ahead — hence the title.
Update, April 17: The kernel lockup/hang mystery has just been solved! It seems I fell a victim of my own thoughts when I wrote
OK, this is fine if you work with a PCI board, where setting a 3210 register only involves a few inb/outb instructions, however it’s a disaster with USB, where one needs at least one bulk OUT packet (and its confirmation, hence a 2-ms separate I/O sequence) in order to communicate with the board.
just a few paragraphs above. It turns out that the kernel lockup nightmare was hidden exactly in this paragraph. Why? Because, what is by far worse than the 2ms delay that I was babbling about, is the fact that bulk USB communication between the driver and the board is based on usb_bulk_msg(), which presumably is using blocking I/O. This means that the invoking process issues a bulk OUT URB and is then put in a sleep state while waiting for a URB completion or a timeout. Now, here is what was happening: when the DAHDI_TXSIG_START ioctl was handled by the driver, I had chosen to do the same thing that wctdm.c does: write 3210′s DR 64 of course, to set the phone ringing! This operation is done with the channel spinlock locked (chan->lock — this lock is acquired by dahdi-base right before invoking the ioctl handler in the hardware device driver), and at the same time, the isochronous engine was running, meaning that dahdi_transmit() would be called eight times each millisecond. What is one of the first things that dahdi_transmit() does? Acquire the channel spinlock, of course! So, the channel lock was acquired by dahdi-base on behalf of fxstest, then the calling process for fxstest went to sleep for two milliseconds (one to transmit the bulk OUT URB for setting DR 64, and one to receive a confirmation). To begin with, letting a process hold a spinlock while going to sleep is a very serious kernel programming error that can result in lockups. Meanwhile, just to make sure that the lockup would occur, the background isochronous engine thread was still running and invoking dahdi_transmit, which tried first-thing to acquire the same spinlock that was being held by the sleeping fxstest process. This was causing a kernel lockup. Bingo! I verified my guess by removing the dr_write() macros that set various DRs in all places but the initialization worker thread. And — yes! — the kernel hang disappeared. Of course the phone set did not ring, since DR64 is not yet being set, however my driver did no more lockup the kernel. Now I have resolved this issue, I am going to fix these to use isochronous OUT piggybacking. This means that, if no other unpleasant surprises wait for me down the road, the dahdi driver will be ready within a few days from now!
Update, April 18: I have implemented the DR set piggybacking logic in both the driver and the firmware. It did not work right away, so some testing and debugging is still needed, but the basis is there and there should be no fundamental error in the logic. What’s nice is that the kernel does not hang anymore on the DAHDI_RING ioctl. I do get a couple of serious errors though. When disconnecting the board, fxstest receives a SIGSEGV and the module count remains 1, even though the program that has opened the device gets killed. As a result, the module cannot be unloaded anymore and becomes a zombie. Some fixes here and there cured another kernel hang when the board was unplugged and then plugged back in. All these I guess are race conditions, as the usb core calls the module’s disconnect() method, which in turn starts dismantling the device instance, while the device is still hooked within dahdi. Anyway, I ‘ll soon resolve these errors, I ‘m sure.
Update, April 20: I think I fixed the module reference errors. The most significant error was that I was calling dahdi_unregister() too early, and thus the close() that occurred when the calling process exited was not received by my driver. Hence, the module’s reference count was left equal to one and the module could not be unloaded. Moving dahdi_unregister() from the oufxs_disconnect() callback to the oufxs_delete() one fixed it (you may check the source of the older driver to see what these do). So the driver looks stable now, does not hang or freeze the system and does not put the module into a zombie state anymore. Next thing to do is to debug the piggyback register setting capability. Once this is fixed as well, all that’s left is the IN isochronous engine, and then the driver will be ready!
Update, April 20 (later): Piggybacking of register set commands on OUT packet headers now works perfect and “fxstest /dev/dahdi/1 ring” rings the phone!
Update, April 22: One more nasty bug resolved: when lowpower mode was set to zero (normal power mode), the phone refused to ring! I chased this down to a wrong 3210 register setting during longitudinal balance mode calibration, due to a stupid copy-paste error of mine (I think that I ‘ve already written quite a few times in this blog that code copy-paste-due bugs were by far the worst ones I ‘ve dealt with). The fix to this bug resolved also an intolerable noise issue in audio, which I had swept under the rug for the time. Perfect! Another bug still remains, but this is kernel-related, so — I guess — easier to fix: when I plug the board with the module already loaded and then right away, before the board initializes, I reload the driver, I get a kernel freeze. By now, I ‘m experienced enough to suspect that this is a double spinlock locking, due to a race condition that occurs when the board plugging with the old module instance causes dahd_register() to be invoked shortly before or at the same time that disconnect() is called, which starts dismantling the device. Ugly, but definitely fixable. So, my short-term roadmap is to resolve this last lockup bug, then to write the IN isochronous engine (and hook/dtmf monitoring) and then – provided that no new bugs come up, of course – to publish the new driver code, along with the modified firmware. I have forgotten to mention that, in the meantime, I have assembled one new version-2 dongle board, which is now patiently waiting for me to finish with kernel driver bug-hunting in order to program and test it! So now — unless of course the new dongle blows in smoke on my face – things feel really like approaching the harbor, aren’t they?
Update, April 22 (later): The IN part works now (of course, after fixing another nasty copy-paste bug — I was using the OUT USB endpoint instead of the IN one). The audio quality is perfect in both playout and recording. It’s not ready, of course: I need to add the hook state checks, dtmf state, statistics, and a whole lot more (especially the echo canceller, which is one more thing that I have been sweeping under the rug all this time). However, by now the “ship” is in a “safe harbor”: Open USB FXS works with Dahdi, full stop. So, this is going to be my last update for this post. Prepare for the next one — titled “Hello, Asterisk!” or something similar (I ‘m sure you are guessing correctly it is going to be about, aren’t you?).