Archive for March, 2010

Some interesting (?) updates

March 24, 2010

I am writing this post to provide a quick update on all open fronts of the project. These are (a) the possibility for readers to obtain a development board or DIY kit in order to help advance the project, (b) the dongle board and its issues, and (c) the “channel driver vs. dahdi-compatibility” saga. There’s news on all of these fronts, but don’t hold your breath: I am not going to announce anything really spectacular.

First, let me get the poll (which I have once more included above) out of the way. I have now set a closing date for it: the end of the current week (March 28th at noon in my timezone). As of today, there are already twelve replies. Wow, a team of twelve prospective developers/testers is already a small army! After the poll closes, I will try (no promises yet) to do all that is required to give interested readers the possibility of ordering a prototype at the bare cost of materials + postage/packing + assembly (the latter only for ready-made boards). Details will follow, as I need to check my options: for example, if I can convince a local e-shop to run an errand for me, I might make prototypes/DIY kits available through their web site; otherwise I could use eBay and PayPal. I’ll announce more on that later on.

My second piece of news is that I have (somewhat) debugged the heat dissipation issue in my dongle board. Using the “touch-n-burn-your-fingers” technique, popular among electronics enthusiasts, I was able to trace the heat source. And, surprise: although the 3210 gets hot, it is not the primary source of heat, as I had thought. It is the line driver chip (Si3201), mounted on the bottom side of the board, that dissipates the most heat. The 3201 heats up quickly, and its proximity to the 3210 makes the latter heat up quickly as well. Why didn’t this show up on my large-form boards? Because there, the hot 3201 is relatively isolated from the other heat sources (PIC, 3210) and also has a very good heat sink: a large ground-plane copper area on the bottom side of the large-form board.

Back to the dongle: I am not a specialist in computing the thermal resistance of PCBs, but I think I can do a couple of things in a redesign to fix the problem. The first is to relocate the line driver chip as far away as possible from the 3210. The second is to use a four-layer board and turn the middle two layers into heat sinks. This would work by adding a large-hole via underneath the 3201, connected to the thermal pad that is already there. A generous blob of solder would serve as a thermal conductor between the thermal pad and the middle layers. Then, I could use the same method (thermal connections between layers through large-hole vias) to create “heat egress points” on the two surface layers of the board, onto ground areas located as far away as possible from the 3210.
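To get a feel for how much a via can carry, here is a minimal first-order sketch of the thermal resistance of a plated via barrel, R = L / (k·A), with copper at roughly 385 W/(m·K). The dimensions used in the example (1.6 mm board, 0.3 mm drill, 25 µm plating) are assumptions for illustration, not the dongle’s actual stack-up, and the model ignores solder fill, spreading resistance, and the shorter path down to an inner layer.

```c
#include <assert.h>

#define K_COPPER 385.0                 /* W/(m*K), approximate value for copper */
#define PI       3.14159265358979323846

/* Thermal resistance (K/W) of one via's plated barrel, modelled as a copper
 * tube of length len_m: R = L / (k * A), A being the annular cross-section
 * between the drilled wall and the inner surface of the plating. */
double via_rth(double len_m, double drill_m, double plating_m)
{
    assert(plating_m > 0.0 && 2.0 * plating_m < drill_m);
    double r_out = drill_m / 2.0;
    double r_in = r_out - plating_m;
    double area = PI * (r_out * r_out - r_in * r_in);
    return len_m / (K_COPPER * area);
}

/* n identical vias in parallel: the thermal resistance divides by n. */
double vias_rth(int n, double len_m, double drill_m, double plating_m)
{
    assert(n > 0);
    return via_rth(len_m, drill_m, plating_m) / n;
}
```

With these assumed numbers, a single via comes out at roughly 190 K/W and nine in parallel at roughly 21 K/W. This is why thermal pads are commonly stitched with arrays of vias; a larger, solder-filled via would conduct better than the bare barrel modelled here.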

Other ideas include (i) an adhesive heat sink mounted on the Si3201 (see picture on the left; it’s not very expensive, but it will defeat the low-profile design of the bottom side of the board) and (ii) redesigning the circuit and the board to revert to the discrete, transistor-based output stage shown in the Silabs reference design. I am tempted by idea (ii), since it might reduce the cost of the board. However, it is quite a lot of work, and fitting another twelve or so components onto the tiny dongle board is a challenge, so I really don’t know for now; maybe later…

Note: all the above babbling means that if you choose to manufacture your own dongle before I devise, test and publish one of these thermal fixes, you risk working with two hot chips, which may then work unreliably or even burn out. Probably the quickest patch is a heat sink, so I’ll try this ASAP and report back with the results.

That having been said, let me now get to the dahdi-driver-versus-channel-driver saga.

Over the last few days I have been revisiting the Dahdi drivers. As it turned out, having written my own device driver for the board (or at least a first attempt at one), I could understand much more easily what is going on in the Dahdi world. [Note: since the structure of Dahdi’s predecessor, Zaptel, is very similar, for the rest of this discussion I’ll stick with Dahdi and assume that more or less the same things hold for Zaptel as well.]

I first studied wctdm.c. This is the source code of a Linux module implementing a device driver for a family of Digium PCI-based FXS and FXO cards that use the Si3210. Some readers may remember that I have, ahem, “borrowed” some initialization values for the chip’s indirect registers in my test code from this very source. Modulo a few changes in register values (e.g., 1 instead of 0 for the TXS/RXS direct registers, different values for the indirect registers that monitor the power level of the line output transistors, etc.), the code that handles the 3210 in this file could just as well manage my board too. On the other hand, this is a PCI driver, while mine is a USB one. I’ll come back to that in a few paragraphs.

Then there is another kernel module, dahdi-base.c. In terms of the Linux module hierarchy, this module exports symbols for a set of common functions used by all the hardware-dependent device drivers. For example, unlike what I have done in my “openusbfxs” driver, the fops (file operations) section that includes open(), release(), read(), write() and friends is implemented in dahdi-base, not in hardware-dependent device drivers like wctdm.c.

What I found very interesting is that the only userland-triggered functions that need to make it through to the hardware-dependent device drivers are ioctls. The read(), write(), poll() and friends are implemented entirely in dahdi-base.c and do not require any sort of “hooks” in the device drivers. As for read() and write(), there is a fully asynchronous interface between the hardware-dependent drivers and dahdi-base. Here is how I think this works.

Each hardware device driver like wctdm.c implements a dahdi-compliant device structure, which in turn contains a set of channel sub-structures, one for each actual interface that the card implements (remember that a card may implement, e.g., four or eight FXS or FXO interfaces; I won’t discuss trunk cards like E1/T1 for now). The hardware device driver implements an interrupt-level automaton (in the case of wctdm.c, triggered by PCI IRQs) that inputs and outputs audio data at the pace of the hardware. The device driver reads and writes data to buffers in the device structure and then invokes two functions, dahdi_transmit() and dahdi_receive().

These latter functions implement a circular structure made out of a set of buffers. The read() and write() syscalls implemented in dahdi-base.c read data from, or place data into, respectively, these same buffers, which alternate between dahdi-base and the hardware device driver. This buffer structure does not really require locking between the device driver and dahdi-base, because buffer “ownership” is only modified by the device driver, and this happens only at interrupt level (when the device is ready to read or write more data), by invoking dahdi_transmit()/dahdi_receive().
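The general idea of lock-free buffer hand-over can be illustrated with a generic single-producer/single-consumer ring in C11. This is a sketch under assumptions, not DAHDI’s actual data structures: the ring depth, buffer size and all names are hypothetical. Each index has exactly one writer (the interrupt side advances `head`, the syscall side advances `tail`), so neither side needs a lock, and the release/acquire pairs order the buffer copies against the index updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define NBUFS   4   /* hypothetical ring depth (a power of two)  */
#define BUFSIZE 8   /* one 1 ms chunk of 8 kHz audio             */

struct ring {
    unsigned char buf[NBUFS][BUFSIZE];
    atomic_uint head;   /* written only by the interrupt-side producer */
    atomic_uint tail;   /* written only by the syscall-side consumer   */
};

void ring_init(struct ring *r)
{
    assert(r != NULL);
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Interrupt side: publish one filled buffer; false means overrun. */
bool ring_produce(struct ring *r, const unsigned char *samples)
{
    unsigned h = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned t = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (h - t == NBUFS)
        return false;                    /* the reader is too slow */
    memcpy(r->buf[h % NBUFS], samples, BUFSIZE);
    /* Release: the copy above becomes visible before the new head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
    return true;
}

/* Syscall side: take one buffer; false means nothing to read yet. */
bool ring_consume(struct ring *r, unsigned char *out)
{
    unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned h = atomic_load_explicit(&r->head, memory_order_acquire);
    if (h == t)
        return false;
    memcpy(out, r->buf[t % NBUFS], BUFSIZE);
    /* Release: the slot is handed back only after it has been copied. */
    atomic_store_explicit(&r->tail, t + 1, memory_order_release);
    return true;
}
```

In a real driver the two sides run in different contexts (interrupt vs. process); a single-threaded call sequence is enough to exercise the overrun and empty cases.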

This looks very similar to the way my openusbfxs device driver works! The main difference is that in my current openusbfxs driver, data are not pushed to or pulled from an “upper-layer” driver like dahdi-base, but are instead interfaced directly to the read() and write() syscalls. However, because of the fine-grained locking involved, my driver may be imposing some overhead that dahdi does not have. In other words, it seems that, as an amateur device driver writer, I may have introduced far too much complexity into my design, and things could become considerably faster by avoiding locking altogether, like Dahdi does.

Hmmm… Presumably, with my experience from writing the “openusbfxs” driver, I could reuse much of the code in wctdm.c, replacing the PCI interface with the USB core interface and removing much of the fine-grained locking that my driver is based on. Since my h/w driver does not need to be visible at the Linux filesystem level, I could remove the fops section altogether. For I/O, all I would need to do is invoke dahdi_receive() as soon as an IN URB arrives and dahdi_transmit() when an OUT URB is ready to ship. Finally, I would need to implement the ioctls with their Dahdi names (excluding some device-specific LED/lamp flashing ioctls), and that’s largely all there is to it! [OK, there is also echo cancellation to take care of, but I think it won’t be very hard to add a posteriori.]

Which means that, unless I am missing something big here, rewriting my device driver for Dahdi is much, much easier than writing a channel driver from scratch, especially taking into account the tons of functionality already implemented in the Dahdi channel driver that I would otherwise need to duplicate.

So, over the next few days I am going to start down this path and report as I pass some major milestones (e.g., basic module, USB working, board initialization, Dahdi registration and Dahdi I/O). I certainly hope the results will come faster than in my previous attempt, but if I were you, I would not hold my breath. I have gone in the wrong direction before (this blog is the very proof of that; just check some of the older posts), and it is not unlikely that things will go wrong this time as well.

As usual, I’ll be updating this post as I go on, so you may want to re-check it periodically to see (if and) how work is progressing.

Update, March 29: The poll is now closed. The results are at the top of this page. I have not yet decided exactly how to proceed, but I’ll let readers know really soon. If anyone intended to participate in the poll but missed the closing date (or only found out about it after the closing date), no worries: if I produce boards or kits, I am going to leave some headroom and make larger quantities available.

Besides that, I am redesigning the output stage of the dongle to make room for on-board heat sink areas for the 3201, as far away from the 3210 as possible. Here is the redesign in its current state:

As you can see, the chip is now placed underneath a relatively empty top-side area, which will be covered by the GND fill polygon. Hopefully, this area will conduct much of the generated heat to the air surrounding the board. Moreover, this whole area is near the RJ11 plug, and hence close to an opening in the dongle’s case that will provide some ventilation if needed. Finally, since the top PCB side near the hot chip is now almost clear of components (there are still a few that I need to move away), a normal external heat sink could be mounted directly on the top-side GND copper area if needed.

BTW, the Dahdi-compatible kernel module is also underway. Currently, it just loads but doesn’t do anything useful yet (it doesn’t even invoke dahdi-base functions). Once I finish my next dongle design attempt, I’ll definitely get more active on that; stay tuned…

Update, March 31: the heat-revised dongle design is now ready. Here is what it looks like: 

As you can see, I have placed a lot of “free copper” on the top side of the board’s “hot area”, in the hope that this will be enough. In addition, I have removed the solder mask from most of this area to ease heat radiation. If all this proves insufficient, there is also enough room to add two adhesive heat sinks (this is mainly why I have removed the solder mask). I hope all this will suffice, but to be sure, I’ll order and assemble one or two prototype PCBs. If these work OK, I’ll be ready to go on with ordering the necessary parts for DIY kits or ready-made boards.

Update, April 6: On the dongle front, I have ordered two or three prototypes of my new design (shown above) and am waiting for them to arrive. Salva has provided a very useful starting version of a project shopping basket at Mouser.com (see his comment on this post for more information). On the driver front, I am now rewriting some of the initialization code and doing a lot of thinking about other parts of the code. Here is a question, if anyone knows: when I get notified of a URB completion, which means that I have received several (typically four or eight) 1-millisecond chunks, should I call dahdi_receive() repeatedly to deliver all the received data to dahdi-base, or should I instead deliver one 1-millisecond chunk per system “tick”? In the meantime, if you are really interested in the dahdi driver, a very useful resource is Tzafrir Cohen’s page on Dahdi-Linux, especially the low-level drivers section. It seems that many of the questions I will have while writing my new module will find answers there.

Update, April 8: I had forgotten to upload all my changes, including statistics gathering, modifications to the test programs to display statistics, SOF profiling, and the fix to tmr1_isr.asm that synchronizes the board’s clock to the USB SOF. They are now all uploaded to the project’s Google Code page. Please note that the changes to the openusbfxs kernel module are by now obsolete in a sense, since the focus of the project has moved to creating a dahdi-compliant kernel module; however, I am going to reuse nearly everything from my old module, so it’s a good idea to review the changes if you are actively interested in the code. On the new module front, I have stumbled upon this bug (I am developing on a 2.6.26 kernel and kbuild environment) but have found two workarounds: (1) compile from the top dahdi directory, with the new module in a subdirectory of $(TOP_DAHDI)/drivers/dahdi, adding the environment variable SUBDIRS_EXTRA; and (2) after compiling dahdi, copy the Module.symvers file generated in $(TOP_DAHDI)/drivers/dahdi into my own module’s directory and issue a “make” in that directory. So, I am now able to insmod my new module. In a matter of days, I am going to report on my first tests (and crashes, if any :-)).

Update (same day, later on): the new module recognizes the board (as expected) and registers itself successfully with dahdi-base. Here is what dahdi_scan reports:

[1]
active=yes
alarms=UNCONFIGURED
description=DAHDI_DUMMY/1 (source: HRtimer) 1
name=DAHDI_DUMMY/1
manufacturer=
devicetype=DAHDI Dummy Timing
location=
basechan=1
totchans=0
irq=0
type=analog
[2]
active=yes
alarms=UNCONFIGURED
description=Open USB FXS board 0
name=OUFXS/0
manufacturer=Angelos Varvitsiotis
devicetype=Open USB FXS
location=USB ??? - FIXME
basechan=1
totchans=1
irq=0
type=analog
port=1,FXS

Not bad at all, is it? To be consistent with Dahdi numbering, some ‘0’s should probably read ‘1’. Also, the location “USB ??? - FIXME” is printed because the driver is expected to report the USB path of the attached device there, but I haven’t put in the effort to fix that yet.

Update, April 9: My new prototype dongle boards (3 of them) have arrived and look OK. I am now refreshing my BOM and am about to order some parts that I am missing.

Update, April 13: Board initialization with the new driver is now complete (well, almost: the initial URB submissions to get the isochronous engine rolling aren’t in place yet, but I’ll add them soon). I have implemented two new module parameters taken from wctdm.c. “reversepolarity” does what it says, i.e., causes the driver to use the reverse-active linefeed mode (Si3210 direct register 64). “lowpower” instructs the 3210 to work with an on-hook voltage of 24V and a ringer peak voltage of 50V, as originally hinted by Edwin in his comment (see also my last reply to that comment). BTW, it seems that low-power mode does indeed result in less heat dissipation on the 3201; thanks for the hint, Edwin! Now I can probably work somewhat longer with my (unrevised) dongle without fearing that it will burn on me. I have left out the tx- and rx-gain parameters for the time being, as I have also done with all the MWVI code and module parameters (MWVI stands for “Message Waiting Visual Indication” and is used to flash a light bulb on a phone equipped with one when a voice message is waiting; I guess I could use the board’s LED for a similar visual indication, but I’m not going to do this right now, maybe later…).
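As a hedged illustration of what “reversepolarity” amounts to, the sketch below picks the linefeed value written to direct register 64. It assumes the Si3210 datasheet’s encoding of the low three bits of that register (001 forward active, 101 reverse active); the write hook and all function names are hypothetical, not taken from wctdm.c or the openusbfxs driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SI3210_DR_LINEFEED 64    /* direct register 64: linefeed control */
#define LF_FORWARD_ACTIVE  0x01  /* LF[2:0] = 001 */
#define LF_REVERSE_ACTIVE  0x05  /* LF[2:0] = 101 */

/* Hypothetical hook that writes one Si3210 direct register
 * (on the board, this would go out over USB). */
typedef void (*dr_write_fn)(unsigned char reg, unsigned char val);

/* Put the line into its active state, honouring "reversepolarity".
 * Returns the linefeed value that was written. */
unsigned char set_active_linefeed(bool reversepolarity, dr_write_fn write)
{
    assert(write != NULL);
    unsigned char lf = reversepolarity ? LF_REVERSE_ACTIVE : LF_FORWARD_ACTIVE;
    write(SI3210_DR_LINEFEED, lf);
    return lf;
}

/* Demo hook that only remembers the last write. */
unsigned char last_reg, last_val;
void record_write(unsigned char reg, unsigned char val)
{
    last_reg = reg;
    last_val = val;
}
```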

Update, April 14: I have now ordered materials for three rev-b dongle prototypes. I guess my first rev-b prototype will be assembled by the end of this week (will it work, though? fingers crossed…). On the dahdi driver front, I am ready to write the code for the isochronous engine. It still bugs me that I am not sure how often I need to tick dahdi_transmit()/dahdi_receive(). Theoretically, these two should be ticked once every millisecond. However, in dahdi_dummy.c (the HR-timer-based dahdi timing module), the code seems to tick dahdi_transmit/receive four consecutive times every four milliseconds. So, presumably, I could do the same by calling the two functions N times at each URB completion, without incurring too much inaccuracy [it might help to recall that in isochronous USB it makes sense to transfer N (N ≥ 4) packets per URB, so I get a completion callback only once every N milliseconds]. Thus, I’ll make my first attempt along these lines and report on how well it works.
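The “tick N times per URB completion” plan can be sketched as follows. Everything here is hypothetical: a stub channel with 8-sample read/write chunks (8 samples is 1 ms at 8 kHz, matching DAHDI’s default DAHDI_CHUNKSIZE) and two function pointers standing in for dahdi_receive() and dahdi_transmit(). For brevity, IN and OUT packets are handled in one call, whereas a real driver gets separate IN and OUT URB completions.

```c
#include <assert.h>
#include <string.h>

#define CHUNKSIZE 8   /* samples per 1 ms tick                       */
#define MAX_PKTS  8   /* hypothetical upper bound on packets per URB */

/* Hypothetical stand-in for a channel's chunk buffers. */
struct chan_stub {
    unsigned char readchunk[CHUNKSIZE];   /* board -> dahdi-base */
    unsigned char writechunk[CHUNKSIZE];  /* dahdi-base -> board */
};

/* Stand-ins for dahdi_receive() / dahdi_transmit(). */
typedef void (*tick_fn)(struct chan_stub *c);

/* Called once per completed URB carrying npkts 1-ms packets: replay npkts
 * ticks back to back, one chunk in and one chunk out per tick. */
void urb_complete(struct chan_stub *c,
                  unsigned char in[][CHUNKSIZE],
                  unsigned char out[][CHUNKSIZE],
                  int npkts, tick_fn receive, tick_fn transmit)
{
    assert(npkts > 0 && npkts <= MAX_PKTS);
    for (int i = 0; i < npkts; i++) {
        memcpy(c->readchunk, in[i], CHUNKSIZE);
        receive(c);
        transmit(c);
        memcpy(out[i], c->writechunk, CHUNKSIZE);
    }
}

/* Demo callbacks: count receive ticks, loop audio back on transmit. */
int demo_ticks;
void demo_receive(struct chan_stub *c) { (void)c; demo_ticks++; }
void demo_transmit(struct chan_stub *c)
{
    memcpy(c->writechunk, c->readchunk, CHUNKSIZE);
}
```

Replaying the ticks back to back keeps the long-term rate exact but delivers audio in N-millisecond bursts; whether that is acceptable to dahdi-base for timing purposes is exactly the open question in this update.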

Back to the roots (of bad audio quality)

March 17, 2010

While contemplating my next step (finishing off a simple channel driver, or trying to plant my device driver into the Zaptel/Dahdi family), I couldn’t help feeling a bit angry at my newly born dongle board, because at first glance it did not seem as perfect as I had expected. Besides the heat dissipation issue that I reported in my last post, I also noticed (and reported in my initial dongle-is-ready post) that the dongle was producing bad-quality sound when recording. This time, though, I was determined not to give up so easily. You see, with the dongle, I had accumulated just too many boards with sound problems, and attributing these problems to bad components on all of them just did not make sense. So, I decided to subject the dongle to a series of torture tests.

Surprisingly, I found that occasionally the recording quality was perfect. This meant that the problem probably had nothing to do with components, but rather with some condition in the digital part of the circuit or, even more probably, some I/O glitch. But what could that be?

It was thus (and very reluctantly, believe me) that I decided to postpone the channel driver work and go back to debugging bad audio quality. What could I find by debugging again now? Well, in contrast with my early development days, when the only tool I had to debug packet loss, reordering, etc. was a USB sniffer, I now had a complete kernel driver to play with. So, I decided to plant some monitoring code into the driver. I’ll cheat a bit by telling you right up front that the debugging findings proved surprisingly interesting.

To understand the debugging, remember that audio data packets contain a header. OUTgoing packets (PC to board) contain a 1-byte sequence number. INcoming packets (board to PC) contain two of those: a sequence number of their own (incremented independently on odd and even packets) and a mirror of the sequence number of the OUT packet received by the board. By making use of the sequence numbers, debugging packet loss and reordering issues would be easy. Apart from packet loss and reordering, there were two other conditions that I needed to debug as well: input overrun (when nobody picks up the data delivered from the board) and output underrun (when there is no data at a time when some must be sent to the board). The driver knew all about these cases and could report them right away.

So, I implemented some statistics counters in the driver, adding of course the necessary pinch of sophistication by having open() reset all the counters, so that I would get a fresh picture each time a user program opened the device anew. I ran my output-only and input-only test programs again, only this time I had inserted a piece of code instructing the programs to report the collected statistics roughly every second (every 8,192 samples, that is, 1.024 s at 8 kHz, to be exact).
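The counter reset and the IN_MSS bookkeeping can be sketched as follows. This assumes a single 8-bit counter that advances by one per packet; if odd and even packets carry independent counters, as described above, one tracker per parity would be needed. The names are hypothetical. The gap is computed modulo 256, so the wrap from 255 to 0 is not counted as a loss; reordering is not handled, and a late packet would show up as a gap of nearly 256.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-open() counters, one per reported column. */
struct fxs_stats {
    unsigned in_ovr, in_mss, in_bad, out_und, out_mss;
};

/* Tracks one 8-bit sequence number stream. */
struct seq_tracker {
    uint8_t expected;  /* sequence number expected next         */
    int     primed;    /* 0 until the first packet has been seen */
};

/* Called from open(): every new user gets a fresh picture. */
void stats_reset(struct fxs_stats *s, struct seq_tracker *t)
{
    assert(s != NULL && t != NULL);
    memset(s, 0, sizeof *s);
    memset(t, 0, sizeof *t);
}

/* Returns how many sequence numbers were skipped just before seq. */
unsigned seq_gap(struct seq_tracker *t, uint8_t seq)
{
    unsigned gap = t->primed ? (uint8_t)(seq - t->expected) : 0;
    t->primed = 1;
    t->expected = (uint8_t)(seq + 1);
    return gap;
}
```

A receive path would then do `s.in_mss += seq_gap(&t, hdr_seq);` for every IN packet, with `hdr_seq` being the (hypothetically named) sequence byte from the packet header.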

An excerpt of the output messages from the output-only test program follows. It may help if I decode the runes for you: IN_OVR stands for “input overrun” conditions (each overrun is reported once, regardless of the number of packets lost), IN_MSS is “missed IN packet sequence numbers” (deduced from gaps in incoming sequence numbers), IN_BAD represents “bad” packets (reported by the USB core as received incorrectly), OUTUND stands for “output underrun” conditions (again, each underrun is reported once), and OUTMSS is for “missed OUT packet sequence numbers” (sequence numbers not mirrored as expected).

IN OVR: 494, IN_MSS: 0, IN_BAD: 0, OUTUND: 250, OUTMSS: 0
IN OVR: 750, IN_MSS: 2, IN_BAD: 0, OUTUND: 250, OUTMSS: 3
IN OVR: 1006, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 89
IN OVR: 1262, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 1518, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 1774, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2030, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2286, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2542, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2798, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 3054, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189
IN OVR: 3310, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189
IN OVR: 3566, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189

The results were quite interesting (well, at least they looked interesting to me): since the program was not reading any of the samples the board was sending, the IN_OVR counter increased steadily. That was OK. Every now and then, a couple of IN packets were missed. This was definitely not OK, and I should investigate further why it happened. Judging by the IN_BAD column, it looked like every now and then the USB core received a “bad” IN packet (which it ignored), and the IN packet next to it was also lost. Hmmmm… why would that happen in pairs? I needed to think it over… The OUTUND column reported a constant number of underruns, which presumably occurred between the time the program open()s the device and the time it starts sending audio. That was perfectly OK. The rightmost column was more intriguing. It looked as if, every now and then, there was a burst of lost sequence numbers. A DEBUG statement in the driver reminded me that I had been “flexible” with OUT packets in my ISR code, allowing odd and even packets to be reversed when there are delays. So, while the driver was expecting, say, mirrored OUT sequence #10, it got back sequence #11. It then set itself to expect sequence #12 (11 + 1), but instead received sequence #10, which it had been missing. It then expected sequence #11, but received sequence #13. And so forth, until one packet got late and the sequence numbers got reversed again.
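The burst pattern in the OUTMSS column follows directly from this expect-last-plus-one rule. The toy model below (hypothetical names, 8-bit sequence numbers as in the packet header) feeds a mirrored stream with some odd/even pairs swapped through that rule: in this model, one swapped pair costs three mismatches, and a run of k consecutive swapped pairs costs 2k + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Expect-last-plus-one rule: after each packet the driver expects the
 * following mirrored OUT sequence number; anything else counts as one
 * missed OUT sequence number. */
unsigned count_out_mss(const uint8_t *mirrored, int n)
{
    assert(n > 0);
    unsigned mss = 0;
    uint8_t expected = mirrored[0];
    for (int i = 0; i < n; i++) {
        if (mirrored[i] != expected)
            mss++;
        expected = (uint8_t)(mirrored[i] + 1);
    }
    return mss;
}

/* Build the stream 0, 1, ..., n-1 (mod 256) with `pairs` consecutive
 * odd/even pairs swapped, starting at index `first`. */
void make_stream(uint8_t *s, int n, int first, int pairs)
{
    for (int i = 0; i < n; i++)
        s[i] = (uint8_t)i;
    for (int p = 0; p < pairs; p++) {
        int a = first + 2 * p;
        if (a + 1 >= n)
            break;
        uint8_t tmp = s[a];
        s[a] = s[a + 1];
        s[a + 1] = tmp;
    }
}
```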

What was by far more interesting was that OUT delays resulting in sequence number reversals, which in turn manifest themselves as bursts of missing sequence numbers, seemed to occur at the same time as bad/lost IN packets. Hmmmm… I’d better make a note of this.

Here were the results I got by running my input-only test program.

IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 498, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 754, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1010, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1266, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1522, OUTMSS: 75
IN OVR: 254, IN_MSS: 2, IN_BAD: 1, OUTUND: 1778, OUTMSS: 76
IN OVR: 254, IN_MSS: 2, IN_BAD: 1, OUTUND: 2034, OUTMSS: 76

As expected, the IN_OVR column reported a steady number of (initial) IN overruns, while OUTUND showed an increasing number of OUT underruns, since nobody was feeding the driver with data. IN_MSS showed two missed IN sequence numbers, but only one OUT sequence number was lost. All this meant that no substantial issues were observed, although one IN packet was lost somewhere along the way.

While it was too early to tell for sure, I suspected that all these signs (especially the “bad” IN packets, where each “bad” packet resulted in a pair of missed IN sequence numbers) pointed to bad synchronization between the ISR and the PC’s USB host controller. A possible explanation was that, while the ISR was diligently clocking the I/O between the PIC and the 3210 at (its own idea of) 256 kHz, it synchronized (or so I thought; more on this later on…) with the host-originated USB SOF (start of frame) only once in a lifetime. Thus, if there was a small drift between the USB clock and the board’s clock, the IN/OUT isochronous traffic from the board would eventually drift too, until the IN frame coincided with SOF (hence it was lost), and the OUT was expected too early or too late (hence an OUT frame was received “late” and the OUT odd/even ping-pong buffers were reversed).

If the above scenario was true, there were actually two caveats in my initial naive design. The first was the assumption that it suffices to synchronize the SOF time between the PC and the ISR code once and only once, and then let the two clocks run in parallel. The second was that the device could run its internal audio I/O at a rate independent of the USB SOF rate (without some sort of feedback/adaptation mechanism, like the feedback endpoint and the variable packet size in audio-class USB devices).

My first step in debugging was to confirm the hypothesis that these two caveats were indeed behind the missed packets. So, I implemented a new simple firmware function that reports, in a 32-byte packet, fifteen values of TMR3, each sampled right after the PIC’s USB engine asserts the SOF bit [with TMR1 interrupts temporarily disabled, and without modifying TMR3, only sampling its value, so that I could deduce the elapsed time exactly by subtracting successive values]. Then, I added a few lines of code to my Windows test driver program to print out the differences between successive TMR3 values (remember that TMR3 runs at one tick per instruction cycle). Here is the result:

 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE3 2EE0 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE3 2EE0 2EE0 2EE0 2EE3 2EE0 2EE3

Converted to decimal, these values are 12,000 instruction cycles most of the time, which at the PIC’s nominal instruction cycle time Tcy (1/12 µs) is exactly one millisecond, but sometimes 12,003 cycles, which is slightly more than one millisecond (1.00025 ms, to be exact). This meant that the two clocks were indeed drifting with respect to one another, with the PC’s USB clock being just a liiiiitle bit slower than the board’s clock, QED! Since I had run this test against my large-form-factor prototype #2, I thought about running it against my dongle board as well. So, here are the results for the dongle:

 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EDD 2EE0 2EE0 2EE0 2EDD 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD

Wow! As it turned out, this time the PC USB host controller’s clock was a bit fast compared to the dongle board’s. No wonder my two working boards (prototype #2 and the dongle) showed slightly different sound problems, then… One less mystery to solve! Now that the culprit for the missed packets had been revealed, the question was how to fix the bug.
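The table values can be reduced to a single number. The sketch below takes successive TMR3 samples, computes their differences with 16-bit unsigned subtraction (exact across a timer wrap for intervals under 65,536 cycles, about 5.46 ms at 12 MIPS), averages them, and expresses the result as a clock offset in ppm, plus the number of 1 ms frames it takes for the board to slip one full frame against SOF. The function names are hypothetical, and a single 14-interval row is a small sample, given that the readings come in 3-cycle steps.

```c
#include <assert.h>
#include <stdint.h>

#define NSAMPLES 15        /* TMR3 samples per report packet             */
#define NOMINAL  12000.0   /* instruction cycles per ms at Tcy = 1/12 us */

/* Differences between successive 16-bit TMR3 samples. */
void tmr3_deltas(const uint16_t s[NSAMPLES], uint16_t d[NSAMPLES - 1])
{
    for (int i = 1; i < NSAMPLES; i++)
        d[i - 1] = (uint16_t)(s[i] - s[i - 1]);
}

/* Mean SOF-to-SOF interval, in board instruction cycles. */
double mean_interval(const uint16_t *d, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += d[i];
    return sum / n;
}

/* Offset of the USB frame length from the board's 1 ms, in ppm. Positive
 * means USB frames are longer, i.e. the host clock is the slower one. */
double offset_ppm(double mean_cycles)
{
    return (mean_cycles - NOMINAL) / NOMINAL * 1e6;
}

/* Frames (1 ms each) until the board gains or loses one whole frame. */
double slip_period_ms(double mean_cycles)
{
    double excess = mean_cycles - NOMINAL;
    if (excess < 0)
        excess = -excess;
    assert(excess > 0.0);
    return NOMINAL / excess;
}
```

For the first prototype row, this gives a mean of about 12,001.3 cycles, i.e. roughly +107 ppm, or one full-frame slip about every 9,300 frames (9.3 s); the first dongle row comes out negative, at about -89 ppm.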

To return to an old and long-lasting theme in this blog, this is exactly why people came up with the audio class in USB: clocks on the host and the device are not the same, and the obvious way to synchronize them is to adapt the sending rate. However, I felt (and still feel) that modifying the firmware to support a composite device with two audio-class devices would be an unnecessary complication. Why? Because, unlike an audio playback environment, where the device instructs the host to adapt its sending rate to the device’s playback rate and there is an infinite buffer of source data, in a VoIP environment there is no globally accepted master clock (there are the remote-side endpoint, the USB host and the local-side USB device, and all of these have different clocks), so data overruns and underruns are inevitable in the end. Still, why add one more source of bad audio by not synchronizing the USB device to the host?

But wait a moment: why should we synchronize to the USB clock by varying the amount of data per packet? If the skew between the USB host and device clocks is not significant, the USB device could just as well, instead of keeping a constant clock and adapting the amount of data per packet, adapt its audio recording and playout rate to the rate of the host! This would probably be impossible if the device had a pure hardware clock; however, with a firmware-generated clock like the one in Open USB FXS, it did not sound impossible at all!

To take that road, I would have to resynchronize my ISR’s notion of time with the SOF signal not just once, but once every 256 ISR invocations (that is, every 1 ms, the rate of SOF). Unfortunately, adding a simple wait-for-SOF check in my ISR code would not suffice, because this would only work if the ISR was faster than the actual SOF rate; in the opposite case, SOF would arrive before my test, the test would always succeed, and the two rates would never be synchronized… After torturing my mind a bit, I came up with the following fix: what if I “stole” a few instructions from the ISR’s schedule (say, once per 256 ISR invocations), so that the 256-invocation cycle would purposefully last slightly less than the nominal 1 ms? Say, 6 PIC instructions, which amount to 0.5 microseconds? Then, I could create a “rendezvous” point in time between the ISR and the SOF signal by means of a simple tight loop spinning until SOF appears. What was so devilish about this method was that it did not require any initial synchronization action: initially, SOF would in all likelihood appear somewhere else in the ISR cycle, so my test would find the SOF bit already asserted. This would result in short-cycle execution, bringing the cycle’s overall duration to 0.9995 ms (11,994 instruction cycles). Thus, the relative position of SOF would gradually shift, occurring later and later in the ISR cycle, until a point was reached where the test code in the ISR found SOF not yet asserted. In that case, the ISR would loop tightly waiting for SOF to come. This would then happen over and over, every 256 ISR cycles (once per SOF). Hence I would achieve synchronization between SOF and my board’s clock, which was exactly what I was after!
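The convergence argument can be checked with a toy model rather than the firmware. On each pass, the board reaches the test point, checks a latched SOF flag, spins until SOF if the flag is still clear, clears the flag and runs a shortened 12,000 − 6 cycle. The SOF periods used in the checks (12,001 and 11,997 cycles) and the start offsets are assumptions loosely inspired by the TMR3 measurements; the model ignores the few-cycle granularity of the real BTFSS/BRA loop.

```c
#include <assert.h>

#define NOMINAL 12000L   /* board instruction cycles in one nominal 1 ms */
#define STOLEN  6L       /* cycles "stolen" once per 256 ISR invocations  */

/* SOFs arrive every sof_period cycles starting at sof_phase; the board
 * first reaches the test point at `start`. Returns the first pass that had
 * to spin (lock acquired) or -1; *spins_after counts later spinning passes. */
long simulate_rendezvous(long sof_period, long sof_phase, long start,
                         long passes, long *spins_after)
{
    assert(sof_period > NOMINAL - STOLEN);  /* board cycle must be shorter */
    assert(sof_phase >= 0 && start >= 0);
    long t = start;             /* board time at the test point             */
    long next_sof = sof_phase;  /* first SOF since the flag was last cleared */
    long lock = -1;

    *spins_after = 0;
    for (long n = 0; n < passes; n++) {
        if (next_sof <= t) {
            /* Flag already set: no wait. The flag latches only once, so
             * consume every SOF that occurred since the last clear. */
            while (next_sof <= t)
                next_sof += sof_period;
        } else {
            /* Flag still clear: spin until SOF arrives. */
            t = next_sof;
            next_sof += sof_period;
            if (lock < 0)
                lock = n;
            else
                (*spins_after)++;
        }
        t += NOMINAL - STOLEN;  /* flag cleared; run one shortened cycle */
    }
    return lock;
}
```

In this model the phase drifts by (SOF period − 11,994) cycles per pass until the test point falls just before SOF; from then on, every pass spins for a few cycles, which is the lock the method relies on. It holds whether the host clock is slightly slower or slightly faster, as long as the drift stays below the 6 stolen cycles.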

All this was easier said than coded (in tight-time-profiled PIC assembly). First, I had to make room for my rendezvous point somewhere in the ISR code. I already had a special case for cycle #31 (counting starts from zero: cycle #31 is the 32nd cycle), so I moved FSYNC pulsing from that place to cycle #0 of the ISR. Setting TXS and RXS back to zero (instead of 1) was also required, but this worked fine (and passed the audio test without worsening the audio quality in recording or playback).
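
Since 256 invocations span 1 ms, each 125 µs (8 kHz) PCM frame corresponds to 32 ISR invocations, cycles #0 to #31. The sketch below is a C model of the resulting schedule, not the actual PIC assembly: FSYNC is pulsed on cycle #0 of every frame, and the SOF rendezvous runs on cycle #31 once every 256 invocations. The text does not say which of the eight frames hosts the rendezvous, so the sketch assumes the last one. `isr_actions`, `IsrActions` and the 8-bit `tick` counter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the ISR should do on a given invocation (illustrative model). */
typedef struct {
    bool fsync;       /* pulse FSYNC to the 3210 in this invocation */
    bool rendezvous;  /* run the SOF rendezvous in this invocation */
} IsrActions;

/* 'tick' is a hypothetical 8-bit invocation counter. Being 8 bits wide,
 * it wraps by itself every 256 invocations, i.e. once per 1 ms USB frame. */
IsrActions isr_actions(uint8_t tick)
{
    uint8_t cycle = tick & 0x1F;     /* 0..31: position within the PCM frame */
    IsrActions a;
    a.fsync = (cycle == 0);          /* FSYNC moved to cycle #0 */
    a.rendezvous = (tick == 0xFF);   /* cycle #31 of the last frame of the 1 ms */
    return a;
}
```

An 8-bit counter is a convenient choice on a PIC, because the once-per-millisecond condition then needs no explicit modulo: the counter simply wraps.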

Then it was the SOF test’s turn. I first removed a similar tight loop from the code, which was supposed to synchronize SOF with the ISR exactly once. I say “was supposed to” because, as I found out later, that test was actually doing nothing, due to a stupid assembly bug: I had used a BTFSC instruction instead of BTFSS, thus testing for a clear bit instead of a set one, and that test always came out true (it’s incredibly easy to make such stupid errors when coding in assembly, and it’s also incredible how long these errors can live until they finally get spotted). With that removed, I planted the once-every-256-invocations test into cycle #31 and, after carefully profiling all the timing, I added the following tight loop for SOF synchronization:

SOFloop BTFSS           UIR, SOFIF, ACCESS      ; C:1/2 break loop if SOF is set
        BRA             SOFloop                 ; C:2 otherwise loop waiting
        BCF             UIR, SOFIF, ACCESS      ; C:1 clear SOFIF for next time

These three lines of code cost me two afternoons of debugging in frustration and despair. Why? Because this code simply froze the board: no signs of activity, no LED flashing, no USB device, nothing. When I #ifdef’d out the three instructions above, everything came back to normal, so it was clear that the PIC was sitting in a tight loop waiting for a SOF that never appeared. The disassembly of the user-level, compiler-generated code for profiling the SOF frequency (which I mentioned earlier in this post, and which worked fine) showed that it used exactly the same instructions as my ISR; only in the ISR, these very instructions were causing the board to stall. In other words, the exact same code worked in one place but froze the board in another. Shoot…

I almost went crazy for two days, trying to figure out why my code locked up the board. Finally, after lots of working hypotheses that proved wrong, enlightenment came: the ISR was starting almost immediately after the board powered up, before any USB negotiation. At that point there was obviously no live USB connection, and therefore no SOF either. Thus the processor just sat in a tight loop inside the ISR, doing nothing to proceed with the USB negotiation (which, in Microchip’s stack, is situated in userland). And since the processor was doing nothing, no negotiation ever took place, and no SOF ever appeared. Bingo! I quickly added a test to activate the tight loop only after at least one SOF had been seen (meaning that negotiation had succeeded), and the board sprang back to life!
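
Rendered in C, the guarded rendezvous could look like the sketch below. It is an illustration under stated assumptions, not the firmware: `uir` stands in for the UIR register, SOFIF is assumed to be bit 6 of UIR (as on the PIC18F2455/2550 family; check the datasheet of the part actually used), and `sof_seen` is a hypothetical flag recording whether any SOF has been observed since power-up. The text does not say how the real code detects the first SOF; here the ISR notices it itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define SOFIF_MASK (1u << 6)   /* assumed SOFIF position in UIR; verify per datasheet */

/* Guarded SOF rendezvous (illustrative). Returns true if the rendezvous
 * was actually performed on this call. */
bool sof_rendezvous(volatile uint8_t *uir, bool *sof_seen)
{
    if (!*sof_seen) {
        /* Before enumeration there is no SOF at all. Spinning here would
         * starve the USB stack running outside the ISR, so never wait:
         * just note the first SOF when it shows up. */
        if (*uir & SOFIF_MASK) {
            *sof_seen = true;
            *uir &= (uint8_t)~SOFIF_MASK;   /* like BCF UIR, SOFIF */
        }
        return false;
    }
    while (!(*uir & SOFIF_MASK))            /* SOFloop: BTFSS / BRA */
        ;
    *uir &= (uint8_t)~SOFIF_MASK;           /* BCF UIR, SOFIF: clear for next time */
    return true;
}
```

The design point is that the tight loop is only ever entered once SOF is known to exist, so the ISR can no longer deadlock against the enumeration it depends on.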

Now it was time for the infallible audio test. I ran the test and, pfzzzt, failure… I kept seeing missed packets here and there. This felt like madness, so I buried myself in debugging again. I added and removed stuff here and there, in the driver and in the ISR, but to no avail: both boards kept crackling from time to time, and the statistics reported missed packets in perfect accordance with the audible problems. However, I quickly noticed a difference, and a big one: the “bad” IN packets had disappeared. This meant that my fix worked after all, and all I had to worry about were the missed sequence numbers.

It was then that my suspicions finally turned in the right direction. What if all this was VMware’s problem? A quick test confirmed the suspicion: raising the priority of the VMware process to the maximum almost eliminated packet losses and the like. Just to be totally sure, I quickly installed Wubi and tested my driver on a natively running Ubuntu Linux system. And the truth shone: no packet losses at all, no crackles, no clicks, perfect sound for as long as I wished to listen!

A good question is why I hadn’t seen these issues before. It seems there are two good explanations. First, by adding more and more code (debugging, statistics, etc.) to the driver, I increased the load to be carried, so occasionally VMware could not cope with it (whereas native Linux proved to have no problem). Second, I was reluctant to admit that there were still things to fix in my so-well-thought-out ISR code, so I tended to attribute occasional sound problems to bad hardware. I had to see similar problems appearing on more boards before admitting the ugly truth and sitting down to debug and fix the firmware.

But it was not over yet… The two test boards (the dongle more frequently than the large-form prototype) were now occasionally producing distorted audio on input. Yes, only in the input direction. With the priority increase in VMware, my statistics showed no packet losses at all. However, sound was from time to time recorded with distortion, somewhat like listening to audio equipment that clips the signal. It took me yet another day to think of associating this issue with the FSYNC pulse.

Misled by the PCM timing diagram on p. 16 of the Si3210 datasheet, I had coded FSYNC to be asserted shortly before the falling edge of PCLK on the first ISR cycle. However, the timing diagrams on p. 55 of the datasheet clearly show FSYNC rising right before the previous rising PCLK edge. But then again, why hadn’t this problem shown up earlier as well? Well, for one thing, it did appear occasionally on some boards. I guess that my recent SOF synchronization changes introduced some jitter that the 3210 apparently did not like very much. My assumption is that the 3210 was thus occasionally hastening or delaying the high-order bit of DTX. This bit is the sign bit of an 8-bit PCM sample byte, so, depending on when and how often it was lost, the result was either a clip-like audio effect or something like a vocoder-modulated signal, much like “Darth Vader’s voice”, as I have reported in some of my earlier posts. [Note: I am not sure of this explanation; it’s just the best one I could come up with. Maybe some experienced 3210 user or some other audio/DSP specialist will offer a better explanation than mine. I am all ears.]
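
To see why a single misplaced high-order bit is so audible, it helps to look at how an 8-bit sample decodes. The sketch below is the standard G.711 μ-law expansion, assuming the 3210 is run in 8-bit μ-law mode (the text does not say whether μ-law or A-law is used); `ulaw_decode` is just an illustrative name. The MSB of a μ-law byte is its sign bit, so corrupting that one bit turns a sample into its mirror image: an error of twice the sample’s magnitude, which is small near silence but huge on loud passages.

```c
#include <stdint.h>

/* Standard G.711 mu-law byte -> 16-bit linear sample. */
int16_t ulaw_decode(uint8_t u)
{
    u = (uint8_t)~u;                     /* mu-law bytes are transmitted inverted */
    int sign = u & 0x80;                 /* MSB: the sign bit */
    int exponent = (u >> 4) & 0x07;      /* 3-bit segment number */
    int mantissa = u & 0x0F;             /* 4-bit step within the segment */
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    return (int16_t)(sign ? -magnitude : magnitude);
}
```

Flipping only the MSB, as in `ulaw_decode(b ^ 0x80)`, always yields `-ulaw_decode(b)`, while the other seven bits leave the magnitude untouched.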

So I attempted several fixes, mostly trying to relocate the rising edge of FSYNC to some other point in the code where the 3210 would handle it better. The fix that seemed to work best was to move FSYNC back to cycle #31, as early as possible before the falling edge of PCLK. This produced tolerable (though not 100% crystal-perfect) recording quality on both boards (prototype #2 and dongle). Since the two boards behave in opposite ways with respect to SOF synchronization (the former’s clock is faster and the latter’s is slower than the USB host controller’s clock), it seems that I may have covered all major cases. Yet another mystery solved!

It goes without saying that, while attempting to relocate the FSYNC pulse, I forgot several times to raise or lower FSYNC here and there in the code, and this resulted in some very unpleasant surprises. One such surprise was that, on occasion, the 3210’s DR11, which is expected to always contain 51 decimal, was found to contain nonsense. By far the worst surprise, however, was when the board suddenly refused to power up the DC-DC converter. I finally tracked this down to a forgotten FSYNC pulse in some code branch: the 3210 seems to update its registers that contain sampled values (but also its output PWM signal that drives the converter) at a rate of 1 32kHz, as clocked by FSYNC! So, because FSYNC was erratic, the DC-DC converter converged to the desired 65V very slowly. Incredible!

Concluding, what have I accomplished (well, hopefully…) after all this debugging? Quite a few things. One, I have added the SOF profiling code, which, as I discuss right below, is very useful. Two, I have managed to synchronize the board’s clock to the USB clock without resorting to the complication of an audio-class USB stack. There is still one caveat: the skew between the two clocks must be within 0.05% (the 0.5 µs stolen from every 1 ms), or else the board’s crystal must be replaced, especially if the board’s clock is slower; but the skew can now be measured, thanks to the SOF profiling code. Three, I have attributed the missed packets to VMware and tested my driver in a native Linux environment, where it works perfectly. Finally, I have traced the source of distortion (audio clipping) in the IN direction, and have hopefully fixed it, for both slower- and faster-than-USB board clocks.
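
The 0.05% figure follows directly from the rendezvous design: 0.5 µs stolen from every 1 ms is 500 ppm. The sketch below turns an SOF measurement into a skew figure and a lock verdict. It assumes, hypothetically, that the profiling yields the number of SOFs counted while the board’s own clock timed a known number of milliseconds; `sof_skew_ppm` and `rendezvous_can_lock` are illustrative names, not functions of the project’s code.

```c
#include <stdbool.h>

/* Skew of the host's SOF clock relative to the board clock, in ppm:
 * 'sofs' SOFs were counted while the board timed 'board_ms' milliseconds.
 * Positive means the host is faster, i.e. the board clock is slow. */
double sof_skew_ppm(long board_ms, long sofs)
{
    return (double)(sofs - board_ms) / (double)board_ms * 1e6;
}

/* The rendezvous can lock only if the shortened 256-invocation stretch
 * (0.9995 board-ms) is shorter than the SOF period measured in board
 * time (board_ms / sofs). */
bool rendezvous_can_lock(long board_ms, long sofs)
{
    const double shortened_ms = 0.9995;
    return shortened_ms * (double)sofs < (double)board_ms;
}
```

For instance, 10004 SOFs in 10000 board-timed milliseconds means the host runs 400 ppm fast (the board is slow) and the rendezvous can still lock; 10006 SOFs would already exceed the 500 ppm margin. In this model, a board that is faster than the host always locks, since it just spins a little longer at each rendezvous.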

Unless some other annoying bug shows up, this time I have to promise myself that the next thing I’ll do is the Asterisk code. Let’s see whether I can keep my promise (OK, I may deviate a bit from that schedule in order to upload all recent changes to the driver and the firmware to the project’s Google Code page). Meanwhile, if you think you can help with development, please answer the poll below with your preferred flavor of board. I am repeating the poll right here for those who missed it in my previous post (I am also repeating that the purpose behind the poll is to help form a small developer community, not to make a profit by selling boards and kits):

Dongle files uploaded

March 15, 2010

People seem to have liked the dongle form of the board a lot; thanks, folks! This blog keeps its promises (well, at least to a certain extent), so I have corrected a few things here and there in the board’s layout and uploaded the Cadsoft Eagle files to the project’s Google Code SVN. Besides the .sch and .brd files, you can also find a BOM Excel sheet (with Farnell product codes for many of the parts; depending on where in the world you live, you might prefer to look these up at Digikey, Mouser, etc.) and two .lbr files. One contains the Eagle device for the shielded choke L1 and the other contains the Eagle devices for the 3210 and the 3201. I have not found TSSOP-38 packages available elsewhere, so people may find these useful.

Among the corrections I made were many text moves in the tNames layer, so most parts’ names no longer overlap vias as they used to, and they should print OK on the board’s silkscreen. Another change is that I replaced the board-edge connector with a 2×3 pin header, which is much easier to work with. Parts on the bottom side of the board were moved slightly to make room for the new, larger connector; other than that, no significant changes have been made.

One word of warning: although the changes I made were minimal, the new board is as yet untested. There is still one medium-importance issue with the dongle board that I have tested: the 3210 gets hot over time (it reaches something like 60 °C, and I have not left the board plugged in long enough to see whether it eventually burns). Maybe this is due to the placement of components, so if my next dongle board gets hot too, I will have to either move the 3210 away from the PIC (they both dissipate heat, and having the two mounted one atop the other is probably what gives the poor 3210 a hard time) or somehow add a heat sink. With this caveat, the dongle design is OK, so I have uploaded all the relevant material.

So far, I have been working alone on this project. However, there are a few signs of interest from people occasionally volunteering to help in one way or another. This gave me the idea of creating a poll to see whether you, the readers of this blog, would like to get your hands on a board and try for yourselves how things work. So, if you’d like to participate in the project and help out with further development, please fill in the questionnaire below.

The poll is anonymous and will serve just to help me order an appropriate quantity of materials to be able to make and send a board to those who would like to help (should I finally choose to do it, which I am not promising). You will not assume any obligation by replying to the poll. You will not order anything online, nor will you pay anything. You will not even need to promise anything, such as that you will help. You will just express a wish, and even that anonymously. On my part, I am under no obligation to fulfill your wish. I am just trying to see whether people would like to help the project and what I can do to help them help it. That’s all.

Please do note that I am not trying to make any money out of this. At its current stage, Open USB FXS is not a useful product and will definitely not serve any purpose other than experimentation and further development. It would be unethical for me to promise otherwise and try to make a profit out of this. This means that, provided readers show some interest in getting their hands on a board and assuming I choose to send over some boards after the poll results, I am going to publicize (my) cost of materials and ask interested readers to cover exactly that cost, not a single penny above. All that said, here is the poll.

Back on the code front, I am now working on resolving some firmware bugs I have found. I will report on these in my next post. Besides firmware issues, I have been trying to come up with the Asterisk channel driver, dutifully doing my daily reading and digesting screenful after screenful of all the Asterisk channel driver code I could find. While reading on, I again started to hesitate between writing my own very simple channel driver on the one hand, and going back to the Linux driver to produce a Dahdi/Zaptel-compatible driver on the other [note: I am using D/Z to refer to Dahdi/Zaptel in this post].

You see, in Asterisk, an awful lot of the functionality available for D/Z devices has to be (also) implemented in the channel driver: call waiting, call transfer, multi-way voice conferencing, caller ID, and many other “smart” PBX functions have to be supported by the channel driver, or else they won’t work at all. So, was I really on the right track trying to rewrite all of this on my own? On the other hand, making my driver D/Z-compatible looks daunting as well: I have come to master at least the basics of a D/Z channel driver, but I am quite ignorant of the structure of a D/Z-compatible kernel driver. Still, it looks like there is a structure, and eventually it should not be extremely hard to follow it.

More on that in my next post. It should not take too long!