Back to the roots (of bad audio quality)

March 17, 2010

While contemplating my next step (to finish off with a simple channel driver, or to try planting my device driver into the Zaptel/Dahdi family), I couldn’t help but feel a bit angry at my newly-born dongle board, because at first glance it seemed not as perfect as I had expected it to be. Besides the heat dissipation issue that I reported in my last post, I also noticed — and reported in my initial dongle-is-ready post — that the dongle was producing bad-quality sound in recording. This time, though, I was decided not to give up so easily. You see, with the dongle, I had accumulated just too many boards with sound problems, and attributing these problems to bad materials on all of them just did not make sense. So, I decided to subject the dongle to a series of torture tests.

Surprisingly, I was to find that occasionally the recording quality was perfect. This meant that the problem should not have to do with components, but rather with some condition in the digital part of the circuit, or, even more probably, some I/O glitch. But what could that be?

It was thus (and very reluctantly, believe me) that I decided to postpone the channel driver work and go back again into debugging bad audio quality problems. What new could I find by debugging again now? Well, in contrast with my early development days, when the only tool I had to debug packet loss, reordering, etc. was a USB sniffer, now I had a complete kernel driver to play with. So, I decided to go and plant some monitoring code into the driver. I ‘ll cheat a bit by telling you right away up front that the debugging findings proved to be surprisingly interesting.

To understand debugging, remember that audio data packets contain a header. The OUTgoing packets (PC to board) contain a 1-byte sequence number. The INcoming packets (board to PC) contain two of those: a sequence number of their own (incremented independently on odd and even packets) and a mirror of the incoming sequence number. If I made use of sequence numbers, debugging packet loss and reordering issues would be easy. Apart from packet loss and reordering, there were two other conditions that I needed to debug as well: input overrun (when nobody picks data delieverd from the board) and output underrun (when there is no data while it’s time to send some to the board). The driver knew all about these cases and could report them right away.

So, I implemented some statistics counters in the driver, adding of course  the necessary sophistication salt by instructing open() to reset all the counters, so that I would get a fresh image each time a user program would open the device anew. I ran again my output-only and input-only test programs, only this time I had inserted a piece of code instructing the programs to report the collected statistics roughly every second (every 8,192 samples, to be exact).

An excerpt of the output messages from the output-only test program follows. I guess it would help you understand what’s happening if I decoded the runes for you, so IN_OVR stands for “input overrun” conditions (each overrun is repoted once, regardless of the number of packets lost) , “IN_MSS” is “missed IN packets sequence numbers” (deduced from gaps in incoming sequence numbers), “IN_BAD” represents “bad” packets (reported by the USB core to be received incorrectly), OUTUND stands for “output underrun” conditions (again, each underrun is reported once) and OUTMSS is for “missed OUT packet sequence numbers” (sequence numbers not mirrored as expected).

IN OVR: 494, IN_MSS: 0, IN_BAD: 0, OUTUND: 250, OUTMSS: 0
IN OVR: 750, IN_MSS: 2, IN_BAD: 0, OUTUND: 250, OUTMSS: 3
IN OVR: 1006, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 89
IN OVR: 1262, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 1518, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 1774, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2030, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2286, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2542, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 2798, IN_MSS: 4, IN_BAD: 1, OUTUND: 250, OUTMSS: 113
IN OVR: 3054, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189
IN OVR: 3310, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189
IN OVR: 3566, IN_MSS: 6, IN_BAD: 2, OUTUND: 250, OUTMSS: 189

The results were quite interesting (well, at least they did look interesting to me): since the program was not reading any samples that the board was sending, the IN_OVR statistics increased steadily. That was OK. Every now and then, a couple of IN packets were missed. This was definitely not  OK, and I should investigate further why it happened. By checking the IN_BAD column, it looked like every now and then the USB core received a “bad” IN packet (which it ignored) and the IN packet next to that was also lost. Hmmmm… why would that happen in pairs? I needed to think it over… The OUTUND column reported a constant number of underruns, which presumably occured from the time the program open()s the device until it starts sending audio. That was perfectly OK. The rightmost column was more intriguing. It looked as if, every now and then, there was a burst of lost sequence numbers. A DEBUG statement in the driver helped me remember that I have been “flexible” with OUT packets in my ISR code, allowing odd and even packets to be reversed when there are delays. So, while the driver was expecting, say, mirrored OUT sequence #10, it got back sequence #11. It then set for expecting sequence #12 (11 + 1), but received instead sequence #10 that it had been missing. It set for sequence #11, but received sequence #13. And so forth, until one packet got late and sequence numbers got reversed again.

What was by far more interesting was that OUT delays resulting in sequence number reversals, which in turn manifest themselves as bursts of missing sequence numbers, seemed to occur at the same time as IN bad/lost packets. Hmmmm… I’d better make a note of this.

Here were the results I got by running my input-only test program.

IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 498, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 754, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1010, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1266, OUTMSS: 75
IN OVR: 254, IN_MSS: 0, IN_BAD: 1, OUTUND: 1522, OUTMSS: 75
IN OVR: 254, IN_MSS: 2, IN_BAD: 1, OUTUND: 1778, OUTMSS: 76
IN OVR: 254, IN_MSS: 2, IN_BAD: 1, OUTUND: 2034, OUTMSS: 76

As it made good sense, the IN_OVR columns reported a steady number of (initial) IN overruns, while the OUTUND showed an increasing number of OUT underruns, since nobody was feeding the driver with data. The IN_MSS showed two missed IN sequence numbers, but only one OUT sequence was lost. All this meant that no substantial issues were observed, although one IN packet was lost somewhere in the way.

While it was too early to tell for sure, I was suspecting that all these — especially the “bad” IN packets, where each “bad” packet resulted in a pair of missed IN sequence numbers — were signs of bad synchronization between the ISR and the PC’s USB host controller. A possible explanation was that, while the ISR was clocking the I/O between the PIC and the 3210 diligently at (its own idea of) 256 kHz, the ISR synchronized (or so I thought, more on this later on…) with the host-originated USB SOF microframe only once in a lifetime. Thus, if a small drift existed between the USB clock and the board’s clock, eventually the IN/OUT isochronous traffic from the board drifted too, until the IN frame coincided with SOF (hence, it was lost), and the OUT was expected too early or too late (hence, an OUT frame was received “late” and the OUT odd/even ping-pong buffers were reversed).

If the above scenario was true, this meant that there were actually two caveats in my initial naive design. The first was the assumption that it suffices to synchronize the SOF time between the PC and the ISR code once and only once (or so I thought, more on this later on…), and let the two clocks run in parallel thereafter. The second caveat was that the device could work its internal audio I/O at a rate that is independent from the USB SOF rate (without some sort of a feedback-adaptation mechanism, like the feedback endpoint and the variable packet size in audio-class USB devices).

My first step in debugging was to confirm the hypothesis that these two caveats were indeed the cause behind missed packets. Thus, I implemented a new simple function in firmware to report in a 32-byte packet fifteen times the value of TMR3 sampled right after the SOF bit is asserted by the PIC’s USB engine [with TMR1 interrupts temporarily disabled and without actually modifying TMR3, only by sampling its value, so that I could deduce the elapsed time exactly by subtracting two successive values from one another]. Then, I added a few lines of code to my Windows test driver program in order to print out the difference between successive TMR3 values (remember that TMR3 is running at one tick per instruction). Here is the result:

 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE3 2EE0 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE0 2EE3 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE3 2EE0 2EE3 2EE0 2EE0 2EE3 2EE3 2EE0 2EE0 2EE0 2EE3 2EE0 2EE3

By multiplying these with the PIC’s nominal instruction cycle time Tcy, one gets most of the times 12,000 instructions, that is, exactly one millisecond, but sometimes one gets 12,003 instructions, which is slightly more than one millisecond (1.00025 ms, to be exact). This meant that the two clocks were indeed drifting w.r.t. to one another, with the PC’s USB clock being just a liiiiitle bit slower than the board’s clock, QED! Since I run this test against my large-form-factor prototype #2, I thought about running it against my dongle board as well. So, here are the results for the dongle:

01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EE0 2EDD 2EE0 2EE0 2EE0 2EDD 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0
 01   02   03   04   05   06   07   08   09   10   11   12   13   14
2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD 2EE0 2EDD 2EE0 2EE0 2EDD

Wow! As it was turning out, this time the PC USB host controller’s clock was a bit too fast compared to that of the dongle board. No wonder the my two working boards (prototype #2 and dongle) showed slightly different sound problems, then… One less mystery to solve! So now that the culprit for missed packets was revealed, the question was how to fix the bug.

To return to an old and long-lasting theme in this blog, this is exactly the reason why people have come up with the audio class in USB: clocks between the host and the device are not the same and the obvious way to synchronize them is to adapt the sending rate. However, I felt — and I still feel — that modifying the firmware in order to support a composite device with two audio-class devices would be an unnecessary complication. Why? Because, unlike the case of an audio playback environment where the device instructs the host to adapt its sending rate to the device’s playback rate and there is an infinite buffer of source data, in a VoIP environment there is no globally accepted master clock (there are the remote-side endpoint, the USB host and the local-side USB device, and all these have different clocks), so data overruns and underruns will be inevitable in the end. However, why create one more such source of bad audio by not synchronizing the USB device to the host?

But wait a moment — why should we synchronize the sending rate to the USB clock by varying the amount of data per packet? If the skew between the USB host and the device clocks is not significant, it could be just as well that the USB device, instead of keeping a constant clock and adapting the amount of data per packet, adapts its audio recording and playout rate to the rate of the host! This would probably be impossible if the device had a pure hardware clock; however, with a firmware-generated clock, like the one in Open USB FXS, it did not sound impossible at all!

To take that road, I would have to resynchronize my ISR’s notion of time with the SOF signal not just once, but once every 256 ISR invocations (this is every 1 ms, at the rate of SOF). Unfortunately, adding a simple wait-for-SOF check in my ISR code would not suffice, because this would work only if the ISR was faster than the actual SOF rate; in the opposite case, SOF would arrive earlier than my test and the test would always succeed without synchronizing the two rates… After torturing my mind a bit, I came up with the following fix: what if I “stole” a few instructions out of the ISR’s schedule (say, once per 256 ISR invocations), so now the ISR would purposefully last shorter than the nominal 1ms? Say, 6 PIC instructions, which would amount to 0.5 microseconds? Then, I could create a “rendez-vous” point-in-time between the ISR and the SOF signal by means of a simple tight loop that would spin waiting for SOF to appear. What was exactly so devilish about this method was that it did not require any initial synchronization action: initially, SOF would by all chances appear somewhere else in the ISR, so my test would find the SOF bit already asserted. This would result in short-cycle execution, bringing the ISR’s overall execution time to 0.9995 ms. Thus, the relative position of SOF would gradually “advance” over time, occurring later and later in the ISR cycle, until a point in time would be reached where the test code in the ISR would find SOF not-yet-asserted. But in this case, the ISR would loop tightly waiting for the SOF to come. This would then happen over and over every 256 ISR cycles (once per SOF). Hence I would achieve synchronization between SOF and my board’s clock, which was what I was after!

All this was easier said than coded (in tight-time-profiled PIC assembly). First, I had to make room for my rendezvous point somewhere in the ISR code. I already had a special case for cycle #31 (counting starts from zero: cycle #31 is the 32nd cycle), so I moved FSYNC pulsing from that place to cycle #0 of the ISR. Setting TXS and RXS back to zero (instead of 1) was also required, but this worked fine (and passed the audio test without worsening the audio quality in recording or playback).

Then, it was the turn of the SOF test. I first removed a similar tight loop from the code, which was supposed to synchronize SOF with the ISR exactly once. I am saying “was supposed to”, because, as I found later on, that test was actually doing nothing, because of a stupid assembly bug: I had used a BTFSC instruction instead of BTFSS, thus testing for a clear bit condition instead of a set one, and that test came always true (it’s incredibly easy to make such stupid errors when coding in assembly, and it’s also incredible how easily these errors live for quite a long time until finally they get spotted). OK, that removed, I planted the test for one-every-256 invocations into cycle #31 and, after profiling carefully all the timing, I added the following tight loop for SOF synchronization:

SOFloop BTFSS           UIR, SOFIF, ACCESS      ; C:1/2 break loop if SOF is set
        BRA             SOFloop                 ; C:2 otherwise loop waiting
        BCF             UIR, SOFIF, ACCESS      ; C:1 clear SOFIF for next time

These three lines of code cost me two afternoons of debugging in frustration and despair. Why? Because this code just froze the board: no signs of activity, no LED flashing, no USB device, no nothing. By #ifdef’ing out the three above instructions, everything came back to normal, so it was clear that the PIC was sitting in a tight loop waiting for a SOF that never appeared. Looking at the disassembly of user-level compiler-generated code showed me that the compiler-generated instructions for profiling the frequency of SOF (which I mentioned earlier in this post, and which worked fine), were the exact same ones I had used in my ISR  — only in the ISR these very instructions were causing the board to stall. Stated differently, the exact same code was working in one place, but freezing the board in another. Shoot…

I almost went crazy for two days, trying to figure out why my code locked down the board. Finally, after lots of working hypotheses that proved wrong, enlightment came: the ISR was starting almost immediately after the board powered up, and before any USB negotiation. At that time, there was obviously no live USB connection, and therefore there was no SOF either. Thus, the processor just sat in a tight loop inside an ISR, doing nothing to proceed with the USB negotiation (which, in Microchip’s stack, is situated in userland). Of course, since the processor was doing nothing, no negotiation did ever take place, and no SOF ever appeared. Bingo! I quiclky added a test to activate the tight loop only after at least one SOF was seen (hence, negotiation was successful) and the board sprang back to life!

Now it was time for the infallible audio test. I run the test and — pfzzzt! — failure… I kept seeing missed packets here and there. This felt like madness, so I buried myself again in debugging. Added and removed stuff here and there, in the driver and in the ISR, but to no avail: both boards kept crackling from time to time, and the statistics reported missed packets in perfect accordance with the audible problems. However, I quickly noticed a difference, and it was a big one: “bad” IN packets had disappeared. This meant that my fix worked after all, and all I had to worry about were the missed sequence numbers.

It was then that my suspicions finally turned to the right direction. What if all this was VMware’s problem? A quick test confirmed the suspicions: increasing the priority of the VMWare process to the maximum almost eliminated packet losses and the like. Just to be totally sure, I installed quickly WUBI and tested my driver on a natively-running Ubuntu Linux system. And the truth shone: no packet losses at all, no crackles, no clicks, perfect sound for as long as I wished to listen!

A good question is, why hadn’t I seen these issues before? I seems that there are two good explanations: first, by adding more and more code (debugging, statistics, etc.) in the driver, I increased the load to be carried out, so occasionally VMware could not cope with that load (whereas a native Linux proved to have no problem). The other explanation is that I was reluctant to admit that there were still things to fix in my so-well-thought-of ISR code, so I tended to attribute occasional sound problems to bad hardware. I had to see similar problems appearing on more boards in order to admit the ugly truth and sit down to debug and fix the firmware.

But it was not over yet… The two test boards (the dongle more frequently than the large-form prototype) were now occasionally producing distorted audio on input. Yes, only in the input direction. With the priority increase in VMware, my statistics were showing that there weren’t any packet losses at all. However, sound was from time to time recorded with distortion, somewhat like when listening to audio equipment that clips an audio signal. It took me still another day to think about associating this issue with the FSYNC pulse.

Being mislead by the PCM timing diagram on p. 16 of the Si3210 datasheet, I had coded FSYNC to be asserted shortly before the falling edge of PCLK on the first ISR cycle. However, the timing diagrams on p.55 of the datasheet clearly show FSYNC rising right before the previous rising PCLK edge. But then again, why didn’t that problem too show up earlier? Well, the first thing is that it did appear occasionally on some boards. I guess that my recent SOF synchronization changes introduced some jitter that the 3210 apparently did not like very much. My assumption is that the 3210 was thus occasionally hastening or delaying the high-order bit of DTX. This bit being the sign bit of an 8-bit PCM sample byte, and depending on when/how often the bit was lost, the loss resulted either in a clip-like audio effect or into something like a vocoder-modulated signal, much like “Darth Vader’s voice”, as I have reported in some of my earlier posts. [Note: I am not sure of this explanation; it’s just the best explanation I could come up with. Maybe some experienced 3210 user or some other audio/DSP specialist will offer a better explanation than mine. I am all ears.]

So I attempted several fixes, mostly trying to relocate the rising edge of FSYNC somewhere else in the code, at a point where it would be better understood by the 3210. The fix that seemed to work best was to move FSYNC back to the 31st cycle, as early as possible before the falling edge of PCLK. This produced tolerable (though not 100% crystal-perfect) audio quality in recording on both boards (prototype #2 and dongle). Since the two boards behave quite the opposite in terms of SOF synchronization (the former is faster and the latter is slower than the USB host controller’s clock), it seems that I may have covered all major cases. Yet another mystery solved!

It goes without saying that, while attempting to relocate the FSYNC pulse, I forgot several times to raise or to lower FSYNC here and there in the code, and this resulted in some very unpleasant surprises. One such surprise was that on occasions, 3210’s DR11 which is expected to always contain 51 decimal, was found to contain nonsense. The worst by far such surprise was however when the board suddenly refused to power-up the DC-DC converter. Finally, I tracked this down to be a forgotten FSYNC pulse in some code branch: the 3210 seems to update its register values that contain sampled values (but also its output PWM signal that drives the converter) at a rate of 1 32kHz, as clocked by FSYNC! So, because FSYNC was erratic, convergence of the DC-DC converter to the desired 65V was very slow. Incredible!

Concluding, what is it that I have accomplished (well, hopefully…) after all this debugging? Quite a few things: one, I have added the SOF profiling code, which, as I am discussing right below, is very useful; two, I have managed to synchronize the board’s clock to the USB clock without resorting to the complication of an audio-class USB stack (there is still one caveat: the skew between the two clocks must be within 0.05% , or else the board’s crystal must be replaced, especially if the board’s clock is slower, but the skew can be now measured, thanks to the SOF profiling code); three, I have attributed missed packets to VMware and tested my driver in a native Linux environment, where it works perfectly; finally, I have traced the source of distortion (audio clipping) in the IN direction, and hopefully have fixed that, both for slower- and for faster-than-USB board clocks.

Unless some other annoying bug shows up, this time I have to promise to myself that the next thing I ‘ll do is the Asterisk code. Let’s see whether I can keep my promise (OK, I may deviate a bit from that schedule in order to upload all recent changes in the driver and the firmware to the project’s Google code page). Meanwhile, if you think that you can help with development, please do reply to the poll below what is your preferred flavor of a board. I am repeating the poll right here, for those who have missed it in my previous post (I am also repeating here that the purpose behind the poll is to help in the formation of a small developer community — not to make profit by selling boards and kits):

Advertisements

Dongle files uploaded

March 15, 2010

People seem to have liked a lot the dongle form of the board — thanks, folks! This blog keeps its promises (well, at least to a certain extent) and thus I have corrected a few things here and there in the board’s layout and I have uploaded the Cadsoft Eagle files to the project’s google code svn (here). Besides the .sch and the .brd file, you can find also a BOM Excel sheet (with Farnell product codes for many of the parts — but, depending on the part of the world where you live, you might prefer looking these up at Digikey, Mouser, etc.) and two .lbr files. One contains the Eagle device for the shielded choke L1 and the other one contains the Eagle devices for the 3210 and the 3201. I have not found TSSOP-38 packages available elsewhere, so people may find these useful.

Among the corrections I have made were many text moves in the tNames layer, so now most parts’ names don’t coincide with vias as they used to and should be printed OK on the board’s silk-screen. One other change I made was that I changed the board-edge connector with a 2×3 pinhead, which is much easier to work with. Parts on the bottom-side of the board were moved slightly to make room for the new, larger connector, however other than that, no significant changes have been made.

One word of warning: although the changes I made were minimal, the new board is yet untested. There is still one medium-importance issue with the dongle board that I have tested, and this is that the 3210 gets hot over time (it reaches something like 60o C, and I have not left the board plugged-in for too long to see if it will eventually burn). Maybe this is due to the placement of components, so if my next dongle board gets hot too, I will have to either take the 3210 away from the PIC (they both dissipate heat, and them two being one atop the other is probably what gives the poor 3210 a hard time) or somehow add a heat sink. With this caveat, the dongle design is OK, so I have uploaded all the relevant material.

So far, I have been working alone in this project. However, there are a few signs of  interest from people occasionally volunteering to help in one way or another. This gave me the idea of creating a poll to see if you, the readers of this blog, would like to get your hands on a board and try for yourselves how things work. So, if you ‘d like to participate in the project and help out with further development, please fill in the questionnaire below.

The poll is anonymous and will serve just to help me in ordering an appropriate quantity of materials to be able to make/send a board to those who would like to help (should I finally choose to do it, something that I am not promising to do). You will not assume any obligation by replying in the poll. You will not order anything online, nor will you pay anything. You will not even need to promise anything, like that you will help. You will just express a wish, and even that will be anonymously. On my part of the deal now, I am not obliged anyhow to fulfill your wish. I am just trying to see if people would like to help the project and what I can do to help them help it. That’s all.

Please do note that I am not trying to make any money out of this. At its current stage, Open USB FXS is not a useful product and will definitely not serve any purpose other than experimentation and further development. It would be unethical for me to promise otherwise and try to make profit out of this. This means that, provided that readers show some interest in getting their hands on a board and assuming I choose to send over some boards after the poll results, I am going to publicize (my) cost of materials and ask interested readers to cover exactly that cost, not a single penny above. All that said, here is the poll.

Back on the code front, I am now working to resolve some firmware bugs that I have found. I will report on these in my next post. Besides firmware issues, I have been trying to come up with the Asterisk channel driver, doing dutifully my daily reading and digesting screenful after of screenful all the Asterisk channel driver code I could. While reading on, I started again to hesitate between writing my own very simple channel driver on the one hand, and trying to go back to the Linux driver and produce a Dahdi/Zaptel-compatible driver on the other [note: I am using D/Z to refer to Dahdi/Zaptel in this post].

You see, in Asterisk, an awful lot of functionality available for D/Z devices has to be (also) implemented in the channel driver: call waiting, call transfer, multi-way voice conferencing, caller id, and many other “smart” PBX functions have to be supported by the channel driver or else they won’t work at all. So, was I really in the right track trying to rewrite all of this on my own? On the other hand, trying to make my driver D/Z compatible looks daunting as well, since I have come to master at least the elementaries of a D/Z channel driver, but I am quite ignorant of the structure of a kernel D/Z-compatible driver. But it looks there is a structure, and eventually it should not be extremely hard to follow that.

More on that in my next post. It should not take to long!

Dongle and channel driver: a quick update

February 25, 2010

So what’s happening with Open USB FXS these days? I know I ‘ve been silent lately, so you might expect me to come up and announce some really-big-next-thing. The truth is that there’s nothing tremendous to announce. However, there are a few new things to report.

I am finishing with the assembly of the “dongle” board. The construction revealed no big mistakes (yet), except one really stupid glitch: in creating the board-edge ICSP “connector” (consists of 2 sets of three stripes of copper at the edge of the board, I have placed the axes of the stripes 2 mm apart, whereas conventional board-edge connectors have a distance of 2.54 mm. Shoot… This is not the end of the world, since I can always solder wires on the “connector”‘s stripes and then remove them after I program the PIC, however it’s not really elegant. So I ‘ll have to correct that before I release the Eagle files (see why I haven’t done so yet?).

I plan to upload a few shots of the assembled dongle here. Here are two shots of the assembled dongle board.

On the top side, left to right, one can see the mounted type A USB plug, the DIP switches, the DC-DC converter circuitry (partially hidden behind the shielded choke L1), the PIC’s crystal, the Si3210 and its army of resistors and capacitors around it, the DC decoupling capacitor CDC1 and, behind that, the RJ11 plug. Notice that the top-side contacts of KICSP are hidden behind the RJ11 plug.

On the  bottom side, one can see (left to right) the output circuitry of the Si3201, the 3201 itself, he PIC, the crystal oscillator capacitors (front), the PIC (middle) and its pull-up/pull-down resistors (back) and the Si3210 overvoltage protection circuitry.

I still owe to report on how the dongle will work. Even if it doesn’t work at all, I find it’s a beautiful construction, so I won’t complain — at least, not too much. Don’t you agree? Isn’t it very elegant? Anyway, enough with bragging, let’s get back to actual progress.

In the Asterisk channel driver front, I am making some slow progress. I have managed to understand the basics of a channel driver module file and the necessary structures and functions. Plus, I have taken a good look at a couple of channels (mostly chan_oss, chan_sip and chan_dahdi — the latter, excluding all about PRI, SS7 and the like). Basically, the channel driver module consists of three “interface” functions (load, unload and reload) which are called by the Asterisk core. Then, the load (and most similar, the reload) function initializes just about everything, reads in the configuration from the respective file and fires off a “monitor” thread. That thread is responsible for well, all the monitoring work in active channel instances. In simple channel drivers (like mine), the monitor thread might also do the audio data I/O, whereas in more advanced ones, the I/O can be “bridged” directly with some other channel.

Then, there are functions to create a new instance of the channel (e.g., when a new call arrives) and deleting an instance (when a call finishes off and the respective interface should go idle). Remember that a channel driver handles many instances of the same “class” of interface (in my case, say, many dongles connected to the same computer), and an instance exists only when an interface is active (is engaged in a call).

This is roughly it, however I have still lots to study and lots to learn.

As I ‘ve done in the past, I ‘ll be updating this post as I progress with the channel driver. Hopefully it won’t be long until I report some good results.

Update, March 2: some good and some bad news: the dongle worked its dc-dc converter right away! It also rings the phone with no problem and its output audio quality is perfect. However, for some reason, the input audio (recording) is distorted. This is a bug I have seen on some other prototypes of mine as well. Until now, I thought that it was due to bad quality materials, like unrated voltage capacitors. As a matter of fact, replacing the caps in one of my prototypes fixed that. However, the dongle had the rightly-rated caps from the start, so this should not be it. Having more than one prototypes that present the same issue means that the cause might be not in the materials, but somewhere in the design (although it shouldn’t, because this is very close to the manufacturer’s reference design). Another possible cause might be a bug in my Si3210 register settings or calibration procedure. All this means that I have to return (again) to an unpleasant session of hardware debugging in order to spot and fix the issue. Nevertheless, I am quite optimistic on this. If need be, I ‘ll consult some fellows I know who happen to have more expertise on electronics than I do. On the other hand, if, among all you readers (see how I make it sound as if this blog had a few thousands of readers, heh?) anyone can suggest some possible cause or some debugging methodology that will help spot the problem, I would be delighted to hear from you.

Honey, I shrunk the ki–ahem, the board!

February 4, 2010

This could be me, showing my wife the new board!

After a friend’s advice, I decided to take some days off coding (which was what I really ought to do next, to be honest). I still need to troubleshoot my hardware to correct some clicking sounds (that do not come from missed USB packets), write an Asterisk channel driver, debug that driver, bring up an Asterisk system, probably set up an IAX or other link to a friend’s Asterisk and start running exhaustive tests. All this felt a bit too stressful, so as I already said, I decided to take a break and do something more fun.

What was so fun that I chose to do? Well, I redesigned my board from scratch, this time into a “dongle”-like form factor, that is, into something that can plug directly into the USB port of a computer. Here is the result. In this image you can see the top layer and component placement. In the image below, you can see the bottom layer and component placement.

To be honest, my effort has failed. A true dongle has a size only slightly larger than a type A usb male plug. My new board is 30 mm wide, so it’s much wider than a true dongle. However, it is not so bad as a first attempt and it is considerably smaller than my previous board. So I thought about publishing the design and, maybe later, after giving it a bit more thought, I may (or may not) attempt to shrink things even further.

Please note that this is totally untested. The board has passed Eagle’s DRC tests, however it may still contain serious errors that will show up either at production time or when one attempts to assemble it (e.g., incorrect placement of components), or operate it (e.g., interference, etc.). This is just to let the readers of this blog know that a “dongle”-like version of Open USB FXS is possible after all. Nevertheless, I may stick with this design, and try to manufacture a couple of boards in this form, and, if things work out OK, then I ‘ll upload Eagle source files on the project’s Google code site. In the meantime, you can just enjoy the big news — or maybe not so big, after all…

Update, Feb 14 2010: the dongle board is now more than a couple of Eagle files. I have had a couple of professional quality real PCBs manifactured and shipped to me (I used the Elektor magazine PCB service, in case you are interested — actually a franchise of Eurocircuits, but enough now with advertising, especially since I don’t get payed for it, nor do I intend to). I have not yet started mounting anything on them, but I think they look beautiful, so I decided to post a couple of PCB shots. Here they are:

The shot above shows the dongle’s top-side. On that side, I have placed the 3210, the bulky shielded power choke and the rest of the DC-DC converter circuitry, the PIC’s oscillator crystal, and a SMD DIL switch to replace S1 and S2. On the two ends of the PCB go the connectors to the outside world, a male type A plug that gets soldered on-board and the RJ-11 receptacle.

The shot above shows the bottom side. On that side I have placed the PIC and most of its related materials (pull-up resistors, user LED — this is placed right under the RJ11 receptacle –, etc.) and the 3201 line driver and associated circuitry. I hope you enjoy the shots.

One remark is that the PCB manufacturer has not produced a silk mask for the bottom side of the board, although I produced and submitted one. This makes the bottom side less “readable”, however this is the least of my concerns now. I am to make a working Open USB FXS out of this PCB, and from my experience this is the thing to worry about. Especially, since I am running out of stock in prototype 3210’s, and these days it seems that none of the big worldwide stock houses have them in stock. All this means that I ‘ll have to make my best to get my dongle board work with the first shot. We ‘ll see.

Another piece of news is that I changed the unrated-voltage capacitors that I used in some of my (original, large-form) prototype boards with correctly-rated ones. So, I am proud to announce that now one of my prototypes works perfectly well: no noise, no clicking sounds, no Darth Vader sound effects, perfect voice recording, perfect reproduction. Trust me, it feels really reassuring to have one perfect prototype after all this time! So now, I am going to progress with the dongle and the channel driver in parallel. What will come first? The next post will show…

Open USB FXS is now ready!

January 18, 2010

This is a historic moment! After successful completion of the remaining read() tests, I am hereby declaring my Open USB FXS project as working!

In summary, the board works, and has a Linux driver that can be used by easy-to-read/easy-to-write/easy-to-understand userspace code. So, I decided that this post should wrap up some of the issues I found and handled during development of the driver. Before doing that, however, I must confess there are some caveats: (a) my three prototype boards all exhibit some issues and need to be carefully debugged; (b) the driver does not handle well error cases (e.g., a board that stops responding after some time of correct operation) and needs work in this area. Other than that, things are quite OK. These having been said, here come the nasty details you were longing for.

First comes the patch to the Open USB FXS schematic:

As you can see, the control bus between the PIC and the 3210 now includes a new signal, \INT, that connects pin 2 of the Si3210 (\INT) to pin 13 of the PIC (RC2). Because  the Si3210 provides an open-drain signal (when the signal is not logical zero, it is tri-stated, and this serves to facilitate connecting more 3210’s in parallel to a controller using a single wire), \INT needs to be pulled-up to +5v via a 10k resistor (RINT). You may have also noticed that I removed the 32.768-kHz crystal and its capacitors from the design, since these were not used anywhere.

Second comes the patch to the board design:

A few signal lines around pin 2 of the Si3210 have been moved slightly in order to make room for the new \INT signal; the signal lines are highlighted using Eagle’s “show” function and appear in brighter colors. RINT has been placed on the board’s bottom side. However, if you need to hand-patch the previous version of the board using wires, it’s easier to solder RINT on the top side, to a +5vDC line, e.g., near CDC2.

Next, I am giving out some details about the Linux driver (a typical char device driver). The driver code can be thought of as consisting of four conceptual sections: (1) the USB core interface, (2) the device initialization code, (3) the isochronous I/O engine, (4) the file ops section (open, release, read, write and ioctl).

The USB core interface section is copied as shamelessly as it was possible from the USB skeleton driver (that comes with the Linux kernel source) by Greg Kroah-Hartman. It consists of all the ugly stuff that is needed for the driver to recognize the device when it is plugged and set up the interface between the kernel’s USB core and the driver. This part also handles (quite cleanly, I must admit) the ugly things that happen when the user disconnects the device by unplugging the USB cable all-at-a-sudden, by coincidence at the exact time that your driver does its most-critical-task-of-all-tasks-ever. It still doesn’t crash, which means Greg has done a very good job! Since this section is mostly copied-and-pasted from the USB skeleton code, there’s not really much more to discuss here, so I am moving on to the next sections.

The device initialization section is also dealt with very extensively in my previous posts. Essentially, after making sure that some 3210’s registers report back meaningful values, the code initializes everything there is to initialize in the board. A subtlety that is worth noting is that, since this takes a long time and may involve repetitive patterns, like testing for the DC-DC converter output a number of times, the whole initialization is performed by a worker thread. A “board-state” variable is the means of communication between the initializer thread and the rest of the code; if the state is not OK, all file ops will fail reporting either “Try again later” or “I/O error”, depending on whether the board is still initializing or has failed.

The isochronous I/O engine is where I have departed significantly from the USB skeleton code. This is not something I did for fun; simply enough, the skeleton code does not contain anything isochronous, and the USB core provides only low-level primitives for isochronous transfers. The driver code has to prepare its own URBs (this is what a “request” is called in USB jargon; the acronym comes from “user request block”), submit them to the core and provide a callback function that is to be invoked when that URB completes. Moreover, the calling code must provide explicitly for buffering and for the exact position of each “packet” within a larger buffer that can contain many small-sized isochronous packets. Thus, in the general case, isochronous packets may vary in size — not our case, anyway.

In theory, designing an isochronous engine with these tools is not that hard: one just submits a bunch of URBs (a single submitted URB at a time is not safe and will most probably result in data loss; it is best to have at least one more pre-submitted for the USB core to pick up when the first is done with), and then, as each URB completes, one provides code in the callback function to re-arm the URB and re-submit it with new data. This is easier said than done, though… Plus, there are some interesting questions arising. The most obvious one is, what is a good number of URBs to pre-submit? And, similarly, how many packets should each URB contain? Obviously, there is a tradeoff between submitting too few URBs — or too few packets per URB  — (the asynchronous nature of events in the Linux kernel may result in late invocation of the driver code, which means data loss) and too many of them (unneeded delays between the computer and the board will build up, due to buffering too many data). In the driver code, I have made these two parameters into module variables that can be set when the module is loaded into the kernel (separate variables exist for the OUT and IN directions). The defaults provide for four pre-queued URBs, each with 4 packets each, amounting to a total delay of 16ms — not too terrible, I hope. With these settings, sometimes there are audible clicks when the VM (remember I am running Linux on VMWare) performs some other job, like compiling the module. Other than that, the results are not bad. One can always compile in other defaults, or use the module variables to test how the module does with various values.

The other interesting question (which brings me to the file ops section) is, how the heck can this automaton synchronize with user I/O operations like read() and write(), which are inherently asynchronous? In order to answer that, it is best to think of the synchronous URBs as railroad “trains”. Here is how this metaphore helps understanding read and write I/O.

It’s easier to first think of write I/O as follows: each byte to be written can be thought of as a passenger, who is to travel with the “train”. Each train departs at a given time, and if our passenger is not there to catch it, that’s too bad for her, the train will go anyway and she ‘ll have to wait for the next train. If the passenger is on-time for the train that waits on the station and the train has free room for her, she will get on this train. If the train is full however, she will have to wait for the next train (and, if that’s full as well, for the next one, up to a limit). If all scheduled trains are full, our passenger just goes to sleep and must be awakened when a new train is scheduled (when a URB is completed).

To complicate things a bit further, the board does not understand individual bytes, but instead requires a minimum sample of 8 bytes. OK, here is how this fits into the train metaphore. Passengers must travel in compartments (“coupés”) of eight persons. A compartment can either be fully populated, or else it must remain empty. Thus, before getting into the train, passengers who arrive at the station one by one first have to gather into a small waiting chamber which has a capacity of  eight persons. If the time comes for the train to go while the waiting chamber is half-full, the unfortunate passengers have to wait for the next train. Otherwise, as soon as the chamber fills up, all eight passengers embark together at once (needless to say that this whole silly waiting chamber routine is not needed when we have a massive arrival of passengers, except for the last few –less-than-eight– passengers).

This is roughly how write() works. There are some details, needed in order to synchronize the user-facing side of write() with the device-facing side of the “train-scheduling” automaton. For example, write() cannot be allowed to hold a buffer (a “train”) for long intervals; when the time comes, the URB completion callback will only spin-wait until write places the next 8-byte chunk, and will then take the train –ehm,  the buffer– away from write(). In this case, the train departs with less compartments than its full capacity, in the hope that a large-sized write() will fill the next buffer.

Read() works almost the same. Trains arrive at regular intervals and bring into the station a fixed number of passengers each time. Passengers wait inside their trains to get picked up (by a car or by bus or whatever, let’s not be too realistic in this case). The station can hold a limited number of trains simultaneously. If the passengers of the oldest-arrived train have not been picked up when a new train arrives, the un-picked passengers have to be, ehmm, eradicated (didn’t I tell you, don’t take this example too literally…) to make room for the newcomers. Again, a small eight-person waiting chamber helps picking up loners in the correct order. If you wonder why this is needed in the case of read(), it’s because we cannot leave a train with a half-empty compartment (which is what would happen if we let passengers disembark one-by-one from trains) or else we shall mis-align the next samples. Instead, passengers must again get off the train as a full-compartment-group and then wait in the small chamber to be picked individually.

I hope all these details are more than enough and that by now you are completely fed up with read() and write(). Good, because there are also a few ioctls to discuss about. The most interesting ones (which, me being a lazy guy,  are the only ones I have implemented) are IOCSRING (sets ring on or off), IOCSLMODE (sets the line mode to open or forward active), IOCGHOOK (returns the hook state) and IOCGDTMF (returns the DTMF code currently seen). The DTMF part deserves a few more words. The board piggybacks the (hook and) DTMF state in all data packets (the newly-patched \INT signal serves to tell the PIC when to look for these). The driver code implements a sort of a one-digit “latch”, to report each pressed DTMF digit once and only once, and that only while the digit key  is being pressed. Thus, the user code must poll for DTMF by probing frequently the device, but in theory this probing does not have to be as frequent as data I/O (a valid DTMF code should last  at least 75 ms).

A final caveat note for the device driver: in its current version, the code does not allow more than one simultaneous open()s of the device. This saved me a lot of trouble and case-handling in the code. On the other hand, this precaution does not really guarantee too much, since a multi-threaded userspace app can still mess up everything by issuing many simultaneous read()s or write()s. The driver in its current state is guaranteed to misbehave or crash if this happens. Don’t try it or don’t blame me for what will happen.

Here is an example of code. I hope it is quite clean and easy to read for all readers (after all, that’s what the driver is useful for: it hides all the ugly floor planks, holes and cracks underneath a nice Persian rug). So, ideally, you should not need to ask me what the program does. Ah, and let me not forget this as well: I will provide an update to this post when I upload the newest versions to the project’s google code site.

# include <fcntl.h>
# include <stdlib.h>
# include <stdio.h>
# include <unistd.h>
# include "openusbfxs.h"

main () {
    int d, o, i;
    char c[8], n;
    int t = 0;
    int h;
    int k;
    if ((d = open ("vm-options.ulaw", O_RDONLY)) < 0) {
        perror ("open vm-options.ulaw failed");
        exit (1);
    }
    if ((o = open ("/dev/openusbfxs0", O_WRONLY)) < 0) {
        perror ("open /dev/openusbfxs0 failed");
        exit (1);
    }
    sleep (1);
    if ((i = ioctl (o, OPENUSBFXS_IOCGHOOK, &h)) < 0) {
        perror ("IOCGHOOK failed");
        exit (1);
    }
    if (h) {
        printf ("Not ringing since set is off-hook\n");
    }
    else {
        if ((i = ioctl (o, OPENUSBFXS_IOCSRING, 1)) < 0) {
            perror ("IOCSRING (on) failed");
            exit (1);
        }
        sleep (1);
        if ((i = ioctl (o, OPENUSBFXS_IOCSRING, 0)) < 0) {
            perror ("IOCSRING (off) failed");
            exit (1);
        }
    }
    while (1) {
        if ((i = ioctl (o, OPENUSBFXS_IOCGHOOK, &h)) < 0) {
            perror ("IOCGHOOK failed");
            exit (1);
        }
        printf ("Phone is %s-hook\n", (h)? "off":"on");
        if (h) break;
        sleep (1);
    }
    while (read (d, &c[0], 8 ) == 8 ) {
        if ((n = write (o, &c[0], 8)) < 0) {
            perror ("write failed");
            exit (1);
        }
        if (n < 8 ) {
            fprintf (stderr, "write returned %d\n", n);
            break;
        }
        t += 8;
        if ((i = ioctl (o, OPENUSBFXS_IOCGHOOK, &h)) < 0) {
            perror ("IOCGHOOK failed");
            exit (1);
        }
        if (!h) {
            printf ("Phone is %s-hook\n", (h)? "off":"on");
            break;
        }
        if ((i = ioctl (o, OPENUSBFXS_IOCGDTMF, &k)) < 0) {
            perror ("IOCGDTMF failed");
            exit (1);
        }
        if (k) {
            printf ("DTMF key pressed: %c\n", k);
        }
    }
    printf ("A total of %d bytes were written\n", t);
}

Update, Jan 21: all the new code and changes/fixes to existing code have been updated to the project’s google code site. This includes the new patched versions of the schematic and the board (the old versions have been kept too in the source, as “openusbfxs-unpatched”), the updated firmware (in source and hex format), the latest Win32 console driver code (a fix was needed for understanding ring trip detection [this is what an off-hook event while the phone is ringing is called in 3210’s language]), and, of course, the newest version of the driver, including the read() and ioctl functionality, plus many other small fixes.

Now that the driver code is out in its entirety, it would be nice to hear some reports on how well the code ports to various Linux kernel versions, so if anyone out there tries it with versions different than 2.6.26-2, please inform me on the results. One note: I know that the code will not port to versions prior to 2.6.11 or .12 (some API changes were made into the kernel by then), and I don’t really intend to back-port the driver to older kernel versions at this time (maybe later, because there is nothing special in these API changes that would really prevent the code from working on older kernels). On the contrary, if the code does not port well to newer kernel versions, and somebody is kind enough to point me to whatever incompatibility is found there, I would appreciate it very much.

A Linux driver’s day in life

December 1, 2009

“There is a great future in store for you and me, my boy — a great future!”
(Diktor to Bob Wilson, in Robert Heinlein’s legendary By his bootstraps)

After some elementary reading, I started writing a Linux device driver for Open USB FXS. Since I devote little time to this activity, it may well take some months. So, I figured that waiting for that long until I post a big “I made it!” article would result in (both me and anyone reading this) missing all the fun. Instead, I thought it would make better sense to write this post initially and update it as I progress through the driver.

I chose to develop my driver on a Lenny system (Debian 5.0.3, kernel version 2.6.26-2). I also chose to host my development system on vmware and, although I have some reasonable doubts about how well is isochronous USB timing going to perform on vmware, all the rest should go fine.

To begin with, the kernel source code comes with a veeeery useful “skeleton” USB driver, which contains all the required functionality for a boilerplate USB device with two bulk endpoints (one IN and one OUT). Based on that, I have already written a 300-line “driver” kernel module which loads and unloads OK, recognizes my Open USB FXS board when plugged in, creates the required /dev entries, checks the available endpoints and their sizes and produces an error on every other attempted operation (like opening the /dev/openusbfxsN file). Plus, it produces lots of unneeded debug information. Plus — didn’t I mention it before? — it doesn’t “oops” anywhere yet (the Linux kernel jargon word for doing something semi-fatal, like accessing a zero pointer). Plus, I think it does not leak memory and minor device numbers and it doesn’t lock-up the kernel — at least, not yet. Great job, right?

My next steps will be to implement (read: copy-paste from the skeleton example with necessary adaptations) the required basic functionality like device open, release etc. From then on, some design questions are arising. For example, what would be the actual read/write syscall functionality? Who would do the board initialization (the driver, or a userspace program)? Should I provide a 3210 register view, or just ioctl’s for events like on- and off-hook? And so on… So far, I intend to push the initialization code inside the driver, and (later on) write a couple of ioctls for re-initializing and/or using different initialization parameters without reloading the module. And, as far as the read/write functionality, this ought to be translated into PCM data transfers from/to the board resp., which means isochronous transfer. Probably something that can be done via two circular buffers and some code to schedule events around the two continuous IN/OUT streams.

Updates to this post will  follow sooner or later, as I ‘ll be progressing through these steps within my driver. Whenever I get to having something that can be considered working code, I ‘ll update accordingly the project’s Google Code repository. So, bear with me, there’s a great future in store…

Update, December 3: I have managed to talk to the board using a simple blocking primitive (usb_bulk_msg()). I am now working on the initialization of the board. I chose not to defer initializing the ProSLIC until someone open()s the device, because this would block the caller for quite some time — or even forever, if the device refuses to initialize properly, as in the case of the DC-DC converter refusing to power-up (exactly what my board has chosen to do these days, halas!…). So, I chose to delegate this task to a worker thread and, unless the device has initialized OK by the time the open() occurs, fail the system call with -EBUSY or something similar. So far, the worker thread communicates with the board OK, although I am a bit concerned about possible race conditions that my design choice leaves open. For the time, however, another issue is puzzling me: if I leave the board plugged in and rmmod-then-re-insmod my driver module, the board refuses to answer to commands over the bulk USB EPs. I am debugging this and hopefully I ‘ll find out a solution. A possible idea is that rmmod’ing the driver somehow stalls the endpoints on the device, but this definitely needs further investigation. I ‘ll report results whenever I have any.

Update, December 6: I have resolved the “board not responding” issue. It seems to be due to some vmware peculiarity. A few seconds later, after a couple of timeouts, the board responds OK. So I am now in the middle of copying all the initialization functionality from the “console” version of the Windows userspace “driver” program I have written into the new driver’s worker thread that initializes the board. In other words, things are going quite as expected so far. I guess the next mini-challenge will be to implement the isochronous read/write functionality using the lower-level URB I/O kernel API instead of the higher level functions I am using to read and write registers. Time will show.

[Which reminds me of a note I should have written in the beggining of this post about my quotation from By his Bootstraps: it feels like the “great future” of my Linux driver (“Bob”) is to repeat the same old functionality that already exists in its userland cousin (“Diktor”). But without the prospect of writing “Bob”, the Linux version for Asterisk, “Diktor”, Bob’s Windows userland cousin, would not have been written at all. Confusing? Well, do read Heinlein’s short novel if you haven’t already, and these funny windings of circular — or better still, spiral — logic may sound to you more familiar by the next time you visit.]

Update, December 9: the board initialization is complete, but the hard part is really starting now. My first step will be to implement the write() syscall. Since I am writing a char device driver, a userland program has the right to open /dev/openusbfxsN and then start writing PCM audio data one-byte-at-a-time; expensive as writing data in this way may be, a fast CPU system can cope with that on time, and my driver must support it. To map even this clearly degenerate case correctly onto the mechanics of isochronous USB I/O requires me to (a) pre-buffer data, waiting until a “good number of samples” becomes available; (b) packetize these data; (c) enqueue packetized data as URBs for isochronous submission; and, (d) block the userland caller program if no buffer space is left in the buffer or if a high number of URBs have already been submitted.  What exactly does a “good number of samples” stand for? My userland experience has shown that, queueing URBs with just a few 1ms-“packets” each results in poor quality. Maybe the kernelspace implementation will not be as sluggish as its userspace analogue, however, submitting many small-size URBs may still be a bad idea, because it adds unnecessary load to the system. Besides that, some day I may have to perform echo cancelling in the driver, so keeping a buffer around does not quite sound like the wrong thing to do. On the other side, pre-buffering too many samples will result in a noticeable delay. On the top of all this, the pre-buffering stage may act as a jitter buffer, which will amortize the variable delays between network packet arrivals; but then again, is the kernel driver the right place to place a jitter buffer, or is it better to put it in the “channel driver” code? Well, I guess I ‘ll have to find out the answers to all these questions while implementing the write syscall (hasn’t that always been the case in this blog, after all?…).

Update, December 13: I already dumped my first attempt to implement the write() system call. I figured I could do it in a totally asynchronous manner, without any sort of regularity like fixed buffer size or scheduling, both in the user-side part (the openusbfxs_write() function) and the back-end (urb submission function and completion callback). It turned out that this approach incurs too many subtleties in synchronizing things around, plus it is too complex and wasteful in terms of system resources. So, my second approach is to arrange for a way of pre-allocating fixed-size buffers and schedule a semi-constant-rate buffer submission routine in the back-end, while trying to fill at least one such buffer on-time in the front end. If a buffer is only partially filled by the time it is scheduled to be submitted, it will be sent out with as much data as it contains; and, if no data exists at all, no buffer will be sent out and the board will play a “silence” period on the phone set (which will result in an audible “click”, so maybe I ‘ll think of something better for that case). BTW, this whole complexity makes me understand why other boards and drivers do not employ isochronous pipes at all… Instead, the approach there is to enqueue samples and send a bunch of them to the board every now and then — less often than once per microframe, I guess — using bulk usb transfers. Oh, what the heck, if isochronous doesn’t work out for me, I could resort to a method like that as well. But for the time, I am sticking with my current design, and hope for the best. The next update will show.

Update, December 16: I ‘ve got a first version of isochronous write to work. The principle is to have a small number of buffers always submitted for isochronous transmission, and schedule a new buffer for transmission each time the transmition completion callback is invoked. A somewhat complicated per-buffer locking/state marking scheme ensures synchronization between the write() syscall implementation and the background transmission mechanism. Although this “somewhat complicated locking scheme” caused me a couple of total system hangs in some early versions of the code, it finally worked OK. I decided to stick to this, because it ensures minimal friction between background and foreground threads, which is essential in guaranteeing that I ‘ll have no delays or losses. VMware seems to cause no problem at all, since a sniffer on Windows shows my 512-byte buffers being transmitted OK with the appropriate inter-buffer delay (32 milliseconds for a 512-bytes buffer, corresponding to 32 packets of 16 bytes each, consisting of an 8-byte header and 8 bytes of payload). Underruns are handled fine, transmitting zeros (this could be any other value), and partial buffer writes are dealt with as well — at least in theory they should be. So, my next step, after the necessary code cleanup and some elementary unit testing, will be to implement the read() syscall. Good luck!

Update, December 17: many small fixes here and there, and now the write() syscall works quite decently! It handles well data sizes that are not multiples of the chunk size. Even in the degenerate case of one-byte writes that I have previously outlined, not only does the driver work fine (ehm, that is, after fixing a bug or two that were locking down the kernel), but it also manages to cope perfectly well with the 128kbps rate (twice the PCM rate, because of packet headers) without underruns (and all that, despite of the large number of debugging messages logged at each one-byte write() operation). I couldn’t hope for any better than that! The catch there was to add to the dev structure an additional small (1-chunk-long) buffer where data fragments are stored until a full chunk is accumulated. One thing that I am still missing is to add a mutex in order to keep out a potential second write()r while the write() syscall code messes with the inners of the dev structure [although I implement exclusive open(), my guess is that I cannot really prevent forked or multi-threaded clients from issuing parallel write()s, so it is better to play it safe there]. Probably I ‘ll use two separate mutexes though, in order to allow read()s to proceed in parallel with write()s. Will see.

Update, December 21 (or “Hello, Wolrd!” #2): although by now I should have done read() and some ioctl()s, I thought I ‘d give write() a bit of a finishing touch. The reason I stuck with write() is that I took a look at how Asterisk handles things. Unless I did a very hasty job reviewing the code (which I did anyway), it seems to me that Asterisk moves around  frames that contain both audio samples and control information. These frames are then just written to the devices of the system. Although I saw no obvious rule there, there seems to be a silent assumption that (at least) audio data are always written as integral chunks. Hmm… I then checked David’s villagetelco 8250 drivers. What I noted in those was that there are two distinct devices, one for reading/writing PCM data and one for doing the ioctls (I am crafting a single device for both purposes). But then I had to think a bit about buffering. The other thing that I couldn’t help but noticing in David’s 8250 drivers was that there is almost no buffering at the driver level: data just get written to the chip’s serial output (and from there to the 3250, but it was no use for me to dig that deep). This is virtually impossible with isochronous USB, where some prebuffering is needed anyway to keep things going smoothly. However, my own code was at the right opposite side: lots and lots of buffering in all possible places, just to make sure that there will be no buffer underruns. Under normal conditions, a userspace program would be able to buffer 16 * 32 = 512 (which amounts to half a second of) data samples before blocking. Admittedly, this would be great in an audio playout environment, however telephony stuff should be more responsive, shouldn’t it? So, I tried to parameterize the number of in-flight buffers (i.e., buffers pre-submitted to the USB core for isochronous transmissions) and the depth of each buffer (i.e., the number of samples per buffer). Then, I tried my two tests: (a) plain copying a large audio file and (b) writing the same audio file a-byte-at-a-time with the smallest possible parameters. I noticed a bug when I asked for one sample per buffer (instead, the sniffer showed four of those, and the file took about twice as much to transmit). I haven’t yet found why that happened, but other than that, with two or more samples per buffer, things worked just fine. But my incredulous self wasn’t convinced yet; so, I thought it was worth trying the infallible audio test. This required tweaking just a few lines of code (setting the 3210 line to “forward active” mode at open time and back to “open” mode at release time), and the by-now-familiar Asterisk audio menu was sounding fine! With plain full-file copying, or with a-byte-at-a-time writes, and with all possible kinds of buffering, 2 (min.) to 512 (max.) milliseconds! There was virtually no difference in any scenario. Maybe in the single-submitted-buffer case I heard a couple of clicks, but nothing too embarrassing. [Note that the choice of having only a single buffer at a time submitted to the USB core is quite risky, because there are higher chances that the kernel does not call the transmition completion callback routine early enough to submit a new buffer before it’s time to transmit a new sample. This means that there will be missed transmission slots, or audio “clicks”, every now and then. And this is why doing things too synchronously is rather impossible with isochronous USB — on the other hand, relying to bulk USB would have its own problems as well, one of them being that, because there is no actual guarantee as to when a sample will be transmitted, one would have to re-create the timing information at the receiver, which is what the only DAHDI-supported USB-based card does]. I found just one – audible – caveat: when buffering large amounts of data (512ms), the last write() returns after queuing as much data as needed and the file is immediately closed afterwards. In this case, the 3210 is set to “open” mode before all data are played out, and this produces a noticeable “cut” a bit too soon, before the end of the audio message is actually reproduced. This is fixable, however at this point in time I feel I have spent (though not wasted!) way too much time in the intricacies of write(). So, it’s time I moved along to implementing read() and ioctl() — this time for real!

Update, December 22: Ioctl is doing well now. I can now probe the board for hook state and set the linefeed mode to open or forward active. A caveat however is that, because the hook state query ioctl is implemented using bulk USB I/O, I cannot use that ioctl to check the hook state while PCM audio data transfer is active, because if I do this garbles the sound altogether (in theory, it should not, but from theory to practice…). This is why I had already planned to provide the two most important events, hook state and DTMF, in continuous mode in a packet header field reserved for that purpose. It took me half a day to remember some intricacies of the firmware (which I have written) and to make sure that just polling for DTMF outside the tightly-timed loop of the timer ISR will provide a good sampling rate. If my calculations are right, doing so should result in querying the hook and DTMF state at a rate of about once every 50ms, which is quite OK, taking into account that the minimum duration of a valid DTMF signal is 75ms. Once looked up, the DTMF signal indication will be “latched” in a variable and will be continuously transmitted with every data packet until re-sampled, 50ms later. Thus, even if one or a few data packets are lost, in theory the event should be eventually noted at the host (at least in theory…). Tonight (2 am) it’s somewhat late to test all this, so I guess there’s more to come tomorrow, at least for a simple test. Hang on… Oh, and by the way: a couple of more fixes in write() were needed; I changed spin_lock_irq to spin_lock_irqsave all over, because it seems that otherwise there were situations where I enabled irqs at spin_unlock_irq while I should not, resulting in lockups. The code is behaving quite better now. One more fix (a place where a fixed value was used instead of a module parameter) revealed that, with less than 4 packets per isochronous URB and less than 4 isochronous URBs in flight, the sound quality becomes unstable, with some clicks and interruptions. OK, this means that my driver incurs a delay of 16 ms in order to produce tolerable sound — not too terrible a penalty after all.

Update, December 24: I stumbled upon something… I don’t quite know why, but it seems that doing SPI I/O while PCM audio data transfer is active causes noise and poor performance (clicks and the like) . However, doing SPI I/O in parallel with PCM data I/O is required in order to check the hook and DTMF status… A non-exclusive list of possible explanations is: (a) I am using a variable from the USB I/O banks to store SPI-acquired data, and maybe this causes some short-term memory lock-up in the PIC; (b) it’s just interference, induced to the audio path either directly by the SPI clock and IO signals themselves, or (more plausibly) by the power bus (remember that power is scarce on a USB-powered device) via the power supply line; (c) it’s some other firmware bug. My preferred explanation is (b), but anyway the noise frequency is directly related to the rate at which I poll the 3210 registers over the SPI. Thus, if I check often enough, I am causing audible noise, if not, I am missing DTMF signals (verified that). It seems that the only way out of this (or the “right thing to do”) would be to have the PIC poll the interrupt signal that the 3210 generates and do SPI I/O only when an interrupt is asserted. However, in my hardware design I have not provided for a connection between 3210 pin 2 and the PIC… OK, so now what? After some quick thinking, I came up with this: I can patch my schematic and board to use PIC pin #13 (RC2) for polling the 3210 \INT signal without too much of a board redesign. So my short-term plan is to patch my board (soldering hairline-thick wires on semi-invisible copper pads? Oh, no, don’t give me that again…) and test if things get any better. Will report on results, however, in the meantime, it’s almost eluded me that it’s Christmas Day eve, so it’s time to take a breath and wish Merry Christmas to all readers.

Update, December 24 (later on, and only shortly before Christmas): I hacked my board and the firmware as intended, so now hook state and DTMF detection work as expected! There is a little quirk that requires some fixing though: the 3210 generates a DTMF interrupt when a digit key is pressed on the phone set, however it does not do the same when the key is released. So, the best thing to do in the firmware is to note the interrupt, poll DR24 (DTMF state) once and report the status to the PC over USB, then go on polling DR24 at regular intervals (e.g. once every millisecond) until the key is released. Even if the perceived audio quality is affected, this will occur only during DTMF dialing, so the user should not care too much. Which means, milestone reached, task accomplished, and I ‘ll try to forget about my board and this blog until next Monday. Cheers to all!

December 31 (last update for 2009): I have taken a couple of days off writing my device driver in order to assemble a third prototype (currently, I have only two working ones, out of which the second is somewat problematic in the DC-DC and audio path). However, the third prototypem, while initially working quite OK, eventualy showed some signs of misbehavior around the PIC and its crystal, and thus needs hardware debugging (that I ‘ll do without for a while). The only other thing I ‘ve done was uploading latest versions of the firmware and the device driver on the project’s Google Code page, so anyone interested can see (and perhaps comment on or criticize) the driver code, albeit in an incomplete status (read is missing and ioctls are not quite functional). That’s all folks! (For 2009, that is). Happy new year to everyone!

Update, January 11: It took me some days to write this update, because I am in the midst of preparing one more working prototype for a friend to help me develop and test the code. However, as usual, building a prototype is an adventurous task that eats lots and lots of time. The new board is now in the (usual) early development state where USB I/O works, however the DC-DC converter refuses to power up (as usual). Will need some more debugging and hopefully some day I ‘ll fix it. In the driver front, I have written the isochronous IN “daemon” part, and tested whether the board and the driver can cope with simultaneous isochronous INs and OUTs (yes, they can, in case you were anxious about it). Implementing the read() system call is next.

Update, January 12: Still another nuisance: buried deep inside a table somewhere in p.125 of the si3210 manual, there is a note that the \INT signal is open-drain, which means it needs an external pull-up resistor. Hence my single-wire hardware patch is not enough, and the board becomes sluggish, because it is continuously testing for hook/dtmf status while there is no active interrupt. Going to fix that and read() has to wait a little bit longer. Meanwhile, I am tidying up a little the Linux driver code and synchronizing changes between that and its Windows command-line cousin.

Update, January 16: glad to see that the patch with a 10k pull-up resistor works fine! The board recognizes hook state changes and DTMF in warp speed without becoming sluggish. Good! In addition, read() is almost ready now. That is, the code is there, working and tested; however, until yesterday evening, there were two silly bugs plaguing me (one, read() was returning packet headers along with data and two, on some occasions, I was going off-limit while writing buffer memory, causing weird system hangs and crashes). I have found and corrected both of them, however, before reporting success, I need to prove the fixes correct. And there, I had a minor issue: my home prototype board (I have one prototype board at home and one at work!) did not, as of this morning, have the required “patch” to recognize hook state and DTMF (as a result, the PIC would probe continuously the 3210, inducing audible noise into the audio path). So, verifying the whole thing will have to wait. Nevertheless, I thought that finishing off this “driver’s-day-in-the-life” post is now justified, since ninety-a-lot of the work seems to have finished now. Thus, this will my last update to this post. A new post will follow, summarizing the inners of the driver and discussing the next steps. Soon after it is tested, the driver source will be on the project’s google code page. Thanks for bearing with me, tireless and understanding reader! I hope the result has rewarded your effort!

Discussing further development

October 19, 2009

So, what’s next? If you have been reading this blog for a while, you may wonder if this project is (as a matter of fact, if I am) active any longer. Well, there is not an easy answer to this question. The real answer is yes and no.

The answer is “no” because, starting from the end of June, we have at home with us our second baby. This has resulted in a drastic transformation of the spacial-temporal parameters of my electronics hobbying life: there is no time for heating my soldering iron, nor can I find anywhere in sight the countless hours that I used to need to experiment with my board. Plus, there is no space on which I can lay my half-finished boards, SMD components, voltmeters, soldering paste, phone sets, etc., at least not without risking one of the kids swallowing any of the above (ok, not the voltmeter and the phone set, but the rest, yes, why not?), burning itself with the soldering iron or getting tangled with the various cables, while I am busy nursing the other one. And, of course, my lab desk has been evicted from the house, in order to make room for new beds, drawers, toys and other equipment.

Although taking care of two kids in parallel requires my wife and me to move at speeds approaching the speed of light, apparently the result is that the available space and time shrink both proportionally to one another, contrary to what Albert Einstein’s theory of relativity predicts. It seems that the mechanics of baby care somehow eluded the great physicist when he was creating this monumental work, or else I am sure he would have formulated his theory so as to cover this exception. The solution might be to add to the equations a new quantity, called “love”; this quantity would grow inversely to the product of the available space and time, so as to restore the theory’s correctness, even in extreme conditions like the ones I am describing above.

The answer about the project’s activity is “yes” as well. I haven’t dumped completely the project, since I have turned my attention to porting it to the Linux world. After all, if you check my replies to the comments  on the “About” page of this blog (e.g., replies to Denver or Vincent), you will see that I think about targeting the project for Linux/Asterisk (which was my initial plan anyway). As you may have guessed, there are two challenges to this; the first one is called “Linux” and the second one “Asterisk”.

Let me first start from the “Linux” part. In order to get me somewhat prepared, I thought that a good idea would be to study the Linux kernel and device driver modules to see how they deal with USB communication. Without claiming that I have mastered even the slightest part of it, it seems that a classless (“custom”) USB device driver is not that difficult to write — at least not in principle. While the trickiest task is to handle correctly the various kernel beasties such as spinlocks, semaphores and the  like, USB primitives themselves seem very clear and simple to use. Even isochronous communication is very much like in (the Windows flavor of) libusb: one simply has to initialize a transfer by specifying the packet size, and then one needs to queue one or more buffers, while the kernel core handles the packetization and actual I/O. Checking for the completion of a transfer is done asynchronously; one needs to do that, in order to queue up additional buffers.

As a quick grep reveals, some devices (e.g. USB cameras) use extensively isochronous USB I/O. In principle, I could learn by studying the respective device drivers and produce a driver for Open USB FXS.

Asterisk, on the other hand, is another story. In order to create a device that Asterisk can understand, one needs to write code for a “channel”. I am still studying Asterisk and DAHDI (the Asterisk ex-Zaptel device drivers), in order to see how to proceed with this task. Whereas writing a Linux driver does not seem too hard, the “channel” code looks a bit tangled to me. I am sure that I will eventually understand better its insides after some studying — whenever I manage to do that studying, of course…

One additional piece of work that needs to be done is to update the board’s firmware in order to fit some status reporting along with the isochronous PCM data. My reply to Vincent’s comment on the “About” page explains this somewhat.

So, this is where I stand for the time being. My progress is awfully slow, so I must apologize to any readers who expect a quick something, such as a project “code release”. On the other hand, since I am not burried in the loneliness of soldering components, but instead I am trying to write Linux code, and since the project’s code is posted in public, anyone (e.g., kernel driver code developers, etc.) interested in contributing to the project is warmly welcome to contact me and provide help. Feel free — this is an Open project anyway!

Update Nov 9 2009: after studying a bit David’s chan_mp code (see his comment below) and  some Linux USB device drivers (particularly some ones with isochronous transfers), I think I ‘ve got a grasp of the whole thing. First of all, writing a device driver is not strictly necessary. I could rely on usbdevfs to access the device (I verified that a Linux systems sees the board and all its USB endpoints as usbdevfs descriptors), and write all code in userland using libusb (plus some trickery for ISO transfers that the plain libusb does not support yet). Although feasible, this is somewhat too tricky and would result in ugly code. It seems that the best separation between driver and userspace functionality would be to orchestrate the isochronous transfer code in the device driver and the higher-level handling in userspace. Doing that has a great advantage: the kernel can invoke a callback function in the device driver code every time an isochronous transfer is completed, and that callback can do several things, like freeing buffer space, queueing additional data, etc. — quite different from the polling model I ‘ve been using in my strictly-userspace code. Moreover, I can hide tons of low-level functionality in the device driver code (like, initialize the ProSLIC, etc.) and provide a high-level IOCTL interface to the userland. Not bad. At least, in principle, before drowning myself in all the kernel oopses and crashes I am likely going to trigger in my device driver code.

And one more piece of news: Open USB FXS now has its own USB VID/PID pair. I have not bought a VID (it’s very expensive), however, Microchip have a VID/PID sublicensing program, to which I applied and they responded immediately with the magic numbers 0X04D8 and  0XFCF1. Thanks, Microchip! The code is not adapted yet, however, in (the extremely unlikely) case that someone out there is experimenting with my board and my code, they should from now on avoid abusing the company’s own demo card VID/PID and use the above id numbers instead.

Update Nov 15 2009: currently reading “Linux Device Drivers“, 3rd ed. by J. Corbet, A. Rubini and G. Kroah-Hartman. Thank you, dear authors, for putting this excellent book online; it’s a valuable resource! My goal is to progress through the various book chapters while writing at the same time some basic building blocks of my openusbfxs Linux device driver. Let’s see how fast (and how correctly) I can accomplish that [it turns out that I had better read the entire book before putting down a single line of code]. While reading the book, some interesting questions arise. Here is one: is it better to write a complicated device driver, where I will hide all the internals of isochronous I/O, or is it better to provide a dumb driver which will just implement device files and basic SLIC interworking (like register peek/write)? I guess that reading throughout the rest of the chapters will provide me with some hints as to that. Let’s see…

The (un)holly explanation

July 25, 2009

As the title says, this post is all about the reason why my board sometimes used to produce garbled audio. As of this writing, I am in the midst of fixing the bug, and in some cases I am already able to listen to the complete 15-second audio file from the Asterisk IVR collection (“press one to record your unavailable message, press two to record your busy message” etc.) without a single audible hickup. This is “a small step for the project, but a giant leap for me and my own good mood”! I was nearly depressed with this bug resisting arrest all this long time, but the morale of the story is that in the end, no bug can last forever. An optimistic message, isn’t it?

To recapitulate: until my previous post, I was in doubt whether something was wrong with the board’s firmware or the issue of garbled audio was caused by libusb. Well, as one would expect, the answer is not so straightforward. The problem has a dual cause: it is was caused by a bug in the firmware which was triggered by libusb occasionally delaying a OUT packet.

So, I am going to explain the issue here. If all this is too technical for you, you may skip to the next post (whenever this comes out). There’s absolutely no need for you to go through my bug analysis session. But, if you find it instructive (or amusing, or both) to read stories on how other people blew it, you are on the right track here.

The libusb side of the problem is easy to understand. Ocasionally, libusb was delaying an OUT packet a bit. I am not sure yet why this is happening, but I am not complaining: in a real-life VoIP environment, a lost or delayed packet is something usual. Although this should not happen in a USB environment (with just a one-foot shielded cable intervening between the software and my board), the truth is that it is allowed to happen. Isochronous USB I/O clearly allows packets to be lost in the way. Thus, my firmware should clearly allow for a packet to be missed or delayed. Did it really?

Again, the answer is not so straightforward. My PIC USB configuration employs something called ping-pong buffering, which uses two sets of USB endpoint control descriptors, a set for even transmissions, and another set for odd ones. In my TMR1 ISR code, the thing was designed to work as follows: I had two sets of OUT/IN buffers, one OUT/IN pair for the even ping-pong phase, and one for the odd one. The reasoning was that I would “arm” two USB I/O operations (and OUT and an IN one) using, say, the “even” endpoint control descriptors and the even set of buffers; then, while waiting for these I/O operations to complete, I would do PCM I/O to and from the 3210 transferring data from and to the odd buffers. Then, at the next ping-pong phase, the “armed” even I/O operations for the even buffers would have completed, so I could “arm” new I/O operations using the odd endpoint control descriptors and the odd buffers, while doing PCM I/O to and from the 3210 using the even buffers. And so forth.

Of course, my code provided for packet loss in the OUT direction (that is, OUT from the PC to the board, which is the input direction for the board — confusing, I know, but one gets used to it after some time). If, at the time that the ISR was expecting an OUT packet, no such packet had shown up yet, the code would just ignore the packet loss and fill the respective (even or odd) buffer with PCM “silence” (a bunch of 0xFF’s).

When writing the code, this looked like the right thing to do; however, as usual, doing the right thing turns out to be more complicated than one thought. The first question to ask is, what will happen if libusb misses one packet? So, instead of, say, sending packets 1, 2, 3, 4, 5, 6, …, libusb sends packets 1, 2, 4, 5, 6, … — what will happen then? Supposing that packet 1 corresponds to an even buffer, this is the sequence of events:

1 --> E
2 --> O
 X --> E

, and then, guess what: packet number 4 will not be received, because it’s still the turn of an even OUT operation! The PIC queues up to four USB packets internally, so packet 4 is queued to be received by the next even OUT. However, the ISR tries an odd OUT and sees no packet there. Thus, the next events are

[4 -Q> Ε]
X --> O
4 --> E
 5 --> O

This doesn’t look so bad, does it? And, it should not sound too bad either: only two milliseconds of silence, and then everything is back in order, queued a bit behind.

While all this is fine so far, what will happen if libusb delays a packet but does not miss it? In other words, what if libusb transmits instead of packets 1, 2, 3, 4, 5, 6, …, packets 1, 2, X, 3&4, 5, 6, … ? This means that packet 3 is delayed a bit, so that when the ISR checks for the packet, it doesn’t see it, but the packet is sent shortly after, within the same SOF frame, and the PIC sees it and queues it. Packet 4 then comes on-time, so no loss occurs. Here is what happens in this case:

1 --> E
2 --> O
X --> E
[3 -Q> E]
4 --> O
[5 -Q> E]
3 --> E
 6 --> O

Wow! This looks quite different, doesn’t it? The effect is that all packets are reordered, and they are now being “played” in the opposite order than the intended one, like this: 1, 2, 4, 3, 6, 5, …, until eventually, the same thing happens again, and the order is fixed. This was the notorious bug that had been producing garbled audio (and causing me endless hours of headache and nights of fruitless debugging)!

To resolve this, I am working towards decoupling the odd-even endpoint descriptor handling from the odd-even buffers. So, if a (say) even phase OUT yields no packet, in the next round I re-try the even USB endpoint descriptor, but this time data will be pointed to the “odd” buffer (it does not really make sense to call the buffers odd and even anymore). This seems to work right most of the time, but the firmware still needs more fixes here and there. But things look (and sound!) definitely better now than before.

This was the (un)holy explanation behind a difficult and elusive bug. It certainly feels nice to leave such things behind and go on. So, what’s next? But of course it’s Linux, and Linux-wards my footsteps are driving me! But more on this in my next post(s).

Update July 26 (later on): here is an example of mis-sync between the PC and my board, as captured by the USB sniffer (you may need to click on the image to make it appear full-size):

missync

In each row sniffed, you can see two IN packets, each starting with the “magic sequence” 0xBA, 0xBE (I call them “the babes” :-)). Next to that, you can see a 0xDD for “odd” packets or a 0xEE for “even” ones (the change I made does not affect IN packets, so this corresponds to the ping-pong phase of the PIC USB system). Then, byte #3 (starting from zero, the first boxed column of packets is #3) mirrors the sequence number of the respective OUT packet (so, under normal conditions,  it is incremented on every IN packet). Then, bytes #4 and #5 (on odd packets only) reflect the value of TMR3 at a specific point during the TMR1 ISR sequence which occurs once before an odd packet is transmitted (the value is in little endian byte order). Subtracting each number in this sequence from the previous one yields the constant number 0x5DC0, or 24000 decimal (which, multiplied by the 83,333+ ns that each PIC clock cycle takes, yields 2 milliseconds!). The next byte (not boxed, has the value 0x5) on odd packets reflects the number of OUT packet misses since last reset of the board. The last boxed column, byte #7 on even packets only is a IN sequence number, which is incremented independently on each even packet.

Now, observe that the buffer contains only 31 packets, whereas it should contain 32. The 19th packet, albeit correctly an odd one, contains a wrong sequence number (thick gray box, value 0x12 — the expected value is 0x52). The next packet is again an odd one! This is definitely a mis-sync or other similar issue on the side of libusb, because the firmware cannot repeat transmission of an odd packet. This second odd IN packet continues correctly the sequence of mirrored OUT packets (0x52)! In other words, no OUT packet was missed (and byte #6 of even packets is constantly 5, or else it should have been incremented). But now, subtracting 0x64F9 (TMR3 value on last correct odd packet appearing on the right-hand side) from 0x2079 (TMR3 value on first odd packet appearing on the left-hand side), one gets 0xBB80 (decimal 48000, or 4 milliseconds)! This means that an even IN packet is missing altogether, and this is also proved by the sequence numbers of even packets (the boxed sequence on the left-hand side ends at 0xAB, and the one on the right-hand side starts at 0xAD).

In my opinion, this shows quite clearly a mis-sync situation between the board and libusb. In this case, the board has recovered without any audible distortion. What is still puzzling is why no packet appears to have been lost in the OUT direction. If, by reading this, you can suggest an explanation, I ‘d be happy to hear about it.

Interlude II

June 30, 2009

It’ s the second time through the lifetime of this blog that development has slowed almost down to a halt. Just like in my November 2008 posts, both project-related and real-life conditions contribute to this. From the project’s standpoint, I have been offered a proposal for volunteer help, and have decided to construct a few more prototypes, so I can send over (a working) one to start development in parallel [BTW, should anyone else be interested in volunteering, please drop me a note]. So, I have swept the dust off my stash of PCBs, and ordered again the necessary materials to build three more boards (my only prototype is showing some signs of bad behavior on occasion, so building a few more prototypes to have handy does not sound like a bad idea anyway).

There are news from the real life front as well: our newborn second son has just arrived home! As you can imagine, a lot of non-openusbfxs-related tasks keep me busy most of the day, and for the rest of the time, I have to try to fill some gaps for my real-life job. Although all this leaves little time for the project now, I hope that things will gradually settle down and I ‘ll have the chance to pick up again the soldering iron.

In the meantime, I have uploaded all project-related code, this time to Google Code for people to see, and possibly comment, use, or contribute to (I decided moving off sourceforge.net, however the policy there is to keep code around, so the old — and possibly, copyright violating — code is still around; confusing, isn’t it?). Some of the uploading work included removing parts of code that are clearly Microchip’s and just provide just a few source patches to indicate my own changes (all binary PIC code is readily available however). That, plus adding appropriate copyright and license notes everywhere kept me a bit busy the last few days.

In the trial-and-error front, the situation is much as I described it in my previous post. This means that I cannot get a good voice quality yet: initially, and for one or two seconds, audio comes out OK, but then it looks as if synchronization is lost or packet losses occur. Then, if the audio file is sufficiently long, it seems to synch again for a while, then I get losses again, and so forth. This is weird, because I have been mirroring the sequence numbers of the OUT packets (the ones sent from the PC to my board) into the IN packets. By observing the mirrored sequence numbers, it seems that no loss occurs. Not a single packet is missed or reordered (well… almost; see below). I cannot tell anything about strict packet transmission timing, since the sniffer does not give me any real visibility into what happens on-the-wire.

In addition, I got a weird finding, which could be related to the intermittent audio quality. During the first two libusb isochronous receive (reap) operations, the first two packet slots are empty (remember that, depending on the buffer size, a buffer consists of several 16-byte packets, and each 16-byte packet contains 8 bytes of audio data, plus other control information). It is normal to see the first two slots empty: because of ping-pong buffering on the PIC, it takes two packets’ time to get the first OUT sequence number mirrored into an IN packet. Again, this happens for the first two reap operations. However, from the third reap operation and onwards, the first packet slot in each receive (IN) buffer seems to be occupied with the packet that comes next to the last one in that buffer! In other words, it seems that libusb returns the IN buffer one packet too late, while it resets some internal pointer, so that one packet more than the expected number appears, and this extraneous packet is placed at the beginning of the buffer. Unfortunately (from a debugging standpoint), the sniffer cannot tell me whether this is happening too in the OUTgoing direction: the sniffer “sniffs” data at the user-kernel interface, and not on-the-wire, so it is hard to tell actual data gets transmitted on the wire. Obviously, this is why debugging this issue is hard.

All this remains a mystery to me. There are two plausible scenarios: either my board loses synchronization with the SOF pulse, or libusb occasionally shufles OUT packet data, much the same way as it does with the IN packet data. I am in favor of the second scenario, not just because I am the proud designer and builder of my board, but because it makes more sense. I have measured inter-packet time using TMR3 (you have to look at the ISR code to see how), and the time is constantly 24000 PIC cycles per each two packets (I am reporting the time only on even packets — or is it only on odd ones, it eludes me now), which amounts to 2 ms exactly.

On the other hand, these 2 ms are counted using the PIC’s own crystal-based clock frequency as a time reference, and drifts between that and the PC clock may well exist. But my feeling is that such drifts would not be so important as to cause a loss-of-synch condition every few seconds.

Moreover, if it were a mis-synch issue, periodic re-synch would recur every now and then in fixed time intervals. This is not the case, though: re-synch occurs in seemingly random periods. Sometimes, many seconds of audio comes out without distortion, while in still other cases, I get garbled sound right away. Moreover, by fiddling with libusb buffer sizes, I have managed to get better or worse results. All these tend to put the blame on the side of libusb (or my userspace code) rather than on the board’s side.

In any case, my plans are to abandon further development using libusb and see how I can write a linux kernel driver. Presumably, there I will have less undesirable effects such as the .NET garbage collector or some libusb bug getting in my way. On the other hand, the code debugging cycle will be considerably longer. But I am not worrying about this. Time will show.

In any case, switching to a Linux kernel driver will take some time; in addition, I need to assemble a few boards, so more time there. Hang on, though, it will happen in the end. And, let me not forget to mention that all kinds of help are greatly appreciated!

Hello, World!

May 15, 2009

For many months have I been imagining the day that I would write a post with this title.

Now that I am actually typing the words, it feels weird: I have reached one of the most important milestones in my Open USB FXS “pet”-project, and yet I suddenly realize that the road ahead might be quite longer than the road behind. This is the exact feeling I had the day I graduated high school, or the day I finished with my military service (which is mandatory in Greece where I live), and that feeling goes something like, “it wasn’t that hard, was it?” — not to mention, of course, that what’s going to follow will be definitely harder. Anyway, enough with all this crap, let’s jump into the details that I know every reader is looking forward to.

As it is obvious from the title of the post, I made my board utter its first “Hello, World!” sound! My board has passed its PCM language exam. I am not sure whether it has gotten an “A” grade (see below for more on that), but nevertheless, the simple “Hello, World!” voice message is reproduced loud and clear. How did I reach here, though?

My previous post, “time (in a bottle)” was devoted to the details of rewriting the TMR1 ISR routine. I won’t return to those details, with one exception: the position of the FSYNC pulse. It seems that moving this to the 32nd cycle (end) of the sequence instead of the first cycle (start) of the sequence was not necessary. That was due to my hasty re-reading of the timing diagrams on p.55 of the Si3210 datasheets. In reality, the chip starts counting bytes from the rising edge of FSYNC. I did not notice this until late in my tests, and there’s more on this later on. However, from a subject point of view, this belongs to the previous post, so I thought about mentioning it first.

So, again, how did I reach here?

It seems that the most important challenge was to create a good PCM audio communication path between the board and the PC. This required understanding and implementing USB isochronous transfers. So let me start with these.

The USB standard defines various types of isochronous endpoints, namely, synchronous, asynchronous, data, and feedback. An asynchronous endpoint means that the USB device has its own clock and the host may send (in an isochronous manner) more or less data per frame to match demand (e.g., of some codec with variable bit rate). This was not my case. In contrast, a synchronous endpoint always transfers a given amount of data per frame, and the USB device operates synchronously to the USB clocking. It looked like I was in this category. A data endpoint sends data (my case), whereas a feedback endpoint sends feedback (to e.g., increase or decrease the amount of data per frame — not my case).

Time (in a bottle) describes what it takes for synchronous/isochronous transfers to work correctly from the viewpoint of the board: the 3210 PCM highway I/O tasks and the USB I/O had to be strictly synchronized and timing had to be rigorously checked to run equally fast on both sides. All this were done via the TMR1 ISR.

However, from the side of the PC things were a bit more complicated. The user-level API of libusb-win32 that I finally worked with required my PC  “driver” program to pre-queue a buffer and wait for the transfer to finish. However, as I was soon to find out, if I waited for a queued buffer to finish and then re-queued the next buffer, the library was missing frames. So, at every point in time, I had to have not one, but two buffers queued. One buffer would be the one transmitting, and the next one would be just waiting so that no gap would occur when the first one ended. One exception to this “always-two” rule seemed natural: when a buffer was drained out, I had only one buffer queued while I was preparing the next one to be queued. Here is an outline of how this works in a loop:

prepare (buffer1);
context1 = LibUSBenqueue (buffer1);
prepare (buffer2);
context2 = LibUSBenqueue (buffer2);
current context = context1;
#transferred packets = 0;
#last time;
while (true) {
    #packets transferred from current buffer = LibUSBcheck (current context);
    if (#packets transferred from current buffer > #last time checked) {
        increase #transferred packets by #packets transferred from current buffer - #last time checked;
        #last time checked = #packets transferred from current buffer;
    }
    if (#transferred packets == desired) break while loop;
    if (#packets transferred from current buffer == (buffer size / packet size)) {
        LibUSBfinishWith (current context);

        if (current context == context1) {
            prepare (buffer1);
            context1 = LibUSBenqueue (buffer1);
            current context = context2; // which has been previously queued
        } else {
            prepare (buffer2);
            context2 = LibUSBenqueue (buffer2);
            current context = context1; // which has been previously queued
        }

    }
    #packets transferred form current buffer = #last time;
}
LibUSBfinishWith (current context);

This is not exactly what you could call “simple”, especially if one takes into account the details of filling in buffers with data from a file (appropriately sliced into packets with headers and payloads), plus the fact that this whole thing has to work in parallel for both read (IN) and write (OUT) USB transfers [to be honest, this requirement is not necessary for a “Hello, World!” test which sends one-way audio, but naturally one needs to make sure that both-way audio would work in principle, hence the double effort].

Of course, it took days and days of trial-and-error to get the above thing working. Sometimes I was missing IN packets, and I had no idea why. Some other times, the whole thing stalled on me, and again, I had no clue as to what the error was. Anyway, as soon as I had this working, I pushed all the gory details into a “SendAudioFile” function, and tried that inside a test sequence within my driver PC program. The test sequence went like, “initialize the board, then ring once the phone, then wait until the phone goes off-hook, then SendAudioFile”, then check again on/off hook status, if off-hook SendAudioFile, etc., etc., ad infinitum.

At the beginning, I just heard an acute sound with a steady frequency. After some thinking, I attributed that to the fact that my buffer size had changed from 64 bytes (its size in my previous ISR version) to just 8 bytes. So, whenever I was not sending any PCM data, the ISR kept repeating the same 8 bytes every millisecond, and this produced a pattern that repeated itself every millisecond. This is 1kHz, and yes, I was listening to an 1-KHz pattern! So everything was in order there — of course, apart from the fact that this whole thing meant that my board was not receiving any USB data.

Activating the test pattern b10101010 (or, 0xAA, if you prefer) that pre-existed in my code made the accute sound disappear; so this means that the PIC-to-3210 path was likely OK. Then, I scrutinized the ISR code, just to find that I had left in a #define for debug purposes, which was bypassing the OUT part altogether. Fixing that gave me an audible result full of noise, cracks and clicks. Definitely better than the 1kHz steady tone, however something was still wrong. What?

It took another day or two until I tried a longer message from the collection of asterisk’s IVR sounds and noticed that I could distinguish periods with more and others with less noise, in the familiar pattern of a voice that speaks, then does short pauses between words and longer ones between phrases. This led me into thinking that the audio was passing halfway through, and the culprit for that was the position of the TXD/RXD pulse stream with respect to FSYNC. So, after a better look at the timing diagrams of the PCM highway in p.55 of the 3210 datasheet, I set the RXS and TXS registers to 1 instead of zero, meaning that the audio pulse train was one PCLK-cycle later than FSYNC. And after that, in my next test, the phone rang and “Hello, World” came out, loud and clear!

It goes without saying that there are still issues. The most important issue is that audio occasionally comes out distorted, in a way very familiar to the ear of someone who has been using VoIP systems: some packets come in late. I need to do somewhat more debugging for this (e.g., increment a counter whenever no data is received in-time and check for its value). Then, I ‘ll see what remedy I can find.

Another issue (now resolved) is the acute 1-kHz sound: in periods where no actual audio data are sent, the board just kept repeating the last two packets (two, because of ping-pong buffering), which resulted in the now-well-known 1-kHz acute sound. By changing the ISR execution path where no data were received into replacing the received data buffer with 0xFF’s (u-law silence), this has now been solved.

So, having reached this milestone, I ‘ll allow myself the few beers (and the few days off the project) that I feel I deserve. Then, probably I ‘ll go and fix some bugs (the packet loss issue and another annoying thing where, on occasion, back-on-hook is not recognized by the board) and move my way on to either first dealing with DTMF and then writing a Linux driver, or writing the Linux driver right away. Bear with me, folks! It seems that I might make it in the end! I still need to publish code, updated timing diagrams, and updated BOM — I have been neglecting these, I know — but I ‘ll do it eventually.