A Linux driver’s day in life

December 1, 2009

“There is a great future in store for you and me, my boy — a great future!”
(Diktor to Bob Wilson, in Robert Heinlein’s legendary By his bootstraps)

After some elementary reading, I started writing a Linux device driver for Open USB FXS. Since I devote little time to this activity, it may well take some months. So, I figured that waiting that long before posting a big “I made it!” article would mean that both I and anyone reading this would miss all the fun. Instead, I thought it would make better sense to write this post now and update it as I progress through the driver.

I chose to develop my driver on a Lenny system (Debian 5.0.3, kernel version 2.6.26-2). I also chose to host my development system on VMware and, although I have some reasonable doubts about how well isochronous USB timing is going to perform on VMware, all the rest should go fine.

To begin with, the kernel source code comes with a veeeery useful “skeleton” USB driver, which contains all the required functionality for a boilerplate USB device with two bulk endpoints (one IN and one OUT). Based on that, I have already written a 300-line “driver” kernel module which loads and unloads OK, recognizes my Open USB FXS board when plugged in, creates the required /dev entries, checks the available endpoints and their sizes, and produces an error on every other attempted operation (like opening the /dev/openusbfxsN file). Plus, it produces lots of unneeded debug information. Plus (didn’t I mention it before?) it doesn’t “oops” anywhere yet (“oops” being the Linux kernel jargon word for doing something semi-fatal, like dereferencing a null pointer). Plus, I think it does not leak memory or minor device numbers and it doesn’t lock up the kernel; at least, not yet. Great job, right?

My next steps will be to implement (read: copy-paste from the skeleton example with the necessary adaptations) the required basic functionality like device open, release etc. From then on, some design questions arise. For example, what should the read/write syscalls actually do? Who should do the board initialization (the driver, or a userspace program)? Should I provide a 3210 register view, or just ioctls for events like on- and off-hook? And so on… So far, I intend to put the initialization code inside the driver, and (later on) write a couple of ioctls for re-initializing and/or using different initialization parameters without reloading the module. As for the read/write functionality, this ought to translate into PCM data transfers from and to the board respectively, which means isochronous transfers. This can probably be done with two circular buffers and some code to schedule events around the two continuous IN/OUT streams.
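
To make the circular-buffer idea a bit more concrete, here is a minimal user-space sketch in C. It is not code from the driver: the names pcm_ring, ring_put and ring_get and the 1024-byte capacity are made up for illustration, and a real kernel version would also need locking between the syscall side and the USB completion side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u  /* assumed capacity; must be a power of two */

struct pcm_ring {
    uint8_t data[RING_SIZE];
    size_t head;  /* total number of bytes ever written */
    size_t tail;  /* total number of bytes ever read */
};

static size_t ring_used(const struct pcm_ring *r) { return r->head - r->tail; }
static size_t ring_free(const struct pcm_ring *r) { return RING_SIZE - ring_used(r); }

/* Store up to len bytes; returns how many were actually stored. */
static size_t ring_put(struct pcm_ring *r, const uint8_t *src, size_t len)
{
    assert(r != NULL && src != NULL);
    size_t n = len < ring_free(r) ? len : ring_free(r);
    for (size_t i = 0; i < n; i++)
        r->data[(r->head + i) & (RING_SIZE - 1)] = src[i];
    r->head += n;
    return n;
}

/* Fetch up to len bytes; returns how many were actually fetched. */
static size_t ring_get(struct pcm_ring *r, uint8_t *dst, size_t len)
{
    assert(r != NULL && dst != NULL);
    size_t n = len < ring_used(r) ? len : ring_used(r);
    for (size_t i = 0; i < n; i++)
        dst[i] = r->data[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += n;
    return n;
}
```

One such ring would sit behind write() (drained by the OUT stream) and another behind read() (filled by the IN stream). A short return value from ring_put is the point where a blocking write() would have to wait.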

Updates to this post will follow sooner or later, as I progress through these steps in my driver. Whenever I have something that can be considered working code, I’ll update the project’s Google Code repository accordingly. So, bear with me, there’s a great future in store…

Update, December 3: I have managed to talk to the board using a simple blocking primitive (usb_bulk_msg()). I am now working on the initialization of the board. I chose not to defer initializing the ProSLIC until someone open()s the device, because this would block the caller for quite some time, or even forever if the device refuses to initialize properly, as in the case of the DC-DC converter refusing to power up (exactly what my board has chosen to do these days, alas!…). So, I chose to delegate this task to a worker thread and, unless the device has initialized OK by the time the open() occurs, fail the system call with -EBUSY or something similar. So far, the worker thread communicates with the board OK, although I am a bit concerned about possible race conditions that my design choice leaves open. For the time being, however, another issue is puzzling me: if I leave the board plugged in and rmmod-then-re-insmod my driver module, the board refuses to answer commands over the bulk USB EPs. I am debugging this and hopefully I’ll find a solution. One possible explanation is that rmmod’ing the driver somehow stalls the endpoints on the device, but this definitely needs further investigation. I’ll report results whenever I have any.
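
The “fail open() until the worker has finished” decision boils down to a small state check. The following is a user-space sketch under assumptions, not the driver’s code: the names board_state, fxs_dev and fxs_open are hypothetical, and returning -EIO for a board that failed to initialize is just one plausible choice for the “or something similar” part.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum board_state { BOARD_INITIALIZING, BOARD_READY, BOARD_FAILED };

struct fxs_dev {
    enum board_state state;  /* updated by the initialization worker */
    int opened;              /* nonzero while someone holds the device */
};

/* Returns 0 on success or a negative errno value, kernel style. */
static int fxs_open(struct fxs_dev *dev)
{
    assert(dev != NULL);
    if (dev->state == BOARD_FAILED)
        return -EIO;    /* assumed code: the board never came up */
    if (dev->state != BOARD_READY)
        return -EBUSY;  /* the worker has not finished yet */
    if (dev->opened)
        return -EBUSY;  /* exclusive open: someone already has it */
    dev->opened = 1;
    return 0;
}
```

In kernel code the state and the opened flag would have to be examined and changed under a lock, since the worker may change the state at any moment; that is the kind of race the design leaves open.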

Update, December 6: I have resolved the “board not responding” issue. It seems to be due to some VMware peculiarity: a few seconds later, after a couple of timeouts, the board responds OK. So I am now in the middle of copying all the initialization functionality from the “console” version of the Windows userspace “driver” program I have written into the new driver’s worker thread that initializes the board. In other words, things are going pretty much as expected so far. I guess the next mini-challenge will be to implement the isochronous read/write functionality using the lower-level URB I/O kernel API instead of the higher-level functions I am using to read and write registers. Time will tell.

[Which reminds me of a note I should have written at the beginning of this post about my quotation from By his Bootstraps: it feels like the “great future” of my Linux driver (“Bob”) is to repeat the same old functionality that already exists in its userland cousin (“Diktor”). But without the prospect of writing “Bob”, the Linux version for Asterisk, “Diktor”, Bob’s Windows userland cousin, would not have been written at all. Confusing? Well, do read Heinlein’s short novel if you haven’t already, and these funny windings of circular (or better still, spiral) logic may sound more familiar to you by the next time you visit.]

Update, December 9: the board initialization is complete, but the hard part is really starting now. My first step will be to implement the write() syscall. Since I am writing a char device driver, a userland program has the right to open /dev/openusbfxsN and then start writing PCM audio data one byte at a time; expensive as writing data in this way may be, a fast system can keep up with it, and my driver must support it. To map even this clearly degenerate case correctly onto the mechanics of isochronous USB I/O requires me to (a) pre-buffer data, waiting until a “good number of samples” becomes available; (b) packetize these data; (c) enqueue packetized data as URBs for isochronous submission; and (d) block the userland caller if no buffer space is left or if a high number of URBs have already been submitted. What exactly does a “good number of samples” stand for? My userland experience has shown that queueing URBs with just a few 1ms “packets” each results in poor quality. Maybe the kernelspace implementation will not be as sluggish as its userspace analogue; however, submitting many small URBs may still be a bad idea, because it adds unnecessary load to the system. Besides that, some day I may have to perform echo cancelling in the driver, so keeping a buffer around does not sound like the wrong thing to do. On the other hand, pre-buffering too many samples will result in a noticeable delay. On top of all this, the pre-buffering stage may act as a jitter buffer, which will absorb the variable delays between network packet arrivals; but then again, is the kernel driver the right place for a jitter buffer, or is it better to put it in the “channel driver” code? Well, I guess I’ll have to find out the answers to all these questions while implementing the write syscall (hasn’t that always been the case in this blog, after all?…).
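
Step (b), packetizing, can be illustrated with a small user-space sketch. It assumes the 16-byte packet layout described in the December 16 update (an 8-byte header followed by 8 bytes of payload); what actually goes into the header is not covered in this post, so the sketch simply zeroes it. The function name packetize is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_BYTES     8u
#define PAYLOAD_BYTES 8u
#define PACKET_BYTES  (HDR_BYTES + PAYLOAD_BYTES)

/*
 * Pack nsamples bytes of PCM data into consecutive packets in out.
 * The header is zero-filled (its real contents are an open assumption)
 * and a short final packet is padded with zeros.
 * Returns the number of packets written; stops when out is full.
 */
static size_t packetize(const uint8_t *samples, size_t nsamples,
                        uint8_t *out, size_t out_bytes)
{
    assert(samples != NULL && out != NULL);
    size_t npackets = 0;
    while (nsamples > 0 && (npackets + 1) * PACKET_BYTES <= out_bytes) {
        uint8_t *pkt = out + npackets * PACKET_BYTES;
        size_t n = nsamples < PAYLOAD_BYTES ? nsamples : PAYLOAD_BYTES;
        memset(pkt, 0, PACKET_BYTES);
        memcpy(pkt + HDR_BYTES, samples, n);
        samples += n;
        nsamples -= n;
        npackets++;
    }
    return npackets;
}
```

The out array stands in for the transfer buffer of one isochronous URB, so its size decides how many 1ms packets travel together, which is the “good number of samples” question in disguise.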

Update, December 13: I have already dumped my first attempt to implement the write() system call. I figured I could do it in a totally asynchronous manner, without any sort of regularity like fixed buffer size or scheduling, both in the user-side part (the openusbfxs_write() function) and in the back-end (URB submission function and completion callback). It turned out that this approach involves too many subtleties in synchronizing things, plus it is too complex and wasteful in terms of system resources. So, my second approach is to pre-allocate fixed-size buffers and schedule a semi-constant-rate buffer submission routine in the back-end, while trying to fill at least one such buffer on time in the front end. If a buffer is only partially filled by the time it is scheduled to be submitted, it will be sent out with as much data as it contains; and, if no data exists at all, no buffer will be sent out and the board will play a period of “silence” on the phone set (which will result in an audible “click”, so maybe I’ll think of something better for that case). BTW, this whole complexity makes me understand why other boards and drivers do not employ isochronous pipes at all… Instead, the approach there is to enqueue samples and send a bunch of them to the board every now and then (less often than once per microframe, I guess) using bulk USB transfers. Oh, what the heck, if isochronous doesn’t work out for me, I could resort to a method like that as well. But for the time being, I am sticking with my current design and hoping for the best. The next update will tell.

Update, December 16: I’ve got a first version of isochronous write to work. The principle is to have a small number of buffers always submitted for isochronous transmission, and to schedule a new buffer for transmission each time the transmission completion callback is invoked. A somewhat complicated per-buffer locking/state marking scheme ensures synchronization between the write() syscall implementation and the background transmission mechanism. Although this “somewhat complicated locking scheme” caused me a couple of total system hangs in some early versions of the code, it finally worked OK. I decided to stick to this, because it ensures minimal friction between background and foreground threads, which is essential in guaranteeing that I’ll have no delays or losses. VMware seems to cause no problem at all, since a sniffer on Windows shows my 512-byte buffers being transmitted OK with the appropriate inter-buffer delay (32 milliseconds for a 512-byte buffer, corresponding to 32 packets of 16 bytes each, each consisting of an 8-byte header and 8 bytes of payload). Underruns are handled fine by transmitting zeros (this could be any other value), and partial buffer writes are dealt with as well; at least in theory they should be. So, my next step, after the necessary code cleanup and some elementary unit testing, will be to implement the read() syscall. Good luck!
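
The timing figures above can be double-checked with a few lines of C. This is only a sanity-check sketch: it assumes one byte per sample at 8 kHz (ordinary telephony PCM, which matches 8 payload bytes per 1ms packet), and the function names are made up.

```c
#include <assert.h>

#define SAMPLE_RATE_HZ 8000u  /* assumed: 8-bit PCM at 64 kbps */

/* Playout time, in milliseconds, of one buffer full of packets. */
static unsigned buffer_duration_ms(unsigned buffer_bytes,
                                   unsigned packet_bytes,
                                   unsigned payload_bytes)
{
    assert(packet_bytes > 0 && payload_bytes <= packet_bytes);
    unsigned packets = buffer_bytes / packet_bytes;
    unsigned samples = packets * payload_bytes;
    return samples * 1000u / SAMPLE_RATE_HZ;
}

/* Bit rate on the USB side, headers included. */
static unsigned wire_rate_bps(unsigned packet_bytes, unsigned payload_bytes)
{
    assert(payload_bytes > 0);
    unsigned packets_per_sec = SAMPLE_RATE_HZ / payload_bytes;
    return packets_per_sec * packet_bytes * 8u;
}
```

With these assumptions, buffer_duration_ms(512, 16, 8) gives the 32 milliseconds seen on the sniffer, and wire_rate_bps(16, 8) gives 128000 bits per second, twice the 64 kbps PCM rate because half of every packet is header.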

Update, December 17: many small fixes here and there, and now the write() syscall works quite decently! It handles data sizes that are not multiples of the chunk size well. Even in the degenerate case of one-byte writes that I outlined previously, not only does the driver work fine (ehm, that is, after fixing a bug or two that were locking up the kernel), but it also manages to cope perfectly well with the 128kbps rate (twice the PCM rate, because of packet headers) without underruns (and all that despite the large number of debugging messages logged at each one-byte write() operation). I couldn’t hope for any better than that! The trick there was to add to the dev structure an additional small (1-chunk-long) buffer where data fragments are stored until a full chunk is accumulated. One thing that I am still missing is a mutex to keep out a potential second write()r while the write() syscall code messes with the innards of the dev structure [although I implement exclusive open(), my guess is that I cannot really prevent forked or multi-threaded clients from issuing parallel write()s, so it is better to play it safe there]. I’ll probably use two separate mutexes though, in order to allow read()s to proceed in parallel with write()s. We’ll see.
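
The one-chunk fragment buffer can be sketched in user-space C as follows. This illustrates the technique rather than reproducing the driver: the names frag_buf and frag_write are hypothetical, and the chunk size of 8 bytes (one packet payload) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 8u  /* assumed chunk size: one packet payload */

struct frag_buf {
    uint8_t data[CHUNK_BYTES];
    size_t fill;  /* bytes currently held, always < CHUNK_BYTES between calls */
};

/*
 * Feed len bytes from src. Every time a full chunk has accumulated it is
 * copied to out, up to max_chunks chunks. *consumed receives the number of
 * source bytes taken; the rest must be offered again later (the point where
 * a real write() would block). Returns the number of chunks emitted.
 */
static size_t frag_write(struct frag_buf *fb, const uint8_t *src, size_t len,
                         uint8_t *out, size_t max_chunks, size_t *consumed)
{
    assert(fb != NULL && consumed != NULL && fb->fill < CHUNK_BYTES);
    size_t chunks = 0, taken = 0;
    while (taken < len && chunks < max_chunks) {
        size_t room = CHUNK_BYTES - fb->fill;
        size_t n = (len - taken) < room ? (len - taken) : room;
        memcpy(fb->data + fb->fill, src + taken, n);
        fb->fill += n;
        taken += n;
        if (fb->fill == CHUNK_BYTES) {
            memcpy(out + chunks * CHUNK_BYTES, fb->data, CHUNK_BYTES);
            fb->fill = 0;
            chunks++;
        }
    }
    *consumed = taken;
    return chunks;
}
```

A caller doing one-byte writes gets zero chunks back seven times and one chunk on the eighth call, so the isochronous side only ever sees whole chunks.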

Update, December 21 (or “Hello, World!” #2): although by now I should have done read() and some ioctl()s, I thought I’d give write() a bit of a finishing touch. The reason I stuck with write() is that I took a look at how Asterisk handles things. Unless I did a very hasty job reviewing the code (which I did anyway), it seems to me that Asterisk moves around frames that contain both audio samples and control information. These frames are then just written to the devices of the system. Although I saw no obvious rule there, there seems to be a silent assumption that (at least) audio data are always written as integral chunks. Hmm… I then checked David’s villagetelco 8250 drivers. What I noted in those was that there are two distinct devices, one for reading/writing PCM data and one for doing the ioctls (I am crafting a single device for both purposes). But then I had to think a bit about buffering. The other thing that I couldn’t help but notice in David’s 8250 drivers was that there is almost no buffering at the driver level: data just get written to the chip’s serial output (and from there to the 3250, but it was no use for me to dig that deep). This is virtually impossible with isochronous USB, where some prebuffering is needed anyway to keep things going smoothly. However, my own code was at the exact opposite extreme: lots and lots of buffering in all possible places, just to make sure that there would be no buffer underruns. Under normal conditions, a userspace program would be able to buffer 16 * 32 = 512 milliseconds (that is, about half a second) of audio data before blocking. Admittedly, this would be great in an audio playout environment; however, telephony stuff should be more responsive, shouldn’t it? So, I tried to parameterize the number of in-flight buffers (i.e., buffers pre-submitted to the USB core for isochronous transmission) and the depth of each buffer (i.e., the number of samples per buffer).
Then, I tried my two tests: (a) plainly copying a large audio file and (b) writing the same audio file a byte at a time, with the smallest possible parameters. I noticed a bug when I asked for one sample per buffer (instead, the sniffer showed four of those, and the file took about twice as long to transmit). I haven’t yet found why that happened, but other than that, with two or more samples per buffer, things worked just fine. But my incredulous self wasn’t convinced yet; so, I thought it was worth trying the infallible audio test. This required tweaking just a few lines of code (setting the 3210 line to “forward active” mode at open time and back to “open” mode at release time), and the by-now-familiar Asterisk audio menu was sounding fine! With plain full-file copying, or with a-byte-at-a-time writes, and with all possible amounts of buffering, from 2 (min.) to 512 (max.) milliseconds! There was virtually no difference in any scenario. Maybe in the single-submitted-buffer case I heard a couple of clicks, but nothing too embarrassing. [Note that the choice of having only a single buffer at a time submitted to the USB core is quite risky, because there are higher chances that the kernel does not call the transmission completion callback routine early enough to submit a new buffer before it’s time to transmit a new sample. This means that there will be missed transmission slots, or audio “clicks”, every now and then. And this is why doing things too synchronously is rather impossible with isochronous USB. On the other hand, relying on bulk USB would have its own problems as well, one of them being that, because there is no actual guarantee as to when a sample will be transmitted, one would have to re-create the timing information at the receiver, which is what the only DAHDI-supported USB-based card does.]
I found just one – audible – caveat: when buffering large amounts of data (512ms), the last write() returns after queuing as much data as needed and the file is immediately closed afterwards. In this case, the 3210 is set to “open” mode before all data are played out, and this produces a noticeable “cut” a bit too soon, before the end of the audio message is actually reproduced. This is fixable, however at this point in time I feel I have spent (though not wasted!) way too much time in the intricacies of write(). So, it’s time I moved along to implementing read() and ioctl() — this time for real!

Update, December 22: ioctl is doing well now. I can now probe the board for hook state and set the linefeed mode to open or forward active. A caveat, however, is that, because the hook state query ioctl is implemented using bulk USB I/O, I cannot use that ioctl to check the hook state while PCM audio data transfer is active: if I do, it garbles the sound altogether (in theory, it should not, but from theory to practice…). This is why I had already planned to provide the two most important events, hook state and DTMF, in continuous mode in a packet header field reserved for that purpose. It took me half a day to remember some intricacies of the firmware (which I have written) and to make sure that just polling for DTMF outside the tightly-timed loop of the timer ISR will provide a good sampling rate. If my calculations are right, doing so should result in querying the hook and DTMF state about once every 50ms, which is quite OK, taking into account that the minimum duration of a valid DTMF signal is 75ms. Once looked up, the DTMF signal indication will be “latched” in a variable and will be continuously transmitted with every data packet until re-sampled, 50ms later. Thus, even if one or a few data packets are lost, the event should eventually be noted at the host (at least in theory…). Tonight (2 am) it’s somewhat late to test all this, so I guess there’s more to come tomorrow, at least for a simple test. Hang on… Oh, and by the way: a couple more fixes in write() were needed; I changed spin_lock_irq to spin_lock_irqsave all over, because it seems that otherwise there were situations where spin_unlock_irq enabled IRQs when it should not have, resulting in lockups. The code is behaving much better now.
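
Why the irqsave variants matter can be shown with a toy model in plain C. To be clear, this is not the kernel API (the real spin_lock_irqsave() is a macro taking a lock and an unsigned long flags variable); the model_* functions are invented here and only imitate what happens to the interrupt-enable state.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

/* Model of spin_lock_irq()/spin_unlock_irq(): unlock enables unconditionally. */
static void model_lock_irq(void)   { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Model of spin_lock_irqsave()/spin_unlock_irqrestore(): state is restored. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    irqs_enabled = false;
    return flags;
}
static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* A helper that may be called with interrupts already disabled. */
static bool helper_plain(void)
{
    model_lock_irq();
    assert(!irqs_enabled);  /* critical section */
    model_unlock_irq();
    return irqs_enabled;    /* always true: the caller's state is clobbered */
}

static bool helper_saved(void)
{
    bool flags = model_lock_irqsave();
    assert(!irqs_enabled);  /* critical section */
    model_unlock_irqrestore(flags);
    return irqs_enabled;    /* whatever the caller had on entry */
}
```

If the helper runs in a context that had interrupts off on entry, helper_plain leaves them on afterwards, which is the “enabled IRQs when it should not have” situation; helper_saved leaves the caller’s state untouched.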
One more fix (a place where a fixed value was used instead of a module parameter) revealed that, with fewer than 4 packets per isochronous URB and fewer than 4 isochronous URBs in flight, the sound quality becomes unstable, with some clicks and interruptions. OK, this means that my driver incurs a delay of 16 ms (4 URBs of 4 one-millisecond packets each) in order to produce tolerable sound; not too terrible a penalty after all.
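
The relation between the two tunables and the resulting delay is simple enough to write down. A tiny sketch, assuming every packet carries 1ms of audio as described earlier; the function name is made up and the real module parameter names are not given in this post.

```c
#include <assert.h>

#define PACKET_MS 1u  /* each isochronous packet carries 1 ms of audio */

/* Audio queued ahead of the phone set when all URBs are full, in ms. */
static unsigned queued_audio_ms(unsigned urbs_in_flight, unsigned packets_per_urb)
{
    assert(urbs_in_flight > 0 && packets_per_urb > 0);
    return urbs_in_flight * packets_per_urb * PACKET_MS;
}
```

queued_audio_ms(4, 4) gives the 16 ms floor found above, and queued_audio_ms(16, 32) gives the 512 ms of the original, heavily buffered setup.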

Update, December 24: I stumbled upon something… I don’t quite know why, but it seems that doing SPI I/O while PCM audio data transfer is active causes noise and poor performance (clicks and the like). However, doing SPI I/O in parallel with PCM data I/O is required in order to check the hook and DTMF status… A non-exhaustive list of possible explanations is: (a) I am using a variable from the USB I/O banks to store SPI-acquired data, and maybe this causes some short-term memory lock-up in the PIC; (b) it’s just interference, induced into the audio path either directly by the SPI clock and I/O signals themselves, or (more plausibly) via the power supply line (remember that power is scarce on a USB-powered device); (c) it’s some other firmware bug. My preferred explanation is (b), but in any case the noise frequency is directly related to the rate at which I poll the 3210 registers over the SPI. Thus, if I check often enough, I cause audible noise; if not, I miss DTMF signals (I verified that). It seems that the only way out of this (or the “right thing to do”) would be to have the PIC poll the interrupt signal that the 3210 generates and do SPI I/O only when an interrupt is asserted. However, in my hardware design I have not provided for a connection between 3210 pin 2 and the PIC… OK, so now what? After some quick thinking, I came up with this: I can patch my schematic and board to use PIC pin #13 (RC2) for polling the 3210 \INT signal without too much of a board redesign. So my short-term plan is to patch my board (soldering hairline-thick wires on semi-invisible copper pads? Oh, no, don’t give me that again…) and test if things get any better. I will report on the results; in the meantime, it almost eluded me that it’s Christmas Eve, so it’s time to take a breath and wish a Merry Christmas to all readers.

Update, December 24 (later on, and only shortly before Christmas): I hacked my board and the firmware as intended, so now hook state and DTMF detection work as expected! There is a little quirk that requires some fixing though: the 3210 generates a DTMF interrupt when a digit key is pressed on the phone set, but it does not do the same when the key is released. So, the best thing to do in the firmware is to note the interrupt, poll DR24 (DTMF state) once and report the status to the PC over USB, then go on polling DR24 at regular intervals (e.g. once every millisecond) until the key is released. Even if the perceived audio quality is affected, this will occur only during DTMF dialing, so the user should not care too much. Which means: milestone reached, task accomplished, and I’ll try to forget about my board and this blog until next Monday. Cheers to all!
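
The “poll on interrupt, then keep polling until release” rule is a two-state machine. Here is a sketch in portable C rather than actual PIC firmware; the names are hypothetical, and the DR24 bit layout used (bit 4 meaning “valid tone present”, low four bits holding the digit code) is an assumption that should be checked against the Si3210 datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DR24_VALID 0x10u  /* assumed: set while a valid DTMF tone is present */
#define DR24_DIGIT 0x0Fu  /* assumed: digit code in the low nibble */

struct dtmf_tracker {
    bool key_down;  /* a key is believed to be held right now */
    uint8_t digit;  /* last digit code seen; latched for the packet header */
};

/* Called on every 1 ms tick: is an SPI read of DR24 needed this time? */
static bool dtmf_should_poll(const struct dtmf_tracker *t, bool int_asserted)
{
    assert(t != NULL);
    return int_asserted || t->key_down;
}

/* Called with the freshly read DR24 value after such a poll. */
static void dtmf_update(struct dtmf_tracker *t, uint8_t dr24)
{
    assert(t != NULL);
    t->key_down = (dr24 & DR24_VALID) != 0;
    if (t->key_down)
        t->digit = dr24 & DR24_DIGIT;
}
```

While the line is idle, dtmf_should_poll returns false and the SPI bus stays quiet, so no noise is induced; the extra reads happen only between key press and key release.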

December 31 (last update for 2009): I have taken a couple of days off writing my device driver in order to assemble a third prototype (currently, I have only two working ones, of which the second is somewhat problematic in the DC-DC converter and the audio path). However, the third prototype, while initially working quite OK, eventually showed some signs of misbehavior around the PIC and its crystal, and thus needs hardware debugging (which I’ll do without for a while). The only other thing I’ve done was to upload the latest versions of the firmware and the device driver to the project’s Google Code page, so anyone interested can see (and perhaps comment on or criticize) the driver code, albeit in an incomplete state (read is missing and ioctls are not quite functional). That’s all folks! (For 2009, that is). Happy new year to everyone!

Update, January 11: It took me some days to write this update, because I am in the midst of preparing one more working prototype for a friend to help me develop and test the code. However, as usual, building a prototype is an adventurous task that eats lots and lots of time. The new board is now in the (usual) early development state where USB I/O works but the DC-DC converter refuses to power up (as usual). It will need some more debugging and hopefully some day I’ll fix it. On the driver front, I have written the isochronous IN “daemon” part, and tested whether the board and the driver can cope with simultaneous isochronous INs and OUTs (yes, they can, in case you were anxious about it). Implementing the read() system call is next.

Update, January 12: Yet another nuisance: buried deep inside a table somewhere on p. 125 of the Si3210 manual, there is a note that the \INT signal is open-drain, which means it needs an external pull-up resistor. Hence my single-wire hardware patch is not enough, and the board becomes sluggish, because it is continuously testing for hook/DTMF status while there is no active interrupt. I am going to fix that, so read() has to wait a little bit longer. Meanwhile, I am tidying up the Linux driver code a little and synchronizing changes between it and its Windows command-line cousin.

Update, January 16: glad to see that the patch with a 10k pull-up resistor works fine! The board recognizes hook state changes and DTMF at warp speed without becoming sluggish. Good! In addition, read() is almost ready now. That is, the code is there, working and tested; however, until yesterday evening, there were two silly bugs plaguing me (one, read() was returning packet headers along with data, and two, on some occasions I was writing past the end of buffer memory, causing weird system hangs and crashes). I have found and corrected both of them; however, before reporting success, I need to prove the fixes correct. And there I had a minor issue: my home prototype board (I have one prototype board at home and one at work!) did not, as of this morning, have the required “patch” to recognize hook state and DTMF (as a result, the PIC would continuously probe the 3210, inducing audible noise into the audio path). So, verifying the whole thing will have to wait. Nevertheless, I thought that finishing off this “driver’s-day-in-the-life” post is now justified, since ninety-a-lot of the work seems to have finished now. Thus, this will be my last update to this post. A new post will follow, summarizing the innards of the driver and discussing the next steps. Soon after it is tested, the driver source will be on the project’s Google Code page. Thanks for bearing with me, tireless and understanding reader! I hope the result has rewarded your effort!
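
Both read() bugs mentioned above (headers leaking into the data, and writing past the end of a buffer) are the kind of thing a single bounds-checked copy loop guards against. The sketch below is a user-space illustration, not the driver’s fix; it assumes that IN packets use the same layout as OUT packets (8-byte header plus 8 bytes of payload), and the name strip_headers is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_BYTES     8u
#define PAYLOAD_BYTES 8u
#define PACKET_BYTES  (HDR_BYTES + PAYLOAD_BYTES)

/*
 * Copy only the payload of each complete packet in urb_buf to dst.
 * Never writes more than dst_cap bytes and ignores a trailing partial
 * packet. Returns the number of payload bytes copied.
 */
static size_t strip_headers(const uint8_t *urb_buf, size_t urb_bytes,
                            uint8_t *dst, size_t dst_cap)
{
    assert(urb_buf != NULL && dst != NULL);
    size_t copied = 0;
    for (size_t off = 0; off + PACKET_BYTES <= urb_bytes; off += PACKET_BYTES) {
        if (dst_cap - copied < PAYLOAD_BYTES)
            break;  /* destination full: stop instead of overrunning */
        memcpy(dst + copied, urb_buf + off + HDR_BYTES, PAYLOAD_BYTES);
        copied += PAYLOAD_BYTES;
    }
    return copied;
}
```

In a real read() the destination would be the user buffer (filled via copy_to_user()), and the bytes that did not fit would have to be kept for the next call.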