Archive for June, 2010

Power games

June 28, 2010

This post is being typed on my mobile phone, thus it will be shorter than I intended. However, my laptop, on which I have done all the work so far, is off to the repair shop. In the meantime, I have worked for a few days on a desktop owned by my employer.

By the time my laptop decided to die on me, I had already received all the required materials for 50 prototype Open USB FXS boards. I could just as well have started shipping DIY kits right away. However, I thought it would be wiser to first verify the newly-ordered parts and PCBs by assembling about a dozen boards. So (a bit reluctantly, to tell the truth), I re-created an elementary development environment on the desktop: Eagle, then the Microchip PICDEM environment, then Wubi and finally Dahdi, followed by an svn checkout of the driver and a compilation. All went fine.

Then I got busy assembling the boards. The first two annoyances were that I had ordered the wrong value for C5 and C6 (220nF instead of 22nF) and that, although I had ordered the correct part number for Cusb (220nF), the little plastic bag from Mouser contained some unknown SMD part with eight contacts (“pins” would not be appropriate here, since the contacts were pin-less, just like in QFN packages). I decided to proceed without C5/C6 (since these are just an HF filter, I could add them afterwards) and to use the wrong C5s (220nF) as Cusb.

Then the Judgement Day arrived, when testing would tell apart good boards from bad ones.

First thing, I had to bring up the pre-programmed boot loader by switching on S2. This gave me the first creeps: six boards refused to work. I checked more carefully and, false alarm, it was one of the S2 pins that had been left unsoldered. So I corrected that and flashed the latest firmware on all ten boards. Then, elementary testing gave the following results: one perfectly good board, one that refused to pass the Q6 calibration step, and eight boards with more or less severe audio issues: clicks and buzzes right after the transition to off-hook, which mostly disappeared after a few seconds of operation.

This was unheard of; 90% failure rate meant that the prototypes business would not work and, besides that, it also meant that a serious issue existed, most probably a design glitch. Debugging again was the only way to go.

First I focused on the board with the Q6 issue. That one proved easy to solve: a short between R1 and R2 was relatively easy to spot and remove, and after that the board initialized OK, but then presented audio problems, just like the rest of the problematic ones. The failure rate was still 90%. This felt truly disheartening, so I decided to resort to the oscilloscope again, with help from some colleagues whom I’ll remember to thank publicly some day.

First thing we noticed was that Q7 was getting very hot (70-90°C) when the phone set was off-hook or when the phone was rung. This told me something about power, since in the ringing and off-hook states the DC-DC converter must provide more power than in a standby state.

What the oscilloscope revealed was truly interesting. If you consult my early post “Chasing an elusive bug around”, you’ll see a shot of what happens at inductor L1: just as the power transistor Q7 is switched off, the inductor starts discharging in an oscillating manner. This was more or less the same picture on all boards. However, in off-hook or ringing mode, the picture changed: there was no oscillation at all, just the on-off square waveform as Q7 was switching on and off. Put differently, L1 was discharging very quickly and Q7 was pumping too much current into L1 in order to charge it again, which is why it was heating up so much.

So now we tested with a simpler tool: a digital multimeter. Measuring the voltage right before F1 yielded a mere 4.8V. When in off-hook state, however, the voltage was dropping down to 4.2V! For the current settings and components used in Open USB FXS, Silabs’ spreadsheet mentions 4.2V as the absolute minimum Vunreg, in that below that voltage, the DC-DC converter does not work at all!

By using a Type-A to Type-A USB extension cable and inserting the multimeter in series in mA mode, we measured 400mA in the idle state and 440mA in the off-hook state. This ruled out the overcurrent case. Thus, it became obvious that the desktop’s USB ports were unable to supply the 500mA at 5V that my board may lawfully draw as per the USB standard. With the overhead of the extension cable and the ammeter, the voltage sometimes dropped so much that the board rebooted!
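To put the numbers above side by side, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes that the voltage and current readings can be paired up as if they had been taken at the same moment, which was not actually the case, and all the names in it are made up for illustration.

```python
# Back-of-the-envelope check of the measurements quoted in this post.
# Assumption: voltage and current readings are paired per state, although
# they were taken in separate measurements.

USB_NOMINAL_V = 5.0   # nominal bus voltage
USB_MAX_A = 0.500     # current the board is allowed to draw from the port
VUNREG_MIN_V = 4.2    # minimum Vunreg quoted from the Silabs spreadsheet

measurements = {
    "idle": (4.8, 0.400),       # (volts before F1, amps)
    "off-hook": (4.2, 0.440),
}


def power_w(volts, amps):
    """Electrical power delivered to the board, in watts."""
    return volts * amps


def headroom_v(volts):
    """Margin left above the minimum converter input voltage."""
    return volts - VUNREG_MIN_V


budget_w = power_w(USB_NOMINAL_V, USB_MAX_A)
print(f"USB budget: {budget_w:.2f} W")
for state, (v, a) in measurements.items():
    print(f"{state:9s} {v:.1f} V  {a * 1000:.0f} mA  "
          f"{power_w(v, a):.2f} W  headroom {headroom_v(v):.1f} V")
```

Under that assumption, the off-hook state draws more current but actually receives slightly less power (about 1.85 W against 1.92 W when idle), and it sits exactly on the 4.2V minimum with no headroom at all.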

But why would this result in audible clicks? It is simple: the 3210, seeing that the converter is not performing as it should, attempts to restart it from time to time. This results in a “click”, as the voltage on the line drops momentarily. The restarting frequency changes as the line conditions change (e.g., as capacitors inside the phone set get charged), so clicks may eventually disappear. Anyway, under these conditions, the circuitry does not operate as it should.

What’s the bottom line of all this? There are two answers. The first one is that the USB ports (or, most probably, the power supply) of the desktop that I used were inadequate for the power requirements of the board. However, this is the easy answer. The difficult one is that the circuit’s design parameters could after all be a bit marginal, and this may cause failures in situations like this one. This must be remedied in one way or another. Until I get my laptop back I cannot do much (typing all this on my mobile is already overkill), but I would be delighted to receive suggestions from you, the readers.

Open USB FaXSuccess

June 24, 2010

It all started when I came into discussions with the first company (owned by friends) that I have mentioned in some previous posts of mine. These gentlemen mentioned fax as a possible commercial use of my adapter. And it’s true: even in the case of an Asterisk installation with IP phones, one always needs one (usually one is enough) FXS port to connect a good, old analog fax machine.

So the question came to me: can Open USB FXS do fax? From this — misleadingly simple — question, a whole new adventure in the deep seas of debugging was born. I still do not quite understand in depth all the issues that arose but, since my stories usually have a happy end, I managed to resolve these issues. Maybe some of the readers will understand more than I do, and they are welcome to submit their comments. Good, let’s start then.

A primer on fax in VoIP systems. Usually, this is a no-no-no-dont-do-it thing. Fax machines are designed to work over the TDM network, under the assumption that there are no such things as packet loss or jitter. In contrast, VoIP systems operate under the exact opposite assumptions: packets may be lost, delayed or even duplicated by the network. For these very reasons, it’s permissible to pack audio samples into jitter buffers at the receiving end and play all sorts of tricks to make the audio result “better” in case of losses etc. One common such trick is, for example, to repeat the last audio sample in case the input buffer is empty. This will “sound better” than sending a sample of silence. Thus, by implementing tricks like this one, drivers for devices like Open USB FXS allow slack in various degrees. This should not be the case for fax, though. When dealing with fax, a single lost packet can have a detrimental effect on the reproduced image, or even cause the entire transmission to fail.
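The “repeat the last sample” trick can be sketched in a few lines. This is a toy illustration with made-up names, not code taken from dahdi, Asterisk or my driver; None marks a slot whose data never arrived.

```python
def play_out(frames, conceal="repeat", silence=0):
    """Return the samples actually played out, given the received frames.

    frames: list of samples, where None marks a slot that never arrived.
    conceal: "repeat" replays the last good sample, "silence" plays `silence`.
    """
    out = []
    last = silence
    for sample in frames:
        if sample is None:
            # Input buffer is empty: conceal the gap one way or the other.
            out.append(last if conceal == "repeat" else silence)
        else:
            out.append(sample)
            last = sample
    return out


received = [10, 20, None, 40, None, None, 70]
print(play_out(received, "repeat"))   # [10, 20, 20, 40, 40, 40, 70]
print(play_out(received, "silence"))  # [10, 20, 0, 40, 0, 0, 70]
```

Either way, what is played differs from what was sent. A listener will hardly notice, but a fax modem on the other end is decoding that waveform, so for it both variants are simply corrupted data.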

For this reason, ITU-T has come up with the T.38 standard. According to that, two terminal VoIP systems that wish to exchange facsimile messages terminate the fax encoding realm locally by decoding the signals that the sending machine transmits, and then exchange the decoded contents as IP packets, using UDPTL. How about Asterisk and T.38? Currently, there is no “native” support of T.38 in Asterisk, but various commercial add-ons and public patches (such as this one) claim to add the necessary functionality.

What this situation leaves us with is the local (pseudo-TDM, in Asterisk jargon) packet handling by the driver. In other words, if T.38 handles (successfully or not, that’s another discussion) the network-facing part of the problem, the device-facing part — the device driver — should handle audio packet transmission without a single loss. Did Open USB FXS fulfill this requirement? An idea, suggested by a friend, was to try local fax transmission between two FXS ports on the same machine. I was confident enough that Open USB FXS would pass the test easily.

I was wrong. The two fax machines that I tried were unable to get past the initial negotiation. When I eavesdropped on the phone line, I discovered why. The tones that the answering fax machine was sending were arriving badly clipped and distorted. But why was that? Voice audio was delivered fine; why would the fax tone be clipped?

The answer was simple: high frequency. A quick test revealed that high frequencies were somehow “filtered” by Asterisk when I used Open USB FXS. It did not take me long to understand the next culprit in the chain, multi-sample packing per URB. My driver tries to be smart and make best use of available system resources, in order to perform acceptably even on low-power CPUs (I am emulating such an environment using VMWare). In order to do that, it packs multiple 1-ms audio samples into a single URB [see why this is more efficient by consulting the discussion at the end of this post]. When the URB completes, the driver passes all samples together to the dahdi core (and from there on to Asterisk).

Apparently dahdi did not like that. It took me quite some time to test all possible combinations of driver parameters, until I finally noticed that the clipping effect was milder when I loaded the driver with only two samples per URB (using the “rpacksperurb” and “wpacksperurb” parameters). The quality was still not quite acceptable for fax transmission, but at least one could make out the familiar sound of an answering fax machine sending various tones. This encouraged me to look deeper and test further.

The number two (two samples per URB) seems to be related to the buffer depth of dahdi, which is exactly two samples. So it’s no wonder that submitting more samples at once to dahdi screws things up, because it results in newer samples overwriting older ones before the latter get the chance to be read off the buffer. Interestingly, there is no user-side way to change this, because although there is an ioctl() to change the number of buffers in a channel, a subsequent close() resets this to the default number 2. Even more interestingly, changing 2 to a higher number (8) in the dahdi source and recompiling dahdi did not make the high-frequency fax tones sound any better. I declare my ignorance as to why this happens, but changing the dahdi source did not feel like the way to go, so I went back to trying to fix my driver instead of fiddling with the dahdi code.
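The overwriting hypothesis is easy to play with in a toy model. The sketch below is not dahdi code and its names are invented: it assumes a buffer that holds `depth` one-millisecond chunks, a reader that takes one chunk per millisecond, and a driver that hands over a whole URB’s worth of chunks in one burst.

```python
from collections import deque


def simulate(chunks_per_urb, depth=2, total_ms=1000):
    """Toy model: bursts of 1-ms chunks written into a shallow buffer.

    Returns (delivered, lost, underruns), all counted in 1-ms chunks.
    """
    buf = deque(maxlen=depth)   # appending to a full deque drops the oldest
    delivered = lost = underruns = 0
    next_chunk = 0
    for tick in range(total_ms):
        # A URB completes every chunks_per_urb ms and delivers a burst.
        if tick % chunks_per_urb == 0:
            for _ in range(chunks_per_urb):
                if len(buf) == depth:
                    lost += 1   # an unread chunk is about to be overwritten
                buf.append(next_chunk)
                next_chunk += 1
        # The reader consumes exactly one chunk per millisecond.
        if buf:
            buf.popleft()
            delivered += 1
        else:
            underruns += 1
    return delivered, lost, underruns


for n in (1, 2, 4, 8):
    print(n, simulate(n))
```

In this model, one or two chunks per burst lose nothing, while four chunks per burst already lose half of the audio. Reality is evidently more complicated, since raising the depth to 8 in the dahdi source did not cure the problem, so take this only as an illustration of the mechanism I suspected.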

I tried with one sample per URB, but the results were still not what I was hoping for. Sometimes I was getting good results, sometimes not, and I could not understand why. It was only then that I finally thought about removing the dahdi_dummy timing module. Dahdi_dummy employs the high-resolution timer available on x86 platforms to “tick” the dahdi core once per millisecond. However, this “ticking” is not synchronized in any way to the pace of interrupts arriving from USB. I am not sure why, but this still caused bad audio quality at high frequencies, even with a single sample per URB.
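One plausible way in which two unsynchronized 1-ms clocks can hurt is slow drift: if one side runs slightly faster than the other, a shallow buffer must sooner or later drop or repeat a chunk. The sketch below is a toy model under that assumption. The 0.1% mismatch is an arbitrary number chosen for illustration, not something I measured, and I cannot claim that this is what actually happened inside dahdi.

```python
def count_slips(producer_period_us, consumer_period_us, duration_us, depth=2):
    """Count buffer slips between two free-running periodic clocks.

    The producer stands for USB completions, the consumer for the timer tick.
    Returns (overruns, underruns), counted in chunks.
    """
    fill = 1                      # start with one chunk in the buffer
    overruns = underruns = 0
    next_p = producer_period_us
    next_c = consumer_period_us
    while min(next_p, next_c) <= duration_us:
        if next_p <= next_c:      # producer event comes first (or ties)
            if fill == depth:
                overruns += 1     # no room left: a chunk is lost
            else:
                fill += 1
            next_p += producer_period_us
        else:                     # consumer event comes first
            if fill == 0:
                underruns += 1    # nothing to read: a gap is played
            else:
                fill -= 1
            next_c += consumer_period_us
    return overruns, underruns


ten_seconds = 10_000_000
print(count_slips(1000, 1000, ten_seconds))   # clocks in step
print(count_slips(1000, 1001, ten_seconds))   # timer tick 0.1% slow
print(count_slips(1001, 1000, ten_seconds))   # timer tick 0.1% fast
```

With identical periods the model never slips. With the small mismatch it slips roughly once per second, always in the same direction, and each slip is a dropped or missing millisecond of audio.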

To make things worse, the default initialization script /etc/init.d/dahdi checks for hardware dahdi channels and, if it finds none, automatically loads dahdi_dummy. I have the bad habit of plugging in my USB dongles after the system boots (and watching them initialize through dmesg messages), so dahdi_dummy was loaded by default. This caused me much confusion, because I got some good results which were bad again after a reboot, and it took me a lot of banging my head against the wall to find out why: it was dahdi_dummy, which was being reloaded after the reboot, and I was forgetting to remove it.
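A check like the following would have saved me some head-banging. It is a minimal sketch with hypothetical helper names: it relies only on the fact that each line of /proc/modules starts with the module name, and the sample text in it is made up for the sake of the example.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()}


def dahdi_dummy_loaded(path="/proc/modules"):
    """True if dahdi_dummy is currently loaded (it should be removed
    before testing with Open USB FXS boards)."""
    with open(path) as f:
        return "dahdi_dummy" in loaded_modules(f.read())


# Made-up sample in the /proc/modules layout, for illustration only.
sample = (
    "dahdi_dummy 4288 0 - Live 0xf8a3c000\n"
    "dahdi 196544 1 dahdi_dummy, Live 0xf8d2e000\n"
)
print("dahdi_dummy" in loaded_modules(sample))
```

Calling dahdi_dummy_loaded() after every reboot, before plugging in the boards, tells you whether the timing module needs to be removed first.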

After all that, I repeated the test with the two local fax machines. This time everything went fine. I sent over an image (an ad for a Bob Dylan concert; you can make out the Greek letters on it) from a Samsung FS5100. Here it is:

Here is the transmission report, showing what the sending fax machine sent to the other side (the handwriting is mine). Note in particular the noisy parts (e.g., the dots inside the white letters “Bob”), which are already present in the scanned image:

Finally, here is what the receiving part got — a perfect reproduction of the transmitted image:

From all that, I guess Open USB FXS deserves the title of this post, F(a)XS(uccess)! However, here are some final notes. In contrast with a PCI(e) driver, where a few outb/outw instructions suffice to handle an interrupt, submitting an isochronous URB is a very expensive operation. It takes a considerable number of CPU instructions, mainly because the driver must make sure that the submitted URB does not take up too much of the available bandwidth on the USB bus. Especially when plugging more than one Open USB FXS device into the same computer, this could result in bad performance, loss of USB timeslots etc. On my VMWare testing platform this is apparent, in that the kernel reacts slower than required and misses USB timeslots, resulting in audible “clicks”. I have looked extensively into the kernel source for UHCI, trying to find a way to get an interrupt per USB frame (not per URB completion), but it seems that there is no such thing. In theory, I could use in my driver the HR-timer-based “ticking” of dahdi_dummy and test the first submitted URB for per-packet completion, even with more than one sample per URB. However, this seemed very complicated to me, so I let it go for the time being. Other drivers (Astribank xpp for example) seem to do something like that. Maybe that’s the way to go in the future.
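The trade-off behind packing several 1-ms packets into one URB boils down to simple arithmetic, sketched below with a made-up helper name. It counts only how often a URB has to be submitted and how far apart the completion bursts are. The actual CPU cost of one submission is not modelled.

```python
def urb_stats(packets_per_urb, frame_ms=1):
    """Return (URB submissions per second, ms between completion bursts)."""
    burst_interval_ms = packets_per_urb * frame_ms
    submissions_per_sec = 1000 / burst_interval_ms
    return submissions_per_sec, burst_interval_ms


for n in (1, 2, 4, 8):
    per_sec, interval = urb_stats(n)
    print(f"{n} packet(s) per URB: {per_sec:.0f} submissions/s, "
          f"one burst every {interval} ms")
```

One packet per URB means a thousand expensive submissions per second per direction, while eight packets per URB cut that to 125, at the price of handing the audio over in 8-ms bursts.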

I am now getting back into the prototypes hell (which BTW have some issues, and this is why I am holding back the whole release phase), but there are lots of things in that hell to blog about, and this would deserve a different post. Soon to be, I hope, along with the announcement of public availability!

Prototypes and ordering

June 15, 2010

Great news! I have now received all necessary materials for prototypes, although I have not yet had the time to deal with their assembly. In the meantime, several tests and trials keep me busy (but more on these on my next post).

As for the prototypes, I am setting up a page with information on how to order. It may still contain inaccurate information, but I am confident it contains most of the essential parts of the procedure.

I must confess that it felt embarrassing to write all these disclaimers at the end of the page, but after all I guess it is best to explain what people should expect from a prototype. The bottom line is, it may work or it may not work for you (and especially if you build it yourself without the necessary experience and tools, it is very possible that it will not). Even ready-made tested boards may fail to work with some USB host controllers, or with platforms other than the x86 (the latter for performance reasons).

Bottom line, your mileage may vary; you are buying an exotic — though cheap — gizmo to experiment with, not a consumer product. It is always good to remember that I am not attempting to fool anyone into buying and I am not making any financial profit out of you buying prototypes. The one who really benefits from spreading out prototypes is the project itself, not my own wallet.

Within this week, I’ll be able to set the final price for DIY kits (assembled boards will follow). Note that it may be wiser to wait a bit for a couple of boards to be assembled before you order a DIY kit, at least to verify that the current lot of PCBs and materials is OK. You never know, there may be some part that is not appropriate or a glitch in the PCBs. One working board will rule out all such cases, so when I announce that, it will be safe(r) to order DIY kits. If you cannot wait, however, you may just as well order a DIY kit before I assemble any boards. It is up to you.

I will not write any further posts on prototypes, except perhaps for announcing the availability of assembled boards. All this story has consumed too much of the energy that I had available for this project, and there are lots more to do. The ordering page will be updated though, to reflect changes, so you may consult it to see what is new.

I truly hope this whole prototype business will go OK and that it will end up with you, the readers, doing happy experiments and contributing to the project your experiences, your firmware and driver patches, and your observations on bugs that need fixing. Time to wish good luck to all of us, and of course to the project!

Are you serial?

June 4, 2010

Materials for my prototypes have arrived for the most part. I am still waiting for the PCBs but, since I am not exactly the type of person who would sit idle in the meantime, I decided to torture my mind with a set of optimizations for the kernel driver and other hacks.

The first of these optimizations (and the one I am going to discuss in this post, hence its title) is also the only one that I have started implementing so far, and it relates to serial numbers. To start the discussion, I should remind readers that USB, among the other device descriptors (device string, manufacturer string, etc.), supports a “serial number” string descriptor. But wait a moment, why would a serial number be needed in the first place?

The most important use of serial numbers that I was thinking about is called channel number persistence. As discussed in my older post Hello, Asterisk!, the channel number that the dahdi core gives to a board depends on the order in which this board is plugged into the system. Moreover, all other characteristics of a device in Dahdi and Asterisk, like e.g. the local extension number, the echo canceller setup, etc., depend on the channel number. If one plugs two boards in the opposite order, one gets the extensions and all other settings mixed up. Given that I am experimenting with two or more boards which I plug and unplug all the time, this behavior is not exactly what I wanted.

Because of that, I have been thinking about making channel numbers persistent. What this would mean is that I could add a module parameter that my kernel driver would understand, something along the lines of this example:
   insmod oufxs.ko persist=<serial1>,<serial2>,<serial3>
and this would have the effect that my driver would pre-allocate some dahdi channels (three, in the above example) and map each of them to the appropriate serial number using a field in the respective device structure. The driver could arrange at startup to mark these pre-allocated channels as inoperative and return e.g. -ENXIO or something similar to any caller for all operations — until a board with an expected serial number is plugged in.

Then, at device plug-in time, the driver would look through its internal list of devices and try to match the serial number of the board being plugged in onto an existing idle channel. (I stress idle here, because there is no guarantee that two USB devices will not present the same serial number to the host, and if a check for an idle channel is not made, this would result in the newly-plugged board hijacking an existing channel belonging to another board.) If a match is found, then the driver would associate the respective idle channel with the new board, otherwise it would allocate a new internal structure as it does now.
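The matching logic described above can be modelled in user space before touching the kernel driver. What follows is only a sketch of the idea: the class and method names are invented, the persist parameter does not exist yet, and the real driver would of course do this in C against its own device structures.

```python
import errno


class Channel:
    def __init__(self, number, expected_serial=None):
        self.number = number
        self.expected_serial = expected_serial  # None: not pre-allocated
        self.idle = True                        # no board attached yet


def parse_persist(param):
    """Split the (hypothetical) persist= value into a list of serials."""
    return [s.strip() for s in param.split(",") if s.strip()]


class ChannelTable:
    def __init__(self, persist=""):
        # Pre-allocate one channel per listed serial, all of them idle.
        self.channels = [Channel(i + 1, s)
                         for i, s in enumerate(parse_persist(persist))]

    def operate(self, number):
        """Any operation on a channel without a board fails with -ENXIO."""
        return -errno.ENXIO if self.channels[number - 1].idle else 0

    def plug(self, serial):
        """Attach a board; return the channel number it ends up on."""
        for ch in self.channels:
            # Only an *idle* channel may be claimed, so that a second board
            # with a duplicate serial cannot hijack a channel in use.
            if ch.idle and ch.expected_serial is not None \
                    and ch.expected_serial == serial:
                ch.idle = False
                return ch.number
        ch = Channel(len(self.channels) + 1)    # no match: allocate a new one
        ch.idle = False
        self.channels.append(ch)
        return ch.number


table = ChannelTable("AAAA0001,BBBB0002,CCCC0003")
print(table.plug("BBBB0002"))   # 2, regardless of plug-in order
print(table.plug("AAAA0001"))   # 1
print(table.plug("BBBB0002"))   # 4, duplicate serial gets a fresh channel
```

The serials in the example are placeholders. Note how the third call ends up on a new channel instead of taking over channel 2, which is exactly the reason for the idle check.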

To implement that, I first needed to implement serial numbers in firmware. The default USB stack provided by Microchip does not include a serial number string. To begin with, I was not sure whether the standard defines what format a serial number should be in, but after looking around, I found several examples in which the serial was a (text) string of hex digits, so I decided to stick with that.

The next decision had to do with how to program the serial number. All other device descriptors in Microchip’s stack are stored in program flash memory. By means of a few additional lines of code, I was able to configure the famous “dead beef” hex string as a serial number and get the host to see that. Good, that proved that I could use the program flash as the source for the serial number, but was this what I really needed?
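For reference, a USB string descriptor is just a length byte, a type byte (0x03 for strings) and the text in UTF-16LE. The sketch below builds one in Python to show the layout. The helper name is made up, the exact spelling “DEADBEEF” is an assumption for the example, and the firmware itself defines the descriptor as a constant structure in C rather than computing it like this.

```python
import struct

USB_DT_STRING = 0x03  # bDescriptorType value for string descriptors


def string_descriptor(text):
    """Build the raw bytes of a USB string descriptor for `text`."""
    payload = text.encode("utf-16-le")      # two bytes per hex digit here
    length = 2 + len(payload)               # header bytes plus payload
    if length > 255:
        raise ValueError("string too long for a USB string descriptor")
    return struct.pack("<BB", length, USB_DT_STRING) + payload


desc = string_descriptor("DEADBEEF")
print(len(desc), desc.hex())
```

An eight-digit serial therefore costs 18 bytes, which is worth knowing when deciding where the descriptor should live.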

The answer was “no”: if I chose this solution, I would have to manually program the serial number along with the firmware. In other words, I would have to produce individualized firmware images for each board. Not only that, but I would have to somehow remember the serial number of a board and restore that after a firmware upgrade (and firmware upgrades in my boards are pretty frequent as you may imagine).

Fortunately, the PIC has a good answer for that problem. It contains a program-controllable EEPROM memory, where data can be stored. After some googling, I found this hack which gave me enough hints on how to use the EEPROM. My first step then was to move the device descriptor into RAM, still initializing it from program flash as “dead beef”, and make sure that this worked OK. Second step, I tested the EEPROM read functions and got back factory-set data (a series of 0xEE’s).

Then, all I had to do was write a USB primitive for “burning” a serial number on the EEPROM and craft a couple of modifications in my driver. The driver would check for a serial number, and if it found the factory-set string 0xEEEEEEEE, it would use the “burn” primitive to write a new, should-be-unique, serial number in the EEPROM. I coded that, using the current value of the jiffies variable as a source for unique serials, plugged a board and — voilà, the driver reported that a new serial was burnt into the board. Re-plugged the board and, that’s it, the new serial was recognized.

I had only forgotten a minor detail, that caused me a kernel “oops”: old-firmware boards did not report any serials. In this case, the kernel initializes the respective field of the device’s usb structure with a NULL pointer, and I was comparing that NULL pointer against “EEEEEEEE”, resulting in a zero-address dereference. Fixed that, by adding another message informing that this was an old board and the firmware had to be upgraded. Tested again, and everything worked fine.
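Put together, the check that the driver performs at plug-in time looks roughly like the sketch below. It is a user-space model with made-up names, not the driver code, and the way the jiffies value is turned into eight hex digits is an assumption for the example.

```python
FACTORY_SERIAL = "EEEEEEEE"   # what a fresh-from-factory EEPROM reports


def serial_from_jiffies(jiffies):
    """Format the low 32 bits of a tick counter as eight hex digits."""
    return f"{jiffies & 0xFFFFFFFF:08X}"


def check_serial(reported, jiffies):
    """Decide what to do with the serial string a board reports.

    Returns (action, serial), where action is "upgrade", "burn" or "ok".
    """
    if reported is None:
        # Old firmware: no serial descriptor at all. This test must come
        # first; comparing the missing string is what caused my oops.
        return ("upgrade", None)
    if reported == FACTORY_SERIAL:
        return ("burn", serial_from_jiffies(jiffies))
    return ("ok", reported)


print(check_serial(None, 12345))
print(check_serial("EEEEEEEE", 0x6100E926))
print(check_serial("6100E926", 12345))
```

A tick counter is of course a weak source of uniqueness, which is why I call the result a should-be-unique serial and why channel matching must not trust serials blindly.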

Here are some dmesg excerpts I know you will like. At first, a new device with old firmware gets plugged:

Jun  2 14:48:09 avarvit-d NetworkManager: <debug> [1275479289.719625] nm_hal_device_added(): New device added (hal udi is '/org/freedesktop/Hal/devices/usb_device_4d8_fcf1_noserial_usbraw').
Jun  2 14:48:10 avarvit-d kernel: [  735.306908] oufxs: oufxs_setup: oufxs1: old-version firmware not reporting serial
Jun  2 14:48:10 avarvit-d kernel: [  735.306921] oufxs: oufxs_setup: please upgrade firmware on board and replug

Then the firmware is upgraded to a version that reports a serial number (and supports the “burn” operation), and the board is re-plugged. The driver sees that this is a fresh-from-factory EEPROM, and burns a new serial:

'/org/freedesktop/Hal/devices/usb_device_4d8_fcf1_EEEEEEEE_usbraw').
Jun  2 14:50:19 avarvit-d kernel: [  864.196194] oufxs: oufxs_setup: oufxs1: no serial on device's eeprom, burning one
Jun  2 14:50:19 avarvit-d kernel: [  864.211314] oufxs: oufxs_setup: oufxs1: serial written OK, re-plug to activate new serial

Finally, when the board is removed and plugged back again, the new serial is recognized by the Linux USB core:

Jun  2 14:51:01 avarvit-d NetworkManager: <debug> [1275479461.618913] nm_hal_device_added(): New device added (hal udi is '/org/freedesktop/Hal/devices/usb_device_4d8_fcf1_6100E926_usbraw').

So now, I am ready to do some serial — sorry, serious — programming on the driver code in order to implement channel persistence. I don’t know if I am going to do this right now, because of other priorities (some tests that I will mention in my next post, and prototype assembly). But it is a feature that I think is needed, and I will add it ASAP.