This post is being typed on my mobile phone, so it will be shorter than I intended. The reason is that my laptop, on which I have done all the work so far, is off to the repair shop. In the meantime, I have worked for a few days on a desktop owned by my employer.
By the time my laptop decided to die on me, I had already received all the required materials for 50 prototype Open USB FXS boards. I could just as well start shipping DIY kits right away. However, I thought it would be wiser to first verify the newly-ordered parts and PCBs by assembling about a dozen boards. So (a bit reluctantly, to tell the truth), I re-created an elementary development environment on the desktop: Eagle, then Microchip PICDEM environment, then Wubi and finally Dahdi, svn-get the driver and compilation. All went fine.
Then I got busy assembling the boards. The first two annoyances were that I had ordered the wrong value for C5 and C6 (220nF instead of 22nF) and that, although I had ordered the correct part number for Cusb (220nF), the little plastic bag from Mouser contained some unknown SMD part with eight contacts (“pins” would not be appropriate here, since the contacts were pin-less, just like in QFN packages). I decided to proceed without C5/6 (since these are just an HF filter, I could add them afterwards) and to use the wrongly ordered C5/C6 parts (220nF) as Cusb.
Then the Judgement Day arrived, when testing would tell apart good boards from bad ones.
First thing, I had to bring up the pre-programmed boot loader by switching on S2. This gave me the first creeps: six boards refused to work. I checked more carefully and, false alarm, it was one of the S2 pins that had been left unsoldered. So I corrected that and flashed the latest firmware on all ten boards. Then, elementary testing gave the following results: one perfectly good board, one that refused to pass the Q6 calibration step, and eight boards with more or less severe audio issues: clicks and buzzes right after the transition to off-hook, which mostly disappeared after a few seconds of operation.
This was unheard of; 90% failure rate meant that the prototypes business would not work and, besides that, it also meant that a serious issue existed, most probably a design glitch. Debugging again was the only way to go.
First I focused on the board with the Q6 issue. That one proved easy to solve: a short between R1 and R2 was relatively easy to spot and remove, and after that, the board initialized OK, but then presented audio problems, just like the rest of the problematic ones. The failure rate was still 90%. This felt really disheartening, so I decided to resort to the oscilloscope again, with help from some colleagues whom I’ll remember to thank publicly some day.
First thing we noticed was that Q7 was getting very hot (70-90°C) when the phone set was off-hook or when the phone was rung. This told me something about power, since in the ringing and off-hook states the DC-DC converter must provide more power than in a standby state.
What the oscilloscope revealed was truly interesting. If you consult my early post “Chasing an elusive bug around”, you’ll see a shot of what happens at inductor L1: just as the power transistor Q7 is switched off, the inductor starts discharging in an oscillating manner. The picture was more or less the same on all boards. However, in off-hook or ringing mode, the picture changed: there was no oscillation at all, just the on-off square waveform as Q7 was switching on and off. More descriptively, L1 was discharging very quickly and Q7 was pumping too much current into L1 in order to charge it again, and so it was heating up too much.
So now we tested with a simpler tool: a digital multimeter. Measuring the voltage right before F1 yielded a mere 4.8V. When in off-hook state, however, the voltage was dropping down to 4.2V! For the current settings and components used in Open USB FXS, Silabs’ spreadsheet mentions 4.2V as the absolute minimum Vunreg, in that below that voltage, the DC-DC converter does not work at all!
By using a Type-A to Type-A USB extension cable and inserting the multimeter in series in mA mode, we measured 400mA in the idle state and 440mA in the off-hook state. This ruled out the overcurrent case. Thus, it became obvious that the desktop’s USB ports were unable to supply the 500mA at 5V that my board required (legitimately, as per the USB standard). With the overhead of the extension cable and the ammeter, the voltage was sometimes dropping so much that the board was rebooting!
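To put the measurements side by side, here is a minimal sketch of the headroom arithmetic. It uses only the figures quoted above (4.8V idle, 4.2V off-hook, 400mA and 440mA, the 4.2V minimum Vunreg and the 500mA USB budget). The function name `margins` is mine, and pairing each voltage reading with a current reading is a simplification, since the two were not taken in exactly the same setup.

```python
# Figures quoted in the post; names are illustrative only.
MIN_VUNREG_MV = 4200   # minimum Vunreg according to the Silabs spreadsheet
USB_LIMIT_MA = 500     # current budget of a high-power USB device


def margins(vbus_mv, current_ma):
    """Return (voltage headroom in mV, current headroom in mA)."""
    return vbus_mv - MIN_VUNREG_MV, USB_LIMIT_MA - current_ma


for state, vbus_mv, current_ma in [("idle", 4800, 400), ("off-hook", 4200, 440)]:
    v_margin, i_margin = margins(vbus_mv, current_ma)
    print(f"{state}: {v_margin} mV above minimum, {i_margin} mA below the USB limit")
```

In the off-hook case the current is still within budget while the voltage headroom is gone, which is why the supply rather than the board looks like the culprit.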
But why would this result in audible clicks? It is simple: the 3210, seeing that the converter is not performing as it should, attempts to restart it from time to time. This results in a “click”, as the voltage on the line drops momentarily. The restarting frequency changes as the line conditions change (e.g., as capacitors inside the phone set get charged), so clicks may eventually disappear. Anyway, under these conditions, the circuitry does not operate as it should.
What’s the bottom line of all this? There are two answers. The first one is that the USB ports (or, most probably, the power supply) of the desktop that I used were inadequate for the power requirements of the board. However, this is the easy answer. The difficult one is that the circuit’s design parameters could after all be a bit marginal, and this may cause failures in situations like this one. This must be remedied in one way or another. Until I get my laptop back I cannot do much (typing all this on my mobile is already overkill), but I would be delighted to receive suggestions from you, the readers.
It all started when I came into discussions with the first company (owned by friends) that I have mentioned in some previous posts of mine. These gentlemen mentioned fax as a possible commercial use of my adapter. And it’s true: even in the case of an Asterisk installation with IP phones, one always needs one (usually one is enough) FXS port to connect a good, old analog fax machine.
So the question came to me: can Open USB FXS do fax? From this — misleadingly simple — question, a whole new adventure in the deep seas of debugging was born. I still do not quite understand in depth all the issues that arose but, since my stories usually have a happy end, I managed to resolve these issues. Maybe some of the readers will understand more than I do, and they are welcome to submit their comments. Good, let’s start then.
A primer on fax in VoIP systems. Usually, this is a no-no-no-dont-do-it thing. Fax machines are designed to work over the TDM network, under the assumption that there are no such things as packet loss or jitter. In contrast, VoIP systems operate under the exact opposite assumptions: packets may be lost, delayed or even get duplicated by the network. For these very reasons, it’s permissible to pack audio samples into jitter buffers at the receiving end and play all sorts of tricks to make the audio result “better” in case of losses etc. One common such trick is, for example, to repeat the last audio sample in case the input buffer is empty. This will “sound better” than sending a sample of silence. Thus, implementing tricks like this one, drivers for devices like Open USB FXS allow slack in various degrees. This should not be the case for fax, though. When dealing with fax, a single lost packet can have a detrimental effect on the reproduced image, or even cause the entire transmission to fail.
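The “repeat the last sample” trick can be sketched in a few lines. This is only an illustration of the idea, not code from any real driver: the function name `conceal` and the use of `None` to mark a missing sample are my own assumptions.

```python
SILENCE = 0  # value played if nothing has been received yet


def conceal(received):
    """Replace missing samples (None) with the last good sample."""
    out = []
    last = SILENCE
    for sample in received:
        if sample is not None:
            last = sample          # remember the most recent good sample
        out.append(last)           # a gap is filled with a repeat
    return out


print(conceal([5, None, None, 7]))
```

For voice, the repeated value is hardly noticeable. For fax, the repeated value is still wrong data as far as the modem at the other end is concerned, which is why this kind of slack is not acceptable there.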
For this reason, ITU-T has come up with the T.38 standard. According to it, two terminal VoIP systems that wish to exchange facsimile messages terminate the fax encoding realm locally by decoding the signals that the sending machine transmits, and then exchange the decoded contents as IP packets, using UDPTL. How about Asterisk and T.38? Currently, there is no “native” support of T.38 in Asterisk, but various commercial add-ons and public patches (such as this one) claim to add the necessary functionality.
What this situation leaves us with is the local (pseudo-TDM, in Asterisk jargon) packet handling by the driver. In other words, if T.38 handles (successfully or not, that’s another discussion) the network-facing part of the problem, the device-facing part — the device driver — should handle audio packet transmission without a single loss. Did Open USB FXS fulfill this requirement? An idea, suggested by a friend, was to try local fax transmission between two FXS ports on the same machine. I was confident enough that Open USB FXS would pass the test easily.
I was wrong. The two fax machines that I tried were unable to pass the initial negotiation. When I eavesdropped on the phone line, I discovered why. The tones that the answering fax machine was sending were arriving badly clipped and distorted. But why was that? Voice audio was delivered fine; why would the fax tone be clipped?
The answer was simple: high frequency. A quick test revealed that high frequencies were somehow “filtered” by Asterisk when I used Open USB FXS. It did not take me long to understand the next culprit in the chain, multi-sample packing per URB. My driver tries to be smart and make best use of available system resources, in order to perform acceptably even on low-power CPUs (I am emulating such an environment using VMWare). In order to do that, it packs multiple 1-ms audio samples into a single URB [see why this is more efficient by consulting the discussion at the end of this post]. When the URB completes, the driver passes all samples together to the dahdi core (and from there on to Asterisk).
Apparently dahdi did not like that. It took me quite some time to test all possible combinations of driver parameters, until I finally noticed that the clipping effect was weaker when I loaded the driver with only two samples per URB (using the “rpacksperurb” and “wpacksperurb” parameters). The quality was still not quite acceptable for fax transmission, but at least one could make out the familiar sound of an answering fax machine sending various tones. This encouraged me to look deeper and test further.
The number two (two samples per URB) seems to be related to the buffer depth of dahdi, which is exactly two samples. So it’s no wonder that submitting more samples at once to dahdi screws things up, because it results in newer samples overwriting older ones before the latter get the chance to be read off the buffer. Interestingly, there is no user-side way to change this, because although there is an ioctl() to change the number of buffers in a channel, a subsequent close() resets this to the default number 2. Even more interestingly, changing 2 to a higher number (8) in the dahdi source and recompiling dahdi did not make the high-frequency fax tones sound better. I declare my ignorance as to why this happens, but changing the dahdi source did not feel like the way to go, so I went back to trying to fix my driver instead of fiddling with the dahdi code.
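The overwriting effect can be illustrated with a toy model. Let me stress that this is not dahdi’s actual buffering code: it simply assumes a queue of depth two that drops its oldest entry when full, a writer that delivers a whole URB’s worth of 1-ms chunks at once, and a reader that takes one chunk per millisecond. The function name `simulate` and its parameters are made up for the sketch.

```python
from collections import deque


def simulate(chunks_per_burst, total_ms, depth=2):
    """Return the chunk numbers the reader actually gets to see."""
    buf = deque(maxlen=depth)      # when full, the oldest entry is dropped
    received = []
    next_chunk = 0
    for tick in range(total_ms):
        if tick % chunks_per_burst == 0:
            # A URB completes: all of its chunks are handed over together.
            for _ in range(chunks_per_burst):
                buf.append(next_chunk)
                next_chunk += 1
        if buf:
            received.append(buf.popleft())   # the reader takes one chunk per ms
    return received


print(simulate(2, 8))   # two chunks per burst fit the depth-2 queue
print(simulate(4, 8))   # four chunks per burst: half of them are overwritten
```

In this model, bursts of up to two chunks get through intact, while larger bursts lose the older chunks of every burst. Since raising the depth to 8 in the real dahdi source did not help, the model clearly does not capture everything that is going on.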
I tried with one sample per URB, but the results were still not what I was hoping for. Sometimes I was getting good results, sometimes not, and I could not understand why. It was only then that I finally thought about removing the dahdi_dummy timing module. Dahdi_dummy employs the high-resolution timer available in the x86 platforms to “tick” the dahdi core once per millisecond. However, this “ticking” is not synchronized in any way to the pace of interrupts arriving from USB. I am not sure why, but this caused still bad quality audio in high frequencies, even with a single sample per URB.
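As I said, I do not know the exact mechanism, so the following is only a sketch of one plausible contributor: two nominally 1-ms clocks that are not locked to each other slowly slip. The function `count_slips` and the 1001-microsecond tick period are invented for illustration (a real drift would be far smaller), and nothing here is taken from dahdi_dummy itself.

```python
def count_slips(usb_period_us, tick_period_us, duration_us):
    """Count timer ticks that find no new USB chunk, or more than one."""
    empty = doubled = 0
    seen = 0                                     # chunks accounted for so far
    for tick_time in range(tick_period_us, duration_us + 1, tick_period_us):
        arrived = tick_time // usb_period_us     # chunks delivered by this time
        new = arrived - seen
        if new == 0:
            empty += 1                           # nothing to play on this tick
        elif new > 1:
            doubled += 1                         # two chunks compete for one tick
        seen = arrived
    return empty, doubled


print(count_slips(1000, 1000, 1_000_000))   # locked clocks never slip
print(count_slips(1000, 1001, 2_000_000))   # a slightly slow timer eventually does
```

Each slip means a chunk is either missing or has to be squeezed in, which would be harmless for speech but not for fax tones.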
To make things worse, the default initialization script /etc/init.d/dahdi checks for hardware dahdi channels, and if it finds none, it automatically loads dahdi_dummy. I have the bad habit of plugging in my USB dongles after the system boots (and watching them initialize through dmesg messages), so dahdi_dummy was loaded by default. This caused me much confusion, because I got some good results which were worse again after a reboot, and it took me a lot of bumping my head on the wall to find out why: it was dahdi_dummy, which was being reloaded after the reboot, and I was forgetting to remove it.
After all that, I tried again the test with the two local fax machines. This time everything went fine. I sent over an image (an ad of a Bob Dylan concert, you can make out the Greek letters on it) from a Samsung FS5100. Here it is:
Here is the transmission report, showing what the sending fax machine sent to the other side (the handwriting is mine). Note in particular the noisy parts (e.g., the dots inside the white letters “Bob”), which are already present in the scanned image:
Finally, here is what the receiving part got — a perfect reproduction of the transmitted image:
From all that, I guess Open USB FXS deserves the title of this post, F(a)XS(uccess)! However, here are some final notes. In contrast with a PCI(e) driver, where a few outb/outw instructions suffice to handle an interrupt, submitting an isochronous URB is a very expensive operation. It takes a considerable number of CPU instructions, and this is mainly because the driver must make sure that the submitted URB does not take up too much of the available bandwidth on the USB bus. Especially when plugging more than one Open USB FXS device into the same computer, this could result in bad performance, loss of USB timeslots etc. On my VMWare testing platform, this is apparent, in that the kernel reacts slower than required and misses USB timeslots, resulting in audible “clicks”. I have looked extensively into the kernel source for UHCI, trying to find a way to get an interrupt per USB frame (not per URB completion), but it seems that there is no such thing. In theory, I could use in my driver the HR-based “ticking” of dahdi_dummy and test the first submitted URB for per-packet completion, even with more than one sample per URB. However, this seemed to me very complicated, so I let it go for the time being. Other drivers (Astribank xpp for example) seem to do something like that. Maybe that’s the way to go in the future.
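To give a feeling for the trade-off, here is a back-of-the-envelope sketch. It assumes one direction of one device, 1-ms samples as described above, and that the cost that matters is the number of URB submissions per second; the function name `urb_tradeoff` is mine.

```python
SAMPLE_MS = 1  # each packed audio sample covers one millisecond


def urb_tradeoff(packs_per_urb):
    """Return (URB submissions per second, ms of audio held in one URB)."""
    submissions_per_second = 1000 / (SAMPLE_MS * packs_per_urb)
    held_ms = SAMPLE_MS * packs_per_urb
    return submissions_per_second, held_ms


for n in (1, 2, 4, 8):
    rate, held = urb_tradeoff(n)
    print(f"{n} sample(s) per URB: {rate:.0f} submissions/s, {held} ms delivered at once")
```

One sample per URB, which is what fax needs, means a thousand expensive submissions per second and per direction, and that is exactly what hurts on a slow or virtualized CPU.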
I am now getting back into the prototypes hell (which BTW have some issues, and this is why I am holding back the whole release phase), but there are lots of things in that hell to blog about, and this would deserve a different post. Soon to be, I hope, along with the announcement of public availability!
Great news! I have now received all necessary materials for prototypes, although I have not yet had the time to deal with their assembly. In the meantime, several tests and trials keep me busy (but more on these on my next post).
As for the prototypes, I am setting up a page with information on how to order. It may still contain inaccurate information, but I am confident it contains most of the essential parts of the procedure.
I want to confess that it felt embarrassing to write all these disclaimers at the end of the page, but after all I guess it is best to explain what people should expect from a prototype. The bottom line is, it may work or it may not work for you (and especially if you build it yourself without the necessary experience and tools, it is very possible that it will not). Even ready-made tested boards may fail to work with some USB host controllers, or with platforms other than the x86 (the latter for performance reasons).
Bottom line, your mileage may vary; you are buying an exotic — though cheap — gizmo to experiment with, not a consumer product. It is always good to remember that I am not attempting to fool anyone into buying and I am not making any financial profit out of you buying prototypes. The one who really benefits from spreading out prototypes is the project itself, not my own wallet.
Within this week, I’ll be able to set the final price for DIY kits (assembled boards will follow). Note that it may be wiser to wait a bit for a couple of boards to be assembled before you order a DIY kit, at least to verify that the current lot of PCBs and materials are OK. You never know, there may be some part that is not appropriate or a glitch in the PCBs. One working board will rule out all such cases, so when I announce that, it will be safe(r) to order DIY kits. If you cannot wait, however, you may just as well order a DIY kit before I assemble any boards. It is up to you.
I will not write any further posts on prototypes, except perhaps for announcing the availability of assembled boards. All this story has consumed too much of the energy that I had available for this project, and there are lots more to do. The ordering page will be updated though, to reflect changes, so you may consult it to see what is new.
I truly hope this whole prototype business will go OK and it will end up with you, the readers, doing happy experiments and contributing to the project your experiences, your firmware and driver patches, and your observations on bugs that need fixing. Time to wish good luck to all of us, and of course to the project!
Materials for my prototypes have arrived for the most part. I am still waiting for the PCBs, but, since I am not exactly the type of person that would sit idle in the meantime, I decided to torture my mind with a set of optimizations for the kernel driver and other hacks.
The first of these optimizations — and the one I am to discuss in this post, hence its title — is also the only one that I have started implementing so far and relates to serial numbers. To start the discussion, I might have to remind readers that USB, among the other device descriptors (device string, manufacturer string, etc.) supports a “serial number” string descriptor. But wait a moment, why would a serial number be needed in the first place?
The most important use of serial numbers that I was thinking about is called channel number persistence. As discussed in my older post Hello, Asterisk!, the channel number that the dahdi core gives to a board depends on the order in which this board is plugged into the system. Moreover, all other characteristics of a device in Dahdi and Asterisk, like e.g. the local extension number, the echo canceller setup, etc., depend on the channel number. If one plugs two boards in the opposite order, one gets the extensions and all other settings mixed up. Given that I am experimenting with two or more boards which I plug and unplug all the time, this behavior is not exactly what I wanted.
Because of that, I was thinking about this channel number thing. What this would mean is that I could add a module parameter that my kernel driver would understand, something e.g. along this example: insmod oufxs.ko persist=<serial1>,<serial2>,<serial3> and this would have the effect that my driver would pre-allocate some dahdi channels (three, in the above example) and map each of them to the appropriate serial number using a field in the respective device structure. The driver could arrange at startup to mark these pre-allocated channels as inoperative and return e.g. -ENXIO or something similar to any caller for all operations — until a board with an expected serial number is plugged in.
Then, at device plug-in time, the driver would look through its internal list of devices and try to match the serial number of the board being plugged in onto an existing idle channel. (I have underlined idle here, because there is no guarantee that two USB devices will not present the same serial number to the host, and if a check for an idle channel is not made, this would result in the newly-plugged board hijacking an existing channel belonging to another board). If a match is found, then the driver would associate the respective idle channel with the new board, otherwise it would allocate a new internal structure as it does now.
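The matching logic I have in mind can be sketched as follows. This is a user-space model of the idea, not driver code: the class names, the 1-based channel numbering and the `unplug` helper are all assumptions made for the illustration.

```python
class Channel:
    def __init__(self, number, serial=None):
        self.number = number
        self.serial = serial
        self.board = None          # None means the channel is idle


class ChannelTable:
    def __init__(self, persist=()):
        # Pre-allocate one idle channel per serial given in "persist".
        self.channels = [Channel(i + 1, s) for i, s in enumerate(persist)]

    def plug(self, board, serial):
        """Attach a board and return the channel number it ends up on."""
        for ch in self.channels:
            # Only an idle channel may be claimed, so that a second board
            # reporting the same serial cannot hijack one already in use.
            if ch.serial == serial and ch.board is None:
                ch.board = board
                return ch.number
        ch = Channel(len(self.channels) + 1, serial)   # no match: allocate anew
        ch.board = board
        self.channels.append(ch)
        return ch.number

    def unplug(self, number):
        self.channels[number - 1].board = None         # channel is idle again


table = ChannelTable(["AAAA", "BBBB", "CCCC"])
print(table.plug("board-b", "BBBB"))   # lands on channel 2 whatever the plug order
print(table.plug("clone", "BBBB"))     # duplicate serial gets a fresh channel
```

Plugging the boards in any order gives each one the channel reserved for its serial, and a re-plugged board gets its old channel back once that channel has become idle.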
To implement that, I first needed to implement serial numbers in firmware. The default USB stack provided by Microchip does not include a serial number string. To begin with, I was not sure whether the standard defines what format a serial number should be in, but after looking around, I found several examples in which the serial was a (text) string of hex digits, so I decided to stick with that.
The next decision had to do with how to program the serial number. All other device descriptors in Microchip’s stack are stored in program flash memory. By means of a few additional lines of code, I was able to configure the famous “dead beef” hex string as a serial number and get the host to see that. Good, that proved that I could use the program flash as the source for the serial number, but was this what I really needed?
The answer was “no”: if I chose this solution, I would have to manually program the serial number along with the firmware. In other words, I would have to produce individualized firmware images for each board. Not only that, but I would have to somehow remember the serial number of a board and restore that after a firmware upgrade (and firmware upgrades in my boards are pretty frequent as you may imagine).
Fortunately, the PIC has a good answer for that problem. It contains a program-controllable EEPROM memory, where data can be stored. After some googling, I found this hack which gave me enough hints on how to use the EEPROM. My first step then was to move the device descriptor into RAM, still initializing it from program flash as “dead beef”, and make sure that this worked OK. Second step, I tested the EEPROM read functions and got back factory-set data (a series of 0xEE’s).
Then, all I had to do was write a USB primitive for “burning” a serial number on the EEPROM and craft a couple of modifications in my driver. The driver would check for a serial number, and if it found the factory-set string 0xEEEEEEEE, it would use the “burn” primitive to write a new, should-be-unique, serial number in the EEPROM. I coded that, using the current value of the jiffies variable as a source for unique serials, plugged a board and — voilà, the driver reported that a new serial was burnt into the board. Re-plugged the board and, that’s it, the new serial was recognized.
I had only forgotten a minor detail, that caused me a kernel “oops”: old-firmware boards did not report any serials. In this case, the kernel initializes the respective field of the device’s usb structure with a NULL pointer, and I was comparing that NULL pointer against “EEEEEEEE”, resulting in a zero-address dereference. Fixed that, by adding another message informing that this was an old board and the firmware had to be upgraded. Tested again, and everything worked fine.
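The three cases that the driver has to tell apart can be summarized in a short sketch. Again, this is a model in Python and not the driver’s C code: the function name `check_serial`, the returned action labels and the eight-hex-digit format of the generated serial are assumptions of mine.

```python
FACTORY_SERIAL = "EEEEEEEE"   # what a never-programmed EEPROM reads back as


def check_serial(reported, jiffies):
    """Return (action, serial) for the serial string reported by a board."""
    if reported is None:
        # Old firmware reports no serial at all. This must be tested first:
        # comparing a missing string is what caused the kernel oops.
        return ("upgrade-firmware", None)
    if reported == FACTORY_SERIAL:
        # Fresh EEPROM: derive a should-be-unique serial from the tick counter.
        return ("burn", f"{jiffies & 0xFFFFFFFF:08X}")
    return ("ok", reported)


print(check_serial(None, 123))
print(check_serial("EEEEEEEE", 0xDEADBEEF))
print(check_serial("0012ABCD", 5))
```

The order of the checks is the whole point: the missing-serial case has to be handled before any string comparison takes place.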
Here are some dmesg excerpts I know you will like. At first, a new device with old firmware gets plugged:
Jun 2 14:48:09 avarvit-d NetworkManager: <debug> [1275479289.719625]
nm_hal_device_added(): New device added (hal udi is
Jun 2 14:48:10 avarvit-d kernel: [ 735.306908] oufxs: oufxs_setup:
oufxs1: old-version firmware not reporting serial
Jun 2 14:48:10 avarvit-d kernel: [ 735.306921] oufxs: oufxs_setup:
please upgrade firmware on board and replug
Then, the firmware is upgraded to report a serial number (and the “burn” operation) and re-plugged. The driver sees that this is a fresh-from-factory EEPROM, and burns a new serial:
Jun 2 14:50:19 avarvit-d kernel: [ 864.196194] oufxs: oufxs_setup:
oufxs1: no serial on device's eeprom, burning one
Jun 2 14:50:19 avarvit-d kernel: [ 864.211314] oufxs: oufxs_setup:
oufxs1: serial written OK, re-plug to activate new serial
Finally, when the board is removed and plugged back again, the new serial is recognized by the Linux USB core:
Jun 2 14:51:01 avarvit-d NetworkManager: <debug> [1275479461.618913]
nm_hal_device_added(): New device added (hal udi is
So now, I am ready to do some serial — sorry, serious — programming on the driver code in order to implement channel persistence. I don’t know if I am going to do this right now, because of other priorities (some tests that I will mention in my next post, and prototype assembly). But it is a feature that I think is needed, and I will add it ASAP.
Frequent visitors of this blog might have been wondering what the heck I have been doing all this time. Well, the answer is “I’ve been trying to get the prototypes business going”. This post is about all those messy details of this effort that would otherwise go unnoticed. Initially, I did not really think that all this crap was worth posting. After giving it some thought, however, it struck me that stumbling upon the details that follow was very similar to every other debugging session described in other posts of this blog. Here are the details, then.
To begin with, I am not a company and I do not own a company. According to the Greek Revenue/Tax laws, buying and selling stuff is “commerce” (well, in a sense this may really be so) and this is an activity that individual taxpayers are not allowed to engage in. Instead, individuals must form an individually owned company, which is then required to have its own VAT number and do its own bookkeeping. In real life terms, this meant that, in order to sell the prototypes, I would have to spend some weeks in various public sector services (Greece is notorious for its bureaucracy levels) to start up this company, plus a non-negligible amount of money for just being allowed to do business (I don’t know if you read the papers, but Greece is also, and lately more and more, notorious for taxing and charging with dire cruelty everything that breathes, moves, eats or speaks). I would also need to pay an accountant to get me through the maze of legal procedures concerning financial activities, bookkeeping, logistics, etc. OK, if one is planning to open a shop or a production line, maybe all this is worth the trouble. For just selling a bunch of prototypes at bare cost price without profit, it does not really make any sense.
So then, it was clear that I had to act differently. As I already said, I came to terms with a company owned by some friends, who would undertake all the logistics, accounting, and bookkeeping, while I was to coordinate the thing. That agreement having been made, I just went on and compiled my first basket from Mouser, and then went back to my friends and said to them: “OK, we are now ready to begin, please go to this URL, use that username and password to log in, and then a cart is waiting for you; all you have to do is click the checkout button, and pay for the order, of course”. Ideally, this would take less than one working day of waiting and less than ten minutes of actual work, but it turned out that the world is not such an ideal place… Day after day, I was trying to reach the people I had come to terms with, ask why they had not yet checked out the basket, and kindly urge them to do so. The answer I kept getting back was something like “Sorry, I was terribly busy today, I’ll do it first thing in the morning tomorrow”. And the thing went on like this.
All this delay was making me really anxious. As the hardware-experienced readers will know, electronics stock houses do not keep materials in stock forever. These days, only Mouser seemed to have the Silabs chips in-stock, and if they went out of stock, the lead time to re-order would be in the order of six to twelve weeks. OK, Mouser had a good number stocked, however nothing precluded a potential buyer from ordering the whole stock at once. The chances of that happening were very very low, but the risk of delaying the prototypes business for months was one that I wasn’t willing to take. So, every day that went off without my friends checking out that damn Mouser basket was a source of frustration for me.
After almost two weeks of anxious waiting, unanswered calls, SMS messages and mails, I decided that this was not going to work. I loved (and still do love) my friends, but if it took them weeks to click on a checkout URL, then it would take us ages to just gather all the required materials, not to speak of what would happen with assembling the boards. Fortunately, I had a backup scenario already in place: a second company that I had come to similar terms with, who proved to be much more available. It took just a phone call to agree on a meeting date in order to click the checkout URLs together. That should do it, and that should be all.
Nope, it wasn’t all. To begin with, I had to create a bunch of new web accounts for the second company, something I had already done once for the first one. Sure, that was not that hard — at least not in principle, because at the Elektor PCB service/Eurocircuits site, where I decided to re-order my prototype PCBs, another unpleasant surprise waited for me. I created the account, I uploaded the latest Gerber PCB files, I specified the quantities, I calculated the price and then — where is the “Submit” button? Believe me or not, there was no “Submit” button! I know it ought to be there, because I had used it before. Initially I tried the Elektor site, and when I saw the bug, I tried the Eurocircuits site (the latter is an OEM of the first, and what is worse, if you try to work on both sites at once, they sometimes get confused and redirect you to one another — but this is a different story). No “Submit” button there, either.
A quick phone call to Eurocircuits did not get me very far. They said this was a problem they had seen before but they did not know what was causing it, and they asked me to submit a screenshot and let them know my user id in order to try reproducing the problem. I sent them that, but I did not receive any immediate feedback (such as a mail saying, e.g., “we verified that this bug exists and are working to fix it”). Then, after spending a good while on their web site, I accidentally browsed to the accounts management page. There, I suddenly noticed that the company account I had just created had a missing field. This field was called “Initials”, and was not present at all in the registration page, so it would have been impossible to fill it in at first. Nevertheless, on the accounts management page, this very field was marked “required” with a red star. As soon as I put some crap text in it (I didn’t really know what “Initials” would stand for to make it a required field), submitted the change and went back to the orders page, the “Submit” button magically appeared. Shoot…
Nope, this was not all. Together with my friend at the second company, we went through the submission with Mouser, he gave his credit card information and all went fine. Not for long, though: some hours later I received a mail from Mouser’s credit department, saying that the credit card had failed authorization (jeez, why did that not show up when we were checking out our order?). However, my friend was swearing that he had paid for this credit card that very morning. So, what was wrong then? The truth did not shine until a day later: the company had a special arrangement with its bank, with two company credit cards that were fed from the same company bank account. However, these two were still behaving as two different credit cards. When money was deposited to the bank account, that did not act as a common pool for both cards; instead, using some algorithm (unknown even to the bank’s representatives to whom my friend spoke over the phone), money was put into one of the credit cards (e.g., the one with the shortest remaining time limit, or the one with the highest debt; the bank’s representatives were not able to tell). Obviously then, my friend had used the wrong card, since money had gone to the other one. [If you find that doing business that way is fascinating, I can give you the name of the bank]. Running the order again with the other card fixed the problem, and now most of the required materials were on their way. Pheeww!…
Not all materials are from Mouser. Some I have ordered from Farnell, and some others I was able to find in retail stores cheaper than I would find them in any components stock-house. So I am still waiting for more materials to ship or to arrive. I have to say that this whole optimization work is very cumbersome. It is also very frustrating to find out, right after you have ordered 200 pieces of component XYZ costing something like 50 EUR/USD, that you could just as well have ordered the same component from retail or from another stock-house at half the price. Sometimes even these numbers are misleading: Digi-key orders are subject to import taxes, so one might pay an undetermined amount on top of the list price of each material.
The bottom line is that it requires an expert in the electronics market to make this whole business run as cheaply as possible. I don’t know how much of an expert I am, but I tried to optimize the cost of my BOM as much as it made sense. I have decided to offer 50 prototypes (assembled and DIY kits included), so the economies of scale I could achieve were elementary. With the optimizations I was able to make, the BOM, including PCBs and all materials, amounted to something between 1,200 and 1,400 EUR. There are some cases where VAT would be added, so this would be higher (if you allow me to become a bit sarcastic, Greece, striving, as maintained by its political leaders, to make its economy more competitive, has increased the VAT rate from 19% to 21% and then again from 21% to 23%, and this means that materials imported from other countries might cost 23% more than if I set up this whole prototyping business in China or India; hey, Mr. Government, you’re doing a great job in saving my country! Keep it up like this!). Anyway, this means that a bare DIY prototype (no assembly, no testing, no guarantee it is going to work) will cost something on the order of 35.00 EUR plus P&P.
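As a rough sanity check of the figures above, the per-board arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation: it assumes the whole BOM is spread evenly over the 50 boards and, as a worst case, that 23% VAT applies to the full amount. The function name is made up for the illustration.

```python
def per_board_cost(total_bom_eur, boards=50, vat_rate=0.0):
    """Cost of one board in EUR, optionally with VAT added on top (assumed flat rate)."""
    return total_bom_eur / boards * (1.0 + vat_rate)

# BOM range quoted in the post: 1,200 to 1,400 EUR for 50 prototypes.
for total in (1200, 1400):
    net = per_board_cost(total)
    gross = per_board_cost(total, vat_rate=0.23)  # worst case: VAT on everything
    print(f"BOM {total} EUR: {net:.2f} EUR/board net, {gross:.2f} EUR/board with 23% VAT")
```

With VAT on everything, the upper end of the range lands just under the quoted 35 EUR per bare kit.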
Besides optimizations based on large quantities and cheapest material selection, there are quite a few optimizations that I plan to make on the board itself. For example, the double SMD DIP switch is not cheap at all. In a production version of the board, this would not need to be present, since it is easy to tell the firmware to jump to Flash-programming mode with a simple command over USB. If need be, a pair of open SMD contacts can act as a switch when they are shorted while the board boots. And there are many other optimizations, only I feel this is not exactly the time to start this discussion. Maybe in the next post.
I hope you enjoyed this post. As for its title, what else could it refer to if not a song’s lyrics? It’s an early one from Jethro Tull, in which the then-young-and-optimistic Ian Anderson gives a piece of advice:
Nothing is easy. Though time gets you worrying
my friend, it’s o.k.
Just take your life easy and stop all that hurrying,
be happy my way.
But maybe I’m just not that type of guy (or am I?). Maybe Ian Anderson is not that type of guy either, or else he wouldn’t have written all that wonderful music of his (music is like any other business: it requires endless hours of study, trials, rehearsals and failures before even one inspired and well-performed musical phrase can come out).
Nothing is easy, then, and prototypes are not an exception. But now materials are on their way, and my motto is that getting something going is more than half of the work. In my next post I will most probably announce some details on how to order boards and kits. In the meantime, we can all rely on young Ian’s optimism. Maybe he knows better than we do.
I have now finished with the code for the hook debouncing and for invoking the echo canceller. The latter is still largely untested, but I might be able to set up a test maybe even today. One could say that the code is more or less ready now. The only thing that has not yet been ported from the old driver is the code that counts USB packet-level statistics (missing sequence numbers, incorrectly received USB frames and the like). At this point in time though, I feel that these are not really urgent, since I have kind of resolved the issues for which these statistics were needed. So, I am going to update the SVN tree soon with the latest version of the driver (and will place an update to this post as soon as I do that).
One thing that I need to do is to write some instructions on how to best compile the module on various popular Linux distros. When trying to compile the driver, one’s mileage may vary depending on the distro/kernel/Kbuild environment. To give an example: because of a bug (mentioned in an older post of mine), in Debian lenny one has to either issue a “make EXTRADIRS=oufxs” at the top-level dahdi directory, or else copy the Module.symvers file manually for an isolated “make” to work inside the oufxs directory. On Ubuntu, “make” inside the oufxs directory fails altogether with a syntax error (?) message; plus, in order to install the driver via the dkms facility, one needs to fiddle with the dkms.conf file found in the top-level dahdi directory. I do not have the slightest idea what the situation is on other distros, like the Red Hat ones.
I have managed to bring the failed board mentioned in item #6 of my previous post back to life! The problem was that the 3201 had not been soldered to the thermal pad and thus had burnt out after some time of continuous operation. I suspected the 3201 because of the following symptoms: the board would not make the phone ring at all; nothing was heard on the earphone when the card switched to forward active mode (a click is normally heard); and the DC-DC converter was correctly producing 65V on powerup, but the voltage fell to 9V when the card switched to forward active mode. All these indications suggested an issue beyond the 3210, at the output stage. Replacing the 3201 with a new one (which I tried to make sure is now tightly soldered to its thermal pad) resolved the problem and the card is now alive and kicking.
My third and last dongle-version-2 prototype card remains out-of-order. The indications there suggest a problem in the 3210 or in the PCM bus between that and the PIC. I need to spend some time using an oscilloscope in order to debug this further, and I don’t know if it is really worth the trouble at this stage of the project. I might do it later on.
I have to note that I still owe it to myself to try out the pull-down resistor suggested by Gabor in this comment. The truth is that I am having some noise, which I have been overlooking all this time. Probably, as the fix by Gabor suggests, the cause is that the 3210’s PCM bus driver, when outputting a logical zero on the DTX line, does not pull the line to GND firmly enough to absorb the noise picked up by the transmission line between the 3210 and the PIC. Thus, unless the fix with the pull-down resistor is applied, noise can possibly make it into the audio path. Another issue is the noise induced into the analog audio path, mainly from the DC-DC converter. Fortunately, the 3210 contains a squelch filter for that; however, the filter takes some time to adapt after changes in the linefeed mode (direct register 64), during which a high-frequency hum is audible. Probably the solution there is to find out which indirect register(s) are related to the squelch, save their values and re-apply them after every change in DR 64’s value. Anyway, just in case I am to adopt Gabor’s fix, I have checked the PCB design and it is easy to add the pull-down, near the “via” for the DTX signal, with no major changes.
Another piece of good news is that I cannot seem to reproduce the intermittent DTMF dialing failures on a native Linux installation. The issue still remains on the VMWare-hosted Asterisk, which by now is well known to be starved of CPU and I/O resources. Probably the failures are due to a CPU-starved Asterisk missing samples and thus being unable to decode DTMF correctly. This is a great relief, in that intermittent DTMF recognition was a scary, scary bug, at least to me.
Perhaps the most important piece of news in this post is that I have now reached an agreement with a local company owned by some good old friends, in order to set forth the production of a limited quantity of prototypes (in both an assembled-board and a DIY-kit flavor). This involves a number of steps, too, and I am going to spend some time discussing these steps.
A quick inspection and review of my board by some hardware assembly specialists revealed to me that there is substantial room for improvement in the choice and placement of components in order to ease the mechanized assembly and thus reduce the cost of the board. To name one, I had better change all components into SMD ones (my board still has a number of through-hole components: two electrolytic capacitors, a tantalum capacitor, a crystal, and the USB and RJ11 plugs). In PCB assembly production lines, placement of through-hole components slows down the process and costs more.
Another area of improvement is that, because the board has components on both sides, it normally requires a second baking round in a reflow oven, and bottom-side components must be glued to the PCB, or else they will fall off the board during the second bake phase. This raises the cost of assembly. However, if components are all oriented alike, the bottom soldering can be done in a solder bath, which is much cheaper than baking in a reflow oven. So, if I am to go into mass production, I had better redesign the bottom side of the board to orient all components west-to-east (the PIC cannot change orientation, so it will make the rule for the others).
Finally, there are lots of components, mainly on the top side, which are too close to one another. Mechanized assembly tools like pick-and-place machines tend to complain or even fail in such cases. Some examples where I should allow components more headroom between them can be found around L1 (mainly the power transistor and the crystal).
The good news is that this redesign step, otherwise time-consuming, will probably be unnecessary for the production of prototypes. Because of the limited number of prototypes, we may go for manual assembly, without this incurring any substantially higher cost (the cost for the setup of a mechanized assembly production line is high anyway). The only possibly required redesign of my PCB at this stage involves the DTX pull-down resistor mentioned earlier.
This means that, during the next few weeks, I might be able to give out some more-or-less concrete dates for the availability of assembled prototypes. However, please bear in mind that things are now getting a bit more complicated, since more people and at least two companies are involved. So, please be patient if planned dates shift a bit. For the time being, the only planned date is next week, when I think I’ll be able to give out a first time plan. In any case, the important piece of news is that there is now sort of a commitment to produce and give out prototypes at a nominal price. The dates will follow.
I will make sure to include a shot of an assembled v2-dongle in this post, just to make it livelier. Other updates may follow as I get done with the echo tests and as the prototype production is progressing.
Update, May 15: The driver code with the hook debouncing and the invocation of the echo canceller has now been uploaded to the project’s Google code page. Also, I have patched a board of mine with a pull-down resistor in the DTX line as suggested by Gabor in his comment. However, I saw (or, to be exact, I heard) no obvious difference. I also ran a few more tests by setting the 3210 to several loopback modes. I tried both ALM1 and ALM2, and was happy to find that there were no audio quality issues at all, at least none that my ear could perceive. Similarly, I ran a DLM test by generating a 1-kHz tone, sending that to the board and collecting it back from the loopback. Apart from lost packets (on VMWare), this did not show any issues either. However, I will run these tests again, carefully comparing a patched versus an unpatched version of the board, to make sure that the pull-down resistor patch is not really needed. The loopback results are very important because they indicate that the firmware (including the receipt and transmission of data via the digital DTX/DRX loop) is working perfectly OK, and that any potential audio issues can be isolated to the USB layer and up (including the USB communication between the firmware and the host, the driver, and Asterisk). Hey, this is a test I should have run months ago… Anyway… Finally, here is another great piece of news: I am about to place an order for materials for a batch of 50 prototypes! Without making any promises yet, prototypes should now be available soon!
Update, May 23: It may seem that I am inactive these days, since I am not posting much. This is not at all the case. The reason I am not posting is that I am doing a very, very mundane and dull job: I am trying to optimize the cost of my BOM, by looking around at several online electronics shops and retailers. To give an example, a quantity of 50 USB Type A solder-on-board plugs costs about 75 EUR on Digi-Key (Molex), about 100 EUR on Mouser (Molex) and about 25 EUR from Farnell. Moreover, in my location, I have to calculate import taxes for merchandise bought from Digi-Key. Anyway, in this case Farnell is the clear winner; however, this is very lengthy (and tiresome) work. So far, my per-board BOM cost for 50 prototypes is somewhere between 20 and 30 EUR (this cost does not include assembly and P&P). A friend at a retail shop is currently helping me to reduce this even further, but I feel the BOM cost won’t drop below 20 EUR/brd. That’s not awfully bad for such a small quantity, and it can drop dramatically at higher quantities. Moreover, there are other optimizations one can make. To give an example, the 0.22uF/100V caps at the output stage cost more than twice as much in an 1812 package as in a 1206 one. I don’t really know why Silabs suggest an 1812 package (I should ask them BTW, or if any reader has gone through this already, a word of advice would be greatly appreciated). As of now, I have decided not to touch the PCB design and to just try finding the best price for the existing list of materials. Anyway, I am to finalize and place my orders in the coming week; until then, there is probably not much news to expect.
Besides hardware, I am also contemplating an optimization for the driver. Here are some details: the driver tends to spend long periods of time in a non-interruptible (IRQ?-)context inside the USB completion callback routines, packetizing and de-packetizing audio data. With the current defaults, this occurs once every four milliseconds for OUT packets plus at the same frequency for IN packets. Ideally, this would occur once every two milliseconds, alternating between OUT and IN callbacks. However, nothing really guarantees this optimal alternating scheduling: the initial submissions that trigger callbacks are executed asynchronously, and it might just as well occur that both IN and OUT callbacks get ready to execute at the exact same millisecond. If more than one board is plugged in, four (or more) callbacks might occur at the same millisecond, and each of those must consume a non-negligible amount of time to packetize/de-packetize data. This might starve the (userspace) Asterisk process of real-time CPU resources and result in “lost” samples (samples that arrive too late) or in bad sound quality. Optimizations include the following: (a) try a preemptive kernel (I am not expecting much there, besides numerous bugs that might show up, both in my and in other people’s code); (b) invoke schedule() at times while doing (de-)packetization; and (c) decouple (de-)packetization from the completion callbacks, by adding tasklets to perform the lengthier jobs. I trust that some of the options (a) to (c) above (or a combination thereof) will perform much better than the current driver. Notice that my VMWare testing environment is very demanding anyway, in that performance issues show up that would never exist in a native Linux environment. Once I am clear with hardware orders, I might try these optimizations and see what happens. Again, readers who might advise me on the subject are more than welcome!
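To make option (c) a bit more concrete, here is a minimal userspace sketch of the idea, written in Python rather than kernel C. It is not driver code: a Python thread merely stands in for a tasklet, and all names are invented for the illustration. The point is only the structure: the completion callback does the bare minimum (it queues a note of what completed) and returns, while the lengthy (de-)packetization runs later in a separate context.

```python
import queue
import threading

work_queue = queue.Queue()   # hand-off between "callbacks" and the deferred worker
processed = []               # records the order in which the heavy work was done

def completion_callback(direction, seq):
    """Stand-in for a USB completion callback: queue the work and return at once."""
    work_queue.put((direction, seq))

def deferred_worker():
    """Stand-in for the deferred context (a tasklet in the real driver)."""
    while True:
        item = work_queue.get()
        if item is None:          # sentinel: no more work
            break
        direction, seq = item
        # The lengthy packetize/de-packetize job would go here.
        processed.append(f"{direction}{seq}")

worker = threading.Thread(target=deferred_worker)
worker.start()
for seq in range(3):
    # Worst case from the text: IN and OUT complete in the same millisecond.
    completion_callback("IN", seq)
    completion_callback("OUT", seq)
work_queue.put(None)
worker.join()
print(processed)
```

Even when several callbacks fire back to back, each one returns immediately, and the single worker processes the queued jobs in arrival order.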
Tell me the truth: did you really think I was going to sit idle, drinking beer and celebrating my first Asterisk call? Naaahhh, I didn’t think so myself… That first Asterisk call made me so restless that it was impossible to abstain from hacking on the project. So there is lots of news.
Before all else, please let me spend a few words on the title of this post. “Astral Weeks” is the title of an album by Van Morrison, which I used to like a lot (that was in the previous century, though). And I guess that everyone knows what “Star Wars” refers to. I wanted to use “star wars” as a title to refer to my first combats with the star-named PBX, but I found “astral” much more poetic (yes, the old 60’s and 70’s rock-n-roll songs seem to be a recurring motif in this blog). Why “wars”? Well, I was to fight with the Beast, and although bloodshed was less severe than I expected, every war has its casualties (in this case, just a board; read on for the details). The news from the front is both good and bad. Although I prefer to listen to the bad news first, when it comes to battlefront news, I think it is best to report it in chronological order. Here it is, then:
Item #1 (good news): As I reported in my previous post, I was able to place an Asterisk call over VMWare using my board. The audio quality was lousy, but I guessed (correctly, as will be seen below) that VMWare was the culprit for that. Then I wrote the previous post.
Item #2 (good news): I assembled a dongle board using my version-2 design. The actual assembly was finished before the Asterisk test, but I had not had the time to test the new dongle before my previous post. So, the good news is that the dongle worked fine (at first, as will be seen below). The heat problems seem to have vanished and I was able to run all the tests (like ringing the phone, inputting and outputting ulaw samples, dtmf, etc.) without any issues.
Item #3 (good news): I tried installing my new driver and running Asterisk over it on a native Linux installation (Ubuntu/wubi). The sound quality was really good, and there were no clicks or other signs of USB packet loss. Great!
Item #4 (bad news): DTMF works intermittently. At times Asterisk understands the dialed digits, at other times it does not. As David Rowe pointed out to me in a private email of his, this might be due either to distorted/clipped IN audio, or to the lack of an echo canceller (I have not added an echo canceller yet). I think that the most plausible explanation is the second one, because in the absence of an echo cancellation process, the OUT signal may garble the IN tone so that it becomes unrecognizable.
Item #5 (particularly worrisome): The symptom of DTMF not being recognized is board-dependent. My older, large-form board worked fine; however, the (hot, version-1) dongle board proved to be problematic. As to my (corrected, version-2) board…
Item #6 (bad news, but not so bad after all): … the second board decided to succumb to its wounds during the Asterisk tests. While initially working (with intermittent and problematic DTMF recognition, but otherwise working), the board suddenly started to refuse to power up the DC-DC converter, the phone did not ring, etc. It now steadily refuses to power up the converter during the first go, it then succeeds on a later try, but VBAT drops very low (-9V) as soon as the linefeed is set to “forward active”. This looks like the converter cannot supply enough current to the circuit or there is a power leak somewhere. The news is not so bad after all, since the board has worked for a while; however, it makes me worry about design issues. I’ll have to either hardware-debug the board, or assemble a new one (I tend to choose the second alternative).
Item #7 (good news): I was able to place a call through an IAX connection to Digium’s server (500 from the main pre-installed demo IVR menu), and the call went perfectly OK (apart from an expected audio quality degradation due to a flaky Internet connection at the place where I work, I was able to hear the full voice message from Digium’s server).
Item #8 (bad news): My driver is still missing a hook state debouncing algorithm, so funny things happen from time to time, with the funniest one being that Asterisk occasionally interprets back-on-hook as a “flash” event (momentary off-/on-/off-hook transition). Funniest thing, this happened during the IAX call to Digium, which I managed to place into a three-way conferencing state. Yes, Open USB FXS can do three-way conferencing (like any other dahdi hardware)!
Item #9 (good news?): I have placed all the newest driver code, Eagle files, etc. on the project’s Google code site. If you wish to compile the new module, you have to place it into a directory named /usr/src/dahdi-2.2.XXX/drivers/dahdi/oufxs/ and either issue “make” within that directory, or add “oufxs/” (without any other path component) to EXTRADIRS and issue a “make” from the top Dahdi directory (/usr/src/dahdi-2.2.XXX). The plain “make” inside the oufxs directory failed on my Ubuntu tests. In Ubuntu, you need to modify the “dkms.conf” file provided with the dahdi source. I have not yet uploaded a modified dkms.conf, but I will. A word of warning: if you plan to produce your own PCB from the newest (v-2) dongle Eagle files, please take into account that (a) the only such prototype I’ve made has failed and (b) the LED and RLED should be swapped to place the LED under the RJ-11 plug. Also, an updated BOM is now in place. Please consider all the uploaded code and other files as drafts and don’t shoot me if they don’t compile on your system.
Item #10 (good news): After all, if I manage to assemble some working dongle-v2 prototypes and get the DTMF working with them, I think I am going to produce a limited number of dongles (ready-made or DIY kits) at nominal price for interested people to get. I am getting myself organized and will post details soon.
All of the above sounds like Open USB FXS is slowly conquering the star-land. It may be just ambition, it may be reality. The next days will show, and a new post is due as soon as something significant occurs. The next steps are to assemble a couple of dongle-v2 prototypes, to implement hook debouncing and to do something about the echo cancellation, then re-test. Expect to have news, sooner or later, and let’s hope it’s going to be exclusively good news this time!
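For readers wondering what hook debouncing amounts to, here is a minimal sketch in Python. It is not the algorithm that goes into the driver; it only illustrates the general technique of accepting a hook-state change after a number of consecutive identical readings, so that a short glitch is not mistaken for a “flash”. The class name and the threshold value are assumptions made up for the example.

```python
class HookDebouncer:
    """Accept a new hook state only after `threshold` consecutive identical readings."""

    def __init__(self, threshold=4, initial=False):
        self.threshold = threshold   # hypothetical value; would need tuning on real hardware
        self.stable = initial        # last accepted state (True = off-hook)
        self.candidate = initial     # state currently being counted
        self.count = 0

    def sample(self, raw):
        """Feed one raw reading; return the new stable state if it changed, else None."""
        if raw == self.stable:
            # Reading agrees with the accepted state: drop any pending candidate.
            self.candidate = raw
            self.count = 0
            return None
        if raw != self.candidate:
            # A different state shows up: start counting from scratch.
            self.candidate = raw
            self.count = 0
        self.count += 1
        if self.count >= self.threshold:
            self.stable = raw
            self.count = 0
            return raw
        return None


debouncer = HookDebouncer(threshold=3)
readings = [True, False, True, True, True, False, True]
print([debouncer.sample(r) for r in readings])
```

In this run the isolated readings at positions 0 and 5 are ignored, and only the three consecutive off-hook readings produce a state change.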
Update, May 2: I am currently looking at the changes I need to make to invoke an echo canceller correctly. I think that the best way is to work along the following lines. In order to keep a consistent time difference between OUT and IN samples, I need to keep track of the number of samples sent and received. To this end, it is vastly more convenient to have the same number of samples (or “packets” in oufxs.c terminology) per URB. Currently, the driver allows module parameters for setting these two independently. This was a great tool for debugging and tuning, but I think the two distinct parameters are not really needed anymore. So, I will merge the two distinct sets of module parameters and associated variables into one. After that, I will just need to keep track of an offset between URBs that are sent OUT and the respective IN ones. Normally, this offset should be constant; however, the board may delay the receipt of an OUT USB frame, so I need to look carefully at this, using the mirrored sequence numbers to resynchronize in the case of delayed or lost USB frames (note, this was a real issue before I hacked the clock of the ISR to be consistent with the USB SOF frame rate, but normally it should not happen anymore; I need to check though). After getting this inter-URB offset, I think that just invoking the echo canceller will be trivial. Basically this is it; however, the devil always lies in the details, so I will report more as I go through my plan. Let’s see.
Update, May 3: In the meantime, I have assembled another two V2 dongle boards. The results, at least for those who see the glass half-full, are not discouraging: one of the two new boards worked right away, showing no excessive heat issues. The other board failed in an early test stage (problems in communication between the PIC and the 3210, which usually translates into a problem in the PCM bus and especially the FSYNC signal, which in turn means a badly soldered 3210 pin or a totally damaged chip). Both boards might be fixable, but this means time-consuming labwork to diagnose the problem and perform surgery on the failed board. That is, things that I could not possibly do in a production line.
Let me summarize: out of three V2 dongles, one died while on duty (DC-DC converter issue), one worked perfectly well (so far…) and another was found dead in early testing. I don’t know if all this sounds satisfactory to you; however, if I were to read this blog, I would tend to say that the hardware is problematic. I don’t know if this is due to hand-soldering, in which case I can expect better rates when I get to mass-produce boards. The experience of readers w.r.t. similar projects would help me there. You see, I am reluctant to go and order a quantity of assembled boards from some assembly factory if the rate of dead boards is expected to be 50% or even more… If, on the other hand, hand-soldering delicate SMD materials is known to cause a high rate of damaged boards compared to what one can expect from an industrial-grade assembly facility, then I am probably OK to go on (OK, after trying and possibly incorporating the patch proposed by Gabor). Moreover, this means that a DIY kit will have high chances of resulting in a bricked circuit, a situation that will eventually kick back to me for providing the DIY option to the blog readers, even at a not-for-profit price. What do you, blog readers, think about this?
On a different course now: I am progressing with the driver. I have implemented hook debouncing and am going to test it soon. And I have put some more thought into the echo cancellation issue. Fortunately enough, I already have sequence numbers in my USB packet headers. I can use these to deduce the actual OUT sample that is being acknowledged by each IN sample, so that I can match the two when calling the echo canceller. In addition, I think I can fix this mapping between sequence numbers and samples to be static, so I won’t need to update it (thus, I will need no complex structures and locking).
I will report more on this a bit later; for the time being, what I would appreciate very much would be to receive some comments on the dead boards issue, in particular from people who have experience in similar projects.
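The matching idea can be sketched as follows, in Python for brevity. This is only an illustration under stated assumptions, not the driver’s code: I assume that each IN packet mirrors the sequence number of the OUT packet it acknowledges, and that a fixed-size table indexed by sequence number modulo its size is enough to hold the OUT samples until their IN counterpart arrives. The names and the table size are hypothetical.

```python
RING_SIZE = 8  # hypothetical; must exceed the largest expected OUT-to-IN lag, in packets

class EchoPairing:
    """Static mapping from sequence numbers to slots, so no dynamic bookkeeping is needed."""

    def __init__(self):
        self.ring = [None] * RING_SIZE

    def on_out(self, seq, samples):
        """Remember what was sent OUT under sequence number `seq`."""
        self.ring[seq % RING_SIZE] = (seq, samples)

    def on_in(self, mirrored_seq, samples):
        """Return the (out_samples, in_samples) pair for the echo canceller, or None."""
        slot = self.ring[mirrored_seq % RING_SIZE]
        if slot is None or slot[0] != mirrored_seq:
            return None  # reference never stored, or already overwritten (lost/late frame)
        return slot[1], samples


pairing = EchoPairing()
pairing.on_out(5, [10, 20])
print(pairing.on_in(5, [11, 19]))   # matching reference found
print(pairing.on_in(6, [0, 0]))     # nothing was sent under sequence number 6
```

Because the slot for a given sequence number never moves, the order in which IN and OUT completions arrive does not matter, as long as the lag stays below the table size.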
Yes, you guessed right: this post is about Open USB FXS interworking with Asterisk. And it does work indeed! After a setup of half an hour, I was able to get Asterisk to recognize the board (the respective Dahdi channel, to be exact), to dial from a phone set attached to the board and to listen to some Asterisk voice prompts. Not bad at all, is it?
The environment on which I tested was a Debian lenny system, with some packages installed from sid (the latest unstable Debian version). The main such packages were Asterisk v1.6 and Dahdi v2.2. All related .deb packages were built from the respective source packages. I don’t know if other Asterisk versions work with the dahdi drivers. Maybe not, and the driver needs porting to zaptel as well (if so, it’s not a problem anyway, porting should be trivial).
Of course, there is a list of several serious caveats. To begin with, the voice quality on VMWare is horrible. Probably this is due to the insufficient CPU and I/O resources that the VMWare environment can offer to Asterisk and the driver, both of which require real-time priority to work. I haven’t yet tried to run the whole thing under a native Linux. I trust things will be better there, but I promise to report any incurable issues that I run into.
The second serious caveat is the plug-and-play nature of a USB device as opposed to the relatively fixed nature of Asterisk and its configuration. When I configured my device under the [phones] section of /etc/asterisk/chan_dahdi.conf (the location may vary in your system) and started Asterisk without the dongle board being plugged in yet, Asterisk managed to get itself a majestic SIGSEGV (or was it SIGABRT? I forget). It does not make sense for the channel driver to insist on working if a configured device does not even have an associated device file on the system (this is exactly the case here). So this segmentation fault is probably a bug, which needs further investigation.
Another issue is that, depending on the order in which multiple Open USB FXS devices are plugged into the system, they may be assigned different channel numbers each time. This will in turn mean that, depending on the order in which a phone is plugged into the system, it may get a different extension number (this is for sure a great feature, don’t you agree? Imagine yourself being a VoIP provider and mixing up customers’ extension numbers each time a device is plugged or removed from the system; wow!…).
An additional yet unresolved caveat is the echo canceller. I have not yet built one into the oufxs driver; that’s not an issue in itself, but it turns out that it is not very easy to write the actual code, because the IN and OUT completion callbacks are asynchronous. So, the event sequence IN(n), OUT(n), OUT(n+1), IN(n+1) is perfectly valid. OK, but now if you think about how to submit the samples of such a sequence to the echo canceller, you’ll see that it gets to be a bit tricky.
Believe it or not, none of the above sounds like a real problem to me anymore. The next few days will show; however, my project’s biggest goal since its first conception has now been achieved. I am really moved and excited about it, though my writing style probably does not reveal too many emotions. I believe I’ll offer myself a few days off the project now (plus the necessary quantity of beer to celebrate it). I am also thinking about writing a dedications page, where I will make sure not to forget that person who, about a year ago, mailed me asking for my phone number, and then called me in order to try to convince me that it was not going to work (“you are not going to get yourself anywhere with the PIC” was his final argument).
No, I am not going to repeat the lyrics of the famous Doors’ song here. It’s just that I have decided to start a new post for reporting my progress and open issues in porting my driver to dahdi. To begin with a short summary, dahdi porting is going very well, modulo some problems that I think I will be able to resolve sooner or later. All this makes me feel like I am approaching land after a long sea journey, hence the title of this post.
Here are the details (nasty as usual).
My new module, called oufxs.ko, loads successfully and registers with dahdi-base without any problems. I have now fixed the “location” of the card, so when the card is first plugged into a USB port, dahdi_scan reports e.g.:
description=Open USB FXS board 1: setup not started
devicetype=Open USB FXS
location=USB 2-1 device #6
OK, you noticed the “USB 2-1 device #6” above; did you also notice the “description” message with the “setup not started” note? This is thanks to the board initialization worker thread, which changes the description string as it goes along with initialization. Successive calls to dahdi_scan thus report the following, in the order listed here, with changes from one step to the next taking approx. half a second to one second, depending on the initialization step:
description=Open USB FXS board 1: setup not started
description=Open USB FXS board 1: SLIC sanity check
description=Open USB FXS board 1: initial powerdown
description=Open USB FXS board 1: set indirect regs
description=Open USB FXS board 1: set dc-dc convrtr
description=Open USB FXS board 1: dc-dc cnv powerup
description=Open USB FXS board 1: VBAT pwrleak test
description=Open USB FXS board 1: ADC calibration
description=Open USB FXS board 1: Q5/6 calibration
description=Open USB FXS board 1: LBAL calibration
description=Open USB FXS board 1: final step setup
description=Open USB FXS board 1: up-and-running
BTW, dahdi_tool also reports correctly the board and displays the description, refreshing it whenever the current board status changes. The “alarms=UNCONFIGURED” line is the next thing to worry about. The dahdi_cfg utility needs to be executed to configure signaling before the board can be used by dahdi. After some guesswork, I think that the only configuration really required in /etc/dahdi/system.conf is a line containing “fxols=N”, where N is the the dahdi channel for the board (1, if no other dahdi hardware exists on the system). After that, dahdi_scan reports:
Notice in the above dump that unimplemented or reserved 3210 registers are not probed, and their contents are printed as zeros.
How about actual audio? So far I have implemented only the OUT part (if you ask why not also the IN part, I was too lazy, or cautious if you prefer, to start writing the IN leg of the code without having thoroughly tested the OUT one). The OUT part appears to be working OK: using “fxstest /dev/dahdi/1 tones” as a test, I hear various tones, including “dial tone”, “ring tone”, “busy tone”, etc., in succession. A few audible clicks from time to time can, according to my former experience, be attributed to VMWare’s sloppiness, so no worries here, although I still need to test more thoroughly under native (non-virtualized) Linux. Provided that everything goes OK with the OUT part, I trust that the IN part will not be that hard to write either. Summary: dahdi does fine with sending audio to my board by means of the new driver, and receiving audio does not look like an issue at all.
So far, so good. Now comes the bad news: “fxstest /dev/dahdi/1 ring” (or “fxstest /dev/dahdi/1 polarity”, which also sets the phone set ringing for a very short while) causes a complete system hang :-(. So far, I have not been able to find a good explanation for that. The nature of the hang is similar to what I have seen when trying to lock a spinlock twice. Judging from various debugging attempts (e.g., not loading a tonezone in /etc/dahdi/system.conf and expecting the code circa line 5323 of dahdi-base.c to return -ENODATA since no zone is loaded but, contrary to expectations, getting another system hang), it could be related to the locking of chan->lock in line 5324 (my version is from debian source package dahdi-linux-126.96.36.199~dfsg). Why this lock would hang the system is a bit of a mystery to me, but I trust I’ll be able to debug the situation further, if nothing else by means of extensive printk statements in the dahdi code.
There is also another caveat that I need to resolve. Wctdm.c, whose logic I am mimicking in my new driver, reads or sets various board and ProSLIC (3210/3215 in Digium boards) registers in interrupt context. As an example, while servicing a PCI interrupt from the board, if the data received indicates a 3210 hook change alarm, the hook-state ProSLIC register 68 is checked in-band. OK, this is fine if you work with a PCI board, where setting a 3210 register only involves a few inb/outb instructions, however it’s a disaster with USB, where one needs at least one bulk OUT packet (and its confirmation, hence a 2-ms separate I/O sequence) in order to communicate with the board. So, I guess that I need to somehow embed such commands within the isochronous OUT stream. This is not as hard as it sounds. The above example (checking the hook state) is already serviced in this manner (status is embedded in the IN packet stream) in my former driver, so copying it here is no problem. But embedding commands into the OUT stream will definitely need some additional work and will require a firmware upgrade.
Meanwhile, Alok Prasad (see his comments in previous posts) has reported an issue with porting the original openusbfxs driver to kernels prior to 2.6.20. The problem lies in usb_anchor, which was introduced in later kernel versions. In my reply to Alok, I argued that
…the anchor is a place where pending URBs can be stored in order to cancel them easily if need be (e.g., if the driver unloads). It is relatively easy to craft a workaround, because in my code the active URBs are known at any time: I use usb_bulk_msg() for bulk IN/OUT, so the anchor is not used there. For isochronous IN/OUT, if one does not have an anchor, one can cancel active urbs by walking through dev->outbufs and dev->in_bufs and checking for the state of each of those: if dev->XXXbufs[i].state==st_sbmtd then dev->XXXbufs[i].urb is submitted to the core and must be canceled in case of a disconnect or driver unload. With this workaround, the anchor can be avoided.
While I keep myself busy with hunting nasty spinlock bugs and with other ugly details of dahdi porting, if any reader of this blog can assist by coming up with the above usb_anchor workaround, this would be of great help.
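To make the idea in my reply to Alok a bit more concrete, here is a small userland Python model of it. It is only a sketch of the logic, not kernel code: the `Buf` class, the `ST_FREE` state name and the `cancel` callback are assumptions made up for illustration (in a real driver the callback would be something like usb_kill_urb()), and only the st_sbmtd state name comes from the text above.

```python
from dataclasses import dataclass
from typing import Callable, List

ST_FREE = "st_free"     # hypothetical name for an idle buffer
ST_SBMTD = "st_sbmtd"   # state named in the text: URB handed to the USB core


@dataclass
class Buf:
    urb: str            # stand-in for a struct urb pointer
    state: str = ST_FREE


def cancel_submitted(bufs: List[Buf], cancel: Callable[[str], None]) -> int:
    """Cancel every URB that is currently submitted; return how many were cancelled."""
    cancelled = 0
    for buf in bufs:
        if buf.state == ST_SBMTD:
            cancel(buf.urb)      # a real driver would call usb_kill_urb() here
            buf.state = ST_FREE
            cancelled += 1
    return cancelled


def cancel_all(outbufs: List[Buf], inbufs: List[Buf],
               cancel: Callable[[str], None]) -> int:
    """Walk both isochronous buffer arrays, as on disconnect or driver unload."""
    return cancel_submitted(outbufs, cancel) + cancel_submitted(inbufs, cancel)
```

The point of the model is that no anchor is needed as long as each buffer records whether its URB is currently in flight; the walk simply replaces what the anchor would have tracked.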
This is the situation right now. I am going to report further progress later in this post. Apart from the worrisome system hang, everything else looks really good! Maybe the board’s “hello, Asterisk!” day is not that far away anymore. Even though this day is not today or tomorrow, or even next week, I have to admit that it certainly looks much, much closer now than when I first started the project. After all, this post is great news: while not yet in the harbor, I can see land ahead, hence the title.
Update, April 17: The kernel lockup/hang mystery has just been solved! It seems I fell victim to my own thoughts when I wrote
OK, this is fine if you work with a PCI board, where setting a 3210 register only involves a few inb/outb instructions, however it’s a disaster with USB, where one needs at least one bulk OUT packet (and its confirmation, hence a 2-ms separate I/O sequence) in order to communicate with the board.
just a few paragraphs above. It turns out that the kernel lockup nightmare was hidden exactly in this paragraph. Why? Because, what is by far worse than the 2ms delay that I was babbling about, is the fact that bulk USB communication between the driver and the board is based on usb_bulk_msg(), which presumably is using blocking I/O. This means that the invoking process issues a bulk OUT URB and is then put in a sleep state while waiting for a URB completion or a timeout. Now, here is what was happening: when the DAHDI_TXSIG_START ioctl was handled by the driver, I had chosen to do the same thing that wctdm.c does: write 3210’s DR 64 of course, to set the phone ringing! This operation is done with the channel spinlock locked (chan->lock — this lock is acquired by dahdi-base right before invoking the ioctl handler in the hardware device driver), and at the same time, the isochronous engine was running, meaning that dahdi_transmit() would be called eight times each millisecond. What is one of the first things that dahdi_transmit() does? Acquire the channel spinlock, of course! So, the channel lock was acquired by dahdi-base on behalf of fxstest, then the calling process for fxstest went to sleep for two milliseconds (one to transmit the bulk OUT URB for setting DR 64, and one to receive a confirmation). To begin with, letting a process hold a spinlock while going to sleep is a very serious kernel programming error that can result in lockups. Meanwhile, just to make sure that the lockup would occur, the background isochronous engine thread was still running and invoking dahdi_transmit, which tried first-thing to acquire the same spinlock that was being held by the sleeping fxstest process. This was causing a kernel lockup. Bingo! I verified my guess by removing the dr_write() macros that set various DRs in all places but the initialization worker thread. And — yes! — the kernel hang disappeared. 
Of course the phone set did not ring, since DR 64 is not being set yet, but my driver no longer locked up the kernel. Now that I have tracked down this issue, I am going to fix these register writes to use isochronous OUT piggybacking. This means that, if no other unpleasant surprises wait for me down the road, the dahdi driver will be ready within a few days from now!
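For readers who want to see the lockup mechanism in miniature, here is a toy Python model of it. It is a deliberately simplified, single-CPU sketch: the `SpinLock` class, the `simulate()` function and the `spin_limit` cutoff (which stands in for "spinning forever") are all made up for illustration and have nothing to do with the real kernel primitives.

```python
class SpinLock:
    """Toy spinlock: no sleeping, callers just retry until it is free."""

    def __init__(self) -> None:
        self.owner = None

    def try_lock(self, who: str) -> bool:
        if self.owner is None:
            self.owner = who
            return True
        return False

    def unlock(self, who: str) -> None:
        if self.owner != who:
            raise RuntimeError("unlock by non-owner")
        self.owner = None


def simulate(sleep_with_lock: bool, spin_limit: int = 1000) -> str:
    """Model one ring request on a single CPU; return 'ok' or 'lockup'."""
    lock = SpinLock()
    # dahdi-base takes the channel lock on behalf of fxstest,
    # then calls the driver's handler.
    lock.try_lock("fxstest")
    if not sleep_with_lock:
        # The handler only queues the register write and returns,
        # so the lock is released before the next audio tick.
        lock.unlock("fxstest")
    # Otherwise the handler sleeps inside a blocking bulk transfer
    # while still owning the lock.

    # Next isochronous tick: the transmit path spins on the same lock.
    # On one CPU the sleeping owner never gets to run again while we spin.
    for _ in range(spin_limit):
        if lock.try_lock("iso-engine"):
            lock.unlock("iso-engine")
            return "ok"
    return "lockup"
```

Calling `simulate(True)` models the blocking register write done with the lock held, and `simulate(False)` models the version where the write is merely queued for later delivery.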
Update, April 18: I have implemented the DR set piggybacking logic in both the driver and the firmware. It did not work right away, so some testing and debugging is still needed, but the basis is there and there should be no fundamental error in the logic. What’s nice is that the kernel does not hang anymore on the DAHDI_RING ioctl. I do get a couple of serious errors though. When disconnecting the board, fxstest receives a SIGSEGV and the module reference count remains 1, even though the program that has opened the device gets killed. As a result, the module cannot be unloaded anymore and becomes a zombie. Some fixes here and there cured another kernel hang when the board was unplugged and then plugged back in. I guess all of these are race conditions, as the usb core calls the module’s disconnect() method, which in turn starts dismantling the device instance, while the device is still hooked within dahdi. Anyway, I’ll soon resolve these errors, I’m sure.
Update, April 20: I think I fixed the module reference errors. The most significant error was that I was calling dahdi_unregister() too early, and thus the close() that occurred when the calling process exited was not received by my driver. Hence, the module’s reference count was left equal to one and the module could not be unloaded. Moving dahdi_unregister() from the oufxs_disconnect() callback to the oufxs_delete() one fixed it (you may check the source of the older driver to see what these do). So the driver looks stable now, does not hang or freeze the system and does not put the module into a zombie state anymore. Next thing to do is to debug the piggyback register setting capability. Once this is fixed as well, all that’s left is the IN isochronous engine, and then the driver will be ready!
Update, April 20 (later): Piggybacking of register set commands on OUT packet headers now works perfectly and “fxstest /dev/dahdi/1 ring” rings the phone!
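To give an idea of what piggybacking a register write on an OUT packet header could look like, here is a small Python sketch. The packet layout is an assumption invented for this example (a three-byte header holding a command flag, a register number and a value, followed by 8 audio bytes for 1 ms of 8 kHz audio); it is not the actual wire format used by the driver and firmware, and the register value in the usage note is chosen only for illustration.

```python
import struct
from typing import Optional, Tuple

HDR = struct.Struct("BBB")        # hypothetical header: flag, register, value
CMD_NONE, CMD_DR_WRITE = 0, 1     # hypothetical command codes
CHUNK = 8                         # assumed: 8 one-byte samples = 1 ms at 8 kHz


def build_out_packet(audio: bytes,
                     dr_write: Optional[Tuple[int, int]] = None) -> bytes:
    """Prepend a header to one audio chunk, optionally carrying a register write."""
    if len(audio) != CHUNK:
        raise ValueError("audio chunk must be exactly %d bytes" % CHUNK)
    if dr_write is None:
        header = HDR.pack(CMD_NONE, 0, 0)
    else:
        reg, val = dr_write
        header = HDR.pack(CMD_DR_WRITE, reg, val)
    return header + audio


def parse_out_packet(pkt: bytes) -> Tuple[Optional[Tuple[int, int]], bytes]:
    """What the firmware side would do: split off the command, keep the audio."""
    flag, reg, val = HDR.unpack_from(pkt)
    audio = pkt[HDR.size:]
    command = (reg, val) if flag == CMD_DR_WRITE else None
    return command, audio
```

For example, `build_out_packet(bytes(8), dr_write=(64, 4))` yields an 11-byte packet whose header asks for a write to direct register 64. The register command travels with audio that had to be sent anyway, so no separate blocking bulk transfer is needed.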
Update, April 22: One more nasty bug resolved: when lowpower mode was set to zero (normal power mode), the phone refused to ring! I traced this to a wrong 3210 register setting during longitudinal balance calibration, due to a stupid copy-paste error of mine (I think I’ve already written quite a few times in this blog that copy-paste bugs were by far the worst ones I’ve dealt with). The fix for this bug also resolved an intolerable noise issue in audio, which I had swept under the rug for the time being. Perfect! Another bug still remains, but this one is kernel-related, so (I guess) easier to fix: when I plug in the board with the module already loaded and then right away, before the board initializes, reload the driver, I get a kernel freeze. By now, I’m experienced enough to suspect that this is a double spinlock acquisition, due to a race condition: plugging in the board with the old module instance causes dahdi_register() to be invoked shortly before, or at the same time as, disconnect() is called, which starts dismantling the device. Ugly, but definitely fixable. So, my short-term roadmap is to resolve this last lockup bug, then to write the IN isochronous engine (and hook/dtmf monitoring) and then, provided that no new bugs come up of course, to publish the new driver code, along with the modified firmware. I have forgotten to mention that, in the meantime, I have assembled one new version-2 dongle board, which is now patiently waiting for me to finish with kernel driver bug-hunting in order to program and test it! So now, unless of course the new dongle goes up in smoke in my face, things really feel like we are approaching the harbor, don’t they?
Update, April 22 (later): The IN part works now (of course, after fixing another nasty copy-paste bug: I was using the OUT USB endpoint instead of the IN one). The audio quality is perfect in both playout and recording. It’s not ready, of course: I need to add the hook state checks, dtmf state, statistics, and a whole lot more (especially the echo canceller, which is one more thing that I have been sweeping under the rug all this time). However, by now the “ship” is in a “safe harbor”: Open USB FXS works with Dahdi, full stop. So, this is going to be my last update for this post. Prepare for the next one, titled “Hello, Asterisk!” or something similar (I’m sure you can guess what it is going to be about, can’t you?).
I am writing this post in order to provide a quick update on all open fronts of the project. These are (a) the possibility for readers to obtain a development board/DIY kit in order to help in advancing the project, (b) the dongle board and its issues and (c) the “channel driver vs. dahdi-compatibility” saga. There’s news on all of these fronts, but don’t hold your breath: I am not going to announce anything really spectacular.
First, let me finish off with the poll (that I have once more included above). I have now set a closing date on it, which is the end of the current week (March 28th at noon in my timezone). As of today, there are already twelve replies. Wow, a team of twelve prospective developers/testers is already a small army! After the poll closes, I will try (no promises yet) to do all that is required so that readers who are interested in helping can order a prototype at the bare cost of materials+p/p+assembly (the latter only for ready-made boards). Details on this will follow, as I need to check my options: for example, if I can convince a local e-shop to run an errand for me, I might make prototypes/DIY kits available through their web site, otherwise I could use eBay and PayPal. I’ll announce more on that later on.
My second piece of news is that I have debugged (well, somewhat…) the heat dissipation issue in my dongle board. By means of the “touch-n-burn-your-fingers” technique, popular among electronics enthusiasts, I was able to trace the heat source. And, surprise, although the 3210 gets hot, it is not the primary source of heat as I had thought. It is the line driver chip (Si3201), which is mounted on the bottom side of the board, that dissipates the most heat. The 3201 gets hot quickly, and its proximity to the 3210 makes the latter get hot quickly as well. Why didn’t this show up on my large-form boards? Because there, the hot 3201 is relatively isolated from the other sources of heat dissipation (PIC, 3210) and also has a very good heat sink: a large ground-level copper area on the bottom side of the board.
Back to the dongle: I am not a specialist in computing the thermal resistance of PCBs, but I think I can do a couple of things to redesign the dongle in order to fix the problem. The first thing to try is to relocate the line driver chip and place it as far away as possible from the 3210. The second is to use a four-layer board and to turn the two middle layers into heat sinks. This would work by adding a large-hole “via” underneath the 3201, connected to the thermal pad which is already there. A generous blob of solder would serve as a thermal conductor between the thermal pad and the middle layers. Then, I could use the same method (thermal connection between layers through large-hole vias) to create “heat egress points” to the two surface layers of the board, onto some ground areas located as far away as possible from the 3210.
Other ideas include (i) an adhesive heat sink mounted on the Si3201 (see picture on the left — it’s not very expensive, but it will defeat the low-profile design of the bottom-side of the board) and (ii) redesigning the circuit and the board so as to revert to the discrete transistor-based output stage, as shown in the Silabs reference design. I tend to flirt with the idea (ii), since it might reduce the cost of the board. However, it is quite a lot of work, plus it is a challenge to fit another twelve or so components onto the tiny dongle board, so I really don’t know for now — maybe later…
Note: all the above babbling means that if you choose to manufacture your own dongle before I devise, test and publish some of the above thermal fixes, you risk working with two hot chips, which may end up working unreliably or even burning. Probably the quickest patch here is a heat sink, so I’ll try this ASAP and report back the results.
That having been said, let me now get to the dahdi-driver-versus-channel-driver saga.
The last few days I have been recapping my reading of the Dahdi drivers. As it turned out, after having written my own device driver for the board (or at least a first attempt at one), I was able to understand much more easily what is going on in the Dahdi world. [Note: since the structure of Zaptel, the predecessor of Dahdi, is very similar, for the rest of this discussion I’ll go on with Dahdi and assume that more or less the same things hold for Zaptel as well.]
So, I studied a bit wctdm.c at first. This is the source code of a Linux module implementing a device driver for a family of Digium PCI-based FXS and FXO cards which use Si3210. Some readers may remember that I have, ahem, “borrowed” some initialization values for the chip’s indirect registers in my test code from this very source. Modulo a few changes in register values (e.g., 1 instead of 0 for the TXS/RXS direct registers, different values for the indirect registers that monitor line output transistor power level, etc.), the code that handles the 3210 in this file could just as well manage my board too. On the other hand, this is a PCI driver, while mine is a USB one. I ‘ll come back to that in a few paragraphs.
Then, there is another kernel module, dahdi-base.c. In terms of the Linux module hierarchy, this module exports symbols for a set of common functions used by all other hardware-dependent device drivers. For example, and unlike what I have done with my “openusbfxs” driver, the fops (file operations) section that includes open(), release(), read(), write() and friends is implemented in dahdi-base and not in hardware-dependent device drivers like wctdm.c.
Now what I found very interesting is that the only functions that are triggered by userland requests and need to make it through to the hardware-dependent device drivers are ioctls. Read(), write(), poll() and friends are implemented entirely in dahdi-base.c and do not require any sort of “hooks” in the device drivers. As for read() and write(), there is a totally asynchronous interface between the hardware-dependent drivers and dahdi-base. Here is how I think this works.
Each hardware device driver like wctdm.c implements a dahdi-compliant device structure, which in turn contains a set of channel sub-structures, with one such sub-structure for each actual device that a card implements (remember that a card may implement e.g. four or eight FXS or FXO interfaces; I won’t discuss trunk cards, like E1/T1, for now). The hardware device driver implements a h/w interrupt-level automaton (for example, in the case of wctdm.c this is triggered by PCI IRQs) that inputs and outputs audio data at the pace of the hardware. The device driver reads and writes data to some buffers in the device structure and then invokes two functions, dahdi_transmit() and dahdi_receive().
These latter functions implement a smart circular structure, made out of a set of buffers. The read() and write() syscalls that are implemented in dahdi-base.c read data from, and place data into, these same buffers, which are alternated between dahdi-base and the hardware device driver. This buffer structure does not really require locking between the device driver and dahdi-base, because buffer “ownership” is only modified by the device driver, and this happens only at interrupt level (when the device is ready to read or write more data), by invoking dahdi_transmit()/dahdi_receive().
This looks very similar to the way that my openusbfxs device driver works! The main difference in my current openusbfxs driver is that data are not pushed to or pulled from an “upper-layer” driver like dahdi-base, but are instead interfaced directly to the read() and write() syscalls. However, because of the fine-grained locking involved, it may turn out that my driver imposes some overhead that dahdi does not have. In other words it seems that, as an amateur device driver writer, I may have introduced far too much complexity into my design, and things could become considerably faster by avoiding locking altogether, like Dahdi does.
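The buffer hand-off described above can be pictured with a tiny Python model. This is a loose, single-threaded illustration of a ring of chunk buffers filled by write() and drained by the hardware tick; the `ChunkRing` class, its method names and the all-zero filler chunk are assumptions for the sake of the example, and the real data structures in dahdi-base are certainly more involved.

```python
class ChunkRing:
    """Fixed ring of audio chunks: write() fills slots, the driver tick drains them."""

    def __init__(self, nbufs: int = 4, chunk: int = 8) -> None:
        self.bufs = [bytes(chunk)] * nbufs
        self.filler = bytes(chunk)   # placeholder bytes sent when nothing is queued
        self.head = 0                # next slot the write() side fills
        self.tail = 0                # next slot the driver tick drains
        self.count = 0               # number of filled slots

    def write(self, data: bytes) -> bool:
        """Userland side: queue one chunk; refuse if every slot is in use."""
        if self.count == len(self.bufs):
            return False
        self.bufs[self.head] = data
        self.head = (self.head + 1) % len(self.bufs)
        self.count += 1
        return True

    def tick(self) -> bytes:
        """Driver side, once per hardware tick: take the oldest chunk, or filler."""
        if self.count == 0:
            return self.filler
        data = self.bufs[self.tail]
        self.tail = (self.tail + 1) % len(self.bufs)
        self.count -= 1
        return data
```

The model ignores concurrency completely; it only shows how the two sides take turns on the same set of buffers, with the hardware tick setting the pace.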
Hmmm… Presumably, with my experience from writing the “openusbfxs” driver, I could utilize much of the code in wctdm.c, substituting the PCI interface with the USB core interface, and removing much of the fine-grained locking that my driver is based on. Since making my h/w driver visible to the Linux filesystem level is not needed, I could remove the fops section altogether. For I/O, all I would need to do is invoke dahdi_receive() and dahdi_transmit() as soon as a URB arrives or is ready to ship, respectively. Finally, I would need to implement the ioctls with their Dahdi names (excluding some LED/lamp flashing device-specific ioctls), and that’s largely all there is to it! [OK, there is also the echo cancellation that I need to take care of, but I think this won’t be very hard to add a posteriori].
Which means that, unless there is something big that I am really missing here, I think that rewriting my device driver for Dahdi is much, much easier than writing a channel driver from scratch, especially taking into account that there are tons of functionality already implemented in the Dahdi channel driver that I would need to repeat.
So, over the next few days I am going to set off on this course, and report as I progress through some major steps (e.g., basic module, USB working, board initialization, Dahdi registration and Dahdi I/O). I certainly hope that results will come faster than in my previous attempt, but if I were you, I would not hold my breath. I have tried things in the wrong direction before (and this blog is the very proof of that, just check some of the older posts) and it is not unlikely that things go wrong this time as well.
As usual, I’ll be updating this post as I go on, so you may want to re-check it periodically to see (if and) how work is progressing.
Update, March 29: The poll is now closed. The results are on the top of this page. I have not yet decided how exactly I should proceed, but I’ll let readers know really soon. If anyone intended to participate in the poll but missed the closing date (or has just found out about it past the closing date), no worries: if I produce boards or kits, I am going to leave some headroom and make larger quantities available.
Besides that, I am redesigning the output stage of the dongle, to make room for on-board heat sink areas for the 3201, as far away from the 3210 as possible. Here is the current state of the redesign:
As you can see, the chip is now placed underneath a relatively empty top-side area, which will be covered by the GND fill polygon. Hopefully, this area will conduct much of the generated heat to the air surrounding the board. Moreover, all this area is near the RJ11 plug, and hence close to an opening in the dongle’s case, that will provide some ventilation if needed. Finally, since now the top PCB side near the hot chip is almost clear of components (there are still some remaining ones that I need to move away), a normal external heat sink could be mounted directly on the top side GND copper area if needed.
BTW, the Dahdi-compatible kernel module is also underway. Currently, it just loads but doesn’t do anything useful yet (not even invoke dahdi-base functions). Once I finish my next dongle design attempt, I’ll definitely get more active on that. Stay tuned…
Update, March 31: the heat-revised dongle design is now ready. Here is what it looks like:
As you can see, I have placed a lot of “free copper” on the top side of the “hot area” of the board, in the hope that this will be enough. In addition, I have removed the solder mask from most of this area in order to ease heat radiation. If all this proves to be insufficient, there is also enough room to add two adhesive heat sinks (this is mainly why I have removed the solder mask). I hope all this will suffice, but in order to be sure, I’ll order and assemble one or two prototype PCBs. If these prove to work OK, then I’ll be ready to go on with ordering the necessary parts for DIY kits or ready-made boards.
Update, April 6: On the dongle front, I have ordered a set of two or three prototypes of my new design (shown above) and am waiting for them to arrive. Salva has provided a very useful starter version of a project shopping basket at Mouser.com (see his comment in this post for more information). On the driver front, I am now rewriting some of the initialization stuff and doing a lot of thinking about other parts of the code. Here is a question, if anyone knows: when I get notified of a URB completion, which means that I have received some (typically, four or eight) 1-millisecond chunks, should I call dahdi_receive repeatedly to deliver all the received data to dahdi-base, or should I instead deliver one 1-millisecond chunk per system “tick”? In the meantime, if you are really interested in the dahdi driver, a very useful resource is Tzafrir Cohen’s page on Dahdi-Linux and especially the low-level drivers section. It seems that many of the questions I will have while writing my new module will find answers there.
Update, April 8: I had forgotten to upload all my changes including statistics gathering, modifications to test programs to display statistics, SOF profiling, and the fix to tmr1_isr.asm that synchronizes the board’s clock to the USB SOF. They are all now uploaded to the project’s Google code page. Please note that by now the changes to the openusbfxs kernel module are in a sense obsolete, since the focus of the project has now moved to creating a dahdi-compliant kernel module; however, I am going to use nearly everything from my old module, so it’s a good idea to review the changes if you are actively interested in the code. On the new module front, I have stumbled upon this bug (I am developing on a 2.6.26 kernel and kbuild environment) but have found two workarounds: (1) compiling from the top dahdi directory, having the new module in a subdir of $(TOP_DAHDI)/drivers/dahdi and adding the env variable SUBDIRS_EXTRA, and (2) copying the Module.symvers file that is generated in $(TOP_DAHDI)/drivers/dahdi after compiling dahdi into my own module’s directory and issuing a “make” in that directory. So, I am now able to insmod my new module. In a matter of days, I am going to report on my first tests (and crashes, if any :-)).
Update (same day, later on): the new module recognizes the board (as expected) and registers itself successfully with dahdi-base. Here is what dahdi_scan reports:
Not bad at all, is it? Probably, to be consistent with Dahdi numbering, some ‘0’s should read ‘1’. Also, location “??? FIXME” is printed because what is expected there is that the driver report the USB path of the attached device, but I haven’t put the necessary effort to fix that yet.
Update, April 9: My new prototype dongle boards (3 of them) have arrived and look OK. I am now refreshing my BOM and going to order some parts that I am missing.
Update, April 13: The board initialization with the new driver is now complete (well, almost: the initial URB submissions to get the isochronous engine rolling aren’t yet in place, but I’ll add that soon). I have implemented two new module parameters taken from wctdm.c. The “reversepolarity” parameter does what it says, i.e., causes the driver to use reverse-active linefeed mode (Si3210 direct register 64). The “lowpower” parameter instructs the 3210 to work with an on-hook voltage of 24V and a ringer peak voltage of 50V, as originally hinted by Edwin in his comment (see also my last reply to that comment). BTW, it seems that the low-power mode indeed results in less heat dissipation on the 3201. Thanks for the hint, Edwin! Now I can probably work somewhat longer with my (un-revised) dongle without fearing that it will burn on me. I have left out the tx- and rx-gain parameters for the time being, as I have also done with all the MWVI code and module parameters (MWVI stands for “Message Waiting Visual Indication”, and is used to flash a light bulb on a phone that has one when a voice message is waiting; I guess I could use the board’s LED for a similar visual indication, but I’m not going to do this right now, maybe later…).
Update, April 14: I have now ordered materials for three rev-b dongle prototypes. I guess that my first rev-b prototype will be assembled by the end of this week (will it work though? fingers crossed…). On the dahdi driver front, I am ready to write the code for the isochronous engine. It still bugs me that I am not sure how often I need to tick dahdi_transmit()/dahdi_receive(). Theoretically, these two should be ticked once every millisecond. However, in dahdi_dummy.c (the HR-timer based dahdi timing module), the code seems to tick dahdi_transmit/receive four consecutive times every four milliseconds. So, presumably, I could do the same, by calling the two functions N times at each URB completion time without incurring too much inaccuracy [as a reminder, in isochronous USB it makes sense to transfer N (N ≥ 4) packets per URB, so I get a completion callback only once per N milliseconds]. Thus, I’ll make my first attempt along these lines, and I’ll report on how well it works.
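Here is a minimal Python sketch of the "N ticks per URB completion" idea, just to show the shape of the loop. It assumes 8-byte chunks (1 ms of 8 kHz audio at one byte per sample), and the `receive` and `transmit` callbacks are hypothetical stand-ins for handing a chunk to dahdi-base and fetching the next chunk to play out; none of this is actual driver code.

```python
from typing import Callable

CHUNK = 8  # assumed: bytes per 1 ms of audio (8 kHz, one byte per sample)


def on_urb_complete(payload: bytes,
                    receive: Callable[[bytes], None],
                    transmit: Callable[[], bytes]) -> bytes:
    """Handle one isochronous IN completion carrying N one-millisecond chunks.

    For each received chunk, tick the upper layer once in each direction,
    and collect the N outgoing chunks for the next OUT URB.
    """
    if len(payload) % CHUNK:
        raise ValueError("payload is not a whole number of 1 ms chunks")
    outgoing = []
    for off in range(0, len(payload), CHUNK):
        receive(payload[off:off + CHUNK])   # stand-in for the receive tick
        outgoing.append(transmit())         # stand-in for the transmit tick
    return b"".join(outgoing)
```

With a 4 ms URB, the loop runs four times back to back, so the upper layer still sees one tick per millisecond of audio on average, only delivered in bursts.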