Archive for April, 2009

Rewriting the firmware…

April 27, 2009

[Note: as of now, I have not yet fully finished rewriting the firmware. However, during the process, I have run into a vast amount of interesting information, which I am sure I will forget in the end. So, I decided to document the rewriting process in parallel within this post. When I finish, I’ll probably edit out this first paragraph.]

I decided to rewrite the firmware because of the following test: one second’s worth of 64 kbps PCM data sampled at 8 kHz using 8-bit samples (be it linear, μ- or A-law) consists of 8000 bytes. Using 8-byte chunks, this corresponds to 1000 chunks of data being exchanged between the USB device and the PC within a single second. So, I measured how long my current implementation takes to transfer this amount of data from the PC to the Open USB FXS board. The result: 32 whole seconds.
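The arithmetic behind this test can be spelled out in a few lines of C. This is only an illustrative sketch; the function names are made up for the example.

```c
/* Bytes per second of PCM audio: samples per second times bits per
 * sample, divided by 8 bits per byte. */
unsigned pcm_bytes_per_second(unsigned sample_rate_hz, unsigned bits_per_sample)
{
    return sample_rate_hz * bits_per_sample / 8u;
}

/* Number of fixed-size chunks needed to carry `bytes` of audio (rounded up). */
unsigned chunks_needed(unsigned bytes, unsigned chunk_bytes)
{
    return (bytes + chunk_bytes - 1u) / chunk_bytes;
}

/* Speed-up required for a transfer of `audio_seconds` worth of audio,
 * which took `measured_seconds` to complete, to run in real time. */
double required_speedup(double measured_seconds, double audio_seconds)
{
    return measured_seconds / audio_seconds;
}
```

With the figures above, `pcm_bytes_per_second(8000, 8)` gives 8000 bytes (64 kbit/s), `chunks_needed(8000, 8)` gives 1000 chunks, and `required_speedup(32.0, 1.0)` gives the factor of 32.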

In other words, I would need to make my board perform 32 times faster. The current firmware used USB polling and request servicing in a serial manner, so I judged that this was never going to work. I had to make drastic changes.

One thing that I could not help noticing right away was that I had been using the old version of Microchip’s firmware framework (1.x), whereas the company has moved on to version 2.x, which is very different in its philosophy. Version 2.x is much cleaner and more abstract than 1.x: the framework files are used directly from the compiler’s tree instead of being copied over to the user code (actually, this is not a framework change, but rather an IDE enhancement), and the cumbersome multi-level directory structure of 1.x is gone. Porting my code to the new version proved easy enough, although it took me quite some time.

Porting to 2.x came with a pleasant surprise: the firmware was much, much faster now. My previous test (sending over 1 second’s worth of PCM data) now took only 2 seconds instead of ~32. A likely explanation is that the new version uses USB-specific RAM as buffer space, thus eliminating data copy operations between USB-specific and “user” RAM. Later, I tried to both send and receive the same amount of data; this took slightly more than 4 seconds, which, as I was soon to find out, was the best one could hope for.

Soon after, I started scratching my head about isochronous USB transfers. I read and re-read the standard and tried to find something in the firmware and/or the C++ examples from Microchip that could serve as a starting point. In the course of studying isochronous transfers, I found out why my four seconds was the lower bound for bulk transfer: the USB Bulk transfer mode requires an ACK for each data packet, and a stream of packets is only initiated after a SOF (start-of-frame) packet. The USB host sends one SOF per millisecond. So, sending over a data chunk of 8 bytes and receiving back the upstream equivalent takes four milliseconds (at best): one for the OUT data, one for the ACK back to the host, one for the IN data and one for the ACK back to the device. It was clear that I would never be able to go faster than that unless I increased my chunk size: since every four-millisecond round trip has to carry four milliseconds’ worth of audio at 8 bytes per millisecond, a bare minimum of 32 bytes was needed. Not what I had in mind.
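The reasoning above can be turned into a small C model. It simply encodes the assumption made in the text, namely that each of the four steps (OUT data, ACK, IN data, ACK) costs one full 1 ms frame; it is a back-of-the-envelope sketch with illustrative names, not a statement of how every USB host controller schedules bulk traffic.

```c
#define FRAME_MS          1u  /* one full-speed USB frame (one SOF) per ms */
#define FRAMES_PER_ROUND  4u  /* OUT data, ACK, IN data, ACK: one frame each (assumed) */
#define PCM_BYTES_PER_MS  8u  /* 8000 bytes/s of 8-bit, 8 kHz PCM */

/* Time to move `total_bytes` in each direction using `chunk_bytes` chunks. */
unsigned bulk_exchange_ms(unsigned total_bytes, unsigned chunk_bytes)
{
    unsigned rounds = (total_bytes + chunk_bytes - 1u) / chunk_bytes;
    return rounds * FRAMES_PER_ROUND * FRAME_MS;
}

/* Smallest chunk that keeps up with real time under this model: one round
 * trip must carry all the audio produced while it is in progress. */
unsigned min_realtime_chunk_bytes(void)
{
    return PCM_BYTES_PER_MS * FRAMES_PER_ROUND * FRAME_MS;
}
```

Under this model, one second of audio in 8-byte chunks takes `bulk_exchange_ms(8000, 8)` = 4000 ms, while 32-byte chunks bring it down to exactly 1000 ms.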

This made me look further into isochronous USB transfers. This mode does not require a handshake (ACK packets). Again, the upper bound is one packet per frame, but that was exactly what I needed. Maybe I could implement something like a self-clocked synchronous transfer method as follows: the firmware provides a SOF callback function, which is invoked on every SOF (i.e., once per millisecond). Inside that function, I would send out a chunk using an isochronous IN packet (note that “IN” and “OUT” are host-side terminology, so a USB device sends IN packets and receives OUT ones), and I would receive an isochronous OUT packet. Lost packets would never incur any timeouts. On the host side, I would wait for the IN packet (clocked at one packet per ms by the pace of the SOF) and reply with an OUT packet. Later on, once this tested OK, I would get rid of the ring data structures in my ISR and arrange for the ISR code to work directly on the USB packet buffers without copying (and also to synchronize with the SOF 1 kHz “heartbeat”).
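Here is a host-side simulation of that self-clocked scheme in C. It is a sketch of the idea under stated assumptions, not the firmware: `on_sof()` is a hypothetical name for the SOF callback, and the two arrays merely stand in for the real USB endpoint buffers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ISO_CHUNK 8u   /* bytes per isochronous packet (1 ms of 8 kHz, 8-bit PCM) */

/* Stand-ins for the endpoint buffers. On the PIC these would live in the
 * USB RAM and be handed to the USB engine; here they are plain arrays. */
uint8_t iso_in_buf[ISO_CHUNK];    /* device -> host (an IN packet) */
uint8_t iso_out_buf[ISO_CHUNK];   /* host -> device (an OUT packet) */

static const uint8_t *from_phone;  /* PCM captured from the line */
static uint8_t *to_phone;          /* PCM to be played to the line */
static size_t n_chunks, sof_count;

/* Hypothetical setup for the simulation: `nchunks` chunks of audio each way. */
void audio_attach(const uint8_t *captured, uint8_t *playback, size_t nchunks)
{
    from_phone = captured;
    to_phone = playback;
    n_chunks = nchunks;
    sof_count = 0;
}

/* Hypothetical SOF callback, invoked once per 1 ms frame. It queues one IN
 * chunk and consumes whatever OUT chunk is in the OUT buffer. There is no
 * handshake: a lost packet just means one missing chunk, never a timeout.
 * Returns 0 once all chunks have been exchanged. */
int on_sof(void)
{
    if (sof_count >= n_chunks)
        return 0;
    memcpy(iso_in_buf, from_phone + sof_count * ISO_CHUNK, ISO_CHUNK);
    memcpy(to_phone + sof_count * ISO_CHUNK, iso_out_buf, ISO_CHUNK);
    sof_count++;
    return 1;
}
```

Each call corresponds to one 1 ms frame, so 1000 calls would move one second of audio in each direction.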

My tests using the Microchip Generic USB host-side device driver all failed. As I found out soon thereafter, this device driver does not provide isochronous transfer primitives. Using the bulk data primitives instead caused them to block waiting for an ACK that never came, until, of course, they timed out. No, definitely not good. I had to look further.

Googling the subject brought up some interesting discussions. In particular, this thread on the Microchip user forums shows the efforts of some good men to make the PIC speak isochronous. In a nutshell, they all seem to agree that (1) both the PIC and the firmware can do fine with isochronous transfers, and (2) Microchip’s Generic host-side device driver is inadequate because it does not support isochronous transfer primitives. One example also mentioned libusb-win32 (available here) as an isoc-capable driver. So, I tried changing the device driver in my host-side controller.

[BTW: the Linux kernel supports isochronous transfers, so I am on the safe side with that; maybe it’s time to move to Linux? We’ll see…]

Apart from some minor issues (which I overlooked for the time being), porting my host-side Windows source to libusb-win32 was not that hard. However, my tests produced mixed results. Many times the IN isochronous primitives failed, and there was no good explanation for that. If I only did IN without OUT transfers, then one out of every two primitives worked OK, while the other one failed. When intermixing OUT and IN transfers, the IN success rate was much, much lower, whereas the OUT success rate stayed at 1:2. As of now, I have not yet resolved the issue.

In the meantime, I re-read the thread I mentioned above and noticed that some people report success using the Cypress CyUSB host-side device driver. This opened another testing direction: porting my code once more, this time to Cypress. Having done it once, I thought I might be able to do it again.

Update, April 29: partial success using libusb-win32! One thing I had gotten wrong was that libusb expects isochronous transfer requests to be submitted using large buffers that the library fills in (or drains, for the OUT direction) at its own pace [BTW, judging from a look at CyUSB, it seems to work in a similar manner]. An interesting question then is: how can the user-level program synchronize with these asynchronous primitives? Obviously, one cannot wait until a large buffer is filled and only then proceed, since this would defeat the purpose of using small isochronous transfers. Libusb offers a nice (though totally undocumented) way of doing it: one can ask the driver for the amount of data transferred so far (with the usb_reap_async_nocancel() primitive); if the number returned is larger than last time, the user program can safely assume that one more chunk has been transferred.
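The progress-polling idea can be sketched in C. The byte count is passed in as a parameter rather than obtained by calling the library, because the exact return semantics of usb_reap_async_nocancel() are undocumented; the sketch simply assumes, as described above, that it reports a growing count of bytes transferred so far. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Tracks how far an asynchronous isochronous request has progressed. */
struct iso_progress {
    size_t last_bytes;  /* byte count seen on the previous poll */
    size_t chunk;       /* bytes per isochronous packet, e.g. 8 */
};

/* `reported` is the byte count the driver returned on this poll (assumed
 * to be the value obtained from usb_reap_async_nocancel()). Returns how
 * many whole new chunks completed since the previous poll. */
size_t iso_new_chunks(struct iso_progress *p, size_t reported)
{
    size_t before, now;

    if (reported <= p->last_bytes)
        return 0;                  /* no progress since last time */
    before = p->last_bytes / p->chunk;
    now = reported / p->chunk;
    p->last_bytes = reported;
    return now - before;
}
```

Polling this once per loop iteration lets the user program react to every completed chunk without waiting for the whole large buffer to fill.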

A delicate point with this method (and one of the library’s inadequacies) is that, if the IN pipe misses a packet for some reason, the returned value is not updated, although the pointer into the provided buffer is incremented; thus, a missed packet creates an unidentifiable “hole” in the buffer. Even after receiving the whole buffer, short of using packet integrity checksums or similar tricks, one cannot really tell where the “hole” is, because there is no signaling mechanism by which the driver can inform user space that a packet was missed.

So, I devised a method whereby I query the OUT pipe for its progress and assume that the IN pipe will be ready at the same pace. This seems to work, because the OUT pipe is synchronized with the SOF frames, so the transmitted byte count increases at a steady pace. The nice part is that every millisecond I get the chance to run the relevant user-level code that checks for IN-packet data in a near-synchronous manner.
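The pacing trick can be sketched as follows, again with hypothetical names and with the OUT byte count passed in rather than read from the driver. Each whole OUT packet that has gone out counts as a 1 ms tick, and each tick hands the next IN packet slot to the user code.

```c
#include <stddef.h>
#include <stdint.h>

/* State for pacing IN-buffer processing by OUT-pipe progress. */
struct paced_reader {
    size_t out_seen;  /* OUT bytes already turned into ticks */
    size_t in_pos;    /* offset of the next IN packet to hand to user code */
};

/* Called whenever a fresh OUT byte count is available. Every whole OUT
 * packet sent since the last call counts as one tick, and for each tick
 * the next IN packet slot is passed to `deliver` (which may be NULL),
 * on the assumption that the IN pipe advances at the same pace.
 * Returns the number of ticks processed. */
size_t paced_poll(struct paced_reader *r, size_t out_bytes_now, size_t pkt,
                  const uint8_t *in_buf, size_t in_len,
                  void (*deliver)(const uint8_t *data, size_t len))
{
    size_t ticks = 0;

    while (r->out_seen + pkt <= out_bytes_now && r->in_pos + pkt <= in_len) {
        if (deliver != NULL)
            deliver(in_buf + r->in_pos, pkt);
        r->out_seen += pkt;
        r->in_pos += pkt;
        ticks++;
    }
    return ticks;
}
```

The weakness discussed above is visible in the sketch: if the IN pipe drops a packet, the corresponding slot is still handed to `deliver()`, with whatever stale data it contains.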

However, my success with this method was only partial: using a USB sniffer program, I observed many missed IN packets. It is really hard to tell what causes this. There are two suspects: the firmware and the driver. The firmware might be late in transmitting packets, or the driver may miss them altogether, even though they are transmitted OK. A reason for the first scenario could be the large portion of time that the PIC spends in the TMR1 ISR, so it might respond late to the SOF. Clearly, I need more trial-and-error work here; I’ll report again later by updating this post. [Nevertheless, I am not too worried about all these little problems; a driver in kernel space should be able to handle isochronous transfers much more efficiently; for the time being, I only need to make sure that the culprit is not my firmware.]

Quick update, later the same day: bypassing my ISR (by adding an immediate return instruction) makes the above test work fine with libusb! Now I can even rely on usb_reap_async_nocancel() on the IN pipe catching all packets, so the isochronous IN pipe can be used reliably for timing purposes in the user code! So I guess the explanation is that, because the USB is serviced in poll mode and the ISR in its current form takes a lot of time, the firmware sometimes transmits the isochronous IN packets late. But, as I have already said earlier, I need to rewrite the ISR anyway and get rid of much of the code in there (ring buffer management and so on), so I’ll probably come up with an acceptable tradeoff between ISR time and USB polling. If not, I can still devote some ISR cycles to firing off the isochronous USB transfer from within the ISR in a synchronized manner (see next paragraph).

Moreover (and this is something I have been keeping in mind, carefully sweeping it under the rug so far), I need to synchronize the ISR to the SOF frames anyway. This is because I will need to swap USB/PCM buffers around (preferably using the PIC’s PING-PONG buffering method, which I still need to try out) after the first 8 PCLK cycles of my 32-cycle ISR (remember that PCM audio I/O between the PIC and the 3210 occurs during these first 8 cycles, so buffers had better stay untouched at that stage). All this sounds like quite a lot of work; only this time it seems doable, whereas up to now things felt more like a survival-in-the-jungle boot camp. The “Hello, World” milestone now seems closer than ever before!
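The timing constraint above can be written down as a small C model. It assumes, as the text implies, one ISR cycle per PCLK cycle and 32 cycles per FSYNC period (FSYNC runs at the 8 kHz sample rate, so one period is 125 µs); the function names are hypothetical.

```c
#include <stdbool.h>

#define ISR_CYCLES_PER_FSYNC 32u  /* ISR (PCLK) cycles per FSYNC period */
#define PCM_IO_CYCLES         8u  /* cycles 0..7: bits exchanged with the 3210 */

/* True while PCM bits are being shifted between the PIC and the 3210. */
bool pcm_io_cycle(unsigned cycle)
{
    return (cycle % ISR_CYCLES_PER_FSYNC) < PCM_IO_CYCLES;
}

/* PCM buffers may be swapped only after the 8 PCM cycles are over. */
bool buffer_swap_allowed(unsigned cycle)
{
    return !pcm_io_cycle(cycle);
}

/* Earliest cycle within a period where a swap is safe. */
unsigned first_swap_cycle(void)
{
    return PCM_IO_CYCLES;
}
```

A SOF-synchronized design would then aim to perform the buffer swap at `first_swap_cycle()`, leaving the remaining 24 cycles of the period for USB work.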

Update, May 7: lots and lots of re-planning, strategy changes, rewrites from scratch and, finally, isochronous INs (from the board to the PC) work fine! To cut a long story short, I decided to drop my early ideas of mixing and matching interrupts and USB polling, and to rewrite my TMR1 ISR from scratch in order to make it fully capable of handling isochronous PCM I/O. Early trials were disappointing, in that I saw no packets at all coming from the board. However, when I added code to synchronize the IN isochronous packets once with the SOF frame, I started seeing some packets on the PC. With the aid of a USB sniffer, I noted that, sooner or later, my ISR was missing the right time frame to send a packet. This led to a very tiresome and difficult debugging session on my ISR, until I finally trimmed it so that it does not miss a single clock cycle. It now works fine! So I plan to write a post dedicated to the ISR code, while in the meantime I will be progressing with my PCM audio trials.


Halfway through getting PCM to work

April 13, 2009

At the time I started writing this post, I did not know for sure whether the board’s PCM was working correctly; however, debugging was fun, so I decided to start writing before finishing. To take things in order: as soon as I got the board to ring a phone set, the next thing I did was to extend the controller so that it displays direct and indirect registers together. Here is the result (the rightmost cluster of values shows the indirect registers; yes, I know a descriptive label is missing there, but I was too lazy to add it).

Controller capture, now with indirect registers

After finishing with that (and re-soldering the crystal, which chose that very moment to break loose on one of its pins, nearly giving me a heart attack when the board suddenly died on me), I turned to PCM audio. The test scenario I had in mind was to make the board play an audio message on the phone. To that end, I implemented two more firmware functions, one for sending and one for receiving chunks of PCM audio data (there are lots of things to discuss here, but I am leaving this discussion for a bit later). These functions were supposed to write (read) data to (from) the output (input) ring in the ISR area, respecting the input and output ring pointers so as not to overwrite any data. Then, I wrote a piece of code in my controller program that opens a file with μ-law PCM data and uses the new function to send it over USB. As simple as that.
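The ring discipline described above (the writer never overwrites unread data, the reader never reads past the writer) can be sketched in C as a single-producer, single-consumer ring. This is a generic illustration with made-up names and sizes, not the board’s firmware; on the PIC, one side would run in the USB path and the other in the timer ISR.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64u  /* hypothetical; a power of two keeps the wrap cheap */

/* One slot is always left empty, so head == tail unambiguously means "empty". */
struct ring {
    uint8_t data[RING_SIZE];
    volatile uint8_t head;  /* next slot to write */
    volatile uint8_t tail;  /* next slot to read */
};

static size_t ring_free(const struct ring *r)
{
    size_t used = (size_t)((r->head - r->tail) & (RING_SIZE - 1u));
    return (RING_SIZE - 1u) - used;
}

/* Copies at most `len` bytes in without overwriting unread data;
 * returns how many bytes were accepted. */
size_t ring_write(struct ring *r, const uint8_t *src, size_t len)
{
    size_t n = 0;
    size_t room = ring_free(r);

    while (n < len && n < room) {
        r->data[r->head] = src[n++];
        r->head = (uint8_t)((r->head + 1u) & (RING_SIZE - 1u));
    }
    return n;
}

/* Copies at most `len` unread bytes out; returns how many were read. */
size_t ring_read(struct ring *r, uint8_t *dst, size_t len)
{
    size_t n = 0;

    while (n < len && r->tail != r->head) {
        dst[n++] = r->data[r->tail];
        r->tail = (uint8_t)((r->tail + 1u) & (RING_SIZE - 1u));
    }
    return n;
}
```

Because each index is written by only one side, the producer and consumer need no locking, which is what makes this layout attractive between a main loop and an ISR.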

Am I catching anyone by surprise by saying that this didn’t work? I guess not… Leaving aside a really stupid bug (in the first tries, an off-by-one error resulted in a 0xFF ‘RESET BOARD’ command being sent over, so instead of producing audio, the thing kept rebooting!), nothing but “line noise” could be heard on the phone. A quick first glance through the 3210 datasheet revealed that I had forgotten to set DR 1 to 0x28 (setting the PCME bit to enable PCM audio). After that, the phone started producing a clicking sound that did not even remotely resemble the original audio. OK, better than nothing, I agree, but still not what I wanted.

I then played with the firmware, instructing the board to ignore the data sent via USB and just send out zeros. The clicking sound should have disappeared, but it did not. Looking carefully through the firmware code, I found that I had incorrectly configured the DRX and DTX PIC ports (DRX is the PCM receive path for the 3210, not for the PIC). Fixing this (it took me two tries, because these were also wrong in the TIMR1 ISR assembly code) did it: the clicking sound disappeared. Good! [I am not sure what was causing the clicking sound. My most plausible explanation is that, since both the 3210 and the PIC were placing the DRX line in a high-impedance state, the line was acting like a small antenna and picking up noise from some other nearby signal on the board.]

Then this time PCM ought to work, right? Well, it did not. So I decided to preload the output ring with some data. This did not fix it either, suggesting that my ISR code was not sending out PCM data correctly. I then replaced all the BSF (bit set) and BCF (bit clear) instructions driving the DRX line with BTG (bit toggle) ones. Of course, the result would not be audible, because this represents a constant value (0b10101010, or 0xAA) being sent to the 3210; however, the resulting pattern should be easy to distinguish on the oscilloscope. But what I saw on the glass was certainly not what I was expecting. Here are the PCLK and FSYNC signals; the time base is such that more than two full ISR cycles are displayed:

PCLK and FSYNC

Here is the puzzling DRX signal:

DRX signal
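For reference, the pattern that the BTG test was meant to put on DRX can be modeled in a few lines of C. The sketch rests on an assumption not taken from the ISR code: the line is toggled once per bit slot, MSB first, before the 3210 samples it. Starting from a low level this yields 0b10101010 (0xAA), and starting from a high level its complement, 0x55; either way, the trace should show a regular square wave during the PCM cycles.

```c
#include <stdint.h>

/* Models the BTG test: DRX is toggled once per bit slot (MSB first),
 * starting from `initial_level` (0 or 1). Returns the byte the 3210
 * would latch, assuming each toggle happens before the bit is sampled. */
uint8_t btg_pattern(int initial_level)
{
    uint8_t byte = 0;
    int level = initial_level ? 1 : 0;

    for (int bit = 0; bit < 8; bit++) {
        level = !level;                         /* BTG: toggle the DRX pin */
        byte = (uint8_t)((byte << 1) | level);  /* shift in the sampled bit */
    }
    return byte;
}
```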

This was certainly looking wrong. To remind you about my TIMR1 ISR, the code was supposed to distinguish four ‘phases’ within an FSYNC period. All PCM I/O should occur during the first phase; instead, it seemed that the ISR code responsible for the first phase was executing three times. Back to the PIC datasheet, where I found my bug: I was checking the C (carry) status bit after decrementing a counter from zero to 0xFF with a DECF instruction; instead, I should have checked the N (negative) status bit. I fixed that, re-checked with the scope, and voilà!

Fixed DRX test signal
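The DECF flag behavior behind that fix can be modeled in C. This is a simplified sketch of PIC18 DECF on an 8-bit register that keeps only the C, Z and N bits; it assumes the usual PIC convention that C acts as an inverted borrow on subtraction. It shows that the wrap from 0x00 to 0xFF is marked by N being set (bit 7 of the result), whereas C is cleared on that wrap, so code that treats C like an ordinary carry-out can easily branch the wrong way.

```c
#include <stdbool.h>
#include <stdint.h>

struct status { bool c, z, n; };  /* subset of the PIC18 STATUS bits */

/* Model of DECF f: returns f - 1 and updates C, Z, N.
 * C is an inverted borrow: it is cleared only when the decrement
 * borrows, i.e. when the register was 0x00. */
uint8_t decf(uint8_t f, struct status *s)
{
    uint8_t r = (uint8_t)(f - 1u);

    s->c = (f != 0);            /* no borrow unless f was zero */
    s->z = (r == 0);
    s->n = (r & 0x80) != 0;     /* N mirrors bit 7 of the result */
    return r;
}
```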

Of course, I would not leave without measuring the DTX signal as well (PCM input from the phone, through the 3210, to the PIC). Here is what it looked like:

DTX signal

There are two things to note here. The first is that the actual data seems to consist of a constant 0xFF pattern (which seems OK, since in μ-law encoding this corresponds to a decoder output of zero). The second is the ramp-like pattern to the right of each 0xFF logic-true period. This can be explained by the fact that during non-transmission periods, the 3210’s DTX pin goes tri-state, and the respective PIC input is also high-impedance. So, this pattern probably corresponds to a high-frequency signal, as the energy captured in the line between the 3210 and the PIC gradually discharges through a high-impedance path to GND.

What seemed encouraging here was that, when I spoke into the phone, I could see some “noise” in the data part, and the ramp-like part then consisted of lower-placed ramps. This fits nicely with the theory above, since actual μ-law data contains some zeros, which show up as the “noise” in the data part and as a correspondingly lower ramp-like “trajectory” to GND thereafter. So, the PCM receive path (although the pin is called DTX, it is the receive path from the PIC’s point of view) was basically working, although I was not collecting any data yet.
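The claim that a constant 0xFF stream means silence can be checked with the standard G.711 μ-law expansion. The decoder below follows the widely used reference approach (complement the byte, then rebuild sign, exponent and mantissa); it is a generic sketch, not code from the board or from the 3210.

```c
#include <stdint.h>

/* G.711 mu-law byte -> linear sample (scaled to the 16-bit range). */
int16_t ulaw_to_linear(uint8_t u)
{
    const int bias = 0x84;               /* 132, the standard mu-law bias */
    int sign, exponent, mantissa, magnitude;

    u = (uint8_t)~u;                     /* mu-law bytes are stored inverted */
    sign = u & 0x80;
    exponent = (u >> 4) & 0x07;
    mantissa = u & 0x0F;
    magnitude = (((mantissa << 3) + bias) << exponent) - bias;
    return (int16_t)(sign ? -magnitude : magnitude);
}
```

Both 0xFF and 0x7F decode to zero (they are the positive and negative zero codes), which is why an idle μ-law stream of 0xFF bytes is silent, while 0x80 and 0x00 decode to the extremes of +32124 and -32124.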

What about the transmit path? Well, that was not ready yet. It took another day or two until I noticed a problem with the underrun test in the ISR code: in order to avoid echoing, I had provided for an execution path that stops transmitting PCM when a data underrun is detected, and this condition was never handled afterwards. After removing that path, I finally heard the 125 Hz test tone I was expecting (125 Hz is 1/8 ms, the period of the repeating test-data pattern). But what about the actual audio data? Well… the pace at which the controller currently sends data to the board is very slow.

Actually, this is the interesting part! In contrast to writing, say, a flash disk driver, where I would be able to use 1-kB blocks, in PCM audio one needs to pass around small “chunks” of data in an isochronous manner. If the chunks get too large, considerable (and audible) delay is introduced in the audio path. If the chunks get too small, however, an overrun or underrun condition becomes more likely. So, what is the best chunk size, given the actual processing power of the PIC? And, most important of all, is the PIC fast enough to cope with these requirements?
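To put numbers on this tradeoff, the sketch below computes how much audio one chunk represents (and therefore the minimum delay it adds in each direction) and how many chunks per second must be moved. It assumes 8-bit samples at 8 kHz, as elsewhere in this blog; the function names are illustrative.

```c
#define PCM_BYTES_PER_SECOND 8000u  /* 8-bit samples at 8 kHz */

/* Audio time represented by one chunk, in microseconds: the minimum
 * delay a chunk of this size adds in each direction. */
unsigned chunk_duration_us(unsigned chunk_bytes)
{
    return chunk_bytes * 1000000u / PCM_BYTES_PER_SECOND;
}

/* How many chunks per second must be moved without an over/underrun. */
unsigned chunk_rate_per_second(unsigned chunk_bytes)
{
    return PCM_BYTES_PER_SECOND / chunk_bytes;
}
```

An 8-byte chunk corresponds to 1 ms of audio and 1000 transfers per second; a 64-byte chunk means at least 8 ms of delay per direction, but only 125 transfers per second.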

To answer these questions, I took a fresh look at Microchip’s USB code sample, on which my code is built. It seems that the sample code is not very efficient: lots of functions calling other functions, which in turn copy data around. Very wasteful. The PIC can perform at warp speed when doing USB transfers, using parallel dedicated hardware. But then, data are placed in a special RAM location, and the sample code copies them to “user space”, freeing the buffer back to the USB hardware. Thus, although I am not sure yet, it looks like I can save lots and lots of wasted PIC cycles by throwing away my transmit and receive rings and interfacing my ISR directly with USB memory.

Another, more mundane problem that has escaped me so far is that the FSYNC pulse needs to be shifted one PCLK earlier: I am raising and lowering FSYNC within the first ISR cycle, but the 3210 datasheet on p.16 (and in other places) is clear: PCM transfer starts at the first rising edge of PCLK after the falling edge of FSYNC. Hmm… This means that I need to move the code that pulses FSYNC to the 31st cycle of my ISR.
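The timing rule quoted from the 3210 datasheet can be modeled in C to see where the FSYNC pulse belongs. The sketch assumes one ISR cycle per PCLK cycle, 32 cycles per FSYNC period, and that the first rising PCLK edge after FSYNC falls lands in the next ISR cycle; the function names are hypothetical.

```c
#define ISR_CYCLES 32u  /* ISR (PCLK) cycles per FSYNC period */

/* If the FSYNC pulse is raised and lowered within ISR cycle k, the first
 * PCM bit is clocked on the next rising PCLK edge, i.e. in cycle k + 1
 * (mod 32). */
unsigned first_pcm_cycle(unsigned fsync_cycle)
{
    return (fsync_cycle + 1u) % ISR_CYCLES;
}

/* Inverse: the cycle in which to pulse FSYNC so that data starts in `data_cycle`. */
unsigned fsync_cycle_for(unsigned data_cycle)
{
    return (data_cycle + ISR_CYCLES - 1u) % ISR_CYCLES;
}
```

For PCM data to start in cycle 0, `fsync_cycle_for(0)` gives 31, i.e. the last cycle of the previous period, counting from 0.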

Summarizing, the good news is that the PCM path of the board works both ways (well, so to speak…). So, it is time to look more closely at the actual firmware of the board and optimize the transfer paths, while keeping an eye on zaptel compatibility. It cannot be that hard, can it? I hope to be back soon with even better news!

Update, April 15: there is a PIC USB configuration called “ping-pong buffering” which seems to fit my needs nicely (although I am not yet entirely convinced). It works by using even- and odd-numbered buffers, whose ownership alternates between the CPU and the chip’s USB engine with a single bit change (in other words, very fast). More can be found in the PIC 18F2550 datasheet, p.177. Currently, I am studying this in parallel with Microchip’s USB stack code, to check how easy it will be to adapt Microchip’s sample USB stack to what I need.
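A simplified model of the ping-pong idea, in C: two buffers, each with an ownership flag, where handing a buffer over is a single flag write. The flag is only loosely modeled on the UOWN bit of the PIC’s USB buffer descriptors; the structure layout and function names are hypothetical and do not reflect the real buffer descriptor table.

```c
#include <stdbool.h>
#include <stdint.h>

#define PP_CHUNK 8u

struct pp_buffer {
    uint8_t data[PP_CHUNK];
    volatile bool usb_owns;   /* true: USB engine may use it; false: CPU may */
};

struct pp_endpoint {
    struct pp_buffer buf[2];  /* [0] = even, [1] = odd */
    uint8_t cpu_next;         /* which buffer the CPU fills next */
};

/* CPU side: if the next buffer is free, fill it and hand it to USB.
 * Returns false when that buffer is still owned by the USB engine. */
bool pp_cpu_submit(struct pp_endpoint *ep, const uint8_t *chunk)
{
    struct pp_buffer *b = &ep->buf[ep->cpu_next];

    if (b->usb_owns)
        return false;
    for (unsigned i = 0; i < PP_CHUNK; i++)
        b->data[i] = chunk[i];
    b->usb_owns = true;       /* single-flag ownership handover */
    ep->cpu_next ^= 1u;       /* alternate even/odd */
    return true;
}

/* USB side (simulated): the engine finishes with buffer `which`
 * and returns it to the CPU. */
void pp_usb_complete(struct pp_endpoint *ep, unsigned which)
{
    ep->buf[which & 1u].usb_owns = false;
}
```

While the USB engine transmits one buffer, the CPU can already fill the other, so neither side has to wait for a copy to finish.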


April 6, 2009

Yes, you guessed it right! The title of this post means that I finally got my openusbfxs board to make a phone set ring! But please let me take things in order.

At the point where I had left things in my previous post, I was trying to implement, one by one, the initialization and calibration steps described on p.3 of Silabs’ AN35 application note. Most steps were relatively easy; however, when performing the gain mismatch calibration (manual calibration), I noticed the error due to R7 mentioned at the end of my previous post. After fixing that, I tried to move on with common-mode calibration (steps 17–19 of AN35). However, there I was getting a calibration error.

In order to debug the issue a bit further, I tried to bypass this step and see what would happen if I set the line mode (register 64) to the “forward active” state. What actually happened was that the 3210 objected, stubbornly keeping the line mode in the “open” state. Hmm…

A quick look into the Si321x FAQ turned up the very same question (second question on p. 6 of the FAQ). Unfortunately, it did not give me the correct answer, since my DC-DC converter values were supposedly OK, and I had just finished with manual calibration. What was the reason then?

For almost a week thereafter, I ran just about every test I could come up with against the board. First, I tried to bypass the automatic return to the open state (setting the AOPN bit in DR 67 to zero). This produced a very interesting result: when I attempted to bring the line to “forward active” mode, the DC-DC converter shut itself down. I found no way to instruct it not to, so I had to find another way of figuring out what was wrong.

Enabling power alarm interrupts (DR22 <- 0xFF) enlightened me a bit more: DR 19 showed that I was getting a power interrupt because some (or, on occasion, even all) of Q1, Q2, Q3, Q4, Q5 and Q6 were sensed to dissipate too much power. OK, it was clear then: I had to fix the initial values of Indirect Registers (IRs) 32–34 and 37–39. I have to admit that I had borrowed clues for these values from the zaptel driver, so it made sense to try to find the correct values for my case. On my board I am using the 3201 and not discrete transistors, so I had no idea what the power and thermal coefficients should be. The answer was well hidden at the bottom of p.4 of Silabs’ AN47, where some values are suggested (the same as for SOT89 transistor packages).

At this point, debugging should have ended. Well, it did not, and the reason was a stupid error of mine, as I explain in the next paragraph. So setting the correct values for Q1–Q6 did not solve it; thus, after countless hours of measuring with the voltmeter, devising tens of test sequences in my driver code, trying pumped-up values for IRs 32–34, and just about everything else I could imagine, I finally chose to change-ineer the si3201, just to make sure it was not burnt or something. Of course, as usual, change-ineering did not do it either (advice: choose this method as your last resort, and only if you cannot find anything else to do: it will not fix anything, but it will make you feel better because at least you tried).

In deep despair, I swore strict abstinence from debugging my board for the whole of last weekend. Guess what: this method seems to have worked. Today (Monday), taking a fresh look at my code, I finally found the culprit: because of a stupid copy-paste error, I was not initializing any Indirect Register correctly (all values were written into the same IR)! BTW, this is the second copy-paste bug that has taken me this long to find and fix. [It seems I must not copy-paste any more code and must promise to type in every single bit. Statistically, this will save me weeks of fruitless debugging.] Anyway, as soon as I fixed this, everything worked like magic. So now I have passed steps 16, 17, 18, 19, 20 and 21 of the AN35 calibration and initialization procedure. It then sounded like a good idea to run a few tests with a phone set.
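One way to make this class of copy-paste bug harder to write is to drive the initialization from a table of (address, value) pairs, so that each register address appears exactly once. The sketch below is hypothetical: `ir_write()` stands in for the real indirect-register access, and the values are placeholders, not the ones from AN47 or the zaptel driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 3210's indirect register file; on the
 * board, ir_write() would go through the indirect-access registers over SPI. */
static uint16_t ir_file[256];

static void ir_write(uint8_t addr, uint16_t value)
{
    ir_file[addr] = value;
}

struct ir_init { uint8_t addr; uint16_t value; };

/* Placeholder values for illustration only. Each pair sits on one line,
 * so there is no second copy of an address left to forget to edit. */
static const struct ir_init ir_table[] = {
    { 32, 0x1111 }, { 33, 0x2222 }, { 34, 0x3333 },
    { 37, 0x4444 }, { 38, 0x5555 }, { 39, 0x6666 },
};

void init_indirect_registers(void)
{
    for (size_t i = 0; i < sizeof ir_table / sizeof ir_table[0]; i++)
        ir_write(ir_table[i].addr, ir_table[i].value);
}
```

A read-back loop over the same table after initialization would also catch the "everything lands in one IR" symptom immediately.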

The board initialized OK when I connected the phone set I have at work (a Siemens euroset 2010) to the RJ11 jack. After initialization, taking the phone off-hook is detected in DR 68 (however, putting the phone back on-hook is not, so I need more work there). The “usual” line noise of a POTS phone line is heard from the phone’s earpiece when DR64 is set to 0x01 (“forward active” mode). Moreover, with its current settings, my board can make the phone set ring! Just setting register 64 to 0x04 does it! Wow! That was actually my first milestone, back in 2008 when I first started designing the board! I can’t really believe it took me six months or so to get here!

There are still, however, some signs that I don’t like. The board does not seem to notice when the phone goes on-hook again. The (absolute) values of the voltage produced by the DC-DC converter drop far below the nominal 65V during operation, and I don’t know whether this is normal. So, during the next (few, I hope) days I will have to deal with these issues and correct any bugs I find.

As soon as this is finished, it will be PCM’s turn: I will try to have my board play good old Asterisk’s “Hello, World!” message on my phone. Keeping in mind my current pace of progress, a date for that next milestone should not be expected any sooner than 2010 :-). This time, however, I hope I’ll not piss off the Gods of hardware as much as I have until now, so that they will help me reach this next milestone somewhat faster. We’ll see.

You might just as well ask yourselves what’s down the road. Well, I think that, once (read: if ever) the board gets into a stable state, it will be an easy step to write a zaptel-compliant driver and see how the board will do with Asterisk. This then, if ever accomplished, will be the end of development for this project.

I will update this post as soon as I have the updated versions of the TIMR1 interrupt code, fixed board, schematic, BOM, etc. uploaded.

Quick update, April 8: about the reduced VBAT value in the forward active mode: this is OK, since this is exactly what setting TRACK to 1 does. In this “loop current tracking mode”, the chip provides just the voltage needed to drive enough current through the loop, which presumably results in considerable power savings. So I need not worry about this. What is still not OK, though, is that going back on-hook is not detected; I have not started debugging this yet.

Another quick update, April 9: after setting register 67 to its default value of 0x1F, the board now correctly detects the transition from off-hook back to on-hook (don’t ask me why; I cannot see any plausible reason for this).