Problems with NetBIOS Drivers

Updated 6/6/97

I reran this test on my office system to verify the problem. I was able to unplug the Ethernet connectors and the programs would not hang! It appears that different hardware can change the results.

1/23/97

I finally got the networked client-server system installed at Roseburg Forest Products. Some other work took priority and they didn't need the network system in a hurry. We decided to switch them from the older Artisoft Lantastic software they were using to the standard Windows network. Because they didn't want to upgrade the computer too much, we decided to use Windows for Workgroups on the ticket computer and, for the DOS real-time systems, the free client software that Microsoft provides on its FTP site. This is the MS Client 3.0 software that ships with Windows NT Server to allow older DOS computers to access Windows servers.

Part of my test certification routine is to set up computers in my office and run the real-time software overnight to see if there are any problems. This is done over several nights because some problems only show up after several days. I use the standard Windows NetBEUI (their brand of NetBIOS) protocol drivers for my office computers because they seem to run the fastest and because using TCP/IP for both Ethernet and dial-up access at the same time causes problems.

The first round of testing showed no problems at all with the DOS client software. The DOS clients run my PROT/RTX real-time system and there are a lot of interrupts going on. The RTC is set to interrupt at 1024 Hz. Not all of these interrupts do much work; most of the work gets done at 50 Hz. The high interrupt rate is needed for precision timing of the tasks. In such high interrupt environments, typical office software may fall apart.
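The two rates mentioned above do not divide evenly (1024 / 50 = 20.48), so a plain "every Nth interrupt" counter would drift. The following is a minimal sketch, written in Python for readability, of one common way to derive a slower rate from a faster tick with an integer accumulator. It is not the PROT/RTX code, and the names `RateDivider` and `count_work_ticks` are made up for illustration.

```python
TICK_HZ = 1024   # RTC interrupt rate given in the text
WORK_HZ = 50     # rate at which most of the work is done


class RateDivider:
    """Derive a slower periodic event from a faster tick (hypothetical helper)."""

    def __init__(self, tick_hz: int, work_hz: int) -> None:
        if not 0 < work_hz <= tick_hz:
            raise ValueError("work_hz must be between 1 and tick_hz")
        self.tick_hz = tick_hz
        self.work_hz = work_hz
        self.acc = 0  # integer accumulator, so no rounding error builds up

    def tick(self) -> bool:
        """Call once per fast tick; returns True when the slow task is due."""
        self.acc += self.work_hz
        if self.acc >= self.tick_hz:
            self.acc -= self.tick_hz
            return True
        return False


def count_work_ticks(seconds: int) -> int:
    """Simulate `seconds` of fast ticks and count how often the slow task runs."""
    divider = RateDivider(TICK_HZ, WORK_HZ)
    return sum(divider.tick() for _ in range(seconds * TICK_HZ))
```

With this scheme the slow task fires every 20 or 21 fast ticks, and the long-run average stays at 50 per second. Most fast ticks do nothing more than the addition and compare, which matches the idea that only some interrupts do real work.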

Just before I was to install the system, I started to notice some problems. The Windows 95 Win32 software would crash once in a great while in the NETBIOS.DLL routine. Considering the page fault was in Microsoft's module and not mine, I assumed there must be something wrong with their end. They use NetBIOS for some of the communications between computers on their network, but most of the work of moving file data gets done in the main protocol. If you install TCP/IP or IPX protocols, the MS software will use the NetBIOS protocols on top of the primary protocols to send messages between the computers to establish a connection. It appears that they have only debugged the parts of the NetBIOS interface that are important to them and haven't tested all functions, specifically some that I am using. In addition to these problems with the Win32 client program, I was also experiencing crashing of the DOS server running on the real-time computer.

I took a few steps backwards at this point and installed the TCP/IP protocol and removed NetBEUI. The TCP/IP protocol stack uses TCP/IP to push basic file data between the computers, but the Microsoft network software uses a form of NetBIOS layered on top of TCP/IP to communicate. My software was able to automatically make use of this NetBIOS layer and communicated just as if the NetBEUI protocol were loaded. With these protocols installed, my crashing problems at both the Win32 client and DOS server ends seemed to go away. After several days of testing, I decided it was safe to install at the Roseburg site.

The installation at Roseburg Forest Products went well and in a few days we had the office computer talking to the real-time computers over about 500 feet of thin coax. The speed seemed to be a bit slower than what I saw in my office, and a few collisions were showing up on the hub indicator lights, but everything seemed to be working well. I hung around for an extra day to look at some other things that we had planned.

And then it happened. One of the real-time computers lost communications with the office computer. I walked out to the sorter and checked the DOS computer. It was locked up at the keyboard but was still running the sorter. This indicated that the RTC real-time interrupts were still in action and processing the real-time data, but the DOS program had gone into an infinite loop of some sort. The ticket printing was part of a background DOS program loop and this didn't appear to be operational. Our only choice at this point was to reset the computer. Because the DOS program was locked up, it wasn't saving the database periodically, and we would need to dump the sorter and start the bins over again. Things weren't too bad because they were doing a custom cut and were only sorting a few packages.

To explore the problem a bit further, we set up three computers in the lab to simulate the combination of one DOS real-time server, a WFW ticket computer server and a Win95 client. As part of this test, I wanted to see what would happen if I disconnected the DOS server from the network when it was trying to write to a file on the WFW ticket computer. I started up the programs in their simulation mode and the DOS program was writing to the ticket computer about every minute. I then disconnected the network T connector from the DOS computer to see what would happen. The DOS computer tried to access the ticket computer but now the network was gone. It just sat there. And sat there...and sat there.

Normally, I would expect the network software to time out after a while and return control to the program with a file error to indicate it couldn't access the file. If the software was really smart, it would not wait as long the second time it tried to access the dead network. Smart software from Microsoft?
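The behavior I would expect can be sketched in a few lines. This is only an illustration of the idea, written in Python; it is not how MS Client, Lantastic, or my own software is implemented, and the names `GuardedShare` and `NetworkTimeout` and the timeout values are assumptions. A blocking call is given a time limit, and once the remote end has failed to answer, later attempts give up much sooner until one succeeds again.

```python
import threading


class NetworkTimeout(OSError):
    """Raised when a guarded call does not finish in time (hypothetical name)."""


class GuardedShare:
    """Wrap blocking calls to one remote share with a timeout.

    After a timeout the share is marked down, and later calls wait only
    `retry_timeout` seconds until one of them succeeds again.
    """

    def __init__(self, first_timeout: float = 5.0, retry_timeout: float = 0.5) -> None:
        self.first_timeout = first_timeout   # assumed values, not measured ones
        self.retry_timeout = retry_timeout
        self.down = False

    def call(self, func, *args):
        timeout = self.retry_timeout if self.down else self.first_timeout
        result = {}

        def worker() -> None:
            try:
                result["value"] = func(*args)
            except Exception as exc:  # hand the error back to the caller
                result["error"] = exc

        # A daemon thread is abandoned, not killed, if the call never returns.
        thread = threading.Thread(target=worker, daemon=True)
        thread.start()
        thread.join(timeout)
        if thread.is_alive():
            self.down = True
            raise NetworkTimeout(f"no response after {timeout} s")
        self.down = False
        if "error" in result:
            raise result["error"]
        return result["value"]
```

A caller would wrap each remote file operation, for example `share.call(open, path, "ab")`, and treat `NetworkTimeout` like any other file error. A single-tasking DOS program has no threads to lean on, so there the time limit has to come from the network driver itself, which is exactly what was missing in this case.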

I let the computer sit disconnected from the network for several minutes and it never timed out. It never came back from the initial DOS call to open the file. I plugged the network connector back in and it instantly recovered and continued running. How interesting! Exactly the failure mode that I seemed to see on the real system. Could the excessive length of the network coax be causing network errors? Could the software become confused and not recover correctly? I had double-checked the network topology calculations, and length could not be the problem.

I put the older Lantastic software on the test system and tried the same disconnection test. The Lantastic software would time out after a few seconds and return control to the keyboard.

One of the real sorter computers crashed again. At this point I decided to put the old Lantastic software back in because it had been running for years without trouble. I reconfigured the AUTOEXEC.BAT and CONFIG.SYS files. The systems booted up and started running on Lantastic.

I had to get a copy of the latest Lantastic 7.0 for the office computer running Windows 95 before it could again communicate with the DOS sorter computers. When I finally got it back on the network, I noticed what seemed to be a slight increase in speed over the Microsoft network software. It has been running for about a week now without problems. It appears the network problems are gone.

In summary, the Windows 95 NetBIOS protocols can't be trusted. The Windows NT NetBIOS protocols seem to be reliable (I am testing Windows NT and NetBIOS client-server over TCP/IP at another site, without the DOS real-time servers - so far no problems). The MS Client 3.0 software provided for DOS is unsuitable for real-time work because it can crash in a high interrupt environment and will lock up when disconnected from the network. Artisoft Lantastic 5.0 to 7.0 seems to work well in high interrupt environments, has the added advantage of being faster, and will recover from network errors that cause MS Client to lock up.

The new Lantastic 7.0 provides a TCP/IP stack, supplied by another vendor, for use with their software, but I haven't tested it yet. I am using their standard NetBIOS protocol stack on my systems.

Before using network software on any computer that must perform critical real-time operations, you will need to experiment with network faults and high interrupt rates to determine if there are any problems. Certain combinations of hardware and software can cause problems. My laptop computer uses a PC Card adapter and it seems to have more problems than my desktop computers running NE2000 compatible adapters. The driver software provided with the network adapter card can cause problems too.

The moral of this story: if Microsoft ever takes over the whole software world we will never have anyone else to go to when their junk quits working.

© James S. Gibbons 1987-2015