More Real-Time Windows NT

Update: 9/9/98

Another issue with using Windows NT for real-time operation has come to my attention (I may cover this better in a future tip). Windows NT gives you no control over the disk cache, and this can cause problems if you don't have enough system RAM. In a 64 MB system, NT will use almost all of the available memory for the disk cache. This means that if you have a time-critical process running while also accessing large disk files, your real-time process can be swapped out to the paging file and suffer large timing delays. While you can lock your application pages into memory, this isn't an easy solution, because you would also need to lock all the DLLs that your app calls. The best solution I have found so far is to add more memory to the system and keep file access to a minimum. It also appears that accessing files over a network may keep cache use to a minimum. The folks at System Internals have covered this cache problem in detail and even supply some information that may allow you to control the cache operation.

Update: 2/3/98

The Road Runner crash discussed below was fixed by using their latest driver, version 1.2. It appeared to be an issue with dual processors only. I tested the board on a single processor system and it would not crash.

The 2 millisecond timing resolution also seems to be related to either the dual processor kernel or possibly a change introduced in the NT version installed on this machine. Both my single and dual processor systems run version 4.0 with service pack 3, but the older single processor system has 1 millisecond timing resolution.
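The cause of the 2 millisecond resolution is not established here. Purely as a model, the sketch below assumes that a periodic timer's expirations are only noticed on system clock ticks, and shows what a waiting thread would then observe. The function name, the microsecond units and the tick values are illustrative assumptions, not measurements from either system.

```cpp
#include <cassert>
#include <vector>

// Model only: wake-up times (in microseconds) of a thread waiting on a
// periodic timer when expirations are noticed only on system clock ticks.
// The function name and parameters are hypothetical.
std::vector<long> ObservedWakeTimes(long periodUs, long tickUs, int wakeCount)
{
    assert(periodUs > 0 && tickUs > 0 && wakeCount >= 0);
    std::vector<long> wakes;
    long nextDeadline = periodUs;  // first expiration of the periodic timer
    for (long now = tickUs; static_cast<int>(wakes.size()) < wakeCount; now += tickUs) {
        if (now >= nextDeadline) {
            wakes.push_back(now);  // the thread is released on this tick
            // Expirations that piled up before this tick collapse into one wake-up.
            while (nextDeadline <= now) nextDeadline += periodUs;
        }
    }
    return wakes;
}
```

Under this assumption, a 1000 microsecond period with a 2000 microsecond tick gives wake-ups at 2000, 4000 and 6000 microseconds, so the thread sees a 2 millisecond period even though 1 millisecond was requested. With a 1000 microsecond tick the requested period is observed.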

The long delays in the thread making the calls to the National Instruments NI-DAQ library routines appear to be delays in the routines. Another thread I created without any I/O calls still has delays but they are at most about 20 milliseconds. The fact that the NI thread would have larger delays on I/O calls suggests that something in the kernel or NI driver is serialized and blocks the calls for short periods. The average time they block appears to be about 1 millisecond, with much longer delays on occasion.

The Original Story Follows

I recently started a major real-time project using Windows NT and now have some more benchmark information. I am using a dual Pentium II 300 MHz computer and can put one of the real-time threads into a spin loop to test how long it gets interrupted. This can't be done on a single processor system because it will lock up the computer, but with two processors the spin-loop thread gets assigned to one processor and the other processor is free to run other threads of lower priority.

WARNING: don't run a spin loop thread on a single processor system or it will hang. You may experience hangs when running certain applications even on a dual processor system.

Test Setup

The system I am running is a no-name clone with a Tyan S1692D Tiger 2 ATX motherboard, two Pentium II 300 MHz processors (keeps my office nice and warm) and 96 MB of high speed DIMM memory. The operating system is Windows NT 4.0 with service pack 3. All programs are developed with Borland C++ Builder 1.0.

The system has several I/O interfaces linked into the program:

  • National Instruments DIO-24 
  • National Instruments TIO-10 
  • BitFlow Road Runner Digital Camera Interface (industrial cameras)


The National Instruments cards are interfaced using the latest version 5.1 NI-DAQ drivers and the BitFlow is interfaced using version 1.0 of their software. This is a real test with real I/O.

Spin Loop Test Results

For all the tests, the main program is switched to the real-time priority class and the test thread runs at the time critical priority. This is the highest possible priority. Both the DIO-24 and TIO-10 interfaces use threads to periodically read and write the digital I/O on the cards. The TIO-10 thread is the one I used for testing. The Road Runner card appears to add 2 threads into the system and I had no control over their priority. I have run benchmarks on the program and it appears these threads don't consume any CPU time unless the Road Runner is active. In the tests, I kept the Road Runner inactive except to specifically test delays it added to the system. I used the timeBeginPeriod call to speed up the system timer for 1 millisecond resolution. The QueryPerformanceCounter call is used to obtain timing information. The GetTickCount call can't be used due to the extremely short times being measured.
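Converting performance counter readings to elapsed time is simple arithmetic: divide the difference between two readings by the counter frequency. The sketch below shows that step in isolation, and why a counter that reports whole milliseconds cannot resolve intervals of a few microseconds. The helper names are hypothetical, and the 1 MHz frequency in the usage note is an illustrative assumption, not the frequency of the test system.

```cpp
#include <cassert>
#include <cstdint>

// Convert the difference of two performance-counter readings to milliseconds.
// 'frequency' is in counts per second, as QueryPerformanceFrequency reports it.
// (Hypothetical helper; the thread code performs the same division inline.)
double TicksToMilliseconds(std::int64_t t0, std::int64_t t1, std::int64_t frequency)
{
    assert(frequency > 0);
    return 1000.0 * static_cast<double>(t1 - t0) / static_cast<double>(frequency);
}

// What a counter limited to whole milliseconds would report, at best, for the
// same interval: anything shorter than 1 ms reads as zero.
std::int64_t WholeMilliseconds(std::int64_t t0, std::int64_t t1, std::int64_t frequency)
{
    assert(frequency > 0);
    return (t1 - t0) * 1000 / frequency;
}
```

With an assumed frequency of 1,000,000 counts per second, two readings that are 5 counts apart convert to 0.005 milliseconds, while the whole-millisecond version reports 0 for the same interval.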

Here is the code used to set the process priority class and speed up the system clock for 1 millisecond ticks:

SetPriorityClass(GetCurrentProcess(),REALTIME_PRIORITY_CLASS);
timeBeginPeriod(1);


This is the basic code for the spin loop thread's Execute method:

LARGE_INTEGER t0,t1,f;
double MinTime,MaxTime,AveTime;
int NTime;

void __fastcall TTioWorkThread::Execute(void)
{
  #define TIME_DELAY 1
  i16 i;
  bool outflag;
  HANDLE hTE;
  LARGE_INTEGER tLT,t;
  double dt;
  int SkipCount=50;

  hTE=(HANDLE)CreateWaitableTimer(NULL,false,"TIOWait");
  if (hTE!=NULL) {
    tLT.QuadPart=0;
    if (!SetWaitableTimer(hTE,&tLT,TIME_DELAY,NULL,NULL,true)) {
      CloseHandle(hTE);
      hTE=NULL;
    }
  }

  MinTime=1024.0;
  MaxTime=0.0;
  AveTime=0.0;
  NTime=0;
  QueryPerformanceFrequency(&f);
  QueryPerformanceCounter(&t0);

  while (!Terminated) {
    Sleep(0);  //this releases execution to equal priority threads

    QueryPerformanceCounter(&t1);
    t.QuadPart=t1.QuadPart-t0.QuadPart;
    if (t.QuadPart>0) {
      dt=t.QuadPart;
      dt=dt/f.QuadPart;
      if (SkipCount>0) SkipCount--;
      if (SkipCount==0) {
        if (dt>MaxTime) MaxTime=dt;
        if (dt<MinTime) MinTime=dt;
      }
      NTime++;
      if (NTime<0) {
        NTime=1;
        AveTime=0.0;
      }
      AveTime+=dt;
    }
    t0=t1;
  }

  if (hTE!=NULL) CloseHandle(hTE);
}
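The bookkeeping in the loop above can be read as a small running-statistics accumulator. The sketch below restates it in portable C++ so the roles of the globals are easier to see. The struct and member names are hypothetical, and dividing the running sum by the sample count to get the displayed average is an assumption, since the display code is not shown.

```cpp
#include <cassert>

// Running statistics for loop periods, mirroring the globals in the thread
// code. Min/max are recorded only once the skip counter has reached zero,
// which excludes the start-up samples. (Names are hypothetical.)
struct PeriodStats {
    double minTime = 1024.0;  // same starting value as MinTime
    double maxTime = 0.0;     // same starting value as MaxTime
    double sumTime = 0.0;     // plays the role of AveTime (a running sum)
    int    count   = 0;       // plays the role of NTime
    int    skip;              // plays the role of SkipCount

    explicit PeriodStats(int skipCount = 50) : skip(skipCount)
    {
        assert(skipCount >= 0);
    }

    void Add(double dt)
    {
        if (skip > 0) --skip;
        if (skip == 0) {
            if (dt > maxTime) maxTime = dt;
            if (dt < minTime) minTime = dt;
        }
        ++count;
        sumTime += dt;
    }

    // Assumed readout: running sum divided by the number of samples.
    double Average() const { return count > 0 ? sumTime / count : 0.0; }
};
```

For example, with a skip count of 3 and samples 9.0, 8.0, 1.0 and 3.0, the first two samples are left out of the minimum and maximum (giving 1.0 and 3.0), while all four contribute to the average of 5.25.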


I am able to get a readout of the minimum, maximum and average thread period on this dialog:

[Screenshot: tip6 1]

The dialog above shows the basic spin loop timing. The loop operations are seen to take 5 microseconds, but when the mouse is moved, it is interrupted for 795 microseconds. Here is a table of basic operations and how they affected the spin loop (all times in milliseconds):

Operation        Minimum Period (ms)   Maximum Period (ms)   Average Period (ms)
Switch Windows   0.005                 0.774                 0.005
Open Explorer    0.005                 4.601                 0.005


I tried to load Adobe Photoshop to see what would happen when there was a lot of disk and memory activity, but for some reason it wouldn't load with the thread running.

We can see from this screenshot of Task Manager that one processor is busy all the time running the thread:

[Screenshot: tip6 2]

Fully Loaded System Test Results

For the next test I disabled the spin loop and added more functions into the system. The TIO-10 thread now waits on a timing object with a period of 1 millisecond. The DIO-24 card's thread is also enabled at the time critical priority and runs on a 1 millisecond period. The Road Runner interface is run in a diagnostic mode to add further load to the system. This arrangement runs 6 threads and they call 3 I/O drivers. Here is the full TIO-10 thread code:

void __fastcall TTioWorkThread::Execute(void)
{
  #define TIME_DELAY 1
  i16 i;
  bool outflag;
  HANDLE hTE;
  LARGE_INTEGER tLT,t;
  double dt;
  int SkipCount=50;

  hTE=(HANDLE)CreateWaitableTimer(NULL,false,"TIOWait");
  if (hTE!=NULL) {
    tLT.QuadPart=0;
    if (!SetWaitableTimer(hTE,&tLT,TIME_DELAY,NULL,NULL,true)) {
      CloseHandle(hTE);
      hTE=NULL;
    }
  }

  MinTime=1024.0;
  MaxTime=0.0;
  AveTime=0.0;
  NTime=0;
  QueryPerformanceFrequency(&f);
  QueryPerformanceCounter(&t0);

  while (!Terminated) {
    if (hTE==NULL)
      Sleep(TIME_DELAY);
    else
      WaitForSingleObject(hTE,TIME_DELAY*10);

    QueryPerformanceCounter(&t1);
    t.QuadPart=t1.QuadPart-t0.QuadPart;
    if (t.QuadPart>0) {
      dt=t.QuadPart;
      dt=dt/f.QuadPart;
      if (SkipCount>0) SkipCount--;
      if (SkipCount==0) {
        if (dt>MaxTime) MaxTime=dt;
        if (dt<MinTime) MinTime=dt;
      }
      NTime++;
      if (NTime<0) {
        NTime=1;
        AveTime=0.0;
      }
      AveTime+=dt;
    }
    t0=t1;

    if (TIO10Form->BoardOpen) {
      TIO10Form->PortOutput[1]++;

      for (i=0;i<NTIOP;i++) {
        DIG_In_Port(TIO10Form->deviceNumber,i,&TIO10Form->PortInput[i]);
      }

      if (UserTIO!=NULL) UserTIO();

      for (i=0;i<NTIOP;i++) {
        switch (i) {
          case 0:outflag=TIO10Form->Port0Dir->ItemIndex==1;
            break;
          case 1:outflag=TIO10Form->Port1Dir->ItemIndex==1;
            break;
          default:
            outflag=false;
        }
        if (outflag) DIG_Out_Port(TIO10Form->deviceNumber,i,
          (TIO10Form->PortOutput[i]&~TIO10Form->ForceMask[i])
          |(TIO10Form->ForceOutput[i]&TIO10Form->ForceMask[i]));
      }
    }
    else
      Terminate();
  }

  if (hTE!=NULL) CloseHandle(hTE);
}
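The value written by DIG_Out_Port in the code above is a bitwise merge: bits selected by ForceMask are taken from ForceOutput, and all other bits come from PortOutput. The sketch below isolates that expression. The function names and the 16-bit type are assumptions made for illustration, and the reading of the mask as an override of the normal output comes only from the expression itself.

```cpp
#include <cassert>
#include <cstdint>

// Merge the program's output value with forced bits.
// Where a mask bit is 1 the forced value wins; where it is 0 the normal
// output passes through. (Hypothetical helper mirroring the expression above.)
std::uint16_t ApplyForce(std::uint16_t output, std::uint16_t forceValue,
                         std::uint16_t forceMask)
{
    return static_cast<std::uint16_t>((output & ~forceMask) | (forceValue & forceMask));
}

// Small self-check showing the bit patterns involved.
void ForceExample()
{
    // Upper nibble forced to 0, lower nibble passes through from the output.
    assert(ApplyForce(0xAA, 0x0F, 0xF0) == 0x0A);
    // Only bit 0 is forced on.
    assert(ApplyForce(0x00, 0xFF, 0x01) == 0x01);
    // An empty mask leaves the output untouched.
    assert(ApplyForce(0x1234, 0xFFFF, 0x0000) == 0x1234);
}
```
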


I am not showing all the global variables here and I hope you can fill in the picture. The port I/O calls perform input and output operations for the 2 I/O ports on the TIO-10. Here is the BitFlow Road Runner running a diagnostic:

[Screenshot: tip6 4]

The Road Runner is a high performance camera input card. The dialog indicates it is transferring about 87 megabytes per second across the PCI bus and into main memory. This data transfer doesn't seem to load the CPUs very much:

[Screenshot: tip6 5]

The TIO-10 thread gives this timing with some mouse movement:

[Screenshot: tip6 3]

Very interesting! Even though I requested 1 millisecond timing resolution, it appears that Windows NT will only allow 2 millisecond timing. I ran the following timing tests with all the interfaces running:

Operation        Minimum Period (ms)   Maximum Period (ms)   Average Period (ms)
Switch Windows   1.033                 6.148                 1.954
Open Explorer    0.622                 9.901                 1.958
Open Photoshop   0.359                 21.680                1.990
Exit Photoshop   0.512                 23.339                1.959


I also tried to run a disk speed diagnostic but when I did, crash, blue screen, panic... This is all that was put into the event log:

The computer has rebooted from a bugcheck

The bugcheck was: 0x0000000a (0xe129e038, 0x00000008, 0x00000000, 0xfd1b650e)

Microsoft Windows NT [v15.1381]

A full dump was not saved.

Now that I have run CHKDSK and fixed some NTFS index errors, I'm back on the air. I also backed up the system to avoid future problems and had to type a few lines into this file that I hadn't saved. The crash seems to be related to a hardware interrupt fault in the Road Runner driver. I still wanted to run disk tests on the hard disk and network drives, so to avoid problems I won't run the Road Runner diagnostic. Here is the timing for these tests:

Operation              Minimum Period (ms)   Maximum Period (ms)   Average Period (ms)
FAT 512K Records       0.349                 153.480               2.150
FAT 512K Records       0.366                 106.573               2.130
Network 512K Records   0.454                 29.200                1.955
Network 16K Records    0.477                 8.151                 1.953
NTFS 512K Records      0.333                 119.444               2.280
NTFS 16K Records       0.347                 165.272               2.200
FAT 512K Records       0.348                 16.819                2.140
NTFS 512K Records      0.338                 267.174               2.660


It appears that writing data to the local hard disk causes some problems. The disk test I ran operates at normal priority and uses the standard Win32 calls for disk I/O. I ran the first test twice because I thought there was a problem with the timing routines. The maximum period only increases to around 20 to 30 milliseconds at first, but later it jumps up to the high value. I think that this has something to do with the disk cache filling up. The system I am using has plenty of memory and it takes a while for the cache to fill up. For the last two tests I used Norton System Doctor to watch the disk cache fill up with the real-time thread and disk diagnostic running. As the cache filled up, the maximum period increased to about 45 milliseconds. The cache stayed almost full for some time, and then jumped up a bit, and the maximum period jumped to 267 milliseconds. The last FAT test never filled the cache all the way and the maximum period was only 16 milliseconds. The last NTFS test produced the worst delay. These large delays seemed to increase the average period and it was slowly decreasing when the reading was taken.
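The way a single long delay lifts the average and then fades can be shown with a little arithmetic. The sketch below computes a cumulative mean over a run of normal periods plus one outlier. The function name is hypothetical, the 2.0 millisecond normal period in the usage note is a round illustrative figure, and assuming exactly one outlier is a simplification.

```cpp
#include <cassert>

// Cumulative mean, in milliseconds, of 'normalCount' samples of 'normalMs'
// plus a single outlier of 'outlierMs'. (Hypothetical helper.)
double MeanWithOutlier(long normalCount, double normalMs, double outlierMs)
{
    assert(normalCount >= 0);
    return (normalCount * normalMs + outlierMs) / (normalCount + 1);
}
```

With a 2.0 millisecond normal period and one 267 millisecond delay, the mean is 4.65 milliseconds after 99 normal samples and 2.265 milliseconds after 999, so the reading keeps drifting back toward the normal period as more samples arrive.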

The network tests ran over 10 megabit per second Ethernet, using a 3COM PCI adapter and another Windows NT workstation. There were only slight increases in the maximum period, and they seemed to be related to the record size. Smaller records gave shorter delays.

Conclusions

This is a test of Windows NT real-time performance with real I/O drivers. One of the drivers was found to be unreliable during disk operations, but further testing will be needed to verify this. Most operations caused delays on the order of 20 milliseconds. This should not be a problem for AC control systems, where solid state relays have a 9 millisecond turn-on uncertainty and input relays can have even longer delay times. If there is a lot of heavy disk I/O going on, the excessive delays experienced could be a serious problem.

The smaller delays experienced using a network drive suggest a solution for this problem. Why not store most of the system and your application on a network drive? This solves some backup problems if you keep your server backed up properly. It might be possible to use client-server methods similar to those I use in my products to remove the operator interface from the real-time computer. This keeps local disk and program access to a minimum to avoid the problems caused by scheduling delays. The drawback is that it increases program complexity.

The most important thing learned from this test was that you shouldn't plan on using Windows NT for a real-time application without doing some tests first. In my case, I am not worried because I can move the application to client-server mode if necessary.

© James S. Gibbons 1987-2015