Unix to Windows Porting Dictionary for HPC

Porting Tips


High-Performance Computing (HPC) Resources

Microsoft provides many resources to assist developers working with High-Performance Computing (HPC) applications. These include:

Additionally, the UNIX Migration Project Guide discusses many of the details of moving to the Windows platform from other operating systems.

Platform-specific Compiling

When modifying code during porting, it is almost always desirable to make changes in such a way that the code continues to build on the original platform. The usual way to do this is with preprocessor conditionals, that is, directives such as #ifdef and #ifndef. Be careful, however, to make each conditional test for the operating system or for the compiler, whichever is appropriate. Just because code is being built on Windows (detectable with "#ifdef _WIN32") doesn't mean that changes specific to Visual Studio are needed; the code might be compiled on Windows with GCC through Cygwin or MinGW.

To test for Visual Studio, use the _MSC_VER macro; for example,

// If we are building with MSVC, include the Windows headers;
// otherwise, use the POSIX unistd.h header
#ifdef _MSC_VER
  #include <windows.h>
#else
  #include <unistd.h>
#endif

A complete list of macros defined by Visual Studio 2008 can be found on MSDN at http://msdn.microsoft.com/en-us/library/b0084kay.aspx.
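The difference between testing for the operating system and testing for the compiler can be sketched as follows. This is a minimal illustration, not a complete detection scheme; the function names target_os() and compiler_family() are hypothetical, and the only macros assumed are _WIN32, _MSC_VER and __GNUC__.

```c
/* Operating system test: depends only on the target platform. */
const char *target_os(void)
{
#if defined(_WIN32)
    return "Windows";
#else
    return "non-Windows";
#endif
}

/* Compiler test: independent of the operating system test above.
   Some other compilers also define __GNUC__, hence "GCC-compatible". */
const char *compiler_family(void)
{
#if defined(_MSC_VER)
    return "MSVC";
#elif defined(__GNUC__)
    return "GCC-compatible";
#else
    return "other";
#endif
}
```

A MinGW build on Windows would be expected to report "Windows" from the first function and "GCC-compatible" from the second, which is exactly the case where a Visual Studio specific change must not be applied.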

unistd.h

Many applications written for Unix include the header file "unistd.h", which provides access to the POSIX API. Since many versions of Windows do not (by default) include POSIX compatibility, this header is a good "red flag" for developers examining source code during the porting process. If a source file includes unistd.h, it will almost certainly require modification to build with Visual Studio.
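As a sketch of the kind of modification involved, two common unistd.h dependencies (the process ID and sleeping) can be hidden behind small wrappers. The names port_getpid() and port_sleep_ms() are hypothetical. The example tests _WIN32 rather than _MSC_VER because the Windows calls it uses are available with any Windows compiler; that choice is an assumption, and a project may prefer the POSIX path when building with MinGW or Cygwin.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
  #define _POSIX_C_SOURCE 200809L  /* request POSIX declarations */
#endif

#ifdef _WIN32
  #include <windows.h>   /* Sleep() */
  #include <process.h>   /* _getpid() */
#else
  #include <time.h>      /* nanosleep() */
  #include <unistd.h>    /* getpid() */
#endif

/* Hypothetical wrapper: returns the current process ID. */
long port_getpid(void)
{
#ifdef _WIN32
    return (long)_getpid();
#else
    return (long)getpid();
#endif
}

/* Hypothetical wrapper: suspends the caller for about ms milliseconds. */
void port_sleep_ms(unsigned int ms)
{
#ifdef _WIN32
    Sleep(ms);
#else
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
#endif
}
```

The rest of the program then calls the wrappers and no longer includes unistd.h directly.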

Environment Variables

Unix and Windows handle environment variables quite differently, and this can be a cause of confusion and frustration for developers porting software between platforms. Windows has two types of environment variables: system and user. The system variables can only be set by users with administrator access; in contrast, user variables are set by users and are contained in their profiles. The total environment is the union of these two, with some special rules for variables such as %PATH%.

The overall %PATH% variable used by Windows is constructed by combining the system and user %PATH% variables, with the system %PATH% first and the user %PATH% after it. This means that users cannot override the system %PATH% setting. For developers dealing with programs that require or expect certain environment variables, the best strategy is to never put anything in the system's %PATH% that users might want to override. In contrast to the way many Unix systems are set up, administrators cannot set up default paths and allow users to override them if needed.

The setx application allows users to create or modify environment variables, and is available on most versions of Windows.
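Inside a program, environment variables are read the same way on both systems with the standard getenv() function, while setting one for the current process differs. The following is a minimal sketch; env_or_default() and env_set() are hypothetical helper names. Note that these calls affect only the running process and its children, unlike setx, which stores the value for future processes. Visual Studio may warn that getenv() is deprecated in favor of its own alternatives.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
  #define _POSIX_C_SOURCE 200809L  /* request setenv() declaration */
#endif

#include <stdlib.h>

/* Hypothetical helper: value of the variable, or fallback when the
   variable is unset or empty. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Hypothetical helper: sets a variable for the current process only.
   Returns 0 on success, -1 on failure. */
int env_set(const char *name, const char *value)
{
#ifdef _WIN32
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, 1) == 0 ? 0 : -1;
#endif
}
```

Keep in mind that variable names are case-sensitive on Unix but not on Windows, so code that relies on two names differing only by case will not port cleanly.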

Shared Libraries and DLLs

Using DLLs can be confusing to developers used to working with shared libraries in a Unix environment. Thankfully, there are several good tutorials available on the Web to help programmers come up to speed quickly on creating and linking to DLLs, including:

File Permissions and Ownership

Windows NT and derivatives don't use file permissions the way developers accustomed to programming in Unix may expect; instead, they use Access Control Lists (ACLs). ACLs are more powerful than the Unix-style read/write/execute designations for user/group/other, but they are also more complex. Some resources discussing ACLs on Windows in greater detail can be found at

Using #ifdef for Conditional Compiling

You can use '#ifdef' to help you keep a single source code base over multiple operating systems or versions of your application.

The preprocessor directives used for conditional compiling are:

  • #ifdef
  • #ifndef
  • #if
  • #elif
  • #else
  • #endif

These directives test preprocessor macros, not operating system environment variables. A macro can be defined in the source code with "#define", or defined for a particular build on the compiler command line with the "-D" option (the Visual Studio compiler also accepts "/D"). For example,

-D DEMO_VERSION

defines the macro "DEMO_VERSION" just for this build. With the macro defined, conditional source code can be used. For example,

	#ifdef DEMO_VERSION
		check_demo_expiry();
	#else /* DEMO_VERSION */
		check_license();
	#endif /* DEMO_VERSION */

This will check how much longer the demo will work when DEMO_VERSION is defined. Otherwise, when DEMO_VERSION is not defined, the application will use the code to check the license.

You can have many conditional expressions in your source code. You can conditionalize the source code for operating systems as well. For example, "#ifndef _WIN32" guards code that should be built on every system except Windows.

Note that in the example above each '#else' and '#endif' is followed by a comment containing the name of the macro being tested. This is good coding form and should be followed to improve code readability, maintenance and understanding. It becomes important when larger blocks of source code are conditionalized or when many nested conditionals are used.

There is no "#else if" directive; its role is filled by "#elif". The directives can also be nested. For example,

	#ifdef DEMO_VERSION
		check_demo_expiry();
	#else /* DEMO_VERSION */
	  #if PREMIUM_VERSION
		premium = 1;
	  #endif /* PREMIUM_VERSION */
		check_license();
	#endif /* DEMO_VERSION */

You can check whether a macro is defined to a particular value by using "#if" rather than "#ifdef":

	#if DEMO_VERSION == 1

You can also use logical operators, with parentheses to control the order of evaluation:

	#if (DEMO_VERSION == 1 && WIN32 == 2)

You may explicitly have a conditional active (evaluate to true), "#if 1", or inactive (evaluate to false), "#if 0".
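The directives described in this section can be combined in one small function. This is only a sketch: the enum and the function name build_edition() are hypothetical, and DEMO_VERSION and PREMIUM_VERSION are assumed to be supplied with "-D" on the command line when wanted.

```c
/* Hypothetical edition codes selected at compile time. */
enum edition {
    EDITION_LICENSED = 0,
    EDITION_DEMO     = 1,
    EDITION_PREMIUM  = 2
};

/* Reports which edition this translation unit was built as. */
enum edition build_edition(void)
{
#if defined(DEMO_VERSION)
    return EDITION_DEMO;
#elif defined(PREMIUM_VERSION)
    return EDITION_PREMIUM;
#else /* neither DEMO_VERSION nor PREMIUM_VERSION */
    return EDITION_LICENSED;
#endif /* DEMO_VERSION */
}
```

Built with no "-D" options the function returns EDITION_LICENSED; built with "-D DEMO_VERSION" it returns EDITION_DEMO. Only one of the three return statements is ever compiled.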

Using Include Files When Porting

Include files can contain definitions that are the same on every system as well as definitions whose values are system-specific. The purpose of include files is to improve the readability and the portability of source code.

Many standard include files will continue to be used when moving source code from Unix to Windows. An example of this is the "<stdio.h>" file.

Many of the Unix include files are for the POSIX APIs. Naturally, many of these include files have no identically named match on Windows, or even an obvious alternative. Finding one of these include files when porting to Windows is often an alert that some source code will require changes or will have to be conditionalized to build on different systems. Examining the source will help reveal what actions can be taken. In many instances you can reference the Dictionary entries to find matching, alternative or workaround source code. In some instances the paradigm differences between the two systems mean that a different coding approach is needed for the include file and the source code. An example of this is the use of signals on Unix systems: the include of "<signal.h>" is an alert that the program uses Unix signals, for which there is no direct functional match because Windows has a different paradigm.

Other entries in the Dictionary may help with a resolution when there is a paradigm difference regarding certain include files.
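Not every use of signals is unportable. The small subset defined by the C standard, signal() and raise() from "<signal.h>", is also provided by the Windows C runtime, so a simple in-process interrupt handler can often be kept as it is. The sketch below uses only that subset; the function names are hypothetical. POSIX-only calls such as kill() and sigaction() are not part of this subset and still need a different approach on Windows.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type the C standard
   guarantees can be written safely from a signal handler. */
static volatile sig_atomic_t interrupted = 0;

static void on_interrupt(int signum)
{
    (void)signum;
    interrupted = 1;
}

/* Hypothetical helper: installs the SIGINT handler.
   Returns 0 on success, -1 on failure. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* Hypothetical helper: reports whether SIGINT has been seen. */
int was_interrupted(void)
{
    return interrupted != 0;
}
```

A main loop would call install_interrupt_handler() once and then poll was_interrupted() to shut down cleanly. Handler behavior (for example, whether the handler stays installed after it runs) can differ between systems, so treat this as a starting point only.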

Library Differences

When moving code from Unix systems to Windows the source code typically relies on one or more libraries (outside of system calls). Programs written in the C language, as an example, will always use the C library. The minimum contents of the C library are standardized by ISO, and POSIX (an IEEE standard) adds further APIs on Unix systems. Many Unix systems blend additional APIs into the C library, such as helper functions that are specific to the local system and are not portable. Sometimes other libraries, such as the "libm" math library, are merged with the C library. Between different flavors of Unix these variations result in minor adjustments to build files (e.g. makefiles) and some conditional source code to account for differences in the available APIs.

Many supplementary libraries used by Unix applications are included with the base distribution of the system. If they are not included then they are usually readily available from the Internet. An example would be the OpenSSL library that is included by most Unix systems, but not all.

When moving source code to Windows the standardized libraries are, of course, available. Non-portable APIs may not be available (usually for obvious reasons). When an API is not directly available, a close or similar alternative may be. Some investigation here in the Dictionary, through MSDN or with a search engine may be needed to resolve these APIs.

Some APIs may not be contextually appropriate on Win32 because of the paradigm difference between the two types of operating systems. This is discussed in more detail in other sections of the Dictionary. This type of source code will need to be changed or conditionalized to work with Windows.

Windows supplementary libraries matching the Unix supplementary libraries may be available from the Internet or a 3rd party vendor. While some functionally equivalent supplementary libraries may be available for Windows, some have very different APIs. An example of this is the SSL library typically used on Unix versus the SSL library available on Windows.

Unix Libraries vs. Windows Libraries

On Unix systems there are two major types of libraries: static and shared. A static library is often called an archive. Programs linking with static libraries include the required APIs directly in the program. Programs linking with shared libraries have a reference included in the program recording which library each API will be found in. At load time the program's shared libraries, and any dependencies those libraries have, are loaded. A variation of the shared library is the dynamically shared object (DSO). A DSO is loaded by a running program (at runtime) only when the program requests that it be loaded by some internal decision. A program does not need to load all, or even any, DSOs when run. The APIs that do this work are typically 'dlopen', 'dlclose' and 'dlsym'.

On Windows systems, not unlike Unix, there are two major types of libraries: static and dynamically linked. Static libraries are handled the same on Windows as on Unix. Dynamically linked libraries (aka DLLs) are similar to Unix shared libraries, but not identical. The similarity is that the DLL is referenced from the program when it is linked, but not loaded until load time. Windows can also load objects at runtime (versus load time), though this seems to happen less often than on Unix systems. The APIs that do this work on Windows are typically 'LoadLibrary', 'FreeLibrary' and 'GetProcAddress'.
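Because the two sets of runtime loading APIs correspond closely, they can be hidden behind a thin wrapper. The following is a minimal sketch rather than a complete loader: the names lib_handle, lib_open(), lib_symbol() and lib_close() are hypothetical, error reporting (dlerror() or GetLastError()) is omitted, and returning the symbol as a void pointer is a simplification.

```c
#include <stddef.h>

#ifdef _WIN32
  #include <windows.h>
  typedef HMODULE lib_handle;
#else
  #include <dlfcn.h>
  typedef void *lib_handle;
#endif

/* Loads a library at runtime; returns NULL on failure. */
lib_handle lib_open(const char *path)
{
#ifdef _WIN32
    return LoadLibraryA(path);
#else
    return dlopen(path, RTLD_NOW);
#endif
}

/* Looks up a symbol by name; returns NULL if it is not found. */
void *lib_symbol(lib_handle lib, const char *name)
{
#ifdef _WIN32
    return (void *)GetProcAddress(lib, name);
#else
    return dlsym(lib, name);
#endif
}

/* Releases a library obtained from lib_open(). */
void lib_close(lib_handle lib)
{
#ifdef _WIN32
    FreeLibrary(lib);
#else
    dlclose(lib);
#endif
}
```

The caller casts the result of lib_symbol() to the expected function pointer type before calling it. On some Unix systems the program must also be linked with "-ldl".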

One of the key differences between Unix shared libraries and Windows DLLs is that the library information is stored differently. On Unix systems all of the information is stored in one file (the shared library). On Windows systems the information is stored in two or three files (typically the DLL itself, an import library used when linking, and a separate file holding the debugging information). While this does not create any difficulties when building an application, it can have an effect when debugging a program. A Unix debugger will look for debugging information within the library, while a Windows program usually has its debugging information external to the library (in a different file). The debugger used on Windows may need to have file paths set to find the file that contains the debugging information.

Operating System Paradigms

Many different operating systems have been created over the history of computing. Some differ from each other only in minor ways, while others differ radically. For many of these operating systems the design has been strongly influenced by the system's purpose and the context of that purpose. For example, an operating system for controlling a production assembly line is different from an operating system for playing games. It is not that one cannot serve the purpose of the other, but rather that each is better suited to certain actions because it has been designed to meet contextual requirements.

Unix and Windows are two operating systems with different paradigms. Both have common elements such as processes and user accounts. But how these are implemented, even with minor differences, begins to show their different paradigms. Minor paradigm differences usually cause little to no source code change. However, the more central a difference is to an operating system's design, the more the paradigms diverge.

Major paradigm differences between Unix and Windows affect how programs can be designed, implemented and executed. Some of these paradigm differences are historical while others date from the very beginning of the design of the operating system. These historical and base design decisions are what cause operating systems to diverge.

On Unix systems sending signals is a central method of communication between processes. Windows has no equivalent mechanism for sending signals between processes. The nearest comparison is the Windows event, which is used to raise an alert about a particular condition, but this is best used between threads of the same process. On Unix the creation of a new process is often as light as creating a new thread, and much less complex from a programming position. In contrast, creating a new thread on Windows is much lighter than creating a new process. Just these two contrasts illustrate how the paradigms of these operating systems begin to diverge. This divergence is what drives source code design changes when porting applications between the two operating systems.

Some other points on which the paradigms differ include, but are not limited to:

  1) Scheduling differences for threads and processes. Windows programs tend to run faster when threaded than when multiple processes are used to do the same work. The reverse holds true for Unix programs.
  2) Synchronous versus asynchronous I/O. Windows is structured to work better with asynchronous I/O while Unix is structured to work better with synchronous I/O. This is another factor that strongly favors one process with multiple threads on Windows, in contrast to many unthreaded processes on Unix.
  3) Different resource usage. Adding a thread to a process on Windows uses fewer resources than starting a new process. On Unix, starting a process uses as few resources as a new thread, or sometimes fewer.
  4) Windows does not maintain a parent-child relationship between processes.
  5) Windows does not have process groups or sessions.
  6) Windows processes do not have controlling terminals.
  7) Windows file handles cannot be used the way Unix file descriptors are. That is, a file handle cannot be passed to every interface that accepts a handle.

Neither of these paradigms is necessarily better or worse than the other. They are different paradigms which means a different approach must be taken with each when attempting certain actions with a program. For programs that do not involve areas where there are paradigm differences the source code will be structured the same and the executable code will run with the same speed and efficiency. An example of this would be a simple loop that counts. Forcing or layering helper functions to create functionality as expected on another platform can save a lot of porting time. The effect on resource use and efficiency can vary from minimal to significant. Effects that are more significant usually involve paradigm differences that are greater between the two operating systems. However, correctly adjusting the source code for the paradigm difference can yield performance improvements.

I/O Considerations

When porting source code from Unix to Windows the easiest action to take is the most minimal set of source code changes possible while still maintaining the program's behavior.

A minimal set of changes from the Unix I/O APIs write(), read(), open() and close() to Windows APIs is to prefix each of them with an underscore. So change write() to _write(), read() to _read(), open() to _open() and close() to _close(). This preserves the program's behavior and also reduces indirect changes, such as having to change the variable for the Unix file descriptor from an 'int' to a 'HANDLE'. These Windows functions do have a cost in time and memory because they translate the actions to native Windows APIs. The result is a program that works correctly but does not excel in performance.
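One way to apply the underscore mapping without editing every call site is a small set of macros. The sketch below is an illustration under stated assumptions: the port_ names, the PORT_ flag macros and the file_roundtrip() function are hypothetical, and _O_BINARY is added on Windows because the Windows C runtime otherwise opens files in text mode and translates line endings.

```c
#include <stdio.h>       /* remove() */

#ifdef _WIN32
  #include <io.h>        /* _open(), _read(), _write(), _close() */
  #include <fcntl.h>     /* _O_* flags */
  #include <sys/stat.h>  /* _S_IREAD, _S_IWRITE */
  #define port_open   _open
  #define port_read   _read
  #define port_write  _write
  #define port_close  _close
  #define PORT_CREATE (_O_WRONLY | _O_CREAT | _O_TRUNC | _O_BINARY)
  #define PORT_RDONLY (_O_RDONLY | _O_BINARY)
  #define PORT_MODE   (_S_IREAD | _S_IWRITE)
#else
  #include <fcntl.h>     /* open(), O_* flags */
  #include <unistd.h>    /* read(), write(), close() */
  #define port_open   open
  #define port_read   read
  #define port_write  write
  #define port_close  close
  #define PORT_CREATE (O_WRONLY | O_CREAT | O_TRUNC)
  #define PORT_RDONLY O_RDONLY
  #define PORT_MODE   0644
#endif

/* Writes len bytes to path, reads them back into out, deletes the
   file, and returns the number of bytes read (or -1 on error). */
int file_roundtrip(const char *path, const char *data,
                   unsigned int len, char *out)
{
    int fd;
    int n;

    fd = port_open(path, PORT_CREATE, PORT_MODE);
    if (fd < 0)
        return -1;
    if (port_write(fd, data, len) != (int)len) {
        port_close(fd);
        return -1;
    }
    port_close(fd);

    fd = port_open(path, PORT_RDONLY);
    if (fd < 0)
        return -1;
    n = (int)port_read(fd, out, len);
    port_close(fd);
    remove(path);
    return n;
}
```

The descriptor stays an 'int' on both systems, which is the main attraction of this approach.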

When porting programs between systems the first goal is always behavioral correctness. A program that cannot perform correctly, even at a fast speed, is not much use. A secondary goal is to improve performance, or at least restore it to the level of the previous system. Improving performance will require significant source code changes. Some of these may be logical changes to the code because not all APIs map one-to-one or are closely matched.

The Unix open() API is a good example of a change that is both an API change and a logical change. There is no native Windows API that matches it one-to-one; instead the CreateFile() API must be used. Whereas the Unix API has 2 (sometimes 3) arguments, the Windows API has 7, some of which may be NULL or zero. The arguments map exactly for the filename, very closely for file access, and similarly for file permissions (if the file does not already exist). The remaining arguments have no counterpart in a Unix context but are required on a Windows system. Making this change will take time, but is not impossible. Additional changes will be needed for the variable types of the existing arguments and for the new arguments. Furthermore, making the changes allows you to tailor the code, especially the error handling, to your program. This will provide lower overhead and better speed than the _open() API because less stack is used, there are fewer context switches and there is less code to run. Examples using other APIs follow a similar argument.
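The mapping from open() to CreateFile() for the simplest case, opening an existing file for reading, might look like the sketch below. The names native_file, NATIVE_FILE_INVALID, native_open_read(), native_read() and native_close() are hypothetical, and the sharing mode and attribute flags shown are one reasonable choice rather than the only correct one.

```c
#ifdef _WIN32
  #include <windows.h>
  typedef HANDLE native_file;
  #define NATIVE_FILE_INVALID INVALID_HANDLE_VALUE
#else
  #include <fcntl.h>
  #include <unistd.h>
  typedef int native_file;
  #define NATIVE_FILE_INVALID (-1)
#endif

/* Opens an existing file for reading with the native API. */
native_file native_open_read(const char *path)
{
#ifdef _WIN32
    return CreateFileA(path,
                       GENERIC_READ,          /* access, like O_RDONLY */
                       FILE_SHARE_READ,       /* sharing: no open() equivalent */
                       NULL,                  /* default security attributes */
                       OPEN_EXISTING,         /* fail if the file is missing */
                       FILE_ATTRIBUTE_NORMAL, /* flags and attributes */
                       NULL);                 /* no template file */
#else
    return open(path, O_RDONLY);
#endif
}

/* Reads up to len bytes; returns the count read, or -1 on error. */
long native_read(native_file f, void *buf, unsigned long len)
{
#ifdef _WIN32
    DWORD got = 0;
    return ReadFile(f, buf, (DWORD)len, &got, NULL) ? (long)got : -1L;
#else
    return (long)read(f, buf, (size_t)len);
#endif
}

/* Closes a file returned by native_open_read(). */
void native_close(native_file f)
{
#ifdef _WIN32
    CloseHandle(f);
#else
    close(f);
#endif
}
```

Note how the type of the file variable changes with the API: callers must compare against NATIVE_FILE_INVALID instead of testing for a negative integer.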

The "underscore" functions are available to help ease code porting. As such they are really generic helper functions. Generic helper functions must handle all possible argument boundaries and error result mappings. Creating your own helper functions can result in source code that is more tailored, with fewer boundary checks and fewer error result mappings. Personal helper functions can improve your program's speed and efficiency over the generic helper functions. There is not an endless supply of generic helper functions, so you will likely need to create some of your own or change completely to Windows APIs in those sections of source code. If you do have personal helper functions, then it is worth the time to check and verify their source code for resource use and speed.

Unix source code typically will block after requesting data to be read or written. Once the call returns, the source code continues synchronously, in linear logic, with any errors handled immediately. On Unix systems this organization is very efficient for resources and also results in faster programs because of the manner in which context switching happens between processes and the kernel, as well as how threads in a program are used. On Windows there is a paradigm shift, and the structure of Unix code just described results in a slower program. Windows is designed to work fastest and most efficiently with a program that has several threads and asynchronous (non-blocking) data I/O. Typically, faster Windows programs will make a request for data to be read or written using one thread of a program but not wait for the result. Instead another thread will be alerted when the data event finishes and will handle whether the event was a success or not. Neither system is better or worse for the method that works most efficiently on it. There are fundamental differences in how each OS is structured, how it works and which methods are available. These differences create the paradigm that determines the best organization of source code on that system for resource use and speed.

The resource and time costs of process creation and thread creation are significantly different between Unix and Windows systems. Again, the difference is a result of their paradigms. While structuring a program using one organization or another is not wrong in principle, it can be more or less efficient to use a particular organization on a particular system. Unix systems have a very low overhead for duplicating a process (fork) and then running another program (exec). In fact the cost is low enough that it is very close to the cost of creating a thread. A chief benefit of creating another process is the independence a program gains by having all of its resources isolated to itself. A thread, in comparison, must share all resources with other threads, in particular memory and variables. In contrast, on Windows, the cost of creating a new thread is very low while the cost of creating a new process is much higher. These costs are additional aspects of the fundamental paradigm differences between the two systems. They create a very important and strong reason for organizational change, for the sake of resource use and speed, in programs being ported from Unix to Windows. While straightforward or simple programs may not reveal the costs of the different paradigms, more complex programs will expose them. Typically when a major difference in speed is noticed, programmers and systems architects will examine the source code line by line looking for a way to improve the speed and efficiency of the program. However, as just explained, the better approach is an overall change in the organizational structure.
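The difference in how a child program is started can be seen by placing the two idioms side by side. This is a hedged sketch only: run_and_wait() is a hypothetical helper that starts a program by name with no arguments and waits for it, and the value 127 returned by the Unix branch when the program cannot be executed is a convention chosen for the example.

```c
#ifdef _WIN32
  #include <windows.h>
  #include <string.h>
#else
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>
#endif

/* Runs a program and returns its exit code, or -1 if the child
   could not be created or did not exit normally. */
int run_and_wait(const char *program)
{
#ifdef _WIN32
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;
    char cmdline[MAX_PATH];
    DWORD code = (DWORD)-1;

    /* CreateProcess takes a modifiable command line buffer. */
    if (strlen(program) >= sizeof cmdline)
        return -1;
    strcpy(cmdline, program);

    ZeroMemory(&si, sizeof si);
    si.cb = sizeof si;
    if (!CreateProcessA(NULL, cmdline, NULL, NULL, FALSE, 0,
                        NULL, NULL, &si, &pi))
        return -1;
    WaitForSingleObject(pi.hProcess, INFINITE);
    GetExitCodeProcess(pi.hProcess, &code);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return (int)code;
#else
    int status;
    pid_t pid = fork();              /* duplicate this process */

    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* child: replace the image */
        execlp(program, program, (char *)NULL);
        _exit(127);                  /* reached only if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
#endif
}
```

On Unix the work is split into two cheap steps (fork, then exec), while on Windows a single call creates the new process from scratch. Code that forks without calling exec, for example to run a worker function in a copy of the process, has no such direct translation and is usually restructured to use threads.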

API Usage

Typical software projects that have a single code base will use #ifdefs to separate Windows and Unix source code. Some macros may also be used to identify function calls, with the macros being conditionalized with #ifdefs. Often the underscore version of the Unix API will be used on Windows if it is available (e.g. write() to _write()). These are often called helper functions. While this minimizes source code changes and maintains close behavior, it usually has a performance cost. Performance can usually be increased by calling the underlying Windows APIs directly. The cost here is that the change in code will be significant and maintaining a single code base will be more difficult. Refer to individual API Dictionary entries for more specific information.

Threads

There are many models for creating and managing threads. In fact some computer systems have more than one model available. The model most used on Unix machines is "Pthreads", which is a contraction of the full name "POSIX Threads". POSIX Threads is an official IEEE standard. The Pthreads standard includes a specification for all of the APIs that can be used. Windows uses its own threads model. The Windows and Pthreads models are close enough in functionality that the majority of APIs correspond closely. However, the number of APIs in both models is extremely large, meaning that the APIs that are not close in functionality can cause significant re-coding time and require organizational changes to the source code when porting.

When initially porting threaded source code from Unix to Windows, the first concern is always maintaining the behavioral and functional correctness of the program. There are three possible approaches to porting threads: use a portability library, translate Pthreads APIs to Windows APIs in place, or restructure and rewrite the source code to use Windows APIs. The final approach will certainly consume the most time and effort, but it may be your objective over a longer time period. The middle approach will also consume a large amount of time and effort, and it increases the risk of introducing new problems or regressing older ones. The first approach requires very little time and effort, has a very low risk of introducing problems, and requires few if any source code changes. However, it will use more system resources and will likely not run as fast as a port that uses the Windows APIs directly. Even so, the difference in time and effort between the middle and first approaches is several orders of magnitude: weeks or months versus hours.

As such the best recommendation for an initial port of Unix source code with Pthreads is to use the portability library. This will reduce the initial effort to get the program running with correct behavior while other porting issues are addressed.
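To give a sense of what a portability layer has to cover, the sketch below wraps only the most basic operation, creating one thread and waiting for it. It is not a portability library and not a substitute for one; the macro names, add_one() and run_worker_once() are hypothetical. It uses CreateThread for brevity, although Microsoft's documentation generally recommends _beginthreadex for threads that call the C runtime.

```c
#include <stddef.h>

#ifdef _WIN32
  #include <windows.h>
  /* Windows thread entry points return DWORD and use WINAPI. */
  #define PORT_THREAD_FUNC(name, arg) DWORD WINAPI name(LPVOID arg)
  #define PORT_THREAD_RETURN return 0
#else
  #include <pthread.h>
  /* Pthreads entry points take and return void pointers. */
  #define PORT_THREAD_FUNC(name, arg) void *name(void *arg)
  #define PORT_THREAD_RETURN return NULL
#endif

/* Worker: increments the int that arg points to. */
PORT_THREAD_FUNC(add_one, arg)
{
    int *counter = (int *)arg;
    *counter += 1;
    PORT_THREAD_RETURN;
}

/* Starts one worker thread and waits for it to finish.
   Returns 0 on success, -1 on failure. */
int run_worker_once(int *counter)
{
#ifdef _WIN32
    HANDLE t = CreateThread(NULL, 0, add_one, counter, 0, NULL);
    if (t == NULL)
        return -1;
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
    return 0;
#else
    pthread_t t;
    if (pthread_create(&t, NULL, add_one, counter) != 0)
        return -1;
    return pthread_join(t, NULL) == 0 ? 0 : -1;
#endif
}
```

Even this small case shows differences in entry point signature, return type and cleanup. Mutexes, condition variables, thread-local storage and cancellation each add more, which is why a ready-made portability library is the faster first step.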

The paradigms of the Unix and Windows operating systems are different enough that a one-to-one mapping of thread APIs is neither clean nor easy. Long-term planning for a program ported from Unix to Windows will likely require an organizational change to the source code in addition to the changes in API specification. If you would like to maintain code portability between Unix and Windows systems, then you should investigate Microsoft's Subsystem for Unix-based Applications (SUA). SUA is a peer subsystem to the Windows subsystem (aka Win32) on the Windows OS, and it provides support for Unix APIs and behavior, including Pthreads.

For more information on threading, see