Thread-Safe ACIS

From DocR23

Introduction

A useful discussion about Thread-Safe ACIS (TS-ACIS) and multi-threading in general requires an understanding of the terminology used to describe it. The following section helps lay a foundation and provides basic definitions of the terms used throughout this technical article.

Performance

The purpose of multi-threading is to improve application performance by utilizing available processors. Improvements occur when simultaneous processing (also called concurrency and parallelism) is achieved in areas of code that have been modified to support threads (parallel regions). Good designs allow performance to continue improving as the number of processors increases (scaling).

Ideal scaling is achieved when performance increases linearly as the number of processors increases. Super scaling occurs when performance improves more than ideal scaling. In practice, ideal scaling is difficult to achieve and probably unrealistic to expect, because managing thread interactions (synchronization) and contention for shared resources using mutual exclusion logic (mutex, critical section) introduces overhead.

Performance is also affected by the Operating System's ability to schedule threads effectively (affinity, for example) and by each processor's ability to manage its caching mechanism. Multi-threaded applications have a higher probability of reducing cache effectiveness by writing to and reading from shared memory locations (cache evictions). Optimal cache use contributes significantly to application performance.

Implementing parallelism at the highest possible levels in algorithms typically provides better overall performance improvements (coarse versus fine-grained parallelism). Parallelizing an operation that takes one percent of the total processing time will obviously be less beneficial than targeting a higher-level operation that consumes a greater portion of the total time (Amdahl's Law).

Work Distribution

Distributing work effectively across available processors is crucial to performance and involves various strategies. Giving each thread an equal size chunk of work is a form of block distribution. A block distribution of 100 units of work using two threads results in the first thread receiving the first 50 units and the second the last 50. This approach is advantageous when the units of work are roughly equivalent in complexity or the complexity is distributed evenly throughout all available work. Otherwise, one thread might get stuck processing the most complex work while other threads finish more quickly and become idle.

Letting each thread iterate through all available work with a fixed step factor is a form of cyclic distribution. A cyclic distribution of the above scenario implies a step factor of two (the number of threads) and results in the first thread processing all even units (0, 2, 4, 6, ...) and the second all odd ones (1, 3, 5, 7, ...). This approach is advantageous when the complexity of each work unit varies and the distribution is unknown. It is less likely for one thread to get stuck with the majority of complex work.
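The two static distributions can be sketched as index calculations. This is a minimal illustration with hypothetical function names; the handling of a remainder in the block case (one extra unit for the first few threads) is an assumption, since the text only covers the evenly divisible case.

```cpp
#include <vector>

// Block distribution: thread `t` of `num_threads` gets one contiguous range.
// Any remainder is spread over the first threads, one extra unit each.
std::vector<int> block_units( int t, int num_threads, int num_units )
{
    int base  = num_units / num_threads;
    int extra = num_units % num_threads;
    int begin = t * base + ( t < extra ? t : extra );
    int end   = begin + base + ( t < extra ? 1 : 0 );   // one past the last unit

    std::vector<int> units;
    for ( int i = begin; i < end; i++ )
        units.push_back( i );
    return units;
}

// Cyclic distribution: thread `t` starts at unit `t` and steps by the
// number of threads.
std::vector<int> cyclic_units( int t, int num_threads, int num_units )
{
    std::vector<int> units;
    for ( int i = t; i < num_units; i += num_threads )
        units.push_back( i );
    return units;
}
```

With 100 units and two threads, block_units gives thread 0 the units 0 through 49 and thread 1 the units 50 through 99, while cyclic_units gives thread 0 the even units and thread 1 the odd ones.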

Another form of work distribution is called dynamic distribution. Here each thread takes the next available work unit in sequential order. This approach utilizes processors very effectively but has the expense of asking for the next available work unit, which involves some form of synchronization and introduces overhead. Other forms of work distribution exist but are less common in practice.
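A minimal sketch of dynamic distribution follows, using standard C++ threads and an atomic counter rather than any ACIS interface. The function name is hypothetical, and the per-unit work is reduced to a counter so that the claiming logic stays visible.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Dynamic distribution: each thread claims the next unclaimed unit from a
// shared atomic counter. The fetch_add call is the synchronization cost
// paid for every unit. Returns how many units each thread processed.
std::vector<int> run_dynamic( int num_threads, int num_units )
{
    std::atomic<int> next_unit( 0 );
    std::vector<int> processed( num_threads, 0 );
    std::vector<std::thread> threads;

    for ( int t = 0; t < num_threads; t++ )
    {
        threads.emplace_back( [&, t]()
        {
            for (;;)
            {
                int unit = next_unit.fetch_add( 1 );
                if ( unit >= num_units )
                    break;
                // Process `unit` here; this sketch only counts it.
                processed[t]++;
            }
        } );
    }

    for ( std::thread& th : threads )
        th.join();

    return processed;
}
```

The per-thread counts vary from run to run, but they always add up to the number of units, because each unit is claimed exactly once.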

Shared Resources

Thread-safe and multi-threaded applications must take the proper steps to avoid the incorrect use of shared resources, such as global and static data. Failing to do so leads to incorrect and non-deterministic application behavior. Two threads writing concurrently to the same shared data location (that is, a race condition) will produce unexpected results.

Modifying a variable typically requires multiple machine instructions. For example, incrementing an integer requires a load, an increment, and a store. Two threads executing these instructions concurrently on the same variable can interleave them and lose an update, producing the wrong answer. The increment operation is not atomic, in that it can be further divided into multiple machine instructions, as is the case for most operations.
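For comparison, standard C++ offers std::atomic, which makes the load, increment, and store a single indivisible step. The sketch below is a generic illustration with a hypothetical function name; it is not part of the ACIS interfaces described in this article.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments the shared counter `increments` times. With
// std::atomic the read-modify-write cannot be interleaved, so no update is
// lost. Using a plain int here instead would be a data race.
int count_with_atomic( int num_threads, int increments )
{
    std::atomic<int> counter( 0 );
    std::vector<std::thread> threads;

    for ( int t = 0; t < num_threads; t++ )
    {
        threads.emplace_back( [&counter, increments]()
        {
            for ( int i = 0; i < increments; i++ )
                counter++;   // atomic read-modify-write
        } );
    }

    for ( std::thread& th : threads )
        th.join();

    return counter.load();
}
```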

Each thread must either be given unique storage locations (thread-local storage) for shared resources or the data must be protected so that only one thread at a time can access it (mutex). Care must be taken when using mutexes to avoid situations in which processing cannot continue because threads are blocked trying to acquire mutexes that other threads hold (deadlock).
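The idea of thread-local storage can be sketched with the standard C++ thread_local keyword. This only illustrates the concept; ACIS uses its own safe types for this purpose, and the names below are hypothetical.

```cpp
#include <thread>

// One instance of this variable exists per thread.
thread_local int scratch_value = 0;

// Writes to the calling thread's copy only and returns it.
int set_and_get( int value )
{
    scratch_value = value;
    return scratch_value;
}

// Runs a worker that changes its own copy, then reports whether the
// caller's copy was left untouched.
bool worker_leaves_caller_value_alone()
{
    scratch_value = 7;
    int seen_by_worker = 0;

    std::thread worker( [&seen_by_worker]()
    {
        seen_by_worker = set_and_get( 42 );
    } );
    worker.join();

    return seen_by_worker == 42 && scratch_value == 7;
}
```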

A mutex is a synchronization primitive used to restrict access to a specific region of code to one thread at a time (critical section). Mutexes are used to eliminate race conditions in accesses to shared resources and to make a sequence of operations execute as one indivisible unit (atomic operation). Overusing mutexes can impact performance negatively.

Thread Management

Thread management involves creating and destroying threads, employing threads, and synchronizing all interactions between threads. OpenMP and Threading Building Blocks are examples of commercial thread management systems. These systems greatly simplify the use of threads in applications by providing uncomplicated interfaces and flexible scheduling strategies. OpenMP, for example, provides a simple way to utilize available threads to concurrently process iterations of a for loop. Many other systems are available but are beyond the scope of this document.

ACIS provides a thread management system tailored to the needs of thread-safe ACIS. This system utilizes a producer/consumer queue, which is an example of a thread management strategy with desirable properties. For one, responsibilities are well defined. The master thread adds work to the queue (that is, it produces work) and worker threads remove work from the queue and process it (that is, they consume work). This strategy also has the potential to scale well. The ACIS Thread Manager can be used with other thread management products and must be used with multithreaded functionality in ACIS, for instance Multithreaded Faceting.

Overview

TS-ACIS is a version of the ACIS modeler that distinguishes between threads (that is, is thread-aware) and supports concurrent modeling operations given a set of rules. This allows developers to enhance application performance by taking advantage of available processors in compute intensive workflows that are conducive to threading.

TS-ACIS contains thread management, thread-local storage, and mutual exclusion functionality. These are the fundamental building blocks needed to implement parallel regions. Developers can use these capabilities to incrementally improve very targeted and specific areas of their code.

TS-ACIS uses thread-local storage to maintain separate and independent modeler states for each thread. This thread-specific storage contains all global variables such as SPAresabs and all ACIS options. Threads neither interact nor interfere with each other because of the separation, as each conceptually has a private instance of the modeler.

Threads are not permanently tied to models and models are not tied to threads. In other words, one thread can be used to load a model and after that a different thread can be used to facet the model. Care must be taken to not operate concurrently on the same model. Each operation must have exclusive access to the model data. Furthermore, each model must be on a unique history stream. This eliminates race conditions and supports transactional operations.

TS-ACIS is currently available for Linux and Windows architectures only. On Windows architectures, it is based upon Windows threads; on Linux architectures, it is based on POSIX threads. Its core functionality utilizes standard thread synchronization, critical sections, and thread-local storage functions. This makes it compatible with other tool sets such as OpenMP and the Windows threading API. The use of the ACIS thread manager is not strictly necessary, but is highly recommended.

When to use TS-ACIS

Workflows that lend themselves well to multi-threading are compute intensive and manipulate independent data. A good example workflow is loading and faceting a collection of models (that is, an assembly). Restoring models from disk and faceting models are compute intensive operations on independent data. Using multiple processors can drastically reduce the time required to load and visualize assemblies, which is a common workflow. We call these multi-model or assembly operations.

Another workflow that can work well is model analysis. Here one may perform many operations like point in face or entity point distance calculations on specific portions of a model, such as on faces. The complexity is due to the fact that these operations are not read-only and cannot be performed concurrently on dependent data (that is, on the same model). However, when the number of operations exceeds the overhead of copying portions of the model, then multi-threading can be beneficial. These are single model operations that benefit from multi-threading by utilizing deep-copy routines to make model data independent.

A key distinction to recognize between multi-model and single-model operations involves the results or output of the operations. Faceting multiple independent bodies concurrently results in facet data attached to each model, a result that is consistent with a serial implementation. Deep copying faces of a single-body model to facet concurrently results in facet data attached to the face copies, which are temporary by nature. Steps must be taken at the end of all single-model operations to either process the data before it is deleted or move it back to the original model.


How to use TS-ACIS

Thread Manager

We recommend the use of the ACIS thread manager for numerous reasons. It is simple to use, easily understood, and tailored to ACIS. Other thread management schemes are supported but they may hinder collaborative efforts and complicate the process of reporting incidents.

The ACIS thread manager implements a producer/consumer queue with the following behaviors. Only the master thread can add work to the queue. The master thread only produces work for the queue when worker threads are available, otherwise the master consumes as it produces, which restores serial behavior. Also, work is never overproduced. In other words, only enough work is produced to occupy available worker threads.

The thread manager has one pure virtual function called process. Implement this method in derived classes that perform the work and produce the results for the operation. This is the only method that executes concurrently when worker threads exist. The thread manager run method, which is called only by the master thread, schedules the process method to be called by worker threads and returns. It calls the process method directly when worker threads are not available.

Thread Manager Implementation Steps

It is very straightforward to add parallel regions to algorithms using the ACIS thread manager. The steps are:

  • Identify the potential parallel operation and associated data.
  • Encapsulate the independent inputs and outputs of each operation into a data structure (work packet).
  • Develop a class with a method that processes each work packet.
  • Add a method that performs the independent operation on work packets.
  • Add a method to combine all the outputs from the work packets.
  • Derive the new class from the thread manager base class: thread_work_base.
  • Ensure the correctness of the operations without threads.
  • Ensure the correctness of the operations with threads.

Thread Manager Example

The following is a trivial example of using the thread manager. The code adds all the whole numbers from one to one million. The serial code is shown below:

long long run()
{
    // The sum of 1..1,000,000 is 500,000,500,000, which overflows a 32-bit int.
    long long answer = 0;
    for ( int i = 1; i <= 1000000; i++ )
    {
	answer += i;
    }
 
    return answer;
}

To modify this example to use the ACIS thread manager we must first decide how to distribute the work amongst available threads. The type of work in this example, an addition operation, has equal complexity and distribution. It therefore lends itself well to either cyclic or block scheduling. We will choose cyclic scheduling, where each thread begins at a unique point and increments to its next value by adding the number of threads involved in the calculation. This works well in this situation because it doesn't matter in which order the numbers are added and it requires the least amount of information to be passed to each thread.

We begin by deriving a custom class from the thread manager base class thread_work_base. To this we add an array of integers to hold the thread specific totals. This will give each thread an independent location to store its data where it is easily accessible from the master thread. We have sized this array to the maximum number of threads that ACIS can utilize using the MAX_ACTIVE_THREADS symbol, which is currently set to 1024.

We also add an integer to hold the number of threads used in the calculation. This makes sense because we only want to calculate it once as it is then essentially a read-only value. We must consider that the master thread will not do any of the work because it only produces work when worker threads are available. In this case we must subtract one from the total. Note that the value will be one when no workers are available (because the master thread will do all the work) and when only one worker is available (because the one worker will do all the work).

The run method of the derived class, which does not have to be called run, has the same external behavior as the original function in that it sums all the numbers from one to one million and returns the answer. The fact that we use threads to compute the answer is hidden in the implementation. The method first calculates the number of threads to use, then schedules work (puts it in the queue) by calling the base class run method, passing along the starting index for each thread. It then waits for all threads to complete the operation, adds the computed values together, and returns the answer.

The process method of the derived class, which is called by worker threads when the thread management system schedules the work in the queue, receives the starting index passed through the run method as input. This represents the first of the numbers to be added together. We initialize the value where the answers are placed to zero, then add together all numbers of the set each thread is responsible for by incrementing our step by the number of worker threads.

The example below demonstrates cyclic scheduling. As an alternative we could have used block scheduling, where each thread works on an equally sized and contiguous data set. In this case we would have divided the one million numbers by the number of threads, letting each sum the contiguous values from some starting point to some stopping point. The code would not have been much different. The pros and cons of choosing one over the other are discussed in the Work Distribution section above.

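class my_thread_work : public thread_work_base
{
    long long answers[MAX_ACTIVE_THREADS];	// one partial sum per thread
    int num_threads;				// number of threads sharing the work
 
protected:
 
    void process( void* arg )
    {
	// The argument carries the starting index. Cast through intptr_t
	// (from <cstdint>) so the conversion is valid on 64-bit platforms.
	int index = (int)(intptr_t)arg;
	answers[index] = 0;
	for ( int i = index; i <= 1000000; i += num_threads )
	{
	    answers[index] += i;
	}
    }
 
public:
 
    long long run()
    {
	// The master only produces work when workers exist, so it is not counted.
	num_threads = thread_count() == 1 ? 1 : thread_count() - 1;
 
	int i;
	for ( i = 0; i < num_threads; i++ )
	{
	    thread_work_base::run( (void*)(intptr_t)i );
	}
 
	thread_work_base::sync();
 
	// Combine the partial sums into the first slot.
	for ( i = 1; i < num_threads; i++ )
	{
	    answers[0] += answers[i];
	}
 
	return answers[0];
    }
};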
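For comparison, the block scheduling alternative can be sketched as follows. To keep the sketch self-contained it uses standard C++ threads instead of the ACIS thread manager, and the function name block_sum is hypothetical. It accumulates in long long because the sum of one to one million (500,000,500,000) exceeds the range of a 32-bit int.

```cpp
#include <thread>
#include <vector>

// Sums 1..limit with block scheduling: thread t adds one contiguous range.
long long block_sum( int num_threads, int limit )
{
    std::vector<long long> answers( num_threads, 0 );
    std::vector<std::thread> threads;

    int base  = limit / num_threads;
    int extra = limit % num_threads;   // first `extra` threads take one more

    int begin = 1;
    for ( int t = 0; t < num_threads; t++ )
    {
        int end = begin + base + ( t < extra ? 1 : 0 );   // one past the last value
        threads.emplace_back( [&answers, t, begin, end]()
        {
            long long local = 0;
            for ( int i = begin; i < end; i++ )
                local += i;
            answers[t] = local;   // single write to the shared array
        } );
        begin = end;
    }

    for ( std::thread& th : threads )
        th.join();

    long long total = 0;
    for ( long long a : answers )
        total += a;
    return total;
}
```

The only structural difference from the cyclic version is that each thread needs two values (a start and a stop) instead of one.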

This trivial example is valuable because it demonstrates common practices one encounters when developing parallel code regions. These are: dividing work, processing work, and aggregating results. The thread manager simplifies the process by providing structure. Roles and responsibilities are clear: only the master thread can produce work and worker threads are only consumers. Behavior is easily understood: the code works with and without worker threads and the process method is the only code that is executed concurrently.

The drawback of this example is that it does not demonstrate the benefits of scaling simply because it does not perform the operation faster with threads. This is because the operation itself is not time consuming enough to warrant the added overhead of thread management. This is an example of fine-grain parallelism that requires bare-bones thread management tactics with little overhead. The ACIS thread manager is designed to be used for coarse-grain parallelism, where thread management overhead is minuscule in relation to the operation.

Thread-Local Storage

ACIS uses internally developed template objects called safe types to implement thread-local storage (TLS). Safe types exist for all basic data types and provide thread specific values for each variable. Safe types must be global in scope and must have been initialized (constructed) before the ACIS modeler is started (or the base is initialized).

The safe type values are initialized to the value assigned during static construction. The master thread's value is stored in a "current value" (currval) variable within the safe type object. (This enhances performance in non-threaded code regions and is useful when debugging.) Thread specific values are located in storage outside of the object and are therefore not directly visible.

Thread specific values are fetched only when safe types are accessed within thread safe regions. These regions are defined by calling the functions thread_safe_region_begin and thread_safe_region_end. The thread manager automatically defines thread safe regions in its run method. The use of regions eliminates the overhead of TLS accesses when they are not needed.

ACIS supports the following safe types:

Name                  Use
safe_integral_type    int, short, char, signed or unsigned
safe_floating_type    float or double
safe_pointer_type     Pointer to non-class data
safe_object_pointer   Pointer to a class object
safe_function_type    Pointer to a function

Safe type Example

The following is an example use of thread-local storage with safe types. We use the thread manager to schedule a large number of concurrent calls to the process method of a custom thread work class. The process method, in the first pass, increments the thread specific value of the safe type. In the second pass each thread prints out its accumulated value.

safe_integral_type<int> thread_local_integer;
 
class my_thread_work : public thread_work_base
{
 
protected:
 
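    void process( void* arg )
    {
	// Compare through intptr_t (from <cstdint>) so the pointer-to-integer
	// conversion is valid on 64-bit platforms.
	if ( (intptr_t)arg == 0 )
	{
	    thread_local_integer++;
	}
	else
	{
	    printf( "Thread %d scheduled %d times\n", 
		thread_id(), (int)thread_local_integer );
	}
    }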
 
public:
 
    void run()
    {
	int i;
	for ( i = 0; i < 1000000; i++ )
	{
	    thread_work_base::run( (void*)0 );		
	}
	thread_work_base::sync();
	for ( i = 0; i < thread_count() - 1; i++ )
	{
	    thread_work_base::run( (void*)1 );		
	}
	thread_work_base::sync();
    }
};

Several aspects of this example are worth pointing out: the safe type is defined at global scope as required, and the safe type supports the operations of the basic type it encapsulates. The former facilitates proper and timely initialization and storage allocation; the latter eliminates the need for special syntax when using safe types.

In fact, safe types can be used just as native types with one known exception. One must explicitly cast safe types in non-type-specific operations, such as when used with printf as in the example above. printf is a variadic function, in that it supports a variable number of arguments. The arguments furthermore are not accessed in a type specific manner, which eliminates the automatic call to the conversion operator of the object. Hence the need for an explicit cast.
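The difference between a typed context and a variadic one can be shown with a small stand-in class. The class wrapped_int below is hypothetical and much simpler than an ACIS safe type; it only has the conversion operator that matters for this point.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a safe type: wraps an int and converts back to
// int through a conversion operator.
class wrapped_int
{
    int value;

public:
    explicit wrapped_int( int v ) : value( v ) {}
    operator int() const { return value; }
};

// Typed context: the compiler calls operator int() automatically.
int add_one( const wrapped_int& w )
{
    int plain = w;
    return plain + 1;
}

// Variadic context: snprintf has no parameter type to convert to, so the
// explicit cast is required.
std::string format_value( const wrapped_int& w )
{
    char buffer[32];
    std::snprintf( buffer, sizeof buffer, "%d", (int)w );
    return buffer;
}
```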

Also worth mentioning is the non-deterministic behavior of this example as shown below. Each thread will most likely print out different count values and the order of the output will be different every time the program is executed. This is largely affected by the Operating System scheduler and is an expected and accepted consequence of multi-threading.

Example output:

Thread 2 scheduled 125008 times
Thread 8 scheduled 125000 times
Thread 3 scheduled 124999 times
Thread 1 scheduled 124982 times
Thread 7 scheduled 125005 times
Thread 4 scheduled 125001 times
Thread 5 scheduled 125001 times
Thread 6 scheduled 125004 times
Press any key to continue . . .

Mutual Exclusion Logic (Mutex)

ACIS uses class objects to implement mutual exclusion logic. The base class, called mutex_resource, encapsulates the OS specific mutex implementation (i.e., the resource). The Windows implementation uses the CRITICAL_SECTION runtime object, so that threads waiting to acquire the mutex resource yield the CPU rather than consume it.

The mutex_resource class has methods to acquire and release the mutex resource directly, but is more commonly used in conjunction with another ACIS class object called mutex_object. This class object calls the mutex_resource acquire method when constructed and calls the mutex_resource release method when destructed. This helps assure that the mutex resource is properly released.
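The acquire-on-construction, release-on-destruction pattern can be sketched generically with a standard C++ mutex. The class scoped_acquire and the surrounding names are hypothetical; they only mirror the relationship described above and are not the ACIS classes themselves.

```cpp
#include <mutex>

// Hypothetical guard in the style described above: acquire on construction,
// release on destruction.
class scoped_acquire
{
    std::mutex& resource;

public:
    explicit scoped_acquire( std::mutex& m ) : resource( m ) { resource.lock(); }
    ~scoped_acquire() { resource.unlock(); }

    // A guard owns exactly one acquisition, so copying is disabled.
    scoped_acquire( const scoped_acquire& ) = delete;
    scoped_acquire& operator=( const scoped_acquire& ) = delete;
};

std::mutex table_mutex;
int table_size = 0;

// The early return and the normal return both release the mutex, because
// the guard's destructor runs on every exit path.
bool grow_table( int amount )
{
    scoped_acquire guard( table_mutex );
    if ( amount <= 0 )
        return false;
    table_size += amount;
    return true;
}
```

Tying the release to the destructor removes the need to remember a release call on each return path, which is the assurance the text refers to.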

Finally, the ACIS CRITICAL_BLOCK macro helps simplify the use of mutexes by automating the construction of mutex_object objects with the proper coupling to mutex_resource objects. This is the preferred method of implementing mutex blocks.

Mutex Example

The following example uses ACIS mutex objects. We use the thread manager to schedule a large number of concurrent calls to the process method of a custom thread work class. The process method simply increments a global integer. Since this is not an atomic operation, we must use a mutex to assure that the operation is not performed concurrently.

mutex_resource my_mutex;
 
int global_integer;
 
class my_thread_work : public thread_work_base
{
 
protected:
 
    void process( void* arg )
    {
	CRITICAL_BLOCK( my_mutex );
	global_integer++;
    }
 
public:
 
    void run()
    {
	int i;
	for ( i = 0; i < 1000000; i++ )
	{
	    thread_work_base::run( (void*)0 );		
	}
	thread_work_base::sync();
	printf( "Accumulated value: %d\n", global_integer );
 
    }
};

This code would create a race condition without the mutex because multiple threads would read from and write to the variable concurrently in a non-deterministic manner. The program outputs different values without the mutex but always outputs the same and correct value with the mutex in place. We use the mutex to force an atomic operation and restore determinacy.

The performance impact of the mutex in this case will be significant because the overhead of acquiring and releasing it is certainly more costly than the addition operation it protects. In other words, we have extreme contention for a trivial operation. It is advantageous to use an alternate approach that avoids synchronization, as in earlier examples where threads increment unique variables.

Multi-Model Operations

TS-ACIS supports concurrent operations on independent data. We define independent data as a collection of entities belonging to a particular history stream, without connectivity to entities in other streams. In ACIS this independent data represents a model. Working with a single model is called part modeling and working with multiple models is typically called assembly modeling. (In this context assembly modeling does not refer to ACIS assembly management functionality.)

Applications that support assembly modeling are good candidates for TS-ACIS because they probably have workflows that are conducive to multi-threading. Loading and viewing assemblies, for example, may be a very common operation. This involves restoring and faceting groups of ACIS models, which is parallelizable.

Multi-Model Example

The following example uses the ACIS thread manager to load and facet models of an assembly concurrently. Much of the implementation is omitted for simplification. The assembly_info class contains a pointer to a linked list of model_info objects. The model_info class contains a pointer to a target history stream, a handle to the model file, and a list to hold top-level entities.

class my_thread_work : public thread_work_base
{
    assembly_info* assembly;
 
protected:
 
    void process( void* arg)
    {
	model_info* model = (model_info*)arg;
	api_set_default_history( model->stream() );
	api_restore_entity_list( model->file(), ...
	api_facet_entity( model->entity() );
    }
 
public:
 
    void run()
    {
	model_info* mi = assembly->first_model();
	while (mi)
	{
	    thread_work_base::run( (void*)mi );		
	    mi = mi->next_model();
	}
	thread_work_base::sync();
    }
};

We implement a run method to iterate through all models of an assembly. The master thread passes the addresses of the model_info objects to the thread manager base class run method, which schedules the work, and then waits for the work to complete by calling the sync method.

The process method, which executes concurrently when worker threads are available, casts its input argument to a model_info object, sets its default history stream to that of the model, restores the model, and facets it.

The model_info class design encapsulates all information required to perform the operation in isolation. We associate a history stream with the model so that other threads can in turn work on the model without mixing history data (mixing streams). Setting the default history stream to that of the model (i.e., activating the stream) is the very first thing each thread must do before working on a particular model.

There may be other model specific information, in addition to the history stream, to consider in scheduled operations. The unit representations, for example, may be different. One model might be modeled in millimeters while another is modeled in inches. This must be accounted for. ACIS provides a modeler_state helper object that currently contains a snapshot of the tolerance variables and option_header values. Use this in conjunction with a custom implementation to represent the true state of a model.

Single-Model Operations

Employing threads in single-model operations is more complex than in multi-model operations because the model data is not independent. Each thread must make a deep copy of the portion of the model it requires to perform the desired operation. The benefits of parallelization therefore must outweigh the expense of copying the data.

One must also consider whether the results of the operation are to be reflected in the original model. Most output data, from evaluation operations anyway, is independent of entity references. A point-in-face operation for example, inputs a face and a position and outputs containment information, which is an enumerated type. This simple output data can easily be stored in the output data object (work packet) for future use.

An entity-point-distance operation on the other hand, inputs an entity and a position and outputs closest point information, containing a pointer to an entity that may be different from the input entity. This type of output data is useless to the master thread when the temporary entity copies are lost. Care must therefore be taken to map the entity of the copy to the corresponding entity of the original.

To illustrate this entity mapping let us consider a scenario where we have deep copied a face for an entity-point-distance operation and receive an edge pointer in the output data. To map the copied edge belonging to the copied face to the original edge belonging to the input face, we can rely on the ordering of the ENTITY_LISTs returned by calling api_get_edges on both the copy face and original face. The mapping is then reduced to finding the index of the edge in the copy list and looking up the corresponding edge in the original list.
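The index-based mapping can be sketched without any ACIS types. In the sketch below, edge_stub is a hypothetical stand-in for an entity and the two vectors stand in for the edge lists of the copy and the original; it assumes, as the text does, that both lists are returned in the same order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an entity; only its address matters here.
struct edge_stub { int tag; };

// Maps an edge of the copy to the edge at the same position in the original
// list. Returns nullptr if the edge is not in the copy list or the lists
// differ in length.
edge_stub* map_to_original( edge_stub* copy_edge,
                            const std::vector<edge_stub*>& copy_edges,
                            const std::vector<edge_stub*>& original_edges )
{
    if ( copy_edges.size() != original_edges.size() )
        return nullptr;

    for ( std::size_t i = 0; i < copy_edges.size(); i++ )
    {
        if ( copy_edges[i] == copy_edge )
            return original_edges[i];
    }
    return nullptr;
}
```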

Other operations may require entity modification along with history data from worker threads to be merged into the master thread's history stream. This requires many restrictions in order to maintain consistent history information. For example, only one set of changes per original entity is permitted; otherwise the merges cannot be reconciled. Use merge_child_state to merge the worker thread's history stream with the master thread's history stream.

Single-Model Example

The following example uses the ACIS thread manager to facet the faces of an input body concurrently. It demonstrates the use of deep copy to create independent model data, how to use a mutex to avoid race conditions when modifying shared model data, and merging child history stream data into the master stream. As before, much of the implementation is omitted for simplification.

class facet_body_thread_work : public thread_work_base
{
    struct face_info
    {
	FACE* input_face;
	HISTORY_STREAM* stream;
	int status;
    };
 
    face_info* face_info_array;
    mutex_resource mutex;
 
protected:
 
    void process( void* arg )
    {
	face_info& info = face_info_array[(int)(intptr_t)arg];
	HISTORY_STREAM* current_stream;
	api_get_default_history( current_stream );
	api_set_default_history( info.stream );
	info.stream->set_distribute_flag( TRUE );
	API_BEGIN
	    FACE* input_face = info.input_face;
	    FACE* copy_face = NULL;
	    api_deep_down_copy_entity( input_face, (ENTITY*&)copy_face, TRUE );
	    api_facet_entity( copy_face );
	    ATTRIB* att = copy_face->attrib();
	    {
		CRITICAL_BLOCK( mutex );
		att->move( input_face );
	    }
	    api_del_entity( copy_face );
	API_END
	info.status = result.error_number();
	api_set_default_history( current_stream );
    }
 
public:
 
    void facet_body_with_threads( BODY* body )
    {
	ENTITY_LIST face_list;
	api_get_faces( body, face_list);
	int face_count = face_list.count();
	face_info_array = ACIS_NEW face_info[face_count];
	int i;
	for ( i = 0; i < face_count; i++ )
	{
	    face_info_array[i].input_face = (FACE*)face_list[i];
	    api_create_history( face_info_array[i].stream );
	    thread_work_base::run( (void*)(intptr_t)i );
	}
	thread_work_base::sync();
	for ( i = 0; i < face_count; i++ )
	{
	    if (face_info_array[i].status == 0)
		merge_child_state( face_info_array[i].stream );
	    api_delete_history( face_info_array[i].stream );
	}
	ACIS_DELETE [] face_info_array;
    }
};

The run method, in this case called facet_body_with_threads, accepts an input body, creates storage to hold the data for the scheduled operations (work packets), then schedules the work and waits for it to complete, and finally merges all entity modifications into the master's history stream.

The process method sets the appropriate history stream, deep copies the input face, facets the copied face, moves the facet attribute from the copy to the input face, and deletes the copied face.

Important details to notice are setting the distribution flag on the target stream to TRUE, which allows the stream to log entity changes from multiple streams (mixed streams), and using a mutex via a CRITICAL_BLOCK to remove concurrency while moving the facet data attribute from the copy face to the input face.

Initializing the Thread Manager Example

The following example demonstrates the initialization of the thread manager and additionally incorporates the use of the code that adds all numbers up to one million as shown earlier.

int start_acis() 
{
    return api_start_modeller(0).ok();
}
 
int stop_acis()
{
    return api_stop_modeller().ok();
}
 
int main()
{
    int num_threads = 2;
 
    start_acis();
 
    unlock_license();
 
    thread_work_base::initialize( num_threads, start_acis, stop_acis );
 
    my_thread_work thread_work;
 
    int total = thread_work.run();
 
    printf( "Adding all numbers equals %d\n", total );
 
    thread_work_base::terminate();
 
    stop_acis();
 
    return 0;
}

The master thread starts the modeler, unlocks the ACIS license, and initializes the thread manager with the number of threads to use, as well as initialization and termination routines. Each thread created calls the initialize routine and then waits to take work from the queue. The master thread then constructs a thread work object and calls its run method. This schedules the work to be processed by available threads by placing it in the queue. The master thread then waits for the work to complete, prints out the calculated answer, terminates the thread manager, thereby terminating all worker threads, and stops the modeler.
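
The life cycle described above (workers call an initialization routine, wait on a queue, and call a termination routine when the manager shuts down) can be sketched with standard C++ primitives. The class below is a hypothetical miniature, not the ACIS thread manager: mini_thread_manager and sum_with_manager are invented names, and the empty lambdas passed as init and term stand in for routines such as start_acis and stop_acis.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical miniature thread manager; not the ACIS implementation.
class mini_thread_manager {
public:
    using routine = std::function<void()>;

    // Each worker runs init once, serves the queue, then runs term once.
    mini_thread_manager(int num_threads, routine init, routine term) {
        assert(num_threads > 0);
        for (int i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this, init, term] {
                init();
                serve_queue();
                term();
            });
        }
    }

    // Terminating the manager terminates all worker threads.
    ~mini_thread_manager() {
        {
            std::lock_guard<std::mutex> lock(guard_);
            stopping_ = true;
        }
        wake_workers_.notify_all();
        for (std::thread& worker : workers_) worker.join();
    }

    // Place one unit of work in the queue.
    void run(routine work) {
        {
            std::lock_guard<std::mutex> lock(guard_);
            queue_.push(std::move(work));
            ++pending_;
        }
        wake_workers_.notify_one();
    }

    // Block the calling (master) thread until all queued work is done.
    void sync() {
        std::unique_lock<std::mutex> lock(guard_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void serve_queue() {
        for (;;) {
            routine work;
            {
                std::unique_lock<std::mutex> lock(guard_);
                wake_workers_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                work = std::move(queue_.front());
                queue_.pop();
            }
            work();
            {
                std::lock_guard<std::mutex> lock(guard_);
                --pending_;
            }
            all_done_.notify_all();
        }
    }

    std::mutex guard_;
    std::condition_variable wake_workers_, all_done_;
    std::queue<routine> queue_;
    std::vector<std::thread> workers_;
    int pending_ = 0;
    bool stopping_ = false;
};

// Usage sketch: add 1..upto in chunks, as a stand-in for real work.
inline long long sum_with_manager(int num_threads, int upto) {
    std::atomic<long long> total{0};
    mini_thread_manager manager(num_threads, [] {}, [] {});
    const int chunk = 1000;
    for (int first = 1; first <= upto; first += chunk) {
        const int last = std::min(upto, first + chunk - 1);
        manager.run([&total, first, last] {
            long long partial = 0;
            for (int n = first; n <= last; ++n) partial += n;
            total += partial;
        });
    }
    manager.sync();
    return total.load();
}
```

The sketch accumulates into a 64-bit total because the sum of the integers from one to one million is 500000500000, which does not fit in a 32-bit int.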


Internal Use of TS-ACIS

The Entity-Point Distance algorithm and the Faceter can use multithreading. For information on using Entity-Point Distance functionality with multithreading, refer to Using EPD in Multithreaded Mode. For information on the Faceter's use of multithreading, refer to Multithreaded Faceting.

Upcoming versions of the ACIS modeler will take advantage of multithreading to enhance the performance of other algorithms. The implementations will be based on the thread manager described herein. Proper initialization of the thread manager is required for the operations to utilize multiple processors.

Scheme Extensions

Multithreading can be exercised in ACIS using the Scheme AIDE. Four Scheme extensions exist to initialize multithreaded operations, query the number of worker threads or the total number of threads, and terminate multithreaded operations.

Using Native Threads with TS-ACIS

Using the ACIS thread manager is optional, as the fundamental components of TS-ACIS are compatible with native threads, such as Windows threads or POSIX threads. Application developers are free to develop their own thread-management logic or use existing implementations, such as Threading Building Blocks or OpenMP. Custom implementations are subject to the same rules as the ACIS thread manager: all threads must initialize and terminate the modeler appropriately, and must only work with independent data. Additionally, extra care must be taken to interact with ACIS correctly.

ACIS has two modes: serial and thread-safe. In serial mode, the default, no distinction whatsoever is made between threads. In thread-safe mode, each thread is assigned a unique identifier and given access to thread-local storage, which allows threads to operate concurrently without affecting each other. However, this special treatment impacts performance, so thread-safe mode should only be enabled in parallel code regions, where concurrent operations actually take place. Use thread_safe_region_begin to enable thread-safe mode at the beginning of a parallel region and thread_safe_region_end to disable it at the end. This ensures the proper handling of multiple threads and minimizes the overhead of thread-local-storage accesses.

Note: Toggling the mode is handled transparently by the ACIS thread manager.
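
When managing the mode by hand, one way to keep the begin and end calls paired is a scope guard. The sketch below is an assumption-laden illustration rather than an ACIS facility: thread_safe_region (the class), region_begin, region_end, and thread_safe_mode are hypothetical stand-ins so that the code compiles without ACIS, and real code would call thread_safe_region_begin and thread_safe_region_end in their place. It also assumes that errors propagate as C++ exceptions with normal stack unwinding, which may not match the ACIS exception macros.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

namespace sketch {

// Hypothetical stand-ins so the sketch compiles without ACIS. Real code would
// call thread_safe_region_begin() and thread_safe_region_end() instead.
std::atomic<bool> thread_safe_mode{false};

inline void region_begin() { thread_safe_mode = true; }
inline void region_end() {
    assert(thread_safe_mode.load());  // a region must have been begun
    thread_safe_mode = false;
}

// Hypothetical guard: enables the mode on construction and disables it on
// destruction, including when an exception unwinds the scope.
class thread_safe_region {
public:
    thread_safe_region() { region_begin(); }
    ~thread_safe_region() { region_end(); }
    thread_safe_region(const thread_safe_region&) = delete;
    thread_safe_region& operator=(const thread_safe_region&) = delete;
};

// Reports the mode observed inside the region; throws on request to show
// that the mode is still switched off during unwinding.
inline bool mode_inside_region(bool fail) {
    thread_safe_region region;
    const bool observed = thread_safe_mode.load();
    if (fail) throw std::runtime_error("simulated failure");
    return observed;
}

}  // namespace sketch
```

With such a guard, any worker threads launched inside the scope must still be joined before the guard is destroyed, so that no concurrent operation runs after the mode has been disabled.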


POSIX Threads Example

The following example uses Pthreads (POSIX threads) with thread-safe ACIS. It performs ACIS operations concurrently on the main thread and a worker (Pthread) thread. Note:

  1. In termination, the outcome object returned by api_stop_modeller must destruct before the base is terminated, because the outcome destructor accesses thread-local storage that is deleted when the base terminates. This is achieved in terminate_acis in this example by scoping the outcome object such that it is destructed before terminate_base is called.
  2. Thread-safe ACIS mode must be enabled with thread_safe_region_begin before any concurrent operation is performed, including modeler initialization and termination.
  3. Operations whose results are merged back into the main thread's history stream with merge_child_state must always use history streams created by the main thread.
#include <stdio.h>
#include <pthread.h>
 
// base headers
#include "base.hxx"
#include "safe.hxx"
 
// kernel headers
#include "acis.hxx"
#include "kernapi.hxx"
#include "lists.hxx"
#include "alltop.hxx"
#include "ckoutcom.hxx"
#include "bulletin.hxx"
 
// cstr headers
#include "cstrapi.hxx"
 
// initialize_acis
// Returns the level of initialization:
//  1 for base, 2 for modeller, 0 for failure.
int 
initialize_acis()
{
    int level = 0;
 
    // Base initialization will, among other things,
    //  create thread-local storage and a thread ID.
    if ( initialize_base() )
    {
	++level;
	outcome result = api_start_modeller(0);
	if ( result.ok() )
	{
	    ++level;
	    // Add common ACIS state settings here.
	}
    }
 
    return level;
}
 
// terminate_acis
// Terminates to the specified level:
//  1 for base, 2 for modeller, 0 for none.
// Returns 0 for success.
int 
terminate_acis( int level)
{
    if (level>1)
    {
	// This outcome object must destruct before terminate_base is called,
	//  because its destructor accesses thread-local storage.
	outcome result = api_stop_modeller();
	if (result.ok())
	{
	    --level;
	}
    }
 
    if (level>0)
    {
	if (terminate_base())
	{
	    --level;
	}
    }
 
    return level;
}
 
// Create 100 wiggles and add them to the given list.
outcome 
do_work( ENTITY_LIST& list )
{
    API_BEGIN
	for ( int i = 0; i < 100; ++i )
	{
	    BODY* body = NULL;
	    result = api_wiggle( 20.0, 20.0, 10.0, 2, 2, 2, 2, TRUE, body );
	    check_outcome( result );
	    list.add( body );
	}
    API_END
 
    return result;
}
 
// Thread-specific operational data object.
struct thread_data
{
    HISTORY_STREAM* stream;
    ENTITY_LIST list;
    outcome result;
    thread_data() : stream(NULL) {}
};
 
// Function called when a thread is launched.
void* 
thread_func( void* arg )
{
    // Each thread must initialize the modeler!
    int level = initialize_acis();
 
    if ( level > 1 )
    {
	EXCEPTION_BEGIN
	    outcome result;
	    // Thread specific data passed as argument.
	    thread_data* data = (thread_data*)arg;
	    HISTORY_STREAM* stream = NULL;
	EXCEPTION_TRY
	    // Save the thread's history stream.
	    result = api_get_default_history( stream );
	    check_outcome( result );
 
	    // Set the specified stream.
	    result = api_set_default_history( data->stream );
	    check_outcome( result );
 
	    // Perform the operation.
	    result = do_work( data->list );
	    check_outcome( result );
	EXCEPTION_CATCH_TRUE
	    data->result = result;
	    if (stream)
	    {
		// Reset the thread's history stream.
		api_set_default_history( stream );
	    }
	EXCEPTION_END_NO_RESIGNAL
    }
 
    // Terminate the modeler since this thread is exiting.
    terminate_acis( level );
 
    return NULL;
}
 
int 
main( int argc, char** argv )
{
    // Initialize the modeler on the main thread (thread 0).
    int level = initialize_acis();
 
    if ( level > 1 )
    {
	EXCEPTION_BEGIN
	    outcome result;
	    thread_data data;
	EXCEPTION_TRY
	    // Add a history stream owned by the main thread.
	    data.stream = ACIS_NEW HISTORY_STREAM();
	    ENTITY_LIST list;
 
	    EXCEPTION_BEGIN
		pthread_t thread;
	    EXCEPTION_TRY
		// Enable thread-safe ACIS mode.
		thread_safe_region_begin();
 
		// Launch the thread.
		pthread_attr_t attr;
		pthread_attr_init( &attr );
		pthread_create( &thread, &attr, thread_func, (void*)&data );
		// The attribute object is no longer needed once the thread exists.
		pthread_attr_destroy( &attr );
 
		// Concurrent work on the main thread.
		result = do_work( list );
		check_outcome( result );
 
	    EXCEPTION_CATCH_TRUE
		// Synchronize threads.
		pthread_join( thread, NULL );
 
		// Disable thread-safe ACIS mode.
		thread_safe_region_end();
	    EXCEPTION_END
 
	    // Merge the thread's history stream into parent's history stream.
	    result = merge_child_state( data.stream );
	    check_outcome( result );
 
	    list.add( data.list );
	    printf( "list.count() = %d\n", list.count() );
 
	EXCEPTION_CATCH_TRUE
	    // Clean-up thread specific data.
	    ACIS_DELETE data.stream;
	EXCEPTION_END_NO_RESIGNAL
    }
 
    // Terminate the modeler.
    return terminate_acis( level );
}
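
The scoping rule from note 1 can be demonstrated in isolation. The following sketch uses hypothetical stand-ins throughout (fake_outcome, stop_modeller, terminate_base, and the two flags are not ACIS names): a flag plays the role of the thread-local storage owned by the base, and the destructor records whether that storage was still alive when it ran.

```cpp
#include <cassert>

namespace lifetime_sketch {

bool base_alive = false;         // stand-in for thread-local storage
bool destructed_safely = false;  // set by the destructor below

// Stand-in for an outcome-like object whose destructor touches storage
// owned by the base.
struct fake_outcome {
    bool ok = true;
    ~fake_outcome() { destructed_safely = base_alive; }
};

inline fake_outcome stop_modeller() { return fake_outcome(); }
inline bool terminate_base() { base_alive = false; return true; }

// Mirrors terminate_acis: the inner braces end the result's lifetime
// before the base goes away.
inline bool terminate_in_order() {
    base_alive = true;
    destructed_safely = false;
    {
        fake_outcome result = stop_modeller();
        if (!result.ok) return false;
    }  // result destructs here, while the base is still alive
    terminate_base();
    return destructed_safely;
}

// Counter-example: the result outlives the base.
inline bool terminate_out_of_order() {
    base_alive = true;
    destructed_safely = true;
    {
        fake_outcome result = stop_modeller();
        terminate_base();
    }  // result destructs here, after the base was terminated
    return destructed_safely;
}

}  // namespace lifetime_sketch
```

Here terminate_in_order returns true and terminate_out_of_order returns false, which is the difference the inner scope in terminate_acis is there to guarantee.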