Multi-threaded Encoder/Decoder

group libjxl_threads

Additional multi-threaded implementations for the parallel runner.

Defines

JXL_PARALLEL_RET_RUNNER_ERROR

General error returned by the JxlParallelRunInit function to indicate an error.

Typedefs

typedef int JxlParallelRetCode

API for running data operations in parallel in a multi-threaded environment. This module allows the JPEG XL caller to define their own way of creating and assigning threads.

The JxlParallelRunner function type defines a parallel data processing runner that may be implemented by the caller to allow the library to process in multiple threads. The multi-threaded processing in this library only requires running the same function over each number in a range, possibly running each call in a different thread. The JPEG XL caller is responsible for implementing this logic using the thread APIs available in their system. For convenience, a C++ implementation based on std::thread is provided in jpegxl/parallel_runner_thread.h (part of the jpegxl_threads library).

Thread pools usually store small numbers of heterogeneous tasks in a queue. When tasks are identical or differ only by an integer input parameter, it is much faster to store just one function of an integer parameter and call it for each value. Conventional vector-of-tasks can be run in parallel using a lambda function adapter that simply calls task_funcs[task].
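The adapter mentioned above can be sketched as follows. This is a minimal illustration, not library code: RunOnRange and RunTasks are hypothetical names, and RunOnRange runs serially here as a stand-in for a runner that may spread the calls over several threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a runner: calls func(value) once for every value
// in [begin, end). A real runner may issue these calls from several threads.
template <typename Func>
void RunOnRange(uint32_t begin, uint32_t end, const Func& func) {
  assert(begin <= end);
  for (uint32_t value = begin; value < end; ++value) func(value);
}

// Runs a conventional list of heterogeneous tasks through the range-based
// interface and returns how many tasks were dispatched.
inline size_t RunTasks(const std::vector<std::function<void()>>& task_funcs) {
  // The adapter: a single function of an integer parameter that indexes
  // into the task list and calls the selected task.
  auto adapter = [&task_funcs](uint32_t task) { task_funcs[task](); };
  RunOnRange(0, static_cast<uint32_t>(task_funcs.size()), adapter);
  return task_funcs.size();
}
```

Only the integer index crosses the runner interface, so the runner itself never needs to know what the individual tasks do.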

If no multi-threading is desired, a NULL value of JxlParallelRunner will use an internal implementation without multi-threading.

JxlParallelRetCode is the return code used by the JxlParallel* functions. A value of 0 means success and any other value means error. The special value JXL_PARALLEL_RET_RUNNER_ERROR can be used by the runner to indicate any other error.

typedef JxlParallelRetCode (*JxlParallelRunInit)(void *jpegxl_opaque, size_t num_threads)

Parallel run initialization callback. See JxlParallelRunner for details.

This function MUST be called by the JxlParallelRunner only once, on the same thread that called JxlParallelRunner, before any parallel execution. The purpose of this call is to provide the maximum number of threads that the JxlParallelRunner will use, which can be used by JPEG XL to allocate per-thread storage if needed.

Param jpegxl_opaque:

the jpegxl_opaque handle provided to JxlParallelRunner() must be passed here.

Param num_threads:

the maximum number of threads. This value must be positive.

Return:

0 if the initialization process was successful.

Return:

an error code if there was an error, which should be returned by JxlParallelRunner().

typedef void (*JxlParallelRunFunction)(void *jpegxl_opaque, uint32_t value, size_t thread_id)

Parallel run data processing callback. See JxlParallelRunner for details.

This function MUST be called once for every number in the range [start_range, end_range) (including start_range but not including end_range), passing this number as the value. Calls for different values may be executed from different threads in parallel.

Param jpegxl_opaque:

the jpegxl_opaque handle provided to JxlParallelRunner() must be passed here.

Param value:

the number in the range [start_range, end_range) of the call.

Param thread_id:

the thread number where this function is being called from. This must be lower than the num_threads value passed to JxlParallelRunInit.
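To illustrate why num_threads and thread_id are passed to these callbacks, here is a hedged sketch of a callback pair that keeps per-thread storage. It is not how the library implements its callbacks: PerThreadSums, InitSums, AccumulateValue, TotalSum and kSketchInitError are hypothetical names, and int stands in for JxlParallelRetCode as in the typedef above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical non-zero error code used only by this sketch.
constexpr int kSketchInitError = 1;

// Hypothetical state that jpegxl_opaque points to in this sketch.
struct PerThreadSums {
  std::vector<uint64_t> sums;  // one slot per thread, sized in InitSums
};

// Has the shape of JxlParallelRunInit: allocates one slot per thread.
int InitSums(void* jpegxl_opaque, size_t num_threads) {
  if (num_threads == 0) return kSketchInitError;  // must be positive
  static_cast<PerThreadSums*>(jpegxl_opaque)->sums.assign(num_threads, 0);
  return 0;
}

// Has the shape of JxlParallelRunFunction: each thread writes only to the
// slot selected by its own thread_id, so no locking is needed.
void AccumulateValue(void* jpegxl_opaque, uint32_t value, size_t thread_id) {
  auto* state = static_cast<PerThreadSums*>(jpegxl_opaque);
  assert(thread_id < state->sums.size());
  state->sums[thread_id] += value;
}

// Combines the per-thread results once all calls have completed.
uint64_t TotalSum(const PerThreadSums& state) {
  return std::accumulate(state.sums.begin(), state.sums.end(), uint64_t{0});
}
```

Because thread_id is always lower than num_threads, indexing the per-thread storage by thread_id stays in bounds as long as the runner honors the contract.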

typedef JxlParallelRetCode (*JxlParallelRunner)(void *runner_opaque, void *jpegxl_opaque, JxlParallelRunInit init, JxlParallelRunFunction func, uint32_t start_range, uint32_t end_range)

JxlParallelRunner function type. A parallel runner implementation can be provided by a JPEG XL caller to allow running computations in multiple threads. This function must call the initialization function init in the same thread that called it and then call the passed func once for every number in the range [start_range, end_range) (including start_range but not including end_range), possibly from multiple threads in parallel.

The JxlParallelRunner function does not need to be re-entrant. This means that, within the same decoder or encoder instance, the library will not call the same JxlParallelRunner function with the same runner_opaque parameter from inside either init or func. However, a single decoding or encoding instance may call the provided JxlParallelRunner multiple times for different parts of the decoding or encoding process.

Return:

0 if the init call succeeded (returned 0) and no other error occurred in the runner code.

Return:

JXL_PARALLEL_RET_RUNNER_ERROR if an error occurred in the runner code, for example, setting up the threads.

Return:

the return value of init() if non-zero.
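The contract above can be sketched with a small caller-side runner built on std::thread. This is an illustration under stated assumptions, not the implementation shipped with the library: the typedefs are copied locally from this page so the block compiles without the library headers, and SimpleRunnerState, SimpleThreadRunner, DemoSums, DemoInit, DemoFunc and SumRange are hypothetical names. The sketch splits the range into one contiguous slice per thread and starts fresh threads on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Local copies of the typedefs documented above.
typedef int JxlParallelRetCode;
typedef JxlParallelRetCode (*JxlParallelRunInit)(void*, size_t);
typedef void (*JxlParallelRunFunction)(void*, uint32_t, size_t);

// Hypothetical runner state passed as runner_opaque.
struct SimpleRunnerState {
  size_t num_threads;
};

JxlParallelRetCode SimpleThreadRunner(void* runner_opaque, void* jpegxl_opaque,
                                      JxlParallelRunInit init,
                                      JxlParallelRunFunction func,
                                      uint32_t start_range,
                                      uint32_t end_range) {
  const auto* state = static_cast<const SimpleRunnerState*>(runner_opaque);
  const size_t num_threads = std::max<size_t>(1, state->num_threads);

  // init is called exactly once, on the calling thread, before any func call.
  const JxlParallelRetCode ret = init(jpegxl_opaque, num_threads);
  if (ret != 0) return ret;  // forward the non-zero return value of init
  if (start_range >= end_range) return 0;

  // One contiguous slice of the range per thread, computed in 64 bits.
  const uint64_t total = uint64_t{end_range} - start_range;
  const uint64_t chunk = (total + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; ++t) {
    const uint64_t begin = start_range + t * chunk;
    const uint64_t end = std::min<uint64_t>(begin + chunk, end_range);
    if (begin >= end) break;
    workers.emplace_back([=] {
      for (uint64_t v = begin; v < end; ++v) {
        func(jpegxl_opaque, static_cast<uint32_t>(v), t);  // t < num_threads
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return 0;
}

// Demo callbacks standing in for the library side (hypothetical).
struct DemoSums {
  std::vector<uint64_t> per_thread;
};

JxlParallelRetCode DemoInit(void* opaque, size_t num_threads) {
  static_cast<DemoSums*>(opaque)->per_thread.assign(num_threads, 0);
  return 0;
}

void DemoFunc(void* opaque, uint32_t value, size_t thread_id) {
  auto* sums = static_cast<DemoSums*>(opaque);
  assert(thread_id < sums->per_thread.size());
  sums->per_thread[thread_id] += value;
}

// Sums the integers in [start, end) through the runner.
uint64_t SumRange(uint32_t start, uint32_t end, size_t num_threads) {
  SimpleRunnerState runner{num_threads};
  DemoSums sums;
  if (SimpleThreadRunner(&runner, &sums, DemoInit, DemoFunc, start, end) != 0) {
    return 0;
  }
  return std::accumulate(sums.per_thread.begin(), sums.per_thread.end(),
                         uint64_t{0});
}
```

A production runner would normally reuse its threads between calls and would also handle failures while setting up the threads by returning JXL_PARALLEL_RET_RUNNER_ERROR; both are left out here to keep the sketch short.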

Functions

JXL_THREADS_EXPORT JxlParallelRetCode JxlResizableParallelRunner(void *runner_opaque, void *jpegxl_opaque, JxlParallelRunInit init, JxlParallelRunFunction func, uint32_t start_range, uint32_t end_range)

Implementation of JxlParallelRunner that can be used to enable multithreading when using the JPEG XL library. This uses std::thread internally and related synchronization functions. The number of threads created can be changed after creation of the thread pool; the threads (including the main thread) are re-used for every ResizableParallelRunner::Runner call. Only one concurrent JxlResizableParallelRunner call per instance is allowed at a time.

This is a scalable, lower-overhead thread pool runner, especially suitable for data-parallel computations in the fork-join model, where clients need to know when all tasks have completed.

Compared to the implementation in thread_parallel_runner.h, this implementation is tuned for execution on lower-powered systems, including for example ARM CPUs with big.LITTLE computation models.

Parallel runner internally using std::thread. Use as JxlParallelRunner.

JXL_THREADS_EXPORT void *JxlResizableParallelRunnerCreate(const JxlMemoryManager *memory_manager)

Creates the runner for JxlResizableParallelRunner. Use as the opaque runner. The runner will execute tasks on the calling thread until JxlResizableParallelRunnerSetThreads is called.

JXL_THREADS_EXPORT void JxlResizableParallelRunnerSetThreads(void *runner_opaque, size_t num_threads)

Changes the number of threads for JxlResizableParallelRunner.

JXL_THREADS_EXPORT uint32_t JxlResizableParallelRunnerSuggestThreads(uint64_t xsize, uint64_t ysize)

Suggests a number of threads to use for an image of given size.

JXL_THREADS_EXPORT void JxlResizableParallelRunnerDestroy(void *runner_opaque)

Destroys the runner created by JxlResizableParallelRunnerCreate.

JXL_THREADS_EXPORT JxlParallelRetCode JxlThreadParallelRunner(void *runner_opaque, void *jpegxl_opaque, JxlParallelRunInit init, JxlParallelRunFunction func, uint32_t start_range, uint32_t end_range)

Implementation of JxlParallelRunner that can be used to enable multithreading when using the JPEG XL library. This uses std::thread internally and related synchronization functions. The number of threads created is fixed at construction time and the threads are re-used for every ThreadParallelRunner::Runner call. Only one concurrent JxlThreadParallelRunner call per instance is allowed at a time.

This is a scalable, lower-overhead thread pool runner, especially suitable for data-parallel computations in the fork-join model, where clients need to know when all tasks have completed.

This thread pool can efficiently load-balance millions of tasks using an atomic counter, thus avoiding per-task virtual or system calls. With 48 hyperthreads and 1M tasks that add to an atomic counter, overall runtime is 10-20x higher when using std::async, and ~200x for a queue-based thread pool.

Parallel runner internally using std::thread. Use as JxlParallelRunner.
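The atomic-counter idea can be sketched as below. This is an assumption-laden illustration of the general technique rather than the runner's actual code: RunWithAtomicCounter and SumWithAtomicCounter are hypothetical names, the threads are created per call instead of being pooled, and letting the calling thread take part in the work is a choice made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Calls func(task, thread_id) once for every task in [0, num_tasks).
// Workers claim the next task index with a single atomic fetch_add, so no
// task queue, lock, or per-task allocation is involved.
template <typename Func>
void RunWithAtomicCounter(uint32_t num_tasks, size_t num_threads,
                          const Func& func) {
  assert(num_threads > 0);
  // 64-bit counter so that overshooting past num_tasks cannot wrap around.
  std::atomic<uint64_t> next_task{0};
  auto worker = [&](size_t thread_id) {
    for (;;) {
      const uint64_t task = next_task.fetch_add(1, std::memory_order_relaxed);
      if (task >= num_tasks) break;
      func(static_cast<uint32_t>(task), thread_id);
    }
  };
  std::vector<std::thread> threads;
  for (size_t t = 1; t < num_threads; ++t) {
    threads.emplace_back([&worker, t] { worker(t); });
  }
  worker(0);  // the calling thread participates as thread 0
  for (std::thread& th : threads) th.join();
}

// Demo: sums the task indices using one accumulator slot per thread.
uint64_t SumWithAtomicCounter(uint32_t num_tasks, size_t num_threads) {
  std::vector<uint64_t> per_thread(num_threads, 0);
  RunWithAtomicCounter(num_tasks, num_threads,
                       [&per_thread](uint32_t task, size_t thread_id) {
                         per_thread[thread_id] += task;
                       });
  return std::accumulate(per_thread.begin(), per_thread.end(), uint64_t{0});
}
```

Faster threads simply claim more indices, which is how this scheme balances load without assigning work to threads in advance.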

JXL_THREADS_EXPORT void *JxlThreadParallelRunnerCreate(const JxlMemoryManager *memory_manager, size_t num_worker_threads)

Creates the runner for JxlThreadParallelRunner. Use as the opaque runner.

JXL_THREADS_EXPORT void JxlThreadParallelRunnerDestroy(void *runner_opaque)

Destroys the runner created by JxlThreadParallelRunnerCreate.

JXL_THREADS_EXPORT size_t JxlThreadParallelRunnerDefaultNumWorkerThreads(void)

Returns a default num_worker_threads value for JxlThreadParallelRunnerCreate.