Memory Usage

This article mentions some of the classes which are described in greater detail here

Memory usage explained

Memory usage per lane depends on several parameters:

1. Input FASTQ files - the size of the input depends on the number of reads and their lengths, and on the number of records in each FASTQ file.

For example: a typical 100 bp (base pairs) Paired-end run has three reads - two of length 101, and one shorter read which contains the index, typically of length 7 or 9.

The behavior differs between versions of bcl2fastq:

For bcl2fastq version 1 (v1.8.4 for example):

bcl2fastq’s default output files contain 4,000,000 FASTQ records.

Consider the following files:

lane2_NoIndex_L002_R1_001.fastq.gz
lane2_NoIndex_L002_R2_001.fastq.gz
lane2_NoIndex_L002_R3_001.fastq.gz

We call these three files an input batch. The input batch contains 4,000,000 fragments.

The files of an input batch are read simultaneously. Loading such a batch for a 100 bp Paired-end run takes about 8 GB of RAM.

For bcl2fastq version 2 (v2.17 for example):

By default, bcl2fastq writes all FASTQ records into a single batch of large files.

You have 2 options:

1) Leave the files compressed as they are. In this case, the reader loads all of the large files into memory at once.

2) Uncompress the files before running IndeXeeker. In this case, the reader loads only part of the batch files at a time. The --reader-chunk-size parameter of the indexeeker-demultiplex.py script sets the size (in MB) to read from each file of the batch.

Consider the following files:

Undetermined_S0_L001_R1_001.fastq
Undetermined_S0_L001_R2_001.fastq
Undetermined_S0_L001_I1_001.fastq

The reader reads --reader-chunk-size MB from each of the files. We call this portion of the fragments an input batch.

The files of an input batch are read simultaneously. Loading such a batch for Paired-end (3 files) takes up exactly --reader-chunk-size * 3 MB of RAM.
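The rule above can be written as a small helper. This is only a sketch of the arithmetic stated in this section. The function name and its arguments are hypothetical and are not part of IndeXeeker.

```python
def batch_memory_mb(reader_chunk_size_mb, num_files=3):
    """Memory (in MB) taken by one input batch, following the rule above.

    reader_chunk_size_mb: the value passed as --reader-chunk-size.
    num_files: number of files in the batch (3 for Paired-end with an index read).
    """
    return reader_chunk_size_mb * num_files


# A chunk size of 650 MB with the three Paired-end files.
print(batch_memory_mb(650))  # prints 1950
```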

2. Max readers per lane - this parameter has the biggest effect on run time and memory usage.

A reader is equivalent to a single InputBatchDemultiplexer which handles a single input batch at a time.

The more readers per lane, the faster IndeXeeker runs. However, each such process takes up memory that depends on the input file size, as described previously, and on the number of samples and the output buffer size, as described later.

Note: actually, (2) could have been more accurately named “Max batch demultiplexers per lane”.

3. Output buffer size - after the demultiplexer determines which sample a fragment belongs to (based on the barcodes), the fragment is placed into the buffer of that sample, which is flushed when it is full.

We have not experimented much with this parameter, setting it to be 50,000 by default.

4. Max flushers per lane - the flush processes are responsible for writing the demultiplexed reads to the output files.

5. Shared memory process - a SyncManager process is responsible for sharing information between processes. The shared data that consumes the most memory is the output buffers, which are filled by the demultiplexers and flushed by the flushers.

The memory consumption of this process is highly variable, since it depends on the number of other processes, as well as the number of samples. For example:

A small number of samples can cause lock contention on the output files, so full buffers pile up in the queue and the memory use of the manager grows.
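The output buffers described in (3) and (5) can be pictured with the single-process sketch below. It is not IndeXeeker's implementation: the class name, the flush callback, and the assumption that the buffer size counts fragments are all hypothetical, and the real buffers are shared between processes through the manager.

```python
from collections import defaultdict


class SampleBuffers:
    """Hypothetical per-sample output buffers that are flushed when full."""

    def __init__(self, buffer_size=50_000, flush=None):
        # Assumption: buffer_size is a number of fragments per sample.
        self.buffer_size = buffer_size
        # flush(sample, fragments) stands in for handing a full buffer to a flusher.
        self.flush = flush or (lambda sample, fragments: None)
        self.buffers = defaultdict(list)

    def add(self, sample, fragment):
        """Place a demultiplexed fragment into the buffer of its sample."""
        buf = self.buffers[sample]
        buf.append(fragment)
        if len(buf) >= self.buffer_size:
            self.flush(sample, buf)
            self.buffers[sample] = []  # start a fresh buffer for this sample

    def flush_all(self):
        """Flush whatever is left at the end of the run."""
        for sample, buf in self.buffers.items():
            if buf:
                self.flush(sample, buf)
        self.buffers.clear()
```

With few samples, each buffer fills quickly and many full buffers compete for the same output files, which matches the queue build-up described above.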

Examples

The following table shows examples of memory use for the different processes and for IndeXeeker as a whole. The examples apply to the output of bcl2fastq version 1, or to version 2 if the user sets --reader-chunk-size to about 650 MB (approximately the size of each file).

The parameters which were the same across all of these runs:

  • a single lane
  • output buffer size = 50,000
  • max readers = 10
  • max flushers = 16

Run type | Read lengths | Num. of samples | Manager mem. (GB) | Reader mem. (GB) | Flusher mem. (GB) | Total mem. (GB)
---------|--------------|-----------------|-------------------|------------------|-------------------|----------------
SR 60    | 61, 11       | 4               | 5.1-7.7           | 2.7              | 0.15              | 36
SR 60    | 61, 11       | 12              | 3.3-6.3           | 2.9              | 0.15              | 38
SR 60    | 61, 11       | 48              | 4.2-7.2           | 3.9              | 0.15              | 48
PE 125   | 126, 9, 126  | 2               | 10-13             | 5.1              | 0.25              | 68
PE 125   | 126, 7, 126  | 10              | 2.4-5             | 5.5              | 0.25              | 64
PE 125   | 126, 7, 126  | 53              | 3.9-4.5           | 7.9              | 0.25              | 87
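
The totals in the table are roughly consistent with adding up the per-process figures: max readers times reader memory, plus max flushers times flusher memory, plus manager memory. The sketch below encodes that observation. It is an approximation read off the table, not a formula taken from IndeXeeker, and the function name is hypothetical.

```python
def estimate_total_memory_gb(max_readers, reader_gb, max_flushers, flusher_gb, manager_gb):
    """Rough total memory (GB) for one lane, summing the per-process figures."""
    return max_readers * reader_gb + max_flushers * flusher_gb + manager_gb


# First table row: SR 60, 4 samples, 10 readers, 16 flushers,
# manager memory between 5.1 and 7.7 GB.
low = estimate_total_memory_gb(10, 2.7, 16, 0.15, 5.1)
high = estimate_total_memory_gb(10, 2.7, 16, 0.15, 7.7)
print(f"{low:.1f}-{high:.1f} GB")  # prints 34.5-37.1 GB
```

The estimated range brackets the 36 GB reported for that run. The same check on the other rows lands within about 1 GB of the reported totals.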