History¶
The Basic Linear Algebra Subprograms (BLAS) are a set of functions designed to standardize and simplify numerical programming in an efficient and portable way. They are commonly used as building blocks in the design of higher-level linear algebra routines that demand performance and correctness.
The original version of BLAS, now called the reference implementation, was developed in Fortran77 starting in 1979 and has been updated with improvements and fixes up to the present. Highly optimized implementations of BLAS tuned for specific machine architectures are available for most development environments. These optimized implementations typically follow the same naming scheme and calling conventions as the reference implementation. Many optimized implementations also provide additional functions and options.
QML contains a high performance implementation of BLAS optimized for Qualcomm processors.
Levels¶
Functions in BLAS are organized in three levels.
| Level | Operations | Examples |
|---|---|---|
| BLAS-1 | vector | dot product
scale a vector
|
| BLAS-2 | matrix-vector | matrix-vector product
solve a linear system of equations
|
| BLAS-3 | matrix-matrix | matrix-matrix multiply
solve a linear system of equations
with multiple right-hand sides
|
The BLAS level indicates how many levels of nested loops are required to perform the operation. Scaling a vector requires a single loop iterating over the data. Computing a matrix-matrix product uses three nested loops (in the simplest implementation). The best performance is usually achieved by calling the highest level BLAS function possible to give the library the most oportunity for optimization.
Naming Conventions¶
Precision¶
Many functions can work with several types of data. Function names
are prefixed with the precision of the data they support.
For example, the function dgemm works with double precision
floating point values.
| Prefix | Precision |
|---|---|
| S | single |
| D | double |
| C | single complex |
| Z | double complex |
Matrix storage¶
BLAS can operate on matrices that are stored in several different dense formats. After the precision prefix is a secondary prefix that indicates the storage format of the matrix.
| Prefix | Matrix type |
|---|---|
| GE | general |
| SY | symmetric |
| HE | hermitian |
| TR | triangular |
| GB | general banded |
| SB | symmetric banded |
| HB | hermitian banded |
| TB | triangle banded |
| SP | symmetric packed |
| HP | hermitian packed |
| TP | triangle packed |
Fortran77 Interface¶
Because the reference BLAS implementation is written in Fortran77 the
interface to BLAS library functions follows Fortran77 conventions.
The most important difference from C is that all arguments to functions
are passed by reference. In the C function prototypes this means
every argument is a pointer, even simple scalar values.
Functions that return complex values use a void return type and
instead put their output into a result pointer.
Options¶
Many functions need to be given information such as whether to use an untransposed or transposed version of a matrix, or whether a triangular matrix is upper or lower triangular. In the Fortran77 interface these types of arguments are passed as character strings. Only the first character in the string is used and is interpreted using the following table.
| Option | Character | Interpretation |
|---|---|---|
| TRANSx | N | Non-transpose |
| T | Transpose | |
| C | Conjugate transpose | |
| UPLO | U | Upper triangular |
| L | Lower triangular | |
| DIAG | N | Non-unit diagonal |
| U | Unit diagonal | |
| SIDE | L | Left side |
| R | Right side |
Column-Major¶
Following Fortran, all matrices are assumed to be stored in column-major order when using the Fortran77 interface. This means that values linearly stored in increasing addresses in memory are interpreted as values in a column of the matrix going from top to bottom. Every matrix also has a leading dimension. For a matrix A the leading dimension is passed as the integer LDA. The values of a row of matrix A are stored in memory locations spaced apart by LDA values.
For example, a 3x3 matrix with LDA=10 in column-major order has values stored in the following memory locations.

One-based Indexing¶
The functions AMAX and AMIN return indices into arrays. Fortran uses 1-based indexing where an array with N elements has indices 1 through N. The Fortran BLAS interface follows the Fortran convention and uses 1-based indexing.
Example¶
For example, suppose we wish to perform a simple matrix multiply:

Here is the Fortran77 prototype for dgemm:
void dgemm(const char *TRANSA, const char *TRANSB, const qml_long *M,
const qml_long *N, const qml_long *K, const double *ALPHA,
const double *A, const qml_long *LDA, const double *B,
const qml_long *LDB, const double *BETA, double *C,
const qml_long *LDC);
The funtion dgemm is a double precision (d) general (ge) matrix-matrix
multiply. We do not wish to transpose A or B, so we pass "N"
for TRANSA and TRANSB. The function actually performs the
operation:

We choose
and
. Even though ALPHA and
BETA are scalar values, we still need to pass the address of the values
to use the Fortran interface.
C Interface¶
CBLAS is a lightweight wrapper around BLAS that provides an interface
compatible with the C language. CBLAS functions are prefixed
with cblas_ but otherwise follow the same naming convention as BLAS.
Passing arguments¶
In the C interface, vectors and matrices are passed by pointer while
non-complex scalar values are passed by value. All complex arguments
including scalar values are passed by reference. Functions that would
return complex values are recast as returning void,
have a final extra pointer argument result that receives the result
value, and are renamed to include the suffix _sub.
Options such as non-transpose/transpose are passed by value using an enumerated type
defined in the qml_cblas.h header.
enum CBLAS_TRANSPOSE {CblasNoTrans = 111, CblasTrans = 112, CblasConjTrans = 113};
enum CBLAS_UPLO {CblasUpper = 121, CblasLower = 122};
enum CBLAS_DIAG {CblasNonUnit = 131, CblasUnit = 132};
enum CBLAS_SIDE {CblasLeft = 141, CblasRight = 142};
Aliasing¶
Pointers in the C language are generally allowed to
alias to the same region of memory. The C interface to BLAS allows
aliasing of arguments declared const (input arguments) but does
not allow aliasing of output arguments (any argument not declared const).
For example, passing dgemm the same memory location for A and
B is allowed because both arguments are input arguments. Passing
dgemm the same memory location for A and C is not allowed
because C is an output argument.
Row- or Column-Major¶
The CBLAS interface allows either row- or column-major matrix storage order. Functions that accept matrices start with an argument that indicates whether row- or column-major ordering should be used.
enum CBLAS_ORDER {CblasRowMajor = 101, CblasColMajor = 102};
Row-major ordering means that values linearly stored in increasing addresses in memory are interpreted as values in a row of the matrix going from left to right. Every matrix also has a leading dimension. For a matrix A the leading dimension is passed as the integer LDA. The values of a column of matrix A are stored in memory locations spaced apart by LDA values.
For example, a 3x3 matrix with LDA=10 in row-major order has values stored in the following memory locations.

Zero-based Indexing¶
The functions AMAX and AMIN return indices into arrays. C uses 0-based indexing where an array with N elements has indices 0 through N-1. The CBLAS interface follows the C convention and uses 0-based indexing.
Performance Considerations¶
Using row-major ordering has no extra cost for most BLAS functions. Some BLAS-2 functions involving conjugate transpose require a small amount of extra processing and so will see degraded performance compared to column-major ordering. See Legacy BLAS: C Interface to the Legacy BLAS for more information.
Example¶
For example, suppose we wish to perform a simple matrix multiply:

Here is the CBLAS prototype for dgemm:
void cblas_dgemm(const CBLAS_ORDER ORDER, const CBLAS_TRANSPOSE TRANSA,
const CBLAS_TRANSPOSE TRANSB, const qml_long M,
const qml_long N, const qml_long K, const double ALPHA,
const double *A, const qml_long LDA, const double *B,
const qml_long LDB, const double BETA, double *C,
const qml_long LDC);
The funtion cblas_dgemm is the CBLAS interface (cblas_)
for a double precision (d) general (ge) matrix-matrix
multiply. We are storing our matrices in row-major order, so we choose
CblasRowMajor for ORDER. We do not wish to transpose A or B,
so we pass CblasNoTrans for TRANSA and TRANSB. The function actually performs the
operation:

We choose
and
, and pass the value
1.0 for ALPHA and 0.0 for BETA.
Storage Format Examples¶
The following examples show how values are laid out in memory using the different storage formats. The examples are shown for a 4x4 matrix in column-major ordering with LDA=10.
Note that in many formats not all input values are referenced. For example, in SY format with UPLO=”L” only the lower triangular part of the matrix is referenced, the upper triangular part is assumed to be symmetric. In some formats elements of the matrix are assumed to be a constant value and are not referenced in memory. For example, in TR format with DIAG=”U” the diagonal elements are assumed to be 1 and are never referenced.
References¶
The official homepage of BLAS is on Netlib.
Netlib also publishes a handy cheat sheet for quick reference to all BLAS functions.










