Example Programs

All of the example code is located in the examples/ subdirectory under the installation prefix.

Minimal Example

This example shows the minimal code needed to call a QML function.

This example constructs two 1024x1024 matrices, A and B, filled with ones. It then calls sgemm to multiply them and store the 1024x1024 result in matrix C. The program prints the first element of C, which should be 1024 because every entry of C is a sum of 1024 products of 1.0 and 1.0.

#include <qml.h>

#include <cstdint>
#include <iostream>

using namespace std;

int main()
{
    const uint32_t matrixSize = 1024*1024;
    
    float *A = new float[matrixSize];
    float *B = new float[matrixSize];
    float *C = new float[matrixSize];
    
    for(uint32_t i=0; i < matrixSize; i++)
    {
      A[i] = B[i] = C[i] = 1.0;
    }
    
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 1024, 1024, 1024,
        1.0, A, 1024, B, 1024, 0.0, C, 1024);
    
    cout << "Value of C[0] is: " << C[0] << endl;

    delete[] C;
    delete[] B;
    delete[] A;
    
    return 0;
}
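
To see why the printed value should be 1024, the following sketch spells out the arithmetic that a row-major sgemm performs, using plain loops. It is an unoptimized reference for illustration only. The helper names naive_sgemm_row_major and first_entry_of_ones_product are hypothetical and are not part of QML.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) row-major product C := alpha * A * B + beta * C.
// A is m x k, B is k x n, C is m x n; lda, ldb, ldc are the row strides.
// Hypothetical helper for illustration only; it is not a QML function.
void naive_sgemm_row_major(std::size_t m, std::size_t n, std::size_t k,
                           float alpha, const float *A, std::size_t lda,
                           const float *B, std::size_t ldb,
                           float beta, float *C, std::size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (std::size_t i = 0; i < m; i++)
    {
        for (std::size_t j = 0; j < n; j++)
        {
            // Dot product of row i of A with column j of B
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; p++)
            {
                sum += A[i * lda + p] * B[p * ldb + j];
            }
            C[i * ldc + j] = alpha * sum + beta * C[i * ldc + j];
        }
    }
}

// Multiplies two n x n matrices of ones and returns the first entry of C.
float first_entry_of_ones_product(std::size_t n)
{
    std::vector<float> A(n * n, 1.0f), B(n * n, 1.0f), C(n * n, 1.0f);
    naive_sgemm_row_major(n, n, n, 1.0f, A.data(), n, B.data(), n,
                          0.0f, C.data(), n);
    return C[0];
}
```

Each entry of the product is a sum of n ones, so first_entry_of_ones_product(8) returns 8, and by the same reasoning the 1024x1024 case in the example yields 1024.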

CBLAS Example

This example shows how row-major indexing works in CBLAS and how offsets and leading dimensions can be passed to QML functions to operate on subregions of larger matrices.

#include <qml.h>

#include <cstdint>
#include <iostream>

/* Example program that uses the CBLAS interface to permute part of a matrix.
   
   This program constructs a matrix A of size 6x8 with entries:
   
   [  1,  2,  3,  4,  5,  6,  7,  8; ]
   [  9, 10, 11, 12, 13, 14, 15, 16; ]
   [ 17, 18, 19, 20, 21, 22, 23, 24; ]
   [ 25, 26, 27, 28, 29, 30, 31, 32; ]
   [ 33, 34, 35, 36, 37, 38, 39, 40; ]
   [ 41, 42, 43, 44, 45, 46, 47, 48; ]
   
   It constructs an explicit permutation matrix of size 4x4 with entries:

   [ 0, 1, 0, 0; ]
   [ 1, 0, 0, 0; ]
   [ 0, 0, 0, 1; ]
   [ 0, 0, 1, 0; ]
   
   The operation to be performed is to multiply the lower left 4x4 block
   of A (rows 3..6, columns 1..4) on the right by the permutation matrix
   and store the result in a new matrix C. The result matrix C should be:

   [ 17, 18, 19, 20; ]   [ 0, 1, 0, 0; ]   [ 18, 17, 20, 19; ]
   [ 25, 26, 27, 28; ] * [ 1, 0, 0, 0; ] = [ 26, 25, 28, 27; ]
   [ 33, 34, 35, 36; ]   [ 0, 0, 0, 1; ]   [ 34, 33, 36, 35; ]
   [ 41, 42, 43, 44; ]   [ 0, 0, 1, 0; ]   [ 42, 41, 44, 43; ]

   All matrices are stored in row-major order in double precision.
 */

int main()
{
    // A is 6 x 8
    // Create on the heap
    const uint32_t A_rows = 6;
    const uint32_t A_cols = 8;
    double *A = new double[A_rows * A_cols];
    const uint32_t LDA = A_cols;

    // Fill out A
    // Row-major means consecutive memory locations run along a row,
    // so a linear fill produces 1, 2, 3, ... across each row in turn
    for(uint32_t i=0; i < A_rows * A_cols; i++)
    {
      A[i] = i + 1;
    }

    // P is 4 x 4 (P_size x P_size)
    // Create on the stack
    const uint32_t P_size = 4;
    double P[] = { 0, 1, 0, 0,
                   1, 0, 0, 0,
                   0, 0, 0, 1,
                   0, 0, 1, 0 };
    const uint32_t LDP = P_size;

    // C is 4 x 4 (P_size x P_size)
    // Create on the heap
    double *C = new double[P_size * P_size];
    const uint32_t LDC = P_size;

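    // CBLAS call to dgemm
    // Operation:
    //     C := 1.0 * A[3..6][1..4] * P + 0.0 * C
    // The offset A + LDA * 2 skips the first two rows of A, and passing
    // LDA as the leading dimension keeps the row stride of the full matrix
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, P_size, P_size,
        P_size, 1.0, A + LDA * 2, LDA, P, LDP, 0.0, C, LDC);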

    // Show result matrix C in row-major order
    for (uint32_t row = 0; row < P_size; row++)
    {
        for (uint32_t col = 0; col < P_size; col++)
        {
            std::cout << C[col + LDC * row] << " ";
        }
        std::cout << std::endl;
    }

    // Cleanup
    delete[] A;
    delete[] C;
    return 0;
}
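
The offset and leading-dimension arithmetic used above can be checked without QML. The sketch below repeats the same product with plain loops, addressing the sub-block of A through a pointer offset and the leading dimension of the full matrix. The helper names at and permute_lower_left_block are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns element (row, col) of a row-major matrix whose rows are ld apart.
// Hypothetical helper for illustration only; it is not a QML function.
inline double at(const double *M, std::size_t ld,
                 std::size_t row, std::size_t col)
{
    assert(col < ld);
    return M[col + ld * row];
}

// Computes C := A_sub * P, where A_sub is the lower left 4x4 block of the
// 6x8 matrix A from the example, and returns C in row-major order.
std::vector<double> permute_lower_left_block()
{
    const std::size_t A_rows = 6, A_cols = 8, k = 4;
    std::vector<double> A(A_rows * A_cols);
    for (std::size_t i = 0; i < A.size(); i++)
    {
        A[i] = static_cast<double>(i + 1);
    }

    const double P[] = { 0, 1, 0, 0,
                         1, 0, 0, 0,
                         0, 0, 0, 1,
                         0, 0, 1, 0 };

    // Skip the first two rows of A. The sub-block still uses A_cols as its
    // leading dimension because its rows are A_cols elements apart in memory.
    const double *A_sub = A.data() + A_cols * 2;

    std::vector<double> C(k * k, 0.0);
    for (std::size_t row = 0; row < k; row++)
    {
        for (std::size_t col = 0; col < k; col++)
        {
            for (std::size_t p = 0; p < k; p++)
            {
                C[col + k * row] += at(A_sub, A_cols, row, p) * at(P, k, p, col);
            }
        }
    }
    return C;
}
```

The first row of the returned matrix is 18, 17, 20, 19 and the last row is 42, 41, 44, 43, which matches the result matrix shown in the comment of the example.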

BLAS Solve Example

This example shows how to use the Fortran BLAS interface from C++ code to solve a system of linear equations.

#include <qml.h>

#include <iostream>

/* Example program that uses the BLAS interface to solve a system of
   equations.
   
   This program constructs a matrix A of size 5x5 with entries:
   
   [ 1,  0,   0,     0,  0 ]
   [ 0,  2,  -1,     0,  0 ]
   [ 0,  0, 1.5,    -1,  0 ]
   [ 0,  0,   0, 1.333, -1 ]
   [ 0,  0,   0,     0,  1 ]
   
   and a vector b with entries:

   [     1 ]
   [     2 ]
   [     2 ]
   [ 2.333 ]
   [     1 ]

   We would like to solve the system of equations A * x = b for x.
   
   Matrix A is stored in column-major order using single precision.
 */

int main()
{
    // A is 5x5
    // Create on the heap
    const qml_long A_size = 5;
    float *A = new float[A_size * A_size]{}; // Initialize to 0
    const qml_long LDA = A_size;

    // Fill out non-zero A entries
    A[0 + 0 * LDA] = 1.0f;
    A[1 + 1 * LDA] = 2.0f;
    A[2 + 2 * LDA] = 1.5f;
    A[3 + 3 * LDA] = 1.33333333f;
    A[4 + 4 * LDA] = 1.0f;
    A[1 + 2 * LDA] = -1.0f;
    A[2 + 3 * LDA] = -1.0f;
    A[3 + 4 * LDA] = -1.0f;

    // b is length 5
    // Create on the stack, named V because strsv overwrites it with x
    const qml_long V_size = 5;
    float V[] = { 1.0f, 2.0f, 2.0f, 2.33333333f, 1.0f };
    const qml_long INCV = 1;

    // BLAS call to strsv (single precision triangular solver)
    // Operation:
    //     Solve A x = b for x
    //     Upper triangular
    //     Non-transposed solve
    //     Non-unit diagonal
    // Note we pass addresses of scalars following Fortran convention
    strsv("U", "N", "N", &A_size, A, &LDA, V, &INCV);

    // Show result vector
    for (qml_long i = 0; i < V_size; i++)
    {
        std::cout << V[i] << std::endl;
    }

    // Cleanup
    delete[] A;
    return 0;
}
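
For an upper triangular system such as this one, the solution can be cross-checked by back substitution. The sketch below is a plain-loop reference that reads the matrix in the same column-major layout as the example. It is an illustration under that assumption, not a description of how QML implements strsv, and the helper names back_substitute_col_major and solve_example_system are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves U * x = b in place for an upper triangular, non-unit-diagonal U
// stored in column-major order with leading dimension ldu.
// On entry x holds b; on exit it holds the solution.
// Hypothetical helper for illustration only; it is not a QML function.
void back_substitute_col_major(std::size_t n, const float *U,
                               std::size_t ldu, float *x)
{
    assert(ldu >= n);
    // Work upward from the last row, which involves a single unknown
    for (std::size_t i = n; i-- > 0; )
    {
        float sum = x[i];
        for (std::size_t j = i + 1; j < n; j++)
        {
            sum -= U[i + j * ldu] * x[j];
        }
        x[i] = sum / U[i + i * ldu];
    }
}

// Builds the 5x5 system from the example and returns its solution.
std::vector<float> solve_example_system()
{
    const std::size_t n = 5;
    std::vector<float> A(n * n, 0.0f);
    A[0 + 0 * n] = 1.0f;
    A[1 + 1 * n] = 2.0f;
    A[2 + 2 * n] = 1.5f;
    A[3 + 3 * n] = 1.33333333f;
    A[4 + 4 * n] = 1.0f;
    A[1 + 2 * n] = -1.0f;
    A[2 + 3 * n] = -1.0f;
    A[3 + 4 * n] = -1.0f;

    std::vector<float> x = { 1.0f, 2.0f, 2.0f, 2.33333333f, 1.0f };
    back_substitute_col_major(n, A.data(), n, x.data());
    return x;
}
```

Working through the rows from the bottom gives approximately x = [ 1, 2.5, 3, 2.5, 1 ], which is the vector the strsv example should print, up to single precision rounding.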

LAPACK Least Squares Example

This example shows how to use the LAPACK interface of QML from C++ code to find the best-fit quadratic equation through a set of points. The example calls dgels to compute the linear least squares solution.

#include <qml.h>

#include <iostream>

/* Example program that uses the LAPACK interface to compute
   the best-fit quadratic equation through a set of data points.

   The problem is to find values for beta_0, beta_1, and beta_2 so that
   the formula
   
   y = beta_0 + beta_1 * x + beta_2 * x^2
   
   is the "best" approximation to the following observed data points.

    x   | y
   -----+-----
    1   | 3
    2   | 4
    4   | 13
    5   | 27

   Here "best" means minimizing the sum of the squares of the
   differences.
   
   We use the LAPACK GELS function to compute a least-squares solution
   to the overdetermined A * x = b.

   A is the matrix:

   [ 1    1    1  ]
   [ 1    2    4  ]
   [ 1    4    16 ]
   [ 1    5    25 ]

   x is the unknown vector:
   
   [ beta_0 ]
   [ beta_1 ]
   [ beta_2 ]
   
   b is the vector of observed y values (a matrix with one column):
   
   [  3 ]
   [  4 ]
   [ 13 ]
   [ 27 ]

   Matrix A is stored in column-major order in double precision.
   Even though x and b are vectors here, GELS can handle multiple right
   hand sides simultaneously so it takes x and b as matrices.
 */

int main()
{
    // A is 4x3
    // Create on the heap
    const qml_long A_rows = 4, A_cols = 3;
    // Column major, each text row below is a column of A
    double *A = new double[A_rows * A_cols]{
        1.0, 1.0, 1.0, 1.0,
        1.0, 2.0, 4.0, 5.0,
        1.0, 4.0, 16.0, 25.0
    };
    const qml_long LDA = A_rows;
    const qml_long NRHS = 1;

    // b is length 4
    // Will contain x after dgels returns
    const qml_long B_size = 4;
    const qml_long X_size = 3;
    double *B = new double[B_size]{ 3.0, 4.0, 13.0, 27.0 };
    const qml_long LDB = B_size;

    // Query required workspace size
    qml_long lwork = -1; // -1 requests a size query; the optimal size is returned in the first workspace element
    double opt_worksize;
    qml_long info;
    dgels("N", &A_rows, &A_cols, &NRHS, A, &LDA, B, &LDB, &opt_worksize, &lwork, &info);

    // Allocate optimal scratch space
    // (Need cast to integer type because optimal size is given as a double)
    lwork = static_cast<qml_long>(opt_worksize);
    double *WORK = new double[lwork]{};

    // LAPACK call to dgels (double precision least squares solver)
    dgels("N", &A_rows, &A_cols, &NRHS, A, &LDA, B, &LDB, WORK, &lwork, &info);

    // Check for success
    if (info != 0)
    {
        std::cout << "ERROR computing least squares solution" << std::endl;
        return 1;
    }

    // Show result (should be y = 8.73333 * x^0  + -7.3 * x^1  + 2.16667 * x^2)
    std::cout << "Best fit equation is y = ";
    for (qml_long i = 0; i < X_size; i++)
    {
        std::cout << B[i] << " * x^" << i << " ";
        if (i < X_size - 1)
        {
            std::cout << " + ";
        }
    }
    std::cout << std::endl;

    // Cleanup
    delete[] A;
    delete[] B;
    delete[] WORK;
    return 0;
}
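
The expected coefficients can be cross-checked independently of QML by solving the normal equations (A^T A) x = A^T b. The sketch below does this for the quadratic fit with a small Gaussian elimination. This is a different technique from the one LAPACK uses in dgels, and it is only suitable for small, well-conditioned problems such as this one. The function name fit_quadratic is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Fits y = beta[0] + beta[1] * x + beta[2] * x^2 to n data points by
// solving the 3x3 normal equations with Gaussian elimination.
// Hypothetical helper for illustration only; it is not a QML function.
std::array<double, 3> fit_quadratic(const double *x, const double *y,
                                    std::size_t n)
{
    assert(n >= 3);

    // Augmented matrix [ A^T A | A^T y ]
    double M[3][4] = {};
    for (std::size_t s = 0; s < n; s++)
    {
        const double basis[3] = { 1.0, x[s], x[s] * x[s] };
        for (int i = 0; i < 3; i++)
        {
            for (int j = 0; j < 3; j++)
            {
                M[i][j] += basis[i] * basis[j];
            }
            M[i][3] += basis[i] * y[s];
        }
    }

    // Forward elimination with partial pivoting
    for (int c = 0; c < 3; c++)
    {
        int pivot = c;
        for (int r = c + 1; r < 3; r++)
        {
            if (std::fabs(M[r][c]) > std::fabs(M[pivot][c]))
            {
                pivot = r;
            }
        }
        for (int j = 0; j < 4; j++)
        {
            std::swap(M[c][j], M[pivot][j]);
        }
        for (int r = c + 1; r < 3; r++)
        {
            const double factor = M[r][c] / M[c][c];
            for (int j = c; j < 4; j++)
            {
                M[r][j] -= factor * M[c][j];
            }
        }
    }

    // Back substitution
    std::array<double, 3> beta{};
    for (int i = 2; i >= 0; i--)
    {
        double sum = M[i][3];
        for (int j = i + 1; j < 3; j++)
        {
            sum -= M[i][j] * beta[j];
        }
        beta[i] = sum / M[i][i];
    }
    return beta;
}
```

Called with x = { 1, 2, 4, 5 } and y = { 3, 4, 13, 27 }, this returns approximately 8.73333, -7.3, and 2.16667, in agreement with the result noted in the example.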

LAPACK Singular Value Decomposition

This example shows how to use the LAPACK interface of QML from C++ code to find the singular value decomposition of a matrix. It calls dgesvd to do the decomposition.

#include <qml.h>

#include <iostream>

/* Example program that uses the LAPACK interface to compute
   the singular value decomposition (SVD) of a matrix.

   The problem is to find U, SIGMA, and V^T such that
   
   U * SIGMA * V^T = [ 3  2  2 ]
                     [ 2  3 -2 ]
    
   where U is a 2x2 orthogonal matrix, V^T is a 3x3 orthogonal matrix,
   and SIGMA is a 2x3 matrix with non-zero elements only on the diagonal.
 */

int main()
{
    // A is 2x3
    // Create on the heap
    const qml_long rows = 2, cols = 3;
    // Column major, each text row below is a column of A
    double *A = new double[rows * cols]{
        3.0, 2.0,
        2.0, 3.0,
        2.0, -2.0
    };

    // S is diagonal entries of SIGMA
    double *S = new double[rows]{};

    // U is 2x2
    double *U = new double[rows * rows]{};

    // VT is 3x3
    double *VT = new double[cols * cols]{};

    // Query required workspace size
    qml_long lwork = -1; // -1 requests a size query; the optimal size is returned in the first workspace element
    double opt_worksize;
    qml_long info;
    dgesvd("A", "A", &rows, &cols, A, &rows, S, U, &rows, VT, &cols, &opt_worksize, &lwork, &info);

    // Allocate optimal scratch space
    // (Need cast to integer type because optimal size is given as a double)
    lwork = static_cast<qml_long>(opt_worksize);
    double *WORK = new double[lwork]{};

    // Do SVD
    dgesvd("A", "A", &rows, &cols, A, &rows, S, U, &rows, VT, &cols, WORK, &lwork, &info);

    // Check for success
    if (info != 0)
    {
        std::cout << "ERROR computing SVD" << std::endl;
        return 1;
    }

    // Show singular values (should be [ 5 3 ])
    std::cout << "Singular values are [ ";
    for (qml_long i = 0; i < rows; i++)
    {
        std::cout << S[i] << " ";
    }
    std::cout << "]\n";

    // Show the first left singular vector (should be [ -0.707107 -0.707107 ], up to an overall sign)
    std::cout << "First left singular vector is [ ";
    for (qml_long i = 0; i < rows; i++)
    {
        std::cout << U[i] << " ";
    }
    std::cout << "]\n";

    // Cleanup
    delete[] A;
    delete[] S;
    delete[] U;
    delete[] VT;
    delete[] WORK;
    return 0;
}
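
The expected singular values can be verified by hand for a matrix with only two rows, because they are the square roots of the eigenvalues of the 2x2 matrix A * A^T. The sketch below applies the closed-form eigenvalue formula for a symmetric 2x2 matrix. It is a cross-check for this small example only, since forming A * A^T loses accuracy for ill-conditioned matrices, and it does not reflect how dgesvd works internally. The function name singular_values_2xn is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

// Returns the two singular values (largest first) of a 2 x n matrix A
// stored in column-major order with leading dimension 2.
// Hypothetical helper for illustration only; it is not a QML function.
std::pair<double, double> singular_values_2xn(const double *A, std::size_t n)
{
    assert(n >= 1);

    // Entries of the symmetric Gram matrix G = A * A^T
    double g00 = 0.0, g01 = 0.0, g11 = 0.0;
    for (std::size_t j = 0; j < n; j++)
    {
        const double a0 = A[0 + 2 * j];
        const double a1 = A[1 + 2 * j];
        g00 += a0 * a0;
        g01 += a0 * a1;
        g11 += a1 * a1;
    }

    // Eigenvalues of G are mean +/- radius
    const double mean = 0.5 * (g00 + g11);
    const double radius =
        std::sqrt(0.25 * (g00 - g11) * (g00 - g11) + g01 * g01);

    // Clamp at zero to guard against a tiny negative value from rounding
    const double smaller = std::max(mean - radius, 0.0);
    return { std::sqrt(mean + radius), std::sqrt(smaller) };
}
```

For the matrix in the example, A * A^T is [ 17 8 ; 8 17 ], whose eigenvalues are 25 and 9, so the function returns 5 and 3.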

Building the Examples

One way to build your application or library and link against QML is to use the official Android NDK. You declare that QML is a pre-built library using directives in an Android.mk file inside the jni directory of an application tree.

include $(CLEAR_VARS)
LOCAL_MODULE := QML
LOCAL_SRC_FILES := <install-prefix>/lib<name>.so
LOCAL_EXPORT_C_INCLUDES := <install-prefix>/include
include $(PREBUILT_SHARED_LIBRARY)

Once this module has been declared, you can let the build system know that your application links against QML by adding the following directive to your application’s Android.mk:

LOCAL_SHARED_LIBRARIES += QML

To use C++11 features, you also need to link against a full-featured version of the C++ runtime library and enable C++11 in the compiler. Both settings go in Application.mk:

APP_STL := gnustl_shared
APP_CPPFLAGS += -std=c++11

To build the provided example programs this way, run ndk-build from the examples/ directory of the installation for the desired architecture. This produces executables and library files in libs/<arch>/. In a complete application, these executables and libraries are packaged into the final APK and installed in the correct location when the application is installed.

Running the Examples

There are two main ways to run the examples. If you control the Android platform on the device, you can add QML to the system libraries that are available to all applications. If you do not control the platform, include the libraries as part of your application so that they are installed in the application's private area during installation.

Platform control

If you control the Android platform, you can install QML into the system library location of the device. The typical location is /system/vendor/lib/ for 32-bit architectures or /system/vendor/lib64/ for 64-bit architectures. With root access in ADB, this can be done for testing purposes using adb push.

Once the libraries are installed in system locations, running the examples only requires copying the executables to any location on the device and executing them.

Local install

To run the executables without root access, copy the libraries and executables from libs/<arch>/ to an accessible directory on the device such as /data/local/test. Once QML and the test application are in the same directory, run the executable with a command such as:

LD_LIBRARY_PATH=. ./MinimalExample

Complete applications packaged inside an APK will automatically install the libraries into the correct locations during installation if the libraries are declared as previously described in the Android.mk file.