Access a wobbegong-formatted SummarizedExperiment
Overview
The wobbegong specification supports the retrieval of parts of a SummarizedExperiment object via HTTP range requests on static files. This allows web applications to fetch and visualize assay data, reduced dimension results, etc. without downloading the entire object or implementing custom server logic. The wobbegong.js library provides an easy-to-use JavaScript interface that handles the decoding of the ranges from wobbegong-formatted files.
Quick start
First, we install the wobbegong library from npm via the usual method:
npm install wobbegong
Developers are expected to know how to fetch content from their static file server. For example, we could define the relevant fetching functions as below:
const url = "https://my.wobbegong.host.com";
// Define a method to retrieve JSON from the wobbegong-hosted files.
// This typically uses fetch() on a web browser:
const fetch_json = async (path) => {
    const res = await fetch(url + "/" + path);
    if (!res.ok) {
        throw new Error("oops, failed to retrieve '" + path + "' (" + String(res.status) + ")");
    }
    return res.json();
};
// Define a method to do range requests on wobbegong-hosted files:
const fetch_range = async (path, start, end) => {
    const res = await fetch(
        url + "/" + path,
        { headers: { Range: "bytes=" + String(start) + "-" + String(end - 1) } }
    );
    if (!res.ok) {
        throw new Error("oops, failed to retrieve range from '" + path + "' (" + String(res.status) + ")");
    }
    let output = new Uint8Array(await res.arrayBuffer());
    return output.slice(0, end - start); // trim off any excess junk
};
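The end - 1 in fetch_range reflects that HTTP Range headers use inclusive byte positions, whereas fetch_range takes a half-open interval from start up to (but not including) end. A minimal sketch of that conversion as a standalone helper is shown here; the name rangeHeader and its error handling are illustrative assumptions, not part of wobbegong.js.

```javascript
// Hypothetical helper (not part of wobbegong.js): convert a half-open
// byte interval [start, end) into an HTTP Range header value.
function rangeHeader(start, end) {
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
        throw new RangeError("expected integers with 0 <= start < end");
    }
    // HTTP byte ranges are inclusive at both ends, hence 'end - 1'.
    return "bytes=" + String(start) + "-" + String(end - 1);
}

console.log(rangeHeader(0, 10)); // bytes=0-9
```

An empty interval cannot be expressed as a byte range, so this sketch rejects it; a caller could instead return an empty Uint8Array without making a request.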
Once that's done, we use the wobbegong.js library to load the interface to the SummarizedExperiment:
import * as wob from "wobbegong";
const se = await wob.load("my_dataset", fetch_json, fetch_range);
se.numberOfRows();
se.numberOfColumns();
se.isSingleCellExperiment();
Retrieving row/column data
Row data will be returned as a DataFrame, or null if no row data is available.
se.hasRowData(); // indicates whether row data is available
const rowd = await se.rowData(); // null if the above is false.
rowd.numberOfRows();
rowd.columnNames();
Each column of the DataFrame can be retrieved by index or name, returning an array with one entry per row of the DataFrame.
await rowd.column(0); // first column
await rowd.column("whee"); // can also access by name
We can also check whether the DataFrame has any row names. If present, rowNames() returns them as an array of strings.
rowd.hasRowNames();
await rowd.rowNames(); // null if no row names are present
The same applies to the column data.
const cold = await se.columnData();
cold.numberOfRows();
cold.columnNames();
await cold.column(0);
await cold.column("whee");
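Because each column() call is asynchronous, an application that needs the whole table may want to request all columns together. The following is a rough sketch that relies only on the columnNames() and column() methods shown above; the helper name collectColumns is a made-up name for illustration, and the sketch assumes column names are unique.

```javascript
// Hypothetical helper: gather every column of a DataFrame-like object into
// a plain object keyed by column name. Only columnNames() and column()
// are used; duplicate column names would overwrite each other.
async function collectColumns(df) {
    const names = df.columnNames();
    // Start all column requests before awaiting any, so they can overlap.
    const values = await Promise.all(names.map((_, i) => df.column(i)));
    const out = {};
    names.forEach((name, i) => { out[name] = values[i]; });
    return out;
}
```

With the objects defined earlier, this might be used as const table = await collectColumns(cold), after checking that the DataFrame is not null.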
Retrieving assays
We can check the available assays in the SummarizedExperiment:
se.assayNames();
And then retrieve each matrix by name or index:
const first_assay = await se.assay(0);
const log_assay = await se.assay('logcounts');
Each assay matrix from the same SummarizedExperiment will have the same number of rows and columns, but may have different types or sparsity.
log_assay.numberOfRows();
log_assay.numberOfColumns();
log_assay.type(); // usually one of integer, double or boolean
log_assay.sparse();
The only way to extract data from a matrix is by row, which optimizes for per-gene access. For dense matrices, this returns an array of length equal to the number of columns; for sparse matrices, this returns an object with value and index properties that specify the value and column index, respectively, of each structural non-zero element. Developers can force a dense array to be returned in all situations by setting asDense: true.
const vals = await log_assay.row(0);
const vals2 = await log_assay.row(0, { asDense: true }); // always returns dense array
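To make the relationship between the two representations concrete, here is a small sketch that expands a sparse row into a dense array by hand. It assumes that value and index are array-likes of equal length and that index holds 0-based column indices; sparseToDense is a hypothetical name, and in practice asDense: true makes this step unnecessary.

```javascript
// Hypothetical helper: expand a sparse row into a dense array.
// Assumes 'value' and 'index' are equal-length array-likes and that
// 'index' holds 0-based column indices.
function sparseToDense(sparseRow, numberOfColumns) {
    // Every column not listed in 'index' is an implicit zero.
    const dense = new Array(numberOfColumns).fill(0);
    for (let i = 0; i < sparseRow.index.length; i++) {
        dense[sparseRow.index[i]] = sparseRow.value[i];
    }
    return dense;
}

// Two structural non-zeros in a row with five columns.
const example = sparseToDense({ value: [1.5, 2], index: [1, 3] }, 5);
console.log(example.join(",")); // 0,1.5,0,2,0
```

Keeping the sparse form is usually preferable when only the non-zero entries are needed, for example when coloring the cells in which a gene is detected.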
Developers may also like to inspect pre-computed statistics such as the column sums or the number of detected genes in each column. In particular, the column sums can be used to perform library size normalization of count data prior to visualization.
log_assay.statisticNames(); // see available stats.
const colsums = await log_assay.statistic("column_sum");
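As an illustration of the library size normalization mentioned above, the sketch below divides each count by a per-column size factor and log-transforms the result. It assumes counts is a dense row from a count assay and columnSums is the matching column_sum statistic. Centering the size factors on the mean and using log2 with a pseudo-count of 1 is one common convention rather than something wobbegong prescribes, and logNormalizeRow is a hypothetical name.

```javascript
// Sketch of library size normalization for one row of counts.
// 'counts' is a dense row (one value per column) and 'columnSums' holds the
// total count of each column; both are assumed to have the same length.
function logNormalizeRow(counts, columnSums) {
    const mean = columnSums.reduce((a, b) => a + b, 0) / columnSums.length;
    return Array.from(counts, (x, j) => {
        // Size factor: the column's library size relative to the average.
        const sf = columnSums[j] / mean;
        // A column with a zero total has only zero counts; avoid 0/0.
        return sf > 0 ? Math.log2(x / sf + 1) : 0;
    });
}
```

The output has one log-normalized value per column and can be passed directly to a plotting routine.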
Retrieving reduced dimensions
For SingleCellExperiments, we can check the available reduced dimension results:
se.reducedDimensionNames();
And then retrieve each result by name or index:
const first_reddim = await se.reducedDimension(0);
const tsne = await se.reducedDimension('TSNE');
We can examine various properties of each result:
tsne.numberOfRows();
tsne.numberOfColumns();
tsne.type(); // typically double
Each column can then be extracted for visualization.
const tsne_x = await tsne.column(0);
const tsne_y = await tsne.column(1);
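Before plotting, the extracted coordinates typically need to be mapped onto pixel positions. The following is a minimal sketch of one way to do this, assuming two equal-length numeric arrays such as tsne_x and tsne_y; the function name scaleToCanvas and the choice to fill the full width and height are assumptions made for illustration.

```javascript
// Hypothetical helper: rescale two coordinate arrays to pixel positions
// in a width-by-height plotting area.
function scaleToCanvas(x, y, width, height) {
    const rescale = (values, size) => {
        let min = Infinity, max = -Infinity;
        for (const v of values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        const span = max - min;
        // If all values are identical, place them in the middle.
        return Array.from(values, (v) => (span > 0 ? ((v - min) / span) * size : size / 2));
    };
    const px = rescale(x, width);
    // Canvas y-coordinates grow downwards, so flip the vertical axis.
    const py = rescale(y, height).map((v) => height - v);
    return { x: px, y: py };
}
```

A real application would probably add margins and preserve the aspect ratio, which this sketch leaves out for brevity.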
Retrieving alternative experiments
For SingleCellExperiments, we can check the available alternative experiments:
se.alternativeExperimentNames();
And then retrieve each alternative experiment by name or index:
const first_ae = await se.alternativeExperiment(0);
const adt_ae = await se.alternativeExperiment('ADT');
Each of these is just another SummarizedExperiment instance, so all of the methods described above can be applied here.
adt_ae.assayNames();
adt_ae.numberOfRows();
adt_ae.numberOfColumns();
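Since alternative experiments expose the same interface, an application can walk over them in the same way as the main experiment. As a sketch, the helper below lists the assay names of the main experiment and of each alternative experiment, using only methods described in this document; the name listAllAssays and the shape of the returned object are assumptions for illustration.

```javascript
// Hypothetical helper: list assay names for the main experiment and for
// each of its alternative experiments (if it is a SingleCellExperiment).
async function listAllAssays(se) {
    const out = { main: se.assayNames(), alternative: {} };
    if (se.isSingleCellExperiment()) {
        for (const name of se.alternativeExperimentNames()) {
            // Each alternative experiment is loaded on demand.
            const ae = await se.alternativeExperiment(name);
            out.alternative[name] = ae.assayNames();
        }
    }
    return out;
}
```

The result could be used, for example, to populate a menu that lets the user choose which modality and assay to visualize.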
More reading
Check out the reference documentation for more details on each class and method.
Use the R package to generate wobbegong-formatted files from a SummarizedExperiment.