SequenceJuxtaposer -- Read Me
1 What is SequenceJuxtaposer?
SequenceJuxtaposer is a flexible tool used to explore a set of sequences.
It provides an immediate view of a set of sequences that a user can navigate
through with a few clicks of a mouse.
2 Loading FASTA files
Any FASTA file (DNA or RNA) can be loaded into SequenceJuxtaposer. A FASTA
file is either passed into SequenceJuxtaposer as a command-line argument
java -jar sj.jar fastaFile.fa
or SequenceJuxtaposer can be started and a file can be loaded using the
java -jar sj.jar
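For readers unfamiliar with the FASTA layout, the following is a minimal Python sketch of how such a file can be read. It is not SequenceJuxtaposer's own loader: the function name read_fasta and the sample data are hypothetical, and the sketch assumes the common convention of a ">" header line followed by one or more lines of sequence characters.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream.

    Hypothetical helper for illustration only; it is not part of
    SequenceJuxtaposer.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample data; a real file would be opened with open(path).
example = StringIO(">seq1 example\nACGT\nacgt\n>seq2\nAC-T\n")
records = list(read_fasta(example))
```

Each record pairs the header text (without the leading ">") with the concatenated, upper-cased sequence lines.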
3 Searching for motifs
You may search for a motif in a few ways: keyboard input, mouse input, and
mouse navigating.
3.1 Keyboard input
If the motif you are searching for is very short (e.g. a start codon), it may
be typed into the search box at the bottom of the display screen. Pressing
Enter after typing your search (or clicking the search button) performs
a search in the data. All locations of your input motif are highlighted
with a magenta box unless there are too many results (e.g. the hits for
a single base pair character may not be displayed in very large data sets).
Please see the section of this document entitled "Resizing an area of
interest" for another useful tool related to the search motif.
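As an illustration of what a motif search returns, here is a small Python sketch that lists every (sequence index, start position) at which a motif occurs, including overlapping occurrences. The function name find_motif and the sample sequences are assumptions made for this example; the sketch says nothing about how SequenceJuxtaposer implements its search internally.

```python
def find_motif(sequences, motif):
    """Return (row, start) for every occurrence of motif in each sequence.

    Illustrative only: matching is exact and case-insensitive, and
    overlapping occurrences are reported.
    """
    motif = motif.upper()
    hits = []
    if not motif:
        return hits  # an empty search string matches nothing here
    for row, seq in enumerate(sequences):
        seq = seq.upper()
        start = seq.find(motif)
        while start != -1:
            hits.append((row, start))
            # Advance by one so that overlapping matches are found too.
            start = seq.find(motif, start + 1)
    return hits


# Search two made-up sequences for the start codon ATG.
hits = find_motif(["ATGATGCC", "CCATGA"], "ATG")
```

Positions are zero-based offsets into each sequence.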
3.2 Mouse input
If you know the position of a motif in a loaded sequence, select it with
the mouse: hold Shift, press the mouse button at the start of the motif,
drag to the end of the motif, and release. This action copies the motif to
the search box and highlights the
other locations of the selected motif in the loaded sequences. The text in
the search box can now be modified and a new search can be performed with
small changes. A motif that is too long to fully display will be truncated;
you may edit a small range of base pairs at the start and/or end of a large
motif.
3.3 Mouse navigating
Motifs can also be explored without the search techniques described above;
simple navigation is enough if you know where in the display you would like
to look. Drag a box around a subset of base pair sites and a subset of
sequences, then drag the box larger with the mouse to stretch the region
inside it to any size you want. This also resizes the areas outside the box
accordingly. Navigating with the mouse in this manner is a powerful way of
"growing" or "shrinking" a selection of base pair sites across sequences.
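The idea that stretching a box also resizes the areas outside it can be sketched in one dimension. The Python example below is a simplified model and not SequenceJuxtaposer's actual algorithm: it assumes a fixed total width, a selected interval [lo, hi], and that the space left over is shared proportionally by the areas on either side. The name stretch and all numbers are hypothetical.

```python
def stretch(total, lo, hi, new_width):
    """Return a function that maps old positions to stretched positions.

    Simplified 1-D model: the interval [lo, hi] is resized to new_width,
    and everything outside it is scaled uniformly so that the overall
    width stays equal to total.
    """
    if not (0 <= lo < hi <= total):
        raise ValueError("need 0 <= lo < hi <= total")
    if not (0 < new_width <= total):
        raise ValueError("need 0 < new_width <= total")
    old_width = hi - lo
    outside_old = total - old_width
    outside_new = total - new_width
    scale_in = new_width / old_width
    # If the selection already spans everything, nothing lies outside it.
    scale_out = outside_new / outside_old if outside_old else 0.0
    new_lo = lo * scale_out

    def remap(x):
        if x < lo:
            return x * scale_out
        if x <= hi:
            return new_lo + (x - lo) * scale_in
        return new_lo + new_width + (x - hi) * scale_out

    return remap


# Grow the region 40..60 of a 100-pixel-wide view from 20 to 60 pixels.
remap = stretch(total=100, lo=40, hi=60, new_width=60)
positions = [remap(x) for x in (0, 40, 50, 60, 100)]
```

In this example the selected region triples in width while the areas to its left and right shrink to half their size, so the endpoints of the view stay fixed.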
4 Resizing an area of interest
Using the "Groups" panel, you can resize different groups. The resizing can
either make the selected group "smaller" by shrinking each member
horizontally or "bigger" by growing each member horizontally. The groups
that can be resized this way are indicated by radio buttons in the
"Groups" panel. They include: gaps, search results, and differences.
5 Changing colors of groups and sites
The color of a group (base pair sites, gaps, differences, search results)
can be changed through the "Groups" panel by clicking on the color swatch.
Differences in aligned sequences are computed at run-time. They may be turned
off in the "Groups" panel. This group is intended to be a guide in sets
of very similar sequences as even the smallest differences can be found
in large data sets due to SequenceJuxtaposer's guarantee of groups always
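To make the notion of differences between aligned sequences concrete, the Python sketch below reports the alignment columns at which the sequences do not all agree. It is only an illustration: the function name difference_columns is hypothetical, and treating a gap character as a difference is an assumption of this example, not a statement about how SequenceJuxtaposer defines the group.

```python
def difference_columns(aligned):
    """Return the column indices at which aligned sequences disagree.

    Assumes all sequences have the same length (i.e. they are aligned).
    A gap character counts as a difference in this sketch.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    # zip(*aligned) walks the alignment column by column.
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


# Three made-up aligned sequences that differ only in the third column.
columns = difference_columns(["ACGT", "ACCT", "AC-T"])
```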
7 Resizable blocks
Local regions of nucleotides are grouped into collections called blocks. The
number of nucleotides per block depends on the density of nucleotides, and
the color shown for a block is that of the nucleotide that is most frequent
in that block. The smallest possible block size is 1 pixel, which may be either too
small to see on high resolution monitors or noisy with large datasets. We
use a default block size of 5, which means that we try to limit the minimum
size of a block to 5 pixels. However, due to our data structures, there are
cases where smaller regions appear. We ensure that no blocks larger than
5 pixels are drawn, unless a single, indivisible nucleotide is stretched.
The block size can be controlled by a slider in our "Groups" panel.
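The rule that a block takes the color of its most frequent nucleotide can be sketched as follows. This Python example is a simplification under stated assumptions: it uses a fixed number of nucleotides per block, whereas the text above says the number depends on nucleotide density, and the function name block_summary is hypothetical.

```python
from collections import Counter


def block_summary(sequence, nucleotides_per_block):
    """Return the most frequent nucleotide of each consecutive block.

    Simplified: every block holds the same number of nucleotides
    (the last block may be shorter).
    """
    if nucleotides_per_block < 1:
        raise ValueError("nucleotides_per_block must be at least 1")
    summary = []
    for start in range(0, len(sequence), nucleotides_per_block):
        block = sequence[start:start + nucleotides_per_block]
        # most_common(1) yields the (nucleotide, count) pair with the top count.
        summary.append(Counter(block).most_common(1)[0][0])
    return summary


# Ten made-up nucleotides summarized as two blocks of five.
summary = block_summary("AAACGGGGTT", 5)
```

Each entry of the result is the nucleotide whose color would represent that block in this simplified model.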
8 Version Information
- released January 11, 2004
- original code from TJ, using quadtrees, binary trees
- limited to approximately 2 million total base pairs in 1.5G heap memory
- submitted for publication
- released June 25, 2004
- updated documentation
- splitLine code for drawing, resizing, picking, culling
- limited to approximately 17 million total base pairs in 1.5G heap memory
- flexible structure for use in future accordion drawing packages
- annotations added
- visible interaction boxes
- released February 8, 2005
- updated documentation
- resizable blocks of nucleotides/sequences
- limited to approximately 40 million total base pairs (Mbp) in 1.5G heap memory
- color averaging
- cached block information in split lines
- peak (cached) scene rendering time of approximately 500 ms:
- Pentium 4 3 GHz, 800x600, 40 Mbp, block size 5