splitter

 

Function

Split a sequence into (overlapping) smaller sequences

Description

This simple editing program allows you to split a long sequence into smaller, optionally overlapping, subsequences.

EMBOSS programs rarely require a sequence to be split into smaller sub-sequences, but memory usage can become restrictive when dealing with very large sequences. In that case, memory usage may be reduced by running the analysis separately on each split sub-sequence.
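When an analysis is repeated on split sub-sequences, any positions it reports are relative to the sub-sequence, not to the original sequence. The Python sketch below shows one way to convert them back, using the '_start-end' suffix that splitter appends to each sub-sequence name. The function names are hypothetical, and the parsing assumes that the suffix is always present and well formed.

```python
def parse_subsequence_name(name):
    """Split a name such as 'AP000504_10001-20000' into (base, start, end)."""
    # rpartition keeps any underscores that belong to the original name.
    base, _, span = name.rpartition("_")
    start, _, end = span.partition("-")
    return base, int(start), int(end)


def to_original_position(subsequence_name, local_position):
    """Convert a 1-based position within a sub-sequence to the original sequence."""
    _, start, end = parse_subsequence_name(subsequence_name)
    original = start + local_position - 1
    if not start <= original <= end:
        raise ValueError("position lies outside the sub-sequence")
    return original


if __name__ == "__main__":
    # Position 1 of the second 10,000-base sub-sequence.
    print(to_original_position("AP000504_10001-20000", 1))  # 10001
```

If the sub-sequences overlap, a result that falls in an overlapping region may be reported once for each sub-sequence containing it, so the converted positions may need de-duplicating.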

If you need to split a large sequence into smaller sub-sequences so that a non-EMBOSS program can analyse them, it may also be useful to write the sub-sequences into separate files instead of the default EMBOSS behaviour of concatenating them together into one file.

To write the output sequences to separate files, use the command-line switch '-ossingle'.

Usage

Here is a sample session with splitter.

Split a sequence into sub-sequences of 10,000 bases (the default size) with no overlap between the sub-sequences:


% splitter tembl:AP000504 ap000504.split 
Split a sequence into (overlapping) smaller sequences

Example 2

Split a sequence into sub-sequences of 50,000 bases with an overlap of 3,000 bases on each sub-sequence:


% splitter tembl:AP000504 ap000504.split -size=50000 -over=3000 
Split a sequence into (overlapping) smaller sequences
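
The coordinates of the sub-sequences in these two examples can be worked out by hand. The Python sketch below is not taken from the splitter source. It assumes that consecutive sub-sequences start 'size - overlap' positions apart and that the last sub-sequence simply runs to the end of the input. The function name split_ranges is hypothetical.

```python
def split_ranges(length, size=10000, overlap=0):
    """Return 1-based, inclusive (start, end) pairs covering a sequence.

    Assumption (not confirmed against the splitter source): consecutive
    sub-sequences start `size - overlap` positions apart, and the last
    one is truncated at the end of the sequence.
    """
    if length < 1:
        raise ValueError("length must be 1 or more")
    if size < 1:
        raise ValueError("size must be 1 or more")
    if overlap < 0 or overlap >= size:
        raise ValueError("overlap must be 0 or more and smaller than size")

    step = size - overlap
    ranges = []
    start = 1
    # Emit full-size windows while sequence remains beyond the window end.
    while start + size - 1 < length:
        ranges.append((start, start + size - 1))
        start += step
    # The final window takes whatever is left.
    ranges.append((start, length))
    return ranges


if __name__ == "__main__":
    # Default size, no overlap, on a 100,000-base sequence.
    print(split_ranges(100000)[:2])
    # [(1, 10000), (10001, 20000)]

    # Size 50,000 with an overlap of 3,000.
    print(split_ranges(100000, size=50000, overlap=3000))
    # [(1, 50000), (47001, 97000), (94001, 100000)]
```

Under this assumption the second example yields three sub-sequences, the last of which is 6,000 bases long.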

Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Sequence(s) filename and optional format, or
                                  reference (input USA)
  [-outseq]            seqoutall  [.] Sequence set(s)
                                  filename and optional format (output USA)

   Additional (Optional) qualifiers:
   -size               integer    [10000] Size to split at (Integer 1 or more)
   -overlap            integer    [0] Overlap between split sequences (Integer
                                  0 or more)
   -source             boolean    [N] Split using source features with /oridif
                                  qualifiers
   -multifile          boolean    [N] Split sequence into multiple files

   Advanced (Unprompted) qualifiers:
   -feature            boolean    [N] Use feature information
   -addoverlap         boolean    [N] Add overlap to size

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages

Qualifier                  Description                                                         Allowed values         Default

Standard (Mandatory) qualifiers
[-sequence] (Parameter 1)  Sequence(s) filename and optional format, or reference (input USA)  Readable sequence(s)   Required
[-outseq] (Parameter 2)    Sequence set(s) filename and optional format (output USA)           Writeable sequence(s)

Additional (Optional) qualifiers
-size                      Size to split at                                                    Integer 1 or more      10000
-overlap                   Overlap between split sequences                                     Integer 0 or more      0
-source                    Split using source features with /oridif qualifiers                 Boolean value Yes/No   No
-multifile                 Split sequence into multiple files                                  Boolean value Yes/No   No

Advanced (Unprompted) qualifiers
-feature                   Use feature information                                             Boolean value Yes/No   No
-addoverlap                Add overlap to size                                                 Boolean value Yes/No   No

Input File Format

splitter reads one or more sequence USAs.

Input files for usage example

'tembl:AP000504' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:AP000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.


  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400
     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460
     cggttgatca acttggtagg ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520
     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580
     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640
     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700
     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760
     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820
     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880
     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940
     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000
     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060
     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120
     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180
     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240
     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300
     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360
     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420
     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480
     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540
     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600
     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660
     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720
     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780
     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840
     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900
     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960
     tgagcaaggg tcggggagtt ctagtgcgta acagtctagg                          100000
//

Output File Format

Output files for usage example

File: ap000504.split

>AP000504_1-10000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 3/20.
gaccaatctcactgtgaggaggcagtcaaagggaataatggaagagaggaagaggatttt
ctcagtggcagtcatggcgtctgggatgaaggagtagtttccagaaaggaggcgttgttt
gcttatctccagacctatttgagggaggcaagcaaagggaacggtcttgtagctcaattt
tttcaccccattttaagaatgagacaatagaagcaagagagattatttgacttgcccaag
ctcacacaggcagttaatggaaagctagagcaagaaccaaattttcagactcttagtcta
attctctttttattctacatataatataaagatacttgtctgaaagcacagcctgagaaa
gataaatggctgaggaaagtagacatctgtctggaattgaggattttggtcaaaataatg
gtattaatagaactagtaacactaatgccttaatatctaattaggatagtacactcctgt
tcttattgtaaacctaggaaagttatagaagtgccttatggatcataataagggtcactg
aggcagtgccttttggtttggtgataaaaggctttaacttaatggggagaattccaacaa
taaaaccctgtccaaaaagtgtcaccactcctcaggggaggccctcatccctagacatga
cttaagcagaggcttcccaataagctgcaggttattaaagggtagggagcaggagagatc
ttggggggacaggtcatagggcatgaggagcacaaaggtttaggatgacataaggcagag
gggagatctgtgatgatgaaggtagagttgggggaaagaatgggacaccggaacagggag
ttaggcaaagcaaaaggaaggagataccaaaatccacacttggcaaaaatatgatttcag
gtcttttaggctctctgtgctcctgggaggctgtgggggaggaaagaaaaggctatcatt
ctttacatctcagtccttctacctctgtctgacactccctctcacccaattctagccccc
tggaatattccatatattagtccttccccattttccctctatcctttaccaagtccttac
caagctttcccagaaatcgagtcatattctcatcctgtttggcactcgtaacaacagact
ggggattgatctcatccagaacttggaaggagaacagagatcaaatgagttaaaggatct
ttgtctttgactaagagaaaacccatagccctcctcttcctacccctctccttctcaaaa
acatttcctccctaggagtagggagtgctctgcacagtgggaacacaggtagaagttgag
atttagaaaagtagttaagagtggtgggatggtgagagggaagtgggatgttctggatgt
tgtcactaggctgtaaacccctggagaacagacatgactgatttgcccagggctgaatct
gaagcacctgaaacattgtaaatacgtcatatatatttgtggccaggcacagtggctcat
gcctataatccctgccctttgggaggccaaggcaggcagatcactggaggccaggagctc
aagacaagcctagccaacgtggtgaaaccctgcctctactaaaaatataaaaattagcca
ggcgtgatggcagattcttgtaatcccagctactcgggagactgaggcaggagaattgct
tgaatccgggagacggaggttgcagtgagccaagatggcaccactacacttccagcctga
gtgacggagcaagacactgtctcaaaaaagaacaaccaaacaaaccaaaaaacagcctca
caaatatttgttaaataatgaaatgaattcataaaaacaaaagagggagcctctgtgaag
caactgtaaaatatattgagtcagtgctatagtttggatgtgatttgtccctgccaaata
tcgtgttgaaatttaatccccagtgtgatagtgttgtgaggtagggcctagcaggaggtg
tgtgggtgatgggagtggatcgctcatgaacagattaatgcccttcctggagtgtgttgg
tgggtatgagtgagaggttctcactctattagttcctgagagagctggttgtcaaaaaga
gcctggcatctccctcccccttgcttcttctctgccatgtgacctctacacaccctgcct
tcccttcttccatgagttgaagcagtctgaggctctcaccagtgaagatgcccaattttg
agctttccaaccatccagaaccataagccaaataaaactttttttttttttttaacaaat
tactcagagtcaggtatttccttacagcaacacaaaatatgctagacagtgaggtgagtt
aatgtaagtaaaacatggctgggcgtggtgactcacacctgtagtcccagcactttagga
ggccaaggtgggcggatcacaaggtcaggagtttgagaccaccctggccaacatggtgaa
acaccgtctgtgctaaaaacacacacaaaaaactagctgggtgtggtggcacacgcctgt
agtcccagctactcgggaggttgagtcaggagaattgcttgaacccaggaggtggaggct
gcagtgagccaagattgcgccactgcacttgagcctgggtaacagagcaagactctgtct
agaaaaaaaaaatatgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg
taacacatctgcaatcccagagagcagaggaattcatggttccatccccacctctctgga
gaagcttgaggctctcgtggtctggggcatctggcatgaagtggatagtggagtcactag
tatcatagtaggcaatgcccaagtatcctgaattccacagcacacacagatggatctgtc
cagcaaggaagaaaggaaatcactattagaatcactcataagtgtagggtttaccatgtc


  [Part of this file has been deleted for brevity]

aaatacaggccgggcacagtggctcacgcctgtaatcccagcactttgggaggccgaggc
gggtggatcatgaggtcaagagatcgagactatcctggctaacatgatgaaaccccgtct
ctactaaaaatacaaaaaattagctgggcatggtggcgggcacctgtagtcccagctact
cgggaggctgagtcaggagaatggtgtgaacccaggagacggagcttgcagtgagctgag
gtcgcaccactgcactccagcctgggtgatagagcgagactctgtctcaaaaaaaaaaaa
aaaaaaaaaaaaaacaaaaattagccgggtgtggtggcaggcaacttaatcccagctact
tgggaggcagaggcaggagaatcgtttgaacctgggaggcggaggttgaagagaatagaa
gctctgctggtccagagaaggattgggccagggctctgggagaccagggagaaagagggc
acatgtggtccctgttgactgtgagggtgggaatctgaggaaggctttggctcattgccc
cttgggtttgtccacagccatccttcccctgcggagtatgtcgaggtgctccaggagcta
cagcggctggagagtcgcctccagcccttcttgcagcgctactacgaggttctgggtgct
gctgccaccacggactacaataacaatgtgagccctttgatggccctgccctttctcctc
agccccagtactcccaaaacagaacaggctgaaatacagataactctttccctccctgga
aaaacattgcaacagggccaggtgcagtggctcacgcctgtaatcccagcactttgggag
gccaaggtgggcggatcatctgagatcgggagtttgagaccagcctggccaacatggtgc
aaccccatctctactgaaaatataaacattagctggatgtagtggtgcacacctgtaatc
ccagctactcaggaggctgaggcaggagaatcgctagaactcgggaggagggggttgcag
tgagccgagattgcactactgcactctagcctgggtgacagagcgagactgtctcaaaaa
acaaaacaaaacaaaaaaacacacattgcaacaaaacaatttctctctaaacctgtaagt
gattttgtcctcccttacagagaaggtgataatctttgctgtaagcactgtcctcgtatc
gtaccccttgtgcccctgaatgaatttagaaaatgtaaagtacaggagatcagtatatga
tgacttactgattcatagtagtgttttaataggatgttccttatgtgaataagatataat
ttatttgcaaagatttggtctacatgtaaacttccaaggatataactgaaagttttggag
gacatggtattctcagtaggcattattgcttttattagtgagatggactccagcttgata
ttttctgcctttttgtgtttggctggttgtgcgcagcacgagggccgggaggaggatcag
cggttgatcaacttggtaggggagagcctgcgactgctgggcaacacctttgttgcactg
tctgacctgcgctgcaatctggcctgcacgcccccacgacacctgcatgtggtccggcct
atgtctcactacaccacccccatggtgctccagcaggcagccattcccatacaggtgggt
tagggggagtctggcctgagggagagtgaggggtgttgatagagtgacccagggtagcta
ctgggcctgaaggaggttaggaaaggaggagactggaaacatggtgatgaaggctggaga
tactttagaggtttatcatgaggttttcttggttaggctcttgtatttttctcacatctg
cctgtccatctgtctttttcagatcaatgtgggaaccactgtgaccatgacaggaaatgg
gactcggccccccccaactcccaatgcagaggcacctccccctggtcctgggcaggcctc
atccgtggctccgtcttctaccaatgtcgagtcctcagctgagggggctcccccgccagg
tccagctcccccgccagccaccagccacccgagggtcatccggatttcccaccagagtgt
ggaacccgtggtcatgatgcacatgaacattcaaggtgagaatagttgctggcgagaaga
gcaggatcagcatgatgagggaggttcatgctgaggtgtgagggaacagggtggggaagg
gagaggcacatgctggtggtggtagcctggggaccagagcagaagcttaagtagacagat
gtggggggtgtgggggttggtttgtctttggaggtgtgtttgtgtggtgaagggagtacc
tctccctgtttagatggagggaaaggcaggctttctgattgggggattatgggcctgaag
tatgcctgatctcagaaggatatagttaggccttggccctacctacctcagggccactgt
ctctgtctccctgcccagattctggcacacagcctggtggtgttccgagtgctcccactg
gccccctgggaccccctggtcatggccaaaccctgggtaagagtgagggcatcagggcag
gctgagctctgggtagagaaagggaagggctgagtgggtgggttgaaggggtccaggttc
aaggttacatcagacccgccccccaggctccaccctcatccagctgccctccctgccccc
tgagttcatgcacgccgtcgcccaccagatcactcatcaggccatggtggcagctgttgc
ctccgcggccgcaggtaatgacctggaaggggaggcttgggaggtagggcacagtccatg
gtggcagctggctggcaagggcctggccctcagccctcttcggtctgtctcttctgccac
ccacaggacagcaggtgccaggcttcccaacagctccaacccgggtggtgattgcccggc
ccactcctccacaggctcggccttcccatcctggagggcccccagtctctgggacactgg
tgagcaagggtcggggagttctagtgcgtaacagtctagg

Output files for usage example 2

File: ap000504.split

>AP000504_1-50000 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 3/20.
gaccaatctcactgtgaggaggcagtcaaagggaataatggaagagaggaagaggatttt
ctcagtggcagtcatggcgtctgggatgaaggagtagtttccagaaaggaggcgttgttt
gcttatctccagacctatttgagggaggcaagcaaagggaacggtcttgtagctcaattt
tttcaccccattttaagaatgagacaatagaagcaagagagattatttgacttgcccaag
ctcacacaggcagttaatggaaagctagagcaagaaccaaattttcagactcttagtcta
attctctttttattctacatataatataaagatacttgtctgaaagcacagcctgagaaa
gataaatggctgaggaaagtagacatctgtctggaattgaggattttggtcaaaataatg
gtattaatagaactagtaacactaatgccttaatatctaattaggatagtacactcctgt
tcttattgtaaacctaggaaagttatagaagtgccttatggatcataataagggtcactg
aggcagtgccttttggtttggtgataaaaggctttaacttaatggggagaattccaacaa
taaaaccctgtccaaaaagtgtcaccactcctcaggggaggccctcatccctagacatga
cttaagcagaggcttcccaataagctgcaggttattaaagggtagggagcaggagagatc
ttggggggacaggtcatagggcatgaggagcacaaaggtttaggatgacataaggcagag
gggagatctgtgatgatgaaggtagagttgggggaaagaatgggacaccggaacagggag
ttaggcaaagcaaaaggaaggagataccaaaatccacacttggcaaaaatatgatttcag
gtcttttaggctctctgtgctcctgggaggctgtgggggaggaaagaaaaggctatcatt
ctttacatctcagtccttctacctctgtctgacactccctctcacccaattctagccccc
tggaatattccatatattagtccttccccattttccctctatcctttaccaagtccttac
caagctttcccagaaatcgagtcatattctcatcctgtttggcactcgtaacaacagact
ggggattgatctcatccagaacttggaaggagaacagagatcaaatgagttaaaggatct
ttgtctttgactaagagaaaacccatagccctcctcttcctacccctctccttctcaaaa
acatttcctccctaggagtagggagtgctctgcacagtgggaacacaggtagaagttgag
atttagaaaagtagttaagagtggtgggatggtgagagggaagtgggatgttctggatgt
tgtcactaggctgtaaacccctggagaacagacatgactgatttgcccagggctgaatct
gaagcacctgaaacattgtaaatacgtcatatatatttgtggccaggcacagtggctcat
gcctataatccctgccctttgggaggccaaggcaggcagatcactggaggccaggagctc
aagacaagcctagccaacgtggtgaaaccctgcctctactaaaaatataaaaattagcca
ggcgtgatggcagattcttgtaatcccagctactcgggagactgaggcaggagaattgct
tgaatccgggagacggaggttgcagtgagccaagatggcaccactacacttccagcctga
gtgacggagcaagacactgtctcaaaaaagaacaaccaaacaaaccaaaaaacagcctca
caaatatttgttaaataatgaaatgaattcataaaaacaaaagagggagcctctgtgaag
caactgtaaaatatattgagtcagtgctatagtttggatgtgatttgtccctgccaaata
tcgtgttgaaatttaatccccagtgtgatagtgttgtgaggtagggcctagcaggaggtg
tgtgggtgatgggagtggatcgctcatgaacagattaatgcccttcctggagtgtgttgg
tgggtatgagtgagaggttctcactctattagttcctgagagagctggttgtcaaaaaga
gcctggcatctccctcccccttgcttcttctctgccatgtgacctctacacaccctgcct
tcccttcttccatgagttgaagcagtctgaggctctcaccagtgaagatgcccaattttg
agctttccaaccatccagaaccataagccaaataaaactttttttttttttttaacaaat
tactcagagtcaggtatttccttacagcaacacaaaatatgctagacagtgaggtgagtt
aatgtaagtaaaacatggctgggcgtggtgactcacacctgtagtcccagcactttagga
ggccaaggtgggcggatcacaaggtcaggagtttgagaccaccctggccaacatggtgaa
acaccgtctgtgctaaaaacacacacaaaaaactagctgggtgtggtggcacacgcctgt
agtcccagctactcgggaggttgagtcaggagaattgcttgaacccaggaggtggaggct
gcagtgagccaagattgcgccactgcacttgagcctgggtaacagagcaagactctgtct
agaaaaaaaaaatatgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtg
taacacatctgcaatcccagagagcagaggaattcatggttccatccccacctctctgga
gaagcttgaggctctcgtggtctggggcatctggcatgaagtggatagtggagtcactag
tatcatagtaggcaatgcccaagtatcctgaattccacagcacacacagatggatctgtc
cagcaaggaagaaaggaaatcactattagaatcactcataagtgtagggtttaccatgtc


  [Part of this file has been deleted for brevity]

gaaaccctgtctctactaaaaaatacaggccgggcacagtggctcacgcctgtaatccca
gcactttgggaggccgaggcgggtggatcatgaggtcaagagatcgagactatcctggct
aacatgatgaaaccccgtctctactaaaaatacaaaaaattagctgggcatggtggcggg
cacctgtagtcccagctactcgggaggctgagtcaggagaatggtgtgaacccaggagac
ggagcttgcagtgagctgaggtcgcaccactgcactccagcctgggtgatagagcgagac
tctgtctcaaaaaaaaaaaaaaaaaaaaaaaaaacaaaaattagccgggtgtggtggcag
gcaacttaatcccagctacttgggaggcagaggcaggagaatcgtttgaacctgggaggc
ggaggttgaagagaatagaagctctgctggtccagagaaggattgggccagggctctggg
agaccagggagaaagagggcacatgtggtccctgttgactgtgagggtgggaatctgagg
aaggctttggctcattgccccttgggtttgtccacagccatccttcccctgcggagtatg
tcgaggtgctccaggagctacagcggctggagagtcgcctccagcccttcttgcagcgct
actacgaggttctgggtgctgctgccaccacggactacaataacaatgtgagccctttga
tggccctgccctttctcctcagccccagtactcccaaaacagaacaggctgaaatacaga
taactctttccctccctggaaaaacattgcaacagggccaggtgcagtggctcacgcctg
taatcccagcactttgggaggccaaggtgggcggatcatctgagatcgggagtttgagac
cagcctggccaacatggtgcaaccccatctctactgaaaatataaacattagctggatgt
agtggtgcacacctgtaatcccagctactcaggaggctgaggcaggagaatcgctagaac
tcgggaggagggggttgcagtgagccgagattgcactactgcactctagcctgggtgaca
gagcgagactgtctcaaaaaacaaaacaaaacaaaaaaacacacattgcaacaaaacaat
ttctctctaaacctgtaagtgattttgtcctcccttacagagaaggtgataatctttgct
gtaagcactgtcctcgtatcgtaccccttgtgcccctgaatgaatttagaaaatgtaaag
tacaggagatcagtatatgatgacttactgattcatagtagtgttttaataggatgttcc
ttatgtgaataagatataatttatttgcaaagatttggtctacatgtaaacttccaagga
tataactgaaagttttggaggacatggtattctcagtaggcattattgcttttattagtg
agatggactccagcttgatattttctgcctttttgtgtttggctggttgtgcgcagcacg
agggccgggaggaggatcagcggttgatcaacttggtaggggagagcctgcgactgctgg
gcaacacctttgttgcactgtctgacctgcgctgcaatctggcctgcacgcccccacgac
acctgcatgtggtccggcctatgtctcactacaccacccccatggtgctccagcaggcag
ccattcccatacaggtgggttagggggagtctggcctgagggagagtgaggggtgttgat
agagtgacccagggtagctactgggcctgaaggaggttaggaaaggaggagactggaaac
atggtgatgaaggctggagatactttagaggtttatcatgaggttttcttggttaggctc
ttgtatttttctcacatctgcctgtccatctgtctttttcagatcaatgtgggaaccact
gtgaccatgacaggaaatgggactcggccccccccaactcccaatgcagaggcacctccc
cctggtcctgggcaggcctcatccgtggctccgtcttctaccaatgtcgagtcctcagct
gagggggctcccccgccaggtccagctcccccgccagccaccagccacccgagggtcatc
cggatttcccaccagagtgtggaacccgtggtcatgatgcacatgaacattcaaggtgag
aatagttgctggcgagaagagcaggatcagcatgatgagggaggttcatgctgaggtgtg
agggaacagggtggggaagggagaggcacatgctggtggtggtagcctggggaccagagc
agaagcttaagtagacagatgtggggggtgtgggggttggtttgtctttggaggtgtgtt
tgtgtggtgaagggagtacctctccctgtttagatggagggaaaggcaggctttctgatt
gggggattatgggcctgaagtatgcctgatctcagaaggatatagttaggccttggccct
acctacctcagggccactgtctctgtctccctgcccagattctggcacacagcctggtgg
tgttccgagtgctcccactggccccctgggaccccctggtcatggccaaaccctgggtaa
gagtgagggcatcagggcaggctgagctctgggtagagaaagggaagggctgagtgggtg
ggttgaaggggtccaggttcaaggttacatcagacccgccccccaggctccaccctcatc
cagctgccctccctgccccctgagttcatgcacgccgtcgcccaccagatcactcatcag
gccatggtggcagctgttgcctccgcggccgcaggtaatgacctggaaggggaggcttgg
gaggtagggcacagtccatggtggcagctggctggcaagggcctggccctcagccctctt
cggtctgtctcttctgccacccacaggacagcaggtgccaggcttcccaacagctccaac
ccgggtggtgattgcccggcccactcctccacaggctcggccttcccatcctggagggcc
cccagtctctgggacactggtgagcaagggtcggggagttctagtgcgtaacagtctagg

Each output sequence keeps the name of the original sequence with '_start-end' appended, where 'start' and 'end' are the start and end positions of the sub-sequence in the original sequence. For example, splitting the sequence HSHBB at a size of 50000 with no overlap gives sub-sequences named HSHBB_1-50000 and HSHBB_50001-73308.
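
The naming rule and the layout of the output shown above can be illustrated with a short Python sketch. It is not splitter's implementation: the function names are hypothetical, the 60-character line width is taken from the example output above, and the (start, end) ranges are supplied by the caller.

```python
def subsequence_name(name, start, end):
    """Build a sub-sequence name by appending '_start-end' to the original name."""
    return f"{name}_{start}-{end}"


def fasta_records(name, description, sequence, ranges, width=60):
    """Yield one FASTA-style record per 1-based, inclusive (start, end) range."""
    for start, end in ranges:
        # Convert 1-based inclusive coordinates to a Python slice.
        chunk = sequence[start - 1:end]
        header = f">{subsequence_name(name, start, end)} {description}".rstrip()
        lines = [chunk[i:i + width] for i in range(0, len(chunk), width)]
        yield "\n".join([header] + lines)


if __name__ == "__main__":
    print(subsequence_name("HSHBB", 50001, 73308))  # HSHBB_50001-73308

    # A toy 10-base sequence split at size 4 with no overlap.
    for record in fasta_records("DEMO", "toy sequence", "acgtacgtac",
                                [(1, 4), (5, 8), (9, 10)]):
        print(record)
```

Each record carries the description of the original sequence, as in the example output files.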

Data files

None.

Notes

EMBOSS programs rarely require a sequence to be split into smaller sub-sequences, but memory usage can become restrictive when dealing with very large sequences.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name  Description
biosed        Replace or delete sequence sections
codcopy       Reads and writes a codon usage table
cutseq        Removes a specified section from a sequence
degapseq      Removes gap characters from sequences
descseq       Alter the name or description of a sequence
entret        Reads and writes (returns) flatfile entries
extractfeat   Extract features from a sequence
extractseq    Extract regions from a sequence
listor        Write a list file of the logical OR of two sets of sequences
makenucseq    Creates random nucleotide sequences
makeprotseq   Creates random protein sequences
maskfeat      Mask off features of a sequence
maskseq       Mask off regions of a sequence
newseq        Type in a short new sequence
noreturn      Removes carriage return from ASCII files
notseq        Exclude a set of sequences and write out the remaining ones
nthseq        Writes one sequence from a multiple set of sequences
pasteseq      Insert one sequence into another
revseq        Reverse and complement a sequence
seqret        Reads and writes (returns) sequences
seqretsplit   Reads and writes (returns) sequences in individual files
skipseq       Reads and writes (returns) sequences, skipping first few
trimest       Trim poly-A tails off EST sequences
trimseq       Trim ambiguous bits off the ends of sequences
union         Reads sequence fragments and builds one sequence
vectorstrip   Strips out DNA between a pair of vector sequences
yank          Reads a sequence range, appends the full USA to a list file

Author(s)

Gary Williams (gwilliam © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Completed 22 March 1999

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None.