We are very happy to provide you with the MCC (Micro Capture-C) code. The main analysis scripts have not been put on GitHub because they are under licence, but we are more than happy to share them for academic use. They are available from the Oxford University Innovation Software Store.
Several different scripts are available for analysis of the data, and we have a pipeline which can be used to run these to go from raw FASTQ to a UCSC track hub. The bulk of the code is in Perl, and you will probably need to adapt it to run the full pipeline on your server. We are also happy to share the R code we used for the downstream analysis of the data.
Please see our GitHub page for the pipeline and other pieces of code.
Please note that the oligos are designed using CapSequm.
The details of the analysis are outlined below. The key scripts are available from the Oxford University Innovation Software Store. There is also a pipeline to automate analysis, which can be downloaded from our GitHub page but this code will require some customisation to run on most servers. The overview of the analysis is as follows:
1. Create a file giving the position of each oligo, with a name for the target. BED format (chr\tstart\tstop\tname), the form chr:start-stop_name, or the output from CapSequm will all work. Please do not put "_" or other unusual characters in the oligo names, as this is likely to cause problems later on. Please also make sure that line endings are \n (not \r\n).
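The naming and line-ending rules in step 1 can be checked before running anything else. The following is a minimal sketch, not part of the MCC scripts: the function name check_oligo_bed is hypothetical, it only handles the four-column BED form, and the rule that names may contain letters and digits only is an assumption that is stricter than the text requires.

```python
import re

# Hypothetical helper, not part of the MCC scripts.
# Assumption: names made only of letters and digits are safe downstream.
NAME_OK = re.compile(r"^[A-Za-z0-9]+$")


def check_oligo_bed(text):
    """Return a list of problems found in a 4-column BED string (empty if none)."""
    problems = []
    if "\r" in text:
        problems.append("file contains carriage returns; use \\n line endings")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            problems.append(
                f"line {number}: expected 4 tab-separated fields, got {len(fields)}"
            )
            continue
        chrom, start, stop, name = fields
        if not (start.isdigit() and stop.isdigit()) or int(start) >= int(stop):
            problems.append(
                f"line {number}: start and stop must be integers with start < stop"
            )
        if not NAME_OK.match(name):
            problems.append(
                f"line {number}: name {name!r} has characters other than letters and digits"
            )
    return problems
```

Reading the file with open(path, newline="") keeps any \r characters visible to the check; an empty list means that no problems were found.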
2. Use the script MCC_BLATfa.pl to make a FASTA file of the 800bp surrounding the oligo sequences (Script available from the Oxford University Innovation Software Store).
perl MCC_BLATfa.pl -g mm9.fa -t 800 -f oligos.bed
This will output two files:
a. A FASTA file of the oligos with the sequence of the surrounding 800bp
>chr:start-stop_name    #This should be the coordinates of the 120bp of the capture oligo
NNNNNNNNNNNNNNNNNNNNN....    #The 800bp of the sequence surrounding the oligo
b. A BED file with the coordinates of the target oligo and the 800bp around the oligo. This file also specifies the colour in RGB format, which will be used in the track hub.
Please adjust these colours for the overlay tracks if you want them to be different.
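If you need to work with the FASTA headers in your own scripts, the chr:start-stop_name form can be split back into its parts. This is a minimal sketch that assumes the header layout shown above; parse_oligo_header is a hypothetical name and is not part of the MCC scripts.

```python
import re

# Assumed header layout: >chr:start-stop_name (the leading ">" is optional here).
HEADER = re.compile(r"^>?(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)_(?P<name>.+)$")


def parse_oligo_header(header):
    """Split a 'chr:start-stop_name' header into (chrom, start, stop, name)."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"header does not look like chr:start-stop_name: {header!r}")
    return (
        match.group("chrom"),
        int(match.group("start")),
        int(match.group("stop")),
        match.group("name"),
    )
```

The name is taken as everything after the first "_" that follows the coordinates.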
3. Prepare your FASTQ files so that there is an R1 and an R2 file, both unzipped.
4. Trim adaptor sequences from the FASTQ files if this hasn't been done already (e.g. with Trim Galore)
trim_galore --paired filename_R1.fastq filename_R2.fastq
5. Merge the paired-end R1 and R2 files into single combined reads using FLASH
flash filename_R1.fastq filename_R2.fastq
NB: if the libraries have been sonicated well, the majority of the reads will end up being combined.
This file of combined extended reads, denoted by "_ext", is used for further processing.
6. Convert the combined FASTQ file to FASTA, e.g. using sed
sed -n '1~4s/^@/>/p;2~4p' filename_ext.fastq > filename_ext.fa
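The first~step address used in the sed command is a GNU extension, so it may not work with other versions of sed (for example on macOS). As a portable alternative, here is a minimal sketch that does the same conversion; the function names are hypothetical, and it assumes a standard four-line-per-record FASTQ file.

```python
def fastq_to_fasta(lines):
    """Yield FASTA lines (header, then sequence) from an iterable of FASTQ lines."""
    for index, line in enumerate(lines):
        position = index % 4  # 0 = header, 1 = sequence, 2 = "+", 3 = qualities
        if position == 0:
            if not line.startswith("@"):
                raise ValueError(
                    f"line {index + 1}: expected a FASTQ header starting with '@'"
                )
            yield ">" + line[1:]
        elif position == 1:
            yield line


def convert_file(fastq_path, fasta_path):
    """Write the FASTA version of fastq_path to fasta_path."""
    with open(fastq_path) as source, open(fasta_path, "w") as target:
        target.writelines(fastq_to_fasta(source))
```

For example, convert_file("filename_ext.fastq", "filename_ext.fa") would write the file used in the next step. Working on line position rather than on the leading character means a quality line that happens to start with "@" is not mistaken for a header.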
7. Use BLAT on the command line to map the reads to the 800bp around the oligo sequences
blat -minScore=20 -minIdentity=5 -maxIntron=10000 -tileSize=11 oligo_file.fa filename_ext.fa filename_ext.psl
8. Run MCC_splitter.pl to split the reads into individual FASTQ files depending on which oligo they map best to (Script available from the Oxford University Innovation Software Store).
perl MCC_splitter.pl -f filename_ext.fastq -p filename_ext.psl -r reference_for_the_experiment
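As a quick sanity check on the BLAT output, you can count how many reads have their best hit on each oligo. The sketch below is not the logic of MCC_splitter.pl, which may use a different criterion; it simply assumes that the "best" oligo for a read is the one with the highest value in the PSL matches column. It relies on the standard PSL column order (matches in column 1, query name in column 10, target name in column 14), with the reads as the query and the oligo FASTA as the target, as in the blat command above. The function names are hypothetical.

```python
from collections import Counter


def best_oligo_per_read(psl_lines):
    """Map each read (qName) to the oligo (tName) with the most matching bases."""
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip the PSL header and any malformed lines
        matches, read, oligo = int(fields[0]), fields[9], fields[13]
        # Assumption: the hit with the most matching bases wins; ties keep the first hit.
        if read not in best or matches > best[read][0]:
            best[read] = (matches, oligo)
    return {read: oligo for read, (_, oligo) in best.items()}


def reads_per_oligo(psl_lines):
    """Count the reads whose best hit falls on each oligo."""
    return Counter(best_oligo_per_read(psl_lines).values())
```

Passing an open handle on filename_ext.psl to reads_per_oligo gives a rough per-oligo read count to compare with the sizes of the split FASTQ files.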
9. Align each of the FASTQ files produced using Bowtie 2
bowtie2 -p 4 -X 1000 -x bowtie_genome_path filename.fastq -S filename.sam
10. Sort the reads by name with samtools
samtools sort -n -o ${filename}_sort.sam ${filename}.sam
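Because steps 9 and 10 have to be repeated for every FASTQ file produced by the splitter, it can help to generate the commands programmatically. This is a minimal sketch that reproduces the two commands above, with the same flags, for one input file; the function names and the threads parameter are hypothetical, and the output names follow the filename.sam and filename_sort.sam pattern used in the text.

```python
import subprocess
from pathlib import Path


def alignment_commands(fastq_path, bowtie_index, threads=4):
    """Return the bowtie2 and samtools argument lists for one split FASTQ file."""
    stem = Path(fastq_path).with_suffix("")  # filename.fastq -> filename
    sam = f"{stem}.sam"
    sorted_sam = f"{stem}_sort.sam"
    align = [
        "bowtie2", "-p", str(threads), "-X", "1000",
        "-x", bowtie_index, str(fastq_path), "-S", sam,
    ]
    sort = ["samtools", "sort", "-n", "-o", sorted_sam, sam]
    return align, sort


def run_alignment(fastq_path, bowtie_index):
    """Run both commands in order, stopping if either one fails."""
    for command in alignment_commands(fastq_path, bowtie_index):
        subprocess.run(command, check=True)
```

Calling run_alignment on each split file in a loop (for example over Path(".").glob("*.fastq")) would cover all oligos, assuming bowtie2 and samtools are on your PATH.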
11. Run MCC_analyser.pl on each file (Script available from the Oxford University Innovation Software Store).
perl MCC_analyser.pl -f filename.sam -pf public_folder -bf path_to_bigwig_genome_sizes_file -genome genome -o oligo_file.fa
The oligo_file is the FASTA file generated by the MCC_BLATfa.pl script, which is also used as the reference for the BLAT mapping.
We're always keen to hear from people who are interested in working with us.