SNP array resources | SNP information file | Genome information file | Read in SNP data | External or Illumina SNP data | Filter SNPs | Combine array types | Probe data view | SNP data view | Export SNP data or regions
Besides mRNA expression analysis, oligonucleotide arrays have also been applied to single-nucleotide polymorphism (SNP) analysis (Chee et al. 96; Wang et al. 98) and loss-of-heterozygosity (LOH) studies (Lindblad-Toh et al. 00). Please cite Lin et al. 04 if you use dChip SNP analysis functions in your work.
Affymetrix resources: Technical note on LOH and copy number analysis, CNAT software, SNP array publications
500K SNP array resources: Product page, Support materials, HapMap genotypes (CEL files, GEO, 48 samples), Copy number data
Array support materials: 10K array, 10K 2.0 array, 100K array, 500K array, 5.0 array, 6.0 array
Other SNP array analysis software from academics: CNAG, PLASQ, SNPscan
100K datasets from Affymetrix: 54 individuals (only genotype calls), HapMap trio dataset
Zhao et al. 2005 (Data), Garraway et al. 2005 (GEO Data); These two datasets use Early Access 100K array
Zhao et al. 2004 (Data); this dataset uses Early Access 10K array
If genotype text files are not directly downloadable from the GEO website, at each sample page, click “View full table” on the bottom, and use “File/Save as/Save as type: text file” to save to a text file (example).
CDF files: Early Access 10K array (unzip it), Early Access 100K array: CentHindAv2.CDF, CentXbaAv2.CDF
Mapping 10K – 500K SNP arrays: go
to the “
Genome information files for SNP arrays
To use the “Analysis/Chromosome” function to view SNP data, we need a genome information file and, optionally, RefGene and Cytoband files. The RefGene files provide the gene information. All of these files must use the same genome assembly (hg number) within one analysis session (Correspondence between NCBI build and UCSC assembly numbers).
Unzip these files to get genome information files or see their format:
HuSNP, Early Access 10K (ax13339 array; file names contain “10k”) and Mapping 10K (Mapping10K_Xba array, file names contain “11k”) SNP arrays: hg12.zip or hg15.zip
Mapping 10K 2.0 (hg17), Early Access 100K, Mapping 100K (hg16, Xba, Hind and combined), Mapping 100K, hg17
Early Access 500K (hg17), Mapping500K: hg17 (Nsp, Sty and combined), hg18 (use WinRAR to unzip)
If needed, the information files of two sub-arrays can be combined row-wise in a text editor when combining sub-arrays for analysis.
2/12/08: SNP 6.0 and SNP 5.0 genome info files include one for the CNV probes. The two files may be combined in a text editor to view SNP and CNV probes together. These files are based on the annotation
[10/10/07] This Python program can be used to make dChip genome information files from recent Affy annotation CSV files of SNP arrays such as "Mapping250K_Nsp.na23.annot.csv". Download and install Python, start Python GUI, use "File/Open" to open "snp500k_info.py" (in the same directory as CSV and PSI files), modify file names in the program if necessary, and select "Run/Run Module".
Use “Analysis/Open group” to open a group of SNP arrays.
For the 500K SNP array, Natalie Twine observed that when .CHP files are batch exported from GTYPE to generate .TXT files, the resulting .TXT files list SNPs in a different order from the CDF file, which leads to long reading times at "Open group". The solution is to manually open a .CHP file in GTYPE and export it as .TXT by clicking on the export icon.
Specify “Other information/CDF file”, and specify “Other information/sample information file” if needed.
Then click OK to read the data. When the number or size of arrays is large, you may uncheck "Analysis/Open group/Options/Load probe data in memory" so that probe data are not loaded, which speeds up computation.
[V5/29/06+] To read SNP CEL files with combined genotype file but no individual matching TXT/CHP genotype files, first prepare a combined genotype call file from Affymetrix genotyping software, in this format (save in text format in Excel). Make sure the column names are the CEL file names without “.CEL” extension. Then after “Open group”, use “Analysis/Get external data” and specify this genotype call file as “Data file” and check “Read SNP call file and save to DCP file”. In future sessions of dChip, “Open group” will be fine without doing “Get external data” again, since genotype calls are already in DCP files.
External SNP data or Illumina SNP array data
One can also use “Analysis/Get external data” to read in a tab-delimited text file containing SNP calls or SSLP LOH data. Each column contains the SNP call of one sample (example file). Check the “SNP or SSLP data” checkbox. If there are signal data as well (e.g. exported by “Tools/Export expression data” after signal analysis; see below) in the external data file, check the “Has both signal and SNP call” checkbox. This means there are SNP call columns after the signal value column for each sample.
The external file may contain a SNP signal column as above, or two signal columns for two alleles (example file) as described in this paragraph. For Illumina SNP array data, this format may be used. The "Signal" column of external data file contains allele A signal and the "SE" column in external data file contains the allele B signal. The allele A/B signals may be normalized signals comparable across samples for a SNP, or computed allele-specific raw copy numbers. Save the data file in tab-delimited text format. At "Analysis/Get external data", check "Has both signal and SNP call", "Has standard error" and "SNP data", and check "Options/Model/Compute A & B allele signals for SNP array". An associated text format genome information file should be made to match the SNP IDs in the external data file. After reading in data, "Analysis/Chromosome" may be used for further analysis or visualization.
We may also read in and visualize SSLP LOH data. See Wang et al. 2005 for applications. Download and unzip the example data file. The data file contains NI (noninformative),
Also see the Illumina BeadStudio plugin for outputting dChip-format data.
[Use version 10/13/05+] We can keep only SNPs of better quality for downstream analysis. For example, Nannya et al. 2005 used SNP fragment length and GC content to improve the signal-to-noise ratio. Specify a SNP information file at "Analysis/Open group/SNP information file", and use an array list file to group normal and tumor samples. After "Open group", use "Analysis/Filter SNPs" to filter SNPs by fragment length and by No Call rate across samples. A conflicting call between normal and tumor samples (e.g. A in normal and B or AB in tumor) is counted as a No Call. Afterwards, specify the filtered SNP list at "Analysis/Chromosome/SNP list file" to use only these SNPs in the analysis.
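The filtering rule above can be illustrated with a minimal Python sketch. It is not dChip's implementation: the call encoding ("A", "B", "AB", "NoCall"), the dictionary layout, the per-pair counting of No Calls, and the function names and cutoffs are all assumptions made for illustration.

```python
def no_call_rate(normal_calls, tumor_calls):
    """Fraction of normal/tumor pairs counted as No Call for one SNP.

    Assumption: a pair counts as No Call if either call is missing, or if
    the calls conflict (homozygous in normal, different call in tumor).
    """
    n_no_call = 0
    for normal, tumor in zip(normal_calls, tumor_calls):
        if normal == "NoCall" or tumor == "NoCall":
            n_no_call += 1
        elif normal in ("A", "B") and tumor != normal:
            # Conflict, e.g. A in normal and B or AB in tumor.
            n_no_call += 1
    return n_no_call / len(normal_calls)


def filter_snps(snps, min_length, max_length, max_no_call_rate):
    """Return IDs of SNPs passing the fragment-length and No Call filters."""
    kept = []
    for snp in snps:
        if not (min_length <= snp["fragment_length"] <= max_length):
            continue
        if no_call_rate(snp["normal"], snp["tumor"]) > max_no_call_rate:
            continue
        kept.append(snp["id"])
    return kept


# Hypothetical example data: two normal/tumor pairs per SNP.
snps = [
    {"id": "SNP_1", "fragment_length": 500,
     "normal": ["A", "AB"], "tumor": ["A", "AB"]},
    {"id": "SNP_2", "fragment_length": 1500,
     "normal": ["A", "AB"], "tumor": ["A", "AB"]},
    {"id": "SNP_3", "fragment_length": 600,
     "normal": ["A", "B"], "tumor": ["AB", "NoCall"]},
]
print(filter_snps(snps, 200, 1000, 0.5))
```

Note that AB in normal and A or B in tumor is not treated as a conflict here, since that pattern is the LOH signal itself.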
Combine sub-arrays or different SNP array types
There are two sub-arrays of 100K data for two restriction enzymes: XbaI and HindIII. Each array type can be analyzed to obtain signal values using “Open group”. To combine the data of the two arrays, use “Tools/Export expression value” to export both signal values and SNP calls (check “Has both signal and call” but uncheck “Has standard error”), then combine the data row-wise (see combine sub arrays). Checking “Tools/Export expression value/Append to this file” avoids combining the data manually. Finally, open the combined data file with “Analysis/Get external data” (check “Has SNP call” and “SNP data”), and use a combined genome info file. To save time, you can first look at the data of each sub-array separately. If combining the data could increase the resolution of aberration regions, you can then combine the sub-arrays.
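As an alternative to a text editor, the row-wise combination can be scripted. The sketch below assumes each exported file is tab-delimited with a single header line and that both files list the samples in the same column order; the actual layout of a dChip export may differ, and the function name and sample values are hypothetical.

```python
import io


def combine_rowwise(first, second, out):
    """Write the rows of `first` then `second` to `out`, keeping one header.

    All three arguments are text file objects. Assumes one header line per
    input and identical columns in both inputs.
    """
    header_1 = first.readline().rstrip("\r\n")
    header_2 = second.readline().rstrip("\r\n")
    if header_1 != header_2:
        raise ValueError("column headers differ between the two files")
    out.write(header_1 + "\n")
    for source in (first, second):
        for line in source:
            if line.strip():  # skip blank lines
                out.write(line.rstrip("\r\n") + "\n")


# In-memory text standing in for the two exported sub-array files.
xba = io.StringIO("probe set\tSample1\tSample1 call\nSNP_X1\t812.5\tAB\n")
hind = io.StringIO("probe set\tSample1\tSample1 call\nSNP_H1\t455.0\tAA\n")
combined = io.StringIO()
combine_rowwise(xba, hind, combined)
print(combined.getvalue())
```

With real files, pass objects returned by `open(path)` and `open(path, "w")` instead of the `io.StringIO` stand-ins. The same approach applies to combining the genome information files of two sub-arrays.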
Combining different generations of SNP arrays is similar to combining expression data of different arrays. First manually make a common probe set file. For the Early Access 10K and Mapping10K131 arrays, the file contains ~8K common SNPs (based on the file “Template EA conversion 10K.xls”). For the Mapping10K131 and Mapping10K142 arrays, use Affymetrix annotation
Combining 100K and 500K arrays may be more difficult since the two arrays use different restriction enzymes and share fewer common SNPs. For now, one may open two dChip sessions side by side to visualize and compare.
At the PM/MM data view, the 20 probe pairs are ordered from left to right. The same ordering applies when using “Tools/Export probe set” to export probe level data.

On the left, the first 5 probe pairs are for PM and MM A allele, forward strand; the next 5 probe pairs are for PM and MM B allele, forward strand; the next 5 probe pairs are for PM and MM A allele, reverse strand; the last 5 probe pairs are for PM and MM B allele, reverse strand. On the right, these four probe pair sets are upper left 10 probes, lower left 10 probes, upper right 10 probes and lower right 10 probes. The middle probe pair of the 5 probe pairs has shift 0, and the others have shift from –4 to +4. For some probe sets, there are 7 probe pairs instead of 5 in each of the four sets.
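The left-to-right ordering described above can be expressed as a small lookup, which may help when parsing probe-level data exported with “Tools/Export probe set”. This is a sketch based only on the ordering stated here; the function name and the 0-based indexing are assumptions, and the shift of each individual probe pair is not modeled.

```python
def probe_pair_group(index, pairs_per_set=5):
    """Return (allele, strand, position_in_set) for a 0-based probe pair index.

    Order of the four sets, left to right: A forward, B forward,
    A reverse, B reverse. Use pairs_per_set=7 for probe sets that have
    7 probe pairs in each of the four sets.
    """
    groups = [
        ("A", "forward"),
        ("B", "forward"),
        ("A", "reverse"),
        ("B", "reverse"),
    ]
    if not 0 <= index < 4 * pairs_per_set:
        raise IndexError("probe pair index out of range")
    allele, strand = groups[index // pairs_per_set]
    return allele, strand, index % pairs_per_set


for i in (0, 7, 19):
    print(i, probe_pair_group(i))
```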
This view is useful since the Affymetrix SNP genotype calls (Di et al. 05, Liu et al. 03) can be checked with their probe level data when in question. After “Open group” finishes, a “SNP” icon will show in the left panel. Click the icon to display the SNP view:

The squares in the top panel of the SNP view represent the arrays, clustered by Principal Component Analysis (PCA) using probe-level data of a particular probe set (data courtesy of Charles Wang). Red, blue, yellow and black colors are for allele calls AA, BB, AB and “No Call”. The PCA method: for each MiniBlock i = 1 … M, compute Diff_A = max(pmA – mmA, 1) and, analogously, Diff_B = max(pmB – mmB, 1), then Ri = Diff_A / (Diff_A + Diff_B). The data of one SNP in one sample is (R1, R2, …, RM). Finally, use PCA to project the S data points (for S samples) into two dimensions for visualization.
The bottom panel shows the probe-level data of this probe set in the currently selected array. In this example there are 4 mini-blocks (only for one strand), and each mini-block has intensity data for mmA (gray), pmA (red), pmB (blue) and mmB (gray). The intensity bars are scaled relative to the maximum intensity currently in view.
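The PCA method described for the top panel can be sketched in Python with NumPy. This is an illustration rather than dChip's code: the array shapes (S samples by M mini-blocks), the function names, the intensity values, and the definition of Diff_B by analogy with Diff_A are assumptions.

```python
import numpy as np


def allele_ratios(pm_a, mm_a, pm_b, mm_b):
    """Compute R_i = Diff_A / (Diff_A + Diff_B) for each sample and mini-block.

    Each argument is an (S, M) array of intensities for one probe set.
    Differences are floored at 1, so every ratio lies strictly in (0, 1).
    """
    pm_a, mm_a, pm_b, mm_b = (
        np.asarray(x, dtype=float) for x in (pm_a, mm_a, pm_b, mm_b)
    )
    diff_a = np.maximum(pm_a - mm_a, 1.0)
    diff_b = np.maximum(pm_b - mm_b, 1.0)
    return diff_a / (diff_a + diff_b)


def pca_2d(ratios):
    """Project the S rows of `ratios` onto their first two principal axes."""
    centered = ratios - ratios.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Hypothetical intensities: three samples (AA-like, BB-like, AB-like),
# three mini-blocks each.
pm_a = [[900, 800, 850], [110, 120, 100], [500, 480, 520]]
mm_a = [[100, 100, 100]] * 3
pm_b = [[110, 100, 120], [900, 850, 800], [500, 510, 490]]
mm_b = [[100, 100, 100]] * 3

ratios = allele_ratios(pm_a, mm_a, pm_b, mm_b)
points = pca_2d(ratios)
print(points)
```

Samples with ratios near 1 (mostly A signal) and near 0 (mostly B signal) fall at opposite ends of the first axis, with heterozygous samples in between, which is what makes the clusters in the top panel separable by call.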
Use the “Home” and “End” keys to go to another marker, and the “PageUp” and “PageDown” keys to go to another array. Use the Arrow keys to zoom the image, and the Control+Left and Control+Right keys to adjust the point size.
Export SNP data or chromosome regions
Use “Chromosome/Export SNP data/Data under view” to export LOH, log2, raw or inferred copy numbers (“Chromosome/Next data type, Display inferred” to switch). The current curve values will also be exported. At the LOH data view, when paired normal and tumor samples exist, the informative and conflict call percentages of sample pairs will also be exported. Open the exported file in a text editor and go to the bottom to see these values.
When exporting the inferred LOH calls, the option “Options/Chromosome/Inferred LOH call threshold” can be set to convert the inferred probability of LOH into LOH calls. SNPs with Probability (LOH) > threshold are exported as LOH, SNPs with Probability (LOH) < 1 – threshold are exported as Retention, and all others are exported as “No LOH Call”. If the threshold is set to –1, the probability of LOH itself is exported. "L", "R" and "N" in the exported file represent “Loss”, “Retention” and “Noninformative/No call”.
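The thresholding rule can be written as a short function. This is a sketch of the rule as stated, not dChip's code; the function name is hypothetical, and using "N" for the “No LOH Call” case is an assumption based on the letter codes listed above.

```python
def loh_call(prob_loh, threshold):
    """Convert an inferred LOH probability into an exported value.

    threshold == -1 : return the probability itself
    P > threshold   : "L" (LOH)
    P < 1-threshold : "R" (Retention)
    otherwise       : "N" (assumed code for "No LOH Call")
    """
    if threshold == -1:
        return prob_loh
    if prob_loh > threshold:
        return "L"
    if prob_loh < 1 - threshold:
        return "R"
    return "N"


for p in (0.95, 0.5, 0.02):
    print(p, loh_call(p, 0.9))
```

A threshold closer to 1 makes both the LOH and the Retention calls more conservative and widens the band of probabilities that receive no call.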
“Tools/Export expression value” exports raw signal values and SNP genotype calls. This is useful for combining array data.
To export interesting chromosome regions whose curve exceeds a specified threshold, first go to a data view (key ‘D’ or ‘I’) such as the inferred LOH or copy number view. Also specify “Standardize separators” in the array list file to divide different tumor samples or pairs before exporting. Then use “Chromosome/Export SNP data/Regions with significant curve value” to export. Checking “Options/Chromosome/Use min and max as threshold” uses “≤ Min or ≥ Max” as the criterion; otherwise “≥ Threshold” is used. For example, to export regions with inferred copy number beyond [0,7], set Min = 0, Max = Threshold = 7 at “Options/Chromosome”. At the LOH data view, the LOH prevalence score across samples is used for exporting; at the inferred copy number data view, the inferred copy number in individual samples is used. If cytoband or refgene files are specified at "Analysis/Chromosome", the cytobands or genes contained in the regions are exported in addition to SNP names.
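The two threshold modes can be illustrated with a sketch that finds contiguous runs of SNPs whose curve value is significant. How dChip actually delimits regions is not specified here, so grouping consecutive significant SNPs into a region is an assumption, as are the function and parameter names; chromosome boundaries and sample separators are ignored.

```python
def significant_regions(values, threshold, min_value=None, max_value=None,
                        use_min_max=False):
    """Return inclusive (start, end) index pairs of significant runs.

    use_min_max=True  : a value is significant if <= min_value or >= max_value
    use_min_max=False : a value is significant if >= threshold
    """
    def is_significant(value):
        if use_min_max:
            return value <= min_value or value >= max_value
        return value >= threshold

    regions = []
    start = None
    for i, value in enumerate(values):
        if is_significant(value):
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:  # run extends to the last SNP
        regions.append((start, len(values) - 1))
    return regions


# Hypothetical inferred copy numbers along one chromosome.
copy_numbers = [2, 2, 0, 0, 2, 7, 8, 2]
print(significant_regions(copy_numbers, threshold=7,
                          min_value=0, max_value=7, use_min_max=True))
print(significant_regions(copy_numbers, threshold=7))
```

With Min = 0 and Max = Threshold = 7 as in the example above, the min/max mode reports both the deleted and the amplified stretch, while the plain threshold mode reports only the amplified one.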
(Updated 8/11/07)