Tumor-only LOH inference Copy number analysis Adjust batch and individual effect
Trimmed analysis Infer copy numbers Hyperploidy and normal contamination
Copy number summary plot Export data in Wiggle format Show LOH in copy number view
Loss-of-Heterozygosity (LOH) analysis
See Lin et al. 2004 for reference. Also see the “Analysis/Chromosome” function. To combine the SNP call data from paired normal and tumor samples into LOH calls, first use “Tools/Array list file” to order the arrays so that each tumor array follows its paired normal array. Standardize separators divide the list into groups; each group holds either a normal/tumor pair (normal sample first) or a single tumor sample without a paired normal:
173Normal
173Tumor
---
Standardize---
174Normal
174Tumor
---
Standardize---
175Tumor
---
Standardize---
…
At “Analysis/Chromosome”, specify the genome information file, refgene and cytoband file, select “Analysis method” as “LOH”. Click OK to display chromosome view, and see Analysis/Chromosome for some commands used in this view. Select “Chromosome/Next data type” or key “D” or “Enter” to toggle between SNP genotype data and LOH data view. In the SNP genotype data view, red, yellow, blue and white colors represent AA, AB, BB genotype calls and No Call. One can mouse over the colors to see their values in the bottom of the window. In the observed LOH data view (left side, data from Zhao et al. 2004):

The SNP calls of a pair of normal and tumor samples are combined to obtain the observed LOH calls. The blue, yellow, red, gray and white colors indicate LOH (AB in normal and A or B in tumor), Retention (AB in both normal and tumor, or No Call in normal and AB in tumor), Conflict (A (or B) in normal and AB/B (or AB/A) in tumor), Non-informative call (the same A or B call in both normal and tumor), and “No Call” (SNP No Call in normal or tumor sample). See Chromosome view and Adjusting the chromosome view for more information on the LOH data view.
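The combination rules above can be written down as a small function. This is only a sketch of the rules as stated in the text, not dChip's own code; the genotype labels ("A", "B", "AB", "NoCall") and the function name are illustrative assumptions.

```python
# Genotype labels are illustrative assumptions, not dChip's file encoding.
HOMOZYGOUS = {"A", "B"}


def observed_loh_call(normal: str, tumor: str) -> str:
    """Combine one SNP's normal and tumor genotype calls into an observed LOH call."""
    if normal == "AB" and tumor in HOMOZYGOUS:
        return "LOH"
    # A No Call in normal still counts as retention when the tumor is AB.
    if tumor == "AB" and normal in {"AB", "NoCall"}:
        return "Retention"
    if normal == "NoCall" or tumor == "NoCall":
        return "NoCall"
    if normal in HOMOZYGOUS and tumor != normal:
        return "Conflict"  # tumor is AB or the opposite homozygote
    return "Non-informative"  # same homozygous call in normal and tumor
```

For example, observed_loh_call("AB", "B") gives "LOH", while observed_loh_call("A", "A") gives "Non-informative".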
In the “Proportional distance” and observed LOH view, when “No Call” or non-informative markers compete for space with LOH or retention markers, the former lose because the latter are more important. Shift+up/down arrow therefore will not change this view as much as it did before, when informative markers could be hidden behind non-informative ones. Note the sub-pixel display in the non-proportional view: as you press Up arrow, the disappearance of marker names on the right side indicates that not all markers are displayed.
[Use dChip 10/21/05+] See Beroukhim et al. 2006 for the method details. A Hidden Markov Model (HMM) is used to infer the probability of LOH based on the observed LOH calls (from paired normal/tumor samples) or genotype calls (from tumor sample without paired normal). Use “Chromosome/Display inferred” (or key “I”) to toggle to the inferred probability of LOH (the right-side figure above), which is displayed from blue (1) to white (0.5) to yellow (0). The blue LOH curve on the right is a LOH score measuring the prevalence of LOH at a marker across the samples, and is computed as the average probability of LOH. “Chromosome/Permute data” can be used to assess the significance of the peak LOH score regions. The inferred LOH calls can be exported by “Chromosome/Export SNP data”.
Sometimes intervening LOH/retention calls can be inferred. They can be caused by intervening homozygous and heterozygous genotypes due to normal contamination or hyperploidy. One may try a larger “Options/Chromosome/Genotyping error” to make it smoother.
To infer the LOH status of non-informative LOH calls from paired normal/tumor LOH analysis, the method “Options/Inferred LOH method/Same boundary” ("Fill in noninformative for pair") can be used in addition to the HMM method. It finds the regions with consecutive non-informative calls, with two informative calls as the boundary. If the observed LOH calls of the two boundary SNPs are the same (both are loss or retention), we infer the call of the non-informative markers in between to be the same as the informative boundary.
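A minimal sketch of the “Same boundary” idea described above, assuming the observed calls of one sample along a chromosome are held in a list of strings. The labels and the function name are hypothetical, and how dChip treats “No Call” or Conflict markers inside a run is not stated in the text, so here such markers simply end the run.

```python
INFORMATIVE = {"LOH", "Retention"}


def fill_same_boundary(calls):
    """Return a copy of calls in which each run of non-informative calls is
    filled in when the informative calls on both sides of the run agree."""
    filled = list(calls)
    n = len(calls)
    i = 0
    while i < n:
        if calls[i] != "Non-informative":
            i += 1
            continue
        start = i
        while i < n and calls[i] == "Non-informative":
            i += 1
        # The run is calls[start:i]; it needs a boundary call on each side.
        if start > 0 and i < n:
            left, right = calls[start - 1], calls[i]
            if left in INFORMATIVE and left == right:
                filled[start:i] = [left] * (i - start)
    return filled
```

A run flanked by LOH on one side and Retention on the other is left unchanged, as is a run at the start or end of the chromosome.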
For the tumor-only LOH inference, when no reference genotype file is used at "Options/Chromosome", the HMM sets the probability of observing an AB marker given the underlying retention state to "Options/Chromosome/Average HET rate" (e.g. 0.3 for 10K and 0.2 for 100/500K arrays). The smaller it is, the more likely we will observe consecutive homozygous calls from data and the less need there is to infer LOH to explain these homozygous calls. When a reference genotype file is used, the SNP-specific HET rate is estimated from the file and used for the basic HMM, and the previous-marker-dependent HET rate is estimated from the file and used for “HMM considering haplotype”.
The tumor-only LOH inference using haplotype information is illustrated here using 100K SNP array data. First combine 100K Xba and Hind arrays, and read the combined data file into dChip using “Analysis/Get external data”. At “Tools/Array list file”, put only a tumor sample in a “Standardize group” to infer LOH from only tumor samples. At “Analysis/Chromosome/Options”, specify “Inferred LOH method” as “HMM considering haplotype”, and specify a normal reference genotype file at “Reference genotype file” (unzip this file: 100K genotypes of CEPH 60 parents, data source; make such files; 500K genotypes of CEPH 60 parents). V12/14/06+: When no "Options/Reference genotype file" is specified, the normal samples specified in sample info file as "Ploidy(numeric)" of 2 will be used to estimate SNP heterozygosity and genotype dependence probabilities.
If the “Remove LOH regions” threshold (T) is not 100%, an HMM-inferred LOH region in tumor-only sample will be removed if its genotypes are “consistent” with more than T % of the 60 normal reference samples (see manuscript above for details). Click OK to run HMM for tumor-only LOH inference. The figure below: (Left) Using the basic HMM (“Inferred LOH method” is “Hidden Markov Model”), compare the LOH inferred from paired normal and tumor samples with the LOH inferred from tumor-only samples. Many small blue horizontal lines indicate falsely-inferred LOH. (Right) The “HMM considering haplotype” method, also with “Remove LOH regions” threshold of 10%.

See Zhao et al. 2004 for reference (page 2). We need multiple normal samples (e.g. 10) for reference, and ideally they come from the same array core and batch as the tumor samples. If no normal is available, one may specify all tumor samples as having “Ploidy(numeric)” of 2 (see below) to make a conservative estimate of copy number changes. One may use reference data from published data sets, but it is better to use your own normals: normal and tumor samples from the same batch have more similar characteristics, so these normal samples serve best as the reference for copy number analysis.
The steps to perform copy number analysis are: (1) “Analysis/Open group”. Supply a sample information file with optional “Gender” (Male or Female; only affects X chromosome analysis) and required “Ploidy(numeric)” columns (Example sample info file; save in text format). Be sure to capitalize the first letter in "Gender, Ploidy(numeric), Female, Male", and leave no space between “Ploidy” and “(numeric)”. dChip will recognize and use these two sample information columns. Samples with “Ploidy(numeric)” of 2 are regarded as normal samples and are used to compute the mean and variation of the 2-copy signal. Normal samples do not have to be in the array list file to be used; only samples in the array list file are displayed and have their copy numbers inferred. For tumor samples, specify ploidy as missing (blank). If no sample is specified as 2 for “Ploidy(numeric)”, specify “Options/Chromosome/% of samples trimmed” to be > 1 to use trimmed analysis. If there are no normal samples, you do not need a sample information file with a “Ploidy” column.
[Forum thread] For chromosome X the “Gender” information is used so male’s signal is taken as one copy when computing mean signal (of 2 copy). Gender of samples has default value of “Female” if no "Gender" column exists in sample information file.
(2) Perform “Analysis/Normalize” and “Analysis/Model-based expression”, using “Options/PM/MM difference model”. A signal value is computed for each SNP in each sample and is analogous to the expression value in expression array analysis. Based on these signal values, the raw copy number for a SNP in a sample is computed as (Signal / (mean signal of normal samples at this SNP) * 2), and log2 ratio is computed as log2(Signal / mean signal of normal samples at this SNP). The raw copy numbers and log2 ratio data can be viewed in the chromosome view (see below).
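The two formulas in step (2) can be checked with a few lines of Python. This is an illustrative sketch of the arithmetic only; the function and argument names are hypothetical, and the signal values in the usage note are made up.

```python
import math


def raw_copy_and_log2(signal, normal_signals):
    """Return (raw copy number, log2 ratio) for one SNP in one sample.

    signal: the sample's signal value at this SNP.
    normal_signals: signal values of the normal samples at the same SNP.
    """
    mean_normal = sum(normal_signals) / len(normal_signals)
    ratio = signal / mean_normal
    return 2.0 * ratio, math.log2(ratio)
```

For instance, a signal of 1500 against normal signals of 900, 1000 and 1100 (mean 1000) gives a raw copy number of 3.0 and a log2 ratio of log2(1.5), about 0.585.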
dChip handles expression and SNP arrays in the same manner. When you “Open group” a data set for the first time, dChip only reads in the raw CEL intensities and calls. Run “Normalize” and “MBEI” next to normalize the CEL values and compute MBEI; both are saved to DCP files. In future dChip sessions, the normalized CEL values and computed MBEI are read at “Open group” and indicated at the lower-right corner of dChip, so you don’t have to repeat these two steps.
(3) Specify “Tools/Array list file” containing paired or single samples separated by “Standardize separators”. Then go to “Analysis/Chromosome”, select “Analysis method/Signal value (Copy number)”, specify genome information file, refGene and cytoband information files (the last two are optional).
Click OK to run and display copy number view. LOH analysis will also be run together. In the figures below (data from Zhao et al. 2004): left: raw copy number estimate. The 0 copy is displayed in white, and 5 or more copy are displayed in pure red. The gray box on the right represents the value range from 0 copy to 4 copy, and the red line represents the normal 2 copy. Right: Inferred copy number. The blue curve in the gray box displays the copy number of sample “2171 tumor” in another form, either raw copy (left) or inferred copy (right). Click the data area of a sample to display its copy number as the curve.

Use key “D” or “Chromosome/Next data type” to switch between copy number, LOH, log2 ratio and genotype views. Use key “I” or “Chromosome/Display inferred” to switch between raw copy numbers and inferred copy numbers. “Tools/Options/Clustering/Displaying range” can be adjusted although dChip selects default range for each data view (-2 to 2 for log2 ratio, 0 to 5 for copy number).
See the Chromosome view and Adjusting the chromosome view sections for more information on the copy number data view.
Adjust batch and individual effect
In the SNP copy number analysis, check “Options/Chromosome/Use paired normal as reference” to use the signal of the paired normal to obtain the raw copy numbers of tumor samples, as opposed to using the average signal of all normal samples. The normal samples still use their average to get raw copy numbers.
We often observe batch effects in copy number analysis. For example, Arupa Ganguly observed "some arrays hybridized on a different date have a higher copy number for the normal tissue as well as the tumor tissues". If a tumor and its paired normal are in the same batch, we can put them in the same standardize group and check “Options/Chromosome/Use paired normal as reference”. In this case the normal samples still use their average to get raw copy numbers (which may not be good due to the batch effect), but each tumor sample is adjusted for both batch and individual effect by its paired normal sample.
If normal and tumor samples are not paired or are not in the same batch, but each batch has its own normal samples, it is best to analyze samples from the same batch together (each "array list file" contains samples of one batch). After obtaining raw copy numbers, use “Chromosome/Export SNP data” to export them. Finally, combine the raw copy number files of the batches column-wise, format the result as an “External data file”, and analyze the combined file in a different dChip session (use “Get external data” to read this file).
[V7/22/07+] In copy number analysis, samples of different batches or experiment conditions can use their own normal reference samples. Specify a "RefBatch" column in sample information file with the same value for samples in the same batch or condition. "Ploidy(numeric)" and trimming will work within each "RefBatch". (Forum thread)
[Forum thread] Recent studies discovered that there are copy number variations in normal tissue samples across individuals (Iafrate et al. 2004, Sebat et al. 2004). The implication is that we cannot always assume that the normal samples used to obtain the reference signal distribution all have copy number 2 at a SNP locus. In addition, researchers often cannot obtain enough (~10) normal reference samples to analyze together with tumor samples. When normal samples are unavailable or too few, set “Options/Chromosome/% of samples trimmed” to be > 0 to obtain the reference signal distribution without using the information of which samples are normal. For example, if 10% is specified, we assume that for any SNP fewer than 10% of all the samples have abnormal copy number. For a SNP, the 5% of samples with the most extreme signals are trimmed from each end, and the remaining values are used to estimate the mean and standard deviation of the signal distribution of normal copy number 2. Large amplifications and homozygous deletions can be found this way, but the method may infer false copy number changes in normal samples. You can try different trimming percentages and compare the results.
If you specify "% of samples trimmed " as a negative value (e.g. -20), only the normal samples (specified as 2 for "Ploidy(numeric)" in sample information file), instead of all samples in the group, are trimmed by this percent and used for computing signal distribution of copy 2.
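The trimming in the 10% example above can be sketched as follows. The function name is hypothetical, and two details are assumptions that the text does not specify: the number of samples dropped from each end is rounded down, and the sample (n-1) standard deviation is used.

```python
import statistics


def trimmed_reference(signals, trim_percent):
    """Estimate mean and SD of the 2-copy signal at one SNP after trimming.

    signals: signal values of all samples at this SNP.
    trim_percent: total percent of samples trimmed, half from each end.
    """
    ordered = sorted(signals)
    # Samples dropped from each end; rounding down is an assumption.
    k = int(len(ordered) * trim_percent / 200.0)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept), statistics.stdev(kept)
```

With 20 samples and 10% trimming, one sample is dropped from each end, so a single strong amplification or deletion does not shift the reference mean.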
Figure below: median-smoothed copy number (window size 11 SNPs, trimmed 1% samples) for chromosome 19 of Affymetrix HapMap trio dataset. A sample (name highlighted in blue) has heterozygous deletion in the 19p13.11 region, which is also one of the most frequently deleted region found by Iafrate et al. 2004.

When a region has a copy change in the same direction in most samples, this trimming method will likely miss the region. In such a case having at least one normal sample is useful: in a first-round analysis one can pair this normal with every tumor in the array list file and check “Options/Use paired normal as reference” to get raw copy numbers and assess whether such regions are likely.
Set “Options/Chromosome/Inferred copy method” to be “Hidden Markov Model” to infer copy numbers (see Zhao et al. 2004, page 2 for method). In the inferred copy number view on the right (figure above), there is no significance value associated with the inferred copy number: the whole curve is the most likely underlying copy number under the HMM. Abnormal copy numbers inferred in normal samples are likely to be false positives. One can toggle between the inferred and raw copy numbers to confirm or reject the inferred copy numbers.
Computing inferred copy number takes some time and here is a way to save some time. At the inferred copy view, use “Chromosome/Export SNP data” to save the inferred copy numbers into a file. In the next session, use the same array list file as before, set “Options/Chromosome/Inferred copy method” to be “Read from file” and specify the inferred copy number file at “Options/Chromosome/Inferred copy file” to read in existing inferred copy number rapidly. As an additional utility, in this way different groups of DCP files can be analyzed separately (e.g. normalized against different references (e.g.
Set “Options/Chromosome/HMM length” to N to perform HMM inference of LOH and copy number for a stretch of at most N SNP markers at a time. This can increase the speed for SNP arrays with density > 100K, where chromosome 1 has > 9K markers but “HMM length” can be set to 1000. The HMM length is mainly a practical consideration: for a 500K array one chromosome can contain 50K SNPs, but SNPs far apart do not provide much information about each other. So the HMM is carried out for every chromosome segment containing a particular number of SNPs (e.g. 10K as HMM length). Except at the segment boundaries, the HMM-inferred copy numbers will be very close under different large HMM lengths.
[Version 7/20/06+] Setting “Options/Chromosome/Inferred copy step" to 1 (default) will infer integer copy numbers 0 to 26. Set “Inferred copy step” to a value > 0 and < 1 to infer fractional copy numbers in multiples of the copy step, up to 27 copy number levels. This option can accommodate fractional copy numbers in tumor samples with high ploidy (average copy number), where the normalization across tumor and normal samples scales the overall ploidy of tumor samples to 2. For example, a sample with ploidy of 10 has a real copy change step of 0.2 when scaled to an overall ploidy of 2 by normalization. Setting "Inferred copy step" to 0 will infer this custom set of copy numbers: {0, 0.05, 0.1, 0.2, 0.4, 0.6, 0.8, 1, 1.3, 1.6, 2, 2.5, 3, 4, 5, 6, 8, 10, 12, 15, 18, 21, 25, 30, 40, 60, 100}.
Set “Options/Chromosome/Inferred copy method” to be “Median smoothing” and set a SNP marker window size (e.g. 10) to use median-smoothed raw copy numbers as the inferred copy numbers. Compared to HMM-inferred copy numbers, this method runs faster and gives results closer to the raw copy numbers; it is also robust to outliers in raw copy numbers and does not need the parameter specification of HMM fitting. However, median-smoothed copy numbers are not as smooth as HMM-inferred copy numbers, and copy changes spanning fewer markers than half of the window size will be smoothed out.
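A running median over neighboring markers can be sketched as below. How dChip centers the window and treats chromosome ends is not described here, so this sketch assumes a window centered on each marker (window // 2 markers on each side) that is truncated at the ends; the function name is hypothetical.

```python
import statistics


def median_smooth(raw_copies, window):
    """Median-smooth raw copy numbers of one sample along one chromosome."""
    half = window // 2  # markers taken on each side (an assumption)
    n = len(raw_copies)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(statistics.median(raw_copies[lo:hi]))
    return smoothed
```

With a window of 3, a single outlier marker such as the 9 in [2, 2, 9, 2, 2] is removed, while a gain spanning four markers is kept.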
[V9/14/07+] “Options/Chromosome/d.f. of t” is the degrees of freedom of the t-distribution used in the HMM emission probabilities. A lower value gives the t-distribution longer tails, which makes it more tolerant of outliers and may lead to smoother HMM-inferred copy numbers. A sub-integer “Inferred copy step” such as 0.2 may also help to smooth. "Window" is only for median smoothing.
Hyperploidy and normal sample contamination
[Forum thread] If tumor sample is contaminated with normal cells at some proportion, the common effect of such normal contamination is to make a real LOH region contain intervening LOH and retention calls, and decrease the real copy number gain (e.g. 2.5 copy instead of 3) and moderate the real copy number deletion (e.g. 1.5 copy instead of 1). Note that hyperploid tumor cells will also produce non-integer copy numbers since the overall ploidy is scaled to 2 by normalization and comparing to normal samples. For example, 2 copy in triploid cells will appear as 1.33 copy in SNP analysis.
One way around this is to allow sub-integer copy numbers in dChip HMM inference (e.g. specify 0.2 as the copy step at “Options/Chromosome/Inferred copy step”). In a sample with high ploidy such as a triploid cell, the absolute 1, 2, 3 copy may be inferred as 0.6 (~0.67), 1.4 (~1.33) and 2 copy. In a sample with a 50% normal and 50% tumor mixture, the absolute copy 1, 2, 3 in the tumor cell may be inferred as 1.4 (~1.5), 2 and 2.4 (~2.5). So although we cannot get absolute copy numbers, the relative changes from the overall ploidy (i.e. 2) may still be informative for finding copy gains and losses.
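The numbers in the two examples above follow from simple arithmetic, sketched here. The formula is an illustration that reproduces those examples, not something taken from dChip: it assumes the normal cells are diploid and that normalization scales the average copy number of the whole mixture to 2. The function and parameter names are hypothetical.

```python
def apparent_copy(tumor_copy, tumor_ploidy=2.0, tumor_fraction=1.0):
    """Copy number seen at a locus after the sample's overall ploidy is scaled to 2.

    tumor_copy: absolute copy number of the locus in tumor cells.
    tumor_ploidy: average copy number of the tumor genome.
    tumor_fraction: proportion of tumor cells; the rest are diploid normal cells.
    """
    normal_fraction = 1.0 - tumor_fraction
    local = tumor_fraction * tumor_copy + normal_fraction * 2.0
    overall = tumor_fraction * tumor_ploidy + normal_fraction * 2.0
    return 2.0 * local / overall
```

For a pure triploid tumor, apparent_copy(2, tumor_ploidy=3) is about 1.33; for a diploid tumor mixed 50/50 with normal cells, apparent_copy(1, tumor_fraction=0.5) is 1.5 and apparent_copy(3, tumor_fraction=0.5) is 2.5.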
[V8/25/06+] In sample information file, specify “Ploidy(numeric)” column of a sample as not 2 and not blank for a tumor sample (e.g. 3.1 for a near triploid sample), so its HMM inferred copy number will consider ploidy by scaling raw copy number by the ratio of ploidy:2 before inferring copy number.
[V10/3/07+] Check "Options/Chromosome/Scale copy number mode to 2 copy" to try to recover absolute copy numbers (a similar "cytonormalization" method is used in Mullighan et al. 07). The inferred copy numbers are binned into copy number intervals of 0.05, and the copy number bin with the largest SNP counts is regarded as corresponding to absolute copy number 2 in cells. The assumption is that chromosome regions with any altered copy number are smaller than un-altered chromosome regions with copy number 2. Then this "mode copy number" is used to scale the inferred copy numbers by a factor of (2 / mode copy). The raw signals and raw copy numbers are not scaled. The genome-wide effect can be seen below (left: before scaling; right: after scaling).
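The mode-scaling step described above can be sketched as follows. This is an illustration of the idea, not dChip's implementation: assigning a copy number to the nearest multiple of 0.05 and leaving the data unscaled when the mode bin is 0 are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def scale_mode_to_two(inferred_copies, bin_width=0.05):
    """Scale inferred copy numbers so the most populated bin maps to copy 2."""
    # Bin index = nearest multiple of bin_width (an assumption).
    bins = Counter(round(c / bin_width) for c in inferred_copies)
    mode_bin, _ = bins.most_common(1)[0]
    mode_copy = mode_bin * bin_width
    if mode_copy == 0:
        return list(inferred_copies)  # cannot scale by a zero mode
    factor = 2.0 / mode_copy
    return [c * factor for c in inferred_copies]
```

If most SNPs of a sample have inferred copy 1.0, the factor is 2 and a region inferred as 1.5 becomes 3 after scaling.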

To make a copy number distribution plot similar to Figure 1B of Zhao et al. 2005, select "Chromosome/Summary Plot" at the inferred copy number view. The copy number thresholds outside which a sample is counted as gain or loss at a SNP are specified at "Tools/Options/Chromosome/Curve Min or Max". Specify a cytoband file at “Analysis/Chromosome” to show dotted lines marking the centromeres. [Version 2/26/06+] The thresholds (Min and Max at “Options/Chromosome”) are divided by 2 before being compared to the chromosome X copy number in males to count gain or loss. For example, if Min is 1 and Max is 3, copy numbers <= 0.5 or >= 1.5 are counted. [V12/22/06+] At the chromosome view, use “Chromosome/Show All” to toggle between displaying all chromosomes or just one chromosome, and then select “Chromosome/Summary Plot” to draw the summary plot for all or one chromosome.
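The counting rule behind the summary plot, including the halved thresholds for chromosome X in males, can be sketched like this. The function and argument names are hypothetical, and treating the thresholds as inclusive follows the "<= 0.5 or >= 1.5" example in the text.

```python
def count_gain_loss(copies, curve_min, curve_max, male_x=None):
    """Count samples with gain or loss at one SNP.

    copies: inferred copy number of each sample at this SNP.
    male_x: optional list of booleans, True where the sample is male and the
            SNP is on chromosome X, so that both thresholds are halved.
    Returns (number of gains, number of losses).
    """
    if male_x is None:
        male_x = [False] * len(copies)
    gains = losses = 0
    for copy, halve in zip(copies, male_x):
        scale = 0.5 if halve else 1.0
        if copy >= curve_max * scale:
            gains += 1
        elif copy <= curve_min * scale:
            losses += 1
    return gains, losses
```

With Min 1 and Max 3, a male sample with chromosome X copy 1.0 is counted as neither gain nor loss, while a male sample with copy 0.4 is counted as a loss.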

[V11/29/07+] If showing LOH in copy number view is on, UPD will be displayed in the copy summary plot, with black color representing UPD proportion:

[V9/15/07+] When in the LOH or copy number view, "Chromosome/Export SNP data/UCSC Wiggle format" can export a custom track data file for the selected sample. Checking “Output only current chromosome” will export for the current chromosome. The file can then be uploaded in the UCSC genome browser ("manage custom tracks" button) to view together with cytobands or gene name. You may use key "D" and "I" to toggle to different data views and check "Append to this file" to export multiple types of data into the same file. An example image is displayed below (data from Zhao et al. 2004). (Suggested by Charlotte Schjerling)
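For orientation, a custom track in UCSC wiggle format is a text file with a track line, a declaration line, and one "position value" line per data point. The sketch below builds such lines in the variableStep form; it only illustrates the file format, and the exact header and layout that dChip writes may differ. The function name and the positions in the usage note are made up.

```python
def wiggle_lines(track_name, chrom, positions, values):
    """Build the lines of a minimal variableStep wiggle track.

    positions: 1-based chromosome positions of the SNPs, in ascending order.
    values: one number per position (e.g. inferred copy number).
    """
    lines = [
        f'track type=wiggle_0 name="{track_name}"',
        f"variableStep chrom={chrom}",
    ]
    for pos, val in zip(positions, values):
        lines.append(f"{pos} {val:g}")
    return lines
```

Calling "\n".join(wiggle_lines("tumor copy", "chr19", [1000, 2500], [2.0, 3.25])) gives text that can be saved to a file and uploaded as a custom track.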

[V11/22/07+] Check “Chromosome/Show LOH in Copy” to show uniparental disomy (UPD) in black in the inferred copy or log2 view. UPD is called at SNPs that have probability (LOH) >= threshold (set at “Options/Prob(LOH) threshold”) and near-normal copy number (2 +/− margin, set at “Options/Chromosome/Show LOH at copy 2 +/−”). A log2 view example is below: blue: deletion, red: gain, black: UPD, white: normal copy. (Suggested by Jian Huang)
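The UPD rule above amounts to two comparisons per SNP, as in this sketch. The function name is hypothetical and the default threshold and margin are placeholders, not dChip's defaults; treating the copy number margin as inclusive is also an assumption.

```python
def is_upd(prob_loh, inferred_copy, prob_threshold=0.5, copy_margin=0.5):
    """Return True when a SNP shows LOH at near-normal copy number.

    prob_threshold and copy_margin are placeholder values standing in for the
    two settings named in the text.
    """
    near_two_copies = abs(inferred_copy - 2.0) <= copy_margin
    return prob_loh >= prob_threshold and near_two_copies
```

A SNP with probability (LOH) of 0.95 and inferred copy 2.1 is flagged as UPD under a threshold of 0.9 and a margin of 0.3, whereas the same probability at copy 1.0 is an ordinary deletion with LOH.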

See Li et al. 2008 BMC Bioinformatics for details. At “Analysis/Chromosome” with analysis method as “Copy number & LOH”, check “Options/Infer MCP” to infer major copy proportions (MCP). The inferred MCP values will be displayed as the inferred data at the genotype data view (use key “D” to toggle).
(Updated 6/20/08)