dChip: Function Updates (2000-07)
12/14/07: (1) Start to add toolbar
and menu icons. (2) R view is changed to an
“R codes” dialog and the Analysis view is used for R output.
12/7/07: “Tools/Gene list by keywords” can accept a gene symbol file
(each line for a gene) and convert gene symbols to
probe set names.
11/29/07: UPD can be displayed in
copy number summary plot.
11/22/07: Check “Chromosome/Show LOH in Copy” to show uniparental disomy (UPD) in black in
inferred copy or log2 view.
11/19/07: Process human gene ST array.
11/18/07: Sample correlation
matrix is drawn in the Plots view. When samples are clustered, the
correlation matrix has the same order.
10/10/07: (1) While in the chromosome view, use "Chromosome/Sort Samples" to sort
samples according to the current SNP's data values under view.
(2) Make SNP genome information files using
a Python program.
10/7/07: Set "Options/Chromosome/Fixed HMM SD" to be > 0
to use this value as the standard deviation (SD) of copy number HMM's emission
distribution, instead of computing SDs SNP-wise from normal samples. Larger
values will lead to smoother HMM-inferred copy numbers.
10/3/07: Scale inferred copy number by
"mode copy number" to recover absolute copy numbers.
9/15/07: Export SNP copy number or LOH data
in UCSC Wiggle format.
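
For readers unfamiliar with the Wiggle format, the following is a minimal sketch of a variableStep track holding per-SNP values. The function name, track name and data values are hypothetical, and dChip's actual export may use different track options.

```python
def to_wiggle(track_name, chrom, positions, values):
    """Format per-SNP values as a UCSC variableStep wiggle track.

    Positions are 1-based chromosome coordinates, as the wiggle format expects.
    """
    lines = [
        f'track type=wiggle_0 name="{track_name}"',
        f"variableStep chrom={chrom}",
    ]
    # Data lines are "position value", written in ascending position order.
    for pos, val in sorted(zip(positions, values)):
        lines.append(f"{pos} {val:.3f}")
    return "\n".join(lines) + "\n"


# Hypothetical SNP positions and inferred copy numbers, for illustration only.
wiggle_text = to_wiggle("sample1_copy", "chr1", [1500, 1000, 2200], [2.0, 1.1, 3.25])
print(wiggle_text)
```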
8/25/07: "Chromosome/Export SNP data" can export chromosome regions for both inferred copy
and LOH data, and for both single samples and summary score (use key S to
toggle). To compute summary score for copy
number data, set "Options/Threshold" to be not equal to 2.
8/12/07: Chromosome region data can be exported
to find genes or cytobands altered in a high percentage of samples.
8/9/07: Bug corrected: dChip
versions from 6/29/07 to 8/8/07 had an error when reading the reference normal
or combined genotype file. It affected these functions: LOH analysis
(LD-HMM and haplotype correction); Reading external genotype call file for CEL
files without paired genotype call file (e.g. SNP 5.0 and 6.0 data). (Discussion)
8/4/07: Can analyze SNP 6.0 arrays
(SNPs but not CNV probes). Use this genome
info file. (Discussion)
7/22/07: Can specify a
"RefBatch" column in sample information to indicate
batches for copy number analysis.
6/29/07: Can analyze SNP 5.0 arrays
(SNPs but not CNV probes). Use this genome
info file. (Discussion)
4/16/07: Chromosome
regions may be specified when analyzing SNP data. When reference genes are
specified as chromosome regions, it may be helpful to focus attention on the
copy number/LOH events in these genes.
4/4/07: At the gene clustering
view, if a gene cluster is selected, "Tools/Gene function enrichment"
can analyze this cluster of genes. The function term names in summary table are also sorted.
3/31/07: The function terms of
redundant probe sets for the same genes will only be counted once at gene function enrichment analysis ("Tools/Gene
Function Enrichment" and gene clustering) and when reading gene
information file at "Open group", whether "Tools/Options/Mask
redundant probe set" is checked or not. A summary
table of enrichment analysis is output in the analysis view.
3/24/07: When making gene
information file, gene symbols will be added in
the gene name column of gene information file. (Discussion)
2/28/07: If the probe set mask file has
"Including" in the first line, the file will be regarded as a probe
set inclusion file and only the probe sets in this file will be included in the
cdf.bin file and downstream analysis.
2/26/07: At "Analysis/Compare samples/Combine comparison",
check "Make M-A plot"
to plot log ratio against log product of the two groups' mean expression level.
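
As an illustration of the quantities on the two axes, here is a minimal sketch of M-A values for one gene. It assumes the common convention M = log2 ratio and A = half the log2 product; the exact scaling dChip uses is not stated here, and the function name and input values are hypothetical.

```python
import math


def ma_values(mean_baseline, mean_experiment):
    """Return (M, A) for one gene from the two groups' mean expression levels."""
    m = math.log2(mean_experiment / mean_baseline)        # log ratio
    a = 0.5 * math.log2(mean_experiment * mean_baseline)  # half the log product
    return m, a


# Hypothetical (baseline mean, experiment mean) pairs for two genes.
pairs = [(100.0, 400.0), (250.0, 250.0)]
points = [ma_values(b, e) for b, e in pairs]
for (b, e), (m, a) in zip(pairs, points):
    print(f"baseline={b} experiment={e} M={m:.3f} A={a:.3f}")
```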
2/1/07: (1) At "Open
group", specify "Suffix of TXT/CHP file" as ".chp" to
read expression or SNP calls from CHP files. This avoids
converting CHP files to TXT files to use in dChip.
(2) At the chromosome view, toggle
"Chromosome/Draw Average Curve" to
show single-sample or average LOH or copy number data across samples.
1/23/07: Select the menu "Chromosome/Draw All Curves" at the chromosome
view to show the data curves of all samples.
1/4/07: At “Chromosome/Clustering”,
checking “Export sample ordering into ‘sample_order.txt’” will export a text
file into the working directory after clustering. This file contains sample
names as ordered by clustering.
12/14/06: For tumor-only LOH inference, when no
"Options/Reference genotype file" is specified, the normal samples
specified in sample info file as "Ploidy(numeric)" of 2 will be used
to estimate SNP heterozygosity and genotype dependence probabilities.
11/16/06: At the clustering or
chromosome view, use “View/Find sample” to find a
particular sample. (suggested by Yohan)
9/17/06: At the chromosome view,
control+click a data point to append its information to a "data
points.txt" file in the working directory. The information includes sample
name, chromosome position, and gene names within 100Kb surrounding region. This
can help to manually identify copy changed regions from raw copy number view
when data is noisy or human eyes work better. 2/8/07: Show or export the
nearest gene to a SNP.
8/29/06: Faster display update when
zooming at the "Analysis/Chromosome" view.
7/31/06: At "Open group",
there is an option for specifying TXT file suffix
(e.g. ".brlmm.txt"). At "View/Export image", "Export all chromosomes individually" can be checked.
(suggested by Charlotte Schjerling)
7/27/06: Check "Analysis/Open
group/Perform 'Analysis/Normalize & MBEI' afterwards" to run these three steps in sequence. Normalization
and MBEI will use options set at "Open group/Options/Model", and the
baseline array will be the default one with median overall intensity.
(suggested by Charles Mullighan)
6/1/06: Specify consanguineous relationship in a family
to reduce pedigree size and speed up linkage analysis.
5/29/06: Read SNP CEL files without matching TXT
genotype files but with combined genotype file.
4/10/06: Combine two sub-arrays at "Open
group" without using external data files.
12/12/05: To make a copy number summary plot similar to Figure 1B
of Zhao et al.
2005, select "Chromosome/Summary Plot" at the inferred copy
number view. (suggested by Edward Attiyeh)
11/17/05: In the chromosome view,
check “Tools/Options/Clustering/Sample names always
visible” to always display the sample names and information on the top
of the data area. (suggested by Charles Mullighan)
11/13/05: Specify “Analysis/Open
group/Other information/SNP information file”
to provide allele frequency and other SNP information. (suggested by Ann
Mullally)
11/11/05: Check “Tools/Options/Chromosome/Show
only tumor of paired sample” to display only tumor
samples in copy number view when paired normal and tumor samples both
exist in array list file. (suggested by Peter Ouillette and Changzhong Chen)
11/8/05: Process
human exon array data and view data along chromosome.
11/3/05: Dahia
et al. 05 identified a novel locus for familial pheochromocytoma syndrome by
integration of two-locus linkage analysis, transcription profiling, and
genome-wide SNP-based copy number mapping.
11/1/05: Use an array list file to define batches
and the “Tools/Adjust batch effect” function to scale (multiply a value) the
expression values from batch 2 to the last batch, so that for each gene, the
mean of each batch is the same as the mean of the 1st batch. Then one can redo
gene filtering and clustering to see if batch effect is gone. If so one may use
“Analysis/Compare samples” to pool samples from different batches and do
comparison.
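
The scaling described above can be sketched for a single gene as follows. This is an illustrative reimplementation, not dChip's code: the function name is hypothetical, the first batch is taken to be the batch of the first sample, and every batch mean is assumed to be non-zero.

```python
def adjust_batch_effect(values, batches):
    """Scale one gene's values so each batch mean equals the first batch's mean."""

    def batch_mean(label):
        members = [v for v, b in zip(values, batches) if b == label]
        return sum(members) / len(members)

    target = batch_mean(batches[0])  # mean of the 1st batch
    # One multiplicative factor per batch; the 1st batch gets a factor of 1.
    factors = {label: target / batch_mean(label) for label in set(batches)}
    return [v * factors[b] for v, b in zip(values, batches)]


# Hypothetical expression values of one gene in two batches of two arrays each.
adjusted = adjust_batch_effect([100.0, 200.0, 300.0, 500.0], [1, 1, 2, 2])
print(adjusted)
```

After the adjustment both batches have a mean of 150 for this gene, while the relative differences within each batch are preserved.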
10/15/05: (1) Select
“Analysis/Normalize/Options/Normalization method: Quantile normalization” for
quantile normalization (Bolstad
et al. 2003, Workman
et al. 2002). 6000 matching quantiles from two arrays are used to fit a
running median normalization curve. M-A plots are also added in the Normalization Plot. (suggested by
Igor Klacansky)
(2) Select “Analysis/Model-based
Expression/Options/Method used: Average Difference” to use Average Difference method to
compute expression values or signal.
(3) Set
"Tools/Options/Chromosome/HMM length" to N to perform linkage
analysis for segments of N markers at a time (e.g. 2000 for 100K array
where chromosome 1 has more than 9000 markers). (suggested by Annemieke
Verkerk)
10/13/05: Use “Analysis/Filter SNPs” to select better SNPs using fragment
length or No Call rate to use in the downstream analysis. (suggested by Zhigang
Wang)
10/12/05: “Image/Normalization Plot” is now
implemented within dChip without calling R. The plots can be viewed during
normalization as well by checking “Analysis/Normalize/View normalization
plot”.
10/7/05: Add "Pathway drawing and
analysis".
10/6/05: Add the manual page “Allele sharing analysis” for SNP array, including Application in graft-versus-host disease, Non-parametric
linkage analysis and Permute to find significant loci.
10/5/05: Add functions to analyze 500K SNP array.
10/3/05: (1) Set
“Options/Chromosome/Inferred copy method” to be “Median smoothing” and set a SNP marker window
size (e.g. 10) to median smooth raw copy numbers as the inferred copy number.
(2) To infer the LOH status of non-informative LOH calls from paired normal/tumor LOH
analysis, the method “Options/Inferred LOH method/Same boundary” can be used in addition to the
HMM method.
(3) In the clustering or chromosome
view, set “Options/Clustering/Number of letters shown for sample information”
to be greater than 1 to display 1 or more letters
above samples. (suggested by Charles
Mullighan)
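
The median smoothing in item (1) of the 10/3/05 entry can be sketched as below. This is a minimal illustration under stated assumptions: a window centered on each marker and truncated at the chromosome ends. How dChip aligns the window and treats the ends is not described here, and the function name and data are hypothetical.

```python
from statistics import median


def median_smooth(raw_copy, window=10):
    """Replace each raw copy number by the median of its neighboring markers.

    The neighborhood spans window // 2 markers on each side of the current
    marker and is shortened at the ends of the list.
    """
    half = window // 2
    n = len(raw_copy)
    return [
        median(raw_copy[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]


# Hypothetical raw copy numbers with a single noisy spike at the third marker.
raw = [2.0, 2.1, 5.0, 1.9, 2.0, 2.2, 1.8]
smoothed = median_smooth(raw, window=3)
print(smoothed)
```

With a window of 3 the isolated spike of 5.0 is replaced by 2.1, which is why a larger window gives smoother inferred copy numbers at the cost of blurring short events.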
10/2/05: (1) In the SNP copy number
analysis, when normal samples are not available or too few, set
“Options/Chromosome/% of samples trimmed” to be
> 0 to obtain reference signal distribution without using the information of
which samples are normal.
(2) Set “Options/Chromosome/HMM
length” to N to perform HMM inference of LOH and copy number for a
stretch of maximum N SNP markers each time. This can increase the speed
for SNP arrays with density > 100K, where chromosome 1 has > 9K markers but
one can set “HMM length” to be 1000.
10/1/05: (1) In
the SNP copy number analysis, check “Options/Chromosome/User paired normal as
reference” to use the
signal of the paired normal to obtain the raw copy numbers of tumor
samples, as opposed to using the average signal of all normal samples.
(suggested by Peter Ouillette)
(2) Use the
"Tools/Percentile
filtering" function to select genes by their fold change between
a high and a low percentile across samples. (suggested by Wing Wong)
(3) At the gene
clustering view, “Clustering/Export Same Gene” can export the probe sets
belonging to the same gene as the selected probe sets (click the area between
the clustering tree and blue-red image to select a single probe set). The
exported gene list can be viewed by “Analysis/Clustering”. (suggested by Wing
Wong)
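
The percentile filtering in item (2) of the 10/1/05 entry can be illustrated with the sketch below. It assumes a nearest-rank percentile, 10th and 90th percentiles, and a fold-change threshold of 3; these choices, the function names and the data are hypothetical and may differ from dChip's defaults.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (p from 0 to 100) of an already sorted list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def percentile_fold_change(expression, low=10, high=90):
    """Fold change between a high and a low percentile across samples."""
    ordered = sorted(expression)
    return percentile(ordered, high) / percentile(ordered, low)


# Hypothetical expression values of two genes across ten samples.
genes = {
    "geneA": [100, 105, 95, 110, 102, 98, 101, 99, 103, 97],
    "geneB": [50, 60, 55, 400, 420, 58, 52, 410, 57, 405],
}
selected = [name for name, expr in genes.items()
            if percentile_fold_change(expr) > 3]
print(selected)
```

Using percentiles rather than the minimum and maximum makes the filter less sensitive to a single extreme array.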
9/29/05: Cheng and
Peter Ouillette successfully troubleshot a dChip usage problem by remotely accessing Peter’s PC. This first
try appears to be an efficient way to solve elusive problems.
9/28/05: Start to
put the dChip manual into a WikiBook,
so anyone can edit it or post discussions. (3/13/06: this effort has stopped)
9/23/05: At
“Analysis/Genome”, report in the analysis view the gene names in the
significant stretches and the information of multiple comparison, such as “24
significant stretches found at 0.05 level from 2494 p-value assessments”.
(suggested by Pawel Michalak and Wei Zhao)
9/21/05: Added common probe set file for HG-U95Av2 vs. HG-U133_plus_2
array.
9/15/05:
“Tools/Make information file” may report the error of “NumTerm == MaxTerm at category 'Gene Ontology'”,
which is due to more GO terms than before in the latest GO structure files.
Update to the latest dChip to correct it. (reported by Xueqing Zhang and
Patrick Loerch)
9/13/05:
“Tools/Options/Model” adds an option to truncate negative PM/MM differences to
0 before modeling in the “PM/MM difference model”. By default it is checked to
compute all-positive expression values; uncheck it to use the method as before.
9/12/05: “Google site search”
function is added to the dChip main page and manual page.
5/26/05: (1) Use Affy Files parsers
SDK
to read in binary CEL
files. It is therefore no longer necessary to convert binary CEL files to text CEL
files for dChip to read.
(2) Uncheck "Analysis/Open
group/Options/Load probe data in memory" to not
load probe data so that a large dataset containing many arrays or large
array types (e.g. 100K SNP array) can be loaded faster. Then do normalization
and model-based expression computation the same way as before. However CEL image and PM/MM data views are not available
since they use probe level data.
5/5/05: Check
“Tools/Export expression value/Append to this
file” to append the output data to an existing
data file. This is useful for combining the data of sub-arrays.
(suggested by Changzhong Chen)
3/29/05: Update to
handle 3/23/05 Affymetrix annotation CSV
files. In the “HG-U133_Plus_2_annot.csv” file, “LocusLink” is changed to
“Entrez Gene” in the header line and "Chr:" is changed to
"chr" in the “Chromosomal Location” column. Also handle the tab in
the "Gene Title" column (e.g. 1425167_a_at in Mouse430_2_annot.csv)
to generate correctly formatted gene info file. (suggested by Andrea
Richardson)
2/23/05:
“Analysis/Open group/Other information/Probe set mask file” can accept individual probes to mask
them out from CDF file. 4/15/05: Corrected the bug that eliminates both probe 4
and 14 for a string "14,".
(suggested by Igor Leykin and Bin Yao)
9/14/04: At “Analysis/Hierarchical
clustering/Options/Standardize rows”, one can select a
sample (default is mean) to be subtracted during standardization. This is
useful when a known baseline is desired to be displayed as white for all genes,
and other samples display relative up-regulation (red) or down-regulation
(blue). (suggested by Changzhong Chen)
8/28/04: Use Affy
C++ source code (Files parsers SDK) to read binary CDF files. (suggested by
Lucy He)
4/13/04: “Tools/Print settings”
will print out the current settings and parameters.
Then the whole analysis log can be saved using “Analysis/Save log”. (suggested
by Bill Sellers) [11/11/05: moved this function to the “Tools/Options/Print
Settings” button.]
4/4/04: Update “Tools/Make
Information file” to handle the 4/2/04 NetAffx annotation and ortholog CSV files, which have a slightly different format from
previous CSV files.
3/19/04: Update “Tools/Make
Information file” to convert current NetAffx ortholog CSV
files to dChip common probe
set files. This is useful for combining expression data across
species. (requested by Enrique Millan)
12/26/03: (1) One may add a numerical column in “sample information file”. The
column header needs to contain “(numeric)”, for example, “Age(numeric)”. Such
continuous variable will be standardized and displayed in the clustering
picture. (2) “R view/ANOVA filtering” and “Clustering/Similar profile” are
merged into “Analysis/Analysis of
Variance”.
10/22/03: Update to handle Oct.
2003 or later NetAffx CSV files
when making gene info files.
10/16/03: You may use this Affy
CEL file converting tool to convert the new binary-format CEL file to the
old text-format CEL file so that dChip can read it.
8/9/03: [New] In the PM/MM
data view, select menu “Data/Show All Array” to view the probe patterns of the
current probe set in all arrays specified by the array list file. Such
probe-level data and patterns are very useful for confirming the properness of
computed gene expression values and changes. (suggested by Yu Guo)
7/30/03: Two software packages, GoSurfer and Tight
Clustering, developed by the Wong lab, can be called from dChip at the “Tools”
menu. (suggested by Wing Wong. GoSurfer developed by Sheng Zhong. Tight
Clustering developed by George C. Tseng)
7/8/03: “Tools/Make
information file”: the maximal number of protein domain terms allowed is
increased to 4000, so that the NetAffx June 2003 annotation file for HG_U133A
can be processed (reported by Mike Wang).
7/8/03: [6/27/03 bug fixed]
RG_U34A array has probe set names containing “#”
(e.g. L00981mRNA#2_at). Anything in a line after “#” will be interpreted as
comments by the R “read.table()” function used in the R view “Get expression” button and this caused
failure of the “Get expression” function. Now the “comment.char = ""”
parameter is added to the read.table() function. (reported by Charlotte
Schjerling)
7/3/03: [Bug fixed] When
“Analysis/Open group/Options/Mask redundant probe sets” or “Omit Affymetrix
control probe sets” is checked, the annotation information in the gene
information file for redundant or Affy probe sets was not used at all. However
we want to read the information for these probe sets, but not count them in the
gene numbers associated with annotation terms to avoid biasing the “significant
gene clusters” or “Tools/Classify
genes” functions. This is corrected now. (reported by Susan G. Hilsenbeck)
7/3/03: Use dates in the version number (e.g. Version 1.3
Test (7/3/03)) to better keep track of dChip updates. (suggested by Charlotte
Schjerling)
6/27/03: The expression data
object obtained by “Get Expression” in the R
view now has probe set names as row names. (suggested by Leo Schalkwyk)
6/25/03: [Bug fixed] A bug
introduced on 6/13/03 switched the PM and MM row in the Grid (2, 1) of PM/MM
Data view. (reported by Jorg D. Becker)
6/12/03: [Bug corrected] 5/21/03+
versions regard probe set names starting “AF” (“AFFX”
had been used before) as Affy control probe sets. However, when
“Tools/Options/Analysis/Omit Affy probe sets” is checked, some probe sets in
RG_U34A array are wrongly regarded as control probe sets (e.g. AF007107_s_at).
Now “AFFX” is used again.
(reported by Charlotte Schjerling)
5/28/03: [New feature] At
“Analysis/Filter genes”, one can choose “Standard deviation” in criterion 1 for
variation filtering. This is useful when the data is log transformed (by
checking “Analysis/Open group/Options/Log transform” or reading in log-scaled
data at “Analysis/Get external data”), and standard deviation is preferred over
CV (coefficient of variation, or standard deviation / mean) due to variance
stabilization property of the log transformation. (suggested by Wing Wong)
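
A small numeric sketch of why standard deviation suits log-transformed data: two hypothetical genes with the same fold changes but different expression levels get the same SD on the log2 scale, whereas CV computed on logged values depends on the expression level. The gene names and values are invented for illustration.

```python
import math
from statistics import mean, stdev

# Hypothetical genes: identical 2-fold steps, but a 100-fold level difference.
raw = {
    "low_gene": [50.0, 100.0, 200.0],
    "high_gene": [5000.0, 10000.0, 20000.0],
}

summary = {}
for name, values in raw.items():
    logged = [math.log2(v) for v in values]
    sd_log = stdev(logged)          # variation measured on the log scale
    cv_log = sd_log / mean(logged)  # CV of logged data shrinks as level rises
    summary[name] = (sd_log, cv_log)
    print(f"{name}: SD(log2)={sd_log:.3f} CV(log2)={cv_log:.3f}")
```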
5/22/03: [New feature] Check
“Analysis/Open group/Options/Ignore probe sets” to not
use probe set information in the CDF file. This is useful when there are
too many probes in a probe set, and our main interest is only normalizing
arrays. (suggested by Todd Mockler)
5/21/03: [New feature] Assess empirical false discovery rate at
“Compare samples”.
5/21/03: [Change] For the unpaired
t-test of “Analysis/Compare samples”, the degree of
freedom was group.size1 + group.size2 – 1 (instead of –2 to accommodate
1 vs. 1 comparison, so it’s ok when n1=n2=1 and standard error of a single
expression value is model-based). Now the
Welch correction for d.f. is used. The comparison result files have the header
line “[COMPARE_CRITERIA_V2]” to indicate these changes.
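
For reference, the Welch correction can be sketched with the standard Welch-Satterthwaite formula. This sketch uses sample variances and therefore needs at least two values per group; dChip's use of model-based standard errors (which allows 1 vs. 1 comparison) is not reproduced here, and the function name and data are hypothetical.

```python
from statistics import variance


def welch_df(group1, group2):
    """Welch-Satterthwaite degrees of freedom for an unpaired t-test."""
    n1, n2 = len(group1), len(group2)
    a = variance(group1) / n1  # squared standard error of the first mean
    b = variance(group2) / n2  # squared standard error of the second mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical expression values: the second group is far more variable.
df_unequal = welch_df([10.0, 12.0, 11.0, 13.0], [20.0, 30.0, 25.0, 35.0, 15.0])
# With equal sizes and equal variances the result equals n1 + n2 - 2.
df_equal = welch_df([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(df_unequal, df_equal)
```

When the group variances differ, the corrected d.f. falls below n1 + n2 - 2, which makes the test more conservative.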
5/19/03: [Bug corrected] Extra tabs were added to the end of the array list file name and filter input file name in the INI file. The problem
may arise when a comparison criterion file has been saved in Excel, and
Excel added some extra tabs on rows with only 1 column. Later on dChip reads in
the comparison criterion file and uses the array list file and filter input
file name in it. (reported by Keith Crist; 6/11/03: similar problem reported and
corrected for “lda result file”)
5/17/03: [New
feature] In “Image/Normalization
plot”, any array can be selected as the baseline
array. Thus the scatterplot of probe values between any two arrays can
be visualized.
5/8/03: [New and
change; updates from now on may not be reflected in the manual at the same
time] 1) Euclidean distance can be used for clustering
(set at “Tools/Options/Clustering”), which may be more reasonable to use for
sample clustering using gene-wise standardized values. 2) By default
“Tools/Options/Clustering/Add new color for Control+Click” is not checked and
new “Control+Click” clusters will be in light blue to distinguish from the
current “Click” cluster in blue. 3) If a single gene is selected in the
clustering picture, the value range for a gene is displayed but the displaying
range is from 0, so that relative fold change can be visualized. 4) At
"Cluster/Selected branch/Export" data, if "Output all colored
gene branches" is checked, all the colored branches (selected by
Control+Click and Click) will be exported. Note that the samples exported are
all the colored ones, and the gene clusters exported are in the order of
clicking, not the visualized order. (suggested by Wing Wong)
5/8/03:
[Withdrawn] “Analysis/Normalization” starts the “Invariant Set” selection from
a list of “Stable probe sets”.
4/20/03: [New
feature] If “Clustering/Selected branch/Export
data/Cut the tree at the height of current branch and export all branches” is
checked, one may export gene expression data grouped in clusters. These
clusters are obtained by cutting the gene clustering tree at the height of the
selected blue branch. In previous versions, multiple colored (by using
Control+Click) gene clusters were exported. (suggested by Bin Zhang)
4/17/03: [New feature] In “Image/Normalization plot”, one can also
plot the normalized values on the X-axis by checking the option “Use normalized
values to view the result after normalization”. (suggested by Wenhong Fan)
4/12/03: [New feature] In “Tools/Gene list file/By
Annotation”, one can select multiple terms to get the union or intersection of
the genes belonging to these categories, and apply the “Filter genes” function
immediately using this gene list as the input gene list. (suggested by Wing
Wong)
4/12/03: [Change] “View/Go to GenBank; Go to
UniGene; Go to LocusLink; Go to NetAffx”
are combined into “View/Online Database”. Check the “GenBank” option to link to
the “UniGene” database. Withdraw the “probe number” option from “View/Find
Gene”.
4/9/03: [Bug corrected] When
“Tools/Options/Analysis/Do not read array list file” is checked, the individual
arrays are not correctly treated as individual samples, and this causes no
sample names in “Compare samples” dialog and incorrect “Filter genes” results.
7/7/03: a 4/9/03 bug is corrected: checking “Tools/Options/Analysis/Open
group/Do not read array list file” made array list files not used both at “Open
group” and after “Open group” (reported by Sabina Chiaretti and Mark).
4/7/03: [Bug corrected] When the lines of a
“Comparison gene list file” are long, “Tools/Classify genes” may cause a
crash. (reported by Shahab Asgharzadeh)
4/5/03: [New feature] One may use “Tools/Make information file” to generate
dChip information files based on the quarterly updated NetAffx annotation
files. Also see the ChipInfo
software for broader applications of this effort.
3/17/03: [New feature] In hierarchical clustering,
the “Average linkage” method can be specified at “Tools/Options/Clustering”.
Previously the only linkage method was centroid
linkage. (suggested by Casper Frederiksen and Jean-philippe Brunet)
3/16/03: [New feature] Check the “Copy to clipboard”
option in “View/Export image” to copy
the image to the clipboard in BMP or EMF format. (suggested by Yu Guo)
3/16/03: [Change] In “Analysis/Compare samples”, the presence call % criterion
can be specified for the baseline and the experiment group separately.
(suggested by Tao Lu)
3/16/03: [Change] The compare criteria are saved in
the beginning of the “Compare result
file” if “Analysis/Compare
samples/Combine comparisons/Output comparison criteria” is checked, instead of
a separate “Compare criterion file”.
3/6/03: [Bug
corrected] A file name like “D:\chip.data\all.dchip\file.CEL” led dChip to extract the file as a DAT format
file since “.dat” is found in the file name (reported by Xueqing Zhang).
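
The failure mode described above can be reproduced with a substring test, and avoided by looking only at the final extension. The sketch below is illustrative and does not show dChip's actual code; the function name is hypothetical.

```python
from pathlib import PureWindowsPath


def file_kind(path):
    """Classify a file by its final extension only, ignoring case."""
    suffix = PureWindowsPath(path).suffix.lower()
    return {".cel": "CEL", ".dat": "DAT"}.get(suffix, "unknown")


path = r"D:\chip.data\all.dchip\file.CEL"

# A substring search finds ".dat" inside the directory name "chip.data".
naive_is_dat = ".dat" in path.lower()

print(naive_is_dat, file_kind(path))
```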
3/6/03: [New
feature] The outputs of “Analysis/Normalize” contain the median probe
intensity before and after normalization (suggested by Eric Libby).
3/3/03: [New
feature] Read in CEL file of PM-only array
(suggested and testing data by Jeremy Erickson).
3/1/03: [Bug
corrected] The “LDA classification view” sometimes did
not update the screen image correctly between separate “Analysis/LDA
Classification” calls.
2/26/03: [Bug
corrected] An ending "\" or internal “\\” in the
"Tools/Options/Analysis/Working directory" string (e.g.
“C:\array\\other\lung\lung_loh\”) prevented the file names in some dialogs
such as “View/Export image” from being clicked and changed. To change such existing
path names, either edit the configuration (*.ini) files in the same directory
as dchip.exe, or apply “Tools/Options/Reset default” (reported by Yu Guo).
2/5/03: [New
feature] Check “Tools/Export data/Expression value/Include
header information” to include information such as the modeling method,
baseline array in the exported expression data files (suggested by Victoria
Perreau).
1/31/03: [Bug
corrected] dChip reverses signs of theta’s and phi’s during model fitting to
ensure most theta’s and phi’s are positive. In PM-only
model, counting zero theta values as negative caused the possibility of
having negative expression values (reported by Tanya Logvinenko).
1/28/03: [Bug
corrected] 1. When the number of arrays is small (e.g. 2), the gene names were
not displayed in the Clustering View. 2. The “View/Export image” dialog
sometimes didn’t show up; also see note 2/26/03 (reported by Yu Guo).
12/30/02:
[Withdrawn feature] The “Analysis/Print” function does not work properly and is
withdrawn. One can use “Analysis/Save” to first save the contents of the
Analysis View into a Word file and then print. (reported by Susan G.
Hilsenbeck)
12/19/02: [Bug
corrected] “Image/Unscrambled” (renamed to “Image/Probe Together”) caused
various problems when computing expression values in the “unscrambled” mode
(e.g. the background values used for PM-only model don’t correctly consider the
unscrambling effect). Now this function is re-implemented in a new mechanism.
(reported by Reinhard Hoffmann, Thomas Seidl, Laurent Gautier and James
MacDonald)
12/11/02: [Bug
corrected] DAT files of arrays with large dimension (e.g. ATH1 with 712^2
probes) were not read correctly at right and bottom margins. (reported by Susan
J. Miller)
12/10/02: [New feature]
“Tools/Options/Model/Exclude x 5’ probes” to always call the x most 5’ probes
in a probe set as probe outlier, thus not use them in the model-based
expression values. This is useful when there is known mRNA degradation in the
sample and 5’ signals are not reliable, or when small samples are amplified
using 2-round IVT protocols and 5’ probes tend to have amplification biases.
(suggested by Edward Fox and Christine Konradi)
12/9/02: [Change]
The “Analysis/Model-based expression/Log x transform expression values” option
is moved to “Tools/Options/Analysis”. This way the DCP files always store
original expression values, and a user can choose to log transform the expression
values at the “Analysis/Open group” step. (A bug – always log-transforming
after “Open group” – was introduced here and is corrected on 12/30/02; reported
by Philippe Guardiola and Susan G. Hilsenbeck) (A bug – storing log-transformed
expression values into DCP files after “Model-based expression” – was
introduced here and is corrected on 1/16/03; reported by James MacDonald)
12/6/02: [New
feature] Check “Tools/Options/Model/Do not call all replicate arrays as array
outlier" and then specify an array list
file with replicate separators to discard array
outlier calls that occur in all replicates of a tissue type, since these
reflect a real biological effect.
(suggested by Joerg D. Becker)
11/28/02: [New feature]
“Image/Scale CEL value” can be
used to scale (multiply a constant value) the
unnormalized CEL values in an
array so that the median intensity is a particular value. This is useful
for normalizing different
tissue types. (suggested by Joerg D. Becker)
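
The scaling amounts to multiplying every value by one constant, as in the minimal sketch below. The function name, intensities and target median are hypothetical.

```python
from statistics import median


def scale_to_median(cel_values, target_median):
    """Multiply all CEL values by one constant so their median hits the target."""
    factor = target_median / median(cel_values)
    return [v * factor for v in cel_values]


# Hypothetical unnormalized probe intensities from one array.
cel = [80.0, 120.0, 200.0, 400.0, 1000.0]
scaled = scale_to_median(cel, 500.0)
print(scaled)
```

Because a single factor is applied, the ratios between probes within the array are unchanged; only the overall brightness is aligned.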
11/19/02: [Bug corrected] Sample
information file with more than 20 columns caused dChip to crash. Now the limit
is increased to 40 columns and the boundary check for this value is added
(reported by Tao Shi).
11/16/02: [V1.3+] “Analysis/Chromosome” function can display the
expression data of a list of genes along chromosomes. (suggested by Stanley F.
Nelson and Robert Gentleman)
11/15/02: dChip 1.2 released. See below for function updates.
10/31/02: [New
feature] Display the relative probe
position information in the “Data View”.
10/31/02: [Change] “Analysis/Map chromosome”
renamed to “Analysis/Genome”.
10/31/02: [Change]
“Tools/Options/Model/Perform outlier detection” is split into “Check array
outlier”, “Check single outlier” and “Check probe outlier”. One may change these options to perform or not perform a particular outlier
detection.
10/30/02: [Change] When a TXT file (containing A/P calls) has any probe set
not in the CDF file, “Analysis/Open group” will ignore the TXT file and compute its own A/P calls. One can
disable this feature by checking “Tools/Options/Analysis/Allow TXT files
to contain probe sets not in the CDF file”, so that dChip only ignores
the unknown probe sets (e.g. those masked by “Probe set mask file”). (suggested by
Igor Klacansky)
10/27/02: [New feature] The
“Image/Normalization Plot/Use
smoothing spline to normalize and save result to DCP file” option allows for using a smoothing spline to fit the
normalization curve to the points in the “Invariant Set”. (suggested by Xinmin
Zhang)
10/12/02:
[New feature; 05/03: withdrawn] “Analysis/Normalization” can start the
“Invariant Set” selection from a list of “Stable probe
sets”. For example, one can perform “Analysis/Model-based expression” without
normalization and then use “Analysis/Filter
genes” to use only criteria (2) with 100% threshold to obtain genes called
“Present” in all samples. Then use this gene list as
“Analysis/Normalization/Stable probe sets” to normalize arrays and re-compute
expression values. Unchecking “Apply ‘Invariant Set’ selection…” will use all
probes of the “Stable probe set” for normalization. It is also good to use “Image/Normalization plot”
(see below) to check the validity of the normalization.
9/19/02: [Bug corrected] Source
code bug in partial_sort() function corrected. This bug may affect the median
or percentile computation such as those used in the outlier detection
procedures (reported by Ming Lin).
[New feature] Gene Filtering by ANOVA through the
“R View” (see Interface
with R software for necessary setup procedures to use the function)
(suggested by Frank Buxton, Susan Hilsenbeck and Dona Wu).
[New feature] Look for EXP files with the same name as CEL file for “Description”, and if available use it
as ChipName (also can be supplied by “sample information file”). Check
“Tools/Options/Analysis/Use ‘Description’ in EXP file as array name” to enable
this option.
[New feature] Report the file format number of
DCP and CDF.BIN files during “Open group”. dChip 1.1 and 1.2 use format 3;
dChip 1.0 uses format 2; the dChip beta test versions used formats 0 and 1.
(suggested by Susan Hilsenbeck)
[New feature] Read in the MAS5 “Signal” from MAS5
analysis result file by checking “Open group/Read in MAS5 Signal” (suggested by
Greer M. Murphy and Song Her).
[New feature] Check
“Tools/Options/Analysis/Search and save DCP file in the Working directory” to store DCP files in a different place from the CEL files. This way we may perform different
analyses (e.g. normalization using different baseline array, MBEI with log
transformation) and store the results into DCP files under different
directories while maintaining the single copy of CEL
files. (suggested by Anne Bowcock, Victoria Perreau and Susan Hilsenbeck)
[New feature] Take logarithms to base 10 or other bases at
“Analysis/Model-based expression” (suggested by Casper M. Frederiksen).
[New feature] “Logged” indicator at the lower-right corner to distinguish
log-transformed expression indexes. (suggested by Susan G. Hilsenbeck)
[New feature] The “Select sample by
category” button in the “Compare samples” dialog has “Use
inversion” button for selecting samples not having a particular
property.
[New feature] In the “Chromosome
View”, use “View/Find gene” to find a specific gene in the highlighted set
(suggested by Isabella Tai).
[New feature] Add “Windows Enhanced Metafile (*.emf)” image format at
“View/Export image”. The file is in vector format and can be enlarged without
losing resolution. It can be inserted in Word or PowerPoint files by
“Insert/Picture/From file” or converted to EPS format by Adobe Illustrator.
[New feature] “Image/Export CEL” dialog has an option “Export
probe set name, probe pair order and PM/MM indication”, which will add
additional data columns correlating a probe cell to its corresponding probe
sets (suggested by Yuval Kluger)
[Change] Simplify the menu items
“View/PM/MM Data”, “View/CEL Image”,
etc. to a single “View/Next view”. One can also use the “Enter” or “Shift+Enter”
key to switch to other views.
[Change] In “Compare sample result”
files the number of decimal digits is reduced to 2
for easier reading. (suggested by Feng Wu)
[Change] “Image/Export CEL” can export all arrays
into CEL files at once.
[Change] “Image/Normalization plot”
will use the chosen baseline if the array has
been normalized; otherwise use the default baseline with median overall
intensity (suggested by Tiago Duarte)
[Change] The MAS text file is now required to have the “Signal” column as well as
the “Detection” column.
[Change] The
“Analysis/Hierarchical clustering/Only draw lines for standard separator”
checkbox is moved to “Tools/Array list file”. (suggested by Robert Gentleman)
[Withdrawn feature] V1.2 cannot convert CDF.BIN file and DCP file in the old
format to the current format (file format 3). Use dChip v1.1 to do this.
[Bug corrected]
“Tools/Options/Analysis/Mask redundant probe sets…” and
“Tools/Options/Analysis/Omit Affy control probe sets…” take effect at reading
gene lists or filtering genes, but not at “Open group” where the gene
information file is read in. This leads to artificially
significant functional groups. For example, with “Mask redundant probe
sets” checked, “Tools/Classify genes” on all the probe sets in the HG_U95AV2
chip will result in "Found 544 GeneOntology ‘cell fraction’ genes in a
5933-group (all: 703/8100, PValue: 0.004912) ***”; this is because the relative
size of some functional groups has been increased by removing duplicate probe
sets from the list. This bug is now corrected; note that after changing these
two options one needs to do “Analysis/Open group” to re-compute the number of
used probe sets for each GO function (reported by Kieran Holland).
[Bug corrected] When using
“Analysis/Save” to save the analysis results into a Word file, dChip ignored
the user-specified file name (reported by Anna Tsimelzon).
[Bug corrected] “Tools/Export data/Expression
value” exports GCT file (format 1.2) that can correctly work with GeneCluster
2.0.
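For readers who post-process such exports, the sketch below builds a minimal GCT version 1.2 table: a “#1.2” line, a line with the row and column counts, a header line starting with Name and Description, then one tab-delimited row per probe set. It illustrates the public GCT layout only and is not dChip's export code; the function name format_gct and the sample values are made up for the example.

```python
def format_gct(sample_names, rows):
    """Return GCT 1.2 text.

    rows is a list of (probe_set_name, description, values) tuples;
    each values list must have one number per sample.
    """
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["Name", "Description"] + list(sample_names)))
    for name, description, values in rows:
        if len(values) != len(sample_names):
            raise ValueError(f"{name}: expected {len(sample_names)} values")
        # "g" formatting drops trailing zeros (98.0 is written as 98).
        lines.append("\t".join([name, description] + [f"{v:g}" for v in values]))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file with a .gct extension gives a file in the layout described above.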
5/8/02: The dChip console
(command-line) version continuously executes the normalization and model-based
expression steps to generate a tab-delimited text file containing the
expression values. Source code is available for using
dChip on other platforms. (suggested by Casper Frederiksen and Allen Day)
4/28/02: dChip version 1.1 (Suggested and helped by: Jianhua Hu, Edward J.
Oakeley, Simon M Lin, Allen Fienberg, Sanjay Jain, Chunfa Jie, Greer Murphy,
Ken Aldape, Tiago Duarte, Tiago R Magalhaes, Igor Zwir, Dale Muzzey, Mayetri
Gupta):
New:
· PM-only
model results in always-positive expression indexes. Specify different
methods through the “Analysis/Model-based expression/Options” dialog or use the
menu “Data/Next model” to switch between models.
· Handles Human U133 chip
via file format change (V1.1 will upgrade the DCP or CDF.BIN files generated by
V1.0)
· New PSI file format; specify the PSI
file through the “Analysis/Model-based expression/Options” dialog; use
“Tools/Export data/Probe sensitivity index” to export f values and their standard
errors in text format.
· The input gene list in
the “Analysis/Filter genes; Compare samples” dialog can be used to exclude
these genes from filtering or comparison. Click the “Filter on” or “Compare on”
file buttons to switch the mode.
· Context-specific
links to the online manual in various dialogs.
Clustering:
· “Tools/Options/Clustering/Standardize rows” option. One may choose not to
standardize a gene’s expression value across samples when the scale of the data
is already adjusted.
· “Clustering/Selected
branch/Export data” has the option to export gene-wise standardized values.
· “Tools/Options/Clustering/Distance” has the option to use 1 - |r| as distance
measure, where r is Pearson’s correlation.
· “Clustering/Similar
profile” can search for genes with high positive or negative correlations with the
current gene or the selected gene branch. When “Standardize separators” are
present, check “Analysis/Hierarchical clustering/Only draw lines…” to make this
function work properly.
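To make the 1 - |r| option concrete, here is a small sketch of a correlation-based distance between two expression profiles. The function names are hypothetical, and using 1 - r as the alternative is an assumption for comparison; these notes only state that 1 - |r| is available as an option.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance(x, y, use_absolute=False):
    """1 - |r| if use_absolute is set, otherwise 1 - r (assumed alternative)."""
    r = pearson_r(x, y)
    return 1.0 - abs(r) if use_absolute else 1.0 - r
```

With 1 - |r|, two genes whose profiles are mirror images of each other have distance near 0 and cluster together, whereas under 1 - r they would be as far apart as possible.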
Changes:
· New outlier detection algorithm handles the
image contaminations more reasonably.
· The Menu item
“Tools/Reset default settings” is changed as the “Tools/Options/Reset default”
button.
· The “Analysis/Filter genes;
Compare samples” function by default ignores Affy’s control genes (probe set
names starting with “AFFX-”),
since their changes are generally not interesting. “Tools/Options/Analysis/Omit
Affymetrix control…” to change this setting.
· “Analysis/Model-based
expression/Export” function moved to “Tools/Export data/Expression value”.
· “Data/Export probe set”
function moved to “Tools/Export data/Probe set”.
· In the “Analysis/Compare
samples/Combine comparisons” dialog, the “Insert complement” button is changed
to the “And not” and “Or not” options. Thus a single comparison can be negated.
· The “Tools/Classify
samples” function copies all columns of the “gene list file” into the output
“classified file”, so the output file can have expression values or fold changes.
· After “Analysis/Get
external data”, “Analysis/Normalize” uses the Invariant Set Normalization (ISN) method
(V1.0 used a simplified ISN method with a fixed rank difference threshold of
50 and no iteration). Check the “Show scatter-plot…” option to show the
normalization scatter-plot (installation of R
needed) when normalizing. Also, when fitting the running median curve at the two
tails, 5% of the “invariant” points are used to fit a ray with one end fixed
(V1.0 used 1/300
of the “invariant” points); this makes the high-end normalization relationship
smoother and more robust.
· The “Analysis/Map
chromosome” function only checks gene stretches of length < 20 for significant
p-values. Previously all gene stretches were checked.
Withdrawn:
· The option
“Analysis/Model-based expression/Use average difference instead of MBEI” is
gone. Affy’s MAS 5.0 software
adopts “Signal” as expression index.
Bug corrected:
· After “Analysis/Get
external data”, the “Analysis/Map chromosome” and “Analysis/LDA Classification”
functions did not show the result images.
· In the “Analysis/View”
some letters could not be input, such as “A” or “M”. This was due to the shortcut
keys for menu “Data/Animate” or “Data/Next model”. Now these shortcut keys are
changed to “Control+A” and “Control+M”.
4/4/02: [Version 1.1 Test only] PM-only model results in always-positive expression
indexes. New outlier detection algorithm
handles the image contaminations more reasonably.
3/13/02: E. coli gene information and genome information files available
(suggested by Igor Zwir). In the “Map chromosome” function, ignore the “MAX_STRETCH limit is reached” message and uncheck
“Tools/Options/Chromosome/Outline significant…” to turn off the p-value
highlighting, since there is only one chromosome, which may result in too many significant
p-values.
3/12/02: HG-U133
gene information files available (A, B, unzip all files to the same directory;
helped by Siming Shou and Miguel Rea).
2/1/02: “View/Go
to NetAffx” goes to the NetAffx
website for the current probe set. (suggested by
Victoria Perreau)
1/31/02: Linking
to online resources such as “View/Go to LocusLink” may not work on some
computers. Check "Tools/Options/Analysis/Show online link dialog"
to show a dialog containing the web address and automatically copy the
address to the clipboard; one can then manually paste it into the address bar
of an Internet browser. (reported by Susan Hilsenbeck,
Casper Frederiksen, Victoria Perreau)
1/29/02: (1) Bug corrected: When going from “Clustering View” to
"Data View", the PM/MM data image was not refreshed correctly; as a
result the same probe set persists there. (reported
by Greer M. Murphy)
(2) “Image/Export CEL” will export
model-based single outliers and array outliers in the
[OUTLIERS] section of the CEL
file. (suggested by Edward J. Oakeley and Yizheng Li)
1/26/02: Check
the button “Perform Principal Component Analysis instead” in the “Analysis/LDA
Classification” dialog to perform Principal Component
Analysis. (suggested by Anne Bowcock and Stephen
Haggarty)
1/14/02: Use the
“Analysis/Compare samples/Combine comparisons/Compare on” button to restrict the comparison to a gene list. (suggested by Yingxi Lin)
1/12/02: Updated Yeast S98 gene information file with GeneOntology
terms and added its genome
information file. Downloading of the new version of dChip is needed. (suggested by Simon Lin, courtesy of SGD database)
1/10/02: (1) Combine the data for different species (suggested by Florian Storch and Stephen Haggarty, courtesy
of TIGR RESOURCERER
database)
(2) Bug corrected: when the expression values
are truncated at “Analysis/Model-based expression”, the standard errors are set
to 0. Failure to consider this led to incorrect average of the identically
truncated values between replicates. (reported by
Michael Boutros)
12/30/01: Updated
HG_U95AV2 and MG_U74AV2 gene information file
(944 GeneOntology terms, 971 ProteinDomain terms and 377 Cytoband terms). In
the clustering picture, use Shift+Left/Right key to change the width of the
annotational columns, and Right-click to go to the website of GeneOntology or
Pfam entries. (protein domain suggested by Wing Wong
and Florian Storch)
12/14/01: (1)
After clustering, the p-values of the gene and sample
clusters are calculated using the exact hypergeometric distribution.
Previously the binomial approximation of the hypergeometric distribution, followed by the
normal approximation of the binomial, was used, but for these very small
p-values high accuracy is desirable. At “Tools/Options/Clustering” the
default p-value threshold for gene clusters is changed to 0.005. (suggested by Steve Horvath)
(2) The sample cluster p-values are now calculated
with regard to the samples defined in the “Array list file”, not all the
arrays in the group. (suggested by Steve Horvath and
Robert Gentleman)
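As an illustration of the exact calculation in (1), the sketch below computes an upper-tail hypergeometric probability: the chance of seeing at least k genes (or samples) of a given category in a cluster of size n, when K of the N items overall belong to that category. The function name and argument order are assumptions for the example; dChip's own implementation is not shown in these notes.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which belong to the category of interest."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    upper = min(n, K)
    # comb() returns 0 for impossible counts, so no extra lower bound is needed.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / total
```

Because the sum uses exact integer binomial coefficients, very small p-values are not distorted by a binomial or normal approximation.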
12/12/01: Map a list of genes to chromosome by “Analysis/Map
chromosome” (suggested by Wing Wong, Robert Gentleman
and Andrea Richardson)
12/11/01: (1) In the “Analysis View”, the error messages are colored in red. (suggested by Robert Gentleman)
(2) Small Excel files and exported images are inserted into the
“Analysis View” for convenience (uncheck “Tools/Options/Analysis/Insert Excel
and Image outputs into the Analysis View” to disable the function). The analysis output can be saved into a Word file by
“Analysis/Save”.
(3) “Tools/More gene information” to read in a customized gene information file and use
it with priority over the main gene info file specified in “Analysis/Open
group”. (suggested by Michael Boutros)
12/2/01: (1)
Check “Analysis/Model-based expression/Apply log2 transform” to log2 transform the expression values. (also check
“Ignore existing calculated expressions” if necessary). The model-based
standard errors are set to 0 for the modified or transformed expression values.
When working with the log-transformed values at “Analysis/Compare samples”, use E-B, B-E instead of E/B, B/E for fold changes. (suggested by Bradley Messmer)
(2) Check “Tools/Options/Analysis/Mask redundant probe sets when reading
gene list file” to exclude the redundant probe sets
from a gene list. Multiple probe sets for the same gene tend to bias the
result of array clustering and also lead to erroneous functional group
identification in the gene clustering. (suggested by
Bradley Messmer)
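The advice in (1) to use E-B rather than E/B on log-transformed values follows from log2(E/B) = log2(E) - log2(B). A minimal sketch, with hypothetical function names and made-up expression values:

```python
import math

def log2_transform(values):
    """Log2-transform strictly positive expression values."""
    return [math.log2(v) for v in values]

def log_difference(e_log, b_log):
    """Difference of two log2 values; equals log2 of the raw fold change E/B."""
    return e_log - b_log

# Raw values 800 and 200 have fold change 4; on the log2 scale the
# corresponding quantity is the difference, which is close to 2.
e_log, b_log = log2_transform([800.0, 200.0])
diff = log_difference(e_log, b_log)
```

Raising 2 to the power of the difference recovers the raw fold change. math.log2 raises ValueError for zero or negative input, so this sketch assumes positive expression values.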
12/1/01: (1) Used
the Nearest Neighbor algorithm (see the reference) to increase the speed of clustering (e.g. the time of
clustering on 1400*6 values reduces from 80s to 4s). In addition, one can
uncheck “Tools/Options/Clustering/Pre-calculate distances” to calculate the distances between genes or samples on-the-fly;
this is useful when clustering on a large number of genes (e.g. 12K), which
requires too much memory to store all the distances and causes virtual-memory
swapping that slows the process down. (suggested by
Edward Oakeley)
(2) Uncheck “Analysis/Hierarchical clustering/Cluster genes” to cluster samples without clustering genes. (suggested by Ruty Shai and Bradley Messmer)
(3) Use “Control+Click” to change the color of
the GeneOntology blocks in the clustering picture; the selected colors
cannot be saved right now. (suggested by Michael
Boutros)
11/23/01: Merged dchip.exe and “dchip large.exe”. Drosophila chip
users can use the normal version of dChip as well. File conversions of cdf.bin
and dcp files will be automatically performed.
11/19/01:
“Analysis/Compare samples” can have different fold
change criteria for E/B and B/E and different mean difference criteria
for E-B and B-E. (suggested by Tiago R Magalhaes)
11/14/01: “Clustering/Selected
branch/Export image” to export the clustering image of
the selected main gene cluster outlined by blue lines. The sample
clustering tree is not attached to the image. (suggested
by Sanjay Jain and Huan Dong)
11/13/01: In the
“Clustering” view, use Control+Click to select and
color multiple gene or sample clusters. The multiple clusters can be
exported or deleted (gene only) by “Clustering/Selected branch” functions.
Clicking still works to select the main cluster (outlined by blue lines), used
for cluster resampling. (suggested by Huan Dong)
11/11/01: Add
“Tools/Classify Genes” for classifying
genes by functional groups (suggested by Miguel
Ramalho Santos and Nikhil Munshi)
11/9/01: (1) Negative expression values are set
to 1 when calculating fold changes in “Analysis/Compare samples”.
Previously fold changes involving negative expression values were set to a
non-informative 0; however, when one expression is large (say 1000) and the
other is -10 (at noise level of absent genes), it is helpful to bring the -10
to a small positive number so a large fold change is calculated and the gene
gets selected.
(2) Began to use R as the engine for some computing
and graphic tasks. (suggested by Robert Gentleman)
(3) Use “Image/Normalization
plot” to view the normalization scatter plot between one array and the
baseline array. (suggested by Casper M. Frederiksen;
data courtesy of Andrea Richardson)
(4) On start of dChip there is an automatic display of the dChip updates
since the last use.
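The fold-change rule in (1) can be sketched as follows. The function name is hypothetical, and treating an expression value of exactly 0 the same way as a negative value is an assumption made here to avoid division by zero; the note above only mentions negative values.

```python
def fold_change(experiment, baseline, floor=1.0):
    """E/B fold change with non-positive expression values replaced by floor."""
    e = experiment if experiment > 0 else floor
    b = baseline if baseline > 0 else floor
    return e / b
```

For the example in (1), fold_change(1000, -10) divides 1000 by 1 and so reports a large fold change instead of a non-informative 0.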
10/29/01: (1) Add
“Select by category” button in the “Analysis/Compare
Sample” and “Analysis/Model-based Expression/Export” dialog. (suggested by Robert Gentleman; data courtesy of Andrea
Richardson)
(2) Deleted “Array list file” selection button in many dialogs. Specify
“Array list file” only through “Tools/Array list file”.
10/23/01: Combine comparison criteria using the “not” operator, by the
“Analysis/Compare Samples/Combine Comparison/Insert complement” button. (suggested by John K. Park and Wing Wong)
10/4/01: (1) Use the
"Clustering/Similar Profile" function to export
a list of genes whose profiles are similar to that of the currently highlighted gene.
The resultant list can be used as the "gene list file" in the
"Analysis/Hierarchical Clustering" dialog to view these genes. (suggested by Andrea Richardson)
(2) Bug corrected: MG_U74 gene information files updated
using the Sep.7.01 version of the Unigene file. In the old “mg_u74av2 gene info.xls”, probe set
160309_at was annotated as amelogenin. Checking with LocusLink (ID: 11704),
UniGene (Mm.172556) and BLAST suggests that this was a mistake. (reported by Feng Wu)
9/27/01: Add paired t-test p-value as a
filtering criterion in “Analysis/Compare
samples”. (suggested by Stephen Henderson, Susan
Hilsenbeck and Jenny Z. Xiang)
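For reference, the statistic behind a paired t-test can be sketched as below: the mean of the per-pair differences divided by its standard error, with n - 1 degrees of freedom. This is the textbook formula, not dChip's code; the function name is hypothetical, and turning the statistic into a p-value additionally needs the t distribution (available in R or a statistics library), which is left out here.

```python
import math

def paired_t_statistic(experiment, baseline):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(experiment) != len(baseline) or len(experiment) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [e - b for e, b in zip(experiment, baseline)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (divisor n - 1).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t = mean_diff / math.sqrt(var_diff / n)
    return t, n - 1
```

If all differences are identical the variance is 0 and the division fails, so a caller would need to handle that case separately.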
9/6/01: Gene filtering and clustering decoupled:
first use “Analysis/Filter genes” to generate a filtered gene list (the
filtering can be restricted to an input gene list), then use
“Analysis/Hierarchical clustering” to cluster on the filtered gene list. (suggested by Wing Wong and Laura Forsberg)
9/5/01: (1) “Tools/Gene list file/By keywords”
and “View/Find gene” accept wildcard strings as
“keywords”. (suggested by Wing Wong, code
courtesy of Florian Schintke)
(2) Export functional category information in “compare result file” by
checking "Tools/Options/Analysis/Output GeneOntology terms". (suggested by Miguel Ramalho Santos)
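A rough idea of wildcard keyword matching as in (1) is sketched below using Python's standard fnmatch module. The exact wildcard syntax dChip accepts is not described in these notes, so shell-style “*” and “?” with case-insensitive matching are assumptions, as are the function name and the sample gene descriptions.

```python
from fnmatch import fnmatchcase

def find_genes(pattern, descriptions):
    """Return probe set names whose description matches a wildcard pattern.

    descriptions maps probe set name -> gene description.
    Matching is case-insensitive; "*" matches any run of characters
    and "?" matches a single character.
    """
    lowered = pattern.lower()
    return [name for name, text in descriptions.items()
            if fnmatchcase(text.lower(), lowered)]

# Hypothetical annotation table for illustration.
genes = {
    "g1": "troponin T2, cardiac",
    "g2": "tropomyosin 1",
    "g3": "actin, beta",
}
matches = find_genes("tropo*", genes)
```

Here “tropo*” picks up both the troponin and the tropomyosin entries, while a pattern such as “*actin*” would match only the actin entry.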
8/29/01, 6/17/01:
Output gene list by GeneOntology
or keywords (suggested by Robert Gentleman and
Casper M. Frederiksen)
8/23/01: Probe sensitivity
index file. [V1.0 manual] If a PSI
file is specified in the “Analysis/Model-based expression/Calculate” dialog and
the checkbox “Use existing probe sensitivity index in this PSI file” is unchecked, the probe sensitivity
indexes will be saved after the model fitting is performed on all probe sets.
At a later time, the PSI file can
be used to fit the expression values for other arrays by checking the checkbox
“Use existing probe sensitivity index in this PSI
file”. (suggested by Richard Lempicki and Robert
Gentleman)
8/20/01: (1) In
“Compare Samples”, dChip will export both the fold change and its confidence
bound if either of them is used in the filtering criterion. (suggested by Andrew Bent and Soemini Kasanmoentalib)
(2) “Tools/Reset Default Settings” restores
dChip parameters to the default values. (suggested
by Robert Gentleman)
8/16/01: Improvement of the array outlier calling
method: For a probe set, the model fitting still uses all arrays, but
the identification of array-outliers is done for absent arrays and present
arrays separately. This avoids the situation where the small standard errors of
the expression indexes of absent arrays cause present arrays to be called
array-outliers. Previously I tried to avoid this by fitting the model using only
present arrays; as a result absent arrays were not fitted and not called as
outliers, which led to far fewer array-outliers. (03/03: this is obsolete; in the current method, P/A calls help to correct
signs but do not affect array outlier calling. However, the number of
array/probe outliers is restricted to be at most 50% of all arrays/probes.)
6/29/01: The
model fitting is changed to use only the arrays where a probe set is called
“Present” by Affy’s algorithm (or minimum of 3 arrays regardless of the
Absolute calls). This avoids the situation where a
gene is “Present” in a minority of arrays but these arrays are called
“Array-outlier” for the gene. Now these arrays
are correctly identified as having good patterns. Other changes in the “Data
View” are: in grid (2, 2) an array is represented by a cyan circle if it is
called “Absent” for the gene (blue circles still representing “array-outliers”);
in grid (1, 3) the red fitted curve is always shown whether the array is
“array-outlier” for the gene or not. (suggested by
Brain Yandell, data courtesy of Daniel Auclair and Elizabeth K. Robinson)
7/27/01: If the
checkbox “Always show sample names and clusters on the
top” in the “Tools/Options/Cluster” dialog is on, when one scrolls down to
see other genes in the cluster one can still see the sample names and cluster
trees. (suggested by Stefano Colella)
7/18/01: Image contamination correction. (suggested by Robert Gentleman, data courtesy of Eric
Schadt)
7/13/01: The
“Data/Export probe set” menu can export the PM/MM data for multiple probe sets.
(suggested by Laura Forsberg)
7/1/01: After
“Analysis/Get External Data”, one can use
“Analysis/Normalize” to normalize expression values using a simplified version
of the Invariant Method (see manual).
This function used to be a linear scaling that gave all arrays the same
median. (suggested by Arindam Bhattacharjee)
6/30/01: Using
“View/Find Gene” and “View/Find Next”, one can search
genes by keywords such as “troponin”. (suggested
by Arindam Bhattacharjee)
6/28/01: A user
can specify a “Working directory” in the “Analysis/Open
group” dialog, under which dChip exports configuration (.ini) file and other
output files. (suggested by Victoria Perreau)
6/27/01: (1)
Navigate probe sets in the array CEL
image using 'Home' and 'End' keys. (suggested by
Yizheng Li)
(2) Bug corrected: In some
dChip output files line breaks occur after gene descriptions and cause
“frame-shift” in output files. I tried to correct this by eliminating “\n” at
the end of gene descriptions, but let me know if this is still a problem. (reported by Brain Yandell, Thomas Cappola and David
Gerhold)
6/26/01: (1)
Replace “sample name file” by “sample information file” in
“Analysis/Open Group/Other information” dialog. Significant sample clusters can be
calculated. (data courtesy of Andrea Richardson and
Catherine Gradek)
(2) Bug corrected: During
“Analysis/Open Group”, dChip reports “Search and extract PM/MM data from CEL files of chip type under” but finds no array data files. This
may be due to “.” in the directory name. (reported by
Karen Vranizan and Adam Olshen, Michel Bellis)
6/22/01: (1) The
displaying range of the clustering picture used to be [-3, 3] for the
standardized expression values for each gene. Now this range can be customized
at “Tools/Options/Clustering” dialog. (suggested by
Thomas Seidl)
6/17/01: (1)
Moved “Array list file” dialog under “Tools” menu, instead of having it many
times under various “Analysis/*” dialogs.
(2) Took away “Start
clustering using filtered genes” from “Analysis/Compare samples/Combine
comparisons” dialog, so that “Compare samples” and “Hierarchical clustering”
are decoupled. Use “compare result file” as “gene list file” in
“Analysis/Hierarchical clustering/Filtering genes” dialog for clustering
analysis using filtered genes.
6/6/01: Color genes with a particular function in blue in the
clustering picture. Clicking the function bars on
the right side of the clustering data will select the corresponding function as
the “current function” and color the genes of this function in blue. The
“current function” is also reset when selecting the “functional cluster” icons
on the left pane (suggested by Robert Gentleman)
5/7/01: Use a probe set mask file to exclude probe
sets from the analysis. Using the dChip “Image/Unscrambled” function we can
move all the excluded probe sets to the bottom of the array
image (still randomly placed; U74A array); we note that for these probe
sets there are still hybridization signals.
(suggested by Jason M. Laramie and Scott Oakes)
4/30/01: Combine the data for
Human arrays of different chip types. (suggested
by Stan Nelson, Daniel Auclair and Isabella Tai)
4/17/01: (1) In
“Analysis/Model-based Expression”, we can check “Use Average Difference instead
of Model-based expression” to calculate the traditional Average Difference as expression
levels. [Obsolete: The standard errors are
still for model-based expression levels, and the array-outliers are still
computed using the model-based approach. Note that the Affymetrix Average
Difference method uses a super-scoring method to exclude probes whose PM/MM
difference is outside 3 standard deviations of all probe differences in either
of the two arrays being compared in their comparison analysis. Here, since we are analyzing
multiple arrays at the same time, when calculating Average Differences a probe
is excluded if its difference is an outlier in any of the arrays, until a minimum
of 5 probes is reached; then all 5 probes will be used.] (suggested by Matthew Tudor)
(2) “Analysis/Alternative Transcripts” function: Different tissue types
may give different probe response patterns; this may be due to alternative
splicing. Probe sets called
"array-outlier" in all selected arrays will be exported. (suggested by Patrick Jay)
4/10/01: (1) Bug report: in “Analysis/Compare Samples”, the
size of the Experiment group was not used correctly; this may have led to incorrect
standard error and fold change confidence interval calculation. (Stefan Horvath)
(2) “Analysis/Get
External Data” can read in Whitehead RES
format data file. (Andy Bhattacharjee)
3/17/01:
“Analysis/Get External Data” to read in an external
tab-delimited data file with first row
array names and first column gene names. Absolute call and standard error
columns can also be contained.
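A file in this layout can be read with a few lines of code. The sketch below handles only the simplest case described above (first row array names, first column gene names, expression values elsewhere); how absolute call and standard error columns are arranged is not specified in this note, so they are not handled. The function name and sample data are made up.

```python
import csv
import io

def read_expression_table(text):
    """Parse tab-delimited text into (array_names, {gene_name: [values]})."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    array_names = header[1:]          # first cell is the gene-name column label
    table = {}
    for row in reader:
        if not row:
            continue                  # skip blank lines
        values = [float(cell) for cell in row[1:]]
        if len(values) != len(array_names):
            raise ValueError(f"{row[0]}: wrong number of columns")
        table[row[0]] = values
    return array_names, table

# Hypothetical two-gene, two-array file.
sample = "gene\tarray1\tarray2\n1000_at\t120.5\t98\n1001_at\t15\t22.25\n"
arrays, data = read_expression_table(sample)
```

To read from disk, pass the contents of the file, for example the result of open(path).read().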
3/15/01: Add CEL intensity pictures also in PM/MM
Data view. (Andy Bhattacharjee)
3/7/01: In
“Analysis/Open Group/CDF file” dialog we can specify
a gene information file, which contains
gene descriptions from Affymetrix EASI
database, as well as LocusLink
Gene Ontology terms classifying a
gene by its biological process, molecular functions and cellular components.
“Analysis/Hierarchical Clustering” will use such functional category
information to assess whether a local cluster is enriched in genes having a
particular function, and highlight these
“functionally significant” clusters. (Data courtesy
of Dan Tang)
3/2/01: Merge
“Analysis/Pair-wise Comparison, Two-group Comparison,
Filter Interesting Genes” into “Analysis/Compare Samples”. The three dialogs here can be used to specify comparisons,
combine comparisons and specify arrays used and replicates to be pooled. The
genes satisfying the comparison criterion can be exported to a file or used for
clustering. Sample names followed by “*” refer to how many additional replicate
arrays are pooled for this sample. The function of exporting expression values
is moved to “Analysis/Model-based Expression/Export”.
2/20/01: In
“Analysis/Hierarchical Clustering/Sample handling” tab, we can read in a gene
function file. The functional categories of genes will be shown as color bars
on the right (data from the reference). In this way we may visually check if
genes belonging to a functional category are enriched in a cluster. (Reference:
Cho et al. 2001. Transcriptional
regulation and function during the human cell cycle, Nature Genetics,
Vol 27, 48-54)
2/18/01: Take
away “Pool duplicate arrays” and “Group every n
samples” checkboxes in “Analysis/Hierarchical Clustering/Sample handling” tab.
Instead, we can insert “Replicate separator” and “Standardize separator” in array list file. Replicate arrays separated by
“Replicate separators” will be pooled using weighted averaging method (weights
being the measurement accuracy, so expression values with large standard errors
receive smaller weights), and samples separated by “Standardize separators”
will be standardized (rescaled to have mean 0 and standard deviation 1 across samples
for each gene) within themselves. If the “Only draw lines for standardize
separator” checkbox in the “Sample handling” tab is checked, “Standardize
separators” just add vertical lines between
groups of samples.
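One common way to realize "weights being the measurement accuracy" is inverse-variance weighting, where each replicate is weighted by 1/SE^2. The sketch below uses that scheme as an assumption; the exact weights dChip uses are not given in this note, and the function name is hypothetical.

```python
def pool_replicates(values, std_errors):
    """Pool replicate expression values of one gene by inverse-variance weights.

    Returns (pooled_value, pooled_standard_error). A standard error of 0
    would give an infinite weight, so it is rejected here.
    """
    if len(values) != len(std_errors) or not values:
        raise ValueError("need matching, non-empty value and SE lists")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * v for w, v in zip(weights, values)) / total
    return pooled, (1.0 / total) ** 0.5
```

With values 100 and 200 and standard errors 10 and 20, the more precise replicate dominates and the pooled value lies closer to 100 than to 200.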
2/11/01: We can
specify a data file list in “Analysis/Open
Group” dialog (leaving “Data directory” blank) and
dChip will use the arrays specified in the file. There can be directory names
in the data file list. (suggested by Michael Angelo)
2/10/01:
“Analysis/Hierarchical Clustering” can read in a tab-delimited data file,
without opening a group of arrays first.
2/7/01:
“Clustering/Show Profile” function displays a profile
plot for the currently selected cluster. The Y-axis has the same range as
the color scale on the bottom of the picture. The value of the profile curve
for each sample is the average of the standardized expression values of all
selected genes in this sample (standardization is a linear scaling for each
gene so its expression values across all samples have mean 0 and standard
deviation 1). The error bar extends 1 standard deviation (of the selected
genes’ standardized expression values in a sample) on both sides. Shorter error
bars indicate tighter clustering of genes at this sample point. (suggested by Deming Wang)
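The profile values described above can be reproduced outside dChip roughly as follows: standardize each selected gene across samples, then take the mean and standard deviation of the standardized values in each sample. Function names are hypothetical, and the use of the sample standard deviation (divisor n - 1) in both steps is an assumption.

```python
from statistics import mean, stdev

def standardize(row):
    """Linearly rescale one gene's values to mean 0 and SD 1 across samples."""
    m, s = mean(row), stdev(row)
    return [(v - m) / s for v in row]

def cluster_profile(rows):
    """rows: one list of expression values per selected gene (same sample order).

    Returns one (mean, sd) pair per sample: the profile height and the
    half-length of the error bar at that sample. Needs at least two genes.
    """
    standardized = [standardize(row) for row in rows]
    per_sample = zip(*standardized)
    return [(mean(col), stdev(col)) for col in per_sample]
```

Two genes whose profiles differ only by scale, such as [1, 2, 3] and [2, 4, 6], standardize to the same values, so every error bar has length 0, the tightest possible cluster.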
2/6/01: (1) Add
“Clustering/Save Tree” function for saving the clustering result, and the file
can be read in as “Analysis/Hierarchical Clustering/Filter genes/Gene list or
tree file” (and the filtering criteria are thus ignored). (suggested by Stan Nelson)
(2) In
“Analysis/Model-based Expression” dialog, we can specify to output an array
quality summary file, containing percent of probe sets called “array outlier”,
percent of probe pairs called “single outlier”, and percent of “P” calls.
Arrays with more than 5% array outliers change their icons to dark blue. We
need to redo “Open Group” (check “ignore existing DCP file”), “Normalize” and
“Model-based Expression” to calculate these statistics. (suggested by Stan Nelson, Andrew Kirby)
2/1/01: (1)
Merged “Analysis/Export Expression” into “Analysis/Two-group Comparison”. If no
arrays are chosen in group 2, the expression values in group 1 will be
exported.
(2) We can use “Analysis/ Hierarchical Clustering/Array list file” tab
to create an “array list file”. When specified in “Analysis/ Hierarchical
Clustering/Sample handling” tab, this file dictates which arrays are used for
clustering in what order.
1/31/01: (1) Add
“Clustering/Export Selected” menu to export the expression data of selected
branches. The exported file can be used as “gene list file” in
“Analysis/Hierarchical Clustering” dialog to perform clustering using only this
subset of genes. (suggested by Deming Wang)
(2) In
“Analysis/Model-based Expression” dialog, we can choose to truncate low or
negative expression values to a small value, or to a given percentile of the
expression values that are called “A”. (suggested by
Stan Nelson)
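The percentile option in (2) can be pictured as follows: collect the expression values with an “A” call, find the requested percentile among them, and raise every smaller value to that level. The nearest-rank percentile definition and the function name are assumptions for illustration; dChip's exact rule is not given in this note.

```python
import math

def truncate_to_absent_percentile(values, calls, percentile):
    """Raise low values to the given percentile (0-100) of the "A"-called values.

    values and calls are parallel lists; calls holds "P", "M" or "A".
    If no value is called "A", the values are returned unchanged.
    """
    absent = sorted(v for v, c in zip(values, calls) if c == "A")
    if not absent:
        return list(values)
    # Nearest-rank percentile: the smallest rank covering percentile % of values.
    rank = max(1, math.ceil(percentile / 100.0 * len(absent)))
    threshold = absent[rank - 1]
    return [max(v, threshold) for v in values]
```

With values [-20, 5, 30, 400, 900], calls A, A, A, P, P and the 50th percentile, the threshold is 5, so only the -20 is raised.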
1/30/01: The
“Analysis/Filtering interesting genes” dialog now accepts simple logical
combinations of criteria using AND
and OR. (suggested by Michael Zhang)
1/25/01: An icon
will be added for each exported tab-delimited analysis result file, under the
Analysis icon. Clicking it will invoke Excel to open the file. (suggested by Mei Xu)
1/24/01: Add
“Image/Unscramble” function, which re-organizes the probes of the same probe
set together for arrays using "distributed probe set format" (e.g.
Human U95 arrays), so that we can view such arrays in the old way. This makes
“Image/Array outlier” function still applicable for such probe-scrambled
arrays. (suggested by Andy Bhattacharjee)
1/23/01: (1) Add
checkbox “GCT format for GeneCluster” in “Analysis/Export Expression” dialog,
for exporting expression data files to use with GeneCluster. (suggested by Michael Angelo)
(2) Clustering
View is linked to CEL and PM/MM
Data views. That is, in the clustering picture, we can click a data point (the
expression value of a gene in a sample), and go to look at the CEL level data. This is useful for those who are
curious about unusual data points in the clustering picture (such as large
negative expression values), and want to trace back to the raw data.
1/22/01: Options
added in “Analysis/Export Expression” and “Analysis/Hierarchical Clustering”
dialogs so we can treat expression values identified by the model as outliers
(i.e. array-outliers in CEL
images) as missing values. They are exported as blank entries in tab-delimited
file or shown as black (Blue/Red coloring) or white (Green/Red coloring) boxes
in clustering picture. This is another way of using measurement error of
model-based expression values in down-stream analysis, besides resampling
clustering trees. (suggested by Priya Sudarsanam)
1/21/01: (1) Add
“View/Export image” menu item, for exporting CEL,
PM/MM data or clustering images into BMP file.
(2) In “Analysis/Hierarchical Clustering” dialog, we
may read in a “gene list file” (each line has a probe set name) to cluster a
pre-selected subset of genes (filtering criterion are thus ignored; this file
may be the output file of “Analysis/Filter interesting genes” function). We can
also read in an “array list file” with each line specifying an array to be used
as columns of the clustering data matrix. We may also pool duplicate arrays using
a measurement-error weighting scheme before clustering.
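The entry above does not spell out the weighting scheme. One common choice, shown here purely as an assumption, is inverse-variance weighting, where each replicate's expression value is weighted by one over its squared standard error. The function name `pool_replicates` is hypothetical.

```python
def pool_replicates(values, std_errors):
    """Pool replicate expression values with inverse-variance weights.

    Assumption: weight_i = 1 / std_error_i**2. This is a standard scheme
    and not necessarily the one dChip implements.
    Returns (pooled_value, pooled_std_error).
    """
    if len(values) != len(std_errors) or not values:
        raise ValueError("values and std_errors must be non-empty and equal in length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se * se) for se in std_errors]
    total = sum(weights)
    pooled = sum(w * v for w, v in zip(weights, values)) / total
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se
```

With this weighting, a replicate measured with a large standard error contributes less to the pooled value than a precisely measured one.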
1/18/01: (1) Add
“Image/Export CEL” menu item. We
can use it to export normalized data into a CEL-like
file. If you want to export the raw data, check the “Use unnormalized data”
checkbox in “Analysis/Open group” dialog when opening a group. (suggested by Margaret C. Cam)
(2) When viewing
array images, we can use the four arrow keys to zoom in and out. (suggested by Andy Bhattacharjee)
1/15/01: (1)
Changed “Analysis/Open group” dialog, so we can read in gene name file
(tab-delimited file, the 1st column is Affymetrix probe id, the 2nd
column is gene name/description) and sample name file (the 1st column
is array file name (without .cel or .dat suffix), the 2nd column is
sample name). Such information will be used when exporting results or
displaying clustering trees.
(2) Add
“Analysis/Filter interesting genes” dialog, for filtering genes by fold changes
(or the lower confidence bound of them) of multiple pair-wise comparisons. (suggested by Dan Tang)
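A fold-change filter over several pairwise comparisons can be sketched as follows. This is an illustration, not dChip's implementation: the function name, the data layout, the rule that a gene must pass every comparison, and the choice to count changes in either direction are all assumptions. The lower-confidence-bound variant is not shown because the entry does not describe how the bound is computed.

```python
def filter_by_fold_change(expression, comparisons, min_fold=2.0):
    """Select probe sets whose fold change is at least min_fold in every comparison.

    expression:  dict mapping probe set name -> dict of sample name -> value
    comparisons: list of (baseline_sample, experiment_sample) pairs
    Assumptions: expression values must be positive, and the fold change is
    taken as larger/smaller, so increases and decreases both count.
    """
    selected = []
    for probe, by_sample in expression.items():
        passes = True
        for baseline, experiment in comparisons:
            b, e = by_sample[baseline], by_sample[experiment]
            if b <= 0 or e <= 0:
                passes = False  # a ratio is undefined for non-positive values
                break
            if max(b, e) / min(b, e) < min_fold:
                passes = False
                break
        if passes:
            selected.append(probe)
    return selected
```

The returned list could be written one name per line to form a gene list file of the kind accepted by the clustering dialog.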
1/10/01: In
“Analysis/Hierarchical Clustering” dialog, we may group the samples for the
standardization purpose. That is, instead of standardizing a gene to have mean
0 and standard deviation 1 across all samples, we standardize its expression
values in samples of the same experiment (using the same cell lines) to have
mean 0 and standard deviation 1. This is because we are interested in the
differences caused by the various treatments, instead of the differences
existing among cell lines. (suggested by Deming Wang)
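The group-wise standardization described above can be sketched in a few lines of Python. The function name `standardize_within_groups` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the entry does not say which estimator dChip uses.

```python
from statistics import mean, stdev


def standardize_within_groups(values, groups):
    """Standardize one gene's values to mean 0 and SD 1 within each group.

    values: expression values of a single gene, one per sample
    groups: group label of each sample (e.g. the cell line), same length
    Groups with fewer than two samples or zero spread are set to 0.0.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    indices_by_group = {}
    for i, label in enumerate(groups):
        indices_by_group.setdefault(label, []).append(i)

    result = [0.0] * len(values)
    for indices in indices_by_group.values():
        group_values = [values[i] for i in indices]
        center = mean(group_values)
        spread = stdev(group_values) if len(group_values) > 1 else 0.0
        for i in indices:
            result[i] = (values[i] - center) / spread if spread > 0 else 0.0
    return result
```

For example, a gene with values 10, 20, 30 in one cell line and 110, 120, 130 in another receives the same standardized profile in both groups, so the baseline difference between the cell lines no longer dominates the clustering.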
1/9/01: We can
right-click a non-gene node in the clustering tree to exchange the positions of
its two branches, in order to interactively adjust the ordering of genes in
clustering trees. (suggested by Andy Bhattacharjee)
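Exchanging the two branches of an internal node changes the displayed gene order without changing the tree itself. A minimal sketch, with a hypothetical `Node` class standing in for whatever structure dChip uses internally:

```python
class Node:
    """Binary clustering tree node; a leaf carries a gene name."""

    def __init__(self, left=None, right=None, name=None):
        self.left = left
        self.right = right
        self.name = name  # set only for leaves (gene nodes)

    def is_leaf(self):
        return self.name is not None


def leaf_order(node):
    """Return gene names in display order (left branch first)."""
    if node.is_leaf():
        return [node.name]
    return leaf_order(node.left) + leaf_order(node.right)


def swap_branches(node):
    """Exchange the two branches of a non-gene (internal) node in place."""
    if node.is_leaf():
        raise ValueError("only non-gene nodes have branches to exchange")
    node.left, node.right = node.right, node.left
```

Because only the left/right order at one node changes, every cluster remains contiguous in the resulting gene order.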
1/5/01: In
“Analysis/Open group” dialog, we can read dChip (DCP) files, which use the
format used internally by dChip. In this way, we only need to carry dChip files
around along with dChip to demonstrate the downstream analysis.
1/1/01: (1) In
“Analysis/Open group” dialog, we can extract CEL
and DAT files at the same time. (suggested by Yan
Cui)
(2) “Hierarchical
Clustering” is now two-way.
12/18/00: Add
“Hierarchical Clustering” analysis item and menu.
12/4/00: Take
away "Look for presence calls in TXT
files?" checkbox in "Open Group" dialog. dChip will always look
for TXT files for presence calls;
if none are found, it will calculate them in a way similar to that described in the Affymetrix
Analysis Manual (93% agreement with GeneChip’s calls in one comparison).