Massive analysis of expression, SNP,
CNV and biomarkers
M. Gonzalo Claros
Rocío Bautista, Pedro Seoane, Hicham Benzekri, Isabel González Gayte, Rosario
Carmona, Darío Guerrero-Fernández, Rafael Larrosa, Macarena Arroyo
Noé Fernández-Pozo, David Velasco
Expression analysis
Two-colour microarrays
BioMed Central
BMC Bioinformatics
Software (Open Access)
PreP+07: improvements of a user friendly tool to preprocess and
analyse microarray data
Victoria Martin-Requena1, Antonio Muñoz-Merida1, M Gonzalo Claros2 and
Oswaldo Trelles*1
Address: 1Computer Architecture department, University of Málaga, Málaga, Spain and 2Molecular Biology and Biochemistry department,
University of Málaga, Málaga, Spain
Email: Victoria Martin-Requena - vickymr@ac.uma.es; Antonio Muñoz-Merida - amunoz@uma.es; M Gonzalo Claros - claros@uma.es;
Oswaldo Trelles* - ots@ac.uma.es
* Corresponding author
Abstract
Background: Nowadays, microarray gene expression analysis is a widely used technology that
scientists handle but whose final interpretation usually requires the participation of a specialist. The
need for this participation is due to the requirement of some background in statistics that most
users lack or have a very vague notion of. Moreover, programming skills could also be essential to
analyse these data. An interactive, easy to use application seems therefore necessary to help
researchers to extract full information from data and analyse them in a simple, powerful and
confident way.
Results: PreP+07 is a standalone Windows XP application that presents a friendly interface for
spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error
removal and statistical analyses. Additionally, it contains two unique implementations of the
procedures (double scan and Supervised Lowess), a complete set of graphical representations
(MA plot, RG plot, QQ plot, PP plot, PN plot), and can deal with many data formats, such as
tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the
equivalent functions in Bioconductor using a tomato chip with 13056 spots. The numbers of
differentially expressed genes, based on p-values from PreP+07 and from the Bioconductor
Limma package, were statistically identical when the data set was only normalized; however, a slight
variability was observed when the data were both normalized and scaled.
Conclusion: PreP+07 implementation provides a high degree of freedom in selecting and
organizing a small set of widely used data processing protocols, and can handle many data formats.
Its reliability has been proven so that a laboratory researcher can afford a statistical pre-processing
of his/her microarray results and obtain a list of differentially expressed genes using PreP+07
without any programming skills. All of this gives support to scientists that have been using previous
PreP releases since its first version in 2003.
Published: 12 January 2009
BMC Bioinformatics 2009, 10:16 doi:10.1186/1471-2105-10-16
Received: 29 August 2008
Accepted: 12 January 2009
This article is available from: http://www.biomedcentral.com/1471-2105/10/16
© 2009 Martin-Requena et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In conclusion, MADE4-2C can detect errors in signal intensity, washing, hybridization, fluorophore labelling, the print tips and the quality of the printed probes. This helps to prevent the results from being driven by technical variation rather than biological variation. It also presents all the information in a dense but understandable report, so that the researcher can assess the experiment properly without advanced knowledge of microarrays.
9.2.3. Discarding failed probes
Once the user has been informed about the quality of the original data to be analysed, MADE4-2C corrects the background noise using normexp ([184]) and generates the MA plots that show the data after background correction (figures 2.10 and 2.11, appendix B).
Next, it shows the probes that will be used in the experiment and those that will be discarded. A probe is always discarded when its spot is empty according to the GAL file, or when the probe contains an artefactual or poorly characterised sequence (information taken from the BadSpots.txt file). Two further reasons for rejection affect only some probes on one microarray and need not affect the other replicates:
The spot corresponding to the probe was not printed or is of low quality, as indicated by its specific weight computed from the flags and area fields.
Background correction with normexp has flagged the probe as discardable.
Tolerance to these failures is controlled by a parameter in the configuration file (see appendix D) that sets the number of failed replicates allowed for each probe in the experiment under analysis. The recommendation is to remove the probe from all microarrays as soon as one of its replicates fails for any of the reasons above, although in theory the analysis can be carried out as long as a probe has two or more replicates with valid intensity values. In the case of the experiments analysed on gene expression […] (figure 2.12, appendix B). This filter is expected to remove no more than 15 % of the probes [184], as shown in figure 2.12 of appendix B. In contrast, it is advisable to repeat the experiment if more than 15 % of the probes end up being discarded, as shown in figure 9.4.
Figure 9.4: Example of a figure generated by MADE4-2C to indicate that too many printed probes have been discarded for the subsequent analysis.
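The discard rule described above can be illustrated with a short Python sketch. It is not MADE4-2C code: the function name, the argument names and the boolean failure matrix are hypothetical, and the only values taken from the text are the configurable number of tolerated failed replicates and the 15 % warning level.

```python
import numpy as np

def discard_probes(failed, max_failed_replicates=0, warn_fraction=0.15):
    """Decide which probes to keep (hypothetical helper, not MADE4-2C code).

    failed: boolean array of shape (n_probes, n_replicates); True means the
            replicate failed (bad spot or flagged by background correction).
    max_failed_replicates: tolerated number of failed replicates per probe
            (0 reproduces the recommended strict setting).
    warn_fraction: fraction of discarded probes above which repeating the
            experiment is advisable (15 % in the text).
    """
    failed = np.asarray(failed, dtype=bool)
    n_failed = failed.sum(axis=1)             # failed replicates per probe
    keep = n_failed <= max_failed_replicates  # probe kept on every array
    discarded_fraction = float(1.0 - keep.mean())
    return keep, discarded_fraction, bool(discarded_fraction > warn_fraction)

# Four probes, four replicates each (made-up flags).
failed = np.array([
    [False, False, False, False],
    [True,  False, False, False],
    [True,  True,  False, False],
    [False, False, False, False],
])
keep, fraction, repeat_experiment = discard_probes(failed, max_failed_replicates=1)
```

With the strict default (`max_failed_replicates=0`) the second probe would also be dropped, which is the behaviour the text recommends.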
9.2.4. Normalization
Data normalization takes the technical replicates into account to confirm that the expression values do not introduce more variability than was present before normalization, and that neither of the fluorophore labellings adds any bias to the data. Although many normalization methods have been proposed, there is still no clear consensus that any one method is best across the different possible experimental conditions [45]. Since the normalization method is one of the factors that most affects the subsequent detection of differentially expressed genes (DEGs) [187, 98, 45], and better results can be obtained by combining two methods [187], MADE4-2C performs the normalization independently with several methods: Print-tip loess [207], Print-tip loess + scale, Print-tip loess + quantile [28], with the normalizeBetweenArrays function of limma, and finally VSN [62] and VSN + Print-tip loess [45].
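As an illustration of one of the between-array steps named above, the following is a minimal Python sketch of quantile normalization. It is not limma's normalizeBetweenArrays implementation: the function name and the toy matrix are hypothetical, and ties and missing values are not given any special treatment.

```python
import numpy as np

def quantile_normalize(x):
    """Give every column (array) the same empirical distribution.

    x: 2-D array, rows = probes, columns = arrays.
    Simplified sketch: ties are broken by sort order and NaNs are not handled.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                # row indices that sort each column
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of each quantile across arrays
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # The k-th smallest value of column j is replaced by the k-th reference value.
        out[order[:, j], j] = reference
    return out

# Toy matrix: 4 probes measured on 3 arrays (made-up values).
log_ratios = np.array([[5.0, 4.0, 3.0],
                       [2.0, 1.0, 4.0],
                       [3.0, 4.0, 6.0],
                       [4.0, 2.0, 8.0]])
normalized = quantile_normalize(log_ratios)
```

After this step the sorted values of every column are identical, while the rank of each probe within its own array is preserved.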
9.3. IDENTIFICATION OF A PROBLEMATIC SAMPLE
Figure 9.9: Negative correlation of the replicates detected in the experiments on shoots and leaves of Spanish fir (pinsapo).
[…] natural, from Sierra Bermeja (Málaga), which were hybridized with Pinarray1 and with a microarray of pine sequences obtained by suppression subtractive hybridization, called SSH-Ma (section 8.1). The experimental design and the data obtained by hybridizing with SSH-Ma are presented next, since this is where the behaviour was originally observed. The replicates of the experiment are organised as follows:
Individual 1-South, hybridized on microarray 10a, labelling the mature wood sample with Cy3 and the juvenile wood sample with Cy5. The microarray was divided into two technical replicates, 10a-A and 10a-Z.
Individual 1-North, hybridized on microarray 22a, labelling the mature wood sample with Cy3 and the juvenile wood sample with Cy5. The microarray was divided into two technical replicates, 22a-A and 22a-Z.
Individual 2-North, hybridized on microarray 23a with a dye swap relative to the previous hybridizations, labelling the mature wood sample with Cy5 and the juvenile wood sample with Cy3. The microarray was divided into two technical replicates, 23a-A and 23a-Z.
Individual 3-South, hybridized on microarray 24a with a dye swap relative to the first two microarrays, labelling the […] wood […] divided into […]
[Figure 9.[…]: panels "Distance" and "Correlation"; distances and c[…] performed […] technical replicates […] in the text]
In the analysis […] did not […] distances between […] technical replicates […] (figure 9.10), which […] well done. […] it was observed that […] from the rest of […] (figure 9.10 […] whether […] behaviour […] the search […] 2C allows […] with the library […] patterns of […], although […] measurements […]
ORIGINAL PAPER
Gene expression profiling in the stem of young maritime pine trees: detection of ammonium stress-responsive genes in the apex
Javier Canales • Concepción Ávila • Francisco R. Cantón • David Pacheco-Villalobos • Sara Díaz-Moreno • David Ariza • Juan J. Molina-Rueda • Rafael M. Navarro-Cerrillo • M. Gonzalo Claros • Francisco M. Cánovas
Received: 25 May 2011 / Revised: 30 August 2011 / Accepted: 12 September 2011
© Springer-Verlag 2011
Abstract The shoots of young conifer trees represent an interesting model to study the development and growth of conifers from meristematic cells in the shoot apex to differentiated tissues at the shoot base. In this work, microarray analysis was used to monitor contrasting patterns of gene expression between the apex and the base of maritime pine shoots. A group of differentially expressed genes was selected and validated by examining their relative expression levels in different sections along the stem, from the top to the bottom. After validation of the microarray data, additional gene expression analyses were also performed in the shoots of young maritime pine trees exposed to different levels of ammonium nutrition. Our results show that the apex of maritime pine trees is extremely sensitive to conditions of ammonium excess or deficiency, as revealed by the observed changes in the expression of stress-responsive genes. This new knowledge may be used for the precocious detection of early symptoms of nitrogen nutritional stresses, thereby increasing survival and growth rates of young trees in managed forests.
Keywords Conifers · Pine development · Nitrogen · Ammonium nutrition · Transcriptional regulation
Introduction
Forests are essential components of the ecosystems, and they play a fundamental role in the regulation of terrestrial carbon sinks. Coniferous forests dominate large ecosystems in the Northern Hemisphere and include a broad variety of woody plant species, some ranking as the largest, tallest, and longest living organisms on Earth (Farjon 2010). Conifers are the most important group of gymnosperms and have evolved very efficient physiological adaptation systems after the separation from angiosperms, which occurred more than 300 million years ago. Conifer trees are also of great economic importance, as they are major sources for timber, oleoresin, and paper production.
Maritime pine (Pinus pinaster Aiton) stands are distributed in the southwestern area of the Mediterranean region. P. pinaster dominates the forest scenario in France, Spain and Portugal, where this is the most widely planted species in about 4 million hectares. The maritime pine is particularly tolerant to abiotic stresses showing relatively high-levels of intra-specific variability (Aranda et al. 2010). The maritime pine is also the most advanced conifer
Communicated by K. Klimaszewska.
Electronic supplementary material The online version of this article (doi:10.1007/s00468-011-0625-z) contains supplementary material, which is available to authorized users.
J. Canales · C. Ávila · F. R. Cantón · D. Pacheco-Villalobos · S. Díaz-Moreno · J. J. Molina-Rueda · M. G. Claros · F. M. Cánovas (✉)
Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Instituto Andaluz de Biotecnología, Campus Universitario de Teatinos, Universidad de Málaga,
Trees
DOI 10.1007/s00468-011-0625-z
30 s at 72°C). The fluorescence signal was captured at the end of each extension step and melting curve analysis was performed from 60 to 95°C. The PCR products were verified by melting point analysis at the end of each experiment, and, during protocol development, by gel electrophoresis.
The baseline calculation and starting concentration (N0) per sample of the amplification reactions were estimated directly from raw fluorescence data using the LinReg 11.3 program (Ruijter et al. 2009). The relative expression levels were obtained from the ratio between the N0 of the target gene and the normalisation factor. We used the geometric mean of three control genes (actin, 40S ribosomal protein and elongation factor 1 alpha) to calculate the normalisation factor (Vandesompele et al. 2002). Reference genes were selected based on their stable expression in the microarrays. Furthermore, these genes were stably expressed in all conditions and tissue portions examined as determined by statistical analysis using Normfinder (Andersen et al. 2004).
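The relative expression calculation described above (target N0 divided by the geometric mean of the reference-gene N0 values) can be sketched in a few lines of Python. The function names and the N0 values are made up for illustration; only the formula comes from the text.

```python
import math

def normalisation_factor(reference_n0):
    """Geometric mean of the starting concentrations (N0) of the reference genes."""
    values = list(reference_n0)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_expression(target_n0, reference_n0):
    """Ratio between the N0 of the target gene and the normalisation factor."""
    return target_n0 / normalisation_factor(reference_n0)

# Made-up N0 values for the three control genes named in the text.
references = {"actin": 2.0, "40S ribosomal protein": 4.0, "elongation factor 1 alpha": 8.0}
factor = normalisation_factor(references.values())
level = relative_expression(1.0, references.values())
```

The geometric mean is computed through logarithms so that very small N0 values do not underflow when multiplied together.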
Results and discussion
Differential gene expression between the apex and the base of maritime pine shoots
The differential gene expression was analysed in maritime pine stems using microarrays. Intact total RNA was extracted from the apex and the basal part of the stems, labelled with CyDye and hybridised to slides of PINARRAY, a maritime pine microarray constructed in our laboratory. Microarray data were lowess normalised to account for intensity-dependent differences between channels. After normalisation, the dye-swap replicates did not show strong deviations from linearity, proving a low dye bias. The comparisons between replicates showed a high degree of reproducibility, with Pearson's correlation coefficients of approximately 0.98. Similar transcriptomic analyses have been previously performed in Sitka spruce (Friedmann et al. 2007). Microarray analyses were also used for transcript profiling in differentiating xylem of loblolly pine and white spruce (Yang et al. 2004; Pavy et al. 2008).
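The quantities behind this kind of check can be sketched as follows: the log ratio M and the mean log intensity A of the two channels (the coordinates of an MA plot), and Pearson's correlation between the M values of two replicates. This is an illustrative Python sketch with made-up intensities, not the analysis pipeline used in the paper; the function names are hypothetical and no lowess fit is performed.

```python
import numpy as np

def ma_values(red, green):
    """Return M = log2(R/G) and A = mean log2 intensity for each spot."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    return m, a

def pearson(x, y):
    """Pearson's correlation coefficient between two vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Two replicate hybridisations of the same three spots (made-up intensities).
m1, a1 = ma_values(red=[8, 2, 16], green=[2, 2, 4])
m2, a2 = ma_values(red=[32, 4, 16], green=[4, 4, 4])
r = pearson(m1, m2)
```

For a dye-swap replicate the sign of M would be inverted before comparing it with the other replicates.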
Genes differentially expressed at the apical and the basal parts of the maritime pine stem were identified by bioinformatic analysis of hybridisation signals in the microarray, using a cut-off t test p value […] 0.05 and a fold change […]
[…] genes encoding photosynthetic proteins, including those located in the thylakoid membranes involved in the photosystems I and II, light-harvesting complexes, as well as soluble proteins of the plastid stroma such as the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco SSU; EC 4.1.1.39), were particularly abundant. This part of the stem contains the shoot apical meristem which drives stem growth and develops new needles requiring the biosynthesis of proteins for the photosynthetic machinery. Also abundant were transcripts for lipid transfer proteins (LPT), metallothionein-like proteins (MT) and stress proteins such as an antimicrobial peptide (AMP), a putative dehydrin and a late embryogenesis abundant protein. The expression of stress-related genes has also been reported in the apical shoot meristem of Sitka spruce where they may be involved in the protection of meristematic cells against mechanical wounding or insect attack (Ralph et al. 2006). Interestingly, a number of genes involved in lignin biosynthesis and cell wall formation were also upregulated in the apical part of the maritime pine stem. These included a putative cinnamoyl-CoA reductase (EC 1.2.1.44), a serine-hydroxymethyltransferase (EC 2.1.2.1), xyloglucan endotransglycosylases (EC 2.4.1.207), an endo-1,4-β-mannosidase (EC 3.2.1.78), a putative proline-rich arabinogalactan and a germin-like
Fig. 1 Graphical representation of the microarray data analysis.
ammonium excess. We have previously report[ed that] ammonium excess and deficiency trigger changes [in the] transcriptome of maritime pine roots (Canales […] 2010). The differential expression patterns of a […] of representative genes suggested the existe[nce of] potential links between ammonium-responsive ge[nes and] genes involved in amino acid metabolism, particu[larly] asparagine biosynthesis and utilisation (Canales […] 2010). The results reported here indicate that th[e meta]bolic changes observed in roots are transmitted […] stem apex. This fact implies the existence of a s[ystemic] signal that may represent a part of the respo[nse of] maritime pine seedlings to nutritional stress by [ammo]nium. The nature of this systemic signal is p[…] unknown; however, we can speculate that altered […] of organic nitrogen in the form of asparagine […] involved. High-levels of this amino acid accumu[late in] pine hypocotyls and a role of asparagine in nitro[gen] allocation has been proposed (Cañas et al. 2006). […] asparagine is a vehicle for nitrogen transport in […] and it is well known that there is a stress-[…] asparagine accumulation in response to minera[l defi]ciencies, drought or pathogen attack (Lea et al. […]
Fig. 5 Genes differentially expressed in maritime pine stems in
response to ammonium excess (E) or deficiency (D) identified by
microarray analysis. Log expression ratio values from each treatment
were represented as heatmaps
RESEARCH ARTICLE Open Access
Reprogramming of gene expression during compression wood formation in pine: Coordinated modulation of S-adenosylmethionine, lignin and lignan related genes
David P Villalobos1,2, Sara M Díaz-Moreno1,3, El-Sayed S Said1, Rafael A Cañas1, Daniel Osuna1,4, Sonia H E Van Kerckhoven1, Rocío Bautista1, Manuel Gonzalo Claros1, Francisco M Cánovas1 and Francisco R Cantón1*
Abstract
Background: Transcript profiling of differentiating secondary xylem has allowed us to draw a general picture of the
genes involved in wood formation. However, our knowledge is still limited about the regulatory mechanisms that
coordinate and modulate the different pathways providing substrates during xylogenesis. The development of
compression wood in conifers constitutes an exceptional model for these studies. Although differential expression
of a few genes in differentiating compression wood compared to normal or opposite wood has been reported, the
broad range of features that distinguish this reaction wood suggests that the expression of a larger set of genes
would be modified.
Villalobos et al. BMC Plant Biology 2012, 12:100
http://www.biomedcentral.com/1471-2229/12/100
using the Pine Gene Index database (Additional file 3). Sequences that matched with the same entry in the database were assumed to represent the same gene. Therefore, the final numbers of unigenes were reduced to 331 for Cx and 165 for Ox. Most of these genes showed significant similarities to sequences in databases (293 in Cx and 145 in Ox), although some of them were similar to sequences with unknown function (49 in Cx and 45 in Ox). The number of unigenes with no significant similarity was low in both cases (38 in Cx and 20 in Ox).
The genes with assigned function were grouped into functional categories using the Arabidopsis thaliana Munich Information Centre for Protein Sequences (MIPS) database, and suppression of redundancy in MIPS funcat assignations by decision according to their most probable role in xylem development (Additional file 3). In keeping with the greater number of genes identified as up-
Figure 3 Volcano plots of microarray analyses to identify genes
differentially expressed during compression and opposite
wood formation. The common logarithm of the p-value was
represented as a function of the binary logarithm of the
background-corrected and normalized opposite:compression
fluorescence ratio (log2 Fold Change) for each spot. Vertical bars
delimit the spots showing up-regulation in developing compression
xylem by at least 1.5-fold compared to developing opposite xylem
(Up-regulated in Cx) or spots showing up-regulation in developing
opposite xylem by at least 1.5-fold compared to developing
compression xylem (Up-regulated in Ox). The horizontal line delimits
the spots showing significant up-regulation under the criteria of an
adjusted p-value ≤ 0.001. Therefore, the upper left and right sectors
delimited by the horizontal and vertical lines include the spots (in
red) containing probes for genes significantly up-regulated in
developing compression or opposite xylem respectively. The
number of spots corresponding with genes significantly up-
regulated in Cx or Ox are shown in the top side of the sector. (a)
Results from the analysis of microarray 1 constructed with cDNA
clones from the composite library. (b) Results from the analysis of
microarray 2 constructed with cDNA clones from subtractive
libraries.
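The selection criteria in the caption of Figure 3 (at least 1.5-fold change and an adjusted p-value ≤ 0.001, with the ratio expressed as opposite:compression) can be written as a small Python sketch. The function and constant names are hypothetical and the input values are made up; this is an illustration of the thresholds, not the authors' analysis code.

```python
import math

FOLD_CHANGE = 1.5   # minimum fold change stated in the caption
MAX_ADJ_P = 0.001   # maximum adjusted p-value stated in the caption

def classify_spot(log2_fc, adj_p, fold_change=FOLD_CHANGE, max_adj_p=MAX_ADJ_P):
    """Classify one spot of the volcano plot.

    log2_fc is log2(opposite / compression), so negative values mean higher
    signal in compression xylem (Cx) and positive values in opposite xylem (Ox).
    """
    threshold = math.log2(fold_change)   # about 0.585 for 1.5-fold
    if adj_p > max_adj_p:
        return "not significant"
    if log2_fc <= -threshold:
        return "up-regulated in Cx"      # upper left sector
    if log2_fc >= threshold:
        return "up-regulated in Ox"      # upper right sector
    return "not significant"

# Made-up spots: (log2 fold change, adjusted p-value).
spots = [(-1.0, 0.0005), (1.0, 0.0005), (1.0, 0.01), (0.3, 0.0001)]
labels = [classify_spot(fc, p) for fc, p in spots]
```

A spot must pass both thresholds: the third spot fails on the p-value and the fourth on the fold change.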
Other types of microarrays
Percentage of use, in the different testing methods, of the available R packages and of the background correction,
normalization and transformation functions
                         Control type 1 (%)  Control type 2 (%)    Average
                          Dataset1  Dataset2  Dataset1  Dataset2       (%)
Package
  beadarray                   16.0      11.1      15.0      12.5      13.7
  lumi                        84.0      88.9      85.0      87.5      86.3
Normalization
  loess (lumi)                11.1      18.5      12.5      17.9      15.0
  median (beadarray)           3.7       0.0       2.5       0.0       1.6
  qspline (beadarray)          2.5       1.9       2.5       1.8       2.2
  quantile (lumi)             17.3      22.2      17.5      25.0      20.5
  quantile (beadarray)         3.7       1.9       3.8       3.6       3.2
  rankinvariant                9.9       0.0      10.0       0.0       5.0
  rsn (lumi)                  13.6      20.4      12.5      19.6      16.5
  rsn (beadarray)              2.5       1.9       2.5       0.0       1.7
  ssn (lumi)                  13.6       0.0      13.8       0.0       6.8
  vsn (lumi)                  18.5      27.8      18.8      26.8      23.0
  vsn (beadarray)              3.7       5.6       3.8       5.4       4.6
Transformation
  log2 (lumi)                 29.6      29.6      30.0      28.6      29.5
  log2 (beadarray)             6.2       1.9       6.3       1.8       4.0
  vst (lumi)                  27.2      25.9      27.5      25.0      26.4
  vst (beadarray)              4.9       7.4       5.0       7.1       6.1
  cubicroot                    9.9      20.4       8.8      19.6      14.7
  none                        22.2      14.8      22.5      17.9      19.3
Background correction
  bgAdjust (lumi)             22.2      24.1      22.5      23.2      23.3
  bgAdjust.Affy (lumi)        14.8      14.8      15.0      14.3      14.7
  forcePositive (lumi)        23.5      27.8      23.8      26.8      26.1
  none (lumi)                 23.5      22.2      23.8      23.2      23.1
  none (beadarray)            16.0      11.1      15.0      12.5      13.7
BeadArray (Illumina)
Agilent
Determining the best protocol
Preprocessing
Background noise correction
Data normalization
Mean of the replicated spots
Differential expression
Comparisons
Estimation of mean variability by eBayes
Filter by P and logFC
Target
Raw data
Experimental design
Differentially expressed genes
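One of the preprocessing steps listed above, averaging the replicated spots of each probe, can be sketched in Python as follows. The function name, the probe identifiers and the log ratios are hypothetical; the slide does not state how the averaging is implemented.

```python
from collections import defaultdict

def average_replicate_spots(probe_ids, log_ratios):
    """Average the values of spots that share the same probe identifier."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for probe_id, value in zip(probe_ids, log_ratios):
        sums[probe_id] += value
        counts[probe_id] += 1
    return {probe_id: sums[probe_id] / counts[probe_id] for probe_id in sums}

# Made-up normalized log ratios; geneA and geneB are each spotted twice.
probe_ids = ["geneA", "geneB", "geneA", "geneB", "geneC"]
log_ratios = [1.0, -0.5, 2.0, -1.5, 0.25]
averaged = average_replicate_spots(probe_ids, log_ratios)
```

The resulting per-probe values would then feed the differential expression stage (comparisons, eBayes moderation and the filter by P and logFC).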
COLLABORATION:
Fernando Cardona
Juan A. G. Ranea
Affymetrix microarrays
On Selecting the Best Pre-processing Method for Affymetrix Genechips
J.P. Florido1, H. Pomares1, I. Rojas1, J.C. Calvo1, J.M. Urquiza1, and M. Gonzalo Claros2
1 Department of Computer Architecture and Computer Technology, University of Granada, Granada, Spain
{jpflorido,hector}@ugr.es, {irojas,jccalvo,jurquiza}@atc.ugr.es
2 Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain
claros@uma.es
Abstract. Affymetrix High Oligonucleotide expression arrays, also known as
Affymetrix GeneChips, are widely used for the high-throughput assessment of
gene expression of thousands of genes simultaneously. Although disputed by
several authors, there are non-biological variations and systematic biases that
must be removed as much as possible before an absolute expression level for
every gene is assessed. Several pre-processing methods are available in the
literature and five common ones (RMA, GCRMA, MAS5, dChip and VSN) and
two customized Loess methods are benchmarked in terms of data variability,
similarity of data distributions and correlation coefficient among replicated
slides in a variety of real examples. In addition, it is checked how variant
and invariant genes influence pre-processing performance.
1 Introduction
Microarray technology is a powerful tool used for the high-throughput assessment of
gene expression of thousands of genes simultaneously which can be used to infer
metabolic pathways, to characterize protein-protein interactions or to extract target
genes for developing therapies for various diseases [1]. Several platforms are
currently available, including the commonly used high oligonucleotide-based
Affymetrix GeneChip® arrays.
As described in [1], an Affymetrix GeneChip contains probe sets of 10-20 probe
pairs representing unique genes. Each probe pair consists of two oligonucleotides of
25 bp in length, namely perfect match (PM) probes (the exact complement of an
mRNA) and the mismatch (MM) probes (which are identical to the perfect match
except that one base is changed at the center position). The MM probe is supposed to
distinguish noise caused by non-specific hybridization from the specific hybridization
signal, although some researchers recommend avoiding its use [17].
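The relationship between the two probes of a pair can be illustrated with a short Python sketch. The text only says that one base is changed at the centre position; replacing it with its complementary base is an assumption made here for the example, and the function name and the sequence are made up.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PROBE_LENGTH = 25  # probe length given in the text

def mismatch_probe(perfect_match):
    """Build an MM probe by changing the centre base of a 25-mer PM probe.

    Assumption: the centre base is replaced by its complement.
    """
    if len(perfect_match) != PROBE_LENGTH:
        raise ValueError("expected a 25-base probe")
    center = PROBE_LENGTH // 2  # index 12, i.e. the 13th base
    changed = COMPLEMENT[perfect_match[center]]
    return perfect_match[:center] + changed + perfect_match[center + 1:]

pm = "ACGTACGTACGTACGTACGTACGTA"  # made-up 25-base sequence
mm = mismatch_probe(pm)
differences = sum(1 for a, b in zip(pm, mm) if a != b)
```

Because the two probes differ at a single position, the MM signal is intended to approximate non-specific hybridization to the PM probe.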
A typical microarray experiment has biological and technical sources of variation
[2]. Biological variation results from tissue heterogeneity, genetic polymorphism, and
changes in mRNA levels within cells and among individuals due to sex, age, race,
genotype-environment interactions and other “living” factors. Biological variation is
of interest to researchers as it reflects true variation among experiments. On the other
Joan Cabestany Francisco Sandoval
Alberto Prieto Juan M. Corchado (Eds.)
Bio-Inspired Systems:
Computational and
Ambient Intelligence
10th International Work-Conference
on Artificial Neural Networks, IWANN 2009
Salamanca, Spain, June 10-12, 2009
Proceedings, Part I
Effect of Pre-processing methods on Microarray-based SVM classifiers in Affymetrix Genechips
J.P. Florido, H. Pomares, I. Rojas, J.M. Urquiza, L.J. Herrera, M.G. Claros
Abstract: Affymetrix High Oligonucleotide expression arrays are widely used for the high-throughput assessment of gene expression of thousands of genes simultaneously. Although disputed by several authors, there are non-biological variations and systematic biases that must be removed as much as possible through the pre-processing step before an absolute expression level for every gene is assessed. It is important to evaluate microarray pre-processing procedures not only to the detection of differentially expressed genes, but also to classification, since a major use of microarrays is the expression-based phenotype classification. Thus, in this paper, we use several cancer microarray datasets to assess the influence of five different pre-processing methods in Support Vector Machine-based classification methodologies with different kernels: linear, Radial Basis Functions (RBFs) and polynomial.
I. Introduction
Microarray technology is a powerful tool used for the high-throughput assessment of gene expression of thousands of genes simultaneously which can be used to infer metabolic pathways, to characterize protein-protein interactions or to extract target genes for developing therapies for various diseases [1]. Several platforms are currently available, including the commonly used high oligonucleotide-based Affymetrix GeneChip® arrays. As described in [1], an Affymetrix GeneChip contains probe sets of 10-20 probe pairs representing unique genes. Each probe pair consists of two oligonucleotides of 25 bp in length, namely perfect match (PM) probes (the exact complement of an mRNA) and the mismatch (MM) probes (which are identical to the perfect match except that one base is changed at the center position). The MM probe is supposed to distinguish noise caused by non-specific hybridization from the specific hybridization signal, although some researchers recommend avoiding its use [2]. A typical microarray experiment has biological and technical sources of variation [3]. Biological variation results from tissue heterogeneity, genetic polymorphism, and changes in mRNA levels within cells and among individuals
[…] quality of array data. Therefore, since those systematic non-biological sources of variation mask real biological variation, significant pre-processing is required and involves four steps for Affymetrix GeneChips: background correction, normalization, PM correction and summarization [4].
Assessment of the effectiveness of pre-processing has mainly been confined to the ability to detect differentially expressed genes [5] [6] or in terms of data variability, similarity in data distributions and correlation among replicates [7]. However, a major use of microarrays is phenotype classification via expression-based classifiers: given a collection of gene expression profiles for tissue samples belonging to various cancer types, the goal is to build a classifier to automatically determine the cancer type of a new sample at high precision. Classifying cancer tissues based on their gene expression profiles has the promise of providing more reliable means to diagnose and predict various types of cancers [8], but the accuracy of these predictions may depend on the pre-processing method selected.
Thus, in this work, several cancer microarray data sets are used to assess the effect of different pre-processing methods (RMA, GCRMA, VSN, dChip and MAS5) in high-order analytical tasks such as classification using Support Vector Machines (SVMs) with three different kernels: Linear, Radial Basis Functions (RBFs) and polynomial. SVMs are usually preferred in microarray-based classification because they outperform other paradigms, namely, k-Nearest Neighbors, backpropagation and probabilistic neural networks, weighted voting methods and decision trees [9], given two special aspects of microarray data: high dimensionality and small sample size. Kernel methods represent one way to cope with the curse of dimensionality [8].
Previous related work about the e↵ect of pre-processing
methods relative to classification has been focused on
cDNA microarrays using k-Nearest Neighbor classi-
fiers [10], [11], [12], Support Vector Machines [11], [12]
presenting unique genes. Each probe pair consists of two
oligonucleotides of 25 bp in length, namely perfect match
(PM) probes (the exact complement of an mRNA) and the
mismatch (MM) probes (which are identical to the perfect
match except that one base is changed at the center position).
The MM probe is supposed to distinguish noise caused by
non-specific hybridization from the specific hybridization
signal, although some researchers recommend avoiding its
use [2]. A typical microarray experiment has biological
and technical sources of variation [3]. Biological variation
results from tissue heterogeneity, genetic polymorphism, and
changes in mRNA levels within cells and among individuals
due to sex, age, race, genotype-environment interactions and
other ”living” factors. Biological variation is of interest to
researchers as it reflects true variation among experiments.
On the other hand, sample preparation, labeling, hybridiza-
tion and other steps of microarray experiment can contribute
to technical variation, which can significantly impact the
J.P.Florido, H.Pomares, I.Rojas, J.M.Urquiza and L.J.Herrera are with
the Department of Computer Architecture and Computer Technol-
ogy, CITIC-UGR, University of Granada, Spain (corresponding author:
jpflorido@ugr.es)
M.G.Claros is with the Department of Molecular Biology and Bioche-
mistry, University of Malaga, Spain
Radial Basis Functions (RBFs) and polynomial. SVMs are
usually preferred in microarray-based classification due to
its outperformance compared to other paradigms, namely, k-
Nearest Neighbors, backpropagation and probabilistic neural
networks, weighted voting methods and decision trees [9]
due to two special aspects of microarray data: high dimen-
sionality and small sample size. Kernel methods represent
one way to cope with the curse of dimensionality [8].
Previous related work about the e↵ect of pre-processing
methods relative to classification has been focused on
cDNA microarrays using k-Nearest Neighbor classi-
fiers [10], [11], [12], Support Vector Machines [11], [12]
and linear discriminant analysis, regular histogram, Gaussian
kernel, perceptron and multiple perceptron with majority
voting [12]. Instead, our study is related to A↵ymetrix
Genechips microarray technology.
Section II describes the main pre-processing methods
existing in the literature for A↵ymetrix Genechips, section
III introduces SVMs classifiers and section IV states experi-
mental results. Conclusions are drawn in section V.
II. Pre-processing Affymetrix Genechips
Instead of describing how every pre-processing method
(RMA, GCRMA, VSN, dChip and MAS5) works, they will
978-1-4244-8126-2/10/$26.00 ©2010 IEEE
VSN performs statistically better (P < 0.05) than the others.
So, these results suggest that RMA, VSN and dChip methods
are the preferred ones, which is consistent with the results
given in [7] and in terms of classification rate (Fig.1).
Fig. 4. Means and 95% LSD intervals of the different pre-processing
methods through the mean of Spearman Coefficient quality metric
From Figs.2 and 4 and focusing on the RMA and GCRMA
pre-processing methods, the influence of the background
correction step employed (Table I) can be observed. In this
case, there are statistical differences (P < 0.05) in terms of
data variability and Spearman correlation coefficient quality
metrics between RMA and GCRMA preprocessing methods.
These statistical differences were also present in terms of
misclassification rate (Fig.1).
Although this work studies the effect of pre-processing
methods in terms of classification rate, it would be also
interesting to study whether the number of genes selected
in the feature selection step and the kernel method used in
the SVM classifier affect the results.
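The gene-selection step mentioned here can be illustrated with a small sketch. It ranks genes by the absolute value of a two-sample t statistic and keeps the top k. The Welch form of the statistic, the function names and the toy data are assumptions made for illustration; the excerpt only says that genes were selected by t-test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic for two lists of expression values."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def top_genes(expr, labels, k):
    """Return the k gene names with the largest |t| between classes 0 and 1.

    expr:   dict mapping gene name -> list of expression values (one per sample)
    labels: list of 0/1 class labels, in the same order as the samples
    """
    scores = {}
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 3 genes, 3 samples per class.
expr = {
    "geneA": [1.0, 1.2, 0.9, 5.0, 5.1, 4.8],  # large class difference
    "geneB": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no class difference
    "geneC": [3.0, 3.5, 2.5, 4.0, 4.5, 3.5],  # moderate class difference
}
labels = [0, 0, 0, 1, 1, 1]
selected = top_genes(expr, labels, 2)
```

Varying k in such a ranking is one way to reproduce the kind of comparison described for the number of selected genes.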
From Fig.5, it can be observed that the accuracy of SVM
is affected by the number of genes selected by t-test. There
are no statistical differences (P > 0.05) when the number of
genes selected varies from 10 to 400. On the other hand,
when very few genes (5) are selected or the number is
large (600-2000 and the whole chip) SVM's performance
gets worse. In the first case, the data does not contain
enough discriminative information and, in the second case,
PROCEEDINGS Open Access
Gene expression pattern in swine neutrophils
after lipopolysaccharide exposure: a time course
comparison
Gema Sanz-Santos1, Ángeles Jiménez-Marín1, Rocío Bautista2, Noé Fernández2, Gonzalo M Claros2, Juan J Garrido1*
From International Symposium on Animal Genomics for Animal Health (AGAH 2010)
Paris, France. 31 May – 2 June 2010
Abstract
Background: Experimental exposure of swine neutrophils to bacterial lipopolysaccharide (LPS) represents a model
to study the innate immune response during bacterial infection. Neutrophils can effectively limit the infection by
secreting lipid mediators, antimicrobial molecules and a combination of reactive oxygen species (ROS) without new
synthesis of proteins. However, it is known that neutrophils can modify the gene expression after LPS exposure. We
performed microarray gene expression analysis in order to elucidate the less known transcriptional response of
neutrophils during infection.
Methods: Blood samples were collected from four healthy Iberian pigs and neutrophils were isolated and incubated
for 6, 9 and 18 hrs in the presence or absence of lipopolysaccharide (LPS) from Salmonella enterica serovar Typhimurium.
RNA was isolated and hybridized to the Affymetrix Porcine GeneChip®. Microarray data were normalized using Robust
Multi-array Average (RMA) and then differential expression was obtained by an analysis of variance (ANOVA).
Results: ANOVA data analysis showed that the number of differentially expressed genes (DEG) after LPS treatment varies
with time. The highest transcriptional response occurred at 9 hr post LPS stimulation with 1494 DEG, whereas 6 and
18 hr showed 125 and 108 DEG, respectively. Three different gene expression tendencies were observed: genes in
cluster 1 showed a tendency toward up-regulation; cluster 2 genes showed a tendency for down-regulation at 9 hr;
and cluster 3 genes were up-regulated at 9 hr post LPS stimulation. Ingenuity Pathway Analysis revealed a delay of
neutrophil apoptosis at 9 hr. Many genes controlling biological functions were altered with time including those
controlling metabolism and cell organization, ubiquitination, adhesion, movement or inflammatory response.
Conclusions: LPS stimulation alters the transcriptional pattern in neutrophils and the present results show the
robust transcriptional potential of neutrophils under infection conditions, indicating that active regulation of gene
Sanz-Santos et al. BMC Proceedings 2011, 5(Suppl 4):S11
http://www.biomedcentral.com/1753-6561/5/S4/S11
Finally, cluster 3 consists of 335 up-regulated genes.
Functions associated with these molecules are related
to cellular assembly and reorganization, cellular main-
tenance and gene expression. Canonical pathways are
related to protein ubiquitination signaling, PDGF sig-
naling and IL-3 signaling which is involved in cell sur-
vival by activation of JAK/STAT signaling and BCL2
[10]. Network 2 (Additional file 4) highlights NF-κB
interactions and covers several canonical pathways
such as acute phase response signaling and interferon
signaling.
Inhibition of spontaneous apoptosis at 9 hrs
Turnover of aging neutrophils occurs in the absence of
activation through a process known as spontaneous
Figure 2 Differentially expressed genes grouped into three different clusters. Cluster 1 contains 8 genes with up-regulation tendency
through the time course. Cluster 2 contains 747 genes with a down-regulation tendency at 9 hr. The opposite tendency can be observed in
cluster 3, where 335 genes show an up-regulation at 9 hr and down-regulation at 18 hr.
            UP     DOWN
6 hours     61     64
9 hours     388    1106
18 hours    50     58
Figure 3 Differentially expressed genes in each time point. 125
and 108 genes were altered at 6 and 18 hr respectively, with a
similar number of up and down-regulated genes. Most significant
transcriptional changes were observed at 9 hr post LPS stimulation.
1106 genes were down-regulated and 388 were up-regulated.
RESEARCH Open Access
Pyroptosis and adaptive immunity mechanisms
are promptly engendered in mesenteric
lymph-nodes during pig infections with
Salmonella enterica serovar Typhimurium
Rodrigo Prado Martins1, Carmen Aguilar1, James E Graham2, Ana Carvajal3, Rocío Bautista4, M Gonzalo Claros4
and Juan J Garrido1*
Abstract
In this study, we explored the transcriptional response and the morphological changes occurring in porcine
mesenteric lymph-nodes (MLN) along a time course of 1, 2 and 6 days post infection (dpi) with Salmonella
Typhimurium. Additionally, we analysed the expression of some Salmonella effectors in tissue to complete our view
VETERINARY RESEARCH
Martins et al. Veterinary Research 2013, 44:120
http://www.veterinaryresearch.org/content/44/1/120
node in the network diagram represented a gene and its
relationship with other molecules was represented by a
line (solid and dotted lines represent direct and indirect
association respectively). Nodes with a red background
were input genes detected in this study while grey
nodes were molecules inserted by IPA based upon the
Ingenuity Knowledge Base to produce a highly connected
network. The score estimated the probability that a
collection of genes equal to or greater than the number
in a network could be achieved by chance alone. Scores
of 3 or higher were considered to have a 99.9% confi-
dence of not being generated by random chance alone.
For statistical analysis of enriched functions/pathways, an
IPA Knowledge Base was used as a reference set and the
Fisher’s exact test was employed to estimate the signifi-
cance of association. P-values below 0.05 were considered
statistically significant. For graphical representation of
the canonical pathways, the ratio indicates the percentage
of genes taking part in a pathway that could be found in
an uploaded data set and –log(p-value) means the level
of confidence of association. The threshold line repre-
sented a p-value of 0.05.
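The statistics in this paragraph can be made concrete with a short sketch. It computes a one-sided Fisher's exact test for the overlap between a data set and a pathway, the ratio described for the canonical-pathway plots, and a score taken as −log10 of a p-value, which is consistent with the statement that a score of 3 corresponds to 99.9% confidence. The one-sided form of the test, the score definition and all function names are assumptions for illustration, not IPA's documented implementation.

```python
from math import comb, log10


def fisher_right_tail(k, n, K, N):
    """One-sided Fisher's exact test p-value, P(overlap >= k).

    k: genes from the data set found in the pathway
    n: genes in the uploaded data set
    K: genes in the pathway
    N: genes in the reference set
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def pathway_ratio(k, K):
    """Fraction of the pathway's genes that are present in the data set."""
    return k / K


def score_from_p(p):
    """Score read as -log10(p); an assumption consistent with score 3 ~ p = 0.001."""
    return -log10(p)


# Hypothetical numbers: all 5 pathway genes appear in a 5-gene data set
# drawn from a 10-gene reference set.
p_value = fisher_right_tail(5, 5, 5, 10)
```

With these hypothetical numbers, p_value would be compared against the 0.05 threshold line mentioned above.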
Relative gene expression analysis by qPCR
Real-time quantitative PCR (qPCR) assays were performed
as previously described [11]. Fold change values
were calculated by the 2^(−ΔΔCq) method [17] using
beta-actin as the reference gene. Afterwards, data were
standardized as proposed by Willems et al. [18] and analyzed
by Kruskal–Wallis and Mann–Whitney tests using the
software SPSS 15.0 for Windows (SPSS Inc, Chicago, IL,
USA). Fold changes of 1 denoted no change in gene
expression. Values lower and higher than 1 denoted
down and up-regulation respectively. To be represented
in Table 1, a fold change of down-regulated genes
was calculated as −1/2^(−ΔΔCq). Primer pairs used for
amplifications can be found as supporting information
(see Additional file 1).
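The fold-change arithmetic described above can be sketched as follows. The function and argument names are hypothetical, and the Cq values in the example are invented; only the 2^(−ΔΔCq) formula and the −1/2^(−ΔΔCq) representation for down-regulated genes come from the text.

```python
def fold_change(cq_target_treated, cq_ref_treated, cq_target_control, cq_ref_control):
    """Relative expression by the 2^(-ddCq) method.

    dCq  = Cq(target) - Cq(reference gene), computed per condition
    ddCq = dCq(treated) - dCq(control)
    """
    d_treated = cq_target_treated - cq_ref_treated
    d_control = cq_target_control - cq_ref_control
    return 2 ** -(d_treated - d_control)


def signed_fold_change(fc):
    """Report down-regulation (fc < 1) as -1/fc, as described for Table 1."""
    return fc if fc >= 1 else -1 / fc


# Invented Cq values: the target gene crosses threshold 2 cycles later
# after treatment, with an unchanged reference gene.
fc_down = fold_change(24, 18, 22, 18)
fc_up = fold_change(20, 18, 22, 18)
```

In this representation a value of 0.25 is reported as −4, so up- and down-regulation are shown on a symmetric scale.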
Western blot analysis
For protein extractions, MLN samples from all experi-
mental animals were separately homogenized on ice with
lysis buffer (7 M urea, 2 M thiourea, 4% w/v CHAPS,
0.5 mM PMSF) using a glass tissue-lyser and protein
lysate concentration was determined using a Bradford
Protein Assay (Bio-Rad). Subsequently, protein from in-
dividual replicates belonging to the same group was
pooled (30 µg total), electrophoretically fractionated in
12% (w/v) SDS-PAGE gels and transferred onto a PVDF
membrane (Millipore, Bedford, MA, USA). Western blot
assays were carried out as described by Martins et al.
[10] employing the following primary antibodies: 4B7/8
for swine histocompatibility class I antigen (SLAI) detec-
tion [19], 1 F12 for swine histocompatibility class II
antigen (SLAII) detection [19], anti-CTLA4 (Epitomics,
Burlingame, CA, USA) and anti-Clathrin light chain
(ab24579, Abcam, Cambridge, UK). To confirm equal
sample loading, membranes were reblotted with anti-GAPDH
monoclonal antibody (GenScript, Piscataway,
NJ, USA) and no statistical differences for GAPDH
abundance were observed between groups in all assays.
Membranes were scanned in an FLA-5100 imager
Table 1 Microarray data validation by qPCR.
              MICROARRAY                               qPCR
              Fold change                              Fold change
Gene          1 dpi   2 dpi   6 dpi   BF               1 dpi   2 dpi   6 dpi   p-value
CD180         1.7     2.6     1.5     0.0000429        1.1     1.8     1.2     0.010
CD1A          1.1     −1.4    1.2     0.00047793       −1.4    −2.5    1.2     0.013
DAB2          −1.2    −2.6    −1.2    6.62E-13         −3.1    −6.5    −2.6    0.001
EIF4H         −1.1    −1.1    −1.1    0.0000101        −1.5    −1.4    −1.8    0.021
ENPP6         1.3     2.0     −1.2    0.0000448        1.2     1.8     −1.7    0.000
F13A1         1.4     2.2     −1.1    0.00000227       1       1.7     −2.2    0.012
HLA-B (b)     1.0     −1.1    −1.2    0.00023747       −1.4    −1.4    −1.9    0.047
HLA-DRB5 (b)  1.0     −1.1    1.0     0.0000311        −1.4    −1.6    −2      0.036
HSPA1B (a)    3.3     1.4     −1.1    0.0001166        2.5     1.4     −1.3    0.025
HSPH1         2.3     1.7     −1.0    0.00000424       1.5     1.1     −2      0.003
IL16          −1.0    −1.2    −1.1    8.12E-07         1       −1.1    −1.5    0.035
LPCAT2        1.2     2.3     1.0     0.0000146        1.4     2       −1.3    0.010
PSMC2         −1.0    −1.0    −1.1    0.00105861       −1.1    −1.4    −1.8    0.036
TRAC          −1.0    −1.1    −1.1    0.00000951       −1.5    −1.8    −1.8    0.010
(a) Data from microarray analysis are mean values from two different probes.
(b) Amplified with SLA-B and SLA-DRB5 primers.
A miRNA Signature Predictive of Early Recurrence
Affymetrix miRNA microarray
6
A microRNA Signature Associated with Early Recurrence
in Breast Cancer
Luis G. Pérez-Rivas1., José M. Jerez2., Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4,
M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sánchez1, Nuria Ribelles1,
Emilio Alba1, José Lozano1,5*
1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga,
Spain, 2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain, 3 Plataforma Andaluza de Bioinformática, Universidad de
Málaga, Málaga, Spain, 4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain,
5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain, 6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de
Málaga, Málaga, Spain
Abstract
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern
after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years,
respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk
patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current
management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in
71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed
early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated
tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray
data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially
expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were
down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk
group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus
early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public
databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in
an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related
microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast
surgery.
Citation: Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS
ONE 9(3): e91884. doi:10.1371/journal.pone.0091884
Editor: Sonia Rocha, University of Dundee, United Kingdom
Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014
Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de
Economía, (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: jlozano@uma.es
. These authors contributed equally to this work.
Introduction
Breast cancer comprises a group of heterogeneous diseases that
can be classified based on both clinical and molecular features [1–
5]. Improvements in the early detection of primary tumors and the
development of novel targeted therapies, together with the
systematic use of adjuvant chemotherapy, have drastically reduced
mortality rates and increased disease-free survival (DFS) in breast
cancer. Still, about one third of patients undergoing breast tumor
excision will develop metastases, the major life-threatening event
which is strongly associated with poor outcome [6,7].
The risk of relapse after tumor resection is not constant over
time. A detailed examination of large series of long-term follow-up
years, respectively, followed by a nearly flat plateau in which the
risk of relapse tends to zero [8–10]. A causal link between tumor
surgery and the bimodal pattern of recurrence has been proposed
by some investigators (i.e. an iatrogenic effect) [11]. According to
that model, surgical removal of the primary breast tumor would
accelerate the growth of dormant metastatic foci by altering the
balance between circulating pro- and anti-angiogenic factors
[9,11–14]. This hypothesis is supported by the fact that the two
peaks of relapse are observed regardless of factors other than surgery,
such as the axillary nodal status, the type of surgery or the
administration of adjuvant therapy. Although estrogen receptor
(ER)-negative tumors are commonly associated with a higher risk
In order to select the statistically significant and differentially
expressed miRNAs from Fig. 1, paired and multiple comparisons
among the prognosis groups A, B and C were performed. Two
different approaches, the Bioconductor packages limma and RankProd, were
employed. Only those candidates with a fold change (FC) > 2
(either up- or down-regulated) and an adjusted p-value < 0.05 were
selected (Table 2). Thus, comparison of the logFC and p-values
obtained with both limma and RankProd libraries led to the
identification of miR-149, miR-20b, miR-30a-3p, miR-342-5p,
downregulation in basal-like tumors. They also showed an inverse
relationship between the mitotic index and both miR-30a-3p and
miR-342-5p [76].
Differential expression of all six miRNAs was also determined
by RT-qPCR in the three prognosis groups (Table 2). With the
exception of miR-625, which could not be validated, miR-149,
miR-20b, miR-10a, miR-30a-3p and miR-342-5p (the "5-miRNA
signature", from now on) were all confirmed to be down-regulated
in tumors from relapsing patients (groups B or C) when compared
Table 2. Most significant deregulated miRNAs in breast tumors from relapsing patients.
                                 limma F*             RankProd**            RT-qPCR***
Comparison#   miRNA              logFC    adj-pval    logFC    adj-pval     logFC    SE
B/A           hsa-miR-149        −1.410   0.0016      −1.615   <0.00001     −2.646   0.724
              hsa-miR-20b        −1.048   0.0071      −1.237   <0.00001     −1.542   0.521
              hsa-miR-30a-3p     −1.359   0.0078      −1.521   <0.00001     −1.001   0.514
              hsa-miR-625        −1.149   0.0014      −1.377   <0.00001     −0.347   0.282
              hsa-miR-10a        −1.235   0.0168      −1.547   <0.00001     −1.108   0.404
BC/A          hsa-miR-149        −1.120   0.0117      −1.329   <0.00001     −2.555   0.681
              hsa-miR-20b        −1.016   0.0076      −1.155   <0.00001     −1.470   0.536
              hsa-miR-30a-3p     −1.124   0.0256      −1.326   <0.00001     −0.994   0.458
              hsa-miR-625        −1.003   0.0049      −1.223   <0.00001     −0.266   0.237
B/AC          hsa-miR-149        −1.294   0.0052      −1.446   <0.00001     −2.340   0.698
              hsa-miR-10a        −1.397   0.0093      −1.647   <0.00001     −1.241   0.404
              hsa-miR-342-5p     −1.123   0.0159      −1.254   <0.00001     −1.194   0.627
# Group A = no recurrence, Group B = early recurrence (≤24 months after surgery), Group C = late recurrence (50–60 months after surgery).
*limma F, analysis of filtered data (sd > 70%) using limma.
**RankProd, analysis of unfiltered data using RankProduct algorithm.
***RT-qPCR, Relative miRNA expression was calculated using the ΔΔCt method. The standard error (SE) was calculated based on the theory of error propagation [107].
doi:10.1371/journal.pone.0091884.t002
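The selection rule behind Table 2 (fold change above 2 in either direction, adjusted p-value below 0.05, agreement between limma and RankProd) can be sketched as a simple filter. The sketch assumes that logFC is a base-2 logarithm, so FC > 2 corresponds to |logFC| > 1. The B/A values are copied from Table 2, the RankProd adjusted p-values are entered as their reported upper bound, and the last row is a made-up entry that fails the thresholds; the function name is hypothetical.

```python
# B/A rows from Table 2: (limma logFC, limma adj-pval, RankProd logFC, RankProd adj-pval).
# RankProd adj-pval is reported only as < 0.00001, so that bound is used here.
table_b_vs_a = {
    "hsa-miR-149": (-1.410, 0.0016, -1.615, 1e-5),
    "hsa-miR-20b": (-1.048, 0.0071, -1.237, 1e-5),
    "hsa-miR-30a-3p": (-1.359, 0.0078, -1.521, 1e-5),
    "hsa-miR-625": (-1.149, 0.0014, -1.377, 1e-5),
    "hsa-miR-10a": (-1.235, 0.0168, -1.547, 1e-5),
    "hypothetical-miR-X": (-0.800, 0.2000, -0.900, 0.3000),  # invented, not in Table 2
}


def selected_by_both(rows, min_abs_logfc=1.0, alpha=0.05):
    """Keep miRNAs passing the fold-change and adjusted p-value cut-offs in both methods."""
    keep = []
    for name, (limma_fc, limma_p, rp_fc, rp_p) in rows.items():
        limma_ok = abs(limma_fc) > min_abs_logfc and limma_p < alpha
        rankprod_ok = abs(rp_fc) > min_abs_logfc and rp_p < alpha
        if limma_ok and rankprod_ok:
            keep.append(name)
    return sorted(keep)


selected = selected_by_both(table_b_vs_a)
```

Note that miR-625 passes this microarray-level filter, in line with the text, which excludes it only at the RT-qPCR validation stage.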
PLOS ONE | www.plosone.org 6 March 2014 | Volume 9 | Issue 3 | e91884
[Pérez-Rivas et al., Figure 2, panels A and B. Graphic not reproduced. Recoverable labels: tumors labelled by prognosis group (A, B or C);
probes hsa-miR-10a_st, hsa-miR-149_st, hsa-miR-20b_st, hsa-miR-30a-star_st and hsa-miR-342-5p_st; log2FoldChange (axis from −3 to 0)
for miR-10a, miR-149, miR-20b, miR-30a-3p and miR-342-5p in the comparisons B vs A, BC vs A and B vs AC.]
COLLABORATION:
Emilio Alba
José M. Jerez
RNA-seq
7
SOFTWARE Open Access
SeqTrim: a high-throughput pipeline for
pre-processing any type of sequence read
Juan Falgueras1, Antonio J Lara2, Noé Fernández-Pozo3, Francisco R Cantón3, Guillermo Pérez-Trabado2,4,
M Gonzalo Claros2,3*
Abstract
Background: High-throughput automated sequencing has enabled an exponential growth rate of sequencing
data. This requires increasing sequence quality and reliability in order to avoid database contamination with
artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-
processing algorithms.
Results: SeqTrim has been implemented both as a Web and as a standalone command line application. Already-
published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality,
vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of
several input and output formats allows its inclusion in sequence processing workflows. Due to its specific
algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It
performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing
reads and does not lead to over-trimming.
Conclusions: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including
next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know
what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual
sequence if desired. The recommended pipeline reveals more information about each sequence than previously
described pre-processors and can discard more sequencing or experimental artefacts.
Background
Sequencing projects and Expressed Sequence Tags
(ESTs) are essential for gene discovery, mapping, func-
tional genomics and for future efforts in genome anno-
tations, which include identification of novel genes, gene
location, polymorphisms and even intron-exon bound-
aries. The availability of high-throughput automated
sequencing has enabled an exponential growth rate of
sequence data, although not always with the desired
quality. This exponential growth is enhanced by the so
called “next-generation sequencing”, and efforts have to
be made in order to increase the quality and reliability
of sequences incorporated into databases: up to 0.4% of
sequences in nucleotide databases contain contaminant
sequences [1,2]. The situation is even worse in the EST
databases, where the vector contamination rate reaches 1.63%
of sequences [3]. Hence, improved and user friendly
bioinformatic tools are required to produce more reli-
able high-throughput pre-processing methods.
Pre-processing includes filtering of low-quality
sequences, identification of specific features (such as
poly-A or poly-T tails, terminal transferase tails, and
adaptors), removal of contaminant sequences (from vec-
tor to any other artefacts) and trimming the undesired
segments. There are some bioinformatic tools that can
accomplish individual pre-processing aspects (e.g. Trim-
Seq, TrimEST, VectorStrip, VecScreen, ESTPrep [4],
crossmatch, Figaro [5]), and other programs that cope
with the complete pre-processing pipeline such as
PreGap4 [6] or the broadly used tools Lucy [7,8] and
SeqClean [9]. Most of these require installation, are dif-
ficult to configure, environment-specific, or focused on
specific needs (like a design only for ESTs), or require a
change in implementation and design of either the pro-
gram or the protocols within the laboratory itself.
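Two of the pre-processing operations listed above, trimming low-quality ends and removing poly-A tails, can be illustrated with a minimal sketch. This is not SeqTrim's algorithm: the thresholds (quality 20, poly-A run of 8, minimum length 4) and the function names are assumptions chosen only to show the idea.

```python
def trim_poly_a(seq, min_run=8):
    """Drop a 3' run of A's when it is at least min_run bases long."""
    stripped = seq.rstrip("Aa")
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_run else seq


def trim_low_quality_3prime(seq, quals, min_q=20):
    """Cut bases from the 3' end while their quality score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def preprocess(seq, quals, min_len=4):
    """Quality-trim, then remove a poly-A tail; return None if the read is too short."""
    seq, quals = trim_low_quality_3prime(seq, quals)
    seq = trim_poly_a(seq)
    return seq if len(seq) >= min_len else None


# Hypothetical read: 8 informative bases, a 10-base poly-A tail, 2 low-quality bases.
read = "ACGTACGT" + "A" * 10 + "GG"
qualities = [30] * 18 + [5, 5]
clean = preprocess(read, qualities)
```

A real pipeline would add vector, adaptor and contaminant screening on top of such steps, as the paragraph above describes.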
* Correspondence: claros@uma.es
2 Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071
Málaga, Spain
Falgueras et al. BMC Bioinformatics 2010, 11:38
http://www.biomedcentral.com/1471-2105/11/38
© 2010 Falgueras et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
DEgenes Hunter - A Self-customised Gene
Expression Analysis Workflow for Non-model
Organisms
Isabel González Gayte1, Rocío Bautista Moreno2, and M. Gonzalo Claros1,2
1 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga,
29071 Málaga, Spain
2 Plataforma Andaluza de Bioinformática, Centro de Supercomputación y
Bioinnovación, Universidad de Málaga,
29071 Málaga, Spain
Abstract. Data from high-throughput RNA sequencing require the
development of more sophisticated bioinformatics tools to perform optimal
gene expression analysis. Several R libraries are well considered for
differential expression analyses but according to recent comparative studies,
there is still an overall disagreement about which one is the most
appropriate for each experiment. The applicable R libraries mainly depend on
the presence or not of a reference genome and the number of replicates
per condition. Here we present DEgenes Hunter, an RNA-seq analysis
workflow for the detection of differentially expressed genes (DEGs) in
organisms without genomic reference. The first advantage of DEgenes
Hunter over other available solutions is that it is able to decide the most
suitable algorithms to be employed according to the number of biological
replicates provided in the sample. The different workflow branches allow
its automatic self-customisation depending on the input data, when used
by users without advanced statistical and programming skills. All
applicable libraries served to obtain their respective DEGs and, as another
advantage, genes marked as DEGs by all R packages employed are
considered 'common DEGs', showing the lowest false discovery rate compared
to the 'complete DEGs' group. A third advantage of DEgenes Hunter is
that it comes with an integrated quality control module to discard or
disregard low quality data before and after preprocessing. The 'common
DEGs' are finally submitted to a functional gene set enrichment analysis
(GSEA) and clustering. All results are provided as a PDF report.
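The grouping of DEGs described in the abstract can be sketched with set operations. 'Common DEGs' are the genes flagged by every package, as the abstract states. Treating 'complete DEGs' as the union of all calls and 'non-common DEGs' as the remainder is an assumption, and the gene identifiers and function name are hypothetical.

```python
def summarize_degs(calls):
    """Group DEG calls from several packages.

    calls: dict mapping package name -> set of gene IDs flagged as DEG
    Returns (common, complete, non_common).
    """
    gene_sets = list(calls.values())
    common = set.intersection(*gene_sets)   # flagged by every package
    complete = set.union(*gene_sets)        # flagged by at least one (assumed definition)
    return common, complete, complete - common


# Hypothetical calls from three of the packages named in the workflow.
calls = {
    "DESeq2": {"g1", "g2", "g3"},
    "edgeR": {"g2", "g3", "g4"},
    "limma": {"g2", "g3", "g5"},
}
common, complete, non_common = summarize_degs(calls)
```

Requiring agreement between packages trades sensitivity for a lower false discovery rate, which is the behaviour the abstract reports for the 'common DEGs' group.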
Keywords: RNA-seq, R, pipeline, workflow, differential expression,
bioinformatic tool, functional analysis.
1 Introduction
Nowadays, high-throughput technologies are well considered for genetic stud-
ies. For the analysis of gene expression profiles, data are obtained from RNA
sequencing (RNA-seq) experiments. RNA-seq provides precise measurements of
F. Ortuño and I. Rojas (Eds.): IWBBIO 2015, Part II, LNCS 9044, pp. 313–321, 2015.
© Springer International Publishing Switzerland 2015
http://www.scbi.uma.es/seqtrimnext
MiSeq @ CIMES
We are working to apply it to model
organisms: grapevine, sole and humans
We always confirm with several algorithms
8
DEgenes Hunter - A Self-customised Gene Expression Analysis Workflow 315
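[Fig. 1 flowchart, graphic not reproduced. Steps as extracted: Input (count data); data filtering;
two YES/NO decisions on the number of replicates (the comparison operators were lost in extraction);
differential expression packages per branch (DESeq2, edgeR, limma and NOISeq; DESeq2 and edgeR;
or DESeq2 alone); functional analysis (topGO); heatmap and clustering; output (PDF report).]
Fig. 1. DEgenes Hunter main workflow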
2 Methods
[Fig. 2, panel A, graphic not reproduced. Node labels as extracted: GO ID, term (truncated in the figure), p-value, counts]
GO:0003674  molecular_function          1.0000     225 / 41433
GO:0003824  catalytic activity          0.0012     128 / 19303
GO:0004347  glucose-6-phosphate ...     2.02e-11   7 / 22
GO:0004497  monooxygenase activi...     9.77e-11   15 / 294
GO:0005488  binding                     0.9677     127 / 25778
GO:0008289  lipid binding               8.45e-16   29 / 797
GO:0016491  oxidoreductase activ...     3.08e-19   50 / 2066
GO:0016853  isomerase activity          3.28e-05   11 / 440
GO:0016860  intramolecular oxido...     1.68e-08   8 / 82
GO:0016861  intramolecular oxido...     4.79e-10   8 / 53
GO:0046906  tetrapyrrole binding        6.07e-11   16 / 335
GO:0097159  organic cyclic compo...     0.9982     57 / 14111
GO:1901363  heterocyclic compoun...     0.9981     57 / 14093
[Fig. 2, panels B and C, graphic not reproduced. Axes: Z-score expression (−1.5 to 1.5) against samples C1, C2, C3, T1, T2, T3.]
Fig. 2. Example analyses that can be performed with DEgenes Hunter on the ‘common
DEGs’ group. A: A GSEA analysis performed with topGO, where rectangle colour
represents the relative significance, ranging from dark red (most significant) to bright
yellow (least significant). B: A typical heatmap that can also be used as a quality
control to verify that control samples (C1, C2 and C3) and treatment samples (T1, T2
and T3) are grouped together. C: Expression clustering performed using cluster where
the genes have similar expression levels among control samples, and a clearly higher
value in treatment samples.
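The "Z-score expression" axis of the heatmap and cluster panels can be illustrated with a small sketch. This is not DEgenes Hunter's own code: it only assumes the usual per-gene scaling (value minus the gene's mean, divided by the gene's sample standard deviation), and the gene names and expression values below are hypothetical.

```python
import statistics


def zscore_rows(expression):
    """Scale each gene across samples: (value - gene mean) / gene sample SD."""
    scaled = {}
    for gene, values in expression.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (n - 1)
        if sd == 0:
            # A gene with constant expression carries no pattern; report zeros.
            scaled[gene] = [0.0] * len(values)
        else:
            scaled[gene] = [(v - mean) / sd for v in values]
    return scaled


# Hypothetical normalised expression values for C1, C2, C3, T1, T2, T3.
samples = ["C1", "C2", "C3", "T1", "T2", "T3"]
expression = {
    "geneA": [10.0, 12.0, 11.0, 30.0, 32.0, 31.0],  # higher in treatment
    "geneB": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],        # constant
}
z = zscore_rows(expression)
for gene, row in z.items():
    print(gene, [round(v, 2) for v in row])
```

After this scaling every gene is centred on zero, so genes with very different absolute expression can share one colour scale, which is what makes the control/treatment grouping visible in a heatmap.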
3.2 Performance Testing
The utility of the ‘common DEGs’ group was confirmed by comparing FDR values.
Figure 3 shows that the FDR for ‘common DEGs’ is considerably lower than
for ‘complete DEGs’ and ‘non-common DEGs’ when any R package is used separately.
Since there is no clear way to set the threshold for qNOISeq [15], it is very high
in all cases.
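The three DEG groups compared above can be sketched with plain set operations. The definitions used here are assumptions, since this excerpt does not spell them out: ‘common DEGs’ is taken as the genes called by every package, ‘complete DEGs’ as the genes called by at least one package, and ‘non-common DEGs’ as the difference between the two. The gene identifiers are hypothetical.

```python
def deg_groups(calls):
    """Split DEG calls into groups.

    calls: dict mapping a package name to the set of gene IDs it reports as DEGs.
    Assumed definitions: common = called by all packages, complete = called by
    at least one package, non_common = complete minus common.
    """
    gene_sets = list(calls.values())
    if not gene_sets:
        return {"common": set(), "complete": set(), "non_common": set()}
    complete = set().union(*gene_sets)
    common = set.intersection(*gene_sets)
    return {"common": common, "complete": complete, "non_common": complete - common}


# Hypothetical DEG calls from the four packages named in the workflow.
calls = {
    "DESeq2": {"g1", "g2", "g3"},
    "edgeR": {"g2", "g3", "g4"},
    "limma": {"g2", "g3"},
    "NOISeq": {"g3", "g5"},
}
groups = deg_groups(calls)
print(sorted(groups["common"]))      # genes supported by every package
print(sorted(groups["non_common"]))  # genes supported by only some packages
```

Requiring agreement between packages trades sensitivity for confidence, which is consistent with the lower FDR reported for the ‘common DEGs’ group.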
100/0 50/50 0/100
Fig. 4. Venn diagrams showing the numbers of DEGs found in synthetic data when
different DEG ratios are used. 100/0 corresponds to all over-expressed/none repressed,
50/50 is the balanced ratio, and 0/100 corresponds to none over-expressed/all
repressed.
of a Pinus pinaster gene, one from photosynthetic tissue
and one from non-photosynthetic tissue (Table 1) were
analysed. Sequences were aligned with MultAlin using
identified a divergent region, and that the primers were
correctly designed and worked as predicted by the
software.
Figure 6 Use of AlignMiner for designing several specific primer pairs for PCR amplification of the different isoforms of the AtGS1
nucleotide sequence. (A) The 5’ and 3’ divergent regions obtained with Entropy that were selected for primer design including the
characteristic parameters of each region. (B) Results of the in silico “PCR amplification” with BioPHP [34] using the different primer pairs. Note that
the actual 3’ primers are complementary to the sequences shown on the right.
Guerrero et al. Algorithms for Molecular Biology 2010, 5:24
http://www.almob.org/content/5/1/24
Page 12 of 16
Which region is the most variable in an alignment?
9
SOFTWARE ARTICLE Open Access
AlignMiner: a Web-based tool for detection of
divergent regions in multiple sequence
alignments of conserved sequences
Darío Guerrero1, Rocío Bautista1, David P Villalobos2, Francisco R Cantón2, M Gonzalo Claros1,2*
Abstract
Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations,
genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus
on conserved segments or residues. Small divergent regions, however, are biologically important for specific
quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and
yet have received little attention. As a consequence, they must be selected empirically by the researcher.
AlignMiner has been developed to fill this gap in bioinformatic analyses.
Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of
conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained
using any of a variety of algorithms, which does not appear to have a significant impact on the final results.
AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method
that provides the highest number of regions with the greatest length, and Weighted being the most restrictive.
Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master
sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable
user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their
results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and
experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific
polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a
module that deploys several oligonucleotide parameters for designing primers “on the fly”.
Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide
different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its
usage will save researchers’ time and ensure an objective selection of the best-possible divergent region when
closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.
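To make the idea of an entropy-based divergence score concrete, the sketch below computes the Shannon entropy of each alignment column and picks the window with the highest summed entropy. It is an illustration only, not AlignMiner's implementation: the exact Entropy and Weighted formulas are not given in this abstract, the function names are hypothetical, gaps would simply be counted as one more symbol, and the toy alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def most_divergent_window(alignment, width):
    """Return (start, entropies): start of the window with the highest summed entropy.

    alignment: list of equal-length aligned sequences (assumption).
    """
    n_cols = len(alignment[0])
    entropies = [column_entropy([seq[i] for seq in alignment]) for i in range(n_cols)]
    starts = range(n_cols - width + 1)
    best_start = max(starts, key=lambda s: sum(entropies[s:s + width]))
    return best_start, entropies


# Invented toy alignment: columns 4 and 5 are the only variable ones.
alignment = [
    "ATGCCGTA",
    "ATGCAGTA",
    "ATGCTTTA",
    "ATGCGATA",
]
start, entropies = most_divergent_window(alignment, width=2)
print(start, [round(e, 2) for e in entropies])
```

A fully conserved column scores 0 bits and a column where all four sequences differ scores 2 bits, so the highest-scoring window marks the region where sequence-specific primers or epitopes are most likely to be found.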
Background
Since the early days of bioinformatics, the elucidation of
similarities between sequences has been an attainable
goal to bioinformaticians and other scientists. In fact,
multiple sequence alignments (MSAs) stand at a crossroad
between computation and biology and, as a result,
long-standing programs for DNA or protein MSAs are
nowadays widely used, offering high quality MSAs. In
recent years, by means of similarities between sequences
and due to the rapid accumulation of gene and genome
sequences, it has been possible to predict the function
and role of a number of genes, discern protein structure
and function [1], perform new phylogenetic tree reconstruction,
conduct genome evolution studies [2], and
design primers. Several scores for quantification of residue
conservation and even detection of non-strictly-conserved
residues have been developed that depend on the
composition of the surrounding residue sequence [3],
and new sequence aligners are able to integrate highly
heterogeneous information and a very large number of
sequences. Without exception, the sequence similarity of
* Correspondence: claros@uma.es
1Plataforma Andaluza de Bioinformática (Universidad de Málaga), Severo Ochoa, 34, 29590 Málaga, Spain
© 2010 Guerrero et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Table 2 Details of primers designed with AlignMiner to identify specifically by PCR the five A. thaliana GS1 genes as
well as the two primer pairs that identify the photosynthetic and non-photosynthetic isoforms of P. pinaster; note
that the 3’ (reverse) primer is complementary to the sequence appearing in Figures 6 and 8.
Isoform                      Primer                         Length  %GC  Tm (°C)  Amplicon size (bp)
GS1.1                        5’-GGTCTTTAGCAACCCTGA-3’       18      50   54.6     740
                             5’-ATCATCAAGGATTCCAGA-3’       18      39   48.7
GS1.2                        5’-GATCTTTGCTAACCCTGA-3’       18      44   51.3     739
                             5’-CTTTCAAGGGTTCCAGAG-3’       18      50   53.6
GS1.3                        5’-AATCTTCGATCATCCCAA-3’       18      39   50       739
                             5’-AAAGTCTAAAGCTTAGAG-3’       18      33   46
GS1.4                        5’-GATCTTCAGCCACCCCGA-3’       18      61   59.4     739
                             5’-AATGTGTCATCAACCGAG-3’       18      44   51.5
GS1.5                        5’-GATCTTTGAAGACCCTAG-3’       18      44   48.8     740
                             5’-TCTTTCATGGTTTCCAAA-3’       18      33   50.1
Photosynthetic isoform       5’-AGTGCGCATTAAGGACCCATCA-3’   22      50   61       177
                             5’-ACACACTGGCTTCCACAATAGG-3’   22      50   59.4
Non-photosynthetic isoform   5’-ACAGATGATCTAGGACATGC-3’     20      45   52       169
                             5’-CACTTATTTGCACTTGAAGG-3’     20      40   52.6
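The length and %GC columns of Table 2 can be recomputed from the primer sequences, and the note that the 3’ primer is complementary to the sequence shown in the figures corresponds to taking a reverse complement. The sketch below uses the GS1.1 pair from the table; the helper names are hypothetical, and Tm is deliberately not recomputed because the formula used by AlignMiner is not stated here.

```python
# Complement table for unambiguous DNA bases only (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def reverse_complement(seq):
    """Reverse complement, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


forward = "GGTCTTTAGCAACCCTGA"  # GS1.1 5' primer from Table 2
reverse = "ATCATCAAGGATTCCAGA"  # GS1.1 3' primer from Table 2

for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, len(primer), round(gc_percent(primer)))

# Strand of the 3' primer as it would read on the aligned (sense) sequence.
print(reverse_complement(reverse))
```

For the GS1.1 pair this reproduces the tabulated values (18 nt; 50 and 39 %GC), which is a quick consistency check when copying primers out of a flattened table.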
Figure 7 Correlation between the most divergent amino acid sequences and antigenicity of the AtGS1 protein MSA. (A) Similarity plot
obtained using the Entropy method; the most divergent regions are highlighted. (B) Aligned sequences for the two divergent regions
together (underlined in black) and their score in relation to other divergent regions. (C) Localisation of each divergent region in the alignment
where: (i) nucleotides in bold are the predicted epitopes for B-cells; (ii) an “e” denotes predicted solvent accessibility for this position; and (iii)
red-boxed amino acids correspond to the sequence of the matching divergent region. It is clearly seen that divergent sequences overlap with
the predicted epitopes and the solvent-accessible amino acids.
Primers able to distinguish alleles
Specific epitopes
http://www.scbi.uma.es/alignminer
Genome databases
10
Genetic and physical mapping of the QTLAR3 controlling
blight resistance in chickpea (Cicer arietinum L)
E. Madrid • P. Seoane • M. G. Claros •
F. Barro • J. Rubio • J. Gil • T. Millán
Received: 14 January 2014 / Accepted: 14 February 2014 / Published online: 26 February 2014
© Springer Science+Business Media Dordrecht 2014
Abstract Physical and genetic maps of a chickpea
QTL related to Ascochyta blight resistance and
located in LG2 (QTLAR3) have been constructed.
Single-copy markers based on candidate genes located
in the Ca2 pseudomolecule were obtained for the first
time and found to be useful for refining the QTL
position. The location of the QTLAR3 peak was linked
to an ethylene insensitive 3-like gene (Ein3). The Ein3
gene explained the highest percentage of the total
phenotypic variation for resistance to blight (44.3 %)
with a confidence interval of 16.3 cM. This genomic
region was predicted to be at the Ca2 physical position
32–33 Mb, comprising 42 genes. Candidate genes
located in this region include Ein3, Avr9/Cf9 and
Argonaute 4, directly involved in disease resistance
mechanisms. However, there are other genes outside
the confidence interval that may play a role in the
blight resistance pathway. The information reported in
this paper will facilitate the development of functional
markers to be used in the screening of germplasm
collections or breeding materials, improving the
efficiency and effectiveness of conventional breeding
methods.
Keywords Ascochyta blight · Candidate genes ·
Physical map · Molecular markers
Introduction
Chickpea (Cicer arietinum L.) is a self-pollinated
diploid (2n = 2x = 16) annual grain legume widely
grown in arid and semi-arid areas across the six
continents. Together with other pulse crops, such as
lentil (Lens culinaris Medik.), dry pea (Pisum sativum
L.) and dry bean (Phaseolus vulgaris L.), chickpea is a
major source of protein in human diets, particularly in
low-income countries. In addition, chickpea crops
play an important role in the maintenance of soil
fertility, particularly in dry, rain-fed areas (Berrada
et al. 2007).
One of the most important factors contributing to
instability in chickpea yields is Ascochyta blight,
Electronic supplementary material The online version of
this article (doi:10.1007/s10681-014-1084-6) contains supplementary
material, which is available to authorized users.
E. Madrid (✉) · F. Barro
Institute for Sustainable Agriculture, CSIC, Apdo 4084,
14080 Córdoba, Spain
e-mail: b62mahee@uco.es
P. Seoane · M. G. Claros
Departamento de Biología Molecular y Bioquímica, y
Plataforma Andaluza de Bioinformática, Universidad de
Málaga, 29071 Málaga, Spain
J. Rubio
Área de Mejora y Biotecnología, IFAPA Centro Alameda
del Obispo, Apdo 3092, 14080 Córdoba, Spain
J. Gil · T. Millán
Departamento de Genética, Universidad de Córdoba,
Campus Rabanales, Edif. C5, 14071 Córdoba, Spain
Euphytica (2014) 198:69–78
DOI 10.1007/s10681-014-1084-6
SNP
SNP
Transcriptome databases
11
De novo assembly of maritime pine transcriptome:
implications for forest breeding and biotechnology
Javier Canales1,†, Rocio Bautista2,†, Philippe Label3,†, Josefa Gomez-Maldonado1, Isabelle Lesur4,5,6, Noe Fernandez-Pozo2, Marina Rueda-Lopez1, Dario Guerrero-Fernandez2, Vanessa Castro-Rodríguez1, Hicham Benzekri2, Rafael A. Cañas1, María-Angeles Guevara7, Andreia Rodrigues8, Pedro Seoane2, Caroline Teyssier9, Alexandre Morel9, François Ehrenmann4,5, Gregoire Le Provost4,5, Celine Lalanne4,5, Celine Noirot10, Christophe Klopp10, Isabelle Reymond11, Angel García-Gutierrez1, Jean-François Trontin11, Marie-Anne Lelu-Walter9, Celia Miguel8, María Teresa Cervera7, Francisco R. Canton1, Christophe Plomion4,5, Luc Harvengt11, Concepcion Avila1,2, M. Gonzalo Claros1,2 and Francisco M. Canovas1,2,*
1Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Malaga, Malaga, Spain
2Plataforma Andaluza de Bioinformatica, Edificio de Bioinnovacion, Parque Tecnologico de Andalucía, Malaga, Spain
3INRA, Universite Blaise Pascal, Aubiere Cedex, France
4INRA, Cestas, France
5Universite de Bordeaux, Talence, France
6HelixVenture, Merignac, France
7Departamento de Ecología y Genetica Forestal, INIA-CIFOR, Madrid, Spain
8Forest Biotech Lab, IBET/ITQB, Oeiras, Portugal
9INRA, Unite Amelioration, Genetique et Physiologie Forestieres, Orleans Cedex 2, France
10INRA de Toulouse Midi-Pyrenees, Auzeville, Castanet Tolosan cedex, France
11FCBA, Pôle Biotechnologie et Sylviculture, Cestas, France
Received 20 July 2013; revised 24 September 2013; accepted 26 September 2013.
*Correspondence (Tel: +34 952131942; fax: +34 952132376; email: canovas@uma.es)
†These authors contributed equally to work.
Summary
Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in Southwestern
Europe and one of the most advanced models for conifer research. In the current work,
comprehensive characterization of the maritime pine transcriptome was performed using a
combination of two different next-generation sequencing platforms, 454 and Illumina.
De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in
maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome
assembly was validated by RT-PCR amplification of selected transcripts for structural and
regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated.
Furthermore, the available sequencing data permitted the identification of polymorphisms and
Plant Biotechnology Journal (2014) 12, pp. 286–299 doi: 10.1111/pbi.12136
http://www.scbi.uma.es/sustainpinedb/
RESEARCH ARTICLE Open Access
De novo assembly, characterization and functional
annotation of Senegalese sole (Solea senegalensis)
and common sole (Solea solea) transcriptomes:
integration in a database and design of a
microarray
Hicham Benzekri1,2, Paula Armesto3, Xavier Cousin4,5, Mireia Rovira6, Diego Crespo6, Manuel Alejandro Merlo7, David Mazurais8, Rocío Bautista2, Darío Guerrero-Fernández2, Noe Fernandez-Pozo1, Marian Ponce3, Carlos Infante9, Jose Luis Zambonino8, Sabine Nidelet10, Marta Gut11, Laureana Rebordinos7, Josep V Planas6, Marie-Laure Bégout4, M Gonzalo Claros1,2 and Manuel Manchado3*
Abstract
Background: Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and
evolutionarily important flatfish species both in fisheries and aquaculture. Although some genomic resources and
tools were recently described in these species, further sequencing efforts are required to establish a complete
transcriptome, and to identify new molecular markers. Moreover, the comparative analysis of transcriptomes will be
useful to understand flatfish evolution.
Results: A comprehensive characterization of the transcriptome for each species was carried out using a large set
of Illumina data (more than 1,800 million reads for each sole species) and 454 reads (more than 5 million reads
only in S. senegalensis), providing coverages ranging from 1,384x to 2,543x. After a de novo assembly, 45,063 and
38,402 different transcripts were obtained, comprising 18,738 and 22,683 full-length cDNAs in S. senegalensis and S.
solea, respectively. A reference transcriptome with the longest unique transcripts and putative non-redundant new
transcripts was established for each species. A subset of 11,953 reference transcripts was qualified as highly reliable
orthologs (97% identity) between both species. A small subset of putative species-specific, lineage-specific and
flatfish-specific transcripts were also identified. Furthermore, transcriptome data permitted the identification of single
nucleotide polymorphisms and simple-sequence repeats confirmed by FISH to be used in further genetic and expression
studies. Moreover, evidence for the retention of crystallins crybb1, crybb1-like and crybb3 in the two species of soles is
also presented. Transcriptome information was applied to the design of a microarray tool in S. senegalensis that was
successfully tested and validated by qPCR. Finally, transcriptomic data were hosted and structured at SoleaDB.
Conclusions: Transcriptomes and molecular markers identified in this study represent a valuable source for future
genomic studies in these economically important species. Orthology analysis provided new clues regarding sole
genome evolution indicating a divergent evolution of crystallins in flatfish. The design of a microarray and establishment
of a reference transcriptome will be useful for large-scale gene expression studies. Moreover, the integration of
Benzekri et al. BMC Genomics 2014, 15:952
http://www.biomedcentral.com/1471-2164/15/952
http://www.juntadeandalucia.es/agriculturaypesca/ifapa/soleadb_ifapa/
ReprOlive and new allergens
12
Unigen number | QSEQID | FLN_STATUS | FLN_HIT_DEFINITION | SACC | ALLERGOME CODE | SDEFINITION
1 | olive_transcript_000475 | Complete Sure | sp=5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase; Catharanthus roseus (Madagascar periwinkle) (Vinca rosea). | E3VW74 | - | Pollen allergen MetE (Fragment) OS=Amaranthus retroflexus PE=2 SV=1
2 | olive_transcript_000659 | Complete Sure | sp=Luminal-binding protein 5; Nicotiana tabacum (Common tobacco). | Q9FSY7 | 243; 3215 | Putative luminal binding protein OS=Corylus avellana GN=BiP PE=2 SV=1
3 | olive_transcript_002489 | Complete Putative | sp=Cysteine proteinase RD19a; Arabidopsis thaliana (Mouse-ear cress). | A5HIJ3 | 1 | Cysteine protease Cp3 OS=Actinidia deliciosa PE=2 SV=1
4 | olive_transcript_003129 | Complete Sure | sp=Malate dehydrogenase, mitochondrial; Fragaria ananassa (Strawberry). | P17783 | 6159 | Malate dehydrogenase, mitochondrial OS=Citrullus lanatus GN=MMDH PE=1 SV=1
5 | olive_transcript_003931 | Complete Sure | sp=L-ascorbate peroxidase 1, cytosolic; Arabidopsis thaliana (Mouse-ear cress). | Q42661 | 2423 | L-ascorbate peroxidase OS=Capsicum annuum PE=2 SV=1
6 | olive_transcript_005675 | C_terminal Putative | sp=Glyceraldehyde-3-phosphate dehydrogenase, cytosolic; Petroselinum crispum (Parsley) (Petroselinum hortense). | C7C4X1 | 9501; 9502 | Glyceraldehyde-3-phosphate dehydrogenase OS=Triticum aestivum GN=ga3pd PE=2 SV=1
7 | olive_transcript_007323 | Complete Putative | sp=Triosephosphate isomerase, cytosolic; Petunia hybrida (Petunia). | Q9FS79 | 920; 9498 | Triosephosphate isomerase OS=Triticum aestivum GN=tpis PE=2 SV=1
8 | olive_transcript_008377 | C_terminal Sure | sp=Glyceraldehyde-3-phosphate dehydrogenase, cytosolic; Antirrhinum majus (Garden snapdragon). | C7C4X1 | 9501; 9502 | Glyceraldehyde-3-phosphate dehydrogenase OS=Triticum aestivum GN=ga3pd PE=2 SV=1
9 | olive_transcript_008559 | Complete Sure | sp=Superoxide dismutase [Mn], mitochondrial; Nicotiana plumbaginifolia (Leadwort-leaved tobacco) (Tex-Mex tobacco). | Q9FSJ2 | 380; 383 | Superoxide dismutase (Fragment) OS=Hevea brasiliensis GN=sod PE=2 SV=1
10 | olive_transcript_008909 | - | - | B9T876 | - | Minor allergen Alt a, putative OS=Ricinus communis GN=RCOM_0066700 PE=3 SV=1
11 | olive_transcript_009735 | - | - | W9RZW9 | - | Minor allergen Alt a 7 OS=Morus notabilis GN=L484_009041 PE=3 SV=1
12 | olive_transcript_010769 * | Complete Sure | sp=Probable calcium-binding protein CML13; Arabidopsis thaliana (Mouse-ear cress). | Q2KM81 | 1070; 3105 | Polcalcin OS=Artemisia vulgaris PE=2 SV=1
13 | olive_transcript_018199 | C_terminal Putative | sp=Peptidyl-prolyl cis-trans isomerase 1; Glycine max (Soybean) (Glycine hispida). | Q8L5T1 | 134 | Peptidyl-prolyl cis-trans isomerase OS=Betula pendula GN=ppiase (CyP) PE=2 SV=1
14 | olive_transcript_027589 * | C_terminal Putative | sp=Profilin; Litchi chinensis (Lychee). | Q2PQ57 | 449 | Profilin OS=Litchi chinensis PE=2 SV=1
POLLEN TRANSCRIPTOME ALLERGOME – UNIPROT ALLERGENS
New, undescribed allergens
New profilins and variants of known allergens
http://reprolive.eez.csic.es/
Semantic searches
COLLABORATION:
José Aldana
AutoFlow: workflow automation
13
[Figure 4: execution time per workflow task; axis: Time (hours). Tasks shown:]
Total_time
Euler_assembling_k_25
Euler_assembling_k_29
MIRA3_assembling
Euler_remove_artifacts_k_25
Euler_remove_artifacts_k_29
validate_contigs_with_mapping_k_25
validate_contigs_with_mapping_k_29
rescue_unmapped_contigs_k_25
rescue_unmapped_contigs_k_29
recover_MIRA3_debris
MIRA3_remove_artifacts
CAP3_reconciliation_k_25
CAP3_reconciliation_k_29
FLN_analysis_of_CAP3_contigs_k_25
FLN_analysis_of_CAP3_contigs_k_29
TIDs
choose_best_assembly+cp_best_assembly
AutoFlow, a Versatile Workflow Engine Illustrated by Assembling an Optimised de novo Transcriptome for a Non-Model Species, such as Faba Bean (Vicia faba)
Running title: AutoFlow, a versatile workflow engine
Pedro Seoane1, Sara Ocaña2, Rosario Carmona3, Rocío Bautista3, Eva Madrid4, Ana M. Torres2, M. Gonzalo Claros1,3,*
Mi bioinformática para el IBIMA
Mi bioinformática para el IBIMA
Mi bioinformática para el IBIMA

Mi bioinformática para el IBIMA

  • 1. Massive analysis of expression, SNPs, CNVs and biomarkers. M. Gonzalo Claros. Rocío Bautista, Pedro Seoane, Hicham Benzekri, Isabel González Gayte, Rosario Carmona, Darío Guerrero-Fernández, Rafael Larrosa, Macarena Arroyo, Noé Fernández-Pozo, David Velasco
  • 3. Two-colour microarrays. BMC Bioinformatics (BioMed Central), Open Access, Software. PreP+07: improvements of a user friendly tool to preprocess and analyse microarray data. Victoria Martin-Requena1, Antonio Muñoz-Merida1, M Gonzalo Claros2 and Oswaldo Trelles*1. Address: 1Computer Architecture department, University of Málaga, Málaga, Spain and 2Molecular Biology and Biochemistry department, University of Málaga, Málaga, Spain. Email: Victoria Martin-Requena - vickymr@ac.uma.es; Antonio Muñoz-Merida - amunoz@uma.es; M Gonzalo Claros - claros@uma.es; Oswaldo Trelles* - ots@ac.uma.es. * Corresponding author.
Abstract. Background: Nowadays, microarray gene expression analysis is a widely used technology that scientists handle but whose final interpretation usually requires the participation of a specialist. The need for this participation is due to the requirement of some background in statistics that most users lack or have a very vague notion of. Moreover, programming skills could also be essential to analyse these data. An interactive, easy to use application seems therefore necessary to help researchers to extract full information from data and analyse them in a simple, powerful and confident way.
Results: PreP+07 is a standalone Windows XP application that presents a friendly interface for spot filtration, inter- and intra-slide normalization, duplicate resolution, dye-swapping, error removal and statistical analyses. Additionally, it contains two unique implementations of the procedures – double scan and Supervised Lowess –, a complete set of graphical representations – MA plot, RG plot, QQ plot, PP plot, PN plot – and can deal with many data formats, such as tabulated text, GenePix GPR and ArrayPRO. PreP+07 performance has been compared with the equivalent functions in Bioconductor using a tomato chip with 13056 spots. The number of differentially expressed genes considering p-values coming from the PreP+07 and Bioconductor Limma packages were statistically identical when the data set was only normalized; however, a slight variability was appreciated when the data was both normalized and scaled.
Conclusion: PreP+07 implementation provides a high degree of freedom in selecting and organizing a small set of widely used data processing protocols, and can handle many data formats. Its reliability has been proven so that a laboratory researcher can afford a statistical pre-processing of his/her microarray results and obtain a list of differentially expressed genes using PreP+07 without any programming skills. All of this gives support to scientists that have been using previous PreP releases since its first version in 2003.
Published: 12 January 2009. BMC Bioinformatics 2009, 10:16 doi:10.1186/1471-2105-10-16. Received: 29 August 2008. Accepted: 12 January 2009. This article is available from: http://www.biomedcentral.com/1471-2105/10/16. © 2009 Martin-Requena et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In conclusion, MADE4-2C can detect errors in signal intensity, washing, hybridisation, fluorophore labelling, the print tips and the quality of the printed probes. This helps to prevent the results from reflecting technical variation instead of biological variation. It also presents all the information in a dense but understandable report, so that the researcher can evaluate the experiment properly without advanced knowledge of microarrays.
9.2.3.
Discarding failed probes
Once the user has been informed about the quality of the original data to be analysed, MADE4-2C corrects the background noise with normexp ([184]) and generates the MA plots that show the data after background correction (figures 2.10 and 2.11, appendix B).
It then shows the probes that will be used in the experiment and those that will be discarded. A probe is always discarded when its spot is empty according to the GAL file, or when the probe contains an artefactual or poorly characterised sequence (information taken from the file BadSpots.txt). Two further reasons for rejection affect only some probes on a microarray and need not affect the other replicates: (1) the spot for the probe was not printed or is of low quality, as indicated by its specific weight derived from the flags and area fields; (2) background correction with normexp has marked the probe as discardable.
Tolerance to these failures is controlled by a parameter in the configuration file (see appendix D) that sets the number of failed replicates allowed for each probe in the experiment under analysis. The recommendation is to remove the probe from all microarrays as soon as one replicate fails for any of the reasons above, although in theory the analysis can be carried out as long as a probe has two or more replicates with valid intensity values. In the case of the experiments analysed on gene expression […]sis (figure 2.12, appendix B). This filter is expected to remove no more than 15 % of the probes [184], as shown in figure 2.12 of appendix B. If more than 15 % of the probes end up discarded, it is advisable to repeat the experiment, as shown in figure 9.4.
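The discard rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MADE4-2C code: the function names, the `max_failed` parameter (standing in for the configuration-file tolerance) and the input layout (one list of failure flags per probe, one flag per replicate) are all hypothetical. Only the 15 % threshold and the "two or more valid replicates" condition come from the text.

```python
from typing import Dict, List, Tuple

# Threshold quoted in the text: above this fraction of discarded probes,
# repeating the experiment is advisable.
MAX_DISCARDED_FRACTION = 0.15


def keep_probe(replicate_failed: List[bool], max_failed: int = 0) -> bool:
    """Decide whether a probe survives the filter (hypothetical helper).

    replicate_failed holds one flag per replicate (True = that replicate failed).
    max_failed is the assumed tolerance; 0 mirrors the recommendation of
    dropping the probe as soon as one replicate fails.
    """
    n_failed = sum(replicate_failed)
    n_valid = len(replicate_failed) - n_failed
    # The analysis needs at least two replicates with valid intensities.
    return n_failed <= max_failed and n_valid >= 2


def filter_probes(
    failures: Dict[str, List[bool]], max_failed: int = 0
) -> Tuple[List[str], float, bool]:
    """Return kept probe IDs, the discarded fraction and a 'repeat' flag."""
    kept = [probe for probe, flags in failures.items() if keep_probe(flags, max_failed)]
    discarded_fraction = 1 - len(kept) / len(failures) if failures else 0.0
    repeat_experiment = discarded_fraction > MAX_DISCARDED_FRACTION
    return kept, discarded_fraction, repeat_experiment
```

With the strict default (`max_failed=0`), a single failed replicate removes the probe everywhere; raising the tolerance keeps more probes at the cost of fewer valid replicates per probe.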
Figure 9.4: Example of a figure generated by MADE4-2C to indicate that too many printed probes have been discarded for the subsequent analysis.
9.2.4. Normalisation
Normalisation of the data takes the technical replicates into account to confirm that the expression values do not introduce more variability than was present before normalisation, and that neither fluorophore labelling adds any kind of bias to the data. Although many normalisation methods have been proposed, there is still no clear consensus that any one method is best across the different possible experimental conditions [45]. Because the normalisation method is one of the factors that most affects the subsequent detection of differentially expressed genes [187, 98, 45], and better results can be obtained by combining two methods [187], MADE4-2C normalises the data independently with several methods: Print-tip loess [207], Print-tip loess + scale and Print-tip loess + quantile [28], using the limma function normalizeBetweenArrays, and finally VSN [62] and VSN + Print-tip loess [45].
9.3. IDENTIFICATION OF A PROBLEMATIC SAMPLE
Figure 9.9: Negative correlation of the replicates detected in the experiments on shoots and leaves of pinsapo (Spanish fir).
[…] native to Sierra Bermeja (Málaga), which were hybridised with Pinarray1 and with a microarray of pine sequences obtained by suppression subtractive hybridisation, called SSH-Ma (section 8.1). The design of the experiment and the data obtained by hybridising with SSH-Ma are presented below, because this is where the behaviour was first observed. The replicates of the experiment are organised as follows:
Individual 1-South, hybridised on microarray 10a, labelling the mature wood sample with Cy3 and the juvenile wood sample with Cy5. The microarray was divided into two technical replicates, 10a-A and 10a-Z.
Individual 1-North, hybridised on microarray 22a, labelling the mature wood sample with Cy3 and the juvenile wood sample with Cy5. The microarray was divided into two technical replicates, 22a-A and 22a-Z. Individual 2-North, hybridised on microarray 23a with a dye swap relative to the previous hybridisations, labelling the mature wood sample with Cy5 and the juvenile wood sample with Cy3. The microarray was divided into two technical replicates, 23a-A and 23a-Z. Individual 3-South, hybridised on microarray 24a with a dye swap relative to the first two microarrays, labelling the mature wood […]

[Figure 9.…: distances and correlations between technical replicates. The caption and the accompanying paragraph, which refers to figure 9.10, are truncated.]

ORIGINAL PAPER

Gene expression profiling in the stem of young maritime pine trees: detection of ammonium stress-responsive genes in the apex

Javier Canales • Concepción Ávila • Francisco R. Cantón • David Pacheco-Villalobos • Sara Díaz-Moreno • David Ariza • Juan J. Molina-Rueda • Rafael M. Navarro-Cerrillo • M. Gonzalo Claros • Francisco M. Cánovas

Received: 25 May 2011 / Revised: 30 August 2011 / Accepted: 12 September 2011 © Springer-Verlag 2011

Abstract The shoots of young conifer trees represent an interesting model to study the development and growth of conifers from meristematic cells in the shoot apex to differentiated tissues at the shoot base. In this work, microarray analysis was used to monitor contrasting patterns of gene expression between the apex and the base of maritime pine shoots. A group of differentially expressed genes were selected and validated by examining their relative expression levels in different sections along the stem, from the top to the bottom.
After validation of the microarray data, additional gene expression analyses were also performed in the shoots of young maritime pine trees exposed to dif- ferent levels of ammonium nutrition. Our results show that the apex of maritime pine trees is extremely sensitive to conditions of ammonium excess or deficiency, as revealed by the observed changes in the expression of stress- responsive genes. This new knowledge may be used to precocious detection of early symptoms of nitrogen nutritional stresses, thereby increasing survival and growth rates of young trees in managed forests. Keywords Conifers Á Pine development Á Nitrogen Á Ammonium nutrition Á Transcriptional regulation Introduction Forests are essential components of the ecosystems, and they play a fundamental role in the regulation of terrestrial carbon sinks. Coniferous forests dominate large ecosys- tems in the Northern Hemisphere and include a broad variety of woody plant species, some ranking as the largest, tallest, and longest living organisms on Earth (Farjon 2010). Conifers are the most important group of gymno- sperms and have evolved very efficient physiological adaptation systems after the separation from angiosperms, which occurred more than 300 million years ago. Conifer trees are also of great economic importance, as they are major sources for timber, oleoresin, and paper production. Maritime pine (Pinus pinaster Aiton) stands are dis- tributed in the southwestern area of the Mediterranean region. P. pinaster dominates the forest scenario in France, Spain and Portugal, where this is the most widely planted species in about 4 million hectares. The maritime pine is particularly tolerant to abiotic stresses showing relatively high-levels of intra-specific variability (Aranda et al. 2010). The maritime pine is also the most advanced conifer Communicated by K. Klimaszewska. 
Electronic supplementary material The online version of this article (doi:10.1007/s00468-011-0625-z) contains supplementary material, which is available to authorized users. J. Canales Á C. A´ vila Á F. R. Canto´n Á D. Pacheco-Villalobos Á S. Dı´az-Moreno Á J. J. Molina-Rueda Á M. G. Claros Á F. M. Ca´novas (&) Departamento de Biologı´a Molecular y Bioquı´mica, Facultad de Ciencias, Instituto Andaluz de Biotecnologı´a, Campus Universitario de Teatinos, Universidad de Ma´laga, Trees DOI 10.1007/s00468-011-0625-z 30 s at 72°C). The fluorescence signal was captured at the end of each extension step and melting curve analysis was performed from 60 to 95°C. The PCR products were ver- ified by melting point analysis at the end of each experi- ment, and, during protocol development, by gel electrophoresis. The baseline calculation and starting concentration (N0) per sample of the amplification reactions were estimated directly from raw fluorescence data using the LinReg 11.3 program (Ruijter et al. 2009). The relative expression levels were obtained from the ratio between the N0 of the target gene and the normalisation factor. We used the geometric mean of three control genes (actin, 40S ribo- somal protein and elongation factor 1 alpha) to calculate the normalisation factor (Vandesompele et al. 2002). Ref- erence genes were selected based on their stable expression in the microarrays. Furthermore, these genes were stably expressed in all conditions and tissue portions examined as determined by statistical analysis using Normfinder (Andersen et al. 2004). Results and discussion Differential gene expression between the apex and the base of maritime pine shoots The differential gene expression was analysed in maritime pine stems using microarrays. Intact total RNA was extracted from the apex and the basal part of the stems, labelled with CyDye and hybridised to slides of PINAR- RAY, a maritime pine microarray constructed in our lab- oratory. 
Microarray data were lowess normalised to account for intensity-dependent differences between channels. After normalisation, the dye-swap replicates did not show strong deviations from linearity, proving a low dye bias. The comparisons between replicates showed a high degree of reproducibility, with Pearson’s correlation coefficients of approximately 0.98. Similar transcriptomic analyses have been previously performed in Sitka spruce (Friedmann et al. 2007). Microarray analyses were also used for transcript profiling in differentiating xylem of loblolly pine and white spruce (Yang et al. 2004; Pavy et al. 2008). Genes differentially expressed at the apical and the basal parts of the maritime pine stem were identified by bioin- formatic analysis of hybridisation signals in the microarray, using a cut-off t test p value 0.05 and a fold change genes encoding photosynthetic proteins, including those located in the thylakoid membranes involved in the photosystems I and II, light-harvesting complexes, as well as soluble proteins of the plastid stroma such as the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygen- ase (Rubisco SSU; EC 4.1.1.39), were particularly abun- dant. This part of the stem contains the shoot apical meristem which drives stem growth and develops new needles requiring the biosynthesis of proteins for the pho- tosynthetic machinery. Also abundant were transcripts for lipid transfer proteins (LPT), metallothionein-like proteins (MT) and stress proteins such as an antimicrobial peptide (AMP), a putative dehydrin and a late embryogenesis abundant protein. The expression of stress-related genes has also been reported in the apical shoot meristem of Sitka spruce where they may be involved in the protection of meristematic cells against mechanical wounding or insect attack (Ralph et al. 2006). 
Interestingly, a number of genes involved in lignin biosynthesis and cell wall formation were also upregulated in the apical part of the maritime pine stem. These included a putative cinnamoyl-CoA reductase (EC 1.2.1.44), a serine-hydroxymethyltransferase (EC 2.1.2.1), xyloglucan endotransglycosylases (EC 2.4.1.207), an endo-1,4-b-mannosidase (EC 3.2.1.78), a putative proline-rich arabinogalactan and a germin-like Fig. 1 Graphical representation of the microarray data analysis. Trees ammonium excess. We have previously report ammonium excess and deficiency trigger changes transcriptome of maritime pine roots (Canales 2010). The differential expression patterns of a of representative genes suggested the existe potential links between ammonium-responsive ge genes involved in amino acid metabolism, particu asparagine biosynthesis and utilisation (Canales 2010). The results reported here indicate that th bolic changes observed in roots are transmitted stem apex. This fact implies the existence of a s signal that may represent a part of the respo maritime pine seedlings to nutritional stress by nium. The nature of this systemic signal is p unknown; however, we can speculate that altered of organic nitrogen in the form of asparagine involved. High-levels of this amino acid accumu pine hypocotyls and a role of asparagine in nitro allocation has been proposed (Can˜as et al. 2006). asparagine is a vehicle for nitrogen transport in and it is well known that there is a stress- asparagine accumulation in response to minera ciencies, drought or pathogen attack (Lea et al. Fig. 5 Genes differentially expressed in maritime pine stems in response to ammonium excess (E) or deficiency (D) identified by microarray analysis. 
Log expression ratio values from each treatment were represented as heatmaps 12 RESEARCH ARTICLE Open Access Reprogramming of gene expression during compression wood formation in pine: Coordinated modulation of S-adenosylmethionine, lignin and lignan related genes David P Villalobos1,2 , Sara M Díaz-Moreno1,3 , El-Sayed S Said1 , Rafael A Cañas1 , Daniel Osuna1,4 , Sonia H E Van Kerckhoven1 , Rocío Bautista1 , Manuel Gonzalo Claros1 , Francisco M Cánovas1 and Francisco R Cantón1* Abstract Background: Transcript profiling of differentiating secondary xylem has allowed us to draw a general picture of the genes involved in wood formation. However, our knowledge is still limited about the regulatory mechanisms that coordinate and modulate the different pathways providing substrates during xylogenesis. The development of compression wood in conifers constitutes an exceptional model for these studies. Although differential expression of a few genes in differentiating compression wood compared to normal or opposite wood has been reported, the broad range of features that distinguish this reaction wood suggest that the expression of a larger set of genes would be modified. Villalobos et al. BMC Plant Biology 2012, 12:100 http://www.biomedcentral.com/1471-2229/12/100 using the Pine Gene Index database (Additional file 3). Sequences that matched with the same entry in the data- base were assumed to represent the same gene. There- fore, the final numbers of unigenes were reduced to 331 for Cx and 165 for Ox. Most of these genes showed sig- nificant similarities to sequences in databases (293 in Cx and 145 in Ox), although some of them were similar to sequences with unknown function (49 in Cx and 45 in Ox). The number of unigenes with no significant simi- larity was low in both cases (38 in Cx and 20 in Ox). 
The genes with assigned function were grouped into functional categories using the Arabidopsis thaliana Mun- ich Information Centre for Protein Sequences (MIPS) database, and suppression of redundancy in MIPS funcat assignations by decision according to their most probable role in xylem development (Additional file 3). In keeping with the greater number of genes identified as up- Figure 3 Volcano plots of microarray analyses to identify genes differentially expressed during compression and opposite wood formation. The common logarithm of the p-value was represented as a function of the binary logarithm of the background-corrected and normalized opposite:compression fluorescence ratio (log2 Fold Change) for each spot. Vertical bars delimit the spots showing up-regulation in developing compression xylem by at least 1.5-fold compared to developing opposite xylem (Up-regulated in Cx) or spots showing up-regulation in developing opposite xylem by at least 1.5-fold compared to developing compression xylem (Up-regulated in Ox). The horizontal line delimits the spots showing significant up-regulation under the criteria of an adjusted p-value ≤ 0.001. Therefore, the upper left and right sectors delimited by the horizontal and vertical lines include the spots (in red) containing probes for genes significantly up-regulated in developing compression or opposite xylem respectively. The number of spots corresponding with genes significantly up- regulated in Cx or Ox are shown in the top side of the sector. (a) Results from the analysis of microarray 1 constructed with cDNA clones from the composite library. (b) Results from the analysis of microarray 2 constructed with cDNA clones from subtractive libraries. Villalobos et al. BMC Plant Biology 2012, 12:100 Page 5 of 17 http://www.biomedcentral.com/1471-2229/12/100
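The thresholds in the Figure 3 caption (at least 1.5-fold change, adjusted p-value ≤ 0.001, x-axis given as the log2 of the opposite:compression ratio) can be expressed as a small classifier. The following Python sketch is illustrative only: the function names are hypothetical, and plotting the p-value as −log10 is an assumption, since the caption only says that the common logarithm was used.

```python
import math

FOLD_THRESHOLD = 1.5   # minimum fold change, from the figure caption
P_THRESHOLD = 0.001    # maximum adjusted p-value, from the figure caption


def classify_spot(log2_fc: float, adj_p: float) -> str:
    """Assign a spot to a sector of the volcano plot.

    log2_fc is log2(opposite / compression), so negative values mean higher
    signal in compression xylem (Cx) and positive values in opposite xylem (Ox).
    """
    limit = math.log2(FOLD_THRESHOLD)  # about 0.585
    if adj_p > P_THRESHOLD:
        return "not significant"
    if log2_fc <= -limit:
        return "up in Cx"
    if log2_fc >= limit:
        return "up in Ox"
    return "not significant"


def volcano_coords(log2_fc: float, adj_p: float) -> tuple:
    """Return (x, y) for plotting; y uses -log10, which is an assumption."""
    return (log2_fc, -math.log10(adj_p))
```

For instance, a spot with a log2 ratio of −1.0 and an adjusted p-value of 0.0005 falls in the upper-left sector (up in Cx), whereas a spot with a log2 ratio of 0.3 is excluded however small its p-value.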
  • 4. Other types of microarrays 4

Percentage of use in different testing methods of the different R package, background correction, normalization and transformation functions available

Function | Control type 1, Dataset1 (%) | Control type 1, Dataset2 (%) | Control type 2, Dataset1 (%) | Control type 2, Dataset2 (%) | Average (%)
Package
beadarray | 16.0 | 11.1 | 15.0 | 12.5 | 13.7
lumi | 84.0 | 88.9 | 85.0 | 87.5 | 86.3
Normalization
loess (lumi) | 11.1 | 18.5 | 12.5 | 17.9 | 15.0
median (beadarray) | 3.7 | 0.0 | 2.5 | 0.0 | 1.6
qspline (beadarray) | 2.5 | 1.9 | 2.5 | 1.8 | 2.2
quantile (lumi) | 17.3 | 22.2 | 17.5 | 25.0 | 20.5
quantile (beadarray) | 3.7 | 1.9 | 3.8 | 3.6 | 3.2
rankinvariant | 9.9 | 0.0 | 10.0 | 0.0 | 5.0
rsn (lumi) | 13.6 | 20.4 | 12.5 | 19.6 | 16.5
rsn (beadarray) | 2.5 | 1.9 | 2.5 | 0.0 | 1.7
ssn (lumi) | 13.6 | 0.0 | 13.8 | 0.0 | 6.8
vsn (lumi) | 18.5 | 27.8 | 18.8 | 26.8 | 23.0
vsn (beadarray) | 3.7 | 5.6 | 3.8 | 5.4 | 4.6
Transformation
log2 (lumi) | 29.6 | 29.6 | 30.0 | 28.6 | 29.5
log2 (beadarray) | 6.2 | 1.9 | 6.3 | 1.8 | 4.0
vst (lumi) | 27.2 | 25.9 | 27.5 | 25.0 | 26.4
vst (beadarray) | 4.9 | 7.4 | 5.0 | 7.1 | 6.1
cubicroot | 9.9 | 20.4 | 8.8 | 19.6 | 14.7
none | 22.2 | 14.8 | 22.5 | 17.9 | 19.3
Background correction
bgAdjust (lumi) | 22.2 | 24.1 | 22.5 | 23.2 | 23.3
bgAdjust.Affy (lumi) | 14.8 | 14.8 | 15.0 | 14.3 | 14.7
forcePositive (lumi) | 23.5 | 27.8 | 23.8 | 26.8 | 26.1
none (lumi) | 23.5 | 22.2 | 23.8 | 23.2 | 23.1
none (beadarray) | 16.0 | 11.1 | 15.0 | 12.5 | 13.7

BeadArray (Illumina) · Agilent · Determining the best protocol

Workflow diagram: Raw data and Target (experimental design) → Pre-processing (background noise correction; data normalisation; mean of replicated spots) → Differential expression (comparisons; estimation of mean variability by eBayes; filter by P and logFC) → Differentially expressed genes

COLLABORATION: Fernando Cardona
 Juan A. G. Ranea
  • 5. Micromatrices de Affymetrix 5 On Selecting the Best Pre-processing Method for Affymetrix Genechips J.P. Florido1 , H. Pomares1 , I. Rojas1 , J.C. Calvo1 , J.M. Urquiza1 , and M. Gonzalo Claros2 1 Department of Computer Architecture and Computer Technology, University of Granada, Granada, Spain {jpflorido,hector}@ugr.es, {irojas,jccalvo,jurquiza}@atc.ugr.es 2 Department of Molecular Biology and Biochemistry, University of Málaga, Málaga, Spain claros@uma.es Abstract. Affymetrix High Oligonucleotide expression arrays, also known as Affymetrix GeneChips, are widely used for the high-throughput assessment of gene expression of thousands of genes simultaneously. Although disputed by several authors, there are non-biological variations and systematic biases that must be removed as much as possible before an absolute expression level for every gene is assessed. Several pre-processing methods are available in the literature and five common ones (RMA, GCRMA, MAS5, dChip and VSN) and two customized Loess methods are benchmarked in terms of data variability, similarity of data distributions and correlation coefficient among replicated slides in a variety of real examples. Besides, it will be checked how the variant and invariant genes can influence on preprocessing performance. 1 Introduction Microarray technology is a powerful tool used for the high-throughput assessment of gene expression of thousands of genes simultaneously which can be used to infer metabolic pathways, to characterize protein-protein interactions or to extract target genes for developing therapies for various diseases [1]. Several platforms are currently available, including the commonly used high oligonucleotide-based Affymetrix GeneChip® arrays. As described in [1], an Affymetrix GeneChip contains probe sets of 10-20 probe pairs representing unique genes. 
Each probe pair consists of two oligonucleotides of 25 bp in length, namely perfect match (PM) probes (the exact complement of an mRNA) and the mismatch (MM) probes (which are identical to the perfect match except that one base is changed at the center position). The MM probe is supposed to distinguish noise caused by non-specific hybridization from the specific hybridization signal, although some researchers recommend avoiding its use [17]. A typical microarray experiment has biological and technical sources of variation [2]. Biological variation results from tissue heterogeneity, genetic polymorphism, and changes in mRNA levels within cells and among individuals due to sex, age, race, genotype-environment interactions and other “living” factors. Biological variation is of interest to researchers as it reflects true variation among experiments. On the other Joan Cabestany Francisco Sandoval Alberto Prieto Juan M. Corchado (Eds.) Bio-Inspired Systems: Computational and Ambient Intelligence 10th International Work-Conference on Artificial Neural Networks, IWANN 2009 Salamanca, Spain, June 10-12, 2009 Proceedings, Part I 1 3 E↵ect of Pre-processing methods on Microarray-based SVM classifiers in A↵ymetrix Genechips J.P.Florido, H.Pomares, I.Rojas, J.M.Urquiza, L.J.Herrera, M.G.Claros Abstract— A↵ymetrix High Oligonucleotide expression arrays are widely used for the high-throughput assessment of gene expression of thousands of genes simultaneously. Although disputed by several authors, there are non-biological variations and systematic biases that must be removed as much as possible through the pre-processing step before an absolute expression level for every gene is assessed. It is important to evaluate microarray pre-processing procedures not only to the detection of di↵erentially expressed genes, but also to classification, since a major use of microarrays is the expression-based phenotype classification. 
Thus, in this paper, we use several cancer microarray datasets to assess the influence of five different pre-processing methods in Support Vector Machine-based classification methodologies with different kernels: linear, Radial Basis Functions (RBFs) and polynomial.

I. Introduction

Microarray technology is a powerful tool used for the high-throughput assessment of gene expression of thousands of genes simultaneously which can be used to infer metabolic pathways, to characterize protein-protein interactions or to extract target genes for developing therapies for various diseases [1]. Several platforms are currently available, including the commonly used high oligonucleotide-based Affymetrix GeneChip® arrays. As described in [1], an Affymetrix GeneChip contains probe sets of 10-20 probe pairs representing unique genes. Each probe pair consists of two oligonucleotides of 25 bp in length, namely perfect match (PM) probes (the exact complement of an mRNA) and the mismatch (MM) probes (which are identical to the perfect match except that one base is changed at the center position). The MM probe is supposed to distinguish noise caused by non-specific hybridization from the specific hybridization signal, although some researchers recommend avoiding its use [2].

A typical microarray experiment has biological and technical sources of variation [3]. Biological variation results from tissue heterogeneity, genetic polymorphism, and changes in mRNA levels within cells and among individuals due to sex, age, race, genotype-environment interactions and other "living" factors. Biological variation is of interest to researchers as it reflects true variation among experiments. On the other hand, sample preparation, labeling, hybridization and other steps of microarray experiment can contribute to technical variation, which can significantly impact the quality of array data. Therefore, since those systematic non-biological sources of variation mask real biological variation, significant pre-processing is required and involves four steps for Affymetrix GeneChips: background correction, normalization, PM correction and summarization [4].

Assessment of the effectiveness of pre-processing has mainly been confined to the ability to detect differentially expressed genes [5] [6] or in terms of data variability, similarity in data distributions and correlation among replicates [7]. However, a major use of microarrays is phenotype classification via expression-based classifiers: given a collection of gene expression profiles for tissue samples belonging to various cancer types, the goal is to build a classifier to automatically determine the cancer type of a new sample at high precision. Classifying cancer tissues based on their gene expression profiles has the promise of providing more reliable means to diagnose and predict various types of cancers [8], but the accuracy of these predictions may depend on the pre-processing method selected.

Thus, in this work, several cancer microarray data sets are used to assess the effect of different pre-processing methods (RMA, GCRMA, VSN, dChip and MAS5) in high-order analytical tasks such as classification using Support Vector Machines (SVMs) with three different kernels: Linear, Radial Basis Functions (RBFs) and polynomial. SVMs are usually preferred in microarray-based classification due to its outperformance compared to other paradigms, namely, k-Nearest Neighbors, backpropagation and probabilistic neural networks, weighted voting methods and decision trees [9] due to two special aspects of microarray data: high dimensionality and small sample size. Kernel methods represent one way to cope with the curse of dimensionality [8].

Previous related work about the effect of pre-processing methods relative to classification has been focused on cDNA microarrays using k-Nearest Neighbor classifiers [10], [11], [12], Support Vector Machines [11], [12] and linear discriminant analysis, regular histogram, Gaussian kernel, perceptron and multiple perceptron with majority voting [12]. Instead, our study is related to Affymetrix Genechips microarray technology.

Section II describes the main pre-processing methods existing in the literature for Affymetrix Genechips, section III introduces SVMs classifiers and section IV states experimental results. Conclusions are drawn in section V.

J.P.Florido, H.Pomares, I.Rojas, J.M.Urquiza and L.J.Herrera are with the Department of Computer Architecture and Computer Technology, CITIC-UGR, University of Granada, Spain (corresponding author: jpflorido@ugr.es). M.G.Claros is with the Department of Molecular Biology and Biochemistry, University of Malaga, Spain.

978-1-4244-8126-2/10/$26.00 ©2010 IEEE

II. Pre-processing Affymetrix Genechips

Instead of describing how every pre-processing method (RMA, GCRMA, VSN, dChip and MAS5) works, they will […]

[…] VSN performs statistically better (P < 0.05) than the others. So, these results suggest that RMA, VSN and dChip methods are the preferred ones, which is consistent with the results given in [7] and in terms of classification rate (Fig.1).

Fig. 4. Means and 95% LSD intervals of the different pre-processing methods through the mean of Spearman Coefficient quality metric

From Figs.2 and 4 and focusing on the RMA and GCRMA pre-processing methods, it can be observed the influence of the background correction step employed (Table I). In this case, there are statistical differences (P < 0.05) in terms of data variability and Spearman correlation coefficient quality metrics between RMA and GCRMA preprocessing methods. These statistical differences were also present in terms of misclassification rate (Fig.1).

Although this work studies the effect of pre-processing methods in terms of classification rate, it would be also interesting to study whether the number of genes selected in the feature selection step and the kernel method used in the SVM classifier affect the results.
From Fig.5, it can be observed that the accuracy of SVM is affected by the number of genes selected by t-test. There are no statistical differences (P > 0.05) when the number of genes selected varies from 10 to 400. On the other hand, when very few genes (5) are selected or the number is large (600-2000 and the whole chip) SVM's performance gets worse. In the first case, the data does not contain enough discriminative information and, in the second case, […]

PROCEEDINGS Open Access

Gene expression pattern in swine neutrophils after lipopolysaccharide exposure: a time course comparison

Gema Sanz-Santos1, Ángeles Jiménez-Marín1, Rocío Bautista2, Noé Fernández2, Gonzalo M Claros2, Juan J Garrido1*

From International Symposium on Animal Genomics for Animal Health (AGAH 2010) Paris, France. 31 May – 2 June 2010

Abstract

Background: Experimental exposure of swine neutrophils to bacterial lipopolysaccharide (LPS) represents a model to study the innate immune response during bacterial infection. Neutrophils can effectively limit the infection by secreting lipid mediators, antimicrobial molecules and a combination of reactive oxygen species (ROS) without new synthesis of proteins. However, it is known that neutrophils can modify the gene expression after LPS exposure. We performed microarray gene expression analysis in order to elucidate the less known transcriptional response of neutrophils during infection.

Methods: Blood samples were collected from four healthy Iberian pigs and neutrophils were isolated and incubated during 6, 9 and 18 hrs in presence or absence of lipopolysaccharide (LPS) from Salmonella enterica serovar Typhimurium. RNA was isolated and hybridized to Affymetrix Porcine GeneChip®. Microarray data were normalized using Robust Microarray Analysis (RMA) and then, differential expression was obtained by an analysis of variance (ANOVA).
Results: ANOVA data analysis showed that the number of differentially expressed genes (DEG) after LPS treatment vary with time. The highest transcriptional response occurred at 9 hr post LPS stimulation with 1494 DEG whereas at 6 and 18 hr showed 125 and 108 DEG, respectively. Three different gene expression tendencies were observed: genes in cluster 1 showed a tendency toward up-regulation; cluster 2 genes showing a tendency for down-regulation at 9 hr; and cluster 3 genes were up-regulated at 9 hr post LPS stimulation. Ingenuity Pathway Analysis revealed a delay of neutrophil apoptosis at 9 hr. Many genes controlling biological functions were altered with time including those controlling metabolism and cell organization, ubiquitination, adhesion, movement or inflammatory response. Conclusions: LPS stimulation alters the transcriptional pattern in neutrophils and the present results show that the robust transcriptional potential of neutrophils under infection conditions, indicating that active regulation of gene Sanz-Santos et al. BMC Proceedings 2011, 5(Suppl 4):S11 http://www.biomedcentral.com/1753-6561/5/S4/S11 Finally, cluster 3 consists of 335 up-regulated genes. Functions associated with these molecules are related to cellular assembly and reorganization, cellular main- tenance and gene expression. Canonical pathways are related to protein ubiquitination signaling, PDGF sig- naling and IL-3 signaling which is involved in cell sur- vival by activation of JAK/STAT signaling and BCL2 [10]. Network 2 (Additional file 4) highlights NF-B interactions and covers several canonical pathways such as acute phase response signaling and interferon signaling. Inhibition of spontaneous apoptosis at 9 hrs Turnover of aging neutrophils occurs in the absence of activation through a process known as spontaneous Figure 2 Differentially expressed genes grouped into three different clusters. Cluster 1 contains 8 genes with up-regulation tendency through the time course. 
Cluster 2 contains 747 genes with a down-regulation tendency at 9 hr. The opposite tendency can be observed in cluster 3, where 335 genes show an up-regulation at 9 hr and down-regulation at 18 hr.

Time point | UP | DOWN
6 hours | 61 | 64
9 hours | 388 | 1106
18 hours | 50 | 58

Figure 3 Differentially expressed genes in each time point. 125 and 108 genes were altered at 6 and 18 hr respectively, with a similar number of up and down-regulated genes. Most significant transcriptional changes were observed at 9 hr post LPS stimulation. 1106 genes were down-regulated and 388 were up-regulated.

RESEARCH Open Access

Pyroptosis and adaptive immunity mechanisms are promptly engendered in mesenteric lymph-nodes during pig infections with Salmonella enterica serovar Typhimurium

Rodrigo Prado Martins1, Carmen Aguilar1, James E Graham2, Ana Carvajal3, Rocío Bautista4, M Gonzalo Claros4 and Juan J Garrido1*

Abstract In this study, we explored the transcriptional response and the morphological changes occurring in porcine mesenteric lymph-nodes (MLN) along a time course of 1, 2 and 6 days post infection (dpi) with Salmonella Typhimurium. Additionally, we analysed the expression of some Salmonella effectors in tissue to complete our view […]

VETERINARY RESEARCH Martins et al. Veterinary Research 2013, 44:120 http://www.veterinaryresearch.org/content/44/1/120

[…] node in the network diagram represented a gene and its relationship with other molecules was represented by a line (solid and dotted lines represent direct and indirect association respectively). Nodes with a red background were input genes detected in this study while grey nodes were molecules inserted by IPA based upon the Ingenuity Knowledge Base to produce a highly connected network.
The score estimated the probability that a collection of genes equal to or greater than the number in a network could be achieved by chance alone. Scores of 3 or higher were considered to have a 99.9% confidence of not being generated by random chance alone. For statistical analysis of enriched functions/pathways, an IPA Knowledge Base was used as a reference set and the Fisher's exact test was employed to estimate the significance of association. P-values below 0.05 were considered statistically significant. For graphical representation of the canonical pathways, the ratio indicates the percentage of genes taking part in a pathway that could be found in an uploaded data set and –log(p-value) means the level of confidence of association. The threshold line represented a p-value of 0.05.

Relative gene expression analysis by qPCR

Real-time quantitative PCR (qPCR) assays were performed as previously described [11]. Fold change values were calculated by the 2^(−ΔΔCq) method [17] using beta-actin as the reference gene. Afterwards, data were standardized as proposed by Willems et al. [18] and analyzed by Kruskal–Wallis and Mann–Whitney tests using the software SPSS 15.0 for Windows (SPSS Inc, Chicago, IL, USA). Fold changes of 1 denoted no change in gene expression. Values lower and higher than 1 denoted down and up-regulation respectively. To be represented in Table 1, a fold change of down-regulated genes was calculated as −1/2^(−ΔΔCq). Primer pairs used for amplifications can be found as supporting information (see Additional file 1).

Western blot analysis

For protein extractions, MLN samples from all experimental animals were separately homogenized on ice with lysis buffer (7 M urea, 2 M thiourea, 4% w/v CHAPS, 0.5 mM PMSF) using a glass tissue-lyser and protein lysate concentration was determined using a Bradford Protein Assay (Bio-Rad).
Subsequently, protein from individual replicates belonging to the same group was pooled (30 µg total), electrophoretically fractionated in 12% (w/v) SDS-PAGE gels and transferred onto a PVDF membrane (Millipore, Bedford, MA, USA). Western blot assays were carried out as described by Martins et al. [10] employing the following primary antibodies: 4B7/8 for swine histocompatibility class I antigen (SLAI) detection [19], 1F12 for swine histocompatibility class II antigen (SLAII) detection [19], anti-CTLA4 (Epitomics, Burlingame, CA, USA) and anti-Clathrin light chain (ab24579, Abcam, Cambridge, UK). To confirm equal sample loading, membranes were reblotted with anti-GAPDH monoclonal antibody (GenScript, Piscataway, NJ, USA) and no statistical differences for GAPDH abundance were observed between groups in all assays. Membranes were scanned in an FLA-5100 imager

Table 1 Microarray data validation by qPCR.

               MICROARRAY                            qPCR
               Fold change                           Fold change
Gene           1 dpi   2 dpi   6 dpi   BF            1 dpi   2 dpi   6 dpi   p-value
CD180          1.7     2.6     1.5     0.0000429     1.1     1.8     1.2     0.010
CD1A           1.1     −1.4    1.2     0.00047793    −1.4    −2.5    1.2     0.013
DAB2           −1.2    −2.6    −1.2    6.62E-13      −3.1    −6.5    −2.6    0.001
EIF4H          −1.1    −1.1    −1.1    0.0000101     −1.5    −1.4    −1.8    0.021
ENPP6          1.3     2.0     −1.2    0.0000448     1.2     1.8     −1.7    0.000
F13A1          1.4     2.2     −1.1    0.00000227    1       1.7     −2.2    0.012
HLA-B (b)      1.0     −1.1    −1.2    0.00023747    −1.4    −1.4    −1.9    0.047
HLA-DRB5 (b)   1.0     −1.1    1.0     0.0000311     −1.4    −1.6    −2      0.036
HSPA1B (a)     3.3     1.4     −1.1    0.0001166     2.5     1.4     −1.3    0.025
HSPH1          2.3     1.7     −1.0    0.00000424    1.5     1.1     −2      0.003
IL16           −1.0    −1.2    −1.1    8.12E-07      1       −1.1    −1.5    0.035
LPCAT2         1.2     2.3     1.0     0.0000146     1.4     2       −1.3    0.010
PSMC2          −1.0    −1.0    −1.1    0.00105861    −1.1    −1.4    −1.8    0.036
TRAC           −1.0    −1.1    −1.1    0.00000951    −1.5    −1.8    −1.8    0.010

(a) Data from microarray analysis are mean values from two different probes.
(b) Amplified with SLA-B and SLA-DRB5 primers.
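The signed fold-change convention described for Table 1 (2^(−ΔΔCq), with down-regulated genes reported as −1/2^(−ΔΔCq)) can be sketched as follows. This is only an illustrative sketch: the function names and the ΔΔCq inputs are hypothetical and are not taken from the paper, and the standardization step of Willems et al. is not included.

```python
def fold_change(ddcq: float) -> float:
    """Relative expression by the 2^(-ddCq) method."""
    return 2.0 ** (-ddcq)


def signed_fold_change(fc: float) -> float:
    """Report down-regulation (fc < 1) as -1/fc; leave fc >= 1 unchanged."""
    if fc <= 0:
        raise ValueError("fold change must be positive")
    return fc if fc >= 1.0 else -1.0 / fc


if __name__ == "__main__":
    # Hypothetical ddCq values, for illustration only (not data from the study).
    for ddcq in (-1.0, 0.0, 1.0, 2.0):
        fc = fold_change(ddcq)
        print(f"ddCq={ddcq:+.1f}  fold change={fc:.3f}  signed={signed_fold_change(fc):+.3f}")
```

With this convention a value of 1 still means no change, and the magnitude of a negative value reads directly as "times lower", which is how the negative entries in Table 1 are meant to be read.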
  • 6. A miRNA Signature Predictive of Early Recurrence
Affymetrix miRNA microarray
A microRNA Signature Associated with Early Recurrence in Breast Cancer
Luis G. Pérez-Rivas1†, José M. Jerez2†, Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4, M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sánchez1, Nuria Ribelles1, Emilio Alba1, José Lozano1,5*
1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain, 2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain, 3 Plataforma Andaluza de Bioinformática, Universidad de Málaga, Málaga, Spain, 4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain, 5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain, 6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de Málaga, Málaga, Spain

Abstract
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence.
Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.

Citation: Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS ONE 9(3): e91884. doi:10.1371/journal.pone.0091884
Editor: Sonia Rocha, University of Dundee, United Kingdom
Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014
Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de Economía (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: jlozano@uma.es
† These authors contributed equally to this work.

Introduction
Breast cancer comprises a group of heterogeneous diseases that can be classified based on both clinical and molecular features [1–5]. Improvements in the early detection of primary tumors and the development of novel targeted therapies, together with the systematic use of adjuvant chemotherapy, have drastically reduced mortality rates and increased disease-free survival (DFS) in breast cancer. Still, about one third of patients undergoing breast tumor excision will develop metastases, the major life-threatening event which is strongly associated with poor outcome [6,7]. The risk of relapse after tumor resection is not constant over time. A detailed examination of large series of long-term follow-up years, respectively, followed by a nearly flat plateau in which the risk of relapse tends to zero [8–10]. A causal link between tumor surgery and the bimodal pattern of recurrence has been proposed by some investigators (i.e. an iatrogenic effect) [11]. According to that model, surgical removal of the primary breast tumor would accelerate the growth of dormant metastatic foci by altering the balance between circulating pro- and anti-angiogenic factors [9,11–14]. This hypothesis is supported by the fact that the two peaks of relapse are observed regardless of factors other than surgery, such as the axillary nodal status, the type of surgery or the administration of adjuvant therapy.
Although estrogen receptor (ER)-negative tumors are commonly associated with a higher risk

In order to select the statistically significant and differentially expressed miRNAs from Fig. 1, paired and multiple comparisons among the prognosis groups A, B and C were performed. Two different approaches, the Bioconductor packages limma and RankProd, were employed. Only those candidates with a fold change (FC) > 2 (either up- or down-regulated) and an adjusted p-value < 0.05 were selected (Table 2). Thus, comparison of the logFC and p-values obtained with both limma and RankProd libraries led to the identification of miR-149, miR-20b, miR-30a-3p, miR-342-5p,

downregulation in basal-like tumors. They also showed an inverse relationship between the mitotic index and both miR-30a-3p and miR-342-5p [76]. Differential expression of all six miRNAs was also determined by RT-qPCR in the three prognosis groups (Table 2). With the exception of miR-625, which could not be validated, miR-149, miR-20b, miR-10a, miR-30a-3p and miR-342-5p (the "5-miRNA signature", from now on) were all confirmed to be down-regulated in tumors from relapsing patients (groups B or C) when compared

Table 2. Most significant deregulated miRNAs in breast tumors from relapsing patients.
                               limma F*             RankProd**            RT-qPCR***
Comparison#  miRNA             logFC    adj-pval    logFC    adj-pval     logFC    SE
B/A          hsa-miR-149       −1.410   0.0016      −1.615   <0.00001     −2.646   0.724
             hsa-miR-20b       −1.048   0.0071      −1.237   <0.00001     −1.542   0.521
             hsa-miR-30a-3p    −1.359   0.0078      −1.521   <0.00001     −1.001   0.514
             hsa-miR-625       −1.149   0.0014      −1.377   <0.00001     −0.347   0.282
             hsa-miR-10a       −1.235   0.0168      −1.547   <0.00001     −1.108   0.404
BC/A         hsa-miR-149       −1.120   0.0117      −1.329   <0.00001     −2.555   0.681
             hsa-miR-20b       −1.016   0.0076      −1.155   <0.00001     −1.470   0.536
             hsa-miR-30a-3p    −1.124   0.0256      −1.326   <0.00001     −0.994   0.458
             hsa-miR-625       −1.003   0.0049      −1.223   <0.00001     −0.266   0.237
B/AC         hsa-miR-149       −1.294   0.0052      −1.446   <0.00001     −2.340   0.698
             hsa-miR-10a       −1.397   0.0093      −1.647   <0.00001     −1.241   0.404
             hsa-miR-342-5p    −1.123   0.0159      −1.254   <0.00001     −1.194   0.627

# Group A = no recurrence, Group B = early recurrence (≤24 months after surgery), Group C = late recurrence (50–60 months after surgery).
* limma F, analysis of filtered data (sd > 70%) using limma.
** RankProd, analysis of unfiltered data using RankProduct algorithm.
*** RT-qPCR, relative miRNA expression was calculated using the ΔΔCt method. The standard error (SE) was calculated based on the theory of error propagation [107].
doi:10.1371/journal.pone.0091884.t002

[Pérez-Rivas et al., Figure 2. Panel A: tumor samples labelled by prognosis group (A, B or C) against the probes hsa-miR-10a_st, hsa-miR-149_st, hsa-miR-20b_st, hsa-miR-30a-star_st and hsa-miR-342-5p_st. Panel B: log2 fold change (axis from −3 to 0) of miR-10a, miR-149, miR-20b, miR-30a-3p and miR-342-5p for the comparisons B vs A, BC vs A and B vs AC.]

COLLABORATION: Emilio Alba, José M. Jerez
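The selection rule described above (fold change > 2 in either direction and adjusted p-value < 0.05, required with both limma and RankProd) can be sketched as a simple filter. This is a minimal sketch under stated assumptions, not the authors' R code: it assumes the logFC values are log2 ratios (so FC > 2 means |logFC| > 1), it uses the B/A rows of Table 2 as input, and it stands in 1e-5 for the RankProd p-values, which the table reports only as "<0.00001". The function names are hypothetical.

```python
import math

# (logFC, adjusted p-value) per miRNA for the B/A comparison, copied from Table 2.
LIMMA = {
    "hsa-miR-149": (-1.410, 0.0016),
    "hsa-miR-20b": (-1.048, 0.0071),
    "hsa-miR-30a-3p": (-1.359, 0.0078),
    "hsa-miR-625": (-1.149, 0.0014),
    "hsa-miR-10a": (-1.235, 0.0168),
}
# RankProd p-values are only given as "<0.00001"; 1e-5 is an upper-bound stand-in.
RANKPROD = {
    "hsa-miR-149": (-1.615, 1e-5),
    "hsa-miR-20b": (-1.237, 1e-5),
    "hsa-miR-30a-3p": (-1.521, 1e-5),
    "hsa-miR-625": (-1.377, 1e-5),
    "hsa-miR-10a": (-1.547, 1e-5),
}


def passes(log_fc: float, adj_p: float, min_fold: float = 2.0, alpha: float = 0.05) -> bool:
    """True when the change exceeds min_fold in either direction and adj_p < alpha."""
    return abs(log_fc) > math.log2(min_fold) and adj_p < alpha


def selected_by_both(first: dict, second: dict) -> list:
    """miRNAs that pass the thresholds in both result sets."""
    shared = first.keys() & second.keys()
    return sorted(m for m in shared if passes(*first[m]) and passes(*second[m]))


if __name__ == "__main__":
    print(selected_by_both(LIMMA, RANKPROD))
```

Note that miR-625 passes this statistical filter but, as stated in the text, could not be validated by RT-qPCR, which is why the final signature contains five miRNAs rather than six.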
  • 7. RNA-seq
SOFTWARE Open Access
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
Juan Falgueras1, Antonio J Lara2, Noé Fernández-Pozo3, Francisco R Cantón3, Guillermo Pérez-Trabado2,4, M Gonzalo Claros2,3*

Abstract
Background: High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms.
Results: SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming.
Conclusions: SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts.
Background
Sequencing projects and Expressed Sequence Tags (ESTs) are essential for gene discovery, mapping, functional genomics and for future efforts in genome annotations, which include identification of novel genes, gene location, polymorphisms and even intron-exon boundaries. The availability of high-throughput automated sequencing has enabled an exponential growth rate of sequence data, although not always with the desired quality. This exponential growth is enhanced by the so called “next-generation sequencing”, and efforts have to be made in order to increase the quality and reliability of sequences incorporated into databases: up to 0.4% of sequences in nucleotide databases contain contaminant sequences [1,2]. The situation is even worse in the EST databases, where vector contamination rates reach 1.63% of sequences [3]. Hence, improved and user friendly bioinformatic tools are required to produce more reliable high-throughput pre-processing methods.
Pre-processing includes filtering of low-quality sequences, identification of specific features (such as poly-A or poly-T tails, terminal transferase tails, and adaptors), removal of contaminant sequences (from vector to any other artefacts) and trimming the undesired segments. There are some bioinformatic tools that can accomplish individual pre-processing aspects (e.g. TrimSeq, TrimEST, VectorStrip, VecScreen, ESTPrep [4], crossmatch, Figaro [5]), and other programs that cope with the complete pre-processing pipeline such as PreGap4 [6] or the broadly used tools Lucy [7,8] and SeqClean [9]. Most of these require installation, are difficult to configure, environment-specific, or focused on specific needs (like a design only for ESTs), or require a change in implementation and design of either the program or the protocols within the laboratory itself.
* Correspondence: claros@uma.es
2 Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain
Falgueras et al.
BMC Bioinformatics 2010, 11:38 http://www.biomedcentral.com/1471-2105/11/38
© 2010 Falgueras et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

DEgenes Hunter - A Self-customised Gene Expression Analysis Workflow for Non-model Organisms
Isabel González Gayte1, Rocío Bautista Moreno2, and M. Gonzalo Claros1,2
1 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, 29071 Málaga, Spain
2 Plataforma Andaluza de Bioinformática, Centro de Supercomputación y Bioinnovación, Universidad de Málaga, 29071 Málaga, Spain

Abstract. Data from high-throughput RNA sequencing require the development of more sophisticated bioinformatics tools to perform optimal gene expression analysis. Several R libraries are well considered for differential expression analyses but according to recent comparative studies, there is still an overall disagreement about which one is the most appropriate for each experiment. The applicable R libraries mainly depend on the presence or not of a reference genome and the number of replicates per condition. Here DEgenes Hunter is presented, an RNA-seq analysis workflow for the detection of differentially expressed genes (DEGs) in organisms without genomic reference.
The first advantage of DEgenes Hunter over other available solutions is that it is able to decide the most suitable algorithms to be employed according to the number of biological replicates provided in the sample. The different workflow branches allow its automatic self-customisation depending on the input data, when used by users without advanced statistical and programming skills. All applicable libraries served to obtain their respective DEGs and, as another advantage, genes marked as DEGs by all R packages employed are considered ‘common DEGs’, showing the lowest false discovery rate compared to the ‘complete DEGs’ group. A third advantage of DEgenes Hunter is that it comes with an integrated quality control module to discard or disregard low quality data before and after preprocessing. The ‘common DEGs’ are finally submitted to a functional gene set enrichment analysis (GSEA) and clustering. All results are provided as a PDF report.
Keywords: RNA-seq, R, pipeline, workflow, differential expression, bioinformatic tool, functional analysis.

1 Introduction
Nowadays, high-throughput technologies are well considered for genetic studies. For the analysis of gene expression profiles, data are obtained from RNA sequencing (RNA-seq) experiments. RNA-seq provides precise measurements of
F. Ortuño and I. Rojas (Eds.): IWBBIO 2015, Part II, LNCS 9044, pp. 313–321, 2015. © Springer International Publishing Switzerland 2015

http://www.scbi.uma.es/seqtrimnext
MiSeq @ CIMES
We are working on applying it to model organisms: grapevine, sole and humans
  • 8. We always confirm with several algorithms

[Fig. 1 flowchart: input (count data), then data filtering, then a branch on the number of replicates that runs, depending on the branch, DESeq2 alone, DESeq2 and edgeR, or DESeq2, edgeR, limma and NOISeq, then functional analysis (topGO), then heatmap and clustering, then output (PDF report).]
Fig. 1. DEgenes Hunter main workflow

[Fig. 2, panel A: topGO graph nodes, listed as GO term, name, p-value and genes (significant / annotated).]
GO:0003674   molecular_function        1.0000     225 / 41433
GO:0003824   catalytic activity        0.0012     128 / 19303
GO:0004347   glucose−6−phosphate ...   2.02e−11   7 / 22
GO:0004497   monooxygenase activi...   9.77e−11   15 / 294
GO:0005488   binding                   0.9677     127 / 25778
GO:0008289   lipid binding             8.45e−16   29 / 797
GO:0016491   oxidoreductase activ...   3.08e−19   50 / 2066
GO:0016853   isomerase activity        3.28e−05   11 / 440
GO:0016860   intramolecular oxido...   1.68e−08   8 / 82
GO:0016861   intramolecular oxido...   4.79e−10   8 / 53
GO:0046906   tetrapyrrole binding      6.07e−11   16 / 335
GO:0097159   organic cyclic compo...   0.9982     57 / 14111
GO:1901363   heterocyclic compoun...   0.9981     57 / 14093

[Fig. 2, panels B and C: heatmap and expression cluster plot of Z-score expression (−1.5 to 1.5) across samples C1, C2, C3, T1, T2 and T3.]
Fig. 2. Example analyses that can be performed with DEgenes Hunter on the ‘common DEGs’ group. A: A GSEA analysis performed with topGO, where rectangle colour represents the relative significance, ranging from dark red (most significant) to bright yellow (least significant). B: A typical heatmap that can also be used as a quality control to verify that control samples (C1, C2 and C3) and treatment samples (T1, T2 and T3) are grouped together. C: Expression clustering performed using cluster where the genes have similar expression levels among control samples, and a clearly higher value in treatment samples.

3.2 Performance Testing
The utility of the ‘common DEGs’ group was confirmed by comparing their FDR values.
Figure 3 shows that the FDR for ‘common DEGs’ is considerably lower than for ‘complete DEGs’ and ‘non-common DEGs’ using separately any R package. Since there is no clear way to set the threshold for qNOISeq [15], it is very high in all cases.

[Fig. 4: three Venn diagrams, labelled 100/0, 50/50 and 0/100.]
Fig. 4. Venn diagrams showing the numbers of DEGs found in synthetic data when different DEG ratios are used. 100/0 corresponds to all over-expressed/none repressed, 50/50 is the balanced ratio, and 0/100 corresponds to none over-expressed/all repressed.
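The distinction between ‘common DEGs’ (genes called by every package that was run) and ‘complete DEGs’ (genes called by at least one package) amounts to a set intersection and a set union. The sketch below is only an illustration of that idea in Python; DEgenes Hunter itself is an R workflow, and the function names and gene identifiers here are hypothetical.

```python
def common_degs(calls: dict) -> set:
    """Genes reported as differentially expressed by every package."""
    gene_sets = [set(genes) for genes in calls.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def complete_degs(calls: dict) -> set:
    """Genes reported as differentially expressed by at least one package."""
    gene_sets = [set(genes) for genes in calls.values()]
    return set.union(*gene_sets) if gene_sets else set()


if __name__ == "__main__":
    # Hypothetical DEG calls per package; the gene identifiers are made up.
    calls = {
        "DESeq2": {"gene1", "gene2", "gene3"},
        "edgeR": {"gene2", "gene3", "gene4"},
        "limma": {"gene2", "gene3", "gene5"},
        "NOISeq": {"gene2", "gene3", "gene6"},
    }
    common = common_degs(calls)
    complete = complete_degs(calls)
    print("common DEGs:", sorted(common))
    print("non-common DEGs:", sorted(complete - common))
```

Requiring agreement between packages trades sensitivity for specificity, which is consistent with the lower FDR reported for the ‘common DEGs’ group.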
  • 9. Which region is most variable in an alignment?

of a Pinus pinaster gene, one from photosynthetic tissue and one from non-photosynthetic tissue (Table 1) were analysed. Sequences were aligned with MultAlin using

identified a divergent region, and that the primers were correctly designed and worked as predicted by the software.

Figure 6 Use of AlignMiner for designing several specific primer pairs for PCR amplification of the different isoforms of the AtGS1 nucleotide sequence (A) The 5’ and 3’ divergent regions obtained with Entropy that were selected for primer design including the characteristic parameters of each region. (B) Results of the in silico “PCR amplification” with BioPHP [34] using the different primer pairs. Note that the actual 3’ primers are complementary to the sequences shown on the right.
Guerrero et al. Algorithms for Molecular Biology 2010, 5:24 http://www.almob.org/content/5/1/24 Page 12 of 16

SOFTWARE ARTICLE Open Access
AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences
Darío Guerrero1, Rocío Bautista1, David P Villalobos2, Francisco R Cantón2, M Gonzalo Claros1,2*

Abstract
Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.
Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence.
It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers “on the fly”.
Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers’ time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.

Background
Since the early days of bioinformatics, the elucidation of similarities between sequences has been an attainable goal to bioinformaticians and other scientists. In fact, multiple sequence alignments (MSAs) stand at a crossroad between computation and biology and, as a result, long-standing programs for DNA or protein MSAs are nowadays widely used, offering high quality MSAs.
In recent years, by means of similarities between sequences and due to the rapid accumulation of gene and genome sequences, it has been possible to predict the function and role of a number of genes, discern protein structure and function [1], perform new phylogenetic tree reconstruction, conduct genome evolution studies [2], and design primers. Several scores for quantification of residue conservation and even detection of non-strictly-conserved residues have been developed that depend on the composition of the surrounding residue sequence [3], and new sequence aligners are able to integrate highly heterogeneous information and a very large number of sequences. Without exception, the sequence similarity of
* Correspondence: claros@uma.es
1 Plataforma Andaluza de Bioinformática (Universidad de Málaga), Severo Ochoa, 34, 29590 Málaga, Spain
Guerrero et al. Algorithms for Molecular Biology 2010, 5:24 http://www.almob.org/content/5/1/24
© 2010 Guerrero et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table 2 Details of primers designed with AlignMiner to identify specifically by PCR the five A. thaliana GS1 genes as well as the two primer pairs that identify the photosynthetic and non-photosynthetic isoforms of P. pinaster; note that the 3’ (reverse) primer is complementary to the sequence appearing in Figures 6 and 8.
Isoform                      Primer                          Length   %GC   Tm (°C)   Amplicon size (bp)
GS1.1                        5’-GGTCTTTAGCAACCCTGA-3’        18       50    54.6      740
                             5’-ATCATCAAGGATTCCAGA-3’        18       39    48.7
GS1.2                        5’-GATCTTTGCTAACCCTGA-3’        18       44    51.3      739
                             5’-CTTTCAAGGGTTCCAGAG-3’        18       50    53.6
GS1.3                        5’-AATCTTCGATCATCCCAA-3’        18       39    50        739
                             5’-AAAGTCTAAAGCTTAGAG-3’        18       33    46
GS1.4                        5’-GATCTTCAGCCACCCCGA-3’        18       61    59.4      739
                             5’-AATGTGTCATCAACCGAG-3’        18       44    51.5
GS1.5                        5’-GATCTTTGAAGACCCTAG-3’        18       44    48.8      740
                             5’-TCTTTCATGGTTTCCAAA-3’        18       33    50.1
Photosynthetic isoform       5’-AGTGCGCATTAAGGACCCATCA-3’    22       50    61        177
                             5’-ACACACTGGCTTCCACAATAGG-3’    22       50    59.4
Non-photosynthetic isoform   5’-ACAGATGATCTAGGACATGC-3’      20       45    52        169
                             5’-CACTTATTTGCACTTGAAGG-3’      20       40    52.6

Figure 7 Correlation between the most divergent amino acid sequences and antigenicity of the AtGS1 protein MSA. (A) Similarity plot obtained using the Entropy method; the most divergent regions are highlighted. (B) Aligned sequences for the two divergent regions together (underlined in black) and their score in relation to other divergent regions. (C) Localisation of each divergent region in the alignment where: (i) nucleotides in bold are the predicted epitopes for B-cells; (ii) an “e” denotes predicted solvent accessibility for this position; and (iii) red-boxed amino acids correspond to the sequence of the matching divergent region. It is clearly seen that divergent sequences overlap with the predicted epitopes and the solvent-accessible amino acids.

Primers able to distinguish alleles
Specific epitopes
http://www.scbi.uma.es/alignminer
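The idea of scoring alignment columns to locate divergent regions can be illustrated with per-column Shannon entropy. This is a generic sketch, assuming plain Shannon entropy in bits and a fixed-width window; the excerpt does not give the actual formula behind AlignMiner's "Entropy" method, so this should not be read as its implementation. The toy alignment and all function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


def entropy_profile(alignment: list) -> list:
    """Entropy of every column; all aligned sequences must have equal length."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


def most_divergent_window(profile: list, width: int) -> int:
    """Start index of the window of the given width with the highest total entropy."""
    if not 0 < width <= len(profile):
        raise ValueError("width must be between 1 and the alignment length")
    starts = range(len(profile) - width + 1)
    return max(starts, key=lambda i: sum(profile[i:i + width]))


if __name__ == "__main__":
    # Hypothetical toy alignment: conserved flanks around a variable core.
    toy = ["ACGTACGT", "ACGTTCGT", "ACGAGCGT", "ACGCCCGT"]
    profile = entropy_profile(toy)
    start = most_divergent_window(profile, width=2)
    print("entropy per column:", [round(h, 2) for h in profile])
    print("most divergent 2-column window starts at index", start)
```

Fully conserved columns score 0 and columns where every sequence differs score highest, so a sliding window over the profile points at candidate regions for specific primers or epitopes. Gap characters are counted here as ordinary symbols, which is a simplification.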
  • 10. of a Pinus pinaster gene, one from photosynthetic tissue and one from non-photosynthetic tissue (Table 1) were analysed. Sequences were aligned with MultAlin using identified a divergent region, and that the primers were correctly designed and worked as predicted by the software. Figure 6 Use of AlignMiner for designing several specific primer pairs for PCR amplification of the different isoforms of the AtGS1 nucleotide sequence (A) The 5’ and 3’ divergent regions obtained with Entropy that were selected for primer design including the characteristic parameters of each region. (B) Results of the in silico “PCR amplification” with BioPHP [34] using the different primer pairs. Note that the actual 3’ primers are complementary to the sequences shown on the right. Guerrero et al. Algorithms for Molecular Biology 2010, 5:24 http://www.almob.org/content/5/1/24 Page 12 of 16 ¿Qué región es más variable en un alineamiento? 9 SOFTWARE ARTICLE Open Access AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences Darío Guerrero1 , Rocío Bautista1 , David P Villalobos2 , Francisco R Cantón2 , M Gonzalo Claros1,2* Abstract Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. 
It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can.even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to designing specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers “on the fly”. Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers’ time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer. Background Since the early days of bioinformatics, the elucidation of similarities between sequences has been an attainable goal to bioinformaticians and other scientists. In fact, multiple sequence alignments (MSAs) stand at a cross- road between computation and biology and, as a result, long-standing programs for DNA or protein MSAs are nowadays widely used, offering high quality MSAs. 
In recent years, by means of similarities between sequences and due to the rapid accumulation of gene and genome sequences, it has been possible to predict the function and role of a number of genes, discern protein structure and function [1], perform new phylogenetic tree recon- struction, conduct genome evolution studies [2], and design primers. Several scores for quantification of resi- due conservation and even detection of non-strictly-con- served residues have been developed that depend on the composition of the surrounding residue sequence [3], and new sequence aligners are able to integrate highly heterogeneous information and a very large number of sequences. Without exception, the sequence similarity of * Correspondence: claros@uma.es 1 Plataforma Andaluza de Bioinformática (Universidad de Málaga), Severo Ochoa, 34, 29590 Málaga, Spain Guerrero et al. Algorithms for Molecular Biology 2010, 5:24 http://www.almob.org/content/5/1/24 © 2010 Guerrero et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Table 2 Details of primers designed with AlignMiner to identify specifically by PCR the five A. thaliana GS1 genes as well as the two primer pairs that identify the photosynthetic and non-photosynthetic isoforms of P. pinaster; note that the 3’ (reverse) primer is complementary to the sequence appearing in Figures 6 and 8. 
Isoform | Primer | Length | %GC | Tm (°C) | Amplicon size (bp)
GS1.1 | 5’-GGTCTTTAGCAACCCTGA-3’ | 18 | 50 | 54.6 | 740
 | 5’-ATCATCAAGGATTCCAGA-3’ | 18 | 39 | 48.7 |
GS1.2 | 5’-GATCTTTGCTAACCCTGA-3’ | 18 | 44 | 51.3 | 739
 | 5’-CTTTCAAGGGTTCCAGAG-3’ | 18 | 50 | 53.6 |
GS1.3 | 5’-AATCTTCGATCATCCCAA-3’ | 18 | 39 | 50 | 739
 | 5’-AAAGTCTAAAGCTTAGAG-3’ | 18 | 33 | 46 |
GS1.4 | 5’-GATCTTCAGCCACCCCGA-3’ | 18 | 61 | 59.4 | 739
 | 5’-AATGTGTCATCAACCGAG-3’ | 18 | 44 | 51.5 |
GS1.5 | 5’-GATCTTTGAAGACCCTAG-3’ | 18 | 44 | 48.8 | 740
 | 5’-TCTTTCATGGTTTCCAAA-3’ | 18 | 33 | 50.1 |
Photosynthetic isoform | 5’-AGTGCGCATTAAGGACCCATCA-3’ | 22 | 50 | 61 | 177
 | 5’-ACACACTGGCTTCCACAATAGG-3’ | 22 | 50 | 59.4 |
Non-photosynthetic isoform | 5’-ACAGATGATCTAGGACATGC-3’ | 20 | 45 | 52 | 169
 | 5’-CACTTATTTGCACTTGAAGG-3’ | 20 | 40 | 52.6 |
Figure 7 Correlation between the most divergent amino acid sequences and antigenicity of the AtGS1 protein MSA. (A) Similarity plot obtained using the Entropy method; the most divergent regions are highlighted. (B) Aligned sequences for the two divergent regions together (underlined in black) and their score in relation to other divergent regions. (C) Localisation of each divergent region in the alignment where: (i) nucleotides in bold are the predicted epitopes for B-cells; (ii) an “e” denotes predicted solvent accessibility for this position; and (iii) red-boxed amino acids correspond to the sequence of the matching divergent region. It is clearly seen that divergent sequences overlap with the predicted epitopes and the solvent-accessible amino acids.
Primers able to distinguish alleles
Specific epitopes
http://www.scbi.uma.es/alignminer
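The primer properties listed in Table 2 can be spot-checked with a few lines of Python. The sketch below recomputes %GC, gives a rough melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), and derives the reverse complement of a 3’ primer. The function names are hypothetical, and the Wallace rule is a swapped-in approximation: the source does not say how the Tm column was calculated, so the values will not match the table exactly.

```python
def gc_percent(primer):
    """Percentage of G and C bases in an oligonucleotide."""
    seq = primer.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C).

    Only an approximation for short oligos; not the method behind Table 2.
    """
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(primer):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return primer.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# GS1.1 primer pair taken from Table 2.
forward = "GGTCTTTAGCAACCCTGA"
reverse = "ATCATCAAGGATTCCAGA"

for name, seq in (("forward", forward), ("reverse", reverse)):
    print(name, len(seq), round(gc_percent(seq)), wallace_tm(seq))

# Strand to which the 3' (reverse) primer is complementary.
print(reverse_complement(reverse))
```

For the GS1.1 pair the recomputed %GC values round to 50 and 39, in agreement with the table, while the Wallace estimates differ slightly from the tabulated Tm values.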
  • 11. Genome databases
Genetic and physical mapping of the QTLAR3 controlling blight resistance in chickpea (Cicer arietinum L)
E. Madrid • P. Seoane • M. G. Claros • F. Barro • J. Rubio • J. Gil • T. Millán
Received: 14 January 2014 / Accepted: 14 February 2014 / Published online: 26 February 2014
© Springer Science+Business Media Dordrecht 2014
Abstract Physical and genetic maps of a chickpea QTL related to Ascochyta blight resistance and located in LG2 (QTLAR3) have been constructed. Single-copy markers based on candidate genes located in the Ca2 pseudomolecule were obtained for the first time and found to be useful for refining the QTL position. The location of the QTLAR3 peak was linked to an ethylene insensitive 3-like gene (Ein3). The Ein3 gene explained the highest percentage of the total phenotypic variation for resistance to blight (44.3 %) with a confidence interval of 16.3 cM. This genomic region was predicted to be at the Ca2 physical position 32–33 Mb, comprising 42 genes. Candidate genes located in this region include Ein3, Avr9/Cf9 and Argonaute 4, directly involved in disease resistance mechanisms. However, there are other genes outside the confidence interval that may play a role in the blight resistance pathway. The information reported in this paper will facilitate the development of functional markers to be used in the screening of germplasm collections or breeding materials, improving the efficiency and effectiveness of conventional breeding methods.
Keywords Ascochyta blight · Candidate genes · Physical map · Molecular markers
Introduction Chickpea (Cicer arietinum L.) is a self-pollinated diploid (2n = 2x = 16) annual grain legume widely grown in arid and semi-arid areas across the six continents. Together with other pulse crops, such as lentil (Lens culinaris Medik.), dry pea (Pisum sativum L.) and dry bean (Phaseolus vulgaris L.), chickpea is a major source of protein in human diets, particularly in low-income countries.
In addition, chickpea crops play an important role in the maintenance of soil fertility, particularly in dry, rain-fed areas (Berrada et al. 2007). One of the most important factors contributing to instability in chickpea yields is Ascochyta blight,
Electronic supplementary material The online version of this article (doi:10.1007/s10681-014-1084-6) contains supplementary material, which is available to authorized users.
E. Madrid · F. Barro, Institute for Sustainable Agriculture, CSIC, Apdo 4084, 14080 Córdoba, Spain, e-mail: b62mahee@uco.es
P. Seoane · M. G. Claros, Departamento de Biología Molecular y Bioquímica, y Plataforma Andaluza de Bioinformática, Universidad de Málaga, 29071 Málaga, Spain
J. Rubio, Área de Mejora y Biotecnología, IFAPA Centro Alameda del Obispo, Apdo 3092, 14080 Córdoba, Spain
J. Gil · T. Millán, Departamento de Genética, Universidad de Córdoba, Campus Rabanales, Edif. C5, 14071 Córdoba, Spain
Euphytica (2014) 198:69–78 DOI 10.1007/s10681-014-1084-6
SNP
  • 12. Transcriptome databases
De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology
Javier Canales1,†, Rocío Bautista2,†, Philippe Label3,†, Josefa Gómez-Maldonado1, Isabelle Lesur4,5,6, Noé Fernández-Pozo2, Marina Rueda-López1, Darío Guerrero-Fernández2, Vanessa Castro-Rodríguez1, Hicham Benzekri2, Rafael A. Cañas1, María-Ángeles Guevara7, Andreia Rodrigues8, Pedro Seoane2, Caroline Teyssier9, Alexandre Morel9, François Ehrenmann4,5, Grégoire Le Provost4,5, Céline Lalanne4,5, Céline Noirot10, Christophe Klopp10, Isabelle Reymond11, Ángel García-Gutiérrez1, Jean-François Trontin11, Marie-Anne Lelu-Walter9, Célia Miguel8, María Teresa Cervera7, Francisco R. Cantón1, Christophe Plomion4,5, Luc Harvengt11, Concepción Ávila1,2, M. Gonzalo Claros1,2 and Francisco M. Cánovas1,2,*
1 Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain
2 Plataforma Andaluza de Bioinformática, Edificio de Bioinnovación, Parque Tecnológico de Andalucía, Málaga, Spain
3 INRA, Université Blaise Pascal, Aubière Cedex, France
4 INRA, Cestas, France
5 Université de Bordeaux, Talence, France
6 HelixVenture, Mérignac, France
7 Departamento de Ecología y Genética Forestal, INIA-CIFOR, Madrid, Spain
8 Forest Biotech Lab, IBET/ITQB, Oeiras, Portugal
9 INRA, Unité Amélioration, Génétique et Physiologie Forestières, Orléans Cedex 2, France
10 INRA de Toulouse Midi-Pyrénées, Auzeville, Castanet Tolosan cedex, France
11 FCBA, Pôle Biotechnologie et Sylviculture, Cestas, France
Received 20 July 2013; revised 24 September 2013; accepted 26 September 2013.
*Correspondence (Tel: +34 952131942; fax: +34 952132376; email: canovas@uma.es)
† These authors contributed equally to this work.
Summary Maritime pine (Pinus pinaster Ait.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research.
In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and
Plant Biotechnology Journal (2014) 12, pp. 286–299 doi: 10.1111/pbi.12136
http://www.scbi.uma.es/sustainpinedb/
RESEARCH ARTICLE Open Access
De novo assembly, characterization and functional annotation of Senegalese sole (Solea senegalensis) and common sole (Solea solea) transcriptomes: integration in a database and design of a microarray
Hicham Benzekri1,2, Paula Armesto3, Xavier Cousin4,5, Mireia Rovira6, Diego Crespo6, Manuel Alejandro Merlo7, David Mazurais8, Rocío Bautista2, Darío Guerrero-Fernández2, Noé Fernández-Pozo1, Marian Ponce3, Carlos Infante9, Jose Luis Zambonino8, Sabine Nidelet10, Marta Gut11, Laureana Rebordinos7, Josep V Planas6, Marie-Laure Bégout4, M Gonzalo Claros1,2 and Manuel Manchado3*
Abstract
Background: Senegalese sole (Solea senegalensis) and common sole (S. solea) are two economically and evolutionarily important flatfish species both in fisheries and aquaculture. Although some genomic resources and tools were recently described in these species, further sequencing efforts are required to establish a complete transcriptome, and to identify new molecular markers. Moreover, the comparative analysis of transcriptomes will be useful to understand flatfish evolution.
Results: A comprehensive characterization of the transcriptome for each species was carried out using a large set of Illumina data (more than 1,800 million reads for each sole species) and 454 reads (more than 5 million reads only in S. senegalensis), providing coverages ranging from 1,384x to 2,543x. After a de novo assembly, 45,063 and 38,402 different transcripts were obtained, comprising 18,738 and 22,683 full-length cDNAs in S. senegalensis and S. solea, respectively. A reference transcriptome with the longest unique transcripts and putative non-redundant new transcripts was established for each species. A subset of 11,953 reference transcripts was qualified as highly reliable orthologs (97% identity) between both species. A small subset of putative species-specific, lineage-specific and flatfish-specific transcripts was also identified. Furthermore, transcriptome data permitted the identification of single nucleotide polymorphisms and simple-sequence repeats confirmed by FISH to be used in further genetic and expression studies. Moreover, evidence for the retention of crystallins crybb1, crybb1-like and crybb3 in the two species of soles is also presented. Transcriptome information was applied to the design of a microarray tool in S. senegalensis that was successfully tested and validated by qPCR. Finally, transcriptomic data were hosted and structured at SoleaDB.
Conclusions: Transcriptomes and molecular markers identified in this study represent a valuable source for future genomic studies in these economically important species. Orthology analysis provided new clues regarding sole genome evolution indicating a divergent evolution of crystallins in flatfish. The design of a microarray and establishment of a reference transcriptome will be useful for large-scale gene expression studies. Moreover, the integration of
Benzekri et al. BMC Genomics 2014, 15:952 http://www.biomedcentral.com/1471-2164/15/952
http://www.juntadeandalucia.es/agriculturaypesca/ifapa/soleadb_ifapa/
  • 13. ReprOlive and new allergens
Unigen number | QSEQID | FLN_STATUS | FLN_HIT_DEFINITION | SACC | ALLERGOME CODE | SDEFINITION
1 | olive_transcript_000475 | Complete Sure | sp=5-methyltetrahydropteroyltriglutamate--homocysteine methyltransferase; Catharanthus roseus (Madagascar periwinkle) (Vinca rosea). | E3VW74 | - | Pollen allergen MetE (Fragment) OS=Amaranthus retroflexus PE=2 SV=1
2 | olive_transcript_000659 | Complete Sure | sp=Luminal-binding protein 5; Nicotiana tabacum (Common tobacco). | Q9FSY7 | 243; 3215 | Putative luminal binding protein OS=Corylus avellana GN=BiP PE=2 SV=1
3 | olive_transcript_002489 | Complete Putative | sp=Cysteine proteinase RD19a; Arabidopsis thaliana (Mouse-ear cress). | A5HIJ3 | 1 | Cysteine protease Cp3 OS=Actinidia deliciosa PE=2 SV=1
4 | olive_transcript_003129 | Complete Sure | sp=Malate dehydrogenase, mitochondrial; Fragaria ananassa (Strawberry). | P17783 | 6159 | Malate dehydrogenase, mitochondrial OS=Citrullus lanatus GN=MMDH PE=1 SV=1
5 | olive_transcript_003931 | Complete Sure | sp=L-ascorbate peroxidase 1, cytosolic; Arabidopsis thaliana (Mouse-ear cress). | Q42661 | 2423 | L-ascorbate peroxidase OS=Capsicum annuum PE=2 SV=1
6 | olive_transcript_005675 | C_terminal Putative | sp=Glyceraldehyde-3-phosphate dehydrogenase, cytosolic; Petroselinum crispum (Parsley) (Petroselinum hortense). | C7C4X1 | 9501; 9502 | Glyceraldehyde-3-phosphate dehydrogenase OS=Triticum aestivum GN=ga3pd PE=2 SV=1
7 | olive_transcript_007323 | Complete Putative | sp=Triosephosphate isomerase, cytosolic; Petunia hybrida (Petunia). | Q9FS79 | 920; 9498 | Triosephosphate isomerase OS=Triticum aestivum GN=tpis PE=2 SV=1
8 | olive_transcript_008377 | C_terminal Sure | sp=Glyceraldehyde-3-phosphate dehydrogenase, cytosolic; Antirrhinum majus (Garden snapdragon). | C7C4X1 | 9501; 9502 | Glyceraldehyde-3-phosphate dehydrogenase OS=Triticum aestivum GN=ga3pd PE=2 SV=1
9 | olive_transcript_008559 | Complete Sure | sp=Superoxide dismutase [Mn], mitochondrial; Nicotiana plumbaginifolia (Leadwort-leaved tobacco) (Tex-Mex tobacco). | Q9FSJ2 | 380; 383 | Superoxide dismutase (Fragment) OS=Hevea brasiliensis GN=sod PE=2 SV=1
10 | olive_transcript_008909 | - | - | B9T876 | - | Minor allergen Alt a, putative OS=Ricinus communis GN=RCOM_0066700 PE=3 SV=1
11 | olive_transcript_009735 | - | - | W9RZW9 | - | Minor allergen Alt a 7 OS=Morus notabilis GN=L484_009041 PE=3 SV=1
12 | olive_transcript_010769 * | Complete Sure | sp=Probable calcium-binding protein CML13; Arabidopsis thaliana (Mouse-ear cress). | Q2KM81 | 1070; 3105 | Polcalcin OS=Artemisia vulgaris PE=2 SV=1
13 | olive_transcript_018199 | C_terminal Putative | sp=Peptidyl-prolyl cis-trans isomerase 1; Glycine max (Soybean) (Glycine hispida). | Q8L5T1 | 134 | Peptidyl-prolyl cis-trans isomerase OS=Betula pendula GN=ppiase (CyP) PE=2 SV=1
14 | olive_transcript_027589 * | C_terminal Putative | sp=Profilin; Litchi chinensis (Lychee). | Q2PQ57 | 449 | Profilin OS=Litchi chinensis PE=2 SV=1
POLLEN TRANSCRIPTOME
ALLERGOME – UNIPROT ALLERGENS
New, undescribed allergens
New profilins and variants of known allergens
http://reprolive.eez.csic.es/
Semantic searches
COLLABORATION: José Aldana
  • 14. AutoFlow: workflow automation
Figure 4. Time (hours) per workflow task. Tasks shown: Total_time, Euler_assembling_k_25, Euler_assembling_k_29, MIRA3_assembling, Euler_remove_artifacts_k_25, Euler_remove_artifacts_k_259, validate_contigs_with_mapping_k_25, validate_contigs_with_mapping_k_29, rescue_unmapped_contigs_k_25, rescue_unmapped_contigs_k_29, recover_MIRA3_debris, MIRA3_remove_artifacts, CAP3_reconciliation_k_25, CAP3_reconciliation_k_29, FLN_analysis_of_CAP3_contigs_k_25, FLN_analysis_of_CAP3_contigs_k_29, TIDs, choose_best_assembly+cp_best_assembly.
AutoFlow, a Versatile Workflow Engine Illustrated by Assembling an Optimised de novo Transcriptome for a Non-Model Species, such as Faba Bean (Vicia faba)
Running title: AutoFlow, a versatile workflow engine
Pedro Seoane1, Sara Ocaña2, Rosario Carmona3, Rocío Bautista3, Eva Madrid4, Ana M. Torres2, M. Gonzalo Claros1,3,*