Elysium Technologies Private Limited
                                     ISO 9001:2008 A leading Research and Development Division
                                     Madurai | Chennai | Trichy | Coimbatore | Kollam| Singapore
                                     Website: elysiumtechnologies.com, elysiumtechnologies.info
                                     Email: info@elysiumtechnologies.com


                                     IEEE Final Year Project List 2011-2012



Abstract                                           DATA ENGINEERING                                          2011 - 2012

01        Dual Framework and Algorithms for Targeted Online Data Delivery



             A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users,
             applications, or middleware that are accessing data from multiple autonomous servers. In this paper, we develop a
             framework for formalizing and comparing pull-based solutions and present dual optimization approaches. The first
             approach, most commonly used nowadays, maximizes user utility under the strict setting of meeting a priori constraints
             on the usage of system resources. We present an alternative and more flexible approach that maximizes user utility by
             satisfying all users. It does this while minimizing the usage of system resources. We discuss the benefits of this latter
             approach and develop an adaptive monitoring solution, Satisfy User Profiles (SUP). Through formal analysis, we identify
             sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we empirically analyze the behavior
             of SUP under varying conditions. Our experiments show that SUP achieves a high degree of satisfaction of user utility
             when its estimations closely match the real event stream, and that it has the potential to save a significant amount of
             system resources. We further show that SUP can exploit feedback to improve user utility with only a moderate increase in
             resource utilization.




02         A Flexible Data and Sensor Planning Service for Virtual Sensors




            How to achieve a flexible data and sensor planning service to schedule, plan, and empower diverse sensors and
            heterogeneous data ordering systems is a big challenge. In this paper, a service-oriented framework of data and sensor
            planning service for virtual sensors is proposed. The framework includes an Open Geospatial Consortium (OGC)-compliant
            Sensor Planning Service (SPS), a Web Notification Service (WNS), a Sensor Observation Service (SOS), and virtual sensors.
            There are two key technologies in this framework, namely a flexible SPS middleware and an asynchronous
            message notification mechanism. The flexible SPS middleware, based on a configuration file and standard interfaces, is
            adopted to integrate virtual sensors into a sensor Web. A WNS-based asynchronous notification middleware is used to
            inform the user of the status of a task that may need midterm or long-term actions. The framework has been successfully
            demonstrated in application scenarios for Simplified General Perturbations Satellite Orbit Model 4 (SGP4) and Earth
            Observation System Clearing HOuse (ECHO). The results show that the proposed method has the following improvements
            over the existing SPS implementation: a uniform planning service for more satellites, a seamless connection with data
            order systems, and a flexible service-oriented framework for virtual sensors.




03          A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification




            Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this
            paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the
            feature vector of a document set are grouped into clusters, based on a similarity test. Words that are similar to each
            other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical
            mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We
            then have one extracted feature for each cluster. The extracted feature, corresponding to a cluster, is a weighted
            combination of the words contained in the cluster. By this algorithm, the derived membership functions match
            closely with and describe properly the real distribution of the training data. Besides, the user need not specify the
            number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted
            features can then be avoided. Experimental results show that our method can run faster and obtain better extracted
            features than other methods.



Madurai                                             Trichy                                              Kollam
Elysium Technologies Private Limited                Elysium Technologies Private Limited                Elysium Technologies Private Limited
230, Church Road, Annanagar,                        3rd Floor,SI Towers,                                Surya Complex,Vendor junction,
Madurai , Tamilnadu – 625 020.                      15 ,Melapudur , Trichy,                             kollam,Kerala – 691 010.
Contact : 91452 4390702, 4392702, 4394702.          Tamilnadu – 620 001.                                Contact : 91474 2723622.
eMail: info@elysiumtechnologies.com                 Contact : 91431 - 4002234.                          eMail: elysium.kollam@gmail.com
                                                    eMail: elysium.trichy@gmail.com




04         A Generic Multilevel Architecture for Time Series Prediction




            Rapidly evolving businesses generate massive amounts of time-stamped data sequences and cause a demand for both
            univariate and multivariate time series forecasting. For such data, traditional predictive models based on autoregression
            are often not sufficient to capture complex nonlinear relationships between multidimensional features and the time series
            outputs. In order to exploit these relationships for improved time series forecasting while also better dealing with a wider
            variety of prediction scenarios, a forecasting system requires a flexible and generic architecture to accommodate and tune
            various individual predictors as well as combination methods. In reply to this challenge, an architecture for combined,
            multilevel time series prediction is proposed, which is suitable for many different universal regressors and combination
            methods. The key strength of this architecture is its ability to build a diversified ensemble of individual predictors that form
            an input to a multilevel selection and fusion process before the final optimized output is obtained. Excellent generalization
            ability is achieved due to the highly boosted complementarity of individual models further enforced through cross-
            validation-linked training on exclusive data subsets and ensemble output postprocessing. In a sample configuration with
            basic neural network predictors and a mean combiner, the proposed system has been evaluated in different scenarios and
            showed a clear prediction performance gain.
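
The sample configuration mentioned above (several individual predictors fused by a mean combiner) can be illustrated with a minimal sketch. The three predictors below are simple hypothetical stand-ins chosen for brevity, not the neural network predictors of the abstract, and the multilevel selection stage is omitted.

```python
from statistics import fmean

def last_value(history):
    """Naive predictor: repeat the most recent observation."""
    return history[-1]

def moving_average(history, window=3):
    """Predict the mean of the last `window` observations."""
    return fmean(history[-window:])

def linear_drift(history):
    """Extrapolate the average step between the first and last observation."""
    if len(history) < 2:
        return history[-1]
    step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + step

def mean_combiner(predictors, history):
    """Fuse the individual one-step forecasts by taking their arithmetic mean."""
    return fmean([predict(history) for predict in predictors])

# Illustrative series (made up for this example).
series = [10.0, 12.0, 14.0, 16.0]
forecast = mean_combiner([last_value, moving_average, linear_drift], series)
print(forecast)
```

Averaging is the simplest fusion rule; it helps most when the individual predictors make errors in different directions, which is the complementarity the abstract refers to.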




05          A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases




            This work introduces a link analysis procedure for discovering relationships in a relational database or a graph,
            generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the
            database defining a Markov chain having as many states as elements in the database. Suppose we are interested in
            analyzing the relationships between some elements (or records) contained in two different tables of the relational
            database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest
            and preserving the main characteristics of the initial chain, is extracted by stochastic complementation [41]. This
            reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion map subspace [42] and
            visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are
            defined, and to multiple correspondence analysis when the database takes the form of a simple star-schema. On the
            other hand, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed
            graphs, is also introduced and the links with spectral clustering are discussed. Several data sets are analyzed by
            using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational
            databases or graphs.
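
As background for the first step, the following sketch computes a stochastic complement using its standard definition: for a transition matrix partitioned into blocks P11, P12, P21, P22 (kept states first), the reduced chain is P11 + P12 (I - P22)^-1 P21. The 3-state matrix is made up for illustration, and the helper names are hypothetical; the diffusion map projection of the second step is not shown.

```python
def solve(a, b):
    """Solve a @ x = b for matrix x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + list(map(float, b[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def stochastic_complement(P, keep):
    """Reduce the chain P to the states in `keep`: P11 + P12 (I - P22)^-1 P21."""
    drop = [i for i in range(len(P)) if i not in keep]
    P11 = [[P[i][j] for j in keep] for i in keep]
    if not drop:
        return P11
    P12 = [[P[i][j] for j in drop] for i in keep]
    P21 = [[P[i][j] for j in keep] for i in drop]
    i_minus_p22 = [[(1.0 if i == j else 0.0) - P[i][j] for j in drop] for i in drop]
    X = solve(i_minus_p22, P21)  # X = (I - P22)^-1 P21
    return [[P11[r][c] + sum(P12[r][k] * X[k][c] for k in range(len(drop)))
             for c in range(len(keep))] for r in range(len(keep))]

# Hypothetical 3-state chain; states 0 and 1 are the elements of interest.
P = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.50, 0.50, 0.00]]
reduced = stochastic_complement(P, [0, 1])
print(reduced)
```

Each row of the reduced matrix still sums to one, so the result is again a valid Markov chain over the elements of interest only.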




06         A Personalized Ontology Model for Web Information Gathering




            As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in
            personalized web information gathering. However, when representing user profiles, many models have utilized only
            knowledge from either a global knowledge base or user local information. In this paper, a personalized ontology model is
            proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from
            both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against
            benchmark models in web information gathering. The results show that this ontology model is successful.






07         Adaptive Cluster Distance Bounding for High-Dimensional Indexing



             We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a
            clustering framework. We note that indexing by “vector approximation” (VA-File), which was proposed as a
            technique to combat the “Curse of Dimensionality,” employs scalar quantization, and hence necessarily ignores
            dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand,
            exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing
            methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack
            of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive
            distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster based
            index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead and is
            applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval,
            conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality
            and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization
            resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations,
            by factors reaching 100X and more.
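
To illustrate why a separating hyperplane gives a usable pruning bound, the sketch below computes the Euclidean distance from a query to the bisecting hyperplane between two cluster centroids. Every point in the other Voronoi cell lies beyond that hyperplane, so the value is a lower bound on the distance to any of them. This is a simplified geometric illustration under the assumption of Euclidean distance, not necessarily the exact bound derived in the paper; the function name is hypothetical.

```python
import math

def hyperplane_lower_bound(query, own_centroid, other_centroid):
    """Distance from `query` to the hyperplane equidistant from both centroids.

    Returns 0.0 when the query is not on the `own_centroid` side, because
    then the hyperplane offers no pruning.
    """
    d_own_sq = sum((q - c) ** 2 for q, c in zip(query, own_centroid))
    d_other_sq = sum((q - c) ** 2 for q, c in zip(query, other_centroid))
    gap = math.dist(own_centroid, other_centroid)
    return max(0.0, (d_other_sq - d_own_sq) / (2.0 * gap))

# Made-up 2-D example: the bisecting hyperplane is the line x = 2.
query = (1.0, 0.0)
bound = hyperplane_lower_bound(query, (0.0, 0.0), (4.0, 0.0))
print(bound)
```

If the best distance found so far is smaller than this bound, the whole neighbouring cluster can be skipped without reading its points.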




08        Anonymous Publication of Sensitive Transactional Data




            Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to
            enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred
            in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low
            dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket
            data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two
            categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate
            nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing
            (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1)
            reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of
            anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using
            real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques,
            NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead.
            The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
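
The idea behind Gray encoding-based sorting can be sketched as follows: each transaction is treated as a binary vector over the item universe, the vector is read as a Gray code word, and transactions are ordered by that word's position in the Gray sequence, so that neighbours in the order tend to differ in few items. The grouping step here (cutting the sorted list into consecutive chunks of size k) is an assumption made for illustration; the abstract does not specify the paper's grouping heuristic.

```python
def gray_rank(code):
    """Return the position of `code` in the reflected binary Gray sequence."""
    rank = code
    shift = code >> 1
    while shift:
        rank ^= shift
        shift >>= 1
    return rank

def to_bitmask(transaction, num_items):
    """Encode a set of item indices as an integer; item 0 is the most significant bit."""
    mask = 0
    for item in transaction:
        mask |= 1 << (num_items - 1 - item)
    return mask

def gray_sorted_groups(transactions, num_items, k):
    """Sort transactions by Gray rank and cut the order into groups of up to k."""
    ordered = sorted(transactions, key=lambda t: gray_rank(to_bitmask(t, num_items)))
    return [ordered[i:i + k] for i in range(0, len(ordered), k)]

# Made-up transactions over three items (indices 0, 1, 2).
transactions = [{0, 1}, {0}, {2}, {1, 2}]
groups = gray_sorted_groups(transactions, num_items=3, k=2)
print(groups)
```

Sorting dominates the cost, and the grouping pass itself is linear in the number of transactions.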




09          Answering Frequent Probabilistic Inference Queries in Databases




            Existing solutions for probabilistic inference queries mainly focus on answering a single inference query, but seldom
            address the issues of efficiently returning results for a sequence of frequent queries, which is more popular and
            practical in many real applications. In this paper, we mainly study the computation caching and sharing among a
            sequence of inference queries in databases. The clique tree propagation (CTP) algorithm is first introduced in
            databases for probabilistic inference queries. We use the materialized views to cache the intermediate results of the
            previous inference queries, which might be shared with the following queries, and consequently reduce the time
            cost. Moreover, we take the query workload into account to identify the frequently queried variables. To optimize
            probabilistic inference queries with CTP, we cache these frequent query variables into the materialized views to
            maximize the reuse. Due to the existence of different query plans, we present heuristics to estimate costs and select
            the optimal query plan. Finally, we present the experimental evaluation in relational databases to illustrate the validity
            and superiority of our approaches in answering frequent probabilistic inference queries.





10         Authenticated Multistep Nearest Neighbor Search




            Multistep processing is commonly used for nearest neighbor (NN) and similarity search in applications involving
            high-dimensional data and/or costly distance computations. Today, many such applications require a proof of result
            correctness. In this setting, clients issue NN queries to a server that maintains a database signed by a trusted authority.
            The server returns the NN set along with supplementary information that permits result verification using the data set
            signature. An adaptation of the multistep NN algorithm incurs prohibitive network overhead due to the transmission of false
            hits, i.e., records that are not in the NN set, but are nevertheless necessary for its verification. In order to alleviate this
            problem, we present a novel technique that reduces the size of each false hit. Moreover, we generalize our solution for a
            distributed setting, where the database is horizontally partitioned over several servers. Finally, we demonstrate the
            effectiveness of the proposed solutions with real data sets of various dimensionalities.
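
The source of the false hits can be seen in a minimal sketch of plain (unauthenticated) multistep 1-NN search. Candidates are visited in order of a cheap lower-bounding distance and refined with the exact distance until no remaining candidate can win; every record that was refined but is not the result is a false hit. The lower bound used here (distance on the first coordinate only) is an assumption chosen for simplicity, and the signing and verification machinery of the paper is not shown.

```python
import math

def lower_bound(a, b):
    """Cheap filter distance on the first coordinate; it never exceeds the true distance."""
    return abs(a[0] - b[0])

def multistep_nn(query, records):
    """Return the nearest record and the false hits examined along the way."""
    ranked = sorted(records, key=lambda r: lower_bound(query, r))
    best, best_dist, false_hits = None, math.inf, []
    for rec in ranked:
        if lower_bound(query, rec) >= best_dist:
            break  # no remaining record can beat the current best
        dist = math.dist(query, rec)  # costly exact distance
        if dist < best_dist:
            if best is not None:
                false_hits.append(best)
            best, best_dist = rec, dist
        else:
            false_hits.append(rec)
    return best, false_hits

# Made-up 2-D records.
records = [(1.0, 5.0), (2.0, 0.0), (3.0, 3.0), (6.0, 0.0)]
nearest, false_hits = multistep_nn((0.0, 0.0), records)
print(nearest, false_hits)
```

In the authenticated setting described above, the client needs enough information about each false hit to check that it really is farther than the result, which is why shrinking the per-record payload reduces network overhead.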




11         Automatic Discovery of Personal Name Aliases from the Web



            An individual is typically referred to by numerous name aliases on the web. Accurate identification of aliases of a given
            person name is useful in various web related tasks such as information retrieval, sentiment analysis, personal name
            disambiguation, and relation extraction. We propose a method to extract aliases of a given personal name from the
            web. Given a personal name, the proposed method first extracts a set of candidate aliases. Second, we rank the
            extracted candidates according to the likelihood of a candidate being a correct alias of the given name. We propose a
            novel, automatically extracted lexical pattern-based approach to efficiently extract a large set of candidate aliases
            from snippets retrieved from a web search engine. We define numerous ranking scores to evaluate candidate aliases
            using three approaches: lexical pattern frequency, word co-occurrences in an anchor text graph, and page counts on
            the web. To construct a robust alias detection system, we integrate the different ranking scores into a single ranking
            function using ranking support vector machines. We evaluate the proposed method on three data sets: an English
            personal names data set, an English place names data set, and a Japanese personal names data set. The proposed
            method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a
            statistically significant mean reciprocal rank (MRR) of 0.67. Experiments carried out using location names and
            Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different
            types of named entities, and for different languages. Moreover, the aliases extracted using the proposed method are
            successfully utilized in an information retrieval task and improve recall by 20 percent in a relation detection task.
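
For readers unfamiliar with the metric, mean reciprocal rank averages, over all queried names, the reciprocal of the position at which the first correct alias appears in the ranked candidate list (contributing zero when none appears). The sketch below uses hypothetical placeholder names and candidates, not data from the paper.

```python
def mean_reciprocal_rank(ranked_lists, correct):
    """Average of 1/rank of the first correct candidate for each queried name."""
    total = 0.0
    for name, candidates in ranked_lists.items():
        for position, candidate in enumerate(candidates, start=1):
            if candidate in correct[name]:
                total += 1.0 / position
                break
    return total / len(ranked_lists)

# Hypothetical ranked candidates and gold aliases.
ranked_lists = {
    "name_a": ["wrong_1", "alias_a"],   # first correct alias at rank 2
    "name_b": ["alias_b", "wrong_2"],   # first correct alias at rank 1
}
correct = {"name_a": {"alias_a"}, "name_b": {"alias_b"}}
score = mean_reciprocal_rank(ranked_lists, correct)
print(score)
```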




12         Automatic Enrichment of Semantic Relation Network and Its Application to Word Sense Disambiguation




            The most fundamental step in semantic information processing (SIP) is to construct a knowledge base (KB) at the human
            level; that is, to the general understanding and conception of human knowledge. WordNet has been built to be the most
            systematic and as close to the human level as possible, and is being applied actively in various works. In our previous research,
            we found that a semantic gap exists between concept pairs of WordNet and those of the real world. This paper contains a study
            on the enrichment method to build a KB. We describe the methods and the results for the automatic enrichment of the
            semantic relation network. A rule-based method using WordNet’s glossaries and an inference method using axioms for
            WordNet relations are applied for the enrichment, and an enriched WordNet (E-WordNet) is built as the result. Our
            experimental results substantiate the usefulness of E-WordNet. An evaluation by comparison with the human level is
            attempted. Moreover, WSD-SemNet, a new word sense disambiguation (WSD) method in which E-WordNet is applied, is
            proposed and evaluated by comparing it with the state-of-the-art algorithm.






13          Branch-and-Bound for Model Selection and Its Computational Complexity



            Branch-and-bound methods are used in various data analysis problems, such as clustering, seriation and feature
            selection. Classical approaches of branch-and-bound based clustering search through combinations of various
            partitioning possibilities to optimize a clustering cost. However, these approaches are not practically useful for
            clustering of image data where the size of data is large. Additionally, the number of clusters is unknown in most of
            the image data analysis problems. By taking advantage of the spatial coherency of clusters, we formulate an
            innovative branch-and-bound approach, which solves the clustering problem as a model-selection problem. In this
            generalized approach, cluster parameter candidates are first generated by spatially coherent sampling. A
            branch-and-bound search is carried out through the candidates to select an optimal subset. This paper formulates this
            approach and investigates its average computational complexity. Improved clustering quality and robustness to
            outliers compared to the conventional iterative approach are demonstrated with experiments.




14         Measuring Client-Perceived Pageview Response Time of Internet Services




            As e-commerce services are exponentially growing, businesses need quantitative estimates of client-perceived response
            times to continuously improve the quality of their services. Current server-side nonintrusive measurement techniques are
            limited to nonsecured HTTP traffic. In this paper, we present the design and evaluation of a monitor, namely sMonitor, which
            is able to measure client-perceived response times for both HTTP and HTTPS traffic. At the heart of sMonitor is a novel
            size-based analysis method that parses live packets to delimit different webpages and to infer their response times. The
            method is based on the observation that most HTTP(S)-compatible browsers send significantly larger requests for
            container objects than those for embedded objects. sMonitor is designed to operate accurately in the presence of
            complicated browser behaviors, such as parallel downloading of multiple webpages and HTTP pipelining, as well as packet
            losses and delays. It only needs to passively collect network traffic in and out of the monitored secured services. We
            conduct comprehensive experiments across a wide range of operating conditions using live secured Internet services, on
            the PlanetLab, and on controlled networks. The experimental results demonstrate that sMonitor is able to control the
            estimation error within 6.7 percent, in comparison with the actual measured time at the client side.




15          Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints




            Most existing data stream classification techniques ignore one important aspect of stream data: arrival of a novel
            class. We address this issue and propose a data stream classification technique that integrates a novel class
            detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels
            of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of
            concept-drift, when the underlying data distributions evolve in streams. In order to determine whether an instance
            belongs to a novel class, the classification model sometimes needs to wait for more test instances to discover
            similarities among those instances. A maximum allowable wait time T_c is imposed as a time constraint to classify a
            test instance. Furthermore, most existing stream classification approaches assume that the true label of a data point
            can be accessed immediately after the data point is classified. In reality, a time delay T_l is involved in obtaining the
            true label of a data point since manual labeling is time consuming. We show how to make fast and correct
            classification decisions under these constraints and apply them to real benchmark data. Comparison with state-of-
            the-art stream classification techniques proves the superiority of our approach.







16          Classification Using Streaming Random Forests




            We consider the problem of data stream classification, where the data arrive in a conceptually infinite stream, and the
            opportunity to examine each record is brief. We introduce a stream classification algorithm that is online, running in
            amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to
            changing class boundaries (“concept drift”) in the data stream. In addition, when blocks of labeled data are short, the
            algorithm is able to judge internally whether the quality of models updated from them is good enough for deployment on
            unlabeled records, or whether further labeled records are required. Unlike most proposed stream-classification algorithms,
            multiple target classes can be handled. Experimental results on real and synthetic data show that accuracy is comparable
            to a conventional classification algorithm that sees all of the data at once and is able to make multiple passes over it.




17         CoFiDS: A Belief-Theoretic Approach for Automated Collaborative Filtering




            Automated Collaborative Filtering (ACF) refers to a group of algorithms used in recommender systems, a research topic
            that has received considerable attention due to its e-commerce applications. However, existing techniques are rarely
            capable of dealing with imperfections in user-supplied ratings. When such imperfections (e.g., ambiguities) cannot be
            avoided, designers resort to simplifying assumptions that impair the system’s performance and utility. We have developed
            a novel technique referred to as CoFiDS (Collaborative Filtering based on the Dempster-Shafer belief-theoretic framework)
            that can represent a wide variety of data imperfections, propagate them throughout the decision-making process without
            the need to make simplifying assumptions, and exploit contextual information. With its DS-theoretic predictions, the
            domain expert can either obtain a “hard” decision or can narrow the set of possible predictions to a smaller set. With its
            capability to handle data imperfections, CoFiDS widens the applicability of ACF to such critical and sensitive domains as
            medical decision support systems and defense-related applications. We describe the theoretical foundation of the system
            and report experiments with a benchmark movie data set. We explore some essential aspects of CoFiDS’ behavior and
            show that its performance compares favorably with other ACF systems.




18         Collaborative Filtering with Personalized Skylines




            Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects or
            records that are potentially most interesting to the user, assuming a single score per object. However, in various
            applications, a record (e.g., hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the
            ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different
            selection criteria. In order to enhance the flexibility of CF, we propose Collaborative Filtering Skyline (CFS), a general
            framework that combines the advantages of CF with those of the skyline operator. CFS generates a personalized skyline for
            each user based on scores of other users with similar behavior. The personalized skyline includes objects that are good on
            certain aspects, and eliminates the ones that are not interesting on any attribute combination. Although the integration of
            skylines and CF has several attractive properties, it also involves rather expensive computations. We face this challenge
            through a comprehensive set of algorithms and optimizations that reduce the cost of generating personalized skylines. In
            addition to exact skyline processing, we develop an approximate method that provides error guarantees. Finally, we
            propose the top-k personalized skyline, where the user specifies the required output cardinality.






19        Comprehensive Citation Index for Research Networks



            The existing Science Citation Index only counts direct citations, whereas PageRank disregards the number of direct
            citations. We propose a new Comprehensive Citation Index (CCI) that evaluates both direct and indirect intellectual
            influence of research papers, and show that CCI is more reliable in discovering research papers with far-reaching
            influence.




20         Constrained Skyline Query Processing against Distributed Data




            The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper,
            we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are
            distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into
            incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result.
            We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned
            site groups. We also employ intragroup optimization and a multifiltering technique to improve skyline query processing
            within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which
            help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via
            network connections is reduced, and thus, the overall query response time is shortened further. Cost models and heuristics
            are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is
            developed to determine how many filtering points to use for a particular data site.
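
            The dominance test and the filtering-point idea described above can be sketched in Python as follows. This is a
            minimal, centralized illustration and not the PaDSkyline algorithm: the function names, the sample data, and the
            convention that smaller values are better on every dimension are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q on every dimension and strictly better on one.

    Assumption for this sketch: smaller values are better.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def local_skyline_with_filters(local_points: List[Point],
                               filters: List[Point]) -> List[Point]:
    """Hypothetical helper: a site discards points dominated by the filtering
    points shipped with the query, then computes its local skyline."""
    survivors = [p for p in local_points
                 if not any(dominates(f, p) for f in filters)]
    return skyline(survivors)


# Illustrative data: (price, distance) pairs, both to be minimized.
hotels = [(50.0, 3.0), (80.0, 1.0), (60.0, 4.0), (90.0, 2.0)]
result = skyline(hotels)
```

            Points removed by the filtering step never leave the site, which is the source of the reduced network traffic
            that the abstract mentions.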




21         Continuous Monitoring of Distance-Based Range Queries




            Given a positive value r, a distance-based range query returns the objects that lie within the distance r of the query
            location. In this paper, we focus on the distance-based range queries that continuously change their locations in a
            Euclidean space. We present an efficient and effective monitoring technique based on the concept of a safe zone.
            The safe zone of a query is the area with a property that while the query remains inside it, the results of the query
            remain unchanged. Hence, the query does not need to be reevaluated unless it leaves the safe zone. Our
            contributions are as follows: 1) We propose a technique based on powerful pruning rules and a unique access order
            which efficiently computes the safe zone and minimizes the I/O cost. 2) We theoretically determine and
            experimentally verify the expected distance a query moves before leaving the safe zone and, for the majority of queries,
            the expected number of guard objects. 3) Our experiments demonstrate that the proposed approach is close to
            optimal and is an order of magnitude faster than a naive algorithm. 4) We also extend our technique to monitor the
            queries in a road network. Our algorithm is up to two orders of magnitude faster than a naive algorithm.
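
            To make the safe-zone idea concrete, the Python sketch below evaluates a distance-based range query and computes a
            deliberately conservative circular safe zone. It is an assumption-laden simplification, not the paper's technique:
            by the triangle inequality, the result cannot change while the query has moved less than the smallest gap between
            any object's distance and r. The function names and sample objects are hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Location = Tuple[float, float]


def range_query(q: Location, objects: Dict[str, Location], r: float) -> Set[str]:
    """Return the names of all objects within Euclidean distance r of q."""
    return {name for name, pos in objects.items() if math.dist(q, pos) <= r}


def conservative_safe_radius(q: Location, objects: Dict[str, Location],
                             r: float) -> float:
    """Radius of a circle around q inside which the result is unchanged.

    Moving the query by delta changes each object's distance by at most delta,
    so no object crosses the range boundary while delta is smaller than the
    minimum |dist - r|. The true safe zone is generally larger than this circle.
    """
    return min((abs(math.dist(q, pos) - r) for pos in objects.values()),
               default=math.inf)


# Illustrative data only.
objects = {"a": (1.0, 0.0), "b": (0.0, 3.0), "c": (5.0, 0.0)}
inside = range_query((0.0, 0.0), objects, 2.0)
radius = conservative_safe_radius((0.0, 0.0), objects, 2.0)
```

            A monitoring loop would re-run range_query only after the query has moved at least radius from the point where the
            radius was computed.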




22         Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme




            E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In
            recent years, the notion of collaborative spam filtering with a near-duplicate similarity matching scheme has been widely
            discussed. The primary idea of the similarity matching scheme for spam detection is to maintain a known spam database,
            formed by user feedback, to block subsequent near-duplicate spam. To achieve efficient similarity matching
            and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from e-mail
            content text. However, these abstractions of e-mails cannot fully capture the evolving nature of spam, and are thus not
            effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme, which considers
            e-mail layout structure to represent e-mails. We present a procedure to generate the e-mail abstraction using HTML content
            in e-mail, and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spam.
            Moreover, we design a complete spam detection system, Cosdes (standing for COllaborative Spam DEtection System),
            which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update
            scheme enables Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes
            on a live data set collected from a real e-mail server and show that our system outperforms the prior approaches in
            detection results and is applicable to the real world.




23        Coupling Logical Analysis of Data and Shadow Clustering for Partially Defined Positive Boolean Function Reconstruction




            The problem of reconstructing the AND-OR expression of a partially defined positive Boolean function (pdpBf) is
            solved by adopting a novel algorithm, denoted by LSC, which combines the advantages of two efficient techniques,
            Logical Analysis of Data (LAD) and Shadow Clustering (SC). The kernel of the approach followed by LAD consists in
            a breadth-first enumeration of all the prime implicants whose degree is not greater than a fixed maximum d. In
            contrast, SC adopts an effective heuristic procedure for retrieving the most promising logical products to be
            included in the resulting AND-OR expression. Since the computational cost required by LAD prevents its application
            even for relatively small dimensions of the input domain, LSC employs a depth-first approach, with asymptotically
            linear memory occupation, to analyze the prime implicants having degree not greater than d. In addition, the
            theoretical analysis proves that LSC presents almost the same asymptotic time complexity as LAD. Extensive
            simulations on artificial benchmarks validate the good behavior of the computational cost exhibited by LSC, in
            agreement with the theoretical analysis. Furthermore, the pdpBf retrieved by LSC always shows better
            performance, in terms of complexity and accuracy, than those obtained by LAD.




24        Data Leakage Detection




            We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third
            parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The
            distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been
            independently gathered by other means. We propose data allocation strategies (across the agents) that improve the
            probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In
            some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and
            identifying the guilty party.




25        Decision Trees for Uncertain Data



            Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers
            to handle data with uncertain information. Value uncertainty arises in many applications during the data collection
            process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple
            repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but
            by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives
            (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the
            “complete information” of a data item (taking into account the probability density function (pdf)) is utilized. We
            extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments
            have been conducted which show that the resulting classifiers are more accurate than those using value averages.
            Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree
            construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose
            a series of pruning techniques that can greatly improve construction efficiency.






26        Design and Implementation of an Intrusion Response System for Relational Databases




            The intrusion response component of an overall intrusion detection system is responsible for issuing a suitable response
            to an anomalous request. We propose the notion of database response policies to support our intrusion response system
            tailored for a DBMS. Our interactive response policy language makes it very easy for the database administrators to specify
            appropriate response actions for different circumstances depending upon the nature of the anomalous request. The two
            main issues that we address in the context of such response policies are policy matching and policy administration. For
            the policy matching problem, we propose two algorithms that efficiently search the policy database for policies that match
            an anomalous request. We also extend the PostgreSQL DBMS with our policy matching mechanism, and report
            experimental results. The experimental evaluation shows that our techniques are very efficient. The other issue that we
            address is that of administration of response policies to prevent malicious modifications to policy objects from legitimate
            users. We propose a novel Joint Threshold Administration Model (JTAM) that is based on the principle of separation of
            duty. The key idea in JTAM is that a policy object is jointly administered by at least k database administrators (DBAs), that is,
            any modification made to a policy object will be invalid unless it has been authorized by at least k DBAs. We present design
            details of JTAM which is based on a cryptographic threshold signature scheme, and show how JTAM prevents malicious
            modifications to policy objects from authorized users. We also implement JTAM in the PostgreSQL DBMS, and report
            experimental results on the efficiency of our techniques.




27         Differential Privacy via Wavelet Transforms




            Privacy-preserving data publishing has attracted considerable research interest in recent years. Among the existing
            solutions, ε-differential privacy provides the strongest privacy guarantee. Existing data publishing methods that
            achieve ε-differential privacy, however, offer little data utility. In particular, if the output data set is used to answer
            count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders
            the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while
            providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a
            range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it.
            We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical
            analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data,
            we show the effectiveness and efficiency of our solution.
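
            The "transform, then add noise" pipeline can be illustrated with the Python sketch below, which applies a Haar
            wavelet transform to a histogram of counts, perturbs the coefficients with Laplace noise, and inverts the transform.
            It is only a sketch under stated assumptions: the function names are hypothetical, and the noise scale is a free
            parameter here. Calibrating the noise per coefficient so that ε-differential privacy actually holds is the subject
            of the paper's analysis and is not reproduced.

```python
import random
from typing import List


def haar(values: List[float]) -> List[float]:
    """Haar transform: [overall average, detail coefficients coarse to fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current, details = list(values), []
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def inverse_haar(coeffs: List[float]) -> List[float]:
    """Invert haar(): rebuild the values level by level."""
    current, pos = coeffs[:1], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
    return current


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_histogram(counts: List[float], scale: float,
                    rng: random.Random) -> List[float]:
    """Add noise in the wavelet domain (scale is an uncalibrated assumption)."""
    return inverse_haar([c + laplace(rng, scale) for c in haar(counts)])


def range_count(hist: List[float], lo: int, hi: int) -> float:
    """Answer a range-count query over bins lo..hi inclusive."""
    return sum(hist[lo:hi + 1])


counts = [4.0, 2.0, 5.0, 7.0]
coeffs = haar(counts)
published = noisy_histogram(counts, scale=1.0, rng=random.Random(0))
```

            Range-count queries are then answered with range_count on the published histogram rather than on the raw counts.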




28         Discovering Activities to Recognize and Track in a Smart Environment




            The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for
            providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to
            monitor the functional health of smart home residents, we need to design technologies that recognize and track activities
            that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, the
            approaches are applied to activities that have been preselected and for which labeled training data are available. In
            contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in
            an individual’s routine. With this capability, we can then track the occurrence of regular activities to monitor functional
            health and to detect changes in an individual’s patterns and lifestyle. In this paper, we describe our activity mining and
            tracking approach, and validate our algorithms on data collected in physical smart environments.






29         Discovering Conditional Functional Dependencies




            This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension
            of functional dependencies (FDs) that supports patterns of semantically related constants, and can be used as rules
            for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual
            effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already
            hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces
            new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on
            techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns
            only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data
            integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as
            CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as
            FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-
            item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders
            of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation
            is large, but it does not scale well with the arity of the relation. FastCFD is far more efficient than CTANE when the
            arity of the relation is large; better still, leveraging optimization based on closed-item-set mining, FastCFD also
            scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools from which users can
            choose for different applications.
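
            As an illustration of how a constant CFD acts as a cleaning rule, the Python sketch below checks one such rule
            against a small relation. It shows rule checking only, not the CFDMiner, CTANE, or FastCFD discovery algorithms;
            the attribute names, the sample tuples, and the helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def matches(row: Row, lhs: Dict[str, str]) -> bool:
    """True if the tuple carries every constant on the rule's left-hand side."""
    return all(row.get(attr) == value for attr, value in lhs.items())


def support(rows: List[Row], lhs: Dict[str, str]) -> int:
    """Number of tuples that the left-hand-side pattern applies to."""
    return sum(1 for row in rows if matches(row, lhs))


def violations(rows: List[Row], lhs: Dict[str, str],
               rhs: Tuple[str, str]) -> List[Row]:
    """Tuples that match the left-hand side but disagree with the right-hand side."""
    attr, expected = rhs
    return [row for row in rows if matches(row, lhs) and row.get(attr) != expected]


# Illustrative relation: country code, area code, city.
rows = [
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "44", "ac": "131", "city": "LDN"},
    {"cc": "01", "ac": "908", "city": "MH"},
]
# Constant CFD: (cc = 44, ac = 131) -> city = EDI
rule_lhs, rule_rhs = {"cc": "44", "ac": "131"}, ("city", "EDI")
dirty = violations(rows, rule_lhs, rule_rhs)
```

            Each tuple returned by violations is a candidate for repair during data cleaning.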




30        Effective Navigation of Query Results Based on Concept Hierarchies




            Search queries on biomedical databases, such as PubMed, often return a large number of results, only a small
            subset of which is relevant to the user. Ranking and categorization, which can also be combined, have been
            proposed to alleviate this information overload problem. Results categorization for biomedical databases is the focus
            of this work. A natural way to organize biomedical citations is according to their MeSH annotations. MeSH is a
            comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav system, a novel search
            interface that enables the user to navigate a large number of query results by organizing them using the MeSH concept
            hierarchy. First, the query results are organized into a navigation tree. At each node expansion step, BioNav reveals
            only a small subset of the concept nodes, selected such that the expected user navigation cost is minimized. In
            contrast, previous works expand the hierarchy in a predefined static manner, without navigation cost modeling. We
            show that the problem of selecting the best concepts to reveal at each node expansion is NP-complete and propose
            an efficient heuristic as well as a feasible optimal algorithm for relatively small trees. We show experimentally that
            BioNav outperforms state-of-the-art categorization systems by up to an order of magnitude, with respect to the user
            navigation cost.




31         Efficient Periodicity Mining in Time Series Databases Using Suffix Trees




            Periodic pattern mining or periodicity detection has a number of applications, such as prediction, forecasting,
            detection of unusual activities, etc. The problem is not trivial because the data to be analyzed are mostly noisy and
            different periodicity types (namely symbol, sequence, and segment) are to be investigated. Accordingly, we argue
            that there is a need for a comprehensive approach capable of analyzing the whole time series or a subsection of it,
            effectively handling different types of noise (to a certain degree) while detecting different
            types of periodic patterns; combining these under one umbrella is by itself a challenge. In this paper, we present an
            algorithm which can detect symbol, sequence (partial), and segment (full cycle) periodicity in time series. The
            algorithm uses a suffix tree as the underlying data structure; this allows us to design the algorithm such that its
            worst-case complexity is O(k·n²), where k is the maximum length of periodic pattern and n is the length of the
            analyzed portion (whole or subsection) of the time series. The algorithm is noise resilient; it has been successfully
            demonstrated to work with replacement, insertion, deletion, or a mixture of these types of noise. We have tested the
            proposed algorithm on both synthetic and real data from different domains, including protein sequences. The
            conducted comparative study demonstrates the applicability and effectiveness of the proposed algorithm; it is
            generally more time-efficient and noise-resilient than existing algorithms.
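
            Symbol periodicity, the simplest of the three types named above, can be illustrated with the brute-force Python
            check below. It is not the suffix-tree algorithm from the paper: it tests a single candidate (symbol, period,
            start position) and reports the fraction of expected positions where the symbol occurs, so noise shows up as a
            confidence below 1. The function name and the sample series are assumptions.

```python
def symbol_periodicity(series: str, symbol: str, period: int, start: int) -> float:
    """Confidence that `symbol` recurs every `period` steps from `start`.

    Returns hits / expected positions, or 0.0 if there are no positions to check.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    positions = range(start, len(series), period)
    if not positions:
        return 0.0
    hits = sum(1 for i in positions if series[i] == symbol)
    return hits / len(positions)


# Illustrative series: 'c' at index 5 has been replaced by 'd' (replacement noise).
series = "abcabdabc"
conf_a = symbol_periodicity(series, "a", period=3, start=0)
conf_c = symbol_periodicity(series, "c", period=3, start=2)
```

            A detector would accept a candidate when its confidence exceeds a user-chosen threshold.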




32         Efficient Relevance Feedback for Content-Based Image Retrieval by Mining User Navigation Patterns




            Nowadays, content-based image retrieval (CBIR) is the mainstay of image retrieval systems. To be more profitable,
            relevance feedback techniques were incorporated into CBIR such that more precise results can be obtained by taking
            user feedback into account. However, existing relevance feedback-based CBIR methods usually require a number of
            feedback iterations to produce refined search results, especially in a large-scale image database. This is impractical and
            inefficient in real applications. In this paper, we propose a novel method, Navigation-Pattern-based Relevance Feedback
            (NPRF), to achieve high efficiency and effectiveness of CBIR in coping with large-scale image data. In terms of
            efficiency, the iterations of feedback are reduced substantially by using the navigation patterns discovered from the user
            query log. In terms of effectiveness, our proposed search algorithm NPRFSearch makes use of the discovered navigation
            patterns and three kinds of query refinement strategies, Query Point Movement (QPM), Query Reweighting (QR), and Query
            Expansion (QEX), to converge the search space toward the user’s intention effectively. By using the NPRF method, high-quality
            image retrieval with RF can be achieved in a small number of feedback rounds. The experimental results reveal that NPRF
            outperforms other existing methods significantly in terms of precision, coverage, and number of feedbacks.




33        Efficient Techniques for Online Record Linkage




            The need to consolidate the information contained in heterogeneous data sources has been widely documented in
            recent years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems,
            especially the entity heterogeneity problem that arises when the same real-world entity type is represented using
            different identifiers in different data sources. Statistical record linkage techniques could be used for resolving this
            problem. However, the use of such techniques for online record linkage could pose a tremendous communication
            bottleneck in a distributed environment (where entity heterogeneity problems are often encountered). In order to
            resolve this issue, we develop a matching tree, similar to a decision tree, and use it to propose techniques that
            reduce the communication overhead significantly, while providing matching decisions that are guaranteed to be the
            same as those obtained using the conventional linkage technique. These techniques have been implemented, and
            experiments with real-world and synthetic databases show significant reduction in communication overhead.




34        Efficient Top-k Approximate Subtree Matching in Small Memory




            We consider the Top-k Approximate Subtree Matching (TASM) problem: finding the k best matches of a small query tree
            within a large document tree using the canonical tree edit distance as a similarity measure between subtrees. Evaluating
            the tree edit distance for large XML trees is difficult: the best known algorithms have cubic runtime and quadratic space
            complexity, and, thus, do not scale. Our solution is TASM-postorder, a memory-efficient and scalable TASM algorithm. We
            prove an upper bound for the maximum subtree size for which the tree edit distance needs to be evaluated. The upper
            bound depends on the query and is independent of the document size and structure. A core problem is to efficiently prune
            subtrees that are above this size threshold. We develop an algorithm based on the prefix ring buffer that allows us to prune
            all subtrees above the threshold in a single postorder scan of the document. The size of the prefix ring buffer is linear in the
            threshold. As a result, the space complexity of TASM-postorder depends only on k and the query size, and the runtime of
            TASM-postorder is linear in the size of the document. Our experimental evaluation on large synthetic and real XML
             documents confirms our analytic results.




35         Energy Time Series Forecasting Based on Pattern Sequence Similarity




             This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences.
             First, clustering techniques are used with the aim of grouping and labeling the samples from a data set. Thus, the
             prediction of a data point is provided as follows: first, the pattern sequence prior to the day to be predicted is
             extracted. Then, this sequence is searched in the historical data and the prediction is calculated by averaging all the
             samples immediately after the matched sequence. The main novelty is that only the labels associated with each
             pattern are considered to forecast the future behavior of the time series, avoiding the use of real values of the time
             series until the last step of the prediction process. Results from several energy time series are reported and the
             performance of the proposed method is compared to that of recently published techniques showing a remarkable
             improvement in the prediction.
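
             The prediction steps described above can be sketched in a few lines. This is a minimal illustration, not the
             authors' implementation: it assumes the cluster labels have already been computed by some clustering method,
             and the names psf_forecast, days, labels, and window are hypothetical. What happens when no match is found is
             not covered by the abstract, so the sketch simply returns None in that case.

```python
def psf_forecast(days, labels, window):
    """Forecast the next day from label-sequence matches (illustrative sketch).

    days   -- list of per-day value lists (the real values of the series)
    labels -- one cluster label per day, same length as days (assumed given)
    window -- length of the label sequence used as the search pattern
    """
    if len(days) != len(labels) or not 1 <= window <= len(labels):
        raise ValueError("days and labels must align, and window must fit")

    # Pattern sequence: the labels of the last `window` days.
    pattern = labels[-window:]

    # Search the history for the same label sequence and collect the day
    # that immediately follows each match. Only labels are compared here.
    followers = []
    for start in range(len(labels) - window):
        if labels[start:start + window] == pattern:
            followers.append(days[start + window])

    if not followers:
        return None  # no match; fallback strategy is left open

    # Real values are used only in this last step: average the followers.
    horizon = len(followers[0])
    return [sum(day[h] for day in followers) / len(followers)
            for h in range(horizon)]


history = [[1, 2], [5, 6], [3, 4], [1, 2], [5, 6], [7, 8], [1, 2], [5, 6]]
day_labels = [0, 1, 0, 0, 1, 2, 0, 1]
print(psf_forecast(history, day_labels, window=2))  # → [5.0, 6.0]
```

             In the toy data, the label pattern [0, 1] occurs twice earlier in the history, followed by the days [3, 4] and
             [7, 8], so the forecast is their element-wise mean.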




37         Estimating and Enhancing Real-Time Data Service Delays: Control-Theoretic Approaches




             It is essential to process real-time data service requests such as stock quotes and trade transactions in a timely
             manner using fresh data, which represent the current real-world phenomena such as the stock market status. Users
             may simply leave when the database service delay is excessive. Also, temporally inconsistent data may give an
             outdated view of the real-world status. However, supporting the desired timeliness and freshness is challenging due
             to dynamic workloads. To address the problem, we present new approaches for 1) database backlog estimation, 2)
             fine-grained closed-loop admission control based on the backlog model, and 3) incoming load smoothing. Our
             backlog estimation and control-theoretic approaches aim to support the desired service delay bound without
             degrading the data freshness, critical for real-time data services. Specifically, we design, implement, and evaluate
             two feedback controllers based on linear control theory and fuzzy logic control theory, to meet the desired service
             delay. Workload smoothing, under overload, helps the database admit and process more transactions in a timely
             fashion by probabilistically reducing the burstiness of incoming data service requests. In terms of the data service
             delay and throughput, our closed-loop admission control and probabilistic load smoothing schemes considerably
             outperform several baselines in the experiments undertaken in a stock trading database testbed.
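
             As a rough illustration of closed-loop admission control, the sketch below adjusts the number of admitted
             requests from the gap between a target delay and the measured delay. It is a generic proportional-integral
             style loop under assumed names (AdmissionController, kp, ki), not the backlog-based controllers designed in the
             paper, and the gains are arbitrary.

```python
class AdmissionController:
    """PI-style feedback loop for an admitted-request rate (illustrative)."""

    def __init__(self, target_delay, kp, ki):
        self.target_delay = target_delay  # desired service delay bound
        self.kp = kp                      # proportional gain (assumed value)
        self.ki = ki                      # integral gain (assumed value)
        self.integral = 0.0               # accumulated delay error

    def update(self, measured_delay, admitted_rate):
        # Positive error: the service is faster than required, admit more.
        # Negative error: the delay bound is violated, admit less.
        error = self.target_delay - measured_delay
        self.integral += error
        adjustment = self.kp * error + self.ki * self.integral
        # The admitted rate can never drop below zero.
        return max(0.0, admitted_rate + adjustment)


controller = AdmissionController(target_delay=1.0, kp=10.0, ki=0.0)
print(controller.update(measured_delay=1.5, admitted_rate=100.0))  # → 95.0
```

             With the integral gain set to zero, the example reduces to a purely proportional correction, which makes the
             direction of the adjustment easy to check by hand.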




38        Experience Transfer for the Configuration Tuning in Large-Scale Computing Systems




             This paper proposes a new strategy, the experience transfer, to facilitate the management of large-scale computing
             systems. It deals with the utilization of management experiences in one system (or previous systems) to benefit the same
             management task in other systems (or current systems). We use the system configuration tuning as a case application to
             demonstrate all procedures involved in the experience transfer including the experience representation, experience
             extraction, and experience embedding. The dependencies between system configuration parameters are treated as
             transferable experiences in the configuration tuning for two reasons: 1) because such knowledge is helpful to the efficiency
             of the optimal configuration search, and 2) because the parameter dependencies are typically unchanged between two
             similar systems. We use the Bayesian network to model configuration dependencies and present a configuration tuning
             algorithm based on the Bayesian network construction and sampling. As a result, after the configuration tuning is
             completed in the original system, we can obtain a Bayesian network as the by-product which records the dependencies
             between system configuration parameters. Such a network is then embedded into the tuning process in other similar
             systems as transferred experiences to improve the configuration search efficiency. Experimental results in a web-based
             system show that with the help of transferred experiences, the configuration tuning process can be significantly
             accelerated.




39        Exploring Application-Level Semantics for Data Compression




             Natural phenomena show that many creatures form large social groups and move in regular patterns. However,
             previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first
             propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their
             movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D,
             which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression
             algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a
             Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction
             phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal
             solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental
             results show that the proposed compression algorithm leverages the group movement patterns to reduce the
             amount of delivered data effectively and efficiently.




41        Finding Correlated Biclusters from Gene Expression Data




             Extracting biologically relevant information from DNA microarrays is a very important task for drug development and
             test, function annotation, and cancer diagnosis. Various clustering methods have been proposed for the analysis of
             gene expression data, but when analyzing the large and heterogeneous collections of gene expression data,
             conventional clustering algorithms often cannot produce a satisfactory solution. Biclustering algorithms have been
             presented as an alternative approach to standard clustering techniques to identify local structures from gene
            expression data sets. These patterns may provide clues about the main biological processes associated with different
            physiological states. In this paper, different from existing bicluster patterns, we first introduce a more general
            pattern: correlated bicluster, which has an intuitive biological interpretation. Then, we propose a novel transform
            technique based on singular value decomposition so that the problem of identifying correlated biclusters in a gene
            expression matrix is transformed into two global clustering problems. The Mixed-Clustering algorithm and the Lift
            algorithm are devised to efficiently produce δ-corBiclusters. The biclusters obtained using our method from gene
            expression data sets of multiple human organs and the yeast Saccharomyces cerevisiae demonstrate clear
            biological meanings.




42        Frequent Item Computation on a Chip




            Computing frequent items is an important problem by itself and as a subroutine in several data mining algorithms. In
            this paper, we explore how to accelerate the computation of frequent items using field-programmable gate arrays (FPGAs)
            with a threefold goal: increase performance over existing solutions, reduce energy consumption over CPU-based systems,
            and explore the design space in detail as the constraints on FPGAs are very different from those of traditional software-
            based systems. We discuss three design alternatives, each one of them exploiting different FPGA features and each one
            providing different performance/scalability trade-offs. An important result of the paper is to demonstrate how the inherent
            massive parallelism of FPGAs can improve performance of existing algorithms but only after a fundamental redesign of the
            algorithms. Our experimental results show that, e.g., the pipelined solution we introduce can reach more than 100 million
            tuples per second of sustained throughput (four times the best available results to date) by making use of techniques that
            are not available to CPU-based solutions. Moreover, and unlike in software approaches, the high throughput is independent
            of the skew of the Zipf distribution of the input and comes at a far lower energy cost.
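
            For readers unfamiliar with the frequent item problem, the sketch below shows one standard software approach, the
            Misra-Gries summary. It is offered only as background on the problem: the abstract does not say which algorithm
            the FPGA designs implement, and the function name misra_gries and parameter k are choices made for this example.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Every item occurring more than len(stream) / k times is guaranteed to
    be among the returned keys; the stored counts are lower bounds.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1          # already monitored: count it
        elif len(counters) < k - 1:
            counters[item] = 1           # free counter: start monitoring
        else:
            # No free counter: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries("aabacadaea", k=3))  # → {'a': 4}
```

            The decrement step touches every counter, which is the kind of work that a sequential CPU loop handles poorly and
            that parallel hardware could in principle perform at once.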




43       Inconsistency-Tolerant Integrity Checking




            All methods for efficient integrity checking require all integrity constraints to be totally satisfied, before any update is
            executed. However, a certain amount of inconsistency is the rule, rather than the exception in databases. In this
            paper, we close the gap between theory and practice of integrity checking, i.e., between the unrealistic theoretical
            requirement of total integrity and the practical need for inconsistency tolerance, which we define for integrity
            checking methods. We show that most of them can still be used to check whether updates preserve integrity, even if
            the current state is inconsistent. Inconsistency-tolerant integrity checking proves beneficial both for integrity
            preservation and query answering. Also, we show that it is useful for view updating, repairs, schema evolution, and
            other applications.




44        Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks




            For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or
            approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which
             algorithms are effective for initialization? When should the search process be restarted? In the present work, we investigate
             these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian
             networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While
             the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a
             novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture
             models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search
             algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE
             search time for application BNs by several orders of magnitude compared to using uniform at random initialization without
             restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree
             clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.
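
             The roles of initialization and restart can be illustrated with a toy search loop. The sketch below is plain
             greedy hill climbing over bit vectors with random restarts; it is not Stochastic Greedy Search and does not use
             the Viterbi-based initialization from the paper. The score function, the init callback, and all parameter names
             are assumptions made for this example.

```python
import random


def local_search_with_restart(score, n_vars, init, max_flips, restarts, rng):
    """Greedy bit-flip search, restarted from a fresh initial state each run."""
    best_state, best_score = None, float("-inf")
    for _ in range(restarts):
        state = init(rng)                 # initialization strategy is pluggable
        current = score(state)
        for _ in range(max_flips):
            # Evaluate every single-bit flip and keep the best one.
            candidates = []
            for i in range(n_vars):
                state[i] ^= 1
                candidates.append((score(state), i))
                state[i] ^= 1             # undo the trial flip
            new_score, index = max(candidates)
            if new_score <= current:
                break                     # local optimum reached: restart
            state[index] ^= 1
            current = new_score
        if current > best_score:
            best_state, best_score = list(state), current
    return best_state, best_score


def uniform_init(rng, n=6):
    # Uniform-at-random initialization, the baseline named in the abstract.
    return [rng.randint(0, 1) for _ in range(n)]


state, value = local_search_with_restart(
    sum, 6, uniform_init, max_flips=10, restarts=3, rng=random.Random(0))
print(state, value)  # → [1, 1, 1, 1, 1, 1] 6
```

             The toy objective (the number of ones) has a single optimum, so every run finds it; on harder objectives the
             choice of init and the restart budget decide how quickly a good solution is reached.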




45        Integration of the HL7 Standard in a Multiagent System to Support Personalized Access to e-Health Services




47        Intertemporal Discount Factors as a Measure of Trustworthiness in Electronic Commerce




             In multiagent interactions, such as e-commerce and file sharing, being able to accurately assess the trustworthiness
             of others is important for agents to protect themselves from losing utility. Focusing on rational agents in e-
             commerce, we prove that an agent’s discount factor (time preference of utility) is a direct measure of the agent’s
             trustworthiness for a set of reasonably general assumptions and definitions. We propose a general list of desiderata
             for trust systems and discuss how discount factors as trustworthiness meet these desiderata. We discuss how
             discount factors are a robust measure when entering commitments that exhibit moral hazards. Using an online
             market as a motivating example, we derive some analytical methods both for measuring discount factors and for
             aggregating the measurements.





48        IR-Tree: An Efficient Index for Geographic Document Search




            Given a geographic query that is composed of query keywords and a location, a geographic search engine retrieves
            documents that are the most textually and spatially relevant to the query keywords and the location, respectively, and ranks
            the retrieved documents according to their joint textual and spatial relevance to the query. The lack of an efficient index
            that can simultaneously handle both the textual and spatial aspects of the documents makes existing geographic search
            engines inefficient in answering geographic queries. In this paper, we propose an efficient index, called IR-tree, that
            together with a top-k document search algorithm facilitates four major tasks in document searches, namely, 1) spatial
            filtering, 2) textual filtering, 3) relevance computation, and 4) document ranking in a fully integrated manner. In addition, IR-
            tree allows searches to adopt different weights on textual and spatial relevance of documents at the runtime and thus
            caters for a wide variety of applications. A set of comprehensive experiments over a wide range of scenarios has been
            conducted and the experimental results demonstrate that IR-tree outperforms the state-of-the-art approaches for geographic
            document searches.
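
            The idea of ranking by a weighted mix of textual and spatial relevance can be shown without any index. The sketch
            below scores every document by brute force; it does not implement the IR-tree itself. The scoring formulas
            (keyword overlap, inverse distance), the weight alpha, and the document layout are all assumptions made for this
            example.

```python
import heapq
import math


def joint_score(doc, keywords, location, alpha):
    """Weighted sum of a textual and a spatial relevance score (illustrative)."""
    words = set(doc["text"].lower().split())
    # Textual relevance: fraction of (lowercase) query keywords in the document.
    textual = sum(1 for word in keywords if word in words) / len(keywords)
    # Spatial relevance: decays with Euclidean distance from the query location.
    spatial = 1.0 / (1.0 + math.dist(doc["loc"], location))
    return alpha * textual + (1.0 - alpha) * spatial


def top_k(docs, keywords, location, k, alpha=0.5):
    """Return the k best documents; alpha is chosen at query time."""
    return heapq.nlargest(
        k, docs, key=lambda doc: joint_score(doc, keywords, location, alpha))


docs = [
    {"id": "near", "text": "coffee shop", "loc": (0.0, 0.0)},
    {"id": "far", "text": "pizza coffee", "loc": (3.0, 4.0)},
]
print(top_k(docs, ["pizza"], (0.0, 0.0), k=1, alpha=1.0)[0]["id"])  # → far
print(top_k(docs, ["pizza"], (0.0, 0.0), k=1, alpha=0.0)[0]["id"])  # → near
```

            Changing alpha at query time flips the winner in this toy data set, which mirrors the runtime weighting the
            abstract describes; an index such as the IR-tree aims to reach the same ranking without scoring every document.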




49       Knowledge Discovery in Services (KDS): Aggregating Software Services to Discover Enterprise Mashups




            Service mashup is the act of integrating the resulting data of two complementary software services into a common
            picture. Such an approach is promising with respect to the discovery of new types of knowledge. However, before
            service mashup routines can be executed, it is necessary to predict which services (of an open repository) are viable
            candidates. Similar to Knowledge Discovery in Databases (KDD), we introduce the Knowledge Discovery in Services
            (KDS) process that identifies mashup candidates. In this work, the KDS process is specialized to address a
            repository of open services that do not contain semantic annotations. In these situations, specialized techniques are
            required to determine equivalences among open services with reasonable precision. This paper introduces a bottom-
            up process for KDS that adapts to the environment of services for which it operates. Detailed experiments are
            discussed that evaluate KDS techniques on an open repository of services from the Internet and on a repository of
            services created in a controlled environment.




50        Learning Semi-Riemannian Metrics for Semisupervised Feature Extraction




            Discriminant feature extraction plays a central role in pattern recognition and classification. Linear Discriminant Analysis
            (LDA) is a traditional algorithm for supervised feature extraction. Recently, unlabeled data have been utilized to improve
            LDA. However, the intrinsic problems of LDA still exist and only the similarity among the unlabeled data is utilized. In this
            paper, we propose a novel algorithm, called Semisupervised Semi-Riemannian Metric Map (S3RMM), following the
            geometric framework of semi-Riemannian manifolds. S3RMM maximizes the discrepancy of the separability and similarity
            measures of scatters formulated by using semi-Riemannian metric tensors. The metric tensor of each sample is learned via
            semisupervised regression. Our method can also serve as a general framework for deriving new semisupervised algorithms
            from existing discrepancy-criterion-based algorithms. Experiments on faces and handwritten digits show that S3RMM is
            promising for semisupervised feature extraction.
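
            The idea of a discrepancy criterion can be illustrated with a small, fully supervised toy sketch. This is not
            S3RMM: it learns no metric tensors and uses no unlabeled data. It assumes between-class scatter as a stand-in
            for the separability measure and within-class scatter as a stand-in for the similarity measure, and the function
            names and sample points are hypothetical. It scans unit directions in 2D and keeps the one that maximizes the
            difference of the two scatters after projection.

```python
import math


def scatter_1d(points, labels, w):
    """Return (between, within) scatter of 2D points projected onto direction w."""
    proj = [p[0] * w[0] + p[1] * w[1] for p in points]
    overall = sum(proj) / len(proj)
    between = 0.0
    within = 0.0
    for c in set(labels):
        vals = [v for v, lab in zip(proj, labels) if lab == c]
        mean_c = sum(vals) / len(vals)
        # Between-class scatter: class size times squared distance of class mean to overall mean.
        between += len(vals) * (mean_c - overall) ** 2
        # Within-class scatter: squared deviations of samples from their own class mean.
        within += sum((v - mean_c) ** 2 for v in vals)
    return between, within


def best_direction(points, labels, steps=180):
    """Scan unit directions over a half circle and keep the one maximizing between - within."""
    best_w, best_score = None, -math.inf
    for k in range(steps):
        theta = math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        between, within = scatter_1d(points, labels, w)
        score = between - within  # discrepancy (difference) rather than the ratio used by classical LDA
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Hypothetical toy data: two classes separated along x, each spread along y.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (3.0, 0.0), (3.0, 1.0), (3.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]
w, score = best_direction(points, labels)
print(w, score)
```

            On this toy data the selected direction is the x axis, along which the classes separate and the within-class
            spread vanishes. Using a difference instead of a ratio avoids inverting the within-class scatter, which is one
            reason discrepancy-style criteria are attractive when samples are few.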




51         Load Shedding in Mobile Systems with MobiQual




            In location-based, mobile continual query (CQ) systems, two key measures of quality-of-service (QoS) are freshness
            and accuracy. To achieve freshness, the CQ server must perform frequent query reevaluations. To attain accuracy,


IEEE Final Year Projects 2011-2012 :: Elysium Technologies Pvt Ltd::Data mining

  • 1. Elysium Technologies Private Limited ISO 9001:2008 A leading Research and Development Division Madurai | Chennai | Trichy | Coimbatore | Kollam| Singapore Website: elysiumtechnologies.com, elysiumtechnologies.info Email: info@elysiumtechnologies.com IEEE Final Year Project List 2011-2012 Abstract DATA ENGINEERING 2011 - 2012 01 Dual Framework and Algorithms for Targeted Online Data Delivery A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users, applications, or middleware that are accessing data from multiple autonomous servers. In this paper, we develop a framework for formalizing and comparing pull-based solutions and present dual optimization approaches. The first approach, most commonly used nowadays, maximizes user utility under the strict setting of meeting a priori constraints on the usage of system resources. We present an alternative and more flexible approach that maximizes user utility by satisfying all users. It does this while minimizing the usage of system resources. We discuss the benefits of this latter approach and develop an adaptive monitoring solution Satisfy User Profiles (SUPs). Through formal analysis, we identify sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we empirically analyze the behavior of SUP under varying conditions. Our experiments show that we can achieve a high degree of satisfaction of user utility when the estimations of SUP closely estimate the real event stream, and has the potential to save a significant amount of system resources. We further show that SUP can exploit feedback to improve user utility with only a moderate increase in resource utilization. 02 A Flexible Data and Sensor A Fast Multiple Longest Common Subsequence (MLCS) Algorithm How to achieve a flexible data and sensor planning service to schedule, plan, and empower diverse sensors and heterogeneous data ordering systems is a big challenge. 
In this paper, a service-oriented framework of data and sensor planning service for virtual sensors is proposed. The framework includes an Open Geospatial Consortium (OGC)-compliant Sensor Planning Service (SPS), a Web Notification Service (WNS), a Sensor Observation Service (SOS), and virtual sensors. There are two important key technologies in this framework, namely a flexible SPS middleware and an asynchronous message notification mechanism. The flexible SPS middleware, based on a configuration file and standard interfaces, is adopted to integrate virtual sensors into a sensor Web. A WNS-based asynchronous notification middleware is used to inform the user of the status of a task that may need midterm or long-term actions. The framework has been successfully demonstrated in application scenarios for Simplified General Perturbations Satellite Orbit Model 4 (SGP4) and Earth Observation System Clearing HOuse (ECHO). The results show that the proposed method has the following improvements over the existing SPS implementation: a uniform planning service for more satellites, a seamless connection with data order systems, and a flexible service-oriented framework for virtual sensors. 03 A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the feature vector of a document set are grouped into clusters, based on similarity test. Words that are similar to each other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We then have one extracted feature for each cluster. 
The extracted feature, corresponding to a cluster, is a weighted combination of the words contained in the cluster. By this algorithm, the derived membership functions match closely with and describe properly the real distribution of the training data. Besides, the user need not specify the number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted features can then be avoided. Experimental results show that our method can run faster and obtain better extracted features than other methods.

Elysium Technologies Private Limited
Madurai: 230, Church Road, Annanagar, Madurai, Tamilnadu – 625 020. Contact: 91452 4390702, 4392702, 4394702. eMail: info@elysiumtechnologies.com
Trichy: 3rd Floor, SI Towers, 15, Melapudur, Trichy, Tamilnadu – 620 001. Contact: 91431 - 4002234. eMail: elysium.trichy@gmail.com
Kollam: Surya Complex, Vendor junction, Kollam, Kerala – 691 010. Contact: 91474 2723622. eMail: elysium.kollam@gmail.com

04 A Generic Multilevel Architecture for Time Series Prediction

Rapidly evolving businesses generate massive amounts of time-stamped data sequences and cause a demand for both univariate and multivariate time series forecasting. For such data, traditional predictive models based on autoregression are often not sufficient to capture complex nonlinear relationships between multidimensional features and the time series outputs. In order to exploit these relationships for improved time series forecasting while also better dealing with a wider variety of prediction scenarios, a forecasting system requires a flexible and generic architecture to accommodate and tune various individual predictors as well as combination methods. In reply to this challenge, an architecture for combined, multilevel time series prediction is proposed, which is suitable for many different universal regressors and combination methods. The key strength of this architecture is its ability to build a diversified ensemble of individual predictors that form an input to a multilevel selection and fusion process before the final optimized output is obtained. Excellent generalization ability is achieved due to the highly boosted complementarity of individual models further enforced through cross-validation-linked training on exclusive data subsets and ensemble output postprocessing.
In a sample configuration with basic neural network predictors and a mean combiner, the proposed system was evaluated in different scenarios and showed a clear prediction performance gain.

05 A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases

This work introduces a link analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller Markov chain, containing only the elements of interest and preserving the main characteristics of the initial chain, is extracted by stochastic complementation [41]. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion map subspace [42] and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined, and to multiple correspondence analysis when the database takes the form of a simple star-schema. On the other hand, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several data sets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.

06 A Personalized Ontology Model for Web Information Gathering

As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in personalized web information gathering.
However, when representing user profiles, many models have utilized only knowledge from either a global knowledge base or user local information. In this paper, a personalized ontology model is proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against benchmark models in web information gathering. The results show that this ontology model is successful

07 Adaptive Cluster Distance Bounding for High-Dimensional Indexing

We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a clustering framework. We note that indexing by “vector approximation” (VA-File), which was proposed as a technique to combat the “Curse of Dimensionality,” employs scalar quantization, and hence necessarily ignores dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand, exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster-based index. This bound enables efficient spatial filtering with a relatively small preprocessing storage overhead, and is applicable to Euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval, conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations, by factors reaching 100X and more.
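
The hyperplane-based bound in abstract 07 can be illustrated with a minimal sketch, assuming Euclidean distance and Voronoi clusters defined only by their centroids. The function name and the sample centroids are hypothetical, and the paper's actual bound and index layout may differ. For a query q and a cluster i, the signed distance from q to the hyperplane that separates centroid i from any other centroid j is a lower bound on the distance from q to every point of cluster i, so the largest such value can be used to skip clusters that cannot contain a nearer neighbor.

```python
import math

def hyperplane_lower_bound(q, centroids, i):
    """Lower bound on the Euclidean distance from q to any point in Voronoi cell i.

    Assumption: cell i holds exactly the points whose nearest centroid is centroids[i].
    """
    d_i = math.dist(q, centroids[i])
    best = 0.0  # a distance is never negative, so 0 is always a valid bound
    for j, c_j in enumerate(centroids):
        if j == i:
            continue
        sep = math.dist(centroids[i], c_j)
        if sep == 0.0:
            continue  # coincident centroids define no separating hyperplane
        d_j = math.dist(q, c_j)
        # Signed distance from q to the bisecting hyperplane of c_i and c_j,
        # positive when q lies on the c_j side (outside cell i).
        signed = (d_i ** 2 - d_j ** 2) / (2.0 * sep)
        best = max(best, signed)
    return best

# Hypothetical data: two centroids on the x-axis, bisected by the line x = 2.
centroids = [(0.0, 0.0), (4.0, 0.0)]
query = (5.0, 0.0)
bound_far_cell = hyperplane_lower_bound(query, centroids, 0)
bound_own_cell = hyperplane_lower_bound(query, centroids, 1)
```

If the best candidate found so far is closer than `bound_far_cell`, cluster 0 can be skipped without reading any of its points.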

08 Anonymous Publication of Sensitive Transactional Data

Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN-search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.

09 Answering Frequent Probabilistic Inference Queries in Databases

Existing solutions for probabilistic inference queries mainly focus on answering a single inference query, but seldom address the issues of efficiently returning results for a sequence of frequent queries, which is more popular and practical in many real applications.
In this paper, we mainly study the computation caching and sharing among a sequence of inference queries in databases. The clique tree propagation (CTP) algorithm is first introduced in databases for probabilistic inference queries. We use the materialized views to cache the intermediate results of the previous inference queries, which might be shared with the following queries, and consequently reduce the time cost. Moreover, we take the query workload into account to identify the frequently queried variables. To optimize probabilistic inference queries with CTP, we cache these frequent query variables into the materialized views to maximize the reuse. Due to the existence of different query plans, we present heuristics to estimate costs and select the optimal query plan. Finally, we present the experimental evaluation in relational databases to illustrate the validity and superiority of our approaches in answering frequent probabilistic inference queries.
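
The caching idea in abstract 09 can be sketched as follows. This is not the CTP algorithm: brute-force marginalization over a small joint table stands in for clique tree propagation, and a dictionary stands in for materialized views. The class name `MarginalCache` and the toy distribution are assumptions made purely for illustration. The point shown is that a later query can be answered from the smallest cached result that covers its variables instead of from the full table.

```python
from itertools import product

class MarginalCache:
    """Caches marginals so later queries can reuse earlier intermediate results."""

    def __init__(self, variables, joint):
        self.variables = tuple(variables)
        # The full joint is the base table; cached marginals act like materialized views.
        self.views = {self.variables: dict(joint)}
        self.hits = 0

    def marginal(self, query_vars):
        wanted = set(query_vars)
        key = tuple(v for v in self.variables if v in wanted)  # canonical order
        if key in self.views:
            self.hits += 1
            return self.views[key]
        # Pick the smallest cached view that still contains every query variable.
        source_vars = min(
            (vs for vs in self.views if wanted.issubset(vs)), key=len
        )
        source = self.views[source_vars]
        idx = [source_vars.index(v) for v in key]
        result = {}
        for assignment, p in source.items():
            reduced = tuple(assignment[i] for i in idx)
            result[reduced] = result.get(reduced, 0.0) + p  # sum out the rest
        self.views[key] = result
        return result

# Hypothetical uniform distribution over three binary variables.
joint = {a: 0.125 for a in product((0, 1), repeat=3)}
cache = MarginalCache(("A", "B", "C"), joint)
p_ab = cache.marginal(["A", "B"])   # computed from the full joint
p_a = cache.marginal(["A"])         # computed from the smaller cached A,B view
p_a_again = cache.marginal(["A"])   # served directly from the cache
```

A workload-aware version would decide which marginals to keep based on how often each variable is queried, which is the role the abstract assigns to its cost heuristics.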

10 Authenticated Multistep Nearest Neighbor Search

Multistep processing is commonly used for nearest neighbor (NN) and similarity search in applications involving high-dimensional data and/or costly distance computations. Today, many such applications require a proof of result correctness. In this setting, clients issue NN queries to a server that maintains a database signed by a trusted authority. The server returns the NN set along with supplementary information that permits result verification using the data set signature. An adaptation of the multistep NN algorithm incurs prohibitive network overhead due to the transmission of false hits, i.e., records that are not in the NN set, but are nevertheless necessary for its verification. In order to alleviate this problem, we present a novel technique that reduces the size of each false hit. Moreover, we generalize our solution for a distributed setting, where the database is horizontally partitioned over several servers. Finally, we demonstrate the effectiveness of the proposed solutions with real data sets of various dimensionalities.

11 Automatic Discovery of Personal Name Aliases from the Web

An individual is typically referred to by numerous name aliases on the web. Accurate identification of aliases of a given person name is useful in various web-related tasks such as information retrieval, sentiment analysis, personal name disambiguation, and relation extraction. We propose a method to extract aliases of a given personal name from the web. Given a personal name, the proposed method first extracts a set of candidate aliases.
Second, we rank the extracted candidates according to the likelihood of a candidate being a correct alias of the given name. We propose a novel, automatically extracted lexical pattern-based approach to efficiently extract a large set of candidate aliases from snippets retrieved from a web search engine. We define numerous ranking scores to evaluate candidate aliases using three approaches: lexical pattern frequency, word co-occurrences in an anchor text graph, and page counts on the web. To construct a robust alias detection system, we integrate the different ranking scores into a single ranking function using ranking support vector machines. We evaluate the proposed method on three data sets: an English personal names data set, an English place names data set, and a Japanese personal names data set. The proposed method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a statistically significant mean reciprocal rank (MRR) of 0.67. Experiments carried out using location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities, and for different languages. Moreover, the aliases extracted using the proposed method are successfully utilized in an information retrieval task and improve recall by 20 percent in a relation-detection task.

12 Geospatial Automatic Enrichment of Semantic Relation Network and Its Application to Word Sense Disambiguation

The most fundamental step in semantic information processing (SIP) is to construct a knowledge base (KB) at the human level; that is, to the general understanding and conception of human knowledge. WordNet has been built to be the most systematic and as close to the human level as possible, and is being applied actively in various works. In our previous research, we found that a semantic gap exists between concept pairs of WordNet and those of the real world.
This paper contains a study on the enrichment method to build a KB. We describe the methods and the results for the automatic enrichment of the semantic relation network. A rule-based method using WordNet’s glossaries and an inference method using axioms for WordNet relations are applied for the enrichment, and an enriched WordNet (E-WordNet) is built as the result. Our experimental results substantiate the usefulness of E-WordNet. An evaluation by comparison with the human level is attempted. Moreover, WSD-SemNet, a new word sense disambiguation (WSD) method in which E-WordNet is applied, is proposed and evaluated by comparing it with the state-of-the-art algorithm.

13 Branch-and-Bound for Model Selection and Its Computational Complexity

Branch-and-bound methods are used in various data analysis problems, such as clustering, seriation and feature selection. Classical approaches of branch-and-bound based clustering search through combinations of various partitioning possibilities to optimize a clustering cost. However, these approaches are not practically useful for clustering of image data where the size of data is large. Additionally, the number of clusters is unknown in most of the image data analysis problems. By taking advantage of the spatial coherency of clusters, we formulate an innovative branch-and-bound approach, which solves the clustering problem as a model-selection problem. In this generalized approach, cluster parameter candidates are first generated by spatially coherent sampling. A branch-and-bound search is carried out through the candidates to select an optimal subset. This paper formulates this approach and investigates its average computational complexity. Improved clustering quality and robustness to outliers compared to the conventional iterative approach are demonstrated with experiments.

14 Measuring Client-Perceived Pageview Response Time of Internet Services

As e-commerce services are exponentially growing, businesses need quantitative estimates of client-perceived response times to continuously improve the quality of their services. Current server-side nonintrusive measurement techniques are limited to nonsecured HTTP traffic. In this paper, we present the design and evaluation of a monitor, namely sMonitor, which is able to measure client-perceived response times for both HTTP and HTTPS traffic.
At the heart of sMonitor is a novel size-based analysis method that parses live packets to delimit different webpages and to infer their response times. The method is based on the observation that most HTTP(S)-compatible browsers send significantly larger requests for container objects than those for embedded objects. sMonitor is designed to operate accurately in the presence of complicated browser behaviors, such as parallel downloading of multiple webpages and HTTP pipelining, as well as packet losses and delays. It only needs to passively collect network traffic in and out of the monitored secured services. We conduct comprehensive experiments across a wide range of operating conditions using live secured Internet services, on the PlanetLab, and on controlled networks. The experimental results demonstrate that sMonitor is able to control the estimation error within 6.7 percent, in comparison with the actual measured time at the client side.

15 Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints

Most existing data stream classification techniques ignore one important aspect of stream data: arrival of a novel class. We address this issue and propose a data stream classification technique that integrates a novel class detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of concept-drift, when the underlying data distributions evolve in streams. In order to determine whether an instance belongs to a novel class, the classification model sometimes needs to wait for more test instances to discover similarities among those instances. A maximum allowable wait time Tc is imposed as a time constraint to classify a test instance.
Furthermore, most existing stream classification approaches assume that the true label of a data point can be accessed immediately after the data point is classified. In reality, a time delay Tl is involved in obtaining the true label of a data point since manual labeling is time consuming. We show how to make fast and correct classification decisions under these constraints and apply them to real benchmark data. Comparison with state-of-the-art stream classification techniques proves the superiority of our approach.

16 Classification Using Streaming Random Forests

We consider the problem of data stream classification, where the data arrive in a conceptually infinite stream, and the opportunity to examine each record is brief. We introduce a stream classification algorithm that is online, running in amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to changing class boundaries (“concept drift”) in the data stream. In addition, when blocks of labeled data are short, the algorithm is able to judge internally whether the quality of models updated from them is good enough for deployment on unlabeled records, or whether further labeled records are required. Unlike most proposed stream-classification algorithms, multiple target classes can be handled. Experimental results on real and synthetic data show that accuracy is comparable to a conventional classification algorithm that sees all of the data at once and is able to make multiple passes over it.

17 CoFiDS: A Belief-Theoretic Approach for Automated Collaborative Filtering

Automated Collaborative Filtering (ACF) refers to a group of algorithms used in recommender systems, a research topic that has received considerable attention due to its e-commerce applications. However, existing techniques are rarely capable of dealing with imperfections in user-supplied ratings. When such imperfections (e.g., ambiguities) cannot be avoided, designers resort to simplifying assumptions that impair the system’s performance and utility.
We have developed a novel technique referred to as CoFiDS (Collaborative Filtering based on the Dempster-Shafer belief-theoretic framework) that can represent a wide variety of data imperfections, propagate them throughout the decision-making process without the need to make simplifying assumptions, and exploit contextual information. With its DS-theoretic predictions, the domain expert can either obtain a “hard” decision or can narrow the set of possible predictions to a smaller set. With its capability to handle data imperfections, CoFiDS widens the applicability of ACF to such critical and sensitive domains as medical decision support systems and defense-related applications. We describe the theoretical foundation of the system and report experiments with a benchmark movie data set. We explore some essential aspects of CoFiDS’ behavior and show that its performance compares favorably with other ACF systems.

18 Collaborative Filtering with Personalized Skylines

Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects/records which are potentially most interesting to the user, assuming a single score per object. However, in various applications, a record (e.g., hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different selection criteria. In order to enhance the flexibility of CF, we propose Collaborative Filtering Skyline (CFS), a general framework that combines the advantages of CF with those of the skyline operator. CFS generates a personalized skyline for each user based on scores of other users with similar behavior. The personalized skyline includes objects that are good on certain aspects, and eliminates the ones that are not interesting on any attribute combination.
Although the integration of skylines and CF has several attractive properties, it also involves rather expensive computations. We face this challenge through a comprehensive set of algorithms and optimizations that reduce the cost of generating personalized skylines. In addition to exact skyline processing, we develop an approximate method that provides error guarantees. Finally, we propose the top-k personalized skyline, where the user specifies the required output cardinality
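
The skyline operator used in abstract 18 can be illustrated with a minimal sketch. It assumes that per-attribute scores for one user have already been predicted (in CFS these would come from users with similar behavior, a step not shown here) and that higher scores are better. The hotel names and scores are made up for illustration, and the quadratic scan below is the naive method, not one of the paper's optimized algorithms.

```python
def dominates(a, b):
    """True if a is at least as good as b on every attribute and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline(objects):
    """Return the names of objects that no other object dominates.

    objects: dict mapping a name to a tuple of per-attribute scores.
    """
    return {
        name
        for name, scores in objects.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in objects.items()
            if other != name
        )
    }

# Hypothetical predicted scores for one user: (value, service).
hotels = {
    "A": (5, 2),  # best on value
    "B": (3, 4),  # dominated by D
    "C": (2, 1),  # dominated by A, B and D
    "D": (4, 4),  # best on service, good on value
}
personal_skyline = skyline(hotels)
```

Hotel A stays in the result even though its service score is low, because nothing beats it on value; this is the behavior a single overall score cannot express.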

19 Comprehensive Citation Index for Research Networks

The existing Science Citation Index only counts direct citations, whereas PageRank disregards the number of direct citations. We propose a new Comprehensive Citation Index (CCI) that evaluates both direct and indirect intellectual influence of research papers, and show that CCI is more reliable in discovering research papers with far-reaching influence.

20 Constrained Skyline Query Processing against Distributed Data

The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper, we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result. We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned site groups. We also employ intragroup optimization and a multifiltering technique to improve the skyline query processes within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via network connections is reduced, and thus, the overall query response time is shortened further. Cost models and heuristics are proposed to guide the selection of a given number of filtering points from a superset.
A cost-efficient model is developed to determine how many filtering points to use for a particular data site.

21 Continuous Monitoring of Distance-Based Range Queries

Given a positive value r, a distance-based range query returns the objects that lie within the distance r of the query location. In this paper, we focus on the distance-based range queries that continuously change their locations in a Euclidean space. We present an efficient and effective monitoring technique based on the concept of a safe zone. The safe zone of a query is the area with a property that while the query remains inside it, the results of the query remain unchanged. Hence, the query does not need to be reevaluated unless it leaves the safe zone. Our contributions are as follows: 1) We propose a technique based on powerful pruning rules and a unique access order which efficiently computes the safe zone and minimizes the I/O cost. 2) We theoretically determine and experimentally verify the expected distance a query moves before leaving the safe zone and, for the majority of queries, the expected number of guard objects. 3) Our experiments demonstrate that the proposed approach is close to optimal and is an order of magnitude faster than a naïve algorithm. 4) We also extend our technique to monitor the queries in a road network. Our algorithm is up to two orders of magnitude faster than a naïve algorithm.

22 Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme

E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In recent years, the notion of collaborative spam filtering with a near-duplicate similarity matching scheme has been widely discussed. The primary idea of the similarity matching scheme for spam detection is to maintain a known spam database, formed by user feedback, to block subsequent near-duplicate spams.
To achieve efficient similarity matching and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from e-mail content text. However, these abstractions of e-mails cannot fully catch the evolving nature of spams, and are thus not effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme, which considers
e-mail layout structure to represent e-mails. We present a procedure to generate the e-mail abstraction using HTML content in e-mail, and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spams. Moreover, we design a complete spam detection system Cosdes (standing for COllaborative Spam DEtection System), which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update scheme enables system Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes on a live data set collected from a real e-mail server and show that our system outperforms the prior approaches in detection results and is applicable to the real world.

23 Coupling Logical Analysis of Data and Shadow Clustering for Partially Defined Positive Boolean Function Reconstruction

The problem of reconstructing the AND-OR expression of a partially defined positive Boolean function (pdpBf) is solved by adopting a novel algorithm, denoted by LSC, which combines the advantages of two efficient techniques, Logical Analysis of Data (LAD) and Shadow Clustering (SC). The kernel of the approach followed by LAD consists in a breadth-first enumeration of all the prime implicants whose degree is not greater than a fixed maximum d. In contrast, SC adopts an effective heuristic procedure for retrieving the most promising logical products to be included in the resulting AND-OR expression.
Since the computational cost required by LAD prevents its application even for relatively small dimensions of the input domain, LSC employs a depth-first approach, with asymptotically linear memory occupation, to analyze the prime implicants having degree not greater than d. In addition, the theoretical analysis proves that LSC presents almost the same asymptotic time complexity as LAD. Extensive simulations on artificial benchmarks validate the good behavior of the computational cost exhibited by LSC, in agreement with the theoretical analysis. Furthermore, the pdpBf retrieved by LSC always shows a better performance, in terms of complexity and accuracy, with respect to those obtained by LAD.

24 Data Leakage Detection

We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means. We propose data allocation strategies (across the agents) that improve the probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and identifying the guilty party.

25 Decision Trees for Uncertain Data

Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements.
With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the “complete information” of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments have been conducted which show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
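As a rough illustration of why the full pdf can matter in entry 25 (Decision Trees for Uncertain Data), the sketch below scores a candidate split point in two ways: once with each tuple spread across both branches in proportion to its probability mass, and once with each pdf collapsed to its mean. The fractional assignment, the discrete (value, probability) representation, and all function names are assumptions made for this example; the abstract does not specify the construction algorithm.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a dict mapping class label -> weight."""
    total = sum(class_weights.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, z):
    """Weighted entropy of the split `value <= z` over uncertain tuples.

    Each tuple is (pdf, label), where pdf is a list of (value, probability)
    pairs summing to 1. A tuple contributes the probability mass at or
    below z to the left branch and the remaining mass to the right branch
    (an assumed, illustrative way of using the pdf).
    """
    left = defaultdict(float)
    right = defaultdict(float)
    for pdf, label in tuples:
        p_left = sum(p for v, p in pdf if v <= z)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    if n == 0:
        return 0.0
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


def collapse_to_mean(tuples):
    """Baseline: replace every pdf by a point mass at its mean value."""
    return [([(sum(v * p for v, p in pdf), 1.0)], label) for pdf, label in tuples]


# Hypothetical data: one uncertain reading and one precise reading.
data = [
    ([(1.0, 0.5), (5.0, 0.5)], "A"),
    ([(5.0, 1.0)], "B"),
]
pdf_based = split_entropy(data, 3.0)
mean_based = split_entropy(collapse_to_mean(data), 3.0)
print(pdf_based, mean_based)
```

In this toy case the averaged data make the split at 3.0 look perfectly pure, while the pdf-based score shows that half of tuple A's mass falls on the same side as B. That difference is the kind of information a value average discards.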
  • 9. 26 Design and Implementation of an Intrusion Response System for Relational Databases
The intrusion response component of an overall intrusion detection system is responsible for issuing a suitable response to an anomalous request. We propose the notion of database response policies to support our intrusion response system tailored for a DBMS. Our interactive response policy language makes it very easy for database administrators to specify appropriate response actions for different circumstances depending upon the nature of the anomalous request. The two main issues that we address in the context of such response policies are policy matching and policy administration. For the policy matching problem, we propose two algorithms that efficiently search the policy database for policies that match an anomalous request. We also extend the PostgreSQL DBMS with our policy matching mechanism, and report experimental results. The experimental evaluation shows that our techniques are very efficient. The other issue that we address is the administration of response policies to prevent malicious modifications to policy objects by legitimate users. We propose a novel Joint Threshold Administration Model (JTAM) that is based on the principle of separation of duty. The key idea in JTAM is that a policy object is jointly administered by at least k database administrators (DBAs); that is, any modification made to a policy object will be invalid unless it has been authorized by at least k DBAs. We present design details of JTAM, which is based on a cryptographic threshold signature scheme, and show how JTAM prevents malicious modifications to policy objects by authorized users.
We also implement JTAM in the PostgreSQL DBMS, and report experimental results on the efficiency of our techniques.

27 Differential Privacy via Wavelet Transforms
Privacy-preserving data publishing has attracted considerable research interest in recent years. Among the existing solutions, ε-differential privacy provides the strongest privacy guarantee. Existing data publishing methods that achieve ε-differential privacy, however, offer little data utility. In particular, if the output data set is used to answer count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it. We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data, we show the effectiveness and efficiency of our solution.

28 Discovering Activities to Recognize and Track in a Smart Environment
The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to monitor the functional health of smart home residents, we need to design technologies that recognize and track activities that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, the approaches are applied to activities that have been preselected and for which labeled training data are available.
In contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in an individual’s routine. With this capability, we can then track the occurrence of regular activities to monitor functional health and to detect changes in an individual’s patterns and lifestyle. In this paper, we describe our activity mining and tracking approach, and validate our algorithms on data collected in physical smart environments.
  • 10. 29 Discovering Conditional Functional Dependencies
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation is large, but it does not scale well with the arity of the relation.
FastCFD is far more efficient than CTANE when the arity of the relation is large; better still, leveraging optimization based on closed-item-set mining, FastCFD also scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools for users to choose from for different applications.

30 Effective Navigation of Query Results Based on Concept Hierarchies
Search queries on biomedical databases, such as PubMed, often return a large number of results, only a small subset of which is relevant to the user. Ranking and categorization, which can also be combined, have been proposed to alleviate this information overload problem. Results categorization for biomedical databases is the focus of this work. A natural way to organize biomedical citations is according to their MeSH annotations. MeSH is a comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav system, a novel search interface that enables the user to navigate a large number of query results by organizing them using the MeSH concept hierarchy. First, the query results are organized into a navigation tree. At each node expansion step, BioNav reveals only a small subset of the concept nodes, selected such that the expected user navigation cost is minimized. In contrast, previous works expand the hierarchy in a predefined static manner, without navigation cost modeling. We show that the problem of selecting the best concepts to reveal at each node expansion is NP-complete and propose an efficient heuristic as well as a feasible optimal algorithm for relatively small trees. We show experimentally that BioNav outperforms state-of-the-art categorization systems by up to an order of magnitude, with respect to the user navigation cost.

31 Efficient Periodicity Mining in Time Series Databases Using Suffix Trees
Periodic pattern mining or periodicity detection has a number of applications, such as prediction, forecasting, detection of unusual activities, etc.
The problem is not trivial because the data to be analyzed are mostly noisy and different periodicity types (namely symbol, sequence, and segment) are to be investigated. Accordingly, we argue that there is a need for a comprehensive approach that can analyze the whole time series or a subsection of it, handle different types of noise effectively (to a certain degree), and at the same time detect different types of periodic patterns; combining these under one umbrella is by itself a challenge. In this paper, we present an algorithm which can detect symbol, sequence (partial), and segment (full cycle) periodicity in time series. The algorithm uses a suffix tree as the underlying data structure; this allows us to design the algorithm such that its
  • 11. worst-case complexity is O(k·n²), where k is the maximum length of periodic pattern and n is the length of the analyzed portion (whole or subsection) of the time series. The algorithm is noise resilient; it has been successfully demonstrated to work with replacement, insertion, deletion, or a mixture of these types of noise. We have tested the proposed algorithm on both synthetic and real data from different domains, including protein sequences. The conducted comparative study demonstrates the applicability and effectiveness of the proposed algorithm; it is generally more time-efficient and noise-resilient than existing algorithms.

32 Efficient Relevance Feedback for Content-Based Image Retrieval by Mining User Navigation Patterns
Nowadays, content-based image retrieval (CBIR) is the mainstay of image retrieval systems. To be more profitable, relevance feedback techniques were incorporated into CBIR such that more precise results can be obtained by taking the user’s feedback into account. However, existing relevance feedback-based CBIR methods usually require a number of feedback iterations to produce refined search results, especially in a large-scale image database. This is impractical and inefficient in real applications. In this paper, we propose a novel method, Navigation-Pattern-based Relevance Feedback (NPRF), to achieve high efficiency and effectiveness of CBIR in coping with large-scale image data. In terms of efficiency, the iterations of feedback are reduced substantially by using the navigation patterns discovered from the user query log.
In terms of effectiveness, our proposed search algorithm NPRFSearch makes use of the discovered navigation patterns and three kinds of query refinement strategies, Query Point Movement (QPM), Query Reweighting (QR), and Query Expansion (QEX), to converge the search space toward the user’s intention effectively. By using the NPRF method, high-quality image retrieval with relevance feedback can be achieved in a small number of feedback rounds. The experimental results reveal that NPRF outperforms other existing methods significantly in terms of precision, coverage, and number of feedback rounds.

33 Efficient Techniques for Online Record Linkage
The need to consolidate the information contained in heterogeneous data sources has been widely documented in recent years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems, especially the entity heterogeneity problem that arises when the same real-world entity type is represented using different identifiers in different data sources. Statistical record linkage techniques could be used for resolving this problem. However, the use of such techniques for online record linkage could pose a tremendous communication bottleneck in a distributed environment (where entity heterogeneity problems are often encountered). In order to resolve this issue, we develop a matching tree, similar to a decision tree, and use it to propose techniques that reduce the communication overhead significantly, while providing matching decisions that are guaranteed to be the same as those obtained using the conventional linkage technique. These techniques have been implemented, and experiments with real-world and synthetic databases show significant reduction in communication overhead.
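The idea in entry 33 of saving communication while guaranteeing the same decision as the conventional technique can be illustrated with a much simpler stand-in than the matching tree itself. The sketch below assumes a weighted agreement score compared against a threshold, fetches remote attributes one at a time, and stops as soon as the remaining attributes can no longer change the outcome. The scoring model, the weights, and the function names are all hypothetical and are not taken from the paper.

```python
def full_decision(local, remote, weights, threshold):
    """Conventional decision: fetch every remote attribute and sum the weights."""
    score = sum(
        agree if local[attr] == remote[attr] else disagree
        for attr, (agree, disagree) in weights.items()
    )
    return score >= threshold


def sequential_decision(local, fetch_remote, weights, threshold):
    """Fetch remote attributes one by one; stop once the outcome is fixed.

    `weights` maps attribute -> (agreement weight, disagreement weight).
    Returns (decision, number of remote attributes actually fetched).
    """
    order = list(weights)
    score = 0.0
    for i, attr in enumerate(order):
        # Best and worst total contribution of the attributes not yet fetched.
        best_rest = sum(max(weights[a]) for a in order[i:])
        worst_rest = sum(min(weights[a]) for a in order[i:])
        if score + worst_rest >= threshold:
            return True, i   # a match even if everything else disagrees
        if score + best_rest < threshold:
            return False, i  # a non-match even if everything else agrees
        agree, disagree = weights[attr]
        score += agree if local[attr] == fetch_remote(attr) else disagree
    return score >= threshold, len(order)


# Hypothetical weights and records.
weights = {"name": (4.0, -3.0), "dob": (3.0, -2.0), "city": (1.0, -1.0)}
threshold = 3.0
local = {"name": "Ann", "dob": "1990-01-01", "city": "Madurai"}
remote = {"name": "Bob", "dob": "1990-01-01", "city": "Madurai"}

decision, fetched = sequential_decision(local, remote.__getitem__, weights, threshold)
print(decision, fetched, full_decision(local, remote, weights, threshold))
```

Because the loop stops only when the threshold comparison is already determined, the early answer always equals the answer from the full comparison; the saving is in how many attribute values cross the network.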
34 Efficient Top-k Approximate Subtree Matching in Small Memory
We consider the Top-k Approximate Subtree Matching (TASM) problem: finding the k best matches of a small query tree within a large document tree using the canonical tree edit distance as a similarity measure between subtrees. Evaluating the tree edit distance for large XML trees is difficult: the best known algorithms have cubic runtime and quadratic space complexity, and, thus, do not scale. Our solution is TASM-postorder, a memory-efficient and scalable TASM algorithm. We prove an upper bound for the maximum subtree size for which the tree edit distance needs to be evaluated. The upper bound depends on the query and is independent of the document size and structure. A core problem is to efficiently prune subtrees that are above this size threshold. We develop an algorithm based on the prefix ring buffer that allows us to prune all subtrees above the threshold in a single postorder scan of the document. The size of the prefix ring buffer is linear in the threshold. As a result, the space complexity of TASM-postorder depends only on k and the query size, and the runtime of TASM-postorder is linear in the size of the document. Our experimental evaluation on large synthetic and real XML
  • 12. documents confirms our analytic results.

35 Energy Time Series Forecasting Based on Pattern Sequence Similarity
This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences. First, clustering techniques are used with the aim of grouping and labeling the samples from a data set. Thus, the prediction of a data point is provided as follows: first, the pattern sequence prior to the day to be predicted is extracted. Then, this sequence is searched in the historical data and the prediction is calculated by averaging all the samples immediately after the matched sequence. The main novelty is that only the labels associated with each pattern are considered to forecast the future behavior of the time series, avoiding the use of real values of the time series until the last step of the prediction process. Results from several energy time series are reported and the performance of the proposed method is compared to that of recently published techniques showing a remarkable improvement in the prediction.

37 Estimating and Enhancing Real-Time Data Service Delays: Control-Theoretic Approaches
It is essential to process real-time data service requests such as stock quotes and trade transactions in a timely manner using fresh data, which represent the current real-world phenomena such as the stock market status. Users may simply leave when the database service delay is excessive. Also, temporally inconsistent data may give an outdated view of the real-world status. However, supporting the desired timeliness and freshness is challenging due to dynamic workloads. To address the problem, we present new approaches for 1) database backlog estimation, 2) fine-grained closed-loop admission control based on the backlog model, and 3) incoming load smoothing. Our backlog estimation and control-theoretic approaches aim to support the desired service delay bound without degrading the data freshness, critical for real-time data services. Specifically, we design, implement, and evaluate two feedback controllers based on linear control theory and fuzzy logic control theory, to meet the desired service delay. Workload smoothing, under overload, helps the database admit and process more transactions in a timely fashion by probabilistically reducing the burstiness of incoming data service requests. In terms of the data service delay and throughput, our closed-loop admission control and probabilistic load smoothing schemes considerably outperform several baselines in the experiments undertaken in a stock trading database testbed.
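To make the closed-loop admission control of entry 37 more concrete, here is a minimal sketch of one possible linear feedback controller: a proportional-integral (PI) loop that adjusts a backlog bound from the measured service delay. The abstract does not give the controller equations, so the PI form, the gains, the class name, and the notion of an estimated request cost are all assumptions made for illustration.

```python
class PIAdmissionController:
    """Hypothetical PI controller that tunes the admissible backlog bound."""

    def __init__(self, target_delay, kp, ki, base_bound):
        self.target_delay = target_delay  # desired service delay bound (s)
        self.kp = kp                      # proportional gain (assumed value)
        self.ki = ki                      # integral gain (assumed value)
        self.base_bound = base_bound      # backlog bound with zero error
        self.integral = 0.0               # accumulated delay error
        self.bound = base_bound           # current backlog bound

    def update(self, measured_delay):
        """Recompute the backlog bound once per sampling period."""
        error = self.target_delay - measured_delay
        self.integral += error
        self.bound = max(
            0.0, self.base_bound + self.kp * error + self.ki * self.integral
        )
        return self.bound

    def admit(self, backlog, request_cost):
        """Admit a request only if the estimated backlog stays within the bound."""
        return backlog + request_cost <= self.bound


controller = PIAdmissionController(target_delay=1.0, kp=10.0, ki=2.0, base_bound=100.0)
bound = controller.update(measured_delay=1.5)  # delay above target shrinks the bound
print(bound, controller.admit(90.0, 3.0), controller.admit(90.0, 5.0))
```

When the measured delay exceeds the target, the error is negative and the bound tightens, so fewer requests are admitted until the delay recovers; the integral term keeps tightening the bound if the overload persists.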
38 Experience Transfer for the Configuration Tuning in Large-Scale Computing Systems
This paper proposes a new strategy, the experience transfer, to facilitate the management of large-scale computing systems. It deals with the utilization of management experiences in one system (or previous systems) to benefit the same
  • 13. management task in other systems (or current systems). We use the system configuration tuning as a case application to demonstrate all procedures involved in the experience transfer including the experience representation, experience extraction, and experience embedding. The dependencies between system configuration parameters are treated as transferable experiences in the configuration tuning for two reasons: 1) because such knowledge is helpful to the efficiency of the optimal configuration search, and 2) because the parameter dependencies are typically unchanged between two similar systems. We use the Bayesian network to model configuration dependencies and present a configuration tuning algorithm based on the Bayesian network construction and sampling. As a result, after the configuration tuning is completed in the original system, we can obtain a Bayesian network as the by-product which records the dependencies between system configuration parameters. Such a network is then embedded into the tuning process in other similar systems as transferred experiences to improve the configuration search efficiency. Experimental results in a web-based system show that with the help of transferred experiences, the configuration tuning process can be significantly accelerated.

39 Exploring Application-Level Semantics for Data Compression
Natural phenomena show that many creatures form large social groups and move in regular patterns. However, previous works focus on finding the movement patterns of each single object or all objects.
In this paper, we first propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D, which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental results show that the proposed compression algorithm leverages the group movement patterns to reduce the amount of delivered data effectively and efficiently.

41 Finding Correlated Biclusters from Gene Expression Data
Extracting biologically relevant information from DNA microarrays is a very important task for drug development and testing, function annotation, and cancer diagnosis. Various clustering methods have been proposed for the analysis of gene expression data, but when analyzing large and heterogeneous collections of gene expression data, conventional clustering algorithms often cannot produce a satisfactory solution. Biclustering has been presented as an alternative approach to standard clustering techniques to identify local structures from gene
  • 14. expression data sets. These patterns may provide clues about the main biological processes associated with different physiological states. In this paper, different from existing bicluster patterns, we first introduce a more general pattern: the correlated bicluster, which has an intuitive biological interpretation. Then, we propose a novel transform technique based on singular value decomposition so that the problem of identifying correlated biclusters in a gene expression matrix is transformed into two global clustering problems. The Mixed-Clustering algorithm and the Lift algorithm are devised to efficiently produce δ-corBiclusters. The biclusters obtained using our method from gene expression data sets of multiple human organs and the yeast Saccharomyces cerevisiae demonstrate clear biological meanings.

42 Frequent Item Computation on a Chip
Computing frequent items is an important problem by itself and as a subroutine in several data mining algorithms. In this paper, we explore how to accelerate the computation of frequent items using field-programmable gate arrays (FPGAs) with a threefold goal: increase performance over existing solutions, reduce energy consumption over CPU-based systems, and explore the design space in detail as the constraints on FPGAs are very different from those of traditional software-based systems. We discuss three design alternatives, each one of them exploiting different FPGA features and each one providing different performance/scalability trade-offs. An important result of the paper is to demonstrate how the inherent massive parallelism of FPGAs can improve performance of existing algorithms but only after a fundamental redesign of the algorithms.
Our experimental results show that, for example, the pipelined solution we introduce can reach more than 100 million tuples per second of sustained throughput (four times the best available results to date) by making use of techniques that are not available to CPU-based solutions. Moreover, and unlike in software approaches, the high throughput is independent of the skew of the Zipf distribution of the input and comes at a far lower energy cost.

43 Inconsistency-Tolerant Integrity Checking
All methods for efficient integrity checking require all integrity constraints to be totally satisfied before any update is executed. However, a certain amount of inconsistency is the rule, rather than the exception, in databases. In this paper, we close the gap between theory and practice of integrity checking, i.e., between the unrealistic theoretical requirement of total integrity and the practical need for inconsistency tolerance, which we define for integrity checking methods.
We show that most of them can still be used to check whether updates preserve integrity, even if the current state is inconsistent. Inconsistency-tolerant integrity checking proves beneficial both for integrity preservation and query answering. Also, we show that it is useful for view updating, repairs, schema evolution, and other applications.

44 Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks
For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which
algorithms are effective for initialization? When should the search process be restarted? In the present work, we investigate these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE search time for application BNs by several orders of magnitude compared to using uniformly random initialization without restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.

45        Integration of the HL7 Standard in a Multiagent System to Support Personalized Access to e-Health Services

Natural phenomena show that many creatures form large social groups and move in regular patterns. However, previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their movement patterns in wireless sensor networks.
Afterward, we propose a compression algorithm, called 2P2D, which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental results show that the proposed compression algorithm leverages the group movement patterns to reduce the amount of delivered data effectively and efficiently.

46        Exploring Application-Level Semantics for Data Compression

Natural phenomena show that many creatures form large social groups and move in regular patterns. However, previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D, which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental results show that the proposed compression algorithm leverages the group movement patterns to reduce the amount of delivered data effectively and efficiently.
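
The idea behind the entropy reduction phase can be illustrated with a minimal Python sketch. This is not the Replace algorithm from the paper: the predictor output (`predicted`), the placeholder symbol `"."`, and the sample location sequence are all hypothetical, and the sketch simply replaces every item that matches its prediction, ignoring the replacement rules the abstract mentions.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def replace_hits(seq, predicted, placeholder="."):
    """Replace every item that equals its prediction (a 'hit') with a placeholder."""
    return [placeholder if s == p else s for s, p in zip(seq, predicted)]


def restore(replaced, predicted, placeholder="."):
    """Undo replace_hits using the same predictions (assumes the placeholder
    never occurs as a real symbol)."""
    return [p if r == placeholder else r for r, p in zip(replaced, predicted)]


# Hypothetical data: observed locations and what a movement-pattern
# predictor (assumed, not shown) would have guessed at each step.
locations = list("abcdabcdabce")
predicted = list("abcdabcdabcd")

replaced = replace_hits(locations, predicted)
h_before = shannon_entropy(locations)
h_after = shannon_entropy(replaced)

print(f"entropy before: {h_before:.3f} bits/symbol")
print(f"entropy after:  {h_after:.3f} bits/symbol")
print("lossless:", restore(replaced, predicted) == locations)
```

In this example the replaced sequence is dominated by the placeholder, so its entropy drops and an entropy coder would need fewer bits. Replacing every hit is not guaranteed to lower entropy in general, which is consistent with the abstract framing the choice of replacements as an optimization problem.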

47        Intertemporal Discount Factors as a Measure of Trustworthiness in Electronic Commerce

In multiagent interactions, such as e-commerce and file sharing, being able to accurately assess the trustworthiness of others is important for agents to protect themselves from losing utility. Focusing on rational agents in e-commerce, we prove that an agent's discount factor (time preference of utility) is a direct measure of the agent's trustworthiness under a set of reasonably general assumptions and definitions. We propose a general list of desiderata for trust systems and discuss how discount factors, used as a measure of trustworthiness, meet these desiderata. We discuss how discount factors are a robust measure when entering commitments that exhibit moral hazards. Using an online market as a motivating example, we derive some analytical methods both for measuring discount factors and for aggregating the measurements.
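
The link between a discount factor and trustworthiness can be made concrete with a small Python sketch. It uses a textbook repeated-interaction model that is an assumption here, not the model proved in the paper: a seller earns a hypothetical per-period utility `u` for every honest trade, forever, or takes a one-time gain `g` by cheating and earns nothing afterward. With discount factor `gamma`, honesty is worth `u / (1 - gamma)`, so a rational seller stays honest only when `gamma >= 1 - u / g`.

```python
def honest_value(u, gamma):
    """Discounted value of earning u in every period forever: u / (1 - gamma)."""
    if not 0 <= gamma < 1:
        raise ValueError("gamma must be in [0, 1)")
    return u / (1 - gamma)


def keeps_commitment(u, g, gamma):
    """True if staying honest is worth at least the one-time gain g from cheating."""
    return honest_value(u, gamma) >= g


def min_trustworthy_gamma(u, g):
    """Smallest discount factor at which honesty is rational in this toy model."""
    return max(0.0, 1 - u / g)


# Hypothetical numbers: honest trade pays 1 per period, cheating pays 5 once.
u, g = 1.0, 5.0
print("threshold gamma:", min_trustworthy_gamma(u, g))
print("patient seller (gamma=0.9) stays honest:", keeps_commitment(u, g, 0.9))
print("impatient seller (gamma=0.5) stays honest:", keeps_commitment(u, g, 0.5))
```

Under these assumptions, the more an agent values future utility, the larger the temptation it can be trusted to resist, which is the intuition behind treating the discount factor as a trustworthiness measure.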

48        IR-Tree: An Efficient Index for Geographic Document Search

Given a geographic query that is composed of query keywords and a location, a geographic search engine retrieves documents that are the most textually and spatially relevant to the query keywords and the location, respectively, and ranks the retrieved documents according to their joint textual and spatial relevance to the query. The lack of an efficient index that can simultaneously handle both the textual and spatial aspects of the documents makes existing geographic search engines inefficient in answering geographic queries. In this paper, we propose an efficient index, called IR-tree, that together with a top-k document search algorithm facilitates four major tasks in document searches, namely, 1) spatial filtering, 2) textual filtering, 3) relevance computation, and 4) document ranking in a fully integrated manner. In addition, IR-tree allows searches to adopt different weights on textual and spatial relevance of documents at runtime and thus caters for a wide variety of applications. A set of comprehensive experiments over a wide range of scenarios has been conducted and the experimental results demonstrate that IR-tree outperforms the state-of-the-art approaches for geographic document searches.

49        Knowledge Discovery in Services (KDS): Aggregating Software Services to Discover Enterprise Mashups

Service mashup is the act of integrating the resulting data of two complementary software services into a common picture. Such an approach is promising with respect to the discovery of new types of knowledge.
However, before service mashup routines can be executed, it is necessary to predict which services (of an open repository) are viable candidates. Similar to Knowledge Discovery in Databases (KDD), we introduce the Knowledge Discovery in Services (KDS) process that identifies mashup candidates. In this work, the KDS process is specialized to address a repository of open services that do not contain semantic annotations. In these situations, specialized techniques are required to determine equivalences among open services with reasonable precision. This paper introduces a bottom-up process for KDS that adapts to the environment of services in which it operates. Detailed experiments are discussed that evaluate KDS techniques on an open repository of services from the Internet and on a repository of services created in a controlled environment.

50        Learning Semi-Riemannian Metrics for Semisupervised Feature Extraction

Discriminant feature extraction plays a central role in pattern recognition and classification. Linear Discriminant Analysis (LDA) is a traditional algorithm for supervised feature extraction. Recently, unlabeled data have been utilized to improve LDA. However, the intrinsic problems of LDA still exist and only the similarity among the unlabeled data is utilized. In this paper, we propose a novel algorithm, called Semisupervised Semi-Riemannian Metric Map (S3RMM), following the geometric framework of semi-Riemannian manifolds. S3RMM maximizes the discrepancy of the separability and similarity measures of scatters formulated by using semi-Riemannian metric tensors. The metric tensor of each sample is learned via semisupervised regression. Our method can also serve as a general framework for proposing new semisupervised algorithms, utilizing the existing discrepancy-criterion-based algorithms. Experiments on faces and handwritten digits show that S3RMM is promising for semisupervised feature extraction.

51        Load Shedding in Mobile Systems with MobiQual

In location-based, mobile continual query (CQ) systems, two key measures of quality-of-service (QoS) are: freshness and accuracy. To achieve freshness, the CQ server must perform frequent query reevaluations. To attain accuracy,