This document presents research on improving version control systems. It discusses a semi-distributed architecture using web data mining for better load balancing. A syntax-aware differencing algorithm is also suggested to calculate differences at the code structure level for Java. The research is evaluated through simulations showing improvements to synchronization and difference computation speed. Future work includes integrating the approaches into coding environments and testing on diverse networks.
1. Version Control
By Researcher:
Waleed Mohamed Mahmoud Al-Adrousy
Computer Science Department,
Faculty of Computers and Information Systems
Mansoura University
Dr. Samir El-Desouky El-Mougy
Computer Science Department,
Faculty of Computers and Information Systems
Mansoura University
Dr. Ahmed Abou El-Fetouh Saleh
Information System Department,
Faculty of Computers and Information Systems
Mansoura University
2. Agenda
● Version Control introduction
● Objectives
● Previous work
● Applied Algorithms and Technologies:
– Suggested load balancing architecture
– Suggested Differencing Algorithm
● Testing results.
● Future work
4. Version Control Definition
● Network-based system.
● Controls access to computer files.
● Tracks modifications to current and backup files.
● Tracks history.
● Synchronizes concurrent access to files.
12. Objectives
● Part 1
– Better load balancing based on behavior analysis.
– Optimization of the synchronization process.
– Dynamic clustering of work.
– A compromise between the centralized and distributed models.
● Part 2
– Grammar-based difference calculation.
– Faster difference computation.
– Adding on-line support for syntax differencing.
– Application to the Java language.
15. Previous Work (Cont.)
● Dick Grune in 1986 (CVS)
● CollabNet Inc. in 2000 (Subversion)
● Peer-to-peer (P2P) technologies in the late 1990s
16. Previous Work (Cont.)
● Language modeling is an internal representation of source code for processing.
● Some well-known language modeling techniques:
– FAMIX model
– XML representation standard of Java source code
– JavaML standard
20. Part 1 Objectives
● Better load balancing based on behavior analysis.
● Optimization of synchronization process.
● Dynamic clustering of work.
● Gaining the advantages of both the centralized and
distributed models.
23. Web Data Mining
● Definition...
● 3 Types of Algorithms:
– Centrality and Closeness
– Ranking
– Clustering
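As an illustration of the first algorithm family, closeness centrality can be computed with a breadth-first search. This is only a minimal sketch, assuming an unweighted, undirected communication graph; the `closeness` function and the `team` data are hypothetical and are not taken from the presented system.

```python
from collections import deque


def closeness(graph, source):
    """Closeness centrality of `source` in an unweighted graph.

    `graph` maps each node to the set of its neighbors.
    """
    # Breadth-first search gives the shortest hop count to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    # (number of other reachable nodes) / (sum of distances); 0.0 if isolated.
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical communication graph between developers.
team = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice"},
    "carol": {"alice"},
    "dave": {"alice"},
}

print(closeness(team, "alice"))
print(closeness(team, "bob"))
```

A node with a high score reaches the rest of the team in few hops, which is one plausible signal for choosing a coordinating node when balancing load.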
24. Graph
● Definition:
A set of vertices and a set of edges. Each edge is specified as a
pair (v1, v2), where v1 and v2 are two vertices in the graph. A vertex
can also have a weight, sometimes called a cost.
● Types
– Directed → like project dependencies
– Undirected → like communication between developers
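The definition and the two graph types can be sketched as a small adjacency-set structure. This is an illustrative sketch only; the `Graph` class and the sample vertex names are assumptions, not part of the presented system.

```python
class Graph:
    """Vertices plus edges stored as adjacency sets; vertices may carry a weight."""

    def __init__(self, directed):
        self.directed = directed
        self.adj = {}      # vertex -> set of adjacent vertices
        self.weight = {}   # vertex -> optional weight (cost)

    def add_vertex(self, v, weight=None):
        self.adj.setdefault(v, set())
        if weight is not None:
            self.weight[v] = weight

    def add_edge(self, v1, v2):
        """Add the edge (v1, v2); an undirected graph also stores (v2, v1)."""
        self.add_vertex(v1)
        self.add_vertex(v2)
        self.adj[v1].add(v2)
        if not self.directed:
            self.adj[v2].add(v1)


# Directed: hypothetical project dependency ("app" depends on "core").
deps = Graph(directed=True)
deps.add_edge("app", "core")

# Undirected: hypothetical communication between two developers.
talks = Graph(directed=False)
talks.add_edge("alice", "bob")

print(deps.adj)
print(talks.adj)
```

In the directed case only "app" points to "core", while in the undirected case each developer appears in the other's adjacency set.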
33. Existing Differencing Algorithms
Comparison | Line Based | Structure Based
Example | LCS | DiffX, Xdiff and Xydiff
Comparison unit | Line | Structure
Implementation difficulty | Easy | Hard
Dealing with the logical nature of code (classes, methods, objects, etc.) | Ignores it | Deals with it (helpful for developers)
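To make the line-based column concrete, here is a minimal sketch of a diff built on a longest-common-subsequence (LCS) table. It is a generic textbook formulation, not the implementation evaluated in this work, and the function name `lcs_diff` is an assumption.

```python
def lcs_diff(old, new):
    """Line-based diff of two lists of lines using an LCS table."""
    n, m = len(old), len(new)
    # table[i][j] = length of the LCS of old[i:] and new[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table to emit unchanged ("  "), deleted ("- "), added ("+ ") lines.
    ops, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            ops.append(("  ", old[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("- ", old[i]))
            i += 1
        else:
            ops.append(("+ ", new[j]))
            j += 1
    ops += [("- ", line) for line in old[i:]]
    ops += [("+ ", line) for line in new[j:]]
    return ops


old_src = ["class A {", "  int x;", "}"]
new_src = ["class A {", "  int x;", "  int y;", "}"]
for tag, line in lcs_diff(old_src, new_src):
    print(tag + line)
```

Because the comparison unit is the raw line, a change in spacing alone (for example "int x;" versus "int  x;") is reported as one deletion plus one addition, which is the "ignores it" behavior noted in the table.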
34. Part 2 Objectives
● Adding grammar-based difference calculation.
● Enhancing difference computation speed.
● Adding on-line support for syntax differencing.
● Application to the Java language.
35. Abstract Syntax Tree (AST)
● Important for parsers, which model source code as a
structure instead of plain text or lines.
● Many parser generators exist for Java; ANTLR was chosen.
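The benefit of comparing structure rather than text can be shown with any parser. The sketch below uses Python's standard `ast` module purely as an analogy; the work itself targets Java with ANTLR, and the helper name `same_structure` is an assumption.

```python
import ast


def same_structure(source_a, source_b):
    """True when both sources parse to the same abstract syntax tree."""
    # ast.dump omits line and column attributes by default, so layout,
    # blank lines, and comments do not affect the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


original = "def area(w, h):\n    return w * h\n"
reformatted = "def area(w,h):\n\n    return w*h   # reformatted\n"
changed = "def area(w, h):\n    return w + h\n"

print(same_structure(original, reformatted))
print(same_structure(original, changed))
```

The reformatted version differs on every line of text yet yields the same tree, while the version with a different operator yields a different tree.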
36. XML Standard
● Well-known structured data representation format.
● Widely used for interoperability.
● Used in many Internet protocols and web
services.
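Once source code is expressed as XML, differences can be computed per element rather than per line. The following is a rough sketch only; the `class` and `method` element names are hypothetical and do not follow JavaML or the schema used in this work.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML models of two versions of the same class.
OLD = """<class name="Account">
  <method name="deposit"/>
  <method name="withdraw"/>
</class>"""

NEW = """<class name="Account">
  <method name="deposit"/>
  <method name="transfer"/>
</class>"""


def method_names(xml_text):
    """Collect the names of all <method> elements in the document."""
    root = ET.fromstring(xml_text)
    return {m.get("name") for m in root.iter("method")}


def method_changes(old_xml, new_xml):
    """Report which methods were removed or added between two versions."""
    old, new = method_names(old_xml), method_names(new_xml)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


print(method_changes(OLD, NEW))
```

The result is phrased in terms of methods, not line numbers, which is the kind of output a structure-level diff aims for.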
42. Simulation Results
● Two subsystems are simulated:
– Semi-structured version control
Tools: JFreeChart, JUNG, MyJTable, and Piccolo
– Syntax-Aware Diff for Web-Based Version Control
Systems
Tools: AJAX, GWT, XML, ANTLR, JDOM, XMLUnit, and Java servlets
● Note: the following results are based on a custom
simulation, not real-life data, owing to the limited
availability of a human team to run the tests.
57. Part 1 Future work
(for our proposed semi-distributed algorithm)
● Conducting a real-world case study.
● Integration with a coding environment (IDE).
● Considering aspects such as security and backup.
● Testing on many network platforms with
heterogeneous devices.
58. Part 2 Future work
(Syntax-Aware Differencing)
● Merge the semi-distributed algorithm with the differencing algorithm.
● Shorten the lengthy representation produced for short Java source code.
● Enhance the readability of the differencing results.
● Integrate with an existing IDE.
● Enhance the visualization in the graphical user interface (GUI) of the web tool.
● Improve the asynchronous web page design model in use (Rich Online
IDE).
● Develop as a web service.
● Port this algorithm to languages other than Java by replacing the
modeling part so that it reads the other language's grammar.