Keynote presentation given by Diomidis Spinellis, Professor in the Department of Management Science and Technology of the Athens University of Economics and Business, and Editor in Chief of IEEE Software.
Half-Century of Unix: History, Preservation, and Lessons Learned
1. Half-Century of Unix: History, Preservation, and Lessons Learned
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
@CoolSWEng
www.spinellis.gr
dds@aueb.gr
7. Overview
• Unix history
• Unix history repository contents
• Repository creation process
• Contributing extensions
• Example 1: Programming practices
• Example 2: Architectural evolution
29. Why Unix is important
• Exemplar design
• Technical contributions
• Impact
• Development model
• Widespread use
• “unusual simplicity, power, and elegance”
31. System technology
• Hierarchical file system
• Compatible file, device, networking, and interprocess I/O
• Pipes and filters architecture
• Virtual file systems
• The shell as a user-selectable regular process
32. Associated Technologies
• C and C++
• Parser and lexical analyzer generators
• Software development environments
• Document preparation tools and declarative markup
• Scripting languages
• TCP/IP networking
• Configuration management systems
35. Motivation
• Explore evolution of programming style
• Consolidate digital artifacts of historical importance
• Collect and record history that is fading away
• Provide a data set for digital archeology and repository mining
36. Things to Take Away …
• 1.1GB Git repository
– github.com/dspinellis/unix-history-repo
• Documentation of the authorship details
• Open source project
– github.com/dspinellis/unix-history-make
• Techniques and tools for snapshot import
• Ideas for empirical studies
37. In Numbers …
Metric                  Unix history   Linux history
Start date              30/06/1970     17/09/1991
Start files             43             92
Start lines             11,500         917,812
End files               63,049         51,396
End lines               27,388,943     21,525,436
Data set size (.git)    1.1GB          1.0GB
Number of commits       495,622        611,735
Number of merges        2,523          48,821
Number of authors       973            18,465
Days with activity      13,004         5,126
75. Git Fast Import
# 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
blob
mark :3
data 5190
/* Copyright (c) 1979 Regents of the University of California */
#include "ex.h"
#include "ex_re.h"
[...]
# Start development commits from a clean slate
commit refs/heads/BSD-3-Snapshot-Development
mark :10
author Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
data 99
Start development on BSD 3
Create reference copy of all prior development files
(Synthetic commit)
merge Bell-32V
merge BSD-2
M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/stat.h
M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/sys/stat.h
M 100644 816685f1f60f44dfaed7e673294b9d07a12114e5 .ref-Bell-32V/usr/man/man2/open.2
[...]
# 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
commit refs/heads/BSD-3-Snapshot-Development
mark :13
author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
data 75
BSD 3 development
Work on file usr/src/cmd/ex/ex_addr.c
(Synthetic commit)
M 100644 :3 usr/src/cmd/ex/ex_addr.c
[...]
# Release
commit refs/heads/BSD-Release
mark :3700
author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
data 78
BSD 3 release
Snapshot of the completed development branch
(Synthetic commit)
from :3699
merge Bell-32V
merge BSD-2
D .ref-Bell-32V
D .ref-BSD-2
tag BSD-3
from :3700
tagger Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
data 91
Tagged 3 release snapshot of BSD with 3
Source directory: ../archive/3bsd
(Synthetic tag)
done
76. Research Applications
• Software evolution
• Handover across generations
• Software/hardware co-evolution
• Evolution of programming practices
• Organizational culture
• Individual programmers
• Code longevity
• Git engineering
77. /*
* Editor
*/
#include <signal.h>
#include <sgtty.h>
#include <setjmp.h>
#define NULL 0
#define FNSIZE 64
#define LBSIZE 512
#define ESIZE 128
#define GBSIZE 256
#define NBRA 5
#define EOF -1
#define KSIZE 9
#define CBRA 1
#define CCHR 2
#define CDOT 4
#define CCL 6
#define NCCL 8
#define CDOL 10
#define CEOF 11
#define CKET 12
#define CBACK 14
#define STAR 01
char Q[] = "";
char T[] = "TMP";
#define READ 0
#define WRITE 1
int peekc;
int lastc;
char savedfile[FNSIZE];
char file[FNSIZE];
char linebuf[LBSIZE];
char rhsbuf[LBSIZE/2];
char expbuf[ESIZE+4];
int circfl;
int *zero;
int *dot;
int *dol;
int *addr1;
int *addr2;
char genbuf[LBSIZE];
long count;
char *nextip;
char *linebp;
int ninbuf;
int io;
int pflag;
long lseek();
int (*oldhup)();
int (*oldquit)();
int vflag = 1;
int xflag;
int xtflag;
int kflag;
char key[KSIZE + 1];
char crbuf[512];
char perm[768];
char tperm[768];
int listf;
int col;
char *globp;
int tfile = -1;
int tline;
char *tfname;
char *loc1;
char *loc2;
char *locs;
char ibuff[512];
int iblock = -1;
char obuff[512];
int oblock = -1;
int ichanged;
int nleft;
char WRERR[] = "WRITE ERROR";
int names[26];
int anymarks;
char *braslist[NBRA];
char *braelist[NBRA];
int nbra;
int subnewa;
int subolda;
int fchange;
int wrapp;
unsigned nlall = 128;
int *address();
char *getline();
char *getblock();
char *place();
char *mktemp();
char *malloc();
char *realloc();
jmp_buf savej;
main(argc, argv)
char **argv;
{
	register char *p1, *p2;
	extern int onintr(), quit(), onhup();
	int (*oldintr)();
	oldquit = signal(SIGQUIT, SIG_IGN);
	oldhup = signal(SIGHUP, SIG_IGN);
	oldintr = signal(SIGINT, SIG_IGN);
	if ((int)signal(SIGTERM, SIG_IGN) == 0)
		signal(SIGTERM, quit);
	argv++;
	while (argc > 1 && **argv=='-') {
		switch((*argv)[1]) {
		case '\0':
			vflag = 0;
			break;
		case 'q':
			signal(SIGQUIT, SIG_DFL);
			vflag = 1;
			break;
		case 'x':
			xflag = 1;
			break;
		}
		argv++;
		argc--;
	}
	if(xflag){
		getkey();
		kflag = crinit(key, perm);
	}
	if (argc>1) {
		p1 = *argv;
		p2 = savedfile;
		while (*p2++ = *p1++)
			;
		globp = "r";
	}
	zero = (int *)malloc(nlall*sizeof(int));
	tfname = mktemp("/tmp/eXXXXX");
	init();
	if (((int)oldintr&01) == 0)
		signal(SIGINT, onintr);
	if (((int)oldhup&01) == 0)
		signal(SIGHUP, onhup);
	setjmp(savej);
	commands();
	quit();
}
commands()
{
	int getfile(), gettty();
	register *a1, c;
	for (;;) {
		if (pflag) {
			pflag = 0;
			addr1 = addr2 = dot;
			goto print;
		}
		addr1 = 0;
		addr2 = 0;
		do {
			addr1 = addr2;
			if ((a1 = address())==0) {
				c = getchr();
				break;
			}
			addr2 = a1;
			if ((c=getchr()) == ';') {
				c = ',';
				dot = a1;
			}
86. H1: Programming practices reflect technology affordances
Increase in mean file length (lines / file)
87. H1: Programming practices reflect technology affordances
Increase in mean file functionality (statements / file)
88. H1: Programming practices reflect technology affordances
Increase in mean line length (characters / line)
89. H1: Programming practices reflect technology affordances
Increase in mean identifier length (characters / identifier)
int creat();
90. … and I once heard an old-timer growl at a young programmer:
“I've written boot loaders that were shorter than your variable names!”
— Stephen C. Johnson
91. H1: Programming practices reflect technology affordances
Increase in mean function length (lines / function)
{
}
93. H2: Modularity increases with code size
Increase in number of static declarations / statement
static short splice;
94. H2: Modularity increases with code size
Increase in number of #include directives / line
#include "if_uba.h"
96. H3: New language features are increasingly used to saturation point
Increase in number of const declarations / statement
const char *panicstr;
97. H3: New language features are increasingly used to saturation point
Increase in number of enum declarations / statement
enum uio_rw rw;
98. H3: New language features are increasingly used to saturation point
Increase in number of inline declarations / statement
inline uchar get_byte ();
99. H3: New language features are increasingly used to saturation point
Increase in number of void declarations / statement
sc_max_unit(void)
100. H3: New language features are increasingly used to saturation point
Increase in number of volatile declarations / statement
volatile struct proc *p, *pp;
101. H3: New language features are increasingly used to saturation point
Increase in number of unsigned declarations / statement
unsigned c[BMAX + 1];
103. H4: Programmers trust the compiler for register allocation
Decreasing number of register declarations / statement
register struct ifnet *ifp;
121. PDP-7 (1970)
• Kernel (2489 lines of PDP-7 assembly)
• Layering and partitioning
• System call
• Code and data scoping
• Interpreter
122. First Research Edition (1971)
• Complete rewrite (4213-line kernel)
• Reference architecture
– 34 system calls
– 18 common with PDP-7 version
– 18 survive until today
• Binary code API
• Abstraction of standard I/O
• Devices as files
123. Second Research Edition (1972)
• Software library
• User-contributed code
– Public and documented
• Shell as a user program
• Interoperability through documented file formats
139. Action Items
• Use the repository for your research
• Improve repository
– Authors and maps
– Merge concurrent SCCS, CVS commits
– 2.* BSD
– Research Editions 8-10, and Plan 9
– NetBSD, OpenBSD
• Lobby to open the code of System V
• Improve Git’s performance and accuracy
141. Funding Credit
The research described has been partially carried out as part of the CROSSMINER Project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 732223.
146. Extending the Data Set
• Add data download Makefile rule
• Add authorship information
• Add non-import file list
• Add tree graft import statement
• Rebuild the history repository
• Verify that the checked-out version matches the original data
• Verify git blame / log, branches / merges
• Add corresponding verification rules
151. H3: New language features are increasingly used to saturation point
Lackluster adoption of signed declarations
long signed t;
152. Language Evolution
[It is a mistake to have keywords when]
“what they add to the cost of learning and using the language is not repaid in greater expressiveness”
— Dennis M. Ritchie
155. Inconsistency over 19 style rules
• 0: perfectly consistent
• 0.5: completely inconsistent
if (p) {
156. Investment Advice
• Minimal involvement of the programmer
• At least modest gains
• Very low downside risk
• Static analysis to locate bugs
• Resource management
• Utilization of multiple computing cores
• Optimization of cache and memory access patterns
• Reduction of energy use
159. Agreement with Lehman’s Laws
• Increasing Complexity
• Conservation of Familiarity
• Declining Quality
• Feedback System
160. Hypotheses
1. Programming practices reflect technology affordances
2. Modularity increases with code size
3. New language features are increasingly used to saturation point
4. Programmers trust the compiler for register allocation
5. Code formatting practices converge to a common standard
6. Software complexity evolution follows self-correction
7. Code readability increases
161. Analysis
• Calculated weighted derived values
– Densities
– Averages
• Generalized Additive Model (GAM) regression
– With cubic splines
163. H7: Code readability increases
Seen
• Increased in the past
• Does not continue to
increase
• Developers lost interest?
• Diminishing returns of
investing in the code's
documentary structure
Future directions
• More powerful
programming structures
• Refactoring
• Specialized libraries
• Model-driven development
• Meta-programming
• Domain-specific languages
• Static analysis
• Online collaboration
platforms
164. Threats to Validity
• No causal relationships
• Single system (Unix)
• No match for all programmers
– Two Turing award winners
– Two Fortune 500 founders