Half-Century of Unix; History, Preservation, and Lessons Learned

Keynote presentation given by Diomidis Spinellis, Professor in the Department of Management Science and Technology of the Athens University of Economics and Business, and Editor in Chief of IEEE Software.

Published in: Technology

  1. Half Century of Unix: History, Preservation, and Lessons Learned • Diomidis Spinellis • Department of Management Science and Technology, Athens University of Economics and Business • @CoolSWEng • www.spinellis.gr • dds@aueb.gr
  2. Overview • Unix history • Unix history repository contents • Repository creation process • Contributing extensions • Example 1: Programming practices • Example 2: Architectural evolution
  3. Why Unix is important • Exemplar design • Technical contributions • Impact • Development model • Widespread use • “unusual simplicity, power, and elegance”
  4. System technology • Hierarchical file system • Compatible file, device, networking, and inter-process I/O • Pipes and filters architecture • Virtual file systems • The shell as a user-selectable regular process
  5. Associated Technologies • C and C++ • Parser and lexical analyzer generators • Software development environments • Document preparation tools and declarative markup • Scripting languages • TCP/IP networking • Configuration management systems
  6. Motivation • Explore evolution of programming style • Consolidate digital artifacts of historical importance • Collect and record history that is fading away • Provide a data set of digital archeology and repository mining
  7. Things to Take Away … • 1.1GB Git repository – github.com/dspinellis/unix-history-repo • Documentation of the authorship details • Open source project – github.com/dspinellis/unix-history-make • Techniques and tools for snapshot import • Ideas for empirical studies
  8. In Numbers …
         Metric                 Unix history    Linux history
         Start date             30/06/1970      17/09/1991
         Start files            43              92
         Start lines            11,500          917,812
         End files              63,049          51,396
         End lines              27,388,943      21,525,436
         Data set size (.git)   1.1GB           1.0GB
         Number of commits      495,622         611,735
         Number of merges       2,523           48,821
         Number of authors      973             18,465
         Days with activity     13,004          5,126
  9. Repository Contents • Research Edition Unix: PDP-7, V1, V3–V7 • Unix 32V • BSD 1, 2, 3, 4, 4.1, 4.2, 4.3 *, 4.4 * • 386BSD 0.0, 0.1 • FreeBSD 1.0–11.0 • Tags • Contributors • Branches and merges
  10. Research Editions • PDP-7 Unix: printed kernel and utilities
  11. Research Editions • 1st: (Nov 1971) Printed PDP-11 kernel
  12. Research Editions • 1st: (Nov 1971) Printed PDP-11 kernel • 2nd: (Jun 1972) Dump DECtape fragments of programs
  13. Research Editions • 1st: (Nov 1971) Printed PDP-11 kernel • 2nd: (Jun 1972) Dump DECtape fragments of programs • 3rd: (Feb 1973) 90% C kernel • 4th: (Nov 1973) only troff manual • 5th: (June 1974) No manual source • 6th: (May 1975) Complete, widely distributed
  14. Research Editions • 1st: (Nov 1971) Printed PDP-11 kernel • 2nd: (Jun 1972) Dump DECtape fragments of programs • 3rd: (Feb 1973) 90% C kernel • 4th: (Nov 1973) troff manual • 5th: (June 1974) No manual source • 6th: (May 1975) Widely distributed • 7th: (Jan 1979) awk, expr, find, lex, sed, tar, uucp, Bourne shell, …
  15. 32/V • 1978 • By John Reiser and Tom London • Bell Labs Holmdel • VAX as a large PDP-11 – swapping, not paging
  16. 15 Berkeley Snapshots • BSD (1978): ex, Pascal, tools • 2BSD: vi, termcap, csh, … • 3BSD (1979): VM • 4BSD (1980): CSRG/DARPA (email, ^Z, signals) • 4.1c2BSD (1982): TCP/IP, ftp, rsh, rlogin, … • … • 4.3BSD (1988): performance, BIND • 4.3BSD Net/1 (1988): no AT&T licensing • … • 4.4BSD-Lite Release/2 (1995): last enhancements
  17. [Chart: Unix Kernel (Research, BSD, FreeBSD) LOC, logarithmic vertical axis from 1 to 10,000,000, horizontal axis from 01/06/1974 to 01/12/2012]
  18. Metadata • Date • Author • Commit parents
  19. Creation process • Gather primary material (11GB) • Populate author maps, author details • Import command – Release snapshots – SCCS – (CVS), Git • Build script • Lookaside reference files
  20. GitHub Integration
  21. Git Fast Import

         # 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
         blob
         mark :3
         data 5190
         /* Copyright (c) 1979 Regents of the University of California */
         #include "ex.h"
         #include "ex_re.h"
         [...]
         # Start development commits from a clean slate
         commit refs/heads/BSD-3-Snapshot-Development
         mark :10
         author Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
         committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
         data 99
         Start development on BSD 3
         Create reference copy of all prior development files
         (Synthetic commit)
         merge Bell-32V
         merge BSD-2
         M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/stat.h
         M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/sys/stat.h
         M 100644 816685f1f60f44dfaed7e673294b9d07a12114e5 .ref-Bell-32V/usr/man/man2/open.2
         [...]
         # 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
         commit refs/heads/BSD-3-Snapshot-Development
         mark :13
         author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
         committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
         data 75
         BSD 3 development
         Work on file usr/src/cmd/ex/ex_addr.c
         (Synthetic commit)
         M 100644 :3 usr/src/cmd/ex/ex_addr.c
         [...]
         # Release
         commit refs/heads/BSD-Release
         mark :3700
         author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
         committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
         data 78
         BSD 3 release
         Snapshot of the completed development branch
         (Synthetic commit)
         from :3699
         merge Bell-32V
         merge BSD-2
         D .ref-Bell-32V
         D .ref-BSD-2
         tag BSD-3
         from :3700
         tagger Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
         data 91
         Tagged 3 release snapshot of BSD with 3
         Source directory: ../archive/3bsd
         (Synthetic tag)
         done
  22. Research Applications • Software evolution • Handover across generations • Software/hardware co-evolution • Evolution of programming practices • Organizational culture • Individual programmers • Code longevity • Git engineering
  23. Source code listing:

         /*
          * Editor
          */

         #include <signal.h>
         #include <sgtty.h>
         #include <setjmp.h>
         #define NULL 0
         #define FNSIZE 64
         #define LBSIZE 512
         #define ESIZE 128
         #define GBSIZE 256
         #define NBRA 5
         #define EOF -1
         #define KSIZE 9

         #define CBRA 1
         #define CCHR 2
         #define CDOT 4
         #define CCL 6
         #define NCCL 8
         #define CDOL 10
         #define CEOF 11
         #define CKET 12
         #define CBACK 14

         #define STAR 01

         char Q[] = "";
         char T[] = "TMP";
         #define READ 0
         #define WRITE 1

         int peekc;
         int lastc;
         char savedfile[FNSIZE];
         char file[FNSIZE];
         char linebuf[LBSIZE];
         char rhsbuf[LBSIZE/2];
         char expbuf[ESIZE+4];
         int circfl;
         int *zero;
         int *dot;
         int *dol;
         int *addr1;
         int *addr2;
         char genbuf[LBSIZE];
         long count;
         char *nextip;
         char *linebp;
         int ninbuf;
         int io;
         int pflag;
         long lseek();
         int (*oldhup)();
         int (*oldquit)();
         int vflag = 1;
         int xflag;
         int xtflag;
         int kflag;
         char key[KSIZE + 1];
         char crbuf[512];
         char perm[768];
         char tperm[768];
         int listf;
         int col;
         char *globp;
         int tfile = -1;
         int tline;
         char *tfname;
         char *loc1;
         char *loc2;
         char *locs;
         char ibuff[512];
         int iblock = -1;
         char obuff[512];
         int oblock = -1;
         int ichanged;
         int nleft;
         char WRERR[] = "WRITE ERROR";
         int names[26];
         int anymarks;
         char *braslist[NBRA];
         char *braelist[NBRA];
         int nbra;
         int subnewa;
         int subolda;
         int fchange;
         int wrapp;
         unsigned nlall = 128;

         int *address();
         char *getline();
         char *getblock();
         char *place();
         char *mktemp();
         char *malloc();
         char *realloc();
         jmp_buf savej;

         main(argc, argv)
         char **argv;
         {
             register char *p1, *p2;
             extern int onintr(), quit(), onhup();
             int (*oldintr)();

             oldquit = signal(SIGQUIT, SIG_IGN);
             oldhup = signal(SIGHUP, SIG_IGN);
             oldintr = signal(SIGINT, SIG_IGN);
             if ((int)signal(SIGTERM, SIG_IGN) == 0)
                 signal(SIGTERM, quit);
             argv++;
             while (argc > 1 && **argv=='-') {
                 switch((*argv)[1]) {

                 case '\0':
                     vflag = 0;
                     break;

                 case 'q':
                     signal(SIGQUIT, SIG_DFL);
                     vflag = 1;
                     break;

                 case 'x':
                     xflag = 1;
                     break;
                 }
                 argv++;
                 argc--;
             }
             if(xflag){
                 getkey();
                 kflag = crinit(key, perm);
             }

             if (argc>1) {
                 p1 = *argv;
                 p2 = savedfile;
                 while (*p2++ = *p1++)
                     ;
                 globp = "r";
             }
             zero = (int *)malloc(nlall*sizeof(int));
             tfname = mktemp("/tmp/eXXXXX");
             init();
             if (((int)oldintr&01) == 0)
                 signal(SIGINT, onintr);
             if (((int)oldhup&01) == 0)
                 signal(SIGHUP, onhup);
             setjmp(savej);
             commands();
             quit();
         }

         commands()
         {
             int getfile(), gettty();
             register *a1, c;

             for (;;) {
                 if (pflag) {
                     pflag = 0;
                     addr1 = addr2 = dot;
                     goto print;
                 }
                 addr1 = 0;
                 addr2 = 0;
                 do {
                     addr1 = addr2;
                     if ((a1 = address())==0) {
                         c = getchr();
                         break;
                     }
                     addr2 = a1;
                     if ((c=getchr()) == ';') {
                         c = ',';
                         dot = a1;
                     }
  24. H1: Programming practices reflect technology affordances • Increase in mean file length (lines / file)
  25. H1: Programming practices reflect technology affordances • Increase in mean file functionality (statements / file)
  26. H1: Programming practices reflect technology affordances • Increase in mean line length (characters / line)
  27. H1: Programming practices reflect technology affordances • Increase in mean identifier length (characters / line) • Example: int creat();
  28. … and I once heard an old-timer growl at a young programmer: “I've written boot loaders that were shorter than your variable names!” (Stephen C. Johnson)
  29. H1: Programming practices reflect technology affordances • Increase in mean function length (lines / function)
  30. H2: Modularity increases with code size • Increase in number of static declarations / statement • Example: static short splice;
  31. H2: Modularity increases with code size • Increase in number of #include directives / line • Example: #include "if_uba.h"
  32. H3: New language features are increasingly used to saturation point • Increase in number of const declarations / statement • Example: const char *panicstr;
  33. H3: New language features are increasingly used to saturation point • Increase in number of enum declarations / statement • Example: enum uio_rw rw;
  34. H3: New language features are increasingly used to saturation point • Increase in number of inline declarations / statement • Example: inline uchar get_byte ();
  35. H3: New language features are increasingly used to saturation point • Increase in number of void declarations / statement • Example: sc_max_unit(void)
  36. H3: New language features are increasingly used to saturation point • Increase in number of volatile declarations / statement • Example: volatile struct proc *p, *pp;
  37. H3: New language features are increasingly used to saturation point • Increase in number of unsigned declarations / statement • Example: unsigned c[BMAX + 1];
  38. H4: Programmers trust the compiler for register allocation • Decreasing number of register declarations / statement • Example: register struct ifnet *ifp;
  39. H5: Code formatting practices converge to a common standard
  40. H5: Code formatting practices converge to a common standard • Decrease in code inconsistency • Example: if (q()) { versus if( q() ) {
  41. H5: Code formatting practices converge to a common standard • Decrease in indentation spaces standard deviation • Example (nested): if (a) while (b) for (;;)
  42. H6: Software complexity evolution follows self correction • Mean lines / function
  43. H6: Software complexity evolution follows self correction • Mean statement nesting • Example (nested): if (a) while (b) for (;;) if (d())
  44. H6: Software complexity evolution follows self correction • Density of C preprocessor conditionals • Examples: #if, #ifdef, #elif
  45. H6: Software complexity evolution follows self correction • Density of C preprocessor non-include directives • Examples: #define, #if
  46. H6: Software complexity evolution follows self correction • goto keyword density
  47. H7: Code readability increases • Mean indentation spaces converge around 6
  48. H7: Code readability increases • Statements / line decrease • Example: a(); b++; d();
  49. H7: Code readability increases • Comment character density
  50. “Kludge” words • bugbug, buggy, bullsh*t, cr*p, crash, d*mn, d*mned, doom, doomed, fixme, f*ck, f*cker, f*cking, hack, hacked, hackery, hacks, hell, kludge, kludges, lame, lameness, p**p, screwed, screws, sh*t, sh*ts, s*ck, s*cks, todo, xxx
  51. H7: Code readability increases • Kludge word density
  52. PDP-7 (1970) • Kernel (2489 lines of PDP-7 assembly) • Layering and partitioning • System call • Code and data scoping • Interpreter
  53. First Research Edition (1971) • Complete rewrite (4213 lines kernel) • Reference architecture – 34 system calls – 18 common with PDP-7 version – 18 survive until today • Binary code API • Abstraction of standard I/O • Devices as files
  54. Second Research Edition (1972) • Software library • User-contributed code – Public and documented • Shell as a user program • Interoperability through documented file formats
  55. Third Research Edition (Feb 1973) • Pipe abstraction • Tools as filters
  56. Fourth Research Edition (Nov 1973) • Implemented in “new B” (C) – 7141 lines, only 768 in PDP-11 assembly • Structured programming • Language-independent API • Data structure definitions • Device driver abstraction – Method interfaces – Strategy functions
  57. dspinellis.github.io/unix-history-man
  58. Action Items • Use the repository for your research • Improve repository – Authors and maps – Merge concurrent SCCS, CVS commits – 2.* BSD – Research Editions 8-10, and Plan 9 – NetBSD, OpenBSD • Lobby to open the code of System V • Improve Git’s performance and accuracy
  59. Thank you! • github.com/dspinellis/unix-history-repo • dds@aueb.gr • www.spinellis.gr • @CoolSWEng
  60. Funding Credit • The research described has been partially carried out as part of the CROSSMINER Project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 732223.
  61. Image Credits • Decades: Wikipedia (1970, 80, 90, 2000) • ASR-33 Teletype: Rama & Musée Bolo • VT100: Jason Scott • VAX 11/780: Joe Mabel • PDP 11/20: Image courtesy of Computer History Museum • VAX in use: Photo courtesy of Berkeley Lab © 2010 The Regents of the University of California, through the Lawrence Berkeley National Laboratory • Pentium: Iorsh • Hypotheses: Niklas Morberg • Modules: Suatu Ketika • Reading glasses: Walt Stoneburner • Cables: christof tof • Chemical flasks: Joe Sullivan • Snake Oil cover: Clark Stanley • Haswell Chip: Intel Free Press • Sparcstation10: Thomas Kaiser • Gold coins: Anonimski • Go to statement considered harmful: Edsger W. Dijkstra and ACM • Manny Lehman: © Copyright 2009 Imperial College London • Saladin and Guy de Lusignan after battle of Hattin in 1187: Said Tahsine (Creative Commons licenses)
  62. Backup Slides
  63. Commits per year [Chart: vertical axis from 0 to 40,000 commits, horizontal axis from 1972 to 2014]
  64. Extending the Data Set • Add data download Makefile rule • Add authorship information • Add non-import file list • Add tree graft import statement • Rebuild the history repository • Verify checked out version matches original data • Verify git blame / log, branches / merges • Add corresponding verification rules
  65. cqmetrics • $ qmcalc contrib/nvi/ex/ex.c • Output: 63249 2372 0 25.6648 19 74 21.701 12 697 0 2.00861 2 5 1.29004 13 7 5 63 0 0 00 0 0 0 0 15 0 0 190 29143 253 0 0 175 27 16 3 5 2 0.000914077 12 123.19 4071.31 589.322 34258.2 9239.62 12 1 43.25 11 346 92.9338 3126 1 4.28055 3 20 2.62054 340 17.16471 7 20 3.13555 1273 6.66667 8.00288 8 12 0.1903 00 0 7 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 813 33 468 511 168 58 801 688 168 646 168 801 95 40 735 95 468 95 800 • github.com/dspinellis/cqmetrics
  66. H3: New language features are increasingly used to saturation point • Lackluster adoption of signed declarations • Example: long signed t;
  67. Language Evolution • [It is a mistake having keywords that] “what they add to the cost of learning and using the language is not repaid in greater expressiveness” (Dennis M. Ritchie)
  68. Handling of Evolution • Good – C++ – Fortran – Java • Mismanaged – Perl – Python • Limits – Lisp – C
  69. Inconsistency over 19 style rules • 0: perfectly consistent • 0.5: completely inconsistent • Example: if (p) {
  70. Investment Advice • Minimal involvement of the programmer • At least modest gains • Very low downside risk • Static analysis to locate bugs • Resource management • Utilization of multiple computing cores • Optimization of cache and memory access patterns • Reduction of energy use
  71. Data Sources
  72. Authorship Collection • Papers, books, documentation • Scan source code, manual pages • Unix StackExchange Q&A • File location (e.g. /usr/sys/dmr) • Propagation
  73. Agreement with Lehman’s Laws • Increasing Complexity • Conservation of Familiarity • Declining Quality • Feedback System
  74. Hypotheses • H1: Programming practices reflect technology affordances • H2: Modularity increases with code size • H3: New language features are increasingly used to saturation point • H4: Programmers trust the compiler for register allocation • H5: Code formatting practices converge to a common standard • H6: Software complexity evolution follows self correction • H7: Code readability increases
  75. Analysis • Calculated weighted derived values – Densities – Averages • Generalized Additive Model (GAM) regression – With cubic splines
  76. H7: Code readability increases
  77. H7: Code readability increases • Seen – Increased in the past – Does not continue to increase – Developers lost interest? – Diminishing returns of investing in the code's documentary structure • Future directions – More powerful programming structures – Refactoring – Specialized libraries – Model-driven development – Meta-programming – Domain-specific languages – Static analysis – Online collaboration platforms
  78. Threats to Validity • No causal relationships • Single system (Unix) • No match for all programmers – Two Turing award winners – Two Fortune 500 founders
  79. Architectural Evolution • Qualitative analysis – Components and connectors – Patterns and principles • Quantitative analysis – Size – Cohesion – Coupling – Complexity
  80. Cyclomatic Complexity • Kernel [Chart: vertical axis from 0 to 9, horizontal axis releases 1973-2016]
  81. Cyclomatic Complexity • Library [Chart: vertical axis from 0 to 9, horizontal axis releases 1973-2016]
  82. Cyclomatic Complexity • Tools [Chart: vertical axis from 0 to 10, horizontal axis releases 1973-2016]
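
Several of the measures named on the H1 to H7 slides are densities over the source text, and the inconsistency scale on slide 69 runs from 0 to 0.5. The following is a minimal Python sketch of how such values could be computed. It is not the cqmetrics implementation. The names `simple_metrics` and `inconsistency` are hypothetical, statements are approximated by counting semicolons, and the word list is only the unmasked subset of the slide's “kludge” words. Defining inconsistency as the minority share min(a, b) / (a + b) is also an assumption, chosen because it matches the stated 0 to 0.5 range.

```python
import re

# Unmasked subset of the "kludge" word list from the slides; the masked
# entries are left out rather than guessed.
KLUDGE_WORDS = {
    "bugbug", "buggy", "crash", "doom", "doomed", "fixme", "hack", "hacked",
    "hackery", "hacks", "hell", "kludge", "kludges", "lame", "lameness",
    "screwed", "screws", "todo", "xxx",
}

WORD_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def simple_metrics(source):
    """Return a few per-file densities for C source text.

    Assumption: a statement is approximated by a semicolon, which miscounts
    for(;;) headers and string literals; a real tool would tokenize.
    """
    lines = source.splitlines()
    words = [w.lower() for w in WORD_RE.findall(source)]
    n_lines = max(len(lines), 1)            # avoid division by zero
    n_statements = max(source.count(";"), 1)
    return {
        "mean_line_length": sum(len(line) for line in lines) / n_lines,
        "register_per_statement": words.count("register") / n_statements,
        "goto_per_statement": words.count("goto") / n_statements,
        "kludge_words_per_line": sum(w in KLUDGE_WORDS for w in words) / n_lines,
    }


def inconsistency(count_a, count_b):
    """Minority share of two competing spellings of one style rule.

    0.0 means every occurrence uses the same spelling; 0.5 means the two
    spellings are used equally often.
    """
    total = count_a + count_b
    return min(count_a, count_b) / total if total else 0.0


if __name__ == "__main__":
    sample = "register int *a1;\nint c; /* XXX hack */\ngoto print;\n"
    print(simple_metrics(sample))

    # One style rule with two spellings: "if (" versus "if(".
    code = "if (a) x(); if(b) y(); if (c) z();"
    spaced = len(re.findall(r"\bif \(", code))
    unspaced = len(re.findall(r"\bif\(", code))
    print(inconsistency(spaced, unspaced))
```

Running the same function over each snapshot of a file and plotting the results over time would give curves of the kind the hypothesis slides describe, though with cruder counts than a tokenizing tool produces.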

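The stream on the Git Fast Import slide follows a regular pattern: one blob per file, then a commit that names author and committer with a timestamp, a byte-counted message, and one M line per file. The sketch below shows, in Python, how such a stream could be emitted for a single snapshot. It is an illustration only, not code from the unix-history-make project, and the function name `snapshot_stream` and its parameters are hypothetical.

```python
import sys


def snapshot_stream(files, branch, author, email, timestamp, tz, message):
    """Build a git fast-import stream that commits `files` to `branch`.

    `files` maps repository paths (str) to file contents (bytes).
    Every `data` command is followed by the exact payload length in bytes.
    """
    out = []
    marks = {}
    # One blob per file, each tagged with a mark so the commit can refer to it.
    for mark, (path, content) in enumerate(sorted(files.items()), start=1):
        marks[path] = mark
        out.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(content)))
        out.append(content + b"\n")  # optional LF after the raw data

    ident = f"{author} <{email}> {timestamp} {tz}".encode()
    msg = message.encode()
    out.append(b"commit refs/heads/" + branch.encode() + b"\n")
    out.append(b"mark :%d\n" % (len(files) + 1))
    out.append(b"author " + ident + b"\n")
    out.append(b"committer " + ident + b"\n")
    out.append(b"data %d\n" % len(msg))
    out.append(msg + b"\n")
    # Attach every blob to the commit as a regular (100644) file.
    for path, mark in marks.items():
        out.append(b"M 100644 :%d " % mark + path.encode() + b"\n")
    out.append(b"done\n")
    return b"".join(out)


if __name__ == "__main__":
    stream = snapshot_stream(
        {"usr/src/cmd/ex/ex_addr.c": b"/* ex */\n"},
        branch="BSD-3-Snapshot-Development",
        author="Bill Joy",
        email="wnj@ucbvax.Berkeley.EDU",
        timestamp=315830189,
        tz="-0800",
        message=(
            "BSD 3 development\n"
            "Work on file usr/src/cmd/ex/ex_addr.c\n"
            "(Synthetic commit)\n"
        ),
    )
    sys.stdout.buffer.write(stream)
```

Piping the output into `git fast-import` inside an empty repository should create the branch. Lengths are taken from the encoded bytes rather than from character counts because the number after `data` has to match the payload exactly; the three-line message above encodes to 75 bytes, the same count shown on the slide.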