
Artificial intelligence and open source


  1. 1. Artificial Intelligence and Open Source by Carlos Toxtli
  2. 2. Introduction
  3. 3. “Open source exists because of Artificial Intelligence. Artificial Intelligence exists because of Open Source”
  4. 4. The MIT researcher meets the Carnegie Mellon professor It's 1980. A 27-year-old artificial intelligence researcher from the Massachusetts Institute of Technology (MIT) is at Carnegie Mellon University's (CMU) computer science lab. It just so happens that a professor there knows something about the malfunctioning printer that's been giving him —and the whole MIT AI Lab— headaches for the last several months.
  5. 5. (parenthesis) MIT and CMU are the top 2 AI universities
  6. 6. A nice social hack His department, like those at many universities at the time, shared a PDP-10 computer and a single printer. One problem they encountered was that paper would regularly jam in the printer, causing a string of print jobs to pile up in a queue until someone fixed the jam. To get around this problem, the MIT staff came up with a nice social hack: They wrote code for the printer driver so that when it jammed, a message would be sent to everyone who was currently waiting for a print job: "The printer is jammed, please fix it." This way, it was never stuck for long.
  7. 7. The researcher asks the professor about the code
  8. 8. The researcher was Richard Stallman ...
  9. 9. Some context of Richard Stallman at that time ... Stallman enrolled as a graduate student at the Massachusetts Institute of Technology (MIT). He pursued a doctorate in physics for one year, but moved to the MIT AI Laboratory. As a research assistant at MIT under Gerry Sussman, Stallman published a paper (with Sussman) in 1977 on an AI truth maintenance system, called dependency-directed backtracking. This paper was an early work on the problem of intelligent backtracking in constraint satisfaction problems. The technique Stallman and Sussman introduced is still the most general and powerful form of intelligent backtracking.
  10. 10. He took it very personally
  11. 11. The professor signed an NDA The software Xerox provided didn't let him. It was written in binary, and no one at the lab could alter any of the commands. Sproull had signed a non-disclosure agreement (NDA) with Xerox. If he gave Stallman the source code, there was indeed potential harm: a lawsuit from a very large corporation. NDAs were fairly new at that time, but they were gaining popularity with major tech companies like Xerox and IBM. This was Stallman's first encounter with one. And, when he found out about it, he came to see it in apocalyptic terms.
  12. 12. So he decided to create an open operating system As a reaction to this, Stallman resolved that he would create a complete operating system that would not deprive users of the freedom to understand how it worked, and would allow them to make changes if they wished. It was the birth of the free software movement.
  13. 13. Such reaction was because of his lab ethic code To understand why Stallman viewed Sproull's NDA and refusal to hand over the source code as such a threat, you have to understand the ethic that MIT's AI Lab had embraced for its nearly 21 years of existence—an ethic Stallman held dear. That ethic was based on sharing. It was based on freedom. And it was based on the belief that individual contributions were just as important as the community in which they were made.
  14. 14. MIT's Tech Model Railroad Club (TMRC) The TMRC changed the way all of us interact with machines. Without it, our daily use of smartphones, laptops, and even self-driving cars might look completely different. We at TMRC use the term 'hacker' only in its original meaning: someone who applies ingenuity to create a clever result, called a 'hack'
  15. 15. LISP emerged at the MIT AI Lab MIT electrical engineering professor John McCarthy offered a course that charted a revolutionary path for computing at MIT. It was a language course. The language was LISP. And its inventor was the course's instructor, McCarthy. LISP was designed to create artificial intelligence— another term coined by McCarthy. McCarthy and fellow MIT professor Marvin Minsky believed machines designed to do simple calculations and told to carry out other rudimentary tasks were capable of much, much more. These machines could be taught to think for themselves. And, in doing so, they could be made intelligent.
  16. 16. The MIT AI lab was created before their computer science department Out of this belief and following the invention of LISP, McCarthy and Minsky created the AI Lab, even though the university wouldn't have a formal computer science department until 1975—when the electrical engineering department became the electrical engineering and computer science department.
  17. 17. The first AI was used for … In those early days of computing, when machines were gigantic and expensive (the IBM 704 was worth several million dollars), you needed to schedule time to access them. The Lab drew heavily from the TMRC hackers, who were becoming increasingly interested in the university's small collection of computers. And while many appreciated McCarthy's and Minsky's dream of teaching machines how to think, these hackers really wanted to do something much more basic with the machines. They wanted to play with them.
  18. 18. Games were an excellent motivation for contribution Two hackers (Peter Samson and Jack Dennis) discovered the answer when they programmed the Lab's TX-0 computer to play the music of Johann Sebastian Bach. With that feat accomplished, the hackers turned to other potential hacks. Could a machine provide other forms of entertainment? In 1962, three other hackers (Steve "Slug" Russell, Martin "Shag" Graetz, and Wayne Witaenem) developed one of the world's first video games on the Lab's PDP-1. It was called Spacewar!
  19. 19. Spacewar
  20. 20. And it also incited collaboration for hardware In terms of gameplay on the computer itself, hitting the switches on the PDP-1 quickly enough was difficult and cumbersome. And so, two fellow TMRC hackers—Alan Kotok and Bob Saunders—picked through the random parts and electronics in the club's tool room one day. They used those spare parts to fashion the first joysticks.
  21. 21. It was an example of the hacker ethic When all was said and done, more than 10 hackers had left their marks on Spacewar! Its gameplay was the result of successive hackers improving upon previous hackers' works—hacks on hacks on hacks. Spacewar! represented the best of the hacker ethic. It illustrated the type of innovation and problem-solving that open collaboration brought.
  22. 22. Machines making music and novel ideas For their parts, McCarthy and Minsky supported these student hackers—whether by developing a video game or teaching a machine to make music. It was these hackers' clever ways of attacking problems that would enable AI research to continue. As Minsky and McCarthy later reflected on this first class of researchers: "When those first students came to work with us, they had to bring along a special kind of courage and vision, because most authorities did not believe AI was possible at all."
  23. 23. Hackers expanded the conception of the computer capabilities Computers were marketed more as support tools than as engines of change. For industry leaders, hacking was less a contribution than a distraction. The hacker ethic yielded a unique form of ingenuity that pushed the boundaries of AI research and the power of computing forward.
  24. 24. The hacker ethic was replicated in other places In the meantime, as this first generation of hackers graduated from MIT, the hacker ethic passed to a new class. And, in the process, it reached beyond the campus of MIT and the borders of Massachusetts.
  25. 25. Hacker culture In 1962, John McCarthy took a position at Stanford University and started the Stanford Artificial Intelligence Lab (SAIL). With him came the hacker ethic, which a new generation soon adopted.
  26. 26. Hacker culture It's safe to say that without the Homebrew Computer Club, the personal computer and, eventually, the smartphone would not exist as we know them today.
  27. 27. Some known members Among the members of the Homebrew Computer Club were Steve "Woz" Wozniak and his friend Steve Jobs.
  28. 28. DARPA wanted weapons, not games But while the hacker ethic was spreading to new adherents, AI research was experiencing some tumultuous times. Funding from the Defense Advanced Research Projects Agency (DARPA), which had propelled major AI research projects during the 1960s, disappeared suddenly in 1969 with the passage of the Mansfield Amendment. The amendment stipulated that funding would no longer go to undirected research projects, which included the great majority of AI projects.
  29. 29. The interactions were not natural In 1974, things got even worse when DARPA pulled nearly $3 million from Robert Sproull's future home, Carnegie Mellon University's AI Lab. The Lab had been working on a speech recognition program, which DARPA hoped could be installed in planes for pilots to give direct commands. The only problem? Those commands had to be spoken in a particular order and cadence. Pilots, it turns out, have a hard time doing that in highly stressful combat situations.
  30. 30. So they moved to enterprise- related AI projects With government funds dwindling, researchers turned to the business world as their primary source of funding and marketing for AI projects. In the late 1970s, these enterprise-related AI projects centered on emulating expert decision-making, and they became known as "expert systems." Using if-then reasoning, these projects and resulting software became highly successful.
  31. 31. Proprietary code is a competitive advantage Software had boomed into a multimillion-dollar industry. The innovation achieved in the AI Labs and computer science departments at MIT, Stanford, Carnegie Mellon, and other universities had gone mainstream. And, with the help of new companies like Apple, computers were finally becoming vehicles of transformation. But, while the hacker ethic continued to thrive among hobbyists, proprietary products became the standard in this new burgeoning tech industry. Businesses relied on competitive advantage. If a company was going to be successful, it needed a product that no one else had. That reality conflicted with the hacker ethic, which prized sharing as one of its foundational principles.
  32. 32. You can make a high-end salary by proprietary code In the former camp, openness and transparency still thrived. But in the latter, those ideals were verboten. If you wanted to make a high-end salary working with computers, you needed to cross that line. And many did. NDAs became the contract these former hackers signed to seal their transformation.
  33. 33. The hacker ethic turns to dust as the A.I. winter approaches In the months after Richard Stallman's unfortunate meeting with Robert Sproull, most of the hackers at MIT's AI Lab left for a company, called Symbolics, started by the Lab's former administrative director, Russell Noftsker. Stallman, who refused to go the proprietary route, was one of the few who stayed behind at MIT.
  34. 34. A.I. businesses did not succeed Within a year, the billion-dollar business of specialized LISP hardware collapsed. A few years later, in the early 1990s, the expert systems market would follow suit. They eventually proved ineffective and too costly. Soon, artificial intelligence entered a long and seemingly endless winter.
  35. 35. It was a shame to work in AI AI was associated with systems that have all too often failed to live up to their promises. Even the term "artificial intelligence" fell out of fashion. In 2005, The New York Times reported that AI had become so stigmatized that "some computer scientists and software engineers avoided the term artificial intelligence for fear of being viewed as wild-eyed dreamers."
  36. 36. The hacker ethic reborn While artificial intelligence endured a long, slow decline, Richard Stallman— the self-proclaimed "last true hacker"—sought to resurrect the hacker ethic.
  37. 37. So it was the A.I. winter and Stallman had plans By late 1983, Stallman was ready to announce his project and recruit supporters and helpers on Usenet (a kind of pre-web Reddit). In September 1983, he announced the creation of the GNU project (GNU stands for GNU's Not Unix—a recursive acronym). He called on individual programmers to contribute to his project, especially those "for whom knowing they are helping humanity is as important as money." The goal was to develop a new operating system based on the principle of sharing.
  38. 38. So he started the free software movement This was the beginning of what would become the free software movement. For many, it was the reincarnation of the hacker ethic. It countered the proprietary model of development and emphasized the value of sharing. And it was solidified with the creation of the GNU General Public License (GPL), which was released on February 25, 1989.
  39. 39. He developed GCC In January 1984, he started working full-time on the project, first creating a compiler system (GCC) and various operating system utilities. Early in 1985, he published "The GNU Manifesto," which was a call to arms for programmers to join the effort, and launched the Free Software Foundation in order to accept donations to support the work.
  40. 40. Linux adopted his GPL v2 license With Linus Torvalds' creation of Linux® in 1991 (and then choosing to release it under version 2 of the GPL in 1992), the free software movement gained even greater attention and momentum outside the normal hacker channels.
  41. 41. Hacker Ethic 2.0 not as A.I. but as free software And around the same time that artificial intelligence entered its long winter, the hacker ethic 2.0—in the form of free software—surged in popularity.
  42. 42. Netscape shared its source code In 1998, Netscape made headlines when it released the source code for its proprietary Netscape Communicator Internet Suite. This move prompted a serious discussion among developers about how to apply the Free Software Foundation's ideals to the commercial software industry. Was it possible to develop software openly and transparently, but still make a profit?
  43. 43. Free software and open source discussions For Stallman, the initial motivation must always be the ethical idea of available and accessible software for all—i.e. free as in free speech (one of Stallman's favorite analogies). If you went on to make a profit, great. There was nothing wrong with that. In his view, though, this other branch's initial motivation was profits, while the ethical idea of accessibility was secondary.
  44. 44. So the open source term was coined In early February 1998, Christine Peterson gave this new branch its official name when she suggested "open source" as an alternative to free software following the Netscape release. Later that same month, Bruce Perens and Eric S. Raymond launched the Open Source Initiative (OSI). The OSI's founding conference adopted Peterson's suggested name to further differentiate it "from the philosophically and politically focused label 'free software.'"
  45. 45. The free software movement inspired open source. Yet, despite this divide, many who work in open source recognize Stallman as a founding father. It was the free software movement that would later inspire the emergence of open source.
  46. 46. A.I. winter was about to end and the open source movement was strong In the years that followed, open source would come to equal, and in some cases rival, proprietary development. This was especially true once AI's long winter came to an end.
  47. 47. But the A.I. needed a common code baseline Several programming languages were used to develop artificial intelligence algorithms, but replicating existing code required a common platform. Most scientists preferred C or Python.
  48. 48. Python In December 1989, Van Rossum had been looking for a "'hobby' programming project that would keep [him] occupied during the week around Christmas," as his office was closed, when he decided to write an interpreter for a "new scripting language [he] had been thinking about lately." He chose the name "Python" because he is a big fan of Monty Python.
  49. 49. Numpy The Python programming language was not initially designed for numerical computing, but attracted the attention of the scientific and engineering community early on, so that a special interest group called matrix-sig was founded in 1995 with the aim of defining an array computing package. Among its members was Python designer and maintainer Guido van Rossum, who implemented extensions to Python's syntax (in particular the indexing syntax) to make array computing easier. An implementation of a matrix package was completed by Jim Fulton, then generalized by Jim Hugunin to become Numeric, also variously called Numerical Python extensions or NumPy.
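As a hedged illustration (not code from the deck), the indexing syntax and elementwise array operations that grew out of that matrix-sig effort are what today's NumPy still provides:

```python
import numpy as np

# Build a 2-D array and use the extended indexing syntax
# (slices and boolean masks) that motivated the matrix-sig work.
a = np.arange(12).reshape(3, 4)   # 3x4 matrix of the values 0..11
col = a[:, 1]                     # second column: [1, 5, 9]
evens = a[a % 2 == 0]             # boolean-mask indexing selects even entries

# Vectorized arithmetic operates elementwise, with no Python loops.
doubled = a * 2
print(col.tolist())               # [1, 5, 9]
```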
  50. 50. Scientific open source SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. It is built on top of NumPy.
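A minimal sketch of what "leveraging scientific algorithms in Python" means in practice (illustrative only; `quad` and `minimize_scalar` are standard SciPy routines, the functions being integrated and minimized are arbitrary):

```python
from scipy import integrate, optimize

# Numerically integrate x^2 over [0, 1]; the exact answer is 1/3.
area, _ = integrate.quad(lambda x: x ** 2, 0, 1)

# Find the minimum of (x - 2)^2 + 1, which sits at x = 2.
res = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print(round(area, 6), round(res.x, 6))
```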
  51. 51. Machine Learning Modern AI tools and techniques were scattered rather than congregated in a single tool. Scikit-learn (initially a SciPy plugin) was first developed by David Cournapeau as a Google Summer of Code project in 2007, and it gathered the common machine learning tools into the same open source toolkit.
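The "common toolkit" idea shows up in scikit-learn's uniform fit/predict interface, which every estimator shares; a small illustrative sketch (the dataset and hyperparameters are chosen arbitrarily):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a classic toy dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The same fit/score interface covers every estimator in the toolkit,
# so swapping in a different classifier is a one-line change.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)
```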
  52. 52. Deep Learning Theano, created in 2007, is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures. Theano is an open source project primarily developed by the Montreal Institute for Learning Algorithms (MILA) at the Université de Montréal. Major development stopped in 2017 due to competing offerings from strong industrial players.
  53. 53. The rise of Corporate Open Source Google open sourced its internal AI framework, TensorFlow, in 2015. Facebook decided to support PyTorch in 2016. It is worth mentioning that the original Torch library dates back to 2002, but it did not gain popularity until the Python wrapper (PyTorch) was released and Caffe2 was merged into it in 2018.
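What these open sourced frameworks provide, at their core, is automatic differentiation over tensors; a tiny hedged PyTorch sketch (the function y is an arbitrary example):

```python
import torch

# A scalar tensor that tracks gradients — the core abstraction
# these frameworks open sourced.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x    # y = x^2 + 2x, so y = 15 at x = 3
y.backward()          # autograd computes dy/dx = 2x + 2 = 8

print(y.item(), x.grad.item())
```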
  54. 54. Open Neural Network Exchange (ONNX) In September 2017, Facebook and Microsoft introduced a system for switching between machine learning frameworks such as PyTorch and Caffe2. The goal is framework interoperability: allowing developers to move more easily between frameworks, some of which may be more desirable for specific phases of the development process, such as fast training, network architecture flexibility, or inferencing on mobile devices. In November 2019, ONNX was accepted as a graduated project in the Linux Foundation's LF AI.
  55. 55. That’s the history, so let’s recap The hacker ethic was crucial for the early A.I. advancements, and when the winter came, that ethic was adopted as the philosophy of free software. Once the A.I. winter ended, the open source movement was so strong that even the largest software companies opened their coding tools.
  56. 56. But … who actually writes that code? While there is diversity in the backgrounds of participants in most open source communities, A.I. projects are not that diverse. So, who are the main actors?
  57. 57. Ph.D.
  58. 58. What is a Ph.D.? A Doctor of Philosophy
  59. 59. Graduate studies After completing undergraduate studies (a bachelor's degree), you can pursue a graduate degree (a master's or doctorate). Richard Stallman was pursuing a graduate degree when he started the movement. Most creators of A.I. algorithms hold a Ph.D. degree.
  60. 60. They achieved that not because of the Ph.D. degree itself ● Because of the time available to experiment ● Because of the work in teams ● Because of the computational power ● Because they were not worried about employment and money.
  61. 61. The Ph.D. is a job Some people think that Ph.D. candidates are only students. Most Ph.D. students earn a salary for their work as Research Assistants or Teaching Assistants, or they hold fellowships or sponsorships. Some companies and governments run projects with universities, and graduate students are the people who execute them.
  62. 62. So are you saying that I need a Ph.D. to contribute? Not exactly. Contributing to A.I. frameworks that are collections of algorithms requires creating a state-of-the-art algorithm that merits inclusion. A Ph.D. student can spend months on a solution that outperforms existing algorithms, so it is more probable that a Ph.D. will author it. But when you create your own project, or contribute to an existing one by coding the building blocks or adapting research code, there are a lot of opportunities to contribute.
  63. 63. So, how can I start? In most A.I. repositories there are issues marked with "good first issue" or related labels. Try to find them and start contributing. GitHub search: is:open label:"good first issue"
  64. 64. There are some A.I. frameworks per company I am listing some of the most popular A.I. frameworks that are maintained by companies. They are a good opportunity to learn deeply how things work, and to get a job if you are a committed contributor.
  65. 65. Facebook https://ai.facebook.com/tools/ Pytorch Detectron StarSpace
  66. 66. Google https://ai.google/tools/#developers Tensorflow ML Kit AI Experiments
  67. 67. Uber https://www.uber.com/us/en/uberai/ Ludwig Pyro Plato
  68. 68. Microsoft https://www.microsoft.com/en-us/ai CNTK ONNX NNI
  69. 69. Some other Open Source and A.I. curiosities There are some other facts related to open source and A.I. that are interesting to know.
  70. 70. Is OpenAI really open? OpenAI is an independent research organization consisting of the for-profit corporation OpenAI LP and its parent organization, the non-profit OpenAI Inc. The corporation conducts research in the field of artificial intelligence (AI) with the stated aim to promote and develop friendly AI in such a way as to benefit humanity as a whole; it is considered a competitor to DeepMind. The organization was founded in San Francisco in late 2015 by Elon Musk, Sam Altman, and others, who pledged US$1 billion. OpenAI stated they would "freely collaborate" with other institutions and researchers by making its patents and research open to the public. In 2019, OpenAI transitioned to a "capped-profit" corporate structure.
  71. 71. Why is AI succeeding this time? There are 3 main factors ● Modern AI approaches are not rule-based anymore but data-driven. ● Now we have enough data and effective synthetic data generation techniques. ● We have the computation power thanks to GPUs and other high-performance hardware.
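The rule-based vs. data-driven distinction can be sketched in plain Python (a toy example with invented data and a made-up spam keyword, not a real spam filter):

```python
# Rule-based: a hand-written rule that never changes, no matter the data.
def rule_based_is_spam(msg):
    return "free money" in msg.lower()

# Data-driven: learn a score threshold from labeled examples instead.
def train_threshold(examples):
    """examples: list of (score, is_spam) pairs; pick the candidate
    threshold that best separates the two classes on the training data."""
    best_t, best_acc = 0.0, 0.0
    for t in [score for score, _ in examples]:
        acc = sum((score >= t) == label for score, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Invented training data: (spamminess score, label).
data = [(0.9, True), (0.8, True), (0.3, False), (0.1, False)]
threshold = train_threshold(data)
print(threshold)  # 0.8 separates the two classes perfectly
```

Feeding the learner different data yields a different threshold, with no code change — that is the shift the slide describes.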
  72. 72. And what about DeepMind DeepMind Technologies is a UK artificial intelligence company founded in September 2010, and acquired by Google in 2014. The company is based in London, with research centres in Canada, France, and the United States. In 2015, it became a wholly owned subsidiary of Alphabet Inc. DeepMind Technologies' goal is to "solve intelligence", which they are trying to achieve by combining "the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms". They are trying to formalize intelligence in order to not only implement it into machines, but also understand the human brain
  73. 73. Let's explain what a GPU is A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Their highly parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. With the emergence of deep learning, the importance of GPUs has increased. It was found that while training deep learning neural networks, GPUs can be 250 times faster than CPUs. The explosive growth of Deep Learning in recent years has been attributed to the emergence of general purpose GPUs.
  74. 74. CPU vs GPU
  75. 75. And to explain what CUDA is CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach termed GPGPU (General-Purpose computing on Graphics Processing Units). CUDA is used in the backend of all the major deep learning frameworks.
  76. 76. But CUDA is not open source ... So why does every framework use CUDA if it is not open source? Why would they want to support NVIDIA's sales? The answer is simple: NVIDIA's optimized hardware was the first to show stable and fast results during intensive processing. But there is some good news: there is an open source alternative, and NVIDIA is starting to open part of its hardware documentation: https://github.com/nvidia/open-gpu-doc
  77. 77. OpenCL is the open source alternative OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. It is gradually being adopted for some parts of existing frameworks. https://www.khronos.org/opencl/
  78. 78. But a TPU is a more optimized hardware A tensor processing unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google specifically for neural network machine learning. TPUs are deployed in pods of up to 1,024 chips. But only Google has them in its data centers.
  79. 79. So how can I run A.I. without a GPU? There are not many free options, but Google has an easy-to-use notebook-based platform that runs GPUs and TPUs in the background for free. Its name is Google Colaboratory.
  80. 80. Google Colaboratory Colaboratory (also known as Colab) is a free Jupyter notebook environment that runs in the cloud and stores its notebooks on Google Drive. Colab was originally an internal Google project; an attempt was made to open source all the code and work more directly upstream, leading to the development of the "Open in Colab" Google Chrome extension, but this eventually ended, and Colab development continued internally. It provides around 300 GB of disk space and 12 GB of GPU memory on an NVIDIA Tesla K80 instance.
  81. 81. Some useful resources to learn A.I. from code Papers with code https://paperswithcode.com/ A.I. Notebooks for colab http://bit.ly/awesome-ai
  82. 82. Conclusions The A.I. scene and the free software movement are intimately related. In A.I. frameworks, Ph.D.s often write the algorithms, while enthusiast committers build the application structure or migrate existing code to the framework format. Getting started in the A.I. scene is easier by tackling introductory issues in the repositories. Learning and testing are easier, and free, when using free cloud services such as Google Colaboratory.
  83. 83. THANKS Carlos Toxtli http://www.carlostoxtli.com @ctoxtli