Elevate Your Enterprise Python and R AI, ML Software Strategy with Anaconda Team Edition

  1. Elevating your Python & R, AI & ML Strategy: Anaconda Team Edition, Constructor, Conda-Pack, Docker. December 2, 2020. Confidential and Proprietary. © 2020 Anaconda
  2. Agenda
     ● The conda package & environment management system
     ● Anaconda Team Edition: an enterprise AI / ML package repository
       ○ The importance of curated CVEs for security
     ● Managing conda environments for collaboration & distribution
     ● Tools for distributing conda environments:
       ○ Constructor & Conda-Pack
     ● Docker, conda, & OpenShift: a brief look
  3. A brief look at Conda
  4. The conda package format
     ● Cross-platform
       ○ Windows, Mac, Linux (x86/ARM/PPC), z/OS
     ● Cross-language
       ○ Python, R, Rust, Go…
     ● Rich metadata
       ○ License information facilitates governance
       ○ Dependency graph ensures package compatibility and environment correctness
     ● Binary package format
       ○ Packages are precompiled
       ○ Dynamic linking reduces vendoring (improving governance) and ensures cross-compatibility
  5. Anaconda Team Edition
  6. Team Edition - CVE Curation Process
     ● CVE Data Source: the National Institute of Standards and Technology (NIST) National Vulnerability Database (NVD)
     ● Automated Matching: associating NVD CVE data with packages in the Anaconda Repository
     ● Human Curation: Anaconda staff review NVD CVE data for accuracy and then categorize, refine, and improve the reported information. In some cases, CVEs are patched
     ● Refined CVE Metadata: accurate CVE metadata allows organizations to filter out OSS packages that don’t meet their security requirements
     Goal: high-quality, accurate, and dependable CVE information
     Performed by Anaconda Distribution Team
  7. Team Edition - CVE Curation Example
  8. Environment Management: Distributing packages ready to use
  9. Why standardize environments?
     ● Reduces support burden
       ○ Encourages use of mature, stable, widely adopted packages
     ● Encourages collaboration
       ○ Everyone is using a common package set
       ○ Common tool choices make it easier for people to help each other
     ● Ensures portability and deployability
       ○ Cross-platform (Windows -> Linux) more feasible
       ○ Potentially tighter requirements for production than development
     ● All that said: allowing customization is important
  10. Environment types
      ● Standard Python 3.x
      ● Python 2.x (for legacy applications only?)
      ● R (with/without Python?)
      ● GPU-enabled
  11. Environment destinations
      ● Desktops
      ● Edge nodes
      ● JupyterHub
      ● Anaconda Enterprise
      ● Third-party platforms (Domino, Cloudera DSW)
      ● Clusters: Spark, Dask, HPC/GPU
      ● Containerized applications: Docker, Kubernetes, OpenShift
      ● ???
  12. Specification rendering
      ● Layer 1: Global environment specifications
        ○ Minimal version pinning (e.g., Python / Scikit-Learn / NumPy / Pandas / Tensorflow / CUDA)
        ○ Objective: general major-version decision-making
        ○ No platform-specific exceptions, except for package omissions on certain platforms
      ● Layer 2: Precise pinning of top-level specifications
        ○ Find a common set of precise version pins for Layer 1 specs only
        ○ Objective: cross-platform compatibility
        ○ Platform-specific exceptions included as needed
      ● Layer 3: Fully realized specifications
        ○ All packages & dependencies identified and versioned down to the build
  13. Specification rendering: Example
      Minimal pinning
        python 3.6
        scikit-learn 0.22
        numpy 1.18
        pandas 1.0
        tensorflow >1.10,<2
        notebook
      Full top-level pinning
        python 3.6.10
        scikit-learn 0.22.1
        numpy 1.18.1
        pandas 1.0.3
        tensorflow 1.13.2
        notebook 6.0.3
      Final specification ready for rendering
        absl-py-0.9.0-py36_0
        appnope-0.1.0-py36hf537a9a_0
        astor-0.8.0-py36_0
        attrs-19.3.0-py_0
        backcall-0.1.0-py36_0
        blas-1.0-mkl
        bleach-3.1.4-py_0
        c-ares-1.15.0-h1de35cc_1001
        ca-certificates-2020.1.1-0
        certifi-2020.4.5.1-py36_0
        dbus-1.13.14-h517e14e_0
        decorator-4.4.2-py_0
        defusedxml-0.6.0-py_0
        entrypoints-0.3-py36_0
        expat-2.2.6-h0a44026_0
        gast-0.3.3-py_0
        gettext-
        glib-2.63.1-h70d4741_1
        grpcio-1.16.1-py36h044775b_1
        h5py-2.10.0-py36h3134771_0
        hdf5-1.10.4-hfa1e0ec_0
        icu-58.2-h0a44026_3
        importlib-metadata-1.6.0-py36_0
        importlib_metadata-1.6.0-0
        intel-openmp-2019.4-233
        ipykernel-5.1.4-py36h39e3cac_0
        ipython-7.13.0-py36h5ca1d4c_0
        ipython_genutils-0.2.0-py36_0
        ipywidgets-7.5.1-py_0
        jedi-0.17.0-py36_0
        jinja2-2.11.2-py_0
        joblib-0.15.1-py_0
        jpeg-9b-he5867d9_2
        jsonschema-3.2.0-py36_0
        jupyter-1.0.0-py36_7
        jupyter_client-6.1.3-py_0
        jupyter_console-6.1.0-py_0
        jupyter_core-4.6.3-py36_0
        keras-applications-1.0.8-py_0
        keras-preprocessing-1.1.0-py_1
        libcxx-10.0.0-1
        libedit-3.1.20181209-hb402a30_0
        libffi-3.3-h0a44026_1
        libgfortran-3.0.1-h93005f0_2
        libiconv-1.16-h1de35cc_0
        libpng-1.6.37-ha441bb4_0
        libprotobuf-3.12.3-hab81aa3_0
        libsodium-1.0.16-h3efe00b_0
        llvm-openmp-10.0.0-h28b9765_0
        markdown-3.1.1-py36_0
        markupsafe-1.1.1-py36h1de35cc_0
        mistune-0.8.4-py36h1de35cc_0
        mkl-2019.4-233
        mkl-service-2.3.0-py36hfbe908c_0
        mkl_fft-1.0.15-py36h5e564d8_0
        mkl_random-1.1.1-py36h959d312_0
        nbconvert-5.6.1-py36_0
        nbformat-5.0.6-py_0
        ncurses-6.2-h0a44026_1
        notebook-6.0.3-py36_0
        numpy-1.18.1-py36h7241aed_0
        numpy-base-1.18.1-py36h3304bdc_1
        openssl-1.1.1g-h1de35cc_0
        pandas-1.0.3-py36h6c726b0_0
        pandoc-
        pandocfilters-1.4.2-py36_1
        parso-0.7.0-py_0
        pcre-8.43-h0a44026_0
        pexpect-4.8.0-py36_0
        pickleshare-0.7.5-py36_0
        pip-20.0.2-py36_3
        prometheus_client-0.7.1-py_0
        prompt-toolkit-3.0.5-py_0
        prompt_toolkit-3.0.5-0
        protobuf-3.12.3-py36hb1e8313_0
        ptyprocess-0.6.0-py36_0
        pygments-2.6.1-py_0
        pyqt-5.9.2-py36h655552a_2
        pyrsistent-0.16.0-py36h1de35cc_0
        python-3.6.10-hf48f09d_2
        python-dateutil-2.8.1-py_0
        pytz-2020.1-py_0
        pyzmq-18.1.1-py36h0a44026_0
        qt-5.9.7-h468cd18_1
        qtconsole-4.7.4-py_0
        qtpy-1.9.0-py_0
        readline-8.0-h1de35cc_0
        scikit-learn-0.22.1-py36h27c97d8_0
        scipy-1.4.1-py36h9fa6033_0
        send2trash-1.5.0-py36_0
        setuptools-47.1.1-py36_0
        sip-4.19.8-py36h0a44026_0
        six-1.15.0-py_0
        sqlite-3.31.1-h5c1f38d_1
        tensorboard-1.13.1-py36_0
        tensorflow-1.13.2-hfddd6c2_0
        tensorflow-base-1.13.2-py36_0
        tensorflow-estimator-1.13.0-py36h24bf2e0_0
        termcolor-1.1.0-py36_1
        terminado-0.8.3-py36_0
        testpath-0.4.4-py_0
        tk-8.6.8-ha441bb4_0
        tornado-6.0.4-py36h1de35cc_1
        traitlets-4.3.3-py36_0
        wcwidth-0.1.9-py_0
        webencodings-0.5.1-py36_1
        werkzeug-1.0.1-py_0
        wheel-0.34.2-py36_0
        widgetsnbextension-3.5.1-py36_0
        xz-5.2.5-h1de35cc_0
        zeromq-4.3.1-h0a44026_3
        zipp-3.1.0-py_0
        zlib-1.2.11-h1de35cc_3
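The relationship between the minimal pins and the full top-level pins in this example can be checked mechanically. The Python sketch below tests whether each fully pinned version falls inside its minimal pin. It assumes a bare version such as `3.6` means "any version that starts with 3.6" and it handles only numeric dotted versions; conda's own matching rules are richer. The function names are hypothetical and exist only for this illustration.

```python
import re


def parse_version(text):
    """Turn '1.13.2' into (1, 13, 2); only numeric dotted versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(constraint, version):
    """Return True if an exact version falls inside a minimal pin."""
    if not constraint:
        return True  # unpinned package (e.g. notebook): any version is fine
    actual = parse_version(version)
    for clause in constraint.split(","):
        match = re.fullmatch(r"(>=|<=|==|>|<)?([\d.]+)", clause.strip())
        if match is None:
            raise ValueError(f"unsupported constraint: {clause!r}")
        op, bound_text = match.groups()
        bound = parse_version(bound_text)
        if op is None:
            ok = actual[:len(bound)] == bound  # assumed prefix match
        elif op == "==":
            ok = actual == bound
        elif op == ">":
            ok = actual > bound
        elif op == ">=":
            ok = actual >= bound
        elif op == "<":
            ok = actual < bound
        else:  # "<="
            ok = actual <= bound
        if not ok:
            return False
    return True


# Values copied from the slide above.
minimal = {"python": "3.6", "scikit-learn": "0.22", "numpy": "1.18",
           "pandas": "1.0", "tensorflow": ">1.10,<2", "notebook": ""}
full = {"python": "3.6.10", "scikit-learn": "0.22.1", "numpy": "1.18.1",
        "pandas": "1.0.3", "tensorflow": "1.13.2", "notebook": "6.0.3"}

# Names whose full pin falls outside the minimal pin.
mismatches = [name for name, pin in minimal.items()
              if not satisfies(pin, full[name])]
print(mismatches)
```

A check like this could run in the same automation that renders the environments, so that a Layer 2 pin never drifts outside the Layer 1 decision it is meant to refine.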
  14. Environment rendering
      The primary output of the specification rendering process is an environment specification file: a simple text file of precise conda package specifications
      Additional rendering outputs:
      ● Conda metapackages
      ● Constructor installers
      ● conda-pack redistributables
      ● Hadoop parcels
      ● Docker images
      Leverage automation processes (e.g., Jenkins) for automatic building
  15. Versioning and archiving
      ● All rendered environments are versioned and archived
        ○ Ensures reproducibility of legacy models
      ● New projects should, by default, be pointed to the “latest” environment
        ○ Symbolic link: “environment-latest” -> “environment-2020.06.01”
      ● It should be easy to connect to recent historical environments
        ○ Ensures stability of longer-term model development
        ○ Exceptions made for environments pulled for vulnerability remediation
      ● For production, move off of “latest” to the corresponding version
        ○ Or to a fully specified custom environment
      ● A regular update cadence gives users predictability
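One way the “latest” symbolic link could be maintained is sketched below in Python. It assumes a POSIX filesystem and that all versioned environments live side by side in one directory; the function names and the `.new` staging suffix are hypothetical. The second helper shows the production step from the slide: resolving “latest” to the concrete version it currently names.

```python
import os
from pathlib import Path


def point_latest(root, version_dir, link_name="environment-latest"):
    """Make <root>/<link_name> a symlink to the sibling directory version_dir."""
    root = Path(root)
    if not (root / version_dir).is_dir():
        raise FileNotFoundError(f"no archived environment named {version_dir}")
    link = root / link_name
    staging = root / (link_name + ".new")
    if staging.is_symlink():
        staging.unlink()  # clear a leftover from an interrupted run
    os.symlink(version_dir, staging)  # relative target, so the tree can be moved
    os.replace(staging, link)  # rename over the old link in a single step
    return link


def resolve_for_production(root, link_name="environment-latest"):
    """Return the versioned directory name that 'latest' currently points to."""
    return os.readlink(Path(root) / link_name)
```

Building the new link under a staging name and renaming it over the old one means users of “latest” never see a moment where the link is missing.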
  16. Distributing Conda Environments: Constructor, Conda-Pack
  17. Motivation
      The conda package manager is GREAT at creating highly customized Python & R environments
      ● Dependency metadata ensures I get all the packages I need, even the ones I didn’t know I needed
      ● I can pin the package versions I care about as precisely as I want, and let conda give me the latest compatible versions of everything else
      Every data scientist can tailor their environment to precisely serve their needs
  18. Motivation
      But now that I have the perfect environment, how do I…
      ● Share it with my team members or even my entire department
      ● Distribute it to 100 different machines (e.g., Spark / Dask)
      ● Safely archive it so I can recover it, months or years later
      What tools are at my disposal?
  19. Environment specs?
        conda list --export > environment.txt
        conda create -n env_name --file environment.txt
        conda env export > env.yml
        conda env create -f env.yml
      Commands like these often do the job, until they don’t…
      ● What if the target machines don’t have conda installed?
      ● What if the recipient is a conda novice?
      ● What if the packages are no longer available in my repository?
      ● What if I’m building environments for air-gapped computers?
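To make concrete what such a spec file holds, here is a small Python sketch that reads the text written by `conda list --export`. It assumes each package line has the form `name=version=build` and that the header carries a `# platform:` comment; the sample text is illustrative, with package builds borrowed from the rendering example earlier in the deck and the platform value assumed. `parse_export` is a hypothetical helper, not part of conda.

```python
def parse_export(text):
    """Split `conda list --export` text into a platform and (name, version, build) rows."""
    platform = None
    packages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("# platform:"):
                platform = line.split(":", 1)[1].strip()
            continue  # other comment lines carry no package data
        name, version, build = line.split("=")
        packages.append((name, version, build))
    return platform, packages


# Illustrative sample; the platform line is an assumption for this sketch.
sample = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
numpy=1.18.1=py36h7241aed_0
pandas=1.0.3=py36h6c726b0_0
python=3.6.10=hf48f09d_2
"""

platform, packages = parse_export(sample)
print(platform, len(packages))
```

Because every row names an exact build for one platform, the file is only useful where those same builds can still be downloaded, which is where the questions on the slide come from.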
  20. Addressing the challenge: two tools
      Constructor and Conda-Pack are two freely available, open source tools that bundle conda environments into sharable, transportable, archivable assets
      ● Constructor: user-friendly, custom installers similar to the Anaconda and Miniconda installers distributed by Anaconda
      ● Conda-Pack: simple, compact archives of environments for lower-level or automated approaches to distribution and management
  21. Constructor
  22. Conda Constructor
      ● Builds custom installers for a valid conda environment
      ● Gives users the same installer experience they are already accustomed to
        ○ GUI installers for Mac and Windows
        ○ Shell installers for Linux and Mac
      ● Provides lots of customization options
        ○ Custom package sets
        ○ Conda configuration settings
        ○ Logos and branding
  23. How it works
      ● Reads the construct.yaml file
      ● Calls conda in “dry run” mode to fully resolve environment specifications
      ● Downloads the necessary conda packages
      ● Combines the conda packages and install scripting into an OS-dependent executable installer format
        ○ Linux/Mac: POSIX-shell based installer
        ○ Windows: NSIS-based GUI installer
        ○ Mac: macOS standard GUI installer
  24. The results
  25. Advanced options
      ● condarc: preinstall a custom configuration for conda upon install
        ○ channel_alias: point conda to an internal repository
        ○ default_channels: what does “defaults” mean?
        ○ ssl_verify: silence those pesky SSL issues
      ● channels_remap: maps build channels to install channels
        ○ Build an installer against one repository (e.g., Anaconda) for use in an internal environment connected to another repository (e.g., Team Edition, Artifactory, Nexus)
        ○ The package metadata is modified so that the installed packages appear to have come from the internal repository originally
      ● exclude: exclude certain packages from installation
        ○ readline: if you want to build a GPL-free installer
        ○ anaconda: removing the metapackage makes for more flexible customization by the user
  26. The construct.yaml file
        name: Sortaconda
        version: 2020.10.01
        installer_type: all
        channels_remap:
          - src:
            dst:
          - src:          # [win]
            dst:          # [win]
        specs:
          - python=3.6.10
          - conda=4.8.4
          - anaconda=2020.02
        exclude:
          - anaconda
        condarc:
          channel_alias: po
  27. Conda-Pack
  28. Conda-Pack
      ● Creates a simple, standard-format archive (.tar.gz, .zip, etc.)
      ● ...but… conda environments are not relocatable
        ○ Moving a conda environment to a different directory can render it inoperable
        ○ Conda employs special relocation logic that “adjusts” packages as they are installed so they operate correctly at the install location
      ● conda-pack solves this in one of two ways
        ○ Normal mode: it includes a script called conda-unpack that can be run after unpacking to reproduce conda’s relocation logic
        ○ Fixed destination mode: the archive is “pre-adjusted” for the destination. It must be installed in that directory, but no post-install script is required
      ● New development: Cloudera parcel support
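The kind of “adjustment” that relocation involves can be illustrated with a simplified Python sketch. It is not conda's or conda-pack's actual code: it only shows the general idea of rewriting an embedded install prefix, assuming text files can be edited freely while binary files must keep their exact length. The function names and paths are hypothetical.

```python
import re


def relocate_text(data: bytes, old: bytes, new: bytes) -> bytes:
    """Text files (scripts, configs): a plain substitution is enough."""
    return data.replace(old, new)


def relocate_binary(data: bytes, old: bytes, new: bytes) -> bytes:
    """Binary files: rewrite null-terminated strings without changing file length."""
    if len(new) > len(old):
        raise ValueError("new prefix must not be longer than the old one")
    # Match the old prefix plus the rest of the C string, up to its terminator.
    pattern = re.compile(re.escape(old) + rb"([^\0]*)\0")

    def fix(match):
        body = new + match.group(1)
        # Pad with nulls so every later byte stays at its original offset.
        return body + b"\0" * (len(match.group(0)) - len(body))

    return pattern.sub(fix, data)


script = b"#!/old/prefix/bin/python\nprint('hi')\n"
blob = b"\x7fELF" + b"/old/prefix/lib/libz.so\0" + b"tail"

new_script = relocate_text(script, b"/old/prefix", b"/new")
new_blob = relocate_binary(blob, b"/old/prefix", b"/new")
```

The length restriction in the binary case hints at why a moved environment cannot simply be patched with a search and replace, and why a dedicated step such as conda-unpack is needed.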
  29. Calling convention
      ● As simple as: conda-pack -n <env>
        ○ Produces a file <env>.tar.gz
      ● Common options:
          conda-pack -n <env>          # or -p <path>
            --format <format>          # tar, zip, tar.gz, …
            --output <filename>        # <env>.<format>
            --dest-prefix <path>       # for known destinations
            --exclude <pattern>        # excludes files
            --include <pattern>        # re-include files
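For automated builds, the options above could be assembled programmatically. The Python sketch below builds an argument list suitable for `subprocess.run`, using only the flags shown on the slide; it does not invoke conda-pack itself. The helper name `conda_pack_command` and the environment name `myenv` are hypothetical.

```python
import shlex


def conda_pack_command(env=None, prefix=None, fmt=None, output=None,
                       dest_prefix=None, exclude=(), include=()):
    """Build a conda-pack argument list from the options shown on the slide."""
    if (env is None) == (prefix is None):
        raise ValueError("give exactly one of env (-n) or prefix (-p)")
    cmd = ["conda-pack"]
    cmd += ["-n", env] if env is not None else ["-p", prefix]
    if fmt:
        cmd += ["--format", fmt]
    if output:
        cmd += ["--output", output]
    if dest_prefix:
        cmd += ["--dest-prefix", dest_prefix]  # fixed destination mode
    for pattern in exclude:
        cmd += ["--exclude", pattern]
    for pattern in include:
        cmd += ["--include", pattern]
    return cmd


cmd = conda_pack_command(env="myenv", fmt="tar.gz", output="myenv.tar.gz")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, check=True)` would run the command on a machine where conda-pack is installed.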
  30. Parcel support
      ● File format nearly identical to standard conda-pack
      ● As simple as: conda-pack -n <env> --format parcel
      ● All parcel-specific options:
          conda-pack -n <environment> --format parcel
            --parcel-name <name>          # env name
            --parcel-version <version>    # YYYY.MM.DD
            --parcel-distro <distro>      # el7
            --parcel-root <root>          # /opt/cloudera/parcels
      ● In pre-release: conda install ctools/label/dev::conda-pack
  31. Packing examples
  32. Unpacking a conda-pack archive
      ● Deliver archive to destination machine
      ● Create the target directory
      ● Unpack the archive into the destination directory
      ● Activate: source <path>/bin/activate
      With a standard conda-pack archive (no --dest-prefix):
      ● Finish relocation: <path>/bin/conda-unpack
      That’s it! Environment is now fully usable in its new location
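The create-and-unpack steps above can be scripted. The following Python sketch uses the standard `tarfile` module and assumes a trusted tar-based archive on a POSIX layout, where the relocation script sits at `bin/conda-unpack`. `unpack_env` is a hypothetical helper; it returns the path of the script so the caller can decide when to run it.

```python
import tarfile
from pathlib import Path


def unpack_env(archive, target):
    """Create the target directory, extract the archive, and locate conda-unpack."""
    target = Path(target)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        # Newer Python versions accept an extraction filter; the archive is
        # assumed to be trusted, so behave like a plain `tar -x`.
        if hasattr(tarfile, "fully_trusted_filter"):
            tar.extractall(target, filter="fully_trusted")
        else:
            tar.extractall(target)
    script = target / "bin" / "conda-unpack"
    # No script found: nothing further to run from here.
    return script if script.exists() else None
```

When a path is returned, running it (for example with `subprocess.run([str(script)], check=True)`) corresponds to the “finish relocation” step, after which the environment can be activated from its new location.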
  33. Unpacking example
  34. Conda, Docker, & OpenShift: A brief look
  35. Motivation: Building Docker containers for OpenShift
      ● Ensuring the application runs as any UID
        ○ Group writable (GID 0)
        ○ Use nss_wrapper for username setting if necessary
      ● Deliver / build the conda environment within the container
      ● Bake activation into the entrypoint
  36. Wrapping up
  37. Next steps
      ● Start getting to know conda if you haven’t already
      ● Check out constructor & conda-pack
      ● Grab my simple Docker demo
      ● Connect with me on LinkedIn!
      Thank you!
  38. Q & A
  39. Thank You! Michael Grant
  40. About Anaconda
      With more than 20 million users, Anaconda is the world’s most popular data science platform and the foundation of modern machine learning. We pioneered the use of Python for data science, champion its vibrant community, and continue to steward open-source projects that make tomorrow’s innovations possible. Our enterprise-grade solutions enable corporate, research, and academic institutions around the world to harness the power of open-source for competitive advantage, groundbreaking research, and a better world. Visit to learn more.