1. From Coders to Builders of the Intelligent World
Software engineering for AI: Invariants and unsolved problems
Jez Humble
CTO, DevOps Research and Assessment LLC (DORA)
2. Abstract
Contents
(1) A very short history of DevOps
(2) A brief introduction to the AI delivery lifecycle
(3) What stays the same
(4) What changes
(5) Goals and unsolved problems
4. DevOps movement
A cross-functional community of practice dedicated to the study of building, evolving and operating rapidly changing, secure, resilient systems at scale
5. The three ways
• The first way: System thinking
• The second way: Amplify feedback loops
• The third way: Culture of continual experimentation and learning
[Diagram labels, repeated for each way: (Business), Dev, Ops, (Customer)]
6. First way: Metrics for software delivery performance
• Lead time for changes (check in to release)
• Deploy frequency
• Time to restore service
• Change fail rate
http://bit.ly/2018-devops-report
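The four metrics above can be computed from a log of deployments. The following is a minimal sketch, not the method used in the report: the `Deployment` record shape, its field names, and the choice of medians are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    checked_in: datetime                  # when the change was checked in
    released: datetime                    # when the change reached production
    failed: bool = False                  # did the change degrade service?
    restored: Optional[datetime] = None   # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: List[Deployment], period_days: int) -> dict:
    """Compute the four delivery metrics over a reporting period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [hours(d.released - d.checked_in) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [hours(d.restored - d.released) for d in failures if d.restored]
    return {
        "median_lead_time_hours": median(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
        "change_fail_rate": len(failures) / len(deploys),
    }


# Example: four deployments over four days, one of which failed.
t0 = datetime(2018, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deploys = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, failed=True,
               restored=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour),
]
metrics = delivery_metrics(deploys, period_days=4)
```

Medians are used here so that a single unusually slow change does not dominate the result; a mean or percentile would be an equally reasonable choice.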
7. Feedback loops
“so much of creation is discovery, and you can’t discover anything if you can’t see what you’re doing.”
Bret Victor, Inventing on Principle, http://vimeo.com/36579366
8. Feedback loops
[Diagram: deployment pipeline. Stages: Delivery team → Version control → Commit stage → Automated acceptance test → Manual validations → Release. Arrow labels: Check in, Trigger, Approval, Feedback.]
(What value did we create?)
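The pipeline in the diagram can be sketched as an ordered list of stages, where each passing stage triggers the next and every result is reported back to the delivery team. This is an illustrative sketch only: `run_pipeline`, the stage checks, and the build labels are hypothetical, and the manual stage is stood in for by a function that always approves.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that runs against a build identifier.
Stage = Tuple[str, Callable[[str], bool]]


def run_pipeline(build: str, stages: List[Stage]) -> List[str]:
    """Run stages in order and return the feedback sent to the delivery team.

    The first failing stage stops the pipeline, so later stages are
    never triggered for a build that is already known to be bad.
    """
    feedback = []
    for name, check in stages:
        passed = check(build)
        feedback.append(f"{name}: {'passed' if passed else 'FAILED'}")
        if not passed:
            break
    else:
        # Only reached when no stage failed.
        feedback.append("release: ready")
    return feedback


# Hypothetical checks; real ones would run compilers, test suites, and approvals.
stages = [
    ("commit stage", lambda build: "broken-unit-test" not in build),
    ("automated acceptance test", lambda build: "broken-acceptance" not in build),
    ("manual validation", lambda build: True),  # stands in for a human approval
]

good = run_pipeline("rev-42", stages)
bad = run_pipeline("rev-43 broken-acceptance", stages)
```

Stopping at the first failure is what keeps the feedback loop short: the team hears about a failed commit stage without waiting for the slower stages behind it.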
9. Machine learning development lifecycle
[Diagram boxes: Data ETL (extract, transform, load), Select algorithm, Train model, Validate model, Data ETL, Production]
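The lifecycle steps can be sketched end to end in a few functions. This is a toy illustration under stated assumptions: the algorithm-selection step is fixed to a one-feature threshold classifier, the input is assumed to be `value,label` text rows, and the function names and the 0.9 accuracy bar are hypothetical.

```python
from typing import List, Optional, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)


def etl(raw_rows: List[str]) -> List[Example]:
    """Extract, transform, load: parse 'value,label' rows, dropping malformed ones."""
    examples = []
    for row in raw_rows:
        try:
            value, label = row.split(",")
            examples.append((float(value), int(label)))
        except ValueError:
            continue  # skip rows that do not parse
    return examples


def accuracy(threshold: float, examples: List[Example]) -> float:
    """Fraction of examples where (value >= threshold) matches the label."""
    hits = sum(int(value >= threshold) == label for value, label in examples)
    return hits / len(examples)


def train(examples: List[Example]) -> float:
    """Train: pick the observed value that works best as a decision threshold."""
    candidates = sorted(value for value, _ in examples)
    return max(candidates, key=lambda t: accuracy(t, examples))


def lifecycle(raw_train: List[str], raw_validation: List[str],
              min_accuracy: float = 0.9) -> Optional[float]:
    """Return the threshold to ship to production, or None if validation fails."""
    threshold = train(etl(raw_train))
    if accuracy(threshold, etl(raw_validation)) >= min_accuracy:
        return threshold
    return None


model = lifecycle(
    raw_train=["1.0,0", "2.0,0", "3.0,1", "4.0,1", "oops"],
    raw_validation=["0.5,0", "3.5,1", "2.5,0"],
)
```

The validation step acts as a gate: a model that does not meet the bar on held-out data never reaches production.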
10. AIDLC vs SDLC
Comparison of the traditional software development lifecycle (SDLC) and the AI software development lifecycle (AIDLC)
• Materials (input)
  SDLC: Code
  AIDLC: Algorithms, scenarios, data
• Production (output)
  SDLC: Package
  AIDLC: Model
• Configuration management
  SDLC: Code versioning, infrastructure-as-code, API versioning
  AIDLC: Data & data dictionary management, model & platform versioning, API versioning
• Continuous integration
  SDLC: Constantly validating behavior of code against tests
  AIDLC: Constantly validating model against scenarios and data
• Continuous delivery
  SDLC: Always ready to deploy to production and smart devices (iOS/Android)
  AIDLC: Always ready to deploy to production and smart devices (edge)
• Observability / care and feeding
  SDLC: Instrument code, monitoring & alerting infrastructure
  AIDLC: Collect model accuracy, continuous model training (data feeds back into training)
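The AIDLC observability row (collect model accuracy, feed data back into training) can be sketched as a small monitor. This is a hedged illustration rather than a prescribed design: the `AccuracyMonitor` class, the sliding-window approach, and the default thresholds are all assumptions.

```python
from collections import deque
from typing import Deque, List, Tuple


class AccuracyMonitor:
    """Track accuracy over recent predictions and keep labelled data for retraining."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9):
        self.outcomes: Deque[bool] = deque(maxlen=window)  # oldest results drop off
        self.min_accuracy = min_accuracy
        self.training_feedback: List[Tuple[object, object]] = []

    def record(self, features, predicted, actual) -> None:
        """Store whether a prediction was right, plus the example for retraining."""
        self.outcomes.append(predicted == actual)
        self.training_feedback.append((features, actual))

    def accuracy(self) -> float:
        """Accuracy over the current window (1.0 before any data arrives)."""
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self) -> bool:
        """Signal retraining only once the window is full, to avoid noisy alerts."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
for features, predicted, actual in [(1, "a", "a"), (2, "b", "b"), (3, "a", "a"),
                                    (4, "a", "b"), (5, "b", "a")]:
    monitor.record(features, predicted, actual)
```

When `needs_retraining()` returns true, the examples accumulated in `training_feedback` are the data that feeds back into the next training run.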
11. What’s the same?
• Toolchain for deployment pipeline
• Platform for testing, training, and production deployment
• Optimize for short lead times/tight feedback loops
• TDD (test-driven development)
12. What’s different?
• Training lead times
• Data management: only possible through team collaboration
• Data pipeline as well as delivery pipeline
• Allocate R&D time for algorithm selection and model training
• Edge: Hardware heterogeneity (interface hell)
14. Edge: Lessons from microservices
• Each model independently verifiable
• How to avoid big up-front design for data schemas/APIs?
• How to avoid chatty, fine-grained communication?
• Security is an emergent property
15. The goal
Invest in AI infrastructure: Toolchains and platforms
• For developers, training and validating models, data pipelines/ETL, deployment to the cloud and edge, instrumentation and ongoing training
• Comprehensive data and configuration management
Get visibility into, and optimize for, lead times
Exploit and elevate the constraints: Hardware and R&D time
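One way to approach comprehensive data and configuration management is to record a content hash for every input that shaped a model. The sketch below is an assumption-laden illustration, not a tool the talk names: `fingerprint`, `build_manifest`, and the manifest fields are hypothetical, chosen to mirror the items on the AIDLC configuration management list (data, data dictionary, model, platform, API version).

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Content hash: identical bytes always yield the same version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(dataset: bytes, data_dictionary: dict, model: bytes,
                   platform: str, api_version: str) -> dict:
    """Describe exactly which data, schema, and model a deployment contains."""
    # sort_keys makes the hash independent of dictionary key order.
    dictionary_bytes = json.dumps(data_dictionary, sort_keys=True).encode("utf-8")
    return {
        "data": fingerprint(dataset),
        "data_dictionary": fingerprint(dictionary_bytes),
        "model": fingerprint(model),
        "platform": platform,
        "api_version": api_version,
    }


manifest = build_manifest(
    dataset=b"1.0,0\n2.0,0\n3.0,1\n",
    data_dictionary={"value": "float", "label": "int"},
    model=b"threshold=3.0",
    platform="edge-arm64",   # hypothetical platform label
    api_version="v1",
)
```

Storing such a manifest next to each deployed model makes it possible to tell whether two deployments differ in data, schema, or model, which is the same question code versioning answers for traditional software.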
16. Unsolved problems
• Development and training lead times
• Large-scale development
• Understanding and debugging AI models
• Building and operating highly distributed systems
• AI hype
17. To receive the following:
• A copy of this presentation
• The link to the 2018 Accelerate State of DevOps Report (and previous years)
• A 100-page excerpt from Lean Enterprise
• Excerpts from The DevOps Handbook and Accelerate
• 30% off my video workshop: Creating High Performance Organizations
• A 20-minute preview of my Continuous Delivery video workshop
• Discount code for CD video + interviews with Eric Ries & more
Just pick up your phone and send an email
To: jezhumble@sendyourslides.com Subject: DevOps