2. Overview
• 9th USENIX Symposium on Operating Systems Design and
Implementation (OSDI '10)
– October 4–6, 2010, Vancouver, BC, Canada
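– Brad Chen and Remzi Arpaci-Dusseau explained the review process.
• 199 submissions, 32 accepted. Three review rounds; even the final round still had 70+ papers.
• 524 attendees
– Two Best Papers
• Efficient System-Enforced Deterministic Parallelism (Yale University)
• The Turtles Project: Design and Implementation of Nested Virtualization (IBM)
– Poster session: 75 posters, two from Japan (Kono Lab at Keio University, and AIST)
• Workshops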
– Workshop on Managing Systems via Log Analysis and Machine Learning
Techniques (SLAML '10)
– Sixth Workshop on Hot Topics in System Dependability (HotDep '10)
– 2010 Workshop on Power Aware Computing and Systems (HotPower '10)
– 2010 Workshop on the Economics of Networks, Systems, and Computation
(NetEcon '10)
– 5th International Workshop on Systems Software Verification (SSV '10)
• 23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal
– http://sosp2011.gsd.inesc-id.pt/
3. Program, Day 1
• Kernels: Past, Present, and Future Session Chair: Hank Levy, University of Washington
– An Analysis of Linux Scalability to Many Cores (scalability analysis of Linux on 48 cores)
Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris, and Nickolai Zeldovich, MIT CSAIL
– Trust and Protection in the Illinois Browser Operating System
Shuo Tang, Haohui Mai, and Samuel T. King, University of Illinois at Urbana-Champaign
– FlexSC: Flexible System Call Scheduling with Exception-Less System Calls
Livio Soares and Michael Stumm, University of Toronto
• Inside the Data Center, 1 Session Chair: Bianca Schroeder, University of Toronto
– Finding a Needle in Haystack: Facebook's Photo Storage
Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel, Facebook Inc.
– Availability in Globally Distributed Storage Systems (analysis of Google's storage systems)
Daniel Ford, François Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes, and Sean Quinlan, Google, Inc.
– Nectar: Automatic Management of Data and Computation in Datacenters
Pradeep Kumar Gunda, Lenin Ravindranath, Chandramohan A. Thekkath, Yuan Yu, and Li Zhuang, Microsoft Research Silicon Valley
• Security Technologies Session Chair: Bryan Ford, Yale University
– Intrusion Recovery Using Selective Re-execution
Taesoo Kim, Xi Wang, Nickolai Zeldovich, and M. Frans Kaashoek, MIT CSAIL
– Static Checking of Dynamically-Varying Security Policies in Database-Backed Applications
Adam Chlipala, Impredicative LLC
– Accountable Virtual Machines
Andreas Haeberlen, University of Pennsylvania; Paarijaat Aditya, Rodrigo Rodrigues, and Peter Druschel, Max Planck Institute for Software Systems (MPI-SWS)
• Concurrency Bugs Session Chair: George Candea, EPFL
– Bypassing Races in Live Applications with Execution Filters
Jingyue Wu, Heming Cui, and Junfeng Yang, Columbia University
– Effective Data-Race Detection for the Kernel
John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, and Kirk Olynyk, Microsoft Research
– Ad Hoc Synchronization Considered Harmful
Weiwei Xiong, University of Illinois at Urbana-Champaign; Soyeon Park, Jiaqi Zhang, and Yuanyuan Zhou, University of California, San Diego; Zhiqiang Ma, Intel
4. Program, Day 2
• Deterministic Parallelism Session Chair: Emery Berger, University of Massachusetts Amherst
– Deterministic Process Groups in dOS
Tom Bergan, Nicholas Hunt, Luis Ceze, and Steven D. Gribble, University of Washington
– (Best Paper!) Efficient System-Enforced Deterministic Parallelism
Amittai Aviram, Shu-Chun Weng, Sen Hu, and Bryan Ford, Yale University
– Stable Deterministic Multithreading through Schedule Memoization
Heming Cui, Jingyue Wu, Chia-che Tsai, and Junfeng Yang, Columbia University
• Systems Management Session Chair: Sam King, University of Illinois, Urbana-Champaign
– Enabling Configuration-Independent Automation by Non-Expert Users
Nate Kushman and Dina Katabi, Massachusetts Institute of Technology
– Automating Configuration Troubleshooting with Dynamic Information Flow Analysis
Mona Attariyan and Jason Flinn, University of Michigan
• Inside the Data Center, 2 Session Chair: Emin Gün Sirer, Cornell University
– Large-scale Incremental Processing Using Distributed Transactions and Notifications
– (Percolator, Google's system for enabling real-time web search)
Daniel Peng and Frank Dabek, Google, Inc.
– Reining in the Outliers in Map-Reduce Clusters using Mantri
– (Mantri, from MSR, which reins in outliers, i.e. anomalous tasks that slow down job processing)
Ganesh Ananthanarayanan, Microsoft Research and UC Berkeley; Srikanth Kandula and Albert Greenberg, Microsoft Research; Ion Stoica, UC Berkeley; Yi Lu, Microsoft Research; Bikas Saha and Edward Harris, Microsoft Bing
– Transactional Consistency and Automatic Management in an Application Data Cache
Dan R.K. Ports, Austin T. Clements, Irene Zhang, Samuel Madden, and Barbara Liskov, MIT CSAIL
– Piccolo: Building Fast, Distributed Programs with Partitioned Tables
Russell Power and Jinyang Li, New York University
• Cloud Storage Session Chair: Nickolai Zeldovich, Massachusetts Institute of Technology
– Depot: Cloud Storage with Minimal Trust
Prince Mahajan, Srinath Setty, Sangmin Lee, Allen Clement, Lorenzo Alvisi, Mike Dahlin, and Michael Walfish, The University of Texas at Austin
– Comet: An Active Distributed Key-Value Store
Roxana Geambasu, Amit A. Levy, Tadayoshi Kohno, Arvind Krishnamurthy, and Henry M. Levy, University of Washington
– SPORC: Group Collaboration using Untrusted Cloud Resources
Ariel J. Feldman, William P. Zeller, Michael J. Freedman, and Edward W. Felten, Princeton University
5. Program, Day 3
• Production Networks Session Chair: Brad Karp, University College London
– Onix: A Distributed Control Platform for Large-scale Production Networks
Teemu Koponen, Martin Casado, Natasha Gude, and Jeremy Stribling, Nicira Networks; Leon Poutievski, Min Zhu, and Rajiv
Ramanathan, Google; Yuichiro Iwata, Hiroaki Inoue, and Takayuki Hama, NEC; Scott Shenker, International Computer Science Institute (ICSI) and UC
Berkeley
– Can the Production Network Be the Testbed?
Rob Sherwood, Deutsche Telekom Inc. R&D Lab; Glen Gibb and Kok-Kiong Yap, Stanford University; Guido Appenzeller, Big
Switch Networks; Martin Casado, Nicira Networks; Nick McKeown and Guru Parulkar, Stanford University
– Building Extensible Networks with Rule-Based Forwarding
Lucian Popa, University of California, Berkeley, and ICSI, Berkeley; Norbert Egi, Lancaster University; Sylvia Ratnasamy, Intel
Labs, Berkeley; Ion Stoica, University of California, Berkeley
• Mobility Session Chair: Ed Nightingale, Microsoft Research
– TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones
William Enck, The Pennsylvania State University; Peter Gilbert, Duke University; Byung-gon Chun, Intel Labs; Landon P. Cox, Duke
University; Jaeyeon Jung, Intel Labs; Patrick McDaniel, The Pennsylvania State University; Anmol N. Sheth, Intel Labs
– StarTrack Next Generation: A Scalable Infrastructure for Track-Based Applications
Maya Haridasan, Iqbal Mohomed, Doug Terry, Chandramohan A. Thekkath, and Li Zhang, Microsoft Research Silicon Valley
• Virtualization Session Chair: Carl Waldspurger, VMware
– (Best Paper!) The Turtles Project: Design and Implementation of Nested Virtualization
Muli Ben-Yehuda, IBM Research—Haifa; Michael D. Day, IBM Linux Technology Center; Zvi Dubitzky, Michael Factor, Nadav Har'El, and Abel Gordon, IBM Research—Haifa; Anthony Liguori, IBM
Linux Technology Center; Orit Wasserman and Ben-Ami Yassour, IBM Research—Haifa
– mClock: Handling Throughput Variability for Hypervisor IO Scheduling
Ajay Gulati, VMware Inc.; Arif Merchant, HP Labs; Peter J. Varman, Rice University
– Virtualize Everything but Time
Timothy Broomhead, Laurence Cremean, Julien Ridoux, and Darryl Veitch, Center for Ultra-Broadband Information Networks (CUBIN), The
University of Melbourne
6. Trust and Protection in the Illinois Browser Operating System 1/2
Shuo Tang, Haohui Mai, and Samuel T. King, UIUC
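• IBOS: a secure execution environment that co-designs the browser and the OS to keep the TCB (Trusted Computing Base) small
– The red parts (in the figure) are the TCB
• Isolation
– Web page instances are managed by the SOP (same-origin policy [IEEE SSP’10])
– Objects saved to storage are encrypted
– The UNIX layer also runs as a browser instance
• A reference monitor inside IBOS monitors the UI, storage, and drivers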
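The same-origin policy mentioned above compares the (scheme, host, port) triple of two URLs. The following is a minimal sketch of that comparison only; it is not IBOS code, and the names origin, same_origin, and DEFAULT_PORTS are hypothetical helpers introduced for illustration.

```python
from urllib.parse import urlsplit

# Assumption: only the two common web schemes get a default port here.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple that identifies a web origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(url_a, url_b):
    """True when both URLs share scheme, host, and port."""
    return origin(url_a) == origin(url_b)
```

A monitor in the style described on the slide would consult such a check before letting one page instance touch another's objects; how IBOS actually labels and enforces origins is not shown here.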
7. Trust and Protection in the Illinois Browser Operating System 2/2
Shuo Tang, Haohui Mai, and Samuel T. King, UIUC
• Split Driver architecture
– Drivers run in user space; DMA access goes through IBOS
• Implementation
– Based on L4 Pistachio
– uClibc, lwIP (lightweight TCP/IP protocol stack), Qt, WebKit
– E1000 NIC, VESA video card, mouse, keyboard
8. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls 1/2
Livio Soares and Michael Stumm, University of Toronto
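• A system call costs more than the mode switch itself: pipeline flushes, cache pollution, and similar effects slow down the application's own processing
– IPC (instructions per cycle) of Xalan after a pwrite call: it takes more than 14,000 cycles to recover.
• Proposes system calls that do not cause a mode switch (exception)
• Multicore is recommended
– The kernel thread that handles system calls runs on a separate core
– Arguments and return values are passed through a system call page
– Even on a single core, queuing up system calls and processing them in a batch improves efficiency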
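The system call page and batching idea above can be sketched as a small user-space simulation. This is only an illustration under assumptions: the real FlexSC lives in the Linux kernel and uses shared memory pages, while SyscallPage, Entry, Status, and their methods are hypothetical names, and ordinary Python callables stand in for system calls.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    FREE = 0       # entry can be claimed by a user thread
    SUBMITTED = 1  # request written, waiting for a syscall thread
    DONE = 2       # result available to the user thread

@dataclass
class Entry:
    status: Status = Status.FREE
    func: object = None
    args: tuple = ()
    ret: object = None

class SyscallPage:
    def __init__(self, size=8):
        self.entries = [Entry() for _ in range(size)]

    def submit(self, func, *args):
        """User side: write the request into a free entry (no mode switch)."""
        for i, e in enumerate(self.entries):
            if e.status is Status.FREE:
                e.func, e.args, e.ret = func, args, None
                e.status = Status.SUBMITTED
                return i
        raise RuntimeError("syscall page full")

    def run_batch(self):
        """Kernel side: execute every submitted entry in one pass."""
        done = 0
        for e in self.entries:
            if e.status is Status.SUBMITTED:
                e.ret = e.func(*e.args)
                e.status = Status.DONE
                done += 1
        return done

    def collect(self, i):
        """User side: read the return value and free the entry."""
        e = self.entries[i]
        if e.status is not Status.DONE:
            raise RuntimeError("not completed yet")
        ret = e.ret
        self.entries[i] = Entry()
        return ret
```

In this sketch a user thread calls submit() several times, a separate worker (or the same core, later) calls run_batch() once, and results are picked up with collect(), so the cost of entering the kernel is paid per batch rather than per call.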
9. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls 2/2
Livio Soares and Michael Stumm, University of Toronto
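• Assumes an M-on-N threading model
– Assumes there are more application threads (M) than kernel threads (N).
– After a system call is issued, the thread switch happens in user space.
• When all threads are waiting on system calls, the flexsc_wait() system call is used to wait. This one is an exception-based system call
• Implemented as a libc wrapper
– Dynamically linked binaries do not need to be recompiled
• Performance evaluation
– Apache
• Normally 200 threads is optimal. With FlexSC 1000 threads is optimal, a 116% improvement.
• Related work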
– multi-calls [Cassyopia, HotOS’03]
– Multi-hyper call [Xen, SOSP’03]
– Assigning the kernel to dedicated cores
• Corey [OSDI’08]
• Factored Operating System (fos) [OS Review ‘09]
10. Intrusion Recovery Using Selective Re-execution 1/2
Taesoo Kim, Xi Wang, Nickolai Zeldovich, and M. Frans Kaashoek, MIT CSAIL
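• RETRO: when a security problem occurs, it does not merely roll back to a snapshot but restores the legitimate execution from the execution history
– The rival is Taser [SOSP’05], which is taint-tracking based and prone to false positives
• An action history graph records the dependencies between objects plus arguments and return values.
– Re-execution asks the user for input again when necessary, but this is avoided as far as possible
– Optimizations
• Shepherded re-execution: monitoring of the re-execution
• Predicates: identical operations are not re-executed
• Refinement: only the affected parts of an operation are re-executed, not the whole operation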
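The idea of replaying a recorded history without the attacker's action, and skipping actions whose inputs did not change, can be sketched as follows. This is a toy model under assumptions, not RETRO's implementation: RETRO records system calls on Linux, whereas here each action is a pure Python function, and Action and repair are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    reads: tuple   # names of the objects the action reads
    writes: str    # name of the object the action writes
    fn: object     # pure function of the values read

def repair(initial, history, attack):
    """Replay `history` without `attack`, re-running only affected actions.

    Returns (final_state, names_of_reexecuted_actions).
    """
    # Pass 1: the original (compromised) run, recording each action's
    # inputs and output the way a history graph would.
    state = dict(initial)
    log = {}
    for a in history:
        ins = tuple(state.get(r) for r in a.reads)
        out = a.fn(*ins)
        state[a.writes] = out
        log[a.name] = (ins, out)

    # Pass 2: repair, starting again from the initial snapshot.
    state = dict(initial)
    rerun = []
    for a in history:
        if a.name == attack:
            continue  # the attacker's action is cancelled
        ins = tuple(state.get(r) for r in a.reads)
        old_ins, old_out = log[a.name]
        if ins == old_ins:
            state[a.writes] = old_out     # same inputs: reuse recorded result
        else:
            state[a.writes] = a.fn(*ins)  # inputs changed: re-execute
            rerun.append(a.name)
    return state, rerun
```

For example, if an attacker appends an account to a password file and a legitimate adduser runs afterwards, only adduser is re-executed during repair; an unrelated action that never read the file keeps its recorded result.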
11. Intrusion Recovery Using Selective Re-execution 2/2
Taesoo Kim, Xi Wang, Nickolai Zeldovich, and M. Frans Kaashoek, MIT CSAIL
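• Implementation
– Linux; all system calls are recorded. Uses the Btrfs snapshot feature. Logs are compressed
• Evaluation
– Compared with Taser
– Overhead: measured by replaying the 30 minutes before the deadline on HotCRP, the paper submission system used for SOSP’07. 35% more CPU, 4 GB of log per day
• Related work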
– BackTracker[ACM TOCS’05], IntroVirt[SOSP’05], Polygraph[EuroSys’09]
– Windows System Restore, Windows Drive Rollback, Mac TimeMachine
• Q&A: P. Chen was the very first to step up with a question.
12. Comet: An Active Distributed Key-Value Store 1/2
Roxana Geambasu, Amit A. Levy, Tadayoshi Kohno, Arvind Krishnamurthy,
and Henry M. Levy, University of Washington
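• A mechanism that allows application-specific customization on top of an existing key/value store
– Developed out of the difficulty of implementing Vanish [USENIX Security’09] on the Vuze DHT
• Data stored in Comet is an ASO (Active Storage Object): a data area plus a few dozen lines of code (handlers) that run on put/get and periodically (about every 10 minutes)
– Executed code is limited to 100K instructions and memory to 100 KB.
– Code is written in Lua [Software Practice&Experience’99] and runs inside a sandbox.
– The available API is also restricted. Communication is limited to nodes with neighboring IDs.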
13. Comet: An Active Distributed Key-Value Store 2/2
Roxana Geambasu, Amit A. Levy, Tadayoshi Kohno, Arvind Krishnamurthy,
and Henry M. Levy, University of Washington
• There are four handlers: onGet, onPut, onUpdate, and onTimer.
APIs available inside handlers
• Applications
– Node lifetime measurement
– Smart Rendezvous
– Vanish
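An active storage object with handlers can be sketched as below. This is an illustration under assumptions: Comet's handlers are written in Lua and run in a sandbox on DHT nodes, while this sketch uses Python; the class names ActiveStorageObject, ExpiringObject, and Node are hypothetical, and onUpdate is omitted for brevity.

```python
class ActiveStorageObject:
    """A stored value bundled with small handlers, in the spirit of an ASO."""

    def __init__(self, value):
        self.value = value
        self.get_count = 0

    def on_put(self, caller):
        pass  # hook run when the object is stored

    def on_get(self, caller):
        self.get_count += 1  # e.g. simple access accounting
        return self.value

    def on_timer(self):
        pass  # hook run periodically by the storage node

class ExpiringObject(ActiveStorageObject):
    """Hypothetical example: the value disappears after a number of timer ticks."""

    def __init__(self, value, ttl_ticks):
        super().__init__(value)
        self.ttl_ticks = ttl_ticks

    def on_timer(self):
        if self.ttl_ticks > 0:
            self.ttl_ticks -= 1
        if self.ttl_ticks == 0:
            self.value = None

class Node:
    """A single storage node that invokes handlers on put/get/timer."""

    def __init__(self):
        self.store = {}

    def put(self, key, aso, caller="anonymous"):
        self.store[key] = aso
        aso.on_put(caller)

    def get(self, key, caller="anonymous"):
        aso = self.store.get(key)
        return None if aso is None else aso.on_get(caller)

    def tick(self):
        for aso in list(self.store.values()):
            aso.on_timer()
```

The point of the design is that the node stays generic: application behaviour such as expiry or access counting travels with the stored object instead of requiring changes to the key/value store itself.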
14. The Turtles Project: Design and Implementation of Nested Virtualization 1/3
Muli Ben-Yehuda, IBM Research—Haifa; Michael D. Day, IBM Linux Technology Center; Zvi Dubitzky, Michael
Factor, Nadav Har'El, and Abel Gordon, IBM Research—Haifa; Anthony Liguori, IBM Linux Technology Center; Orit
Wasserman and Ben-Ami Yassour, IBM Research—Haifa
• Nested virtualization is needed to run Windows 7's XP Mode inside a VM
• Nested virtualization has long existed (e.g., IBM z/VM), but it requires hardware support (multi-level architectural support).
• x86 offers only single-level architectural support, so for efficient execution the nested levels must be multiplexed onto that single level.
• Three kinds of virtualization have to be nested: CPU, memory, and I/O
[Figure: Guest mode / Root mode. When a trap occurs, control drops to L0.]
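Because the hardware supports only one level, every trap first arrives at L0, which then decides whether to handle it or hand it to the guest hypervisor L1. The following is a minimal sketch of that decision only, under assumptions: dispatch_trap and l1_wants are hypothetical names, and trap kinds are plain strings rather than real VMX exit reasons.

```python
def dispatch_trap(trap, source_level, l1_wants):
    """Return the hypervisors that run, in order, to handle a trap.

    source_level: 1 for a trap raised by L1 (an ordinary guest of L0),
                  2 for a trap raised by L2 (a guest of L1).
    l1_wants:     set of trap kinds L1 asked to intercept for its guest.
    """
    if source_level == 1:
        return ["L0"]             # plain single-level virtualization
    if source_level == 2:
        if trap in l1_wants:
            return ["L0", "L1"]   # L0 receives it, then forwards it to L1
        return ["L0"]             # L0 handles it itself and resumes L2
    raise ValueError("only levels 1 and 2 are modeled")
```

Each forwarded trap costs extra transitions through L0, which is one reason the number of exits matters so much for nested performance.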
15. The Turtles Project: Design and Implementation of Nested Virtualization 2/3
Muli Ben-Yehuda, IBM Research—Haifa; Michael D. Day, IBM Linux Technology Center; Zvi Dubitzky, Michael
Factor, Nadav Har'El, and Abel Gordon, IBM Research—Haifa; Anthony Liguori, IBM Linux Technology Center; Orit
Wasserman and Ben-Ami Yassour, IBM Research—Haifa
• CPU: Nested VMX
– Only L0 can actually execute VMX, so compression (of the nested levels into one) is needed
• MMU: Multi-dimensional Paging
– A combination of shadow paging and virtualized EPT/NPT
– Only L0 can actually use EPT/NPT, so compression is needed
• I/O: Multi-level Device Assignment
– A combination of emulation, para-virtualization, and IOMMU virtualization
– Details: [Amit, WIOSCA’10]
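The "compression" of two levels of page translation into one table that the hardware can use directly can be sketched as a composition of mappings. This is a toy model under assumptions: real EPT/NPT tables are multi-level hardware structures, while here each table is a Python dict from page number to page number, and compress is a hypothetical name.

```python
def compress(l1_table, l0_table):
    """Compose L2->L1 and L1->L0 page mappings into a single L2->L0 table.

    l1_table: L2 guest-physical page -> L1 guest-physical page (kept by L1)
    l0_table: L1 guest-physical page -> host-physical page (kept by L0)
    Entries with no backing in l0_table are left out; in a real system a
    later fault on such a page would be the moment to fill them in.
    """
    combined = {}
    for l2_page, l1_page in l1_table.items():
        if l1_page in l0_table:
            combined[l2_page] = l0_table[l1_page]
    return combined
```

With the combined table installed, an access by L2 is translated in a single step without involving L1 on every memory reference.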
16. The Turtles Project: Design and Implementation of Nested Virtualization 3/3
Muli Ben-Yehuda, IBM Research—Haifa; Michael D. Day, IBM Linux Technology Center; Zvi Dubitzky, Michael
Factor, Nadav Har'El, and Abel Gordon, IBM Research—Haifa; Anthony Liguori, IBM Linux Technology Center; Orit
Wasserman and Ben-Ami Yassour, IBM Research—Haifa
• Overhead: 6-10%
• Discussion
– On current CPUs a trap must be handled on the core where it occurred, which stalls processing. If it could be handled on another core, null calls improve by 41% [HPCVIRT’07].
– Cache pollution
– VM exit overhead is large
– Virtualization inside firmware: HyperSpace, LaLa [Scalable Trusted Computing’09]
17. HotDep’10
• Sixth Workshop on Hot Topics in System Dependability, October 3, 2010
• Distributed Algorithms Session Chair: Andreas Haeberlen, University of Pennsylvania, US
– Storyboard: Optimistic Deterministic Multithreading
Rüdiger Kapitza, Matthias Schunter, and Christian Cachin, IBM Research—Zurich; Klaus Stengel and Tobias Distler, Friedrich-Alexander University Erlangen-Nuremberg
– Scalable Agreement: Toward Ordering as a Service
Manos Kapritsos, UT Austin; Flavio P. Junqueira, Yahoo! Research
– Active Quorum Systems
Alysson Bessani, Paulo Sousa, and Miguel Correia, University of Lisbon, Faculty of Sciences
• OS Reliability Session Chair: Gilles Muller, INRIA/LIP6, FR
– We Crashed, Now What?
Cristiano Giuffrida, Lorenzo Cavallaro, and Andrew S. Tanenbaum, Vrije Universiteit, Amsterdam
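• Crash recovery technique for MINIX 3. Instead of using checkpoints, LLVM inserts recovery code into each binary, and when a problem occurs execution is redone through that code. Related work: SafeDrive [OSDI’06], Recovery Domains [SOSP’09]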
– Improved Device Driver Reliability Through Verification Reuse
Leonid Ryzhyk, NICTA and University of New South Wales; John Keys, Intel Corporation; Balachandra Mirla, NICTA and University of New South Wales; Arun Raghunath and Mona Vij, Intel Corporation; Gernot Heiser, NICTA and University of New South Wales
– Towards Automatically Checking Thousands of Failures with Micro-specifications
Haryadi S. Gunawi, University of California, Berkeley; Thanh Do, University of Wisconsin, Madison; Pallavi Joshi and Joseph M. Hellerstein, University of California, Berkeley; Andrea C.
Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison; Koushik Sen, University of California, Berkeley
• Management and Debugging Session Chair: Steven Hand, University of Cambridge, UK
– Focus Replay Debugging Effort on the Control Plane
Gautam Altekar and Ion Stoica, UC Berkeley
– A Rising Tide Lifts All Boats: How Memory Error Prediction and Prevention Can Help with Virtualized System Longevity
Yuyang Du and Hongliang Yu, Tsinghua University; Yunhong Jiang and Yaozu Dong, Intel Research and Development, Asia-Pacific; Weimin Zheng, Tsinghua University
• A way to avoid memory errors using Xen. It uses Intel MCA (Machine Check Architecture) to detect memory faults and, when a problem looks likely, performs page/DIMM replacement or VM live migration
– A Design for Comprehensive Kernel Instrumentation
Peter Feiner, Angela Demke Brown, and Ashvin Goel, University of Toronto
• Puts DynamoRIO into a custom hypervisor so that arbitrary kernels can be monitored. Related work: Valgrind brought into L4 on Fiasco.OC [VEE’10]
• Storage and File Services Session Chair: Rüdiger Kapitza, University of Erlangen-Nuremberg, DE
– Behavior-Based Problem Localization for Parallel File Systems
Michael P. Kasick, Rajeev Gandhi, and Priya Narasimhan, Carnegie Mellon University
– What Consistency Does Your Key-Value Store Actually Provide?
Eric Anderson, Xiaozhou Li, Mehul A. Shah, Joseph Tucek, and Jay J. Wylie, Hewlett-Packard Laboratories
18. SSV’10
• 5th International Workshop on Systems Software Verification, October 6–7 2010
• 17 submissions, 10 accepted. About 30 attendees. Sponsored by NICTA (Australia)
• Invited Talks
– Static Analysis for Verifying C Programs, and More
Pascal Cuoq, CEA
• Tutorial on Frama-C, an analysis framework for C http://frama-c.com/
– Visualizing Information Flow through C Programs
Joe Hurd, Galois Inc.
• CIFT (C Information Flow Tool), which visualizes function calls in C
– Slides: http://www.gilith.com/research/talks/ssv2010.pdf
• Possibly related to Rift, the version applied to Ruby? http://github.com/brixen/rift
– Ruby Information Flow Tool based on the idea of Cift presented at a Galois tech talk.
– Work in Progress for the Next 100Mloc: Finding Bugs in Real Code
Ansgar Fehnker, NICTA and University of New South Wales
• Goanna, a static analysis tool for C/C++ http://redlizards.com/
• Participated in NIST's Static Analysis Tool Exposition (SATE).
19. SSV’10
• Refereed Papers
– Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts
Toshiyuki Maeda and Akinori Yonezawa, University of Tokyo
• Maeda (University of Tokyo) on how to handle type changes that occur under multithreading.
• Slides in Japanese: http://web.yl.is.s.u-tokyo.ac.jp/raw-attachment/wiki/GeneralMeeting/tosh-talk-20100728-003.pptx
– Counterexample-Guided Abstraction Refinement for PLCs
Sebastian Biallas, Jörg Brauer, and Stefan Kowalewski, Embedded Software Laboratory, RWTH Aachen University
– dBug: Systematic Evaluation of Distributed Systems
Jiri Simsa, Randy Bryant, and Garth Gibson, Carnegie Mellon University
– Model-based Testing Without a Model: Assessing Portability in the Seattle Testbed
Justin Cappos and Jonathan Jacky, University of Washington
• Seattle, the University of Washington's testbed for distributed computing https://seattle.cs.washington.edu/html/
• Provides a Python framework for model checking: PyModel, a model-based testing framework
http://staff.washington.edu/jon/pymodel/www/
– Correctness Proofs for Device Drivers in Embedded Systems
Jianjun Duan and John Regehr, University of Utah
– Lyrebird—Assigning Meanings to Machines
David Cock, NICTA and University of New South Wales
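• Lyrebird serves as the verification model for seL4, the kernel that was formally verified. Verifying against a model of the real machine is hard, so a simpler model is used. It models the MMU.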
– A Precise Memory Model for Low-Level Bounded Model Checking
Carsten Sinz, Stephan Falke, and Florian Merz, Institute for Theoretical Computer Science, Karlsruhe Institute of Technology
– Verification of Stack Manipulation in the SCIP Processor
J. Aaron Pendergrass, Johns Hopkins University Applied Physics Laboratory
– Towards Proving Security in the Presence of Large Untrusted Components
June Andronick, NICTA and University of New South Wales; David Greenaway, NICTA; Kevin Elphinstone, NICTA and
University of New South Wales
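• seL4 managed to verify 10K lines, but today's programs are 10M lines. The approach therefore separates trusted from untrusted components: trusted ones are verified, untrusted ones run in isolation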
– Loop Refinement Using Octagons and Satisfiability
Jörg Brauer, Volker Kamin, and Stefan Kowalewski, Embedded Software Laboratory, RWTH Aachen University; Thomas
Noll, Software Modelling and Verification Group, RWTH Aachen University