Authors must make a good faith effort to anonymize their submissions, and they should not identify themselves or their institutions either explicitly or by implication (e.g., through the references or acknowledgments). Penglai also reduces the latency of secure memory initialization by three orders of magnitude and gains 3.6x speedup for real-world applications (e.g., MapReduce). To achieve low overhead, selective profiling gathers runtime execution information selectively and incrementally. For instance, the following are not sufficient grounds to specify a conflict with a PC member: they have reviewed the work before, they are employed by your competitor, they are your personal friend, they were your post-doc advisor or advisee, or they had the same advisor as you. This is especially true for DPF over Rnyi DP, a highly composable form of DP. She is the recipient of several best paper awards, the Einstein Chair of the Chinese Academy of Science, the ACM/SIGART Autonomous Agents Research Award, an NSF Career Award, and the Allen Newell Medal for Excellence in Research. We first introduce two new hardware primitives: 1) Guarded Page Table (GPT), which protects page table pages to support page-level secure memory isolation; 2) Mountable Merkle Tree (MMT), which supports scalable integrity protection for secure memory. She also invented the spanning tree algorithm, which transformed Ethernet from a technology that supported a few hundred nodes, to something that can support large networks. sosp ACM Symposium on Operating Systems Principles. Papers not meeting these criteria will be rejected without review, and no deadline extensions will be granted for reformatting. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. This paper describes the design, implementation, and evaluation of Addra, the first system for voice communication that hides metadata over fully untrusted infrastructure and scales to tens of thousands of users. Memory allocation represents significant compute cost at the warehouse scale and its optimization can yield considerable cost savings. First, Fluffy mutates and executes multi-transaction test cases to find consensus bugs which cannot be found using existing fuzzers for Ethereum. VLDB 2021: Venue Tivoli Hotel & Congress Center Arni Magnussons Gade 2 1577 Copenhagen, Denmark +45 3268 4300 In-person attendees can purchase tickets for the park / gardens with a 15% discount, which is a special offer by Tivoli Hotel & Congress Center to VLDB 2021 attendees. Weak Links in Authentication Chains: A Large-scale Analysis of Email Sender Spoofing Attacks If you are uncertain about how to anonymize your submission, please contact the program co-chairs, osdi21chairs@usenix.org, well in advance of the submission deadline. Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, and Liyan Zheng, Tsinghua University; Yuanzhi Li, Carnegie Mellon University; Kaiyuan Rong and Yuanyong Chen, Tsinghua University; Zhihao Jia, Carnegie Mellon University and Facebook. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. We propose a new framework for computing the embeddings of large-scale graphs on a single machine. Youngseok Yang, Seoul National University; Taesoo Kim, Georgia Institute of Technology; Byung-Gon Chun, Seoul National University and FriendliAI. However, Addra improves message latency in this architecture, which is a key performance metric for voice calls. Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, and Roxana Geambasu, Columbia University; Mathias Lcuyer, Microsoft Research. We develop MAGE, an execution engine for SC that efficiently runs SC computations that do not fit in memory. These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. USENIX new Date().getFullYear()>document.write(new Date().getFullYear()); Grants for Black Computer Science Students Application, Title Page, Copyright Page, and List of Organizers, OSDI '21 Proceedings Interior (PDF, best for mobile devices). When uploading your OSDI 2021 reviews for your submission to SOSP, you can optionally append a note about how you addressed the reviews and comments. Authors should email the program co-chairs, osdi21chairs@usenix.org, a copy of the related workshop paper and a short explanation of the new material in the conference paper beyond that published in the workshop version. This distinction forces a re-design of the scheduler. Such centralized engines are in a perfect position to censor content and violate users privacy, undermining some of the key tenets behind decentralization. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. (Visa applications can take at least 30 working days to process.) Lifting predicates and crash framing make the specification easy to use for developers, and logically atomic crash specifications allow for modular reasoning in GoJournal, making the proof tractable despite complex concurrency and crash interleavings. Mothy's current research centers on Enzian, a powerful hybrid CPU/FPGA machine designed for research into systems software. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. OSDI'21 accepted 31 papers and 26 papers participated in the AE, a significant increase in the participate ratio: 84%, compared to OSDI'20 (70%) and SOSP'19 (61%). To this end, we propose GNNAdvisor, an adaptive and efficient runtime system to accelerate various GNN workloads on GPU platforms. Using selective profiling, we build DMon, a system that can automatically locate data locality problems in production, identify access patterns that hurt locality, and repair such patterns using targeted optimizations. One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This paper addresses a key missing piece in the current ecosystem of decentralized services and blockchain apps: the lack of decentralized, verifiable, and private search. Authors are required to register abstracts by 3:00 p.m. PST on December 3, 2020, and to submit full papers by 3:00 p.m. PST on December 10, 2020. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. In contrast, CLP achieves significantly higher compression ratio than all commonly used compressors, yet delivers fast search performance that is comparable or even better than Elasticsearch and Splunk Enterprise. As a result, data characteristics and device capabilities vary widely across clients. Distributed Trust: Is Blockchain the answer? The blockchain community considers this hard fork the greatest challenge since the infamous 2016 DAO hack. Session Chairs: Ryan Huang, Johns Hopkins University, and Manos Kapritsos, University of Michigan, Jianan Yao, Runzhou Tao, Ronghui Gu, Jason Nieh, Suman Jana, and Gabriel Ryan, Columbia University. Pollux simultaneously considers both aspects. Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. OSDI is "a premier forum for discussing the design, implementation, and implications of systems software." A total of six research papers from the department were accepted to the . Welcome to the 2021 USENIX Annual Technical Conference (ATC '21) submissions site! Sanitizers detect unsafe actions such as invalid memory accesses by inserting checks that are validated during a programs execution. All deadline times are 23:59 hrs UTC. The 15th USENIX Symposium on Operating Systems Design and Implementation seeks to present innovative, exciting research in computer systems. USENIX NSDI, 2021 Acceptance Rate: 15.99% Fluid: Resource-Aware Hyperparameter Tuning Engine P. Yu*, J. Liu*, M. Chowdhury (*Equal contribution) MLSys, 2021 Acceptance Rate: 23.53% NetLock: Fast, Centralized Lock Management Using Programmable Switches Z. Yu, Y. Zhang, V. Braverman, M. Chowdhury, X. Jin ACM SIGCOMM, 2020 Acceptance Rate: 21.6% Papers so short as to be considered extended abstracts will not receive full consideration. This is unfortunate because good OS design has always been driven by the underlying hardware, and right now that hardware is almost unrecognizable from ten years ago, let alone from the 1960s when Unix was written. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. Consensus bugs are bugs that make Ethereum clients transition to incorrect blockchain states and fail to reach consensus with other clients. Password NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. We will look at various problems and approaches, and for each, see if blockchain would help. Just using Lambdas on top of CPU servers offers up to 2.75 more performance-per-dollar than training only with CPU servers. Hence, kernel developers are constantly refining synchronization within OS kernels to improve scalability at the risk of introducing subtle bugs. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. We also welcome work that explores the interface to related areas such as computer architecture, networking, programming languages, analytics, and databases. See the Preview Session page for an overview of the topics covered in the program. Authors may upload supplementary material in files separate from their submissions. Qing Wang, Youyou Lu, Junru Li, and Jiwu Shu, Tsinghua University. CLP's gains come from using a tuned, domain-specific compression and search algorithm that exploits the significant amount of repetition in text logs. By submitting a paper, you agree that at least one of the authors will attend the conference to present it. We present the results of a 1% experiment at fleet scale as well as the longitudinal rollout in Googles warehouse scale computers. Widely used log-search tools like Elasticsearch and Splunk Enterprise index the logs to provide fast search performance, yet the size of the index is within the same order of magnitude as the raw log size. Welcome to the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI '22) submissions site. Papers must be in PDF format and must be submitted via the submission form. Main conference program: 5-8 April 2022. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. Shaghayegh Mardani, UCLA; Ayush Goel, University of Michigan; Ronny Ko, Harvard University; Harsha V. Madhyastha, University of Michigan; Ravi Netravali, Princeton University. For example, optimistic concurrency control (OCC) is better than two-phase-locking (2PL) under low contention, while the converse is true under high contention. The overhead of GPT is 5% for memory-intensive workloads (e.g., Redis) and negligible for CPU-intensive workloads (e.g., RV8 and Coremarks). We particularly encourage contributions containing highly original ideas, new approaches, and/or groundbreaking results. For conference information, . Web pages today commonly include large amounts of JavaScript code in order to offer users a dynamic experience. This is the first OSDI in an odd year as OSDI moves to a yearly cadence. After request completion, an I/O device must decide either to minimize latency by immediately firing an interrupt or to optimize for throughput by delaying the interrupt, anticipating that more requests will complete soon and help amortize the interrupt cost. The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. Jaehyun Hwang and Midhul Vuppalapati, Cornell University; Simon Peter, UT Austin; Rachit Agarwal, Cornell University. Marius is open-sourced at www.marius-project.org. As increasingly more sensitive data is being collected to gain valuable insights, the need to natively integrate privacy controls in data analytics frameworks is growing in importance. In this talk, I'll speculate on how we came to this unfortunate state of affairs, and what might be done to fix it. Prepublication versions of the accepted papers from the summer submission deadline are available below. All papers will be available online to registered attendees before the conference. Advisor: You have a past or present association as thesis advisor or advisee. We demonstrate that Marius achieves the same level of accuracy but is up to one order of magnitude faster. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. We demonstrate the above using design, implementation and evaluation of blk-switch, a new Linux kernel storage stack architecture. Paper abstracts and proceedings front matter are available to everyone now. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. Most existing schedulers expect users to specify the number of resources for each job, often leading to inefficient resource use. If your paper is accepted and you need an invitation letter to apply for a visa to attend the conference, please contact conference@usenix.org as soon as possible. Our evaluation shows that NrOS scales to 96 cores with performance that nearly always dominates Linux at scale, in some cases by orders of magnitude, while retaining much of the simplicity of a sequential kernel. The OSDI '21 program co-chairs have agreed not to submit their work to OSDI '21. Abstract registrations that do not provide sufficient information to understand the topic and contribution (e.g., empty abstracts, placeholder abstracts, or trivial abstracts) will be rejected, thereby precluding paper submission. In addition, CLP outperforms Elasticsearch and Splunk Enterprise's log ingestion performance by over 13x, and we show CLP scales to petabytes of logs. Swapnil Gandhi and Anand Padmanabha Iyer, Microsoft Research. Graph Neural Networks (GNNs) have gained significant attention in the recent past, and become one of the fastest growing subareas in deep learning. ), Program Co-Chairs: Angela Demke Brown, University of Toronto, and Jay Lorch, Microsoft Research. With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. First, it enables a caller to push a message to a callee in two hops, using a new way of assigning mailboxes to users that resembles how a post office assigns PO boxes to its customers. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. Consensus bugs are extremely rare but can be exploited for network split and theft, which cause reliability and security-critical issues in the Ethereum ecosystem. NrOS is primarily constructed as a simple, sequential kernel with no concurrency, making it easier to develop and reason about its correctness. In experiments with real DL jobs and with trace-driven simulations, Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers, even when they are provided with ideal resource and training configurations for every job. Forgot your password? We discuss the design and implementation of TEMERAIRE including strategies for hugepage-aware memory layouts to maximize hugepage coverage and to minimize fragmentation overheads. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 14-16, 2021. Manuela will present examples and discuss the scope of AI in her research in the finance domain. Yet, existing efforts randomly select FL participants, which leads to poor model and system efficiency. In this paper, we show how to address this inefficiency without requiring pages to be rewritten or browsers to be modified. However, with the increasingly speedy transactions and queries thanks to large memory and fast interconnect, commodity HTAP systems have to make a tradeoff between data freshness and performance degradation. Erhu Feng, Xu Lu, Dong Du, Bicheng Yang, and Xueqiang Jiang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Yubin Xia, Binyu Zang, and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. Zeph executes privacy-adhering data transformations in real-time and scales to thousands of data sources, allowing it to support large-scale low-latency data stream analytics. Additionally, there is no assurance that data processing and handling comply with the claimed privacy policies. Proceedings Front Matter Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on the top of existing concurrent PM indexes, and steers accesses to hot items to this layer. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Session Chairs: Nadav Amit, VMware Research Group, and Ada Gavrilovska, Georgia Institute of Technology, Stephen Ibanez, Alex Mallery, Serhat Arslan, and Theo Jepsen, Stanford University; Muhammad Shahbaz, Purdue University; Changhoon Kim and Nick McKeown, Stanford University. PLDI is a premier forum for programming language research, broadly construed, including design, implementation, theory, applications, and performance. Used Zotero to organize papers about the stress and diffusion between anode and electrolyte and made a summary . Session Chairs: Dushyanth Narayanan, Microsoft Research, and Gala Yadgar, TechnionIsrael Institute of Technology, Jinhyung Koo, Junsu Im, Jooyoung Song, and Juhyung Park, DGIST; Eunji Lee, Soongsil University; Bryan S. Kim, Syracuse University; Sungjin Lee, DGIST. The 20th ACM Workshop on Hot Topics in Networks (HotNets 2021) will bring together researchers in computer networks and systems to engage in a lively debate on the theory and practice of computer networking. Session Chairs: Deniz Altinbken, Google, and Rashmi Vinayak, Carnegie Mellon University, Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan. The experimental results show that Penglai can support 1,000s enclave instances running concurrently and scale up to 512GB secure memory with both encryption and integrity protection. PET then automatically corrects results to restore full equivalence. Horcrux-compliant web servers perform offline analysis of all the JavaScript code on any frame they serve to conservatively identify, for every JavaScript function, the union of the page state that the function could access across all loads of that page. Authors of each accepted paper must ensure that at least one author registers for the conference, and that their paper is presented in-person at the conference. After three years working on web-based collaboration systems at a startup in North Carolina, he joined Sprint's Advanced Technology Lab in Burlingame, California, in 1998, working on cloud computing and network monitoring. We propose Marius, a system for efficient training of graph embeddings that leverages partition caching and buffer-aware data orderings to minimize disk access and interleaves data movement with computation to maximize utilization. Her robot soccer teams have been RoboCup world champions several times, and the CoBot mobile robots have autonomously navigated for more than 1,000km in university buildings. We observe that scalability challenges in training GNNs are fundamentally different from that in training classical deep neural networks and distributed graph processing; and that commonly used techniques, such as intelligent partitioning of the graph do not yield desired results. SanRazor adopts a novel hybrid approach it captures both dynamic code coverage and static data dependencies of checks, and uses the extracted information to perform a redundant check analysis. We also verified a simple NFS server using GoJournals specs, which confirms that they are helpful for application verification: a significant part of the proof doesnt have to consider concurrency and crashes. Mingyu Li, Jinhao Zhu, and Tianxu Zhang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Cheng Tan, Northeastern University; Yubin Xia, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Sebastian Angel, University of Pennsylvania; Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China. Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding, University of California, Santa Barbara. Papers accompanied by nondisclosure agreement forms will not be considered. However, memory allocation decisions also impact overall application performance via data placement, offering opportunities to improve fleetwide productivity by completing more units of application work using fewer hardware resources. However, your OSDI submission must use an anonymized name for your project or system that differs from any used in such contexts. These are hard deadlines, and no extensions will be given. will work with the steering committee to ensure that the symposium program will accommodate presentations for all accepted papers. . In addition, increasing CPU core counts further complicate kernel development. Professor Veloso is the Past President of AAAI (the Association for the Advancement of Artificial Intelligence), and the co-founder, Trustee, and Past President of RoboCup. Zeph enforces privacy policies cryptographically and ensures that data available to third-party applications complies with users' privacy policies. The ZNS+ also allows each zone to be overwritten with sparse sequential write requests, which enables the LFS to use threaded logging-based block reclamation instead of segment compaction. PLDI seeks outstanding research that extends and/or applies programming-language concepts to advance the field of computing. We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data movement, which results in poor resource utilization and inefficient training. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. We describe Fluffy, a multi-transaction differential fuzzer for finding consensus bugs in Ethereum. HotCRP.com signin Sign in using your HotCRP.com account. Registering abstracts a week before paper submission is an essential part of the paper-reviewing process, as PC members use this time to identify which papers they are qualified to review. Table of Contents | Authors must limit their responses to (a) correcting factual errors in the reviews or (b) directly addressing questions posed by reviewers. If the conference registration fee will pose a hardship for the presenter of the accepted paper, please contact conference@usenix.org. Submitted papers must be no longer than 12 single-spaced 8.5 x 11 pages, including figures and tables, plus as many pages as needed for references, using 10-point type on 12-point (single-spaced) leading, two-column format, Times Roman or a similar font, within a text block 7 wide x 9 deep. We conclude with a discussion of additional techniques for improving the allocator development process and potential optimization strategies for future memory allocators.