dc.description.abstract | Cloud computing has become increasingly popular in recent years. The benefits of cloud platforms include ease of application deployment, a pay-as-you-go model, and the ability to scale resources up or down based on an application's workload. Today's cloud platforms host increasingly complex distributed and parallel applications. The main premise of this thesis is that application-aware resource management techniques are better suited to distributed cloud applications than a systems-level, one-size-fits-all approach. In this thesis, I study cloud-based resource management techniques, with a particular emphasis on how application-aware approaches can improve system resource utilization, enhance application performance, and reduce cost. I first study always-on interactive applications that run on transient cloud servers such as Amazon spot instances. I show that by combining techniques such as nested virtualization, live migration, and lazy restoration with intelligent bidding strategies, it is feasible to provide high availability to such applications while significantly reducing cost. I next study how to improve the performance of parallel data processing frameworks such as Hadoop and Spark running in the cloud. I argue that network I/O contention in Hadoop can degrade application throughput, and I implement a collaborative, application-aware network and task scheduler using software-defined networking. By combining flow scheduling with task scheduling, this system can effectively avoid network contention and improve Hadoop's performance. I then investigate similar issues in Spark and find that task scheduling is more important for Spark jobs. I propose a network-aware task scheduling method that adaptively schedules tasks for different types of jobs without system tuning and significantly improves Spark's performance. Finally, I study how to deploy network functions in the cloud.
Specifically, I focus on comparing different methods of chaining network functions. Through an empirical evaluation of two deployment methods, I identify the advantages and disadvantages of each. The results suggest that tenant-centric placement provides lower latencies, while the service-centric approach is more flexible for reconfiguration and capacity scaling. | en_US |