Past Academic Research Projects
Cyber-Physical Green Hadoop Cluster
Uneven thermal profiles and thermal hotspots are ubiquitous in data centers because of complex airflow patterns and the varying ability of the cooling system to cool different parts of the data center. Data, in turn, differs in semantics such as access profile, size, and popularity. In this work, the Green Hadoop cluster combines its predictive knowledge of data semantics on the cyber side with knowledge of the cluster's thermal profile on the physical side to perform energy- and thermal-aware data placement. Because computations are sent to the data in the Cloud compute model, energy- and thermal-aware data placement naturally yields energy-efficient task placement, which in turn reduces cooling costs.
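The placement idea can be illustrated with a minimal sketch. It is not the project's actual algorithm: the Server fields, the cooling_efficiency score, and the greedy rule (the most heavily accessed files go to the servers that are easiest to cool) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    cooling_efficiency: float  # hypothetical score: higher = easier to cool
    capacity: int              # number of files the server can hold


def place_files(predicted_load, servers):
    """Greedy sketch: map each file to a server name.

    predicted_load maps file name -> predicted access load (assumed to come
    from a predictive model). Files with the highest load are assigned to the
    servers with the best cooling efficiency.
    """
    ranked_servers = sorted(servers, key=lambda s: s.cooling_efficiency, reverse=True)
    ranked_files = sorted(predicted_load, key=predicted_load.get, reverse=True)

    placement = {}
    next_file = 0
    for server in ranked_servers:
        chunk = ranked_files[next_file:next_file + server.capacity]
        for file_name in chunk:
            placement[file_name] = server.name
        next_file += len(chunk)

    if next_file < len(ranked_files):
        raise ValueError("not enough capacity for all files")
    return placement


servers = [Server("rack-edge", 0.4, 2), Server("rack-center", 0.9, 2)]
load = {"logs.old": 1, "clicks.today": 90, "index": 50, "archive": 0}
placement = place_files(load, servers)
```

In this example the two busiest files land on rack-center and the two dormant ones on rack-edge, so tasks that follow the data also concentrate on the servers that are cheapest to cool.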
Predictive Data and Energy Management in GreenHDFS
In this work, GreenHDFS uses predictive data models to predict various attributes of a file at file creation time, making data and energy management self-adaptive and proactive. These predicted attributes, other data semantics, and file system insights then guide file management policies such as file migration, file zone placement, and file replication in a finely tuned, self-adaptive manner instead of relying on one-size-fits-all data management policies. The predictive data models are derived by supervised learning on historical data.
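As a rough illustration of predicting a file attribute at creation time from historical data, the sketch below learns the average number of "hot" days per parent directory and applies it to a new file's path. The feature (parent directory), the predicted attribute (hot days), and the averaging model are assumptions chosen for brevity; the source does not specify which models or features GreenHDFS uses.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def train(history):
    """Learn the mean observed hot period per parent directory.

    history: iterable of (file_path, observed_hot_days) pairs, i.e. labeled
    examples from past files (hypothetical training data).
    """
    totals = defaultdict(lambda: [0.0, 0])  # directory -> [sum, count]
    for path, hot_days in history:
        directory = str(PurePosixPath(path).parent)
        totals[directory][0] += hot_days
        totals[directory][1] += 1
    return {d: total / count for d, (total, count) in totals.items()}


def predict_hot_days(model, path, default=7.0):
    """Predict the hot period of a new file from its path alone.

    Falls back to an assumed default when the directory was never seen.
    """
    return model.get(str(PurePosixPath(path).parent), default)


history = [
    ("/data/clicks/a", 2),
    ("/data/clicks/b", 4),
    ("/data/models/m1", 30),
]
model = train(history)
prediction = predict_hot_days(model, "/data/clicks/c")  # 3.0 (mean of 2 and 4)
```

A policy layer could then use such a prediction to pick a zone or a migration date when the file is created, rather than applying the same rule to every file.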
The compute model of the Cloud presents significant challenges to existing task-centric scale-down techniques, which aim to migrate and consolidate the workload onto a few servers during periods of low load so that the rest of the servers can be scaled down. Scale-down is very attractive because it yields significant energy savings and allows energy proportionality even with non-energy-proportional components such as disks and DRAM. However, scale-down requires significant periods of idleness to amortize the transition energy and performance penalty. It is also important to ensure that scaling down servers affects neither the performance of the system nor its reliability, which suffers when state transitions are too frequent. GreenHDFS takes a novel data-classification-driven scale-down approach that does not impact performance or reliability. GreenHDFS splits the cluster into thermal-efficient and thermal-inefficient zones and places dormant data on the thermal-inefficient servers, which can then be scaled down.
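The amortization requirement can be made concrete with a break-even calculation: scaling a server down pays off only if the idle period is longer than the transition energy divided by the power saved while asleep. The function names and the power and energy figures below are illustrative assumptions, not GreenHDFS measurements.

```python
def break_even_seconds(transition_energy_j, idle_power_w, sleep_power_w):
    """Minimum idle time for which a scale-down saves energy.

    transition_energy_j: energy spent entering and leaving the low-power state
    idle_power_w:        power drawn while idle but fully powered on
    sleep_power_w:       power drawn in the low-power state
    """
    saved_power_w = idle_power_w - sleep_power_w
    if saved_power_w <= 0:
        raise ValueError("low-power state must draw less power than idle")
    return transition_energy_j / saved_power_w


def worth_scaling_down(predicted_idle_s, transition_energy_j, idle_power_w, sleep_power_w):
    """True when the predicted idle period exceeds the break-even time."""
    threshold = break_even_seconds(transition_energy_j, idle_power_w, sleep_power_w)
    return predicted_idle_s > threshold


# Assumed figures: 5700 J per sleep/wake cycle, 200 W idle, 10 W asleep.
threshold_s = break_even_seconds(5700, 200, 10)   # 30.0 seconds
short_gap = worth_scaling_down(20, 5700, 200, 10)    # False
long_gap = worth_scaling_down(3600, 5700, 200, 10)   # True
```

This accounts only for energy; the wake-up latency and the wear caused by frequent transitions, both mentioned above, would push a practical threshold higher.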
PRES: Probabilistic Replay via Execution Sketching
Concurrent programming has become pervasive with the advent of multi-core systems. Concurrency bugs are difficult to detect and diagnose because of their non-deterministic nature. To keep the recording overhead of the production run low, the team explored software-based methods that can reproduce bugs on multiprocessors while adding only a small overhead during production. The resulting technique is called PRES: Probabilistic Replay via Execution Sketching. PRES gathers only partial execution information during the production run and relies on an intelligent replayer that reconstructs, at replay time, the complete information needed to reproduce the bug.
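The record-a-sketch, search-at-replay idea can be shown on a toy program. The example below is not PRES itself: the two-thread counter, the choice of sketch (only the order of writes), and the exhaustive search that stands in for an intelligent replayer are all simplifying assumptions.

```python
from itertools import permutations

# Toy program: each thread performs a non-atomic increment (read, then write).
THREAD_OPS = {"T1": ["read", "write"], "T2": ["read", "write"]}


def run(schedule):
    """Execute one full interleaving (sequence of thread ids); return the counter."""
    counter = 0
    local = {}
    pc = {t: 0 for t in THREAD_OPS}
    for t in schedule:
        op = THREAD_OPS[t][pc[t]]
        pc[t] += 1
        if op == "read":
            local[t] = counter
        else:
            counter = local[t] + 1
    return counter


def sketch_of(schedule):
    """Partial recording: keep only the order in which threads wrote."""
    pc = {t: 0 for t in THREAD_OPS}
    writes = []
    for t in schedule:
        if THREAD_OPS[t][pc[t]] == "write":
            writes.append(t)
        pc[t] += 1
    return writes


def replay(sketch, is_failure):
    """Try interleavings consistent with the sketch until the failure recurs."""
    thread_ids = [t for t, ops in THREAD_OPS.items() for _ in ops]
    attempts = 0
    for schedule in sorted(set(permutations(thread_ids))):
        if sketch_of(schedule) != sketch:
            continue  # contradicts what was recorded in production
        attempts += 1
        if is_failure(run(schedule)):
            return list(schedule), attempts
    return None, attempts


# "Production" run that loses an update: both threads read before either writes.
production = ["T1", "T2", "T1", "T2"]
sketch = sketch_of(production)  # ["T1", "T2"]: reads were not recorded
found, attempts = replay(sketch, is_failure=lambda counter: counter != 2)
```

The sketch rules out every interleaving in which T2 writes first, so the search only has to resolve the unrecorded read order before the lost update reappears.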
Spotlight on Spotlight
The research goal was to explore a tight integration of information search and retrieval with the file system in order to achieve faster search, more storage-efficient search indices, and real-time indexing. As the first phase of the research project, performed a thorough evaluation of Mac’s Spotlight (an information search and retrieval tool), since it is semi-integrated with the file system, and identified several limitations of a semi-integrated model such as Spotlight’s.
Resource Management in active networking
University of Texas, Austin
Investigated techniques to provide resource management in active networking.
FraPPucino - Field Programmable Processor
University of Texas, Austin, Technology Driven Computer Architecture, Professor Steve Keckler
Proposed integrating a RISC processor core with a PLD to combine the traditional advantages of PLDs, such as ease of use, lower risk, and faster time to market, with the computational power and flexibility of a RISC processor.
Resource management in GLUnix
University of Texas, Austin, Web Operating Systems, Professor Mike Dahlin
Implemented resource management in GLUnix, an operating system for a network of workstations (NOW).
Family-based logging in TreadMarks
University of Texas, Austin, Distributed Computing – I, Professor Lorenzo Alvisi
Implemented “Family-based Logging”, an optimistic, causal-based logging system, to provide fault tolerance for TreadMarks, a distributed shared memory system.
Modeling of Parametric Variation by a Nonlinear Function
Described the parametric variation of an electric motor by means of a nonlinear function. The aim was a closed-form mathematical representation of a realistic variation. The coefficients of the chosen nonlinear function were determined by careful judgment of that parametric variation.