Yuan Luo, Ph.D. [骆远博士]
See the latest updates on LinkedIn
yuan [at] yuanluo [dot] net
Yuan Luo works as a Data Architect at Edmodo, where he designed and implemented the social recommendation system and the new version of the data-processing pipeline. He is best known for his Hierarchical MapReduce research, which has inspired at least 7 US patents. He received his PhD in Computer Science from Indiana University in 2015, where he was a recipient of the K. Jon Barwise Fellowship in the School of Informatics and Computing. His research committee members were Prof. Beth Plale, Prof. Geoffrey Fox, Prof. Judy Qiu, and Prof. Yuqing Wu from IU, and Dr. Philip Papadopoulos from UCSD. He was a researcher at the Data to Insight Center, where he was the technical lead of multiple NSF- and NASA-funded projects. He was a research intern at IBM T. J. Watson Research Center in 2012, and a research intern at the Center for Research in Biological Systems (CRBS) at UCSD in 2009. He was a member of the Extreme Computing Lab at IUCS under Dr. Dennis Gannon. Yuan Luo received his BS and MS degrees in computer science from Jilin University in 2005 and 2008, respectively. He was a visiting scholar at the University of California, San Diego. He is an affiliated researcher of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA), and a co-founder and co-chair emeritus of PRAGMA Students. He served as poster session chair of the 23rd, 24th, and 25th PRAGMA Workshops, program committee member of PRAGMA24 and PRAGMA25, and demonstration session chair of PRAGMA27. He was an instructor at the National Biomedical Computation Resource (NBCR) Summer Institute in 2006 and 2009. Luo is currently on the technical program committee of the IEEE CloudNet conference.
Luo lost 40 lbs of fat over a four-month period. He shares his fitness experience with over 160,000 followers on weibo.com. His body-transformation story and diet recipes have received hundreds of millions of cumulative views on the internet and have inspired millions of people to keep fit and stay healthy. He co-founded a Chinese online fitness initiative that later became FitTime (即刻运动), helping it launch mobile apps and define and implement its online marketing strategies.
Grid Computing, Cloud Computing, Data-Intensive Distributed Computing
* Virtual Cluster Controller
Role: Principal Designer and Developer
The Virtual Cluster Controller (also known as the Personal Cloud Controller) is part of the NSF PRAGMA project (SAVI: PRAGMA--ENABLING SCIENTIFIC EXPEDITIONS AND INFRASTRUCTURE EXPERIMENTATION FOR PACIFIC RIM INSTITUTIONS AND RESEARCHERS, OCI 1234983). The SAVI PRAGMA project was launched to enable small-to-medium-sized international groups to conduct collaborative research and education. It does so by creating multidisciplinary, multi-institutional scientific expeditions that pair domain scientists with computer scientists who, together, define, develop, and deploy international-scale, experimental cyberinfrastructure. Specifically, by engaging PRAGMA's more than 30 members and affiliates around the Pacific Rim, the SAVI funding will launch or extend three scientific expeditions in biodiversity, lake eutrophication, and infectious diseases. The Personal Cloud Controller is being developed to enable this international-scale, experimental cyberinfrastructure: it provides users with a high degree of control over their virtual clusters, as well as access to detailed status data for monitoring cluster health.
See the project page on PRAGMA GitHub for more information.
* Hierarchical MapReduce
Role: Project Creator, Principal Designer and Developer
MapReduce is a programming model for processing huge datasets with embarrassingly parallel applications on large numbers of compute resources. Typical MapReduce frameworks, however, are limited to scheduling jobs within a single cluster; a single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extended the MapReduce framework into a hierarchical framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the "Map-Reduce-Global Reduce" model, in which computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the dataset and maps the pieces onto multiple "local" MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the Global Reduce function (a minimal sketch follows the project link below).
See Project Page Hierarchical MapReduce for more information.
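To make the three-function model concrete, here is a minimal, single-process sketch of "Map-Reduce-Global Reduce" using word count. The function names and the in-memory simulation are illustrative assumptions; the real framework dispatches each local MapReduce job to a separate cluster.

```python
from collections import defaultdict

def map_fn(doc):
    # Map: emit (word, 1) for every word in a document.
    return [(word, 1) for word in doc.split()]

def reduce_fn(word, counts):
    # Reduce: sum partial counts within one "local" cluster.
    return word, sum(counts)

def global_reduce_fn(word, partial_totals):
    # Global Reduce: merge per-cluster totals at the global controller.
    return word, sum(partial_totals)

def local_mapreduce(split):
    # One local MapReduce job over a data split.
    groups = defaultdict(list)
    for doc in split:
        for word, one in map_fn(doc):
            groups[word].append(one)
    return dict(reduce_fn(w, c) for w, c in groups.items())

def hierarchical_mapreduce(dataset, num_clusters=2):
    # Global controller: partition the data, run the local jobs
    # (sequentially here; on separate clusters in the real framework),
    # then apply Global Reduce to the collected local results.
    splits = [dataset[i::num_clusters] for i in range(num_clusters)]
    merged = defaultdict(list)
    for split in splits:
        for word, total in local_mapreduce(split).items():
            merged[word].append(total)
    return dict(global_reduce_fn(w, t) for w, t in merged.items())

print(hierarchical_mapreduce(["a b a", "b c", "a c c"]))
# -> {'a': 3, 'b': 2, 'c': 3}
```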
* Karma Provenance Collection Tool
Role: Messaging System Designer, Core Developer
Provenance (or lineage, trace) of digital scientific data is a critical component of broadening the sharing and reuse of scientific data. Provenance captures the information needed to attribute ownership and to determine, among other things, the quality of a particular dataset. Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but it is better served as a standalone tool. Karma is such a standalone tool: it can be added to existing cyberinfrastructure to collect and represent provenance data. Karma utilizes a modular architecture that supports multiple instrumentation plugins, making it usable in different architectural settings (a minimal sketch of the plugin idea follows the link below).
See Karma Provenance Collection Tool for more information.
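The plugin-based design can be illustrated with a short sketch. All class and field names below are hypothetical, invented for illustration; Karma's actual API differs (it is a Java tool with messaging-based instrumentation).

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone

class InstrumentationPlugin(ABC):
    """Adapts events from one host system into generic provenance records."""
    @abstractmethod
    def to_record(self, event: dict) -> dict: ...

class WorkflowPlugin(InstrumentationPlugin):
    def to_record(self, event):
        # Map a workflow-engine event to a generic provenance record.
        return {"entity": event["output"],
                "activity": event["task"],
                "agent": event["user"],
                "time": datetime.now(timezone.utc).isoformat()}

class ProvenanceCollector:
    # The collector core stays independent of any one cyberinfrastructure;
    # a new system is supported by registering a plugin, not by changing core code.
    def __init__(self):
        self.plugins, self.records = {}, []

    def register(self, source: str, plugin: InstrumentationPlugin):
        self.plugins[source] = plugin

    def ingest(self, source: str, event: dict):
        self.records.append(self.plugins[source].to_record(event))

collector = ProvenanceCollector()
collector.register("workflow", WorkflowPlugin())
collector.ingest("workflow", {"output": "forecast.nc", "task": "run-model", "user": "yluo"})
print(collector.records)
```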
* InstantKarma
Role: Core Developer
The project will improve the collection, preservation, utility, and dissemination of provenance information within the NASA Earth Science community. It will customize and integrate Karma, a proven provenance tool, into NASA data production by collecting and disseminating provenance of Advanced Microwave Scanning Radiometer - Earth Observing (AMSR-E) standard data products, initially focusing on Sea Ice. The plan is to engage the Sea Ice science team and user community and to adhere to the Open Provenance Model (OPM).
See InstantKarma for more information.
* NetKarma: GENI Provenance Registry
Role: Core Developer
The GENI Provenance Registry (NetKarma) project, funded in October 2009, provides a tool for capturing the workflow of GENI slice creation, the topology of the slice, operational status, and other measurement statistics, and for correlating them with the experimental data. The tool, NetKarma, allows researchers to see the exact state of the network and to store the configuration of the experiment and its slice. The provenance of the data is stored and visualized through a data portal. Researchers can use the provenance data to analyze their results, suspend and resume an experiment, and obtain a single reference to the details and data collected in an experiment. NetKarma is based on the Karma provenance architecture, which has been used to collect provenance from scientific workflows in diverse domains such as meteorology and life science.
See NetKarma for more information.
* PRAGMA Cloud
Role: Technical Lead and Principal Developer at Indiana University
* Linked Environments for Atmospheric Discovery (LEAD)
Role: Experiment Builder Developer
Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point, supported by high-performance computing systems. With LEAD, meteorologists, researchers, educators, and students are no longer passive bystanders limited to static data or pre-generated images; they are active participants who can acquire and process their own data. LEAD software enhances the experimental process by automating many of the time-consuming and complicated tasks associated with meteorological science. The "workflow" tool links data management, assimilation, forecasting, and verification applications into a single experiment. The experiment's output also includes detailed descriptions of the products, called "metadata."
See LEAD Portal for more information.
* Opal Toolkit
Role: Job Manager (CSF4 Meta-scheduler, and Sigiri Job Manager) Developer
The Grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. In practice, however, Grid resources are not easy for end users, who have to learn how to generate security credentials, stage inputs and outputs, access Grid-based schedulers, and install complex client software. There is a pressing need to provide transparent access to these resources so that end users are shielded from the complicated details and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, exposing only a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing the application services that access Grid resources can be quite complicated, especially if the work has to be replicated for every application. Toward that end, we implemented Opal, a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based Grid security, and data management in an easy-to-use and configurable manner (a client-side sketch follows the link below).
See Opal Website for more information.
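As a sketch of what "only exposing a simple SOAP API" buys the end user, here is a hypothetical client call against an Opal-style service using the third-party zeep library. The endpoint URL, operation names, and argument and response shapes are assumptions for illustration; consult the WSDL of an actual deployed Opal service for the real interface.

```python
from zeep import Client

# Hypothetical endpoint of an application wrapped by Opal.
client = Client("http://example.org/opal2/services/MyAppService?wsdl")

# One call launches the job; scheduling, Grid security, and data
# staging are all hidden behind the service (argument name assumed).
resp = client.service.launchJob(argList="-in input.txt -out result.txt")
job_id = resp.jobID  # assumed response field

# Poll the back-end scheduler's status through the same simple API.
status = client.service.queryStatus(job_id)
print(status)
```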
* Community Scheduler Framework 4 (CSF4) Meta-scheduler & CSF4 Portlet (Since 2004)
Role: CSF4 Developer, CSF Portlet Designer/Developer
Community Scheduler Framework 4 (CSF4) is the first WSRF-compliant community meta-scheduler, released as an execution management service of the Globus Toolkit 4. Using CSF4, users can work with different local job schedulers, such as LSF, PBS, Condor, and SGE, which may belong to different administrative domains (the adapter idea is sketched below). The CSF4 Portlet, first carried out in 2006 through a collaboration between Jilin University and the University of California, San Diego (UCSD), is a Java-based web application for dispatching jobs to remote job schedulers through a web browser, without requiring an understanding of the underlying Grid services.
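To illustrate the adapter idea behind a meta-scheduler that fronts heterogeneous local schedulers, here is a minimal Python sketch. The class names and this rendering are hypothetical; CSF4 itself is a Java WSRF service, not this code, and real submission flags vary by site.

```python
from abc import ABC, abstractmethod
import subprocess

class SchedulerAdapter(ABC):
    @abstractmethod
    def submit_command(self, script: str) -> str: ...

class PBSAdapter(SchedulerAdapter):
    def submit_command(self, script):
        return f"qsub {script}"       # PBS/Torque submission

class SGEAdapter(SchedulerAdapter):
    def submit_command(self, script):
        return f"qsub -cwd {script}"  # SGE submission, run in current dir

class LSFAdapter(SchedulerAdapter):
    def submit_command(self, script):
        return f"bsub < {script}"     # LSF reads the job script on stdin

ADAPTERS = {"pbs": PBSAdapter(), "sge": SGEAdapter(), "lsf": LSFAdapter()}

def meta_submit(local_scheduler: str, script: str) -> None:
    # The meta-scheduler picks the right adapter for the target cluster,
    # so users submit one job description regardless of the local scheduler.
    cmd = ADAPTERS[local_scheduler].submit_command(script)
    subprocess.run(cmd, shell=True, check=True)

# Example: meta_submit("pbs", "job.sh") dispatches to a PBS cluster.
```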
* Avian Flu Grid (Since March 2007)
Role: PRAGMA Portal Developer, CSF4 Developer, CSF Portlet Designer/Developer
This project aims to use grid and high-performance computing infrastructure to develop a model for global collaboration in the fight against the pandemic threat of avian flu and other emerging infectious diseases. Through a global partnership forged over the PRAGMA grid development activities, we now aim to build a scalable, global, and open knowledge environment for developing novel inhibitors to avian flu.
The Avian Flu Grid is an integrative effort based on technology developed by several member institutes to support advanced scientific research on avian flu. The calculations based on these state-of-the-art computational approaches are managed by the CSF4 meta-scheduler, through either the PRAGMA Portal or Opal-based application-specific web services that leverage CSF4 for job distribution.
My work supports the scheduling across multiple clusters (via CSF4) to distribute jobs transparently to multiple sites around the region.
Presentations (excluding paper presentations)
Hosted Research Workshops
(Reviewed over 100 abstract submissions for workshop posters and demonstrations over the years.)
Invited Paper Reviews