Yuan Luo | Data Architect, Computer Scientist, Entrepreneur

		Yuan Luo, Ph.D. [骆远博士]
		Ph.D. in Computer Science
		Data Engineering at Facebook
		Former Data Architect at Edmodo Inc.
		yuan [at] yuanluo [dot] net
		http://www.yuanluo.net


Biography

Yuan Luo is currently driving data engineering efforts at Facebook to develop early-stage products in emerging markets.

Prior to Facebook, Yuan Luo worked as a Data Architect at Edmodo, where he initiated and led the "data-to-action" effort: a system that automatically consumes massive amounts of data and triggers user-facing actions, which contributed significantly to user acquisition and engagement. Luo redesigned the entire data platform, built a new data-processing pipeline, created Edmodo's social recommendation system, and was responsible for building highly scalable email and push notification platforms.

Academia:
He is best known for his Hierarchical MapReduce research, which has inspired at least 7 US patents. He received his PhD in Computer Science from Indiana University in 2015. He was a recipient of the K. Jon Barwise Fellowship in the School of Informatics and Computing at Indiana University. His research committee members were Prof. Beth Plale, Prof. Geoffrey Fox, Prof. Judy Qiu and Prof. Yuqing Wu from IU, and Dr. Philip Papadopoulos from UCSD. He was a researcher at the Data to Insight Center, where he was tech lead of multiple NSF- and NASA-funded projects. He was a research intern at the IBM T. J. Watson Research Center in 2012 and in the Center for Research in Biological Systems (CRBS) at UCSD in 2009. He was a member of the Extreme Computing Lab at IUCS under Dr. Dennis Gannon. Yuan Luo received his BS and MS degrees in computer science from Jilin University in 2005 and 2008, respectively. He was a visiting scholar at the University of California, San Diego. He is an affiliated researcher of the Pacific Rim Application and Grid Middleware Assembly (PRAGMA), and co-founder and co-chair emeritus of PRAGMA Students. He served as poster session chair of the 23rd, 24th and 25th PRAGMA Workshops, program committee member of PRAGMA 24 and PRAGMA 25, and demonstration session chair of PRAGMA 27. He was an instructor at the National Biomedical Computation Resource (NBCR) Summer Institute in 2006 and 2009. Luo is currently on the technical program committee of the IEEE CloudNet conference.

Entrepreneurship:
Luo lost 40 lbs of fat over a four-month period. He shares his fitness experience with hundreds of thousands of followers on weibo.com. His body transformation story and diet recipes have received hundreds of millions of cumulative views online and have inspired millions of people to keep fit and stay healthy. In 2013 he co-founded a stealth startup that later became the well-known FitTime (即刻运动), where he helped grow the user base, architected the web platform, launched mobile apps, and defined and implemented product and marketing strategies. He then left the startup to finish his PhD.

Industry Experience

At Edmodo, I was honored to work directly with an excellent group of people (engineering, product, data, marketing, support), and I enjoyed building platforms, developing products, hacking growth, and mentoring engineers and analysts.
1) Product, User Acquisition and Engagement:

Initiated and led the "data to action" project and built a system (Datacow) that consumes massive amounts of data and triggers actions such as emails. The system has been widely used at Edmodo, from user onboarding and content digests to user reactivation and marketing. Datacow reached 20+ million users within a few months of launch and contributed significantly to the growth of (weekly) active users. Datacow is becoming the central engine for triggering all non-transactional emails and push notifications.
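The "data to action" idea can be illustrated with a minimal rule engine: each rule pairs a condition over a user record with an action to trigger. This is a hypothetical sketch, not Datacow's actual design; the `Rule` class, the `run_rules` function, the user fields (`days_since_signup`, `days_inactive`) and the action names are all illustrative assumptions, and real delivery of emails or push notifications is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical user record; the field names are illustrative only.
User = Dict[str, int]

@dataclass
class Rule:
    name: str                          # e.g. "reactivation"
    condition: Callable[[User], bool]  # predicate evaluated on each user record
    action: str                        # action to trigger, e.g. "email:welcome"

def run_rules(users: List[User], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Return (user_id, action) for every (user, rule) pair whose condition holds.

    Delivery itself (email, push) is left to a downstream sender.
    """
    triggered = []
    for user in users:
        for rule in rules:
            if rule.condition(user):
                triggered.append((user["id"], rule.action))
    return triggered

rules = [
    Rule("onboarding", lambda u: u["days_since_signup"] <= 1, "email:welcome"),
    Rule("reactivation", lambda u: u["days_inactive"] >= 30, "push:come_back"),
]
users = [
    {"id": 1, "days_since_signup": 0, "days_inactive": 0},
    {"id": 2, "days_since_signup": 400, "days_inactive": 45},
    {"id": 3, "days_since_signup": 100, "days_inactive": 3},
]
print(run_rules(users, rules))  # [(1, 'email:welcome'), (2, 'push:come_back')]
```

In a production system the triggered pairs would typically be deduplicated and rate-limited before being handed to the email or push delivery platform.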

Initiated and led the development of Edmodo's social recommendation system (PYMK) to increase user acquisition and engagement: a) user acquisition: recommending that users invite people from their contact lists, and b) user engagement: recommending that users connect with peers on Edmodo. Tests show that PYMK achieved the highest user acquisition rate among all our viral invite loops.
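One common way to generate "connect with peers" suggestions is to rank people by the number of mutual connections they share with a user (friends of friends). The sketch below assumes that technique; the text above does not say which algorithm PYMK actually used, and the `recommend` function and the toy graph are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

def recommend(graph: Dict[str, Set[str]], user: str, k: int = 3) -> List[str]:
    """Rank users not yet connected to `user` by their number of mutual connections."""
    friends = graph.get(user, set())
    counts: Counter = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themself and people they are already connected to.
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    # Sort by descending mutual count, then by name for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]

# Toy, undirected connection graph (hypothetical users).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
print(recommend(graph, "ann"))  # ['dan', 'eve']
```

Here "dan" ranks first because he shares two connections (bob and cat) with "ann", while "eve" shares only one.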

2) Data Platform:

Completely redesigned the entire data platform, including architecting and leading the development of the data pipeline workflow management system, the ETL process, and hybrid data stores. The new data platform (Redmodo) eases and expedites the consumption of data on 70 million users for internal use (e.g., Datacow) and for various products (e.g., PYMK), while significantly increasing reliability and reducing maintenance cost. Before Redmodo, data engineers performed ad hoc fixes literally every day; now most maintenance tasks can be performed by data analysts and DevOps.
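A workflow management system for a data pipeline commonly models tasks as a directed acyclic graph (DAG) and runs each task only after its upstream tasks have finished. The following is a minimal sketch under that assumption, using Python's standard `graphlib` module (Python 3.9+); it does not reflect Redmodo's actual implementation, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 deps: Dict[str, Set[str]]) -> List[str]:
    """Run every task after all of its upstream tasks; return the run order.

    `deps` maps a task name to the set of task names it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    # Include tasks with no dependencies so they are scheduled too.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()  # a real system would add retries, logging and alerts here
    return order

# Hypothetical ETL tasks that just record that they ran.
log: List[str] = []

def make_task(name: str) -> Callable[[], None]:
    return lambda: log.append(name)

task_names = ["extract_users", "extract_events", "transform", "load", "pymk_features"]
tasks = {name: make_task(name) for name in task_names}
deps = {
    "transform": {"extract_users", "extract_events"},
    "load": {"transform"},
    "pymk_features": {"load"},
}
order = run_pipeline(tasks, deps)
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` before any task runs if the dependencies contain a cycle, which is a useful early check when pipelines are edited by hand.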

3) Email, Push Notification and SMS Platform:

Enhanced the platform to handle recipient and content preparation and delivery, with a personalized user experience, for millions of users on a regular basis (tens of millions of deliveries per day).

Teaching

Fall 2014:
CSCI B669: Management, Access, and Use of Big and Complex Data
Office Hour: Mon 3:00-5:30 pm EDT, and Thur 6:00-7:30 pm EDT on Google Hangout

Spring 2010:
CSCI B534: Distributed Systems (Meets with CSCI B490: Seminar in Computer Science)
Tue and Thur 5:30pm-6:45pm, Informatics East Room 130
Office Hour: Thursday 2:30pm-3:50pm at LH301H

Fall 2009:
CSCI A110: Introduction to Computers and Computing, Undergraduate Course
CSCI B503: Algorithms Design and Analysis, Graduate Course
Office Hour: By appointment.

Research Projects

* Virtual Cluster Controller

Role: Principal Designer and Developer

The Virtual Cluster Controller (a.k.a. Personal Cloud Controller) is part of the NSF PRAGMA project (SAVI: PRAGMA--ENABLING SCIENTIFIC EXPEDITIONS AND INFRASTRUCTURE EXPERIMENTATION FOR PACIFIC RIM INSTITUTIONS AND RESEARCHERS, OCI 1234983). The SAVI PRAGMA project was launched to enable small-to-medium-size international groups to conduct collaborative research and education. It does so by creating multidisciplinary, multi-institutional scientific expeditions that pair domain scientists with computer scientists who together define, develop and deploy international-scale, experimental cyberinfrastructure. Specifically, by engaging PRAGMA's more than 30 members and affiliates around the Pacific Rim, the SAVI funding will launch or extend three scientific expeditions in biodiversity, lake eutrophication and infectious diseases. The Personal Cloud Controller is being developed to enable this international-scale, experimental cyberinfrastructure. It gives users a high degree of control over their virtual clusters, as well as access to detailed status data for monitoring the health of those clusters.

See the project page on PRAGMA GitHub for more information.

* Hierarchical MapReduce

Role: Project Creator, Principal Designer and Developer

MapReduce is a programming model for processing huge datasets in embarrassingly parallel applications using a large number of compute resources. However, typical MapReduce frameworks can only schedule jobs to run within a single cluster, a single cluster is not easy to scale, and the input dataset may be widely distributed across multiple clusters. We extend the MapReduce framework into a hierarchical framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the "Map-Reduce-Global Reduce" model, in which computations are expressed as three functions: Map, Reduce, and Global Reduce. The global controller in our framework splits the dataset and maps the partitions onto multiple "local" MapReduce clusters, which run the Map and Reduce functions; the local results are then returned to the global controller, which runs the Global Reduce function.
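The "Map-Reduce-Global Reduce" flow can be simulated in a single process, using word count as the example application. In this sketch each "local cluster" is just a function call, and the global controller partitions the input round-robin; the scheduling algorithms of the actual framework (which account for cluster capacity and data location) are not reproduced, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_fn(line: str) -> Iterable[Tuple[str, int]]:
    # Map: emit (word, 1) for each word in a line of input.
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key: str, values: List[int]) -> int:
    # Local Reduce: sum the counts for one word within one cluster.
    return sum(values)

def global_reduce_fn(key: str, values: List[int]) -> int:
    # Global Reduce: combine the per-cluster partial counts.
    return sum(values)

def local_mapreduce(records: List[str]) -> Dict[str, int]:
    """Simulate one local MapReduce cluster: map, group by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, vals) for key, vals in groups.items()}

def global_controller(dataset: List[str], num_clusters: int) -> Dict[str, int]:
    """Split the dataset across clusters, run the local jobs, then Global Reduce."""
    # Round-robin partitioning stands in for a real, data-aware scheduler.
    partitions = [dataset[i::num_clusters] for i in range(num_clusters)]
    local_results = [local_mapreduce(part) for part in partitions]
    merged: Dict[str, List[int]] = defaultdict(list)
    for result in local_results:
        for key, value in result.items():
            merged[key].append(value)
    return {key: global_reduce_fn(key, vals) for key, vals in merged.items()}

data = ["the quick fox", "the lazy dog", "The fox", "a dog"]
print(global_controller(data, num_clusters=2))
```

For word count, Global Reduce happens to be the same summation as the local Reduce; in other applications the two functions can differ, for example when local results must be merged or re-ranked globally.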

See the Hierarchical MapReduce project page for more information.

* Karma Provenance Collection Tool

Role: Messaging System Designer, Core Developer

Provenance (also called lineage or trace) of digital scientific data is critical to broadening the sharing and reuse of scientific data. Provenance captures the information needed to attribute ownership and determine, among other things, the quality of a particular dataset. Provenance collection is often a tightly coupled part of a cyberinfrastructure system, but is better served by a standalone tool. Karma is a standalone tool that can be added to existing cyberinfrastructure to collect and represent provenance data. Karma uses a modular architecture that supports multiple instrumentation plugins, making it usable in different architectural settings.
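The plugin idea can be sketched as a registry of adapters, each translating events from one instrumented system into a common provenance record that a standalone collector stores and queries. This is a hypothetical illustration, not Karma's API: `ProvenanceRecord`, `plugin`, `Collector`, and the event fields are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class ProvenanceRecord:
    """Common representation: an activity read some inputs and wrote some outputs."""
    activity: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]

# Registry of instrumentation plugins, keyed by the system each one adapts.
PLUGINS: Dict[str, Callable[[dict], ProvenanceRecord]] = {}

def plugin(source: str):
    """Decorator that registers a plugin translating native events from `source`."""
    def register(fn: Callable[[dict], ProvenanceRecord]):
        PLUGINS[source] = fn
        return fn
    return register

@plugin("workflow_log")
def from_workflow_log(event: dict) -> ProvenanceRecord:
    return ProvenanceRecord(event["step"], tuple(event["read"]), tuple(event["wrote"]))

@plugin("shell")
def from_shell(event: dict) -> ProvenanceRecord:
    return ProvenanceRecord(event["cmd"], tuple(event["args"]), (event["stdout_file"],))

class Collector:
    """Standalone store that accepts events from any registered plugin."""

    def __init__(self) -> None:
        self.records: List[ProvenanceRecord] = []

    def ingest(self, source: str, event: dict) -> None:
        self.records.append(PLUGINS[source](event))

    def lineage(self, artifact: str) -> List[str]:
        """Activities that directly or transitively contributed to `artifact`."""
        seen, result, frontier = set(), [], [artifact]
        while frontier:
            target = frontier.pop()
            if target in seen:
                continue
            seen.add(target)
            for rec in self.records:
                if target in rec.outputs:
                    if rec.activity not in result:
                        result.append(rec.activity)
                    frontier.extend(rec.inputs)
        return result

c = Collector()
c.ingest("shell", {"cmd": "fetch", "args": ["remote_source"], "stdout_file": "raw.csv"})
c.ingest("workflow_log", {"step": "clean", "read": ["raw.csv"], "wrote": ["clean.csv"]})
c.ingest("workflow_log", {"step": "plot", "read": ["clean.csv"], "wrote": ["fig.png"]})
print(c.lineage("fig.png"))  # ['plot', 'clean', 'fetch']
```

Supporting a new environment in this design means registering one more plugin; the collector and its queries stay unchanged.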

See Karma Provenance Collection Tool for more information.

* NASA-InstantKarma

Role: Core Developer

The project will improve the collection, preservation, utility and dissemination of provenance information within the NASA Earth Science community. It will customize and integrate Karma, a proven provenance tool, into NASA data production by collecting and disseminating the provenance of Advanced Microwave Scanning Radiometer - Earth Observing System (AMSR-E) standard data products, initially focusing on Sea Ice. The plan is to engage the Sea Ice science team and user community and to adhere to the Open Provenance Model (OPM).

See InstantKarma for more information.

* GENI-NetKarma

Role: Core Developer

The GENI Provenance Registry (NetKarma) project, funded in October 2009, provides a tool for capturing the workflow of GENI slice creation, the topology of the slice, its operational status and other measurement statistics, and for correlating them with the experimental data. The tool, NetKarma, allows researchers to see the exact state of the network and to store the configuration of the experiment and its slice. The provenance of the data will be stored and visualized through a data portal. Researchers can use the provenance data to analyze their data, to suspend and resume an experiment, and as a single reference for finding the details and data collected in an experiment. NetKarma is based on the Karma provenance architecture, which has been used to collect provenance from scientific workflows in diverse domains such as meteorology and life science.

See NetKarma for more information.

* PRAGMA Cloud

Role: Technical Lead and Principal Developer at Indiana University

IU servers that are part of PRAGMA Cloud (IU@PRAGMA).

* Linked Environments for Atmospheric Discovery (LEAD)

Role: Experiment Builder Developer

Linked Environments for Atmospheric Discovery (LEAD) makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point, supported by high-performance computing systems. With LEAD, meteorologists, researchers, educators, and students are no longer passive bystanders or limited to static data or pre-generated images; rather, they are active participants who can acquire and process their own data. LEAD software enhances the experimental process by automating many of the time-consuming and complicated tasks associated with meteorological science. The "workflow" tool links data management, assimilation, forecasting, and verification applications into a single experiment. The experiment's output also includes detailed descriptions of the product, known as "metadata."

See LEAD Portal for more information.

* Opal Toolkit

Role: Job Manager (CSF4 Meta-scheduler, and Sigiri Job Manager) Developer

Grid-based infrastructure enables large-scale scientific applications to be run on distributed resources and coupled in innovative ways. In practice, however, Grid resources are not easy to use for end users, who have to learn how to generate security credentials, stage inputs and outputs, access Grid-based schedulers, and install complex client software. There is a pressing need to provide transparent access to these resources so that end users are shielded from the complicated details and free to concentrate on their domain science. Scientific applications wrapped as Web services alleviate some of these problems by hiding the complexities of the back-end security and computational infrastructure, exposing only a simple SOAP API that can be accessed programmatically by application-specific user interfaces. However, writing application services that access Grid resources can be quite complicated, especially if the work has to be repeated for every application. To that end, we have implemented Opal, a toolkit for wrapping scientific applications as Web services in a matter of hours, providing features such as scheduling, standards-based Grid security, and data management in an easy-to-use and configurable manner.

See Opal Website for more information.

* Community Scheduler Framework 4 (CSF4) Meta-scheduler & CSF4 Portlet (Since 2004)

Role: CSF4 Developer, CSF Portlet Designer/Developer

Community Scheduler Framework 4 (CSF4) is the first WSRF-compliant community meta-scheduler and was released as an execution management service of Globus Toolkit 4. Using CSF4, users can work with different local job schedulers, such as LSF, PBS, Condor and SGE, which may belong to different domains. CSF4 Portlet, first developed in 2006 through a collaboration between Jilin University and the University of California, San Diego (UCSD), is a Java-based web application for dispatching jobs to remote job schedulers through a web browser, without requiring an understanding of the underlying Grid services.
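A meta-scheduler of this kind hides the differences between local schedulers behind adapters. The sketch below is not CSF4's code: it builds (but does not run) native submit commands for LSF, PBS, SGE and HTCondor, and places each job on the least-loaded cluster. The load-based placement policy, the default queue name, the cluster names and the job fields are assumptions for illustration, and the command-line flags should be checked against each site's scheduler documentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Job:
    name: str
    script: str             # path to a batch script on the target site
    queue: str = "normal"   # hypothetical default queue name

# Each adapter translates the generic Job into one local scheduler's submit command.
# Commands are only built here, not executed.
def lsf(job: Job) -> List[str]:
    return ["bsub", "-J", job.name, "-q", job.queue, job.script]

def pbs(job: Job) -> List[str]:
    return ["qsub", "-N", job.name, "-q", job.queue, job.script]

def sge(job: Job) -> List[str]:
    return ["qsub", "-N", job.name, "-q", job.queue, job.script]

def condor(job: Job) -> List[str]:
    # HTCondor reads a submit description file instead of per-job flags;
    # this sketch assumes one named after the job already exists.
    return ["condor_submit", f"{job.name}.sub"]

ADAPTERS: Dict[str, Callable[[Job], List[str]]] = {
    "lsf": lsf, "pbs": pbs, "sge": sge, "condor": condor,
}

@dataclass
class Cluster:
    name: str
    scheduler: str     # key into ADAPTERS
    queued_jobs: int   # current load as reported by the site

def dispatch(job: Job, clusters: List[Cluster]) -> Tuple[str, List[str]]:
    """Pick the least-loaded cluster and build its native submit command."""
    target = min(clusters, key=lambda c: c.queued_jobs)
    target.queued_jobs += 1  # account for the job just placed
    return target.name, ADAPTERS[target.scheduler](job)

clusters = [
    Cluster("site-a", "pbs", 12),
    Cluster("site-b", "lsf", 3),
    Cluster("site-c", "condor", 7),
]
first = dispatch(Job("dock1", "dock.sh"), clusters)
print(first)  # ('site-b', ['bsub', '-J', 'dock1', '-q', 'normal', 'dock.sh'])
```

Keeping the placement policy separate from the adapters means a different policy (for example, one based on data location) can be swapped in without touching the per-scheduler code.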

The source code is available from SourceForge and the JLU Grid Team.

* Avian Flu Grid (Since March 2007)

Role: PRAGMA Portal Developer, CSF4 Developer, CSF Portlet Designer/Developer

This project aims to use grid and high-performance computing infrastructure to develop a model for global collaboration in the fight against the pandemic threat of avian flu and other emerging infectious diseases. Through a global partnership forged through PRAGMA grid development activities, we now aim to build a scalable, global, and open knowledge environment for developing novel inhibitors of avian flu.

The Avian Flu Grid is an integrative effort based on technology developed by several member institutes to support advanced scientific research on avian flu. Calculations based on these state-of-the-art computational approaches are managed by the CSF4 meta-scheduler, through either the PRAGMA Portal or Opal-based application-specific web services that leverage CSF4 for job distribution.

My work was to support scheduling across multiple clusters (CSF4) to distribute jobs transparently to multiple sites around the region.

Publications

Ph.D. Dissertation

Yuan Luo, Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data, [Bloomington, Ind.] : Indiana University, 2015-08 [Link]

Journals

Hongliang Li, Xiaohui Wei, Qingwu Fu, Yuan Luo. (2013) MapReduce Delay Scheduling with Deadline Constraint, Concurrency and Computation: Practice and Experience, [DOI] (Impact Factor: 0.942)

Scott Jensen, Beth Plale, Mehmet Aktas, Yuan Luo, Peng Chen, and Helen Conover. Provenance Capture and Use in a Satellite Data Processing Pipeline, IEEE Transactions on Geoscience and Remote Sensing, 51(11):5090-5097 (2013) [DOI] (Impact Factor: 3.36)

Yuan Luo, Beth Plale, Zhenhua Guo, Wilfred Li, Judy Qiu, Yiming Sun. (2012) Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Concurrency and Computation: Practice and Experience [DOI] (Impact Factor: 0.942)

Ding, Z.; Wei, X.; Luo, Y.; Ma, D.; Li, W. W.; Arzberger, P. W., Customized Plug-in Modules in Metascheduler CSF4 for Life Sciences Applications, New Generation Computing, Vol.25 No.4 2007. [pdf][DOI] (Impact Factor: 0.533)

Ding, Z.; Wei, X.; Luo, Y.; et al. A Virtual Job Model to Support Cross-Domain Synchronized Resource Allocation, Journal of Jilin University (Science Edition), Vol. 46 No.2, Mar 26, 2008. (In Chinese with English Abstract). [pdf]

Conferences/Workshops

Peng Chen, Beth Plale, You-Wei Cheah, Devarshi Ghoshal, Scott Jensen, and Yuan Luo. Visualization of Network Data Provenance, Workshop on Massive Data Analytics on Scalable Systems, co-located with High Performance Computing Conference (HiPC), Pune, India, December 18th - 21st, 2012. [DOI]

Plale, B., Withana, E. C., Herath, C., Chandrasekar, K., Luo, Y. Effectiveness of Hybrid Workflow Systems for Computational Science, International Conference on Computational Science (ICCS), Omaha, Nebraska, Jun 4-6, 2012. [DOI]

Luo, Y. and Plale, B. Hierarchical MapReduce Programming Model and Scheduling Algorithms, In Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012), Ottawa, Canada, May 13-16, 2012. [DOI][pdf]

Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu, Wilfred Li. 2011. A Hierarchical Framework for Cross-Domain MapReduce Execution. In Proceedings of the second international workshop on Emerging computational methods for the life sciences (ECMLS '11). ACM, New York, NY, USA, 15-22. DOI=10.1145/1996023.1996026 [pdf][DOI][ECMLS2011 Workshop talk in HPDC]

Xiaohui Wei, Yuan Luo, Jishan Gao, et al. The Session Based Fault Tolerance Algorithm of Platform EGO Web Service Gateway, Proceedings of International Symposium on Grid Computing (ISGC2007), Academia Sinica, Taipei, Taiwan, March 26-29, 2007.[pdf][DOI]

Ding, Z.; Luo, Y.; Wei, X.; Misleh, C.; Li, W. W.; Arzberger, P. W.; Tatebe, O. My WorkSphere: Integrative Work Environment for Grid-unaware Biomedical Researchers and Applications, Proceedings of 2nd Grid Computing Environment Workshop, Supercomputing Conference 2006 (SC06), Tampa, Florida, 2006. [pdf][RIT Digital Media Library]

Posters

A Personal Cloud Controller, PRAGMA 26 Workshop, Tainan, Taiwan, April 9-11, 2014

Network Transfer over Pacific Rim on PRAGMA Cloud: Performance and Tuning, PRAGMA 25 Workshop, Chinese Academy of Sciences, Beijing, China, Oct. 16-18, 2013

Vortex2 Metadata Management on PRAGMA Cloud: A GeoPortal Experience, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012.

A Hierarchical MapReduce Framework, PRAGMA 22 Workshop, Monash University, Melbourne, Australia, April 17-19, 2012.

Improving Twister Messaging System Using Apache Avro, The 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Indianapolis, USA, Nov 30 - Dec 3, 2010. [link] [CloudCom Abstract]

Karma: Provenance Aggregation Across Layers of GENI Experimental Networks, PRAGMA 19 Workshop, Changchun, China, Sept 13-15, 2010. [link]

GDIA: A Scalable Grid Infrastructure for Data Intensive Applications, National Biomedical Computation Resource Summer Institute 06, San Diego, Aug. 2006.[link]

My WorkSphere: Integrated and Transparent Access to Gfarm Computational Data Grid through GridSphere Portal with Metascheduler CSF4, 3rd International Life Sciences Grid Workshop, Yokohama, Japan, 2006. [pdf]

Presentations (excluding paper presentations)

Cross-Institute Virtual Cluster Management in PRAGMA, PRAGMA 27 Workshop, Bloomington, Indiana, USA, Oct 15-17, 2014

User-level controllability of virtual clusters using HTCondor, PRAGMA 26 Workshop, Tainan, Taiwan, April 9-11, 2014

Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Distinguished Lecture Series of Computer Science and Technology, Jilin University, China, Oct 22, 2013.

Introduction to MapReduce and Hierarchical MapReduce, Guest Lecture in Scientific Data Management and Preservation Class (CSCI-B669), Indiana University, April 10, 2013

A Hierarchical MapReduce Framework, Invited talk at IBM Student Workshop for Frontiers of Cloud Computing 2012, IBM's Thomas J. Watson Research Center in Hawthorne, New York, July 30-31, 2012

Hierarchical MapReduce: Towards Simplified Cross-Domain Data Processing, Invited talk at Cloud Computing Lecture, Indiana University, Oct 12, 2011.

Opal-Sigiri: Software as a Service on PRAGMA Testbed, PRAGMA 20 Workshop, Hong Kong, China, March 2-4, 2011. [Slides]

Metascheduling using the Community Scheduler Framework (CSF4), NBCR Summer Institute 2009, UCSD, Aug 3-7th 2009. [Detail]

Software as a Service (SaaS) for Drug Discovery Workflows, with Wilfred W. Li, Sriram Krishnan, Jane Ren, Luca Clement, Kevin Dong, at UCSD, June 10th 2009.

My WorkSphere: Integrated and Transparent Access to Gfarm Computational Data Grid through GridSphere Portal with Meta-scheduler CSF4, NBCR Special Seminar, UCSD, Aug 28th 2006

Cluster and Grid Computing: Transparent Access and workflow management, NBCR Summer Institute 2006, UCSD, Aug 7-11th 2006

Services

Membership

Co-founder and co-chair emeritus of PRAGMA Students, Subcommittee Member of PRAGMA Steering Committee, Affiliated Researcher, The Pacific Rim Application and Grid Middleware Assembly, 2005 - 2015

Technical Program Committee Member, IEEE International Conference on Cloud Networking (IEEE CloudNet), 2014, 2015, 2016, 2017

Hosted Research Workshops

Workshop Co-chair, The 1st - 4th PRAGMA Students Workshop, 10/2012, 03/2013, 10/2013 and 04/2014

Session Chair and Program Committee Member, The 23rd - 27th PRAGMA Workshops at Seoul, Korea; Bangkok, Thailand; Beijing, China; Tainan, Taiwan; Bloomington, Indiana, USA, 10/2012, 03/2013, 10/2013, 04/2014 and 10/2014
(Reviewed over 100 abstract submissions for workshop posters and demonstrations over the years.)

Invited Paper Reviews

The IEEE International Conference on Cloud Networking (IEEE CloudNet), 2014, 2015 and 2016 (10 paper reviews)

International Journal of Computing Science and Mathematics, 2016 (1 paper review)

International Journal of Big Data Intelligence, 2016 (1 paper review)

Frontiers of Information Technology & Electronic Engineering, 2016 (1 paper review)

The 10th IEEE International Conference on e-Science, 2014 (1 paper review)

Journal of Parallel and Distributed Computing, 2014 (1 paper review)

Computing, 2013 (2 paper reviews)

IEEE Systems Journal, 2013 (1 paper review)

Scalable Computing, 2013 (1 paper review)

The 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2012 (1 paper review)

Concurrency and Computation: Practice and Experience, 2010-2012 (2 paper reviews)