October 3rd Talks
1) "More Data, More Science and … Moore’s Law?" - Katherine Yelick, Professor of Electrical Engineering and Computer Sciences, UC Berkeley Associate Lab Director for Computing Sciences, Lawrence Berkeley National Lab
In the same way that the Internet has combined with web content and search engines to revolutionize every aspect of our lives, the scientific process is poised to undergo a radical transformation based on the ability to access, analyze, and merge complex data sets. Scientists will be able to combine their own data with that of other scientists, validating models, interpreting experiments, re-using and re-analyzing data, and making use of sophisticated mathematical analyses and simulations to drive the discovery of relationships across data sets. This “scientific web” will yield higher-quality science, more insights per experiment, higher impact from major investments in scientific instruments, and an increased democratization of science, allowing people from a wide variety of backgrounds to participate in the scientific process.
What does this “big science data” view of the world have to do with computer science? Due to exponential growth rates in detectors, sequencers, and other observational technology, data sets across many science disciplines are outstripping the storage, computing, and algorithmic techniques available to individual scientists. Scientists have always demanded some of the fastest computers for simulations, and while that demand has not abated, the need to analyze large experimental and observational data sets has become a new driver of computer performance.
In this talk I will describe some examples of how science disciplines such as biology, materials science, and cosmology are changing in the face of their own data explosions, and how this leads to a set of open questions for computer scientists due to the scale of the data sets, the data rates, inherent noise and complexity, and the need to “fuse” disparate data sets.
BIO: Katherine Yelick is a Professor of Electrical Engineering and Computer Sciences at the University of California at Berkeley and the Associate Laboratory Director for Computing Sciences at Lawrence Berkeley National Laboratory. She is known for her research in parallel languages, compilers, algorithms, libraries, architecture, and runtime systems. She earned her Ph.D. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology and has been on the faculty at UC Berkeley since 1991 with a joint research appointment at Berkeley Lab since 1996. She was the director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and in her current role as Associate Laboratory Director she manages a 300-person organization that includes NERSC, the Energy Science Network (ESNet), and the Computational Research Division. She is an ACM Fellow and recipient of the ACM/IEEE Ken Kennedy Award and the ACM-W Athena award. She is a member of the National Academies Computer Science and Telecommunications Board (CSTB) and the Computing Community Consortium (CCC), and she previously served on the California Council on Science and Technology.
2) "Non-convex Penalties and Applications in Data Science" - Jack Xin, Professor of Mathematics, University of California, Irvine
Finding low-dimensional solutions in high-dimensional spaces is one of the fundamental problems in data science. Convex penalty functions, most notably the L1 and nuclear norms, have played a major role in achieving sparse and low-rank solutions over the past decade. This talk gives an overview of recent developments in non-convex penalties and algorithms that improve on L1 methods in several applications, such as compressed sensing, matrix completion, imaging science, and machine learning.
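As an illustrative aside (a sketch not drawn from the abstract itself), the sparse recovery problem with the convex L1 penalty, and one non-convex alternative studied in this area, the difference of the L1 and L2 norms, can be written in LaTeX as follows; here A, b, and λ > 0 denote a hypothetical sensing matrix, measurement vector, and regularization weight chosen for the example:

\begin{align*}
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1
  &\qquad \text{(convex $\ell_1$ penalty)} \\
\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \bigl(\|x\|_1 - \|x\|_2\bigr)
  &\qquad \text{(non-convex $\ell_{1-2}$ penalty)}
\end{align*}

Penalties of the second kind can promote sparsity more aggressively than L1 in some regimes, at the cost of algorithms (for example, difference-of-convex schemes) that typically guarantee convergence only to stationary points.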
BIO: Jack Xin has been Professor of Mathematics at UC Irvine since 2005. He received his Ph.D. in applied mathematics from the Courant Institute, New York University, in 1990. He was a postdoctoral fellow at Berkeley and Princeton in 1991 and 1992. He was assistant and associate professor of mathematics at the University of Arizona from 1991 to 1999, and professor of mathematics at the University of Texas at Austin from 1999 to 2005. His research interests include applied analysis, computational methods and their applications in multi-scale problems, sparse optimization, and data science. He has authored over a hundred journal papers and two Springer books. He is a fellow of the Guggenheim Foundation and of the American Mathematical Society. He is Editor-in-Chief of the Society for Industrial and Applied Mathematics (SIAM) interdisciplinary journal Multiscale Modeling & Simulation (MMS).
3) "Genetic Programming as a tool for Data Science" - Wolfgang Banzhaf, Michigan State University, Department of Computer Science
Genetic Programming is a technique for generating algorithms from data by applying principles of natural evolution. It is part of a larger group of methods known as “Evolutionary Computation”. While most of these methods are known to work as optimizers, Genetic Programming is used to generalize from single instances to larger data sets and to extract models from them. This talk will introduce the method, set it in context, and discuss its broader goals and challenges.
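To make the method concrete, the following is a minimal, illustrative Python sketch of tree-based genetic programming for symbolic regression; it is not code from the talk, and the operator set, fitness function, and evolutionary loop are simplifying assumptions chosen for the example rather than any particular GP system.

# Minimal, illustrative sketch of tree-based genetic programming for symbolic
# regression (not code from the talk): candidate programs are expression trees,
# fitness is mean squared error on sample (x, y) data, and new programs arise
# from truncation selection plus subtree crossover.
import math
import random

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}
TERMINALS = ['x', 1.0, 2.0]

def random_tree(depth=3):
    """Grow a random expression tree as nested tuples (op, left, right)."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate an expression tree at the point x."""
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    """Mean squared error over (x, y) samples; lower is better."""
    total = 0.0
    for x, y in data:
        err = evaluate(tree, x) - y
        total += err * err
    mse = total / len(data)
    return mse if math.isfinite(mse) else float('inf')

def random_node(tree):
    """Pick a random subtree by walking down from the root."""
    while isinstance(tree, tuple) and random.random() > 0.3:
        tree = random.choice([tree[1], tree[2]])
    return tree

def crossover(recipient, donor):
    """Replace a random subtree of `recipient` with a random subtree of `donor`."""
    if not isinstance(recipient, tuple) or random.random() < 0.3:
        return random_node(donor)
    op, left, right = recipient
    if random.random() < 0.5:
        return (op, crossover(left, donor), right)
    return (op, left, crossover(right, donor))

def evolve(data, pop_size=200, generations=30):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: fitness(t, data))
        survivors = ranked[:pop_size // 2]          # truncation selection
        children = [crossover(random.choice(survivors), random.choice(survivors))
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    return min(population, key=lambda t: fitness(t, data))

if __name__ == '__main__':
    # Target function: y = x*x + 2*x, sampled on a small grid.
    samples = [(x / 4.0, (x / 4.0) ** 2 + 2 * (x / 4.0)) for x in range(-8, 9)]
    best = evolve(samples)
    print('best tree:', best)
    print('mse:', fitness(best, samples))

Production GP systems add tournament selection, explicit mutation, bloat control, and much richer primitive sets, but the loop above (evaluate, select, recombine, repeat) is the core evolutionary cycle.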
Bio: Wolfgang Banzhaf has recently joined the Computer Science and Engineering Department and is the John R. Koza Professor in Genetic Programming at Michigan State University. From 2003 to 2016 he was a professor in the Department of Computer Science at Memorial University of Newfoundland, serving as department chair for 10 years. He studied Physics at the Universities of Stuttgart, Munich, and Karlsruhe in Germany, and holds a PhD in Physics from the University of Karlsruhe. His research interests are in the field of bio-inspired computing, notably evolutionary computation, and in complex adaptive systems. He wrote the first textbook on Genetic Programming and served as the first editor-in-chief of the Springer journal “Genetic Programming and Evolvable Machines”.
4) "An Ecosystem for Heterogeneous Parallel Computing" - Wu Feng: Virginia Tech, Department of Computer Science
With processor core counts doubling every 24 months and penetrating all markets, from high-end servers in supercomputers to desktops and laptops and even mobile phones, we sit at the dawn of a world of ubiquitous parallelism, one where extracting performance via parallelism is paramount. That is, the "free lunch" of better performance, where programmers could rely on substantial increases in single-threaded performance to improve software, is long over. The burden falls on developers to exploit parallel hardware for performance gains. But how do we lower the cost of extracting such parallelism, particularly in the face of the increasing heterogeneity of processing cores? To address this issue, this talk will present a vision for an ecosystem for delivering accessible and personalized supercomputing to the masses: a heterogeneity of (hardware) processing cores on a die or in a package, coupled with enabling software that tunes the parameters of the processing cores with respect to performance, power, and portability. Initial results from different aspects of the ecosystem will be demonstrated via a benchmark suite of computational dwarfs and applications on platforms ranging from the embedded or mobile space to the datacenter, including Virginia Tech's GPU-accelerated supercomputer, HokieSpeed, which debuted on the Green500 list as the most energy-efficient (i.e., greenest) commodity supercomputer in the U.S., built at a fraction of the cost of the world's fastest supercomputer.
BIO: Wu Feng is the Elizabeth & James Turner Fellow and Professor of Computer Science at Virginia Tech (VT), where he directs the Synergy Lab and serves as the director of the Synergistic Environments for Experimental Computing (SEEC) Center and a VT site co-director of the National Science Foundation Center for High-Performance Reconfigurable Computing (CHREC). In addition, he holds appointments in the Department of Electrical & Computer Engineering, Health Sciences, and Virginia Bioinformatics Institute. He is known for his research in parallel and distributed computing, resulting in an h-index of 43 from 250+ publications and 7000+ citations, four Best Paper Awards, three R&D 100 Awards, international coverage in 1000+ media outlets (including The New York Times, CNN, and BBC News), and a recent feature in a worldwide Microsoft Cloud commercial on computing a cure for cancer. Dr. Feng received a B.S. in Electrical & Computer Engineering and Music (Honors) and an M.S. in Computer Engineering from Penn State University in 1988 and 1990, respectively. He earned a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1996. His previous professional stints include IBM T.J. Watson Research Center, NASA Ames Research Center, Vosaic, University of Illinois at Urbana-Champaign, Purdue University, The Ohio State University, Orion Multisystems, and Los Alamos National Laboratory.
October 4th Talks
1) "Large-Scale Data Analytics and Its Relationship to Simulation" - Rob W. Leland: Sandia National Labs, Vice President, Science and Technology, and Chief Technology Officer at Sandia National Laboratories.
One of the primary objectives of the President’s National Strategic Computing Initiative is “Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.” I will interpret this objective in the context of Large-Scale Data Analytics (LSDA) problems, i.e., problems that require finding meaningful patterns in data sets so large as to require leading-edge processing and storage capability. LSDA problems are increasingly important for government mission work, industrial application, and scientific discovery. Effective solution of some important LSDA problems requires a computational workload that is substantially different from that associated with traditional High Performance Computing (HPC) simulations intended to help understand physical phenomena or to conduct engineering. While traditional HPC application codes exploit structural regularity and data locality to improve performance, many analytics problems lead more naturally to very fine-grained communication between unpredictable sets of processors, resulting in less regular communication patterns that do not map efficiently onto typical HPC systems. In both the simulation and analytics domains, however, data movement increasingly dominates the performance, energy usage, and price of computing systems. It is therefore plausible that we could find a more synergistic technology path forward. Even though future machines may continue to be configured differently for the two domains, a more common technology roadmap, with a degree of convergence in the underlying componentry and design principles to address these shared technical challenges, could have substantial technical and economic benefits.
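As a toy illustration of the workload contrast described above (a hedged sketch, not drawn from the talk), the Python fragment below compares a regular stencil update, whose fixed-offset accesses are easy to partition and cache, with a sparse-graph gather, whose data-dependent accesses produce the kind of irregular access and communication patterns mentioned in the abstract; the function names and the small adjacency list are hypothetical.

# Illustrative contrast of regular vs. irregular access patterns (hypothetical example).

def stencil_update(values):
    """Regular access: each cell averages fixed left/right neighbors (periodic wrap-around)."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)]

def graph_gather(values, neighbors):
    """Irregular access: each vertex sums the values of data-dependent neighbors."""
    return [sum(values[j] for j in neighbors[i]) for i in range(len(values))]

if __name__ == '__main__':
    vals = [float(i) for i in range(8)]
    print(stencil_update(vals))
    # Adjacency lists of a small, irregular graph (hypothetical example data).
    adj = [[1, 5], [0, 2, 7], [1], [4, 6], [3], [0, 6], [3, 5], [1]]
    print(graph_gather(vals, adj))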
Bio: Dr. Leland is the executive responsible for leadership and management of corporate research and development and capabilities stewardship at Sandia National Laboratories. He is also responsible for leadership of technology transfer and strategic research relationships with universities, industry, and the State of New Mexico.
Dr. Leland joined the Parallel Computing Sciences Department at Sandia National Laboratories in 1990 and pursued work principally in parallel algorithm development, sparse iterative methods, and applied graph theory. There he coauthored Chaco, a graph partitioning and sequencing toolkit widely used to optimize parallel computations. In 1995, Dr. Leland served as a White House Fellow, advising the Deputy Secretary of the Treasury on technology modernization at the IRS. Upon returning to Sandia, he led the Parallel Computing Sciences Department and the Computer and Software Systems Research Group.
In 2005, Dr. Leland became Director of Computing and Networking Services at Sandia, with responsibility for production computing platforms, voice and data networks, desktop support, and cyber security for the laboratory. In March 2010, he became Director of Computing Research, leading a vertically integrated set of capabilities spanning computer architecture, math and computing science, algorithm and tool development, computational sciences, and cognitive sciences. He also served during this period as Director of Sandia’s Climate Security Program, which is focused on helping the nation understand and prepare for the national security impacts of climate change. In 2014, Dr. Leland worked at the White House in the Office of Science and Technology Policy, where he led the effort to develop the National Strategic Computing Initiative (NSCI) announced by President Obama in July 2015.
Dr. Leland studied electrical engineering as an undergraduate at Michigan State University. He attended Oxford University as a Rhodes Scholar, studying applied mathematics and computer science and completing a Ph.D. in parallel computing in 1989. Dr. Leland and his wife have two young daughters and reside in Albuquerque, New Mexico. In his spare time, he and his family enjoy skiing and other outdoor activities.
2) "Data-Intensive Science in the 21st Century" - George Djorgovski, Professor and Executive Officer for Astronomy, Director, Center for Data-Driven Discovery, California Institute of Technology
Like most other domains of human endeavor, science is being transformed by progress in computing and information technology, at a pace that is historically unprecedented. Data volumes and data rates are growing exponentially, following Moore's law. Even more interesting is the growth of data complexity and the overall information content of the data. This opens great new opportunities for discovery, but it also brings challenges that are shared by all data-rich fields, primarily in the arena of knowledge discovery, including machine learning, computational statistics, and novel approaches to multi-dimensional data visualization. As the complexity of data increases, we see an increased reliance on machine intelligence, leading towards human-computer collaborative discovery. Common challenges call for common solutions, in the form of methodology transfer from one field to another. Thus, the scientific method evolves as well, and "data science" becomes a new universal language of science, akin to the roles historically played by mathematics and statistics.
BIO: S. George Djorgovski is a Professor of Astronomy and the founding Director of the Center for Data-Driven Discovery. He has worked in many areas of astronomy and cosmology and has led several sky surveys, currently the Catalina Real-Time Transient Survey, which explores transient and variable objects and phenomena in the universe. He is one of the founders of the Virtual Observatory framework and of the emerging field of Astroinformatics. His scientific interests center on the question of how computing and information technology are changing the ways we do science, scholarship, and education in the 21st century.
3) "The Practical Aspects of Computational Science and Engineering" - Douglass E. Post, DoD High Performance Computing Modernization Program and Carnegie Mellon University Software Engineering Institute
Computational Science and Engineering is becoming ubiquitous in science and engineering. Indeed, if scientists and engineers are not using computing, especially high performance computing, they will be at a competitive disadvantage to those who are doing so. High performance computing is becoming necessary to analyze the data from large-scale experiments (e.g. the Large Hadron Collider), study fluid flow, unravel complex chemical and biological processes, understand climate behavior, and analyze and predict many more natural phenomena. Computing is also becoming important for designing and producing complex products from microchips to airplanes and space ships. Computing power has reached the point (~1-100 PetaFlops) that we can make accurate predictions of the behavior of complex systems. We can include all the important effects, use accurate algorithms with adequate resolution, model a complete system (an airplane, not just a wing section), carry out adequate verification and validation, and achieve sufficiently short turn-around times that enough parameter studies can be done so that the results can be both accurate and useful.
There are, however, significant and very practical challenges to successfully using this new capability. It’s sort of a Malthusian problem (population growing exponentially while food production grows linearly). While computer power has been growing exponentially (partially because chip design and manufacture have been automated with computers), software is still being developed by people who aren’t getting smarter and more productive nearly as fast. The result is that the software needed to exploit the new computers is lagging. The user paradigm is also changing. More and more scientists and engineers no longer write much code; they are beginning to use codes that someone else wrote. The computer and software become a “virtual experiment.” The users personally become more productive, but they lose some of the ability to understand the limitations and range of validity of the software. Since the software is only a model, extreme care is needed to ensure that the model accurately predicts the real world.
I will discuss these issues by first describing the promise of computing, then the potential pitfalls, and finally what we can do (and need to do) to fulfill the enormous promise that computing offers us. I will draw on the “lessons learned” over a career that started in 1967, up through my current project. In 2005, I initiated, and still lead, the DoD Computational Research and Engineering Acquisition Tools and Environments (CREATE) Program. Its goal is to reduce the costs, time, and risks of successfully developing and fielding complex weapon systems by developing and deploying physics-based HPC software applications to predict the performance of military air and ground vehicles, naval vessels, and RF antennas. To succeed we have needed to pay attention to the technical issues of getting the right equations and solving them correctly and efficiently. But that’s just the beginning. We need to change the paradigm of the way science and engineering are done. We need to find ways to support “virtual test facilities”, to retain the intellectual property of our codes, to ensure that the codes are accurate, to develop codes that will have a life cycle of many decades, and to produce codes that are not only accurate but also usable, maintainable, extensible, portable, and well documented. Good software engineering and agile project management are among the most essential elements.
BIO: Dr. Douglass Post established and leads the DoD Computational Research and Engineering Acquisition Tools and Environments (CREATE) Program. He is an IPA from the Carnegie Mellon University Software Engineering Institute. Doug received a Ph.D. in Physics from Stanford University in 1975. He is Associate Editor-in-Chief of the AIP/IEEE publication “Computing in Science and Engineering”. He led the A Division nuclear weapons simulation programs at Lawrence Livermore National Laboratory and the nuclear weapons code development programs at Los Alamos National Laboratory (1998-2005). Doug also led the International Thermonuclear Experimental Reactor (ITER, https://www.iter.org) Physics Team (1988-1990), for which he received the American Nuclear Society (ANS) Outstanding Technical Achievement Award for fusion science and engineering in 1992, and led the ITER In-Vessel Physics Team (1993-1998). He established and led the tokamak modeling group at the Princeton University Plasma Physics Laboratory (1975-1993). He is a Fellow of the American Nuclear Society, the American Physical Society, and the Institute of Electrical and Electronics Engineers (IEEE). He received the American Society of Naval Engineers 2011 Gold Medal Award for the CREATE Program. He has written over 250 publications with over 7,000 citations.
4) "The partial Hermitian eigenvalue and singular value problems for large, sparse matrices" - Andreas Stathopoulos, Professor of Computer Science, College of William and Mary
The Hermitian eigenvalue problem is central to many scientific and engineering applications which model increasingly complex phenomena and thus give rise to matrices of huge size. This poses significant challenges even to high-performance parallel iterative methods. A related problem is the computation of a few singular values of large matrices, which faces similar challenges because of the explosion of big data applications. In this talk we will outline the major categories of challenges for iterative methods and describe the state-of-the-art techniques used to address them. We will also discuss the state-of-the-practice in software.
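As a small illustration of the state of the practice (a hedged sketch, not taken from the talk), the Python example below uses SciPy's sparse iterative eigensolver and SVD routines to compute a few eigenpairs of a large sparse symmetric (hence Hermitian) matrix and a few singular triplets of a sparse rectangular matrix; the test matrices are stand-ins chosen for the example.

# Partial eigenvalue and singular value computations with SciPy's sparse
# iterative solvers (illustrative stand-in matrices, not from the talk).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, svds

# A sparse symmetric matrix: the 1-D discrete Laplacian on 10,000 points.
n = 10_000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format='csc')

# A few eigenvalues nearest zero via shift-invert Lanczos (ARPACK under the hood).
vals, vecs = eigsh(A, k=5, sigma=0.0)
print('eigenvalues nearest zero:', vals)

# A few of the largest singular values of a sparse rectangular matrix.
B = sp.random(20_000, 1_000, density=1e-3, format='csc', random_state=0)
u, s, vt = svds(B, k=5)
print('largest singular values:', np.sort(s)[::-1])

Specialized packages such as PRIMME, mentioned in the bio below, target the same partial eigenvalue and singular value problems, adding preconditioning and scalability beyond what general-purpose routines like these provide.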
BIO: Andreas Stathopoulos is a Professor of Computer Science at the College of William and Mary in Virginia. He was awarded an NSF CISE Postdoctoral Fellowship after receiving his Ph.D. and M.S. in Computer Science from Vanderbilt University; he also completed a B.S. in Mathematics from the University of Athens in Greece. Dr. Stathopoulos’ research interests include numerical analysis and high performance computing; methods for large eigenvalue problems and linear systems of equations; and related applications from materials science and quantum chromodynamics. He co-developed PRIMME (Preconditioned Iterative MultiMethod Eigensolver), one of the foremost eigenvalue packages, and several other significant software tools and has published numerous journal articles and conference papers in computational sciences and applications.