Abstracts and Bios

Monday, September 18th Talks

1) "U.S. Business Firms: All the Data, a Lot of Lousy Theories, and a Family of Very Large Scale Models" - Robert Axtell, George Mason University and Santa Fe Institute
Available to economists today are unprecedented amounts of micro-data. Some of this is ‘digital exhaust,’ very noisy and not terribly useful, but there are a number of domains in which large data universes are becoming available for both research and practical purposes (e.g., large banks using data on all their customers to create new financial products). In this talk I will provide an overview of ostensibly comprehensive data on U.S. businesses that derive from administrative records. These data cover some 30 million firms, 6 million of which have employees, and some 120 million workers. In these data we find dozens of gross patterns, some of which were known from samples of the firm population, while others are new. Specifically, distributions of firm sizes, ages, growth rates, productivity, job tenure, and wages, together with inter-firm networks and many conditional relationships (e.g., growth rates conditional on size and age and wages,…), all beg for explanation. However, received theories of firm behavior were largely put forward with little connection to data and are therefore of little help in trying to understand why the data have the structure that they do. I will conclude by describing a class of computational models in which a large number of software ‘agents’ self-organize into teams that have much in common with firms. Suitably parameterized, these models turn out to be capable of reproducing much of the empirical structure of U.S. firms. Computational challenges associated with parallel execution of these non-numerical/equation-free models will be briefly discussed. The application of agent-based computational techniques to large-scale social and economic phenomena, grounded in data, will also be touched on.
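As a rough illustration of the kind of team-formation dynamics described above, the toy sketch below lets agents with heterogeneous income/leisure preferences join, leave, or start firms whose output has increasing returns to total effort. The utility function, output function, and parameters are illustrative assumptions only, not the speaker's model, which runs at vastly larger scale.

```python
# Toy agent-based "firms" sketch: agents periodically choose whether to stay in
# their team, join another agent's team, or start a new one, picking whichever
# maximizes a simple income/leisure utility.  All functional forms and numbers
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, STEPS = 1000, 50_000
A, B = 1.0, 0.5                        # linear and increasing-returns output terms
theta = rng.uniform(0.1, 0.9, N)       # preference for income vs. leisure
effort = rng.uniform(0.05, 0.95, N)
firm_of = np.arange(N)                 # start with single-agent firms

def best_response(i, others_effort, n_members):
    """Best utility and effort for agent i in a team with given other effort."""
    e = np.linspace(0.01, 0.99, 25)    # coarse grid search over own effort
    E = others_effort + e
    income = (A * E + B * E**2) / n_members          # equal output sharing
    u = income**theta[i] * (1.0 - e)**(1.0 - theta[i])
    j = int(np.argmax(u))
    return u[j], e[j]

for _ in range(STEPS):
    i = rng.integers(N)
    options = []
    # three options: stay, join a randomly chosen agent's firm, start a new firm (-1)
    for firm in (firm_of[i], firm_of[rng.integers(N)], -1):
        if firm == -1:
            others, n = 0.0, 1
        else:
            members = np.flatnonzero(firm_of == firm)
            others = effort[members].sum() - (effort[i] if firm == firm_of[i] else 0.0)
            n = len(members) + (0 if firm == firm_of[i] else 1)
        u, e_best = best_response(i, others, n)
        options.append((u, int(firm), e_best))
    u, firm, e_best = max(options)
    firm_of[i] = firm if firm != -1 else firm_of.max() + 1
    effort[i] = e_best

sizes = np.bincount(firm_of)
sizes = sizes[sizes > 0]
print("firms:", sizes.size, " largest firm:", sizes.max(),
      " mean size: %.1f" % sizes.mean())
```

Even this crude sketch produces a skewed firm-size distribution, which is the kind of gross empirical pattern the talk examines at full scale.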

BIO: Rob Axtell is Professor of Economics and of Computational Social Science at George Mason University. He is Co-Director of the Computational Public Policy Lab at Mason and a member of the Krasnow Institute for Advanced Study there. He is an External Faculty Member at the Santa Fe Institute, Northwestern University’s Institute on Complex Systems, and the University of Waterloo’s Institute for Complexity and Innovation. Previously he was a Senior Fellow in the Economic Studies and Governance Studies programs at the Brookings Institution. He has been Visiting Professor in the Complexity Economics Programme at the University of Oxford, Mellon Visiting Distinguished Professor at Middlebury College, and Visiting Professor of Economics at the New School for Social Research. He holds an interdisciplinary Ph.D. from Carnegie Mellon University. His research involves agent-based computational models of social phenomena. His book Growing Artificial Societies: Social Science from the Bottom Up (MIT Press, 1996), co-authored with J.M. Epstein, is widely cited as an early statement of the potential of multi-agent systems to represent social processes. His research has appeared in leading general interest scientific journals (e.g., Science, Proceedings of the National Academy of Sciences USA, PLOS One), disciplinary journals (e.g., American Economic Review, Economic Journal, Computational and Mathematical Organization Theory, Journal of Industrial Ecology), and has been reprised in the popular science press (e.g., Nature, Scientific American, Science News, New Scientist, Discover, Technology Review), in newspapers and magazines (e.g., Wall Street Journal, Los Angeles Times, Washington Post, Atlantic Monthly, the New Yorker), and in a museum installation. His current research involves the creation of entire artificial economies consisting of hundreds of millions of interacting software agents. His undergraduate degree is from the University of Detroit, where he studied chemical engineering and economics before going to work for Exxon Research & Engineering (NJ) and Exxon Production Research (Houston).
 
2) "Developments and Challenges in Past Climate Reconstruction" - Dr. Bo Li, University of Illinois at Urbana-Champaign 

Climate reconstruction depends on the relationship between observed climate variables, called instrumental data, available from the mid-19th century, and indirect observations, called proxy data, which extend far back in time. Various types of proxies have been used in reconstruction, as the strength of one proxy can in principle compensate for another's weakness. Climate model output is also introduced in the hope of bringing additional information to the reconstruction. Much research has emerged on understanding and improving reconstruction methods, such as how to integrate different sources of information and whether long- or short-memory models are appropriate for the climate process. However, many issues remain unsolved. For example, which reconstruction method performs better, and how can the strengths of different methods be combined? To what extent are the instrumental, proxy, and climate model data consistent? What are the strengths of each data source, and how can those data be used efficiently to infer past climate? Compared to index or series reconstruction, climate field reconstruction and the joint reconstruction of multiple climate variables involve more statistical and computational issues. In this talk, I will give an overview of some recent developments and challenges in paleoclimate reconstruction.
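As a minimal illustration of the calibration idea underlying many reconstructions, the sketch below fits a regression between synthetic proxies and "instrumental" temperature over their period of overlap and then applies it to the proxy record that extends further back. The data, model choice, and parameters are purely illustrative, not any published method.

```python
# Minimal calibration-style reconstruction on synthetic data: regress temperature
# on several proxies over the instrumental era, then extend the fit back in time.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
years = np.arange(1000, 2001)                      # 1000-2000 CE
true_temp = 0.3 * np.sin(2 * np.pi * years / 300) + 0.001 * (years - 1000)
proxies = np.column_stack([                        # e.g. tree rings, ice cores, corals
    0.8 * true_temp + rng.normal(0, 0.20, years.size),
    0.5 * true_temp + rng.normal(0, 0.30, years.size),
    0.6 * true_temp + rng.normal(0, 0.25, years.size),
])

instrumental = years >= 1850                       # overlap (instrumental) period only
model = Ridge(alpha=1.0).fit(proxies[instrumental], true_temp[instrumental])

reconstruction = model.predict(proxies)            # reconstruct back to 1000 CE
rmse_pre = np.sqrt(np.mean((reconstruction[~instrumental]
                            - true_temp[~instrumental]) ** 2))
print(f"pre-instrumental reconstruction RMSE: {rmse_pre:.3f} deg C")
```

The questions raised in the abstract (how to weight proxies, whether errors have long memory, how to quantify uncertainty) are exactly the points where such a simple regression becomes inadequate.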


BIO: Dr. Bo Li received her PhD in Statistics from Texas A&M University in 2006 and then was a postdoctoral researcher at the National Center for Atmospheric Research before joining Purdue University as an Assistant Professor in 2008. In 2013 she moved to the University of Illinois at Urbana-Champaign as an Associate Professor. Dr. Li’s research mainly focuses on spatial and spatio-temporal statistics and statistical problems in climatology, atmospheric and environmental sciences, and public health.
 
3) "Big Data in Neuroscience: Analysis Challenges for the Next Decade" - Dr. Mark Reimers,  Michigan State University
New technologies are bringing high-throughput data to experimental neuroscience, as big data came to genomics a decade earlier. As microarrays transformed genomics and stimulated research in statistics, the new high-throughput neural technologies give rise to significant data analysis challenges. Although several technologies are being developed for  the BRAIN Initiative, the most promising technologies are the high-throughput optical imaging technologies, which generate high-resolution stacks of images of brain activity, up to terabytes of data per experiment. However the statistical methods and computational infrastructure to analyze and interpret these data are largely undeveloped.

BIO: Dr. Mark Reimers studies brain function through big data analysis and computational modeling. He obtained his MSc in scientific computing, and his PhD in probability theory from the University of British Columbia in Canada. He has worked at Memorial University in Canada, the Karolinska Institute in Stockholm, at several start-up companies in Toronto and in Boston, at the National Institutes of Health in Maryland, the Virginia Institute for Psychiatric and Behavioral Genetics in Richmond, and since January 2015 in the Neuroscience Program at MSU. 
Dr. Reimers research work focuses on analyzing and interpreting the very large data sets now being generated in neuroscience especially from the high-throughput optical technologies developed by the BRAIN initiative. He supervised the data analysis for the BrainSpan paper on the development of gene expression in the human brain in Nature in 2011, and assisted in data analysis for a paper analyzing cortical dynamics on the surface of mouse brain in Nature Neuroscience in 2013, and on the major psychiatric genetics papers in Nature in 2014 and 2015. He teaches courses at Michigan State on neuroinformatics, genomics of brain and behavior, theory and computational modeling in neuroscience, and advanced analytic techniques for neural activity data.

3) "Infectious Disease Surveillance in the Big Data Era: Towards Faster and Locally Relevant Systems" - Dr. Lone Simonsen, Milken Institute School of Public Health  The George Washington University
While big data have proven immensely useful in fields such as marketing and earth sciences, public health is still relying on more traditional surveillance systems and awaiting the fruits of a big data revolution. A new generation of big data surveillance systems is needed to achieve rapid, flexible, and local tracking of infectious diseases, especially for emerging pathogens. In this discussion, we reflect on the long and distinguished history of disease surveillance and discuss recent developments related to use of big data. We start with a brief review of traditional systems relying on clinical and laboratory reports. We then examine how large-volume medical claims data can, with great spatiotemporal resolution, help elucidate local disease patterns. Finally, we review efforts to develop surveillance systems based on digital and social data streams, including the recent rise and fall of Google Flu Trends. We conclude by advocating for increased use of hybrid systems combining information from traditional surveillance and big data sources, which seems the most promising option moving forward. Throughout the talk, we use influenza as an exemplar of an emerging and reemerging infection which has traditionally been considered a model system for surveillance and modeling.
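As a minimal illustration of such a hybrid system, the sketch below combines a timely but noisy digital signal (a stand-in for search-query volume) with lagged traditional surveillance (last week's reported ILI rate) in a simple regression nowcast. All data and variable names are simulated and hypothetical.

```python
# Hybrid nowcast sketch: this week's noisy digital signal plus last week's
# clinical ILI report are regressed against this week's true ILI rate.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
weeks = 300
ili = 2 + 1.5 * np.sin(2 * np.pi * np.arange(weeks) / 52) ** 8   # true % ILI visits
ili += rng.normal(0, 0.1, weeks)
searches = ili * rng.normal(1.0, 0.25, weeks)                    # noisy digital proxy

X = np.column_stack([searches[1:],      # this week's search signal (available now)
                     ili[:-1]])         # last week's reported ILI (one-week lag)
y = ili[1:]                             # target: this week's true ILI

train, test = slice(0, 250), slice(250, weeks - 1)
model = LinearRegression().fit(X[train], y[train])
pred = model.predict(X[test])
print("nowcast MAE: %.3f  vs  naive last-week MAE: %.3f"
      % (np.mean(np.abs(pred - y[test])), np.mean(np.abs(X[test][:, 1] - y[test]))))
```

The Google Flu Trends episode discussed in the talk is a caution about relying on the digital signal alone; anchoring it to traditional reports, as above, is what makes the hybrid approach attractive.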
 
BIO: Professor Lone Simonsen is a global health epidemiologist and a member of the Royal Danish Academy of Sciences and Letters. She is back in her native Denmark on an EC Marie Curie Senior Fellowship in Historic Epidemiology in the Department of Public Health at the University of Copenhagen, and is also a Research Professor at George Washington University in Washington, DC, USA. After obtaining a PhD in population genetics at the University of Massachusetts, she trained in the Centers for Disease Control and Prevention's (CDC) EIS program in Atlanta. Over the past 25 years Simonsen has worked as an epidemiologist at the US National Institutes of Health, where she evaluated vaccine programs (adverse events and benefits), and at the World Health Organization, where she worked on hepatitis, HIV, TB drug resistance, SARS, and pandemic influenza. At NIH-NIAID from 2000 to 2007 she advised front-office leadership on vaccine program evaluation and emerging infectious diseases. She currently researches the possibilities and challenges of using “big data” for public health surveillance. In 2013 she co-wrote the critical analysis of Google Flu Trends that led to a rethinking of the use of such search-engine data for disease surveillance. Recently she co-edited and contributed to a JID issue on “big data” in infectious disease surveillance. At the University of Copenhagen her research focuses on diseases in historic Denmark (cholera, measles, malaria, typhoid fever), as well as contemporary global threats such as Ebola, Zika, MERS, and avian influenza. She is a member of WHO expert committees evaluating pandemic threats and worked with WHO and NIH to evaluate the 2009 pandemic; with colleagues at the Yale School of Public Health, she currently evaluates pneumococcal childhood vaccine programs in middle- and low-income countries on a Gates Foundation grant.
 
4) "Building Exascale Computational Tools for Cancer," - Dr. Arvind Ramanathan,  Oak Ridge National Labratory 
A major part of the Cancer Moonshot is the strategy to build, apply, and utilize modeling and simulation approaches, informed by machine learning techniques, to understand cancer biology. Specifically, this research focuses on integrating diverse genomic, experimental, and clinical data to build predictive models for cancer research and therapeutics. Three projects are being pursued: the first focuses on the genomic scale, where the objective is to develop models that can predict tumor response to drugs; the second focuses on the molecular scale, where the objective is to develop an effective understanding of Ras signaling at the membrane; and the third focuses on the analysis of electronic medical records of millions of cancer patients to develop models of cancer treatment trajectories. Underlying all of these diverse clinical problems are common data analytic challenges that necessitate the development of novel machine learning platforms that can exercise the full computing power of future supercomputers. We will discuss the different data analytic and machine learning challenges and outline the strategies/progress that we have made as part of this project.
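As a minimal illustration of the first, genomic-scale problem, the sketch below trains a regressor to predict a drug-response value from molecular features of a sample combined with descriptors of the compound. The feature counts, response variable, and model choice are simplified assumptions, not the project's actual pipeline.

```python
# Toy drug-response prediction: concatenate per-sample expression features with
# per-drug descriptors and fit a regressor to a synthetic response value.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_pairs, n_genes, n_drug_feats = 500, 200, 30

expression = rng.normal(size=(n_pairs, n_genes))        # e.g. RNA-seq per sample
drug_desc = rng.normal(size=(n_pairs, n_drug_feats))    # e.g. chemical fingerprints
X = np.hstack([expression, drug_desc])

# synthetic "growth inhibition" response driven by a few genes and drug features
response = (expression[:, :5].sum(axis=1) - drug_desc[:, :3].sum(axis=1)
            + rng.normal(0, 0.5, n_pairs))

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, response, cv=5, scoring="r2")
print("cross-validated R^2: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```

At exascale, the challenge the talk addresses is doing this across millions of sample-drug pairs and far richer models, which is what drives the need for new machine learning platforms on future supercomputers.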
 
BIO: Arvind Ramanathan is a staff scientist in the Computational Science and Engineering Division and the Health Data Sciences Institute at Oak Ridge National Laboratory. His research interests lie at the intersection of data science, high-performance computing, and biological/healthcare sciences. He builds data analytic tools to gain insights into the structure-dynamics-function relationships of biomolecules. In conjunction with biophysical/biochemical experiments and long-timescale computational simulations, his group investigates biomolecular systems that have implications for human health. In addition, his group has also developed novel data analytic tools for public health surveillance. He has published over 30 papers, and his work has been highlighted in the popular media, including NPR and NBC News. More information about his group and research interests can be found at http://ramanathanlab.org.

 Tuesday, September 19th Talks 

1) "Using machine learning to find density functionals," - Dr. Keiron Burke, University of California, Irvine
I will give an overview of the field of density functional theory (DFT), why it is so important, and how we (and others) are using machine learning to find density functionals that no human can. DFT is an enormous success story, being used to solve electronic structure problems in more than 30,000 papers per year. But all such calculations rely on intuitive and clever approximations. Historically, density functional approximations have been created by humans, both for the elusive exchange-correlation energy and for the Kohn-Sham kinetic energy (in orbital-free DFT). I will describe our ongoing work (with Klaus-Robert Müller's group, and others) to hand this problem over to machines. Most importantly, unlike almost all human-made functionals, the machines have no bias toward local-type approximations. This allows them to break bonds and incorporate strong correlation effects without any particular difficulty. I will describe two new works in which (a) we do calculations on real (3D) molecules, bypassing the Kohn-Sham scheme, and (b) we show that machines can find the exact functional for strongly correlated solids (though our demonstration is in 1D). If time permits, I will discuss whether this work is a breakthrough or an abomination.

Finding Density Functionals with Machine Learning, John C. Snyder, Matthias Rupp, Katja Hansen, Klaus-Robert Müller, and Kieron Burke, Phys. Rev. Lett. 108, 253002 (2012).

Pure density functional for strong correlation and the thermodynamic limit from machine learning, Li Li, Thomas E. Baker, Steven R. White, and Kieron Burke, Phys. Rev. B 94, 245129 (2016).

By-passing the Kohn-Sham equations with machine learning, Felix Brockherde, Leslie Vogt, Li Li, Mark E. Tuckerman, Kieron Burke, and Klaus-Robert Müller, Nature Communications (accepted, 2017).
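The first reference above learns the kinetic energy as a functional of the density with kernel ridge regression. The sketch below illustrates that general idea on a deliberately simplified problem, a single particle in a random Gaussian well on a 1D grid; the grid size, kernel parameters, and potentials are illustrative choices, not the published setup.

```python
# Toy machine-learned kinetic-energy functional: solve a 1D Schroedinger problem
# for random Gaussian wells, then learn T[n] from density -> kinetic energy pairs
# with kernel ridge regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
G = 100                                    # grid points on (0, 1), hard walls
x = np.linspace(0, 1, G + 2)[1:-1]
dx = x[1] - x[0]
lap = (np.diag(-2.0 * np.ones(G)) + np.diag(np.ones(G - 1), 1)
       + np.diag(np.ones(G - 1), -1)) / dx**2          # discrete second derivative

def sample():
    """Random potential -> ground-state density and its exact kinetic energy."""
    a, b, c = rng.uniform(1, 10), rng.uniform(0.4, 0.6), rng.uniform(0.03, 0.1)
    v = -a * np.exp(-(x - b)**2 / (2 * c**2))
    H = -0.5 * lap + np.diag(v)
    _, U = np.linalg.eigh(H)
    phi = U[:, 0] / np.sqrt(dx)            # normalized ground-state orbital
    density = phi**2
    T = -0.5 * dx * phi @ (lap @ phi)      # <phi| -1/2 d^2/dx^2 |phi>
    return density, T

data = [sample() for _ in range(400)]
X = np.array([d for d, _ in data])
y = np.array([t for _, t in data])
krr = KernelRidge(kernel="rbf", alpha=1e-6, gamma=1e-3).fit(X[:300], y[:300])
err = np.abs(krr.predict(X[300:]) - y[300:])
print("mean |T_ML - T_exact| on held-out densities: %.4f a.u." % err.mean())
```

The published work goes much further (multiple particles, functional derivatives for self-consistency, and real 3D molecules), but the density-in, energy-out regression above is the core idea.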

 
BIO: Dr. Kieron Burke is a Chancellor's Professor in both the chemistry and physics departments at UC Irvine and a fellow of the American Physical Society. His research focuses on developing a theory of quantum mechanics called density functional theory. Dr. Burke obtained his Ph.D. in physics from UC Santa Barbara in 1989. He was a faculty member at Rutgers University before moving to UCI. Dr. Burke works on developing all aspects of DFT: formalism, extensions to new areas, new approximations, and simplifications. His work is heavily used in materials science, chemistry, matter under extreme conditions (such as planetary interiors or fusion reactors), magnetic materials, molecular electronics, and so on.
 
2) "Data-enabled, Physics-constrained Predictive Modeling of Complex Systems" - Dr. Karthik Duraisamy, Associate Professor, Aerospace Engineering, University of Michigan
The pursuit of accurate predictive models is a central issue and pacing item in many scientific and engineering disciplines. With the recent growth in computational power and measurement resolution, there is an unprecedented opportunity to use data from fine-scale simulations, as well as critical experiments, to inform, and in some cases even define, predictive models. The first part of the talk will cover some recent work on discovering governing equation sets from data. While the general idea of data-driven modeling appears intuitive, the process of obtaining useful predictive models from data in complex problems is less straightforward. A pragmatic solution is to combine physics-based models with data-based methods and pursue a hybrid approach. The rationale is as follows: by blending data with existing physical knowledge and enforcing known physical constraints, one can improve model robustness and consistency with physical laws and address gaps in data. This would constitute a data-augmented physics-based modeling paradigm. The talk will discuss a coordinated approach of experimental design, statistical inference, and machine learning with the goal of improving predictive capabilities, with examples from turbulence modeling. Statistical inference is used to derive problem-specific information that consistently connects model augmentations to model variables. The outputs of several such inverse problems (on different datasets representative of the physical phenomena) are then transformed into general functional forms using machine learning. When the machine-learning-generated model forms are embedded within a standard solver setting, we show that much improved predictions can be achieved. The final part of the talk will provide a brief overview of a hardware/software ecosystem that is being developed at the University of Michigan to enable large-scale data-augmented model development for computational physics applications.
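As a schematic of the inference-then-learning workflow described above, the sketch below assumes that inverse problems on several training cases have already produced pointwise model-correction fields beta(x), and trains a regressor mapping local nondimensional flow features to beta so the correction can be queried inside a solver. The feature definitions and "inferred" fields are simulated stand-ins, not the actual turbulence-modeling pipeline.

```python
# Schematic inference-then-learning step: pool pretend inverse-problem outputs
# from several cases and learn a feature -> correction map for reuse in a solver.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def training_case(n_points):
    # hypothetical local features, e.g. a strain-rate ratio and a wall-distance
    # Reynolds number, kept nondimensional so the map can transfer across cases
    q1 = rng.uniform(0, 2, n_points)
    q2 = rng.uniform(0, 1, n_points)
    beta = 1.0 + 0.4 * np.tanh(3 * (q1 - 1)) * q2      # stand-in inferred correction
    beta += rng.normal(0, 0.02, n_points)               # inference noise
    return np.column_stack([q1, q2]), beta

# pool the outputs of several inverse problems (different datasets / flow cases)
features, targets = zip(*[training_case(2000) for _ in range(4)])
X, y = np.vstack(features), np.concatenate(targets)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# inside a flow solver, the learned correction would be queried cell by cell, e.g.:
beta_new = model.predict(np.array([[1.3, 0.5], [0.2, 0.9]]))
print("predicted corrections:", np.round(beta_new, 3))
```

The key design choice the talk emphasizes is that the regression inputs are local model variables, so the learned augmentation stays consistent with the solver it is embedded in.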
 
BIO: Prof. Duraisamy is an Associate Professor of Aerospace Engineering at the University of Michigan, Ann Arbor. He obtained a doctorate in aerospace engineering and a master’s degree in applied mathematics from the University of Maryland, College Park. Prior to his appointment at the University of Michigan in 2013, he spent time at Stanford University and the University of Glasgow. He is the founding director of the Center for Data-driven Computational Physics at the University of Michigan, which is focused on deriving data-driven solutions to complex multi-physics problems in many fields. His other research interests are in turbulence modeling and simulations, numerical methods, and reduced-order modeling.
 
 
3) “You don’t understand anything until you learn it more than one way” (Marvin Minsky, MIT) - Doug Riecken, Program Officer, Air Force Office of Scientific Research (AFOSR)
Doug has worked for several decades with MIT’s Marvin Minsky. His research includes commonsense reasoning, cognitive architectures/theories of mind, and the role of emotions. He has conducted research in agent-based systems, end-user programming, music composition models, financial predictive modeling, big data analytics, bioinformatics, real-time computer-supported cooperative work (CSCW) environments, multimodal reasoning systems, and human/machine learning.
 
He is currently a Program Officer at AFOSR. Prior assignments include: Interim Director and Senior Scientist of the Center for Computational Learning Systems at Columbia University, Department Head of the Commonsense Computing Research Department at IBM Research, and Department Head of several departments at AT&T’s Bell Laboratories Research. Over the years, Riecken and his departments have created and delivered numerous research contributions, products, and services. He has over 70 publications, including several best-paper awards and keynote addresses, and has served on various advisory and editorial boards.

 
4) Speaker Panel Discussion, "Cross-cutting concepts" 
 

 September 19th Talks 

1) "Big Data in Neuroscience: Analysis Challenges for the Next Decade" - Dr. Mark Reimers,  Michigan State University
New technologies are bringing high-throughput data to experimental neuroscience, much as big data came to genomics a decade earlier. Just as microarrays transformed genomics and stimulated research in statistics, the new high-throughput neural technologies give rise to significant data analysis challenges. Although several technologies are being developed for the BRAIN Initiative, the most promising are the high-throughput optical imaging technologies, which generate high-resolution stacks of images of brain activity, up to terabytes of data per experiment. However, the statistical methods and computational infrastructure to analyze and interpret these data are largely undeveloped.
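To make the data-analysis challenge concrete, the sketch below performs the kind of first-pass computation such imaging stacks require, extracting a dF/F activity trace for a region of interest from a tiny, simulated stack of fluorescence frames. Real pipelines add motion correction, hemodynamic correction, and source extraction at terabyte scale; the sizes and signal here are illustrative only.

```python
# Compute a dF/F trace for one region of interest from a simulated image stack.
import numpy as np

rng = np.random.default_rng(0)
T, H, W = 500, 64, 64                      # frames x pixels (real stacks: terabytes)
baseline = 100 + 10 * rng.random((H, W))   # static baseline fluorescence
movie = rng.poisson(baseline, size=(T, H, W)).astype(float)

# inject a decaying transient in one region so there is something to detect
movie[200:260, 20:30, 20:30] += 40 * np.exp(-np.arange(60) / 20)[:, None, None]

roi = np.zeros((H, W), dtype=bool)
roi[20:30, 20:30] = True

trace = movie[:, roi].mean(axis=1)               # mean fluorescence in the ROI
f0 = np.percentile(trace, 20)                    # robust baseline estimate
dff = (trace - f0) / f0                          # dF/F activity trace
print("peak dF/F: %.2f at frame %d" % (dff.max(), int(dff.argmax())))
```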

BIO: Dr. Mark Reimers studies brain function through big data analysis and computational modeling. He obtained his MSc in scientific computing and his PhD in probability theory from the University of British Columbia in Canada. He has worked at Memorial University in Canada, the Karolinska Institute in Stockholm, at several start-up companies in Toronto and Boston, at the National Institutes of Health in Maryland, and at the Virginia Institute for Psychiatric and Behavioral Genetics in Richmond; since January 2015 he has been in the Neuroscience Program at MSU.
Dr. Reimers’ research focuses on analyzing and interpreting the very large data sets now being generated in neuroscience, especially from the high-throughput optical technologies developed under the BRAIN Initiative. He supervised the data analysis for the BrainSpan paper on the development of gene expression in the human brain in Nature in 2011, and assisted in the data analysis for a paper analyzing cortical dynamics on the surface of the mouse brain in Nature Neuroscience in 2013 and for the major psychiatric genetics papers in Nature in 2014 and 2015. He teaches courses at Michigan State on neuroinformatics, genomics of brain and behavior, theory and computational modeling in neuroscience, and advanced analytic techniques for neural activity data.
 
2) " Should one use an educated or uneducated basis?" - Hongkai Zhao, Chair, Department of Mathematics, University of California Irvine
A good choice of basis is important for the representation/approximation, analysis, and interpretation of quantities/information of interest. A common and difficult balance in real applications is between universality and specificity, and the choice is application- and knowledge-dependent. We will show examples for which an educated basis, i.e., a problem-specific designed or learned basis, is effective, as well as examples for which an uneducated basis, i.e., a simple random basis exploiting the blessing of dimensionality, can also be effective.
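The contrast can be made concrete with a small numerical example: project data that concentrate near a low-dimensional subspace onto (a) a basis learned from the data (PCA) and (b) a random Gaussian basis of the same size, and compare reconstruction errors. The synthetic data and dimensions below are illustrative only.

```python
# Educated (PCA) vs. uneducated (random) basis for reconstructing low-rank data.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 200, 10                     # samples, ambient dimension, basis size

# data concentrated near a k-dimensional subspace, plus noise
latent = rng.normal(size=(n, k)) @ rng.normal(size=(k, d))
X = latent + 0.05 * rng.normal(size=(n, d))

def reconstruction_error(B):
    """Project X onto span(columns of B) and report the relative error."""
    Q, _ = np.linalg.qr(B)                 # orthonormalize the basis
    Xhat = (X @ Q) @ Q.T
    return np.linalg.norm(X - Xhat) / np.linalg.norm(X)

# "educated" basis: top-k principal directions learned from the data
_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
pca_basis = Vt[:k].T                       # shape (d, k)

# "uneducated" basis: k random Gaussian directions
random_basis = rng.normal(size=(d, k))

print("PCA basis error:    %.3f" % reconstruction_error(pca_basis))
print("random basis error: %.3f" % reconstruction_error(random_basis))
```

The learned basis wins here because the data have a specific low-dimensional structure; the talk's point is that in other regimes a simple random basis can be surprisingly competitive.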
 
BIO: Dr. Hongkai Zhao is a Chancellor's Professor in the Department of Mathematics at the University of California, Irvine. His research focuses on computational and applied mathematics, including modeling, analysis, and the development of numerical methods for problems arising from science and engineering. Dr. Zhao obtained his Ph.D. from UCLA in 1996 and then worked as a Szego Assistant Professor at Stanford University before moving to UCI. He has won an Alfred P. Sloan Fellowship (2002-2004) and the Feng Kang Prize in Scientific Computing (2007). Dr. Zhao's research is broad, including 1) numerical methods for moving interface and free boundary problems, 2) inverse problems and imaging, 3) image processing and computer vision, and 4) Hamilton-Jacobi equations and the fast sweeping method.