Lizbeth ‘Libby’ Benson, PhD, is a Research Assistant Professor in the Data Science for Dynamic Intervention Decision Making Center (d3c) at the University of Michigan’s Survey Research Center and Institute for Social Research. Before moving to Michigan, Libby completed a Postdoctoral Fellowship at the TSET Health Promotion Research Center within the NCI-designated Stephenson Cancer Center and University of Oklahoma Health Sciences Center. She received her PhD from the Pennsylvania State University in the department of Human Development and Family Studies and her BA in Psychology from the University of Wisconsin-Madison.

Libby’s research program focuses on intensive longitudinal, computational, and machine learning methods for examining the temporal dynamics of affective, social, and health behavior experiences, using ecological momentary assessment and sensor-based data collected from individuals in their daily lives. Her goals are to understand how behavioral processes unfold across multiple timescales and contexts, and how this knowledge can be used to build personalized interventions that facilitate health behavior change. Data visualization is also an important component of her work as a way to better understand complex behavioral processes, to generate new ideas, and to serve as a tool for scientific communication. Currently, Libby is writing an NIH K01 proposal focused on developing a reinforcement learning algorithm for personalizing intervention content in a smoking cessation just-in-time adaptive intervention.

This project will examine the short-, medium-, and long-term health, social, and economic effects of the COVID-19 pandemic on an underserved and vulnerable population of young adults who have been part of the Fragile Families and Child Wellbeing Study since birth, along with their primary caregivers, who are entering middle age. Moreover, we will incorporate planned linkages to multilevel COVID-19 related health, social, and economic measures to identify how temporal and geographic variations contribute to outcomes of the two-generation sample.

Behavioral scientists aim to understand human behavior at the level of the individual (psychology), the level of the group (sociology), and within specific political and economic contexts (economics, political science). Social media data provide a potentially important avenue for learning about human behavior in real time by tapping multiple aspects of an individual’s beliefs, behaviors, emotions, and social networks. Such “data in the wild” provide a lens into human behavior and attitudes not available through traditional survey research. However, it is often unclear how to reconcile the behavioral processes that generate the millions of data points available with the carefully designed measurement strategies that produce typical social scientific data or how these data can yield the clear metrics of validity and reliability necessary for behavioral research. Such data are also rarely protected by the types of ethical safeguards provided to survey respondents. Hence, although these data may offer important new avenues for understanding human beliefs and behavior, harnessing these data in ways useful to social scientists is a major scientific challenge.

Computer scientists know how to “tame” these data — clean them, mine them, and make sense of them. They are leaders in using social media to investigate current events, such as tracking behavior related to social movements and virus outbreaks across the globe. The way they approach the data, however, does not typically adhere to established social science research designs or incorporate rigorous checks of the data’s validity or reliability as it relates to the individuals providing the data or the constructs measured. They also do not focus on understanding the generalizability of the data to the population at large, but instead target specific descriptive and learning tasks, making some of their methods, algorithms and results less useful to the broader research community. In order to extract significant research value from social media data, computer scientists and social scientists must integrate their expertise — or converge — to create and adapt computer science algorithms and data mining methods in ways that adhere to the design structures, measurement rigor and ethical protections of social science.

Big data featuring neuroimaging information collected from large population-based samples have spurred the emergence of population neuroscience research. However, traditional neuroscience research is based on nonrepresentative samples, such as convenience and volunteer samples, that deviate from the target population. This lack of representativeness may distort association studies of brain-cognition mechanisms.

This project was motivated by the research team’s collaborative work on the Adolescent Brain Cognitive Development Study, which exhibits these common problems in empirical neuroimaging studies, and aims to fill the gap in statistical methodology between survey research and neuroscience research. The team develops new strategies to adjust for nonrepresentativeness in association studies with complex and nontraditional survey designs, and to quantify the potential impact of sampling features on statistical and substantive inferences.

The overall objectives are to identify population heterogeneity in association studies between imaging and cognitive ability measures and to generalize multilevel regression and poststratification as a robust framework for inferences based on nonprobability samples. Computationally scalable software, delivered with step-by-step guidelines, will provide practical recommendations and tools to map these relationships and adjust for selection bias when making population inferences. This interdisciplinary project will strengthen the validity and generalizability of population neuroscience research, deepen understanding of brain-cognition associations, and facilitate policy intervention.

Throughout the COVID-19 pandemic, government policy and healthcare implementation responses have been guided by reported positivity rates and vaccination rates in the community. Selection bias in these test data calls into question their validity as measures of actual viral incidence in the community and as predictors of clinical burden. Publicly available vaccination data are frequently cited as a proxy for population immunity, but this metric ignores the effects of naturally acquired immunity. Health disparities between asymptomatic and symptomatic patients have not yet been studied. The proposal develops a valid metric to estimate true viral incidence and the prevalence of naturally and vaccine-acquired immunity in the community, to examine health disparities and social inequality, and to monitor the epidemic over time as an operational surveillance system. The approach collects routine testing data on SARS-CoV-2 exposure and antibody seropositivity among patients in a hospital system and performs statistical adjustments of sample representation using multilevel regression and poststratification (MRP), which adjusts for measured differences between the sample and the population and also yields stable small area estimates. The data collection and analysis procedure can provide generalizable information about entire communities while focusing on burdens within specific demographics, with close attention to disparities across health outcomes, social determinants, and behaviors among vulnerable populations. In particular, the research will yield group-specific estimates of disparities between asymptomatic and symptomatic patients and of how these discrepancies may affect the socio-demographically dependent spread of disease and its subsequent treatment. The MRP adjustment will be made publicly accessible via a web interface to promote broad investigations with integrated data sources toward a national study.
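To illustrate the poststratification step of MRP in miniature: cell-level estimates from the (non-representative) hospital sample are reweighted by each cell's share of the community population. The sketch below is a simplified, hypothetical example with made-up age cells and numbers; the project's actual MRP pipeline fits a multilevel regression across many demographic and geographic cells before reweighting.

```python
# Illustrative poststratification sketch (hypothetical cells and numbers;
# not the project's actual MRP pipeline).
sample_positivity = {"18-39": 0.10, "40-64": 0.15, "65+": 0.25}  # cell-level sample estimates
population_share = {"18-39": 0.40, "40-64": 0.35, "65+": 0.25}   # census-derived cell shares

def poststratify(estimates, shares):
    """Population estimate: weight each cell's estimate by its population share."""
    return sum(estimates[c] * shares[c] for c in estimates)

adjusted = poststratify(sample_positivity, population_share)
print(round(adjusted, 3))  # 0.155
```

Because the hospital sample over-represents older (higher-positivity) patients relative to the community, the population-weighted estimate falls below a naive average of the sample cells; the same reweighting logic, applied within small areas, yields the stable small area estimates described above.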

We are only beginning to clarify the ways the COVID-19 pandemic has resulted in substantial changes to American neighborhoods. There has been an excess of permanent business closures, particularly among the small neighborhood businesses most vulnerable to social distancing, such as local barbershops and nail salons. COVID-19 outbreaks in late September 2021 caused 2,000 neighborhood schools in 39 states to close for an average of six days.

A burgeoning body of research has tried to understand the forces driving these trends, focusing on infectious disease transmission at the individual level or economic models at the business level. What is not considered is the context in which these changes are taking place. By context, we mean the neighborhood community environment that holds the opportunities, restrictions, risks, and flexibility for post-pandemic growth. The community environment includes:

  1. Job opportunities in business sectors robust to social distancing;
  2. Comprehensive broadband internet access to facilitate telemedicine, online schooling, remote work, and online grocery shopping;
  3. Parks and walkable streets to facilitate socially distanced physical activity and social interaction to mitigate social isolation brought on by the pandemic; and
  4. The provision of medical care through the availability of alternate health care providers and pharmacies.

Access to these neighborhood resources is not equally distributed across America, reinforcing risk for vulnerable populations, including older adults, children and adolescents, racial/ethnic minorities, and those in rural areas. However, a lack of national, standardized, longitudinal metrics of the local neighborhood environment has hindered the ability to identify which communities are most vulnerable to the immediate and longer-term consequences of the pandemic for a host of behavioral, psychological, social, and economic outcomes.

To address this limitation in the nation’s data infrastructure, we will augment, curate and disseminate data from our National Neighborhood Data Archive (NaNDA). This dataset includes a wealth of physical, social and economic characteristics of the local neighborhood across the United States (e.g., racial segregation, business density, environmental hazards, broadband internet access, and healthcare availability), in the years both before and since the pandemic. We will participate with the Consortium on Social, Behavioral, and Economic Research on COVID-19 to integrate, share, and analyze spatially referenced neighborhood data that can be readily linked to existing survey data, cohort studies, or electronic health records at various levels of geography. We will work with the COVID-19 Consortium Coordination Center to identify and create key neighborhood metrics that are priorities for research teams in the Consortium, including a set of common data elements (CDEs) on the social, behavioral and economic indicators of the COVID-19 pandemic at the neighborhood level. We will also develop new metrics of longitudinal neighborhood change in the decades preceding the pandemic, which can inform community risk and resilience since the pandemic.

Alzheimer’s disease and related dementias (ADRD), a leading cause of disability among older adults, have become a critical public health concern. The clock-drawing test, which measures multiple aspects of cognitive function including comprehension, visuospatial abilities, executive function, and memory, has been widely used as a screening tool to detect dementia in clinical research, epidemiologic studies, and panel surveys. The test asks subjects to draw a clock, typically with hands showing ten after eleven, and then assigns either a binary (e.g., normal vs. abnormal) or ordinal (e.g., 0 to 5) score. An important limitation in large-scale studies is that the test requires manual coding, which can result in biases if coders interpret and implement coding rules in different ways.

Several small-scale studies have explored the use of machine learning methods to automate clock-drawing test coding. Such studies, which have had limited success with ordinal coding, have used methods that are not designed specifically for complex image classification and are less effective than deep learning neural networks, a new and promising area of machine learning. More recently, machine-learning methods have been applied to digital clock-drawing testing, a form of the clock-drawing test that uses a digital pen and tablet. Despite some promising results on small-scale data, thus far these studies have only attempted to code binary categories.

This project develops advanced deep learning neural network models to create and evaluate an intelligent clock-drawing test Clock Scoring system (CloSco) that will automatically code test images. We will use a large, publicly available repository of clock-drawing test images from the 2011-2019 National Health and Aging Trends Study (NHATS), a panel study of Medicare beneficiaries ages 65 and older funded by the National Institute on Aging. Specifically, we will:

  1. Develop an automated clock-drawing test coding system for both ordinal and continuous scores;
  2. Evaluate the performance of the CloSco system and investigate the value of continuous scoring for dementia classification and longitudinal test models; and
  3. Prepare and disseminate NHATS public-use files and documentation with ordinal and continuous clock-drawing test codes assigned using CloSco along with the CloSco deep learning neural network program.

If successful, the deep learning neural network programs may offer a model for automating the coding of other widely available drawing tests used to evaluate a variety of cognitive functions.
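The core operation such deep learning networks apply to a clock image is the 2-D convolution (strictly, cross-correlation), which slides a small filter over the image so early layers can respond to low-level stroke features. The toy sketch below, with a hypothetical edge-detecting kernel and a tiny synthetic image, is purely illustrative and is not the CloSco system:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel and sum elementwise products."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical example: a vertical-edge kernel responds where pixel intensity
# changes left to right, the kind of low-level stroke feature a network's
# first convolutional layer learns from clock drawings.
img = np.array([[0., 0., 1., 1.]] * 4)    # 4x4 image with one vertical edge
kern = np.array([[-1., 1.],
                 [-1., 1.]])              # vertical-edge detector
resp = conv2d(img, kern)                  # strongest response along the edge
```

In a full network, many such filters are learned from labeled examples and stacked with pooling and nonlinear layers, ending in an output layer that produces the ordinal or continuous score.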

What do the teams that produce science, and the networks in which they are embedded, look like? How is credit allocated within them? Our knowledge of these topics has largely been shaped by analyses of coauthorship data. However, not all people who make important contributions to research projects appear as coauthors on all articles, and some author positions are more prestigious than others. Evidence suggests that women and members of underrepresented racial and ethnic groups are disadvantaged in terms of author position. The same is likely true for research staff.

This project combines UMETRICS data, covering over 23 million payroll transactions to more than 175,000 people paid on NIH projects at 31 universities (which together account for over one-third of federally funded academic R&D), with Torvik and Smalheiser’s updated Author-ity database of publications by biomedical researchers. The combined data allow us to analyze scientific collaboration networks, as distinct from coauthorship networks, and to shed new light on relatively marginal populations in biomedicine. The interdisciplinary team combines emerging and established scholars and has successfully developed, analyzed, and distributed both datasets.

We will describe the composition of the teams and networks supported on research projects, answering questions such as: How large are the teams that actually conduct research? What factors relate to team size? What types of people work on these teams in terms of gender, race, ethnicity, and age, and in what capacities? How is each of them positioned in collaboration networks? We will further study the association of researchers’ characteristics, such as gender, race, ethnicity, age, and job title, with the credit that they receive for their work. Lastly, we will study how these relationships depend on the gender, race, ethnicity, and age of the Principal Investigator (PI) and on funding mechanisms: for example, whether women and younger PIs are more inclusive and egalitarian than men, and whether some mechanisms are better in terms of inclusion. These questions are increasingly urgent as scientific teams expand, but answering them requires information on who actually works on research projects beyond authorship data alone.

Thomas (Tom) Crossley is Research Professor and Codirector of the Panel Study of Income Dynamics (PSID). Professor Crossley’s research interests include household behavior (particularly consumption and saving behavior), financial security, and living standards; the design, collection, and analysis of survey data; and economic measurement more broadly.

The Longitudinal Study of American Youth (LSAY) is a three-cohort two-generation longitudinal study of national samples of public school students in the United States. The two original cohorts consisted of national probability samples of 7th and 10th grade students selected in 1987. These young adults are now 37 to 40 years of age and reside in all 50 states of the U.S. With continued support from the National Science Foundation, the LSAY will launch a new cohort of 7th grade students in the fall of 2015. The new cohort (called Cohort 3) will be exactly one generation younger than the students in the 1987 cohorts, allowing a generational comparison of changes in American family life, schooling, and society. The original cohorts were designed to study the factors related to student interest in science and mathematics, the development of skills in those disciplines, the selection of careers, and the development of sufficient scientific literacy to perform citizenship responsibilities in a democratic society. Cohort 3 will explore the same questions over the next two decades. The first 20 years of LSAY data are available through the Inter-university Consortium for Political and Social Research (ICPSR).
