AI can be easily fooled

April 14, 2022

Fooling AI

Yao Li’s interview with Endeavors. Yao Li is an assistant professor in the Department of Statistics and Operations Research within the UNC College of Arts & Sciences. Her research focuses on developing efficient and robust machine learning models to solve real-world problems.

Q: When you were a child, what was your response to this question: “What do you want to be when you grow up?”

A: When I was around 7 years old, my father bought me a book series, “Five Thousand Years of Chinese Nation,” as a birthday gift. It stimulated my interest in ancient civilizations. So, when I was young, I wanted to be an archaeologist, rediscovering our history. Even now, I’m still very interested in it.

Q: Share the pivotal moment in your life that helped you choose your field of study.

A: To be honest, I chose materials chemistry when I applied to college. But I was reallocated to business school by the admission committee. Then, I selected statistics as my major. I really appreciate that adjustment at this moment because I found out that statisticians can work with people from different fields.

Q: Tell us about a time you encountered a tricky problem. How did you handle it and what did you learn from it?

A: Before I started doing research, I had never taken any coding classes, and the projects I work on need strong coding skills to implement the methods. I only knew a little bit of R, a programming language. The first project I worked on required me to implement our method in a program called MATLAB and make it fast enough to work on large datasets, all within limited time. I learned MATLAB online via video tutorials and completed the task. Later, when I needed to learn Python, C++, and other things, I searched online for resources. I think this is amazing. Technology is changing our lives in many ways, including the way we learn.

In 2017, Li (center) went on a winery tour of Napa Valley with her parents.

Q: Describe your research in 5 words.

A: AI can be easily fooled.

Q: What are your passions outside of research?

A: Reading, traveling, snowboarding, and board games. Reading is an easy and affordable way to experience different lives. Travel is the opportunity to have new experiences and witness different cultures. Both expand the limits of my life.

Hotelling Lectures 2022

April 13, 2022

Bin Yu to deliver Hotelling Lectures

Hotelling Lectures

The Hotelling Lectures are an annual event in the Department of Statistics & Operations Research at the University of North Carolina – Chapel Hill, honoring the memory of Professor Harold Hotelling, our first chairman. This year we are honored to have Professor Bin Yu from the University of California at Berkeley deliver our two Hotelling lectures, which are open to the public.

Biography
Bin Yu is Chancellor’s Distinguished Professor and Class of 1936 Second Chair in the departments of statistics and EECS at UC Berkeley. She leads the Yu Group which consists of students and postdocs from Statistics and EECS. She was formally trained as a statistician, but her research extends beyond the realm of statistics. Together with her group, her work has leveraged new computational developments to solve important scientific problems by combining novel statistical machine learning approaches with the domain expertise of her many collaborators in neuroscience, genomics and precision medicine. She and her team develop relevant theory to understand random forests and deep learning for insight into and guidance for practice.
She is a member of the U.S. National Academy of Sciences and of the American Academy of Arts and Sciences. She is Past President of the Institute of Mathematical Statistics (IMS), a Guggenheim Fellow, Tukey Memorial Lecturer of the Bernoulli Society, Rietz Lecturer of IMS, and a COPSS E. L. Scott prize winner. She holds an Honorary Doctorate from The University of Lausanne (UNIL), Faculty of Business and Economics, in Switzerland. She has recently served on the inaugural scientific advisory committee of the UK Turing Institute for Data Science and AI, and is serving on the editorial board of the Proceedings of the National Academy of Sciences (PNAS).

Veridical Data Science: the practice of responsible data analysis and decision-making
Tuesday, April 19, 2022 (4:00-5:00pm 209 Manning Hall)
Reception following the lecture 5:00-6:00pm in the 3rd Floor lounge of Hanes Hall

“A.I. is like nuclear energy — both promising and dangerous” — Bill Gates, 2019.
Data Science is a pillar of A.I. and has driven most of the recent cutting-edge discoveries in biomedical research and beyond. In practice, Data Science has a life cycle (DSLC) that includes problem formulation, data collection, data cleaning, modeling, result interpretation and the drawing of conclusions. Human judgment calls are ubiquitous at every step of this process, e.g., in choosing data cleaning methods, predictive algorithms and data perturbations. Such judgment calls are often responsible for the “dangers” of A.I. To maximally mitigate these dangers, we developed a framework based on three core principles: Predictability, Computability and Stability (PCS). Through a workflow and documentation (in R Markdown or Jupyter Notebook) that allows one to manage the whole DSLC, the PCS framework unifies, streamlines and expands on the best practices of machine learning and statistics, taking a step forward towards veridical Data Science. In this lecture, we will illustrate the PCS framework through the development of iterative random forests (iRF) for predictive and stable non-linear interaction discovery and through using iRF and UK Biobank data to find gene-gene interactions driving, respectively, red hair and a heart disease called hypertrophic cardiomyopathy.

Interpreting deep neural networks towards trustworthiness
Wednesday, April 20, 2022 (3:30-4:30pm 120 Hanes Hall)
Reception prior to the lecture 3:00-3:30pm in the 3rd Floor lounge of Hanes Hall

Recent deep learning models have achieved impressive predictive performance by learning complex functions of many variables, often at the cost of interpretability. This lecture first defines interpretable machine learning in general and introduces the agglomerative contextual decomposition (ACD) method to interpret neural networks. Extending ACD to the scientifically meaningful frequency domain, an adaptive wavelet distillation (AWD) interpretation method is developed. AWD is shown to both outperform deep neural networks and be interpretable in two prediction problems from cosmology and cell biology. Finally, the lecture advocates a quality-controlled data science life cycle for building any model for trustworthy interpretation and introduces a Predictability, Computability and Stability (PCS) framework for such a data science life cycle.

UNC Science Expo

April 6, 2022

STOR at the UNC Science Expo

UNC Science Expo

This coming Saturday, April 9th, UNC will be holding its annual Science Expo.

Come and join our students and faculty hosting the STOR booth this year! We have prepared a variety of fun games to showcase our department.

Graduate program ranking

April 6, 2022

Graduate program ranked 11th in the nation

Graduate ranking

Numerous University of North Carolina at Chapel Hill graduate programs received high rankings as part of U.S. News & World Report’s “Best Graduate Schools” list.

Our graduate program is ranked 11th among all Statistics programs in the nation.

With a graduate degree, statisticians may find jobs working with data in many sectors, including business, government, academia, public health, technology and other science fields.

CAREER grant

December 28, 2021

Sayan Banerjee receives CAREER grant

CAREER grant

Sayan Banerjee received an NSF CAREER grant for the project “Network Centrality and Its Applications in Detection, Dynamics, and Load Balancing.”

The project aims to develop a universal mathematical understanding of networks that evolve over time and processes that live on them. The key idea is to identify certain network attributes that carry a footprint of a network’s past as it evolves and exploit them in reconstructing the early stages of a network from its current configuration. This can be used to detect the origin of a rumor spread, popular individuals and their influence in a social network, or a source of a disease outbreak. Another key research direction is the systematic understanding of how local interactions in a large network influence its global geometry. This can be used to increase the net efficiency of a network of servers through cooperative local interactions that will have a long-term impact in improving routing schemes at airport security, big supermarkets, and distribution of vaccines and antidotes.

Seeing patterns in the data

December 6, 2021

Data patterns

Mariana Olvera-Cravioto uses mathematical models to understand complex topics. She hopes the new data science minor will make data more accessible to students.

If you type “UNC” into Google, chances are you’ll see a list of links that relate not only to UNC, but also to your personal connection to UNC — whether you’re a student, faculty member or alumnus.

Why is that?

“A set of algorithms operating in the background of your Google search allows this to happen,” said Mariana Olvera-Cravioto, an associate professor in the department of statistics and operations research, or STOR. “In the early ’90s, Google created an algorithm for relevance, and it’s become more and more personalized — now it’s super-personal.”

As social media and internet connectivity become more embedded in daily life, most people are familiar with the concept of algorithms. But the inner workings of why and how they function, and their constant evolution, raise myriad questions for applied probability researchers like Olvera-Cravioto.

“By the time I finish this sentence, a million things will have changed in countless random ways across the internet,” she said. “It’s impossible to capture it exactly.”

To tackle the seemingly impossible challenge of “predicting” patterns online, Olvera-Cravioto and her colleagues use probabilistic models. With these models, they can generate simplified versions of real-world scenarios. Why does one website receive a higher PageRank score than a similar site? What happens if a major server goes down?
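
For readers curious about the PageRank score mentioned above, here is a minimal sketch of the textbook power-iteration version of the idea. It is not Olvera-Cravioto's model or Google's production algorithm. The four-page `toy_web` graph, the function name `pagerank` and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook power-iteration PageRank on a small link graph.

    links maps each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank sitting on pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank on in equal shares to the pages it links to.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical toy web: "a" and "c" link to each other's neighborhood,
# but "c" receives one more incoming link than "a" does.
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["a", "c"],
}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this toy graph the pages "a" and "c" look similar, yet "c" ends up with the higher score because it collects links from more pages. That is the kind of question a simplified model can answer exactly, even when the real internet cannot be captured exactly.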

“The internet is like a brain — it can create new pathways,” she said. “Even if a few key elements break down, it will find a way to keep functioning.”

Visualizing the complexity of the internet and comparing it to neural networks in the brain comes naturally to Olvera-Cravioto. Growing up in Mexico City, she developed a keen interest in math and science, thanks to both of her parents being doctors. She assumed she would become one, too.

“I always enjoyed talking about biology and medicine, but at some point in high school I realized I didn’t like memorizing things — and there is a lot of that in medicine,” she said.

When she entered college and began studying applied math, she knew she had found her calling. “Math was one of those things that always clicked in my brain. It didn’t matter if I felt tired or uninspired, I could always do it.”

Olvera-Cravioto joined STOR in 2018. Now she is leading the new minor in data science program, which she helped design. The minor launched this fall, and she hopes it will attract students from diverse backgrounds and interests. [See story on the new data science minor in The Scoop on page 33.]

“Many of our students will end up in jobs where at some point they will have a data set of information from which they need to obtain knowledge or discern a pattern,” Olvera-Cravioto said.

A marketing professional may look at data for how many people clicked on an online advertisement. A journalist will see how many people liked or shared their articles.

“We’re collecting data everywhere all the time,” Olvera-Cravioto noted. “We need to ensure our students are learning the standard tools to start working with it.”

One of the main goals of the minor is to make data science more accessible.

“It’s something that more and more people need to know how to do, and we want students to not feel so intimidated by it,” she said. “They don’t have to be in a STEM major to use data. And who knows, maybe they’ll like it.”

 By Mary Lide Parker ’10

Emerging patterns in large random systems

December 5, 2021

Random systems

Sayan Banerjee’s interview with Endeavors. Sayan Banerjee is an assistant professor in the Department of Statistics and Operations Research within the UNC College of Arts & Sciences. He studies emerging patterns in large random systems.

Q: When you were a child, what was your response to this question: “What do you want to be when you grow up?”

A: A writer. I spent a lot of time writing poems and short stories and participating in elocution contests. My grandfather was a romantic at heart, and he inspired in me a deep appreciation for literature. I always liked math, but more in the form of solving puzzles that gave that all-too-familiar dopamine rush. It was only later in life that I realized that creativity has a universal appeal, be it through poetry or through mathematical theorems, and I chose the latter avenue.

Q: Share the pivotal moment in your life that helped you choose your field of study.

A: In high school, I subscribed to a math magazine called Mathematics Today, which contained weekly challenges. I started spending more and more time thinking about them and awaiting solutions that were published the following week. I would attribute the start of my mathematical career to those challenges.

Later, at the Indian Statistical Institute, I was lucky to have been taught by several influential Indian mathematicians of the time who introduced me to the more “poetic” side of mathematics. Instead of solving specific problems, the focus there was to develop general abstract theory to set intuition on a logically sound pedestal.

Banerjee and his wife on a recent trip to Juneau, Alaska.

Q: Tell us about a time you encountered a tricky problem. How did you handle it and what did you learn from it?

A: Early in my PhD, I asked my advisor, Krzysztof “Chris” Burdzy, for a thesis problem. I was expecting him to tell me to read a bunch of research papers and extend or generalize an existing result. Instead, he gave me a picture! It was a snapshot of an evolving process comprising a bunch of particles, where there was a “leader” who moved in some random fashion — called a random walk or Brownian motion — and the remaining particles followed the leader. As the number of particles grew, this “Brownian conga line” showed interesting geometric patterns. On seeing the confused look on my face, Chris said that a big part of my PhD will be trying to interpret this picture.

I read, learned, and applied whatever I thought was useful to understand it. Most techniques failed. Some worked. In the course of a year, this problem taught me more about research than anything else before or since. A good research problem is what puts you out of your comfort zone and challenges what you think you know. This one shaped my view of research. I don’t quite care if the problem I am working on concerns the hottest topic, as long as it makes me intellectually curious and leads to new and elegant mathematics.

PS: The “Brownian Conga Line” eventually became the main part of my thesis. It got published in one of the top probability journals.

Q: Describe your research in 5 words.

A: Random thoughts on random things.

Q: What are your passions outside of research?

A: I have always been somewhat of an amateur musician (vocals and some guitar). Both my parents are musically inclined, and they were a big influence growing up. Throughout college, grad school, and my years at UNC, I have been part of amateur musical groups. I have had far more good ideas while strumming my guitar than when I am actually working.

Besides, I am really into coffee. I did most of my research in cafés before the pandemic. I work well with passive distractions like the buzz of a café. During the pandemic, I was going crazy until my wife got me an espresso machine. We ceremoniously take Saturdays off from work to try out different brunch places and hang out in libraries.

We are hiring

September 28, 2021

Faculty Positions

The department seeks to hire three new faculty members with strong teaching and research credentials in any area of the department’s expertise relevant to the theory and applications of data science, including machine learning. The appointments are expected to be at the level of assistant professor, though one may be at the associate professor level. These positions are expected to start July 1, 2022.

The department and university are committed to diversity, equity and inclusion, advancing the ideals espoused at https://diversity.unc.edu. We welcome applications from candidates who will add to the department’s diversity. We will begin considering candidates after November 8, 2021, and will continue accepting applications until the positions are filled.

Come work with us!

Newsletter 2021

September 14, 2021

STORies Newsletter – 2021

Newsletter

Dear Friends,
Welcome to the 2021 annual issue of STORies, which looks back at the department’s past academic year 2020/2021 and considers its outlook.

Starting with the elephant in the room, the COVID-19 pandemic presented many challenges over the academic year 2020/2021, from maintaining the department’s educational and other missions mostly remotely to dealing with financial shortfalls at the university level and most sadly to personal losses in the extended families of our faculty, staff and students. There is some consolation in the fact that we all have been confronting this together.

There is also optimism in that vaccines will allow a return to some form of normal soon. In fact, the university is starting the new academic year 2021/2022 with in-person classes even during another surge in COVID-19 cases. It is still to be seen whether the very high vaccination rates in the university community will allow us to continue with this in-person experience, which is of foremost importance to our students, as well as to their parents, faculty, staff and others.

The pandemic aside, there have been several exciting developments in the department over the past year. We have moved towards improving engagement with donors and alumni and revamping the department’s projected image. This is exemplified by our new website stor.unc.edu. Its launch required huge efforts from several dedicated faculty and contributions from the rest of the department’s community. We expect the webpage to continue improving and certainly reflect the latest departmental activities.

We have continued pushing into the data science/analytics domain. Efforts have been ongoing to expand data science activities at the university level, with the department aiming to play a central role. We are organizing the first Data Science & Analytics Career Fair to be held this fall.

The department and other partners launched a data science minor in the college.

As for our programs, our undergraduate major and minor continue to expand, with a combined total of 1000+ students in each of the last two years. The major was even among Carolina’s top 10 majors from the Class of 2021. The department once again attracted an excellent and diverse incoming class of graduate students, introduced in the STORies below. The class will continue to follow a revamped graduate curriculum, which we constantly work on improving.

Looking ahead at the new academic year 2021/2022, the department’s priorities include the continued push into data science (with a data science major, a data science lab and other initiatives), revamping our MS program, and planning for the celebration of the department’s anniversary. Indeed, it has been 75 years since the founding of the Department of Statistics, and around 50 years since Operations Research started as a curriculum program. The ongoing pandemic has hampered our planning for the anniversaries, but we hope for some celebration in 2022.

Do continue supporting the department in any way you can!

Vladas Pipiras
Department Chair

New data science minor

August 23, 2021

New Data Science Minor

Data Science Minor

Our department introduced a Data Science minor in Fall 2021 that is designed to appeal to students majoring in a broad array of disciplines. The minor is an important component of the soon-to-launch data science initiative, a pan-University effort.

To minor in data science, a student will take five courses in all. Three of the courses fulfill core requirements: Data and Computational Thinking; Data and Statistical Thinking; and Data, Culture and Society. Each core requirement can be met by taking one of several courses. For example, four courses fulfill the computational thinking requirement — two in computer science, one in geography and one in political science.

Beyond fulfilling the three core requirements, students take two elective courses, selecting among dozens of offerings from more than 20 departments — from anthropology to English to linguistics to biology, as well as STOR. Most of the electives are in College academic departments, but the Gillings School of Global Public Health, the Hussman School of Journalism and Media, and Kenan-Flagler Business School also have elective courses in the minor.