PSU assists "big data" analysts with new course
Author: Office of University Communications
Posted: September 24, 2012

(Portland, Ore.) - The explosion of big data, headlined in Time, The Wall Street Journal and 60 Minutes, may worry ordinary citizens, but it also promises new discoveries for scientists. Portland State University (PSU) is now offering those scientists a new tool through a course, “Data Mining With Information Theory.”

“Big data,” a recent coinage, refers both to the exponential growth of specific information and to its accumulation and use in giant computer data banks. Researchers at NIH, the Mars Lander and the National Highway Safety Board, among others, are seeking answers to vexing problems in terabytes (thousands of gigabytes) of data.

In Portland, more than 3,500 researchers at OHSU, Kaiser Permanente, PSU, Reed, Lewis and Clark, University of Portland, Intel and elsewhere have been crunching big data for years. They are struggling to understand diverse natural and social phenomena such as diabetes onset, particle behavior in chips, drug abuse, our galaxy’s topology, and word usage in Shakespeare’s plays.

To assist big data analysts, Martin Zwick, professor of Systems Science at PSU, is making a new tool available to Portland’s research community. In a ten-week class beginning September 25, mid-career professionals in research and data analysis will learn to use Zwick’s brainchild, OCCAM, working alongside, and with help from, the program’s Ph.D. candidates. Twenty years in the making, OCCAM is a computer program that can find subtle, non-obvious relations in data sets with as many as 150,000 cases and hundreds of variables. While this is not “big data” in the sense of terabytes (e.g., sample sizes in the millions, or variables numbering in the tens or hundreds of thousands), it is large enough to be valuable to many researchers, and anticipated parallelization will allow OCCAM to handle big data in the future.

“Other programs such as SPSS have long been in use for modeling,” says Prof. Zwick, “but usually users need to know or guess at the likely relationships in the data. In complex situations, dozens, hundreds, even thousands of explicit hypotheses need to be formulated. Our program, OCCAM, does not require such a priori knowledge or hypotheses about predicting relationships; it directly identifies strong and even very complex relationships in the data, including many that the analyst may not suspect.”
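
For readers curious what a hypothesis-free, information-theoretic search can look like in practice, the short Python sketch below may help. It is only an illustration of the general idea and is not OCCAM's algorithm: it scores every candidate variable by its mutual information with a target variable and ranks the results, so the analyst does not have to name a suspected relationship in advance. The function names and the toy data set are hypothetical and were invented for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Return the mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_predictors(columns, target):
    """Score every column except `target` against it; strongest first."""
    scores = {
        name: mutual_information(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: eight cases, three binary variables.
data = {
    "smoker":   [0, 0, 1, 1, 0, 1, 0, 1],
    "exercise": [1, 0, 1, 0, 1, 0, 1, 0],
    "disease":  [0, 0, 1, 1, 0, 1, 0, 1],
}

ranking = rank_predictors(data, "disease")
for name, bits in ranking:
    print(f"{name}: {bits:.3f} bits")
```

In this toy data, "smoker" matches "disease" case for case, so it shares the full one bit of information with the target and ranks first, while "exercise" shares much less. A real tool would also have to consider combinations of variables and test for statistical significance, which this sketch does not attempt.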

Beginning in fall quarter 2012, Zwick is conducting a laboratory in the practical use of OCCAM, which has already demonstrated its usefulness in such varied fields as biology, medicine, finance, economics, social and environmental studies, mathematics, and engineering. The ten-week course, offered at the customary University fee, will allow students to bring their own data and discover relationships in it. Plans are underway for a short, intensive three- to five-day workshop to be offered in the future.

For more information on the course, contact Prof. Zwick at