Directed Study: EDM and LA: February 2014

Thursday, February 20, 2014

Knowledge Inference or Latent Knowledge Estimation

Ryan Baker, in week four of the Big Data in Education course, provides various examples of methodologies all focused on "measuring what a student knows at a specific time," or knowledge inference. This is "often operationalized as measuring what relevant knowledge components a student knows at a specific time." He then defines a knowledge component as "anything a student can know, that's meaningful to the current learning situation, which might include skills, knowledge of facts, knowledge of concepts, knowledge of principles, knowledge of schemas." In short, he says, "anything a student can know, can be a knowledge component."

Baker suggests at least three reasons why it is useful to measure what a student knows:

Because education's primary goal is enhancing student knowledge, by measuring it, "you know whether you're making it better."
In addition, if you measure what a student knows, you can inform instructors and other stakeholders about it.
Finally, "if you can measure it, you can make automated pedagogical decisions."

He cautions that measuring what a student knows is different from measuring performance. Specifically, he notes, "inferring if a student's performance right now is associated with successfully demonstrating a skill, is not the same as knowing whether the student has this latent skill." A student could guess and perform well, or a student could slip up despite actually knowing the skill. Baker suggests that one way to tell the difference is to look at performance not just "at one specific moment," but over time, to observe patterns in performance.
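One widely used knowledge-inference method, Bayesian Knowledge Tracing (BKT), builds this guess/slip distinction directly into its model. BKT is my choice of illustration here, not something quoted above, so treat the following as a minimal sketch of that technique rather than Baker's own code. The parameter values (initial knowledge 0.3, guess 0.2, slip 0.1, learning 0.15) are arbitrary assumptions, not fitted estimates.

```python
def bkt_update(p_known, correct, p_guess, p_slip, p_learn):
    """One Bayesian Knowledge Tracing step: condition P(known) on a response,
    then account for the chance of learning at this practice opportunity."""
    if correct:
        # A correct answer comes from knowing and not slipping, or from guessing.
        evidence = p_known * (1 - p_slip)
        total = evidence + (1 - p_known) * p_guess
    else:
        # A wrong answer comes from slipping, or from not knowing and not guessing.
        evidence = p_known * p_slip
        total = evidence + (1 - p_known) * (1 - p_guess)
    posterior = evidence / total
    return posterior + (1 - posterior) * p_learn


def trace_skill(responses, p_init=0.3, p_guess=0.2, p_slip=0.1, p_learn=0.15):
    """Return the estimated P(known) after each response (1 = correct, 0 = wrong).
    Default parameter values are illustrative assumptions."""
    p = p_init
    estimates = []
    for response in responses:
        p = bkt_update(p, bool(response), p_guess, p_slip, p_learn)
        estimates.append(p)
    return estimates


print([round(p, 2) for p in trace_skill([0, 1, 1, 1])])
```

Because any single correct answer might be a guess, the estimate climbs gradually across a run of correct responses instead of jumping straight to certainty. That is the "pattern over time" idea in model form.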

Observing patterns in performance is at the center of the research I am conducting in the context of online spreadsheet learning. An individual's performance over time does show signs of misconceptions. By identifying the underlying misconceptions in terms of knowledge components, intervention efforts can be focused on specific knowledge components instead of the traditional response: "please try again."
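Here is a rough illustration of focusing feedback on knowledge components. The sketch assumes each logged attempt has already been tagged with a knowledge component; the labels below are hypothetical, not from my study. It flags components with a high error rate, and the 50% threshold and three-attempt minimum are arbitrary assumptions.

```python
from collections import defaultdict


def kcs_needing_help(attempts, threshold=0.5, min_attempts=3):
    """attempts: iterable of (knowledge_component, correct) pairs for one learner.
    Return the knowledge components whose error rate meets `threshold`,
    ignoring components with too few attempts to judge."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for kc, correct in attempts:
        totals[kc] += 1
        if not correct:
            errors[kc] += 1
    return sorted(
        kc for kc in totals
        if totals[kc] >= min_attempts and errors[kc] / totals[kc] >= threshold
    )


# Hypothetical spreadsheet knowledge-component labels, for illustration only.
log = [
    ("absolute_reference", False), ("absolute_reference", False),
    ("absolute_reference", True), ("sum_range", True),
    ("sum_range", True), ("sum_range", False),
]
print(kcs_needing_help(log))  # ['absolute_reference']
```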

One methodology that seems especially relevant to this research is Learning Factors Analysis. I will write a follow-up post about this particular methodology.
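For context ahead of that post: Learning Factors Analysis evaluates candidate knowledge-component models with a logistic model usually called the Additive Factor Model (AFM). In AFM, the log-odds that student i answers item j correctly is theta_i + sum over k of q_jk * (beta_k + gamma_k * T_ik). Here theta_i is the student's proficiency, q_jk marks whether item j uses component k, beta_k is the component's easiness, gamma_k its learning rate, and T_ik the student's prior practice opportunities on it. The minimal sketch below only computes that prediction for given, made-up parameter values. Fitting the parameters and LFA's search over alternative knowledge-component models are left out.

```python
import math


def afm_p_correct(theta, q_row, beta, gamma, opportunities):
    """Additive Factor Model prediction for one student on one item.

    theta: the student's proficiency
    q_row: 0/1 flags, one per knowledge component, from the item's Q-matrix row
    beta: easiness of each knowledge component
    gamma: learning rate of each knowledge component
    opportunities: the student's prior practice count on each component
    """
    log_odds = theta + sum(
        q * (b + g * t) for q, b, g, t in zip(q_row, beta, gamma, opportunities)
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative (made-up) parameters: one item that uses the first of two KCs.
first_try = afm_p_correct(0.0, [1, 0], [-0.5, 0.2], [0.3, 0.1], [0, 0])
after_practice = afm_p_correct(0.0, [1, 0], [-0.5, 0.2], [0.3, 0.1], [4, 0])
print(first_try, after_practice)
```

With a positive learning rate, the same student is predicted to do better on the item after four practice opportunities. When a proposed knowledge component fails to show that kind of learning curve in real data, that is a signal the knowledge-component model may need revising.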

Thursday, February 13, 2014

Datafication and Digitization

I found another important idea in Big Data: A Revolution: the concepts of datafication and digitization. Datafication is turning information about the world into a reusable, reviewable form. In the old days, that meant bookkeeping, logs, and journals. Nowadays, it includes logs of user behavior with systems, stored in electronic databases. Digitization is taking hard-copy information and making it electronic. This would include Google's efforts to scan the world's books and store them in electronic format. This effort becomes datafication when Google OCRs the books so that people can search for words and patterns between words.
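To make the digitization/datafication distinction concrete: a scanned page image is digitized, but once OCR has turned it into text, its words can be indexed. The sketch below is a minimal inverted index with phrase search over hypothetical OCR output. It is not how Google Books actually works. It only illustrates why text is searchable in a way that an image is not.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the (page, position) pairs where it occurs."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages):
        for position, word in enumerate(re.findall(r"[a-z']+", text.lower())):
            index[word].add((page_no, position))
    return index


def pages_with_phrase(index, phrase):
    """Return the pages on which the words of `phrase` appear consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    # Start from every occurrence of the first word...
    matches = set(index.get(words[0], set()))
    # ...and keep only those followed by each later word at the right offset.
    for offset, word in enumerate(words[1:], start=1):
        occurrences = index.get(word, set())
        matches = {(p, pos) for p, pos in matches if (p, pos + offset) in occurrences}
    return sorted({page for page, _ in matches})


ocr_pages = ["The flu spread fast", "Flu season", "They tracked the flu"]
print(pages_with_phrase(build_index(ocr_pages), "the flu"))  # [0, 2]
```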

We are digitizing learning. We put the syllabus and needed files online so we no longer have to worry about printing costs. We ask learners to turn things in online, to watch videos made available online, or to participate in online discussions. We datafy this when we capture information about these objects and interactions: where things are stored, how they were used, how many times they were used, and who used them.
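In practice, datafying these interactions usually means recording each one as an event. The sketch below assumes a hypothetical event record with a user, a resource, an action, and a timestamp. Real LMS log schemas differ. The sketch summarizes how many times each resource was used and by how many distinct users.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    user: str
    resource: str   # e.g., a file, video, or discussion thread
    action: str     # e.g., "view", "download", "post"
    when: datetime


def usage_summary(events):
    """Return {resource: (number of uses, number of distinct users)}."""
    uses = defaultdict(int)
    users = defaultdict(set)
    for event in events:
        uses[event.resource] += 1
        users[event.resource].add(event.user)
    return {resource: (uses[resource], len(users[resource])) for resource in uses}


t = datetime(2014, 2, 13, 9, 0)
events = [
    Event("alice", "syllabus.pdf", "view", t),
    Event("alice", "syllabus.pdf", "view", t),
    Event("bob", "syllabus.pdf", "download", t),
    Event("bob", "lecture1.mp4", "view", t),
]
print(usage_summary(events))  # {'syllabus.pdf': (3, 2), 'lecture1.mp4': (1, 1)}
```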

The power of Big Data comes from having lots of data to analyze. Our ability to do Big Data in education, however, is limited. We have digitized a lot of learning but have barely scratched the surface of the data that is currently available and the data that could yet be created. It is common to be able to access page views, login counts, and the number of messages sent from the LMS, but more could be done. Time spent learning, mouse tracking, and information about on-task and off-task behavior could all be harnessed. Woolf et al. (2009) describe efforts to capture learners' emotions using mouse pressure, seat pressure, facial recognition, and other biosensor systems. The challenge of our day is to keep finding ways to datafy the learning experience.
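Time spent learning is one of the easier measures to derive from existing logs. A common heuristic sums the gaps between a learner's consecutive events and discards gaps long enough to suggest the learner walked away. The sketch below implements that heuristic. The five-minute cutoff is an assumption, and the right value depends on the activity.

```python
def time_on_task(timestamps, idle_cutoff=300):
    """Estimate seconds on task from event timestamps (in seconds).

    Gaps between consecutive events are summed, except gaps longer than
    `idle_cutoff`, which are treated as off-task time and skipped.
    """
    ordered = sorted(timestamps)
    return sum(
        later - earlier
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier <= idle_cutoff
    )


print(time_on_task([0, 60, 120, 1000, 1030]))  # 150
```

Time after a learner's last event is never counted, so this estimate tends to run low. That is one reason richer signals like mouse tracking are attractive.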

Learning Management Systems are trying to datafy more and more of users' interactions and make that data available to others; however, not all learning takes place in an LMS. School administrative systems contain useful learner data, such as GPA, course completion statuses, exam scores, and final course grades. Tin Can (also known as the Experience API, or xAPI) is a specification for recording data from LMSs, website visits, and other online interactions in a common format. That data is stored in a learning record store, where it can be made available in useful ways. Still, much learning does not take place online at all. We cannot get the true value of Big Data analysis of learning until we have lots of data about the full range of the learning experience. Finding ways to capture data about face-to-face interactions is key.
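An xAPI (Tin Can) statement records a single experience as an actor, a verb, and an object. The sketch below builds one as a Python dictionary and serializes it to JSON. The learner, email address, and activity ID are hypothetical. Actually sending the statement to a learning record store, which takes an authenticated HTTP request, is not shown.

```python
import json


def make_statement(name, email, verb, activity_id, activity_name):
    """Build a minimal xAPI statement: who (actor) did what (verb) to what (object)."""
    return {
        "actor": {"objectType": "Agent", "name": name, "mbox": "mailto:" + email},
        "verb": {
            # Assumes `verb` is one ADL already defines, such as "completed" or "attempted".
            "id": "http://adlnet.gov/expapi/verbs/" + verb,
            "display": {"en-US": verb},
        },
        "object": {
            "objectType": "Activity",
            "id": activity_id,
            "definition": {"name": {"en-US": activity_name}},
        },
    }


statement = make_statement(
    "Example Learner", "learner@example.com", "completed",
    "http://example.com/activities/spreadsheet-lesson-1", "Spreadsheet Lesson 1",
)
print(json.dumps(statement, indent=2))
```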

A final challenge, of course, is student privacy rights. Who should have access to student data? How should it be used ethically? How should it be stored? What can students do about access to their data? As more and more data becomes available, and using that data becomes more widespread, these privacy questions will demand more attention.

Woolf, B., Burleson, W., Arroyo, I., Dragon, T., Cooper, D., & Picard, R. (2009). Affect-aware tutors: Recognising and responding to student affect. International Journal of Learning Technology, 4(3/4), 129–164.

Wednesday, February 12, 2014

Case Study Reviews (2 of 2)

This is the second in a two-part series reviewing case studies of learning analytics. The examples in this and the previous post come directly from the LAK 2013 online course by George Siemens. The purpose of these posts is to look more closely at the kinds of data being used and how those data are used.

Building a Purpose Network to Increase Student Engagement and Retention
Purpose: Improve Retention
How: Increasing student engagement, parent engagement, tracking success measures
Data: (from picture) late to class, missed assignments, absence, illness/fatigue, substance abuse, other

This system provides dashboards and comparisons grouped by course, instructor, and year in school. It also tracks action plans and the recurrence of issues. I can't help but think that most intervention-focused systems, such as this one, carry out interventions without involving the instructor. This may not be a bad thing, but it limits the intervention to generalizable academic skills, responsibility, or social scaffolds, which are different from the student's experience with the learning content itself.

Efficiencies, Learning Outcomes Bolstered by Analytics, Data-Informed Decision Making
Purpose: Improve scheduling (room resources), the success rate for Math 010, and course offerings
Data: Student enrollment (by course), success rate for Math 010, career interest data, transferability (to other schools)

This article lists three or four efforts to apply data to school-level decisions: how to allocate campus resources more effectively, how to improve the success rate of Math 010, and how to align course offerings with students' transfer objectives.

Improving Retention by Identifying and Supporting "At-Risk" Students
Purpose: Identify, support "at-risk" students
Data: Virtual Learning Environment (VLE) activity data from students in an online course

This kind of data is relatively new compared to what a traditional classroom produces, and it affords new measures of student engagement. Interestingly, the article observes that comparing all students to a single engagement profile is not an effective way to identify risk. One student may be highly active in the VLE yet learn less than another student who shows little VLE activity but scores high in the course. The recommendation, therefore, is to compare a student's VLE activity to that same student's previous VLE activity. The article also notes, however, that different modules in the course affect a student's VLE activity. Overall, the application introduces new data that is harder to obtain in traditional face-to-face classes and offers initial recommendations for using that data to identify at-risk students.
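The recommendation to compare a student with their own earlier activity, rather than with one class-wide profile, can be sketched as a z-score against that student's history. This is an assumed formulation, not the method from the case study. The cutoff of -1.5 is arbitrary. The sketch also ignores the module effects the article warns about; comparing only weeks from similar modules would be one way to address them.

```python
from statistics import mean, stdev


def activity_dropped(weekly_counts, z_cutoff=-1.5):
    """weekly_counts: one student's VLE actions per week, oldest first.

    Flag the latest week if it falls well below that same student's
    earlier weeks, measured as a z-score against their own history.
    """
    if len(weekly_counts) < 3:
        return False  # need at least two earlier weeks to form a baseline
    history, current = weekly_counts[:-1], weekly_counts[-1]
    spread = stdev(history)
    if spread == 0:
        return current < history[0]
    return (current - mean(history)) / spread <= z_cutoff


print(activity_dropped([40, 42, 38, 41, 10]))  # True: sharp drop for an active student
print(activity_dropped([5, 6, 5, 6, 5]))       # False: consistently low but steady
```

The second student would look "at risk" against a class-wide engagement profile, but measured against their own baseline nothing has changed. That is the distinction the article draws.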

Thursday, February 6, 2014

Dethroning Causation

I'm reading a book called Big Data: A Revolution That Will Transform How We Live, Work, and Think, by Mayer-Schönberger and Cukier. I just got to the chapter "Correlation," which discusses how correlation is dethroning causation. I remember my introduction to research and Statistics 1 classes, where causation was trumpeted as the gold standard of research. The aim of traditional research has been to take data from a small group and use it to make claims about a larger group.

Big data shakes things up because we can now get lots of data from the larger group itself. The book reviews an example in which Google was able to predict the spread of the flu from the search terms people entered into Google. The search terms didn't cause the spread, but the correlation was extremely valuable to health organizations and governments wanting to target interventions and to track or stop the spread.
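At its simplest, the relationship Google exploited is a correlation between two time series, such as weekly counts of a search term and weekly reported flu cases. Google's actual model screened millions of candidate search terms and was more involved than this. The sketch below only shows the Pearson correlation coefficient, which measures how closely two series move together.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences (-1 to 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - my) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)
```

A coefficient near 1 says the two series rise and fall together. It says nothing about which one, if either, causes the other.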

Big data in education means we can look for meaningful relationships among all sorts of data being produced in electronic learning contexts. Much of the data we can get from the electronic world would be impractical for people to track in a traditional face-to-face classroom. Machines can capture, store, process, and analyze information much more efficiently than people can. We can look at anything from big groups, like all student data across the university, to a single user's data from one learning activity. And generalizing doesn't seem so important anymore. The machine can learn and revise its predictive and classification models as it gathers more data.
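One simple way a model can revise itself as data arrives is online learning. The sketch below is a logistic regression updated by stochastic gradient descent, one observation at a time. The features (say, a student's weekly activity measures) and the outcome (say, passing a course) are hypothetical, and the learning rate of 0.1 is an arbitrary assumption.

```python
import math


class OnlineLogisticModel:
    """Logistic regression whose weights are nudged after every new observation."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return the predicted probability that the outcome is 1."""
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, outcome):
        """One stochastic gradient descent step on the log loss for (features, outcome)."""
        error = self.predict(features) - outcome  # derivative of log loss w.r.t. z
        self.weights = [
            w - self.learning_rate * error * x for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticModel(n_features=1)
for _ in range(200):  # a stream of (made-up) observations
    model.update([1.0], 1)
    model.update([-1.0], 0)
print(round(model.predict([1.0]), 2), round(model.predict([-1.0]), 2))
```

Nothing is retrained from scratch: each new record shifts the weights slightly, so the model keeps tracking the data as it accumulates.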

I have thought more and more about what value theory or traditional empirical research still has. If you just need to let the data speak to find meaningful relationships and patterns, why does the "why" matter as long as the "what" helps us get things done? From what I have learned from Dr. Gibbons's Explore, Explain, Design framework (Gibbons & Bunderson, 2005), design work is becoming more valuable in the world of big data than the traditional explain work done by science. Design work focuses on achieving outcomes, rather than explaining why outcomes were achieved. Chris Anderson made this very argument in Wired: http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

So, why do theory and explain research matter? As Mayer-Schönberger and Cukier argue, we use theories and models to build the algorithms that analyze data, and we use theories and models to make sense of the relationships discovered. "After all," they write, "Google used search terms as a proxy for the flu, not the length of people's hair." Theory and models increasingly matter because they enable us to make sense of all the data the digital world makes available to us.

Saturday, February 1, 2014

Case Study Reviews (1 of 2)

Recently, I read some examples of cases where learning analytics was being applied. Below is a short description of the kinds of data in each case and what each case was trying to explain. It is interesting to me that these four do not center on data that a teacher would use to evaluate instruction or learning. Most of this data centers on institutional statistics rather than more traditional teacher-based data (except for grades), such as evaluations of a learner on a particular skill or knowledge component. Another observation is that relatively little NEW data is being captured. In most cases, new connections between existing data are examined, evaluated, and regressed, but essentially no new data is captured. I suppose one reason for this is that it is easier to use available data than to create a new data collection process and infrastructure at the institution level. Instructors may have an advantage here: they can measure or observe new data points that could improve the diagnostic and intervention capacity of an instructor or instructional technology.

Cases and Examples of Learning Analytics	Notes
SNAPP Overview	A pre, 2002-built system using social network analysis to identify, measure, track, and explain interactions (IMHO, a proxy for learning)
Austin Peay State University: Degree Compass	Recommends courses a student should take based on the grade the student is predicted to receive, using course and course-grade data from past students
Two Case Studies of Learner Analytics in the University System of Maryland	UMED - Using pre-enrollment data (housing paid, tuition paid, enrolled/no data, admissions) to address retention and graduation rates for at-risk student populations. BSU - A fascinating collection of data, but it is too early in implementation to tell how the data connect to intervention (lots of ideas, but little tried yet); technology integration is impressive (student services, financial aid, recruitment/admissions, registration/advising, academic services (testing services, learning communities, tutoring), curriculum & instruction (course roster, class schedule, attendance, assignment scores, official grades))
Analytics in Progress: Technology Use, Student Characteristics, and Student Achievement	Using LMS usage data and student characteristics (gender, race/ethnicity, family income, previous academic achievement) to correlate with student achievement
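SNAPP's core idea, treating forum replies as a social network, can be sketched briefly. The code below is not SNAPP's implementation. It assumes reply data arrives as hypothetical (author, replied-to) pairs and treats the network as undirected. For each student, it counts distinct conversation partners (their degree in the network), so students with no connections stand out.

```python
def interaction_degree(replies, roster):
    """replies: (author, replied_to) pairs from a discussion forum.
    roster: every enrolled student, so that silent students still appear.
    Return {student: number of distinct people they exchanged replies with}."""
    partners = {student: set() for student in roster}
    for author, replied_to in replies:
        if author == replied_to:
            continue  # ignore replies to one's own post
        partners.setdefault(author, set()).add(replied_to)
        partners.setdefault(replied_to, set()).add(author)
    return {student: len(people) for student, people in partners.items()}


degrees = interaction_degree(
    replies=[("ben", "ann"), ("cy", "ann"), ("ann", "ben")],
    roster=["ann", "ben", "cy", "dee"],
)
print(degrees)  # {'ann': 2, 'ben': 1, 'cy': 1, 'dee': 0}
print([s for s, d in degrees.items() if d == 0])  # ['dee']
```

Like the other cases above, this uses data the forum already records. The new value comes from connecting it, not from collecting anything new.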