Directed Study: EDM and LA: Methodologies

This page is a basic description of methodologies that have been used in education to analyze and use data collected through data mining.

1. Prediction

Prediction is where you try to determine some aspect of a variable based on a pattern shown in other variables. The variable of interest is the predicted variable, and the other variables used to predict it are the predictor variables.

Some questions relevant to prediction could include:

1. A student is watching a video right now. Based on other data that can be collected, is that student bored or frustrated?
2. How likely is a student to know the answer in the current problem based on their work in previous problems?
3. Based on academic history, what might the student's college entrance exam score look like?

Application:

1. Look for where students get bored or frustrated to know where to focus improvements to the instructional design.
2. Offer help or recommendations based on the emotional state that a student is predicted to be in.
3. Inform a teacher if a student is predicted to be frustrated so the teacher can know to offer help.

a. Regression

There are several types of regression. The most basic is simple linear regression. The goal of simple linear regression is to build a linear model (called a "regressor" in data mining lingo) that can be used to predict the value of a predicted variable based on the value of a predictor variable. When the relationship in the data is not linear, the data can often be transformed (for example, by taking a logarithm) so that linear regression analysis still applies.
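
Here is a minimal sketch of simple linear regression using ordinary least squares in plain Python. The quiz and exam scores are made-up numbers, and the function name fit_line is just a hypothetical name chosen for this illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: average quiz score (predictor) and final exam score (predicted).
quiz = [60, 70, 80, 90]
exam = [65, 72, 79, 86]

slope, intercept = fit_line(quiz, exam)

# Use the fitted line (the "regressor") to predict for a new student.
predicted_exam = slope * 75 + intercept
```

The fitted line is the model; once you have the slope and intercept, predicting for a new student is just plugging in their predictor value.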

If the data is not linear but has a meaningful structure, you can apply nonparametric regression trees, which break the range of the predictor into parts that match the structure and model each part separately. For instance, the data may rise on a graph, flatten out at some point, and later decline. You could break the model into three parts, each predicting y from x over its own segment of the data.

Regression could be used to answer these types of questions:

1. Based on a student's performance, what would be their final exam score?
2. This sophomore has X GPA. What is their likelihood of completing college?
3. With the current textbook demand for this class, how many should we plan to purchase next semester for X number of students?

b. Classification

Classification is used to predict in what group or category ("label" in data mining lingo) a person with a particular data pattern would fit best. The patterns that have been established are called classifiers. The possible types of data to build classifiers are called features.

Algorithms to develop classifiers include:

* Step regression
-used for binary classification (0, 1)
-uses a linear regression function
-an arbitrary cut-off point is determined (>=0.5, or >=0.7, etc)
-you select your features, determine each feature's weight (Y=0.5 a + 0.2b - 0.3c + 0.4 d), apply the algorithm to the data, and see if it makes the cut-off point. If it does, the data pattern would be tagged with a determined label.
-interaction effects may alter findings (fast responses could be a good thing or a bad thing based on what other features may be in interaction--like whether student already knows the material or not)
* Logistic regression
-also used for binary classification (0, 1)
-determines the probability of an observed value (set of values) of fitting under a certain label
-multinomial logistic regression can do more groups for classification, other than just the two.
-interaction effects may alter findings
* J48/C4.5 decision trees
-specifically deals with interaction effects
-if X > 0.5 and Y < 0.4, then Z
-if X > 0.5 and Y > 0.4, then Q
-if X < 0.5 and Y < 0.4, then R
-etc
-can handle numerical or categorical predictor variables
-branches can be adjusted/pruned based on predictive power
-good for when data has natural splits (bimodal distribution)
-good for when same construct can be arrived at in multiple ways

Essentially, a decision tree analyzes the variables and tries to determine which variable best splits the data into two groups, where members of a group are most like each other with regard to the variable being predicted. Then, another variable is identified that best splits a group again, and so on, until the members of each group are as alike as possible. From there, you can classify the group, or predict a value of a target variable in relation to the group.

Suppose you want to identify who feels comfortable with technology and who doesn't, and you know a lot about a group of people. You find that your most informative variable is whether people have used technology before. Now you have two groups: those who have used technology and those who haven't. Within the group that has used technology, perhaps you find that people over 50 likely don't feel comfortable using technology, while those under 50 tend to feel comfortable. You could continue to divide groups until you have identified the specific combinations of variables that best determine whether someone will be comfortable using technology or not.
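
The "which variable best splits the data" step can be sketched in a few lines. This is only an illustration under stated assumptions: the survey rows are made up, the function names are hypothetical, and it scores splits with Gini impurity, whereas C4.5 itself uses an entropy-based gain ratio.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, label):
    """Weighted Gini impurity after splitting rows on a yes/no feature."""
    yes = [r[label] for r in rows if r[feature]]
    no = [r[label] for r in rows if not r[feature]]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_split(rows, features, label):
    """Return the feature whose split leaves the purest groups."""
    return min(features, key=lambda f: split_impurity(rows, f, label))


# Made-up survey rows: has the person used technology, are they over 50,
# and do they report feeling comfortable with technology?
people = [
    {"used_tech": True, "over_50": False, "comfortable": True},
    {"used_tech": True, "over_50": False, "comfortable": True},
    {"used_tech": True, "over_50": True, "comfortable": False},
    {"used_tech": True, "over_50": False, "comfortable": True},
    {"used_tech": False, "over_50": True, "comfortable": False},
    {"used_tech": False, "over_50": False, "comfortable": False},
    {"used_tech": False, "over_50": False, "comfortable": False},
]

# First split: which variable separates comfortable from uncomfortable best?
first = best_split(people, ["used_tech", "over_50"], "comfortable")

# Second split, inside the group that has used technology.
tech_users = [p for p in people if p["used_tech"]]
second_impurity = split_impurity(tech_users, "over_50", "comfortable")
```

With this toy data the first split lands on used_tech, and splitting the technology users by age leaves groups with no mix of labels, which mirrors the example above.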

* JRip decision rules
-An algorithm (Weka's implementation of RIPPER) that builds and prunes a set of if-then decision rules
* K* instance-based classifiers
-Instance-based classifiers are a set of algorithms used in machine learning, often for knowledge discovery. They compare each new instance to previously stored ones and classify it based on the most similar stored cases.
-This approach is closely related to nearest neighbor methods, which can be used for prediction in general. If a case with an unknown quality is very similar to three other cases, we might use the value of the target quality from those three cases to predict the unknown one (by majority vote [probability], mean, or median).
* Bayesian Statistics and Class-Probability Estimation
-Given the available evidence, what is the probability that this instance belongs to this class?
* etc.
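
The nearest neighbor idea from the list above can be sketched as a majority vote among the three most similar cases. The student data, the two features, and the labels are all made up for illustration, and knn_predict is a hypothetical name.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # train is a list of (features, label) pairs; distance is Euclidean.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up features: (hours online per week, assignments submitted)
students = [
    ((10, 9), "on track"),
    ((9, 8), "on track"),
    ((8, 9), "on track"),
    ((2, 3), "at risk"),
    ((1, 2), "at risk"),
    ((3, 1), "at risk"),
]

label_a = knn_predict(students, (9, 9))
label_b = knn_predict(students, (2, 2))
```

For a numeric target you would swap the majority vote for the mean or median of the neighbors' values.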

Which algorithms you use often depends on the domain you are working in.
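
As one concrete case from the list of algorithms above, step regression boils down to a weighted sum and a cut-off. The sketch below reuses the weights from the example formula (Y = 0.5a + 0.2b - 0.3c + 0.4d); the feature values for the two students are invented, and step_regression_label is a hypothetical name.

```python
def step_regression_label(features, weights, cutoff=0.5):
    """Return 1 if the weighted sum of the features meets the cut-off, else 0."""
    score = sum(weights[name] * value for name, value in features.items())
    return 1 if score >= cutoff else 0


# Weights taken from the example formula Y = 0.5a + 0.2b - 0.3c + 0.4d
weights = {"a": 0.5, "b": 0.2, "c": -0.3, "d": 0.4}

# Made-up feature values for two students
student_1 = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}  # weighted sum is about 0.7
student_2 = {"a": 0.0, "b": 1.0, "c": 1.0, "d": 0.0}  # weighted sum is about -0.1

label_1 = step_regression_label(student_1, weights)
label_2 = step_regression_label(student_2, weights)
```

Moving the cut-off up or down trades off how many students get tagged against how sure you want to be before tagging them.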

Classification could be used to answer these types of questions:

1. According to this student's level of participation in this class, are they on track for success, at moderate risk for failure, or likely to drop out?
2. Based on this student's use of the learning management system, what grade are they likely to get?
3. By following this student's performance with this online game, are they learning successfully, gaming the system, or quitting?

c. Latent Knowledge Estimation

Latent knowledge estimation is a type of classifier that assesses a student's knowledge or skills based on patterns of correctness from learning activities, assessments, etc. It almost seems like just simple testing, but it gets more complex because it creates an adaptive model. As the student continues to learn and do, their knowledge estimate continues to change. Latent knowledge estimation is often used by intelligent tutoring systems to direct the system as to what to do next to help students achieve objectives. This might be the kind of thing Khan Academy would be interested in for developing its math tutoring system.

* Bayesian Knowledge Tracing

Predicts the probability that a student has mastered a skill based on previous performance. Also has been used to update the model that determines whether mastery has been achieved.
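
A minimal sketch of one Bayesian Knowledge Tracing update, using the standard four-parameter form (prior knowledge, learn, guess, slip). The parameter values and the answer sequence are assumptions picked for illustration, not fitted to any data, and bkt_update is a hypothetical name.

```python
def bkt_update(p_known, correct, p_learn, p_guess, p_slip):
    """One Bayesian Knowledge Tracing step: return the updated P(skill known)."""
    if correct:
        # Correct answer: the student knows the skill and did not slip,
        # or does not know it and guessed.
        evidence = p_known * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        # Wrong answer: the student knows the skill and slipped,
        # or does not know it and failed to guess.
        evidence = p_known * p_slip
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn


p = 0.3  # assumed prior probability that the skill is already known
for answer in [True, True, False, True]:
    p = bkt_update(p, answer, p_learn=0.1, p_guess=0.2, p_slip=0.1)
```

A tutoring system could compare p against a mastery threshold (0.95 is a common choice) to decide whether to keep giving practice on the skill.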

* Performance Factors Analysis

2. Structure Discovery

With prediction, you define a structure that you think will show patterns. There is an end goal of what you hope to predict. In structure discovery, you go into a data set not knowing what you'll find. You use algorithms to search the data to discover what patterns exist.

a. Clustering

Takes the data set and groups similar types of data points together in clusters. Imagine seeing a scatterplot where two groups seem to be clearly forming. This is the purpose of clustering--to identify what groups seem to be forming from the data. Clustering itself is not an algorithm, but rather what you want to do with data. Then you'd go find appropriate methods of analysis to do it. Wikipedia gets into models and algorithms to do clustering: http://en.wikipedia.org/wiki/Cluster_analysis
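
As one example of a clustering method, here is a bare-bones k-means sketch: assign each point to its nearest center, move each center to the average of its points, and repeat. The points and starting centers are made up, and choosing k-means (rather than any other clustering algorithm) is my assumption for the illustration.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up scatterplot with two visible groups
points = [(1, 2), (2, 1), (1.5, 1.5), (8, 9), (9, 8), (8.5, 8.5)]
centers, groups = k_means(points, [(0, 0), (10, 10)])
```

Note that you have to tell k-means how many clusters to look for; other clustering methods make different trade-offs.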

b. Factor Analysis

Similar to clustering, but rather than grouping data points, it groups like variables together--I assume that this is based on correlation.

c. Social Network Analysis

Examines relationships among members of a group. Could be used to compare the strengths of different groups of students where the groups were made to complete a project, how socially engaged students are in a learning experience, or how their position in a network relates to their perceptions of how they fit in the community.
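
One simple way to describe a student's position in a network is degree, the number of direct connections they have. The sketch below counts degree from a made-up list of forum replies; the names and links are invented, and degree is only one of many possible network measures.

```python
from collections import Counter

# Made-up "replied to" links between students in a discussion forum
links = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Dee"), ("Ben", "Cy")]

# Count how many links each student takes part in.
degree = Counter()
for a, b in links:
    degree[a] += 1
    degree[b] += 1

most_connected = degree.most_common(1)[0][0]
```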

d. Domain Structure Discovery

The purpose of this analysis is to find structures of knowledge within a domain. In education, this is often used to discover relationships between skills and test items or educational content (e.g., based on patterns, what do students need to know to do ___? or what test items are getting at critical thinking?).

3. Relationship Mining

Relationship mining is similar to structure discovery in that you are looking to learn from what exists in the data. Specifically with relationship mining, you are looking for which variables in the data correlate, or which variables have the strongest correlation, or which variables correlate most with a particular variable of interest.

a. Association Rule Mining & Co-Occurrence

With association rule mining, you are trying to determine which set of variables correlates with the value of a target variable. Baker & Siemens refer to Ben-Naim et al. (2009), who looked to see which learning behaviors correlated most with academic performance so as to be able to inform struggling students what might be done to improve. With association rule mining, you are creating an if-then statement: if these variables are so, then this variable should be so.

With association discovery, you are trying to determine what tends to happen when something else happens. If customers bought x, how often did they also buy y? We can also estimate how likely it is that the association exists, or determine the strength of the association. How likely is an association to be due to chance?

We could use a similar approach to determine how often two actions follow each other, such as in a learning sequence.
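
Two common measures for an if-then rule are support (how often the items appear together at all) and confidence (how often the "then" part holds when the "if" part does). Here is a small sketch; the student sessions and behavior names are made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, if_items, then_items):
    """Of the transactions matching the 'if' side, the share also matching 'then'."""
    both = support(transactions, set(if_items) | set(then_items))
    return both / support(transactions, if_items)


# Made-up student sessions, each a set of observed behaviors and outcomes
sessions = [
    {"watched_video", "did_quiz", "passed"},
    {"watched_video", "did_quiz", "passed"},
    {"watched_video", "passed"},
    {"did_quiz"},
    {"watched_video", "did_quiz"},
]

# Rule: if a student watched the video and did the quiz, then they passed.
rule_support = support(sessions, {"watched_video", "did_quiz"})
rule_confidence = confidence(sessions, {"watched_video", "did_quiz"}, {"passed"})
```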

b. Correlation Mining

Looking to see which variables have positive or negative linear correlations with each other.
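
A minimal sketch of the Pearson correlation coefficient, which ranges from -1 (perfect negative linear correlation) to +1 (perfect positive). The three variables and their values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data for five students
hours_studied = [1, 2, 3, 4, 5]
absences = [5, 4, 3, 2, 1]
exam_score = [52, 60, 71, 80, 88]

r_hours_absences = pearson_r(hours_studied, absences)  # negative correlation
r_hours_exam = pearson_r(hours_studied, exam_score)    # positive correlation
```

In correlation mining you would run this over many pairs of variables, which is why it matters to account for associations that show up by chance.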

c. Sequential Pattern Mining

Looking to see if patterns exist in a certain order of events in leading to a desired target variable. One example Baker & Siemens gave was Perera et al. (2009), who wanted to find which sequence of behaviors in group work led to successful completion of a group project.
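
A heavily simplified sketch of the idea: count how often one action is immediately followed by another across several sequences. Real sequential pattern mining algorithms also handle longer patterns and gaps between events; the group work sequences below are made up.

```python
from collections import Counter


def count_consecutive_pairs(sequences):
    """Count how often each action is immediately followed by another."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


# Made-up action sequences from groups that finished their project
successful_groups = [
    ["plan", "draft", "review", "submit"],
    ["plan", "draft", "submit"],
    ["draft", "review", "submit"],
]

pair_counts = count_consecutive_pairs(successful_groups)
```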

d. Causal Data Mining

Used to determine whether one event caused another event to occur. Could be useful for prediction.

4. Distillation of Data for Human Judgment

This methodology is not so much about how to analyze data, but how to present analyzed data in a way that can effectively inform meaningful action. This seems to be an emphasis of Learning Analytics.

5. Discovery with Models

A model using methods described above is then used to discover new patterns or make new models. For example, a predictive model may be used to discover what other variables correlate with the predictive model. Baker & Siemens use Beal, Qu, & Lee (2008) as an example: a prediction model of gaming the system was created, and then relationship mining was used to see what learner characteristics correlated most with gaming the system behavior.

* One useful tool John introduced me to was a page on the Pittsburgh Science of Learning Center (PSLC) Data Shop website where you can explore different questions an educational researcher might ask and what tools, methods, or examples could help you identify how to go about answering the question (using data mining and analysis of course). Here is the link to the page: https://pslcdatashop.web.cmu.edu/ResearchGoals#rg_6

from Baker, R. S. J. D., & Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining, 1(1), p. 3

and

Baker, R., & Siemens, G. (in press). Educational data mining and learning analytics. To appear in Sawyer, K. (Ed.), Cambridge Handbook of the Learning Sciences: 2nd Edition.

and the help of Wikipedia
