
Tuesday, October 27, 2009

Evaluation Methods: Each user is unique. Assess each one first, then look for patterns

On Monday, I talked about my belief, as a novice evaluator and educator, that evaluation (and teaching) should be organized around programmatic goals: describe what every student should learn, study each student's progress toward those goals, and study the program activities that are most crucial (and perhaps most risky) for producing those outcomes.

After some years of experience, however, first at Evergreen and then as a program officer with the Fund for the Improvement of Postsecondary Education (FIPSE), I realized that this Uniform Impact approach was valuable but limited.

In fact, I now think there are two legitimate, valuable ways to think about, and evaluate, any educational program or service:
  • Uniform Impact: Pay attention to the same learning goal(s) for each and every student (or, if you're evaluating a faculty support program, pay attention to whether all the faculty are making progress in a direction chosen by the program leaders).
  • Unique Uses: Pay attention to the most important positive and negative outcomes for each user of the program, no matter what those outcomes are.
You can see both perspectives in action in many courses. For example, if an instructor gives three papers an “A,” and remarks, “These three papers had almost nothing in common except that, in different ways, they were each excellent,” the instructor is using a Unique Uses perspective to do assessment.

Each of these two perspectives focuses on things that the other perspective would miss. A Unique Uses perspective is especially important in liberal and professional education: both aim to educate students to exercise judgment and make choices. If every student had the same experiences and outcomes, the experience would be training, not liberal or professional education.

Similarly, Unique Uses is important for transformative uses of technology in education, because many of those uses are intended to empower learners and their instructors. For example, when a faculty member assigns students to set their own topics and then use the library and the Internet to do their own research, some of the outcomes can only be assessed through a Unique Uses approach.

What are the basic steps for doing a Unique Uses evaluation?
  1. Pick a selection, probably a random selection, of users of the program (e.g., students).
  2. Use an outsider to ask them what the most important consequences have been from participating in the program, how they were achieved, and why the interviewee thinks their participation in the program helped cause those consequences (evidence).
  3. Use experts with experience in this type of program (Eliot Eisner has called such people 'connoisseurs' because they have educated judgment honed by long experience) to analyze the interviews. For each user, the connoisseur would summarize the value of the outcome in the connoisseur's eyes, using one or more rating scales (a rough sketch of how such samples and ratings might be recorded appears just after this list).
  4. The connoisseur would also comment on whether and how the program seems to have influenced the outcome for this individual, perhaps with suggestions for how the program could do better next time with this type of user.
  5. The connoisseur(s) then look for patterns in these evaluative narratives about individuals. For example, the connoisseur(s) might notice that many of the participants encountered problems when, in one way or another, their work carried them beyond the expertise of their instructors, and that instructors seemed to have no easy strategy for coping with that.
  6. Finally, the connoisseur(s) write a report to the program with a summary judgment, recommendations for improvement, or both, illustrated with data from relevant cases.
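The steps above are qualitative, but even a small Unique Uses study goes more smoothly if the sample and the connoisseurs' judgments are kept in a consistent form. Below is a minimal sketch, in Python, of what that bookkeeping might look like; the record fields, the 1-5 rating scale, and the theme keywords are assumptions invented for this illustration, not part of any Flashlight instrument.

```python
import random
from dataclasses import dataclass

@dataclass
class ConnoisseurSummary:
    """One connoisseur's reading of one user's interview (steps 3-5)."""
    user_id: str
    outcome_value: int        # assumed 1-5 rating of how valuable the outcome was
    outcome_description: str  # the most important consequence, in brief
    program_influence: str    # how the program seems to have shaped that outcome
    suggestions: str = ""     # how the program might serve this kind of user better

def draw_sample(all_users, n, seed=1):
    """Step 1: pick a random selection of program users to interview."""
    rng = random.Random(seed)
    return rng.sample(list(all_users), min(n, len(all_users)))

def recurring_themes(summaries, keywords):
    """Step 5, crudely: count how often each theme appears in the narratives,
    as a starting point for the connoisseurs' pattern-finding."""
    counts = {kw: 0 for kw in keywords}
    for s in summaries:
        text = " ".join([s.outcome_description, s.program_influence, s.suggestions]).lower()
        for kw in keywords:
            if kw in text:
                counts[kw] += 1
    return counts
```

The real analysis, of course, is the connoisseurs' educated judgment; the sketch only keeps the cases comparable enough that the pattern-finding (step 5) and the final report (step 6) have something consistent to draw on.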
To repeat, a comprehensive evaluation of almost any academic program or service ought to have both Uniform Impact and Unique Uses components, because each type of study will pick up findings that the other will miss. Some programs (e.g. a faculty development program that works in an ad hoc manner with each faculty member requesting help) are best served if the evaluation is mostly about Unique Uses. A training program (e.g., most versions of Spanish 101) is probably best evaluated using mainly Uniform Impact methods. But most programs and services need some of each method.

There are subtle, important differences between these two perspectives. For example,
  • Defining “excellence”: In a Uniform Impact perspective, program excellence consists of producing great value-added (as measured along program goals) regardless of the characteristics or motivations of the incoming students. In contrast, program excellence in Unique Uses terms is measured in part by generativity: Shakespeare's plays are timeless classics in part because there are so many great, even surprising ways to enact them, even after 400 years. The producer, director and actors are unique users of the text.
  • Defining the “technology”: From a Uniform Impact perspective, the technology will be the same for all users. From a Unique Uses perspective, one notices that different users make different choices of which technologies to use, how to use them, and how to use their products.
For more on our recommendations about how to design evaluations, especially studies of educational uses of technology, see the Flashlight Evaluation Handbook. The Flashlight Approach, a PDF in Section I, gives a summary of the key ideas.

Have any evaluations or assessments at your institution used Unique Uses methods? Should they in the future? Please click the comments button below and share your observations and reactions.

PS We've passed 3,300 visits to http://bit.ly/ten_things_table. So far, however, most people seem to look at the summary and perhaps one essay. Come back, read more of these mini-essays, and share more of your own observations!

Monday, October 26, 2009

12. To evaluate ed tech, set learning goals & assess student progress toward them (OK but what does this approach miss?)

It's Monday so let's talk about another one of those things I no longer (quite) believe about evaluation of educational uses of technology. Definition: “Evaluation” for me is intentional, formal gathering of information about a program in order to make better decisions about that program.

In 1975, I was the institutional evaluator at The Evergreen State College in Olympia, Washington. I'd offer faculty help in answering their own questions about their own academic programs (a “program” is Evergreen's version of a course). Sometimes faculty would ask for help in framing a good evaluative question about their programs. I'd respond, “First, describe the skills, knowledge or other attributes that you want your students to gain from their experience in your program.”

“Define one or more Learning Objectives for your students” remains step 1 for most evaluations today, including (but not limited to) evaluating the good news and bad news about technology use in academic programs. In sections A-E of this series, I've described five families of outcomes (goals) of technology use, and suggested briefly how to assess each one.

However, outcomes assessment by itself provides little guidance for how to improve outcomes. So the next step is to identify the teaching/learning activities that should produce those desired outcomes. Then the evaluator gathers evidence about whether those activities have really happened and, if not, why not. Evidence about activities can be extremely helpful in a) explaining outcomes, b) improving outcomes, and c) investigating the strengths, weaknesses and value of technology (or any sort of resource or facility) for supporting those activities.

Let's illustrate this with an example.

Suppose, for example, that your institution has been experimenting with the use of online chats and emails to help students learn conversational Spanish. As the evaluator, you'd need to have a procedure for assessing each student's competence in understanding and speaking Spanish. Then you'd use that method to assess all students at the end of the program and perhaps also earlier (so you could see what they need at the beginning, how they're doing in the middle, and what they've each gained by the end).

You would also study how the students are using those online communications channels, what the strengths and weaknesses of each channel are for writing in Spanish, whether there is a relationship between each student's use of those channels and their progress in speaking Spanish, and so on.
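
For readers who want to see the bookkeeping behind such a study, here is a minimal sketch of the quantitative piece: per-student gain scores on the speaking assessment, plus a first look at whether heavier use of a channel goes with larger gains. The student records, the 0-100 scale, and the hours of chat use are all invented for the illustration (and statistics.correlation requires Python 3.10 or later).

```python
from statistics import correlation, mean

# Hypothetical records: each student's speaking-assessment scores (assumed
# 0-100 scale) at the start and end of the term, plus hours of online chat use.
students = [
    {"name": "A", "pre": 35, "post": 62, "chat_hours": 14},
    {"name": "B", "pre": 50, "post": 58, "chat_hours": 3},
    {"name": "C", "pre": 28, "post": 55, "chat_hours": 11},
    {"name": "D", "pre": 44, "post": 70, "chat_hours": 9},
]

gains = [s["post"] - s["pre"] for s in students]  # value added per student
usage = [s["chat_hours"] for s in students]

print("Mean gain:", mean(gains))
print("Gain vs. chat use (Pearson r):", round(correlation(usage, gains), 2))
```

A correlation like this cannot show that the chats caused the gains; it only tells you where to look more closely, which is why studying how students actually use the channels matters so much.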

Your findings from these studies will signal whether online communications are helping students learn to speak Spanish, and how to make the program work better in the future.

Notice that what I've said so far about designing evaluation is entirely defined by program goals: the definition of goals sets the assessment agenda and also tells us which activities are most important to study. I've labeled this the Uniform Impact perspective, because it assumes that the program's goals are what matter, and that those goals are the same for all students.

Does the Uniform Impact perspective describe the way assessment and evaluation are done? Do any assessments and evaluations that you know go beyond the suggestions above? (Please add your observations below by using the “Comments” button.)

PS. “Ten Things” is gaining readers! The alias for the table of contents – http://bit.ly/ten_things_table – has been clicked over 3,200 times already. Thanks! If you agree these are important questions for faculty and administrators to consider, please add your own observations to any of these posts, old or new, and spread the word about this series.

Wednesday, October 21, 2009

K. Evaluation should be mainly formative and should begin immediately.

Earlier, I described some old beliefs about program evaluation. I used to assume that evaluation of TLT had to be summative ("What did this program accomplish? Does that evidence indicate this program should be expanded and replicated? continued? or canceled?"). The most important data would measure program results (outcomes). You've got to wait years to achieve results (e.g., graduating students), and the first set of results may be distorted by what was going on as the program started up. Consequently, I assumed, evaluation should be scheduled as late as possible in the program.

Many people still believe these propositions and others I mentioned earlier this week. I still get requests: "The deadline for this grant program is in two days. Here's our draft proposal. Out of our budget of $500,000, we've saved $5,000 for evaluation. If you're able to help us, please send us a copy of your standard evaluation plan."

Yug!

Stakeholders need to see what's going on so they can make better, less risky decisions about what to do next. Getting that kind of useful information is called “formative evaluation.” (By the way, a stakeholder is someone who affects or who is affected by a program: its faculty, the staff who provide it with services, its students, and its benefactors, for example.)

In the realm of teaching and learning with technology (TLT), formative evaluation is even more important than in other realms of education. The program is likely to be novel and rapidly changing, as technologies and circumstances change. So the stakeholders are on unfamiliar ground. Their years of experience may not provide reliable guides for what to do next. Formative evaluation can reduce their risks, and help them notice and seize emerging opportunities.

Formative evaluation also can attract stakeholders into helping the evaluator gather evidence. In contrast, summative evaluation is often seen as a threat by stakeholders. “The summative evaluation will tell us we're doing well (and we already know that). Or perhaps the evaluator will misunderstand what we're doing, creating the risk that our program will be cut or canceled before we have a chance to show what this idea can really do. And no one reads those summative reports anyway unless they're looking for an axe to use on a program. So, no, I don't want to spend time on this and, if I'm forced to cooperate, I have no reason to be honest.” In contrast, formative evaluations should be empowering - a good evaluation usually gives the various stakeholders information they need in order to get more value from their involvement with the program.

What many folks don't realize is that formative evaluation requires different kinds of data than summative evaluation does.

Summative evaluation usually must focus on data about results -- outcomes. But outcomes data by itself has little formative value. If you doubt that, consider a faculty member who has just discovered that the class average on the midterm exam was 57.432. Very precise outcomes data. But not very helpful guidance for figuring out how to teach better next week.

In contrast, a formative evaluation of an educational use of technology will often seek to discover a) what users are actually doing with the technology, and b) why they acted that way (which may have nothing to do with the technology itself). (For more guidance on designing such evaluations, see "The Flashlight Approach" and other chapters of the Flashlight Evaluation Handbook.)

Corollary #1: The right time to start evaluating is always "now!" Because the focus is likely to be on activities, not technology, evaluation of the activity can begin before new technologies or techniques go into use. Baseline data can be collected. And, even more importantly, the team can learn about factors that affect the activity (e.g., 'library research') long before new technology (e.g., new search tools) is acquired. This kind of evaluation can yield insights that ensure the new resources are used to best effect starting on day 1 of their availability.

Corollary #2: When creating an action plan or grant proposal, get an evaluator onto your planning team quickly. An experienced, skillful evaluator should be able to help you develop a more effective, safer action plan.

Corollary #3: When developing your budget, recognize that the money and effort needed for evaluation (what the military might call 'intelligence gathering') may be substantial, especially if your program is breaking new ground.


What are the most helpful, influential evaluations you've seen in the TLT realm?
Did they look like this? What kind of information did they gather? Next week, I'll discuss how our current infatuation with learning objectives has overshadowed some very important kinds of evidence, and potentially discouraged us from grabbing some of the most important benefits of technology use in education, benefits that can't be measured by mass progress on outcomes.

Monday, October 19, 2009

11. Evaluating TLT: Suggestions to date, and some old beliefs

For the next couple weeks, I'll be writing about evaluation of eLearning, information literacy programs, high tech classrooms, and other educational uses of technology.

Actually, I've been commenting on evaluation in many of the prior posts, so let's begin with a restatement of suggestions I've made over the last 2 months in this blog series:
  1. Focus on what people are actually doing with help from technology (their activities, especially their repeated activities).
  2. Therefore, when a goal for the technology investment is to attract attention and resources for the academic program, gather data about whether program activities are establishing a sustainable lead over competitors, a lead that attracts attention and resources.
  3. When the goal for technology use is improved learning, focus on whether faculty teaching activities and student learning activities are changing, and whether technology is providing valuable leverage for those changes. (Also assess whether there have been qualitative as well as quantitative improvements in outcomes.)
  4. When the goal is improved access (who can enter and complete your program), measure not only numbers and types of people entering and completing but also study how the ways faculty, staff and students are using technology make the program more (or less) accessible and attractive (literally).
  5. When the goal is cost savings, create models of how people use their time as well as money. And focus on reducing uses of time that are burdensome, while maintaining or improving uses of time that are fulfilling. (A rough sketch of such a model appears just after this list.)
  6. When the goal is time-saving, also notice how the saving of time may transform activities, as in the discussion of Reed College in the 1980s, where saving time in rewriting essays led to subtle, cumulative changes in the curriculum and, most likely, in the outcomes of a Reed education.
  7. Gains (and losses) in all the preceding dimensions can be related. So your evaluation plan should usually attend to many, or all, of these dimensions, even if the rationale for the original technology use focused on only one. For example, evaluations of eLearning programs should examine changes in learning, not just access. Evaluations of classroom technology should attend to accessibility, not just learning.
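To make point 5 concrete, here is a minimal sketch of the kind of time-and-money model it suggests. The activities, hours, hourly rate, and burdensome/fulfilling labels are all invented for the example; in a real study, the people doing the work would supply the hours and decide which uses of their time feel burdensome.

```python
# Hypothetical weekly time budget for one instructor in a redesigned course.
HOURLY_RATE = 60  # assumed loaded cost of a faculty hour, in dollars

activities = [
    # (activity, hours per week, "burdensome" or "fulfilling")
    ("grading quizzes by hand",        4.0, "burdensome"),
    ("answering routine email",        3.0, "burdensome"),
    ("giving feedback on drafts",      5.0, "fulfilling"),
    ("facilitating online discussion", 2.5, "fulfilling"),
]

for kind in ("burdensome", "fulfilling"):
    hours = sum(h for _, h, k in activities if k == kind)
    print(f"{kind}: {hours:.1f} hrs/week (~${hours * HOURLY_RATE:.0f})")

# On this measure, a technology investment "saves costs" if it shrinks the
# burdensome total without shrinking the fulfilling one.
```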
Years ago, I might have looked at a list like the one above, and also agreed that:
  1. Evaluation should assess outcomes. (How well did we do in the end?)
  2. Evaluation should therefore be done as late as possible in the life of the initiative or project, in order to give those results a chance to become visible (and to resolve startup problems that might have initially obscured what the technology investment could really achieve).
  3. Corollary: When writing a grant proposal, it's helpful to wait until you've virtually completed the project plan and budget before calling in someone like me to write an evaluation plan for you. Just ask the evaluator to contribute the usual boilerplate by tomorrow; after all, evaluation plans are pretty much alike, right?
  4. Corollary #2: If the project succeeds, it will be obvious. If it fails, evaluation can be a threat. So, when developing a budget for your project or program, first allocate every available dollar for the real work. Then budget any dollars that remain for the evaluation.
Do those last four points sound familiar? Have any of those four ideas produced evaluation findings that were worth the time and money? (Tell us about it.) When you plan a project, what purposes do you have for the evaluation? In a couple days, I'll suggest some alternative ideas for evaluation.

PS. This Friday, October 23, at 2 PM ET, please join Steve Gilbert and me online for a live discussion of some of these "Ten Things". Please register in advance by going to this web page, scrolling down to October 23, and following the instructions. And, to help us plan the event, tell us which ideas you'd especially like us to discuss.

Wednesday, August 15, 2007

"Absence of evidence is not evidence of absence"

Too often, lack of conclusive evidence about a product, practice, or principle is prematurely interpreted as proof of lack of value or validity. This pattern of fallacious reasoning can be applied equally inappropriately by opponents of long-standing practices in education (such as lectures) or by opponents of new practices that are perceived as threatening (such as the use of cell phones).
The quotation, excerpts, and article referenced below provide a better explanation of this fallacy and how to avoid it.

"Absence of evidence is not evidence of absence."Carl Sagan,US astronomer & popularizer of astronomy (1934 - 1996) - found on "The Quotations Page" 2007-08-15 at http://www.quotationspage.com/quote/37901.html

"dangers of misinterpretation of non-significant results"

"…we must question whether the absence of evidence is a valid enough justification for inaction."

"…we should first ask whether absence of evidence means simply that there is no information at all."

"While it is usually reasonable not to accept a new treatment unless there is positive evidence in its favour, when issues of public health are concerned we must question whether the absence of evidence is a valid enough justification for inaction. ... Can we be comfortable that the absence of clear evidence in such cases means that there is no risk or only a negligible one?"

The above four excerpts are from:

"Statistics notes: Absence of evidence is not evidence of absence" [Link works as of 2007-08-15]
Douglas G Altman, J Martin Bland, BMJ (British Medical Journal) 1995; 311:485 (19 August). "BMJ is published by BMJ Publishing Group Ltd, a wholly owned subsidiary of the British Medical Association"

Tuesday, May 01, 2007

Info Lit Assessment Online Workshop May 2007

Week 1 Recommended Activity

Write an outcome for your program, drawing upon one of the 9 principles. Start with the one you think is most important to work on at the current stage of development of your program. You may submit your work in one of 3 ways:

1. privately - by email to Anne Zald (zald@u.washington.edu) or Deb Gilchrist (DGilchri@pierce.ctc.edu)

2. anonymously - add it as a "comment" to this posting. Go to the bottom of this posting and click on "comments" or "add comment"

3. publicly - add it as a "comment" to this posting. Go to the bottom of this posting and click on "comments" or "add comment". If you have a Blogger account, use it. If not, include your name and contact info at the end of your comment.

Nine Principles of Good Practice for Assessing Student Learning
http://www.fctel.uncc.edu/pedagogy/assessment/9Principles.html

http://www.tcnj.edu/~assess/principles.html

OTHER RESOURCES FOR THIS WORKSHOP

Archives of synchronous sessions and other info:
http://oli.tltgroup.org/2007/spring/ILAssessment/workshop.htm

Library Instruction Outcomes: Available in Microsoft Word

Characteristics of Programs of Information Literacy that Illustrate Best Practices: A Guideline http://www.ala.org/ala/acrl/acrlstandards/characteristics.htm

Information Literacy Competency Standards for Higher Education http://www.ala.org/ala/acrl/acrlstandards/informationliteracycompetency.htm

Task Force on Academic Library Outcomes Assessment Report http://www.ala.org/ala/acrl/acrlpubs/whitepapers/taskforceacademic.htm

Principles of Good Assessment
http://planning.tltgroup.org/ILAFall2006/Principles of Good Assessment.pdf

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Anne Zald
University of Washington Libraries

Sunday, February 25, 2007

The Flashlight Approach to Evaluation, Summarized

For a variety of reasons, educators, their units and institutions now often gather evidence in order to evaluate 'stuff'. By 'stuff' we mean tools, resources, and facilities such as course materials, computer software, classroom design, blended courses, distance learning programs, network infrastructure, libraries, ePortfolios, ... By "evaluate" we mean a purposeful gathering of the evidence needed in order to make better choices about what to do next.

These days when people think about evaluation, their thinking often begins and ends with 'outcomes assessment.' But information about outcomes is rarely enough, by itself, to show how to improve those outcomes. (For more on this point, click here.) The Flashlight approach to evaluation is designed to provide the best insights for improving outcomes for the least effort.

The Flashlight approach has a number of elements, of which these are the most important:


1. Activities: Focus on what people do with the 'stuff' at issue. For example, if you want to get more value from personal response systems (some of which are also known as 'clickers'), you first need to discover what faculty and students are actually doing with the clickers. For instance, are the clickers being used to create structured discussion about difficult ideas? to take attendance? to test memorization? Each of those patterns of use (which we refer to as 'activities') will create different benefits, costs, damage...

2. The dark side: Consider people's fears and concerns about the stuff, not just their hopes and goals. For example, if clickers are being used to take attendance, are students sending friends to class with a handful of clickers?

3. Motives, incentives, and disincentives: Value is created by the way people use stuff. So the best clues for increasing that value come from learning why people use the stuff as they do. For example, if you do a workshop on the value of using clickers for conceptual learning, and 30% of the participants don't start using clickers that way, you should investigate the reasons. It might be rooted in their personal approaches to teaching, their disciplines, your training, reactions of students, their facilities, ... No matter what you discover, it's almost always going to be useful in helping you figure out whether and how to get more value from clickers.

4. Education is not a machine: Even when the same faculty member teaches two sections of the same course, and has taught them for years, what students do and learn will differ. Add clickers, or any other technology that increases options for faculty and students, and that variation will probably increase. That's one reason we emphasize a unique uses perspective for evaluation, and not just uniform impact approaches.

5. Collaboration: Whose choices influence how the stuff is used? Whose choices could change how the stuff is used? Those are the people who should be involved in helping design your study. If you need help in gathering data (e.g., getting good response rates to surveys), then they can help. They can also help you make sure that your questions and language are clear and compelling.

6. Start evaluating now: To improve outcomes (including costs), change activities. (Buying new stuff is just a means to change activities). And it's always the right time to begin studying an activity. So start evaluating now.
For example, if you're interested in using clickers to foster conceptual learning, start evaluating conceptual learning (including faculty development) now, whether or not you're using clickers yet, whether or not you're experienced in their use yet, and whether or not you're considering replacing one kind of clicker with another. And if you are considering buying new stuff, your findings can help you
* choose products;
* remove barriers to effective use even before the technology becomes available, and
* provide baseline data for measuring the impact of the new stuff.
In short, whether or not it's time to buy, develop, or replace stuff, it's always time to begin studying activities that use, or that would use, that stuff.

The whole Flashlight Evaluation Handbook is designed to flesh out these and related ideas using examples and specialized guides. The TLT Group can also provide coaching (send us your draft plans and we can talk), supply collaborators for such studies, or even do them for you. So take a look at the Handbook and contact us if you'd like to talk (301-270-8311; flashlight@tltgroup.org).

PS. If your institution is a subscriber, feel free to use this summary for workshops. If your institution is not yet a subscriber, we would appreciate it if you would ask permission to use this material.