
Showing posts with label research.

Saturday, November 24, 2007

Use A/B testing wisely - never in isolation

A recent discussion on a UX forum I participate in turned to the topic of A/B testing.  I really enjoyed the conversation, so I wanted to reiterate some of the points I made and expand on them a little bit as well.  It's not my goal to define A/B testing here but to share my opinion on its use.  I believe that even though A/B testing can be extremely valuable in helping to identify the best iteration of a site or a particular page, it should never be used in isolation.

Since A/B testing is relatively cheap to do and the results are so compelling, companies are in danger of adopting a "test and learn" culture where pages are just A/B tested with no additional user input.  That would be the wrong way to go.  A/B testing shouldn't be used on its own to make decisions; it should always be used in conjunction with other research methods -- both qualitative (such as usability testing and ethnography) and quantitative (such as desirability studies).

A/B testing is an important method in the research toolkit because it can give you information that usability testing on its own cannot.  The main goal of A/B testing is to see how business metrics move up and down depending on the version of the page -- click-through rates, checkout rates, purchasing rates, etc.  You can't see that with usability testing alone.  But as Kohavi et al. point out in their paper Practical Guide to Controlled Experiments on the Web, A/B testing has some major limitations:

  • Quantitative Metrics, but No Explanations. It is possible to know which variant is better, and by how much, but not why.  In user studies, for example, behavior is often augmented with users’ comments, and hence usability labs can be used to augment and complement controlled experiments.
  • Short term vs. Long Term Effects. Controlled experiments measure effects during the experimentation period, typically a few weeks.   It is wise to look at delayed conversion metrics, where there is a lag from the time a user is exposed to something and take action. These are sometimes called latent conversions.
  • Primacy and Newness Effects. These are opposite effects that need to be recognized. If you change the navigation on a web site, experienced users may be less efficient until they get used to the new navigation, thus giving an inherent advantage to the Control. Conversely, when a new design or feature is introduced, some users will investigate it, click everywhere, and thus introduce a "newness" bias.
  • Features Must be Implemented. A live controlled experiment needs to expose some users to a Treatment different than the current site (Control). The feature may be a prototype that is being tested against a small portion, or may not cover all edge cases.  Nonetheless, the feature must be implemented and be of sufficient quality to expose users to it.
  • Consistency. Users may notice they are getting a different variant than their friends and family. It is also possible that the same user will see multiple variants when using different computers (with different cookies).
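The consistency point above can be partly addressed when a stable identifier is available. Here is a minimal sketch, assuming each user has a login ID that follows them across computers (that ID, the function name `assign_variant`, and the experiment name are my own illustrations, not something from the paper). Hashing the ID means the same user always lands in the same variant, whichever machine or cookie they arrive with:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Map a user to a variant deterministically, so repeat visits agree.

    Assumes user_id is stable across devices (e.g. a login ID); this is
    an illustrative assumption, not part of the quoted paper.
    """
    # Include the experiment name so different tests split users independently.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the hash into a bucket index and pick the matching variant.
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user gets the same answer on every call.
first = assign_variant("user-123", "checkout-button-copy")
second = assign_variant("user-123", "checkout-button-copy")
```

This does nothing for anonymous visitors who are only identified by a cookie, and it doesn't stop friends and family from comparing notes, so the limitation still stands in those cases.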

As with most things, it is important to use A/B testing responsibly.   Since every research/testing method comes with its own limitations, a combination of methods is the only way to get the full picture and make the right decisions.

Tuesday, October 16, 2007

Measuring the effectiveness of content on e-commerce sites

I've been thinking about the different ways to measure the effectiveness of content/text on e-commerce sites, and more specifically, how to select the best version if you have a variety of different alternatives in front of you, each with its own group of fans who want to get it on the site right away!  Since the "Voice" of a web site can be such an abstract, arbitrary decision, how can we apply methodologically robust research methods to help make these decisions? 

First, I would define "effectiveness" in this context as the optimization of the following 3 concepts:

  • Do users understand what you are trying to tell them and what action they should take to be successful in their task?
  • Are you invoking the desired emotions with your content?
  • Does the proposed content result in higher conversion rates than other alternatives?

It's so important to combine the user perception data (the first two bullets) with business metrics (the last bullet).  In my experience, the only way for user experience researchers to effect change is to show the positive impact these changes can have on engagement/revenue metrics.

It seems to me that you will be well served by using the following 3 methodologies to measure the relative effectiveness of different versions of the same content.  This is also a really nice way to progressively reduce the number of alternatives down to the best solution:

  • Usability testing.  Start with several different versions of the content (~10), along with the current version (if it exists).  Ask users in a lab setting what they understand the content to mean, and any other thoughts they have on the way it sounds.  This should help narrow down the alternatives to 4-6 possibilities.
  • Desirability testing.  Use the Desirability method, but adjust it for use in large sample online surveys by turning it into a between-subjects experimental design.  In the survey, users are asked to rate the content on different brand and design attributes.  This way you can determine what emotional response the content elicits from users.  You'd also be able to ask users which version of the content they'd prefer, and why.  This method has the added benefit of large numbers to give you confidence in the statistical significance of the results.
  • A/B testing.  Once you've narrowed the alternatives down to 2 or 3, live A/B testing can help you determine which of the alternatives performs better from a revenue or engagement perspective, by looking at differences that can be attributed purely to content changes.  This obviously works most easily when the content is directly related to a revenue-generating task, like the call to action on a checkout page, for example.  But it's not just about revenue -- there are great ways to measure engagement with the page, which can be just as powerful.
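To make the A/B step a little more concrete, here is a rough sketch of one common way to check whether a difference in conversion rate between two versions is likely to be real: a two-proportion z-test. This is just one option, not necessarily what any particular testing tool does, and the visitor and conversion counts below are made up for illustration:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts, n_a / n_b are visitor counts.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 10,000 visitors per version.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value tells you the difference is unlikely to be noise, but it still can't tell you why one version won, which is where the first two methods come in.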

Now, I can see 2 issues that make this a pretty difficult task, and they are the reason why the above 3 methods should not be used in isolation.  In combination, they help tell the whole story.

  • It is difficult to know what users really read on a page.  In the first two methods you pretty much have to show people what to read -- that doesn't happen when they visit your site organically with no-one looking over their shoulder.  This is why A/B testing is so important as it gives you a sense of how behavior will change based on content.
  • It is difficult to isolate the effect of content changes from the other influencing factors on a page.  This is the really difficult part.  How do you know that conversion/engagement improved because of the content and not because of some other factor on the page, like visual design changes?  That is why it is important to keep the rest of the page exactly the same, and also why usability and desirability testing are important to bring out the perceptual data from users.

This is of course by no means the only way to do this, but I think it's a good approach that balances methodological rigor against the danger of overdoing it.  I'd be curious if anyone has any thoughts or ideas on how to improve on this approach...

Tuesday, September 18, 2007

Conference Presentation: Customer Researchers as Health Professionals

A colleague and I will be in Las Vegas next week for the yearly AMA Marketing Research Conference. We will be presenting a session on user experience research entitled Customer Researchers as Health Professionals - How eBay Uses Research to Improve Product Health (view a slightly shortened version online here, or below).

I'll try to summarize briefly what we will be talking about. We start with a little context about how research works at eBay and where it sits in the organization, and then we go over the research strategy I head up at eBay, called Product Health. We use a variety of quantitative and qualitative user experience research methods to track the health of different eBay site areas (or key flows) over time. Below is a schematic of the different research components of Product Health, which, as you can see, aims to cover a broad and holistic view of the Product:

It is our point of view that there is no single research methodology that can tell a complete story. If you want to have a holistic view of your product and how users feel about it and interact with it, it is essential to combine a variety of methodologies together, and keep doing the research over time so that you are able to (1) accurately assess how you're doing, and also (2) identify the areas that need to be improved. The quantitative components help us understand what is happening, and the qualitative components help us understand why we're seeing what we're seeing, and how to fix it.

We also spend some time talking about a synthesis project we did, where we pulled together the results from all the different Product Health research components, and used the insights to come up with 5 guiding principles for product development, which we believe translate pretty well to any e-commerce organization:

  • Build Trust. Be responsive to customer needs for security and service. Ensure the quality and accuracy of the information available on the site. Build responsibility and accountability in the online community.
  • Simplify. Streamline and clarify processes, navigation, site performance issues and fees that can make the customer’s experience unnecessarily complex and impact commerce.
  • Be Relevant. Design an experience around how people naturally explore, evaluate and purchase items. Provide relevant, quality information that supports this experience.
  • Provide value. Align buyers and sellers with a common means of determining item value fees. Treat sellers as “paying customers” who deserve value-added service for the fees they pay.
  • Connect people through commerce. Leverage areas on the site to connect people with common interests through commerce.

We feel that this is a great example of how user-generated insights gathered in a methodologically robust way can drive product strategy and resource allocation effectively within an online organization. So, anyway, if you happen to be in Vegas next week, come check it out!

Thursday, August 23, 2007

Usability testing on Halo 3

Wired Magazine just published an article about usability testing on Halo 3, the much-anticipated next installment in the Halo video game franchise, and the first Halo game for Xbox 360.  Video game usability has always fascinated me (and I try to get hands-on experience in this area as much as I can...), and this is the first time I've seen a mainstream magazine cover it in such detail.

Halo is a genre-changing first-person shooter that brought gaming to a new level with its intricate storyline, cinematic feel and epic soundtrack.  And the creators got there through endless hours of testing...  Here are some excerpts from the article showing how they left no stone unturned:

The room we're monitoring is wired with video cameras that Pagulayan can swivel around to record the player's expressions or see which buttons they're pressing on the controller. Every moment of onscreen action is being digitally recorded.

Midway through the first level, his test subject stumbles into an area cluttered with boxes, where aliens — chattering little Grunts and howling, towering Brutes — quickly surround her. She's butchered in about 15 seconds. She keeps plowing back into the same battle but gets killed over and over again.

"Here's the problem," Pagulayan mutters, motioning to a computer monitor that shows us the game from the player's perspective. He points to a bunch of grenades lying on the ground. She ought to be picking those up and using them, he says, but the grenades aren't visible enough. "There's a million of them, but she just missed them. She charged right in." He shakes his head. "That's not acceptable."

After each session Pagulayan analyzes the data for patterns that he can report to Bungie. For example, he produces snapshots of where players are located in the game at various points in time — five minutes in, one hour in, eight hours in — to show how they are advancing. If they're going too fast, the game might be too easy; too slow, and it might be too hard. He can also generate a map showing where people are dying, to identify any topographical features that might be making a battle onerous. And he can produce charts that detail how players died, which might indicate that a particular alien or gun is proving unexpectedly lethal or impotent.

Pagulayan and his team have now analyzed more than 3,000 hours of Halo 3 played by some 600 everyday gamers, tracking everything from favored weapons to how and where — down to the square foot — players most frequently get killed.
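The article doesn't say how the death maps are built, but the basic idea can be sketched by binning death locations into a grid and counting per cell. Everything here is my own illustration: the coordinates, the cell size and the function name `death_heatmap` are hypothetical and not taken from the article:

```python
from collections import Counter

def death_heatmap(deaths, cell_size=10.0):
    """Count deaths per square grid cell; `deaths` is an iterable of (x, y)."""
    counts = Counter()
    for x, y in deaths:
        # Floor-divide each coordinate to find the grid cell it falls in.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

# Made-up death locations on a level map.
sample = [(3.0, 4.0), (5.5, 9.9), (12.0, 1.0), (14.2, 3.3), (13.0, 2.0)]
heatmap = death_heatmap(sample)
hotspots = heatmap.most_common(1)  # the cell where most deaths occurred
```

A cell with an unusually high count is a hint to go look at that spot in the level, much like the cluttered area with the hidden grenades in the excerpt above.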

The article goes into many more interesting examples of how they solved user issues with clever design.  Be sure to check it out.  And if you haven't seen the Halo 3 trailer yet, here it is for your viewing pleasure...

Wednesday, August 22, 2007

Google ads unethical or just clever design?

Jakob Nielsen's latest Alertbox is sure to get some defensive responses from the folks at Google...  He publishes results from a recent eye tracking study that clearly shows that users do not look at banner ads on web sites at all when they are looking for information or engrossed in the content on the page.  This isn't particularly new information (we've known this for a long time), but he does take it a step further.  First, he explains that there are three main design elements that are effective at attracting eyeballs to online ads (Plain text, Faces, and of course Cleavage and other "private" body parts).  Then he goes on to explain a fourth design element:

In addition to the three main design elements that occasionally attract fixations in online ads, we discovered a fourth approach that breaks one of publishing's main ethical principles by making the ad look like content:

  • The more an ad looks like a native site component, the more users will look at it.
  • Not only should the ad look like the site's other design elements, it should appear to be part of the specific page section in which it's displayed.

This overtly violates publishing's principle of separating "church and state" -- that is, the distinction between editorial content and paid advertisements should always be clear. Reputable newspapers don't allow advertisers to mimic their branded typefaces or other layout elements. But, to maximize fixations, that's exactly what you should do in a Web ad.

A specific ad may or may not be ethical, depending on how closely it masquerades as content. I caution against going too far, because it can backfire and mislead users. Unethical ads will get you more fixations, but ethical business practices will attract more loyal customers in the long run.

It doesn't take a genius to figure out that he's taking a shot at Google here, because they're obviously really good at making ads look like native site components on their search results pages.  My question is whether it's really unethical or just clever design.  How will we know if users are annoyed by these ads, or if the relevance of the ads makes it ok in their minds?  I would be interested to know what kind of answers follow-up qualitative research might uncover.  Eye tracking by itself won't show you how people are feeling about these ads.  My guess is that users wouldn't care as long as the ads are relevant...

Thursday, August 16, 2007

Making the case for ethnographic research to inform design

I'm a big fan of ethnographic research, so two recent articles caught my attention.  Though pretty basic and meant for audiences who are not familiar with this type of research, the articles do make a good case for using this methodology in a business context as part of the design process.  I wanted to quote a few paragraphs that showcase insights that can be gained from ethnographic research that you cannot get as effectively from other methodologies.  Bold emphasis added by me.

From a United Airlines publication entitled Executive Secrets - Greenhouse Effect:

Though ethnography is indeed a social science, a number of companies use it to gain a greater understanding of their customers. Their objective is to garner information to help create and develop products and services that better meet customers’ needs — especially those that customers haven’t yet articulated.

Jan Chipchase is an ethnographic specialist with the communications firm Nokia. Last summer, he and a team of designers and other ethnographers spent several weeks in Uganda. They traveled to multiple villages and lived with and observed the residents going about their activities. “We’re charged with bringing the experiences of the local culture into the company,” Chipchase says.

While in Uganda, Chipchase’s team noticed local entrepreneurs who had purchased their own cell phones and then sold minutes to other residents. Because customers paid in advance for their calls, they kept close track of their allotted minutes. Drawing from these observations, Nokia designed phones for use globally so that callers could easily see on the screen the number of minutes used per call.

To determine which observations are significant, the researchers focus not on the sensational but on the patterns that appear. Their goal is to find the actions that are common across many participants and discern their meaning. The insight that results can be compelling. “When it’s done right,” says Chipchase, “ethnography can inform and inspire the design process.”

From a Business Week article entitled Nokia's Global Design Sense:

Our process starts with a team of anthropologists and psychologists working in our design group. They spend time with specific types of people around the world to understand how they behave and communicate. This helps us to understand better and to spot early signals of new patterns of behavior that could be harnessed into mobile communication. Our designers often go out into the field to understand the world they are designing for. All of these observations are brought into the design process to inspire and inform our ideas.

One thing neither article mentions is the fact that for ethnographic research to be truly effective, it should never be done on its own.  Exactly where in the research process ethnography fits in depends on the specific situation:

  • Ethnographic research is usually done at the beginning of the design process, as is the case in most of the examples in the articles I just referenced.  This research is used to uncover user needs and help designers come up with high-level concepts.  This should be followed up by quantitative work (desirability surveys, needs & attitude surveys) as well as additional usability testing to further flesh out the concepts.
  • Ethnography can also be extremely useful as a follow-up methodology to quantitative research like segmentation studies.  Once a market segmentation has taken place, ethnography can help companies understand each segment better by "living a day in their shoes" and understanding how customers use their products/web site within the context of the rest of their lives.

Ethnography is not an easy methodology to get right -- observing people is easy; knowing what to look for and how to uncover unmet needs and desires is not.  But it can yield extremely valuable insights that all levels of the business can utilize.

Wednesday, August 1, 2007

Social Networking approaches for e-commerce Web sites

I came across this pretty cool ethnographic research study on cell phone usage that is also relevant to how we think about community on the Web.  It’s interesting both from a methodological and a findings point of view.  It’s a pretty short deck, so check it out:
http://sfaapodcasts.files.wordpress.com/2007/05/sfaa-2007-metcalf.pdf

This from the authors:

When we talk about the "user experience" the main emphasis is often on an individual's experience with a particular technology. Even with a purported social technology, for example a social networking site, we still tend to create for the individual's interaction with the site (how does someone find their friend, how do they access this site easily from a mobile device).

However, designing for sociability means thinking about how people experience each other through the technological medium, not just thinking about how they experience the technology. The emphasis is on the human-to-human relationship, not the human-to-technology relationship. This is a crucial difference in design focus. It means designing for an experience between people.

Of course designing for an experience between people doesn't mean ignoring the interaction with the device, but it calls for taking something else into account. That "something else" is often another person or people. How do we, as developers of communication technologies, make the communications more interesting, more exciting and more stimulating for the receiver? How do we help our users meet the needs of the other people in their social network? How do we create a shared experience that is equally compelling for all participating parties? When we begin to think like this, we truly start to think of designing social software, social applications, social media.

A lot of e-commerce web sites seem to be scrambling to figure out how to deal with the social networking phenomenon, and in my opinion there are a lot of knee-jerk reactions going on.  E-commerce sites shouldn't try to become social networking sites.  They should leverage their commerce platform to connect people to each other.

A great example is Facebook apps.  eBay has a brand new Facebook application that allows users to connect their eBay profiles to their Facebook accounts.  You can see others' watch lists and even add items to their list if you think they might be interested.  It's also another avenue for sellers to showcase the items they have for sale.  By using its commerce platform to integrate into an existing social networking site, eBay is building on its strength as an online retailer and plugging into an enormous network without re-inventing the wheel by trying to become a social networking site unto itself.