Speaking proficiency is commonly tested with a face-to-face or telephone interview. No two such interviews are ever the same, even if the tester asks only questions from a list, because of course the answers are free-form spoken language, always different. And in fact sticking to a prepared set of questions is not really effective, since it keeps the conversation from developing in a natural way.
So yes, an interview-form speaking test will necessarily be subjective to some degree. This has long been recognized in the language-testing community. The question, however, is whether subjectivity is a necessary evil or something benign and perhaps even advantageous in some ways.
In designing any test, the three most important goals are practicality, validity, and reliability. To oversimplify, practicality means that the time and resources for conducting the test can reasonably be made available, validity means that the test actually does measure what it claims to measure, and reliability means that if the test is repeated it will give the same result time and time again. From the test-taker’s point of view, asking whether a test is “fair” is mainly a question about its reliability.
Needless to say, a professionally conducted language test will not be influenced by personal factors. It will be irrelevant whether testers agree or disagree with opinions expressed. Even factual errors are to be ignored, since they have no bearing on what the test is intended to measure: use of the language. This kind of subjectivity must be completely excluded in rating a test.
Trained language testers use a set of techniques designed to elicit a sample of the examinee’s proficiency. Ideally, the interview will resemble a friendly, casual conversation. Even so, the test is actually a specialized procedure to explore different aspects of the examinee’s use of language, sufficient to allow for a well-grounded judgment on the overall performance.
If the same person is tested twice within a short time, by different testers, a reliable test will usually give the same result. “Usually,” because performance will always be different on different occasions. What is being rated is the performance, not the person.
If the same performance (e.g., a recorded interview) is rated by different testers, a reliable test will give the same result with a high degree of probability. In a gymnastics competition, for example, each of the judges on the panel gives a score that is arrived at independently. Usually there is a range of scores, but as long as the range is not too large the judging is considered competent and fair. The reason for consistency in scoring is that gymnastics judges, like language testers, are trained and practiced in using clear criteria for making their decisions. They are not relying merely on personal taste. And language testers, thankfully, are not asked to decide who gets gold, silver, and bronze, but only who gets assigned to which of several broad categories.
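For readers who like to see the idea in numbers, here is a minimal sketch of two simple consistency checks: the spread of scores on a judging panel, and the share of rater pairs who put the same performance in the same category. The function names, the sample scores, and the category labels are all invented for illustration. They are not taken from any real competition or testing program, and actual reliability studies use more refined statistics.

```python
from itertools import combinations


def score_range(scores):
    """Spread between the highest and lowest score on a panel."""
    return max(scores) - min(scores)


def pairwise_agreement(ratings):
    """Fraction of rater pairs that assigned the same category."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings")
    matches = sum(1 for a, b in pairs if a == b)
    return matches / len(pairs)


# Hypothetical panel of four gymnastics judges scoring one routine.
gymnastics_scores = [9.0, 9.5, 9.25, 9.0]

# Hypothetical broad categories assigned by four language testers
# to the same recorded interview.
language_ratings = ["intermediate", "intermediate", "intermediate", "advanced"]

print(score_range(gymnastics_scores))        # 0.5
print(pairwise_agreement(language_ratings))  # 0.5
```

With four raters there are six possible pairs. Here three of the six pairs agree, so the agreement figure is 0.5. A figure close to 1.0 would mean that nearly every pair of testers reached the same category.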