PS3 Replay Ranking Project

Discussion in 'Dojo' started by ice-9, Mar 20, 2007.

  1. ice-9

    ice-9 Well-Known Member

    How can one best determine a tier list? Most of the ways we do it now are highly subjective. One player judging (i.e. what we do) will be biased by his or her own experiences; perhaps we overweight characters that we personally have match-up weaknesses/strengths against, or we simply may not know other characters well enough to make a good judgment. If you ask multiple people to rank characters (i.e. the Enterbrain tier list), you run the risk that one person’s score is not comparable to that of another. Looking at tournament results (as akai and I like to do) ignores the facts that some characters are more popular than others and that certain players will advance deep into tournaments no matter which character they play.

    From a purely theoretical point of view, the best way to determine balance is to have one person who has “mastered” every character face off against a clone of himself or herself, matching every character against every other character and then looking at the subsequent wins and losses to determine a ranking. Of course that’s impossible. Perhaps the next best thing is to find masters of each character to face off against each other in a round robin tournament and then compile wins and losses.

    In fact, that’s exactly how the old GAMEST rankings were done. I also remember similar projects undertaken by the Tekken and Soul Calibur communities to determine tiers. But this sort of ranking hasn’t been done in a long time….or has it?

    Enter the PS3 replays. What I realized is that the PS3 replay matches have basically been organized around a character versus character round robin tournament format! Can we then compile the wins and losses from the replays to determine a tier list?

    Now this assumes that the players who play in the replays are in fact masters of their respective characters (which is why I asked my question in the other thread). Judging by their play from my shoes, that seems like a reasonable assumption to make, although I suspect that there might be players playing multiple characters in the replays. I.e. instead of gathering 17 players there might be only 9 playing the 17 characters. Note how at least one player is super adept at getting DM > OM K > back crumple in the replays.

    I do however have a few reservations. I question whether the Lau player is truly a Lau master. The Goh player, while he has good yomi, makes a ton of command input errors, which have cost him matches. The Sarah player seems to be fooling around half the time.

    In any case, let’s assume that the players behind the replays are good enough for us to be able to conclude that the replay matches are representative of a high level of play. Given that...what do the results tell us?

    I divided the results into two categories: match wins/losses, and round wins/losses. The former says that it doesn’t matter how many rounds were won or lost, a win is a win and a loss is a loss. The latter says that it does matter how badly one is beaten; i.e. a character that loses 0-3 to another character may have a match-up problem and this should figure into the tier ranking. (Ideally one should play the other multiple times, but unfortunately we don’t have the privilege in this case and counting by rounds sort of accomplishes the same purpose).

    However, I actually prefer to look at it more from match wins/losses, because one got the sense that certain players gave chances when up 2-0 while others did not. Thus for me the match won/lost felt more “real” than the rounds won/lost.

    When you look at it from an overall match perspective, here is how the results break down:

    +2 (56% win rate): Brad, El Blaze, Jacky, Jeffry, Lei Fei, Wolf
    +0 (50% win rate): Akira, Eileen, Lion, Pai, Shun, Vanessa
    -2 (44% win rate): Aoi, Goh, Kage, Sarah
    -4 (38% win rate): Lau

    Just stop and think about those results for a minute. The most successful characters only won two games more than they lost. Except Lau, the most unsuccessful characters only lost two games more than they won. Those are incredible results – the gap between the “top” and the “bottom” is so narrow that differences might in fact be caused just by random error. I wish I had ranking results from other games to compare, but (Lau aside) a 12% winning ratio gap is almost immaterial.
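
    A rough way to check the "random error" point: assume each character played 16 counted matches (17 characters, one match per pairing, which is what the 56%/44%/38% figures imply) and that every match is a pure coin flip. The sketch below converts a net record into a win rate and computes how often a single character would land at least that far from even by chance alone. The function names and the coin-flip model are illustrative assumptions, not something taken from the replay data.

```python
from math import comb

MATCHES = 16  # assumption: 17 characters, each plays the other 16 once


def win_rate(net, matches=MATCHES):
    """Win rate for a character whose wins minus losses equals `net`."""
    wins = (matches + net) / 2
    return wins / matches


def prob_net_at_least(net, matches=MATCHES):
    """Chance that |wins - losses| >= net when every match is a 50/50 coin flip."""
    hits = sum(
        comb(matches, wins)
        for wins in range(matches + 1)
        if abs(2 * wins - matches) >= net
    )
    return hits / 2 ** matches


# Reproduce the win rates quoted for each net record.
for net in (2, 0, -2, -4):
    print(f"{net:+d}: {win_rate(net):.0%} win rate")

# How often would a perfectly even character end up this far from 8-8?
print(f"P(|net| >= 2) = {prob_net_at_least(2):.2f}")
print(f"P(|net| >= 4) = {prob_net_at_least(4):.2f}")
```

    Under those assumptions, a perfectly even character finishes at +2/-2 or further from even about 80% of the time, and at +4/-4 or further about 45% of the time, so 16 matches per character cannot really separate these groups.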

    If you have a super tier character in the game, wouldn’t you expect at least a 70% or 80% win ratio? That’s what I would expect for Jin if we did this for Tekken 4 and Steve/Heihachi if we did this for Tekken 5. (VF5 does in fact have one god-tier character…Dural, who had a 100% win rate. Now that’s god tier.)

    What’s up with Lau? As I’ve said, I really don’t think the Lau player is very good; he can probably kick my butt but his Lau just doesn’t seem to be on the same level as other players' characters. You can tell just by watching the replays. But we know that Lau is definitely NOT low tier, that’s for sure.

    When you look at it from a rounds won/lost perspective, the distribution gets a little wider:

    +7 El Blaze
    +6
    +5
    +4 Akira, Brad, Jeffry, Lei Fei
    +3 Wolf
    +2 Eileen, Jacky, Pai
    +1
    +0
    -1 Shun
    -2 Lion
    -3 Kage, Vanessa
    -4
    -5 Aoi, Goh
    -6 Lau
    -7 Sarah

    Do you believe the above ranking more, or the ranking as determined by matches won or lost? If you believe the above, then you’d have to accept that El Blaze is super tier and that Shun, Kage, and Vanessa are low tier. :) What the above results say in comparison to the match results is that characters like El Blaze won big when they won, and/or lost small when they lost; and vice versa for characters like Sarah. However, like I mentioned, I think the rounds won/lost metric is shakier than the matches won/lost metric because certain players definitely gave chances when up 2-0 while others did not.

    So…what’s the overall takeaway? My overall takeaway is the following:

    +2 (56% win rate): Brad, El Blaze, Jacky, Jeffry, Lei Fei, Wolf
    +0 (50% win rate): Akira, Eileen, Lion, Pai, Shun, Vanessa
    -2 (44% win rate): Aoi, Goh, Kage, Sarah
    -4 (38% win rate): Lau*
    * Lau player not very good

    And going by the above, the game as it stands is incredibly balanced. (The results are so even, in fact, that sometimes I wonder if they were rigged). Perhaps the above is why AM2 chose not to put in any more gameplay changes for Version B or C.

    Am I saying that there are no tiers in VF5? No, of course not. There will always be tiers in a game that features more than one character. But what I am saying is that tiers are so narrow in VF5 that they are immaterial – even on a high level as the above demonstrates. So when you are playing this game and you seem to be having a hard time against a character...realize that you’re having a hard time against the player, not the character. Don’t bitch and moan about the game and how unfair the game is...think about what you can do to improve yourself and narrow the odds. If Jeffry can be +2 in matches won/lost and +4 in rounds won/lost against elite competition, then hell so can you at your level.

    The end. Crap this is a long post.

    P.S.

    While we’re on the subject of tiers, be sure to check out these additional resources:

    - Complete results from the PS3 ranking matches (please let me know if you find errors)

    - Enterbrain’s tier list

    - Akai’s tournament advancement list
     
  2. ice-9

    ice-9 Well-Known Member

    Quick note -- in the .xls file I attached for the PS3 replay data, the results are coded as follows:

    +3 = Won 3-0
    +2 = Won 3-1
    +1 = Won 3-2
    +0 = Same character, did not count
    -1 = Lost 2-3
    -2 = Lost 1-3
    -3 = Lost 0-3
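
    For anyone who wants to recompute the totals, here is a minimal sketch of how the coding above could be turned into net matches and net rounds for one character. The `decode` and `summarize` helpers and the example row are hypothetical; they are not taken from the .xls file.

```python
def decode(code):
    """Map a spreadsheet code to (rounds_won, rounds_lost); None for a mirror match."""
    if code == 0:
        return None  # same character, did not count
    if code > 0:
        return (3, 3 - code)  # +3 -> 3-0, +2 -> 3-1, +1 -> 3-2
    return (3 + code, 3)  # -1 -> 2-3, -2 -> 1-3, -3 -> 0-3


def summarize(codes):
    """Return (net matches, net rounds) for one character's row of codes."""
    results = [r for r in map(decode, codes) if r is not None]
    match_net = sum(1 if won > lost else -1 for won, lost in results)
    round_net = sum(won - lost for won, lost in results)
    return match_net, round_net


example_row = [3, -1, 2, 0, -3]  # hypothetical values, not from the .xls file
print(summarize(example_row))
```

    With this coding the net rounds figure is simply the sum of a character's codes, and the net matches figure is the count of positive codes minus the count of negative ones.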
     
  3. lotrzyna

    lotrzyna Well-Known Member

    One of the basic rules of statistics says that even the largest, highest-quality sample of a group will not give you results as good as the whole group (population) would (AFAIK ;) ). So I would just take the full results from all of Japan to see which character is "the best". I do not know how the SEGA character ranking is built, but I think it is based on win ratio? So as long as Chibita's Lion is number 1 in the overall Japan ranking, Lion is the best character, etc. Of course that just confirms your point that it is the players, not the characters, who are really good. I would base my statistics on the SEGA ranking for one reason: the best players play against a really BIG number of other players, and from time to time they also play against each other.
    But still, I wouldn't care about tiers in VF5 because IMO those are really small differences, not even 1/4 as big as in Tekken.

    And about that sentence "more than 1 character = tiers": did International Championship Karate + have tiers? jk :)

    SEGA probably has the necessary data, and as long as they do not make any changes to the gameplay, the tiers must be infinitesimal, I think.
     
  4. Myke

    Myke Administrator Staff Member Content Manager Kage

    PSN:
    Myke623
    XBL:
    Myke623
    No, because the data you're sampling is biased in that it was specifically "chosen" for a purpose.

    I agree with your conclusion though. :)
     
  5. sanjuroAKIRA

    sanjuroAKIRA Well-Known Member

    How 'bout this...since Dural is the "god" character & didn't lose a match, just go through and rank the characters according to how much bar they took from Dural. Conclusions gleaned from analysis of this nature would be absolutely incontrovertible.
     
  6. WrathX

    WrathX Well-Known Member

    Awesome work, ice. It's very refreshing to see a new perspective on this game, namely that it's not the character that is my difficulty, but the player. This in itself may help me overcome some mental hurdles. :)
     
  7. ice-9

    ice-9 Well-Known Member

    villain:

    I disagree. If you were to look at the population numbers for the U.S., for example, you will find that there are many Gohs. Is Goh top tier? No. If we were able to record win-loss records for Goh, I'm guessing collectively the ratio will be low since most beginners will pick Goh but veteran players will play a different character.

    The example is extreme but the point is that population rankings are not necessarily a good indicator of a tier list, though I agree they provide important clues.


    Myke:

    Quote: No, because the data you're sampling is biased in that it was specifically "chosen" for a purpose.

    You're assuming that players played more matches than were shown, and only a few were selected from all those matches for the PS3? Why are you so sure of your assumption? Did you read that somewhere? Would you like to elaborate further?

    If anything, I would guess that in the interest of time and money AM2 did a round robin and that was it. Many of the matches weren't particularly interesting. Also, if they played more than one match per character pair, it doesn't make as much sense to me that they would only show one instead of two matches.


    s_aki:

    OK smartass, so you think this discussion is not very helpful. You don't have to participate you know.
     
  8. Unicorn

    Unicorn Well-Known Masher Content Manager Wolf

    PSN:
    unicorn_cz
    XBL:
    unicorn cz
    Because Dural is so heavy, it is better to poke her than to juggle her. Because of this, Pai will be the top tier and Jeffry/Wolf the lowest.
     
  9. Myke

    Myke Administrator Staff Member Content Manager Kage

    PSN:
    Myke623
    XBL:
    Myke623
    I didn't read anything anywhere, and I based my assumption purely from a Quality or PR perspective.

    If I were in charge of putting the games on the disc, I'd ensure there was some (not perfect) balance in terms of who beat who, and that seemed to be apparent with your win/loss based results. Why would I do this? Well, let's say the Brad player was just feeling "off" for 12 of his 17 round-robin matches. Would I want those 12 bad games to be on the final disc? Of course not. This would be misleading to players buying my product, and sends the message that Brad is a poor character, or that my game was unbalanced. Conversely, what if Koedo playing Kage was just owning everyone during the round-robin window and didn't lose a single game? Again, same message.

    To give another example, which I think many players can relate to, consider when you're saving replays, or recording your own play sessions. Sometimes you'll just capture everything, and then later when deciding which movies to encode for uploading/sharing, you intentionally select those matches. You only display a subset of the total session.

    Again, if I were wanting to give a good, and balanced view, of what the characters in my game were capable of, I would have a somewhat even distribution in terms of who was beating who. If part of the intent of my replays were to instruct, or inspire (and not just entertain), again I'd hand pick the final cut, and not just leave it to chance.

    Quote: If anything, I would guess that in the interest of time and money AM2 did a round robin and that was it.
    *shrug* I think a single day is more than enough time to capture a heap load of matches, especially if you factor in multiple setups. I really doubt this one day within the development cycle of the home port, however aggressive it was, would hamper AM2's budget or schedule. For all we know, they probably have a way of porting arcade replays to the ps3!

    Quote: Also, if they played more than one match per character pair, it doesn't make as much sense to me that they would only show one instead of two matches.
    It makes plenty of sense to me -- it's simpler to include only one match per character pair than multiple matches. I already think the existing matches present on the disc could overwhelm a user for choice, and if you only double the matches per pair, you've exponentially increased the "overwhelminess". :) And sticking with the theme of the bare-bones-absolute-minimum port, including only one match/pair fits in perfectly!

    Your entire project is based on some heavy assumptions, and I personally think my assumptions are more realistic than yours, but I don't wish to argue over assumptions with you and will be more than happy to agree to disagree.

    Bottom line is I still agree with your conclusions! HA!
     
  10. Unicorn

    Unicorn Well-Known Masher Content Manager Wolf

    PSN:
    unicorn_cz
    XBL:
    unicorn cz
    Then why did Jeffry on the Evo disc not win even one match, if I remember correctly?
     
  11. Jide

    Jide Joe Musashi Silver Supporter

    PSN:
    Blatant
    Because it was just exhibition matches...
    It's not that big of a deal surely? In fact these matches could have been taken from the Japanese QA testers for all we know.
     
  12. ice-9

    ice-9 Well-Known Member

    Great point Unicorn. The Evo matches weren't very balanced.

    Myke, another point to keep in mind is that it would take hours to watch all the matches and cherry pick one. Hell it took me over 7 hours to watch them all on Sunday, and that was because I wanted to watch them not because it was my job either. Maybe I'm just lazy (everyone seems to think AM2 is lazy anyway) but if I was the guy responsible for the replays I'd just do a round robin and be done with it rather than spend...what...multiples of 7 hours to watch all the clips, decide which one shows what best, make sure everything is balanced, etc.?

    It looks, though, like unless we know someone from AM2 we won't ever know for sure who has the right assumption.
     
  13. Myke

    Myke Administrator Staff Member Content Manager Kage

    PSN:
    Myke623
    XBL:
    Myke623
    But you're one guy, with one PS3, watching all the replays. Maybe they had a team, using multiple resources in parallel, and were given a week or two to go through them?

    You're right though, we just don't know.
     
  14. ice-9

    ice-9 Well-Known Member

    Did you check out akai's post? Every single character is represented in the KS4 national tournament!! Wow, this game is a lot more balanced than anyone thought.
     
  15. Myke

    Myke Administrator Staff Member Content Manager Kage

    PSN:
    Myke623
    XBL:
    Myke623
    Yes, I did. I thought I already agreed, twice in fact, with your conclusion on how balanced the game was? It sounds like you think I'm not convinced? For the third time: I agree with the conclusion, but don't agree with the experiment used to reach it.

    The KS4 qualifier result data are much better suited to advocate the balance in VF5. The sample size and sample period are sufficiently large, and you can guarantee that there is no bias in data selection -- it is what it is.

    Maybe you should have based your project on that instead. ;)
     
  16. ice-9

    ice-9 Well-Known Member

    Uhmm...OK, glad to see you share in my enthusiasm.
     
  17. akai

    akai Moderator Staff Member Bronze Supporter

    PSN:
    Akai_JC
    XBL:
    Akai JC
    Hmmm, I think we all agree that at the highest level tiers really are not significant in VF? Anyway, more free time at work...KSIV Qualifiers Overall -

    Columns: 1) win %, 2) win-loss, 3) net change in character distribution between the qualifiers and the KSIV final tournament (relisted for easier reference).

    Eileen   61.7%  (50-31)  +2.8
    Wolf     59.6%  (34-23)  +2.1
    Jacky    54.3%  (57-48)  +4.3
    Kage     53.2%  (59-52)  +2.6
    Shun     52.9%  (36-32)  +1.6
    Lion     52.6%  (30-27)  +3.6
    Brad     51.7%  (15-14)  +2.7
    Lau      51.4%  (36-34)  +1.2
    Pai      47.1%  (33-37)  -0.6
    Jeffry   46.7%  (14-16)   0.0
    Lei      46.3%  (31-36)  -2.0
    Akira    44.3%  (31-39)  -2.8
    Blaze    44.1%  (15-19)  -2.2
    Sarah    43.3%  (13-17)  -1.7
    Aoi      42.1%  (16-22)  -0.8
    Vanessa  41.9%  (18-25)  -1.2
    Goh      39.1%  (09-14)  -1.0

    Same character matches were left out of the calculations. Note that each player (with a few exceptions) contributes only one loss to the character's win percentage, but can contribute as many wins as they manage before being eliminated from the tournament qualifier. Note also that all of the players contributing to the win percentages shown above had to have won a "local tournament" to participate in the area qualifiers. So, my assumption is that they are all intermediate level players at least. Overall, the win percentages of the characters go well with their net change in character distribution, with the major exceptions being Eileen/Wolf compared to Jacky/Lion.
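
    The win percentages above are just wins divided by total counted matches. As a quick sketch (the `win_pct` helper and the choice of rows are only for illustration), a few of the records can be rechecked like this:

```python
# Win-loss records copied from the table above (a subset, for illustration).
records = {
    "Eileen": (50, 31),
    "Wolf": (34, 23),
    "Jacky": (57, 48),
    "Lion": (30, 27),
    "Goh": (9, 14),
}


def win_pct(wins, losses):
    """Win percentage with same-character matches already excluded."""
    return 100 * wins / (wins + losses)


for name, (wins, losses) in records.items():
    print(f"{name:<7} {win_pct(wins, losses):.1f}% ({wins}-{losses})")

# Gap between the best and worst record in this subset (Eileen vs. Goh).
pcts = [win_pct(w, l) for w, l in records.values()]
spread = max(pcts) - min(pcts)
print(f"spread: {spread:.1f} percentage points")
```

    Eileen and Goh are the top and bottom rows of the full table, so the spread printed here is also the spread for the whole list.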

    A possible explanation for the higher win percentage in qualifiers but less representation in the finals is what ice-9 mentioned in the tier posts about a higher percentage of elite players? No more calculations for me.
     
  18. ice-9

    ice-9 Well-Known Member

    Great work akai; I found your analysis highly insightful. Only about 20 percentage points between the most successful and the least successful character -- that's really good!

    Like I stated in my original post, if there was a "god" tier character I would have expected >70% win rate but according to KS4 we have Eileen and Wolf at 60%. If there were to be a Kuma in the game I would have expected <30%, but Goh is at 39%.
     
  19. Vortigar

    Vortigar Well-Known Member

    Good stuff this.

    Goh drops to the bottom once again. Was to be expected I guess. Fundamentally his game remains unchanged in VF5.
     
  20. Jerky

    Jerky Well-Known Member

    Usually I despise character tier analysis in VF because it sometimes goes hand in hand with people feeling they're somehow a good player just because they play someone like say Goh (not aiming at anyone in particular) ... I'll tell you this: People shouldn't judge by character, but more or less by the player's ability to 1) adapt to change, 2) utilize efficient strategy (yes even if it's "this guy can't deal with elbow"), and 3) WIN.

    I feel sometimes people give honorary kudos to those who play unpopular characters. I'm not for that folks, sorry. Show what you can do WITH the character whether it's Pai, Shun, Jacky or whatever.

    YES I know I may be a bit biased because of my character choice since birth, but I feel VF is a game of wits and I hope people don't look to "tiers" as a way to validate themselves as a player or use these stats to discredit those who have genuinely played hard to get where they are with a popular character.

    Rant aside, I like seeing these experiments done for informational purposes.
     
