Update (September 3, 2012): I made some mistakes in my own implementation of RC4. An analysis of the mistake is documented in Mistake in my RC4 implementation. After fixing the mistake, I do not see the skewed distribution shown here. Please be advised that none of the blog posts in this blog are peer reviewed, and mistakes happen.On the quest to look for a pseudo-random number generator that is easy to implement, I gave up on Mersenne Twister but took Wikipedia's advice that a stream cipher can be used for generating pseudo-random sequences. But it also mentions that RC4 is not a particularly uniform stream cipher which makes statistical attacks possible. Because of the simplicity of RC4, I decided to implement it and see really how bad it is. Since RC4 outputs a stream of bytes, I took the histogram of the byte value. The initialization key is the integer value of a stack-allocated object pointer, repeated for 64 bytes, and it's run on a 64-bit machine. I'm not sure if the fact that the key is highly regular has anything to do with the following non-uniform result that I observed.
byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count |
0 | 5499938 | 32 | 50376 | 64 | 81740 | 96 | 253697 | 128 | 50004 | 160 | 144433 | 192 | 147856 | 224 | 147031 |
1 | 134854 | 33 | 50103 | 65 | 50386 | 97 | 115718 | 129 | 50024 | 161 | 49930 | 193 | 149593 | 225 | 50260 |
2 | 50352 | 34 | 50179 | 66 | 954060 | 98 | 50362 | 130 | 50185 | 162 | 50553 | 194 | 134910 | 226 | 78249 |
3 | 115332 | 35 | 50006 | 67 | 50091 | 99 | 50170 | 131 | 50543 | 163 | 50289 | 195 | 50516 | 227 | 49853 |
4 | 50214 | 36 | 50065 | 68 | 50256 | 100 | 50052 | 132 | 50288 | 164 | 49950 | 196 | 50532 | 228 | 50303 |
5 | 50089 | 37 | 50589 | 69 | 50173 | 101 | 50118 | 133 | 143783 | 165 | 49753 | 197 | 50182 | 229 | 50135 |
6 | 207241 | 38 | 49946 | 70 | 50416 | 102 | 50395 | 134 | 50447 | 166 | 50270 | 198 | 50145 | 230 | 134634 |
7 | 68820 | 39 | 50147 | 71 | 49985 | 103 | 49774 | 135 | 50252 | 167 | 50502 | 199 | 49368 | 231 | 49987 |
8 | 50212 | 40 | 103906 | 72 | 49950 | 104 | 50501 | 136 | 49803 | 168 | 50038 | 200 | 50032 | 232 | 50486 |
9 | 695154 | 41 | 49905 | 73 | 50137 | 105 | 50389 | 137 | 50085 | 169 | 50619 | 201 | 66006 | 233 | 49849 |
10 | 50439 | 42 | 49865 | 74 | 50472 | 106 | 50284 | 138 | 49508 | 170 | 50045 | 202 | 50336 | 234 | 50083 |
11 | 49682 | 43 | 948555 | 75 | 50141 | 107 | 50230 | 139 | 50294 | 171 | 50719 | 203 | 50300 | 235 | 50267 |
12 | 50378 | 44 | 49965 | 76 | 140698 | 108 | 50119 | 140 | 49772 | 172 | 50399 | 204 | 49963 | 236 | 50117 |
13 | 50466 | 45 | 50171 | 77 | 49814 | 109 | 50012 | 141 | 50385 | 173 | 50031 | 205 | 50072 | 237 | 49945 |
14 | 49591 | 46 | 50290 | 78 | 50219 | 110 | 90366 | 142 | 50177 | 174 | 50126 | 206 | 50585 | 238 | 50157 |
15 | 49776 | 47 | 50340 | 79 | 49916 | 111 | 49974 | 143 | 50224 | 175 | 50513 | 207 | 50350 | 239 | 50234 |
16 | 287052 | 48 | 205931 | 80 | 151232 | 112 | 131863 | 144 | 50079 | 176 | 50026 | 208 | 50226 | 240 | 131317 |
17 | 50022 | 49 | 50076 | 81 | 50134 | 113 | 49801 | 145 | 50487 | 177 | 50688 | 209 | 81920 | 241 | 50161 |
18 | 118991 | 50 | 50125 | 82 | 50108 | 114 | 50370 | 146 | 66545 | 178 | 78182 | 210 | 49998 | 242 | 50092 |
19 | 50209 | 51 | 50706 | 83 | 50292 | 115 | 50297 | 147 | 50236 | 179 | 50411 | 211 | 50458 | 243 | 50180 |
20 | 50170 | 52 | 50025 | 84 | 221044 | 116 | 116139 | 148 | 247793 | 180 | 215784 | 212 | 50185 | 244 | 50124 |
21 | 50371 | 53 | 49717 | 85 | 50259 | 117 | 50274 | 149 | 50327 | 181 | 758974 | 213 | 49808 | 245 | 90855 |
22 | 50365 | 54 | 49979 | 86 | 50302 | 118 | 50523 | 150 | 50228 | 182 | 50272 | 214 | 50685 | 246 | 50228 |
23 | 50467 | 55 | 50145 | 87 | 49797 | 119 | 105677 | 151 | 50108 | 183 | 50390 | 215 | 113514 | 247 | 50316 |
24 | 100861 | 56 | 78827 | 88 | 113581 | 120 | 138358 | 152 | 57005 | 184 | 53395 | 216 | 147492 | 248 | 50546 |
25 | 50158 | 57 | 49803 | 89 | 50264 | 121 | 50209 | 153 | 50095 | 185 | 50334 | 217 | 50396 | 249 | 50078 |
26 | 50483 | 58 | 50279 | 90 | 50176 | 122 | 49674 | 154 | 50292 | 186 | 50529 | 218 | 49915 | 250 | 50111 |
27 | 50211 | 59 | 49997 | 91 | 50337 | 123 | 50396 | 155 | 50453 | 187 | 49836 | 219 | 50170 | 251 | 50524 |
28 | 49859 | 60 | 50507 | 92 | 50313 | 124 | 50280 | 156 | 50059 | 188 | 50229 | 220 | 248053 | 252 | 49981 |
29 | 50580 | 61 | 50869 | 93 | 50417 | 125 | 49877 | 157 | 50355 | 189 | 50313 | 221 | 49937 | 253 | 50533 |
30 | 50142 | 62 | 50358 | 94 | 50292 | 126 | 50240 | 158 | 49821 | 190 | 50204 | 222 | 50139 | 254 | 49975 |
31 | 50265 | 63 | 50165 | 95 | 50352 | 127 | 244313 | 159 | 50410 | 191 | 50529 | 223 | 202938 | 255 | 246009 |
Now, this observation also indicates that RC4 is a very poor stream cipher. The way stream cipher works is essentially that a stream of pseudo-random bytes are generated and XOR-ed byte-wise with the plaintext in order to obtain ciphertext. The prevalence of 0 values means that most of the plaintext will be identical to the ciphertext because 0 is an XOR identity. The fact that this bad encryption is possible with some key choices should worry a lot of people who are still using RC4.
By a stroke of luck, I tried a slightly modified RC4 that actually gives me a pretty uniform distribution. The difference (from the pseudo-code listing of the RC4 wikipedia article) is highlighted in grey below.
i := 0
j := 0
while GeneratingOutput:
i := (i + 1) mod 256
j := (j + S[i]) mod 256
swap values of S[i] and S[j]
K := S[(S[i] + S[j]) mod 256]
output K ^ j // was just K before
endwhile
This modification gives me the following histogram.
byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count | byte | count |
0 | 99723 | 32 | 100388 | 64 | 99739 | 96 | 99866 | 128 | 100026 | 160 | 100026 | 192 | 99831 | 224 | 99581 |
1 | 99941 | 33 | 99908 | 65 | 100363 | 97 | 100328 | 129 | 99741 | 161 | 100338 | 193 | 99792 | 225 | 99776 |
2 | 99968 | 34 | 100010 | 66 | 99589 | 98 | 100231 | 130 | 100649 | 162 | 100084 | 194 | 100218 | 226 | 99839 |
3 | 99896 | 35 | 99924 | 67 | 100465 | 99 | 100002 | 131 | 100093 | 163 | 99815 | 195 | 100478 | 227 | 100137 |
4 | 100439 | 36 | 99677 | 68 | 100004 | 100 | 99960 | 132 | 100165 | 164 | 99756 | 196 | 100064 | 228 | 100114 |
5 | 100081 | 37 | 99362 | 69 | 99664 | 101 | 99872 | 133 | 99887 | 165 | 100059 | 197 | 99523 | 229 | 100590 |
6 | 100570 | 38 | 99924 | 70 | 99648 | 102 | 99945 | 134 | 99459 | 166 | 99854 | 198 | 100170 | 230 | 100069 |
7 | 99812 | 39 | 99706 | 71 | 100066 | 103 | 99598 | 135 | 99532 | 167 | 99499 | 199 | 99956 | 231 | 99892 |
8 | 99969 | 40 | 100484 | 72 | 100074 | 104 | 99506 | 136 | 99831 | 168 | 100291 | 200 | 100164 | 232 | 99562 |
9 | 99840 | 41 | 99756 | 73 | 99128 | 105 | 100178 | 137 | 99646 | 169 | 100348 | 201 | 100121 | 233 | 99412 |
10 | 99793 | 42 | 100133 | 74 | 100019 | 106 | 100488 | 138 | 99882 | 170 | 99661 | 202 | 99864 | 234 | 99878 |
11 | 100208 | 43 | 100059 | 75 | 99717 | 107 | 100141 | 139 | 100374 | 171 | 100116 | 203 | 99643 | 235 | 100394 |
12 | 100157 | 44 | 99772 | 76 | 99846 | 108 | 100085 | 140 | 100180 | 172 | 99632 | 204 | 99928 | 236 | 100123 |
13 | 100098 | 45 | 100232 | 77 | 100017 | 109 | 100654 | 141 | 99955 | 173 | 99993 | 205 | 99858 | 237 | 100188 |
14 | 100017 | 46 | 100135 | 78 | 99566 | 110 | 99104 | 142 | 99640 | 174 | 99256 | 206 | 100243 | 238 | 99918 |
15 | 100259 | 47 | 100403 | 79 | 99459 | 111 | 99887 | 143 | 100255 | 175 | 99818 | 207 | 99612 | 239 | 99840 |
16 | 100139 | 48 | 99888 | 80 | 100828 | 112 | 99999 | 144 | 99820 | 176 | 100378 | 208 | 99663 | 240 | 99622 |
17 | 100093 | 49 | 100173 | 81 | 99633 | 113 | 100480 | 145 | 99804 | 177 | 99826 | 209 | 100564 | 241 | 99916 |
18 | 100257 | 50 | 99987 | 82 | 100502 | 114 | 100253 | 146 | 100554 | 178 | 99998 | 210 | 100183 | 242 | 100319 |
19 | 99835 | 51 | 100206 | 83 | 99776 | 115 | 99674 | 147 | 100726 | 179 | 99909 | 211 | 100548 | 243 | 99803 |
20 | 100010 | 52 | 99879 | 84 | 100188 | 116 | 99657 | 148 | 100255 | 180 | 99687 | 212 | 99937 | 244 | 100122 |
21 | 100424 | 53 | 100257 | 85 | 100197 | 117 | 100375 | 149 | 99866 | 181 | 99741 | 213 | 100669 | 245 | 99972 |
22 | 100062 | 54 | 99376 | 86 | 100123 | 118 | 100396 | 150 | 99623 | 182 | 100523 | 214 | 100195 | 246 | 100034 |
23 | 99729 | 55 | 100404 | 87 | 100620 | 119 | 100275 | 151 | 99594 | 183 | 99879 | 215 | 99921 | 247 | 100092 |
24 | 100076 | 56 | 99792 | 88 | 100383 | 120 | 99712 | 152 | 100641 | 184 | 99514 | 216 | 100226 | 248 | 100600 |
25 | 100107 | 57 | 100214 | 89 | 100079 | 121 | 99682 | 153 | 99723 | 185 | 99764 | 217 | 100120 | 249 | 99574 |
26 | 99785 | 58 | 100284 | 90 | 99835 | 122 | 100090 | 154 | 99805 | 186 | 99752 | 218 | 99850 | 250 | 100211 |
27 | 99839 | 59 | 99459 | 91 | 100332 | 123 | 100185 | 155 | 99557 | 187 | 100176 | 219 | 100167 | 251 | 100213 |
28 | 99700 | 60 | 99795 | 92 | 99966 | 124 | 100599 | 156 | 99709 | 188 | 100422 | 220 | 100028 | 252 | 100368 |
29 | 99776 | 61 | 99663 | 93 | 99596 | 125 | 99479 | 157 | 100266 | 189 | 100427 | 221 | 100172 | 253 | 99842 |
30 | 100319 | 62 | 99734 | 94 | 99962 | 126 | 99874 | 158 | 100068 | 190 | 100238 | 222 | 100058 | 254 | 99921 |
31 | 99537 | 63 | 100025 | 95 | 100043 | 127 | 100382 | 159 | 99910 | 191 | 99824 | 223 | 100697 | 255 | 99923 |
I think it is quite possible that this slight modification could be immune to the existing statistical attacks on RC4.
1 comment:
You're probably using an incorrect RC4 implementation. I was able to get similar statistics only when I wrote "S[S[i] + S[j]]" instead of "S[(S[i] + S[j]) & 0xFF]", which would overrun the state buffer in C, and cause it to read random memory (leading to lots of 0 outputs)
http://ideone.com/uE8LHN
Post a Comment