Introduction

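As anti-scraping mechanisms get stronger, the way data is delivered has changed as well. Sites that do not want to be scraped have added all sorts of protections, which makes parsing their JavaScript (anti-anti-scraping) an indispensable step in scraping them.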

What kind of JS needs to be parsed

Among the JS-related scrapers I have seen so far:

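  • Scrapers that do not need to request a key or pass any important parameter in a cookie. With simple AJAX, you get the data just by making the request. With WebSocket (a long-lived connection set up over HTTP/1.1), you can send the request parameters the server expects via webSocket.send and get the data back directly.

  • Scrapers where a key needs further parsing, or where special strings must be added to a cookie. (Because of how pages load JS, the source file has to be downloaded to the browser before it runs, so this kind of encryption is really in plaintext; it is only obfuscated, which can make it very laborious to read.) The encryption methods include:

    • String.fromCharCode() and its variants. These typically turn code (a function, or a run of numbers) into a sequence of char codes that is stored in the page. To recover it, reverse the transformation and then eval.
    • Passing in several math functions and computing with them, with the final value determined by their return value
    • A custom function packaged into some JS file that computes and returns the value, possibly also checking the current browser state
    • Stuffing functions with comments so that they become unreadable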
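To make the first technique concrete, here is a minimal Python sketch (illustrative only, not taken from any site): the obfuscation side maps source text to char codes, and recovery is just the reverse mapping. The function names are hypothetical; Python's chr matches String.fromCharCode for characters in the Basic Multilingual Plane.

```python
def to_char_codes(source):
    """Obfuscation side: turn source text into a list of char codes."""
    return [ord(ch) for ch in source]


def from_char_codes(codes):
    """Recovery side: Python counterpart of String.fromCharCode(...codes)."""
    return "".join(chr(code) for code in codes)


hidden = to_char_codes("return 1+1")
print(hidden[:6])               # [114, 101, 116, 117, 114, 110]
print(from_char_codes(hidden))  # return 1+1
```

On a real page the recovered text is then handed to eval, which is exactly the step worth inspecting before running.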

A concrete case

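In short, all of the methods above boil down to turning a function into a string, sprinkling in comments or a few math functions, and then calling eval. It isn't hard, just tedious. Below I use the elong verification I examined as a concrete example of how to handle each of these situations.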

Inspecting the page

Open http://m.elong.com/ihotel/315197/?source_id=315197#detailTab and view the page source.

The first thing we notice is a very out-of-place string, and we can also find the JS that works with it:

eval(function(p,a,c,k,e,d){e=function(c){return(c<a?"":e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--)d[e(c)]=k[c]||e(c);k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1;};while(c--)if(k[c])p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]);return p;}('3 i(){g{2 a=$("#8").f();5(4==a||a==\'\'||a==\'${8}\'){0-6};2 b=7(a);2 c=9(b);0 c}d(e){0-6}};3 7(a){5(4==a||a==\'\'){0 a};2 b=a.j(/\\)\\^-1/h,")&-1");0 b}',20,20,'return||var|function|null|if|99|hijklmn|tsdDetail|eval||||catch||val|try|gm|abcdefgDetail|replace'.split('|'),0,{}))

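Reading it, we find that it creates an anonymous function with 6 parameters that is called immediately with the arguments listed at the end; the result is then passed to eval.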

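A closer look shows that this function is a string decoder, in the well-known p,a,c,k,e,d format produced by Dean Edwards' Packer. p is the code template in which identifiers have been replaced by short tokens, a is the radix, c is the number of tokens and k is the keyword list. e(c) encodes an index as a token (0-9 and a-z, then String.fromCharCode(c+29) for larger digits), and each token in p is swapped back for its keyword: p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c]).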
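The same decoding can be reproduced outside a JS engine, which avoids calling eval on anything. The sketch below is my own Python port of the packer's e() and its dictionary branch (one pass over every \w+ token); encode_index and unpack are hypothetical names, and the arguments are the ones from the page, with the JS string escapes \' and \\ already resolved.

```python
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode_index(c, a):
    """Port of the packer's e(): write index c in base a.

    Digits 0-35 use 0-9a-z (like toString(36)); larger digits use
    chr(digit + 29), i.e. String.fromCharCode(digit + 29).
    """
    prefix = "" if c < a else encode_index(c // a, a)
    digit = c % a
    return prefix + (chr(digit + 29) if digit > 35 else DIGITS[digit])


def unpack(p, a, c, k):
    """Rebuild the source by replacing every token in p with its keyword.

    An empty entry in k means "keep the token", as in k[c]||e(c).
    Unknown tokens are left unchanged (packed output should not contain any).
    """
    table = {}
    for i in range(c):
        token = encode_index(i, a)
        table[token] = k[i] or token
    # JS \w and \b are ASCII-only, hence re.ASCII.
    return re.sub(r"\b\w+\b", lambda m: table.get(m.group(0), m.group(0)), p, flags=re.ASCII)


P = r"""3 i(){g{2 a=$("#8").f();5(4==a||a==''||a=='${8}'){0-6};2 b=7(a);2 c=9(b);0 c}d(e){0-6}};3 7(a){5(4==a||a==''){0 a};2 b=a.j(/\)\^-1/h,")&-1");0 b}"""
K = "return||var|function|null|if|99|hijklmn|tsdDetail|eval||||catch||val|try|gm|abcdefgDetail|replace".split("|")

print(unpack(P, 20, 20, K))
```

The printed code is the same pair of functions shown next, just without the formatting.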

Running it produces a function like this:

function abcdefgDetail() {
    try {
        var a = $("#tsdDetail").val();
        if (null == a || a == '' || a == '${tsdDetail}') {
            return -99
        }
        ;
        var b = hijklmn(a);
        var c = eval(b);
        return c
    } catch (e) {
        return -99
    }
};function hijklmn(a) {
    if (null == a || a == '') {
        return a
    }
    ;
    var b = a.replace(/\)\^-1/gm, ")&-1");
    return b
}

I checked, and this function comes out the same every time, so our JavaScript can start from this step instead of from the packed code.

The function reads the value of #tsdDetail, replaces every ")^-1" in it with ")&-1" (replace(/\)\^-1/gm, ")&-1")) to get b, and then evals b to get c.
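In Python, hijklmn is a single re.sub. My own reading of why the swap matters (not stated on the page): in JS, x ^ -1 flips every bit, so it equals ~x, while x & -1 leaves a 32-bit integer unchanged. Python integers behave the same way here, as the last line shows.

```python
import re


def hijklmn(a):
    """Python equivalent of the page's hijklmn(): swap ")^-1" for ")&-1"."""
    if not a:  # the JS returns null or '' unchanged
        return a
    return re.sub(r"\)\^-1", ")&-1", a)


print(hijklmn("(1+2)^-1"))  # (1+2)&-1
print(5 ^ -1, 5 & -1)       # -6 5
```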

Faking a simple (and dangerous) request to test whether this part returns a value

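Nice idea in theory, so let's try a simple Python call to get check_code (this is very dangerous; make sure you use a sandboxed environment such as PhantomJS). repr is used because we need the value as a quoted, escaped string literal to embed in the JS source, not as its printable text.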

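import execjs

# doc is the PyQuery document of the hotel page (built as in the full script at the end)
value_str = doc('#tsdDetail').val()

ph_runtime = execjs.get('PhantomJS')
# Raw string: the JS regex escapes \) and \^ must reach PhantomJS unchanged
js_func = ph_runtime.compile(
   r'''
       var a=%s;
       var b=a.replace(/\)\^-1/gm, ")&-1");
       var c=eval(b);
   ''' % (
       repr(value_str),))

check_code = str(js_func.eval('c'))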

I want to stress this point: this approach will usually fail, and if it doesn't, the danger is enormous.

Why?

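First, all we know is what b looks like; we have no idea what eval(b) will produce. Second, even if it does return the right data, should the other side slip some unknown code into b, the damage could be devastating (think of something like rm -rf with poor permission management or no sandbox; how does that sound?).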
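Separately from the sandboxing concern, repr(value_str) only happens to yield a valid JS string literal: it follows Python's quoting rules, not JavaScript's (under Python 2, for instance, the repr of a unicode value carries a u prefix, which is not valid JS). A sketch of a more predictable alternative, assuming the value is a plain str: json.dumps emits a double-quoted, escaped literal that JavaScript also accepts. The helper name is hypothetical, and it does nothing about the eval risk itself.

```python
import json


def js_string_literal(value):
    """Return value as a JavaScript string literal (hypothetical helper)."""
    # ensure_ascii=True (the default) escapes every non-ASCII character,
    # including U+2028/U+2029, which older JS engines reject inside literals.
    return json.dumps(value)


template = "var a=%s;"
print(template % js_string_literal('it\'s "quoted"'))  # var a="it's \"quoted\"";
```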

Looking further into b

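Back to the task: this approach returns a wrong result. To know what actually runs, we only need to see what the code in b really is, but the string is padded with lots of garbage comments that get in the way.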

The garbage comments look like this:

/*asdfasdkasl;kfdkl;asdkl;f*/var/*asjdfjklwioerjiowkfskldf*/a/*ajsldfjklwioeruioskdf*/=/*asjdfjklasdjlkfkljasdf*/1/*asdfjklasdjkljklasdfklj*/;
/*asjdjklfasdfjlkjkl*/console.log(/*asdjfjkasdfjklalsjkdf*/a/*jasdjlkfjlkasdfljalkjsdf*/); 

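It is still readable if you look closely, but it badly gets in the way, so I match these comments with a regex and delete them. They are always block comments, so matching /* */ is enough, like this: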

re.sub(r'(/\*[\s\S]*?\*/)', ' ', your_code)

When debugging in the page, you can delete them with a JS regex in the same way:

js_code.replace(/\/\*[\s\S]*?\*\//g,' ');
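A small self-contained check of the Python regex above, run on a shortened version of the sample comments. Replacing each comment with a space rather than an empty string is deliberate: var/*x*/a must become var a, not vara. As a caveat, the pattern would also match /* inside a string or regex literal, so it is a quick cleanup tool for this page rather than a general JS comment stripper.

```python
import re

BLOCK_COMMENT = re.compile(r"/\*[\s\S]*?\*/")


def strip_block_comments(js_code):
    # Replace each /* ... */ (non-greedy, may span lines) with one space
    # so that neighbouring tokens stay separated.
    return BLOCK_COMMENT.sub(" ", js_code)


sample = "/*asdf*/var/*qwer*/a/*zxcv*/=/*uiop*/1/*hjkl*/;"
print(repr(strip_block_comments(sample)))  # ' var a = 1 ;'
```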

After this processing we get the following JS:

var __JtOCokI16 = String
    .fromCharCode;


var _x_QdX = [2891, 427, 1955, 1458, 1704,];//fUXRd9n


var _$cYbd = function () {
    return arguments[0] ^
    _x_QdX[0];
};


var _$nK07 = function () {
    return arguments[0] ^
    _x_QdX[1];
};


var _$w4I = function () {
    return arguments[0] ^
    _x_QdX[2];
};
var _$DCDm = function () {
    return arguments[0] ^
    _x_QdX[3];
};
var _$Bkcs = function () {
    return arguments[0] ^
    _x_QdX[4];
};
eval(__JtOCokI16(32)
    + __JtOCokI16(Math.abs(101) & -1, 0x76, 0x61, 0154, 40, _$w4I(03613), 0x6607 >> 4 >> 4, 117, 110, -1 - ~(0x63 ^ 0),
        _$w4I(2007), 0x6976 >> 4 >> 4, 0x6f, 110, 403 / 0xA, ~~120, _$DCDm(~~1435), 0x7b39 / 0400, 118, 97,
        _$w4I(0x7d1), Math.abs(32) & -1, 100, 61, ~(0x22 ^ -1), 0x2250 >> 4 >> 4, 59, ~(0x76 ^ -1), ~~97, 1143 / 0xA,
        0x2081 >> 4 >> 4, 112, 61, Math.abs(48) & -1, 59, 0x7766 / 0400, ~(0x68 ^ -1), Math.abs(105) & -1, 0x6c, 101 & (-1 ^ 0x00),
        0x0 | 0x28, _$DCDm(Math.abs(1474) & -1), ~(0x3c ^ -1), 0x7832 / 0400, -1 - ~(0x2e ^ 0), 0x0 | 0x6c, 0x65, 110 & (-1 ^ 0x00), 103, ~(0x74 ^ -1),
        0x6873 >> 4 >> 4, 051, 0x7b77 / 0400, 105, 102, 40 & (-1 ^ 0x00), 120 & (-1 ^ 0x00), -1 - ~(0x2e ^ 0), _$Bkcs(0x0 | 0x6cb), -1 - ~(0x68 ^ 0),
        _$w4I(0x7c253 >> 4 >> 4), 114 & (-1 ^ 0x00), 65, ~(0x74 ^ -1), 40, 112, 0x0 | 0x29, ~(0x21 ^ -1), 075, 0x22,
        ~~96, 0x0 | 0x22, 41, -1 - ~(0x64 ^ 0), 43, _$w4I(1950 & (-1 ^ 0x00)), _$DCDm(1482), 0x2e72 / 0400, -1 - ~(0x63 ^ 0), _$cYbd(0xb23),
        97, 0x72, ~(0x41 ^ -1), -1 - ~(0x74 ^ 0), ~~40, 0x70, 433 / 0xA, 43, 410 / 0xA, 59,
        -1 - ~(0x65 ^ 0), 108, 115 & (-1 ^ 0x00), 101, ~(0x7b ^ -1), _$w4I(2005), Math.abs(97) & -1, 114, ~(0x20 ^ -1), -1 - ~(0x6c ^ 0))
    + __JtOCokI16(_$w4I(0x79e06 / 0400), _$DCDm(14824 / 0xA), -1 - ~(0x2e ^ 0), Math.abs(99) & -1, _$cYbd(~~2851), ~(0x61 ^ -1), 114, ~(0x43 ^ -1), _$Bkcs(1735), 100,
        101 & (-1 ^ 0x00), 65, 116, _$Bkcs(Math.abs(1664) & -1), 1127 / 0xA, 0x2b, -1 - ~(0x33 ^ 0), 41, _$Bkcs(1669), ~(0x32 ^ -1),
        -1 - ~(0x38 ^ 0), _$DCDm(~(0x589 ^ -1)), Math.abs(105) & -1, _$Bkcs(0x6ce49 / 0400), 0x28, Math.abs(108) & -1, 0x3e66 >> 4 >> 4, _$nK07(415), 41, 0x64,
        Math.abs(43) & -1, 61, _$nK07(0x0 | 0x1cf), 0x0 | 0x2e, 115, 0x0 | 0x75, 98, 0x0 | 0x73, 1160 / 0xA, 114,
        _$w4I(1931), 0x0 | 0x64, 0x0 | 0x2e, 0x6c87 / 0400, ~(0x65 ^ -1), 110, ~~103, 0164, 0x68, 0x2d87 / 0400,
        120, 46 & (-1 ^ 0x00), 99, 104, 97, Math.abs(114) & -1, 67, 111, 0144, ~~101,
        -1 - ~(0x41 ^ 0), 116, 40, 0x70, ~(0x2b ^ -1), 49, 41 & (-1 ^ 0x00), 42, 57, 54,
        45, ~(0x78 ^ -1), 46, 99, 0x6843 >> 4 >> 4, -1 - ~(0x61 ^ 0), 114, 67, 111 & (-1 ^ 0x00), 0x6466 / 0400,
        101, 0x4135 >> 4 >> 4, 116, 0x2871 / 0400, 112, _$w4I(1928), 50, 051, 43, -1 - ~(0x33 ^ 0),
        -1 - ~(0x31 ^ 0), ~~48, 52, 0x0 | 0x2d, 0x6c, -1 - ~(0x2c ^ 0), _$DCDm(1502), 41, _$Bkcs(0x693), ~~101)
    + __JtOCokI16(0154, -1 - ~(0x73 ^ 0), 101 & (-1 ^ 0x00), Math.abs(32) & -1, ~(0x64 ^ -1), 43, 61, _$nK07(0x18908 / 0400), 96, 34,
        0x3b, _$DCDm(~~1474), 0x2b, 61 & (-1 ^ 0x00), 0x34, 125, 125, _$nK07(Math.abs(473) & -1), ~(0x65 ^ -1), 116,
        0x7564 >> 4 >> 4, 114, 110 & (-1 ^ 0x00), ~(0x20 ^ -1), 100 & (-1 ^ 0x00), ~(0x7d ^ -1), 41, 0x2817 >> 4 >> 4, ~(0x22 ^ -1), 101,
        120, 101, 0143, 117 & (-1 ^ 0x00), 116, 0145, _$DCDm(Math.abs(1434) & -1), _$nK07(386), _$Bkcs(03223), 102,
        0x0 | 0x75, 110, -1 - ~(0x63 ^ 0), 0x7400 >> 4 >> 4, -1 - ~(0x69 ^ 0), 111, 1105 / 0xA, 32, 96, 32,
        42, 0x2554 / 0400, 320 / 0xA, ~(0x7b ^ -1), 116, 0x0 | 0x72, 121, -1 - ~(0x20 ^ 0), 123, 0x72,
        _$w4I(19908 / 0xA), Math.abs(116) & -1, 0x7590 / 0400, 0162, Math.abs(110) & -1, 32, -1 - ~(0x64 ^ 0), 0x0 | 0x28, 57, 52,
        56, Math.abs(44) & -1, 32 & (-1 ^ 0x00), 51, -1 - ~(0x30 ^ 0), 57, 44, -1 - ~(0x20 ^ 0), 0x3920 >> 4 >> 4, ~(0x32 ^ -1),
        53 & (-1 ^ 0x00), 0x3448 >> 4 >> 4, 57, 0x3814 / 0400, 061, 442 / 0xA, 32, 0x0 | 0x34, -1 - ~(0x38 ^ 0), 48,
        44 & (-1 ^ 0x00), Math.abs(32) & -1, ~(0x36 ^ -1), ~~53, 49, 443 / 0xA, ~(0x20 ^ -1), 54, _$Bkcs(1681), 48 & (-1 ^ 0x00))
    + __JtOCokI16(~(0x2c ^ -1), 32, 530 / 0xA, Math.abs(55) & -1, 504 / 0xA, 445 / 0xA, 32 & (-1 ^ 0x00), 0x3542 >> 4 >> 4, 062, -1 - ~(0x32 ^ 0),
        414 / 0xA, 59, 0x7d, 0x2049 >> 4 >> 4, 99, 0x61, 0x74, Math.abs(99) & -1, 0x6847 / 0400, 0x2066 / 0400,
        0x28, 0145, 0x2950 / 0400, 96, Math.abs(32) & -1, 89, 37, 45, 57 & (-1 ^ 0x00), 0x0 | 0x39,
        0x3b00 >> 4 >> 4, Math.abs(125) & -1, 125, 96 & (-1 ^ 0x00), -1 - ~(0x20 ^ 0), 127, Math.abs(37) & -1, 100, ~~40, 0141,
        0x0 | 0x61, 44, 32, 0142, 98, 441 / 0xA, -1 - ~(0x20 ^ 0), 99, ~(0x63 ^ -1), 0x2c,
        ~(0x20 ^ -1), 100 & (-1 ^ 0x00), 0x0 | 0x64, 44, 040, 0x0 | 0x65, 101, 054, 32, 102,
        0x66, 054, ~~32, ~(0x67 ^ -1), 1037 / 0xA, 0x2c74 / 0400, 32 & (-1 ^ 0x00), 1045 / 0xA, 104, 41,
        -1 - ~(0x20 ^ 0), ~~123, 118, 97, ~(0x72 ^ -1), 0x20, 112 & (-1 ^ 0x00), 977 / 0xA, 114, 971 / 0xA,
        109, _$DCDm(1411 & (-1 ^ 0x00)), Math.abs(54) & -1, Math.abs(32) & -1, _$cYbd(Math.abs(2934) & -1), Math.abs(32) & -1, 97, 0x61, 0x3b32 / 0400, 966 / 0xA,
        32, _$DCDm(1429), 0x0 | 0x26, 0x37, _$cYbd(0xb6b97 / 0400), -1 - ~(0x3d ^ 0), 323 / 0xA, 98, 0x6226 / 0400, 96)
    + __JtOCokI16(32, 38, 0x27, 32, 075, ~~32, 99, 99, Math.abs(96) & -1, -1 - ~(0x20 ^ 0),
        _$w4I(0x0 | 0x786), 39, 0x32, 0x2076 / 0400, Math.abs(61) & -1, 32, 1001 / 0xA, Math.abs(100) & -1, 0140, 32,
        _$w4I(0x0 | 0x784), _$DCDm(1428), 502 / 0xA, 040, 61, 0x2086 / 0400, 101, ~(0x65 ^ -1), _$Bkcs(1736), 0x2032 / 0400,
        -1 - ~(0x26 ^ 0), 046, 51, Math.abs(32) & -1, Math.abs(61) & -1, ~~32, -1 - ~(0x66 ^ 0), _$nK07(~(0x1cd ^ -1)), 0140, 040,
        38, 38, 064, _$Bkcs(1672), _$cYbd(2934), 0x2076 / 0400, Math.abs(103) & -1, 103, _$Bkcs(1736 & (-1 ^ 0x00)), 040,
        864 / 0xA, 0x2728 / 0400, 0x3170 / 0400, _$Bkcs(1672), Math.abs(61) & -1, 0x0 | 0x20, 104 & (-1 ^ 0x00), 0x0 | 0x68, 964 / 0xA, 0x2067 >> 4 >> 4,
        0x2679 >> 4 >> 4, 392 / 0xA, 48, _$Bkcs(0x0 | 0x688), Math.abs(61) & -1, ~(0x20 ^ -1), 97, ~(0x28 ^ -1), ~~96, 041,
        ~~73, _$DCDm(0x591), 0x2c50 / 0400, 0x6082 / 0400, ~(0x21 ^ -1), 0x4011 / 0400, _$DCDm(0x596), 0x2925 >> 4 >> 4, 0x60, 0x2047 >> 4 >> 4,
        0x37, 39, -1 - ~(0x34 ^ 0), 326 / 0xA, _$cYbd(2934), 32, 988 / 0xA, _$nK07(-1 - ~(0x183 ^ 0)), 96, 32,
        _$w4I(~~1951), _$DCDm(~~1430), 060, 96, 32, 61, 35, ~~50, 41 & (-1 ^ 0x00), Math.abs(96) & -1)
    + __JtOCokI16(322 / 0xA, 92, Math.abs(34) & -1, 35 & (-1 ^ 0x00), 51 & (-1 ^ 0x00), ~(0x60 ^ -1), 323 / 0xA, 0x4042 >> 4 >> 4, _$Bkcs(03200), _$cYbd(0x0 | 0xb7e),
        96, 32, 99 & (-1 ^ 0x00), 0x2766 / 0400, 0x32, Math.abs(96) & -1, -1 - ~(0x20 ^ 0), 0x6628 / 0400, 0x0 | 0x24, 49,
        0x2944 / 0400, 32, 0x0 | 0x2b, 0140, Math.abs(33) & -1, 726 / 0xA, _$DCDm(1425 & (-1 ^ 0x00)), ~(0x60 ^ -1), _$Bkcs(1672), 0x41,
        0x2613 >> 4 >> 4, 53, 0x6031 >> 4 >> 4, _$Bkcs(1672), 113, 332 / 0xA, 96, 32, 59, 92,
        ~~34, 0x6007 >> 4 >> 4, _$nK07(0613), ~~87, 057, ~~54, ~~32, 61, 966 / 0xA, 32,
        64, 36, 0x0 | 0x2d, 96 & (-1 ^ 0x00), ~~32, 75 & (-1 ^ 0x00), 0x2d05 / 0400, 55, ~(0x60 ^ -1), 32 & (-1 ^ 0x00),
        124, ~~38, 96, 33 & (-1 ^ 0x00), 107, 0x2412 / 0400, 52, 0140, 321 / 0xA, 0x51,
        ~~39, 564 / 0xA, 96, 32, 53, 0x2698 / 0400, _$DCDm(1490), 33, 60, 36,
        ~(0x34 ^ -1), 41, 0x3b, -1 - ~(0x60 ^ 0), 92, Math.abs(34) & -1, 0147, 37, -1 - ~(0x60 ^ 0), _$cYbd(2920 & (-1 ^ 0x00)),
        50, 35, _$nK07(~(0x1cb ^ -1)), 0x2138 / 0400, 76, 35 & (-1 ^ 0x00), 51, 0x2051 / 0400, 0x0 | 0x2a, 0x20)
    + __JtOCokI16(52, 0x60, Math.abs(32) & -1, 55, 92, 346 / 0xA, ~(0x32 ^ -1), 96, 92, 349 / 0xA,
        Math.abs(61) & -1, 0x28, 52, 96, ~~33, -1 - ~(0x76 ^ 0), ~~36, ~(0x60 ^ -1), 925 / 0xA, ~~34,
        Math.abs(64) & -1, 36, 50, Math.abs(96) & -1, _$cYbd(29239 / 0xA), 0167, Math.abs(39) & -1, 57, 96, -1 - ~(0x20 ^ 0),
        ~~95, 36, 56, 32, 42, -1 - ~(0x20 ^ 0), ~~51, Math.abs(96) & -1, ~(0x5c ^ -1), 34,
        ~(0x55 ^ -1), ~(0x27 ^ -1), 0x3356 >> 4 >> 4, 0x6086 >> 4 >> 4, 32, _$w4I(1938), 36, ~~57, 32, 43,
        965 / 0xA, 0x5c89 / 0400, Math.abs(34) & -1, -1 - ~(0x36 ^ 0), ~~37, 0x60, 33, 61, 044, -1 - ~(0x37 ^ 0),
        ~~41, 0x3b21 >> 4 >> 4, 0x6079 >> 4 >> 4, ~~37, ~~110, _$w4I(0x780), Math.abs(99) & -1, 0x60, 32 & (-1 ^ 0x00), _$cYbd(2936),
        _$w4I(1920), 53, 96, 0x20, 0x7d, 0x2437 >> 4 >> 4, 0x3391 >> 4 >> 4, -1 - ~(0x60 ^ 0), 0x0 | 0x20, 92 & (-1 ^ 0x00),
        Math.abs(34) & -1, 35, ~~54, 415 / 0xA, 59, -1 - ~(0x60 ^ 0), 0x26, _$nK07(503), 0x2296 >> 4 >> 4, 382 / 0xA,
        97 & (-1 ^ 0x00), Math.abs(40) & -1, 97 & (-1 ^ 0x00), _$Bkcs(16680 / 0xA), 0x2018 >> 4 >> 4, 0x0 | 0x62, 051, 32, 123, 0x6977 >> 4 >> 4)
    + __JtOCokI16(102, 040, ~(0x28 ^ -1), 1163 / 0xA, 121, 0160, 0x6548 >> 4 >> 4, 111, 102 & (-1 ^ 0x00), 0x0 | 0x20,
        108, 0157, 99, ~~97, 0x0 | 0x60, 0x20, 565 / 0xA, 33, 075, _$cYbd(0xb76),
        32, 92, _$nK07(393), 117, 0x6e, 100, ~~101, 0x6662 >> 4 >> 4, 105, ~(0x6e ^ -1),
        ~(0x65 ^ -1), _$DCDm(14945 / 0xA), 92, 344 / 0xA, 32, 0x7c44 >> 4 >> 4, 0x7c87 / 0400, -1 - ~(0x20 ^ 0), 96, _$cYbd(0x0 | 0xb6b),
        51, _$DCDm(14337 / 0xA), 46, 104 & (-1 ^ 0x00), 114, Math.abs(101) & -1, 102, ~~32, 0x0 | 0x21, -1 - ~(0x3d ^ 0),
        ~~32, ~(0x5c ^ -1), 34 & (-1 ^ 0x00), 0x7359 >> 4 >> 4, ~~116, 114, Math.abs(105) & -1, 110, 103 & (-1 ^ 0x00), ~~92,
        -1 - ~(0x22 ^ 0), 96, ~(0x27 ^ -1), Math.abs(42) & -1, 38 & (-1 ^ 0x00), 0x6132 >> 4 >> 4, Math.abs(32) & -1, 43, 0x2008 >> 4 >> 4, _$cYbd(2857 & (-1 ^ 0x00)),
        Math.abs(59) & -1, 125, ~(0x20 ^ -1), 101, 108, 0163, _$DCDm(14959 / 0xA), 0x20, 0x6058 / 0400, 0x20,
        105, -1 - ~(0x30 ^ 0), 0x21, 96 & (-1 ^ 0x00), 32, 111, 42, _$nK07(397), 046, 96,
        0x2031 / 0400, 107, ~~46, 0x6f66 >> 4 >> 4, 115, 116, 0x6e, _$nK07(458), _$Bkcs(~(0x6c5 ^ -1)), Math.abs(101) & -1)
    + __JtOCokI16(Math.abs(96) & -1, 0x2137 >> 4 >> 4, 67 & (-1 ^ 0x00), 337 / 0xA, 0x0 | 0x60, 0x20, _$DCDm(0x5c865 / 0400), 35, ~(0x20 ^ -1), ~(0x26 ^ -1),
        _$w4I(1925), -1 - ~(0x60 ^ 0), _$nK07(0x0 | 0x18b), 47, 467 / 0xA, Math.abs(46) & -1, 109, ~(0x61 ^ -1), 116, 991 / 0xA,
        ~(0x68 ^ -1), 40, _$nK07(503), 0x2244 / 0400, 101 & (-1 ^ 0x00), 108, 0x6f, 0x0 | 0x6e, 103, 92,
        -1 - ~(0x22 ^ 0), 41, 964 / 0xA, _$Bkcs(1673), 57, 0x2865 >> 4 >> 4, 0x2d, 0x0 | 0x60, 33, 59,
        0x2654 >> 4 >> 4, 1236 / 0xA, -1 - ~(0x62 ^ 0), 32 & (-1 ^ 0x00), ~~61, 96 & (-1 ^ 0x00), ~(0x20 ^ -1), 0x2b03 >> 4 >> 4, 35 & (-1 ^ 0x00), Math.abs(97) & -1,
        0x0 | 0x20, 619 / 0xA, 32, _$Bkcs(0x6ca), 32, -1 - ~(0x2d ^ 0), 0x20, 976 / 0xA, 96, 0x0 | 0x23,
        37, -1 - ~(0x24 ^ 0), ~~96, 0x0 | 0x20, 0x4273 >> 4 >> 4, _$w4I(1920), 960 / 0xA, 0x5c, ~~34, 0156,
        38, ~(0x62 ^ -1), 0x6076 / 0400, 0x0 | 0x5c, 0x2252 / 0400, 0x0 | 0x5c, Math.abs(92) & -1, _$nK07(4038 / 0xA), 966 / 0xA, 33,
        88, _$DCDm(0x58d79 / 0400), 0x0 | 0x72, Math.abs(101) & -1, 1021 / 0xA, 96 & (-1 ^ 0x00), _$DCDm(1427), 93, 0x36, Math.abs(114) & -1,
        ~~101, 0x6689 / 0400, -1 - ~(0x60 ^ 0), 338 / 0xA, 0114, 67, 1186 / 0xA, 0141, 114, 32 & (-1 ^ 0x00))
    + __JtOCokI16(0143, 96, 33, Math.abs(104) & -1, 0x2118 >> 4 >> 4, 0x3e15 >> 4 >> 4, 32, 0x3078 / 0400, 040, 63,
        0x0 | 0x20, 98, _$nK07(0x18b40 >> 4 >> 4), 58, 0x2093 >> 4 >> 4, 487 / 0xA, 96, 32 & (-1 ^ 0x00), 615 / 0xA, 33 & (-1 ^ 0x00),
        102, 111 & (-1 ^ 0x00), 0x72, 32 & (-1 ^ 0x00), ~~40, 118, 97, 114, 32, 0x0 | 0x69,
        32, 61, 32, 48, 0x3b91 >> 4 >> 4, 32, 0x69, _$nK07(0x18b), ~~60, 320 / 0xA,
        Math.abs(99) & -1, 59 & (-1 ^ 0x00), 32, 105, 0x2b, 43, ~(0x29 ^ -1), 32 & (-1 ^ 0x00), 0x7b68 / 0400, 0x0 | 0x61,
        32, 053, 61, _$Bkcs(1672), 105, _$w4I(~~1944), 125, Math.abs(96) & -1, ~~32, 108,
        Math.abs(36) & -1, ~~96, 0134, 348 / 0xA, 056, _$w4I(1931 & (-1 ^ 0x00)), _$DCDm(~~1489), _$nK07(0x1cb46 >> 4 >> 4), 0x5c, 34,
        53 & (-1 ^ 0x00), 0x0 | 0x21, Math.abs(44) & -1, 0x2013 / 0400, _$cYbd(0xb2845 >> 4 >> 4), _$w4I(1987), 92, Math.abs(34) & -1, 47, 42,
        100, ~(0x6f ^ -1), 99, 117, 109, 0x0 | 0x65, 0156, 116, 0x6057 / 0400, 37,
        38, 435 / 0xA, 96, 33, _$cYbd(-1 - ~(0xb05 ^ 0)), _$w4I(0x78456 / 0400), 96, 0x2452 / 0400, ~(0x5a ^ -1), 47)
    + __JtOCokI16(104, -1 - ~(0x69 ^ 0), 115 & (-1 ^ 0x00), 1165 / 0xA, _$cYbd(-1 - ~(0xb24 ^ 0)), 114, _$DCDm(1483), 0x6009 / 0400, _$Bkcs(0x68884 / 0400), _$cYbd(0xb70),
        0x35, 96, 37, _$Bkcs(1669), 48, 0x6003 / 0400, 33, _$nK07(0x18b), ~(0x25 ^ -1), 0x60,
        _$w4I(~~1920), ~(0x34 ^ -1), -1 - ~(0x2e ^ 0), ~(0x60 ^ -1), 0x2055 >> 4 >> 4, 51, 44, 0x2e32 / 0400, 99, _$cYbd(-1 - ~(0xb39 ^ 0)),
        101, _$DCDm(1491), -1 - ~(0x74 ^ 0), ~~101, ~~69, 0x6c73 / 0400, ~~101, 0x60, 33 & (-1 ^ 0x00), _$cYbd(2842),
        0x2578 >> 4 >> 4, 96, ~~33, ~(0x7f ^ -1), 0x2485 / 0400, 96 & (-1 ^ 0x00), 38, -1 - ~(0x62 ^ 0), 0x2814 >> 4 >> 4, 0x6010 / 0400,
        -1 - ~(0x20 ^ 0), 0x0 | 0x34, 0x37, 0x6f12 >> 4 >> 4, 0x62, 0x6a64 >> 4 >> 4, 101, 99, 116 & (-1 ^ 0x00), 96,
        35, 846 / 0xA, 0x2892 / 0400, 0x6344 / 0400, ~~96, ~(0x23 ^ -1), 835 / 0xA, 379 / 0xA, -1 - ~(0x60 ^ 0), 46,
        48, ~(0x29 ^ -1), 34 & (-1 ^ 0x00), _$Bkcs(~~1665), _$cYbd(2914)));
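Each argument above is an ordinary number disguised by identities that are no-ops on small integers: ~(x ^ -1), -1 - ~(x ^ 0), x & -1 and ~~x all give back x, while x >> 4 >> 4 and x / 0400 (legacy octal for 256) drop the low byte. The _$... helpers XOR their argument with an entry of _x_QdX. String.fromCharCode then truncates each value to an integer (ToUint16), which is why 403 / 0xA = 40.3 still becomes "(". The sketch below transcribes the first argument group into Python to check this by hand; the Python names are my own, and 0400 has to be written as 0o400.

```python
KEYS = [2891, 427, 1955, 1458, 1704]  # the _x_QdX array


def xor_with(index):
    """Stand-in for helpers such as _$w4I: arguments[0] ^ _x_QdX[index]."""
    return lambda value: value ^ KEYS[index]


w4I = xor_with(2)   # _$w4I
DCDm = xor_with(3)  # _$DCDm


def from_char_code(*codes):
    """Approximate String.fromCharCode: truncate toward zero, wrap to 16 bits."""
    return "".join(chr(int(code) % 0x10000) for code in codes)


# Start of the page's eval(...) argument, transcribed to Python syntax.
head = from_char_code(32) + from_char_code(
    abs(101) & -1, 0x76, 0x61, 0o154, 40, w4I(0o3613), 0x6607 >> 4 >> 4,
    117, 110, -1 - ~(0x63 ^ 0), w4I(2007), 0x6976 >> 4 >> 4, 0x6F, 110,
    403 / 0xA, ~~120, DCDm(~~1435), 0x7B39 / 0o400,
)
print(repr(head))  # ' eval((function(x){'
```

This confirms that the outer layer simply spells out another eval(...) call; decoding the rest this way is mechanical, so it is easier to let the page's own code build the string and inspect it before evaluating anything.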

Getting the final JS function

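The first part defines a few helper functions; the rest computes a series of numbers with arithmetic and turns them into a string with the first helper (String.fromCharCode). After this conversion the JS becomes basically readable and looks roughly like this: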

execute();
function execute() {
    try {
        return d(948, 309, 9254981, 480, 651, 690, 572, 522);
    } catch (e) {
        return -99;
    }
}
function d(aa, bb, cc, dd, ee, ff, gg, hh) {
    var param16 = aa;
    var param17 = bb;
    var param1 = cc;
    var param12 = dd;
    var param2 = ee;
    var param3 = ff;
    var param4 = gg;
    var param11 = hh;
    var param10 = a(param16, param17);
    var param14 = b(a(param10, param2), param3);
    var param15 = a(param12, param11) + param4;
    var param5 = b(param1, param3);
    var param6 = param5 - param4;
    var param7 = a(param6, param4);
    var param8 = a(param2, param4);
    param4 = param2 + param3 * 4;
    param2 = b(a(param4, param1), param2);
    var param9 = param8 * 3;
    var param13 = param9 + b(param12, param7);
    return c(param15, param13, param6);
}
function a(a, b) {
    if (typeof location == "undefined" || typeof location.href != "string") {
        return a + b;
    } else if (typeof location != "undefined" && typeof location.hostname == "string" && location.hostname.match("elong")) {
        return a - b;
    } else {
        b = a - b;
        a = b - a;
        return a - b;
    }
}
function b(a, b) {
    if (typeof location != "undefined" && typeof location.href == "string" && location.href.match("elong")) {
        return a - b;
    } else {
        var c = b > 0 ? b : 0 - b;
        for (var i = 0; i < c; i++) {
            a += i;
        }
        return a;
    }
}
function c(a, b, c) {
    if (typeof document == "undefined") {
        return a;
    } else if (typeof history == "undefined") {
        return b;
    } else if (typeof document != "undefined" && (typeof document.createElement == "function" || typeof document.createElement == "object")) {
        return c;
    } else {
        return -99;
    }
}
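To see concretely why the environment matters, here is a hand port of d(), a(), b() and c() to Python (my own transcription, checked only against the arguments in execute() above). The env dict is a hypothetical stand-in for the browser globals: BROWSER_LIKE mirrors the localContext used in the final script, and NO_BROWSER stands for an engine with no location, history or document at all. Lines of d() whose results are never used (param10, param14 and the reassigned param2/param4) are omitted.

```python
def fn_a(x, y, env):
    """Port of a(): result depends on the global `location`."""
    loc = env.get("location")
    if loc is None or not isinstance(loc.get("href"), str):
        return x + y
    hostname = loc.get("hostname")
    if isinstance(hostname, str) and "elong" in hostname:
        return x - y
    y = x - y
    x = y - x
    return x - y


def fn_b(x, y, env):
    """Port of b(): subtracts on an elong URL, else adds 0 + 1 + ... + (|y| - 1)."""
    loc = env.get("location")
    if loc is not None and isinstance(loc.get("href"), str) and "elong" in loc["href"]:
        return x - y
    n = abs(y)
    return x + n * (n - 1) // 2  # closed form of the JS for-loop


def fn_c(x, y, z, env):
    """Port of c(): picks one of its arguments depending on document/history."""
    document = env.get("document")
    if document is None:
        return x
    if env.get("history") is None:
        return y
    if callable(document.get("createElement")):
        return z
    return -99


def fn_d(aa, bb, cc, dd, ee, ff, gg, hh, env):
    """Port of d(); names follow the JS paramN variables."""
    p1, p12, p2, p3, p4, p11 = cc, dd, ee, ff, gg, hh
    p15 = fn_a(p12, p11, env) + p4
    p6 = fn_b(p1, p3, env) - p4
    p7 = fn_a(p6, p4, env)
    p9 = fn_a(p2, p4, env) * 3
    p13 = p9 + fn_b(p12, p7, env)
    return fn_c(p15, p13, p6, env)


BROWSER_LIKE = {
    "location": {"href": "http://m.elong.com", "hostname": "m.elong.com"},
    "history": "history",
    "document": {"createElement": lambda: None},
}
NO_BROWSER = {"location": None, "history": None, "document": None}

ARGS = (948, 309, 9254981, 480, 651, 690, 572, 522)  # from execute() above
print(fn_d(*ARGS, env=BROWSER_LIKE))  # 9253719
print(fn_d(*ARGS, env=NO_BROWSER))    # 1574
```

With these arguments the two environments give completely different codes, which is the kind of mismatch that made the earlier check_code wrong. A port like this never runs the site's code, but it is only valid as long as the bodies of a, b, c and d stay the same.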

Faking the environment in Python

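Of these functions, execute is the entry point: it calls d, and d calls a, b and c. Looking at a, b and c explains why the check_code we got earlier was wrong: their results depend on checks against location, history and document. Since these are browser globals (running location=xxx would navigate to that page), we fake them instead. Here is the final parsing code: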

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time    : 2017/7/4 9:40 PM
# @Author  : Hou Rong
# @Site    : 
# @File    : test_web.py
# @Software: PyCharm
import requests
import execjs
import pyquery

if __name__ == '__main__':
    out_date = '2017-10-05'
    in_date = '2017-10-04'
    hotel_id = '315197'
    session = requests.Session()

    new_headers = {
        'Accept': 'application/json',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive',
        'Host': 'm.elong.com',
        'DNT': '1',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1',
    }

    session.headers.update(new_headers)

    # Step 1: load the hotel page; the obfuscated value sits in #tsdDetail
    url_A = 'http://m.elong.com/ihotel/{0}/?source_id={0}#detailTab'.format(hotel_id)
    page_A = session.get(url_A)

    # debug
    doc = pyquery.PyQuery(page_A.text)

    # lxml equivalent: //*[@id="tsdDetail"]
    value_str = doc('#tsdDetail').val()

    ph_runtime = execjs.get('PhantomJS')
    # localContext fakes the globals that a/b/c inspect; inside `with`, the
    # eval'd code resolves location/history/document to these fake objects.
    # Raw string so the regex escapes \) and \^ reach the JS unchanged.
    js_func = ph_runtime.compile(
        r'''
        var localContext = {
            "location": {
                href: "http://m.elong.com",
                hostname: "m.elong.com"
            },
            "history": "history",
            "document": {
                createElement: function () {
                }
            }
        };
        with (localContext) 
        {
            var a=%s;
            var b=a.replace(/\)\^-1/gm, ")&-1");
            var c=eval(b);
        }
        ''' % (
            repr(value_str),))
    check_code = str(js_func.eval('c'))

    print('check code', check_code)

    # Step 2: request the room list with the computed check_code
    session.headers.update({
        'Referer': url_A
    })

    page_B = session.get(
        'http://m.elong.com/ihotel/detail/DetailRoomList/?hotelId={3}&inDate={0}&outDate={1}&roomPerson=1|2&code={2}'.format(
            in_date, out_date, check_code, hotel_id)
    )
    print(page_B.text)

We build localContext so that those checks come out the way they would on the real site, then call our earlier code again. This gives us the correct check_code.

So the whole parse comes down to two steps:

  1. Make the request, then compute and extract check_code
  2. Fetch the data with the correct check_code plus the other parameters
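Step 2 can also be written without str.format. A sketch, assuming the same parameter names as the script above: urllib.parse.urlencode percent-encodes every value, so roomPerson=1|2 is sent as 1%7C2. The helper name is hypothetical, and I have not checked this against the live site, although standard query parsing decodes both forms to the same value.

```python
from urllib.parse import urlencode

ROOM_LIST_URL = "http://m.elong.com/ihotel/detail/DetailRoomList/"


def build_room_list_url(hotel_id, in_date, out_date, check_code, room_person="1|2"):
    """Assemble the step-2 URL; urlencode escapes characters such as '|'."""
    query = urlencode({
        "hotelId": hotel_id,
        "inDate": in_date,
        "outDate": out_date,
        "roomPerson": room_person,
        "code": check_code,
    })
    return ROOM_LIST_URL + "?" + query


print(build_room_list_url("315197", "2017-10-04", "2017-10-05", 9253719))
```

With requests, passing the same dict as params= to session.get achieves the same result.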