shixiaolong0

The same request headers succeed with requests but fail with Scrapy

What exactly is the cause? Let's capture the traffic and compare.

The original request headers:

headers = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Authorization": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo",
    # 'Authorization':'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJVUTB6T1Y5dWx0TklrOHZ1aiIsImV4cCI6MTY5MDM0MzQ5NSwiSnd0RXJyb3IiOm51bGx9.n0b_Sh7WB6KwvO41Q_PsP9SIMCokJyl9g2ekaTNUFxg',
    # "Content-Length": "979",
    "Content-Type": "application/json",
    "Dnt": "1",
    "Origin": "https://galxe.com",
    "Sec-Ch-Ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
    "Sec-Ch-Ua-Mobile": "?0",
    "Sec-Ch-Ua-Platform": "\"macOS\"",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

Captured request when sent with requests:

POST /query HTTP/1.1
Host: graphigo.prd.galaxy.eco
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
Accept-Language: zh-CN,zh;q=0.9
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo
Content-Length: 988
Content-Type: application/json
Dnt: 1
Origin: https://galxe.com
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site

Captured request when sent with Scrapy:

POST /query HTTP/1.1
Content-Length: 990
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo
Content-Length: 979
Content-Type: application/json
DNT: 1
Origin: https://galxe.com
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
Referer: https://galxe.com/Linea/leaderboard
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Host: graphigo.prd.galaxy.eco

The header keys appear in a different order in the two captures. Normally that should make no difference.

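What is odd is that the request captured from Scrapy carries two `Content-Length` headers, with different values (990 and 979). That is not valid: a message with conflicting `Content-Length` values is malformed.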
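Spotting a repeated header by eye in a long capture is easy to get wrong. A minimal sketch of an automated check follows. The helper name `find_duplicate_headers` is made up for illustration, and the sample text is a shortened copy of the Scrapy capture above.

```python
from collections import defaultdict


def find_duplicate_headers(raw_request):
    """Return {header name: [values]} for headers that occur more than once.

    Header names are compared case-insensitively, as HTTP requires.
    """
    seen = defaultdict(list)
    lines = raw_request.strip().splitlines()
    # lines[0] is the request line ("POST /query HTTP/1.1"), so skip it.
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        seen[name.strip().lower()].append(value.strip())
    return {name: values for name, values in seen.items() if len(values) > 1}


# Shortened copy of the Scrapy capture.
captured = """\
POST /query HTTP/1.1
Content-Length: 990
Accept: */*
Content-Length: 979
Content-Type: application/json
Host: graphigo.prd.galaxy.eco
"""

print(find_duplicate_headers(captured))  # {'content-length': ['990', '979']}
```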

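It is fairly safe to conclude that Scrapy and requests handle headers differently when sending a request. Scrapy sets or rewrites some headers on its own, and if the same header has also been defined by hand (the 979 here matches the hand-written `Content-Length` value), the two can conflict and the request fails, for example with a 400 response.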
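The captures show three different lengths: 979 written by hand, 988 sent by requests, and 990 sent by Scrapy. Why they differ is not established here. As one illustration of why a hard-coded `Content-Length` is fragile, the sketch below shows that the byte length of the same JSON payload depends on how it is serialized. The payload is a hypothetical stand-in, not the real query body.

```python
import json

# Hypothetical payload, standing in for the real GraphQL request body.
payload = {"query": "query Example { id }", "variables": {"id": 1}}

# Compact separators: no spaces after "," and ":".
compact = json.dumps(payload, separators=(",", ":")).encode("utf-8")
# Python's default separators: ", " and ": ".
default = json.dumps(payload).encode("utf-8")

# Same data, different byte counts, so one fixed Content-Length
# value cannot be right for both serializations.
print(len(compact), len(default))
```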

The fix: remove the request header keys that are not essential.
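A minimal sketch of that cleanup, assuming the headers are held in a plain dict as above. The `CLIENT_MANAGED` set and the helper name `strip_managed_headers` are assumptions made for illustration, not a list defined by Scrapy.

```python
# Headers that an HTTP client normally computes or sets itself.
# Assumption for illustration: this set is not taken from Scrapy.
CLIENT_MANAGED = {"content-length", "host", "connection", "transfer-encoding"}


def strip_managed_headers(headers, managed=CLIENT_MANAGED):
    """Return a copy of `headers` without client-managed keys (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in managed}


# Shortened version of the headers dict, with Content-Length set by hand.
headers = {
    "Accept": "*/*",
    "Content-Length": "979",
    "Content-Type": "application/json",
    "Origin": "https://galxe.com",
}

cleaned = strip_managed_headers(headers)
print(cleaned)
# {'Accept': '*/*', 'Content-Type': 'application/json', 'Origin': 'https://galxe.com'}
```

The cleaned dict can then be passed as `headers=` to `scrapy.Request` together with the body, so that only one `Content-Length` is sent.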

Licensed under CC BY-NC-ND 4.0