The same request headers succeed with requests but fail with Scrapy
Original request headers:
headers = {
"Accept": "*/*",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "zh-CN,zh;q=0.9",
"Authorization": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo",
# 'Authorization':'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJVUTB6T1Y5dWx0TklrOHZ1aiIsImV4cCI6MTY5MDM0MzQ5NSwiSnd0RXJyb3IiOm51bGx9.n0b_Sh7WB6KwvO41Q_PsP9SIMCokJyl9g2ekaTNUFxg',
# "Content-Length": "979",
"Content-Type": "application/json",
"Dnt": "1",
"Origin": "https://galxe.com",
"Sec-Ch-Ua": "\"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"114\", \"Google Chrome\";v=\"114\"",
"Sec-Ch-Ua-Mobile": "?0",
"Sec-Ch-Ua-Platform": "\"macOS\"",
"Sec-Fetch-Dest": "empty",
"Sec-Fetch-Mode": "cors",
"Sec-Fetch-Site": "cross-site",
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}
Packet capture of the request sent with requests:
POST /query HTTP/1.1
Host: graphigo.prd.galaxy.eco
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Accept-Encoding: gzip, deflate, br
Accept: */*
Connection: keep-alive
Accept-Language: zh-CN,zh;q=0.9
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo
Content-Length: 988
Content-Type: application/json
Dnt: 1
Origin: https://galxe.com
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
Packet capture of the request sent with Scrapy:
POST /query HTTP/1.1
Content-Length: 990
Accept: */*
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Authorization: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJBZGRyZXNzIjoiMHgwZUMxMTUwMzJBNzRkZTRGNTE5RDRiNENmZDJGQjhiNDQyNzU5NDllIiwiTm9uY2UiOiJZMThnczZTZ1F3WlQ1bWd2MiIsImV4cCI6MTY5MDM0MzAxNCwiSnd0RXJyb3IiOm51bGx9.yT3JkK9buqzTqg1VaWXQN2qcs2pvUOSlL0BknNAgiBo
Content-Length: 979
Content-Type: application/json
DNT: 1
Origin: https://galxe.com
Sec-Ch-Ua: "Not.A/Brand";v="8", "Chromium";v="114", "Google Chrome";v="114"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: cross-site
Referer: https://galxe.com/Linea/leaderboard
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36
Host: graphigo.prd.galaxy.eco
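Comparing the two captures, the header keys are sent in a different order. Normally that should make no difference.
What is odd is that the headers captured from Scrapy contain two `Content-Length` entries with different values (990 and 979), which makes no sense for a single request.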
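Spotting a repeated header by eye is easy to miss in a long capture. The following is a minimal sketch of how the check could be automated. The helper name `duplicate_header_names` and the shortened `capture` string are illustrative assumptions, not part of the original setup.

```python
from typing import Dict, List


def duplicate_header_names(raw_request: str) -> Dict[str, List[str]]:
    """Return {lowercased header name: [values]} for names that occur more than once.

    Hypothetical helper: expects the raw text of a captured HTTP/1.1 request.
    """
    lines = raw_request.strip().splitlines()
    seen: Dict[str, List[str]] = {}
    for line in lines[1:]:  # lines[0] is the request line, e.g. "POST /query HTTP/1.1"
        if not line.strip():
            break  # a blank line marks the end of the header section
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        seen.setdefault(name.strip().lower(), []).append(value.strip())
    return {name: values for name, values in seen.items() if len(values) > 1}


# Shortened version of the Scrapy capture above.
capture = """POST /query HTTP/1.1
Content-Length: 990
Accept: */*
Content-Length: 979
Content-Type: application/json
Host: graphigo.prd.galaxy.eco"""

print(duplicate_header_names(capture))
# {'content-length': ['990', '979']}
```

Header names are compared in lowercase because HTTP header names are case-insensitive, so `DNT` and `Dnt` would count as the same field.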
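It is fairly safe to conclude that Scrapy and requests handle headers differently when sending a request. Scrapy sets or modifies some header fields automatically, and if those fields have also been defined by hand, the two can conflict and the request fails, for example with a 400 response.
The fix is to remove the non-essential keys from the request headers.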
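One way to apply this is to filter the copied browser headers before handing them to Scrapy. The sketch below is only an illustration: the `MANAGED_HEADERS` set and the function name `strip_managed_headers` are assumptions, and which keys are actually non-essential depends on the target site.

```python
# Header names that the HTTP client normally derives on its own.
# This set is an assumption for illustration; adjust it for your case.
MANAGED_HEADERS = {"content-length", "host", "connection", "transfer-encoding"}


def strip_managed_headers(headers, managed=MANAGED_HEADERS):
    """Return a copy of `headers` without the keys listed in `managed`.

    The comparison is case-insensitive, so "Content-Length" and
    "content-length" are treated as the same key.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in managed
    }


# Shortened example of headers copied from the browser.
browser_headers = {
    "Accept": "*/*",
    "Content-Length": "979",
    "Content-Type": "application/json",
    "Origin": "https://galxe.com",
    "Host": "graphigo.prd.galaxy.eco",
}

clean_headers = strip_managed_headers(browser_headers)
print(clean_headers)

# The cleaned dict can then be passed to Scrapy, for example:
#   scrapy.Request(url, method="POST", headers=clean_headers, body=payload)
# where `url` and `payload` are whatever the spider already uses.
```

Dropping `Content-Length` in particular lets the client compute the value from the actual body, which avoids the mismatch seen in the capture (979 defined by hand versus 990 computed).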