股票现金流量表的爬取(有局限性)
皮猪 529 2

目录

  • 前言
  • 编写代码
  • 总结

前言

还是上一篇文章的网站,但本次抓取的是股票的现金流量表,数据更多了一些,所以代码还是有点问题,抓取少量股票时存入excel里的数据格式是没问题的,但数据量一多,中间有被反爬了,那么表格后面的数据就会出现格式问题,因为我还比较菜,等以后再回过头来解决这个问题吧。 我们来看看结果。 股票现金流量表的爬取(有局限性)

编写代码

1.分析网页

股票id的获取我就不说了,可以去看上一篇文章股票数据的爬取(selenium的使用) 现金流量表是在股票页面的公司概况里,点进去后然后点击财务分析再点击现金流量表。 股票现金流量表的爬取(有局限性) —————————————————————————————————— 股票现金流量表的爬取(有局限性) 然后又是老套路,右键点击检查,进入network,然后勾选xhr,因为一般这种异步加载的数据都在xhr里,如果这里找不到我们再去勾选all一个一个看。 这个网页一刷新出现的这些包里有我们需要的数据。data里有5条数据,刚好对应上面的5个年份,但不要忘记这个现金流量表不止这几个年份,点击棕色小框的箭头。然后发现xhr里又多出来一个,看看它们的链接之间的规律。 股票现金流量表的爬取(有局限性)

http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-06-30%2C2020-03-31%2C2019-12-31&code=SZ300986
http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2019-06-30%2C2018-12-31%2C2017-12-31%2C2017-06-30%2C2016-12-31&code=SZ300986
http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-06-30%2C2015-12-31%2C2015-06-30%2C2014-12-31&code=SZ300986

然后发现就是dates这个参数的值变化了,然后code是股票的板块加id。这样的话dates我们可以不用去修改,弄一个列表来存放就好了。但要注意的是有些股票的年份很少,只到2015年,而有些股票却可以到1999年,所以这个url要找全。 接着我们来看一下这个数据,嗯,又是和上篇文章中的股票行情一样的情况,无法对应到表格里。还是老办法,ctrl+shift+F进行全局搜索,只有一个文件包含这个字段,点击进去一看,我们找到我们需要的对应关系了。其他的字段对应关系也在这附近。 股票现金流量表的爬取(有局限性) 股票现金流量表的爬取(有局限性)

2.编写代码

首先是获取到股票的id。

def get_ids_names(n):
    '''
    :n: 板块序号
    :data: 存放id和name的字典的列表,f12:id, f14:name
    :num: 股票的数量
    :return {num: data}
    :urls中pz为一次性获取的数量,超过最大值只返回最大数量
    '''
    urls = [
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=2000&po=1&np=1&fid=f3&fs=m:1+t:2,m:1+t:23&fields=f12,f14', #上证A股2000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=3000&po=1&np=1&fid=f3&fs=m:0+t:6,m:0+t:80&fields=f12,f14', #深证A股3000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=1000&po=1&np=1&fid=f3&fs=m:0+t:80&fields=f12,f14',  #创业板1000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=300&po=1&np=1&fid=f3&fs=m:1+t:23&fields=f12,f14',  #科创版300
    ]
    url = urls[n]
    d = response(url,0.5).json()['data']
    num = d['total']
    data = d['diff']
    print(num,len(data))
    return {num: data}

获取到股票id后,传入到处理股票id的函数里。

def chuli_ids_names(ids_names, n):
    id_name_queue = Queue()
    for id_name in ids_names:
        id_name_queue.put((id_name,n))
    th1 = MyThread(id_name_queue)
    th1.start()
    th1.join()
    return th1.y

这里因为股票数量比较多,我就使用的多线程。

class MyThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.y = []
    def run(self):
        while not self.queue.empty():   #当队列不为空
            data = self.queue.get()
            g_datas = get_gdata(data[0], data[1])
            new_gdatas = []
            for da in g_datas:
                d = list(da.values())[0]
                # print(d)
                new_gdatas.append({d['REPORT_DATE']: jianxi(d)})
            self.y.append({data[0]['f12']+data[0]['f14']: new_gdatas})
            # print(self.y)
            print(len(self.y))

这里面的get_gdata函数是获取一只股票的现金流量表的函数。

def get_gdata(id_name,n):
    urls = [
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-09-30%2C2020-06-30%2C2020-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2020-03-31%2C2019-12-31%2C2019-09-30%2C2018-12-31%2C2017-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2018-03-31%2C2017-12-31%2C2017-09-30%2C2017-06-30%2C2017-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-09-30%2C2016-06-30%2C2016-03-31%2C2015-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2015-09-30%2C2015-06-30%2C2015-03-31%2C2014-12-31%2C2014-09-30&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2014-06-30%2C2014-03-31%2C2013-12-31%2C2013-09-30%2C2013-06-30&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2013-03-31%2C2012-12-31%2C2012-09-30%2C2012-06-30%2C2012-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2011-12-31%2C2011-09-30%2C2011-06-30%2C2011-03-31%2C2010-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2010-12-31%2C2010-06-30%2C2009-12-31%2C2008-12-31%2C2007-12-31&code=S'

    ]
    id = id_name['f12']
    d = c_id(id, n)
    # print(d,id)
    g_datas = []  # 一只股票的数据
    datas = []
    data_time = []
    t = 2
    for u in urls:
        url = u + d
        res = response(url, 0)
        if 'data' in res.json().keys():
            for i in range(int(res.json()['count'])):
                if res.json()['data'][i]['REPORT_DATE'] not in data_time:
                    data_time.append(res.json()['data'][i]['REPORT_DATE'])
                    datas.append(res.json()['data'][i])
        else:
            if t == 0:
                break
            t -= 1
    print('共有', len(datas), '列')
    for data in datas:
        g_datas.append({data['REPORT_DATE']: data})
    return g_datas

获取到一只股票的现金流量表后将其按{股票id+股票名称:现金流量表}这样的格式添加到self.y里。一个板块所有的股票数据获取完毕后将数据写入excel表里。

def write_b_data(n,b_data):
    '''
    :b_datas:一个板块所有股票的现金流量表,存放格式为[{id+name: g_data},...,{id+name: g_data}]
    :g_data:一只股票的现金流量表,存放格式为[{time:data},...,{time:data}]
    :data:一个时间的现金流量,存放格式为{key:value,...,key:value}
    '''
    name = ['上证A股','深证A股','创业版','科创版']
    wb = load_workbook('xjls.xlsx')
    ws = wb.create_sheet(name[n], 0)
    bt = {  # 表格第一行
        'A': '现金流量表',
        'B': '2021-03-31 00:00:00',
        'C': '2020-12-31 00:00:00',
        'D': '2020-09-30 00:00:00',
        'E': '2020-06-30 00:00:00',
        'F': '2020-03-31 00:00:00',
        'G': '2019-12-31 00:00:00',
        'H': '2019-09-30 00:00:00',
        'I': '2019-06-30 00:00:00',
        'J': '2019-03-31 00:00:00',
        'K': '2018-12-31 00:00:00',
        'L': '2018-09-30 00:00:00',
        'M': '2018-06-30 00:00:00',
        'N': '2018-03-31 00:00:00',
        'O': '2017-12-31 00:00:00',
        'P': '2017-09-30 00:00:00',
        'Q': '2017-06-30 00:00:00',
        'R': '2017-03-31 00:00:00',
        'S': '2016-12-31 00:00:00',
        'T': '2016-09-30 00:00:00',
        'U': '2016-06-30 00:00:00',
        'V': '2016-03-31 00:00:00',
        'W': '2015-12-31 00:00:00',
        'X': '2015-09-30 00:00:00',
        'Y': '2015-06-30 00:00:00',
        'Z': '2015-03-31 00:00:00',
        'AA': '2014-12-31 00:00:00',
        'AB': '2014-09-30 00:00:00',
        'AC': '2014-06-30 00:00:00',
        'AD': '2014-03-31 00:00:00',
        'AE': '2013-12-31 00:00:00',
        'AF': '2013-09-30 00:00:00',
        'AG': '2013-06-30 00:00:00',
        'AH': '2013-03-31 00:00:00',
        'AI': '2012-12-31 00:00:00',
        'AJ': '2012-09-30 00:00:00',
        'AK': '2012-06-30 00:00:00',
        'AL': '2012-03-31 00:00:00',
        'AM': '2011-12-31 00:00:00',
        'AN': '2011-09-30 00:00:00',
        'AO': '2011-06-30 00:00:00',
        'AP': '2011-03-31 00:00:00',
        'AQ': '2010-12-31 00:00:00',
        'AR': '2010-09-30 00:00:00',
        'AS': '2010-06-30 00:00:00',
        'AT': '2010-03-31 00:00:00',
        'AU': '2009-12-31 00:00:00',
        'AV': '2009-09-30 00:00:00',
        'AW': '2009-06-30 00:00:00',
        'AX': '2009-03-31 00:00:00',
        'AY': '2008-12-31 00:00:00',
        'AZ': '2008-09-30 00:00:00',
        'BA': '2008-06-30 00:00:00',
        'BB': '2008-03-31 00:00:00',
        'BC': '2007-12-31 00:00:00',
        'BD': '2007-09-30 00:00:00',
        'BE': '2007-06-30 00:00:00',
        'BF': '2007-03-31 00:00:00',
        'BG': '2006-12-31 00:00:00',
        'BH': '2006-09-30 00:00:00',
        'BI': '2006-06-30 00:00:00',
        'BJ': '2006-03-31 00:00:00',
        'BK': '2005-12-31 00:00:00',
        'BL': '2005-09-30 00:00:00',
        'BM': '2005-06-30 00:00:00',
        'BN': '2005-03-31 00:00:00',
        'BO': '2004-12-31 00:00:00',
        'BP': '2004-09-30 00:00:00',
        'BQ': '2004-06-30 00:00:00',
        'BR': '2004-03-31 00:00:00',
        'BS': '2003-12-31 00:00:00',
        'BT': '2003-09-30 00:00:00',
        'BU': '2003-06-30 00:00:00',
        'BV': '2003-03-31 00:00:00',
        'BW': '2002-12-31 00:00:00',
        'BX': '2002-09-30 00:00:00',
        'BY': '2002-06-30 00:00:00',
        'BZ': '2002-03-31 00:00:00',
        'CA': '2001-12-31 00:00:00',
        'CB': '2001-09-30 00:00:00',
        'CC': '2001-06-30 00:00:00',
        'CD': '2001-03-31 00:00:00',
        'CE': '2000-12-31 00:00:00',
        'CF': '2000-09-30 00:00:00',
        'CG': '2000-06-30 00:00:00',
        'CH': '2000-03-31 00:00:00',
        'CI': '1999-12-31 00:00:00',
        'CJ': '1999-09-30 00:00:00',
        'CK': '1999-06-30 00:00:00',
        'CL': '1999-03-31 00:00:00',
        'CM': '1998-12-31 00:00:00',
        'CN': '1998-09-30 00:00:00',
        'CO': '1998-06-30 00:00:00',
        'CP': '1998-03-31 00:00:00',
        'CQ': '1997-12-31 00:00:00',
        'CR': '1997-09-30 00:00:00',
        'CS': '1997-06-30 00:00:00',
        'CT': '1997-03-31 00:00:00',
        'CU': '1996-12-31 00:00:00',
        'CV': '1996-09-30 00:00:00',
        'CW': '1996-06-30 00:00:00',
        'CX': '1996-03-31 00:00:00',
    }
    for key in list(bt.keys()):     #存入表头
        ws[key + '1'] = bt[key]
    i = 2
    g = 0
    for g_data in b_data:
        ws['A' + str(i)] = list(g_data.keys())[0]      #存入股票id名称
        i += 1
        t = 0
        for data in list(g_data.values())[0]:
            for key in list(data.keys()):
                b = 3 + g * 116
                value1 = data[key]
                if t == 0:
                    for k_values in list(value1.keys()):  # 存入第一列名称
                        ws['A' + str(i)] = k_values
                        i += 1
                    t = 1
                xh = get_v_key(bt, key)
                for v in value1.values():   #存入股票对应时间的现金流量表
                    if v == None:
                        v = '----'
                    ws[xh + str(b)] = v
                    # print(xh + str(b), v)
                    b += 1
        g += 1
        print(g)
    wb.save('xjls.xlsx')
    wb.close()

3.总的代码

import requests
import time
from openpyxl import Workbook
from openpyxl import load_workbook
import threading
from queue import Queue

# 数据量少可以使用,数据量一多就出问题

class MyThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.y = []
    def run(self):
        while not self.queue.empty():   #当队列不为空
            data = self.queue.get()
            g_datas = get_gdata(data[0], data[1])
            new_gdatas = []
            for da in g_datas:
                d = list(da.values())[0]
                # print(d)
                new_gdatas.append({d['REPORT_DATE']: jianxi(d)})
            self.y.append({data[0]['f12']+data[0]['f14']: new_gdatas})
            # print(self.y)
            print(len(self.y))

def get_gdata(id_name,n):
    urls = [
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2021-03-31%2C2020-12-31%2C2020-09-30%2C2020-06-30%2C2020-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2020-03-31%2C2019-12-31%2C2019-09-30%2C2018-12-31%2C2017-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2018-03-31%2C2017-12-31%2C2017-09-30%2C2017-06-30%2C2017-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2016-12-31%2C2016-09-30%2C2016-06-30%2C2016-03-31%2C2015-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2015-09-30%2C2015-06-30%2C2015-03-31%2C2014-12-31%2C2014-09-30&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2014-06-30%2C2014-03-31%2C2013-12-31%2C2013-09-30%2C2013-06-30&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2013-03-31%2C2012-12-31%2C2012-09-30%2C2012-06-30%2C2012-03-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2011-12-31%2C2011-09-30%2C2011-06-30%2C2011-03-31%2C2010-12-31&code=S',
        'http://f10.eastmoney.com/NewFinanceAnalysis/xjllbAjaxNew?companyType=4&reportDateType=0&reportType=1&dates=2010-12-31%2C2010-06-30%2C2009-12-31%2C2008-12-31%2C2007-12-31&code=S'

    ]
    id = id_name['f12']
    d = c_id(id, n)
    # print(d,id)
    g_datas = []  # 一只股票的数据
    datas = []
    data_time = []
    t = 2
    for u in urls:
        url = u + d
        res = response(url, 0)
        if 'data' in res.json().keys():
            for i in range(int(res.json()['count'])):
                if res.json()['data'][i]['REPORT_DATE'] not in data_time:
                    data_time.append(res.json()['data'][i]['REPORT_DATE'])
                    datas.append(res.json()['data'][i])
        else:
            if t == 0:
                break
            t -= 1
    print('共有', len(datas), '列')
    for data in datas:
        g_datas.append({data['REPORT_DATE']: data})
    return g_datas

def response(url,sleep_time):
    times = 3
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36 Edg/89.0.774.76',
    }
    try:
        res = requests.get(url,headers=headers)
        if res.status_code >=200 and res.status_code<300:
            time.sleep(sleep_time)
            return res
        else:
            return None
    except  Exception as e:
        print(e)
        if times>0:
            print('机会次数:'+str(times))
            response(url,sleep_time)
        else:
            print('无法爬取')

def c_id(id, n):
    if n == 0 or n == 3:
        return ''.join(['H',id])
    elif n == 1 or n == 2:
        return ''.join(['Z',id])
    else:
        print('板块错误')

def get_ids_names(n):
    '''
    n: 板块序号
    data: 存放id和name的字典的列表,f12:id, f14:name
    num: 股票的数量
    return {num: data}
    urls中pz为一次性获取的数量,超过最大值只返回最大数量
    '''
    urls = [
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=2000&po=1&np=1&fid=f3&fs=m:1+t:2,m:1+t:23&fields=f12,f14', #上证A股2000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=3000&po=1&np=1&fid=f3&fs=m:0+t:6,m:0+t:80&fields=f12,f14', #深证A股3000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=1000&po=1&np=1&fid=f3&fs=m:0+t:80&fields=f12,f14',  #创业板1000
        'http://81.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=300&po=1&np=1&fid=f3&fs=m:1+t:23&fields=f12,f14',  #科创版300
    ]
    url = urls[n]
    d = response(url,0.5).json()['data']
    num = d['total']
    data = d['diff']
    print(num,len(data))
    return {num: data}

def jianxi(d):
    ys = {
        '经营活动产生的现金流量': '',
        '\t销售商品、提供劳务收到的现金': 'SALES_SERVICES',
        '\t客户存款和同业存放款项净增加额': 'DEPOSIT_INTERBANK_ADD',
        '\t向中央银行借款净增加额': 'LOAN_PBC_ADD',
        '\t向其他金融机构拆入资金净增加额': 'OFI_BF_ADD',
        '\t收到原保险合同保费取得的现金': 'RECEIVE_ORIGIC_PREMIUM',
        '\t收到再保险业务现金净额': 'RECEIVE_REINSURE_NET',
        '\t保户储金及投资款净增加额': 'INSURED_INVEST_ADD',
        '\t处置交易性金融资产净增加额': 'DISPOSAL_TFA_ADD',
        '\t收取利息、手续费及佣金的现金': 'RECEIVE_INTEREST_COMMISSION',
        '\t拆入资金净增加额': 'BORROW_FUND_ADD',
        '\t发放贷款及垫款的净减少额': 'LOAN_ADVANCE_REDUCE',
        '\t回购业务资金净增加额': 'REPO_BUSINESS_ADD',
        '\t收到的税收返还': 'RECEIVE_TAX_REFUND',
        '\t收到其他与经营活动有关的现金': 'RECEIVE_OTHER_OPERATE',
        '\t经营活动现金流入的其他项目': 'OPERATE_INFLOW_OTHER',
        '\t经营活动现金流入的平衡项目': 'OPERATE_INFLOW_BALANCE',
        '经营活动现金流入小计': 'TOTAL_OPERATE_INFLOW',
        '\t购买商品、接受劳务支付的现金': 'BUY_SERVICES',
        '\t客户贷款及垫款净增加额': 'LOAN_ADVANCE_ADD',
        '\t存放中央银行和同业款项净增加额': 'PBC_INTERBANK_ADD',
        '\t支付原保险合同赔付等款项的现金': 'PAY_ORIGIC_COMPENSATE',
        '\t支付利息、手续费及佣金的现金': 'PAY_INTEREST_COMMISSION',
        '\t支付保单红利的现金': 'PAY_POLICY_BONUS',
        '\t支付给职工以及为职工支付的现金': 'PAY_STAFF_CASH',
        '\t支付的各项税费': 'PAY_ALL_TAX',
        '\t支付其他与经营活动有关的现金': 'PAY_OTHER_OPERATE',
        '\t经营活动现金流出的其他项目': 'OPERATE_OUTFLOW_OTHER',
        '\t经营活动现金流出的平衡项目': 'OPERATE_OUTFLOW_BALANCE',
        '经营活动现金流出小计': 'TOTAL_OPERATE_OUTFLOW',
        '\t经营活动产生的现金流量净额其他项目': 'OPERATE_NETCASH_OTHER',
        '\t经营活动产生的现金流量净额平衡项目': 'OPERATE_NETCASH_BALANCE',
        '经营活动产生的现金流量净额': 'OPERATE_NETCASH_OTHER',
        '投资活动产生的现金流量': '',
        '\t收回投资收到的现金': 'WITHDRAW_INVEST',
        '\t取得投资收益收到的现金': 'RECEIVE_INVEST_INCOME_QOQ',
        '\t处置固定资产、无形资产和其他长期资产收回的现金净额': 'DISPOSAL_LONG_ASSET',
        '\t处置子公司及其他营业单位收到的现金': 'DISPOSAL_SUBSIDIARY_OTHER',
        '\t减少质押和定期存款所收到的现金': 'REDUCE_PLEDGE_TIMEDEPOSITS',
        '\t收到的其他与投资活动有关的现金': 'RECEIVE_OTHER_INVEST',
        '\t投资活动现金流入的其他项目': 'INVEST_INFLOW_OTHER',
        '\t投资活动现金流入的平衡项目': 'INVEST_INFLOW_BALANCE',
        '投资活动现金流入小计': 'TOTAL_INVEST_INFLOW',
        '\t购建固定资产、无形资产和其他长期资产支付的现金': 'CONSTRUCT_LONG_ASSET',
        '\t投资支付的现金': 'INVEST_PAY_CASH',
        '\t质押贷款净增加额': 'PLEDGE_LOAN_ADD',
        '\t取得子公司及其他营业单位支付的现金净额': 'OBTAIN_SUBSIDIARY_OTHER',
        '\t增加质押和定期存款所支付的现金': 'ADD_PLEDGE_TIMEDEPOSITS',
        '\t支付其他与投资活动有关的现金': 'PAY_OTHER_INVEST',
        '\t投资活动现金流出的其他项目': 'INVEST_OUTFLOW_OTHER',
        '\t投资活动现金流出的平衡项目': 'INVEST_OUTFLOW_BALANCE',
        '投资活动现金流出小计': 'TOTAL_INVEST_OUTFLOW',
        '\t投资活动产生的现金流量净额其他项目': 'INVEST_NETCASH_OTHER',
        '\t投资活动产生的现金流量净额平衡项目': 'INVEST_NETCASH_BALANCE',
        '投资活动产生的现金流量净额': 'INVEST_NETCASH_OTHER',
        '筹资活动产生的现金流量': '',
        '\t吸收投资收到的现金': 'ACCEPT_INVEST_CASH',
        '\t其中:子公司吸收少数股东投资收到的现金': 'SUBSIDIARY_ACCEPT_INVEST',
        '\t取得借款收到的现金': 'RECEIVE_LOAN_CASH',
        '\t发行债券收到的现金': 'ISSUE_BOND',
        '\t收到的其他与筹资活动有关的现金': 'RECEIVE_OTHER_FINANCE',
        '\t筹资活动现金流入的其他项目': 'FINANCE_INFLOW_OTHER',
        '\t筹资活动现金流入的平衡项目': 'FINANCE_INFLOW_BALANCE',
        '筹资活动现金流入小计': 'TOTAL_FINANCE_INFLOW_QOQ',
        '\t偿还债务所支付的现金': 'PAY_DEBT_CASH_QOQ',
        '\t分配股利、利润或偿付利息支付的现金': 'ASSIGN_DIVIDEND_PORFIT_QOQ',
        '\t其中:子公司支付给少数股东的股利、利润': 'SUBSIDIARY_PAY_DIVIDEND',
        '\t购买子公司少数股权而支付的现金': 'BUY_SUBSIDIARY_EQUITY',
        '\t支付的其他与筹资活动有关的现金': 'PAY_OTHER_FINANCE',
        '\t其中:子公司减资支付给少数股东的现金': 'SUBSIDIARY_REDUCE_CASH',
        '\t筹资活动现金流出的其他项目': 'FINANCE_OUTFLOW_OTHER',
        '\t筹资活动现金流出的平衡项目': 'FINANCE_OUTFLOW_BALANCE',
        '筹资活动现金流出小计': 'TOTAL_FINANCE_OUTFLOW',
        '\t筹资活动产生的现金流量净额其他项目': 'FINANCE_NETCASH_OTHER',
        '\t筹资活动产生的现金流量净额平衡项目': 'FINANCE_NETCASH_BALANCE',
        '筹资活动产生的现金流量净额': 'FINANCE_NETCASH_OTHER',
        '汇率变动对现金及现金等价物的影响': 'RATE_CHANGE_EFFECT',
        '\t现金及现金等价物净增加额其他项目': 'CCE_ADD_OTHER',
        '\t现金及现金等价物净增加额平衡项目': 'CCE_ADD_BALANCE',
        '现金及现金等价物净增加额': 'CCE_ADD',
        '\t加:期初现金及现金等价物余额': 'BEGIN_CCE',
        '\t期末现金及现金等价物余额其他项目': 'END_CCE_OTHER',
        '\t期末现金及现金等价物余额平衡项目': 'END_CCE_BALANCE',
        '期末现金及现金等价物余额': 'END_CCE',
        '补充资料': '',
        '\t净利润': 'NETPROFIT',
        '\t少数股东损益': 'MINORITY_INTEREST',
        '\t资产减值准备': 'ASSET_IMPAIRMENT',
        '\t固定资产和投资性房地产折旧': 'FA_IR_DEPR',
        '\t其中:固定资产折旧、油气资产折耗、生产性生物资产折旧': 'OILGAS_BIOLOGY_DEPR',
        '\t投资性房地产折旧': 'IR_DEPR',
        '\t无形资产摊销': 'IA_AMORTIZE',
        '\t长期待摊费用摊销': 'LPE_AMORTIZE',
        '\t递延收益摊销': 'DEFER_INCOME_AMORTIZE',
        '\t待摊费用的减少': 'PREPAID_EXPENSE_REDUCE',
        '\t预提费用的增加': 'ACCRUED_EXPENSE_ADD',
        '\t处置固定资产、无形资产和其他长期资产的损失': 'DISPOSAL_LONGASSET_LOSS',
        '\t固定资产报废损失': 'FA_SCRAP_LOSS_QOQ',
        '\t公允价值变动损失': 'FAIRVALUE_CHANGE_LOSS',
        '\t财务费用': 'FINANCE_EXPENSE',
        '\t投资损失': 'INVEST_LOSS',
        '\t递延所得税': 'DEFER_TAX',
        '\t其中:递延所得税资产减少': 'DT_ASSET_REDUCE',
        '\t递延所得税负债增加': 'DT_LIAB_ADD',
        '\t预计负债的增加': 'PREDICT_LIAB_ADD',
        '\t存货的减少': 'INVENTORY_REDUCE',
        '\t经营性应收项目的减少': 'OPERATE_RECE_REDUCE',
        '\t经营性应付项目的增加': 'OPERATE_PAYABLE_ADD',
        '\t其他': 'OTHER',
        '\t经营活动产生的现金流量净额其他项目_': 'OPERATE_NETCASH_OTHERNOTE',
        '\t经营活动产生的现金流量净额平衡项目_': 'OPERATE_NETCASH_BALANCENOTE',
        '经营活动产生的现金流量净额_': 'NETCASH_OPERATENOTE',
        '\t债务转为资本': 'DEBT_TRANSFER_CAPITAL',
        '\t一年内到期的可转换公司债券': 'CONVERT_BOND_1YEAR',
        '\t融资租入固定资产': 'FINLEASE_OBTAIN_FA',
        '\t不涉及现金收支的投资和筹资活动金额其他项目': 'UNINVOLVE_INVESTFIN_OTHER',
        '\t现金的期末余额': 'END_CASH',
        '\t减:现金的期初余额': 'BEGIN_CASH',
        '\t加:现金等价物的期末余额': 'END_CASH_EQUIVALENTS_QOQ',
        '\t减:现金等价物的期初余额': 'BEGIN_CASH_EQUIVALENTS',
        '\t现金及现金等价物净增加额其他项目_': 'CCE_ADD_OTHERNOTE',
        '\t现金及现金等价物净增加额平衡项目_': 'CCE_ADD_BALANCENOTE',
        '现金及现金等价物的净增加额': 'CCE_ADDNOTE',
        '\t审计意见(境内)': 'OPINION_TYPE',
        '\t审计意见(境外)': 'OSOPINION_TYPE'
    }
    new_data = {}
    # print(d)
    try:
        for key, value in ys.items():
            # print(key, value)
            if value != '' and value in d.keys():
                new_data[key] = d[value]
        return new_data
    except BaseException as b:
        print(b)

def chuli_ids_names(ids_names, n, num):
    id_name_queue = Queue()
    for id_name in ids_names:
        id_name_queue.put((id_name,n))
    th1 = MyThread(id_name_queue)
    th1.start()
    th1.join()
    return th1.y
    # b_data = get_bdata(n, num, ids_names)
    # write_b_data(i, b_data)

def get_v_key(dic,v):
    '''
    根据值取字典的键
    '''
    if v in dic.values():
        return list(dic.keys())[list(dic.values()).index(v)]

def write_b_data(n,b_data):
    '''
    b_datas:一个板块所有股票的现金流量表,存放格式为[{id+name: g_data},...,{id+name: g_data}]
    g_data:一只股票的现金流量表,存放格式为[{time:data},...,{time:data}]
    data:一个时间的现金流量,存放格式为{key:value,...,key:value}
    '''
    name = ['上证A股','深证A股','创业版','科创版']
    wb = load_workbook('xjls.xlsx')
    ws = wb.create_sheet(name[n], 0)
    bt = {  # 表格第一行
        'A': '现金流量表',
        'B': '2021-03-31 00:00:00',
        'C': '2020-12-31 00:00:00',
        'D': '2020-09-30 00:00:00',
        'E': '2020-06-30 00:00:00',
        'F': '2020-03-31 00:00:00',
        'G': '2019-12-31 00:00:00',
        'H': '2019-09-30 00:00:00',
        'I': '2019-06-30 00:00:00',
        'J': '2019-03-31 00:00:00',
        'K': '2018-12-31 00:00:00',
        'L': '2018-09-30 00:00:00',
        'M': '2018-06-30 00:00:00',
        'N': '2018-03-31 00:00:00',
        'O': '2017-12-31 00:00:00',
        'P': '2017-09-30 00:00:00',
        'Q': '2017-06-30 00:00:00',
        'R': '2017-03-31 00:00:00',
        'S': '2016-12-31 00:00:00',
        'T': '2016-09-30 00:00:00',
        'U': '2016-06-30 00:00:00',
        'V': '2016-03-31 00:00:00',
        'W': '2015-12-31 00:00:00',
        'X': '2015-09-30 00:00:00',
        'Y': '2015-06-30 00:00:00',
        'Z': '2015-03-31 00:00:00',
        'AA': '2014-12-31 00:00:00',
        'AB': '2014-09-30 00:00:00',
        'AC': '2014-06-30 00:00:00',
        'AD': '2014-03-31 00:00:00',
        'AE': '2013-12-31 00:00:00',
        'AF': '2013-09-30 00:00:00',
        'AG': '2013-06-30 00:00:00',
        'AH': '2013-03-31 00:00:00',
        'AI': '2012-12-31 00:00:00',
        'AJ': '2012-09-30 00:00:00',
        'AK': '2012-06-30 00:00:00',
        'AL': '2012-03-31 00:00:00',
        'AM': '2011-12-31 00:00:00',
        'AN': '2011-09-30 00:00:00',
        'AO': '2011-06-30 00:00:00',
        'AP': '2011-03-31 00:00:00',
        'AQ': '2010-12-31 00:00:00',
        'AR': '2010-09-30 00:00:00',
        'AS': '2010-06-30 00:00:00',
        'AT': '2010-03-31 00:00:00',
        'AU': '2009-12-31 00:00:00',
        'AV': '2009-09-30 00:00:00',
        'AW': '2009-06-30 00:00:00',
        'AX': '2009-03-31 00:00:00',
        'AY': '2008-12-31 00:00:00',
        'AZ': '2008-09-30 00:00:00',
        'BA': '2008-06-30 00:00:00',
        'BB': '2008-03-31 00:00:00',
        'BC': '2007-12-31 00:00:00',
        'BD': '2007-09-30 00:00:00',
        'BE': '2007-06-30 00:00:00',
        'BF': '2007-03-31 00:00:00',
        'BG': '2006-12-31 00:00:00',
        'BH': '2006-09-30 00:00:00',
        'BI': '2006-06-30 00:00:00',
        'BJ': '2006-03-31 00:00:00',
        'BK': '2005-12-31 00:00:00',
        'BL': '2005-09-30 00:00:00',
        'BM': '2005-06-30 00:00:00',
        'BN': '2005-03-31 00:00:00',
        'BO': '2004-12-31 00:00:00',
        'BP': '2004-09-30 00:00:00',
        'BQ': '2004-06-30 00:00:00',
        'BR': '2004-03-31 00:00:00',
        'BS': '2003-12-31 00:00:00',
        'BT': '2003-09-30 00:00:00',
        'BU': '2003-06-30 00:00:00',
        'BV': '2003-03-31 00:00:00',
        'BW': '2002-12-31 00:00:00',
        'BX': '2002-09-30 00:00:00',
        'BY': '2002-06-30 00:00:00',
        'BZ': '2002-03-31 00:00:00',
        'CA': '2001-12-31 00:00:00',
        'CB': '2001-09-30 00:00:00',
        'CC': '2001-06-30 00:00:00',
        'CD': '2001-03-31 00:00:00',
        'CE': '2000-12-31 00:00:00',
        'CF': '2000-09-30 00:00:00',
        'CG': '2000-06-30 00:00:00',
        'CH': '2000-03-31 00:00:00',
        'CI': '1999-12-31 00:00:00',
        'CJ': '1999-09-30 00:00:00',
        'CK': '1999-06-30 00:00:00',
        'CL': '1999-03-31 00:00:00',
        'CM': '1998-12-31 00:00:00',
        'CN': '1998-09-30 00:00:00',
        'CO': '1998-06-30 00:00:00',
        'CP': '1998-03-31 00:00:00',
        'CQ': '1997-12-31 00:00:00',
        'CR': '1997-09-30 00:00:00',
        'CS': '1997-06-30 00:00:00',
        'CT': '1997-03-31 00:00:00',
        'CU': '1996-12-31 00:00:00',
        'CV': '1996-09-30 00:00:00',
        'CW': '1996-06-30 00:00:00',
        'CX': '1996-03-31 00:00:00',
    }
    for key in list(bt.keys()):     #存入表头
        ws[key + '1'] = bt[key]
    i = 2
    g = 0
    for g_data in b_data:
        ws['A' + str(i)] = list(g_data.keys())[0]      #存入股票id名称
        i += 1
        t = 0
        for data in list(g_data.values())[0]:
            for key in list(data.keys()):
                b = 3 + g * 116
                value1 = data[key]
                if t == 0:
                    for k_values in list(value1.keys()):  # 存入第一列名称
                        ws['A' + str(i)] = k_values
                        i += 1
                    t = 1
                xh = get_v_key(bt, key)
                for v in value1.values():   #存入股票对应时间的现金流量表
                    if v == None:
                        v = '----'
                    ws[xh + str(b)] = v
                    # print(xh + str(b), v)
                    b += 1
        g += 1
        print(g)
    wb.save('xjls.xlsx')
    wb.close()

def main():
    '''
            上证A股 H+id
            深证A股 Z+id
            创业板  Z+id
            科创版  H+id
        '''
    # wb = Workbook()
    # wb.save('xjls.xlsx')
    a = 0
    num_ids_names = get_ids_names(a)
    ids_names = list(num_ids_names.values())[0]
    num = list(num_ids_names.keys())[0]
    print(a, num, ids_names)
    b_data = chuli_ids_names(ids_names, a, num)
    write_b_data(a, b_data)
    # for i in range(3):
    #     num_ids_names = get_ids_names(i)
    #     ids_names = list(num_ids_names.values())[0]
    #     num = list(num_ids_names.keys())[0]
    #     print(i, num, ids_names)
    #     b_data = chuli_ids_names(ids_names, i, num)
    #     write_b_data(i,b_data)

if __name__ == '__main__':
    main()

总结

因为数据量太大的缘故,中途容易数据获取失败,然后导致数据写入格式错误。经测验,爬取科创版这个股票数量较少的板块数据是无误的。但像上证A股这种一个板块上千只股票就无法正确爬取,代码仍需要进一步优化。

评论区

索引目录