Question

I am scraping table data from Google Finance with pd.read_html and then saving it to Excel with df.to_excel(), as shown below:

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')

    for i, df in enumerate(dfs):
        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))
    xlWriter.close()  # writes the file; ExcelWriter.save() was removed in pandas 2.0
 

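However, the numbers saved to Excel are stored as text, with a small green triangle in the corner of each cell. How can I store them as actual numeric values rather than text when writing this data to Excel?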

Best answer

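Consider converting the numeric columns to float, since pd.read_html has read these columns as strings (object dtype). Before converting to float, though, you need to replace the hyphens with NaN, because a lone '-' cannot be parsed as a number: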

    import pandas as pd
    import numpy as np

    dfs = pd.read_html('https://www.google.com/finance?q=NASDAQ%3AGOOGL' +
                       '&fstype=ii&ei=9YBMWIiaLo29e83Rr9AM', flavor='html5lib')
    xlWriter = pd.ExcelWriter('Output.xlsx', engine='xlsxwriter')
    workbook = xlWriter.book

    for i, df in enumerate(dfs):
        for col in df.columns[1:]:                  # update only the numeric columns (first column holds labels)
            df.loc[df[col] == '-', col] = np.nan    # replace hyphens with NaN
            df[col] = df[col].astype(float)         # convert to float

        df.to_excel(xlWriter, sheet_name='Sheet{}'.format(i))

    xlWriter.close()  # writes the file; ExcelWriter.save() was removed in pandas 2.0
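
As a variation on the conversion step, the hyphen replacement and the float cast can be folded into a single call to pd.to_numeric with errors='coerce', which turns any unparseable value into NaN. The sketch below is not the answer's own method: the helper name coerce_numeric_columns and the sample DataFrame are hypothetical stand-ins for one table returned by pd.read_html, and it assumes, as the answer does, that the first column holds labels.

```python
import pandas as pd


def coerce_numeric_columns(df, first_numeric_col=1):
    """Return a copy of df with every column from first_numeric_col onward
    converted to a numeric dtype (hypothetical helper for illustration)."""
    out = df.copy()
    for col in out.columns[first_numeric_col:]:
        # errors='coerce' turns anything unparseable (such as '-') into NaN
        out[col] = pd.to_numeric(out[col], errors='coerce')
    return out


# Made-up sample standing in for one table returned by pd.read_html
sample = pd.DataFrame({
    'Item': ['Revenue', 'Other income', 'Net income'],
    '2016': ['22451.00', '-', '5061.00'],
    '2015': ['21500.00', '151.00', '-'],
})

converted = coerce_numeric_columns(sample)
print(converted.dtypes)
```

The converted frame can then be passed to df.to_excel exactly as in the loop above. Note that errors='coerce' also silently converts any other non-numeric text to NaN, so it is worth inspecting the result if a column might contain something other than numbers and hyphens.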
 
