Question

I want to remove a list of words from a paragraph, so I built a list of the words I want to remove:

    fitlerWords = ['Cage','Contract','Number','Quantity','Unit','Cost','AWD','Date','CONTINUED',
                   'SECTION','Procurement','history','For','on','Next','Page','Continuation','Sheet',
                   'Reference','of','Document','Being','CONTINUED','pages','SECTION']
 

I want to remove the words above from this text wherever they occur:

015536159/6630 CAGE Contract Number Quantity Unit Cost AWD Date 32YK1 SPE2DH19P0522 22.000 1394.13000 20190102 32YK1 SPE2DH18P1630 21.000 1356.41000 20180604 74YZ3 SPE2DH18P1184 15.000 1282.50000 20180314 32YK1 SPE2DH17V1630 16.000 1335.91000 20170214 58837 SPE2DH16V2501 17.000 1369.00000 20160601 32YK1 SPE2DH16M0463 13.000 1358.20000 20151125 CONTINUED ON NEXT PAGE X{0,0} CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: SPE2DH-19-T-6601 PAGE 4 OF 22 PAGES SECTION A Procurement History for NSN/FSC:015536159/6630 CAGE Contract Number Quantity Unit Cost AWD Date 32YK1 S$ DH16M0068 32YK1 SPE2DH14V3122 32YK1 S$ DH14V2252 32YK1 SPE2DH14V0165 58837 SPM2DH13V1222 08576 SPM2DH13M0509 58837 SPM2DH12V0342 08576 SPM2DH12M0490 08576 SPM2DH11V1261 3BSP4 SPM2DSO8MA800 3BSP4 SPM2DS08M6542 3BSP4 SPM2DS08M5128 3BSP4 SPM2DS08M5127 3BSP4 SPM2DS08M5125 18.000 1462.05000 20151005 12.000 1246.39000 20140918 9.000 1246.39000 20140711 10.000 1246.39000 20131223 12.000 1258.00000 20130724 15.000 1100.09000 20121205 27.000 1200.00000 20111223 34.000 1057.77000 20111202 3.000 1057.77000 20110727 2.000 947.16000 20080721 100.000 947.16000 20080323 2.000 947.16000 20080227 2.000 947.16000 20080227 2.000 947.16000 20080225 CONTINUED ON NEXT PAGE X{0,0} CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: SPE2DH-19-T-6601 PAGE 5 OF 22 PAGES SECTION B

So I used this code:

    for x in fitlerWords:
        try:
            filteredHistory = history.replace(x,"")
        except Exception as e:
            print(e, x)

        print(filteredHistory)
 

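When I print the result, I get the paragraph back with nothing removed. What am I doing wrong? How can I filter all of these words out of the paragraph wherever they occur?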

  Best answer

Use re.sub with an alternation that contains all of the keywords:

    import re
    filterWords = ['Cage','Contract','Number','Quantity','Unit','Cost','AWD','Date','CONTINUED',
                   'SECTION','Procurement','history','For','on','Next','Page','Continuation',
                   'Sheet','Reference','of','Document','Being','pages']
    # Join the keywords into a single non-capturing alternation: (?:Cage|Contract|...)
    regex = r'(?:' + '|'.join(filterWords) + r')'
    filteredHistory = re.sub(regex, '', history, flags=re.IGNORECASE)
    print(filteredHistory)
 

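Note: depending on how you want the resulting text to look, you may also want to remove the whitespace around each keyword, for example the whitespace to its right. In that case, we can try: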

    regex = r'(?:' + '|'.join(filterWords) + r')\s*'
    filteredHistory = re.sub(regex, '', history, flags=re.IGNORECASE)
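One thing to watch for with short keywords such as 'on' and 'of': a bare alternation also matches them inside longer words. A possible refinement, shown here only as a sketch, wraps the alternation in \b word boundaries and passes each keyword through re.escape. The sample string and the shortened keyword list below are made up for illustration and are not taken from the question's data.

```python
import re

# Hypothetical sample text and shortened keyword list, for illustration only.
sample = "Contract awarded on Monday of 2019"
keywords = ['Contract', 'on', 'of']

# Without word boundaries, 'on' is also removed from inside 'Monday'.
loose = r'(?:' + '|'.join(keywords) + r')\s*'
looseResult = re.sub(loose, '', sample, flags=re.IGNORECASE)

# With \b on both sides, only whole words are removed.
# re.escape protects against keywords that contain regex metacharacters.
strict = r'\b(?:' + '|'.join(re.escape(w) for w in keywords) + r')\b\s*'
strictResult = re.sub(strict, '', sample, flags=re.IGNORECASE)

print(looseResult)   # awarded Mday 2019
print(strictResult)  # awarded Monday 2019
```

Whether whole-word matching is what you want depends on the data; for the contract codes in the question it should make little difference, since they are unlikely to contain the keywords as substrings.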
 

The regex logic here builds a pattern like the following:

 (?:Cage|Contract|Number|Quantity)
 

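It would of course contain more keywords, but this is the general pattern. We use re.sub to match this pattern and replace each match with an empty string, which effectively removes all matching keywords. The re.IGNORECASE flag makes the replacement apply regardless of the case in which a keyword appears.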
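As for why the loop in the question seemed to remove nothing: each iteration calls replace on the original, unmodified history, so every pass discards the previous one and only the last keyword's removal survives. In addition, str.replace is case-sensitive, so 'Cage' never matches 'CAGE'. The sketch below uses a short made-up sample rather than the full text to show the difference between the original pattern and an accumulating loop.

```python
# Hypothetical short sample standing in for the full `history` text.
history = "CAGE Contract Number 32YK1 SPE2DH19P0522"
filterWords = ['Cage', 'Contract', 'Number']

# Pattern from the question: every pass starts again from the untouched
# `history`, so only the removal of the last keyword is kept.
for x in filterWords:
    lastOnly = history.replace(x, "")

# Accumulating variant: keep replacing in the running result instead.
filteredHistory = history
for x in filterWords:
    filteredHistory = filteredHistory.replace(x, "")

print(lastOnly)         # 'Contract' is still present
print(filteredHistory)  # 'Contract' and 'Number' are gone, 'CAGE' remains
```

Even the accumulating loop leaves 'CAGE' in place because of the case mismatch, which is one reason the re.sub approach with re.IGNORECASE is the more convenient fit here.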
