Question

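I have been given a dataset with hundreds of variables whose labels are all messed up.

After inspecting some of the variables, it looks like digits have been inserted at random positions in the labels.

Here is a similar example using Stata's auto toy dataset: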

Ma6k0e a6nd Mo0d5e3l  
Pri1ce3  
Mi1le3age3 (mpg)  
Re6pa7ir R3ec8or9d 1978  
He4ad3ro5om (i7n.)   
Tr2un8k s9pa5ce (c4u.333ft.)  
We0ig7ht (lbs.)  
Len2gt4h (in.)  
Tu1rn Cir9c0le (ft.)   
Di7spl1ac3e7ment (cu.333in.)  
Ge3a6r Ra6ti1o  
Ca5r ty4pe2
 

How can I clean this up quickly?

Best answer

Here is a quick way to do it:

* Load the auto dataset and reproduce the scrambled labels from the question
sysuse auto, clear

label variable make "Ma6k0e a6nd Mo0d5e3l"
label variable price "Pri1ce3"
label variable mpg "Mi1le3age3 (mpg)"
label variable rep78 "Re6pa7ir R3ec8or9d 1978"
label variable headroom "He4ad3ro5om (i7n.)"
label variable trunk "Tr2un8k s9pa5ce (c4u.333ft.)"
label variable weight "We0ig7ht (lbs.)"
label variable length "Len2gt4h (in.)"
label variable turn "Tu1rn Cir9c0le (ft.) "
label variable displacement "Di7spl1ac3e7ment (cu.333in.)"
label variable gear_ratio "Ge3a6r Ra6ti1o"
label variable foreign "Ca5r ty4pe2"

* For each variable: show the old label, strip every digit, show the new label
foreach var of varlist * {
    display ""
    display "`: variable label `var''"
    label variable `var' `"`= ustrregexra("`: variable label `var''", "[0-9]", "")'"'
    display "`: variable label `var''"
}
 

Result:

Ma6k0e a6nd Mo0d5e3l
Make and Model

Pri1ce3
Price

Mi1le3age3 (mpg)
Mileage (mpg)

Re6pa7ir R3ec8or9d 1978
Repair Record 

He4ad3ro5om (i7n.)
Headroom (in.)

Tr2un8k s9pa5ce (c4u.333ft.)
Trunk space (cu.ft.)

We0ig7ht (lbs.)
Weight (lbs.)

Len2gt4h (in.)
Length (in.)

Tu1rn Cir9c0le (ft.) 
Turn Circle (ft.) 

Di7spl1ac3e7ment (cu.333in.)
Displacement (cu.in.)

Ge3a6r Ra6ti1o
Gear Ratio

Ca5r ty4pe2
Car type
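
Note that the pattern [0-9] removes every digit, including legitimate ones: in the output above, the 1978 in the repair record label is gone. If that matters for your data, one possible variation is to remove only runs of digits that touch a letter. This is a minimal sketch, not a general fix: it assumes that stray digits always sit directly next to a letter and that genuine numbers stand alone, and it relies on the regular expression engine behind ustrregexra() accepting lookbehind and lookahead. The macro names pattern, old and new are just illustrative.

```stata
sysuse auto, clear
label variable rep78 "Re6pa7ir R3ec8or9d 1978"
label variable trunk "Tr2un8k s9pa5ce (c4u.333ft.)"

* Assumed pattern: a run of digits right after a letter, or right before one
local pattern "(?<=[A-Za-z])[0-9]+|[0-9]+(?=[A-Za-z])"

foreach var of varlist rep78 trunk {
    * Read the current label, clean it, and write it back
    local old : variable label `var'
    local new = ustrregexra(`"`old'"', "`pattern'", "")
    label variable `var' `"`new'"'
    * Show the label before and after for a visual check
    display `"`old'  ->  `new'"'
}
```

With this pattern the standalone 1978 should be left alone, because it has a space in front of it and nothing after it. A stray digit that only touches spaces or punctuation would also be left in place, so it is worth checking the cleaned labels by eye.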
 
