Hello, everyone. I have a question about a group of file names. Each file name contains page numbers, but some names also start with a three-digit issue number, and some contain a four-digit year.
I want to extract only the page numbers from these file names, not the three-digit issue number or the four-digit year. However, my regular expression keeps extracting the wrong numbers.
Can you help me analyze the problem? Thank you!
Here are some example file names:
877中国大饭店p4
877似懂非懂说p45-50 53-60
877萨法撒发放88
877水电费代收2019
封面-沙发
877圣达菲龙东大62-fd
献血的p58
Part of the extraction code I wrote:
ARRAY(str) filesy
findrx(subject[i] "\d{1,2}" 0 4 filesy)
However, the pattern \d{1,2} also extracts 87 from the prefix 877, and 20 and 19 from 2019. I do not want the four-digit year or the "87" from the prefix in the results. What is wrong with my pattern?
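
To make the symptom concrete: \d{1,2} is happy to match the first two digits of a longer run, which is why 877 yields 87 and 2019 yields 20 and 19. One possible approach, shown here as a rough Python sketch rather than Quick Macros code, is to require that the one or two digits are not touching any other digit, using a lookbehind and a lookahead. This assumes that page numbers never exceed two digits (as in the examples above) and that the regex engine behind findrx supports lookarounds; PAGE_RX and extract_pages are names made up for this illustration.

```python
import re

# Assumption: a page number is a run of exactly 1 or 2 digits.
# (?<!\d) rejects a match that starts in the middle of a digit run,
# (?!\d) rejects a match that is followed by another digit.
PAGE_RX = re.compile(r"(?<!\d)\d{1,2}(?!\d)")

# The example file names from the question.
names = [
    "877中国大饭店p4",
    "877似懂非懂说p45-50 53-60",
    "877萨法撒发放88",
    "877水电费代收2019",
    "封面-沙发",
    "877圣达菲龙东大62-fd",
    "献血的p58",
]

def extract_pages(name):
    """Return every isolated 1- or 2-digit number in the file name."""
    return PAGE_RX.findall(name)

results = {name: extract_pages(name) for name in names}

for name, pages in results.items():
    print(name, pages)
```

With this pattern the three-digit prefix and the four-digit year are skipped because every digit in them has a digit neighbor. A three-digit page number would be skipped for the same reason, so the pattern only fits if the two-digit limit really holds.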