Parsing SEC RSS Title field using regex
All, I have an RSS feed from the SEC in which each item's title looks like this, e.g.:
10-Q - What ever INC (0000123456) (Filer)
so the general structure is:
form_name + whitespace + dash + whitespace + company_name + " (" +
SIC_Number + ") (Filer)"
I need to extract the company_name and SIC_Number. Note that the form_name can
contain a dash, and the company name will contain whitespace. This can be done
(I'm using Python) by calling re.split once on the dashes and again on the
brackets, but it's ugly.
What would the proper RegEx be?
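For what it's worth, here is a minimal sketch of one pattern that seems to fit the structure described above. It assumes the form name and company name are always separated by a dash with whitespace on both sides, that the number in parentheses is all digits, and that the title always ends in "(Filer)". The names TITLE_RE and parse_title are made up for illustration.

```python
import re

# Assumed layout: form_name " - " company_name " (" digits ") (Filer)"
TITLE_RE = re.compile(
    r"""
    ^(?P<form_name>.+?)            # form type, lazy; may contain dashes (10-Q)
    \s+-\s+                        # first dash with whitespace on both sides
    (?P<company_name>.+)           # company name, greedy; may contain spaces
    \s+\((?P<sic_number>\d+)\)     # the number in parentheses
    \s+\(Filer\)\s*$               # literal trailing "(Filer)"
    """,
    re.VERBOSE,
)


def parse_title(title):
    """Return (company_name, sic_number) or None if the title does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return match.group("company_name"), match.group("sic_number")


if __name__ == "__main__":
    print(parse_title("10-Q - What ever INC (0000123456) (Filer)"))
```

The lazy form_name stops at the first dash that has whitespace on both sides, so a dash inside "10-Q" is left alone. The greedy company_name then backtracks only as far as the final "(digits) (Filer)", which should also cope with company names that themselves contain parentheses or dashes.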