Approach
Titles generally describe a manuscript accurately, so I clustered the documents by the similarity of their titles. I used vectorization to convert the text to numbers. NLP and clustering algorithms such as Word2Vec and K-means were used under the LSA (Latent Semantic Analysis) approach, which treats every document or sentence as a bag of words. TF-IDF, or term frequency–inverse document frequency, was used to reflect how important a word is to a document. The silhouette score was used as a metric for evaluating the number of clusters. Finally, I sorted the titles in each cluster by their occurrences and used those titles to decide the topic.
What were the common regulatory priorities of these agencies from 2001 through 2006?
Clustering the 2001-2006 data showed these common priorities across the agencies:
- Disclosure of new schemes, electronic filings, new regulations for trading facilities
- Analysis of financial measures, electronic filing
- Information regarding state and international bank activities
- Delegation of authorities
- Investment advisories
- Amendments of rules for mutual fund transactions, securities and commissions
- Regulations for securities
- Risk-based capital guidance
What new topics and issues emerged from 2007 through 2012? (And what topics went away?)
The new topics that emerged were:
- Commodity Pool Operations
- Insurance Regulations
- Mortgage Acts, SRO, SHO regulations
- Rules of Practice and Procedure in Adjudicatory Proceedings
- Rules and amendments to the Temporary Liquidity Guarantee Program
- Registration, regulations, exemptions for swap data, dealers
- Revision and new registration acts for securities
- Financial market utilities
- Dodd-Frank Act Implementation
- Capital Guidelines
What went away?
- Guidelines for using electronic filings
- Rules regarding delegation of authorities were reduced
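Once each period has a set of topic labels, the emerged and retired topics can be read off with set differences. The sketch below is only an illustration: the `compare_periods` helper is hypothetical, and the labels are abbreviated by hand from the lists in this write-up rather than produced by the clustering.

```python
def compare_periods(earlier, later):
    """Split topic labels into emerged, retired and persisting groups."""
    return {
        "emerged": later - earlier,
        "went_away": earlier - later,
        "persisted": earlier & later,
    }


# Hand-abbreviated labels (assumption: one short label per topic).
topics_2001_2006 = {
    "electronic filing",
    "delegation of authorities",  # the write-up says these rules were reduced
    "securities",
}
topics_2007_2012 = {
    "securities",
    "swap dealers",
    "dodd-frank act implementation",
    "commodity pool operations",
}

shift = compare_periods(topics_2001_2006, topics_2007_2012)
for group, labels in shift.items():
    print(group, sorted(labels))
```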
How significantly do topics shift between administrations?
In 2007 the SEC was focused on capital guidance, whereas in 2012 it focused on amendments of forms regarding investment. In 2007 the CFPB had not released any rules, whereas in 2012 it was focused on home mortgage rules and consumer rules. In 2007 the FDIC was focused on corporate reorganization and reporting requirements, and by 2012 it had switched to amendments of the FDIC regulations.
How many times, on average, would a topic need to be discussed before it is flagged as an emerging topic?
Based on the topics I pulled from the documents, a simple analysis showed that titles under "securities" and "capital" were the most numerous; on average, there were around 200 titles concerning these words.
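One possible reading of this simple analysis is to count the titles that mention each keyword and then average the counts. The sketch below assumes that interpretation; the `count_titles_per_keyword` helper and the sample titles are hypothetical, so the numbers it prints are not the project's results.

```python
import re
from statistics import mean


def count_titles_per_keyword(titles, keywords):
    """Count how many titles contain each keyword as a whole word."""
    counts = {}
    for keyword in keywords:
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        counts[keyword] = sum(1 for title in titles if pattern.search(title))
    return counts


# Hypothetical sample titles, not taken from the real data set.
sample_titles = [
    "Regulations for Securities Transactions",
    "Risk-Based Capital Guidelines",
    "Capital Adequacy and Securities Lending",
    "Electronic Filing of Forms",
]

counts = count_titles_per_keyword(sample_titles, ["securities", "capital"])
average = mean(counts.values())
print(counts, average)
```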
Built With
- python
- sklearn