This study analyzes corporate risk factor disclosures in annual reports using k-means clustering. Risk factor descriptions, which became available as digital data with the introduction of EDINET and XBRL, are represented as vectors weighted by TF-IDF and then grouped by k-means clustering. The analysis reveals that the similarity of risk factor descriptions is high in the banking, retail, construction, real estate, and information and communication industries. These results are expected to be useful not only for research on corporate risk factor disclosure behavior, as in prior studies, but also for research on the usefulness of risk factor disclosures in analysis models. Applying the natural language processing and machine learning methods used in this study to nonfinancial information makes it possible to analyze such qualitative information quantitatively.
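The pipeline described above (TF-IDF weighting followed by k-means clustering) can be illustrated with a minimal sketch. It is not the study's implementation: the function names `tfidf_vectors` and `kmeans`, the four toy documents, the IDF variant (log(N/df) + 1), the L2 normalisation, and the farthest-point initialisation are all assumptions made for illustration. Real risk factor text would also need to be tokenised before this step.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for tokenised documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed IDF variant: log(N / df) + 1, so no term gets zero weight.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] / len(doc) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def kmeans(vectors, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-point initialisation."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from the first vector, then add the point farthest from all
    # centroids chosen so far (an assumption; random starts are also common).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist2(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # Assignments are stable, so the algorithm has converged.
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical, hand-written "risk factor" snippets (not data from the study).
docs = [
    "credit risk interest rate loan default".split(),
    "interest rate loan credit deposit".split(),
    "store sales consumer demand weather".split(),
    "consumer demand store competition sales".split(),
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy input the two bank-like snippets share no terms with the two retail-like snippets, so they fall into separate clusters. With real filings, the number of clusters and the tokenisation scheme would have to be chosen to suit the data.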