In a recent study, researchers at Washington State University explored the capabilities and limitations of artificial intelligence models like ChatGPT on financial licensing exams. While these models excel at multiple-choice questions, they struggle with more complex tasks that require nuanced reasoning. The study highlights both the potential and the current shortcomings of AI in finance, emphasizing the importance of human expertise for specialized financial tasks.
The Performance of AI Models in Financial Exams
Researchers examined over 10,000 responses from various AI language models, including BARD, Llama, and ChatGPT, to understand their proficiency in answering financial exam questions. The models were asked not only to select answers but also to provide detailed explanations. Although ChatGPT outperformed its counterparts, it still had difficulty with advanced topics. This suggests that while AI can synthesize well-established concepts, it falters when dealing with specific or unusual issues.
The study found that ChatGPT version 4.0 provided the most accurate and human-like responses among all models. However, when the researchers fine-tuned the earlier version of ChatGPT (3.5) with examples of correct answers, its performance improved significantly, sometimes even surpassing the paid version. Yet both versions struggled with specialized scenarios such as determining client insurance coverage or tax status. This indicates that AI models are better suited to routine tasks than to intricate financial analyses.
Implications for the Financial Industry
The research underscores the importance of human professionals in handling complex financial matters. While AI can assist with general concepts, it is not yet capable of fully replacing experienced financial analysts. The study authors suggest that AI should be viewed as a supportive tool rather than a replacement for human expertise. The findings also imply changes in how investment banks might employ entry-level analysts: because many of the tasks assigned to junior staff are menial, AI could reduce the number of such positions.
To further explore AI's capabilities, the researchers are now investigating its ability to evaluate potential merger deals using data from after its training period. Preliminary results indicate that AI struggles with this task, reinforcing the notion that human judgment remains indispensable in high-stakes financial decisions. Overall, the study points to areas where AI can improve efficiency in the financial sector without displacing professional expertise.