ChatGPT, a generative AI chatbot based on a large language model, has seen a rapid rise in popularity. Its ability to understand and generate human-like text has opened new opportunities for technological applications and real-life use cases. This article focuses on using ChatGPT for data analysis, an area where its capabilities are both promising and challenging.
ChatGPT, developed by OpenAI, is a conversational agent designed to understand and generate human language. It's powered by a large language model, which allows it to interpret and respond to a wide range of text inputs.
Using ChatGPT for data analytics requires a ChatGPT Plus subscription, since file upload is not available to free users. Advanced Data Analysis is a ChatGPT feature that lets users upload data directly and ask questions about it.
Using ChatGPT for data analysis can be a powerful way to leverage AI in understanding and interpreting data. Here are some practical applications:
We used ChatGPT's Advanced Data Analysis feature in several contexts, where it showed both its versatility and its limitations. Here’s a summary of how it performed in each scenario.
Tools: Custom GPT, Code Interpreter.
Dataset: Arbitrary dataset with structured advertising data.
In this example, we used Code Interpreter and asked a Custom GPT to analyze and generate insights from an arbitrary advertising dataset with structured data. ChatGPT successfully created visualizations and identified trends in key metrics such as cost, revenue, click-through rate (CTR), and conversions. It pinpointed the top-performing traffic sources and offered suggestions, but its final recommendations were generally too broad to act on.
Tools: Custom GPT, Code Interpreter.
Dataset: Google Search Console Data Export.
In another case, we asked a Custom GPT to suggest SEO improvements. When analyzing the Google Search Console data, ChatGPT offered a regional performance comparison and recognized date-wise trends. Its analysis of top queries, search appearance, and devices led to valid, actionable recommendations. However, when asked about page performance, it returned page URLs that did not exist in the data (data hallucination).
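One way to catch this kind of hallucination is to check every URL the model cites against the pages that actually appear in the export. The following is a minimal sketch, not the method used in the tests above. It assumes the export is a CSV with a column named `page`; that column name, the helper `find_unknown_urls`, and the sample rows are all hypothetical, and a real Search Console export may use different headers.

```python
import csv
import io


def find_unknown_urls(export_csv_text, cited_urls, page_column="page"):
    """Return the cited URLs that do not appear in the export's page column."""
    reader = csv.DictReader(io.StringIO(export_csv_text))
    # Collect every page URL that really exists in the exported data.
    known = {row[page_column].strip() for row in reader if row.get(page_column)}
    # Anything the model cited that is not in that set is suspect.
    return [url for url in cited_urls if url.strip() not in known]


# Illustrative data only; these rows are not from the article's dataset.
sample_export = """page,clicks,impressions
https://example.com/,120,3400
https://example.com/pricing,45,900
"""

# URLs copied from a (hypothetical) model answer about page performance.
cited = ["https://example.com/pricing", "https://example.com/blog/seo-tips"]

unknown = find_unknown_urls(sample_export, cited)
print(unknown)
```

Any URL in the returned list should be treated as unverified until it is found in the source data.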
Tools: Custom GPT, Code Interpreter.
Dataset: Mixpanel MAU data.
In this example, a Custom GPT was used to analyze and generate insights from Mixpanel's Monthly Active Users (MAU) data. ChatGPT produced visualizations and trendlines for key metrics. Although it provided a general overview of the data, both its observations and its recommendations lacked depth, underscoring the need for more detailed, context-specific insights.
Tools: Custom GPT, Code Interpreter.
Dataset: Arbitrary customer feedback dataset with unstructured text.
ChatGPT's performance in analyzing unstructured text from customer feedback highlighted its limitations in handling incomplete data. It struggled with fields that were not fully populated, only generating insights from fully completed columns. Nonetheless, the insights it did provide were valid and well-summarized, showing its potential in text analysis when provided with complete data.
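Because sparsely populated fields were the sticking point here, it can help to measure how complete each column is before uploading a file. The sketch below uses only the Python standard library. The function name `column_completeness` and the sample feedback rows are hypothetical and only illustrate the idea.

```python
import csv
import io


def column_completeness(csv_text):
    """Return the fraction of non-empty values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = row.get(name)
            # Count a cell as filled only if it holds non-whitespace text.
            if value is not None and value.strip():
                counts[name] += 1
    return {
        name: (filled / total if total else 0.0)
        for name, filled in counts.items()
    }


# Illustrative data only; these rows are not from the article's dataset.
sample_feedback = """id,rating,comment
1,5,Great onboarding
2,,
3,4,
4,2,Dashboard is slow
"""

completeness = column_completeness(sample_feedback)
for name, share in completeness.items():
    print(f"{name}: {share:.0%} filled")
```

Columns with a low share are the ones most likely to be skipped in the analysis, so they are worth cleaning up or explicitly pointing out in the prompt.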
In all of these scenarios, ChatGPT was able to process and analyze the data to some degree. However, its observations contained errors (data hallucinations) and were often too generic, lacking specificity and detail.
AI-powered tools like ChatGPT or Gemini rely mainly on a conversational style of interaction, which isn't always the best way to work with data. SQL, for example, was designed to read somewhat like English, and getting answers from data through SQL or a query builder can be more practical and accurate. Moreover, when chatting with data in plain English, users need to know exactly which questions to ask. Along with this, there are other limitations:
In addition to these limitations, over-reliance on AI for data interpretation can lead to misinterpretation and can miss crucial nuances that human analysts or rules-based non-AI systems might catch.
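To illustrate the earlier point about structured queries, here is a small sketch of answering "which traffic source has the best click-through rate?" with SQL instead of a chat prompt. It uses Python's built-in `sqlite3` module with an in-memory database. The table name `ads`, its columns, and the figures are made up for illustration and do not come from the datasets discussed above.

```python
import sqlite3

# In-memory database with a hypothetical advertising table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ads (source TEXT, impressions INTEGER, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO ads VALUES (?, ?, ?)",
    [
        ("search", 1000, 50),
        ("social", 2000, 40),
        ("email", 500, 20),
        ("search", 1000, 70),
    ],
)

# CTR per source: total clicks divided by total impressions.
# Multiplying by 1.0 forces floating-point rather than integer division.
query = """
SELECT source, 1.0 * SUM(clicks) / SUM(impressions) AS ctr
FROM ads
GROUP BY source
ORDER BY ctr DESC
"""
ctr_by_source = conn.execute(query).fetchall()
conn.close()

for source, ctr in ctr_by_source:
    print(f"{source}: {ctr:.1%}")
```

The query states exactly how the metric is computed, so the result can be reproduced and audited, which is harder to guarantee with a free-form conversational answer.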
Utilizing LLM-based tools such as ChatGPT or Gemini in data analysis requires writing precise prompts and understanding the scope and limitations of each tool.
Exploring other AI-powered data analysis tools, such as Narrative BI, which offers easy-to-use solutions based on a specialized Generative BI model, can help you avoid ChatGPT's hallucinations.
Anticipated improvements in AI models could enhance data analytics capabilities, making AI interfaces more robust and insightful.
While ChatGPT offers exciting possibilities in data analysis, it's crucial to be aware of its limitations and the need for human oversight. The role of AI in data analytics is evolving, and ChatGPT is a significant, albeit limited, part of this landscape.
Unlike ChatGPT, which is trained for generic tasks, Generative BI solutions offer more specialized functions and deeper insights.
Consider trying Narrative BI, a generative business intelligence platform, for a more tailored data analysis experience, especially if you're seeking insights beyond what generic models like ChatGPT can offer.