[Image: Deep Research: OpenAI vs Gemini. A split-screen illustration of two mechanical spider web crawlers examining the internet.]

OpenAI Crushes Gemini in Deep Research Test: Part Time Larry

The major LLMs are in a furious race to outdo one another. They are becoming more reflective, capable of more careful decisions and more thorough investigations. This post looks at the research side of that race: Deep Research, OpenAI vs Gemini.

The race for AI dominance is heating up, and for private SaaS companies, understanding the capabilities of the major players is crucial. Choosing the right AI foundation can be the difference between a breakthrough product and a costly misstep. This brings us to a fascinating comparison: OpenAI deep research vs Gemini. Can we leverage the power of AI itself to understand the strengths and weaknesses of these AI giants? Let’s find out, guided by a recent video from the YouTube channel Part Time Larry.

Introducing Part Time Larry

Part Time Larry (@parttimelarry) is a unique YouTube channel blending the worlds of financial markets and software engineering. With over 114,000 subscribers and 179 videos, Larry focuses on teaching viewers how to build Python programs for analyzing, visualizing, and integrating market data, APIs, and financial services. His channel is a go-to resource for anyone interested in the practical application of coding to finance. In a recent video, Larry put his prompt engineering skills to the test in a fascinating comparison of AI deep research capabilities.

Channel Presentation

  • Part Time Larry focuses on the practical application of Python programming to financial markets and data analysis. He provides tutorials and demonstrations on topics like API integration, data visualization, and algorithmic trading.
  • His style is informative and practical, with a clear emphasis on code examples and real-world applications. He appears to target a technically-minded audience.
  • Typical audience likely includes developers, quants, data scientists, and finance professionals with an interest in coding.
  • Key Facts:
    • Subscriber count: 114,000
    • Average video length: 10-30 minutes (based on video list analysis)
    • Posting frequency: Roughly weekly (based on video list analysis)
    • Notable series/themes: API integrations, financial data analysis, Python coding tutorials, and increasingly, AI exploration.

The Video’s Core: A Deep Dive Comparison

  • Video Title: “OpenAI Deep Research vs Gemini – Private SaaS Companies”
  • What’s the Video About?
    • The video is a head-to-head comparison of OpenAI and Gemini’s deep research capabilities, focusing on their relevance to private SaaS companies.
  • The Research Question:
    • “What are the key differences in recent research output between OpenAI (newest model) and Google’s Gemini 1.5 Pro (with Deep Research), specifically regarding the Remote Monitoring and Management (RMM) software (tools used by IT professionals to manage computer systems remotely) market for Managed Service Providers (MSPs) (companies that remotely manage a customer’s IT infrastructure) in North America, and what are the implications of these differences for investors?”
  • Why This Market? (Data Scarcity):
    • Larry deliberately chose a niche, data-scarce market to test the AIs’ ability to:
      • Find information from scattered sources (Reddit, niche blogs, industry databases).
      • “Fill in the blanks” and synthesize a coherent narrative.
      • Understand industry-specific jargon and acronyms.
      • Make predictions with incomplete data.
  • AI Models Used:
    • OpenAI’s newest model (with Deep Research).
    • Gemini 1.5 Pro (with Deep Research).

Crafting the Prompt: Larry’s Approach

  • Prompt Summary:
    • Larry crafted an exceptionally detailed prompt, spanning approximately 60 lines in its Markdown format. This prompt, which you can find in the video description (https://www.youtube.com/watch?v=zE4eApsRYuo), outlined a six-part report structure, provided clear instructions, specified the desired output, and even engaged in a bit of role-playing (“Take as long as you need…”). It’s a masterclass in prompt engineering.
  • Prompt Breakdown (Key Elements):
    • Context Setting: The prompt clearly defines the task (compiling an investor-focused research report) and the target industry (RMM software for MSPs in North America).
    • Specific Task: The core instruction is to “Compile a comprehensive report…” with detailed analysis of various aspects of the market.
    • Constraints: The prompt focuses the research on North America and the specific RMM/MSP market.
    • Output Format: A structured report tailored for an investor, with citations for all data points.
    • Six-Part Structure: The prompt meticulously defines six key sections:
      1. Market Overview
      2. Financials and Valuation
      3. Competitive Landscape and Key Players
      4. Investment and Market Accessibility
      5. Product and Industry Trends
      6. Strategic Outlook and Predictions
    • Detailed Sub-points: Each section contains highly specific sub-questions and tasks, leaving little room for ambiguity. For example, within “Financials and Valuation,” the prompt asks for revenue figures, growth rates, SaaS metrics such as ARR (Annual Recurring Revenue), churn, NRR (Net Revenue Retention), CAC (Customer Acquisition Cost), and LTV (Lifetime Value), plus valuation estimates and public comps. A rough skeleton of this kind of prompt is sketched below.
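
To make the structure concrete, here is a rough skeleton of what such a prompt might look like, wired up to a research call. This is a hedged illustration, not Larry’s actual prompt (his full version is in the video description): the section headings mirror the six-part structure above, the openai Python SDK is assumed, and the model name is a placeholder for whichever deep-research-capable model you have access to.

```python
# Illustrative skeleton of a structured deep-research prompt, modeled on
# the six-part structure described above. Not Larry's actual ~60-line prompt.
from openai import OpenAI

PROMPT = """\
# Research Report: RMM Software for MSPs (North America)

Compile a comprehensive, investor-focused research report on the Remote
Monitoring and Management (RMM) software market for Managed Service
Providers (MSPs) in North America. Cite sources for all data points.
Take as long as you need.

## 1. Market Overview
- Market size, growth rate, and key drivers.

## 2. Financials and Valuation
- Revenue figures, growth rates, SaaS metrics (ARR, churn, NRR, CAC, LTV),
  valuation estimates, and public comps.

## 3. Competitive Landscape and Key Players
## 4. Investment and Market Accessibility
## 5. Product and Industry Trends
## 6. Strategic Outlook and Predictions
- Make specific, bold, forward-looking predictions with timelines.
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="o3",  # placeholder; use whichever deep-research-capable model you have
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)
```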

Evaluating the AI Responses

Larry’s Evaluation Method

  • Larry meticulously judged the reports based on a clear set of criteria:
    • Length: The sheer volume of information provided (depth).
    • Structure: How well the reports adhered to the six-part structure outlined in the prompt.
    • Correctness: The factual accuracy of the information presented.
    • “Filling in the Gaps”: The ability to infer missing information and draw logical conclusions based on limited data.
    • Insightfulness: The generation of actionable insights relevant to investors.
    • Bold Predictions: The willingness to make specific, forward-looking predictions about the market.

Expert Opinions (LLM Judges)

  • LLM Judges: Larry cleverly relied not only on his own expertise but also enlisted OpenAI, Claude, and even Gemini itself as LLM judges to evaluate the reports. This provided a multi-faceted perspective on the quality of the AI-generated research.
  • LLM Judges’ Prompt: He provided a separate, detailed prompt to the LLM judges, instructing them to “Evaluate and compare two investment reports… identify strengths and weaknesses…” with specific metrics across ten categories, including: Overall Comparison, Depth & Quality of Research, Investment-Focused Analysis, Competitive Landscape & Strategic Insights, Use of AI & Emerging Trends, Sentiment Analysis & Industry Perception, Clarity, Readability & Organization, Predictive Accuracy & Forward Thinking, Technical Accuracy & Data Integrity, and Strengths & Weaknesses of Each Researcher/LLM.
  • Unanimous Agreement (Summarized by Larry): Crucially, Larry states that all the LLM judges, including Gemini 2.0 Pro Experimental, concurred that OpenAI’s report was significantly superior. He summarizes their findings, highlighting agreement on points like OpenAI’s superior strategic insights, actionable insights, and more exhaustive analysis.
    The unanimous agreement among the LLM judges, including Gemini itself, reinforces Larry’s own assessment. But beyond the overall verdict, the video reveals specific strengths and weaknesses of each AI that are crucial for understanding their practical applications.
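
For readers who want to reproduce the LLM-as-judge setup, a minimal sketch follows. Only the ten category names come from the video; the prompt wording, helper function, and model name are illustrative assumptions (and in practice Claude and Gemini judges would each use their own SDKs rather than the single client shown here).

```python
# Minimal LLM-as-judge sketch: send the same evaluation prompt, plus both
# reports, to a judge model and collect its verdict. The ten category names
# come from the video; everything else here is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

CATEGORIES = [
    "Overall Comparison",
    "Depth & Quality of Research",
    "Investment-Focused Analysis",
    "Competitive Landscape & Strategic Insights",
    "Use of AI & Emerging Trends",
    "Sentiment Analysis & Industry Perception",
    "Clarity, Readability & Organization",
    "Predictive Accuracy & Forward Thinking",
    "Technical Accuracy & Data Integrity",
    "Strengths & Weaknesses of Each Researcher/LLM",
]

def judge(report_a: str, report_b: str, model: str) -> str:
    """Ask one judge model to compare two reports across all ten categories."""
    prompt = (
        "Evaluate and compare two investment reports. Identify strengths and "
        "weaknesses of each across the following categories:\n"
        + "\n".join(f"- {c}" for c in CATEGORIES)
        + f"\n\n--- REPORT A ---\n{report_a}\n\n--- REPORT B ---\n{report_b}"
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# e.g. verdict = judge(openai_report, gemini_report, model="gpt-4o")
```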

Prompt Analysis: Strengths, Weaknesses and Gemini Fails

  • Strengths: Larry’s prompt was a model of clarity and detail. The six-part structure, specific sub-questions, and focus on investor-relevant information were key to eliciting a strong response from OpenAI. The prompt’s length, while substantial, was justified by the complexity of the task.
  • Weaknesses: While the prompt was excellent overall, its breadth might have contributed to OpenAI missing the very recent events that Gemini captured. This isn’t necessarily a flaw in the prompt itself, but rather a demonstration of the inherent trade-offs in designing any research task. Larry’s subsequent experiment (33:10), where he prompted OpenAI specifically about the affected company, showed that the AI did have access to the updated information. This highlights a potential benefit of “chaining” prompts, as Larry suggests, to ensure both breadth and depth.
  • Overall Assessment: Larry’s prompt engineering was exceptional, demonstrating a deep understanding of how to effectively communicate with and guide an LLM.
  • Prompt Modification: Larry himself suggests a valuable technique: “chaining” prompts. This involves breaking down the research into a series of smaller, more focused tasks. For example, the initial prompt could identify key players; then, separate prompts could be used to conduct deep research on each player individually. This could potentially improve both depth and accuracy, especially regarding real-time information (a minimal sketch of this approach appears after this list).
  • Gemini Fails
    • Structure: Gemini did not follow the order of the report sections specified in the prompt.
    • Length/Depth: Gemini’s report was significantly shorter and less detailed (approximately 3,000 words vs. 15,000 words).
    • Missing Public Information: Gemini failed to incorporate publicly available financial data for N-able, a key player in the market.
    • “Filling in the Blanks”: Gemini often stated that information was unavailable, rather than attempting to infer or extrapolate, as OpenAI did.
    • Bland Predictions: Gemini’s predictions were generic and lacked the specific, bold forecasts that OpenAI provided.
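
Returning to the chaining idea from the Prompt Modification bullet above, here is a minimal sketch of what that two-stage workflow might look like in Python. The prompts, helper, and model name are illustrative assumptions, not Larry’s actual code.

```python
# Sketch of "chained" deep research: a broad first prompt enumerates the key
# players, then each player gets its own focused follow-up prompt, so recent
# events (acquisitions, CEO changes) are less likely to be missed.
# Prompts and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "o3"  # placeholder

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

# Step 1 (breadth): enumerate the key players in the niche.
players_text = ask(
    "List the key players in the North American RMM software market for "
    "MSPs. One company name per line, no commentary."
)
players = [line.strip() for line in players_text.splitlines() if line.strip()]

# Step 2 (depth): one focused research prompt per player.
profiles = {
    name: ask(
        f"Compile an investor-focused profile of {name}: financials, SaaS "
        f"metrics, and any notable news from the last 90 days. Cite sources."
    )
    for name in players
}
```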

Key Takeaways

The results of Larry’s experiment, as detailed in the video, paint a clear picture of the current state of AI deep research. While both OpenAI and Gemini 1.5 Pro offer valuable capabilities, the differences in their performance were significant, with implications for SaaS companies and investors.

  • OpenAI’s Dominance: The video demonstrates a clear and decisive victory for OpenAI’s deep research capabilities over Gemini 1.5 Pro’s in this specific scenario. Larry consistently points out OpenAI’s superior depth, accuracy, structure, insightfulness, and willingness to make predictions.

  • Gemini’s Niche Advantage: While largely outmatched, Gemini 1.5 Pro did demonstrate a slight advantage in capturing very recent (January 2025) events – a company acquisition and a CEO resignation – that OpenAI initially missed in the comprehensive report. Larry later clarifies (33:10) that this wasn’t necessarily a knowledge cutoff issue; OpenAI was able to find this information when prompted specifically about the company in question. This suggests the initial miss might have been due to the prompt’s breadth, prioritizing a broad overview over capturing every recent development, or potentially a choice to prioritize speed by using a cached snapshot of information. This highlights a trade-off between comprehensiveness and up-to-the-second accuracy.

  • Practical Implications:

    • SaaS companies and investors can potentially leverage LLMs for rapid, in-depth market research.
    • OpenAI’s current deep research capabilities appear to be significantly more advanced for this type of complex task.
    • Careful prompt engineering is essential.
  • OpenAI’s Analytical Depth: One of the most striking differences was OpenAI’s ability to go beyond readily available data and perform insightful analysis. Larry highlights (23:37) how OpenAI intelligently inferred key SaaS metrics, even for private companies where precise figures like CAC and LTV are typically hidden. For example, by analyzing NinjaOne’s public statements about its marketing strategy (minimal spending, reliance on word-of-mouth), OpenAI deduced that the company likely had a “relatively low customer acquisition cost.” Furthermore (26:03), OpenAI extrapolated from the publicly available data of companies like N-able (a publicly traded competitor) and the formerly public Datto to build a broader picture of SaaS metric trends across the entire RMM market. This “filling in the blanks” capability is crucial for insightful analysis in data-scarce environments; a back-of-the-envelope version of this inference is sketched after this list.
  • Bold Predictions vs. Generalities: The “Strategic Outlook & Predictions” section showcases a stark contrast between the two AIs. Larry emphasizes (27:59) that OpenAI was willing to make bold, specific predictions, offering concrete timelines and scenarios. OpenAI speculated on potential IPOs, such as suggesting CA might go public in “late 2025 or 26,” and even projected specific growth targets, like NinjaOne doubling its customer base within 18-24 months (30:06). Gemini 1.5 Pro’s predictions, by contrast, were, as Larry put it, “super boring” (27:15): generic statements about continued market growth, AI adoption, and cybersecurity focus, observations that provide minimal value to an investor seeking actionable intelligence.
  • Gemini’s Factual Oversights: A significant weakness of Gemini 1.5 Pro, as Larry points out (22:32), was its failure to incorporate readily available financial data for N-able. Despite identifying N-able as a key player, Gemini seemingly overlooked the crucial fact that it’s a publicly traded company (ticker symbol: NABL). While Gemini presented figures for other players in the market, it failed to take advantage of N-able’s available data. This omission highlights a critical difference in research thoroughness compared to OpenAI, which leveraged N-able’s public filings to provide a more in-depth financial analysis.
  • Structure and Adherence to Instructions: The importance of following instructions is clearly demonstrated in Larry’s analysis (14:16). He explicitly notes that his prompt defined a six-part report structure, beginning with “Market Overview.” OpenAI followed this structure meticulously. However, Gemini 1.5 Pro reordered the sections, placing “Competitive Landscape” first. This seemingly minor detail reveals a significant difference in how the two AIs process and respond to complex instructions.
  • Future Outlook: The video implicitly suggests that while OpenAI currently holds a strong lead, the competition in AI research is fierce, and Gemini (and other models) will likely continue to improve. The speed of advancements makes continuous evaluation of these tools critical for staying ahead.
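
The kind of metric inference described under “OpenAI’s Analytical Depth” can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative: every number is an invented placeholder, not a figure from the video or either report; it just shows the arithmetic (LTV as margin-adjusted revenue per account divided by churn) that lets an analyst, or an LLM, reason about hidden unit economics.

```python
# Back-of-the-envelope SaaS unit economics, of the kind an LLM (or analyst)
# can infer from partial disclosures. Every number below is an invented
# placeholder for illustration only.

arpa_annual = 12_000   # average revenue per account, $/year (assumed)
gross_margin = 0.80    # typical SaaS gross margin (assumed)
annual_churn = 0.10    # 10% of accounts lost per year (assumed)
cac = 8_000            # customer acquisition cost, $ (assumed low, e.g. a
                       # word-of-mouth-driven go-to-market)

# Lifetime value: margin-adjusted annual revenue over the expected customer
# lifetime (1 / churn years).
ltv = arpa_annual * gross_margin / annual_churn   # = 96,000

print(f"LTV = ${ltv:,.0f}, LTV/CAC = {ltv / cac:.1f}")  # LTV/CAC = 12.0
# An LTV/CAC well above the conventional 3x threshold suggests efficient,
# low-CAC growth: the sort of conclusion OpenAI drew from NinjaOne's stated
# reliance on word of mouth.
```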


Conclusion

This head-to-head comparison between OpenAI and Gemini 1.5 Pro’s deep research, as presented by Part Time Larry, offers a compelling glimpse into the power and limitations of current LLM technology. The video clearly demonstrates that, with meticulous prompt engineering, LLMs can be invaluable tools for accelerating and enhancing market research, particularly in niche, data-scarce industries. While OpenAI currently holds a significant advantage in this area, the rapid pace of AI development suggests that the landscape will continue to evolve.

The key takeaway, however, extends beyond simply choosing the “winning” AI. Larry’s success hinges on his mastery of prompt engineering. The exceptionally detailed, structured prompt, written in Markdown (a format LLMs tend to handle well), was the crucial ingredient. This underscores a vital point: the quality of the output from these powerful tools is directly proportional to the quality of the input. Vague or poorly structured prompts will yield vague or inaccurate results, while carefully crafted, specific instructions unlock the true potential of AI research.

Furthermore, Larry’s experiment serves as a stark warning to those in the business of selling market reports, particularly on platforms like Substack. As he points out (36:53), the level of analysis provided by OpenAI’s deep research, given a well-engineered prompt, is already incredibly high. To compete with this readily available and increasingly sophisticated technology, purveyors of market intelligence will need to offer truly exceptional insights, going far beyond what an LLM can currently generate. The bar for human expertise has been significantly raised.

We encourage readers to watch the full video and experiment with their own research prompts, remembering that the key to success lies in clear, detailed, and well-structured instructions – a lesson expertly demonstrated by Part Time Larry. What insights can you uncover?

Read a review of Google Deep Research and Perplexity in this blog post on Foodcourtification.com.
