Open Source Research Findings
Part 2 of a 3 Part Series
By Vic Singh and James Barone
Cracking the AI Open-Source Code: Part II — Findings from our Study
tl;dr: Open-source AI may be the next frontier of open innovation, but the landscape remains complex. In Part 1 of this series, we laid out our view of the existing open-source AI market map and tooling stack. In this post, Part 2 of the series, we share findings from our expert interviews and surveys: which models & frameworks lead in popularity, how investors see the potential across the stack (particularly in observability, security, and privacy), and how to assess promising startups. Overall, our research provided key insights into an open-source AI landscape that continues to evolve rapidly.
Navigating the AI Landscape — Insights from Domain Experts and Practitioners
In parallel with developing our market map, we wanted to better understand how domain experts and practitioners in the industry are approaching building, researching, buying and investing in the open-source AI space. We spoke with numerous leaders, in addition to distributing a comprehensive survey, to gauge current opinions on open-source AI. The research asked respondents to share their stance on open vs closed source AI frameworks, which specific open-source models & tools they actively (or will actively) utilize, and their outlook on the trajectory of the market. Our goal was to synthesize perspectives from both startups and incumbents on how they are leveraging open source in their AI strategies.
Discovery
The first questions we asked were about the discovery of open-source projects: how do people first learn about new ones? Typical distribution channels such as Hacker News, blog posts, and X (Twitter) are the easiest ways to find new open-source software, with Hugging Face also growing in popularity.
Usefulness
Next, we wanted to learn how usefulness can be measured. In other words, how can we tell if a project is valuable to its community? Community engagement and GitHub metrics are, unsurprisingly, popular ways to gauge a project’s success; however, respondents rated usage statistics as the best indicator of success, and usage is difficult to infer from forks, commits, stars, and similar counts on GitHub. In our 1:1 conversations with founders and investors, many voiced the importance of incorporating telemetry (the automatic collection, measurement, and transmission of software usage and user data) into their projects.
Open-source purists typically steer away from incorporating telemetry into their products, but it’s clear that measuring usage is the best way to understand users, iterate, and develop better features. The startup Crowd.dev and the open-source project OpenTelemetry have recognized this need and are attempting to fill this gap in the market by making it simple for open-source developers to incorporate telemetry into their projects.
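To make the idea of telemetry concrete, here is a minimal sketch of what opt-in usage tracking could look like inside an open-source Python project. It is illustrative only and not taken from any respondent's codebase: the `MYPROJECT_TELEMETRY` environment variable, the `UsageEvent` fields, and the `TelemetryClient` class are all hypothetical names, and the `send` callback stands in for whatever transport a real project would choose.

```python
import json
import os
import platform
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def telemetry_enabled(env=None):
    """Opt-in check: telemetry stays off unless the user sets the flag.

    MYPROJECT_TELEMETRY is a hypothetical variable name for this sketch.
    """
    env = os.environ if env is None else env
    return env.get("MYPROJECT_TELEMETRY", "").lower() in {"1", "true", "yes"}


@dataclass
class UsageEvent:
    """One anonymous usage record; no user-identifying fields are stored."""
    name: str
    project_version: str
    session_id: str
    python_version: str = field(default_factory=platform.python_version)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class TelemetryClient:
    """Buffers usage events and hands them to a caller-supplied sender."""

    def __init__(self, project_version, env=None, send=None):
        self.project_version = project_version
        self.enabled = telemetry_enabled(env)
        self.session_id = uuid.uuid4().hex  # random per-process identifier
        self.buffer = []
        # Assumption: the real transport (HTTP, file, ...) is injected here.
        self._send = send or (lambda payload: None)

    def record(self, name):
        """Store an event if the user opted in; otherwise do nothing."""
        if not self.enabled:
            return None
        event = UsageEvent(name, self.project_version, self.session_id)
        self.buffer.append(event)
        return event

    def flush(self):
        """Serialize buffered events to JSON, send them, and return the count."""
        if not self.buffer:
            return 0
        payload = json.dumps([asdict(event) for event in self.buffer])
        self._send(payload)
        count = len(self.buffer)
        self.buffer.clear()
        return count
```

A project would call `client.record("cli.start")` at the points it wants to count and `client.flush()` on exit. Making the check opt-in rather than opt-out is one way to respect the concerns of open-source purists while still collecting usage data from users who agree to share it.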
Model Preference
In our research, we also wanted to understand which open-source foundation models and frameworks founders and engineers were using. Here, the clear favorites were Meta’s LLaMA LLM and PyTorch framework.
When asked specifically why LLaMA and PyTorch were used over almost any other option, the answers coalesced around compatibility and interoperability. One survey respondent summarized it perfectly: “PyTorch enables us to finetune almost any OSS model…” Clearly, Meta is building on the premise that vendor lock-in is a negative and should be avoided.
Fundraising
For an open-source AI project, deciding when to raise Series A funding is a nuanced process, and investors are divided on the ideal revenue range. The general takeaway from our research is that quality of revenue, rather than a specific financial range, should be the guiding factor. While some suggest a sweet spot of ARR between $200,000 and $750,000, there is no one-size-fits-all answer.
General takeaways from founders & investors
Our survey also yielded a range of founder & investor perspectives on the role of open-source foundational models in the broader AI landscape and their potential for growth compared to closed-source models.
Founders highlighted several key points, emphasizing the importance of open-source models for experimentation and fine-tuning, as well as their ability to remove limitations found in closed-source alternatives. Open-source models are seen as cost-effective, innovative, and essential for mitigating monopolies held by closed-source alternatives.
While investors generally agree with founders on the opportunities for open-source AI solutions, they are still exploring the space and are spending time all over the stack. When asked where they’re exploring and deploying capital, responses didn’t indicate a consensus.
This divergence in perspectives suggests that the market is still in its nascent stages and undergoing formative development. While some investors see potential in areas like agents, observability tools, middleware, and applications, others express uncertainty or believe the market will organically form around undiscovered niches. The diverse responses underscore the complexity of the open-source AI landscape and the ongoing evolution of this field, where the path to successful investments may vary significantly, reflecting the dynamic nature of the industry.
Although answers varied widely, almost all investors are focused on one area: observability, security, and privacy. This collective focus can be attributed to the widely held belief that the ultimate viability and success of open-source AI companies hinge on their ability to gain commercial traction. Observability tools give organizations insight into the performance and behavior of their AI systems, ensuring transparency and accountability, which is crucial for enterprises implementing AI solutions. Security and privacy are likewise paramount concerns in a data-driven world, as safeguarding sensitive information and maintaining robust cybersecurity measures are essential for both compliance and consumer trust.
The findings from our 1:1 conversations with experts, in addition to the survey sent to a much broader audience, helped inform how we at Eniac approach investing in this space.
Conclusion
In closing, our research reveals an open-source AI landscape brimming with potential, yet still taking shape. While developing effective observability tools to monitor model performance appears pivotal for enterprises to adopt open-source AI solutions, expert opinions vary on where the greatest opportunities lie across the stack. Overall, our findings provide key guideposts for navigating this complex, dynamic ecosystem. At Eniac, we eagerly anticipate partnering with the bold founders pushing open-source AI to realize its immense possibilities in building a brighter future for all. If you’re one of those founders, feel free to reach out to us at [email protected] or @vicsingh on Twitter. Regardless, stay tuned for our last post, Part 3 of 3, which reveals how we at Eniac approach investing in this space!
Special thank you to all survey respondents, as well as these founders, investors, and operators who were generous enough with their time to provide first-party insights & feedback in 1:1 interviews: Amanda “Robby” Robson, Tim Chen, Gaurav Gupta, James Alcorn, Zander Matheson, Alan Zabihi, Ismail Pelaseyed, Juliet Bailin, Raj Singh, Kyle Corbitt, and Andrew Carr.