Haystack US Workshop and Training Day

A full day of workshops and training led by the OpenSource Connections team; OpenSearch; DataStax; Aryn.ai; John Berryman, author of Prompt Engineering for LLMs; and Trey Grainger, author of AI-Powered Search.

Morning Sessions 9:00-12:30
- ~~RAG for Beginners~~
- Learning to Hybrid Search
- Hands-on Generative AI with Langflow
- Mastering LLMs: From Prompt Engineering to Agentic AI

Afternoon Sessions 1:30-5:00
- ~~Evaluating Complex Systems with AI technologies~~
- Build next-gen search with Amazon OpenSearch Service
- Build an AI-powered financial analysis application using Aryn and OpenSearch
- AI Powered Search: Interpreting Query Intent

Get the practical tools and skills you need to succeed in leading your next AI-based Search project.

Prerequisites

What you need: A curious mind, a basic understanding of the experimental process, and a desire to understand how to pick and choose among the enabling technologies available for search and search-adjacent activities.
Who should attend: Developers, product managers, relevance engineers, and everyone else who wonders about how to tell if the magic box is actually magic.

Workshop Descriptions

Title	Presenter	Description
~~RAG for Beginners~~	~~Scott Stults~~	Retrieval Augmented Generation (RAG) promises to revolutionize how we search. This workshop is a hands-on journey to construct and tune a RAG application. We will build the search index for the data we will use to answer our questions. We will construct, and then refine, the prompt we use to obtain our answers from a Large Language Model (LLM). We will explore the space of limitations and gotchas associated with using Generative AI technologies.
Learning to Hybrid Search	Daniel Wrigley	Are you looking to enhance your search system by combining lexical and vector search? Struggling to find the right balance between the two? This hands-on workshop will equip you with practical techniques to systematically optimize hybrid search. What You’ll Learn: Systematic Hybrid Search Tuning: How to experiment offline to determine the best hybrid search configuration for any system. Machine Learning for Search Optimization: How to transition from static settings to a dynamic approach that predicts and fine-tunes search parameters in real time. Measuring and Iterating for Quality: How to evaluate search performance using search result quality metrics to quantify improvements. Practical Implementation: How to apply these techniques in your own search infrastructure. We’ll use OpenSearch and Jupyter notebooks for demonstrations, but the principles apply to any search engine that supports hybrid search. Who Should Attend? Search engineers & practitioners looking to fine-tune hybrid search. Data scientists interested in applying machine learning to search optimization. Prerequisites: Familiarity with search engines (lexical & vector search basics). A basic understanding of machine learning will be helpful but not required. The workshop is based on Jupyter notebooks—no coding required, but Python knowledge will enhance your experience. By the end of this session, you’ll have the knowledge and tools to build adaptive, machine learning-driven hybrid search systems that dynamically respond to user queries, improving relevance and user satisfaction.
Hands-on Generative AI with Langflow	Matt Overstreet	Learn how to use Langflow to quickly prototype Generative AI applications. This session will take engineers, and other interested individuals, from setup to building their own multi-agent LLM-based RAG instance.
Mastering LLMs: From Prompt Engineering to Agentic AI	John Berryman	The rapid evolution of AI and Large Language Models (LLMs) has opened new possibilities for automation, content generation, and interactive agents. This hands-on workshop is designed for developers, researchers, and AI enthusiasts who want to deepen their understanding of LLMs and learn how to harness their full potential. Topics covered include: – How LLMs work and the role of reinforcement learning in training – The art and science of prompt engineering, including zero-shot and few-shot techniques – Retrieval-Augmented Generation (RAG) for integrating external knowledge – Agentic AI: Designing chatbots and workflow agents – Fine-tuning models using LoRA for custom behaviors – Evaluation methods for improving AI performance – Future trends, including multimodal models and new interaction paradigms Attendees will leave with practical skills, implementation strategies, and insights into the future of AI-powered applications.
~~Evaluating Complex Systems with AI technologies~~	~~David Fisher~~	AI promises to revolutionize how we access information – but how can we measure the benefits? In this workshop we will explore the space of evaluation strategies for complex AI-based systems. We will explore the use of AI techniques, including Large Language Models (LLMs), to generate judgments for us automatically. We will also explore applying what we know about evaluating search systems to evaluating AI-assisted search and other AI-enabled activities. Get the practical tools and skills you need to succeed in leading your next AI-based Search project.
Build next-gen search with Amazon OpenSearch Service	Jon Handler	This workshop demonstrates how to build next-generation e-commerce search experiences by combining Amazon OpenSearch Service with advanced ML techniques. Participants will implement various ML search methodologies, from vector search and multimodal search to hybrid approaches. By integrating these search types with LLM agents, attendees will build an intelligent shopping assistant that transforms traditional product discovery into an interactive, intent-aware shopping experience.
Build an AI-powered financial analysis application using Aryn and OpenSearch	Jonathan Fritz and Henry Lindeman	In the age of GenAI, grokking complex documents has become increasingly important because without good clean data, you cannot get quality results. In this hands-on workshop, we will use Aryn’s DocParse service and Aryn’s Sycamore ETL Engine to build an application that performs financial due diligence over thousands of reports. We will first walk through how you can use our SOTA visual AI system to accurately parse and extract data from unstructured financial documents. We’ll then further process that data using Sycamore’s LLM-powered transforms to extract semantically meaningful information. Next, we’ll use Sycamore’s in-built functionalities to chunk, embed and load that data into OpenSearch. Finally, we’ll run complex analyses on the ingested data to uncover insights and extract meaningful financial patterns. By the end of the workshop, attendees will gain experience in building a scalable, AI-powered application that automates data extraction, transformation, and analyses over a large collection of unstructured documents.
AI Powered Search: Interpreting Query Intent	Trey Grainger	Whether implementing user-facing search applications or using RAG to find context for an AI model, properly interpreting query intent is one of the most important requirements for building relevant search. In this training, AI-Powered Search author Trey Grainger will walk you through key concepts and code examples for implementing a next-generation query intent engine. We’ll create building blocks – leveraging context from documents and embeddings, user preferences (from query logs and click streams), and domain-specific knowledge graphs – and integrate them together into an end-to-end query interpretation and rewriting framework. Topics covered will include: – Mental models for understanding the nuances of language interpretation, user-context, and domain-specific intent – Various approaches to semantic search (dense embeddings, sparse vectors, etc.) – Query intent approaches for user-facing search vs. RAG – Leveraging tools like Semantic Knowledge Graphs, HyDE, and SPLADE for query interpretation and expansion – Learning domain-specific terminology (synonyms, misspellings, related terms) from both documents and user behavioral signals – Boosting popular and personalized results by learning both aggregate and user-specific query intent – Semantic query parsing, query-sense disambiguation, semantic functions, and query rewriting – LLM based query interpretation and benchmarking As an attendee, you’ll walk away with the skills needed to implement better, contextualized query interpretation and rewriting and with code examples for applying these AI-powered search approaches to your own applications.

Your trainers

	Trey Grainger is lead author of the AI-Powered Search book (Manning 2024) and the Founder of Searchkernel, a software consultancy building the next generation of AI-powered search. He previously served as CTO of Presearch, a decentralized web search engine, and as Chief Algorithms Officer and SVP of Engineering at Lucidworks, an AI-powered search company whose search technology powers hundreds of the world’s leading organizations. He is also co-author of Solr in Action. Trey has 17 years of experience in search and data science, including significant work developing semantic search, personalization and recommendation systems, and building self-learning search platforms leveraging content and behavior-based reflected intelligence. This work resulted in the publication of dozens of research papers, journal articles, conference presentations, and books focused on intelligent search systems.
	David Fisher has over 30 years of experience designing, building and implementing information extraction and information retrieval systems. As a principal software engineer on the open source Lemur Project, while working in the Center for Intelligent Information Retrieval, he has developed custom applications and components for numerous search engines, including Indri, Galago, and Tsidy. His primary focus has been on efficient indexing structures and complex retrieval model implementations.
	Daniel Wrigley has worked in search since graduating in computational linguistics from Ludwig-Maximilians-University Munich in 2012, where he developed his weakness for search and natural language processing. His experience as a search consultant paved the way for him to become an O’Reilly author, co-authoring the first German book on Apache Solr.
	Scott Stults has led implementation teams on a variety of Department of Defense, UK Ministry of Defence, and commercial projects where implementing search has been constrained by sophisticated security requirements. He has implemented Multi-Level Security (MLS) in a Lucene-based environment and is working with the Solr community to develop best practices for securing Solr. As a co-founder of OpenSource Connections, which has been working with Solr since 2007, Mr. Stults leads the Solr Security Practice for the company.
	John Berryman is the founder and principal consultant of Arcturus Labs, where he specializes in AI application development (Agency and RAG). As an early engineer on GitHub Copilot, John contributed to the development of its completions and chat functionalities, working at the forefront of AI-assisted coding tools. John is coauthor of “Prompt Engineering for LLMs”. Before his work on Copilot, John’s focus was search technology. His diverse experience includes helping to develop a next-generation search system for the US Patent Office, building search and recommendations for Eventbrite, and contributing to GitHub’s code search infrastructure. John is also coauthor of “Relevant Search”, a book that distills his expertise in the field. John feels fortunate to have worked at the intersection of cutting-edge AI applications and foundational search technologies, giving him the opportunity to contribute to innovation in both LLM applications and information retrieval.
	Jon Handler is Director of Solutions Architecture for Search Services at Amazon Web Services, based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have generative AI, search, and log analytics workloads for OpenSearch. Prior to joining AWS, Jon’s career as a software developer included four years of coding a large-scale eCommerce search engine. Jon holds a Bachelor of Arts from the University of Pennsylvania, and a Master of Science and a Ph.D. in Computer Science and Artificial Intelligence from Northwestern University.
	Matt Overstreet is an AI and data specialist with a proven track record in helping businesses harness the power of cutting-edge search and retrieval technologies. Formerly part of the OpenSource Connections team, he now works at DataStax, the company behind Apache Cassandra, Langflow, and other innovative data products. He has collaborated with numerous Fortune 100 companies to develop robust data retrieval solutions at scale. In this class, Matt combines deep technical expertise with hands-on instruction to guide students in building effective AI agents using Langflow.
	Jon Fritz is the founding Chief Product Officer at Aryn. Prior to that, he was the SVP of Product Management at Dremio, a data lakehouse company. Earlier, Jon was a Director at AWS, where he led product management for in-memory database services (ElastiCache and MemoryDB for Redis) and Amazon EMR (Apache Spark and Hadoop), and founded and built AWS’ blockchain division. Jon has an MBA from Stanford Graduate School of Business and a B.A. in Chemistry from Washington University in St. Louis.
	Henry Lindeman is a software engineer at Aryn. At Aryn, he’s contributed to OpenSearch and built a new SOTA table recognition model, among other projects. Before that, he was an undergraduate in math and computer science at Reed College.
