.. KitchenSink Audio documentation master file.

###################################
KitchenSink Audio Documentation
###################################

Welcome to the official documentation for KitchenSink Audio. This library
provides a simple, modular framework for building audio processing pipelines.

Core Concepts
-------------

The library is built around two fundamental concepts: **Sources** and
**Sinks**.

* A **Source** is where audio comes from. This could be a local microphone
  (``LineInAudioSource``), a network connection (like
  ``TCPServerAudioSource``), or any other origin of audio data.
* A **Sink** is where audio goes. This could be your speakers
  (``AudioPlayerSink``), a network connection (like ``TCPClientAudioSink``),
  or a file on disk.

You connect them by passing a sink's ``push_chunk`` method as the ``sink``
argument when creating a source. The source then calls this method to send
its audio data to the sink, creating a pipeline (the first sketch in the
Examples section at the end of this page shows this wiring).

Middleware and Pipelines
------------------------

To process or analyze audio between a source and a final sink, you can create
"middleware" components. A middleware component is simply a class that acts
as both a sink and a source:

1. It accepts a ``sink`` in its constructor, just like a real source.
2. It has a ``push_chunk`` method, just like a real sink.

When its ``push_chunk`` method is called, it can perform an action on the
audio data (e.g., measure volume, apply an effect, log data) and then pass
the chunk along to the *next* sink in the chain. This allows you to build
complex pipelines (a minimal middleware sketch appears in the Examples
section below):

.. code-block:: text

   [Mic Source] -> [Volume Monitor Middleware] -> [Network Sink]

Consuming Chunks
----------------

The callable you provide as a ``sink`` does not have to be an actual
``BaseAudioSink`` object. It can be any function or method that can process a
chunk of audio data. This is useful when the audio stream is not meant for
another destination but is instead consumed for analysis.

For example, you could have a WebSocket source feed audio chunks directly to
a speech-to-text engine:

.. code-block:: python

   def speech_to_text_engine(audio_chunk):
       # Process the audio, get transcription...
       transcription = my_stt_library.process(audio_chunk)
       # ...then do something with the text.
       if transcription:
           print(f"Heard: {transcription}")

   # The STT function is the "sink" for the audio source.
   ws_source = TypedWebSocketServerAudioSource(sink=speech_to_text_engine)

This pattern allows you to use the sources in this library as a generic way
to receive audio for any purpose.

Blocksize and Latency
---------------------

The ``blocksize`` parameter, available in most sources and sinks, defines the
number of audio frames per chunk. It is the key parameter for trading off
latency against performance.

* **In Sources**: It determines how frequently the source generates and
  pushes audio chunks to the sink.
* **In Sinks**: It serves as a hint to the source about the preferred chunk
  size for optimal processing (e.g., matching the buffer size of the audio
  output device).

A smaller ``blocksize`` reduces latency but increases the per-chunk overhead
of function calls and network packets. A larger ``blocksize`` is more
efficient but introduces more delay. The ideal value depends on the
application's requirements; the last sketch in the Examples section below
works through the arithmetic.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   api_reference

.. mdinclude:: ../../README.md

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
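Examples
--------

The sketches in this section are illustrative rather than canonical: the
module name ``kitchensink_audio`` is an assumption, as is any constructor
parameter other than ``sink`` and ``blocksize``, so adjust the imports and
arguments to match the actual package. The simplest pipeline hands a sink's
``push_chunk`` method to a source:

.. code-block:: python

   # Assumed import path; adjust to the real package layout.
   from kitchensink_audio import AudioPlayerSink, LineInAudioSource

   # Create the sink first so its push_chunk method is available.
   player = AudioPlayerSink()

   # Every chunk the microphone captures is handed to player.push_chunk,
   # which sends it to the speakers.
   mic = LineInAudioSource(sink=player.push_chunk)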
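A middleware component needs only the two traits listed under *Middleware and
Pipelines*: it accepts a ``sink`` callable in its constructor and exposes a
``push_chunk`` method. The volume monitor below is a minimal sketch of that
pattern; it additionally assumes that chunks arrive as NumPy arrays of
samples, which may not match the library's actual chunk type.

.. code-block:: python

   import numpy as np

   from kitchensink_audio import LineInAudioSource, TCPClientAudioSink


   class VolumeMonitor:
       """Middleware: measure loudness, then forward the chunk unchanged."""

       def __init__(self, sink):
           # Like a real source, remember where chunks should go next.
           self._sink = sink

       def push_chunk(self, chunk):
           # Like a real sink, accept a chunk... here, report its RMS level.
           samples = np.asarray(chunk, dtype=np.float64)
           rms = np.sqrt(np.mean(np.square(samples)))
           print(f"RMS level: {rms:.4f}")
           # ...then pass it along to the next sink in the chain.
           self._sink(chunk)


   # [Mic Source] -> [Volume Monitor Middleware] -> [Network Sink]
   # The host and port keyword arguments are assumptions for illustration.
   network_sink = TCPClientAudioSink(host="127.0.0.1", port=5000)
   monitor = VolumeMonitor(sink=network_sink.push_chunk)
   mic = LineInAudioSource(sink=monitor.push_chunk)

Because the monitor stores a plain callable rather than a sink object,
anything with a compatible signature, including another middleware's
``push_chunk``, can sit downstream of it; that is what allows chains of
arbitrary length.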
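The latency contribution of ``blocksize`` is straightforward to estimate: one
chunk spans ``blocksize / samplerate`` seconds. At 44,100 Hz, 256 frames take
roughly 5.8 ms, while 4096 frames take about 93 ms. The sketch below assumes
the constructors accept ``blocksize`` as a keyword, as described above.

.. code-block:: python

   from kitchensink_audio import AudioPlayerSink, LineInAudioSource

   SAMPLERATE = 44_100   # frames per second (assumed device rate)
   BLOCKSIZE = 256       # frames per chunk

   # Buffering one block delays the audio by one block's duration:
   # 256 / 44_100 ≈ 0.0058 s, i.e. roughly 5.8 ms per chunk.
   chunk_latency = BLOCKSIZE / SAMPLERATE

   # Using the same blocksize on both ends lets the source's chunks line
   # up with the output device's buffer, avoiding re-buffering in between.
   player = AudioPlayerSink(blocksize=BLOCKSIZE)
   mic = LineInAudioSource(sink=player.push_chunk, blocksize=BLOCKSIZE)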