AVCHD Decoder DirectShow Filter SDK: Features, APIs, and Sample Code

AVCHD (Advanced Video Codec High Definition) remains a common container/format for HD camcorders and consumer video workflows. A DirectShow filter SDK that implements an AVCHD decoder helps application developers integrate playback, frame extraction, and transcoding features directly into Windows multimedia pipelines. This article covers the main features such an SDK should provide, the typical APIs and programming model, performance and integration considerations, and practical sample code to get you started building a DirectShow filter that decodes AVCHD streams.
What is AVCHD and why DirectShow?
AVCHD is a format jointly developed by Sony and Panasonic that wraps H.264/AVC video and Dolby AC‑3 or LPCM audio within a filesystem and container structure often stored on optical discs, memory cards, or file-based media. The video elementary streams use H.264 profiles commonly targeted at consumer cameras (High/Main profiles, variable GOP structures, interlaced or progressive formats).
DirectShow is Microsoft’s media-streaming architecture on Windows that lets applications compose modular filters (source, transform/codec, renderer) into filter graphs to read, process, and render multimedia. A DirectShow AVCHD decoder filter acts as a transform filter: it accepts compressed AVCHD video (often after a source or demux filter) and outputs raw uncompressed frames (e.g., YUV420P) to renderers or downstream processing filters.
Key features of a production-grade AVCHD Decoder DirectShow Filter SDK
- Full H.264/AVC support: Baseline/Main/High profile decoding including CABAC/CAVLC, SEI, SPS/PPS parsing, and slice parsing for common camera profiles.
- Interlaced and progressive frame handling: Correctly handle Top/Bottom Field Order and common AVCHD interlaced patterns; implement field combining, deinterlacing hooks, or field-aware frame output.
- Container/transport compatibility: Work with common AVCHD demuxers/sources (M2TS/TS, BDAV format, MPEG-TS, file systems of camcorders) and parse timestamps to preserve A/V sync.
- Low-latency operation: Minimal decoding latency for preview and editing workflows (fast seek, non-blocking frame delivery).
- Hardware acceleration: Optional integration with DXVA2/D3D11/Media Foundation HW-accelerated decoders when available, with software fallback.
- Adaptive bitrate/error resilience: Handle partial/corrupt frames, missing NAL units, and variable GOP structures gracefully.
- Color space and pixel format support: Output common formats (YUV420P, NV12, YUY2, RGB32) and correctly map color matrices and transfer characteristics.
- Accurate timestamps and seeking: Preserve PTS/DTS relationships, provide frame-accurate seeking and support range seeking for editing workflows.
- Thread-safe and efficient memory management: Use allocator negotiation, sample pooling, and avoid unnecessary copies.
- Diagnostics and logging: Verbose logging levels, metrics for decode time, dropped frames, and error conditions to aid debugging.
- Sample code and integration examples: Example sink applications, filter registration scripts, and guidance for integrating in common hosts (GraphEdit, custom playback apps, DirectShow.NET, C++/C# examples).
Typical SDK architecture and components
A complete SDK usually includes:
- Decoder filter binary (x86/x64) and installers/registration (regsvr32 or MSI).
- Header files and import libs for filter interfaces and helper utilities.
- C/C++ sample projects: simple player, frame grabber, demuxer + decoder chain, DXVA integration example.
- Documentation: API reference, filter pin/pin-media type specs, threading model, allocator behavior, and troubleshooting.
- Build scripts: Visual Studio solutions and instructions to build against Windows SDK.
- Tests: unit tests or sample AVCHD clips covering progressive/interlaced, variable GOP, and different audio/video combinations.
DirectShow programming model — where the decoder fits
- Source filters (file reader, camera source) feed compressed packets (usually in PES/MPEG-TS/M2TS packets) into a demux filter.
- Demux filter separates video and audio elementary streams and exposes output pins with media types like MPEG2_VIDEO and H264.
- The AVCHD decoder filter registers as capable of accepting H.264/AVC compressed media types and exposes an output pin that negotiates an uncompressed media type (e.g., MEDIASUBTYPE_NV12, MEDIASUBTYPE_YUY2).
- Filters negotiate allocators for sample buffers. The decoder must honor buffer alignment, preferred size, and count to balance memory usage and performance.
- The graph manager controls state transitions (Stopped, Paused, Running). The decoder must handle these cleanly: flushing on Stop/Seek, preroll behavior on Pause.
Core APIs and interfaces
A DirectShow filter SDK centers around COM interfaces and a handful of DirectShow contracts:
- IBaseFilter: The main COM object representing the filter; derives from IMediaFilter and implements the lifecycle (Run/Pause/Stop) and graph-membership contract.
- IPin / IEnumPins: Expose input and output pins; media type negotiation occurs through IPin::Connect and related methods.
- IMemInputPin: For the push model; serves input samples via Receive(IMediaSample*).
- IMemAllocator: For buffer allocation; allocator selection is negotiated via IMemInputPin::GetAllocator and NotifyAllocator.
- IAMStreamConfig / IAMStreamControl (optional): For querying or configuring stream capabilities.
- IAMVideoCompression and ISpecifyPropertyPages: If exposing codec settings in property pages.
- IAMFilterMiscFlags: For filter characteristics (e.g., whether the filter is a renderer).
- IReferenceClock and synchronization: Decoder should honor timestamps and offer sample timestamps via IMediaSample::SetTime and SetSyncPoint semantics.
- IAMVideoProcAmp / IAMCameraControl (only relevant for capture filters).
- IBaseFilter::QueryFilterInfo -> FILTER_INFO including friendly name and graph.
The SDK will often provide helper classes for common tasks:
- CRTP base classes for COM reference counting and registration.
- Base classes for output pin negotiation and allocator handling.
- H.264 NAL reader and SPS/PPS parser utilities.
- DXVA and D3D interop helpers (IDirect3DDeviceManager9, DXVA2ConfigPictureDecode).
- Threaded decode worker and frame reordering buffer (for B/P frame reordering).
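To make the NAL-reader utilities concrete, here is a minimal, portable sketch, with struct and function names that are illustrative rather than part of any particular SDK, of parsing the one-byte NAL header and stripping emulation-prevention bytes before SPS/PPS parsing:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct NalHeader {
    int forbidden_zero_bit; // must be 0 in a valid stream
    int nal_ref_idc;        // 0 = disposable, >0 = used as a reference
    int nal_unit_type;      // 7 = SPS, 8 = PPS, 5 = IDR slice, ...
};

NalHeader ParseNalHeader(uint8_t firstByte) {
    return { (firstByte >> 7) & 0x1, (firstByte >> 5) & 0x3, firstByte & 0x1F };
}

// Convert EBSP to RBSP by removing emulation-prevention bytes
// (a 0x03 following two zero bytes), per ITU-T H.264 clause 7.4.1.
std::vector<uint8_t> EbspToRbsp(const uint8_t* ebsp, size_t size) {
    std::vector<uint8_t> rbsp;
    rbsp.reserve(size);
    int zeros = 0;
    for (size_t i = 0; i < size; ++i) {
        if (zeros >= 2 && ebsp[i] == 0x03) {
            zeros = 0;          // skip the emulation-prevention byte
            continue;
        }
        zeros = (ebsp[i] == 0) ? zeros + 1 : 0;
        rbsp.push_back(ebsp[i]);
    }
    return rbsp;
}
```

For example, the common SPS first byte 0x67 parses as nal_ref_idc 3, nal_unit_type 7.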
Media types and format negotiation
When implementing a transform filter, ensure you handle MediaType GUIDs and subtype layouts properly.
Input media types commonly include:
- MEDIATYPE_Video with subtype MEDIASUBTYPE_H264 (or FOURCCs like ‘H264’, ‘avc1’ in some demuxers).
- Presentation-specific subtypes with config data (e.g., AVCDecoderConfigurationRecord in extradata).
Output media types to support:
- MEDIASUBTYPE_NV12 (preferred for DXVA and hardware pipelines on Windows).
- MEDIASUBTYPE_YV12 / MEDIASUBTYPE_YUY2 for software pipelines.
- MEDIASUBTYPE_RGB32 when applications expect RGB.
Populate a correct AM_MEDIA_TYPE with a VIDEOINFOHEADER2 format block: bmiHeader.biWidth/biHeight, AvgTimePerFrame, rcSource/rcTarget rectangles, interlace flags, and palette info for RGB outputs.
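When filling out the format block and negotiating allocator buffer sizes, the decoder needs the plane layout of the output format. Here is a small, portable sketch for NV12; the 16-byte row alignment is an illustrative assumption, and real pipelines should use whatever stride the renderer or GPU reports:

```cpp
#include <cstddef>
#include <cstdint>

// NV12: a full-resolution Y plane followed by an interleaved,
// half-resolution UV plane; 12 bits per pixel overall.
struct Nv12Layout {
    size_t stride;     // bytes per row, padded up to `align`
    size_t ySize;      // Y plane bytes
    size_t uvSize;     // interleaved UV plane bytes
    size_t totalSize;  // full frame buffer size
};

Nv12Layout ComputeNv12Layout(uint32_t width, uint32_t height, size_t align = 16) {
    Nv12Layout l{};
    l.stride    = ((width + align - 1) / align) * align;
    l.ySize     = l.stride * height;
    l.uvSize    = l.stride * (height / 2);  // UV plane: half height, same stride
    l.totalSize = l.ySize + l.uvSize;
    return l;
}
```

For 1920x1080 this yields a 1920-byte stride and a 3,110,400-byte frame (width * height * 3/2); for odd widths the stride padding matters.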
Handling H.264 specifics
- SPS/PPS parsing: Parse Sequence Parameter Set (SPS) to extract frame size, SAR/aspect ratio, color space info, and frame/field ordering flags.
- NAL unit framing: Implement robust splitting of Annex B (0x000001) and length-prefixed formats (common in MP4/mov) into NAL units.
- Slice and reorder buffers: Implement DPB (decoded picture buffer) logic to reorder B-frames and output display order frames with correct timestamps.
- SEI handling: Respect SEI messages like picture timing and HDR metadata (if present), passing relevant metadata downstream.
- Error concealment: On missing slices or corrupt NALs, attempt to conceal using reference frame copies or generate a plausible frame to avoid crashes.
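The SPS/PPS fields referenced above are coded as Exp-Golomb values. A minimal MSB-first bit reader, a sketch with names of my own rather than any SDK's API, that an SPS parser could build on:

```cpp
#include <cstddef>
#include <cstdint>

// Minimal MSB-first bit reader over an RBSP buffer (no emulation bytes).
class BitReader {
public:
    BitReader(const uint8_t* data, size_t size) : data_(data), size_(size) {}
    uint32_t ReadBit() {
        uint32_t bit = (data_[pos_ >> 3] >> (7 - (pos_ & 7))) & 1;
        ++pos_;
        return bit;
    }
    uint32_t ReadBits(int n) {
        uint32_t v = 0;
        while (n--) v = (v << 1) | ReadBit();
        return v;
    }
    // ue(v): count leading zero bits, read that many suffix bits;
    // value = 2^k - 1 + suffix.
    uint32_t ReadUE() {
        int zeros = 0;
        while (ReadBit() == 0) ++zeros;
        return (1u << zeros) - 1 + ReadBits(zeros);
    }
private:
    const uint8_t* data_;
    size_t size_;
    size_t pos_ = 0;
};
```

For instance, the bit string "1" decodes to 0, "010" to 1, and "00111" to 6; fields such as pic_width_in_mbs_minus1 are read this way.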
Hardware acceleration (DXVA / D3D11VA / Media Foundation)
- Detect platform capability and provide DXVA2 (or D3D11 video) acceleration paths.
- Implement copy and synchronization to share decoded frames as textures (DXGI/NV12) to renderers.
- Provide a fallback software path (e.g., libavcodec or openh264 based) for systems without hardware support or for unsupported profiles.
- Manage device loss/reset: handle device reset events and reinitialize decoder contexts and resource pools.
Performance and threading model
- Use a dedicated decode thread or thread pool to avoid blocking the graph’s main thread.
- Preallocate a pool of IMediaSample buffers negotiated via allocator to reduce runtime allocations.
- Use zero-copy when possible: supply surfaces or shared textures to downstream filters to avoid expensive CPU copies.
- Expose options to control number of frames in-flight, decode worker count, and priority to tune for low-latency preview vs. high-throughput batch transcoding.
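One way to realize the frames-in-flight control above is a bounded queue between the input pin and the decode worker: a fixed capacity caps both memory use and added latency. A portable sketch using standard C++, with illustrative class and member names:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

// Bounded FIFO between the input pin and the decode worker.
// Push blocks when the queue is full, throttling the upstream thread;
// Pop blocks when it is empty, parking the worker.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(size_t capacity) : capacity_(capacity) {}
    void Push(T item) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(std::move(item));
        notEmpty_.notify_one();
    }
    T Pop() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return item;
    }
private:
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::queue<T> q_;
    size_t capacity_;
};
```

A real filter would add a shutdown/flush path so Pop can be interrupted on Stop or Seek.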
Example: minimal DirectShow decoder flow (high-level)
- Receive(IMediaSample* sample) called on input pin with compressed data.
- Extract complete NAL units (Annex B or length-prefixed).
- Parse SPS/PPS when encountered to update output format if resolution/format changes.
- Feed NAL units to decoder core (HW or SW).
- Obtain decoded picture(s), map to output pixel format (NV12/YV12/RGB32).
- Acquire IMediaSample from allocator, fill with frame data, set time stamps and flags, and deliver via m_pOutputPin->Deliver(sample).
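The timestamping step above usually involves converting demuxer PTS values, carried on the 90 kHz MPEG-TS clock, into DirectShow REFERENCE_TIME (100 ns units). A sketch of the arithmetic; the typedef mirrors the Windows one so the example stays self-contained:

```cpp
#include <cstdint>

// DirectShow timestamps are REFERENCE_TIME values in 100 ns units;
// MPEG-TS/M2TS carries PTS/DTS on a 90 kHz clock. One second is
// 10,000,000 hundred-nanosecond units and 90,000 PTS ticks, so the
// conversion ratio is 10,000,000 / 90,000 = 1000 / 9.
typedef int64_t REFERENCE_TIME; // matches the Windows typedef

REFERENCE_TIME PtsToReferenceTime(int64_t pts90kHz) {
    return pts90kHz * 1000 / 9;
}
```

Multiplying before dividing preserves precision; production code would also handle the 33-bit PTS wraparound the conversion itself does not address.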
Sample C++ code snippets
Note: These are concise excerpts illustrating core ideas; a full SDK will include detailed COM boilerplate, HRESULT error handling, and registration code.
NAL extraction (Annex B to NAL array):
// Find Annex B start codes and extract NAL units.
std::vector<std::pair<const uint8_t*, size_t>> ExtractNALs(const uint8_t* buf, size_t size) {
    std::vector<std::pair<const uint8_t*, size_t>> nals;
    size_t i = 0;
    while (i + 3 < size) {
        // find 0x000001 or 0x00000001
        if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 1) {
            size_t start = i + 3;
            i = start;
            // scan forward to the next start code (3- or 4-byte form)
            while (i + 3 < size &&
                   !(buf[i] == 0 && buf[i+1] == 0 &&
                     (buf[i+2] == 1 || (buf[i+2] == 0 && buf[i+3] == 1)))) {
                ++i;
            }
            size_t len = (i + 3 < size) ? (i - start) : (size - start);
            nals.emplace_back(buf + start, len);
        } else {
            ++i;
        }
    }
    return nals;
}
Simple decoder worker skeleton integrating software decoder:
// Pseudocode showing the decode work loop
while (running) {
    ComPtr<IMediaSample> inSample = inputQueue.pop();
    BYTE* data = nullptr;                 // compressed buffer pointer
    inSample->GetPointer(&data);
    long size = inSample->GetActualDataLength();
    auto nals = ExtractNALs(data, (size_t)size);
    for (auto& nal : nals) {
        decoder->SendNAL(nal.first, nal.second);
        while (decoder->HasDecodedFrame()) {
            DecodedFrame frame = decoder->GetDecodedFrame();
            ComPtr<IMediaSample> outSample = allocator->Get(); // custom wrapper for allocation
            FillMediaSampleWithFrame(outSample.Get(), frame);
            outSample->SetTime(...);      // set PTS-based start/stop times
            outputPin->Deliver(outSample.Get());
        }
    }
    // inSample is released automatically when the ComPtr goes out of scope.
}
Format change handling (update output media type after SPS):
// Update the output media type after parsing a new SPS.
void UpdateOutputFormatFromSPS(const SPSInfo& sps) {
    VIDEOINFOHEADER2 vih = {};
    vih.bmiHeader.biWidth  = sps.width;
    vih.bmiHeader.biHeight = sps.height;
    vih.AvgTimePerFrame    = sps.fps ? (UNITS / sps.fps) : 0;
    AM_MEDIA_TYPE* pmt = CreateMediaTypeFromVIH(vih, MEDIASUBTYPE_NV12);
    pOutputPin->DeliverNewSegment(...);
    pOutputPin->NotifyFormatChange(pmt);
}
Sample application: simple playback with GraphEdit or custom player
- Register filter DLL (regsvr32).
- Open GraphEdit (or GraphStudioNext), insert source filter (File Source (Async.)), add M2TS demux (or use Source Filter that outputs H264), connect to AVCHD Decoder filter, and connect decoder output to Enhanced Video Renderer (EVR).
- For programmatic playback using IGraphBuilder:
- CreateFilterByCLSID for source, demux, decoder, and renderer.
- AddFilter/ConnectPins as needed.
- Run the graph and watch the logs for format-negotiation changes.
Troubleshooting common issues
- No video / black frames: Verify media type negotiation; ensure output pixel format supported by renderer (prefer NV12 for EVR). Check SPS parsing and that decoder outputs frames.
- A/V sync drift: Ensure PTS/DTS preservation and SetTime on output samples. Use monotonic timestamps from the demuxer or source.
- Crashes on device loss: Implement device lost handling and reinitialize DXVA/D3D contexts.
- Slow seek: Implement efficient keyframe indexing in demuxer and support frame-accurate seeking by decoding from nearest IDR and discarding until target PTS.
- Color mismatch: Verify color range (limited/full), color matrix (BT.709 vs BT.601), and convert appropriately when outputting RGB.
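To illustrate the color-mismatch point, here is a single-pixel, limited-range YUV-to-RGB conversion sketch contrasting BT.601 and BT.709 coefficients; a production path would use a lookup table, SIMD, or GPU conversion rather than per-pixel floating point:

```cpp
#include <algorithm>
#include <cstdint>

// Limited-range (Y in 16..235) YUV -> RGB for a single pixel.
// bt709 selects the HD matrix; otherwise BT.601 coefficients apply.
// Using the wrong matrix is a classic cause of subtly shifted colors.
struct Rgb { uint8_t r, g, b; };

static uint8_t Clamp8(double v) {
    return (uint8_t)std::min(255.0, std::max(0.0, v + 0.5));
}

Rgb YuvToRgb(uint8_t y, uint8_t u, uint8_t v, bool bt709) {
    double Y = 1.164 * (y - 16);
    double U = u - 128.0;
    double V = v - 128.0;
    Rgb out;
    if (bt709) {
        out.r = Clamp8(Y + 1.793 * V);
        out.g = Clamp8(Y - 0.213 * U - 0.533 * V);
        out.b = Clamp8(Y + 2.112 * U);
    } else { // BT.601
        out.r = Clamp8(Y + 1.596 * V);
        out.g = Clamp8(Y - 0.392 * U - 0.813 * V);
        out.b = Clamp8(Y + 2.017 * U);
    }
    return out;
}
```

Neutral pixels (U = V = 128) convert identically under both matrices; saturated colors diverge, which is why an HD clip decoded with BT.601 coefficients looks slightly off.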
Licensing and third-party libraries
- H.264 decoding IP is covered by multiple patents; if bundling a software decoder, confirm licensing (e.g., use of libavcodec/FFmpeg may have patent implications in some jurisdictions).
- Using OS-provided hardware decoders (DXVA) typically operates under Microsoft’s licensing for the platform, but ensure any redistribution of SDK code respects third-party licenses.
- Ship clear documentation of components and licenses included in SDK.
Conclusion
A robust AVCHD Decoder DirectShow Filter SDK provides not just decoding logic but a complete integration experience: reliable media type negotiation, allocator and threading optimizations, hardware acceleration support, correct handling of H.264 nuances (SPS/PPS, interlaced content, DPB reordering), and practical samples for developers. The examples above outline the high-level architecture and illustrate how to extract NALs, feed them to a decoder, and deliver frames downstream. For production use, emphasize hardware acceleration, thorough testing on real AVCHD clips, and careful attention to timestamping, color space, and error resilience.