DSIP: Decentralized Session Initiation Protocol

An Identity-First Protocol for Trusted Real-Time Media

Version: Draft v0.4
Status: Proposal
Editor: James Ferris

This draft supersedes DSIP Proposal v0.3.

1. Abstract

DSIP, the Decentralized Session Initiation Protocol, is an identity-first signaling and negotiation protocol for establishing trusted real-time media sessions between people, devices, applications, services, broadcasters, and AI agents.

DSIP is inspired by SIP, but it is not limited to telephony. Where SIP primarily initiates voice and video calls between user agents, DSIP initiates verifiable media sessions between any identity-aware endpoints. These sessions may include voice, video, messaging, live broadcast, conferencing, AI interaction, device media, sensor streams, captions, metadata, or future real-time media types.

DSIP replaces carrier-owned numbers, centralized registrars, siloed application identities, and hop-by-hop trust with:

Self-sovereign identity using Decentralized Identifiers (DIDs)
Cryptographically signed signaling
Decentralized discovery and presence
First-class media and codec negotiation
Transport-independent session establishment
Verifiable source identity for calls, broadcasts, devices, and AI agents
Privacy-preserving security and selective disclosure
Pragmatic interoperation with SIP, PSTN, WebRTC, HLS/DASH, and future media transports

The goal of DSIP is not only to modernize phone calls, but to create an open trust and session layer for real-time communication across the internet.

2. Why DSIP Needs to Evolve Beyond Calls

The original DSIP concept began as a decentralized successor to SIP for secure voice, video, and messaging. That remains an important use case, but it is no longer the whole vision.

Real-time communication is expanding beyond phones and PBXs. Modern communication now includes:

Human-to-human calls
Video meetings
AI voice agents
AI-to-AI interactions
Contact center media
Live captions and transcription
Smart speakers and voice assistants
Vehicle communications
Intercoms and access systems
Public safety audio/video
Radio and TV broadcasts
Live streaming
Gaming voice/video
Remote device media
Machine-to-machine media streams
Sensor and telemetry channels

Today, each ecosystem solves identity, discovery, signaling, trust, and media negotiation differently.

Phone networks use numbers, carriers, SIP, SS7, and regulatory trust chains. Video platforms use proprietary account systems and signaling. Broadcasters use URLs, platform identities, HLS/DASH manifests, or app-specific discovery. AI voice systems use WebSockets, APIs, SIP trunks, or custom glue. IoT devices often depend on vendor clouds.

The result is a fragmented world where every platform has its own identity model, its own trust model, its own signaling layer, and its own media assumptions.

DSIP proposes a common control plane for trusted real-time media.

3. Updated Definition

DSIP stands for:

Decentralized Session Initiation Protocol

DSIP is not merely “Decentralized SIP.”

A better definition is:

DSIP is an identity-first signaling, discovery, and media negotiation protocol for establishing trusted real-time sessions between any identity-aware endpoints.

An endpoint may be:

A person
A phone
A browser
A mobile app
A SIP gateway
A video room
A conference bridge
An AI agent
A contact center queue
A radio station
A TV broadcaster
A camera
A smart speaker
A vehicle
An IoT device
A public safety system
A media relay
A software service

The endpoint does not need to be a phone. It only needs a verifiable identity, discoverable capabilities, and a way to participate in a negotiated media session.

4. Motivation

4.1 Identity Is Broken Across Real-Time Communications

Traditional caller identity is weak. Phone numbers can be spoofed. CNAM is inconsistent. STIR/SHAKEN improves parts of the PSTN problem, but it is still bound to carrier-controlled telephone numbers and does not solve identity across internet-native communication.

Video platforms, messaging platforms, broadcast platforms, and AI communication systems each create their own closed identity systems. A verified identity in one platform does not automatically carry to another.

DSIP makes identity intrinsic to the session.

4.2 Signaling Is Fragmented

SIP, WebRTC signaling, proprietary video platforms, streaming manifests, contact center APIs, and AI voice APIs all solve overlapping problems in incompatible ways.

They all need to answer similar questions:

Who is initiating this session?
Who is the intended recipient or audience?
Can the source be verified?
What media types are supported?
What codecs are available?
Which transport should carry the media?
Is the session live, interactive, broadcast, recorded, or relayed?
What security and privacy policies apply?

DSIP provides a common session layer for those questions.

4.3 Deepfake Audio and Video Require Verifiable Source Identity

As synthetic audio and video become easier to create, the internet needs a way to verify the source of real-time media.

A DSIP-compatible client should be able to determine whether the source of a media session is:

A verified person
A verified organization
A verified broadcaster
A verified AI agent
A verified device
A relay of a verified source
An unverified or anonymous source

This is not only a telecom issue. It applies to news, emergency broadcasts, business communications, customer support, public safety, and AI-generated media.

4.4 Codec and Transport Negotiation Need a Modern Standard

SIP/SDP solved a major problem by allowing endpoints to negotiate media. DSIP keeps that idea, but expands it for modern media environments.

DSIP should negotiate:

Audio codecs
Video codecs
Text and caption streams
Data channels
Transcription streams
AI context streams
Broadcast variants
Latency requirements
Bandwidth constraints
Encryption modes
Consent and recording policy
Transport bindings
Relay and federation options

4.5 Real-Time Media Is No Longer Only Conversational

A call is one type of session. DSIP should also support:

One-to-one communication
One-to-many broadcast
Many-to-many conferences
Publish/subscribe media
Federated rebroadcast
AI-assisted sessions
Device media streams
Emergency alerts
Low-latency live events

The session model should be explicit, negotiated, and extensible.

5. Core Principles

Identity-first communication: Every DSIP participant is identified by a DID or equivalent verifiable identity.
End-to-end verifiability: DSIP signaling is signed. Identity is proven cryptographically, not simply asserted by a network intermediary.
Media-agnostic design: DSIP supports audio, video, messaging, captions, metadata, data channels, sensor streams, and future media types.
Transport independence: DSIP negotiates sessions that may use RTP/SRTP, WebRTC, QUIC, WebTransport, HLS, DASH, multicast, or future transports.
Decentralized discovery: Endpoints are discovered using DID documents, DNS, WebFinger, DHTs, federation, and other decentralized or federated mechanisms.
Codec negotiation as a first-class feature: DSIP treats media capability exchange as a central protocol function, not an afterthought.
Privacy by design: DSIP supports selective disclosure, encrypted signaling payloads, ephemeral identities, private presence, and minimal metadata exposure.
Composable application profiles: Phone calls, video conferences, broadcasts, AI agents, messaging, and device streams are profiles built on the same foundation.
Pragmatic interop: DSIP must interoperate with SIP/PSTN, WebRTC, existing streaming systems, emergency networks, and enterprise communication platforms.
Open participation: Users, companies, governments, developers, broadcasters, and open-source communities should be able to operate DSIP endpoints, resolvers, relays, gateways, and media services.

6. Scope and Non-Goals

6.1 DSIP Defines

DSIP defines:

Identity binding for real-time sessions
Endpoint discovery
Presence and availability records
Session initiation messages
Session state updates
Media capability advertisement
Codec negotiation
Transport negotiation
Security and trust metadata
Application profile conventions
Interoperation models

6.2 DSIP Does Not Require

DSIP does not require:

A blockchain
A single global registrar
Carrier-controlled phone numbers
A specific codec
A specific media transport
A specific relay network
A single application provider
A single identity provider
A single commercial trust authority

6.3 DSIP Is Not Intended to Replace Every Media Protocol

DSIP should not replace RTP, SRTP, WebRTC, HLS, DASH, QUIC, or media codecs.

Instead, DSIP should act as the control plane that discovers endpoints, verifies identities, negotiates capabilities, selects transports, and establishes policy before media flows.

7. Layered Architecture

DSIP is best understood as a layered protocol stack.

+-----------------------------------------------------------+
| Application Profiles                                      |
| call | conference | broadcast | AI agent | device | msg   |
+-----------------------------------------------------------+
| Session Layer                                             |
| invite | answer | reject | update | bye | publish | join  |
+-----------------------------------------------------------+
| Media Negotiation Layer                                   |
| audio | video | text | data | captions | codecs | policy  |
+-----------------------------------------------------------+
| Trust and Identity Layer                                  |
| DIDs | signatures | VCs | attestation | reputation        |
+-----------------------------------------------------------+
| Discovery and Presence Layer                              |
| DID docs | DNS | WebFinger | DHT | federation | relays    |
+-----------------------------------------------------------+
| Transport Bindings                                        |
| QUIC | WebSocket | HTTPS | libp2p | SIP gateway           |
+-----------------------------------------------------------+
| Media Plane                                               |
| SRTP | WebRTC | RTP | QUIC media | HLS | DASH | multicast |
+-----------------------------------------------------------+

This layered model keeps DSIP flexible. A phone call, live TV stream, AI agent session, and device video feed should not need completely different trust and discovery models.

8. Identity Layer

8.1 DID-Based Identity

DSIP endpoints are identified using Decentralized Identifiers.

Recommended DID methods:

did:key for individuals, ephemeral endpoints, agents, and devices
did:web for organizations, brands, broadcasters, public agencies, and enterprises
Optional ledger-backed DID methods for audit-heavy environments

Example identities:

did:key:z6MkAlice...
did:web:example.com:users:alice
did:web:acme.com:support
did:web:acme.com:agents:billing-bot
did:web:wxyz.com:radio:main
did:web:city.gov:emergency-alerts
did:web:vehicle.example:vin:123456
did:web:building.example:front-door-camera
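
For the did:web identities above, the did:web method maps the identifier to an HTTPS URL where the DID document is published. The following Python sketch illustrates that mapping. The function name did_web_to_url is hypothetical, and the sketch covers only the URL derivation, not fetching or validating the document.

```python
from urllib.parse import unquote


def did_web_to_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    prefix = "did:web:"
    if not did.startswith(prefix):
        raise ValueError(f"not a did:web identifier: {did!r}")

    # Colons separate the domain from optional path segments.
    parts = did[len(prefix):].split(":")
    if any(part == "" for part in parts):
        raise ValueError(f"empty segment in identifier: {did!r}")

    # A port, if present, is percent-encoded in the domain segment
    # (for example "example.com%3A8443").
    host = unquote(parts[0])
    path = [unquote(part) for part in parts[1:]]

    if path:
        return f"https://{host}/{'/'.join(path)}/did.json"
    # A bare domain resolves under the .well-known directory.
    return f"https://{host}/.well-known/did.json"


print(did_web_to_url("did:web:example.com:users:alice"))
print(did_web_to_url("did:web:acme.com"))
```

A did:key identifier needs no network lookup because the public key is encoded in the identifier itself, which is one reason it suits ephemeral endpoints.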

8.2 Endpoint Classes

DSIP should define endpoint classes to help clients make trust and UX decisions.

Possible endpoint classes:

person
organization
agent
device
broadcaster
relay
gateway
conference
queue
emergency-service
application
anonymous

A DSIP client could render these differently:

Verified Person: James Ferris
Verified Organization: ACME Bank
Verified AI Agent: ACME Billing Assistant
Verified Broadcast: WXYZ Emergency Weather Feed
Verified Device: Building A Front Door Intercom
Unverified Endpoint: Anonymous Caller

8.3 Verifiable Credentials

DSIP should support Verifiable Credentials for identity claims.

Examples:

Display name credential
Organization credential
Employee role credential
Agent authorization credential
Device ownership credential
Broadcaster license credential
Emergency service credential
Reputation credential
Compliance credential

A support bot could prove it is authorized by a company. A broadcaster could prove it controls an official station feed. A device could prove it belongs to a building or vehicle.

8.4 Device and Agent Identity

DSIP should treat devices and AI agents as first-class identities, not as secondary extensions of user accounts.

A device may have:

A manufacturer credential
An owner credential
A location or deployment credential
A secure element key
A rotation policy
A revocation mechanism

An AI agent may have:

An operator credential
A role credential
A capability manifest
A policy declaration
A disclosure flag indicating that it is non-human

Example:

{
  "id": "did:web:acme.com:agents:support",
  "class": "agent",
  "operator": "did:web:acme.com",
  "disclosure": "ai-agent",
  "roles": ["customer-support", "billing"],
  "capabilities": ["voice", "transfer", "transcription", "human-escalation"]
}

9. Discovery and Presence

9.1 Discovery Goals

DSIP discovery should answer:

Where is this identity reachable?
What services does it expose?
Is it online, offline, busy, publishing, or relay-only?
Which transports are available?
Which application profiles are supported?
Which media capabilities are advertised?
Which trust credentials are available?

9.2 DID Document Service Entry

A DID document may expose a DSIP service entry.

{
  "id": "did:web:example.com:users:alice",
  "service": [
    {
      "id": "#dsip",
      "type": "DSIPService",
      "serviceEndpoint": {
        "https": ["https://dsip.example.com/alice"],
        "wss": ["wss://relay.example.com/dsip/alice"],
        "quic": ["quic://dsip.example.com:443/alice"],
        "libp2p": ["/dns4/dsip.example.com/udp/443/quic-v1/p2p/12D3KooW..."]
      }
    }
  ]
}
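
A client that has resolved such a DID document still has to pick one endpoint to contact. The Python sketch below shows one possible selection rule: walk the client's transports in preference order and take the first URL listed for the first match. The function name pick_dsip_endpoint and the preference-order rule are assumptions for illustration; the draft does not define a selection algorithm.

```python
from typing import List, Optional, Tuple

# Example document mirroring the service entry shown above (abbreviated).
ALICE_DOC = {
    "id": "did:web:example.com:users:alice",
    "service": [
        {
            "id": "#dsip",
            "type": "DSIPService",
            "serviceEndpoint": {
                "https": ["https://dsip.example.com/alice"],
                "wss": ["wss://relay.example.com/dsip/alice"],
                "quic": ["quic://dsip.example.com:443/alice"],
            },
        }
    ],
}


def pick_dsip_endpoint(did_doc: dict, preferred: List[str]) -> Optional[Tuple[str, str]]:
    """Return (transport, url) for the most preferred transport on offer."""
    for service in did_doc.get("service", []):
        # Ignore service entries that are not DSIP entries.
        if service.get("type") != "DSIPService":
            continue
        endpoints = service.get("serviceEndpoint", {})
        for transport in preferred:
            urls = endpoints.get(transport) or []
            if urls:
                return transport, urls[0]
    return None


print(pick_dsip_endpoint(ALICE_DOC, ["webtransport", "quic", "wss"]))
```

Returning None rather than raising lets the caller fall back to another discovery mechanism, such as a relay learned from a presence record.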

9.3 Human-Friendly Names

Humans should not need to exchange raw DID strings.

DSIP should support:

WebFinger aliases
DNS TXT/SRV discovery
QR codes
Contact cards
Organization directories
Verified short names
Optional handle systems

Examples:

alice@example.com
support@acme.com
wxyz.com/radio/main
city.gov/emergency/live
frontdoor@building.example

These aliases resolve to DIDs, not to carrier-owned numbers.
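
For aliases of the user@host form, WebFinger (RFC 7033) already defines the lookup URL. The Python sketch below builds that URL. How the WebFinger response would carry the DID (for example, as a link relation) is not defined in this draft, so the sketch stops at the query. The function name webfinger_url is hypothetical, and path-style aliases such as wxyz.com/radio/main would need a different mechanism.

```python
from urllib.parse import urlencode


def webfinger_url(alias: str) -> str:
    """Build the WebFinger query URL for a user@host alias."""
    user, sep, host = alias.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"not a user@host alias: {alias!r}")

    # WebFinger identifies accounts with the acct: URI scheme; the
    # resource parameter must be percent-encoded in the query string.
    query = urlencode({"resource": f"acct:{user}@{host}"})
    return f"https://{host}/.well-known/webfinger?{query}"


print(webfinger_url("alice@example.com"))
```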

9.4 Presence Records

A DSIP presence record is a short-lived signed record that describes current reachability and optional capabilities.

{
  "type": "dsip.presence.v1",
  "did": "did:web:example.com:users:alice",
  "state": "available",
  "profiles": ["call", "message", "video"],
  "endpoints": {
    "wss": ["wss://relay.example.com/dsip/conn/abcd"],
    "quic": ["quic://198.51.100.10:443/session"]
  },
  "ttl": 180,
  "seq": 58,
  "exp": 1760000000,
  "sig": "base64url(signature)"
}

Presence should be optional and privacy-preserving. Some endpoints may only expose relay-based reachability. Some identities may not publish presence at all.
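
Because presence records are short-lived, a receiver has to decide whether a record is still usable before acting on it. The Python sketch below shows one possible freshness check. It assumes that exp is a Unix timestamp in seconds and that seq increases with every record a DID publishes; the draft does not yet state either rule. Signature verification is out of scope here, and the function name is_presence_fresh is hypothetical.

```python
import time
from typing import Optional


def is_presence_fresh(record: dict, last_seq: Optional[int],
                      now: Optional[float] = None) -> bool:
    """Return True if the record is unexpired and newer than the last one seen."""
    now = time.time() if now is None else now

    if record.get("type") != "dsip.presence.v1":
        return False
    # Assumption: "exp" is an absolute expiry time in Unix seconds.
    if now >= record["exp"]:
        return False
    # Assumption: "seq" grows monotonically, so an equal or lower value
    # means the record is a stale copy or a replay.
    if last_seq is not None and record["seq"] <= last_seq:
        return False
    return True


record = {
    "type": "dsip.presence.v1",
    "did": "did:web:example.com:users:alice",
    "state": "available",
    "seq": 58,
    "exp": 1760000000,
}
print(is_presence_fresh(record, last_seq=57, now=1759999900))
```

A real client would verify the signature first and only then apply these checks, so that an attacker cannot advance the sequence number with a forged record.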

9.5 Broadcast Publication Records

Broadcast endpoints need a related but different concept: publication.

{
  "type": "dsip.publication.v1",
  "publisher": "did:web:wxyz.com",
  "stream_id": "did:web:wxyz.com:radio:main",
  "title": "WXYZ Live Radio",
  "state": "live",
  "profiles": ["broadcast", "audio"],
  "variants": [
    {
      "media": "audio",
      "codec": "opus",
      "sample_rate": 48000,
      "channels": 2,
      "transport": "webrtc",
      "endpoint": "wss://live.wxyz.com/dsip"
    },
    {
      "media": "audio",
      "codec": "aac",
      "transport": "hls",
      "endpoint": "https://live.wxyz.com/main.m3u8"
    }
  ],
  "exp": 1760000000,
  "sig": "base64url(signature-by-broadcaster)"
}

This lets receivers verify the source of a live audio or video feed, even if the stream is distributed through third-party relays.
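
On the receiving side, a client has to choose one of the published variants. The Python sketch below shows a simple rule: keep only variants whose codec and transport the receiver supports, then take the one whose transport ranks highest in the receiver's preference order. The function name choose_variant and the ranking rule are assumptions for illustration, not part of the draft.

```python
from typing import List, Optional, Set

# Abbreviated copy of the publication record shown above.
PUBLICATION = {
    "type": "dsip.publication.v1",
    "stream_id": "did:web:wxyz.com:radio:main",
    "variants": [
        {"media": "audio", "codec": "opus", "transport": "webrtc",
         "endpoint": "wss://live.wxyz.com/dsip"},
        {"media": "audio", "codec": "aac", "transport": "hls",
         "endpoint": "https://live.wxyz.com/main.m3u8"},
    ],
}


def choose_variant(publication: dict, codecs: Set[str],
                   transports: List[str]) -> Optional[dict]:
    """Pick the supported variant with the most preferred transport."""
    best = None
    for variant in publication.get("variants", []):
        if variant.get("codec") not in codecs:
            continue
        if variant.get("transport") not in transports:
            continue
        # A lower index in `transports` means a stronger preference.
        rank = transports.index(variant["transport"])
        if best is None or rank < best[0]:
            best = (rank, variant)
    return best[1] if best else None


chosen = choose_variant(PUBLICATION, {"opus", "aac"}, ["hls", "webrtc"])
print(chosen["endpoint"] if chosen else "no playable variant")
```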

10. Signaling Protocol

10.1 Envelope Format

All DSIP signaling messages are carried in signed envelopes.

DSIP envelopes may be encoded as JSON or CBOR. Signatures may use JWS or COSE.

{
  "type": "dsip.invite.v1",
  "id": "uuid",
  "ts": 1760000000,
  "from": "did:key:z6MkCaller",
  "to": "did:web:example.com:users:bob",
  "profile": "call",
  "intent": "interactive",
  "capabilities": {},
  "media": {},
  "network": {},
  "policy": {},
  "identity": {},
  "sig": {
    "alg": "Ed25519",
    "kid": "did:key:z6MkCaller#key-1",
    "value": "base64url(signature)"
  }
}
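
The draft does not yet say which bytes the signature covers. One common approach is to sign a canonical serialization of the envelope with the sig member removed, so that signer and verifier derive identical bytes regardless of key order. The Python sketch below illustrates that idea with sorted-key compact JSON, which is a simplified stand-in for a full canonicalization scheme such as JCS (RFC 8785). The function names are hypothetical, and the Ed25519 operation itself is omitted because the Python standard library does not provide it.

```python
import hashlib
import json


def signing_input(envelope: dict) -> bytes:
    """Return the bytes a signer would sign: the envelope without "sig"."""
    unsigned = {key: value for key, value in envelope.items() if key != "sig"}
    # Sorted keys and fixed separators make the output independent of
    # the order in which members were inserted.
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def envelope_digest(envelope: dict) -> str:
    """SHA-256 of the signing input, useful for logging or deduplication."""
    return hashlib.sha256(signing_input(envelope)).hexdigest()


envelope = {
    "type": "dsip.invite.v1",
    "id": "uuid",
    "ts": 1760000000,
    "from": "did:key:z6MkCaller",
    "to": "did:web:example.com:users:bob",
    "sig": {"alg": "Ed25519", "value": "base64url(signature)"},
}
print(signing_input(envelope).decode("utf-8"))
```

A verifier would recompute signing_input on the received envelope and check the signature value against those bytes using the key named by kid.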

10.2 Core Message Types

Minimum conversational session messages:

dsip.invite      Start a session
dsip.answer      Accept a session
dsip.reject      Reject a session
dsip.update      Modify session parameters
dsip.candidate   Exchange network candidates
dsip.bye         End a session
dsip.error       Report a protocol or policy failure

Publication and subscription messages:

dsip.publish     Publish a live stream or availability record
dsip.subscribe   Subscribe to a stream, presence, or event source
dsip.notify      Send subscription updates
dsip.unpublish   Stop publishing
dsip.unsubscribe Stop subscribing

Conference and group messages:

dsip.join        Join a room or group session
dsip.leave       Leave a room or group session
dsip.refer       Refer or transfer a participant
dsip.control     Send authorized control actions
dsip.floor       Request or grant speaking/control floor

Messaging and data messages:

dsip.msg         Send encrypted message payload
dsip.receipt     Delivery/read receipt
dsip.typing      Typing or composition indicator
dsip.data        Application-defined data payload
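
The conversational message types imply a small session state machine. The Python sketch below shows one possible reading of it. The state names (idle, inviting, active, ended), the rule that dsip.error ends a session, and the handling of a trailing version suffix such as .v1 are all assumptions for illustration; the draft does not define them.

```python
# (current state, message type) -> next state
TRANSITIONS = {
    ("idle", "dsip.invite"): "inviting",
    ("inviting", "dsip.candidate"): "inviting",
    ("inviting", "dsip.answer"): "active",
    ("inviting", "dsip.reject"): "ended",
    ("inviting", "dsip.bye"): "ended",      # caller gives up before an answer
    ("inviting", "dsip.error"): "ended",
    ("active", "dsip.update"): "active",
    ("active", "dsip.candidate"): "active",
    ("active", "dsip.bye"): "ended",
    ("active", "dsip.error"): "ended",
}


def base_type(msg_type: str) -> str:
    """Strip a trailing version suffix, e.g. "dsip.invite.v1" -> "dsip.invite"."""
    head, _, tail = msg_type.rpartition(".")
    if head and tail[:1] == "v" and tail[1:].isdigit():
        return head
    return msg_type


def next_state(state: str, msg_type: str) -> str:
    """Return the state after processing a message, or raise if it is invalid."""
    key = (state, base_type(msg_type))
    if key not in TRANSITIONS:
        raise ValueError(f"{msg_type} is not valid in state {state!r}")
    return TRANSITIONS[key]


def run(messages) -> str:
    """Replay a message sequence from the idle state and return the final state."""
    state = "idle"
    for msg_type in messages:
        state = next_state(state, msg_type)
    return state


print(run(["dsip.invite.v1", "dsip.candidate.v1", "dsip.answer.v1", "dsip.bye.v1"]))
```

Rejecting out-of-order messages early, for example an answer with no preceding invite, gives implementations a simple defense against confused or malicious peers.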

10.3 Session Intent

DSIP should separate session intent from media type.

Examples:

interactive      Real-time two-way communication
broadcast        One-to-many live media
conference       Many-to-many communication
relay            Third-party relay or rebroadcast
recording        Capture or archive media
monitoring       Listen-only or observe-only session
ai-assist        AI participates in or augments the session
device-control   Media session with control channel
emergency        Emergency or public safety priority session

This matters because a video stream may be a call, a broadcast, a camera feed, or an emergency event.

11. Application Profiles

DSIP should define application profiles that build on the same protocol foundation.

11.1 Call Profile

The call profile supports classic one-to-one or one-to-few interactive sessions.

Common media:

Audio
Video
Text chat
Captions
Screen share
Data channel

Common actions:

Invite
Answer
Reject
Hold/resume
Transfer
Add participant
End

11.2 Conference Profile

The conference profile supports rooms, meetings, webinars, and group conversations.

Common topologies:

Mesh
SFU
MCU
Hybrid

Capabilities:

Join/leave
Participant list
Moderator controls
Floor control
Recording policy
Captions
Screen share
Breakout groups

11.3 Broadcast Profile

The broadcast profile supports one-to-many media distribution.

Examples:

Radio broadcast
TV audio/video
Emergency broadcast
Sports commentary
Live event stream
Public meeting stream
Concert stream
Government announcement

Broadcast profile features:

Publisher identity verification
Stream publication records
Media variants
Low-latency and high-scale options
Relay authorization
Signed metadata
Emergency override
Multi-language audio tracks
Captions and transcripts

11.4 AI Agent Profile

The AI agent profile supports real-time interaction with AI systems.

Examples:

Customer support agent
Personal assistant
Translation agent
Transcription agent
Voice bot
Meeting assistant
Agent-to-agent negotiation

Required properties:

AI disclosure
Operator identity
Capability advertisement
Human escalation support
Recording/transcription policy
Data retention policy
Consent model

11.5 Device Media Profile

The device media profile supports real-time sessions with cameras, intercoms, smart speakers, vehicles, industrial equipment, and sensors.

Examples:

Doorbell camera
Building intercom
Vehicle voice/video session
Factory floor panel
Baby monitor
Security camera
Emergency call box

Capabilities:

Audio/video stream
Push-to-talk
Control channel
Device attestation
Location policy
Owner authorization
Local network fallback

11.6 Messaging Profile

The messaging profile supports encrypted asynchronous and synchronous messaging.

Common features:

One-to-one messages
Group messages
Delivery receipts
Typing indicators
Edits/deletes
Attachments
Offline envelopes
MLS for groups

Messaging remains a DSIP profile, not the core definition of the protocol.

12. Media Negotiation

12.1 Goals

DSIP media negotiation should allow endpoints to agree on:

Media types
Codecs
Codec parameters
Bitrate
Resolution
Frame rate
Sample rate
Channel count
Packetization interval
Latency target
Transport
Encryption
Relay mode
Simulcast/SVC options
Captions/transcription streams
Recording permissions
AI processing permissions

12.2 Media Types

Initial media types:

audio
video
text
data
caption
transcript
screen
metadata
control
sensor
ai-context

Future media types should be added through an extension registry.

12.3 Capability Advertisement

Example capability advertisement:

{
  "media": {
    "audio": {
      "codecs": [
        { "name": "opus", "sample_rates": [16000, 48000], "channels": [1, 2] },
        { "name": "pcmu", "sample_rates": [8000], "channels": [1] },
        { "name": "aac", "sample_rates": [44100, 48000], "channels": [2] }
      ]
    },
    "video": {
      "codecs": [
        { "name": "av1", "resolutions": ["720p", "1080p", "4k"], "framerates": [24, 30, 60] },
        { "name": "vp9", "resolutions": ["720p", "1080p"], "framerates": [30, 60] },
        { "name": "h264", "resolutions": ["480p", "720p", "1080p"], "framerates": [24, 30] }
      ]
    },
    "caption": {
      "formats": ["webvtt", "ttml", "plain-text", "json-events"]
    },
    "data": {
      "channels": ["control", "metadata", "ai-context"]
    }
  },
  "transports": ["webrtc", "srtp", "quic", "webtransport", "hls", "dash"],
  "encryption": ["dtls-srtp", "sframe", "mls", "tls"],
  "latency": {
    "target_ms": 150,
    "max_ms": 500
  }
}

12.4 Negotiated Result

Example negotiated session:

{
  "accepted_media": [
    {
      "type": "audio",
      "codec": "opus",
      "sample_rate": 48000,
      "channels": 2,
      "bitrate": 64000
    },
    {
      "type": "video",
      "codec": "h264",
      "resolution": "1080p",
      "framerate": 30,
      "bitrate": 2500000
    },
    {
      "type": "caption",
      "format": "webvtt",
      "source": "server-assisted"
    }
  ],
  "transport": "webrtc",
  "encryption": "dtls-srtp",
  "relay": "sfu",
  "recording": {
    "allowed": true,
    "consent": "required"
  }
}
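
A result like this can be derived by intersecting two capability advertisements. The Python sketch below covers audio, transport, and encryption only. It assumes that the offerer's list order expresses preference and that the highest common sample rate and channel count win. Those rules, and the function names, are illustrative assumptions rather than part of the draft.

```python
from typing import Optional


def first_common(preferred, supported) -> Optional[str]:
    """Return the first item in `preferred` that also appears in `supported`."""
    for item in preferred:
        if item in supported:
            return item
    return None


def negotiate_audio(offer: dict, answer: dict) -> Optional[dict]:
    """Pick the first offered audio codec that the answerer can also handle."""
    theirs = {c["name"]: c for c in answer["media"]["audio"]["codecs"]}
    for mine in offer["media"]["audio"]["codecs"]:
        other = theirs.get(mine["name"])
        if other is None:
            continue
        rates = set(mine["sample_rates"]) & set(other["sample_rates"])
        channels = set(mine["channels"]) & set(other["channels"])
        if rates and channels:
            return {"type": "audio", "codec": mine["name"],
                    "sample_rate": max(rates), "channels": max(channels)}
    return None


def negotiate(offer: dict, answer: dict) -> Optional[dict]:
    """Return a negotiated session, or None if any required part has no overlap."""
    audio = negotiate_audio(offer, answer)
    transport = first_common(offer["transports"], answer["transports"])
    encryption = first_common(offer["encryption"], answer["encryption"])
    if audio is None or transport is None or encryption is None:
        return None
    return {"accepted_media": [audio], "transport": transport,
            "encryption": encryption}


OFFER = {
    "media": {"audio": {"codecs": [
        {"name": "opus", "sample_rates": [16000, 48000], "channels": [1, 2]},
        {"name": "pcmu", "sample_rates": [8000], "channels": [1]},
    ]}},
    "transports": ["webrtc", "srtp", "quic"],
    "encryption": ["dtls-srtp", "sframe"],
}
ANSWER = {
    "media": {"audio": {"codecs": [
        {"name": "pcmu", "sample_rates": [8000], "channels": [1]},
        {"name": "opus", "sample_rates": [48000], "channels": [1, 2]},
    ]}},
    "transports": ["srtp", "webrtc"],
    "encryption": ["dtls-srtp"],
}
print(negotiate(OFFER, ANSWER))
```

Video, captions, and latency targets would follow the same intersect-then-rank pattern, with bitrate and resolution chosen from the overlapping ranges.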

12.5 Relationship to SDP

DSIP may carry SDP for interoperability, especially with SIP and WebRTC systems.

However, DSIP should not be limited to SDP. DSIP should define a structured media negotiation model that can map to SDP when needed.

This allows DSIP to negotiate modern and future media sessions without being constrained by legacy assumptions.

13. Transport Bindings

DSIP signaling should support multiple transport bindings.

Potential signaling transports:

HTTPS
WebSocket
QUIC
WebTransport
libp2p
SIP gateway mapping
Message queue or broker-based transport for enterprise environments

Potential media transports:

RTP
SRTP
WebRTC
QUIC media
WebTransport
HLS
DASH
SRT
RIST
Multicast RTP
Future low-latency media transports

DSIP should negotiate the best transport based on the session profile.


A phone call may prefer SRTP or WebRTC. A live broadcast may prefer QUIC or HLS. A local emergency alert may prefer multicast. An enterprise AI gateway may prefer RTP plus a WebSocket side channel.

14. Trust, Reputation, and Source Verification

14.1 Trust Model

DSIP trust is based on cryptographic identity and verifiable claims.

A receiver should be able to verify:

The message was signed by the claimed sender
The sender controls the DID
The DID resolves to expected service endpoints
Any attached credentials are valid
The session has not been replayed or tampered with
The media source matches the negotiated session

14.2 Rich Real-Time Identity

The original DSIP concept included Rich Caller ID. In v0.4, this should expand to Rich Session Identity.

Rich Session Identity may include:

Display name
Organization
Role
Endpoint class
Avatar or logo
Verified website
AI disclosure
Device type
Broadcast title
Emergency status
Trust score
Issuer credentials

Example:

{
  "type": "rsi.v1",
  "displayName": "WXYZ News Live",
  "class": "broadcaster",
  "organization": "did:web:wxyz.com",
  "logo": "https://wxyz.com/logo.png",
  "credentials": [
    {
      "type": ["VerifiableCredential", "BroadcasterCredential"],
      "issuer": "did:web:fcc.example",
      "credentialSubject": {
        "id": "did:web:wxyz.com:tv:main",
        "service": "broadcast-video"
      }
    }
  ]
}

14.3 Verified Relays and Rebroadcasts

Broadcast and relay scenarios require source preservation.

A third party may relay a stream, but receivers should still be able to verify the original publisher.

DSIP should support:

Original publisher signature
Relay signature
Relay authorization proof
Chain of custody metadata
Tamper-evident stream metadata

This is useful for public safety, journalism, government broadcasts, live sports, and federated content distribution.

15. Privacy and Security

15.1 Threats

DSIP should address:

Caller/source spoofing
Deepfake impersonation
Replay attacks
Presence tracking
Metadata leakage
Relay abuse
Unauthorized recording
Unauthorized AI processing
Key compromise
Device cloning
Spam and robocalling
Broadcast hijacking
Emergency alert spoofing

15.2 Mitigations

Mitigations include:

Signed signaling
Short-lived envelopes
Nonces and sequence numbers
Encrypted payloads
DID key rotation
Device attestation
Verifiable credentials
Selective disclosure
Ephemeral DIDs
Relay rate limiting
Reputation policies
Consent receipts
Media path binding
Emergency publisher allowlists
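
Short-lived envelopes and nonces work together against replay. The Python sketch below rejects envelopes outside a freshness window and remembers the (from, id) pairs it has accepted within that window. It treats the envelope id as the nonce and ts as Unix seconds. The 60-second window, the 5-second clock-skew allowance, and the class name ReplayGuard are assumptions for illustration.

```python
class ReplayGuard:
    """Reject stale envelopes and envelopes already seen in the window."""

    def __init__(self, max_age_s: int = 60, max_skew_s: int = 5):
        self.max_age_s = max_age_s
        self.max_skew_s = max_skew_s
        self._seen = {}  # (sender DID, envelope id) -> envelope timestamp

    def accept(self, envelope: dict, now: float) -> bool:
        ts = envelope["ts"]
        # Too far in the future (clock skew) or too old to be trusted.
        if ts > now + self.max_skew_s or now - ts > self.max_age_s:
            return False

        # Forget entries that the freshness check would reject anyway,
        # which keeps the cache bounded by the window size.
        self._seen = {key: seen_ts for key, seen_ts in self._seen.items()
                      if now - seen_ts <= self.max_age_s}

        key = (envelope["from"], envelope["id"])
        if key in self._seen:
            return False
        self._seen[key] = ts
        return True


guard = ReplayGuard()
invite = {"type": "dsip.invite.v1", "id": "a1", "ts": 1760000000,
          "from": "did:key:z6MkCaller"}
print(guard.accept(invite, now=1760000010))  # first delivery
print(guard.accept(invite, now=1760000011))  # replay of the same envelope
```

The check only makes sense after the signature has been verified, since id and ts are otherwise attacker-controlled.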

15.3 Consent and Policy

DSIP should make policy explicit.

Examples:

{
  "policy": {
    "recording": "consent-required",
    "transcription": "allowed",
    "ai_processing": "allowed-with-disclosure",
    "retention": "30-days",
    "redistribution": "forbidden",
    "relay": "authorized-only"
  }
}

Policies should be signed and visible to participants before or during session establishment.
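
An endpoint then has to enforce the policy it agreed to. The Python sketch below shows one possible check for the recording field. It assumes the field takes the values allowed, consent-required, or forbidden, that consent-required means every participant must have consented, and that unknown values are denied. The draft does not define these semantics, and the function name recording_permitted is hypothetical.

```python
from typing import Iterable


def recording_permitted(policy_doc: dict, participants: Iterable[str],
                        consents: Iterable[str]) -> bool:
    """Decide whether recording may start under the negotiated policy."""
    rule = policy_doc.get("policy", {}).get("recording", "forbidden")

    if rule == "allowed":
        return True
    if rule == "consent-required":
        # Every participant DID must appear among the recorded consents.
        return set(participants) <= set(consents)
    # "forbidden" and any unrecognized value fail closed.
    return False


policy_doc = {"policy": {"recording": "consent-required",
                         "transcription": "allowed"}}
alice = "did:web:example.com:users:alice"
bot = "did:web:acme.com:agents:support"
print(recording_permitted(policy_doc, [alice, bot], [alice]))
print(recording_permitted(policy_doc, [alice, bot], [alice, bot]))
```

Failing closed on unknown values means a client that meets a newer policy vocabulary refuses to record rather than guessing.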

16. Emergency and Public Safety

Emergency communication remains critical, but in v0.4 it should be generalized beyond emergency calling.

DSIP should support:

Emergency calls
Emergency broadcasts
Public safety video
Verified alert feeds
Dispatch channels
Location-aware routing
Priority treatment
Fallback to PSTN/NG911 where required

16.1 Emergency Session Types

emergency-call
emergency-broadcast
public-safety-video
dispatch-audio
verified-alert

16.2 Emergency Identity

Emergency identities should be operated or certified by appropriate authorities.

Examples:

did:web:911.us
did:web:112.eu
did:web:city.gov:emergency-alerts
did:web:county.gov:dispatch
did:web:weather.gov:alerts

16.3 Location and Privacy

Location should not be mandatory for all DSIP use cases.

Instead, DSIP should define context-specific location disclosure:

No location shared by default
Approximate region for routing when needed
Precise location for emergency sessions
Device-attested location for regulated use cases
Zero-knowledge region proof for privacy-preserving eligibility

The earlier idea of verifiable geolocation remains valuable, but it should not be globally required for all DSIP endpoints.

17. Interoperability

17.1 SIP and PSTN

DSIP should interoperate with SIP and PSTN through gateways.

Mappings may include:

dsip.invite        <-> SIP INVITE
dsip.answer        <-> SIP 200 OK
dsip.bye           <-> SIP BYE
dsip.refer         <-> SIP REFER
dsip.media         <-> SDP
dsip.identity      <-> SIP Identity / P-Asserted-Identity / STIR-SHAKEN context
DSIP DID           <-> E.164 number credential or gateway identity

SIP/PSTN interop will be imperfect because the PSTN cannot preserve all DSIP trust semantics end-to-end.
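
A gateway could start from a lookup table for the message-level mappings. The Python sketch below encodes the four message mappings listed above and returns None for DSIP messages that have no listed SIP counterpart, which is one visible form of the imperfect interop. The names DSIP_TO_SIP and to_sip are hypothetical, and the sketch covers message names only, not header or SDP translation.

```python
from typing import Optional, Tuple

# DSIP message type -> (SIP message kind, SIP method or status line)
DSIP_TO_SIP = {
    "dsip.invite": ("request", "INVITE"),
    "dsip.answer": ("response", "200 OK"),
    "dsip.bye": ("request", "BYE"),
    "dsip.refer": ("request", "REFER"),
}


def to_sip(msg_type: str) -> Optional[Tuple[str, str]]:
    """Map a DSIP message type to its SIP counterpart, if one is listed."""
    # Accept versioned names such as "dsip.invite.v1" as well.
    head, _, tail = msg_type.rpartition(".")
    if head and tail[:1] == "v" and tail[1:].isdigit():
        msg_type = head
    return DSIP_TO_SIP.get(msg_type)


print(to_sip("dsip.invite.v1"))
print(to_sip("dsip.publish.v1"))
```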

17.2 WebRTC

DSIP can be used as a standardized signaling and identity layer for WebRTC.

In this model:

DSIP handles identity, discovery, session negotiation, and policy
WebRTC handles ICE, DTLS-SRTP, congestion control, and media transport

17.3 Broadcast Systems

DSIP can bind to existing broadcast and streaming systems.

Examples:

HLS playlist as negotiated media variant
DASH manifest as negotiated media variant
WebRTC low-latency stream for interactive broadcast
SRT/RIST contribution feed
Multicast local distribution

DSIP does not need to replace these systems. It can verify the publisher, advertise variants, negotiate access, and preserve trust metadata.

17.4 AI Voice Systems

DSIP can interoperate with AI media systems through gateways.

Examples:

DSIP to OpenAI Realtime API gateway
DSIP to Deepgram/AssemblyAI transcription gateway
DSIP to TTS gateway
DSIP to SIP-based AI agent
DSIP to contact center AI assistant

The AI profile should make AI identity, disclosure, retention, and escalation explicit.

18. Network Components

18.1 DSIP Agent

A DSIP Agent is any endpoint capable of participating in DSIP signaling.

Responsibilities:

Manage identity keys
Resolve DIDs
Publish presence or publication records
Verify signatures
Advertise capabilities
Negotiate media
Enforce local policy
Establish media transport

18.2 Resolver

Resolvers retrieve and verify DID documents, service records, aliases, and trust metadata.

18.3 Relay

Relays forward signaling and optionally support store-and-forward envelopes.

Relays should not need plaintext access to signaling payloads unless explicitly authorized.

18.4 Media Relay

Media relays include TURN servers, SFUs, MCUs, broadcast relays, edge caches, and media gateways.

18.5 Gateway

Gateways bridge DSIP to other systems:

SIP/PSTN
WebRTC applications
Contact centers
Broadcast platforms
AI APIs
Enterprise communication systems
IoT vendor clouds

18.6 Trust Authority / Credential Issuer

Credential issuers provide verifiable claims. DSIP should not require one global trust authority. Different communities and industries may define their own trusted issuers.

19. Example Flows

19.1 Person-to-Person Call

Alice selects Bob’s verified handle.
Alice resolves Bob’s DID.
Alice retrieves Bob’s presence record.
Alice sends a signed dsip.invite with audio/video capabilities.
Bob verifies Alice’s identity and credentials.
Bob answers with mutually supported media.
ICE or another negotiated transport completes.
Encrypted media flows.
Either side sends dsip.bye to end the session.

19.2 Verified Radio Broadcast

A radio station publishes a signed DSIP publication record.
A listener searches for or follows the station identity.
The listener verifies the broadcaster DID and credentials.
The client selects the best supported media variant.
The listener subscribes to the stream.
Audio flows over WebRTC, QUIC, HLS, or another negotiated transport.
Metadata remains signed by the broadcaster.

19.3 Verified TV Broadcast with Alternate Audio

A TV broadcaster publishes a video stream with multiple audio tracks.
The DSIP publication record lists variants for video, English audio, Spanish audio, captions, and audio-only mode.
The receiver verifies the broadcaster.
The receiver negotiates H.264 video, English Opus audio, and WebVTT captions.
During an emergency, the broadcaster publishes a signed emergency override message.
Receivers verify the override and switch to the alert track.

19.4 AI Customer Support Agent

A user contacts support@acme.com.
The alias resolves to did:web:acme.com:agents:support.
The agent presents credentials proving it is operated by ACME.
The agent discloses that it is AI.
The session negotiates voice, transcript, and optional screen share.
The user consents to transcription.
The AI handles the call or escalates to a human using dsip.refer or dsip.join.

19.5 Device Video Session

A building intercom publishes limited DSIP reachability.
A resident initiates a signed session to the device.
The device verifies the resident’s access credential.
The session negotiates audio, video, and a control channel.
The resident speaks to the visitor and optionally unlocks the door through an authorized control action.
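
The "authorized control action" in the final step implies a check between the resident's credential and the requested action. A minimal sketch follows, assuming the credential has already been cryptographically verified and reduced to the hypothetical fields shown; the action names are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AccessCredential:
    # Hypothetical contents of an already-verified access credential.
    subject_did: str
    device_did: str
    allowed_actions: FrozenSet[str]
    expires_at: float  # Unix timestamp


def is_action_authorized(
    cred: AccessCredential,
    caller_did: str,
    device_did: str,
    action: str,
    now: float,
) -> bool:
    """Allow an action only for the right caller, device, time, and scope."""
    return (
        cred.subject_did == caller_did
        and cred.device_did == device_did
        and now < cred.expires_at
        and action in cred.allowed_actions
    )


resident_cred = AccessCredential(
    subject_did="did:example:resident",
    device_did="did:example:intercom",
    allowed_actions=frozenset({"talk", "unlock_door"}),
    expires_at=2_000_000_000.0,
)

print(
    is_action_authorized(
        resident_cred,
        "did:example:resident",
        "did:example:intercom",
        "unlock_door",
        now=1_900_000_000.0,
    )
)
# True
```

Passing the current time in as a parameter keeps the check deterministic and easy to test.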

20. Economic Model

DSIP should support both free and commercial operation.

Possible models:

Free peer-to-peer sessions
Paid relays
Paid SFU/media relay usage
Broadcast subscription access
Enterprise-managed DSIP services
Credential issuance services
Reputation and abuse-prevention services
Contact center / AI agent platforms
Developer APIs and SDKs

Payments should be optional. The protocol should work without built-in payments, but should allow metering and authorization where required.

21. Governance and Extension Model

DSIP should define a core protocol and a set of extension registries.

Potential registries:

Application profiles
Media types
Codec identifiers
Transport bindings
Error codes
Credential types
Policy fields
Endpoint classes
Trust frameworks

The core should remain small enough to implement, while extensions allow DSIP to evolve.

22. Minimal DSIP v0.4 Implementation

A minimal DSIP prototype should include:

DID identity generation
DID resolution for did:key and did:web
Signed signaling envelopes
Basic presence record
Core signaling messages: invite, answer, reject, bye, and error
Audio capability negotiation
Video capability negotiation
WebRTC transport binding
Basic rich session identity display
Simple relay support
SIP gateway proof of concept
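
The five core messages in the list above suggest a small session state machine for a prototype. The states and allowed transitions below are assumptions made for illustration; in particular, this sketch only accepts bye on an active session and does not model cancelling a pending invite.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    INVITING = "inviting"
    ACTIVE = "active"
    ENDED = "ended"


# Hypothetical transition table: (current state, message) -> next state.
TRANSITIONS = {
    (State.IDLE, "invite"): State.INVITING,
    (State.INVITING, "answer"): State.ACTIVE,
    (State.INVITING, "reject"): State.ENDED,
    (State.INVITING, "error"): State.ENDED,
    (State.ACTIVE, "bye"): State.ENDED,
    (State.ACTIVE, "error"): State.ENDED,
}


class Session:
    def __init__(self) -> None:
        self.state = State.IDLE

    def handle(self, message: str) -> State:
        """Apply one signaling message, rejecting out-of-order messages."""
        key = (self.state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"{message!r} is not valid in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state


session = Session()
for msg in ("invite", "answer", "bye"):
    print(msg, "->", session.handle(msg).value)
```

Keeping the transitions in a table makes it straightforward to add later messages without rewriting the handler.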

A second milestone should include:

Broadcast publication records
Subscribe/notify flow
AI agent profile
Device profile
Messaging profile
Policy negotiation
Credential verification
Media relay authorization

23. Relationship to SIP

SIP is one of the most important real-time communication protocols ever created. It gave the internet a way to initiate sessions, negotiate media using SDP, and build interoperable voice and video systems.

DSIP should preserve the good ideas:

Session initiation
User agents
Media negotiation
Proxy/relay concepts
Interoperability
Extensibility
Separation of signaling and media

But DSIP should not inherit all of SIP’s assumptions:

Phone numbers as primary identity
Carrier-controlled trust
Hop-by-hop security
Registrar-centric reachability
Legacy telephony bias
Weak caller identity
SDP-only media expression
Limited awareness of AI, devices, broadcasts, and modern application models

DSIP is the spiritual successor to SIP, not a wire-compatible replacement.

24. Conclusion

DSIP began as an idea for a decentralized SIP. The vision is now larger.

The internet needs a common way to establish trusted real-time media sessions across people, devices, applications, broadcasters, AI agents, and communication systems.

That requires more than a new phone protocol. It requires a decentralized session layer that can answer:

Who is participating?
Can their identity be verified?
What kind of session is being requested?
What media types are supported?
Which codecs and transports can be used?
What policies apply?
Can the media source be trusted?
Can the session interoperate with existing systems?

DSIP should become that layer: an identity-first protocol for trusted real-time media sessions.

Phone calls are one use case. Video meetings are another. Radio and TV broadcasts are another. AI agents, device streams, public safety systems, and future media applications are all part of the same larger communication fabric.

The future of real-time communication should not be locked inside phone numbers, carrier registrars, proprietary meeting platforms, vendor clouds, or app-specific identities.

It should be open, verifiable, decentralized, and media-native from the start.

25. Glossary

AI Agent — A software-based participant capable of real-time interaction using speech, text, video, or data.

Application Profile — A DSIP-defined usage pattern such as call, conference, broadcast, messaging, AI agent, or device media.

DID — Decentralized Identifier.

DSIP Agent — Any endpoint that implements DSIP signaling and identity verification.

Endpoint — A person, device, service, application, agent, gateway, broadcaster, or relay that participates in DSIP.

Media Negotiation — The process of agreeing on media types, codecs, transports, encryption, and policy.

Publication Record — A signed record describing a live or available broadcast/media stream.

Rich Session Identity — Verifiable identity metadata associated with a DSIP session.

Session Intent — The purpose of a session, such as call, broadcast, conference, AI assist, emergency, or device control.

Transport Binding — A mapping between DSIP negotiation and an underlying signaling or media transport.

VC — Verifiable Credential.