Imagine the network has only 20 miners.
Segwit was activated only when at least 95% of the miners signaled support, which means in our case 19 of the 20 miners have upgraded.
The remaining miner will start to see some strange tx on blocks, where anyone can spend.
If he spends one of those tx and then tries to include the spending tx in a new block, the other 19 will reject it, since for them it has an invalid tx. The immense majority of upgraded miners will prevent the chain from forking.
When requesting blocks to its peers, nodes send their software version.
When our sole miner asks for blocks to its peers, they know he didn’t upgrade to Segwit, so they send him the block without the witness, which must be smaller than 1 MB, thus making the block valid according to the old rules.
When an upgraded miner asks for a block to his peers, he receives the block with the witness, where the total must be less than 4 MB.
It’s a soft fork because old nodes consider new blocks valid.