While semantic search is trendy, good old lexical search is still the backbone. Semantic techniques can improve results, but they work best when added to a solid text-based search foundation.
It’s often beneficial to overlap your chunks to mitigate semantic gaps. My go-to settings are usually a chunk size of 256-304 tokens with 32 tokens of overlap. I’ve found this to work quite well, but your mileage may vary.