Abstract: Traditional pick-and-place mechanisms in manufacturing and automation often lack adaptability, precision, and cost-effectiveness when handling objects of different sizes and colours.
In visual tracking tasks, identifying repeatedly appearing targets is a significant challenge. Existing algorithms often use network structures such as CNNs and ViTs to extract the appearance features ...